there is a lot of hype around qwen3.6 right now. so i decided to give it a go. surprisingly it well quite well.
so i installed llama.cpp on my desktop which has at least some gpu - rtx 1660 6gb vram. also i have 32gb of ddr4 slow 2666mhz ram. and run it with llama-server. after some playing with quantalization from unsloth i was able to run it at speed around 20t/s. not fast but makes it usable locally.
usage experience was not good with zed editor so I tried also hyped pi code harness, and it was surprisingly fast and responsive.
so now i have fully local model that helping out with python and some bash scripts. didn't build anything but it feels good to have some sort of a buddy during coding.
final config:
./llama-server \ -m "../models/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf" \ --fit on \ --fit-ctx 32768 \ --fit-target 256 \ -np 1 \ -fa on \ --no-mmap \ -ctk q8_0 \ -ctv q8_0 \ --temp 0.6 \ --top-p 0.95 \ --top-k 20 \ --min-p 0.0 \ --presence-penalty 0.0 \ --repeat-penalty 1.0 \ --reasoning-budget -1 \ --chat-template-kwargs "{\"preserve_thinking\": true}" \ --host 0.0.0.0 \ --port 8033