qwen3.6

Published on April 24, 2026

there is a lot of hype around qwen3.6 right now. so i decided to give it a go. surprisingly it well quite well.

so i installed llama.cpp on my desktop which has at least some gpu - rtx 1660 6gb vram. also i have 32gb of ddr4 slow 2666mhz ram. and run it with llama-server. after some playing with quantalization from unsloth i was able to run it at speed around 20t/s. not fast but makes it usable locally.

usage experience was not good with zed editor so I tried also hyped pi code harness, and it was surprisingly fast and responsive.

so now i have fully local model that helping out with python and some bash scripts. didn't build anything but it feels good to have some sort of a buddy during coding.

final config:

./llama-server \
  -m "../models/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf" \
  --fit on \
  --fit-ctx 32768 \
  --fit-target 256 \
  -np 1 \
  -fa on \
  --no-mmap \
  -ctk q8_0 \
  -ctv q8_0 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0 \
  --reasoning-budget -1 \
  --chat-template-kwargs "{\"preserve_thinking\": true}" \
  --host 0.0.0.0 \
  --port 8033