Activate Gemma 4 MTP (multi-token prediction)
Update llama first. This was tested with llama build b9704.
llama serve -m "..\gemma-4-12B-it-qat\gemma-4-12B-it-qat-UD-Q4_K_XL.gguf" --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-model "..\gemma-4-12B-it-qat\mtp-gemma-4-12B-it.gguf"
With MTP enabled, generation speed increased from 10 to 14.8 tokens per second, a gain of about 48%. Someone also recommended trying --spec-draft-n-max 3 for coding workloads.
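To compare settings such as --spec-draft-n-max 2 and 3 on your own hardware, it helps to turn the measured tokens-per-second figures into a speedup. A minimal Python sketch of that arithmetic follows. The function name speedup is made up for illustration and is not part of llama. The only numbers used are the 10 and 14.8 tokens per second measured above.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> tuple[float, float]:
    """Return (ratio, percent_gain) of mtp_tps relative to baseline_tps.

    Hypothetical helper for illustration; both inputs are tokens per second.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    ratio = mtp_tps / baseline_tps
    percent_gain = (ratio - 1.0) * 100.0
    return ratio, percent_gain


# Figures from the run above: 10 tok/s without MTP, 14.8 tok/s with MTP.
ratio, gain = speedup(10.0, 14.8)
print(f"{ratio:.2f}x faster, {gain:.0f}% more tokens per second")
```

Run it again with your own measurement for --spec-draft-n-max 3 in place of 14.8 to see whether the higher draft count actually pays off for your workload.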