local llm on laptop GPU using llama + gemma 4 qat

06 Jun, 2026

install llama.cpp https://llama.app/
llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF
this gives me error: Failed to load CLIP model from
llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF --no-mmproj
https://unsloth.ai/docs/models/gemma-4/qat
Open http://127.0.0.1:8080 test if it works Performance: 9 tokens/sec using Vulkan backend Next steps: try to use opencode with local llm