local llm on laptop 780M GPU using llama + gemma 4 qat
- install llama.cpp https://llama.app/
llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF
this gives me error: Failed to load CLIP model from
llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF --no-mmproj
https://unsloth.ai/docs/models/gemma-4/qat- Open http://127.0.0.1:8080 test if it works Performance: 9 tokens/sec using Vulkan backend Next steps: try to use opencode with local llm