Alper

local llm on laptop 780M GPU using llama + gemma 4 qat

  1. install llama.cpp https://llama.app/
  2. llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF
    this gives me error: Failed to load CLIP model from
    llama serve -hf unsloth/gemma-4-12B-it-qat-GGUF --no-mmproj
    https://unsloth.ai/docs/models/gemma-4/qat
  3. Open http://127.0.0.1:8080 test if it works Performance: 9 tokens/sec using Vulkan backend Next steps: try to use opencode with local llm

#llm