Possible to run on an RTX 2060 8GB + 32GB DDR4 RAM?
It would be amazing, and if it is possible, how much context can I keep?
This model ran on my Ryzen 5 5600 with Vega 7 iGPU and 32GB DDR4 RAM, so on an RTX 2060 + 32GB RAM it would run easily.
Quantized, yes. Mainly with Kobold, Ollama, LM Studio, or Kobold (as endpoint) + Open WebUI.
With 4-bit KV cache: https://huggingface.co/unsloth/Qwen3.5-9B-GGUF
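To answer "how much context can I keep": the KV cache is what grows with context, and quantizing it to 4-bit (the "kv 4" above) cuts its size to a quarter of fp16. A rough sketch of the arithmetic, using assumed architecture numbers (36 layers, 8 KV heads, head dim 128, roughly Qwen3-8B-class; check the actual model card for the real values):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx, bits):
    # K and V each store ctx * n_kv_heads * head_dim values per layer,
    # hence the factor of 2; bits/8 converts element width to bytes.
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bits / 8

# Assumed Qwen3-8B-class shape: 36 layers, 8 KV heads, head_dim 128.
for ctx in (4096, 8192, 16384, 32768):
    fp16 = kv_cache_bytes(36, 8, 128, ctx, 16) / 2**30
    q4 = kv_cache_bytes(36, 8, 128, ctx, 4) / 2**30
    print(f"ctx={ctx}: fp16 KV ~ {fp16:.2f} GiB, 4-bit KV ~ {q4:.2f} GiB")
```

Under these assumptions, a 32k context costs about 4.5 GiB of KV cache at fp16 but only about 1.1 GiB at 4-bit, which is the difference between fitting and not fitting next to a ~5 GB IQ4_XS model file on an 8GB card.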
And run the model: https://huggingface.co/mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED-i1-GGUF/tree/main
https://huggingface.co/mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED-i1-GGUF/blob/main/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED.i1-IQ4_XS.gguf + mradermacher/Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED-GGUF in Kobold + Open WebUI. Very fast. My config is a Ryzen 5 5500 + 32GB DDR4-3200 RAM + RTX 3060 12GB VRAM.
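For anyone wanting a concrete starting point, here is a launch sketch using llama.cpp's `llama-server` (the posters above used Kobold, which has equivalent settings; this is just one common alternative). The filename is the IQ4_XS quant linked above, assumed to be downloaded into the current directory; context size and layer offload are starting values to tune for your VRAM:

```shell
# -ngl 99 offloads all layers to the GPU, -c sets the context length,
# and --cache-type-k/v q4_0 quantize the KV cache to 4-bit.
llama-server \
  -m Qwen3.5-9B-Claude-4.6-HighIQ-THINKING-HERETIC-UNCENSORED.i1-IQ4_XS.gguf \
  -c 8192 \
  -ngl 99 \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```

If the model doesn't fit, lower `-ngl` so some layers stay in system RAM; it gets slower but still runs, which is how the Vega 7 iGPU setup above works at all.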
Your config is good for LLMs. I think I will buy an RTX 2060 to speed up my training and inference on LLMs.