xero110.com

I have been playing around with llama.cpp and the speed-up is crazy. I use Qwen 3.6 35B A3B, which normally runs at about 15 tokens/s, and with MTP I get a little over 40 tokens/s. Not only that, but I went from Q5 to Q5XL with very little loss in speed. So I now have a useful local AI ...

Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding ...

A professional PyQt6-based graphical interface for managing llama.cpp server instances.
https://git.xero110.com/xero110/llama.cpp-GUI/media/branch/main/screenshot.png

Features

Server Binary Selection: Browse and select your llama.cpp server binary
Model Selection: Easy selection of GGUF ...

Search found 3 matches

llama.cpp + MTP is crazy fast

Qwen3.6-35B-A3B

[Release] (Python) llama.cpp Server GUI