Search found 3 matches
- Thu Jun 04, 2026 5:24 pm
- Forum: Software
- Topic: llama.cpp + MTP is crazy fast
- Replies: 0
- Views: 24
llama.cpp + MTP is crazy fast
I have been playing around with llama.cpp and MTP (multi-token prediction), and the speed-up is crazy. I use Qwen 3.6 35B A3B, which normally runs at about 15 tokens per second; with MTP I get a little over 40 tokens/s. Not only that, but I went from Q5 to the Q5XL with very little loss in speed. So I now have a useful local AI ...
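For context, the numbers above work out to roughly a 2.7x speed-up (40 / 15 ≈ 2.67). The sketch below assumes MTP is used in a draft-and-verify style, as in speculative decoding: an extra prediction head proposes `k` tokens ahead and the main model checks them in a single pass. Under the idealized assumption that each drafted token is accepted independently with probability `alpha`, the expected number of tokens produced per pass is (1 - alpha^(k+1)) / (1 - alpha). The function names and the `alpha`/`k` values are illustrative assumptions, not llama.cpp internals or measured figures.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass (idealized model).

    Assumes k drafted tokens, each accepted independently with probability
    alpha; the first rejection ends the run, and the main model always
    contributes one token of its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft accepted: k drafted tokens plus the verifier's own token.
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def observed_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of tokens/s with MTP to tokens/s without it."""
    return mtp_tps / baseline_tps


if __name__ == "__main__":
    # Figures reported in the post: about 15 tok/s baseline, about 40 tok/s with MTP.
    print(f"observed speed-up: {observed_speedup(15, 40):.2f}x")
    # Hypothetical acceptance rates with k = 3 drafted tokens.
    for alpha in (0.6, 0.8, 0.9):
        print(alpha, round(expected_tokens_per_pass(alpha, k=3), 2))
```

Because the formula ignores the extra compute spent on drafting and verification, it is best read as a rough ceiling on the speed-up rather than a prediction of real tokens/s.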
- Sat Apr 18, 2026 2:05 am
- Forum: AI News
- Topic: Qwen3.6-35B-A3B
- Replies: 0
- Views: 53
Qwen3.6-35B-A3B
Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding ...
- Sat Apr 18, 2026 1:02 am
- Forum: Software
- Topic: [Release] (Python) llama.cpp Server GUI
- Replies: 0
- Views: 62
[Release] (Python) llama.cpp Server GUI
A professional PyQt6-based graphical interface for managing llama.cpp server instances.
https://git.xero110.com/xero110/llama.cpp-GUI/media/branch/main/screenshot.png
Features
Server Binary Selection: Browse and select your llama.cpp server binary
Model Selection: Easy selection of GGUF ...
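As a rough illustration of what the "server binary" and "model" selections feed into, here is a minimal sketch that turns those two choices into a llama.cpp server command line and launches it. It is not code from the released GUI: `build_server_command` and `launch_server` are hypothetical helpers, and only the standard `-m`, `--host` and `--port` server options are used.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_server_command(
    binary: str,
    model: str,
    host: str = "127.0.0.1",
    port: int = 8080,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a llama.cpp server command from a chosen binary and GGUF model."""
    model_path = Path(model)
    if model_path.suffix.lower() != ".gguf":
        raise ValueError(f"expected a .gguf model file, got {model!r}")
    cmd = [binary, "-m", str(model_path), "--host", host, "--port", str(port)]
    # Any further options (context size, GPU layers, ...) are passed through untouched.
    cmd.extend(extra_args)
    return cmd


def launch_server(cmd: List[str]) -> subprocess.Popen:
    """Start the server as a child process so a GUI can monitor or stop it later."""
    if shutil.which(cmd[0]) is None and not Path(cmd[0]).is_file():
        raise FileNotFoundError(f"server binary not found: {cmd[0]}")
    return subprocess.Popen(cmd)
```

Keeping command assembly separate from process launch means the GUI can show or validate the exact command before starting anything, and the builder can be tested without a llama.cpp install.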