Search found 3 matches

by xero110
Thu Jun 04, 2026 5:24 pm
Forum: Software
Topic: llama.cpp + MTP is crazy fast
Replies: 0
Views: 24

llama.cpp + MTP is crazy fast

I have been playing around with llama.cpp and MTP (multi-token prediction), and the speed-up is crazy. I use Qwen3.6-35B-A3B, which normally runs at about 15 tokens/s; with MTP I get a little over 40 tokens/s, roughly 2.7x faster. Not only that, but I went from Q5 to Q5XL with very little loss in speed. So I now have a useful local AI ...
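A quick way to sanity-check numbers like these is a bit of arithmetic. The sketch below assumes MTP is used the way speculative decoding works in general: the MTP head drafts `k` tokens, the full model verifies them in one pass, and each draft token is accepted with some probability `alpha`. The `alpha` and `k` values are made-up illustrations, not measurements from llama.cpp or Qwen3.6, and the independence assumption is a simplification.

```python
# Back-of-the-envelope math for MTP-style speculative decoding.
# ASSUMPTION: each drafted token is accepted independently with probability
# alpha; real acceptance depends on the model, prompt and sampler settings.


def observed_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of decode throughput with MTP to throughput without it."""
    return mtp_tps / baseline_tps


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification pass of the main model
    when k tokens are drafted: 1 + alpha + alpha**2 + ... + alpha**k."""
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


if __name__ == "__main__":
    # Numbers from the post: ~15 tok/s without MTP, a little over 40 with it.
    print(f"observed speed-up: {observed_speedup(15, 40):.2f}x")
    # Hypothetical settings: 2 drafted tokens, 80% per-token acceptance.
    print(f"tokens per pass: {expected_tokens_per_step(0.8, 2):.2f}")
```

Tokens per pass is an upper bound on the speed-up, since drafting and verifying a larger batch is not free; the measured throughput ratio is what actually matters.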
by xero110
Sat Apr 18, 2026 2:05 am
Forum: AI News
Topic: Qwen3.6-35B-A3B
Replies: 0
Views: 54

Qwen3.6-35B-A3B

Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding ...
by xero110
Sat Apr 18, 2026 1:02 am
Forum: Software
Topic: [Release] (Python) llama.cpp Server GUI
Replies: 0
Views: 62

[Release] (Python) llama.cpp Server GUI

A professional PyQt6-based graphical interface for managing llama.cpp server instances.
Screenshot: https://git.xero110.com/xero110/llama.cpp-GUI/media/branch/main/screenshot.png

Features

Server Binary Selection: Browse and select your llama.cpp server binary
Model Selection: Easy selection of GGUF ...
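For anyone curious what a launcher like this boils down to, here is a minimal sketch of turning the GUI selections (server binary, GGUF model, a few options) into a `llama-server` command and starting it in the background. It is not the GUI's actual code: the function names, defaults and log-file handling are hypothetical, and only a handful of common `llama-server` flags (`-m`, `--host`, `--port`, `-c`, `-ngl`) are used.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def build_server_command(binary: str, model: str, host: str = "127.0.0.1",
                         port: int = 8080, ctx_size: Optional[int] = None,
                         gpu_layers: Optional[int] = None) -> List[str]:
    """Hypothetical helper: assemble an argv list from the GUI's fields."""
    cmd = [binary, "-m", model, "--host", host, "--port", str(port)]
    if ctx_size is not None:
        cmd += ["-c", str(ctx_size)]      # context size in tokens
    if gpu_layers is not None:
        cmd += ["-ngl", str(gpu_layers)]  # layers to offload to the GPU
    return cmd


def start_server(cmd: List[str], log_path: str = "llama-server.log") -> subprocess.Popen:
    """Hypothetical helper: launch the server without blocking the GUI."""
    if not Path(cmd[0]).is_file():
        raise FileNotFoundError(f"server binary not found: {cmd[0]}")
    # Write output to a log file instead of an unread pipe, which could fill
    # up and stall the server. The child keeps its own copy of the handle,
    # so closing ours when the with-block ends is safe.
    with open(log_path, "w") as log:
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    cmd = build_server_command("./llama-server", "models/model.gguf", ctx_size=8192)
    print(shlex.join(cmd))  # show the command the GUI would run
```

Building an argv list (rather than a shell string) avoids quoting problems with model paths that contain spaces; in a PyQt6 app, `QProcess` would be a natural alternative to `subprocess` for streaming the log into a widget.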