Access large language models from the command line
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma, and other models.
LLM inference in C/C++
A high-throughput and memory-efficient inference and serving engine for LLMs
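Serving engines like this commonly expose an OpenAI-compatible HTTP API, so a client can talk to a locally hosted model the same way it would talk to a hosted one. The sketch below builds such a chat-completion request with only the Python standard library; the endpoint URL, port, and model name are illustrative assumptions, not values taken from any particular engine, and the request is constructed but not sent so the sketch runs without a live server.

```python
import json
import urllib.request

# Assumed local endpoint; adjust host, port, and path for your deployment.
url = "http://localhost:8000/v1/chat/completions"

# OpenAI-style chat payload; "my-model" is a placeholder identifier.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "max_tokens": 16,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(req) would actually send it; omitted here so the
# sketch stays runnable offline.
print(req.get_method(), req.full_url)
```

In practice the JSON response carries the generated text under `choices[0].message.content` in the OpenAI-compatible schema, which is why the same client code often works unchanged across different serving backends.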