Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

Original link: https://lemonade-server.ai

## Lemonade: Local AI for Everyone

Lemonade is a free, open-source AI platform designed to let you run powerful models, such as gpt-oss-120b, directly on your PC with as little as 128 GB of RAM. It prioritizes privacy, speed, and ease of use, with a one-minute install and a lightweight 2 MB service.

Key features include compatibility with popular AI apps via the OpenAI API, automatic configuration for your hardware (GPU and NPU), and support for multiple inference engines (llama.cpp and others). Lemonade handles a range of AI tasks through a unified API: chat, image generation, vision, transcription, and speech synthesis.

Built by the local AI community, Lemonade provides a built-in GUI for model management and cross-platform support (Windows, Linux, macOS). It is under active development, with frequent updates and improvements.

## Lemonade: A Local LLM Server for AMD Hardware

Lemonade is a new open-source server designed to simplify running large language models (LLMs) locally, especially on AMD hardware. Developed with AMD's backing, it aims to be a unified runtime for text, image, and audio generation, offering a single interface and OpenAI-compatible API endpoints.

Users report successfully running a variety of models, including Qwen3.5, on systems such as Strix Halo, using both the GPU and NPU. While performance is comparable to tools like `llama.cpp`, Lemonade simplifies the workflow by managing backends and providing optimizations for AMD hardware.

Key features include support for ROCm, Vulkan, and CPU, as well as integration with tools such as Ollama and LM Studio. NPU support, while promising, currently relies on proprietary kernels and is best suited to smaller models. The project has been well received for its ease of use and its potential to streamline local AI development, particularly for those seeking an all-in-one solution on AMD systems.

Open source. Private. Ready in minutes on any PC.

Chat

What can I do with 128 GB of unified RAM?

Load up models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use.

What should I tune first?

You can use --no-mmap to speed up load times and increase context size to 64 or more.

Image Generation

A pitcher of lemonade in the style of a renaissance painting

Speech

Hello, I am your AI assistant. What can I do for you today?

Built by the local AI community for every PC.

Lemonade exists because local AI should be free, open, fast, and private.

Built on the best inference engines

Works with great apps.

Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.

Built for practical local AI workflows.

Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.

Native C++ Icon

Native C++ Backend

Lightweight service that is only 2MB.

Install Icon

One Minute Install

Simple installer that sets up the stack automatically.

OpenAI Icon

OpenAI API Compatible

Works with hundreds of apps out-of-box and integrates in minutes.

Auto-config Icon

Auto-configures for your hardware

Configures dependencies for your GPU and NPU.

Multi-engine Icon

Multi-engine compatibility

Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.

Multiple Models Icon

Multiple Models at Once

Run more than one model at the same time.

Cross-platform Icon

Cross-platform

A consistent experience across Windows, Linux, and macOS (beta).

Built-in app Icon

Built-in app

A GUI that lets you download, try, and switch models quickly.

One local service for every modality.

Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.

POST /api/v1/chat/completions
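As a sketch of what "point your app at Lemonade" looks like, the snippet below builds an OpenAI-style chat-completion payload and posts it to that endpoint using only the Python standard library. The base URL and port (`localhost:8000`) and the model name are assumptions here, not documented defaults — adjust them to match your Lemonade install.

```python
import json
from urllib import request

# Assumed base URL for a local Lemonade install; check your server's
# actual host and port before using this.
BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """POST the payload to the OpenAI-compatible endpoint and
    return the assistant's reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode()
    req = request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes follow the OpenAI API standard, the same code works unchanged against any other OpenAI-compatible server by swapping the base URL.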

Always improving.

Track the newest improvements and highlights from the Lemonade release stream.
