Build powerful AI locally, extend anywhere.
LlamaFarm is an open-source framework for building retrieval-augmented and agentic AI applications. It ships with opinionated defaults (Ollama for local models, Chroma for vector storage) while staying 100% extendable—swap in vLLM, remote OpenAI-compatible hosts, new parsers, or custom stores without rewriting your app.
- Local-first developer experience with a single CLI (`lf`) that manages projects, datasets, and chat sessions.
- Production-ready architecture that mirrors server endpoints and enforces schema-based configuration.
- Composable RAG pipelines you can tailor through YAML, not bespoke code.
- Extendable everything: runtimes, embedders, databases, extractors, and CLI tooling.
📺 Video demo (90 seconds): https://youtu.be/W7MHGyN0MdQ
Prerequisites:

- Install the CLI

  macOS / Linux:

  ```bash
  curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash
  ```

  Windows (via winget):

  ```powershell
  winget install LlamaFarm.CLI
  ```

- Adjust the Ollama context window
  - Open the Ollama app, go to Settings → Advanced, and set the context window to match production (e.g., 100K tokens).
  - Larger context windows improve RAG answers when long documents are ingested.

- Create and run a project

  ```bash
  lf init my-project   # Generates llamafarm.yaml using the server template
  lf start             # Spins up Docker services & opens the dev chat UI
  ```

- Start an interactive project chat or send a one-off message

  ```bash
  # Interactive project chat (auto-detects namespace/project from llamafarm.yaml)
  lf chat

  # One-off message
  lf chat "Hello, LlamaFarm!"
  ```
Need the full walkthrough with dataset ingestion and troubleshooting tips? Jump to the Quickstart guide.
Prefer building from source? Clone the repo and follow the steps in Development & Testing.
Run services manually (without Docker auto-start):
```bash
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm

# Install Nx globally and bootstrap the workspace
npm install -g nx
nx init --useDotNxInstallation --interactive=false

# Option 1: start both server and RAG worker with one command
nx dev

# Option 2: start services in separate terminals
# Terminal 1
nx start rag

# Terminal 2
nx start server
```
Open another terminal to run `lf` commands (installed or built from source). This is equivalent to what `lf start` orchestrates automatically.
- Own your stack – Run small local models today and swap to hosted vLLM, Together, or custom APIs tomorrow by changing `llamafarm.yaml`.
- Battle-tested RAG – Configure parsers, extractors, embedding strategies, and databases without touching orchestration code.
- Config over code – Every project is defined by YAML schemas that are validated at runtime and easy to version control.
- Friendly CLI – `lf` handles project bootstrapping, dataset lifecycle, RAG queries, and non-interactive chats.
- Built to extend – Add a new provider or vector store by registering a backend and regenerating schema types.
| Task | Command | Notes |
|---|---|---|
| Initialize a project | `lf init my-project` | Creates `llamafarm.yaml` from the server template. |
| Start dev stack + chat TUI | `lf start` | Spins up the server and RAG worker, monitors Ollama/vLLM. |
| Interactive project chat | `lf chat` | Opens the TUI using the project from `llamafarm.yaml`. |
| Send a single prompt | `lf chat "Explain retrieval augmented generation"` | Uses RAG by default; add `--no-rag` for pure LLM. |
| Preview REST call | `lf chat --curl "What models are configured?"` | Prints a sanitized curl command. |
| Create a dataset | `lf datasets create -s pdf_ingest -b main_db research-notes` | Validates strategy/database against project config. |
| Upload files | `lf datasets upload research-notes ./docs/*.pdf` | Supports globs and directories. |
| Process a dataset | `lf datasets process research-notes` | Streams heartbeat dots during long processing. |
| Semantic query | `lf rag query --database main_db "What did the 2024 FDA letters require?"` | Use `--filter`, `--include-metadata`, etc. |
See the CLI reference for full command details and troubleshooting advice.
LlamaFarm provides a comprehensive REST API (compatible with OpenAI's format) for integrating with your applications. The API runs at `http://localhost:8000`.
Chat Completions (OpenAI-compatible)
```bash
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the FDA requirements?"}
    ],
    "stream": false,
    "rag_enabled": true,
    "database": "main_db"
  }'
```
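Because the endpoint speaks the OpenAI chat-completions format, the official `openai` Python client can target it directly. A minimal sketch, assuming the default local port; the namespace, project, and model names are placeholders, and the LlamaFarm-specific `rag_enabled`/`database` fields ride along via `extra_body`:

```python
from openai import OpenAI

# Point the standard OpenAI client at a LlamaFarm project endpoint.
# "my-org"/"my-project" are placeholders; substitute your own.
client = OpenAI(
    base_url="http://localhost:8000/v1/projects/my-org/my-project",
    api_key="sk-local-placeholder",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="qwen2.5:7b",  # may be ignored in favor of the project's configured runtime
    messages=[{"role": "user", "content": "What are the FDA requirements?"}],
    extra_body={"rag_enabled": True, "database": "main_db"},  # LlamaFarm extensions
)
print(response.choices[0].message.content)
```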
RAG Query
```bash
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/rag/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "clinical trial requirements",
    "database": "main_db",
    "top_k": 5
  }'
```
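The same query from Python, as a minimal sketch with `requests` (the namespace/project placeholders are hypothetical; inspect the raw JSON rather than assuming field names):

```python
import requests

url = "http://localhost:8000/v1/projects/my-org/my-project/rag/query"
payload = {
    "query": "clinical trial requirements",
    "database": "main_db",
    "top_k": 5,
}

# POST the query and fail loudly on HTTP errors.
resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # response schema is documented in the API Reference
```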
Dataset Management
```bash
# Upload file
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/data \
  -F "file=@document.pdf"

# Process dataset
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/process
```
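And the two dataset calls from Python, a minimal sketch with `requests`; the multipart field name `file`, the file name, and the dataset path are assumptions based on the curl example above:

```python
import requests

base = "http://localhost:8000/v1/projects/my-org/my-project/datasets/research-notes"

# Upload a local PDF as multipart form data (field name "file" is assumed).
with open("document.pdf", "rb") as f:
    upload = requests.post(f"{base}/data", files={"file": f}, timeout=120)
upload.raise_for_status()

# Trigger processing for everything uploaded to the dataset.
process = requests.post(f"{base}/process", timeout=120)
process.raise_for_status()
print("dataset queued for processing")
```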
Check your `llamafarm.yaml`:

```yaml
name: my-project # Your project name
namespace: my-org # Your namespace
```

Or inspect the file system: `~/.llamafarm/projects/{namespace}/{project}/`
See the complete API Reference for all endpoints, request/response formats, Python/TypeScript clients, and examples.
`llamafarm.yaml` is the source of truth for each project. The schema enforces required fields and documents every extension point.
```yaml
version: v1
name: fda-assistant
namespace: default

runtime:
  provider: openai # "openai" for any OpenAI-compatible host, "ollama" for local Ollama
  model: qwen2.5:7b
  base_url: http://localhost:8000/v1 # Point to vLLM, Together, etc.
  api_key: sk-local-placeholder
  instructor_mode: tools # Optional: json, md_json, tools, etc.

prompts:
  - role: system
    content: >-
      You are an FDA specialist. Answer using short paragraphs and cite document titles when available.

rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: filtered_search
  embedding_strategies:
    - name: default_embeddings
      type: OllamaEmbedder
      config:
        model: nomic-embed-text:latest
  retrieval_strategies:
    - name: filtered_search
      type: MetadataFilteredStrategy
      config:
        top_k: 5
  data_processing_strategies:
    - name: pdf_ingest
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1500
            chunk_overlap: 200
      extractors:
        - type: HeadingExtractor
        - type: ContentStatisticsExtractor

datasets:
  - name: research-notes
    data_processing_strategy: pdf_ingest
    database: main_db
```
Configuration reference: Configuration Guide • Extending LlamaFarm
- Swap runtimes by pointing to any OpenAI-compatible endpoint (vLLM, Mistral, Anyscale). Update `runtime.provider`, `base_url`, and `api_key`; regenerate schema types if you add a new provider enum (see the sketch after this list).
- Bring your own vector store by implementing a store backend, adding it to `rag/schema.yaml`, and updating the server service registry.
- Add parsers/extractors to support new file formats or metadata pipelines. Register implementations and extend the schema definitions.
- Extend the CLI with new Cobra commands under `cli/cmd`; the docs include guidance on adding dataset utilities or project tooling.
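For instance, a minimal sketch of the first bullet, swapping the runtime to a self-hosted vLLM endpoint; the host, port, and model name below are illustrative placeholders, not tested values:

```yaml
# llamafarm.yaml excerpt: point the runtime at an OpenAI-compatible vLLM server.
runtime:
  provider: openai # vLLM exposes an OpenAI-compatible API
  model: meta-llama/Llama-3.1-8B-Instruct # placeholder model id
  base_url: http://my-vllm-host:8000/v1 # placeholder host/port
  api_key: sk-placeholder # or your provider's key
```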
Check the Extending guide for step-by-step instructions.
| Example | What it Shows | Location |
|---|---|---|
| FDA Letters Assistant | Multi-document PDF ingestion, RAG queries, reference-style prompts | `examples/fda_rag/` & Docs |
| Raleigh UDO Planning Helper | Large ordinance ingestion, long-running processing tips, geospatial queries | `examples/gov_rag/` & Docs |
Run `lf datasets` and `lf rag query` commands from each example folder to reproduce the flows demonstrated in the docs.
```bash
# Python server + RAG tests
cd server
uv sync
uv run --group test python -m pytest

# CLI tests
cd ../cli
go test ./...

# RAG tooling smoke tests
cd ../rag
uv sync
uv run python cli.py test

# Docs build (ensures navigation/link integrity)
cd ..
nx build docs
```
Linting: `uv run ruff check --fix .` (Python), `go fmt ./...` and `go vet ./...` (Go).
- Discord – chat with the team, share feedback, find collaborators.
- GitHub Issues – bug reports and feature requests.
- Discussions – ideas, RFCs, roadmap proposals.
- Contributing Guide – code style, testing expectations, doc updates, schema regeneration steps.
Want to add a new provider, parser, or example? Start a discussion or open a draft PR—we love extensions!
- Licensed under the Apache 2.0 License.
- Built by the LlamaFarm community and inspired by the broader open-source AI ecosystem. See CREDITS for detailed acknowledgments.
Build locally. Deploy anywhere. Own your AI.