Khoj: The Open-Source AI Second Brain You Can Self-Host
Khoj is an open-source personal AI app that acts as your AI second brain — chat with any LLM, search your documents with semantic AI, build custom agents, and self-host it completely on your own machine.
Part 1: Foundations — The Mental Model
You probably have notes scattered across Obsidian, Notion, PDF research papers, and markdown files. You switch between ChatGPT, Claude, and Gemini tabs, pasting context in by hand. You want an AI that already knows everything you know — but all the big players lock your data into their cloud.
That is exactly the gap Khoj is built to fill.
Mental Model: Think of Khoj as a personal AI brain rather than a chatbot: an always-on knowledge assistant that has indexed every document you give it, can search the internet, can run custom agents, and can do all of this on your own machine or on Khoj’s cloud, whichever you choose.
Where most AI tools are stateless (each conversation starts empty), Khoj is stateful and knowledge-indexed. It is your AI that remembers.
Part 2: The Investigation — Architecture Deep Dive
The Big Picture
Khoj is a Python application with a FastAPI server at its core and Django ORM models for storage. Here is the high-level flow:
┌──────────────────────────────────────────────────────────────┐
│                         Khoj Clients                         │
│ Web App │ Obsidian Plugin │ Emacs Package │ Phone │ WhatsApp │
└─────────────────────────┬────────────────────────────────────┘
                          │ REST / WebSocket API
┌─────────────────────────▼────────────────────────────────────┐
│                    Khoj Server (FastAPI)                     │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────────────┐ │
│  │  Indexing   │  │  Conversation│  │     Agent Engine     │ │
│  │  Pipeline   │  │  Router      │  │   (Tool + Planner)   │ │
│  └──────┬──────┘  └──────┬───────┘  └──────────────────────┘ │
│         │                │                                   │
└─────────┼────────────────┼────────────────────────────────────┘
          │                │
┌─────────▼──────┐  ┌──────▼──────────────────────────────────┐
│  Vector Store  │  │              LLM Adapters               │
│  (embeddings)  │  │  OpenAI │ Anthropic │ Google │ Ollama   │
└────────────────┘  └─────────────────────────────────────────┘
Source Code Structure
Khoj’s codebase under src/khoj/ is cleanly organized by concern:
| Directory | Purpose |
|---|---|
| routers/ | FastAPI REST & WebSocket endpoints (chat, agents, search, files) |
| processor/conversation/ | LLM adapter per provider (OpenAI, Anthropic, Google, Ollama) |
| processor/content/ | Document parsers (PDF, Markdown, Notion, Org-mode, Word) |
| database/ | Django ORM models: conversations, agents, files, users |
| search/ | Semantic search pipeline using sentence-transformers |
| routers/api_agents.py | Full REST API for creating and managing agents |
LLM Adapter Pattern
One of Khoj’s most elegant design choices is the LLM adapter pattern. Each provider gets its own module with the same interface:
# src/khoj/processor/conversation/anthropic/anthropic_chat.py
async def converse_anthropic(
    messages: List[ChatMessage],
    model: Optional[str] = "claude-3-7-sonnet-latest",
    api_key: Optional[str] = None,
    deepthought: Optional[bool] = False,
    tracer: dict = {},
) -> AsyncGenerator[ResponseWithThought, None]:
    """Converse with user using Anthropic's Claude"""
    async for chunk in anthropic_chat_completion_with_backoff(
        messages=messages,
        model_name=model,
        temperature=0.2,
        ...
    ):
        yield chunk
The same pattern is replicated for openai_chat.py, google_chat.py, and ollama_chat.py. The router picks the right adapter at runtime based on the user’s configured model — you swap from GPT-4o to Gemini to Llama 3 without changing any application code.
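To make the runtime dispatch concrete, here is a minimal sketch of the adapter idea. It is not Khoj's actual router: the names `ADAPTERS`, `converse`, `fake_openai`, and `fake_anthropic` are hypothetical, and the adapters are stand-ins that echo text instead of calling a provider.

```python
import asyncio
from typing import AsyncGenerator, Callable, Dict, List

# Hypothetical adapter shape: (messages, model) -> async stream of text chunks.
Adapter = Callable[[List[str], str], AsyncGenerator[str, None]]


async def fake_openai(messages: List[str], model: str) -> AsyncGenerator[str, None]:
    # Stand-in for a real provider call; yields one chunk per message.
    for message in messages:
        yield f"[openai:{model}] {message}"


async def fake_anthropic(messages: List[str], model: str) -> AsyncGenerator[str, None]:
    # Same interface as fake_openai, different "provider".
    for message in messages:
        yield f"[anthropic:{model}] {message}"


# Registry keyed by provider name (an assumption made for this sketch).
ADAPTERS: Dict[str, Adapter] = {
    "openai": fake_openai,
    "anthropic": fake_anthropic,
}


async def converse(provider: str, model: str, messages: List[str]) -> List[str]:
    """Pick the adapter for the configured provider and collect its chunks."""
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"no adapter registered for provider {provider!r}") from None
    return [chunk async for chunk in adapter(messages, model)]


if __name__ == "__main__":
    print(asyncio.run(converse("anthropic", "claude-3-7-sonnet-latest", ["hello"])))
```

Because every adapter shares one signature, switching providers only changes the registry lookup key; the calling code stays the same.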
Document Ingestion Pipeline
Khoj reads your knowledge base and indexes it into a vector store for semantic retrieval:
- PDF → pypdf parser
- Markdown / Org-mode → Plain text extraction
- Notion → Official API integration
- Word → Office XML parser
- Images → Vision LLM description
Everything lands in an embedding vector index (sentence-transformers). When you ask a question, Khoj performs semantic similarity search over your corpus, retrieves the top-k relevant chunks, and passes them as context to the LLM — classic RAG, but deeply integrated.
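The retrieval step can be sketched in a few lines. This is an illustration under stated assumptions, not Khoj's search code: the three-dimensional vectors are made-up toy embeddings (a real setup would get them from a sentence-transformers model), and `cosine`, `top_k`, and `build_prompt` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Prepend the retrieved chunks to the question, as a RAG prompt would."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


# Toy corpus: (chunk text, made-up embedding).
CHUNKS = [
    ("Transformers for time-series forecasting", [0.9, 0.1, 0.0]),
    ("Sourdough starter feeding schedule", [0.0, 0.1, 0.9]),
    ("Attention mechanisms in sequence models", [0.8, 0.3, 0.1]),
]
QUERY_VEC = [1.0, 0.2, 0.0]  # pretend embedding of the user's question

if __name__ == "__main__":
    hits = top_k(QUERY_VEC, CHUNKS, k=2)
    print(build_prompt("Which papers cover transformers for forecasting?", hits))
```

The LLM only ever sees the top-k chunks, so retrieval quality bounds answer quality.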
Part 3: The Diagnosis — What It Does for Developers
Use Case 1: Personal Research Assistant
Load your entire research library — 300 PDFs, 1,000 Markdown notes, every Notion page — and chat with it:
# Sync a local folder of docs
khoj --content-file /path/to/research/
# Or via the web UI: Settings → Files → Upload
Ask: “Which of my papers mentions transformer-based architectures for time-series forecasting?” Khoj retrieves the relevant sections, cites them, and synthesizes a coherent answer.
Use Case 2: Custom AI Agents
Khoj’s agent system lets you create specialized AI personas with their own knowledge base, LLM, system prompt, and tools:
Settings → Agents → Create Agent
- Name: "Python Code Reviewer"
- Model: Llama 3.1 70B (local via Ollama)
- Knowledge Base: your company's internal codebase docs
- Tools: Web Search, Code Execution
- Persona: "You are a strict senior engineer. Review code for security and correctness."
Each agent gets its own chat endpoint. You could have a “Research Analyst” agent reading academic PDFs and a “Marketing Copywriter” agent reading brand guidelines — both running on the same Khoj server.
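One way to picture several agents living on one server is a small registry of agent specs. The sketch below is purely illustrative: `AgentSpec`, `AgentRegistry`, and the slug scheme are hypothetical names invented here, not Khoj's data model or API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical mirror of the fields in the agent creation form."""
    name: str
    model: str
    persona: str
    tools: List[str] = field(default_factory=list)
    knowledge_base: List[str] = field(default_factory=list)


class AgentRegistry:
    """Keeps agents separate by keying each one on a URL-safe slug."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentSpec] = {}

    def register(self, spec: AgentSpec) -> str:
        # Lowercase the name and collapse non-alphanumerics into hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", spec.name.lower()).strip("-")
        if slug in self._agents:
            raise ValueError(f"agent {slug!r} already exists")
        self._agents[slug] = spec
        return slug

    def system_prompt(self, slug: str) -> str:
        # Each agent contributes its own persona and tool list to the prompt.
        spec = self._agents[slug]
        tools = ", ".join(spec.tools) or "none"
        return f"{spec.persona}\nAvailable tools: {tools}"


if __name__ == "__main__":
    registry = AgentRegistry()
    slug = registry.register(AgentSpec(
        name="Python Code Reviewer",
        model="llama3.1:70b",
        persona="You are a strict senior engineer.",
        tools=["Web Search", "Code Execution"],
    ))
    print(slug)
    print(registry.system_prompt(slug))
```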
Use Case 3: Autonomous Research (Scheduled Jobs)
Khoj can act as a proactive assistant:
- Set up a daily automated research task: “Every morning, search for news about AI safety and send me a summary newsletter”
- It browses the web, synthesizes information, and delivers it to your configured channel (email, webhook, etc.)
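The core of any "every morning" job is working out when the next run is due. The sketch below shows only that piece, as an assumption about how a daily schedule could be computed; it says nothing about how Khoj schedules tasks internally, and `next_daily_run` is a hypothetical helper using naive local datetimes.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time) -> datetime:
    """Return the next datetime at which a daily job scheduled for `at` should fire."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so run tomorrow instead.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    run_at = next_daily_run(datetime.now(), time(hour=8))
    print(f"Next AI safety digest due at {run_at:%Y-%m-%d %H:%M}")
```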
Use Case 4: Local-First Privacy
For developers who refuse to send data to third-party clouds:
# Run Llama 3 locally via Ollama
ollama run llama3.1
# Point Khoj to it
# In Khoj UI → Chat Models → Add Model → host: http://localhost:11434
Your documents stay on your disk and your conversations are processed locally. As long as you use a local model and leave online tools such as web search turned off, no data leaves your machine.
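Before pointing Khoj at Ollama, it can help to confirm the local server is up and see which models it has pulled. This sketch uses Ollama's `/api/tags` endpoint, which lists local models; the helper names `model_names` and `list_local_models` are made up for this example, and the URL assumes Ollama's default port.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> List[str]:
    """Ask the local Ollama server which models are available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_local_models())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Ollama not reachable at {OLLAMA_URL}: {exc}")
```

If the list comes back empty, pull a model first with `ollama run llama3.1` as shown above.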
Supported LLMs at a Glance
| Type | Provider | Example Models |
|---|---|---|
| Cloud | OpenAI | GPT-4o, o3-mini |
| Cloud | Anthropic | Claude 3.7 Sonnet |
| Cloud | Google | Gemini 1.5 Pro, Flash |
| Cloud | Cohere, Mistral AI | Command R, Mistral Large |
| Local | Ollama | Llama 3.1, Qwen, Gemma, DeepSeek |
Part 4: The Resolution — How to Get Started
Option A: Cloud (Zero Setup)
The fastest path — just go to app.khoj.dev and create a free account. No installation needed.
Option B: Self-Host with Docker (Recommended)
mkdir ~/.khoj && cd ~/.khoj
# Download the official compose file
wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml
# Start everything
docker-compose up -d
Open http://localhost:42110 and you’re in.
Option C: Self-Host with pip (Python Developers)
# Install with local LLM support (llama-cpp-python)
python -m pip install 'khoj[local]'
# Start the server
khoj
For GPU acceleration:
# NVIDIA CUDA
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 python -m pip install 'khoj[local]'
# Apple M1/M2/M3
CMAKE_ARGS="-DGGML_METAL=on" python -m pip install 'khoj[local]'
Add Your Knowledge Base
After starting Khoj:
- Web App: Go to Settings → Files → drag-and-drop your PDFs, Markdown files, or connect Notion
- Obsidian Plugin: Install the Khoj plugin → it indexes your vault automatically
- CLI sync:
khoj --content-file ~/notes/ --content-file ~/research/*.pdf
Connect Your Preferred LLM
In Settings → Chat Models:
- Add your OpenAI key for GPT-4o
- Add your Anthropic key for Claude
- Point to http://localhost:11434 for Ollama local models
Khoj will route all conversations through whichever model you designate as default.
Final Mental Model
┌────────────────────────────────────────────────────────────┐
│                            Khoj                            │
│                                                            │
│             "Your open-source AI second brain"             │
│                                                            │
│  What it IS:                                               │
│  → A self-hostable personal AI app (FastAPI + Python)      │
│  → An LLM-agnostic router (GPT, Claude, Gemini, Ollama)    │
│  → A RAG pipeline over YOUR documents                      │
│  → An agent builder with custom knowledge + tools          │
│                                                            │
│  What it SOLVES:                                           │
│  → Knowledge fragmented across files, apps, and tools      │
│  → Dependency on closed, cloud-only AI services            │
│  → Privacy: your data stays on your machine if you want    │
│                                                            │
│  What it ENABLES:                                          │
│  → Chat with 1,000s of your own documents                  │
│  → Local LLMs (Llama, Qwen, DeepSeek) via Ollama           │
│  → Autonomous agents that research and deliver newsletters │
│  → Multi-platform: Web, Obsidian, Emacs, Phone, WhatsApp   │
│                                                            │
│  Self-host: pip install khoj | docker-compose up           │
│  Cloud: app.khoj.dev (free tier available)                 │
└────────────────────────────────────────────────────────────┘
GitHub: khoj-ai/khoj
Docs: docs.khoj.dev
Live App: app.khoj.dev