On April 2, 2026, Andrej Karpathy posted something that got 14 million views in a few days. Not a model release. Not a benchmark. Just a description of how he's been organizing his research.
The idea: use LLMs to build a personal knowledge base, a living, self-maintaining wiki made of markdown files. No vector databases. No complex RAG pipelines. Just a raw/ folder, an LLM, and Obsidian.
It sounds simple. The implications aren't.
TL;DR
Most people use LLMs as answer machines. Ask a question, get a reply, close the tab. Karpathy is doing something different. He dumps raw research material (papers, articles, repos) into a folder, then has an LLM compile all of it into a structured, interlinked wiki made of markdown files. The LLM writes and maintains the entire wiki; he rarely touches it manually. Once the wiki gets big enough (his is around 100 articles and 400,000 words), he can ask the LLM complex research questions against it and get answers that synthesize across everything he's ever read on the topic. The answers get filed back into the wiki, so every query makes the knowledge base smarter. No vector databases needed at this scale. Just markdown files, Obsidian as the viewer, and an LLM doing all the writing.
The Problem This Solves
Most people use LLMs in sessions. You open a chat, build up context, ask questions, get answers, and close the tab. Next time you come back, the model remembers nothing. You reconstruct everything from scratch.
For developers who vibe code, hitting a token limit or ending a session feels like starting over. You burn tokens just getting the AI back up to speed on what it already knew yesterday.
RAG was supposed to fix this. Connect your documents to a vector database, retrieve relevant chunks at query time, generate answers. It works. But the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask something that requires synthesizing five documents and the LLM has to hunt down and piece together the relevant fragments every time. Nothing builds up.
Karpathy's approach is different. Instead of retrieval at query time, the LLM compiles knowledge upfront and keeps compiling as new material comes in.
How the System Actually Works
Step 1: Raw data goes into raw/
Everything starts with a folder. Articles, research papers, GitHub repos, datasets, images, all of it lands in raw/. For web content, Karpathy uses the Obsidian Web Clipper browser extension to capture pages as markdown, with images downloaded locally so the LLM can reference them directly. At this step, nothing is smart yet. It's just disciplined capture into a consistent, LLM-friendly format.
The raw/ directory is sacred. The LLM can read anything in it but never writes to it. This matters because you always have the original sources to verify against. If the LLM makes a mistake somewhere in the wiki, you can trace it back to the source and fix it.
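The capture discipline is mechanical enough to sketch in a few lines. This is an illustrative example, not Karpathy's actual tooling: the `vault/raw` layout and date-prefixed filenames are assumptions, and the append-only rule is enforced only by convention.

```python
from pathlib import Path
from datetime import date
import shutil

def ingest(source: Path, raw: Path = Path("vault/raw")) -> Path:
    """Copy a captured file (e.g. a clipped markdown page) into raw/.

    raw/ is append-only by convention: the LLM reads these files but
    nothing ever edits them, so originals stay verifiable.
    """
    raw.mkdir(parents=True, exist_ok=True)
    dest = raw / f"{date.today().isoformat()}-{source.name}"
    shutil.copy2(source, dest)  # preserves the file's timestamps
    return dest
```

The date prefix is just one way to keep captures sorted chronologically; any consistent, LLM-friendly naming scheme does the job.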
Step 2: The LLM compiles the raw data into a wiki
This is the core of the whole system.
Instead of indexing documents for later retrieval, the LLM reads the raw/ files and writes structured wiki pages. Summaries of every source, encyclopedia-style articles for core concepts, and explicit backlinks between related ideas. The model isn't a retriever here. It's the primary author and editor of the knowledge base.
The key quote from Karpathy: "The LLM writes and maintains all of the data of the wiki, I rarely touch it directly."
You're not using the LLM to answer questions. You're using it as a writer that builds a structured knowledge base on your behalf, incrementally, as new documents come in. The analogy Karpathy uses in his follow-up gist captures it perfectly: Obsidian is the IDE, the LLM is the programmer, the wiki is the codebase.
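A compile pass could be as simple as the sketch below: gather the raw sources and the current wiki's table of contents into one request for an OpenAI-compatible chat model. The prompt wording and directory layout here are guesses at the pattern, not Karpathy's scripts.

```python
from pathlib import Path

COMPILE_PROMPT = (
    "You maintain a markdown wiki compiled from the sources below. "
    "Summarize each new source, create or update concept articles, "
    "and add [[wikilinks]] between related pages. Never edit raw/."
)

def build_messages(raw_dir: Path, wiki_dir: Path) -> list[dict]:
    """Assemble a compile request: system prompt + raw sources + wiki page list."""
    sources = "\n\n".join(
        f"--- {p.name} ---\n{p.read_text()}" for p in sorted(raw_dir.glob("*.md"))
    )
    toc = "\n".join(p.name for p in sorted(wiki_dir.glob("*.md")))
    return [
        {"role": "system", "content": COMPILE_PROMPT},
        {"role": "user",
         "content": f"Existing wiki pages:\n{toc}\n\nNew raw sources:\n{sources}"},
    ]

# With an OpenAI-compatible client, this would be sent as e.g.:
# client.chat.completions.create(model="...", messages=build_messages(raw, wiki))
```

At real scale you'd compile incrementally (only new raw/ files, only the relevant wiki pages) rather than shipping everything every pass, but the shape is the same.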
Step 3: Obsidian as the frontend
Obsidian serves as the visual layer. Browse raw data, the compiled wiki, the graph view, and any visualizations the LLM generates. Plugins like Marp allow the LLM to render markdown directly as slides. Everything stays local and file-based.
The choice of markdown matters here. If Obsidian disappeared tomorrow, the knowledge base is still just a folder of plain text files that any editor can open. You own the data. The LLM is just a visitor that writes and edits files on your behalf.
Step 4: Q&A without fancy RAG
Once the wiki reaches scale, the LLM can handle complex queries across the entire knowledge base. Karpathy's research wiki sits at around 100 articles and 400,000 words on a single topic. He expected to need a full RAG stack at this size. He didn't.
The LLM auto-maintains index files and brief summaries of every document. When you ask it a complex question, it navigates the wiki using those indexes, follows backlinks, and synthesizes an answer that draws from everything relevant. No embedding latency. No retrieval noise. Just structured markdown the model already understands because it wrote it.
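Because the wiki is plain markdown held together by `[[wikilinks]]`, the navigation the model does can be mimicked in a few lines. The backlink map below is an illustrative sketch of that structure, not part of Karpathy's tooling, and it ignores Obsidian niceties like case-insensitive link resolution.

```python
import re
from collections import defaultdict
from pathlib import Path

# Captures the page name in [[Page]], [[Page|alias]], or [[Page#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def backlink_map(wiki_dir: Path) -> dict[str, set[str]]:
    """Map each link target to the set of pages that link to it."""
    links = defaultdict(set)
    for page in sorted(wiki_dir.glob("*.md")):
        for target in WIKILINK.findall(page.read_text()):
            links[target.strip()].add(page.stem)
    return dict(links)
```

This is the graph the LLM walks when answering a question: start from the index, follow links to relevant articles, and pull in whatever they reference.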
Step 5: Outputs that feed back in
Answers don't come back as chat text. Karpathy has the LLM render output as markdown files, Marp slides, or Matplotlib visualizations, all viewed directly in Obsidian. The crucial part is what happens next: those outputs get filed back into the wiki.
This is what separates it from every other note-taking or research tool. Most tools require you to manually capture what you learn. This one captures it automatically, as a byproduct of the questions you ask. Every query adds to the knowledge base rather than disappearing into a chat log.
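Filing an answer back is just another markdown write. The sketch below shows one plausible shape for it; the `qa-` naming and frontmatter fields are assumptions for illustration, not Karpathy's format.

```python
from datetime import date
from pathlib import Path

def file_answer(wiki_dir: Path, question: str, answer_md: str) -> Path:
    """Save an LLM answer as its own wiki page so later queries can build on it."""
    slug = "-".join(question.lower().split())[:60]  # crude, illustrative slug
    page = wiki_dir / f"qa-{slug}.md"
    page.write_text(
        f"---\ndate: {date.today().isoformat()}\n"
        f"question: \"{question}\"\n---\n\n{answer_md}\n"
    )
    return page
```

Once a page like this exists, the next compile pass can link it from the relevant concept articles, which is exactly how queries compound instead of evaporating.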
Step 6: Health checks and linting
The wiki isn't static. Karpathy runs LLM "health checks": passes where the model scans the wiki for inconsistencies, fills in missing data using web search, finds interesting connections between articles, and suggests new topics to investigate. One person in the replies described it well: it's a living knowledge base that heals itself.
The LLM isn't just storing information at this point. It's actively suggesting what questions to ask next.
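A slice of those health checks is purely mechanical and doesn't even need an LLM. The broken-link scan below is an assumed example of the kind of lint such a pass might include (it treats link targets as case-sensitive, which real Obsidian doesn't).

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # page name in [[Page]] / [[Page|alias]]

def broken_links(wiki_dir: Path) -> list[tuple[str, str]]:
    """List (page, target) pairs where a wikilink points at a page that doesn't exist."""
    pages = {p.stem for p in wiki_dir.glob("*.md")}
    missing = []
    for page in sorted(wiki_dir.glob("*.md")):
        for target in WIKILINK.findall(page.read_text()):
            if target.strip() not in pages:
                missing.append((page.stem, target.strip()))
    return missing
```

Each hit is either a typo to fix or, more interestingly, a stub article the LLM should write next; that's how a lint pass turns into a research agenda.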
Why This Beats RAG at Small-to-Mid Scale
RAG isn't going anywhere. For massive corpora, millions of documents where semantic similarity search is the main requirement, it's the right tool.
But for mid-sized, high-signal knowledge bases (hundreds to tens of thousands of documents) where structure, traceability, and ongoing synthesis matter, the markdown wiki approach fits better.
The core difference is transparency. Vector embeddings are a black box. Every claim the LLM makes in this system can be traced back to a specific markdown file a human can read, edit, or delete. There's no opaque retrieval step where you wonder why a particular chunk got surfaced. The LLM knows the structure because it built it.
What Everyone Else Is Doing With This
The post hit 14 million views and sparked a serious discussion in the AI community.
Lex Fridman replied saying he runs a similar setup, extending it further by having the LLM generate dynamic HTML with JavaScript for interactive data visualization. He also uses it to spin up temporary focused mini-wikis that he loads into an LLM for voice-mode interaction during long runs. The idea of ephemeral, task-specific knowledge bases that dissolve once the work is done points toward something interesting: agents that build a custom research environment for a specific question, then clean up after themselves.
Steph Ango, co-creator of Obsidian, raised a concept he called "contamination mitigation." His suggestion: keep your personal vault clean and let agents work in a separate messy vault, only promoting distilled insights back into your trusted archive once they've been refined. It's the same logic as staging environments in software. Give the AI a sandbox to explore freely, treat promotion into the core vault as a controlled step.
Developers started building around the idea immediately. An open-source CLI called CRATE appeared within days, implementing the same three-layer architecture (immutable raw/, LLM-maintained wiki/, and agent hints), with full Obsidian-friendly paths and support for any OpenAI-compatible model.
Where This Is Going
Karpathy ended the original post with a line that's hard to ignore: "I think there is room here for an incredible new product instead of a hacky collection of scripts."
Right now the setup is exactly that. A custom search engine he vibe-coded, a handful of CLI tools, some Obsidian plugins, and prompt engineering holding it together. It works. But it's not something most people would set up on a Sunday afternoon.
The natural endpoint is fine-tuning. As the wiki grows and gets cleaned through repeated linting passes, the data becomes more structured and high-quality. At some point it becomes the perfect training set, not just a knowledge base you query, but material you use to fine-tune a model so it internalizes the knowledge in its weights rather than just its context window. That's when this stops being a research workflow and starts being something else entirely.
How to Start Today
You don't need a polished product to try this. The setup is three things.
Create a raw/ folder. Start dropping articles, papers, and notes into it. Use Obsidian Web Clipper to convert web pages to markdown as you read them. Don't overthink the structure at this stage; raw capture is the only goal.
Open a long-context LLM session (Claude or GPT-4o work well for this). Point it at your raw/ folder and prompt it to start compiling a wiki. Summaries of each document, concept articles that group related ideas, backlinks between them. Let it write. Don't edit manually.
Install Obsidian and point it at the same folder. The graph view alone is worth it. Watching connections emerge between concepts as the wiki grows is a different way of seeing your own research.
Ten documents is enough to see how the compilation step works. The compounding effect only becomes obvious at scale, but the pattern clicks fast.
The Actual Shift
People keep framing this as a productivity hack or a better note-taking method. It's neither.
The real shift is from LLMs as answer machines to LLMs as knowledge infrastructure. You're not asking questions and getting replies. You're building a system that gets smarter every time you use it, where the knowledge compounds instead of evaporating at the end of each session.
Karpathy has been early on most major LLM use patterns, from making neural nets accessible to coining vibe coding. This one feels the same. A year from now, building a personal knowledge base this way will probably feel obvious. Right now it still feels like something only a few people are doing.
That gap is usually worth paying attention to.
Frequently Asked Questions
Do I need a vector database for this?
No. Karpathy's whole point is that at the scale of a personal knowledge base, a well-structured markdown wiki with LLM-maintained indexes outperforms full RAG infrastructure. Vector databases make sense at millions of documents. For hundreds of high-quality articles, they add complexity without adding value.
Which LLM works best for this?
Claude and GPT-4o are the most common choices given their long context windows. The longer the context, the more of your wiki the model can hold at once during a compilation or Q&A session. Local models via Ollama work if you want full data sovereignty, but output quality on the compilation step drops noticeably.
Does this work for non-technical people?
The current setup requires comfort with file systems and markdown. Karpathy himself called it a "hacky collection of scripts." That gap between what this approach can do and how accessible it is right now is exactly where a product opportunity sits, and several teams are already building toward it.
What's the difference between this and Notion AI or NotebookLM?
Those tools own your data and retrieve on demand. This approach gives the LLM authorship. It writes and maintains the knowledge base rather than just searching it. Your data lives in plain markdown files you control. And crucially, outputs compound back into the wiki rather than disappearing into a chat history.
How big does the wiki need to be before it gets useful?
Karpathy's example is around 100 articles and 400,000 words. But useful starts earlier than that. Even at 20 to 30 well-structured articles on a single topic, the cross-referencing and backlinks the LLM creates start surfacing connections you'd never have made manually.