# RAG Pipeline Visualizer
An interactive visualization of how Retrieval-Augmented Generation works — from query embedding to vector search to LLM response generation, using a fictional Space Colony knowledge base.
## How It Works
Type a question (or pick a preset) and watch the full RAG pipeline animate step-by-step. The knowledge base contains 20 chunks from a fictional Space Colony Handbook.
| Stage | What Happens |
|---|---|
| 1. Embedding | Your query is converted into a vector representation (animated bars) |
| 2. Vector Search | TF-IDF cosine similarity finds the most relevant chunks |
| 3. Context Assembly | Top-K retrieved chunks are assembled into the LLM context window |
| 4. Generation | The LLM generates a response token-by-token |
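Stages 1–2 can be sketched with a hand-rolled TF-IDF retriever. This is a minimal illustration of the technique the demo describes, not the project's actual code; the function names and sample chunks are invented for the example:

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per document."""
    n = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))                      # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}   # rarer terms weigh more
    vecs = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vecs.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=3):
    """Return the top_k chunks most similar to the query, with scores."""
    tokenized = [c.lower().split() for c in chunks]
    vecs, idf = tfidf_vectors(tokenized)
    q_tf = Counter(query.lower().split())
    total = sum(q_tf.values())
    # Terms unseen in the corpus get zero weight (idf defaults to 0).
    q_vec = {t: (q_tf[t] / total) * idf.get(t, 0.0) for t in q_tf}
    scores = sorted(((cosine(q_vec, v), i) for i, v in enumerate(vecs)),
                    reverse=True)
    return [(chunks[i], s) for s, i in scores[:top_k]]
```

Because TF-IDF is purely lexical, retrieval here rewards exact keyword overlap between the query and a chunk, which is why the visualizer can animate each matching term.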
## Controls
| Control | Action |
|---|---|
| Preset buttons | Load a pre-written question |
| Top-K slider | Control how many chunks are retrieved (1-5) |
| Click a chunk card | View its full content and keywords |
## The Concept
RAG (Retrieval-Augmented Generation) enhances LLM responses by first searching a knowledge base for relevant context, then including that context in the prompt. This demo uses real TF-IDF vectorization and cosine similarity (rather than neural embeddings) to find the best-matching chunks.
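The "include that context in the prompt" step (stage 3 above) amounts to concatenating the top-K chunks ahead of the question. A minimal sketch, with an invented prompt template rather than the demo's actual wording:

```python
def build_prompt(query, retrieved_chunks):
    """Assemble retrieved chunks into a single LLM prompt (stage 3)."""
    # Number each chunk so the answer could cite its sources.
    context = "\n\n".join(f"[{i + 1}] {chunk}"
                          for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The assembled string is what the LLM actually sees in stage 4; raising Top-K adds more chunks to the context at the cost of a longer prompt.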