Local RAG: Infinite Memory for Your AI Agent for $0

RAG is no longer just for companies with OpenAI-sized budgets. Today we built a complete Knowledge Base with semantic search. Total cost: $0. Time: ~1 hour. Everything runs locally on your Mac.

This isn't a theoretical tutorial. It's what we built today as an OpenClaw skill: a system where I can save articles, URLs, and documents, then ask my AI agent "what did I read about embeddings?" and have it find exactly what's relevant.

And before you say "but you need to pay for OpenAI embeddings": you don't. Ollama + mxbai-embed-large is free, runs locally, and scores higher than text-embedding-ada-002 on the MTEB benchmark.

1 What the Hell is RAG?

RAG = Retrieval Augmented Generation. In plain English: giving your LLM external memory.

The problem with LLMs is that they only know what was in their training data. They don't know what you read yesterday. They don't know your internal documents. They have no idea about that article you saved a month ago.

RAG solves this in 3 steps:

┌────────────────────────────────────────────────────────────┐
│                       HOW RAG WORKS                        │
│                                                            │
│  1. INDEX                                                  │
│  ┌──────────┐    ┌────────────┐    ┌─────────────────┐     │
│  │ Document │───▶│ Embeddings │───▶│ Database        │     │
│  │ or URL   │    │ (vectors)  │    │ (SQLite)        │     │
│  └──────────┘    └────────────┘    └─────────────────┘     │
│                                                            │
│  2. SEARCH                                                 │
│  ┌──────────┐    ┌────────────┐    ┌─────────────────┐     │
│  │ "What did│───▶│ Query      │───▶│ Cosine          │     │
│  │ I read   │    │ embedding  │    │ similarity      │     │
│  │ on RAG?" │    │            │    │ search          │     │
│  └──────────┘    └────────────┘    └────────┬────────┘     │
│                                             │              │
│  3. GENERATE                                ▼              │
│  ┌──────────────────────────────────────────────────┐      │
│  │ LLM receives: your question + relevant documents │      │
│  │ → Answer with context from YOUR knowledge        │      │
│  └──────────────────────────────────────────────────┘      │
└────────────────────────────────────────────────────────────┘

Retrieval Augmented Generation: infinite memory for your agent

The magic is in the embeddings. They are numerical representations of a text's meaning. Two texts about the same topic will have similar vectors, even if they use different words.

💡 Example: "The dog runs in the park" and "The canine trots in the garden" end up with very similar embeddings. A keyword search wouldn't connect them. A semantic search does.
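To make the keyword-versus-semantic contrast concrete, here is a minimal sketch. The three-dimensional vectors are invented stand-ins for real embeddings (mxbai-embed-large returns 1024 dimensions), and `keywordOverlap` and `cosineSimilarity` are illustrative helper names, not part of any library.

```javascript
// Toy illustration only: the vectors below are hand-written, not model output.
const STOPWORDS = new Set(['the', 'in', 'a']);

// Count the content words two texts share (a crude stand-in for keyword search).
function keywordOverlap(a, b) {
  const words = (s) =>
    new Set(s.toLowerCase().split(/\s+/).filter((w) => !STOPWORDS.has(w)));
  const wordsA = words(a);
  const wordsB = words(b);
  return [...wordsA].filter((w) => wordsB.has(w)).length;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1 mean "same direction".
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const dog = { text: 'The dog runs in the park', vector: [0.9, 0.8, 0.1] };
const canine = { text: 'The canine trots in the garden', vector: [0.85, 0.75, 0.2] };
const taxes = { text: 'Quarterly tax filing deadline', vector: [0.05, 0.1, 0.95] };

// Keyword view: the two pet sentences share no content words.
console.log('shared keywords:', keywordOverlap(dog.text, canine.text));
// Vector view: the pet sentences point in almost the same direction,
// while the unrelated sentence does not.
console.log('dog vs canine:', cosineSimilarity(dog.vector, canine.vector).toFixed(3));
console.log('dog vs taxes:', cosineSimilarity(dog.vector, taxes.vector).toFixed(3));
```

With real embeddings the numbers will differ, but the pattern is the same: related texts score close to 1, unrelated ones score much lower.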

2 The Stack: $0 and All Local

Here's what we use:

❌ "Enterprise" Stack

OpenAI Embeddings: $0.0001/1K tokens
Pinecone/Weaviate: $70+/month
AWS/GCP: Additional hosting
Privacy: Everything in the cloud 😬
Monthly cost: $100-500+

✅ Our Stack

Ollama + mxbai-embed-large: $0
SQLite: $0 (included in your OS)
Hosting: Your Mac
Privacy: 100% local 🔒
Monthly cost: $0

The Components

🦙 Ollama

Ollama is like Docker, but for LLMs. One command and you have models running locally. We use it for embeddings, not for text generation (your main LLM handles that).

🧠 mxbai-embed-large

The embeddings model from Mixedbread AI. 335M parameters. 1024-dimension vectors. Outperforms OpenAI's ada-002 on the MTEB benchmark. And it's open source.

64.68%

MTEB score for mxbai-embed-large vs 61.0% for text-embedding-ada-002

🗄️ SQLite

The most deployed database in the world. Comes with your Mac. Zero configuration. For our use case (hundreds or thousands of personal documents), it's more than enough. You don't need Pinecone.

3 Setup in 15 Minutes

This is all you need:

1. Install Ollama

# macOS
brew install ollama

# Or download directly from ollama.ai

# With Homebrew, start the background service if it is not running yet
brew services start ollama

2. Download the embeddings model

ollama pull mxbai-embed-large

# First run downloads ~670MB
# After that it runs instantly

3. Verify it works

curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Hello world"
}'

# Should return JSON with an "embedding" array of 1024 numbers

That's the whole setup. Ollama runs as a background service, so you now have a local endpoint that generates embeddings for any text.

📝 Note: On Apple Silicon (M1/M2/M3), embeddings generate in ~100ms per document. On Intel it might take slightly longer, but it still feels instant for interactive use.
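The same check can be done from Node.js. This is a minimal sketch, assuming Node 18+ (for the global `fetch`) and an Ollama server on the default port; `embed` and its `fetchImpl` parameter are names invented for this example, while the URL, model name, and request body mirror the curl call in step 3.

```javascript
// Request an embedding from a local Ollama server.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function embed(text, fetchImpl = fetch) {
  const response = await fetchImpl('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'mxbai-embed-large', prompt: text }),
  });
  if (!response.ok) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  // The endpoint answers with a JSON object holding an "embedding" array.
  const { embedding } = await response.json();
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error('Response did not contain an embedding array');
  }
  return embedding;
}

// Usage (needs Ollama running locally):
// embed('Hello world').then((vector) => console.log(vector.length));
```

Failing loudly on a non-200 status or a missing array makes a stopped server or a model that was never pulled show up at this step rather than later in the search code.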

4 The Knowledge Base Architecture

Our KB has a simple but powerful structure:

┌─────────────────────────────────────────────────────────────┐
│                    KNOWLEDGE BASE SCHEMA                    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                      documents                      │    │
│  ├─────────────────────────────────────────────────────┤    │
│  │ id          │ INTEGER PRIMARY KEY                   │    │
│  │ title       │ TEXT                                  │    │
│  │ content     │ TEXT (the full text)                  │    │
│  │ source_url  │ TEXT (optional, where it came from)   │    │
│  │ tags        │ TEXT (JSON array)                     │    │
│  │ embedding   │ BLOB (vector of 1024 floats)          │    │
│  │ created_at  │ DATETIME                              │    │
│  │ updated_at  │ DATETIME                              │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  Index: embedding for similarity search                     │
│  (In SQLite we use linear search, enough for <10K docs.     │
│   For more, consider sqlite-vec or similar)                 │
└─────────────────────────────────────────────────────────────┘

Simple schema: document + embedding + metadata
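The schema stores each embedding as a BLOB. One plausible encoding, sketched below, is to pack the vector as raw 32-bit floats, so a 1024-dimension vector takes 4096 bytes. `toBlob` and `fromBlob` are hypothetical helper names, and float32 packing is an assumption; the schema only says "vector of 1024 floats".

```javascript
// Pack a numeric vector into a Buffer of float32 values (4 bytes each).
function toBlob(vector) {
  const floats = Float32Array.from(vector);
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Unpack a Buffer written by toBlob back into a Float32Array.
function fromBlob(blob) {
  // Copy into a fresh ArrayBuffer first: a Buffer's byte offset is not
  // guaranteed to be 4-byte aligned, which a Float32Array view requires.
  const bytes = blob.buffer.slice(blob.byteOffset, blob.byteOffset + blob.byteLength);
  return new Float32Array(bytes);
}

const vector = [0.25, -0.5, 1.0];
const blob = toBlob(vector);
console.log('bytes stored:', blob.length);
console.log('round trip:', Array.from(fromBlob(blob)));
```

Float32 halves the storage compared with JavaScript's native 64-bit numbers, at the price of a little precision, which rarely matters when ranking by similarity.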

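The Save Flow

// Sketch of the save flow. Assumes `ollama` is the ollama JS client and `db`
// is a promise-based SQLite handle (for example, from the `sqlite` package).
async function saveDocument(title, content, sourceUrl, tags) {
  // 1. Generate the embedding for the content
  const { embedding } = await ollama.embeddings({
    model: 'mxbai-embed-large',
    prompt: content
  });

  // 2. Pack the vector as float32 bytes so it fits the BLOB column
  const blob = Buffer.from(new Float32Array(embedding).buffer);

  // 3. Save to SQLite
  await db.run(`
    INSERT INTO documents (title, content, source_url, tags, embedding)
    VALUES (?, ?, ?, ?, ?)
  `, [title, content, sourceUrl, JSON.stringify(tags), blob]);
}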

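The Search Flow

// Semantic search sketch. Assumes `ollama` is the ollama JS client, `db` is a
// promise-based SQLite handle, and each BLOB holds packed float32 values.
async function search(query, limit = 5) {
  // 1. Embed the query
  const { embedding: queryEmbedding } = await ollama.embeddings({
    model: 'mxbai-embed-large',
    prompt: query
  });

  // 2. Fetch all documents
  const docs = await db.all('SELECT * FROM documents');

  // 3. Decode each stored vector and compute cosine similarity
  const results = docs.map(doc => {
    const b = doc.embedding;
    const vector = new Float32Array(b.buffer.slice(b.byteOffset, b.byteOffset + b.byteLength));
    return { ...doc, similarity: cosineSimilarity(queryEmbedding, vector) };
  });

  // 4. Sort by similarity and return the top N
  return results
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}

// Cosine similarity: dot(a, b) / (|a| * |b|)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}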

⚠️ On scalability: Yes, we're doing a linear search. For a personal Knowledge Base (<10,000 documents), this takes milliseconds. If you need millions of documents, consider sqlite-vec, FAISS, or a dedicated vector database. But for personal use, keep it simple.
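If you want to check the linear-scan claim on your own machine, the sketch below times a brute-force top-k search over 10,000 synthetic vectors. The vectors are random numbers, not real embeddings, and `normalize`, `dot`, and `topK` are illustrative names. It also uses one small optimization: vectors are normalized to unit length once, so cosine similarity reduces to a plain dot product at query time.

```javascript
// Scale a vector to unit length and store it compactly as float32.
function normalize(vector) {
  let norm = 0;
  for (const x of vector) norm += x * x;
  norm = Math.sqrt(norm) || 1; // avoid dividing by zero for an all-zero vector
  return Float32Array.from(vector, (x) => x / norm);
}

function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// For unit-length vectors, the dot product equals the cosine similarity.
function topK(queryUnit, docs, k) {
  return docs
    .map((doc) => ({ id: doc.id, similarity: dot(queryUnit, doc.unit) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}

// Synthetic corpus: 10,000 random 1024-dimension vectors.
const DIMS = 1024;
const randomVector = () => Array.from({ length: DIMS }, () => Math.random() - 0.5);
const docs = Array.from({ length: 10000 }, (_, id) => ({ id, unit: normalize(randomVector()) }));
const query = normalize(randomVector());

const start = performance.now();
const hits = topK(query, docs, 5);
const elapsed = performance.now() - start;
console.log(`scanned ${docs.length} vectors in ${elapsed.toFixed(1)} ms`);
console.log(hits);
```

The timing depends on your hardware, and it excludes the time to read rows from SQLite and to embed the query, so treat it as a lower bound for an end-to-end search.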

5 Integration with Your Agent

Here's where the magic becomes practical. The Knowledge Base is an OpenClaw skill, which means my agent can:

Save: "Save this article to the KB with tags [rag, embeddings]"
Search: "What do I have saved about JWT authentication?"
Fetch + Save: "Read this URL and save it to my KB"

┌────────────────────────────────────────────────────────────┐
│                  USAGE FLOW WITH OPENCLAW                  │
│                                                            │
│  User: "Save https://example.com/articulo-de-rag"          │
│                          │                                 │
│                          ▼                                 │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ OpenClaw:                                            │  │
│  │ 1. web_fetch → extracts content from the URL         │  │
│  │ 2. kb save → generates embedding + saves to SQLite   │  │
│  └──────────────────────────────────────────────────────┘  │
│                          │                                 │
│                          ▼                                 │
│  User: "What do I know about embeddings?"                  │
│                          │                                 │
│                          ▼                                 │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ OpenClaw:                                            │  │
│  │ 1. kb search "embeddings" → top 5 documents          │  │
│  │ 2. Includes context in the LLM prompt                │  │
│  │ 3. Answers with information from YOUR knowledge base │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘

The agent now has access to everything you've saved
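The "includes context in the LLM prompt" step in the diagram can be sketched as a small function. `buildPrompt`, the prompt wording, and the per-document character cap are assumptions made for illustration; this is not OpenClaw's actual implementation.

```javascript
// Fold retrieved documents into a single prompt for the LLM.
// `docs` is expected to look like rows from the documents table.
function buildPrompt(question, docs, maxCharsPerDoc = 2000) {
  const context = docs
    .map((doc, i) => {
      const source = doc.source_url ? ` (${doc.source_url})` : '';
      // Cap each document so a few long articles cannot crowd out the rest.
      return `[${i + 1}] ${doc.title}${source}\n${doc.content.slice(0, maxCharsPerDoc)}`;
    })
    .join('\n\n');

  return [
    'Answer the question using only the context below.',
    'If the context does not contain the answer, say so.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

const prompt = buildPrompt('What do I know about embeddings?', [
  { title: 'Article about RAG', source_url: 'https://example.com/article', content: 'Embeddings map text to vectors.' },
  { title: 'Personal note', source_url: null, content: 'Cosine similarity compares vector direction.' },
]);
console.log(prompt);
```

Numbering the documents and including the source URL lets the model point back to where an answer came from, which makes its replies easier to verify.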

The skill exposes simple commands the agent can use:

# Save a document
kb save --title "Article about RAG" --content "..." --tags "rag,ai"

# Save from a URL
kb save-url --url "https://example.com/article" --tags "reference"

# Search
kb search --query "how do embeddings work?" --limit 5

# List recent documents
kb list --limit 10

6 The Elephant in the Room: Why $10/month Matters

Here's the plot twist. The Knowledge Base is free. But the LLM that uses it wasn't... until now.

$10/month

GitHub Copilot gives you access to Claude Opus 4, GPT-4.5, Gemini 2.5 Pro. The same models enterprises pay hundreds to use.

Peter Steinberger, the creator of OpenClaw, calls it "shipping at inference speed". You have the best models in the world. You have a local Knowledge Base with your documents. You have automation tools. The stack is democratized.

The question is no longer "do you have access to top-tier AI?" For $10/month, yes. Everyone does. The question now is: do you know how to use it?

Access is no longer the differentiator. Knowledge is. That's why we build things like this Knowledge Base: to amplify what we already know, not to replace it.

7 Privacy: Why Local Matters

A cloud Knowledge Base means:

Your private documents on third-party servers
Dependency on APIs that can change prices or terms
Network latency on every query
Costs that scale with usage

A local Knowledge Base means:

Your data never leaves your machine (except when you explicitly send it to the LLM)
Works offline (embeddings at least)
Fixed cost: $0
No usage limits

⚠️ Important clarification: When you search and the agent uses the results to respond, the content does go to the LLM (Claude/GPT/etc). The difference is that you control what's sent. The KB and embeddings are 100% local. The LLM is where you decide what context to share.

8 Ideas for Extensions

Once you have the KB running, the possibilities are endless:

📚 Import your Readwise/Instapaper

Export your highlights and saved articles, then import them into the KB. Now you can ask your agent "what have I read about productivity this year?".

📝 Index your Obsidian/Notion notes

A script that syncs your Obsidian vault to the KB. Semantic search over years of personal notes.
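A first step for such a sync script could look like the sketch below, which only gathers the notes. It assumes the vault is a folder of Markdown files; `collectNotes` is a hypothetical helper, and passing each record to the KB's save step is left out.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively collect Markdown notes as { title, content, path } records.
function collectNotes(dir) {
  const notes = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    // Skip hidden entries such as .obsidian or .git.
    if (entry.name.startsWith('.')) continue;
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      notes.push(...collectNotes(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      notes.push({
        title: path.basename(entry.name, '.md'),
        content: fs.readFileSync(fullPath, 'utf8'),
        path: fullPath,
      });
    }
  }
  return notes;
}

// Usage (the vault path is a placeholder):
// const notes = collectNotes('/path/to/vault');
// console.log(`${notes.length} notes found`);
```

Keeping the file path in each record gives a real sync a stable key for detecting notes that changed or were deleted since the last run.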

📧 Important emails

Save key emails (proposals, contracts, decisions) to the KB. "What did we agree with client X about pricing?".

🎙️ Meeting transcripts

Transcribe your Zoom calls and save them to the KB. "What did Juan mention about the deadline in the last meeting?".

📄 Technical documentation

Index your project documentation. The agent can answer questions about your own code/architecture.

💡 Key Takeaways

RAG no longer requires an enterprise budget. Ollama + mxbai-embed-large + SQLite = $0 and runs on your Mac.
mxbai-embed-large outperforms OpenAI ada-002 on the MTEB benchmark, and it's completely free and local.
Setup in 15 minutes: brew install ollama && ollama pull mxbai-embed-large. Done.
SQLite is enough for a personal KB. You don't need Pinecone or complex vector databases.
Privacy by default: Embeddings and search are 100% local. You decide what context to send to the LLM.
The stack is democratized. For $10/month (Copilot) you get top-tier LLMs + a free local KB. The differentiator is no longer access; it's knowledge.