How We Built a Production RAG System in 11 Days
The Challenge
When a FinTech client approached us with thousands of unstructured financial documents and a tight deadline, we knew a standard chatbot wouldn't cut it. They needed a system that could accurately retrieve and synthesize information from regulatory filings, internal policies, and customer correspondence — all while maintaining the precision that financial services demand.
The traditional approach would have involved months of planning, a large team, and a waterfall-style rollout. Instead, we took a different path: a focused 11-day sprint using a modern RAG (Retrieval-Augmented Generation) architecture.
Architecture Decisions
Our stack centered on three key choices that enabled rapid development without sacrificing production quality:
Vector storage with PostgreSQL + pgvector. Rather than introducing a standalone vector database, we extended the client's existing PostgreSQL infrastructure. This eliminated an entire category of operational complexity — no new backup strategies, no new monitoring, no new failure modes. The pgvector extension gave us efficient similarity search with HNSW indexes, and we could join vector results with structured metadata in a single query.
Chunking strategy matters more than model choice. We spent nearly two full days on chunking alone. Financial documents have complex hierarchical structures — sections reference other sections, tables need to be kept intact, and footnotes carry critical context. We implemented a hybrid chunking approach: semantic splitting for narrative text, structure-preserving extraction for tables, and parent-child relationships for cross-referenced sections.
LangChain for orchestration, Claude for generation. LangChain gave us the pipeline abstractions we needed to move fast, while Claude's large context window and instruction-following capabilities made it ideal for synthesizing information from multiple retrieved passages.
Results and Lessons Learned
The system went live on day 11 and handled its first production queries without incident. Within the first week, the client's support team reported a 60% reduction in time spent searching for policy information. More importantly, the accuracy of responses — validated against a test set of 200 question-answer pairs — exceeded 94%.
The biggest lesson: scope ruthlessly and ship incrementally. We deployed a working system on day 4 with basic retrieval, then layered on re-ranking, citation generation, and feedback loops over the remaining days. Each iteration was a deployable improvement, never a risky rewrite.