Production RAG System for FinTech

The Challenge

FinTech Co's customer support and compliance teams were drowning in documentation. With thousands of regulatory filings, internal policies, and customer correspondence spread across multiple systems, finding the right information for a customer query or compliance check was a manual, time-consuming process.

Their internal search tool returned too many irrelevant results, and team members had developed their own informal systems of bookmarks and tribal knowledge. When experienced staff left, critical institutional knowledge walked out the door with them.

The company had scoped an 8-month project to build an AI-powered document search system. After three months of planning and vendor evaluation, they had a 40-page requirements document but no working software.

Our Approach

We started with a two-day discovery sprint. Rather than reviewing the requirements document, we sat with the support team and watched them work. We identified the 10 most common query patterns and the documents they typically referenced. This gave us a focused scope for the initial build.

Days 1-2: Data pipeline and chunking. We built an ingestion pipeline that processed their document corpus — PDFs, Word docs, and HTML pages — into semantically meaningful chunks. Financial documents required special handling: tables were extracted with structure intact, cross-references were preserved as metadata, and section hierarchies were maintained.
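To make the chunking idea concrete, here is a minimal sketch of section-aware chunking in plain Python. It is an illustration under stated assumptions, not the pipeline we shipped: it assumes text has already been extracted from the source files, that headings are numbered like "2.1 Fees", and that cross-references are written as "Section 2.1". The names `Chunk` and `chunk_by_section` are hypothetical, and table extraction is not shown.

```python
import re
from dataclasses import dataclass, field

# Assumed conventions (hypothetical): numbered headings such as "2.1 Fees"
# and cross-references written as "Section 2.1". Real filings will vary.
HEADING = re.compile(r"^(\d{1,2}(?:\.\d{1,2})*)\s+([A-Z].*)$")
XREF = re.compile(r"Section\s+(\d+(?:\.\d+)*)")


@dataclass
class Chunk:
    text: str
    section_path: list[str]  # e.g. ["1 Accounts", "1.1 Fees"]
    cross_refs: list[str] = field(default_factory=list)


def chunk_by_section(document: str) -> list[Chunk]:
    """Split text at numbered headings, keeping hierarchy and cross-references."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (depth, heading)
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            refs = sorted(set(XREF.findall(text)))
            chunks.append(Chunk(text, [h for _, h in path], refs))
        body.clear()

    for raw in document.splitlines():
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            flush()
            depth = match.group(1).count(".") + 1
            # Drop headings at the same or a deeper level before pushing.
            while path and path[-1][0] >= depth:
                path.pop()
            path.append((depth, line))
        else:
            body.append(raw)
    flush()
    return chunks
```

Each chunk carries its full heading path and the section numbers it mentions, so a retriever can filter by section or follow a reference to a related chunk. A heading regex this simple will also match body lines that happen to start with a number and a capitalized word, so a real corpus would need stricter rules.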

Days 3-4: Core RAG pipeline. We implemented the retrieval and generation pipeline using LangChain for orchestration and Claude for synthesis. The first working version was deployed internally on day 4.
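The shape of a retrieve-then-generate pipeline can be sketched in a few lines. The version below is deliberately library-free: it does not use LangChain's API, the word-overlap retriever is a placeholder for vector search, and `generate` is a hypothetical callable standing in for the call to Claude. All function names are assumptions made for illustration.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!:;()").lower() for w in text.split()} - {""}


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by word overlap with the query (placeholder for vector search)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    # Keep only chunks that share at least one word with the query.
    return [c for c in ranked[:k] if q & tokenize(c)]


def build_prompt(query: str, context: list[str]) -> str:
    """Number the context passages so the model can refer to them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the numbered context. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


def answer(
    query: str,
    chunks: list[str],
    generate: Callable[[str], str],  # hypothetical stand-in for the model call
    k: int = 3,
) -> str:
    context = retrieve(query, chunks, k)
    if not context:
        return "No relevant documents found."
    return generate(build_prompt(query, context))
```

Passing the model call in as a function keeps the pipeline testable without network access: a test can supply a function that simply echoes the prompt.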

Days 5-8: Iteration based on feedback. With a working system in hand, we ran it against the 10 common query patterns and iterated. We added a re-ranking step to improve relevance, implemented citation generation so users could verify answers, and tuned the chunking strategy based on retrieval quality metrics.
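The re-ranking and citation steps can be sketched as follows. This is a simplified illustration, not the production code: the `jaccard` scorer is a cheap stand-in for whatever relevance model does the second-stage scoring, and `Passage`, `rerank`, and `resolve_citations` are hypothetical names. It assumes the model was prompted to cite passages with bracketed numbers such as [1].

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Passage:
    doc_id: str  # hypothetical identifier, e.g. a file name
    text: str


def _words(text: str) -> set[str]:
    return {w.strip(".,?!:;()").lower() for w in text.split()} - {""}


def jaccard(query: str, text: str) -> float:
    """Placeholder relevance score: word-set overlap divided by union."""
    q, t = _words(query), _words(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def rerank(
    query: str,
    candidates: list[Passage],
    score: Callable[[str, str], float] = jaccard,
    top_n: int = 3,
) -> list[Passage]:
    """Re-score first-stage candidates and keep the best top_n."""
    ranked = sorted(candidates, key=lambda p: score(query, p.text), reverse=True)
    return ranked[:top_n]


CITATION = re.compile(r"\[(\d+)\]")


def resolve_citations(answer: str, passages: list[Passage]) -> list[str]:
    """Map [n] markers in an answer to source ids, skipping invalid markers."""
    sources: list[str] = []
    for marker in CITATION.findall(answer):
        index = int(marker) - 1
        if 0 <= index < len(passages) and passages[index].doc_id not in sources:
            sources.append(passages[index].doc_id)
    return sources
```

Resolving markers against the passages that were actually sent to the model means a citation pointing at a passage that was never supplied is dropped rather than shown to the user.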

Days 9-11: Production hardening. We added monitoring, error handling, rate limiting, and a feedback mechanism for users to flag incorrect answers. The system was deployed to production on day 11.
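As one example of the hardening work, rate limiting is often implemented with a token bucket. The sketch below is a generic, single-process version and an assumption about how such a limiter could look; the text above does not specify which algorithm or library was used, and `TokenBucket` is a hypothetical name.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should back off."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The injectable `clock` makes the limiter deterministic in tests. This version is not thread-safe and keeps its state in memory, so a multi-worker deployment would need a lock or a shared store.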

Results

The impact was immediate and measurable:

60% reduction in average time to resolve document-related queries
94% accuracy on a test set of 200 question-answer pairs validated by domain experts
4 months ahead of the original project timeline
Zero critical incidents in the first 30 days of production operation
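For readers who want to reproduce this kind of accuracy figure, a minimal evaluation harness looks like the sketch below. On a 200-pair test set, 94% corresponds to 188 answers judged correct. The grading rule here (`normalized_match`) is an assumption chosen for simplicity; the pairs in this project were validated by domain experts, and free-text answers usually need a more forgiving comparison than string equality.

```python
from typing import Callable, Sequence


def normalized_match(actual: str, expected: str) -> bool:
    # Assumed grading rule: case- and whitespace-insensitive equality.
    return " ".join(actual.lower().split()) == " ".join(expected.lower().split())


def accuracy(
    test_set: Sequence[tuple[str, str]],  # (question, expected answer) pairs
    answer_fn: Callable[[str], str],      # the system under test
    is_correct: Callable[[str, str], bool] = normalized_match,
) -> float:
    """Fraction of questions whose answer is judged correct."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(
        1 for question, expected in test_set
        if is_correct(answer_fn(question), expected)
    )
    return correct / len(test_set)
```

Running the same harness after every change to chunking or retrieval gives a quick signal on whether the change helped or hurt.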

The system now processes over 500 queries per day and has become the primary tool for both customer support and compliance teams.

Key Takeaways

The biggest factor in our speed wasn't technical — it was focus. By narrowing the initial scope to the 10 most common query patterns, we delivered a system that handled 80% of real-world usage on day one. The remaining edge cases were addressed incrementally over the following weeks.

"We spent three months planning and got nowhere. Kimaya shipped a working system in less than two weeks that our team actually uses every day." — VP of Engineering, FinTech Co
