Developer Cuts RAG System Costs by 99.7%, Replacing $360/Month OpenSearch with S3 and In-Memory Search
Key Takeaways
- A production RAG system was successfully migrated from AWS OpenSearch ($360/month) to S3 + in-memory search ($1.12/month), achieving a 99.7% cost reduction while maintaining sub-200ms performance
- The system uses a curated knowledge base covering 25+ years of engineering experience, ensuring all AI responses are grounded in verified content rather than generic LLM outputs
- The migration demonstrates that managed vector databases like OpenSearch may be over-engineered for applications with hundreds rather than millions of daily queries
Summary
Software engineer Stephanie Spanjian has published a detailed case study showing how she reduced the infrastructure costs of her personal RAG (Retrieval-Augmented Generation) system from $360 per month to just $1.12 per month — a 99.7% cost reduction. The original architecture used AWS OpenSearch for vector storage, OpenAI's text-embedding-ada-002 for embeddings, and GPT-4 for generation, following enterprise-grade best practices with clean separation between ingestion, retrieval, and generation layers.
The breakthrough came from architectural simplification driven by cost constraints. Spanjian replaced the managed OpenSearch cluster with S3 for vector storage and implemented in-memory cosine similarity search, while maintaining sub-200ms query performance. The system powers an AI agent on her portfolio website that answers questions about her 25+ years of engineering experience using a curated knowledge base, with every response grounded in verified content rather than general LLM training data.
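The article doesn't publish Spanjian's code, but the pattern it describes (vectors loaded from S3 into memory, brute-force cosine-similarity scan) can be sketched as follows. The function names (`cosine_similarity`, `top_k`) and the corpus layout are illustrative assumptions, not details from the case study; in production the embeddings would be fetched from S3 once at startup (e.g. via boto3) and held in memory.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=3):
    # corpus: list of (doc_id, embedding) pairs kept entirely in memory,
    # replacing a managed vector store with a linear scan.
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For a knowledge base of a few thousand chunks, a linear scan like this completes in single-digit milliseconds on commodity hardware, which is consistent with the sub-200ms end-to-end latency the article reports.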
The migration demonstrates that production-quality RAG systems don't always require expensive managed services. By carefully evaluating scale requirements and optimizing for actual usage patterns rather than theoretical enterprise needs, Spanjian achieved dramatic cost savings while maintaining system reliability and performance. The case study provides a detailed breakdown of architectural decisions, including the move from external embedding APIs to more cost-effective alternatives and the tradeoffs between managed infrastructure and custom implementations.
This work arrives as AI developers increasingly grapple with the economics of running LLM-powered applications at scale. While the original OpenSearch-based architecture was technically sound, it represented over-engineering for a personal project handling hundreds rather than millions of daily queries. The successful migration offers a roadmap for startups and individual developers seeking to build sustainable AI applications without enterprise-scale budgets.
- Key architectural changes included moving from OpenAI's text-embedding-ada-002 to more cost-effective alternatives and replacing cloud-managed search infrastructure with custom implementations
- The case study highlights the importance of matching infrastructure complexity to actual usage patterns rather than following enterprise best practices by default
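As a sanity check on the headline figure, the quoted costs do work out to the stated reduction:

```python
# Cost figures as reported in the case study.
old_monthly = 360.00  # AWS OpenSearch-based stack
new_monthly = 1.12    # S3 + in-memory search
reduction = (old_monthly - new_monthly) / old_monthly
print(f"{reduction:.1%}")  # prints 99.7%
```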
Editorial Opinion
This case study is a masterclass in engineering pragmatism over architectural dogma. While the tech industry often promotes 'enterprise-grade' solutions as inherently superior, Spanjian proves that thoughtful optimization for actual requirements can yield dramatically better outcomes. Her 99.7% cost reduction isn't just about saving money — it's about building sustainable AI applications that don't require VC funding to keep the lights on. As more developers deploy RAG systems, this kind of practical, cost-conscious architecture will become increasingly valuable, especially for startups and independent builders who can't afford $360/month vector databases for every project.