AI Engineering in Production
From RAG pipelines and vector databases to MCP servers and agent security — the operational patterns for shipping LLM-backed systems that survive contact with real traffic.
Articles in this series
Building Production RAG Pipelines: Chunking, Embeddings, and Retrieval at Scale
Build RAG systems that work in production: chunking strategies, embedding selection, pgvector ops, and retrieval quality evaluation.
Vector Databases Compared: pgvector vs Pinecone vs Weaviate
Compare pgvector, Pinecone, Weaviate, Qdrant, Milvus, and Chroma on performance, cost, and operational fit with real code and benchmarks.
LLM API Integration Patterns for Backend Engineers
Production LLM API patterns: streaming, function calling, retries, token budgets, cost optimization, and observability for backend engineers.
Spring AI in Production: RAG Pipelines, Reliability, and Observability for Java Backends
Spring AI 1.1 deep-dive: production RAG pipeline with PII scrubbing, circuit breakers, Micrometer observability, and answer evaluation.
Building an MCP Server in Go with Code Mode: From 1.17M Tokens to 1,000
Expose 2,500 API endpoints through a single MCP server without blowing the context window. The Code Mode pattern uses search + execute to cut token cost by more than 1,000x.
Securing AI Agent Infrastructure: MCP Servers, Tool Calls, and the Attack Surface You're Not Watching
AI agents calling tools via MCP create new attack surfaces: prompt injection through tool responses, credential leakage, and unauthorized execution.