Building an MCP Server in Go with Code Mode: From 1.17M Tokens to 1,000
Serve 2,500 API endpoints from one MCP server without blowing context windows. The Code Mode pattern uses search + execute to cut token cost by roughly 1,000x, from about 1.17M tokens to about 1,000.
Building with AI as a backend engineer — LLM integrations, RAG pipelines, embedding stores, vector databases, AI APIs, and production inference infrastructure.
From RAG pipelines and vector databases to MCP servers and agent security — the operational patterns for shipping LLM-backed systems that survive contact with real traffic.
AI agents calling tools via MCP create new attack surfaces: prompt injection through tool responses, credential leakage, and unauthorized execution.
Compare pgvector, Pinecone, Weaviate, Qdrant, Milvus, and Chroma on performance, cost, and operational fit with real code and benchmarks.
Production LLM API patterns: streaming, function calling, retries, token budgets, cost optimization, and observability for backend engineers.
Build RAG systems that work in production: chunking strategies, embedding selection, pgvector ops, and retrieval quality evaluation.
Spring AI 1.1 deep-dive: production RAG pipeline with PII scrubbing, circuit breakers, Micrometer observability, and answer evaluation.