Machine learning models are only as good as the features they consume. In production, serving features at inference time with low latency and high consistency is one of the hardest engineering challenges in ML infrastructure.
A modern feature store has two planes: the offline store for training data and the online store for serving. The offline store typically uses a data lake (Parquet files on S3). The online store requires a low-latency key-value store like Redis or DynamoDB.
The critical challenge is keeping these two stores consistent: features computed for training must match exactly what is served at inference time. This is the training-serving skew problem, and it is one of the most common sources of ML model degradation in production.
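To make the consistency requirement concrete, here is a minimal sketch of how one set of feature definitions can back both planes with Feast. The repo path, entity rows, and timestamps are illustrative assumptions; the feature names match the `user_features` view defined later in this post.

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Assumes a Feast repo in the current directory with the user_features view
# (defined below) already registered via `feast apply`.
store = FeatureStore(repo_path=".")

# Training: point-in-time-correct join against the offline store (S3/Parquet).
entity_df = pd.DataFrame({
    "user_id": ["u_123", "u_456"],                                   # illustrative IDs
    "event_timestamp": [datetime(2024, 1, 15), datetime(2024, 1, 16)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:avg_order_value_7d", "user_features:total_orders_30d"],
).to_df()

# Serving: the same feature definitions, read from the online store (Redis).
online_features = store.get_online_features(
    features=["user_features:avg_order_value_7d", "user_features:total_orders_30d"],
    entity_rows=[{"user_id": "u_123"}],
).to_dict()
```

Because the training path does a point-in-time join and the serving path does a key-value lookup against the same definitions, the two planes stay consistent by construction.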
```mermaid
graph TD
    Sources["Raw Data Sources"]
    subgraph FP ["Feature Pipeline"]
        Spark["Spark / Flink"]
    end
    subgraph Offline ["Offline Store (Training)"]
        S3[("S3 / Data Lake")]
    end
    subgraph Online ["Online Store (Inference)"]
        Redis[("Redis Cluster")]
    end
    subgraph Serving ["Serving Layer"]
        API["Feature Service"]
        Cache["L1 In-Memory Cache"]
    end
    Sources --> Spark
    Spark -->|"Batch Write"| S3
    Spark -->|"Stream Write"| Redis
    API --> Cache
    Cache -.->|"Miss"| Redis
    style S3 fill:#e0f7fa,stroke:#006064
    style Redis fill:#ffebee,stroke:#b71c1c
```
Features flow through a pipeline that reads raw events, applies transformations, and writes results to both stores:
```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64, String

user_entity = Entity(
    name="user_id",
    join_keys=["user_id"],
    value_type=ValueType.STRING,
)

# Batch source backing the feature view; the path is illustrative --
# point it at the Parquet output of your feature pipeline.
user_features_source = FileSource(
    path="s3://feature-lake/user_features.parquet",
    timestamp_field="event_timestamp",
)

user_features = FeatureView(
    name="user_features",
    entities=[user_entity],
    schema=[
        Field(name="avg_order_value_7d", dtype=Float32),
        Field(name="total_orders_30d", dtype=Int64),
        Field(name="preferred_category", dtype=String),
        Field(name="account_age_days", dtype=Int64),
        Field(name="fraud_score", dtype=Float32),
    ],
    online=True,
    ttl=timedelta(hours=24),
    source=user_features_source,
)
```

The online store must serve features with sub-5ms latency. We use Redis Cluster with a two-tier caching approach: an L1 in-process cache (using Ristretto) and an L2 Redis cache.
```go
package featureserver

import (
	"context"
	"fmt"

	"github.com/dgraph-io/ristretto"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/redis/go-redis/v9"
)

// FeatureServer serves online features from a two-tier cache:
// an in-process Ristretto cache (L1) backed by Redis Cluster (L2).
// filterFeatures and deserialize are helpers defined elsewhere in the service.
type FeatureServer struct {
	redis     *redis.ClusterClient
	cache     *ristretto.Cache
	metrics   *prometheus.Registry
	cacheHits prometheus.Counter // registered with metrics at construction time
}

// GetFeatures returns the requested features for a single entity key,
// checking the L1 cache first and falling back to Redis on a miss.
func (fs *FeatureServer) GetFeatures(
	ctx context.Context,
	entityKey string,
	featureNames []string,
) (map[string]interface{}, error) {
	// L1: an in-process hit avoids the network round trip entirely.
	if cached, ok := fs.cache.Get(entityKey); ok {
		fs.cacheHits.Inc()
		return filterFeatures(cached.(map[string]interface{}), featureNames), nil
	}

	// L2: batch all lookups for this entity into a single Redis pipeline.
	pipe := fs.redis.Pipeline()
	for _, name := range featureNames {
		key := fmt.Sprintf("feat:%s:%s", entityKey, name)
		pipe.Get(ctx, key)
	}
	results, err := pipe.Exec(ctx)
	if err != nil && err != redis.Nil {
		return nil, fmt.Errorf("redis pipeline: %w", err)
	}

	// Missing keys (redis.Nil) are simply omitted from the response.
	features := make(map[string]interface{}, len(featureNames))
	for i, result := range results {
		if val, err := result.(*redis.StringCmd).Result(); err == nil {
			features[featureNames[i]] = deserialize(val)
		}
	}

	// Populate L1 with an approximate cost so Ristretto can enforce its budget.
	fs.cache.Set(entityKey, features, int64(len(features)*64))
	return features, nil
}
```

Feature stores require specialized monitoring: feature freshness (the lag between event time and availability in the online store), feature coverage (the percentage of requests that find all requested features), and feature drift (distribution shift relative to the training data).
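As a sketch of what the drift check can look like, here is a population stability index (PSI) comparison between a feature's training distribution and its recent serving distribution. The function, bucket count, and the threshold rule of thumb are illustrative assumptions rather than part of our pipeline; freshness and coverage, by contrast, are simple lag and hit-rate gauges.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a feature's training (expected) and serving (actual) values.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 drifted."""
    # Bucket edges come from the training distribution's quantiles.
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    exp_pct = np.histogram(expected, bins=cuts)[0] / len(expected)
    act_pct = np.histogram(actual, bins=cuts)[0] / len(actual)
    # Clip empty buckets to avoid division by zero and log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Hypothetical usage: compare the values logged at training time with a
# recent window of values served from the online store.
# psi = population_stability_index(training_values, serving_values)
```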
Our production feature store serves 500K lookups per second with p50 latency of 1.2ms and p99 of 4.8ms. The L1 cache hit rate averages 73%, which is critical for achieving these latency numbers.
Building a production feature store requires balancing latency, consistency, and operational complexity. The two-tier caching approach with careful monitoring gives you the performance needed for real-time ML inference while maintaining the consistency guarantees that keep your models accurate.