Machine learning models are only as good as the features they consume. In production, serving features at inference time with low latency and high consistency is one of the hardest engineering challenges in ML infrastructure.
A modern feature store has two planes: the offline store for training data and the online store for serving. The offline store typically uses a data lake (Parquet files on S3). The online store requires a low-latency key-value store like Redis or DynamoDB.
The critical challenge is keeping these two stores consistent: features computed for training must match exactly what is served at inference time. This is the training-serving skew problem, and it is one of the most common sources of ML model degradation in production.
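To make the consistency requirement concrete, here is a minimal sketch of how one set of feature definitions can back both planes with Feast. The repo path, entity rows, and timestamps are illustrative assumptions; the feature names match the `user_features` view defined later in this post.

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Assumes a Feast repo in the current directory with the user_features view
# (defined below) already registered via `feast apply`.
store = FeatureStore(repo_path=".")

# Training: point-in-time-correct join against the offline store (S3/Parquet).
entity_df = pd.DataFrame({
    "user_id": ["u_123", "u_456"],                                   # illustrative IDs
    "event_timestamp": [datetime(2024, 1, 15), datetime(2024, 1, 16)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:avg_order_value_7d", "user_features:total_orders_30d"],
).to_df()

# Serving: the same feature definitions, read from the online store (Redis).
online_features = store.get_online_features(
    features=["user_features:avg_order_value_7d", "user_features:total_orders_30d"],
    entity_rows=[{"user_id": "u_123"}],
).to_dict()
```

Because the training path does a point-in-time join and the serving path does a key-value lookup against the same definitions, the two planes stay consistent by construction.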
```mermaid
graph TD
    Sources["Raw Data Sources"]
    subgraph FP ["Feature Pipeline"]
        Spark["Spark / Flink"]
    end
    subgraph Offline ["Offline Store (Training)"]
        S3[("S3 / Data Lake")]
    end
    subgraph Online ["Online Store (Inference)"]
        Redis[("Redis Cluster")]
    end
    subgraph Serving ["Serving Layer"]
        API["Feature Service"]
        Cache["L1 In-Memory Cache"]
    end
    Sources --> Spark
    Spark -->|"Batch Write"| S3
    Spark -->|"Stream Write"| Redis
    API --> Cache
    Cache -.->|"Miss"| Redis
    style S3 fill:#e0f7fa,stroke:#006064
    style Redis fill:#ffebee,stroke:#b71c1c
```
Features flow through a pipeline that reads raw events, applies transformations, and writes results to both stores:
```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float32, Int64, String

user_entity = Entity(
    name="user_id",
    join_keys=["user_id"],
    value_type=ValueType.STRING,
)

# Batch source backing the feature view; the path is illustrative --
# point it at the Parquet output of your feature pipeline.
user_features_source = FileSource(
    path="s3://feature-lake/user_features.parquet",
    timestamp_field="event_timestamp",
)

user_features = FeatureView(
    name="user_features",
    entities=[user_entity],
    schema=[
        Field(name="avg_order_value_7d", dtype=Float32),
        Field(name="total_orders_30d", dtype=Int64),
        Field(name="preferred_category", dtype=String),
        Field(name="account_age_days", dtype=Int64),
        Field(name="fraud_score", dtype=Float32),
    ],
    online=True,
    ttl=timedelta(hours=24),
    source=user_features_source,
)
```

The online store must serve features with sub-5ms latency. We use Redis Cluster with a two-tier caching approach: an L1 in-process cache (using Ristretto) and an L2 Redis cache.
```go
package featureserver

import (
	"context"
	"fmt"

	"github.com/dgraph-io/ristretto"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/redis/go-redis/v9"
)

// FeatureServer serves online features from a two-tier cache:
// an in-process Ristretto cache (L1) backed by Redis Cluster (L2).
// filterFeatures and deserialize are helpers defined elsewhere in the service.
type FeatureServer struct {
	redis     *redis.ClusterClient
	cache     *ristretto.Cache
	metrics   *prometheus.Registry
	cacheHits prometheus.Counter // registered with metrics at construction time
}

// GetFeatures returns the requested features for a single entity key,
// checking the L1 cache first and falling back to Redis on a miss.
func (fs *FeatureServer) GetFeatures(
	ctx context.Context,
	entityKey string,
	featureNames []string,
) (map[string]interface{}, error) {
	// L1: an in-process hit avoids the network round trip entirely.
	if cached, ok := fs.cache.Get(entityKey); ok {
		fs.cacheHits.Inc()
		return filterFeatures(cached.(map[string]interface{}), featureNames), nil
	}

	// L2: batch all lookups for this entity into a single Redis pipeline.
	pipe := fs.redis.Pipeline()
	for _, name := range featureNames {
		key := fmt.Sprintf("feat:%s:%s", entityKey, name)
		pipe.Get(ctx, key)
	}
	results, err := pipe.Exec(ctx)
	if err != nil && err != redis.Nil {
		return nil, fmt.Errorf("redis pipeline: %w", err)
	}

	// Missing keys (redis.Nil) are simply omitted from the response.
	features := make(map[string]interface{}, len(featureNames))
	for i, result := range results {
		if val, err := result.(*redis.StringCmd).Result(); err == nil {
			features[featureNames[i]] = deserialize(val)
		}
	}

	// Populate L1 with an approximate cost so Ristretto can enforce its budget.
	fs.cache.Set(entityKey, features, int64(len(features)*64))
	return features, nil
}
```

Feature stores require specialized monitoring: feature freshness (the lag between event time and availability in the online store), feature coverage (the percentage of requests that find all requested features), and feature drift (distribution shift relative to the training data).
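As a sketch of what the drift check can look like, here is a population stability index (PSI) comparison between a feature's training distribution and its recent serving distribution. The function, bucket count, and the threshold rule of thumb are illustrative assumptions rather than part of our pipeline; freshness and coverage, by contrast, are simple lag and hit-rate gauges.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a feature's training (expected) and serving (actual) values.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 drifted."""
    # Bucket edges come from the training distribution's quantiles.
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    exp_pct = np.histogram(expected, bins=cuts)[0] / len(expected)
    act_pct = np.histogram(actual, bins=cuts)[0] / len(actual)
    # Clip empty buckets to avoid division by zero and log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Hypothetical usage: compare the values logged at training time with a
# recent window of values served from the online store.
# psi = population_stability_index(training_values, serving_values)
```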
Our production feature store serves 500K lookups per second with p50 latency of 1.2ms and p99 of 4.8ms. The L1 cache hit rate averages 73%, which is critical for achieving these latency numbers.
Building a production feature store requires balancing latency, consistency, and operational complexity. The two-tier caching approach with careful monitoring gives you the performance needed for real-time ML inference while maintaining the consistency guarantees that keep your models accurate.