---
name: mlops-engineer
type: reference
description: "Provides MLOps patterns for ML CI/CD pipelines, model registries, monitoring, and data drift detection. Use when setting up ML infrastructure or when the user mentions MLOps, model deployment, ML pipeline, or model monitoring."
paths: ["/*.py", "/Dockerfile", "/requirements*.txt", "/mlflow*", "**/*.yaml"]
effort: 4
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
user-invocable: true
when_to_use: "When building ML pipelines, experiment tracking systems, or model registries with MLflow or Kubeflow"
---
# MLOps Engineer

## Tool selection matrix
| Need | Tool | When to use |
|------|------|-------------|
| Experiment tracking | MLflow | Open-source, self-hosted |
| Experiment tracking | W&B | Cloud, rich visualization |
| Pipeline orchestration | Kubeflow | Kubernetes-native |
| Pipeline orchestration | Prefect | Python-first, dynamic |
| Data version control | DVC | Git-based datasets & models |
| Feature store | Feast | Open-source, online+offline |
| Model serving | KServe | K8s serverless inference |
| Model serving | SageMaker Endpoints | AWS managed |
| Monitoring / drift | Evidently | Open-source, alerting |
| CI/CD for ML | GitHub Actions + DVC | Lightweight |
## MLflow: experiment tracking + model registry

```python
import mlflow
import mlflow.sklearn

mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("model-training")

with mlflow.start_run():
    # Log params
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)

    # Train (train/evaluate and the data splits are defined elsewhere)
    model = train(X_train, y_train)
    metrics = evaluate(model, X_test, y_test)

    # Log metrics
    mlflow.log_metric("accuracy", metrics["accuracy"])
    mlflow.log_metric("f1", metrics["f1"])

    # Log model + register
    mlflow.sklearn.log_model(
        model, "model",
        registered_model_name="fraud-detector",
    )

# Promote to production via API
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud-detector", version=3, stage="Production"
)
```
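Note that `transition_model_version_stage` and stage-based model URIs are deprecated from MLflow 2.9 onward in favor of registered-model aliases. A minimal sketch of the alias-based equivalent, assuming MLflow >= 2.9 (the `production` alias name is a convention, not an API requirement):

```python
from mlflow import MlflowClient

client = MlflowClient()

# Point the "production" alias at version 3 of the registered model
client.set_registered_model_alias(
    name="fraud-detector", alias="production", version=3
)

# Serving code then loads by alias instead of by stage:
# mlflow.pyfunc.load_model("models:/fraud-detector@production")
```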
## GitHub Actions: ML CI/CD pipeline

```yaml
name: ML Pipeline
on:
  push:
    paths: ["data/**", "src/**", "params.yaml"]
jobs:
  train-and-validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: iterative/setup-dvc@v1
      - name: Pull data
        run: dvc pull
      - name: Run training pipeline
        run: dvc repro
      - name: Validate model metrics
        run: |
          python scripts/check_metrics.py \
            --min-accuracy 0.92 \
            --min-f1 0.88
      - name: Register model if metrics pass
        if: github.ref == 'refs/heads/main'
        run: python scripts/register_model.py
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
```
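The `scripts/check_metrics.py` gate is project-specific. A minimal sketch, assuming the DVC evaluate stage (below) writes `accuracy` and `f1` keys to `metrics/eval.json`:

```python
# scripts/check_metrics.py (hypothetical helper referenced above)
import argparse
import json
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--min-accuracy", type=float, required=True)
parser.add_argument("--min-f1", type=float, required=True)
args = parser.parse_args()

# Assumes the evaluate stage wrote these keys to metrics/eval.json
with open("metrics/eval.json") as f:
    metrics = json.load(f)

failures = []
if metrics["accuracy"] < args.min_accuracy:
    failures.append(f"accuracy {metrics['accuracy']:.3f} < {args.min_accuracy}")
if metrics["f1"] < args.min_f1:
    failures.append(f"f1 {metrics['f1']:.3f} < {args.min_f1}")

if failures:
    print("Metric gate failed: " + "; ".join(failures))
    sys.exit(1)  # non-zero exit fails the CI job
print("Metric gate passed")
```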
## Model serving: FastAPI + model registry

```python
import os

import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI

app = FastAPI()

MODEL_NAME = os.environ["MODEL_NAME"]
MODEL_STAGE = os.environ.get("MODEL_STAGE", "Production")

# Load once on startup (cold start cost paid once)
model = mlflow.pyfunc.load_model(f"models:/{MODEL_NAME}/{MODEL_STAGE}")

@app.post("/predict")
async def predict(features: dict):
    df = pd.DataFrame([features])
    predictions = model.predict(df)
    return {"predictions": predictions.tolist()}

@app.get("/health")
async def health():
    return {"status": "healthy", "model": MODEL_NAME, "stage": MODEL_STAGE}
```
## Data drift monitoring (Evidently)

```python
# Legacy Report API (Evidently < 0.7); newer releases restructured the package
import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

def check_drift(reference_data: pd.DataFrame, production_data: pd.DataFrame) -> dict:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_data, current_data=production_data)
    result = report.as_dict()
    # DataDriftPreset expands to DatasetDriftMetric (index 0) and
    # DataDriftTable (index 1), which holds the per-column results
    drift_detected = result["metrics"][0]["result"]["dataset_drift"]
    drifted_features = [
        f for f, v in result["metrics"][1]["result"]["drift_by_columns"].items()
        if v["drift_detected"]
    ]
    return {"drift_detected": drift_detected, "drifted_features": drifted_features}

# Trigger retraining if drift detected (ref, prod, and the trigger are project-specific)
if check_drift(ref, prod)["drift_detected"]:
    trigger_retraining_pipeline()
```
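`trigger_retraining_pipeline` is left to the project; one common implementation dispatches the CI workflow above via the GitHub Actions API. A sketch, assuming a `retrain.yml` workflow, a token with workflow scope, and placeholder `acme/fraud-model` repo names:

```python
import os

import requests

def trigger_retraining_pipeline() -> None:
    # Fires the (hypothetical) retrain.yml workflow on the main branch
    resp = requests.post(
        "https://api.github.com/repos/acme/fraud-model"
        "/actions/workflows/retrain.yml/dispatches",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"ref": "main"},
    )
    resp.raise_for_status()  # GitHub returns 204 No Content on success
```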
## Critical rules (non-obvious)

- Separate training and serving environments — training deps (torch, cuda) bloat serving images by 10x; use multi-stage Dockerfiles or separate images (see the sketch after this list)
- Pin all dependencies — ML stacks break reproducibility when versions drift; pin Python and all packages, and freeze the full environment with `pip freeze`, not just a hand-written requirements.txt
- Log everything before filtering — never decide what metrics to log during training; log all, filter in dashboards
- Separate model config from code — keep hyperparameters in `params.yaml` (DVC) or `config.yaml`; never hardcode them in training scripts
- Shadow mode before cutover — run the new model version in parallel on shadow traffic and compare outputs before switching production
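A minimal multi-stage sketch for the first rule; `requirements-train.txt`, `requirements-serve.txt`, and `serve.py` are assumed file names, and the serving stage pairs with the FastAPI example above (model pulled from the registry at startup):

```dockerfile
# Training target: heavy deps (torch, cuda) stay out of the serving image
# Build/run with: docker build --target train -t model-train .
FROM python:3.11 AS train
WORKDIR /app
COPY requirements-train.txt .
RUN pip install --no-cache-dir -r requirements-train.txt
COPY src/ src/

# Serving target: slim base, inference-only deps
# Build with: docker build --target serve -t model-serve .
FROM python:3.11-slim AS serve
WORKDIR /app
COPY requirements-serve.txt .
RUN pip install --no-cache-dir -r requirements-serve.txt
COPY serve.py .
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```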
## DVC pipeline (dvc.yaml)

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py
    deps: [src/preprocess.py, data/raw/]
    outs: [data/processed/]
    params: [preprocess]  # section of the default params.yaml
  train:
    cmd: python src/train.py
    deps: [src/train.py, data/processed/]
    outs: [models/model.pkl]
    params: [train]
    metrics: [metrics/train.json]
  evaluate:
    cmd: python src/evaluate.py
    deps: [src/evaluate.py, models/model.pkl, data/processed/]
    metrics: [metrics/eval.json]
```
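The `preprocess` and `train` sections referenced above live in `params.yaml`; a sketch with hypothetical values, reusing the hyperparameters from the MLflow example:

```yaml
# params.yaml (hypothetical values)
preprocess:
  test_size: 0.2
  random_state: 42
train:
  n_estimators: 100
  max_depth: 5
```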