name: langchain-multi-env-setup
description: |
  Build reliable dev / staging / prod isolation for LangChain 1.0 services:
  Pydantic Settings + SecretStr, cloud Secret Manager in prod, per-env
  prompt and model version pinning, env-specific checkpointer and observability.
  Use when graduating from .env-in-dev to real prod infra, or debugging a
  config that loaded the wrong values in the wrong env.
  Trigger with "langchain multi-env", "langchain pydantic settings",
  "langchain secret manager", "langchain env config", "langchain prod setup".
allowed-tools: Read, Write, Edit, Bash(python:), Bash(gcloud:)
version: 2.0.0
license: MIT
author: Jeremy Longshore jeremy@intentsolutions.io
tags:
- saas
- langchain
- langgraph
- python
- langchain-1.0
- config
- pydantic
- multi-env
- secrets
compatibility: Designed for Claude Code, also compatible with Codex
LangChain Multi-Env Setup (Python)
Overview
A team ships a LangChain 1.0 service to staging with python-dotenv loading
.env.staging into os.environ. A security audit then runs
docker exec STAGING-POD env, which prints ANTHROPIC_API_KEY=sk-ant-api03-... in
plain text. Anyone with kubectl exec, any sidecar, any core dump, and any
error tracker that auto-captures the process env sees the key. This is pain
P37: secrets loaded from .env in production containers leak via env.
A second failure compounds the first. A developer runs the staging deploy from a shell
where LANGCHAIN_ENV=production was set hours earlier. The loader picks
the prod .env, staging answers with a prompt commit tuned only for the
prod model tier, and latency doubles. Two root causes: no type-safe env gate,
and no startup validation that would have caught the mismatched model id.
One refactor fixes both:
# BAD: dotenv populates os.environ; any process with container access sees it
import os
from dotenv import load_dotenv

load_dotenv(".env.production")
api_key = os.environ["ANTHROPIC_API_KEY"]  # P37: leaks via `docker exec env`

# GOOD: SecretStr in a validated Settings object, pulled from Secret Manager
from typing import Literal
from pydantic import SecretStr
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    env: Literal["dev", "staging", "prod"]
    anthropic_api_key: SecretStr

settings = build_settings()  # defined in Step 2; pulls from GCP Secret Manager in prod
api_key = settings.anthropic_api_key.get_secret_value()
# repr(settings) shows anthropic_api_key=SecretStr('**********'), so it is safe to log
This skill owns the per-env config plumbing — Settings skeleton,
Secret Manager integration, per-env pinning, startup smoke test. It does
not own the full secrets lifecycle (rotation, revocation, scope) —
that belongs to langchain-security-basics.
Pin: langchain-core 1.0.x, langchain-anthropic 1.0.x, pydantic >= 2.5,
pydantic-settings >= 2.1. Pain anchors: P37 (primary), P20
(checkpointer schema — cross-ref langchain-langgraph-checkpointing).
Two numbers: smoke test < 10 seconds; env-var count ~15-30 (more
than 30 means Settings is absorbing feature flags and should split).
Prerequisites
- Python 3.10+ (3.11+ recommended for Literal and StrEnum ergonomics)
- langchain-core >= 1.0, < 2.0
- pydantic >= 2.5, pydantic-settings >= 2.1
- One secret backend: GCP Secret Manager (google-cloud-secret-manager), AWS Secrets Manager (boto3), or HashiCorp Vault (hvac)
- Completed langchain-sdk-patterns: the Settings object is injected into the chain factories from that skill
Instructions
Run these six steps in order — each adds one invariant the next step depends on:
- Define a Settings class with SecretStr keys, Literal env, and fail-fast validation.
- Add a per-env loader: file in dev, env vars in staging, Secret Manager in prod.
- Use the cloud Secret Manager client to pull keys into memory only.
- Pin model_id, prompt_commit_hash, and vector_index_name per env.
- Configure the checkpointer per env: memory in dev, Postgres elsewhere.
- Run a startup smoke test under 10 seconds before the HTTP server binds.
Step 1 — Create a Settings class with SecretStr and fail-fast validation
from typing import Literal
from pydantic import SecretStr, HttpUrl, PostgresDsn, Field, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=None,  # see Step 2: the loader picks the file
        env_file_encoding="utf-8",
        case_sensitive=False,
        extra="forbid",  # reject unknown dotenv keys and kwargs (typo detection)
    )

    # --- env switch (drives everything else) ---
    env: Literal["dev", "staging", "prod"] = Field(..., alias="LANGCHAIN_ENV")

    # --- secrets (always SecretStr, never str) ---
    anthropic_api_key: SecretStr = Field(..., alias="ANTHROPIC_API_KEY")
    openai_api_key: SecretStr = Field(..., alias="OPENAI_API_KEY")
    langsmith_api_key: SecretStr = Field(..., alias="LANGSMITH_API_KEY")

    # --- per-env pinning (see Step 4) ---
    model_id: str = Field(..., alias="LANGCHAIN_MODEL_ID")
    prompt_commit_hash: str = Field(..., alias="LANGCHAIN_PROMPT_COMMIT")
    vector_index_name: str = Field(..., alias="LANGCHAIN_VECTOR_INDEX")

    # --- endpoints (validated URLs: typo caught at startup) ---
    # The checkpointer is Postgres (Step 5), so it needs a Postgres DSN;
    # HttpUrl only accepts http/https schemes and would reject postgresql://
    checkpointer_url: PostgresDsn | None = Field(None, alias="LANGCHAIN_CHECKPOINTER_URL")
    otel_endpoint: HttpUrl = Field(..., alias="OTEL_EXPORTER_OTLP_ENDPOINT")

    # --- budget guards (per-env) ---
    max_cost_usd_per_day: float = Field(10.0, alias="LANGCHAIN_DAILY_BUDGET_USD")
    max_rpm: int = Field(60, alias="LANGCHAIN_MAX_RPM")
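SecretStr masks the field in repr(settings) as SecretStr('**********'), so a routine
logger.info(settings) cannot leak the key. The only way to read plaintext
is .get_secret_value(), which sticks out like a sore thumb in review and is easy to grep for.
extra="forbid" rejects unknown keys in the dotenv file and in constructor kwargs
(LANGCHIN_MODEL_ID) when Settings is instantiated. A misspelled variable in
os.environ is not rejected as an extra, but the required field it was meant to
fill then fails validation as missing.
HttpUrl rejects a value with a non-HTTP scheme, such as otel:4318, before the exporter wastes 60s on DNS.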
See Settings Skeleton for the full class.
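As a minimal sketch of the two guarantees described in this step (masked secrets and a closed set of env names), the snippet below uses a plain pydantic BaseModel as a stand-in for Settings so that it runs without env vars or files. DemoSettings, env_is_rejected, and the fake key are hypothetical names introduced only for illustration; pydantic >= 2 is assumed.

```python
from typing import Literal

from pydantic import BaseModel, SecretStr, ValidationError


class DemoSettings(BaseModel):
    """Hypothetical stand-in for Settings; same field types, no env lookup."""

    env: Literal["dev", "staging", "prod"]
    anthropic_api_key: SecretStr


demo = DemoSettings(env="staging", anthropic_api_key="sk-ant-fake-key")

# repr() masks the secret, so logging the whole object does not leak it.
masked = repr(demo)

# Plaintext is only reachable through an explicit, greppable call.
plaintext = demo.anthropic_api_key.get_secret_value()


def env_is_rejected(value: str) -> bool:
    """Return True if `value` fails the Literal gate on `env`."""
    try:
        DemoSettings(env=value, anthropic_api_key="sk-ant-fake-key")
    except ValidationError:
        return True
    return False
```

With this shape, a stale LANGCHAIN_ENV=production fails validation at construction time instead of silently selecting a config, since only "dev", "staging", and "prod" pass the Literal check.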
Step 2 — Per-env config loading (file OR Secret Manager, never both)
import os
from pathlib import Path

def build_settings() -> Settings:
    # No default: a missing LANGCHAIN_ENV must fail, not fall back to an env
    env = os.environ.get("LANGCHAIN_ENV")
    if env == "dev":
        # Local dev: .env.dev file, values kept in 1Password, not in git
        return Settings(_env_file=Path(".env.dev"))
    if env == "staging":
        # CI / staging: env vars injected by the orchestrator
        # (GitHub Actions secrets, k8s envFrom: secretRef, etc.)
        return Settings()  # reads os.environ directly
    if env == "prod":
        # Prod: pull from Secret Manager into memory ONLY
        values = pull_from_secret_manager()
        return Settings(**values)
    raise ValueError(f"unknown or missing LANGCHAIN_ENV: {env!r}")
Three loaders, one class. Dev touches a file on disk. Staging inherits env
vars from the orchestrator — envFrom: secretRef is readable via
docker exec env, but the blast radius is bounded and rotation is weekly.
Prod is the P37 fix: pull_from_secret_manager() builds a dict and passes
kwargs to Settings(...). Values land in the instance attribute and
never touch os.environ. A subprocess will not inherit them.
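The inheritance claim can be checked with the standard library alone. The sketch below compares a value exported into os.environ (the dotenv pattern) with a value held only in a Python dict (the pattern the prod loader relies on). child_sees and the DEMO_* names are hypothetical and exist only for this demonstration.

```python
import os
import subprocess
import sys


def child_sees(name: str) -> bool:
    """Return True if a freshly spawned Python child finds `name` in its env."""
    code = f"import os, sys; sys.exit(0 if {name!r} in os.environ else 1)"
    result = subprocess.run([sys.executable, "-c", code], check=False)
    return result.returncode == 0


# dotenv-style: the value is exported, so every child process inherits it.
os.environ["DEMO_SECRET_EXPORTED"] = "fake-value"

# Secret-Manager-style: the value lives only in this process's memory.
in_memory_values = {"DEMO_SECRET_IN_MEMORY": "fake-value"}

exported_visible = child_sees("DEMO_SECRET_EXPORTED")
in_memory_visible = child_sees("DEMO_SECRET_IN_MEMORY")
```

This only demonstrates process-environment inheritance. A value held in memory can still be read by anything able to inspect the process's memory, so it narrows the exposure rather than removing it.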
Step 3 — Secret Manager pull (GCP example; AWS / Vault in reference)
import os
from google.cloud import secretmanager

def pull_from_secret_manager() -> dict[str, str]:
    client = secretmanager.SecretManagerServiceClient()
    project = os.environ["GCP_PROJECT_ID"]
    secret_names = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "LANGSMITH_API_KEY"]
    out: dict[str, str] = {}
    for name in secret_names:
        resource = f"projects/{project}/secrets/{name}/versions/latest"
        response = client.access_secret_version(request={"name": resource})
        out[name] = response.payload.data.decode("utf-8")
    # Non-secret passthrough (model id, prompt hash, endpoints)
    for key in ["LANGCHAIN_ENV", "LANGCHAIN_MODEL_ID", "LANGCHAIN_PROMPT_COMMIT",
                "LANGCHAIN_VECTOR_INDEX", "LANGCHAIN_CHECKPOINTER_URL",
                "OTEL_EXPORTER_OTLP_ENDPOINT"]:
        if key in os.environ:
            out[key] = os.environ[key]
    return out
No os.environ[k] = v line. The dict goes straight into
Settings(**values). Workload-identity IAM handles auth; no static key on
disk. For AWS / Vault see Secret Manager Integration.
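One way to keep the pull logic testable without cloud credentials is to separate "which secrets to fetch" from "how to fetch one". The sketch below is an assumption about how that split could look, not part of the skill: pull_secrets and its fetch parameter are hypothetical, and the caller is assumed to translate the backend's not-found error into a KeyError.

```python
from typing import Callable, Iterable

SECRET_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "LANGSMITH_API_KEY")


def pull_secrets(
    fetch: Callable[[str], bytes],
    names: Iterable[str] = SECRET_NAMES,
) -> dict[str, str]:
    """Fetch each named secret into an in-memory dict.

    `fetch` takes a secret name and returns its raw payload bytes, raising
    KeyError (or another LookupError) when the secret does not exist.
    """
    out: dict[str, str] = {}
    missing: list[str] = []
    for name in names:
        try:
            out[name] = fetch(name).decode("utf-8")
        except LookupError:
            missing.append(name)
    if missing:
        # Report names only; secret values never appear in the message.
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return out


# Usage with an in-memory fake backend (stands in for a real client).
fake_store = {name: f"fake-{name.lower()}".encode() for name in SECRET_NAMES}
values = pull_secrets(fake_store.__getitem__)
```

In prod, fetch would wrap the access_secret_version call shown in this step. Collecting all missing names before raising means a misconfigured project reports every gap in one failed startup rather than one per deploy attempt.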
Step 4 — Per-env model and prompt pinning
Dev, staging, and prod run different model ids and different prompt
commit hashes. Pinning happens at env-var level so app code is env-agnostic
(see the Env Matrix below for values). One function reads
settings.prompt_commit_hash and pulls from LangSmith
(cross-ref langchain-prompt-engineering):
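from langchain_core.prompts import ChatPromptTemplate
from langsmith import Client

def get_prompt(settings: Settings) -> ChatPromptTemplate:
    # Build the client from the validated Settings, not a module-level global
    ls = Client(api_key=settings.langsmith_api_key.get_secret_value())
    return ls.pull_prompt(f"triage-prompt:{settings.prompt_commit_hash}")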
Prevents: staging loading a prod prompt commit. Pinning per env makes promotion explicit — dev → staging → prod moves one hash at a time. See Per-Env Pinning.
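The "one hash at a time" promotion rule could be enforced by a small CI check along the following lines. This is a sketch under assumptions: check_promotion, ENV_ORDER, and the pins mapping (env name to currently pinned prompt commit) are hypothetical, and the hashes are made up.

```python
ENV_ORDER = ("dev", "staging", "prod")


def check_promotion(pins: dict[str, str], target: str, new_hash: str) -> None:
    """Raise ValueError unless `new_hash` already runs in the env before `target`."""
    if target not in ENV_ORDER:
        raise ValueError(f"unknown env: {target!r}")
    idx = ENV_ORDER.index(target)
    if idx == 0:
        return  # dev accepts any work-in-progress commit
    upstream = ENV_ORDER[idx - 1]
    if pins.get(upstream) != new_hash:
        raise ValueError(
            f"{new_hash} must be pinned in {upstream} before promoting to {target}"
        )


# Current pins per env (illustrative values only).
pins = {"dev": "abc123", "staging": "9f8e7d", "prod": "9f8e7d"}

check_promotion(pins, "staging", "abc123")  # ok: dev already runs abc123
```

Promoting abc123 straight to prod would raise here, because staging still pins 9f8e7d.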
Step 5 — Per-env checkpointer selection
Checkpointer choice is per-env too:
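from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
from psycopg.rows import dict_row

def build_checkpointer(settings: Settings):
    if settings.env == "dev":
        return MemorySaver()  # ephemeral, resets on restart
    # staging + prod: Postgres with env-isolated schema
    # cross-ref langchain-langgraph-checkpointing (P20) for schema migration
    # PostgresSaver.from_conn_string() is a context manager, so for a
    # long-lived saver open the connection directly with the options it expects.
    conn = Connection.connect(
        str(settings.checkpointer_url),
        autocommit=True,
        prepare_threshold=0,
        row_factory=dict_row,
    )
    return PostgresSaver(conn)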
Dev uses MemorySaver — no infra dependency, no state between runs.
Staging and prod use PostgresSaver against separate databases (or
separate schemas). Never share a checkpointer DB between envs; P20 explains
— schema migrations on a version bump corrupt cross-env threads.
Step 6 — Startup smoke test (< 10 seconds budget)
import time
from anthropic import Anthropic

def validate_integrations(settings: Settings) -> None:
    t0 = time.monotonic()
    # 1. Model reachable (1-token ping ~ $0.00001)
    anthropic = Anthropic(api_key=settings.anthropic_api_key.get_secret_value())
    anthropic.messages.create(
        model=settings.model_id,
        max_tokens=1,
        messages=[{"role": "user", "content": "hi"}],
    )
    # 2. Checkpointer reachable
    if settings.env != "dev":
        checkpointer = build_checkpointer(settings)
        checkpointer.setup()  # creates/migrates checkpoint tables; raises if the DB is unreachable
    # 3. Vector store reachable (see langchain-embeddings-search)
    # ... describe_index call here ...
    # 4. Observability endpoint reachable (OTLP HTTP health)
    # ... requests.get(f"{settings.otel_endpoint}/health", timeout=2) ...
    elapsed = time.monotonic() - t0
    if elapsed > 10.0:
        raise RuntimeError(
            f"startup smoke test took {elapsed:.1f}s (budget 10s)"
        )
Call validate_integrations(settings) before the HTTP server binds.
Failure aborts the deploy — the readiness probe never goes green, the
rollout halts, the bad version takes no traffic. Budget: 10 seconds.
Past 10s an integration is degraded — fail loudly rather than ship a 30s
cold start. See Startup Smoke Test.
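validate_integrations runs its probes serially and checks the budget only at the end. One possible way to hold the total under budget is to run the probes in parallel against a shared deadline. The sketch below uses only the standard library. run_probes and the probe names are hypothetical, and each probe is assumed to be a zero-argument callable that raises on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable


def run_probes(probes: dict[str, Callable[[], None]], budget_s: float = 10.0) -> float:
    """Run all probes in parallel; raise if any fails or the budget is exceeded.

    Returns the elapsed wall-clock time in seconds on success.
    """
    t0 = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=max(1, len(probes)))
    futures = {pool.submit(fn): name for name, fn in probes.items()}
    done, not_done = wait(futures, timeout=budget_s)
    # Do not block on stragglers; a hung probe thread cannot be killed,
    # so the caller is expected to abort startup after the exception.
    pool.shutdown(wait=False, cancel_futures=True)
    if not_done:
        slow = ", ".join(sorted(futures[f] for f in not_done))
        raise TimeoutError(f"probes exceeded {budget_s:.1f}s budget: {slow}")
    for future in done:
        future.result()  # re-raises the probe's own exception, if any
    return time.monotonic() - t0


# Usage with stand-in probes (real ones would ping the model, DB, and so on).
elapsed = run_probes(
    {"model": lambda: None, "checkpointer": lambda: time.sleep(0.05)},
    budget_s=2.0,
)
```

Threads suit this case because the probes are I/O-bound network calls. Per-probe client timeouts are still worth setting, since the deadline here only stops waiting and does not cancel a call already in flight.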
Output
- Settings class on pydantic-settings with SecretStr for keys, Literal env, HttpUrl endpoints, extra="forbid"
- Env-specific loader (file → dev; env vars → staging; Secret Manager → prod); in prod, values land in Settings only, never os.environ
- Cloud Secret Manager integration (GCP / AWS / Vault) with IAM-bound auth; no static keys on disk
- Per-env pinning for model_id, prompt_commit_hash, vector_index_name, checkpointer_url
- Per-env checkpointer (MemorySaver in dev, PostgresSaver on isolated DBs in staging/prod)
- Startup smoke test: model / vector / checkpointer / observability under a 10-second budget
Env Matrix
| Dimension | dev | staging | prod |
|---|---|---|---|
| Secret backend | .env.dev file (git-ignored) | orchestrator env vars | cloud Secret Manager, memory only |
| os.environ holds keys | yes (local) | yes (sidecar visible) | no (P37 fix) |
| model_id | claude-haiku-4-6 | claude-sonnet-4-6 | claude-sonnet-4-6 |
| prompt_commit_hash | WIP | canary | stable (1 week old) |
| temperature | 0.7 | 0.2 | 0.2 |
| Checkpointer | MemorySaver | PostgresSaver (staging DB) | PostgresSaver (prod DB) |
| Vector index | dev-index | staging-index | prod-index |
| OTEL sample rate | 1.0 | 1.0 | 0.1 |
| RPM limit | 10 | 60 | provider tier |
| Daily budget | $1 | $10 | $500-$5000 |
| Smoke probes | model | model + checkpointer + OTEL | all four |
Error Handling
| Error | Cause | Fix |
|---|---|---|
| docker exec POD env shows ANTHROPIC_API_KEY=... in prod (P37) | dotenv / plain env injection in prod | Pull from Secret Manager into Settings(**values); never write to os.environ |
| Staging answers with prod prompts / wrong model | Loader defaulted or picked stale LANGCHAIN_ENV | Literal["dev","staging","prod"] on env; raise on unknown; no default |
| ValidationError: extra fields forbidden at startup | Typo (LANGCHIN_MODEL_ID) | Fix the typo; extra="forbid" is working as intended |
| Startup takes 30s before first request | Serialized probes or degraded integration | Enforce 10s budget; parallelize probes; fail the deploy |
| repr(settings) in a log leaks the API key | Plain str used, not SecretStr | Change field to SecretStr; repr masks to '**********' |
| Prod silently using MemorySaver | build_checkpointer defaulted when checkpointer_url was None | Require checkpointer_url in staging/prod via a model validator |
| Secret Manager auth fails in CI | SA not bound; google.auth fell back to ADC | Bind SA with roles/secretmanager.secretAccessor |
| Prompt hash rolled forward in staging without dev validation | Promotion skipped the dev gate | Enforce dev → staging → prod order in CI (see per-env pinning ref) |
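The "require checkpointer_url in staging/prod via a model validator" fix from the table can be sketched as follows. CheckpointerSettings is a hypothetical, trimmed-down stand-in for Settings (plain BaseModel, URL kept as a string) so the example runs on its own. In the real class the same validator would sit on Settings. pydantic >= 2 is assumed.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator


class CheckpointerSettings(BaseModel):
    """Hypothetical subset of Settings showing the cross-field rule only."""

    env: Literal["dev", "staging", "prod"]
    checkpointer_url: Optional[str] = None

    @model_validator(mode="after")
    def require_checkpointer_outside_dev(self) -> "CheckpointerSettings":
        # dev may run on MemorySaver; staging and prod must name a database.
        if self.env != "dev" and self.checkpointer_url is None:
            raise ValueError(
                f"checkpointer_url is required when env is {self.env!r}"
            )
        return self


def builds(**kwargs) -> bool:
    """Return True if the settings validate, False on ValidationError."""
    try:
        CheckpointerSettings(**kwargs)
    except ValidationError:
        return False
    return True
```

With this rule, a prod deploy that forgets LANGCHAIN_CHECKPOINTER_URL fails at startup instead of quietly running on an in-memory checkpointer.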
Examples
Graduating a .env-in-dev service to prod
Start: a single .env committed (or leaked via docker exec env). End:
Settings class, three loaders, Secret Manager in prod, smoke test under
10s. Three PRs — (1) introduce Settings without changing loader behavior,
(2) add SecretStr and migrate call sites to .get_secret_value(),
(3) swap prod to Secret Manager and remove the prod .env from the image.
See Settings Skeleton and
Secret Manager Integration.
Wrong-env prompt loaded in staging — postmortem
Staging inherited LANGCHAIN_ENV=production from a stale shell. The
Literal["dev","staging","prod"] field rejects production; CI promotion
sets LANGCHAIN_ENV explicitly; direnv pins it per-project. See
Per-Env Pinning.
Smoke test blocked a bad model id
A prod deploy went out with LANGCHAIN_MODEL_ID=claude-sonnet-4-7 (not yet
rolled out). The 1-token ping failed with model not found,
validate_integrations raised, the container crash-looped, the rollout
halted, the previous version kept taking traffic. Zero user impact; failure
budget stayed under 3s. See Startup Smoke Test.
Resources
- Pydantic Settings docs
- Pydantic SecretStr
- GCP Secret Manager client
- AWS Secrets Manager boto3
- HashiCorp Vault hvac
- LangChain 1.0 release notes
- Related skills in pack: langchain-security-basics (secrets lifecycle, owns rotation and revocation; not duplicated here); langchain-langgraph-checkpointing (P20 schema migration); langchain-prompt-engineering (prompt pin / LangSmith pull workflow); langchain-reference-architecture (where Settings fits in the DI layer)
- Pack pain catalog: docs/pain-catalog.md (entries P37 primary, P20 cross-ref)