---
name: langchain-prod-checklist
description: >
  Production readiness checklist for LangChain applications.
  Use when preparing for launch, validating deployment readiness,
  or auditing existing production LangChain systems.
  Trigger: "langchain production", "langchain prod ready",
  "deploy langchain", "langchain launch checklist", "go-live langchain".
allowed-tools: Read, Write, Edit, Bash(node:), Bash(npm:)
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
tags:
  - saas
  - langchain
  - deployment
  - audit
compatibility: Designed for Claude Code, also compatible with Codex and OpenClaw
---
# LangChain Production Checklist

## Overview

Comprehensive go-live checklist for deploying LangChain applications to production. Covers configuration, resilience, observability, performance, security, testing, deployment, and cost management.
## 1. Configuration & Secrets

- All API keys in a secrets manager (not `.env` in production)
- Environment-specific configs (dev/staging/prod) validated with Zod
- Startup validation fails fast on missing config
- `.env` files in `.gitignore`
```typescript
// Startup validation: fail fast if required production env vars are missing.
import { z } from "zod";

const ProdConfig = z.object({
  OPENAI_API_KEY: z.string().startsWith("sk-"),
  LANGSMITH_API_KEY: z.string().startsWith("lsv2_"),
  NODE_ENV: z.literal("production"),
});

try {
  ProdConfig.parse(process.env);
} catch (e) {
  console.error("Invalid production config:", e);
  process.exit(1);
}
```
## 2. Error Handling & Resilience

- `maxRetries` configured on all models (3-5)
- `timeout` set on all models (30-60s)
- Fallback models configured with `.withFallbacks()`
- Error responses return safe messages (no stack traces to users)
```typescript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

// Primary model with retries and a timeout; Anthropic as fallback on outage.
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5,
  timeout: 30000,
}).withFallbacks({
  fallbacks: [new ChatAnthropic({ model: "claude-sonnet-4-20250514" })],
});
```
## 3. Observability

- LangSmith tracing enabled (`LANGSMITH_TRACING=true`)
- `LANGCHAIN_CALLBACKS_BACKGROUND=true` (non-serverless only)
- Structured logging on all LLM/tool calls
- Prometheus metrics exported (requests, latency, tokens, errors)
- Alerting rules configured (error rate >5%, P95 latency >5s)
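As a minimal, framework-agnostic sketch of the structured-logging item above (the `withCallLogging` wrapper and its field names are illustrative, not a LangChain API), each LLM call can emit one JSON log line with latency and outcome:

```typescript
// Structured-logging wrapper for LLM calls (illustrative; not a LangChain API).
// Emits one JSON line per call, which log pipelines can parse for metrics/alerts.
interface CallLog {
  event: "llm_call";
  model: string;
  latencyMs: number;
  status: "ok" | "error";
  error?: string;
}

async function withCallLogging<T>(model: string, call: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await call();
    const log: CallLog = { event: "llm_call", model, latencyMs: Date.now() - start, status: "ok" };
    console.log(JSON.stringify(log));
    return result;
  } catch (e: any) {
    const log: CallLog = {
      event: "llm_call",
      model,
      latencyMs: Date.now() - start,
      status: "error",
      error: String(e?.message ?? e).slice(0, 200), // truncate to keep log lines bounded
    };
    console.log(JSON.stringify(log));
    throw e; // rethrow so callers still see the failure
  }
}
```

In a real deployment, LangSmith traces cover per-run detail; a wrapper like this feeds the aggregate Prometheus counters and alerting rules.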
## 4. Performance

- Caching enabled for repeated queries (Redis or SQLite)
- `maxConcurrency` set on batch operations
- Streaming enabled for user-facing responses
- Connection pooling configured
- Prompt length optimized (no unnecessary verbosity)
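A minimal sketch of the caching item, using an in-memory TTL map as a stand-in for Redis or SQLite (the `PromptCache` class and `cachedInvoke` helper are illustrative, not library APIs):

```typescript
// In-memory TTL cache for repeated LLM queries (illustrative stand-in for Redis).
// Entries expire after ttlMs so stale completions are not served indefinitely.
interface CacheEntry { value: string; expiresAt: number; }

class PromptCache {
  private store = new Map<string, CacheEntry>();
  constructor(private ttlMs: number) {}

  get(prompt: string): string | undefined {
    const entry = this.store.get(prompt);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(prompt); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Only hit the model on a cache miss.
async function cachedInvoke(
  cache: PromptCache,
  prompt: string,
  invoke: (p: string) => Promise<string>,
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit;
  const result = await invoke(prompt);
  cache.set(prompt, result);
  return result;
}
```

Note that LangChain also supports model-level caching natively; this sketch just makes the hit/miss logic explicit.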
## 5. Security
- User input isolated in human messages (never in system prompts)
- Input length limits enforced
- Prompt injection patterns logged/flagged
- Tools restricted to allowlisted operations
- LLM output validated before display (no PII/key leakage)
- Audit logging on all LLM and tool calls
- Rate limiting per user/IP
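The length-limit and injection-flagging items can be sketched as a small input guard (the limit, patterns, and `guardUserInput` helper here are illustrative assumptions, not a complete defense):

```typescript
// Input guard sketch: enforce a length limit and flag common prompt-injection
// phrasings for audit logging. The patterns are illustrative, not exhaustive.
const MAX_INPUT_CHARS = 4000;

const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your )?(system prompt|instructions)/i,
  /you are now/i,
];

interface GuardResult { ok: boolean; reason?: string; flagged: boolean; }

function guardUserInput(input: string): GuardResult {
  if (input.length > MAX_INPUT_CHARS) {
    return { ok: false, reason: "input too long", flagged: false };
  }
  const flagged = INJECTION_PATTERNS.some((p) => p.test(input));
  // Flagged input is allowed through but logged for review, since pattern
  // matching alone produces false positives on legitimate questions.
  return { ok: true, flagged };
}
```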
## 6. Testing

- Unit tests for all chains (using `FakeListChatModel`, no API calls)
- Integration tests with real LLMs (gated behind CI secrets)
- RAG pipeline validation (retrieval relevance + no hallucination)
- Tool unit tests (valid input, invalid input, error cases)
- Load testing completed (concurrent users, batch operations)
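To illustrate the fake-model pattern without the dependency, here is a standalone stub that mimics `FakeListChatModel`'s behavior of returning canned responses in order (LangChain ships its own `FakeListChatModel` for real test suites; the `classifyAndAnswer` chain below is a hypothetical example):

```typescript
// Standalone sketch of the FakeListChatModel pattern: a stub that returns
// canned responses in order, so chain logic is unit-testable without API calls.
class FakeChatModel {
  private i = 0;
  constructor(private responses: string[]) {}

  async invoke(_input: string): Promise<string> {
    const response = this.responses[this.i % this.responses.length];
    this.i++;
    return response;
  }
}

// A trivial "chain" under test: classify the question, then answer it.
async function classifyAndAnswer(
  model: { invoke(s: string): Promise<string> },
  question: string,
) {
  const category = await model.invoke(`Classify: ${question}`);
  const answer = await model.invoke(`Answer (${category}): ${question}`);
  return { category, answer };
}
```

Because the chain only depends on the `invoke` shape, the fake drops in wherever the real model would.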
## 7. Deployment
- Health check endpoint returns LLM connectivity status
- Graceful shutdown handles in-flight requests
- Rolling deployment (zero downtime)
- Rollback procedure documented and tested
- Container resource limits set (memory, CPU)
```typescript
// Health check endpoint: reports LLM connectivity alongside server status.
app.get("/health", async (_req, res) => {
  const checks: Record<string, string> = { server: "ok" };
  try {
    await model.invoke("ping");
    checks.llm = "ok";
  } catch (e: any) {
    checks.llm = `error: ${e.message.slice(0, 100)}`;
  }
  const healthy = Object.values(checks).every((v) => v === "ok");
  res.status(healthy ? 200 : 503).json({ status: healthy ? "healthy" : "degraded", checks });
});
```

```typescript
// Graceful shutdown: stop accepting connections, then force-exit after 10s.
process.on("SIGTERM", async () => {
  console.log("Shutting down gracefully...");
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 10000); // force after 10s
});
```
## 8. Cost Management
- Token usage tracking callback attached
- Daily/monthly budget limits enforced
- Model tiering: cheap model for simple tasks, powerful for complex
- Cost alerts configured (Slack/email on threshold)
- Cost per user/tenant tracked
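A minimal sketch of budget enforcement (the `BudgetTracker` class is illustrative, and the per-million-token prices in the usage example are placeholders, not current provider rates):

```typescript
// Budget tracker sketch: accumulate per-call token cost and expose whether the
// daily budget is exhausted, so the app can refuse further LLM calls.
interface Pricing {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

class BudgetTracker {
  private spentUsd = 0;
  constructor(private dailyLimitUsd: number) {}

  // Record one call's usage; returns the cost of that call in USD.
  record(inputTokens: number, outputTokens: number, price: Pricing): number {
    const cost =
      (inputTokens / 1_000_000) * price.inputPerM +
      (outputTokens / 1_000_000) * price.outputPerM;
    this.spentUsd += cost;
    return cost;
  }

  get spent(): number { return this.spentUsd; }
  get withinBudget(): boolean { return this.spentUsd < this.dailyLimitUsd; }
}
```

In practice the token counts would come from the usage metadata on each model response, and `withinBudget` would be checked before dispatching a call.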
## Pre-Launch Validation Script

```typescript
// Assumes `ProdConfig`, `model`, and a `fallback` model (e.g. the ChatAnthropic
// instance from section 2) are in scope, and the app listens on port 8000.
async function validateProduction() {
  const results: Record<string, string> = {};

  // 1. Config
  try {
    ProdConfig.parse(process.env);
    results["Config"] = "PASS";
  } catch { results["Config"] = "FAIL: missing env vars"; }

  // 2. LLM connectivity
  try {
    await model.invoke("ping");
    results["LLM"] = "PASS";
  } catch (e: any) { results["LLM"] = `FAIL: ${e.message.slice(0, 50)}`; }

  // 3. Fallback
  try {
    const fallbackModel = model.withFallbacks({ fallbacks: [fallback] });
    await fallbackModel.invoke("ping");
    results["Fallback"] = "PASS";
  } catch { results["Fallback"] = "FAIL"; }

  // 4. LangSmith
  results["LangSmith"] = process.env.LANGSMITH_TRACING === "true" ? "PASS" : "WARN: disabled";

  // 5. Health endpoint
  try {
    const res = await fetch("http://localhost:8000/health");
    results["Health"] = res.ok ? "PASS" : "FAIL";
  } catch { results["Health"] = "FAIL: not reachable"; }

  console.table(results);
  const allPass = Object.values(results).every((v) => v === "PASS");
  console.log(allPass ? "READY FOR PRODUCTION" : "ISSUES FOUND - FIX BEFORE LAUNCH");
  return allPass;
}
```
## Error Handling

| Issue | Cause | Fix |
|---|---|---|
| API key missing at startup | Secrets not mounted | Check deployment config |
| No fallback on outage | `.withFallbacks()` not configured | Add fallback model |
| LangSmith trace gaps | Background callbacks in serverless | Set `LANGCHAIN_CALLBACKS_BACKGROUND=false` |
| Cache miss storm | Redis down | Implement graceful degradation |
## Next Steps

After launch, use `langchain-observability` for monitoring and `langchain-incident-runbook` for incident response.