name: secret-patterns description: 30+ service-specific secret detection regex patterns, entropy-based detection, PEM/JWT/Base64 identification, and false positive filtering.
Secret Detection Patterns
Patterns for finding leaked credentials in codebases, git history, and CI logs.
AWS Credentials
# Access Key ID: always starts with AKIA (long-term) or ASIA (session)
AKIA[0-9A-Z]{16}
ASIA[0-9A-Z]{16}
# Secret Access Key: 40-char base64-ish string after aws_secret
aws_secret_access_key\s*=\s*[A-Za-z0-9/+=]{40}
# ripgrep one-liner
rg --no-heading -n '(AKIA|ASIA)[0-9A-Z]{16}' .
GitHub Tokens
# Personal access tokens (classic and fine-grained)
ghp_[A-Za-z0-9]{36}
github_pat_[A-Za-z0-9_]{82}
# OAuth / app tokens
gho_[A-Za-z0-9]{36}
ghs_[A-Za-z0-9]{36}
ghu_[A-Za-z0-9]{36}
ghr_[A-Za-z0-9]{36}
rg --no-heading -n 'gh[pousr]_[A-Za-z0-9]{36}' .
Stripe Keys
# Live secret (never commit)
sk_live_[A-Za-z0-9]{24,}
# Test secret (flag but lower severity)
sk_test_[A-Za-z0-9]{24,}
# Publishable keys (public, lower severity)
pk_live_[A-Za-z0-9]{24,}
pk_test_[A-Za-z0-9]{24,}
rg --no-heading -n 'sk_(live|test)_[A-Za-z0-9]{24,}' .
OpenAI / Anthropic Keys
# OpenAI
sk-proj-[A-Za-z0-9\-_]{50,}
sk-[A-Za-z0-9]{48}
# Anthropic
sk-ant-[A-Za-z0-9\-_]{90,}
rg --no-heading -n '(sk-proj-|sk-ant-)' .
JWT Tokens
# Three base64url segments separated by dots
eyJ[A-Za-z0-9\-_]+\.eyJ[A-Za-z0-9\-_]+\.[A-Za-z0-9\-_]+
# Decode header to verify (Python)
import base64, json
header = token.split('.')[0] + '=='
print(json.loads(base64.urlsafe_b64decode(header)))
PEM Private Keys
# RSA, EC, OpenSSH private keys
-----BEGIN (RSA |EC |OPENSSH )?PRIVATE KEY-----
rg --no-heading -n '\-\-\-\-\-BEGIN.+PRIVATE KEY' .
# Also catch generic PEM blocks
-----BEGIN CERTIFICATE-----
-----BEGIN PGP PRIVATE KEY BLOCK-----
Slack Tokens
xoxb-[0-9]{11,}-[0-9]{11,}-[A-Za-z0-9]{24} # Bot token
xoxp-[0-9]{11,}-[0-9]{11,}-[A-Za-z0-9]{32} # User token
xoxe\.[A-Za-z0-9\-_]{80,} # Enterprise grid
xoxs-[A-Za-z0-9\-]{80,} # SCIM token
rg --no-heading -n 'xox[bpse][-.]' .
Database Connection Strings
# PostgreSQL
postgres(ql)?://[^:]+:[^@]+@[^/]+/\S+
postgresql\+asyncpg://[^:]+:[^@]+@
# MongoDB
mongodb(\+srv)?://[^:]+:[^@]+@
# MySQL / MariaDB
mysql://[^:]+:[^@]+@
rg --no-heading -n '(postgres|mongodb|mysql)://\S+:\S+@' .
NPM, SendGrid, Twilio, Mailgun
# NPM publish token
npm_[A-Za-z0-9]{36}
# SendGrid
SG\.[A-Za-z0-9\-_]{22}\.[A-Za-z0-9\-_]{43}
# Twilio account SID + auth token
AC[0-9a-f]{32} # Account SID
SK[0-9a-f]{32} # API Key SID
# Mailgun API key
key-[A-Za-z0-9]{32}
rg --no-heading -n '(npm_[A-Za-z0-9]{36}|SG\.[A-Za-z0-9_-]{22}\.)' .
SSH Private Keys
-----BEGIN OPENSSH PRIVATE KEY-----
-----BEGIN RSA PRIVATE KEY-----
-----BEGIN EC PRIVATE KEY-----
-----BEGIN DSA PRIVATE KEY-----
Google Service Account JSON
// Detect the "type": "service_account" JSON pattern
rg --no-heading -n '"type"\s*:\s*"service_account"' .
// Or the private_key field
rg --no-heading -n '"private_key"\s*:\s*"-----BEGIN' .
High-Entropy String Detection (Shannon Entropy)
import math
from collections import Counter
def shannon_entropy(s: str) -> float:
if not s:
return 0.0
freq = Counter(s)
length = len(s)
return -sum((c / length) * math.log2(c / length) for c in freq.values())
def is_high_entropy_secret(value: str, threshold: float = 4.5) -> bool:
# Ignore short strings and common words
if len(value) < 20:
return False
return shannon_entropy(value) > threshold
# Example usage
candidates = [
"AKIAIOSFODNN7EXAMPLE", # entropy ~4.1 (low, example key)
"wJalrXUtnFEMI/K7MDENG/bPxRfi", # entropy ~4.7 (real secret)
"my-password-123", # entropy ~3.8 (low)
]
for c in candidates:
print(f"{c[:20]}... entropy={shannon_entropy(c):.2f} secret={is_high_entropy_secret(c)}")
False Positive Filtering Rules
FALSE_POSITIVE_PATHS = [
r'test[s]?/',
r'__tests__/',
r'spec/',
r'fixtures/',
r'examples/',
r'docs/',
r'\.md$',
r'CHANGELOG',
r'README',
]
FALSE_POSITIVE_VALUES = [
'example',
'placeholder',
'your_key_here',
'INSERT_KEY',
'REPLACE_ME',
'xxxx',
'1234',
'test',
'dummy',
'fake',
]
def is_false_positive(path: str, value: str) -> bool:
import re
for pattern in FALSE_POSITIVE_PATHS:
if re.search(pattern, path, re.IGNORECASE):
return True
lower = value.lower()
return any(fp in lower for fp in FALSE_POSITIVE_VALUES)
Pre-commit Hook Integration
# .pre-commit-config.yaml
repos:
- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.4
hooks:
- id: gitleaks
- repo: https://github.com/trufflesecurity/trufflehog
rev: v3.63.5
hooks:
- id: trufflehog
entry: trufflehog git file://. --since-commit HEAD --only-verified --fail
# Manual scan of full git history
gitleaks detect --source . --log-opts="--all"
# TruffleHog targeted scan
trufflehog git https://github.com/org/repo --only-verified
# ripgrep composite scan (fast, no install)
rg --no-heading -n \
'(AKIA|ASIA|ghp_|sk_live_|sk-proj-|sk-ant-|xoxb-|npm_|SG\.|-----BEGIN.+PRIVATE)' \
--glob '!{node_modules,dist,.git}' .
Key rule: Any match outside test/, docs/, or examples/ with entropy > 4.5 is a confirmed finding. Rotate immediately.