name: spam-filter-avoidance
description: Avoid triggering spam filters with your email content. Use when emails land in spam, auditing content for filter triggers, checking link patterns, or optimizing HTML structure.
license: MIT
Spam Filter Avoidance
Understand how spam filters evaluate your email and write content that passes cleanly - without tricks.
When to use this skill
- Writing email content and want to avoid common spam filter triggers
- Emails are landing in spam despite good authentication and reputation
- Building HTML email templates and need to know what patterns get flagged
- Generating email content programmatically (AI agents, templates with dynamic variables)
- Diagnosing why a specific email was filtered when everything else looks correct
- Reviewing outbound email for content-level deliverability risks
Related skills
- inbox-placement - the full picture of what determines inbox vs spam (reputation, engagement, authentication, and content)
- domain-authentication - SPF, DKIM, DMARC setup (filters check authentication before content)
- sender-reputation - reputation signals that outweigh content in filter decisions
- template-design - building HTML emails that render correctly and avoid structural triggers
- email-compliance - legal requirements like unsubscribe links that also affect filtering
How spam filters actually work
Modern spam filters are not keyword blocklists. They are multi-signal classifiers that evaluate messages across several dimensions simultaneously. Understanding the architecture matters because it tells you what you can and cannot control at the content level.
The filtering pipeline
When your email arrives at a mailbox provider, it passes through these stages in order:
1. Connection-level checks - IP reputation, DNS blocklists, TLS, rate limiting. Bad senders get rejected here before the message is even read.
2. Authentication checks - SPF, DKIM, DMARC. Failures add negative weight or cause outright rejection (Gmail rejects unauthenticated mail from bulk senders as of November 2025).
3. Reputation scoring - Domain history, complaint rates, bounce rates, past engagement. This is the heaviest signal.
4. Content analysis - The message itself: headers, subject, body text, HTML structure, links, attachments. This is where the patterns in this skill apply.
5. Engagement prediction - ML models predict whether this specific recipient will engage with this specific message, based on past behavior with this sender.
6. Final disposition - Inbox, spam, or category tab (Promotions, Other, etc.).
Content analysis is stage 4 of 6. By the time a filter evaluates your content, it has already formed an opinion based on your reputation and authentication. This is why the same content can land in inbox from one sender and spam from another.
The ML reality
Gmail blocks nearly 15 billion unwanted messages daily using ML models trained on billions of user interactions. These models evaluate:
- Semantic meaning - NLP models interpret context and tone, not just keywords
- Structural patterns - HTML complexity, text-to-image ratio, link density
- Behavioral correlation - how recipients with similar profiles reacted to similar messages
- Temporal patterns - unusual send times, volume spikes, sudden content changes
SpamAssassin (still widely used by corporate mail servers, ISPs, and hosting providers) takes a different approach: rule-based scoring where each matched rule adds points toward a threshold (default 5.0). This means specific patterns have specific, predictable scores.
The practical consequence: you need to satisfy both ML classifiers (Gmail, Outlook) and rule-based systems (SpamAssassin, enterprise filters). ML classifiers are harder to game but more forgiving of individual signals. Rule-based systems are predictable but unforgiving when you trip multiple rules.
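The additive, rule-based model can be sketched in a few lines. This is an illustration only: the rule names and point values below are hypothetical placeholders, not entries from SpamAssassin's ruleset. Only the 5.0 default threshold comes from the text above.

```python
from typing import Callable, List, Tuple

# Hypothetical rules for illustration; real SpamAssassin rules and scores differ.
RULES: List[Tuple[str, float, Callable[[str], bool]]] = [
    ("DEMO_ALL_CAPS_SUBJECT", 1.5, lambda subject: subject.isupper()),
    ("DEMO_MANY_EXCLAMATIONS", 1.0, lambda subject: "!!!" in subject),
    ("DEMO_DOLLAR_SIGNS", 1.0, lambda subject: "$$$" in subject),
]

THRESHOLD = 5.0  # SpamAssassin's default threshold, per the text above


def score_subject(subject: str) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that matched."""
    hits = [(name, points) for name, points, test in RULES if test(subject)]
    total = sum(points for _, points in hits)
    return total, [name for name, _ in hits]


def is_spam(subject: str) -> bool:
    """A message is marked as spam once its score reaches the threshold."""
    total, _ = score_subject(subject)
    return total >= THRESHOLD
```

In this toy ruleset, a subject that trips all three rules scores 3.5: still under 5.0, but most of the way there before the body, links, or headers have contributed anything.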
Subject line patterns
The subject line gets disproportionate attention from filters because spammers rely heavily on urgency and deception in subjects.
What triggers filters
ALL CAPS subjects. SpamAssassin rule SUBJ_ALL_CAPS adds 1.5+ points. Gmail's classifier also treats all-caps as a negative signal. A subject like LIMITED TIME OFFER trips both systems.
Excessive punctuation. Multiple exclamation marks (!!!), question marks (???), or dollar signs ($$$) are classic spam signals. SpamAssassin has specific rules for these. One exclamation mark is fine. Three is a flag.
Spam trigger phrases in subjects. These phrases in subject lines carry more weight than the same phrases in the body:
| Category | Examples | Why they trigger |
|---|---|---|
| Urgency | "Act now", "Limited time", "Urgent", "Expires today" | Pressure tactics are the most common spam pattern |
| Financial | "Free money", "No obligation", "Guaranteed", "Double your income" | Financial fraud is the #1 spam category |
| Deceptive | "Re:", "Fwd:" (on non-replies), "You've been selected" | Fake threading and fake personalization |
| Medical | "Lose weight", "Miracle cure", "No prescription" | Pharmaceutical spam is heavily targeted |
Misleading Re:/Fwd: prefixes. Adding Re: to a subject that isn't a reply trips the FAKE_REPLY rule in SpamAssassin and is actively penalized by Gmail. Same for Fwd: on messages that aren't forwards. Filters check Message-ID, In-Reply-To, and References headers to verify threading.
What's actually safe
- Normal capitalization and punctuation
- Specific, descriptive subjects ("Your invoice for March" not "IMPORTANT DOCUMENT ENCLOSED")
- Personalization with real data (recipient name, company name) - but not fake personalization like "Hi {{first_name}}" with an unfilled variable
- Emojis in moderation - one emoji is fine, five is a flag
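The subject-line patterns above lend themselves to a pre-send check. The sketch below is one possible lint, not how any filter implements these checks. The function name, the four-letter minimum for the all-caps test, and the `is_threaded` flag (which the caller sets when the message really is a reply or forward) are assumptions made for illustration.

```python
import re
from typing import List


def lint_subject(subject: str, is_threaded: bool = False) -> List[str]:
    """Return warnings for a subject line (illustrative checks only)."""
    warnings = []
    letters = [c for c in subject if c.isalpha()]
    # All caps: every letter is uppercase, and there are enough letters to matter.
    if len(letters) >= 4 and all(c.isupper() for c in letters):
        warnings.append("all-caps subject")
    # Three or more of the same !, ? or $ character in a row.
    if re.search(r"([!?$])\1{2,}", subject):
        warnings.append("excessive punctuation")
    # Re:/Fw:/Fwd: prefix on a message that is not part of a thread.
    if re.match(r"\s*(re|fwd?)\s*:", subject, re.IGNORECASE) and not is_threaded:
        warnings.append("reply/forward prefix on a non-threaded message")
    # Unfilled template placeholder such as {{first_name}}.
    if re.search(r"\{\{.*?\}\}", subject):
        warnings.append("unfilled template variable")
    return warnings
```

For example, `lint_subject("Re: Your invoice")` warns about the prefix, while the same call with `is_threaded=True` returns an empty list.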
Body content patterns
Spam phrases in context
The phrases listed below are not absolute blocklist words. A sender with strong reputation can use "free" or "guaranteed" without consequence. But these phrases add negative weight, and when combined with other signals (new domain, low engagement, poor HTML), they push the score over the threshold.
High-risk phrases (carry the most weight across both ML and rule-based systems):
- "Act now" / "Buy now" / "Order now"
- "Click here" (as the sole anchor text for a link)
- "Free money" / "No cost" / "Risk-free"
- "Winner" / "Congratulations" / "You've won"
- "No obligation" / "No strings attached"
- "Guaranteed" / "100% satisfied"
- "Double your income" / "Earn extra cash"
Medium-risk phrases (contribute to score but rarely trigger alone):
- "Limited time offer"
- "Exclusive deal"
- "Don't miss out"
- "Special promotion"
- "While supplies last"
The real rule: Density matters more than individual words. One instance of "free" in a 500-word email is noise. Five instances of pressure phrases in a 100-word email is spam. Filters evaluate the ratio of promotional language to total content.
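A rough way to watch density in generated content is to count pressure-phrase hits per 100 words. The metric and the function name below are assumptions for illustration; no filter is known to use this exact formula, and the phrase list is just a sample drawn from the lists above.

```python
import re

# A sample of phrases from the lists above; extend as needed.
PRESSURE_PHRASES = [
    "act now", "buy now", "order now", "free money", "risk-free",
    "no obligation", "guaranteed", "limited time offer", "don't miss out",
]


def pressure_density(body: str) -> float:
    """Return pressure-phrase hits per 100 words (an illustrative metric)."""
    text = body.lower()
    words = re.findall(r"[a-z0-9']+", text)
    if not words:
        return 0.0
    hits = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        for phrase in PRESSURE_PHRASES
    )
    return 100.0 * hits / len(words)
```

A short message such as "Act now! Guaranteed results. Act now." has three hits in six words, while the same three phrases spread over a 500-word email would barely register.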
Invisible text and encoding tricks
Filters specifically detect attempts to hide content or fool classifiers:
Zero-width characters. Inserting Unicode zero-width spaces (U+200B), zero-width joiners (U+200D), byte order marks (U+FEFF), or soft hyphens (U+00AD) between letters to break up spam words (like "V\u200Biagra") is an old trick that every modern filter detects. These characters are actively flagged and their presence alone is a spam signal.
Invisible text. White text on white background, font-size:0, display:none, or visibility:hidden content is detected by both SpamAssassin (HIDDEN_TEXT rules) and Gmail. Spammers use this to inject "good" text (like news articles) that the recipient can't see but the classifier reads, trying to dilute the spam score. Filters now treat hidden text as a strong negative signal.
HTML comment stuffing. Adding legitimate-looking text inside HTML comments (<!-- buy stocks at ... -->) to influence classifiers. Detected and penalized.
Character substitution. Using Cyrillic characters that look like Latin (e.g., Cyrillic "а" (U+0430) instead of Latin "a" (U+0061)) or HTML entities (&#86;iagra) to bypass text matching. Modern filters normalize text before evaluation.
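If content is generated or assembled from user input, it helps to scan for these characters before sending. The sketch below checks for the four invisible code points named above and for words that mix Latin and Cyrillic letters. Both function names are hypothetical, and the mixed-script test is a simple heuristic rather than a full confusables check.

```python
import unicodedata

# Code points named in the text above.
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200d": "ZERO WIDTH JOINER",
    "\ufeff": "BYTE ORDER MARK",
    "\u00ad": "SOFT HYPHEN",
}


def find_invisible(text):
    """Return (index, name) for each invisible character found in text."""
    return [(i, INVISIBLE[c]) for i, c in enumerate(text) if c in INVISIBLE]


def mixed_script_words(text):
    """Return words that mix Latin and Cyrillic letters (a homoglyph pattern)."""
    flagged = []
    for word in text.split():
        scripts = set()
        for c in word:
            if c.isalpha():
                name = unicodedata.name(c, "")
                if name.startswith("LATIN"):
                    scripts.add("LATIN")
                elif name.startswith("CYRILLIC"):
                    scripts.add("CYRILLIC")
        if len(scripts) > 1:
            flagged.append(word)
    return flagged
```

Zero-width joiners also appear legitimately inside some emoji sequences, so a hit from `find_invisible` is a prompt for review rather than proof of obfuscation.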
Text-to-code ratio
The ratio of visible text to HTML markup matters. An email that is mostly HTML tags with very little readable text looks like it's trying to hide something. Aim for substantial readable text in every email.
Link patterns
Links are the most scrutinized element in email content because they are the primary mechanism for phishing and malware delivery.
URL shorteners
Do not use URL shorteners (bit.ly, tinyurl.com, t.co, etc.) in email. They are heavily penalized because:
- They obscure the destination URL, which is the primary phishing vector
- Spammers use them to evade URL blocklist checks
- If another sender using the same shortener service gets blocked, your emails using that service may be blocked too - guilt by shared domain
- SpamAssassin has specific rules for known shortener domains (scored 2-4 points)
Use your own domain for all links. If you need click tracking, use a subdomain you own (e.g., track.example.com/click/...) with proper HTTPS.
Link density
Too many links signal promotional or phishing email:
- 0-3 links - normal for transactional and personal email
- 4-7 links - acceptable for newsletters with good text-to-link ratio
- 8+ links - starts triggering density rules, especially if links point to different domains
- 20+ links - almost certainly flagged
SpamAssassin-based filters tend to score a message higher as its link count grows, but the exact rules and thresholds depend on the ruleset version and local configuration.
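Shortener use and link count are easy to check mechanically. This sketch uses Python's standard `html.parser` to collect `href` values. The `lint_links` name, the three-host shortener set, and the `max_links=7` default (taken from the 4-7 band above) are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Shortener hosts named in the text above; a real list would be much longer.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def lint_links(html, max_links=7):
    """Return warnings about link count and URL shortener use."""
    collector = LinkCollector()
    collector.feed(html)
    warnings = []
    if len(collector.hrefs) > max_links:
        warnings.append(f"{len(collector.hrefs)} links (more than {max_links})")
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host in SHORTENER_HOSTS:
            warnings.append(f"URL shortener: {href}")
    return warnings
```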
Mismatched anchor text
When the visible text of a link is a URL that doesn't match the actual href, filters treat this as phishing:
<!-- BAD - anchor text says one URL, href goes somewhere else -->
<a href="https://evil.com/steal">https://www.yourbank.com/login</a>
<!-- BAD - "Click here" as sole anchor text -->
<a href="https://example.com/offer">Click here</a>
<!-- GOOD - descriptive, honest anchor text -->
<a href="https://example.com/pricing">View pricing details</a>
<!-- GOOD - matching URL text -->
<a href="https://example.com">https://example.com</a>
Gmail specifically checks for URL-as-anchor-text mismatches and flags them as potential phishing.
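A similar check can be run on your own templates. The sketch below flags anchors whose visible text looks like a URL but names a different host than the `href`. It compares hostnames exactly, so `www.example.com` and `example.com` count as different; that strictness is a choice made here, not a documented filter behavior, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AnchorParser(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None  # None while outside an <a> element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_anchors(html):
    """Return (href, text) pairs where URL-like text names a different host."""
    parser = AnchorParser()
    parser.feed(html)
    bad = []
    for href, text in parser.anchors:
        # Only compare when the visible text itself looks like a URL.
        if text.lower().startswith(("http://", "https://", "www.")):
            shown = urlparse(text if "://" in text else "https://" + text).hostname
            actual = urlparse(href).hostname
            if shown != actual:
                bad.append((href, text))
    return bad
```

Run against the four examples above, only the first (the `evil.com` link displayed as `yourbank.com`) is returned.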
URL blocklists
Every link in your email is checked against real-time URL blocklists (URIBL, SURBL, Google Safe Browsing). SpamAssassin's URIBL rules carry high scores (1.5-3.6 points each). If any domain in your email appears on these lists, the entire message is penalized.
This means:
- Don't link to domains you don't control unless you trust them
- Don't use third-party redirect services
- Monitor your own domains on blocklists (MXToolbox, multirbl.valli.org)
- If you link to user-generated content, validate URLs before including them
HTTP vs HTTPS
All links should use HTTPS. SpamAssassin has rules for HTTP links in email (HTTP_IN_EMAIL), and Gmail treats HTTP links as a minor negative signal. More importantly, some enterprise filters block HTTP links outright as a security policy.
HTML structure
The way your HTML email is constructed tells filters a lot about whether you're a legitimate sender.
Text-to-image ratio
The widely cited guideline is 60:40 text-to-image ratio (by area). The practical rules:
- Minimum 400-500 characters of visible text. Below this, filters suspect your content is hidden in images.
- Never send image-only emails. An email that is one large image with no text is a strong spam signal. Filters can't read text in images, so they treat image-only messages as potentially hiding content.
- Alt text on every image. Besides accessibility, alt text provides text content that helps your text-to-image ratio when images are blocked (which is the default in many email clients on first view).
SpamAssassin's HTML_IMAGE_RATIO_02 rule fires when text-to-image ratio is below 20%. The rule itself has a low score, but it compounds with other signals.
HTML quality
Broken, malformed, or unnecessarily complex HTML is a spam signal:
- Missing closing tags - sloppy HTML suggests auto-generated spam
- Excessive nested tables - some depth is needed for email layout, but extreme nesting (10+ levels) is a flag
- Non-standard tags - <marquee>, <blink>, <embed>, <object>, <form> tags are stripped by email clients and flagged by filters
- JavaScript - <script> tags, onclick, onload, and other event handlers are always stripped by email clients and are a strong spam signal
- Iframes - always stripped and flagged
- CSS external stylesheets - not supported by most email clients and flagged by some filters. Use inline styles.
- Extremely large HTML - emails with more than 100KB of HTML are unusual for legitimate messages
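Several of the HTML structure rules above can be checked with a small parser pass. This sketch counts visible characters, images without alt text, and occurrences of the flagged tags. The class and function names are hypothetical, the 400-character minimum is the lower bound of the guideline above, and the parser does not evaluate CSS, so it will not notice text hidden with styles.

```python
from html.parser import HTMLParser

FLAGGED_TAGS = {"marquee", "blink", "embed", "object", "form", "script", "iframe"}
MIN_VISIBLE_CHARS = 400  # lower bound of the 400-500 guideline above


class StructureLinter(HTMLParser):
    """Count visible text, images without alt text, and flagged tags."""

    def __init__(self):
        super().__init__()
        self.visible_chars = 0
        self.images_missing_alt = 0
        self.flagged_tags = []
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in FLAGGED_TAGS:
            self.flagged_tags.append(tag)
        if tag in ("script", "style"):
            self._hidden_depth += 1
        if tag == "img" and not (dict(attrs).get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Script and style contents are not visible to the reader.
        if not self._hidden_depth:
            self.visible_chars += len(data.strip())


def lint_structure(html):
    """Summarize the structural signals found in an HTML body."""
    linter = StructureLinter()
    linter.feed(html)
    linter.close()
    return {
        "visible_chars": linter.visible_chars,
        "enough_text": linter.visible_chars >= MIN_VISIBLE_CHARS,
        "images_missing_alt": linter.images_missing_alt,
        "flagged_tags": linter.flagged_tags,
    }
```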
Encoding and character sets
- Declare your character encoding explicitly (charset=UTF-8 in Content-Type)
- Don't mix character encodings within a single message
- Avoid base64-encoding the entire body unless necessary (it looks like you're trying to hide content)
Multipart messages
Always send multipart messages with both HTML and plain text parts (multipart/alternative). Missing the plain text version is flagged by SpamAssassin (MIME_HTML_ONLY, scored at 0.7 points) and is a minor negative signal for Gmail.
The plain text part should be a real text rendering of your content, not a copy-paste of the HTML, not blank, and not "View this email in your browser." Recipients on text-only clients or with images disabled see this version.
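With Python's standard `email` package, a multipart/alternative message can be built as sketched below. The `build_message` helper and the `example.com` Message-ID domain are placeholders; a real sender would pass its own domain and hand the result to its SMTP client.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_message(sender, recipient, subject, text_body, html_body):
    """Build a multipart/alternative message with text and HTML parts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain="example.com")  # placeholder domain
    # The plain text part goes first; it should be a real rendering of the content.
    msg.set_content(text_body)
    # Adding the HTML part converts the message to multipart/alternative.
    msg.add_alternative(html_body, subtype="html")
    return msg
```

The text part is added first and the HTML part last because clients that understand multipart/alternative prefer the last part they can render.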
Header hygiene
Missing or malformed headers are easy to detect and consistently penalized.
Required headers
Every email should include these headers:
| Header | Purpose | What happens without it |
|---|---|---|
| From | Sender display name and address | Rejected by most servers |
| To | Recipient address | Some filters flag missing/empty To |
| Date | When the message was sent | SpamAssassin MISSING_DATE rule fires |
| Subject | Message topic | Not technically required but absence is suspicious |
| Message-ID | Unique identifier for this message | MISSING_MID rule fires, some providers reject |
| MIME-Version | Always 1.0 | MISSING_MIME_VERSION fires |
| Content-Type | Media type and charset | Assumed text/plain but absence is a flag |
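The table above translates directly into a pre-send check on the raw message. This is a minimal sketch using the standard `email` parser; the `missing_headers` name is hypothetical, and the check only confirms that each header is present and non-empty, not that its value is well-formed.

```python
from email import message_from_string

# Headers from the table above.
REQUIRED_HEADERS = [
    "From", "To", "Date", "Subject", "Message-ID", "MIME-Version", "Content-Type",
]


def missing_headers(raw_message):
    """Return the required headers that are absent or empty in a raw message."""
    msg = message_from_string(raw_message)
    return [h for h in REQUIRED_HEADERS if not (msg.get(h) or "").strip()]
```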
List-Unsubscribe (marketing email)
For any promotional or marketing email, include both headers:
List-Unsubscribe: <https://example.com/unsub?id=abc123>, <mailto:unsub@example.com?subject=unsubscribe>
List-Unsubscribe-Post: List-Unsubscribe=One-Click
Gmail and Yahoo require one-click unsubscribe for bulk senders (5,000+ messages/day). Microsoft's May 2025 rules for high-volume senders to Outlook.com/Hotmail also call for a functional unsubscribe mechanism. Missing these headers on marketing email causes:
- No "Unsubscribe" button shown in email clients (forcing recipients to use the spam button instead)
- Higher complaint rates (because "report spam" becomes the only easy opt-out)
- Filter penalties for non-compliance with bulk sender requirements
The List-Unsubscribe-Post header tells email clients to use POST instead of GET for the unsubscribe request, preventing accidental unsubscribes from security scanners that follow links.
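Setting both headers with Python's `email` package might look like the sketch below. The helper name and its arguments are hypothetical. The HTTPS check reflects RFC 8058, which requires the List-Unsubscribe header to contain an HTTPS URI when one-click is advertised.

```python
from email.message import EmailMessage


def add_unsubscribe_headers(msg, https_url, mailto_address):
    """Add List-Unsubscribe and List-Unsubscribe-Post headers to msg."""
    # RFC 8058 one-click unsubscribe needs an HTTPS URI in List-Unsubscribe.
    if not https_url.startswith("https://"):
        raise ValueError("one-click unsubscribe needs an HTTPS URL")
    msg["List-Unsubscribe"] = (
        f"<{https_url}>, <mailto:{mailto_address}?subject=unsubscribe>"
    )
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    return msg
```

RFC 8058 also expects a valid DKIM signature that covers both headers, so they need to be set before the message is signed.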
Headers to avoid
- X-Priority: 1 / Importance: High - marking your own email as high priority is a spam signal
- X-Mailer headers from known spam tools
- Excessively long Received chains - suggest the message is being relayed through open relays
Provider-specific differences
Gmail, Microsoft, and Yahoo run different filtering stacks. What passes at one may fail at another.
Gmail
- Heaviest reliance on engagement signals (opens, replies, time spent reading)
- ML-based classifier trained on billions of user actions
- Reply rates are a strong positive signal - a 2-3% reply rate significantly improves placement
- Strict authentication enforcement since November 2025 (unauthenticated bulk mail is rejected, not just filtered)
- Promotions tab is not spam - promotional email that Gmail routes to Promotions is being correctly classified, not penalized
Microsoft (Outlook.com, Hotmail, Exchange Online)
- Uses Exchange Online Protection (EOP) and Microsoft Defender for Office 365
- Introduced LLM-based semantic analysis for BEC (business email compromise) detection in late 2024
- More conservative filtering - Outlook's inbox placement rate has been significantly lower than Gmail's (as low as 27% in Q1 2025 vs Gmail's 87%)
- Sender reputation scoring is less transparent than Gmail's
- More sensitive to link patterns and attachment types
- Stricter authentication requirements for Outlook.com/Hotmail since May 2025
Yahoo
- Aligned with Gmail's bulk sender requirements since February 2024
- Slightly more permissive than Gmail on content signals
- Strong DMARC enforcement
- Less sophisticated engagement tracking than Gmail
SpamAssassin and enterprise filters
- Used by corporate mail servers, hosting providers, and ISPs
- Rule-based scoring with a configurable threshold (default 5.0)
- Over 700 rules covering headers, content, HTML, links, and Bayesian analysis
- Predictable - you can test against SpamAssassin before sending
- Commonly customized per organization, so scores and thresholds vary
Attachments
Attachments carry risk because they're the primary vector for malware delivery.
High-risk attachment types
These file types are blocked by most enterprise filters and many consumer providers:
- Executable: .exe, .bat, .cmd, .msi, .scr, .pif, .com
- Script: .js, .vbs, .wsf, .ps1, .sh
- Macro-capable: .docm, .xlsm, .pptm (macro-enabled Office files)
- Archives with executables inside: .zip, .rar, .7z containing any of the above
Lower-risk attachment types
- PDF - generally safe but scanned for embedded JavaScript and links
- Images - .jpg, .png, .gif are fine
- Standard Office - .docx, .xlsx, .pptx (non-macro) are usually accepted
- Calendar invites - .ics files are fine
Best practices for attachments
- Host files on your server and link to them instead of attaching when possible
- Keep attachment sizes under 10MB (many filters reject larger messages)
- Don't password-protect archives in transactional email - this is a common malware delivery pattern and is flagged
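The file-type and size guidance above can be enforced before a message is queued. The sketch below checks only the file name and size; it does not open archives or inspect file contents. The function name is an assumption, and the 10MB limit is taken from the guidance above.

```python
from pathlib import PurePath

# Extensions from the high-risk list above.
HIGH_RISK_EXTENSIONS = {
    ".exe", ".bat", ".cmd", ".msi", ".scr", ".pif", ".com",
    ".js", ".vbs", ".wsf", ".ps1", ".sh",
    ".docm", ".xlsm", ".pptm",
}
ARCHIVE_EXTENSIONS = {".zip", ".rar", ".7z"}
MAX_BYTES = 10 * 1024 * 1024  # 10MB, per the guidance above


def lint_attachment(filename, size_bytes):
    """Return warnings based on an attachment's extension and size."""
    warnings = []
    # Only the final suffix counts, so "invoice.pdf.exe" is treated as .exe.
    ext = PurePath(filename).suffix.lower()
    if ext in HIGH_RISK_EXTENSIONS:
        warnings.append(f"high-risk file type: {ext}")
    elif ext in ARCHIVE_EXTENSIONS:
        warnings.append(f"archive ({ext}): contents need inspection")
    if size_bytes > MAX_BYTES:
        warnings.append("attachment larger than 10MB")
    return warnings
```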
What doesn't work (and why)
These are techniques that either never worked, worked briefly, or actively make things worse.
"Spinning" or synonym substitution
Replacing spam words with synonyms or obfuscated spellings ("F.R.E.E" instead of "free", "vi@gra" instead of "viagra") was defeated by filters over a decade ago. Modern classifiers normalize text, expand character substitutions, and evaluate semantic meaning. Attempted obfuscation is itself a spam signal.
White text / hidden text injection
Adding invisible "good" text (news articles, Shakespeare) to dilute spam scores stopped working when filters started detecting hidden content as a strong negative signal. SpamAssassin has specific rules for hidden text. Gmail's classifier treats any hidden content as suspicious.
Image-only emails
Putting all your content in a single image to avoid text analysis has never worked reliably. Filters can't read the text, so they assume the worst. Additionally, many email clients block images by default, so the recipient sees a blank email.
Sending from constantly new domains
Rotating through new domains to avoid reputation damage doesn't work because new domains have no reputation, which is itself a strong spam signal. Warming up a domain takes weeks, and providers track patterns across domains registered by the same entity.
Character encoding tricks
Using Unicode homoglyphs, zero-width characters, or HTML entities to break up spam words is detected by modern filters. Filters normalize or strip these characters before evaluating the text, and treat their presence as an evasion attempt.
Testing before sending
SpamAssassin scoring
Test your emails against SpamAssassin before sending. Several tools offer this:
- mail-tester.com - send an email, get a SpamAssassin score breakdown
- GlockApps - tests against SpamAssassin plus inbox placement across providers
- Mailtrap - SpamAssassin scoring in staging environments
Target a SpamAssassin score below 3.0. The default threshold is 5.0, but some servers use lower thresholds such as 3.0 or even 2.0, so the lower your score, the better.
Seed list testing
Send to test accounts at Gmail, Outlook, Yahoo, and any provider your recipients commonly use. Check:
- Does it land in inbox or spam?
- If Gmail, does it go to Primary or Promotions?
- Do images load? Do links work?
- What does Gmail's "Show original" view reveal about authentication and filter headers?
Header analysis
Check the Authentication-Results and X-Spam-Status headers on received messages. They tell you exactly which checks passed, failed, and what scores were assigned.
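SpamAssassin's X-Spam-Status header usually has the form `No, score=2.3 required=5.0 tests=RULE_A,RULE_B ...`, which makes it easy to parse in a test harness. The parser below is a sketch that assumes this common layout (older versions write `hits=` instead of `score=`). Servers can customize the header, so treat a `None` result as "unrecognized format" rather than as an error.

```python
import re


def parse_spam_status(value):
    """Parse an X-Spam-Status header value into a dict, or None if unrecognized."""
    flat = " ".join(value.split())  # unfold continuation lines
    match = re.match(
        r"(Yes|No),\s*(?:score|hits)=(-?[\d.]+)\s+required=(-?[\d.]+)", flat
    )
    if not match:
        return None
    # The tests list runs until the next key=value pair or the end of the header.
    tests_match = re.search(r"tests=(.*?)(?:\s+\w+=|$)", flat)
    tests = []
    if tests_match:
        tests = [t.strip() for t in tests_match.group(1).split(",") if t.strip()]
    return {
        "is_spam": match.group(1) == "Yes",
        "score": float(match.group(2)),
        "required": float(match.group(3)),
        "tests": tests,
    }
```

Comparing `score` against `required` shows how much headroom a message has, and `tests` lists the rules that contributed to the score.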
Content checklist
Before sending, verify:
- Subject line uses normal capitalization (not ALL CAPS)
- No excessive punctuation (!!!, ???, $$$)
- No fake Re:/Fwd: prefixes on non-threaded messages
- Body contains at least 400-500 characters of visible text
- No hidden text (white-on-white, font-size:0, display:none)
- No zero-width Unicode characters or encoding tricks
- All links use HTTPS
- No URL shorteners (bit.ly, tinyurl, etc.)
- Link anchor text is descriptive (not "Click here")
- Link anchor text matches destination URL (no mismatches)
- Fewer than 8 links in a single message (fewer is better)
- Links don't point to blocklisted domains
- Images have alt text
- HTML includes a plain text alternative (multipart/alternative)
- HTML is well-formed (no unclosed tags, no script/iframe/form elements)
- Message-ID, Date, MIME-Version headers are present
- List-Unsubscribe and List-Unsubscribe-Post headers are present (marketing email)
- No high-priority markers (X-Priority: 1)
- Attachments (if any) use safe file types
- SpamAssassin score is below 3.0
Common mistakes
Obsessing over spam words while ignoring reputation. Content analysis is roughly 10% of the inbox placement decision. If your domain reputation is poor or your complaint rate is above 0.1%, no amount of content optimization will save you. Fix reputation first, then optimize content.
Sending image-only emails for "beautiful design." A single large image with no text is the worst thing you can send from a deliverability perspective. Build your layout in HTML with real text. Use images for supporting visuals, not as the entire message.
Using URL shorteners for "cleaner" links. bit.ly and similar services share their domain across all users, including spammers. Use your own domain for tracking links.
Adding invisible text to "improve" spam scores. This backfired years ago and now actively hurts deliverability. Hidden text, white-on-white text, and zero-width character insertion are all detected and penalized.
Missing the plain text part. Sending HTML-only email without a text/plain alternative trips SpamAssassin's MIME_HTML_ONLY rule and is a minor negative across all providers. Always include a real plain text version.
Not testing across providers. An email that passes Gmail's filter might fail Microsoft's, or vice versa. Test at the providers your recipients actually use, not just the one you use.
Ignoring List-Unsubscribe headers on marketing email. Without the unsubscribe header, recipients who want to opt out will hit the "Report Spam" button instead. Every spam complaint costs you significantly more than an unsubscribe.
Dynamic content without content review. AI agents and template systems that generate email content at runtime can produce phrases or patterns that score poorly in spam classifiers. The sender doesn't see the content before it goes out, so problems compound silently. Implement content linting on outbound messages - check for known spam phrases, validate link patterns, and enforce text-to-image ratios before the message reaches the wire.
References
- Gmail Spam Filter and Sender Guidelines - Google's official sender requirements
- Microsoft Anti-spam Protection - Exchange Online Protection documentation
- Yahoo Sender Best Practices - Yahoo's sender requirements
- SpamAssassin Test Descriptions - full list of SpamAssassin rules
- SpamAssassin Default Scores - scoring configuration
- RFC 8058 - One-Click Unsubscribe
- RFC 2045 - MIME format specification
- M3AAWG Best Practices - industry anti-abuse guidelines
- Google Postmaster Tools - monitor your domain's reputation and spam rate at Gmail
- URIBL - real-time URI blocklist
- SURBL - spam URI blocklist