name: vc-finder
description: 'Takes a startup product URL or description, detects the industry and funding stage, identifies 5 comparable funded companies, searches who invested in those companies (Track A), finds VCs who publish investment theses about this space (Track B), and returns a ranked sourced list of relevant investors with deep-dives and outreach hooks. Use when asked to find investors for a startup, identify which VCs fund products like mine, research who backs companies in my space, build a VC target list, or find investor-market fit.'
compatibility: [claude-code, gemini-cli, github-copilot]
VC Finder
Take a product URL or description. Detect industry and stage. Find 5 comparable funded companies. Run two research tracks: who invested in those comparables (Track A), and which VCs publish theses about this space (Track B). Return a sourced, ranked investor list with outreach hooks.
Zero-hallucination policy: Every fact in the output must be traceable to a specific Tavily search result or the fetched product page. This applies to:
- Comparable company names: must appear in Tavily search results, not AI training knowledge
- VC fund names: must appear verbatim in Tavily search results
- Check sizes, stage focus, portfolio companies: must come from search snippets, not AI knowledge
- Fund overviews and thesis summaries: extracted from search snippets only. If a detail is not in the search data, write "not found in search data" -- do not fill from training knowledge.
Common Mistakes
| The agent will want to... | Why that's wrong |
|---|---|
| Add a16z or Sequoia because they are famous | A famous VC without evidence is noise. Only include VCs that appear in Tavily search results for this specific product. Name-dropping wastes the founder's time. |
| Generate comparable companies from training knowledge | Comparables must come from Tavily search results (Step 6). AI knowledge of companies is not evidence -- a company suggested from memory may have wrong funding status or may not be a true comparable. |
| Continue when all 5 Track A searches return 0 results | Zero Track A results means the comparables were wrong or too obscure. Stop, re-run Step 6 with broader search queries, and retry. |
| Include a Track B VC without citing the article or post | Thesis without a source is indistinguishable from hallucination. The founder cannot verify it and the list loses all credibility. |
| Fill in fund overview from training knowledge | Fund overviews must come from Tavily snippet text only. If the snippets don't describe the fund, write "not found in search data". |
| Detect stage from website aesthetics | Stage must come from the specific CTA signals detected in Step 4. |
| Write generic outreach hooks | Every outreach hook must name this specific product's differentiator and a specific VC portfolio signal or thesis quote from the search data. |
| Skip the URL fetch when the user also provides a description | Always fetch the URL. The live page often reveals stage signals that the user's description omits. |
Step 1: Setup Check
echo "TAVILY_API_KEY: ${TAVILY_API_KEY:+set}"
echo "FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-not set, Tavily extract will be used as fallback}"
If TAVILY_API_KEY is missing: Stop. Tell the user: "TAVILY_API_KEY is required to research VC investments and theses. There is no fallback for this. Get it at app.tavily.com -- free tier: 1000 credits/month (about 125 full runs). Add it to your .env file."
If only FIRECRAWL_API_KEY is missing: Continue silently. Tavily extract will be used for the URL fetch.
Step 2: Gather Input
You need:
- Product URL (required, unless user pastes a product description directly)
- Optional: target stage hint (pre-seed, seed, series-a, series-b) -- if provided, use it and skip stage detection
- Optional: geography preference (US, Europe, global) -- defaults to US if not specified
If the user provides only a pasted description (no URL): Skip Steps 3-4. Go directly to Step 5 with the pasted text as product_content: write it to /tmp/vc-product-raw.md, and write {"signals": [], "dominant_stage": "unknown", "confidence": "low"} to /tmp/vc-stage-signals.json so the Step 5 scripts do not fail on missing files. Set stage_source to user_description.
If neither URL nor description is provided: Ask: "What is the URL of your product or startup? Or paste a short description: what it does, who it is for, and what stage you are at (pre-seed, seed, Series A)."
Derive product slug from URL for the output filename:
PRODUCT_SLUG=$(python3 -c "
from urllib.parse import urlparse
url = 'URL_HERE'
host = urlparse(url).netloc.replace('www.', '')
print(host.split('.')[0])
")
Step 3: Fetch Product Page
Primary: Firecrawl (if FIRECRAWL_API_KEY is set)
curl -s -X POST https://api.firecrawl.dev/v1/scrape \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "URL_HERE", "formats": ["markdown"], "onlyMainContent": true}' \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('data', {}).get('markdown', '') or d.get('markdown', '')
print(f'Fetched: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
Fallback: Tavily extract (if FIRECRAWL_API_KEY is not set)
curl -s -X POST https://api.tavily.com/extract \
-H "Content-Type: application/json" \
-d "{\"api_key\": \"$TAVILY_API_KEY\", \"urls\": [\"URL_HERE\"]}" \
| python3 -c "
import sys, json
d = json.load(sys.stdin)
content = d.get('results', [{}])[0].get('raw_content', '')
print(f'Fetched via Tavily extract: {len(content)} characters')
open('/tmp/vc-product-raw.md', 'w').write(content)
"
Step-level checkpoint:
python3 -c "
content = open('/tmp/vc-product-raw.md').read()
if len(content) < 200:
print('ERROR: Page returned fewer than 200 characters.')
else:
print(f'Content OK: {len(content)} characters')
"
If content < 200 characters: Stop fetching. Tell the user: "The product page returned no readable content. This usually means the site is JavaScript-rendered and requires a browser. Please paste your product description directly: what it does, who it is for, and what stage you are at."
Once the user pastes a description, write it to /tmp/vc-product-raw.md, run Step 4 on it so the stage-signal file exists, then proceed to Step 5 using the pasted description as product_content.
Step 4: Detect Stage Signals Locally (No API)
Parse the fetched markdown with regex before the analysis step.
python3 << 'PYEOF'
import re, json
content = open('/tmp/vc-product-raw.md').read().lower()
stage_signals = []
if re.search(r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified', content):
stage_signals.append({'signal': 'waitlist or beta CTA', 'stage_hint': 'pre-seed'})
if re.search(r'start\s+(your\s+)?free\s+trial|try\s+(it\s+)?for\s+free|request\s+a?\s+demo|book\s+a?\s+demo|schedule\s+a?\s+demo', content):
stage_signals.append({'signal': 'free trial or demo CTA', 'stage_hint': 'seed'})
if re.search(r'contact\s+sales|talk\s+to\s+(our\s+)?sales|see\s+pricing|view\s+pricing|plans\s+and\s+pricing', content):
stage_signals.append({'signal': 'pricing or sales CTA', 'stage_hint': 'series-a'})
if re.search(r'case\s+stud(y|ies)|customer\s+stor(y|ies)|trusted\s+by\s+[\d,]+|used\s+by\s+[\d,]+', content):
stage_signals.append({'signal': 'case studies or customer count', 'stage_hint': 'series-a'})
if re.search(r'enterprise\s+(plan|pricing|tier)|we.?re\s+hiring|join\s+our\s+team|open\s+positions', content):
stage_signals.append({'signal': 'enterprise tier or job openings', 'stage_hint': 'series-a-or-b'})
funding_match = re.search(
    r'raised\s+\$[\d,.]+\s*[mkb]?|series\s+[abc]\s+round|seed\s+round|(\$[\d,.]+\s*[mkb]?\s+(?:seed|series\s+[abc]))',
content
)
if funding_match:
stage_signals.append({'signal': f'funding text: {funding_match.group(0).strip()}', 'stage_hint': 'announced'})
if not stage_signals:
dominant = 'unknown'
elif any(s['stage_hint'] == 'announced' for s in stage_signals):
dominant = 'announced'
elif any(s['stage_hint'] in ('series-a-or-b', 'series-a') for s in stage_signals):
    dominant = 'series-a'
elif any(s['stage_hint'] == 'seed' for s in stage_signals):
dominant = 'seed'
else:
dominant = 'pre-seed'
confidence = 'high' if len(stage_signals) >= 2 else ('medium' if len(stage_signals) == 1 else 'low')
result = {'signals': stage_signals, 'dominant_stage': dominant, 'confidence': confidence}
json.dump(result, open('/tmp/vc-stage-signals.json', 'w'), indent=2)
print(f'Stage: {dominant} ({confidence} confidence) from {len(stage_signals)} signal(s)')
for s in stage_signals:
print(f' - {s["signal"]} -> {s["stage_hint"]}')
PYEOF
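Each pattern above can be spot-checked in isolation before a run; for example (the sample snippet is hypothetical):

```python
import re

# The waitlist/beta pattern from the detector above, applied to a
# hypothetical landing-page snippet (lowercased, as the detector does).
pattern = r'join\s+(the\s+)?waitlist|sign\s+up\s+for\s+beta|early\s+access|request\s+(an?\s+)?invite|get\s+notified'
sample = 'Join the waitlist for early access.'.lower()
match = re.search(pattern, sample)
print(match.group(0))  # join the waitlist
```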
Step 5: Product Analysis (Taxonomy, Stage, ICP)
Print the product content and stage signals:
python3 -c "
import json
content = open('/tmp/vc-product-raw.md').read()[:6000]
signals = json.load(open('/tmp/vc-stage-signals.json'))
print('=== PRODUCT PAGE (first 6000 chars) ===')
print(content)
print()
print('=== DETECTED STAGE SIGNALS ===')
print(json.dumps(signals, indent=2))
"
AI instructions: Analyze the product page content above. Generate the taxonomy, ICP, and stage classification only -- do NOT generate comparable companies yet (that is done via live search in Step 6).
Write to /tmp/vc-product-analysis.json:
- product_name: from the page
- one_line_description: what it does, for whom, core value prop. Under 20 words. No marketing language.
- industry_taxonomy: l1 (top level: fintech / healthtech / developer tools / consumer / etc.), l2 (sector: sales technology / logistics software / etc.), l3 (specific niche: outbound prospecting / last-mile routing / etc.). Vague labels like "technology" or "software" alone are not acceptable.
- icp: buyer_persona (job title), company_type, company_size
- detected_stage: pre-seed / seed / series-a / series-b / unknown
- stage_confidence: high / medium / low
- stage_evidence: one sentence citing exactly which CTA or text on the page drove this. Write "no clear signals found" if unknown.
- geography_bias: US / Europe / global / unclear
- comparable_companies: leave as an empty array [] -- will be filled in Step 6
python3 << 'PYEOF'
import json
analysis = {
# FILL from your analysis above
"comparable_companies": []
}
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Product analysis written.')
PYEOF
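For reference, a filled analysis for a hypothetical product might look like the following. Every value here is illustrative only; in a real run each field must come from the fetched page and the detected signals, never from this example:

```python
import json

# Illustrative only -- "RouteFox" and all field values are invented
# to show the expected shape of /tmp/vc-product-analysis.json.
analysis = {
    "product_name": "RouteFox",
    "one_line_description": "Route optimization for regional courier fleets",
    "industry_taxonomy": {
        "l1": "logistics",
        "l2": "logistics software",
        "l3": "last-mile routing"
    },
    "icp": {
        "buyer_persona": "operations manager",
        "company_type": "regional courier company",
        "company_size": "10-200 employees"
    },
    "detected_stage": "seed",
    "stage_confidence": "medium",
    "stage_evidence": "free trial CTA on the pricing page",
    "geography_bias": "US",
    "comparable_companies": []
}
print(json.dumps(analysis, indent=2)[:60])
```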
Verify:
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Product:', a['product_name'])
print('Industry:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'], '(' + a['stage_confidence'] + ' confidence)')
"
Step 5b: Curated Pre-Match Against Verified Fund Dataset
Run the product taxonomy against a curated dataset of 25 verified VC funds (sourced from fund websites). Produces zero-hallucination fund matches and seed comparables for Track A -- no Tavily credits consumed.
Print product analysis for tag mapping:
python3 -c "
import json
a = json.load(open('/tmp/vc-product-analysis.json'))
print('Taxonomy:', a['industry_taxonomy']['l1'], '>', a['industry_taxonomy']['l2'], '>', a['industry_taxonomy']['l3'])
print('Stage:', a['detected_stage'])
print('Geography:', a['geography_bias'])
"
AI instructions: Map the product taxonomy to the standard tags used in the fund dataset. Available tags:
DevTools, Infrastructure, Open Source, B2B SaaS, AI, Data, FinTech, HealthTech, Enterprise, Consumer, Marketplaces, E-commerce, Crypto, DeepTech, Cybersecurity, Generalist
Pick 2-4 tags that describe this product. Map detected_stage to: Pre-seed, Seed, Series A, or Growth. Map geography_bias to: US, Europe, India, or Global.
Write product context:
python3 << 'PYEOF'
import json
# FILL based on taxonomy analysis above
context = {
"extracted_tags": ["TagA", "TagB"], # 2-4 tags from the list above
"stage_hint": "Seed", # Pre-seed / Seed / Series A / Growth
"geography_hint": "US" # US / Europe / India / Global
}
json.dump(context, open('/tmp/vc-product-context.json', 'w'), indent=2)
print('Product context:', context)
PYEOF
Run scoring against the embedded curated dataset:
python3 << 'PYEOF'
import json
context = json.load(open('/tmp/vc-product-context.json'))
VC_FUNDS = [
{"fund_name":"Y Combinator","thesis":"We provide seed funding for startups. We invest in deeply technical teams building massive companies across all domains.","check_size":"$500k","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","DevTools","AI"],"geography_focus":["Global"],"notable_portfolio":["Stripe","Airbnb","GitLab"],"website":"https://www.ycombinator.com"},
{"fund_name":"boldstart ventures","thesis":"Day one partner for developer first, crypto, and SaaS founders. We love deeply technical founders solving hard infrastructure problems.","check_size":"$1M - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["DevTools","Infrastructure","Crypto"],"geography_focus":["Global","US"],"notable_portfolio":["Snyk","Blockdaemon","Superhuman"],"website":"https://boldstart.vc"},
{"fund_name":"Heavybit","thesis":"The leading investor in developer-first startups. We help technical founders launch, gain traction, and build enterprise-ready companies.","check_size":"$1M - $5M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","Open Source"],"geography_focus":["Global","US"],"notable_portfolio":["PagerDuty","Sanity","Netlify"],"website":"https://www.heavybit.com"},
{"fund_name":"Amplify Partners","thesis":"We invest in technical founders building the next generation of IT infrastructure, developer tools, and data platforms.","check_size":"$2M - $8M","stage_focus":["Seed","Series A"],"industry_tags":["DevTools","Infrastructure","AI","Data"],"geography_focus":["US"],"notable_portfolio":["Datadog","OCTO","dbt Labs"],"website":"https://www.amplifypartners.com"},
{"fund_name":"OSS Capital","thesis":"We exclusively back early-stage founders building Commercial Open Source Software (COSS) companies.","check_size":"$500k - $2M","stage_focus":["Pre-seed","Seed","Series A"],"industry_tags":["Open Source","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Cal.com","Appsmith","Hoppscotch"],"website":"https://oss.capital"},
{"fund_name":"Sequoia Capital","thesis":"We help the daring build legendary companies, from idea to IPO and beyond. Sequoia is an early-stage and growth-stage investor.","check_size":"$1M - $10M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","AI"],"geography_focus":["Global"],"notable_portfolio":["Apple","Google","WhatsApp"],"website":"https://www.sequoiacap.com"},
{"fund_name":"Andreessen Horowitz (a16z)","thesis":"We invest in software eating the world. We back bold entrepreneurs building the future through technology.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Crypto","Enterprise","Consumer","AI"],"geography_focus":["Global","US"],"notable_portfolio":["Facebook","Coinbase","Figma"],"website":"https://a16z.com"},
{"fund_name":"Point Nine Capital","thesis":"We are a seed-stage venture capital firm focused on B2B SaaS and B2B marketplaces globally.","check_size":"$1M - $3M","stage_focus":["Seed"],"industry_tags":["B2B SaaS","Marketplaces"],"geography_focus":["Europe","Global"],"notable_portfolio":["Zendesk","Typeform","Docplanner"],"website":"https://www.pointnine.com"},
{"fund_name":"Cherry Ventures","thesis":"We champion founders in Europe from their earliest days. We are generalist seed investors.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["Europe"],"notable_portfolio":["FlixBus","Auto1 Group","Forto"],"website":"https://www.cherry.vc"},
{"fund_name":"First Round Capital","thesis":"We are the seed-stage firm that builds the most supportive community for founders.","check_size":"$1M - $4M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer"],"geography_focus":["US"],"notable_portfolio":["Uber","Notion","Roblox"],"website":"https://firstround.com"},
{"fund_name":"Bessemer Venture Partners","thesis":"BVP helps entrepreneurs lay strong foundations to build and forge long-standing companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["LinkedIn","Twilio","Shopify"],"website":"https://www.bvp.com"},
{"fund_name":"Index Ventures","thesis":"We back the best and most ambitious entrepreneurs across all stages to build category-defining businesses.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","FinTech","Consumer","B2B SaaS"],"geography_focus":["Europe","US","Global"],"notable_portfolio":["Dropbox","Slack","Figma"],"website":"https://www.indexventures.com"},
{"fund_name":"Lightspeed Venture Partners","thesis":"We invest globally in enterprise, consumer, and health founders who are shaping the future.","check_size":"$1M - $25M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Enterprise","Consumer","FinTech"],"geography_focus":["Global"],"notable_portfolio":["Snap","Rippling","MuleSoft"],"website":"https://lsvp.com"},
{"fund_name":"Accel","thesis":"We partner with exceptional founders from inception through all phases of private company growth.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","DevTools"],"geography_focus":["Global"],"notable_portfolio":["Facebook","Atlassian","Spotify"],"website":"https://www.accel.com"},
{"fund_name":"Bain Capital Ventures","thesis":"From seed to growth, we back founders building legendary infrastructure, fintech, application, and commerce companies.","check_size":"$1M - $50M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Infrastructure","FinTech","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["DocuSign","SendGrid","Redis"],"website":"https://www.baincapitalventures.com"},
{"fund_name":"Greylock Partners","thesis":"We partner with early-stage founders to build enterprise and consumer software companies that define new categories.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Enterprise","Consumer","Cybersecurity","AI"],"geography_focus":["US"],"notable_portfolio":["Workday","Palo Alto Networks","LinkedIn"],"website":"https://greylock.com"},
{"fund_name":"Unusual Ventures","thesis":"We provide a breakthrough level of support for early-stage founders building enterprise tech.","check_size":"$1M - $5M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Enterprise","DevTools","B2B SaaS"],"geography_focus":["US"],"notable_portfolio":["Arctic Wolf","Harness","Vivun"],"website":"https://www.unusual.vc"},
{"fund_name":"Crane Venture Partners","thesis":"We back deep tech and enterprise founders in Europe solving hard problems with data and code.","check_size":"$1M - $4M","stage_focus":["Seed"],"industry_tags":["Enterprise","DeepTech","Data","AI"],"geography_focus":["Europe"],"notable_portfolio":["Onfido","Tessian","Forto"],"website":"https://crane.vc"},
{"fund_name":"Founder Collective","thesis":"We are a seed-stage venture capital fund, built by founders, for founders. We back weird, wonderful, and wild startups.","check_size":"$500k - $2M","stage_focus":["Seed"],"industry_tags":["Generalist","Consumer","B2B SaaS"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Airtable","BuzzFeed"],"website":"https://www.foundercollective.com"},
{"fund_name":"Benchmark","thesis":"We are a partnership of equal partners. We back mission-driven founders at the earliest stages and walk beside them for the long haul.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Marketplaces","Enterprise","Consumer"],"geography_focus":["US","Global"],"notable_portfolio":["Uber","Twitter","eBay","Snapchat"],"website":"https://www.benchmark.com"},
{"fund_name":"Accel India","thesis":"We partner with exceptional founders from inception through all phases of private company growth in the Indian ecosystem.","check_size":"$1M - $15M","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","B2B SaaS","Consumer","FinTech","E-commerce"],"geography_focus":["India"],"notable_portfolio":["Flipkart","Swiggy","Freshworks"],"website":"https://www.accel.com/india"},
{"fund_name":"Blume Ventures","thesis":"We are a seed and pre-seed venture fund that backs startups with both funding and active mentoring.","check_size":"$500k - $3M","stage_focus":["Pre-seed","Seed"],"industry_tags":["Generalist","B2B SaaS","Consumer","DeepTech","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Unacademy","Purplle","GreyOrange"],"website":"https://blume.vc"},
{"fund_name":"Elevation Capital","thesis":"We partner with visionary founders in India across early stages to help them build category-defining businesses.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","HealthTech"],"geography_focus":["India"],"notable_portfolio":["Paytm","Swiggy","Meesho"],"website":"https://elevationcapital.com"},
{"fund_name":"Peak XV Partners","thesis":"Formerly Sequoia India & SEA, we partner with founders across early, growth, and public stages to build enduring companies.","check_size":"$1M - $20M+","stage_focus":["Seed","Series A","Growth"],"industry_tags":["Generalist","Consumer","FinTech","B2B SaaS","DevTools","AI"],"geography_focus":["India","South Asia"],"notable_portfolio":["Zomato","Pine Labs","Cred"],"website":"https://www.peakxv.com"},
{"fund_name":"Nexus Venture Partners","thesis":"We are a US-India venture capital firm backing extraordinary founders building product-first companies.","check_size":"$1M - $10M","stage_focus":["Seed","Series A"],"industry_tags":["B2B SaaS","Enterprise","DevTools","Consumer"],"geography_focus":["India","US"],"notable_portfolio":["Postman","Hasura","Zepto"],"website":"https://nexusvp.com"}
]
STAGE_ORDER = {"Pre-seed": 0, "Seed": 1, "Series A": 2, "Growth": 3}
def score_fund(fund, ctx):
score = 0
fund_tags = fund.get("industry_tags", [])
extracted_tags = ctx.get("extracted_tags", ["Generalist"])
tag_points = 0
matched_tags = []
for tag in extracted_tags:
if tag in fund_tags:
tag_points += 5 if tag == "Generalist" else 20
matched_tags.append(tag)
tag_points = min(tag_points, 60)
score += tag_points
stage_hint = ctx.get("stage_hint")
fund_stages = fund.get("stage_focus", [])
if not stage_hint:
score += 10
elif fund_stages:
if stage_hint in fund_stages:
score += 20
elif stage_hint in STAGE_ORDER:
hint_idx = STAGE_ORDER[stage_hint]
if any(f in STAGE_ORDER and abs(STAGE_ORDER[f] - hint_idx) == 1 for f in fund_stages):
score += 10
geo_hint = ctx.get("geography_hint")
fund_geo = fund.get("geography_focus", ["Global"])
if not geo_hint or geo_hint == "Global":
score += 10
elif fund_geo == ["India"] and geo_hint == "US":
pass
elif geo_hint in fund_geo:
score += 20
elif "Global" in fund_geo:
score += 15
if geo_hint == "US" and "India" in fund_geo and "US" not in fund_geo and "Global" not in fund_geo:
score = max(0, score - 30)
if fund_tags and extracted_tags and fund_tags[0] not in extracted_tags and tag_points <= 20:
score = max(0, score - 15)
return score, matched_tags
scored = []
for fund in VC_FUNDS:
score, matched_tags = score_fund(fund, context)
tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
scored.append({
"fund_name": fund["fund_name"],
"thesis": fund["thesis"],
"check_size": fund["check_size"],
"stage_focus": fund["stage_focus"],
"industry_tags": fund["industry_tags"],
"geography_focus": fund["geography_focus"],
"notable_portfolio": fund["notable_portfolio"],
"website": fund["website"],
"source": "verified (fund website)",
"score": score,
"confidence": tier,
"matched_tags": matched_tags
})
scored.sort(key=lambda x: (-x["score"], x["fund_name"]))
relevant = [m for m in scored if m["confidence"] in ("High", "Medium")]
curated_comparables = []
for m in relevant:
for company in m.get("notable_portfolio", []):
if company not in curated_comparables:
curated_comparables.append(company)
output = {
"high_medium_matches": relevant,
"curated_comparables": curated_comparables[:6]
}
json.dump(output, open('/tmp/vc-curated-matches.json', 'w'), indent=2)
print(f'Curated matches: {len(relevant)} High/Medium confidence funds')
for m in relevant[:8]:
print(f' {m["confidence"]:6} ({m["score"]:3}) {m["fund_name"]}')
print(f'Seed comparables from portfolio: {curated_comparables[:6]}')
PYEOF
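As a sanity check on the scoring arithmetic, here is a minimal replica of score_fund's happy path applied to one hypothetical product context against the boldstart ventures entry from the dataset above (tags DevTools/Infrastructure/Crypto, stages Pre-seed/Seed, geography Global/US); the penalty branches are omitted because none fire in this case:

```python
# Hypothetical context: a DevTools/Infrastructure product at Seed in the US.
fund = {"industry_tags": ["DevTools", "Infrastructure", "Crypto"],
        "stage_focus": ["Pre-seed", "Seed"],
        "geography_focus": ["Global", "US"]}
ctx = {"extracted_tags": ["DevTools", "Infrastructure"],
       "stage_hint": "Seed", "geography_hint": "US"}

# Tag points: 20 per specific tag match (5 for Generalist), capped at 60.
tag_points = min(sum(5 if t == "Generalist" else 20
                     for t in ctx["extracted_tags"]
                     if t in fund["industry_tags"]), 60)          # 40
stage_points = 20 if ctx["stage_hint"] in fund["stage_focus"] else 0   # 20
geo_points = 20 if ctx["geography_hint"] in fund["geography_focus"] else 0  # 20

score = tag_points + stage_points + geo_points
tier = "High" if score >= 70 else ("Medium" if score >= 40 else "Low")
print(score, tier)  # 80 High
```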
Step 6: Discover Comparable Companies via Tavily
Load curated portfolio companies from Step 5b as seed comparables:
python3 -c "
import json
matches = json.load(open('/tmp/vc-curated-matches.json'))
curated = matches.get('curated_comparables', [])
print(f'Curated portfolio comparables ({len(curated)}): {curated}')
need = max(0, 5 - len(curated))
print(f'Tavily will supplement with up to {need} more')
"
Do not use AI training knowledge to generate comparable companies. Curated portfolio companies (above) are already zero-hallucination comparables from verified fund data. Tavily supplements with L3-niche-specific companies.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
queries = [
f'"{l3}" startup raised funding venture capital seed series',
f'"{l2}" companies venture backed funded startup'
]
all_results = []
for query in queries:
payload = json.dumps({
"api_key": tavily_key,
"query": query,
"search_depth": "advanced",
"max_results": 8,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_results.append({
'query': query,
'answer': result.get('answer', ''),
'results': [
{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:500]}
for r in result.get('results', [])
]
})
print(f'Comparable search: {len(result.get("results", []))} results for "{query[:60]}"')
except Exception as e:
print(f'Comparable search FAILED: {e}')
all_results.append({'query': query, 'answer': '', 'results': [], 'error': str(e)})
json.dump(all_results, open('/tmp/vc-comparable-search.json', 'w'), indent=2)
PYEOF
Print results for AI selection:
python3 -c "
import json
results = json.load(open('/tmp/vc-comparable-search.json'))
for r in results:
print(f'Query: {r[\"query\"]}')
print(f'Answer: {r.get(\"answer\",\"\")[:400]}')
for item in r.get('results', []):
print(f' - {item[\"title\"]} | {item[\"url\"]}')
print(f' {item[\"content\"][:200]}')
print()
"
AI instructions: Combine the curated portfolio companies from /tmp/vc-curated-matches.json with the Tavily search results above. Pick exactly 5 comparable companies. Prioritize curated portfolio companies (already verified -- they are real portfolio companies of matched VC funds). Supplement with Tavily-discovered companies to reach 5 if needed.
For each comparable write:
- name: company name
- similarity_reason: one sentence explaining the fit (for curated: reference the fund that backed them; for Tavily: cite the snippet)
- source_url: portfolio fund website for curated companies, Tavily result URL for discovered ones
- estimated_stage: from curated data or snippet text -- write "not in search data" if unknown
- source_type: "curated_portfolio" or "tavily_discovered"
Update /tmp/vc-product-analysis.json with the comparable_companies array:
python3 << 'PYEOF'
import json
analysis = json.load(open('/tmp/vc-product-analysis.json'))
analysis['comparable_companies'] = [
# FILL 5 companies -- curated_portfolio first, then tavily_discovered
# Each: {"name": str, "similarity_reason": str, "source_url": str, "estimated_stage": str, "source_type": str}
]
json.dump(analysis, open('/tmp/vc-product-analysis.json', 'w'), indent=2)
print('Comparables written:', ', '.join(c['name'] for c in analysis['comparable_companies']))
PYEOF
If fewer than 3 comparable companies appear in the search results: Broaden the queries. Run a third search: "[l1] startup" funding round venture capital. If still thin, proceed with what is available and flag in data_quality_flags.
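The broadened third search can reuse the same Tavily payload shape as the Step 6 searches. A minimal sketch (the helper name build_broadened_payload is illustrative, not part of any API; the actual POST to api.tavily.com/search is the same as above):

```python
import json

def build_broadened_payload(l1: str, api_key: str) -> bytes:
    # Same payload shape as the Step 6 searches, with the broader
    # L1-level query suggested above.
    return json.dumps({
        "api_key": api_key,
        "query": f'"{l1} startup" funding round venture capital',
        "search_depth": "advanced",
        "max_results": 8,
        "include_answer": True
    }).encode()

payload = build_broadened_payload("fintech", "TAVILY_KEY_HERE")
print(json.loads(payload)["query"])  # "fintech startup" funding round venture capital
```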
Step 7: Track A -- Who Invested in Comparable Companies
Run 5 Tavily searches, one per comparable.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
comparables = analysis['comparable_companies']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
all_track_a = []
for comp in comparables:
company = comp['name']
query = f'"{company}" investors funding venture capital backed seed series'
payload = json.dumps({
"api_key": tavily_key,
"query": query,
"search_depth": "advanced",
"max_results": 5,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_track_a.append({
'comparable_company': company,
'similarity_reason': comp['similarity_reason'],
'query': query,
'answer': result.get('answer', ''),
'results': result.get('results', [])
})
print(f'Track A - {company}: {len(result.get("results", []))} results')
except Exception as e:
print(f'Track A - {company}: FAILED ({e})')
all_track_a.append({
'comparable_company': company,
'similarity_reason': comp['similarity_reason'],
'query': query,
'answer': '',
'results': [],
'error': str(e)
})
json.dump(all_track_a, open('/tmp/vc-tracka-results.json', 'w'), indent=2)
print(f'Track A complete. Comparables with results: {sum(1 for r in all_track_a if r.get("results"))}')
PYEOF
If all 5 Track A searches return 0 results: Re-run Step 6 with broader queries. Retry with well-covered companies (those with significant press coverage). If still 0: proceed to Track B only and flag in data_quality_flags.
Step 8: Track B -- VCs With Investment Theses About This Space
Run 3 Tavily searches using L2 and L3 taxonomy from Step 5.
python3 << 'PYEOF'
import json, os, urllib.request
analysis = json.load(open('/tmp/vc-product-analysis.json'))
l2 = analysis['industry_taxonomy']['l2']
l3 = analysis['industry_taxonomy']['l3']
stage = analysis['detected_stage']
tavily_key = os.environ.get('TAVILY_API_KEY', '')
queries = [
{'name': 'thesis_l3', 'query': f'venture capital investment thesis "{l3}" investing 2023 OR 2024 OR 2025'},
{'name': 'thesis_l2', 'query': f'VC fund "{l2}" investment thesis portfolio companies'},
{'name': 'stage_space', 'query': f'{stage} investors "{l3}" startup venture capital fund'}
]
all_track_b = []
for q in queries:
payload = json.dumps({
"api_key": tavily_key,
"query": q['query'],
"search_depth": "advanced",
"max_results": 7,
"include_answer": True
}).encode()
req = urllib.request.Request(
'https://api.tavily.com/search',
data=payload,
headers={'Content-Type': 'application/json'},
method='POST'
)
try:
with urllib.request.urlopen(req, timeout=30) as resp:
result = json.loads(resp.read())
all_track_b.append({
'query_name': q['name'],
'query': q['query'],
'answer': result.get('answer', ''),
'results': result.get('results', [])
})
print(f"Track B - {q['name']}: {len(result.get('results', []))} results")
except Exception as e:
print(f"Track B - {q['name']}: FAILED ({e})")
all_track_b.append({'query_name': q['name'], 'query': q['query'], 'answer': '', 'results': [], 'error': str(e)})
json.dump(all_track_b, open('/tmp/vc-trackb-results.json', 'w'), indent=2)
PYEOF
If all 3 Track B searches return 0 results: Proceed with Track A results only. Note in data_quality_flags: "No thesis-led investors found via public search."
Step 9: Synthesize -- Rank and Score All VCs
Print the research data:
python3 -c "
import json

analysis = json.load(open('/tmp/vc-product-analysis.json'))
track_a = json.load(open('/tmp/vc-tracka-results.json'))
track_b = json.load(open('/tmp/vc-trackb-results.json'))
curated = json.load(open('/tmp/vc-curated-matches.json'))

track_a_summary = []
for item in track_a:
    snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
                for r in item.get('results', [])[:3]]
    track_a_summary.append({
        'comparable_company': item['comparable_company'],
        'similarity_reason': item['similarity_reason'],
        'answer': item.get('answer', '')[:500],
        'top_results': snippets
    })

track_b_summary = []
for item in track_b:
    snippets = [{'title': r.get('title',''), 'url': r.get('url',''), 'content': r.get('content','')[:400]}
                for r in item.get('results', [])[:4]]
    track_b_summary.append({
        'query_name': item['query_name'],
        'answer': item.get('answer', '')[:500],
        'top_results': snippets
    })

curated_summary = []
for m in curated.get('high_medium_matches', []):
    curated_summary.append({
        'fund_name': m['fund_name'],
        'confidence': m['confidence'],
        'score': m['score'],
        'matched_tags': m['matched_tags'],
        'thesis': m['thesis'],
        'check_size': m['check_size'],
        'stage_focus': m['stage_focus'],
        'notable_portfolio': m['notable_portfolio'],
        'website': m['website'],
        'source': 'verified (fund website)'
    })

print(json.dumps({
    'product': {
        'name': analysis['product_name'],
        'description': analysis['one_line_description'],
        'industry': analysis['industry_taxonomy'],
        'icp': analysis['icp'],
        'stage': analysis['detected_stage'],
        'stage_confidence': analysis['stage_confidence'],
        'geography': analysis['geography_bias']
    },
    'curated_matches': curated_summary,
    'track_a_research': track_a_summary,
    'track_b_research': track_b_summary
}, indent=2))
"
AI instructions -- zero-hallucination rules:
Every field in the output must be traceable to the printed data above. Rules:
- curated_vcs: Use the `curated_matches` data directly. These are pre-verified -- no Tavily evidence required. `fund_overview` comes from the `thesis` field in the curated data. `check_size` and `stage_focus` come from the curated data fields. Do NOT fill from training knowledge even for these funds.
- VC names (Track A / B): Only include a fund if its name appears verbatim in the snippet text or title. No exceptions.
- evidence_company (Track A): The comparable company they backed -- must be stated in the snippet text, not inferred.
- thesis_source_title (Track B): The exact title of the article or post as it appears in the search results.
- fund_overview (Track A / B): Extract from snippet text only. Max 2 sentences. If the snippets do not describe the fund, write "not found in search data".
- thesis_summary: Close paraphrase of the snippet text. Do not add context from training knowledge.
- check_size (Track A / B): From snippet data only. Write "not in search data" if not mentioned.
- portfolio_in_space: Only companies that appear in the search snippets. Write "not found in search data" if none.
- stage_fit_score 1-10: Penalize 3 points if the VC's stated stage does not match the product's detected stage.
- space_fit_score 1-10: 9-10 only if the VC backed 2+ companies in the L3 niche per the snippets or curated data.
- approach_method: one of -- cold email / warm intro required / AngelList / application form / Twitter/X DM. Infer from snippets or fund website.
- outreach_hook: Must name a specific portfolio signal or thesis quote. Generic hooks like "highlight your traction" are not acceptable.
- No em dashes. No marketing language.
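The verbatim-name rule above can be enforced mechanically. `fund_in_snippets` is a hypothetical helper, not part of the pipeline scripts; it keeps a fund only if its name appears in a snippet title or content string:

```python
def fund_in_snippets(fund_name, results):
    # A fund passes only if its exact name occurs in a result's title or content.
    needle = fund_name.lower()
    return any(
        needle in (r.get('title', '') + ' ' + r.get('content', '')).lower()
        for r in results
    )

snippets = [{'title': 'Acme Ventures leads seed round in DevTool Co',
             'content': 'The round was led by Acme Ventures with participation from angels.'}]
print(fund_in_snippets('Acme Ventures', snippets))  # True
print(fund_in_snippets('Sequoia', snippets))        # False
```

Note that this matches case-insensitively; drop the `.lower()` calls for a strictly verbatim check.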
Write to /tmp/vc-final-list.json:
- product_summary: name, one_line_description, industry_l1, industry_l2, industry_l3, detected_stage, comparable_companies_used (names only)
- curated_vcs: fund_name, confidence ("High"/"Medium"), matched_tags, fund_overview (from thesis field), check_size, stage_focus, website, source ("verified (fund website)"), stage_fit_score, space_fit_score
- track_a_vcs: fund_name, evidence_company (REQUIRED), evidence_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- track_b_vcs: fund_name, thesis_source_title (REQUIRED), thesis_source_url, stage_focus, check_size, fund_overview, thesis_summary, stage_fit_score, space_fit_score, approach_method
- top_5_deep_dives: fund_name, track ("Curated"/"A"/"B"), fund_overview, why_fit, portfolio_in_space, how_to_approach (min 30 chars), outreach_hook
- outreach_hooks: 3 objects -- hook_type, hook_text (2-3 sentences), best_for
- data_quality_flags: gaps, missing fields, low-confidence areas
python3 << 'PYEOF'
import json
result = {
# FILL from synthesis above
# Must include: product_summary, curated_vcs, track_a_vcs, track_b_vcs, top_5_deep_dives, outreach_hooks, data_quality_flags
}
json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'Synthesis written. Curated: {len(result.get("curated_vcs", []))} VCs. Track A: {len(result.get("track_a_vcs", []))} VCs. Track B: {len(result.get("track_b_vcs", []))} VCs.')
PYEOF
Step 10: Self-QA
python3 << 'PYEOF'
import json

result = json.load(open('/tmp/vc-final-list.json'))
failures = []

# Remove Track A VCs missing evidence_company
original_a = len(result.get('track_a_vcs', []))
result['track_a_vcs'] = [v for v in result.get('track_a_vcs', []) if v.get('evidence_company')]
removed_a = original_a - len(result['track_a_vcs'])
if removed_a > 0:
    failures.append(f'Removed {removed_a} Track A VC(s) missing evidence_company')

# Remove Track B VCs missing thesis_source_title
original_b = len(result.get('track_b_vcs', []))
result['track_b_vcs'] = [v for v in result.get('track_b_vcs', []) if v.get('thesis_source_title')]
removed_b = original_b - len(result['track_b_vcs'])
if removed_b > 0:
    failures.append(f'Removed {removed_b} Track B VC(s) missing thesis_source_title')

# Remove deep dives for VCs that were stripped from all tracks
valid_funds = (
    {v['fund_name'] for v in result.get('curated_vcs', [])} |
    {v['fund_name'] for v in result.get('track_a_vcs', [])} |
    {v['fund_name'] for v in result.get('track_b_vcs', [])}
)
original_dives = len(result.get('top_5_deep_dives', []))
result['top_5_deep_dives'] = [d for d in result.get('top_5_deep_dives', []) if d.get('fund_name') in valid_funds]
removed_dives = original_dives - len(result['top_5_deep_dives'])
if removed_dives > 0:
    failures.append(f'Removed {removed_dives} deep dive(s) for funds stripped during QA')

# Check top 5 deep dives
dives = result.get('top_5_deep_dives', [])
if len(dives) < 5:
    failures.append(f'Only {len(dives)} deep dives (expected 5) -- insufficient search data')
for dd in dives:
    if not dd.get('how_to_approach') or len(dd.get('how_to_approach', '')) < 30:
        dd['how_to_approach'] = 'Approach method not determinable from search data. Check the fund website directly for application instructions.'
        failures.append(f"Fixed: '{dd.get('fund_name')}' had missing how_to_approach")
    if not dd.get('fund_overview'):
        dd['fund_overview'] = 'not found in search data'

# Check outreach hooks count
if len(result.get('outreach_hooks', [])) != 3:
    failures.append(f"Expected 3 outreach hooks, got {len(result.get('outreach_hooks', []))}")

# Replace any em dashes with hyphens
full_text = json.dumps(result)
if '—' in full_text:
    result = json.loads(full_text.replace('—', '-'))
    failures.append('Fixed: em dash characters replaced with hyphens')

# Check for forbidden marketing words
forbidden = ['powerful', 'robust', 'seamless', 'innovative', 'game-changing', 'streamline', 'leverage', 'transform']
full_text_lower = json.dumps(result).lower()
for word in forbidden:
    if word in full_text_lower:
        failures.append(f"Warning: forbidden word '{word}' found in output -- review before presenting")

# Flag any "not found in search data" entries so the user knows coverage is incomplete
not_found_count = json.dumps(result).count('not found in search data')
if not_found_count > 0:
    failures.append(f'INFO: {not_found_count} field(s) marked "not found in search data" -- verify directly before outreach')

if 'data_quality_flags' not in result:
    result['data_quality_flags'] = []
result['data_quality_flags'].extend(failures)

json.dump(result, open('/tmp/vc-final-list.json', 'w'), indent=2)
print(f'QA complete. Issues addressed: {len(failures)}')
for f in failures:
    print(f'  - {f}')
if not failures:
    print('All QA checks passed.')
PYEOF
Step 11: Save and Present Output
DATE=$(date +%Y-%m-%d)
OUTPUT_FILE="docs/vc-intel/${PRODUCT_SLUG}-${DATE}.md"
mkdir -p docs/vc-intel
Present the final output:
## VC Finder: [product_name]
Date: [today] | Stage: [detected_stage] ([stage_confidence] confidence) | Geography: [geography_bias]
---
### Product Analysis
What it does: [one_line_description]
Industry: [l1] > [l2] > [l3]
Buyer: [buyer_persona] at [company_type], [company_size]
Comparable companies used: [comma-separated list, noting source_type for each]
---
### Curated Matches (Verified)
*Funds matched from a verified dataset of 25 VC funds sourced from fund websites. Zero hallucination -- details come directly from the dataset.*
| Fund | Confidence | Stage Focus | Check Size | Matched Tags |
|---|---|---|---|---|
[one row per curated VC, sorted by confidence then score]
---
### Track A: VCs Who Backed Similar Companies
*These investors have already written a check in this space. Evidence from live Tavily search.*
| Fund | Backed Comparable | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track A VC, sorted by space_fit_score descending]
---
### Track B: Thesis-Led Investors
*These investors are actively publishing about this space.*
| Fund | Thesis Source | Stage Focus | Check Size | Fit Score | Approach |
|---|---|---|---|---|---|
[one row per Track B VC, sorted by space_fit_score descending]
---
### Top 5 Deep Dives
#### [N]. [Fund Name] (Track [Curated/A/B])
Overview: [fund_overview -- from dataset or search data only]
Why it fits: [why_fit]
Portfolio in this space: [from dataset or search data, or "not found in search data"]
How to approach: [how_to_approach]
Outreach hook: "[outreach_hook]"
[repeat for all available deep dives]
---
### 3 Outreach Hooks for This Product Type
**1. [hook_type]**
[hook_text]
Best for: [best_for]
[repeat for all 3]
---
Data quality notes: [data_quality_flags, or "None"]
Saved to: docs/vc-intel/[PRODUCT_SLUG]-[DATE].md
Clean up temp files:
rm -f /tmp/vc-product-raw.md /tmp/vc-stage-signals.json /tmp/vc-product-analysis.json \
/tmp/vc-product-context.json /tmp/vc-curated-matches.json /tmp/vc-comparable-search.json \
/tmp/vc-tracka-results.json /tmp/vc-trackb-results.json /tmp/vc-final-list.json