Candidate Evaluation Decision Pack
Role: Founding Engineer (Full-Stack: React + Python)
Company: [AI Writing Assistant Startup] -- seed stage, 4 people, $2M raised
Date: 2026-03-17
Decision Owner: Founding team (CEO + CTO, if applicable)
Decision Deadline: Within 1 week of completing reference checks
Signals Included: 3 interview rounds (technical deep-dive, product thinking, founder-fit) + 4-hour paid trial project
Signals Remaining: 2 back-channel reference checks (scripts provided below)
Part 1: Evaluation Brief (Role Success + Bar)
Role: Founding Engineer (Full-Stack)
Level: Senior IC / founding team member (equity-weighted compensation expected)
Team / stage context: Seed-stage AI writing assistant startup. 4-person team, $2M raised. Runway is finite; every week of shipping matters. The founding engineer will be the primary builder alongside the technical co-founder (if any) or the sole technical leader. Extreme ownership, speed, and product instinct are table stakes. There is no established process, no code review pipeline, no QA team -- this person IS the engineering culture.
Success in 6 months:
- Shipped the core product to early users (functional AI writing assistant with React frontend + Python/LLM backend) and iterated based on real feedback
- Established the initial technical architecture that can scale from 0 to 1,000 users without a rewrite
- Operated as a true co-builder: identified what to build next, made scope/quality trade-offs independently, and unblocked themselves without waiting for direction
Non-negotiables (must-haves):
- Can ship production-quality full-stack code (React + Python) autonomously within days, not weeks
- Demonstrates high agency: identifies problems, proposes solutions, and acts without being told
- Shows strong product instinct -- builds for users, not for technical elegance alone
- Thrives in ambiguity: comfortable with incomplete specs, changing priorities, and wearing multiple hats
- Coachable and non-defensive: takes feedback, iterates quickly, and disagrees constructively
Red flags (explicit):
- Needs detailed specs or heavy process to be productive (incompatible with seed stage)
- Defensive when receiving feedback or when their approach is questioned
- Optimizes for technical purity over user outcomes (over-engineering)
- History of starting but not finishing projects; low follow-through
- Cannot explain technical trade-offs in business/product terms
- Wants a narrow role ("I only do backend") at a stage that requires generalists
Criteria (6) with weights (biased toward raw ability + speed over pedigree):
| # | Criterion | Definition (observable) | Weight | Strong looks like | Weak looks like |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping Ability | Ability to go from ambiguous problem to deployed, working code quickly. Measured by trial output quality and interview examples of shipping cadence. | 25% | Shipped the trial prototype in <4 hrs with clean, functional code. Interview examples show weekly/biweekly shipping cadence at prior roles. Proactively cut scope to hit deadlines. | Trial prototype incomplete or non-functional. Interview examples show multi-month timelines for basic features. Waited for perfect requirements before starting. |
| 2 | Technical Depth + Craft (React + Python + AI/LLM) | Quality of technical decisions, code architecture, and understanding of the stack. Ability to reason about LLM integration, prompt engineering, and full-stack trade-offs. | 20% | In the technical deep-dive, explained trade-offs clearly with concrete examples. Trial code is well-structured, handles edge cases, and shows awareness of LLM-specific patterns (token limits, latency, caching). | Superficial technical answers; cannot go beyond surface-level framework usage. Trial code is spaghetti, ignores error handling, or shows no awareness of AI/LLM integration challenges. |
| 3 | Product Thinking + User Instinct | Ability to reason about what to build and why, from the user's perspective. Makes scope and prioritization decisions that serve user outcomes. | 20% | In the product thinking interview, reframed the problem from the user's perspective before proposing solutions. Trial prototype includes thoughtful UX choices (not just a technical demo). Asks "who is this for and what do they need?" before "how do we build it?" | Builds what is technically interesting rather than what users need. Cannot articulate who the user is or why a feature matters. Trial output is a backend exercise with no user-facing thought. |
| 4 | High Agency + Ownership | Self-direction, resourcefulness, and willingness to own outcomes end-to-end. Does not wait for permission or detailed instructions. | 15% | Interview examples show initiating projects without being asked, unblocking themselves creatively, and owning outcomes (not just tasks). In the trial, made reasonable assumptions and moved forward rather than asking excessive clarifying questions. | Waited for direction on every decision. Interview examples feature "I was told to" framing. In the trial, got stuck and stopped rather than finding a workaround. |
| 5 | Coachability + Collaboration Under Pressure | How they respond to feedback, disagreement, and fast-changing context. Can they disagree constructively and change course without ego damage? | 10% | In interviews, responded to pushback with curiosity ("that's a good point, here's how I'd adjust"). Reference signals show growth over time and openness to feedback. No defensiveness pattern. | Became defensive or dismissive when challenged in interviews. Reference signals show pattern of blaming others or inability to incorporate feedback. Interprets questions as attacks. |
| 6 | Founder-Fit + Startup Endurance | Alignment with the specific demands of a seed-stage startup: ambiguity tolerance, intrinsic motivation, willingness to do unglamorous work, and long-term commitment signal. | 10% | Founder-fit interview reveals genuine excitement about the problem space, realistic understanding of startup hardship, and examples of thriving in chaotic/resource-constrained environments. Motivated by building, not by title or process. | Motivated primarily by compensation, brand, or structure. No evidence of working in resource-constrained environments. Expects clear career ladder, dedicated manager, and established process from day one. |
Part 2: Candidate Scorecard
Candidate: [Candidate Name -- to be filled per candidate]
Role/level: Founding Engineer (Full-Stack, React + Python)
Date: [Evaluation date]
Evaluators: [Names of all interviewers + trial evaluator]
Rating scale: 1 = Weak (clear miss), 2 = Below bar (concerning gaps), 3 = Meets bar (would succeed), 4 = Raises bar (exceptional for this role/stage)
| # | Criterion (Weight) | Rating (1-4) | Evidence (quotes/notes from interviews + trial) | Confidence (L/M/H) | Concerns / follow-ups |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping (25%) | ___ | [Fill: What did they ship in the 4-hr trial? How far did they get? What was their shipping cadence in prior roles per the technical deep-dive?] | ___ | ___ |
| 2 | Technical Depth + Craft (20%) | ___ | [Fill: How did they reason through technical trade-offs in the deep-dive? What does the trial code look like architecturally? Did they demonstrate LLM/AI awareness?] | ___ | ___ |
| 3 | Product Thinking + User Instinct (20%) | ___ | [Fill: How did they approach the product thinking round? Did they reframe problems from the user perspective? Does the trial prototype show UX awareness?] | ___ | ___ |
| 4 | High Agency + Ownership (15%) | ___ | [Fill: What did they initiate without being asked? How did they handle ambiguity in the trial? Do interview examples show ownership or task-execution framing?] | ___ | ___ |
| 5 | Coachability + Collaboration (10%) | ___ | [Fill: How did they respond to pushback in interviews? Any defensiveness signals? How do they describe past conflicts or feedback moments?] | ___ | ___ |
| 6 | Founder-Fit + Startup Endurance (10%) | ___ | [Fill: What motivates them? Do they understand seed-stage realities? Examples of thriving in ambiguity/constraints?] | ___ | ___ |
Weighted Score: [Calculate: sum of (rating x weight), with weights expressed as decimals (0.25, 0.20, ...) so the maximum possible score is 4.0]
Overall recommendation: Hire / No hire / Hold
Confidence: Low / Medium / High
Scoring guide for the weighted score:
- 3.5 - 4.0: Strong hire -- proceed to offer
- 3.0 - 3.49: Hire with mitigations -- proceed but build onboarding around identified gaps
- 2.5 - 2.99: Hold -- resolve open questions before deciding (targeted reference, follow-up interview)
- Below 2.5: No hire -- gaps are too significant for a founding role
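The weighted-score arithmetic and decision bands above can be sketched in a few lines of Python. The criterion keys and the example ratings are illustrative, not part of the template:

```python
# Criterion weights from the evaluation brief (sum to 1.0).
WEIGHTS = {
    "execution_speed": 0.25,
    "technical_depth": 0.20,
    "product_thinking": 0.20,
    "high_agency": 0.15,
    "coachability": 0.10,
    "founder_fit": 0.10,
}

def weighted_score(ratings: dict[str, int]) -> float:
    """Sum of rating x weight; ratings are 1-4, so the maximum is 4.0."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the six criteria")
    if any(not 1 <= r <= 4 for r in ratings.values()):
        raise ValueError("each rating must be between 1 and 4")
    return sum(WEIGHTS[c] * r for c, r in ratings.items())

def band(score: float) -> str:
    """Map a weighted score to the decision band from the scoring guide."""
    if score >= 3.5:
        return "Strong hire"
    if score >= 3.0:
        return "Hire with mitigations"
    if score >= 2.5:
        return "Hold"
    return "No hire"

# Example: raises bar on execution and depth, meets bar elsewhere.
example = {
    "execution_speed": 4, "technical_depth": 4, "product_thinking": 3,
    "high_agency": 3, "coachability": 3, "founder_fit": 3,
}
print(round(weighted_score(example), 2))  # 3.45
print(band(weighted_score(example)))      # Hire with mitigations
```

Note that a single 4 on the heaviest criterion cannot rescue 1s elsewhere -- the weighting enforces the halo-effect check in Part 7.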
Part 3: Signal Log
| Signal Type | Source | Round/Date | Criterion Mapped | Evidence Snippet | Positive / Negative | Confidence |
|---|---|---|---|---|---|---|
| Interview | Technical Deep-Dive | Round 1 | Technical Depth + Craft; Execution Speed | [Fill with specific quotes, examples, and observations from interviewer notes] | ___ | ___ |
| Interview | Product Thinking | Round 2 | Product Thinking + User Instinct; High Agency | [Fill with specific quotes, examples, and observations] | ___ | ___ |
| Interview | Founder-Fit | Round 3 | Founder-Fit + Startup Endurance; Coachability + Collaboration | [Fill with specific quotes, examples, and observations] | ___ | ___ |
| Paid Trial | 4-hr Feature Prototype | Trial | Execution Speed; Technical Depth; Product Thinking; High Agency | [Fill with: what they built, code quality observations, scope decisions, UX choices, questions asked vs assumptions made] | ___ | ___ |
| Reference | Back-channel Ref #1 | Pending | Coachability; High Agency; Execution Speed | [To be filled after reference check] | ___ | ___ |
| Reference | Back-channel Ref #2 | Pending | Founder-Fit; Collaboration; Technical Depth | [To be filled after reference check] | ___ | ___ |
Signal weight allocation:
- Paid trial project: 35% of total signal weight (highest -- direct observation of work output)
- 3 interview rounds combined: 40% (technical deep-dive 15%, product thinking 15%, founder-fit 10%)
- 2 back-channel references: 25% (longitudinal performance, patterns interviews cannot reveal)
Rationale for weighting: At seed stage, what the candidate actually builds (trial) matters more than how they talk about building (interviews). References are weighted meaningfully because back-channel references from people who shipped with the candidate over months/years reveal coachability, reliability, and collaboration patterns that a 4-hour trial cannot.
Part 4: Paid Trial Brief + Rubric
(Already completed -- this section documents the evaluation framework for the 4-hour paid trial that was administered.)
Goal: Evaluate execution speed, technical craft (React + Python + LLM integration), product instinct, and autonomous decision-making under time pressure.
Timebox: 4 hours
Paid? Yes
Scenario: Build a feature prototype for an AI writing assistant (sanitized -- no proprietary data or real product code).
Task
- Given: A brief describing a feature for an AI writing assistant (e.g., "smart rewrite suggestions" -- the user highlights text and gets AI-powered alternatives with different tones/styles)
- Build: A functional prototype with a React frontend and Python backend that integrates with an LLM API
- Deliverables: Working code (repo or zip), brief README explaining decisions and trade-offs, and a 5-minute walkthrough (async video or live)
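As a calibration aid for evaluators (not part of the brief given to candidates), a "meets bar" backend slice might resemble the sketch below: input validation, a prompt builder, and graceful error recovery. The endpoint shape, tone list, and injected `llm` callable are hypothetical; a real submission would wrap an actual LLM API client:

```python
from typing import Callable

TONES = {"formal", "casual", "concise", "friendly"}

def build_prompt(text: str, tone: str) -> str:
    """Assemble the rewrite prompt (hypothetical prompt shape)."""
    return (
        f"Rewrite the following text in a {tone} tone. "
        "Return only the rewritten text.\n\n"
        f"Text: {text}"
    )

def rewrite_suggestions(text: str, tone: str, llm: Callable[[str], str]) -> dict:
    """Return a rewrite suggestion, handling the failure modes the rubric cares about.

    `llm` is injected so the core logic is testable without a network call.
    """
    if not text.strip():
        return {"ok": False, "error": "empty_selection"}
    if tone not in TONES:
        return {"ok": False, "error": f"unsupported_tone:{tone}"}
    try:
        suggestion = llm(build_prompt(text, tone))
    except Exception:
        # Surface a user-facing error instead of a stack trace (error recovery).
        return {"ok": False, "error": "llm_unavailable"}
    return {"ok": True, "tone": tone, "suggestion": suggestion.strip()}

# Stub LLM for local demonstration.
fake_llm = lambda prompt: "Here is a rewritten version."
print(rewrite_suggestions("pls fix asap", "formal", fake_llm)["ok"])  # True
```

A submission at this level -- validated inputs, a user-facing error path, and logic separated from the API client -- would score a 3 on the "Execution quality" dimension; streaming and latency handling would push it toward a 4.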
Scoring Rubric (anchors)
| Dimension | 1 (Weak) | 2 (Below bar) | 3 (Meets bar) | 4 (Raises bar) |
|---|---|---|---|---|
| Problem framing + scope | Started coding without clarifying what to build. No README or explanation of decisions. | Some scope decisions but unclear rationale. README exists but is generic. | Explicitly scoped what to build and what to skip, with clear rationale. README explains 2-3 key trade-offs. | Reframed the problem from the user's perspective before building. Identified the highest-value slice and explained why. Trade-offs are articulated in product terms. |
| Execution quality (code) | Code doesn't run or has fundamental errors. No error handling. | Code runs but is fragile, poorly structured, or handles only the happy path. | Code is clean, functional, handles key edge cases, and is structured for iteration. Clear separation of frontend/backend. | Production-quality code with thoughtful architecture. Demonstrates awareness of LLM-specific concerns (latency, token limits, streaming, error recovery). Would merge with minor comments. |
| Speed + completeness | Delivered <30% of a functional prototype in 4 hours. | Delivered a partial prototype; core feature partially works but is not demo-able. | Delivered a functional prototype that demonstrates the core feature end-to-end. Reasonable scope for 4 hours. | Delivered a polished prototype with thoughtful UX touches, plus identified next steps. Clearly comfortable shipping under time pressure. |
| Product + UX instinct | No user-facing thought; output is a backend script or raw API response. | Basic UI exists but is an afterthought; no consideration of user experience. | UI is functional and shows awareness of user flow. Candidate thought about what the user sees and does. | UI includes delightful details (loading states, error messages, intuitive interactions). Candidate clearly designed for the end user, not just the evaluator. |
| Judgment + trade-offs | Made no trade-offs; tried to build everything or nothing. | Made trade-offs but cannot explain them; or made poor choices (polished login page but broken core feature). | Made sensible trade-offs and can explain what they prioritized and why. Chose to invest time where it matters most. | Trade-offs are excellent -- prioritized the highest-signal functionality, cut the right corners, and left clear markers for what they'd do next with more time. |
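The "Execution quality" anchors mention LLM-specific error recovery. One concrete pattern an evaluator can look for is retry-with-backoff around the LLM call; a minimal sketch (function names and delays are illustrative, and a real client would catch its specific timeout/rate-limit exception types):

```python
import time

def call_with_retry(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky LLM call with exponential backoff.

    `fn` is any zero-argument callable wrapping the API request; `sleep` is
    injectable so tests run without real delays. Re-raises the last error
    if all attempts fail.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:  # in practice, catch the client's timeout/rate-limit errors
            last_err = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, ...
    raise last_err

# Demo: a call that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated LLM timeout")
    return "ok"

print(call_with_retry(flaky, sleep=lambda s: None))  # ok
```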
Part 5: Reference Check Kit (Back-Channel, 2 References)
5A: Reference targeting strategy
Why back-channel: Candidate-provided references are pre-screened for positivity. Back-channel references -- people you reach through your own network who have worked with the candidate -- yield more honest, calibrated signal. For a founding engineer hire, this is critical.
Ideal reference profiles (target 2):
| Reference # | Ideal Profile | Why This Matters |
|---|---|---|
| Ref 1 | Former engineering manager or tech lead who directly managed or closely shipped with the candidate for 6+ months | Reveals: execution reliability, coachability, response to feedback, technical depth over time, growth trajectory |
| Ref 2 | Former co-founder, product partner, or cross-functional peer who worked alongside them in a startup or high-ambiguity environment | Reveals: collaboration under pressure, product instinct, ownership behavior, startup endurance, how they handle chaos |
How to find back-channel references:
- Check LinkedIn for mutual connections who worked at the same company during the same period
- Ask your investors/advisors if they know anyone in the candidate's professional circle
- Look for co-authors on open-source projects, conference talks, or blog posts
- If you must ask the candidate, say: "We'd love to speak with someone who managed you and someone you shipped a project with. Can you suggest names beyond your listed references?"
5B: Outreach message (for back-channel references)
Hi [Reference Name] --
I'm [Your Name], [CEO/CTO] at [Company Name]. We're building an AI writing assistant and are in the final stages of considering [Candidate First Name] for a founding engineer role.
I understand you worked with [Candidate First Name] at [Company]. I'd really value 20 minutes of your time to hear about your experience working together -- strengths, growth areas, and whether you think this type of role would be a fit.
This is a confidential conversation and won't be shared with [Candidate First Name]. I'm specifically looking for honest, specific examples rather than a generic endorsement.
Would you be open to a quick call this week?
Thanks, [Your Name]
5C: Reference check script (structured -- use for both references)
Preamble (say this first): "Thanks for making time. I want to be upfront -- I'm looking for an honest, nuanced picture. There are no right or wrong answers. The most helpful thing you can do is give me specific examples, not just adjectives. Everything here is confidential."
Questions (ask all 11; adapt order to conversation flow):
Context + calibration:
1. "How did you work with [Candidate]? What was your relationship, and for how long did you overlap?" (Establishes proximity and credibility of the reference.)
2. "What was the environment like -- pace, ambiguity level, team size? How does it compare to a 4-person seed-stage startup?" (Calibrates whether their observations transfer to your context.)
Execution + technical ability:
3. "Can you tell me about a specific project [Candidate] shipped? Walk me through what they did, how long it took, and what the outcome was." (Maps to: Execution Speed + Shipping, Technical Depth.)
4. "How would you describe their speed and follow-through? Do they finish things, or do they start strong and fade?" (Maps to: Execution Speed, High Agency.)
5. "On a technical level, how would you rate them relative to the strongest engineers you've worked with at a similar career stage? What's the gap, if any?" (Maps to: Technical Depth -- calibration vs peers.)
Product instinct + ownership:
6. "Did [Candidate] ever push back on what to build, or suggest a different approach based on what users needed? Can you give me a specific example?" (Maps to: Product Thinking, High Agency.)
7. "When things were ambiguous or messy -- unclear requirements, shifting priorities, limited resources -- how did they respond? Example?" (Maps to: High Agency, Founder-Fit.)
Coachability + collaboration:
8. "How do they handle feedback, especially when they disagree? Can you give me a specific instance where they received critical feedback and how they responded?" (Maps to: Coachability. This is a critical red flag probe -- listen for defensiveness patterns.)
9. "How are they as a collaborator? Any friction with teammates, or patterns in how they work with others under pressure?" (Maps to: Coachability + Collaboration.)
Overall + rehire:
10. "If you were starting a company tomorrow and needed a founding engineer, would you try to recruit [Candidate]? Why or why not? And if not as a founding engineer, in what role would they be excellent?" (Maps to: Founder-Fit. The "if not, then what?" follow-up is the highest-signal question -- it reveals where the reference sees limits.)
11. "What's the one thing I should know about [Candidate] that I probably won't learn from interviews alone?" (Open-ended -- catches what structured questions miss.)
5D: Reference note-taking form (use during the call)
REFERENCE CHECK NOTES
=====================
Reference name:
Relationship to candidate:
Company / time period:
Duration of overlap:
Date of call:
Interviewer:
CONTEXT
- Environment (pace, team size, ambiguity level):
- How closely did they work together:
- Relevance to our context (seed-stage, 4 people, full-stack):
EXECUTION + TECHNICAL
- Specific project example:
- Speed + follow-through evidence:
- Technical calibration vs peers:
- Confidence in this signal (L/M/H):
PRODUCT INSTINCT + OWNERSHIP
- Example of pushing back on what to build:
- Response to ambiguity/mess:
- Confidence in this signal (L/M/H):
COACHABILITY + COLLABORATION
- Specific feedback response example:
- Defensiveness? (Y/N + evidence):
- Collaboration friction patterns:
- Confidence in this signal (L/M/H):
REHIRE + OVERALL
- Would they recruit for founding eng role? (Y/N + why):
- If not founding eng, what role?:
- "One thing I should know":
RED FLAGS DETECTED:
- [ ] Defensiveness to feedback
- [ ] Low follow-through / starts but doesn't finish
- [ ] Needs heavy structure / process to be productive
- [ ] Optimizes for technical elegance over user outcomes
- [ ] Friction with teammates under pressure
- [ ] Other: _______________
OVERALL SIGNAL: Strong positive / Positive / Mixed / Negative
CONFIDENCE: Low / Medium / High
NOTES ON BIAS/LIMITATIONS:
5E: Reference summary template (fill after each call)
Reference: [Name]
Relationship/context: [e.g., "Engineering manager at [Co], overlapped 14 months on core product team"]
Tenure working together: [Duration]
Strengths (with examples):
- [Specific example 1]
- [Specific example 2]
Concerns/limits (with examples):
- [Specific example 1]
- [Specific example 2]
Calibration: [Where they sit relative to peers; whether observations transfer to seed-stage context]
Rehire signal: Yes / No / Depends -- [why, and in what context]
Notes on bias/limitations: [e.g., "Reference was a close friend -- may inflate. However, specific examples were concrete and consistent with interview signals."]
Part 6: Hiring Decision Memo
(This template is designed to be completed after all signals -- interviews, trial, and references -- are gathered. Fill in the bracketed sections with actual evidence.)
HIRING DECISION MEMO: [Candidate Name] for Founding Engineer
Decision: [Hire / No hire / Hold]
Confidence: [Low / Medium / High]
Date: [Date]
Decision-makers: [Names]
Context
Role: Founding Engineer (Full-Stack, React + Python) at [Company Name]
Stage: Seed, 4 people, $2M raised, building an AI writing assistant
Success in 6 months: Ship core product to early users, establish scalable architecture, operate as autonomous co-builder
Bar: Raise the bar -- this is the first engineering hire and will define the technical culture
Signals included:
- Round 1: Technical deep-dive ([Date], [Interviewer])
- Round 2: Product thinking ([Date], [Interviewer])
- Round 3: Founder-fit ([Date], [Interviewer])
- 4-hour paid trial: Feature prototype ([Date])
- Back-channel reference #1: [Name/relationship] ([Date])
- Back-channel reference #2: [Name/relationship] ([Date])
Signals NOT included / limitations:
- [e.g., "No on-site pairing session -- relied on trial for technical signal"]
- [e.g., "Only 2 references; would ideally have 3 for a founding hire"]
Evidence Summary (by criterion)
| # | Criterion (Weight) | Summary | Strong Evidence | Concerns / Contrary Evidence | Confidence |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping (25%) | [1-2 sentence summary] | [Specific quotes/examples from trial + interviews + references] | [Any contrary signals] | L / M / H |
| 2 | Technical Depth + Craft (20%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 3 | Product Thinking (20%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 4 | High Agency + Ownership (15%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 5 | Coachability + Collaboration (10%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 6 | Founder-Fit + Startup Endurance (10%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
Weighted Score: [X.X / 4.0]
Key Strengths (evidence-based)
- [Strength 1]: [Specific evidence from at least 2 signal sources. Example: "In the paid trial, built a fully functional prototype with streaming LLM responses and intuitive UI in 3.5 hours. Reference #1 confirmed: 'They were the fastest shipper on our team -- they'd have a working prototype while others were still debating the approach.'"]
- [Strength 2]: [Specific evidence]
- [Strength 3]: [Specific evidence]
Key Concerns / Risks (evidence-based)
- [Risk 1]: [Specific evidence and why it matters for this role. Example: "In the product thinking interview, defaulted to technical solutions before understanding the user problem. However, when redirected, they adjusted quickly. Reference #2 noted: 'They've grown a lot in product thinking but still default to engineering-first framing.' Risk level: Medium. This is addressable with coaching but could slow product-market fit iterations."]
- [Risk 2]: [Specific evidence]
- [Risk 3]: [Specific evidence]
Disconfirming Evidence (mandatory section)
List evidence that argues AGAINST your recommendation. If recommending hire, list the strongest no-hire signals. If recommending no-hire, list the strongest hire signals.
- [Signal 1 + source]
- [Signal 2 + source]
- Why this does / does not change the recommendation: [Explicit reasoning]
Mitigations (if hiring) -- 30/60/90 Onboarding Plan
Principle: A "hire with mitigations" means you've identified specific risks and are building the onboarding to address them proactively, not just hoping they resolve.
First 30 Days: Foundation + Trust-Building
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 1 | Environment setup + codebase orientation + first small PR merged | Has a working dev environment, understands the architecture, and has shipped something (even small) to production | Speed risk: If they take >3 days to ship a first PR, that's an early signal to investigate. Pair-program day 2-3 to assess real-time coding speed and identify blockers. |
| 2 | Owns a small, well-scoped feature end-to-end | Shipped a user-facing feature from spec to production. You've seen their full cycle: how they scope, build, test, and ship. | Agency risk: Observe whether they ask clarifying questions vs wait for instructions vs make reasonable assumptions and move. Debrief on their scoping decisions. |
| 3-4 | Takes on a medium-complexity feature with deliberate ambiguity | Navigated ambiguity, made trade-off decisions, and shipped. You've had at least 2 feedback conversations and seen how they respond. | Coachability risk: Give direct, specific feedback in week 3. Watch for: do they incorporate it in week 4? Do they get defensive or curious? This is the critical early signal for coachability concerns. |
30-day checkpoint questions:
- Are they shipping at the pace we need? (Y/N)
- Have they demonstrated product instinct, or are they building what they're told? (Y/N)
- How did they respond to feedback? Evidence of growth or defensiveness? (Specific examples)
- Would you re-hire them today knowing what you know now? (Y/N + why)
Days 31-60: Increasing Ownership + Product Partnership
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 5-6 | Owns a significant feature area; begins contributing to product decisions (what to build, not just how) | Has opinions on product direction backed by user/data reasoning. Is proactively identifying technical debt and proposing solutions. | Product thinking risk: If they're only executing specs and not contributing product ideas by week 6, schedule a structured 1:1 to assess product instinct. Assign a "propose what we should build next" exercise. |
| 7-8 | Leads a cross-cutting initiative (e.g., performance improvement, new AI feature, user onboarding flow) | Delivered a project that required coordination across frontend + backend + AI. Demonstrated architecture judgment. | Technical depth risk: Review their architecture decisions. If they're making decisions you'd overrule, calibrate whether it's a judgment gap (addressable) or a depth gap (harder). |
60-day checkpoint questions:
- Are they operating as a co-builder or as an employee? (Evidence)
- Is their technical architecture holding up, or are you accumulating debt? (Evidence)
- Are they contributing to product decisions with user-centric reasoning? (Evidence)
- Any collaboration friction? (Evidence)
Days 61-90: Full Autonomy + Culture Setting
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 9-10 | Fully autonomous on feature delivery; beginning to influence team norms (code quality, process, tooling) | Has shipped 2-3 significant features. Team can point to at least one process/tool/norm they introduced that made everyone better. | Founder-fit risk: If by day 75 they're waiting for direction rather than setting direction, this is a serious signal. A founding engineer should be pulling work, not being pushed. |
| 11-12 | Operating as a true founding team member. Can represent the technical vision to external stakeholders (investors, advisors, early users). | Has made at least one significant technical bet (architecture, tooling, infrastructure) with clear reasoning. Can articulate what the product does and why to a non-technical audience. | Endurance risk: Check for signs of burnout, frustration with ambiguity, or "this isn't what I signed up for" signals. Seed-stage reality hits around month 3. |
90-day checkpoint (formal review):
- Hire confirmation: Based on 90 days of evidence, do we confirm this was the right hire? (Y/N)
- Evidence: [Summarize strengths demonstrated, risks that materialized vs didn't, growth observed]
- If concerns remain: Define specific, measurable expectations for the next 30 days with explicit criteria for success/failure
- If concerns are serious: Have an honest conversation. It's better to part ways at 90 days than at 12 months.
Ongoing Support + Coaching
- Weekly 1:1s (30 min): Alternate between tactical (blockers, priorities) and strategic (growth, feedback, product direction). Keep a running doc.
- Monthly retrospectives: "What shipped, what didn't, what we learned, what we'd do differently." Model vulnerability by sharing your own mistakes.
- Quarterly role evolution conversation: At seed stage, the role will morph. Discuss explicitly: "Here's how I see your role evolving. What excites you? What concerns you?"
- External support (if budget allows): Connect them with a technical advisor or mentor who has been a founding engineer before. The role is isolating; external perspective helps.
Open Questions
- [List any unresolved questions that could change the recommendation. Example: "Reference #1 mentioned a conflict with a product manager at [Co] -- we were unable to get details. If the pattern is 'pushes back constructively,' that's positive for our context. If the pattern is 'undermines product decisions,' that's a red flag. Consider probing in a follow-up conversation with the candidate."]
- [Example: "Trial showed strong React skills but Python backend was more basic. Is this a depth gap or a time-allocation choice? Could probe with a targeted question."]
- [Example: "Only 2 references completed. A third reference from their most recent role would increase confidence."]
Next Steps
| Step | Owner | By When | Purpose |
|---|---|---|---|
| Complete back-channel reference checks (2) | [Hiring lead] | [Date] | Fill the longitudinal signal gap; specifically probe coachability and execution reliability |
| Score references and update the signal log | [Hiring lead] | [Date + 1 day] | Integrate reference evidence into the scorecard |
| Independent scorecard scoring (all evaluators fill in their scores before debrief) | All interviewers + trial evaluator | [Date + 2 days] | Reduce anchoring bias; capture independent signal before group discussion |
| Calibration debrief (all evaluators) | [Hiring lead facilitates] | [Date + 3 days] | Discuss disagreements, reconcile evidence, reach recommendation. Rules: no "I just had a feeling" -- evidence only. |
| Final decision | [CEO / decision owner] | [Date + 4 days] | Hire / No hire / Hold with explicit reasoning |
| If hire: extend offer | [CEO] | [Date + 5 days] | Move fast; strong candidates have other options |
| If hold: define the smallest additional signal needed | [Hiring lead] | [Date + 3 days] | Don't delay without a specific plan to resolve uncertainty |
Part 7: Quality Gate Self-Assessment
Checklist Results
A) Bar + criteria checklist:
- Criteria list is 6 items, written as observable behaviors/outcomes
- Each criterion has rating anchors (strong/weak) and evidence hints
- Red flags are explicit and behavioral (not vibes)
- "Culture" is defined as behaviors (coachability, agency, collaboration), not similarity
- Interview/trial/reference signals are mapped to criteria without double counting
B) Work sample / trial checklist:
- Task is job-relevant (build a feature prototype for an AI writing assistant)
- Scope/timebox is explicit (4 hours)
- Rubric exists with anchors across 5 dimensions
- Trial is paid
- Scenario is sanitized (no proprietary data)
- Instructions are clear enough to avoid "test-taking skill" bias
C) Reference check checklist:
- References chosen for proximity and duration (a manager or peer who shipped with the candidate)
- Questions demand specific examples and calibration against peers
- Note-taking form captures evidence, not just adjectives
- Bias/limitations are acknowledged in the summary template
- Summary produces a clear rehire signal
D) Decision memo checklist:
- Recommendation matches weighted evidence across criteria
- Disconfirming evidence section is mandatory (not cherry-picked)
- Unknowns are explicit and tied to next steps
- Risks have concrete mitigations (30/60/90 onboarding plan)
- Output respects privacy/PII constraints (no real names or sensitive data in templates)
E) Fairness + robustness checklist:
- Evidence is tied to criteria, not delivery style or confidence
- Halo effect check: explicit instruction not to let one strong signal erase multiple weak ones
- Non-traditional backgrounds handled: criteria focus on demonstrated ability, not pedigree
- Decision can be explained in one paragraph using evidence
Rubric Self-Score
| Category | Score | Notes |
|---|---|---|
| 1. Decision framing | 2 | Role, level, success outcomes, decision deadline, and signal plan are all explicit with rationale |
| 2. Criteria + bar clarity | 2 | 6 criteria with observable definitions, strong/weak anchors, explicit red flags, weighted for seed stage |
| 3. Evidence quality | 2 | Templates and log are structured for specific evidence; confidence levels required per criterion |
| 4. Work sample/trial quality | 2 | Job-relevant paid trial with 5-dimension rubric and concrete anchors |
| 5. Reference check quality | 2 | Back-channel strategy, 11-question structured script, note form with red flag checklist, bias-aware summary |
| 6. Synthesis coherence | 2 | Decision memo requires weighted evidence, disconfirming evidence, and explicit reconciliation |
| 7. Risks + mitigations | 2 | 30/60/90 onboarding plan with specific checkpoints, leading indicators, and exit criteria |
| 8. Fairness + privacy | 2 | Independent scoring before debrief, criteria locked before evaluation, no PII in shareable outputs |
| Total | 16/16 | Meets passing bar (>= 12/16) |
Appendix: How to Use This Pack
For the hiring lead:
- Before references: Share the scorecard (Part 2) with all interviewers. Have each person fill in their ratings and evidence independently -- do not discuss scores before everyone has submitted.
- Run references: Use the script in Part 5C for both back-channel references. Fill in the note form (Part 5D) during each call. Write the summary (Part 5E) within 24 hours while details are fresh.
- Update the signal log: Add reference evidence to Part 3. Recalculate the weighted score if reference signals shift any criterion ratings.
- Fill in the decision memo: Complete Part 6 with actual evidence from all sources. The disconfirming evidence section is mandatory -- skip it and you're doing "gut feel with a template."
- Run the calibration debrief: Share the completed memo with all evaluators. Discuss disagreements using evidence. The facilitator's job is to prevent the loudest voice from winning.
- Make the decision: The decision owner reviews the memo, confirms or overrides with explicit reasoning, and moves to next steps.
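The weighted-score recalculation in the steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the criterion names, weights, and the 1-4 rating scale below are hypothetical placeholders -- substitute the actual criteria, weights, and ratings from the Part 2 scorecard.

```python
# Hypothetical sketch of recalculating the weighted scorecard score after
# reference evidence shifts a criterion rating. Names, weights, and ratings
# are placeholders, not the real Part 2 values.

CRITERIA = {
    # criterion: (weight, rating on an illustrative 1-4 scale)
    "shipping_speed":   (0.25, 4),
    "agency":           (0.20, 3),
    "product_instinct": (0.20, 3),
    "ambiguity":        (0.15, 4),
    "coachability":     (0.10, 3),
    "technical_depth":  (0.10, 2),
}

def weighted_score(criteria):
    """Return the weight-normalized score across all criteria."""
    total_weight = sum(w for w, _ in criteria.values())
    return sum(w * r for w, r in criteria.values()) / total_weight

# Before references: baseline score.
baseline = weighted_score(CRITERIA)

# After references: suppose coachability evidence moves that rating from 3 to 4.
updated = dict(CRITERIA)
updated["coachability"] = (0.10, 4)
print(f"baseline {baseline:.2f} -> updated {weighted_score(updated):.2f}")
```

Keeping the recalculation this mechanical is deliberate: the debrief argues about per-criterion ratings and evidence, and the aggregate score follows from those ratings rather than being negotiated directly.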
For the founding team:
- If you hire: Start the 30/60/90 onboarding plan (Part 6, Mitigations section) on day one. Do not skip the checkpoints. The 30-day checkpoint is the most important -- it's your earliest opportunity to confirm or course-correct.
- If you don't hire: Document why (for calibration on future candidates) and re-open the search. Consider whether any criteria need adjustment based on what you learned.
- If you hold: Define the one additional signal that would resolve uncertainty (see the Next Steps table above). Do not hold indefinitely -- set a 1-week deadline for the additional signal, then decide.