Candidate Evaluation Decision Pack
Role: Founding Engineer (Full-Stack: React + Python)
Company: [AI Writing Assistant Startup] -- seed stage, 4 people, $2M raised
Date: 2026-03-17
Decision Owner: Founding team (CEO + CTO, if applicable)
Decision Deadline: Within 1 week of completing reference checks
Signals Included: 3 interview rounds (technical deep-dive, product thinking, founder-fit) + 4-hour paid trial project
Signals Remaining: 2 back-channel reference checks (scripts provided below)
Part 1: Evaluation Brief (Role Success + Bar)
Role: Founding Engineer (Full-Stack)
Level: Senior IC / founding team member (equity-weighted compensation expected)
Team / stage context: Seed-stage AI writing assistant startup. 4-person team, $2M raised. Runway is finite; every week of shipping matters. The founding engineer will be the primary builder alongside the technical co-founder (if any) or the sole technical leader. Extreme ownership, speed, and product instinct are table stakes. There is no established process, no code review pipeline, no QA team -- this person IS the engineering culture.
Success in 6 months:
- Shipped the core product to early users (functional AI writing assistant with React frontend + Python/LLM backend) and iterated based on real feedback
- Established the initial technical architecture that can scale from 0 to 1,000 users without a rewrite
- Operated as a true co-builder: identified what to build next, made scope/quality trade-offs independently, and unblocked themselves without waiting for direction
Non-negotiables (must-haves):
- Can ship production-quality full-stack code (React + Python) autonomously within days, not weeks
- Demonstrates high agency: identifies problems, proposes solutions, and acts without being told
- Shows strong product instinct -- builds for users, not for technical elegance alone
- Thrives in ambiguity: comfortable with incomplete specs, changing priorities, and wearing multiple hats
- Coachable and non-defensive: takes feedback, iterates quickly, and disagrees constructively
Red flags (explicit):
- Needs detailed specs or heavy process to be productive (incompatible with seed stage)
- Defensive when receiving feedback or when their approach is questioned
- Optimizes for technical purity over user outcomes (over-engineering)
- History of starting but not finishing projects; low follow-through
- Cannot explain technical trade-offs in business/product terms
- Wants a narrow role ("I only do backend") at a stage that requires generalists
Criteria (6) with weights (biased toward raw ability + speed over pedigree):
| # | Criterion | Definition (observable) | Weight | Strong looks like | Weak looks like |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping Ability | Ability to go from ambiguous problem to deployed, working code quickly. Measured by trial output quality and interview examples of shipping cadence. | 25% | Shipped the trial prototype in <4 hrs with clean, functional code. Interview examples show weekly/biweekly shipping cadence at prior roles. Proactively cut scope to hit deadlines. | Trial prototype incomplete or non-functional. Interview examples show multi-month timelines for basic features. Waited for perfect requirements before starting. |
| 2 | Technical Depth + Craft (React + Python + AI/LLM) | Quality of technical decisions, code architecture, and understanding of the stack. Ability to reason about LLM integration, prompt engineering, and full-stack trade-offs. | 20% | In the technical deep-dive, explained trade-offs clearly with concrete examples. Trial code is well-structured, handles edge cases, and shows awareness of LLM-specific patterns (token limits, latency, caching). | Superficial technical answers; cannot go beyond surface-level framework usage. Trial code is spaghetti, ignores error handling, or shows no awareness of AI/LLM integration challenges. |
| 3 | Product Thinking + User Instinct | Ability to reason about what to build and why, from the user's perspective. Makes scope and prioritization decisions that serve user outcomes. | 20% | In the product thinking interview, reframed the problem from the user's perspective before proposing solutions. Trial prototype includes thoughtful UX choices (not just a technical demo). Asks "who is this for and what do they need?" before "how do we build it?" | Builds what is technically interesting rather than what users need. Cannot articulate who the user is or why a feature matters. Trial output is a backend exercise with no user-facing thought. |
| 4 | High Agency + Ownership | Self-direction, resourcefulness, and willingness to own outcomes end-to-end. Does not wait for permission or detailed instructions. | 15% | Interview examples show initiating projects without being asked, unblocking themselves creatively, and owning outcomes (not just tasks). In the trial, made reasonable assumptions and moved forward rather than asking excessive clarifying questions. | Waited for direction on every decision. Interview examples feature "I was told to" framing. In the trial, got stuck and stopped rather than finding a workaround. |
| 5 | Coachability + Collaboration Under Pressure | How they respond to feedback, disagreement, and fast-changing context. Can they disagree constructively and change course without ego damage? | 10% | In interviews, responded to pushback with curiosity ("that's a good point, here's how I'd adjust"). Reference signals show growth over time and openness to feedback. No defensiveness pattern. | Became defensive or dismissive when challenged in interviews. Reference signals show pattern of blaming others or inability to incorporate feedback. Interprets questions as attacks. |
| 6 | Founder-Fit + Startup Endurance | Alignment with the specific demands of a seed-stage startup: ambiguity tolerance, intrinsic motivation, willingness to do unglamorous work, and long-term commitment signal. | 10% | Founder-fit interview reveals genuine excitement about the problem space, realistic understanding of startup hardship, and examples of thriving in chaotic/resource-constrained environments. Motivated by building, not by title or process. | Motivated primarily by compensation, brand, or structure. No evidence of working in resource-constrained environments. Expects clear career ladder, dedicated manager, and established process from day one. |
Part 2: Candidate Scorecard
Candidate: [Candidate Name -- to be filled per candidate]
Role/level: Founding Engineer (Full-Stack, React + Python)
Date: [Evaluation date]
Evaluators: [Names of all interviewers + trial evaluator]
Rating scale: 1 = Weak (clear miss), 2 = Below bar (concerning gaps), 3 = Meets bar (would succeed), 4 = Raises bar (exceptional for this role/stage)
| # | Criterion (Weight) | Rating (1-4) | Evidence (quotes/notes from interviews + trial) | Confidence (L/M/H) | Concerns / follow-ups |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping (25%) | ___ | [Fill: What did they ship in the 4-hr trial? How far did they get? What was their shipping cadence in prior roles per the technical deep-dive?] | ___ | ___ |
| 2 | Technical Depth + Craft (20%) | ___ | [Fill: How did they reason through technical trade-offs in the deep-dive? What does the trial code look like architecturally? Did they demonstrate LLM/AI awareness?] | ___ | ___ |
| 3 | Product Thinking + User Instinct (20%) | ___ | [Fill: How did they approach the product thinking round? Did they reframe problems from the user perspective? Does the trial prototype show UX awareness?] | ___ | ___ |
| 4 | High Agency + Ownership (15%) | ___ | [Fill: What did they initiate without being asked? How did they handle ambiguity in the trial? Do interview examples show ownership or task-execution framing?] | ___ | ___ |
| 5 | Coachability + Collaboration (10%) | ___ | [Fill: How did they respond to pushback in interviews? Any defensiveness signals? How do they describe past conflicts or feedback moments?] | ___ | ___ |
| 6 | Founder-Fit + Startup Endurance (10%) | ___ | [Fill: What motivates them? Do they understand seed-stage realities? Examples of thriving in ambiguity/constraints?] | ___ | ___ |
Weighted Score: [Calculate: sum of (rating x weight), with weights expressed as decimals (0.25, 0.20, ...) so the maximum possible score is 4.0]
Overall recommendation: Hire / No hire / Hold
Confidence: Low / Medium / High
Scoring guide for the weighted score:
- 3.5 - 4.0: Strong hire -- proceed to offer
- 3.0 - 3.49: Hire with mitigations -- proceed but build onboarding around identified gaps
- 2.5 - 2.99: Hold -- resolve open questions before deciding (targeted reference, follow-up interview)
- Below 2.5: No hire -- gaps are too significant for a founding role
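The weighted-score arithmetic and decision bands above can be sketched in a few lines of Python. The criterion keys and the example ratings are illustrative, not part of the template:

```python
# Criterion weights from the evaluation brief (sum to 1.0).
WEIGHTS = {
    "execution_speed": 0.25,
    "technical_depth": 0.20,
    "product_thinking": 0.20,
    "high_agency": 0.15,
    "coachability": 0.10,
    "founder_fit": 0.10,
}

def weighted_score(ratings: dict[str, int]) -> float:
    """Sum of rating x weight; ratings are 1-4, so the maximum is 4.0."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the six criteria")
    if any(not 1 <= r <= 4 for r in ratings.values()):
        raise ValueError("each rating must be between 1 and 4")
    return sum(WEIGHTS[c] * r for c, r in ratings.items())

def band(score: float) -> str:
    """Map a weighted score to the decision band from the scoring guide."""
    if score >= 3.5:
        return "Strong hire"
    if score >= 3.0:
        return "Hire with mitigations"
    if score >= 2.5:
        return "Hold"
    return "No hire"

# Example: raises bar on execution and depth, meets bar elsewhere.
example = {
    "execution_speed": 4, "technical_depth": 4, "product_thinking": 3,
    "high_agency": 3, "coachability": 3, "founder_fit": 3,
}
print(round(weighted_score(example), 2))  # 3.45
print(band(weighted_score(example)))      # Hire with mitigations
```

Note that a single 4 on the heaviest criterion cannot rescue 1s elsewhere -- the weighting enforces the halo-effect check in Part 7.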
Part 3: Signal Log
| Signal Type | Source | Round/Date | Criterion Mapped | Evidence Snippet | Positive / Negative | Confidence |
|---|---|---|---|---|---|---|
| Interview | Technical Deep-Dive | Round 1 | Technical Depth + Craft; Execution Speed | [Fill with specific quotes, examples, and observations from interviewer notes] | ___ | ___ |
| Interview | Product Thinking | Round 2 | Product Thinking + User Instinct; High Agency | [Fill with specific quotes, examples, and observations] | ___ | ___ |
| Interview | Founder-Fit | Round 3 | Founder-Fit + Startup Endurance; Coachability + Collaboration | [Fill with specific quotes, examples, and observations] | ___ | ___ |
| Paid Trial | 4-hr Feature Prototype | Trial | Execution Speed; Technical Depth; Product Thinking; High Agency | [Fill with: what they built, code quality observations, scope decisions, UX choices, questions asked vs assumptions made] | ___ | ___ |
| Reference | Back-channel Ref #1 | Pending | Coachability; High Agency; Execution Speed | [To be filled after reference check] | ___ | ___ |
| Reference | Back-channel Ref #2 | Pending | Founder-Fit; Collaboration; Technical Depth | [To be filled after reference check] | ___ | ___ |
Signal weight allocation:
- Paid trial project: 35% of total signal weight (highest -- direct observation of work output)
- 3 interview rounds combined: 40% (technical deep-dive 15%, product thinking 15%, founder-fit 10%)
- 2 back-channel references: 25% (longitudinal performance, patterns interviews cannot reveal)
Rationale for weighting: At seed stage, what the candidate actually builds (trial) matters more than how they talk about building (interviews). References are weighted meaningfully because back-channel references from people who shipped with the candidate over months/years reveal coachability, reliability, and collaboration patterns that a 4-hour trial cannot.
Part 4: Paid Trial Brief + Rubric
(Already completed -- this section documents the evaluation framework for the 4-hour paid trial that was administered.)
Goal: Evaluate execution speed, technical craft (React + Python + LLM integration), product instinct, and autonomous decision-making under time pressure.
Timebox: 4 hours
Paid? Yes
Scenario: Build a feature prototype for an AI writing assistant (sanitized -- no proprietary data or real product code).
Task
- Given: A brief describing a feature for an AI writing assistant (e.g., "smart rewrite suggestions" -- the user highlights text and gets AI-powered alternatives with different tones/styles)
- Build: A functional prototype with a React frontend and Python backend that integrates with an LLM API
- Deliverables: Working code (repo or zip), brief README explaining decisions and trade-offs, and a 5-minute walkthrough (async video or live)
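As a calibration aid for evaluators (not part of the brief given to candidates), a "meets bar" backend slice might resemble the sketch below: input validation, a prompt builder, and graceful error recovery. The endpoint shape, tone list, and injected `llm` callable are hypothetical; a real submission would wrap an actual LLM API client:

```python
from typing import Callable

TONES = {"formal", "casual", "concise", "friendly"}

def build_prompt(text: str, tone: str) -> str:
    """Assemble the rewrite prompt (hypothetical prompt shape)."""
    return (
        f"Rewrite the following text in a {tone} tone. "
        "Return only the rewritten text.\n\n"
        f"Text: {text}"
    )

def rewrite_suggestions(text: str, tone: str, llm: Callable[[str], str]) -> dict:
    """Return a rewrite suggestion, handling the failure modes the rubric cares about.

    `llm` is injected so the core logic is testable without a network call.
    """
    if not text.strip():
        return {"ok": False, "error": "empty_selection"}
    if tone not in TONES:
        return {"ok": False, "error": f"unsupported_tone:{tone}"}
    try:
        suggestion = llm(build_prompt(text, tone))
    except Exception:
        # Surface a user-facing error instead of a stack trace (error recovery).
        return {"ok": False, "error": "llm_unavailable"}
    return {"ok": True, "tone": tone, "suggestion": suggestion.strip()}

# Stub LLM for local demonstration.
fake_llm = lambda prompt: "Here is a rewritten version."
print(rewrite_suggestions("pls fix asap", "formal", fake_llm)["ok"])  # True
```

A submission at this level -- validated inputs, a user-facing error path, and logic separated from the API client -- would score a 3 on the "Execution quality" dimension; streaming and latency handling would push it toward a 4.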
Scoring Rubric (anchors)
| Dimension | 1 (Weak) | 2 (Below bar) | 3 (Meets bar) | 4 (Raises bar) |
|---|---|---|---|---|
| Problem framing + scope | Started coding without clarifying what to build. No README or explanation of decisions. | Some scope decisions but unclear rationale. README exists but is generic. | Explicitly scoped what to build and what to skip, with clear rationale. README explains 2-3 key trade-offs. | Reframed the problem from the user's perspective before building. Identified the highest-value slice and explained why. Trade-offs are articulated in product terms. |
| Execution quality (code) | Code doesn't run or has fundamental errors. No error handling. | Code runs but is fragile, poorly structured, or handles only the happy path. | Code is clean, functional, handles key edge cases, and is structured for iteration. Clear separation of frontend/backend. | Production-quality code with thoughtful architecture. Demonstrates awareness of LLM-specific concerns (latency, token limits, streaming, error recovery). Would merge with minor comments. |
| Speed + completeness | Delivered <30% of a functional prototype in 4 hours. | Delivered a partial prototype; core feature partially works but is not demo-able. | Delivered a functional prototype that demonstrates the core feature end-to-end. Reasonable scope for 4 hours. | Delivered a polished prototype with thoughtful UX touches, plus identified next steps. Clearly comfortable shipping under time pressure. |
| Product + UX instinct | No user-facing thought; output is a backend script or raw API response. | Basic UI exists but is an afterthought; no consideration of user experience. | UI is functional and shows awareness of user flow. Candidate thought about what the user sees and does. | UI includes delightful details (loading states, error messages, intuitive interactions). Candidate clearly designed for the end user, not just the evaluator. |
| Judgment + trade-offs | Made no trade-offs; tried to build everything or nothing. | Made trade-offs but cannot explain them; or made poor choices (polished login page but broken core feature). | Made sensible trade-offs and can explain what they prioritized and why. Chose to invest time where it matters most. | Trade-offs are excellent -- prioritized the highest-signal functionality, cut the right corners, and left clear markers for what they'd do next with more time. |
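The "Execution quality" anchors mention LLM-specific error recovery. One concrete pattern an evaluator can look for is retry-with-backoff around the LLM call; a minimal sketch (function names and delays are illustrative, and a real client would catch its specific timeout/rate-limit exception types):

```python
import time

def call_with_retry(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry a flaky LLM call with exponential backoff.

    `fn` is any zero-argument callable wrapping the API request; `sleep` is
    injectable so tests run without real delays. Re-raises the last error
    if all attempts fail.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as err:  # in practice, catch the client's timeout/rate-limit errors
            last_err = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, ...
    raise last_err

# Demo: a call that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated LLM timeout")
    return "ok"

print(call_with_retry(flaky, sleep=lambda s: None))  # ok
```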
Part 5: Reference Check Kit (Back-Channel, 2 References)
5A: Reference targeting strategy
Why back-channel: Candidate-provided references are pre-screened for positivity. Back-channel references -- people you reach through your own network who have worked with the candidate -- yield more honest, calibrated signal. For a founding engineer hire, this is critical.
Ideal reference profiles (target 2):
| Reference # | Ideal Profile | Why This Matters |
|---|---|---|
| Ref 1 | Former engineering manager or tech lead who directly managed or closely shipped with the candidate for 6+ months | Reveals: execution reliability, coachability, response to feedback, technical depth over time, growth trajectory |
| Ref 2 | Former co-founder, product partner, or cross-functional peer who worked alongside them in a startup or high-ambiguity environment | Reveals: collaboration under pressure, product instinct, ownership behavior, startup endurance, how they handle chaos |
How to find back-channel references:
- Check LinkedIn for mutual connections who worked at the same company during the same period
- Ask your investors/advisors if they know anyone in the candidate's professional circle
- Look for co-authors on open-source projects, conference talks, or blog posts
- If you must ask the candidate, say: "We'd love to speak with someone who managed you and someone you shipped a project with. Can you suggest names beyond your listed references?"
5B: Outreach message (for back-channel references)
Hi [Reference Name] --
I'm [Your Name], [CEO/CTO] at [Company Name]. We're building an AI writing assistant and are in the final stages of considering [Candidate First Name] for a founding engineer role.
I understand you worked with [Candidate First Name] at [Company]. I'd really value 20 minutes of your time to hear about your experience working together -- strengths, growth areas, and whether you think this type of role would be a fit.
This is a confidential conversation and won't be shared with [Candidate First Name]. I'm specifically looking for honest, specific examples rather than a generic endorsement.
Would you be open to a quick call this week?
Thanks, [Your Name]
5C: Reference check script (structured -- use for both references)
Preamble (say this first): "Thanks for making time. I want to be upfront -- I'm looking for an honest, nuanced picture. There are no right or wrong answers. The most helpful thing you can do is give me specific examples, not just adjectives. Everything here is confidential."
Questions (ask all 11; adapt order to conversation flow):
Context + calibration:
1. "How did you work with [Candidate]? What was your relationship, and for how long did you overlap?" (Establishes proximity and credibility of the reference.)
2. "What was the environment like -- pace, ambiguity level, team size? How does it compare to a 4-person seed-stage startup?" (Calibrates whether their observations transfer to your context.)
Execution + technical ability:
3. "Can you tell me about a specific project [Candidate] shipped? Walk me through what they did, how long it took, and what the outcome was." (Maps to: Execution Speed + Shipping, Technical Depth.)
4. "How would you describe their speed and follow-through? Do they finish things, or do they start strong and fade?" (Maps to: Execution Speed, High Agency.)
5. "On a technical level, how would you rate them relative to the strongest engineers you've worked with at a similar career stage? What's the gap, if any?" (Maps to: Technical Depth -- calibration vs peers.)
Product instinct + ownership:
6. "Did [Candidate] ever push back on what to build, or suggest a different approach based on what users needed? Can you give me a specific example?" (Maps to: Product Thinking, High Agency.)
7. "When things were ambiguous or messy -- unclear requirements, shifting priorities, limited resources -- how did they respond? Example?" (Maps to: High Agency, Founder-Fit.)
Coachability + collaboration:
8. "How do they handle feedback, especially when they disagree? Can you give me a specific instance where they received critical feedback and how they responded?" (Maps to: Coachability. This is a critical red flag probe -- listen for defensiveness patterns.)
9. "How are they as a collaborator? Any friction with teammates, or patterns in how they work with others under pressure?" (Maps to: Coachability + Collaboration.)
Overall + rehire:
10. "If you were starting a company tomorrow and needed a founding engineer, would you try to recruit [Candidate]? Why or why not? And if not as a founding engineer, in what role would they be excellent?" (Maps to: Founder-Fit. The "if not, then what?" follow-up is the highest-signal question -- it reveals where the reference sees limits.)
11. "What's the one thing I should know about [Candidate] that I probably won't learn from interviews alone?" (Open-ended -- catches what structured questions miss.)
5D: Reference note-taking form (use during the call)
REFERENCE CHECK NOTES
=====================
Reference name:
Relationship to candidate:
Company / time period:
Duration of overlap:
Date of call:
Interviewer:
CONTEXT
- Environment (pace, team size, ambiguity level):
- How closely did they work together:
- Relevance to our context (seed-stage, 4 people, full-stack):
EXECUTION + TECHNICAL
- Specific project example:
- Speed + follow-through evidence:
- Technical calibration vs peers:
- Confidence in this signal (L/M/H):
PRODUCT INSTINCT + OWNERSHIP
- Example of pushing back on what to build:
- Response to ambiguity/mess:
- Confidence in this signal (L/M/H):
COACHABILITY + COLLABORATION
- Specific feedback response example:
- Defensiveness? (Y/N + evidence):
- Collaboration friction patterns:
- Confidence in this signal (L/M/H):
REHIRE + OVERALL
- Would they recruit for founding eng role? (Y/N + why):
- If not founding eng, what role?:
- "One thing I should know":
RED FLAGS DETECTED:
- [ ] Defensiveness to feedback
- [ ] Low follow-through / starts but doesn't finish
- [ ] Needs heavy structure / process to be productive
- [ ] Optimizes for technical elegance over user outcomes
- [ ] Friction with teammates under pressure
- [ ] Other: _______________
OVERALL SIGNAL: Strong positive / Positive / Mixed / Negative
CONFIDENCE: Low / Medium / High
NOTES ON BIAS/LIMITATIONS:
5E: Reference summary template (fill after each call)
Reference: [Name]
Relationship/context: [e.g., "Engineering manager at [Co], overlapped 14 months on core product team"]
Tenure working together: [Duration]
Strengths (with examples):
- [Specific example 1]
- [Specific example 2]
Concerns/limits (with examples):
- [Specific example 1]
- [Specific example 2]
Calibration: [Where they sit relative to peers; whether observations transfer to seed-stage context]
Rehire signal: Yes / No / Depends -- [why, and in what context]
Notes on bias/limitations: [e.g., "Reference was a close friend -- may inflate. However, specific examples were concrete and consistent with interview signals."]
Part 6: Hiring Decision Memo
(This template is designed to be completed after all signals -- interviews, trial, and references -- are gathered. Fill in the bracketed sections with actual evidence.)
HIRING DECISION MEMO: [Candidate Name] for Founding Engineer
Decision: [Hire / No hire / Hold]
Confidence: [Low / Medium / High]
Date: [Date]
Decision-makers: [Names]
Context
Role: Founding Engineer (Full-Stack, React + Python) at [Company Name]
Stage: Seed, 4 people, $2M raised, building an AI writing assistant
Success in 6 months: Ship core product to early users, establish scalable architecture, operate as autonomous co-builder
Bar: Raise the bar -- this is the first engineering hire and will define the technical culture
Signals included:
- Round 1: Technical deep-dive ([Date], [Interviewer])
- Round 2: Product thinking ([Date], [Interviewer])
- Round 3: Founder-fit ([Date], [Interviewer])
- 4-hour paid trial: Feature prototype ([Date])
- Back-channel reference #1: [Name/relationship] ([Date])
- Back-channel reference #2: [Name/relationship] ([Date])
Signals NOT included / limitations:
- [e.g., "No on-site pairing session -- relied on trial for technical signal"]
- [e.g., "Only 2 references; would ideally have 3 for a founding hire"]
Evidence Summary (by criterion)
| # | Criterion (Weight) | Summary | Strong Evidence | Concerns / Contrary Evidence | Confidence |
|---|---|---|---|---|---|
| 1 | Execution Speed + Shipping (25%) | [1-2 sentence summary] | [Specific quotes/examples from trial + interviews + references] | [Any contrary signals] | L / M / H |
| 2 | Technical Depth + Craft (20%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 3 | Product Thinking (20%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 4 | High Agency + Ownership (15%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 5 | Coachability + Collaboration (10%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
| 6 | Founder-Fit + Startup Endurance (10%) | [1-2 sentence summary] | [Specific quotes/examples] | [Any contrary signals] | L / M / H |
Weighted Score: [X.X / 4.0]
Key Strengths (evidence-based)
- [Strength 1]: [Specific evidence from at least 2 signal sources. Example: "In the paid trial, built a fully functional prototype with streaming LLM responses and intuitive UI in 3.5 hours. Reference #1 confirmed: 'They were the fastest shipper on our team -- they'd have a working prototype while others were still debating the approach.'"]
- [Strength 2]: [Specific evidence]
- [Strength 3]: [Specific evidence]
Key Concerns / Risks (evidence-based)
- [Risk 1]: [Specific evidence and why it matters for this role. Example: "In the product thinking interview, defaulted to technical solutions before understanding the user problem. However, when redirected, they adjusted quickly. Reference #2 noted: 'They've grown a lot in product thinking but still default to engineering-first framing.' Risk level: Medium. This is addressable with coaching but could slow product-market fit iterations."]
- [Risk 2]: [Specific evidence]
- [Risk 3]: [Specific evidence]
Disconfirming Evidence (mandatory section)
List evidence that argues AGAINST your recommendation. If recommending hire, list the strongest no-hire signals. If recommending no-hire, list the strongest hire signals.
- [Signal 1 + source]
- [Signal 2 + source]
- Why this does / does not change the recommendation: [Explicit reasoning]
Mitigations (if hiring) -- 30/60/90 Onboarding Plan
Principle: A "hire with mitigations" means you've identified specific risks and are building the onboarding to address them proactively, not just hoping they resolve.
First 30 Days: Foundation + Trust-Building
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 1 | Environment setup + codebase orientation + first small PR merged | Has a working dev environment, understands the architecture, and has shipped something (even small) to production | Speed risk: If they take >3 days to ship a first PR, that's an early signal to investigate. Pair-program day 2-3 to assess real-time coding speed and identify blockers. |
| 2 | Owns a small, well-scoped feature end-to-end | Shipped a user-facing feature from spec to production. You've seen their full cycle: how they scope, build, test, and ship. | Agency risk: Observe whether they ask clarifying questions vs wait for instructions vs make reasonable assumptions and move. Debrief on their scoping decisions. |
| 3-4 | Takes on a medium-complexity feature with deliberate ambiguity | Navigated ambiguity, made trade-off decisions, and shipped. You've had at least 2 feedback conversations and seen how they respond. | Coachability risk: Give direct, specific feedback in week 3. Watch for: do they incorporate it in week 4? Do they get defensive or curious? This is the critical early signal for coachability concerns. |
30-day checkpoint questions:
- Are they shipping at the pace we need? (Y/N)
- Have they demonstrated product instinct, or are they building what they're told? (Y/N)
- How did they respond to feedback? Evidence of growth or defensiveness? (Specific examples)
- Would you re-hire them today knowing what you know now? (Y/N + why)
Days 31-60: Increasing Ownership + Product Partnership
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 5-6 | Owns a significant feature area; begins contributing to product decisions (what to build, not just how) | Has opinions on product direction backed by user/data reasoning. Is proactively identifying technical debt and proposing solutions. | Product thinking risk: If they're only executing specs and not contributing product ideas by week 6, schedule a structured 1:1 to assess product instinct. Assign a "propose what we should build next" exercise. |
| 7-8 | Leads a cross-cutting initiative (e.g., performance improvement, new AI feature, user onboarding flow) | Delivered a project that required coordination across frontend + backend + AI. Demonstrated architecture judgment. | Technical depth risk: Review their architecture decisions. If they're making decisions you'd overrule, calibrate whether it's a judgment gap (addressable) or a depth gap (harder). |
60-day checkpoint questions:
- Are they operating as a co-builder or as an employee? (Evidence)
- Is their technical architecture holding up, or are you accumulating debt? (Evidence)
- Are they contributing to product decisions with user-centric reasoning? (Evidence)
- Any collaboration friction? (Evidence)
Days 61-90: Full Autonomy + Culture Setting
| Week | Focus | Key Outcomes | Risk Mitigation |
|---|---|---|---|
| 9-10 | Fully autonomous on feature delivery; beginning to influence team norms (code quality, process, tooling) | Has shipped 2-3 significant features. Team can point to at least one process/tool/norm they introduced that made everyone better. | Founder-fit risk: If by day 75 they're waiting for direction rather than setting direction, this is a serious signal. A founding engineer should be pulling work, not being pushed. |
| 11-12 | Operating as a true founding team member. Can represent the technical vision to external stakeholders (investors, advisors, early users). | Has made at least one significant technical bet (architecture, tooling, infrastructure) with clear reasoning. Can articulate what the product does and why to a non-technical audience. | Endurance risk: Check for signs of burnout, frustration with ambiguity, or "this isn't what I signed up for" signals. Seed-stage reality hits around month 3. |
90-day checkpoint (formal review):
- Hire confirmation: Based on 90 days of evidence, do we confirm this was the right hire? (Y/N)
- Evidence: [Summarize strengths demonstrated, risks that materialized vs didn't, growth observed]
- If concerns remain: Define specific, measurable expectations for the next 30 days with explicit criteria for success/failure
- If concerns are serious: Have an honest conversation. It's better to part ways at 90 days than at 12 months.
Ongoing Support + Coaching
- Weekly 1:1s (30 min): Alternate between tactical (blockers, priorities) and strategic (growth, feedback, product direction). Keep a running doc.
- Monthly retrospectives: "What shipped, what didn't, what we learned, what we'd do differently." Model vulnerability by sharing your own mistakes.
- Quarterly role evolution conversation: At seed stage, the role will morph. Discuss explicitly: "Here's how I see your role evolving. What excites you? What concerns you?"
- External support (if budget allows): Connect them with a technical advisor or mentor who has been a founding engineer before. The role is isolating; external perspective helps.
Open Questions
- [List any unresolved questions that could change the recommendation. Example: "Reference #1 mentioned a conflict with a product manager at [Co] -- we were unable to get details. If the pattern is 'pushes back constructively,' that's positive for our context. If the pattern is 'undermines product decisions,' that's a red flag. Consider probing in a follow-up conversation with the candidate."]
- [Example: "Trial showed strong React skills but Python backend was more basic. Is this a depth gap or a time-allocation choice? Could probe with a targeted question."]
- [Example: "Only 2 references completed. A third reference from their most recent role would increase confidence."]
Next Steps
| Step | Owner | By When | Purpose |
|---|---|---|---|
| Complete back-channel reference checks (2) | [Hiring lead] | [Date] | Fill the longitudinal signal gap; specifically probe coachability and execution reliability |
| Score references and update the signal log | [Hiring lead] | [Date + 1 day] | Integrate reference evidence into the scorecard |
| Independent scorecard scoring (all evaluators fill in their scores before debrief) | All interviewers + trial evaluator | [Date + 2 days] | Reduce anchoring bias; capture independent signal before group discussion |
| Calibration debrief (all evaluators) | [Hiring lead facilitates] | [Date + 3 days] | Discuss disagreements, reconcile evidence, reach recommendation. Rules: no "I just had a feeling" -- evidence only. |
| Final decision | [CEO / decision owner] | [Date + 4 days] | Hire / No hire / Hold with explicit reasoning |
| If hire: extend offer | [CEO] | [Date + 5 days] | Move fast; strong candidates have other options |
| If hold: define the smallest additional signal needed | [Hiring lead] | [Date + 3 days] | Don't delay without a specific plan to resolve uncertainty |
Part 7: Quality Gate Self-Assessment
Checklist Results
A) Bar + criteria checklist:
- Criteria list is 6 items, written as observable behaviors/outcomes
- Each criterion has rating anchors (strong/weak) and evidence hints
- Red flags are explicit and behavioral (not vibes)
- "Culture" is defined as behaviors (coachability, agency, collaboration), not similarity
- Interview/trial/reference signals are mapped to criteria without double counting
B) Work sample / trial checklist:
- Task is job-relevant (build a feature prototype for an AI writing assistant)
- Scope/timebox is explicit (4 hours)
- Rubric exists with anchors across 5 dimensions
- Trial is paid
- Scenario is sanitized (no proprietary data)
- Instructions are clear enough to avoid "test-taking skill" bias
C) Reference check checklist:
- References chosen for proximity and duration (a manager or peer who shipped with the candidate)
- Questions demand specific examples and calibration against peers
- Note-taking form captures evidence, not just adjectives
- Bias/limitations are acknowledged in the summary template
- Summary produces a clear rehire signal
D) Decision memo checklist:
- Recommendation matches weighted evidence across criteria
- Disconfirming evidence section is mandatory (not cherry-picked)
- Unknowns are explicit and tied to next steps
- Risks have concrete mitigations (30/60/90 onboarding plan)
- Output respects privacy/PII constraints (no real names or sensitive data in templates)
E) Fairness + robustness checklist:
- Evidence is tied to criteria, not delivery style or confidence
- Halo effect check: explicit instruction not to let one strong signal erase multiple weak ones
- Non-traditional backgrounds handled: criteria focus on demonstrated ability, not pedigree
- Decision can be explained in one paragraph using evidence
Rubric Self-Score
| Category | Score | Notes |
|---|---|---|
| 1. Decision framing | 2 | Role, level, success outcomes, decision deadline, and signal plan are all explicit with rationale |
| 2. Criteria + bar clarity | 2 | 6 criteria with observable definitions, strong/weak anchors, explicit red flags, weighted for seed stage |
| 3. Evidence quality | 2 | Templates and log are structured for specific evidence; confidence levels required per criterion |
| 4. Work sample/trial quality | 2 | Job-relevant paid trial with 5-dimension rubric and concrete anchors |
| 5. Reference check quality | 2 | Back-channel strategy, 11-question structured script, note form with red flag checklist, bias-aware summary |
| 6. Synthesis coherence | 2 | Decision memo requires weighted evidence, disconfirming evidence, and explicit reconciliation |
| 7. Risks + mitigations | 2 | 30/60/90 onboarding plan with specific checkpoints, leading indicators, and exit criteria |
| 8. Fairness + privacy | 2 | Independent scoring before debrief, criteria locked before evaluation, no PII in shareable outputs |
| Total | 16/16 | Meets passing bar (>= 12/16) |
Appendix: How to Use This Pack
For the hiring lead:
- Before references: Share the scorecard (Part 2) with all interviewers. Have each person fill in their ratings and evidence independently -- do not discuss scores before everyone has submitted.
- Run references: Use the script in Part 5C for both back-channel references. Fill in the note form (Part 5D) during each call. Write the summary (Part 5E) within 24 hours while details are fresh.
- Update the signal log: Add reference evidence to Part 3. Recalculate the weighted score if reference signals shift any criterion ratings.
- Fill in the decision memo: Complete Part 6 with actual evidence from all sources. The disconfirming evidence section is mandatory -- skip it and you're doing "gut feel with a template."
- Run the calibration debrief: Share the completed memo with all evaluators. Discuss disagreements using evidence. The facilitator's job is to prevent the loudest voice from winning.
- Make the decision: The decision owner reviews the memo, confirms or overrides with explicit reasoning, and moves to next steps.
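The weighted-score recalculation in the steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the criterion names, weights, and the 1-4 rating scale below are hypothetical placeholders -- substitute the actual criteria, weights, and ratings from the Part 2 scorecard.

```python
# Hypothetical sketch of recalculating the weighted scorecard score after
# reference evidence shifts a criterion rating. Names, weights, and ratings
# are placeholders, not the real Part 2 values.

CRITERIA = {
    # criterion: (weight, rating on an illustrative 1-4 scale)
    "shipping_speed":   (0.25, 4),
    "agency":           (0.20, 3),
    "product_instinct": (0.20, 3),
    "ambiguity":        (0.15, 4),
    "coachability":     (0.10, 3),
    "technical_depth":  (0.10, 2),
}

def weighted_score(criteria):
    """Return the weight-normalized score across all criteria."""
    total_weight = sum(w for w, _ in criteria.values())
    return sum(w * r for w, r in criteria.values()) / total_weight

# Before references: baseline score.
baseline = weighted_score(CRITERIA)

# After references: suppose coachability evidence moves that rating from 3 to 4.
updated = dict(CRITERIA)
updated["coachability"] = (0.10, 4)
print(f"baseline {baseline:.2f} -> updated {weighted_score(updated):.2f}")
```

Keeping the recalculation this mechanical is deliberate: the debrief argues about per-criterion ratings and evidence, and the aggregate score follows from those ratings rather than being negotiated directly.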
For the founding team:
- If you hire: Start the 30/60/90 onboarding plan (Part 6, Mitigations section) on day one. Do not skip the checkpoints. The 30-day checkpoint is the most important -- it's your earliest opportunity to confirm or course-correct.
- If you don't hire: Document why (for calibration on future candidates) and re-open the search. Consider whether any criteria need adjustment based on what you learned.
- If you hold: Define the one additional signal that would resolve uncertainty (see the Next Steps table above). Do not hold indefinitely -- set a 1-week deadline for the additional signal, then decide.