Process & methodology, end to end

Methodology — the rules that govern every phase below

Before the per-phase walkthrough, this section sets out the cross-cutting commitments and disciplines that every phase honours. Read these first; the phase narratives reference them by name and assume the reader has the discipline vocabulary loaded.

Read this first · non-negotiable · supersedes every rule below

Research is the foundation. Strip it and what remains is theatre.

The methodology is downstream of research. If Research Arsenal does not run, there is no due diligence — only fabrication wearing methodology clothing.

The thirteen phases, the four-vendor Quad Review, the anti-hallucination protocol, the source-tier discipline, the triangulation matrix, the eight deliverables — every one of these operates on inputs produced by Research Arsenal. Without RA, the methodology has nothing to apply itself to. The synthesizer would produce fluent attribution-free prose; the verifier would have nothing to grep; the memo would be a structured hallucination dressed in 45 pages of formatted typography.

The product is research applied through a disciplined methodology. Not a methodology that occasionally consults research. Not a methodology that degrades gracefully when research is unavailable. The methodology is the rigorous application of research findings. Strip the research and what remains is theatre.

Therefore

Research Arsenal is a hard fail-closed gate at Phase 0. No soft skip. No fallback to local stubs. No "degraded-mode" path. If RA doesn't return status='success' with a populated run-package, the run halts before any analysis begins and the engine refuses to produce a memo. See the RA hard-blocker card below for the failure-mode taxonomy.
Every deploy must verify RA reachability before accepting traffic. A container that can't reach RA must fail its healthcheck and never serve a request. Configuration drift that disables RA must be impossible to ship.
Continuous monitoring of RA gate failures is operationally critical. A run that halts at the RA gate is the engine working correctly; a trend of RA gate halts is the platform broken. The first pages the operator; the second pages the owner.
No analyst-facing or buyer-facing surface ever ships output from a run that bypassed RA. Even if a bypass is ever needed for a synthetic test or a smoke probe, the bypass must be deterministically detectable (sentinel marker) and the resulting artefacts marked unshippable.

Methodology engine = how. Research Arsenal = what. Without "what," "how" produces nothing of value. This framing is binding on every artifact, every PR, every deploy, every monitoring decision. Locked 2026-05-24 in product spec §1.0.

The commitments

Eight things we hold ourselves to, on every deal, without exception.

Infinite-depth research as the default. Halt condition is insight-satisfaction across every material claim, not source-count satisfaction. The verifier owns the halt decision; skills do not self-terminate.
Five research dimensions, none skipped. Category · domain · customers · markets · brand. A "DD" that doesn't produce insight across all five is incomplete; the engine flags the gap rather than ship without it.
Source-tier discipline on every claim. Every numerical claim and every attribution carries one of six tags: [verified], [derived], [inferred], [benchmark], [judgment], [synthetic]. No bare prose without provenance; no source-tier inflation.
Triangulation as a binding rule. Above-MEDIUM claims require minimum source-class combinations per the per-claim-type rules. The verifier enforces the rule; analysts cannot waive it.
Adversarial review at every material claim, not only at the verdict. Six standing Quad Review checkpoints fire across every Full DD run plus per-claim review above the materiality threshold. Caution wins ties.
Falsifier on every above-MEDIUM claim. Each claim above MEDIUM confidence carries an explicit falsifier — the evidence that would disprove it. Visible in the claims ledger. Forces the analyst (and the model) to know what they don't know.
Visible rigor in every deliverable. The claims ledger, the entity register, the source-tier tags, the triangulation matrix, the Quad Review dispositions, the methodology snapshot stamp — all surface in the rendered memo. The diligence IS the product.
The DD never silently aborts. Hard blockers are a small, named set; everything else continues with an explicit "Data gap: X. Operator action: Y." block in the affected section. Outside the hard blockers, the engine always ships a memo plus its diagnostic surface. (See No blockers below for the calibrated rule.)

Research Arsenal — hard blocker on Phase 0

The engine is the use-case-11 specialisation of Research Arsenal. If RA doesn't kick off, the run halts before any analysis begins. Read the "Research is the foundation" callout above for why this is non-negotiable.

The rule. Every DD run dispatches the canonical Research Arsenal engine as the first action of Phase 0. Research Arsenal owns the research substrate; this engine is its use-case-11 specialisation. If the dispatch does not return status === 'success' with a populated run-package directory, Phase 0 fails closed and the run never advances. There is no soft skip, no fallback to local research stubs, no degraded-mode path that lets a run limp through the remaining 12 phases on stub provenance. A run that hits this halt is the engine working correctly — refusing to fabricate is the design, not an exception to it.

Why a hard blocker. Every research-grade claim downstream must trace to an RA-emitted source-tier record. The section validator already recognises research-arsenal:<run-id> and ra:<…> source markers when scoring claim provenance. If RA didn't run, every downstream phase can only produce stub-grade output, which violates two of the binding discipline contracts — visible rigor in every deliverable and no fabricated numbers. Failing fast at Phase 0 surfaces RA misconfig at the first moment instead of letting a run produce a memo with degraded provenance the operator only notices on read-through. There is no methodology without research inputs to apply the methodology to. A memo from a run that bypassed RA is structurally indistinguishable from a fabrication, and the engine refuses to ship one.

What counts as a non-success. The dispatcher returns one of eight structured statuses. Only success with a run-package directory progresses. The other seven all halt:

unconfigured — RESEARCH_ARSENAL_RUNS_DIR (and optionally RESEARCH_ARSENAL_PLUGIN_PATH / OCC_WORKSPACE_PATH / RESEARCH_ARSENAL_CMD) is not set in the engine env. This is the first thing to check in a fresh deploy.
timeout — hard cap reached without success sentinel. The run is not allowed to advance with partial RA output.
stale-no-heartbeat — subprocess alive but emitted no heartbeat within tolerance; assumed hung.
subprocess-error — RA subprocess exited with non-zero status or otherwise errored.
no-completion-sentinel — subprocess exited cleanly but never emitted the completion sentinel line on stdout.
run-package-missing — subprocess reported success but did not write the run-package at the expected path.
failure-sentinel-emitted — RA itself signalled it cannot produce a usable run-package.

What operators see on a halted run. The Phase 0 row in the engine's audit table records the failure with the dispatch status and the operator-actionable remediation string. The memo is not rendered. The run shows up as verdict=null, outcome=FAILED with a gateReason that names the missing env var or the failing dispatch status. Re-running is safe — the gate is idempotent and retries cleanly once the underlying configuration is fixed.
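The gate decision described above can be sketched as a single pure function. This is a minimal illustration, not the code in research-arsenal-gate.ts: the status strings, verdict=null, outcome=FAILED and gateReason come from the text, while the type names, the evaluateRaGate function and the remediation field are hypothetical.

```typescript
// Sketch of the Phase 0 fail-closed decision. Type and function names are
// hypothetical; only the eight status strings come from the text above.
type RaDispatchStatus =
  | "success"
  | "unconfigured"
  | "timeout"
  | "stale-no-heartbeat"
  | "subprocess-error"
  | "no-completion-sentinel"
  | "run-package-missing"
  | "failure-sentinel-emitted";

interface RaDispatchResult {
  status: RaDispatchStatus;
  runPackageDir?: string; // populated only when RA wrote a run-package
  remediation?: string;   // operator-actionable hint (assumed field)
}

type RaGateDecision =
  | { advance: true; runPackageDir: string }
  | { advance: false; verdict: null; outcome: "FAILED"; gateReason: string };

function evaluateRaGate(dispatch: RaDispatchResult): RaGateDecision {
  // Only success WITH a populated run-package directory progresses.
  if (dispatch.status === "success" && dispatch.runPackageDir) {
    return { advance: true, runPackageDir: dispatch.runPackageDir };
  }
  // A bare "success" without a run-package is a halt, never a soft pass.
  const reason =
    dispatch.status === "success" ? "run-package-missing" : dispatch.status;
  return {
    advance: false,
    verdict: null,
    outcome: "FAILED",
    gateReason: dispatch.remediation ? `${reason}: ${dispatch.remediation}` : reason,
  };
}
```

Because the function has no side effects, calling it again after the configuration is fixed gives a clean retry, which matches the idempotence the text describes.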

Locked 2026-05-24. This reverses the prior "no-hard-gates" design that let runs continue when RA was unavailable. The implementation lives at app/src/orchestrator/research-arsenal-gate.ts and is wired into phase0_initialise at app/src/orchestrator/phases.ts. The boundary contract with RA is at Due-Diligence-Engine/RA-Boundary-Reconciliation_2026-05-23.md.

Research depth — five dimensions, no stopping until each produces insight

Every deal carries five dimensions the engine researches against. A skipped dimension is a flagged gap, not silent absence.

Category — what does this product class look like? Margins, channel mix, regulatory exposure, comparable transactions, growth trajectory, demand seasonality.
Domain — what's the technical infrastructure? Tech stack, integrations, data lineage, operational dependencies, customer-support load.
Customers — who buys this? Cohort retention, repeat rate, AOV distribution, geographic concentration, channel attribution, organic vs paid mix.
Markets — where does this sell? Geographic footprint, regulatory regimes, channel maturity, competitive landscape per market, FX exposure.
Brand — what does the public think? Reputation across review surfaces (Trustpilot, Google, Reddit, news, locale-matched aggregators), IP exposure, third-party trademark posture, social signal.

Halt condition. Insight-satisfaction across all five dimensions. The verifier counts insight-qualifying claims per dimension; a count below the methodology-pinned floor surfaces a gap. "Infinite depth" means the engine doesn't stop because of source-count quotas; it stops when every material claim has an insight-grade attribution.

Source-tier discipline — every claim carries one of six tags, assigned deterministically

Six tiers, not five. Amended 2026-05-24 (Quad Review RF-M3). Tier is derived from the ingestion path — the LLM does not get to self-tag.

Tag	Meaning	Authorising ingestion path
`[verified]`	Pulled directly from a platform-of-record (Shopify GL, Stripe transactions, KvK registry, Companies House filing, USPTO TSDR, EUIPO TMview). Cannot be inflated.	`platform-of-record:` · `registry:`
`[derived]`	Computed deterministically from verified inputs (AOV = revenue ÷ orders, EBITDA from line-item sum, IRR from cash-flow schedule). Computation block in the ledger; a recompute must reproduce the value.	`computation:deterministic` · `parser:*` · `computation:fx-normalisation` · `computation:phase6-math-verifier`
`[inferred]`	Reasonable conclusion from indirect signals (channel mix inferred from UTM coverage; seller-exported number pre-reconciliation). Inference logic shown.	`inference:*`
`[benchmark]`	Comparison against a comp-set or industry reference; the benchmark cited.	`benchmark:pitchbook` · `benchmark:flippa` · `benchmark:internal-comps` · `benchmark:industry-report`
`[judgment]`	Analyst or model judgment with no harder source. Falsifier mandatory above MEDIUM confidence (confidence ≥ 4/5).	`judgment:analyst` · `judgment:model-assessment`
`[synthetic]`	Generated by an AI model without direct source grounding (TAM estimates, scenario probabilities, narrative reasoning). Cannot be promoted to `[verified]` or `[derived]` without a primary source landing in a subsequent phase. Falsifier mandatory regardless of confidence. Cross-vendor corroboration (Quad Review agreement on the same value within tolerance) required.	`synthetic:llm-output` · `synthetic:tam-estimate` · `synthetic:scenario-probability` · `synthetic:narrative-reasoning`

Tier is mechanical, not analyst discretion, AND not LLM self-classification. The Quad Review on 2026-05-24 caught the prior tier system's two architectural gaps: (1) no tier existed for LLM-generated claims, so engine outputs got mis-tagged as [judgment] or [inferred], inflating their apparent reliability; (2) tier assignment was a self-reported **Tier:** field in the ledger that the parser took on trust. The fix on both counts is the same: the tier comes from the ingestion path (which engine module produced the claim) via deriveTierFromIngestion() in app/src/orchestrator/source-tier.ts; the ledger parser cross-checks the declared tier against the deterministic value and surfaces a tier-mismatch violation when they disagree.
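A rough sketch of how a tier can be derived from the ingestion path and cross-checked against the declared tier. It is not the actual deriveTierFromIngestion(): matching every path by prefix is a simplification assumed here, and the names tierFromIngestionPath, LedgerEntrySketch and checkDeclaredTier are hypothetical. The prefixes and the tier-mismatch label come from the table and text above.

```typescript
type SourceTier =
  | "verified" | "derived" | "inferred" | "benchmark" | "judgment" | "synthetic";

// Prefix rules follow the "Authorising ingestion path" column.
// Prefix matching is an assumption of this sketch.
const TIER_BY_PREFIX: ReadonlyArray<[string, SourceTier]> = [
  ["platform-of-record:", "verified"],
  ["registry:", "verified"],
  ["computation:", "derived"],
  ["parser:", "derived"],
  ["inference:", "inferred"],
  ["benchmark:", "benchmark"],
  ["judgment:", "judgment"],
  ["synthetic:", "synthetic"],
];

function tierFromIngestionPath(path: string): SourceTier | null {
  const match = TIER_BY_PREFIX.filter(([prefix]) => path.indexOf(prefix) === 0)[0];
  return match ? match[1] : null; // unknown path: no tier is authorised
}

interface LedgerEntrySketch {
  claimId: string;
  declaredTier: SourceTier; // what the ledger text says
  ingestionPath: string;    // which engine module produced the claim
}

function checkDeclaredTier(entry: LedgerEntrySketch): {
  claimId: string;
  derivedTier: SourceTier | null;
  violation: "tier-mismatch" | null;
} {
  const derivedTier = tierFromIngestionPath(entry.ingestionPath);
  // The declared tier is never trusted: any disagreement is a violation.
  const violation = derivedTier === entry.declaredTier ? null : "tier-mismatch";
  return { claimId: entry.claimId, derivedTier, violation };
}
```

For example, a TAM estimate ingested via synthetic:tam-estimate but declared as [judgment] would surface as tier-mismatch, which is the first of the two gaps the Quad Review identified.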

Mixed-input derivations follow the weakest-input-wins rule: a [derived] claim that mixes a verified platform pull with an inferred seller assertion is tagged [inferred], not [derived]. The only exception is pure currency / time normalisation from a verified source — it retains [derived] because the operation is reversible and the verified upstream is referenceable. The deriveTierForMixedInputs() helper implements the rule.
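The weakest-input-wins rule can be illustrated as follows. This is a sketch under stated assumptions, not the deriveTierForMixedInputs() helper itself: the text does not publish a full strength ranking of the six tiers, so the ordering in STRENGTH_ORDER is assumed, and the normalisationOfVerifiedSource flag is a hypothetical stand-in for however the engine recognises a pure currency / time normalisation.

```typescript
type MixTier =
  | "verified" | "derived" | "inferred" | "benchmark" | "judgment" | "synthetic";

// ASSUMED ranking, strongest first. Only "inferred is weaker than verified"
// is confirmed by the example in the text.
const STRENGTH_ORDER: readonly MixTier[] = [
  "verified", "derived", "inferred", "benchmark", "judgment", "synthetic",
];

function weakestTier(inputs: readonly MixTier[]): MixTier {
  if (inputs.length === 0) throw new Error("a derivation needs at least one input");
  return inputs.reduce((weakest, tier) =>
    STRENGTH_ORDER.indexOf(tier) > STRENGTH_ORDER.indexOf(weakest) ? tier : weakest
  );
}

function tierForComputedClaim(
  inputs: readonly MixTier[],
  options: { normalisationOfVerifiedSource?: boolean } = {}
): MixTier {
  // Exception from the text: pure currency / time normalisation of a verified
  // source keeps [derived] because the operation is reversible.
  if (options.normalisationOfVerifiedSource) return "derived";
  const weakest = weakestTier(inputs);
  // A computed value is never stronger than [derived], even from verified inputs.
  return weakest === "verified" ? "derived" : weakest;
}
```

With these assumptions, a verified platform pull combined with an inferred seller assertion yields [inferred], matching the example in the paragraph above.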

Above-MEDIUM confidence is now defined numerically as confidence ≥ 4/5 (see triangulation for where the threshold binds). Source-tier inflation (calling a seller-supplied number [verified]) fires a close cousin of anti-hallucination Rule 8: the ledger writer's deterministic tier derivation cross-checks the declared tier and rejects the mismatch.

Triangulation — the 10 claim-type rules, specified

The Quad Review (RF-M2) found the prior rules were placeholders. Amended 2026-05-24 with the binding source-class matrix below.

The triangulation matrix pins the minimum source-class combinations required to clear any above-MEDIUM claim (confidence ≥ 4/5). The matrix surfaces on the deal page next to the claims ledger so the analyst sees which claims are triangulated and which aren't. Implementation at app/src/orchestrator/triangulation-matrix.ts; checkTriangulation() is the deterministic evaluator.

Inherently single-source exception. One claim type — cohort-retention — accepts a single-source pass because the data shape (a platform-pulled cohort export) cannot meaningfully be triangulated against a second independent source. There is no "independent second cohort export" because the cohort IS the export. The methodology preserves the rule for completeness but the matrix table flags this as a documented exception; cohort-retention claims are still required to carry a falsifier above MEDIUM confidence per the source-tier discipline, but the triangulation count is 1 by design.

Claim type	Min source classes	Accepted combinations (any one satisfies)	Banned pseudo-triangulation
regulatory	2 (independent)	registry + regulatory-authority · registry + seller-document	news article citing registry ≡ registry (collapses to 1)
financial-attribution	2 (independent)	platform-of-record + reconciled GL · two independent platform-of-record exports	two seller-export variants ≡ one source class
reputation	2 (independent)	two independent reputation surfaces · high-volume reputation + news/social corpus	news article citing Trustpilot ≡ Trustpilot (collapses to 1)
IP	2 (independent)	USPTO/EUIPO registry + seller assignment contract · domestic IP registry + Madrid/WIPO record	—
supplier-identification	2 (independent)	corporate registry filing + PO/invoice in GL · corporate registry filing + supplier-DD questionnaire response	seller-asserted supplier identity + supplier-DD response from same supplier ≡ supplier-DD (collapses to 1)
comparable-transaction	3 (independent)	PitchBook + Flippa + internal-comp · three of: PitchBook / Flippa / internal-comp / press-released transaction	—
scenario-probability	2 (independent)	historical pattern (derived) + industry benchmark · cross-vendor Quad Review agreement + industry benchmark	—
channel-attribution	2 (independent)	UTM coverage + order-attribution model · GA4 verified + ads-platform verified	—
cohort-retention	1	platform-pulled cohort export (single-source acceptable — the data shape can't be triangulated against itself)	—
benchmark	2 (independent)	published comp + adjusted-for-this-deal calculation	—

Upstream-lineage collapse. The matrix evaluator (collapseUpstreamLineage()) checks the banned-pseudo-triangulation patterns BEFORE counting source classes. Two sources that share a common upstream (news article citing Trustpilot; two variants of a seller export; seller-asserted supplier identity + supplier-DD response from the same supplier) collapse into one class for the purpose of meeting the minimum. The Quad Review caught this: pseudo-triangulation is the most common way a deal "passes" triangulation while actually relying on a single source.
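The collapse-then-count order can be sketched like this. It is a simplified illustration, not the real collapseUpstreamLineage() or checkTriangulation(): it assumes each source carries a lineageRoot naming its ultimate upstream (a hypothetical field), and it only checks the minimum count, not the accepted-combination column of the matrix.

```typescript
interface EvidenceSource {
  label: string;       // human-readable description of the source
  lineageRoot: string; // ultimate upstream, e.g. "trustpilot" (assumed field)
}

// Collapse first: sources sharing an upstream count once.
function countIndependentLineages(sources: readonly EvidenceSource[]): number {
  const roots = sources.map((s) => s.lineageRoot);
  return roots.filter((root, i) => roots.indexOf(root) === i).length;
}

// Then compare against the claim type's minimum (2 for most, 3 for
// comparable-transaction, 1 for cohort-retention).
function meetsTriangulationMinimum(
  sources: readonly EvidenceSource[],
  minSourceClasses: number
): boolean {
  return countIndependentLineages(sources) >= minSourceClasses;
}

const pseudoTriangulated: EvidenceSource[] = [
  { label: "Trustpilot profile", lineageRoot: "trustpilot" },
  { label: "News article citing Trustpilot", lineageRoot: "trustpilot" },
];
const triangulated: EvidenceSource[] = [
  { label: "Trustpilot profile", lineageRoot: "trustpilot" },
  { label: "Google reviews", lineageRoot: "google-reviews" },
];
```

Here pseudoTriangulated collapses to one lineage and fails a minimum of 2, while triangulated passes.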

Hard contradictions across sources (the seller says 5% repeat rate; the Klaviyo export implies 21%) surface as contradiction-detected in the ledger. The contradiction is never hidden — it surfaces in the memo's executive summary, not as an appendix footnote.

Sense-check matrix — the four dimensions that gate every memo section

Dimension	What it checks
Language	Banned vocabulary scan (engine jargon · methodology version strings · code identifiers in user-facing prose). Per project rule 7 — no methodology jargon in sales-side artefacts.
Breadth	Coverage spec per section. Each memo section has a methodology-pinned minimum sub-claim count and required source classes. Below the floor surfaces a structural failure-mode block instead of fabricated content.
Depth	Per-claim adversarial review above the materiality threshold. Quad Review fires on any claim with ≥10% impact on valuation, any binary verdict input, any contradiction-detected metric.
Decisions	Cross-layer reconciliation. Phase 7 verdict and Phase 8 HubSpot recall both read the same prior-phase outputs; mismatch surfaces. Phase 12 memo verdict review fires before the memo ships.

Cross-layer contradiction detection. A claim asserted in Section 3 must reconcile with the same claim asserted in Section 7 (or appendices). Phase 11 verifier surfaces drift as a sense-check violation; the methodology checkers fire on every synthesised section, not only in CI.

Quad Review — four independent reviewers, six standing checkpoints

Four model families review every memo in parallel, never seeing each other's verdicts before submitting their own:

Vendor	Model	Role
Anthropic	Claude Opus 4.7	Analytical depth + adversarial reasoning. Lead on financial reconstruction and verdict synthesis.
OpenAI (via OpenRouter)	openai/gpt-5.5	Breadth-of-source synthesis + reputation-corpus reading. Lead on review-aggregator pattern detection. Routed through OpenRouter as of 2026-05-22.
Google	Gemini Flash 3.5	Multilingual fluency + large-context document review. Lead on non-English data rooms and site-source inspection.
DeepSeek	DeepSeek V4 Pro	Cost-efficient breadth coverage + adversarial counterpoint. Lead on volumetric web research and benchmark cross-checks.

Each returns SUSTAINED / DISMISSED / MODIFIED per charge plus an overall disposition (UPHELD / RESTRUCTURED / OVERRULED / UNAVAILABLE). Caution wins ties: any single OVERRULED + RESTRUCTURED pair forces a re-pass.
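One way to picture the aggregation of the four overall dispositions is sketched below. The re-pass condition (an OVERRULED and a RESTRUCTURED together) is from the text; everything else is an assumption made for illustration: the plurality tally, the caution ranking OVERRULED > RESTRUCTURED > UPHELD used to break ties, and the names aggregateQuad and CAUTION_ORDER.

```typescript
type QuadDisposition = "UPHELD" | "RESTRUCTURED" | "OVERRULED" | "UNAVAILABLE";
type AvailableDisposition = Exclude<QuadDisposition, "UNAVAILABLE">;

// ASSUMED ranking, most cautious first.
const CAUTION_ORDER: readonly AvailableDisposition[] = [
  "OVERRULED", "RESTRUCTURED", "UPHELD",
];

function aggregateQuad(votes: readonly QuadDisposition[]): {
  overall: QuadDisposition;
  rePass: boolean;
  availableCount: number;
} {
  const available = votes.filter(
    (v): v is AvailableDisposition => v !== "UNAVAILABLE"
  );
  // From the text: any OVERRULED + RESTRUCTURED pair forces a re-pass.
  const rePass =
    available.some((v) => v === "OVERRULED") &&
    available.some((v) => v === "RESTRUCTURED");

  let overall: QuadDisposition = "UNAVAILABLE";
  let bestCount = 0;
  // Walk from most to least cautious; the strict ">" keeps the more
  // cautious disposition when counts tie.
  for (const disposition of CAUTION_ORDER) {
    const count = available.filter((v) => v === disposition).length;
    if (count > bestCount) {
      bestCount = count;
      overall = disposition;
    }
  }
  return { overall, rePass, availableCount: available.length };
}
```

Under these assumptions a 2-2 split between UPHELD and RESTRUCTURED resolves to RESTRUCTURED, and a run with no reachable vendors reports UNAVAILABLE rather than a default pass.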

Six standing checkpoints fire on every Full DD run:

After intake (Phase 0.5) — chain-of-custody on the ingested data room.
After financial reconstruction (Phase 2.5) — model integrity, reconciliation, source-tier, currency normalisation, cohort plausibility, marketplace coverage, multi-entity handling, downstream-schema compliance.
After the ten-layer structural review (Phase 4.5) — layer coverage, severity calibration, IP-clearance firing, deferred-awaiting contract, category-knowledge grounding, gate arithmetic, override application, downstream handshake.
After the reputation scan (Phase 5.5) — surface coverage, auto-flag firing under asymmetric-risk bias, integrity calibration, spike detection, GDPR locale discipline, cross-phase comparator, templated-language grounding, downstream handshake.
After verdict assembly (Phase 7.5) — terminal-verdict arithmetic reconstructability, reason-code principledness, charges grounding, rationale-prose ledger preservation, scenario LLM-judgment fold-in, pre-mortem shape, cross-deal recall agreement, downstream handshake.
After memo render (Phase 12) — chain-of-finding integrity, claim-citation discipline, entity-register coverage, quote fidelity, hedge tolerances, contradiction surfacing, offer derivation, source-tier discipline.

Phase 6 is deliberately absent from the standing checkpoints. Phase 6 produces arithmetic (IRR · MOIC · sensitivity grid · bear-case severity) — LLM judgment is the wrong tool for arithmetic. A deterministic Newton-Raphson recompute (verifyPhase6Math) checks Phase 6's math; the LLM-judgment portions of Phase 6 (bear-case narrative, flip-viability framing) fold into Phase 7.5's charge sheet instead.

How the verdict is assembled — rules-canonical, prose-only LLM

The verdict is decided by deterministic arithmetic, not by a language model. A verdict is the engine's most consequential output — it flows to the CRM, to the buyer, to the deal closing or not. Asking a language model to decide the verdict creates a class of failure no review can fully catch: a confidently-wrong call grounded in plausible-sounding rationale. The methodology takes the decision out of the model's hands.

Deferred-criterion backfills (retention vs comp · reputation classification) follow methodology-pinned thresholds (verdict-aggregation-rules_2026-05-21.md).
Phase-4 layer finalisations follow methodology-pinned mappings — Phase 6 bear-case severity → Layer 5 verdict; Phase 5 auto-flag fire-count → Layer 9 verdict.
HARD_PASS triggers are a priority-ordered enumeration; CONDITIONAL triggers are an explicit set; PASS is the default when no trigger fires.
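The three rules above reduce to a short deterministic function. The sketch below shows the shape only: the trigger codes in the usage are invented placeholders, and assembleVerdictSketch and VerdictTrigger are hypothetical names; the real thresholds and mappings live in verdict-aggregation-rules_2026-05-21.md.

```typescript
type EngineVerdict = "HARD_PASS" | "CONDITIONAL" | "PASS";

interface VerdictTrigger {
  code: string;   // reason code recorded with the verdict
  fired: boolean; // result of the deterministic rule, computed upstream
}

function assembleVerdictSketch(
  hardPassInPriorityOrder: readonly VerdictTrigger[],
  conditionalSet: readonly VerdictTrigger[]
): { verdict: EngineVerdict; reasonCodes: string[] } {
  // HARD_PASS: the first fired trigger in priority order decides.
  const firstHard = hardPassInPriorityOrder.filter((t) => t.fired)[0];
  if (firstHard) {
    return { verdict: "HARD_PASS", reasonCodes: [firstHard.code] };
  }
  // CONDITIONAL: any fired member of the explicit set; all fired codes are kept.
  const conditionalCodes = conditionalSet.filter((t) => t.fired).map((t) => t.code);
  if (conditionalCodes.length > 0) {
    return { verdict: "CONDITIONAL", reasonCodes: conditionalCodes };
  }
  // PASS: the default when no trigger fires.
  return { verdict: "PASS", reasonCodes: [] };
}

// Usage with placeholder codes (not real engine reason codes).
const exampleVerdict = assembleVerdictSketch(
  [{ code: "HP-EXAMPLE-1", fired: false }, { code: "HP-EXAMPLE-2", fired: false }],
  [{ code: "C-EXAMPLE-1", fired: true }, { code: "C-EXAMPLE-2", fired: false }]
);
```

No language model appears anywhere in this path, which is the point of the design: the verdict can be reconstructed from the trigger inputs alone.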

The language model is reserved for rationale prose only — the English explanation that accompanies the verdict in the memo. The prompt instructs the model to explain the assembled verdict, not to evaluate whether the engine was correct. A produced prose paragraph that proposes a different verdict is rejected; the engine retries with a stricter prompt; second-pass disagreement halts and surfaces for partner arbitration. Full rules: verdict-assembly-prose-discipline_2026-05-22.md.

The pre-mortem the partner reads next. Five risks. Four categories. Eighteen months. After the engine assembles a PASS or CONDITIONAL verdict, the BYO pre-mortem skill asks the inverse: assume this deal turns out badly in 18 months — what were the five most likely reasons? Exactly five risks (forced tradeoff); ≥3 of 4 categories represented (operational / market / financial / governance); each risk ties to a contributing charge from the verdict. A second vendor runs the same prompt as a cross-check; complementary risks the primary missed are appended. Full rules: byo-pre-mortem-rules_2026-05-22.md.
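The three shape constraints on the pre-mortem (exactly five risks, at least three of the four categories, every risk tied to a contributing charge) are easy to check mechanically. A minimal sketch, with hypothetical type and function names:

```typescript
type RiskCategory = "operational" | "market" | "financial" | "governance";

interface PreMortemRisk {
  title: string;
  category: RiskCategory;
  chargeId: string; // the verdict charge this risk ties back to
}

function validatePreMortem(
  risks: readonly PreMortemRisk[],
  verdictChargeIds: readonly string[]
): string[] {
  const violations: string[] = [];

  // Exactly five risks: the forced tradeoff.
  if (risks.length !== 5) {
    violations.push(`expected exactly 5 risks, got ${risks.length}`);
  }

  // At least three of the four categories represented.
  const categories = risks.map((r) => r.category);
  const distinct = categories.filter((c, i) => categories.indexOf(c) === i);
  if (distinct.length < 3) {
    violations.push(`only ${distinct.length} of 4 categories represented`);
  }

  // Every risk must tie to a charge that exists on the verdict.
  risks
    .filter((r) => verdictChargeIds.indexOf(r.chargeId) === -1)
    .forEach((r) => violations.push(`risk "${r.title}" has no matching charge`));

  return violations;
}
```

An empty array means the pre-mortem has the required shape; it says nothing about whether the risks themselves are well chosen.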

Math gets a math checker. Phase 6 builds the three-scenario LBO + sensitivity grid. The math is checked by a deterministic recompute (verifyPhase6Math), not a language model. Newton-Raphson IRR solver; stated IRR compared to recompute within 0.5pp tolerance; drift surfaces as a structured warning on the run record. Arithmetic gets the arithmetic check.
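For readers who want to see what a deterministic IRR recompute looks like, here is a small Newton-Raphson sketch with the 0.5pp comparison. It is not verifyPhase6Math: the function names, the starting guess, the iteration cap, the convergence epsilon and the convention that rates are fractions (0.25 = 25%) are all assumptions of this example.

```typescript
// NPV of cash flows indexed by period t = 0, 1, 2, ...
function npvAt(rate: number, cashFlows: readonly number[]): number {
  return cashFlows.reduce((sum, cf, t) => sum + cf / Math.pow(1 + rate, t), 0);
}

// d(NPV)/d(rate), needed for the Newton-Raphson step.
function npvSlopeAt(rate: number, cashFlows: readonly number[]): number {
  return cashFlows.reduce(
    (sum, cf, t) => sum - (t * cf) / Math.pow(1 + rate, t + 1),
    0
  );
}

function solveIrr(
  cashFlows: readonly number[],
  guess = 0.1,
  maxIterations = 100,
  epsilon = 1e-9
): number | null {
  let rate = guess;
  for (let i = 0; i < maxIterations; i++) {
    const value = npvAt(rate, cashFlows);
    if (Math.abs(value) < epsilon) return rate; // converged
    const slope = npvSlopeAt(rate, cashFlows);
    if (slope === 0) return null; // flat tangent: cannot step
    const next = rate - value / slope;
    if (!isFinite(next) || next <= -1) return null; // left the valid domain
    rate = next;
  }
  return null; // did not converge within the cap
}

interface IrrCheck {
  ok: boolean;
  recomputed: number | null;
  driftPp: number | null; // absolute drift in percentage points
}

function checkStatedIrr(
  statedIrr: number,
  cashFlows: readonly number[],
  tolerancePp = 0.5
): IrrCheck {
  const recomputed = solveIrr(cashFlows);
  if (recomputed === null) return { ok: false, recomputed: null, driftPp: null };
  const driftPp = Math.abs(statedIrr - recomputed) * 100;
  return { ok: driftPp <= tolerancePp, recomputed, driftPp };
}
```

As a usage example, cash flows of [-100, 110] have an IRR of 10%, so a stated IRR of 0.104 is within tolerance and a stated IRR of 0.12 is flagged. A non-converging solve is reported as a failed check rather than silently passed.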

Anti-hallucination protocol — nine rules; four of them are hard fail-closed gates

Amended 2026-05-24 (Quad Review RF-M1). The prior "all rules surface as diagnostics; the run still ships" framing was the methodology's most serious weakness. Rules 3, 6, 7, and the new Rule 9 are now hard gates: their failure blocks the memo from rendering.

A methodology that lives in a document but not in code is rhetoric. These nine binding rules are deterministic verifier functions in the engine — code, not LLM judgment. They run on every Full DD memo (phase11_verifier) and surface violations on the deal page. Five of the rules (R1, R2, R4, R5, R8) degrade gracefully when their inputs aren't present and surface as diagnostics. Four (R3, R6, R7, R9) are hard fail-closed gates — when they fail, phase11_verifier sets hardGateBlocking on the protocol result and the memo-renderer refuses to ship the deliverable.

Rule	What it checks	Enforcement
1. Re-computable numerical claims	Every stated numerical claim with a derivation basis is re-executed; mismatch beyond tolerance fires.	diagnostic
2. Named-party register resolution	Every named entity in the memo prose must resolve to an entry in the entity register, with fuzzy-name matching.	diagnostic
3. Claim-to-ledger trace	Every prose claim has a claim-ID anchor; every claim-ID resolves to a ledger entry; orphan claims fire.	HARD GATE
4. Tolerance bands on hedge-word claims	"Approximately" / "around" / "~" adjacent to a verified or derived citation is a calibration error; inferred / benchmark citations require a declared tolerance.	diagnostic
5. Stub-output detection	Any phase output bearing a placeholder marker surfaces as a sense-check violation rather than as analytical success.	diagnostic
6. Synthesizer prose preserves ledger	Numerical values in prose match the ledger within 1% tolerance; entity references match the register; misjoined source-and-value attribution fires.	HARD GATE
7. Grep-verified quotes	Direct quotes (verbatim text in double quotes) must grep-verify against the cited source; no-match fires.	HARD GATE
8. Coverage-spec compliance per section	Each memo section has a methodology-pinned floor for sub-claim count and source classes; below-floor surfaces a failure-mode block.	diagnostic
9. Immutable-anchor coverage (NEW)	Every above-MEDIUM claim (confidence ≥ 4/5) MUST reference at least one VERIFIED immutable anchor — raw API response hash, third-party registry record, file content hash, or signed platform export. The verifier is deterministic code: hash match, registry-ID format check, freshness window. The LLM does not get to certify anchors.	HARD GATE
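The split between diagnostic rules and hard gates in the table above could be summarised in code roughly as follows. The hardGateBlocking name and the rule numbers come from the text; RuleOutcome, summariseProtocol and the result shape are hypothetical.

```typescript
// Rules 3, 6, 7 and 9 are the hard fail-closed gates per the table.
const HARD_GATE_RULES: readonly number[] = [3, 6, 7, 9];

interface RuleOutcome {
  rule: number;    // 1 through 9
  passed: boolean; // result of the deterministic verifier for that rule
}

function summariseProtocol(outcomes: readonly RuleOutcome[]): {
  hardGateBlocking: boolean;
  blockingRules: number[];
  diagnosticRules: number[];
} {
  const failed = outcomes.filter((o) => !o.passed).map((o) => o.rule);
  const blockingRules = failed.filter((r) => HARD_GATE_RULES.indexOf(r) !== -1);
  const diagnosticRules = failed.filter((r) => HARD_GATE_RULES.indexOf(r) === -1);
  return {
    // Any failed hard gate blocks the memo; diagnostics never do.
    hardGateBlocking: blockingRules.length > 0,
    blockingRules,
    diagnosticRules,
  };
}
```

A run that fails only Rules 1 and 4 still renders with two diagnostics on the deal page; a run that fails Rule 7 does not render at all.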

Why Rule 9 is the structural anti-fabrication mechanism. The 2026-05-24 Quad Review caught that the prior protocol was structurally circular: Rule 6 checked memo prose against the claims ledger, but the ledger was produced by the same LLM pipeline as the memo. A synthesizer that fabricated a number could write it to BOTH the prose and the ledger and pass Rule 6. The peer-review correction (all four reviewers converged) was that the fix needed an immutable, independently-sourced anchor whose value the LLM pipeline cannot have produced — not just a cross-file reference within the same pipeline. Rule 9 is that anchor check. Implementation at app/src/orchestrator/immutable-anchor.ts; the four legitimate anchor kinds and their verifiers are defined there.

The four anchor kinds:

api-response-hash — raw response body from an external API (Shopify, Stripe, GA4, etc.), stored at ingestion with a SHA-256 of the body bytes. Verifier confirms the hash matches the stored response.
registry-record — a record from a third-party registry (USPTO TSDR, Companies House, KvK, EUIPO TMview), keyed by registry-issued ID + retrieval timestamp. Verifier validates the ID format against the registry's canonical pattern and rejects records older than 90 days.
file-content-hash — an uploaded document (PDF, XLSX, image) stored at ingestion with a SHA-256 of the file bytes. Verifier re-hashes the stored file and compares. Security caveat: the hash defends against post-ingestion tampering only — it does not prove the file's content was externally sourced. The engine's LLM pipeline can write a file and the hash would still match. This is the weakest of the four anchor kinds. Financial / regulatory claims at above-MEDIUM confidence MUST also carry a registry-record or api-response-hash anchor; file-content-hash cannot be the sole anchor for those claim types.
signed-platform-export — a platform export with a vendor-issued signature. Reserved for future use when a platform ships signed exports.
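Two of the anchor rules above lend themselves to a compact illustration: the 90-day freshness window on registry records, and the rule that file-content-hash cannot be the sole anchor for financial or regulatory claims. The sketch below is not the code in immutable-anchor.ts; AnchorRef, its verified flag (standing in for the hash or ID-format check) and the function names are hypothetical.

```typescript
type AnchorKind =
  | "api-response-hash"
  | "registry-record"
  | "file-content-hash"
  | "signed-platform-export";

interface AnchorRef {
  kind: AnchorKind;
  verified: boolean;  // outcome of the hash / ID-format check (assumed upstream)
  retrievedAt?: Date; // required for registry-record freshness
}

const REGISTRY_MAX_AGE_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isRegistryRecordFresh(retrievedAt: Date, now: Date): boolean {
  const ageDays = (now.getTime() - retrievedAt.getTime()) / MS_PER_DAY;
  return ageDays <= REGISTRY_MAX_AGE_DAYS;
}

function anchorsSatisfyRule9(
  claimType: string,
  anchors: readonly AnchorRef[],
  now: Date
): boolean {
  // Keep only verified anchors; registry records must also be fresh.
  const usable = anchors.filter(
    (a) =>
      a.verified &&
      (a.kind !== "registry-record" ||
        (a.retrievedAt !== undefined && isRegistryRecordFresh(a.retrievedAt, now)))
  );
  if (usable.length === 0) return false;

  // Financial / regulatory claims need a registry-record or api-response-hash.
  const strict = claimType === "financial" || claimType === "regulatory";
  if (!strict) return true;
  return usable.some(
    (a) => a.kind === "registry-record" || a.kind === "api-response-hash"
  );
}
```

The function assumes it is only called for claims at confidence ≥ 4/5, since Rule 9 binds only above MEDIUM.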

Shape-level methodology checkers run on every synthesised section in addition to the nine rules: failure-mode-block-engaged-when-required, citation-density floors, source-class coverage per section, hard-contradiction surfacing in the executive summary. The full set surfaces on the deal page next to the synthesis summary on every real run, not just in CI.

LLM-output coercion contract. The engine consumes structured LLM output through tolerant Zod preprocessors. When an LLM emits a small object inside a field the schema declares as string (e.g. {risk: "...", severity: "high"} in a risks array), the preprocessor extracts the readable text from a priority key list before falling back to a JSON render. The engine never ships the literal string "[object Object]" in any deliverable — that string is a code-bug signature, not legitimate content. Locked 2026-05-27 (B5) after the Penta panel found 20 instances of "[object Object]" across the thesis-tracker and deal-tracker deliverables.
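The coercion step can be sketched as a plain function of the kind one would hand to Zod's z.preprocess in front of a string schema. Zod itself is left out so the example stays dependency-free. The priority key list is an assumption: only the risk key is suggested by the example in the text, and the function name is hypothetical.

```typescript
// ASSUMED priority list; only "risk" is grounded in the example above.
const READABLE_KEYS: readonly string[] = [
  "text", "risk", "description", "summary", "title",
];

function coerceToReadableText(value: unknown): string {
  if (typeof value === "string") return value;
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    // First non-empty string under a priority key wins.
    for (const key of READABLE_KEYS) {
      const candidate = record[key];
      if (typeof candidate === "string" && candidate.trim() !== "") {
        return candidate;
      }
    }
    // Fallback: a JSON render, never the default "[object Object]".
    return JSON.stringify(value);
  }
  return String(value);
}
```

For instance, coerceToReadableText({ risk: "Supplier concentration", severity: "high" }) returns the risk text, while an object with none of the priority keys is rendered as JSON.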

Substantive-content gate on memo tables. A row that exists but carries no analytical signal (a reconciliation row with glRevenue=0 and classification='missing-channel', a comparable transaction with em-dashes for revenue/multiple/date, a ten-layer review entry whose source field is empty, a HackerNews thread with a blank title) is treated as fabricated content if it ships as a table row. The memo renderer's table-or-data-gap decision now requires at least one row to pass a section-specific substantive-content predicate. When zero rows pass, the renderer emits the canonical what's missing · what resolves it data-gap block instead. When some columns are dead while others aren't, the dead column is dropped rather than shipped blank. A table of dead rows is a fabrication; an honest gap-callout is data. Locked 2026-05-27 (B7) after the Penta panel found four such dead tables in a single memo (GL reconciliation, comparable transactions, ten-layer Source column, HackerNews titles).
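The table-or-data-gap decision can be sketched generically. This is an illustrative reading, not the renderer's code: it assumes that when at least one row is substantive only the substantive rows ship, and the type names, the function name and the gap wording in the usage are hypothetical. The dead-row example (glRevenue=0 with classification 'missing-channel') is from the text.

```typescript
type TableDecision<Row> =
  | { kind: "table"; rows: Row[] }
  | { kind: "data-gap"; whatsMissing: string; whatResolvesIt: string };

function tableOrDataGap<Row>(
  rows: readonly Row[],
  isSubstantive: (row: Row) => boolean,
  gap: { whatsMissing: string; whatResolvesIt: string }
): TableDecision<Row> {
  const liveRows = rows.filter(isSubstantive);
  // Zero substantive rows: emit the gap block instead of a dead table.
  if (liveRows.length === 0) {
    return { kind: "data-gap", whatsMissing: gap.whatsMissing, whatResolvesIt: gap.whatResolvesIt };
  }
  return { kind: "table", rows: liveRows };
}

// Section-specific predicate for the GL reconciliation table.
interface ReconciliationRow {
  channel: string;
  glRevenue: number;
  classification: string;
}

function isLiveReconciliationRow(row: ReconciliationRow): boolean {
  return !(row.glRevenue === 0 && row.classification === "missing-channel");
}

const deadTable = tableOrDataGap<ReconciliationRow>(
  [{ channel: "Amazon", glRevenue: 0, classification: "missing-channel" }],
  isLiveReconciliationRow,
  { whatsMissing: "GL revenue by channel", whatResolvesIt: "Upload the general ledger export" }
);
```

Here deadTable comes back as a data-gap block, because its only row carries no analytical signal.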

Brand-book uniform application across every generated artifact. Every Word document produced by the engine ships with Cambria (Playfair Display fallback) on titles + headings, Inter (Calibri fallback) on body text, and table headers with brand-blue (#073877) fill + white bold text. Every Excel workbook ships with a Cover sheet as the first sheet (brand block, deal name, generated date, methodology version), frozen header pane on every data sheet, alternating soft-tint (#F6F9FC) row banding, and Inter font on body cells. No generator hardcodes brand colors or fonts; every reference resolves through the central ecomma-brand.ts contract. Locked 2026-05-27 (B8) after the Penta panel found zero of 13 artifacts met the binding brand contract on a recent smoke run.

Three-statement model is self-describing without Excel. The model XLSX uses live formulas in projection cells (so analysts can re-tune assumptions and Excel recomputes). Each formula now also carries a cached numeric result computed in TypeScript via the same math, so static readers (the engine's Phase 11 audit, third-party parsers, the data-room build pipeline) see the same projected values an Excel user would see. The Excel formula is still the source of truth; the cache is the visibility layer. A regression test on the projection helper asserts that the balance-sheet check, Assets − (Liabilities + Equity), is zero across every projected year. Locked 2026-05-27 (B9) after the Penta panel correctly observed that the model "only populates Year 1, so the audit artifacts reference projections that do not exist". That observation was true of static reads only; this fix makes the static read agree with what Excel computes.

Seasonality caveat on linear annualisation. The memo's executive summary annualises a seller-provided P&L via linear extrapolation (revenue × 12 ÷ months_parsed). This is mathematically clean but methodologically biased for seasonal categories — apparel, lingerie, gifting, beauty, holiday goods all carry a material Q4 skew. The memo now appends a category-aware caveat sentence directly under the annualised number, naming the Q4 lift and instructing the analyst to re-derive with month-labelled data before anchoring an offer to the linear figure. Locked 2026-05-27 (B10) after the Penta panel's modify on the panel-required arithmetic check: the math is right, the framing is what needs the caveat.
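As a worked example of the arithmetic and the caveat trigger: the formula revenue × 12 ÷ months_parsed and the list of seasonal categories come from the text, while the function names and the caveat wording below are placeholders, not the memo's actual sentence.

```typescript
// Categories the text names as carrying a material Q4 skew.
const Q4_SKEWED_CATEGORIES: readonly string[] = [
  "apparel", "lingerie", "gifting", "beauty", "holiday goods",
];

function annualiseLinear(revenue: number, monthsParsed: number): number {
  if (monthsParsed <= 0) throw new Error("monthsParsed must be positive");
  return (revenue * 12) / monthsParsed;
}

function annualisedWithCaveat(
  revenue: number,
  monthsParsed: number,
  category: string
): { annualised: number; caveat: string | null } {
  const annualised = annualiseLinear(revenue, monthsParsed);
  const seasonal = Q4_SKEWED_CATEGORIES.indexOf(category.toLowerCase()) !== -1;
  return {
    annualised,
    // Placeholder wording; the real caveat sentence is the memo's own.
    caveat: seasonal
      ? "Linear annualisation ignores the Q4 lift typical of this category; re-derive with month-labelled data before anchoring an offer."
      : null,
  };
}
```

Nine parsed months totalling 450,000 annualise to 450,000 × 12 ÷ 9 = 600,000. If those nine months are January to September in a gifting business, the true full-year figure is likely higher, which is why the caveat attaches to the number rather than to a footnote.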

No blockers — the DD must always complete

Make do with what we have. Hard blockers are a small, named set; everything else continues with documented gaps.

The methodology refuses silent aborts. Where data is missing or a tool is unavailable, the engine renders an explicit "Data gap: X. Operator action: Y." block in the affected section and the run continues. The analyst, not the engine, decides whether the output is shippable. Below: the only conditions that legitimately halt a run, separated from the (much larger) set of conditions that are documented gaps within an otherwise-completing run.

Hard blockers — the engine stops

No financial data at all. Phase 2 cannot reconstruct against a nonexistent GL or platform ledger.
No methodology snapshot. Phase 0 cannot stamp a run baseline; reproducibility is broken.
No analyst identity. Login locked to @ecomma.co; unauthenticated submissions cannot be processed.
No mandatory HubSpot deal ID. Phase 0 refuses to start without one because the recall step depends on it.
Target currency / FX-rate snapshot missing. Phase 0 sub-step 1 cannot seal the methodology snapshot without the FX rate; downstream phases refuse to read uninitialised currency state.
Research Arsenal unavailable. Phase 0 dispatches the canonical RA engine as its first action (see Research Arsenal — hard blocker). The methodology's entire provenance chain depends on RA-emitted source-tier records; without them, downstream phases can only produce stub-grade output, which violates the visible-rigor and no-fabrication contracts. The 8 RA dispatch statuses are documented in the RA-blocker card; any non-success outcome halts Phase 0. Reconciled with the no-blockers contract on 2026-05-24.
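
The six hard blockers above can be read as one preflight check that returns every blocker hit. This is a sketch under assumptions: the Phase0Preflight shape and the blocker codes are hypothetical, and only the status value 'success' is taken from this document.

```typescript
// Hypothetical preflight state; field names are illustrative, not the engine's.
interface Phase0Preflight {
  hasFinancialData: boolean;
  hasMethodologySnapshot: boolean;
  analystEmail: string | null;
  hubspotDealId: string | null;
  fxRateSnapshot: number | null;
  raStatus: string; // only "success" lets the run continue
}

// Returns the hard blockers hit; an empty array means Phase 0 may proceed.
function hardBlockers(p: Phase0Preflight): string[] {
  const hits: string[] = [];
  if (!p.hasFinancialData) hits.push("no-financial-data");
  if (!p.hasMethodologySnapshot) hits.push("no-methodology-snapshot");
  if (p.analystEmail === null || !p.analystEmail.toLowerCase().endsWith("@ecomma.co")) {
    hits.push("no-analyst-identity");
  }
  if (p.hubspotDealId === null || p.hubspotDealId.trim() === "") {
    hits.push("no-hubspot-deal-id");
  }
  if (p.fxRateSnapshot === null) hits.push("fx-rate-snapshot-missing");
  if (p.raStatus !== "success") hits.push("research-arsenal-unavailable");
  return hits;
}
```

Anything not in this list falls through to the soft-blocker path and is rendered as a documented gap.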

Soft blockers — the engine continues with documented gaps

Google Analytics or Google Ads data unavailable. Channel attribution falls back to UTM-coverage inference; the section flags the gap.
One reputation surface returns rate-limited or empty. Aggregation continues with the remaining surfaces; sources-not-consulted.md records the skip.
A single comp source (Flippa / PitchBook / internal) absent. Comp set built from the remaining; comp_retention_percentage notes the missing class.
An Ecomma category-knowledge file missing. Phase 4 layer analysis proceeds with generic priors; the gap surfaces for analyst review.
Quad Review vendor times out. Disposition aggregates from the available vendors; the missing vendor logged as UNAVAILABLE.
One or more model-family API keys absent. The affected vendor returns UNAVAILABLE at the per-call boundary; the Quad Review aggregator caution-biases on the available vendors; the run still ships a memo with the missing-vendor diagnostic on the deal page. (Earlier methodology drafts framed this as a hard blocker; the engine actually honours the no-hard-gates principle here, with a DEGRADED disposition rather than an abort. Corrected in the truth-in-methodology amendment of 2026-05-22.)
Multi-entity consolidation incomplete. The LBO and memo render parent-entity only; subsidiary gap flagged.
A seller-provided document fails OCR. Phase 0 quarantines it in 00_intake/_unclassified/; the analyst reclassifies via the deal page.

Soft blockers are not exhaustive — every phase declares its own soft-fail modes in its gate-log JSON. The rule is structural: missing data is rendered as a gap-and-action block, not as a halt. The deal page surfaces every soft blocker hit during the run so the analyst can resolve them post-hoc or accept the gaps and ship.
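
A minimal sketch of the gap-and-action rule: a soft blocker is turned into text rather than an exception. The block wording follows the "Data gap: X. Operator action: Y." pattern named above; the SoftBlocker shape and function names are assumptions.

```typescript
// Hypothetical record of one soft blocker hit during a run.
interface SoftBlocker {
  section: string;
  gap: string;
  operatorAction: string;
}

// Drop trailing periods so the rendered block never ends in "..".
const stripPeriod = (s: string): string => s.trim().replace(/\.+$/, "");

// Render the block that replaces the missing content in the affected section.
function renderGapBlock(b: SoftBlocker): string {
  return `Data gap: ${stripPeriod(b.gap)}. Operator action: ${stripPeriod(b.operatorAction)}.`;
}

// Group rendered blocks by section so the deal page can list every gap hit.
function gapsBySection(blockers: SoftBlocker[]): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const b of blockers) {
    const existing = out.get(b.section) ?? [];
    existing.push(renderGapBlock(b));
    out.set(b.section, existing);
  }
  return out;
}
```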

The output bar — what a Goldman-grade memo looks like

The memo is the single deliverable the investment committee, the partner who signs the deal, and the next buyer will see. Quality is not negotiable. The structure below is the floor.

45+ pages substantive content. Eleven body sections plus appendices. Sections with no claim data render the methodology's failure-mode block (Data gap: X. Operator action: Y.) rather than fabricated content.
50+ claims-ledger entries. Every numerical claim and every attribution claim with its source tier, triangulation outcome, derivation basis where derived, and falsifier where above MEDIUM confidence. The prose carries footnote anchors to the ledger; the ledger is the audit trail.
∞ external sources attempted. Every external source the run attempted to consult is logged with its status — returned, returned-empty, failed, skipped-on-policy. The operator sees the surface, not a clean-looking memo with hidden gaps.
Brand template applied. Cover page with the firm's logo, brand fonts (Playfair Display + Inter), brand colour palette, headers and footers on every page, table-of-contents, page numbers. Engine vocabulary and methodology version strings never appear in user-facing output.

Brand discipline — every generated artefact must consume the Ecomma brand overlay

Locked 24 May 2026. Pre-overlay generators shipped templated PDFs and XLSX files with off-brand colours, no logo, ad-hoc fonts, and density that read as text-dumps. The rule and its enforcement test exist so this cannot recur.

What the rule covers. Every generator that produces a user-facing document — PDF, XLSX, DOCX, PPTX, HTML one-pager — must import the Ecomma brand overlay from app/src/lib/ecomma-brand.ts. That module is the single source of truth for the palette, typography, logo placement, spacing system, and file-naming convention. The canonical brand reference document is at Ecomma/Brand/Brand_Reference.md, extracted from the live ecomma.co stylesheet on 27 April 2026.

Forbidden in generator code:

Hardcoded hex colour literals not in ECOMMA_PALETTE (16 frozen values covering brand blue, primary navy, ink, cyan, growth green, deep blue, white, soft tint, body text, muted, lighter muted, hairline rule, subtle rule, critical, caution).
pdf-lib rgb() calls with float triples that resolve to off-palette hex values. Use paletteRgb('brandBlue') instead.
Standard fonts hardcoded without going through ECOMMA_PDF_STANDARD_FONTS. Display = Playfair Display (fallback Cambria → TimesRoman); body = Inter (fallback Calibri → Helvetica).
Generated PDFs without the Ecomma logo on page 1.
Generated XLSX without (a) a Cover sheet, (b) header row in brand blue with white text, (c) alternating row tint #F6F9FC, (d) frozen header pane.

File naming convention: every generated artefact follows {persona-slug}_{Document_Type}[_{variant}].{ext}. Persona slug is lowercase kebab-case, document type is Title_Case_With_Underscores, optional variant is lowercase, extension is one of pdf, xlsx, docx, json, csv. The validator validateFilename() in ecomma-brand.ts enforces the convention.
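
The convention can be expressed as a single pattern. The sketch below is not the real validateFilename() from ecomma-brand.ts, whose implementation is not shown here; validateFilenameSketch and the exact character classes (digits allowed, variant without hyphens) are assumptions.

```typescript
// {persona-slug}_{Document_Type}[_{variant}].{ext}
// slug: lowercase kebab-case; type: Title_Case words joined by underscores;
// variant: optional lowercase token; ext: one of the five listed extensions.
const FILENAME_PATTERN =
  /^[a-z0-9]+(?:-[a-z0-9]+)*_[A-Z][A-Za-z0-9]*(?:_[A-Z][A-Za-z0-9]*)*(?:_[a-z0-9]+)?\.(?:pdf|xlsx|docx|json|csv)$/;

// Hypothetical stand-in for the validator described above.
function validateFilenameSketch(filename: string): boolean {
  return FILENAME_PATTERN.test(filename);
}
```

Because document-type words start with an uppercase letter and the variant starts lowercase, the pattern can tell the two apart without a separate delimiter.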

How the rule is enforced. The vitest test at app/src/lib/__tests__/ecomma-brand-discipline.test.ts scans every generator file under scripts/mock-data/generators/ plus src/orchestrator/memo-renderer.ts on every CI run, reports every hex/rgb violation by file and line, and confirms each generator imports the brand module. Two known pre-overlay violations are documented via it.fails() markers; when the generators are wired through the brand overlay, the markers are removed and the suite begins asserting full compliance.
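
The hex half of that scan could look like the sketch below. It is an illustration only: findOffPaletteHex is a hypothetical name, the palette is passed in rather than imported from ecomma-brand.ts, and the sketch matches six-digit literals only.

```typescript
interface HexViolation {
  line: number; // 1-based line number in the scanned source
  hex: string;  // offending literal, upper-cased
}

// Report every six-digit hex colour literal that is not in the allowed palette.
function findOffPaletteHex(source: string, palette: ReadonlySet<string>): HexViolation[] {
  const violations: HexViolation[] = [];
  source.split("\n").forEach((text, index) => {
    const matches = text.match(/#[0-9a-fA-F]{6}\b/g) ?? [];
    for (const raw of matches) {
      const hex = raw.toUpperCase();
      if (!palette.has(hex)) violations.push({ line: index + 1, hex });
    }
  });
  return violations;
}
```

A test in the style described would read each generator file, call a function like this with the palette values, and fail with the file and line of every violation.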

The failure mode this prevents. On 24 May 2026 the user opened a generated founder-bio PDF and rated it 1/10 — off-brand blue tint, no Ecomma logo, no spacing between callout boxes and body text, dense text-dump feel. The structural verification gates were all green (parser-clean, 0 warnings, tsc-clean, vitest-pass) but the human-readable output failed the brand bar. The brand-discipline rule + enforcing test close the gap between "passes the parser" and "looks like a real Ecomma artefact".

The HITL ship gate — every section requires an explicit analyst decision before it ships

The discipline above is enforced at every downstream publishing path the engine controls. No section reaches a buyer-facing artefact without an analyst's explicit decision recorded on disk. The gate is binding, not advisory; it is the third of three visible governance steps locked 23 May 2026.

Three governance surfaces.

Per-section validation. Mechanical engine-vs-manual comparison on three axes: coverage (which manual specifics the engine surfaced), novelty (engine specifics the manual missed), and source attribution (parsed vs analyst-derived vs external-research). The success bar is ≥ 90% specifics coverage AND zero unspecified-tier claims. Verdict surface: pass / fix-coverage / fix-attribution / fail.
HITL ship gate. Fail-closed by default. A section is shippable only when an explicit ship decision is on disk, recorded by a named analyst with timestamp and an evidence snapshot (R7 pass status · R7 violation count · coverage ratio · validator verdict · adversarial-review disposition). The two decisions that aren't ship are also binding: revise requires revision notes naming the change needed; reject requires a rejection reason naming the fundamental failure.
Recurring quality measurement. Per-slice metrics — R7 coverage ratio, R7 violation count, contradiction-surface count, claims-ledger size — are tracked against committed quality-snapshot floors. Regressions are classified by severity (soft ≤ 10% · material ≤ 50% · hard > 50%) and surfaced on the run page. The regression suite runs as a vitest invariant so every change to the engine code is gated on the existing slices not getting worse.
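
The severity bands (soft ≤ 10%, material ≤ 50%, hard > 50%) can be sketched as a classifier over a metric's shortfall against its committed floor. This assumes a higher-is-better metric such as coverage ratio and a relative shortfall; how the engine measures regressions for count-style metrics is not stated here.

```typescript
type Severity = "none" | "soft" | "material" | "hard";

// Classify how far `current` has fallen below the committed `floor`.
// Assumption: higher is better, and the shortfall is measured relative to the floor.
function classifyRegression(floor: number, current: number): Severity {
  if (!(floor > 0)) throw new RangeError("floor must be positive");
  const shortfall = (floor - current) / floor;
  if (shortfall <= 0) return "none";
  if (shortfall <= 0.1) return "soft";
  if (shortfall <= 0.5) return "material";
  return "hard";
}
```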

How the gate is enforced — two downstream consumers, no silent paths.

The memo renderer. When a section's gate denies, the rendered docx substitutes a "Withheld pending analyst review" block in place of the section's prose. The block names the analyst, the decision date, and the revision or rejection note, so the withholding is auditable in the rendered document and prose is never silently included against a non-ship decision.
The data-room publisher. When the deal's section artefacts are packaged for delivery, every declared section appears in exactly one of two lists in the publish manifest: published (cleared to ship) or withheld (with the block reason carried through). The publisher never silently drops a section.
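
A minimal sketch of the fail-closed behaviour shared by both consumers: a section with no recorded decision is withheld, and every declared section lands in exactly one list. The GateDecision and PublishManifest shapes are assumptions modelled on the description above, not the engine's types.

```typescript
// Hypothetical decision record; ship, revise and reject are the three decisions named above.
type GateDecision =
  | { kind: "ship"; analyst: string; decidedAt: string }
  | { kind: "revise"; analyst: string; decidedAt: string; revisionNotes: string }
  | { kind: "reject"; analyst: string; decidedAt: string; rejectionReason: string };

interface PublishManifest {
  published: string[];
  withheld: { section: string; reason: string }[];
}

// Every declared section appears in exactly one list; a missing decision withholds.
function buildPublishManifest(
  sections: string[],
  decisions: Map<string, GateDecision>,
): PublishManifest {
  const manifest: PublishManifest = { published: [], withheld: [] };
  for (const section of sections) {
    const d = decisions.get(section);
    if (d === undefined) {
      manifest.withheld.push({ section, reason: "no analyst decision on disk" });
    } else if (d.kind === "ship") {
      manifest.published.push(section);
    } else if (d.kind === "revise") {
      manifest.withheld.push({ section, reason: d.revisionNotes });
    } else {
      manifest.withheld.push({ section, reason: d.rejectionReason });
    }
  }
  return manifest;
}
```

The invariant worth testing is that published.length plus withheld.length always equals the number of declared sections.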

What the analyst sees before deciding. The auto-gate recommendation engine computes a recommended decision from R7 + the section-validator result + the 4-vendor adversarial review verdict. The analyst sees the recommendation alongside the underlying evidence. The decision is still made explicitly — the recommendation is input, not a substitute. Any decision that exceeds the recommendation's strictness (e.g., recording reject when the auto-gate said revise) requires the analyst's reason recorded inline.

Decision freshness binding (in progress). Each ship decision is dated; a future amendment will bind the decision to a content hash of the section prose and the claims ledger so a stale decision against revised prose can no longer pass the gate silently. Tracked as Step 2 of the rebuild sequence following this section.

The process — phase by phase

With the methodology above in hand, the per-phase walkthrough below describes what runs, when, and why. Every term used here is defined either in the glossary that follows or in the methodology sections above.

Before you read. This page uses some technical words because the engine genuinely uses those things. Every term gets defined the first time it appears, and the glossary below holds them all in one place. Click any row in any section to expand the full detail underneath. Nothing is hidden in code; if it runs, it's on this page.

Glossary of building blocks

Eight words you'll see throughout this page. Skim once, refer back as needed.

AI model

Software trained to read and write text. We use four model families, the current flagship version from each: Claude Opus 4.7 (Anthropic), GPT-5.5 (OpenAI), Gemini Flash 3.5 (Google), and DeepSeek V4 Pro. Claude is the orchestrator; the other three are independent reviewers. When we say "the engine asks Claude to do X" we mean we send Claude a written instruction and a pile of deal data, and it sends back a written answer. Versions update as the vendors release new models — the methodology is pinned to the version stamped in the run record.

Quad Review

The four-vendor adversarial review that runs as Phase 12 (and, scoped to intake hygiene, as Phase 0.5). The engine sends the memo (or, at Phase 0.5, the intake bundle) to Claude, GPT, Gemini, and DeepSeek in parallel against a fixed eight-charge sheet. Each returns SUSTAINED / DISMISSED / MODIFIED per charge plus an overall disposition. Caution wins ties — any single OVERRULED + RESTRUCTURED pair forces a re-pass.

API

A way for one program to ask another for data. When we "call the Shopify API," we're asking Shopify's servers for the target's order history. Shopify sends back a structured response. APIs are how the engine talks to the outside world.

Skill

A written instruction sheet stored in our project-starter library. Each skill tells an AI model exactly how to do one specific task (build a financial model, compare to competitors, build an LBO). The engine picks a skill, loads its instructions, feeds it the deal data, and gets a finished output back. We have ten Tier A skills in active rotation.

Reference file

A document in our project-starter library that holds Ecomma's rules and standards: the 12 acquisition criteria, the 10-layer review template, the comp database, etc. Skills read these to know what "good" looks like at Ecomma. Changes to the rules land here first.

Background job runner

A piece of software that holds tasks waiting to run and runs them on its own schedule. The analyst submits a DD and the analysis goes into the queue; the runner picks it up and works through it over hours; when it's done the analyst gets an email. The browser doesn't have to stay open. We use a product called Inngest for this, but the role is what matters, not the brand name.

Script

A small program (Python or JavaScript) that does one mechanical job. Cheaper and faster than asking an AI for things that don't need judgement — like checking that the numbers in two documents match exactly. We use scripts for verification, file naming, and a few internal plumbing jobs.

Artefact

Anything the engine produces: the memo (a Word document), the financial model (Excel), the data pack (Excel), the LBO model (Excel), the comps deck (Word or PowerPoint), the pre-mortem (Word). Each artefact is saved to the deal's Google Drive folder under a strict naming convention and shows up on the deal page.

What you'll find below

An analyst submits a new DD
The trigger fires (and it's the only one)
Phase 0 — Pin the run baseline
Phase 0.5 — P0 Hygiene Check (4-vendor Quad Review)
Phase 1 — Does it fit the thesis?
Phase 2 — Are the numbers real?
Phase 2.5 — Phase 2 sense-check (4-vendor Quad Review)
Phase 3 — What did similar deals sell for?
Phase 4 — Pick the business apart, layer by layer
Phase 4.5 — Phase 4 sense-check (4-vendor Quad Review)
Phase 5 — What does the public think?
Phase 5.5 — Phase 5 sense-check (4-vendor Quad Review)
Phase 6 — Run the deal three ways
Phase 7 — Make a call, then attack it
Phase 7.5 — Phase 7 verdict sense-check (4-vendor Quad Review)
Phase 8 — Write the verdict back to HubSpot
Phase 9 — Ask the seller to respond
Phase 10 — Write the memo
Phase 11 — Check our own work
Phase 12 — Four reviewers argue about the verdict (Quad Review)
Override paths — when the gate is wrong, on purpose
Research depth — seven surfaces, engine sweep, claims ledger
Filing & house-style branding (cross-cutting)
Delivery — Drive, HubSpot, the deal page

1. An analyst submits a new DD

Maksim signs in with his Google account. The login is locked to @ecomma.co emails, so only Ecomma staff can get in. He lands on the dashboard, which shows what's currently running and what was recently finished. He clicks the "+ New DD" button.

The new-DD form collects everything the engine needs: the target's name and website, what categories the target sells in (multi-select from our 37 verticals plus "Other"), what countries it ships to (multi-select including "Global"), and what kind of catalogue it has (own-brand, third-party trademarked, or mixed). Then a per-deal inputs block, broken down by category — marketing, commerce, customer-relationship data, financials, operations, legal, anything else. Each category has two ways to give the engine data: live connections to the seller's accounts (Shopify, Stripe, etc., authorised once per deal) or a drop zone for files (whatever the seller can send as a download — Excel, PDF, photo of a P&L, doesn't matter).

Maksim uploads what he has, requests the connections he wants, hits Submit run. The engine takes over.

2. The trigger fires (and it's the only one)

"Trigger" here means: the one moment that kicks the engine into motion. The website sends a single message to the background job runner saying "this deal needs analysis." That's it. From that point on, every step is deterministic — same data in, same steps out, same answer out. No AI is deciding what to do next or when to run; the orchestrator (a TypeScript program we wrote, which is just a long ordered list of function calls) does that.

This matters because the engine has to be reproducible. If the same deal runs twice, the same skills fire in the same order and produce the same artefacts. Across analysts, across days, across versions, the rules are the rules. The audit trail in the database proves it after the fact — every skill call is logged with who fired it, when, with what input, and what came back.

Phase 0: Pin the run baseline

Lock in three things before anything analytical runs: which version of our rules this run will use, whether we've seen this target before, and the verified state of the seller's files.

Six sub-steps run in order. Phase 0 owns the chain-of-custody for everything Phases 1–12 build on top of: every file is hashed before it's touched, the rule files are snapshotted in full (not just version-labelled), and originals are kept immutable while classified copies go into the typed folders. The taxonomy is enforced here so downstream phases never have to think about file location.

What runs in this phase

Reference Methodology snapshot — full hash manifest, not just a version label

What it is

The set of rule files and templates that govern how DDs are done — the twelve acquisition criteria, the ten-layer review template, the comp database, the charge sheet, every SKILL.md our skills run from. Lives under project-starter/.claude/skills/. Current release version: v3.0.0.

What gets recorded at run start

The git commit SHA of the methodology repo at run-start moment.
A SHA-256 hash manifest of every rule file the run will read.
A full copy of those rule files into the deal's 01_methodology/ subfolder.
The FX-rate snapshot for the analyst's chosen target currency (see sub-step 2 below). The FX rate is fetched before the methodology snapshot is sealed; the manifest hash incorporates the FX-rate-snapshot file so any change to the rate (or the source it came from) invalidates the snapshot. A re-run under the same methodology version reproduces the exact same rate.

Why a copy AND a hash AND a SHA

Three layers because each fails differently. The git SHA tells you what was committed. The hash manifest tells you what was actually loaded (catches mid-run edits, hot-reloads, mis-syncs). The file copy means the rule files survive even if the repo is rewritten later. Version labels alone (v3.0.0) are a name, not a snapshot — if anyone edits the rule files later, v3.0.0 no longer means what it meant when the run ran. The point of versioning is to be able to re-run an old deal under its original rulebook; that needs the rulebook itself, not the rulebook's name.
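
The hash-manifest layer can be sketched with Node's built-in crypto module. This is an assumption-laden illustration: buildRuleManifest, the roll-up format, and the idea of passing file contents as strings are hypothetical; only the use of SHA-256 per file comes from the text.

```typescript
import { createHash } from "node:crypto";

interface RuleManifest {
  files: Record<string, string>; // path -> SHA-256 of the file contents
  manifestHash: string;          // single hash over the sorted (path, digest) pairs
}

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Hash every rule file, then roll the per-file digests into one manifest hash.
// Paths are sorted so the result does not depend on read order.
function buildRuleManifest(ruleFiles: Map<string, string>): RuleManifest {
  const files: Record<string, string> = {};
  const rollup = createHash("sha256");
  for (const path of Array.from(ruleFiles.keys()).sort()) {
    const digest = sha256Hex(ruleFiles.get(path) ?? "");
    files[path] = digest;
    rollup.update(`${path}\n${digest}\n`);
  }
  return { files, manifestHash: rollup.digest("hex") };
}
```

Including the FX-rate snapshot file among the inputs means any change to the rate changes the manifest hash, which is the invalidation behaviour described above.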

Where the stamp lives

In the runs table as structured columns (methodology_version, methodology_git_sha, methodology_manifest_hash), and as an immutable 01_methodology/manifest.json file in the deal's Drive folder. Both are written before any analytical phase fires.

The methodology is documented inline above (Commitments through HITL gate). This page is the canonical methodology + process document.

Script Run identity & concurrent-run guard — records who, when, and whether another run is already in flight

What gets recorded

The analyst's identity (Google OAuth subject — only @ecomma.co emails are allowed past login).
Run start time in UTC.
The HubSpot deal ID the analyst attached at intake — mandatory. Phase 0 refuses to start without one because the recall step depends on it.
The target slug — lowercase, hyphen-joined slug of the legal name. Collisions on (date, target) get suffixed with the run ID short hash.
The analyst-chosen target currency for this run — selected on the New DD form at intake. Every numerical threshold and every financial figure produced downstream is normalised to this currency. The system never picks the currency on the analyst's behalf; it must be specified per run.

Currency normalisation policy

Currency normalisation is mandatory but never a system decision. Four rules:

The analyst chooses the target currency at intake — USD, EUR, GBP, AUD, or any ISO 4217 code Ecomma operates against. The engine surfaces this as a required field on the New DD form; there is no default.
Every monetary figure is persisted in both the original currency and the normalised currency, with the FX rate and FX source recorded on the claim. Source-tier is [derived] on the normalised figure with the conversion shown.
The FX rate is captured before the methodology snapshot is sealed — Phase 0 sub-step 1's manifest hash includes the FX-rate-snapshot file. This means the FX rate is part of the immutable run baseline, not a side effect: a re-run six months later under the same methodology version uses the same FX rate, not the spot rate at re-run time. Runs are reproducible.
If the FX source is unreachable at run start, the run does not proceed silently — Phase 0 surfaces the FX-fetch failure on the deal page and the analyst chooses (a) retry, (b) supply the rate manually with explicit source citation, or (c) abort the run. There is no fallback to a stale cached rate; reproducibility requires that the rate's source is named at the time it was captured.
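
Rule two (persist both figures, with the rate and its source on the claim) can be sketched as follows. The type names and the derivation string format are assumptions; the [derived] source tier is from the text, and the rate in the usage note is purely illustrative.

```typescript
// Hypothetical snapshot captured in Phase 0 before the methodology snapshot is sealed.
interface FxSnapshot {
  base: string;      // currency of the original figure
  quote: string;     // analyst-chosen target currency
  rate: number;      // units of quote per one unit of base
  source: string;    // named at capture time
  capturedAt: string;
}

interface Money { amount: number; currency: string }

interface NormalisedClaim {
  original: Money;
  normalised: Money;
  fxRate: number;
  fxSource: string;
  sourceTier: "derived";
  derivation: string; // the conversion, shown
}

// Keep the original figure and attach the normalised one with its provenance.
function normaliseClaim(original: Money, fx: FxSnapshot): NormalisedClaim {
  if (original.currency !== fx.base) {
    throw new Error(`FX snapshot is ${fx.base} to ${fx.quote}, got ${original.currency}`);
  }
  if (!(fx.rate > 0)) throw new RangeError("FX rate must be positive");
  const amount = original.amount * fx.rate;
  return {
    original,
    normalised: { amount, currency: fx.quote },
    fxRate: fx.rate,
    fxSource: fx.source,
    sourceTier: "derived",
    derivation: `${original.amount} ${fx.base} x ${fx.rate} = ${amount} ${fx.quote}`,
  };
}
```

With an illustrative rate of 1.25, a 1,000 EUR figure is stored alongside 1,250 USD, and a re-run reads the same snapshot rather than fetching a new rate.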

Concurrent-run guard

Before Phase 0 proceeds, the engine checks for any in-flight run on the same HubSpot deal ID. If one exists, the analyst chooses: abort the new run, force-start anyway (logged with reason), or resume the in-flight run. Two analysts diligencing the same target at the same moment is rare, but the failure mode is silent double-spend on API quota and contradictory artefact files, which is better caught here than in Phase 12.

File-naming convention this anchors

{date}_{target}_{artefact}_{version}.{ext}

  {date}    — run start in UTC, ISO date (e.g. 2026-05-19)
  {target}  — lowercase hyphen-joined slug of legal name
  {version} — methodology version stamped at run start (e.g. v3.0.0)
  collision suffix — run ID short hash if two runs collide on (date, target)
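
A sketch of how the slug and the file name could be derived. The function names are hypothetical, and two details are assumptions the convention does not settle: accents are stripped before slugging, and the collision suffix is appended to the target slug.

```typescript
// Lowercase, hyphen-joined slug of the legal name.
function targetSlug(legalName: string): string {
  return legalName
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents (assumption)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// {date}_{target}_{artefact}_{version}.{ext}, with the date taken in UTC.
function artefactFilename(
  runStartUtc: Date,
  legalName: string,
  artefact: string,
  version: string,
  ext: string,
  collisionSuffix?: string, // run ID short hash, only when (date, target) collides
): string {
  const date = runStartUtc.toISOString().slice(0, 10);
  const slug = targetSlug(legalName);
  const target = collisionSuffix ? `${slug}-${collisionSuffix}` : slug;
  return `${date}_${target}_${artefact}_${version}.${ext}`;
}
```

Taking the date from toISOString() keeps it in UTC regardless of the server's local time zone.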

API HubSpot recall + engine-own recall — "have we seen this target before, and how did it go?"

What HubSpot is

HubSpot is the database where Ecomma tracks every deal it has ever looked at — name, founder, status, prior verdict, reason codes, notes.

Token preflight first

Before the recall query, the engine validates the HubSpot OAuth token is still live. A revoked or expired token is a recoverable error — the analyst is asked to refresh the connection. If recall ultimately fails (token cannot be refreshed, API persistently down), the analyst chooses: abort the run, or proceed-without-recall. The choice is recorded in the run metadata as a known gap.

Match logic — three keys in order

Primary domain, canonicalised (strip www., lowercase, no trailing slash).
Founder email, exact match.
Fuzzy legal-name match, threshold-bound. Any candidate above threshold is presented to the analyst for confirmation, not auto-merged.

Multi-match cases are surfaced before the run proceeds — silent merge is a worse failure than asking.
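
The three-key match can be sketched as below. Names and shapes are hypothetical; stripping the URL scheme and path during canonicalisation is an assumption beyond the three steps listed, and the fuzzy similarity function is supplied by the caller because the engine's metric and threshold are not specified here.

```typescript
// Strip scheme (assumption), "www.", path and trailing slash; lowercase.
function canonicaliseDomain(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "")
    .replace(/^www\./, "")
    .replace(/[/?#].*$/, "");
}

interface DealRecord { id: string; domain: string; founderEmail: string; legalName: string }
type MatchKey = "domain" | "founder-email" | "legal-name-fuzzy";
interface RecallCandidate { dealId: string; matchedOn: MatchKey; needsConfirmation: boolean }

// Try the keys in order for each prior deal; fuzzy hits are never auto-merged.
function recallCandidates(
  target: Omit<DealRecord, "id">,
  deals: DealRecord[],
  similarity: (a: string, b: string) => number,
  threshold: number,
): RecallCandidate[] {
  const out: RecallCandidate[] = [];
  for (const deal of deals) {
    if (canonicaliseDomain(deal.domain) === canonicaliseDomain(target.domain)) {
      out.push({ dealId: deal.id, matchedOn: "domain", needsConfirmation: false });
    } else if (deal.founderEmail === target.founderEmail) {
      out.push({ dealId: deal.id, matchedOn: "founder-email", needsConfirmation: false });
    } else if (similarity(deal.legalName, target.legalName) > threshold) {
      out.push({ dealId: deal.id, matchedOn: "legal-name-fuzzy", needsConfirmation: true });
    }
  }
  return out;
}
```

A caller would treat more than one returned candidate as a multi-match case and surface it to the analyst before proceeding.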

Engine-own recall as the second source

HubSpot only knows about deals that made it into the CRM. The engine's own runs table knows about every run it ever started, including aborted ones. Phase 0 queries both — domain match, founder match, slug match — and joins the results. A target that aborted at Phase 1 four months ago and never made it into HubSpot still gets surfaced.

Aging the prior verdict

Prior verdicts are stamped with months-since-issued at recall time. A HARD PASS from 18 months ago is not the same signal as one from three weeks ago. Past a threshold (provisionally six months), the verdict is marked "stale — re-review recommended" and does not block the new run on its own.
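
The aging stamp can be sketched as a whole-months calculation in UTC plus a threshold check. The function names and return shape are assumptions; the six-month default mirrors the provisional threshold above.

```typescript
// Whole calendar months elapsed between two instants, computed in UTC.
function monthsSince(issued: Date, now: Date): number {
  const months =
    (now.getUTCFullYear() - issued.getUTCFullYear()) * 12 +
    (now.getUTCMonth() - issued.getUTCMonth());
  // Do not count the current month until the day-of-month has been reached.
  return now.getUTCDate() < issued.getUTCDate() ? months - 1 : months;
}

interface AgedVerdict {
  verdict: string;
  monthsSinceIssued: number;
  stale: boolean; // stale verdicts recommend re-review and do not block on their own
}

function agePriorVerdict(
  verdict: string,
  issued: Date,
  now: Date,
  staleAfterMonths = 6,
): AgedVerdict {
  const monthsSinceIssued = monthsSince(issued, now);
  return { verdict, monthsSinceIssued, stale: monthsSinceIssued > staleAfterMonths };
}
```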

Re-engagement triggers

If the most recent prior verdict was HARD PASS within the freshness window, the engine checks re-engagement-triggers.md, which holds one rule per reason code listing the conditions that would unblock a re-pitch. If the conditions are not all met, the engine flags it and the analyst decides whether the run proceeds.

What the HubSpot call looks like

GET https://api.hubapi.com/crm/v3/objects/deals/{hubspotDealId}
Authorization: Bearer {hubspot_token}

API Google Drive — ingest with hashes and volume guards — hashes every file before touching it, refuses to overstep scope

The Drive folder URL

Provided by the analyst at intake — either the seller-shared folder or an Ecomma-side mirror. The engine cannot start without one.

OAuth scope discipline

The Drive token is scoped to read on this specific folder and its descendants — nothing broader. The engine refuses to start if the token has wider scope; the analyst is asked to re-authorise with a narrower grant. This is non-negotiable: the engine reads financials, contracts, and trademark certs in this phase, and broader scope is a confidentiality risk we don't accept.

Volume guard

Before ingestion, the engine counts files and bytes. If the folder contains more than the threshold (provisionally 2,000 files or 5 GB), the analyst is asked to confirm. Often this means the seller accidentally shared their entire Drive. The threshold is tunable; the guard exists so we don't burn quota classifying junk.
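
A minimal sketch of the guard, using the provisional thresholds from the paragraph above. The names are hypothetical, and reading "5 GB" as 5 × 1024³ bytes is an assumption.

```typescript
interface DriveFileMeta { id: string; sizeBytes: number }

interface VolumeGuardResult {
  fileCount: number;
  totalBytes: number;
  needsConfirmation: boolean; // true means: ask the analyst before ingesting
}

// Count files and bytes before ingestion; both thresholds are tunable.
function volumeGuard(
  files: DriveFileMeta[],
  maxFiles = 2000,
  maxBytes = 5 * 1024 ** 3,
): VolumeGuardResult {
  const totalBytes = files.reduce((sum, f) => sum + f.sizeBytes, 0);
  return {
    fileCount: files.length,
    totalBytes,
    needsConfirmation: files.length > maxFiles || totalBytes > maxBytes,
  };
}
```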

Rate-limit awareness

Drive API calls are paginated and back-off aware. Sustained 429s pause the ingest with a clear status on the deal page rather than aborting the run.

Chain-of-custody hashing

Every file gets SHA-256'd before any mutation — before classification, before OCR, before translation, before re-filing. The hash is stored in the run record alongside the file's Drive ID and Drive's reported modified-time. Six months later, if a question arises about what the seller originally provided, the hash is authoritative. Two files with the same SHA-256 are treated as the same file: ingested once, referenced twice. The seller's duplicate uploads don't double-count downstream.
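
Hash-before-touch and in-run deduplication can be sketched with Node's crypto module. The record shapes and the hashBeforeTouching name are assumptions; the behaviour (hash first, ingest identical content once, reference it thereafter) follows the paragraph above.

```typescript
import { createHash } from "node:crypto";

interface RawFile { driveFileId: string; modifiedTime: string; bytes: Uint8Array }

interface CustodyRecord {
  driveFileId: string;
  modifiedTime: string;       // Drive's reported modified-time
  sha256: string;             // captured before any mutation
  duplicateOf: string | null; // Drive ID of the first file with the same hash
}

// Hash every file up front; later files with the same hash reference the first.
function hashBeforeTouching(files: RawFile[]): CustodyRecord[] {
  const firstSeen = new Map<string, string>();
  return files.map((f) => {
    const sha256 = createHash("sha256").update(f.bytes).digest("hex");
    const duplicateOf = firstSeen.get(sha256) ?? null;
    if (duplicateOf === null) firstSeen.set(sha256, f.driveFileId);
    return { driveFileId: f.driveFileId, modifiedTime: f.modifiedTime, sha256, duplicateOf };
  });
}
```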

What's rejected at intake

Zero-byte files (logged, not ingested).
Files whose format the engine doesn't handle (logged with reason, surfaced to the analyst).
Malformed PDFs (detected on header parse; analyst notified).
Encrypted or password-protected files (engine cannot proceed without the password — analyst is asked to supply it or remove the file).

Skill Classify, OCR, translate — rule-first, AI fallback, original always kept

Classifier mechanism — two layers

Layer one is deterministic: filename heuristics, MIME type, and content keyword matches against a catalogue of known patterns (e.g. "P&L" / "Profit and Loss" / "Income Statement" for financials; "Supplier" / "MOQ" / "Lead time" for operations). Layer two is an AI fallback for files that fail the deterministic pass — a small Claude call that returns category + confidence. Layer two is an AI cost in Phase 0; it's bounded to ambiguous files and each call is logged in the audit trail with token cost.
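
Layer one can be sketched as a keyword pass that only answers when exactly one category matches, leaving everything else to layer two. The keyword lists are the examples quoted above; the function name, the two-category scope, and the "exactly one hit" rule are assumptions.

```typescript
type KeywordCategory = "financials" | "operations";

// Example patterns from the text; the real catalogue is larger.
const CATEGORY_KEYWORDS: Record<KeywordCategory, string[]> = {
  financials: ["P&L", "Profit and Loss", "Income Statement"],
  operations: ["Supplier", "MOQ", "Lead time"],
};

// Returns a category only on an unambiguous match; null hands off to the AI fallback.
function classifyDeterministic(filename: string, textSample: string): KeywordCategory | null {
  const haystack = `${filename}\n${textSample}`.toLowerCase();
  const hits = (Object.keys(CATEGORY_KEYWORDS) as KeywordCategory[]).filter((category) =>
    CATEGORY_KEYWORDS[category].some((keyword) => haystack.includes(keyword.toLowerCase())),
  );
  return hits.length === 1 ? (hits[0] ?? null) : null;
}
```

Returning null on a multi-category hit keeps ambiguous files out of the typed folders until layer two or the analyst decides.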

Classification failure mode

If a file cannot be classified by either layer with sufficient confidence, it's quarantined in 00_intake/_unclassified/ and surfaced on the deal page for manual classification. If any high-value file (a candidate P&L, audit, or balance sheet that the deterministic pass tagged ambiguous) appears unclassified, the run does not proceed past Phase 0 until the analyst resolves it. Silent classification failure is the worst possible outcome — a P&L misclassified as "marketing" never reaches Phase 2 and the financial model is built from incomplete data.

OCR and translation

Image-based PDFs and screenshots are OCR'd. Files in supported non-English languages are translated to English: Dutch, French, German, Spanish, Italian, Portuguese for initial scope. Languages outside this list are flagged for analyst review — the engine does not silently auto-translate languages it isn't calibrated for.

Service failure modes

If the OCR or translation service is transiently unavailable, the engine retries with exponential backoff. If the failure persists past the retry budget, Phase 0 aborts with an explicit error rather than silently skipping the affected files. A skipped translation means Phase 2 would later try to extract numbers from Dutch text and produce nonsense.
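
Retry with exponential backoff and a bounded budget can be sketched as below. The base delay, factor, cap and attempt count are illustrative assumptions, and withRetry is a hypothetical helper; the one behaviour taken from the text is that exhausting the budget raises an explicit error instead of skipping the file.

```typescript
// Delay schedule: base, base*factor, base*factor^2, ... capped at capMs.
function backoffDelaysMs(baseMs: number, factor: number, retries: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(baseMs * Math.pow(factor, i), capMs));
  }
  return delays;
}

// Run `op`, sleeping between failures; throw once the retry budget is spent.
async function withRetry<T>(
  op: () => Promise<T>,
  delaysMs: number[],
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const delay = delaysMs[attempt];
      if (delay === undefined) break; // budget exhausted
      await sleep(delay);
    }
  }
  throw new Error(`retry budget exhausted: ${String(lastError)}`);
}
```

Passing sleep in as a parameter keeps the helper testable without real waiting.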

What is NOT verified in Phase 0

OCR confidence and translation accuracy on numeric content are not Phase 0's job. Phase 0 captures and preserves; Phase 2 verifies. The handoff is concrete: the original-language file is kept immutable under 00_intake/ with its hash; the OCR'd or translated file is written to 00_intake/_processed/ with provenance metadata (which OCR engine, which translation service, when). Phase 2 cross-checks numeric tokens between the original and processed versions before trusting either for the financial model.

Script Re-file into the standard taxonomy — copies, never moves; intake stays immutable

Copy, never move

Classified files are copied from 00_intake/ into the typed folders. The intake folder is the immutable chain-of-custody record of what the seller actually provided. On a shared Drive folder where the seller has visibility, copying instead of moving also avoids reorganising the seller's own view of their files mid-conversation.

Multi-category files

A vendor MSA with payment terms is both legal and financial. The schema places each file in one canonical folder by tie-breaker (contracts go to legal > financial > operations) with a secondary-category metadata tag for retrieval. The taxonomy is single-folder but the metadata is multi-tag.
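
The tie-breaker can be sketched as a ranked sort. The order legal, then financials, then operations is from the text (using the schema's category names); how categories outside that order rank is not specified, so this sketch places them last in input order as an assumption.

```typescript
// Tie-break order stated for contracts: legal > financial > operations.
const TIE_BREAK_ORDER = ["legal", "financials", "operations"];

interface Placement {
  primary: string;     // the one canonical folder
  secondary: string[]; // metadata tags for retrieval
}

// Pick one canonical category and keep the rest as secondary tags.
function placeFile(categories: string[]): Placement {
  if (categories.length === 0) throw new Error("at least one category is required");
  const rank = (c: string): number => {
    const i = TIE_BREAK_ORDER.indexOf(c);
    return i === -1 ? TIE_BREAK_ORDER.length : i; // unranked categories go last (assumption)
  };
  const ordered = [...categories].sort((a, b) => rank(a) - rank(b));
  const [primary, ...secondary] = ordered;
  return { primary: primary as string, secondary };
}
```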

No graveyard category

The old 08_other/ bucket is gone. Files that don't fit the typed folders are quarantined in 00_intake/_unclassified/, forcing a manual classification decision rather than disappearing into a catch-all.

One-click reclassify

From the deal page, an analyst can reclassify any file. The reclassify action removes the prior typed-folder copy, writes a new copy to the correct folder, updates the file's category in the run record, logs the action in the audit trail (analyst identity + timestamp + before/after category), and re-fires any downstream phase that already consumed the file under the wrong category. The system is built to be corrected, not blindly followed.

Standard deal folder structure

{deal-name}/
  00_intake/                — original seller drop, immutable
    _processed/             — OCR'd and translated copies, with provenance
    _unclassified/          — quarantine for files needing manual classification
  01_methodology/           — rule-file snapshot + manifest.json + git SHA
  02_financials/            — P&L, balance sheet, audit, bank statements, tax filings
  03_commerce/              — Shopify exports, processor statements, order data
  04_operations/            — supplier list, 3PL contracts, fulfilment SLAs
  05_legal/                 — LOI, founder agreements, trademark filings, contracts
  06_marketing/             — ad-spend reports, attribution, channel breakdowns
  07_crm/                   — customer database, lifecycle data, segmentation
  10_dd-output/             — every artefact the engine produces this run
    {date}_{target}_dd-memo_v3.docx
    {date}_{target}_three-statement-model.xlsx
    {date}_{target}_data-pack.xlsx
    {date}_{target}_lbo-model.xlsx
    {date}_{target}_cash-flow-sidecar.xlsx
    {date}_{target}_comps-deck.docx
    {date}_{target}_sector-overview.docx
    {date}_{target}_competitive-analysis.docx
    {date}_{target}_byo-pre-mortem.docx
    {date}_{target}_thesis-tracker.docx
    {date}_{target}_p0-hygiene-check.json
    {date}_{target}_trifecta-disposition.json
  20_correspondence/        — emails with seller, IC notes, slack threads

Reference Folder Schema v1.0 — enum, required fields, validation — the contract every classified file must satisfy

Schema version

Folder Schema v1.0, introduced under methodology v3.0.0. Schema changes follow semver: minor versions add optional fields; major versions require an explicit migration script and re-classification of any affected runs. There is no silent breaking change.

Valid primary categories (closed enum)

financials → 02_financials/
commerce → 03_commerce/
operations → 04_operations/
legal → 05_legal/
marketing → 06_marketing/
crm → 07_crm/
unclassified → 00_intake/_unclassified/ (quarantine; transient — not a permanent category)

The engine refuses to write a file with a category outside this enum. There is no catch-all bucket — files that don't classify go to unclassified for manual resolution, not into a permanent "other" folder.
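
A minimal sketch of how the closed enum and its folder mapping could be expressed in TypeScript. The names CATEGORY_FOLDER, PrimaryCategory and folderFor are illustrative assumptions, not identifiers taken from the engine's source.

```typescript
// Illustrative sketch only: identifiers here are assumptions, not the engine's API.
// The mapping mirrors the closed enum listed above.
const CATEGORY_FOLDER = {
  financials: "02_financials/",
  commerce: "03_commerce/",
  operations: "04_operations/",
  legal: "05_legal/",
  marketing: "06_marketing/",
  crm: "07_crm/",
  unclassified: "00_intake/_unclassified/", // quarantine, transient
} as const;

type PrimaryCategory = keyof typeof CATEGORY_FOLDER;

// Type guard: true only for own keys of the mapping, so inherited names
// such as "toString" are rejected as well.
function isPrimaryCategory(value: string): value is PrimaryCategory {
  return Object.prototype.hasOwnProperty.call(CATEGORY_FOLDER, value);
}

// Returns the typed folder for a category, or throws for anything outside the enum.
function folderFor(category: string): string {
  if (!isPrimaryCategory(category)) {
    throw new Error(`category "${category}" is not in the Folder Schema v1.0 enum`);
  }
  return CATEGORY_FOLDER[category];
}
```

Deriving the type from the mapping keeps the enum and the folder table in one place, so adding a category without a folder (or the reverse) is not expressible.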

Required fields per file record

drive_file_id — Drive's authoritative ID
sha256 — hash captured at ingest, before any mutation
original_filename — as the seller named it
ingested_at — UTC timestamp from the engine clock
primary_category — one of the enum above
status — one of: active, quarantined, rejected, superseded
classifier_layer — deterministic, ai_fallback, or analyst (manual)

Optional fields

secondary_category — for multi-category files (e.g. vendor MSA with payment terms)
classifier_confidence — populated only when classifier_layer = ai_fallback
processed_from — pointer to the 00_intake/ original, populated only for OCR'd or translated copies under 00_intake/_processed/
processing_provenance — OCR engine name + translation service name + timestamp; populated only for processed copies
reclassified_from — prior primary_category, populated when an analyst reclassifies a file

What gets rejected at write time

Any file whose primary_category is not in the enum.
Any file missing a required field.
Any active file whose sha256 collides with another active file in the same run (in-run deduplication — collisions are referenced, not duplicated).
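
The three rejection rules above can be sketched as a single write-time check. This is a hedged illustration: validateForWrite, the FileRecord shape and the reason strings are hypothetical, and only the field names and enum values come from the schema above.

```typescript
// Illustrative sketch only: validateForWrite and FileRecord are assumptions.
const CATEGORIES = ["financials", "commerce", "operations", "legal", "marketing", "crm", "unclassified"];
const REQUIRED_FIELDS = [
  "drive_file_id", "sha256", "original_filename", "ingested_at",
  "primary_category", "status", "classifier_layer",
];

type FileRecord = { [field: string]: string | undefined };

// Returns the list of rejection reasons; an empty list means the write is accepted.
function validateForWrite(candidate: FileRecord, existing: FileRecord[]): string[] {
  const reasons: string[] = [];

  // Rule: every required field must be present and non-empty.
  for (const field of REQUIRED_FIELDS) {
    if (!candidate[field]) reasons.push(`missing required field: ${field}`);
  }

  // Rule: primary_category must belong to the closed enum.
  const category = candidate["primary_category"];
  if (category && CATEGORIES.indexOf(category) === -1) {
    reasons.push(`primary_category not in enum: ${category}`);
  }

  // Rule: an active file may not share a sha256 with another active file in the run.
  const hash = candidate["sha256"];
  if (candidate["status"] === "active" && hash) {
    const collides = existing.some(r => r["status"] === "active" && r["sha256"] === hash);
    if (collides) reasons.push("sha256 collides with another active file in this run");
  }
  return reasons;
}
```

Returning every reason at once, rather than throwing on the first, is a design choice of the sketch so that a rejection log entry can name all violations together.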

Migration policy

This is the target schema for the engine rebuild. The engine is pre-production at the time of this spec — no prior runs exist under this schema, so no backward-compatibility shim is required. Future schema changes follow semver and ship with explicit migration scripts.

0.5 P0 Hygiene Check

Before any analytical phase fires, four independent reviewers (Claude + GPT + Gemini + DeepSeek) audit Phase 0's output bundle against an eight-charge hygiene sheet. If chain-of-custody is broken, the run does not proceed — it loops back to Phase 0.

Phase 0 owns the chain-of-custody for everything that follows. A single check by Phase 0 itself is not enough: the same component that produces an artefact cannot be the only one to certify it. Phase 0.5 is the independent panel of four reviewers. It uses the same Quad Review primitive (runQuadReviewWith in app/src/orchestrator/quad-review.ts) that runs Phase 2.5 (financial-output sense-check) and Phase 12 (memo verdict review), scoped here to intake hygiene. The 4-vendor scope is structural: hygiene findings block the entire run before any analytical work starts, so coverage gaps at intake cost more than coverage gaps at downstream checkpoints. The ~30s of additional intake latency is a once-per-run cost paid for full safety bias at the most consequential gate.

What runs in this phase

API Four independent reviewers — Claude + GPT + Gemini + DeepSeek, dispatched in parallel

Who reviews

Four vendors evaluate the intake bundle in parallel: Anthropic Claude (Opus 4.7), OpenAI GPT (5.5), Google Gemini (2.5 Pro), DeepSeek (V4 Pro). No two reviewers share a training origin; independence is the point. Each vendor receives the same Phase 0 output bundle and the same eight-charge sheet; none sees the others' verdicts. Vendor identities are pinned by the methodology snapshot at run-start; flagship rotation stamps a new snapshot version.

Why this panel

Hygiene findings block the entire run before any analytical work fires, so coverage gaps at intake cost more than gaps at downstream checkpoints. The 4-vendor panel matches Phase 2.5 (financial-output sense-check) and Phase 12 (memo verdict review) so the same runQuadReviewWith primitive serves all three — one workflow, one vendor count. The ~30s additional intake latency is a once-per-run cost paid in exchange for the safety bias.

What they receive

Phase 0's full output bundle, structured: methodology snapshot + manifest, run identity record, recall results, ingest hashes + rejection log, classification log with confidence scores, OCR / translation provenance, re-filing log, schema-compliance status. Plus the eight-charge sheet (canonical source: app/src/orchestrator/charge-sheets/phase0_5_hygiene.ts). Each vendor returns per-charge votes plus an overall disposition.

Divergence is the signal

Where vendors return different votes on the same charge, the divergence is surfaced verbatim with all rationales side by side. Consensus is not engineered.

Reference The eight hygiene charges — what every reviewer is asked to evaluate

Methodology snapshot integrity — methodology_version + git SHA + manifest hash all present in the run record; rule-file copies present in 01_methodology/; manifest hash matches the actual files on disk.
Run identity completeness — analyst identity, UTC submission timestamp, HubSpot deal ID, and target slug all recorded; no concurrent in-flight run exists on the same deal ID.
Recall coverage — both HubSpot and the engine's own runs table were queried; match keys applied in the documented order (domain → founder email → fuzzy legal name); multi-match cases surfaced for analyst confirmation; prior-verdict aging stamped.
Ingest chain-of-custody — every active file in the run record has a SHA-256 hash; the hash was captured before any mutation; Drive OAuth scope is deal-folder-only; the volume-guard threshold was respected.
Classification soundness — no high-value files (candidate P&L, audit, balance sheet) sit in 00_intake/_unclassified/; ai_fallback confidence scores are above threshold where used; deterministic-pass results are consistent with the content keywords they matched on.
OCR / translation provenance — every file in 00_intake/_processed/ carries originating-language + service + timestamp metadata; the original-language file is preserved immutably in 00_intake/; service-failure retry log present if applicable.
Re-filing log completeness — any file renamed, moved, or re-classified during Phase 0 has a re-filing entry capturing the original location, new location, reason, and analyst (or engine: <heuristic>). Phase 11 (verifier) reads this log when challenging a verdict; gaps here mean a file's provenance can't be traced six months later.
Schema compliance — every file record has the seven required fields (drive_file_id, sha256, original_filename, ingested_at, primary_category, status, classifier_layer); enum values are valid; status transitions are logical (no active file with a quarantined-only history).

For each charge, each vendor returns SUSTAINED (charge holds, hygiene is fine on this dimension), DISMISSED (charge doesn't apply or the source material doesn't support evaluating it), or MODIFIED (charge applies in a slightly different form than stated, with explanation). The eight-charge sheet's canonical source-of-truth is app/src/orchestrator/charge-sheets/phase0_5_hygiene.ts; this list mirrors it. Methodology amendments stamp a new PHASE_0_5_METHODOLOGY_VERSION constant and propagate to both surfaces in lockstep.

Aggregation Disposition aggregation — caution wins ties

Vendor dispositions (review-vocabulary)

Each of the four reviewers returns one of UPHELD / RESTRUCTURED / OVERRULED / UNAVAILABLE — the standard Quad Review vocabulary shared with Phase 2.5 and Phase 12. The aggregator maps these to the post-phase vocabulary (CLEAN / ISSUES_FOUND / BLOCKING / DEGRADED) so the deal page and the analyst-facing UI use consistent language across all three checkpoints.

Caution-bias aggregation

The aggregator counts only the dispositions returned by available vendors — UNAVAILABLE votes (vendor timeout, API down, parse error) do not contribute to the tally, but they reduce the panel size on which the verdict is computed. Partial unavailability (1, 2, or 3 of 4 vendors UNAVAILABLE) does NOT automatically downgrade to DEGRADED; the aggregator still emits a verdict from whatever vendors returned valid responses, and the per-vendor outage surfaces explicitly on the deal page so the analyst sees that the verdict is N-of-4 rather than 4-of-4.

All UPHELD (among available vendors) → CLEAN → auto-proceed to Phase 1.
2+ vendors OVERRULED, OR 1 OVERRULED with 1+ RESTRUCTURED → BLOCKING → run halts, Phase 0 re-runs with the flagged fixes.
1 OVERRULED alone (rest UPHELD), OR 1+ RESTRUCTURED → ISSUES_FOUND → run pauses at the deal page; the analyst reviews the rationales and confirms before proceeding to Phase 1.
All four UNAVAILABLE (no vendor returned a valid response) → DEGRADED → run proceeds with an explicit "hygiene check not run" entry in the deal-page audit log.
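
The four rules above can be sketched as one function. This is an illustration under stated assumptions: aggregateDispositions and its return shape are hypothetical names, and the real aggregator lives behind runQuadReviewWith. The rules are checked from most to least cautious, so the stricter outcome wins whenever two rules could apply.

```typescript
// Illustrative sketch only: function and type names are assumptions.
type VendorDisposition = "UPHELD" | "RESTRUCTURED" | "OVERRULED" | "UNAVAILABLE";
type PhaseDisposition = "CLEAN" | "ISSUES_FOUND" | "BLOCKING" | "DEGRADED";

interface Aggregate {
  disposition: PhaseDisposition;
  panelSize: number; // number of vendors that returned a valid response (N of 4)
}

function aggregateDispositions(votes: VendorDisposition[]): Aggregate {
  // UNAVAILABLE votes do not count toward the tally; they only shrink the panel.
  const available = votes.filter(v => v !== "UNAVAILABLE");
  const overruled = available.filter(v => v === "OVERRULED").length;
  const restructured = available.filter(v => v === "RESTRUCTURED").length;

  let disposition: PhaseDisposition;
  if (available.length === 0) {
    disposition = "DEGRADED"; // no vendor returned a valid response
  } else if (overruled >= 2 || (overruled === 1 && restructured >= 1)) {
    disposition = "BLOCKING"; // run halts, Phase 0 re-runs
  } else if (overruled === 1 || restructured >= 1) {
    disposition = "ISSUES_FOUND"; // run pauses for analyst confirmation
  } else {
    disposition = "CLEAN"; // every available vendor UPHELD
  }
  return { disposition, panelSize: available.length };
}
```

Returning panelSize alongside the disposition lets the deal page show that a verdict is, say, 2-of-4 rather than 4-of-4.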

Divergence surfacing

Where the four vendors return different votes on the same charge, the divergence is surfaced verbatim — "Claude SUSTAINED, GPT MODIFIED, Gemini SUSTAINED, DeepSeek DISMISSED" — with all four rationales side by side. Consensus is not engineered; divergence is the signal.
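
A small sketch of per-charge divergence detection, assuming a hypothetical VendorChargeVote record and a divergenceSummary helper (neither name comes from the engine). It reports the votes verbatim when they differ and reports nothing when they agree.

```typescript
// Illustrative sketch only: the record shape and helper name are assumptions.
type ChargeVote = "SUSTAINED" | "DISMISSED" | "MODIFIED";

interface VendorChargeVote {
  vendor: string;
  vote: ChargeVote;
  rationale: string;
}

// Returns a one-line verbatim summary when vendors disagree on a charge,
// or null when every vendor returned the same vote.
function divergenceSummary(votes: VendorChargeVote[]): string | null {
  const distinct = new Set(votes.map(v => v.vote));
  if (distinct.size <= 1) return null;
  return votes.map(v => `${v.vendor} ${v.vote}`).join(", ");
}
```

The rationales stay on the input records untouched, so a caller can render them side by side without the helper merging or ranking them.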

Output p0-hygiene-check.json — all four vendors' votes + final disposition + divergence notes (persistence wired Task 11)

Structured record written to ${RESEARCH_ARSENAL_RUNS_DIR}/${runId}/10_dd-output/{date}_{target-slug}_p0-hygiene-check.json at the end of every Phase 0.5 run, regardless of disposition. Contains the full per-charge vote and rationale from each of the four vendors, the per-charge divergence summary where applicable, the final aggregated disposition, the methodology snapshot version, and a duration / cost line. Surfaced on the deal page alongside the Phase 12 memo-verdict review so the deal page shows both the intake-side check and the verdict-side check. Persistence wired in Task 11 via the shared writeGateLogFile helper (alongside Phase 1, Phase 2, Phase 2.5, Phase 3, Phase 4 gate-log writers); soft-fails when RESEARCH_ARSENAL_RUNS_DIR is unset (engine ships without RA wired by default). The symmetric readGateLogFile(runId, 'p0-hygiene-check') reader is what the deal-page audit-trail surface consumes.

If the disposition is BLOCKING or ISSUES_FOUND, the deal page renders an action banner with the affected charge(s) named and a "re-run Phase 0" or "confirm and proceed" button as appropriate.

1 Does it fit the thesis?

Twelve criteria the deal has to meet to be worth analysing further. A defensible yes/no gate — bad deals die here with a documented short memo, an explicit override path, and a per-criterion audit trail.

The thesis is Ecomma's view of what kind of business is worth buying. Twelve numeric and categorical criteria. Phase 1 scores ten of them at the gate; criteria 11 (retention vs comp) and 12 (reputation) are deferred — they need Phase 3 (comps) and Phase 5 (reputation) to have run before they can be evaluated, so Phase 7 backfills them into the verdict. The gate decision returns one of four states: PASS (proceed), CONDITIONAL PASS (proceed with named gaps), HARD PASS (do not pursue), or OVERRIDE PROCEED (analyst override; the override reason is captured and propagates into the memo).

What runs in this phase

Pre-check Gate-input contract — before scoring, verify the engine actually has the data to score each criterion

Why this exists

Scoring criteria without sufficient input is how a gate hallucinates. The model will helpfully pattern-match an answer to a criterion it cannot actually evaluate from the supplied data. Phase 1 refuses to score a criterion unless the engine has the input the criterion requires.

Per-criterion input requirements

Criteria 1, 2, 9 — deal-form fields + Phase 0 classification log (always present).
Criteria 3, 4, 7, 8 — P&L or processor-derived revenue / margin data from Phase 0's classified financial documents.
Criterion 5 — order-to-delivery data from Phase 0's commerce or operations documents.
Criterion 6 — deal-form listing-ask field.
Criterion 10 — customer-order data from Phase 0's commerce or CRM documents.
Criteria 11, 12 — deferred (Phase 3 and Phase 5 backfill).

What happens when input is missing

Any criterion with insufficient input returns DEFERRED with the named missing-input. The gate aggregation treats DEFERRED distinctly from PASS / NEAR / FAIL — it surfaces on the deal page as "criterion N requires <data> which is not present; analyst can supply or accept the deferral." Silent best-effort scoring is forbidden.
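
The gate-input contract can be sketched as a lookup from criterion number to required inputs. The InputKind labels, REQUIRED_INPUTS table and preCheck function are hypothetical names chosen for this illustration; only the criterion-to-input grouping comes from the list above.

```typescript
// Illustrative sketch only: labels and function names are assumptions.
type InputKind =
  | "deal_form" | "classification_log" | "revenue_margin_data"
  | "delivery_data" | "listing_ask" | "customer_orders";

const REQUIRED_INPUTS: { [criterion: number]: InputKind[] } = {
  1: ["deal_form", "classification_log"],
  2: ["deal_form", "classification_log"],
  3: ["revenue_margin_data"],
  4: ["revenue_margin_data"],
  5: ["delivery_data"],
  6: ["listing_ask"],
  7: ["revenue_margin_data"],
  8: ["revenue_margin_data"],
  9: ["deal_form", "classification_log"],
  10: ["customer_orders"],
};
const BACKFILLED_LATER = [11, 12]; // scored after Phase 3 and Phase 5 have run

interface PreCheckResult {
  criterion: number;
  status: "SCORABLE" | "DEFERRED";
  missing: string[]; // named missing inputs, surfaced on the deal page
}

function preCheck(criterion: number, present: InputKind[]): PreCheckResult {
  if (BACKFILLED_LATER.indexOf(criterion) !== -1) {
    return { criterion, status: "DEFERRED", missing: ["Phase 3 / Phase 5 backfill"] };
  }
  const required = REQUIRED_INPUTS[criterion];
  if (required === undefined) throw new Error(`unknown criterion ${criterion}`);
  const missing = required.filter(kind => present.indexOf(kind) === -1);
  return { criterion, status: missing.length === 0 ? "SCORABLE" : "DEFERRED", missing };
}
```

Because the check runs before the scoring call, a criterion with missing input never reaches the model, which is what rules out silent best-effort scoring.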

Reference acquisition-thesis.md — the twelve gate criteria, the disqualifiers, the per-criterion metadata, the lesson history

What it is

The file that holds Ecomma's twelve acquisition criteria, the hard disqualifiers, the per-criterion metadata, and the lesson history for every criterion that was earned the hard way. Lives at project-starter/.claude/skills/run-deal-dd/references/acquisition-thesis.md. Version-stamped under the methodology snapshot at Phase 0; the run is bound to whatever version was current at run-start time.

Per-criterion metadata schema

Each criterion in the reference file carries this metadata, not just the threshold:

threshold — numeric or categorical (per-category table where the threshold differs by niche).
tolerance_band — what counts as "near" vs. "fail."
deal_stage_overrides — different thresholds for Pre-LOI screen vs. LOI vs. postmortem vs. BYO.
input_required — what data the criterion needs to be scorable (used by the gate-input contract above).
failure_template — the boilerplate sentence Phase 1 writes into the HARD-PASS memo when this criterion fails.
lesson_history — when this criterion was added, the deal that drove the lesson, the link to the post-mortem.
override_eligibility — structured field with two sub-fields:
  allowed (boolean): whether the analyst can override this criterion at all. It is false for the seven hard disqualifiers, which are not analyst-overridable under any circumstances and instead route through the separate HubSpot IC-waiver pathway before the run starts.
  valid_reason_codes (array): when allowed is true, which of the six typed reason codes from the override taxonomy are accepted for this criterion specifically, since not every reason fits every criterion. For example, criterion 1 (niche fit) accepts only IC_STRATEGIC and CATEGORY_OUTLIER_JUSTIFIED, never DATA_GAP_TEMPORARY. The array is empty when allowed is false.

The twelve criteria

1. Niche fit — primary category is one of: Apparel & Accessories, Baby & Kids, Home & Décor, Automotive Accessories, Jewelry & Beauty. Hard fail otherwise. Overridable only with a documented IC strategic-fit reason.

2. Operating age — at least 12 months of continuous operation. Hard fail otherwise.

3. Monthly revenue — 12-month trailing average ≥ target-currency-equivalent of $10K USD AND latest 3 months ≥ same floor. Hard fail if both conditions miss. If the trailing average passes but the latest 3 months miss, returns NEAR with a paused-ads / seasonality flag. All thresholds are in the run's target currency (set at intake); the engine converts using the FX rate captured at run-start.

4. Monthly profit — average of latest 3 months ≥ target-currency-equivalent of $5K USD, after operator pay. Hard fail otherwise. Operator pay is defined as: market-rate replacement cost for the seller's role, sourced from category-comparable founder-salary benchmarks. Falls back to documented actual founder salary if the role is standard / not founder-dependent. Persisted as [derived] source tier with the computation shown.

5. Delivery time — average order-to-delivery < 2 weeks. Hard fail at 4+ weeks; NEAR at 2–4 weeks with explicit logistics-risk callout.

6. Deal size — listing ask between target-currency-equivalent of $50K and $800K USD. NEAR if outside (not auto-fail) with size-mismatch callout.

7. YoY trend — revenue decline no greater than 20% year-over-year. Hard fail if 3+ consecutive months below -20%. Note: criteria 3 and 7 can both fail on the same underlying decline signal; the gate aggregation counts that as one fail with two reinforcing diagnostics, not two independent fails.

8. Gross margin — per-category floor:

Category	GM floor
Apparel & Accessories	50%
Jewelry & Beauty	45%
Home & Décor	42%
Baby & Kids	38%
Automotive Accessories	35%

Hard fail if trailing-twelve-month GM < the category floor. The single most decisive criterion. Dropship aggregators usually fail here, intentionally. Calibration TBD against the comp database.
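
As a worked illustration of the per-category floor, the sketch below computes trailing-twelve-month gross margin as (revenue − COGS) / revenue and compares it with the table above. The function name scoreGrossMargin and its inputs are assumptions, and the floors themselves are still marked as awaiting calibration.

```typescript
// Illustrative sketch only: scoreGrossMargin is a hypothetical name.
// Floors mirror the table above and are subject to the pending calibration.
const GM_FLOOR: { [category: string]: number } = {
  "Apparel & Accessories": 0.50,
  "Jewelry & Beauty": 0.45,
  "Home & Décor": 0.42,
  "Baby & Kids": 0.38,
  "Automotive Accessories": 0.35,
};

// Returns FAIL when trailing-twelve-month gross margin is below the category floor.
function scoreGrossMargin(category: string, ttmRevenue: number, ttmCogs: number): "PASS" | "FAIL" {
  const floor = GM_FLOOR[category];
  if (floor === undefined) throw new Error(`no GM floor for category "${category}"`);
  if (ttmRevenue <= 0) throw new Error("trailing-twelve-month revenue must be positive");
  const grossMargin = (ttmRevenue - ttmCogs) / ttmRevenue;
  return grossMargin < floor ? "FAIL" : "PASS";
}
```

For example, a target with 100,000 in revenue and 60,000 in COGS has a 40% margin: that clears the Automotive Accessories floor but fails the Apparel & Accessories floor.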

9. Model type — not a pure dropship aggregator. If catalogue contains third-party trademarked names (car marques, fashion houses, sports leagues), trademark verification with USPTO/EUIPO is added to the run. Lesson source: Aveugle retrospective — catalogue had third-party trademarked car-marque names that triggered Lesson 11.

10. Repeat rate — at least 25% of revenue from repeat customers in the trailing 12 months. NEAR if 15–25%, fail below 15%. Repeat customer = ≥2 orders from same email or phone within the 12-month window. DTC channel only; B2B excluded.

11. Retention vs public comp — deferred to Phase 7. Target's repeat-customer share within 20 percentage points of the closest public comparable. Comparable selection rule: same primary-category, similar size band, public filings within last 24 months, DTC channel mix > 60%. Phase 3 produces the comp set; Phase 7 backfills this criterion's score using the comp-set retention figure. Added May 2026 — lesson history: surfaced after retention-blind verdicts on three deals failed at the IC.

12. Reputation — deferred to Phase 7. No signs of templated / clustered / bought reviews. Spike-detection check passes across all reputation surfaces the engine probes — not just Trustpilot. Phase 5 runs the multi-source reputation sweep (Trustpilot, Google reviews, Yelp, Facebook reviews, Reddit mentions, category-specific review aggregators, plus local-market review sites discovered via search); Phase 7 backfills this criterion's score using the aggregated signal. The spike-detection rule, the templated-language rule, and the reviewer-history rule each apply per surface and aggregate via volume-weighted majority. Added May 2026 after Aveugle. Lesson reinforcement May 2026 from a manual DD where a local review site held 8,000 reviews against 3 on Trustpilot — sticking to one platform would have missed the entire signal.

The instant hard disqualifiers (separate from the twelve)

These are not scored — they are absolute and not overridable except by explicit IC waiver recorded in HubSpot.

Edible, perishable, or regulated products
Adult products
Self-fulfilled operations (we want third-party logistics)
Banned or restricted ad accounts on the major platforms
Active legal disputes or IP risks
Unresponsive or uncooperative seller
Inability to transfer key assets

API Anthropic — Claude Opus (methodology-pinned version) — scores the criteria with confidence + DEFERRED-on-missing-input semantics

Model + version

Claude — current flagship Opus version, pinned by the methodology snapshot at run-start (Opus 4.7 as of methodology v3.0.0). The methodology version is the binding reference; the specific model identifier lives in the methodology snapshot, not hardcoded in this narrative.

What the engine sends

Three inputs: the ten gate-active criteria (criteria 11 and 12 are deferred and not included in this call), the deal's grounding bundle (classified financial documents, commerce/CRM data, deal-form fields, Phase 0 classification log — explicitly the same bundle the gate-input-contract pre-check just validated), and an instruction to score each criterion with {vote, value, confidence, rationale, sources_used} per criterion. The instruction forbids inventing values — if the model cannot find the data it needs in the bundle, it must return DEFERRED with the named missing input, not a guessed score.

Scoring contract

Per criterion, Claude returns:

vote — one of PASS, NEAR, FAIL, DEFERRED.
value — the numeric or categorical value the model found (e.g. "47% GM", "$12K trailing average revenue").
confidence — 0–1 model self-rating. Confidence below the methodology-set floor (default 0.7) downgrades the vote to DEFERRED automatically.
rationale — one sentence on the reasoning.
sources_used — the specific Phase 0 files / data fields the model relied on. Used by the gate logger for the audit trail.

Cross-check (not a cost choice — a quality choice)

The same scoring runs against a second vendor in parallel for any criterion where the first vendor's confidence is between 0.7 and 0.85, the band where the model is confident enough to vote but not confident enough to be solo-trusted. The second vendor is pinned by the methodology snapshot at run-start, drawn from the same four-vendor roster the Phase 12 Quad Review uses (Anthropic Claude Opus, OpenAI GPT, Google Gemini, DeepSeek); the current pinning is GPT (chat-API flagship 5.5) as the cross-check partner to the Claude primary scorer. Divergence between the two vendors on a load-bearing criterion (1, 2, 3, 4, 7, 8, 9) auto-escalates the criterion to analyst review before the gate aggregates.

What happens if the cross-check vendor is unreachable

If the cross-check call fails (timeout, API error, key invalid), the affected criterion does not silently fall back to the single-vendor vote. Instead, the criterion is marked DEFERRED with the missing-cross-check reason recorded on the gate log, and the run is held at the deal-page banner until the analyst responds. The cost of unverified mid-confidence votes is paid in analyst time, not in silent acceptance.
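
The confidence floor, the cross-check band and the unreachable-vendor rule can be sketched together. All names here (resolveCriterion, CrossCheck, Resolution) are hypothetical. Treating both band edges as inclusive is an assumption of the sketch; the text gives the band as "between 0.7 and 0.85" without stating the boundary behaviour.

```typescript
// Illustrative sketch only: names and inclusive band edges are assumptions.
type Vote = "PASS" | "NEAR" | "FAIL" | "DEFERRED";

interface Score { vote: Vote; confidence: number; }
type CrossCheck = { status: "ok"; vote: Vote } | { status: "unreachable" };

interface Resolution {
  vote: Vote;
  analystReview: boolean; // true when the run must wait for the analyst
  reason: string | null;  // recorded on the gate log
}

const CONFIDENCE_FLOOR = 0.7;   // methodology default
const SOLO_TRUST_ABOVE = 0.85;  // above this, the primary vote stands alone

function resolveCriterion(primary: Score, loadBearing: boolean, crossCheck?: CrossCheck): Resolution {
  if (primary.vote === "DEFERRED") {
    return { vote: "DEFERRED", analystReview: false, reason: "model reported missing input" };
  }
  if (primary.confidence < CONFIDENCE_FLOOR) {
    return { vote: "DEFERRED", analystReview: false, reason: "confidence below floor" };
  }
  if (primary.confidence > SOLO_TRUST_ABOVE) {
    return { vote: primary.vote, analystReview: false, reason: null };
  }
  // Mid-confidence band: a second vendor's vote is mandatory.
  if (!crossCheck || crossCheck.status === "unreachable") {
    return { vote: "DEFERRED", analystReview: true, reason: "missing-cross-check" };
  }
  const diverged = crossCheck.vote !== primary.vote;
  return {
    vote: primary.vote,
    analystReview: diverged && loadBearing, // divergence escalates only on load-bearing criteria
    reason: diverged ? "cross-check divergence" : null,
  };
}
```

Passing loadBearing as a parameter keeps the sketch independent of where the load-bearing set is defined.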

Aggregation Gate aggregation rule — how individual criterion votes resolve to a single gate decision

The four gate states

PASS — every gate-active criterion (1–10) is PASS or, for the graduated criteria (5, 6, 10), NEAR with no other concerns. Proceed to Phase 2.
CONDITIONAL PASS — one or more NEAR verdicts on graduated criteria, OR one or more DEFERRED verdicts on non-load-bearing criteria, with no hard-fail on a load-bearing criterion. Proceed to Phase 2 with named gaps on the deal page that the memo must address.
HARD PASS — any FAIL on a load-bearing criterion (1, 2, 3, 4, 7, 8, 9), OR two-or-more FAIL verdicts on any criteria, OR any of the seven hard disqualifiers fires. Run halts and produces the HARD-PASS deliverable (see below).
OVERRIDE PROCEED — analyst override under the override-paths policy (cross-cutting section below). The gate's vote stands ("HARD PASS would have fired on criterion 8"); the run proceeds anyway with the override reason captured. The memo surfaces the override prominently, not in an appendix.

What load-bearing means

Load-bearing criteria are the ones where a single fail is decisive: 1 (niche fit), 2 (operating age), 3 (revenue floor), 4 (profit floor), 7 (YoY trend), 8 (gross margin), 9 (model type / IP). The other criteria (5, 6, 10) graduate through NEAR; the deferred ones (11, 12) get backfilled by Phase 7 and only affect the verdict, not the Phase 1 gate.

DEFERRED vs FAIL

DEFERRED never aggregates to a fail at Phase 1. It marks "the engine couldn't score this from current inputs" and surfaces on the deal page for the analyst to either supply the missing data, accept the deferral and proceed, or treat the deferral as a fail explicitly. The default is "surface and ask" — silent best-effort is the failure mode the gate-input contract exists to prevent.
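
A hedged sketch of the aggregation, with its assumptions stated up front. The names aggregateGate and CriterionVote are hypothetical. The sketch takes the cautious reading wherever the rules above leave room: any NEAR or DEFERRED yields CONDITIONAL PASS, and a lone FAIL on a graduated criterion (which the rules do not explicitly place) is also treated as CONDITIONAL PASS. OVERRIDE PROCEED is an analyst action applied after this function returns, so it is not modelled here.

```typescript
// Illustrative sketch only: names are hypothetical, and the handling of NEAR,
// DEFERRED and a lone graduated FAIL reflects the assumptions stated in the lead-in.
type Vote = "PASS" | "NEAR" | "FAIL" | "DEFERRED";
type GateState = "PASS" | "CONDITIONAL PASS" | "HARD PASS";

interface CriterionVote { criterion: number; vote: Vote; }

// Load-bearing set as listed under "What load-bearing means".
const LOAD_BEARING = [1, 2, 3, 4, 7, 8, 9];

function aggregateGate(scores: CriterionVote[], hardDisqualifierFired: boolean): GateState {
  const fails = scores.filter(s => s.vote === "FAIL");
  const loadBearingFail = fails.some(s => LOAD_BEARING.indexOf(s.criterion) !== -1);

  // Any load-bearing FAIL, two or more FAILs, or a hard disqualifier halts the run.
  if (hardDisqualifierFired || loadBearingFail || fails.length >= 2) return "HARD PASS";

  // DEFERRED never counts as a FAIL; it surfaces as a named gap instead.
  const anyGap = scores.some(s => s.vote !== "PASS");
  return anyGap ? "CONDITIONAL PASS" : "PASS";
}
```

Criteria 3 and 7 failing on the same decline signal need no special casing in this sketch, because either one alone already resolves to HARD PASS.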

Output phase1-gate-log.json + HARD-PASS memo (if applicable) — every gate decision produces a defensible artefact

phase1-gate-log.json

Written to 10_dd-output/{date}_{target}_phase1-gate-log.json on every run, regardless of gate outcome. Contains, per criterion: vote, value, confidence, rationale, sources_used, cross-check vendor verdict (if cross-check fired), and the final aggregated gate state. Surfaced on the deal page as a collapsible audit trail. A HARD PASS verdict challenged six months later is reviewed against this artefact.

HARD-PASS short memo

When the gate state is HARD PASS, the engine still produces a 1–2 page memo at 10_dd-output/{date}_{target}_dd-memo_hard-pass.docx. Contents:

Target identification (name, domain, founder, category).
The verdict and the specific criteria that failed, with the values found and the failure-template sentence from the methodology.
The underlying data sources that drove the failure verdict.
The re-engagement conditions — what would need to change in the target's business for Ecomma to re-open the deal. Pulled from re-engagement-triggers.md per reason code.
The override-trigger conditions — what an IC override would need to document if Ecomma wanted to proceed despite the failure.

HubSpot write

The gate decision is written back to the HubSpot deal record with the reason code and the re-engagement triggers attached, so future recall queries (Phase 0 sub-step 3) surface the prior verdict with the conditions needed to unblock it.

2 Are the numbers real?

Discovers which platforms the target actually uses, pulls the data, builds the financial model, audits it for structural integrity, reconciles GL revenue against payment-processor settlements, and produces the IC-ready data pack. Catches inflated P&Ls and structurally-broken models.

This is the engine's core analytical phase. Before any pulls run, the engine discovers which commerce platforms the target operates (Shopify is one option among many — Magento, WooCommerce, BigCommerce, Salesforce Commerce Cloud, plus marketplace channels Amazon, eBay, TikTok Shop are all material for many ecommerce targets), which payment processors are connected, which accounting system the target runs on, and whether the target spans multiple legal entities (parent + subsidiaries). Then it normalises every raw amount to the run's target currency using the FX-rate snapshot captured at Phase 0. Then it builds the financial model, audits it for integrity, runs the GL-vs-settlements reconciliation, and produces the data pack — in that order, because a broken model cannot reconcile and an unaudited model should not be turned into a deliverable.

The reconciliation itself is the methodology's adversarial check #1. The revenue claimed in the GL has to match what actually settled through the connected payment processors plus the marketplace settlement reports, within the methodology-pinned tolerance. Mismatches outside the tolerance get classified by operational break-code (timing / missing channel / classification error / currency variance / suspicious / explained-by-analyst / pending-analyst) and surface on the deal page for analyst classification. A suspicious break with no operational explanation is the single biggest red flag in this category of deal — it almost always means inventory swaps, founder cash, or wholesale revenue is being added to a number being sold as DTC.
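
The core arithmetic of the reconciliation can be sketched as follows. The function name reconcileRevenue is hypothetical, and expressing the tolerance as a fraction of GL revenue is an assumption: the text says only that the tolerance is methodology-pinned, without giving its value or unit. A break outside tolerance starts as pending-analyst in this sketch, since the classification is made on the deal page.

```typescript
// Illustrative sketch only: names and the tolerance unit are assumptions.
type BreakCode =
  | "timing" | "missing channel" | "classification error" | "currency variance"
  | "suspicious" | "explained-by-analyst" | "pending-analyst";

interface ReconciliationResult {
  variance: number;        // GL revenue minus total settled, in target currency
  variancePct: number;     // absolute variance as a fraction of GL revenue
  withinTolerance: boolean;
  breakCode: BreakCode | null; // null when the figures reconcile
}

function reconcileRevenue(
  glRevenue: number,
  processorSettled: number,
  marketplaceSettled: number,
  tolerance: number, // assumed to be a fraction of GL revenue, e.g. 0.03
): ReconciliationResult {
  if (glRevenue <= 0) throw new Error("GL revenue must be positive to compute a variance ratio");
  const variance = glRevenue - (processorSettled + marketplaceSettled);
  const variancePct = Math.abs(variance) / glRevenue;
  const withinTolerance = variancePct <= tolerance;
  return {
    variance,
    variancePct,
    withinTolerance,
    breakCode: withinTolerance ? null : "pending-analyst",
  };
}
```

All inputs are expected to be normalised to the run's target currency before this comparison is made.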

What runs in this phase

Pre-check Phase 2 gate-input contract — verifies the engine has what it needs before any skill fires

What it checks

Before any commerce / payments / accounting pull fires, the engine verifies (a) target_currency is set on the run record — Phase 2 fails loud if it isn't, since every downstream number depends on it; (b) the FX-rate snapshot captured by Phase 0 is present for every non-target currency the target's data touches; (c) at least one connected source exists per leg (commerce, payments, accounting). Missing inputs are recorded as deferred sub-aspects, not silent empty pulls.

What happens when an input is missing

The corresponding sub-output is marked DEFERRED with the missing-input reason recorded (e.g. accounting_unavailable: no_connector_registered). Downstream skills are passed the deferred state and produce explicit data-gap blocks rather than synthesising values from empty arrays. If all three legs are unavailable, Phase 2 cannot proceed and the gate aggregates to HARD PASS unless overridden by an analyst with documented data-gap reason.

Discovery Platform-of-record discovery — which commerce, payments, accounting platforms the target actually uses

Why discovery, not hardcoding

A target on Magento produces empty data if the engine hardcodes Shopify. A target running both a Shopify storefront and Amazon Brand Registry — 40%+ of revenue on Amazon — looks like a much smaller business if marketplaces are ignored. A target operating one parent legal entity plus two subsidiaries reconciles wrong if treated as a single entity. Discovery runs before the pulls so the engine actually knows what to pull and from where.

What's discovered

Commerce platforms — Shopify (storefront fingerprinting on the public site + OAuth integration probe), Magento, WooCommerce, BigCommerce, Salesforce Commerce Cloud, plus marketplace channels: Amazon (Brand Analytics + Seller Central), eBay (Sales reports), TikTok Shop (Analytics), Etsy. Per-platform discovery confidence is captured; low confidence triggers the seller-questionnaire / analyst-escalation loop.
Payment processors — Stripe, PayPal, Adyen, Braintree, Mollie, Klarna. Per-processor discovery covers multi-account targets (per-entity processor accounts).
Accounting systems — QuickBooks (US-default), Xero (UK / AU / NZ default), Sage (UK / IE / DE), Moneybird (NL — often returns Dutch, handled by Phase 0 translation), NetSuite (larger targets), Wave.
Entity structure — single legal entity vs parent + subsidiaries. Multi-entity discovery surfaces consolidation and intercompany-elimination requirements to the 3-statement-model skill.

Output

10_dd-output/{date}_{target}_phase2-platform-discovery.json. Surfaced on the deal page; the analyst can correct misdiscoveries (e.g. mark a detected platform as not-actually-the-target's) with one click, which re-fires the affected pulls.

Discipline Currency normalisation handshake — every figure normalised to the run's target currency before reconciliation

The handshake

Phase 0 captured target_currency at intake (mandatory, no default) and the FX-rate snapshot at run-start. Phase 2 reads both. Every raw amount that comes back from a pull (orders in EUR from a NL Shopify, settlements in USD from Stripe, GL transactions in EUR from Moneybird) is normalised to target_currency using the FX-rate snapshot — never re-fetched, so re-runs of the same run produce identical numbers.

Provenance preserved

Raw per-currency values are preserved alongside normalised values on every leg. The reconciliation operates on normalised values; the audit log and the data pack cite both. The FX-rate snapshot's capturedAt timestamp is propagated for every cross-currency conversion so the data-pack's audit trail shows which snapshot fired which conversion.
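
A minimal sketch of snapshot-based normalisation that keeps the raw value next to the normalised one. The FxSnapshot shape, the normalise function and the rate convention (units of target currency per one unit of source currency) are assumptions of this illustration; only target_currency and capturedAt are named in the text.

```typescript
// Illustrative sketch only: shapes, names and the rate convention are assumptions.
interface FxSnapshot {
  capturedAt: string;      // timestamp of the Phase 0 snapshot
  targetCurrency: string;
  rates: { [sourceCurrency: string]: number }; // target units per 1 source unit (assumed)
}

interface NormalisedAmount {
  raw: number;
  rawCurrency: string;
  normalised: number;
  targetCurrency: string;
  fxCapturedAt: string;    // which snapshot produced this conversion
}

// Converts using the pinned snapshot only; it never fetches a live rate,
// so repeating the call with the same snapshot gives the same number.
function normalise(raw: number, rawCurrency: string, snapshot: FxSnapshot): NormalisedAmount {
  const rate: number | undefined =
    rawCurrency === snapshot.targetCurrency ? 1 : snapshot.rates[rawCurrency];
  if (rate === undefined) {
    throw new Error(`FX snapshot has no rate for ${rawCurrency}`);
  }
  return {
    raw,
    rawCurrency,
    normalised: raw * rate,
    targetCurrency: snapshot.targetCurrency,
    fxCapturedAt: snapshot.capturedAt,
  };
}
```

Throwing on a missing rate, rather than defaulting to 1, matches the pre-check requirement that the snapshot covers every non-target currency the data touches.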

Pulls Commerce data — discovered platforms, target-currency normalised — orders, cohorts, returns from every detected commerce source

What's pulled

For each discovered platform: the trailing 24 months of orders, customer cohorts, returns, refunds, refund-lag distribution, product-mix shares, repeat-purchase share. Per-platform pulls run in parallel; per-platform failure surfaces in the leg's pull_status rather than blocking the aggregate. Marketplace pulls (Amazon Brand Analytics, eBay Sales reports, TikTok Shop Analytics) are first-class alongside the storefront pulls — a 40%-of-revenue Amazon channel produces 40% of the cohort data.

Source tier

Platform-pulled data is tagged [verified] because it comes straight from the platform of record rather than through the seller's hands. Seller-produced exports (e.g. a CSV the seller generated) are tagged [inferred] and require reconciliation against a primary source before promotion.

Pulls Payment-processor settlements — per-processor settlements, disputes, and chargeback economics

What's pulled

Per discovered processor, per connected account (multi-account targets handled by enumerating connected accounts): the actual settled money. Settlement-level — every charge, every refund. Plus disputes (open + resolved) and chargebacks. The chargeback dimension matters for the reconciliation denominator: gross settlement minus chargebacks = net revenue the business actually realised, which is what should reconcile against the GL net revenue line, not the gross.

Why disputes and chargebacks are separate from refunds

A refund is voluntary: the business chose to issue it, and the operational metrics treat it as a customer-satisfaction or returns cost. A dispute or chargeback is involuntary and signals downstream issues (delivery failure, fraud, misrepresentation). For DD, the chargeback rate as a percentage of gross settlement is itself a quality signal: a rate above the methodology-pinned threshold flags an operational risk on the deal page even if the reconciliation passes.

Pulls Accounting system + multi-entity GL — trial balance, P&L, balance sheet, GL transactions, COGS, inventory movements

What's pulled

From the discovered accounting system: trial balance, P&L, balance sheet, general-ledger detail, the COGS ledger, and inventory-movement records. The COGS + inventory pulls are separate from the GL pull because the COGS-to-inventory reconciliation (cost of goods sold this period vs inventory that moved this period vs supplier invoices booked this period) is its own integrity check — revenue inflation isn't the only fraud surface.

Multi-entity targets

If discovery flagged multiple legal entities (parent + subsidiaries), the engine pulls each entity's GL separately, then loads the consolidation eliminations from whichever entity carries them. The 3-statement-model skill receives both per-entity ledgers and the consolidated view; the audit-xls skill verifies that the consolidated total equals the sum of per-entity totals net of eliminations.

Data-room fallback

When no live accounting integration is available, the engine falls back to the seller-exported P&L from Phase 0's data-room ingestion. Seller-exported data is tagged [inferred] until GL reconciliation passes. The fallback is logged on the gate log so the verdict can be challenged later with "this verdict was reached on seller-exported financials, not live GL pulls."

Integrity check COGS / inventory reconciliation — ties cost of goods sold to inventory movements to supplier invoices

What's checked

For each month, the engine ties three numbers together: (a) COGS recognised in the P&L, (b) inventory that moved out of stock per the inventory ledger, (c) supplier invoices booked into accounts payable for the same period. These three should reconcile within the methodology-pinned tolerance. Where they don't, the gap is classified — common operational explanations include timing on inventory receipts vs invoice booking; uncommon explanations include misclassified expenses or inventory shrinkage.

Why this matters

Revenue inflation is one fraud surface; margin inflation via under-reported COGS is another. A target showing 50% GM might actually be 35% GM if 15 points of COGS sit unbooked. The COGS-inventory-invoice triangulation catches this class.
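The three-way tie and the margin arithmetic can be sketched as follows. This is an illustration under stated assumptions: the field names are hypothetical, the tolerance is passed in because its pinned value is not quoted here, and the gap definition (spread between the largest and smallest leg, relative to recognised COGS) is an assumption rather than the methodology's definition.

```typescript
// Hypothetical record for one month; field names are illustrative.
interface CogsLeg {
  month: string;
  cogsRecognised: number; // (a) COGS in the P&L
  inventoryMovedOut: number; // (b) stock that left inventory
  supplierInvoicesBooked: number; // (c) invoices booked to accounts payable
}

interface TriangulationResult {
  month: string;
  gapPct: number; // spread between the three legs, relative to (a)
  withinTolerance: boolean;
}

// Assumption: gap = (largest leg - smallest leg) / recognised COGS.
function triangulateCogs(leg: CogsLeg, tolerance: number): TriangulationResult {
  const values = [
    leg.cogsRecognised,
    leg.inventoryMovedOut,
    leg.supplierInvoicesBooked,
  ];
  const spread = Math.max(...values) - Math.min(...values);
  let gapPct: number;
  if (leg.cogsRecognised === 0) {
    gapPct = spread === 0 ? 0 : Infinity; // nothing booked but stock moved
  } else {
    gapPct = spread / Math.abs(leg.cogsRecognised);
  }
  return { month: leg.month, gapPct, withinTolerance: gapPct <= tolerance };
}

// Gross margin as a fraction of revenue.
function grossMargin(revenue: number, cogs: number): number {
  return (revenue - cogs) / revenue;
}
```

With revenue of 100, booked COGS of 50 gives a gross margin of 0.50; adding 15 of unbooked COGS gives (100 - 65) / 100 = 0.35, which is the 50%-versus-35% case described above.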

Skill 3-statement-model — integrated P&L + balance sheet + cash flow, methodology-pinned vendor

What the skill is

One of our Tier A finance skills. The SKILL.md file at project-starter/.claude/skills/finance/3-statement-model/ tells the methodology-pinned LLM how to build an integrated financial model: P&L, balance sheet, and cash flow, linked by formulas, with proper integrity checks (balance sheet balances, cash ties out from indirect-method cash flow to the actual cash account). For multi-entity targets the SKILL.md spec covers the per-entity build + the consolidation step.

Methodology-pinned vendor

The vendor and model identifier live in the methodology snapshot at run-start, not hardcoded in the skill body. Current pinning: Anthropic Claude Opus (version pinned by methodology version). The cross-check partner (when the skill's confidence band suggests one is needed) is GPT (chat-API flagship 5.5) per the same methodology snapshot. Methodology version drift = skill version drift = audit-trail entry.

What it produces

An Excel file: {target}_three-statement-model.xlsx. Formulas, not hardcodes. Every blue input cell cites its source. The model becomes the foundation for every downstream phase that needs financials — Phase 3 comps, Phase 4 ten-layer unit-economics review, Phase 6 scenarios, Phase 7 verdict, Phase 10 memo.

Skill audit-xls (runs BEFORE datapack) — closed-enumeration model integrity check

Why it runs before the datapack

The data pack is the IC-facing deliverable. If the underlying model has a balance-sheet imbalance or a broken cash-flow tie-out, the data pack will silently embed that error and cite it as ground truth. So the audit runs first; the data pack only runs against an audited model.

Closed enumeration of failure types

The audit-xls SKILL.md defines a closed list of structural failure types, each with its own tolerance and per-failure failure-mode template:

Balance-sheet imbalance — Assets ≠ Liabilities + Equity beyond the methodology-pinned tolerance. Hard structural error.
Cash-flow tie-out failure — Indirect-method cash flow's ending cash doesn't equal the balance sheet's cash line. Hard structural error.
Hardcoded numbers inside formulas — Blue input cells mixed with hardcoded values in computed cells. Recoverable; surfaces as warning + analyst review.
Broken roll-forwards — Period-N closing balance doesn't equal Period-N+1 opening. Hard structural error.
Missing inputs — Formulas pointing at empty source cells. Recoverable; surfaces with named gap.
Source-tier mis-tagging — Numerical claim without a [verified] / [derived] / [inferred] / [benchmark] / [judgment] source tier. Recoverable; analyst-correctable.
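Two of the hard structural checks in the list above lend themselves to a short sketch. The shapes and names below are hypothetical, and the tolerance is treated as an absolute amount supplied by the caller, which is an assumption; the SKILL.md may express it differently.

```typescript
// Hypothetical per-period figures; names are illustrative.
interface PeriodSnapshot {
  period: string;
  assets: number;
  liabilities: number;
  equity: number;
  openingBalance: number; // e.g. opening cash for the period
  closingBalance: number; // e.g. closing cash for the period
}

interface HardFailure {
  type: "BALANCE_SHEET_IMBALANCE" | "BROKEN_ROLL_FORWARD";
  period: string;
  gap: number;
}

// Periods must be supplied in chronological order.
function findHardFailures(
  periods: PeriodSnapshot[],
  tolerance: number,
): HardFailure[] {
  const failures: HardFailure[] = [];
  periods.forEach((p, i) => {
    // Assets should equal liabilities plus equity.
    const imbalance = Math.abs(p.assets - (p.liabilities + p.equity));
    if (imbalance > tolerance) {
      failures.push({ type: "BALANCE_SHEET_IMBALANCE", period: p.period, gap: imbalance });
    }
    // Period N closing should equal period N+1 opening.
    const next = periods[i + 1];
    if (next !== undefined) {
      const rollGap = Math.abs(next.openingBalance - p.closingBalance);
      if (rollGap > tolerance) {
        failures.push({ type: "BROKEN_ROLL_FORWARD", period: next.period, gap: rollGap });
      }
    }
  });
  return failures;
}
```

An empty result means neither check fired; the remaining failure types in the list need workbook-level inspection and are outside this sketch.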

Four-state outcome

Audit-xls returns one of CLEAN / WARNINGS_ONLY / RECOVERABLE_ERRORS / HARD_ERRORS. CLEAN and WARNINGS_ONLY proceed. RECOVERABLE_ERRORS proceed with a CONDITIONAL_PASS gate state, and the errors surface on the deal page for analyst review. HARD_ERRORS fail the phase gate (HARD_PASS) unless an analyst overrides.

Skill gl-reconciliation-workflow — methodology-pinned tolerance, operational break-code taxonomy, cross-vendor verified

What the skill is

Reconciles the accounting system's GL revenue line against the payment-processor settlements (less chargebacks per the chargeback-economics row above) against the commerce-platform orders. For multi-entity targets the reconciliation runs per-entity plus on the consolidated total. For each month, the three totals should reconcile within the methodology-pinned tolerance.

Tolerance + denominator + aggregation rule (all methodology-pinned)

The tolerance, denominator, and aggregation rule all live in reconciliation-rules.md under the methodology snapshot — not hardcoded in the skill body. Current pinning: 2% tolerance on monthly net GL revenue, applied as "any single month exceeding 2% triggers classification; three consecutive months exceeding 1.5% also triggers; rolling 12-month average exceeding 1% triggers." The denominator is monthly net GL revenue (gross less refunds less chargebacks). Methodology amendments to any of these values stamp a new version into the snapshot.
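The three trigger conditions can be read as a small function over a series of monthly variances. This is a sketch only: the names are hypothetical, and it assumes that each variance is the absolute gap divided by that month's net GL revenue and that the rolling average is the plain mean of the last twelve monthly variances. The thresholds are the ones quoted above as the current pinning.

```typescript
type Trigger = "SINGLE_MONTH" | "CONSECUTIVE_MONTHS" | "ROLLING_AVERAGE";

interface ToleranceRules {
  singleMonth: number;
  consecutiveThreshold: number;
  consecutiveCount: number;
  rollingThreshold: number;
  rollingWindow: number;
}

// Values quoted in the text as the current pinning.
const currentPinning: ToleranceRules = {
  singleMonth: 0.02,
  consecutiveThreshold: 0.015,
  consecutiveCount: 3,
  rollingThreshold: 0.01,
  rollingWindow: 12,
};

// variances[i] = |gap| / monthly net GL revenue for month i, oldest first.
function toleranceTriggers(
  variances: number[],
  rules: ToleranceRules = currentPinning,
): Trigger[] {
  const fired: Trigger[] = [];
  const fire = (t: Trigger): void => {
    if (fired.indexOf(t) === -1) fired.push(t);
  };
  let run = 0; // length of the current streak above the consecutive threshold
  variances.forEach((v, i) => {
    if (v > rules.singleMonth) fire("SINGLE_MONTH");
    run = v > rules.consecutiveThreshold ? run + 1 : 0;
    if (run >= rules.consecutiveCount) fire("CONSECUTIVE_MONTHS");
    if (i + 1 >= rules.rollingWindow) {
      const window = variances.slice(i + 1 - rules.rollingWindow, i + 1);
      const mean = window.reduce((sum, x) => sum + x, 0) / window.length;
      if (mean > rules.rollingThreshold) fire("ROLLING_AVERAGE");
    }
  });
  return fired;
}
```

Passing the rules as a parameter mirrors the point that the values live in the methodology snapshot rather than in the skill body: a methodology amendment changes the argument, not the function.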

Operational break-code taxonomy (not strategic — the audit-quad finding)

Breaks outside tolerance are classified by operational break code, distinct from Phase 1's strategic override reason codes:

TIMING — GL recognises revenue in month N+1; settlement landed in month N. Common, operationally explainable, low-signal.
MISSING_CHANNEL — GL revenue has no corresponding processor settlement (e.g. Amazon revenue not yet integrated as a payment leg). Resolvable by adding the missing source.
CLASSIFICATION_ERROR — GL account miscategorisation (refund posted to revenue, deferred revenue recognised early).
CURRENCY_VARIANCE — FX-conversion variance within methodology tolerance; tracked separately from SUSPICIOUS for clarity.
SUSPICIOUS — Gap with no operational explanation. Primary fraud-signal bucket. Auto-escalates to analyst review before the gate aggregates.
EXPLAINED_BY_ANALYST — Analyst attached a specific operational note + accepted the break. Gate-log preserves both the break and the explanation.
PENDING_ANALYST — Surfaced to the deal page; awaiting analyst classification. Blocks the gate until classified.

Cross-vendor cross-check

The break classification is itself a model judgment. When the primary classifier's confidence is in the methodology-pinned mid-confidence band (currently 0.7–0.85), the same break is re-classified by the cross-check vendor (per the Phase 1 cross-check architecture). Divergence involving a SUSPICIOUS or PENDING_ANALYST classification auto-escalates to analyst review.
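A sketch of the two decisions in this step, with hypothetical function names. It assumes the band is inclusive at both ends and that "divergence" means the two classifiers returned different codes with at least one of them being SUSPICIOUS or PENDING_ANALYST; neither detail is spelled out above.

```typescript
type BreakCode =
  | "TIMING"
  | "MISSING_CHANNEL"
  | "CLASSIFICATION_ERROR"
  | "CURRENCY_VARIANCE"
  | "SUSPICIOUS"
  | "EXPLAINED_BY_ANALYST"
  | "PENDING_ANALYST";

interface ConfidenceBand {
  low: number;
  high: number;
}

// Band quoted in the text as the current pinning.
const midConfidenceBand: ConfidenceBand = { low: 0.7, high: 0.85 };

// Assumption: both ends of the band are inclusive.
function needsCrossCheck(
  confidence: number,
  band: ConfidenceBand = midConfidenceBand,
): boolean {
  return confidence >= band.low && confidence <= band.high;
}

// Assumption: escalate when the codes differ and either one is a
// SUSPICIOUS or PENDING_ANALYST classification.
function escalatesToAnalyst(primary: BreakCode, crossCheck: BreakCode): boolean {
  const sensitive = (c: BreakCode): boolean =>
    c === "SUSPICIOUS" || c === "PENDING_ANALYST";
  return primary !== crossCheck && (sensitive(primary) || sensitive(crossCheck));
}
```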

Skill datapack-builder (runs LAST) — IC-ready Excel pack, built against audited + reconciled inputs

Why it runs last

The data pack is the deliverable the IC sees. It cites the financial model, the reconciliation outcome, and the audit verdict. Running it last (after audit-xls + GL reconciliation) means every number it cites has already been integrity-checked and reconciliation-classified. Building it earlier means citing un-vetted numbers.

What it produces

A single Excel workbook with every number an investment-committee member would want to see: revenue waterfall (gross → less refunds → less chargebacks → net), margin walk, customer cohort table (monthly cohorts, retention by month, LTV), ad-spend breakdown, supplier-concentration chart, working-capital schedule, reconciliation outcome summary, audit-xls verdict summary. Every number cites its source-tier and which Phase 2 pull it came from.

Downstream consumers

Phase 3 (comps benchmark) reads the financial model + cohort table to position the target against peer transactions. Phase 4 (ten-layer review) reads the unit-economics layer + supplier-concentration chart. Phase 6 (scenarios) reads the cohort table + working-capital schedule to model retention curves and cash conversion. Phase 7 (verdict) reads everything. Phase 10 (memo render) cites the data pack rows by reference.

Override Phase 2 override path — typed operational reason codes (not strategic); analyst-set, IC-reviewed in aggregate

What can be overridden

Per-aspect overrides at Phase 2: input-contract failure (e.g. accounting unavailable but seller-provided P&L deemed adequate), audit-xls HARD_ERRORS verdict (e.g. a balance-sheet imbalance under a known accounting-system quirk), reconciliation outcome (e.g. a SUSPICIOUS break that the analyst has reclassified to EXPLAINED_BY_ANALYST with a documented note). Each override is keyed by sub-aspect, not by phase as a whole.

Reason codes are operational, not strategic

Phase 1's strategic reason codes (IC_STRATEGIC, FOUNDER_RELATIONSHIP, etc.) do not apply at Phase 2 — Phase 2 is an operational analytical phase, not a strategic-fit gate. The Phase 2 override taxonomy uses the operational break-code vocabulary from the reconciliation skill plus per-aspect typed reasons (e.g. ACCOUNTING_SYSTEM_QUIRK, SELLER_PNL_RECONCILED_OFFLINE, BREAK_RECLASSIFIED_ON_INSPECTION). The per-aspect valid_reason_codes set lives in reconciliation-rules.md alongside the tolerance pinning.

Surfacing + aggregate review

Every override surfaces in the executive summary of the memo + on the deal page. The IC monthly review aggregates Phase 2 overrides alongside Phase 1 and Phase 7 overrides — patterns (same break code reclassified often, same analyst reclassifying disproportionately) drive either methodology amendments to reconciliation-rules.md or an analyst conversation.

Aggregation Phase 2 four-state gate — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

The four states

PASS — every leg's input contract satisfied, audit-xls CLEAN or WARNINGS_ONLY, reconciliation within tolerance or breaks operationally explained, COGS/inventory triangulation within tolerance, no override. Proceed to Phase 3 cleanly.
CONDITIONAL_PASS — one or more deferred sub-aspects (e.g. live accounting unavailable, seller-PnL fallback in use), OR audit-xls RECOVERABLE_ERRORS, OR within-tolerance breaks with analyst-explained codes. Proceed to Phase 3 with named gaps on the deal page that the memo must address.
HARD_PASS — audit-xls HARD_ERRORS (balance imbalance, cash tie-out failure, broken roll-forwards), OR reconciliation has SUSPICIOUS or PENDING_ANALYST breaks above methodology tolerance, OR input contract failed on all three legs with no override. Run halts and produces the HARD-PASS deliverable. Phase 3 does not start.
OVERRIDE_PROCEED — analyst override under the override-paths policy. The gate's vote stands ("HARD_PASS would have fired on the SUSPICIOUS break on the Amazon settlement gap"); the run proceeds anyway with the override reason captured. The memo surfaces the override prominently.
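The four states can be sketched as one aggregation function. The input shape and names are hypothetical. The sketch assumes an override only changes the outcome when HARD_PASS would otherwise have fired, and it leaves out the COGS/inventory leg because the list above does not say which state an out-of-tolerance triangulation maps to.

```typescript
type AuditOutcome = "CLEAN" | "WARNINGS_ONLY" | "RECOVERABLE_ERRORS" | "HARD_ERRORS";
type GateState = "PASS" | "CONDITIONAL_PASS" | "HARD_PASS" | "OVERRIDE_PROCEED";

// Hypothetical summary of what the gate sees; not the engine's schema.
interface Phase2GateInput {
  allLegsFailedInputContract: boolean;
  deferredSubAspects: string[]; // e.g. ["live_accounting_unavailable"]
  audit: AuditOutcome;
  unresolvedBreaksAboveTolerance: number; // SUSPICIOUS or PENDING_ANALYST
  analystExplainedBreaks: number;
  override?: { reasonCode: string; note: string };
}

interface Phase2GateResult {
  state: GateState;
  gateVote: GateState; // what the gate would have returned without an override
}

function aggregatePhase2Gate(input: Phase2GateInput): Phase2GateResult {
  const hard =
    input.audit === "HARD_ERRORS" ||
    input.unresolvedBreaksAboveTolerance > 0 ||
    input.allLegsFailedInputContract;
  if (hard) {
    // The gate's vote stands; the override only changes whether the run proceeds.
    return input.override
      ? { state: "OVERRIDE_PROCEED", gateVote: "HARD_PASS" }
      : { state: "HARD_PASS", gateVote: "HARD_PASS" };
  }
  const conditional =
    input.deferredSubAspects.length > 0 ||
    input.audit === "RECOVERABLE_ERRORS" ||
    input.analystExplainedBreaks > 0;
  const state: GateState = conditional ? "CONDITIONAL_PASS" : "PASS";
  return { state, gateVote: state };
}
```

Keeping gateVote separate from state reflects the rule that an override never rewrites what the gate concluded; it records that the run proceeded anyway.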

Why no other hard gates

The previous Phase 2 design had audit-xls as a hard run-stop ("if any critical errors are found, this phase fails and the run stops"). That violated the engine-wide no-hard-gates principle established in Phase 0.5 and Phase 1. The new four-state gate preserves the fail-fast posture for truly unrecoverable conditions (a model whose balance sheet doesn't balance cannot be salvaged) while routing everything else through CONDITIONAL_PASS or OVERRIDE_PROCEED so analyst judgment can decide.

Output {date}_{target}_phase2-gate-log.json — machine-readable audit trail of every Phase 2 decision

What's in it

Written to 10_dd-output/{date}_{target}_phase2-gate-log.json on every run, regardless of gate outcome. Contents: input-contract state per leg, discovered platforms, FX-rate snapshot timestamp, per-skill {ok, errors, warnings}, audit-xls failure list with severity, reconciliation outcome with per-break classification, COGS/inventory reconciliation outcome, final gate state, override (if any), analyst classifications applied during the run. Surfaced on the deal page as a collapsible audit trail. A verdict challenged six months later is reviewed against this artefact.

Output {date}_{target}_three-statement-model.xlsx — integrated financial model, audited, methodology-stamped

What it is

The financial model from the 3-statement-model skill, post-audit. Carries the methodology version stamp and the FX-rate snapshot timestamp on the cover sheet so future analysts know exactly which rules and which rates produced it. Filed under the deal's Drive folder. Available on the deal page under Supporting Documents.

Output {date}_{target}_data-pack.xlsx — IC-ready data pack, every number tier-tagged + reconciliation-cited

What it is

The data pack from the datapack-builder skill. Single Excel file. Every number cites its source-tier ([verified] / [derived] / [inferred] / [benchmark] / [judgment]), which Phase 2 pull it came from, and the reconciliation outcome that vouches for it.

2.5 Phase 2 sense-check

Before the comp benchmark starts, four independent reviewers (Claude + GPT + Gemini + DeepSeek) audit Phase 2's financial output against an eight-charge sense-check sheet. If the financial DD has structural issues, the run does not proceed — it loops back to Phase 2 or routes to the analyst.

Phase 2 is the engine's core analytical phase: it owns the financial model, the GL reconciliation, the audit-xls outcome, and the IC-ready data pack. A single check by Phase 2 itself is not enough — the same component that produces an artefact cannot be the only one to certify it. Phase 2.5 is the independent panel of four reviewers. The reusable primitive lives in app/src/orchestrator/quad-review.ts as runQuadReviewWith; each invocation passes its own charge sheet, vendor list, and content blocks. The methodology snapshot pins each invocation's parameters at run-start, so reproducibility holds across re-runs. Phase 12 (memo verdict review) and Phase 0.5 (intake hygiene) are scheduled to migrate onto the same primitive in the post-Phase-2 refactor bundle — currently Phase 12 invokes the wrapper runQuadReview(runId, memo) and Phase 0.5 has its own bespoke 2-vendor logic in runP0HygieneCheck, both pending consolidation. Phase 2.5 is the first checkpoint to use the primitive directly.

What runs in this phase

API Anthropic — Claude Opus — reviewer #1, analytical-rigor pass

Why Claude for sense-check

Claude is the primary analytical model behind Phase 2's skill calls (3-statement-model, audit-xls, gl-reconciliation, datapack-builder). At Phase 2.5 a separately-prompted Claude instance reviews the output of those same skills — different prompt, different context, no shared memory between the build pass and the review pass. This is the same vendor's analytical rigor turned on its own work.

What it receives

Seven content blocks: input-contract state, discovered platforms (commerce / payments / accounting / entity structure), 3-statement model output, audit-xls verdict, GL reconciliation outcome, data pack summary, Phase 2 gate state. Plus the eight-charge sense-check sheet (see below). Returns per-charge votes plus an overall disposition.

API OpenAI — GPT 5.5 — reviewer #2, code-level + control-flow scrutiny

Why GPT for sense-check

GPT-5.5 has shown particular strength on code-level control flow and structural integrity questions — exactly the dimensions Phase 2's audit-xls verdict and reconciliation classification ride on. Different training origin from Claude; catches different things.

Same content, independent verdict

Identical seven content blocks and charge sheet. The four vendors do not see each other's verdicts. Divergence is high-signal — when reviewers disagree on a charge, the analyst sees all four rationales before deciding whether to proceed.

API Google — Gemini Pro — reviewer #3, schema + completeness pass

Why Gemini for sense-check

Different corpus and alignment from Claude and GPT. Strong on schema completeness and missing-field detection — useful when Phase 2's data pack columns need to match the downstream Phase 4 (ten-layer review) and Phase 6 (scenarios) consumer expectations.

API DeepSeek — V4 — reviewer #4, the contrarian

Why DeepSeek for sense-check

Different training origin from Claude, GPT, and Gemini. Consistently pushes back on overreach, gold-plating, and assertions presented as facts when the underlying data is ambiguous. Adversarial pressure on Phase 2's reconciliation classifications, particularly the SUSPICIOUS-vs-EXPLAINED_BY_ANALYST boundary, is exactly what DeepSeek is best at.

Reference The eight sense-check charges — what every reviewer is asked to evaluate

Financial model integrity — balance sheet balances; cash-flow tie-out reconciles indirect-method ending cash to the balance-sheet cash line; period-N closing balances equal period-N+1 opening; no hardcoded values inside formulas.
Reconciliation outcome credibility — break classifications are operationally explainable; the SUSPICIOUS bucket is non-empty only when the gap is genuinely unexplained; any >-tolerance break has either an operational explanation or a SUSPICIOUS / PENDING_ANALYST escalation.
Source-tier discipline preserved — every numerical output carries [verified] / [derived] / [inferred] / [benchmark] / [judgment] tagging; platform-pulled data is [verified]; seller-exported is [inferred] until reconciled.
Currency normalisation consistent — every raw amount is converted to target_currency using the FX-rate snapshot stamped on the run; per-currency raw values are preserved alongside normalised values; no silent mixed-currency aggregation.
Cohort retention and unit-economic plausibility — outputs are plausible relative to the seller-provided narrative AND the platform-pulled order data; no fabricated cohort tables; no LLM-hallucinated retention curves; cohort granularity matches the methodology-pinned definition.
Marketplace coverage complete — Amazon / eBay / TikTok Shop / Etsy revenue is captured where the target operates on those channels; the reconciliation denominator reflects all commerce channels, not Stripe + PayPal only.
Multi-entity handling correct — for targets with parent + subsidiaries, per-entity GL is consolidated with explicit eliminations; the audit-xls verdict checks per-entity-plus-consolidated; no entity is silently dropped.
Output schema compliance — data pack columns match the methodology-pinned schema; Phase 4 (ten-layer unit economics) and Phase 6 (scenarios) consumers receive the fields they expect.

For each charge, each vendor returns SUSTAINED (charge holds, Phase 2 output is fit on this dimension), DISMISSED (charge doesn't apply or the source material doesn't support evaluating it), or MODIFIED (charge applies in a slightly different form than stated, with explanation).

Aggregation Disposition aggregation — caution wins ties, four-vendor variant

The four post-phase dispositions

CLEAN — every available reviewer returned UPHELD; Phase 2 output is fit to proceed to Phase 3.
ISSUES_FOUND — one or more reviewers returned RESTRUCTURED. The run can proceed but the analyst confirms via the deal page.
BLOCKING — two or more reviewers returned OVERRULED (or one OVERRULED + one RESTRUCTURED) — Phase 2's output has a structural issue that makes Phase 3 unsafe. Phase 2 must address the findings and re-run.
DEGRADED — all four reviewers unavailable (timeout, API down, parse error). The run proceeds with an explicit "Phase 2.5 not run" entry in the deal page audit log.

How review-vocabulary maps to post-phase-vocabulary

Reviewers return UPHELD / RESTRUCTURED / OVERRULED / UNAVAILABLE (the standard Quad Review vocabulary). The aggregator maps these to the post-phase CLEAN / ISSUES_FOUND / BLOCKING / DEGRADED vocabulary so the deal page and the analyst-facing UI use consistent language across Phase 0.5, Phase 2.5, and Phase 12 checkpoints.

Caution wins ties

OVERRULED gets safety bias: any single OVERRULED combined with any RESTRUCTURED forces BLOCKING. Two or more OVERRULED is automatic BLOCKING. The principle: when reviewers disagree on whether Phase 2 output is fit, the cautious read wins.
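The mapping from reviewer votes to a post-phase disposition can be sketched as below. The function name is hypothetical and this is not the runQuadReviewWith primitive. One case is not specified above: a single OVERRULED with no RESTRUCTURED. The sketch treats it as ISSUES_FOUND so the analyst still has to confirm, and that choice is an assumption.

```typescript
type ReviewerVote = "UPHELD" | "RESTRUCTURED" | "OVERRULED" | "UNAVAILABLE";
type Disposition = "CLEAN" | "ISSUES_FOUND" | "BLOCKING" | "DEGRADED";

function aggregateDisposition(votes: ReviewerVote[]): Disposition {
  const count = (v: ReviewerVote): number =>
    votes.filter((x) => x === v).length;
  const overruled = count("OVERRULED");
  const restructured = count("RESTRUCTURED");
  const available = votes.length - count("UNAVAILABLE");

  // No reviewer produced a verdict.
  if (available === 0) return "DEGRADED";
  // Caution wins ties: two OVERRULED, or one OVERRULED plus any RESTRUCTURED.
  if (overruled >= 2 || (overruled >= 1 && restructured >= 1)) return "BLOCKING";
  // Assumption: a lone OVERRULED is surfaced as ISSUES_FOUND.
  if (overruled === 1 || restructured >= 1) return "ISSUES_FOUND";
  // Every available reviewer returned UPHELD.
  return "CLEAN";
}
```

The checks run from most to least cautious, so a mixed panel always resolves to the stricter disposition it qualifies for.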

Divergence surfacing

Output p2-sense-check.json — all four vendors' votes + final disposition + divergence notes

Structured record written to 10_dd-output/{date}_{target}_p2-sense-check.json. Contains the full per-charge vote and rationale from each of the four vendors, the per-charge divergence summary where applicable, the final aggregated disposition (CLEAN / ISSUES_FOUND / BLOCKING / DEGRADED), and a per-vendor duration line. Surfaced on the deal page alongside the Phase 0.5 hygiene verdict and the Phase 12 memo verdict so the IC reviewer sees every adversarial checkpoint's outcome in one place.

If the disposition is BLOCKING or ISSUES_FOUND, the deal page renders an action banner with the affected charge(s) named, the dissenting reviewer(s) cited, and a "re-run Phase 2" or "confirm and proceed" button as appropriate.

3 What did similar deals sell for?

Builds the comparable-transactions analysis. Discovers which comp sources are reachable per the target's category mapping, pulls comp transactions currency-normalised at each comp's transaction date, decomposes earnouts and contingent consideration, normalises enterprise-value basis, applies methodology-pinned filters (lookback / size-band / channel-mix / source-tier), and produces the peer set + the comp_retention_percentage that Phase 7 uses to backfill criterion 11.

We never look at a deal in isolation. Phase 3 finds the businesses that look like the target and pulls what they sold for, in what year, at what multiple of which denominator. The methodology insists on more than the headline number: a $5M deal with $3M of earnout is not a $5M comp; a cash-free / debt-free comp is not like-for-like with a comp that assumed working capital; a strategic-acquirer comp carries a control premium relative to a financial-buyer comp; a Flippa-tier comp does not carry the same evidentiary weight as a PitchBook-tier or SEC-filing comp. Phase 3 surfaces these distinctions explicitly rather than averaging them away.

What runs in this phase

Pre-check Phase 3 gate-input contract — verifies at least one comp source is connectable AND Phase 2's prior output is usable before any pull fires

What it checks

Before any comp pull fires, the engine verifies that (a) at least one of the Flippa / PitchBook / internal comp-database connectors is reachable; (b) Phase 2's prior output (model + commerce breakdown + reconciliation outcome) is present and not HARD_PASS without override, because running comps against an un-reconciled revenue figure produces meaningless size-band positioning; (c) target_currency + FX-rate snapshot are present (already enforced upstream but rechecked at this boundary).

What happens when an input is missing

The affected sub-output is marked DEFERRED with the missing-input reason. The downstream comps-analysis skill receives the deferred state and produces explicit data-gap blocks rather than synthesising a peer set from empty arrays. If every comp source is unavailable AND Phase 2 prior output is missing, Phase 3 aggregates to HARD_PASS unless overridden by an analyst with a documented data-gap reason.

Discovery Comp-source discovery (taxonomy mapping) — maps the run's deal.categories to each source's native taxonomy before the pulls fire

Why discovery, not hardcoding

Each comp source uses a different category taxonomy: Flippa has its own marketplace categories; PitchBook uses NAICS / SIC industry codes; the curated comp-database is organised by Ecomma-internal verticals (beauty / apparel / home / baby / automotive). Passing the raw deal.categories strings to all three results in taxonomy misses — a "Watches" deal won't match PitchBook's Personal-luxury-goods NAICS code. Discovery-before-pull is the same pattern Phase 0.5 (vendor selection) and Phase 2 (platform-of-record) established.

Per-source mapping confidence

Each taxonomy mapping carries a methodology-pinned confidence score. Mappings with confidence below 0.7 surface on the deal page for analyst confirmation before the affected pull fires. Methodology amendments to the taxonomy-mapping tables stamp a new methodologyVersion; re-runs use the same mapping the original run used.

Output

Per-source { available, nativeCategories, mappingConfidence } structure plus a snapshot version stamp. Surfaced on the deal page; the analyst can correct a low-confidence mapping with one click, which re-fires the affected pull.

Discipline Currency normalisation handshake — every comp normalised to target_currency at the comp's transaction date (historical FX)

The handshake

Comp transactions come in their source currency: Flippa US comps in USD, PitchBook EU comps in EUR, Ecomma curated comps in whatever currency the original transaction was recorded in. Phase 3 normalises every raw amount to target_currency using the fxRateSnapshot AT THE COMP'S TRANSACTION DATE — historical FX, not the run-start snapshot used for Phase 2's current-state data. This is materially more complex than Phase 2's handshake because comps span historical dates with their own FX-rate context.

Why multiples themselves are currency-neutral

An EV/Revenue multiple of 3.5× is a dimensionless ratio — both numerator and denominator carry the same currency, so the ratio is identical whether expressed in USD or EUR. The currency-sensitive surface is the size-band metric (target revenue vs comp revenue), the monetary outputs (price ranges, EBITDA in absolute terms), and the per-source primary-listing currency. Phase 3 normalises the monetary inputs; the multiples flow through unchanged.

Provenance preserved

Each comp transaction carries fxRateApplied: { sourceCurrency, rate, rateCapturedAt } alongside its normalised values. The audit log can reconstruct the conversion six months later.
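A sketch of the per-comp conversion, assuming a lookup that returns the historical rate for a currency on a given date. The lookup signature, RawComp, and NormalisedComp are hypothetical; fxRateApplied and its three fields follow the description above.

```typescript
interface FxRateApplied {
  sourceCurrency: string;
  rate: number; // units of target currency per one unit of source currency
  rateCapturedAt: string;
}

// Hypothetical raw comp as it arrives from a source.
interface RawComp {
  id: string;
  currency: string;
  transactionDate: string; // ISO date of the comp's transaction
  enterpriseValue: number;
  revenue: number;
}

interface NormalisedComp extends RawComp {
  evNormalised: number;
  revenueNormalised: number;
  evToRevenue: number;
  fxRateApplied: FxRateApplied;
}

// Hypothetical lookup of the historical rate on the transaction date.
type HistoricalRateLookup = (
  currency: string,
  date: string,
) => { rate: number; capturedAt: string };

function normaliseComp(comp: RawComp, lookup: HistoricalRateLookup): NormalisedComp {
  const { rate, capturedAt } = lookup(comp.currency, comp.transactionDate);
  return {
    ...comp, // raw source-currency values stay on the record
    evNormalised: comp.enterpriseValue * rate,
    revenueNormalised: comp.revenue * rate,
    // The multiple is computed on source-currency figures: the rate cancels.
    evToRevenue: comp.enterpriseValue / comp.revenue,
    fxRateApplied: { sourceCurrency: comp.currency, rate, rateCapturedAt: capturedAt },
  };
}
```

Since both enterprise value and revenue are scaled by the same rate, evNormalised / revenueNormalised equals evToRevenue, which is why the multiples pass through unchanged while the monetary fields are converted.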

API Flippa — recent ecommerce-brand sales on Flippa's marketplace; source-tier inferred

What Flippa is

The largest open marketplace for small online-business sales. Most Ecomma-sized deals (under $1M) show up there at some point. The API gives us recent transactions filtered by category, multiple, and size — but the data is seller-asserted, not audited, so every Flippa row is tagged [inferred] source tier.

What's pulled

Per the discovered Flippa categories from the previous step, the engine pulls trailing-24-month sold listings (the lookback window is methodology-pinned in comp-selection-rules.md; default 24 months). Each row is normalised to the canonical CompTransaction shape with currency normalisation, earnout decomposition, and EV-basis tagging applied. Per-source pull_status (connected / unavailable / rate_limited / auth_failed / empty_for_category) surfaces in the gate aggregation.

API PitchBook — private-market deal database; source-tier verified

What PitchBook is

Subscription database of private-company transactions. Bigger deals than Flippa. Useful when the target is on the larger end of our range or when we want strategic-acquirer comps. Editorial review applied to every row, so source-tier is [verified].

What's pulled

Per the discovered PitchBook NAICS / SIC codes, the engine pulls trailing-24-month transactions filtered by the methodology-pinned size band (revenue ±50%, extended to ±100% only if the primary band yields fewer than 8 comps). Each row is normalised to CompTransaction. Per-source pull_status surfaces in the gate aggregation.

Reference comp-database.md — Ecomma's curated historical comps; source-tier curated with per-row vintage

What it is

Ecomma's own curated comp database. Organised by Ecomma-internal vertical with annotated multiples, growth rates, and gross margins on each comp. Source-tier is [curated] with per-row vintage + last-reviewed timestamps. A row added in 2022 that hasn't been reviewed since is flagged as stale: the curation methodology pin tracks per-row review cadence so a 2-year-stale row doesn't get treated as authoritative.

Curation hygiene

The curation methodology (who adds, who reviews, retirement criteria, per-vertical reviewer assignment) is pinned in Wohnish-Methodology-Derivation.md as a methodology amendment. Drift between curated rows and current market reality (e.g. pre-rate-hike comps in a post-rate-hike market) is surfaced via the time-decay weighting that the comps-analysis skill applies.

Normalisation Earnout / contingent-consideration decomposition — $5M headline with $3M earnout is not a $5M comp

Why this matters

Headline transaction prices routinely include earnouts (contingent on hitting forward revenue / EBITDA milestones), seller notes (deferred payments), and equity rollovers (continued ownership stake) that materially distort the implied multiple. A $5M headline deal with $3M of earnout is, at signing, a $2M cash transaction with $3M of contingent upside. Averaging it into the comp median at $5M counts it at 2.5× the cash actually paid at signing.

What the decomposition produces

Each comp carries three price fields: headlinePrice (the announced figure), realisedPriceLow (cash + non-contingent equity paid up-front), realisedPriceHigh (max if all earnout triggers fire). Plus a contingentStructure sub-record capturing earnoutAmount, earnoutTriggers, equityRollover, sellerNote. The comps-analysis skill computes "implied multiple" from realisedPriceLow by default and surfaces the headlinePrice separately so the IC sees the difference. Comps where realisedPriceLow / headlinePrice < 0.5 (more than half is contingent) are flagged with the EARNOUT_DOMINATES break code.
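The price fields and the flag can be sketched as follows. The field names follow the description above; the function names and the interface names are hypothetical, and the 0.5 cut-off is the one quoted in the text.

```typescript
interface ContingentStructure {
  earnoutAmount: number;
  earnoutTriggers: string[];
  equityRollover: number;
  sellerNote: number;
}

interface CompPricing {
  headlinePrice: number; // the announced figure
  realisedPriceLow: number; // cash + non-contingent equity paid up-front
  realisedPriceHigh: number; // maximum if every earnout trigger fires
  contingentStructure: ContingentStructure;
}

// Default implied multiple uses realisedPriceLow, not the headline.
function impliedMultiple(pricing: CompPricing, denominator: number): number {
  return pricing.realisedPriceLow / denominator;
}

// True when more than half of the headline price is contingent.
function earnoutDominates(pricing: CompPricing): boolean {
  return pricing.realisedPriceLow / pricing.headlinePrice < 0.5;
}
```

For the $5M headline with $3M of earnout, realisedPriceLow is $2M, the ratio is 0.4, and the comp is flagged; against $1M of revenue the implied multiple is 2× rather than the 5× the headline suggests.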

Normalisation Enterprise-value basis normalisation — cash-free/debt-free vs with-liabilities-assumed vs with-WC-peg

Why this matters

Two comps both reported at "5× revenue" can mean different things depending on the EV definition: a cash-free / debt-free transaction (typical for clean PE deals) is not a like-for-like with a deal that assumed working capital deficits, or one with a working-capital peg adjustment at close. Without normalisation, multiples from different EV bases get averaged across structurally-different transactions.

What the basis-tag captures

Each comp carries evBasis: 'cash_free_debt_free' | 'with_liabilities_assumed' | 'with_wc_peg' | 'undisclosed'. The comps-analysis skill normalises to a consistent EV definition before computing multiples; undisclosed comps are flagged with the EV_BASIS_UNDISCLOSED break code and can be excluded or analyst-overridden depending on peer-set depth.

Handshake Phase 2 → Phase 3 prior-output read — consumes the 3-statement model + commerce-channel breakdown + reconciliation outcome

What Phase 3 reads from Phase 2

Three things: (a) phase2.model.consolidated.pnl.netRevenue — the reconciled revenue used for size-band positioning; (b) phase2.commerce — channel breakdown to derive the target's DTC channel mix (which the methodology-aligned-comparison rule requires > 60%); (c) phase2.recon — the reconciliation outcome that vouches for the revenue figure. If Phase 2's reconciliation has SUSPICIOUS breaks, Phase 3 surfaces a "reconciliation gap" warning on the comp positioning rather than silently using a possibly-inflated revenue figure.

Why Phase 3 fails loud when Phase 2 is HARD_PASS without override

A target whose Phase 2 reconciliation failed is one where the revenue figure isn't trustworthy. Running comps against an un-validated revenue figure produces meaningless peer-set positioning: a 5× multiple applied to an inflated revenue number yields an inflated valuation. Phase 3 fails loud unless the analyst has explicitly overridden Phase 2 with a documented reason.
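A hedged sketch of the handshake, assuming the Phase 2 output is summarised into a small hand-off record. Phase2Handoff, hasSuspiciousBreaks, overrideReason and readPhase2 are hypothetical names; only the gate states, the netRevenue key path, and the "reconciliation gap" warning come from the text.

```typescript
type Phase2Gate = "PASS" | "CONDITIONAL_PASS" | "HARD_PASS" | "OVERRIDE_PROCEED";

// Hypothetical summary of what Phase 3 reads from Phase 2.
interface Phase2Handoff {
  gate: Phase2Gate;
  overrideReason?: string;      // documented analyst reason, if any
  netRevenue: number;           // phase2.model.consolidated.pnl.netRevenue
  hasSuspiciousBreaks: boolean; // assumed flag derived from phase2.recon
}

function readPhase2(handoff: Phase2Handoff): { netRevenue: number; warnings: string[] } {
  // Fail loud: never position comps against an un-validated revenue figure.
  if (handoff.gate === "HARD_PASS" && !handoff.overrideReason) {
    throw new Error("Phase 3 halted: Phase 2 HARD_PASS without a documented override");
  }
  // SUSPICIOUS breaks do not halt the run, but they must be visible on the positioning.
  const warnings = handoff.hasSuspiciousBreaks ? ["reconciliation gap"] : [];
  return { netRevenue: handoff.netRevenue, warnings };
}
```

Throwing rather than returning a status is one way to make the silent path impossible; an engine that prefers result types could return a HARD_PASS outcome instead.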

Reference comp-selection-rules.md (methodology pins) — lookback, size band, channel-mix threshold, multiple-set per business type

What's pinned

The methodology snapshot fixes these per-run values so two analysts running the same target produce the same peer set:

Lookback window: trailing 24 months (default; per-business-type override allowed).
Peer-set size: 8 minimum (drop if smaller, unless override), 15 maximum (rank by relevance if larger).
Size band: ±50% of target revenue (primary band); extend to ±100% only when primary yields fewer than 8 comps.
DTC channel-mix threshold: 60% (applied for criterion-11 backfill peer selection; relaxed for general comp-set with explicit flag).
Multiple set per business type: DTC ecommerce → EV/Revenue + EV/EBITDA primary, EV/Customer secondary; Marketplaces → EV/GMV + EV/Revenue; Subscription / SaaS → EV/ARR + EV/Customer; Mixed → all relevant, analyst flags primary.

Methodology amendments stamp a new snapshot version; re-runs use the same pins the original run used.
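The pins above can be sketched as a constant plus a deterministic filter. The numeric values come from the list; the PINS object, the Candidate shape, and the relevance score are hypothetical, and per-business-type overrides are left out.

```typescript
// Values from the pinned list; the object itself is an illustrative container.
const PINS = {
  lookbackMonths: 24,
  minPeers: 8,
  maxPeers: 15,
  primaryBand: 0.5,  // ±50% of target revenue
  extendedBand: 1.0, // ±100%, only when the primary band is too thin
};

// Hypothetical candidate shape.
interface Candidate {
  id: string;
  revenue: number;
  monthsSinceClose: number;
  relevance: number; // assumed ranking score, higher is better
}

function inBand(c: Candidate, targetRevenue: number, band: number): boolean {
  return Math.abs(c.revenue - targetRevenue) <= band * targetRevenue;
}

function selectPeerSet(candidates: Candidate[], targetRevenue: number) {
  const recent = candidates.filter((c) => c.monthsSinceClose <= PINS.lookbackMonths);
  let bandUsed = PINS.primaryBand;
  let peers = recent.filter((c) => inBand(c, targetRevenue, bandUsed));
  if (peers.length < PINS.minPeers) {
    // Extend only when the primary band yields fewer than the minimum.
    bandUsed = PINS.extendedBand;
    peers = recent.filter((c) => inBand(c, targetRevenue, bandUsed));
  }
  // Cap at the maximum, keeping the most relevant comps.
  const ranked = [...peers].sort((a, b) => b.relevance - a.relevance).slice(0, PINS.maxPeers);
  return { peers: ranked, bandUsed };
}
```

Because every threshold is read from one pinned object, two runs against the same snapshot apply identical filters, which is the reproducibility property the pins exist for.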

Skill comps-analysis — peer-set construction, statistical benchmarking, multi-axis positioning

What the skill does

Constructs the peer set from the three sources, applies the methodology-pinned filters (lookback / size-band / channel-mix), strips outliers per the skill's pinned rule, splits strategic-vs-financial buckets per the methodology snapshot, and produces multi-axis positioning across the per-business-type multiple set. The SKILL.md at app/skills/finance/comps-analysis/SKILL.md is the canonical spec for: tier-weighted median formula, outlier-handling rule choice, IP/brand-portfolio comp class semantics, time-decay weighting, and strategic-vs-financial bucketing protocol.

Methodology-pinned vendor

Vendor and model identifier live in the methodology snapshot. Current pinning: Anthropic Claude Opus 4.7 as primary; GPT 5.5 as the cross-check vendor when peer-set ranking confidence is in the mid-confidence band [0.7, 0.85]. Methodology version drift = skill version drift = audit-trail entry.

Output

One artefact plus the structured outcome: {target}_comps-deck.docx (the IC-facing peer comparison), and the structured CompsAnalysisOutput that the gate aggregator, Phase 6 (scenarios), and Phase 7 (verdict) read. Every cell cites source-tier per the M12 anti-hallucination protocol.

Selection rule comp_retention_percentage (criterion-11 backfill) — deterministic match function moved from narrative-only to code

The rule

From the comp set, the engine selects the closest public comparable using the methodology-aligned-comparison predicates (each a hard gate, all must hold):

Same primary-category (target.categories[0] === comp's primary).
Size band: comp revenue within ±50% of target revenue (primary band).
Public filings within trailing 24 months (capturedAt is recent).
DTC channel mix > methodology-pinned 60% threshold.
sourceTier === 'verified' (public filings or PitchBook curation; Flippa-tier excluded).

The selected comparable's trailing-12-month repeat-customer share (orders ≥ 2 from same email or phone in the window; DTC channel only) is persisted as phases.3.output.comp_retention_percentage alongside the comparable's identity, source filing reference, and the per-predicate pass/fail trace. Phase 7 reads this field directly to backfill criterion 11; no further computation at backfill time.
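A sketch of the deterministic match function with its per-predicate trace. The five predicates and their thresholds come from the list above; the PublicComp and TargetProfile shapes are hypothetical, and "closest" is assumed here to mean smallest revenue distance, which the text does not define.

```typescript
// Hypothetical shapes for illustration.
interface PublicComp {
  name: string;
  primaryCategory: string;
  revenue: number;
  monthsSinceFiling: number;
  dtcShare: number; // 0..1
  sourceTier: "verified" | "curated" | "inferred";
  repeatCustomerShare: number; // trailing-12-month, DTC channel only
}
interface TargetProfile {
  categories: string[];
  revenue: number;
}
interface PredicateTrace {
  samePrimaryCategory: boolean;
  withinSizeBand: boolean;
  recentFiling: boolean;
  dtcAboveThreshold: boolean;
  verifiedTier: boolean;
}

function evaluatePredicates(target: TargetProfile, comp: PublicComp): PredicateTrace {
  return {
    samePrimaryCategory: target.categories[0] === comp.primaryCategory,
    withinSizeBand: Math.abs(comp.revenue - target.revenue) <= 0.5 * target.revenue,
    recentFiling: comp.monthsSinceFiling <= 24,
    dtcAboveThreshold: comp.dtcShare > 0.6,
    verifiedTier: comp.sourceTier === "verified",
  };
}

function selectRetentionComp(target: TargetProfile, comps: PublicComp[]) {
  const traced = comps.map((comp) => ({ comp, trace: evaluatePredicates(target, comp) }));
  // Every predicate is a hard gate: all must hold.
  const qualifying = traced.filter(
    (t) =>
      t.trace.samePrimaryCategory &&
      t.trace.withinSizeBand &&
      t.trace.recentFiling &&
      t.trace.dtcAboveThreshold &&
      t.trace.verifiedTier
  );
  // Assumption: "closest" = smallest absolute revenue distance to the target.
  const best = qualifying.sort(
    (x, y) => Math.abs(x.comp.revenue - target.revenue) - Math.abs(y.comp.revenue - target.revenue)
  )[0];
  if (best === undefined) {
    return { status: "NO_QUALIFYING_COMP" as const, traced };
  }
  return {
    status: "SELECTED" as const,
    comp_retention_percentage: best.comp.repeatCustomerShare,
    comparable: best.comp.name,
    trace: best.trace,
  };
}
```

Returning the trace for rejected comps as well keeps the rationale for a NO_QUALIFYING_COMP outcome reconstructable from the output alone.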

Cross-check on ranking judgment

When peer-set ranking confidence is in the mid-confidence band [0.7, 0.85], the same ranking is re-run by the cross-check vendor (currently GPT 5.5). Divergent rankings escalate to analyst review with both rationales surfaced.

If no comparable qualifies

The engine returns NO_QUALIFYING_COMP with rationale (typically: target operates in a category with no public peer at similar scale, or all public peers fail the DTC channel-mix threshold). Phase 7 propagates this as DEFERRED on criterion 11 — a missing comparable is a data gap, not a thesis miss. The audit-quad's "category-mismatched analyst-relaxation path" can be surfaced via the override path below.

Override Phase 3 override path — operational reason codes (not strategic, not reconciliation); per-aspect, IC-reviewed in aggregate

What can be overridden

Per-aspect overrides at Phase 3:

NO_QUALIFYING_COMP_ACCEPTED — analyst accepts a category-mismatched "closest available" comp when zero rows survive the methodology-aligned-comparison predicates. Phase 7 receives a marked-up retention figure with the relaxed-comparison flag set.
BAND_OUTSIDE_TOLERANCE_ACCEPTED — comp included in peer set despite revenue outside ±100% extended band (e.g. only available comp is 3× target revenue, no closer options).
OUTLIER_INCLUDED_BY_ANALYST — outlier comp the skill would have dropped is re-included with documented analyst rationale.
SOURCE_TIER_GAP_ACCEPTED — peer set built primarily from Flippa-tier (inferred) data because no PitchBook or filings-tier comps exist in the category at this scale.

Reason codes are operational, not strategic

Phase 1's strategic reason codes (IC_STRATEGIC, FOUNDER_RELATIONSHIP, etc.) and Phase 2's reconciliation break codes do not apply at Phase 3 — Phase 3 is an analytical positioning phase, not a strategic-fit gate nor an integrity check. The Phase 3 override taxonomy is its own scope-appropriate vocabulary.

Surfacing + aggregate review

Every override surfaces in the executive summary of the memo + on the deal page. The IC monthly review aggregates Phase 3 overrides alongside Phase 1, Phase 2, and Phase 7 overrides — patterns (same break code reclassified often, same analyst overriding disproportionately) drive either methodology amendments to comp-selection-rules.md or an analyst conversation.

Aggregation Phase 3 four-state gate — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

The four states

PASS — every leg's input contract satisfied, at least one comp source returned ≥ 8 transactions after methodology-pinned filters, peer set passes outlier handling, comp_retention_percentage selection rule yielded a qualifying comp. Proceed to Phase 4 cleanly.
CONDITIONAL_PASS — peer set ≥ 4 but < 8, OR > 1/3 of comps flagged EARNOUT_DOMINATES, OR mid-confidence taxonomy mappings, OR NO_QUALIFYING_COMP on criterion-11 selection but general comp set adequate. Proceed to Phase 4 with named gaps that the memo must address.
HARD_PASS — peer set < 4 AND no override, OR Phase 2 prior output was HARD_PASS without override, OR all three comp sources unavailable. Run halts; HARD-PASS deliverable produced. Phase 4 does not start.
OVERRIDE_PROCEED — analyst override under the override-paths policy. The gate's vote stands ("HARD_PASS would have fired on insufficient peer set"); the run proceeds anyway with the override reason captured. The memo surfaces the override prominently.
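The four states can be sketched as a pure function over a handful of signals. The thresholds (4, 8, one third) come from the list above; the Phase3Signals shape and its field names are hypothetical, and treating any analyst override of a HARD_PASS vote as OVERRIDE_PROCEED is an assumption about how the override-paths policy composes.

```typescript
type GateState = "PASS" | "CONDITIONAL_PASS" | "HARD_PASS" | "OVERRIDE_PROCEED";

// Hypothetical signal bundle for illustration.
interface Phase3Signals {
  peerSetSize: number;
  earnoutDominatedCount: number;
  hasMidConfidenceMappings: boolean;
  noQualifyingComp: boolean;
  phase2HardPassWithoutOverride: boolean;
  allSourcesUnavailable: boolean;
  analystOverride: boolean;
}

function aggregatePhase3(s: Phase3Signals): { state: GateState; gateVote: GateState } {
  let vote: GateState = "PASS";
  if (s.peerSetSize < 4 || s.phase2HardPassWithoutOverride || s.allSourcesUnavailable) {
    vote = "HARD_PASS";
  } else if (
    s.peerSetSize < 8 ||
    s.earnoutDominatedCount > s.peerSetSize / 3 ||
    s.hasMidConfidenceMappings ||
    s.noQualifyingComp
  ) {
    vote = "CONDITIONAL_PASS";
  }
  // The gate's vote stands in the log; the override only changes whether the run proceeds.
  const state: GateState = vote === "HARD_PASS" && s.analystOverride ? "OVERRIDE_PROCEED" : vote;
  return { state, gateVote: vote };
}
```

Keeping gateVote separate from state is what lets the memo say "HARD_PASS would have fired" while the run continues.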

Output {date}_{target}_phase3-gate-log.json — machine-readable audit trail of every Phase 3 decision

What's in it

Written to 10_dd-output/{date}_{target}_phase3-gate-log.json on every run, regardless of gate outcome. Contents: input-contract state per source, discovery results with per-source mapping confidence, FX-rate snapshot timestamp, per-source pull_status, comps-analysis skill {ok, errors, warnings}, peer-set transaction count, comp_retention_percentage value + selection trace (or NO_QUALIFYING_COMP rationale), final gate state, override (if any). Surfaced on the deal page as a collapsible audit trail.

Output {target}_comps-deck.docx — IC-ready peer-set comparison with source-tier per cell

What it is

The comps deck from the comps-analysis skill. Word document with the peer-set table, distribution charts, multi-axis positioning, and the "where this deal sits relative to comparable transactions" verdict. Every multiple cites which comp it came from + that comp's source-tier (verified / curated / inferred) + the methodology snapshot version. Cover sheet stamps target_currency + FX-snapshot timestamp + the methodology snapshot.

Downstream consumers

Phase 4 (ten-layer review, Layer 4 named competitive landscape) reads the peer set. Phase 6 (three-scenario model) reads the multiple ranges to bound bull / base / bear pricing. Phase 7 (verdict) reads everything plus the comp_retention_percentage. Phase 10 (memo render) cites the deck by reference.

Output comp_retention_percentage — deferred-criterion-11 backfill payload for Phase 7

Why this is a first-class output, not just a column in the deck

Phase 1's thesis gate defers criterion 11 (retention vs comp) because Phase 1 runs before the comp set exists. Phase 3 is the phase that produces the comp set; the closest public comparable's repeat-customer share is therefore Phase 3's responsibility. Phase 7 reads phases.3.output.comp_retention_percentage directly to backfill criterion 11 into the full-twelve verdict aggregation. The key path is documented in engine-sync/SKILL.md so a Phase 7 rewrite never reads a phantom field name.

NO_QUALIFYING_COMP semantics

When no comparable qualifies, the field is null and phases.3.output.no_qualifying_comp carries the rationale. Phase 7 propagates this as DEFERRED on criterion 11, not FAIL — a missing comparable is a data gap, not a thesis miss.
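On the consuming side, the null semantics might be read as follows. The two key names come from the text; the Phase3Output and Criterion11Input types and the readCriterion11Input helper are hypothetical, and the actual pass/fail comparison for criterion 11 is left out because it is not specified here.

```typescript
// Hypothetical slice of phases.3.output.
interface Phase3Output {
  comp_retention_percentage: number | null;
  no_qualifying_comp?: string; // rationale, present when the field above is null
}

type Criterion11Input =
  | { status: "DEFERRED"; rationale: string }
  | { status: "EVALUATE"; compRetention: number };

function readCriterion11Input(output: Phase3Output): Criterion11Input {
  if (output.comp_retention_percentage === null) {
    // A missing comparable is a data gap, never a FAIL.
    return {
      status: "DEFERRED",
      rationale: output.no_qualifying_comp ?? "NO_QUALIFYING_COMP (no rationale recorded)",
    };
  }
  return { status: "EVALUATE", compRetention: output.comp_retention_percentage };
}
```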

4 Pick the business apart, layer by layer

Ten structural lenses on the deal, examined one at a time per the binding methodology template: market sizing, traffic / SEO, customer retention vs comp, named competitive landscape, flip viability, supplier concentration, ad-account health, brand & digital-asset valuation, legal & IP exposure, operator dependency. Each layer gets a severity (Material / Caution / Minor), and the per-layer verdicts tell Phase 7 where to push back against the financial story.

This is the deepest analytical phase. Most DDs only look at financials and shipping. We look at ten structural dimensions because the business itself is ten dimensions, and a clean P&L on top of a broken supplier relationship is a deal that dies in year two. The layers themselves are pinned by ten-layer-template.md (v3.0.0) — the methodology snapshot at run-start locks the layer set so two analysts running this on the same target a week apart produce comparable verdicts.

What runs in this phase

Pre-check Input contract — per-layer prerequisite availability before any skill fires

What it checks

Each layer's required inputs are verified before any LLM dispatch fires. Layer 3 (customer-retention) needs Phase 2's verified cohort + Phase 3's comp_retention_percentage; Layer 4 (named-competitive-landscape) needs Phase 3's peer set; Layer 8 (brand-and-digital-asset-valuation) reads Phase 2's commerce side. Missing inputs surface as DEFERRED on the affected layer, never as silent best-effort.

Phase 2 / Phase 3 handshake fail-loud

If Phase 2 returned HARD_PASS without override, Phase 4 fails loud — running structural review against an un-validated revenue figure produces misleading severities. Same for Phase 3 HARD_PASS without override (Layer 4 anchors against the comp set; no comp set means no defensible layer-4 evidence).

N/A semantics

Layers that don't apply to the deal type (e.g. Layer 7 ad-account-health on an all-organic target) are marked NOT_APPLICABLE with an analyst-attested rationale. N/A layers are excluded from the gate aggregator's fail-count denominator — they don't count toward the "multiple Material layers → HARD PASS" rule.

Discovery Category-knowledge discovery — per-vertical files loaded before any layer is reasoned about

Why discovery first

Each vertical has accumulated knowledge — typical margin ranges, repeat rates, common supply structures, common IP traps, common post-acquisition failure modes — captured in category-knowledge/{slug}.md files. Phase 4 maps deal.categories to per-vertical slugs and loads the corresponding files. Multi-vertical deals (e.g. apparel + home-goods) load multiple files.

Missing file = documented gap, not silent

If a vertical has no knowledge file (a new category, never DD'd before), the run carries an explicit CATEGORY_KNOWLEDGE_GAP on the gate-log. The ten-layer-review skill compensates with broader research at run-cost; the per-layer source-tier downgrades from [curated] to [judgment]; the gate aggregator surfaces CONDITIONAL_PASS as the floor unless the analyst overrides with CATEGORY_KNOWLEDGE_GAP_ACCEPTED.
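A sketch of the discovery step, assuming slugs are derived by lower-casing and hyphenating the category name. That slug rule, the KnowledgeDiscovery shape, and the function name are all hypothetical; the CATEGORY_KNOWLEDGE_GAP flag, the CONDITIONAL_PASS floor, and the CATEGORY_KNOWLEDGE_GAP_ACCEPTED override come from the text.

```typescript
// Hypothetical result shape.
interface KnowledgeDiscovery {
  loaded: string[];       // slugs with a category-knowledge/{slug}.md file
  gaps: string[];         // slugs with no file
  gateLogFlags: string[]; // explicit flags written to the gate log
  gateFloor: "PASS" | "CONDITIONAL_PASS";
}

// Assumed slug rule: lower-case, non-alphanumerics collapsed to single hyphens.
function toSlug(category: string): string {
  return category.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function discoverCategoryKnowledge(
  categories: string[],
  availableSlugs: string[],
  gapAccepted: boolean = false
): KnowledgeDiscovery {
  const loaded: string[] = [];
  const gaps: string[] = [];
  for (const category of categories) {
    const slug = toSlug(category);
    if (availableSlugs.indexOf(slug) !== -1) loaded.push(slug);
    else gaps.push(slug);
  }
  const hasGap = gaps.length > 0;
  return {
    loaded,
    gaps,
    gateLogFlags: hasGap ? ["CATEGORY_KNOWLEDGE_GAP"] : [],
    // The gap is always logged; only the gate floor is lifted by the override.
    gateFloor: hasGap && !gapAccepted ? "CONDITIONAL_PASS" : "PASS",
  };
}
```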

Compounding knowledge

Per-deal insights emitted by Phase 4 surface to a deal-page review queue. Confirmed insights merge back into category-knowledge/{slug}.md with per-row provenance (deal reference, date, reviewer). The merge is review-gated — never runtime auto-write — so a seller's dataroom can't poison the methodology file for every subsequent deal in that vertical.

Reference ten-layer-template.md (v3.0.0, binding) — the layer set + severity rubric the engine is pinned against

The ten layers

Layer 1 — Independent market sizing. What's the TAM, growth rate, and direction for this target's actual sub-category? Common pitfall: choosing the wrong denominator (global dropshipping vs luxury e-commerce).

Layer 2 — Traffic / SEO / share-of-voice. Paid:organic mix, branded search trend, backlink authority, share of voice. Required when seller-stated "stable ROAS" claims need verification (post-iOS14 attribution compression makes "stable" a 15-20% real-dollar improvement).

Layer 3 — Customer retention (verified vs claimed, vs comp). Verified repeat rate compared against Phase 3's comp_retention_percentage. The retention deficit vs comp is often the single most decisive piece of evidence in the entire memo.

Layer 4 — Named competitive landscape. Specific competitors with pricing, positioning, tier reading. Where does the target sit? Path to defensibility runs UP through tiers, not within tier. Driven by the competitive-analysis skill.

Layer 5 — Flip viability / scenario model. FY+1 forward under downside / base / upside. Severity is DEFERRED at Phase 4; Phase 7 verdict finalises after Phase 6 scenarios produce the arithmetic.

Layer 6 — Supplier concentration risk. Top 5–10 suppliers by GMV, contract status, exclusivity, transferability. In dropship aggregators the supplier list IS the business; concentration above 30% with no written contract = revenue is contractually rented.

Layer 7 — Ad-account health. Account-level pulls beyond seller-stated ROAS — restriction history, audience saturation, frequency curves, creative diversity, Page feedback score, pixel transferability. Single restriction event = binary revenue risk for paid-dependent businesses.

Layer 8 — Brand & digital-asset valuation. Domain age/authority, email list size, social, returning-customer database — floor-value reasoning. Minor on most HARD PASS verdicts; matters more on PASS or CONDITIONAL where assets price into the negotiation ladder.

Layer 9 — Legal / IP exposure. Category-specific legal/IP risk + EU/regional consumer-protection exposure + customer-claims surface. Binding sub-rules (Lesson 11 + 12 from 2026-05-06): if the catalogue uses third-party trademarks OR if the target's own brand has unresolved clearance OR if oblique substitutes appear ("Beamer" for BMW), Layer 9 is Material by default and the license-evidence chain must clear before the verdict moves above HARD PASS.

Layer 10 — Operator / key-person dependency. What the founder actually does — hours, transferable assets, supplier relationships in personal vs company name, ad accounts in personal vs business Meta. In dropship aggregators the operator IS the supplier relationship.

Severity rubric per layer (template-binding)

Each layer gets one of three ratings — Material (load-bearing finding; could decide the verdict), Caution (material concern but not decisive on its own), or Minor (polish; surfaces in the memo, doesn't gate). Plus two engine-state sub-states: NOT_APPLICABLE (layer doesn't apply, analyst rationale required, excluded from fail count) and DEFERRED (awaits Phase 5 reputation or Phase 6 scenarios; Phase 7 verdict finalises). The "multiple Material layers → HARD PASS regardless of financial fitness" rule operates on the Material count among in-scope (non-N/A) layers.

Promote / demote rules

Promote to Material if: the layer surfaces an auto-flag (reputation / regulator), OR public-comp deficit is >2× the threshold, OR the gap is the deciding factor in the verdict. Demote to Caution / Minor if: the seller provides independent verification materially reducing the risk (e.g. signed supplier-transition assurance demotes Layer 6 from Material to Caution).
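The promote / demote rules can be sketched as a small function. The three promotion triggers and the verification-based demotion come from the text; the LayerSignals shape is hypothetical, and two choices are assumptions: promotion takes precedence over demotion, and demotion moves exactly one step down.

```typescript
type Severity = "Material" | "Caution" | "Minor";

// Hypothetical per-layer signal bundle.
interface LayerSignals {
  autoFlag: boolean;                // reputation / regulator auto-flag
  compDeficit: number;              // observed public-comp deficit
  threshold: number;                // methodology threshold for that deficit
  decidesVerdict: boolean;          // the gap is the deciding factor
  independentVerification: boolean; // seller-provided, independently verified
}

function adjustSeverity(initial: Severity, s: LayerSignals): Severity {
  // Promote: any single trigger is sufficient.
  if (s.autoFlag || s.compDeficit > 2 * s.threshold || s.decidesVerdict) {
    return "Material";
  }
  // Demote (assumed one step): Material -> Caution, Caution -> Minor.
  if (s.independentVerification) {
    return initial === "Material" ? "Caution" : "Minor";
  }
  return initial;
}
```

Under these assumptions, a signed supplier-transition assurance on Layer 6 takes a Material rating to Caution, matching the example in the rule.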

IP API USPTO TSDR + EUIPO — third-party trademark detection (Lesson 11) — catalogue scan against marque list

What this is

The first of three IP-clearance surfaces. Detects third-party trademarked branding in the catalogue (BMW, Porsche, Lamborghini, Disney, Marvel, Nike, LV, …) via a four-step path: (a) regex against the methodology-pinned marque list; (b) entity-register scan of Phase 0's seller dataroom for trademark mentions; (c) LLM judgment on catalogue descriptions; (d) cross-vendor consensus when single-vendor confidence < 0.85.
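Step (a) of the path, the regex pass, might look like the following. The marque names are a subset of those listed in the text; scanCatalogue and escapeRegExp are hypothetical helpers, and steps (b) through (d) are not shown. Short marques such as "LV" are left out of this subset because whole-word matching alone would be prone to false positives on them.

```typescript
// Illustrative subset of the methodology-pinned marque list.
const MARQUES = ["BMW", "Porsche", "Lamborghini", "Disney", "Marvel", "Nike"];

// Escapes regex metacharacters so a marque is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns one hit per (catalogue entry, marque) pair, case-insensitive, whole-word.
function scanCatalogue(descriptions: string[], marques: string[] = MARQUES) {
  const hits: { index: number; marque: string }[] = [];
  descriptions.forEach((text, index) => {
    for (const marque of marques) {
      const pattern = new RegExp("\\b" + escapeRegExp(marque) + "\\b", "i");
      if (pattern.test(text)) hits.push({ index, marque });
    }
  });
  return hits;
}

const hits = scanCatalogue(["Porsche-branded car care kit", "Generic wax", "nike-style laces"]);
```

In this example the first and third entries are hit (Porsche and Nike) and the second is clean; each hit would then enter the license-evidence chain rather than being treated as a conclusion.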

License-evidence chain

When a marque is detected, the license-evidence chain checks USPTO TSDR + EUIPO for the marque's registration + current owner + active enforcement status, then surfaces a license-verification step (brand-owner direct outreach + IP-counsel opinion when the deal proceeds). Per the binding Lesson 11 sub-rule: seller-attested licensing chains are insufficient evidence regardless of deal size. Layer 9 stays Material until the chain produces a verified-licensed conclusion for every detected marque.

What it caught

Carrora — the failure mode this lesson codifies. A "Porsche-branded car care" catalogue with seller-attested licensing that didn't survive a USPTO + brand-owner direct check; the deal walked.

IP API USPTO TSDR + EUIPO — target's own brand clearance — is the target's brand name itself clear in operating geographies?

What this is

The second IP-clearance surface. A target selling unbranded private-label goods could still have an unregistered brand name that conflicts with someone else's US/EU mark. The buyer inherits the IP exposure post-acquisition.

Detection path

Extract candidate brand-name(s) from deal.name + deal.domain + any "operating-as / d/b/a" mentions in the seller dataroom. USPTO TSDR query per brand name (status + conflicting registrations); EUIPO query per brand name. Per-geography filter — only conflicts in deal.geographies count toward Layer 9.

Why this is a separate axis from Lesson 11

Lesson 11 is "catalogue uses someone else's trademark" (Porsche). Target's-own-brand clearance is "the target's brand IS someone else's trademark, or is about to collide with one." Both are IP-exposure axes; both feed Layer 9 severity; both must be cleared for the deal to clear above HARD PASS.

IP API Oblique-substitute detection (Lesson 12) — "Beamer" / "Italian Bull" / dual-naming-convention signals

What this is

The third IP-clearance surface. Detects oblique substitutes for trademarked brands ("Beamer" for BMW; "Italian Bull" for Lamborghini) — particularly when the same brand appears under two distinct naming conventions within a single catalogue, which the Lesson 12 binding sub-rule classifies as Material by default.

Detection path

LLM judgment on catalogue against a curated substitute list, plus a dual-naming-convention check (does the same referent appear under two different names?). Cross-vendor consensus required — false-positive risk on legitimate category language demands multi-vendor confirmation.
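The dual-naming-convention check has a deterministic half that can be sketched without the LLM step. The two substitute pairs come from the text; the SUBSTITUTES table, the findDualNaming helper, and the plain substring matching are illustrative assumptions, and the cross-vendor consensus step is not shown.

```typescript
// Hypothetical curated substitute list: lower-case alias -> trademarked referent.
const SUBSTITUTES: Record<string, string> = {
  "beamer": "BMW",
  "italian bull": "Lamborghini",
};

// Returns referents that appear in the catalogue under two or more distinct names.
function findDualNaming(
  descriptions: string[],
  substitutes: Record<string, string> = SUBSTITUTES
): string[] {
  const seen: Record<string, string[]> = {};
  const note = (referent: string, name: string): void => {
    const names = seen[referent] ?? (seen[referent] = []);
    if (names.indexOf(name) === -1) names.push(name);
  };
  for (const raw of descriptions) {
    const text = raw.toLowerCase();
    for (const alias of Object.keys(substitutes)) {
      const referent = substitutes[alias];
      if (referent === undefined) continue;
      if (text.indexOf(alias) !== -1) note(referent, alias);
      if (text.indexOf(referent.toLowerCase()) !== -1) note(referent, referent);
    }
  }
  return Object.keys(seen).filter((referent) => (seen[referent] ?? []).length >= 2);
}

const dualNamed = findDualNaming([
  "Beamer floor mats",
  "BMW 3-series seat covers",
  "Italian Bull keyring",
]);
```

Only BMW is returned here: it appears both under its own name and as "Beamer", while the Lamborghini substitute appears under a single convention and is left to the LLM judgment pass.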

Override path

When detected, override to clear below Material requires "independent comp evidence that category convention drives the naming, not evasion." Analyst-attested with citation; default posture is Material.

Reference category-knowledge/*.md — per-vertical guides loaded by the discovery step above

What they are

One file per vertical capturing Ecomma's accumulated knowledge: typical margin range, repeat rates, common supply structures, common fraud patterns, common IP traps, common post-acquisition failure modes. Currently in the library: luxury-fashion-dropship.md (Aveugle case insights). New files are authored as part of a methodology amendment cycle when a new vertical is DD'd.

How they're used

The ten-layer-review skill loads the relevant files into its prompt for every layer's reasoning. A reviewer running Layer 9 on an automotive accessory deal doesn't have to rediscover that car-marque trademarks need USPTO verification — it's already in the knowledge file.

Skill sector-overview — feeds Layer 1 (market-sizing)

What it does

Builds a sector landscape: TAM, growth rate, value chain, top 5–10 players, valuation context, and the explicit "winners, losers, and why" call. Two methods for TAM (bottom-up preferred, top-down with caveats). Three-state-with-substates source-tier on every figure. Cross-vendor cross-check on the winners-losers call when primary confidence lands in the [0.7, 0.85] band.

Output

{target}_sector-overview.docx — IC-ready document. Structured SectorOverviewOutput consumed by Layer 1 of the ten-layer-review skill, by the competitive-analysis skill (top-players seeds the named landscape), and by Phase 10 memo (executive summary + market context section).

Per M9 resolution

Layer 1 of the ten-layer review imports this skill's TAM + growth rate + winners-losers thesis rather than re-deriving. The skill produces the content; the layer review references it with a methodology-pinned synthesis rule.

Skill competitive-analysis — feeds Layer 4 (named-competitive-landscape)

What it does

Maps the named competitive landscape: direct competitors with pricing, positioning, tier reading, distribution-channel signal, marketing-mix signal, defensibility moat sources, and the "where does the target sit?" tier reading. Per the binding template: "specific competitors with pricing, positioning, tier reading. Common pitfall: 'no moat' structurally argued without naming specific competitors. The named version is more defensible."

Tier reading rubric

Methodology-pinned tier scale (tier-1 / tier-2 / tier-3 / broker / unranked) with explicit distribution-channel + pricing-index + product-breadth + marketing-mix signals per tier. Tell signals — what reveals the target's true tier vs claimed tier — captured per competitor.

Output

{target}_competitive-analysis.docx — IC-ready document with positioning charts. Structured CompetitiveAnalysisOutput consumed by Layer 4 of the ten-layer-review skill, and by Phase 10 memo (competitive landscape section).

Skill ten-layer-review — produces the per-layer severity verdicts

What it does

The canonical engine implementation of ten-layer-template.md. Dispatches per-layer LLM reasoning against the category-knowledge files + sector-overview + competitive-analysis + IP-clearance outputs + Phase 2 financial DD + Phase 3 comp benchmark, assigning severity per the Material/Caution/Minor rubric. Promoted from a research adapter to a skill in this rewrite — the work is too LLM-judgment-heavy to live in research.ts alongside deterministic adapters.

Per-layer output schema (template-binding)

Each layer carries five subsections: (1) what the financial DD covered; (2) what is missing; (3) external evidence with sources; (4) insight — "what no-one in the room would have said"; (5) impact on verdict (strengthens HARD PASS / weakens / supports CONDITIONAL / neutral). The insight subsection is the high-quality-bar element; the skill prompts explicitly for it.

Cross-vendor cross-check on severity

For every layer, when the primary vendor's confidence on the severity assignment lands in the [0.7, 0.85] band, the cross-check vendor runs the same prompt. Divergence captured per layer. Adoption rule: caution-bias (when in doubt, escalate); when cross-check votes less severe than primary, the layer becomes DEFERRED for analyst arbitration.
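A sketch of the adoption rule, assuming the band is inclusive at both ends as the bracket notation suggests. The band and the "less severe becomes DEFERRED" rule come from the text; adopting the more severe reading when the cross-check escalates is an assumed interpretation of caution-bias, and the names below are hypothetical.

```typescript
type Rated = "Material" | "Caution" | "Minor";

// Ordering used to compare severities.
const RANK: Record<Rated, number> = { Minor: 0, Caution: 1, Material: 2 };

// Mid-confidence band [0.7, 0.85], inclusive.
function needsCrossCheck(confidence: number): boolean {
  return confidence >= 0.7 && confidence <= 0.85;
}

function adoptSeverity(primary: Rated, confidence: number, crossCheck?: Rated): Rated | "DEFERRED" {
  // Outside the band, or with no cross-check result, the primary reading stands.
  if (!needsCrossCheck(confidence) || crossCheck === undefined) return primary;
  // Cross-check votes less severe: hand to the analyst rather than relax silently.
  if (RANK[crossCheck] < RANK[primary]) return "DEFERRED";
  // Cross-check votes more severe: escalate (assumed reading of caution-bias).
  return RANK[crossCheck] > RANK[primary] ? crossCheck : primary;
}
```

The asymmetry is the point of the design: disagreement can raise a severity automatically, but can only lower it through a human.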

Compounding knowledge candidates

Insights surfacing during the layer review that aren't already in the relevant category-knowledge/{slug}.md are emitted as candidates to the deal-page review queue — confirmed by analyst (or IC) before merging into the methodology file with per-row provenance. Never runtime auto-write.

Output

{target}_ten-layer-review.docx — IC-ready document with the ten per-layer subsections rendered from TenLayerReviewOutput.verdicts. Structured output consumed by Phase 4 gate aggregator (Material count rule) and by Phase 7 verdict (severity-modulation against Phase 1 thesis-fit gate).

Handshake Phase 4 ↔ Phase 5 / Phase 6 split-revisit — Layer 5 + Layer 9 first-pass; Phase 7 finalises

The circular dependency

The template's Layer 9 binding rule requires Phase 5 reputation results ("auto-flagged reputation issues are automatically Material"). Layer 5 (flip-viability) requires Phase 6 scenario output. But Phase 5 + Phase 6 run AFTER Phase 4 in the orchestrator's PHASE_FUNCTIONS order. Re-ordering phases would break the recall write against prior runs and the run-package writer.

The resolution: split-and-revisit

Phase 4 produces first-pass severity with the inputs it has. Layers needing Phase 5/6 data carry severity: 'DEFERRED' + a populated awaiting field naming the upstream phase. Phase 7 verdict assembly reads the DEFERRED set + Phase 5/6 outputs and finalises severities. Same proven pattern as Phase 1 criteria-11/12 → Phase 7 backfill.

Exception: Layer 9 with IP-trigger fired

When any IP-clearance check (third-party / target-own / oblique-substitute) detects a trigger, Layer 9 severity is Material immediately — no waiting on Phase 5 reputation. The reputation cross-reference can only escalate; the IP-trigger establishes the floor at Material.
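The split-and-revisit pattern can be sketched with a discriminated union so that a DEFERRED verdict cannot exist without its awaiting field. The severity values and the awaiting concept come from the text; the FirstPass type, the "phase5" / "phase6" literals, and the helper names are hypothetical.

```typescript
type FirstPass =
  | { layer: number; severity: "Material" | "Caution" | "Minor" }
  | { layer: number; severity: "DEFERRED"; awaiting: "phase5" | "phase6" };

// Layer 5 always waits on the Phase 6 scenario arithmetic.
function firstPassLayer5(): FirstPass {
  return { layer: 5, severity: "DEFERRED", awaiting: "phase6" };
}

// Layer 9 waits on Phase 5 reputation unless an IP trigger already sets the floor.
function firstPassLayer9(ipTriggerFired: boolean): FirstPass {
  return ipTriggerFired
    ? { layer: 9, severity: "Material" }
    : { layer: 9, severity: "DEFERRED", awaiting: "phase5" };
}

// What Phase 7 would need to finalise.
function deferredLayers(verdicts: FirstPass[]): number[] {
  return verdicts.filter((v) => v.severity === "DEFERRED").map((v) => v.layer);
}
```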

Gate Four-state gate aggregation — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

Aggregation rules (priority order)

OVERRIDE_PROCEED — a non-expired analyst override for the phase is set.
HARD_PASS — any of: (a) Layer 9 Material from an IP-trigger AND no verified-licensed conclusion in the license-evidence chain; OR (b) ≥2 layers at Material with auto-flag triggers (reputation/regulator/comp-deficit > 2× threshold).
CONDITIONAL_PASS — any of: (a) 1 layer at Material without auto-flag (documented remediation path possible); (b) any layers DEFERRED awaiting Phase 5/6; (c) any category-knowledge file missing (documented domain-knowledge gap).
PASS — otherwise.

N/A excluded from fail-count denominator

Layers marked NOT_APPLICABLE are not counted toward the "≥2 Material" threshold — they're structurally not in scope, not a clean signal. Analyst-attested rationale required for every N/A.
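The priority order above can be sketched as a single function. The rule order and thresholds come from the text; the LayerVerdict shape and its flag names are hypothetical. One assumption is labelled in the code: any Material layer that does not trigger HARD_PASS is treated as CONDITIONAL_PASS, which covers combinations the rules do not enumerate.

```typescript
type LayerState = "Material" | "Caution" | "Minor" | "DEFERRED" | "NOT_APPLICABLE";
type Phase4Gate = "PASS" | "CONDITIONAL_PASS" | "HARD_PASS" | "OVERRIDE_PROCEED";

// Hypothetical per-layer verdict record.
interface LayerVerdict {
  layer: number;
  severity: LayerState;
  autoFlag: boolean;         // reputation / regulator / comp-deficit > 2x threshold
  ipTrigger?: boolean;       // Layer 9 only
  licenseVerified?: boolean; // verified-licensed conclusion reached
}

function aggregatePhase4(
  verdicts: LayerVerdict[],
  opts: { activeOverride: boolean; knowledgeGap: boolean }
): Phase4Gate {
  if (opts.activeOverride) return "OVERRIDE_PROCEED";
  // N/A layers are out of scope and never counted.
  const inScope = verdicts.filter((v) => v.severity !== "NOT_APPLICABLE");
  const material = inScope.filter((v) => v.severity === "Material");
  const unclearedIp = material.some(
    (v) => v.layer === 9 && v.ipTrigger === true && v.licenseVerified !== true
  );
  const autoFlagged = material.filter((v) => v.autoFlag).length;
  if (unclearedIp || autoFlagged >= 2) return "HARD_PASS";
  const anyDeferred = inScope.some((v) => v.severity === "DEFERRED");
  // Assumption: any remaining Material layer lands on CONDITIONAL_PASS.
  if (material.length > 0 || anyDeferred || opts.knowledgeGap) return "CONDITIONAL_PASS";
  return "PASS";
}
```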

Override Phase 4 operational reason codes — disjoint from Phase 1/2/3 codes

LAYER_FAIL_WITH_CONTRACTUAL_REMEDIATION — layer fail mitigated by signed contractual remediation (e.g. supplier-transition assurance).
IP_LICENSE_VERIFIED_OUT_OF_BAND — license evidence captured outside the engine (analyst-attested with citation).
SUPPLIER_RISK_ACCEPTED_WITH_INSURANCE — Layer 6 concentration accepted because key-supplier insurance / hedge in place.
CATEGORY_KNOWLEDGE_GAP_ACCEPTED — no category-knowledge file exists but analyst affirms domain expertise.
AD_ACCOUNT_HEALTH_DATA_UNAVAILABLE — Layer 7 ad-account-health can't be pulled (no Meta/Google access granted; public proxies suffice).
PHASE_5_6_DEFERRED_LAYER_PROCEED — Phase 4 first-pass severity accepted as final; Phase 7 doesn't re-finalise.

Output phase4-gate-log.json + three IC-ready docx artefacts

Gate log

10_dd-output/{date}_{target}_phase4-gate-log.json — written on every run regardless of outcome. Contains: input-contract state per layer, category-knowledge discovery results, IP-clearance three-surface outcome, per-skill .ok statuses, per-layer verdict with rationale + evidence + source-tier + insight + impact-on-verdict, gate state, override (if any).

Three deliverable docx files

{target}_sector-overview.docx · {target}_competitive-analysis.docx · {target}_ten-layer-review.docx — produced by the three skills above. Filed in the deal's Drive folder, surfaced on the deal page.

Phase 7 read path

Phase 7 verdict reads phases.4.output.layerVerdicts for the "multiple layer-fails → HARD PASS regardless of financial fitness" rule enforcement, plus phases.4.output.deferredLayers to know which layers to finalise after Phase 5/6 produce their outputs.

4.5 Phase 4 sense-check

Before the reputation scan starts, four independent reviewers (Claude + GPT + Gemini + DeepSeek) audit Phase 4's ten-layer structural output against an eight-charge sense-check sheet. BLOCKING → Phase 4 re-runs with the flagged fixes before Phase 5 fires.

Phase 4 is the engine's most analytically diverse phase — ten layers of severity classification per the binding ten-layer template, with Lesson 11 IP-clearance three-surface checks and DEFERRED sub-flags that propagate into Phase 7's verdict finalisation. Misclassification at Phase 4 compounds two phases deep. Locked by decisions/cross-phase-sense-check-checkpoints-2026-05-21.html D1 (Add — 4-vendor standing checkpoint). Same runQuadReviewWith primitive Phase 0.5 / 2.5 / 7.5 / 12 use. Charge sheet at app/src/orchestrator/charge-sheets/phase4_5_sense_check.ts.

The eight charges cover: ten-layer coverage completeness (CH-01), severity-rubric calibration (CH-02), Lesson 11 IP-clearance firing with evidence chain (CH-03), DEFERRED-awaiting contract correctness (CH-04), category-knowledge grounding (CH-05), gate-aggregation arithmetic reconstructability (CH-06), override application discipline (CH-07), Phase 7 handshake schema preservation (CH-08). Reviewers return UPHELD / RESTRUCTURED / OVERRULED / UNAVAILABLE; the caution-bias aggregator maps to CLEAN / ISSUES_FOUND / BLOCKING / DEGRADED. BLOCKING halts (Phase 4 re-runs); ISSUES_FOUND surfaces on the deal page for analyst confirm; DEGRADED proceeds with documented gap.

5 What does the public think?

Reads everything publicly written about the target across every reputation surface that exists for it — major review platforms, local review sites, social mentions, news — and looks for patterns of fakery or hidden problems. Never trusts one platform alone.

A clean P&L isn't always the truth. The reputation phase reads the public record. Reading one platform — even the biggest one — is how you miss the signal: in a recent manual DD, a local review site held 8,000 reviews against 3 on Trustpilot. Sticking to Trustpilot alone would have missed the entire reputation surface for that target. The engine probes a configurable set of platforms, discovers category-specific and local-market review surfaces via search, and aggregates the signal across all of them. The spike-detection, templated-language, and reviewer-history checks each apply per surface; the gate verdict (which Phase 7 uses to backfill criterion 12) is a volume-weighted aggregate, not a single-platform score.

What runs in this phase

Discovery Reputation-surface discovery — find every place the target actually has reviews, before deciding where to read

Why discovery first

The list of platforms that matter is not fixed in advance. Some categories have category-specific review aggregators (e.g., baby-product communities, automotive forums); some markets have dominant local review sites (Kiyoh in NL, eKomi in DE, Avis Vérifiés in FR, Recensioni Italia in IT, Yandex Reviews in Russia-adjacent markets, BBB in the US for certain categories); some brands accumulate signal on Pinterest, Etsy, Amazon product pages, Yelp, or Facebook reviews more than on Trustpilot. The engine discovers which surfaces actually hold review volume for this target before it commits to which surfaces to read in depth.

How discovery runs

Search the target's brand name + "reviews" / "ervaringen" / "avis" / "rezensionen" / category-specific equivalents across major engines.
Probe the standard platforms (Trustpilot, Google reviews, Yelp, Facebook reviews, Sitejabber, ResellerRatings, Glassdoor for the employer surface, Reddit for organic mentions) and capture the review-volume on each.
Probe category-specific and locale-specific aggregators known to Ecomma's lessons table (the list grows over time).
Rank surfaces by review-volume × platform-credibility-factor (Trustpilot scores high on volume + credibility; a local 8,000-review platform scores high on volume; a 3-review platform on any service scores low).
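
The ranking step can be sketched as below. This is a minimal illustration rather than the engine's code: the DiscoveredSurface shape, the rankSurfaces name, the 0-to-1 credibility scale, and the sample credibility values are all assumptions made for the example.

```typescript
// Hypothetical shape for one discovered surface; field names are illustrative.
interface DiscoveredSurface {
  platform: string;
  reviewVolume: number;      // reviews found for the target on this platform
  credibilityFactor: number; // assumed scale: 0 (untrusted) to 1 (fully trusted)
}

// Rank by review-volume × platform-credibility-factor, highest score first.
function rankSurfaces(surfaces: DiscoveredSurface[]): DiscoveredSurface[] {
  const score = (s: DiscoveredSurface): number => s.reviewVolume * s.credibilityFactor;
  return surfaces.slice().sort((a, b) => score(b) - score(a));
}

// Illustrative values only: the 0.8 credibility factor is invented for the example.
const rankedSurfaces = rankSurfaces([
  { platform: "Trustpilot", reviewVolume: 3, credibilityFactor: 1.0 },
  { platform: "Kiyoh", reviewVolume: 8000, credibilityFactor: 0.8 },
]);
console.log(rankedSurfaces.map((s) => s.platform).join(" > ")); // Kiyoh > Trustpilot
```

With these sample numbers the local aggregator outranks Trustpilot despite the lower assumed credibility, which is the behaviour the 8,000-versus-3 lesson calls for.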

Output of discovery

A reputation-surfaces.json for this target with every surface found, the volume, the platform-credibility, and the priority order in which the in-depth checks will run. Surfaced on the deal page so the analyst can add or remove a surface before the in-depth pass fires.

API Trustpilot (paid tier) — per-review metadata for the velocity / templated-language / reviewer-history checks

What it is

Trustpilot is the largest public reviews platform in many markets. The paid-tier API gives us per-review metadata: timestamp to the second, reviewer history, review text. The free tier only gives aggregates — useless for the per-platform integrity checks.

What we look for (applied per platform, not just here)

Clusters in time. Many reviews submitted within minutes of each other.
Templated language. Reviews with near-identical phrasing.
Reviewer history. Reviewers who only ever reviewed this business and nothing else.
Geographic anomalies. All five-star reviews coming from one country the target doesn't ship to.

Aveugle had clustered batches of five-star reviews posted in 90-second windows, all from accounts with zero other history, repeating the same phrasing. That alone moved the deal from CONDITIONAL to HARD PASS.

Important: a target with low Trustpilot volume is not automatically a low-reputation target. The engine checks Trustpilot, then checks every other discovered surface; the integrity verdict is the aggregate.

API Google Reviews + Google Maps Places — Google-attached review surface

Google Reviews via the Places API. Per-review metadata is more limited than Trustpilot's paid tier, but volume and timing are recoverable. The same spike-detection + templated-language + reviewer-history rules apply. Reviewer-history pattern is weaker (Google reviewer profiles are sparser), so this surface contributes more to the volume aggregate than to the templated-language signal.

API Facebook + Yelp + Sitejabber + ResellerRatings — second-tier review surfaces

Facebook reviews, Yelp (US), Sitejabber, and ResellerRatings each contribute a volume + integrity signal. The engine reads each surface where the target has >5 reviews (volume threshold below which the integrity checks lose statistical meaning). Each surface's verdict aggregates into the volume-weighted reputation score.

API Local-market review aggregators — the surfaces that hold the real volume in non-US markets

Why this is its own row

A US-only review surface set systematically misses signal for targets in other markets. The Netherlands has Kiyoh, klantenvertellen.nl, and Trustprofile. Germany has eKomi and Trusted Shops. France has Avis Vérifiés and Trustfolio. Italy has Recensioni Italia. The UK has Reviews.io. Each market has dominant local aggregators that Trustpilot does not displace.

The lesson

In a recent manual DD, a NL-market local aggregator (Kiyoh) held ~8,000 reviews for a target while Trustpilot held 3. A reputation phase that only read Trustpilot would have produced "low review volume, integrity inconclusive" — the wrong answer. Reading Kiyoh changed the verdict materially.

How it runs

The engine maintains a per-market table of dominant local aggregators (locale-keyed via target's primary geography from the deal form). It probes each, captures volume, and reads in depth where volume passes the threshold. The table is editable in project-starter/.claude/skills/run-deal-dd/references/reputation-surfaces.md; new local surfaces get added as lessons surface them.

API Category-specific review aggregators — vertical communities that often hold the real authority

Beauty has Sephora Community + MakeupAlley + Reddit r/SkincareAddiction. Baby/Kids has BabyCenter community + What to Expect forums. Automotive has marque-specific forums plus YouTube channel commentary. Home & Décor has Houzz + Etsy reviews + Pinterest sentiment. Apparel has Lyst + Style forums + brand-Reddit subs. The engine reads the per-category aggregators known to Ecomma's lessons table for the target's primary category.

API Reddit — organic mentions, customer complaints, founder posts

Reddit's API gives us recent mentions of the target's brand name, sentiment, and context. Useful for catching organic complaints (subreddits like r/scams or category-specific communities often surface problems before reviews catch up) and founder-posted content (the founder may have promoted on Reddit in ways that contradict the seller's current narrative).

API News search — press, regulatory actions, public incidents

Broad news search for any mention of the target. Catches regulatory actions, product recalls, lawsuits, and any incident that left a public footprint. Multi-language coverage (the target may be discussed primarily in its operating-market language; English-only news search systematically under-reads).

Aggregation Reputation aggregation — volume-weighted majority across surfaces, never single-platform

How surfaces combine

Each surface produces an integrity verdict: CLEAN (no spike, no templated language, no reviewer-history anomaly), SUSPICIOUS (one signal flagged), or FRAUDULENT (two or more signals flagged, or a single signal flagged at severe magnitude). The aggregate verdict is volume-weighted: at equal credibility, a surface with 8,000 reviews carries ~2,667× the weight of a surface with 3 reviews, and two surfaces with credibility-factor 1.0 weigh equally per review. The aggregate verdict is what Phase 7 backfills as criterion 12's score.

Divergence

When surfaces disagree at meaningful weight (one major surface flags FRAUDULENT, another flags CLEAN), the divergence is surfaced on the deal page with both verdicts shown. The analyst sees the disagreement before the verdict is locked.

The "low volume everywhere" case

If the target has below-threshold review volume on every surface, the aggregate verdict is INSUFFICIENT_VOLUME — not CLEAN. A new business with no reviews is not "reputation-verified clean"; it is "reputation-not-yet-evaluable." Criterion 12 then defers with the missing-volume rationale; Phase 7 surfaces the deferral in the memo.
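
A minimal sketch of how the per-surface verdicts and the volume-weighted aggregate could fit together, including the INSUFFICIENT_VOLUME case. The type names, field names, and the tie-break rule are assumptions for illustration; the >5-review reading threshold and the verdict definitions come from the text above.

```typescript
type SurfaceVerdict = "CLEAN" | "SUSPICIOUS" | "FRAUDULENT";
type AggregateVerdict = SurfaceVerdict | "INSUFFICIENT_VOLUME";

// Hypothetical per-surface record; field names are illustrative.
interface SurfaceResult {
  platform: string;
  reviewVolume: number;
  credibilityFactor: number;
  flaggedSignals: number; // how many integrity signals fired on this surface
  severeSignal: boolean;  // true when a single signal fired at severe magnitude
}

const MIN_REVIEW_VOLUME = 5; // surfaces are read only where the target has >5 reviews

function surfaceVerdict(s: SurfaceResult): SurfaceVerdict {
  if (s.flaggedSignals >= 2 || s.severeSignal) return "FRAUDULENT";
  return s.flaggedSignals === 1 ? "SUSPICIOUS" : "CLEAN";
}

function aggregateVerdict(surfaces: SurfaceResult[]): AggregateVerdict {
  const readable = surfaces.filter((s) => s.reviewVolume > MIN_REVIEW_VOLUME);
  if (readable.length === 0) return "INSUFFICIENT_VOLUME"; // never CLEAN by default
  const weight: Record<SurfaceVerdict, number> = { CLEAN: 0, SUSPICIOUS: 0, FRAUDULENT: 0 };
  for (const s of readable) {
    weight[surfaceVerdict(s)] += s.reviewVolume * s.credibilityFactor;
  }
  // Assumed tie-break: on equal weight the more cautious verdict wins.
  const cautiousFirst: SurfaceVerdict[] = ["FRAUDULENT", "SUSPICIOUS", "CLEAN"];
  return cautiousFirst.reduce((best, v) => (weight[v] > weight[best] ? v : best));
}
```

The aggregate is a weighted majority only; the multi-surface confirmation rule that separates HARD_PASS from CONDITIONAL_PASS is applied later, at the gate.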

Staging (v1.0.0 per Phase 5 audit-quad recommendation)

Core surfaces (Trustpilot + Google Reviews + Reddit + news + per-locale local-aggregator template) are wired this cycle. Long-tail surfaces (Yelp, Facebook reviews, Sitejabber, ResellerRatings, Glassdoor, BBB, category-specific aggregators) are typed stubs flagged deferred_long_tail on the discovery output so the analyst sees which surfaces aren't yet pulled for this run. Adapter implementation lands in a follow-up cycle.

Skill reputation-integrity — templated-language judgment with cross-vendor cross-check

The hybrid: deterministic + LLM judgment

Spike detection, reviewer-history anomaly, and geographic-anomaly are deterministic — the per-signal severity is the output of a math computation against methodology-pinned thresholds (≥ 10 reviews / ≤ 90s for spikes; ≥ 30% zero-history reviewers for fraudulent; ≥ 80% single-geography for fraudulent). These live in research.ts adapters.
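
The deterministic checks reduce to simple arithmetic against the pinned thresholds. The sketch below is an assumption about shape, not the contents of the research.ts adapters: function names and the millisecond-timestamp input are hypothetical, while the numeric thresholds are the ones quoted above.

```typescript
const SPIKE_MIN_REVIEWS = 10;       // at least 10 reviews ...
const SPIKE_WINDOW_MS = 90 * 1000;  // ... inside a window of at most 90 seconds
const ZERO_HISTORY_SHARE = 0.3;     // share of zero-history reviewers that flags fraudulent
const SINGLE_GEOGRAPHY_SHARE = 0.8; // share of reviews from one geography that flags fraudulent

// True when any run of SPIKE_MIN_REVIEWS consecutive reviews (by time) fits in the window.
function hasReviewSpike(timestampsMs: number[]): boolean {
  const sorted = timestampsMs.slice().sort((a, b) => a - b);
  return sorted.some((first, i) => {
    const lastInRun = sorted[i + SPIKE_MIN_REVIEWS - 1];
    return lastInRun !== undefined && lastInRun - first <= SPIKE_WINDOW_MS;
  });
}

// Shared helper for the two share-based checks; false when there is nothing to measure.
function shareAtLeast(count: number, total: number, threshold: number): boolean {
  return total > 0 && count / total >= threshold;
}

// Example: 4 of 10 reviewers have no other history, 7 of 10 reviews share one country.
const zeroHistoryFlag = shareAtLeast(4, 10, ZERO_HISTORY_SHARE);       // true
const singleGeographyFlag = shareAtLeast(7, 10, SINGLE_GEOGRAPHY_SHARE); // false
```

Sorting first and comparing each review with the one nine positions later keeps the spike check linear after the sort and avoids any dependence on input order.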

Templated-language is LLM judgment — distinguishing "this is templated phrasing reflecting fraud-like coordination" from "this is natural-language repetition (category vernacular, brand-loyalist phrases, response to a brand-prompted review request)." The deterministic similarity computation (embedding-cosine ≥ 0.9 on review-pair text) flags candidate pairs; the skill's LLM call judges the candidate set with cross-vendor cross-check in the [0.7, 0.85] confidence band per the Phase 1 confidence-band pattern.
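
The deterministic half of the templated-language check can be sketched as a pairwise cosine pass. This assumes the review embeddings have already been computed elsewhere; the function names are hypothetical and only the 0.9 threshold comes from the text. The LLM judgment over the candidate set is out of scope here.

```typescript
const TEMPLATE_COSINE_MIN = 0.9; // pairs at or above this similarity become candidates

// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    dot += x * (b[i] ?? 0);
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns index pairs [i, j] with i < j whose embeddings are near-identical.
function candidatePairs(embeddings: number[][]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  embeddings.forEach((a, i) => {
    embeddings.slice(i + 1).forEach((b, offset) => {
      if (cosine(a, b) >= TEMPLATE_COSINE_MIN) pairs.push([i, i + 1 + offset]);
    });
  });
  return pairs;
}
```

The pass is quadratic in the number of reviews, which is acceptable for a sketch; a production adapter reading thousands of reviews per surface would likely need blocking or an approximate nearest-neighbour index, which is not something the source specifies.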

GDPR + reviewer-data restriction

In GDPR-scope locales (EU member states), reviewer-history scanning is skipped at the deterministic-adapter level — the skill respects that gap and does NOT attempt to backdoor reviewer enrichment via the LLM prompt. Documented gap surfaces on the deal page; override path GDPR_PRIVACY_CONSTRAINT_ACCEPTED for analyst attestation.

Aveugle regression fixture

The Aveugle pattern (10 clustered 5-star reviews from zero-history accounts in a 90-second window, repeating phrasing) is the canonical regression-protect test fixture. The skill must catch this pattern at every methodology-version snapshot; the vitest fixture asserts the deterministic + LLM-judgment chain produces a per-surface FRAUDULENT verdict on the synthetic Aveugle corpus.

Handshake Phase 4 ↔ Phase 5 Layer 9 auto-flag — closes the split-revisit DEFERRED awaiting reputation

The handshake the rewrite closes

Phase 4's runTenLayerReview emits Layer 9 (legal-IP-exposure) with severity: 'DEFERRED' + awaiting: 'phase5_reputation' when no IP-trigger fired at Phase 4. Per the binding ten-layer-template's Lesson 11 sub-rule, *"auto-flagged reputation issues are automatically Material"* — meaning a fraudulent reputation signal escalates Layer 9 from DEFERRED to Material. Phase 5 produces the auto-flag signal; Phase 7 verdict reads it to finalise Phase 4's Layer 9.

The signal

Phase 5's extractLegalIpAutoFlag adapter fires when ANY surface produces a SUSPICIOUS or FRAUDULENT verdict. The Phase 5 output schema includes legalIpAutoFlag: { fired, surfaces[], rationale }, the field Phase 7's verdict-assembly code reads. Each firing also surfaces on the deal page so the analyst sees the cross-phase escalation before the verdict locks.
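
A minimal sketch of the firing rule, assuming a simple list of per-surface verdicts as input. The buildLegalIpAutoFlag name, the input row shape, and the rationale wording are hypothetical; the output fields fired, surfaces, and rationale follow the schema named above.

```typescript
type SurfaceIntegrity = "CLEAN" | "SUSPICIOUS" | "FRAUDULENT";

// Hypothetical input row: one verdict per surface that was read in depth.
interface SurfaceVerdictRow {
  platform: string;
  verdict: SurfaceIntegrity;
}

interface LegalIpAutoFlag {
  fired: boolean;
  surfaces: string[];
  rationale: string;
}

// Fires when any surface is SUSPICIOUS or FRAUDULENT and records which ones.
function buildLegalIpAutoFlag(rows: SurfaceVerdictRow[]): LegalIpAutoFlag {
  const flagged = rows.filter((r) => r.verdict !== "CLEAN");
  const rationale =
    flagged.length === 0
      ? "no surface returned SUSPICIOUS or FRAUDULENT"
      : flagged.map((r) => `${r.platform}: ${r.verdict}`).join("; ");
  return { fired: flagged.length > 0, surfaces: flagged.map((r) => r.platform), rationale };
}
```

Keeping the flagged surface names on the flag matters downstream, because the Layer 9 finalisation distinguishes one flagged surface from two or more.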

Gate Four-state gate aggregation — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

Aggregation rules (priority order)

OVERRIDE_PROCEED — a non-expired analyst override is set for the phase.
HARD_PASS — aggregate verdict FRAUDULENT AND ≥2 surfaces at SUSPICIOUS/FRAUDULENT (multi-surface confirmation; single-surface fraud is CONDITIONAL_PASS pending analyst review).
CONDITIONAL_PASS — aggregate verdict SUSPICIOUS, OR single-surface FRAUDULENT, OR INSUFFICIENT_VOLUME across all surfaces (deferred criterion 12).
PASS — aggregate verdict CLEAN.
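
The priority order above can be read as a short decision function. This is a sketch under assumptions: the input shape and names are hypothetical, and override expiry is collapsed into a single boolean for brevity.

```typescript
type Phase5Gate = "PASS" | "CONDITIONAL_PASS" | "HARD_PASS" | "OVERRIDE_PROCEED";
type ReputationAggregate = "CLEAN" | "SUSPICIOUS" | "FRAUDULENT" | "INSUFFICIENT_VOLUME";

// Hypothetical input shape for the gate; field names are illustrative.
interface Phase5GateInput {
  aggregate: ReputationAggregate;
  flaggedSurfaces: number;    // surfaces at SUSPICIOUS or FRAUDULENT
  fraudulentSurfaces: number; // surfaces at FRAUDULENT
  overrideActive: boolean;    // a non-expired analyst override is set for the phase
}

function phase5Gate(input: Phase5GateInput): Phase5Gate {
  if (input.overrideActive) return "OVERRIDE_PROCEED";
  // Multi-surface confirmation is required before fraud becomes a hard pass.
  if (input.aggregate === "FRAUDULENT" && input.flaggedSurfaces >= 2) return "HARD_PASS";
  // SUSPICIOUS, INSUFFICIENT_VOLUME, or unconfirmed single-surface fraud.
  if (input.aggregate !== "CLEAN" || input.fraudulentSurfaces >= 1) return "CONDITIONAL_PASS";
  return "PASS";
}
```

Evaluating the rules top to bottom and returning on the first match is what makes the list a priority order rather than a set of independent conditions.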

Override reason codes (Phase 5 operational)

FRAUDULENT_SIGNAL_CATEGORY_NORM_ACCEPTED — fraud signal is category-baseline (some verticals routinely cluster on launch-day promotions)
INSUFFICIENT_VOLUME_NEW_BUSINESS_ACCEPTED — new business, no reviews yet, analyst attests that reputation evaluation will land at next-quarter review
LANGUAGE_GAP_ANALYST_TRANSLATED — non-English coverage gap manually closed by analyst
CROSS_PLATFORM_DEDUP_GAP_ACCEPTED — reviewer-identity dedup not yet wired; analyst attests no cross-platform pattern
SELLER_GAMING_FLAG_ACCEPTED — post-LOI manipulation suspected; analyst attests pre-LOI baseline is authoritative (the audit-quad-flagged risk)
GDPR_PRIVACY_CONSTRAINT_ACCEPTED — reviewer-history scanning skipped per data-protection law in EU geographies
EMPLOYER_SURFACE_OUT_OF_SCOPE_ACCEPTED — Glassdoor / employer-review surfaces excluded from consumer-reputation aggregate (e.g. sole-proprietor target with no employees)
REVIEW_AGE_NORMALISED_ACCEPTED — recent spike against a decade-old corpus of legitimate reviews; analyst attests the demote rule applies

Output phase5-gate-log.json + aggregate_reputation_verdict (criterion-12 key path)

Gate log

10_dd-output/{date}_{target-slug}_phase5-gate-log.json — persisted via the shared writeGateLogFile helper. Always attempted regardless of outcome. Contains: input contract state, surface discovery output, per-surface pull statuses, per-surface verdicts with per-signal rationale + cross-check trace, volume-weighted aggregate, legalIpAutoFlag, gate state, override (if any). Surfaced on the deal page as the per-surface audit trail.

Phase 7 read path

Phase 7 verdict reads phases.5.aggregate_reputation_verdict for criterion-12 backfill (stable key path documented in engine-sync SKILL.md to prevent silent rename drift; .output prefix retired per Phase 7 audit H12 coherence pass — engine stores phase outputs at priorOutputs.phaseN, no nested .output wrapper). INSUFFICIENT_VOLUME propagates as DEFERRED on criterion 12, not FAIL — mirrors Phase 3's NO_QUALIFYING_COMP design.

5.5 Phase 5 sense-check

Before scenario modelling starts, four independent reviewers audit Phase 5's reputation scan and the legalIpAutoFlag producer output. The risk profile is asymmetric — false-negatives on the auto-flag are far costlier than false-positives, so adversarial pressure on every Full / BYO / postmortem run is the design.

Phase 5's output feeds two downstream consumers, both resolved at Phase 7: Phase 4 Layer 9 finalisation reads legalIpAutoFlag; the criterion-12 backfill reads aggregate_reputation_verdict. A false-negative on the auto-flag lets a Layer 9 IP issue slide into Phase 7 as Minor instead of Material. Locked by decision page D2 (override of the original conditional recommendation: the user picked uniform standing 4-vendor over conditional firing). Charge sheet at app/src/orchestrator/charge-sheets/phase5_5_sense_check.ts.

The eight charges cover: reputation-surface coverage completeness with locale routing (CH-01), legalIpAutoFlag firing rule under asymmetric-risk bias (CH-02), per-surface integrity verdict calibration (CH-03), Aveugle-pattern spike detection (CH-04), GDPR locale-aware reviewer-history-skip discipline (CH-05), Phase 3 ↔ Phase 5 comparator handshake soundness (CH-06), templated-language verdict grounding with cross-vendor cross-check (CH-07), Phase 4 + Phase 7 handshake schema preservation (CH-08). Same disposition + aggregation pattern as the other .5 checkpoints.

6 Run the deal three ways

Builds the LBO model and a three-scenario projection (base, bear, bull). Answers "what happens to our money under each version of the future?"

An LBO model is the financial model from the buyer's point of view: how much we pay, how we finance it, what debt looks like over time, what return we get on our equity. The three scenarios add the realistic spread — base case is the path of least resistance, bear is what happens if things get harder, bull is the operating-uplift case where we drive real improvement post-acquisition.

What runs in this phase

Skill lbo-model — Sources & Uses, debt schedule, IRR, MOIC

What an LBO is

"Leveraged buyout." The standard way private-equity buyers underwrite acquisitions. You assume a purchase price, layer in some debt, model the cash flows over a hold period (typically 5 years for Ecomma's deals), assume an exit valuation, and compute the return on your invested equity.

What this skill produces

An Excel file with: Sources & Uses (where the money to buy the deal comes from), an operating model (what the business does each year), a debt schedule (how much we owe over time), and a returns analysis (IRR, internal rate of return; MOIC, multiple on invested capital). Plus sensitivity tables: what happens if the exit multiple is one turn lower, the exit slips by a year, or both.

Reference scenario-template.md (v1.0.0, binding) — the three-scenario framework + assumption-shift table

The three scenarios

Methodology-pinned assumption deltas per scenario, anchored to upstream phase outputs (the bear case is NOT a free LLM judgment — it's anchored to Phase 4 Layer 6 supplier concentration, Phase 4 Layer 7 ad-account health, and Phase 5 reputation aggregate):

Bull: DTC efficiency +15%, supplier cost −5%, terminal multiple at peer-set p75, retention +5%, exit Year 5
Base: Year-0 trend assumptions, terminal multiple at peer-set median, exit Year 5
Bear: DTC efficiency −20%, supplier cost +10% (+20% if Phase 4 Layer 6 Material), retention −10% (−15%/−30% if Phase 5 reputation SUSPICIOUS/FRAUDULENT), terminal multiple at peer-set p25, exit Year 4 (pulled forward)

Probability weights (methodology-pinned)

Default: bull 25% / base 50% / bear 25%. Analyst override via SCENARIO_WEIGHTS_ANALYST_OVERRIDE reason code with mandatory citation. Cross-vendor cross-check on override weights when analyst calibration confidence in [0.7, 0.85] band.

Sensitivity grid

3D grid: exit-multiple delta (−1, −0.5, 0, +0.5, +1 turns) × exit-timing delta (−1, 0, +1 years) × DTC-efficiency delta (−10%, −5%, 0%, +5%, +10%). 75 cells total, rendered as three nested tables in the LBO Excel.
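
The grid is a plain Cartesian product of the three delta axes. The sketch below enumerates it; the names are hypothetical, and grouping by exit-timing delta (to match the three nested tables) is an assumption about how the tables are split.

```typescript
const EXIT_MULTIPLE_DELTAS = [-1, -0.5, 0, 0.5, 1];            // turns
const EXIT_TIMING_DELTAS = [-1, 0, 1];                          // years
const DTC_EFFICIENCY_DELTAS = [-0.1, -0.05, 0, 0.05, 0.1];     // fractional change

interface SensitivityCell {
  timingDelta: number;
  multipleDelta: number;
  dtcDelta: number;
}

// Outer loop over exit timing, so each timing value yields one 5 × 5 table.
function buildSensitivityGrid(): SensitivityCell[] {
  const cells: SensitivityCell[] = [];
  for (const timingDelta of EXIT_TIMING_DELTAS) {
    for (const multipleDelta of EXIT_MULTIPLE_DELTAS) {
      for (const dtcDelta of DTC_EFFICIENCY_DELTAS) {
        cells.push({ timingDelta, multipleDelta, dtcDelta });
      }
    }
  }
  return cells;
}

console.log(buildSensitivityGrid().length); // 75 = 3 × 5 × 5
```

Each cell would then be fed through the LBO returns calculation; that step is omitted because the source does not describe the model's internals.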

Handshake Phase 4 ↔ Phase 6 flipViabilitySignal — closes the split-revisit DEFERRED on Layer 5

The handshake the rewrite closes

Phase 4's runTenLayerReview emits Layer 5 (flip-viability) with severity: 'DEFERRED' + awaiting: 'phase6_scenarios' per the split-revisit pattern from Phase 4's decision page D3. Per the binding ten-layer-template Layer 5 rule, *"show the arithmetic for the probability-weighted expected return explicitly"* — Phase 6 produces the arithmetic; Phase 7 verdict reads it to finalise Layer 5.

The signal

Phase 6's computeFlipViabilitySignal produces: scenarioWeightedExpectedReturn (Σ probability × IRR across scenarios), downsideSpreadToBase ((base − bear) / |base|), bearCaseSeverity (clean / caution / material per methodology-pinned thresholds — >50% downside-spread or negative expected return → material; 30–50% → caution; <30% → clean).
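
The arithmetic behind the three fields can be sketched as follows. The function name, the input shape, and the handling of the 30% and 50% boundaries are assumptions; the formulas, the thresholds, and the default 25 / 50 / 25 weights are taken from the text. IRRs are expressed as fractions (0.20 means 20%), and the sketch assumes a non-zero base-case IRR.

```typescript
interface ScenarioValues {
  bull: number;
  base: number;
  bear: number;
}

type BearCaseSeverity = "clean" | "caution" | "material";

const DEFAULT_SCENARIO_WEIGHTS: ScenarioValues = { bull: 0.25, base: 0.5, bear: 0.25 };

function flipViabilitySketch(irr: ScenarioValues, weights: ScenarioValues = DEFAULT_SCENARIO_WEIGHTS) {
  // Σ probability × IRR across the three scenarios.
  const scenarioWeightedExpectedReturn =
    weights.bull * irr.bull + weights.base * irr.base + weights.bear * irr.bear;
  // (base − bear) / |base|; assumes irr.base is not zero.
  const downsideSpreadToBase = (irr.base - irr.bear) / Math.abs(irr.base);

  let bearCaseSeverity: BearCaseSeverity = "clean";
  if (downsideSpreadToBase > 0.5 || scenarioWeightedExpectedReturn < 0) {
    bearCaseSeverity = "material";
  } else if (downsideSpreadToBase >= 0.3) {
    bearCaseSeverity = "caution";
  }
  return { scenarioWeightedExpectedReturn, downsideSpreadToBase, bearCaseSeverity };
}

// Worked example: bull 35%, base 20%, bear 5%.
// Expected return = 0.25·0.35 + 0.5·0.20 + 0.25·0.05 = 0.20; spread = 0.15 / 0.20 = 0.75.
const exampleSignal = flipViabilitySketch({ bull: 0.35, base: 0.2, bear: 0.05 });
```

In the worked example the expected return looks healthy at 20%, yet the 75% downside spread alone pushes the severity to material, which is why both quantities are reported.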

Phase 7's read

Phase 7 reads phases.6.flipViabilitySignal.bearCaseSeverity and maps it directly to Phase 4 Layer 5's final severity. If material, Layer 5 lands at Material in the final ten-layer record + verdict aggregator.

Gate Four-state gate aggregation — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

Aggregation rules (priority order)

OVERRIDE_PROCEED — a non-expired analyst override is set for the phase.
HARD_PASS — bear-case IRR < methodology floor (default 0%) AND no override. The floor exists because a bear-case loss-on-invested-capital is a deal that should at minimum require explicit analyst attestation before proceeding.
CONDITIONAL_PASS — flipViabilitySignal.bearCaseSeverity is caution OR material (high downside spread; deal proceeds with named tail-risk on the deal page).
PASS — bear-case IRR ≥ floor AND downside-spread within tolerance.

Override reason codes (Phase 6 operational)

BEAR_CASE_BELOW_FLOOR_ACCEPTED — bear-case IRR < 0; analyst attests (e.g. earn-out structure absorbs the variance)
SCENARIO_WEIGHTS_ANALYST_OVERRIDE — analyst override of methodology-pinned probability weights with mandatory citation
TERMINAL_VALUE_BAND_RELAXED — exit multiple outside p25–p75 of comp-set; analyst attests
MULTI_ENTITY_CONSOLIDATION_SCENARIO_GAP_ACCEPTED — multi-entity target without per-entity consolidation; parent-entity LBO rendered with analyst attestation

Output phase6-gate-log.json + {target}_lbo-model.xlsx + {target}_cash-flow-sidecar.xlsx

Gate log

10_dd-output/{date}_{target-slug}_phase6-gate-log.json — persisted via the shared writeGateLogFile helper. Always attempted regardless of outcome. Contains: input contract state, per-scenario assumptions + IRR / MOIC / yearly cash flows, expected-return-weighted, flipViabilitySignal, gate state, override (if any).

Excel artefacts

The LBO model and the cash-flow sidecar with all three scenarios. The sidecar is structured so any IC member can compare the three side by side without having to switch tabs.

Phase 7 read paths

Phase 7 verdict reads phases.6.expected_return_weighted for verdict-assembly arithmetic, and phases.6.flipViabilitySignal for Phase 4 Layer 5 finalisation.

7 Make a call, then attack it

Assembles the verdict (PASS / CONDITIONAL / HARD PASS), and immediately runs a "pre-mortem" against it — listing the ways the deal could go wrong if we proceed.

The verdict is conclusive. Three options, no "interesting but" — the IC should never have to ask "so what do you think?" If HARD PASS, the engine computes the three AND-conditions that would have to all be true for us to revisit. Single triggers are gameable; three AND-conditions aren't.

The pre-mortem is an adversarial exercise. Even on a PASS, the engine asks: "list the five ways this deal still goes wrong in the next 18 months." That list goes in the memo and on day one of ownership it becomes the operating team's risk register.

Phase 7 is the consumer side of every cross-phase handshake the prior phases built — Phase 1 deferred criteria 11 and 12 here; Phase 4 deferred Layer 5 and Layer 9 here; Phase 5 emitted the auto-flag signal Phase 7 reads to finalise Phase 4 Layer 9; Phase 6 emitted the flip-viability signal Phase 7 reads to finalise Phase 4 Layer 5. The verdict is the convergence point.

What runs in this phase

Check Input contract pre-check — every prior phase's load-bearing output verified before verdict-assembly fires

Before the rules-canonical verdict-assembly fires, the engine verifies it has every input it needs: phases.1.deferredCriteria array, phases.2.repeat_share, phases.3.comp_retention_percentage (or no_qualifying_comp), phases.4.layerVerdicts + phases.4.deferredLayers, phases.5.aggregate_reputation_verdict + phases.5.legalIpAutoFlag, phases.6.flipViabilitySignal + phases.6.expected_return_weighted. Any missing input surfaces as missingReasons on the gate-log and downgrades the per-phase gate state to CONDITIONAL_PASS rather than producing a verdict against incomplete inputs.

This closes the audit's H5 finding (no gate-input contract). The engine never silently produces a verdict against partial data.
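
One way to picture the pre-check is a walk over the required key paths that collects a reason for each one that is absent. The sketch assumes prior outputs arrive as a nested object rooted at phases; that layout, the helper names, and the reason-string wording are hypothetical. The key paths themselves are the ones listed above, with the Phase 3 alternatives grouped.

```typescript
// Each entry is a group of alternatives; the group is satisfied if any path is present.
const REQUIRED_INPUTS: string[][] = [
  ["phases.1.deferredCriteria"],
  ["phases.2.repeat_share"],
  ["phases.3.comp_retention_percentage", "phases.3.no_qualifying_comp"],
  ["phases.4.layerVerdicts"],
  ["phases.4.deferredLayers"],
  ["phases.5.aggregate_reputation_verdict"],
  ["phases.5.legalIpAutoFlag"],
  ["phases.6.flipViabilitySignal"],
  ["phases.6.expected_return_weighted"],
];

// Follows a dotted path through nested objects; returns undefined when any hop is missing.
function readPath(root: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((node, key) => {
    if (typeof node !== "object" || node === null) return undefined;
    return (node as Record<string, unknown>)[key];
  }, root);
}

// One reason string per unsatisfied group; an empty array means the contract holds.
function collectMissingReasons(priorOutputs: unknown): string[] {
  return REQUIRED_INPUTS
    .filter((alternatives) => alternatives.every((p) => readPath(priorOutputs, p) === undefined))
    .map((alternatives) => `missing input: ${alternatives.join(" | ")}`);
}
```

A non-empty result is what would drive the downgrade to CONDITIONAL_PASS; note that a present-but-falsy value such as 0 or an empty array still counts as present here.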

Backfill Deferred-criterion backfill — criteria 11 and 12 (deferred by Phase 1) get their scores here, before the verdict aggregates

Why the backfill is Phase 7's responsibility

Phase 1's thesis gate scored ten criteria at intake and deferred two — criterion 11 (retention vs. public comp) needed Phase 3 to produce the comp set; criterion 12 (reputation) needed Phase 5 to run the multi-source reputation sweep. Both inputs now exist. Phase 7 is the first phase that has access to all twelve criteria simultaneously, so it is the phase that completes the gate.

Criterion 11 backfill

The engine reads phases.3.comp_retention_percentage (the closest-comparable retention figure Phase 3 produced under the methodology-aligned comparison rule) and the target's own trailing-12-month repeat-customer share from phases.2.repeat_share. Threshold check: target within 20 percentage points of the comparable. Returns PASS / NEAR / FAIL per the threshold, or DEFERRED if Phase 3 returned NO_QUALIFYING_COMP.

Criterion 12 backfill

The engine reads phases.5.aggregate_reputation_verdict (the volume-weighted aggregate Phase 5 produced across every reputation surface it probed) and maps it onto the gate vote: CLEAN → PASS, SUSPICIOUS → NEAR, FRAUDULENT → FAIL, INSUFFICIENT_VOLUME → DEFERRED. The per-surface divergence record from Phase 5 is preserved and surfaced in the memo when divergence is meaningful.

Final twelve-criterion aggregation

The full twelve criteria — Phase 1's ten plus the two backfills — are aggregated under the same four-state rule Phase 1 uses (PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED). A FAIL on backfilled criterion 11 or 12 can move a CONDITIONAL_PASS to HARD_PASS or restructure an existing verdict. The verdict-assembly rules treat backfilled fails the same as gate-time fails — there is no discount for "found late."

What gets persisted

The backfill writes {date}_{target}_thesis-tracker.docx with the full twelve-criterion scorecard (gate-time + backfilled), and updates phase1-gate-log.json with a backfill sub-record that ties each backfilled score to its source phase output. The original Phase 1 gate verdict is preserved unmodified; the backfill is additive.

Handshake Phase 4 Layer 5 + Layer 9 finalisation — DEFERRED severities from Phase 4's first pass get their terminal severities here

Why the finalisation lives in Phase 7

Phase 4 produces first-pass severity for the ten layers with the inputs it has. Layer 5 (flip-viability) needs Phase 6's bear-case scenario arithmetic; Layer 9 (legal-IP-exposure, when no Phase 4 IP-trigger fired) needs Phase 5's reputation auto-flag. Both layers carry severity: 'DEFERRED' + awaiting: 'phase6_scenarios' | 'phase5_reputation' out of Phase 4. Phase 7 is the first phase that has access to both Phase 5 and Phase 6 outputs, so it is the phase that finalises.

Layer 5 finalisation

The engine reads phases.6.flipViabilitySignal.bearCaseSeverity and maps directly: clean → Minor, caution → Caution, material → Material. The finalised severity replaces the DEFERRED in the ten-layer record. If Material, the verdict-arithmetic checks Phase 6's bear-case IRR — Material Layer 5 with bear-IRR below floor is a HARD PASS.

Layer 9 finalisation

The engine reads phases.5.legalIpAutoFlag. Mapping: fired=false → Minor; fired=true with one surface → Caution; fired=true with two or more surfaces → Material. Material Layer 9 (legal-IP exposure substantiated by Phase 5 surfaces) is a HARD PASS reason code regardless of any other layer.
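
Both finalisation mappings are small enough to state as code. The function names and input shapes below are hypothetical; the mappings themselves are the ones given above.

```typescript
type LayerSeverity = "Minor" | "Caution" | "Material";
type BearSeverity = "clean" | "caution" | "material";

// Layer 5: direct mapping from Phase 6's bearCaseSeverity.
function finaliseLayer5(bearCaseSeverity: BearSeverity): LayerSeverity {
  const mapping: Record<BearSeverity, LayerSeverity> = {
    clean: "Minor",
    caution: "Caution",
    material: "Material",
  };
  return mapping[bearCaseSeverity];
}

// Layer 9: depends on whether the auto-flag fired and on how many surfaces.
function finaliseLayer9(flag: { fired: boolean; surfaces: string[] }): LayerSeverity {
  if (!flag.fired) return "Minor";
  return flag.surfaces.length >= 2 ? "Material" : "Caution";
}
```

For example, a flag fired by a single surface finalises Layer 9 at Caution, while the same flag fired by two surfaces finalises it at Material and therefore triggers a HARD PASS.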

Template-version-skew check

Phase 7 records the ten-layer-template version Phase 4 ran against (phases.4.templateVersion) and compares against the methodology snapshot bound at run-start. Mismatch surfaces as a sense-check field on the gate-log. Override path TEMPLATE_VERSION_SKEW_ACCEPTED lets the analyst attest mid-run version-bump was analyst-driven.

API Rules-canonical verdict assembly — deterministic arithmetic produces the verdict; LLM is prose-only

Why rules, not LLM judgment

The terminal verdict is the most-consequential decision in the engine. Letting an LLM produce the verdict creates an irresolvable question: when the LLM says PASS but the rule-arithmetic on prior-phase outputs says HARD PASS, which wins? Phase 7's rewrite removes the question. Deterministic arithmetic on the prior-phase outputs produces the verdict. The LLM is called only to write the rationale prose for the memo — never to decide the verdict.

HARD PASS triggers (priority-ordered)

Backfilled criterion 11 (retention) is FAIL
Backfilled criterion 12 (reputation) is FAIL
Phase 5 aggregate reputation is FRAUDULENT
Phase 4 Layer 9 (legal-IP exposure) finalised as Material
Phase 4 Layer 6 (supplier concentration) is Material
Phase 4 has two or more Material layers (post-finalisation)
Phase 4 Layer 5 finalised as Material AND Phase 6 bear-IRR below 0%
Phase 6 bear-case IRR below 0% floor

CONDITIONAL triggers

Exactly one Material layer post-finalisation
Any DEFERRED criterion (11 or 12) post-backfill
Phase 6 expected-return-weighted below 12% threshold
Phase 5 aggregate reputation is SUSPICIOUS

Otherwise PASS. Full rules: Due-Diligence-Engine/verdict-aggregation-rules_2026-05-21.md (methodology amendment file).
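
The trigger lists above translate naturally into two ordered rule tables evaluated first-match-wins. The sketch below is illustrative only: the input shape, the function name, and the reason strings are assumptions, and the authoritative rules live in the amendment file cited above. Rates are fractions (0.12 means 12%).

```typescript
type TerminalVerdict = "PASS" | "CONDITIONAL" | "HARD_PASS";
type GateVote = "PASS" | "NEAR" | "FAIL" | "DEFERRED";
type Rule = [boolean, string]; // [fired, reason]

// Hypothetical flattened view of the inputs the triggers need.
interface VerdictInputs {
  criterion11: GateVote;
  criterion12: GateVote;
  reputationAggregate: "CLEAN" | "SUSPICIOUS" | "FRAUDULENT" | "INSUFFICIENT_VOLUME";
  materialLayers: number[]; // finalised Phase 4 layers at Material, e.g. [6, 9]
  bearIrr: number;
  expectedReturnWeighted: number;
}

function assembleVerdictSketch(i: VerdictInputs): { verdict: TerminalVerdict; reason: string } {
  const isMaterial = (layer: number): boolean => i.materialLayers.indexOf(layer) !== -1;
  const hardPass: Rule[] = [
    [i.criterion11 === "FAIL", "backfilled criterion 11 (retention) is FAIL"],
    [i.criterion12 === "FAIL", "backfilled criterion 12 (reputation) is FAIL"],
    [i.reputationAggregate === "FRAUDULENT", "Phase 5 aggregate reputation is FRAUDULENT"],
    [isMaterial(9), "Phase 4 Layer 9 finalised as Material"],
    [isMaterial(6), "Phase 4 Layer 6 is Material"],
    [i.materialLayers.length >= 2, "two or more Material layers"],
    [isMaterial(5) && i.bearIrr < 0, "Layer 5 Material and bear IRR below 0%"],
    [i.bearIrr < 0, "bear-case IRR below 0% floor"],
  ];
  const conditional: Rule[] = [
    [i.materialLayers.length === 1, "exactly one Material layer"],
    [i.criterion11 === "DEFERRED" || i.criterion12 === "DEFERRED", "DEFERRED criterion post-backfill"],
    [i.expectedReturnWeighted < 0.12, "expected-return-weighted below 12%"],
    [i.reputationAggregate === "SUSPICIOUS", "Phase 5 aggregate reputation is SUSPICIOUS"],
  ];
  // First rule that fired, in priority order, or undefined when none fired.
  const firstFired = (rules: Rule[]): string | undefined => {
    const hit = rules.filter(([fired]) => fired)[0];
    return hit ? hit[1] : undefined;
  };
  const hard = firstFired(hardPass);
  if (hard !== undefined) return { verdict: "HARD_PASS", reason: hard };
  const cond = firstFired(conditional);
  if (cond !== undefined) return { verdict: "CONDITIONAL", reason: cond };
  return { verdict: "PASS", reason: "no trigger fired" };
}
```

Because every rule is a boolean over prior-phase outputs, a reviewer can recompute the verdict by hand from the gate log, which is the property the rules-canonical design is after.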

Reference re-engagement-defaults.md — the AND-conditions schema

Defines the default AND-conditions per HARD PASS reason code. If a deal is HARD PASS because of supplier concentration, the default triggers are: "supplier table diversified to ≤30% top supplier AND 12 months of audited financials available AND price reduced by ≥25%." All three have to be true for the deal to come back to us. Three minimum per reason code — single triggers are gameable. Full per-code defaults: Due-Diligence-Engine/re-engagement-defaults_2026-05-21.md. Measurement-source paths pinned per condition so Phase 8 HubSpot recall serialises the exact downstream field to re-check.
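
The AND-conditions idea can be sketched as a list of predicates that must all hold. The type names, the snapshot fields, and the canReEngage helper are hypothetical; the three supplier-concentration triggers and the three-condition minimum come from the text above.

```typescript
// Hypothetical snapshot of the re-checked fields; rates are fractions (0.30 means 30%).
interface TargetSnapshot {
  topSupplierShare: number;
  auditedFinancialMonths: number;
  priceReduction: number;
}

interface ReEngagementCondition {
  label: string;
  isMet: (snapshot: TargetSnapshot) => boolean;
}

// Default triggers quoted above for a supplier-concentration HARD PASS.
const SUPPLIER_CONCENTRATION_DEFAULTS: ReEngagementCondition[] = [
  { label: "top supplier at or below 30%", isMet: (s) => s.topSupplierShare <= 0.3 },
  { label: "12 months of audited financials", isMet: (s) => s.auditedFinancialMonths >= 12 },
  { label: "price reduced by at least 25%", isMet: (s) => s.priceReduction >= 0.25 },
];

// All conditions must hold; fewer than three conditions is rejected as gameable.
function canReEngage(conditions: ReEngagementCondition[], snapshot: TargetSnapshot): boolean {
  if (conditions.length < 3) {
    throw new Error("at least three AND-conditions are required per reason code");
  }
  return conditions.every((c) => c.isMet(snapshot));
}
```

A seller who diversifies suppliers and cuts the price but still lacks audited financials does not clear the bar, since two conditions out of three is not enough.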

Gate Four-state gate aggregation — PASS / CONDITIONAL_PASS / HARD_PASS / OVERRIDE_PROCEED

Phase 7's per-phase gate state

Phase 7's own per-phase gate state mirrors the four-state model every other phase uses: PASS, CONDITIONAL_PASS, HARD_PASS, OVERRIDE_PROCEED. HARD_PASS fires when the terminal verdict is HARD_PASS. CONDITIONAL_PASS fires when input contract has missing reasons OR terminal verdict is CONDITIONAL.

The terminal verdict is three states, NOT four

The IC reads terminalVerdict: PASS | CONDITIONAL | HARD_PASS. OVERRIDE_PROCEED is per-phase gate state only — it does not propagate as a fourth terminal verdict. The IC sees the override on the deal page next to the verdict it modulated, but the verdict stays in the three-state enumeration.

Override codes

BACKFILL_DIVERGENCE_ACCEPTED — Phase 1 gate PASS but backfilled criterion 11/12 FAIL; analyst accepts gate
MULTIPLE_MATERIAL_LAYERS_OVERRIDE_ACCEPTED — Phase 4 ≥2 Material; analyst attests deal correctly priced
BEAR_CASE_PHASE6_OVERRIDE_PROPAGATED — Phase 6 BEAR_CASE_BELOW_FLOOR_ACCEPTED; Phase 7 propagates
REPUTATION_PHASE5_OVERRIDE_PROPAGATED — Phase 5 SELLER_GAMING_FLAG_ACCEPTED etc; propagated forward
PRE_MORTEM_DOWNGRADE_ACCEPTED — arithmetic says PASS; pre-mortem surfaces deal-killer; analyst downgrades
CONDITIONAL_OFFER_ANALYST_ATTESTED — conditional-offer terms outside verdict-assembly arithmetic
TEMPLATE_VERSION_SKEW_ACCEPTED — Phase 4 ran against older template version than Phase 7 snapshot

Override-against-which-phase

Per-phase overrides remain logged against the contributing phase. A Phase 6 override stays at Phase 6 with its phase-specific reason code; Phase 7's collectUpstreamOverrides reads each contributing phase's override state and surfaces the propagation chain via the gate-log audit trail. Phase 7's own override codes (above) cover only cross-phase aggregation decisions.

API BYO pre-mortem generation — 5 risks across operating / market / financial / governance dimensions

The adversarial pre-mortem skill takes the rule-derived verdict + the structured charges and generates five risks total — typed by category across operating, market, financial, and governance dimensions. 18-month measurement horizon per methodology pin. Cross-vendor cross-check on the risk ranking lands with the byo-pre-mortem SKILL.md authoring cycle (deferred per audit-quad scope-down). Pre-mortem findings inform but do not normally modulate the verdict; the explicit override path PRE_MORTEM_DOWNGRADE_ACCEPTED handles the case where the analyst escalates a pre-mortem finding to deal-killer.

Output phase7-gate-log.json + verdict assembly + Phase 7 → Phase 8 / Phase 10 stable handshake

Gate log

10_dd-output/{date}_{target-slug}_phase7-gate-log.json — persisted via the shared writeGateLogFile helper. Contains: methodology version + ten-layer-template version, input contract state, full verdict assembly (terminal verdict, reason code, structured charges, criterion backfills, finalised layer verdicts, re-engagement triggers, pre-mortem risks, conditional offer if any), upstream-override propagation chain, gate state, override (if any).

Phase 8 / Phase 10 read paths

Phase 8 (HubSpot recall) reads phases.7.terminalVerdict, phases.7.reasonCode, phases.7.charges, phases.7.reEngagementTriggers. Phase 10 (memo render) reads all of the above plus phases.7.criterionBackfills (memo Section 3), phases.7.finalisedLayerVerdicts (Section 7), phases.7.preMortemRisks (Section 11), phases.7.assembly.conditionalOffer (Sections 2 + 4). Pinned in engine-sync SKILL.md to prevent silent rename drift.

Idempotency contract

Phase 7 reads the priorOutputs snapshot as-passed at invocation. If an upstream analyst override lands AFTER Phase 7 has completed, Phase 7 does NOT auto-rerun — an explicit re-execution flag (recorded against the run, e.g. runs.requires_phase7_rerun = true) is required. The deal-page surfaces the pending-rerun state so the IC sees the verdict is stale.

7.5 Phase 7 verdict sense-check

Before the HubSpot recall write fires, four independent reviewers audit Phase 7's verdict-assembly output. This is the engine's highest-stakes adversarial checkpoint: a wrong verdict flows to HubSpot, to the buyer, and into whether the deal closes. A BLOCKING finding halts the run before Phase 8 can persist a wrong verdict.

Phase 7 is the convergence point. The verdict label is rules-canonical (deterministic arithmetic, not LLM-decided); the rationale prose is LLM-emitted under a ledger-preservation discipline. Phase 12 reviews the rendered memo but sits downstream of the assembly, so the wrong-direction-verdict class of error has already propagated by then. Phase 7.5 IS the cross-vendor cross-check that Phase 7 audit M7 called for. Locked by decision page D4 (Add — highest priority of the four picks). Closes open items #40 (verdict-assembly cross-vendor cross-check) and #41 (cross-deal recall); ties to #38 (verdict-assembly methodology amendment). The charge sheet lives at app/src/orchestrator/charge-sheets/phase7_5_sense_check.ts.

The eight charges cover: terminal-verdict arithmetic reconstructability — a reviewer's recomputation disagreeing with the engine is the highest-signal finding the run can produce (CH-01), reason-code principledness against the methodology-pinned enum (CH-02), charges-array completeness and grounding (CH-03), rationale-prose ledger preservation (CH-04 — methodology amendment #38), Phase 6 LLM-judgment portions folded in per decision page D3 — bear-case narrative + flip-viability framing (CH-05), BYO pre-mortem shape — exactly 5 risks across ≥3 of 4 categories (CH-06), cross-deal recall agreement vs prior runs of the same/similar target (CH-07 — closes #41), Phase 8 / Phase 10 handshake schema preservation (CH-08).

Why Phase 6 doesn't get its own .5 quad: Per decision page D3 the deterministic verifyPhase6Math recomputes Phase 6's IRR / MOIC / sensitivity-grid / bear-case-severity threshold inside Phase 6 itself — sub-second, $0 inference cost, strict superset of what LLM judgment would catch on arithmetic. The LLM-judgment portions of Phase 6 (bear-case narrative + flip-viability framing) fold into Phase 7.5's charge sheet (CH-05) instead, so they get adversarial review at the verdict layer rather than paying for a separate 6.5 dispatch.

8 Write the verdict back to HubSpot

Updates the deal record in HubSpot so the next time anyone touches this deal, the recall in Phase 0 picks up the verdict, reason code, and re-engagement triggers.

What runs in this phase

API HubSpot — PATCH the deal record

One write. Updates the deal record with: methodology version, verdict, reason code, AND-conditions, link to the memo in Drive, link to the run on dd.ecomma.co. The next time Phase 0 reads this deal, the recall is complete.

PATCH https://api.hubapi.com/crm/v3/objects/deals/{hubspotDealId}
Authorization: Bearer {hubspot_token}
Content-Type: application/json

{
  "properties": {
    "ecomma_verdict": "HARD_PASS",
    "ecomma_reason_code": "supplier_concentration",
    "ecomma_re_engagement_triggers": "[...]",
    "ecomma_memo_url": "drive.google.com/...",
    "ecomma_methodology_version": "v3.0.0"
  }
}
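
For illustration, the same request can be assembled with the Python standard library. This is a sketch, not the engine's client: build_verdict_patch is a hypothetical helper, the token and property values are supplied by the caller, and only the URL pattern, headers, and property names shown above are taken from the text.

```python
import json
import urllib.request


def build_verdict_patch(deal_id: str, token: str, properties: dict) -> urllib.request.Request:
    """Assemble (but do not send) the PATCH that writes the verdict to the deal record."""
    url = f"https://api.hubapi.com/crm/v3/objects/deals/{deal_id}"
    body = json.dumps({"properties": properties}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_verdict_patch(
    "12345",            # placeholder deal id
    "example-token",    # placeholder token
    {"ecomma_verdict": "HARD_PASS", "ecomma_reason_code": "supplier_concentration"},
)
```

Sending would be a separate step, for example urllib.request.urlopen(request), so the payload can be inspected or logged before the single write fires.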

9 Ask the seller to respond

For non-HARD-PASS verdicts, the engine sends the seller a summary and asks for their reply. Their response (or non-response) gets folded into the memo.

This phase is optional and only fires on full DDs that didn't HARD PASS in earlier phases. The point: give the seller a fair shot at addressing concerns before the verdict goes final. If they don't respond within 14 days, that's recorded and the run proceeds without their input.

What runs in this phase

Tool Email + portal intake — sends the seller a structured response form

The engine emails the seller a link to a structured response form. Each open issue from the prior phases becomes a question. The seller's answers flow back into the engine and become a section in the memo.

10 Write the memo

Assembles the final Word document — 14 sections, every numerical claim tagged for source quality, every claim traceable to evidence.

The memo is the deliverable. It's a real .docx file using the Ecomma house-style template, 30–50 pages, structured the same way every time. Each section pulls from the prior phases' outputs. The thesis-tracker scorecard goes in as an appendix.

What runs in this phase

Skill thesis-tracker — scorecard appendix for post-acquisition tracking

Builds a per-deal scorecard: the falsifiable thesis statement, the supporting pillars, the risks, the catalysts, the pre-committed stop-loss triggers. If we acquire the deal, this scorecard is what the operating team tracks monthly to know whether the thesis is still intact or whether it broke.

Reference memo-template.docx — Ecomma house-style memo template

The fourteen memo sections

Executive summary
Verdict
Thesis fit (all 12 criteria scored)
Financial DD
Cash flow + working capital
Comparables
Ten-layer findings
Reputation
Three-scenario model
LBO returns
BYO pre-mortem
What we did NOT verify (the explicit disclosure)
Quad Review disposition
Appendix — thesis-tracker scorecard

Output {target}_dd-memo_v3.docx — the headline deliverable

The DD memo. Every numerical claim in the document is tagged [verified], [derived], [inferred], [benchmark], or [judgment] so the reader can see at a glance what's forensic and what's estimated. Inferred and judgment numbers render visually distinct in the docx.

11 Check our own work

Three integrity checks run before anything ships. If any fail, the verdict doesn't go out.

What runs in this phase

Script consistency-check.py — numbers in the memo must match the numbers in the model

What it is

A Python script. Walks every cross-reference in the run's artefacts: revenue stated in the memo's executive summary has to match revenue in the financial DD section, which has to match the model's revenue line, which has to match the data pack's revenue cell. Any drift is a hard error.

Why a script, not an AI

This is exact-match arithmetic. Cheaper, faster, and more reliable to do mechanically than to ask an AI.
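
A minimal sketch of the exact-match idea, assuming the values have already been extracted from each artefact. The function name, the artefact keys, and the choice of the data pack as the reference are assumptions for illustration; consistency-check.py itself may be organised differently.

```python
from decimal import Decimal


def find_drift(metric: str, values_by_artefact: dict[str, Decimal]) -> list[str]:
    """Return one error per artefact whose value differs from the data pack.

    Decimal is used so the comparison is exact rather than float-approximate.
    """
    reference = values_by_artefact["data_pack"]
    return [
        f"{metric}: {name} has {value}, data pack has {reference}"
        for name, value in values_by_artefact.items()
        if name != "data_pack" and value != reference
    ]


errors = find_drift("revenue", {
    "data_pack": Decimal("1200000"),
    "model": Decimal("1200000"),
    "memo_financial_dd": Decimal("1200000"),
    "memo_exec_summary": Decimal("1250000"),  # drifted value, made up for the example
})
```

Any non-empty result would be treated as a hard error under the rule above.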

Skill audit-xls — final QA on every Excel artefact this run produced

The same skill from Phase 2, but run against every Excel file produced this run — the 3-statement model, the LBO model, the data pack, the cash-flow sidecar. Catches anything the scenario tweaks or model-update steps introduced.

Reference Source-tier tag verifier — every numerical claim must be tagged

The five tags

[verified] — from a sourced document with citation
[derived] — arithmetic from verified inputs (e.g. GM = GP / Revenue)
[inferred] — assumption made because data is missing, with explicit caveat
[benchmark] — category baseline from a cited industry source
[judgment] — analyst opinion / scenario probability

The verifier scans the memo prose. Every numerical claim must carry one of the five tags. Untagged claims block render. This is the anti-hallucination guard — added after Carrora, where unstated assumptions slipped into the memo as if they were facts.
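
One way to picture the scan, as a sketch only. The five tag names are from the list above; treating "any sentence containing a digit" as a numerical claim and splitting sentences on terminal punctuation are simplifying assumptions, and the real verifier may define a claim more carefully.

```python
import re

TAG = re.compile(r"\[(verified|derived|inferred|benchmark|judgment)\]")
NUMBER = re.compile(r"\d")
# Split after ., ! or ? when followed by whitespace, so decimals like 32.5 stay whole.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def untagged_claims(prose: str) -> list[str]:
    """Return sentences that contain a number but none of the five tags."""
    sentences = SENTENCE_END.split(prose.strip())
    return [s for s in sentences if NUMBER.search(s) and not TAG.search(s)]


sample = "Gross margin was 32% [derived]. Revenue grew 18% last year. The brand sells apparel."
blocked = untagged_claims(sample)
```

In this sample the second sentence is the only one that would block render: it carries a number and no tag.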

12 Four reviewers argue about the verdict

The finished memo gets sent to four different AI models in parallel. Each one critiques the verdict on eight standard charges. If they disagree with us, we look again.

This is the Quad Review. The reason it matters: a single AI can be confidently wrong, and a single analyst can miss something. Sending the memo to four different model providers — Claude, GPT, Gemini, DeepSeek — and asking each to attack it on the same eight charges catches what one reviewer alone would miss. If all four uphold us, the verdict ships. If they disagree among themselves, the analyst adjudicates. If they all flag the same thing, the verdict gets restructured or overruled. Caution wins ties: if even one reviewer says OVERRULED and any other says RESTRUCTURED or worse, the memo cannot ship as drafted.
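
The aggregation rules in the paragraph above can be sketched as follows. This is an interpretation, not the engine's aggregator: it assumes each reviewer's per-charge verdicts have already been rolled up into one position (UPHELD, RESTRUCTURED, or OVERRULED), and the return labels SHIP, CANNOT_SHIP_AS_DRAFTED, and ANALYST_ADJUDICATES are hypothetical names.

```python
SEVERITY = {"UPHELD": 0, "RESTRUCTURED": 1, "OVERRULED": 2}


def aggregate(votes: list[str]) -> str:
    """Combine four reviewer positions into one outcome."""
    if len(votes) != 4 or any(v not in SEVERITY for v in votes):
        raise ValueError("expected four votes drawn from UPHELD/RESTRUCTURED/OVERRULED")
    if all(v == "UPHELD" for v in votes):
        return "SHIP"
    if len(set(votes)) == 1:
        # All four flag the same thing: the verdict is restructured or overruled.
        return votes[0]
    adverse = sum(1 for v in votes if SEVERITY[v] >= SEVERITY["RESTRUCTURED"])
    if "OVERRULED" in votes and adverse >= 2:
        # Caution wins ties: one OVERRULED plus any other adverse vote blocks shipping.
        return "CANNOT_SHIP_AS_DRAFTED"
    return "ANALYST_ADJUDICATES"
```

Under this reading a lone OVERRULED against three UPHELD votes goes to the analyst, while an OVERRULED backed by a single RESTRUCTURED is enough to stop the memo shipping as drafted.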

What runs in this phase

API Anthropic — Claude — reviewer #1

Claude reviews the memo against the eight standard charges. For each charge, Claude returns one of: SUSTAINED (the charge holds), DISMISSED (the charge doesn't apply), or MODIFIED (the charge applies in a slightly different form). Claude is also the model that drafted the memo, so its review is treated with the lightest weight — it has a coherence bias toward its own output.

API OpenAI — GPT — reviewer #2, different model family

Independent reviewer trained on a different corpus and with different alignment objectives. Same eight charges. GPT tends to catch unstated assumptions and over-claimed certainty.

API Google — Gemini Pro — reviewer #3, different model family

Same eight charges, fresh perspective. Gemini was trained differently than Claude, so it catches different things. Particularly strong on completeness — flagging weaknesses the memo failed to address.

API DeepSeek — V4 — reviewer #4, the contrarian

Fourth independent reviewer. Different training origin from the other three. In practice, DeepSeek consistently pushes back on overreach, gold-plating, and assertions presented as facts when the source material is ambiguous — useful adversarial pressure that the other three (more consensus-leaning) reviewers sometimes miss.

Reference The eight standard charges — what every reviewer is asked to evaluate

Verdict-evidence mismatch — does the conclusion follow from the evidence presented?
Unsourced numbers — any number in the memo without a source citation?
Hallucinated comparables — any comp that doesn't actually exist or isn't actually similar?
Unstated assumptions — anything treated as fact that's actually an assumption?
Missed counter-evidence — any signal in the data that should have been addressed but wasn't?
Override path bypass — has the methodology's required override path been followed for any conditional pass?
IP-verification gaps — for catalogues with third-party trademarks, was Lesson 11 applied?
Recall registry compliance — has the verdict been written back to HubSpot with proper AND-conditions?

Script propagate-lesson — if reviewers found a real gap, update the methodology

When it fires

Only when the Quad Review returns RESTRUCTURED or OVERRULED. The script walks the cross-reference graph deterministically — for the lesson identified, which reference files should change? It proposes diffs, runs a regression on the Aveugle fixture to make sure the new rules don't break the old verdict, and increments the methodology version. The library gets smarter on every restructured deal.

Output trifecta-disposition.json — the four reviewers' votes + final aggregation (file name retained for code compatibility; rename to quad-disposition.json is pending migration F21)

Structured record of each reviewer's verdict on each of the eight charges, plus the final disposition: UPHELD (memo ships as-is), RESTRUCTURED (verdict outcome holds but reasoning gets revised), or OVERRULED (the verdict itself is wrong and affected phases need to be re-run). This file is included in the deal's Drive folder and surfaced on the deal page so anyone reading the memo can see what the reviewers said.

Filename note: the file is currently named trifecta-disposition.json for historical reasons (the review was three-vendor before DeepSeek joined). The schema rename to quad-disposition.json ships with engine migration F21 in coordination with the trifecta_verdicts table rename.

Override paths — when the gate is wrong, on purpose

Every gate decision in the engine — Phase 0.5's hygiene gate, Phase 1's thesis gate, Phase 7's verdict, Phase 12's Quad Review — can be overridden by an analyst with the right authority and a documented reason. The one exception is the seven hard disqualifiers, which can only be cleared by an IC waiver recorded before the run starts. Overrides are not a back door. They are a first-class part of the methodology: the IC sometimes has strategic information the engine cannot see, prior deal context can justify proceeding on a numerical miss, and refusing to allow overrides would force the analyst to silently work around the gate. The override path makes those decisions visible, defensible, and reviewable.

The override discipline is built on three principles: every override is captured with a reason, every override propagates into the deal memo prominently, and every override is reviewable in aggregate so patterns surface.

⚖ How overrides work

The gate's vote is preserved. The override is recorded alongside it, with a typed reason, an analyst identity, and an IC-visibility flag. The memo surfaces the override prominently.

What the override path enforces

Mechanism Override scope — gate-override-with-proceed, never gate-skip

The gate still runs

An override does not skip the gate. The gate runs, the criterion is scored, the verdict is captured. The override is the analyst's decision to proceed despite the gate's verdict — not to bypass the scoring entirely. This means the engine always has the gate's verdict on record, and the memo can say "the gate said HARD PASS on criterion 8 because GM was 32%, below the 50% Apparel floor; the analyst overrode with reason X." Both states are visible.

Per-criterion override eligibility

The acquisition-thesis.md reference declares override_eligibility per criterion as a structured field with allowed and valid_reason_codes. The twelve thesis criteria have allowed: true with a constrained valid_reason_codes set — e.g., niche fit (C1) accepts IC_STRATEGIC or CATEGORY_OUTLIER_JUSTIFIED; revenue floor (C3) accepts DATA_GAP_TEMPORARY (with mandatory expiry) or IC_STRATEGIC; gross margin (C8) accepts CATEGORY_OUTLIER_JUSTIFIED or IC_STRATEGIC, never DATA_GAP_TEMPORARY alone. The seven hard disqualifiers (edible/perishable, adult, self-fulfilled, banned ad accounts, active legal disputes, uncooperative seller, asset-transfer blockers) have allowed: false with valid_reason_codes: [] — they are not analyst-overridable at all. The pathway for proceeding on a disqualifier is the separate HubSpot IC-waiver workflow: a deal record carries an explicit waiver decision made by the IC before the run starts; the engine reads that at run-start and treats the disqualifier as pre-cleared for the run, never as an override decision made during the run. The engine rejects an analyst-set override whose reason_code is not in the criterion's valid_reason_codes list (or whose criterion has allowed: false) with a typed error surfaced on the deal page, not a silent acceptance.
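
A hedged sketch of the eligibility check described above. The allowed and valid_reason_codes fields and the C1, C3, C8 examples are from the text; the in-memory table, the DQ_ADULT identifier, and the OverrideRejected exception are assumptions standing in for whatever acquisition-thesis.md and the engine actually use.

```python
class OverrideRejected(Exception):
    """Typed error to surface on the deal page (hypothetical name)."""


# Illustrative subset; the real declarations live in acquisition-thesis.md.
ELIGIBILITY = {
    "C1": {"allowed": True, "valid_reason_codes": ["IC_STRATEGIC", "CATEGORY_OUTLIER_JUSTIFIED"]},
    "C3": {"allowed": True, "valid_reason_codes": ["DATA_GAP_TEMPORARY", "IC_STRATEGIC"]},
    "C8": {"allowed": True, "valid_reason_codes": ["CATEGORY_OUTLIER_JUSTIFIED", "IC_STRATEGIC"]},
    "DQ_ADULT": {"allowed": False, "valid_reason_codes": []},  # hypothetical disqualifier id
}


def check_override(criterion: str, reason_code: str) -> None:
    """Raise OverrideRejected unless the reason code is valid for the criterion."""
    rule = ELIGIBILITY[criterion]
    if not rule["allowed"]:
        raise OverrideRejected(f"{criterion} is not analyst-overridable")
    if reason_code not in rule["valid_reason_codes"]:
        raise OverrideRejected(f"{reason_code} is not a valid reason for {criterion}")
```

Raising a typed error, rather than returning False, makes a silent acceptance impossible: the caller has to handle the rejection explicitly.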

Reference Override reason taxonomy — typed reason codes, not free-form text alone

Why typed reasons

Free-form override text is unreviewable in aggregate. Typed reasons let the engine answer questions like "which criteria get overridden most often, by which analysts, with which justifications, and how do those overrides perform downstream?"

The reason codes

IC_STRATEGIC — Investment Committee pre-flagged this deal for strategic reasons that override the criterion. Free-form supplement required; IC member identified.
DATA_GAP_TEMPORARY — The criterion failed because of a known data gap that's being closed in a separate workstream. Override is time-bounded; expires at a stated date.
CATEGORY_OUTLIER_JUSTIFIED — The target is an outlier within its category in a way that makes the methodology threshold misapplied. Comparable target citation required.
FOUNDER_RELATIONSHIP — Pre-existing founder relationship justifies engagement despite the gate signal. Relationship documented in HubSpot.
LESSON_PRE_EARNED — The criterion was added after this deal was already in flight; retrospective scoring should not block continuing.
OTHER — Catch-all for cases not covered above. Free-form text required; reviewed by IC at the next monthly cadence to decide if a new reason code is needed.

Required fields for every override

reason_code — one of the typed reasons above.
reason_text — free-form supplement, mandatory regardless of reason code.
analyst — Google OAuth subject of the analyst who set the override.
ic_visibility — boolean. Default true. False only if the IC has separately authorised the override class for this deal stage.
expires_at — UTC date. Required for DATA_GAP_TEMPORARY; optional otherwise.
set_at — UTC timestamp, captured by the engine.
criterion_or_gate — which gate / criterion the override applies to.
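
The field list above maps naturally onto a small record type. This is a sketch under assumptions: the Override class and its problems method are illustrative, and only the field names, the six reason codes, and the expiry rule for DATA_GAP_TEMPORARY come from the text.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional

REASON_CODES = {
    "IC_STRATEGIC", "DATA_GAP_TEMPORARY", "CATEGORY_OUTLIER_JUSTIFIED",
    "FOUNDER_RELATIONSHIP", "LESSON_PRE_EARNED", "OTHER",
}


@dataclass
class Override:
    reason_code: str
    reason_text: str
    analyst: str                      # Google OAuth subject
    criterion_or_gate: str
    ic_visibility: bool = True
    expires_at: Optional[date] = None
    # Captured at creation time, in UTC, rather than supplied by the analyst.
    set_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def problems(self) -> list[str]:
        """Return validation failures; an empty list means the record is acceptable."""
        found = []
        if self.reason_code not in REASON_CODES:
            found.append(f"unknown reason_code: {self.reason_code}")
        if not self.reason_text.strip():
            found.append("reason_text is mandatory")
        if self.reason_code == "DATA_GAP_TEMPORARY" and self.expires_at is None:
            found.append("expires_at is required for DATA_GAP_TEMPORARY")
        return found
```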

Output Override surfacing — in the memo, on the deal page, in the aggregate view — overrides are visible to everyone, not buried

In the memo

Every override surfaces in the executive summary of the deal memo, not in an appendix. The pattern: "Phase N's gate verdict was X on criterion Y; the analyst overrode with reason Z; the override has not been retired as of memo render time." The reader sees both the gate signal and the override decision before reading the analytical body.

On the deal page

The deal page renders an override banner alongside the run progress. Clicking expands to the full override record. Analysts can retire an override (mark it resolved with explanation) which the audit log records; retiring an override does not remove it from history.

In the aggregate view

The repository page hosts an "Overrides" view that lists every override across every run: which criterion, which reason code, which analyst, which deal, what the downstream verdict was, whether the override held up. The IC reviews this view at the monthly cadence. Patterns that emerge — same criterion overridden 80% of the time, same analyst overrides 5× more than others, override-reason FOUNDER_RELATIONSHIP correlates with deal failure — drive either a methodology amendment or an analyst conversation. Override aggregation is how the methodology learns from its own escape hatches.
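
The "same criterion overridden 80% of the time" pattern is a simple rate over runs. A minimal sketch, assuming each run is reduced to the list of criterion ids overridden in it (an illustrative input shape, not the repository's schema):

```python
from collections import Counter


def override_rate_by_criterion(runs: list[list[str]]) -> dict[str, float]:
    """Share of runs in which each criterion was overridden at least once."""
    counts = Counter()
    for overridden in runs:
        counts.update(set(overridden))  # count a criterion once per run
    return {criterion: n / len(runs) for criterion, n in counts.items()}


rates = override_rate_by_criterion([["C8"], ["C8", "C3"], [], ["C8"], ["C8"]])
```

Here C8 is overridden in four of five runs, the kind of pattern the monthly IC review would pick up.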

Discipline What overrides do not do — guardrails on the escape hatch

Overrides do not carry across runs. Re-running the same deal six months later requires re-justifying any override. Old overrides do not silently extend.
Overrides do not change the gate's record. The gate verdict is preserved verbatim. The override sits alongside it.
Overrides do not auto-propagate to similar deals. An override is per-deal. The IC monthly review is the path for promoting an override pattern into a methodology amendment.
Overrides do not bypass the hard disqualifiers. The seven disqualifiers require an explicit IC waiver in HubSpot before the run starts; analyst-level override cannot clear them.
Overrides do not delete the failure-template content. The memo's "criterion X failed because" sentence still gets written. The override adds context; it does not redact.

Research depth — the seven surfaces, the engine sweep, the claims ledger

Most due-diligence shops do "research" by Googling the target's name, reading the website, and pulling a handful of comps. We don't. The engine reaches into seven distinct research surfaces, dispatches the three biggest deep-research products on earth in parallel as sub-tools, validates every claim against four independent dimensions of authority, and runs two separate adversarial passes against the result. This is the part that catches what nobody else catches.

7 source surfaces · 13 free public APIs · 3 external deep-research engines · 2 adversarial passes

🔍 The seven research surfaces

Where the engine looks. Each surface has its own specialist; multiple specialists run in parallel for any given research question.

The seven surfaces

Web Web — search, scrape, browse, plus 13 free public APIs

Tools

Google + Bing search, Firecrawl (renders JavaScript-heavy pages), Playwright (full browser automation when scraping needs to act like a human), agent-browser (the engine can navigate any site interactively), Wayback Machine (historical snapshots).

13 free public APIs the engine consults by default

GDELT — global news event database, real-time sentiment per region
Wayback Machine — historical snapshots of any URL. Verifies operating age, catches deleted claims, audits website-style changes that often precede a sale.
Reddit — full search across all subreddits with reviewer-history visibility
HackerNews — surfaces founder coverage, product launches, technical-community sentiment
Stack Overflow — useful for software-adjacent ecommerce targets and supplier API issues
OpenAlex — academic paper index, used for category research and supplier provenance
arXiv + bioRxiv — preprints, mostly for supplements / biotech-adjacent verticals
SEC EDGAR — public-company filings. Used when the target has a public-company customer or supplier or any predecessor public entity.
USPTO — US trademark verification (already used in Phase 4 Layer 9)
EPO — European Patent Office (already used in Phase 4 Layer 9)
Court Listener — US federal + state litigation database. Founder background, predecessor-entity disputes, IP suits.
Crunchbase-free — basic funding history and key-people data

Why this matters for DD: most "founder background" surface area lives across these APIs. Court Listener catches lawsuits the seller didn't disclose. Wayback catches website history mismatches. GDELT catches regional press mentions in languages we don't speak. None of these cost anything. None of them require manual checking.

Internal Internal knowledge — Notion, Slack, Atlassian, Gmail, Drive, Box

What it is

Ecomma's own institutional memory. If a partner has discussed this seller in Slack three months ago, if a previous DD memo on a competitor lives in Drive, if a prior outreach attempt is sitting in Gmail, the engine should find it before drawing conclusions.

What it adds to a DD

Catches when a seller has been pitched before under a different name (the engine searches all six internal sources for the founder's name + the entity name + the domain)
Surfaces prior internal opinions on the category (someone walked from a similar deal six months ago — why?)
Pulls relevant prior-deal pre-mortems so we can see if the same risk recurred

HubSpot recall already covers part of this for deal-level memory. The internal-knowledge surface goes wider: any document anyone at Ecomma has written about anything related to this deal.

Market Market intelligence — SimilarWeb, Ahrefs, Supermetrics, HubSpot, Klaviyo, Amplitude

The two we don't use yet but should

SimilarWeb gives us the target's actual web traffic — visits, sources, country mix, engagement, year-over-year trend. The seller can lie about traffic; SimilarWeb sees it independently. Plug into Phase 1 thesis-fit and Layer 6 (Channel).

Ahrefs gives us SEO health: organic keyword rankings, backlink profile, domain authority, content gaps versus competitors. Catches deals that look healthy on paid traffic but have zero organic moat — those die fast post-acquisition when ad costs rise. Plug into Layer 6 + Layer 7 (Competition).

The ones we already touch

HubSpot is in Phase 0 recall and Phase 8 writeback. Klaviyo is in Phase 2 commerce data. Supermetrics aggregates Meta/Google Ads/TikTok and would simplify Phase 2 marketing data pulls. Amplitude gives product-analytics depth if the target has it instrumented.

Warehouse Warehouse — BigQuery, Hex, Definite

For deals where the target has invested in a real data infrastructure, the engine can query the warehouse directly. Mostly applies to larger deals; smaller ecommerce brands rarely have one. When present, it gives ground-truth on cohort retention, LTV, returns, and channel attribution — the Phase 2 financial DD becomes much sharper.

Video Video — YouTube transcripts

What it does

The engine pulls transcripts of every YouTube video that mentions the target — founder podcasts, product reviews from creators, unboxing videos, complaint videos, category overviews. Then it searches the transcripts for sentiment, claims, contradictions with the seller's stated story.

What this catches

Founders who said one thing on a podcast 18 months ago and a contradictory thing in the LOI today
Reviewer accounts of product defects that never made it to written reviews
Podcast appearances where the founder discussed competitors, supply problems, or planned exits
Creator partnerships the seller didn't disclose as marketing dependencies

This is one of the highest-yield, lowest-cost surfaces in the stack. Almost nobody does it.

Codebase Codebase — Glob, Grep, Read

For technical due-diligence on rare deals where the target has custom-built software (a Shopify app, a fulfilment system, a custom storefront). Reads the codebase to assess code quality, dependency risk, security posture, and transferability. Most ecommerce-brand acquisitions don't need this; when they do, it's load-bearing.

External External deep-research engines — see dedicated section below

Gemini Deep Research, OpenAI Deep Research, and Perplexity Sonar run in parallel as sub-tools; NotebookLM is used selectively against a supplied corpus. Documented in the next card.

⚡ The external engine sweep — three deep-research products in parallel

For any research question worth doing properly, the engine dispatches Gemini Deep Research, OpenAI Deep Research, and Perplexity Sonar in parallel, then reconciles their outputs. We treat the three biggest deep-research products on earth as sub-tools.

Most DD shops use one of these as their main research tool, if any. We use all three and force them to corroborate. When all three converge, we have triangulated truth. When they diverge, we have a flag worth investigating. The cost across all three is under a dollar per research question.

The four engines

API Gemini Deep Research — Google's deep-research mode, runs ~5-10 min, produces a sourced report

Google's flagship research-agent product. Reads dozens of sources for each query, builds a structured report, cites every claim. Strongest on category-level research, market-size questions, regulatory landscapes. We invoke it with the deal's category + geography + the specific question (e.g. "What is the trend in repeat-purchase rates for premium pet-care DTC brands in the EU 2024–2026?") and get back a sourced report.

API OpenAI Deep Research — ChatGPT's deep-research mode, multi-agent search

OpenAI's research agent. Different training, different search bias, different conclusions on borderline questions. Particularly strong on financial-market context, M&A history, and corporate-actions data. Same question goes to all three engines simultaneously.

API Perplexity Sonar — Perplexity's API access, real-time web with citations

Perplexity's research API. Faster than the other two (seconds, not minutes), real-time web indexing. Good for "what's been said about this entity in the last 24 hours" sweeps and live-news layered queries. Often catches things the slower agents miss because the index is fresher.

API NotebookLM — Google's source-grounded research, used selectively

NotebookLM is different from the other three — it researches against a specific corpus you give it, not the open web. We use it when we want to ask "what does this corpus of seller-provided documents actually say?" to cross-check seller-narrated claims against the seller's own documents. Useful in Phase 2 financial DD and Phase 9 seller-response.

Skill Reconciliation — how three reports become one ledger

What it does

The three engines return three reports. Reconciliation walks claim by claim and tags each one:

Triangulated — all three engines agree. Highest confidence.
Two-engine consensus — two agree, third either silent or different. Medium confidence; the third's position recorded as the contrarian view.
Diverged — all three disagree. Flagged for analyst review. Often the most interesting signal in a DD.
Single source — only one engine surfaced this. Recorded but tagged as low-corroboration.

The result is a single claims ledger with explicit corroboration metadata, ready to feed into the memo.
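
The four corroboration tags can be sketched as a function of how many engines state the same position. This is illustrative only: it assumes exactly three engines, assumes each engine's answer has already been normalised to a comparable string, and treats two stated but conflicting positions (a case the list above does not name) as diverged.

```python
from collections import Counter
from typing import Optional


def corroboration_tag(positions: dict[str, Optional[str]]) -> str:
    """positions maps engine name -> normalised answer, or None if the engine was silent."""
    stated = [p for p in positions.values() if p is not None]
    if not stated:
        raise ValueError("no engine surfaced this claim")
    if len(stated) == 1:
        return "single-source"
    top_count = Counter(stated).most_common(1)[0][1]
    if top_count == 3:
        return "triangulated"
    if top_count == 2:
        return "two-engine-consensus"
    return "diverged"  # every stated position differs


tag = corroboration_tag({"gemini": "growing", "openai": "growing", "perplexity": None})
```

The hard part in practice is the normalisation step that decides two differently worded answers are "the same claim"; this sketch deliberately leaves that outside the function.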

📋 The claims ledger — provenance discipline on every claim

Every researched claim that goes into the memo carries four independent provenance dimensions: source, date, authority, and confidence. Quoted material is grep-verified against the original source.

Source-tier tagging (the five tags from Phase 11) is the lite version of this. The claims ledger is the heavy version. The two work together: tier tags classify the type of evidence, the ledger records the chain of custody.

The four required dimensions per claim

Dimension Source — where the claim came from, with URL or document path

Every claim must cite a source — a URL, a Drive document path, a HubSpot record ID, a Stripe transaction ID. Not "industry sources" or "general knowledge." A specific, addressable, verifiable origin. The render pipeline refuses to ship a claim without a source.

Dimension Date — when the source was published or recorded

A 2019 article about the category is not the same as a 2025 one. Every claim records the source date so the reader can weight it. Stale claims get explicit "as of YYYY-MM" notation in the memo. Critical for fast-moving categories where a year-old benchmark is already wrong.

Dimension Authority — who published the source, and what's their standing

Statista is not a peer-reviewed paper. A Reddit thread is not a Bain report. Every source gets a tagged authority level: regulator (SEC, USPTO, FDA), peer-reviewed (academic), institutional (Bain, McKinsey, BCG, named-firm reports), journalist (named publications with editorial standards), analyst (Statista, eMarketer, named research firms), community (Reddit, HackerNews, forums), seller-provided (any document the target gave us), self-published (blog posts, opinion pieces).

Authority shapes how much weight the claim gets in the verdict assembly.

Dimension Confidence — how much we trust this specific claim, given the source + corroboration

A composite score from 0 to 100 based on: authority of the source, how recent it is, whether other sources corroborate, whether the engine could grep-verify the quoted material, and whether the claim survived the contrarian pass. Claims scoring under 60 go in claims-low-confidence.md, kept for the analyst's review but not used in the memo's headline conclusions.
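
To make the composite concrete, here is a sketch with made-up weights. The five inputs and the cut-off of 60 are from the text; the weights, the 0-to-1 scaling of each factor, and the function names are assumptions chosen purely for illustration and do not reflect the engine's real scoring.

```python
# Hypothetical weights; they sum to 1 so the score lands on a 0-100 scale.
WEIGHTS = {
    "authority": 0.30,
    "recency": 0.20,
    "corroboration": 0.20,
    "grep_verified": 0.15,
    "survived_contrarian": 0.15,
}
LOW_CONFIDENCE_CUTOFF = 60


def confidence(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores (each in [0, 1]), scaled to 0-100."""
    return round(100 * sum(WEIGHTS[name] * factors[name] for name in WEIGHTS), 1)


def destination(score: float) -> str:
    """Claims under the cut-off are kept for review but held out of headline conclusions."""
    return "claims-low-confidence.md" if score < LOW_CONFIDENCE_CUTOFF else "memo"
```

A claim from a strong source that failed grep-verification and the contrarian pass would lose 30 points under these illustrative weights, which is enough to push most claims below the cut-off.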

Script Grep-verification — if a claim quotes material, the original is grep-checked

If the memo says 'According to Bain (2025), repeat-purchase rates in premium pet care average 38% in year-two', the engine fetches the Bain source and greps for the literal "38%" figure. Any quoted statistic, percentage, or named fact has to literally exist in the cited source. Catches the entire class of LLM hallucinations where a model confabulates a number that sounds right.
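
The literal-match check can be sketched in a few lines, assuming the source text has already been fetched and the quoted strings already extracted from the claim. The function name and the whitespace normalisation are assumptions; the actual script may match more strictly.

```python
import re


def _normalise(text: str) -> str:
    """Collapse runs of whitespace so line breaks in the source do not hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def missing_quotes(quoted: list[str], source_text: str) -> list[str]:
    """Return every quoted string that does not literally appear in the source.

    Plain substring matching: "38%" would also match inside "138%", so a stricter
    implementation might add boundary checks.
    """
    haystack = _normalise(source_text)
    return [q for q in quoted if _normalise(q) not in haystack]


source = "Repeat-purchase rates in premium pet care average\n38% in year two."
unverified = missing_quotes(["38%", "41%"], source)
```

Here "38%" is found and "41%" is not, so the second figure would be flagged as unverifiable against the cited source.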

⚔ Adversarial separation — Contrarian and Red Team are not the same thing

The methodology already runs the Quad Review in Phase 12. Research-grade work splits adversarial review into two distinct passes that attack different things: the claims, then the methodology. Done by different agents, on different inputs, in different stages.

The two adversarial passes

Contrarian Contrarian pass — for every bullish claim, find the bearish counter-evidence

What it attacks

The Contrarian agent attacks the claims. For every supportive piece of evidence in the draft memo, it goes hunting for the opposite. If the memo says "the category is growing 18% YoY," Contrarian asks "is there evidence it's growing 8% YoY or shrinking? what does the bear case look like?" and surfaces it.

What it produces

contrarians.md — a per-claim register of counter-evidence. Each entry has source, date, authority, confidence (the same four-dimension ledger). The synthesizer is required to address every contrarian when drafting the next memo version.

Red Team Red Team pass — attacks the methodology, not the findings

What it attacks

Red Team attacks how we did the research itself. Not whether the conclusions are right — whether the way we got there was sound. "Did the comp set actually represent the target? Was the Quad Review question framed in a leading way? Were the discovery specialists given enough breadth? Did we let the seller's framing dictate which questions we asked?"

Why both, not just one

The Contrarian can only attack claims the research actually surfaced, so it misses anything the methodology never had a chance to surface. The Red Team finds those failure modes: gaps in the research design itself. The pass is run by the existing tribunal-runner in research-claim style.

What it produces

red-team-verdict.md — methodology critique. Sometimes the verdict is "the research was fine, just incomplete on dimension X." Sometimes it's "the research was structurally biased; rerun with these constraints." Either way, the memo can't ship until red-team is satisfied.

🔌Where this plugs into the DD methodology

The research apparatus is not a sidecar. Specific surfaces and capabilities plug into specific phases. The mapping below is what the engine does today (✓) and what's still on the integration list (·).

Phase	Research surface / capability	What it adds	Status
Phase 0	Wayback Machine	Independent verification of operating age (Criterion 2). First-capture date is ground truth.	·
Phase 0	Internal knowledge (Notion, Slack, Atlassian)	Surfaces prior internal opinions, predecessor pitches, related deals discussed in chat	·
Phase 1	SimilarWeb + Ahrefs	Independent traffic and SEO data; verifies thesis-fit Criterion 3 (revenue) and 6 (model)	·
Phase 4	External engine sweep (Gemini DR + OpenAI DR + Perplexity)	Sector overview becomes triangulated. Three independent reads on TAM, growth, value chain, key players.	·
Phase 4	USPTO + EPO + Court Listener	Layer 9 IP verification + founder/entity litigation history	✓ partial (USPTO + EPO; Court Listener pending)
Phase 4	SimilarWeb + Ahrefs	Layer 6 (Channel) and Layer 7 (Competition) get hard data on traffic mix and SEO moat	·
Phase 5	Trustpilot velocity	Spike-detection on review timing — catches bought reviews	✓
Phase 5	Reddit + HackerNews + GDELT	Wider reputation surface; multi-language news via GDELT	✓ partial (Reddit + news; HN + GDELT pending)
Phase 5	YouTube transcripts	Founder podcasts, review videos, podcast contradictions with current narrative	·
Phase 5	Court Listener	Founder litigation history, entity disputes, IP suits	·
Phase 7	Contrarian pass on the verdict	For every bullish piece of evidence in the draft, find the bearish counter	·
Phase 9	NotebookLM against the seller's own corpus	Cross-checks seller-narrated claims against the seller's own documents	·
Phase 11	Grep-verification of every quoted claim	Catches confabulated numbers and quotations	·
Phase 12	Red Team via tribunal-runner (research-claim style)	Attacks the methodology of the research itself, not the findings	✓ partial (the Phase 12 Quad Review is methodology-level too, but the dedicated research-claim style is pending)

Most of these are integrations from the research-arsenal plugin into the DD orchestrator. The wiring work is mostly mapping research-arsenal source classes to the right DD phase functions in src/orchestrator/phases.ts. Once wired, the same DD form submission produces a memo backed by an order of magnitude more research depth.
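One way to picture the wiring is a typed registry that maps each research surface to its DD phase and integration status. The sketch below is hypothetical and does not reflect the actual contents of src/orchestrator/phases.ts. The rows are a subset of the table above (all Phase 4 and Phase 5 rows plus two others), and the type and function names are assumptions.

```typescript
// "live" corresponds to ✓, "partial" to "✓ partial", "pending" to · in the table.
type IntegrationStatus = "live" | "partial" | "pending";

interface SurfaceBinding {
  phase: number;
  surface: string;
  integration: IntegrationStatus;
}

// Subset of the mapping table, copied row for row.
const SURFACES: SurfaceBinding[] = [
  { phase: 0, surface: "Wayback Machine", integration: "pending" },
  { phase: 4, surface: "External engine sweep (Gemini DR + OpenAI DR + Perplexity)", integration: "pending" },
  { phase: 4, surface: "USPTO + EPO + Court Listener", integration: "partial" },
  { phase: 4, surface: "SimilarWeb + Ahrefs", integration: "pending" },
  { phase: 5, surface: "Trustpilot velocity", integration: "live" },
  { phase: 5, surface: "Reddit + HackerNews + GDELT", integration: "partial" },
  { phase: 5, surface: "YouTube transcripts", integration: "pending" },
  { phase: 5, surface: "Court Listener", integration: "pending" },
  { phase: 12, surface: "Red Team via tribunal-runner (research-claim style)", integration: "partial" },
];

// Surfaces bound to a phase, optionally filtered by integration status.
function surfacesFor(phase: number, integration?: IntegrationStatus): string[] {
  return SURFACES
    .filter((s) => s.phase === phase && (integration === undefined || s.integration === integration))
    .map((s) => s.surface);
}

console.log(surfacesFor(5, "pending")); // prints [ 'YouTube transcripts', 'Court Listener' ]
```

Keeping status in the registry lets a phase function skip pending surfaces explicitly instead of failing on a missing integration.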

Filing & house-style branding — how files get organized

This isn't a phase. It's a cross-cutting concern that runs from the moment files arrive in Phase 0 to the moment artefacts ship in Delivery. The point: every file Ecomma touches lives in a predictable place, named the same way, and every document the engine produces looks like it came from Ecomma. Without this, the dataroom turns into chaos within three deals and analysts can't find anything six months later. The filing component is what makes the whole system durable.

📁The filing & branding pipeline

Inbound files get classified and re-filed into the standard deal-folder structure. Outbound artefacts get rendered through Ecomma house-style templates and filed under the canonical naming convention. Both directions, every run, every time.

What runs across the whole DD

Reference Standard deal folder structure (Schema v1.0) — the ten subfolders every deal gets, plus the in-run scratch areas

Schema version

Folder Schema v1.0 under methodology v3.0.0. The full enum and required-field contract is documented in Phase 0's Folder Schema detail row. Changes here cascade: the schema documented in this section and the schema enforced by Phase 0 are the same artefact.

Why a fixed structure

Every deal at Ecomma uses the same subfolders. An analyst who has never touched the deal can land in the folder and immediately know where the P&L is, where the supplier list is, where the memo is. Without this, every dataroom would be organised differently and the institutional memory would be useless.

The subfolders

{deal-name}/
  00_intake/             — original seller drop, immutable (audit trail)
    _processed/          — OCR'd and translated copies, with provenance
    _unclassified/       — quarantine for files needing manual classification
  01_methodology/        — rule-file snapshot + manifest.json + git SHA
  02_financials/         — P&L, balance sheet, audit, bank, tax
  03_commerce/           — Shopify, processor statements, orders
  04_operations/         — suppliers, 3PL, fulfilment, returns workflow
  05_legal/              — LOI, founder agreements, trademarks, contracts
  06_marketing/          — ad-spend, attribution, channel breakdowns
  07_crm/                — customer database, lifecycle, segmentation
  10_dd-output/          — engine-produced artefacts this run
  20_correspondence/     — emails with seller, IC notes, slack threads

00 vs 10

The 00_intake folder is sacrosanct — original files arrive here and never get moved. Classified copies go to 02–07; ambiguous files get quarantined in 00_intake/_unclassified/ for manual classification rather than dumped into a catch-all bucket. If a classification turns out wrong six months later, the original is still there to re-process. The 10_dd-output folder is purely engine-produced; nothing manual goes in there.

Reference Naming convention — every artefact follows the same pattern, forever greppable

The pattern

{date}_{target-slug}_{artefact-kind}_{methodology-version}.{ext}

Examples

2026-05-10_carrora-watches_dd-memo_v3.0.0.docx
2026-05-10_carrora-watches_three-statement-model_v3.0.0.xlsx
2026-05-10_carrora-watches_lbo-model_v3.0.0.xlsx
2026-05-10_carrora-watches_byo-pre-mortem_v3.0.0.docx

Why every part matters

Date first: sorts chronologically in any folder view.
Target slug: fast filtering for one deal across mixed folders.
Artefact kind: finds every memo across all deals with one search.
Methodology version: when we re-run a deal under v3.1.0, the v3.0.0 file is still there for comparison. Nothing gets overwritten.

This convention isn't optional. The render pipeline refuses to write a file that doesn't conform.
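The conformance check can be sketched as a single regular expression plus a builder that refuses to emit a bad name. This assumes slugs and artefact kinds are lowercase and hyphen-separated and that the methodology version is always three-part (vMAJOR.MINOR.PATCH), as in the examples above. The function names are hypothetical, and the render pipeline's actual check is not shown in the source.

```typescript
interface ArtefactName {
  date: string;               // YYYY-MM-DD
  targetSlug: string;         // e.g. "carrora-watches"
  artefactKind: string;       // e.g. "dd-memo"
  methodologyVersion: string; // e.g. "v3.0.0"
  ext: string;                // e.g. "docx"
}

// {date}_{target-slug}_{artefact-kind}_{methodology-version}.{ext}
// Assumption: slug and kind are lowercase alphanumerics joined by single hyphens.
const NAME_PATTERN =
  /^(\d{4}-\d{2}-\d{2})_([a-z0-9]+(?:-[a-z0-9]+)*)_([a-z0-9]+(?:-[a-z0-9]+)*)_(v\d+\.\d+\.\d+)\.([a-z0-9]+)$/;

// Parse a file name, or return null if it does not conform.
function parseArtefactName(fileName: string): ArtefactName | null {
  const match = NAME_PATTERN.exec(fileName);
  if (match === null) return null;
  const [, date, targetSlug, artefactKind, methodologyVersion, ext] = match;
  return { date, targetSlug, artefactKind, methodologyVersion, ext };
}

// Build a file name and refuse to return one that would not parse back.
function formatArtefactName(parts: ArtefactName): string {
  const fileName =
    `${parts.date}_${parts.targetSlug}_${parts.artefactKind}_${parts.methodologyVersion}.${parts.ext}`;
  if (parseArtefactName(fileName) === null) {
    throw new Error(`non-conforming artefact name: ${fileName}`);
  }
  return fileName;
}

console.log(parseArtefactName("2026-05-10_carrora-watches_dd-memo_v3.0.0.docx")?.artefactKind); // prints dd-memo
console.log(parseArtefactName("memo-final.docx")); // prints null
```

The pattern only checks the shape of the date, so a value such as 2026-13-40 would still pass. Calendar validation would need a separate step.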

Tool House-style templates — every output uses an Ecomma-branded template

What "house-style" means

Every artefact the engine produces — memos, models, decks, sidecars — is rendered through an Ecomma-branded template. Same fonts (Inter for body, Playfair Display for headings), same color palette (navy + cyan), same logo placement, same footer (Ecomma · Internal · DD platform · methodology version). The IC opens any document and recognises it instantly as one of ours.

Where the templates live

In project-starter under plugins/finance/ecomma-dd/templates/. One template per artefact kind: memo-template.docx, three-statement-model-template.xlsx, lbo-model-template.xlsx, comps-deck-template.docx, etc. The template is the brand.

The branding script

A small Node script called gen-house-style.js applies the house-style brand to any document the engine produces. It sets fonts, colors, header and footer, logo. The skills produce the content; the branding script makes it look like Ecomma. Both run before the file is filed in 10_dd-output/.

Script Inbound classifier — what figures out where each incoming file belongs

How files get labelled

Every file that lands in 00_intake/ goes through a two-step classifier. First a fast filename + content-shape heuristic (a file called P&L_2025.xlsx with revenue/cost/profit headers is almost certainly a P&L). Second, for ambiguous files, a short Claude call that reads the first page and assigns a category. The result is a label like financials.pnl or operations.supplier-list or legal.trademark-cert.

Then it files a copy

A copy of each file gets placed in the matching subfolder, renamed under the standard convention. The original stays in 00_intake/ untouched. Every copy and rename is logged so the analyst can see exactly what the classifier did.
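The two-step flow and the routing of a label to a subfolder might look roughly like the sketch below. The heuristic rules, the function names, and the commerce, marketing and crm label prefixes are assumptions; only financials.pnl, operations.supplier-list and legal.trademark-cert appear in the text. The model call is injected as a function so the sketch does not presume any particular client API.

```typescript
type Label = string; // e.g. "financials.pnl"

// Label domain -> classified-copy folder. Domains other than financials,
// operations and legal are assumed to mirror the folder names.
const FOLDER_BY_DOMAIN: Record<string, string> = {
  financials: "02_financials",
  commerce: "03_commerce",
  operations: "04_operations",
  legal: "05_legal",
  marketing: "06_marketing",
  crm: "07_crm",
};

// Step 1: cheap filename + content-shape heuristic. Returns null when ambiguous.
function heuristicLabel(fileName: string, headers: string[]): Label | null {
  const lowerName = fileName.toLowerCase();
  const cols = headers.map((h) => h.toLowerCase());
  const hasAll = (...wanted: string[]) => wanted.every((w) => cols.some((c) => c.includes(w)));
  if (/p&l|pnl|profit/.test(lowerName) && hasAll("revenue", "cost", "profit")) return "financials.pnl";
  if (/supplier/.test(lowerName) && hasAll("supplier")) return "operations.supplier-list";
  return null;
}

// Step 2 runs only for ambiguous files; `modelFallback` stands in for the short model call.
async function classify(
  fileName: string,
  headers: string[],
  modelFallback: (fileName: string) => Promise<Label | null>,
): Promise<Label | null> {
  return heuristicLabel(fileName, headers) ?? (await modelFallback(fileName));
}

// Unlabelled or unknown-domain files go to quarantine, never to a catch-all bucket.
function destinationFolder(label: Label | null): string {
  if (label === null) return "00_intake/_unclassified";
  const domain = label.split(".")[0];
  return FOLDER_BY_DOMAIN[domain] ?? "00_intake/_unclassified";
}

const label = heuristicLabel("P&L_2025.xlsx", ["Revenue", "Cost of goods", "Gross profit"]);
console.log(label, destinationFolder(label)); // prints financials.pnl 02_financials
console.log(destinationFolder(heuristicLabel("scan_0042.pdf", []))); // prints 00_intake/_unclassified
```

Running the heuristic first keeps the model call off the common path, so obviously named files are classified without any network round trip.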

What if it's wrong

Misclassifications get corrected on the deal page with one click. The classified copy moves to the new folder and any downstream references update. The original in 00_intake/ is not touched.

Script Outbound filer — writes engine-produced artefacts to the right places

What it does

Every artefact a phase produces (memo, model, deck, sidecar) is handed to the outbound filer. The filer applies the naming convention, applies the house-style branding through gen-house-style.js, writes the file to 10_dd-output/ in the deal's folder, and inserts a row in the artefacts table so the deal page can surface it.

What it refuses to do

Write a file with a non-conforming name. Write a file without house-style branding applied. Write a file without recording it in the artefacts table. Any of these would break the audit trail; the filer enforces them at the boundary.
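The three refusals can be expressed as guards that run before any write happens. The sketch below is illustrative only: the Artefact and FilerDeps shapes, the branded flag, and the injected write and recordRow functions are hypothetical stand-ins for the real branding script, Drive write and artefacts table.

```typescript
interface Artefact {
  fileName: string;
  branded: boolean; // assumed flag set once house-style branding has been applied
  bytes: Uint8Array;
}

// Side effects are injected so the guard logic can be exercised in isolation.
interface FilerDeps {
  isConformingName: (fileName: string) => boolean;
  write: (path: string, bytes: Uint8Array) => void;
  recordRow: (path: string) => void;
}

// Refuse early; write and record only when every precondition holds.
function fileArtefact(artefact: Artefact, dealFolder: string, deps: FilerDeps): string {
  if (!deps.isConformingName(artefact.fileName)) {
    throw new Error(`refused: non-conforming name ${artefact.fileName}`);
  }
  if (!artefact.branded) {
    throw new Error(`refused: house-style branding not applied to ${artefact.fileName}`);
  }
  const path = `${dealFolder}/10_dd-output/${artefact.fileName}`;
  deps.write(path, artefact.bytes);
  deps.recordRow(path); // a real filer would also need to handle a failure at this step
  return path;
}

// In-memory fakes for demonstration.
const written: string[] = [];
const rows: string[] = [];
const deps: FilerDeps = {
  isConformingName: (f) => /^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d+\.\d+\.\d+\.[a-z0-9]+$/.test(f),
  write: (path) => { written.push(path); },
  recordRow: (path) => { rows.push(path); },
};

const memo: Artefact = {
  fileName: "2026-05-10_carrora-watches_dd-memo_v3.0.0.docx",
  branded: true,
  bytes: new Uint8Array(),
};
console.log(fileArtefact(memo, "carrora-watches", deps));
// prints carrora-watches/10_dd-output/2026-05-10_carrora-watches_dd-memo_v3.0.0.docx
```

Putting the checks ahead of the first side effect means a refused artefact leaves no partial trace in the folder or in the table.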

API Google Drive — write side — how files actually land in Drive

The Drive API is what physically writes the files. Every move, every rename, every new file is a Drive API call. The engine has write permission scoped to the deal's folder only — it cannot touch anything outside that folder. Every write is logged so we can audit exactly what changed.

POST https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart
Authorization: Bearer {drive_token}
{ "parents": ["{deal-folder-id}"], "name": "2026-05-10_carrora-watches_dd-memo_v3.0.0.docx" }
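The request above shows only the metadata. With uploadType=multipart, the Drive API expects a multipart/related body whose first part is that JSON metadata and whose second part is the file content. A minimal sketch of building the envelope around the file bytes follows. The function name and the boundary value are hypothetical, and the actual send (fetch or a client library) is left out.

```typescript
interface DriveFileMetadata {
  parents: string[];
  name: string;
}

// Returns the Content-Type header plus the text that goes before and after the
// raw file bytes. The full body is head + file bytes + tail.
function multipartEnvelope(metadata: DriveFileMetadata, mediaMimeType: string, boundary: string) {
  const head =
    `--${boundary}\r\n` +
    `Content-Type: application/json; charset=UTF-8\r\n\r\n` +
    `${JSON.stringify(metadata)}\r\n` +
    `--${boundary}\r\n` +
    `Content-Type: ${mediaMimeType}\r\n\r\n`;
  const tail = `\r\n--${boundary}--`;
  return { contentType: `multipart/related; boundary=${boundary}`, head, tail };
}

const envelope = multipartEnvelope(
  { parents: ["{deal-folder-id}"], name: "2026-05-10_carrora-watches_dd-memo_v3.0.0.docx" },
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "dd-boundary", // hypothetical; must not occur inside the file content
);
console.log(envelope.contentType); // prints multipart/related; boundary=dd-boundary
```

Splitting the envelope into head and tail avoids pushing a binary .docx through string concatenation: the caller can join the three pieces as bytes.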

Delivery — Drive, HubSpot, the deal page

Once the Quad Review lands, three things happen in fast sequence.

The memo and every artefact are written to Google Drive under the deal's folder. The naming convention is strict: {date}_{target-slug}_{artefact-kind}_{methodology-version}.{ext}. Because the same convention is applied to the source files filed during Phase 0, everything in the dataroom is greppable forever.

The dashboard updates. The run moves from "In flight" to "Recently completed." The verdict pill appears next to the run name. The analyst gets an email.

The deal page populates. The memo is featured at the top, supporting documents are in a grid below, the run's full phase history is in a collapsible audit log, and the Quad Review disposition sits alongside the verdict. Anyone with access to the deal can click into any phase and see exactly what fired — which skills ran, which APIs returned what data, which references were consulted, which artefacts were produced, and at what cost.

Reproducibility, in one sentence: given the same deal, the same dataroom, the same methodology version, and the same firm-managed connector responses, the engine produces the same artefacts in the same order with the same verdict.