TASK-PLAN — QU100 pattern audit + config system (WS1)

Workstream summary

| WS | Goal | Effort | Risk | Decision |
|----|------|--------|------|----------|
| A | Replay the live pattern-layer ranking over 1yr `stock_prices`; full-composite replay deferred to WS3 (needs as-of money-flow selector) | M | Med | YES |
| B | Pattern forward-return audit → corpus (Parquet) + report | M | Low | YES |
| C | `champion.yaml` config system + deep-merge loader + history/registry (byte-identical seed) | M | Low | YES |

Problem

We tune pattern matching blind: the per-pattern confidence weights and the 3-layer composite (money-flow 0.25 / sector 0.10 / pattern 0.65) were never checked against outcomes. Live data spans only ~2 weeks, which is too thin (n<50). One year of daily OHLC sits in Postgres stock_prices, but we have never replayed the detector over it to learn which patterns work.

What's replayable over 1 year, and what isn't. The pattern layer is fully derivable from stock_prices OHLC (run the detector as-of each day). Money-flow history exists — money_flow_snapshots covers 2020→present (~1,400 trading days, 276k rows; the 2026-06-04 backfill stamped 1,373) — but the live _screen_money_flow is latest-only: there is no as-of selector, and the backfill stamped every day with one shared captured_at, so point-in-time money-flow rank needs a new as-of query (latest captured_at within data_date ≤ t). Building that selector is its own work. So:

Pattern-layer audit → 1 year (the headline; answers "which patterns predict?"; feeds WS2 weight calibration; needs only the detector + prices).
Full 3-layer composite replay → deferred to WS3 (the data exists; it needs the new as-of money-flow selector). WS1's composite parity test runs on recent dates where the live latest-snapshot path already applies — no reconstruction needed.

stock_prices (1yr OHLC) ──► replay detector as-of t ──► pattern emissions + fwd returns
                                                         └─ per-pattern table (1yr)  ◄── headline
money_flow history exists (~1,400 days) BUT no as-of selector ──► full-composite replay = WS3 later
WS1 composite parity ──► recent dates only (live latest-snapshot path)
champion.yaml (weight sets + thresholds, seeded byte-identical)  ◄── the unit to tune later

Success criteria

One command produces a per-pattern table (n, win-rate, mean/median fwd return at 5/10/20d, by regime incl. an unknown cell) over the 1yr window, reproducible.

The pattern-layer replay matches live consumption: the same detect_patterns → _filter_actionable → best_pattern chain over the live ~6-month lookback. A parity test asserts the full ranking (ordering + best_pattern + composite score, not just the symbol set) against screen_stocks. Because the composite needs Layer-1 money-flow, the score assertion runs only on recent dates, where the live latest-snapshot path applies; the pattern-layer mechanism itself is exercised over the 1yr corpus.

champion.yaml seeded from today's effective config produces a byte-identical ranking on a fixture (behavior-preserving), proven by test.

The audit report names ≥1 concrete tuning hypothesis per later axis (WS2/WS3/WS4).
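The per-pattern table in the first criterion could be aggregated along these lines. This is a minimal sketch, not the project's implementation: the emission layout (`pattern`, `regime`, `fwd`), the `aggregate` name, and the win definition (forward return > 0, ignoring pattern direction) are assumptions for illustration. The `thin` flag uses the n<30 threshold listed under Risks.

```python
from collections import defaultdict
from statistics import mean, median

HORIZONS = (5, 10, 20)  # forward-return horizons in trading days
THIN_N = 30             # cells below this n are flagged, not dropped


def aggregate(emissions):
    """Summarise forward returns by (pattern, regime, horizon).

    Each emission is a dict with hypothetical keys 'pattern', 'regime'
    and 'fwd', where 'fwd' maps horizon -> forward return or None.
    """
    buckets = defaultdict(list)
    for e in emissions:
        for h in HORIZONS:
            r = e["fwd"].get(h)
            if r is not None:  # null returns are skipped, never counted as 0
                buckets[(e["pattern"], e["regime"], h)].append(r)

    table = {}
    for key in sorted(buckets):  # stable key order keeps output reproducible
        rs = buckets[key]
        table[key] = {
            "n": len(rs),
            "win_rate": sum(r > 0 for r in rs) / len(rs),  # assumed win rule
            "mean": mean(rs),
            "median": median(rs),
            "thin": len(rs) < THIN_N,
        }
    return table
```

The "unknown" regime is just another value of `regime` here, so it gets its own cells instead of being silently dropped.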

Deliverables

WS A: pattern-layer replay module + a parity test vs screen_stocks (full-composite assertions on recent dates only, via the live latest-snapshot path).

WS B: corpus artifact (regenerable Parquet; added to CLAUDE.md disk-hygiene), a CLI (rainier pattern-audit or backtest-qu100 extension), and docs/REPORT-qu100-pattern-hit-rate.md (+ .html).

WS C: config/model/champion.yaml (seeded), the deep-merge loader (hooked into load_settings), config/model/history/, and a Parquet results registry.

Tests for each (below). Full gates green; codex + /review clean.

Work breakdown

WS A (replay): for each symbol × trading day t, run the detector over the live ~6-month lookback window ending at t, then _filter_actionable → best_pattern. The window matches live _fetch_stock_data period="6mo" (~126 bars); do not feed all history, because pattern detection over a 250+ bar window diverges from live. The pattern layer is the 1yr deliverable. The full 3-layer composite is reproduced only on recent dates, where the live latest-snapshot money-flow path applies; build a parity test there that asserts ordering + best_pattern + composite score against screen_stocks. Note that the composite double-counts sector (see Implementation Notes), so the replay formula must match.
WS B (audit): attach forward returns (5/10/20d; null near window end) + a regime tag (incl. unknown) to each emission; aggregate by (pattern, regime, horizon). Write the corpus Parquet + the human report.
WS C (config system): champion.yaml holds the flat StockScreenerConfig field names matching settings.yaml:stock_screener (layer_weight_money_flow, layer_weight_pattern, strong_buy_threshold, …; pattern_weights is the one nested dict field) plus a metadata header (version/parent/created/score), which is stripped before model construction. A loader deep-merges defaults ← settings.yaml:stock_screener ← champion.yaml at the dict level before constructing StockScreenerConfig, and is hooked into load_settings so that both get_settings and load_settings_fresh see it. Also: history dir + Parquet registry. Seed byte-identical to today.
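The WS C merge order (defaults ← settings.yaml:stock_screener ← champion.yaml) can be sketched with plain dicts. This is a sketch under stated assumptions: `deep_merge` and `effective_screener_config` are hypothetical names, the metadata header is assumed to sit as top-level keys in champion.yaml, and the `pattern_a` / `pattern_b` weights are placeholder values, not real pattern names or seeds.

```python
import copy

# Assumed layout: metadata header keys live at the top level of champion.yaml.
METADATA_KEYS = {"version", "parent", "created", "score"}


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge per key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_screener_config(defaults, settings_section, champion):
    """Merge the three layers; strip the metadata header before model construction."""
    champion_fields = {k: v for k, v in champion.items() if k not in METADATA_KEYS}
    return deep_merge(deep_merge(defaults, settings_section), champion_fields)


# Placeholder inputs for illustration only.
defaults = {
    "layer_weight_money_flow": 0.25,
    "layer_weight_pattern": 0.65,
    "pattern_weights": {"pattern_a": 1.0, "pattern_b": 1.0},
}
settings_section = {"pattern_weights": {"pattern_a": 0.8}}
champion = {"version": 1, "parent": None, "pattern_weights": {"pattern_b": 0.9}}

cfg = effective_screener_config(defaults, settings_section, champion)
```

With these inputs, `pattern_a` keeps the settings.yaml value (0.8) instead of reverting to the code default, which is the behavior the precedence test is meant to prove. The resulting dict would then be passed to StockScreenerConfig.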

Blockers (STOP and escalate, do not push)

WS A pattern-layer: if the pattern-layer replay can't match the live detector path on a parity fixture, STOP — a non-faithful detector audit is worse than none.

WS A full-composite: money-flow history exists, but the live _screen_money_flow is latest-only. Do NOT build an as-of money-flow selector in WS1 (it's WS3 work) — scope WS1's composite parity test to recent dates where the latest-snapshot path applies. If tempted to reconstruct historical money-flow rank to extend composite parity, STOP — that's WS3.

WS C: if seeding champion.yaml can't produce a byte-identical ranking (config load-order side effects), STOP — a non-behavior-preserving "foundation" is a silent live change.

General: if stock_prices coverage is too thin (most symbols < ~200 bars) to yield usable per-pattern n, STOP and report.

Soft escalate (report, don't block): if unknown-regime emissions dominate the early window (SPY history not deep enough), surface it rather than silently absorbing them.
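To illustrate why a shallow SPY history produces unknown-regime emissions, here is a rough sketch of an SPY vs 200-SMA tag. It is not compute_market_regime itself: the function name, the `above_sma` / `below_sma` labels, and the tie handling are assumptions; only the "unknown" result for fewer than 200 closes comes from this plan.

```python
def regime_from_closes(spy_closes, sma_len=200):
    """Tag the regime from SPY closes at or before as_of (oldest first).

    Labels other than "unknown" are hypothetical placeholders.
    """
    if len(spy_closes) < sma_len:
        return "unknown"  # not enough history to form the SMA
    sma = sum(spy_closes[-sma_len:]) / sma_len
    # Assumption: a close exactly at the SMA counts as below.
    return "above_sma" if spy_closes[-1] > sma else "below_sma"
```

Any emission dated within the first 199 available SPY bars falls into "unknown", which is why the SPY backfill has to start 200 trading days before the audit window.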

Acceptance criteria

WS A: parity test — on recent dates (live latest-snapshot path), replay's selection ordering + best_pattern + composite score == screen_stocks on a fixture carrying both ≥126 price bars (so the 6-month windowing truncates) and a money-flow snapshot row (so the composite score is computable); look-ahead test — only bars up to t fed; determinism — identical corpus bytes on re-run (replay queries stock_prices with explicit ORDER BY symbol, date).

WS B: forward-return test — exact value for a known price path; near-window-end → null, not 0; aggregation groups by (pattern, regime, horizon) with per-pattern n surfaced and the unknown-regime cell reported (not dropped); directional-correctness column correct.

WS C: behavior-preservation test — seeded champion.yaml → byte-identical ranking on a fixture; precedence test — a champion.yaml setting one field proves unspecified fields fall through to settings.yaml (not to code defaults), i.e. deep-merge; hot-reload test — a champion.yaml change is picked up via load_settings_fresh without restart.
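The WS B forward-return criterion (exact value on a known price path, null rather than 0 near the window end) can be exercised with a helper like the one below. A minimal sketch: the `forward_returns` name and the list-of-closes input are assumptions, and `t` is taken to be a positional index into closes ordered by trading day.

```python
def forward_returns(closes, t, horizons=(5, 10, 20)):
    """Return {H: close[t+H]/close[t] - 1}, or None when t+H is past the window end."""
    out = {}
    for h in horizons:
        if t + h < len(closes):
            out[h] = closes[t + h] / closes[t] - 1
        else:
            out[h] = None  # null near the window end, never 0
    return out
```

Returning None instead of 0 keeps truncated horizons out of the win-rate and mean calculations downstream.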

Risks

Composite parity drift (sector double-count, 6-month windowing) → parity test asserting ordering+best_pattern+score is the gate.

Money-flow as-of gap → history exists but _screen_money_flow is latest-only; full-composite replay (needs an as-of selector) is deferred to WS3; WS1's composite parity uses recent dates; pattern-layer audit (the headline) is unaffected.

Detector look-ahead inflating win-rates → look-ahead + determinism tests.

Regime unknown dominance early in the window → SPY backfilled 200d before the window start; unknown cell reported.

Thin per-pattern cells (n<30) → report n; flag thin; don't over-tune.

Survivorship — universe = scraped names over history → disclosed in the report.

Two-DB footgun — registry is Parquet, off Neon; replay reads the legacy local TimescaleDB engine.

Non-goals

WS2/WS3/WS4 (calibration, layer-rebalance, LLM-presentation) — held for the post-numbers design pass.

Any weight/threshold value change; any detector logic change.

Building the as-of money-flow selector / full historical composite replay (WS3); new price backfill; ML retraining; live champion/challenger A/B wiring.

Implementation Notes (for engineers)

Live path to mirror: analysis/stock_screener.py screen_stocks(settings) (takes Settings, reads settings.stock_screener internally) → Layer 1 _screen_money_flow (latest QU snapshot, wall-clock staleness gate ~:222-233) + Layer 2 analyze_sectors + Layer 3 detect_patterns (analysis/stock_patterns.py:46) → _filter_actionable → best_pattern (= patterns[0] after the filter reorders).
Composite formula (sector double-count): the live composite is NOT signal_strength·w_mf + sector_boost·w_sector + best_confidence·w_pattern with independent layers — sector is folded into signal.signal_strength (stock_screener.py:68,377, sector_analyzer.py:223) AND sector_boost is added again (:100). The replay must reproduce this exactly or composite scores won't match. Refs stock_screener.py:68,100,377.
Why screen_stocks can't be reused verbatim: Layer 1 reads the latest snapshot and has a datetime.now() staleness gate, and Layer 3 does a live yf.download(period="6mo"); neither is reproducible as-of t. Hence a faithful re-implementation + parity test (full composite on recent dates only, via the live latest-snapshot path).
Lookback window: the detector runs over the live ~6-month span (period="6mo", ~126 trading bars) ending at t. Window the bars; don't feed all history, because max_pattern_bars=120 and swing detection behave differently on a longer series.
Money-flow as-of gap: money-flow history exists (money_flow_snapshots data_date 2020→present, ~1,400 days) — but as-of helpers exist only for cohort (paper/ingest.py:357) and sector (sector_analyzer.py:135), NOT for _screen_money_flow (latest-only; stock_screener.py:189 notes the backfill stamped all days with one shared captured_at, so an as-of query must pick latest captured_at within data_date ≤ t). Building that selector is WS3 work. WS1's composite parity runs on recent dates (live latest-snapshot path); the 1yr deliverable is the pattern layer.
Corpus source: Postgres stock_prices (OHLC) via core.database.get_session() (legacy local TimescaleDB) — not data/cache/qu100_backtest (adjusted-close only). Universe = symbols in money_flow_snapshots. Query with explicit ORDER BY symbol, date.
Regime tag: llm_thesis/research.py:compute_market_regime(*, as_of) (SPY vs 200-SMA; returns "unknown" with <200 SPY closes at/before as_of). SPY via paper/ingest.py:ensure_spy_history — backfill must cover 200 trading days BEFORE the audit window start or early emissions land in unknown.
Config: StockScreenerConfig (core/config.py); load_settings currently does StockScreenerConfig(**yaml["stock_screener"]), a whole-object replace. The new loader must dict-level deep-merge defaults ← settings.yaml ← champion.yaml BEFORE constructing the model, hooked into load_settings (covers the get_settings cache and the load_settings_fresh wrapper). _filter_actionable's max_bars_since_breakout=10 / entry_proximity_pct=0.05 (stock_screener.py:454-455) are NOT in StockScreenerConfig. champion.yaml captures StockScreenerConfig fields only; those two constants are preserved by construction (out of the config surface) and are not tunable via the file in this PR.
Forward return: close[t+H]/close[t] − 1 for H ∈ {5,10,20} trading days; within H of window end → null (never 0).
Determinism: no wall-clock; as-of date is t from the price index; corpus rows sorted by a stable key before write.
Manifest/registry: corpus + results registry are Parquet/CSV (feature-store convention); never a Neon market.* table.
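The lookback and look-ahead notes above amount to a single windowing rule: only bars dated at or before t, and only the most recent ~126 of them. A minimal sketch, assuming bars arrive as tuples sorted ascending by date (as the explicit ORDER BY would give); `bars_as_of` and the tuple layout are hypothetical.

```python
from datetime import date

LOOKBACK_BARS = 126  # ~6 months of trading days, matching period="6mo"


def bars_as_of(bars, t, lookback=LOOKBACK_BARS):
    """Return at most `lookback` bars dated <= t.

    bars: list of (bar_date, open, high, low, close), ascending by bar_date.
    No wall-clock is consulted; t comes from the price index.
    """
    visible = [b for b in bars if b[0] <= t]  # drop anything after t (no look-ahead)
    return visible[-lookback:]                # keep only the live-sized window


# Synthetic consecutive-day bars for illustration (real data has trading days only).
start = date(2024, 1, 1).toordinal()
bars = [(date.fromordinal(start + i), 1.0, 1.0, 1.0, 1.0) for i in range(300)]
window = bars_as_of(bars, bars[199][0])
```

A look-ahead test can then assert that the last bar in the window is dated exactly t and that the window never exceeds 126 bars, regardless of how much history is stored.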

Dependencies

PR #143 (merged into `origin/main`): provides `compute_market_regime` and the shadow rails. The worker must branch off current `origin/main` (local `main` is stale at PR #142).
