champion.yaml is the unit of A/B + tracking; behavior-preserving this PR.

Build the measurement + config foundation for pattern tuning: a pattern-layer forward-return audit over one year of prices, plus a single versioned champion.yaml, without changing any live weight or detector behavior.
| WS | Goal | Effort | Risk | Decision |
|---|---|---|---|---|
| A | Replay the live pattern-layer ranking over 1yr stock_prices; full-composite replay deferred to WS3 (needs as-of money-flow selector) | M | Med | YES |
| B | Pattern forward-return audit → corpus (Parquet) + report | M | Low | YES |
| C | champion.yaml config system + deep-merge loader + history/registry (byte-identical seed) | M | Low | YES |
We tune pattern matching blind: the per-pattern confidence weights and the 3-layer composite (money-flow 0.25 / sector 0.10 / pattern 0.65) were never checked against outcomes. Live data is ~2 weeks — too thin (n<50). One year of daily OHLC sits in Postgres stock_prices; we never replayed the detector over it to learn which patterns work.
What's replayable over 1 year, and what isn't. The pattern layer is fully
derivable from stock_prices OHLC (run the detector as-of each day). Money-flow
history exists — money_flow_snapshots covers 2020→present (~1,400 trading days,
276k rows; the 2026-06-04 backfill stamped 1,373) — but the live _screen_money_flow
is latest-only: there is no as-of selector, and the backfill stamped every day
with one shared captured_at, so point-in-time money-flow rank needs a new as-of
query (latest captured_at within data_date ≤ t). Building that selector is its
own work. So:
stock_prices (1yr OHLC) ──► replay detector as-of t ──► pattern emissions + fwd returns
└─ per-pattern table (1yr) ◄── headline
money_flow history exists (~1,400 days) BUT no as-of selector ──► full-composite replay = WS3 later
WS1 composite parity ──► recent dates only (live latest-snapshot path)
champion.yaml (weight sets + thresholds, seeded byte-identical) ◄── the unit to tune later
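
The champion.yaml layering this plan relies on (defaults ← settings.yaml:stock_screener ← champion.yaml, merged at the dict level, with the metadata header stripped before the model is built) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the loader itself: the helper names `deep_merge` and `effective_screener_config`, the top-level `metadata` key, the pattern names, and every numeric value other than the 0.25 / 0.65 layer weights are hypothetical.

```python
from copy import deepcopy
from typing import Any, Mapping

# Assumption: the metadata header (version/parent/created/score) lives under
# one top-level key. The real file layout may differ.
METADATA_KEY = "metadata"


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a new dict in which override wins; nested dicts merge key by key."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_screener_config(
    defaults: Mapping[str, Any],
    settings_section: Mapping[str, Any],
    champion: Mapping[str, Any],
) -> dict:
    """Layer defaults <- settings.yaml:stock_screener <- champion.yaml."""
    # Strip the metadata header so only config fields reach the model.
    champion_fields = {k: v for k, v in champion.items() if k != METADATA_KEY}
    return deep_merge(deep_merge(defaults, settings_section), champion_fields)


# Illustrative inputs; pattern_a / pattern_b are placeholder pattern names.
defaults = {
    "layer_weight_money_flow": 0.25,
    "layer_weight_pattern": 0.65,
    "pattern_weights": {"pattern_a": 1.0, "pattern_b": 1.0},
}
settings_section = {
    "strong_buy_threshold": 0.7,
    "pattern_weights": {"pattern_a": 0.9},
}
champion = {
    "metadata": {"version": 1, "parent": None},
    "pattern_weights": {"pattern_b": 0.8},
}

merged = effective_screener_config(defaults, settings_section, champion)
```

Here champion.yaml sets only `pattern_weights.pattern_b`, so `pattern_a` and `strong_buy_threshold` fall through to the settings.yaml values rather than to code defaults. A whole-object replace would lose them.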
- unknown cell) over the 1yr window, reproducible.
- detect_patterns → _filter_actionable → best_pattern over the live ~6-month lookback. A parity test asserts the full ranking (ordering + best_pattern + composite score, not just symbol set) against screen_stocks; because the composite needs Layer-1 money-flow, that score assertion runs on recent dates (live latest-snapshot path) (the pattern-layer mechanism itself is exercised over the 1yr corpus).
- champion.yaml seeded from today's effective config produces a byte-identical ranking on a fixture (behavior-preserving), proven by test.
- screen_stocks (on recent dates (live latest-snapshot path) for the full composite).
- rainier pattern-audit or backtest-qu100 extension), and docs/REPORT-qu100-pattern-hit-rate.md (+ .html).
- config/model/champion.yaml (seeded), the deep-merge loader (hooked into load_settings), config/model/history/, and a Parquet results registry.
- champion.yaml deep-merge loader seeded byte-identical (behavior-preserving). Lowest risk, unblocks the rest.
- stock_prices; uses the config object).
- t, run the detector over the live ~6-month lookback window ending at t (matches live _fetch_stock_data period="6mo", ~126 bars; not all history, or pattern detection over a 250+ bar window diverges from live), then _filter_actionable → best_pattern. The pattern layer is the 1yr deliverable. The full 3-layer composite can only be reproduced where money-flow snapshots exist; build a parity test there that asserts ordering + best_pattern + composite score against screen_stocks. Note the composite double-counts sector (see Impl Notes); the replay formula must match.
- unknown) to each emission; aggregate by (pattern, regime, horizon).
Write corpus Parquet + the human report.
- champion.yaml holding the flat StockScreenerConfig field names matching settings.yaml:stock_screener (layer_weight_money_flow, layer_weight_pattern, strong_buy_threshold, …; pattern_weights is the one nested dict field) + a metadata header (version/parent/created/score); a loader that deep-merges defaults ← settings.yaml:stock_screener ← champion.yaml at the dict level before constructing StockScreenerConfig, hooked into load_settings (so both get_settings and load_settings_fresh see it); history dir + Parquet registry. Seed byte-identical to today. (The metadata header is stripped before model construction.)
- origin/main): compute_market_regime, shadow rails. The worker must branch off current origin/main (local main is stale at PR #142).
- _screen_money_flow is latest-only. Do NOT build an as-of money-flow selector in WS1 (it's WS3 work); scope WS1's composite parity test to recent dates where the latest-snapshot path applies. If tempted to reconstruct historical money-flow rank to extend composite parity, STOP: that's WS3.
- If champion.yaml can't produce a byte-identical ranking (config load-order side effects), STOP: a non-behavior-preserving "foundation" is a silent live change.
- If stock_prices coverage is too thin (most symbols < ~200 bars) to yield usable per-pattern n, STOP and report.
- If unknown-regime emissions dominate the early window (SPY history not deep enough), surface it rather than silently absorbing them.
- screen_stocks on a fixture carrying both ≥126 price bars (so the 6-month windowing truncates) and a money-flow snapshot row (so the composite score is computable); look-ahead test: only bars up to t fed; determinism: identical corpus bytes on re-run (replay queries stock_prices with explicit ORDER BY symbol, date).
- unknown-regime cell reported (not dropped); directional-correctness column correct.
- champion.yaml → byte-identical ranking on a fixture; precedence test: a champion.yaml setting one field proves unspecified
fields fall through to settings.yaml (not to code defaults), i.e. deep-merge; hot-reload test: a champion.yaml change is picked up via load_settings_fresh without restart.
- _screen_money_flow is latest-only; full-composite replay (needs an as-of selector) is deferred to WS3; WS1's composite parity uses recent dates; pattern-layer audit (the headline) is unaffected.
- unknown dominance early in the window → SPY backfilled 200d before the window start; unknown cell reported.
- uv run pytest tests/ -v (incl. parity/look-ahead/determinism/behavior-preservation/precedence/hot-reload tests) green.
- uv run ruff check src/ tests/ clean.
- analysis/stock_screener.py screen_stocks(settings) (takes Settings, reads settings.stock_screener internally) → Layer 1 _screen_money_flow (latest QU snapshot, wall-clock staleness gate ~:222-233) + Layer 2 analyze_sectors + Layer 3 detect_patterns (analysis/stock_patterns.py:46) → _filter_actionable → best_pattern (= patterns[0] after the filter reorders).
- The composite is not signal_strength·w_mf + sector_boost·w_sector + best_confidence·w_pattern with independent layers: sector is folded into signal.signal_strength (stock_screener.py:68,377, sector_analyzer.py:223) AND sector_boost is added again (:100). The replay must reproduce this exactly or composite scores won't match. Refs stock_screener.py:68,100,377.
- screen_stocks can't be reused verbatim: Layer 1 reads the latest snapshot + has a datetime.now() staleness gate, and Layer 3 does a live yf.download(period="6mo"); neither reproducible as-of t.
Hence a faithful re-implementation + parity test (on recent dates (live latest-snapshot path) for the full composite).
- period="6mo", ~126 trading bars) ending at t: window the bars, don't feed all history (max_pattern_bars=120, swing detection differ otherwise).
- money_flow_snapshots data_date 2020→present, ~1,400 days), but as-of helpers exist only for cohort (paper/ingest.py:357) and sector (sector_analyzer.py:135), NOT for _screen_money_flow (latest-only; stock_screener.py:189 notes the backfill stamped all days with one shared captured_at, so an as-of query must pick latest captured_at within data_date ≤ t). Building that selector is WS3 work. WS1's composite parity runs on recent dates (live latest-snapshot path); the 1yr deliverable is the pattern layer.
- stock_prices (OHLC) via core.database.get_session() (legacy local TimescaleDB), not data/cache/qu100_backtest (adjusted-close only). Universe = symbols in money_flow_snapshots. Query with explicit ORDER BY symbol, date.
- llm_thesis/research.py:compute_market_regime(*, as_of) (SPY vs 200-SMA; returns "unknown" with <200 SPY closes at/before as_of). SPY via paper/ingest.py:ensure_spy_history; backfill must cover 200 trading days BEFORE the audit window start or early emissions land in unknown.
- StockScreenerConfig (core/config.py); load_settings currently does StockScreenerConfig(**yaml["stock_screener"]), a whole-object replace. The new loader must dict-level deep-merge defaults ← settings.yaml ← champion.yaml BEFORE constructing the model, hooked into load_settings (covers get_settings cache and load_settings_fresh wrapper).
_filter_actionable's max_bars_since_breakout=10 / entry_proximity_pct=0.05 (stock_screener.py:454-455) are NOT in StockScreenerConfig; champion.yaml captures StockScreenerConfig fields only; those two constants are preserved by construction (out of the config surface), not tunable via the file this PR.
- close[t+H]/close[t] − 1 for H ∈ {5,10,20} trading days; within H of window end → null (never 0).
- t from the price index; corpus rows sorted by a stable key before write.
- market.* table.
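
The forward-return rule in the notes above (close[t+H]/close[t] − 1 for H ∈ {5,10,20}, null rather than 0 when fewer than H bars remain) can be sketched as below. This is a minimal illustration, not the audit's implementation: the function name `forward_returns` and the plain list-of-closes input are assumptions, and `t` is taken to be a bar position within one symbol's date-ordered closes.

```python
from typing import Dict, Optional, Sequence

HORIZONS = (5, 10, 20)  # trading-day horizons listed in the plan


def forward_returns(
    closes: Sequence[float],
    t: int,
    horizons: Sequence[int] = HORIZONS,
) -> Dict[int, Optional[float]]:
    """Forward return per horizon for an emission at bar position t.

    closes must hold one symbol's closes in ascending date order.
    A horizon that runs past the last bar yields None (never 0.0), so
    incomplete outcomes cannot be mistaken for flat returns.
    """
    result: Dict[int, Optional[float]] = {}
    for h in horizons:
        if t + h < len(closes):
            result[h] = closes[t + h] / closes[t] - 1.0
        else:
            result[h] = None
    return result


# Illustrative series: 12 bars rising by 1.0 per day from 100.0.
closes = [100.0 + i for i in range(12)]
early = forward_returns(closes, t=0)  # 5d and 10d available, 20d is not
late = forward_returns(closes, t=5)   # only the 5d horizon is available
```

Keeping unavailable horizons as None lets the aggregation by (pattern, regime, horizon) count only emissions with a realized outcome.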