diff --git a/README.md b/README.md
index 0b65adf..009103f 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,88 @@
-Carmignac Project
\ No newline at end of file
+# Carmignac: Data Repair Pipeline
+
+Registrar Account ID repair for AUM and Flows data.
+Carmignac × ENSAE Data Challenge 2025.
+
+---
+
+## Problem
+
+AUM and Flows tables must satisfy the stock-flow identity for every `(account, ISIN, month)`:
+
+```
+Q(t-1) + F(t-1→t) = Q(t)
+```
+
+In practice this identity is violated because Registrar Account IDs change over time: a single distributor may appear under different codes at different dates. The pipeline detects these ruptures, scores data quality, and resolves identity changes through targeted *surgeries*.
+
+---
+
+## Files
+
+| File | Role |
+|------|------|
+| `carmignac_diagnostics.py` | Step 0: market-level audit, broken months, error account |
+| `carmignac_repair.py` | Steps 1–3: universe, score propagation, surgery |
+| `carmignac_branch.py` | Apply the repaired mapping to the raw AUM file |
+| `carmignac_analysis.py` | HTML report from pipeline outputs |
+
+---
+
+## Usage
+
+Run in order:
+
+```bash
+# 1. Diagnostics (produces broken_months CSV fed into repair)
+python carmignac_diagnostics.py \
+    --aum raw_AUM.csv \
+    --flows raw_flows.csv
+
+# 2. Repair
+python carmignac_repair.py \
+    --aum raw_AUM.csv \
+    --flows raw_flows.csv \
+    --broken-months carmignac_broken_months.csv
+
+# 3. Apply mapping to raw AUM
+python carmignac_branch.py \
+    --aum raw_AUM.csv \
+    --mapping carmignac_mapping.csv \
+    --surgery carmignac_surgery_log.csv
+
+# 4. Analysis report
+python carmignac_analysis.py \
+    --error-account-isin carmignac_error_account.csv \
+    --error-account-agg carmignac_error_account_agg.csv
+```
+
+---
+
+## Outputs
+
+| File | Content |
+|------|---------|
+| `carmignac_broken_months.csv` | (ISIN, month) pairs where the aggregate stock-flow equation is broken |
+| `carmignac_error_account.csv` | Cumulative unresolved residuals per ISIN |
+| `carmignac_error_account_agg.csv` | Same, aggregated over all ISINs |
+| `carmignac_scores.csv` | Data quality score σ_r(t) for every (account, month) |
+| `carmignac_mapping.csv` | Canonical identity mapping (date, reg_orig, reg_used) |
+| `carmignac_surgery_log.csv` | All surgery operations with Jaccard, score gain, lookback |
+| `AUM_repaired.csv` | Full AUM with corrected Registrar Account IDs |
+| `AUM_paths.csv` | Universe accounts with their identity path over time |
+| `carmignac_diagnostics.html` | Interactive diagnostic report |
+| `carmignac_report.html` | Interactive repair & surgery report |
+
+---
+
+## Key parameters
+
+| Parameter | Value | Role |
+|-----------|-------|------|
+| `MIN_AUM_EUR` | 5 M€ | Universe threshold at t* |
+| `ALPHA` | 5% | Reconciliation tolerance |
+| `SCORE_DROP_THRESHOLD` | 0.5 | Surgery trigger |
+| `MIN_JACCARD` | 0.3 | ISIN portfolio pre-filter |
+| `MAX_SURGERY_LOOKBACK` | 6 months | Max search window for predecessor code |
+| `SYMMETRY_ATTENUATION` | 0.05 | Error discount for symmetric transfers |
+| `BROKEN_MONTH_ATTENUATION` | 0.20 | Error discount for broken market months |
\ No newline at end of file
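
The stock-flow identity in the README's Problem section, `Q(t-1) + F(t-1→t) = Q(t)`, can be illustrated with a small standalone check. This is only a sketch under stated assumptions, not the code in `carmignac_repair.py`: the function name `find_violations`, the `(account, isin, month)` key layout with an integer month index, the convention that a flow is keyed by its arrival month, and the reading of `ALPHA` as a relative tolerance are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (account, isin, month index). Not the pipeline's schema.
Key = Tuple[str, str, int]

ALPHA = 0.05  # value from the README; treating it as a relative bound is an assumption


def find_violations(
    aum: Dict[Key, float],
    flows: Dict[Key, float],
    alpha: float = ALPHA,
) -> List[Tuple[Key, float]]:
    """Return (key, residual) pairs where Q(t-1) + F(t-1 -> t) misses Q(t)."""
    violations = []
    for (account, isin, month), q_now in sorted(aum.items()):
        prev_key = (account, isin, month - 1)
        if prev_key not in aum:
            continue  # no previous stock, so the identity cannot be evaluated
        q_prev = aum[prev_key]
        # Flow over (t-1 -> t), keyed by the arrival month t (assumed convention).
        flow = flows.get((account, isin, month), 0.0)
        residual = q_now - (q_prev + flow)
        # Scale the tolerance by the larger of the two stocks, floored at 1.
        scale = max(abs(q_prev), abs(q_now), 1.0)
        if abs(residual) > alpha * scale:
            violations.append(((account, isin, month), residual))
    return violations


# Toy data: account "A1" vanishes at month 2 while "B7" appears with a similar
# position, which is the kind of ID change the README describes.
aum = {
    ("A1", "FR0001", 0): 100.0,
    ("A1", "FR0001", 1): 110.0,
    ("A1", "FR0001", 2): 0.0,
    ("B7", "FR0001", 2): 112.0,
}
flows = {
    ("A1", "FR0001", 1): 10.0,
    ("B7", "FR0001", 2): 2.0,
}

violations = find_violations(aum, flows)
print(violations)
```

In this toy data the only flagged row is `A1` at month 2: its stock drops from 110 to 0 with no recorded outflow, while `B7` has no previous stock and is skipped. Matching such a rupture to its successor code is the job the README assigns to the surgery step, which this sketch does not attempt.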