Project_Carmignac/README.md
2026-04-13 22:09:16 +02:00

## Carmignac — Data Repair Pipeline
Registrar Account ID repair for AUM and Flows data.
Carmignac × ENSAE Data Challenge 2025.

---
### Problem
AUM and Flows tables must satisfy the stock-flow identity for every `(account, ISIN, month)`:
```
Q(t-1) + F(t-1→t) = Q(t)
```
In practice this identity is violated because Registrar Account IDs change over time — a single distributor may appear under different codes at different dates. The pipeline detects these ruptures, scores data quality, and resolves identity changes through targeted *surgeries*.
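
As a rough illustration of how such a rupture shows up, the sketch below checks the identity on a toy dataset in which code `A` vanishes at month 3 and code `B` appears with the same holdings. The dictionary layout, the integer month index, the function name `find_ruptures`, and the relative-tolerance rule are all assumptions made for this example; they are not taken from the pipeline's code.

```python
def find_ruptures(aum, flows, alpha=0.05):
    """Return (account, isin, month, residual) where Q(t-1) + F(t-1→t) != Q(t).

    aum:   {(account, isin, month): quantity Q(t)}
    flows: {(account, isin, month): net flow F(t-1→t), keyed by the arrival month t}
    alpha: relative tolerance (hypothetical rule, for illustration only)
    Months are assumed to be consecutive integers; missing records count as 0.
    """
    pairs = sorted({(acc, isin) for acc, isin, _ in aum})
    months = sorted({t for _, _, t in aum})
    ruptures = []
    for acc, isin in pairs:
        for t in months[1:]:
            q_prev = aum.get((acc, isin, t - 1), 0.0)
            q_now = aum.get((acc, isin, t), 0.0)
            flow = flows.get((acc, isin, t), 0.0)
            residual = q_prev + flow - q_now
            scale = max(abs(q_prev), abs(q_now))
            if scale > 0 and abs(residual) > alpha * scale:
                ruptures.append((acc, isin, t, residual))
    return ruptures


# Toy data: account "A" is renamed to "B" between months 2 and 3.
aum = {
    ("A", "ISIN_X", 1): 100.0,
    ("A", "ISIN_X", 2): 110.0,
    ("B", "ISIN_X", 3): 110.0,
}
flows = {("A", "ISIN_X", 2): 10.0}

ruptures = find_ruptures(aum, flows)
print(ruptures)
# [('A', 'ISIN_X', 3, 110.0), ('B', 'ISIN_X', 3, -110.0)]
```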

---
### Files
| File | Role |
|------|------|
| `carmignac_diagnostics.py` | Step 0: market-level audit, broken months, error account |
| `carmignac_repair.py` | Steps 1–3: universe, score propagation, surgery |
| `carmignac_branch.py` | Apply the repaired mapping to the raw AUM file |
| `carmignac_analysis.py` | HTML report from pipeline outputs |
---
### Usage
Run in order:
```bash
# 1. Diagnostics (produces broken_months CSV fed into repair)
python carmignac_diagnostics.py \
    --aum raw_AUM.csv \
    --flows raw_flows.csv

# 2. Repair
python carmignac_repair.py \
    --aum raw_AUM.csv \
    --flows raw_flows.csv \
    --broken-months carmignac_broken_months.csv

# 3. Apply mapping to raw AUM
python carmignac_branch.py \
    --aum raw_AUM.csv \
    --mapping carmignac_mapping.csv \
    --surgery carmignac_surgery_log.csv

# 4. Analysis report
python carmignac_analysis.py \
    --error-account-isin carmignac_error_account.csv \
    --error-account-agg carmignac_error_account_agg.csv
```
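
If it is more convenient to chain the four steps from Python, a small driver along these lines could work. It reuses the script names and flags from the commands above; the helper names `build_commands` and `run_pipeline` are hypothetical and not part of the repository. Calling `run_pipeline(build_commands())` from the repository root would run the steps in order and stop at the first one that exits with a non-zero status.

```python
import subprocess
import sys


def build_commands(aum="raw_AUM.csv", flows="raw_flows.csv"):
    """Return the four pipeline commands, in the order they must run."""
    py = sys.executable  # use the same interpreter as the driver
    return [
        [py, "carmignac_diagnostics.py", "--aum", aum, "--flows", flows],
        [py, "carmignac_repair.py", "--aum", aum, "--flows", flows,
         "--broken-months", "carmignac_broken_months.csv"],
        [py, "carmignac_branch.py", "--aum", aum,
         "--mapping", "carmignac_mapping.csv",
         "--surgery", "carmignac_surgery_log.csv"],
        [py, "carmignac_analysis.py",
         "--error-account-isin", "carmignac_error_account.csv",
         "--error-account-agg", "carmignac_error_account_agg.csv"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises on a non-zero exit code."""
    for cmd in commands:
        print("running:", " ".join(cmd[1:]))
        runner(cmd, check=True)
```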
---
### Outputs
| File | Content |
|------|---------|
| `carmignac_broken_months.csv` | (ISIN, month) pairs where the aggregate stock-flow equation is broken |
| `carmignac_error_account.csv` | Cumulative unresolved residuals per ISIN |
| `carmignac_error_account_agg.csv` | Same, aggregated over all ISINs |
| `carmignac_scores.csv` | Data quality score σ_r(t) for every (account, month) |
| `carmignac_mapping.csv` | Canonical identity mapping (date, reg_orig, reg_used) |
| `carmignac_surgery_log.csv` | All surgery operations with Jaccard, score gain, lookback |
| `AUM_repaired.csv` | Full AUM with corrected Registrar Account IDs |
| `AUM_paths.csv` | Universe accounts with their identity path over time |
| `carmignac_diagnostics.html` | Interactive diagnostic report |
| `carmignac_report.html` | Interactive repair & surgery report |
---
### Key parameters
| Parameter | Value | Role |
|-----------|-------|------|
| `MIN_AUM_EUR` | 5 M€ | Universe threshold at t* |
| `ALPHA` | 5% | Reconciliation tolerance |
| `SCORE_DROP_THRESHOLD` | 0.5 | Surgery trigger |
| `MIN_JACCARD` | 0.3 | ISIN portfolio pre-filter |
| `MAX_SURGERY_LOOKBACK` | 6 months | Max search window for predecessor code |
| `SYMMETRY_ATTENUATION` | 0.05 | Error discount for symmetric transfers |
| `BROKEN_MONTH_ATTENUATION` | 0.20 | Error discount for broken market months |
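
As an illustration of the `MIN_JACCARD` pre-filter, the sketch below compares the set of ISINs held under a new account code with the sets held under older codes and keeps only the candidates whose Jaccard similarity reaches the threshold. The function names, the `>=` comparison, and the sample portfolios are assumptions for this example, not the pipeline's actual code.

```python
MIN_JACCARD = 0.3  # value from the parameter table above


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two ISIN collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_predecessors(new_isins, old_portfolios, min_jaccard=MIN_JACCARD):
    """Return (code, similarity) pairs passing the filter, best match first."""
    scored = [(code, jaccard(new_isins, isins))
              for code, isins in old_portfolios.items()]
    kept = [(code, score) for code, score in scored if score >= min_jaccard]
    return sorted(kept, key=lambda item: -item[1])


# Hypothetical portfolios keyed by old account code.
old_portfolios = {
    "OLD_A": {"I1", "I2", "I3", "I4"},  # 3 shared of 4 total -> 0.75
    "OLD_B": {"I3", "I7", "I8", "I9"},  # 1 shared of 6 total -> dropped
    "OLD_C": {"I1", "I2"},              # 2 shared of 3 total -> kept
}
candidates = candidate_predecessors({"I1", "I2", "I3"}, old_portfolios)
print([code for code, _ in candidates])
# ['OLD_A', 'OLD_C']
```

In this sketch the filter only narrows the list of possible predecessor codes; choosing among the survivors (for example by score gain within `MAX_SURGERY_LOOKBACK`) is left out.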