Carmignac × ENSAE Data Challenge 2025.
1 - Data Repair Pipeline
Registrar Account ID repair for AUM and Flows data.
1.1 - Problem
AUM and Flows tables must satisfy the stock-flow identity for every (account, ISIN, month):
Q(t-1) + F(t-1→t) = Q(t)
In practice this identity is violated because Registrar Account IDs change over time — a single distributor may appear under different codes at different dates. The pipeline detects these ruptures, scores data quality, and resolves identity changes through targeted surgeries.
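The identity above can be checked mechanically. The sketch below is illustrative only and is not taken from `carmignac_diagnostics.py`: it assumes positions and flows are held in plain dictionaries keyed by `(account, isin, t)`, where `t` is an integer month index and the flow stored at `t` covers the interval from `t-1` to `t`. The function names and the relative-tolerance rule are assumptions.

```python
def stock_flow_residuals(aum, flows):
    """Return Q(t) - Q(t-1) - F(t-1→t) for every key whose previous month exists.

    aum   : {(account, isin, t): quantity}   (hypothetical layout)
    flows : {(account, isin, t): net flow over t-1 → t}, missing means 0
    """
    residuals = {}
    for (account, isin, t), q_t in aum.items():
        q_prev = aum.get((account, isin, t - 1))
        if q_prev is None:
            continue  # no predecessor month, identity cannot be evaluated
        flow = flows.get((account, isin, t), 0.0)
        residuals[(account, isin, t)] = q_t - q_prev - flow
    return residuals


def violations(aum, flows, alpha=0.05):
    """Keys whose residual exceeds alpha times the larger of |Q(t)| and |Q(t-1)|.

    Using a relative tolerance here is an assumption about how a
    reconciliation tolerance might be applied.
    """
    flagged = []
    for key, residual in stock_flow_residuals(aum, flows).items():
        account, isin, t = key
        scale = max(abs(aum[key]), abs(aum[(account, isin, t - 1)]))
        if abs(residual) > alpha * scale:
            flagged.append(key)
    return sorted(flagged)
```

A Registrar Account ID change typically shows up as a position that vanishes under one code and appears under another with no matching flow, so both codes would be flagged in the same month.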
1.2 - Files
| File | Role |
|---|---|
| `carmignac_diagnostics.py` | Step 0: market-level audit, broken months, error account |
| `carmignac_repair.py` | Steps 1–3: universe, score propagation, surgery |
| `carmignac_branch.py` | Apply the repaired mapping to the raw AUM file |
| `carmignac_analysis.py` | HTML report from pipeline outputs |
1.3 - Usage
Run in order:
# 1. Diagnostics (produces broken_months CSV fed into repair)
python carmignac_diagnostics.py \
--aum raw_AUM.csv \
--flows raw_flows.csv
# 2. Repair
python carmignac_repair.py \
--aum raw_AUM.csv \
--flows raw_flows.csv \
--broken-months carmignac_broken_months.csv
# 3. Apply mapping to raw AUM
python carmignac_branch.py \
--aum raw_AUM.csv \
--mapping carmignac_mapping.csv \
--surgery carmignac_surgery_log.csv
# 4. Analysis report
python carmignac_analysis.py \
--error-account-isin carmignac_error_account.csv \
--error-account-agg carmignac_error_account_agg.csv
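The four steps above can also be chained from a small driver that stops at the first failing step. This is a convenience sketch, not part of the repository: the function names are hypothetical, and it assumes the scripts sit in the current directory and write their outputs under the default file names listed in this README.

```python
import subprocess
import sys


def pipeline_commands(aum="raw_AUM.csv", flows="raw_flows.csv"):
    """Build the four command lines in the order given in the usage section."""
    py = sys.executable
    return [
        [py, "carmignac_diagnostics.py", "--aum", aum, "--flows", flows],
        [py, "carmignac_repair.py", "--aum", aum, "--flows", flows,
         "--broken-months", "carmignac_broken_months.csv"],
        [py, "carmignac_branch.py", "--aum", aum,
         "--mapping", "carmignac_mapping.csv",
         "--surgery", "carmignac_surgery_log.csv"],
        [py, "carmignac_analysis.py",
         "--error-account-isin", "carmignac_error_account.csv",
         "--error-account-agg", "carmignac_error_account_agg.csv"],
    ]


def run_pipeline(aum="raw_AUM.csv", flows="raw_flows.csv"):
    """Run each step in turn; check=True raises if a step exits non-zero."""
    for cmd in pipeline_commands(aum, flows):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Stopping on the first error matters because each step consumes files written by an earlier one.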
1.4 - Outputs
| File | Content |
|---|---|
| `carmignac_broken_months.csv` | (ISIN, month) pairs where the aggregate stock-flow equation is broken |
| `carmignac_error_account.csv` | Cumulative unresolved residuals per ISIN |
| `carmignac_error_account_agg.csv` | Same, aggregated over all ISINs |
| `carmignac_scores.csv` | Data quality score σ_r(t) for every (account, month) |
| `carmignac_mapping.csv` | Canonical identity mapping (date, reg_orig, reg_used) |
| `carmignac_surgery_log.csv` | All surgery operations with Jaccard, score gain, lookback |
| `AUM_repaired.csv` | Full AUM with corrected Registrar Account IDs |
| `AUM_paths.csv` | Universe accounts with their identity path over time |
| `carmignac_diagnostics.html` | Interactive diagnostic report |
| `carmignac_report.html` | Interactive repair & surgery report |
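To make the role of the mapping file concrete, here is a minimal sketch of how a (date, reg_orig, reg_used) mapping could be applied to AUM rows. It does not reproduce `carmignac_branch.py`: the AUM column names `date` and `registrar_account_id` are hypothetical, the function name is invented for illustration, and rows without a mapping entry are assumed to keep their original ID. Rows are plain dictionaries, as `csv.DictReader` would yield.

```python
def apply_mapping(aum_rows, mapping_rows, id_column="registrar_account_id"):
    """Return copies of aum_rows with the account ID replaced by reg_used.

    mapping_rows : dicts with keys 'date', 'reg_orig', 'reg_used'
    aum_rows     : dicts with a 'date' key and an ID column (assumed names)
    """
    # Index the mapping by (date, original code) for constant-time lookup.
    lookup = {(m["date"], m["reg_orig"]): m["reg_used"] for m in mapping_rows}

    repaired = []
    for row in aum_rows:
        new_row = dict(row)  # leave the input rows untouched
        key = (row["date"], row[id_column])
        # Assumption: IDs absent from the mapping are left unchanged.
        new_row[id_column] = lookup.get(key, row[id_column])
        repaired.append(new_row)
    return repaired
```

Keeping the mapping keyed by date means the same original code can resolve to different canonical codes at different dates.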
1.5 - Key parameters
| Parameter | Value | Role |
|---|---|---|
| `MIN_AUM_EUR` | 5 M€ | Universe threshold at t* |
| `ALPHA` | 5% | Reconciliation tolerance |
| `SCORE_DROP_THRESHOLD` | 0.5 | Surgery trigger |
| `MIN_JACCARD` | 0.3 | ISIN portfolio pre-filter |
| `MAX_SURGERY_LOOKBACK` | 6 months | Max search window for predecessor code |
| `SYMMETRY_ATTENUATION` | 0.05 | Error discount for symmetric transfers |
| `BROKEN_MONTH_ATTENUATION` | 0.20 | Error discount for broken market months |
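As an illustration of the `MIN_JACCARD` pre-filter, the sketch below computes the Jaccard similarity between the set of ISINs held under a new code and the sets held under candidate predecessor codes, keeping only candidates at or above the threshold. The function names, the input layout, and the use of `>=` rather than `>` are assumptions; the actual filter in `carmignac_repair.py` may differ.

```python
MIN_JACCARD = 0.3  # value from the parameter table


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_predecessors(new_isins, old_portfolios, min_jaccard=MIN_JACCARD):
    """Rank candidate predecessor codes by ISIN overlap with the new code.

    old_portfolios : {registrar_code: iterable of ISINs}   (hypothetical layout)
    Returns (similarity, code) pairs, highest similarity first.
    """
    scored = [(jaccard(new_isins, isins), code)
              for code, isins in old_portfolios.items()]
    return sorted((pair for pair in scored if pair[0] >= min_jaccard),
                  reverse=True)
```

For example, a new code holding three ISINs that all appear in a four-ISIN candidate portfolio scores 3/4 = 0.75 and passes the filter, while a candidate sharing one ISIN out of six distinct ones scores about 0.17 and is dropped.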