Carmignac × ENSAE Data Challenge 2025.

1 - Data Repair Pipeline

Registrar Account ID repair for AUM and Flows data.


1.1 - Problem

AUM and Flows tables must satisfy the stock-flow identity for every (account, ISIN, month):

Q(t-1) + F(t-1→t) = Q(t)

In practice this identity is violated because Registrar Account IDs change over time — a single distributor may appear under different codes at different dates. The pipeline detects these ruptures, scores data quality, and resolves identity changes through targeted surgeries.
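As an illustration only, the identity check could be sketched as below. The data layout (dictionaries keyed by account, ISIN and an integer month index), the function name `find_ruptures`, and the relative way `ALPHA` is applied are assumptions made for this example; they are not taken from the pipeline's code. The ISIN and account codes are placeholders.

```python
ALPHA = 0.05  # reconciliation tolerance (5%); applying it as a relative error is an assumption


def find_ruptures(aum, flows, alpha=ALPHA):
    """Return the (account, isin, month) keys where Q(t-1) + F(t-1->t) != Q(t).

    aum:   {(account, isin, month): quantity held at the end of `month`}
    flows: {(account, isin, month): net flow booked between month-1 and month}
    """
    ruptures = []
    for (account, isin, month), q_now in sorted(aum.items()):
        q_prev = aum.get((account, isin, month - 1))
        if q_prev is None:
            continue  # no previous stock for this key: nothing to reconcile
        expected = q_prev + flows.get((account, isin, month), 0.0)
        scale = max(abs(q_now), abs(expected), 1e-9)  # avoid division by zero
        if abs(q_now - expected) / scale > alpha:
            ruptures.append((account, isin, month))
    return ruptures


# Hypothetical data: account A1 is emptied at month 2 without a matching outflow,
# while a new code B7 appears with a similar position.
aum = {
    ("A1", "ISIN_A", 0): 100.0,
    ("A1", "ISIN_A", 1): 120.0,
    ("A1", "ISIN_A", 2): 0.0,
    ("B7", "ISIN_A", 2): 118.0,
}
flows = {
    ("A1", "ISIN_A", 1): 20.0,
    ("A1", "ISIN_A", 2): -2.0,
}

print(find_ruptures(aum, flows))  # [('A1', 'ISIN_A', 2)]
```

In this toy case the rupture on A1 and the simultaneous appearance of B7 are the kind of pattern an identity change would produce.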


1.2 - Files

| File | Role |
| --- | --- |
| carmignac_diagnostics.py | Step 0: market-level audit, broken months, error account |
| carmignac_repair.py | Steps 1-3: universe, score propagation, surgery |
| carmignac_branch.py | Apply the repaired mapping to the raw AUM file |
| carmignac_analysis.py | HTML report from pipeline outputs |

1.3 - Usage

Run in order:

# 1. Diagnostics (produces broken_months CSV fed into repair)
python carmignac_diagnostics.py \
  --aum   raw_AUM.csv \
  --flows raw_flows.csv

# 2. Repair
python carmignac_repair.py \
  --aum   raw_AUM.csv \
  --flows raw_flows.csv \
  --broken-months carmignac_broken_months.csv

# 3. Apply mapping to raw AUM
python carmignac_branch.py \
  --aum     raw_AUM.csv \
  --mapping carmignac_mapping.csv \
  --surgery carmignac_surgery_log.csv

# 4. Analysis report
python carmignac_analysis.py \
  --error-account-isin carmignac_error_account.csv \
  --error-account-agg  carmignac_error_account_agg.csv

1.4 - Outputs

| File | Content |
| --- | --- |
| carmignac_broken_months.csv | (ISIN, month) pairs where the aggregate stock-flow equation is broken |
| carmignac_error_account.csv | Cumulative unresolved residuals per ISIN |
| carmignac_error_account_agg.csv | Same, aggregated over all ISINs |
| carmignac_scores.csv | Data quality score σ_r(t) for every (account, month) |
| carmignac_mapping.csv | Canonical identity mapping (date, reg_orig, reg_used) |
| carmignac_surgery_log.csv | All surgery operations with Jaccard, score gain, lookback |
| AUM_repaired.csv | Full AUM with corrected Registrar Account IDs |
| AUM_paths.csv | Universe accounts with their identity path over time |
| carmignac_diagnostics.html | Interactive diagnostic report |
| carmignac_report.html | Interactive repair & surgery report |
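To make the role of the mapping file concrete, here is a minimal sketch of how a (date, reg_orig, reg_used) mapping could be applied to AUM rows. Only the three mapping column names come from the table above; the AUM column name `registrar_account_id`, the function `apply_mapping`, and the sample values are hypothetical, and the actual logic in `carmignac_branch.py` may differ.

```python
def apply_mapping(aum_rows, mapping_rows):
    """Return AUM rows with the account ID replaced by its canonical ID.

    Rows with no entry in the mapping keep their original ID.
    """
    canonical = {
        (m["date"], m["reg_orig"]): m["reg_used"] for m in mapping_rows
    }
    repaired = []
    for row in aum_rows:
        original_id = row["registrar_account_id"]  # assumed column name
        new_id = canonical.get((row["date"], original_id), original_id)
        repaired.append({**row, "registrar_account_id": new_id})
    return repaired


# Hypothetical rows: from 2024-02 onward, code B7 is treated as a continuation of A1.
mapping_rows = [
    {"date": "2024-02", "reg_orig": "B7", "reg_used": "A1"},
]
aum_rows = [
    {"date": "2024-01", "registrar_account_id": "A1", "isin": "ISIN_A", "quantity": 120.0},
    {"date": "2024-02", "registrar_account_id": "B7", "isin": "ISIN_A", "quantity": 118.0},
]

for row in apply_mapping(aum_rows, mapping_rows):
    print(row["date"], row["registrar_account_id"], row["quantity"])
```

Rows in this shape are what `csv.DictReader` yields when reading a CSV file, except that every value arrives as a string.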

1.5 - Key parameters

| Parameter | Value | Role |
| --- | --- | --- |
| MIN_AUM_EUR | 5 M€ | Universe threshold at t* |
| ALPHA | 5% | Reconciliation tolerance |
| SCORE_DROP_THRESHOLD | 0.5 | Surgery trigger |
| MIN_JACCARD | 0.3 | ISIN portfolio pre-filter |
| MAX_SURGERY_LOOKBACK | 6 months | Max search window for predecessor code |
| SYMMETRY_ATTENUATION | 0.05 | Error discount for symmetric transfers |
| BROKEN_MONTH_ATTENUATION | 0.20 | Error discount for broken market months |
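The ISIN portfolio pre-filter can be illustrated with a short sketch: candidate predecessor codes are kept only if the set of ISINs they held overlaps enough with the set held by the new code. The Jaccard index itself is standard (intersection size over union size); the function names, the inclusive comparison against `MIN_JACCARD`, and the sample portfolios are assumptions for this example and not the pipeline's actual implementation.

```python
MIN_JACCARD = 0.3  # value from the parameter table


def jaccard(a, b):
    """Jaccard index of two collections of ISINs: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_predecessors(new_isins, vanished, min_jaccard=MIN_JACCARD):
    """Rank vanished codes whose ISIN portfolio resembles the new code's portfolio.

    vanished: {old_code: set of ISINs it held}, assumed to be already restricted
    to codes that disappeared within the lookback window.
    """
    scored = [(code, jaccard(new_isins, isins)) for code, isins in vanished.items()]
    kept = [(code, score) for code, score in scored if score >= min_jaccard]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical portfolios (placeholder ISIN labels).
new_isins = {"I1", "I2", "I3"}
vanished = {
    "OLD_A": {"I1", "I2", "I3", "I4"},  # Jaccard 3/4
    "OLD_B": {"I3", "I7", "I8", "I9"},  # Jaccard 1/6, filtered out
    "OLD_C": {"I1", "I2"},              # Jaccard 2/3
}

for code, score in candidate_predecessors(new_isins, vanished):
    print(code, round(score, 3))
```

A cheap set-based filter of this kind narrows the candidates before any more expensive score comparison is attempted on them.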