# Carmignac — Data Repair Pipeline

Registrar Account ID repair for AUM and Flows data.
Carmignac × ENSAE Data Challenge 2025.

---

## Problem

AUM and Flows tables must satisfy the stock-flow identity for every `(account, ISIN, month)`:

```
Q(t-1) + F(t-1→t) = Q(t)
```

where Q(t) is the holding (stock) at month t and F(t-1→t) is the net flow recorded between months t-1 and t.

In practice this identity is violated because Registrar Account IDs change over time: a single distributor may appear under different codes at different dates. When the code changes, the position vanishes from the old code and reappears under the new one, so the identity fails for both codes in the month of the change. The pipeline detects these breaks, scores data quality, and resolves identity changes through targeted *surgeries*.

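The identity can be checked mechanically by computing the residual ε = Q(t-1) + F(t-1→t) − Q(t) for each key. The sketch below is a minimal, standard-library-only illustration, not the pipeline's code: it assumes holdings and flows are plain dictionaries keyed by `(account, isin, month)` with integer month indices, and `stock_flow_residuals`, `aggregate_residuals` and the toy codes `A1`/`B7` are hypothetical. It shows the signature of an ID change: the old code leaves a positive residual and the new code an equal negative one, while the residual summed over accounts for the ISIN stays at zero.

```python
from collections import defaultdict

# Hypothetical data layout (not the pipeline's): key = (account, isin, month index)
Key = tuple[str, str, int]


def stock_flow_residuals(holdings: dict[Key, float], flows: dict[Key, float]) -> dict[Key, float]:
    """eps = Q(t-1) + F(t-1 -> t) - Q(t) for every (account, isin) and every month t
    whose predecessor t-1 is also observed. Missing entries count as zero."""
    months = {m for (_, _, m) in holdings} | {m for (_, _, m) in flows}
    pairs = {(a, i) for (a, i, _) in holdings} | {(a, i) for (a, i, _) in flows}
    residuals: dict[Key, float] = {}
    for account, isin in pairs:
        for month in months:
            if month - 1 not in months:
                continue  # no previous stock to start from
            prev = holdings.get((account, isin, month - 1), 0.0)
            flow = flows.get((account, isin, month), 0.0)  # flow keyed by its end month t
            curr = holdings.get((account, isin, month), 0.0)
            residuals[(account, isin, month)] = prev + flow - curr
    return residuals


def aggregate_residuals(residuals: dict[Key, float]) -> dict[tuple[str, int], float]:
    """Sum the account-level residuals over accounts, per (isin, month)."""
    agg: dict[tuple[str, int], float] = defaultdict(float)
    for (_, isin, month), eps in residuals.items():
        agg[(isin, month)] += eps
    return dict(agg)


# Toy example: account A1 holds a position of 100, receives a +10 flow, and the
# resulting 110 is reported under a new code B7 in month 2.
holdings = {("A1", "ISIN1", 1): 100.0, ("B7", "ISIN1", 2): 110.0}
flows = {("A1", "ISIN1", 2): 10.0}

res = stock_flow_residuals(holdings, flows)
print(res[("A1", "ISIN1", 2)], res[("B7", "ISIN1", 2)])  # 110.0 -110.0
print(aggregate_residuals(res))  # {('ISIN1', 2): 0.0}
```

One plausible reading of the split between account-level and aggregate checks follows from this: an account-level break whose aggregate residual is close to zero points to an identity change, whereas a non-zero aggregate residual marks a broken (ISIN, month) at market level.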
---

## Files

| File | Role |
|------|------|
| `carmignac_diagnostics.py` | Step 0: market-level audit, broken months, error account |
| `carmignac_repair.py` | Steps 1–3: universe, score propagation, surgery |
| `carmignac_branch.py` | Applies the repaired mapping to the raw AUM file |
| `carmignac_analysis.py` | Builds the HTML report from the pipeline outputs |

---

## Usage

Run in order:

```bash
# 1. Diagnostics (produces the broken-months CSV fed into the repair step)
python carmignac_diagnostics.py \
    --aum raw_AUM.csv \
    --flows raw_flows.csv

# 2. Repair
python carmignac_repair.py \
    --aum raw_AUM.csv \
    --flows raw_flows.csv \
    --broken-months carmignac_broken_months.csv

# 3. Apply mapping to raw AUM
python carmignac_branch.py \
    --aum raw_AUM.csv \
    --mapping carmignac_mapping.csv \
    --surgery carmignac_surgery_log.csv

# 4. Analysis report
python carmignac_analysis.py \
    --error-account-isin carmignac_error_account.csv \
    --error-account-agg carmignac_error_account_agg.csv
```

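When the four steps are always run together, they can be chained from Python. The following is a hypothetical helper, not part of the repository: `build_steps` and `run_pipeline` are illustrative names, the scripts, flags and file names are exactly those of the commands above, and the default input names `raw_AUM.csv` and `raw_flows.csv` are assumed. Using `subprocess.run(..., check=True)` makes the chain stop at the first step that exits with a non-zero status, so a later step never runs on the outputs of a failed one.

```python
import subprocess
import sys


def build_steps(aum: str = "raw_AUM.csv", flows: str = "raw_flows.csv") -> list[list[str]]:
    """Command lines for the four steps, in the order given in the Usage section."""
    py = sys.executable  # current interpreter, rather than whatever `python` resolves to on PATH
    return [
        [py, "carmignac_diagnostics.py", "--aum", aum, "--flows", flows],
        [py, "carmignac_repair.py", "--aum", aum, "--flows", flows,
         "--broken-months", "carmignac_broken_months.csv"],
        [py, "carmignac_branch.py", "--aum", aum,
         "--mapping", "carmignac_mapping.csv",
         "--surgery", "carmignac_surgery_log.csv"],
        [py, "carmignac_analysis.py",
         "--error-account-isin", "carmignac_error_account.csv",
         "--error-account-agg", "carmignac_error_account_agg.csv"],
    ]


def run_pipeline(steps: list[list[str]], dry_run: bool = False) -> None:
    """Print each command; unless dry_run, execute it and abort on the first failure."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_steps(), dry_run=True)` only prints the four command lines, which is a cheap way to check file names before a long run.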
---

## Outputs

| File | Content |
|------|---------|
| `carmignac_broken_months.csv` | (ISIN, month) pairs where the aggregate stock-flow equation is broken |
| `carmignac_error_account.csv` | Cumulative unresolved residuals per ISIN |
| `carmignac_error_account_agg.csv` | Same, aggregated over all ISINs |
| `carmignac_scores.csv` | Data quality score σ_r(t) for every (account, month) |
| `carmignac_mapping.csv` | Canonical identity mapping (date, reg_orig, reg_used) |
| `carmignac_surgery_log.csv` | All surgery operations, with Jaccard similarity, score gain and lookback |
| `AUM_repaired.csv` | Full AUM table with corrected Registrar Account IDs |
| `AUM_paths.csv` | Universe accounts with their identity path over time |
| `carmignac_diagnostics.html` | Interactive diagnostic report |
| `carmignac_report.html` | Interactive repair and surgery report |

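The mapping file is described only by its columns `(date, reg_orig, reg_used)`. Assuming one row per (date, original code) and AUM rows that carry a date and a Registrar Account ID, applying it reduces to a keyed lookup that passes unmapped IDs through unchanged. This is a sketch, not necessarily what `carmignac_branch.py` does (that script also reads the surgery log, which is ignored here); the AUM column names `date`, `registrar_account_id`, `isin`, `aum_eur`, the helper names and the sample rows are hypothetical.

```python
import csv
import io


def load_mapping(f) -> dict[tuple[str, str], str]:
    """(date, reg_orig) -> reg_used, read from a CSV with those three columns."""
    return {(row["date"], row["reg_orig"]): row["reg_used"] for row in csv.DictReader(f)}


def apply_mapping(aum_rows, mapping, id_col="registrar_account_id"):
    """Yield copies of the AUM rows, with the account ID replaced where the mapping has an entry."""
    for row in aum_rows:
        fixed = dict(row)
        fixed[id_col] = mapping.get((row["date"], row[id_col]), row[id_col])
        yield fixed


# Hypothetical sample: code B7 is mapped back to the canonical code A1 from February on.
mapping_csv = io.StringIO(
    "date,reg_orig,reg_used\n"
    "2023-02-28,B7,A1\n"
)
aum_csv = io.StringIO(
    "date,registrar_account_id,isin,aum_eur\n"
    "2023-01-31,A1,ISIN1,100\n"
    "2023-02-28,B7,ISIN1,110\n"
)
mapping = load_mapping(mapping_csv)
repaired = list(apply_mapping(csv.DictReader(aum_csv), mapping))
print([r["registrar_account_id"] for r in repaired])  # ['A1', 'A1']
```

Keying on the date as well as the original code lets the same raw code map to different canonical accounts at different dates, which a plain code-to-code dictionary could not express.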
---

## Key parameters

| Parameter | Value | Role |
|-----------|-------|------|
| `MIN_AUM_EUR` | 5 M€ | Minimum AUM at t* for an account to enter the universe |
| `ALPHA` | 5% | Reconciliation tolerance |
| `SCORE_DROP_THRESHOLD` | 0.5 | Score drop that triggers a surgery |
| `MIN_JACCARD` | 0.3 | Minimum Jaccard similarity of ISIN portfolios (candidate pre-filter) |
| `MAX_SURGERY_LOOKBACK` | 6 months | Maximum search window for a predecessor code |
| `SYMMETRY_ATTENUATION` | 0.05 | Error discount for symmetric transfers |
| `BROKEN_MONTH_ATTENUATION` | 0.20 | Error discount for broken market months |
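The surgery algorithm itself is not spelled out here, so the following is only a sketch of how three of the thresholds could combine, under stated assumptions: a surgery is considered for an account whose score falls by more than `SCORE_DROP_THRESHOLD` from one month to the next; candidate predecessor codes are those last seen within `MAX_SURGERY_LOOKBACK` months before the drop; and candidates are pre-filtered on the Jaccard similarity of their ISIN sets, |A ∩ B| / |A ∪ B| ≥ `MIN_JACCARD`. The function names, the `last_seen` and `portfolios` inputs and the toy data are hypothetical; the attenuation parameters are not modelled.

```python
SCORE_DROP_THRESHOLD = 0.5
MIN_JACCARD = 0.3
MAX_SURGERY_LOOKBACK = 6  # months


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def surgery_candidates(
    account: str,
    month: int,
    scores: dict[tuple[str, int], float],  # hypothetical: score per (account, month)
    last_seen: dict[str, int],             # hypothetical: last month each code reported
    portfolios: dict[str, set[str]],       # hypothetical: ISINs held by each code
) -> list[tuple[str, float]]:
    """(predecessor code, Jaccard) pairs, best first; [] when no surgery is triggered."""
    drop = scores.get((account, month - 1), 0.0) - scores.get((account, month), 0.0)
    if drop <= SCORE_DROP_THRESHOLD:
        return []
    out = []
    for code, seen in last_seen.items():
        # Only codes that stopped reporting inside the look-back window qualify.
        if code == account or not (month - MAX_SURGERY_LOOKBACK <= seen < month):
            continue
        j = jaccard(portfolios.get(account, set()), portfolios.get(code, set()))
        if j >= MIN_JACCARD:  # Jaccard acts only as a pre-filter
            out.append((code, j))
    return sorted(out, key=lambda cj: cj[1], reverse=True)


scores = {("A1", 9): 0.9, ("A1", 10): 0.2}
last_seen = {"A1": 10, "B7": 8, "C2": 2, "D9": 9}
portfolios = {
    "A1": {"I1", "I2", "I3"},
    "B7": {"I1", "I2", "I4"},  # 2 shared ISINs out of 4 -> Jaccard 0.5
    "C2": {"I1", "I2", "I3"},  # identical portfolio, but outside the look-back window
    "D9": {"I5"},              # inside the window, but no overlap
}
print(surgery_candidates("A1", 10, scores, last_seen, portfolios))  # [('B7', 0.5)]
```

Since Jaccard is described only as a pre-filter and the surgery log also records a score gain, the final choice among the surviving candidates presumably depends on more than this ranking.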