2026-03-10 18:45:51 +01:00
{
"cells": [
{
"cell_type": "markdown",
"id": "6c12c301",
"metadata": {},
"source": [
"# Level Shift Repair\n",
"\n",
"**Three improvements over the first attempt:**\n",
"- Gap stability tolerance is now **relative** (5%, `REL_STABILITY_TOL = 0.05`) instead of absolute (1e-6)\n",
"- Detection is now **iterative**: multiple level shifts per trajectory are corrected\n",
"- Minimum relative gap threshold lowered from 5% to **2%** to capture smaller persistent anomalies\n",
"\n",
"**Sections:**\n",
"0. Imports & Data Loading\n",
"1. Build Panel\n",
"2. Improved Repair Algorithm\n",
"3. Rebuild Stocks File\n",
"4. Validation & Diagnostics\n",
"5. Figure"
]
},
{
"cell_type": "markdown",
"id": "26e87f83",
"metadata": {},
"source": [
"## 0. Imports & Data Loading"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "80b3f4a1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stocks: (4880297, 19)\n",
"Flows: (2574461, 25)\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import plotly.graph_objects as go\n",
"\n",
"stocks = pd.read_csv(\"stocks.csv\", low_memory=False)\n",
"flows = pd.read_csv(\"flows.csv\", low_memory=False)\n",
"\n",
"stocks[\"Centralisation Date\"] = pd.to_datetime(stocks[\"Centralisation Date\"])\n",
"flows[\"Centralisation Date\"] = pd.to_datetime(flows[\"Centralisation Date\"])\n",
"\n",
"print(f\"Stocks: {stocks.shape}\")\n",
"print(f\"Flows: {flows.shape}\")"
]
},
{
"cell_type": "markdown",
"id": "17760729",
"metadata": {},
"source": [
"## 1. Build Panel (Account x ISIN x Date)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "939e717e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Panel size: (4865922, 5)\n",
"Accounts: 12,470\n",
"ISINs: 491\n"
]
}
],
"source": [
"KEY = [\"Registrar Account - ID\", \"Product - Isin\", \"Centralisation Date\"]\n",
"GROUP = [\"Registrar Account - ID\", \"Product - Isin\"]\n",
"\n",
"stocks_panel = stocks[KEY + [\"Quantity - AUM\"]].copy()\n",
"\n",
"flows_panel = (\n",
"    flows[KEY + [\"Quantity - NetFlows\"]]\n",
"    .groupby(KEY, as_index=False)[\"Quantity - NetFlows\"]\n",
"    .sum()\n",
")\n",
"\n",
"df = stocks_panel.merge(flows_panel, on=KEY, how=\"left\")\n",
"df[\"Quantity - NetFlows\"] = df[\"Quantity - NetFlows\"].fillna(0)\n",
"df = df.sort_values(KEY).reset_index(drop=True)\n",
"\n",
"# Remove negative AUM (data quality filter)\n",
"df = df[df[\"Quantity - AUM\"] >= 0].copy()\n",
"\n",
"print(f\"Panel size: {df.shape}\")\n",
"print(f\"Accounts: {df['Registrar Account - ID'].nunique():,}\")\n",
"print(f\"ISINs: {df['Product - Isin'].nunique():,}\")"
]
},
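{
"cell_type": "markdown",
"id": "a7c1e0f2",
"metadata": {},
"source": [
"A minimal sketch, on toy data, of why the flows are aggregated per key before the left merge. It assumes the flows file can hold several rows for the same (account, ISIN, date); the frames `toy_stocks` and `toy_flows` and their column names are hypothetical and not part of the dataset. The `validate` argument of `merge` is one possible guard against accidental row duplication."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7c1e0f3",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy frames: two stock rows, three flow rows (two share the same key)\n",
"toy_stocks = pd.DataFrame({\"acct\": [\"A\", \"A\"], \"date\": [1, 2], \"aum\": [100.0, 110.0]})\n",
"toy_flows = pd.DataFrame({\"acct\": [\"A\", \"A\", \"A\"], \"date\": [1, 1, 2], \"flow\": [5.0, 5.0, 0.0]})\n",
"\n",
"# Merging the raw flows duplicates the stock row whose key appears twice in the flows\n",
"toy_naive = toy_stocks.merge(toy_flows, on=[\"acct\", \"date\"], how=\"left\")\n",
"print(len(toy_naive))  # 3\n",
"\n",
"# Aggregating first keeps one row per key; validate raises if the right-hand keys are not unique\n",
"toy_agg = toy_flows.groupby([\"acct\", \"date\"], as_index=False)[\"flow\"].sum()\n",
"toy_safe = toy_stocks.merge(toy_agg, on=[\"acct\", \"date\"], how=\"left\", validate=\"many_to_one\")\n",
"print(len(toy_safe))  # 2"
]
},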
{
"cell_type": "markdown",
"id": "5931079a",
"metadata": {},
"source": [
"## 2. Improved Repair Algorithm\n",
"\n",
"**Fix 1 — Relative gap stability tolerance**\n",
"The original algorithm used `GAP_TOL = 1e-6` (absolute), which effectively required the gap to be exactly constant.\n",
"In practice, gaps fluctuate slightly because of valuation effects and rounding.\n",
"We now use a relative tolerance of 5% (`REL_STABILITY_TOL = 0.05`):\n",
"`|gap(t+k) - gap(t)| / |gap(t)| < REL_STABILITY_TOL`\n",
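"\n",
"In symbols, as a sketch of what the repair code computes (the notation $S_t$, $F_t$, $\\hat{S}_t$, $g_t$ and $\\tau$ is introduced here for illustration only), with $S_t$ the held quantity and $F_t$ the net flow at date $t$:\n",
"\n",
"$$\\hat{S}_t = S_{t-1} + F_{t-1}, \\qquad g_t = S_t - \\hat{S}_t$$\n",
"\n",
"A date $t$ is a candidate when $|g_t| / \\max(|\\hat{S}_t|, 1)$ exceeds `REL_GAP_THR`. It is accepted when, for every $k$ in the persistence window,\n",
"\n",
"$$\\frac{|g_{t+k} - g_t|}{|g_t|} < \\tau$$\n",
"\n",
"where $\\tau$ stands for `REL_STABILITY_TOL`. When $|g_t| < 1$, the code falls back to the absolute test $|g_{t+k} - g_t| < 1$.\n",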
"\n",
"**Fix 2 — Iterative detection**\n",
"The original algorithm stopped after the first detected shift.\n",
"We now loop until no more shifts are found, correcting multiple successive migrations.\n",
"\n",
"**Fix 3 — Lower detection threshold**\n",
"Lowered from 5% to 2% to capture smaller but clearly persistent anomalies."
]
},
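{
"cell_type": "markdown",
"id": "b3e1a0c4",
"metadata": {},
"source": [
"A small sketch of Fix 1 on toy numbers. The helper names and `toy_window` are hypothetical. `is_stable_absolute` is only an assumed reading of the first version's check, based on the description above; `is_stable_relative` mirrors the relative test used in the repair function, with `rel_tol` playing the role of `REL_STABILITY_TOL`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3e1a0c5",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def is_stable_absolute(window, abs_tol=1e-6):\n",
"    # Assumed form of the first version: every gap must equal the first one up to abs_tol\n",
"    return bool(np.all(np.abs(window - window[0]) < abs_tol))\n",
"\n",
"def is_stable_relative(window, rel_tol=0.05):\n",
"    # Relative test, with an absolute fallback when the reference gap is below 1 unit\n",
"    ref = window[0]\n",
"    if abs(ref) < 1.0:\n",
"        return bool(np.all(np.abs(window - ref) < 1.0))\n",
"    return bool(np.all(np.abs(window - ref) / abs(ref) < rel_tol))\n",
"\n",
"# Two consecutive gaps that differ by 0.4 units (0.8% of the reference gap)\n",
"toy_window = np.array([50.0, 50.4])\n",
"print(is_stable_absolute(toy_window))  # False\n",
"print(is_stable_relative(toy_window))  # True"
]
},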
{
"cell_type": "code",
"execution_count": 10,
"id": "3754dcaa",
"metadata": {},
"outputs": [],
"source": [
"# ============================================================\n",
"# PARAMETERS\n",
"# ============================================================\n",
"\n",
"REL_GAP_THR       = 0.02   # minimum relative gap to trigger detection (2%, was 5%)\n",
"REL_STABILITY_TOL = 0.05   # relative tolerance on gap stability (5%; the first version used an absolute 1e-6)\n",
"MIN_PERSISTENCE   = 2      # minimum consecutive stable periods\n",
"MAX_ITERATIONS    = 10     # maximum repair iterations per trajectory\n",
"\n",
"# ============================================================\n",
"# REPAIR FUNCTION (IMPROVED)\n",
"# ============================================================\n",
"\n",
"def expected_path(stock, flows_):\n",
"    \"\"\"Expected stock at t: stock at t-1 plus net flow at t-1 (NaN at t=0).\"\"\"\n",
"    expected = np.full(len(stock), np.nan)\n",
"    expected[1:] = stock[:-1] + flows_[:-1]\n",
"    return expected\n",
"\n",
"def repair_group(g):\n",
"    \"\"\"\n",
"    Iteratively detect and correct persistent level shifts.\n",
"    Improvements: relative stability tolerance, iterative detection, lower threshold.\n",
"    \"\"\"\n",
"    g = g.copy()\n",
"    obs = g[\"Quantity - AUM\"].to_numpy(dtype=float)\n",
"    flows_ = g[\"Quantity - NetFlows\"].to_numpy(dtype=float)\n",
"    n = len(obs)\n",
"\n",
"    corrected = obs.copy()\n",
"    repair_flag = np.zeros(n, dtype=bool)\n",
"    n_repairs = 0\n",
"\n",
"    for _ in range(MAX_ITERATIONS):\n",
"\n",
"        # Build expected path from current corrected series\n",
"        expected = expected_path(corrected, flows_)\n",
"\n",
"        gap = corrected - expected\n",
"        rel_gap = np.abs(gap) / np.maximum(np.abs(expected), 1.0)\n",
"\n",
"        # Search for first persistent level shift\n",
"        idx = None\n",
"        shift = None\n",
"\n",
"        for i in range(1, n - MIN_PERSISTENCE):\n",
"            if rel_gap[i] <= REL_GAP_THR:\n",
"                continue\n",
"\n",
"            # Check gap stability with RELATIVE tolerance (Fix 1)\n",
"            window = gap[i:i + MIN_PERSISTENCE]\n",
"            ref = gap[i]\n",
"\n",
"            if abs(ref) < 1.0:\n",
"                stable = np.all(np.abs(window - ref) < 1.0)\n",
"            else:\n",
"                stable = np.all(np.abs(window - ref) / np.abs(ref) < REL_STABILITY_TOL)\n",
"\n",
"            if not stable:\n",
"                continue\n",
"\n",
"            idx = i\n",
"            shift = ref\n",
"            break\n",
"\n",
"        # No more shifts found: stop (Fix 2 — iterative)\n",
"        if idx is None:\n",
"            break\n",
"\n",
"        # Safety: do not create new negative AUM\n",
"        candidate = corrected[idx:] - shift\n",
"        if ((candidate < 0) & (corrected[idx:] >= 0)).any():\n",
"            break\n",
"\n",
"        # Safety: avoid extreme corrections\n",
"        if abs(shift) > 2 * np.nanmax(np.abs(corrected)):\n",
"            break\n",
"\n",
"        # Apply correction\n",
"        corrected[idx:] = candidate\n",
"        repair_flag[idx:] = True\n",
"        n_repairs += 1\n",
"\n",
"    # Always record the expected path, including for untouched trajectories.\n",
"    # Otherwise expected_stock_corr stays NaN for them, gap_after is NaN, and\n",
"    # their ruptures are silently left out of the \"after repair\" count.\n",
"    g[\"corrected_aum\"] = corrected\n",
"    g[\"expected_stock_corr\"] = expected_path(corrected, flows_)\n",
"    g[\"repair_flag\"] = repair_flag\n",
"    g[\"n_repairs\"] = n_repairs\n",
"\n",
"    return g"
]
},
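{
"cell_type": "markdown",
"id": "c9d2f1a6",
"metadata": {},
"source": [
"The expected path and gap used by the repair function can be sketched on a toy trajectory. The values and the `toy_*` names below are made up for illustration; only the formulas (`expected[t] = stock[t-1] + flow[t-1]`, `gap = stock - expected`, relative gap floored at 1 unit) are taken from the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9d2f1a7",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical trajectory: a flow of 10 explains the first move, the later moves are unexplained\n",
"toy_stock = np.array([100.0, 110.0, 160.0, 210.0])\n",
"toy_flows = np.array([10.0, 0.0, 0.0, 0.0])\n",
"\n",
"# Expected stock at t is the previous stock plus the previous-period flow (undefined at t=0)\n",
"toy_expected = np.full(len(toy_stock), np.nan)\n",
"toy_expected[1:] = toy_stock[:-1] + toy_flows[:-1]\n",
"\n",
"toy_gap = toy_stock - toy_expected\n",
"toy_rel_gap = np.abs(toy_gap) / np.maximum(np.abs(toy_expected), 1.0)\n",
"\n",
"print(toy_gap.tolist())  # [nan, 0.0, 50.0, 50.0]\n",
"print(toy_rel_gap)"
]
},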
{
"cell_type": "code",
"execution_count": 11,
"id": "5041d1cc",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_7300/2675817932.py:11: FutureWarning:\n",
"\n",
"DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n",
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== REPAIR SUMMARY ===\n",
"   Before repair  After repair  Repaired points  Multi-shift trajectories\n",
"0         553084        148599           230833                      1020\n"
]
}
],
"source": [
"# Apply repair\n",
"df_repair = df.copy()\n",
"df_repair[\"corrected_aum\"] = df_repair[\"Quantity - AUM\"]\n",
"df_repair[\"expected_stock_corr\"] = np.nan\n",
"df_repair[\"repair_flag\"] = False\n",
"df_repair[\"n_repairs\"] = 0\n",
"\n",
"df_repair = (\n",
"    df_repair\n",
"    .groupby(GROUP, group_keys=False)\n",
"    .apply(repair_group)\n",
"    .reset_index(drop=True)\n",
")\n",
"\n",
"# Rebuild expected stock before repair\n",
"df_repair = df_repair.sort_values(KEY).reset_index(drop=True)\n",
"df_repair[\"prev_aum\"] = df_repair.groupby(GROUP)[\"Quantity - AUM\"].shift(1)\n",
"df_repair[\"prev_flow\"] = df_repair.groupby(GROUP)[\"Quantity - NetFlows\"].shift(1).fillna(0)\n",
"df_repair[\"expected_stock\"] = df_repair[\"prev_aum\"] + df_repair[\"prev_flow\"]\n",
"\n",
"df_repair[\"gap_before\"] = df_repair[\"Quantity - AUM\"] - df_repair[\"expected_stock\"]\n",
"df_repair[\"gap_after\"] = df_repair[\"corrected_aum\"] - df_repair[\"expected_stock_corr\"]\n",
"df_repair[\"rupture_before\"] = df_repair[\"gap_before\"].abs() > 10\n",
"df_repair[\"rupture_after\"] = df_repair[\"gap_after\"].abs() > 10\n",
"\n",
"multi_shift = (df_repair.groupby(GROUP)[\"n_repairs\"].max() > 1).sum()\n",
"\n",
"print(\"=== REPAIR SUMMARY ===\")\n",
"print(pd.DataFrame({\n",
"    \"Before repair\": [int(df_repair[\"rupture_before\"].sum())],\n",
"    \"After repair\": [int(df_repair[\"rupture_after\"].sum())],\n",
"    \"Repaired points\": [int(df_repair[\"repair_flag\"].sum())],\n",
"    \"Multi-shift trajectories\": [int(multi_shift)]\n",
"}))\n",
"\n",
"df_repair[[\n",
"    \"Registrar Account - ID\", \"Product - Isin\", \"Centralisation Date\",\n",
"    \"Quantity - AUM\", \"corrected_aum\", \"Quantity - NetFlows\",\n",
"    \"expected_stock\", \"expected_stock_corr\",\n",
"    \"gap_before\", \"gap_after\", \"repair_flag\", \"n_repairs\"\n",
"]].rename(columns={\n",
"    \"Quantity - AUM\": \"aum_raw\",\n",
"    \"corrected_aum\": \"aum_repaired\",\n",
"    \"Quantity - NetFlows\": \"flows\",\n",
"    \"expected_stock\": \"expected_aum_raw\",\n",
"    \"expected_stock_corr\": \"expected_aum_repaired\"\n",
"}).to_csv(\"df_repaired.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"id": "56ac9cc7",
"metadata": {},
"source": [
"## 3. Rebuild Stocks File"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "542189d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Share of repaired observations: 7.3211%\n",
"Number of repaired rows: 408,335\n",
"Trajectories with 2+ repairs: 1,020\n"
]
}
],
"source": [
"stocks_repaired = stocks.copy()\n",
"stocks_repaired[\"Centralisation Date\"] = pd.to_datetime(stocks_repaired[\"Centralisation Date\"])\n",
"\n",
"repair_map = df_repair[KEY + [\"corrected_aum\", \"repair_flag\", \"n_repairs\"]].rename(\n",
"    columns={\"corrected_aum\": \"Quantity - AUM repaired\"}\n",
")\n",
"\n",
"stocks_repaired = stocks_repaired.merge(repair_map, on=KEY, how=\"left\")\n",
"stocks_repaired[\"Quantity - AUM original\"] = stocks_repaired[\"Quantity - AUM\"]\n",
"\n",
"# Rows absent from the panel (e.g. filtered negative AUM) have a missing flag and count as not repaired\n",
"repaired_mask = stocks_repaired[\"repair_flag\"] == True\n",
"stocks_repaired[\"Quantity - AUM\"] = np.where(\n",
"    repaired_mask,\n",
"    stocks_repaired[\"Quantity - AUM repaired\"],\n",
"    stocks_repaired[\"Quantity - AUM\"]\n",
")\n",
"\n",
"# Recompute monetary values for repaired rows only (NAV per share unchanged).\n",
"# Unrepaired rows keep their reported values, so a zero original quantity no longer turns them into NaN (0/0).\n",
"scale = stocks_repaired[\"Quantity - AUM\"] / stocks_repaired[\"Quantity - AUM original\"]\n",
"for col in [\"Value - AUM CCY\", \"Value - AUM €\"]:\n",
"    stocks_repaired[col] = stocks_repaired[col].where(~repaired_mask, stocks_repaired[col] * scale)\n",
"stocks_repaired = stocks_repaired.drop(columns=[\n",
"    \"Quantity - AUM repaired\", \"Quantity - AUM original\"\n",
"])\n",
"\n",
"print(f\"Share of repaired observations: {stocks_repaired['repair_flag'].mean():.4%}\")\n",
"print(f\"Number of repaired rows: {stocks_repaired['repair_flag'].sum():,}\")\n",
"print(f\"Trajectories with 2+ repairs: {(stocks_repaired.groupby(GROUP)['n_repairs'].max() >= 2).sum():,}\")\n",
"\n",
"stocks_repaired.to_csv(\"stock_repaired.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"id": "9644cf0e",
"metadata": {},
"source": [
"## 4. Validation & Diagnostics"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "76cd990d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Negative AUM before repair: 27,047\n",
"Negative AUM after repair: 22,630\n",
"Series where repair created negatives: 213\n",
"\n",
"Ruptures before: 553,084\n",
"Ruptures after: 148,599\n",
"Reduction rate: 73.1%\n"
]
}
],
"source": [
"# Safety check: no new negative AUM\n",
"df_compare = stocks.merge(\n",
"    stocks_repaired[KEY + [\"Quantity - AUM\"]],\n",
"    on=KEY, how=\"inner\", suffixes=(\"_raw\", \"_repaired\")\n",
")\n",
"\n",
"neg_raw = (df_compare[\"Quantity - AUM_raw\"] < 0).sum()\n",
"neg_rep = (df_compare[\"Quantity - AUM_repaired\"] < 0).sum()\n",
"created_neg = df_compare[\n",
"    (df_compare[\"Quantity - AUM_raw\"] >= 0) &\n",
"    (df_compare[\"Quantity - AUM_repaired\"] < 0)\n",
"].groupby([\"Registrar Account - ID\", \"Product - Isin\"]).size()\n",
"\n",
"print(f\"Negative AUM before repair: {neg_raw:,}\")\n",
"print(f\"Negative AUM after repair: {neg_rep:,}\")\n",
"print(f\"Series where repair created negatives: {len(created_neg):,}\")\n",
"\n",
"n_before = int(df_repair[\"rupture_before\"].sum())\n",
"n_after = int(df_repair[\"rupture_after\"].sum())\n",
"print(f\"\\nRuptures before: {n_before:,}\")\n",
"print(f\"Ruptures after: {n_after:,}\")\n",
"print(f\"Reduction rate: {1 - n_after/n_before:.1%}\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "895ec10a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1200x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Accounting scatter: Flow vs ΔAUM (before / after)\n",
"\n",
"def build_accounting_df(stocks_df):\n",
"    flows_agg = flows[KEY + [\"Quantity - NetFlows\"]].groupby(KEY, as_index=False).sum()\n",
"    d = stocks_df.merge(flows_agg, on=KEY, how=\"left\")\n",
"    d[\"Quantity - NetFlows\"] = d[\"Quantity - NetFlows\"].fillna(0)\n",
"    d = d.sort_values(KEY)\n",
"    d[\"prev_aum\"] = d.groupby(GROUP)[\"Quantity - AUM\"].shift(1)\n",
"    d[\"flow_lag\"] = d.groupby(GROUP)[\"Quantity - NetFlows\"].shift(1).fillna(0)\n",
"    d[\"delta_aum\"] = d[\"Quantity - AUM\"] - d[\"prev_aum\"]\n",
"    return d[d[\"prev_aum\"].notna()]\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n",
"for ax, stocks_df, title in zip(axes, [stocks, stocks_repaired], [\"Raw\", \"Repaired\"]):\n",
"    acc = build_accounting_df(stocks_df)\n",
"    # Size the sample on the filtered frame: asking for more rows than it holds would raise\n",
"    s = acc.sample(min(20000, len(acc)), random_state=1)\n",
"    ax.scatter(s[\"flow_lag\"], s[\"delta_aum\"], alpha=0.2, s=4)\n",
"    lim = s[\"flow_lag\"].abs().quantile(0.99)\n",
"    x = np.linspace(-lim, lim, 100)\n",
"    ax.plot(x, x, color=\"red\", linewidth=1.5, label=\"Perfect identity\")\n",
"    ax.set_xlim(-lim, lim)\n",
"    ax.set_ylim(-lim, lim)\n",
"    ax.set_xlabel(\"Flow (t-1)\")\n",
"    ax.set_ylabel(\"Δ AUM\")\n",
"    ax.set_title(title)\n",
"    ax.legend()\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4ec5038d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 900x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Gap persistence: raw vs repaired\n",
"\n",
"def build_gap_df(stocks_df):\n",
"    flows_agg = flows[KEY + [\"Quantity - NetFlows\"]].groupby(KEY, as_index=False).sum()\n",
"    d = stocks_df.merge(flows_agg, on=KEY, how=\"left\")\n",
"    d[\"Quantity - NetFlows\"] = d[\"Quantity - NetFlows\"].fillna(0)\n",
"    d = d.sort_values(KEY)\n",
"    d[\"prev_aum\"] = d.groupby(GROUP)[\"Quantity - AUM\"].shift(1)\n",
"    d[\"flow_lag\"] = d.groupby(GROUP)[\"Quantity - NetFlows\"].shift(1).fillna(0)\n",
"    d[\"gap\"] = d[\"Quantity - AUM\"] - (d[\"prev_aum\"] + d[\"flow_lag\"])\n",
"    return d\n",
"\n",
"def compute_gap_sequences(gap_df, tol=1.0):\n",
"    # Length of each run of consecutive gaps that stay within tol of the previous gap\n",
"    lengths = []\n",
"    for _, g in gap_df.groupby(GROUP):\n",
"        gaps = g[\"gap\"].values\n",
"        run = 1\n",
"        for i in range(1, len(gaps)):\n",
"            if np.isfinite(gaps[i]) and np.isfinite(gaps[i-1]) and abs(gaps[i]-gaps[i-1]) < tol:\n",
"                run += 1\n",
"            else:\n",
"                lengths.append(run)\n",
"                run = 1\n",
"        lengths.append(run)\n",
"    return np.array(lengths)\n",
"\n",
"raw_lengths = compute_gap_sequences(build_gap_df(stocks))\n",
"clean_lengths = compute_gap_sequences(build_gap_df(stocks_repaired))\n",
"\n",
"max_len = 20\n",
"raw_h = np.bincount(np.minimum(raw_lengths, max_len), minlength=max_len+1) / len(raw_lengths)\n",
"clean_h = np.bincount(np.minimum(clean_lengths, max_len), minlength=max_len+1) / len(clean_lengths)\n",
"x = np.arange(max_len + 1)\n",
"\n",
"plt.figure(figsize=(9, 4))\n",
"plt.plot(x, raw_h, marker=\"o\", label=\"Raw\")\n",
"plt.plot(x, clean_h, marker=\"o\", label=\"Repaired\")\n",
"plt.xlabel(\"Gap persistence length (20 = 20+)\")\n",
"plt.ylabel(\"Share of sequences\")\n",
"plt.title(\"Persistence of accounting gaps — Raw vs Repaired\")\n",
"plt.legend()\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "2011955e",
"metadata": {},
"source": [
"## 5. Figure — Rupture intensity distribution (Before vs After)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "250f2125",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"domain": {
"x": [
0,
0.48
]
},
"hole": 0.45,
"labels": [
"Clean (<=1%)",
"Moderate (1-10%)",
"High (10-30%)",
"Severe (>30%)"
],
"name": "Before repair",
"textinfo": "percent",
"type": "pie",
"values": {
"bdata": "zczMzMxMSEAAAAAAAIA8QDMzMzMzMyZAzczMzMzMJ0A=",
"dtype": "f8"
}
},
{
"domain": {
"x": [
0.52,
1
]
},
"hole": 0.45,
"labels": [
"Clean (<=1%)",
"Moderate (1-10%)",
"High (10-30%)",
"Severe (>30%)"
],
"name": "After repair",
"textinfo": "percent",
"type": "pie",
"values": {
"bdata": "zczMzMxMSEAzMzMzM7M8QAAAAAAAACdAZmZmZmZmJkA=",
"dtype": "f8"
}
}
],
"layout": {
"annotations": [
{
"showarrow": false,
"text": "Before repair",
"x": 0.22,
"y": 0.5
},
{
"showarrow": false,
"text": "After repair",
"x": 0.78,
"y": 0.5
}
],
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"fillpattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermap": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermap"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"title": {
"text": "Rupture intensity distribution — Before vs After repair"
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def rupture_distribution(df):\n",
"    \"\"\"Return the share of series (in %) that falls in each rupture-rate class.\"\"\"\n",
"    # One row per series: rupture observations and total observations.\n",
"    series = (\n",
"        df.groupby(GROUP)\n",
"        .agg(n_ruptures=(\"rupture\", \"sum\"), total_obs=(\"rupture\", \"count\"))\n",
"        .reset_index()\n",
"    )\n",
"    series[\"rupture_rate\"] = series[\"n_ruptures\"] / series[\"total_obs\"]\n",
"    # Bins are right-closed, so a rate of exactly 1% still counts as Clean.\n",
"    bins = [0, 0.01, 0.10, 0.30, 1.01]\n",
"    labels = [\"Clean (<=1%)\", \"Moderate (1-10%)\", \"High (10-30%)\", \"Severe (>30%)\"]\n",
"    series[\"class\"] = pd.cut(series[\"rupture_rate\"], bins=bins, labels=labels, include_lowest=True)\n",
"    # sort_index() orders the result by class, not by frequency.\n",
"    return (series[\"class\"].value_counts(normalize=True).sort_index() * 100).round(1)\n",
"\n",
"# Flag a rupture wherever the absolute gap exceeds 10, before and after repair.\n",
"df_raw_gap = build_gap_df(stocks)\n",
"df_raw_gap[\"rupture\"] = df_raw_gap[\"gap\"].abs() > 10\n",
"df_clean_gap = build_gap_df(stocks_repaired)\n",
"df_clean_gap[\"rupture\"] = df_clean_gap[\"gap\"].abs() > 10\n",
"\n",
"dist_before = rupture_distribution(df_raw_gap)\n",
"dist_after = rupture_distribution(df_clean_gap)\n",
"\n",
"fig = go.Figure()\n",
"fig.add_trace(go.Pie(\n",
"    labels=dist_before.index, values=dist_before.values,\n",
"    hole=0.45, name=\"Before repair\", domain=dict(x=[0, 0.48]), textinfo=\"percent\"\n",
"))\n",
"fig.add_trace(go.Pie(\n",
"    labels=dist_after.index, values=dist_after.values,\n",
"    hole=0.45, name=\"After repair\", domain=dict(x=[0.52, 1.0]), textinfo=\"percent\"\n",
"))\n",
"fig.update_layout(\n",
" title=\"Rupture intensity distribution — Before vs After repair\",\n",
" annotations=[\n",
" dict(text=\"Before repair\", x=0.22, y=0.5, showarrow=False),\n",
" dict(text=\"After repair\", x=0.78, y=0.5, showarrow=False)\n",
" ]\n",
")\n",
"fig.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2853dffa-2672-4e98-a3ae-2bcfee09b43c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}