Project_Carmignac/Clustering_Fund_400+_corrected.ipynb

2026-04-13 18:33:30 +02:00
{
"cells": [
{
"cell_type": "markdown",
"id": "13c6141d",
"metadata": {},
"source": [
"# Behavioral Clustering of Carmignac Investors\n",
"\n",
"This notebook implements two complementary clustering analyses:\n",
"\n",
"| | Scope | Approach |\n",
"|---|---|---|\n",
"| **Part 1** | All active accounts (~7,000) | Global behavioral clustering |\n",
"| **Part 2** | Top ~400 accounts (AUM > €5M) | High-conviction clustering with performance reactivity features |\n",
"\n",
"Both analyses share the same preprocessing pipeline (RobustScaler, MAD winsorization) and visualization conventions (robust z-score heatmaps).\n",
"\n",
"---\n",
"**Structure:**\n",
"1. Imports & Configuration\n",
"2. Data Loading\n",
"3. Monthly Panel Construction\n",
"4. Feature Engineering\n",
"5. **Part 1** — Global Clustering (all accounts)\n",
" - 5a. Feature selection & preprocessing\n",
" - 5b. K-selection & clustering\n",
" - 5c. Cluster profiles (behavioral + allocation)\n",
" - 5d. Asset-type sub-clustering & cross-analysis\n",
"6. **Part 2** — Top 400 Accounts Clustering\n",
" - 6a. Account selection & feature engineering\n",
" - 6b. K-selection & clustering\n",
" - 6c. Cluster profiles & churn analysis\n",
"7. Cross-Analysis: Global vs Top 400\n"
]
},
{
"cell_type": "markdown",
"id": "28e588fe",
"metadata": {},
"source": [
"---\n",
"## 1. Imports & Configuration\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3bc1ffe0",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import re\n",
"import s3fs\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"from sklearn.preprocessing import RobustScaler\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score, davies_bouldin_score\n",
"\n",
"sns.set_style(\"whitegrid\")\n",
"pd.set_option(\"display.max_columns\", 200)\n",
"pd.set_option(\"display.max_rows\", 200)\n",
"\n",
"EPS = 1e-9\n",
"RANDOM_STATE = 42"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "69d2dc25",
"metadata": {},
"outputs": [],
"source": [
"# Column names\n",
"ID_COL = \"Registrar Account - ID\"\n",
"ISIN_COL = \"Product - Isin\"\n",
"FUND_COL = \"Product - Fund\"\n",
"ASSET_COL = \"Product - Asset Type\"\n",
"FLOW_DATE_COL = \"Centralisation Date\"\n",
"AUM_DATE_COL = \"Centralisation Date\"\n",
"FLOW_QTY_COL = \"Quantity - NetFlows\"\n",
"FLOW_SUB_COL = \"Quantity - Subscription\"\n",
"FLOW_RED_COL = \"Quantity - Redemption\"\n",
"AUM_QTY_COL = \"Quantity - AUM\"\n",
"AUM_VAL_COL = \"Value - AUM €\"\n",
"REGION_COL = \"Registrar Account - Region\"\n",
"COUNTRY_COL = \"RegistrarAccount - Country\"\n",
"NAV_DATE_COL = \"Dat\"\n",
"NAV_ISIN_COL = \"Isin\"\n",
"NAV_PRICE_COL = \"Price (TF PartPrice)\"\n",
"NAV_BENCH_COL = \"PriceBench\"\n",
"RATE_DATE_COL = \"Date\"\n",
"RATE_VAL_COL = \"Yld to Maturity\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bf5b7a0a",
"metadata": {},
"outputs": [],
"source": [
"# SHARED UTILITIES\n",
"def robust_zscore(s):\n",
" med = np.nanmedian(s)\n",
" mad = np.nanmedian(np.abs(s - med))\n",
" if mad == 0 or np.isnan(mad):\n",
" return np.zeros(len(s))\n",
" return (s - med) / (1.4826 * mad)\n",
"\n",
"def plot_heatmap(dfc, profile_vars, cluster_col, title, figsize=(16, 4)):\n",
" \"\"\"Cluster signature heatmap using robust z-scores, capped at ±3 for readability.\"\"\"\n",
" dfc_viz = dfc[profile_vars + [cluster_col]].copy()\n",
" for col in profile_vars:\n",
" vals = pd.to_numeric(dfc_viz[col], errors=\"coerce\").to_numpy(dtype=float)\n",
" lo = np.nanpercentile(vals, 2)\n",
" hi = np.nanpercentile(vals, 98)\n",
" dfc_viz[col] = np.clip(vals, lo, hi)\n",
" prof = dfc_viz.groupby(cluster_col)[profile_vars].median()\n",
" prof_z = prof.apply(lambda col: robust_zscore(col.values), axis=0)\n",
" prof_z = prof_z.clip(-3, 3) # cap for readability\n",
" plt.figure(figsize=figsize)\n",
" sns.heatmap(prof_z, cmap=\"RdBu_r\", center=0, annot=True, fmt=\".2f\",\n",
" xticklabels=profile_vars,\n",
" yticklabels=[f\"Cluster {i}\" for i in range(len(prof))])\n",
" plt.title(title)\n",
" plt.xticks(rotation=45, ha=\"right\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
" return prof\n",
"\n",
"def winsorize_mad(series, n_sigma=3):\n",
" \"\"\"Winsorize using MAD n-sigma rule. Falls back to p95 clip when MAD~0.\"\"\"\n",
" vals = pd.to_numeric(series, errors=\"coerce\").to_numpy(dtype=float)\n",
" med = np.nanmedian(vals)\n",
" mad = np.nanmedian(np.abs(vals - med)) * 1.4826\n",
" if mad > 0:\n",
" return np.clip(vals, med - n_sigma * mad, med + n_sigma * mad)\n",
" else:\n",
" return np.clip(vals, 0, np.nanpercentile(vals, 95))\n",
"\n",
"def add_months_since_last_tx(dfc, df_month, id_col, suffix=\"\"):\n",
" \"\"\"Adds months_since_last_tx[suffix] to dfc.\"\"\"\n",
" col_name = f\"months_since_last_tx{suffix}\"\n",
" reference_date = df_month[\"month\"].max()\n",
" last_active = (\n",
" df_month[df_month[\"active_month\"] == 1]\n",
" .groupby(id_col)[\"month\"]\n",
" .max()\n",
" .reset_index(name=\"last_active_month\")\n",
" )\n",
" last_active[col_name] = (\n",
" (reference_date.to_period(\"M\") -\n",
" last_active[\"last_active_month\"].dt.to_period(\"M\"))\n",
" .apply(lambda x: x.n)\n",
" )\n",
" dfc = dfc.merge(last_active[[id_col, col_name]], on=id_col, how=\"left\")\n",
" max_months = dfc[col_name].max()\n",
" dfc[col_name] = dfc[col_name].fillna(max_months + 1)\n",
" return dfc\n",
"\n",
"\n",
"def add_months_since_last_tx_by_group(df, id_cols, active_col=\"active_month\", month_col=\"month\", suffix=\"\"):\n",
" col_name = f\"months_since_last_tx{suffix}\"\n",
" reference_date = df[month_col].max()\n",
"\n",
" last_active = (\n",
" df[df[active_col] == 1]\n",
" .groupby(id_cols)[month_col]\n",
" .max()\n",
" .reset_index(name=\"last_active_month\")\n",
" )\n",
"\n",
" last_active[col_name] = (\n",
" (reference_date.to_period(\"M\") - last_active[\"last_active_month\"].dt.to_period(\"M\"))\n",
" .apply(lambda x: x.n)\n",
" )\n",
"\n",
" return last_active[[*id_cols, col_name]]"
]
},
{
"cell_type": "markdown",
"id": "312153e6",
"metadata": {},
"source": [
"---\n",
"## 2. Data Loading\n",
"\n",
"Three data sources are used:\n",
"- **AUM** (repaired): monthly share quantities per account and ISIN\n",
"- **Flows**: daily net transactions, aggregated to monthly\n",
"- **NAV / Rates**: fund performance and interest rate data for enrichment\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "001d9af2-07db-4813-937e-693103823359",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>reg_orig</th>\n",
" <th>reg_used</th>\n",
" <th>Agreement - Code</th>\n",
" <th>Company - Id</th>\n",
" <th>Company - Ultimate Parent Id</th>\n",
" <th>Registrar Account - Region</th>\n",
" <th>RegistrarAccount - Country</th>\n",
" <th>Product - Asset Type</th>\n",
" <th>Product - Strategy</th>\n",
" <th>Product - Legal Status</th>\n",
" <th>Product - Is Dedie ?</th>\n",
" <th>Product - Fund</th>\n",
" <th>Product - Shareclass Type</th>\n",
" <th>Product - Shareclass Currency</th>\n",
" <th>Product - Isin</th>\n",
" <th>Centralisation Date</th>\n",
" <th>Quantity - AUM</th>\n",
" <th>Value - AUM CCY</th>\n",
" <th>Value - AUM €</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>18872</td>\n",
" <td>L104</td>\n",
" <td>2257.0</td>\n",
" <td>33675.0</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>Diversified</td>\n",
" <td>Patrimoine</td>\n",
" <td>FCP</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>FR0010135103</td>\n",
" <td>2015-01-31</td>\n",
" <td>49094.915</td>\n",
" <td>3.242523e+07</td>\n",
" <td>3.242523e+07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>18872</td>\n",
" <td>L104</td>\n",
" <td>2257.0</td>\n",
" <td>33675.0</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>Equity</td>\n",
" <td>Investissement Latitude</td>\n",
" <td>FCP</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Investissement Latitude</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>FR0010147603</td>\n",
" <td>2015-01-31</td>\n",
" <td>1717.000</td>\n",
" <td>4.767422e+05</td>\n",
" <td>4.767422e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>18872</td>\n",
" <td>L104</td>\n",
" <td>2257.0</td>\n",
" <td>33675.0</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>Equity</td>\n",
" <td>Investissement</td>\n",
" <td>FCP</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Investissement</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>FR0010148981</td>\n",
" <td>2015-01-31</td>\n",
" <td>8254.870</td>\n",
" <td>9.862671e+06</td>\n",
" <td>9.862671e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>18872</td>\n",
" <td>L104</td>\n",
" <td>2257.0</td>\n",
" <td>33675.0</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>Equity</td>\n",
" <td>Euro-Entrepreneurs</td>\n",
" <td>FCP</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Euro-Entrepreneurs</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>FR0010149112</td>\n",
" <td>2015-01-31</td>\n",
" <td>278.923</td>\n",
" <td>7.664525e+04</td>\n",
" <td>7.664525e+04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>18872</td>\n",
" <td>L104</td>\n",
" <td>2257.0</td>\n",
" <td>33675.0</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>Fixed Income</td>\n",
" <td>Sécurité</td>\n",
" <td>FCP</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Sécurité</td>\n",
" <td>AW &amp; AW-R</td>\n",
" <td>EUR</td>\n",
" <td>FR0010149120</td>\n",
" <td>2015-01-31</td>\n",
" <td>1807.267</td>\n",
" <td>3.078318e+06</td>\n",
" <td>3.078318e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1088393</th>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>France</td>\n",
" <td>France</td>\n",
" <td>Diversified</td>\n",
" <td>Inflation Solution</td>\n",
" <td>SICAV</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Portfolio Inflation Solution</td>\n",
" <td>F</td>\n",
" <td>EUR</td>\n",
" <td>LU2715954330</td>\n",
" <td>2025-10-31</td>\n",
" <td>81065.419</td>\n",
" <td>9.533293e+06</td>\n",
" <td>9.533293e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1088394</th>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>France</td>\n",
" <td>France</td>\n",
" <td>Diversified</td>\n",
" <td>Inflation Solution</td>\n",
" <td>SICAV</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Portfolio Inflation Solution</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>LU2715954504</td>\n",
" <td>2025-10-31</td>\n",
" <td>6853.363</td>\n",
" <td>7.978685e+05</td>\n",
" <td>7.978685e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1088395</th>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>France</td>\n",
" <td>France</td>\n",
" <td>Private Assets</td>\n",
" <td>Evergreen</td>\n",
" <td>SICAV</td>\n",
" <td>NO</td>\n",
" <td>Carmignac S.A. SICAV - PART II UCI Private Eve...</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>LU2799473124</td>\n",
" <td>2025-10-31</td>\n",
" <td>4212.234</td>\n",
" <td>5.263608e+05</td>\n",
" <td>5.263608e+05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1088396</th>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>France</td>\n",
" <td>France</td>\n",
" <td>Equity</td>\n",
" <td>Tech Solutions</td>\n",
" <td>SICAV</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Portfolio Tech Solutions</td>\n",
" <td>A</td>\n",
" <td>EUR</td>\n",
" <td>LU2809794220</td>\n",
" <td>2025-10-31</td>\n",
" <td>31469.523</td>\n",
" <td>4.438147e+06</td>\n",
" <td>4.438147e+06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1088397</th>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>Private Client</td>\n",
" <td>France</td>\n",
" <td>France</td>\n",
" <td>Equity</td>\n",
" <td>Tech Solutions</td>\n",
" <td>SICAV</td>\n",
" <td>NO</td>\n",
" <td>Carmignac Portfolio Tech Solutions</td>\n",
" <td>F</td>\n",
" <td>EUR</td>\n",
" <td>LU2809794576</td>\n",
" <td>2025-10-31</td>\n",
" <td>554.301</td>\n",
" <td>7.871629e+04</td>\n",
" <td>7.871629e+04</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1088398 rows × 19 columns</p>\n",
"</div>"
],
"text/plain": [
" reg_orig reg_used Agreement - Code Company - Id \\\n",
"0 18872 18872 L104 2257.0 \n",
"1 18872 18872 L104 2257.0 \n",
"2 18872 18872 L104 2257.0 \n",
"3 18872 18872 L104 2257.0 \n",
"4 18872 18872 L104 2257.0 \n",
"... ... ... ... ... \n",
"1088393 Private Client Private Client Private Client Private Client \n",
"1088394 Private Client Private Client Private Client Private Client \n",
"1088395 Private Client Private Client Private Client Private Client \n",
"1088396 Private Client Private Client Private Client Private Client \n",
"1088397 Private Client Private Client Private Client Private Client \n",
"\n",
" Company - Ultimate Parent Id Registrar Account - Region \\\n",
"0 33675.0 Switzerland \n",
"1 33675.0 Switzerland \n",
"2 33675.0 Switzerland \n",
"3 33675.0 Switzerland \n",
"4 33675.0 Switzerland \n",
"... ... ... \n",
"1088393 Private Client France \n",
"1088394 Private Client France \n",
"1088395 Private Client France \n",
"1088396 Private Client France \n",
"1088397 Private Client France \n",
"\n",
" RegistrarAccount - Country Product - Asset Type \\\n",
"0 Switzerland Diversified \n",
"1 Switzerland Equity \n",
"2 Switzerland Equity \n",
"3 Switzerland Equity \n",
"4 Switzerland Fixed Income \n",
"... ... ... \n",
"1088393 France Diversified \n",
"1088394 France Diversified \n",
"1088395 France Private Assets \n",
"1088396 France Equity \n",
"1088397 France Equity \n",
"\n",
" Product - Strategy Product - Legal Status Product - Is Dedie ? \\\n",
"0 Patrimoine FCP NO \n",
"1 Investissement Latitude FCP NO \n",
"2 Investissement FCP NO \n",
"3 Euro-Entrepreneurs FCP NO \n",
"4 Sécurité FCP NO \n",
"... ... ... ... \n",
"1088393 Inflation Solution SICAV NO \n",
"1088394 Inflation Solution SICAV NO \n",
"1088395 Evergreen SICAV NO \n",
"1088396 Tech Solutions SICAV NO \n",
"1088397 Tech Solutions SICAV NO \n",
"\n",
" Product - Fund \\\n",
"0 Carmignac Patrimoine \n",
"1 Carmignac Investissement Latitude \n",
"2 Carmignac Investissement \n",
"3 Carmignac Euro-Entrepreneurs \n",
"4 Carmignac Sécurité \n",
"... ... \n",
"1088393 Carmignac Portfolio Inflation Solution \n",
"1088394 Carmignac Portfolio Inflation Solution \n",
"1088395 Carmignac S.A. SICAV - PART II UCI Private Eve... \n",
"1088396 Carmignac Portfolio Tech Solutions \n",
"1088397 Carmignac Portfolio Tech Solutions \n",
"\n",
" Product - Shareclass Type Product - Shareclass Currency \\\n",
"0 A EUR \n",
"1 A EUR \n",
"2 A EUR \n",
"3 A EUR \n",
"4 AW & AW-R EUR \n",
"... ... ... \n",
"1088393 F EUR \n",
"1088394 A EUR \n",
"1088395 A EUR \n",
"1088396 A EUR \n",
"1088397 F EUR \n",
"\n",
" Product - Isin Centralisation Date Quantity - AUM Value - AUM CCY \\\n",
"0 FR0010135103 2015-01-31 49094.915 3.242523e+07 \n",
"1 FR0010147603 2015-01-31 1717.000 4.767422e+05 \n",
"2 FR0010148981 2015-01-31 8254.870 9.862671e+06 \n",
"3 FR0010149112 2015-01-31 278.923 7.664525e+04 \n",
"4 FR0010149120 2015-01-31 1807.267 3.078318e+06 \n",
"... ... ... ... ... \n",
"1088393 LU2715954330 2025-10-31 81065.419 9.533293e+06 \n",
"1088394 LU2715954504 2025-10-31 6853.363 7.978685e+05 \n",
"1088395 LU2799473124 2025-10-31 4212.234 5.263608e+05 \n",
"1088396 LU2809794220 2025-10-31 31469.523 4.438147e+06 \n",
"1088397 LU2809794576 2025-10-31 554.301 7.871629e+04 \n",
"\n",
" Value - AUM € \n",
"0 3.242523e+07 \n",
"1 4.767422e+05 \n",
"2 9.862671e+06 \n",
"3 7.664525e+04 \n",
"4 3.078318e+06 \n",
"... ... \n",
"1088393 9.533293e+06 \n",
"1088394 7.978685e+05 \n",
"1088395 5.263608e+05 \n",
"1088396 4.438147e+06 \n",
"1088397 7.871629e+04 \n",
"\n",
"[1088398 rows x 19 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_aum = pd.read_csv(\"s3://projet-bdc-carmignac-g3/paco/AUM_paths.csv\")\n",
"df_aum"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3eea8321-6d49-46e1-a678-3b972c8ccd09",
"metadata": {},
"outputs": [],
"source": [
"df_aum = df_aum.rename(columns={\"reg_orig\": \"Registrar Account - ID\"})"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "027cf8e1-3e4c-416c-bc9e-cd299cdbd7c3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Registrar Account - ID', 'reg_used', 'Agreement - Code',\n",
" 'Company - Id', 'Company - Ultimate Parent Id',\n",
" 'Registrar Account - Region', 'RegistrarAccount - Country',\n",
" 'Product - Asset Type', 'Product - Strategy', 'Product - Legal Status',\n",
" 'Product - Is Dedie ?', 'Product - Fund', 'Product - Shareclass Type',\n",
" 'Product - Shareclass Currency', 'Product - Isin',\n",
" 'Centralisation Date', 'Quantity - AUM', 'Value - AUM CCY',\n",
" 'Value - AUM €'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_aum.columns"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "011958df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"flows: (2514419, 25)\n",
"aum: (1076643, 20)\n",
"nav: (623914, 6)\n"
]
}
],
"source": [
"PATH_FLOWS = \"s3://projet-bdc-data/carmignac/Flows ENSAE V2 -20251105.csv\"\n",
"PATH_NAV = \"s3://projet-bdc-data/carmignac/Data Modélisation/Nav/NAV_Bench_data.csv\"\n",
"PATH_RATES = \"s3://projet-bdc-data/carmignac/Data Modélisation/market data/esterRates.csv\"\n",
"\n",
"df_flows = pd.read_csv(PATH_FLOWS, sep=\";\")\n",
"df_nav = pd.read_csv(PATH_NAV, sep=\";\")\n",
"df_rates = pd.read_csv(PATH_RATES, sep=\";\")\n",
"\n",
"\n",
"# Date parsing\n",
"for df, col in [\n",
" (df_flows, FLOW_DATE_COL), (df_aum, AUM_DATE_COL),\n",
" (df_nav, NAV_DATE_COL), (df_rates, RATE_DATE_COL)\n",
"]:\n",
" df[col] = pd.to_datetime(df[col], errors=\"coerce\")\n",
" df[\"month\"] = df[col].dt.to_period(\"M\").dt.to_timestamp()\n",
"\n",
"for col in [FLOW_QTY_COL, FLOW_SUB_COL, FLOW_RED_COL]:\n",
" df_flows[col] = pd.to_numeric(df_flows[col], errors=\"coerce\")\n",
"for col in [AUM_QTY_COL, AUM_VAL_COL]:\n",
" df_aum[col] = pd.to_numeric(df_aum[col], errors=\"coerce\")\n",
"for col in [NAV_PRICE_COL, NAV_BENCH_COL]:\n",
" df_nav[col] = pd.to_numeric(df_nav[col], errors=\"coerce\")\n",
"df_rates[RATE_VAL_COL] = pd.to_numeric(df_rates[RATE_VAL_COL], errors=\"coerce\")\n",
"\n",
"for df in [df_flows, df_aum]:\n",
" df[ISIN_COL] = df[ISIN_COL].astype(str).str.strip()\n",
"df_nav[NAV_ISIN_COL] = df_nav[NAV_ISIN_COL].astype(str).str.strip()\n",
"\n",
"# Remove technical accounts (not investable)\n",
"df_flows = df_flows[~df_flows[ID_COL].isin(\n",
" [\"Off Distribution\", \"Private Clients\", \"Private Client\"]\n",
")]\n",
"df_aum = df_aum[~df_aum[ID_COL].isin(\n",
" [\"Off Distribution\", \"Private Clients\", \"Private Client\"]\n",
")]\n",
"\n",
"print(\"flows:\", df_flows.shape)\n",
"print(\"aum: \", df_aum.shape)\n",
"print(\"nav: \", df_nav.shape)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cadf5a78-53f3-4157-878f-edc615ef1964",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Agreement - Code', 'Company - Id', 'Company - Ultimate Parent Id',\n",
" 'Registrar Account - ID', 'Registrar Account - Region',\n",
" 'RegistrarAccount - Country', 'Product - Asset Type',\n",
" 'Product - Strategy', 'Product - Legal Status', 'Product - Is Dedie ?',\n",
" 'Product - Fund', 'Product - Shareclass Type',\n",
" 'Product - Shareclass Currency', 'Product - Isin',\n",
" 'Centralisation Date', 'Quantity - Subscription',\n",
" 'Quantity - Redemption', 'Quantity - NetFlows',\n",
" 'Value Ccy - Subscription', 'Value Ccy - Redemption',\n",
" 'Value Ccy - NetFlows', 'Value € - Subscription',\n",
" 'Value € - Redemption', 'Value € - NetFlows', 'month'],\n",
" dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_flows.columns"
]
},
{
"cell_type": "markdown",
"id": "acd89acb-a9c1-4488-a47e-db4ca76b9214",
"metadata": {},
"source": [
"## Etude fund et fund families sur aum et sur flows"
]
},
{
"cell_type": "markdown",
"id": "b568bbc3-55ad-4dcb-bdb0-94e9a154575d",
"metadata": {},
"source": [
"---\n",
"## 3. Part 0 — Define Family Funds et Faire clustering sur les top 15 funds\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "70c8591f-29aa-4d2f-94bb-d3a280cbe5bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Nombre de couples ISIN/Fund : 348\n",
"Nombre de funds uniques : 62\n",
"Nombre de families uniques : 58\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Product - Isin</th>\n",
" <th>Product - Fund</th>\n",
" <th>fund_family</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>FR0000999999</td>\n",
" <td>Carmignac Court Terme</td>\n",
" <td>Carmignac Court Terme</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>FR0010135103</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>FR0010147603</td>\n",
" <td>Carmignac Investissement Latitude</td>\n",
" <td>Carmignac Investissement Latitude</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>FR0010148981</td>\n",
" <td>Carmignac Investissement</td>\n",
" <td>Carmignac Investissement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>FR0010148999</td>\n",
" <td>Carmignac Profil Réactif 75</td>\n",
" <td>Carmignac Profil Réactif &lt;NUM&gt;</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>FR0010149096</td>\n",
" <td>Carmignac Innovation</td>\n",
" <td>Carmignac Innovation</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>FR0010149112</td>\n",
" <td>Carmignac Euro-Entrepreneurs</td>\n",
" <td>Carmignac Euro-Entrepreneurs</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>FR0010149120</td>\n",
" <td>Carmignac Sécurité</td>\n",
" <td>Carmignac Sécurité</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>FR0010149161</td>\n",
" <td>Carmignac Court Terme</td>\n",
" <td>Carmignac Court Terme</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>FR0010149179</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>FR0010149203</td>\n",
" <td>Carmignac Multi Expertise</td>\n",
" <td>Carmignac Multi Expertise</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>FR0010149211</td>\n",
" <td>Carmignac Profil Réactif 100</td>\n",
" <td>Carmignac Profil Réactif &lt;NUM&gt;</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>FR0010149278</td>\n",
" <td>Carmignac Euro-Investissement</td>\n",
" <td>Carmignac Euro-Investissement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>FR0010149302</td>\n",
" <td>Carmignac Emergents</td>\n",
" <td>Carmignac Emergents</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>FR0010306142</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>FR0010312660</td>\n",
" <td>Carmignac Investissement</td>\n",
" <td>Carmignac Investissement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>FR0010956607</td>\n",
" <td>Carmignac Emergents</td>\n",
" <td>Carmignac Emergents</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>FR0010956615</td>\n",
" <td>Carmignac Investissement</td>\n",
" <td>Carmignac Investissement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>FR0010956649</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" <td>Carmignac Patrimoine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>FR0011147446</td>\n",
" <td>Carmignac Emergents</td>\n",
" <td>Carmignac Emergents</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Product - Isin Product - Fund \\\n",
"0 FR0000999999 Carmignac Court Terme \n",
"1 FR0010135103 Carmignac Patrimoine \n",
"2 FR0010147603 Carmignac Investissement Latitude \n",
"3 FR0010148981 Carmignac Investissement \n",
"4 FR0010148999 Carmignac Profil Réactif 75 \n",
"5 FR0010149096 Carmignac Innovation \n",
"6 FR0010149112 Carmignac Euro-Entrepreneurs \n",
"7 FR0010149120 Carmignac Sécurité \n",
"8 FR0010149161 Carmignac Court Terme \n",
"9 FR0010149179 Carmignac Absolute Return Europe \n",
"10 FR0010149203 Carmignac Multi Expertise \n",
"11 FR0010149211 Carmignac Profil Réactif 100 \n",
"12 FR0010149278 Carmignac Euro-Investissement \n",
"13 FR0010149302 Carmignac Emergents \n",
"14 FR0010306142 Carmignac Patrimoine \n",
"15 FR0010312660 Carmignac Investissement \n",
"16 FR0010956607 Carmignac Emergents \n",
"17 FR0010956615 Carmignac Investissement \n",
"18 FR0010956649 Carmignac Patrimoine \n",
"19 FR0011147446 Carmignac Emergents \n",
"\n",
" fund_family \n",
"0 Carmignac Court Terme \n",
"1 Carmignac Patrimoine \n",
"2 Carmignac Investissement Latitude \n",
"3 Carmignac Investissement \n",
"4 Carmignac Profil Réactif <NUM> \n",
"5 Carmignac Innovation \n",
"6 Carmignac Euro-Entrepreneurs \n",
"7 Carmignac Sécurité \n",
"8 Carmignac Court Terme \n",
"9 Carmignac Absolute Return Europe \n",
"10 Carmignac Multi Expertise \n",
"11 Carmignac Profil Réactif <NUM> \n",
"12 Carmignac Euro-Investissement \n",
"13 Carmignac Emergents \n",
"14 Carmignac Patrimoine \n",
"15 Carmignac Investissement \n",
"16 Carmignac Emergents \n",
"17 Carmignac Investissement \n",
"18 Carmignac Patrimoine \n",
"19 Carmignac Emergents "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import re\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# =========================================================\n",
"# 1) Fonction conservative de construction de famille\n",
"# - même nom => même famille\n",
"# - différence purement numérique => même famille\n",
"# - sinon famille différente\n",
"# =========================================================\n",
"\n",
"def build_fund_family_numeric_only(fund_name):\n",
" if pd.isna(fund_name):\n",
" return np.nan\n",
" \n",
" s = str(fund_name).strip()\n",
" family = re.sub(r\"\\d+\", \"<NUM>\", s)\n",
" family = re.sub(r\"\\s+\", \" \", family).strip()\n",
" return family\n",
"\n",
"fund_family_table = (\n",
" df_aum[[ISIN_COL, FUND_COL]]\n",
" .dropna(subset=[ISIN_COL, FUND_COL])\n",
" .drop_duplicates()\n",
" .sort_values([ISIN_COL, FUND_COL])\n",
" .reset_index(drop=True)\n",
")\n",
"\n",
"fund_family_table[\"fund_family\"] = fund_family_table[FUND_COL].apply(build_fund_family_numeric_only)\n",
"\n",
"print(\"Nombre de couples ISIN/Fund :\", fund_family_table.shape[0])\n",
"print(\"Nombre de funds uniques :\", fund_family_table[FUND_COL].nunique())\n",
"print(\"Nombre de families uniques :\", fund_family_table[\"fund_family\"].nunique())\n",
"\n",
"fund_family_table.head(20)"
]
},
{
"cell_type": "markdown",
"id": "d34f5ecf",
"metadata": {},
"source": [
"---\n",
"## 3. Monthly Panel Construction\n",
"\n",
"A full outer join of AUM and flows at `(account, ISIN, month)` granularity, enriched with NAV returns and interest rate changes.\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "25f3dce4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df_rel_m shape: (1365583, 20)\n"
]
}
],
"source": [
"df_flows_m = (\n",
" df_flows\n",
" .dropna(subset=[ID_COL, ISIN_COL, \"month\"])\n",
" .assign(\n",
" gross_flow_qty=lambda x: x[FLOW_QTY_COL].abs(),\n",
" sub_qty=lambda x: x[FLOW_SUB_COL].fillna(0),\n",
" red_qty=lambda x: x[FLOW_RED_COL].fillna(0)\n",
" )\n",
" .groupby([ID_COL, ISIN_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" net_flow_qty=(FLOW_QTY_COL, \"sum\"),\n",
" gross_flow_qty=(\"gross_flow_qty\", \"sum\"),\n",
" sub_qty=(\"sub_qty\", \"sum\"),\n",
" red_qty=(\"red_qty\", \"sum\"),\n",
" n_tx=(FLOW_QTY_COL, \"size\"),\n",
" region=(REGION_COL, \"last\"),\n",
" country=(COUNTRY_COL, \"last\")\n",
" )\n",
")\n",
"\n",
"df_aum_m = (\n",
" df_aum\n",
" .dropna(subset=[ID_COL, ISIN_COL, FUND_COL, \"month\"])\n",
" .groupby([ID_COL, ISIN_COL, FUND_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" aum_qty=(AUM_QTY_COL, \"sum\"),\n",
" aum_val=(AUM_VAL_COL, \"sum\"),\n",
" asset_type=(ASSET_COL, \"last\"),\n",
" region=(REGION_COL, \"last\"),\n",
" country=(COUNTRY_COL, \"last\")\n",
" )\n",
")\n",
"\n",
"keys = pd.concat([\n",
" df_flows_m[[ID_COL, ISIN_COL, \"month\"]],\n",
" df_aum_m[[ID_COL, ISIN_COL, \"month\"]]\n",
"]).drop_duplicates()\n",
"\n",
"df_rel_m = (\n",
" keys\n",
" .merge(df_aum_m, on=[ID_COL, ISIN_COL, \"month\"], how=\"left\")\n",
" .merge(df_flows_m, on=[ID_COL, ISIN_COL, \"month\"], how=\"left\", suffixes=(\"\", \"_flow\"))\n",
")\n",
"\n",
"for c in [\"aum_qty\", \"aum_val\", \"net_flow_qty\", \"gross_flow_qty\", \"sub_qty\", \"red_qty\", \"n_tx\"]:\n",
" df_rel_m[c] = df_rel_m[c].fillna(0)\n",
"\n",
"df_rel_m[\"region\"] = df_rel_m[\"region\"].fillna(df_rel_m.get(\"region_flow\"))\n",
"df_rel_m[\"country\"] = df_rel_m[\"country\"].fillna(df_rel_m.get(\"country_flow\"))\n",
"\n",
"df_rel_m[\"active_rel_month\"] = (df_rel_m[\"gross_flow_qty\"] > 0).astype(int)\n",
"df_rel_m[\"holding_rel_month\"] = (df_rel_m[\"aum_qty\"] > 0).astype(int)\n",
"\n",
"df_rel_m[\"flow_to_aum_rel\"] = np.where(\n",
" df_rel_m[\"aum_qty\"].abs() > 0,\n",
" df_rel_m[\"net_flow_qty\"] / df_rel_m[\"aum_qty\"].abs(),\n",
" np.nan\n",
")\n",
"\n",
"df_rel_m[\"turnover_rel\"] = np.where(\n",
" df_rel_m[\"aum_qty\"].abs() > 0,\n",
" df_rel_m[\"gross_flow_qty\"] / df_rel_m[\"aum_qty\"].abs(),\n",
" np.nan\n",
")\n",
"\n",
"print(\"df_rel_m shape:\", df_rel_m.shape)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0e95eaaa-3c49-44b9-b049-fee5ecba66ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Panel with NAV/rates: (1365583, 22)\n"
]
}
],
"source": [
"df_nav_m = (\n",
" df_nav\n",
" .dropna(subset=[NAV_ISIN_COL, \"month\", NAV_PRICE_COL])\n",
" .sort_values([NAV_ISIN_COL, \"month\"])\n",
" .groupby([NAV_ISIN_COL, \"month\"], as_index=False)\n",
" .tail(1)\n",
")\n",
"\n",
"df_nav_m[\"ret_fund_m\"] = df_nav_m.groupby(NAV_ISIN_COL)[NAV_PRICE_COL].pct_change()\n",
"\n",
"df_nav_m = df_nav_m.rename(columns={NAV_ISIN_COL: ISIN_COL})[\n",
" [ISIN_COL, \"month\", \"ret_fund_m\"]\n",
"]\n",
"\n",
"df_rates_m = (\n",
" df_rates\n",
" .dropna(subset=[\"month\", RATE_VAL_COL])\n",
" .sort_values(RATE_DATE_COL)\n",
" .groupby(\"month\", as_index=False)\n",
" .tail(1)\n",
")\n",
"\n",
"df_rates_m[\"delta_rate_m\"] = df_rates_m[RATE_VAL_COL].diff()\n",
"df_rates_m = df_rates_m[[\"month\", \"delta_rate_m\"]]\n",
"\n",
"df_rel_m = df_rel_m.merge(df_nav_m, on=[ISIN_COL, \"month\"], how=\"left\")\n",
"df_rel_m = df_rel_m.merge(df_rates_m, on=\"month\", how=\"left\")\n",
"\n",
"for c in [\"ret_fund_m\", \"delta_rate_m\"]:\n",
" df_rel_m[c] = df_rel_m[c].fillna(0)\n",
"\n",
"print(\"Panel with NAV/rates:\", df_rel_m.shape)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f4c2a3fd-2595-445b-95f5-bd544355e23d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Missing fund_family: 416901\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Product - Isin</th>\n",
" <th>Product - Fund</th>\n",
" <th>fund_family</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LU0099161993</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LU0164455502</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>LU0336083497</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>LU0336083810</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>LU0553415323</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>LU0592698954</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>LU0592699093</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>LU0592699259</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>LU0705572823</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>LU0807689582</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>LU0807690838</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>LU0992624949</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>LU0992625086</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>LU0992626993</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>LU0992627611</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>LU0992629153</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>LU0992629401</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>71</th>\n",
" <td>LU0992631563</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>LU1163533422</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>79</th>\n",
" <td>LU1299306321</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Product - Isin Product - Fund fund_family\n",
"0 LU0099161993 NaN NaN\n",
"3 LU0164455502 NaN NaN\n",
"12 LU0336083497 NaN NaN\n",
"15 LU0336083810 NaN NaN\n",
"19 LU0553415323 NaN NaN\n",
"21 LU0592698954 NaN NaN\n",
"29 LU0592699093 NaN NaN\n",
"30 LU0592699259 NaN NaN\n",
"37 LU0705572823 NaN NaN\n",
"40 LU0807689582 NaN NaN\n",
"41 LU0807690838 NaN NaN\n",
"42 LU0992624949 NaN NaN\n",
"53 LU0992625086 NaN NaN\n",
"57 LU0992626993 NaN NaN\n",
"60 LU0992627611 NaN NaN\n",
"67 LU0992629153 NaN NaN\n",
"69 LU0992629401 NaN NaN\n",
"71 LU0992631563 NaN NaN\n",
"73 LU1163533422 NaN NaN\n",
"79 LU1299306321 NaN NaN"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#merge famille\n",
"\n",
"df_rel_m = df_rel_m.merge(\n",
" fund_family_table[[ISIN_COL, FUND_COL, \"fund_family\"]],\n",
" on=[ISIN_COL, FUND_COL],\n",
" how=\"left\"\n",
")\n",
"\n",
"print(\"Missing fund_family:\", df_rel_m[\"fund_family\"].isna().sum())\n",
"df_rel_m[[ISIN_COL, FUND_COL, \"fund_family\"]].drop_duplicates().head(20)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f4fb4a9f-b354-42d0-8699-bf6fc1e26bb9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df_family_month shape: (539285, 23)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Registrar Account - ID</th>\n",
" <th>fund_family</th>\n",
" <th>month</th>\n",
" <th>aum_qty</th>\n",
" <th>aum_val</th>\n",
" <th>net_flow_qty</th>\n",
" <th>gross_flow_qty</th>\n",
" <th>sub_qty</th>\n",
" <th>red_qty</th>\n",
" <th>n_tx</th>\n",
" <th>n_isin_held</th>\n",
" <th>n_isin_active</th>\n",
" <th>ret_fund_m</th>\n",
" <th>delta_rate_m</th>\n",
" <th>region</th>\n",
" <th>country</th>\n",
" <th>active_month</th>\n",
" <th>flow_to_aum_m</th>\n",
" <th>turnover_m</th>\n",
" <th>sub_share_m</th>\n",
" <th>red_share_m</th>\n",
" <th>aum_peak_to_date</th>\n",
" <th>aum_drawdown</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-01-01</td>\n",
" <td>100.000</td>\n",
" <td>30624.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.004902</td>\n",
" <td>-0.058</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-02-01</td>\n",
" <td>100.000</td>\n",
" <td>31886.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.026230</td>\n",
" <td>-0.022</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-03-01</td>\n",
" <td>100.000</td>\n",
" <td>32026.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.024272</td>\n",
" <td>-0.014</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-04-01</td>\n",
" <td>100.000</td>\n",
" <td>32843.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.077</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-05-01</td>\n",
" <td>100.000</td>\n",
" <td>33545.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.053</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-06-01</td>\n",
" <td>100.000</td>\n",
" <td>32390.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.020</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-07-01</td>\n",
" <td>100.000</td>\n",
" <td>32163.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.003115</td>\n",
" <td>-0.042</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-08-01</td>\n",
" <td>100.000</td>\n",
" <td>30573.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.004630</td>\n",
" <td>-0.008</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-09-01</td>\n",
" <td>100.000</td>\n",
" <td>29065.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.012</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-10-01</td>\n",
" <td>443.206</td>\n",
" <td>127980.1646</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.083823</td>\n",
" <td>-0.007</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>443.206</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Registrar Account - ID fund_family month \\\n",
"0 18872 Carmignac Absolute Return Europe 2015-01-01 \n",
"1 18872 Carmignac Absolute Return Europe 2015-02-01 \n",
"2 18872 Carmignac Absolute Return Europe 2015-03-01 \n",
"3 18872 Carmignac Absolute Return Europe 2015-04-01 \n",
"4 18872 Carmignac Absolute Return Europe 2015-05-01 \n",
"5 18872 Carmignac Absolute Return Europe 2015-06-01 \n",
"6 18872 Carmignac Absolute Return Europe 2015-07-01 \n",
"7 18872 Carmignac Absolute Return Europe 2015-08-01 \n",
"8 18872 Carmignac Absolute Return Europe 2015-09-01 \n",
"9 18872 Carmignac Absolute Return Europe 2015-10-01 \n",
"\n",
" aum_qty aum_val net_flow_qty gross_flow_qty sub_qty red_qty n_tx \\\n",
"0 100.000 30624.0000 0.0 0.0 0.0 0.0 0.0 \n",
"1 100.000 31886.0000 0.0 0.0 0.0 0.0 0.0 \n",
"2 100.000 32026.0000 0.0 0.0 0.0 0.0 0.0 \n",
"3 100.000 32843.0000 0.0 0.0 0.0 0.0 0.0 \n",
"4 100.000 33545.0000 0.0 0.0 0.0 0.0 0.0 \n",
"5 100.000 32390.0000 0.0 0.0 0.0 0.0 0.0 \n",
"6 100.000 32163.0000 0.0 0.0 0.0 0.0 0.0 \n",
"7 100.000 30573.0000 0.0 0.0 0.0 0.0 0.0 \n",
"8 100.000 29065.0000 0.0 0.0 0.0 0.0 0.0 \n",
"9 443.206 127980.1646 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" n_isin_held n_isin_active ret_fund_m delta_rate_m region \\\n",
"0 1 0 0.004902 -0.058 Switzerland \n",
"1 1 0 0.026230 -0.022 Switzerland \n",
"2 1 0 0.024272 -0.014 Switzerland \n",
"3 1 0 0.000000 -0.077 Switzerland \n",
"4 1 0 0.000000 -0.053 Switzerland \n",
"5 1 0 0.000000 0.020 Switzerland \n",
"6 1 0 0.003115 -0.042 Switzerland \n",
"7 1 0 -0.004630 -0.008 Switzerland \n",
"8 1 0 0.000000 -0.012 Switzerland \n",
"9 1 0 -0.083823 -0.007 Switzerland \n",
"\n",
" country active_month flow_to_aum_m turnover_m sub_share_m \\\n",
"0 Switzerland 0 0.0 0.0 NaN \n",
"1 Switzerland 0 0.0 0.0 NaN \n",
"2 Switzerland 0 0.0 0.0 NaN \n",
"3 Switzerland 0 0.0 0.0 NaN \n",
"4 Switzerland 0 0.0 0.0 NaN \n",
"5 Switzerland 0 0.0 0.0 NaN \n",
"6 Switzerland 0 0.0 0.0 NaN \n",
"7 Switzerland 0 0.0 0.0 NaN \n",
"8 Switzerland 0 0.0 0.0 NaN \n",
"9 Switzerland 0 0.0 0.0 NaN \n",
"\n",
" red_share_m aum_peak_to_date aum_drawdown \n",
"0 NaN 100.000 0.0 \n",
"1 NaN 100.000 0.0 \n",
"2 NaN 100.000 0.0 \n",
"3 NaN 100.000 0.0 \n",
"4 NaN 100.000 0.0 \n",
"5 NaN 100.000 0.0 \n",
"6 NaN 100.000 0.0 \n",
"7 NaN 100.000 0.0 \n",
"8 NaN 100.000 0.0 \n",
"9 NaN 443.206 0.0 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#construction du panel mensuel compte × famille × mois \n",
"tmp = df_rel_m.copy()\n",
"\n",
"tmp[\"isin_held_flag\"] = (tmp[\"aum_qty\"] > 0).astype(int)\n",
"tmp[\"isin_active_flag\"] = (tmp[\"gross_flow_qty\"] > 0).astype(int)\n",
"\n",
"df_family_month = (\n",
" tmp\n",
" .dropna(subset=[ID_COL, \"fund_family\", \"month\"])\n",
" .groupby([ID_COL, \"fund_family\", \"month\"], as_index=False)\n",
" .agg(\n",
" aum_qty=(\"aum_qty\", \"sum\"),\n",
" aum_val=(\"aum_val\", \"sum\"),\n",
" net_flow_qty=(\"net_flow_qty\", \"sum\"),\n",
" gross_flow_qty=(\"gross_flow_qty\", \"sum\"),\n",
" sub_qty=(\"sub_qty\", \"sum\"),\n",
" red_qty=(\"red_qty\", \"sum\"),\n",
" n_tx=(\"n_tx\", \"sum\"),\n",
" n_isin_held=(\"isin_held_flag\", \"sum\"),\n",
" n_isin_active=(\"isin_active_flag\", \"sum\"),\n",
" ret_fund_m=(\"ret_fund_m\", \"mean\"),\n",
" delta_rate_m=(\"delta_rate_m\", \"first\"),\n",
" region=(\"region\", \"first\"),\n",
" country=(\"country\", \"first\")\n",
" )\n",
" .sort_values([ID_COL, \"fund_family\", \"month\"])\n",
" .reset_index(drop=True)\n",
")\n",
"\n",
"df_family_month[\"active_month\"] = (df_family_month[\"gross_flow_qty\"] > 0).astype(int)\n",
"\n",
"df_family_month[\"flow_to_aum_m\"] = np.where(\n",
" df_family_month[\"aum_qty\"].abs() > 0,\n",
" df_family_month[\"net_flow_qty\"] / df_family_month[\"aum_qty\"].abs(),\n",
" np.nan\n",
")\n",
"\n",
"df_family_month[\"turnover_m\"] = np.where(\n",
" df_family_month[\"aum_qty\"].abs() > 0,\n",
" df_family_month[\"gross_flow_qty\"] / df_family_month[\"aum_qty\"].abs(),\n",
" np.nan\n",
")\n",
"\n",
"df_family_month[\"sub_share_m\"] = np.where(\n",
" df_family_month[\"gross_flow_qty\"] > 0,\n",
" df_family_month[\"sub_qty\"] / df_family_month[\"gross_flow_qty\"],\n",
" np.nan\n",
")\n",
"\n",
"df_family_month[\"red_share_m\"] = np.where(\n",
" df_family_month[\"gross_flow_qty\"] > 0,\n",
" df_family_month[\"red_qty\"] / df_family_month[\"gross_flow_qty\"],\n",
" np.nan\n",
")\n",
"\n",
"df_family_month[\"aum_peak_to_date\"] = df_family_month.groupby([ID_COL, \"fund_family\"])[\"aum_qty\"].cummax()\n",
"\n",
"df_family_month[\"aum_drawdown\"] = np.where(\n",
" df_family_month[\"aum_peak_to_date\"] > 0,\n",
" 1 - df_family_month[\"aum_qty\"] / df_family_month[\"aum_peak_to_date\"],\n",
" np.nan\n",
")\n",
"\n",
"print(\"df_family_month shape:\", df_family_month.shape)\n",
"df_family_month.head(10)"
]
},
{
"cell_type": "markdown",
"id": "9121da21",
"metadata": {},
"source": [
"---\n",
"## 4. Feature Engineering\n",
"\n",
"Features are built at three levels of granularity:\n",
"- **Account × month**: activity flags, turnover, drawdown\n",
"- **Account × ISIN**: entry/exit events, holding duration, performance reactivity\n",
"- **Account (static)**: aggregated behavioral summary used for clustering\n",
"\n",
"Asset type and fund composition shares are computed separately and used as **descriptive** post-clustering variables only.\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b7fdb2fe-26ce-4881-903d-01fd6e473ac4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Registrar Account - ID</th>\n",
" <th>fund_family</th>\n",
" <th>month</th>\n",
" <th>aum_qty</th>\n",
" <th>aum_val</th>\n",
" <th>net_flow_qty</th>\n",
" <th>gross_flow_qty</th>\n",
" <th>sub_qty</th>\n",
" <th>red_qty</th>\n",
" <th>n_tx</th>\n",
" <th>n_isin_held</th>\n",
" <th>n_isin_active</th>\n",
" <th>ret_fund_m</th>\n",
" <th>delta_rate_m</th>\n",
" <th>region</th>\n",
" <th>country</th>\n",
" <th>active_month</th>\n",
" <th>flow_to_aum_m</th>\n",
" <th>turnover_m</th>\n",
" <th>sub_share_m</th>\n",
" <th>red_share_m</th>\n",
" <th>aum_peak_to_date</th>\n",
" <th>aum_drawdown</th>\n",
" <th>total_client_aum_qty</th>\n",
" <th>total_client_aum_val</th>\n",
" <th>family_share_of_client_aum</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-01-01</td>\n",
" <td>100.000</td>\n",
" <td>30624.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.004902</td>\n",
" <td>-0.058</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>179864.637</td>\n",
" <td>7.043266e+07</td>\n",
" <td>0.000435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-02-01</td>\n",
" <td>100.000</td>\n",
" <td>31886.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.026230</td>\n",
" <td>-0.022</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>186761.736</td>\n",
" <td>7.317400e+07</td>\n",
" <td>0.000436</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-03-01</td>\n",
" <td>100.000</td>\n",
" <td>32026.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.024272</td>\n",
" <td>-0.014</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>190357.718</td>\n",
" <td>7.653007e+07</td>\n",
" <td>0.000418</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-04-01</td>\n",
" <td>100.000</td>\n",
" <td>32843.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.077</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>191429.324</td>\n",
" <td>7.509285e+07</td>\n",
" <td>0.000437</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-05-01</td>\n",
" <td>100.000</td>\n",
" <td>33545.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.053</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>189056.475</td>\n",
" <td>7.650176e+07</td>\n",
" <td>0.000438</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-06-01</td>\n",
" <td>100.000</td>\n",
" <td>32390.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.020</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>188737.275</td>\n",
" <td>7.291397e+07</td>\n",
" <td>0.000444</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-07-01</td>\n",
" <td>100.000</td>\n",
" <td>32163.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.003115</td>\n",
" <td>-0.042</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>204372.466</td>\n",
" <td>7.523034e+07</td>\n",
" <td>0.000428</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-08-01</td>\n",
" <td>100.000</td>\n",
" <td>30573.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.004630</td>\n",
" <td>-0.008</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>204294.776</td>\n",
" <td>6.919099e+07</td>\n",
" <td>0.000442</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-09-01</td>\n",
" <td>100.000</td>\n",
" <td>29065.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.012</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>200179.763</td>\n",
" <td>6.492010e+07</td>\n",
" <td>0.000448</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-10-01</td>\n",
" <td>443.206</td>\n",
" <td>127980.1646</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.083823</td>\n",
" <td>-0.007</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>443.206</td>\n",
" <td>0.0</td>\n",
" <td>193229.149</td>\n",
" <td>6.541687e+07</td>\n",
" <td>0.001956</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Registrar Account - ID fund_family month \\\n",
"0 18872 Carmignac Absolute Return Europe 2015-01-01 \n",
"1 18872 Carmignac Absolute Return Europe 2015-02-01 \n",
"2 18872 Carmignac Absolute Return Europe 2015-03-01 \n",
"3 18872 Carmignac Absolute Return Europe 2015-04-01 \n",
"4 18872 Carmignac Absolute Return Europe 2015-05-01 \n",
"5 18872 Carmignac Absolute Return Europe 2015-06-01 \n",
"6 18872 Carmignac Absolute Return Europe 2015-07-01 \n",
"7 18872 Carmignac Absolute Return Europe 2015-08-01 \n",
"8 18872 Carmignac Absolute Return Europe 2015-09-01 \n",
"9 18872 Carmignac Absolute Return Europe 2015-10-01 \n",
"\n",
" aum_qty aum_val net_flow_qty gross_flow_qty sub_qty red_qty n_tx \\\n",
"0 100.000 30624.0000 0.0 0.0 0.0 0.0 0.0 \n",
"1 100.000 31886.0000 0.0 0.0 0.0 0.0 0.0 \n",
"2 100.000 32026.0000 0.0 0.0 0.0 0.0 0.0 \n",
"3 100.000 32843.0000 0.0 0.0 0.0 0.0 0.0 \n",
"4 100.000 33545.0000 0.0 0.0 0.0 0.0 0.0 \n",
"5 100.000 32390.0000 0.0 0.0 0.0 0.0 0.0 \n",
"6 100.000 32163.0000 0.0 0.0 0.0 0.0 0.0 \n",
"7 100.000 30573.0000 0.0 0.0 0.0 0.0 0.0 \n",
"8 100.000 29065.0000 0.0 0.0 0.0 0.0 0.0 \n",
"9 443.206 127980.1646 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" n_isin_held n_isin_active ret_fund_m delta_rate_m region \\\n",
"0 1 0 0.004902 -0.058 Switzerland \n",
"1 1 0 0.026230 -0.022 Switzerland \n",
"2 1 0 0.024272 -0.014 Switzerland \n",
"3 1 0 0.000000 -0.077 Switzerland \n",
"4 1 0 0.000000 -0.053 Switzerland \n",
"5 1 0 0.000000 0.020 Switzerland \n",
"6 1 0 0.003115 -0.042 Switzerland \n",
"7 1 0 -0.004630 -0.008 Switzerland \n",
"8 1 0 0.000000 -0.012 Switzerland \n",
"9 1 0 -0.083823 -0.007 Switzerland \n",
"\n",
" country active_month flow_to_aum_m turnover_m sub_share_m \\\n",
"0 Switzerland 0 0.0 0.0 NaN \n",
"1 Switzerland 0 0.0 0.0 NaN \n",
"2 Switzerland 0 0.0 0.0 NaN \n",
"3 Switzerland 0 0.0 0.0 NaN \n",
"4 Switzerland 0 0.0 0.0 NaN \n",
"5 Switzerland 0 0.0 0.0 NaN \n",
"6 Switzerland 0 0.0 0.0 NaN \n",
"7 Switzerland 0 0.0 0.0 NaN \n",
"8 Switzerland 0 0.0 0.0 NaN \n",
"9 Switzerland 0 0.0 0.0 NaN \n",
"\n",
" red_share_m aum_peak_to_date aum_drawdown total_client_aum_qty \\\n",
"0 NaN 100.000 0.0 179864.637 \n",
"1 NaN 100.000 0.0 186761.736 \n",
"2 NaN 100.000 0.0 190357.718 \n",
"3 NaN 100.000 0.0 191429.324 \n",
"4 NaN 100.000 0.0 189056.475 \n",
"5 NaN 100.000 0.0 188737.275 \n",
"6 NaN 100.000 0.0 204372.466 \n",
"7 NaN 100.000 0.0 204294.776 \n",
"8 NaN 100.000 0.0 200179.763 \n",
"9 NaN 443.206 0.0 193229.149 \n",
"\n",
" total_client_aum_val family_share_of_client_aum \n",
"0 7.043266e+07 0.000435 \n",
"1 7.317400e+07 0.000436 \n",
"2 7.653007e+07 0.000418 \n",
"3 7.509285e+07 0.000437 \n",
"4 7.650176e+07 0.000438 \n",
"5 7.291397e+07 0.000444 \n",
"6 7.523034e+07 0.000428 \n",
"7 6.919099e+07 0.000442 \n",
"8 6.492010e+07 0.000448 \n",
"9 6.541687e+07 0.001956 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#part du fmaily fund dans ptf du client\n",
"\n",
"df_client_total_month = (\n",
" df_rel_m\n",
" .groupby([ID_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" total_client_aum_qty=(\"aum_qty\", \"sum\"),\n",
" total_client_aum_val=(\"aum_val\", \"sum\")\n",
" )\n",
")\n",
"\n",
"df_family_month = df_family_month.merge(\n",
" df_client_total_month,\n",
" on=[ID_COL, \"month\"],\n",
" how=\"left\"\n",
")\n",
"\n",
"df_family_month[\"family_share_of_client_aum\"] = np.where(\n",
" df_family_month[\"total_client_aum_val\"].abs() > 0,\n",
" df_family_month[\"aum_val\"] / df_family_month[\"total_client_aum_val\"].abs(),\n",
" np.nan\n",
")\n",
"\n",
"df_family_month.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d4a01bcc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(539285, 44)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Registrar Account - ID</th>\n",
" <th>fund_family</th>\n",
" <th>month</th>\n",
" <th>aum_qty</th>\n",
" <th>aum_val</th>\n",
" <th>net_flow_qty</th>\n",
" <th>gross_flow_qty</th>\n",
" <th>sub_qty</th>\n",
" <th>red_qty</th>\n",
" <th>n_tx</th>\n",
" <th>n_isin_held</th>\n",
" <th>n_isin_active</th>\n",
" <th>ret_fund_m</th>\n",
" <th>delta_rate_m</th>\n",
" <th>region</th>\n",
" <th>country</th>\n",
" <th>active_month</th>\n",
" <th>flow_to_aum_m</th>\n",
" <th>turnover_m</th>\n",
" <th>sub_share_m</th>\n",
" <th>red_share_m</th>\n",
" <th>aum_peak_to_date</th>\n",
" <th>aum_drawdown</th>\n",
" <th>total_client_aum_qty</th>\n",
" <th>total_client_aum_val</th>\n",
" <th>family_share_of_client_aum</th>\n",
" <th>prev_aum</th>\n",
" <th>entry_event</th>\n",
" <th>full_exit_event</th>\n",
" <th>ret_fund_m_lag1</th>\n",
" <th>buy_on_perf</th>\n",
" <th>sell_on_perf</th>\n",
" <th>ret_fund_mean3_lag1</th>\n",
" <th>ret_fund_mean6_lag1</th>\n",
" <th>buy_on_perf_mean3</th>\n",
" <th>sell_on_perf_mean3</th>\n",
" <th>buy_on_perf_mean6</th>\n",
" <th>sell_on_perf_mean6</th>\n",
" <th>turnover_m_mean3</th>\n",
" <th>flow_to_aum_m_mean3</th>\n",
" <th>turnover_m_mean6</th>\n",
" <th>flow_to_aum_m_mean6</th>\n",
" <th>turnover_m_mean12</th>\n",
" <th>flow_to_aum_m_mean12</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-01-01</td>\n",
" <td>100.000</td>\n",
" <td>30624.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.004902</td>\n",
" <td>-0.058</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>179864.637</td>\n",
" <td>7.043266e+07</td>\n",
" <td>0.000435</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-02-01</td>\n",
" <td>100.000</td>\n",
" <td>31886.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.026230</td>\n",
" <td>-0.022</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>186761.736</td>\n",
" <td>7.317400e+07</td>\n",
" <td>0.000436</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.004902</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.004902</td>\n",
" <td>0.004902</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-03-01</td>\n",
" <td>100.000</td>\n",
" <td>32026.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.024272</td>\n",
" <td>-0.014</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>190357.718</td>\n",
" <td>7.653007e+07</td>\n",
" <td>0.000418</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.026230</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.015566</td>\n",
" <td>0.015566</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-04-01</td>\n",
" <td>100.000</td>\n",
" <td>32843.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.077</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>191429.324</td>\n",
" <td>7.509285e+07</td>\n",
" <td>0.000437</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.024272</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.018468</td>\n",
" <td>0.018468</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-05-01</td>\n",
" <td>100.000</td>\n",
" <td>33545.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.053</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>189056.475</td>\n",
" <td>7.650176e+07</td>\n",
" <td>0.000438</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.016834</td>\n",
" <td>0.013851</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-06-01</td>\n",
" <td>100.000</td>\n",
" <td>32390.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.020</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>188737.275</td>\n",
" <td>7.291397e+07</td>\n",
" <td>0.000444</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.008091</td>\n",
" <td>0.011081</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-07-01</td>\n",
" <td>100.000</td>\n",
" <td>32163.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.003115</td>\n",
" <td>-0.042</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>204372.466</td>\n",
" <td>7.523034e+07</td>\n",
" <td>0.000428</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.009234</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-08-01</td>\n",
" <td>100.000</td>\n",
" <td>30573.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.004630</td>\n",
" <td>-0.008</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>204294.776</td>\n",
" <td>6.919099e+07</td>\n",
" <td>0.000442</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.003115</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.001038</td>\n",
" <td>0.008936</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-09-01</td>\n",
" <td>100.000</td>\n",
" <td>29065.0000</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>-0.012</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>100.000</td>\n",
" <td>0.0</td>\n",
" <td>200179.763</td>\n",
" <td>6.492010e+07</td>\n",
" <td>0.000448</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-0.004630</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-0.000505</td>\n",
" <td>0.003793</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>2015-10-01</td>\n",
" <td>443.206</td>\n",
" <td>127980.1646</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>-0.083823</td>\n",
" <td>-0.007</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>443.206</td>\n",
" <td>0.0</td>\n",
" <td>193229.149</td>\n",
" <td>6.541687e+07</td>\n",
" <td>0.001956</td>\n",
" <td>100.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-0.000505</td>\n",
" <td>-0.000252</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Registrar Account - ID fund_family month \\\n",
"0 18872 Carmignac Absolute Return Europe 2015-01-01 \n",
"1 18872 Carmignac Absolute Return Europe 2015-02-01 \n",
"2 18872 Carmignac Absolute Return Europe 2015-03-01 \n",
"3 18872 Carmignac Absolute Return Europe 2015-04-01 \n",
"4 18872 Carmignac Absolute Return Europe 2015-05-01 \n",
"5 18872 Carmignac Absolute Return Europe 2015-06-01 \n",
"6 18872 Carmignac Absolute Return Europe 2015-07-01 \n",
"7 18872 Carmignac Absolute Return Europe 2015-08-01 \n",
"8 18872 Carmignac Absolute Return Europe 2015-09-01 \n",
"9 18872 Carmignac Absolute Return Europe 2015-10-01 \n",
"\n",
" aum_qty aum_val net_flow_qty gross_flow_qty sub_qty red_qty n_tx \\\n",
"0 100.000 30624.0000 0.0 0.0 0.0 0.0 0.0 \n",
"1 100.000 31886.0000 0.0 0.0 0.0 0.0 0.0 \n",
"2 100.000 32026.0000 0.0 0.0 0.0 0.0 0.0 \n",
"3 100.000 32843.0000 0.0 0.0 0.0 0.0 0.0 \n",
"4 100.000 33545.0000 0.0 0.0 0.0 0.0 0.0 \n",
"5 100.000 32390.0000 0.0 0.0 0.0 0.0 0.0 \n",
"6 100.000 32163.0000 0.0 0.0 0.0 0.0 0.0 \n",
"7 100.000 30573.0000 0.0 0.0 0.0 0.0 0.0 \n",
"8 100.000 29065.0000 0.0 0.0 0.0 0.0 0.0 \n",
"9 443.206 127980.1646 0.0 0.0 0.0 0.0 0.0 \n",
"\n",
" n_isin_held n_isin_active ret_fund_m delta_rate_m region \\\n",
"0 1 0 0.004902 -0.058 Switzerland \n",
"1 1 0 0.026230 -0.022 Switzerland \n",
"2 1 0 0.024272 -0.014 Switzerland \n",
"3 1 0 0.000000 -0.077 Switzerland \n",
"4 1 0 0.000000 -0.053 Switzerland \n",
"5 1 0 0.000000 0.020 Switzerland \n",
"6 1 0 0.003115 -0.042 Switzerland \n",
"7 1 0 -0.004630 -0.008 Switzerland \n",
"8 1 0 0.000000 -0.012 Switzerland \n",
"9 1 0 -0.083823 -0.007 Switzerland \n",
"\n",
" country active_month flow_to_aum_m turnover_m sub_share_m \\\n",
"0 Switzerland 0 0.0 0.0 NaN \n",
"1 Switzerland 0 0.0 0.0 NaN \n",
"2 Switzerland 0 0.0 0.0 NaN \n",
"3 Switzerland 0 0.0 0.0 NaN \n",
"4 Switzerland 0 0.0 0.0 NaN \n",
"5 Switzerland 0 0.0 0.0 NaN \n",
"6 Switzerland 0 0.0 0.0 NaN \n",
"7 Switzerland 0 0.0 0.0 NaN \n",
"8 Switzerland 0 0.0 0.0 NaN \n",
"9 Switzerland 0 0.0 0.0 NaN \n",
"\n",
" red_share_m aum_peak_to_date aum_drawdown total_client_aum_qty \\\n",
"0 NaN 100.000 0.0 179864.637 \n",
"1 NaN 100.000 0.0 186761.736 \n",
"2 NaN 100.000 0.0 190357.718 \n",
"3 NaN 100.000 0.0 191429.324 \n",
"4 NaN 100.000 0.0 189056.475 \n",
"5 NaN 100.000 0.0 188737.275 \n",
"6 NaN 100.000 0.0 204372.466 \n",
"7 NaN 100.000 0.0 204294.776 \n",
"8 NaN 100.000 0.0 200179.763 \n",
"9 NaN 443.206 0.0 193229.149 \n",
"\n",
" total_client_aum_val family_share_of_client_aum prev_aum entry_event \\\n",
"0 7.043266e+07 0.000435 NaN 1 \n",
"1 7.317400e+07 0.000436 100.0 0 \n",
"2 7.653007e+07 0.000418 100.0 0 \n",
"3 7.509285e+07 0.000437 100.0 0 \n",
"4 7.650176e+07 0.000438 100.0 0 \n",
"5 7.291397e+07 0.000444 100.0 0 \n",
"6 7.523034e+07 0.000428 100.0 0 \n",
"7 6.919099e+07 0.000442 100.0 0 \n",
"8 6.492010e+07 0.000448 100.0 0 \n",
"9 6.541687e+07 0.001956 100.0 0 \n",
"\n",
" full_exit_event ret_fund_m_lag1 buy_on_perf sell_on_perf \\\n",
"0 0 NaN 0 0 \n",
"1 0 0.004902 0 0 \n",
"2 0 0.026230 0 0 \n",
"3 0 0.024272 0 0 \n",
"4 0 0.000000 0 0 \n",
"5 0 0.000000 0 0 \n",
"6 0 0.000000 0 0 \n",
"7 0 0.003115 0 0 \n",
"8 0 -0.004630 0 0 \n",
"9 0 0.000000 0 0 \n",
"\n",
" ret_fund_mean3_lag1 ret_fund_mean6_lag1 buy_on_perf_mean3 \\\n",
"0 NaN NaN 0 \n",
"1 0.004902 0.004902 0 \n",
"2 0.015566 0.015566 0 \n",
"3 0.018468 0.018468 0 \n",
"4 0.016834 0.013851 0 \n",
"5 0.008091 0.011081 0 \n",
"6 0.000000 0.009234 0 \n",
"7 0.001038 0.008936 0 \n",
"8 -0.000505 0.003793 0 \n",
"9 -0.000505 -0.000252 0 \n",
"\n",
" sell_on_perf_mean3 buy_on_perf_mean6 sell_on_perf_mean6 \\\n",
"0 0 0 0 \n",
"1 0 0 0 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"4 0 0 0 \n",
"5 0 0 0 \n",
"6 0 0 0 \n",
"7 0 0 0 \n",
"8 0 0 0 \n",
"9 0 0 0 \n",
"\n",
" turnover_m_mean3 flow_to_aum_m_mean3 turnover_m_mean6 \\\n",
"0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 \n",
"5 0.0 0.0 0.0 \n",
"6 0.0 0.0 0.0 \n",
"7 0.0 0.0 0.0 \n",
"8 0.0 0.0 0.0 \n",
"9 0.0 0.0 0.0 \n",
"\n",
" flow_to_aum_m_mean6 turnover_m_mean12 flow_to_aum_m_mean12 \n",
"0 0.0 0.0 0.0 \n",
"1 0.0 0.0 0.0 \n",
"2 0.0 0.0 0.0 \n",
"3 0.0 0.0 0.0 \n",
"4 0.0 0.0 0.0 \n",
"5 0.0 0.0 0.0 \n",
"6 0.0 0.0 0.0 \n",
"7 0.0 0.0 0.0 \n",
"8 0.0 0.0 0.0 \n",
"9 0.0 0.0 0.0 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Temporal feature engineering at the client × fund family × month level\n",
"tmp = df_family_month.sort_values([ID_COL, \"fund_family\", \"month\"]).reset_index(drop=True)\n",
"\n",
"# Previous month's holdings within each (client, fund family) series\n",
"tmp[\"prev_aum\"] = tmp.groupby([ID_COL, \"fund_family\"])[\"aum_qty\"].shift(1)\n",
"\n",
"# Entry: holdings go from zero (or no prior observation) to positive\n",
"tmp[\"entry_event\"] = (\n",
"    (tmp[\"prev_aum\"].fillna(0) <= 0) & (tmp[\"aum_qty\"] > 0)\n",
").astype(int)\n",
"\n",
"# Full exit: holdings were positive last month and are now zero or below\n",
"tmp[\"full_exit_event\"] = (\n",
"    (tmp[\"prev_aum\"] > 0) & (tmp[\"aum_qty\"] <= 0)\n",
").astype(int)\n",
"\n",
"# Previous month's fund return, so flows are compared with past performance\n",
"tmp[\"ret_fund_m_lag1\"] = tmp.groupby([ID_COL, \"fund_family\"])[\"ret_fund_m\"].shift(1)\n",
"\n",
"# Reactivity flags: net buy after a positive month, net sell after a negative month\n",
"tmp[\"buy_on_perf\"] = (\n",
"    (tmp[\"net_flow_qty\"] > 0) & (tmp[\"ret_fund_m_lag1\"] > 0)\n",
").astype(int)\n",
"\n",
"tmp[\"sell_on_perf\"] = (\n",
"    (tmp[\"net_flow_qty\"] < 0) & (tmp[\"ret_fund_m_lag1\"] < 0)\n",
").astype(int)\n",
"\n",
"# Trailing 3- and 6-month mean returns, excluding the current month\n",
"tmp[\"ret_fund_mean3_lag1\"] = (\n",
"    tmp.groupby([ID_COL, \"fund_family\"])[\"ret_fund_m\"]\n",
"    .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())\n",
")\n",
"\n",
"tmp[\"ret_fund_mean6_lag1\"] = (\n",
"    tmp.groupby([ID_COL, \"fund_family\"])[\"ret_fund_m\"]\n",
"    .transform(lambda s: s.shift(1).rolling(6, min_periods=1).mean())\n",
")\n",
"\n",
"# Same reactivity flags, measured against the trailing mean returns\n",
"tmp[\"buy_on_perf_mean3\"] = (\n",
"    (tmp[\"net_flow_qty\"] > 0) & (tmp[\"ret_fund_mean3_lag1\"] > 0)\n",
").astype(int)\n",
"\n",
"tmp[\"sell_on_perf_mean3\"] = (\n",
"    (tmp[\"net_flow_qty\"] < 0) & (tmp[\"ret_fund_mean3_lag1\"] < 0)\n",
").astype(int)\n",
"\n",
"tmp[\"buy_on_perf_mean6\"] = (\n",
"    (tmp[\"net_flow_qty\"] > 0) & (tmp[\"ret_fund_mean6_lag1\"] > 0)\n",
").astype(int)\n",
"\n",
"tmp[\"sell_on_perf_mean6\"] = (\n",
"    (tmp[\"net_flow_qty\"] < 0) & (tmp[\"ret_fund_mean6_lag1\"] < 0)\n",
").astype(int)\n",
"\n",
"# Rolling means of turnover and flow-to-AUM over 3, 6 and 12 months (current month included)\n",
"for w in [3, 6, 12]:\n",
"    tmp[f\"turnover_m_mean{w}\"] = (\n",
"        tmp.groupby([ID_COL, \"fund_family\"])[\"turnover_m\"]\n",
"        .transform(lambda s: s.rolling(w, min_periods=1).mean())\n",
"    )\n",
"\n",
"    tmp[f\"flow_to_aum_m_mean{w}\"] = (\n",
"        tmp.groupby([ID_COL, \"fund_family\"])[\"flow_to_aum_m\"]\n",
"        .transform(lambda s: s.rolling(w, min_periods=1).mean())\n",
"    )\n",
"\n",
"print(tmp.shape)\n",
"tmp.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "965cd530-9985-4d57-938c-cef531459c39",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"df_family_client_base shape: (7282, 49)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Registrar Account - ID</th>\n",
" <th>fund_family</th>\n",
" <th>n_months</th>\n",
" <th>n_active_months</th>\n",
" <th>flow_freq</th>\n",
" <th>family_aum_qty_mean</th>\n",
" <th>family_aum_qty_median</th>\n",
" <th>family_aum_qty_max</th>\n",
" <th>family_aum_qty_last</th>\n",
" <th>family_aum_val_mean</th>\n",
" <th>family_aum_val_last</th>\n",
" <th>net_flow_qty_sum</th>\n",
" <th>gross_flow_qty_sum</th>\n",
" <th>gross_flow_qty_mean</th>\n",
" <th>sub_qty_sum</th>\n",
" <th>red_qty_sum</th>\n",
" <th>n_tx_total</th>\n",
" <th>avg_n_isin_held</th>\n",
" <th>max_n_isin_held</th>\n",
" <th>net_flow_qty_vol</th>\n",
" <th>flow_to_aum_mean</th>\n",
" <th>flow_to_aum_vol</th>\n",
" <th>turnover_mean</th>\n",
" <th>turnover_vol</th>\n",
" <th>sub_share_mean</th>\n",
" <th>red_share_mean</th>\n",
" <th>aum_drawdown_last</th>\n",
" <th>aum_drawdown_max</th>\n",
" <th>entry_count</th>\n",
" <th>full_exit_count</th>\n",
" <th>buy_on_perf_rate</th>\n",
" <th>sell_on_perf_rate</th>\n",
" <th>buy_on_perf_mean3_rate</th>\n",
" <th>sell_on_perf_mean3_rate</th>\n",
" <th>buy_on_perf_mean6_rate</th>\n",
" <th>sell_on_perf_mean6_rate</th>\n",
" <th>turnover_mean3_avg</th>\n",
" <th>turnover_mean6_avg</th>\n",
" <th>turnover_mean12_avg</th>\n",
" <th>flow_to_aum_mean3_avg</th>\n",
" <th>flow_to_aum_mean6_avg</th>\n",
" <th>flow_to_aum_mean12_avg</th>\n",
" <th>family_share_of_client_aum_mean</th>\n",
" <th>family_share_of_client_aum_max</th>\n",
" <th>ret_fund_m_mean</th>\n",
" <th>delta_rate_m_mean</th>\n",
" <th>region</th>\n",
" <th>country</th>\n",
" <th>months_since_last_tx_family</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>130</td>\n",
" <td>8</td>\n",
" <td>0.061538</td>\n",
" <td>886.504908</td>\n",
" <td>672.748</td>\n",
" <td>3012.748</td>\n",
" <td>0.000</td>\n",
" <td>3.479000e+05</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-123.206</td>\n",
" <td>5403.206</td>\n",
" <td>41.563123</td>\n",
" <td>2640.000</td>\n",
" <td>-2763.206</td>\n",
" <td>9.0</td>\n",
" <td>0.830769</td>\n",
" <td>1</td>\n",
" <td>237.868298</td>\n",
" <td>-0.015569</td>\n",
" <td>0.211558</td>\n",
" <td>0.037561</td>\n",
" <td>0.208752</td>\n",
" <td>0.375000</td>\n",
" <td>-0.625000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.007692</td>\n",
" <td>0.000000</td>\n",
" <td>0.015385</td>\n",
" <td>0.000000</td>\n",
" <td>0.015385</td>\n",
" <td>0.007692</td>\n",
" <td>0.036878</td>\n",
" <td>0.035899</td>\n",
" <td>0.034487</td>\n",
" <td>-0.015286</td>\n",
" <td>-0.014881</td>\n",
" <td>-0.014528</td>\n",
" <td>0.011541</td>\n",
" <td>0.050205</td>\n",
" <td>0.001311</td>\n",
" <td>0.013723</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>29.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>Carmignac China New Economy</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>19.545000</td>\n",
" <td>19.545</td>\n",
" <td>19.545</td>\n",
" <td>19.545</td>\n",
" <td>1.209445e+03</td>\n",
" <td>1.209445e+03</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.0</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000037</td>\n",
" <td>0.000037</td>\n",
" <td>0.000000</td>\n",
" <td>0.002000</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Court Terme</td>\n",
" <td>47</td>\n",
" <td>1</td>\n",
" <td>0.021277</td>\n",
" <td>28.340426</td>\n",
" <td>0.000</td>\n",
" <td>666.000</td>\n",
" <td>0.000</td>\n",
" <td>1.061668e+05</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-666.000</td>\n",
" <td>666.000</td>\n",
" <td>14.170213</td>\n",
" <td>0.000</td>\n",
" <td>-666.000</td>\n",
" <td>1.0</td>\n",
" <td>0.042553</td>\n",
" <td>1</td>\n",
" <td>97.146084</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.001850</td>\n",
" <td>0.044187</td>\n",
" <td>-0.000342</td>\n",
" <td>-0.004468</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>94.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Credit &lt;NUM&gt;</td>\n",
" <td>23</td>\n",
" <td>6</td>\n",
" <td>0.260870</td>\n",
" <td>7901.739130</td>\n",
" <td>5455.000</td>\n",
" <td>14240.000</td>\n",
" <td>14240.000</td>\n",
" <td>9.231846e+05</td>\n",
" <td>1.711014e+06</td>\n",
" <td>14240.000</td>\n",
" <td>14240.000</td>\n",
" <td>619.130435</td>\n",
" <td>14240.000</td>\n",
" <td>0.000</td>\n",
" <td>6.0</td>\n",
" <td>3.043478</td>\n",
" <td>5</td>\n",
" <td>1747.389385</td>\n",
" <td>0.118826</td>\n",
" <td>0.278576</td>\n",
" <td>0.118826</td>\n",
" <td>0.278576</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.154477</td>\n",
" <td>0.182237</td>\n",
" <td>0.217451</td>\n",
" <td>0.154477</td>\n",
" <td>0.182237</td>\n",
" <td>0.217451</td>\n",
" <td>0.034637</td>\n",
" <td>0.061213</td>\n",
" <td>0.000000</td>\n",
" <td>-0.085826</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Emergents</td>\n",
" <td>130</td>\n",
" <td>37</td>\n",
" <td>0.284615</td>\n",
" <td>2250.665769</td>\n",
" <td>606.000</td>\n",
" <td>8900.649</td>\n",
" <td>167.799</td>\n",
" <td>1.930753e+06</td>\n",
" <td>2.563734e+05</td>\n",
" <td>-7116.537</td>\n",
" <td>7708.537</td>\n",
" <td>59.296438</td>\n",
" <td>296.000</td>\n",
" <td>-7412.537</td>\n",
" <td>55.0</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>219.774317</td>\n",
" <td>-0.025186</td>\n",
" <td>0.147716</td>\n",
" <td>0.034426</td>\n",
" <td>0.145825</td>\n",
" <td>0.179537</td>\n",
" <td>-0.820463</td>\n",
" <td>0.981148</td>\n",
" <td>0.985394</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.015385</td>\n",
" <td>0.007692</td>\n",
" <td>0.053846</td>\n",
" <td>0.023077</td>\n",
" <td>0.053846</td>\n",
" <td>0.034430</td>\n",
" <td>0.034449</td>\n",
" <td>0.034690</td>\n",
" <td>-0.025185</td>\n",
" <td>-0.025190</td>\n",
" <td>-0.025412</td>\n",
" <td>0.040489</td>\n",
" <td>0.103768</td>\n",
" <td>0.006332</td>\n",
" <td>0.013723</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Euro-Entrepreneurs</td>\n",
" <td>80</td>\n",
" <td>9</td>\n",
" <td>0.112500</td>\n",
" <td>2321.171212</td>\n",
" <td>184.023</td>\n",
" <td>15725.000</td>\n",
" <td>0.000</td>\n",
" <td>8.452421e+05</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-7093.880</td>\n",
" <td>7963.880</td>\n",
" <td>99.548500</td>\n",
" <td>435.000</td>\n",
" <td>-7528.880</td>\n",
" <td>10.0</td>\n",
" <td>0.625000</td>\n",
" <td>1</td>\n",
" <td>578.951646</td>\n",
" <td>-0.024076</td>\n",
" <td>0.148697</td>\n",
" <td>0.056100</td>\n",
" <td>0.139609</td>\n",
" <td>0.222222</td>\n",
" <td>-0.777778</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>0.012500</td>\n",
" <td>0.000000</td>\n",
" <td>0.012500</td>\n",
" <td>0.000000</td>\n",
" <td>0.012500</td>\n",
" <td>0.000000</td>\n",
" <td>0.051944</td>\n",
" <td>0.046856</td>\n",
" <td>0.042090</td>\n",
" <td>-0.022293</td>\n",
" <td>-0.020170</td>\n",
" <td>-0.017985</td>\n",
" <td>0.017608</td>\n",
" <td>0.113984</td>\n",
" <td>0.009313</td>\n",
" <td>-0.008925</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>63.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Euro-Investissement</td>\n",
" <td>80</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000000e+00</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000</td>\n",
" <td>0.000</td>\n",
" <td>0.0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-0.008925</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Investissement</td>\n",
" <td>130</td>\n",
" <td>69</td>\n",
" <td>0.530769</td>\n",
" <td>5287.072846</td>\n",
" <td>3200.800</td>\n",
" <td>13396.330</td>\n",
" <td>1575.000</td>\n",
" <td>4.084883e+06</td>\n",
" <td>2.922383e+06</td>\n",
" <td>-8322.670</td>\n",
" <td>18663.578</td>\n",
" <td>143.565985</td>\n",
" <td>5170.454</td>\n",
" <td>-13493.124</td>\n",
" <td>159.0</td>\n",
" <td>3.146154</td>\n",
" <td>4</td>\n",
" <td>448.840939</td>\n",
" <td>-0.015065</td>\n",
" <td>0.102880</td>\n",
" <td>0.025599</td>\n",
" <td>0.104667</td>\n",
" <td>0.094726</td>\n",
" <td>-0.905274</td>\n",
" <td>0.882430</td>\n",
" <td>0.882430</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.023077</td>\n",
" <td>0.076923</td>\n",
" <td>0.030769</td>\n",
" <td>0.130769</td>\n",
" <td>0.038462</td>\n",
" <td>0.153846</td>\n",
" <td>0.025715</td>\n",
" <td>0.025793</td>\n",
" <td>0.025706</td>\n",
" <td>-0.015182</td>\n",
" <td>-0.015068</td>\n",
" <td>-0.014639</td>\n",
" <td>0.111235</td>\n",
" <td>0.154495</td>\n",
" <td>0.005852</td>\n",
" <td>0.013723</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Investissement Latitude</td>\n",
" <td>80</td>\n",
" <td>8</td>\n",
" <td>0.100000</td>\n",
" <td>891.837500</td>\n",
" <td>1222.000</td>\n",
" <td>1717.000</td>\n",
" <td>0.000</td>\n",
" <td>2.310840e+05</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-2958.000</td>\n",
" <td>2958.000</td>\n",
" <td>36.975000</td>\n",
" <td>0.000</td>\n",
" <td>-2958.000</td>\n",
" <td>9.0</td>\n",
" <td>0.787500</td>\n",
" <td>1</td>\n",
" <td>184.846907</td>\n",
" <td>-0.036575</td>\n",
" <td>0.148040</td>\n",
" <td>0.036575</td>\n",
" <td>0.148040</td>\n",
" <td>0.000000</td>\n",
" <td>-1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0.012500</td>\n",
" <td>0.000000</td>\n",
" <td>0.025000</td>\n",
" <td>0.000000</td>\n",
" <td>0.037500</td>\n",
" <td>0.047173</td>\n",
" <td>0.053641</td>\n",
" <td>0.057770</td>\n",
" <td>-0.047173</td>\n",
" <td>-0.053641</td>\n",
" <td>-0.057770</td>\n",
" <td>0.004618</td>\n",
" <td>0.008707</td>\n",
" <td>0.002647</td>\n",
" <td>-0.008925</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>66.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Multi Expertise</td>\n",
" <td>80</td>\n",
" <td>1</td>\n",
" <td>0.012500</td>\n",
" <td>243.500000</td>\n",
" <td>164.000</td>\n",
" <td>564.000</td>\n",
" <td>0.000</td>\n",
" <td>4.327213e+04</td>\n",
" <td>0.000000e+00</td>\n",
" <td>-400.000</td>\n",
" <td>400.000</td>\n",
" <td>5.000000</td>\n",
" <td>0.000</td>\n",
" <td>-400.000</td>\n",
" <td>1.0</td>\n",
" <td>0.875000</td>\n",
" <td>1</td>\n",
" <td>44.721360</td>\n",
" <td>-0.034843</td>\n",
" <td>0.291519</td>\n",
" <td>0.034843</td>\n",
" <td>0.291519</td>\n",
" <td>0.000000</td>\n",
" <td>-1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.012500</td>\n",
" <td>0.033875</td>\n",
" <td>0.032520</td>\n",
" <td>0.030488</td>\n",
" <td>-0.033875</td>\n",
" <td>-0.032520</td>\n",
" <td>-0.030488</td>\n",
" <td>0.000971</td>\n",
" <td>0.002025</td>\n",
" <td>0.003772</td>\n",
" <td>-0.008925</td>\n",
" <td>Switzerland</td>\n",
" <td>Switzerland</td>\n",
" <td>109.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Registrar Account - ID fund_family n_months \\\n",
"0 18872 Carmignac Absolute Return Europe 130 \n",
"1 18872 Carmignac China New Economy 1 \n",
"2 18872 Carmignac Court Terme 47 \n",
"3 18872 Carmignac Credit <NUM> 23 \n",
"4 18872 Carmignac Emergents 130 \n",
"5 18872 Carmignac Euro-Entrepreneurs 80 \n",
"6 18872 Carmignac Euro-Investissement 80 \n",
"7 18872 Carmignac Investissement 130 \n",
"8 18872 Carmignac Investissement Latitude 80 \n",
"9 18872 Carmignac Multi Expertise 80 \n",
"\n",
" n_active_months flow_freq family_aum_qty_mean family_aum_qty_median \\\n",
"0 8 0.061538 886.504908 672.748 \n",
"1 0 0.000000 19.545000 19.545 \n",
"2 1 0.021277 28.340426 0.000 \n",
"3 6 0.260870 7901.739130 5455.000 \n",
"4 37 0.284615 2250.665769 606.000 \n",
"5 9 0.112500 2321.171212 184.023 \n",
"6 0 0.000000 0.000000 0.000 \n",
"7 69 0.530769 5287.072846 3200.800 \n",
"8 8 0.100000 891.837500 1222.000 \n",
"9 1 0.012500 243.500000 164.000 \n",
"\n",
" family_aum_qty_max family_aum_qty_last family_aum_val_mean \\\n",
"0 3012.748 0.000 3.479000e+05 \n",
"1 19.545 19.545 1.209445e+03 \n",
"2 666.000 0.000 1.061668e+05 \n",
"3 14240.000 14240.000 9.231846e+05 \n",
"4 8900.649 167.799 1.930753e+06 \n",
"5 15725.000 0.000 8.452421e+05 \n",
"6 0.000 0.000 0.000000e+00 \n",
"7 13396.330 1575.000 4.084883e+06 \n",
"8 1717.000 0.000 2.310840e+05 \n",
"9 564.000 0.000 4.327213e+04 \n",
"\n",
" family_aum_val_last net_flow_qty_sum gross_flow_qty_sum \\\n",
"0 0.000000e+00 -123.206 5403.206 \n",
"1 1.209445e+03 0.000 0.000 \n",
"2 0.000000e+00 -666.000 666.000 \n",
"3 1.711014e+06 14240.000 14240.000 \n",
"4 2.563734e+05 -7116.537 7708.537 \n",
"5 0.000000e+00 -7093.880 7963.880 \n",
"6 0.000000e+00 0.000 0.000 \n",
"7 2.922383e+06 -8322.670 18663.578 \n",
"8 0.000000e+00 -2958.000 2958.000 \n",
"9 0.000000e+00 -400.000 400.000 \n",
"\n",
" gross_flow_qty_mean sub_qty_sum red_qty_sum n_tx_total avg_n_isin_held \\\n",
"0 41.563123 2640.000 -2763.206 9.0 0.830769 \n",
"1 0.000000 0.000 0.000 0.0 1.000000 \n",
"2 14.170213 0.000 -666.000 1.0 0.042553 \n",
"3 619.130435 14240.000 0.000 6.0 3.043478 \n",
"4 59.296438 296.000 -7412.537 55.0 1.000000 \n",
"5 99.548500 435.000 -7528.880 10.0 0.625000 \n",
"6 0.000000 0.000 0.000 0.0 0.000000 \n",
"7 143.565985 5170.454 -13493.124 159.0 3.146154 \n",
"8 36.975000 0.000 -2958.000 9.0 0.787500 \n",
"9 5.000000 0.000 -400.000 1.0 0.875000 \n",
"\n",
" max_n_isin_held net_flow_qty_vol flow_to_aum_mean flow_to_aum_vol \\\n",
"0 1 237.868298 -0.015569 0.211558 \n",
"1 1 0.000000 0.000000 0.000000 \n",
"2 1 97.146084 0.000000 0.000000 \n",
"3 5 1747.389385 0.118826 0.278576 \n",
"4 1 219.774317 -0.025186 0.147716 \n",
"5 1 578.951646 -0.024076 0.148697 \n",
"6 0 0.000000 NaN 0.000000 \n",
"7 4 448.840939 -0.015065 0.102880 \n",
"8 1 184.846907 -0.036575 0.148040 \n",
"9 1 44.721360 -0.034843 0.291519 \n",
"\n",
" turnover_mean turnover_vol sub_share_mean red_share_mean \\\n",
"0 0.037561 0.208752 0.375000 -0.625000 \n",
"1 0.000000 0.000000 NaN NaN \n",
"2 0.000000 0.000000 0.000000 -1.000000 \n",
"3 0.118826 0.278576 1.000000 0.000000 \n",
"4 0.034426 0.145825 0.179537 -0.820463 \n",
"5 0.056100 0.139609 0.222222 -0.777778 \n",
"6 NaN 0.000000 NaN NaN \n",
"7 0.025599 0.104667 0.094726 -0.905274 \n",
"8 0.036575 0.148040 0.000000 -1.000000 \n",
"9 0.034843 0.291519 0.000000 -1.000000 \n",
"\n",
" aum_drawdown_last aum_drawdown_max entry_count full_exit_count \\\n",
"0 1.000000 1.000000 1 1 \n",
"1 0.000000 0.000000 1 0 \n",
"2 1.000000 1.000000 1 1 \n",
"3 0.000000 0.000000 1 0 \n",
"4 0.981148 0.985394 1 0 \n",
"5 1.000000 1.000000 2 2 \n",
"6 NaN NaN 0 0 \n",
"7 0.882430 0.882430 1 0 \n",
"8 1.000000 1.000000 1 1 \n",
"9 1.000000 1.000000 1 1 \n",
"\n",
" buy_on_perf_rate sell_on_perf_rate buy_on_perf_mean3_rate \\\n",
"0 0.007692 0.000000 0.015385 \n",
"1 0.000000 0.000000 0.000000 \n",
"2 0.000000 0.000000 0.000000 \n",
"3 0.000000 0.000000 0.000000 \n",
"4 0.000000 0.015385 0.007692 \n",
"5 0.012500 0.000000 0.012500 \n",
"6 0.000000 0.000000 0.000000 \n",
"7 0.023077 0.076923 0.030769 \n",
"8 0.000000 0.012500 0.000000 \n",
"9 0.000000 0.000000 0.000000 \n",
"\n",
" sell_on_perf_mean3_rate buy_on_perf_mean6_rate sell_on_perf_mean6_rate \\\n",
"0 0.000000 0.015385 0.007692 \n",
"1 0.000000 0.000000 0.000000 \n",
"2 0.000000 0.000000 0.000000 \n",
"3 0.000000 0.000000 0.000000 \n",
"4 0.053846 0.023077 0.053846 \n",
"5 0.000000 0.012500 0.000000 \n",
"6 0.000000 0.000000 0.000000 \n",
"7 0.130769 0.038462 0.153846 \n",
"8 0.025000 0.000000 0.037500 \n",
"9 0.000000 0.000000 0.012500 \n",
"\n",
" turnover_mean3_avg turnover_mean6_avg turnover_mean12_avg \\\n",
"0 0.036878 0.035899 0.034487 \n",
"1 0.000000 0.000000 0.000000 \n",
"2 0.000000 0.000000 0.000000 \n",
"3 0.154477 0.182237 0.217451 \n",
"4 0.034430 0.034449 0.034690 \n",
"5 0.051944 0.046856 0.042090 \n",
"6 NaN NaN NaN \n",
"7 0.025715 0.025793 0.025706 \n",
"8 0.047173 0.053641 0.057770 \n",
"9 0.033875 0.032520 0.030488 \n",
"\n",
" flow_to_aum_mean3_avg flow_to_aum_mean6_avg flow_to_aum_mean12_avg \\\n",
"0 -0.015286 -0.014881 -0.014528 \n",
"1 0.000000 0.000000 0.000000 \n",
"2 0.000000 0.000000 0.000000 \n",
"3 0.154477 0.182237 0.217451 \n",
"4 -0.025185 -0.025190 -0.025412 \n",
"5 -0.022293 -0.020170 -0.017985 \n",
"6 NaN NaN NaN \n",
"7 -0.015182 -0.015068 -0.014639 \n",
"8 -0.047173 -0.053641 -0.057770 \n",
"9 -0.033875 -0.032520 -0.030488 \n",
"\n",
" family_share_of_client_aum_mean family_share_of_client_aum_max \\\n",
"0 0.011541 0.050205 \n",
"1 0.000037 0.000037 \n",
"2 0.001850 0.044187 \n",
"3 0.034637 0.061213 \n",
"4 0.040489 0.103768 \n",
"5 0.017608 0.113984 \n",
"6 0.000000 0.000000 \n",
"7 0.111235 0.154495 \n",
"8 0.004618 0.008707 \n",
"9 0.000971 0.002025 \n",
"\n",
" ret_fund_m_mean delta_rate_m_mean region country \\\n",
"0 0.001311 0.013723 Switzerland Switzerland \n",
"1 0.000000 0.002000 Switzerland Switzerland \n",
"2 -0.000342 -0.004468 Switzerland Switzerland \n",
"3 0.000000 -0.085826 Switzerland Switzerland \n",
"4 0.006332 0.013723 Switzerland Switzerland \n",
"5 0.009313 -0.008925 Switzerland Switzerland \n",
"6 0.000000 -0.008925 Switzerland Switzerland \n",
"7 0.005852 0.013723 Switzerland Switzerland \n",
"8 0.002647 -0.008925 Switzerland Switzerland \n",
"9 0.003772 -0.008925 Switzerland Switzerland \n",
"\n",
" months_since_last_tx_family \n",
"0 29.0 \n",
"1 0.0 \n",
"2 94.0 \n",
"3 0.0 \n",
"4 32.0 \n",
"5 63.0 \n",
"6 0.0 \n",
"7 2.0 \n",
"8 66.0 \n",
"9 109.0 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Final feature table at the client × family level\n",
"months_since_last_tx = add_months_since_last_tx_by_group(\n",
"    tmp,\n",
"    id_cols=[ID_COL, \"fund_family\"],\n",
"    active_col=\"active_month\",\n",
"    month_col=\"month\",\n",
"    suffix=\"_family\"\n",
")\n",
"\n",
"# Note: the \"last\" aggregations below take the last row of each group as ordered in tmp,\n",
"# so they assume tmp is sorted by month within each (client, family) pair.\n",
"df_family_client_base = (\n",
"    tmp\n",
"    .groupby([ID_COL, \"fund_family\"], as_index=False)\n",
"    .agg(\n",
"        n_months=(\"month\", \"nunique\"),\n",
"        n_active_months=(\"active_month\", \"sum\"),\n",
"        flow_freq=(\"active_month\", \"mean\"),\n",
"\n",
"        family_aum_qty_mean=(\"aum_qty\", \"mean\"),\n",
"        family_aum_qty_median=(\"aum_qty\", \"median\"),\n",
"        family_aum_qty_max=(\"aum_qty\", \"max\"),\n",
"        family_aum_qty_last=(\"aum_qty\", \"last\"),\n",
"\n",
"        family_aum_val_mean=(\"aum_val\", \"mean\"),\n",
"        family_aum_val_last=(\"aum_val\", \"last\"),\n",
"\n",
"        net_flow_qty_sum=(\"net_flow_qty\", \"sum\"),\n",
"        gross_flow_qty_sum=(\"gross_flow_qty\", \"sum\"),\n",
"        gross_flow_qty_mean=(\"gross_flow_qty\", \"mean\"),\n",
"\n",
"        sub_qty_sum=(\"sub_qty\", \"sum\"),\n",
"        red_qty_sum=(\"red_qty\", \"sum\"),\n",
"        n_tx_total=(\"n_tx\", \"sum\"),\n",
"\n",
"        avg_n_isin_held=(\"n_isin_held\", \"mean\"),\n",
"        max_n_isin_held=(\"n_isin_held\", \"max\"),\n",
"\n",
"        net_flow_qty_vol=(\"net_flow_qty\", \"std\"),\n",
"        flow_to_aum_mean=(\"flow_to_aum_m\", \"mean\"),\n",
"        flow_to_aum_vol=(\"flow_to_aum_m\", \"std\"),\n",
"        turnover_mean=(\"turnover_m\", \"mean\"),\n",
"        turnover_vol=(\"turnover_m\", \"std\"),\n",
"\n",
"        sub_share_mean=(\"sub_share_m\", \"mean\"),\n",
"        red_share_mean=(\"red_share_m\", \"mean\"),\n",
"\n",
"        aum_drawdown_last=(\"aum_drawdown\", \"last\"),\n",
"        aum_drawdown_max=(\"aum_drawdown\", \"max\"),\n",
"\n",
"        entry_count=(\"entry_event\", \"sum\"),\n",
"        full_exit_count=(\"full_exit_event\", \"sum\"),\n",
"\n",
"        buy_on_perf_rate=(\"buy_on_perf\", \"mean\"),\n",
"        sell_on_perf_rate=(\"sell_on_perf\", \"mean\"),\n",
"        buy_on_perf_mean3_rate=(\"buy_on_perf_mean3\", \"mean\"),\n",
"        sell_on_perf_mean3_rate=(\"sell_on_perf_mean3\", \"mean\"),\n",
"        buy_on_perf_mean6_rate=(\"buy_on_perf_mean6\", \"mean\"),\n",
"        sell_on_perf_mean6_rate=(\"sell_on_perf_mean6\", \"mean\"),\n",
"\n",
"        turnover_mean3_avg=(\"turnover_m_mean3\", \"mean\"),\n",
"        turnover_mean6_avg=(\"turnover_m_mean6\", \"mean\"),\n",
"        turnover_mean12_avg=(\"turnover_m_mean12\", \"mean\"),\n",
"\n",
"        flow_to_aum_mean3_avg=(\"flow_to_aum_m_mean3\", \"mean\"),\n",
"        flow_to_aum_mean6_avg=(\"flow_to_aum_m_mean6\", \"mean\"),\n",
"        flow_to_aum_mean12_avg=(\"flow_to_aum_m_mean12\", \"mean\"),\n",
"\n",
"        family_share_of_client_aum_mean=(\"family_share_of_client_aum\", \"mean\"),\n",
"        family_share_of_client_aum_max=(\"family_share_of_client_aum\", \"max\"),\n",
"\n",
"        ret_fund_m_mean=(\"ret_fund_m\", \"mean\"),\n",
"        delta_rate_m_mean=(\"delta_rate_m\", \"mean\"),\n",
"\n",
"        region=(\"region\", \"last\"),\n",
"        country=(\"country\", \"last\"),\n",
"    )\n",
"    .merge(months_since_last_tx, on=[ID_COL, \"fund_family\"], how=\"left\")\n",
")\n",
"\n",
"# A standard deviation is NaN for single-month groups; treat it as zero volatility\n",
"for col in [\n",
"    \"net_flow_qty_vol\",\n",
"    \"flow_to_aum_vol\",\n",
"    \"turnover_vol\",\n",
"    \"months_since_last_tx_family\"\n",
"]:\n",
"    if col in df_family_client_base.columns:\n",
"        df_family_client_base[col] = df_family_client_base[col].fillna(0)\n",
"\n",
"print(\"df_family_client_base shape:\", df_family_client_base.shape)\n",
"df_family_client_base.head(10)"
]
},
{
"cell_type": "markdown",
"id": "ac6b1959",
"metadata": {},
"source": [
"---\n",
"## 5. Part 1 — Top 15 Fund Families\n",
"We restrict the clustering analysis to the 15 largest fund families, ranked by total final AUM, with the number of clients as a tie-breaker.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9cabda05",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Top 15 families:\n",
"01 - Carmignac Patrimoine\n",
"02 - Carmignac Sécurité\n",
"03 - Carmignac Credit <NUM>\n",
"04 - Carmignac Investissement\n",
"05 - Carmignac Portfolio Sécurité\n",
"06 - Carmignac Portfolio Flexible Bond\n",
"07 - Carmignac Portfolio Credit\n",
"08 - Carmignac Emergents\n",
"09 - Carmignac Court Terme\n",
"10 - Carmignac Portfolio Long-Short European Equities\n",
"11 - Carmignac Portfolio Grande Europe\n",
"12 - Carmignac Portfolio Global Bond\n",
"13 - Carmignac Portfolio Emergents\n",
"14 - Carmignac Portfolio Patrimoine\n",
"15 - Carmignac Portfolio Emerging Patrimoine\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fund_family</th>\n",
" <th>n_clients</th>\n",
" <th>total_family_aum</th>\n",
" <th>avg_family_aum</th>\n",
" <th>total_gross_flow</th>\n",
" <th>total_tx</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Carmignac Patrimoine</td>\n",
" <td>341</td>\n",
" <td>6.415987e+09</td>\n",
" <td>3.604052e+07</td>\n",
" <td>4.524176e+07</td>\n",
" <td>213046.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Carmignac Sécurité</td>\n",
" <td>322</td>\n",
" <td>5.282111e+09</td>\n",
" <td>2.309570e+07</td>\n",
" <td>2.832674e+07</td>\n",
" <td>156020.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Carmignac Credit &lt;NUM&gt;</td>\n",
" <td>194</td>\n",
" <td>4.003308e+09</td>\n",
" <td>8.805209e+06</td>\n",
" <td>4.940239e+07</td>\n",
" <td>55630.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Carmignac Investissement</td>\n",
" <td>321</td>\n",
" <td>3.977901e+09</td>\n",
" <td>1.103948e+07</td>\n",
" <td>5.399207e+06</td>\n",
" <td>103479.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Carmignac Portfolio Sécurité</td>\n",
" <td>253</td>\n",
" <td>2.590833e+09</td>\n",
" <td>7.606207e+06</td>\n",
" <td>1.139422e+08</td>\n",
" <td>63976.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Carmignac Portfolio Flexible Bond</td>\n",
" <td>317</td>\n",
" <td>2.438397e+09</td>\n",
" <td>5.677381e+06</td>\n",
" <td>7.345324e+06</td>\n",
" <td>67493.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Carmignac Portfolio Credit</td>\n",
" <td>266</td>\n",
" <td>2.238925e+09</td>\n",
" <td>4.120852e+06</td>\n",
" <td>2.850970e+07</td>\n",
" <td>43262.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Carmignac Emergents</td>\n",
" <td>317</td>\n",
" <td>1.049202e+09</td>\n",
" <td>2.728114e+06</td>\n",
" <td>2.646927e+06</td>\n",
" <td>62600.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Carmignac Court Terme</td>\n",
" <td>207</td>\n",
" <td>1.002651e+09</td>\n",
" <td>2.141392e+06</td>\n",
" <td>9.506942e+05</td>\n",
" <td>28295.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Carmignac Portfolio Long-Short European Equities</td>\n",
" <td>236</td>\n",
" <td>6.875695e+08</td>\n",
" <td>2.336299e+06</td>\n",
" <td>1.743013e+07</td>\n",
" <td>23017.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Carmignac Portfolio Grande Europe</td>\n",
" <td>299</td>\n",
" <td>5.477161e+08</td>\n",
" <td>1.730144e+06</td>\n",
" <td>1.320832e+07</td>\n",
" <td>40485.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Carmignac Portfolio Global Bond</td>\n",
" <td>280</td>\n",
" <td>5.166370e+08</td>\n",
" <td>2.343729e+06</td>\n",
" <td>9.028186e+06</td>\n",
" <td>48365.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Carmignac Portfolio Emergents</td>\n",
" <td>193</td>\n",
" <td>4.760573e+08</td>\n",
" <td>1.521340e+06</td>\n",
" <td>6.876949e+06</td>\n",
" <td>15278.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Carmignac Portfolio Patrimoine</td>\n",
" <td>216</td>\n",
" <td>4.435119e+08</td>\n",
" <td>2.804551e+06</td>\n",
" <td>1.852971e+07</td>\n",
" <td>31854.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Carmignac Portfolio Emerging Patrimoine</td>\n",
" <td>293</td>\n",
" <td>3.188049e+08</td>\n",
" <td>1.703089e+06</td>\n",
" <td>1.061354e+07</td>\n",
" <td>57784.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fund_family n_clients \\\n",
"0 Carmignac Patrimoine 341 \n",
"1 Carmignac Sécurité 322 \n",
"2 Carmignac Credit <NUM> 194 \n",
"3 Carmignac Investissement 321 \n",
"4 Carmignac Portfolio Sécurité 253 \n",
"5 Carmignac Portfolio Flexible Bond 317 \n",
"6 Carmignac Portfolio Credit 266 \n",
"7 Carmignac Emergents 317 \n",
"8 Carmignac Court Terme 207 \n",
"9 Carmignac Portfolio Long-Short European Equities 236 \n",
"10 Carmignac Portfolio Grande Europe 299 \n",
"11 Carmignac Portfolio Global Bond 280 \n",
"12 Carmignac Portfolio Emergents 193 \n",
"13 Carmignac Portfolio Patrimoine 216 \n",
"14 Carmignac Portfolio Emerging Patrimoine 293 \n",
"\n",
" total_family_aum avg_family_aum total_gross_flow total_tx \n",
"0 6.415987e+09 3.604052e+07 4.524176e+07 213046.0 \n",
"1 5.282111e+09 2.309570e+07 2.832674e+07 156020.0 \n",
"2 4.003308e+09 8.805209e+06 4.940239e+07 55630.0 \n",
"3 3.977901e+09 1.103948e+07 5.399207e+06 103479.0 \n",
"4 2.590833e+09 7.606207e+06 1.139422e+08 63976.0 \n",
"5 2.438397e+09 5.677381e+06 7.345324e+06 67493.0 \n",
"6 2.238925e+09 4.120852e+06 2.850970e+07 43262.0 \n",
"7 1.049202e+09 2.728114e+06 2.646927e+06 62600.0 \n",
"8 1.002651e+09 2.141392e+06 9.506942e+05 28295.0 \n",
"9 6.875695e+08 2.336299e+06 1.743013e+07 23017.0 \n",
"10 5.477161e+08 1.730144e+06 1.320832e+07 40485.0 \n",
"11 5.166370e+08 2.343729e+06 9.028186e+06 48365.0 \n",
"12 4.760573e+08 1.521340e+06 6.876949e+06 15278.0 \n",
"13 4.435119e+08 2.804551e+06 1.852971e+07 31854.0 \n",
"14 3.188049e+08 1.703089e+06 1.061354e+07 57784.0 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 28 - Top families table\n",
"\n",
"top_family_table = (\n",
"    df_family_client_base\n",
"    .groupby(\"fund_family\", as_index=False)\n",
"    .agg(\n",
"        n_clients=(ID_COL, \"nunique\"),\n",
"        total_family_aum=(\"family_aum_val_last\", \"sum\"),\n",
"        avg_family_aum=(\"family_aum_val_mean\", \"mean\"),\n",
"        total_gross_flow=(\"gross_flow_qty_sum\", \"sum\"),\n",
"        total_tx=(\"n_tx_total\", \"sum\")\n",
"    )\n",
"    # Rank by total final AUM; the number of clients only breaks ties\n",
"    .sort_values([\"total_family_aum\", \"n_clients\"], ascending=[False, False])\n",
"    .reset_index(drop=True)\n",
")\n",
"\n",
"top_15_families = top_family_table.head(15)[\"fund_family\"].tolist()\n",
"\n",
"print(\"Top 15 families:\")\n",
"for i, fam in enumerate(top_15_families, 1):\n",
"    print(f\"{i:02d} - {fam}\")\n",
"\n",
"top_family_table.head(15)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3be8b56b",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKUAAAJOCAYAAABm7rQwAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XlcVdX+//HXATGcRyoHVJRERYyDlKkoiZWJqamYszlkZppzgHFTLE1M064gYWSIZFoakFN6r3QzcSozy3LMCXEWU0NQ4cDvD3+crydAgZSj+H4+Hjwe7L3XXuuz1zqQfFprbUN2dnY2IiIiIiIiIiIixcjG2gGIiIiIiIiIiMiDR0kpEREREREREREpdkpKiYiIiIiIiIhIsVNSSkREREREREREip2SUiIiIiIiIiIiUuyUlBIRERERERERkWKnpJSIiIiIiIiIiBQ7JaVERERERERERKTYKSklIiIiIiIiIiLFTkkpEREREXkgffLJJ7Rv357GjRvTtWvXYm/fxcWF0NDQ25Y7f/48o0ePpkWLFri4uLBo0aK7H9xNfHx8CAwMNB9v374dFxcXtm/fbj4XGBiIj49Pscc1fPjwYm3TWv4+BiIiJUUpawcgIiIiItbl4uJSoHKLFy+mRYsWdzWWzz//nG3btvHrr79y6tQpunXrRkhISK5ysbGxTJo0Kc86EhMTcXBwuGU7iYmJzJo1iy5duvDGG29QpUqVOxL/3TBjxgw2bdrEqFGjqF69Ok2bNrV2SA+MM2fO8OWXX/LMM8/QuHHjItWxceNGfv31V9544407HF3+Ll++TOvWrbl+/Tpr166lQYMGucoMGDCAP//8k9WrV+e6duHCBVq2bMmoUaPMcd/8M7dkyRI8PT0t7snOzubpp5/m9OnTPP300yxYsOAuPJmIlDRKSomIiIg84N5//32L46+//prNmzfnOp/XH7Z32ieffMKVK1dwc3Pj3Llzty0/evRoateubXGuYsWKt71v27Zt2NjYMH36dEqXLl3keIvDtm3baN++PUOHDrVK++vWrcNgMNyyzLvvvkt2dnYxRVR8zp49S1hYGLVq1fpHSaklS5YUa1IqZ8wcHBxYuXIl48aNu2N1P/TQQ6xevTpXUuqHH37g9OnT9/zPk4jcW5SUEhEREXnA/X3p2i+//MLmzZutsqQtJiaGmjVrYjAYMBqNty3ftm1b3NzcCt1OSkoK9vb298Uf0CkpKQVKtN0tBekjOzu7YohECmrlypV4e3tTs2ZNVq9efUeTUt7e3qxbt45//etflCr1f39Orl69GldXVy5evHjH2hKRkk97SomIiIjIbaWlpRESEoK3tzdNmzalQ4cOLFy4MNfsGBcXF9555x1WrlxJhw4dcHNzo3v37vz4448FaqdWrVq3nZXzd6mpqZhMpgKXd3FxITY2lrS0NFxcXMzHycnJ5u/zuufm/Z9CQ0NxcXHh2LFjBAYG4unpSfPmzZk0aRLp6ekW916/fp333nuPp556CqPRyGuvvcbp06dvG2dsbCwuLi5kZ2ezZMkSc6wAFy9eZObMmXTu3Bmj0YiHhwevvPIK+/bts6gjZ/+ntWvXEhYWRps2bTAajYwePZq//vqL69evM336dFq2bInRaGTSpElcv37doo6C7GeU155SWVlZLFq0iE6dOuHm5karVq2YPHkyly5dsii3e/duhg4dSosWLWjWrBk+Pj75Ls3MS2JiIl27dsXNzQ1fX1/+85//mK8dP3483324du7ciYuLS57L1+BG3/n5+QEwadIki89Kjm+++Ybu3bvTrFkzWrRowcSJEzlz5oxFvyxZsgTAfP/Ny2UXLlxI7969zc/evXt31q1bV+Bnz8vJkyfZsWMHvr6+dOrUieTkZHbu3PmP6rxZp06duHjxIps3bzafu379OuvXr6dz5853rB0ReTBoppSIiIiI3FJ2djYjRoww/5HeuHFjNm3axPvvv8+ZM2d466
23LMr/+OOPrF27lgEDBlC6dGmWLl3KK6+8wvLly2nYsOEdjW3gwIGkpaVhZ2eHl5cXgYGB1KtX75b3vP/++3z55Zf8+uuvTJs2DQAPD48itT927Fhq167N+PHj2bNnD8uXL6dq1aq8+eab5jJBQUGsXLmSF154AQ8PD7Zt28arr75627qfeOIJ3n//ffz9/WndurXFzLXjx4+zYcMGnn/+eWrXrs358+f54osv6N+/P2vWrOGRRx6xqOvjjz/G3t6eV199lWPHjvHZZ59RqlQpDAYDly9fZtSoUfzyyy/ExsZSq1YtRo0aVaT+uNnkyZOJi4uje/fuDBgwgOTkZJYsWcKePXtYunQpdnZ2pKSkMHToUKpUqcKrr75KxYoVSU5O5r///W+B2jh69Cjjxo2jd+/edOvWja+++ooxY8bwySef0Lp1axwdHfHw8GDlypUMGjTI4t5Vq1ZRrlw52rdvn2fdDRo0YPTo0cybN49evXrRvHlz4P8+Kzl7LLm5uTF+/HhSUlJYvHgxO3fuJD4+nooVK9KrVy/Onj2b53JYuLFPm4+PD507dyYjI4M1a9YwZswYFixYwNNPP13wzr7J6tWrKVOmDO3atcPe3p46deqwatWqIn/G/65WrVq4u7uzZs0avL29Afj+++/566+/8PX1JSYm5o60IyIPBiWlREREROSWEhIS2LZtG2PHjmXEiBEA9OvXj9GjR7N48WL69+9PnTp1zOUPHDjAV199Zd6Qu1OnTjz//PPMmzePsLCwOxKTvb093bt3p0WLFpQvX57ffvuNRYsW0bt3b+Li4qhRo0a+93bt2pWtW7eyZ88ei0RPcnJyoeNo3Lgx7733nvn44sWLrFixwpyU2rdvHytXrqRv375MmTIFuNF3EyZMYP/+/bes29HREUdHR/z9/alXr55FrC4uLqxfvx4bm/9b+NC1a1c6duzIihUrGDlypEVdJpOJmJgY8zK7P//8kzVr1tCmTRsiIyPNcSUlJREbG/uPk1I7duxg+fLlzJ4922L2TIsWLXjllVdYt24dnTt35ueff+bSpUssXLjQYhlmQZebHT16lNDQUJ577jkA/Pz8eP7555k9ezatW7cG4MUXX2Ty5MkcOnTIvC9aRkYG33zzDc899xxlypTJs+7q1avTtm1b5s2bh7u7u0X/Z2RkMHv2bBo2bMiSJUt46KGHAGjevDnDhw9n0aJFjB49GqPRSL169fJdDrt+/Xrs7e3Nx/369aN79+5ERUUVOSm1atUq2rdvb67X19eXL774gqCgIIvldv9E586d+eCDD7h69Sr29vasWrWKJ554IlcyVETkdrR8T0RERERu6fvvv8fW1pYBAwZYnB8yZAjZ2dl8//33FueNRqPFG+Jq1qxJ+/btSUxMLNQyu1vx9fVlxowZvPjiizzzzDOMHTuWTz75hIsXL/LRRx/dkTYKonfv3hbHnp6eXLx4kdTUVODGJtdArr57+eWX/1G7pUuXNiekTCYTf/75J2XLlsXJyYk9e/bkKt+1a1eLfZ+aNWtGdnY2PXr0sCjXrFkzTp06RWZm5j+Kb926dVSoUIHWrVtz4cIF85erqytly5Zl+/btAFSoUAGA7777joyMjEK38/DDD/Pss8+aj8uXL8+LL77Inj17zBvld+zYkYceeohVq1aZyyUmJvLnn3/SpUuXIj3fb7/9RkpKCn369DEnpACefvpp6tevz3fffVegem5OSF26dIm//vqL5s2b5zmGBbFv3z4OHDjACy+8YD7XqVMn/vzzTxITE4tUZ146duzItWvX+N///kdqairfffedlu6JSJFoppSIiIiI3NKJEyd4+OGHKV++vMX5nFknJ06csDhft27dXHXUq1eP9PR0Lly4gIODw12J09PTk8cff5ytW7felfrzUrNmTYvjnA3JL126RPny5Tlx4gQ2NjYWM8kA6tev/4/azcrKYvHixXz++eckJydbJPsqV6582zhzkkF/n1FWoUIFsr
Ky+Ouvv6hSpUqR4zt27Bh//fUXLVu2zPN6SkoKAE8++SQdOnQgLCyMRYsW8eSTT/LMM8/QuXPnAm2wXrdu3Vx7kOU
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKUAAAJOCAYAAABm7rQwAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XlcVdX+//HXATGcEaVyQEVRVMQEKVNRCisTcwRznjNzuM4Bxr2KpYlD2hUljAyRTEsDUjO9V+7NxIEyG8z5ioo4D6khinDg94c/ztcTKIPKkXo/Hw8fD/bea6/12Wudw43PXWttQ05OTg4iIiIiIiIiIiIlyMrSAYiIiIiIiIiIyF+PklIiIiIiIiIiIlLilJQSEREREREREZESp6SUiIiIiIiIiIiUOCWlRERERERERESkxCkpJSIiIiIiIiIiJU5JKRERERERERERKXFKSomIiIiIiIiISIlTUkpEREREREREREqcklIiIiIiUuI++ugjOnToQJMmTejWrVuJt+/i4kJYWFiB5S5evMi4ceNo1aoVLi4uLF++/OEHdwcfHx+CgoJMx0lJSbi4uJCUlGQ6FxQUhI+PT4nHNXLkyBJt81Hh4uLC22+/bekwCiUrK4u5c+fi7e1N48aNGT169H3Vl99nrbDfJRGR/JSxdAAiIiIif2UuLi6FKrdixQpatWr1UGP59NNP2bVrF7/88gtnzpyhR48ehIaG5ikXGxvL1KlT860jMTERBweHe7aTmJjIvHnz6Nq1K3/729+oWrXqA4n/YZg9ezbbtm1j7NixVK9enWbNmlk6JJFC++KLL1i2bBmDBw+madOm1KxZ09IhFcv69eu5dOkSQ4YMsXQoIvKAKSklIiIiYkFz5841O/7yyy/Zvn17nvMNGjR46LF89NFHXL9+HTc3Ny5cuFBg+XHjxlG7dm2zc5UrVy7wvl27dmFlZcWsWbMoW7ZsseMtCbt27aJDhw4MHz7cIu1v2rQJg8FwzzLvvPMOOTk5JRSRlCa7du3iiSee4K233npobfzyyy9YW1s/tPoBNmzYwJEjR5SUEvkTUlJKRERExIL+uHTt559/Zvv27RZZ0hYTE0PNmjUxGAy4u7sXWL59+/a4ubkVuZ1Lly5ha2v7yCek4HashUm0PSyF6SMbG5sSiERKUkZGBjY2NlhZ3d9uKyXx+X3sscceav0i8uemPaVEREREHnHp6emEhobi7e1Ns2bN6NixI8uWLcszOyZ3r5t169bRsWNH3Nzc6NmzJ99//32h2qlVq1aBs3L+KC0tDaPRWOjyLi4uxMbGkp6ejouLi+k4NTXV9HN+99y5Z01YWBguLi6cOHGCoKAgPD09admyJVOnTuXGjRtm9966dYt3332XZ599Fnd3d9544w3Onj1bYJyxsbG4uLiQk5PDypUrTbECXLlyhTlz5tClSxfc3d3x8PDgtdde4+DBg2Z15O7/tHHjRhYvXky7du1wd3dn3Lhx/P7779y6dYtZs2bRunVr3N3dmTp1Krdu3TKr4497SuUnv31+srOzWb58OZ07d8bNzY02bdowbdo0rl69alZu7969DB8+nFatWtG8eXN8fHzuujQzP4mJiXTr1g03Nzd8fX3517/+Zbp28uTJu+7DtWfPHlxcXNiwYcNd676z/z744ANTEnTw4MGcOHHCrOzd+mngwIEMHDgw3zqLOya5CvM9O3fuHFOnTqVNmzY0a9aMzp07s3bt2nyf86uvvmLhwoW0a9eOp556irS0tLv2TUG/E3K/T0lJSRw5csT0+b1zL7L8bN26lQEDBpg+135+fqxfv/6e9+S3p1RRnrug8R04cCDffPMNp06dMj3HnZ/3mJgYOnfuzFNPPcXTTz9Nz549C4xZRB4dmiklIiIi8gjLyclh1KhRJCUl4e/vT5MmTdi2bRtz587l3LlzeZblfP/992zcuJGBAwdStmxZVq1axWuvvcaaNWto1KjRA41t0K
BBpKenY2Njg5eXF0FBQdSrV++e98ydO5fPP/+cX375hZkzZwLg4eFRrPYnTJhA7dq1mTRpEvv372fNmjXY29vz5ptvmsoEBwezbt06XnnlFTw8PNi1axevv/56gXU//fTTzJ07l4CAANq2bWs2c+3kyZNs2bKFl19+mdq1a3Px4kU+++wzBgwYwFdffcUTTzxhVteHH36Ira0tr7/+OidOnOCTTz6hTJkyGAwGrl27xtixY/n555+JjY2lVq1ajB07tlj9cadp06YRFxdHz549GThwIKmpqaxcuZL9+/ezatUqbGxsuHTpEsOHD6dq1aq8/vrrVK5cmdTUVP79738Xqo3jx48zceJE+vTpQ48ePfjiiy8YP348H330EW3btsXR0REPDw/WrVuXZ9nV+vXrqVChAh06dCiwncjISAwGA8OGDSMtLY2PPvqIKVOmsGbNmuJ0DXD/Y1KY79nFixd59dVXMRgM9O/fH3t7e7799luCg4NJS0vL0yfh4eHY2NgwfPhwbt26ddcZcIX5nWBvb8/cuXOJiIggPT2dSZMmAfdeBhwbG8tbb71Fw4YNGTlyJJUqVeLAgQNs27aNLl26FLpvi/rcBY3vG2+8we+//87Zs2dNCdMKFSoA8PnnnzNz5kw6duzIoEGDyMjI4NChQ/z8889FillELEdJKREREZFHWEJCArt27WLChAmMGjUKgP79+zNu3DhWrFjBgAEDqFOnjqn84cOH+eKLL0wbcnfu3JmXX36ZRYsWsXjx4gcSk62tLT179qRVq1ZUrFiRX3/9leXLl9OnTx/i4uKoUaPGXe/t1q0bO3fuZP/+/WaJntTU1CLH0aRJE959913T8ZUrV1i7dq0pKXXw4EHWrVtHv379mD59OnC77yZPnsyhQ4fuWbejoyOOjo4EBARQr149s1hdXFzYvHmz2dKqbt260alTJ9auXcuYMWPM6jIajcTExJiSDL/99htfffUV7dq1IzIy0hRXSkoKsbGx952U2r17N2vWrGH+/Plmf5i3atWK1157jU2bNtGlSxd+/PFHrl69yrJly8yWYU6cOLFQ7Rw/fpywsDBeeuklAPz9/Xn55ZeZP38+bdu2BaB79+5MmzaNo0ePmhIimZmZfP3117z00kuUK1euwHYyMjKIj483LWWsXLkys2bN4vDhw8VOtN7vmBTme7Zw4UKMRiPr1683bebft29fJk2axOLFi+nTpw+2trZmz/nFF1+YnctPYX8ndOvWjbVr1/Lbb78VuBz4999/Z+bMmTRv3pyYmBizJXlF3a+sOM99r/Ft27YtK1as4Nq1a3me45tvvqFhw4YsWrSoSDGKyKNDy/dEREREHmHffvst1tbWZkuQAIYNG0ZOTg7ffvut2Xl3d3ezN8TVrFmTDh06kJiYWKRldvfi6+vL7Nmz6d69Oy+88AITJkzgo48+4sqVK3zwwQcPpI3C6NOnj9mxp6cnV65cMS172rp1K0Cevhs8ePB9tVu2bFlTQspoNPLbb79Rvnx5nJyc2L9/f57y3bp1M5v10rx5c3JycvDz8zMr17x5c86cOUNWVtZ9xbdp0yYqVapE27ZtuXz5sumfq6sr5cuXNy3hqlSpEnD7D/vMzMwit/P444/z4osvmo4rVqxI9+7d2b9/v2mj/E6dOvHYY4+ZLadKTEzkt99+o2vXroVqp2fPnmZ7a3l6egK3Z6wV1/2OSUHfs5ycHP71r3/h4+NDTk6O2Th4eXnx+++/s2/fPrM6u3fvXmBCCor+O6Ewtm/fzvXr13n99dfz7BFVlCW9xXnu+xnfypUrc/bsWX755ZdCxygijxbNlBIRERF5hJ06dYrHH3+cihUrmp3PnXVy6tQps/N169bNU0e9evW4ceMGly9fxsHB4aHE6enpyVNPPcXOnTsfSv35+ePr7XM3dL569SoVK1bk1KlTWFlZmc0kA6hfv/59tZudnc2KFSv49NNPSU1NNUv22dnZFRhnbjLojzPKKlWqRH
Z2Nr///rtphklxnDhxgt9//53WrVvne/3SpUsAPPPMM3Ts2JHFixezfPlynnnmGV544QW6dOlSqA3W69atmydhkbt
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 29 - Visualization of the top 15 families\n",
"\n",
"top15_plot = top_family_table.head(15).copy()\n",
"\n",
"plt.figure(figsize=(12, 6))\n",
"sns.barplot(data=top15_plot, y=\"fund_family\", x=\"total_family_aum\")\n",
"plt.title(\"Top 15 fund families by total AUM\")\n",
"plt.xlabel(\"Total family AUM\")\n",
"plt.ylabel(\"Fund family\")\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"plt.figure(figsize=(12, 6))\n",
"sns.barplot(data=top15_plot, y=\"fund_family\", x=\"n_clients\")\n",
"plt.title(\"Top 15 fund families by number of clients\")\n",
"plt.xlabel(\"Number of clients\")\n",
"plt.ylabel(\"Fund family\")\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
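{
"cell_type": "markdown",
"id": "7c1e4a90",
"metadata": {},
"source": [
"As a minimal sketch of the ranking rule used above (total final AUM first, number of clients as a tie-breaker), the toy example below ranks four hypothetical families and keeps only the rows that belong to the top three. All `toy_` names and values are invented for illustration and do not come from the dataset.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d35f08b2",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch on toy data: every value and every toy_ name is hypothetical\n",
"import pandas as pd\n",
"\n",
"toy_family_table = pd.DataFrame({\n",
"    \"fund_family\": [\"A\", \"B\", \"C\", \"D\"],\n",
"    \"total_family_aum\": [5.0e8, 9.0e8, 5.0e8, 1.0e8],\n",
"    \"n_clients\": [120, 80, 200, 40],\n",
"})\n",
"\n",
"# Families A and C have the same AUM, so n_clients decides their order\n",
"toy_ranked = (\n",
"    toy_family_table\n",
"    .sort_values([\"total_family_aum\", \"n_clients\"], ascending=[False, False])\n",
"    .reset_index(drop=True)\n",
")\n",
"toy_top = toy_ranked.head(3)[\"fund_family\"].tolist()\n",
"\n",
"# Restrict a client-level table to the retained families\n",
"toy_clients = pd.DataFrame({\n",
"    \"account\": [1, 1, 2, 3],\n",
"    \"fund_family\": [\"A\", \"D\", \"B\", \"D\"],\n",
"})\n",
"toy_kept = toy_clients[toy_clients[\"fund_family\"].isin(toy_top)]\n",
"\n",
"print(toy_top)\n",
"print(toy_kept)"
]
},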
{
"cell_type": "markdown",
"id": "9b5257a3",
"metadata": {},
"source": [
"---\n",
"## 6. Part 2 — Additional Performance Features for Family Clustering\n",
"We enrich the `client × family` dataset with two types of performance-sensitive variables computed from the monthly family panel:\n",
"\n",
"1. **Correlation between flow intensity and past family performance**, using rolling average returns over 3 and 6 months.\n",
"2. **Buy-after-good-performance shares**, i.e. the proportion of buy months occurring after positive rolling average returns.\n",
"\n",
"These variables extend the original notebook by explicitly measuring momentum-like behavior at the fund-family level.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3ae47862",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Performance features merged. Shape: (7282, 53)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Registrar Account - ID</th>\n",
" <th>fund_family</th>\n",
" <th>corr_flow_ret_3m</th>\n",
" <th>corr_flow_ret_6m</th>\n",
" <th>buy_after_good_perf_share_3m</th>\n",
" <th>buy_after_good_perf_share_6m</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Absolute Return Europe</td>\n",
" <td>0.051116</td>\n",
" <td>0.031925</td>\n",
" <td>0.666667</td>\n",
" <td>0.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>18872</td>\n",
" <td>Carmignac China New Economy</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Court Terme</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Credit &lt;NUM&gt;</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>18872</td>\n",
" <td>Carmignac Emergents</td>\n",
" <td>-0.062404</td>\n",
" <td>0.018069</td>\n",
" <td>0.142857</td>\n",
" <td>0.428571</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Registrar Account - ID fund_family corr_flow_ret_3m \\\n",
"0 18872 Carmignac Absolute Return Europe 0.051116 \n",
"1 18872 Carmignac China New Economy 0.000000 \n",
"2 18872 Carmignac Court Terme 0.000000 \n",
"3 18872 Carmignac Credit <NUM> 0.000000 \n",
"4 18872 Carmignac Emergents -0.062404 \n",
"\n",
" corr_flow_ret_6m buy_after_good_perf_share_3m \\\n",
"0 0.031925 0.666667 \n",
"1 0.000000 0.000000 \n",
"2 0.000000 0.000000 \n",
"3 0.000000 0.000000 \n",
"4 0.018069 0.142857 \n",
"\n",
" buy_after_good_perf_share_6m \n",
"0 0.666667 \n",
"1 0.000000 \n",
"2 0.000000 \n",
"3 0.000000 \n",
"4 0.428571 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 30 - Additional performance features for family clustering\n",
"\n",
"def safe_corr(x, y, min_obs=6):\n",
"    \"\"\"Pearson correlation of x and y, or NaN when fewer than min_obs paired\n",
"    observations are available or either series is constant.\"\"\"\n",
"    x = pd.Series(x, dtype=float)\n",
"    y = pd.Series(y, dtype=float)\n",
"    mask = x.notna() & y.notna()\n",
"    x = x[mask]\n",
"    y = y[mask]\n",
"    if len(x) < min_obs or x.nunique() <= 1 or y.nunique() <= 1:\n",
"        return np.nan\n",
"    return x.corr(y)\n",
"\n",
"tmp_perf = tmp.copy()\n",
"\n",
"# Past family returns already available in tmp:\n",
"# ret_fund_mean3_lag1 and ret_fund_mean6_lag1\n",
"\n",
"# Months with a missing lagged return compare as False and count as \"not good\"\n",
"tmp_perf[\"buy_flag\"] = (tmp_perf[\"net_flow_qty\"] > 0).astype(int)\n",
"tmp_perf[\"good_perf_3m\"] = (tmp_perf[\"ret_fund_mean3_lag1\"] > 0).astype(int)\n",
"tmp_perf[\"good_perf_6m\"] = (tmp_perf[\"ret_fund_mean6_lag1\"] > 0).astype(int)\n",
"\n",
"perf_rows = []\n",
"for (acc, fam), g in tmp_perf.groupby([ID_COL, \"fund_family\"]):\n",
"    g = g.sort_values(\"month\")\n",
"\n",
"    buy_months = g[\"buy_flag\"].sum()\n",
"\n",
"    perf_rows.append({\n",
"        ID_COL: acc,\n",
"        \"fund_family\": fam,\n",
"        \"corr_flow_ret_3m\": safe_corr(g[\"flow_to_aum_m\"], g[\"ret_fund_mean3_lag1\"], min_obs=6),\n",
"        \"corr_flow_ret_6m\": safe_corr(g[\"flow_to_aum_m\"], g[\"ret_fund_mean6_lag1\"], min_obs=6),\n",
"        # Share of buy months that follow a positive rolling average return\n",
"        \"buy_after_good_perf_share_3m\": (\n",
"            ((g[\"buy_flag\"] == 1) & (g[\"good_perf_3m\"] == 1)).sum() / buy_months\n",
"            if buy_months > 0 else np.nan\n",
"        ),\n",
"        \"buy_after_good_perf_share_6m\": (\n",
"            ((g[\"buy_flag\"] == 1) & (g[\"good_perf_6m\"] == 1)).sum() / buy_months\n",
"            if buy_months > 0 else np.nan\n",
"        ),\n",
"    })\n",
"\n",
"df_family_perf = pd.DataFrame(perf_rows)\n",
"\n",
"# Merge into final family-level table\n",
"df_family_client_base = df_family_client_base.merge(\n",
"    df_family_perf,\n",
"    on=[ID_COL, \"fund_family\"],\n",
"    how=\"left\"\n",
")\n",
"\n",
"# Undefined values (too few observations, constant series, no buy months) are set to 0,\n",
"# i.e. treated as no measurable sensitivity to past performance\n",
"for col in [\"corr_flow_ret_3m\", \"corr_flow_ret_6m\",\n",
"            \"buy_after_good_perf_share_3m\", \"buy_after_good_perf_share_6m\"]:\n",
"    if col in df_family_client_base.columns:\n",
"        df_family_client_base[col] = df_family_client_base[col].fillna(0)\n",
"\n",
"print(\"Performance features merged. Shape:\", df_family_client_base.shape)\n",
"df_family_client_base[\n",
"    [ID_COL, \"fund_family\", \"corr_flow_ret_3m\", \"corr_flow_ret_6m\",\n",
"     \"buy_after_good_perf_share_3m\", \"buy_after_good_perf_share_6m\"]\n",
"].head()"
]
},
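{
"cell_type": "markdown",
"id": "c41d9e07",
"metadata": {},
"source": [
"The two feature types computed above can be checked by hand on a toy panel. The sketch below is illustrative only: the six-month series, the `toy_` names and the `toy_safe_corr` helper (which mirrors the guards of `safe_corr`) are assumptions made for this example and are not part of the pipeline.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7b2a5f4",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch on toy data: every value and every toy_ name is hypothetical\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def toy_safe_corr(x, y, min_obs=6):\n",
"    # Same guards as safe_corr: NaN for too few paired points or a constant series\n",
"    x = pd.Series(x, dtype=float)\n",
"    y = pd.Series(y, dtype=float)\n",
"    mask = x.notna() & y.notna()\n",
"    x, y = x[mask], y[mask]\n",
"    if len(x) < min_obs or x.nunique() <= 1 or y.nunique() <= 1:\n",
"        return np.nan\n",
"    return x.corr(y)\n",
"\n",
"# One client x family pair observed over six months\n",
"toy_panel = pd.DataFrame({\n",
"    \"net_flow_qty\": [10.0, 0.0, -5.0, 20.0, 0.0, 5.0],\n",
"    \"flow_to_aum_m\": [0.10, 0.00, -0.05, 0.20, 0.00, 0.05],\n",
"    \"ret_fund_mean3_lag1\": [0.02, -0.01, -0.02, 0.03, 0.00, -0.01],\n",
"})\n",
"\n",
"toy_buy = toy_panel[\"net_flow_qty\"] > 0\n",
"toy_good = toy_panel[\"ret_fund_mean3_lag1\"] > 0\n",
"\n",
"# Share of buy months that follow a positive 3-month average return\n",
"toy_share = (toy_buy & toy_good).sum() / toy_buy.sum()\n",
"toy_corr = toy_safe_corr(toy_panel[\"flow_to_aum_m\"], toy_panel[\"ret_fund_mean3_lag1\"])\n",
"\n",
"print(\"corr_flow_ret_3m:\", round(toy_corr, 3))\n",
"print(\"buy_after_good_perf_share_3m:\", round(toy_share, 3))\n",
"\n",
"# A client with no flows has a constant series, so the correlation is undefined\n",
"print(\"constant flows:\", toy_safe_corr([0.0] * 6, toy_panel[\"ret_fund_mean3_lag1\"]))"
]
},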
{
"cell_type": "markdown",
"id": "6b781457",
"metadata": {},
"source": [
"---\n",
"## 7. Part 3 — Clustering Features\n",
"To stay close to the original global clustering notebook, we keep a compact set of core behavioral variables:\n",
"\n",
"- activity frequency\n",
"- gross flow intensity relative to average AUM\n",
"- family concentration/importance\n",
"- exit behavior\n",
"- flow direction balance\n",
"- recency of activity\n",
"\n",
"We then add the new **performance-sensitive features** and the **rolling turnover averages**, which are central to the family-fund approach.\n",
"\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "2d120f23",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of clustering features: 15\n",
"['flow_freq', 'gross_flow_to_aum_family', 'avg_n_isin_held', 'exit_rate_per_isin_family', 'flow_direction_balance', 'log_family_aum_val_mean', 'months_since_last_tx_family', 'family_share_of_client_aum_mean', 'turnover_mean3_avg', 'turnover_mean6_avg', 'turnover_mean12_avg', 'corr_flow_ret_3m', 'corr_flow_ret_6m', 'buy_after_good_perf_share_3m', 'buy_after_good_perf_share_6m']\n"
]
}
],
"source": [
"# 31 - Final clustering features\n",
"\n",
"# Derived ratios closer to the initial notebook\n",
"df_family_client_base[\"log_family_aum_val_mean\"] = np.log1p(\n",
"    df_family_client_base[\"family_aum_val_mean\"].clip(lower=0)\n",
")\n",
"\n",
"df_family_client_base[\"gross_flow_to_aum_family\"] = np.where(\n",
"    df_family_client_base[\"family_aum_qty_mean\"].abs() > 1,\n",
"    df_family_client_base[\"gross_flow_qty_sum\"] / df_family_client_base[\"family_aum_qty_mean\"].abs(),\n",
"    np.nan\n",
")\n",
"\n",
"df_family_client_base[\"flow_direction_balance\"] = np.where(\n",
"    df_family_client_base[\"gross_flow_qty_sum\"] > 0,\n",
"    df_family_client_base[\"net_flow_qty_sum\"] / df_family_client_base[\"gross_flow_qty_sum\"],\n",
"    np.nan\n",
")\n",
"\n",
"df_family_client_base[\"exit_rate_per_isin_family\"] = np.where(\n",
"    df_family_client_base[\"avg_n_isin_held\"] > 0,\n",
"    df_family_client_base[\"full_exit_count\"] / df_family_client_base[\"avg_n_isin_held\"],\n",
"    np.nan\n",
")\n",
"\n",
"df_family_client_base[\"family_aum_final_to_peak\"] = np.where(\n",
"    df_family_client_base[\"family_aum_qty_max\"] > 0,\n",
"    df_family_client_base[\"family_aum_qty_last\"] / df_family_client_base[\"family_aum_qty_max\"],\n",
"    np.nan\n",
")\n",
"\n",
"df_family_client_base[\"aum_drawdown_last\"] = df_family_client_base[\"aum_drawdown_last\"].clip(0, 1)\n",
"df_family_client_base[\"family_aum_final_to_peak\"] = df_family_client_base[\"family_aum_final_to_peak\"].clip(0, 1)\n",
"\n",
"# Compact but rich feature list\n",
"family_cluster_features = [\n",
"    \"flow_freq\",\n",
"    \"gross_flow_to_aum_family\",\n",
"    \"avg_n_isin_held\",\n",
"    \"exit_rate_per_isin_family\",\n",
"    \"flow_direction_balance\",\n",
"    \"log_family_aum_val_mean\",\n",
"    \"months_since_last_tx_family\",\n",
"    \"family_share_of_client_aum_mean\",\n",
"    \"turnover_mean3_avg\",\n",
"    \"turnover_mean6_avg\",\n",
"    \"turnover_mean12_avg\",\n",
"    \"corr_flow_ret_3m\",\n",
"    \"corr_flow_ret_6m\",\n",
"    \"buy_after_good_perf_share_3m\",\n",
"    \"buy_after_good_perf_share_6m\",\n",
"]\n",
"\n",
"print(\"Number of clustering features:\", len(family_cluster_features))\n",
"print(family_cluster_features)"
]
},
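{
"cell_type": "markdown",
"id": "c3a91f07",
"metadata": {},
"source": [
"The ratios in the cell above share one guard: the division is kept only where the denominator is meaningful, and the result is `NaN` elsewhere. A minimal sketch of that pattern on toy numbers follows; the `guarded_ratio` helper and the sample arrays are illustrative assumptions, not part of the notebook's pipeline.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def guarded_ratio(num, den, min_den=0.0):\n",
"    # Hypothetical helper: num / den where den > min_den, NaN elsewhere\n",
"    num = np.asarray(num, dtype=float)\n",
"    den = np.asarray(den, dtype=float)\n",
"    out = np.full(num.shape, np.nan)\n",
"    # Divide only at the safe positions; the other entries keep their NaN\n",
"    np.divide(num, den, out=out, where=den > min_den)\n",
"    return out\n",
"\n",
"net = np.array([50.0, -20.0, 0.0])\n",
"gross = np.array([100.0, 40.0, 0.0])\n",
"# The third entry stays NaN because its gross flow is zero\n",
"balance = guarded_ratio(net, gross)\n",
"print(balance)\n",
"```\n",
"\n",
"Unlike `np.where`, which evaluates the division everywhere before masking, this variant never divides by zero, so it does not rely on warnings being silenced."
]
},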
{
"cell_type": "markdown",
"id": "e455f9cf",
"metadata": {},
"source": [
"---\n",
"## 8. Part 4 — Robust Clustering Utilities\n",
"Compared with the first family notebook version, the functions below:\n",
"- apply **stronger filtering of outliers**\n",
"- winsorize a broader set of variables\n",
"- reject values of `K` that create **tiny clusters**\n",
"- therefore avoid the pathological case “one giant cluster + one or two clients alone”\n",
"---"
]
},
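{
"cell_type": "markdown",
"id": "5e7b20d4",
"metadata": {},
"source": [
"To illustrate the winsorization step listed above, here is a minimal sketch of MAD-based clipping. It is only an assumption about what such a helper can look like: the `winsorize_mad` function actually used in this notebook is defined in an earlier cell and may differ, for instance in its scaling constant. The name `winsorize_mad_sketch` is hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def winsorize_mad_sketch(values, n_sigma=3.0):\n",
"    # Hypothetical helper: clip to median +/- n_sigma * (1.4826 * MAD)\n",
"    x = np.asarray(values, dtype=float)\n",
"    med = np.nanmedian(x)\n",
"    mad = np.nanmedian(np.abs(x - med))\n",
"    if mad == 0:\n",
"        # Degenerate case: at least half the values equal the median, leave data as is\n",
"        return x.copy()\n",
"    # 1.4826 makes the MAD comparable to a standard deviation under normality\n",
"    scale = 1.4826 * mad\n",
"    return np.clip(x, med - n_sigma * scale, med + n_sigma * scale)\n",
"\n",
"x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])\n",
"# The extreme value 100 is pulled back toward the bulk of the data\n",
"print(winsorize_mad_sketch(x))\n",
"```\n",
"\n",
"Because both the centre and the spread are medians, a single extreme client barely moves the clipping bounds, which is the reason for preferring this over mean and standard deviation here."
]
},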
{
"cell_type": "code",
"execution_count": 21,
"id": "8b43106b",
"metadata": {},
"outputs": [],
"source": [
"# 32 - Robust clustering helpers\n",
"\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.impute import SimpleImputer\n",
"\n",
"def prepare_family_matrix(df_family, features, winsorize_cols=None):\n",
" X = df_family[features].copy()\n",
"\n",
" # Median imputation\n",
" imputer = SimpleImputer(strategy=\"median\")\n",
" X_imp = pd.DataFrame(imputer.fit_transform(X), columns=features, index=X.index)\n",
"\n",
" # Broad winsorization, inspired by the original notebook\n",
" if winsorize_cols is None:\n",
" winsorize_cols = [\n",
" \"gross_flow_to_aum_family\",\n",
" \"avg_n_isin_held\",\n",
" \"exit_rate_per_isin_family\",\n",
" \"months_since_last_tx_family\",\n",
" \"family_share_of_client_aum_mean\",\n",
" \"turnover_mean3_avg\",\n",
" \"turnover_mean6_avg\",\n",
" \"turnover_mean12_avg\",\n",
" ]\n",
"\n",
" for col in winsorize_cols:\n",
" if col in X_imp.columns:\n",
" X_imp[col] = winsorize_mad(X_imp[col], n_sigma=3)\n",
"\n",
" # Variables bounded by construction\n",
" for col in [\n",
" \"flow_freq\",\n",
" \"family_share_of_client_aum_mean\",\n",
" \"buy_after_good_perf_share_3m\",\n",
" \"buy_after_good_perf_share_6m\",\n",
" \"family_aum_final_to_peak\"\n",
" ]:\n",
" if col in X_imp.columns:\n",
" X_imp[col] = np.clip(X_imp[col], 0, 1)\n",
"\n",
" # Correlations should live in [-1, 1]\n",
" for col in [\"corr_flow_ret_3m\", \"corr_flow_ret_6m\", \"flow_direction_balance\"]:\n",
" if col in X_imp.columns:\n",
" X_imp[col] = np.clip(X_imp[col], -1, 1)\n",
"\n",
" # Same logic as in the original notebook:\n",
" # clip and log-transform gross flow intensity\n",
" if \"gross_flow_to_aum_family\" in X_imp.columns:\n",
" vals = X_imp[\"gross_flow_to_aum_family\"].to_numpy(dtype=float)\n",
" vals = np.clip(vals, 0, np.nanpercentile(vals, 90))\n",
" X_imp[\"gross_flow_to_aum_family\"] = np.log1p(vals)\n",
"\n",
" scaler = RobustScaler()\n",
" X_scaled = scaler.fit_transform(X_imp)\n",
"\n",
" return X_imp, X_scaled, imputer, scaler\n",
"\n",
"\n",
"def evaluate_k_range(X_scaled, k_min=2, k_max=6, min_cluster_size=10, min_cluster_share=0.02, random_state=42):\n",
" n = X_scaled.shape[0]\n",
" k_max = min(k_max, max(2, n - 1))\n",
"\n",
" rows = []\n",
" for k in range(k_min, k_max + 1):\n",
" if k >= n:\n",
" continue\n",
"\n",
" km = KMeans(n_clusters=k, random_state=random_state, n_init=20)\n",
" labels = km.fit_predict(X_scaled)\n",
"\n",
" cluster_sizes = pd.Series(labels).value_counts().sort_index()\n",
" min_size = int(cluster_sizes.min())\n",
" min_size_required = max(min_cluster_size, int(np.ceil(min_cluster_share * n)))\n",
"\n",
" valid = (min_size >= min_size_required)\n",
"\n",
" if valid:\n",
" sil = silhouette_score(X_scaled, labels)\n",
" dbi = davies_bouldin_score(X_scaled, labels)\n",
" else:\n",
" sil = np.nan\n",
" dbi = np.nan\n",
"\n",
" rows.append({\n",
" \"k\": k,\n",
" \"silhouette\": sil,\n",
" \"davies_bouldin\": dbi,\n",
" \"min_cluster_size\": min_size,\n",
" \"min_cluster_required\": min_size_required,\n",
" \"inertia\": km.inertia_,\n",
" \"valid_k\": valid\n",
" })\n",
"\n",
" return pd.DataFrame(rows)\n",
"\n",
"\n",
"def choose_best_k(metrics_df):\n",
" valid = metrics_df[metrics_df[\"valid_k\"]].dropna(subset=[\"silhouette\"]).copy()\n",
" if len(valid) == 0:\n",
" return None\n",
" return int(valid.sort_values([\"silhouette\", \"davies_bouldin\"], ascending=[False, True]).iloc[0][\"k\"])"
]
},
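{
"cell_type": "markdown",
"id": "9d41a6c8",
"metadata": {},
"source": [
"The size rule applied in `evaluate_k_range` can be checked in isolation. A minimal sketch, assuming the labels are already available as an integer array; the `is_valid_partition` helper and the toy labels are illustrative and not part of the notebook.\n",
"\n",
"```python\n",
"import math\n",
"import numpy as np\n",
"\n",
"def is_valid_partition(labels, min_cluster_size=10, min_cluster_share=0.02):\n",
"    # True if the smallest cluster meets both the absolute and the relative size floor\n",
"    labels = np.asarray(labels)\n",
"    n = labels.size\n",
"    _, counts = np.unique(labels, return_counts=True)\n",
"    required = max(min_cluster_size, math.ceil(min_cluster_share * n))\n",
"    return bool(counts.min() >= required)\n",
"\n",
"# 319 clients: a balanced split passes, a partition with a 3-client cluster fails\n",
"balanced = np.array([0] * 160 + [1] * 159)\n",
"lopsided = np.array([0] * 316 + [1] * 3)\n",
"print(is_valid_partition(balanced), is_valid_partition(lopsided))\n",
"```\n",
"\n",
"With the default thresholds the absolute floor of 10 clients dominates until a family exceeds 500 clients, after which the 2% share becomes the binding constraint."
]
},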
{
"cell_type": "code",
"execution_count": 22,
"id": "ceb97cfd",
"metadata": {},
"outputs": [],
"source": [
"# 33 - Family clustering pipeline\n",
"\n",
"def run_family_clustering(\n",
" df_base,\n",
" family_name,\n",
" features,\n",
" k_min=2,\n",
" k_max=6,\n",
" min_clients=40,\n",
" min_cluster_size=10,\n",
" min_cluster_share=0.02,\n",
" random_state=42,\n",
" make_plots=True\n",
"):\n",
" df_family = df_base[df_base[\"fund_family\"] == family_name].copy()\n",
"\n",
" # Quality filters\n",
" df_family = df_family[\n",
" (df_family[\"n_months\"] >= 6) &\n",
" (df_family[\"family_aum_val_mean\"] > 0)\n",
" ].copy()\n",
"\n",
" # Stronger quantile filtering, inspired by the initial notebook\n",
" for col in [\"family_aum_val_mean\", \"gross_flow_qty_sum\", \"n_tx_total\"]:\n",
" if col in df_family.columns and df_family[col].notna().sum() > 0:\n",
" cap = df_family[col].quantile(0.99)\n",
" df_family = df_family[df_family[col] <= cap].copy()\n",
"\n",
" n_clients = df_family[ID_COL].nunique()\n",
" if n_clients < min_clients:\n",
" print(f\"[SKIP] {family_name}: only {n_clients} usable clients.\")\n",
" return None\n",
"\n",
" X_imp, X_scaled, imputer, scaler = prepare_family_matrix(df_family, features)\n",
"\n",
" metrics_df = evaluate_k_range(\n",
" X_scaled,\n",
" k_min=k_min,\n",
" k_max=k_max,\n",
" min_cluster_size=min_cluster_size,\n",
" min_cluster_share=min_cluster_share,\n",
" random_state=random_state\n",
" )\n",
"\n",
" if len(metrics_df) == 0:\n",
" print(f\"[SKIP] {family_name}: no K evaluated.\")\n",
" return None\n",
"\n",
" best_k = choose_best_k(metrics_df)\n",
" if best_k is None:\n",
" print(f\"[SKIP] {family_name}: no valid K with cluster size constraints.\")\n",
" return None\n",
"\n",
" km = KMeans(n_clusters=best_k, random_state=random_state, n_init=30)\n",
" labels = km.fit_predict(X_scaled)\n",
"\n",
" df_family[\"cluster\"] = labels\n",
"\n",
" cluster_sizes = (\n",
" df_family[\"cluster\"]\n",
" .value_counts()\n",
" .sort_index()\n",
" .rename_axis(\"cluster\")\n",
" .reset_index(name=\"n_clients\")\n",
" )\n",
"\n",
" cluster_profile = (\n",
" df_family.groupby(\"cluster\")[features]\n",
" .median()\n",
" .sort_index()\n",
" )\n",
"\n",
" # Churn proxies at family level\n",
" df_family[\"churn_hard\"] = (df_family[\"family_aum_final_to_peak\"] < 0.10).astype(int)\n",
" df_family[\"churn_soft\"] = (\n",
" (df_family[\"family_aum_final_to_peak\"] < 0.40) &\n",
" (df_family[\"aum_drawdown_last\"] > 0.40)\n",
" ).astype(int)\n",
" df_family[\"churn_warning\"] = (\n",
" (df_family[\"flow_direction_balance\"] < 0) &\n",
" (df_family[\"aum_drawdown_last\"] > 0.20)\n",
" ).astype(int)\n",
"\n",
" churn_profile = (\n",
" df_family.groupby(\"cluster\")[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]]\n",
" .mean()\n",
" .sort_index()\n",
" )\n",
"\n",
" # PCA for visualization\n",
" pca = PCA(n_components=2, random_state=random_state)\n",
" coords = pca.fit_transform(X_scaled)\n",
" df_family[\"pca1\"] = coords[:, 0]\n",
" df_family[\"pca2\"] = coords[:, 1]\n",
"\n",
" result = {\n",
" \"family\": family_name,\n",
" \"n_clients\": n_clients,\n",
" \"best_k\": best_k,\n",
" \"metrics_df\": metrics_df,\n",
" \"df_family\": df_family,\n",
" \"cluster_profile\": cluster_profile,\n",
" \"cluster_sizes\": cluster_sizes,\n",
" \"churn_profile\": churn_profile,\n",
" \"features\": features,\n",
" \"pca_explained_var\": pca.explained_variance_ratio_,\n",
" }\n",
"\n",
" if make_plots:\n",
" # Choice of K\n",
" fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
" axes[0].plot(metrics_df[\"k\"], metrics_df[\"silhouette\"], marker=\"o\")\n",
" axes[0].set_title(f\"{family_name} - Silhouette\")\n",
" axes[0].set_xlabel(\"k\")\n",
"\n",
" axes[1].plot(metrics_df[\"k\"], metrics_df[\"davies_bouldin\"], marker=\"o\")\n",
" axes[1].set_title(f\"{family_name} - Davies-Bouldin\")\n",
" axes[1].set_xlabel(\"k\")\n",
"\n",
" axes[2].plot(metrics_df[\"k\"], metrics_df[\"inertia\"], marker=\"o\")\n",
" axes[2].set_title(f\"{family_name} - Inertia\")\n",
" axes[2].set_xlabel(\"k\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # Cluster signatures\n",
" plot_heatmap(\n",
" df_family,\n",
" profile_vars=features,\n",
" cluster_col=\"cluster\",\n",
" title=f\"{family_name} - Cluster signatures\",\n",
" figsize=(18, 5)\n",
" )\n",
"\n",
" # PCA projection\n",
" plt.figure(figsize=(7, 5))\n",
" sns.scatterplot(\n",
" data=df_family,\n",
" x=\"pca1\", y=\"pca2\",\n",
" hue=\"cluster\",\n",
" palette=\"tab10\",\n",
" alpha=0.8\n",
" )\n",
" plt.title(\n",
" f\"{family_name} - PCA projection \"\n",
" f\"(explained var: {result['pca_explained_var'][0]:.2f}, {result['pca_explained_var'][1]:.2f})\"\n",
" )\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # Cluster sizes\n",
" plt.figure(figsize=(6, 4))\n",
" sns.barplot(data=cluster_sizes, x=\"cluster\", y=\"n_clients\")\n",
" plt.title(f\"{family_name} - Cluster sizes\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # Churn by cluster\n",
" churn_long = churn_profile.reset_index().melt(\n",
" id_vars=\"cluster\",\n",
" value_vars=[\"churn_hard\", \"churn_soft\", \"churn_warning\"],\n",
" var_name=\"churn_type\",\n",
" value_name=\"rate\"\n",
" )\n",
"\n",
" plt.figure(figsize=(8, 4))\n",
" sns.barplot(data=churn_long, x=\"cluster\", y=\"rate\", hue=\"churn_type\")\n",
" plt.title(f\"{family_name} - Churn analysis by cluster\")\n",
" plt.ylabel(\"Rate\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" print(f\"\\n=== {family_name} ===\")\n",
" print(f\"Clients after filtering: {n_clients}\")\n",
" print(f\"Chosen K: {best_k}\")\n",
" print(cluster_sizes)\n",
" print(\"\\nChurn profile:\")\n",
" print(churn_profile.round(3))\n",
"\n",
" return result"
]
},
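{
"cell_type": "markdown",
"id": "f08c3b5e",
"metadata": {},
"source": [
"The churn proxies computed inside `run_family_clustering` are plain threshold flags, and their per-cluster mean is the share of flagged clients. A toy example with four made-up clients shows the mechanics; every value below is invented for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Four made-up clients in two clusters (all values invented for illustration)\n",
"toy = pd.DataFrame({\n",
"    \"cluster\": [0, 0, 1, 1],\n",
"    \"family_aum_final_to_peak\": [0.05, 0.80, 0.30, 0.95],\n",
"    \"aum_drawdown_last\": [0.95, 0.10, 0.70, 0.05],\n",
"    \"flow_direction_balance\": [-1.0, 0.2, -0.4, 0.6],\n",
"})\n",
"\n",
"# Same thresholds as in run_family_clustering\n",
"toy[\"churn_hard\"] = (toy[\"family_aum_final_to_peak\"] < 0.10).astype(int)\n",
"toy[\"churn_soft\"] = (\n",
"    (toy[\"family_aum_final_to_peak\"] < 0.40) & (toy[\"aum_drawdown_last\"] > 0.40)\n",
").astype(int)\n",
"toy[\"churn_warning\"] = (\n",
"    (toy[\"flow_direction_balance\"] < 0) & (toy[\"aum_drawdown_last\"] > 0.20)\n",
").astype(int)\n",
"\n",
"# The mean of a 0/1 flag per cluster is the share of flagged clients in that cluster\n",
"rates = toy.groupby(\"cluster\")[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]].mean()\n",
"print(rates)\n",
"```\n",
"\n",
"The three flags overlap: a client flagged `churn_hard` will usually also be flagged `churn_soft`, so the rates should be read side by side rather than summed."
]
},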
{
"cell_type": "code",
"execution_count": 23,
"id": "b1161a03",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABdEAAAGGCAYAAACUkchWAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAA7R1JREFUeJzs3Xd4VNXWx/HvZNIIIYE0SkKHhJIAAUIIBFAERZBXigpIL4Ioil4V8aoogoIKXsVCDaGDKEVRsBd6770TOimQ3jPvHzGjEQJJCJmU3+d5eHTO7NmzzpoD58yaffY2mEwmEyIiIiIiIiIiIiIichMrSwcgIiIiIiIiIiIiIlJUqYguIiIiIiIiIiIiIpIDFdFFRERERERERERERHKgIrqIiIiIiIiIiIiISA5URBcRERERERERERERyYGK6CIiIiIiIiIiIiIiOVARXUREREREREREREQkByqii4iIiIiIiIiIiIjkQEV0EREREREREREREZEcqIguxUb79u0ZO3aspcMoESyZy5L2OY4dO5b27dtn2+bj48Onn35qfvzpp5/i4+NDVFRUYYcnIpJvJe3fa0vSebfo27ZtGz4+Pmzbts3SoRSalStX4uPjw4ULF+7Y9t/HUWnMl4jkn85FBUfXFMXXhQsX8PHxYeXKlZYORfLJ2tIBiGWFhYUxZ84cNm3axLVr17CxscHb25uHH36YXr16YW9vb+kQS5z+/fuzfft282NnZ2eqVq1Knz596NGjB1ZWuf9t6+TJk6xbt47u3bvj5eV1L8IttaKiovjiiy/YuHEjly5domzZsnh6ehIYGMgzzzxD2bJlLR1igUpMTGTOnDm0aNGCwMDAbM/9+eef7N+/n+eee85C0YmUHDrvFj6dd4u+Cxcu8MADD5gfW1tb4+joSM2aNWnRogW9e/emSpUqFozw3li5ciWvvfZatm0uLi7UqVOHYcOG0a5dOwtFJiLFga4pCp+uKYq+rGuKMWPGMHToUIvEsGbNGiIjIxk0aJBF3l/uHRXRS7E//viD0aNHY2try6OPPoq3tzepqans2rWLDz/8kJMnTzJhwgRLh2n2ww8/YDAYLB1GgahUqRL/+c9/ALh+/TqrV6/m9ddf5+zZs7z88su57ufkyZN89tlntGjRIk8nXkvmsjh8jjdu3KBnz57ExcXRs2dPatWqxY0bNzh27BhLly6lT58+5iL6hAkTMJlMFo747iUmJvLZZ58xatSoWxbRFy9erCK6yF3SeddydN4tHp/jI488Qtu2bTGZTERHR3PgwAHmz5/PggULePfdd+nSpcs9e++AgAD279+PjY3NPXuPnDz//PN4eXlhMpmIjIxk1apVDB8+nBkzZnD//fcXejy5Ycl8iYiuKSxJ1xQl43O8l7777jtOnDhxUxHd09OT/fv3Y22tUmxxpU+ulDp//jwvvvgiVapUYf78+Xh4eJif69u3L+fOneOPP/646/cxmUwkJycXyK/gtra2d91HUVGuXDkeffRR8+NevXrRqVMnFi9ezOjRo+/JF5J/fhaWzGVx+By//vprLl26xNKlS2natGm25+Li4rJ9PvryKCK5ofOuZem8Wzw0aNAg2+cEcPHiRYYMGcKrr75K7dq1qVev3j15bysrK+zs7O5J33fStm1b/Pz8zI8fe+wxWrduzXfffVdki+iWzJdIaadrCsvSNYXkJCEhAQcHhxyfNxgMOncWc5oTvZSaM2cOCQkJvPvuu9lOulmqV6/OwIEDzY9XrFjBgAEDCAoKwtfXl86dO7NkyZKbXte+fXtGjBjBhg0b6NGjB40aNWLZsmXmeRPXrl3LZ599Rps2bfD39+f5558nNjaWlJQU3n33XYKCgvD39+e1114jJSXlpr7/Pf/W0aNH6devH40aNaJt27Z88cUXrFix4qb5HbPi2rlzJ4899hh+fn488MADrF69Olt/N27c4P3336dr1674+/vTtGlThg0bxtGjR2
/a1+TkZD799FMeeugh/Pz8CA4OZtSoUYSFheXqM/inMmXK0LhxYxISEoiKiuLixYu8/fbbPPTQQzRq1IjAwECef/75bPu0cuVKRo8eDcCAAQPw8fHJNjdlTp/FrXKZNSfmzp07mThxIi1btqR58+aMGzeOlJQUYmJiGDNmDAEBAQQEBPDBBx/cNPo6ISGByZMn065dO3x9fXnooYcICQm5qV1O771r1y4mTZpEy5YtadKkCc8+++wt5xD/888/efLJJ2nSpAn+/v4MHz6cEydO5DnntxMWFobRaKRJkyY3Pefo6JjtxHerOdFzEhsby9ixY2nevDnNmjXjtddeIzExMVubtLQ0Pv/8czp06ICvry/t27fno48+uunvw7/nXc9yq78nMTExvPvuu+bPpmPHjsyaNYuMjAwg85a3oKAgAD777DPzsfTpp58yduxYFi9ebH7PrD9ZMjIymDdvHl26dMHPz49WrVoxbtw4oqOjc5UTkdJC512dd3XezR9PT08mT55Mamoqs2fPNm/PzbETERFBgwYN+Oyzz27q9/Tp0/j4+LBo0SIg5zm+9+3bx9ChQ2nWrBmNGzemX79+7Nq1K1ubuLg43n33Xdq3b4+vry9BQUEMHjyYQ4cO5WufnZycsLOzu2mkWm4+89vNt5rTtcM/mUwmvvjiC9q2bUvjxo3p37//LT/vW+Wrf//+PPLII5w8eZL+/fvTuHFj2rRpk+1zE5G7p2sKXVPomiJv7kWcY8eOxd/fn7CwMJ566in8/f15+eWX6d+/P3/88QcXL140f65Z9YJbnaOPHj3K2LFjeeCBB/Dz86N169a89tprXL9+/d4mRfJFI9FLqd9//52qVaveNMo2J0uXLqVu3bq0b98ea2trfv/9d8aPH4/JZKJv377Z2p45c4aXXnqJXr168cQTT1CzZk3zc7NmzcLe3p7hw4dz7tw5Fi1ahLW1NQaDgZiYGEaNGsW+fftYuXIlnp6ejBo1KseYrl69ar44GD58OA4ODnz11Vc5/jp67tw5Ro8ezWOPPUb37t1ZsWIFY8eOpWHDhtStWxfI/FX/l19+oVOnTnh5eREREcGXX35Jv379+P7776lYsSIA6enpjBgxgi1bttClSxcGDBhAfHw8mzZt4vjx41SrVi1Xef2nCxcuYDQacXJy4s8//2TPnj106dKFSpUqcfHiRZYuXcqAAQP4/vvvKVOmDAEBAfTv35+FCxfy9NNPU6tWLQBq166dq8/iViZOnIibmxvPPfcc+/bt48svv6RcuXLs2bOHypUr8+KLL7J+/XpCQkLw9vamW7duQOYXrpEjR7Jt2zYee+wx6tevz4YNG/jggw+4evUq//3vf++4/xMnTsTJyYlRo0Zx8eJF5s+fzzvvvMPHH39sbrN69WrGjh1LcHAwL7/8MomJiSxdupQnn3ySVatWFdhccp6enqSnp/PNN9/QvXv3AukT4IUXXsDLy4v//Oc/HD58mK+++goXFxdeeeUVc5s33niDVatW8dBDDzF48GD279/PzJkzOXXqFJ9//nme3zMxMZF+/fpx9epVevfuTeXKldmzZw8fffQR4eHhvP7667i4uPD222/z9ttv07FjRzp27AhkftlOTEzk2rVrbNq0iQ8++OCm/seNG8eqVavo0aMH/fv358KFCyxevJjDhw+zdOlSjdQX+YvOuzrv3orOu7nj7+9PtWrV2Lx5s3lbbo4dNzc3AgICWLdu3U3H9tq1azEajXTq1CnH992yZQtPPfUUvr6+jBo1CoPBwMqVKxk4cCBLliyhUaNGALz11lv8+OOP9OvXj9q1a3Pjxg127drFqVOnaNiw4R33Ly4uzvwlPjIykoULF5KQkMD//d//mdsUxGeeG5988gnTp0+nXbt2tGvXjkOHDjFkyBBSU1Nz9fro6GiGDRtGx44defjhh/nxxx+ZMmUK3t7emuNdpIDomkLXFLeia4o7K+g409LSzD+0v/rqq9jb2+Pu7k5sbC
xXrlwxr3tyu/XUNm/ezPnz5+nRowfu7u6cOHGC5cuXc/LkSZYvX67pc4oak5Q6sbGxJm9vb9PIkSNz/ZrExMSbtg0
"text/plain": [
"<Figure size 1500x400 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABjMAAAHpCAYAAADHxs6UAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XV4FMcbwPHvJSSEKDHiCXEsIQR3CC3aFrfi7kV/FG/x0gLF3V0KBHd3d6d4AoG4C8n9/ggcHEmwEi6Q9/M8+8Duzu69M7eZ27vZmVEolUolQgghhBBCCCGEEEIIIYQQWZSWpgMQQgghhBBCCCGEEEIIIYR4F2nMEEIIIYQQQgghhBBCCCFEliaNGUIIIYQQQgghhBBCCCGEyNKkMUMIIYQQQgghhBBCCCGEEFmaNGYIIYQQQgghhBBCCCGEECJLk8YMIYQQQgghhBBCCCGEEEJkadKYIYQQQgghhBBCCCGEEEKILE0aM4QQQgghhBBCCCGEEEIIkaVJY4YQQgghhBBCCCGEEEIIIbI0acwQQgghhMgi/Pz8GDBggKbD+CZosiy/xfdxwIAB+Pn5aTqML+rx48d4enqyfv16TYcihBBCCCGEAHJoOgAhhBBCiMz08OFD5s2bx9GjR3n27Bk6Ojp4eHhQo0YNGjdujJ6enqZD/Oa0aNGCU6dOqdZNTExwcHCgadOm1KtXDy2tD3+e5s6dO2zfvp26detib2+fGeFma9HR0SxatIhdu3bx6NEjkpOTcXR0pGLFirRs2RIrK6svEsfy5cvJlSsX9erV+yKv9zU6d+4cR48epVWrVhgbG2s6HCGEEEIIIb44acwQQgghxDfrwIED9OzZE11dXWrXro2HhwdJSUmcPXuWv/76izt37jBy5EhNh6myY8cOFAqFpsP4LKytrenTpw8AYWFh+Pv7M3jwYO7fv0+/fv0++Dx37txh2rRplChR4qMaMzRZll/L+/jo0SNat27NkydPqF69Oo0bN0ZHR4ebN2/yzz//sGfPHnbu3PlFYlm5ciWmpqZZqjHDzs6OS5cukSNH1vjKdP78eaZNm0bdunWlMUMIIYQQQmRLWePOXAghhBDiM3v06BG9e/fG1taWxYsXkydPHtW+Zs2a8eDBAw4cOPCfX0epVJKQkPBZenjo6ur+53NkFUZGRtSuXVu13rhxY6pXr87y5cvp2bMnOjo6n/0133wvNFmWX8P7+OLFC7p3705ISAhLliyhWLFiavt79+7N3LlzNRTd5/HixQtSUlI++f1QKBTkzJnzM0eV9cTGxqKvr6/pMIQQQgghhHgvmTNDCCGEEN+kefPmERsby+jRo9UaMl5xcnKiVatWqvV169bRsmVLSpcuTaFChahZsyYrVqxIc5yfnx+dOnXi8OHD1KtXD29vb1atWsXJkyfx9PRk27ZtTJs2jfLly1OkSBF++eUXoqKiSExMZPTo0ZQuXZoiRYowcOBAEhMT05z77bkWbty4QfPmzfH29qZChQrMmDGDdevW4enpyePHj9PEdebMGRo0aICXlxdVqlTB399f7Xzh4eGMGzeOH3/8kSJFiuDr60v79u25ceNGmrwmJCQwdepUqlWrhpeXF+XKlaN79+48fPjwg96DN+XKlYvChQsTGxtLaGgoAQEB/P7771SrVg1vb29KlizJL7/8opan9evX07NnTwBatmyJp6cnnp6enDx58p3vRXpluX79ejw9PTlz5gyjRo2iVKlSFCtWjGHDhpGYmEhkZCT9+/enePHiFC9enD///BOlUqmWh9jYWP744w8qVqxIoUKFqFatGvPnz0+TLqPXPnv2LGPHjqVUqVL4+PjQrVs3QkND05TVwYMH+fnnn/Hx8aFIkSJ07NiR27dvf3SZv8uuXbu4ceMGnTt3TtOQAWBoaEjv3r0zPP7V9f7qvXglvXkmnj9/zsCBA6lQoQKFChWiXLlydOnSRfVe+/n5cfv2bU6dOqV6j1u0aK
E6PjIyktGjR6vK/fvvv2fOnDmkpKSked358+ezaNEivvvuO7y8vPj3338zzMPRo0dp2rQpxYoVo0iRIlSrVo2JEye+My8A27dvp2bNmnh5efHDDz+we/fuNHOKvBnP6tWr+e677yhUqBD169fn0qVLaue7ceMGAwYMoEqVKnh5eVG2bFkGDhxIWFiYKs3UqVP5888/AahSpYqqnB4/fvzOuT08PT2ZOnWq2nk8PT25c+cOffv2pXjx4vz888+q/Rs3blT9LZUoUYLevXvz5MkTtXPev3+fHj16ULZsWby8vKhQoQK9e/cmKioqw7IWQgghhBDic5CeGUIIIYT4Ju3fvx8HBwd8fX0/KP3KlStxd3fHz8+PHDlysH//foYPH45SqaRZs2Zqae/du0ffvn1p3LgxjRo1wtnZWbVvzpw56Onp0bFjRx48eMCyZcvIkSMHCoWCyMhIunfvzsWLF1m/fj12dnZ07949w5iCgoJUDS4dO3ZEX1+ftWvXZvik+YMHD+jZsycNGjSgbt26rFu3jgEDBlCwYEHc3d2B1B4re/bsoXr16tjb2xMcHMzq1atp3rw5W7duVc2RkJycTKdOnTh+/Di1atWiZcuWxMTEcPToUW7duoWjo+MHleubHj9+jLa2NsbGxhw8eJDz589Tq1YtrK2tCQgIYOXKlbRs2ZKtW7eSK1cuihcvTosWLVi6dCmdO3fGxcUFAFdX1w96L9IzatQoLCws6NGjBxcvXmT16tUYGRlx/vx5bGxs6N27N4cOHWL+/Pl4eHhQp04dILXXR5cuXTh58iQNGjQgf/78HD58mD///JOgoCAGDRr03vyPGjUKY2NjunfvTkBAAIsXL2bEiBFMmjRJlcbf358BAwZQrlw5+vXrR1xcHCtXruTnn39mw4YNn23ekL179wKo9Z7JLD169ODOnTs0b94cOzs7QkNDOXr0KE+ePMHe3p5BgwYxcuRI9PX16dy5MwAWFhYAxMXF0bx5c4KCgmjSpAk2NjacP3+eiRMn8vz5cwYPHqz2WuvXrychIYFGjRqhq6uLiYlJujHdvn2bTp064enpyS+//IKuri4PHjzg3Llz78zLgQMH6N27Nx4eHvTt25eIiAgGDx6c4dwiW7ZsISYmhsaNG6NQKJg3bx49evRgz549qt5Jx44d49GjR9SrVw9LS0tu377NmjVruHPnDmvWrEGhUPD9999z//59tmzZwsCBAzE1NQXAzMws3Qax9+nZsydOTk707t1b1Rg3c+ZMJk+eTI0aNWjQoAGhoaEsW7aMZs2a4e/vj7GxMYmJibRr147ExESaN2+OhYUFQUFBHDhwgMjISIyMjD46FiGEEEIIIT6YUgghhBDiGxMVFaX08PBQdunS5YOPiYuLS7Otbdu2yipVqqhtq1y5stLDw0N56NAhte0nTpxQenh4KH/44QdlYmKianufPn2Unp6eyvbt26ulb9y4sbJy5cppzv3rr7+q1keOHKn09PRUXrt2TbUtLCxMWaJECaWHh4fy0aNHaeI6ffq0altISIiyUKFCyj/++EO1LSEhQZmcnKz2uo8ePVIWKlRIOW3aNNW2f/75R+nh4aFcuHBhmnJJSUlJs+1NzZs3V1avXl0ZEhKiDAkJUd65c0c5cuRIpYeHh7JTp05KpTL98j5//rzSw8NDuWHDBtW27du3Kz08PJQnTpxIkz6j9+LVvjfLct26dUoPDw9l27Zt1eJv3Lix0tPTUzls2DDVthcvXigrVKigbN68uWrb7t27lR4eHsoZM2aovU6PHj2Unp6eygcPHrz3tVu3bq322mPGjFHmz59fGRkZqVQqlcro6GhlsWLFlEOGDFF7jefPnyuLFi2aZvt/UadOHWXRokU/OP2vv/6qdr2+ut7ffl8ePXqk9PDwUK5bt06pVCqVERERSg8PD+W8efPeef5atWqplfcr06dPV/r4+Cjv3buntn38+PHK/PnzKwMDA9Ve19fXVxkSEvLe/CxcuF
Dp4eHxzrRv50WpVCp/+OEHZYUKFZTR0dGqbSdPnlR6eHiolc+rY0uUKKEMDw9Xbd+zZ4/Sw8NDuW/fPtW29P4Wtmz
"text/plain": [
"<Figure size 1800x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArIAAAHqCAYAAAD4TK2HAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAyzVJREFUeJzs3Xd8W9X5+PHP1V6W915xhp3lLLJJGAkjJGGFDQHaMvuFlh90AKWlbFpoS1ugFAplEyDMpGSwdyCQvbcdr3gP7Xl/fxibKLYTZ8i2kuf9ffHqN+dcSY90JfnRuc85R1FVVUUIIYQQQogYo+ntAIQQQgghhDgUksgKIYQQQoiYJImsEEIIIYSISZLICiGEEEKImCSJrBBCCCGEiEmSyAohhBBCiJgkiawQQgghhIhJksgKIYQQQoiYJImsEEIIIYSISZLIikMybdo0brvttt4O46jQm6+lnMfYcfnll3P55Zf3+ON+++23FBUV8e233/b4YwOEw2Fmz57NE0880SuPv6+ioiIeffTRg75db7+Ohxq3OPrMmzePk046Cb/f39uhHBGSyPYhu3fv5s4772T69OkUFxczZswYLr74Yp5//nm8Xm9vh3dUuvzyyykqKmr/b/z48Zx33nm88cYbhMPhg7qv7du38+ijj1JeXh6laI9N5eXlEedoyJAhnHTSSdxwww1s2rSpw/E+n4/nnnuOCy64gOOOO47i4mJOP/107rnnHnbt2tXpYzz00EMUFRXx//7f/4vys+m7Xn75Zd56663eDqOD//3vf1RVVTF37tzeDkX0opaWFv7whz8wceJERo0axeWXX86GDRu6fftFixZx4YUXMnbsWCZMmMDcuXP59NNP93ubBQsWUFRUxOjRow8zepg/fz5nnHEGxcXFnHbaabz44ovdvq3f7+fhhx9mypQpjBgxggsuuICvvvqqy2P//e9/M2PGDIqLi5k8eTLXXnste/bsaT9mzpw5BAIBXn311cN+Xn2BrrcDEK0+/fRTbrrpJgwGA2effTaFhYUEAgFWrFjBww8/zPbt27n33nt7O8x2S5YsQVGU3g7jiMjIyOCWW24BoLGxkXfeeYc77riDkpISfv3rX3f7frZv385jjz3G+PHjycnJ6fbtevO1jKXzOHv2bE444QTC4TA7duxg3rx5fP7557z++usMGTIEgIaGBq6++mo2bNjAySefzOzZs7FYLOzatYtFixbx+uuvs379+oj7VVWV9957j+zsbD755BOcTic2m603nuJ+PfPMM1G9/3nz5pGYmMicOXMi2seNG8fatWvR6/VRffyuPPPMM8yaNYu4uLheefwjpbdfx1gWDoe59tpr2bJlC1dddRWJiYm88sorXH755bz11lv069dvv7d/8cUXue+++zjppJP41a9+hc/n4+233+a6667j0Ucf5bTTTutwG5fLxcMPP4zFYjns+F999VX++Mc/cvrpp/PTn/6U77//nvvuuw+Px8O11157wNvfdtttLF26lCuuuIJ+/frx9ttvc+211/L8888zduzY9uMCgQDXXXcdq1at4oILLqCoqIiWlhbWrFmDw+EgIyMDAKPRyDnnnMNzzz3H5ZdfHjN/A7qkil63e/duddSoUeqMGTPU6urqDv0lJSXqc889d9iPEw6HVY/Hc9j3czSZO3euOmvWrIg2t9utnnDCCeqoUaNUv9/f7ftavHixWlhYqH7zzTcHPFbORfeVlZWphYWF6tNPPx3R/tFHH6mFhYXqH/7wh/a2a6+9Vh08eLC6ZMmSDvfj8/nUP/3pTx3aly1bphYWFqrLli1Thw0bpr711ltH/kl0wuVy9cjjdNesWbPUuXPn9nYYETZs2KAWFhaqX3/9dW+H0q6wsFD95z//2dthHLS+ErfX61VDodBB3ea9995TCwsL1cWLF7e31dfXq2PHjlVvueWWA97+tNNOU8877zw1HA63tzkcDnXUqFHq9ddf3+ltHn
74YfX0009Xf/WrX6mjRo06qHj35vF41PHjx6vXXnttRHvb/TY1Ne339mvWrOnw/ef1etVTTjlFveiiiyKOfeqpp9Rhw4apa9asOWBc69at63OfrUMlpQV9wNNPP43b7eb+++8nLS2tQ39+fj5XXnll+7/ffPNNrrjiCiZNmsTw4cOZOXMmr7zySofbTZs2jeuuu44vvviCOXPmMGLECF599dX2Wq1Fixbx2GOPMXXqVEaPHs0vf/lLHA4Hfr+f+++/n0mTJjF69Ghuv/32DrU0ndVWbt68mblz5zJixAhOOOEE/vWvf/Hmm29SVFQUcbm9La7vv/+e888/n+LiYqZPn84777wTcX9NTU38+c9/5swzz2T06NGMGTOGq6++ms2bN3d4rj6fj0cffZTTTz+d4uJipkyZwo033sju3bu7dQ72ZjabGTlyJG63m4aGBioqKrjrrrs4/fTTGTFiBBMmTOCXv/xlxHN66623uOmmmwC44oor2i+Dt9XDdXUuOnst33rrLYqKitp/tU+cOJGxY8dy55134vf7aWlp4be//S3jxo1j3LhxPPTQQ6iqGvEc3G43f/rTnzjxxBMZPnw4p59+Os8880yH47p67BUrVvDggw+2X8a74YYbaGho6PBaffbZZ1x66aWMGjWK0aNHc+2117Jt27aDfs0PxcSJEwHaz8OaNWv49NNPOf/88zn99NM7HG8wGLj11ls7tC9cuJCBAwcyceJEJk2axMKFC7sdQ1FREffccw8LFixof+/NmTOH7777LuK4Rx99lKKiIrZv386vfvUrxo0bx6WXXgpAMBjk8ccf55RTTmH48OFMmzaNv/3tbx0+c53VyPr9fv75z39y6qmnMnz4cE488UQeeuihTmvf3n33Xc4//3xGjhzJuHHjuOyyy/jyyy+B1vfBtm3bWL58eft7t+2xuqrtXLx4cft7ecKECfz617+muro64pjbbruN0aNHU11dzf/93/8xevRoJk6cyJ///GdCodABX98PP/wQvV4fMerUprq6mttvv53JkyczfPhwZs2axRtvvNHe7/V6mTFjBjNmzIgozWpqamLKlClcfPHF7TG0xVlWVsZVV13FqFGjmDJlCo899liHz8y+uvP90NXrePnllzN79my2b9/O5ZdfzsiRI5k6dSr/+c9/OjxOd8+13+/ngQceYOLEiYwePZrrr78+4rJyV+rq6hg6dCiPPfZYh76dO3dSVFTESy+91P4adue7ue05v/feezzyyCNMnTqVkSNH4nQ6CQQC7Nixg5qamgPGtnTpUlJSUiJGTpOSkjjjjDP46KOPDljr6XQ6SU5Ojhh5tNlsWK1WTCZTh+NLSkp47rnnuP3229HpDu/C9bfffktTU1P7573NZZddhtvtPmB5w5IlS9BqtVx00UXtbUajkfPPP59Vq1ZRVVUFtI5av/DCC5xyyimMGDGCYDCIx+Pp8n6HDx9OQkICH3300aE/uT5CSgv6gE8++YTc3FzGjBnTrePnzZvHoEGDmDZtGjqdjk8++YS7774bVVW57LLLIo7dtWsXv/rVr7jooou48MILKSgoaO976qmnMJlMXHvttZSWlvLSSy+h0+lQFIWWlhZuvPFG1qxZw1tvvUV2djY33nhjlzFVV1e3J9vXXnstFouF+fPnYzAYOj2+tLSUm266ifPPP59zzz2XN998k9tuu41hw4YxaNAgAMrKyvjwww+ZMWMGOTk51NXV8dprrzF37lzee+890tPTAQiFQlx33XUsW7aMWbNmccUVV+Byufjqq6/YunUreXl53Xpd91ZeXo5Wq8Vut/PZZ5+xatUqZs2aRUZGBhUVFcybN48rrriC9957D7PZzLhx47j88st58cUXuf766+nfvz8AAwYM6Na56Mx9991HSkoKv/jFL1izZg2vvfYacXFxrFq1iszMTG6++WY+//xznnnmGQoLCznnnHOA1kvlP//5z/n22285//zzGTJkCF988QUPPfQQ1d
XV/O53vzvg87/vvvuw2+3ceOONVFRU8Pzzz3PPPffw97//vf2Yd955h9tuu40pU6bw61//Go/Hw7x587j00kt5++2
"text/plain": [
"<Figure size 700x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAk4AAAGGCAYAAACNCg6xAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAPxxJREFUeJzt3XlcVGX///E3jKCYiIq4LynGqAEKrhDmnqZZLpUtZpaWWi5Z5pZpKAqaWpnmnoq5lGHemZVlpXl3k2k3bqmkuS+3Ai4goOAwvz/6Od8mxA44OgO+no+Hj5rrXHPO51xzgPecc80ZN6vVahUAAAD+kbuzCwAAACgsCE4AAAAGEZwAAAAMIjgBAAAYRHACAAAwiOAEAABgEMEJAADAIIITAACAQQQnAAAAgwhOwC3Upk0bjRo1ytllFAnOHMui+DqOGjVKbdq0cXYZt9WJEydkNpu1Zs0aZ5eCQqyYswsA8uvYsWNauHChfvrpJ509e1YeHh4KCAjQgw8+qJ49e6pEiRLOLrHIeeaZZ/TLL7/YHvv4+Kh69ep68skn1b17d7m7G38PdvDgQX311Vfq1q2bqlWrdivKvaNdunRJS5Ys0TfffKPjx4/LYrGoRo0aatmypXr37q2KFSveljqWL18uLy8vde/e/bZsD7hdCE4oVDZt2qShQ4fK09NTjzzyiAICApSdna1ff/1Vb7/9tg4ePKiJEyc6u0ybr7/+Wm5ubs4uwyEqVaqkV199VZJ0/vx5rV27Vm+88YaOHDmi4cOHG17PwYMHNWvWLDVt2jRfwcmZY1lYXsfjx4+rT58+On36tDp27KiePXvKw8NDiYmJ+vTTT7Vx40Zt2LDhttSycuVKlS1b1qWCU9WqVbVr1y4VK8afPhQcRw8KjePHj2vYsGGqUqWKli5dqgoVKtiWPf300zp69Kg2bdp009uxWq26cuWKQ85ceXp63vQ6XIW3t7ceeeQR2+OePXuqY8eOWr58uYYOHSoPDw+Hb/Ovr4Uzx7IwvI5Xr17VoEGDlJKSotjYWDVu3Nhu+bBhw7RgwQInVecYV69eVU5OToFfDzc3NxUvXtzBVeFOwxwnFBoLFy5URkaGJk2aZBearqlZs6aeffZZ2+O4uDj17t1bYWFhCgwMVKdOnbRixYpcz2vTpo369++vLVu2qHv37goODtaqVau0detWmc1mffnll5o1a5ZatGihkJAQDRkyRGlpacrKytKkSZMUFhamkJAQjR49WllZWbnW/fe5Mfv371evXr0UHBys+++/Xx988IHi4uJkNpt14sSJXHVt375djz76qIKCgtS2bVutXbvWbn0XLlzQlClT1KVLF4WEhCg0NFT9+vXT/v37c+3rlStX9P7776tDhw4KCgpSRESEBg0apGPHjhl6Df7Ky8tLDRo0UEZGhs6dO6eTJ0/qrbfeUocOHRQcHKxmzZppyJAhdvu0Zs0aDR06VJLUu3dvmc1mmc1mbd269YavxfXGcs2aNTKbzdq+fbuioqLUvHlzNW7cWOPGjVNWVpZSU1M1YsQINWnSRE2aNNHUqVNltVrt9iEjI0MxMTFq2bKlAgMD1aFDBy1atChXv7y2/euvvyo6OlrNmzdXw4YN9fLLL+vcuXO5xmrz5s166qmn1LBhQ4WEhOjFF1/UgQMH8j3mN/LNN99o//79GjBgQK7QJEmlSpXSsGHD8nz+teP92mtxzfXmBSUlJWn06NG6//77FRgYqIiICA0cOND2Wrdp00YHDhzQL7/8YnuNn3nmGdvzU1NTNWnSJNu4t2/fXvPnz1dOTk6u7S5atEhLlixRu3btFBQUpD/++CPPffjpp5/05JNPqnHjxgoJCVGHDh00Y8aMPPfl2j5f79/f538ZeQ3/aVxQNHDGCYXGDz/8oOrVqys0NNRQ/5UrV+qee+5RmzZtVKxYMf3www+KjIyU1WrV008/bdf38OHDeu2119SzZ089/vjjqlWrlm3Z/P
nzVaJECb344os6evSoPvroIxUrVkxubm5KTU3VoEGDtHPnTq1Zs0ZVq1bVoEGD8qzpzJkztnD34osvqmTJklq9enWe76CPHj2qoUOH6tFHH1W3bt0UFxenUaNG6d5779U999wj6c8zcRs3blTHjh1VrVo1JScn6+OPP1avXr20fv1625wWi8Wi/v37Kz4+Xp07d1bv3r2Vnp6un376Sb///rtq1KhhaFz/6sSJEzKZTCpdurQ2b96shIQEde7cWZUqVdLJkye1cuVK9e7dW+vXr5eXl5eaNGmiZ555RsuWLdOAAQNUu3ZtSZK/v7+h1+J6oqKiVL58eQ0ePFg7d+7Uxx9/LG9vbyUkJKhy5coaNmyYfvzxRy1atEgBAQHq2rWrpD/PZg0cOFBbt27Vo48+qnr16mnLli2aOnWqzpw5ozFjxvzj/kdFRal06dIaNGiQTp48qaVLl2rChAl69913bX3Wrl2rUaNGKSIiQsOHD1dmZqZWrlypp556Sp999pnD5nl99913kmR3VvBWGTx4sA4ePKhevXqpatWqOnfunH766SedPn1a1apV05gxYzRx4kSVLFlSAwYMkCSVL19ekpSZmalevXrpzJkzeuKJJ1S5cmUlJCRoxowZSkpK0htvvGG3rTVr1ujKlSt6/PHH5enpKR8fn+vWdODAAfXv319ms1lDhgyRp6enjh49qv/+97957oe/v7+mTp1q15aWlqaYmBiVK1fO1mb0NfyncUERYQUKgbS0NGtAQIB14MCBhp+TmZmZq+3555+3tm3b1q6tdevW1oCAAOuPP/5o1/7zzz9bAwICrA899JA1KyvL1v7qq69azWaztV+/fnb9e/bsaW3dunWudY8cOdL2eOLEiVaz2Wzdu3evre38+fPWpk2bWgMCAqzHjx/PVde2bdtsbSkpKdbAwEBrTEyMre3KlStWi8Vit93jx49bAwMDrbNmzbK1ffrpp9aAgADr4sWLc41LTk5Orra/6tWrl7Vjx47WlJQUa0pKivXgwYPWiRMnWgMCAqz9+/e3Wq3XH++EhARrQECA9bPPPrO1ffXVV9aAgADrzz//nKt/Xq/FtWV/Hcu4uDhrQECA9fnnn7erv2fPnlaz2WwdN26cre3q1avW+++/39qrVy9b27fffmsNCAiwfvDBB3bbGTx4sNVsNluPHj36j9vu06eP3bYnT55srVevnjU1NdVqtVqtly5dsjZu3Ng6duxYu20kJSVZGzVqlKv9ZnTt2tXaqFEjw/1Hjhxpd7xeO97//rocP37cGhAQYI2Li7NarVbrxYsXrQEBAdaFCxfecP2dO3e2G+9rZs+ebW3YsKH18OHDdu3Tpk2z1qtXz3rq1Cm77YaGhlpTUlL+cX8WL15sDQgIuGHfv+/L3+Xk5Fj79+9vbdiwofXAgQNWq9X4a2h0XFD4cakOhcKlS5ckSXfddZfh5/x1jlJaWprOnTunpk2b6vjx40pLS7PrW61aNbVo0eK663nkkUfs5u8EBwfLarWqR48edv2Cg4N1+vRpXb16Nc+atmzZooYNG6pevXq2tjJlyqhLly7X7V+nTh27yy7lypVTrVq1dPz4cVubp6en7VNtFotF58+fV8mSJVWrVi3t3bvX1u+bb75R2bJl1atXr1zbMTLx+dChQwoLC1NYWJg6deqkjz76SK1atdLkyZMl2Y93dna2zp8/rxo1aqh06dJ2dfyTG70W1/Poo4/a1X/t9Xn00UdtbSaTSYGBgXbj9uOPP8pkMtldQpKk559/XlarVT/++OM/bvvxxx+323bjxo1lsVh08uRJSdJ//vMfpaamqnPnzjp37pztn7u7uxo0aJDrstjNuHTpUr5+PgqqRIkS8vDw0C+//KKLFy/m+/lff/21GjVqpNKlS9uNSXh4uCwWi7Zt22bX/4EHHrA7+5OX0qVLS/rzzNtfL/nlx+zZs/XDDz8oJiZGderUkWT8NbzZcUHhwaU6FAqlSpWSJKWnpxt+zq
+//qr3339fO3bsUGZmpt2ytLQ0eXt72x7f6DR6lSpV7B5fe17lypVztefk5CgtLU1ly5a97rpOnjyphg0b5mrP6zL
"text/plain": [
"<Figure size 600x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAGGCAYAAADmRxfNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAWvBJREFUeJzt3XlcVGX///E3jKzhCrim4sa4gALuZrnmllbummbu5dptdZum3xQ3NDMLLTM1FTMTcynXyswyQ9PSNDOt3FBJBNxBgWF+f/hjbkdAwQEG8PV8PHzknDnnXJ+Z64zNe851neNgNpvNAgAAAAAbONq7AAAAAAD5H8ECAAAAgM0IFgAAAABsRrAAAAAAYDOCBQAAAACbESwAAAAA2IxgAQAAAMBmBAsAAAAANiNYAAAAALAZwQJAGi1bttS4cePsXUaBYM/3siD24/PPP6+OHTvau4w8b968eTIajTmy771798poNGrv3r027ysv9WdB/LwAua2QvQsAHhZnzpzR4sWLtXv3bkVHR8vJyUm+vr5q3769evbsKVdXV3uXWOA8//zz+vnnny2PixYtqvLly6t3797q0qWLHB0z/9vK33//ra1bt6pz58569NFHc6Lch9r169e1bNkyff3114qMjJTJZFKFChXUrFkz9evXT6VKlbJ3icADS0hI0OLFi9WgQQM1bNjQ3uUAOYZgAeSCnTt36uWXX5azs7OeeeYZ+fr6KikpSb/88otmz56tv//+W1OnTrV3mRbbtm2Tg4ODvcvIFqVLl9Yrr7wiSbp06ZI2bNigCRMm6NSpU3rttdcyvZ+///5b8+fPV4MGDbIULOz5XuaXfoyMjFT//v0VFRWldu3aqWfPnnJyctKxY8f0+eefa/v27frqq6/sXSb+v/r16+vQoUNycnKydyn5RkJCgubPn6+RI0cSLFCgESyAHBYZGakxY8aobNmyWr58uUqWLGl5rk+fPjp9+rR27txpcztms1m3bt3KljMfzs7ONu8jryhcuLCeeeYZy+OePXuqXbt2WrlypV5++eUc+XJ0Z1/Y873MD/2YnJyskSNHKjY2VmFhYapXr57V82PGjNGiRYtyva6EhAS5ubnlerv5gaOjo1xcXOxdBiTFx8fL3d3d3mUAFsyxAHLY4sWLFR8fr+nTp1uFilQVK1bUCy+8YHm8du1a9evXT40bN5afn586dOigTz/9NM12LVu21Isvvqhdu3apS5cuql27tj777DPL+OctW7Zo/vz5evzxxxUYGKjRo0fr2rVrSkxM1PTp09W4cWMFBgZq/PjxSkxMTLPvu8ca//nnn+rbt69q166tJ554Qh988IHWrl0ro9Gos2fPpqlr//796tatm/z9/dWqVStt2LDBan+XL1/WrFmz1KlTJwUGBiooKEiDBw/Wn3/+mea13rp1S/PmzVPbtm3l7++vpk2bauTIkTpz5kym+uBObm5uqlOnjuLj4xUXF6dz585p8uTJatu2rWrXrq2GDRtq9OjRVq9p3bp1evnllyVJ/fr1k9FotBpjnlFfpPderlu3TkajUfv379e0adPUqFEj1atXT2+++aYSExN19epVjR07VvXr11f9+vX11ltvyWw2W72G+Ph4zZw5U82aNZOfn5/atm2rJUuWpFkvo7Z/+eUXhYSEqFGjRgoICNCIESMUFxeX5r36/vvv9dxzzykgIECBgYEaOnSo/vrrryy/5/fy9ddf688//9RLL72UJlRIkoeHh8aMGZNm+d9//63nn39ederU0eOPP54mfKS+1jv7UUp/fkDqOP/ff/9dffr0UZ06dfTOO+/o7NmzMhqNWrJkiVavXq3WrVvLz89PXbt21aFDh+772jJ7jN/5mV2wYIGeeOIJ+fv764UXXtDp06et1t2/f79Gjx6t5s2by8/PT82aNdOMGTN08+bNe9bSt29fPf300+k+17ZtWw0aNMjyePPmzerSpYul5k6dOmn58uX3fA
9PnTqlUaNG6bHHHpO/v7+eeOIJjRkzRteuXbvv+yRJv//+u3r16qXatWurZcuWWrVqleW5GzduKCAgQNOmTUuz3b///qsaNWpo4cKF99x/SkqKli9frk6dOsnf31+NGjXSoEGDdPjw4Qy3yWieSnrH1uHDhzVo0CA1bNjQ8hrGjx8vSTp79qwaN24sSZo/f77l34958+ZZtv/nn380evRoNWjQQP7+/urSpYu+/fbbdNv9+eefNXnyZDVu3FjNmjW75+sGchtnLIAc9t1336l8+fIKCgrK1PqrVq1StWrV1LJlSxUqVEjfffedgoODZTab1adPH6t1T548qVdffVU9e/ZUjx49VKlSJctzH330kVxdXTV06FCdPn1an3zyiQoVKiQHBwddvXpVI0eO1G+//aZ169apXLlyGjlyZIY1XbhwwRJ+hg4dKnd3d61ZsybDX8RPnz6tl19+Wd26dVPnzp21du1ajRs3TrVq1VK1atUk3T6Ts337drVr106PPvqoYmJitHr1avXt21ebN2+2jKk3mUx68cUXFRERoaeeekr9+vXTjRs3tHv3bh0/flwVKlTI1Pt6p7Nnz8pgMKhIkSL6/vvvdeDAAT311FMqXbq0zp07p1WrVqlfv37avHmz3NzcVL9+fT3//PNasWKFXnrpJVWuXFmSVKVKlUz1RXqmTZsmLy8vjRo1Sr/99ptWr16twoUL68CBAypTpozGjBmjH374QUuWLJGvr6+effZZSbfPhgwbNkx79+5Vt27dVKNGDe3atUtvvfWWLly4oDfeeOO+r3/atGkqUqSIRo4cqXPnzmn58uWaMmWK3n33Xcs6GzZs0Lhx49S0aVO99tprSkhI0KpVq/Tcc89p/fr12TbPJPXL051nle7nypUrGjx4sJ588km1b99eX331ld5++235+vo+8Bety5cva8iQIXrqqaf09NNPy9PT0/Lcpk2bdOPGDfXs2VMODg5avHixRo0ape3bt9/zjFdmj/FUixYtkoODgwYOHKjr169r8eLFeu2117RmzRrLOtu2bdPNmzfVu3dvFStWTIcOHdInn3yif//9V6GhoRnW8swzz2jixIk6fvy4fH19LcsPHTqkU6dOadiwYZKk3bt365VXXlHjxo0tQwVPnDihX3/91eoHkDslJiZq0KBBSkxMVN++feXl5aULFy5o586dunr1qgoXLnyPd/52fw4dOlTt27fXU089pa1bt2ry5MlycnJSt27d9Mgjj6h169baunWrxo8fL4PBYNl206ZNMpvN6tSp0z3bmDBhgtatW6cnnnhC3bp1k8lk0v79+/Xbb7/J39//ntveT2xsrAYNGqTixYtr6NChKlKkiM6ePatvvvlGklSiRAlNnjxZkydP1pNPPqknn3xSkiyh5a+//lLv3r1VqlQpDRkyRO7u7tq6datGjBihefPmWdZPFRwcrBIlSmjEiBGKj4+3qXYg25kB5Jhr166ZfX19zcOGDcv0NgkJCWmWDRw40NyqVSurZS1atDD7+vqaf/jhB6vle/bsMfv6+po7duxoTkxMtCx/5ZVXzEaj0Tx48GCr9Xv27Glu0aJFmn2//vrrlsdTp041G41G8x9//GFZdunSJXODBg3Mvr6+5sjIyDR17du3z7IsNjbW7OfnZ545c6Zl2a1bt8wmk8mq3cjISLOfn595/vz5lmWff/652dfX17x06dI070tKSkqaZXfq27evuV27dubY2FhzbGys+e+//zZPnTrV7Ovra37xxRfNZnP67/eBAwfMvr6+5vXr11uWbd261ezr62ves2dPmvUz6ovU5+58L9euXWv29fU1Dxw40Kr+nj17mo1Go/nNN9+0LEtOTjY/8cQT5r59+1qWffPNN2ZfX1/zBx98YNXOqFGjzEaj0Xz69On7tt2/f3+rtmfMmGGuUaOG+erVq2az2Wy+fv26uV69euaJEydatXHx4kVz3bp10yy3xbPPPmuuW7duptfv27dvmr65de
uW+bHHHjOPGjXKsiz1td55bJrN//t83NmPqftctWqV1bqRkZFmX19fc4MGDcyXL1+2LN++fbvZ19fXvGPHjnvWmtl
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Carmignac Patrimoine ===\n",
"Clients after filtering: 319\n",
"Chosen K: 6\n",
" cluster n_clients\n",
"0 0 62\n",
"1 1 16\n",
"2 2 158\n",
"3 3 20\n",
"4 4 41\n",
"5 5 22\n",
"\n",
"Churn profile:\n",
" churn_hard churn_soft churn_warning\n",
"cluster \n",
"0 0.242 0.823 0.903\n",
"1 0.750 0.938 0.938\n",
"2 0.342 0.671 0.665\n",
"3 0.900 1.000 0.850\n",
"4 0.293 0.561 0.659\n",
"5 0.682 0.909 0.818\n"
]
}
],
"source": [
"# 34 - Test on one family\n",
"\n",
"test_family = top_15_families[0]\n",
"result_test = run_family_clustering(\n",
" df_base=df_family_client_base,\n",
" family_name=test_family,\n",
" features=family_cluster_features,\n",
" k_min=4,\n",
" k_max=10,\n",
" min_clients=40,\n",
" min_cluster_size=10,\n",
" min_cluster_share=0.02,\n",
" random_state=RANDOM_STATE,\n",
" make_plots=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e339c5fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Patrimoine\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Sécurité\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Credit <NUM>\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Investissement\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Portfolio Sécurité\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Portfolio Flexible Bond\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Portfolio Credit\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Emergents\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Court Terme\n",
"==========================================================================================\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Portfolio Long-Short European Equities\n",
"==========================================================================================\n",
"[SKIP] Carmignac Portfolio Long-Short European Equities: no valid K with cluster size constraints.\n",
"\n",
"==========================================================================================\n",
"Running clustering for family: Carmignac Portfolio Grande Europe\n",
"==========================================================================================\n"
]
}
],
"source": [
"# 35 - Run clustering on the top 15 fund families\n",
"\n",
"family_results = {}\n",
"family_summary_rows = []\n",
"\n",
"for fam in top_15_families:\n",
" print(\"\\n\" + \"=\" * 90)\n",
" print(f\"Running clustering for family: {fam}\")\n",
" print(\"=\" * 90)\n",
"\n",
" res = run_family_clustering(\n",
" df_base=df_family_client_base,\n",
" family_name=fam,\n",
" features=family_cluster_features,\n",
" k_min=4,\n",
" k_max=10,\n",
" min_clients=40,\n",
" min_cluster_size=10,\n",
" min_cluster_share=0.02,\n",
" random_state=RANDOM_STATE,\n",
" make_plots=False\n",
" )\n",
"\n",
" if res is not None:\n",
" family_results[fam] = res\n",
" best_row = res[\"metrics_df\"].loc[res[\"metrics_df\"][\"k\"] == res[\"best_k\"]].iloc[0]\n",
"\n",
" family_summary_rows.append({\n",
" \"fund_family\": fam,\n",
" \"n_clients\": res[\"n_clients\"],\n",
" \"best_k\": res[\"best_k\"],\n",
" \"silhouette\": best_row[\"silhouette\"],\n",
" \"davies_bouldin\": best_row[\"davies_bouldin\"],\n",
" \"min_cluster_size\": int(best_row[\"min_cluster_size\"]),\n",
" })\n",
"\n",
"family_summary = pd.DataFrame(family_summary_rows).sort_values(\n",
" [\"silhouette\", \"n_clients\"],\n",
" ascending=[False, False]\n",
").reset_index(drop=True)\n",
"\n",
"family_summary"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5c7e517",
"metadata": {},
"outputs": [],
"source": [
"# 36 - Visual and business review for one selected family\n",
"\n",
"selected_family = top_15_families[0]\n",
"if selected_family in family_results:\n",
" _ = run_family_clustering(\n",
" df_base=df_family_client_base,\n",
" family_name=selected_family,\n",
" features=family_cluster_features,\n",
" k_min=4,\n",
" k_max=10,\n",
" min_clients=40,\n",
" min_cluster_size=10,\n",
" min_cluster_share=0.02,\n",
" random_state=RANDOM_STATE,\n",
" make_plots=True\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ba643a5",
"metadata": {},
"outputs": [],
"source": [
"# 37 - Summary table of clustering results across families\n",
"\n",
"family_summary"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67c30d96",
"metadata": {},
"outputs": [],
"source": [
"# # 38 - Optional exports\n",
"\n",
"# family_summary.to_csv(\"family_clustering_summary.csv\", index=False)\n",
"\n",
"# for fam, res in family_results.items():\n",
"# safe_name = re.sub(r\"[^A-Za-z0-9_]+\", \"_\", fam)\n",
"# res[\"df_family\"].to_csv(f\"clustered_family_{safe_name}.csv\", index=False)\n",
"# res[\"cluster_profile\"].to_csv(f\"cluster_profile_{safe_name}.csv\")\n",
"# res[\"churn_profile\"].to_csv(f\"churn_profile_{safe_name}.csv\")"
]
},
{
"cell_type": "markdown",
"id": "ccc09ae0",
"metadata": {},
"source": [
"### Notes for interpretation\n",
"For each family, the clustering combines:\n",
"- **activity** (`flow_freq`)\n",
"- **intensity** (`gross_flow_to_aum_family`, rolling turnover features)\n",
"- **allocation importance** (`family_share_of_client_aum_mean`)\n",
"- **recency and exit behavior** (`months_since_last_tx_family`, `exit_rate_per_isin_family`)\n",
"- **performance sensitivity** (`corr_flow_ret_3m`, `corr_flow_ret_6m`, buy-after-good-performance shares)\n",
"- **churn proxy** (analyzed after clustering)\n",
"\n",
"This is the family-level analogue of the original global clustering notebook, but adapted to the finer and more sparse `client × family` structure."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}