{
"cells": [
{
"cell_type": "markdown",
"id": "13c6141d",
"metadata": {},
"source": [
"# Behavioral Clustering of Carmignac Investors\n",
"\n",
"This notebook implements two complementary clustering analyses:\n",
"\n",
"| | Scope | Approach |\n",
"|---|---|---|\n",
"| **Part 1** | All active accounts (~7,000) | Global behavioral clustering |\n",
"| **Part 2** | Top ~400 accounts (AUM > €5M) | High-conviction clustering with performance reactivity features |\n",
"\n",
"Both analyses share the same preprocessing pipeline (RobustScaler, MAD winsorization) and visualization conventions (robust z-score heatmaps).\n",
"\n",
"---\n",
"**Structure:**\n",
"1. Imports & Configuration\n",
"2. Data Loading\n",
"3. Monthly Panel Construction\n",
"4. Feature Engineering\n",
"5. **Part 1** — Global Clustering (all accounts)\n",
" - 5a. Feature selection & preprocessing\n",
" - 5b. K-selection & clustering\n",
" - 5c. Cluster profiles (behavioral + allocation)\n",
" - 5d. Asset-type sub-clustering & cross-analysis\n",
"6. **Part 2** — Top 400 Accounts Clustering\n",
" - 6a. Account selection & feature engineering\n",
" - 6b. K-selection & clustering\n",
" - 6c. Cluster profiles & churn analysis\n",
"7. Cross-Analysis: Global vs Top 400\n"
]
},
{
"cell_type": "markdown",
"id": "28e588fe",
"metadata": {},
"source": [
"---\n",
"## 1. Imports & Configuration\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3bc1ffe0",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import s3fs\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"# Credentials are read from the environment; never hard-code secrets in a notebook\n",
"for var in [\"AWS_ACCESS_KEY_ID\", \"AWS_SECRET_ACCESS_KEY\", \"AWS_SESSION_TOKEN\"]:\n",
"    if not os.environ.get(var):\n",
"        raise EnvironmentError(f\"{var} must be set before running this notebook\")\n",
"os.environ.setdefault(\"AWS_DEFAULT_REGION\", \"us-east-1\")\n",
"\n",
"fs = s3fs.S3FileSystem(\n",
"    client_kwargs={'endpoint_url': 'https://minio-simple.lab.groupe-genes.fr'},\n",
"    key=os.environ[\"AWS_ACCESS_KEY_ID\"],\n",
"    secret=os.environ[\"AWS_SECRET_ACCESS_KEY\"],\n",
"    token=os.environ[\"AWS_SESSION_TOKEN\"])\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"from sklearn.preprocessing import RobustScaler\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import (\n",
" silhouette_score, davies_bouldin_score,\n",
" pairwise_distances, adjusted_rand_score\n",
")\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"sns.set_style(\"whitegrid\")\n",
"pd.set_option(\"display.max_columns\", 200)\n",
"pd.set_option(\"display.max_rows\", 200)\n",
"\n",
"EPS = 1e-9\n",
"RANDOM_STATE = 42"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "69d2dc25",
"metadata": {},
"outputs": [],
"source": [
"# Column names\n",
"ID_COL = \"Registrar Account - ID\"\n",
"ISIN_COL = \"Product - Isin\"\n",
"FUND_COL = \"Product - Fund\"\n",
"ASSET_COL = \"Product - Asset Type\"\n",
"FLOW_DATE_COL = \"Centralisation Date\"\n",
"AUM_DATE_COL = \"Centralisation Date\"\n",
"FLOW_QTY_COL = \"Quantity - NetFlows\"\n",
"FLOW_SUB_COL = \"Quantity - Subscription\"\n",
"FLOW_RED_COL = \"Quantity - Redemption\"\n",
"AUM_QTY_COL = \"Quantity - AUM\"\n",
"AUM_VAL_COL = \"Value - AUM €\"\n",
"REGION_COL = \"Registrar Account - Region\"\n",
"COUNTRY_COL = \"RegistrarAccount - Country\"\n",
"NAV_DATE_COL = \"Dat\"\n",
"NAV_ISIN_COL = \"Isin\"\n",
"NAV_PRICE_COL = \"Price (TF PartPrice)\"\n",
"NAV_BENCH_COL = \"PriceBench\"\n",
"RATE_DATE_COL = \"Date\"\n",
"RATE_VAL_COL = \"Yld to Maturity\"\n",
"\n",
"PATH_NAV = \"s3://projet-bdc-data/carmignac/Data Modélisation/Nav/NAV_Bench_data.csv\"\n",
"PATH_RATES = \"s3://projet-bdc-data/carmignac/Data Modélisation/market data/esterRates.csv\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bf5b7a0a",
"metadata": {},
"outputs": [],
"source": [
"# SHARED UTILITIES\n",
"def robust_zscore(s):\n",
"    \"\"\"Robust z-score (s - median) / (1.4826 * MAD). Returns zeros when MAD is 0 or NaN.\"\"\"\n",
"    med = np.nanmedian(s)\n",
"    mad = np.nanmedian(np.abs(s - med))\n",
"    if mad == 0 or np.isnan(mad):\n",
"        return np.zeros(len(s))\n",
"    return (s - med) / (1.4826 * mad)\n",
"\n",
"def plot_heatmap(dfc, profile_vars, cluster_col, title, figsize=(16, 4)):\n",
"    \"\"\"Cluster signature heatmap: per-cluster medians (after p2-p98 clipping),\n",
"    converted to robust z-scores across clusters and capped at ±3 for readability.\"\"\"\n",
"    dfc_viz = dfc[profile_vars + [cluster_col]].copy()\n",
"    for col in profile_vars:\n",
"        vals = pd.to_numeric(dfc_viz[col], errors=\"coerce\").to_numpy(dtype=float)\n",
"        lo = np.nanpercentile(vals, 2)\n",
"        hi = np.nanpercentile(vals, 98)\n",
"        dfc_viz[col] = np.clip(vals, lo, hi)\n",
"    prof = dfc_viz.groupby(cluster_col)[profile_vars].median()\n",
"    prof_z = prof.apply(lambda col: robust_zscore(col.values), axis=0)\n",
"    prof_z = prof_z.clip(-3, 3)  # cap for readability\n",
"    plt.figure(figsize=figsize)\n",
"    sns.heatmap(prof_z, cmap=\"RdBu_r\", center=0, annot=True, fmt=\".2f\",\n",
"                xticklabels=profile_vars,\n",
"                yticklabels=[f\"Cluster {i}\" for i in prof.index])  # actual cluster labels\n",
"    plt.title(title)\n",
"    plt.xticks(rotation=45, ha=\"right\")\n",
"    plt.tight_layout()\n",
"    plt.show()\n",
"    return prof\n",
"\n",
"def winsorize_mad(series, n_sigma=3):\n",
"    \"\"\"Winsorize using the MAD n-sigma rule. Falls back to a [0, p95] clip when MAD is 0 or NaN.\"\"\"\n",
"    vals = pd.to_numeric(series, errors=\"coerce\").to_numpy(dtype=float)\n",
"    med = np.nanmedian(vals)\n",
"    mad = np.nanmedian(np.abs(vals - med)) * 1.4826\n",
"    if mad > 0:\n",
"        return np.clip(vals, med - n_sigma * mad, med + n_sigma * mad)\n",
"    return np.clip(vals, 0, np.nanpercentile(vals, 95))\n",
"\n",
"def add_months_since_last_tx(dfc, df_month, id_col, suffix=\"\"):\n",
"    \"\"\"Add months_since_last_tx{suffix}: months between the last month of the panel and the\n",
"    account's last active month. Accounts with no active month get (observed max + 1).\"\"\"\n",
"    col_name = f\"months_since_last_tx{suffix}\"\n",
"    reference_date = df_month[\"month\"].max()\n",
"    last_active = (\n",
"        df_month[df_month[\"active_month\"] == 1]\n",
"        .groupby(id_col)[\"month\"]\n",
"        .max()\n",
"        .reset_index(name=\"last_active_month\")\n",
"    )\n",
"    last_active[col_name] = (\n",
"        (reference_date.to_period(\"M\") -\n",
"         last_active[\"last_active_month\"].dt.to_period(\"M\"))\n",
"        .apply(lambda x: x.n)\n",
"    )\n",
"    dfc = dfc.merge(last_active[[id_col, col_name]], on=id_col, how=\"left\")\n",
"    max_months = dfc[col_name].max()\n",
"    dfc[col_name] = dfc[col_name].fillna(max_months + 1)\n",
"    return dfc"
]
},
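{
"cell_type": "markdown",
"id": "a3f1c9d2",
"metadata": {},
"source": [
"A minimal sanity check, on synthetic data only (the seed, sample size and outlier are illustrative assumptions), of the constant used in `robust_zscore` and `winsorize_mad`. For Gaussian data, `1.4826 × MAD` (1.4826 ≈ 1 / 0.6745, the inverse of the standard normal 75th percentile) estimates the standard deviation, so a 3σ MAD clip behaves like a classical 3σ clip while staying insensitive to a single extreme value that inflates the usual std. The last lines show the `Period` month arithmetic behind `add_months_since_last_tx`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8e4d0a7",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"rng_toy = np.random.default_rng(0)\n",
"x_toy = rng_toy.normal(loc=10.0, scale=2.0, size=100_000)\n",
"x_toy[0] = 1e6  # a single synthetic outlier\n",
"\n",
"med_toy = np.median(x_toy)\n",
"sigma_mad = 1.4826 * np.median(np.abs(x_toy - med_toy))\n",
"print(f'MAD-based sigma: {sigma_mad:.3f} | classical std: {x_toy.std():.1f}')\n",
"\n",
"# 3-sigma MAD winsorization, same rule as winsorize_mad(n_sigma=3)\n",
"x_toy_w = np.clip(x_toy, med_toy - 3 * sigma_mad, med_toy + 3 * sigma_mad)\n",
"print(f'max before: {x_toy.max():.0f} | max after: {x_toy_w.max():.2f}')\n",
"\n",
"# Whole months between two monthly periods, as in add_months_since_last_tx\n",
"gap = pd.Period('2024-06', freq='M') - pd.Period('2023-12', freq='M')\n",
"print(gap.n)  # 6"
]
},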
{
"cell_type": "markdown",
"id": "312153e6",
"metadata": {},
"source": [
"---\n",
"## 2. Data Loading\n",
"\n",
"Three data sources are used:\n",
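"- **AUM** (repaired): monthly share quantities and € values per account and ISIN\n",
"- **Flows**: daily transactions (net, subscriptions, redemptions), aggregated to monthly\n",
"- **NAV / Rates**: fund and benchmark prices (monthly returns) and an interest-rate series (`esterRates.csv`, monthly rate changes), used for enrichment\n"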
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "011958df",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"flows: (2574461, 26)\n",
"aum: (4824814, 19)\n",
"nav: (623914, 6)\n"
]
}
],
"source": [
"df_flows = pd.read_csv(\"flows.csv\", low_memory=False)\n",
"df_aum = pd.read_csv(\n",
" \"s3://projet-bdc-carmignac-g3/paco/AUM_repaired.csv\", low_memory=False\n",
")\n",
"df_nav = pd.read_csv(PATH_NAV, sep=\";\")\n",
"df_rates = pd.read_csv(PATH_RATES, sep=\";\")\n",
"\n",
"# Date parsing\n",
"for df, col in [\n",
" (df_flows, FLOW_DATE_COL), (df_aum, AUM_DATE_COL),\n",
" (df_nav, NAV_DATE_COL), (df_rates, RATE_DATE_COL)\n",
"]:\n",
" df[col] = pd.to_datetime(df[col], errors=\"coerce\")\n",
" df[\"month\"] = df[col].dt.to_period(\"M\").dt.to_timestamp()\n",
"\n",
"for col in [FLOW_QTY_COL, FLOW_SUB_COL, FLOW_RED_COL]:\n",
" df_flows[col] = pd.to_numeric(df_flows[col], errors=\"coerce\")\n",
"for col in [AUM_QTY_COL, AUM_VAL_COL]:\n",
" df_aum[col] = pd.to_numeric(df_aum[col], errors=\"coerce\")\n",
"for col in [NAV_PRICE_COL, NAV_BENCH_COL]:\n",
" df_nav[col] = pd.to_numeric(df_nav[col], errors=\"coerce\")\n",
"df_rates[RATE_VAL_COL] = pd.to_numeric(df_rates[RATE_VAL_COL], errors=\"coerce\")\n",
"\n",
"for df in [df_flows, df_aum]:\n",
" df[ISIN_COL] = df[ISIN_COL].astype(str).str.strip()\n",
"df_nav[NAV_ISIN_COL] = df_nav[NAV_ISIN_COL].astype(str).str.strip()\n",
"\n",
"# Remove technical accounts (not investable)\n",
"TECHNICAL_ACCOUNTS = [\"Off Distribution\", \"Private Clients\", \"Private Client\"]\n",
"df_flows = df_flows[~df_flows[ID_COL].isin(TECHNICAL_ACCOUNTS)]\n",
"df_aum = df_aum[~df_aum[ID_COL].isin(TECHNICAL_ACCOUNTS)]\n",
"\n",
"print(\"flows:\", df_flows.shape)\n",
"print(\"aum: \", df_aum.shape)\n",
"print(\"nav: \", df_nav.shape)"
]
},
{
"cell_type": "markdown",
"id": "d34f5ecf",
"metadata": {},
"source": [
"---\n",
"## 3. Monthly Panel Construction\n",
"\n",
"A full outer join of AUM and flows at `(account, ISIN, month)` granularity (built as the union of keys followed by two left merges), enriched with monthly NAV returns and interest-rate changes.\n"
]
},
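{
"cell_type": "markdown",
"id": "c2d7e5f1",
"metadata": {},
"source": [
"The outer join in the next cell is built as the union of keys followed by two left merges. The toy sketch below (hypothetical accounts `A`-`C`, ISINs `X`-`Z`, integer months) checks that this yields the same rows as a single `merge(how='outer')`. The equivalence relies on both monthly tables being unique on `(account, ISIN, month)`, which the preceding `groupby` guarantees.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a0b3c6",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy_keys = ['acc', 'isin', 'month']\n",
"toy_aum = pd.DataFrame({'acc': ['A', 'A', 'B'], 'isin': ['X', 'X', 'Y'],\n",
"                        'month': [1, 2, 1], 'aum_qty': [10.0, 12.0, 5.0]})\n",
"toy_flows = pd.DataFrame({'acc': ['A', 'C'], 'isin': ['X', 'Z'],\n",
"                          'month': [2, 1], 'net_flow_qty': [2.0, 3.0]})\n",
"\n",
"# Pattern used in this notebook: union of keys, then two left merges\n",
"toy_union = pd.concat([toy_aum[toy_keys], toy_flows[toy_keys]]).drop_duplicates()\n",
"via_keys = (toy_union.merge(toy_aum, on=toy_keys, how='left')\n",
"                     .merge(toy_flows, on=toy_keys, how='left'))\n",
"\n",
"# Single outer merge for comparison\n",
"via_outer = toy_aum.merge(toy_flows, on=toy_keys, how='outer')\n",
"\n",
"a = via_keys.sort_values(toy_keys).reset_index(drop=True)\n",
"b = via_outer.sort_values(toy_keys).reset_index(drop=True)\n",
"print(a.equals(b))  # True: same rows and same NaN pattern"
]
},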
{
"cell_type": "code",
"execution_count": 5,
"id": "25f3dce4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Panel shape: (4754355, 24)\n"
]
}
],
"source": [
"df_flows_m = (\n",
" df_flows\n",
" .dropna(subset=[ID_COL, ISIN_COL, \"month\"])\n",
" .assign(\n",
" gross_flow_qty = lambda x: x[FLOW_QTY_COL].abs(),\n",
" sub_qty = lambda x: x[FLOW_SUB_COL].fillna(0),\n",
" red_qty = lambda x: x[FLOW_RED_COL].fillna(0)\n",
" )\n",
" .groupby([ID_COL, ISIN_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" net_flow_qty = (FLOW_QTY_COL, \"sum\"),\n",
" gross_flow_qty = (\"gross_flow_qty\", \"sum\"),\n",
" sub_qty = (\"sub_qty\", \"sum\"),\n",
" red_qty = (\"red_qty\", \"sum\"),\n",
" n_tx = (FLOW_QTY_COL, \"size\"),\n",
" region = (REGION_COL, \"last\"),\n",
" country = (COUNTRY_COL, \"last\")\n",
" )\n",
")\n",
"\n",
"df_aum_m = (\n",
" df_aum\n",
" .dropna(subset=[ID_COL, ISIN_COL, \"month\"])\n",
" .groupby([ID_COL, ISIN_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" aum_qty = (AUM_QTY_COL, \"sum\"),\n",
" aum_val = (AUM_VAL_COL, \"sum\"),\n",
" fund = (FUND_COL, \"last\"),\n",
" asset_type = (ASSET_COL, \"last\"),\n",
" region = (REGION_COL, \"last\"),\n",
" country = (COUNTRY_COL, \"last\")\n",
" )\n",
")\n",
"\n",
"keys = pd.concat([\n",
" df_flows_m[[ID_COL, ISIN_COL, \"month\"]],\n",
" df_aum_m[[ID_COL, ISIN_COL, \"month\"]]\n",
"]).drop_duplicates()\n",
"\n",
"df_rel_m = (\n",
" keys\n",
" .merge(df_aum_m, on=[ID_COL, ISIN_COL, \"month\"], how=\"left\")\n",
" .merge(df_flows_m, on=[ID_COL, ISIN_COL, \"month\"], how=\"left\",\n",
" suffixes=(\"\", \"_flow\"))\n",
")\n",
"\n",
"for c in [\"aum_qty\",\"aum_val\",\"net_flow_qty\",\"gross_flow_qty\",\n",
" \"sub_qty\",\"red_qty\",\"n_tx\"]:\n",
" df_rel_m[c] = df_rel_m[c].fillna(0)\n",
"\n",
"df_rel_m[\"region\"] = df_rel_m[\"region\"].fillna(df_rel_m.get(\"region_flow\"))\n",
"df_rel_m[\"country\"] = df_rel_m[\"country\"].fillna(df_rel_m.get(\"country_flow\"))\n",
"df_rel_m[\"active_rel_month\"] = (df_rel_m[\"gross_flow_qty\"] > 0).astype(int)\n",
"df_rel_m[\"holding_rel_month\"] = (df_rel_m[\"aum_qty\"] > 0).astype(int)\n",
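"# NaN (not EPS) when the position is empty, to avoid ~1e9 ratios on zero AUM\n",
"df_rel_m[\"flow_to_aum_rel\"] = np.where(df_rel_m[\"aum_qty\"].abs() > 0,\n",
"    df_rel_m[\"net_flow_qty\"] / df_rel_m[\"aum_qty\"].abs(), np.nan)\n",
"df_rel_m[\"turnover_rel\"] = np.where(df_rel_m[\"aum_qty\"].abs() > 0,\n",
"    df_rel_m[\"gross_flow_qty\"] / df_rel_m[\"aum_qty\"].abs(), np.nan)\n",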
"\n",
"# --- NAV returns & interest rates ---\n",
"df_nav_m = (\n",
" df_nav\n",
" .dropna(subset=[NAV_ISIN_COL, \"month\", NAV_PRICE_COL])\n",
" .sort_values([NAV_ISIN_COL, NAV_DATE_COL])  # daily order, so tail(1) keeps the month-end NAV\n",
" .groupby([NAV_ISIN_COL, \"month\"], as_index=False).tail(1).copy()\n",
")\n",
"df_nav_m[\"ret_fund_m\"] = df_nav_m.groupby(NAV_ISIN_COL)[NAV_PRICE_COL].pct_change()\n",
"df_nav_m[\"ret_bench_m\"] = df_nav_m.groupby(NAV_ISIN_COL)[NAV_BENCH_COL].pct_change()\n",
"df_nav_m[\"active_return_m\"] = df_nav_m[\"ret_fund_m\"] - df_nav_m[\"ret_bench_m\"]\n",
"df_nav_m = df_nav_m.rename(columns={NAV_ISIN_COL: ISIN_COL})[\n",
" [ISIN_COL, \"month\", \"ret_fund_m\", \"ret_bench_m\", \"active_return_m\"]\n",
"]\n",
"\n",
"df_rates_m = (\n",
" df_rates.dropna(subset=[\"month\", RATE_VAL_COL])\n",
" .sort_values(RATE_DATE_COL)\n",
" .groupby(\"month\", as_index=False).tail(1).copy()\n",
")\n",
"df_rates_m[\"delta_rate_m\"] = df_rates_m[RATE_VAL_COL].diff()\n",
"df_rates_m = df_rates_m[[\"month\", RATE_VAL_COL, \"delta_rate_m\"]]\n",
"\n",
"df_rel_m = df_rel_m.merge(df_nav_m, on=[ISIN_COL, \"month\"], how=\"left\")\n",
"df_rel_m = df_rel_m.merge(\n",
" df_rates_m[[\"month\", \"delta_rate_m\"]], on=\"month\", how=\"left\"\n",
")\n",
"for c in [\"ret_fund_m\",\"ret_bench_m\",\"active_return_m\",\"delta_rate_m\"]:\n",
" df_rel_m[c] = df_rel_m[c].fillna(0)\n",
"\n",
"print(\"Panel shape:\", df_rel_m.shape)"
]
},
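{
"cell_type": "markdown",
"id": "e4b6f8a2",
"metadata": {},
"source": [
"A toy sketch of the month-end NAV logic (hypothetical ISIN `X`, made-up prices): keep the last observation of each month, then take `pct_change` per ISIN. Sorting by the daily date before `tail(1)` is what guarantees the kept row is the month-end price rather than an arbitrary day of the month.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1c3d5e7",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy_nav = pd.DataFrame({\n",
"    'isin': ['X', 'X', 'X', 'X'],\n",
"    'date': pd.to_datetime(['2024-01-31', '2024-01-15', '2024-02-29', '2024-02-10']),\n",
"    'price': [102.0, 100.0, 105.06, 101.0],\n",
"})\n",
"toy_nav['month'] = toy_nav['date'].dt.to_period('M').dt.to_timestamp()\n",
"\n",
"# Sort by day so that tail(1) keeps the last NAV of each month\n",
"month_end = toy_nav.sort_values(['isin', 'date']).groupby(['isin', 'month']).tail(1).copy()\n",
"month_end['ret_m'] = month_end.groupby('isin')['price'].pct_change()\n",
"print(month_end[['month', 'price', 'ret_m']])  # Feb return = 105.06 / 102 - 1 = 3%"
]
},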
{
"cell_type": "markdown",
"id": "9121da21",
"metadata": {},
"source": [
"---\n",
"## 4. Feature Engineering\n",
"\n",
"Features are built at three levels of granularity:\n",
"- **Account × month**: activity flags, turnover, drawdown\n",
"- **Account × ISIN**: entry/exit events, holding duration, performance reactivity\n",
"- **Account (static)**: aggregated behavioral summary used for clustering\n",
"\n",
"Asset type and fund composition shares are computed separately and used as **descriptive** post-clustering variables only.\n"
]
},
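{
"cell_type": "markdown",
"id": "0a9e7c4b",
"metadata": {},
"source": [
"To make the ISIN-level event definitions of step 4b concrete, here is a toy single-position series (hypothetical account `A`, ISIN `X`, made-up quantities and returns) run through the same flag rules: an entry is a move from no holding to a positive holding, a full exit is the reverse, and `buy_on_perf` / `sell_on_perf` flag a net buy (sell) in a month that follows a positive (negative) fund return.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b8d6f3a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'acc': ['A'] * 5, 'isin': ['X'] * 5, 'month': [1, 2, 3, 4, 5],\n",
"    'aum_qty': [0, 10, 15, 0, 5],\n",
"    'net_flow_qty': [0, 10, 5, -15, 5],\n",
"    'ret_fund_m': [0.0, 0.02, -0.01, -0.03, 0.01],\n",
"})\n",
"g_toy = toy.groupby(['acc', 'isin'])\n",
"prev_aum = g_toy['aum_qty'].shift(1)\n",
"ret_lag1 = g_toy['ret_fund_m'].shift(1)\n",
"\n",
"toy['entry_event'] = ((prev_aum.fillna(0) <= 0) & (toy['aum_qty'] > 0)).astype(int)\n",
"toy['full_exit_event'] = ((prev_aum > 0) & (toy['aum_qty'] <= 0)).astype(int)\n",
"toy['buy_on_perf'] = ((toy['net_flow_qty'] > 0) & (ret_lag1 > 0)).astype(int)\n",
"toy['sell_on_perf'] = ((toy['net_flow_qty'] < 0) & (ret_lag1 < 0)).astype(int)\n",
"\n",
"print(toy[['month', 'entry_event', 'full_exit_event', 'buy_on_perf', 'sell_on_perf']])\n",
"# entry: months 2 and 5 | full exit: month 4 | buy_on_perf: month 3 | sell_on_perf: month 4"
]
},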
{
"cell_type": "code",
"execution_count": 6,
"id": "d4a01bcc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Monthly panel shape: (931089, 22)\n",
"ISIN-level client features: (12582, 12)\n",
"Asset shares: (7473, 6)\n",
"Fund shares: (6591, 11)\n",
"df_client_base shape: (12582, 47)\n"
]
}
],
"source": [
"# 4a. Monthly account-level panel\n",
"tmp = df_rel_m.copy()\n",
"tmp[\"isin_held_flag\"] = (tmp[\"aum_qty\"] > 0).astype(int)\n",
"tmp[\"isin_active_flag\"] = (tmp[\"gross_flow_qty\"] > 0).astype(int)\n",
"\n",
"df_month = (\n",
" tmp.groupby([ID_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" aum_qty = (\"aum_qty\", \"sum\"),\n",
" aum_val = (\"aum_val\", \"sum\"),\n",
" net_flow_qty = (\"net_flow_qty\", \"sum\"),\n",
" gross_flow_qty = (\"gross_flow_qty\", \"sum\"),\n",
" sub_qty = (\"sub_qty\", \"sum\"),\n",
" red_qty = (\"red_qty\", \"sum\"),\n",
" n_tx = (\"n_tx\", \"sum\"),\n",
" n_isin_held = (\"isin_held_flag\", \"sum\"),\n",
" n_isin_active = (\"isin_active_flag\", \"sum\"),\n",
" delta_rate_m = (\"delta_rate_m\", \"first\"),\n",
" ret_fund_m = (\"ret_fund_m\", \"mean\"),\n",
" region = (\"region\", \"first\"),\n",
" country = (\"country\", \"first\"),\n",
" )\n",
" .sort_values([ID_COL, \"month\"])\n",
" .reset_index(drop=True)\n",
")\n",
"\n",
"df_month[\"active_month\"] = (df_month[\"gross_flow_qty\"] > 0).astype(int)\n",
"df_month[\"flow_to_aum_m\"] = np.where(\n",
" df_month[\"aum_qty\"].abs() > 0,\n",
" df_month[\"net_flow_qty\"] / df_month[\"aum_qty\"].abs(), np.nan\n",
")\n",
"df_month[\"turnover_m\"] = np.where(\n",
" df_month[\"aum_qty\"].abs() > 0,\n",
" df_month[\"gross_flow_qty\"] / df_month[\"aum_qty\"].abs(), np.nan\n",
")\n",
"df_month[\"sub_share_m\"] = np.where(\n",
" df_month[\"gross_flow_qty\"] > 0,\n",
" df_month[\"sub_qty\"] / df_month[\"gross_flow_qty\"], np.nan\n",
")\n",
"df_month[\"red_share_m\"] = np.where(\n",
" df_month[\"gross_flow_qty\"] > 0,\n",
" df_month[\"red_qty\"] / df_month[\"gross_flow_qty\"], np.nan\n",
")\n",
"df_month[\"aum_peak_to_date\"] = df_month.groupby(ID_COL)[\"aum_qty\"].cummax()\n",
"df_month[\"aum_drawdown\"] = np.where(\n",
" df_month[\"aum_peak_to_date\"] > 0,\n",
" 1 - df_month[\"aum_qty\"] / df_month[\"aum_peak_to_date\"], np.nan\n",
")\n",
"\n",
"print(\"Monthly panel shape:\", df_month.shape)\n",
"\n",
"# 4b. ISIN-level features (entry/exit, performance reactivity)\n",
"tmp = df_rel_m.sort_values([ID_COL, ISIN_COL, \"month\"]).copy()\n",
"tmp[\"prev_aum\"] = tmp.groupby([ID_COL, ISIN_COL])[\"aum_qty\"].shift(1)\n",
"tmp[\"entry_event\"] = ((tmp[\"prev_aum\"].fillna(0) <= 0) & (tmp[\"aum_qty\"] > 0)).astype(int)\n",
"tmp[\"full_exit_event\"] = ((tmp[\"prev_aum\"] > 0) & (tmp[\"aum_qty\"] <= 0)).astype(int)\n",
"tmp[\"ret_fund_m_lag1\"] = tmp.groupby([ID_COL, ISIN_COL])[\"ret_fund_m\"].shift(1)\n",
"tmp[\"buy_on_perf\"] = ((tmp[\"net_flow_qty\"] > 0) & (tmp[\"ret_fund_m_lag1\"] > 0)).astype(int)\n",
"tmp[\"sell_on_perf\"] = ((tmp[\"net_flow_qty\"] < 0) & (tmp[\"ret_fund_m_lag1\"] < 0)).astype(int)\n",
"\n",
"df_rel_feat = (\n",
" tmp.groupby([ID_COL, ISIN_COL], as_index=False)\n",
" .agg(\n",
" rel_n_months = (\"month\", \"nunique\"),\n",
" rel_active_months = (\"active_rel_month\", \"sum\"),\n",
" rel_holding_months = (\"holding_rel_month\", \"sum\"),\n",
" rel_aum_mean = (\"aum_qty\", \"mean\"),\n",
" rel_turnover_mean = (\"turnover_rel\", \"mean\"),\n",
" rel_turnover_vol = (\"turnover_rel\", \"std\"),\n",
" rel_flow_to_aum_vol = (\"flow_to_aum_rel\", \"std\"),\n",
" rel_n_tx = (\"n_tx\", \"sum\"),\n",
" rel_full_exit_count = (\"full_exit_event\", \"sum\"),\n",
" rel_entry_count = (\"entry_event\", \"sum\"),\n",
" buy_on_perf_rate = (\"buy_on_perf\", \"mean\"),\n",
" sell_on_perf_rate = (\"sell_on_perf\", \"mean\"),\n",
" )\n",
")\n",
"\n",
"isin_aum = df_rel_feat.groupby(ID_COL)[\"rel_aum_mean\"].transform(\"sum\")\n",
"df_rel_feat[\"isin_weight\"] = np.where(\n",
" isin_aum > 0, df_rel_feat[\"rel_aum_mean\"] / isin_aum, np.nan\n",
")\n",
"hhi_isin = (\n",
" df_rel_feat.groupby(ID_COL)[\"isin_weight\"]\n",
" .apply(lambda w: np.sum(w**2))\n",
" .reset_index(name=\"hhi_isin\")\n",
")\n",
"\n",
"df_rel_client = (\n",
" df_rel_feat.groupby(ID_COL, as_index=False)\n",
" .agg(\n",
" n_isin_total = (ISIN_COL, \"nunique\"),\n",
" rel_turnover_mean_avg = (\"rel_turnover_mean\", \"mean\"),\n",
" rel_turnover_vol_avg = (\"rel_turnover_vol\", \"mean\"),\n",
" rel_flow_to_aum_vol_avg = (\"rel_flow_to_aum_vol\", \"mean\"), \n",
" full_exit_count = (\"rel_full_exit_count\", \"sum\"),\n",
" entry_count = (\"rel_entry_count\", \"sum\"),\n",
" avg_holding_months_per_isin = (\"rel_holding_months\", \"mean\"),\n",
" max_holding_months_per_isin = (\"rel_holding_months\", \"max\"),\n",
" buy_on_perf_rate_avg = (\"buy_on_perf_rate\", \"mean\"),\n",
" sell_on_perf_rate_avg = (\"sell_on_perf_rate\", \"mean\"),\n",
" )\n",
" .merge(hhi_isin, on=ID_COL, how=\"left\")\n",
")\n",
"\n",
"print(\"ISIN-level client features:\", df_rel_client.shape)\n",
"\n",
"# 4c. Asset type & fund composition shares\n",
"aum_by_asset = (\n",
" df_aum.dropna(subset=[ID_COL, ASSET_COL])\n",
" .groupby([ID_COL, ASSET_COL], as_index=False)[AUM_VAL_COL].sum()\n",
")\n",
"total_aum_acc = aum_by_asset.groupby(ID_COL)[AUM_VAL_COL].sum().rename(\"total_aum\")\n",
"aum_by_asset = aum_by_asset.merge(total_aum_acc, on=ID_COL)\n",
"aum_by_asset[\"share\"] = np.where(\n",
" aum_by_asset[\"total_aum\"] > 0,\n",
" aum_by_asset[AUM_VAL_COL] / aum_by_asset[\"total_aum\"], np.nan\n",
")\n",
"asset_shares = (\n",
" aum_by_asset\n",
" .pivot_table(index=ID_COL, columns=ASSET_COL, values=\"share\", aggfunc=\"mean\")\n",
" .fillna(0).reset_index()\n",
")\n",
"asset_shares.columns = [ID_COL] + [\n",
" f\"share_asset_{c.lower().replace(' ','_')}\" for c in asset_shares.columns[1:]\n",
"]\n",
"\n",
"aum_by_fund = (\n",
" df_aum.dropna(subset=[ID_COL, FUND_COL])\n",
" .groupby([ID_COL, FUND_COL], as_index=False)[AUM_VAL_COL].sum()\n",
")\n",
"aum_by_fund = aum_by_fund.merge(total_aum_acc, on=ID_COL)\n",
"aum_by_fund[\"share\"] = np.where(\n",
" aum_by_fund[\"total_aum\"] > 0,\n",
" aum_by_fund[AUM_VAL_COL] / aum_by_fund[\"total_aum\"], np.nan\n",
")\n",
"top_funds = aum_by_fund.groupby(FUND_COL)[AUM_VAL_COL].sum().nlargest(10).index\n",
"fund_shares = (\n",
" aum_by_fund[aum_by_fund[FUND_COL].isin(top_funds)]\n",
" .pivot_table(index=ID_COL, columns=FUND_COL, values=\"share\", aggfunc=\"mean\")\n",
" .fillna(0).reset_index()\n",
")\n",
"fund_shares.columns = [ID_COL] + [\n",
" f\"share_fund_{c.lower().replace(' ','_')[:30]}\" for c in fund_shares.columns[1:]\n",
"]\n",
"\n",
"print(\"Asset shares:\", asset_shares.shape)\n",
"print(\"Fund shares: \", fund_shares.shape)\n",
"\n",
"# 4d. Static client-level features\n",
"df_client_base = (\n",
" df_month.groupby(ID_COL, as_index=False)\n",
" .agg(\n",
" n_months = (\"month\", \"nunique\"),\n",
" n_active_months = (\"active_month\", \"sum\"),\n",
" flow_freq = (\"active_month\", \"mean\"),\n",
" aum_qty_mean = (\"aum_qty\", \"mean\"),\n",
" aum_qty_median = (\"aum_qty\", \"median\"),\n",
" aum_qty_max = (\"aum_qty\", \"max\"),\n",
" aum_qty_last = (\"aum_qty\", \"last\"),\n",
" net_flow_qty_sum = (\"net_flow_qty\", \"sum\"),\n",
" gross_flow_qty_sum = (\"gross_flow_qty\", \"sum\"),\n",
" gross_flow_qty_mean= (\"gross_flow_qty\", \"mean\"),\n",
" sub_qty_sum = (\"sub_qty\", \"sum\"),\n",
" red_qty_sum = (\"red_qty\", \"sum\"),\n",
" n_tx_total = (\"n_tx\", \"sum\"),\n",
" avg_n_isin_held = (\"n_isin_held\", \"mean\"),\n",
" max_n_isin_held = (\"n_isin_held\", \"max\"),\n",
" net_flow_qty_vol = (\"net_flow_qty\", \"std\"),\n",
" aum_drawdown_last = (\"aum_drawdown\", \"last\"),\n",
" aum_drawdown_max = (\"aum_drawdown\", \"max\"),\n",
" region = (\"region\", \"last\"),\n",
" country = (\"country\", \"last\"),\n",
" )\n",
")\n",
"df_client_base[\"net_flow_qty_vol\"] = df_client_base[\"net_flow_qty_vol\"].fillna(0)\n",
"\n",
"df_client_base = (\n",
" df_client_base\n",
" .merge(df_rel_client, on=ID_COL, how=\"left\")\n",
" .merge(asset_shares, on=ID_COL, how=\"left\")\n",
" .merge(fund_shares, on=ID_COL, how=\"left\")\n",
")\n",
"\n",
"print(\"df_client_base shape:\", df_client_base.shape)"
]
},
{
"cell_type": "markdown",
"id": "c383042d",
"metadata": {},
"source": [
"---\n",
"## 5. Part 1 — Global Clustering (All Accounts)\n",
"\n",
"### Objective\n",
"Segment the full client base into behavioral profiles using 8 behavioral features. The analysis covers ~7,000 accounts (7,177 after quality filters) with at least 6 months of history and positive mean AUM.\n",
"\n",
"### Feature set\n",
"| Feature | Description |\n",
"|---|---|\n",
"| `flow_freq` | Proportion of months with at least one transaction |\n",
"| `gross_flow_to_aum` | Total gross flows relative to mean AUM (clipped p90, log-transformed) |\n",
"| `n_isin_total` | Total number of distinct ISINs held over the period |\n",
"| `avg_holding_months_per_isin` | Average holding duration per ISIN |\n",
"| `exit_rate_per_isin` | Average number of full exits per ISIN |\n",
"| `flow_direction_balance` | Ratio of net to gross flows (buyer vs seller signal) |\n",
"| `log_aum_qty_mean` | Log mean AUM — only size variable retained |\n",
"| `months_since_last_tx` | Months since last transaction (recency signal, most discriminant feature) |\n",
"\n",
"### Preprocessing\n",
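"- MAD winsorization (3σ) for long-tailed features (`n_isin_total`, `exit_rate_per_isin`, `avg_holding_months_per_isin`, `months_since_last_tx`, `log_aum_qty_mean`)\n",
"- Clip at p90 + `log1p` for `gross_flow_to_aum`; `log1p` only for `flow_freq`\n",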
"- RobustScaler before K-means\n",
"- Geographic and allocation variables excluded from clustering (used post-hoc as descriptors)\n"
]
},
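{
"cell_type": "markdown",
"id": "2c7a5e9d",
"metadata": {},
"source": [
"A quick way to see what the K-selection diagnostics in the next cell look like when clusters are clear-cut: the sketch below runs the same RobustScaler → KMeans → silhouette / Davies-Bouldin loop on synthetic blobs (`make_blobs`, 3 centres in 4 dimensions; an illustrative assumption, not client data), with `n_init` lowered from 50 to 10 for speed. With well-separated groups, silhouette should peak and Davies-Bouldin should reach its minimum near the true K. On the real accounts, silhouette is highest at K=2 while K=4 is retained, so the final choice is not driven by these scores alone.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d6b4f0c",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_blobs\n",
"from sklearn.preprocessing import RobustScaler\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score, davies_bouldin_score\n",
"\n",
"# Synthetic stand-in for the feature matrix (assumption: 3 well-separated groups)\n",
"X_toy, _ = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)\n",
"X_toy_scaled = RobustScaler().fit_transform(X_toy)\n",
"\n",
"for k in range(2, 6):\n",
"    labels_toy = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_toy_scaled)\n",
"    sil = silhouette_score(X_toy_scaled, labels_toy)\n",
"    db = davies_bouldin_score(X_toy_scaled, labels_toy)\n",
"    print(f'k={k}  silhouette={sil:.3f}  davies_bouldin={db:.3f}')"
]
},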
{
"cell_type": "code",
"execution_count": 7,
"id": "0d8b7276-8213-4667-979c-d97b3729162a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accounts after quality filters: 7177\n",
"Accounts: 7177 | Features: 8\n",
"Points > 5 std after scaling: 0 (0.0%)\n",
" k inertia silhouette davies_bouldin\n",
" 2 20240.673342 0.421930 0.973123\n",
" 3 16711.420111 0.241169 1.543030\n",
" 4 14679.824806 0.231005 1.511161\n",
" 5 13213.816987 0.228496 1.409421\n",
" 6 12021.187284 0.223428 1.417110\n",
" 7 11112.958987 0.229601 1.420989\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1500x400 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"K=4 | sil=0.2310 | db=1.5112\n",
" n_comptes pct\n",
"cluster_k4 \n",
"0 1478 20.6\n",
"1 1820 25.4\n",
"2 1171 16.3\n",
"3 2708 37.7\n"
]
}
],
"source": [
"# Engineered ratios\n",
"dfc = df_client_base.copy()\n",
"dfc[\"log_aum_qty_mean\"] = np.log1p(dfc[\"aum_qty_mean\"].clip(lower=0))\n",
"dfc[\"gross_flow_to_aum\"] = np.where(\n",
" dfc[\"aum_qty_mean\"] > 1,\n",
" dfc[\"gross_flow_qty_sum\"] / dfc[\"aum_qty_mean\"], np.nan\n",
")\n",
"dfc[\"flow_direction_balance\"] = np.where(\n",
" dfc[\"gross_flow_qty_sum\"] > 0,\n",
" dfc[\"net_flow_qty_sum\"] / dfc[\"gross_flow_qty_sum\"], np.nan\n",
")\n",
"dfc[\"sub_share_mean\"] = np.where(\n",
" dfc[\"gross_flow_qty_sum\"] > 0,\n",
" dfc[\"sub_qty_sum\"] / dfc[\"gross_flow_qty_sum\"], np.nan\n",
")\n",
"dfc[\"redemption_bias\"] = np.where(\n",
" dfc[\"gross_flow_qty_sum\"] > 0,\n",
" (dfc[\"red_qty_sum\"] - dfc[\"sub_qty_sum\"]) / dfc[\"gross_flow_qty_sum\"], np.nan\n",
")\n",
"dfc[\"exit_rate_per_isin\"] = np.where(\n",
" dfc[\"n_isin_total\"] > 0,\n",
" dfc[\"full_exit_count\"] / dfc[\"n_isin_total\"], np.nan\n",
")\n",
"dfc[\"aum_final_to_peak\"] = np.where(\n",
" dfc[\"aum_qty_max\"] > 0,\n",
" dfc[\"aum_qty_last\"] / dfc[\"aum_qty_max\"], np.nan\n",
")\n",
"dfc[\"aum_drawdown_last\"] = dfc[\"aum_drawdown_last\"].clip(0, 1)\n",
"\n",
"for col in [\"aum_qty_mean\", \"gross_flow_qty_sum\", \"n_tx_total\"]:\n",
" dfc[f\"log_{col}\"] = np.log1p(dfc[col].clip(lower=0))\n",
"\n",
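"# Quality filters: >= 6 months of history, positive mean AUM, then drop accounts above the 99th percentile of each size/activity variable\n",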
"dfc = dfc[(dfc[\"n_months\"] >= 6) & (dfc[\"aum_qty_mean\"] > 0)].copy()\n",
"for col in [\"aum_qty_mean\", \"gross_flow_qty_sum\", \"n_tx_total\"]:\n",
" cap = dfc[col].quantile(0.99)\n",
" dfc = dfc[dfc[col] <= cap].copy()\n",
"\n",
"# Geography: keep the 10 most frequent countries/regions, group the rest as \"Other\"\n",
"country_filled = dfc[\"country\"].fillna(\"Unknown\")\n",
"region_filled = dfc[\"region\"].fillna(\"Unknown\")\n",
"top_countries = country_filled.value_counts().head(10).index\n",
"top_regions = region_filled.value_counts().head(10).index\n",
"dfc[\"country_grp\"] = np.where(country_filled.isin(top_countries), country_filled, \"Other\")\n",
"dfc[\"region_grp\"] = np.where(region_filled.isin(top_regions), region_filled, \"Other\")\n",
"\n",
"# months_since_last_tx\n",
"dfc = add_months_since_last_tx(dfc, df_month, ID_COL)\n",
"\n",
"print(f\"Accounts after quality filters: {len(dfc)}\")\n",
"\n",
"# 5a. Feature selection & preprocessing\n",
"base_features_global = [\n",
" \"flow_freq\",\n",
" \"gross_flow_to_aum\",\n",
" \"n_isin_total\",\n",
" \"avg_holding_months_per_isin\",\n",
" \"exit_rate_per_isin\",\n",
" \"flow_direction_balance\",\n",
" \"log_aum_qty_mean\",\n",
" \"months_since_last_tx\",\n",
"]\n",
"all_features_global = [c for c in base_features_global if c in dfc.columns]\n",
"\n",
"dfc_clean = dfc.copy() # working copy for preprocessing\n",
"\n",
"for col in [\"flow_direction_balance\", \"months_since_last_tx\"]:\n",
"    if col in dfc_clean.columns:\n",
"        dfc_clean[col] = dfc_clean[col].fillna(0)\n",
"\n",
"for col in [\"n_isin_total\", \"exit_rate_per_isin\",\n",
"            \"avg_holding_months_per_isin\", \"months_since_last_tx\"]:\n",
"    if col in dfc_clean.columns:\n",
"        dfc_clean[col] = winsorize_mad(dfc_clean[col], n_sigma=3)\n",
"\n",
"col = \"gross_flow_to_aum\"\n",
"if col in dfc_clean.columns:\n",
" vals = dfc_clean[col].to_numpy(dtype=float)\n",
" vals = np.clip(vals, 0, np.nanpercentile(vals, 90))\n",
" dfc_clean[col] = np.log1p(vals)\n",
"\n",
"col = \"flow_freq\"\n",
"if col in dfc_clean.columns:\n",
" vals = dfc_clean[col].to_numpy(dtype=float)\n",
" dfc_clean[col] = np.log1p(np.clip(vals, 0, None))\n",
"\n",
"col = \"log_aum_qty_mean\"\n",
"if col in dfc_clean.columns:\n",
" dfc_clean[col] = winsorize_mad(dfc_clean[col], n_sigma=3)\n",
"\n",
"X_global = dfc_clean[all_features_global].copy()\n",
"X_global = X_global.loc[:, ~X_global.columns.duplicated()]\n",
"X_global = X_global.fillna(X_global.median())\n",
"\n",
"scaler_global = RobustScaler()\n",
"X_global_scaled = scaler_global.fit_transform(X_global)\n",
"\n",
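"# Diagnostic: RobustScaler centers on the median and divides by the IQR,\n",
"# so count accounts with at least one feature more than 5 IQRs from the median\n",
"X_df = pd.DataFrame(X_global_scaled, columns=X_global.columns)\n",
"extreme = (X_df.abs() > 5).any(axis=1).sum()\n",
"print(f\"Accounts: {X_global.shape[0]} | Features: {X_global.shape[1]}\")\n",
"print(f\"Accounts with |scaled value| > 5 IQR: {extreme} ({extreme/len(X_df):.1%})\")\n",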
"\n",
"# 5b. K-selection\n",
"rows = []\n",
"for k in range(2, 8):\n",
" km = KMeans(n_clusters=k, n_init=50, random_state=RANDOM_STATE)\n",
" labels = km.fit_predict(X_global_scaled)\n",
" rows.append({\n",
" \"k\": k, \"inertia\": km.inertia_,\n",
" \"silhouette\": silhouette_score(X_global_scaled, labels),\n",
" \"davies_bouldin\": davies_bouldin_score(X_global_scaled, labels),\n",
" })\n",
"df_kdiag_global = pd.DataFrame(rows)\n",
"print(df_kdiag_global.to_string(index=False))\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
"for ax, col, title in zip(axes,\n",
" [\"inertia\", \"silhouette\", \"davies_bouldin\"],\n",
" [\"Elbow / Inertia\", \"Silhouette (higher=better)\", \"Davies-Bouldin (lower=better)\"]):\n",
" ax.plot(df_kdiag_global[\"k\"], df_kdiag_global[col], marker=\"o\")\n",
" ax.set_title(title); ax.set_xlabel(\"K\")\n",
"plt.suptitle(\"K-selection — Global Clustering\")\n",
"plt.tight_layout(); plt.show()\n",
"\n",
"# 5c. Final clustering K=4\n",
"RESULTS_GLOBAL = {}\n",
"for k in [4]:\n",
" km = KMeans(n_clusters=k, n_init=50, random_state=RANDOM_STATE)\n",
" dfc[f\"cluster_k{k}\"] = km.fit_predict(X_global_scaled)\n",
" RESULTS_GLOBAL[k] = {\n",
" \"model\": km,\n",
" \"silhouette\": silhouette_score(X_global_scaled, dfc[f\"cluster_k{k}\"]),\n",
" \"davies_bouldin\": davies_bouldin_score(X_global_scaled, dfc[f\"cluster_k{k}\"]),\n",
" }\n",
" print(f\"K={k} | sil={RESULTS_GLOBAL[k]['silhouette']:.4f} \"\n",
" f\"| db={RESULTS_GLOBAL[k]['davies_bouldin']:.4f}\")\n",
" counts = dfc[f\"cluster_k{k}\"].value_counts().sort_index()\n",
" props = counts / counts.sum() * 100\n",
"    print(pd.DataFrame({\"n_accounts\": counts, \"pct\": props.round(1)}))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "1c0ea35a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Median behavioral features — K=4 ===\n",
" gross_flow_to_aum flow_freq flow_direction_balance n_isin_total avg_holding_months_per_isin exit_rate_per_isin log_aum_qty_mean months_since_last_tx\n",
"cluster_k4 \n",
"0 1.159 0.043 -1.000 3.0 60.000 0.400 5.167 27.0\n",
"1 1.476 0.012 -1.000 3.0 12.000 0.714 3.408 127.0\n",
"2 5.351 0.617 -0.006 12.0 28.897 0.667 8.763 3.0\n",
"3 7.889 0.071 0.000 1.0 11.333 1.000 5.280 69.0\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1200x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Median allocation — K=4 ===\n",
" share_asset_fixed_income share_asset_diversified share_asset_equity share_fund_carmignac_patrimoine share_fund_carmignac_investissement share_fund_carmignac_sécurité share_fund_carmignac_emergents\n",
"cluster_k4 \n",
"0 0.000 0.373 0.227 0.260 0.000 0.000 0.000\n",
"1 0.000 0.326 0.099 0.156 0.000 0.000 0.000\n",
"2 0.284 0.207 0.154 0.149 0.011 0.017 0.002\n",
"3 0.768 0.000 0.000 0.000 0.000 0.000 0.000\n",
"\n",
"=== Geographic distribution per cluster ===\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1800x400 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 5d. Cluster profiles\n",
"profile_vars_behavior = [\n",
" \"gross_flow_to_aum\", \"flow_freq\", \"flow_direction_balance\",\n",
" \"n_isin_total\", \"avg_holding_months_per_isin\", \"exit_rate_per_isin\",\n",
" \"log_aum_qty_mean\", \"months_since_last_tx\",\n",
"]\n",
"profile_vars_behavior = [c for c in profile_vars_behavior if c in dfc.columns]\n",
"\n",
"prof_behavior = plot_heatmap(\n",
" dfc, profile_vars_behavior, \"cluster_k4\",\n",
" title=\"Cluster Signatures — Behavioral Features (K=4, robust z-score)\",\n",
" figsize=(14, 4)\n",
")\n",
"print(\"\\n=== Median behavioral features — K=4 ===\")\n",
"print(prof_behavior.round(3).to_string())\n",
"\n",
"profile_vars_allocation = [\n",
" c for c in [\n",
" \"share_asset_fixed_income\", \"share_asset_diversified\",\n",
" \"share_asset_equity\", \"share_fund_carmignac_patrimoine\",\n",
" ] if c in dfc.columns\n",
"]\n",
"\n",
"prof_allocation = plot_heatmap(\n",
" dfc, profile_vars_allocation, \"cluster_k4\",\n",
"    title=\"Cluster signatures — Product allocation (K=4, descriptive only)\",\n",
" figsize=(12, 4)\n",
")\n",
"print(\"\\n=== Median allocation — K=4 ===\")\n",
"print(prof_allocation.round(3).to_string())\n",
"\n",
"# Geographic description per cluster (post-clustering, descriptive only; not a clustering input)\n",
"print(\"\\n=== Geographic distribution per cluster ===\")\n",
"geo_country = pd.crosstab(\n",
" dfc[\"cluster_k4\"], dfc[\"country_grp\"].fillna(\"Unknown\"),\n",
" normalize=\"index\"\n",
").round(3) * 100\n",
"geo_region = pd.crosstab(\n",
" dfc[\"cluster_k4\"], dfc[\"region_grp\"].fillna(\"Unknown\"),\n",
" normalize=\"index\"\n",
").round(3) * 100\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(18, 4))\n",
"sns.heatmap(geo_country, cmap=\"Blues\", annot=True, fmt=\".1f\",\n",
" ax=axes[0], cbar_kws={\"label\": \"%\"})\n",
"axes[0].set_title(\"Country distribution (% per cluster)\")\n",
"sns.heatmap(geo_region, cmap=\"Blues\", annot=True, fmt=\".1f\",\n",
" ax=axes[1], cbar_kws={\"label\": \"%\"})\n",
"axes[1].set_title(\"Region distribution (% per cluster)\")\n",
"plt.tight_layout(); plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9fb2786e",
"metadata": {},
"source": [
"---\n",
"### 5e. Asset-Type Sub-Clustering & Cross-Analysis\n",
"\n",
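"A complementary clustering is performed **within each asset type** (Fixed Income, Diversified, Equity, Alternative; Private Assets is excluded because it has fewer than 50 accounts). Each sub-clustering uses behavioral features recomputed on that asset type's positions only: flow frequency, gross flow to AUM, average number of ISINs held, flow direction balance, log mean AUM, months since the last transaction in that asset type, final-to-peak AUM ratio and last drawdown. The cross-analysis with the global clustering uses the Adjusted Rand Index (ARI), which measures the agreement between two partitions of the same accounts, corrected for chance: 1 means identical segmentations, and values close to 0 mean no more overlap than a random assignment.\n"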
]
},
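 {
  "cell_type": "markdown",
  "id": "ari-sketch-md",
  "metadata": {},
  "source": [
   "*Illustrative sketch (toy labels, not project data).* The Adjusted Rand Index used in the cross-analysis only compares **which accounts are grouped together**, so renaming cluster ids does not change it. The cell below assumes scikit-learn's `adjusted_rand_score`; the label lists are hypothetical and only illustrate the two reference points of the scale: identical partitions score 1.0, while an unrelated partition scores at or below 0.\n"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "id": "ari-sketch-code",
  "metadata": {},
  "outputs": [],
  "source": [
   "# Hypothetical toy labels for six accounts (not taken from dfc).\n",
   "# With real data, both label vectors must cover the same accounts in the same order\n",
   "# (e.g. aligned on the account ID) before calling adjusted_rand_score.\n",
   "from sklearn.metrics import adjusted_rand_score\n",
   "\n",
   "labels_a = [0, 0, 1, 1, 2, 2]\n",
   "labels_renamed = [1, 1, 0, 0, 2, 2]    # same grouping, cluster ids permuted\n",
   "labels_unrelated = [0, 1, 0, 1, 0, 1]  # grouping unrelated to labels_a\n",
   "\n",
   "print(adjusted_rand_score(labels_a, labels_renamed))    # 1.0: identical partitions\n",
   "print(adjusted_rand_score(labels_a, labels_unrelated))  # negative here: worse than chance\n"
  ]
 },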
{
"cell_type": "code",
"execution_count": 9,
"id": "bea76665-7a28-44ac-80a3-e32c595ff630",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== Asset types available ===\n",
"Product - Asset Type\n",
"Equity 2002728\n",
"Diversified 1365811\n",
"Fixed Income 933096\n",
"Alternative 210440\n",
"Private Assets 118\n",
"Name: count, dtype: int64\n",
"\n",
"Accounts per asset type:\n",
"Product - Asset Type\n",
"Diversified 4159\n",
"Fixed Income 3932\n",
"Equity 3899\n",
"Alternative 1317\n",
"Private Assets 11\n",
"Name: Registrar Account - ID, dtype: int64\n",
"\n",
"Retained asset types (>= 50 accounts): ['Alternative', 'Diversified', 'Equity', 'Fixed Income']\n",
"\n",
"============================================================\n",
"ASSET TYPE: Alternative\n",
"============================================================\n",
" k silhouette davies_bouldin\n",
" 2 0.4577 0.9931\n",
" 3 0.3432 1.1315\n",
" 4 0.2579 1.3841\n",
" 5 0.2823 1.2409\n",
" 6 0.2644 1.3500\n",
"→ Retained K: 2 (silhouette=0.4577)\n",
" n_accounts pct\n",
"cluster_alternative \n",
"0 310 23.5\n",
"1 1007 76.5\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Medians — Alternative:\n",
" flow_freq gross_flow_to_aum avg_n_isin_held flow_direction_balance log_aum_qty_mean months_since_last_tx_asset aum_final_to_peak aum_drawdown_last\n",
"cluster_alternative \n",
"0 0.085 1.039 1.000 0.104 5.776 12.0 0.915 0.085\n",
"1 0.069 4.730 0.512 -0.072 5.063 66.0 0.000 1.000\n",
"\n",
"============================================================\n",
"ASSET TYPE: Diversified\n",
"============================================================\n",
" k silhouette davies_bouldin\n",
" 2 0.6037 0.6502\n",
" 3 0.5111 0.8181\n",
" 4 0.4851 0.9788\n",
" 5 0.4695 0.8712\n",
" 6 0.3429 1.1031\n",
"→ Retained K: 2 (silhouette=0.6037)\n",
" n_accounts pct\n",
"cluster_diversified \n",
"0 3369 81.0\n",
"1 790 19.0\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Medians — Diversified:\n",
" flow_freq gross_flow_to_aum avg_n_isin_held flow_direction_balance log_aum_qty_mean months_since_last_tx_asset aum_final_to_peak aum_drawdown_last\n",
"cluster_diversified \n",
"0 0.044 3.042 0.625 -0.578 5.063 80.0 0.000 1.000\n",
"1 0.085 0.217 1.000 -0.675 5.150 12.0 0.907 0.093\n",
"\n",
"============================================================\n",
"ASSET TYPE: Equity\n",
"============================================================\n",
" k silhouette davies_bouldin\n",
" 2 0.3706 1.3811\n",
" 3 0.4255 0.9469\n",
" 4 0.2870 1.3650\n",
" 5 0.2594 1.4419\n",
" 6 0.2784 1.3111\n",
"→ Retained K: 3 (silhouette=0.4255)\n",
" n_accounts pct\n",
"cluster_equity \n",
"0 767 19.7\n",
"1 748 19.2\n",
"2 2384 61.1\n"
]
},
{
"data": {
"image/png": "(figure omitted: Cluster Signatures, Equity, K=3, robust z-score heatmap)",
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Medians — Equity:\n",
" flow_freq gross_flow_to_aum avg_n_isin_held flow_direction_balance log_aum_qty_mean months_since_last_tx_asset aum_final_to_peak aum_drawdown_last\n",
"cluster_equity \n",
"0 0.071 0.067 1.046 -0.935 4.552 12.0 0.975 0.025\n",
"1 0.646 3.610 3.588 -0.099 8.474 0.0 0.154 0.846\n",
"2 0.025 3.296 0.576 -0.835 3.976 90.0 0.000 1.000\n",
"\n",
"============================================================\n",
"ASSET TYPE: Fixed Income\n",
"============================================================\n",
" k silhouette davies_bouldin\n",
" 2 0.6775 0.5104\n",
" 3 0.4227 0.8458\n",
" 4 0.4350 0.9964\n",
" 5 0.4607 0.9170\n",
" 6 0.4388 0.9468\n",
"→ Retained K: 2 (silhouette=0.6775)\n",
" n_accounts pct\n",
"cluster_fixed_income \n",
"0 3140 79.9\n",
"1 792 20.1\n"
]
},
{
"data": {
"image/png": "(figure omitted: Cluster Signatures, Fixed Income, K=2, robust z-score heatmap)",
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Medians — Fixed Income:\n",
" flow_freq gross_flow_to_aum avg_n_isin_held flow_direction_balance log_aum_qty_mean months_since_last_tx_asset aum_final_to_peak aum_drawdown_last\n",
"cluster_fixed_income \n",
"0 0.060 6.239 0.48 0.000 5.146 69.0 0.000 1.000\n",
"1 0.182 2.310 1.50 0.471 7.273 2.0 0.998 0.002\n",
"\n",
"============================================================\n",
"SUMMARY — Asset-type clustering\n",
"============================================================\n",
" Alternative : K=2, sil=0.4577, n=1317\n",
" Diversified : K=2, sil=0.6037, n=4159\n",
" Equity : K=3, sil=0.4255, n=3899\n",
" Fixed Income : K=2, sil=0.6775, n=3932\n"
]
}
],
"source": [
"# ============================================================\n",
"# ASSET-TYPE SUB-CLUSTERING\n",
"# ============================================================\n",
"\n",
"print(\"=== Asset types available ===\")\n",
"print(df_aum[ASSET_COL].value_counts())\n",
"\n",
"# Build account × asset type monthly panel\n",
"df_rel_m_asset = df_rel_m.copy()\n",
"df_rel_m_asset = df_rel_m_asset.merge(\n",
" df_aum[[ID_COL, ISIN_COL, \"month\", ASSET_COL]].drop_duplicates(),\n",
" on=[ID_COL, ISIN_COL, \"month\"], how=\"left\"\n",
")\n",
"\n",
"tmp_asset = df_rel_m_asset.copy()\n",
"tmp_asset[\"isin_held_flag\"] = (tmp_asset[\"aum_qty\"] > 0).astype(int)\n",
"tmp_asset[\"isin_active_flag\"] = (tmp_asset[\"gross_flow_qty\"] > 0).astype(int)\n",
"\n",
"df_month_asset = (\n",
" tmp_asset.dropna(subset=[ASSET_COL])\n",
" .groupby([ID_COL, ASSET_COL, \"month\"], as_index=False)\n",
" .agg(\n",
" aum_qty = (\"aum_qty\", \"sum\"),\n",
" net_flow_qty = (\"net_flow_qty\", \"sum\"),\n",
" gross_flow_qty = (\"gross_flow_qty\", \"sum\"),\n",
" sub_qty = (\"sub_qty\", \"sum\"),\n",
" red_qty = (\"red_qty\", \"sum\"),\n",
" n_tx = (\"n_tx\", \"sum\"),\n",
" n_isin_held = (\"isin_held_flag\", \"sum\"),\n",
" )\n",
" .sort_values([ID_COL, ASSET_COL, \"month\"])\n",
" .reset_index(drop=True)\n",
")\n",
"\n",
"df_month_asset[\"active_month\"] = (df_month_asset[\"gross_flow_qty\"] > 0).astype(int)\n",
"df_month_asset[\"flow_direction\"] = np.where(\n",
" df_month_asset[\"gross_flow_qty\"] > 0,\n",
" df_month_asset[\"net_flow_qty\"] / df_month_asset[\"gross_flow_qty\"], np.nan\n",
")\n",
"df_month_asset[\"aum_peak\"] = df_month_asset.groupby(\n",
" [ID_COL, ASSET_COL])[\"aum_qty\"].cummax()\n",
"df_month_asset[\"aum_drawdown\"] = np.where(\n",
" df_month_asset[\"aum_peak\"] > 0,\n",
" 1 - df_month_asset[\"aum_qty\"] / df_month_asset[\"aum_peak\"], np.nan\n",
")\n",
"\n",
"# Feature engineering per account × asset type\n",
"reference_date = df_month_asset[\"month\"].max()\n",
"last_active_asset = (\n",
" df_month_asset[df_month_asset[\"active_month\"] == 1]\n",
" .groupby([ID_COL, ASSET_COL])[\"month\"].max()\n",
" .reset_index(name=\"last_active_month\")\n",
")\n",
"last_active_asset[\"months_since_last_tx_asset\"] = (\n",
" (reference_date.to_period(\"M\") -\n",
" last_active_asset[\"last_active_month\"].dt.to_period(\"M\"))\n",
" .apply(lambda x: x.n)\n",
")\n",
"\n",
"df_client_asset = (\n",
" df_month_asset.groupby([ID_COL, ASSET_COL], as_index=False)\n",
" .agg(\n",
" n_months = (\"month\", \"nunique\"),\n",
" n_active_months = (\"active_month\", \"sum\"),\n",
" flow_freq = (\"active_month\", \"mean\"),\n",
" aum_qty_mean = (\"aum_qty\", \"mean\"),\n",
" aum_qty_max = (\"aum_qty\", \"max\"),\n",
" aum_qty_last = (\"aum_qty\", \"last\"),\n",
" gross_flow_qty_sum = (\"gross_flow_qty\", \"sum\"),\n",
" net_flow_qty_sum = (\"net_flow_qty\", \"sum\"),\n",
" n_tx_total = (\"n_tx\", \"sum\"),\n",
" avg_n_isin_held = (\"n_isin_held\", \"mean\"),\n",
" aum_drawdown_last = (\"aum_drawdown\", \"last\"),\n",
" )\n",
")\n",
"\n",
"df_client_asset = df_client_asset.merge(\n",
" last_active_asset[[ID_COL, ASSET_COL, \"months_since_last_tx_asset\"]],\n",
" on=[ID_COL, ASSET_COL], how=\"left\"\n",
")\n",
"df_client_asset[\"months_since_last_tx_asset\"] = (\n",
" df_client_asset[\"months_since_last_tx_asset\"]\n",
" .fillna(df_client_asset[\"months_since_last_tx_asset\"].max() + 1)\n",
")\n",
"df_client_asset[\"gross_flow_to_aum\"] = np.where(\n",
" df_client_asset[\"aum_qty_mean\"] > 1,\n",
" df_client_asset[\"gross_flow_qty_sum\"] / df_client_asset[\"aum_qty_mean\"], np.nan\n",
")\n",
"df_client_asset[\"flow_direction_balance\"] = np.where(\n",
" df_client_asset[\"gross_flow_qty_sum\"] > 0,\n",
" df_client_asset[\"net_flow_qty_sum\"] / df_client_asset[\"gross_flow_qty_sum\"], np.nan\n",
")\n",
"df_client_asset[\"aum_final_to_peak\"] = np.where(\n",
" df_client_asset[\"aum_qty_max\"] > 0,\n",
" np.clip(df_client_asset[\"aum_qty_last\"] / df_client_asset[\"aum_qty_max\"], 0, 1), np.nan\n",
")\n",
"df_client_asset[\"log_aum_qty_mean\"] = np.log1p(\n",
" df_client_asset[\"aum_qty_mean\"].clip(lower=0)\n",
")\n",
"df_client_asset = df_client_asset[\n",
" (df_client_asset[\"n_months\"] >= 6) &\n",
" (df_client_asset[\"aum_qty_mean\"] > 0)\n",
"].copy()\n",
"\n",
"print(\"\\nAccounts per asset type:\")\n",
"print(df_client_asset.groupby(ASSET_COL)[ID_COL].nunique().sort_values(ascending=False))\n",
"\n",
"# Select asset types with enough accounts\n",
"min_accounts = 50\n",
"asset_counts = df_client_asset.groupby(ASSET_COL)[ID_COL].nunique()\n",
"valid_assets = asset_counts[asset_counts >= min_accounts].index.tolist()\n",
"print(f\"\\nRetained asset types (>= {min_accounts} accounts): {valid_assets}\")\n",
"\n",
"# Feature set\n",
"asset_features = [\n",
" \"flow_freq\", \"gross_flow_to_aum\", \"avg_n_isin_held\",\n",
" \"flow_direction_balance\", \"log_aum_qty_mean\",\n",
" \"months_since_last_tx_asset\", \"aum_final_to_peak\", \"aum_drawdown_last\",\n",
"]\n",
"\n",
"# Clustering loop\n",
"ASSET_RESULTS = {}\n",
"\n",
"for asset in valid_assets:\n",
" print(f\"\\n{'='*60}\")\n",
" print(f\"ASSET TYPE: {asset}\")\n",
" print(f\"{'='*60}\")\n",
"\n",
" df_a = df_client_asset[df_client_asset[ASSET_COL] == asset].copy()\n",
" feats = [c for c in asset_features if c in df_a.columns]\n",
"\n",
" d = df_a.copy()\n",
" d[\"flow_direction_balance\"] = d[\"flow_direction_balance\"].fillna(0)\n",
"\n",
" for col in [\"avg_n_isin_held\", \"months_since_last_tx_asset\",\n",
" \"aum_drawdown_last\", \"aum_final_to_peak\"]:\n",
" if col not in d.columns:\n",
" continue\n",
" d[col] = winsorize_mad(d[col], n_sigma=3)\n",
"\n",
" for col in [\"gross_flow_to_aum\"]:\n",
" if col not in d.columns:\n",
" continue\n",
" vals = d[col].to_numpy(dtype=float)\n",
" d[col] = np.log1p(np.clip(vals, 0, np.nanpercentile(vals, 90)))\n",
"\n",
" for col in [\"flow_freq\"]:\n",
" if col not in d.columns:\n",
" continue\n",
" vals = d[col].to_numpy(dtype=float)\n",
" d[col] = np.log1p(np.clip(vals, 0, None))\n",
"\n",
" X_a = d[feats].fillna(d[feats].median()).to_numpy()\n",
" X_a_scaled = RobustScaler().fit_transform(X_a)\n",
"\n",
" best_k, best_sil = 2, -1\n",
" rows_k = []\n",
" max_k = max(2, min(6, len(df_a) // 50)) # always score at least K=2\n",
"\n",
" for k in range(2, max_k + 1):\n",
" km = KMeans(n_clusters=k, n_init=30, random_state=RANDOM_STATE)\n",
" labels = km.fit_predict(X_a_scaled)\n",
" sil = silhouette_score(X_a_scaled, labels)\n",
" db = davies_bouldin_score(X_a_scaled, labels)\n",
" rows_k.append({\"k\": k, \"silhouette\": round(sil, 4), \"davies_bouldin\": round(db, 4)})\n",
" if sil > best_sil:\n",
" best_sil, best_k = sil, k\n",
"\n",
" print(pd.DataFrame(rows_k).to_string(index=False))\n",
" print(f\"→ Retained K: {best_k} (silhouette={best_sil:.4f})\")\n",
"\n",
" km_final = KMeans(n_clusters=best_k, n_init=50, random_state=RANDOM_STATE)\n",
" cluster_col = f\"cluster_{asset.lower().replace(' ','_')}\"\n",
" df_a[cluster_col] = km_final.fit_predict(X_a_scaled)\n",
"\n",
" counts = df_a[cluster_col].value_counts().sort_index()\n",
" props = counts / counts.sum() * 100\n",
" print(pd.DataFrame({\"n_accounts\": counts, \"pct\": props.round(1)}))\n",
"\n",
" profile_vars_asset = [c for c in asset_features if c in df_a.columns]\n",
" prof = plot_heatmap(\n",
" df_a, profile_vars_asset, cluster_col,\n",
" title=f\"Cluster Signatures — {asset} (K={best_k}, robust z-score)\",\n",
" figsize=(14, 4)\n",
" )\n",
" print(f\"\\nMedians — {asset}:\")\n",
" print(prof.round(3).to_string())\n",
"\n",
" ASSET_RESULTS[asset] = {\n",
" \"df\": df_a, \"cluster_col\": cluster_col,\n",
" \"k\": best_k, \"silhouette\": best_sil, \"profile\": prof,\n",
" }\n",
"\n",
"print(\"\\n\" + \"=\"*60)\n",
"print(\"SUMMARY — Asset-type clustering\")\n",
"print(\"=\"*60)\n",
"for asset, res in ASSET_RESULTS.items():\n",
" print(f\" {asset:20s}: K={res['k']}, sil={res['silhouette']:.4f}, n={len(res['df'])}\")"
]
},
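The cell above keeps, for each asset type, the K with the highest silhouette score among K = 2 to min(6, n_accounts // 50), and reports the Davies-Bouldin index alongside it. A minimal sketch of that selection rule as a standalone function, run on synthetic blobs rather than the Carmignac data; `select_k`, its parameters, and `n_init=10` (the notebook uses 30) are assumptions for illustration. It floors the upper bound at 2 so that a group with fewer than 100 accounts still has K=2 actually scored instead of falling back to a default.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import RobustScaler


def select_k(X, k_max=6, min_per_cluster=50, random_state=0, n_init=10):
    """Pick K in [2, upper] with the highest silhouette score (hypothetical helper).

    upper = min(k_max, len(X) // min_per_cluster), floored at 2 so that
    at least one candidate is always scored.
    Returns (best_k, best_silhouette, rows), rows = (k, silhouette, davies_bouldin).
    """
    upper = max(2, min(k_max, len(X) // min_per_cluster))
    best_k, best_sil, rows = 2, -1.0, []
    for k in range(2, upper + 1):
        km = KMeans(n_clusters=k, n_init=n_init, random_state=random_state)
        labels = km.fit_predict(X)
        sil = silhouette_score(X, labels)
        db = davies_bouldin_score(X, labels)
        rows.append((k, sil, db))
        if sil > best_sil:
            best_k, best_sil = k, sil
    return best_k, best_sil, rows


# Synthetic data: three well-separated groups (illustration only).
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=42)
X_scaled = RobustScaler().fit_transform(X)

best_k, best_sil, rows = select_k(X_scaled)
print(f"Retained K: {best_k} (silhouette={best_sil:.4f})")

# A small group (60 points) still gets K=2 scored instead of an empty loop.
small_k, small_sil, small_rows = select_k(X_scaled[:60])
print(len(small_rows), small_k)
```

On these synthetic blobs the rule recovers the three generating groups.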
{
"cell_type": "code",
"execution_count": 10,
"id": "05d06b16",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Available columns: ['Registrar Account - ID', 'cluster_k4', 'cluster_alternative', 'cluster_diversified', 'cluster_equity', 'cluster_fixed_income']\n",
"Shape: (7177, 6)\n"
]
},
{
"data": {
"image/png": "(figure omitted: Cross-Analysis, Global Clustering × Asset-Type Clustering, 2×2 heatmaps of % per global cluster)",
"text/plain": [
"<Figure size 1800x1200 with 8 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"============================================================\n",
"Global × Alternative\n",
"============================================================\n",
"\n",
"% per global cluster (each row sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 86.5 6.2 7.3\n",
"Global C1 93.2 0.9 5.9\n",
"Global C2 48.8 10.8 40.4\n",
"Global C3 91.1 1.1 7.9\n",
"\n",
"% per asset cluster (each column sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 21.3 34.6 12.0\n",
"Global C1 28.2 6.5 11.9\n",
"Global C2 9.5 47.9 52.5\n",
"Global C3 41.0 11.0 23.6\n",
"\n",
"============================================================\n",
"Global × Diversified\n",
"============================================================\n",
"\n",
"% per global cluster (each row sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 31.6 40.5 27.9\n",
"Global C1 40.9 54.0 5.1\n",
"Global C2 21.2 61.7 17.1\n",
"Global C3 64.2 33.9 1.8\n",
"\n",
"% per asset cluster (each column sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 14.6 18.6 54.6\n",
"Global C1 23.3 30.5 12.3\n",
"Global C2 7.8 22.4 26.5\n",
"Global C3 54.4 28.5 6.6\n",
"\n",
"============================================================\n",
"Global × Equity\n",
"============================================================\n",
"\n",
"% per global cluster (each row sums to 100%):\n",
" Not exposed Asset C0 Asset C1 Asset C2\n",
"Global C0 37.3 29.4 0.3 32.9\n",
"Global C1 44.2 6.8 0.0 49.1\n",
"Global C2 18.6 9.7 48.5 23.1\n",
"Global C3 70.7 3.2 0.2 25.9\n",
"\n",
"% per asset cluster (each column sums to 100%):\n",
" Not exposed Asset C0 Asset C1 Asset C2\n",
"Global C0 15.8 57.2 0.9 20.7\n",
"Global C1 23.1 16.2 0.0 38.0\n",
"Global C2 6.2 15.0 98.3 11.5\n",
"Global C3 54.9 11.6 0.9 29.8\n",
"\n",
"============================================================\n",
"Global × Fixed Income\n",
"============================================================\n",
"\n",
"% per global cluster (each row sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 65.4 21.2 13.5\n",
"Global C1 72.0 24.1 3.9\n",
"Global C2 19.2 52.9 27.8\n",
"Global C3 34.5 61.5 4.0\n",
"\n",
"% per asset cluster (each column sums to 100%):\n",
" Not exposed Asset C0 Asset C1\n",
"Global C0 28.1 10.3 28.2\n",
"Global C1 38.2 14.4 10.1\n",
"Global C2 6.6 20.4 46.2\n",
"Global C3 27.2 54.9 15.5\n",
"\n",
"============================================================\n",
"Adjusted Rand Index — coherence between global and asset-type clusterings\n",
"============================================================\n",
"(1 = identical, 0 = independent, <0 = worse than random)\n",
"\n",
" Alternative : ARI=0.0274 (n=1164 shared accounts)\n",
" Diversified : ARI=0.0344 (n=3978 shared accounts)\n",
" Equity : ARI=0.1579 (n=3689 shared accounts)\n",
" Fixed Income : ARI=0.1112 (n=3742 shared accounts)\n",
"\n",
"============================================================\n",
"Multi-asset exposure by global cluster\n",
"============================================================\n",
"\n",
"Average number of asset types per global cluster:\n",
"cluster_k4\n",
"0 1.79\n",
"1 1.50\n",
"2 2.92\n",
"3 1.40\n",
"Name: n_asset_types, dtype: float64\n",
"\n",
"Distribution of asset type count per global cluster:\n",
" 0 asset type(s) 1 asset type(s) 2 asset type(s) 3 asset type(s) 4 asset type(s)\n",
"Global C0 0.0 49.1 29.3 14.9 6.7\n",
"Global C1 0.0 64.7 23.3 9.6 2.4\n",
"Global C2 0.8 17.5 13.7 24.9 43.1\n",
"Global C3 0.4 73.9 14.8 7.3 3.5\n"
]
}
],
"source": [
"# Step 1. Merge asset cluster labels into global dataframe\n",
"dfc_cross = dfc[[ID_COL, \"cluster_k4\"]].copy()\n",
"\n",
"for asset, res in ASSET_RESULTS.items():\n",
" cluster_col = res[\"cluster_col\"]\n",
" df_a = res[\"df\"][[ID_COL, cluster_col]].copy()\n",
" dfc_cross = dfc_cross.merge(df_a, on=ID_COL, how=\"left\")\n",
"\n",
"print(\"Available columns:\", dfc_cross.columns.tolist())\n",
"print(\"Shape:\", dfc_cross.shape)\n",
"\n",
"# Step 2. Contingency tables: global clusters × asset clusters\n",
"fig, axes = plt.subplots(2, 2, figsize=(18, 12))\n",
"axes = axes.flatten()\n",
"\n",
"for i, (asset, res) in enumerate(ASSET_RESULTS.items()):\n",
" cluster_col = res[\"cluster_col\"]\n",
" if cluster_col not in dfc_cross.columns:\n",
" continue\n",
" ct = pd.crosstab(\n",
" dfc_cross[\"cluster_k4\"],\n",
" dfc_cross[cluster_col].fillna(-1).astype(int),\n",
" normalize=\"index\"\n",
" ).round(3) * 100\n",
" col_names = {\n",
" c: f\"Asset C{c}\" if c >= 0 else \"Not exposed\"\n",
" for c in ct.columns\n",
" }\n",
" ct = ct.rename(columns=col_names)\n",
" ct.index = [f\"Global C{g}\" for g in ct.index] # g: global cluster id (avoid reusing loop index i)\n",
" sns.heatmap(\n",
" ct, cmap=\"Blues\", annot=True, fmt=\".1f\",\n",
" ax=axes[i], cbar_kws={\"label\": \"%\"},\n",
" vmin=0, vmax=100,\n",
" )\n",
" axes[i].set_title(f\"Global × {asset} (% per global cluster)\")\n",
" axes[i].set_xlabel(f\"{asset} cluster\")\n",
" axes[i].set_ylabel(\"Global cluster\")\n",
"\n",
"plt.suptitle(\"Cross-Analysis: Global Clustering × Asset-Type Clustering\",\n",
" fontsize=14, y=1.02)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Step 3. Detailed contingency tables (row % and column %)\n",
"for asset, res in ASSET_RESULTS.items():\n",
" cluster_col = res[\"cluster_col\"]\n",
" if cluster_col not in dfc_cross.columns:\n",
" continue\n",
" print(f\"\\n{'='*60}\")\n",
" print(f\"Global × {asset}\")\n",
" print(f\"{'='*60}\")\n",
" ct_row = pd.crosstab(\n",
" dfc_cross[\"cluster_k4\"],\n",
" dfc_cross[cluster_col].fillna(-1).astype(int),\n",
" normalize=\"index\"\n",
" ).round(3) * 100\n",
" ct_row.index = [f\"Global C{i}\" for i in ct_row.index]\n",
" ct_row.columns = [f\"Asset C{c}\" if c >= 0 else \"Not exposed\"\n",
" for c in ct_row.columns]\n",
" print(\"\\n% per global cluster (each row sums to 100%):\")\n",
" print(ct_row.to_string())\n",
"\n",
" ct_col = pd.crosstab(\n",
" dfc_cross[\"cluster_k4\"],\n",
" dfc_cross[cluster_col].fillna(-1).astype(int),\n",
" normalize=\"columns\"\n",
" ).round(3) * 100\n",
" ct_col.index = [f\"Global C{i}\" for i in ct_col.index]\n",
" ct_col.columns = [f\"Asset C{c}\" if c >= 0 else \"Not exposed\"\n",
" for c in ct_col.columns]\n",
" print(\"\\n% per asset cluster (each column sums to 100%):\")\n",
" print(ct_col.to_string())\n",
"\n",
"# Step 4. Adjusted Rand Index\n",
"from sklearn.metrics import adjusted_rand_score\n",
"\n",
"print(\"\\n\" + \"=\"*60)\n",
"print(\"Adjusted Rand Index — coherence between global and asset-type clusterings\")\n",
"print(\"=\"*60)\n",
"print(\"(1 = identical, 0 = independent, <0 = worse than random)\\n\")\n",
"\n",
"for asset, res in ASSET_RESULTS.items():\n",
" cluster_col = res[\"cluster_col\"]\n",
" if cluster_col not in dfc_cross.columns:\n",
" continue\n",
" mask = dfc_cross[cluster_col].notna()\n",
" labels_global = dfc_cross.loc[mask, \"cluster_k4\"].values\n",
" labels_asset = dfc_cross.loc[mask, cluster_col].values\n",
" ari = adjusted_rand_score(labels_global, labels_asset)\n",
" print(f\" {asset:20s} : ARI={ari:.4f} (n={mask.sum()} shared accounts)\")\n",
"\n",
"# Step 5. Multi-asset exposure by global cluster\n",
"print(\"\\n\" + \"=\"*60)\n",
"print(\"Multi-asset exposure by global cluster\")\n",
"print(\"=\"*60)\n",
"\n",
"asset_cols = [res[\"cluster_col\"] for res in ASSET_RESULTS.values()\n",
" if res[\"cluster_col\"] in dfc_cross.columns]\n",
"dfc_cross[\"n_asset_types\"] = dfc_cross[asset_cols].notna().sum(axis=1)\n",
"\n",
"print(\"\\nAverage number of asset types per global cluster:\")\n",
"print(dfc_cross.groupby(\"cluster_k4\")[\"n_asset_types\"].mean().round(2))\n",
"\n",
"print(\"\\nDistribution of asset type count per global cluster:\")\n",
"ct_multi = pd.crosstab(\n",
" dfc_cross[\"cluster_k4\"],\n",
" dfc_cross[\"n_asset_types\"],\n",
" normalize=\"index\"\n",
").round(3) * 100\n",
"ct_multi.index = [f\"Global C{i}\" for i in ct_multi.index]\n",
"ct_multi.columns = [f\"{c} asset type(s)\" for c in ct_multi.columns]\n",
"print(ct_multi.to_string())"
]
},
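The contingency tables in the cell above use two normalizations that answer different questions: `normalize="index"` shows how each global cluster splits across asset clusters (rows sum to 100%), while `normalize="columns"` shows which global clusters each asset cluster draws from (columns sum to 100%). The ARI is computed only on accounts that carry an asset label, and it does not depend on how clusters are numbered. A toy sketch with hypothetical labels for eight accounts (the column name `cluster_asset` is illustrative):

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels: one global cluster and one asset-type cluster per account;
# None marks an account not exposed to that asset type.
df = pd.DataFrame({
    "cluster_k4": [0, 0, 1, 1, 2, 2, 3, 3],
    "cluster_asset": [1, 1, 0, 0, 2, 2, None, None],
})
asset = df["cluster_asset"].fillna(-1).astype(int)  # -1 = "Not exposed"

# Row %: how each global cluster splits across asset clusters.
row_pct = pd.crosstab(df["cluster_k4"], asset, normalize="index") * 100
# Column %: which global clusters each asset cluster draws from.
col_pct = pd.crosstab(df["cluster_k4"], asset, normalize="columns") * 100

# ARI only on accounts with an asset label; it ignores label names,
# so a pure relabelling of the same partition scores 1.0.
mask = df["cluster_asset"].notna()
ari = adjusted_rand_score(df.loc[mask, "cluster_k4"], df.loc[mask, "cluster_asset"])

print(row_pct.round(1))
print(col_pct.round(1))
print(f"ARI = {ari:.4f}")
```

Because the exposed accounts here form the same partition under different cluster numbers, the ARI is exactly 1; values near 0, such as those printed above for the real data, point to partitions that largely disagree rather than to a numbering mismatch.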
{
"cell_type": "markdown",
"id": "16115b05",
"metadata": {},
"source": [
"---\n",
"## 6. Part 2 — Top 400 Accounts Clustering\n",
"\n",
"### Objective\n",
"Focus on the accounts with the highest AUM (> €5M as of October 2025), which together hold over 97% of total assets. On this restricted universe, the longer and denser time series support additional features, in particular **lagged correlations between flows and fund performance**, which cannot be estimated reliably on the sparse series of the full dataset.\n",
"\n",
"### Additional features (vs global clustering)\n",
"| Feature | Description |\n",
"|---|---|\n",
"| `corr_flow_fund_lag3` | Correlation between flow-to-AUM and fund return lagged 3 months |\n",
"| `corr_flow_fund_lag6` | Same at 6-month lag |\n",
"| `corr_flow_rate_lag3` | Correlation between flow-to-AUM and interest rate change lagged 3 months |\n",
"| `activity_intensity` | Number of transactions per month |\n",
"| `flow_to_aum_vol` | Volatility of the flow-to-AUM ratio |\n",
"\n",
"### Preprocessing\n",
"Identical to Part 1: MAD winsorization, clip p90 + log-transform for `gross_flow_to_aum` and `flow_freq`, RobustScaler.\n"
]
},
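The lagged-correlation features in the Part 2 table can be read as a per-account Pearson correlation between monthly flow-to-AUM and the fund return k months earlier. A minimal sketch, assuming both series are already aligned on a monthly index for one account; `lagged_corr` and the `min_obs=12` cut-off are hypothetical and not the notebook's own implementation. The cut-off is what makes such a feature unusable on short, sparse series: below it the function returns NaN.

```python
import numpy as np
import pandas as pd


def lagged_corr(flow_to_aum: pd.Series, fund_return: pd.Series,
                lag: int, min_obs: int = 12) -> float:
    """Pearson correlation between flow_to_aum[t] and fund_return[t - lag].

    Both series are assumed to share the same monthly index. Returns NaN
    when fewer than `min_obs` months overlap after lagging (assumed cut-off).
    """
    lagged = fund_return.shift(lag)  # row t now holds the return of month t - lag
    pair = pd.concat([flow_to_aum, lagged], axis=1).dropna()
    if len(pair) < min_obs:
        return np.nan
    return float(pair.iloc[:, 0].corr(pair.iloc[:, 1]))


# Synthetic account: flows respond to the fund return of 3 months earlier.
months = pd.date_range("2022-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
fund_ret = pd.Series(rng.normal(0.0, 0.03, size=36), index=months)
flow = 2.0 * fund_ret.shift(3)

print(round(lagged_corr(flow, fund_ret, lag=3), 6))            # perfect reaction at lag 3
print(lagged_corr(flow.iloc[:10], fund_ret.iloc[:10], lag=3))  # 7 overlapping months -> nan
```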
{
"cell_type": "code",
"execution_count": 21,
"id": "083087d6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selected accounts (AUM > €5M): 431\n",
"Accounts after quality filters: 427\n",
"Feature set: 13 features\n",
"['log_aum_qty_mean', 'flow_freq', 'gross_flow_to_aum', 'n_tx_total', 'n_isin_total', 'avg_holding_months_per_isin', 'exit_rate_per_isin', 'flow_direction_balance', 'aum_drawdown_last', 'months_since_last_tx', 'corr_flow_fund_lag3', 'corr_flow_fund_lag6', 'corr_flow_rate_lag3']\n",
"Accounts: 427 | Features: 13\n",
"Points > 5 std after scaling: 12 (2.8%)\n",
"Features with extreme values after scaling:\n",
"months_since_last_tx 12\n",
" k inertia silhouette davies_bouldin\n",
" 2 3292.213936 0.317621 1.447549\n",
" 3 2891.756531 0.158566 1.801947\n",
" 4 2599.086861 0.167596 1.749729\n",
" 5 2420.318207 0.166294 1.797919\n",
" 6 2302.475137 0.152136 1.803708\n",
" 7 2200.224213 0.148174 1.844575\n",
" 8 2127.271606 0.143511 1.938188\n",
" 9 2069.314390 0.118802 1.986200\n",
"10 1998.491936 0.113821 2.007113\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABdEAAAGMCAYAAAA1CuswAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAA+EdJREFUeJzs3XdYFFfbBvB7d+ldiohgA12QLqgIdiyxJoqJWLAbS+zR1xYTRY2aRE2ixm7sNVGJxhZNojEKNlCwK1aKSFF63Z3vDz42roCAArvg/buuXMnOnDlzntkNM/vszHNEgiAIICIiIiIiIiIiIiKiQsSqHgARERERERERERERkbpiEp2IiIiIiIiIiIiIqBhMohMRERERERERERERFYNJdCIiIiIiIiIiIiKiYjCJTkRERERERERERERUDCbRiYiIiIiIiIiIiIiKwSQ6EREREREREREREVExmEQnIiIiIiIiIiIiIioGk+hERERERERERERERMVgEp2IiIiICjlw4ADs7e0RFRVV6fv29fXFzJkzK32/RERERERERWESnYiIiEiNFCSvIyIilJanpqbi448/houLC/755x8Vja78hIaGYuXKlUhJSVH1UCpMVFQU7O3tS/VPZf9YcejQIdjb26NJkyZFro+MjMSIESPQpEkTNG/eHP/73/+QlJRUqJ1cLseGDRvg6+sLFxcX9OzZE7///nuZx/Ptt9/C3t4ekydPLvO2VUVmZiZWrlyJCxcuqHooRERERFRGGqoeABERERG9WVpaGoYPH447d+5g1apVaNOmjaqH9M7CwsKwatUq9O7dG0ZGRkrrjh8/DpFIpKKRlR9TU1N8++23Sss2b96MZ8+eYdasWYXaVpb09HR899130NPTK3L9s2fPMHDgQBgaGmLKlCnIyMjAzz//jLt37+KXX36BlpaWou3333+P9evXo2/fvnBxccGff/6JqVOnQiQSoXv37qUajyAIOHLkCKytrfH3338jLS0NBgYG5RKrOsnMzMSqVaswfvx4eHl5qXo4RERERFQGTKITERERqbG0tDSMGDECt27dwqpVq9C2bVtVD6nCvZqkrcr09PTw0UcfKS07evQoUlJSCi2vTGvWrIG+vj68vLzw559/Flq/du1aZGZm4sCBA6hduzYAwNXVFcOGDcPBgwfh7+8PAIiLi8PmzZsxcOBAfPXVVwCATz75BAEBAfj222/RpUsXSCSSEsdz4cIFPHv2DFu3bsXIkSNx8uRJ9O7duxwjJiIiIiJ6NyznQkRERKSm0tPTMXLkSNy4cQMrV65Eu3bt3tg+LS0NX3/9NXx9feHs7Axvb28MGzYMN27cUGp37do1jBgxAp6ennBzc0NAQACuXLlSqjGdOXMGAwYMgLu7O5o0aYJRo0bh3r17hdpFRkZi0qRJaNGiBVxdXfHBBx/g+++/BwCsXLlScYd2hw4dCpU0Kaom+tOnTzFx4kQ0b94cbm5u6Nu3L06fPq3U5sKFC7C3t8fRo0exZs0atGnTBi4uLhgyZAgeP35cqvhUITExEbNnz4aPjw9cXFzw4Ycf4uDBg0ptCkrDbNq0CVu2bEH79u3h6uqKgIAA3L17t9T7evToEbZs2YJZs2ZBQ6Po+2n++OMPtGvXTpFABwAfHx/Ur18fx44dUyw7deoUcnNzMWDAAMUykUiE/v3749mzZwgLCyvVmA4fPoyGDRuiRYsW8Pb2xuHDh4tsFxcXh9mzZ6NVq1ZwdnaGr68v5s6di5ycHEWblJQULFq0SPH/QJs2bTB9+nSlUjSlOd4Fn6XXS68UvA8HDhxQLJs5cyaaNGmCuLg4fPbZZ2jSpAlatGiBb775BjKZTLGdt7c3AGDVqlWKz/zKlSsBAPHx8Zg1axbatGkDZ2dntGrVCmPHjlXJnAREREREVBjvRCciIiJSQ5mZmfj0009x/fp1/Pjjj2jfvn2J28ydOxcnTpxAQEAA7Ozs8PLlS1y5cg
WRkZFwcnICAAQHB+PTTz+Fs7Mzxo8fD5FIhAMHDmDIkCHYtWsXXF1di+0/KCgIM2fORKtWrTBt2jRkZmZi9+7dGDBgAA4ePAgbGxsAwO3btzFw4EBoaGjA398f1tbWePLkCf766y9MmTIFnTp1wqNHj/D7779j1qxZqFGjBoDiS5okJCSgX79+yMzMxKBBg1CjRg0cPHgQY8eOxYoVK9CpUyel9hs2bIBIJMLw4cORlpaGjRs3Ytq0afjll19KdewrU1ZWFgYNGoQnT55g4MCBsLGxwfHjxzFz5kykpKRgyJAhSu2DgoKQnp6OAQMGIDs7G9u3b8eQIUNw+PBhmJubl7i/RYsWwcvLC23btlVKiBeIi4tDYmIinJ2dC61zdXVVqsd/69Yt6Onpwc7OrlC7gvVNmzZ943hycnLwxx9/YNiwYQCA7t27Y/bs2YiPj4eFhYXSuD7++GOkpqaib9++sLW1RVxcHE6cOIGsrCxoaWkhPT0dAwcORGRkJPr06QNHR0e8ePECf/31F+Li4mBqalrm411aMpkMI0aMgKurK6ZPn47g4GD8/PPPqFOnDgYMGABTU1PMmzcP8+bNQ6dOnRSfWXt7ewDAhAkTcP/+fQQEBMDa2hpJSUk4d+4cYmNjFf9fEREREZEKCURERESkNvbv3y9IpVKhffv2gpOTk3Dy5MlSb+vp6SkEBgYWu14ulwudO3cWhg8fLsjlcsXyzMxMwdfXVxg2bFihcTx9+lQQBEFIS0sTmjZtKsyZM0epz/j4eMHT01Np+cCBA4UmTZoI0dHRhfZfYOPGjUr9v6p9+/bCjBkzFK+//vprQSqVCpcuXVIsS0tLE3x9fYX27dsLMplMEARBCAkJEaRSqdC1a1chOztb0Xbr1q2CVCoV7ty5U+yxqSyjRo0S2rdvr3i9ZcsWQSqVCr/99ptiWU5OjuDv7y+4u7sLqampgiAIwtOnTwWpVCq4uroKz549U7S9du2aIJVKhUWLFpW477///ltwdHQU7t27JwiCIMyYMUNwd3dXahMeHi5IpVLh4MGDhbb/5ptvBKlUqji2o0aNEjp06FCoXUZGhiCVSoWlS5eWOKbjx48LUqlUePTokSAIgpCamiq4uLgImzdvVmo3ffp0wcHBQQgPDy/UR8Hn6scffxSkUqnwxx9/FNumtMe74LMUEhKi1E/B+7B//37FshkzZghSqVRYtWqVUttevXoJvXv3VrxOTEwUpFKpsGLFCqV2ycnJglQqFTZu3Fj0QSIiIiIilWM5FyIiIiI1lJCQAC0tLVhZWZV6GyMjI1y7dg1xcXFFrr916xYePXqEnj174sWLF0hKSkJSUhIyMjLg7e2NS5cuQS6XF7nt+fPnkZKSgu7duyu2S0pKglgshpubm6LsRVJSEi5duoQ+ffoolQMB8NaThZ45cwaurq5KdzXr6+vD398f0dHRuH//vlJ7Pz8/pbrqBds9ffr0rfZfkf755x9YWFigR48eimWampoYNGgQMjIycOnSJaX2HTt2hKWlpeK1q6sr3NzccObMmTfuJycnB4sXL0a/fv3QsGHDYttlZ2cDKLouvba2NoD8u+cL/l2adm9y+PBhODs7o169egAAAwMDtGvXTqmki1wux6lTp9C+fXu4uLgU6qPgc/XHH3/AwcGh0JMJr7Yp6/Eui/79+yu99vT0LFU5Fh0dHWhqauLixYtITk5+6/0TERERUcVhORciIiIiNTR//nwsXrwYI0eOxM6dO2Frawsgv2zEq/WdAcDY2BhaWlqYNm0aZs6ciXbt2sHJyQlt27ZFr169UKdOHQD59bABYMaMGcXuNzU1FcbGxoWWF2xbXLkLAwMDAP8lqqVSaemDLUFMTAzc3NwKLS84JjExMUr7ez15b2RkBCC/XnZxcnJy3jqBqampCRMTk7faNjo6GvXq1YNYrHxvS0GJlJiYGKXlBcnmV71eq7woW7ZswYsXLzBhwoQ3titIgL9aZ7xAQYJdR0dH8e
/StCtOSkoKzpw5g4CAAKWa9R4eHjhx4gQePnyIBg0aICkpCWlpaWjUqNEb+3vy5Ak6d+78xjZlPd6lpa2tXagckbG
"text/plain": [
"<Figure size 1500x400 with 3 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"K=2 | sil=0.3176 | db=1.4475\n",
" n_accounts pct\n",
"cluster_k2 \n",
"0 325 76.1\n",
"1 102 23.9\n",
"\n",
"K=4 | sil=0.1676 | db=1.7497\n",
" n_accounts pct\n",
"cluster_k4 \n",
"0 127 29.7\n",
"1 67 15.7\n",
"2 38 8.9\n",
"3 195 45.7\n",
"\n",
"K=5 | sil=0.1663 | db=1.7979\n",
" n_accounts pct\n",
"cluster_k5 \n",
"0 67 15.7\n",
"1 37 8.7\n",
"2 62 14.5\n",
"3 137 32.1\n",
"4 124 29.0\n",
"\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABYoAAAGGCAYAAADVZbXpAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XlcTPv/B/DXTBuKSMmePVtRKOtFssZFESFLIdn3XbL2RVxaEBHZ9+1asu/Zd7JLdi2otM+c3x9+zW00qdzmTsvr+XjM49Gc8zmn97zndJp5z2feRyQIggAiIiIiIiIiIiIiKrDEqg6AiIiIiIiIiIiIiFSLhWIiIiIiIiIiIiKiAo6FYiIiIiIiIiIiIqICjoViIiIiIiIiIiIiogKOhWIiIiIiIiIiIiKiAo6FYiIiIiIiIiIiIqICjoViIiIiIiIiIiIiogKOhWIiIiIiIiIiIiKiAo6FYiIiIiIiIiIiIqICjoViIiLKk6ysrDB16lRVh6ESU6dOhZWVlarDICIVOnLkCCwsLPD9+3dVh6Jye/fuhbGxMe7fv6/qUOj/bdu2Da1atUJSUpKqQyEiIqJsYKGYiIhylbCwMLi5uaFNmzYwMTGBubk5evfujY0bNyIhIeE/iSE+Ph7e3t64evXqf/L7UkVFRWH+/Pno0KEDTE1N0aRJE/To0QNLlizJ1cWg1atX4+TJk6oOI9ewsrKCsbFxpre9e/f+p3GFhYXBxMQkw4JadHQ0Zs2ahcaNG6N+/fpwdHTEw4cPFe7r1KlT6N69O0xMTNCqVSt4eXkhJSUlW/GcO3cOxsbGaN68OaRS6W89prxgy5YtOf5cSyQSeHt7o1+/ftDW1pYtt7KygouLS7rx+/fvR61ateDs7IzExMTf/r0fPnyAj48PevTogUaNGsHS0hKOjo64fPnyb+8zLzt37hy8vb1VHUauZGtri+TkZGzfvl3VoRAREVE2qKs6ACIiolRnz57FmDFjoKmpia5du6JGjRpITk7GzZs3sWTJEjx//hzz5s1Tehzx8fHw8fHByJEjYWlpqfTfBwBfv36FnZ0dYmNjYWdnhypVquDr16948uQJtm3bBgcHB1lBaN68eRAE4T+JKyv8/PzQvn17WFtbqzqUXGH69Olyhf3z58/j77//xrRp01CiRAnZcnNz8/80roULF0JdXV3hDD+pVIqhQ4fiyZMncHZ2RokSJbB161Y4Ojpi7969qFSpkmzsuXPnMGLECFhYWGDWrFl4+vQpVq1ahcjISMyZMyfL8Rw8eBDlypXDu3fvcOXKFTRt2jQnHmaus23bNpQoUQK2trY5ts8zZ87g1atX6NWrV6ZjDx48iGnTpqFp06ZYuXIltLS0fvv3njp1CmvXroW1tTW6d++OlJQUHDhwAIMGDcLChQthZ2f32/vOi86dO4ctW7Zg1KhRqg4l19HS0kK3bt2wYcMGODo6QiQSqTokIiIiygIWiomIKFd48+YNxo0bh7Jly2Ljxo0oVaqUbF3fvn3x+vVrnD17VnUB5oC4uDgUKVJE4brdu3fj/fv32LZtW7oCYmxsLDQ0NGT30/6cX0mlUiQnJ/+ropaq/Fwwj4iIwN9//w1ra2uUL19eJTFduHABFy9exODBg7Fq1ap0648dO4bbt29jxYoV6NChAwCgY8eOaN++Pby9vbF06VLZ2MWLF8PY2Bjr16+HuvqPl5La2trw8/ND//79UbVq1UzjiYuLw+nTpzF+/Hjs3bsXhw4dyreFYmXYs2cPzM3NYWho+Mtxhw8fxtSpU9G4ceN/XSQGAEtLS5w5cwZ6enqyZQ4ODujatSu8vLxyrFCcl//+86vExERoaGhALM76F1I7duwIf39/XLlyBU2aNFFidERERJRT2HqCiIhyBX9/f8TFxWHBggVyReJURkZGGDBgQIbbe3t7w9jYON3y1N6Vb9++lS27f/8+nJ2dYWlpCVNTU1hZWWHatGkAgLdv38re0P
r4+MjaBKT9evGLFy8wevRoWFhYwMTEBLa2tjh16pTC33vt2jW4u7ujSZMmaNmyZYbxh4WFQU1NDfXr10+3TkdHR65goqhH8ZcvXzBp0iSYm5ujYcOGmDJlCh4/fpyuxcHUqVNhZmaGT58+Yfjw4TAzM0Pjxo2xaNEiSCQSuX2uW7cOvXv3luXJ1tYWx44dkxtjbGyMuLg47Nu3T5ar1N7RGfVSVvRcGRsbY+7cuTh48CBsbGxgYmKCCxcuAAA+ffokmxFZt25d2NjYYPfu3en2u2nTJtjY2KBevXpo1KgRbG1tcejQIUXpVrmUlBT4+vrC2toadevWhZWVFZYtW5Zutm9qK4GLFy+ia9euMDExQadOnXD8+PEs/67k5GQsWLAA/fv3R8WKFRWOCQoKgr6+Ptq1aydbpqenh44dO+LUqVOyuJ4/f47nz5/D3t5eViQGgD59+kAQBAQFBWUpphMnTiAhIQEdOnSQPR5FLRESExPh7e2N9u3bw8TEBM2bN8fIkSMRFhYmGyOVSrFx40Z06dIFJiYmaNy4MZydneXaa2Q13z//raf6uSd66t/3zZs34eHhIWvXMWLECERFRclt9+zZM1y7dk329+Ho6Ajgx/Pi4+ODdu3awcTEBJaWlnBwcMClS5d+mbvExERcuHAh08L6kSNHMGnSJFhYWGDVqlU5UnStXr26XJEYADQ1NdGyZUt8/PgRsbGxv7XfX/39P3r0CIMHD4a5uTnMzMwwYMAA3LlzR+F+EhIS4ObmBktLS5ibm2Py5Mn49u1but+Vlec4s+dn6tSp2LJli2yfqbeMpJ73FN2y0m8/K+e3T58+Yfr06WjevLnsOJ89e7bccf7mzRvZ/6969erB3t4+3YewV69ehbGxMQ4fPoy//voLLVq0QL169WTP7927d+Hs7IwGDRqgXr166NevH27evJku5rp166J48eLp/j8SERFR7sUZxURElCucOXMGFSpUUPrX8SMjI2VfrR86dCiKFSuGt2/f4sSJEwB+FMfc3d3h7u6Otm3bom3btgAgKwA8e/YMDg4OMDQ0xJAhQ1CkSBEcPXoUI0aMgLe3t2x8qjlz5kBPTw8jRoxAXFxchnGVK1cOEokEBw4cQPfu3bP1mKRSKVxdXXHv3j04ODigSpUqOHXqFKZMmaJwvEQigbOzM0xNTTF58mQEBwdj/fr1qFChAvr06SMbFxgYCCsrK3Tp0gXJyck4fPgwxowZAz8/P7Rq1QrAj9mlM2fOhKmpKezt7QEgw2JkZq5cuYKjR4+ib9++KFGiBMqVK4eIiAjY29tDJBKhb9++0NPTw/nz5zFjxgzExsZi4MCBAICdO3di/vz5aN++Pfr374/ExEQ8efIEd+/eRZcuXX4rHmWaOXMm9u3bh/bt22PQoEG4d+8e/Pz88OLFC/j6+sqNDQ0Nxbhx49C7d290794de/bswZgxY+Dv749mzZpl+rs2btyI6OhoDB8+PMMCc0hICGrXrp1utqCJiQl27NiBV69ewdjYGI8ePZItT8vQ0BClS5dGSEhIlh7/oUOHYGlpCQMDA9jY2GDp0qU4ffo0OnbsKBsjkUjg4uKC4OBg2NjYoH///vj+/TsuXbqEp0+fyo6zGTNmYO/evfjjjz/Qo0cPSCQS3LhxA3fv3pXFmZ18Z8f8+fNRrFgxjBw5Eu/evcPGjRsxd+5cLF++HMCPNiTz5s1DkSJFMGzYMACAvr4+gB8fRPn5+aFnz54wNTVFbGwsHjx4gIcPH/7yeX3w4AGSk5NRu3btDMcEBQVh0qRJaNiwIVavXo1ChQqlG/Pt27d0Hw4pUrhwYRQuXPiXY8LDw7M07lcU/f0/e/YMffv2hba2NgYPHgx1dXXs2LEDjo6O2Lx5M+rVqye3j7lz58qej1evXmHbtm14//49Nm3alO3WB5k9P7169cLnz59x6dIlLF68ONP9tW3bNt258eHDh9i4cWO64vvPsnJ++/TpE3r06IGYmBjY29ujSp
Uq+PTpE4KCgpCQkABNTU1ERESgd+/eiI+Ph6OjI0qUKIF9+/bB1dUVXl5e6f5/rVy5EhoaGnB2dkZSUhI0NDQQHBy
"text/plain": [
"<Figure size 1600x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Cluster medians — K=2 ===\n",
" log_aum_qty_mean gross_flow_to_aum flow_freq n_tx_total n_isin_total avg_holding_months_per_isin exit_rate_per_isin flow_direction_balance aum_drawdown_last aum_final_to_peak months_since_last_tx corr_flow_fund_lag3 corr_flow_fund_lag6 corr_flow_rate_lag3\n",
"cluster_k2 \n",
"0 11.425 4.452 0.786 1872.0 32.0 48.347 0.637 0.073 1.00 0.00 0.0 0.054 0.050 -0.041\n",
"1 10.777 2.652 0.154 24.0 7.0 35.523 0.381 0.347 0.17 0.83 2.0 0.024 0.014 -0.025\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABX8AAAGGCAYAAAAjAPI0AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XVcVecfwPEPIAhIl4GAgoKoYHc7Z8zN2uzWObs2W7fZztoMDOyuGdjtcHbNFjsQMUjpEO7vD35cvbTs4kX2fb9e56Xn3Oec8zwP59b3Puf7aCkUCgVCCCGEEEIIIYQQQggh8hRtTVdACCGEEEIIIYQQQgghhPpJ8FcIIYQQQgghhBBCCCHyIAn+CiGEEEIIIYQQQgghRB4kwV8hhBBCCCGEEEIIIYTIgyT4K4QQQgghhBBCCCGEEHmQBH+FEEIIIYQQQgghhBAiD5LgrxBCCCGEEEIIIYQQQuRBEvwVQgghhBBCCCGEEEKIPEiCv0IIIYQQQgghhBBCCJEHSfBXCCGEABo2bMiYMWM0XQ2NGDNmDA0bNtR0NYQQGnTgwAGqVq1KZGSkpquicQsXLsTFxYXg4GBNV0X835w5c2jbtq2mqyGEEEJ8liT4K4QQIk/z9fXll19+4YsvvsDNzY2KFSvSoUMH1q5dS0xMzCepQ3R0NAsXLuTChQuf5HzJgoODmTp1Kk2bNsXd3Z0aNWrw3XffMXv27Fwd4Fm6dCnHjh3TdDVyjYYNG+Li4pLpsnPnzk9aL19fX9zc3HBxceHmzZupHg8LC+Pnn3+mevXqlC9fnq5du3L79u00j3X8+HFat26Nm5sb9evXZ8GCBbx79+6j6nPy5ElcXFyoXbs2iYmJ2WrT52Djxo1q/1snJCSwcOFCunTpQoECBZTbGzZsSN++fVOV9/LywtXVld69exMbG/uvzp3e9bxs2bJ/ddzP0d69e1mzZo2mq5Erde/enbt373L8+HFNV0UIIYT47OTTdAWEEEKInOLt7c3QoUPR09OjZcuWODs7Ex8fz5UrV5g9ezYPHz5kypQpOV6P6OhoPDw8GDRoENWqVcvx8wGEhoby7bffEhERwbfffoujoyOhoaHcu3ePzZs307FjR2WQZ8qUKSgUik9Sr6zw9PSkSZMmNGrUSNNVyRXGjRunEqz/+++/2bdvH2PHjsXc3Fy5vWLFip+0XtOnTydfvnzExcWleiwxMZEffviBe/fu0bt3b8zNzdm0aRNdu3Zl586dFCtWTFn25MmTDBw4kKpVq/Lzzz9z//59lixZQlBQEJMmTcpyffbs2YOtrS0vXrzg/Pnz1KxZUx3NzHU2b96Mubk5bdq0Udsx//rrL548eUL79u0zLbtnzx7Gjh1LzZo1Wbx4Mfnz5//X569VqxYtW7ZU2Va6dOl/fdzPzb59+3jw4AE9evTQdFVyHWtra7744gtWrVrFF198oenqCCGEEJ8VCf4KIYTIk54/f87w4cMpUqQIa9euxcbGRvlY586defbsGd7e3pqroBpERUVhaGiY5mPbt2/H39+fzZs3pwoKRkREoKurq1z/8P95VWJiIvHx8WoJVH1qKYPggYGB7Nu3j0aNGlG0aFGN1OnUqVOcPn2a77//niVLlqR6/NChQ1y9epX58+fTtGlTAJo1a0aTJk1YuHAhc+fOVZadNWsWLi4urFq1inz5kj6aFihQAE9PT7p164aTk1Om9YmKiuLEiRP8+OOP7Ny5k7179+bZ4G9O2LFjBxUrVqRgwYIZltu/fz9jxoyhevXqagv8AhQrVixV8Fed3r17R2JiInp6ejl2DvFxMnr/Sk+zZs0YOnQoz58/x87OLodqJoQQQuQ9kvZBCCFEnrRixQqioqKYNm2aSuA3mYODA927d093/+Scjynt3LkTFxcX/Pz8lNtu3rxJ7969qVatGu7u7jRs2JCxY8cC4OfnR40aNQDw8PBQ3tK8cOFC5f6PHj1iyJAhVK1aFTc3N9
q0aZPq1tbk8168eJGJEydSo0YN6tWrl279fX190dHRoXz58qkeMzIyUgnapJXzNyQkhJEjR1KxYkUqV67M6NGjuXv3bqr0AmPGjKFChQq8fv2aAQMGUKFCBapXr87MmTNJSEhQOebKlSvp0KGDsp/atGnDoUOHVMq4uLgQFRXFrl27lH2VnIs5vdzEaf2tXFxcmDx5Mnv27KF58+a4ublx6tQpAF6/fq0cuVi2bFmaN2/O9u3bUx13/fr1NG/enHLlylGlShXatGnD3r170+pujXv37h2LFi2iUaNGlC1bloYNG/L777+nGpWbfBv/6dOnadmyJW5ubnz11VccOXIky+eKj49n2rRpdOvWDXt7+zTLHD58GCsrKxo3bqzcZmFhQbNmzTh+/LiyXg8fPuThw4e0a9dOGfgF6NSpEwqFgsOHD2epTkePHiUmJoamTZsq25NWOoLY2FgWLlxIkyZNcHNzo3bt2gwaNAhfX19lmcTERNauXcs333yDm5sb1atXp3fv3iqpLbLa3ymf68lS5hhPfn5fuXKFGTNmKFNlDBw4UCXvbMOGDXnw4AEXL15UPj+6du0KJP1dPDw8aNy4MW5ublSrVo2OHTty5syZDPsuNjaWU6dOZRosP3DgACNHjqRq1aosWbJE7T+kxMTE/OsUEpD0muvi4sLKlStZs2YNjRo1ws3NjUePHgFw7tw5OnXqRPny5alcuTL9+/dXPpZSSEgIQ4cOpWLFilSrVo2pU6eq1DH5XGml4Uj5t4+IiGDatGk0bNiQsmXLUqNGDXr27KlMhdK1a1e8vb158eKF8m+bUS72MWPGpJsyI61r7kNZvVYePXrE0KFDqV69Ou7u7jRp0oQ//vhDpcydO3f4/vvvqVixIhUqVKB79+5cu3ZNpUxm718nT55U/k0qVKjADz/8wIMHD1LVO/kaldQPQgghxMeRkb9CCCHypL/++gs7O7scvxU+KChIeVv7Dz/8gImJCX5+fhw9ehRICnhNnDiRiRMn8uWXX/Lll18CKIOVDx48oGPHjhQsWJA+ffpgaGjIwYMHGThwIAsXLlSWTzZp0iQsLCwYOHAgUVFR6dbL1taWhIQEdu/eTevWrT+qTYmJifTv358bN27QsWNHHB0dOX78OKNHj06zfEJCAr1798bd3Z1Ro0Zx7tw5Vq1ahZ2dHZ06dVKWW7duHQ0bNuSbb74hPj6e/fv3M3ToUDw9Palfvz6QNAp0woQJuLu7065dO4B0A4yZOX/+PAcPHqRz586Ym5tja2tLYGAg7dq1Q0tLi86dO2NhYcHff//N+PHjiYiIUN5uvW3bNqZOnUqTJk3o1q0bsbGx3Lt3j+vXr/PNN99kqz45acKECezatYsmTZrQs2dPbty4gaenJ48ePWLRokUqZZ8+fcrw4cPp0KEDrVu3ZseOHQwdOpQVK1ZQq1atTM+1du1awsLCGDBgQLpBYx8fH0qXLo22tuo4Azc3N7Zu3cqTJ09wcXHhzp07yu0fKliwIIUKFcLHxydL7d+7dy/VqlXD2tqa5s2bM3fuXE6cOEGzZs2UZRISEujbty/nzp2jefPmdOvWjcjISM6cOcP9+/eV19n48ePZuXMndevW5bvvviMhIYHLly9z/fp1ZT0/pr8/xtSpUzExMWHQoEG8ePGCtWvXMnnyZObNmwckpQCZMmUKhoaG9OvXDwArKysg6cclT09P2rZti7u7OxEREdy6dYvbt29n+He9desW8fHxGaZZOHz4MCNHjqRy5cosXboUfX39VGXevn2b6geftBgYGGBgYKCybdeuXWzatAmFQoGTkxP9+/f/18+znTt3EhsbS7t27dDT08PU1JSzZ8/Sp08fihYtyqBBg4iJiWHDhg107NiRnTt3phpJP2zYMGxtbfnpp5+4du0a69evJywsjFmzZn10fX799VcOHz5Mly5dcHJyIjQ0lCtXrvDo0SPKlClDv379CA8P59WrV8ofDz/Mv5xS+/btlT8sJj
t16hR79+7FwsIiw7pk5Vq5e/cunTt3Jl++fLRv3x5bW1t8fX05ceIEw4cPB5Levzp37kyBAgX4/vvvyZcvH1u3bqV
"text/plain": [
"<Figure size 1600x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Cluster medians — K=5 ===\n",
" log_aum_qty_mean gross_flow_to_aum flow_freq n_tx_total n_isin_total avg_holding_months_per_isin exit_rate_per_isin flow_direction_balance aum_drawdown_last aum_final_to_peak months_since_last_tx corr_flow_fund_lag3 corr_flow_fund_lag6 corr_flow_rate_lag3\n",
"cluster_k5 \n",
"0 10.975 1.488 0.557 819.0 25.0 52.905 0.567 -0.487 1.000 0.000 0.0 0.002 0.016 -0.024\n",
"1 11.174 1.389 0.043 4.0 2.0 42.429 0.250 0.557 0.303 0.697 19.0 -0.000 -0.007 -0.012\n",
"2 10.357 4.383 0.372 90.5 12.5 32.149 0.434 0.287 0.077 0.923 1.0 0.042 0.025 -0.034\n",
"3 11.045 5.471 0.777 1448.0 24.0 40.857 0.688 0.245 1.000 0.000 0.0 0.009 -0.008 0.005\n",
"4 11.994 5.155 0.926 4935.5 47.5 57.100 0.620 0.037 1.000 0.000 0.0 0.158 0.130 -0.140\n",
"seg_2D\n",
"Highly active 137\n",
"Dormant 136\n",
"Small rebalancers 77\n",
"Occasional large movers 77\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAHqCAYAAACZcdjsAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAA3hxJREFUeJzs3Xd8E/X/B/DXJW1pC7SlLS0bhC5GS8seQqGATGUJOABBVhEQRRAR/SmKlq/gYMkQZAmIgIAoQ7YoZRTK3rPM7gFt6Uju90dIyLikd1l3Sd7Px8OH9HK5e+dG8nnfZzEsy7IghBBCCCGEEAvIxA6AEEIIIYQQ4vgosSCEEEIIIYRYjBILQgghhBBCiMUosSCEEEIIIYRYjBILQgghhBBCiMUosSCEEEIIIYRYjBILQgghhBBCiMUosSCEEEIIIYRYjBILQgghhBBCiMUosSCEmOXevXsIDw/H8uXL7b7v+fPnIzw83O77VRsyZAiGDBki2v6JIbGvCUIIIYCb2AEQ4mrOnj2LrVu34tixY7h//z78/PzQuHFjvPfee3jhhRd01h0yZAiOHz8OAGAYBt7e3qhcuTKioqLQp08ftG3bltc+P/roI2zZskXzt1wuR+XKldGkSROMGzcOISEh1vuARKO4uBjr16/Hli1bkJKSAplMhuDgYDRp0gTDhg1DvXr1xA7RrrZv347MzEwMGzbMrPcXFhZi2bJlaNGiBVq2bGnd4CRE+743Zfz48ZgwYYIdIlLJy8tD165dkZWVhblz56Jbt246rxcXF2Pu3LnYtm0b8vLyEB4ejvfee4/ze+rUqVOYPXs2Ll68iAoVKqB79+54//33Ub58ed7x3LhxAz169ICHhwf+++8/+Pj4WPwZpcjS+4YQe6LEghA7W7ZsGU6dOoVu3bohPDwc6enpWLt2Lfr164cNGzYgLCxMZ/0qVapg0qRJAFQFqzt37mDPnj34448/0L17d8yePRvu7u5l7tfDwwMzZ84EACgUCqSkpODXX3/F4cOH8ddffyE4ONj6H9ZGxo4di9GjR4sdRpneffdd/PPPP+jZsycGDBiA0tJS3Lx5EwcPHkRMTIzLJRZ//vknrl27ZlFisWDBAowfP94gsXCUa4KP+Ph4vPrqq5q/z507hzVr1iA+Ph5169bVLLd3Dc28efPw9OlTo69/9NFH2L17N4YOHYo6depgy5YtGD16NFatWoVmzZpp1rt06ZImsf7oo4/w6NEj/Pzzz7h9+zaWLVvGO54//vgDlStXRm5uLnbv3o0BAwZY9PmkytL7hhB7osSCEDsbNmwY5syZAw8PD82yHj164OWXX8bSpUsxZ84cnfUrVqyI3r176yybPHkyZs6ciXXr1qF69eqYMmVKmft1c3Mz2E50dDTGjBmDQ4cOYeDAgRZ8Kvtyc3ODm5t1vr5YlkVRURE8PT2tsj21s2fP4sCBA3j//fcRHx+v85pCoUBeXp5V9+fqrHlNiE3/CX+5cuWwZs0atGnTRrSamqtXr2L9+vV45513MG/ePIPXz549i7/++gsffvghRowYAQDo06cPevXqhTlz5uDXX3/VrPvdd9/Bx8cHa9asQYUKFQAANWrUwCeffIJ///0XL774YpnxsCyL7du3o1evXrh37x7++OMPp00sCHEk1MeCEDtr0qSJTlIBAHXq1EFoaChu3rzJaxtyuRyffPIJQkJCsHbtWjx+/NisWAIDAzXb05aXl4evvvoKsbGxaNSoEbp06YKlS5dCqVRybmfDhg3o3LkzGjVqhP79++Ps2bM6r1++fBkfffQROnXqhMjISLRt2xbTpk1Ddna2Zp1du3YhPDycswnIr7/+ivDwcFy9ehUAd3v60tJSLFy4UBNHXFwcvvvuOxQXF+usFxcXhzFjxuDw4cPo168foqKiNIWezZs3Y+jQoWjdujUaNWqEHj16YN26dXwOpYG7d+8CUJ1vfXK5HJUqVdJZlpqaimnTpq
FNmzZo1KgRevbsiU2bNhm89/79+4iPj0d0dDRat26Nr7/+GocPH0Z4eDiOHTumWW/IkCHo1asXLl++jMGDB6Nx48bo0qULdu3aBQA4fvw4BgwYgKioKHTt2hVHjhwx2BefmI4dO4bw8HDs2LEDixYtQvv27REZGYm33noLd+7c0Ynn4MGDuH//PsLDwxEeHo64uDgAz5vQ9OvXD02bNkV0dDTeeOMNHD16VPP+e/fuoXXr1gCABQsWaLYxf/58ANa5JpKSkvDqq68iMjISnTp1wtatWw2OiZSsXbsWPXv2RKNGjfDiiy9ixowZBgmr+jo4f/48XnvtNURFRSEuLg7r168XtK+vvvoKnTt31ql50LZr1y7I5XIMGjRIs6xcuXJ49dVXkZycjIcPHwIAnjx5giNHjuCVV17RJBUA0Lt3b3h7e2Pnzp284jl58iTu37+PHj16oEePHkhKSsKjR48M1lMqlVi1ahVefvllREZGolWrVhgxYgTOnTuns962bdvw6quvonHjxmjevDnefPNN/Pvvvzrr8DnecXFx+Oijjwzi0O+XZY37BgDWrFmDnj17auLu168ftm/fzusYEmILzvF4hxAHx7IsMjIyEBoayvs9crkcPXv2xNy5c3Hy5El06NChzPdkZWUBUP3Y3r17F3PmzIGfnx86duyoWaewsBCDBw9GamoqXnvtNVStWhXJycn47rvvkJ6ejunTp+ts888//0R+fj4GDRoEhmGwbNkyTJgwAXv37tU00Tpy5Aju3r2Lfv36oXLlyrh27Rp+++03XL9+Hb/99hsYhkGHDh00BYsWLVro7GPHjh0IDQ01aCam7ZNPPsGWLVvQtWtXDB8+HGfPnsWSJUtw48YNLFy4UGfdW7du4YMPPsCgQYMwcOBATd+W9evXIzQ0FHFxcXBzc8OBAwcwY8YMsCyLN998s8zjq61atWoAVO2jmzRpYvJpekZGBgYOHAiGYfDmm2/C398f//zzD6ZPn44nT55omkAUFBTgrbfeQnp6OoYOHYrAwED8+eefOgmFttzcXMTHx6NHjx7o1q0b1q9fj0mTJkGpVOLrr7/Ga6+9hl69emH58uV49913cfDgQU1hj29Maj/99BMYhsHbb7+NJ0+eYNmyZZg8eTI2btwIQNW85/Hjx3j06BGmTZsGAJr29E+ePMHGjRvRq1cvDBgwAPn5+di0aRNGjhyJjRs3on79+vD398fnn3+Ozz//HF26dEGXLl0AmG4OJOSauHPnDiZOnIhXX30Vffv2xebNm/HRRx+hYcOGgu5Le5k/fz4WLFiANm3a4PXXX8etW7ewfv16nDt3DuvXr9dpHpmbm4vRo0eje/fu6NmzJ3bu3InPP/8c7u7uOk2ujNm5cyeSk5OxY8cO3L9/n3OdS5cuoU6dOjrJAgBERUVpXq9atSquXLmC0tJSNGrUSGc9Dw8P1K9fH5cuXeL1+bdv345atWohKioKYWFh8PT0xJ9//omRI0fqrDd9+nT8/vvvaN++PV599VUoFAokJSXhzJkziIyMBKBKVOfPn4+YmBi8++67cHd3x5kzZ3D06FFN7YmQ4y2EJffNb7/9hpkzZ6Jr164YOnQoioqKcOXKFZw5cwYvv/yyWfEQYjGWECK6rVu3smFhYezGjRt1lg8ePJjt2bOn0fft2bOHDQsLY1etWmVy+1OnTmXDwsIM/mvXrh17/vx5nXUXLlzIRkdHs7du3dJZPmfOHLZ+/frsgwcPWJZl2bt377JhYWFsixYt2JycHM16e/fuZcPCwtj9+/drlhUWFhrE9Oeff7JhYWHsiRMnNMsmTZrEtm7dmi0tLdUsS0tLYyMiItgFCxZols2bN48NCwvT/H3p0iU2LCyMnT59us4+Zs2axYaFhbGJiYmaZR07dmTDwsLYf/75xyAmrjjffvtttlOnTjrLBg8ezA4ePNhgXW1KpZIdPHgwGxYWxrZp04adNGkS+8svv7D37983WPfjjz9m27Zty2
ZlZeksf//999mmTZtq4vr555/ZsLAwds+ePZp1nj59ynbr1o0NCwtjjx49qhNjWFgYu337ds2yGzdusGFhYWxERAR
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Overall churn rates ===\n",
"churn_hard 0.684\n",
"churn_soft 0.775\n",
"churn_warning 0.321\n",
"dtype: float64\n",
"\n",
"=== Churn rates by cluster — K=2 ===\n",
" n_accounts churn_hard churn_soft churn_warning\n",
"cluster_k2 \n",
"0 325 0.883 0.969 0.385\n",
"1 102 0.049 0.157 0.118\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAGGCAYAAADmRxfNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAUBxJREFUeJzt3Xl8TGf///F3MhFEKkgItUuJJUGspVSJllprp0LtS6ml1FJqK419S1u72Ktp7beg1ZZqhdutlKpSu6CxxFKSSEzm94df5mtkkWSSTMLr+Xh43M051znXZ+Y+mZz3nOs6x85kMpkEAAAAAFawt3UBAAAAALI+ggUAAAAAqxEsAAAAAFiNYAEAAADAagQLAAAAAFYjWAAAAACwGsECAAAAgNUIFgAAAACsRrAAAAAAYDWCBWBjnp6emjRpkq3LeCEFBATI09NT4eHhti7F7ODBg/L09NTBgwdtXQrwwuvdu7fGjh1r6zISFBMTo3r16mnt2rW2LgUwI1gA6eTSpUsaN26cfH195e3trSpVqqhjx45auXKloqKibF1eqsSdiMf9q1Chgho0aKDJkyfr3r17qdpnWFiYAgICdPLkyTSuNvP5/vvv1atXL9WsWVNeXl6qU6eOBg8erJCQkAyr4bffflNAQECq///KDOLCV3L+ZbQFCxbI09NTzZo1S3D9b7/9pk6dOqlSpUp67bXXNHnyZD148CBeu+joaM2YMUN16tRRxYoV1a5dO/36668prmfw4MHy9PTUjBkzUrxtVpFenyGHDx/Wr7/+qt69e5uXxR17O3futGgbHR2tvn37qmzZsvr222+t6jckJESjR49Wo0aNVKlSJfn6+mrMmDG6fv26Rbts2bKpe/fuWrhwoR4+fGhVn0BacbB1AcDzaM+ePRo8eLAcHR3VsmVLlSlTRjExMTp8+LBmzJihM2fO6NNPP7V1mak2YcIEOTk5KTIyUiEhIVq9erVOnDihr776KsX7un79uj7//HMVLlxY5cqVS4dqbc9kMunjjz/Wxo0bVb58eXXv3l1ubm66ceOGvv/+e3Xr1k1fffWVqlSpku61HDlyRJ9//rlatWql3Llzp3t/6cHDw0PTp0+3WDZ79mw5OTmpX79+NqpK+ueff7Ro0SI5OTkluP7kyZPq1q2bPDw8NGrUKP3zzz9avny5Lly4oKVLl1q0HTVqlHbt2qWuXbuqRIkS2rRpk/r06aOVK1eqWrVqyarn/v37+umnn1S4cGFt375dw4cPl52dndWvM7NJr8+QZcuWqVatWipevHiS7WJiYjRo0CDt3btXn376qdq2bWtVvzNmzNDdu3fVuHFjlShRQpcvX9aaNWu0Z88ebd68Wfnz5ze3bd26tWbOnKlt27ZZ3S+QFggWQBq7fPmyhg4dqpdfflkrV65UgQIFzOs6d+6sixcvas+ePRlaU2xsrGJiYpQ9e/Y02V+jRo2UL18+SVLHjh01dOhQBQcH69ixY6pYsWKa9PE8Wb58uTZu3Kj33ntPo0ePtji569+/vzZv3iwHh6z9cRwZGamcOXNmSF9ubm5q2bKlxbIlS5Yob9688ZZnpGnTpqlSpUqKjY3V7du3462fPXu2cufOrdWrV8vZ2VmSVKRIEY0dO1a//PKL6tSpI0k6duyYtm/frhEjRqhnz56SpHfeeUfNmjXTzJkztX79+mTVs2vXLsXGxuqzzz7Te++9p0OHDqlGjRpp9Gqfb7du3dLevXs1YcKEJNvFxMRoyJAh2rNnjyZNmqR27dpZ3ffo0aNVtWpV2dv/36CSunXrys/PT2vWrNHQoUPNy3Pnzq06depo06ZNBAtkCgyFAtLY0qVLFRERoSlTpliEijjFixfXe++9F2/57t271axZM3l5ealp06b6+eefLdaPGjVKDRo0iLdd3PCkJ8XN29i6dauaNm0qb29v7du3Txs3bpSnp6cOHz4sf39/vfrqq6pcubIGDBhg1TyDuG9QL1
26ZF52584dTZs2Tc2bN5ePj4+qVKmiXr166a+//jK3OXjwoPmP4ejRo83DVzZu3Ghu8/vvv6tnz56qWrWqKlWqJD8/Px0+fNii//v372vKlClq0KCBvLy8VKtWLXXv3l0nTpxIVv23b9/W4MGDVaVKFdWsWVOTJ0+2GFrg5+enFi1aJLhto0aNzCd/CYmKitLixYtVqlQpjRw5MsFvjN95550kA1mDBg00atSoeMu7dOmiLl26WCxbvXq1mjZtqkqVKql69epq3bq1tm3bJunxsRL3Tb+vr6/5/Q4NDTVvv2XLFrVu3VoVK1ZUjRo1NHToUF27di1ev82aNdMff/yhzp07q1KlSpo9e3ai9dvK5cuXNWjQINWoUUOVKlVS+/bt44X6uKEtwcHBmj17tl577TVVrlxZ/fr1i/e6k3Lo0CHt2rVLH3/8cYLr79+/r/3796tFixbmUCFJLVu2lJOTk3bs2GFetnPnThkMBnXo0MG8LHv27Grbtq2OHDmS7Lq2bdum2rVr69VXX5WHh4f5OHja2bNnNXjwYL366quqWLGiGjVqpDlz5li0CQsL08cff6w6derIy8tLDRo00Pjx4xUdHW1uk5z3O+4z6MljTkp4blHccXbmzBl16dJFlSpVUt26dbVkyRKL7ZL6DLlw4YI++OADvfbaa/L29tbrr7+uoUOH6t9//03yvduzZ48ePXqk2rVrJ9rm0aNH+vDDD/XDDz9owoQJat++fZL7TK7q1atbhIq4ZXny5NG5c+fita9du7YOHz6sO3fupEn/gDWy9ldkQCb0008/qWjRoika1nL48GF99913evfdd5UrVy6tXr1agwYN0k8//aS8efOmqo4DBw5ox44d6ty5s/LmzavChQubx9VPnjxZuXPn1sCBA3XlyhWtXLlSkyZN0ty5c1PVV9xJwpNDay5fvqzdu3ercePGKlKkiG7evKmvv/5afn5+2r59u9zd3eXh4aFBgwZp/vz56tChg6pWrSpJ5vcuJCREvXv3lpeXlwYOHCg7OzvzN//r1q0zn4yPHz9eu3btkp+fnzw8PHTnzh0dPnxYZ8+eVYUKFZ5Z/5AhQ1S4cGENGzZMR48e1erVq3Xv3j3zSXjLli01duxYnT59WmXKlDFvd+zYMV24cEH9+/dPdN9xf/C7du0qg8GQwnc2ZYKCgjR58mQ1atRIXbt21cOHD3Xq1Cn9/vvvat68ud58801duHBB//nPfzR69GjzsRV39WnBggWaN2+e3n77bbVt21bh4eFas2aNOnfurM2bN1v8/3vnzh317t1bTZs2VYsWLeTq6pqury2lbt68qY4dOyoyMlJdunRR3rx5tWnTJvXv31/z58/Xm2++adF+wYIFsrOzU+/evXXr1i2tXLlS3bp105YtW5QjR44k+zIajeYhMInN6zh16pQePXokLy8vi+WOjo4qV66cxfyAkydPqkSJEhYBRJL5eD958qQKFSqUZE1hYWE6ePCgpk6dKklq2rSpVq5cqU8++USOjo7mdn/99Zc6d+4sBwcHdejQQYULF9alS5f0448/mr8ZDwsLU9u2bfXvv/+qffv2KlWqlMLCwrRr1y5FRUXJ0dExxe93ct29e1e9evXSm2++qbffflu7du3SzJkzVaZMGdWrVy/Jz5Do6Gj17NlT0dHR8vPzk5ubm8LCwrRnzx7du3dPL730UqL9HjlyRHny5FHhwoUTXG80GvXhhx/q+++/17hx49SxY8d4bWJiYp4ZYOLkyZMnXph40oMHD/TgwYME/x5UqFBBJpNJR44cUf369ZPVH5BeCBZAGrp//77CwsLk6+ubou3Onj2r4OBgFStWTJJUs2ZNtWzZUtu3b5efn1+qajl//ry2bdumV155xbws7uQlT548Wr58ufnb89jYWK1evVr//vtvkn9s49y9e1fS4+EvBw4c0Lp165QvXz5Vr17d3MbT01O7du2y+GPZsmVLvf322/r22281YMAAubm56fXXX9
f8+fNVuXJli2EsJpNJEyZMUM2aNbV06VJzrR07dlTTpk01d+5cLV++XJK0d+9etW/f3uJb/ScnXD5LkSJFtGDBAkm
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Churn rates by cluster — K=5 ===\n",
" n_accounts churn_hard churn_soft churn_warning\n",
"cluster_k5 \n",
"0 67 0.612 0.925 0.955\n",
"1 37 0.108 0.297 0.108\n",
"2 62 0.000 0.032 0.048\n",
"3 137 0.964 0.978 0.117\n",
"4 124 0.927 0.984 0.403\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAGGCAYAAADmRxfNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAVPZJREFUeJzt3XlYVOX///EXjCAg7rjlLgku4JZ7mqWW5poLLrnkWprmkqWYfdwNcynFzF3cM8olza2sNEs0M0szU3NHTVHMUkBwZn5/+GO+jiwCBxjQ5+O6vC7nzDn3ec/hzMx5zbnvc5ysVqtVAAAAAGCAs6MLAAAAAJD9ESwAAAAAGEawAAAAAGAYwQIAAACAYQQLAAAAAIYRLAAAAAAYRrAAAAAAYBjBAgAAAIBhBAsAAAAAhhEsAAfz9fXVxIkTHV3GY2nOnDny9fVVZGSko0ux2b9/v3x9fbV//35HlwI89vr37693333X0WUk6saNG6pWrZp2797t6FIAG4IFkEHOnz+vsWPHqkmTJvL391eNGjXUpUsXLV++XDExMY4uL03iD8Tj/1WuXFmNGzfW5MmT9e+//6apzStXrmjOnDk6duxYOleb9Xz99dfq16+f6tSpIz8/PzVo0EBDhw5VWFhYptXwyy+/aM6cOWn+e2UF8eErJf8y27x58+Tr66tWrVol+vwvv/yirl27qmrVqnr66ac1efJk3b59O8F8sbGxmj59uho0aKAqVaooICBAP/74Y6rrGTp0qHx9fTV9+vRUL5tdZNRnyMGDB/Xjjz+qf//+tmnx+9727dvt5o2NjdVrr72mChUq6PPPPze03vXr1ye5P0dERNjmy58/vzp27KjZs2cbWh+QnnI4ugDgUbRr1y4NHTpUrq6uatu2rXx8fBQXF6eDBw9q+vTp+uuvvzRp0iRHl5lm48ePl4eHh6KjoxUWFqaVK1fq6NGj+uSTT1Ld1tWrV/XRRx+pePHiqlixYgZU63hWq1XvvPOO1q9fr0qVKql3797y8vJSRESEvv76a/Xq1UuffPKJatSokeG1HDp0SB999JHatWunPHnyZPj6MoK3t7emTZtmN+2DDz6Qh4eHBgwY4KCqpL///lsLFiyQh4dHos8fO3ZMvXr1kre3twIDA/X3339r6dKlOnv2rBYvXmw3b2BgoHbs2KGePXuqTJky2rBhg1599VUtX75cNWvWTFE9t27d0nfffafixYtry5Yteuutt+Tk5GT4dWY1GfUZsmTJEtWrV0+lS5dOdr64uDgNGTJEu3fv1qRJk9SxY8d0Wf+QIUNUokQJu2kPvme7du2qlStXKiwsTPXq1UuX9QJGECyAdHbhwgUNHz5cTzzxhJYvX67ChQvbnuvWrZvOnTunXbt2ZWpNFotFcXFxypkzZ7q016xZMxUoUECS1KVLFw0fPlxbt27V4cOHVaVKlXRZx6Nk6dKlWr9+vV555RWNHj3a7uBu4MCB2rhxo3LkyN4fx9HR0XJ3d8+UdXl5ealt27Z20xYtWqT8+fMnmJ6Z3n//fVWtWlUWi0U3btxI8PwHH3ygPHnyaOXKlfL09JQklShRQu+++65++OEHNWjQQJJ0+PBhbdmyRSNHjlTfvn0lSS+99JJatWqlGTNmaO3atSmqZ8eOHbJYLHrvvff0yiuv6MCBA6pdu3Y6vdpH2/Xr17V7926NHz8+2fni4uI0bNgw7dq1SxMnTlRAQEC61fDMM8/I398/2Xm8vb3l4+OjDRs2ECyQJdAVCkhnixcvVlRUlKZMmWIXKuKVLl1ar7zySoLpO3fuVKtWreTn56eWLVvq+++/t3s+MDBQjRs3TrBcfPek+8WP29i0aZNatmwpf39/7dmzx3aK/eDBgwoKClLdunVVrVo1DRo0yNA4g/hfUM+fP2+b9s8//+j9999X69atVb16ddWoUUP9+vXTn3/+aZtn//79tl/3Ro8ebTvdv379ets8v/32m/r27aunnn
pKVatWVffu3XXw4EG79d+6dUtTpkxR48aN5efnp3r16ql37946evRoiuq/ceOGhg4dqho1aqhOnTqaPHmy7ty5Y3u+e/fuatOmTaLLNmvWzHbwl5iYmBgtXLhQ5cqV06hRoxL9xfill15KNpA1btxYgYGBCab36NFDPXr0sJu2cuVKtWzZUlWrVlWtWrXUvn17bd68WdK9fSX+l/4mTZrYtnd4eLht+S+++ELt27dXlSpVVLt2bQ0fPlyXL19OsN5WrVrp999/V7du3VS1alV98MEHSdbvKBcuXNCQIUNUu3ZtVa1aVZ06dUoQ6uO7tmzdulUffPCBnn76aVWrVk0DBgxI8LqTc+DAAe3YsUPvvPNOos/funVLe/fuVZs2bWyhQpLatm0rDw8Pbdu2zTZt+/btMplM6ty5s21azpw51bFjRx06dCjFdW3evFn169dX3bp15e3tbdsPHnTq1CkNHTpUdevWVZUqVdSsWTN9+OGHdvNcuXJF77zzjho0aCA/Pz81btxY48aNU2xsrG2elGzv+M+g+/c5KfGxRfH72V9//aUePXqoatWqatiwoRYtWmS3XHKfIWfPntUbb7yhp59+Wv7+/nrmmWc0fPhw/ffff8luu127dunu3buqX79+kvPcvXtXb775pr755huNHz9enTp1SrbNtLh165bMZnOy89SvX1/fffedrFZruq8fSK3s/RMZkAV99913KlmyZKq6tRw8eFBfffWVXn75ZeXKlUsrV67UkCFD9N133yl//vxpqmPfvn3atm2bunXrpvz586t48eK2fvWTJ09Wnjx5NHjwYF28eFHLly/XxIkTNWvWrDStK/4g4f7T9BcuXNDOnTvVvHlzlShRQteuXdOnn36q7t27a8uWLSpSpIi8vb01ZMgQBQcHq3PnznrqqackybbtwsLC1L9/f/n5+Wnw4MFycnKy/fK/Zs0a28H4uHHjtGPHDnXv3l3e3t76559/dPDgQZ06dUqVK1d+aP3Dhg1T8eLFNWLECP36669auXKl/v33X9tBeNu2bfXuu+/qxIkT8vHxsS13+PBhnT17VgMHDkyy7YMHD+qff/5Rz549ZTKZUrllUyc0NFSTJ09Ws2bN1LNnT925c0fHjx/Xb7/9ptatW+v555/X2bNn9eWXX2r06NG2fSv+7NO8efM0e/Zsvfjii+rYsaMiIyO1atUqdevWTRs3brT7+/7zzz/q37+/WrZsqTZt2qhgwYIZ+tpS69q1a+rSpYuio6PVo0cP5c+fXxs2bNDAgQMVHBys559/3m7+efPmycnJSf3799f169e1fPly9erVS1988YXc3NySXZfZbLZ1gUlqXMfx48d19+5d+fn52U13dXVVxYoV7cYHHDt2TGXKlLELIJJs+/uxY8dUrFixZGu6cuWK9u/fr6lTp0qSWrZsqeXLl+t///ufXF1dbfP9+eef6tatm3LkyKHOnTurePHiOn/+vL799lsNHz7c1lbHjh3133//qVOnTipXrpyuXLmiHTt2KCYmRq6urqne3il18+ZN9evXT88//7xefPFF7dixQzNmzJCPj48aNWqU7GdIbGys+vbtq9jYWHXv3l1eXl66cuWKdu3apX///Ve5c+dOcr2HDh1Svnz5VLx48USfN5vNevPNN/X1119r7Nix6tKlS4J54uLiHhpg4uXLl0/Ozva/9fbs2VNRUVFycXFRgwYNFBgYqDJlyiRYtnLlylq2bJlOnjxp9/kEOALBAkhHt27d0pUrV9SkSZNULXfq1Clt3bpVpUqVkiTVqVNHbdu21ZYtW9S9e/c01XLmzBlt3rxZTz75pG1a/MFLvnz5tHTpUtuv5xaLRStXrtR///2X7JdtvJs3b0q61/1l3759WrNmjQoUKKBatWrZ5vH19dWOHTvsvizbtm2rF198UZ9//rkGDRokLy8vPfPMMwoODla1atXsurFYrVaNHz9ederU0eLFi221dunSRS1bttSsWbO0dOlSSd
Lu3bvVqVMnu1/17x9w+TAlSpTQvHnzJN3rrubp6ak1a9aoT58+qlChgpo3b65JkyZp06ZNeuutt2zLbdq0SR4eHnr
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===== Cluster profile — K=2 =====\n",
" n_accounts aum_mean_med freq_med gross_flow_to_aum_med n_tx_med holding_med exit_rate_med flow_dir_med drawdown_med months_inactive_med corr_fund_lag3_med corr_rate_lag3_med\n",
"cluster_k2 \n",
"0 325 91586.099 0.786 4.452 1872.0 48.347 0.637 0.073 1.00 0.0 0.054 -0.041\n",
"1 102 47913.297 0.154 2.652 24.0 35.523 0.381 0.347 0.17 2.0 0.024 -0.025\n",
"\n",
"===== Cluster profile — K=5 =====\n",
" n_accounts aum_mean_med freq_med gross_flow_to_aum_med n_tx_med holding_med exit_rate_med flow_dir_med drawdown_med months_inactive_med corr_fund_lag3_med corr_rate_lag3_med\n",
"cluster_k5 \n",
"3 137 62616.679 0.777 5.471 1448.0 40.857 0.688 0.245 1.000 0.0 0.009 0.005\n",
"4 124 161746.356 0.926 5.155 4935.5 57.100 0.620 0.037 1.000 0.0 0.158 -0.140\n",
"0 67 58391.143 0.557 1.488 819.0 52.905 0.567 -0.487 1.000 0.0 0.002 -0.024\n",
"2 62 31466.909 0.372 4.383 90.5 32.149 0.434 0.287 0.077 1.0 0.042 -0.034\n",
"1 37 71234.484 0.043 1.389 4.0 42.429 0.250 0.557 0.303 19.0 -0.000 -0.012\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAuEAAAKyCAYAAAB7WgDLAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzsvXecZUWZPv7UyTffvp17untyACY4IAxJUIKKoCxZlMEALKyk1d0V+MpPQREUWZRkZAUWwQSigCQFUYkSlDBMYlJP93TuvvneE+v3R9Wpus3MwIAjDO55Ph8+9Nx77jl16tSpeut9n/d5CaWUIkKECBEiRIgQIUKECG8blHe6AREiRIgQIUKECBEi/F9DZIRHiBAhQoQIESJEiPA2IzLCI0SIECFChAgRIkR4mxEZ4REiRIgQIUKECBEivM2IjPAIESJEiBAhQoQIEd5mREZ4hAgRIkSIECFChAhvMyIjPEKECBEiRIgQIUKEtxmRER4hQoQIESJEiBAhwtuMyAiPECFChAgRIkSIEOFtRmSER9hl8Ktf/Qrz589Hf3//O92UdxTLly/H8uXL3+lmRNgGomcTIQLw4osvYuHChRgYGHinm7JNXHXVVTjhhBPe6WZEiPCGiIzwdylCg/Wll15607+t1Wq47rrr8PTTT/8DWrbr4o9//COuu+66t/WaF154IebPny/+W7p0KQ499FCcd955ePDBBxEEwU65zvPPP4/rrrsOxWJxp5zvnUbYb3vuuSfq9fpW32/cuFH06f/8z/+86fMPDw/juuuuw8qVK3dGc3cZLF++fMp4295/b/d7UCwWsd9++2H+/Pl44IEHtvrecRx861vfwoEHHojFixfjhBNOwOOPP77Ncz3//PM4+eSTsWTJEhxwwAG47LLLUKlU3lR71q1bh/nz52PRokX/NO/MtnDPPffg5ptv3unn/fa3v40jjzwS06ZNE58tX74cRx111FbHPvnkk1iyZAmOOeYY5PP5v+u6hxxyyDbH85e//OUpx33qU5/CqlWr8PDDD/9d14sQ4R8N7Z1uQIS3H7VaDddffz3OOeccLFu27J1uztuGP/7xj7jttttw7rnnvq3XNQwDl112GQDAtm0MDAzgD3/4A8477zzss88++N73vodkMimOfytG5V//+ldcf/31OOaYY5BOp3da299JaJqGer2ORx55BB/5yEemfHfPPffANE3Ytv2Wzj0yMoLrr78e06ZNw2677bbDv3srz+btxFlnnYXjjz9e/Pull17CrbfeirPOOguzZs0Sn8+fP/9tbde11167zc1UiAsvvBAPPvggTj31VMyYMQN33XUX/vVf/xW33HIL3vve94rjVq5ciU9/+tOYPXs2LrzwQgwNDeHHP/4xNm7ciBtvvHGH23P33XejtbUVhUIBDz744D+t1/Tee+/F2rVr8elPf3qnnXPlypV44okn8LOf/ewNj33yySdx1llnYebMmbjpppuQzWb/7uvvtttu+MxnPjPls5kzZ075d2trKw499FD8+Mc/xqGHHvp3XzNChH8UIiM8wk5DtVpFPB5/p5vxtoJSCtu2YVnWdo/RNA1HH330lM8+//nP44c//CH++7//GxdffDG+853viO8Mw/hHNfddBcMwsOeee+K3v/3tVkb4vffei/e///148MEH35a21Go1xGKxXf7ZHHDAAVP+bZombr31Vuy///7v2IZ7zZo1+OlPf4rPfe5zuPbaa7f6/sUXX8Rvf/tbfPGLX8Rpp50GAPiXf/kXHHXUUbjqqqumGHtXX3010uk0br31VrFx7e7uxsUXX4zHHnsMBx544Bu2h1KKe+65B0cddRT6+/tx9913/9Ma4f8I3Hnnnejq6sJ73vOe1z3uL3/5C/7t3/4NM2bM2GkGOAC0t7dvNZ9uC0cccQTOP/98bN68GT09PTvl2hEi7GxEdJR/Ilx44YVYunQphoeH8bnPfQ5Lly7Fvvvui29+85
vwfR8A0N/fj/322w8AcP31128zPL1u3TrhpV20aBGOPfbYrcJ6IR3mL3/5Cy655BLst99+OPjgg1+3fevWrcP555+PfffdF4sXL8aHPvQhfPvb337d32wvdH7IIYfgwgsvFP92XRfXX389PvjBD2LRokVYtmwZTj75ZBHSvvDCC3HbbbeJc4b/hQiCADfffDOOPPJILFq0CPvvvz++/OUvo1AobHXdM888E3/+859x7LHHYvHixTvkEdoW/vVf/xUHHnggHnjgAWzYsEF8vi3e8a233oojjzwSS5Yswd57741jjz0W99xzDwDguuuuw5VXXgkAOPTQQ8W9hdz6O++8E6eeeir2228/LFy4EB/5yEdw++23b7NPzzzzTDz77LM4/vjjsWjRIhx66KH49a9/vdWxxWIRl19+OQ455BAsXLgQBx10EL74xS9iYmJCHOM4Dq699locfvjhWLhwIQ4++GBceeWVcBxnh/voqKOOwp/+9KcplIEXX3wRGzdu3GboO5/P45vf/CY++tGPYunSpdhzzz1x+umnY9WqVeKYp59+WniLL7roItFfv/rVrwDIsPrLL7+MT37yk1iyZAmuvvpq8V3js7nggguwaNEirFu3bko7TjvtNOy9994YHh7e4Xt9O3HbbbfhyCOPxMKFC3HggQfi0ksv3YqW0dgPH//4x7F48WIccsgh+OlPf/qmrvX1r38dhx122BSPdiMeeOABqKqKk046SXxmmiaOP/54/PWvf8Xg4CAAoFwu44knnsDHPvaxKZGjo48+GvF4HPfff/8Otee5557DwMAAPvKRj+AjH/kInn32WQwNDW11XBAEuOWWW/DRj34UixYtwr777ovTTjttKwrgb37zGxx//PHi3fzkJz+Jxx57bMoxO9Lfr53TQrx2zD399NOYP38+7rvvPnzve9/DQQcdhEWLFuFTn/oUNm3aNOV3jz76KAYGBsQYP+SQQ8T3rzenvB4efvhh7LvvviCEbPeYZ599FmeeeSZ6e3tx0003oamp6Q3P+2bgOA6q1errHrP//vuL9kaIsKsi8oT/k8H3fZx22mlYvHgxvvjFL+LJJ5/Ej3/8Y/T09OATn/gEcrkcLrnkElxyySU4/PDDcfjhhwOQ4em1a9fi5JNPRnt7O8444wyxuJ199tm47rrrxPEhLr30UuRyOZx99tmvOymuWrUKn/zkJ6FpGk466SRMmzYNfX19eOSRR/D5z3/+777v66+/Hj/4wQ9wwgknYPHixSiXy3j55ZexYsUKHHDAATjppJMwMjKCxx9/XBisjfjyl7+Mu+66C8ceeyyWL1+O/v5+3HbbbXjllVfw05/+FLqui2M3bNiA//iP/8BJJ52EE088catQ6JvBxz72MTz22GN44okntnueX/ziF7jsssvwoQ99CKeeeips28bq1avxwgsv4KMf/SgOP/xwbNy4Effeey8uuugiseDlcjkAwE9/+lPMnTsXhxxyCDRNwx/+8AdceumloJTik5/85JRrbdq0Ceeffz6OP/54HHPMMbjzzjtx4YUXYo899sDcuXMBAJVKBZ/85Cexbt06HHfccdh9990xOTmJRx55BMPDw8jlcgiCAP/2b/+G5557DieeeCJmz56NNWvW4JZbbsHGjRvx3e9+d4f65/DDD8dXvvIVPPTQQ8JwvvfeezFr1izsvvvuWx2/efNm/P73v8eHP/xhdHd3Y2xsDD//+c9xyimn4Le//S3a29sxe/ZsnHfeebj22mtx0kknYa+99gIA7LnnnuI8+XweZ5xxBo488kh87GMfQ3Nz8zbb96UvfQlPPfUULrjgAvz85z+Hqqr42c9+hsceewxXXnkl2tvbd+g+305cd911uP7667H//vvj5JNPxoYNG/DTn/4UL7300lZjvVAo4F//9V9xxBFH4Mgjj8T999+PSy65BLquT6G9bA/3338//vrXv+K+++7bbhLfypUrMWPGjCmGNQAsXrxYfN/Z2YnVq1fD8z
wsXLhwynGGYWC33XbbYX7/Pffcg97eXixevBjz5s2DZVm49957cfrpp0857ktf+hJ+9atf4aCDDsLxxx8P3/fx7LP
"text/plain": [
"<Figure size 800x700 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# ============================================================\n",
"# PART 2 — TOP 400 ACCOUNTS CLUSTERING\n",
"# ============================================================\n",
"\n",
"# 6a. Account selection — AUM > €5M as of October 2025\n",
"ref_date = pd.Timestamp(\"2025-10-01\")\n",
"df_ref = df_aum[df_aum[\"month\"] == ref_date]\n",
"\n",
"aum_account = (\n",
" df_ref.groupby(ID_COL)[AUM_VAL_COL].sum()\n",
" .reset_index().sort_values(AUM_VAL_COL, ascending=False)\n",
")\n",
"aum_account = aum_account[aum_account[AUM_VAL_COL] > 5_000_000]\n",
"selected_accounts = aum_account[ID_COL]\n",
"print(f\"Selected accounts (AUM > €5M): {len(selected_accounts)}\")\n",
"\n",
"df_month_top400 = df_month[df_month[ID_COL].isin(selected_accounts)].copy()\n",
"\n",
"# ── 6b. Feature engineering ───────────────────────────────────────────────\n",
"dfc_top400 = df_client_base[df_client_base[ID_COL].isin(selected_accounts)].copy()\n",
"\n",
"dfc_top400[\"log_aum_qty_mean\"] = np.log1p(dfc_top400[\"aum_qty_mean\"].clip(lower=0))\n",
"dfc_top400[\"log_gross_flow_qty_mean\"] = np.log1p(dfc_top400[\"gross_flow_qty_mean\"].clip(lower=0))\n",
"dfc_top400[\"gross_flow_to_aum\"] = dfc_top400[\"gross_flow_qty_sum\"] / (dfc_top400[\"aum_qty_mean\"].abs() + EPS)\n",
"dfc_top400[\"flow_direction_balance\"] = np.where(\n",
" dfc_top400[\"gross_flow_qty_sum\"] > 0,\n",
" dfc_top400[\"net_flow_qty_sum\"] / dfc_top400[\"gross_flow_qty_sum\"], np.nan\n",
")\n",
"dfc_top400[\"exit_rate_per_isin\"] = np.where(\n",
" dfc_top400[\"n_isin_total\"] > 0,\n",
" dfc_top400[\"full_exit_count\"] / dfc_top400[\"n_isin_total\"], np.nan\n",
")\n",
"dfc_top400[\"aum_drawdown_last\"] = dfc_top400[\"aum_drawdown_last\"].clip(0, 1)\n",
"dfc_top400[\"aum_final_to_peak\"] = np.where(\n",
" dfc_top400[\"aum_qty_max\"] > 0,\n",
" (dfc_top400[\"aum_qty_last\"] / dfc_top400[\"aum_qty_max\"]).clip(0, 1), np.nan\n",
")\n",
"\n",
"# Performance reactivity features — viable on large accounts (denser time series)\n",
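"def corr_lag(x, y, lag):\n",
"    # Pearson corr between x[t] and y[t - lag]; assumes consecutive monthly rows.\n",
"    # Align first, then drop non-finite pairs, so a NaN does not shift the lag.\n",
"    x = np.asarray(x, dtype=float)\n",
"    y = np.asarray(y, dtype=float)\n",
"    if lag <= 0 or len(x) <= lag:\n",
"        return np.nan\n",
"    xs, ys = x[lag:], y[:-lag]\n",
"    mask = np.isfinite(xs) & np.isfinite(ys)\n",
"    if mask.sum() < 4:\n",
"        return np.nan\n",
"    return pd.Series(xs[mask]).corr(pd.Series(ys[mask]))\n",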
"\n",
"rows_corr = []\n",
"for acc, g in df_month_top400.groupby(ID_COL):\n",
" g = g.sort_values(\"month\")\n",
" rows_corr.append({\n",
" ID_COL: acc,\n",
" \"corr_flow_fund_lag3\": corr_lag(g[\"flow_to_aum_m\"].values, g[\"ret_fund_m\"].values, 3),\n",
" \"corr_flow_fund_lag6\": corr_lag(g[\"flow_to_aum_m\"].values, g[\"ret_fund_m\"].values, 6),\n",
" \"corr_flow_rate_lag3\": corr_lag(g[\"flow_to_aum_m\"].values, g[\"delta_rate_m\"].values, 3),\n",
" })\n",
"df_corr_top400 = pd.DataFrame(rows_corr)\n",
"dfc_top400 = dfc_top400.merge(df_corr_top400, on=ID_COL, how=\"left\")\n",
"\n",
"# Recency feature\n",
"dfc_top400 = add_months_since_last_tx(dfc_top400, df_month_top400, ID_COL)\n",
"\n",
"# Quality filters\n",
"dfc_top400 = dfc_top400[\n",
" (dfc_top400[\"n_months\"] >= 6) & (dfc_top400[\"aum_qty_mean\"] > 0)\n",
"].copy()\n",
"\n",
"# Geographic grouping (top-10 countries/regions kept, the rest pooled as \"Other\")\n",
"country_t = dfc_top400[\"country\"].fillna(\"Unknown\")\n",
"region_t = dfc_top400[\"region\"].fillna(\"Unknown\")\n",
"top_countries_t = country_t.value_counts().head(10).index\n",
"top_regions_t = region_t.value_counts().head(10).index\n",
"dfc_top400[\"country_grp\"] = np.where(country_t.isin(top_countries_t), country_t, \"Other\")\n",
"dfc_top400[\"region_grp\"] = np.where(region_t.isin(top_regions_t), region_t, \"Other\")\n",
"\n",
"print(f\"Accounts after quality filters: {len(dfc_top400)}\")\n",
"\n",
"# ── 6c. Feature selection ─────────────────────────────────────────────────\n",
"# Removed vs initial set:\n",
"# flow_to_aum_vol : EXTREME values (max 2.5e13), ratio~0 — no discriminant power\n",
"# activity_intensity : highly redundant with n_tx_total (corr=0.98)\n",
"# avg_n_isin_held : highly redundant with n_isin_total (corr=0.91)\n",
"base_features_top400 = [\n",
" \"log_aum_qty_mean\",\n",
" \"flow_freq\",\n",
" \"gross_flow_to_aum\",\n",
" \"n_tx_total\",\n",
" \"n_isin_total\",\n",
" \"avg_holding_months_per_isin\",\n",
" \"exit_rate_per_isin\",\n",
" \"flow_direction_balance\",\n",
" \"aum_drawdown_last\",\n",
" \"months_since_last_tx\",\n",
" \"corr_flow_fund_lag3\",\n",
" \"corr_flow_fund_lag6\",\n",
" \"corr_flow_rate_lag3\",\n",
"]\n",
"all_features_top400 = [c for c in base_features_top400 if c in dfc_top400.columns]\n",
"print(f\"Feature set: {len(all_features_top400)} features\")\n",
"print(all_features_top400)\n",
"\n",
"# ── 6d. Preprocessing — MAD winsorization + RobustScaler ─────────────────\n",
"dfc_top400_clean = dfc_top400.copy()\n",
"\n",
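"# Impute NaN with 0: no signal = neutral for flow balance and correlations\n",
"# (for months_since_last_tx, 0 reads as recent activity, so NaN should mean 'no data')\n",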
"for col in [\"flow_direction_balance\", \"months_since_last_tx\",\n",
" \"corr_flow_fund_lag3\", \"corr_flow_fund_lag6\", \"corr_flow_rate_lag3\"]:\n",
" if col in dfc_top400_clean.columns:\n",
" dfc_top400_clean[col] = dfc_top400_clean[col].fillna(0)\n",
"\n",
"# MAD 3-sigma clip\n",
"for col in [\"n_isin_total\", \"exit_rate_per_isin\", \"avg_holding_months_per_isin\",\n",
" \"aum_drawdown_last\", \"aum_final_to_peak\"]:\n",
" if col in dfc_top400_clean.columns:\n",
" dfc_top400_clean[col] = winsorize_mad(dfc_top400_clean[col], n_sigma=3)\n",
"\n",
"# Clip p90 + log-transform (heavy right tails / asymmetric distributions)\n",
"for col, p_clip in [\n",
" (\"gross_flow_to_aum\", 90),\n",
"]:\n",
" if col in dfc_top400_clean.columns:\n",
" vals = dfc_top400_clean[col].to_numpy(dtype=float)\n",
" dfc_top400_clean[col] = np.log1p(np.clip(vals, 0, np.nanpercentile(vals, p_clip)))\n",
"\n",
"for col in [\"flow_freq\", \"n_tx_total\"]:\n",
" if col in dfc_top400_clean.columns:\n",
" vals = dfc_top400_clean[col].to_numpy(dtype=float)\n",
" dfc_top400_clean[col] = np.log1p(np.clip(vals, 0, None))\n",
"\n",
"# months_since_last_tx : MAD~0 (64% zeros) → log-transform before scaling\n",
"# so RobustScaler can normalize the right tail (values 1-54)\n",
"col = \"months_since_last_tx\"\n",
"if col in dfc_top400_clean.columns:\n",
" vals = dfc_top400_clean[col].to_numpy(dtype=float)\n",
" dfc_top400_clean[col] = np.log1p(np.clip(vals, 0, None))\n",
"\n",
"# MAD 3-sigma clip for log-scale variables\n",
"for col in [\"log_aum_qty_mean\"]:\n",
" if col in dfc_top400_clean.columns:\n",
" dfc_top400_clean[col] = winsorize_mad(dfc_top400_clean[col], n_sigma=3)\n",
"\n",
"# Build X: inf -> NaN first, then median-impute (medians computed without inf)\n",
"X_top400 = dfc_top400_clean[all_features_top400].copy()\n",
"X_top400 = X_top400.loc[:, ~X_top400.columns.duplicated()]\n",
"X_top400 = X_top400.replace([np.inf, -np.inf], np.nan)\n",
"X_top400 = X_top400.fillna(X_top400.median())\n",
"\n",
"scaler_top400 = RobustScaler()\n",
"X_top400_scaled = scaler_top400.fit_transform(X_top400)\n",
"\n",
"# Scaling diagnostic (RobustScaler units are IQRs around the median, not std)\n",
"X_df_t = pd.DataFrame(X_top400_scaled, columns=X_top400.columns)\n",
"extreme = (X_df_t.abs() > 5).any(axis=1).sum()\n",
"print(f\"Accounts: {X_top400.shape[0]} | Features: {X_top400.shape[1]}\")\n",
"print(f\"Accounts with any |scaled value| > 5: {extreme} ({extreme/len(X_df_t):.1%})\")\n",
"\n",
"extreme_by_feat = (X_df_t.abs() > 5).sum().sort_values(ascending=False)\n",
"if extreme_by_feat[extreme_by_feat > 0].shape[0] > 0:\n",
" print(\"Features with extreme values after scaling:\")\n",
" print(extreme_by_feat[extreme_by_feat > 0].to_string())\n",
"else:\n",
" print(\"All features clean after scaling.\")\n",
"\n",
"# ── 6e. K-selection ───────────────────────────────────────────────────────\n",
"rows_k = []\n",
"for k in range(2, 11):\n",
" km = KMeans(n_clusters=k, n_init=30, random_state=RANDOM_STATE)\n",
" labels = km.fit_predict(X_top400_scaled)\n",
" rows_k.append({\n",
" \"k\": k, \"inertia\": km.inertia_,\n",
" \"silhouette\": silhouette_score(X_top400_scaled, labels),\n",
" \"davies_bouldin\": davies_bouldin_score(X_top400_scaled, labels),\n",
" })\n",
"df_kdiag_top400 = pd.DataFrame(rows_k)\n",
"print(df_kdiag_top400.to_string(index=False))\n",
"\n",
"fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
"for ax, col, title in zip(axes,\n",
" [\"inertia\", \"silhouette\", \"davies_bouldin\"],\n",
" [\"Elbow / Inertia\", \"Silhouette (higher=better)\", \"Davies-Bouldin (lower=better)\"]):\n",
" ax.plot(df_kdiag_top400[\"k\"], df_kdiag_top400[col], marker=\"o\")\n",
" ax.set_title(title)\n",
" ax.set_xlabel(\"K\")\n",
"plt.suptitle(\"K-selection — Top 400 Accounts\")\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# ── 6f. Final clustering K=2, 4, 5 ───────────────────────────────────────\n",
"RESULTS_TOP400 = {}\n",
"for k in [2, 4, 5]:\n",
" km = KMeans(n_clusters=k, n_init=50, random_state=RANDOM_STATE)\n",
" dfc_top400[f\"cluster_k{k}\"] = km.fit_predict(X_top400_scaled)\n",
" RESULTS_TOP400[k] = {\n",
" \"model\": km,\n",
" \"silhouette\": silhouette_score(X_top400_scaled, dfc_top400[f\"cluster_k{k}\"]),\n",
" \"davies_bouldin\": davies_bouldin_score(X_top400_scaled, dfc_top400[f\"cluster_k{k}\"]),\n",
" }\n",
" print(f\"K={k} | sil={RESULTS_TOP400[k]['silhouette']:.4f} \"\n",
" f\"| db={RESULTS_TOP400[k]['davies_bouldin']:.4f}\")\n",
" counts = dfc_top400[f\"cluster_k{k}\"].value_counts().sort_index()\n",
" props = counts / counts.sum() * 100\n",
" print(pd.DataFrame({\"n_accounts\": counts, \"pct\": props.round(1)}))\n",
" print()\n",
"\n",
"# ── 6g. Profile variables for visualization ───────────────────────────────\n",
"profile_vars_top400 = [\n",
" \"log_aum_qty_mean\",\n",
" \"gross_flow_to_aum\",\n",
" \"flow_freq\",\n",
" \"n_tx_total\",\n",
" \"n_isin_total\",\n",
" \"avg_holding_months_per_isin\",\n",
" \"exit_rate_per_isin\",\n",
" \"flow_direction_balance\",\n",
" \"aum_drawdown_last\",\n",
" \"aum_final_to_peak\",\n",
" \"months_since_last_tx\",\n",
" \"corr_flow_fund_lag3\",\n",
" \"corr_flow_fund_lag6\",\n",
" \"corr_flow_rate_lag3\",\n",
"]\n",
"profile_vars_top400 = [c for c in profile_vars_top400 if c in dfc_top400.columns]\n",
"\n",
"for k in [2, 5]:\n",
" prof = plot_heatmap(\n",
" dfc_top400, profile_vars_top400, f\"cluster_k{k}\",\n",
" title=f\"Cluster Signatures — Top 400 Accounts (K={k}, robust z-score)\",\n",
" figsize=(16, 4)\n",
" )\n",
" print(f\"\\n=== Cluster medians — K={k} ===\")\n",
" print(prof.round(3).to_string())\n",
"\n",
"# ── 6h. 2D behavioral segmentation ───────────────────────────────────────\n",
"thr_int = dfc_top400[\"gross_flow_to_aum\"].median()\n",
"thr_freq = dfc_top400[\"flow_freq\"].median()\n",
"\n",
"def quadrant(row):\n",
" low_int = row[\"gross_flow_to_aum\"] < thr_int\n",
" low_frq = row[\"flow_freq\"] < thr_freq\n",
" if low_int and low_frq: return \"Dormant\"\n",
" if low_int and not low_frq: return \"Small rebalancers\"\n",
" if not low_int and low_frq: return \"Occasional large movers\"\n",
" return \"Highly active\"\n",
"\n",
"dfc_top400[\"seg_2D\"] = dfc_top400.apply(quadrant, axis=1)\n",
"print(dfc_top400[\"seg_2D\"].value_counts())\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"for name, g in dfc_top400.groupby(\"seg_2D\"):\n",
" plt.scatter(g[\"flow_freq\"], g[\"gross_flow_to_aum\"], s=15, label=name)\n",
"plt.yscale(\"log\")\n",
"plt.axvline(thr_freq, linestyle=\"--\", color=\"gray\")\n",
"plt.axhline(thr_int, linestyle=\"--\", color=\"gray\")\n",
"plt.xlabel(\"Activity frequency (share of active months)\")\n",
"plt.ylabel(\"Gross flow / mean AUM [log scale]\")\n",
"plt.title(\"2D Behavioral Segmentation — Top 400 Accounts\")\n",
"plt.legend(markerscale=2)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# ── 6i. Churn analysis ────────────────────────────────────────────────────\n",
"dfc_top400[\"churn_hard\"] = (dfc_top400[\"aum_final_to_peak\"] < 0.10).astype(int)\n",
"dfc_top400[\"churn_soft\"] = (\n",
" (dfc_top400[\"aum_final_to_peak\"] < 0.40) &\n",
" (dfc_top400[\"aum_drawdown_last\"] > 0.40)\n",
").astype(int)\n",
"dfc_top400[\"churn_warning\"] = (\n",
" (dfc_top400[\"flow_direction_balance\"] < 0) &\n",
" (dfc_top400[\"aum_drawdown_last\"] > 0.20)\n",
").astype(int)\n",
"\n",
"print(\"\\n=== Overall churn rates ===\")\n",
"print(dfc_top400[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]].mean().round(3))\n",
"\n",
"for k in [2, 5]:\n",
" churn_profile = (\n",
" dfc_top400.groupby(f\"cluster_k{k}\")\n",
" .agg(\n",
" n_accounts = (ID_COL, \"count\"),\n",
" churn_hard = (\"churn_hard\", \"mean\"),\n",
" churn_soft = (\"churn_soft\", \"mean\"),\n",
" churn_warning = (\"churn_warning\", \"mean\"),\n",
" )\n",
" )\n",
" print(f\"\\n=== Churn rates by cluster — K={k} ===\")\n",
" print(churn_profile.round(3).to_string())\n",
"\n",
" churn_profile[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]].plot(\n",
" kind=\"bar\", figsize=(8, 4),\n",
" color=[\"#d62728\", \"#ff7f0e\", \"#ffbb78\"]\n",
" )\n",
" plt.title(f\"Churn Rates by Cluster — Top 400 Accounts (K={k})\")\n",
" plt.ylabel(\"Rate\")\n",
" plt.xlabel(\"Cluster\")\n",
" plt.xticks(rotation=0)\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"# ── 6j. Full cluster profile table ────────────────────────────────────────\n",
"for k in [2, 5]:\n",
" print(f\"\\n===== Cluster profile — K={k} =====\")\n",
" prof = (\n",
" dfc_top400.groupby(f\"cluster_k{k}\")\n",
" .agg(\n",
" n_accounts = (ID_COL, \"count\"),\n",
" aum_mean_med = (\"aum_qty_mean\", \"median\"),\n",
" freq_med = (\"flow_freq\", \"median\"),\n",
" gross_flow_to_aum_med = (\"gross_flow_to_aum\", \"median\"),\n",
" n_tx_med = (\"n_tx_total\", \"median\"),\n",
" holding_med = (\"avg_holding_months_per_isin\",\"median\"),\n",
" exit_rate_med = (\"exit_rate_per_isin\", \"median\"),\n",
" flow_dir_med = (\"flow_direction_balance\", \"median\"),\n",
" drawdown_med = (\"aum_drawdown_last\", \"median\"),\n",
" months_inactive_med = (\"months_since_last_tx\", \"median\"),\n",
" corr_fund_lag3_med = (\"corr_flow_fund_lag3\", \"median\"),\n",
" corr_rate_lag3_med = (\"corr_flow_rate_lag3\", \"median\"),\n",
" )\n",
" .sort_values(\"n_accounts\", ascending=False)\n",
" )\n",
" print(prof.round(3).to_string())\n",
"\n",
"# ── 6k. Inter-cluster distance matrix ─────────────────────────────────────\n",
"def plot_distance_matrix(X_scaled, labels, max_points=400, title=\"Distance matrix\"):\n",
" n = X_scaled.shape[0]\n",
" idx = np.arange(n)\n",
" if n > max_points:\n",
" idx = np.random.default_rng(42).choice(idx, size=max_points, replace=False)\n",
" X_sub = X_scaled[idx]\n",
" labels_sub = np.asarray(labels)[idx]\n",
" order = np.lexsort((np.arange(len(labels_sub)), labels_sub))\n",
" X_sub = X_sub[order]\n",
" labels_sub = labels_sub[order]\n",
" D = pairwise_distances(X_sub)\n",
" plt.figure(figsize=(8, 7))\n",
" sns.heatmap(D, cmap=\"viridis\")\n",
" for b in np.cumsum(np.unique(labels_sub, return_counts=True)[1])[:-1]:\n",
" plt.axhline(b, color=\"red\", linewidth=2)\n",
" plt.axvline(b, color=\"red\", linewidth=2)\n",
" plt.title(title)\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"plot_distance_matrix(\n",
" X_top400_scaled,\n",
" dfc_top400[\"cluster_k5\"].values,\n",
" title=\"Inter-cluster Distance Matrix — Top 400 Accounts (K=5)\"\n",
")"
]
},
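{
"cell_type": "markdown",
"id": "5e1c7a90",
"metadata": {},
"source": [
"### Sketch: lagged flow/performance correlation\n",
"\n",
"The performance reactivity features (`corr_flow_fund_lag3`, `corr_flow_fund_lag6`) pair each month's flow with the fund return `lag` months earlier. The cell below is a minimal, self-contained sketch on synthetic data. It uses a hypothetical helper `lagged_corr` that mirrors `corr_lag`, and the series are made up, not Carmignac data. It assumes a gap-free monthly panel. The expected behavior is that an investor whose flows echo returns with a 3-month delay shows a high lag-3 correlation, while the lag-6 correlation stays small.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a91",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def lagged_corr(flow, ret, lag):\n",
"    # Hypothetical helper mirroring corr_lag: Pearson corr of flow[t] vs ret[t - lag].\n",
"    # Align first, then drop non-finite pairs.\n",
"    flow = np.asarray(flow, dtype=float)\n",
"    ret = np.asarray(ret, dtype=float)\n",
"    if lag <= 0 or len(flow) <= lag:\n",
"        return np.nan\n",
"    x, y = flow[lag:], ret[:-lag]\n",
"    mask = np.isfinite(x) & np.isfinite(y)\n",
"    if mask.sum() < 4:\n",
"        return np.nan\n",
"    return pd.Series(x[mask]).corr(pd.Series(y[mask]))\n",
"\n",
"# Synthetic 36-month series: flows echo returns 3 months later, plus small noise\n",
"rng = np.random.default_rng(0)\n",
"ret = rng.normal(0.0, 0.03, size=36)\n",
"flow = np.roll(ret, 3) + rng.normal(0.0, 0.005, size=36)\n",
"flow[:3] = np.nan  # np.roll wraps around; discard the wrapped months\n",
"\n",
"print(f\"lag 3: {lagged_corr(flow, ret, 3):.2f}\")\n",
"print(f\"lag 6: {lagged_corr(flow, ret, 6):.2f}\")"
]
},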
{
"cell_type": "code",
"execution_count": 22,
"id": "b394752d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selected accounts (AUM > €5M): 431\n"
]
}
],
"source": [
"ref_date = pd.Timestamp(\"2025-10-01\") # first day of month (panel convention)\n",
"df_ref = df_aum[df_aum[\"month\"] == ref_date]\n",
"\n",
"aum_account = (\n",
" df_ref.groupby(ID_COL)[AUM_VAL_COL].sum()\n",
" .reset_index().sort_values(AUM_VAL_COL, ascending=False)\n",
")\n",
"aum_account = aum_account[aum_account[AUM_VAL_COL] > 5_000_000]\n",
"selected_accounts = aum_account[ID_COL]\n",
"print(f\"Selected accounts (AUM > €5M): {len(selected_accounts)}\")"
]
},
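{
"cell_type": "markdown",
"id": "5e1c7a92",
"metadata": {},
"source": [
"### Sketch: MAD winsorization before robust scaling\n",
"\n",
"The Part 2 preprocessing calls a `winsorize_mad` helper that is defined earlier in the notebook. The cell below is an illustrative stand-in with the hypothetical name `winsorize_mad_sketch`. It assumes the usual consistency constant 1.4826, which makes the MAD comparable to a standard deviation under normality. Values are first clipped to median ± n_sigma × scaled MAD. `RobustScaler` then centers each value on the median and divides by the IQR, so the single outlier no longer dominates the scale.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a93",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.preprocessing import RobustScaler\n",
"\n",
"def winsorize_mad_sketch(s, n_sigma=3.0):\n",
"    # Hypothetical helper: clip to median +/- n_sigma * 1.4826 * MAD (NaN untouched)\n",
"    s = pd.Series(s, dtype=float)\n",
"    med = s.median()\n",
"    mad = (s - med).abs().median() * 1.4826\n",
"    if not np.isfinite(mad) or mad == 0:\n",
"        return s  # degenerate spread: nothing sensible to clip\n",
"    return s.clip(med - n_sigma * mad, med + n_sigma * mad)\n",
"\n",
"# Hypothetical feature with one extreme outlier\n",
"x = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 250.0])\n",
"x_w = winsorize_mad_sketch(x, n_sigma=3)\n",
"x_scaled = RobustScaler().fit_transform(x_w.to_frame()).ravel()\n",
"print(x_w.round(3).tolist())\n",
"print(x_scaled.round(3).tolist())"
]
},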
{
"cell_type": "markdown",
"id": "078c2442",
"metadata": {},
"source": [
"---\n",
"### 6h. Visualization — Top 400 Accounts\n",
"\n",
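"The 2D intensity vs. frequency plane gives an intuitive view of behavioral profiles. It plots gross flow / mean AUM against the share of active months, with each axis split at its median. The churn analysis then links each cluster to concrete retention-risk signals: hard churn, soft churn and early-warning flags.\n"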
]
},
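{
"cell_type": "markdown",
"id": "5e1c7a94",
"metadata": {},
"source": [
"### Sketch: median-split quadrants and churn flags\n",
"\n",
"The cell below is a vectorized sketch of the two rules used in this section. It runs on a hypothetical five-account frame: the column names mirror the notebook, but the values are made up. First, each account is placed in a quadrant by comparing its gross-flow intensity and activity frequency with the sample medians. Second, three churn flags are set using these thresholds:\n",
"\n",
"- hard churn: `aum_final_to_peak` < 0.10\n",
"- soft churn: `aum_final_to_peak` < 0.40 and `aum_drawdown_last` > 0.40\n",
"- warning: `flow_direction_balance` < 0 and `aum_drawdown_last` > 0.20\n",
"\n",
"`np.select` replaces the row-wise `apply` here. It is only a vectorization choice and applies the same rule.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a95",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical accounts (made-up values, notebook column names)\n",
"df = pd.DataFrame({\n",
"    \"gross_flow_to_aum\":      [0.5, 0.8, 6.0, 7.0, 3.0],\n",
"    \"flow_freq\":              [0.1, 0.9, 0.2, 0.95, 0.6],\n",
"    \"aum_final_to_peak\":      [0.05, 0.9, 0.3, 1.0, 0.6],\n",
"    \"aum_drawdown_last\":      [0.95, 0.1, 0.7, 0.0, 0.4],\n",
"    \"flow_direction_balance\": [-0.9, 0.2, -0.4, 0.1, -0.1],\n",
"})\n",
"\n",
"# Median split: 'high' means >= median, matching the '< threshold' test for 'low'\n",
"high_int = df[\"gross_flow_to_aum\"] >= df[\"gross_flow_to_aum\"].median()\n",
"high_frq = df[\"flow_freq\"] >= df[\"flow_freq\"].median()\n",
"df[\"seg_2D\"] = np.select(\n",
"    [~high_int & ~high_frq, ~high_int & high_frq, high_int & ~high_frq],\n",
"    [\"Dormant\", \"Small rebalancers\", \"Occasional large movers\"],\n",
"    default=\"Highly active\",\n",
")\n",
"\n",
"# Churn flags with the same thresholds as the analysis cell\n",
"df[\"churn_hard\"] = (df[\"aum_final_to_peak\"] < 0.10).astype(int)\n",
"df[\"churn_soft\"] = ((df[\"aum_final_to_peak\"] < 0.40) & (df[\"aum_drawdown_last\"] > 0.40)).astype(int)\n",
"df[\"churn_warning\"] = ((df[\"flow_direction_balance\"] < 0) & (df[\"aum_drawdown_last\"] > 0.20)).astype(int)\n",
"print(df[[\"seg_2D\", \"churn_hard\", \"churn_soft\", \"churn_warning\"]])"
]
},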
{
"cell_type": "code",
"execution_count": 23,
"id": "715c7165",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"seg_2D\n",
"Highly active (high int, high freq) 137\n",
"Dormant (low int, low freq) 136\n",
"Small rebalancers (low int, high freq) 77\n",
"Occasional large movers (high int, low freq) 77\n",
"Name: count, dtype: int64\n",
"thr_int: 4.0037 | thr_freq: 0.7231\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3kAAAHqCAYAAAC5nYcRAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAA63lJREFUeJzs3Xd8E/X/B/BXmrbQMgod7D067GbvMpW9h8qQJUMcgCggKvIVLCqiIigqUwREQKYMAWUoyLK0jDJklSJgKWW20DbJ74/+cjbNvOSSXJLX8/HgQXO53L1z97nLve/zuc9HodFoNCAiIiIiIiK34OXsAIiIiIiIiEg6TPKIiIiIiIjcCJM8IiIiIiIiN8Ikj4iIiIiIyI0wySMiIiIiInIjTPKIiIiIiIjcCJM8IiIiIiIiN8Ikj4iIiIiIyI0wySMiIiIiInIjTPKIyKOlp6cjLCwMixcvdvi6v/jiC4SFhTl8vVqDBw/G4MGDnbZ+0ufsMkFERO7B29kBEJFrSklJwcaNG3H48GFcv34dZcqUQWxsLMaPH4+aNWvqzDt48GAcOXIEAKBQKODv74+QkBDExMSgZ8+eaN68uUXrnDJlCjZs2CC8ViqVCAkJQb169TBu3DjUqVNHui9IgtzcXKxevRobNmxAWloavLy8UL58edSrVw9Dhw5F7dq1nR2iQ23ZsgWZmZkYOnSoVZ/PycnBokWL0KhRIzRu3Fja4GSk8HFvyssvv4xXXnnFAREVuH//Pp555hncuXMHn3/+OTp27Kjzfm5uLj7//HNs2rQJ9+/fR1hYGMaPH2/wPPXXX3/h448/xpkzZ1CyZEl06tQJEyZMQIkSJSyO5+LFi+jcuTN8fX3xxx9/oHTp0jZ/Rzmy9bghInGY5BGRVRYtWoS//voLHTt2RFhYGDIyMrBy5Ur07t0ba9asQWhoqM78FSpUwMSJEwEUXORevXoVu3btwubNm9GpUyd8/PHH8PHxMbteX19fzJw5EwCgUqmQlpaGH374AQcOHMDPP/+M8uXLS/9l7WTs2LEYNWqUs8Mw69VXX8X+/fvRpUsX9OvXD/n5+bh06RL27t2L+Ph4j0vytm7digsXLtiU5M2fPx8vv/yyXpLnKmXCEmPGjEHfvn2F1ydPnsSKFSswZswY1KpVS5ju6JrLefPm4fHjx0bfnzJlCnbu3IkhQ4agRo0a2LBhA0aNGoXly5ejQYMGwnypqanCTY4pU6bg5s2bWLJkCa5cuYJFixZZHM/mzZsREhKCe/fuYefOnejXr59N30+ubD1uiEgcJnlEZJWhQ4dizpw58PX1FaZ17twZ3bp1wzfffIM5c+bozF+qVCn06NFDZ9qkSZMwc+ZMrFq1CpUrV8Ybb7xhdr3e3t56y4mLi8Po0aOxb98+9O/f34Zv5Vje3t7w9pbmNKzRaPDkyRMUL15ckuVppaSk4LfffsOECRMwZswYnfdUKhXu378v6fo8nZRlwtmK1nwVK1YMK1asQLNmzZxWg3n+/HmsXr0aL730EubNm6f3fkpKCn7++We8+eabGDFiBACgZ8+e6Nq1K+bMmYMffvhBmHfu3LkoXbo0VqxYgZIlSwIAqlSpgrfffhu///47WrRoYTYejUaDLVu2oGvXrkhPT8fmzZvdNskjIsfiM3lEZJV69erpJHgAUKNGDdStWxeXLl2yaBlKpRJvv/026tSpg5UrV+LBgwdWxRIcHCwsr7D79+9j1qxZSEhIQFRUFDp06IBvvvkGarXa4HLWrFmD9u3bIyoqCn369EFKSorO+2fPnsWUKVPQrl07REdHo3nz5pg6dSqysrKEeXbs2IGwsDCDzdR++OEHhIWF4fz58wAMP3+Vn5+PBQsWCHG0bdsWc+fORW5urs58bdu2xejRo3HgwAH07t0bMTExwgXo+vXrMWTIEDRt2hRRUVHo3LkzVq1aZcmm1HPt2jUABfu7KKVSibJly+pMu3XrFqZOnYpmzZ
ohKioKXbp0wbp16/Q+e/36dYwZMwZxcXFo2rQpPvjgAxw4cABhYWE4fPiwMN/gwYPRtWtXnD17FoMGDUJsbCw6dOiAHTt2AACOHDmCfv36ISYmBs888wwOHjyoty5LYjp8+DDCwsKwbds2fPXVV2jVqhWio6Pxwgsv4OrVqzrx7N27F9evX0dYWBjCwsLQtm1bAP818+vduzfq16+PuLg4PP/88/jzzz+Fz6enp6Np06YAgPnz5wvL+OKLLwBIUyaOHTuGvn37Ijo6Gu3atcPGjRv1tomcrFy5El26dEFUVBRatGiBGTNm6N080JaDU6dO4dlnn0VMTAzatm2L1atXi1rXrFmz0L59e50aucJ27NgBpVKJAQMGCNOKFSuGvn37IikpCTdu3AAAPHz4EAcPHkT37t2FBA8AevToAX9/f2zfvt2ieI4fP47r16+jc+fO6Ny5M44dO4abN2/qzadWq7F8+XJ069YN0dHRaNKkCUaMGIGTJ0/qzLdp0yb07dsXsbGxaNiwIQYOHIjff/9dZx5Ltnfbtm0xZcoUvTiKPscrxXEDACtWrECXLl2EuHv37o0tW7ZYtA2JyDD3uF1IRLKg0Whw+/Zt1K1b1+LPKJVKdOnSBZ9//jmOHz+O1q1bm/3MnTt3ABRc+Fy7dg1z5sxBmTJl0KZNG2GenJwcDBo0CLdu3cKzzz6LihUrIikpCXPnzkVGRgamTZums8ytW7fi0aNHGDBgABQKBRYtWoRXXnkFu3fvFpqRHjx4ENeuXUPv3r0REhKCCxcu4Mcff8Tff/+NH3/8EQqFAq1btxYu8ho1aqSzjm3btqFu3bp6TVkLe/vtt7FhwwY888wzGDZsGFJSUvD111/j4sWLWLBggc68ly9fxuuvv44BAwagf//+wrOQq1evRt26ddG2bVt4e3vjt99+w4wZM6DRaDBw4ECz27ewSpUqASh4nqZevXoma5lu376N/v37Q6FQYODAgQgMDMT+/fsxbdo0PHz4UGimlZ2djRdeeAEZGRkYMmQIgoODsXXrVp3krrB79+5hzJgx6Ny5Mzp27IjVq1dj4sSJUKvV+OCDD/Dss8+ia9euWLx4MV599VXs3btXuPC2NCatb7/9FgqFAsOHD8fDhw+xaNEiTJo0CWvXrgVQ0ATxwYMHuHnzJqZOnQoAwvNXDx8+xNq1a9G1a1f069cPjx49wrp16zBy5EisXbsWERERCAwMxHvvvYf33nsPHTp0QIcOHQCYbrIopkxcvXoVr732Gvr27YtevXph/fr1mDJlCiIjI0Udl47yxRdfYP78+WjWrBmee+45XL58GatXr8bJkyexevVqnSbc9+7dw6hRo9CpUyd06dIF27dvx3vvvQcfHx+dZqHGbN++HUlJSdi2bRuuX79ucJ7U1FTUqFFDJ3EDgJiYGOH9ihUr4ty5c8jPz0dUVJTOfL6+voiIiEBqaqpF33/Lli2oVq0aYmJiEBoaiuLFi2Pr1q0YOXKkznzTpk3DTz/9hFatWqFv375QqVQ4duwYkpOTER0dDaDgpsEXX3yB+Ph4vPrqq/Dx8UFycjL+/PNPoVZRzPYWw5bj5scff8TMmTPxzDPPYMiQIXjy5AnOnTuH5ORkdOvWzap4iAiAhohIIhs3btSEhoZq1q5dqzN90KBBmi5duhj93K5duzShoaGa5cuXm1z+5MmTNaGhoXr/WrZsqTl16pTOvAsWLNDExcVpLl++rDN9zpw5moiICM0///yj0Wg0mmvXrmlCQ0M1jRo10ty9e1eYb/fu3ZrQ0FDNr7/+KkzLycnRi2nr1q2a0NBQzdGjR4VpEydO1DRt2lSTn58vTPv333814eHhmvnz5wvT5s2bpwkNDRVep6amakJDQzXTpk3TWcfs2bM1oaGhmkOHDgnT2rRpowkNDdXs379fLyZDcQ4fPlzTrl07nWmDBg3SDBo0SG/ewtRqtWbQoEGa0NBQTbNmzTQTJ07UfP/995rr16/rzfvWW2
9pmjdvrrlz547O9AkTJmjq168vxLVkyRJNaGioZteuXcI8jx8/1nTs2FETGhqq+fPPP3ViDA0N1WzZskWYdvHiRU1
"text/plain": [
"<Figure size 900x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3kAAAHqCAYAAAC5nYcRAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAA8WRJREFUeJzs3XdYU+fbwPFvQFARF7gVEVFwAILiRnHvvepeuFddVdRaxYVWrQsHde+6rduq/Tmr1j3RuhARBw5UBAokef/wJTUSMEAwoPfnurwkZ97n5MlJ7vOMo1Cr1WqEEEIIIYQQQnwVTIwdgBBCCCGEEEIIw5EkTwghhBBCCCG+IpLkCSGEEEIIIcRXRJI8IYQQQgghhPiKSJInhBBCCCGEEF8RSfKEEEIIIYQQ4isiSZ4QQgghhBBCfEUkyRNCCCGEEEKIr4gkeUIIIYQQQgjxFZEkT4hUtn37dhwdHQkODjZ2KCKN8fb2platWsYOQ4hUo1KpaNKkCYsXLzZqHKnxWdO1TUdHRxYsWGDQ/Yj0Kzg4GEdHR7Zv366ZtmDBAhwdHY0YVcpt3LiRGjVqEB0dbexQRCIkyRNpTlxSdO3aNa3p7969o02bNjg7O3P8+PEU7SPuIvvpP2dnZ723oVQq2bZtG126dKFChQo4OTlRq1YtxowZEy/21HTs2LFv6keFo6MjkyZNSta6u3fvZtWqVYYNyIAiIyNZsGABZ8+eTZXtq1QqKlWqxNKlSxNcJqHPhqOjIxs3bkyVuL5l58+fp1evXlSrVg1nZ2dq1KhBv3792L17t7FDM4g9e/bw5MkTOnfubOxQRBLcvXuXBQsWyM1JPaX175aAgABGjhyJp6cnTk5OVKhQge7du7Nt2zaUSiUAr1+/ZtmyZXTq1IlKlSrh7u5Ou3bt2LdvX7zttWrVipiYGH777bcvfSgiCTIYOwAh9BEeHk7Pnj25ffs2fn5+VK9e3SDbnThxIhYWFprXpqameq0XFRXFoEGDOHHiBOXLl6dv375kz56dx48fs3//fnbs2MHRo0fJly+fQeJMzLFjx1i/fj2DBw9O9X2ld3v27OHOnTt0797d2KEAMHnyZNRqteZ1ZGQkfn5+DBo0iIoVKxp8f1evXuX169fUqFHjs8t++tkAKFOmjMFj+pbt37+fYcOGUbJkSbp27Ur27NkJDg7m3LlzbN68maZNmxo7xBRbvnw5jRs3JmvWrMYO5Yu4evWq3t8jadndu3fx8/OjQoUKFCpUyNjhpHlJ+W7p378/ffr0Sf2g/t+WLVuYMGEC1tbWNG/eHFtbW96/f8+ZM2cYN24coaGh9OvXj8uXLzN37lyqV69O//79yZAhAwcPHmTYsGHcvXuXIUOGaLaZMWNGWrRowapVq+jSpQsKheKLHY/QnyR5Is0LDw/Hy8uLgIAA/Pz88PT0NNi269evj5WVVZLX+/nnnzlx4gRjxoyJd1EfNGhQmr6jpw+1Ws2///5LpkyZjB3KV83MzOyL7u/YsWMULFiQ4sWLf3bZpHw2IiIi4iWE4vP8/PwoVqwYmzZtwtzcXGvey5cvjRSV4dy8eZNbt27h7e1t7FC+mIwZMxo7hC9Ovi+SJkOGDGTIYLif35GRkWTOnFnnvMuXLzNhwgRcXV359ddfsbS01Mzr3r07165d486dOwAUK1aMgwcPUrBgQc0yHTt2pHv37ixdupRevXppXecbNmzIsmXLOHPmDJUrVzbY8QjDkeaaIk17//49vXr14saNGyxYsECvGoikCg8P16pN+ZynT5+yadMmqlatqvOunampKV5eXonW4iXUb6NWrVpaP4hiYmLw8/OjXr16ODs7U7FiRTp06MCpU6eAD31C1q9fr9lm3L84KpWKVatW0bhxY5ydnalSpQo//fQTb968ibffvn37cuLECVq1aoWLi4umGcapU6fo0KED7u7uuLm5Ub9+fX755Re9z1dqO3v2LI6Ojuzbt4/FixdTvX
p1nJ2d6datGw8fPtQs16VLF44ePcrjx4815+nj/jTR0dHMnz+funXr4uTkhKenJz///HO8PgdxzUUPHz5MkyZNcHJyonHjxvGaEIeHhzN16lRq1aqFk5MTlStXpkePHty4cUOzzMd9eoKDgzVflH5+fpoYFyxYwLZt23B0dOTmzZvxjn/JkiWULFmSZ8+effZcHTt2LMU3SeKaU//9999MnDiRypUra23z2LFjdOzYEVdXV9zc3OjTp4/mR8TH4s6fs7MzTZo04dChQ/H6OMW9t582X9XVzwXg3r17DBkyhAoVKuDs7EyrVq04cuSIzvgvXLiAr68vlSpVwtXVlYEDB/Lq1at4cR47dozOnTvj5uZG2bJlad26taYp5fz58yldurTO9caPH4+7uzv//vtvgucyKCgIZ2fneAkegLW1dbzjXb58OatWraJmzZq4uLjQuXNn/vnnH6314pKq2rVr4+zsTNWqVRkzZgyvX7+Ot49nz54xduxYPDw8NM3NJ0yYoFXm3759y9SpUzXNvOrWrcuvv/6KSqVK8LjiHD58GDMzM9zd3XXue8yYMVSpUkXzGdq6datmflRUFA0aNKBBgwZERUVppoeFheHh4UH79u01zcwg8fdJl6SWLV3lVZdPr+1xzZ8fPnyIt7c37u7ulCtXjjFjxhAZGam1blRUFFOmTKFixYq4ubnRr18/nj17pnc/P32uYaNHj8bZ2Zl79+5prevl5UX58uV59uwZ27dv5/vvvwega9eummtR3LlK7PtCn/LycXlev349tWvXpkyZMvTs2ZMnT56gVqtZuHAh1atXx8XFhf79+xMWFhbvePW91ujy6NEjzbWiTJkytGvXjqNHj2otk1B/+k/Lzue+Wz6VUJ+833//XXM+K1SowLBhw3jy5InWMl26dKFJkyZcv36dTp06UaZMmUS/j/38/FAoFMyaNUsrwYsTd50EsLGx0UrwABQKBXXq1CE6OppHjx5pzXNyciJHjhzxrrEi7ZCaPJFmRUZG0rt3b65fv868efOoWbNmvGWio6MJDw/Xa3u6aiVq166tqYWoXbs23t7e5MqVK9HtHD9+nNjYWJo1a6bfgaSAn58f/v7+tG3bFhcXF8LDw7l+/To3btygatWqfPfddzx//pxTp07x888/x1v/p59+YseOHbRq1YouXboQHBzM+vXruXnzJhs3btSqSXrw4AEjRozgu+++o127dtjZ2XHnzh369u2Lo6MjQ4YMwdzcnIcPH3Lx4sVUP/akWrp0KQqFgp49exIeHs6yZcsYOXIkW7ZsAaBfv368e/eOp0+fMmbMGACyZMkCfEiG+/fvz4ULF2jXrh329vb8888/rF69msDAQBYtWqS1rwsXLvDHH3/QsWNHsmTJwtq1axkyZAj/+9//yJkzJwATJkzg4MGDdO7cGXt7e8LCwrhw4QL37t2jdOnS8eK3srJi4sSJTJw4kbp161K3bl3gw4/GQoUKMWnSJHbv3k2pUqW01tu9ezcVKlQgb968iZ6f0NBQbt68qdXkJjGf3ggwNTUle/bsmtc+Pj5YWVkxcOBAIiIiANi5cyfe3t54eHgwcuRIIiMj2bhxIx07dmTHjh2aZl8nT55k8ODBFCtWjBEjRvD69WvGjBmToubNd+7coUOHDuTNm5fevXtjYWHB/v37GThwIAsWLNCczzhTpkwhW7ZsDBo0iMePH7N69WomTZrE3LlzNcts376dsWPHUrx4cfr27UvWrFkJCAjgxIkTNG3alObNm7Nw4UL27dun1ecsOjqagwcPUq9evURrdgoUKMDp06d5+vSpXse+c+dO3r9/T8eOHfn3339Zu3Yt3bp1Y/fu3Zrr1l9//cWjR49o1aoVuXPn5s6dO2zevJm7d++yefNmTbOqZ8+e0aZNG969e0e7du0oWrQoz5494+DBg0RFRWFubk5kZCSdO3fm2bNntG/fnvz583Pp0iV++eUXQkNDGTduXKLxXrp0CQcHh3g11i
9evKBdu3YoFAo6deqElZUVx48fZ9y4cYSHh9O9e3cyZcrEjBkz6NChA3PmzNF8ZidNmsS7d+/w9fXVNIv83PuUUoY
"text/plain": [
"<Figure size 900x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABW0AAAHqCAYAAAB/bWzAAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XdYU9cbB/BvQFARFziqKIpaEBkioiiiuAe4txVw72rVWsFRFVu3ttbRn9Y9sHVinTiLAxUVB2qxigtQqxScbJL7+4PmlpAEAgQT9Pt5Hh/Jzcm57z3nJvfmzbnnSgRBEEBEREREREREREREesFA1wEQERERERERERER0X+YtCUiIiIiIiIiIiLSI0zaEhEREREREREREekRJm2JiIiIiIiIiIiI9AiTtkRERERERERERER6hElbIiIiIiIiIiIiIj3CpC0RERERERERERGRHmHSloiIiIiIiIiIiEiPMGlLREREREREREREpEeYtCUqAB8fH/j4+IiPY2NjYWNjg3379n3wWMLCwmBjY4OwsLAPvm5t02U7fijr169HmzZtYGtri27duuk6nI9SREQE7O3t8fTpU3FZ9vcs6Y6+9EXfvn2xePFiXYdBRPTRsbGxwdy5c3UdRoH5+/ujdevWeXrNvn37YGNjg1u3bhVSVNqlT+fe8raLjY39IOtTdb6oifzsF/ThaPs889dff0XLli2RlpamtTqJNMGkLRVpNjY2Gv37GBKZhenFixdYuXIlIiMjdR2KWmfOnMHKlSt1HYZWnD9/HkuWLIGzszMWLFiAyZMn6zqkj9KPP/4ILy8vWFhY6DoUREVFYeXKlR/sC4i+KArbPWLECOzYsQNxcXG6DoWIqEiIjo7GrFmz0KZNGzg4OMDZ2Rn9+/fHli1bkJKSouvw6CMUGBhYaAllfTpfzKowt7kgduzYgQkTJqBly5awsbGBv7+/rkPSmrdv36Jp06awsbFBcHCwwnM9e/ZEeno6fvvtNx1FR5+qYroOgKggso+O+v333xEaGqq0vHbt2h8kHgsLC0RERKBYsQ//1mrUqBEiIiJgZGSU59e+fPkSq1atgoWFBWxtbQshurxR1Y5nzpxBYGAgxo8fr8PItOPSpUswMDDAvHnzYGxsrOtwPkqRkZG4cOGC3pxYRUVFYdWqVWjcuDGqVaum63A+mJy2e8OGDTqKSlGbNm1gamqKHTt24KuvvtJ1OEREei0kJARfffUVjI2N0a1bN1hbWyM9PR3h4eFYsmQJoqKi8N133+k6TK367rvvIAiCrsP4pP36668oX748evbsqdV69e18MavC2uaCWr9+PRITE+Hg4PDR/eC9YsUKtT88FS9eHN27d8fmzZvh4+MDiUTygaOjTxWTtlSkZb+s/ObNmwgNDc31cvPk5GSULFlS6/FIJBIUL15c6/VqwsDAQGfr1jZdtuOHEB8fjxIlSuSasJXJZEhPT/+o26Kw7N27F1WrVoWTk5OuQ8kzQRCQmpqKEiVK6DqUQqUvP1gYGBigQ4cO+P333zFhwgSehBMRqRETE4NJkyahatWq2LJlCypVqiQ+N3DgQDx58gQhISEfNKbCPFdKSkqCiYlJvgZEUNFQlM8X8yMjIwMymaxA52Dbtm1D1apVIZFI0KBBAy1Gp1v37t3Dr7/+irFjx2LFihUqy3Tq1Anr16/HpUuX0LRp0w8cIX2qOD0CffR8fHzQuXNn3L59GwMHDkT9+vXxww8/AABOnjyJkSNHwt3dHfb29mjbti1Wr14NqVSqVM/OnTvRtm1bODo6onfv3rh69apSGVXzQfn7+6NBgwZ48eIFxo4diwYNGqBJkyZYtGiR0npevXqFb775Bs7OznBxcYGfnx/u3r2r0RxTqua0lW97VFQUfHx8UL9+fTRv3hzr1q1TeF3v3r0BANOmTROnlMi6vps3b2LYsGFo2LAh6tevD29vb4SHhyusf+XKlbCxscGTJ0/g7+8PFx
cXNGzYENOmTUNycrJC2dDQUAwYMAAuLi5o0KABOnToIPaJqnb09/dHYGAgAMUpMQRBQOvWrTFmzBil9khNTUXDhg0xa9YstW3WuXNnlXMdyWQyNG/eHBMmTBCXHT58GD179kSDBg3g7OyMLl26YMuWLWrrVke+XUlJSUptLZ/77cCBA/Dy8oKDgwPOnTsHIHMKi2nTpsHNzQ329vbw8vLCnj17lOr/+++/MXbsWDg5OaFp06aYP38+zp07p7RvtG7dWuXlTKrmf0pLS8OKFSvQrl072Nvbw8PDA4sXL1aa00ke/8mTJ9G5c2cxzrNnzyqt58WLF5g+fbr43mvdujVmz56NtLQ0xMTEwMbGBps3b1Z63bVr12BjY4NDhw7l2M6nTp1CkyZNNErAxcfHY/r06XBzc4ODgwO6du2KoKAgpXL5fX/u27dPHMHp6+urNG1L69atMWrUKJw7dw49e/aEo6OjOOJj79698PX1RdOmTWFvbw9PT0/s2LFDaR3yOq5evYrevXvDwcEBbdq0wf79+xXKpaenY9WqVWjfvj0cHBzg6uqKAQMGIDQ0VCxz9+5d+Pv7i5e9NmvWDNOmTcOrV6+U1ptTP+a23ar2NU36Qv75sGHDBvFz2d7eHr169UJERIRC2bi4OEybNg0tWrSAvb093N3dMWbMGKXpGtzc3PD06VO9niKGiEjX1q9fj6SkJMybN08hYStXo0YNDBo0SGl5bucF6uYGlZ9bZqXuXEk+D2p4eDgWLFiAJk2awMnJCePGjUNCQkKu2yY/X4+OjsaIESPQoEEDTJkyRW18+TkvfPPmDXr37o0WLVrg4cOHKsvcunULNjY2Ks9D5Odzf/zxBwDg/fv3mDdvHlq3bg17e3s0bdoUQ4YMwZ07d3LdXk1dvHgRX3zxBZycnODi4oIxY8bgwYMH4vOXLl2CjY0NTpw4ofTagwcPwsbGBtevXweQt/OLrFq3bo379+/j8uXL4rmEj49PoZ8vnjlzBt7e3mIf9+rVCwcPHlRbl7r7i6j6fpjb+Ym6bZZ7+/Yt5s2bBw8PD9jb26Ndu3b45ZdfIJPJlNa7YcMGbN68GW3btoWDg4PYf9u2bYOXlxfq16+PRo0aoWfPnjlun5yFhUW+f+B+/fo1Fi1ahC5duojtOnz4cNy9e1ehnLwtjxw5gv/9739o0aIFHBwcMGjQIDx58kSpXk2+p+dm3rx5aNu2LVxcXNSWsbe3R7ly5XDq1Kk810+UXxxpS5+E169fY8SIEfDy8kLXrl1hbm4OAAgKCoKJiQmGDBkCExMTXLp0CStWrMD79+/h5+cnvn737t2YNWsWGjRogEGDBiEmJgZjxoxB2bJlUaVKlVzXL5VKMWzYMDg6OmLq1Km4ePEiNm7ciOrVq+OLL74AkJkoHDNmDCIiIjBgwADUqlULp06dUogjP968eYPhw4ejXbt26NSpE44dO4alS5fC2toaHh4eqF27NiZMmIAVK1agX79+aNiwIQDA2dkZQObJ2ogRI2Bvb48vv/wSEokE+/btw6BBg7Bjxw44OjoqrG/ixImoVq0aJk+ejD///BO7d++GmZkZvvnmGwDA/fv3MWrUKNjY2GDChAkwNjbGkydPcO3aNbXb0K9fP7x8+VJp6guJRIIuXbpgw4YNeP36NcqVKyc+d/r0abx//x5du3ZVW2+nTp2watUqxMXFoWLFiuLy8PBwvHz5Ep6engAyk8yTJ09G06ZNxRP4hw8f4tq1ayq/nORk8eLF2LVrFyIiIvD9998D+K+tgcyT36NHj2LgwIEoX748LCws8M8//6Bv376QSCQYOHAgzMzMcPbsWcyYMQPv37/H4MGDAQApKSkYNGgQnj9/Dh8fH1SqVAm///47Ll26lKcYs5Lvl+Hh4ejbty9q166Ne/fuYcuWLXj8+DF+/vlnhfLh4eE4fvw4vvjiC5QqVQrbtm3DhAkT8Mcff6B8+fIAMhN9vXv3xrt379C3b1/UqlULL168wL
Fjx5CSkoLq1avD2dkZBw4cELdN7uDBgyhVqhTatGmjNuYXL17g2bNnqFevXq7bl5KSAh8fH0RHR2PgwIGoVq0agoO
"text/plain": [
"<Figure size 1400x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABX8AAAGGCAYAAAAjAPI0AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XdYU9cbwPEvU0E2iCICCgqigmjds446autq3Vtr3at1a62z1lUXDtx71b23WPfe4p6IypYtK78/+BGNTGkwSN/P89xHc3POzTknNyF5c+57tBQKhQIhhBBCCCGEEEIIIYQQuYq2phsghBBCCCGEEEIIIYQQQv0k+CuEEEIIIYQQQgghhBC5kAR/hRBCCCGEEEIIIYQQIheS4K8QQgghhBBCCCGEEELkQhL8FUIIIYQQQgghhBBCiFxIgr9CCCGEEEIIIYQQQgiRC0nwVwghhBBCCCGEEEIIIXIhCf4KIYQQQgghhBBCCCFELiTBXyGEEEIIIYQQQgghhMiFJPgrhBBCpKFOnTqMGDFC0834rLZt24aLiwu+vr6abooQQqSwZMkSGjZsSGJiotqOOW/ePFxcXAgODlbbMcW/M2PGDFq2bKnpZgghhBC5ggR/hRBC/Oc8f/6csWPHUrduXdzc3ChXrhxt2rRh1apVxMTEfJY2REdHM2/ePM6fP/9ZHu9LtXv3blauXKnpZuRoz58/x83NDRcXF27evJni/rCwMH777TcqV66Mh4cHHTt25Pbt26ke6+jRozRv3hw3Nze+/vpr5s6dS3x8fIZtePjwIfPmzZMfDT6TK1euMG/ePMLCwtR2zDp16tCzZ88U+3fs2IGrqyvdu3fn3bt3/+oxXFxcUt0WL16cqfoREREsXbqUHj16oK395X+Nkfe3tHXu3Jm7d+9y9OhRTTdFCCGE+OLparoBQgghxOfk7e3NwIED0dfXp2nTpjg7OxMXF8fly5eZPn06Dx8+ZOLEidnejujoaDw9PenXrx+VKlXK9sfLrKZNm9K4cWP09fU13RQA9uzZw4MHD+jSpYumm5Jj/fHHH+jq6hIbG5vivsTERH7++Wfu3btH9+7dMTc3Z/369XTs2JFt27ZRpEgRZdkTJ07Qt29fKlasyG+//cb9+/dZuHAhQUFBjB8/Pt02PHz4EE9PTypWrEjhwoXV3UXxkatXr+Lp6Unz5s0xMTHJtsfZtWsXI0eOpGrVqixYsIA8efL862NWq1aNpk2bquwrWbJkpupu2bKF+Ph4vvvuu3/djpxA3t/Slj9/furWrcvy5cupW7euppsjhBBCfNEk+CuEEOI/48WLFwwePJhChQqxatUqrK2tlfe1b9+eZ8+e4e3trbkGqkFUVBSGhoZZrq+jo4OOjo4aW5QzRUdHY2BgoOlm/GsnT57k1KlT/PTTTyxcuDDF/QcOHODq1avMmTOHhg0bAtCoUSMaNGjAvHnzmDlzprLstGnTcHFxYfny5ejqJn1EzJcvH15eXnTq1AknJ6fP0ymRI+zdu5cRI0ZQuXJltQV+AYoUKZIi+JtZ27Zto06dOhm2JT4+nsTExBzzI5bI2t+mRo0aMXDgQF68eIGdnV02tUwIIYTI/b7866WEEEKITFq6dClRUVFMnjxZJfCbzMHBgc6dO6dZPzkv5MdSy5N78+ZNunfvTqVKlXB3d6dOnTqMHDkSAF9fX6pUqQKAp6en8tLnefPmKes/evSIAQMGULFiRdzc3GjRokWKy1+TH/fChQuMGzeOKlWqUKtWrXTHYM2aNTRu3JgyZcpQoUIFWrRowe7du9PtS2JiIvPmzaN69eqUKVOGjh078vDhwxQ5kZPrXr58mSlTpijTDPTt2zdFLs0jR47w888/U716dUqXLk29evWYP38+CQkJyjIdO3bE29ubly9fKseoTp06abYT4Pz587i4uKik0+jYsSPfffcdt27don379pQpU4a//voLgNjYWObOncs333xD6dKlqV
WrFtOmTUsxi/b06dO0bduW8uXLU7ZsWRo0aKA8hqbExcUxefJkOnXqhL29faplDh48iJWVFfXr11fus7CwoFGjRhw9elTZz4cPH/Lw4UNatWqlDPwCtGvXDoVCwcGDB9Nsx7Zt2xg4cCAAnTp1Uj5XHz4H69ato3HjxpQuXZrq1aszfvz4FCkLPnye2rRpo3zdbNiwIVPjsXXrVjp16kSVKlUoXbo03377LevXr0+17IkTJ+jQoQNly5alXLly/PDDDyqvA4Dr16/To0cPKlSogIeHB99//z2rVq1SKXP27FnatWuHh4cH5cuXp3fv3jx69EilzIgRI5Tn7YdSez9xcXFhwoQJHDlyhO+++47SpUvTuHFj/vnnH5V606ZNA6Bu3brK8U5+LajjXN23bx9Dhw6lYsWKLFy4UG2B32QxMTGfnELixYsX3Lt3j6pVq6rs9/X1xcXFhWXLlrFy5Urq1auHm5ub8nnIzHOULCQkhIEDB1KuXDkqVarEpEmTVNqZ/Fjbtm1LUffj9/CIiAgmT55MnTp1KF26NFWqVKFr167KlCvpvb+lZsSIEWmmzfjwcVMTFxeHp6cn9evXx83NjUqVKtG2bVtOnz6tUu7Ro0cMHDiQypUr4+7uToMGDZg1a5ZKmTt37vDTTz9Rrlw5ypYtS+fOnbl27ZpKmYz+Np04cUL5nJQtW5aff/6ZBw8epGh38nMtqR+EEEKIf0dm/gohhPjPOH78OHZ2dpQrVy5bHycoKEh5if3PP/+MiYkJvr6+HD58GEgKvo0bN45x48bxzTff8M033wAoA0EPHjygbdu2FChQgB49emBoaMj+/fvp27cv8+bNU5ZPNn78eCwsLOjbty9RUVFptmvz5s1MmjSJBg0a0KlTJ969e8e9e/e4fv0633//fZr1Zs6cydKlS6lduzY1atTg7t276eb/nDRpEiYmJvTr14+XL1+yatUqJkyYwOzZs5Vltm/fjqGhIV27dsXQ0JBz584xd+5cIiIiGD58OAC9evUiPDyc169fKwPn+fLly2D0UxcaGkqPHj1o3LgxTZo0wdLSksTERHr37s3ly5dp1aoVTk5O3L9/n1WrVvH06VMWLFgAJD0fPXv2xMXFhQEDBqCvr8+zZ8+4cuVKltqiLqtWrSIsLIw+ffpw6NChVMv4+PhQsmTJFPlR3dzc2LRpE0+ePMHFxYU7d+4o93+oQIECFCxYEB8fnzTbUaFCBTp27MiaNWvo1asXjo6OAMqZwvPmzcPT05OqVavStm1bnjx5woYNG7h58yYbNmxAT09Peay3b9/y888/06hRIxo3bsz+/fsZN24cenp6/Pjjj+mOx4YNGyhevDh16tRBV1eX48ePM378eBQKBe3bt1eW27ZtG6NGjaJ48eL07NkTY2NjfHx8OHnypPJ1cPr0aXr27Im1tTWdOnXCysqKR48e4e3trfyB6MyZM/To0YPChQvTr18/YmJiWLt2LW3btmXbtm1ZTn9x+fJlDh06RLt27ciXLx9r1qxhwIABHD9+HHNzc7755huePn3Knj17GDlyJObm5kDS+4o6ztWDBw8ydOhQypcvz6JFi8ibN2+KMm/fvlX5oSYtBgYGKWbYb9++nfXr16NQKHBycqJ3797pvv8ku3r1KpB2ioht27bx7t07WrVqhb6+Pqampp/8HA0aNAhbW1t+/fVXrl27xpo1awgLC1MG2z/F77//zsGDB+nQoQNOTk6EhoZy+fJlHj16RKlSpT75/a1169bKHw2TnTx5kt27d2NhYZFuWzw9PfHy8qJly5a4u7sTERHBrVu3uH37NtWqVQPg7t27tG/fHl1dXVq3bo2trS3Pnz/n2LFjDB48GEh6L2zfvj358uXjp59+QldXl02bNtGxY0fWrl1LmTJlVB43tb9NO3bsYMSIEVSvXp0hQ4YQHR3Nhg0baNeuHdu3b1d5ToyNjbG3t+fKlSuSGkMIIYT4FyT4K4QQ4j8hIiKCN2/efJbcgV
evXuXt27csW7ZMJZiW/AXa0NCQBg0aMG7cOFxcXFJcAj158mRsbGzYunWr8rLldu3a0bZtW2bMmJEi+GtqasrKlSs
"text/plain": [
"<Figure size 1600x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Medians K=5 (Top400) ===\n",
" log_aum_qty_mean gross_flow_to_aum flow_freq n_tx_total n_isin_total avg_holding_months_per_isin exit_rate_per_isin flow_direction_balance aum_drawdown_last aum_final_to_peak months_since_last_tx corr_flow_fund_lag3 corr_flow_fund_lag6 corr_flow_rate_lag3\n",
"cluster_k5 \n",
"0 10.975 1.488 0.557 819.0 25.0 52.905 0.567 -0.487 1.000 0.000 0.0 0.002 0.016 -0.024\n",
"1 11.174 1.389 0.043 4.0 2.0 42.429 0.250 0.557 0.303 0.697 19.0 -0.000 -0.007 -0.012\n",
"2 10.357 4.383 0.372 90.5 12.5 32.149 0.434 0.287 0.077 0.923 1.0 0.042 0.025 -0.034\n",
"3 11.045 5.471 0.777 1448.0 24.0 40.857 0.688 0.245 1.000 0.000 0.0 0.009 -0.008 0.005\n",
"4 11.994 5.155 0.926 4935.5 47.5 57.100 0.620 0.037 1.000 0.000 0.0 0.158 0.130 -0.140\n",
"\n",
"=== Overall churn rates ===\n",
"churn_hard 0.684\n",
"churn_soft 0.775\n",
"churn_warning 0.321\n",
"dtype: float64\n",
"\n",
"=== Churn per cluster K=2 ===\n",
" n_clients churn_hard churn_soft churn_warning\n",
"cluster_k2 \n",
"0 325 0.883 0.969 0.385\n",
"1 102 0.049 0.157 0.118\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAGGCAYAAADmRxfNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAThFJREFUeJzt3XlYVHX///EXDCAirqhk7lkzLmDgrmmampamuWRqLuWWmUuZpXBbpqahprlQuYtrbrlkRWrlbZmit5kmLm3uWy6guQACw/z+8Md8HVkEDzCgz8d1ceV8zvY+M2em8zrnc85xsdlsNgEAAACAAa7OLgAAAABA3kewAAAAAGAYwQIAAACAYQQLAAAAAIYRLAAAAAAYRrAAAAAAYBjBAgAAAIBhBAsAAAAAhhEsAAAAABhGsAByGYvForFjxzq7jAdC06ZN1b9/f2eXIenW5x4aGursMoAH2ty5c/XMM88oKSnJ2aWkaujQoXrjjTecXQaQJoIFkENOnjypUaNGqVmzZvL391eNGjXUpUsXLVq0SHFxcc4uz2l+/PFHdqiz2fnz5xUaGqrDhw87uxRDLBZLhv527dqVo3X98ssv9mVHR0enGH7+/Hm98cYbqlWrlmrUqKEBAwbo1KlTqc5r9erVevbZZ+Xv768WLVpoyZIl2V1+njRr1ix9//33WTrP69eva968eerXr59cXf9v9yitgz2zZs2SxWJRcHCwoSBy5MgRTZo0Sc8//7wCAwPVsGFDvfrqq4qMjEwxbr9+/bR582b9/vvv97w8IDu5ObsA4EGwdetWvfHGG/Lw8NDzzz8vs9mshIQE7dmzRx999JH+/vtvffDBB84u0yl+/PFHLVu2TIMHD3Z2KfetCxcu6JNPPlHp0qVVpUoVZ5dzzyZNmuTw+ssvv9T27dtTtFeqVCnHakpKStK4cePk5eWlmJiYFMNv3Lihnj176tq1a+rfv7/c3d21cOFCde/eXevXr1fRokXt465YsULvv/++WrZsqV69eumXX37RuHHjFBsbq1dffTXH1ikvmD17tlq2bKnmzZtn2Ty/+OILJSYm6rnnnrvruHPmzNHUqVPVvn17jR8/3iGI3Mtyv/jiC7Vo0UIvvfSSrl27ppUrV6pz586aN2+eGjRoYB+3atWq8vPz04IFC1Js90BuQLAAstmpU6c0dOhQPfzww1q0aJFKlixpH9atWzedOHFCW7duzdGakpKSlJCQoHz58mX5vGNiYuTl5ZXl80Xuk9Of9fPPP+/w+rffftP27dtTtOeklStX6ty5c3rhhRe0ePHiFMM///xzHT9+XKtXr1b16tUlSY0aNVKbNm0UFhamt956S5IUFxenqVOnqkmTJpoxY4Yk6cUXX1RSUpJmzpypzp07q3Dhwjm3Yg+gtWvXqmnTpnf9XZw3b56mTJmidu3a6cMPPzQUKiSpdevWGjRokAoUKGBv69ixo1q1aqXQ0FCHYCFJzz77rEJDQ3Xjxg2HaYDcgK5QQDabN2+eYmJiNH78eIdQkax8+fJ6+eWXU7R///33eu655+Tn56fWrVvrp59+chgeFBSkpk2bppguNDRUFovFoS35VP6GDRvUunVr+fv7a9u2bVq7dq0sFov27NmjkJAQ1atXTwEBARo4cGCqXTruFBQUpMDAQJ08eVL9+vVTYGCg3n77bUm3uocMGTJETZo0kZ+fnxo3bqwPP/zQodtXUFCQli1bZq8x+S9ZUlKSFi5caK+5QYMGGjVqlP7991+HOiIjI9WnTx/VrVtX1atXV9OmTRUcHHzX+pP9/PPPev755+Xv769WrVpp8+bN9mGnTp2SxWLRwoULU0z366+/ymKx6Ouvv053/jdv3lRoaKhatmwpf39/NWzYUIMGDdLJkyfTnCYzn+/27dvVtWtX1apVS4GBgWrZsqU+/vhjSdKuXbv0wgsvSJKCg4Pt7/HatWvt0//222/q06ePatasqccff1zdu3fXnj
17Ul3u33//rWHDhql27dp66aWX0l1vZ4iJidGECRPUuHFj+fn5qWXLlpo/f75sNpvDeLd/J5I/lw4dOmj37t0ZXtaVK1c0bdo0DRkyRIUKFUp1nE2bNsnf398eKqRbZ1Tq16+vb7/91t62a9cuXblyJcV72q1bN8XExNz14MOZM2c0evRotWzZUtWrV1fdunU1ZMgQnT59OsW4V69e1YcffqimTZvKz89PTz75pIYPH+7wnc/INpuR9/r06dMptrdkd15XlLyNnThxQkFBQapVq5Zq1qyp4OBgxcbGOkwXExOjdevW2bfnoKAgSbe6M40fP96+bvXr11evXr108ODBdN+/U6dO6Y8//kixE3+nsLAwffTRR2rbtq1CQkIMhwpJ8vPzSxEQihYtqlq1auno0aMpxm/QoIFiYmK0Y8cOw8sGshpnLIBs9t///ldly5ZVjRo1MjzNnj17tHnzZr300ksqUKCAlixZoiFDhui///2vQ9eJzNi5c6e+/fZbdevWTUWLFlXp0qV19epVSdK4ceNUqFAhDRo0SGfOnNGiRYs0duxYTZs27a7zTUxMtO+UjhgxQp6enpKkjRs3Ki4uTl27dlWRIkW0f/9+LV26VP/884/9iGznzp114cKFVLuzSNKoUaO0bt06dejQQT169NDp06e1bNkyHTp0SMuXL5e7u7uioqLUp08fFS1aVK+++qoKFSqk06dP67vvvsvQ+3L8+HENHTpUXbp0Ufv27bVmzRq98cYbmjdvnp544gn7Z7dhwwa98sorDtN+9dVXKlCggJo1a5bm/K1Wq/r376+IiAi1bt1aPXv21I0bN7R9+3b9+eefKleuXIbqTMtff/2l/v37y2KxaMiQIfLw8NCJEyf066+/Srq1EztkyBDNmDFDnTt3Vs2aNSXJvj1GRESoX79+8vPz06BBg+Ti4qK1a9fq5Zdf1ueff+6wQyxJb7zxhsqXL6+hQ4em2Fl3NpvNpgEDBtjDVJUqVbRt2zZNmjRJ58+f13/+8x+H8Xfv3q3w8HD16NFDHh4eWr58ufr27avVq1fLbDbfdXnTp09XiRIl1KVLF3322WcphiclJemPP/5Qx44dUwzz9/fXzz//rOvXr8vb21uHDh2SdGsn83bVqlWTq6urDh8+nO6ZmcjISO3du1etW7fWQw89pDNnzmj58uXq2bOnvvnmG+XPn1/Sra5Z3bp105EjR9SxY0dVrVpVly9f1pYtW3T+/HkVK1YsQ9tsZt/rzHjzzTdVpkwZvfXWWzp06JBWr16tYsWK6Z133pF0q0vcu+++q+rVq+vFF1+UJPv36P3339emTZvUvXt3VapUSVeuXNGePXt05MgRVatWLc1l7t27V9KtrkZpWbRokSZMmKDnnntOEyZMSDVUZOSAjCR5e3vLw8Mj3XEuXryoIkWKpGh/9NFH5enpqV9//VVPP/10hpYH5BgbgGxz7do1m9lstg0YMCDD05jNZlu1atVsJ06csLcdPnzYZjabbUuWLLG3jRgxwvbUU0+lmH7GjBk2s9mcYp6VK1e2/fXXXw7ta9assZnNZtsrr7xiS0pKsrd/+OGHtipVqtiuXr2abq0jRoywmc1m2+TJk1MMi42NTdE2e/Zsm8VisZ05c8beNmbMmBT12mw22+7du21ms9m2YcMGh/affvrJof27776zmc1m2/79+9OtNTVPPfWUzWw22zZt2mRvu3btmu2JJ56wtWvXzt62YsUKm9lstv3999/2tvj4eFvdunVtI0aMSHcZX3zxhc1sNtvCwsJSDLv9PTebzbYZM2bYX2f08w0LC7OZzWZbVFRUmjXs37/fZjabbWvWrEmx/BYtWth69+7tUEtsbKytadOmtl69eqVY7ltvvZXu+uakO7ed5G3hs88+cxhv8ODBNovF4vCdMpvNNrPZbIuMjLS3nTlzxubv728bOHDgXZd9+PBhW5UqVWzbtm2z2Wz/9/7c/jlERUXZzGaz7ZNPPkkx/dKlS2
1ms9l25MgR+7pUqVIl1WXVq1fPNnTo0HTrSe37tnfvXpvZbLatW7fO3jZ9+nSb2Wy2bd68OcX4ydtARrbZjL7Xp06
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"=== Churn per cluster K=5 ===\n",
" n_clients churn_hard churn_soft churn_warning\n",
"cluster_k5 \n",
"0 67 0.612 0.925 0.955\n",
"1 37 0.108 0.297 0.108\n",
"2 62 0.000 0.032 0.048\n",
"3 137 0.964 0.978 0.117\n",
"4 124 0.927 0.984 0.403\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAxYAAAGGCAYAAADmRxfNAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAUwlJREFUeJzt3Xd0FOX/9vErWRKSEGqoIh13KQkC0gVBQFCaFBGQIlVEFEUUki9KEwhSBANIN7RQpQiIgIgoQkBElICISG9SEpCShCS7+/zBk/2xpJBkk2wC79c5nENmZ+757Oy95Zq5Z8bFarVaBQAAAAAOcHV2AQAAAACyP4IFAAAAAIcRLAAAAAA4jGABAAAAwGEECwAAAAAOI1gAAAAAcBjBAgAAAIDDCBYAAAAAHEawAAAAAOAwggWQxZhMJo0ZM8bZZTwWGjdurP79+zu7DEn3Xvfp06c7uwzgsTZv3jy9+OKLslgszi4lUa+++qomTpzo7DKAJBEsgExy9uxZjRgxQk2aNJGfn5+qV6+uzp07a9GiRYqOjnZ2eU7z448/8oM6g12+fFnTp0/X0aNHnV2KQ0wmU4r+7du3L1Pr+vXXX23rjoiISPD45cuX9e6776pGjRqqXr26BgwYoHPnziXa1urVq/XSSy/Jz89PzZo105IlSzK6/Gxp9uzZ2r59e7q2efv2bc2fP1/9+vWTq+v//TxKamfP7NmzZTKZFBAQ4FAQOX/+fJJ9+ZtvvrGbt1+/flq2bJmuXr2a5vUBGSmHswsAHgc7d+7Uu+++K3d3d7388ssyGo2KjY3VgQMHNGnSJP3zzz/65JNPnF2mU/z4448KCQnRO++84+xSHllXrlzRjBkzVLx4cVWsWNHZ5aTZg3tqv/76a+3evTvB9HLlymVaTRaLRWPHjpWXl5ciIyMTPH7nzh316NFDt27dUv/+/eXm5qaFCxeqW7duWr9+vfLnz2+bd8WKFRo5cqSaN2+uXr166ddff9XYsWMVFRWlN954I9OeU3YwZ84cNW/eXE2bNk23Nr/66ivFxcWpVatWD5137ty5mjp1qtq1a6dx48bZBZG0atWqlZ577jm7aVWrVrX7u0mTJvL29tayZcv07rvvOrxOIL0RLIAMdu7cOQ0ePFhPPPGEFi1apMKFC9se69q1q86cOaOdO3dmak0Wi0WxsbHKmTNnurcdGRkpLy+vdG8XWU9mv9Yvv/yy3d9//PGHdu/enWB6Zlq5cqUuXbqkV155RYsXL07w+LJly3T69GmtXr1aVapUkSQ1aNBArVu3VnBwsN5//31JUnR0tKZOnapGjRopKChI0r1hLxaLRbNmzVKnTp2UN2/ezHtij6G1a9eqcePGD/1cnD9/vqZMmaK2bdtq/Pjx6RIqJKlSpUoP7cuurq5q3ry5vv76aw0aNEguLi7psm4gvTAUCshg8+fPV2RkpMaNG2cXKuKVKlVKr7/+eoLp27dvV6tWreTr66uWLVvqp59+snvc399fjRs3TrDc9OnTZTKZ7KbFH8rfsGGDWrZsKT8/P+3atUtr166VyWTSgQMHFBgYqDp16qhq1aoaOHBgokM6HuTv769q1arp7Nmz6tevn6pVq6YPPvhA0r3hIYMGDVKjRo3k6+urhg0bavz48XbDvvz9/RUSEmKrMf5fPIvFooULF9pqrlevnkaMGKH//vvPro6wsDD16dNHtWvXVpUqVdS4cWMFBAQ8tP54P//8s15++WX5+fmpRYsW2rZtm+2xc+fOyWQyaeHChQmW++2332QymbRp06Zk2797966mT5+u5s2by8/PT/Xr19fbb7+ts2fPJrlMal7f3bt3q0uXLqpRo4aqVaum5s2b67PPPpMk7du3T6+88ookKSAgwLaN165da1v+jz/+UJ8+ffTMM8/o6aefVrdu3XTgwIFE1/vPP/9oyJAhqlmzpl577bVkn7czREZGasKECWrYsKF8fX3VvHlzLViwQFar1W
6++98T8a9L+/bttX///hSv68aNG5o2bZoGDRqkPHnyJDrP1q1b5efnZwsV0r0jKnXr1tW3335rm7Zv3z7duHEjwTbt2rWrIiMjH7rz4cKFCxo1apSaN2+uKlWqqHbt2ho0aJDOnz+fYN6bN29q/Pjxaty4sXx9ffXcc89p6NChdu/5lPTZlGzr+GE+9/e3eA+eVxTfx86cOSN/f3/VqFFDzzzzjAICAhQVFWW3XGRkpNatW2frz/7+/pLuDWcaN26c7bnVrVtXvXr10pEjR5LdfufOndOxY8dUr169ZOcLDg7WpEmT1KZNGwUGBqZbqIgXGRmpmJiYZOepV6+eLly4kO2HNuLRxBELIIP98MMPKlGihKpXr57iZQ4cOKBt27bptddeU65cubRkyRINGjRIP/zwg93QidTYu3evvv32W3Xt2lX58+dX8eLFdfPmTUnS2LFjlSdPHr399tu6cOGCFi1apDFjxmjatGkPbTcuLs72o3TYsGHy8PCQJG3ZskXR0dHq0qWL8uXLp0OHDmnp0qX6999/bXtkO3XqpCtXriQ6nEWSRowYoXXr1ql9+/bq3r27zp8/r5CQEP35559avny53NzcFB4erj59+ih//vx64403lCdPHp0/f17fffddirbL6dOnNXjwYHXu3Fnt2rXTmjVr9O6772r+/Pl69tlnba/dhg0b1LNnT7tlN27cqFy5cqlJkyZJtm82m9W/f3+FhoaqZcuW6tGjh+7cuaPdu3fr77//VsmSJVNUZ1KOHz+u/v37y2QyadCgQXJ3d9eZM2f022+/Sbr3I3bQoEEKCgpSp06d9Mwzz0iSrT+GhoaqX79+8vX11dtvvy0XFxetXbtWr7/+upYtW2b3g1iS3n33XZUqVUqDBw9O8GPd2axWqwYMGGALUxUrVtSuXbs0ceJEXb58Wf/73//s5t+/f782b96s7t27y93dXcuXL1ffvn21evVqGY3Gh67v888/V6FChdS5c2d98cUXCR63WCw6duyYOnTokOAxPz8//fzzz7p9+7a8vb31559/SpJ8fX3t5qtcubJcXV119OjRZPdmh4WF6eDBg2rZsqWKFi2qCxcuaPny5erRo4e++eYbeXp6Sro3NKtr1646ceKEOnTooEqVKun69evasWOHLl++rAIFCqSoz6Z2W6fGe++9pyeffFLvv/++/vzzT61evVoFChTQhx9+KOnekLiPPvpIVapU0auvvipJtvfRyJEjtXXrVnXr1k3lypXTjRs3dODAAZ04cUKVK1dOcp0HDx6UdO+oQVIWLVqkCRMmqFWrVpowYUKioSIlO2QkydvbW+7u7nbTZsyYoYkTJ8rFxUWVK1fW4MGDVb9+/QTLxveR3377Ldl6AaewAsgwt27dshqNRuuAAQNSvIzRaLRWrlzZeubMGdu0o0ePWo1Go3XJkiW2acOGDbM+//zzCZYPCgqyGo3GBG1WqFDBevz4cbvpa9assRqNRmvPnj2tFovFNn38+PHWihUrWm/evJlsrcOGDbMajUbr5MmTEzwWFRWVYNqcOXOsJpPJeuHCBdu00aNHJ6jXarVa9+/fbzUajdYNGzbYTf/pp5/spn/33XdWo9FoPXToULK1Jub555+3Go1G69atW23Tbt26ZX322Wetbdu2tU1bsWKF1Wg0Wv/55x/btJiYGGvt2rWtw4YNS3YdX331ldVoNFqDg4MTPHb/NjcajdagoCDb3yl9fYODg61Go9EaHh6eZA2HDh2yGo1G65o1axKsv1mzZtbevXvb1RIVFWVt3LixtVevXgnW+/777yf7fDPTg30nvi988cUXdvO98847VpPJZPeeMhqNVqPRaA0LC7NNu3DhgtXPz886cODAh6776NGj1ooVK1p37dpltVr/b/vc/zqEh4dbjUajdcaMGQmWX7p0qdVoNFpPnDhhey4VK1ZMdF116tSxDh48ONl6Enu/HTx40Go0Gq3r1q2zTfv888+tRqPRum3btgTzx/
eBlPTZlG7rc+fOJdr3rNaEfT5+GwYEBNjNN3DgQGutWrXsplWtWjXR994zzzxjHT16dILpDzN16lSr0Wi03r59O9E
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAuEAAAKyCAYAAAB7WgDLAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzsvXecZUWZPv7UyTffvp17untyACY4IAxJUIKKoCxZlMEALKyk1d0V+MpPQREUWZRkZAUWwQSigCQFUYkSlDBMYlJP93TuvvneE+v3R9Wpus3MwIAjDO55Ph8+9Nx77jl16tSpeut9n/d5CaWUIkKECBEiRIgQIUKECG8blHe6AREiRIgQIUKECBEi/F9DZIRHiBAhQoQIESJEiPA2IzLCI0SIECFChAgRIkR4mxEZ4REiRIgQIUKECBEivM2IjPAIESJEiBAhQoQIEd5mREZ4hAgRIkSIECFChAhvMyIjPEKECBEiRIgQIUKEtxmRER4hQoQIESJEiBAhwtuMyAiPECFChAgRIkSIEOFtRmSER9hl8Ktf/Qrz589Hf3//O92UdxTLly/H8uXL3+lmRNgGomcTIQLw4osvYuHChRgYGHinm7JNXHXVVTjhhBPe6WZEiPCGiIzwdylCg/Wll15607+t1Wq47rrr8PTTT/8DWrbr4o9//COuu+66t/WaF154IebPny/+W7p0KQ499FCcd955ePDBBxEEwU65zvPPP4/rrrsOxWJxp5zvnUbYb3vuuSfq9fpW32/cuFH06f/8z/+86fMPDw/juuuuw8qVK3dGc3cZLF++fMp4295/b/d7UCwWsd9++2H+/Pl44IEHtvrecRx861vfwoEHHojFixfjhBNOwOOPP77Ncz3//PM4+eSTsWTJEhxwwAG47LLLUKlU3lR71q1bh/nz52PRokX/NO/MtnDPPffg5ptv3unn/fa3v40jjzwS06ZNE58tX74cRx111FbHPvnkk1iyZAmOOeYY5PP5v+u6hxxyyDbH85e//OUpx33qU5/CqlWr8PDDD/9d14sQ4R8N7Z1uQIS3H7VaDddffz3OOeccLFu27J1uztuGP/7xj7jttttw7rnnvq3XNQwDl112GQDAtm0MDAzgD3/4A8477zzss88++N73vodkMimOfytG5V//+ldcf/31OOaYY5BOp3da299JaJqGer2ORx55BB/5yEemfHfPPffANE3Ytv2Wzj0yMoLrr78e06ZNw2677bbDv3srz+btxFlnnYXjjz9e/Pull17CrbfeirPOOguzZs0Sn8+fP/9tbde11167zc1UiAsvvBAPPvggTj31VMyYMQN33XUX/vVf/xW33HIL3vve94rjVq5ciU9/+tOYPXs2LrzwQgwNDeHHP/4xNm7ciBtvvHGH23P33XejtbUVhUIBDz744D+t1/Tee+/F2rVr8elPf3qnnXPlypV44okn8LOf/ewNj33yySdx1llnYebMmbjpppuQzWb/7uvvtttu+MxnPjPls5kzZ075d2trKw499FD8+Mc/xqGHHvp3XzNChH8UIiM8wk5DtVpFPB5/p5vxtoJSCtu2YVnWdo/RNA1HH330lM8+//nP44c//CH++7//GxdffDG+853viO8Mw/hHNfddBcMwsOeee+K3v/3tVkb4vffei/e///148MEH35a21Go1xGKxXf7ZHHDAAVP+bZombr31Vuy///7v2IZ7zZo1+OlPf4rPfe5zuPbaa7f6/sUXX8Rvf/tbfPGLX8Rpp50GAPiXf/kXHHXUUbjqqqumGHtXX3010uk0br31VrFx7e7uxsUXX4zHHnsMBx544Bu2h1KKe+65B0cddRT6+/tx9913/9Ma4f8I3Hnnnejq6sJ73vOe1z3uL3/5C/7t3/4NM2bM2GkGOAC0t7dvNZ9uC0cccQTOP/98bN68GT09PTvl2hEi7GxEdJR/Ilx44YVYunQphoeH8bnPfQ5Lly7Fvvvui29+85
vwfR8A0N/fj/322w8AcP31128zPL1u3TrhpV20aBGOPfbYrcJ6IR3mL3/5Cy655BLst99+OPjgg1+3fevWrcP555+PfffdF4sXL8aHPvQhfPvb337d32wvdH7IIYfgwgsvFP92XRfXX389PvjBD2LRokVYtmwZTj75ZBHSvvDCC3HbbbeJc4b/hQiCADfffDOOPPJILFq0CPvvvz++/OUvo1AobHXdM888E3/+859x7LHHYvHixTvkEdoW/vVf/xUHHnggHnjgAWzYsEF8vi3e8a233oojjzwSS5Yswd57741jjz0W99xzDwDguuuuw5VXXgkAOPTQQ8W9hdz6O++8E6eeeir2228/LFy4EB/5yEdw++23b7NPzzzzTDz77LM4/vjjsWjRIhx66KH49a9/vdWxxWIRl19+OQ455BAsXLgQBx10EL74xS9iYmJCHOM4Dq699locfvjhWLhwIQ4++GBceeWVcBxnh/voqKOOwp/+9KcplIEXX3wRGzdu3GboO5/P45vf/CY++tGPYunSpdhzzz1x+umnY9WqVeKYp59+WniLL7roItFfv/rVrwDIsPrLL7+MT37yk1iyZAmuvvpq8V3js7nggguwaNEirFu3bko7TjvtNOy9994YHh7e4Xt9O3HbbbfhyCOPxMKFC3HggQfi0ksv3YqW0dgPH//4x7F48WIccsgh+OlPf/qmrvX1r38dhx122BSPdiMeeOABqKqKk046SXxmmiaOP/54/PWvf8Xg4CAAoFwu44knnsDHPvaxKZGjo48+GvF4HPfff/8Otee5557DwMAAPvKRj+AjH/kInn32WQwNDW11XBAEuOWWW/DRj34UixYtwr777ovTTjttKwrgb37zGxx//PHi3fzkJz+Jxx57bMoxO9Lfr53TQrx2zD399NOYP38+7rvvPnzve9/DQQcdhEWLFuFTn/oUNm3aNOV3jz76KAYGBsQYP+SQQ8T3rzenvB4efvhh7LvvviCEbPeYZ599FmeeeSZ6e3tx0003oamp6Q3P+2bgOA6q1errHrP//vuL9kaIsKsi8oT/k8H3fZx22mlYvHgxvvjFL+LJJ5/Ej3/8Y/T09OATn/gEcrkcLrnkElxyySU4/PDDcfjhhwOQ4em1a9fi5JNPRnt7O8444wyxuJ199tm47rrrxPEhLr30UuRyOZx99tmvOymuWrUKn/zkJ6FpGk466SRMmzYNfX19eOSRR/D5z3/+777v66+/Hj/4wQ9wwgknYPHixSiXy3j55ZexYsUKHHDAATjppJMwMjKCxx9/XBisjfjyl7+Mu+66C8ceeyyWL1+O/v5+3HbbbXjllVfw05/+FLqui2M3bNiA//iP/8BJJ52EE088catQ6JvBxz72MTz22GN44okntnueX/ziF7jsssvwoQ99CKeeeips28bq1avxwgsv4KMf/SgOP/xwbNy4Effeey8uuugiseDlcjkAwE9/+lPMnTsXhxxyCDRNwx/+8AdceumloJTik5/85JRrbdq0Ceeffz6OP/54HHPMMbjzzjtx4YUXYo899sDcuXMBAJVKBZ/85Cexbt06HHfccdh9990xOTmJRx55BMPDw8jlcgiCAP/2b/+G5557DieeeCJmz56NNWvW4JZbbsHGjRvx3e9+d4f65/DDD8dXvvIVPPTQQ8JwvvfeezFr1izsvvvuWx2/efNm/P73v8eHP/xhdHd3Y2xsDD//+c9xyimn4Le//S3a29sxe/ZsnHfeebj22mtx0kknYa+99gIA7LnnnuI8+XweZ5xxBo488kh87GMfQ3Nz8zbb96UvfQlPPfUULrjgAvz85z+Hqqr42c9+hsceewxXXnkl2tvbd+g+305cd911uP7667H//vvj5JNPxoYNG/DTn/4UL7300lZjvVAo4F//9V9xxBFH4Mgjj8T999+PSy65BLquT6G9bA/3338//vrXv+K+++7bbhLfypUrMWPGjCmGNQAsXrxYfN/Z2YnVq1fD8z
wsXLhwynGGYWC33XbbYX7/Pffcg97eXixevBjz5s2DZVm49957cfrpp0857ktf+hJ+9atf4aCDDsLxxx8P3/fx7LP
"text/plain": [
"<Figure size 800x700 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===== K=2 =====\n",
" n_clients aum_qty_mean_med freq_med gross_flow_to_aum_med n_tx_med holding_med exit_rate_med flow_dir_med drawdown_med months_inactive_med corr_fund_lag3_med corr_rate_lag3_med\n",
"cluster_k2 \n",
"0 325 91586.099 0.786 4.452 1872.0 48.347 0.637 0.073 1.00 0.0 0.054 -0.041\n",
"1 102 47913.297 0.154 2.652 24.0 35.523 0.381 0.347 0.17 2.0 0.024 -0.025\n",
"\n",
"===== K=5 =====\n",
" n_clients aum_qty_mean_med freq_med gross_flow_to_aum_med n_tx_med holding_med exit_rate_med flow_dir_med drawdown_med months_inactive_med corr_fund_lag3_med corr_rate_lag3_med\n",
"cluster_k5 \n",
"3 137 62616.679 0.777 5.471 1448.0 40.857 0.688 0.245 1.000 0.0 0.009 0.005\n",
"4 124 161746.356 0.926 5.155 4935.5 57.100 0.620 0.037 1.000 0.0 0.158 -0.140\n",
"0 67 58391.143 0.557 1.488 819.0 52.905 0.567 -0.487 1.000 0.0 0.002 -0.024\n",
"2 62 31466.909 0.372 4.383 90.5 32.149 0.434 0.287 0.077 1.0 0.042 -0.034\n",
"1 37 71234.484 0.043 1.389 4.0 42.429 0.250 0.557 0.303 19.0 -0.000 -0.012\n"
]
}
],
"source": [
"# 1. 2D behavioral segmentation: relative intensity vs activity frequency\n",
"thr_int = dfc_top400[\"gross_flow_to_aum\"].median()\n",
"thr_freq = dfc_top400[\"flow_freq\"].median()\n",
"\n",
"def quadrant(row):\n",
" low_int = row[\"gross_flow_to_aum\"] < thr_int\n",
" low_frq = row[\"flow_freq\"] < thr_freq\n",
" if low_int and low_frq: return \"Dormant (low int, low freq)\"\n",
" if low_int and not low_frq: return \"Small rebalancers (low int, high freq)\"\n",
" if not low_int and low_frq: return \"Occasional large movers (high int, low freq)\"\n",
" return \"Highly active (high int, high freq)\"\n",
"\n",
"dfc_top400[\"seg_2D\"] = dfc_top400.apply(quadrant, axis=1)\n",
"print(dfc_top400[\"seg_2D\"].value_counts())\n",
"print(f\"thr_int: {thr_int:.4f} | thr_freq: {thr_freq:.4f}\")\n",
"\n",
"plt.figure(figsize=(9, 5))\n",
"for name, g in dfc_top400.groupby(\"seg_2D\"):\n",
" plt.scatter(g[\"flow_freq\"], g[\"gross_flow_to_aum\"], s=10, label=name)\n",
"plt.yscale(\"log\")\n",
"plt.axvline(thr_freq, linestyle=\"--\", color=\"gray\")\n",
"plt.axhline(thr_int, linestyle=\"--\", color=\"gray\")\n",
"plt.xlabel(\"Activity frequency (share of active months)\")\n",
"plt.ylabel(\"Gross flow / mean AUM [log scale]\")\n",
"plt.title(\"2D Behavioral Segmentation — Top 400 Accounts\")\n",
"plt.legend(markerscale=2)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
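"# 2. K=5 clusters in the 2D intensity-frequency space\n",
"# Note: KMeans cluster ids depend on the fit, so check these names against the K=5 median and churn tables before reusing them\n",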
"labels_map = {\n",
" 0: \"Cluster 0: Large & highly active movers\",\n",
" 1: \"Cluster 1: Occasional large movers\",\n",
" 3: \"Cluster 3: Dormant profiles\",\n",
" 4: \"Cluster 4: Loyal clients\"\n",
"}\n",
"\n",
"plt.figure(figsize=(9, 5))\n",
"for name, g in dfc_top400[~dfc_top400[\"cluster_k5\"].isin([2])].groupby(\"cluster_k5\"):\n",
" plt.scatter(\n",
" g[\"flow_freq\"], g[\"gross_flow_to_aum\"],\n",
" s=10, label=labels_map.get(int(name), f\"Cluster {int(name)}\")\n",
" )\n",
"plt.yscale(\"log\")\n",
"plt.axvline(thr_freq, linestyle=\"--\", color=\"gray\")\n",
"plt.axhline(thr_int, linestyle=\"--\", color=\"gray\")\n",
"plt.xlabel(\"Activity frequency (share of active months)\")\n",
"plt.ylabel(\"Gross flow / mean AUM [log scale]\")\n",
"plt.title(\"K=5 Clusters — Intensity / Frequency Space (excluding extreme outlier C2)\")\n",
"plt.ylim(0.1, 1000)\n",
"plt.legend(markerscale=2)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# 3. Dual view: trading intensity and churn/loyalty dimensions\n",
"if \"log_n_tx_total\" not in dfc_top400.columns:\n",
" dfc_top400[\"log_n_tx_total\"] = np.log1p(dfc_top400[\"n_tx_total\"].clip(lower=0))\n",
"\n",
"thr_log_tx = dfc_top400[\"log_n_tx_total\"].median()\n",
"thr_churn = dfc_top400[\"aum_drawdown_last\"].median()\n",
"thr_hold = dfc_top400[\"avg_holding_months_per_isin\"].median()\n",
"\n",
"color_map = {1: \"#ff7f0e\", 4: \"red\"}\n",
"\n",
"fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
"\n",
"# Plot 1: log_n_tx_total vs gross_flow_to_aum\n",
"for name, g in dfc_top400[~dfc_top400[\"cluster_k5\"].isin([2, 4])].groupby(\"cluster_k5\"):\n",
" axes[0].scatter(\n",
" g[\"log_n_tx_total\"], g[\"gross_flow_to_aum\"],\n",
" s=10, label=labels_map.get(int(name), f\"Cluster {int(name)}\")\n",
" )\n",
"axes[0].set_yscale(\"log\")\n",
"axes[0].axvline(thr_log_tx, linestyle=\"--\", color=\"gray\")\n",
"axes[0].axhline(thr_int, linestyle=\"--\", color=\"gray\")\n",
"axes[0].set_xlabel(\"Number of transactions (log1p of n_tx_total)\")\n",
"axes[0].set_ylabel(\"Gross flow / mean AUM [log scale]\")\n",
"axes[0].set_title(\"Trading intensity vs. transaction count\")\n",
"axes[0].set_ylim(0.1, 1000)\n",
"axes[0].legend(markerscale=2)\n",
"\n",
"# Plot 2: avg_holding_months_per_isin vs aum_drawdown_last\n",
"for name, g in dfc_top400[~dfc_top400[\"cluster_k5\"].isin([0, 2, 3])].groupby(\"cluster_k5\"):\n",
" axes[1].scatter(\n",
" g[\"avg_holding_months_per_isin\"], g[\"aum_drawdown_last\"],\n",
" s=10,\n",
" color=color_map.get(int(name), \"gray\"),\n",
" label=labels_map.get(int(name), f\"Cluster {int(name)}\")\n",
" )\n",
"axes[1].set_yscale(\"log\")\n",
"axes[1].axvline(thr_hold, linestyle=\"--\", color=\"gray\")\n",
"axes[1].axhline(thr_churn, linestyle=\"--\", color=\"gray\")\n",
"axes[1].set_xlabel(\"avg_holding_months_per_isin\")\n",
"axes[1].set_ylabel(\"aum_drawdown_last\")\n",
"axes[1].set_title(\"Churn risk vs. loyalty (clusters 1 and 4)\")\n",
"axes[1].set_ylim(0.001, 1.3)\n",
"axes[1].legend(markerscale=2)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# 4. Cluster signature heatmap — K=5\n",
"prof_louis_k5 = plot_heatmap(\n",
" dfc_top400, profile_vars_top400, \"cluster_k5\",\n",
" title=\"Cluster signatures — 400 top accounts K=5 (robust z-score)\",\n",
" figsize=(16, 4)\n",
")\n",
"print(\"\\n=== Medians K=5 (Top400) ===\")\n",
"print(prof_louis_k5.round(3).to_string())\n",
"\n",
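"# 5. Churn analysis\n",
"# churn_hard: final AUM below 10% of its peak\n",
"# churn_soft: final/peak AUM below 40% and last drawdown above 40%\n",
"# churn_warning: flow_direction_balance < 0 and last drawdown above 20%\n",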
"dfc_top400[\"churn_hard\"] = (dfc_top400[\"aum_final_to_peak\"] < 0.10).astype(int)\n",
"dfc_top400[\"churn_soft\"] = (\n",
" (dfc_top400[\"aum_final_to_peak\"] < 0.40) &\n",
" (dfc_top400[\"aum_drawdown_last\"] > 0.40)\n",
").astype(int)\n",
"dfc_top400[\"churn_warning\"] = (\n",
" (dfc_top400[\"flow_direction_balance\"] < 0) &\n",
" (dfc_top400[\"aum_drawdown_last\"] > 0.20)\n",
").astype(int)\n",
"\n",
"print(\"\\n=== Overall churn rates ===\")\n",
"print(dfc_top400[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]].mean().round(3))\n",
"\n",
"for k in [2, 5]:\n",
" churn_profile = (\n",
" dfc_top400.groupby(f\"cluster_k{k}\")\n",
" .agg(\n",
" n_clients = (ID_COL, \"count\"),\n",
" churn_hard = (\"churn_hard\", \"mean\"),\n",
" churn_soft = (\"churn_soft\", \"mean\"),\n",
" churn_warning = (\"churn_warning\", \"mean\"),\n",
" )\n",
" )\n",
" print(f\"\\n=== Churn per cluster K={k} ===\")\n",
" print(churn_profile.round(3).to_string())\n",
"\n",
" churn_profile[[\"churn_hard\", \"churn_soft\", \"churn_warning\"]].plot(\n",
" kind=\"bar\", figsize=(8, 4),\n",
" color=[\"#d62728\", \"#ff7f0e\", \"#ffbb78\"]\n",
" )\n",
" plt.title(f\"Churn rates by cluster — Top 400 accounts (K={k})\")\n",
" plt.ylabel(\"Rate\")\n",
" plt.xlabel(\"Cluster\")\n",
" plt.xticks(rotation=0)\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
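"# 6. Pairwise distance matrix between accounts, sorted by cluster\n",
"# (red lines mark cluster boundaries; a random subsample is used above max_points)\n",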
"def plot_distance_matrix(X_scaled, labels, max_points=400,\n",
" title=\"Distance matrix\"):\n",
" n = X_scaled.shape[0]\n",
" idx = np.arange(n)\n",
" if n > max_points:\n",
" rng = np.random.default_rng(42)\n",
" idx = rng.choice(idx, size=max_points, replace=False)\n",
" X_sub = X_scaled[idx]\n",
" labels_sub = np.asarray(labels)[idx]\n",
" order = np.lexsort((np.arange(len(labels_sub)), labels_sub))\n",
" X_sub = X_sub[order]\n",
" labels_sub = labels_sub[order]\n",
" D = pairwise_distances(X_sub)\n",
" plt.figure(figsize=(8, 7))\n",
" sns.heatmap(D, cmap=\"viridis\")\n",
" unique_labels, counts = np.unique(labels_sub, return_counts=True)\n",
" for b in np.cumsum(counts)[:-1]:\n",
" plt.axhline(b, color=\"red\", linewidth=2)\n",
" plt.axvline(b, color=\"red\", linewidth=2)\n",
" plt.title(title)\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"plot_distance_matrix(\n",
" X_top400_scaled,\n",
" dfc_top400[\"cluster_k5\"].values,\n",
" title=\"Inter-cluster Distance Matrix — Top 400 Accounts (K=5)\"\n",
")\n",
"\n",
"# 7. Full cluster profile table\n",
"for k in [2, 5]:\n",
" print(f\"\\n===== K={k} =====\")\n",
" prof = (\n",
" dfc_top400.groupby(f\"cluster_k{k}\")\n",
" .agg(\n",
" n_clients = (ID_COL, \"count\"),\n",
" aum_qty_mean_med = (\"aum_qty_mean\", \"median\"),\n",
" freq_med = (\"flow_freq\", \"median\"),\n",
" gross_flow_to_aum_med = (\"gross_flow_to_aum\", \"median\"),\n",
" n_tx_med = (\"n_tx_total\", \"median\"),\n",
" holding_med = (\"avg_holding_months_per_isin\",\"median\"),\n",
" exit_rate_med = (\"exit_rate_per_isin\", \"median\"),\n",
" flow_dir_med = (\"flow_direction_balance\", \"median\"),\n",
" drawdown_med = (\"aum_drawdown_last\", \"median\"),\n",
" months_inactive_med = (\"months_since_last_tx\", \"median\"),\n",
" corr_fund_lag3_med = (\"corr_flow_fund_lag3\", \"median\"),\n",
" corr_rate_lag3_med = (\"corr_flow_rate_lag3\", \"median\"),\n",
" )\n",
" .sort_values(\"n_clients\", ascending=False)\n",
" )\n",
" print(prof.round(3).to_string())"
]
},
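  {
   "cell_type": "markdown",
   "id": "4e9b2d71",
   "metadata": {},
   "source": [
    "The row-wise `quadrant` function above can also be written as a vectorized rule. The cell below is a minimal sketch on a small synthetic frame (the values are made up for illustration and are not taken from the Carmignac data). It assumes only the two columns `gross_flow_to_aum` and `flow_freq` and the same median thresholds, and swaps `DataFrame.apply` for `np.select`, which evaluates each condition on whole columns instead of calling a Python function per row. The names `toy`, `thr_int_toy` and `thr_freq_toy` are hypothetical. With these values each toy row lands in a different quadrant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c31f0a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: only the two columns used by the 2D segmentation\n",
    "toy = pd.DataFrame({\n",
    "    \"gross_flow_to_aum\": [0.5, 0.8, 6.0, 9.0],\n",
    "    \"flow_freq\": [0.1, 0.9, 0.2, 0.95],\n",
    "})\n",
    "thr_int_toy = toy[\"gross_flow_to_aum\"].median()\n",
    "thr_freq_toy = toy[\"flow_freq\"].median()\n",
    "\n",
    "toy_low_int = toy[\"gross_flow_to_aum\"] < thr_int_toy\n",
    "toy_low_frq = toy[\"flow_freq\"] < thr_freq_toy\n",
    "\n",
    "# Same four rules as quadrant(), evaluated on whole columns at once\n",
    "toy[\"seg_2D\"] = np.select(\n",
    "    [toy_low_int & toy_low_frq, toy_low_int & ~toy_low_frq, ~toy_low_int & toy_low_frq],\n",
    "    [\"Dormant (low int, low freq)\",\n",
    "     \"Small rebalancers (low int, high freq)\",\n",
    "     \"Occasional large movers (high int, low freq)\"],\n",
    "    default=\"Highly active (high int, high freq)\",\n",
    ")\n",
    "print(toy[[\"gross_flow_to_aum\", \"flow_freq\", \"seg_2D\"]])"
   ]
  },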
{
"cell_type": "markdown",
"id": "c97f67e5",
"metadata": {},
"source": [
"---\n",
"## 7. Cross-Analysis: Global vs Top 400 Accounts\n",
"\n",
"The Adjusted Rand Index (ARI) measures how well the two segmentations agree on the accounts they share, corrected for chance: 1 means identical partitions, and values close to 0 mean no more agreement than random labelings would produce. A low ARI is expected and desirable here, as the two clusterings are built on different feature sets and objectives.\n"
]
},
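  {
   "cell_type": "markdown",
   "id": "d26a5f3b",
   "metadata": {},
   "source": [
    "Because KMeans cluster ids are arbitrary, the ARI compares partitions rather than label values. The cell below is a minimal sketch with made-up labels (not project data) illustrating the two properties used in the interpretation above: relabeling the clusters of an identical partition still yields an ARI of exactly 1, and two independent random labelings (here with 4 and 5 groups, mirroring K=4 vs K=5) yield an ARI close to 0. The variable names are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a0e8f42",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import adjusted_rand_score\n",
    "\n",
    "# Same partition of 9 items; only the cluster ids are permuted\n",
    "ids_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]\n",
    "ids_b = [2, 2, 2, 0, 0, 0, 1, 1, 1]\n",
    "print(adjusted_rand_score(ids_a, ids_b))  # 1.0\n",
    "\n",
    "# Two independent random labelings with 4 and 5 groups\n",
    "rand_gen = np.random.default_rng(0)\n",
    "rand_x = rand_gen.integers(0, 4, size=5000)\n",
    "rand_y = rand_gen.integers(0, 5, size=5000)\n",
    "print(round(adjusted_rand_score(rand_x, rand_y), 3))  # close to 0"
   ]
  },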
{
"cell_type": "code",
"execution_count": 24,
"id": "b2716808",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accounts present in both analyses: 293\n",
"\n",
"Cross-tabulation K=4 x K=5 (n=293):\n",
" Top 400 Accounts C0 Top 400 Accounts C1 Top 400 Accounts C2 Top 400 Accounts C3 Top 400 Accounts C4\n",
"Global C0 20.0 39.0 30.0 9.0 2.0\n",
"Global C1 0.0 100.0 0.0 0.0 0.0\n",
"Global C2 20.0 0.0 15.0 41.0 23.0\n",
"Global C3 4.0 59.0 30.0 7.0 0.0\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAuwAAAHqCAYAAABfkRt8AAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAj/5JREFUeJzs3XdYU2cbBvA7IBtkg8oUlaGy3OLA1Wrdu1VEq2Lds9q668ZaR1UUB4h7z7pX1datdSu4QHHLkA0yku8PP9JGQANJCBzvn1euNu8ZeQ4PIU/e8573iCQSiQRERERERFQiaag7ACIiIiIiKhgLdiIiIiKiEowFOxERERFRCcaCnYiIiIioBGPBTkRERERUgrFgJyIiIiIqwViwExERERGVYCzYiYiIiIhKMBbsREREREQlGAt2KrSlS5fCxcWlSNs2a9YMAwcOVFosz58/h4uLC3bv3l2i9qUO48ePR7NmzdT2+mKxGG3btkVwcLDaYviUR48eoWrVqnjw4IG6QyEqslu3bqF69ep48eKFukPJ1/z589GtWzd1h0EkOCzYCQDw7NkzzJgxAy1btoSnpyc8PT3RunVrTJ8+HREREeoOTxAePXqEpUuX4vnz5+oORSUOHDiAV69eoVevXtK23bt3w8XFBbdv35ZZNzk5GV27doW7uzv++usvpcbRt29fuLi4YMaMGTLtlStXhq+vL5YsWaLU15PHH3/8ARcXF3h7e+e7/PHjx+jfvz+8vb1Rp04djBs3DvHx8XnWE4vFWL16NZo1awZ3d3e0a9cOBw4ckCuGM2fOYOnSpQodB8lv//79WLt2rdL3u2jRIrRp0wY2NjbSNn9/f7Rt2zbPuhcuXICnpyc6deqEhIQEhV63WbNmcHFxyfOYOnWqzHp9+vRBREQETp48qdDrEZGsMuoOgNTv1KlTGD16NDQ1NdGuXTu4urpCQ0MDkZGROHbsGLZs2YKTJ0/KfEBQ4T169AhBQUGoU6cObG1tlb7/mTNnQiKRKH2/8goNDUWbNm1gZGT0yfVSUlLQr18/3L9/H0FBQWjcuLHSYjh27Bhu3LhR4PLvvvsOP/zwA6Kjo2Fvb6+01/2U1NRU/Pbbb9DX1893+evXr+Hn5wcjIyOMHj0aaWlpWLNmDR48eIAdO3ZAW1tbuu6iRYuwatUqdO/eHe7u7jh58iR+/PFHiEQitGnT5pNxnDlzBps2bcLw4cOVenyUvwMHDuDhw4f4/vvvlbbP8PBwnD9/Hlu3bv3suhcuXMCgQYNQsWJFhIWFwcTEROHXd3NzQ9++fWXaKlasKPPc0tISzZs3x5o1a9C8eXOFX5OIPmDB/oWLjo7GmDFjUKFCBaxduxZWVlYyy8eOHYvNmzdDQ4MnY0qqtLQ06OvrQ0tLS20x3Lt3DxERERg/fvwn10tJSUH//v0RHh6OoKAg+Pr6Ki2G9+/fY+7cuQgICCiwF93HxwfGxsbYs2cPRo4cqbTX/pTg4GAYGBigbt26+fY6rlixAunp6di9ezcqVKgAAPDw8EDfvn2xZ88efPvttwCAN2/eICwsDH5+ftJezW7duqFXr16YN28eWrVqBU1NzWI5JlKPXbt2oUKFCvDy8vrkepcvX8bgwYPh6OiotGIdAKytrdGhQ4fPrvfNN99g5MiRePbsGezs7JTy2kRfOlZhX7iQkBCkpaUhMDAwT7EOAGXKlEHv3r1Rvnz5T+4nOzsby5YtQ4sWLVC9enU0a9YMCxcuRGZmZr7rnz17Fh06dIC7uztat26NY8eOySxPSEjAr7/+inbt2sHb2xs1atRAQECAQsNzkpKSMGfOHDRr1gzVq1dH48aN8dNPP+U79CCXv78//P3987TnN1784MGD6Ny5szTedu3aYd26dQA+DA3JLRB79+4tPZ186dIl6fZnzpxBz5494eXlBW9vb/zwww94+PBhntf19vZGdHQ0BgwYAG9vb4wdOzbfmHLH5IeGhmLbtm3S3H
Tp0gW3bt3Kc0yHDx9G69at4e7ujrZt2+L48eNyj4s/ceIEtLS0UKtWrQLXSU1NRUBAAO7evYulS5eiSZMmn91vYaxevRoSiQT9+/cvcB0tLS3UqVPns6frMzIy0KpVK7Rq1QoZGRnS9oSEBDRs2BDfffcdcnJyPhvTkydPsHbtWkyYMAFlyuTfP3Ls2DE0adJEWqwDH75YODo64vDhw9K2EydOICsrCz179pS2iUQi9OjRA69fv8b169cLjGP8+PHYtGkTAMgMZ8iVlpaGuXPnwtfXF9WrV0fLli0RGhqa54xN7lCjP/74Ay1btoS7uzs6d+6MK1eufPZnkZmZicWLF6Nz586oWbMmvLy80LNnT1y8eDHPumKxGOvWrUO7du3g7u6OevXqoX///nmGVu3btw9du3aFp6cnateuDT8/P5w9e1ZmnU2bNqFNmzaoXr06GjZsiOnTpyMpKUlmnWbNmuX7ZfPj9/+lS5fg4uKCQ4cOITg4GI0bN4a7uzv69OmDp0+fymx3+vRpvHjxQvqz/u/7aMOGDWjTpo007s6dO2P//v2f/RmePHkS9erVg0gkKnCdq1evYuDAgbC3t0dYWBhMTU0/u9/CyMzMRFpa2ifX8fHxkcZLRMrBHvYv3KlTp+Dg4ABPT0+F9jN58mTs2bMHLVu2RN++fXHr1i2sXLkSjx8/xrJly2TWffLkCUaPHo3vvvsOnTp1wq5duzBy5EiEhISgQYMGAD6MqT9x4gRatWoFW1tbxMbGYtu2bejVqxcOHjwIa2vrQsWXmpoKPz8/PH78GF26dEHVqlXx7t07/Pnnn3jz5g3MzMwUOv5z585hzJgxqF+/vrSAjoyMxLVr19CnTx/Url0b/v7+2LBhAwYNGgQnJycAQKVKlQAAe/fuxfjx49GwYUOMHTsW6enp2LJlC3r27Ik9e/bIDKHJzs5G//79UbNmTfz888/Q1dX9ZGwHDhxAamoqvv32W4hEIoSEhGD48OHSIhsATp8+jdGjR8PZ2Rk//vgjEhMTMWnSJLl/ztevX4ezs3OBvfzp6ekYMGAA7ty5g8WLF6Np06Z51snMzERKSopcr/dxvl6+fInVq1djzpw5n/15VKtWDSdPnkRKSgoMDQ3zXUdXVxe//vorevTogUWLFmHChAkAgBkzZiA5ORmBgYFy9WbPmTMHdevWha+vr0zxnevNmzeIi4tD9erV8yzz8PCQGd8fHh4OfX196e/Mf9fLXV7QF6Zvv/0Wb9++xblz5zBv3jyZZRKJBIMHD8alS5fQtWtXuLm54e+//8a8efPw5s0bTJw4UWb9K1eu4NChQ/D394e2tja2bNmCgIAA7NixA87OzgX+LFJSUrBjxw60bdsW3bp1Q2pqKnbu3Cnd1s3NTbrupEmTsHv3bjRu3Bhdu3ZFTk4Orl69ips3b8Ld3R0AEBQUhKVLl8Lb2xsjRoyAlpYWbt68iYsXL6Jhw4YAPlwgHxQUBB8fH/To0QNRUVHYsmULbt++jS1bthT5rNTq1ashEonQr18/pKSkICQkBGPHjsWOHTsAAIMGDUJycjJev34t/d0xMDAAAGzfvh2zZs1Cy5Yt0bt3b7x//x7379/HzZs30a5duwJf882bN3j58iWqVq1a4Dr//PMPBgwYAFtbW6xduzbfv2vJycnIysr67DHq6OhIY8518eJFeHl5IScnBzY2NujTpw/69OmTZ1sjIyPY29vj2rVrSh0SRPQlY8H+BUtJScHbt2/RokWLPMuSkpKQnZ0tfa6vr19gIRQREYE9e/agW7dumDVrFgDAz88PZmZmWLNmDS5evIh69epJ13/y5AmWLl2Kr7/+GgDQtWtXtGrVCvPnz5cW7C4uLjh69KjMUJwOHTrgm2++wc6dOzF06NBCHWtoaCgePHiAoKAgfPXVV9L2IUOGKGXc9+nTp2FoaIjQ0NB8Czk7OzvUqlULGzZsgI+PD+rWrStdlpqaitmzZ6Nbt26YOXOmtL1Tp05o1a
oVVq5cKdOemZmJVq1a4ccff5QrtpcvX+LYsWMwNjYG8GHM6ZAhQ3D27Flp4bxgwQJYW1tjy5Yt0g/p+vXrw9/fX65
"text/plain": [
"<Figure size 800x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Adjusted Rand Index (Global vs Top 400 Accounts): 0.1341\n",
"(0 = independent segmentations, 1 = identical)\n"
]
}
],
"source": [
"# Accounts shared by both analyses\n",
"common_ids = set(dfc[ID_COL]).intersection(set(dfc_top400[ID_COL]))\n",
"print(f\"Accounts present in both analyses: {len(common_ids)}\")\n",
"\n",
"# Cross-tabulate the two clusterings on the shared accounts\n",
"dfc_compare = (\n",
" dfc[dfc[ID_COL].isin(common_ids)][[ID_COL, \"cluster_k4\"]]\n",
" .rename(columns={\"cluster_k4\": \"cluster_global\"})\n",
" .merge(\n",
" dfc_top400[dfc_top400[ID_COL].isin(common_ids)][[ID_COL, \"cluster_k5\"]]\n",
" .rename(columns={\"cluster_k5\": \"cluster_400_accounts\"}),\n",
" on=ID_COL\n",
" )\n",
")\n",
"\n",
"print(f\"\\nCross-tabulation K=4 x K=5 (n={len(dfc_compare)}):\")\n",
"# Row-normalized percentages: multiply by 100 before rounding to keep one decimal\n",
"ct = (pd.crosstab(\n",
" dfc_compare[\"cluster_global\"],\n",
" dfc_compare[\"cluster_400_accounts\"],\n",
" normalize=\"index\"\n",
") * 100).round(1)\n",
"ct.index = [f\"Global C{i}\" for i in ct.index]\n",
"ct.columns = [f\"Top 400 Accounts C{i}\" for i in ct.columns]\n",
"print(ct.to_string())\n",
"\n",
"plt.figure(figsize=(8, 5))\n",
"sns.heatmap(ct, cmap=\"Blues\", annot=True, fmt=\".1f\",\n",
" cbar_kws={\"label\": \"%\"})\n",
"plt.title(\"Global clustering (K=4) x 400 top accounts (K=5)\")\n",
"plt.tight_layout(); plt.show()\n",
"\n",
"ari = adjusted_rand_score(\n",
" dfc_compare[\"cluster_global\"].values,\n",
" dfc_compare[\"cluster_400_accounts\"].values\n",
")\n",
"print(f\"\\nAdjusted Rand Index (Global vs Top 400 Accounts): {ari:.4f}\")\n",
"print(\"(0 = independent segmentations, 1 = identical)\")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "5a3ec2e8-d19f-43d5-80c7-5deb19c33197",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABYgAAAGGCAYAAADRkGXUAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XVYVNkbwPHvUEooaawIIiKgIgoGilis3R1rt66da6/r+rPWWjvWVmzMde1aW+wAAxMTAZHO+f3BMjoyKKKC6Pt5nnl0zj333PdMXOCdM+9VKJVKJUIIIYQQQgghhBBCCCG+O1qZHYAQQgghhBBCCCGEEEKIzCEJYiGEEEIIIYQQQgghhPhOSYJYCCGEEEIIIYQQQgghvlOSIBZCCCGEEEIIIYQQQojvlCSIhRBCCCGEEEIIIYQQ4jslCWIhhBBCCCGEEEIIIYT4TkmCWAghhBBCCCGEEEIIIb5TkiAWQgghhBBCCCGEEEKI75QkiIUQQgghhBBCCCGEEOI7JQliIYQQQtCuXTs8PT0zO4yvRkBAAA4ODsyZMyezQxGZ5FPfE1/yPeXg4MDw4cO/yNjDhw/HwcHhi4yd0b6297GXlxeurq6EhIRkdigfdODAAZycnLh//35mhyKEEEKIDCAJYiGEEOIbFRUVxYoVK/jpp58oW7YsxYoVw93dnW7duuHt7U18fHyGxzRnzhwOHDiQ4cdNdu7cOXr27ImnpydOTk6UL1+eJk2aMGHCBB49epRpcX2MgIAA5syZg6+vb2aHkqU8fPiQ//3vf9SrVw9XV1ecnJzw8PCgW7dueHl5ERkZmdkhfjHx8fFs3ryZTp06Ua5cOZycnHBzc6Ndu3asXr2aqKioDI9pxYoVeHt7Z/hxM0tYWBhz5syhY8eOmJqaqtrnzJmDg4MDV69eTbHP8uXLcXR0pFWrVrx+/fqzx+Tl5YWDgwMODg4EBwerbatWrRr29vZMmzbtsx9XCCGEEF8fncwOQAghhBCf34MHD+jevTv379/H3d2d7t27Y2pqSlBQEKdOnWLEiBHcuXOHYcOGZWhcc+fOpXHjxlSrVi1DjwtJyZDffvsNKysrGjVqxA8//EBwcDD+/v7s2rWL0qVLY2VlBYClpSVXrlxBW1s7w+P8kMePHzN37lwsLS0pUqRIZoeTJXh7e/Prr7+io6NDrVq1aNWqFdmzZ+fly5ecO3eO33//nYMHD7J06dLMDvWzCw4OplevXly6dIkSJUrQoUMHcuXKRVhYGOfOnWPSpEn4+Pjw559/Zmhcq1atwtLSkiZNmnyxY3xN72MvLy/CwsJo27ZtmvrPmjWLBQsW4OHhwdy5c9HX1/+s8Tx//pzp06djYGCQ6ocj7du355dffuH27dsULlz4sx5fCCGEEF8XSRALIYQQ35jo6Gh69OihWmlao0YNte3du3fnypUrGlesZWXh4eEYGRlp3BYfH8/MmTPJly8f27ZtS9EvNjZWLUmiUCjIli3bF433a/W+xzErOnXqFKNGjaJw4cIsWbKEPHnyqG3v2bMnjx49Yvfu3ZkU4ZejVCrp168fly5dYvTo0bRr105te6dOnbh//z579uzJpAi/jOTX8NfyPk5MTGTDhg1UrFgRMzOz9/ZVKpVMmDCBNWvWULt2baZOnYqent5nj2n8+PFYW1tjZ2fHjh07NPapXr0648aNY/369YwZM+azxyCEEEKIr4eUmBBCCCG+MZs2beLevXt06tQpRXI4mbOzM23atHnvOJ6enikSSgBnzpzBwcFB7evhMTExzJkzh5o1a1KiRAlKly5N/fr1mTJlCvCmFijA1q1bVV9rfrfW6cmTJ+ncuTOlS5emePHi1K9fn3Xr1qUa240bN+jSpQulSpWiQYMGqc4lJCSE169fU7x4cY3JTz09PUxMTFT3U6tdGhUVxaRJk/Dw8MDZ2ZkWLVpw6tQpjXVbk2vQPn/+nEGDBlGmTBlKlChBly5duHfvnl
rf8PBwZs6cSfPmzXFzc8PJyYnq1aszbdo0ta//e3t70759ewBGjBihegyTnydvb28cHBw4c+ZMijlqqon7ocfx/v37DB06FA8PD5ycnPD09GTKlCkpVhw+ffqUESNGULVqVVXpjlatWrF169YUcWS0P/74A0hakflucjiZlZUVPXr0SNN4586do1OnTpQqVQpnZ2caN27Mpk2bUu3/6NEjevXqRalSpXB1daV3794pypkkJiayYMEC2rRpQ4UKFXBycqJKlSr8+uuvn1Sv9vDhw5w7d446depofC8D2NjY0LNnz/eOk1o9ZU3vk8TERFasWEH9+vVxcXHB1dWVmjVrMnLkSOLi4oCkGsqPHz/m7NmzaueCgIAA1ThXr16ld+/eqvdDzZo1WbBgQYrSOMmxPXr0iH79+lG2bFlKlSqVanxvtx0+fJimTZtSvHhxPDw8mDJlisbSO3v37qVBgwYUL16cKlWqMHfuXE6ePJniPJiaK1eu8PjxYypXrvzefvHx8QwbNow1a9bQokULZsyY8UWSw/v37+fQoUP89ttv711dbWhoSKlSpdi7d+9nj0EIIYQQXxdZQSyEEEJ8Y5L/mG/ZsmWGHfO3335jy5YtNGrUCBcXFxISErh//74qUWlmZsbUqVMZNmwYpUuXpkWLFinG2LBhA7/++islS5akZ8+e6Ovrc/LkScaNG8fDhw/55Zdf1Po/efKEDh06UKtWLWrUqPHeGrIWFhYYGBhw7tw57t69i62tbbrm2b9/f44ePUq1atVwd3cnICCA3r17kz9/fo39IyMjadu2LSVKlGDgwIEEBASwatUqfv75Z3bt2qVKzjx//pzNmzdTo0YN6tWrh46ODmfPnuWvv/7C19dXVfqgTJky9OzZk4ULF9KyZUtVIszCwiJd84HUH8dr167RoUMHcubMScuWLcmTJw9+fn6sXr2aixcvsnr1anR1dYmPj6dTp048f/6cn376CRsbG8LDw7l58yY+Pj40btw43bF9qkePHnH9+nXKlCmT7uf8bYcOHaJPnz5YWFjQqVMnjIyM+Pvvvxk9ejQBAQEMHDhQrX9kZCTt2rXD2dmZQYMG8eDBA7y8vLh8+TJbt24lV65cAMTFxbF06VJq1KjBjz/+iL6+PlevXmXLli1cuHCBLVu2pCtRmHwu0PR++1IWLFjA7NmzqVq1Kq1atUJbW5uAgAAOHTpEbGwsurq6TJ06lUmTJmFqaqqWnE5eXXvkyBH69OlDgQIF6Ny5M8bGxly6dInZs2fj6+vL7Nmz1Y4ZERFB27ZtcXV1ZcCAASnq6Wpy9OhRvLy8aNWqFU2bNuXgwYMsW7YMY2NjtZh2797NoEGDsLa2pk+fPmhra7Nt2zYOHTqU5sfk7NmzQNIHc6mJiYmhb9++HDp0iK5duzJ06FCN/SIiIoiJiUnTcbNly4ahoaFaW3h4OOPHj6dly5Y4Ozvj5eX13jFcXFw4fvw4/v7+FCpUKE3HFUIIIUTWIwliIYQQ4htz+/ZtjIyMVPV0M8KBAweoVKmSasXwuwwMDGjYsCHDhg3DysqKhg0bqm1/8eIFEyZMoG7dukyfPl3V3qZNGyZMmKC62N7bcwoICGDChAk0b978g/EpFAr69u3LlClTqFevHkWLFqVkyZI4OztTvnx5VaLufY4ePcrRo0dp3rw5EyZMULWXK1eO7t27a9wnJCSELl260K1bN1WbmZkZf/zxBydPnqRixYpA0grWI0eOoKurqzb35DqkV65cwdnZGSsrK9zd3Vm4cCElS5ZM8TimR2qP48iRI8mVKxebN29WW3Vdvnx5+vTpw86dO2nSpAl37tzh3r17DBkyRG2eX4Pbt28D4OjomGJbVFRUiouzmZqaolAoNI6VkJDA77//joGBAZs2bVKtRv7pp59o3749ixcvpnHjxtjY2Kj2CQkJoX379owaNUrVVqZMGfr06cOcOXMYP348kLSC/fjx42TPnl3Vr3Xr1ri4uDB69GgOHD
hAnTp10j3/jKxVfeDAAQoVKsTChQvV2ocMGaL6f8OGDfnzzz+xsLBI8RqOiYlh1KhRlChRgpUrV6Kjk/TnSqtWrXB
"text/plain": [
"<Figure size 1600x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABOgAAAGGCAYAAADB+b5kAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XdUE1kbB+BfQDpIt6GCIAGkd0UsIIINKXYBRVGx995Fv7UslhV7VyyogH1X7BUVcXVt2BBBdFWa9J75/mATjQkYEAjo+5wzR7lz5+bO5GaSvLmFxTAMA0IIIYQQQgghhBBCiFhIiLsChBBCCCGEEEIIIYT8yihARwghhBBCCCGEEEKIGFGAjhBCCCGEEEIIIYQQMaIAHSGEEEIIIYQQQgghYkQBOkIIIYQQQgghhBBCxIgCdIQQQgghhBBCCCGEiBEF6AghhBBCCCGEEEIIESMK0BFCCCGEEEIIIYQQIkYUoCOEEEIIIYQQQgghRIwoQEcI+Sn5+fnB2dlZ3NWoM5KTk2FgYICQkBBxV4X8RGbPng0DAwNxV6Na3LlzBwYGBoiMjOSl1cfXjYGBAWbPni3uagAAJk+ejIEDB4q7GqSSfqb3T2Gv67qkrtfva5GRkTAwMMCdO3fEXZUKXbhwASYmJnjz5o24q0IIIZVGATpCSL2Rn5+PPXv2YPDgwbCzs4OxsTEcHBwwcuRIREZGoqSkpNbrFBISggsXLtT643LdvXsXo0ePhrOzM0xMTNCuXTt4e3tj2bJlePv2rdjqVRnJyckICQlBXFycuKtSL3CDYtzNyMgI7dq1w+jRo/H333+Lu3o14s6dOwgJCUFWVlaVjj948CAMDAxgZWWF/Pz8aq5d7crKykJISEid/5J87949/PXXX5g8eTJfup+fHywtLQXyFxUVYdKkSTAwMEBQUBAYhqn2Ok2ePBkGBgbo1atXtZddm36We6a43z9J7duzZ0+NByNdXFzAZrMRHBxco49DCCE1oYG4K0AIIaJITEzEqFGj8ObNGzg4OGDUqFFQVVVFWloabt26hTlz5uDVq1eYOXNmrdZrw4YN8PLygouLS60+LlAWdFiyZAlatGgBT09PNG3aFOnp6YiPj8fp06dhY2ODFi1aAAC0tLTw8OFDSEpK1no9v+fdu3fYsGEDtLS0YGRkJO7q1BuLFy+GvLw8ioqK8OLFCxw9ehTXr1/H7t27YWdnJ+7qVauYmBjea61hw4aVPj48PBwtW7ZEUlISzp49Cy8vrxqoZe3IysrChg0bMH78eNjb2wvsf/jwISQkxP/766ZNm2BkZIS2bdt+N29+fj7Gjx+PGzduYMyYMQJBvepw+fJlREVFQVZWttrLrm0/yz2zNt4/bW1t8fDhQzRoQF95fpSHhwd69uwJKSmpKpexb98+aGlpwdvbuxprJmjIkCGYNWsWXr58CX19/Rp9LEIIqU70bkUIqfMKCgoQGBjI6zXg6urKt3/UqFF4+PAhHj16JKYa1oycnBwoKioK3VdSUoK1a9eiWbNmOH78uEC+oqIi5OXl8f5msViQkZGp0frWVRVdx/rMzc0NampqvL9tbGwwceJE7Nixo9wAHcMwyMvLg4KCQm1VU+yePXuGJ0+eYOXKldi7dy8iIiLqdYDue+rC6zwxMRE3b94UaahtVlYWRo0ahQcPHmDOnDnw9/ev9vrk5uZiyZIl8PHxwaVLl6q9fFL3cO/7EhISdeI1UZ9xr6WkpGSd/JGP6+v3+q5du2Lx4sUICwvDggULxFwzQggRnfh/YiWEkO84evQoEhISMGzYMIHgHJeZmRl8fHwqLMfZ2Rl+fn4C6cLmgCksLERISAjc3Nxgbm4OGxsbuLu7Y+XKlQC+zE0FAMeOHeMbcvi16OhoDB8+HDY2NjA1NYW7uzsOHTpUbt2ePn2KgIAAWFtbo3fv3uWeS0ZGBrKysmBqaio0+CQtLQ0VFRXe3+XNpZWfn4/ly5fD0dERZm
Zm6N+/P27duiV0bjHuvEQfP37E1KlTYWtrC3NzcwQEBCAhIYEvb05ODtauXYt+/frB3t4eJiYm6Nq1K4KDg/mGGEZGRmLIkCEAgDlz5vCuIfd5qmjOG2HzJH3vOr558wYzZsyAo6MjTExM4OzsjJUrV/IFMwHg33//xZw5c+Dk5MQbOjxw4EAcO3ZMoB51haOjI4Cy4AjA364PHDiAHj16wNTUFLt27QJQFuTdtm0bL93e3h7jxo3D8+fPBcouLCzEypUree2kb9++uHHjhtB6VOZ1BpQFk7dv3w4PDw+Ym5vD2toa3t7e2L9/P4CyIb0bNmwAAHTp0oXXRkSdFy48PBzy8vJwdXWFl5cX7t69y7tGVVWZawcAUVFR8PPzg42NDczNzeHm5oZly5ahqKgIAMDhcLB582b4+Pigffv2MDExQefOnbFo0SJkZGTwyrlz5w66dOkCoKz3EfdafP06KG8OuqNHj8LLywtmZmawtrbG8OHDERsbK5CPe/z9+/fh6+sLCwsL2NvbY968ecjNzRXp+kRFRYFhGHTs2LHCfKmpqfDz88PDhw/x22+/1UhwDgDWrl2L0tLSauuZ9/X99PTp03B3d4epqSk6d+6MkJAQodMtPHv2DOPGjYO9vT1MTU3Ro0cPbN++HaWlpXz5vnfv+d4983sSExMxZ84cdOzYESYmJnB0dMSYMWPw+PHjCo+rr++fwur3dVpERAR69uwJExMTODk5Yfv27ULP/+DBg3Bzc4OJiQlcXV2xf//+Ss/JduHCBXh6esLU1BSdOnXCunXryp2ao6ioCFu2bEHPnj1hamoKGxsbjB49Gk+fPuXLx+FwsGfPHri7u8PS0hJWVlZwc3PD3LlzUVxczJf36dOnmDhxIhwcHGBiYoJOnTph6tSpSEpK4uXhvv5v3bqFQYMGwdLSEmPGjAEg/P2YmxYdHY2QkBBeu3V3d8eZM2f4Ht/AwADv3r1DTEwM33OenJzMd40GDhwICwsLWFpaYuDAgUKHQX/vvV5BQQHW1taIior63tNCCCF1CvWgI4TUedwPWAMGDKi1x1yyZAkiIiLg6ekJS0tLlJaW4s2bN7wPpmpqali1ahVmzpwJGxsb9O/fX6CMw4cPY9GiRbCwsMDo0aMhJyeH6OhoLF68GElJSZg1axZf/vfv32Po0KHo1q0bXF1dBYJGX9PQ0IC8vDzu3r2L169fQ1dXt0rnOWnSJFy9ehUuLi5wcHBAcnIyxo0bh+bNmwvNn5eXB19fX5ibm2PKlClITk7Gvn37MHbsWJw+fZr36/rHjx8RHh4OV1dX9OrVCw0aNEBMTAx27NiBuLg47Ny5E0DZ8KPRo0djy5YtGDBgAKytrXnnV1XlXcfHjx9j6NChaNiwIQYMGIDGjRvj2bNnCA0Nxf379xEaGgopKSmUlJRg2LBh+PjxIwYPHgwdHR3k5OTg+fPniI2NrbO9r7hBJ1VVVb70vXv34vPnz+jXrx80NTXRpEkTAMD06dPx119/oX379hg0aBBSU1Nx4MABDBw4EAcOHECbNm14ZUydOhUXLlyAk5MTOnTogKSkJEyYMKHcdiKqoqIiBAQEICYmBo6OjujduzdkZGTw4sULnDt3Dr6+vhgwYABycnJw/vx5zJkzh3d+oixOUVRUhFOnTqFbt26Ql5dHr169sGrVKkRERGDq1KlVrndlrt3atWuxZcsWtG7dGv7+/tDU1ERSUhLOnTuHiRMnQlpaGsXFxdi5cydcXV3RpUsXyMnJ4dGjR4iIiMDff/+NiIgISEtLQ09PD3PmzMHy5cvRtWtXdO3aFQC+2yPy999/x44dO2BmZoapU6ciJycHR44cwdChQ7Fp0yZ06tSJL39cXBxGjx4Nb29v9OrVCzExMQgPD4eEhASWLl363esTExODhg0bolWrVuXmeffuHYYPH47379/jjz/+4J3LtzIzMwWCWOVRVFSEtLQ0X9rDhw9x4MABrF69utp70l66dAlv376Fj4
8PNDQ0cOnSJWzYsAHv37/H8uXLefkePXoEPz8/NGjQgJf38uXLCA4OxrNnz7B69WoAEOne8yP3zEePHsHf3x8lJSX
"text/plain": [
"<Figure size 1400x400 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABkAAAAHpCAYAAADauPAPAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzs3XVYFFsfB/DvLiFdAipINwKCgihiYSfqtRG78167+1rXumJ3K4rY1+7uFgxUFJPORWrfP5DVlQXRF0Tg+3mefZQzZ2bOOTM7uzu/OecIxGKxGERERERERERERERERMWIsLALQERERERERERERERElN8YACEiIiIiIiIiIiIiomKHARAiIiIiIiIiIiIiIip2GAAhIiIiIiIiIiIiIqJihwEQIiIiIiIiIiIiIiIqdhgAISIiIiIiIiIiIiKiYocBECIiIiIiIiIiIiIiKnYYACEiIiIiIiIiIiIiomKHARAiIiIiIiIiIiIiIip2GAAhIiIiyiNfX194eXkVdjF+G2FhYbCxsYGfn19hF4WIihmxWIz27dtj+PDhhV2UPBk4cCB8fX0LuxhERERE9A35wi4AERERUWESiUTw9/fHsWPH8OzZMyQmJkJTUxMVKlRA48aN0aJFC8jL/9qvTH5+frCzs0O9evV+6X6zXL9+HWvXrsWTJ0/w8eNHqKuro1y5cqhUqRK6du0KIyOjQinXjwgLC8OePXtQr1492NnZFXZxfluBgYEYO3ZsnvJWqVIFmzdvLuASySYSidCsWTOEhYXBx8cHkyZNypbn+fPnmDdvHq5fv47U1FTY29tj8ODBqFatWra88fHxWLRoEY4dO4aYmBgYGxvDx8cHHTt2hEAg+KGynT17Fn369IFQKMTJkydhYGDw0/UsSuLi4rBx40ZUqVIF7u7u+b79gwcP4sGDB5gzZ45UupeXF1RUVHDw4EGp9ISEBPTt2xc3btzA8OHD0adPn/+7DF5eXnjz5o3MZZcvX4aOjo7k78GDB6Nly5Y4efIk6tat+3/vm4iIiIjyBwMgREREVGKFhoaiT58+ePnyJTw8PNCnTx9oa2sjMjISly9fxtixY/Hs2TOMGjXql5ZryZIlaNWqVaEEQLZt24apU6fCyMgILVu2RLly5RAVFYWQkBAcPHgQrq6ukgCIoaEh7t27Bzk5uV9ezu958+YNlixZAkNDQwZAcuHm5oa5c+dKpa1YsQLPnz/Plq6rq/sriyZl8eLFiIqKynH5q1ev0LFjR8jJyaFXr15QU1PDrl270KtXL6xevRoeHh6SvCkpKejevTuCgoLQuXNnWFhY4Ny5c5g6dSoiIyMxePDgHyrb7t27Ua5cOURERCAwMBCDBg366XoWJXFxcViyZAkGDRpUIAGQpUuXonbt2jA1Nf1u3qioKPTs2RPBwcGYPn062rVrl2/lMDc3R79+/bKlq6mpSf1ta2uLKlWqYNmyZQyAEBEREf1GGAAhIiKiEik5ORl9+/ZFWFgY/Pz80KBBA6nlffr0wb1793D//v1CKmHBSEhIyHbjLktaWhoWLlwIAwMD7N27N1u+lJQUJCUlSf4WCAQoVapUgZb3d5VbOxYlRkZG2Xr0BAQE4Pnz5/D29i6kUkl7+PAhNm7ciJEjR2L27Nky88yfPx9xcXEIDAyUBLxatmyJZs2aYerUqThy5IikZ8euXbtw//59TJgwQTJkUbt27TB48GCsXLkSrVu3hqGhYZ7KFhUVhVOnTqF///4ICgpCYGAgBg4c+MO9SEja5cuX8eLFizwNf/Xu3Tt0794dYWFhmD9/Ppo0aZKvZdHV1c3ze8Hb2xvjxo3Dw4cPUaFChXwtBxERERH9HM4BQkRERCXSrl278OLFC3Tv3j1b8COLk5MTfHx8ct2Ol5eXzHHfr169ChsbGwQGBkrSPn36BD8/PzRs2BAVK1aEq6srmjdvLhniJWtODQDYs2cPbGxsJK+vXbp0CT169ICrqyscHR
3RvHlzbN++PceyPXr0CD179kTlypXRokWLHOsSHR2NuLg4ODo6yry5r6ioCC0tLcnfOc0BIhKJMGvWLHh6esLJyQnt2rXD5cuXMWbMmGx1yZpX5cOHDxg2bBjc3NxQsWJF9OzZEy9evJDKm5CQgIULF6Jt27Zwd3eHg4MD6tevj3nz5kEkEknyBQYGokuXLgCAsWPHStow6zgFBgbCxsYGV69ezVZHWfO8fK8dX758iZEjR8LT0xMODg7w8vLCnDlzpIJFQOaN2rFjx6JOnTpwcHBAtWrV0KFDB+zZsydbOX5HwcHBGDhwINzd3eHo6IgmTZpg9erVSE9Pl8qXdZyjoqIwatQouLu7w9nZGV27dsXDhw9/aJ/p6emYOHEiatSogfr168vMk5SUhFOnTqFKlSpSvX1UVVXRpk0bvHz5UiqQefDgQSgrK2frJdC1a1ekpqbiv//+y3P59u3bh7S0NHh7e6NVq1Z48+YNLl++LDNveHg4ZsyYgbp160qOf/fu3XHx4kWpfKGhoRg7dixq1qwJBwcHeHp6on///njw4IFUvhMnTqBDhw5wdnaGi4sLOnTogBMnTmTbr42NDcaMGZMtXdb7wM/PDzY2Nnj+/DkWLFggKUOLFi1w9uxZSb6rV69KejksWbJE8h77+r2zd+9etGnTBq6urnB2dkbdunUxfPjwXHvyZDl8+DDk5ORQvXr1XPO9ePECnTp1wvv377F8+fJ8D35kSUtLQ0JCwnfz1axZE0Bm+YmIiIjo98AeIERERFQiHT16FADQvn37X7bPqVOnYvfu3WjZsiVcXFyQnp6Oly9fSm5A6ujoYO7cuRg1ahRcXV1lDuPi7++PyZMnw9nZGf369YOysjIuXbqEKVOm4NWrVxg9erRU/rdv36Jr165o1KgRGjRokO2m/Nd0dXWhoqKC69ev4/nz5zA3N/+peg4dOhRnz55FvXr14OHhgbCwMAwcOBDly5eXmT8pKQmdO3dGxYoV8ddffyEsLAybNm3CgAEDcPDgQckQWx8+fEBAQAAaNGiAZs2aQV5eHteuXcOaNWsQFBSEtWvXAsgc1qlfv35YsWIF2rdvj8qVK0vq97NyascHDx6ga9eu0NDQQPv27VGmTBkEBwdj8+bNuH37NjZv3gwFBQWkpaWhe/fu+PDhAzp16gRTU1MkJCTg8ePHuHHjBlq1avXTZfsV7t+/D19fX8jLy8PHxwe6uro4ffo05s2bh+DgYMyfPz/bOr169YKmpiYGDRqEiIgIbNmyBZ07d4a/vz+sra3ztN8NGzbg+fPnWLx4cY55Hj9+jJSUFDg7O2dblpV2//59ODk5ISMjA48ePYK9vX223ktOTk4QCAQ/1Otr9+7dcHNzQ/ny5VG2bFmULl0au3fvlhpyC8gMFnbs2BGRkZHw9vaGg4MDRCIR7t69i0uXLklu9N+/fx/dunVDWloa2rRpAysrK8TGxuLatWu4ffs2HBwcAABbt27FtGnTYG5ujgEDBgDIDJoOHDgQ06ZN+7+va2PGjIG8vDx69OiB1NRUbNy4EQMHDsSRI0dQvnx5WFhYYOzYsZg1axbq168vCU6pqqoCyAx+jB49Gq6urhgyZAiUlJTw7t07nD17FpGRkVJzZ8hy/fp1WFpaQkVFJcc8QUFB6NmzJ1JTU7Fu3TpUqlRJZr68BFyyqKurQ0FBQSrt7t27cHZ2RmpqKtTV1VG3bl0MGzYMZcqUyba+np4eDA0Nce3atTzvk4iIiIgKFgMgREREVCI9ffoUampqv3RC7xMnTqBmzZrZJvXNoqKiAm9vb4waNQpGRkbZhl35+PEjZsyYgaZNm0rdcPbx8cGMGTOwYcMGdOrUSapOYWFhmDFjBtq2bfvd8gkEAgwePBhz5sxBs2bNYG9vD2dnZzg5OaFatWrQ09P77jbOnj2Ls2fPom3btpgxY4YkvWrVqjlOShwdHY2ePXuid+/ekjQdHR38888/uHTpEmrUqAEgc7imM2fOSN
2g9PHxwaJFi7B8+XLcu3cPTk5OMDIygoeHB1asWAFnZ+d8Gcopp3YcN24c9PT0EBAQINVrplq1ahg0aBAOHDiA1q1
"text/plain": [
"<Figure size 1800x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def plot_heatmap_annotated(dfc, profile_vars, cluster_col, title, figsize=(16, 5)):\n",
    "    \"\"\"\n",
    "    Heatmap coloured by robust z-score and annotated with the actual cluster medians.\n",
    "    \"\"\"\n",
    "    # Actual (unscaled) cluster medians\n",
" prof_median = dfc.groupby(cluster_col)[profile_vars].median()\n",
" \n",
    "    # Robust z-scores (median / MAD across clusters) drive the colour scale\n",
    "    prof_z = prof_median.copy().astype(float)\n",
    "    for col in profile_vars:\n",
    "        vals = prof_median[col].values\n",
    "        med = np.median(vals)\n",
    "        # 1.4826 scales the MAD to a standard-deviation equivalent under normality\n",
    "        mad = np.median(np.abs(vals - med)) * 1.4826\n",
" if mad > 0:\n",
" prof_z[col] = (vals - med) / mad\n",
" else:\n",
" prof_z[col] = np.zeros(len(vals))\n",
" prof_z = prof_z.clip(-3, 3)\n",
"\n",
    "    # Annotations: formatted cluster medians\n",
" def fmt(val):\n",
" if abs(val) >= 1000:\n",
" return f\"{val:,.0f}\"\n",
" elif abs(val) >= 10:\n",
" return f\"{val:.1f}\"\n",
" elif abs(val) >= 0.01:\n",
" return f\"{val:.2f}\"\n",
" else:\n",
" return f\"{val:.3f}\"\n",
"\n",
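    "    # Column-wise Series.map works on all pandas versions (DataFrame.applymap is deprecated since 2.1)\n",
    "    annot = prof_median.apply(lambda s: s.map(fmt))\n",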
"\n",
    "    # Row labels: cluster id plus cluster size\n",
" cluster_sizes = dfc[cluster_col].value_counts().sort_index()\n",
" row_labels = [\n",
" f\"C{i} (n={cluster_sizes.get(i, '?')})\"\n",
" for i in prof_median.index\n",
" ]\n",
"\n",
" fig, ax = plt.subplots(figsize=figsize)\n",
" sns.heatmap(\n",
" prof_z,\n",
" cmap=\"RdBu_r\",\n",
" center=0,\n",
" annot=annot,\n",
" fmt=\"\",\n",
" linewidths=0.5,\n",
" linecolor=\"white\",\n",
" ax=ax,\n",
" cbar_kws={\"label\": \"Robust z-score\", \"shrink\": 0.8},\n",
" xticklabels=profile_vars,\n",
" yticklabels=row_labels,\n",
" )\n",
" ax.set_title(title, fontsize=13, pad=12)\n",
" ax.tick_params(axis=\"x\", rotation=45, labelsize=9)\n",
" ax.tick_params(axis=\"y\", rotation=0, labelsize=9)\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" return prof_median\n",
"\n",
"\n",
"# ── Global clustering ──────────────────────────────────────────────────────\n",
"prof_global = plot_heatmap_annotated(\n",
" dfc,\n",
" profile_vars = profile_vars_behavior,\n",
" cluster_col = \"cluster_k4\",\n",
" title = \"Cluster Signatures — Global Clustering (K=4)\\nColor: robust z-score | Values: cluster medians\",\n",
" figsize = (16, 4)\n",
")\n",
"\n",
"# ── Allocation (descriptive) ───────────────────────────────────────────────\n",
"prof_alloc = plot_heatmap_annotated(\n",
" dfc,\n",
" profile_vars = profile_vars_allocation,\n",
" cluster_col = \"cluster_k4\",\n",
" title = \"Cluster Signatures — Product Allocation (K=4, post-clustering descriptor)\\nColor: robust z-score | Values: cluster medians\",\n",
" figsize = (14, 4)\n",
")\n",
"\n",
"# ── Top 400 ───────────────────────────────────────────────────────────────\n",
"prof_top400 = plot_heatmap_annotated(\n",
" dfc_top400,\n",
" profile_vars = profile_vars_top400,\n",
" cluster_col = \"cluster_k5\",\n",
" title = \"Cluster Signatures — Top 400 Accounts (K=5)\\nColor: robust z-score | Values: cluster medians\",\n",
" figsize = (18, 5)\n",
")"
]
},
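  {
   "cell_type": "markdown",
   "id": "robust-z-sketch-md",
   "metadata": {},
   "source": [
    "In `plot_heatmap_annotated`, the colour of each cell is a robust z-score computed per feature across the cluster medians: (x - median) / (1.4826 * MAD), clipped to [-3, 3]. The factor 1.4826 makes the MAD comparable to a standard deviation for normally distributed data. The standalone sketch below reproduces that arithmetic on toy values; the helper name `robust_z` is hypothetical and the inputs are illustrative, not notebook data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "robust-z-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def robust_z(vals, clip=3.0):\n",
    "    # Hypothetical helper mirroring the per-feature loop in plot_heatmap_annotated\n",
    "    vals = np.asarray(vals, dtype=float)\n",
    "    med = np.median(vals)\n",
    "    # MAD scaled by 1.4826 so it estimates the standard deviation for normal data\n",
    "    mad = np.median(np.abs(vals - med)) * 1.4826\n",
    "    if mad == 0:\n",
    "        # Constant feature: no spread, so every cluster gets a neutral colour\n",
    "        return np.zeros_like(vals)\n",
    "    return np.clip((vals - med) / mad, -clip, clip)\n",
    "\n",
    "# Toy medians for four clusters (illustrative values only)\n",
    "print(np.round(robust_z([1.0, 2.0, 3.0, 10.0]), 2))"
   ]
  },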
{
"cell_type": "code",
   "execution_count": 38,
"id": "e42c50aa-0343-47b9-b562-78137d63d5e9",
"metadata": {},
"outputs": [
{
"ename": "IndentationError",
"evalue": "unindent does not match any outer indentation level (<string>, line 96)",
"output_type": "error",
"traceback": [
" \u001b[36mFile \u001b[39m\u001b[32m<string>:96\u001b[39m\n\u001b[31m \u001b[39m\u001b[31mdf_plot = df[feature_names + [cluster_col]].copy()\u001b[39m\n ^\n\u001b[31mIndentationError\u001b[39m\u001b[31m:\u001b[39m unindent does not match any outer indentation level\n"
]
}
],
"source": [
"# ============================================================\n",
"# VISUALISATION CLUSTERING — GLOBAL + TOP 400\n",
"# ============================================================\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.metrics import silhouette_samples\n",
"from pandas.plotting import parallel_coordinates\n",
"\n",
"# ------------------------------------------------------------\n",
"# GENERIC FUNCTION\n",
"# ------------------------------------------------------------\n",
"def visualize_clustering(df, X_scaled, feature_names, cluster_col, title_prefix):\n",
"\n",
" print(f\"\\n{'='*60}\")\n",
" print(f\"{title_prefix}\")\n",
" print(f\"{'='*60}\")\n",
"\n",
" # ========================================================\n",
" # 1. PCA projection (2D)\n",
" # ========================================================\n",
" pca = PCA(n_components=2)\n",
" X_pca = pca.fit_transform(X_scaled)\n",
"\n",
" plt.figure(figsize=(7,5))\n",
" sns.scatterplot(\n",
" x=X_pca[:,0], y=X_pca[:,1],\n",
" hue=df[cluster_col],\n",
" palette=\"tab10\",\n",
" s=20\n",
" )\n",
" plt.title(f\"{title_prefix} — PCA projection\")\n",
" plt.xlabel(\"PC1\")\n",
" plt.ylabel(\"PC2\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
    "    # ========================================================\n",
    "    # 2. Radar chart: drawn separately by plot_radar() in the next cell\n",
    "    # ========================================================\n",
    "\n",
" # ========================================================\n",
" # 3. Parallel coordinates\n",
" # ========================================================\n",
" df_plot = df[feature_names + [cluster_col]].copy()\n",
" df_plot[cluster_col] = df_plot[cluster_col].astype(str)\n",
"\n",
" plt.figure(figsize=(12,5))\n",
" parallel_coordinates(df_plot, cluster_col, colormap=\"tab10\", alpha=0.2)\n",
" plt.xticks(rotation=45)\n",
" plt.title(f\"{title_prefix} — Parallel coordinates\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # ========================================================\n",
" # 4. Boxplots (distribution per cluster)\n",
" # ========================================================\n",
" for col in feature_names:\n",
" plt.figure(figsize=(5,3))\n",
" sns.boxplot(x=cluster_col, y=col, data=df)\n",
" plt.title(f\"{title_prefix} — {col}\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # ========================================================\n",
" # 5. Silhouette distribution\n",
" # ========================================================\n",
" sil_vals = silhouette_samples(X_scaled, df[cluster_col])\n",
"\n",
" plt.figure(figsize=(6,4))\n",
" sns.histplot(sil_vals, bins=30)\n",
" plt.title(f\"{title_prefix} — Silhouette distribution\")\n",
" plt.xlabel(\"Silhouette value\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
" # ========================================================\n",
" # 6. Business 2D plot (intensity vs frequency)\n",
" # ========================================================\n",
" if \"flow_freq\" in df.columns and \"gross_flow_to_aum\" in df.columns:\n",
"\n",
" plt.figure(figsize=(7,5))\n",
" sns.scatterplot(\n",
" data=df,\n",
" x=\"flow_freq\",\n",
" y=\"gross_flow_to_aum\",\n",
" hue=cluster_col,\n",
" size=\"aum_qty_mean\" if \"aum_qty_mean\" in df.columns else None,\n",
" sizes=(20,200),\n",
" alpha=0.7\n",
" )\n",
" plt.yscale(\"log\")\n",
" plt.title(f\"{title_prefix} — Intensity vs Frequency\")\n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"\n",
"# ============================================================\n",
"# RUN FOR GLOBAL\n",
"# ============================================================\n",
"visualize_clustering(\n",
" df = dfc,\n",
" X_scaled = X_global_scaled,\n",
" feature_names= all_features_global,\n",
" cluster_col = \"cluster_k4\",\n",
" title_prefix = \"GLOBAL CLUSTERING (K=4)\"\n",
")\n",
"\n",
"# ============================================================\n",
"# RUN FOR TOP 400\n",
"# ============================================================\n",
"visualize_clustering(\n",
" df = dfc_top400,\n",
" X_scaled = X_top400_scaled,\n",
" feature_names= all_features_top400,\n",
" cluster_col = \"cluster_k5\",\n",
" title_prefix = \"TOP 400 CLUSTERING (K=5)\"\n",
")"
]
},
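  {
   "cell_type": "markdown",
   "id": "silhouette-per-cluster-md",
   "metadata": {},
   "source": [
    "`visualize_clustering` plots the pooled distribution of `silhouette_samples`; averaging those values per cluster shows which clusters are well separated and which are not. A minimal sketch on synthetic blobs: the data, `n_clusters=3` and `random_state` are illustrative assumptions, and in the notebook `X` and `labels` would be `X_global_scaled` and `dfc['cluster_k4']`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "silhouette-per-cluster",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.cluster import KMeans\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.metrics import silhouette_samples, silhouette_score\n",
    "\n",
    "# Synthetic blobs standing in for X_global_scaled (illustrative only)\n",
    "X, _ = make_blobs(n_samples=300, centers=3, random_state=0)\n",
    "labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)\n",
    "\n",
    "sil = silhouette_samples(X, labels)\n",
    "# Mean silhouette per cluster; low or negative values flag poorly separated clusters\n",
    "per_cluster = pd.Series(sil).groupby(labels).mean()\n",
    "print(per_cluster.round(3))\n",
    "# silhouette_score is the mean of the per-sample values\n",
    "print('overall:', round(silhouette_score(X, labels), 3))"
   ]
  },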
{
"cell_type": "code",
   "execution_count": 40,
"id": "fc913550-7c1d-44ee-aa36-e7b027b98e2e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Top discriminant features:\n",
"months_since_last_tx 2953.000000\n",
"avg_holding_months_per_isin 519.522790\n",
"n_isin_total 24.250000\n",
"gross_flow_to_aum 10.461616\n",
"log_aum_qty_mean 5.029849\n",
"dtype: float64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkUAAAHdCAYAAAATow1yAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzsnXd8FHX+/5+72U3ZbDa997ZJKAoIKKKCgtg99WycwqlY0DtQT0/UO7+C5bDx07OcoCieYkERVCygqChBRESlhpDek80m2ZKy2Ta/P3I7Zknb9MI8H499QGY/M/OZ3dmZ17yrTBAEAQkJCQkJCQmJExz5cE9AQkJCQkJCQmIkIIkiCQkJCQkJCQkkUSQhISEhISEhAUiiSEJCQkJCQkICkESRhISEhISEhAQgiSIJCQkJCQkJCUASRRISEhISEhISgCSKJCQkJCQkJCQASRRJSEhISEhISACSKJI4wXnhhRfIyMgY0n3ecsst/POf/xzSffaV8vJyMjIy2LRpk7hsOD6zzubRX/bs2UNGRgZ79uwZsG0ONWvXrmXOnDlkZWXxhz/8YbinM6w0NDQwadIkvvvuu+GeisQoRjHcExgNlJWVsW7dOnbt2kV1dTUAsbGxnHrqqVxzzTVkZmaKY1944QVefPFFdu/eTUhISJfbtNlsvPvuu3z88ccUFhYCkJKSwh/+8Afmz5+PUql0G3/OOedQUVEh/u3t7U10dDRz5szhtttuIygoqNP9PPXUU7z22mtccMEFPPfccx3eLy8vZ86cOdx3330sWrTI049kUFmwYAE//fST+LePjw+JiYn88Y9/ZOHChcjlo1fL79u3j127dvHFF18AHb/Xrli5ciVXXHHFYE9PYhj57rvvOHDgAEuWLPFofHZ2Nk8//TSXXnopS5YsITg4eETMa7gIDg7myiuv5N///jezZs0a7ulIjFIkUdQD3377LXfffTdeXl5ccsklZGZmIpfLKSws5Msvv+Tdd9/l66+/JjY21uNtNjc3c9ttt/HTTz9x9tlnc8UVVyCTydi5cyePP/44X331FWvWrEGlUrmtl5WVxY033giA1Wrl0KFDvPnmm+zdu5eNGzd22I8gCHz22WfExsby7bff0tjYiFqt7t8HMkRERUXxt7/9DWh7Avz0009ZuXIlDQ0N3H333cM8u77z2muvMWPGDBITEwF48MEHaWpqEt///vvv+fTTT3nggQfcbnJTpkwZ8rl2xe23386tt946pPuMjY3lwIEDKBQDd8maNm0aBw4c6PAAMlx89913vP322x6Ljx9//BG5XM7jjz+Ot7f3iJnXcDJ//nzeeustdu/ezYwZM4Z7OhKjEEkUdUNpaSl/+9vfiImJ4Y033iAiIsLt/XvvvZd33nmn15aLJ554gp9++omHHnqI66+/Xlz+pz/9ibfffptHHnmEJ598khUrVritFxkZ6WYiv+qqq1CpVLz++usUFxeTlJTkNn7Pnj1UV1fz3//+l5tvvpmvvvqKyy+/vFdzHS4CAgLcjnX+/PlccMEFvPXWWyxduhQvL69hnF3nCIJAa2srvr6+nb5fV1fHd999x/Lly8Vlc+fOdRuj1+v59NNPmTt3LnFxcYM53T6jUCgGVJx0h91ux+l04u3tjY+Pz4BuWy6XD/g2h5K6ujp8fX0HVRANJs3NzR0e/PpLamoqWq2WzZs3D4oocjqdWK3WAd+uxOChVCp7db+QRFE3rF27lubmZlauXNlBEEHbzWHhwoW92mZ1dTUbN27ktNNOcxNELq677jq2bdvGxo0buf3224mKiup2e+Hh4QCdfulbtmwhLS2N0047jRkzZrBly5ZRI4qOx8fHhwkTJrBt2zbq6urE7+Po0aO88cYb7N27F51Oh0aj4ayzzuK+++7r4E74+eefWblyJceOHSMyMpKbb7650319+OGHfPzxx+Tl5WE2m0lISOD666/nT3/6k9u4c845h/T0dK6//nqeff
ZZ8vLyuOeee7jhhhs63e6OHTuw2+2cfvrpvTp2u93OmjVr2Lx5M9XV1URERHDxxRfz17/+1e2G6JrPggULePrppyksLCQ+Pp677rqLefPm9bgfk8nEv/71L7766itkMhlz5szp9FhcLuLc3Fxx2a5du3jxxRfJy8vD4XAQERHBeeedJ1r7AFpbW3nllVf49NNPqaysJDAwkEmTJnHfffeRkJDg5sr18vJi/fr1VFRUsGnTJgICApgzZ46bG/H+++9n27ZtfPbZZ6xYsYKffvoJtVrN4sWLue6668jNzeXxxx/nwIEDBAcH87e//Y1LLrlEnM+ePXtYuHAhb775JqeeeirQ5rptaGjgueeeY8WKFRw4cACNRsPChQu55ZZbxHWtVisvv/wy3333HSUlJTgcDsaNG8fSpUs57bTTxHHtj0mtVvPqq69SXV1NRkYGDz/8MCeddJJ4LJs3bwZwi9dq/xm3p/0Y1//bfzYff/wx//3vf8nPz8fX15eZM2dy3333ER0dLa73888/8+abb3LgwAH0ej2hoaHid+YS9t3Nq7PPr/0xd/Zdffzxxzz66KP8/PPPzJgxg//85z84nU7efPNNPvjgA0pLSwkICGDu3Lncc889BAYGits9ePAgzz33HIcOHaKlpYWwsDBOPfVUVq5c6fbZnH766WzatAlBEJDJZJ1+fn3BarVSVFSE0+kcsG1KDA1BQUFERUV5dD5Ioqgbvv32WxITEzn55JMHbJvff/89DoeDyy67rMsxl112GXv27GHnzp1cddVV4nK73U59fT3Q9gM9cuQI69atY9q0acTHx7ttw2q18uWXX4rutosuuogHH3yQ2tpaUUiNNioqKpDJZGg0GnHZDz/8QFlZGVdccQXh4eHk5eXx/vvvk5+fz/vvvy/+CHJzc1m0aBEhISEsWbIEu93OCy+8QGhoaIf9vPvuu6Snp3POOeegUCj49ttvWbFiBYIgcN1117mNLSoq4p577uGaa67h6quvJjk5ucv5//rrrwQFBfXK1Qrwz3/+k82bN3Peeedx4403cuDAAdasWUNBQQEvvfSS29ji4mLuvvturr32Wi6//HI+/PBD7rzzTtauXcvMmTO73IcgCNxxxx3s27ePa6+9ltTUVL766iuWLVvW4/zy8vK47bbbyMjIYOnSpXh7e1NSUsIvv/wijnE4HNx2223s3r2biy66iIULF9LU1MSuXbs4duwYCQkJ4thNmzbR2trK1Vdfjbe3N4GBgV3eiBwOB7fccgtTp07l3nvvZcuWLTzyyCP4+fnx7LPPcskllzBv3jzee+89li1bxqRJkzr8Vo7HaDRy8803c+6553LBBRewbds2nnnmGbRarRir0tjYyAcffMDFF1/MVVddRVNTExs3buTmm2/mgw8+ICsry22bn376KU1NTVxzzTXIZDLWrl3LkiVL2L59O0qlkmuuuQadTseuXbt46qmnevzMn3rqKd5//30OHDjAY489BvzuYn355Zf597//zQUXXMCVV15JfX0969ev57rrruOjjz4Sfz9bt27FYrEwf/58goKCOHDgAOvXr6e6uprnn38eoNfz6g673c6iRYs45ZRTWLZsmSi8/u///o/NmzdzxRVXsGDBAsrLy3n77bc5cuQI7777Lkqlkrq6OhYtWkRwcDC33norGo2G8vJyvvrqqw77GT9+PG+88QZ5eXlotdp+zdmFIAhUVVXh5eVFfHz8qI5rPJEQBIHm5mZ0Oh2A20NBdytJdILZbBa0Wq1wxx13dHjPaDQKdXV14qulpUV87/nnnxe0Wq1QV1fX6XYff/xxQavVCkeOHOly34cPHxa0Wq2wcuVKcdnZZ58taLXaDq9rr71WqK+v77CNrVu3ClqtViguLhaPZ+LEicK6devcxpWVlQlarVZYu3Ztt5/HUHL99dcL559/vvj5FhQUCE8++aSg1WqFW2+91W1s+8/exaeffipotVph79694rI77rhDmDhxolBRUS
Euy8/PF7KysgStVtvjNm+66SZhzpw5bstc38n333/v0XHNnz9fuPzyy7sds3btWkGr1QplZWWCIAhCTk6OoNVqhX/
"text/plain": [
"<Figure size 600x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>months_since_last_tx</th>\n",
" <th>avg_holding_months_per_isin</th>\n",
" <th>n_isin_total</th>\n",
" <th>gross_flow_to_aum</th>\n",
" <th>log_aum_qty_mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>cluster_k4</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>27.0</td>\n",
" <td>60.000000</td>\n",
" <td>3.0</td>\n",
" <td>1.159420</td>\n",
" <td>5.166510</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>127.0</td>\n",
" <td>12.000000</td>\n",
" <td>3.0</td>\n",
" <td>1.476151</td>\n",
" <td>3.407548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.0</td>\n",
" <td>28.896552</td>\n",
" <td>12.0</td>\n",
" <td>5.351092</td>\n",
" <td>8.762920</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>69.0</td>\n",
" <td>11.333333</td>\n",
" <td>1.0</td>\n",
" <td>7.889030</td>\n",
" <td>5.279875</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" months_since_last_tx avg_holding_months_per_isin n_isin_total \\\n",
"cluster_k4 \n",
"0 27.0 60.000000 3.0 \n",
"1 127.0 12.000000 3.0 \n",
"2 3.0 28.896552 12.0 \n",
"3 69.0 11.333333 1.0 \n",
"\n",
" gross_flow_to_aum log_aum_qty_mean \n",
"cluster_k4 \n",
"0 1.159420 5.166510 \n",
"1 1.476151 3.407548 \n",
"2 5.351092 8.762920 \n",
"3 7.889030 5.279875 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def select_top_features(df, feature_names, cluster_col, top_n=5):\n",
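    "    \"\"\"Rank features by the variance of their per-cluster medians.\n",
    "\n",
    "    Note: the variance is computed on raw (unscaled) medians, so features\n",
    "    on large scales (transaction counts, months) dominate the ranking.\n",
    "    \"\"\"\n",
    "    prof = df.groupby(cluster_col)[feature_names].median()\n",
    "\n",
    "    # Between-cluster variance of the medians, highest first\n",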
" var_between = prof.var(axis=0).sort_values(ascending=False)\n",
" \n",
" top_features = var_between.head(top_n).index.tolist()\n",
" \n",
" print(\"\\nTop discriminant features:\")\n",
" print(var_between.head(top_n))\n",
" \n",
" return top_features\n",
"\n",
"def plot_radar(df, feature_names, cluster_col, title):\n",
" \n",
" prof = df.groupby(cluster_col)[feature_names].median()\n",
" \n",
    "    # Min-max normalisation per feature (essential: features live on very different scales)\n",
" prof_norm = (prof - prof.min()) / (prof.max() - prof.min() + 1e-9)\n",
" \n",
" labels = prof_norm.columns\n",
" num_vars = len(labels)\n",
" \n",
" angles = np.linspace(0, 2*np.pi, num_vars, endpoint=False).tolist()\n",
" angles += angles[:1]\n",
" \n",
" fig, ax = plt.subplots(figsize=(6,6), subplot_kw=dict(polar=True))\n",
" \n",
    "    # Iterate over the actual cluster ids so legend labels stay correct\n",
    "    for cluster_id, row in prof_norm.iterrows():\n",
    "        values = row.values.tolist()\n",
    "        values += values[:1]  # close the polygon\n",
    "\n",
    "        ax.plot(angles, values, label=f\"Cluster {cluster_id}\")\n",
    "        ax.fill(angles, values, alpha=0.1)\n",
" \n",
" ax.set_xticks(angles[:-1])\n",
" ax.set_xticklabels(labels)\n",
" \n",
" plt.title(title)\n",
" plt.legend(loc=\"upper right\", bbox_to_anchor=(1.3,1.1))\n",
" plt.tight_layout()\n",
" plt.show()\n",
" \n",
" return prof\n",
"\n",
"top_features_global = select_top_features(\n",
" dfc,\n",
" all_features_global,\n",
" \"cluster_k4\",\n",
" top_n=5\n",
")\n",
"\n",
"plot_radar(\n",
" dfc,\n",
" top_features_global,\n",
" \"cluster_k4\",\n",
" title=\"GLOBAL — Radar (Top discriminant features)\"\n",
")"
]
},
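  {
   "cell_type": "markdown",
   "id": "scaled-feature-ranking-md",
   "metadata": {},
   "source": [
    "`select_top_features` ranks features by the raw variance of their cluster medians, so large-unit features dominate (in the Top 400 run, `n_tx_total` scores about 4.1e6 against 0.34 for `log_aum_qty_mean`). One scale-free alternative, sketched here as an assumption rather than the notebook's method, expresses the spread of the cluster medians in units of each feature's overall MAD. The helper `select_top_features_scaled` and the toy data are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "scaled-feature-ranking",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def select_top_features_scaled(df, feature_names, cluster_col, top_n=5):\n",
    "    # Hypothetical scale-free variant of select_top_features\n",
    "    prof = df.groupby(cluster_col)[feature_names].median()\n",
    "    # Overall robust spread of each feature (MAD scaled to ~std under normality)\n",
    "    med = df[feature_names].median()\n",
    "    mad = (df[feature_names] - med).abs().median() * 1.4826\n",
    "    # Std of the cluster medians in units of the feature's overall MAD\n",
    "    score = (prof.std(axis=0) / mad.replace(0, np.nan)).sort_values(ascending=False)\n",
    "    return score.head(top_n).index.tolist(), score\n",
    "\n",
    "# Toy data: 'big' has huge units but no cluster signal; 'small' separates the clusters\n",
    "rng = np.random.default_rng(0)\n",
    "toy = pd.DataFrame({\n",
    "    'cluster': np.repeat([0, 1], 50),\n",
    "    'big': rng.normal(1e6, 1e5, 100),\n",
    "    'small': np.r_[rng.normal(0, 1, 50), rng.normal(5, 1, 50)],\n",
    "})\n",
    "top, score = select_top_features_scaled(toy, ['big', 'small'], 'cluster', top_n=1)\n",
    "print(top, score.round(2).to_dict())"
   ]
  },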
{
"cell_type": "code",
"execution_count": 42,
"id": "0b87d9ff-cca1-4c0b-a12f-64a9c09c5c8b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Top discriminant features:\n",
"n_tx_total 4.121397e+06\n",
"n_isin_total 2.883250e+02\n",
"avg_holding_months_per_isin 9.944313e+01\n",
"months_since_last_tx 7.050000e+01\n",
"gross_flow_to_aum 3.969037e+00\n",
"log_aum_qty_mean 3.438283e-01\n",
"dtype: float64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAi4AAAGMCAYAAAD9dimnAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjgsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvwVt1zgAAAAlwSFlzAAAPYQAAD2EBqD+naQABAABJREFUeJzsnXeYE3X+x1+TTbZmW7b33uhF5Gii4k9sqKicoICKjdND4URAT6WpYOE8RU/AwinYUAE7eqAoICCiiMD23jfZbLYmmza/P/aSY9mW7bs4r+fJs5B8Z+Y7mczMez5VEEVRREJCQkJCQkJiECDr7wlISEhISEhISDiKJFwkJCQkJCQkBg2ScJGQkJCQkJAYNEjCRUJCQkJCQmLQIAkXCQkJCQkJiUGDJFwkJCQkJCQkBg2ScJGQkJCQkJAYNEjCRUJCQkJCQmLQIO/vCUhISEhISPQUoihiNpuxWCz9PRWJTuDk5IRcLkcQhA7HSsJFQkJCQuK8wGg0UlpaSkNDQ39PRaILuLu7ExISgrOzc7vjBKnkv4SEhITEYMdqtZKZmYmTkxMBAQE4Ozs79PQu0f+IoojRaEStVmOxWEhISEAmazuSRbK4SEhISEgMeoxGI1arlYiICNzd3ft7OhKdxM3NDYVCQX5+PkajEVdX1zbHSsG5EhISEhLnDe09qUsMbBw9dtIRlpCQkJCQkBg0SMJFQkJCQkJCYtAgCRcJiV7k0ksvZcWKFX22vdLSUoYPH87x48f7bJvdYePGjSQlJTV7r6+/s7bm0V1WrFjBpZde2qPr7Es0Gg0PPPAA48ePJykpiX//+9/9PaV+pbKykvT0dKxWa7/NISkpib179/bb9gcKUnBuH+HoRfHtt99m/PjxAFRVVbFlyxa+/fZbSkpKcHNzY/jw4cydO5dLLrmk2XJFRUVMmzbN/n+ZTEZQUBBDhw7lr3/9KykpKQ7P1WQycd1115Gdnc2yZcu48847m31utVp54403eO+991Cr1URHR3PvvfdyzTXXtFhXdnY2Tz/9NL/88gsKhYKpU6fyyCOPoFKpHJ5PT3Lu9yQIAl5eXowYMYL777+f0aNH98u8eopXXnmFkSNHMnbsWI4ePcr8+fMdWi49Pb2XZybR32zatIn4+Hguu+wyh8avW7eOAwcO8Ne//hV/f3+GDRs2IObVX/j6+qJWq9Fqtfj7+/f4+tVqNZs2bWL//v2Ul5fj5+dHSkoKt912GxMmTOjx7dmuD8eOHcPLy6vH1w+g0+lYu3Yt3333HTKZjMsvv5y///3veHh4dGu9knDpI5599tlm///kk084dOhQi/fj4uIAyMnJ4fbbb0er1XLDDTcwfPhwampq+Oyzz1i4cCELFixg+fLlLbZzzTXXcNFFF2G1WsnOzua9997jhx9+YMeOHQ6Ll+3bt1NaWtrm5y+88AJbtmzhz3/+M8OHD2ffvn089NBDCILA1VdfbR9XVlbGrbfeiqenJ0uWLKGhoYE333yTjIwMPvzwww5z9XuTs7+nvLw83n33XebPn89HH33U40/efYVWq2X37t2sX78eaPotnfv7+sc//oG7uzsLFy7sjyk6xJ49e/o8jfUvf/kL99xzT4+uc+3atQykahObN29m+vTpDguEI0eOMG3atBYPLv09r/5CJpPh4+NDZWUlfn5+PfobLSoqYs6cOXh5ebFs2TISExMxm80cPHiQ1atXs2fPnh7bVk8jiiIWiwW5vKWcWLp0KWq1mq1bt2IymXj00Ud54okn2LBhQ7c3KtEPrF69WkxMTGz1M6PRKF5zzTXiyJEjxRMnTjT7zGw2i4sXLxYTExPFL774wv5+YWGhmJiYKL7++uvNxu/bt09MTEwUH3/8cYfmpdFoxLFjx4ovv/xyq+srKysThw4dKq5evdr+ntVqFW+55RbxoosuEs1ms/39lStXiiNGjBCLi4vt7x06dEhMTEwU33//fYfm09O09T19//33Ym
Jiorhy5coe3d4ll1wiLl++vEfWZTAYRIvF0ubnW7duFUeMGCHW1dW1Oebqq68W586d2yPz6QleeumlNs+DvqC+vr7ftt3XjBo1qlO/xaSkpGbneW/R2Xm1hV6vF8+cOSPq9XrRarXazxWr1SrWN5p65KXR1YrHfv1NLK/UtTvOarV2au533XWXOGXKlFZ/j9XV1fZ/JyYmiv/5z39EURTFI0eOiImJic0+P3PmjJiYmCgWFhaKoiiKRUVF4r333itecMEF4siRI8WrrrpK3L9/v/06ePbLdgwsFou4adMm8ZJLLhGHDx8uzpgxQ/zqq6/s27Btd//+/eLMmTPFoUOHikeOHGkx76ysLDExMVE8efKk/b3vv/9eTEpKEsvKylr9Hs4+hu0hWVwGIN988w0ZGRk88MADjBw5stlnTk5OrFmzhoMHD7Jx40auuuqqdtf1pz/9CWhS9I7w/PPPExMTw7XXXstLL73U4vO9e/diMpm45ZZb7O8JgsCcOXN46KGH+PXXX7ngggvs+3HxxRcTGhpqHztx4kSio6P56quvuPnmmx2aU19gm3NhYWGz9z/++GM++eQTMjMzqa2tJTIykrlz5zbbf2h66nj11Vd5//33qa6uZsSIETzxxBMttqPT6di8eTMHDx6kqKgIQRAYM2YMS5cuJTk52T7OZsb9xz/+QUZGBjt37kStVvPTTz+1adbdu3cvI0aM6LQZtrCwkOeee44jR47Q2NhIUlIS9913HxdffHGL+bzwwgukpaXx8ccfU19fz5/+9CdWrlxJSEhIh9v5+eefWbduHRkZGQQFBXHXXXe1Ou7SSy/lwgsvtFuOTCYTmzdv5tNPP6W0tBR3d3diY2P561//yqRJk+zLZWdn89JLL3H06FEaGhoICQnhiiuuYMmSJUBTHMvLL7/MF198wauvvsoPP/xAWFgYu3fvtn92tsssKSmJW2+9lXHjxrFx40aKiopISUlhzZo1JCUl8f777/PGG29QVlbGqFGjWLduHeHh4fblV6xYwU8//cS3334L/M9NuWzZMpRKJa+99hplZWUkJSWxcuVKRowYYV82LS2Nf//73xw7doyKigq8vLy46KKLWLZsGb6+vvZxtnl/8803vPrqq+zduxdRFLn88st54okncHNzs+8LwK5du9i1axcAM2fOtH/HZ7Nz504eeeQRAN555x3eeecd4H/uxJqaGjZu3Mg333xDZWUlISEhzJo1i7vuuqtZOusbb7zBf/7zH3Jzc9Hr9cTHx3PPPfdwxRVXNPuO25rXud/fuft87rG65557uPjii8nLy8NsNhMZGYmnpyc3vvojvxToWuxn9yhs99MLonz5cOEEh6wyOp2OAwcOsGTJklbrz3THjbNmzRpMJhPbt2/H3d2drKwse3XajRs3smjRIvbs2YNSqbTXTbGda6tXryY6Oppjx47x8MMPo1KpuPDCC+3r3rBhA8uXLyciIqLVOf766694eXkxfPhw+3sTJ05EJpNx8uRJ/u///q/L+yUJlwGI7US9/vrrW/3c09OTadOmsWvXLvLz84mKimpzXQUFBQD4+Ph0uN2TJ0+ye/du3n333TZPuNTUVNzd3e0uLRu2i25qaioXXHAB5eXlVFZWtuoXHzFiBD/88EOH8+lLiouLgZYXiffee4+EhAQuvfRS5HI53333HatXr0YURW699Vb7uBdffJFXX32VqVOnMnXqVE6fPs2CBQswmUzN1ldYWMjevXu54oorCA8PR6PR8MEHHzB37ly++OILgoKCmo3/17/+hUKh4M4778RoNKJQKFqdv8lk4vfff2fOnDmd2m+NRsPs2bPR6/XMmzcPX19fdu3axV/+8hdeeumlFheXV199FUEQuPvuu6msrOStt97i9ttv55NPPmm3YFR6ejp33nknKpWKRYsWYTab2bhxI35+fh3O8eWXX2bz5s3MmjWLESNGUFdXx6lTpzh9+rRduKSlpXHrrbcil8u5+eabCQsLo6
CggG+//dYuXGw8+OCDREVFsWTJkg5dOT///DPffvutXahu2bKFhQsXctddd/Huu+9yyy23UF1dzeuvv86jjz7K22+
"text/plain": [
"<Figure size 600x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>n_tx_total</th>\n",
" <th>n_isin_total</th>\n",
" <th>avg_holding_months_per_isin</th>\n",
" <th>months_since_last_tx</th>\n",
" <th>gross_flow_to_aum</th>\n",
" <th>log_aum_qty_mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>cluster_k5</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>819.0</td>\n",
" <td>25.0</td>\n",
" <td>52.904762</td>\n",
" <td>0.0</td>\n",
" <td>1.488451</td>\n",
" <td>10.974937</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.0</td>\n",
" <td>2.0</td>\n",
" <td>42.428571</td>\n",
" <td>19.0</td>\n",
" <td>1.388519</td>\n",
" <td>11.173746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>90.5</td>\n",
" <td>12.5</td>\n",
" <td>32.149303</td>\n",
" <td>1.0</td>\n",
" <td>4.382506</td>\n",
" <td>10.356551</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1448.0</td>\n",
" <td>24.0</td>\n",
" <td>40.857143</td>\n",
" <td>0.0</td>\n",
" <td>5.470824</td>\n",
" <td>11.044803</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4935.5</td>\n",
" <td>47.5</td>\n",
" <td>57.100000</td>\n",
" <td>0.0</td>\n",
" <td>5.154737</td>\n",
" <td>11.993787</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" n_tx_total n_isin_total avg_holding_months_per_isin \\\n",
"cluster_k5 \n",
"0 819.0 25.0 52.904762 \n",
"1 4.0 2.0 42.428571 \n",
"2 90.5 12.5 32.149303 \n",
"3 1448.0 24.0 40.857143 \n",
"4 4935.5 47.5 57.100000 \n",
"\n",
" months_since_last_tx gross_flow_to_aum log_aum_qty_mean \n",
"cluster_k5 \n",
"0 0.0 1.488451 10.974937 \n",
"1 19.0 1.388519 11.173746 \n",
"2 1.0 4.382506 10.356551 \n",
"3 0.0 5.470824 11.044803 \n",
"4 0.0 5.154737 11.993787 "
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"top_features_top400 = select_top_features(\n",
" dfc_top400,\n",
" all_features_top400,\n",
" \"cluster_k5\",\n",
" top_n=6\n",
")\n",
"\n",
"plot_radar(\n",
" dfc_top400,\n",
" top_features_top400,\n",
" \"cluster_k5\",\n",
" title=\"TOP 400 — Radar (Top discriminant features)\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d5f8f2f-6f2e-40f2-9025-2e91bdd61c96",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}