arevelle-ensae 354f6847b6 standard model		2024-03-29 10:15:28 +00:00
Descriptive_statistics	fix errors	2024-03-14 23:02:50 +00:00
exploratory_analysis	ajout dossier + rangement notebooks	2024-02-26 15:51:31 +00:00
notebooks_merge	ajout dossier + rangement notebooks	2024-02-26 15:51:31 +00:00
Spectacle	added summary with weights	2024-03-21 13:21:40 +00:00
Sport	adjusted graphic options	2024-03-28 13:13:13 +00:00
useless	Changement architecture p1	2024-03-28 21:18:08 +00:00
.gitignore	Actualiser .gitignore	2024-01-13 14:56:43 +01:00
1_Input_Cleaning.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
2_Datasets_Generation.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
3_Modelling_Datasets.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
4_Descriptive_Statistics.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
0_5_Machine_Learning.py	standard model	2024-03-29 10:15:28 +00:00
0_6_segmentation_V2TP.py	added activity in the titles of graphics	2024-03-28 09:27:29 +00:00
0_6_Segmentation.py	add probability	2024-03-20 13:07:33 +00:00
0_7_CA_segment.py	CA estimation by segment works well	2024-03-27 18:59:05 +00:00
code_base_train_test.ipynb	code	2024-03-05 01:44:01 +00:00
code_valeur manquante.ipynb	Valeur_manquante	2024-02-20 01:27:57 +00:00
Exploration_billet_AJ.ipynb	Ajout exploration is_partner	2024-03-27 22:21:26 +00:00
Identification_entreprise.ipynb	IDENTIFICATION	2024-02-20 01:27:30 +00:00
Notebook_AR.ipynb	look at graph	2024-03-20 09:27:03 +00:00
Notebook_Fanta.ipynb	update	2024-01-10 18:26:26 +00:00
README.md	added README	2024-03-28 16:48:22 +00:00
Traitement_Fanta.ipynb	traitement	2024-02-11 13:17:32 +00:00
utils_CA_segment.py	CA estimation by segment works well	2024-03-27 18:59:05 +00:00
utils_cleaning_and_merge.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
utils_features_construction.py	Changement architecture p1	2024-03-28 21:18:08 +00:00
utils_ml.py	fix path	2024-03-29 10:14:14 +00:00
utils_segmentation_V2TP.py	adjusted graphic options	2024-03-28 13:13:13 +00:00
utils_segmentation.py	add probability	2024-03-20 13:07:33 +00:00
utils_stat_desc.py	fix path	2024-03-29 10:14:14 +00:00

README.md

Business data challenge 2023-2024 | ENSAE Paris

Arenametrix : customer segmentation

Team 1:

Antoine JOUBREL
Alexis REVELLE
Fanta RODRIGUE
Thomas PIQUÉ

Coaches:

Elia LAPENTA
Michael VISSER

Problem description

The goal of this project is to segment the customers of 15 companies belonging to 3 types of activity (sports companies, museums, and music companies).

Our approach

We opted for a sector-based approach: three segmentations are performed, one for each type of activity. Because the segments have to be linked to a probability of future purchase, we build them directly from the estimated probability of purchase during the coming year. The first modeling step is a pipeline that fits three ML models (naive Bayes, random forest, and logistic regression) on the data to predict whether a customer will purchase during the year. We then use the estimated purchase probability to split the customers into 4 segments. For each segment, we can estimate the potential number of tickets and the revenue for the coming year.
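As an illustration of the last two steps, here is a minimal sketch of turning purchase probabilities into 4 segments and summarizing each one. It is not the project's code: the quartile cut points, the function names `assign_segments` and `summarize_segments`, the sample scores, and the `avg_revenue_per_buyer` value are all assumptions made for the example, since the README does not say how the segment boundaries are chosen.

```python
from statistics import quantiles


def assign_segments(scores, n_segments=4):
    """Return a segment index (0 = lowest scores) for each purchase probability.

    Assumption: segments are cut at the score quantiles; the real project
    may use different boundaries.
    """
    cuts = quantiles(scores, n=n_segments)  # n_segments - 1 cut points
    return [sum(score > cut for cut in cuts) for score in scores]


def summarize_segments(scores, segments, avg_revenue_per_buyer):
    """Aggregate customers, expected buyers, and expected revenue per segment."""
    summary = {}
    for score, segment in zip(scores, segments):
        row = summary.setdefault(segment, {"customers": 0, "expected_buyers": 0.0})
        row["customers"] += 1
        # The sum of probabilities is the expected number of buyers.
        row["expected_buyers"] += score
    for row in summary.values():
        # Hypothetical flat revenue per buyer, for illustration only.
        row["expected_revenue"] = row["expected_buyers"] * avg_revenue_per_buyer
    return summary


# Made-up probabilities standing in for model output.
scores = [0.02, 0.05, 0.10, 0.15, 0.30, 0.45, 0.70, 0.90]
segments = assign_segments(scores)
summary = summarize_segments(scores, segments, avg_revenue_per_buyer=40.0)
for segment in sorted(summary):
    print(segment, summary[segment])
```

Summing the probabilities within a segment gives the expected number of buyers, which is why the same scores can feed both the segmentation and the volume estimate.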

How to run the code

Run 1_Input_Cleaning.py to clean the raw data and generate the dataframes that will be used to build datasets with insightful variables.
Run 2_Datasets_Generation.py to build those datasets.
Run 3_Modelling_Datasets.py to generate the train and test sets for the 3 types of activities.
Run 4_Descriptive_Statistics.py to generate graphics describing the data.
Run 0_5_Machine_Learning.py. Three ML models are fitted on the data, and the results are exported for all 3 types of activities.
Run 0_6_Segmentation.py. The model with the optimal parameters computed in the previous step is applied to the test set, which yields a propensity score (the probability of a future purchase) for each customer. Customers are then segmented according to these scores. This script exports graphics describing the marketing personas associated with the segments, as well as their business value.
Run 0_7_CA_segment.py. The scores are adjusted to better match the overall probability of a purchase. The adjusted score is then used to estimate the number of tickets sold and the revenue generated during the coming year.
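The steps above can be chained with a small runner. This is only a sketch: the helper `run_pipeline` and the `PIPELINE` list are hypothetical and not part of the repository. The file names are those shown in the repository listing, so adjust them if your checkout differs.

```python
import subprocess
import sys
from pathlib import Path

# Script names as they appear in the repository listing, in README order.
PIPELINE = [
    "1_Input_Cleaning.py",
    "2_Datasets_Generation.py",
    "3_Modelling_Datasets.py",
    "4_Descriptive_Statistics.py",
    "0_5_Machine_Learning.py",
    "0_6_Segmentation.py",
    "0_7_CA_segment.py",
]


def run_pipeline(scripts, workdir="."):
    """Run each script in order with the current interpreter.

    Stops at the first missing file or non-zero exit code, because every
    step depends on the outputs of the previous one.
    """
    completed = []
    for name in scripts:
        path = Path(workdir) / name
        if not path.is_file():
            raise FileNotFoundError(f"missing pipeline script: {path}")
        # check=True raises CalledProcessError if the script fails.
        subprocess.run([sys.executable, name], cwd=workdir, check=True)
        completed.append(name)
    return completed
```

Calling `run_pipeline(PIPELINE)` from the repository root would execute the seven steps in sequence. Any data access or credentials the scripts need are assumed to be configured already.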