This commit is contained in:
Thomas PIQUE 2024-04-03 19:30:04 +00:00
parent 5fa57cb4b9
commit b9aa0d7578


@ -58,8 +58,8 @@ Use dataframes previously created and aggregate them to create test and train se
For each type of activity, the test and train sets of the 5 tenants are concatenated. Datasets are exported to location 1_Temp/1_0_Modelling_Datasets/.
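A minimal sketch of this concatenation step, assuming per-tenant pandas DataFrames (the tenant names and columns here are illustrative, not the project's actual schema):

```python
import pandas as pd

def concat_tenant_sets(tenant_frames):
    """Stack the per-tenant DataFrames into a single dataset for one activity."""
    return pd.concat(tenant_frames, ignore_index=True)

# Toy per-tenant frames standing in for the real dataframes of step 3.
tenants = [
    pd.DataFrame({"customer_id": [1, 2], "tenant": "A"}),
    pd.DataFrame({"customer_id": [3], "tenant": "B"}),
]
train_set = concat_tenant_sets(tenants)
# train_set would then be written out under 1_Temp/1_0_Modelling_Datasets/
```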
- `4_Descriptive_statistics.py` \
Generates graphics providing descriptive statistics about the data at the activity level. All graphics are exported to location 2_Output/2_0_Descriptive_Statistics/.
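A hedged sketch of how one such graphic could be produced and exported, assuming pandas and matplotlib; the column names and output filename are invented for illustration:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend suited to batch export
import matplotlib.pyplot as plt

# Toy data standing in for the activity-level datasets.
df = pd.DataFrame({"activity": ["web", "web", "shop"],
                   "purchases": [2, 3, 5]})
stats = df.groupby("activity")["purchases"].describe()

# Bar chart of mean purchases per activity, saved the way graphics
# would be exported to 2_Output/2_0_Descriptive_Statistics/.
fig, ax = plt.subplots()
stats["mean"].plot.bar(ax=ax, title="Mean purchases per activity")
fig.savefig("mean_purchases.png")
plt.close(fig)
```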
- `5_Modelling.py` \
3 ML models will be fitted on the data, and results will be exported for all 3 types of activities. \
3 pipelines are built, one per type of model (Naive Bayes, Random Forest, Logistic Regression). For the latter two methods, cross-validation is performed to ensure generalization. Graphics displaying the quality of the training are provided. The optimal parameters found are saved in a pickle file (which will be reused in the 6th step to add propensity scores to the test set and then determine the customer segments). All these files are exported to location 2_Output/2_1_Modeling_results/
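One of the three pipelines could look like the following sketch, shown for Logistic Regression; the synthetic data, preprocessing step, and cross-validation grid are assumptions, not the repository's actual configuration:

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one activity's train set.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pipeline + cross-validated grid search over a small illustrative grid.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)

# Persist the optimal parameters for reuse in the 6th step.
with open("best_params.pkl", "wb") as f:
    pickle.dump(search.best_params_, f)
```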
- `6_Segmentation_and_Marketing_Personae.py` \
The test set will be scored with the optimal parameters computed previously, and a propensity score (probability of a future purchase) will be assigned to each customer in this dataset. Segmentation is performed according to these scores. Graphics describing the marketing personae associated with the segments, as well as their business value, are exported to location 2_Output/2_2_Segmentation_and_Marketing_Personae/.
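A hedged sketch of the scoring and segmentation step, assuming a scikit-learn classifier refitted with the pickled parameters; here a plain Logistic Regression on synthetic data stands in, and the quartile-based segment labels are illustrative:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the train and test sets of one activity.
X_train, y_train = make_classification(n_samples=300, n_features=5, random_state=0)
X_test, _ = make_classification(n_samples=100, n_features=5, random_state=1)

# In the actual pipeline, the model would be configured from the pickle
# of optimal parameters saved in step 5.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # propensity of a future purchase

# Cut customers into segments by score quartile (labels are invented).
segments = pd.qcut(scores, q=4, labels=["low", "medium", "high", "top"])
test_df = pd.DataFrame({"propensity": scores, "segment": segments})
```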