Photo by Mike Kononov on Unsplash
This analysis is the combined effort of Umaer and me.
Telecom Churn Case Study
Analysis Approach :
The telecommunications industry experiences an average annual churn rate of 15-25%. Given that it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has become even more important than customer acquisition.
Here we are given 4 months of customer usage data. In this case study, we analyse customer-level data of a leading telecom firm, build predictive models to identify customers at high risk of churn, and identify the main indicators of churn.
Churn can be defined in two ways: usage-based churn and revenue-based churn. Usage-based churn:
Customers who have had no usage, neither incoming nor outgoing, in terms of calls, internet etc. over a period of time.
This case study only considers usage based churn.
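A minimal sketch of how a usage-based churn flag could be derived with pandas. The choice of the four month-9 columns below (total incoming and outgoing minutes plus 2G and 3G data volume) is an assumption made for illustration; the column names exist in the dataset, but the write-up does not state exactly which ones were used.

```python
import pandas as pd

# Month-9 usage columns assumed to define "zero usage" (an illustrative choice,
# not confirmed by the write-up).
USAGE_COLS_9 = ["total_ic_mou_9", "total_og_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]


def tag_churn(df: pd.DataFrame) -> pd.Series:
    """Return 1 where every month-9 usage column is zero, else 0."""
    no_usage = (df[USAGE_COLS_9] == 0).all(axis=1)
    return no_usage.astype(int)


# Tiny made-up frame to show the behaviour.
toy = pd.DataFrame({
    "total_ic_mou_9": [0.0, 188.04, 0.0],
    "total_og_mou_9": [0.0, 72.11, 0.0],
    "vol_2g_mb_9": [0.0, 0.0, 0.0],
    "vol_3g_mb_9": [0.0, 0.0, 8.42],
})
toy["churn"] = tag_churn(toy)
print(toy["churn"].tolist())
```

Only the first toy customer is flagged, because the third one still shows some 3G data usage in month 9.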
In the Indian and the southeast Asian market, approximately 80% of revenue comes from the top 20% of customers (called high-value customers). Thus, if we can reduce churn among the high-value customers, we will be able to reduce significant revenue leakage. Hence, this case study focuses on high-value customers only.
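The cleaning steps listed later define high-value customers through the 70th percentile of the average recharge amount over months 6 and 7. A rough sketch of such a filter follows; it assumes that the average is the mean of `total_rech_amt_6` and `total_rech_amt_7`, which is one plausible reading of ‘Average_rech_amt’ rather than a confirmed definition, and `filter_high_value` is a hypothetical helper name.

```python
import pandas as pd


def filter_high_value(df: pd.DataFrame, quantile: float = 0.70) -> pd.DataFrame:
    """Keep customers whose average recharge amount over months 6 and 7
    is at or above the given quantile of that average."""
    # Assumed definition of the average recharge amount.
    avg_rech = (df["total_rech_amt_6"] + df["total_rech_amt_7"]) / 2
    cutoff = avg_rech.quantile(quantile)
    return df[avg_rech >= cutoff]


# Made-up recharge amounts for ten customers.
toy = pd.DataFrame({
    "total_rech_amt_6": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "total_rech_amt_7": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})
hvc = filter_high_value(toy)
print(len(hvc), "of", len(toy), "customers kept")
```

Using `>=` keeps customers sitting exactly on the cutoff, which matches the "greater than or equal to" wording of the filtering rule.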
The dataset contains customer-level information for a span of four consecutive months - June, July, August and September. The months are encoded as 6, 7, 8 and 9, respectively.
The business objective is to predict the churn in the last (i.e. the ninth) month using the data (features) from the first three months.
This is a classification problem, where we need to predict whether a customer is about to churn or not. We have built a baseline Logistic Regression model, followed by PCA + Logistic Regression, PCA + Random Forest and PCA + XGBoost.
Analysis Steps
Data Cleaning and EDA
- We have started by importing the necessary packages and libraries.
- We have loaded the dataset into a dataframe.
- We have checked the number of columns, their data types, null counts and unique value counts to get some understanding of the data and to check whether the columns have the correct data type.
- We have checked for duplicate records (rows) in the data. There were no duplicates.
- Since ‘mobile_number’ is the unique identifier available, we have made it our index to retain the identity.
- We have found some columns that do not follow the naming standard and renamed them so that all the variables follow the same naming convention.
- Following the column renaming, we have converted the columns to their respective data types. Here, we have treated all the columns with 29 or fewer unique values as categorical columns and the rest as continuous columns.
- The date columns had ‘object’ as their data type, so we have converted them to the proper datetime format.
- Since our analysis is focused on high value customers (HVC), we have filtered for them before carrying out the further analysis. The filtering metric is as follows: all customers whose ‘Average_rech_amt’ over months 6 and 7 is greater than or equal to the 70th percentile of ‘Average_rech_amt’ are considered High Value Customers.
- Checked for missing values.
- Dropped all the columns with missing values greater than 50%.
- We have been given 4 months of data. Since each month's revenue and usage data is not related to the others, we did a month-wise drill down on missing values.
- Some columns had a similar range of missing values. So, we have looked at their related columns and checked whether these might be imputed with zero.
- We have found that ‘last_date_of_the_month’ had some missing values. Since this value is determined by the month itself, we have imputed the last date based on the month.
- We have found some columns with only one unique value. These are of no use for the analysis, hence we have dropped those columns.
- After completing all the data preparation tasks, we have tagged the Churn variable (which is our target variable).
- After imputing, we have dropped churn phase columns (Columns belonging to month - 9).
- After all the above processing, we have retained 30,011 rows and 126 columns.
- Exploratory Data Analysis
- The telecom company has many users with negative average revenues in both phases. These users are likely to churn.
- Most customers prefer the plans of ‘0’ category.
- The customers with lower ‘aon’ are more likely to churn when compared to the customers with higher ‘aon’.
- Revenue generated by the Customers who are about to churn is very unstable.
- The Customers whose arpu decreases in 7th month are more likely to churn when compared to ones with increase in arpu.
- The Customers with high total_og_mou in 6th month and lower total_og_mou in 7th month are more likely to churn compared to the rest.
- The Customers with decrease in rate of total_ic_mou in 7th month are more likely to churn, compared to the rest.
- Customers with stable usage of 2g volume throughout 6 and 7 months are less likely to churn.
- Customers with fall in usage of 2g volume in 7th month are more likely to Churn.
- Customers with stable usage of 3g volume throughout 6 and 7 months are less likely to churn.
- Customers with fall in consumption of 3g volume in 7th month are more likely to Churn.
- The customers with lower total_og_mou in 6th and 8th months are more likely to Churn compared to the ones with higher total_og_mou.
- The customers with lower total_og_mou_8 and aon are more likely to churn compared to the ones with higher total_og_mou_8 and aon.
- The customers with lower total_ic_mou_8 are more likely to churn irrespective of aon.
- The customers with total_ic_mou_8 > 2000 are much less likely to churn.
- Correlation analysis has been performed.
- We have created the derived variables and then removed the variables that were used to derive new ones.
- Outlier treatment has been performed. We have looked at the quantiles to understand the spread of Data.
- We have capped the upper outliers to 99th percentile.
- We have checked the categorical variables and the contribution of the classes in those variables. The classes with a low contribution are grouped into ‘Others’.
- Dummy Variables were created.
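The outlier step in the list above (capping upper outliers at the 99th percentile) can be sketched roughly as follows. `cap_upper_outliers` is a hypothetical helper and the toy `arpu_8` values are made up; the quantile uses the default linear interpolation of pandas.

```python
import pandas as pd


def cap_upper_outliers(df: pd.DataFrame, cols, q: float = 0.99) -> pd.DataFrame:
    """Return a copy of df in which each listed column is clipped
    at its own q-th quantile (upper side only)."""
    capped = df.copy()
    for col in cols:
        upper = capped[col].quantile(q)
        capped[col] = capped[col].clip(upper=upper)
    return capped


# 99 ordinary values plus one extreme value.
toy = pd.DataFrame({"arpu_8": [float(v) for v in range(1, 100)] + [10_000.0]})
capped = cap_upper_outliers(toy, ["arpu_8"])
print(toy["arpu_8"].max(), "->", capped["arpu_8"].max())
```

Working on a copy leaves the original frame untouched, so the capped and uncapped distributions can still be compared afterwards.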
Pre-processing Steps
- Train-Test Split has been performed.
- The data has high class-imbalance with the ratio of 0.095 (class 1 : class 0).
- SMOTE technique has been used to overcome class-imbalance.
- Predictor columns have been standardized to mean 0 and standard deviation 1.
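Two of the pre-processing steps above can be illustrated with plain pandas: measuring the class ratio and standardizing predictors with statistics taken from the training split only. This is a sketch on made-up numbers, with hypothetical helper names. SMOTE itself is left out; it is commonly taken from the imbalanced-learn package, and the assumption here is that it would be applied to the training split only.

```python
import pandas as pd


def class_ratio(y: pd.Series) -> float:
    """Ratio of class 1 to class 0 counts."""
    return (y == 1).sum() / (y == 0).sum()


def standardize(train: pd.DataFrame, test: pd.DataFrame):
    """Scale both splits with the mean and standard deviation of the
    training split, so no information leaks from the test split."""
    mean = train.mean()
    std = train.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    return (train - mean) / std, (test - mean) / std


# Made-up target and predictor values.
y_toy = pd.Series([0] * 10 + [1])
X_train = pd.DataFrame({"total_ic_mou_8": [1.0, 2.0, 3.0, 4.0, 5.0]})
X_test = pd.DataFrame({"total_ic_mou_8": [3.0, 6.0]})

ratio = class_ratio(y_toy)
train_s, test_s = standardize(X_train, X_test)
print(round(ratio, 3), round(train_s["total_ic_mou_8"].std(ddof=0), 3))
```

The population standard deviation (`ddof=0`) is used so the training column ends up with a standard deviation of exactly 1 under the same convention.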
Modelling
Model 1 : Logistic Regression with RFE & Manual Elimination ( Interpretable Model )
The most important predictors of churn, in order of importance, and their coefficients are as follows:
- loc_ic_t2f_mou_8 -1.2736
- total_rech_num_8 -1.2033
- total_rech_num_6 0.6053
- monthly_3g_8_0 0.3994
- monthly_2g_8_0 0.3666
- std_ic_t2f_mou_8 -0.3363
- std_og_t2f_mou_8 -0.2474
- const -0.2336
- monthly_3g_7_0 -0.2099
- std_ic_t2f_mou_7 0.1532
- sachet_2g_6_0 -0.1108
- sachet_2g_7_0 -0.0987
- sachet_2g_8_0 0.0488
- sachet_3g_6_0 -0.0399
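One common way to read logistic regression coefficients is as odds ratios. Assuming the continuous predictors were standardized before fitting, `exp(coefficient)` is the multiplicative change in the odds of churn for a one standard deviation increase in that predictor, with the other predictors held fixed. The sketch below applies this to the five largest coefficients from the list above.

```python
import math

# Coefficients copied from the list above (five largest in magnitude).
coefficients = {
    "loc_ic_t2f_mou_8": -1.2736,
    "total_rech_num_8": -1.2033,
    "total_rech_num_6": 0.6053,
    "monthly_3g_8_0": 0.3994,
    "monthly_2g_8_0": 0.3666,
}

# exp(beta) is the odds ratio associated with a one-unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:>20s}  odds ratio = {ratio:.3f}")
```

An odds ratio below 1 means the odds of churn fall as the predictor rises, and a ratio above 1 means they rise. For the dummy variables the "one-unit increase" is the switch from 0 to 1 if those columns were left unscaled.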
PCA : 95% of the variance in the train set can be explained by the first 16 principal components, and 100% of the variance is explained by the first 45 principal components.
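The number of components needed for a given share of variance can be read off the cumulative explained variance ratio. The sketch below computes it with NumPy's SVD on synthetic data; it is not the PCA call used in the analysis, and `components_for_variance` is a hypothetical helper.

```python
import numpy as np


def components_for_variance(X: np.ndarray, target: float = 0.95):
    """Return (k, cumulative), where k is the smallest number of principal
    components whose cumulative explained variance ratio reaches target."""
    X_centred = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the variances along the principal components.
    singular_values = np.linalg.svd(X_centred, compute_uv=False)
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained_ratio)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative


# Synthetic data: five observed columns driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 2.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 3.0, 0.0, 1.0]])
X_toy = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

k95, cumulative = components_for_variance(X_toy, target=0.95)
print(k95, np.round(cumulative, 4))
```

Because the toy data is generated from two latent factors plus a little noise, two components are enough to pass the 95% mark.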
Model 2 : PCA + Logistic Regression
Train Performance :
Accuracy : 0.627
Sensitivity / True Positive Rate / Recall : 0.918
Specificity / True Negative Rate : 0.599
Precision / Positive Predictive Value : 0.179
F1-score : 0.3

Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
Model 3 : PCA + Random Forest Classifier
Train Performance :
Accuracy : 0.882
Sensitivity / True Positive Rate / Recall : 0.816
Specificity / True Negative Rate : 0.888
Precision / Positive Predictive Value : 0.408
F1-score : 0.544

Test Performance :
Accuracy : 0.86
Sensitivity / True Positive Rate / Recall : 0.80
Specificity / True Negative Rate : 0.78
Precision / Positive Predictive Value : 0.37
F1-score : 0.51
Model 4 : PCA + XGBoost
Train Performance :
Accuracy : 0.873
Sensitivity / True Positive Rate / Recall : 0.887
Specificity / True Negative Rate : 0.872
Precision / Positive Predictive Value : 0.396
F1-score : 0.548

Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
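The performance measures reported for each model can all be derived from the four cells of a confusion matrix. The sketch below shows the arithmetic in plain Python; the counts are illustrative only and are not taken from the dataset.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the measures used in this write-up from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts: 86 churners and 914 non-churners,
# with every single customer predicted as churn.
all_churn = classification_metrics(tp=86, fp=914, tn=0, fn=0)
for name, value in all_churn.items():
    print(f"{name:>12s} : {value:.3f}")
```

The example shows that a classifier which flags every customer as churn reaches a sensitivity of 1.0 with a specificity of 0.0, so sensitivity is best read together with specificity and precision.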
Recommendations :
Following are the strongest indicators of churn
- Customers who churn show lower average monthly local incoming calls from fixed line in the action period by 1.27 standard deviations, compared to users who don’t churn, when all other factors are held constant. This is the strongest indicator of churn.
- Customers who churn show a lower number of recharges in the action period by 1.20 standard deviations, when all other factors are held constant. This is the second strongest indicator of churn.
- Further, customers who churn have done 0.6 standard deviations higher recharge than non-churn customers. This factor, when coupled with the above factors, is a good indicator of churn.
- Customers who churn are more likely to be users of ‘monthly 2g package-0 / monthly 3g package-0’ in the action period (approximately 0.3 std deviations higher than other packages), when all other factors are held constant.
Based on the above indicators the recommendations to the telecom company are :
- Concentrate on users whose incoming calls from fixed line are 1.27 std deviations lower than average. They are most likely to churn.
- Concentrate on users who recharge fewer times in the 8th month (1.2 std deviations fewer than average). They are second most likely to churn.
- Models with high sensitivity are the best for predicting churn. Use the PCA + Logistic Regression model to predict churn. It has an ROC score of 0.87 and a test sensitivity of 100%.
Analysis
Data Understanding
# Importing Necessary Libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Setting max display columns and rows.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

# Reading Dataset into a DataFrame.
data = pd.read_csv('telecom_churn_data.csv')
data.head()
mobile_number | circle_id | loc_og_t2o_mou | std_og_t2o_mou | loc_ic_t2o_mou | last_date_of_month_6 | last_date_of_month_7 | last_date_of_month_8 | last_date_of_month_9 | arpu_6 | arpu_7 | arpu_8 | arpu_9 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | onnet_mou_9 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | offnet_mou_9 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_ic_mou_9 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | roam_og_mou_9 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2t_mou_9 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2m_mou_9 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2f_mou_9 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_t2c_mou_9 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | loc_og_mou_9 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2t_mou_9 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2m_mou_9 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_t2f_mou_9 | std_og_t2c_mou_6 | std_og_t2c_mou_7 | std_og_t2c_mou_8 | std_og_t2c_mou_9 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | std_og_mou_9 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | isd_og_mou_9 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | spl_og_mou_9 | og_others_6 | og_others_7 | og_others_8 | og_others_9 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | total_og_mou_9 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2t_mou_9 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2m_mou_9 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_t2f_mou_9 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | loc_ic_mou_9 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2t_mou_9 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2m_mou_9 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_t2f_mou_9 | std_ic_t2o_mou_6 | std_ic_t2o_mou_7 | std_ic_t2o_mou_8 | 
std_ic_t2o_mou_9 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | std_ic_mou_9 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | total_ic_mou_9 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | spl_ic_mou_9 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | isd_ic_mou_9 | ic_others_6 | ic_others_7 | ic_others_8 | ic_others_9 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_num_9 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | total_rech_amt_9 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | max_rech_amt_9 | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | last_day_rch_amt_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 | total_rech_data_6 | total_rech_data_7 | total_rech_data_8 | total_rech_data_9 | max_rech_data_6 | max_rech_data_7 | max_rech_data_8 | max_rech_data_9 | count_rech_2g_6 | count_rech_2g_7 | count_rech_2g_8 | count_rech_2g_9 | count_rech_3g_6 | count_rech_3g_7 | count_rech_3g_8 | count_rech_3g_9 | av_rech_amt_data_6 | av_rech_amt_data_7 | av_rech_amt_data_8 | av_rech_amt_data_9 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_2g_mb_9 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | vol_3g_mb_9 | arpu_3g_6 | arpu_3g_7 | arpu_3g_8 | arpu_3g_9 | arpu_2g_6 | arpu_2g_7 | arpu_2g_8 | arpu_2g_9 | night_pck_user_6 | night_pck_user_7 | night_pck_user_8 | night_pck_user_9 | monthly_2g_6 | monthly_2g_7 | monthly_2g_8 | monthly_2g_9 | sachet_2g_6 | sachet_2g_7 | sachet_2g_8 | sachet_2g_9 | monthly_3g_6 | monthly_3g_7 | monthly_3g_8 | monthly_3g_9 | sachet_3g_6 | sachet_3g_7 | sachet_3g_8 | sachet_3g_9 | fb_user_6 | fb_user_7 | fb_user_8 | fb_user_9 | aon | aug_vbc_3g | jul_vbc_3g | jun_vbc_3g | sep_vbc_3g | |
0 | 7000842753 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 197.385 | 214.816 | 213.803 | 21.100 | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | 0.16 | NaN | NaN | NaN | 4.13 | NaN | NaN | NaN | 1.15 | NaN | NaN | NaN | 5.44 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | 0.00 | 0.00 | 5.44 | 0.00 | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | 4 | 3 | 2 | 6 | 362 | 252 | 252 | 0 | 252 | 252 | 252 | 0 | 6/21/2014 | 7/16/2014 | 8/8/2014 | 9/28/2014 | 252 | 252 | 252 | 0 | 6/21/2014 | 7/16/2014 | 8/8/2014 | NaN | 1.0 | 1.0 | 1.0 | NaN | 252.0 | 252.0 | 252.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | 252.0 | 252.0 | 252.0 | NaN | 30.13 | 1.32 | 5.75 | 0.0 | 83.57 | 150.76 | 109.61 | 0.00 | 212.17 | 212.17 | 212.17 | NaN | 212.17 | 212.17 | 212.17 | NaN | 0.0 | 0.0 | 0.0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1.0 | 1.0 | 1.0 | NaN | 968 | 30.4 | 0.0 | 101.20 | 3.58 |
1 | 7001865778 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 34.047 | 355.074 | 268.321 | 86.285 | 24.11 | 78.68 | 7.68 | 18.34 | 15.74 | 99.84 | 304.76 | 53.76 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 23.88 | 74.56 | 7.68 | 18.34 | 11.51 | 75.94 | 291.86 | 53.76 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 2.91 | 0.00 | 0.00 | 35.39 | 150.51 | 299.54 | 72.11 | 0.23 | 4.11 | 0.00 | 0.00 | 0.00 | 0.46 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.23 | 4.58 | 0.13 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 4.68 | 23.43 | 12.76 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 40.31 | 178.53 | 312.44 | 72.11 | 1.61 | 29.91 | 29.23 | 116.09 | 17.48 | 65.38 | 375.58 | 56.93 | 0.00 | 8.93 | 3.61 | 0.00 | 19.09 | 104.23 | 408.43 | 173.03 | 0.00 | 0.00 | 2.35 | 0.00 | 5.90 | 0.00 | 12.49 | 15.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 5.90 | 0.00 | 14.84 | 15.01 | 26.83 | 104.23 | 423.28 | 188.04 | 0.00 | 0.0 | 0.0 | 0.00 | 1.83 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 4 | 9 | 11 | 5 | 74 | 384 | 283 | 121 | 44 | 154 | 65 | 50 | 6/29/2014 | 7/31/2014 | 8/28/2014 | 9/30/2014 | 44 | 23 | 30 | 0 | NaN | 7/25/2014 | 8/10/2014 | NaN | NaN | 1.0 | 2.0 | NaN | NaN | 154.0 | 25.0 | NaN | NaN | 1.0 | 2.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | 154.0 | 50.0 | NaN | 0.00 | 108.07 | 365.47 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | 0.00 | 0.00 | NaN | NaN | 28.61 | 7.60 | NaN | NaN | 0.0 | 0.0 | NaN | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 1.0 | 1.0 | NaN | 1006 | 0.0 | 0.0 | 0.00 | 0.00 |
2 | 7001625959 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 167.690 | 189.058 | 210.226 | 290.714 | 11.54 | 55.24 | 37.26 | 74.81 | 143.33 | 220.59 | 208.36 | 118.91 | 0.0 | 0.00 | 0.00 | 38.49 | 0.0 | 0.00 | 0.00 | 70.94 | 7.19 | 28.74 | 13.58 | 14.39 | 29.34 | 16.86 | 38.46 | 28.16 | 24.11 | 21.79 | 15.61 | 22.24 | 0.0 | 135.54 | 45.76 | 0.48 | 60.66 | 67.41 | 67.66 | 64.81 | 4.34 | 26.49 | 22.58 | 8.76 | 41.81 | 67.41 | 75.53 | 9.28 | 1.48 | 14.76 | 22.83 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47.64 | 108.68 | 120.94 | 18.04 | 0.0 | 0.0 | 0.0 | 0.0 | 46.56 | 236.84 | 96.84 | 42.08 | 0.45 | 0.0 | 0.0 | 0.0 | 155.33 | 412.94 | 285.46 | 124.94 | 115.69 | 71.11 | 67.46 | 148.23 | 14.38 | 15.44 | 38.89 | 38.98 | 99.48 | 122.29 | 49.63 | 158.19 | 229.56 | 208.86 | 155.99 | 345.41 | 72.41 | 71.29 | 28.69 | 49.44 | 45.18 | 177.01 | 167.09 | 118.18 | 21.73 | 58.34 | 43.23 | 3.86 | 0.0 | 0.0 | 0.0 | 0.0 | 139.33 | 306.66 | 239.03 | 171.49 | 370.04 | 519.53 | 395.03 | 517.74 | 0.21 | 0.0 | 0.0 | 0.45 | 0.00 | 0.85 | 0.0 | 0.01 | 0.93 | 3.14 | 0.0 | 0.36 | 5 | 4 | 2 | 7 | 168 | 315 | 116 | 358 | 86 | 200 | 86 | 100 | 6/17/2014 | 7/24/2014 | 8/14/2014 | 9/29/2014 | 0 | 200 | 86 | 0 | NaN | NaN | NaN | 9/17/2014 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 46.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 46.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 8.42 | NaN | NaN | NaN | 2.84 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 1.0 | 1103 | 0.0 | 0.0 | 4.17 | 0.00 |
3 | 7001204172 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 221.338 | 251.102 | 508.054 | 389.500 | 99.91 | 54.39 | 310.98 | 241.71 | 123.31 | 109.01 | 71.68 | 113.54 | 0.0 | 54.86 | 44.38 | 0.00 | 0.0 | 28.09 | 39.04 | 0.00 | 73.68 | 34.81 | 10.61 | 15.49 | 107.43 | 83.21 | 22.46 | 65.46 | 1.91 | 0.65 | 4.91 | 2.06 | 0.0 | 0.00 | 0.00 | 0.00 | 183.03 | 118.68 | 37.99 | 83.03 | 26.23 | 14.89 | 289.58 | 226.21 | 2.99 | 1.73 | 6.53 | 9.99 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 29.23 | 16.63 | 296.11 | 236.21 | 0.0 | 0.0 | 0.0 | 0.0 | 10.96 | 0.00 | 18.09 | 43.29 | 0.00 | 0.0 | 0.0 | 0.0 | 223.23 | 135.31 | 352.21 | 362.54 | 62.08 | 19.98 | 8.04 | 41.73 | 113.96 | 64.51 | 20.28 | 52.86 | 57.43 | 27.09 | 19.84 | 65.59 | 233.48 | 111.59 | 48.18 | 160.19 | 43.48 | 66.44 | 0.00 | 129.84 | 1.33 | 38.56 | 4.94 | 13.98 | 1.18 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 45.99 | 105.01 | 4.94 | 143.83 | 280.08 | 216.61 | 53.13 | 305.38 | 0.59 | 0.0 | 0.0 | 0.55 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.80 | 10 | 11 | 18 | 14 | 230 | 310 | 601 | 410 | 60 | 50 | 50 | 50 | 6/28/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 30 | 50 | 50 | 30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | 2491 | 0.0 | 0.0 | 0.00 | 0.00 |
4 | 7000142493 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 261.636 | 309.876 | 238.174 | 163.426 | 50.31 | 149.44 | 83.89 | 58.78 | 76.96 | 91.88 | 124.26 | 45.81 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 50.31 | 149.44 | 83.89 | 58.78 | 67.64 | 91.88 | 124.26 | 37.89 | 0.00 | 0.00 | 0.00 | 1.93 | 0.0 | 0.00 | 0.00 | 0.00 | 117.96 | 241.33 | 208.16 | 98.61 | 0.00 | 0.00 | 0.00 | 0.00 | 9.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.31 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 5.98 | 0.00 | 0.0 | 0.0 | 0.0 | 127.28 | 241.33 | 208.16 | 104.59 | 105.68 | 88.49 | 233.81 | 154.56 | 106.84 | 109.54 | 104.13 | 48.24 | 1.50 | 0.00 | 0.00 | 0.00 | 214.03 | 198.04 | 337.94 | 202.81 | 0.00 | 0.00 | 0.86 | 2.31 | 1.93 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 1.93 | 0.25 | 0.86 | 2.31 | 216.44 | 198.29 | 338.81 | 205.31 | 0.00 | 0.0 | 0.0 | 0.18 | 0.00 | 0.00 | 0.0 | 0.00 | 0.48 | 0.00 | 0.0 | 0.00 | 5 | 6 | 3 | 4 | 196 | 350 | 287 | 200 | 56 | 110 | 110 | 50 | 6/26/2014 | 7/28/2014 | 8/9/2014 | 9/28/2014 | 50 | 110 | 110 | 50 | 6/4/2014 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 56.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 56.0 | NaN | NaN | NaN | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | NaN | NaN | NaN | 1526 | 0.0 | 0.0 | 0.00 | 0.00 |
# Checking information about data.
# data.info() prints its summary and returns None, hence the trailing "None" in the output.
print(data.info())

# Per-column summary: dtype, null counts, null percentage and number of unique values.
def metadata_matrix(data):
    return pd.DataFrame({
        'Datatype': data.dtypes.astype(str),
        'Non_Null_Count': data.count(axis=0).astype(int),
        'Null_Count': data.isnull().sum().astype(int),
        'Null_Percentage': round(data.isnull().sum() / len(data) * 100, 2),
        'Unique_Values_Count': data.nunique().astype(int)
    }).sort_values(by='Null_Percentage', ascending=False)

metadata_matrix(data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99999 entries, 0 to 99998
Columns: 226 entries, mobile_number to sep_vbc_3g
dtypes: float64(179), int64(35), object(12)
memory usage: 172.4+ MB
None
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
arpu_3g_6 | float64 | 25153 | 74846 | 74.85 | 7418 |
night_pck_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
total_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 37 |
arpu_2g_6 | float64 | 25153 | 74846 | 74.85 | 6990 |
max_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 48 |
fb_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
av_rech_amt_data_6 | float64 | 25153 | 74846 | 74.85 | 887 |
date_of_last_rech_data_6 | object | 25153 | 74846 | 74.85 | 30 |
count_rech_2g_6 | float64 | 25153 | 74846 | 74.85 | 31 |
count_rech_3g_6 | float64 | 25153 | 74846 | 74.85 | 25 |
date_of_last_rech_data_7 | object | 25571 | 74428 | 74.43 | 31 |
total_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 42 |
fb_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
max_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 48 |
night_pck_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
count_rech_2g_7 | float64 | 25571 | 74428 | 74.43 | 36 |
av_rech_amt_data_7 | float64 | 25571 | 74428 | 74.43 | 961 |
arpu_2g_7 | float64 | 25571 | 74428 | 74.43 | 6586 |
count_rech_3g_7 | float64 | 25571 | 74428 | 74.43 | 28 |
arpu_3g_7 | float64 | 25571 | 74428 | 74.43 | 7246 |
total_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 37 |
count_rech_3g_9 | float64 | 25922 | 74077 | 74.08 | 27 |
fb_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
max_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 50 |
arpu_3g_9 | float64 | 25922 | 74077 | 74.08 | 8063 |
date_of_last_rech_data_9 | object | 25922 | 74077 | 74.08 | 30 |
night_pck_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
arpu_2g_9 | float64 | 25922 | 74077 | 74.08 | 6795 |
count_rech_2g_9 | float64 | 25922 | 74077 | 74.08 | 32 |
av_rech_amt_data_9 | float64 | 25922 | 74077 | 74.08 | 945 |
total_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 46 |
arpu_3g_8 | float64 | 26339 | 73660 | 73.66 | 7787 |
fb_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
night_pck_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
av_rech_amt_data_8 | float64 | 26339 | 73660 | 73.66 | 973 |
max_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 50 |
count_rech_3g_8 | float64 | 26339 | 73660 | 73.66 | 29 |
arpu_2g_8 | float64 | 26339 | 73660 | 73.66 | 6652 |
count_rech_2g_8 | float64 | 26339 | 73660 | 73.66 | 34 |
date_of_last_rech_data_8 | object | 26339 | 73660 | 73.66 | 31 |
ic_others_9 | float64 | 92254 | 7745 | 7.75 | 1923 |
std_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 26553 |
std_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
isd_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 5557 |
std_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 11266 |
isd_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 1255 |
spl_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 4095 |
spl_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 384 |
og_others_9 | float64 | 92254 | 7745 | 7.75 | 235 |
loc_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12993 |
std_ic_t2o_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
loc_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 21484 |
std_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3090 |
loc_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 7091 |
loc_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 27697 |
std_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 8933 |
std_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 2295 |
std_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 17934 |
std_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 6157 |
loc_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 25376 |
roam_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 5882 |
loc_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 20141 |
loc_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3758 |
roam_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 4827 |
offnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 30077 |
loc_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 2332 |
loc_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12949 |
std_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 19052 |
onnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 23565 |
onnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 24089 |
std_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 6352 |
std_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 11662 |
loc_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13346 |
roam_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 6504 |
std_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 9304 |
loc_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 28200 |
std_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3051 |
roam_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5315 |
std_ic_t2o_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
loc_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13336 |
loc_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 7097 |
offnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 30908 |
loc_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 21886 |
loc_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 20544 |
isd_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 1276 |
ic_others_8 | float64 | 94621 | 5378 | 5.38 | 1896 |
og_others_8 | float64 | 94621 | 5378 | 5.38 | 216 |
spl_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 102 |
loc_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3807 |
std_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 19786 |
spl_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 4390 |
std_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
isd_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5844 |
loc_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 2516 |
std_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 2333 |
std_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 18291 |
loc_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 25990 |
std_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 27491 |
date_of_last_rech_9 | object | 95239 | 4760 | 4.76 | 30 |
std_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3125 |
ic_others_6 | float64 | 96062 | 3937 | 3.94 | 1817 |
isd_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 5521 |
std_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 9308 |
std_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 11646 |
spl_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 84 |
std_ic_t2o_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
loc_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 7250 |
loc_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13540 |
std_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
std_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 2450 |
std_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 27502 |
std_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 19734 |
isd_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 1381 |
std_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 18244 |
spl_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 3965 |
loc_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 26372 |
og_others_6 | float64 | 96062 | 3937 | 3.94 | 1018 |
loc_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 2235 |
loc_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 20905 |
loc_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3860 |
loc_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13539 |
roam_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 8038 |
std_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 6279 |
onnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 24313 |
loc_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 28569 |
offnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 31140 |
roam_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 6512 |
loc_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 22065 |
loc_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 2426 |
roam_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5230 |
loc_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 26091 |
loc_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13411 |
offnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 31023 |
loc_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3863 |
std_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 18567 |
std_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 6481 |
onnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 24336 |
std_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20018 |
loc_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20637 |
std_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 2391 |
roam_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 6639 |
std_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
std_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 9464 |
isd_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 1380 |
ic_others_7 | float64 | 96140 | 3859 | 3.86 | 2002 |
loc_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 7395 |
loc_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 21918 |
std_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 11889 |
loc_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13511 |
std_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3209 |
loc_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 28390 |
spl_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 107 |
og_others_7 | float64 | 96140 | 3859 | 3.86 | 187 |
spl_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 4396 |
isd_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5789 |
std_ic_t2o_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
std_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 27951 |
date_of_last_rech_8 | object | 96377 | 3622 | 3.62 | 31 |
date_of_last_rech_7 | object | 98232 | 1767 | 1.77 | 31 |
last_date_of_month_9 | object | 98340 | 1659 | 1.66 | 1 |
date_of_last_rech_6 | object | 98392 | 1607 | 1.61 | 30 |
last_date_of_month_8 | object | 98899 | 1100 | 1.10 | 1 |
loc_ic_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
std_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
loc_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
last_date_of_month_7 | object | 99398 | 601 | 0.60 | 1 |
sachet_3g_8 | int64 | 99999 | 0 | 0.00 | 29 |
jul_vbc_3g | float64 | 99999 | 0 | 0.00 | 14162 |
aug_vbc_3g | float64 | 99999 | 0 | 0.00 | 14676 |
aon | int64 | 99999 | 0 | 0.00 | 3489 |
jun_vbc_3g | float64 | 99999 | 0 | 0.00 | 13312 |
monthly_2g_9 | int64 | 99999 | 0 | 0.00 | 5 |
sachet_3g_6 | int64 | 99999 | 0 | 0.00 | 25 |
vol_3g_mb_9 | float64 | 99999 | 0 | 0.00 | 14472 |
sachet_3g_7 | int64 | 99999 | 0 | 0.00 | 27 |
monthly_2g_8 | int64 | 99999 | 0 | 0.00 | 6 |
monthly_3g_9 | int64 | 99999 | 0 | 0.00 | 11 |
monthly_3g_8 | int64 | 99999 | 0 | 0.00 | 12 |
sachet_3g_9 | int64 | 99999 | 0 | 0.00 | 27 |
monthly_3g_7 | int64 | 99999 | 0 | 0.00 | 15 |
monthly_3g_6 | int64 | 99999 | 0 | 0.00 | 12 |
sachet_2g_9 | int64 | 99999 | 0 | 0.00 | 32 |
sachet_2g_8 | int64 | 99999 | 0 | 0.00 | 34 |
sachet_2g_7 | int64 | 99999 | 0 | 0.00 | 35 |
sachet_2g_6 | int64 | 99999 | 0 | 0.00 | 32 |
monthly_2g_7 | int64 | 99999 | 0 | 0.00 | 6 |
monthly_2g_6 | int64 | 99999 | 0 | 0.00 | 5 |
mobile_number | int64 | 99999 | 0 | 0.00 | 99999 |
vol_3g_mb_8 | float64 | 99999 | 0 | 0.00 | 14960 |
total_og_mou_9 | float64 | 99999 | 0 | 0.00 | 39160 |
total_rech_num_7 | int64 | 99999 | 0 | 0.00 | 101 |
total_rech_num_6 | int64 | 99999 | 0 | 0.00 | 102 |
total_ic_mou_9 | float64 | 99999 | 0 | 0.00 | 31260 |
total_ic_mou_8 | float64 | 99999 | 0 | 0.00 | 32128 |
total_ic_mou_7 | float64 | 99999 | 0 | 0.00 | 32242 |
total_ic_mou_6 | float64 | 99999 | 0 | 0.00 | 32247 |
circle_id | int64 | 99999 | 0 | 0.00 | 1 |
total_og_mou_8 | float64 | 99999 | 0 | 0.00 | 40074 |
vol_3g_mb_7 | float64 | 99999 | 0 | 0.00 | 14519 |
total_og_mou_7 | float64 | 99999 | 0 | 0.00 | 40477 |
total_og_mou_6 | float64 | 99999 | 0 | 0.00 | 40327 |
arpu_9 | float64 | 99999 | 0 | 0.00 | 79937 |
arpu_8 | float64 | 99999 | 0 | 0.00 | 83615 |
arpu_7 | float64 | 99999 | 0 | 0.00 | 85308 |
arpu_6 | float64 | 99999 | 0 | 0.00 | 85681 |
last_date_of_month_6 | object | 99999 | 0 | 0.00 | 1 |
total_rech_num_8 | int64 | 99999 | 0 | 0.00 | 96 |
total_rech_num_9 | int64 | 99999 | 0 | 0.00 | 97 |
total_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 2305 |
total_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 2329 |
vol_3g_mb_6 | float64 | 99999 | 0 | 0.00 | 13773 |
vol_2g_mb_9 | float64 | 99999 | 0 | 0.00 | 13919 |
vol_2g_mb_8 | float64 | 99999 | 0 | 0.00 | 14994 |
vol_2g_mb_7 | float64 | 99999 | 0 | 0.00 | 15114 |
vol_2g_mb_6 | float64 | 99999 | 0 | 0.00 | 15201 |
last_day_rch_amt_9 | int64 | 99999 | 0 | 0.00 | 185 |
last_day_rch_amt_8 | int64 | 99999 | 0 | 0.00 | 199 |
last_day_rch_amt_7 | int64 | 99999 | 0 | 0.00 | 173 |
last_day_rch_amt_6 | int64 | 99999 | 0 | 0.00 | 186 |
max_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 201 |
max_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 213 |
max_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 183 |
max_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 202 |
total_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 2304 |
total_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 2347 |
sep_vbc_3g | float64 | 99999 | 0 | 0.00 | 3720 |
Data Cleaning
# Checking if there are any duplicate records by counting distinct mobile numbers.
data['mobile_number'].nunique()
99999
- Since the number of distinct mobile numbers equals the number of rows, there are no duplicate records.
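As a cross-check, pandas can also report duplicates directly. The sketch below is only an illustration on a small made-up frame (`toy` and its values are assumptions, not the real dataset); only the `mobile_number` column name comes from the analysis.

```python
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
toy = pd.DataFrame({
    "mobile_number": [7000000001, 7000000002, 7000000002, 7000000003],
    "arpu_6": [100.0, 250.5, 250.5, 80.0],
})

# Number of rows whose mobile_number repeats an earlier row.
n_duplicates = int(toy["mobile_number"].duplicated().sum())

# True only when every mobile_number appears exactly once.
is_unique = toy["mobile_number"].is_unique

print(n_duplicates, is_unique)
```

On the real data, zero duplicated rows (or `is_unique` being true) would support the conclusion above.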
# mobile_number is a unique identifier
# Setting mobile_number as the index
data = data.set_index('mobile_number')

# Renaming columns
data = data.rename({'jun_vbc_3g' : 'vbc_3g_6', 'jul_vbc_3g' : 'vbc_3g_7', 'aug_vbc_3g' : 'vbc_3g_8', 'sep_vbc_3g' : 'vbc_3g_9'}, axis=1)
# Converting columns into appropriate data types and extracting single-value columns.
# Numeric columns with at most 29 unique values are treated as categorical variables.
# The threshold of 29 was chosen by looking at the metadata_matrix output above.

columns = data.columns
change_to_cat = []
single_value_col = []
for column in columns:
    unique_value_count = data[column].nunique()
    if unique_value_count == 1:
        single_value_col.append(column)
    if unique_value_count <= 29 and unique_value_count != 0 and data[column].dtype in ['int', 'float']:
        change_to_cat.append(column)
print(' Columns to change to categorical data type : \n', pd.DataFrame(change_to_cat), '\n')
Columns to change to categorical data type :
                     0
0            circle_id
1       loc_og_t2o_mou
2       std_og_t2o_mou
3       loc_ic_t2o_mou
4     std_og_t2c_mou_6
5     std_og_t2c_mou_7
6     std_og_t2c_mou_8
7     std_og_t2c_mou_9
8     std_ic_t2o_mou_6
9     std_ic_t2o_mou_7
10    std_ic_t2o_mou_8
11    std_ic_t2o_mou_9
12     count_rech_3g_6
13     count_rech_3g_7
14     count_rech_3g_8
15     count_rech_3g_9
16    night_pck_user_6
17    night_pck_user_7
18    night_pck_user_8
19    night_pck_user_9
20        monthly_2g_6
21        monthly_2g_7
22        monthly_2g_8
23        monthly_2g_9
24        monthly_3g_6
25        monthly_3g_7
26        monthly_3g_8
27        monthly_3g_9
28         sachet_3g_6
29         sachet_3g_7
30         sachet_3g_8
31         sachet_3g_9
32           fb_user_6
33           fb_user_7
34           fb_user_8
35           fb_user_9
# Converting all the above columns having <=29 unique values into categorical data type.
data[change_to_cat] = data[change_to_cat].astype('category')

# Converting *sachet* variables to categorical data type
sachet_columns = data.filter(regex='.*sachet.*', axis=1).columns.values
data[sachet_columns] = data[sachet_columns].astype('category')
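The dtype test used above (`dtype in ['int', 'float']`) depends on how NumPy resolves those names on a given platform. A possible alternative, sketched here on a made-up frame, is `pandas.api.types.is_numeric_dtype`. The frame `toy` and the threshold `MAX_CATEGORIES` are assumptions chosen for the example; the analysis itself uses 29.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy stand-in for the real data; values are invented for illustration.
toy = pd.DataFrame({
    "circle_id": [109, 109, 109, 109],
    "monthly_2g_6": [0, 1, 0, 2],
    "arpu_6": [10.5, 20.1, 30.7, 40.2],
    "last_date_of_month_6": ["6/30/2014"] * 4,
})

MAX_CATEGORIES = 3  # hypothetical threshold for the toy data

single_value_cols = []
to_category = []
for col in toy.columns:
    n_unique = toy[col].nunique()
    if n_unique == 1:
        single_value_cols.append(col)
    # Only low-cardinality numeric columns become categorical.
    if 0 < n_unique <= MAX_CATEGORIES and is_numeric_dtype(toy[col]):
        to_category.append(col)

toy[to_category] = toy[to_category].astype("category")
print(single_value_cols, to_category)
```

Date strings are skipped by the numeric check, so they stay available for the datetime conversion that follows.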
# Changing datatype of date variables to datetime.
import re

columns = data.columns
col_with_date = []
for column in columns:
    x = re.findall("^date", column)
    if x:
        col_with_date.append(column)
data[col_with_date].dtypes

date_of_last_rech_6         object
date_of_last_rech_7         object
date_of_last_rech_8         object
date_of_last_rech_9         object
date_of_last_rech_data_6    object
date_of_last_rech_data_7    object
date_of_last_rech_data_8    object
date_of_last_rech_data_9    object
dtype: object

# Checking the date format
data[col_with_date].head()
mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
7000842753 | 6/21/2014 | 7/16/2014 | 8/8/2014 | 9/28/2014 | 6/21/2014 | 7/16/2014 | 8/8/2014 | NaN |
7001865778 | 6/29/2014 | 7/31/2014 | 8/28/2014 | 9/30/2014 | NaN | 7/25/2014 | 8/10/2014 | NaN |
7001625959 | 6/17/2014 | 7/24/2014 | 8/14/2014 | 9/29/2014 | NaN | NaN | NaN | 9/17/2014 |
7001204172 | 6/28/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | NaN | NaN | NaN | NaN |
7000142493 | 6/26/2014 | 7/28/2014 | 8/9/2014 | 9/28/2014 | 6/4/2014 | NaN | NaN | NaN |
- Let's convert the above columns to the datetime data type.
for col in col_with_date:
    data[col] = pd.to_datetime(data[col], format="%m/%d/%Y")
data[col_with_date].head()
mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
7000842753 | 2014-06-21 | 2014-07-16 | 2014-08-08 | 2014-09-28 | 2014-06-21 | 2014-07-16 | 2014-08-08 | NaT |
7001865778 | 2014-06-29 | 2014-07-31 | 2014-08-28 | 2014-09-30 | NaT | 2014-07-25 | 2014-08-10 | NaT |
7001625959 | 2014-06-17 | 2014-07-24 | 2014-08-14 | 2014-09-29 | NaT | NaT | NaT | 2014-09-17 |
7001204172 | 2014-06-28 | 2014-07-31 | 2014-08-31 | 2014-09-30 | NaT | NaT | NaT | NaT |
7000142493 | 2014-06-26 | 2014-07-28 | 2014-08-09 | 2014-09-28 | 2014-06-04 | NaT | NaT | NaT |
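The conversion above assumes every non-null entry follows the month/day/year pattern. If some entries might be malformed, one option is `errors="coerce"`, which turns unparseable values into `NaT` instead of raising. The sketch below uses a made-up frame (`toy`), and the `"not a date"` entry is an invented example of bad input, not something observed in the dataset.

```python
import pandas as pd

# Toy stand-in for the real data; values are invented for illustration.
toy = pd.DataFrame({
    "date_of_last_rech_6": ["6/21/2014", "6/29/2014", None],
    "date_of_last_rech_data_6": ["6/21/2014", None, "not a date"],
    "arpu_6": [100.0, 250.5, 80.0],
})

# Select the date columns by name prefix, as in the analysis.
date_cols = [c for c in toy.columns if c.startswith("date")]

for col in date_cols:
    # Unparseable strings become NaT rather than raising an error.
    toy[col] = pd.to_datetime(toy[col], format="%m/%d/%Y", errors="coerce")

print(toy[date_cols])
```

Coercion hides bad input, so it is worth comparing the null counts before and after the conversion.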
Filtering High Value Customers
- Customers are considered high value if their average recharge amount over June and July is greater than or equal to the 70th percentile of that average across all customers.
# Deriving average recharge amount of June and July.
data['Average_rech_amt_6n7'] = (data['total_rech_amt_6'] + data['total_rech_amt_7']) / 2

# Filtering high value customers (Average_rech_amt_6n7 >= 70th percentile of Average_rech_amt_6n7)
data = data[(data['Average_rech_amt_6n7'] >= data['Average_rech_amt_6n7'].quantile(0.7))]
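To make the percentile filter concrete, here is a minimal sketch on ten made-up customers. The frame `toy` and its recharge amounts are assumptions for illustration; only the column names follow the analysis.

```python
import pandas as pd

# Toy stand-in: ten customers with identical June and July recharge amounts.
amounts = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
toy = pd.DataFrame({
    "total_rech_amt_6": amounts,
    "total_rech_amt_7": amounts,
})

# Average recharge amount over the two months.
avg = (toy["total_rech_amt_6"] + toy["total_rech_amt_7"]) / 2

# 70th percentile of the average (pandas interpolates linearly by default).
cutoff = avg.quantile(0.7)

# Keep customers at or above the cutoff.
high_value = toy[avg >= cutoff]
print(cutoff, len(high_value))
```

On this toy frame the three customers with the largest averages are kept, which matches the intent of retaining roughly the top 30% of customers.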
Missing Values
# Checking for missing values.
missing_values = metadata_matrix(data)[['Datatype', 'Null_Percentage']].sort_values(by='Null_Percentage', ascending=False)
missing_values
Column | Datatype | Null_Percentage |
av_rech_amt_data_6 | float64 | 62.02 |
count_rech_2g_6 | float64 | 62.02 |
arpu_2g_6 | float64 | 62.02 |
max_rech_data_6 | float64 | 62.02 |
night_pck_user_6 | category | 62.02 |
date_of_last_rech_data_6 | datetime64[ns] | 62.02 |
total_rech_data_6 | float64 | 62.02 |
arpu_3g_6 | float64 | 62.02 |
fb_user_6 | category | 62.02 |
count_rech_3g_6 | category | 62.02 |
av_rech_amt_data_9 | float64 | 61.81 |
count_rech_2g_9 | float64 | 61.81 |
night_pck_user_9 | category | 61.81 |
arpu_3g_9 | float64 | 61.81 |
arpu_2g_9 | float64 | 61.81 |
fb_user_9 | category | 61.81 |
date_of_last_rech_data_9 | datetime64[ns] | 61.81 |
total_rech_data_9 | float64 | 61.81 |
count_rech_3g_9 | category | 61.81 |
max_rech_data_9 | float64 | 61.81 |
count_rech_2g_7 | float64 | 61.14 |
count_rech_3g_7 | category | 61.14 |
arpu_2g_7 | float64 | 61.14 |
arpu_3g_7 | float64 | 61.14 |
av_rech_amt_data_7 | float64 | 61.14 |
max_rech_data_7 | float64 | 61.14 |
fb_user_7 | category | 61.14 |
total_rech_data_7 | float64 | 61.14 |
date_of_last_rech_data_7 | datetime64[ns] | 61.14 |
night_pck_user_7 | category | 61.14 |
av_rech_amt_data_8 | float64 | 60.83 |
count_rech_3g_8 | category | 60.83 |
total_rech_data_8 | float64 | 60.83 |
arpu_3g_8 | float64 | 60.83 |
max_rech_data_8 | float64 | 60.83 |
date_of_last_rech_data_8 | datetime64[ns] | 60.83 |
arpu_2g_8 | float64 | 60.83 |
fb_user_8 | category | 60.83 |
night_pck_user_8 | category | 60.83 |
count_rech_2g_8 | float64 | 60.83 |
loc_og_t2t_mou_9 | float64 | 5.68 |
ic_others_9 | float64 | 5.68 |
isd_ic_mou_9 | float64 | 5.68 |
og_others_9 | float64 | 5.68 |
loc_og_t2f_mou_9 | float64 | 5.68 |
roam_ic_mou_9 | float64 | 5.68 |
loc_og_mou_9 | float64 | 5.68 |
std_og_t2f_mou_9 | float64 | 5.68 |
loc_og_t2m_mou_9 | float64 | 5.68 |
std_og_t2m_mou_9 | float64 | 5.68 |
loc_og_t2c_mou_9 | float64 | 5.68 |
std_og_t2t_mou_9 | float64 | 5.68 |
std_ic_t2o_mou_9 | category | 5.68 |
std_ic_mou_9 | float64 | 5.68 |
spl_ic_mou_9 | float64 | 5.68 |
std_ic_t2f_mou_9 | float64 | 5.68 |
roam_og_mou_9 | float64 | 5.68 |
std_ic_t2m_mou_9 | float64 | 5.68 |
offnet_mou_9 | float64 | 5.68 |
std_og_mou_9 | float64 | 5.68 |
spl_og_mou_9 | float64 | 5.68 |
loc_ic_t2t_mou_9 | float64 | 5.68 |
onnet_mou_9 | float64 | 5.68 |
loc_ic_t2m_mou_9 | float64 | 5.68 |
loc_ic_t2f_mou_9 | float64 | 5.68 |
std_og_t2c_mou_9 | category | 5.68 |
loc_ic_mou_9 | float64 | 5.68 |
std_ic_t2t_mou_9 | float64 | 5.68 |
isd_og_mou_9 | float64 | 5.68 |
std_og_t2t_mou_8 | float64 | 3.13 |
std_og_t2c_mou_8 | category | 3.13 |
std_og_t2f_mou_8 | float64 | 3.13 |
std_og_mou_8 | float64 | 3.13 |
roam_og_mou_8 | float64 | 3.13 |
isd_og_mou_8 | float64 | 3.13 |
loc_og_t2t_mou_8 | float64 | 3.13 |
spl_ic_mou_8 | float64 | 3.13 |
std_og_t2m_mou_8 | float64 | 3.13 |
ic_others_8 | float64 | 3.13 |
offnet_mou_8 | float64 | 3.13 |
og_others_8 | float64 | 3.13 |
isd_ic_mou_8 | float64 | 3.13 |
roam_ic_mou_8 | float64 | 3.13 |
spl_og_mou_8 | float64 | 3.13 |
loc_og_t2f_mou_8 | float64 | 3.13 |
std_ic_t2m_mou_8 | float64 | 3.13 |
std_ic_t2f_mou_8 | float64 | 3.13 |
std_ic_t2t_mou_8 | float64 | 3.13 |
loc_og_t2c_mou_8 | float64 | 3.13 |
loc_ic_mou_8 | float64 | 3.13 |
onnet_mou_8 | float64 | 3.13 |
loc_og_t2m_mou_8 | float64 | 3.13 |
loc_ic_t2f_mou_8 | float64 | 3.13 |
std_ic_t2o_mou_8 | category | 3.13 |
loc_og_mou_8 | float64 | 3.13 |
loc_ic_t2m_mou_8 | float64 | 3.13 |
std_ic_mou_8 | float64 | 3.13 |
loc_ic_t2t_mou_8 | float64 | 3.13 |
date_of_last_rech_9 | datetime64[ns] | 2.89 |
date_of_last_rech_8 | datetime64[ns] | 1.98 |
last_date_of_month_9 | object | 1.20 |
loc_og_mou_6 | float64 | 1.05 |
std_ic_t2m_mou_6 | float64 | 1.05 |
roam_og_mou_6 | float64 | 1.05 |
std_ic_t2t_mou_6 | float64 | 1.05 |
loc_ic_mou_6 | float64 | 1.05 |
roam_ic_mou_6 | float64 | 1.05 |
loc_ic_t2f_mou_6 | float64 | 1.05 |
loc_ic_t2m_mou_6 | float64 | 1.05 |
std_og_t2t_mou_6 | float64 | 1.05 |
onnet_mou_6 | float64 | 1.05 |
loc_ic_t2t_mou_6 | float64 | 1.05 |
offnet_mou_6 | float64 | 1.05 |
og_others_6 | float64 | 1.05 |
loc_og_t2t_mou_6 | float64 | 1.05 |
isd_og_mou_6 | float64 | 1.05 |
std_og_t2m_mou_6 | float64 | 1.05 |
loc_og_t2f_mou_6 | float64 | 1.05 |
spl_ic_mou_6 | float64 | 1.05 |
std_ic_mou_6 | float64 | 1.05 |
isd_ic_mou_6 | float64 | 1.05 |
loc_og_t2m_mou_6 | float64 | 1.05 |
std_ic_t2o_mou_6 | category | 1.05 |
spl_og_mou_6 | float64 | 1.05 |
ic_others_6 | float64 | 1.05 |
std_ic_t2f_mou_6 | float64 | 1.05 |
loc_og_t2c_mou_6 | float64 | 1.05 |
std_og_mou_6 | float64 | 1.05 |
std_og_t2f_mou_6 | float64 | 1.05 |
std_og_t2c_mou_6 | category | 1.05 |
roam_ic_mou_7 | float64 | 1.01 |
loc_og_t2c_mou_7 | float64 | 1.01 |
loc_og_t2f_mou_7 | float64 | 1.01 |
loc_og_t2m_mou_7 | float64 | 1.01 |
loc_og_t2t_mou_7 | float64 | 1.01 |
roam_og_mou_7 | float64 | 1.01 |
std_ic_t2t_mou_7 | float64 | 1.01 |
offnet_mou_7 | float64 | 1.01 |
onnet_mou_7 | float64 | 1.01 |
std_ic_t2f_mou_7 | float64 | 1.01 |
std_ic_mou_7 | float64 | 1.01 |
loc_ic_t2f_mou_7 | float64 | 1.01 |
std_ic_t2m_mou_7 | float64 | 1.01 |
loc_og_mou_7 | float64 | 1.01 |
loc_ic_t2t_mou_7 | float64 | 1.01 |
std_og_t2t_mou_7 | float64 | 1.01 |
std_og_t2c_mou_7 | category | 1.01 |
std_og_mou_7 | float64 | 1.01 |
isd_og_mou_7 | float64 | 1.01 |
spl_og_mou_7 | float64 | 1.01 |
og_others_7 | float64 | 1.01 |
spl_ic_mou_7 | float64 | 1.01 |
loc_ic_t2m_mou_7 | float64 | 1.01 |
loc_ic_mou_7 | float64 | 1.01 |
ic_others_7 | float64 | 1.01 |
std_og_t2m_mou_7 | float64 | 1.01 |
isd_ic_mou_7 | float64 | 1.01 |
std_ic_t2o_mou_7 | category | 1.01 |
std_og_t2f_mou_7 | float64 | 1.01 |
last_date_of_month_8 | object | 0.52 |
loc_og_t2o_mou | category | 0.38 |
loc_ic_t2o_mou | category | 0.38 |
date_of_last_rech_7 | datetime64[ns] | 0.38 |
std_og_t2o_mou | category | 0.38 |
date_of_last_rech_6 | datetime64[ns] | 0.21 |
last_date_of_month_7 | object | 0.10 |
vol_3g_mb_6 | float64 | 0.00 |
arpu_6 | float64 | 0.00 |
total_rech_amt_8 | int64 | 0.00 |
total_rech_amt_7 | int64 | 0.00 |
total_rech_amt_6 | int64 | 0.00 |
total_rech_num_9 | int64 | 0.00 |
last_date_of_month_6 | object | 0.00 |
vol_3g_mb_8 | float64 | 0.00 |
arpu_7 | float64 | 0.00 |
arpu_8 | float64 | 0.00 |
arpu_9 | float64 | 0.00 |
total_og_mou_6 | float64 | 0.00 |
total_og_mou_7 | float64 | 0.00 |
vol_3g_mb_7 | float64 | 0.00 |
max_rech_amt_9 | int64 | 0.00 |
vol_2g_mb_9 | float64 | 0.00 |
vol_2g_mb_8 | float64 | 0.00 |
vol_2g_mb_7 | float64 | 0.00 |
vol_2g_mb_6 | float64 | 0.00 |
last_day_rch_amt_9 | int64 | 0.00 |
last_day_rch_amt_8 | int64 | 0.00 |
last_day_rch_amt_7 | int64 | 0.00 |
last_day_rch_amt_6 | int64 | 0.00 |
max_rech_amt_8 | int64 | 0.00 |
max_rech_amt_7 | int64 | 0.00 |
max_rech_amt_6 | int64 | 0.00 |
total_rech_amt_9 | int64 | 0.00 |
total_ic_mou_6 | float64 | 0.00 |
total_og_mou_8 | float64 | 0.00 |
vbc_3g_8 | float64 | 0.00 |
total_ic_mou_7 | float64 | 0.00 |
total_ic_mou_8 | float64 | 0.00 |
sachet_3g_9 | category | 0.00 |
sachet_3g_7 | category | 0.00 |
vbc_3g_9 | float64 | 0.00 |
vbc_3g_6 | float64 | 0.00 |
vbc_3g_7 | float64 | 0.00 |
aon | int64 | 0.00 |
sachet_3g_6 | category | 0.00 |
monthly_3g_8 | category | 0.00 |
monthly_3g_9 | category | 0.00 |
sachet_3g_8 | category | 0.00 |
monthly_3g_7 | category | 0.00 |
sachet_2g_9 | category | 0.00 |
sachet_2g_8 | category | 0.00 |
sachet_2g_7 | category | 0.00 |
sachet_2g_6 | category | 0.00 |
monthly_2g_9 | category | 0.00 |
monthly_2g_8 | category | 0.00 |
monthly_2g_7 | category | 0.00 |
monthly_2g_6 | category | 0.00 |
monthly_3g_6 | category | 0.00 |
circle_id | category | 0.00 |
vol_3g_mb_9 | float64 | 0.00 |
total_og_mou_9 | float64 | 0.00 |
total_rech_num_8 | int64 | 0.00 |
total_rech_num_7 | int64 | 0.00 |
total_rech_num_6 | int64 | 0.00 |
total_ic_mou_9 | float64 | 0.00 |
Average_rech_amt_6n7 | float64 | 0.00 |
# Columns with a high share of missing values (> 50%)
metadata = metadata_matrix(data)
condition = metadata['Null_Percentage'] > 50
high_missing_values = metadata[condition]
high_missing_values
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
av_rech_amt_data_6 | float64 | 11397 | 18614 | 62.02 | 794 |
count_rech_3g_6 | category | 11397 | 18614 | 62.02 | 25 |
count_rech_2g_6 | float64 | 11397 | 18614 | 62.02 | 30 |
arpu_2g_6 | float64 | 11397 | 18614 | 62.02 | 4503 |
max_rech_data_6 | float64 | 11397 | 18614 | 62.02 | 43 |
night_pck_user_6 | category | 11397 | 18614 | 62.02 | 2 |
date_of_last_rech_data_6 | datetime64[ns] | 11397 | 18614 | 62.02 | 30 |
total_rech_data_6 | float64 | 11397 | 18614 | 62.02 | 36 |
arpu_3g_6 | float64 | 11397 | 18614 | 62.02 | 4875 |
fb_user_6 | category | 11397 | 18614 | 62.02 | 2 |
max_rech_data_9 | float64 | 11461 | 18550 | 61.81 | 48 |
count_rech_3g_9 | category | 11461 | 18550 | 61.81 | 27 |
fb_user_9 | category | 11461 | 18550 | 61.81 | 2 |
total_rech_data_9 | float64 | 11461 | 18550 | 61.81 | 35 |
date_of_last_rech_data_9 | datetime64[ns] | 11461 | 18550 | 61.81 | 30 |
av_rech_amt_data_9 | float64 | 11461 | 18550 | 61.81 | 812 |
arpu_2g_9 | float64 | 11461 | 18550 | 61.81 | 3846 |
arpu_3g_9 | float64 | 11461 | 18550 | 61.81 | 4800 |
night_pck_user_9 | category | 11461 | 18550 | 61.81 | 2 |
count_rech_2g_9 | float64 | 11461 | 18550 | 61.81 | 29 |
fb_user_7 | category | 11662 | 18349 | 61.14 | 2 |
date_of_last_rech_data_7 | datetime64[ns] | 11662 | 18349 | 61.14 | 31 |
total_rech_data_7 | float64 | 11662 | 18349 | 61.14 | 40 |
night_pck_user_7 | category | 11662 | 18349 | 61.14 | 2 |
max_rech_data_7 | float64 | 11662 | 18349 | 61.14 | 46 |
count_rech_2g_7 | float64 | 11662 | 18349 | 61.14 | 35 |
arpu_3g_7 | float64 | 11662 | 18349 | 61.14 | 4860 |
av_rech_amt_data_7 | float64 | 11662 | 18349 | 61.14 | 863 |
arpu_2g_7 | float64 | 11662 | 18349 | 61.14 | 4219 |
count_rech_3g_7 | category | 11662 | 18349 | 61.14 | 28 |
night_pck_user_8 | category | 11754 | 18257 | 60.83 | 2 |
fb_user_8 | category | 11754 | 18257 | 60.83 | 2 |
arpu_2g_8 | float64 | 11754 | 18257 | 60.83 | 3854 |
count_rech_2g_8 | float64 | 11754 | 18257 | 60.83 | 33 |
date_of_last_rech_data_8 | datetime64[ns] | 11754 | 18257 | 60.83 | 31 |
av_rech_amt_data_8 | float64 | 11754 | 18257 | 60.83 | 837 |
arpu_3g_8 | float64 | 11754 | 18257 | 60.83 | 4769 |
total_rech_data_8 | float64 | 11754 | 18257 | 60.83 | 45 |
count_rech_3g_8 | category | 11754 | 18257 | 60.83 | 29 |
max_rech_data_8 | float64 | 11754 | 18257 | 60.83 | 47 |
# Dropping the above columns with high missing values
high_missing_value_columns = high_missing_values.index
data.drop(columns=high_missing_value_columns, inplace=True)
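The same drop can be done without the metadata helper by computing the null share per column directly. This is a sketch on a made-up frame (`toy`, with invented values); the 50% threshold is the one used above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; values are invented for illustration.
toy = pd.DataFrame({
    "fb_user_6": [1.0, np.nan, np.nan, np.nan],
    "onnet_mou_6": [12.3, np.nan, 4.5, 7.8],
    "arpu_6": [100.0, 250.5, 80.0, 60.0],
})

# Percentage of missing values per column.
null_pct = toy.isnull().mean() * 100

# Columns whose missing share exceeds 50%.
to_drop = null_pct[null_pct > 50].index

reduced = toy.drop(columns=to_drop)
print(list(reduced.columns))
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison.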
# Looking at remaining columns with missing values
metadata_matrix(data)
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
std_ic_t2o_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
spl_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 2966 |
isd_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 908 |
roam_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3370 |
std_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 15900 |
roam_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 4004 |
std_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1971 |
std_og_t2c_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
loc_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 10360 |
std_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1595 |
std_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 7745 |
loc_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15585 |
std_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 12445 |
loc_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 3111 |
std_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 11141 |
loc_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 18018 |
loc_og_t2c_mou_9 | float64 | 28307 | 1704 | 5.68 | 1576 |
offnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 20452 |
loc_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 18207 |
spl_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 287 |
std_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 6168 |
loc_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 4611 |
ic_others_9 | float64 | 28307 | 1704 | 5.68 | 1284 |
loc_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15194 |
loc_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 9407 |
std_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 4280 |
isd_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3329 |
og_others_9 | float64 | 28307 | 1704 | 5.68 | 132 |
onnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 16674 |
std_og_mou_8 | float64 | 29073 | 938 | 3.13 | 16864 |
std_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 13326 |
og_others_8 | float64 | 29073 | 938 | 3.13 | 133 |
loc_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 4705 |
std_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 11781 |
loc_og_mou_8 | float64 | 29073 | 938 | 3.13 | 18885 |
std_ic_t2o_mou_8 | category | 29073 | 938 | 3.13 | 1 |
loc_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 15598 |
std_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 6420 |
std_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 4486 |
std_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1627 |
std_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1941 |
spl_og_mou_8 | float64 | 29073 | 938 | 3.13 | 3238 |
loc_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 9671 |
std_og_t2c_mou_8 | category | 29073 | 938 | 3.13 | 1 |
isd_og_mou_8 | float64 | 29073 | 938 | 3.13 | 940 |
loc_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 18573 |
roam_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3655 |
isd_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3493 |
onnet_mou_8 | float64 | 29073 | 938 | 3.13 | 17604 |
loc_og_t2c_mou_8 | float64 | 29073 | 938 | 3.13 | 1730 |
spl_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 85 |
loc_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 3124 |
std_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 8033 |
roam_og_mou_8 | float64 | 29073 | 938 | 3.13 | 4382 |
ic_others_8 | float64 | 29073 | 938 | 3.13 | 1259 |
loc_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 16165 |
loc_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 10772 |
offnet_mou_8 | float64 | 29073 | 938 | 3.13 | 21513 |
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
std_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 8391 |
offnet_mou_6 | float64 | 29695 | 316 | 1.05 | 22454 |
std_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 2033 |
isd_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 3429 |
ic_others_6 | float64 | 29695 | 316 | 1.05 | 1227 |
onnet_mou_6 | float64 | 29695 | 316 | 1.05 | 18813 |
std_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 6680 |
loc_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 9872 |
loc_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16015 |
loc_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 4817 |
loc_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 19133 |
std_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 4608 |
og_others_6 | float64 | 29695 | 316 | 1.05 | 862 |
spl_og_mou_6 | float64 | 29695 | 316 | 1.05 | 3053 |
roam_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 4338 |
spl_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 78 |
std_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 12777 |
loc_og_t2c_mou_6 | float64 | 29695 | 316 | 1.05 | 1658 |
std_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 14518 |
loc_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 3252 |
std_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 1773 |
loc_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16747 |
std_ic_t2o_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_t2c_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_mou_6 | float64 | 29695 | 316 | 1.05 | 18325 |
loc_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 11151 |
isd_og_mou_6 | float64 | 29695 | 316 | 1.05 | 1113 |
roam_og_mou_6 | float64 | 29695 | 316 | 1.05 | 5174 |
loc_og_mou_6 | float64 | 29695 | 316 | 1.05 | 19691 |
isd_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3639 |
std_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 2075 |
std_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 6747 |
std_ic_t2o_mou_7 | category | 29708 | 303 | 1.01 | 1 |
ic_others_7 | float64 | 29708 | 303 | 1.01 | 1371 |
spl_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 93 |
std_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 4706 |
std_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 8543 |
loc_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 4897 |
og_others_7 | float64 | 29708 | 303 | 1.01 | 123 |
loc_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 19030 |
std_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 1714 |
onnet_mou_7 | float64 | 29708 | 303 | 1.01 | 18938 |
roam_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3649 |
roam_og_mou_7 | float64 | 29708 | 303 | 1.01 | 4431 |
loc_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 11154 |
loc_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16872 |
loc_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 3267 |
loc_og_t2c_mou_7 | float64 | 29708 | 303 | 1.01 | 1750 |
loc_og_mou_7 | float64 | 29708 | 303 | 1.01 | 19880 |
std_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 12983 |
std_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 14589 |
offnet_mou_7 | float64 | 29708 | 303 | 1.01 | 22650 |
std_og_t2c_mou_7 | category | 29708 | 303 | 1.01 | 1 |
loc_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 9961 |
isd_og_mou_7 | float64 | 29708 | 303 | 1.01 | 1125 |
spl_og_mou_7 | float64 | 29708 | 303 | 1.01 | 3399 |
std_og_mou_7 | float64 | 29708 | 303 | 1.01 | 18445 |
loc_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16068 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
std_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
loc_ic_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
loc_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
aon | int64 | 30011 | 0 | 0.00 | 3321 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
circle_id | category | 30011 | 0 | 0.00 | 1 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
- The data contains information for four months: 6, 7, 8 and 9.
- For the purpose of missing value treatment, each month's revenue and usage data is treated as unrelated to the other months.
- Hence, missing value treatment can be performed month by month.
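Month-wise treatment needs a way to pick out each month's columns. One possible helper is sketched below; `columns_for_month` is a hypothetical name, not part of the original notebook. It anchors on the `_<month>` suffix so that a derived column such as `Average_rech_amt_6n7` is not mistaken for a month-7 column.

```python
import re


def columns_for_month(columns, month):
    """Return the column names that end with the suffix '_<month>'."""
    pattern = re.compile(rf"_{month}$")
    return [c for c in columns if pattern.search(c)]


# Example column names taken from the dataset's naming scheme.
cols = ["arpu_6", "arpu_7", "vbc_3g_6", "aon", "Average_rech_amt_6n7"]

month_6 = columns_for_month(cols, 6)
month_7 = columns_for_month(cols, 7)
print(month_6, month_7)
```

A plain `"7$"` pattern would also match `Average_rech_amt_6n7`, which is why the underscore is included here.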
# Month 6
sixth_month_columns = []
for column in data.columns:
    x = re.search("6$", column)
    if x:
        sixth_month_columns.append(column)
# missing_values.loc[sixth_month_columns].sort_values(by='Null_Percentage', ascending=False)
metadata = metadata_matrix(data)
condition = metadata.index.isin(sixth_month_columns)
sixth_month_metadata = metadata[condition]
sixth_month_metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
std_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 8391 |
offnet_mou_6 | float64 | 29695 | 316 | 1.05 | 22454 |
std_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 2033 |
isd_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 3429 |
ic_others_6 | float64 | 29695 | 316 | 1.05 | 1227 |
onnet_mou_6 | float64 | 29695 | 316 | 1.05 | 18813 |
std_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 6680 |
loc_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 9872 |
loc_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16015 |
loc_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 4817 |
loc_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 19133 |
std_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 4608 |
og_others_6 | float64 | 29695 | 316 | 1.05 | 862 |
spl_og_mou_6 | float64 | 29695 | 316 | 1.05 | 3053 |
roam_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 4338 |
spl_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 78 |
std_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 12777 |
loc_og_t2c_mou_6 | float64 | 29695 | 316 | 1.05 | 1658 |
std_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 14518 |
loc_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 3252 |
std_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 1773 |
loc_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16747 |
std_ic_t2o_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_t2c_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_mou_6 | float64 | 29695 | 316 | 1.05 | 18325 |
loc_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 11151 |
isd_og_mou_6 | float64 | 29695 | 316 | 1.05 | 1113 |
roam_og_mou_6 | float64 | 29695 | 316 | 1.05 | 5174 |
loc_og_mou_6 | float64 | 29695 | 316 | 1.05 | 19691 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
- Note that all the *_mou columns have exactly 1.05% of their rows missing.
- An identical missing share across so many columns is an indicator of meaningful missing values.
- Further note that *_mou columns record minutes of usage, which apply only to customers using calling plans. It is probable that these 1.05% of customers are not using calling plans.
- This can be confirmed by looking at 'total_og_mou_6' and 'total_ic_mou_6' for the rows where the *_mou columns are missing. If these totals are zero for a customer, then all of that customer's *_mou columns should be zero too.
    # columns with meaningful missing in 6th month
    sixth_month_meaningful_missing_condition = sixth_month_metadata['Null_Percentage'] == 1.05
    sixth_month_meaningful_missing_cols = sixth_month_metadata[sixth_month_meaningful_missing_condition].index.values
    sixth_month_meaningful_missing_cols

    array(['std_ic_mou_6', 'offnet_mou_6', 'std_ic_t2f_mou_6', 'isd_ic_mou_6',
           'ic_others_6', 'onnet_mou_6', 'std_ic_t2m_mou_6',
           'loc_ic_t2t_mou_6', 'loc_ic_t2m_mou_6', 'loc_ic_t2f_mou_6',
           'loc_ic_mou_6', 'std_ic_t2t_mou_6', 'og_others_6', 'spl_og_mou_6',
           'roam_ic_mou_6', 'spl_ic_mou_6', 'std_og_t2t_mou_6',
           'loc_og_t2c_mou_6', 'std_og_t2m_mou_6', 'loc_og_t2f_mou_6',
           'std_og_t2f_mou_6', 'loc_og_t2m_mou_6', 'std_ic_t2o_mou_6',
           'std_og_t2c_mou_6', 'std_og_mou_6', 'loc_og_t2t_mou_6',
           'isd_og_mou_6', 'roam_og_mou_6', 'loc_og_mou_6'], dtype=object)
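Selecting these columns by comparing a rounded percentage with `==` works here, but it depends on how the metadata table rounds its values. A minimal alternative sketch, on a made-up toy frame rather than the real `data`, compares integer null counts instead; the column values and the variable names `toy`, `shared_count` and `same_pattern_cols` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    "onnet_mou_6": [1.0, np.nan, 3.0, np.nan],
    "offnet_mou_6": [2.0, np.nan, 0.5, np.nan],
    "total_og_mou_6": [3.0, 0.0, 3.5, 0.0],
    "date_of_last_rech_6": pd.to_datetime(
        ["2014-06-01", None, "2014-06-20", "2014-06-30"]
    ),
})

null_counts = toy.isnull().sum()

# Integer null counts can be compared exactly, unlike a rounded percentage.
# The most frequent non-zero count identifies the group of columns that
# share the same missing pattern.
shared_count = null_counts[null_counts > 0].mode().iloc[0]
same_pattern_cols = null_counts[null_counts == shared_count].index.tolist()
print(same_pattern_cols)
```

On the toy frame this picks the two usage columns and leaves out the date column, whose null count differs.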
    # Looking at all sixth month columns where rows of *_mou are null
    condition = data[sixth_month_meaningful_missing_cols].isnull()
    # data.loc[condition, sixth_month_columns]

    # Rows that are null in all of the above columns
    missing_rows = pd.Series([True]*data.shape[0], index = data.index)
    for column in sixth_month_meaningful_missing_cols :
        missing_rows = missing_rows & data[column].isnull()

    print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_6'].unique()[0])
    print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_6'].unique()[0])

    Total outgoing mou for each customer with missing *_mou data is  0.0
    Total incoming mou for each customer with missing *_mou data is  0.0
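Printing `.unique()[0]` shows only the first distinct total, so it does not by itself prove that every affected customer has zero usage. A stricter version of the same check could look like the sketch below. It runs on a toy frame with invented mobile numbers and values, and `mou_cols` and `totals_all_zero` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `data`; mobile numbers and values are made up.
toy = pd.DataFrame(
    {
        "onnet_mou_6": [1.0, np.nan, 3.0, np.nan],
        "offnet_mou_6": [2.0, np.nan, 0.5, np.nan],
        "total_og_mou_6": [3.0, 0.0, 3.5, 0.0],
        "total_ic_mou_6": [4.0, 0.0, 1.0, 0.0],
    },
    index=pd.Index(
        [7000000001, 7000000002, 7000000003, 7000000004], name="mobile_number"
    ),
)
mou_cols = ["onnet_mou_6", "offnet_mou_6"]

# True for rows where every listed column is null (vectorised form of the loop).
missing_rows = toy[mou_cols].isnull().all(axis=1)

# Check the totals of every affected row, not only the first unique value.
totals_all_zero = (
    toy.loc[missing_rows, ["total_og_mou_6", "total_ic_mou_6"]].eq(0).all().all()
)
print(missing_rows.sum(), totals_all_zero)
```

If `totals_all_zero` came back false on the real data, filling the gaps with 0 would no longer be justified for those rows.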
- Hence, these could be imputed with 0
    # Imputation
    data[sixth_month_meaningful_missing_cols] = data[sixth_month_meaningful_missing_cols].fillna(0)

    metadata = metadata_matrix(data)

    # Remaining Missing Values
    metadata.iloc[metadata.index.isin(sixth_month_columns)]

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
- It looks like 0.21% of customers (62) have a missing date of last recharge. Let’s look at the ‘recharge’ related columns for these customers.
    # Looking at 'recharge' related 6th month columns for customers with missing 'date_of_last_rech_6'
    condition = data['date_of_last_rech_6'].isnull()
    data[condition].filter(regex='.*rech.*6$', axis=1).head()

mobile_number | total_rech_num_6 | total_rech_amt_6 | max_rech_amt_6 | date_of_last_rech_6 |
7001588448 | 0 | 0 | 0 | NaT |
7001223277 | 0 | 0 | 0 | NaT |
7000721536 | 0 | 0 | 0 | NaT |
7001490351 | 0 | 0 | 0 | NaT |
7000665415 | 0 | 0 | 0 | NaT |
    data[condition].filter(regex='.*rech.*6$', axis=1).nunique()

    total_rech_num_6       1
    total_rech_amt_6       1
    max_rech_amt_6         1
    date_of_last_rech_6    0
    dtype: int64
- Notice that each recharge-related column for customers with a missing ‘date_of_last_rech_6’ has just one unique value. From the first few rows of the output, we see that this value is 0.
- Hence, ‘date_of_last_rech_6’ is missing because no recharges were made in this month.
- These are meaningful missing values.
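The reasoning in the bullets above can also be expressed as a single boolean comparison. The sketch below uses a small invented frame, not the real `data`, and additionally checks the converse (every customer with zero recharges has a missing date), which the output above does not show. The names `no_date`, `no_recharge` and `masks_agree` are assumptions.

```python
import pandas as pd

# Toy frame standing in for `data`; values are invented for illustration.
toy = pd.DataFrame({
    "total_rech_num_6": [3, 0, 1, 0],
    "total_rech_amt_6": [300, 0, 50, 0],
    "date_of_last_rech_6": pd.to_datetime(
        ["2014-06-28", None, "2014-06-15", None]
    ),
})

no_date = toy["date_of_last_rech_6"].isnull()
no_recharge = toy["total_rech_num_6"].eq(0)

# The two masks coincide if a missing date simply means "no recharge made".
masks_agree = bool((no_date == no_recharge).all())
print(masks_agree)
```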
    # Check for missing values in 6th month variables
    metadata = metadata_matrix(data)
    metadata[metadata.index.isin(sixth_month_columns)]

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
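- No more unexplained missing values in the 6th month columns. The 62 remaining nulls in ‘date_of_last_rech_6’ are the meaningful ones discussed above.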
    # Month : 7
    seventh_month_columns = data.filter(regex='7$', axis=1).columns
    seventh_month_columns

    Index(['last_date_of_month_7', 'arpu_7', 'onnet_mou_7', 'offnet_mou_7',
           'roam_ic_mou_7', 'roam_og_mou_7', 'loc_og_t2t_mou_7',
           'loc_og_t2m_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7',
           'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7',
           'std_og_t2f_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7', 'isd_og_mou_7',
           'spl_og_mou_7', 'og_others_7', 'total_og_mou_7', 'loc_ic_t2t_mou_7',
           'loc_ic_t2m_mou_7', 'loc_ic_t2f_mou_7', 'loc_ic_mou_7',
           'std_ic_t2t_mou_7', 'std_ic_t2m_mou_7', 'std_ic_t2f_mou_7',
           'std_ic_t2o_mou_7', 'std_ic_mou_7', 'total_ic_mou_7', 'spl_ic_mou_7',
           'isd_ic_mou_7', 'ic_others_7', 'total_rech_num_7', 'total_rech_amt_7',
           'max_rech_amt_7', 'date_of_last_rech_7', 'last_day_rch_amt_7',
           'vol_2g_mb_7', 'vol_3g_mb_7', 'monthly_2g_7', 'sachet_2g_7',
           'monthly_3g_7', 'sachet_3g_7', 'vbc_3g_7', 'Average_rech_amt_6n7'],
          dtype='object')
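The pattern `7$` matches any column name ending in 7, which is why ‘Average_rech_amt_6n7’ appears in the list above even though it is not a pure month 7 column. If that is not wanted, anchoring the pattern on the underscore is one option. The sketch below illustrates the difference on a toy frame with a handful of column names; it is not a change made in the original analysis.

```python
import pandas as pd

# Toy frame with one row; only the column names matter here.
toy = pd.DataFrame(
    [[1.0, 2, 3.0, 4.0]],
    columns=["arpu_7", "total_rech_amt_7", "Average_rech_amt_6n7", "arpu_8"],
)

# Any name ending in "7" matches, including the 6-and-7 average.
loose = toy.filter(regex="7$", axis=1).columns.tolist()

# Requiring "_7" at the end keeps only the month 7 columns.
strict = toy.filter(regex="_7$", axis=1).columns.tolist()

print(loose)
print(strict)
```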
    seventh_month_metadata = metadata[metadata.index.isin(seventh_month_columns)]
    seventh_month_metadata

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
loc_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 9961 |
og_others_7 | float64 | 29708 | 303 | 1.01 | 123 |
loc_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 4897 |
loc_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16068 |
loc_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 19030 |
std_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 4706 |
std_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 2075 |
std_ic_t2o_mou_7 | category | 29708 | 303 | 1.01 | 1 |
std_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 8543 |
spl_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 93 |
isd_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3639 |
ic_others_7 | float64 | 29708 | 303 | 1.01 | 1371 |
std_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 6747 |
isd_og_mou_7 | float64 | 29708 | 303 | 1.01 | 1125 |
spl_og_mou_7 | float64 | 29708 | 303 | 1.01 | 3399 |
std_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 1714 |
onnet_mou_7 | float64 | 29708 | 303 | 1.01 | 18938 |
offnet_mou_7 | float64 | 29708 | 303 | 1.01 | 22650 |
roam_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3649 |
roam_og_mou_7 | float64 | 29708 | 303 | 1.01 | 4431 |
loc_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 11154 |
loc_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 3267 |
loc_og_t2c_mou_7 | float64 | 29708 | 303 | 1.01 | 1750 |
loc_og_mou_7 | float64 | 29708 | 303 | 1.01 | 19880 |
std_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 12983 |
std_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 14589 |
loc_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16872 |
std_og_t2c_mou_7 | category | 29708 | 303 | 1.01 | 1 |
std_og_mou_7 | float64 | 29708 | 303 | 1.01 | 18445 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
- Note that the individual *_mou columns (together with og_others_7 and ic_others_7) all have exactly 1.01% of rows (303 customers) with missing values, while total_og_mou_7 and total_ic_mou_7 have none.
- An identical missing count across so many columns is an indicator of meaningful missing values.
- Further note that *_mou columns indicate minutes of usage, which are applicable only to customers using calling plans. It is probable that these 1.01% of customers were not using calling plans.
- This can be confirmed by looking at ‘total_og_mou_7’ and ‘total_ic_mou_7’ for the rows where the _mou columns have missing values. If both totals are zero for a customer, then all the individual _mou columns should be zero too.
    # columns with meaningful missing in 7th month
    seventh_month_meaningful_missing_condition = seventh_month_metadata['Null_Percentage'] == 1.01
    seventh_month_meaningful_missing_cols = seventh_month_metadata[seventh_month_meaningful_missing_condition].index.values
    seventh_month_meaningful_missing_cols

    array(['loc_ic_t2t_mou_7', 'og_others_7', 'loc_ic_t2f_mou_7',
           'loc_ic_t2m_mou_7', 'loc_ic_mou_7', 'std_ic_t2t_mou_7',
           'std_ic_t2f_mou_7', 'std_ic_t2o_mou_7', 'std_ic_mou_7',
           'spl_ic_mou_7', 'isd_ic_mou_7', 'ic_others_7', 'std_ic_t2m_mou_7',
           'isd_og_mou_7', 'spl_og_mou_7', 'std_og_t2f_mou_7', 'onnet_mou_7',
           'offnet_mou_7', 'roam_ic_mou_7', 'roam_og_mou_7',
           'loc_og_t2t_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7',
           'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7',
           'loc_og_t2m_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7'],
          dtype=object)
    # Looking at all 7th month columns where rows of *_mou are null
    condition = data[seventh_month_meaningful_missing_cols].isnull()

    # Rows that are null in all of the above columns
    missing_rows = pd.Series([True]*data.shape[0], index = data.index)
    for column in seventh_month_meaningful_missing_cols :
        missing_rows = missing_rows & data[column].isnull()

    print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_7'].unique()[0])
    print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_7'].unique()[0])

    Total outgoing mou for each customer with missing *_mou data is  0.0
    Total incoming mou for each customer with missing *_mou data is  0.0
- Hence, these could be imputed with 0
    # Imputation
    data[seventh_month_meaningful_missing_cols] = data[seventh_month_meaningful_missing_cols].fillna(0)

    metadata = metadata_matrix(data)

    # Remaining Missing Values
    metadata.iloc[metadata.index.isin(seventh_month_columns)]

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
ic_others_7 | float64 | 30011 | 0 | 0.00 | 1371 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3639 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 93 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 4897 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 11154 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 14589 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 12983 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.00 | 19880 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.00 | 1750 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 3267 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16872 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.00 | 4431 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3649 |
offnet_mou_7 | float64 | 30011 | 0 | 0.00 | 22650 |
onnet_mou_7 | float64 | 30011 | 0 | 0.00 | 18938 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 1714 |
std_og_t2c_mou_7 | category | 30011 | 0 | 0.00 | 1 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16068 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 8543 |
std_ic_t2o_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 2075 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 6747 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 4706 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 19030 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 9961 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
og_others_7 | float64 | 30011 | 0 | 0.00 | 123 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.00 | 3399 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.00 | 1125 |
std_og_mou_7 | float64 | 30011 | 0 | 0.00 | 18445 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
- It looks like 0.38% of customers (114) have a missing date of last recharge. Let’s look at the ‘recharge’ related columns for these customers.
    # Looking at 'recharge' related 7th month columns for customers with missing 'date_of_last_rech_7'
    condition = data['date_of_last_rech_7'].isnull()
    data[condition].filter(regex='.*rech.*7$', axis=1).head()

mobile_number | total_rech_num_7 | total_rech_amt_7 | max_rech_amt_7 | date_of_last_rech_7 | Average_rech_amt_6n7 |
7000369789 | 0 | 0 | 0 | NaT | 393.0 |
7001967148 | 0 | 0 | 0 | NaT | 500.5 |
7000066601 | 0 | 0 | 0 | NaT | 490.0 |
7001189556 | 0 | 0 | 0 | NaT | 523.5 |
7002024450 | 0 | 0 | 0 | NaT | 493.0 |
    data[condition].filter(regex='.*rech.*7$', axis=1).nunique()

    total_rech_num_7         1
    total_rech_amt_7         1
    max_rech_amt_7           1
    date_of_last_rech_7      0
    Average_rech_amt_6n7    90
    dtype: int64
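- Notice that the month 7 recharge columns (‘total_rech_num_7’, ‘total_rech_amt_7’ and ‘max_rech_amt_7’) for customers with a missing ‘date_of_last_rech_7’ each have just one unique value. From the first few rows of the output, we see that this value is 0. ‘Average_rech_amt_6n7’ has 90 unique values, but it spans months 6 and 7 and is picked up only because its name matches the regex, so it can be ignored here.
- Hence, ‘date_of_last_rech_7’ is missing because no recharges were made in this month.
- These are meaningful missing values.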
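The month 6 and month 7 steps follow the same pattern: find the rows where all detailed usage columns are null, confirm that the month totals are zero for those rows, then fill with 0. Since the same pattern repeats for months 8 and 9, it could be wrapped in one helper. The sketch below is only an illustration on toy data; `impute_meaningful_missing` and `detail_cols` are hypothetical names that do not appear in the original analysis, and the toy columns are plain floats, whereas some of the real columns are categorical.

```python
import numpy as np
import pandas as pd


def impute_meaningful_missing(df, month, detail_cols):
    """Return a copy of `df` with 0 written into `detail_cols` for rows
    where all of them are null, after confirming the month totals are zero.

    Hypothetical helper, not part of the original analysis.
    """
    all_missing = df[detail_cols].isnull().all(axis=1)
    totals = df.loc[
        all_missing, [f"total_og_mou_{month}", f"total_ic_mou_{month}"]
    ]
    # Refuse to impute if any affected customer shows non-zero total usage.
    if not totals.eq(0).all().all():
        raise ValueError(
            f"month {month}: rows with missing usage have non-zero totals"
        )
    out = df.copy()
    out.loc[all_missing, detail_cols] = 0.0
    return out


# Toy frame standing in for `data`; values are invented for illustration.
toy = pd.DataFrame({
    "onnet_mou_7": [10.0, np.nan, 5.0],
    "offnet_mou_7": [2.0, np.nan, 1.5],
    "total_og_mou_7": [12.0, 0.0, 6.5],
    "total_ic_mou_7": [4.0, 0.0, 0.0],
})

filled = impute_meaningful_missing(toy, 7, ["onnet_mou_7", "offnet_mou_7"])
print(filled.isnull().sum().sum())
```

Raising an error instead of silently filling keeps the zero-usage assumption explicit for each month.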
    # Month : 8
    eighth_month_columns = data.filter(regex="8$", axis=1).columns
    metadata = metadata_matrix(data)
    condition = metadata.index.isin(eighth_month_columns)
    eighth_month_metadata = metadata[condition]
    eighth_month_metadata

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
std_og_t2c_mou_8 | category | 29073 | 938 | 3.13 | 1 |
std_og_mou_8 | float64 | 29073 | 938 | 3.13 | 16864 |
isd_og_mou_8 | float64 | 29073 | 938 | 3.13 | 940 |
loc_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 18573 |
std_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 13326 |
loc_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 15598 |
loc_og_mou_8 | float64 | 29073 | 938 | 3.13 | 18885 |
std_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 11781 |
std_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1627 |
loc_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 4705 |
loc_og_t2c_mou_8 | float64 | 29073 | 938 | 3.13 | 1730 |
ic_others_8 | float64 | 29073 | 938 | 3.13 | 1259 |
loc_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 16165 |
spl_og_mou_8 | float64 | 29073 | 938 | 3.13 | 3238 |
roam_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3655 |
std_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 8033 |
spl_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 85 |
std_ic_t2o_mou_8 | category | 29073 | 938 | 3.13 | 1 |
onnet_mou_8 | float64 | 29073 | 938 | 3.13 | 17604 |
loc_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 3124 |
offnet_mou_8 | float64 | 29073 | 938 | 3.13 | 21513 |
std_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1941 |
og_others_8 | float64 | 29073 | 938 | 3.13 | 133 |
loc_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 9671 |
std_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 6420 |
std_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 4486 |
roam_og_mou_8 | float64 | 29073 | 938 | 3.13 | 4382 |
isd_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3493 |
loc_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 10772 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
    # columns with meaningful missing in 8th month
    eighth_month_meaningful_missing_condition = eighth_month_metadata['Null_Percentage'] == 3.13
    eighth_month_meaningful_missing_cols = eighth_month_metadata[eighth_month_meaningful_missing_condition].index.values
    eighth_month_meaningful_missing_cols

    array(['std_og_t2c_mou_8', 'std_og_mou_8', 'isd_og_mou_8', 'loc_ic_mou_8',
           'std_og_t2m_mou_8', 'loc_ic_t2m_mou_8', 'loc_og_mou_8',
           'std_og_t2t_mou_8', 'std_og_t2f_mou_8', 'loc_ic_t2f_mou_8',
           'loc_og_t2c_mou_8', 'ic_others_8', 'loc_og_t2m_mou_8',
           'spl_og_mou_8', 'roam_ic_mou_8', 'std_ic_mou_8', 'spl_ic_mou_8',
           'std_ic_t2o_mou_8', 'onnet_mou_8', 'loc_og_t2f_mou_8',
           'offnet_mou_8', 'std_ic_t2f_mou_8', 'og_others_8',
           'loc_ic_t2t_mou_8', 'std_ic_t2m_mou_8', 'std_ic_t2t_mou_8',
           'roam_og_mou_8', 'isd_ic_mou_8', 'loc_og_t2t_mou_8'], dtype=object)
    # Looking at all 8th month columns where rows of *_mou are null
    condition = data[eighth_month_meaningful_missing_cols].isnull()

    # Rows that are null in all of the above columns
    missing_rows = pd.Series([True]*data.shape[0], index = data.index)
    for column in eighth_month_meaningful_missing_cols :
        missing_rows = missing_rows & data[column].isnull()

    print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_8'].unique()[0])
    print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_8'].unique()[0])

    Total outgoing mou for each customer with missing *_mou data is  0.0
    Total incoming mou for each customer with missing *_mou data is  0.0
    # Imputation
    data[eighth_month_meaningful_missing_cols] = data[eighth_month_meaningful_missing_cols].fillna(0)

    metadata = metadata_matrix(data)

    # Remaining Missing Values
    metadata.iloc[metadata.index.isin(eighth_month_columns)]

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 85 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1941 |
ic_others_8 | float64 | 30011 | 0 | 0.00 | 1259 |
std_ic_t2o_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 8033 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3493 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 6420 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 16165 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 3124 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.00 | 1730 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.00 | 18885 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 11781 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 10772 |
onnet_mou_8 | float64 | 30011 | 0 | 0.00 | 17604 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.00 | 4382 |
offnet_mou_8 | float64 | 30011 | 0 | 0.00 | 21513 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3655 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 13326 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 9671 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 15598 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 4705 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 18573 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 4486 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
og_others_8 | float64 | 30011 | 0 | 0.00 | 133 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1627 |
std_og_t2c_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_8 | float64 | 30011 | 0 | 0.00 | 16864 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.00 | 940 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.00 | 3238 |
    # Looking at 'recharge' related 8th month columns for customers with missing 'date_of_last_rech_8'
    condition = data['date_of_last_rech_8'].isnull()
    data[condition].filter(regex='.*rech.*8$', axis=1).head()

mobile_number | total_rech_num_8 | total_rech_amt_8 | max_rech_amt_8 | date_of_last_rech_8 |
7000340381 | 0 | 0 | 0 | NaT |
7000608224 | 0 | 0 | 0 | NaT |
7000369789 | 0 | 0 | 0 | NaT |
7000248548 | 0 | 0 | 0 | NaT |
7001967063 | 0 | 0 | 0 | NaT |
    data[condition].filter(regex='.*rech.*8$', axis=1).nunique()

    total_rech_num_8       1
    total_rech_amt_8       1
    max_rech_amt_8         1
    date_of_last_rech_8    0
    dtype: int64
    # Month : 9
    ninth_month_columns = data.filter(regex="9$", axis=1).columns
    metadata = metadata_matrix(data)
    condition = metadata.index.isin(ninth_month_columns)
    ninth_month_metadata = metadata[condition]
    ninth_month_metadata

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
std_og_t2c_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
spl_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 287 |
loc_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15585 |
og_others_9 | float64 | 28307 | 1704 | 5.68 | 132 |
loc_og_t2c_mou_9 | float64 | 28307 | 1704 | 5.68 | 1576 |
isd_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3329 |
loc_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 10360 |
spl_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 2966 |
loc_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 9407 |
loc_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 18207 |
roam_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 4004 |
std_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 7745 |
loc_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15194 |
roam_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3370 |
std_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 11141 |
offnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 20452 |
loc_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 4611 |
std_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1971 |
isd_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 908 |
std_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 15900 |
std_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1595 |
ic_others_9 | float64 | 28307 | 1704 | 5.68 | 1284 |
std_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 4280 |
std_ic_t2o_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
loc_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 3111 |
std_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 12445 |
loc_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 18018 |
std_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 6168 |
onnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 16674 |
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
    # columns with meaningful missing in 9th month
    ninth_month_meaningful_missing_condition = ninth_month_metadata['Null_Percentage'] == 5.68
    ninth_month_meaningful_missing_cols = ninth_month_metadata[ninth_month_meaningful_missing_condition].index.values
    ninth_month_meaningful_missing_cols

    array(['std_og_t2c_mou_9', 'spl_ic_mou_9', 'loc_og_t2m_mou_9',
           'og_others_9', 'loc_og_t2c_mou_9', 'isd_ic_mou_9',
           'loc_og_t2t_mou_9', 'spl_og_mou_9', 'loc_ic_t2t_mou_9',
           'loc_og_mou_9', 'roam_og_mou_9', 'std_ic_mou_9',
           'loc_ic_t2m_mou_9', 'roam_ic_mou_9', 'std_og_t2t_mou_9',
           'offnet_mou_9', 'loc_ic_t2f_mou_9', 'std_ic_t2f_mou_9',
           'isd_og_mou_9', 'std_og_mou_9', 'std_og_t2f_mou_9', 'ic_others_9',
           'std_ic_t2t_mou_9', 'std_ic_t2o_mou_9', 'loc_og_t2f_mou_9',
           'std_og_t2m_mou_9', 'loc_ic_mou_9', 'std_ic_t2m_mou_9',
           'onnet_mou_9'], dtype=object)
    # Looking at all 9th month columns where rows of *_mou are null
    condition = data[ninth_month_meaningful_missing_cols].isnull()

    # Rows that are null in all of the above columns
    missing_rows = pd.Series([True]*data.shape[0], index = data.index)
    for column in ninth_month_meaningful_missing_cols :
        missing_rows = missing_rows & data[column].isnull()

    print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_9'].unique()[0])
    print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_9'].unique()[0])

    Total outgoing mou for each customer with missing *_mou data is  0.0
    Total incoming mou for each customer with missing *_mou data is  0.0
    # Imputation
    data[ninth_month_meaningful_missing_cols] = data[ninth_month_meaningful_missing_cols].fillna(0)

    metadata = metadata_matrix(data)

    # Remaining Missing Values
    metadata.iloc[metadata.index.isin(ninth_month_columns)]

Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 287 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 7745 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3329 |
ic_others_9 | float64 | 30011 | 0 | 0.00 | 1284 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 18018 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 4280 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 6168 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1971 |
std_ic_t2o_mou_9 | category | 30011 | 0 | 0.00 | 1 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 4611 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 10360 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15585 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 3111 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.00 | 1576 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.00 | 18207 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.00 | 4004 |
onnet_mou_9 | float64 | 30011 | 0 | 0.00 | 16674 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
offnet_mou_9 | float64 | 30011 | 0 | 0.00 | 20452 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3370 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 11141 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.00 | 2966 |
og_others_9 | float64 | 30011 | 0 | 0.00 | 132 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 9407 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15194 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.00 | 908 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 12445 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1595 |
std_og_t2c_mou_9 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_9 | float64 | 30011 | 0 | 0.00 | 15900 |
# Looking at 'recharge' related 9th month columns for customers with missing 'date_of_last_rech_9'
condition = data['date_of_last_rech_9'].isnull()
data[condition].filter(regex='.*rech.*9$', axis=1).head()
mobile_number | total_rech_num_9 | total_rech_amt_9 | max_rech_amt_9 | date_of_last_rech_9 |
7000340381 | 0 | 0 | 0 | NaT |
7000854899 | 0 | 0 | 0 | NaT |
7000369789 | 0 | 0 | 0 | NaT |
7001967063 | 0 | 0 | 0 | NaT |
7000066601 | 0 | 0 | 0 | NaT |
data[condition].filter(regex='.*rech.*9$', axis=1).nunique()
total_rech_num_9       1
total_rech_amt_9       1
max_rech_amt_9         1
date_of_last_rech_9    0
dtype: int64
# Imputing "last_date_of_month_*"
print('Missing Value Percentage in last_date_of_month columns : \n', 100*data.filter(regex='last_date_of_month_.*', axis=1).isnull().sum() / data.shape[0], '\n')
print('The unique values in last_date_of_month_6 : ' , data['last_date_of_month_6'].unique())
print('The unique values in last_date_of_month_7 : ' , data['last_date_of_month_7'].unique())
print('The unique values in last_date_of_month_8 : ' , data['last_date_of_month_8'].unique())
print('The unique values in last_date_of_month_9 : ' , data['last_date_of_month_9'].unique())
Missing Value Percentage in last_date_of_month columns :
 last_date_of_month_6    0.000000
last_date_of_month_7    0.103295
last_date_of_month_8    0.523142
last_date_of_month_9    1.199560
dtype: float64

The unique values in last_date_of_month_6 : ['6/30/2014']
The unique values in last_date_of_month_7 : ['7/31/2014' nan]
The unique values in last_date_of_month_8 : ['8/31/2014' nan]
The unique values in last_date_of_month_9 : ['9/30/2014' nan]
- The last date of the month is simply the final calendar date of that month, so it is independent of churn behaviour.
- Let's impute these missing values using the mode, which is the only value each column takes.
# Imputing last_date_of_month_* values
data['last_date_of_month_7'] = data['last_date_of_month_7'].fillna(data['last_date_of_month_7'].mode()[0])
data['last_date_of_month_8'] = data['last_date_of_month_8'].fillna(data['last_date_of_month_8'].mode()[0])
data['last_date_of_month_9'] = data['last_date_of_month_9'].fillna(data['last_date_of_month_9'].mode()[0])
data['last_date_of_month_7'].unique()
array(['7/31/2014'], dtype=object)
metadata = metadata_matrix(data)
metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
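The same mode imputation can be written as a loop over all month-end columns rather than one line per month. The following is a minimal sketch on a toy frame; the `toy` data and the `month_end_cols` name are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({
    "last_date_of_month_7": ["7/31/2014", np.nan, "7/31/2014"],
    "last_date_of_month_8": ["8/31/2014", "8/31/2014", np.nan],
})

# Select every month-end column by name pattern.
month_end_cols = toy.filter(regex=r"^last_date_of_month_").columns

# Fill each column's gaps with that column's most frequent value.
for col in month_end_cols:
    toy[col] = toy[col].fillna(toy[col].mode()[0])

remaining_nulls = int(toy[month_end_cols].isnull().sum().sum())
print(remaining_nulls)
```

Because each of these columns holds a single distinct date, the mode is unambiguous here; for columns with several competing values, mode imputation would need more thought.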
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
loc_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
std_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
loc_ic_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 93 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 85 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 287 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_9 | float64 | 30011 | 0 | 0.00 | 1284 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 8033 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3639 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3493 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3329 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
ic_others_7 | float64 | 30011 | 0 | 0.00 | 1371 |
ic_others_8 | float64 | 30011 | 0 | 0.00 | 1259 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 7745 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 8543 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 6747 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 19030 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 18573 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 18018 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 4706 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 4486 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 4280 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 6420 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 6168 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 2075 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1941 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1971 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_9 | category | 30011 | 0 | 0.00 | 1 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
circle_id | category | 30011 | 0 | 0.00 | 1 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
aon | int64 | 30011 | 0 | 0.00 | 3321 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 4611 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 4897 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 4705 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.00 | 4431 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.00 | 4004 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 11154 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 10772 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 10360 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16872 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 16165 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15585 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 3267 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 3124 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 3111 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.00 | 1750 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.00 | 1730 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.00 | 1576 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.00 | 19880 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.00 | 4382 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.00 | 18207 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3370 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_7 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_8 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_9 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
onnet_mou_7 | float64 | 30011 | 0 | 0.00 | 18938 |
onnet_mou_8 | float64 | 30011 | 0 | 0.00 | 17604 |
onnet_mou_9 | float64 | 30011 | 0 | 0.00 | 16674 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
offnet_mou_7 | float64 | 30011 | 0 | 0.00 | 22650 |
offnet_mou_8 | float64 | 30011 | 0 | 0.00 | 21513 |
offnet_mou_9 | float64 | 30011 | 0 | 0.00 | 20452 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3649 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3655 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.00 | 18885 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.00 | 908 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.00 | 3399 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.00 | 3238 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.00 | 2966 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
og_others_7 | float64 | 30011 | 0 | 0.00 | 123 |
og_others_8 | float64 | 30011 | 0 | 0.00 | 133 |
og_others_9 | float64 | 30011 | 0 | 0.00 | 132 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 9961 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 9671 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 9407 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16068 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 15598 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15194 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.00 | 940 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 12983 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.00 | 1125 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 11781 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 11141 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 14589 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 13326 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 12445 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 1714 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1627 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1595 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_9 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_mou_7 | float64 | 30011 | 0 | 0.00 | 18445 |
std_og_mou_8 | float64 | 30011 | 0 | 0.00 | 16864 |
std_og_mou_9 | float64 | 30011 | 0 | 0.00 | 15900 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
print(data[data['date_of_last_rech_6'].isnull()][['date_of_last_rech_6','total_rech_amt_6','total_rech_num_6']].nunique())
print(data[data['date_of_last_rech_7'].isnull()][['date_of_last_rech_7','total_rech_amt_7','total_rech_num_7']].nunique())
print(data[data['date_of_last_rech_8'].isnull()][['date_of_last_rech_8','total_rech_amt_8','total_rech_num_8']].nunique())
print(data[data['date_of_last_rech_9'].isnull()][['date_of_last_rech_9','total_rech_amt_9','total_rech_num_9']].nunique())
date_of_last_rech_6    0
total_rech_amt_6       1
total_rech_num_6       1
dtype: int64
date_of_last_rech_7    0
total_rech_amt_7       1
total_rech_num_7       1
dtype: int64
date_of_last_rech_8    0
total_rech_amt_8       1
total_rech_num_8       1
dtype: int64
date_of_last_rech_9    0
total_rech_amt_9       1
total_rech_num_9       1
dtype: int64
print("\n",data[data['date_of_last_rech_6'].isnull()][['total_rech_amt_6','total_rech_num_6']].head())
print("\n",data[data['date_of_last_rech_7'].isnull()][['total_rech_amt_7','total_rech_num_7']].head())
print("\n",data[data['date_of_last_rech_8'].isnull()][['total_rech_amt_8','total_rech_num_8']].head())
print("\n",data[data['date_of_last_rech_9'].isnull()][['total_rech_amt_9','total_rech_num_9']].head())
               total_rech_amt_6  total_rech_num_6
mobile_number
7001588448                    0                 0
7001223277                    0                 0
7000721536                    0                 0
7001490351                    0                 0
7000665415                    0                 0

               total_rech_amt_7  total_rech_num_7
mobile_number
7000369789                    0                 0
7001967148                    0                 0
7000066601                    0                 0
7001189556                    0                 0
7002024450                    0                 0

               total_rech_amt_8  total_rech_num_8
mobile_number
7000340381                    0                 0
7000608224                    0                 0
7000369789                    0                 0
7000248548                    0                 0
7001967063                    0                 0

               total_rech_amt_9  total_rech_num_9
mobile_number
7000340381                    0                 0
7000854899                    0                 0
7000369789                    0                 0
7001967063                    0                 0
7000066601                    0                 0
- The ‘date_of_last_rech’ columns for June, July, August and September are missing only because the user made no recharges during those months: for these rows the recharge amount and the recharge count have a single unique value, 0.
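The observation above can also be turned into an explicit check rather than an inspection of `nunique()` and `head()` output. Below is a minimal sketch on made-up data; the helper name `missing_date_means_no_recharge` and the toy values are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy data for two months; values are illustrative only.
toy = pd.DataFrame(
    {
        "date_of_last_rech_6": pd.to_datetime(["2014-06-28", None, "2014-06-15"]),
        "total_rech_amt_6": [500, 0, 120],
        "total_rech_num_6": [4, 0, 1],
        "date_of_last_rech_7": pd.to_datetime([None, "2014-07-30", "2014-07-02"]),
        "total_rech_amt_7": [0, 250, 90],
        "total_rech_num_7": [0, 3, 1],
    }
)


def missing_date_means_no_recharge(df, month):
    """Return True if every row lacking a last-recharge date has zero amount and count."""
    no_date = df[f"date_of_last_rech_{month}"].isnull()
    cols = [f"total_rech_amt_{month}", f"total_rech_num_{month}"]
    return bool((df.loc[no_date, cols] == 0).all().all())


# One boolean per month: True means the missing dates are explained by "no recharge".
checks = {month: missing_date_means_no_recharge(toy, month) for month in (6, 7)}
print(checks)
```

A `False` for any month would indicate that some dates are genuinely missing rather than absent because no recharge happened.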
Dropping columns with one unique value.
metadata = metadata_matrix(data)
singular_value_cols = metadata[metadata['Unique_Values_Count'] == 1].index.values
#data.loc[metadata_matrix(data)['Unique_Values_Count']==1].index
# Dropping singular value columns.
data.drop(columns=singular_value_cols, inplace=True)
# Dropping date columns
# since they are not usage related columns and can't be used for modelling
date_columns = data.filter(regex='^date.*').columns
data.drop(columns=date_columns, inplace=True)
metadata_matrix(data)
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
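For readers without the `metadata_matrix` helper at hand, single-valued columns can also be found directly with `DataFrame.nunique()`. This is a sketch on a toy frame with made-up values, not the notebook's own code.

```python
import numpy as np
import pandas as pd

# Toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "circle_id": [109, 109, 109],
    "loc_og_t2o_mou": [0.0, np.nan, 0.0],
    "arpu_6": [210.5, 98.2, 430.0],
})

# nunique() ignores NaN by default, so a column holding one value
# plus some gaps still counts as having a single unique value.
unique_counts = toy.nunique()
constant_cols = unique_counts[unique_counts == 1].index

trimmed = toy.drop(columns=constant_cols)
print(list(trimmed.columns))
```

Such columns carry no variance, so they cannot help a model separate churners from non-churners.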
arpu_6 | float64 | 30011 | 0 | 0.0 | 29261 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 20602 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 19437 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 78 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 93 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 85 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 287 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 3429 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 3639 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 3493 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 3329 |
ic_others_6 | float64 | 30011 | 0 | 0.0 | 1227 |
ic_others_7 | float64 | 30011 | 0 | 0.0 | 1371 |
ic_others_8 | float64 | 30011 | 0 | 0.0 | 1259 |
ic_others_9 | float64 | 30011 | 0 | 0.0 | 1284 |
total_rech_num_6 | int64 | 30011 | 0 | 0.0 | 102 |
total_rech_num_7 | int64 | 30011 | 0 | 0.0 | 101 |
total_rech_num_8 | int64 | 30011 | 0 | 0.0 | 96 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 20711 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 7745 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.0 | 2241 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 8033 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 19030 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 18573 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 18018 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 4608 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 4706 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 4486 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 4280 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 6680 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 6747 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 6420 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 6168 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 2033 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 2075 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 1941 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 1971 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 8391 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 8543 |
total_rech_num_9 | int64 | 30011 | 0 | 0.0 | 96 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.0 | 2265 |
arpu_7 | float64 | 30011 | 0 | 0.0 | 29260 |
monthly_2g_8 | category | 30011 | 0 | 0.0 | 6 |
sachet_2g_6 | category | 30011 | 0 | 0.0 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.0 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.0 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.0 | 29 |
monthly_3g_6 | category | 30011 | 0 | 0.0 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.0 | 15 |
monthly_3g_8 | category | 30011 | 0 | 0.0 | 12 |
monthly_3g_9 | category | 30011 | 0 | 0.0 | 11 |
sachet_3g_6 | category | 30011 | 0 | 0.0 | 25 |
sachet_3g_7 | category | 30011 | 0 | 0.0 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.0 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.0 | 27 |
aon | int64 | 30011 | 0 | 0.0 | 3321 |
vbc_3g_8 | float64 | 30011 | 0 | 0.0 | 7291 |
vbc_3g_7 | float64 | 30011 | 0 | 0.0 | 7318 |
vbc_3g_6 | float64 | 30011 | 0 | 0.0 | 6864 |
vbc_3g_9 | float64 | 30011 | 0 | 0.0 | 2171 |
monthly_2g_9 | category | 30011 | 0 | 0.0 | 5 |
monthly_2g_7 | category | 30011 | 0 | 0.0 | 6 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.0 | 2299 |
monthly_2g_6 | category | 30011 | 0 | 0.0 | 5 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.0 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.0 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.0 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.0 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.0 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.0 | 158 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.0 | 149 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.0 | 179 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.0 | 170 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.0 | 7809 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.0 | 7813 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.0 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.0 | 6984 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.0 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.0 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.0 | 7151 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.0 | 7016 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 19133 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 4611 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 4705 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 11154 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 10360 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 16747 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 16872 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 16165 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 15585 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 3252 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 3267 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 3124 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 3111 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.0 | 1658 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.0 | 1750 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.0 | 1730 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.0 | 1576 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.0 | 19691 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.0 | 19880 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.0 | 18885 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.0 | 18207 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 10772 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 11151 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 4897 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.0 | 4004 |
arpu_8 | float64 | 30011 | 0 | 0.0 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.0 | 27327 |
onnet_mou_6 | float64 | 30011 | 0 | 0.0 | 18813 |
onnet_mou_7 | float64 | 30011 | 0 | 0.0 | 18938 |
onnet_mou_8 | float64 | 30011 | 0 | 0.0 | 17604 |
onnet_mou_9 | float64 | 30011 | 0 | 0.0 | 16674 |
offnet_mou_6 | float64 | 30011 | 0 | 0.0 | 22454 |
offnet_mou_7 | float64 | 30011 | 0 | 0.0 | 22650 |
offnet_mou_8 | float64 | 30011 | 0 | 0.0 | 21513 |
offnet_mou_9 | float64 | 30011 | 0 | 0.0 | 20452 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 4338 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 3649 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 3655 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 3370 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.0 | 5174 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.0 | 4431 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.0 | 4382 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 12777 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 12983 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 11781 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 11141 |
og_others_6 | float64 | 30011 | 0 | 0.0 | 862 |
og_others_7 | float64 | 30011 | 0 | 0.0 | 123 |
og_others_8 | float64 | 30011 | 0 | 0.0 | 133 |
og_others_9 | float64 | 30011 | 0 | 0.0 | 132 |
total_og_mou_6 | float64 | 30011 | 0 | 0.0 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.0 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.0 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.0 | 22615 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 9872 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 9961 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 9671 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 9407 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 16015 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 16068 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 15598 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 15194 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 4817 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.0 | 2966 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.0 | 3238 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.0 | 3399 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 1595 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 14518 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 14589 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 13326 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 12445 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 1773 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 1714 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 1627 |
std_og_mou_6 | float64 | 30011 | 0 | 0.0 | 18325 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.0 | 3053 |
std_og_mou_7 | float64 | 30011 | 0 | 0.0 | 18445 |
std_og_mou_8 | float64 | 30011 | 0 | 0.0 | 16864 |
std_og_mou_9 | float64 | 30011 | 0 | 0.0 | 15900 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.0 | 1113 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.0 | 1125 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.0 | 940 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.0 | 908 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.0 | 3025 |
Tagging Churn (TARGET variable)
data['Churn'] = 0
churned_customers = data.query('total_og_mou_9 == 0 & total_ic_mou_9 == 0 & vol_2g_mb_9 == 0 & vol_3g_mb_9 == 0').index
data.loc[churned_customers, 'Churn'] = 1
data['Churn'] = data['Churn'].astype('category')
# Churn proportions
data['Churn'].value_counts(normalize=True).to_frame()
Churn | Proportion |
0 | 0.913598 |
1 | 0.086402 |
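The tagging rule marks a customer as churned only when all four ninth-month usage columns are zero. The same rule can be expressed as a boolean mask, sketched here on a toy frame; the mobile numbers and usage values are invented for illustration.

```python
import pandas as pd

# Toy ninth-month usage data; index values and figures are illustrative only.
toy = pd.DataFrame(
    {
        "total_og_mou_9": [0.0, 12.5, 0.0],
        "total_ic_mou_9": [0.0, 3.1, 0.0],
        "vol_2g_mb_9": [0.0, 0.0, 40.2],
        "vol_3g_mb_9": [0.0, 0.0, 0.0],
    },
    index=pd.Index([7000000001, 7000000002, 7000000003], name="mobile_number"),
)

usage_cols = ["total_og_mou_9", "total_ic_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]

# Churn = 1 only when every usage column is exactly zero for that customer.
toy["Churn"] = (toy[usage_cols] == 0).all(axis=1).astype(int)

churn_flags = toy["Churn"].tolist()
print(churn_flags)
```

The third customer made no calls but still used 2G data, so they are not tagged as churned. With roughly 8.6% churners in the real data, the classes are imbalanced, which is worth keeping in mind when choosing evaluation metrics.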
Dropping Churn Phase Columns
churn_phase_columns = data.filter(regex='9$').columns
data.drop(columns=churn_phase_columns, inplace=True)
print('Retained Columns')
data.columns.to_frame(index=False)
Retained Columns
0 | |
0 | arpu_6 |
1 | arpu_7 |
2 | arpu_8 |
3 | onnet_mou_6 |
4 | onnet_mou_7 |
5 | onnet_mou_8 |
6 | offnet_mou_6 |
7 | offnet_mou_7 |
8 | offnet_mou_8 |
9 | roam_ic_mou_6 |
10 | roam_ic_mou_7 |
11 | roam_ic_mou_8 |
12 | roam_og_mou_6 |
13 | roam_og_mou_7 |
14 | roam_og_mou_8 |
15 | loc_og_t2t_mou_6 |
16 | loc_og_t2t_mou_7 |
17 | loc_og_t2t_mou_8 |
18 | loc_og_t2m_mou_6 |
19 | loc_og_t2m_mou_7 |
20 | loc_og_t2m_mou_8 |
21 | loc_og_t2f_mou_6 |
22 | loc_og_t2f_mou_7 |
23 | loc_og_t2f_mou_8 |
24 | loc_og_t2c_mou_6 |
25 | loc_og_t2c_mou_7 |
26 | loc_og_t2c_mou_8 |
27 | loc_og_mou_6 |
28 | loc_og_mou_7 |
29 | loc_og_mou_8 |
30 | std_og_t2t_mou_6 |
31 | std_og_t2t_mou_7 |
32 | std_og_t2t_mou_8 |
33 | std_og_t2m_mou_6 |
34 | std_og_t2m_mou_7 |
35 | std_og_t2m_mou_8 |
36 | std_og_t2f_mou_6 |
37 | std_og_t2f_mou_7 |
38 | std_og_t2f_mou_8 |
39 | std_og_mou_6 |
40 | std_og_mou_7 |
41 | std_og_mou_8 |
42 | isd_og_mou_6 |
43 | isd_og_mou_7 |
44 | isd_og_mou_8 |
45 | spl_og_mou_6 |
46 | spl_og_mou_7 |
47 | spl_og_mou_8 |
48 | og_others_6 |
49 | og_others_7 |
50 | og_others_8 |
51 | total_og_mou_6 |
52 | total_og_mou_7 |
53 | total_og_mou_8 |
54 | loc_ic_t2t_mou_6 |
55 | loc_ic_t2t_mou_7 |
56 | loc_ic_t2t_mou_8 |
57 | loc_ic_t2m_mou_6 |
58 | loc_ic_t2m_mou_7 |
59 | loc_ic_t2m_mou_8 |
60 | loc_ic_t2f_mou_6 |
61 | loc_ic_t2f_mou_7 |
62 | loc_ic_t2f_mou_8 |
63 | loc_ic_mou_6 |
64 | loc_ic_mou_7 |
65 | loc_ic_mou_8 |
66 | std_ic_t2t_mou_6 |
67 | std_ic_t2t_mou_7 |
68 | std_ic_t2t_mou_8 |
69 | std_ic_t2m_mou_6 |
70 | std_ic_t2m_mou_7 |
71 | std_ic_t2m_mou_8 |
72 | std_ic_t2f_mou_6 |
73 | std_ic_t2f_mou_7 |
74 | std_ic_t2f_mou_8 |
75 | std_ic_mou_6 |
76 | std_ic_mou_7 |
77 | std_ic_mou_8 |
78 | total_ic_mou_6 |
79 | total_ic_mou_7 |
80 | total_ic_mou_8 |
81 | spl_ic_mou_6 |
82 | spl_ic_mou_7 |
83 | spl_ic_mou_8 |
84 | isd_ic_mou_6 |
85 | isd_ic_mou_7 |
86 | isd_ic_mou_8 |
87 | ic_others_6 |
88 | ic_others_7 |
89 | ic_others_8 |
90 | total_rech_num_6 |
91 | total_rech_num_7 |
92 | total_rech_num_8 |
93 | total_rech_amt_6 |
94 | total_rech_amt_7 |
95 | total_rech_amt_8 |
96 | max_rech_amt_6 |
97 | max_rech_amt_7 |
98 | max_rech_amt_8 |
99 | last_day_rch_amt_6 |
100 | last_day_rch_amt_7 |
101 | last_day_rch_amt_8 |
102 | vol_2g_mb_6 |
103 | vol_2g_mb_7 |
104 | vol_2g_mb_8 |
105 | vol_3g_mb_6 |
106 | vol_3g_mb_7 |
107 | vol_3g_mb_8 |
108 | monthly_2g_6 |
109 | monthly_2g_7 |
110 | monthly_2g_8 |
111 | sachet_2g_6 |
112 | sachet_2g_7 |
113 | sachet_2g_8 |
114 | monthly_3g_6 |
115 | monthly_3g_7 |
116 | monthly_3g_8 |
117 | sachet_3g_6 |
118 | sachet_3g_7 |
119 | sachet_3g_8 |
120 | aon |
121 | vbc_3g_8 |
122 | vbc_3g_7 |
123 | vbc_3g_6 |
124 | Average_rech_amt_6n7 |
125 | Churn |
print('retained no of rows', data.shape[0])
print('retain no of columns', data.shape[1])
retained no of rows 30011
retain no of columns 126
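Ninth-month columns are removed after tagging because they describe the very month being predicted; keeping them would leak the target into the features. A minimal sketch of the drop plus a sanity check follows. It uses the slightly stricter pattern `_9$` instead of `9$` (an assumption made here so that only month-suffixed columns match), and the toy values are invented.

```python
import pandas as pd

# Toy frame after churn tagging; values are illustrative only.
toy = pd.DataFrame({
    "arpu_8": [310.0],
    "arpu_9": [0.0],
    "total_og_mou_9": [0.0],
    "aon": [802],
    "Churn": [1],
})

# Select and drop every column whose name ends with the month suffix "_9".
churn_phase_columns = toy.filter(regex=r"_9$").columns
features = toy.drop(columns=churn_phase_columns)

# Sanity check: no churn-phase column should survive the drop.
leaked = [col for col in features.columns if col.endswith("_9")]
print(list(features.columns), leaked)
```

An empty `leaked` list confirms that only data from the first three months, plus the target, remains for modelling.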
Exploratory Data Analysis
Summary Statistics
data.describe()
arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | aon | vbc_3g_8 | vbc_3g_7 | vbc_3g_6 | Average_rech_amt_6n7 | |
count | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 |
mean | 587.284404 | 589.135427 | 534.857433 | 296.034461 | 304.343206 | 267.600412 | 417.933372 | 423.924375 | 375.021691 | 17.412764 | 13.522114 | 13.25627 | 29.321648 | 22.036003 | 21.469272 | 94.680696 | 95.729729 | 87.139995 | 181.279583 | 181.271524 | 167.591199 | 6.97933 | 7.097268 | 6.494314 | 1.567160 | 1.862229 | 1.712739 | 282.948414 | 284.107492 | 261.233938 | 189.753131 | 199.877508 | 172.196408 | 203.097767 | 213.411914 | 179.568790 | 2.010766 | 2.034241 | 1.789728 | 394.865994 | 415.327988 | 353.558826 | 2.264425 | 2.207400 | 2.029314 | 5.916364 | 7.425487 | 6.885193 | 0.692507 | 0.047600 | 0.059131 | 686.697541 | 709.124730 | 623.774684 | 68.749054 | 70.311351 | 65.936968 | 159.613810 | 160.813032 | 153.628517 | 15.595629 | 16.510023 | 14.706512 | 243.968340 | 247.644401 | 234.281577 | 16.229350 | 16.893723 | 15.051559 | 32.015163 | 33.477150 | 30.434765 | 2.874506 | 2.992948 | 2.680925 | 51.122992 | 53.36786 | 48.170990 | 307.512073 | 314.875472 | 295.426531 | 0.066731 | 0.018066 | 0.027660 | 11.156530 | 12.360190 | 11.700835 | 1.188803 | 1.476889 | 1.237756 | 12.121322 | 11.913465 | 10.225317 | 697.365833 | 695.962880 | 613.638799 | 171.414048 | 175.661058 | 162.869348 | 104.485655 | 105.287128 | 95.653294 | 78.859009 | 78.171382 | 69.209105 | 258.392681 | 278.093737 | 269.864111 | 1264.064776 | 129.439626 | 135.127102 | 121.360548 | 696.664356 |
std | 442.722413 | 462.897814 | 492.259586 | 460.775592 | 481.780488 | 466.560947 | 470.588583 | 486.525332 | 477.489377 | 79.152657 | 76.303736 | 74.55207 | 118.570414 | 97.925249 | 106.244774 | 236.849265 | 248.132623 | 234.721938 | 250.132066 | 240.722132 | 234.862468 | 22.66552 | 22.588864 | 20.220028 | 6.889317 | 9.255645 | 7.397562 | 379.985249 | 375.837282 | 366.539171 | 409.716719 | 428.119476 | 410.033964 | 413.489240 | 437.941904 | 416.752834 | 12.457422 | 13.350441 | 11.700376 | 606.508681 | 637.446710 | 616.219690 | 45.918087 | 45.619381 | 44.794926 | 18.621373 | 23.065743 | 22.893414 | 2.281325 | 2.741786 | 3.320320 | 660.356820 | 685.071178 | 685.983313 | 158.647160 | 167.315954 | 155.702334 | 222.001036 | 219.432004 | 217.026349 | 45.827009 | 49.478371 | 43.714061 | 312.805586 | 315.468343 | 307.043800 | 78.862358 | 84.691403 | 72.433104 | 101.084965 | 105.806605 | 105.308898 | 19.928472 | 20.511317 | 20.269535 | 140.504104 | 149.17944 | 140.965196 | 361.159561 | 369.654489 | 360.343153 | 0.194273 | 0.181944 | 0.116574 | 67.258387 | 76.992293 | 74.928607 | 13.987003 | 15.406483 | 12.889879 | 9.543550 | 9.605532 | 9.478572 | 539.325984 | 562.143146 | 601.821630 | 174.703215 | 181.545389 | 172.605809 | 142.767207 | 141.148386 | 145.260363 | 277.445058 | 280.331857 | 268.494284 | 866.195376 | 855.682340 | 859.299266 | 975.263117 | 390.478591 | 408.024394 | 389.726031 | 488.782088 |
min | -2258.709000 | -2014.045000 | -945.808000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 180.000000 | 0.000000 | 0.000000 | 0.000000 | 368.500000 |
25% | 364.161000 | 365.004500 | 289.609500 | 41.110000 | 40.950000 | 27.010000 | 137.335000 | 135.680000 | 95.695000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 8.320000 | 9.130000 | 5.790000 | 30.290000 | 33.580000 | 22.420000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 51.010000 | 56.710000 | 38.270000 | 0.000000 | 0.000000 | 0.000000 | 1.600000 | 1.330000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.950000 | 5.555000 | 1.780000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 266.170000 | 275.045000 | 188.790000 | 8.290000 | 9.460000 | 6.810000 | 33.460000 | 38.130000 | 29.660000 | 0.000000 | 0.000000 | 0.000000 | 56.700000 | 63.535000 | 49.985000 | 0.000000 | 0.000000 | 0.000000 | 0.450000 | 0.480000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.630000 | 2.78000 | 1.430000 | 89.975000 | 98.820000 | 78.930000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.000000 | 6.000000 | 4.000000 | 432.000000 | 426.500000 | 309.000000 | 110.000000 | 110.000000 | 67.000000 | 30.000000 | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 480.000000 | 0.000000 | 0.000000 | 0.000000 | 450.000000 |
50% | 495.682000 | 493.561000 | 452.091000 | 125.830000 | 125.460000 | 99.440000 | 282.190000 | 281.940000 | 240.940000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 32.590000 | 33.160000 | 28.640000 | 101.240000 | 104.340000 | 89.810000 | 0.33000 | 0.400000 | 0.160000 | 0.000000 | 0.000000 | 0.000000 | 166.310000 | 170.440000 | 148.280000 | 12.830000 | 13.350000 | 5.930000 | 37.730000 | 37.530000 | 23.660000 | 0.000000 | 0.000000 | 0.000000 | 126.010000 | 131.730000 | 72.890000 | 0.000000 | 0.000000 | 0.000000 | 0.210000 | 0.780000 | 0.490000 | 0.000000 | 0.000000 | 0.000000 | 510.230000 | 525.580000 | 435.330000 | 29.130000 | 30.130000 | 26.840000 | 93.940000 | 96.830000 | 89.810000 | 1.960000 | 2.210000 | 1.850000 | 151.060000 | 154.830000 | 142.840000 | 1.050000 | 1.200000 | 0.560000 | 7.080000 | 7.460000 | 5.710000 | 0.000000 | 0.000000 | 0.000000 | 15.030000 | 16.11000 | 12.560000 | 205.240000 | 211.190000 | 193.440000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9.000000 | 9.000000 | 8.000000 | 584.000000 | 581.000000 | 520.000000 | 120.000000 | 128.000000 | 130.000000 | 110.000000 | 98.000000 | 50.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 914.000000 | 0.000000 | 0.000000 | 0.000000 | 568.500000 |
75% | 703.922000 | 700.788000 | 671.150000 | 353.310000 | 359.925000 | 297.735000 | 523.125000 | 532.695000 | 482.610000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 91.460000 | 91.480000 | 84.670000 | 240.165000 | 239.485000 | 223.590000 | 5.09000 | 5.260000 | 4.680000 | 0.000000 | 0.100000 | 0.050000 | 374.475000 | 375.780000 | 348.310000 | 178.085000 | 191.380000 | 132.820000 | 211.210000 | 223.010000 | 164.725000 | 0.000000 | 0.000000 | 0.000000 | 573.090000 | 615.150000 | 481.030000 | 0.000000 | 0.000000 | 0.000000 | 5.160000 | 7.110000 | 6.380000 | 0.000000 | 0.000000 | 0.000000 | 899.505000 | 931.050000 | 833.100000 | 73.640000 | 74.680000 | 70.330000 | 202.830000 | 203.485000 | 196.975000 | 12.440000 | 13.035000 | 11.605000 | 315.500000 | 316.780000 | 302.110000 | 10.280000 | 10.980000 | 8.860000 | 27.540000 | 29.235000 | 25.330000 | 0.180000 | 0.260000 | 0.130000 | 47.540000 | 50.36000 | 43.410000 | 393.680000 | 396.820000 | 380.410000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 | 0.000000 | 0.060000 | 15.000000 | 15.000000 | 13.000000 | 837.000000 | 835.000000 | 790.000000 | 200.000000 | 200.000000 | 198.000000 | 120.000000 | 130.000000 | 130.000000 | 14.450000 | 14.960000 | 9.620000 | 0.000000 | 2.080000 | 0.000000 | 1924.000000 | 1.600000 | 1.990000 | 0.000000 | 795.500000 |
max | 27731.088000 | 35145.834000 | 33543.624000 | 7376.710000 | 8157.780000 | 10752.560000 | 8362.360000 | 9667.130000 | 14007.340000 | 2613.310000 | 3813.290000 | 4169.81000 | 3775.110000 | 2812.040000 | 5337.040000 | 6431.330000 | 7400.660000 | 10752.560000 | 4729.740000 | 4557.140000 | 4961.330000 | 1466.03000 | 1196.430000 | 928.490000 | 342.860000 | 569.710000 | 351.830000 | 10643.380000 | 7674.780000 | 11039.910000 | 7366.580000 | 8133.660000 | 8014.430000 | 8314.760000 | 9284.740000 | 13950.040000 | 628.560000 | 544.630000 | 516.910000 | 8432.990000 | 10936.730000 | 13980.060000 | 5900.660000 | 5490.280000 | 5681.540000 | 1023.210000 | 1265.790000 | 1390.880000 | 100.610000 | 370.130000 | 394.930000 | 10674.030000 | 11365.310000 | 14043.060000 | 6351.440000 | 5709.590000 | 4003.210000 | 4693.860000 | 4388.730000 | 5738.460000 | 1678.410000 | 1983.010000 | 1588.530000 | 6496.110000 | 6466.740000 | 5748.810000 | 5459.560000 | 5800.930000 | 4309.290000 | 4630.230000 | 3470.380000 | 5645.860000 | 1351.110000 | 1136.080000 | 1394.890000 | 5459.630000 | 6745.76000 | 5957.140000 | 6798.640000 | 7279.080000 | 5990.710000 | 19.760000 | 21.330000 | 6.230000 | 3965.690000 | 4747.910000 | 4100.380000 | 1344.140000 | 1495.940000 | 1209.860000 | 307.000000 | 138.000000 | 196.000000 | 35190.000000 | 40335.000000 | 45320.000000 | 4010.000000 | 4010.000000 | 4449.000000 | 4010.000000 | 4010.000000 | 4449.000000 | 10285.900000 | 7873.550000 | 11117.610000 | 45735.400000 | 28144.120000 | 30036.060000 | 4321.000000 | 12916.220000 | 9165.600000 | 11166.210000 | 37762.500000 |
- The minimum arpu is negative in all three months, so the telecom company has users with negative average revenue in both phases. These users are likely to churn.
categorical_columns = data.dtypes[data.dtypes == 'category'].index.values
print('Mode : ')
data[categorical_columns].mode().T
Mode :
 | 0 |
monthly_2g_6 | 0 |
monthly_2g_7 | 0 |
monthly_2g_8 | 0 |
sachet_2g_6 | 0 |
sachet_2g_7 | 0 |
sachet_2g_8 | 0 |
monthly_3g_6 | 0 |
monthly_3g_7 | 0 |
monthly_3g_8 | 0 |
sachet_3g_6 | 0 |
sachet_3g_7 | 0 |
sachet_3g_8 | 0 |
Churn | 0 |
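- The mode of every monthly and sachet pack column is 0, i.e. most customers did not buy any 2G or 3G pack in a given month. The mode of Churn is also 0, so most customers did not churn.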
Univariate Analysis
churned_customers = data[data['Churn'] == 1]
non_churned_customers = data[data['Churn'] == 0]
Age on Network
plt.figure(figsize=(12,8))
sns.violinplot(x='aon', y='Churn', data=data)
plt.title('Age on Network vs Churn')
plt.show()
- Customers with a lower ‘aon’ (age on network) are more likely to churn than customers with a higher ‘aon’.
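The violin plot can be backed by a simple numeric summary. The sketch below uses plain Python and a handful of made-up (aon, Churn) pairs, not the case-study data, to compare the median age on network of the two groups. The names `records` and `median_aon_by_churn` are illustrative only.

```python
from statistics import median

# Hypothetical (aon in days, Churn label) pairs; NOT taken from the dataset.
records = [(210, 1), (340, 1), (505, 1), (1180, 1),
           (420, 0), (980, 0), (1500, 0), (2300, 0), (3100, 0)]

def median_aon_by_churn(rows):
    """Return {churn_label: median aon} for an iterable of (aon, churn) pairs."""
    groups = {}
    for aon, churn in rows:
        groups.setdefault(churn, []).append(aon)
    return {churn: median(values) for churn, values in groups.items()}

aon_medians = median_aon_by_churn(records)
print(aon_medians)
```

On the real dataframe, something like `data.groupby('Churn')['aon'].median()` should give the equivalent summary.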
# function for numerical variable univariate analysis
from tabulate import tabulate

def num_univariate_analysis(column_names, scale='linear'):
    # violin plot of each of the first three columns vs the target
    fig = plt.figure(figsize=(16, 8))
    for i, column in enumerate(column_names[:3], start=1):
        ax = fig.add_subplot(1, 3, i)
        sns.violinplot(x='Churn', y=column, data=data, ax=ax)
        ax.set(title=column + ' vs Churn')
        if scale == 'log':
            # set the scale on this subplot explicitly instead of via plt
            ax.set_yscale('log')
            ax.set(ylabel=column + ' (Log Scale)')

    # summary statistics for each group
    print('Customers who churned (Churn : 1)')
    print(churned_customers[column_names].describe())

    print('\nCustomers who did not churn (Churn : 0)')
    print(non_churned_customers[column_names].describe(), '\n')
# function for categorical variable univariate analysis
!pip install sidetable
import sidetable

def cat_univariate_analysis(column_names, figsize=(16, 4)):
    # count plot of each of the first three columns, split by the target
    fig = plt.figure(figsize=figsize)
    for i, column in enumerate(column_names[:3], start=1):
        ax = fig.add_subplot(1, 3, i)
        sns.countplot(x=column, hue='Churn', data=data, ax=ax)
        ax.set(title=column + ' vs No of Churned Customers')
        ax.legend(loc='upper right')

    # frequency tables (count and share of each category) for each group
    print('Customers who churned (Churn : 1)')
    for column in column_names[:3]:
        print(tabulate(pd.DataFrame(churned_customers.stb.freq([column])), headers='keys', tablefmt='psql'), '\n')

    print('\nCustomers who did not churn (Churn : 0)')
    for column in column_names[:3]:
        print(tabulate(pd.DataFrame(non_churned_customers.stb.freq([column])), headers='keys', tablefmt='psql'), '\n')
Requirement already satisfied: sidetable in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (0.3.0)
Requirement already satisfied: pandas>=1.0 in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (from sidetable) (1.1.2)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (from pandas>=1.0->sidetable) (2.8.1)
Requirement already satisfied: numpy>=1.15.4 in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (from pandas>=1.0->sidetable) (1.18.1)
Requirement already satisfied: pytz>=2017.2 in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (from pandas>=1.0->sidetable) (2019.3)
Requirement already satisfied: six>=1.5 in /Users/jayanth/opt/anaconda3/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas>=1.0->sidetable) (1.14.0)
arpu_6, arpu_7, arpu_8
columns = ['arpu_6','arpu_7','arpu_8']
num_univariate_analysis(columns,'log')
Customers who churned (Churn : 1)
 | arpu_6 | arpu_7 | arpu_8 |
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 678.716970 | 550.511946 | 243.063343 |
std | 551.792864 | 517.241221 | 378.843531 |
min | -209.465000 | -158.963000 | -37.887000 |
25% | 396.507000 | 289.641000 | 0.000000 |
50% | 573.396000 | 464.674000 | 101.894000 |
75% | 819.460000 | 691.588000 | 351.028000 |
max | 11505.508000 | 13224.119000 | 5228.826000 |

Customers who did not churn (Churn : 0)
 | arpu_6 | arpu_7 | arpu_8 |
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 578.637360 | 592.788162 | 562.453248 |
std | 429.988265 | 457.265996 | 492.802655 |
min | -2258.709000 | -2014.045000 | -945.808000 |
25% | 362.218000 | 369.610500 | 319.118500 |
50% | 489.324000 | 496.182500 | 471.024000 |
75% | 690.891750 | 701.418000 | 690.921000 |
max | 27731.088000 | 35145.834000 | 33543.624000 |
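- The plots and summary statistics above show that the revenue generated by customers who are about to churn is very unstable: their mean arpu falls from 678.7 in the 6th month to 550.5 in the 7th and 243.1 in the 8th, while it stays between 562 and 593 for customers who did not churn.
- Customers whose arpu decreases in the 7th month are more likely to churn than those whose arpu increases.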
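To put a number on the instability, the month-over-month change in mean arpu can be computed directly from the summary statistics printed above. This is a minimal sketch in plain Python; the helper `pct_change` and the dictionary layout are illustrative, and only the group means are taken from the output above.

```python
# Mean arpu per month, copied from the summary statistics above.
mean_arpu = {
    "churned": {6: 678.716970, 7: 550.511946, 8: 243.063343},
    "not_churned": {6: 578.637360, 7: 592.788162, 8: 562.453248},
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

arpu_change = {
    group: {
        "6_to_7": round(pct_change(means[6], means[7]), 1),
        "7_to_8": round(pct_change(means[7], means[8]), 1),
    }
    for group, means in mean_arpu.items()
}

for group, change in arpu_change.items():
    print(group, change)
```

These are changes in group means, not per-customer changes, so they only indicate the overall direction for each group.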
total_og_mou_6, total_og_mou_7, total_og_mou_8
columns = ['total_og_mou_6', 'total_og_mou_7', 'total_og_mou_8']
num_univariate_analysis(columns)
Customers who churned (Churn : 1)
 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 |
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 867.961342 | 677.868909 | 225.083741 |
std | 852.697688 | 786.961399 | 471.672718 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 277.880000 | 110.090000 | 0.000000 |
50% | 658.360000 | 466.910000 | 0.000000 |
75% | 1209.040000 | 926.760000 | 255.810000 |
max | 8488.360000 | 8285.640000 | 5206.210000 |

Customers who did not churn (Churn : 0)
 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 |
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 669.554896 | 712.080684 | 661.480046 |
std | 636.531612 | 674.580516 | 691.079113 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 265.682500 | 284.500000 | 227.970000 |
50% | 500.410000 | 529.935000 | 470.475000 |
75% | 872.070000 | 931.197500 | 866.045000 |
max | 10674.030000 | 11365.310000 | 14043.060000 |
- Customers with a high total_og_mou in the 6th month and a lower total_og_mou in the 7th month are more likely to churn than the rest.
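One way to express this pattern as a single number is the ratio of 7th-month to 6th-month outgoing usage. The sketch below applies a hypothetical helper, `usage_ratio`, to the group medians (the 50% rows) printed above. Note that the ratio of two medians is not the median of per-customer ratios, so this is only a rough illustration.

```python
def usage_ratio(usage_6, usage_7):
    """Month-7 usage as a fraction of month-6 usage (None if month 6 is zero)."""
    if usage_6 == 0:
        return None
    return usage_7 / usage_6

# Median total_og_mou for months 6 and 7, copied from the 50% rows above.
median_og = {
    "churned": (658.36, 466.91),
    "not_churned": (500.41, 529.935),
}

og_ratio = {group: usage_ratio(m6, m7) for group, (m6, m7) in median_og.items()}

for group, ratio in og_ratio.items():
    print(f"{group}: {ratio:.2f}")
```

A ratio below 1 means usage fell in the 7th month; a ratio above 1 means it grew.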
total_ic_mou_6, total_ic_mou_7, total_ic_mou_8
columns = ['total_ic_mou_6', 'total_ic_mou_7', 'total_ic_mou_8']
num_univariate_analysis(columns)
Customers who churned (Churn : 1)
 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 |
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 241.954404 | 193.341076 | 68.807042 |
std | 360.836586 | 318.183813 | 154.450340 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 49.460000 | 27.890000 | 0.000000 |
50% | 137.330000 | 99.980000 | 0.000000 |
75% | 289.510000 | 235.740000 | 70.290000 |
max | 6633.180000 | 5137.560000 | 1859.280000 |

Customers who did not churn (Churn : 0)
 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 |
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 313.712052 | 326.369333 | 316.858595 |
std | 360.580253 | 372.112086 | 366.818717 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 94.460000 | 107.802500 | 98.265000 |
50% | 212.160000 | 222.290000 | 212.360000 |
75% | 401.602500 | 410.182500 | 402.270000 |
max | 6798.640000 | 7279.080000 | 5990.710000 |
- Customers whose total_ic_mou decreases in the 7th month are more likely to churn than the rest.
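An observation like this can be turned into a per-customer feature. The following sketch is not part of the original analysis: the customers are made up, and the column names `ic_mou_diff_7_6` and `ic_big_drop` as well as the 50% threshold are assumptions chosen for illustration.

```python
# Hypothetical customers; the values are made up for illustration only.
customers = [
    {"id": "A", "total_ic_mou_6": 240.0, "total_ic_mou_7": 95.0},
    {"id": "B", "total_ic_mou_6": 310.0, "total_ic_mou_7": 330.0},
    {"id": "C", "total_ic_mou_6": 0.0, "total_ic_mou_7": 12.0},
]

def add_ic_drop_flag(rows, threshold=0.5):
    """Add the July-minus-June change and flag drops larger than
    `threshold` (as a fraction) of the June usage."""
    flagged = []
    for row in rows:
        june = row["total_ic_mou_6"]
        diff = row["total_ic_mou_7"] - june
        # Customers with no June usage cannot "drop", so they are never flagged.
        big_drop = june > 0 and diff < -threshold * june
        flagged.append({**row, "ic_mou_diff_7_6": diff, "ic_big_drop": big_drop})
    return flagged

flagged_customers = add_ic_drop_flag(customers)
for row in flagged_customers:
    print(row["id"], row["ic_mou_diff_7_6"], row["ic_big_drop"])
```

Whether such a flag actually helps would need to be checked against the Churn label on the real data.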
vol_2g_mb_6, vol_2g_mb_7, vol_2g_mb_8
columns = ['vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 |
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 60.775588 | 49.054393 | 15.283185 |
std | 243.084276 | 219.485813 | 120.975111 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 0.000000 | 0.000000 |
max | 4017.160000 | 3430.730000 | 3349.190000 |

Customers who did not churn (Churn : 0)
 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 |
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 80.569210 | 80.925060 | 74.309036 |
std | 280.420463 | 285.265125 | 277.889339 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 16.937500 | 18.267500 | 14.245000 |
max | 10285.900000 | 7873.550000 | 11117.610000 |
- Customers with stable 2g volume usage across the 6th and 7th months are less likely to churn.
- Customers whose 2g volume consumption falls in the 7th month are more likely to churn.
vol_3g_mb_6, vol_3g_mb_7, vol_3g_mb_8, monthly_3g_6
columns = ['vol_3g_mb_6', 'vol_3g_mb_7', 'vol_3g_mb_8', 'monthly_3g_6']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 |
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 188.395461 | 157.714254 | 56.776880 |
std | 715.327843 | 690.773561 | 446.532769 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 0.000000 | 0.000000 |
max | 9400.120000 | 15115.510000 | 13440.720000 |

Customers who did not churn (Churn : 0)
 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 |
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 265.012522 | 289.478375 | 290.016390 |
std | 878.846885 | 868.808831 | 885.821105 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 35.855000 | 27.120000 |
max | 45735.400000 | 28144.120000 | 30036.060000 |
- Customers with stable 3g volume usage across the 6th and 7th months are less likely to churn.
- Customers whose 3g volume consumption falls in the 7th month are more likely to churn.
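"Stable usage" can be made more concrete with a spread measure. As a rough sketch, the code below computes the coefficient of variation (standard deviation divided by mean) of the monthly mean 2g and 3g volumes printed above, for each group. It works on group means across months 6 to 8 rather than on individual customers, and the choice of this particular measure is an assumption, not the method used in the original analysis.

```python
from statistics import mean, pstdev

# Mean data volume (MB) for months 6, 7 and 8, copied from the summaries above.
mean_volume = {
    ("2g", "churned"): [60.775588, 49.054393, 15.283185],
    ("2g", "not_churned"): [80.569210, 80.925060, 74.309036],
    ("3g", "churned"): [188.395461, 157.714254, 56.776880],
    ("3g", "not_churned"): [265.012522, 289.478375, 290.016390],
}

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0 means perfectly flat)."""
    return pstdev(values) / mean(values)

volume_cv = {key: coefficient_of_variation(vals) for key, vals in mean_volume.items()}

for (network, group), cv in volume_cv.items():
    print(f"{network} {group}: {cv:.3f}")
```

A larger value means the monthly means move around more relative to their level.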
monthly_2g_6, monthly_2g_7, monthly_2g_8
columns = ['monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8']
cat_univariate_analysis(columns)
Customers who churned (Churn : 1)
 | monthly_2g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2454 | 0.946394 | 2454 | 0.946394 |
1 | 1 | 126 | 0.0485924 | 2580 | 0.994987 |
2 | 2 | 11 | 0.00424219 | 2591 | 0.999229 |
3 | 4 | 2 | 0.000771307 | 2593 | 1 |

 | monthly_2g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2477 | 0.955264 | 2477 | 0.955264 |
1 | 1 | 104 | 0.040108 | 2581 | 0.995372 |
2 | 2 | 12 | 0.00462784 | 2593 | 1 |

 | monthly_2g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2555 | 0.985345 | 2555 | 0.985345 |
1 | 1 | 37 | 0.0142692 | 2592 | 0.999614 |
2 | 2 | 1 | 0.000385654 | 2593 | 1 |

Customers who did not churn (Churn : 0)
 | monthly_2g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 24228 | 0.883653 | 24228 | 0.883653 |
1 | 1 | 2825 | 0.103035 | 27053 | 0.986688 |
2 | 2 | 334 | 0.0121818 | 27387 | 0.998869 |
3 | 3 | 26 | 0.000948282 | 27413 | 0.999818 |
4 | 4 | 5 | 0.000182362 | 27418 | 1 |

 | monthly_2g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 24079 | 0.878219 | 24079 | 0.878219 |
1 | 1 | 2909 | 0.106098 | 26988 | 0.984317 |
2 | 2 | 394 | 0.0143701 | 27382 | 0.998687 |
3 | 3 | 29 | 0.0010577 | 27411 | 0.999745 |
4 | 4 | 5 | 0.000182362 | 27416 | 0.999927 |
5 | 5 | 2 | 7.29448e-05 | 27418 | 1 |

 | monthly_2g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 24383 | 0.889306 | 24383 | 0.889306 |
1 | 1 | 2724 | 0.0993508 | 27107 | 0.988657 |
2 | 2 | 282 | 0.0102852 | 27389 | 0.998942 |
3 | 3 | 22 | 0.000802393 | 27411 | 0.999745 |
4 | 4 | 5 | 0.000182362 | 27416 | 0.999927 |
5 | 5 | 2 | 7.29448e-05 | 27418 | 1 |
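The frequency tables above are easier to compare once reduced to one figure per group and month. As a small sketch, the code below recomputes the share of customers with no monthly 2G pack from the counts in the tables. The dictionary layout and the helper `share` are illustrative; the counts are copied from the category 0 rows.

```python
# Group sizes and category-0 counts, read from the frequency tables above.
GROUP_SIZE = {"churned": 2593, "not_churned": 27418}
zero_pack_count = {
    "churned": {6: 2454, 7: 2477, 8: 2555},
    "not_churned": {6: 24228, 7: 24079, 8: 24383},
}

def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

zero_pack_share = {
    group: {month: round(share(count, GROUP_SIZE[group]), 2)
            for month, count in by_month.items()}
    for group, by_month in zero_pack_count.items()
}

for group, by_month in zero_pack_share.items():
    print(group, by_month)
```

The Percent column printed by sidetable here is a fraction between 0 and 1, so these percentages are the same figures multiplied by 100.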
monthly_3g_6, monthly_3g_7, monthly_3g_8
columns = ['monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8']
cat_univariate_analysis(columns)
Customers who churned (Churn : 1)
 | monthly_3g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2352 | 0.907057 | 2352 | 0.907057 |
1 | 1 | 170 | 0.0655611 | 2522 | 0.972619 |
2 | 2 | 49 | 0.018897 | 2571 | 0.991516 |
3 | 3 | 13 | 0.0050135 | 2584 | 0.996529 |
4 | 5 | 4 | 0.00154261 | 2588 | 0.998072 |
5 | 4 | 4 | 0.00154261 | 2592 | 0.999614 |
6 | 6 | 1 | 0.000385654 | 2593 | 1 |

 | monthly_3g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2399 | 0.925183 | 2399 | 0.925183 |
1 | 1 | 136 | 0.0524489 | 2535 | 0.977632 |
2 | 2 | 48 | 0.0185114 | 2583 | 0.996143 |
3 | 3 | 9 | 0.00347088 | 2592 | 0.999614 |
4 | 5 | 1 | 0.000385654 | 2593 | 1 |

 | monthly_3g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2524 | 0.97339 | 2524 | 0.97339 |
1 | 1 | 56 | 0.0215966 | 2580 | 0.994987 |
2 | 2 | 8 | 0.00308523 | 2588 | 0.998072 |
3 | 3 | 4 | 0.00154261 | 2592 | 0.999614 |
4 | 4 | 1 | 0.000385654 | 2593 | 1 |

Customers who did not churn (Churn : 0)
 | monthly_3g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 24080 | 0.878255 | 24080 | 0.878255 |
1 | 1 | 2371 | 0.086476 | 26451 | 0.964731 |
2 | 2 | 648 | 0.0236341 | 27099 | 0.988365 |
3 | 3 | 194 | 0.00707564 | 27293 | 0.995441 |
4 | 4 | 70 | 0.00255307 | 27363 | 0.997994 |
5 | 5 | 28 | 0.00102123 | 27391 | 0.999015 |
6 | 6 | 10 | 0.000364724 | 27401 | 0.99938 |
7 | 7 | 9 | 0.000328252 | 27410 | 0.999708 |
8 | 8 | 3 | 0.000109417 | 27413 | 0.999818 |
9 | 11 | 2 | 7.29448e-05 | 27415 | 0.999891 |
10 | 9 | 2 | 7.29448e-05 | 27417 | 0.999964 |
11 | 14 | 1 | 3.64724e-05 | 27418 | 1 |

 | monthly_3g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 23962 | 0.873951 | 23962 | 0.873951 |
1 | 1 | 2330 | 0.0849807 | 26292 | 0.958932 |
2 | 2 | 774 | 0.0282296 | 27066 | 0.987162 |
3 | 3 | 198 | 0.00722153 | 27264 | 0.994383 |
4 | 4 | 68 | 0.00248012 | 27332 | 0.996863 |
5 | 5 | 38 | 0.00138595 | 27370 | 0.998249 |
6 | 6 | 23 | 0.000838865 | 27393 | 0.999088 |
7 | 7 | 10 | 0.000364724 | 27403 | 0.999453 |
8 | 8 | 5 | 0.000182362 | 27408 | 0.999635 |
9 | 9 | 4 | 0.00014589 | 27412 | 0.999781 |
10 | 11 | 2 | 7.29448e-05 | 27414 | 0.999854 |
11 | 16 | 1 | 3.64724e-05 | 27415 | 0.999891 |
12 | 14 | 1 | 3.64724e-05 | 27416 | 0.999927 |
13 | 12 | 1 | 3.64724e-05 | 27417 | 0.999964 |
14 | 10 | 1 | 3.64724e-05 | 27418 | 1 |

 | monthly_3g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 24002 | 0.87541 | 24002 | 0.87541 |
1 | 1 | 2347 | 0.0856007 | 26349 | 0.961011 |
2 | 2 | 728 | 0.0265519 | 27077 | 0.987563 |
3 | 3 | 193 | 0.00703917 | 27270 | 0.994602 |
4 | 4 | 86 | 0.00313663 | 27356 | 0.997739 |
5 | 5 | 30 | 0.00109417 | 27386 | 0.998833 |
6 | 6 | 14 | 0.000510613 | 27400 | 0.999343 |
7 | 7 | 9 | 0.000328252 | 27409 | 0.999672 |
8 | 9 | 3 | 0.000109417 | 27412 | 0.999781 |
9 | 8 | 3 | 0.000109417 | 27415 | 0.999891 |
10 | 10 | 2 | 7.29448e-05 | 27417 | 0.999964 |
11 | 16 | 1 | 3.64724e-05 | 27418 | 1 |
sachet_3g_6, sachet_3g_7, sachet_3g_8
columns = ['sachet_3g_6', 'sachet_3g_7','sachet_3g_8']
print(data[columns].dtypes)
cat_univariate_analysis(columns)
sachet_3g_6 category
sachet_3g_7 category
sachet_3g_8 category
dtype: object

Customers who churned (Churn : 1)
 | sachet_3g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2454 | 0.946394 | 2454 | 0.946394 |
1 | 1 | 87 | 0.0335519 | 2541 | 0.979946 |
2 | 2 | 16 | 0.00617046 | 2557 | 0.986116 |
3 | 4 | 11 | 0.00424219 | 2568 | 0.990359 |
4 | 3 | 8 | 0.00308523 | 2576 | 0.993444 |
5 | 10 | 4 | 0.00154261 | 2580 | 0.994987 |
6 | 7 | 4 | 0.00154261 | 2584 | 0.996529 |
7 | 6 | 3 | 0.00115696 | 2587 | 0.997686 |
8 | 9 | 2 | 0.000771307 | 2589 | 0.998457 |
9 | 23 | 1 | 0.000385654 | 2590 | 0.998843 |
10 | 19 | 1 | 0.000385654 | 2591 | 0.999229 |
11 | 8 | 1 | 0.000385654 | 2592 | 0.999614 |
12 | 5 | 1 | 0.000385654 | 2593 | 1 |

 | sachet_3g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2458 | 0.947937 | 2458 | 0.947937 |
1 | 1 | 82 | 0.0316236 | 2540 | 0.97956 |
2 | 2 | 19 | 0.00732742 | 2559 | 0.986888 |
3 | 3 | 8 | 0.00308523 | 2567 | 0.989973 |
4 | 5 | 7 | 0.00269958 | 2574 | 0.992673 |
5 | 4 | 4 | 0.00154261 | 2578 | 0.994215 |
6 | 9 | 3 | 0.00115696 | 2581 | 0.995372 |
7 | 6 | 3 | 0.00115696 | 2584 | 0.996529 |
8 | 10 | 2 | 0.000771307 | 2586 | 0.9973 |
9 | 35 | 1 | 0.000385654 | 2587 | 0.997686 |
10 | 24 | 1 | 0.000385654 | 2588 | 0.998072 |
11 | 17 | 1 | 0.000385654 | 2589 | 0.998457 |
12 | 12 | 1 | 0.000385654 | 2590 | 0.998843 |
13 | 11 | 1 | 0.000385654 | 2591 | 0.999229 |
14 | 8 | 1 | 0.000385654 | 2592 | 0.999614 |
15 | 7 | 1 | 0.000385654 | 2593 | 1 |

 | sachet_3g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 2546 | 0.981874 | 2546 | 0.981874 |
1 | 1 | 31 | 0.0119553 | 2577 | 0.99383 |
2 | 3 | 5 | 0.00192827 | 2582 | 0.995758 |
3 | 2 | 3 | 0.00115696 | 2585 | 0.996915 |
4 | 8 | 2 | 0.000771307 | 2587 | 0.997686 |
5 | 5 | 2 | 0.000771307 | 2589 | 0.998457 |
6 | 4 | 2 | 0.000771307 | 2591 | 0.999229 |
7 | 16 | 1 | 0.000385654 | 2592 | 0.999614 |
8 | 13 | 1 | 0.000385654 | 2593 | 1 |

Customers who did not churn (Churn : 0)
 | sachet_3g_6 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 25579 | 0.932927 | 25579 | 0.932927 |
1 | 1 | 1220 | 0.0444963 | 26799 | 0.977424 |
2 | 2 | 297 | 0.0108323 | 27096 | 0.988256 |
3 | 3 | 111 | 0.00404844 | 27207 | 0.992304 |
4 | 4 | 55 | 0.00200598 | 27262 | 0.99431 |
5 | 5 | 36 | 0.00131301 | 27298 | 0.995623 |
6 | 6 | 24 | 0.000875337 | 27322 | 0.996499 |
7 | 7 | 22 | 0.000802393 | 27344 | 0.997301 |
8 | 8 | 14 | 0.000510613 | 27358 | 0.997812 |
9 | 9 | 13 | 0.000474141 | 27371 | 0.998286 |
10 | 11 | 8 | 0.000291779 | 27379 | 0.998578 |
11 | 10 | 7 | 0.000255307 | 27386 | 0.998833 |
12 | 15 | 5 | 0.000182362 | 27391 | 0.999015 |
13 | 12 | 4 | 0.00014589 | 27395 | 0.999161 |
14 | 19 | 3 | 0.000109417 | 27398 | 0.999271 |
15 | 18 | 3 | 0.000109417 | 27401 | 0.99938 |
16 | 14 | 3 | 0.000109417 | 27404 | 0.999489 |
17 | 13 | 3 | 0.000109417 | 27407 | 0.999599 |
18 | 29 | 2 | 7.29448e-05 | 27409 | 0.999672 |
19 | 23 | 2 | 7.29448e-05 | 27411 | 0.999745 |
20 | 22 | 2 | 7.29448e-05 | 27413 | 0.999818 |
21 | 16 | 2 | 7.29448e-05 | 27415 | 0.999891 |
22 | 28 | 1 | 3.64724e-05 | 27416 | 0.999927 |
23 | 21 | 1 | 3.64724e-05 | 27417 | 0.999964 |
24 | 17 | 1 | 3.64724e-05 | 27418 | 1 |

 | sachet_3g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 25595 | 0.933511 | 25595 | 0.933511 |
1 | 1 | 1151 | 0.0419797 | 26746 | 0.975491 |
2 | 2 | 293 | 0.0106864 | 27039 | 0.986177 |
3 | 3 | 107 | 0.00390255 | 27146 | 0.99008 |
4 | 4 | 68 | 0.00248012 | 27214 | 0.99256 |
5 | 5 | 59 | 0.00215187 | 27273 | 0.994712 |
6 | 6 | 39 | 0.00142242 | 27312 | 0.996134 |
7 | 7 | 17 | 0.000620031 | 27329 | 0.996754 |
8 | 9 | 13 | 0.000474141 | 27342 | 0.997228 |
9 | 8 | 13 | 0.000474141 | 27355 | 0.997702 |
10 | 11 | 12 | 0.000437669 | 27367 | 0.99814 |
11 | 12 | 9 | 0.000328252 | 27376 | 0.998468 |
12 | 10 | 8 | 0.000291779 | 27384 | 0.99876 |
13 | 15 | 5 | 0.000182362 | 27389 | 0.998942 |
14 | 14 | 5 | 0.000182362 | 27394 | 0.999125 |
15 | 18 | 4 | 0.00014589 | 27398 | 0.999271 |
16 | 13 | 4 | 0.00014589 | 27402 | 0.999416 |
17 | 22 | 3 | 0.000109417 | 27405 | 0.999526 |
18 | 20 | 3 | 0.000109417 | 27408 | 0.999635 |
19 | 19 | 3 | 0.000109417 | 27411 | 0.999745 |
20 | 21 | 2 | 7.29448e-05 | 27413 | 0.999818 |
21 | 33 | 1 | 3.64724e-05 | 27414 | 0.999854 |
22 | 31 | 1 | 3.64724e-05 | 27415 | 0.999891 |
23 | 24 | 1 | 3.64724e-05 | 27416 | 0.999927 |
24 | 17 | 1 | 3.64724e-05 | 27417 | 0.999964 |
25 | 16 | 1 | 3.64724e-05 | 27418 | 1 |

 | sachet_3g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
0 | 0 | 25736 | 0.938653 | 25736 | 0.938653 |
1 | 1 | 1027 | 0.0374571 | 26763 | 0.976111 |
2 | 2 | 249 | 0.00908163 | 27012 | 0.985192 |
3 | 3 | 124 | 0.00452258 | 27136 | 0.989715 |
4 | 4 | 71 | 0.00258954 | 27207 | 0.992304 |
5 | 5 | 64 | 0.00233423 | 27271 | 0.994639 |
6 | 6 | 26 | 0.000948282 | 27297 | 0.995587 |
7 | 7 | 23 | 0.000838865 | 27320 | 0.996426 |
8 | 8 | 20 | 0.000729448 | 27340 | 0.997155 |
9 | 9 | 12 | 0.000437669 | 27352 | 0.997593 |
10 | 12 | 11 | 0.000401196 | 27363 | 0.997994 |
11 | 10 | 10 | 0.000364724 | 27373 | 0.998359 |
12 | 13 | 9 | 0.000328252 | 27382 | 0.998687 |
13 | 14 | 6 | 0.000218834 | 27388 | 0.998906 |
14 | 11 | 6 | 0.000218834 | 27394 | 0.999125 |
15 | 15 | 5 | 0.000182362 | 27399 | 0.999307 |
16 | 23 | 2 | 7.29448e-05 | 27401 | 0.99938 |
17 | 21 | 2 | 7.29448e-05 | 27403 | 0.999453 |
18 | 20 | 2 | 7.29448e-05 | 27405 | 0.999526 |
19 | 18 | 2 | 7.29448e-05 | 27407 | 0.999599 |
20 | 17 | 2 | 7.29448e-05 | 27409 | 0.999672 |
21 | 16 | 2 | 7.29448e-05 | 27411 | 0.999745 |
22 | 41 | 1 | 3.64724e-05 | 27412 | 0.999781 |
23 | 38 | 1 | 3.64724e-05 | 27413 | 0.999818 |
24 | 30 | 1 | 3.64724e-05 | 27414 | 0.999854 |
25 | 29 | 1 | 3.64724e-05 | 27415 | 0.999891 |
26 | 27 | 1 | 3.64724e-05 | 27416 | 0.999927 |
27 | 25 | 1 | 3.64724e-05 | 27417 | 0.999964 |
28 | 19 | 1 | 3.64724e-05 | 27418 | 1 |
aug_vbc_3g, jul_vbc_3g, jun_vbc_3g
columns = ['vbc_3g_6', 'vbc_3g_7', 'vbc_3g_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
          vbc_3g_6     vbc_3g_7     vbc_3g_8
count  2593.000000  2593.000000  2593.000000
mean     81.564601    71.143880    32.610659
std     320.898511   284.882601   197.998246
min       0.000000     0.000000     0.000000
25%       0.000000     0.000000     0.000000
50%       0.000000     0.000000     0.000000
75%       0.000000     0.000000     0.000000
max    6931.810000  4908.270000  5738.740000

Customers who did not churn (Churn : 0)
           vbc_3g_6      vbc_3g_7      vbc_3g_8
count  27418.000000  27418.000000  27418.000000
mean     125.124167    141.178182    138.597023
std      395.413666    417.292310    402.761779
min        0.000000      0.000000      0.000000
25%        0.000000      0.000000      0.000000
50%        0.000000      0.000000      0.000000
75%        0.000000      9.940000     17.675000
max    11166.210000   9165.600000  12916.220000
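The `num_univariate_analysis` helper is defined earlier in the notebook and is not reproduced here. As a rough sketch of the same kind of per-group summary, the snippet below runs `describe()` separately for churned and non-churned customers. The toy values and the helper name `summarize_by_churn` are assumptions made for illustration only, and the plotting part of the original helper is not covered.

```python
import pandas as pd

# Toy data standing in for the real `data` frame (values are made up).
toy = pd.DataFrame({
    "vbc_3g_6": [0.0, 120.5, 0.0, 300.0, 50.0, 0.0],
    "vbc_3g_8": [0.0, 10.0, 0.0, 280.0, 75.0, 0.0],
    "Churn":    [1, 1, 1, 0, 0, 0],
})

def summarize_by_churn(df, columns, target="Churn"):
    """Return {churn_label: describe() table} for the given numeric columns."""
    return {label: group[columns].describe()
            for label, group in df.groupby(target)}

summaries = summarize_by_churn(toy, ["vbc_3g_6", "vbc_3g_8"])
for label, table in summaries.items():
    print(f"Churn : {label}")
    print(table, end="\n\n")
```

Reading the two tables side by side is what makes the drop in mean `vbc_3g_8` for churned customers visible in the real output above.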
Bivariate Analysis
data.head()
mobile_number | arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | monthly_2g_6 | monthly_2g_7 | monthly_2g_8 | sachet_2g_6 | sachet_2g_7 | sachet_2g_8 | monthly_3g_6 | monthly_3g_7 | monthly_3g_8 | sachet_3g_6 | sachet_3g_7 | sachet_3g_8 | aon | vbc_3g_8 | vbc_3g_7 | vbc_3g_6 | Average_rech_amt_6n7 | Churn |
7000701601 | 1069.180 | 1349.850 | 3171.480 | 57.84 | 54.68 | 52.29 | 453.43 | 567.16 | 325.91 | 16.23 | 33.49 | 31.64 | 23.74 | 12.59 | 38.06 | 51.39 | 31.38 | 40.28 | 308.63 | 447.38 | 162.28 | 62.13 | 55.14 | 53.23 | 0.0 | 0.0 | 0.00 | 422.16 | 533.91 | 255.79 | 4.30 | 23.29 | 12.01 | 49.89 | 31.76 | 49.14 | 6.66 | 20.08 | 16.68 | 60.86 | 75.14 | 77.84 | 0.0 | 0.18 | 10.01 | 4.50 | 0.00 | 6.50 | 0.00 | 0.0 | 0.0 | 487.53 | 609.24 | 350.16 | 58.14 | 32.26 | 27.31 | 217.56 | 221.49 | 121.19 | 152.16 | 101.46 | 39.53 | 427.88 | 355.23 | 188.04 | 36.89 | 11.83 | 30.39 | 91.44 | 126.99 | 141.33 | 52.19 | 34.24 | 22.21 | 180.54 | 173.08 | 193.94 | 626.46 | 558.04 | 428.74 | 0.21 | 0.0 | 0.0 | 2.06 | 14.53 | 31.59 | 15.74 | 15.19 | 15.14 | 5 | 5 | 7 | 1580 | 790 | 3638 | 1580 | 790 | 1580 | 0 | 0 | 779 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 802 | 57.74 | 19.38 | 18.74 | 1185.0 | 1 |
7001524846 | 378.721 | 492.223 | 137.362 | 413.69 | 351.03 | 35.08 | 94.66 | 80.63 | 136.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 297.13 | 217.59 | 12.49 | 80.96 | 70.58 | 50.54 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 7.15 | 378.09 | 288.18 | 63.04 | 116.56 | 133.43 | 22.58 | 13.69 | 10.04 | 75.69 | 0.00 | 0.00 | 0.00 | 130.26 | 143.48 | 98.28 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 10.23 | 0.00 | 0.0 | 0.0 | 508.36 | 431.66 | 171.56 | 23.84 | 9.84 | 0.31 | 57.58 | 13.98 | 15.48 | 0.00 | 0.00 | 0.00 | 81.43 | 23.83 | 15.79 | 0.00 | 0.58 | 0.10 | 22.43 | 4.08 | 0.65 | 0.00 | 0.00 | 0.00 | 22.43 | 4.66 | 0.75 | 103.86 | 28.49 | 16.54 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19 | 21 | 14 | 437 | 601 | 120 | 90 | 154 | 30 | 50 | 0 | 10 | 0.0 | 356.0 | 0.03 | 0.0 | 750.95 | 11.94 | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 315 | 21.03 | 910.65 | 122.16 | 519.0 | 0 |
7002191713 | 492.846 | 205.671 | 593.260 | 501.76 | 108.39 | 534.24 | 413.31 | 119.28 | 482.46 | 23.53 | 144.24 | 72.11 | 7.98 | 35.26 | 1.44 | 49.63 | 6.19 | 36.01 | 151.13 | 47.28 | 294.46 | 4.54 | 0.00 | 23.51 | 0.0 | 0.0 | 0.49 | 205.31 | 53.48 | 353.99 | 446.41 | 85.98 | 498.23 | 255.36 | 52.94 | 156.94 | 0.00 | 0.00 | 0.00 | 701.78 | 138.93 | 655.18 | 0.0 | 0.00 | 1.29 | 0.00 | 0.00 | 4.78 | 0.00 | 0.0 | 0.0 | 907.09 | 192.41 | 1015.26 | 67.88 | 7.58 | 52.58 | 142.88 | 18.53 | 195.18 | 4.81 | 0.00 | 7.49 | 215.58 | 26.11 | 255.26 | 115.68 | 38.29 | 154.58 | 308.13 | 29.79 | 317.91 | 0.00 | 0.00 | 1.91 | 423.81 | 68.09 | 474.41 | 968.61 | 172.58 | 1144.53 | 0.45 | 0.0 | 0.0 | 245.28 | 62.11 | 393.39 | 83.48 | 16.24 | 21.44 | 6 | 4 | 11 | 507 | 253 | 717 | 110 | 110 | 130 | 110 | 50 | 0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2607 | 0.00 | 0.00 | 0.00 | 380.0 | 0 |
7000875565 | 430.975 | 299.869 | 187.894 | 50.51 | 74.01 | 70.61 | 296.29 | 229.74 | 162.76 | 0.00 | 2.83 | 0.00 | 0.00 | 17.74 | 0.00 | 42.61 | 65.16 | 67.38 | 273.29 | 145.99 | 128.28 | 0.00 | 4.48 | 10.26 | 0.0 | 0.0 | 0.00 | 315.91 | 215.64 | 205.93 | 7.89 | 2.58 | 3.23 | 22.99 | 64.51 | 18.29 | 0.00 | 0.00 | 0.00 | 30.89 | 67.09 | 21.53 | 0.0 | 0.00 | 0.00 | 0.00 | 3.26 | 5.91 | 0.00 | 0.0 | 0.0 | 346.81 | 286.01 | 233.38 | 41.33 | 71.44 | 28.89 | 226.81 | 149.69 | 150.16 | 8.71 | 8.68 | 32.71 | 276.86 | 229.83 | 211.78 | 68.79 | 78.64 | 6.33 | 18.68 | 73.08 | 73.93 | 0.51 | 0.00 | 2.18 | 87.99 | 151.73 | 82.44 | 364.86 | 381.56 | 294.46 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.00 | 10 | 6 | 2 | 570 | 348 | 160 | 110 | 110 | 130 | 100 | 100 | 130 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 511 | 0.00 | 2.45 | 21.89 | 459.0 | 0 |
7000187447 | 690.008 | 18.980 | 25.499 | 1185.91 | 9.28 | 7.79 | 61.64 | 0.00 | 5.54 | 0.00 | 4.76 | 4.81 | 0.00 | 8.46 | 13.34 | 38.99 | 0.00 | 0.00 | 58.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 97.54 | 0.00 | 0.00 | 1146.91 | 0.81 | 0.00 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1148.46 | 0.81 | 0.00 | 0.0 | 0.00 | 0.00 | 2.58 | 0.00 | 0.00 | 0.93 | 0.0 | 0.0 | 1249.53 | 0.81 | 0.00 | 34.54 | 0.00 | 0.00 | 47.41 | 2.31 | 0.00 | 0.00 | 0.00 | 0.00 | 81.96 | 2.31 | 0.00 | 8.63 | 0.00 | 0.00 | 1.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.91 | 0.00 | 0.00 | 91.88 | 2.31 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19 | 2 | 4 | 816 | 0 | 30 | 110 | 0 | 30 | 30 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 667 | 0.00 | 0.00 | 0.00 | 408.0 | 0 |
‘total_og_mou_6’ vs ‘total_og_mou_8’ with respect to Churn.
sns.scatterplot(x=data['total_og_mou_6'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafdb0f7d0>
‘total_og_mou_7’ vs ‘total_og_mou_8’ with respect to Churn.
sns.scatterplot(x=data['total_og_mou_7'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafaf9d7d0>
- Customers with lower total_og_mou in the 6th and 8th months are more likely to churn than those with higher total_og_mou.
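One way to put a number on this scatterplot reading is to bucket customers by outgoing usage and compare the churn rate per bucket. The sketch below uses `pd.qcut` on a synthetic frame. The data, the bucket count and the bucket labels are assumptions, not results from the case-study dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for `data`: churn is made more likely at low usage.
usage = rng.gamma(shape=2.0, scale=300.0, size=2000)
churn_prob = np.where(usage < 200, 0.30, 0.05)
toy = pd.DataFrame({
    "total_og_mou_8": usage,
    "Churn": (rng.random(2000) < churn_prob).astype(int),
})

# Split usage into four equal-sized buckets and compute the churn rate in each.
labels = ["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"]
toy["usage_bucket"] = pd.qcut(toy["total_og_mou_8"], q=4, labels=labels)
churn_rate = toy.groupby("usage_bucket", observed=True)["Churn"].mean()
print(churn_rate)
```

On the real data the same two lines (`pd.qcut` followed by `groupby(...).mean()`) could be applied to `data['total_og_mou_8']` and `data['Churn']`.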
‘aon’ vs ‘total_og_mou_8’ with respect to Churn.
sns.scatterplot(x=data['aon'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafcd5fb50>
- Customers with lower total_og_mou_8 and lower aon are more likely to churn than those with higher total_og_mou_8 and aon.
sns.scatterplot(x=data['aon'], y=data['total_ic_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafccb9d90>
- Customers with lower total_ic_mou_8 are more likely to churn, irrespective of aon.
- Customers with total_ic_mou_8 > 2000 are much less likely to churn.
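A threshold observation like the one above can be checked directly by comparing churn rates on either side of the cutoff. Here is a minimal sketch on made-up rows. The helper name `churn_rate_by_threshold` is hypothetical, and 2000 is simply the value quoted in the observation.

```python
import pandas as pd

def churn_rate_by_threshold(df, column, threshold, target="Churn"):
    """Churn rate and customer count for rows at/below vs. above a threshold."""
    above = df[column] > threshold
    out = df.groupby(above)[target].agg(churn_rate="mean", customers="size")
    labels = {False: f"{column} <= {threshold}", True: f"{column} > {threshold}"}
    out.index = [labels[bool(flag)] for flag in out.index]
    return out

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "total_ic_mou_8": [0.0, 15.2, 340.0, 2500.0, 3100.0, 90.0],
    "Churn":          [1,   1,    0,     0,      0,      0],
})
result = churn_rate_by_threshold(toy, "total_ic_mou_8", 2000)
print(result)
```

Reporting the customer count next to the rate matters here, because very few customers sit above such a high cutoff and a rate computed on a handful of rows is noisy.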
‘max_rech_amt_6’ vs ‘max_rech_amt_8’ with respect to ‘Churn’.
sns.scatterplot(x=data['max_rech_amt_6'], y=data['max_rech_amt_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafdd5a950>
Correlation Analysis
# function to list pairwise absolute correlations between features
def correlation(dataframe):
    # exclude the target; use a list, since newer pandas rejects a set as an indexer
    columns_for_analysis = [col for col in dataframe.columns if col != 'Churn']
    cor0 = dataframe[columns_for_analysis].corr()

    # keep only the strict upper triangle (k=1 drops the diagonal),
    # so self-correlations and mirrored duplicates are removed in one step
    cor0 = cor0.where(np.triu(np.ones(cor0.shape), k=1).astype(bool))
    cor0 = cor0.unstack().reset_index()
    cor0.columns = ['VAR1', 'VAR2', 'CORR']
    cor0 = cor0.dropna(subset=['CORR'])

    # report absolute values rounded to two decimals, strongest pairs first
    cor0['CORR'] = cor0['CORR'].abs().round(2)
    return cor0.sort_values(by='CORR', ascending=False)
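The step that matters most in a pair listing like this is keeping each pair exactly once. The small sketch below applies the upper-triangle mask to a 3-column toy frame (the column names `a`, `b`, `c` and the values are made up) to show why `np.triu` with `k=1` removes both the diagonal and the mirrored duplicates.

```python
import numpy as np
import pandas as pd

# Toy frame: b is a scaled copy of a, c runs in the opposite direction.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.5],
})

corr = toy.corr()
# True strictly above the diagonal, False elsewhere.
mask = np.triu(np.ones(corr.shape), k=1).astype(bool)

pairs = (
    corr.where(mask)   # cells outside the upper triangle become NaN
        .unstack()     # long format indexed by (column, row)
        .dropna()      # drop the masked cells
        .abs()
        .round(2)
        .sort_values(ascending=False)
)
pairs.index.names = ["VAR1", "VAR2"]
print(pairs)
```

A 3x3 correlation matrix has nine cells but only three distinct pairs, and those three are what remain after the mask.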
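# Correlations for Churn : 0 - non-churned customers
# Absolute values are reported
# full option name; the bare 'precision' is ambiguous in newer pandas
pd.set_option('display.precision', 2)
cor_0 = correlation(non_churned_customers)

# keep only correlations above 0.40
condition = cor_0['CORR'] > 0.4
cor_0 = cor_0[condition]
# on pandas >= 1.4, .hide(axis='index') replaces the deprecated .hide_index()
cor_0.style.background_gradient(cmap='GnBu').hide_index()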
VAR1 | VAR2 | CORR |
isd_og_mou_7 | isd_og_mou_8 | 0.96 |
isd_og_mou_6 | isd_og_mou_8 | 0.95 |
isd_og_mou_6 | isd_og_mou_7 | 0.95 |
total_rech_amt_8 | arpu_8 | 0.95 |
total_rech_amt_6 | arpu_6 | 0.94 |
total_rech_amt_7 | arpu_7 | 0.94 |
total_rech_amt_7 | Average_rech_amt_6n7 | 0.91 |
arpu_7 | Average_rech_amt_6n7 | 0.91 |
loc_ic_mou_6 | total_ic_mou_6 | 0.90 |
total_rech_amt_6 | Average_rech_amt_6n7 | 0.90 |
loc_ic_mou_8 | total_ic_mou_8 | 0.89 |
Average_rech_amt_6n7 | arpu_6 | 0.89 |
total_ic_mou_7 | loc_ic_mou_7 | 0.88 |
std_og_t2t_mou_8 | onnet_mou_8 | 0.85 |
loc_ic_mou_8 | loc_ic_t2m_mou_8 | 0.85 |
loc_ic_mou_6 | loc_ic_t2m_mou_6 | 0.85 |
loc_ic_mou_8 | loc_ic_mou_7 | 0.85 |
std_og_t2m_mou_8 | offnet_mou_8 | 0.85 |
std_og_t2t_mou_7 | onnet_mou_7 | 0.84 |
total_og_mou_8 | std_og_mou_8 | 0.84 |
loc_og_mou_7 | loc_og_mou_8 | 0.84 |
std_ic_t2m_mou_8 | std_ic_mou_8 | 0.84 |
std_og_t2t_mou_6 | onnet_mou_6 | 0.84 |
std_og_t2m_mou_7 | offnet_mou_7 | 0.84 |
loc_ic_mou_7 | loc_ic_t2m_mou_7 | 0.83 |
total_og_mou_7 | std_og_mou_7 | 0.83 |
loc_ic_mou_6 | loc_ic_mou_7 | 0.83 |
total_ic_mou_7 | total_ic_mou_8 | 0.83 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_7 | 0.83 |
loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | 0.82 |
std_og_t2t_mou_8 | std_og_t2t_mou_7 | 0.82 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_7 | 0.82 |
loc_ic_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.82 |
onnet_mou_8 | onnet_mou_7 | 0.82 |
std_ic_t2m_mou_6 | std_ic_mou_6 | 0.82 |
loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | 0.81 |
std_og_mou_7 | std_og_mou_8 | 0.81 |
offnet_mou_6 | std_og_t2m_mou_6 | 0.81 |
total_ic_mou_7 | total_ic_mou_6 | 0.81 |
loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | 0.81 |
std_ic_mou_7 | std_ic_t2m_mou_7 | 0.81 |
std_og_mou_6 | total_og_mou_6 | 0.80 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | 0.80 |
loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | 0.80 |
loc_og_mou_7 | loc_og_mou_6 | 0.80 |
loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | 0.79 |
loc_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.79 |
std_og_t2m_mou_8 | std_og_t2m_mou_7 | 0.79 |
loc_og_mou_6 | loc_og_t2m_mou_6 | 0.79 |
total_rech_num_8 | total_rech_num_7 | 0.78 |
loc_og_t2m_mou_7 | loc_og_t2m_mou_6 | 0.78 |
offnet_mou_8 | offnet_mou_7 | 0.78 |
arpu_8 | Average_rech_amt_6n7 | 0.78 |
loc_og_t2t_mou_8 | loc_og_mou_8 | 0.77 |
total_rech_amt_7 | arpu_8 | 0.77 |
std_og_t2f_mou_7 | std_og_t2f_mou_8 | 0.77 |
total_og_mou_8 | total_og_mou_7 | 0.77 |
loc_og_t2m_mou_8 | loc_og_mou_8 | 0.77 |
arpu_7 | total_rech_amt_8 | 0.77 |
loc_og_mou_7 | loc_og_t2t_mou_7 | 0.77 |
arpu_7 | arpu_8 | 0.77 |
loc_ic_t2m_mou_8 | total_ic_mou_8 | 0.76 |
loc_ic_t2m_mou_6 | total_ic_mou_6 | 0.76 |
std_ic_mou_8 | std_ic_mou_7 | 0.76 |
vol_3g_mb_7 | vol_3g_mb_8 | 0.75 |
std_og_t2m_mou_8 | std_og_mou_8 | 0.75 |
isd_ic_mou_6 | isd_ic_mou_7 | 0.75 |
loc_og_t2t_mou_6 | loc_og_mou_6 | 0.75 |
loc_ic_mou_8 | loc_ic_mou_6 | 0.75 |
total_rech_amt_8 | Average_rech_amt_6n7 | 0.75 |
loc_ic_mou_8 | total_ic_mou_7 | 0.75 |
isd_ic_mou_8 | isd_ic_mou_7 | 0.75 |
std_og_t2m_mou_7 | std_og_t2m_mou_6 | 0.75 |
loc_og_mou_7 | loc_og_t2m_mou_7 | 0.75 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | 0.75 |
std_ic_mou_7 | std_ic_mou_6 | 0.75 |
loc_ic_mou_7 | total_ic_mou_8 | 0.75 |
total_ic_mou_7 | loc_ic_t2m_mou_7 | 0.74 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_6 | 0.74 |
std_og_mou_7 | std_og_t2m_mou_7 | 0.74 |
std_og_t2t_mou_6 | std_og_mou_6 | 0.74 |
loc_ic_mou_7 | total_ic_mou_6 | 0.74 |
std_og_t2t_mou_8 | std_og_mou_8 | 0.74 |
std_ic_t2t_mou_7 | std_ic_t2t_mou_6 | 0.74 |
std_og_mou_6 | std_og_t2m_mou_6 | 0.74 |
std_og_t2t_mou_6 | std_og_t2t_mou_7 | 0.73 |
total_ic_mou_8 | total_ic_mou_6 | 0.73 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_7 | 0.73 |
std_ic_t2m_mou_8 | std_ic_t2m_mou_7 | 0.73 |
loc_og_t2f_mou_6 | loc_og_t2f_mou_8 | 0.73 |
loc_og_mou_8 | loc_og_mou_6 | 0.73 |
total_rech_amt_7 | total_rech_amt_8 | 0.73 |
std_og_mou_7 | std_og_t2t_mou_7 | 0.73 |
std_og_mou_6 | std_og_mou_7 | 0.73 |
onnet_mou_6 | onnet_mou_7 | 0.73 |
loc_ic_mou_8 | loc_ic_t2m_mou_7 | 0.72 |
loc_ic_mou_6 | total_ic_mou_7 | 0.72 |
total_og_mou_8 | offnet_mou_8 | 0.72 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | 0.72 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_6 | 0.72 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_6 | 0.72 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_8 | 0.72 |
offnet_mou_6 | offnet_mou_7 | 0.72 |
ic_others_8 | ic_others_7 | 0.71 |
std_og_t2f_mou_7 | std_og_t2f_mou_6 | 0.71 |
vbc_3g_8 | vbc_3g_7 | 0.71 |
total_og_mou_8 | onnet_mou_8 | 0.71 |
total_og_mou_7 | onnet_mou_7 | 0.71 |
vol_3g_mb_7 | vol_3g_mb_6 | 0.71 |
total_og_mou_7 | offnet_mou_7 | 0.70 |
loc_ic_mou_7 | loc_ic_t2m_mou_8 | 0.70 |
arpu_7 | arpu_6 | 0.70 |
std_ic_mou_7 | std_ic_t2t_mou_7 | 0.70 |
loc_ic_t2t_mou_6 | loc_ic_t2t_mou_8 | 0.70 |
offnet_mou_6 | total_og_mou_6 | 0.70 |
onnet_mou_6 | total_og_mou_6 | 0.70 |
total_rech_amt_6 | arpu_7 | 0.70 |
vol_2g_mb_8 | vol_2g_mb_7 | 0.69 |
std_og_t2t_mou_8 | onnet_mou_7 | 0.69 |
loc_ic_mou_6 | loc_ic_t2m_mou_7 | 0.69 |
std_og_t2t_mou_7 | onnet_mou_8 | 0.69 |
total_rech_num_7 | total_rech_num_6 | 0.69 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_6 | 0.69 |
vbc_3g_7 | vbc_3g_6 | 0.69 |
last_day_rch_amt_8 | max_rech_amt_8 | 0.69 |
loc_ic_t2t_mou_7 | loc_ic_mou_7 | 0.68 |
loc_ic_mou_8 | total_ic_mou_6 | 0.68 |
total_rech_amt_7 | arpu_6 | 0.68 |
loc_ic_t2m_mou_6 | loc_ic_mou_7 | 0.68 |
loc_ic_mou_6 | loc_ic_t2t_mou_6 | 0.67 |
loc_ic_mou_8 | loc_ic_t2t_mou_8 | 0.67 |
ic_others_6 | ic_others_7 | 0.67 |
vol_3g_mb_8 | vol_3g_mb_6 | 0.67 |
total_og_mou_7 | total_og_mou_6 | 0.67 |
vol_2g_mb_6 | vol_2g_mb_7 | 0.67 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_8 | 0.67 |
std_ic_t2t_mou_6 | std_ic_mou_6 | 0.67 |
std_ic_t2f_mou_8 | std_ic_t2f_mou_6 | 0.67 |
total_og_mou_7 | std_og_mou_8 | 0.66 |
std_ic_mou_8 | std_ic_t2t_mou_8 | 0.66 |
loc_ic_mou_6 | total_ic_mou_8 | 0.66 |
std_og_t2m_mou_8 | offnet_mou_7 | 0.66 |
std_ic_mou_8 | std_ic_mou_6 | 0.66 |
vbc_3g_7 | vol_3g_mb_7 | 0.65 |
offnet_mou_8 | std_og_t2m_mou_7 | 0.65 |
max_rech_amt_6 | last_day_rch_amt_6 | 0.65 |
loc_og_mou_7 | loc_og_t2t_mou_8 | 0.65 |
std_og_t2f_mou_6 | std_og_t2f_mou_8 | 0.65 |
total_rech_amt_8 | arpu_6 | 0.64 |
std_ic_mou_8 | std_ic_t2m_mou_7 | 0.64 |
total_ic_mou_8 | loc_ic_t2m_mou_7 | 0.64 |
total_rech_amt_6 | total_rech_amt_7 | 0.64 |
roam_og_mou_6 | roam_ic_mou_6 | 0.64 |
total_og_mou_8 | std_og_mou_7 | 0.64 |
loc_og_mou_8 | loc_og_t2t_mou_7 | 0.64 |
total_og_mou_8 | std_og_t2m_mou_8 | 0.64 |
loc_ic_mou_8 | loc_ic_t2m_mou_6 | 0.64 |
arpu_8 | arpu_6 | 0.64 |
roam_ic_mou_7 | roam_og_mou_7 | 0.63 |
std_ic_t2m_mou_7 | std_ic_mou_6 | 0.63 |
total_rech_amt_6 | total_rech_amt_8 | 0.63 |
loc_og_mou_8 | loc_og_t2m_mou_7 | 0.63 |
total_rech_amt_6 | arpu_8 | 0.63 |
std_og_t2t_mou_8 | std_og_t2t_mou_6 | 0.63 |
loc_og_mou_7 | loc_og_t2t_mou_6 | 0.63 |
std_ic_t2m_mou_8 | std_ic_t2m_mou_6 | 0.63 |
total_og_mou_7 | std_og_t2m_mou_7 | 0.63 |
onnet_mou_6 | onnet_mou_8 | 0.63 |
loc_ic_t2m_mou_7 | total_ic_mou_6 | 0.63 |
vbc_3g_8 | vol_3g_mb_8 | 0.63 |
total_ic_mou_7 | loc_ic_t2m_mou_8 | 0.63 |
loc_ic_mou_6 | loc_ic_t2m_mou_8 | 0.63 |
vbc_3g_6 | vol_3g_mb_6 | 0.63 |
ic_others_8 | ic_others_6 | 0.63 |
loc_og_mou_7 | loc_og_t2m_mou_8 | 0.63 |
std_og_t2t_mou_8 | std_og_mou_7 | 0.62 |
isd_ic_mou_8 | isd_ic_mou_6 | 0.62 |
vbc_3g_8 | vbc_3g_6 | 0.61 |
total_og_mou_8 | std_og_t2t_mou_8 | 0.61 |
std_og_t2t_mou_6 | onnet_mou_7 | 0.61 |
offnet_mou_7 | std_og_t2m_mou_6 | 0.61 |
std_og_mou_8 | onnet_mou_8 | 0.61 |
loc_og_mou_7 | loc_og_t2m_mou_6 | 0.61 |
max_rech_amt_7 | last_day_rch_amt_7 | 0.61 |
std_og_mou_6 | std_og_mou_8 | 0.61 |
loc_og_mou_6 | loc_og_t2m_mou_7 | 0.61 |
std_ic_t2m_mou_8 | std_ic_mou_7 | 0.61 |
roam_ic_mou_8 | roam_og_mou_8 | 0.60 |
std_og_mou_8 | std_og_t2t_mou_7 | 0.60 |
total_rech_num_8 | total_rech_num_6 | 0.60 |
total_og_mou_7 | std_og_t2t_mou_7 | 0.60 |
std_og_mou_8 | std_og_t2m_mou_7 | 0.60 |
max_rech_amt_8 | max_rech_amt_6 | 0.60 |
std_og_mou_8 | offnet_mou_8 | 0.60 |
loc_ic_t2t_mou_7 | total_ic_mou_7 | 0.60 |
onnet_mou_6 | std_og_t2t_mou_7 | 0.60 |
std_og_t2m_mou_8 | std_og_t2m_mou_6 | 0.60 |
total_og_mou_6 | std_og_t2m_mou_6 | 0.60 |
loc_og_mou_6 | loc_og_t2t_mou_7 | 0.60 |
loc_ic_t2m_mou_6 | total_ic_mou_7 | 0.60 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_6 | 0.59 |
loc_ic_t2t_mou_8 | loc_ic_mou_7 | 0.59 |
std_og_mou_6 | onnet_mou_6 | 0.59 |
loc_ic_t2t_mou_8 | total_ic_mou_8 | 0.59 |
loc_ic_t2t_mou_6 | total_ic_mou_6 | 0.59 |
std_og_mou_7 | onnet_mou_7 | 0.59 |
offnet_mou_6 | offnet_mou_8 | 0.59 |
std_og_mou_7 | std_og_t2m_mou_8 | 0.59 |
loc_ic_t2m_mou_8 | total_ic_mou_6 | 0.58 |
roam_og_mou_8 | roam_og_mou_7 | 0.58 |
std_og_t2t_mou_6 | total_og_mou_6 | 0.58 |
offnet_mou_6 | std_og_t2m_mou_7 | 0.58 |
total_og_mou_7 | onnet_mou_8 | 0.58 |
std_ic_mou_7 | std_ic_t2m_mou_6 | 0.58 |
loc_ic_t2t_mou_6 | loc_ic_mou_7 | 0.57 |
total_og_mou_7 | std_og_mou_6 | 0.57 |
std_og_mou_7 | offnet_mou_7 | 0.57 |
loc_og_t2t_mou_8 | loc_og_mou_6 | 0.57 |
spl_og_mou_7 | spl_og_mou_8 | 0.57 |
max_rech_amt_7 | max_rech_amt_8 | 0.56 |
std_ic_t2m_mou_8 | std_ic_mou_6 | 0.56 |
total_og_mou_8 | onnet_mou_7 | 0.56 |
roam_ic_mou_8 | roam_ic_mou_7 | 0.56 |
loc_ic_t2m_mou_6 | total_ic_mou_8 | 0.56 |
loc_og_mou_8 | loc_og_t2t_mou_6 | 0.56 |
spl_og_mou_6 | spl_og_mou_7 | 0.56 |
std_og_mou_7 | std_og_t2m_mou_6 | 0.56 |
loc_og_mou_8 | loc_og_t2m_mou_6 | 0.56 |
loc_ic_mou_8 | loc_ic_t2t_mou_7 | 0.56 |
loc_ic_mou_6 | loc_ic_t2t_mou_7 | 0.55 |
loc_og_t2m_mou_8 | loc_og_mou_6 | 0.55 |
std_ic_mou_7 | std_ic_t2t_mou_6 | 0.55 |
total_og_mou_8 | total_og_mou_6 | 0.55 |
total_og_mou_7 | offnet_mou_8 | 0.54 |
std_og_mou_6 | std_og_t2t_mou_7 | 0.54 |
std_ic_mou_8 | std_ic_t2m_mou_6 | 0.54 |
total_og_mou_8 | offnet_mou_7 | 0.54 |
std_og_mou_6 | offnet_mou_6 | 0.54 |
std_og_mou_6 | std_og_t2m_mou_7 | 0.54 |
std_og_t2t_mou_6 | std_og_mou_7 | 0.53 |
isd_og_mou_7 | Average_rech_amt_6n7 | 0.53 |
std_og_t2t_mou_6 | onnet_mou_8 | 0.53 |
loc_og_t2c_mou_7 | spl_og_mou_7 | 0.53 |
loc_og_t2c_mou_8 | loc_og_t2c_mou_7 | 0.53 |
std_ic_mou_7 | std_ic_t2t_mou_8 | 0.53 |
std_og_mou_7 | total_og_mou_6 | 0.53 |
std_og_t2t_mou_8 | onnet_mou_6 | 0.52 |
vol_2g_mb_6 | vol_2g_mb_8 | 0.52 |
arpu_7 | isd_og_mou_7 | 0.52 |
total_og_mou_6 | arpu_6 | 0.51 |
vol_3g_mb_7 | vbc_3g_6 | 0.51 |
loc_ic_t2t_mou_8 | total_ic_mou_7 | 0.51 |
loc_ic_mou_6 | loc_ic_t2t_mou_8 | 0.51 |
total_og_mou_8 | arpu_8 | 0.51 |
vbc_3g_8 | vol_3g_mb_7 | 0.51 |
total_rech_amt_7 | isd_og_mou_7 | 0.50 |
roam_og_mou_6 | roam_og_mou_7 | 0.50 |
std_og_mou_7 | onnet_mou_8 | 0.50 |
loc_ic_mou_8 | loc_ic_t2t_mou_6 | 0.50 |
std_ic_mou_8 | std_ic_t2t_mou_7 | 0.50 |
Average_rech_amt_6n7 | isd_og_mou_8 | 0.50 |
loc_ic_t2t_mou_6 | total_ic_mou_7 | 0.50 |
std_ic_t2t_mou_7 | std_ic_mou_6 | 0.50 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_6 | 0.50 |
isd_og_mou_6 | Average_rech_amt_6n7 | 0.50 |
max_rech_amt_7 | max_rech_amt_6 | 0.50 |
total_og_mou_8 | total_rech_amt_8 | 0.49 |
std_og_t2t_mou_8 | total_og_mou_7 | 0.49 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_7 | 0.49 |
loc_ic_t2t_mou_7 | total_ic_mou_8 | 0.49 |
vbc_3g_7 | vol_3g_mb_8 | 0.49 |
total_og_mou_7 | std_og_t2m_mou_8 | 0.49 |
total_rech_amt_6 | total_og_mou_6 | 0.49 |
std_og_mou_8 | onnet_mou_7 | 0.49 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_8 | 0.49 |
loc_ic_t2t_mou_7 | total_ic_mou_6 | 0.49 |
total_rech_amt_8 | isd_og_mou_8 | 0.49 |
spl_og_mou_6 | loc_og_t2c_mou_6 | 0.48 |
arpu_7 | isd_og_mou_8 | 0.48 |
total_rech_amt_8 | isd_og_mou_7 | 0.48 |
offnet_mou_8 | std_og_t2m_mou_6 | 0.48 |
max_rech_amt_8 | total_rech_amt_8 | 0.48 |
arpu_8 | isd_og_mou_8 | 0.48 |
isd_og_mou_6 | arpu_6 | 0.48 |
total_og_mou_7 | arpu_7 | 0.48 |
total_og_mou_8 | std_og_t2m_mou_7 | 0.48 |
total_og_mou_7 | onnet_mou_6 | 0.48 |
total_og_mou_6 | onnet_mou_7 | 0.48 |
total_og_mou_7 | offnet_mou_6 | 0.48 |
total_og_mou_8 | std_og_t2t_mou_7 | 0.47 |
offnet_mou_6 | loc_og_t2m_mou_6 | 0.47 |
vbc_3g_6 | vol_3g_mb_8 | 0.47 |
isd_og_mou_7 | arpu_6 | 0.47 |
std_og_t2t_mou_8 | std_og_mou_6 | 0.47 |
loc_og_t2t_mou_6 | onnet_mou_6 | 0.47 |
offnet_mou_6 | std_og_t2m_mou_8 | 0.47 |
arpu_8 | offnet_mou_8 | 0.47 |
loc_og_t2t_mou_7 | onnet_mou_7 | 0.47 |
total_og_mou_6 | offnet_mou_7 | 0.47 |
isd_og_mou_6 | total_rech_amt_6 | 0.47 |
total_rech_amt_6 | isd_og_mou_7 | 0.46 |
loc_og_t2c_mou_8 | spl_og_mou_8 | 0.46 |
roam_ic_mou_7 | roam_ic_mou_6 | 0.46 |
loc_og_t2t_mou_8 | onnet_mou_8 | 0.46 |
std_og_mou_8 | std_og_t2m_mou_6 | 0.46 |
max_rech_amt_7 | total_rech_amt_7 | 0.46 |
total_og_mou_8 | std_og_mou_6 | 0.46 |
arpu_6 | isd_og_mou_8 | 0.46 |
isd_og_mou_6 | arpu_7 | 0.46 |
std_ic_mou_8 | total_ic_mou_8 | 0.46 |
total_og_mou_7 | total_rech_amt_7 | 0.46 |
arpu_8 | isd_og_mou_7 | 0.46 |
total_rech_amt_8 | offnet_mou_8 | 0.46 |
offnet_mou_6 | arpu_6 | 0.46 |
vbc_3g_7 | vol_3g_mb_6 | 0.46 |
total_rech_amt_7 | isd_og_mou_8 | 0.46 |
total_og_mou_7 | std_og_t2m_mou_6 | 0.45 |
loc_ic_t2t_mou_8 | total_ic_mou_6 | 0.45 |
std_ic_mou_7 | total_ic_mou_7 | 0.45 |
total_rech_amt_6 | isd_og_mou_8 | 0.45 |
loc_ic_mou_6 | loc_og_mou_6 | 0.45 |
std_og_mou_8 | offnet_mou_7 | 0.45 |
std_og_t2t_mou_6 | std_og_mou_8 | 0.45 |
total_rech_amt_6 | offnet_mou_6 | 0.45 |
std_og_mou_6 | std_og_t2m_mou_8 | 0.44 |
loc_ic_mou_8 | loc_og_t2m_mou_8 | 0.44 |
std_ic_mou_8 | std_ic_t2t_mou_6 | 0.44 |
loc_ic_mou_6 | loc_og_t2m_mou_6 | 0.44 |
loc_og_mou_6 | total_og_mou_6 | 0.44 |
std_og_mou_7 | offnet_mou_8 | 0.44 |
std_og_mou_8 | total_og_mou_6 | 0.44 |
arpu_7 | offnet_mou_7 | 0.44 |
loc_ic_mou_8 | loc_og_mou_8 | 0.44 |
isd_og_mou_6 | total_rech_amt_8 | 0.44 |
loc_og_t2m_mou_8 | offnet_mou_8 | 0.44 |
std_ic_mou_6 | total_ic_mou_6 | 0.44 |
std_og_mou_6 | onnet_mou_7 | 0.43 |
total_rech_amt_7 | offnet_mou_7 | 0.43 |
isd_og_mou_6 | total_rech_amt_7 | 0.43 |
loc_ic_t2t_mou_6 | total_ic_mou_8 | 0.43 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_8 | 0.43 |
vbc_3g_8 | vol_3g_mb_6 | 0.43 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.43 |
total_rech_amt_6 | max_rech_amt_6 | 0.43 |
isd_og_mou_6 | arpu_8 | 0.43 |
loc_og_t2m_mou_7 | loc_ic_mou_7 | 0.42 |
loc_og_mou_7 | loc_ic_mou_7 | 0.42 |
std_ic_t2t_mou_8 | std_ic_mou_6 | 0.42 |
onnet_mou_8 | total_og_mou_6 | 0.42 |
Average_rech_amt_6n7 | total_og_mou_6 | 0.42 |
loc_og_t2m_mou_6 | loc_ic_t2m_mou_7 | 0.42 |
max_rech_amt_8 | last_day_rch_amt_6 | 0.42 |
total_og_mou_7 | Average_rech_amt_6n7 | 0.42 |
total_og_mou_8 | loc_og_mou_8 | 0.42 |
loc_og_t2m_mou_7 | offnet_mou_7 | 0.42 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_7 | 0.42 |
total_og_mou_8 | onnet_mou_6 | 0.41 |
spl_og_mou_6 | spl_og_mou_8 | 0.41 |
offnet_mou_6 | Average_rech_amt_6n7 | 0.41 |
last_day_rch_amt_8 | max_rech_amt_7 | 0.41 |
last_day_rch_amt_8 | max_rech_amt_6 | 0.41 |
loc_ic_t2m_mou_6 | loc_og_mou_6 | 0.41 |
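# Correlations for Churn : 1 - churned customers
# Absolute values are reported
# full option name; the bare 'precision' is ambiguous in newer pandas
pd.set_option('display.precision', 2)
cor_1 = correlation(churned_customers)

# keep only correlations above 0.40
condition = cor_1['CORR'] > 0.4
cor_1 = cor_1[condition]
# on pandas >= 1.4, .hide(axis='index') replaces the deprecated .hide_index()
cor_1.style.background_gradient(cmap='GnBu').hide_index()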
VAR1 | VAR2 | CORR |
og_others_8 | og_others_7 | 1.00 |
total_rech_amt_8 | arpu_8 | 0.96 |
total_rech_amt_6 | arpu_6 | 0.95 |
total_rech_amt_7 | arpu_7 | 0.95 |
total_og_mou_8 | std_og_mou_8 | 0.95 |
std_og_t2t_mou_7 | onnet_mou_7 | 0.95 |
total_og_mou_7 | std_og_mou_7 | 0.94 |
loc_og_t2f_mou_6 | og_others_8 | 0.93 |
loc_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.93 |
std_og_t2t_mou_8 | onnet_mou_8 | 0.93 |
loc_og_t2f_mou_6 | og_others_7 | 0.93 |
offnet_mou_6 | std_og_t2m_mou_6 | 0.92 |
std_og_t2t_mou_6 | onnet_mou_6 | 0.92 |
std_ic_t2m_mou_8 | std_ic_mou_8 | 0.92 |
std_og_mou_6 | total_og_mou_6 | 0.92 |
std_og_t2m_mou_7 | offnet_mou_7 | 0.92 |
loc_og_t2f_mou_7 | og_others_8 | 0.91 |
loc_og_t2f_mou_7 | og_others_7 | 0.91 |
loc_ic_mou_8 | loc_ic_t2m_mou_8 | 0.90 |
loc_ic_mou_6 | loc_ic_t2m_mou_6 | 0.90 |
loc_ic_mou_8 | total_ic_mou_8 | 0.89 |
loc_og_t2m_mou_8 | loc_og_mou_8 | 0.88 |
std_og_t2m_mou_8 | offnet_mou_8 | 0.87 |
loc_ic_mou_6 | total_ic_mou_6 | 0.87 |
total_ic_mou_7 | loc_ic_mou_7 | 0.86 |
loc_og_mou_7 | loc_og_t2m_mou_7 | 0.84 |
loc_ic_mou_7 | loc_ic_t2m_mou_7 | 0.84 |
std_ic_mou_7 | std_ic_t2m_mou_7 | 0.82 |
loc_ic_t2m_mou_8 | total_ic_mou_8 | 0.81 |
std_og_t2t_mou_8 | std_og_mou_8 | 0.79 |
std_ic_t2t_mou_7 | std_ic_t2t_mou_6 | 0.78 |
arpu_7 | Average_rech_amt_6n7 | 0.77 |
std_ic_t2m_mou_6 | std_ic_mou_6 | 0.77 |
loc_og_mou_6 | loc_og_t2m_mou_6 | 0.77 |
loc_ic_t2m_mou_6 | total_ic_mou_6 | 0.77 |
total_rech_amt_6 | Average_rech_amt_6n7 | 0.76 |
total_rech_amt_7 | Average_rech_amt_6n7 | 0.76 |
total_og_mou_8 | std_og_t2t_mou_8 | 0.75 |
loc_og_t2t_mou_6 | loc_og_mou_6 | 0.75 |
total_og_mou_8 | onnet_mou_8 | 0.74 |
std_og_mou_8 | onnet_mou_8 | 0.74 |
std_og_mou_7 | std_og_t2m_mou_7 | 0.74 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_7 | 0.73 |
Average_rech_amt_6n7 | arpu_6 | 0.73 |
loc_ic_mou_8 | loc_ic_t2t_mou_8 | 0.73 |
std_ic_t2t_mou_6 | std_ic_mou_6 | 0.72 |
total_ic_mou_7 | loc_ic_t2m_mou_7 | 0.72 |
loc_ic_mou_6 | loc_ic_t2t_mou_6 | 0.72 |
max_rech_amt_6 | last_day_rch_amt_6 | 0.72 |
total_og_mou_7 | offnet_mou_7 | 0.72 |
std_og_mou_6 | std_og_t2m_mou_6 | 0.72 |
roam_ic_mou_8 | roam_ic_mou_7 | 0.72 |
std_og_t2m_mou_8 | std_og_mou_8 | 0.71 |
total_og_mou_8 | offnet_mou_8 | 0.70 |
last_day_rch_amt_8 | max_rech_amt_8 | 0.70 |
total_og_mou_7 | std_og_t2m_mou_7 | 0.69 |
loc_og_mou_7 | loc_og_t2t_mou_7 | 0.69 |
std_og_mou_7 | std_og_t2t_mou_7 | 0.69 |
loc_ic_t2t_mou_7 | loc_ic_mou_7 | 0.69 |
max_rech_amt_8 | total_rech_amt_8 | 0.68 |
std_og_t2t_mou_6 | std_og_mou_6 | 0.68 |
offnet_mou_6 | total_og_mou_6 | 0.68 |
loc_og_t2t_mou_8 | loc_og_mou_8 | 0.68 |
total_og_mou_8 | std_og_t2m_mou_8 | 0.68 |
loc_og_t2c_mou_7 | spl_og_mou_7 | 0.68 |
total_og_mou_6 | std_og_t2m_mou_6 | 0.67 |
std_og_t2t_mou_6 | std_og_t2t_mou_7 | 0.67 |
vol_3g_mb_7 | vol_3g_mb_8 | 0.67 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | 0.67 |
std_og_mou_7 | offnet_mou_7 | 0.66 |
total_og_mou_7 | onnet_mou_7 | 0.66 |
onnet_mou_6 | total_og_mou_6 | 0.65 |
roam_og_mou_8 | roam_og_mou_7 | 0.65 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_8 | 0.65 |
std_ic_mou_7 | std_ic_t2t_mou_7 | 0.65 |
loc_ic_mou_8 | loc_og_mou_8 | 0.65 |
std_og_mou_7 | onnet_mou_7 | 0.65 |
total_og_mou_7 | std_og_t2t_mou_7 | 0.64 |
onnet_mou_6 | onnet_mou_7 | 0.64 |
loc_og_mou_8 | loc_ic_t2m_mou_8 | 0.64 |
loc_ic_t2t_mou_8 | total_ic_mou_8 | 0.64 |
std_og_t2t_mou_6 | onnet_mou_7 | 0.63 |
loc_og_mou_7 | loc_og_mou_8 | 0.63 |
std_og_mou_6 | offnet_mou_6 | 0.63 |
roam_og_mou_6 | roam_ic_mou_6 | 0.63 |
std_og_mou_8 | offnet_mou_8 | 0.63 |
loc_ic_t2t_mou_6 | total_ic_mou_6 | 0.63 |
loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | 0.62 |
vbc_3g_6 | vol_3g_mb_6 | 0.62 |
onnet_mou_8 | onnet_mou_7 | 0.62 |
roam_ic_mou_7 | roam_ic_mou_6 | 0.62 |
std_og_t2t_mou_6 | total_og_mou_6 | 0.62 |
std_og_t2m_mou_7 | std_og_t2m_mou_6 | 0.62 |
max_rech_amt_8 | arpu_8 | 0.62 |
vbc_3g_8 | vbc_3g_7 | 0.61 |
loc_og_mou_8 | total_ic_mou_8 | 0.61 |
loc_og_t2m_mou_7 | loc_og_t2m_mou_6 | 0.61 |
std_og_t2t_mou_8 | std_og_t2t_mou_7 | 0.61 |
roam_og_mou_6 | roam_og_mou_7 | 0.61 |
std_og_mou_6 | onnet_mou_6 | 0.61 |
onnet_mou_6 | std_og_t2t_mou_7 | 0.61 |
isd_og_mou_7 | isd_og_mou_8 | 0.60 |
std_ic_mou_7 | std_ic_mou_6 | 0.60 |
total_og_mou_8 | arpu_8 | 0.60 |
std_og_t2t_mou_8 | onnet_mou_7 | 0.60 |
std_og_t2f_mou_7 | std_og_t2f_mou_8 | 0.60 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | 0.60 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_7 | 0.59 |
loc_og_mou_7 | loc_og_mou_6 | 0.59 |
arpu_8 | offnet_mou_8 | 0.59 |
max_rech_amt_7 | last_day_rch_amt_7 | 0.59 |
loc_ic_mou_8 | loc_ic_mou_7 | 0.58 |
std_og_mou_7 | std_og_mou_8 | 0.58 |
loc_ic_t2t_mou_7 | total_ic_mou_7 | 0.58 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_7 | 0.58 |
std_og_t2m_mou_8 | std_og_t2m_mou_7 | 0.58 |
std_og_mou_6 | std_og_mou_7 | 0.58 |
total_og_mou_8 | total_rech_amt_8 | 0.58 |
loc_ic_mou_8 | loc_og_t2m_mou_8 | 0.58 |
total_rech_amt_8 | offnet_mou_8 | 0.58 |
offnet_mou_6 | offnet_mou_7 | 0.57 |
loc_ic_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.57 |
total_og_mou_8 | total_rech_num_8 | 0.57 |
loc_ic_mou_6 | loc_ic_mou_7 | 0.57 |
loc_og_t2c_mou_8 | spl_og_mou_8 | 0.57 |
isd_ic_mou_6 | isd_ic_mou_7 | 0.57 |
arpu_7 | isd_og_mou_7 | 0.57 |
offnet_mou_8 | offnet_mou_7 | 0.57 |
vol_3g_mb_7 | vol_3g_mb_6 | 0.57 |
loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | 0.56 |
loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | 0.56 |
total_og_mou_8 | total_og_mou_7 | 0.56 |
std_ic_mou_8 | total_ic_mou_8 | 0.56 |
std_og_t2t_mou_7 | onnet_mou_8 | 0.56 |
spl_og_mou_6 | loc_og_t2c_mou_6 | 0.56 |
total_rech_amt_7 | isd_og_mou_7 | 0.56 |
vbc_3g_7 | vol_3g_mb_7 | 0.56 |
ic_others_6 | ic_others_7 | 0.55 |
total_og_mou_7 | std_og_mou_8 | 0.55 |
total_ic_mou_7 | total_ic_mou_6 | 0.55 |
offnet_mou_7 | std_og_t2m_mou_6 | 0.55 |
total_rech_num_8 | total_rech_amt_8 | 0.55 |
total_rech_num_8 | total_rech_num_7 | 0.54 |
loc_og_t2m_mou_8 | total_ic_mou_8 | 0.54 |
total_rech_num_8 | arpu_8 | 0.54 |
total_ic_mou_7 | total_ic_mou_8 | 0.54 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_7 | 0.54 |
std_ic_mou_7 | total_ic_mou_7 | 0.54 |
total_rech_num_8 | std_og_mou_8 | 0.54 |
loc_og_t2c_mou_7 | loc_og_t2c_mou_6 | 0.54 |
std_ic_mou_8 | std_ic_t2t_mou_8 | 0.54 |
offnet_mou_6 | std_og_t2m_mou_7 | 0.54 |
loc_ic_mou_6 | loc_ic_t2m_mou_7 | 0.54 |
std_ic_t2t_mou_7 | std_ic_mou_6 | 0.54 |
std_og_t2m_mou_8 | offnet_mou_7 | 0.54 |
loc_ic_t2m_mou_6 | loc_ic_mou_7 | 0.54 |
vbc_3g_7 | vbc_3g_6 | 0.53 |
vol_2g_mb_6 | vol_2g_mb_7 | 0.53 |
isd_og_mou_6 | arpu_6 | 0.53 |
std_ic_mou_6 | total_ic_mou_6 | 0.53 |
loc_og_mou_8 | loc_og_t2m_mou_7 | 0.52 |
total_og_mou_8 | std_og_mou_7 | 0.52 |
isd_og_mou_6 | total_rech_amt_6 | 0.52 |
std_ic_mou_8 | std_ic_mou_7 | 0.52 |
total_rech_amt_7 | arpu_8 | 0.51 |
loc_og_mou_7 | loc_ic_t2m_mou_7 | 0.51 |
total_og_mou_7 | arpu_7 | 0.51 |
total_og_mou_7 | std_og_mou_6 | 0.51 |
loc_ic_mou_8 | loc_ic_t2m_mou_7 | 0.51 |
roam_ic_mou_7 | roam_og_mou_7 | 0.51 |
arpu_8 | std_og_mou_8 | 0.51 |
total_og_mou_7 | total_og_mou_6 | 0.51 |
loc_og_mou_7 | loc_og_t2m_mou_6 | 0.51 |
loc_og_mou_7 | loc_og_t2m_mou_8 | 0.51 |
arpu_7 | arpu_8 | 0.50 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_6 | 0.50 |
offnet_mou_8 | std_og_t2m_mou_7 | 0.50 |
std_ic_t2m_mou_8 | std_ic_t2m_mou_7 | 0.50 |
std_og_mou_7 | total_og_mou_6 | 0.50 |
total_rech_amt_8 | onnet_mou_8 | 0.50 |
last_day_rch_amt_8 | total_rech_amt_8 | 0.50 |
loc_og_mou_7 | loc_og_t2t_mou_8 | 0.50 |
loc_ic_mou_8 | total_ic_mou_7 | 0.50 |
loc_ic_mou_7 | loc_ic_t2m_mou_8 | 0.50 |
total_rech_amt_8 | std_og_mou_8 | 0.50 |
arpu_8 | onnet_mou_8 | 0.50 |
std_og_t2f_mou_7 | loc_og_t2f_mou_7 | 0.50 |
loc_og_mou_7 | loc_ic_mou_7 | 0.50 |
max_rech_amt_7 | total_rech_amt_7 | 0.50 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_6 | 0.50 |
loc_ic_mou_8 | loc_ic_t2f_mou_8 | 0.49 |
vbc_3g_8 | vol_3g_mb_8 | 0.49 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_6 | 0.49 |
loc_og_t2m_mou_7 | loc_ic_mou_7 | 0.49 |
vol_2g_mb_8 | vol_2g_mb_7 | 0.49 |
loc_ic_mou_7 | total_ic_mou_6 | 0.49 |
loc_ic_mou_7 | total_ic_mou_8 | 0.49 |
std_og_t2f_mou_7 | og_others_7 | 0.48 |
vol_3g_mb_8 | vol_3g_mb_6 | 0.48 |
isd_og_mou_7 | isd_ic_mou_7 | 0.48 |
std_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.48 |
std_og_t2f_mou_7 | og_others_8 | 0.48 |
loc_og_t2t_mou_8 | loc_ic_t2t_mou_8 | 0.48 |
std_ic_t2m_mou_8 | total_ic_mou_8 | 0.48 |
isd_ic_mou_8 | isd_ic_mou_7 | 0.48 |
arpu_7 | total_rech_amt_8 | 0.48 |
total_og_mou_7 | total_rech_amt_7 | 0.48 |
std_ic_mou_7 | std_ic_t2t_mou_6 | 0.48 |
loc_ic_t2m_mou_7 | total_ic_mou_6 | 0.47 |
loc_ic_t2t_mou_8 | loc_ic_mou_7 | 0.47 |
total_rech_num_8 | onnet_mou_8 | 0.47 |
total_og_mou_6 | arpu_6 | 0.47 |
total_og_mou_8 | loc_og_mou_8 | 0.47 |
std_ic_mou_8 | std_ic_t2m_mou_7 | 0.46 |
total_og_mou_7 | total_rech_num_7 | 0.46 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_8 | 0.46 |
loc_ic_mou_6 | total_ic_mou_7 | 0.46 |
last_day_rch_amt_8 | roam_og_mou_8 | 0.46 |
total_rech_amt_6 | max_rech_amt_6 | 0.46 |
std_og_mou_8 | std_og_t2t_mou_7 | 0.46 |
total_rech_amt_6 | total_og_mou_6 | 0.46 |
isd_og_mou_7 | Average_rech_amt_6n7 | 0.46 |
spl_og_mou_7 | spl_og_mou_8 | 0.46 |
loc_og_mou_8 | loc_og_t2t_mou_7 | 0.46 |
loc_ic_mou_6 | loc_og_t2m_mou_6 | 0.45 |
max_rech_amt_7 | max_rech_amt_8 | 0.45 |
std_ic_t2m_mou_8 | std_ic_mou_7 | 0.45 |
total_rech_amt_7 | total_rech_amt_8 | 0.45 |
arpu_7 | offnet_mou_7 | 0.45 |
std_og_t2t_mou_8 | total_rech_num_8 | 0.45 |
arpu_8 | roam_og_mou_8 | 0.45 |
std_ic_t2m_mou_7 | total_ic_mou_7 | 0.45 |
loc_og_mou_6 | loc_og_t2m_mou_7 | 0.45 |
total_rech_num_7 | total_rech_num_6 | 0.45 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_6 | 0.45 |
loc_ic_mou_6 | loc_og_mou_6 | 0.45 |
loc_og_mou_6 | loc_og_t2t_mou_7 | 0.45 |
loc_ic_t2m_mou_6 | total_ic_mou_7 | 0.44 |
isd_ic_mou_6 | ic_others_6 | 0.44 |
last_day_rch_amt_8 | arpu_8 | 0.44 |
loc_og_t2c_mou_8 | loc_og_t2c_mou_7 | 0.44 |
arpu_8 | Average_rech_amt_6n7 | 0.44 |
roam_ic_mou_8 | roam_og_mou_8 | 0.44 |
std_og_mou_8 | onnet_mou_7 | 0.44 |
std_og_mou_7 | std_og_t2m_mou_8 | 0.44 |
loc_og_t2m_mou_8 | offnet_mou_8 | 0.44 |
total_rech_amt_8 | roam_og_mou_8 | 0.44 |
loc_og_mou_7 | total_ic_mou_7 | 0.44 |
total_og_mou_8 | onnet_mou_7 | 0.43 |
spl_og_mou_6 | spl_og_mou_7 | 0.43 |
total_ic_mou_8 | loc_ic_t2m_mou_7 | 0.43 |
std_ic_t2m_mou_6 | total_ic_mou_6 | 0.43 |
total_rech_num_8 | offnet_mou_8 | 0.43 |
loc_og_mou_8 | offnet_mou_8 | 0.43 |
std_og_t2t_mou_8 | std_og_mou_7 | 0.43 |
loc_ic_t2f_mou_8 | total_ic_mou_8 | 0.43 |
std_og_t2f_mou_7 | std_ic_t2f_mou_7 | 0.43 |
total_og_mou_6 | total_rech_num_6 | 0.43 |
std_ic_mou_7 | std_ic_t2m_mou_6 | 0.42 |
loc_ic_mou_8 | loc_og_t2t_mou_8 | 0.42 |
loc_ic_t2t_mou_8 | loc_og_mou_8 | 0.42 |
total_ic_mou_7 | loc_og_t2m_mou_7 | 0.42 |
total_rech_amt_7 | offnet_mou_7 | 0.42 |
max_rech_amt_8 | max_rech_amt_6 | 0.42 |
loc_ic_t2m_mou_6 | loc_og_mou_6 | 0.42 |
last_day_rch_amt_7 | max_rech_amt_8 | 0.42 |
total_ic_mou_7 | loc_ic_t2m_mou_8 | 0.42 |
loc_ic_t2f_mou_7 | loc_ic_mou_7 | 0.42 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.42 |
total_rech_amt_8 | Average_rech_amt_6n7 | 0.42 |
arpu_8 | total_ic_mou_8 | 0.42 |
offnet_mou_6 | arpu_6 | 0.42 |
total_og_mou_7 | std_og_t2m_mou_8 | 0.42 |
std_og_mou_6 | std_og_t2t_mou_7 | 0.42 |
total_og_mou_7 | offnet_mou_8 | 0.41 |
std_og_mou_6 | std_og_t2m_mou_7 | 0.41 |
total_og_mou_8 | std_og_t2t_mou_7 | 0.41 |
std_og_t2t_mou_6 | std_og_mou_7 | 0.41 |
std_og_mou_7 | total_rech_num_7 | 0.41 |
std_og_mou_7 | arpu_7 | 0.41 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_6 | 0.41 |
total_og_mou_8 | total_ic_mou_8 | 0.41 |
std_og_mou_7 | std_og_t2m_mou_6 | 0.41 |
total_rech_amt_6 | offnet_mou_6 | 0.41 |
spl_ic_mou_6 | spl_ic_mou_8 | 0.41 |
Data Preparation
Derived Variables
# Derived variables to measure change in usage

# Usage
data['delta_vol_2g'] = data['vol_2g_mb_8'] - data['vol_2g_mb_6'].add(data['vol_2g_mb_7']).div(2)
data['delta_vol_3g'] = data['vol_3g_mb_8'] - data['vol_3g_mb_6'].add(data['vol_3g_mb_7']).div(2)
data['delta_total_og_mou'] = data['total_og_mou_8'] - data['total_og_mou_6'].add(data['total_og_mou_7']).div(2)
data['delta_total_ic_mou'] = data['total_ic_mou_8'] - data['total_ic_mou_6'].add(data['total_ic_mou_7']).div(2)
data['delta_vbc_3g'] = data['vbc_3g_8'] - data['vbc_3g_6'].add(data['vbc_3g_7']).div(2)

# Revenue
data['delta_arpu'] = data['arpu_8'] - data['arpu_6'].add(data['arpu_7']).div(2)
data['delta_total_rech_amt'] = data['total_rech_amt_8'] - data['total_rech_amt_6'].add(data['total_rech_amt_7']).div(2)
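Each derived variable above follows the same pattern: the month 8 value minus the mean of the month 6 and month 7 values. As a minimal sketch, the pattern could be wrapped in a helper. The function name `add_delta_feature` and the toy frame are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

def add_delta_feature(df, prefix, out_name):
    """Add out_name = <prefix>_8 minus the mean of <prefix>_6 and <prefix>_7."""
    baseline_mean = df[f"{prefix}_6"].add(df[f"{prefix}_7"]).div(2)
    df[out_name] = df[f"{prefix}_8"] - baseline_mean
    return df

# Toy frame with made-up values, standing in for the real `data`
toy = pd.DataFrame({
    "arpu_6": [100.0, 200.0],
    "arpu_7": [300.0, 200.0],
    "arpu_8": [150.0, 250.0],
})
toy = add_delta_feature(toy, "arpu", "delta_arpu")
print(toy["delta_arpu"].tolist())  # [-50.0, 50.0]
```

A negative delta means month 8 fell below the average of the two earlier months, and a positive delta means it rose above it.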
# Removing variables used for derivation
data.drop(columns=[
    'vol_2g_mb_8', 'vol_2g_mb_6', 'vol_2g_mb_7',
    'vol_3g_mb_8', 'vol_3g_mb_6', 'vol_3g_mb_7',
    'total_og_mou_8', 'total_og_mou_6', 'total_og_mou_7',
    'total_ic_mou_8', 'total_ic_mou_6', 'total_ic_mou_7',
    'vbc_3g_8', 'vbc_3g_6', 'vbc_3g_7',
    'arpu_8', 'arpu_6', 'arpu_7',
    'total_rech_amt_8', 'total_rech_amt_6', 'total_rech_amt_7',
], inplace=True)
Outlier Treatment
# Looking at quantiles from 0.90 to 1.
data.quantile(np.arange(0.9,1.01,0.01)).style.bar()
quantile | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt |
0.9 | 794.98 | 824.38 | 723.61 | 915.58 | 935.69 | 853.79 | 32.73 | 18.36 | 18.68 | 64.48 | 41.20 | 37.11 | 207.93 | 207.84 | 196.91 | 435.16 | 437.49 | 416.66 | 18.38 | 18.66 | 16.96 | 4.04 | 4.84 | 4.45 | 661.74 | 657.38 | 633.34 | 630.53 | 663.79 | 567.34 | 604.41 | 645.88 | 531.26 | 2.20 | 2.18 | 1.73 | 1140.93 | 1177.18 | 1057.29 | 0.00 | 0.00 | 0.00 | 15.93 | 19.51 | 18.04 | 2.26 | 0.00 | 0.00 | 154.88 | 156.61 | 148.14 | 368.54 | 364.54 | 360.54 | 39.23 | 41.04 | 37.19 | 559.28 | 558.99 | 549.79 | 34.73 | 36.01 | 32.14 | 73.38 | 75.28 | 68.58 | 4.36 | 4.58 | 3.94 | 115.91 | 118.66 | 108.38 | 0.28 | 0.00 | 0.00 | 15.01 | 18.30 | 15.33 | 1.16 | 1.59 | 1.23 | 23.00 | 23.00 | 21.00 | 297.00 | 300.00 | 252.00 | 250.00 | 250.00 | 225.00 | 2846.00 | 1118.00 | 29.84 | 170.07 | 345.07 | 147.30 | 69.83 | 257.31 | 319.00 |
0.91 | 848.97 | 878.35 | 783.49 | 966.74 | 984.02 | 899.29 | 39.69 | 23.28 | 23.39 | 78.43 | 50.01 | 46.44 | 225.96 | 224.87 | 213.83 | 461.10 | 461.81 | 441.84 | 20.28 | 20.68 | 18.84 | 4.68 | 5.51 | 5.11 | 703.11 | 692.67 | 669.63 | 686.26 | 722.84 | 622.13 | 658.47 | 695.77 | 583.42 | 2.91 | 2.80 | 2.28 | 1195.61 | 1244.40 | 1125.28 | 0.00 | 0.00 | 0.00 | 17.54 | 21.28 | 19.69 | 2.54 | 0.00 | 0.00 | 165.79 | 168.03 | 159.84 | 390.64 | 387.11 | 382.20 | 43.59 | 45.39 | 41.21 | 593.13 | 589.65 | 580.54 | 38.21 | 39.91 | 35.93 | 80.41 | 81.93 | 75.54 | 5.21 | 5.49 | 4.71 | 125.98 | 129.29 | 118.24 | 0.30 | 0.00 | 0.00 | 18.34 | 21.84 | 18.83 | 1.44 | 1.94 | 1.51 | 24.00 | 24.00 | 22.00 | 325.00 | 330.00 | 289.00 | 250.00 | 250.00 | 250.00 | 2910.10 | 1156.00 | 39.88 | 227.15 | 377.46 | 161.80 | 95.33 | 278.90 | 345.50 |
0.92 | 909.05 | 941.99 | 848.96 | 1031.39 | 1038.09 | 953.35 | 48.71 | 29.68 | 29.64 | 93.60 | 60.97 | 57.59 | 247.94 | 244.78 | 232.33 | 490.63 | 488.04 | 468.83 | 22.56 | 23.14 | 20.93 | 5.45 | 6.26 | 5.86 | 742.96 | 735.69 | 711.57 | 750.31 | 786.39 | 680.10 | 713.49 | 760.98 | 640.57 | 3.74 | 3.71 | 3.01 | 1268.83 | 1315.08 | 1201.29 | 0.13 | 0.05 | 0.00 | 19.26 | 23.39 | 21.78 | 2.86 | 0.00 | 0.00 | 180.18 | 181.49 | 173.59 | 415.89 | 412.03 | 405.97 | 48.65 | 50.66 | 46.19 | 629.64 | 624.36 | 614.45 | 42.73 | 44.58 | 39.99 | 88.27 | 90.41 | 83.44 | 6.33 | 6.61 | 5.75 | 138.32 | 142.16 | 130.55 | 0.33 | 0.00 | 0.03 | 22.58 | 26.94 | 23.58 | 1.78 | 2.38 | 1.86 | 25.00 | 25.00 | 23.00 | 350.00 | 350.00 | 330.00 | 250.00 | 250.00 | 250.00 | 2981.20 | 1202.00 | 53.66 | 289.30 | 419.97 | 177.35 | 127.50 | 303.51 | 375.00 |
0.93 | 990.48 | 1016.15 | 920.96 | 1094.77 | 1103.93 | 1017.35 | 60.42 | 37.28 | 37.90 | 111.15 | 75.00 | 72.45 | 275.51 | 271.70 | 254.64 | 523.56 | 519.80 | 500.38 | 25.25 | 26.00 | 23.51 | 6.34 | 7.15 | 6.79 | 794.01 | 786.73 | 759.45 | 812.08 | 856.34 | 753.44 | 777.69 | 828.18 | 706.23 | 4.89 | 4.70 | 4.00 | 1358.41 | 1404.59 | 1283.20 | 0.33 | 0.25 | 0.00 | 21.33 | 25.84 | 24.07 | 3.23 | 0.00 | 0.00 | 195.66 | 196.99 | 188.25 | 444.94 | 439.30 | 434.35 | 54.59 | 57.50 | 51.92 | 671.69 | 667.07 | 654.41 | 48.03 | 50.82 | 45.64 | 98.00 | 99.68 | 93.47 | 7.84 | 8.08 | 7.08 | 153.30 | 158.86 | 146.87 | 0.36 | 0.00 | 0.11 | 28.18 | 33.33 | 29.95 | 2.20 | 2.93 | 2.38 | 27.00 | 27.00 | 25.00 | 350.00 | 398.00 | 350.00 | 252.00 | 252.00 | 250.00 | 3055.30 | 1257.00 | 71.44 | 361.57 | 466.79 | 195.08 | 166.36 | 331.83 | 408.50 |
0.9400000000000001 | 1066.85 | 1097.12 | 1007.56 | 1168.09 | 1186.36 | 1096.62 | 73.96 | 49.44 | 48.29 | 137.40 | 94.73 | 90.27 | 307.64 | 305.87 | 282.50 | 563.70 | 559.41 | 537.10 | 29.13 | 29.75 | 26.89 | 7.36 | 8.45 | 7.84 | 855.97 | 849.42 | 817.15 | 888.23 | 933.58 | 842.48 | 856.37 | 907.95 | 786.16 | 6.23 | 6.11 | 5.26 | 1456.41 | 1503.09 | 1385.91 | 0.63 | 0.51 | 0.23 | 23.92 | 28.49 | 26.83 | 3.64 | 0.00 | 0.00 | 216.28 | 214.47 | 207.73 | 478.56 | 478.91 | 472.15 | 62.96 | 65.76 | 58.84 | 717.00 | 711.74 | 704.89 | 55.49 | 58.04 | 52.58 | 110.61 | 113.20 | 105.54 | 9.66 | 9.89 | 8.77 | 174.34 | 179.84 | 165.92 | 0.40 | 0.00 | 0.21 | 35.80 | 40.97 | 36.57 | 2.81 | 3.73 | 2.98 | 28.00 | 28.00 | 26.00 | 400.00 | 455.00 | 398.00 | 252.00 | 252.00 | 252.00 | 3107.00 | 1317.70 | 97.46 | 463.36 | 524.65 | 218.73 | 217.42 | 366.26 | 447.20 |
0.9500000000000001 | 1153.97 | 1208.17 | 1115.66 | 1271.47 | 1286.28 | 1188.46 | 94.59 | 63.34 | 62.80 | 168.46 | 119.34 | 114.80 | 348.62 | 346.90 | 324.14 | 614.99 | 608.01 | 585.06 | 33.59 | 34.09 | 31.31 | 8.69 | 9.95 | 9.33 | 935.51 | 920.12 | 883.25 | 986.24 | 1029.29 | 936.49 | 960.80 | 1004.26 | 886.56 | 8.16 | 7.92 | 7.18 | 1558.50 | 1624.81 | 1518.82 | 1.10 | 1.01 | 0.55 | 26.81 | 32.15 | 30.23 | 4.14 | 0.00 | 0.00 | 243.94 | 238.62 | 232.50 | 520.55 | 518.65 | 516.67 | 72.61 | 76.05 | 67.56 | 773.27 | 781.18 | 767.31 | 64.94 | 66.55 | 61.56 | 126.66 | 130.41 | 121.88 | 12.24 | 12.31 | 10.98 | 200.64 | 205.16 | 191.95 | 0.43 | 0.06 | 0.25 | 46.45 | 51.98 | 46.48 | 3.63 | 4.83 | 3.93 | 30.00 | 30.00 | 28.00 | 500.00 | 500.00 | 455.00 | 252.00 | 274.00 | 252.00 | 3179.00 | 1406.00 | 129.68 | 562.66 | 604.55 | 245.97 | 284.40 | 404.58 | 499.00 |
0.9600000000000001 | 1282.78 | 1344.04 | 1256.34 | 1406.07 | 1407.78 | 1305.32 | 120.08 | 83.43 | 82.12 | 211.03 | 153.97 | 145.54 | 411.69 | 412.46 | 380.74 | 674.30 | 670.99 | 646.48 | 39.84 | 40.05 | 37.61 | 10.61 | 11.86 | 11.45 | 1025.57 | 1016.38 | 975.80 | 1099.72 | 1146.44 | 1066.04 | 1101.07 | 1136.32 | 998.28 | 11.43 | 10.87 | 9.59 | 1707.60 | 1766.85 | 1672.44 | 2.20 | 2.28 | 1.09 | 31.43 | 37.11 | 34.33 | 4.78 | 0.00 | 0.00 | 276.09 | 270.15 | 265.31 | 578.33 | 574.86 | 573.49 | 85.30 | 89.25 | 77.93 | 847.56 | 854.66 | 848.82 | 77.15 | 81.35 | 74.18 | 151.66 | 153.16 | 144.74 | 15.67 | 15.88 | 14.48 | 237.74 | 241.25 | 224.12 | 0.46 | 0.13 | 0.25 | 60.59 | 67.46 | 61.77 | 4.98 | 6.51 | 5.31 | 32.00 | 32.00 | 31.00 | 505.00 | 550.00 | 500.00 | 330.00 | 339.00 | 300.00 | 3264.00 | 1508.50 | 185.21 | 705.70 | 704.27 | 282.57 | 356.70 | 458.40 | 555.30 |
0.9700000000000001 | 1444.23 | 1497.25 | 1441.53 | 1578.82 | 1585.02 | 1481.57 | 155.13 | 117.54 | 112.07 | 270.52 | 203.66 | 188.86 | 508.01 | 500.20 | 458.64 | 758.99 | 749.79 | 734.08 | 49.38 | 49.09 | 46.36 | 13.04 | 14.68 | 14.14 | 1163.16 | 1143.62 | 1101.16 | 1243.36 | 1308.19 | 1235.72 | 1262.71 | 1308.35 | 1162.84 | 16.44 | 15.26 | 13.64 | 1904.26 | 1950.83 | 1877.19 | 5.01 | 5.36 | 2.73 | 37.80 | 44.44 | 40.55 | 5.56 | 0.00 | 0.00 | 320.64 | 322.92 | 308.49 | 655.65 | 646.22 | 642.58 | 103.65 | 109.78 | 96.64 | 959.39 | 962.29 | 941.13 | 97.62 | 101.69 | 94.54 | 184.59 | 187.18 | 176.96 | 20.81 | 21.78 | 19.62 | 290.90 | 295.51 | 279.29 | 0.53 | 0.20 | 0.36 | 82.75 | 91.07 | 85.44 | 7.03 | 9.05 | 7.58 | 35.00 | 36.00 | 34.00 | 550.00 | 550.00 | 550.00 | 398.00 | 398.00 | 379.00 | 3424.70 | 1633.85 | 262.23 | 895.25 | 843.24 | 334.26 | 461.60 | 529.37 | 644.00 |
0.9800000000000001 | 1694.68 | 1772.62 | 1700.24 | 1837.93 | 1838.39 | 1739.01 | 221.26 | 166.28 | 165.81 | 363.12 | 282.49 | 266.53 | 668.59 | 660.28 | 596.48 | 885.83 | 868.35 | 853.83 | 62.69 | 63.06 | 60.11 | 17.15 | 19.23 | 18.67 | 1372.78 | 1338.79 | 1306.65 | 1458.71 | 1520.56 | 1463.19 | 1518.64 | 1558.41 | 1413.14 | 24.58 | 23.23 | 21.17 | 2174.34 | 2312.91 | 2165.26 | 12.99 | 13.21 | 7.75 | 48.05 | 56.14 | 51.15 | 6.84 | 0.00 | 0.00 | 414.27 | 407.34 | 392.53 | 775.61 | 765.13 | 748.24 | 132.06 | 143.85 | 125.21 | 1136.29 | 1136.25 | 1114.04 | 132.11 | 138.47 | 131.28 | 245.19 | 253.11 | 234.79 | 30.02 | 31.45 | 28.10 | 367.71 | 388.61 | 362.84 | 0.58 | 0.30 | 0.50 | 129.13 | 135.43 | 127.01 | 10.84 | 13.99 | 11.54 | 40.00 | 40.00 | 39.00 | 655.00 | 750.00 | 619.00 | 500.00 | 500.00 | 500.00 | 3632.00 | 1834.70 | 392.11 | 1207.69 | 1051.14 | 431.92 | 621.75 | 649.14 | 779.30 |
0.9900000000000001 | 2166.37 | 2220.37 | 2188.50 | 2326.29 | 2410.10 | 2211.64 | 349.35 | 292.54 | 288.49 | 543.71 | 448.13 | 432.74 | 1076.24 | 1059.88 | 956.50 | 1147.05 | 1112.66 | 1092.59 | 90.88 | 91.06 | 86.68 | 24.86 | 28.24 | 28.87 | 1806.94 | 1761.43 | 1689.07 | 1885.20 | 1919.19 | 1938.13 | 1955.61 | 2112.66 | 1905.81 | 44.39 | 43.89 | 38.88 | 2744.49 | 2874.65 | 2800.87 | 41.25 | 40.43 | 31.24 | 71.36 | 79.87 | 74.11 | 9.31 | 0.00 | 0.00 | 625.35 | 648.79 | 621.67 | 1026.44 | 1009.29 | 976.09 | 197.17 | 205.25 | 185.62 | 1484.99 | 1515.87 | 1459.55 | 215.64 | 231.15 | 215.20 | 393.73 | 408.58 | 372.61 | 53.39 | 56.59 | 49.41 | 577.89 | 616.89 | 563.89 | 0.68 | 0.51 | 0.61 | 239.60 | 240.13 | 249.89 | 20.71 | 25.26 | 21.53 | 48.00 | 48.00 | 46.00 | 1000.00 | 1000.00 | 951.00 | 655.00 | 655.00 | 619.00 | 3651.00 | 2216.30 | 654.31 | 1878.12 | 1465.10 | 619.69 | 929.64 | 864.34 | 1036.40 |
1.0 | 7376.71 | 8157.78 | 10752.56 | 8362.36 | 9667.13 | 14007.34 | 2613.31 | 3813.29 | 4169.81 | 3775.11 | 2812.04 | 5337.04 | 6431.33 | 7400.66 | 10752.56 | 4729.74 | 4557.14 | 4961.33 | 1466.03 | 1196.43 | 928.49 | 342.86 | 569.71 | 351.83 | 10643.38 | 7674.78 | 11039.91 | 7366.58 | 8133.66 | 8014.43 | 8314.76 | 9284.74 | 13950.04 | 628.56 | 544.63 | 516.91 | 8432.99 | 10936.73 | 13980.06 | 5900.66 | 5490.28 | 5681.54 | 1023.21 | 1265.79 | 1390.88 | 100.61 | 370.13 | 394.93 | 6351.44 | 5709.59 | 4003.21 | 4693.86 | 4388.73 | 5738.46 | 1678.41 | 1983.01 | 1588.53 | 6496.11 | 6466.74 | 5748.81 | 5459.56 | 5800.93 | 4309.29 | 4630.23 | 3470.38 | 5645.86 | 1351.11 | 1136.08 | 1394.89 | 5459.63 | 6745.76 | 5957.14 | 19.76 | 21.33 | 6.23 | 3965.69 | 4747.91 | 4100.38 | 1344.14 | 1495.94 | 1209.86 | 307.00 | 138.00 | 196.00 | 4010.00 | 4010.00 | 4449.00 | 4010.00 | 4010.00 | 4449.00 | 4321.00 | 37762.50 | 8062.30 | 15646.39 | 12768.70 | 4862.62 | 8254.62 | 12808.62 | 14344.50 |
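Row labels such as 0.9400000000000001 in the table above are floating-point residue from `np.arange(0.9, 1.01, 0.01)`, not different quantile levels. A minimal sketch of one way to get clean labels, assuming a toy frame in place of the real `data` (the names `toy` and `quantile_levels` are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for the real `data` frame
toy = pd.DataFrame({"usage": np.arange(1.0, 1001.0)})

# 11 evenly spaced levels from 0.90 to 1.00, rounded to two decimals
quantile_levels = np.round(np.linspace(0.90, 1.00, 11), 2)

quantile_table = toy.quantile(quantile_levels)
print(quantile_table.index.tolist())
```

The quantile values themselves are effectively unchanged; only the index labels become easier to read and to look up with `.loc`.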
# Looking at percentage change in quantiles from 0.90 to 1.
data.quantile(np.arange(0.9,1.01,0.01)).pct_change().mul(100).style.bar()
quantile | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt |
0.9 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
0.91 | 6.79 | 6.55 | 8.27 | 5.59 | 5.17 | 5.33 | 21.27 | 26.80 | 25.22 | 21.64 | 21.39 | 25.13 | 8.67 | 8.20 | 8.59 | 5.96 | 5.56 | 6.04 | 10.34 | 10.83 | 11.08 | 15.84 | 13.88 | 14.88 | 6.25 | 5.37 | 5.73 | 8.84 | 8.90 | 9.66 | 8.94 | 7.72 | 9.82 | 32.27 | 28.44 | 31.79 | 4.79 | 5.71 | 6.43 | nan | nan | nan | 10.11 | 9.09 | 9.16 | 12.39 | nan | nan | 7.05 | 7.29 | 7.90 | 6.00 | 6.19 | 6.01 | 11.11 | 10.60 | 10.81 | 6.05 | 5.48 | 5.59 | 10.03 | 10.84 | 11.79 | 9.58 | 8.84 | 10.15 | 19.50 | 19.89 | 19.54 | 8.69 | 8.96 | 9.10 | 7.14 | nan | nan | 22.19 | 19.35 | 22.84 | 24.14 | 22.01 | 22.76 | 4.35 | 4.35 | 4.76 | 9.43 | 10.00 | 14.68 | 0.00 | 0.00 | 11.11 | 2.25 | 3.40 | 33.68 | 33.56 | 9.39 | 9.84 | 36.51 | 8.39 | 8.31 |
0.92 | 7.08 | 7.25 | 8.36 | 6.69 | 5.49 | 6.01 | 22.72 | 27.49 | 26.73 | 19.34 | 21.90 | 24.03 | 9.73 | 8.85 | 8.65 | 6.41 | 5.68 | 6.11 | 11.24 | 11.91 | 11.09 | 16.45 | 13.57 | 14.71 | 5.67 | 6.21 | 6.26 | 9.33 | 8.79 | 9.32 | 8.36 | 9.37 | 9.79 | 28.52 | 32.50 | 32.02 | 6.12 | 5.68 | 6.76 | inf | inf | nan | 9.81 | 9.92 | 10.60 | 12.60 | nan | nan | 8.68 | 8.01 | 8.60 | 6.46 | 6.44 | 6.22 | 11.60 | 11.61 | 12.08 | 6.15 | 5.89 | 5.84 | 11.83 | 11.70 | 11.30 | 9.77 | 10.35 | 10.46 | 21.50 | 20.38 | 22.12 | 9.79 | 9.95 | 10.41 | 10.00 | nan | inf | 23.12 | 23.35 | 25.20 | 23.61 | 22.68 | 23.18 | 4.17 | 4.17 | 4.55 | 7.69 | 6.06 | 14.19 | 0.00 | 0.00 | 0.00 | 2.44 | 3.98 | 34.55 | 27.36 | 11.26 | 9.61 | 33.76 | 8.82 | 8.54 |
0.93 | 8.96 | 7.87 | 8.48 | 6.14 | 6.34 | 6.71 | 24.03 | 25.62 | 27.86 | 18.75 | 23.02 | 25.80 | 11.12 | 11.00 | 9.60 | 6.71 | 6.51 | 6.73 | 11.91 | 12.32 | 12.33 | 16.33 | 14.27 | 15.79 | 6.87 | 6.94 | 6.73 | 8.23 | 8.89 | 10.78 | 9.00 | 8.83 | 10.25 | 30.75 | 26.77 | 32.76 | 7.06 | 6.81 | 6.82 | 153.85 | 400.00 | nan | 10.76 | 10.46 | 10.50 | 12.94 | nan | nan | 8.59 | 8.54 | 8.44 | 6.99 | 6.62 | 6.99 | 12.21 | 13.49 | 12.40 | 6.68 | 6.84 | 6.50 | 12.40 | 13.98 | 14.13 | 11.02 | 10.25 | 12.01 | 23.85 | 22.24 | 23.09 | 10.83 | 11.75 | 12.50 | 9.09 | nan | 266.67 | 24.79 | 23.73 | 27.02 | 23.60 | 23.24 | 27.96 | 8.00 | 8.00 | 8.70 | 0.00 | 13.71 | 6.06 | 0.80 | 0.80 | 0.00 | 2.49 | 4.58 | 33.13 | 24.98 | 11.15 | 10.00 | 30.47 | 9.33 | 8.93 |
0.9400000000000001 | 7.71 | 7.97 | 9.40 | 6.70 | 7.47 | 7.79 | 22.41 | 32.61 | 27.41 | 23.62 | 26.30 | 24.59 | 11.66 | 12.58 | 10.94 | 7.67 | 7.62 | 7.34 | 15.38 | 14.43 | 14.39 | 16.09 | 18.10 | 15.52 | 7.80 | 7.97 | 7.60 | 9.38 | 9.02 | 11.82 | 10.12 | 9.63 | 11.32 | 27.40 | 29.92 | 31.63 | 7.21 | 7.01 | 8.00 | 90.91 | 104.00 | inf | 12.12 | 10.26 | 11.49 | 12.69 | nan | nan | 10.54 | 8.87 | 10.35 | 7.56 | 9.02 | 8.70 | 15.33 | 14.36 | 13.33 | 6.75 | 6.70 | 7.71 | 15.53 | 14.22 | 15.21 | 12.87 | 13.57 | 12.92 | 23.21 | 22.45 | 23.84 | 13.72 | 13.21 | 12.97 | 11.11 | nan | 90.91 | 27.04 | 22.91 | 22.11 | 27.73 | 27.17 | 25.21 | 3.70 | 3.70 | 4.00 | 14.29 | 14.32 | 13.71 | 0.00 | 0.00 | 0.80 | 1.69 | 4.83 | 36.42 | 28.15 | 12.40 | 12.12 | 30.69 | 10.38 | 9.47 |
0.9500000000000001 | 8.17 | 10.12 | 10.73 | 8.85 | 8.42 | 8.38 | 27.89 | 28.10 | 30.03 | 22.61 | 25.97 | 27.18 | 13.32 | 13.41 | 14.74 | 9.10 | 8.69 | 8.93 | 15.33 | 14.58 | 16.43 | 18.07 | 17.72 | 18.94 | 9.29 | 8.32 | 8.09 | 11.03 | 10.25 | 11.16 | 12.19 | 10.61 | 12.77 | 30.98 | 29.62 | 36.50 | 7.01 | 8.10 | 9.59 | 74.60 | 98.04 | 139.13 | 12.09 | 12.85 | 12.67 | 13.74 | nan | nan | 12.79 | 11.26 | 11.92 | 8.77 | 8.30 | 9.43 | 15.33 | 15.65 | 14.82 | 7.85 | 9.76 | 8.85 | 17.04 | 14.66 | 17.08 | 14.51 | 15.21 | 15.48 | 26.71 | 24.42 | 25.23 | 15.09 | 14.08 | 15.69 | 7.50 | inf | 19.05 | 29.73 | 26.89 | 27.12 | 29.18 | 29.49 | 31.88 | 7.14 | 7.14 | 7.69 | 25.00 | 9.89 | 14.32 | 0.00 | 8.73 | 0.00 | 2.32 | 6.70 | 33.06 | 21.43 | 15.23 | 12.45 | 30.81 | 10.46 | 11.58 |
0.9600000000000001 | 11.16 | 11.25 | 12.61 | 10.59 | 9.45 | 9.83 | 26.95 | 31.73 | 30.77 | 25.27 | 29.02 | 26.77 | 18.09 | 18.90 | 17.46 | 9.64 | 10.36 | 10.50 | 18.58 | 17.51 | 20.13 | 22.09 | 19.26 | 22.68 | 9.63 | 10.46 | 10.48 | 11.51 | 11.38 | 13.83 | 14.60 | 13.15 | 12.60 | 40.07 | 37.27 | 33.57 | 9.57 | 8.74 | 10.11 | 99.64 | 125.74 | 98.55 | 17.23 | 15.43 | 13.57 | 15.46 | nan | nan | 13.18 | 13.22 | 14.11 | 11.10 | 10.84 | 11.00 | 17.48 | 17.37 | 15.35 | 9.61 | 9.41 | 10.62 | 18.80 | 22.23 | 20.50 | 19.74 | 17.44 | 18.76 | 28.04 | 28.98 | 31.88 | 18.49 | 17.59 | 16.76 | 6.98 | 116.67 | 0.00 | 30.45 | 29.77 | 32.87 | 37.19 | 34.78 | 35.01 | 6.67 | 6.67 | 10.71 | 1.00 | 10.00 | 9.89 | 30.95 | 23.72 | 19.05 | 2.67 | 7.29 | 42.82 | 25.42 | 16.49 | 14.88 | 25.42 | 13.30 | 11.28 |
0.9700000000000001 | 12.59 | 11.40 | 14.74 | 12.29 | 12.59 | 13.50 | 29.19 | 40.89 | 36.48 | 28.19 | 32.27 | 29.77 | 23.40 | 21.27 | 20.46 | 12.56 | 11.74 | 13.55 | 23.95 | 22.57 | 23.25 | 22.90 | 23.79 | 23.54 | 13.42 | 12.52 | 12.85 | 13.06 | 14.11 | 15.92 | 14.68 | 15.14 | 16.48 | 43.87 | 40.36 | 42.23 | 11.52 | 10.41 | 12.24 | 128.14 | 135.09 | 150.00 | 20.27 | 19.74 | 18.11 | 16.32 | nan | nan | 16.14 | 19.53 | 16.28 | 13.37 | 12.41 | 12.05 | 21.51 | 22.99 | 24.01 | 13.19 | 12.59 | 10.87 | 26.52 | 25.01 | 27.45 | 21.71 | 22.21 | 22.26 | 32.77 | 37.18 | 35.52 | 22.36 | 22.49 | 24.61 | 15.22 | 53.85 | 44.00 | 36.58 | 35.00 | 38.32 | 41.18 | 39.03 | 42.86 | 9.38 | 12.50 | 9.68 | 8.91 | 0.00 | 10.00 | 20.61 | 17.40 | 26.33 | 4.92 | 8.31 | 41.59 | 26.86 | 19.73 | 18.29 | 29.41 | 15.48 | 15.97 |
0.9800000000000001 | 17.34 | 18.39 | 17.95 | 16.41 | 15.99 | 17.38 | 42.63 | 41.47 | 47.95 | 34.23 | 38.71 | 41.12 | 31.61 | 32.00 | 30.06 | 16.71 | 15.81 | 16.31 | 26.97 | 28.45 | 29.66 | 31.50 | 30.95 | 32.04 | 18.02 | 17.07 | 18.66 | 17.32 | 16.23 | 18.41 | 20.27 | 19.11 | 21.53 | 49.46 | 52.20 | 55.19 | 14.18 | 18.56 | 15.35 | 159.20 | 146.46 | 183.88 | 27.11 | 26.34 | 26.15 | 23.02 | nan | nan | 29.20 | 26.14 | 27.24 | 18.30 | 18.40 | 16.44 | 27.41 | 31.04 | 29.57 | 18.44 | 18.08 | 18.37 | 35.34 | 36.17 | 38.87 | 32.83 | 35.22 | 32.68 | 44.29 | 44.39 | 43.18 | 26.40 | 31.51 | 29.92 | 9.43 | 50.00 | 38.89 | 56.05 | 48.71 | 48.66 | 54.17 | 54.57 | 52.27 | 14.29 | 11.11 | 14.71 | 19.09 | 36.36 | 12.55 | 25.63 | 25.63 | 31.93 | 6.05 | 12.29 | 49.53 | 34.90 | 24.65 | 29.22 | 34.70 | 22.63 | 21.01 |
0.9900000000000001 | 27.83 | 25.26 | 28.72 | 26.57 | 31.10 | 27.18 | 57.89 | 75.93 | 73.99 | 49.73 | 58.63 | 62.36 | 60.97 | 60.52 | 60.36 | 29.49 | 28.14 | 27.96 | 44.96 | 44.40 | 44.20 | 44.96 | 46.86 | 54.64 | 31.63 | 31.57 | 29.27 | 29.24 | 26.22 | 32.46 | 28.77 | 35.57 | 34.86 | 80.59 | 88.95 | 83.68 | 26.22 | 24.29 | 29.35 | 217.65 | 206.02 | 303.10 | 48.50 | 42.27 | 44.88 | 36.08 | nan | nan | 50.95 | 59.27 | 58.38 | 32.34 | 31.91 | 30.45 | 49.30 | 42.69 | 48.24 | 30.69 | 33.41 | 31.01 | 63.22 | 66.93 | 63.92 | 60.58 | 61.42 | 58.70 | 77.82 | 79.94 | 75.85 | 57.16 | 58.74 | 55.41 | 17.24 | 70.00 | 22.00 | 85.55 | 77.30 | 96.75 | 91.03 | 80.54 | 86.54 | 20.00 | 20.00 | 17.95 | 52.67 | 33.33 | 53.63 | 31.00 | 31.00 | 23.80 | 0.52 | 20.80 | 66.87 | 55.51 | 39.38 | 43.47 | 49.52 | 33.15 | 32.99 |
1.0 | 240.51 | 267.41 | 391.32 | 259.47 | 301.11 | 533.35 | 648.04 | 1203.51 | 1345.42 | 594.33 | 527.51 | 1133.30 | 497.57 | 598.26 | 1024.15 | 312.34 | 309.57 | 354.09 | 1513.24 | 1213.96 | 971.17 | 1279.33 | 1917.74 | 1118.63 | 489.03 | 335.71 | 553.61 | 290.76 | 323.81 | 313.51 | 325.17 | 339.48 | 631.98 | 1316.15 | 1141.04 | 1229.43 | 207.27 | 280.45 | 399.13 | 14204.63 | 13481.40 | 18086.75 | 1333.97 | 1484.83 | 1776.73 | 980.90 | inf | inf | 915.66 | 780.03 | 543.95 | 357.30 | 334.83 | 487.90 | 751.25 | 866.13 | 755.80 | 337.45 | 326.60 | 293.87 | 2431.76 | 2409.61 | 1902.47 | 1076.01 | 749.39 | 1415.23 | 2430.88 | 1907.56 | 2723.15 | 844.75 | 993.52 | 956.44 | 2805.88 | 4082.35 | 921.31 | 1555.13 | 1877.27 | 1540.89 | 6390.92 | 5822.64 | 5519.41 | 539.58 | 187.50 | 326.09 | 301.00 | 301.00 | 367.82 | 512.21 | 512.21 | 618.74 | 18.35 | 1603.85 | 1132.18 | 733.09 | 771.52 | 684.69 | 787.93 | 1381.89 | 1284.07 |
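The `nan` and `inf` entries in the table above are arithmetic side effects, not missing data: `pct_change` divides each quantile by the previous one, so a step from 0 to 0 gives `nan` and a step from 0 to a positive value gives `inf`. The first row is always `nan` because it has no previous row. A small sketch with made-up quantile values (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy quantile values for three hypothetical columns (not taken from the dataset)
q = pd.DataFrame(
    {
        "steady": [10.0, 11.0, 22.0],    # ordinary growth between quantiles
        "from_zero": [0.0, 0.0, 5.0],    # 0 -> 0 -> positive
        "all_zero": [0.0, 0.0, 0.0],     # stays at zero throughout
    },
    index=[0.98, 0.99, 1.0],
)

pct = q.pct_change().mul(100)
print(pct)

# An infinite change still passes a "> 100" filter
print(pct.loc[1.0, "from_zero"] > 100)
```

This matters for any rule that thresholds the last row: a column whose 99th percentile is 0 but whose maximum is positive will be selected because `inf` compares greater than any finite threshold.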
# Columns with outliers
pct_change_99_1 = data.quantile(np.arange(0.9,1.01,0.01)).pct_change().mul(100).iloc[-1]
outlier_condition = pct_change_99_1 > 100
columns_with_outliers = pct_change_99_1[outlier_condition].index.values
print('Columns with outliers :\n', columns_with_outliers)
Columns with outliers :
 ['onnet_mou_6' 'onnet_mou_7' 'onnet_mou_8' 'offnet_mou_6' 'offnet_mou_7'
 'offnet_mou_8' 'roam_ic_mou_6' 'roam_ic_mou_7' 'roam_ic_mou_8'
 'roam_og_mou_6' 'roam_og_mou_7' 'roam_og_mou_8' 'loc_og_t2t_mou_6'
 'loc_og_t2t_mou_7' 'loc_og_t2t_mou_8' 'loc_og_t2m_mou_6'
 'loc_og_t2m_mou_7' 'loc_og_t2m_mou_8' 'loc_og_t2f_mou_6'
 'loc_og_t2f_mou_7' 'loc_og_t2f_mou_8' 'loc_og_t2c_mou_6'
 'loc_og_t2c_mou_7' 'loc_og_t2c_mou_8' 'loc_og_mou_6' 'loc_og_mou_7'
 'loc_og_mou_8' 'std_og_t2t_mou_6' 'std_og_t2t_mou_7' 'std_og_t2t_mou_8'
 'std_og_t2m_mou_6' 'std_og_t2m_mou_7' 'std_og_t2m_mou_8'
 'std_og_t2f_mou_6' 'std_og_t2f_mou_7' 'std_og_t2f_mou_8' 'std_og_mou_6'
 'std_og_mou_7' 'std_og_mou_8' 'isd_og_mou_6' 'isd_og_mou_7'
 'isd_og_mou_8' 'spl_og_mou_6' 'spl_og_mou_7' 'spl_og_mou_8' 'og_others_6'
 'og_others_7' 'og_others_8' 'loc_ic_t2t_mou_6' 'loc_ic_t2t_mou_7'
 'loc_ic_t2t_mou_8' 'loc_ic_t2m_mou_6' 'loc_ic_t2m_mou_7'
 'loc_ic_t2m_mou_8' 'loc_ic_t2f_mou_6' 'loc_ic_t2f_mou_7'
 'loc_ic_t2f_mou_8' 'loc_ic_mou_6' 'loc_ic_mou_7' 'loc_ic_mou_8'
 'std_ic_t2t_mou_6' 'std_ic_t2t_mou_7' 'std_ic_t2t_mou_8'
 'std_ic_t2m_mou_6' 'std_ic_t2m_mou_7' 'std_ic_t2m_mou_8'
 'std_ic_t2f_mou_6' 'std_ic_t2f_mou_7' 'std_ic_t2f_mou_8' 'std_ic_mou_6'
 'std_ic_mou_7' 'std_ic_mou_8' 'spl_ic_mou_6' 'spl_ic_mou_7'
 'spl_ic_mou_8' 'isd_ic_mou_6' 'isd_ic_mou_7' 'isd_ic_mou_8' 'ic_others_6'
 'ic_others_7' 'ic_others_8' 'total_rech_num_6' 'total_rech_num_7'
 'total_rech_num_8' 'max_rech_amt_6' 'max_rech_amt_7' 'max_rech_amt_8'
 'last_day_rch_amt_6' 'last_day_rch_amt_7' 'last_day_rch_amt_8'
 'Average_rech_amt_6n7' 'delta_vol_2g' 'delta_vol_3g' 'delta_total_og_mou'
 'delta_total_ic_mou' 'delta_vbc_3g' 'delta_arpu' 'delta_total_rech_amt']
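The rule above flags a column when its maximum is more than 100% above its 99th percentile. Columns such as og_others_7 and og_others_8 are flagged through an infinite change, because their 99th percentile is 0.00 in the quantile table. For such columns, a cap at the 99th percentile sets every positive value to zero, so it may be worth listing them separately before treating them. A rough sketch on toy data (all names and values here are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 'mostly_zero' has a 99th percentile of exactly 0
toy = pd.DataFrame({
    "usage": np.arange(1.0, 1001.0),
    "mostly_zero": np.r_[np.zeros(995), [5.0, 10.0, 20.0, 40.0, 80.0]],
})

q99 = toy.quantile(0.99)
q100 = toy.quantile(1.0)

# Percentage change from the 99th percentile to the maximum;
# this is inf where the 99th percentile is 0 and the maximum is positive
pct_change_99_1 = (q100 / q99 - 1) * 100

flagged = pct_change_99_1[pct_change_99_1 > 100].index.tolist()
zero_threshold = q99[q99 == 0].index.tolist()

print("Flagged columns:", flagged)
print("Flagged with a zero threshold:", [c for c in flagged if c in zero_threshold])
```

Whether zeroing such sparse columns is acceptable depends on how informative their few non-zero values are; the sketch only makes the affected columns visible.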
# Capping outliers to the 99th percentile values
outlier_rows = []
for col in columns_with_outliers:
    outlier_threshold = data[col].quantile(0.99)
    condition = data[col] > outlier_threshold
    # DataFrame.append was removed in pandas 2.0, so the summary rows are collected in a list
    outlier_rows.append({'Column': col, 'Outlier Threshold': outlier_threshold, 'Outliers replaced': int(condition.sum())})
    data.loc[condition, col] = outlier_threshold
outlier_treatment = pd.DataFrame(outlier_rows, columns=['Column', 'Outlier Threshold', 'Outliers replaced'])
outlier_treatment
 | Column | Outlier Threshold | Outliers replaced |
0 | onnet_mou_6 | 2166.37 | 301 |
1 | onnet_mou_7 | 2220.37 | 301 |
2 | onnet_mou_8 | 2188.50 | 301 |
3 | offnet_mou_6 | 2326.29 | 301 |
4 | offnet_mou_7 | 2410.10 | 301 |
5 | offnet_mou_8 | 2211.64 | 301 |
6 | roam_ic_mou_6 | 349.35 | 301 |
7 | roam_ic_mou_7 | 292.54 | 301 |
8 | roam_ic_mou_8 | 288.49 | 301 |
9 | roam_og_mou_6 | 543.71 | 301 |
10 | roam_og_mou_7 | 448.13 | 301 |
11 | roam_og_mou_8 | 432.74 | 301 |
12 | loc_og_t2t_mou_6 | 1076.24 | 301 |
13 | loc_og_t2t_mou_7 | 1059.88 | 301 |
14 | loc_og_t2t_mou_8 | 956.50 | 301 |
15 | loc_og_t2m_mou_6 | 1147.05 | 301 |
16 | loc_og_t2m_mou_7 | 1112.66 | 301 |
17 | loc_og_t2m_mou_8 | 1092.59 | 301 |
18 | loc_og_t2f_mou_6 | 90.88 | 301 |
19 | loc_og_t2f_mou_7 | 91.06 | 301 |
20 | loc_og_t2f_mou_8 | 86.68 | 300 |
21 | loc_og_t2c_mou_6 | 24.86 | 301 |
22 | loc_og_t2c_mou_7 | 28.24 | 301 |
23 | loc_og_t2c_mou_8 | 28.87 | 301 |
24 | loc_og_mou_6 | 1806.94 | 301 |
25 | loc_og_mou_7 | 1761.43 | 301 |
26 | loc_og_mou_8 | 1689.07 | 301 |
27 | std_og_t2t_mou_6 | 1885.20 | 301 |
28 | std_og_t2t_mou_7 | 1919.19 | 301 |
29 | std_og_t2t_mou_8 | 1938.13 | 301 |
30 | std_og_t2m_mou_6 | 1955.61 | 301 |
31 | std_og_t2m_mou_7 | 2112.66 | 301 |
32 | std_og_t2m_mou_8 | 1905.81 | 301 |
33 | std_og_t2f_mou_6 | 44.39 | 301 |
34 | std_og_t2f_mou_7 | 43.89 | 301 |
35 | std_og_t2f_mou_8 | 38.88 | 301 |
36 | std_og_mou_6 | 2744.49 | 301 |
37 | std_og_mou_7 | 2874.65 | 301 |
38 | std_og_mou_8 | 2800.87 | 301 |
39 | isd_og_mou_6 | 41.25 | 301 |
40 | isd_og_mou_7 | 40.43 | 301 |
41 | isd_og_mou_8 | 31.24 | 300 |
42 | spl_og_mou_6 | 71.36 | 301 |
43 | spl_og_mou_7 | 79.87 | 301 |
44 | spl_og_mou_8 | 74.11 | 301 |
45 | og_others_6 | 9.31 | 301 |
46 | og_others_7 | 0.00 | 164 |
47 | og_others_8 | 0.00 | 180 |
48 | loc_ic_t2t_mou_6 | 625.35 | 301 |
49 | loc_ic_t2t_mou_7 | 648.79 | 301 |
50 | loc_ic_t2t_mou_8 | 621.67 | 301 |
51 | loc_ic_t2m_mou_6 | 1026.44 | 301 |
52 | loc_ic_t2m_mou_7 | 1009.29 | 301 |
53 | loc_ic_t2m_mou_8 | 976.09 | 301 |
54 | loc_ic_t2f_mou_6 | 197.17 | 301 |
55 | loc_ic_t2f_mou_7 | 205.25 | 301 |
56 | loc_ic_t2f_mou_8 | 185.62 | 301 |
57 | loc_ic_mou_6 | 1484.99 | 301 |
58 | loc_ic_mou_7 | 1515.87 | 301 |
59 | loc_ic_mou_8 | 1459.55 | 301 |
60 | std_ic_t2t_mou_6 | 215.64 | 301 |
61 | std_ic_t2t_mou_7 | 231.15 | 301 |
62 | std_ic_t2t_mou_8 | 215.20 | 301 |
63 | std_ic_t2m_mou_6 | 393.73 | 301 |
64 | std_ic_t2m_mou_7 | 408.58 | 301 |
65 | std_ic_t2m_mou_8 | 372.61 | 301 |
66 | std_ic_t2f_mou_6 | 53.39 | 301 |
67 | std_ic_t2f_mou_7 | 56.59 | 300 |
68 | std_ic_t2f_mou_8 | 49.41 | 301 |
69 | std_ic_mou_6 | 577.89 | 301 |
70 | std_ic_mou_7 | 616.89 | 301 |
71 | std_ic_mou_8 | 563.89 | 301 |
72 | spl_ic_mou_6 | 0.68 | 278 |
73 | spl_ic_mou_7 | 0.51 | 295 |
74 | spl_ic_mou_8 | 0.61 | 293 |
75 | isd_ic_mou_6 | 239.60 | 301 |
76 | isd_ic_mou_7 | 240.13 | 301 |
77 | isd_ic_mou_8 | 249.89 | 301 |
78 | ic_others_6 | 20.71 | 301 |
79 | ic_others_7 | 25.26 | 301 |
80 | ic_others_8 | 21.53 | 300 |
81 | total_rech_num_6 | 48.00 | 283 |
82 | total_rech_num_7 | 48.00 | 283 |
83 | total_rech_num_8 | 46.00 | 287 |
84 | max_rech_amt_6 | 1000.00 | 169 |
85 | max_rech_amt_7 | 1000.00 | 204 |
86 | max_rech_amt_8 | 951.00 | 289 |
87 | last_day_rch_amt_6 | 655.00 | 284 |
88 | last_day_rch_amt_7 | 655.00 | 300 |
89 | last_day_rch_amt_8 | 619.00 | 283 |
90 | Average_rech_amt_6n7 | 2216.30 | 301 |
91 | delta_vol_2g | 654.31 | 301 |
92 | delta_vol_3g | 1878.12 | 301 |
93 | delta_total_og_mou | 1465.10 | 301 |
94 | delta_total_ic_mou | 619.69 | 301 |
95 | delta_vbc_3g | 929.64 | 301 |
96 | delta_arpu | 864.34 | 301 |
97 | delta_total_rech_amt | 1036.40 | 301 |
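The capping step can also be expressed as a clip at each column's own 99th percentile. The following is a minimal sketch on toy data, not the notebook's code: the function name `cap_at_quantile` and the frame `toy` are hypothetical and introduced only for illustration.

```python
import pandas as pd

def cap_at_quantile(df, columns, q=0.99):
    """Return a capped copy of df plus the per-column thresholds used."""
    capped = df.copy()
    thresholds = capped[columns].quantile(q)  # one threshold per column
    # axis=1 aligns the thresholds with the column labels
    capped[columns] = capped[columns].clip(upper=thresholds, axis=1)
    return capped, thresholds

# Toy data (hypothetical): 99 ordinary values and one extreme value per column
toy = pd.DataFrame({
    "a": [float(i) for i in range(1, 100)] + [10_000.0],
    "b": [0.5] * 99 + [500.0],
})
capped, thresholds = cap_at_quantile(toy, ["a", "b"])
print(thresholds)
print(capped.max())
```

Unlike the in-place loop, this version leaves the input frame untouched, which makes it easier to compare the distributions before and after capping.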
categorical = data.dtypes == 'category'
categorical_vars = data.columns[categorical].to_list()
ind_categorical_vars = set(categorical_vars) - {'Churn'}  # independent categorical variables
ind_categorical_vars
{'monthly_2g_6',
 'monthly_2g_7',
 'monthly_2g_8',
 'monthly_3g_6',
 'monthly_3g_7',
 'monthly_3g_8',
 'sachet_2g_6',
 'sachet_2g_7',
 'sachet_2g_8',
 'sachet_3g_6',
 'sachet_3g_7',
 'sachet_3g_8'}
Grouping Categories with Low Contribution
# Find categories that contribute at most 1% of rows in each column and group them into "Others"
for col in ind_categorical_vars:
    category_counts = 100 * data[col].value_counts(normalize=True)
    print('\n', tabulate(pd.DataFrame(category_counts), headers='keys', tablefmt='psql'), '\n')
    low_count_categories = category_counts[category_counts <= 1].index.to_list()
    print(f"Replaced {low_count_categories} in {col} with category : Others")
    # Assign the result back; an in-place replace on a selected column may not update `data`
    data[col] = data[col].replace(low_count_categories, 'Others')
+----+---------------+
|    | sachet_3g_6   |
|----+---------------|
| 0  | 93.4091       |
| 1  | 4.35507       |
| 2  | 1.04295       |
| 3  | 0.396521      |
| 4  | 0.219919      |
| 5  | 0.123288      |
| 6  | 0.089967      |
| 7  | 0.0866349     |
| 8  | 0.0499817     |
| 9  | 0.0499817     |
| 10 | 0.0366532     |
| 11 | 0.0266569     |
| 15 | 0.0166606     |
| 12 | 0.0133284     |
| 19 | 0.0133284     |
| 13 | 0.00999633    |
| 14 | 0.00999633    |
| 18 | 0.00999633    |
| 23 | 0.00999633    |
| 16 | 0.00666422    |
| 22 | 0.00666422    |
| 29 | 0.00666422    |
| 28 | 0.00333211    |
| 17 | 0.00333211    |
| 21 | 0.00333211    |
+----+---------------+

Replaced [3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 12, 19, 13, 14, 18, 23, 16, 22, 29, 28, 17, 21] in sachet_3g_6 with category : Others

+----+---------------+
|    | sachet_2g_6   |
|----+---------------|
| 0  | 82.5631       |
| 1  | 7.87378       |
| 2  | 3.3621        |
| 3  | 2.0126        |
| 4  | 1.32951       |
| 5  | 0.703076      |
| 6  | 0.509813      |
| 7  | 0.356536      |
| 8  | 0.286562      |
| 9  | 0.239912      |
| 10 | 0.17327       |
| 12 | 0.146613      |
| 11 | 0.0999633     |
| 13 | 0.0566459     |
| 14 | 0.0533138     |
| 15 | 0.0433175     |
| 17 | 0.0366532     |
| 18 | 0.029989      |
| 19 | 0.029989      |
| 16 | 0.0233248     |
| 22 | 0.0133284     |
| 20 | 0.00999633    |
| 21 | 0.00999633    |
| 24 | 0.00999633    |
| 25 | 0.00999633    |
| 39 | 0.00333211    |
| 27 | 0.00333211    |
| 30 | 0.00333211    |
| 32 | 0.00333211    |
| 34 | 0.00333211    |
| 28 | 0             |
| 42 | 0             |
+----+---------------+

Replaced [5, 6, 7, 8, 9, 10, 12, 11, 13, 14, 15, 17, 18, 19, 16, 22, 20, 21, 24, 25, 39, 27, 30, 32, 34, 28, 42] in sachet_2g_6 with category : Others

+----+----------------+
|    | monthly_2g_7   |
|----+----------------|
| 0  | 88.4876        |
| 1  | 10.0397        |
| 2  | 1.35284        |
| 3  | 0.0966312      |
| 4  | 0.0166606      |
| 5  | 0.00666422     |
+----+----------------+

Replaced [3, 4, 5] in monthly_2g_7 with category : Others

+----+---------------+
|    | sachet_2g_7   |
|----+---------------|
| 0  | 81.8033       |
| 1  | 7.24068       |
| 2  | 3.34877       |
| 3  | 1.96595       |
| 4  | 1.50945       |
| 5  | 1.20622       |
| 6  | 0.843024      |
| 7  | 0.543134      |
| 8  | 0.403185      |
| 10 | 0.239912      |
| 9  | 0.219919      |
| 11 | 0.159941      |
| 12 | 0.0966312     |
| 14 | 0.0799707     |
| 13 | 0.0666422     |
| 15 | 0.0499817     |
| 16 | 0.0366532     |
| 18 | 0.0333211     |
| 17 | 0.029989      |
| 20 | 0.0266569     |
| 19 | 0.0233248     |
| 21 | 0.00999633    |
| 26 | 0.00999633    |
| 27 | 0.00999633    |
| 22 | 0.00666422    |
| 23 | 0.00666422    |
| 30 | 0.00666422    |
| 42 | 0.00333211    |
| 24 | 0.00333211    |
| 25 | 0.00333211    |
| 29 | 0.00333211    |
| 32 | 0.00333211    |
| 35 | 0.00333211    |
| 48 | 0.00333211    |
| 28 | 0             |
+----+---------------+

Replaced [6, 7, 8, 10, 9, 11, 12, 14, 13, 15, 16, 18, 17, 20, 19, 21, 26, 27, 22, 23, 30, 42, 24, 25, 29, 32, 35, 48, 28] in sachet_2g_7 with category : Others

+----+----------------+
|    | monthly_2g_6   |
|----+----------------|
| 0  | 88.9074        |
| 1  | 9.83306        |
| 2  | 1.14958        |
| 3  | 0.0866349      |
| 4  | 0.0233248      |
+----+----------------+

Replaced [3, 4] in monthly_2g_6 with category : Others

+----+---------------+
|    | sachet_3g_7   |
|----+---------------|
| 0  | 93.4757       |
| 1  | 4.10849       |
| 2  | 1.03962       |
| 3  | 0.383193      |
| 4  | 0.239912      |
| 5  | 0.219919      |
| 6  | 0.139949      |
| 7  | 0.059978      |
| 9  | 0.0533138     |
| 8  | 0.0466496     |
| 11 | 0.0433175     |
| 10 | 0.0333211     |
| 12 | 0.0333211     |
| 15 | 0.0166606     |
| 14 | 0.0166606     |
| 13 | 0.0133284     |
| 18 | 0.0133284     |
| 19 | 0.00999633    |
| 20 | 0.00999633    |
| 22 | 0.00999633    |
| 17 | 0.00666422    |
| 21 | 0.00666422    |
| 24 | 0.00666422    |
| 33 | 0.00333211    |
| 16 | 0.00333211    |
| 31 | 0.00333211    |
| 35 | 0.00333211    |
+----+---------------+

Replaced [3, 4, 5, 6, 7, 9, 8, 11, 10, 12, 15, 14, 13, 18, 19, 20, 22, 17, 21, 24, 33, 16, 31, 35] in sachet_3g_7 with category : Others

+----+---------------+
|    | sachet_3g_8   |
|----+---------------|
| 0  | 94.2388       |
| 1  | 3.52537       |
| 2  | 0.839692      |
| 3  | 0.429842      |
| 4  | 0.243244      |
| 5  | 0.219919      |
| 6  | 0.0866349     |
| 7  | 0.0766386     |
| 8  | 0.0733065     |
| 9  | 0.0399853     |
| 12 | 0.0366532     |
| 13 | 0.0333211     |
| 10 | 0.0333211     |
| 11 | 0.0199927     |
| 14 | 0.0199927     |
| 15 | 0.0166606     |
| 16 | 0.00999633    |
| 17 | 0.00666422    |
| 18 | 0.00666422    |
| 20 | 0.00666422    |
| 21 | 0.00666422    |
| 23 | 0.00666422    |
| 38 | 0.00333211    |
| 19 | 0.00333211    |
| 25 | 0.00333211    |
| 27 | 0.00333211    |
| 29 | 0.00333211    |
| 30 | 0.00333211    |
| 41 | 0.00333211    |
+----+---------------+

Replaced [2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15, 16, 17, 18, 20, 21, 23, 38, 19, 25, 27, 29, 30, 41] in sachet_3g_8 with category : Others

+----+----------------+
|    | monthly_3g_6   |
|----+----------------|
| 0  | 88.0744        |
| 1  | 8.4669         |
| 2  | 2.32248        |
| 3  | 0.689747       |
| 4  | 0.246576       |
| 5  | 0.106628       |
| 6  | 0.0366532      |
| 7  | 0.029989       |
| 8  | 0.00999633     |
| 11 | 0.00666422     |
| 9  | 0.00666422     |
| 14 | 0.00333211     |
+----+----------------+

Replaced [3, 4, 5, 6, 7, 8, 11, 9, 14] in monthly_3g_6 with category : Others

+----+----------------+
|    | monthly_2g_8   |
|----+----------------|
| 0  | 89.7604        |
| 1  | 9.19996        |
| 2  | 0.942988       |
| 3  | 0.0733065      |
| 4  | 0.0166606      |
| 5  | 0.00666422     |
+----+----------------+

Replaced [2, 3, 4, 5] in monthly_2g_8 with category : Others

+----+---------------+
|    | sachet_2g_8   |
|----+---------------|
| 0  | 79.7274       |
| 1  | 8.87008       |
| 2  | 3.25881       |
| 3  | 2.19253       |
| 4  | 1.81267       |
| 5  | 1.44947       |
| 6  | 0.88301       |
| 7  | 0.459831      |
| 8  | 0.313218      |
| 9  | 0.249908      |
| 10 | 0.169938      |
| 11 | 0.123288      |
| 12 | 0.113292      |
| 14 | 0.0766386     |
| 15 | 0.0566459     |
| 13 | 0.0499817     |
| 16 | 0.0433175     |
| 18 | 0.0266569     |
| 17 | 0.0233248     |
| 19 | 0.0233248     |
| 20 | 0.0133284     |
| 34 | 0.00666422    |
| 29 | 0.00666422    |
| 27 | 0.00666422    |
| 24 | 0.00666422    |
| 22 | 0.00666422    |
| 21 | 0.00666422    |
| 23 | 0.00333211    |
| 25 | 0.00333211    |
| 26 | 0.00333211    |
| 31 | 0.00333211    |
| 32 | 0.00333211    |
| 33 | 0.00333211    |
| 44 | 0.00333211    |
+----+---------------+

Replaced [6, 7, 8, 9, 10, 11, 12, 14, 15, 13, 16, 18, 17, 19, 20, 34, 29, 27, 24, 22, 21, 23, 25, 26, 31, 32, 33, 44] in sachet_2g_8 with category : Others

+----+----------------+
|    | monthly_3g_8   |
|----+----------------|
| 0  | 88.3876        |
| 1  | 8.00706        |
| 2  | 2.45243        |
| 3  | 0.656426       |
| 4  | 0.289894       |
| 5  | 0.0999633      |
| 6  | 0.0466496      |
| 7  | 0.029989       |
| 9  | 0.00999633     |
| 8  | 0.00999633     |
| 10 | 0.00666422     |
| 16 | 0.00333211     |
+----+----------------+

Replaced [3, 4, 5, 6, 7, 9, 8, 10, 16] in monthly_3g_8 with category : Others

+----+----------------+
|    | monthly_3g_7   |
|----+----------------|
| 0  | 87.8378        |
| 1  | 8.21699        |
| 2  | 2.739          |
| 3  | 0.689747       |
| 4  | 0.226584       |
| 5  | 0.129952       |
| 6  | 0.0766386      |
| 7  | 0.0333211      |
| 8  | 0.0166606      |
| 9  | 0.0133284      |
| 11 | 0.00666422     |
| 16 | 0.00333211     |
| 14 | 0.00333211     |
| 12 | 0.00333211     |
| 10 | 0.00333211     |
+----+----------------+

Replaced [3, 4, 5, 6, 7, 8, 9, 11, 16, 14, 12, 10] in monthly_3g_7 with category : Others
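The grouping rule used above (any category with a share of at most 1% is merged into "Others") can be isolated into a small helper. This is a hedged sketch on toy data; the helper `group_rare_categories` and the series `toy` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

def group_rare_categories(series, min_pct=1.0, other_label="Others"):
    """Merge categories whose share is <= min_pct percent into other_label."""
    shares = 100 * series.value_counts(normalize=True)
    rare = shares[shares <= min_pct].index.tolist()
    # Switch to object dtype so the new label can be introduced, then re-encode
    grouped = series.astype(object).where(~series.isin(rare), other_label)
    return grouped.astype("category")

# Toy column (hypothetical): 0 dominates, 3 and 4 each fall below the 1% cut-off
toy = pd.Series([0] * 950 + [1] * 40 + [3] * 6 + [4] * 4, dtype="category")
grouped = group_rare_categories(toy)
print(grouped.value_counts())
```

Going through object dtype avoids depending on how `replace` treats categorical columns, which has changed between pandas versions.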
Creating Dummy Variables
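# pandas needs a list (not a set) for column selection; one list keeps columns and prefixes in the same order
categorical_cols = list(ind_categorical_vars)
# dtype=int yields 0/1 indicators (newer pandas versions default to booleans)
dummy_vars = pd.get_dummies(data[categorical_cols], drop_first=False, prefix=categorical_cols, prefix_sep='_', dtype=int)
dummy_vars.head()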
mobile_number | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | sachet_3g_6_Others | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | sachet_2g_6_Others | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_7_Others | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_2g_7_Others | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | monthly_2g_6_Others | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | sachet_3g_7_Others | sachet_3g_8_0 | sachet_3g_8_1 | sachet_3g_8_Others | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | monthly_3g_6_Others | monthly_2g_8_0 | monthly_2g_8_1 | monthly_2g_8_Others | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | sachet_2g_8_Others | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_8_Others | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | monthly_3g_7_Others |
7000701601 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7001524846 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7002191713 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7000875565 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7000187447 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
# Use the 'Others' category of each column as the reference level
reference_cols = dummy_vars.filter(regex='.*Others$').columns.to_list()
dummy_vars.drop(columns=reference_cols, inplace=True)
reference_cols
['sachet_3g_6_Others',
 'sachet_2g_6_Others',
 'monthly_2g_7_Others',
 'sachet_2g_7_Others',
 'monthly_2g_6_Others',
 'sachet_3g_7_Others',
 'sachet_3g_8_Others',
 'monthly_3g_6_Others',
 'monthly_2g_8_Others',
 'sachet_2g_8_Others',
 'monthly_3g_8_Others',
 'monthly_3g_7_Others']
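Dropping the "Others" indicator by name, rather than passing `drop_first=True`, makes "Others" the reference level for every variable; `drop_first=True` would instead remove the first category, which here is the dominant value 0. A minimal sketch of the idea on a hypothetical one-column frame (`toy` is not part of the notebook):

```python
import pandas as pd

# Toy frame (hypothetical), mimicking one grouped column from the notebook
toy = pd.DataFrame({"sachet_3g_6": pd.Categorical([0, 1, 2, "Others", 0, 0])})

# One indicator column per category; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(toy, columns=["sachet_3g_6"], prefix_sep="_", dtype=int)

# Drop the 'Others' indicator so that 'Others' becomes the reference level
reference_cols = [c for c in dummies.columns if c.endswith("_Others")]
dummies = dummies.drop(columns=reference_cols)
print(dummies)
```

A row that belonged to "Others" is then encoded as all zeros across the remaining indicators of that variable.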
# Concatenate the dummy variables with the original 'data'
data.drop(columns=list(ind_categorical_vars), inplace=True)  # drop the original categorical columns
data = pd.concat([data, dummy_vars], axis=1)
data.head()
mobile_number | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | Churn | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | sachet_3g_8_0 | sachet_3g_8_1 | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | monthly_2g_8_0 | monthly_2g_8_1 | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 |
7000701601 | 57.84 | 54.68 | 52.29 | 453.43 | 567.16 | 325.91 | 16.23 | 33.49 | 31.64 | 23.74 | 12.59 | 38.06 | 51.39 | 31.38 | 40.28 | 308.63 | 447.38 | 162.28 | 62.13 | 55.14 | 53.23 | 0.0 | 0.0 | 0.00 | 422.16 | 533.91 | 255.79 | 4.30 | 23.29 | 12.01 | 49.89 | 31.76 | 49.14 | 6.66 | 20.08 | 16.68 | 60.86 | 75.14 | 77.84 | 0.0 | 0.18 | 10.01 | 4.50 | 0.00 | 6.50 | 0.00 | 0.0 | 0.0 | 58.14 | 32.26 | 27.31 | 217.56 | 221.49 | 121.19 | 152.16 | 101.46 | 39.53 | 427.88 | 355.23 | 188.04 | 36.89 | 11.83 | 30.39 | 91.44 | 126.99 | 141.33 | 52.19 | 34.24 | 22.21 | 180.54 | 173.08 | 193.94 | 0.21 | 0.0 | 0.0 | 2.06 | 14.53 | 31.59 | 15.74 | 15.19 | 15.14 | 5.0 | 5.0 | 7.0 | 1000.0 | 790.0 | 951.0 | 0.0 | 0.0 | 619.0 | 802 | 1185.0 | 1 | 0.00 | 0.00 | -198.22 | -163.51 | 38.68 | 864.34 | 1036.4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
7001524846 | 413.69 | 351.03 | 35.08 | 94.66 | 80.63 | 136.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 297.13 | 217.59 | 12.49 | 80.96 | 70.58 | 50.54 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 7.15 | 378.09 | 288.18 | 63.04 | 116.56 | 133.43 | 22.58 | 13.69 | 10.04 | 75.69 | 0.00 | 0.00 | 0.00 | 130.26 | 143.48 | 98.28 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 10.23 | 0.00 | 0.0 | 0.0 | 23.84 | 9.84 | 0.31 | 57.58 | 13.98 | 15.48 | 0.00 | 0.00 | 0.00 | 81.43 | 23.83 | 15.79 | 0.00 | 0.58 | 0.10 | 22.43 | 4.08 | 0.65 | 0.00 | 0.00 | 0.00 | 22.43 | 4.66 | 0.75 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 21.0 | 14.0 | 90.0 | 154.0 | 30.0 | 50.0 | 0.0 | 10.0 | 315 | 519.0 | 0 | -177.97 | -363.54 | -298.45 | -49.63 | -495.38 | -298.11 | -399.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
7002191713 | 501.76 | 108.39 | 534.24 | 413.31 | 119.28 | 482.46 | 23.53 | 144.24 | 72.11 | 7.98 | 35.26 | 1.44 | 49.63 | 6.19 | 36.01 | 151.13 | 47.28 | 294.46 | 4.54 | 0.00 | 23.51 | 0.0 | 0.0 | 0.49 | 205.31 | 53.48 | 353.99 | 446.41 | 85.98 | 498.23 | 255.36 | 52.94 | 156.94 | 0.00 | 0.00 | 0.00 | 701.78 | 138.93 | 655.18 | 0.0 | 0.00 | 1.29 | 0.00 | 0.00 | 4.78 | 0.00 | 0.0 | 0.0 | 67.88 | 7.58 | 52.58 | 142.88 | 18.53 | 195.18 | 4.81 | 0.00 | 7.49 | 215.58 | 26.11 | 255.26 | 115.68 | 38.29 | 154.58 | 308.13 | 29.79 | 317.91 | 0.00 | 0.00 | 1.91 | 423.81 | 68.09 | 474.41 | 0.45 | 0.0 | 0.0 | 239.60 | 62.11 | 249.89 | 20.71 | 16.24 | 21.44 | 6.0 | 4.0 | 11.0 | 110.0 | 110.0 | 130.0 | 110.0 | 50.0 | 0.0 | 2607 | 380.0 | 0 | 0.02 | 0.00 | 465.51 | 573.93 | 0.00 | 244.00 | 337.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
7000875565 | 50.51 | 74.01 | 70.61 | 296.29 | 229.74 | 162.76 | 0.00 | 2.83 | 0.00 | 0.00 | 17.74 | 0.00 | 42.61 | 65.16 | 67.38 | 273.29 | 145.99 | 128.28 | 0.00 | 4.48 | 10.26 | 0.0 | 0.0 | 0.00 | 315.91 | 215.64 | 205.93 | 7.89 | 2.58 | 3.23 | 22.99 | 64.51 | 18.29 | 0.00 | 0.00 | 0.00 | 30.89 | 67.09 | 21.53 | 0.0 | 0.00 | 0.00 | 0.00 | 3.26 | 5.91 | 0.00 | 0.0 | 0.0 | 41.33 | 71.44 | 28.89 | 226.81 | 149.69 | 150.16 | 8.71 | 8.68 | 32.71 | 276.86 | 229.83 | 211.78 | 68.79 | 78.64 | 6.33 | 18.68 | 73.08 | 73.93 | 0.51 | 0.00 | 2.18 | 87.99 | 151.73 | 82.44 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.00 | 10.0 | 6.0 | 2.0 | 110.0 | 110.0 | 130.0 | 100.0 | 100.0 | 130.0 | 511 | 459.0 | 0 | 0.00 | 0.00 | -83.03 | -78.75 | -12.17 | -177.53 | -299.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
7000187447 | 1185.91 | 9.28 | 7.79 | 61.64 | 0.00 | 5.54 | 0.00 | 4.76 | 4.81 | 0.00 | 8.46 | 13.34 | 38.99 | 0.00 | 0.00 | 58.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 97.54 | 0.00 | 0.00 | 1146.91 | 0.81 | 0.00 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1148.46 | 0.81 | 0.00 | 0.0 | 0.00 | 0.00 | 2.58 | 0.00 | 0.00 | 0.93 | 0.0 | 0.0 | 34.54 | 0.00 | 0.00 | 47.41 | 2.31 | 0.00 | 0.00 | 0.00 | 0.00 | 81.96 | 2.31 | 0.00 | 8.63 | 0.00 | 0.00 | 1.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.91 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 2.0 | 4.0 | 110.0 | 0.0 | 30.0 | 30.0 | 0.0 | 0.0 | 667 | 408.0 | 0 | 0.00 | 0.00 | -625.17 | -47.09 | 0.00 | -329.00 | -378.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
# Store the dummy columns with the 'category' dtype
dummy_cols = dummy_vars.columns.to_list()
data[dummy_cols] = data[dummy_cols].astype('category')
data.shape
(30011, 142)
data.reset_index('mobile_number').to_csv('cleaned_churn_data.csv')
Continue to Part-2 for Modelling