
Telecom Churn Case Study - Part 1

Jayanth Boddu
December 4th, 2020 · 8 min read

Photo by Mike Kononov on Unsplash

This analysis is the combined effort of Umaer and me.

Telecom Churn Case Study

Analysis Approach :

  • The telecommunications industry experiences an average annual churn rate of 15-25%. Because it costs 5-10 times more to acquire a new customer than to retain an existing one, customer retention has become even more important than customer acquisition.

  • We are given four months of customer usage data. In this case study, we analyse customer-level data of a leading telecom firm, build predictive models to identify customers at high risk of churn, and identify the main indicators of churn.

  • Churn can be defined in two ways: usage-based churn and revenue-based churn. Usage-based churners are defined as:

  • Customers who have had zero usage, either incoming or outgoing, in terms of calls, internet etc. over a period of time.

  • This case study only considers usage-based churn.

  • In the Indian and the southeast Asian market, approximately 80% of revenue comes from the top 20% customers (called high-value customers). Thus, if we can reduce churn of the high-value customers, we will be able to reduce significant revenue leakage. Hence, this case study focuses on high value customers only.

  • The dataset contains customer-level information for a span of four consecutive months - June, July, August and September. The months are encoded as 6, 7, 8 and 9, respectively.

  • The business objective is to predict the churn in the last (i.e. the ninth) month using the data (features) from the first three months.

  • This is a classification problem, where we need to predict whether a customer is about to churn or not. We built a baseline Logistic Regression model, followed by Logistic Regression with PCA, PCA + Random Forest and PCA + XGBoost.
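
The usage-based definition above can be turned into a target label roughly as follows. This is a minimal sketch, assuming churn is tagged from the month-9 totals `total_ic_mou_9`, `total_og_mou_9`, `vol_2g_mb_9` and `vol_3g_mb_9`. These columns exist in the dataset, but the exact tagging rule is an assumption here, and the toy values and the helper name `tag_usage_churn` are made up for illustration.

```python
import pandas as pd

# Toy data: made-up month-9 usage for three hypothetical customers.
toy = pd.DataFrame(
    {
        "total_ic_mou_9": [0.0, 120.5, 0.0],
        "total_og_mou_9": [0.0, 80.0, 0.0],
        "vol_2g_mb_9": [0.0, 0.0, 15.2],
        "vol_3g_mb_9": [0.0, 0.0, 0.0],
    },
    index=pd.Index([1, 2, 3], name="mobile_number"),
)

# Assumed set of usage columns for the churn month (month 9).
usage_cols = ["total_ic_mou_9", "total_og_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]


def tag_usage_churn(df, cols):
    """Return 1 where every usage column is zero in the churn month, else 0."""
    return (df[cols] == 0).all(axis=1).astype(int)


toy["churn"] = tag_usage_churn(toy, usage_cols)
print(toy["churn"].tolist())
```

Only the first customer has no calls and no data usage, so only that row is tagged as churn. The third customer made no calls but used 2g data, so they are not tagged.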

Analysis Steps

Data Cleaning and EDA

  1. We started by importing the necessary packages and libraries.
  2. We loaded the dataset into a dataframe.
  3. We checked the number of columns, their data types, null counts and unique value counts to get some understanding of the data and to check whether the columns have the correct data types.
  4. We checked for duplicate records (rows) in the data. There were no duplicates.
  5. Since ‘mobile_number’ is the unique identifier available, we made it the index to retain the identity.
  6. Some columns did not follow the naming standard, so we renamed them to make sure all the variables follow the same naming convention.
  7. After renaming, we converted the columns to their respective data types. Columns with 29 or fewer unique values were treated as categorical and the rest as continuous.
  8. The date columns had ‘object’ as their data type, so we converted them to the proper datetime format.
  9. Since our analysis is focused on high value customers (HVC), we filtered for them before carrying out further analysis. A customer is considered high value if their ‘Average_rech_amt’ over months 6 and 7 is greater than or equal to the 70th percentile of ‘Average_rech_amt’.
  10. Checked for missing values.
  11. Dropped all the columns with more than 50% missing values.
  12. We have been given four months of data. Since each month's revenue and usage data is not related to the others, we did a month-wise drill-down on missing values.
  13. Some columns had a similar proportion of missing values, so we looked at their related columns and checked whether they could be imputed with zero.
  14. The ‘last_date_of_month’ columns had some missing values. Since the value is determined by the month, we imputed the last date based on the month.
  15. Some columns had only one unique value. They are of no use for the analysis, so we dropped them.
  16. After completing the data preparation tasks, we tagged the churn variable (our target variable).
  17. After imputing, we dropped the churn phase columns (columns belonging to month 9).
  18. After all the above processing, we retained 30,011 rows and 126 columns.
  19. Exploratory Data Analysis
      • The telecom company has many users with negative average revenues in both phases. These users are likely to churn.
      • Most customers prefer the plans of the ‘0’ category.
      • Customers with lower ‘aon’ are more likely to churn than customers with higher ‘aon’.
      • Revenue generated by customers who are about to churn is very unstable.
      • Customers whose arpu decreases in the 7th month are more likely to churn than those whose arpu increases.
      • Customers with high total_og_mou in the 6th month and lower total_og_mou in the 7th month are more likely to churn than the rest.
      • Customers with a decrease in total_ic_mou in the 7th month are more likely to churn than the rest.
      • Customers with stable 2g volume usage throughout months 6 and 7 are less likely to churn.
      • Customers whose 2g volume usage falls in the 7th month are more likely to churn.
      • Customers with stable 3g volume usage throughout months 6 and 7 are less likely to churn.
      • Customers whose 3g volume consumption falls in the 7th month are more likely to churn.
      • Customers with lower total_og_mou in the 6th and 8th months are more likely to churn than those with higher total_og_mou.
      • Customers with lower total_og_mou_8 and aon are more likely to churn than those with higher total_og_mou_8 and aon.
      • Customers with low total_ic_mou_8 are more likely to churn irrespective of aon.
      • Customers with total_ic_mou_8 > 2000 are very unlikely to churn.
  20. Correlation analysis was performed.
  21. We created derived variables and then removed the variables that were used to derive them.
  22. Outlier treatment was performed. We looked at the quantiles to understand the spread of the data.
  23. We capped the upper outliers at the 99th percentile.
  24. We checked the categorical variables and the contribution of each class in those variables. Classes with a small contribution were grouped into ‘Others’.
  25. Dummy variables were created.
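
Two of the steps above, capping upper outliers at the 99th percentile and grouping low-contribution classes into ‘Others’, can be sketched as below. This is an illustrative sketch on made-up data: the helper names and the `min_share` cutoff are assumptions, since the write-up does not state what share counts as a small contribution.

```python
import pandas as pd


def cap_upper_outliers(series, q=0.99):
    """Cap values above the q-th quantile at that quantile."""
    return series.clip(upper=series.quantile(q))


def group_rare_classes(series, min_share, other_label="Others"):
    """Replace classes whose relative frequency is below min_share."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    return series.where(~series.isin(rare), other_label)


# Made-up usage values with one extreme observation.
usage = pd.Series(list(range(1, 100)) + [10_000], dtype=float)
capped = cap_upper_outliers(usage)

# Made-up plan categories: class "2" covers only 1% of rows.
plans = pd.Series(["0"] * 95 + ["1"] * 4 + ["2"])
grouped = group_rare_classes(plans, min_share=0.02)  # 0.02 is a hypothetical cutoff

print(usage.max(), capped.max())
print(grouped.value_counts().to_dict())
```

Capping keeps every row and only pulls the extreme value down to the 99th percentile, which avoids discarding customers while limiting the influence of outliers.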

Pre-processing Steps

  1. A train-test split was performed.
  2. The data has high class imbalance, with a ratio of 0.095 (class 1 : class 0).
  3. The SMOTE technique was used to overcome the class imbalance.
  4. Predictor columns were standardized to mean 0 and standard deviation 1.
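
The order of these steps matters: oversampling and scaling statistics should come from the training split only. The sketch below illustrates the idea with NumPy on made-up data. The `smote` function is a simplified, hand-written version of the technique (interpolating between a minority row and one of its nearest minority neighbours), not the implementation used in this analysis; a library such as imbalanced-learn provides a tested `SMOTE`. All names and sizes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 200 customers, 3 features, 10% churners.
X = rng.normal(size=(200, 3))
y = np.zeros(200, dtype=int)
y[:20] = 1


def stratified_split(y, test_size, rng):
    """Split indices so each class keeps the same test proportion."""
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


def smote(X_min, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


train_idx, test_idx = stratified_split(y, test_size=0.3, rng=rng)
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Oversample only the training set, so the test set keeps the real class ratio.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_res = np.vstack([X_train, smote(X_min, n_new, k=5, rng=rng)])
y_res = np.concatenate([y_train, np.ones(n_new, dtype=int)])

# Standardise with statistics estimated on the resampled training data only.
mean, std = X_res.mean(axis=0), X_res.std(axis=0)
X_res_scaled = (X_res - mean) / std
X_test_scaled = (X_test - mean) / std

print(np.bincount(y_train), np.bincount(y_res), np.bincount(y_test))
```

After resampling, both classes have the same number of training rows, while the test split is left at its original imbalance.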

Modelling

Model 1 : Logistic Regression with RFE & Manual Elimination ( Interpretable Model )
The most important predictors of churn and their coefficients, in order of importance (absolute coefficient size), are as follows:

  • loc_ic_t2f_mou_8 -1.2736
  • total_rech_num_8 -1.2033
  • total_rech_num_6 0.6053
  • monthly_3g_8_0 0.3994
  • monthly_2g_8_0 0.3666
  • std_ic_t2f_mou_8 -0.3363
  • std_og_t2f_mou_8 -0.2474
  • const -0.2336
  • monthly_3g_7_0 -0.2099
  • std_ic_t2f_mou_7 0.1532
  • sachet_2g_6_0 -0.1108
  • sachet_2g_7_0 -0.0987
  • sachet_2g_8_0 0.0488
  • sachet_3g_6_0 -0.0399
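
One way to read logistic regression coefficients is through odds ratios: exponentiating a coefficient gives the multiplicative change in the odds of churn for a one-unit increase in that predictor, holding the others fixed. The sketch below applies this to the five largest coefficients copied from the list above. Because predictors were standardized, one unit is assumed to correspond to one standard deviation.

```python
import math

# Coefficients copied from the list above.
coefficients = {
    "loc_ic_t2f_mou_8": -1.2736,
    "total_rech_num_8": -1.2033,
    "total_rech_num_6": 0.6053,
    "monthly_3g_8_0": 0.3994,
    "monthly_2g_8_0": 0.3666,
}

# exp(beta) is the factor by which the odds of churn are multiplied
# when the predictor increases by one unit, all else held constant.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "lowers" if ratio < 1 else "raises"
    print(f"{name:18s} odds ratio = {ratio:.3f} ({direction} the odds of churn)")
```

An odds ratio below 1 (negative coefficient) means higher values of the predictor go with lower odds of churn, and a ratio above 1 means the opposite.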

PCA : 95% of the variance in the train set is explained by the first 16 principal components, and 100% of the variance is explained by the first 45 principal components.
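
The number of components needed for a given share of variance can be read off the cumulative explained variance ratio. The following is a minimal NumPy sketch on made-up data (ten features driven by two latent factors); the function name `components_needed` and the toy data are assumptions, not the analysis code.

```python
import numpy as np


def components_needed(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `threshold`, plus the cumulative ratios."""
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to the variance along each component.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    n = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n, len(cumulative)), cumulative


rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))    # two hidden factors
mixing = rng.normal(size=(2, 10))     # spread over ten observed features
X_toy = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

n_95, cumulative = components_needed(X_toy, 0.95)
print(n_95, np.round(cumulative[:3], 4))
```

On real data the same curve is what tells us that 16 components reach 95% while 45 are needed for all of the variance.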

Model 2 : PCA + Logistic Regression

Train Performance :

Accuracy : 0.627
Sensitivity / True Positive Rate / Recall : 0.918
Specificity / True Negative Rate : 0.599
Precision / Positive Predictive Value : 0.179
F1-score : 0.3

Test Performance :

Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
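
All five metrics reported for each model derive from the four cells of the confusion matrix. The sketch below shows the formulas. The counts are hypothetical (not the study's confusion matrix) and were chosen to show what the formulas return when every customer is predicted as churn at an 8.6% churn rate: recall 1.0, specificity 0.0, and accuracy equal to precision.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 1,000 customers, 86 churners, all predicted as churn.
metrics = classification_metrics(tp=86, fp=914, tn=0, fn=0)
for name, value in metrics.items():
    print(f"{name:12s}: {value:.3f}")
```

Reading sensitivity together with specificity and precision helps distinguish a model that finds churners from one that simply flags everyone.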

Model 3 : PCA + Random Forest Classifier

Train Performance :

Accuracy : 0.882
Sensitivity / True Positive Rate / Recall : 0.816
Specificity / True Negative Rate : 0.888
Precision / Positive Predictive Value : 0.408
F1-score : 0.544

Test Performance :

Accuracy : 0.86
Sensitivity / True Positive Rate / Recall : 0.80
Specificity / True Negative Rate : 0.78
Precision / Positive Predictive Value : 0.37
F1-score : 0.51

Model 4 : PCA + XGBoost

Train Performance :

Accuracy : 0.873
Sensitivity / True Positive Rate / Recall : 0.887
Specificity / True Negative Rate : 0.872
Precision / Positive Predictive Value : 0.396
F1-score : 0.548

Test Performance :

Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158

Recommendations :

The following are the strongest indicators of churn:

  • Customers who churn show lower average monthly local incoming calls from fixed lines in the action period, by 1.27 standard deviations, compared to users who don’t churn, when all other factors are held constant. This is the strongest indicator of churn.
  • Customers who churn show a lower number of recharges in the action period, by 1.20 standard deviations, when all other factors are held constant. This is the second strongest indicator of churn.
  • Further, customers who churn recharged 0.6 standard deviations more often in the 6th month (total_rech_num_6) than non-churn customers. This factor, when coupled with the above factors, is a good indicator of churn.
  • Customers who churn are more likely to be users of ‘monthly 2g package-0 / monthly 3g package-0’ in the action period (approximately 0.3 std deviations higher than other packages), when all other factors are held constant.

Based on the above indicators, the recommendations to the telecom company are:

  • Concentrate on users whose incoming calls from fixed lines are 1.27 std deviations lower than average. They are most likely to churn.
  • Concentrate on users who recharge fewer times (more than 1.2 std deviations below average) in the 8th month. They are second most likely to churn.
  • Models with high sensitivity are the best for predicting churn. Use the PCA + Logistic Regression model to predict churn. It has an ROC score of 0.87 and a test sensitivity of 100%.

Analysis

Data Understanding

# Importing Necessary Libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Setting max display columns and rows.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

# Reading Dataset into a DataFrame.
data = pd.read_csv('telecom_churn_data.csv')
data.head()
[Output: the first five rows of the dataset, 226 columns wide (mobile_number through sep_vbc_3g).]
# Checking information about data.
# data.info() prints its summary directly, so it is not wrapped in print().
data.info()

def metadata_matrix(data):
    """Summarise dtype, null counts, null percentage and unique value counts per column."""
    return pd.DataFrame({
        'Datatype': data.dtypes.astype(str),
        'Non_Null_Count': data.count(axis=0).astype(int),
        'Null_Count': data.isnull().sum().astype(int),
        'Null_Percentage': round(data.isnull().sum() / len(data) * 100, 2),
        'Unique_Values_Count': data.nunique().astype(int)
    }).sort_values(by='Null_Percentage', ascending=False)

metadata_matrix(data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99999 entries, 0 to 99998
Columns: 226 entries, mobile_number to sep_vbc_3g
dtypes: float64(179), int64(35), object(12)
memory usage: 172.4+ MB
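| Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
|---|---|---|---|---|---|
| arpu_3g_6 | float64 | 25153 | 74846 | 74.85 | 7418 |
| night_pck_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
| total_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 37 |
| arpu_2g_6 | float64 | 25153 | 74846 | 74.85 | 6990 |
| max_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 48 |
| fb_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
| av_rech_amt_data_6 | float64 | 25153 | 74846 | 74.85 | 887 |
| date_of_last_rech_data_6 | object | 25153 | 74846 | 74.85 | 30 |
| count_rech_2g_6 | float64 | 25153 | 74846 | 74.85 | 31 |
| count_rech_3g_6 | float64 | 25153 | 74846 | 74.85 | 25 |
| date_of_last_rech_data_7 | object | 25571 | 74428 | 74.43 | 31 |
| total_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 42 |
| fb_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
| max_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 48 |
| night_pck_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
| count_rech_2g_7 | float64 | 25571 | 74428 | 74.43 | 36 |
| av_rech_amt_data_7 | float64 | 25571 | 74428 | 74.43 | 961 |
| arpu_2g_7 | float64 | 25571 | 74428 | 74.43 | 6586 |
| count_rech_3g_7 | float64 | 25571 | 74428 | 74.43 | 28 |
| arpu_3g_7 | float64 | 25571 | 74428 | 74.43 | 7246 |
| total_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 37 |
| count_rech_3g_9 | float64 | 25922 | 74077 | 74.08 | 27 |
| fb_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
| max_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 50 |
| arpu_3g_9 | float64 | 25922 | 74077 | 74.08 | 8063 |
| date_of_last_rech_data_9 | object | 25922 | 74077 | 74.08 | 30 |
| night_pck_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
| arpu_2g_9 | float64 | 25922 | 74077 | 74.08 | 6795 |
| count_rech_2g_9 | float64 | 25922 | 74077 | 74.08 | 32 |
| av_rech_amt_data_9 | float64 | 25922 | 74077 | 74.08 | 945 |
| total_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 46 |
| arpu_3g_8 | float64 | 26339 | 73660 | 73.66 | 7787 |
| fb_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
| night_pck_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
| av_rech_amt_data_8 | float64 | 26339 | 73660 | 73.66 | 973 |
| max_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 50 |
| count_rech_3g_8 | float64 | 26339 | 73660 | 73.66 | 29 |
| arpu_2g_8 | float64 | 26339 | 73660 | 73.66 | 6652 |
| count_rech_2g_8 | float64 | 26339 | 73660 | 73.66 | 34 |
| date_of_last_rech_data_8 | object | 26339 | 73660 | 73.66 | 31 |
| ic_others_9 | float64 | 92254 | 7745 | 7.75 | 1923 |
| std_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 26553 |
| std_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
| isd_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 5557 |
| std_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 11266 |
| isd_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 1255 |
| spl_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 4095 |
| spl_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 384 |
| og_others_9 | float64 | 92254 | 7745 | 7.75 | 235 |
| loc_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12993 |
| std_ic_t2o_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
| loc_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 21484 |
| std_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3090 |
| loc_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 7091 |
| loc_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 27697 |
| std_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 8933 |
| std_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 2295 |
| std_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 17934 |
| std_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 6157 |
| loc_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 25376 |
| roam_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 5882 |
| loc_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 20141 |
| loc_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3758 |
| roam_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 4827 |
| offnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 30077 |
| loc_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 2332 |
| loc_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12949 |
| std_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 19052 |
| onnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 23565 |
| onnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 24089 |
| std_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 6352 |
| std_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 11662 |
| loc_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13346 |
| roam_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 6504 |
| std_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 9304 |
| loc_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 28200 |
| std_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3051 |
| roam_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5315 |
| std_ic_t2o_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
| loc_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13336 |
| loc_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 7097 |
| offnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 30908 |
| loc_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 21886 |
| loc_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 20544 |
| isd_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 1276 |
| ic_others_8 | float64 | 94621 | 5378 | 5.38 | 1896 |
| og_others_8 | float64 | 94621 | 5378 | 5.38 | 216 |
| spl_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 102 |
| loc_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3807 |
| std_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 19786 |
| spl_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 4390 |
| std_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
| isd_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5844 |
| loc_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 2516 |
| std_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 2333 |
| std_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 18291 |
| loc_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 25990 |
| std_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 27491 |
| date_of_last_rech_9 | object | 95239 | 4760 | 4.76 | 30 |
| std_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3125 |
| ic_others_6 | float64 | 96062 | 3937 | 3.94 | 1817 |
| isd_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 5521 |
| std_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 9308 |
| std_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 11646 |
| spl_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 84 |
| std_ic_t2o_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
| loc_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 7250 |
| loc_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13540 |
| std_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
| std_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 2450 |
| std_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 27502 |
| std_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 19734 |
| isd_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 1381 |
| std_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 18244 |
| spl_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 3965 |
| loc_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 26372 |
| og_others_6 | float64 | 96062 | 3937 | 3.94 | 1018 |
| loc_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 2235 |
| loc_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 20905 |
| loc_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3860 |
| loc_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13539 |
| roam_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 8038 |
| std_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 6279 |
| onnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 24313 |
| loc_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 28569 |
| offnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 31140 |
| roam_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 6512 |
| loc_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 22065 |
| loc_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 2426 |
| roam_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5230 |
| loc_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 26091 |
| loc_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13411 |
| offnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 31023 |
| loc_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3863 |
| std_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 18567 |
| std_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 6481 |
| onnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 24336 |
| std_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20018 |
| loc_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20637 |
| std_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 2391 |
| roam_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 6639 |
| std_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
| std_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 9464 |
| isd_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 1380 |
| ic_others_7 | float64 | 96140 | 3859 | 3.86 | 2002 |
| loc_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 7395 |
| loc_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 21918 |
| std_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 11889 |
| loc_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13511 |
| std_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3209 |
| loc_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 28390 |
| spl_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 107 |
| og_others_7 | float64 | 96140 | 3859 | 3.86 | 187 |
| spl_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 4396 |
| isd_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5789 |
| std_ic_t2o_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
| std_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 27951 |
| date_of_last_rech_8 | object | 96377 | 3622 | 3.62 | 31 |
| date_of_last_rech_7 | object | 98232 | 1767 | 1.77 | 31 |
| last_date_of_month_9 | object | 98340 | 1659 | 1.66 | 1 |
| date_of_last_rech_6 | object | 98392 | 1607 | 1.61 | 30 |
| last_date_of_month_8 | object | 98899 | 1100 | 1.10 | 1 |
| loc_ic_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
| std_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
| loc_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
| last_date_of_month_7 | object | 99398 | 601 | 0.60 | 1 |
| sachet_3g_8 | int64 | 99999 | 0 | 0.00 | 29 |
| jul_vbc_3g | float64 | 99999 | 0 | 0.00 | 14162 |
| aug_vbc_3g | float64 | 99999 | 0 | 0.00 | 14676 |
| aon | int64 | 99999 | 0 | 0.00 | 3489 |
| jun_vbc_3g | float64 | 99999 | 0 | 0.00 | 13312 |
| monthly_2g_9 | int64 | 99999 | 0 | 0.00 | 5 |
| sachet_3g_6 | int64 | 99999 | 0 | 0.00 | 25 |
| vol_3g_mb_9 | float64 | 99999 | 0 | 0.00 | 14472 |
| sachet_3g_7 | int64 | 99999 | 0 | 0.00 | 27 |
| monthly_2g_8 | int64 | 99999 | 0 | 0.00 | 6 |
| monthly_3g_9 | int64 | 99999 | 0 | 0.00 | 11 |
| monthly_3g_8 | int64 | 99999 | 0 | 0.00 | 12 |
| sachet_3g_9 | int64 | 99999 | 0 | 0.00 | 27 |
| monthly_3g_7 | int64 | 99999 | 0 | 0.00 | 15 |
| monthly_3g_6 | int64 | 99999 | 0 | 0.00 | 12 |
| sachet_2g_9 | int64 | 99999 | 0 | 0.00 | 32 |
| sachet_2g_8 | int64 | 99999 | 0 | 0.00 | 34 |
| sachet_2g_7 | int64 | 99999 | 0 | 0.00 | 35 |
| sachet_2g_6 | int64 | 99999 | 0 | 0.00 | 32 |
| monthly_2g_7 | int64 | 99999 | 0 | 0.00 | 6 |
| monthly_2g_6 | int64 | 99999 | 0 | 0.00 | 5 |
| mobile_number | int64 | 99999 | 0 | 0.00 | 99999 |
| vol_3g_mb_8 | float64 | 99999 | 0 | 0.00 | 14960 |
| total_og_mou_9 | float64 | 99999 | 0 | 0.00 | 39160 |
| total_rech_num_7 | int64 | 99999 | 0 | 0.00 | 101 |
| total_rech_num_6 | int64 | 99999 | 0 | 0.00 | 102 |
| total_ic_mou_9 | float64 | 99999 | 0 | 0.00 | 31260 |
| total_ic_mou_8 | float64 | 99999 | 0 | 0.00 | 32128 |
| total_ic_mou_7 | float64 | 99999 | 0 | 0.00 | 32242 |
| total_ic_mou_6 | float64 | 99999 | 0 | 0.00 | 32247 |
| circle_id | int64 | 99999 | 0 | 0.00 | 1 |
| total_og_mou_8 | float64 | 99999 | 0 | 0.00 | 40074 |
| vol_3g_mb_7 | float64 | 99999 | 0 | 0.00 | 14519 |
| total_og_mou_7 | float64 | 99999 | 0 | 0.00 | 40477 |
| total_og_mou_6 | float64 | 99999 | 0 | 0.00 | 40327 |
| arpu_9 | float64 | 99999 | 0 | 0.00 | 79937 |
| arpu_8 | float64 | 99999 | 0 | 0.00 | 83615 |
| arpu_7 | float64 | 99999 | 0 | 0.00 | 85308 |
| arpu_6 | float64 | 99999 | 0 | 0.00 | 85681 |
| last_date_of_month_6 | object | 99999 | 0 | 0.00 | 1 |
| total_rech_num_8 | int64 | 99999 | 0 | 0.00 | 96 |
| total_rech_num_9 | int64 | 99999 | 0 | 0.00 | 97 |
| total_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 2305 |
| total_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 2329 |
| vol_3g_mb_6 | float64 | 99999 | 0 | 0.00 | 13773 |
| vol_2g_mb_9 | float64 | 99999 | 0 | 0.00 | 13919 |
| vol_2g_mb_8 | float64 | 99999 | 0 | 0.00 | 14994 |
| vol_2g_mb_7 | float64 | 99999 | 0 | 0.00 | 15114 |
| vol_2g_mb_6 | float64 | 99999 | 0 | 0.00 | 15201 |
| last_day_rch_amt_9 | int64 | 99999 | 0 | 0.00 | 185 |
| last_day_rch_amt_8 | int64 | 99999 | 0 | 0.00 | 199 |
| last_day_rch_amt_7 | int64 | 99999 | 0 | 0.00 | 173 |
| last_day_rch_amt_6 | int64 | 99999 | 0 | 0.00 | 186 |
| max_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 201 |
| max_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 213 |
| max_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 183 |
| max_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 202 |
| total_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 2304 |
| total_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 2347 |
| sep_vbc_3g | float64 | 99999 | 0 | 0.00 | 3720 |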

Data Cleaning

# Checking if there are any duplicate records.
# nunique() counts distinct mobile numbers; compare it with the number of rows.
data['mobile_number'].nunique()

99999

  • Since the number of rows (99999) is the same as the number of distinct mobile numbers, there is no duplicate data.
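
A duplicate check can also be written directly with pandas' `duplicated()` and `is_unique`. The sketch below uses a made-up three-row frame in which one mobile number is deliberately repeated; on the real data all three checks would be expected to report no duplicates.

```python
import pandas as pd

# Made-up rows: the second and third customer are deliberately identical.
toy = pd.DataFrame(
    {
        "mobile_number": [7000842753, 7001865778, 7001865778],
        "arpu_6": [197.385, 34.047, 34.047],
    }
)

# Number of repeated identifiers (every occurrence after the first one).
n_dup_ids = int(toy["mobile_number"].duplicated().sum())

# Number of rows that are exact copies of an earlier row.
n_dup_rows = int(toy.duplicated().sum())

# True only when every identifier appears exactly once.
ids_unique = bool(toy["mobile_number"].is_unique)

print(n_dup_ids, n_dup_rows, ids_unique)
```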
# mobile_number is a unique identifier
# Setting mobile_number as the index
data = data.set_index('mobile_number')

# Renaming columns
data = data.rename({'jun_vbc_3g' : 'vbc_3g_6', 'jul_vbc_3g' : 'vbc_3g_7', 'aug_vbc_3g' : 'vbc_3g_8', 'sep_vbc_3g' : 'vbc_3g_9'}, axis=1)
# Converting columns into appropriate data types and extracting single-value columns.
# Numeric columns with <= 29 unique values are considered categorical variables.
# The cutoff (fewer than 30 unique values) is arrived at by looking at the above metadata_matrix output.

columns = data.columns
change_to_cat = []
single_value_col = []
for column in columns:
    unique_value_count = data[column].nunique()
    if unique_value_count == 1:
        single_value_col.append(column)
    # is_numeric_dtype replaces the original `dtype in ['int', 'float']` check,
    # which can behave differently across platforms and NumPy versions.
    if 0 < unique_value_count <= 29 and pd.api.types.is_numeric_dtype(data[column]):
        change_to_cat.append(column)
print(' Columns to change to categorical data type : \n', pd.DataFrame(change_to_cat), '\n')

Columns to change to categorical data type :
                    0
0           circle_id
1      loc_og_t2o_mou
2      std_og_t2o_mou
3      loc_ic_t2o_mou
4    std_og_t2c_mou_6
5    std_og_t2c_mou_7
6    std_og_t2c_mou_8
7    std_og_t2c_mou_9
8    std_ic_t2o_mou_6
9    std_ic_t2o_mou_7
10   std_ic_t2o_mou_8
11   std_ic_t2o_mou_9
12    count_rech_3g_6
13    count_rech_3g_7
14    count_rech_3g_8
15    count_rech_3g_9
16   night_pck_user_6
17   night_pck_user_7
18   night_pck_user_8
19   night_pck_user_9
20       monthly_2g_6
21       monthly_2g_7
22       monthly_2g_8
23       monthly_2g_9
24       monthly_3g_6
25       monthly_3g_7
26       monthly_3g_8
27       monthly_3g_9
28        sachet_3g_6
29        sachet_3g_7
30        sachet_3g_8
31        sachet_3g_9
32          fb_user_6
33          fb_user_7
34          fb_user_8
35          fb_user_9

# Converting all the above columns having <=29 unique values into categorical data type.
data[change_to_cat] = data[change_to_cat].astype('category')

# Converting *sachet* variables to categorical data type
sachet_columns = data.filter(regex='.*sachet.*', axis=1).columns.values
data[sachet_columns] = data[sachet_columns].astype('category')

# Changing datatype of date variables to datetime.
# Collect the columns whose names start with "date".
col_with_date = [column for column in data.columns if column.startswith('date')]
data[col_with_date].dtypes

date_of_last_rech_6         object
date_of_last_rech_7         object
date_of_last_rech_8         object
date_of_last_rech_9         object
date_of_last_rech_data_6    object
date_of_last_rech_data_7    object
date_of_last_rech_data_8    object
date_of_last_rech_data_9    object
dtype: object

# Checking the date format
data[col_with_date].head()

| mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
|---|---|---|---|---|---|---|---|---|
| 7000842753 | 6/21/2014 | 7/16/2014 | 8/8/2014 | 9/28/2014 | 6/21/2014 | 7/16/2014 | 8/8/2014 | NaN |
| 7001865778 | 6/29/2014 | 7/31/2014 | 8/28/2014 | 9/30/2014 | NaN | 7/25/2014 | 8/10/2014 | NaN |
| 7001625959 | 6/17/2014 | 7/24/2014 | 8/14/2014 | 9/29/2014 | NaN | NaN | NaN | 9/17/2014 |
| 7001204172 | 6/28/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | NaN | NaN | NaN | NaN |
| 7000142493 | 6/26/2014 | 7/28/2014 | 8/9/2014 | 9/28/2014 | 6/4/2014 | NaN | NaN | NaN |

  • Let's convert the above columns to the datetime data type.

for col in col_with_date:
    data[col] = pd.to_datetime(data[col], format="%m/%d/%Y")
data[col_with_date].head()

| mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
|---|---|---|---|---|---|---|---|---|
| 7000842753 | 2014-06-21 | 2014-07-16 | 2014-08-08 | 2014-09-28 | 2014-06-21 | 2014-07-16 | 2014-08-08 | NaT |
| 7001865778 | 2014-06-29 | 2014-07-31 | 2014-08-28 | 2014-09-30 | NaT | 2014-07-25 | 2014-08-10 | NaT |
| 7001625959 | 2014-06-17 | 2014-07-24 | 2014-08-14 | 2014-09-29 | NaT | NaT | NaT | 2014-09-17 |
| 7001204172 | 2014-06-28 | 2014-07-31 | 2014-08-31 | 2014-09-30 | NaT | NaT | NaT | NaT |
| 7000142493 | 2014-06-26 | 2014-07-28 | 2014-08-09 | 2014-09-28 | 2014-06-04 | NaT | NaT | NaT |

Filtering High Value Customers

  • Customers are considered high value if their average recharge amount for June and July is greater than or equal to the 70th percentile of the average recharge amount.

# Deriving Average recharge amount of June and July.
data['Average_rech_amt_6n7'] = (data['total_rech_amt_6'] + data['total_rech_amt_7']) / 2

# Filtering HIGH VALUED CUSTOMERS based on (Average_rech_amt_6n7 >= 70th percentile of Average_rech_amt_6n7)
data = data[(data['Average_rech_amt_6n7'] >= data['Average_rech_amt_6n7'].quantile(0.7))]
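
To see what the percentile filter does, here is a small worked example on made-up recharge amounts for ten hypothetical customers. With pandas' default linear interpolation the 70th percentile of the averages 100, 200, ..., 1000 falls between the 7th and 8th value, so three customers pass the `>=` filter. On the real data, ties at the cutoff are also kept, which is consistent with the retained 30,011 rows being slightly more than 30% of 99,999.

```python
import pandas as pd

# Made-up recharge amounts for ten hypothetical customers.
amounts = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
toy = pd.DataFrame({"total_rech_amt_6": amounts, "total_rech_amt_7": amounts})

# Average recharge amount over June and July, as in the analysis.
toy["Average_rech_amt_6n7"] = (toy["total_rech_amt_6"] + toy["total_rech_amt_7"]) / 2

# 70th percentile cutoff and the resulting high value customers.
cutoff = toy["Average_rech_amt_6n7"].quantile(0.7)
hvc = toy[toy["Average_rech_amt_6n7"] >= cutoff]

print(cutoff, len(hvc))
```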

Missing Values

# Checking for missing values.
missing_values = metadata_matrix(data)[['Datatype', 'Null_Percentage']].sort_values(by='Null_Percentage', ascending=False)
missing_values
Column  Datatype  Null_Percentage
av_rech_amt_data_6  float64  62.02
count_rech_2g_6  float64  62.02
arpu_2g_6  float64  62.02
max_rech_data_6  float64  62.02
night_pck_user_6  category  62.02
date_of_last_rech_data_6  datetime64[ns]  62.02
total_rech_data_6  float64  62.02
arpu_3g_6  float64  62.02
fb_user_6  category  62.02
count_rech_3g_6  category  62.02
av_rech_amt_data_9  float64  61.81
count_rech_2g_9  float64  61.81
night_pck_user_9  category  61.81
arpu_3g_9  float64  61.81
arpu_2g_9  float64  61.81
fb_user_9  category  61.81
date_of_last_rech_data_9  datetime64[ns]  61.81
total_rech_data_9  float64  61.81
count_rech_3g_9  category  61.81
max_rech_data_9  float64  61.81
count_rech_2g_7  float64  61.14
count_rech_3g_7  category  61.14
arpu_2g_7  float64  61.14
arpu_3g_7  float64  61.14
av_rech_amt_data_7  float64  61.14
max_rech_data_7  float64  61.14
fb_user_7  category  61.14
total_rech_data_7  float64  61.14
date_of_last_rech_data_7  datetime64[ns]  61.14
night_pck_user_7  category  61.14
av_rech_amt_data_8  float64  60.83
count_rech_3g_8  category  60.83
total_rech_data_8  float64  60.83
arpu_3g_8  float64  60.83
max_rech_data_8  float64  60.83
date_of_last_rech_data_8  datetime64[ns]  60.83
arpu_2g_8  float64  60.83
fb_user_8  category  60.83
night_pck_user_8  category  60.83
count_rech_2g_8  float64  60.83
loc_og_t2t_mou_9  float64  5.68
ic_others_9  float64  5.68
isd_ic_mou_9  float64  5.68
og_others_9  float64  5.68
loc_og_t2f_mou_9  float64  5.68
roam_ic_mou_9  float64  5.68
loc_og_mou_9  float64  5.68
std_og_t2f_mou_9  float64  5.68
loc_og_t2m_mou_9  float64  5.68
std_og_t2m_mou_9  float64  5.68
loc_og_t2c_mou_9  float64  5.68
std_og_t2t_mou_9  float64  5.68
std_ic_t2o_mou_9  category  5.68
std_ic_mou_9  float64  5.68
spl_ic_mou_9  float64  5.68
std_ic_t2f_mou_9  float64  5.68
roam_og_mou_9  float64  5.68
std_ic_t2m_mou_9  float64  5.68
offnet_mou_9  float64  5.68
std_og_mou_9  float64  5.68
spl_og_mou_9  float64  5.68
loc_ic_t2t_mou_9  float64  5.68
onnet_mou_9  float64  5.68
loc_ic_t2m_mou_9  float64  5.68
loc_ic_t2f_mou_9  float64  5.68
std_og_t2c_mou_9  category  5.68
loc_ic_mou_9  float64  5.68
std_ic_t2t_mou_9  float64  5.68
isd_og_mou_9  float64  5.68
std_og_t2t_mou_8  float64  3.13
std_og_t2c_mou_8  category  3.13
std_og_t2f_mou_8  float64  3.13
std_og_mou_8  float64  3.13
roam_og_mou_8  float64  3.13
isd_og_mou_8  float64  3.13
loc_og_t2t_mou_8  float64  3.13
spl_ic_mou_8  float64  3.13
std_og_t2m_mou_8  float64  3.13
ic_others_8  float64  3.13
offnet_mou_8  float64  3.13
og_others_8  float64  3.13
isd_ic_mou_8  float64  3.13
roam_ic_mou_8  float64  3.13
spl_og_mou_8  float64  3.13
loc_og_t2f_mou_8  float64  3.13
std_ic_t2m_mou_8  float64  3.13
std_ic_t2f_mou_8  float64  3.13
std_ic_t2t_mou_8  float64  3.13
loc_og_t2c_mou_8  float64  3.13
loc_ic_mou_8  float64  3.13
onnet_mou_8  float64  3.13
loc_og_t2m_mou_8  float64  3.13
loc_ic_t2f_mou_8  float64  3.13
std_ic_t2o_mou_8  category  3.13
loc_og_mou_8  float64  3.13
loc_ic_t2m_mou_8  float64  3.13
std_ic_mou_8  float64  3.13
loc_ic_t2t_mou_8  float64  3.13
date_of_last_rech_9  datetime64[ns]  2.89
date_of_last_rech_8  datetime64[ns]  1.98
last_date_of_month_9  object  1.20
loc_og_mou_6  float64  1.05
std_ic_t2m_mou_6  float64  1.05
roam_og_mou_6  float64  1.05
std_ic_t2t_mou_6  float64  1.05
loc_ic_mou_6  float64  1.05
roam_ic_mou_6  float64  1.05
loc_ic_t2f_mou_6  float64  1.05
loc_ic_t2m_mou_6  float64  1.05
std_og_t2t_mou_6  float64  1.05
onnet_mou_6  float64  1.05
loc_ic_t2t_mou_6  float64  1.05
offnet_mou_6  float64  1.05
og_others_6  float64  1.05
loc_og_t2t_mou_6  float64  1.05
isd_og_mou_6  float64  1.05
std_og_t2m_mou_6  float64  1.05
loc_og_t2f_mou_6  float64  1.05
spl_ic_mou_6  float64  1.05
std_ic_mou_6  float64  1.05
isd_ic_mou_6  float64  1.05
loc_og_t2m_mou_6  float64  1.05
std_ic_t2o_mou_6  category  1.05
spl_og_mou_6  float64  1.05
ic_others_6  float64  1.05
std_ic_t2f_mou_6  float64  1.05
loc_og_t2c_mou_6  float64  1.05
std_og_mou_6  float64  1.05
std_og_t2f_mou_6  float64  1.05
std_og_t2c_mou_6  category  1.05
roam_ic_mou_7  float64  1.01
loc_og_t2c_mou_7  float64  1.01
loc_og_t2f_mou_7  float64  1.01
loc_og_t2m_mou_7  float64  1.01
loc_og_t2t_mou_7  float64  1.01
roam_og_mou_7  float64  1.01
std_ic_t2t_mou_7  float64  1.01
offnet_mou_7  float64  1.01
onnet_mou_7  float64  1.01
std_ic_t2f_mou_7  float64  1.01
std_ic_mou_7  float64  1.01
loc_ic_t2f_mou_7  float64  1.01
std_ic_t2m_mou_7  float64  1.01
loc_og_mou_7  float64  1.01
loc_ic_t2t_mou_7  float64  1.01
std_og_t2t_mou_7  float64  1.01
std_og_t2c_mou_7  category  1.01
std_og_mou_7  float64  1.01
isd_og_mou_7  float64  1.01
spl_og_mou_7  float64  1.01
og_others_7  float64  1.01
spl_ic_mou_7  float64  1.01
loc_ic_t2m_mou_7  float64  1.01
loc_ic_mou_7  float64  1.01
ic_others_7  float64  1.01
std_og_t2m_mou_7  float64  1.01
isd_ic_mou_7  float64  1.01
std_ic_t2o_mou_7  category  1.01
std_og_t2f_mou_7  float64  1.01
last_date_of_month_8  object  0.52
loc_og_t2o_mou  category  0.38
loc_ic_t2o_mou  category  0.38
date_of_last_rech_7  datetime64[ns]  0.38
std_og_t2o_mou  category  0.38
date_of_last_rech_6  datetime64[ns]  0.21
last_date_of_month_7  object  0.10
vol_3g_mb_6  float64  0.00
arpu_6  float64  0.00
total_rech_amt_8  int64  0.00
total_rech_amt_7  int64  0.00
total_rech_amt_6  int64  0.00
total_rech_num_9  int64  0.00
last_date_of_month_6  object  0.00
vol_3g_mb_8  float64  0.00
arpu_7  float64  0.00
arpu_8  float64  0.00
arpu_9  float64  0.00
total_og_mou_6  float64  0.00
total_og_mou_7  float64  0.00
vol_3g_mb_7  float64  0.00
max_rech_amt_9  int64  0.00
vol_2g_mb_9  float64  0.00
vol_2g_mb_8  float64  0.00
vol_2g_mb_7  float64  0.00
vol_2g_mb_6  float64  0.00
last_day_rch_amt_9  int64  0.00
last_day_rch_amt_8  int64  0.00
last_day_rch_amt_7  int64  0.00
last_day_rch_amt_6  int64  0.00
max_rech_amt_8  int64  0.00
max_rech_amt_7  int64  0.00
max_rech_amt_6  int64  0.00
total_rech_amt_9  int64  0.00
total_ic_mou_6  float64  0.00
total_og_mou_8  float64  0.00
vbc_3g_8  float64  0.00
total_ic_mou_7  float64  0.00
total_ic_mou_8  float64  0.00
sachet_3g_9  category  0.00
sachet_3g_7  category  0.00
vbc_3g_9  float64  0.00
vbc_3g_6  float64  0.00
vbc_3g_7  float64  0.00
aon  int64  0.00
sachet_3g_6  category  0.00
monthly_3g_8  category  0.00
monthly_3g_9  category  0.00
sachet_3g_8  category  0.00
monthly_3g_7  category  0.00
sachet_2g_9  category  0.00
sachet_2g_8  category  0.00
sachet_2g_7  category  0.00
sachet_2g_6  category  0.00
monthly_2g_9  category  0.00
monthly_2g_8  category  0.00
monthly_2g_7  category  0.00
monthly_2g_6  category  0.00
monthly_3g_6  category  0.00
circle_id  category  0.00
vol_3g_mb_9  float64  0.00
total_og_mou_9  float64  0.00
total_rech_num_8  int64  0.00
total_rech_num_7  int64  0.00
total_rech_num_6  int64  0.00
total_ic_mou_9  float64  0.00
Average_rech_amt_6n7  float64  0.00
# Columns with high missing values, > 50%
metadata = metadata_matrix(data)
condition = metadata['Null_Percentage'] > 50
high_missing_values = metadata[condition]
high_missing_values
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
av_rech_amt_data_6  float64  11397  18614  62.02  794
count_rech_3g_6  category  11397  18614  62.02  25
count_rech_2g_6  float64  11397  18614  62.02  30
arpu_2g_6  float64  11397  18614  62.02  4503
max_rech_data_6  float64  11397  18614  62.02  43
night_pck_user_6  category  11397  18614  62.02  2
date_of_last_rech_data_6  datetime64[ns]  11397  18614  62.02  30
total_rech_data_6  float64  11397  18614  62.02  36
arpu_3g_6  float64  11397  18614  62.02  4875
fb_user_6  category  11397  18614  62.02  2
max_rech_data_9  float64  11461  18550  61.81  48
count_rech_3g_9  category  11461  18550  61.81  27
fb_user_9  category  11461  18550  61.81  2
total_rech_data_9  float64  11461  18550  61.81  35
date_of_last_rech_data_9  datetime64[ns]  11461  18550  61.81  30
av_rech_amt_data_9  float64  11461  18550  61.81  812
arpu_2g_9  float64  11461  18550  61.81  3846
arpu_3g_9  float64  11461  18550  61.81  4800
night_pck_user_9  category  11461  18550  61.81  2
count_rech_2g_9  float64  11461  18550  61.81  29
fb_user_7  category  11662  18349  61.14  2
date_of_last_rech_data_7  datetime64[ns]  11662  18349  61.14  31
total_rech_data_7  float64  11662  18349  61.14  40
night_pck_user_7  category  11662  18349  61.14  2
max_rech_data_7  float64  11662  18349  61.14  46
count_rech_2g_7  float64  11662  18349  61.14  35
arpu_3g_7  float64  11662  18349  61.14  4860
av_rech_amt_data_7  float64  11662  18349  61.14  863
arpu_2g_7  float64  11662  18349  61.14  4219
count_rech_3g_7  category  11662  18349  61.14  28
night_pck_user_8  category  11754  18257  60.83  2
fb_user_8  category  11754  18257  60.83  2
arpu_2g_8  float64  11754  18257  60.83  3854
count_rech_2g_8  float64  11754  18257  60.83  33
date_of_last_rech_data_8  datetime64[ns]  11754  18257  60.83  31
av_rech_amt_data_8  float64  11754  18257  60.83  837
arpu_3g_8  float64  11754  18257  60.83  4769
total_rech_data_8  float64  11754  18257  60.83  45
count_rech_3g_8  category  11754  18257  60.83  29
max_rech_data_8  float64  11754  18257  60.83  47
# Dropping the above columns with high missing values
high_missing_value_columns = high_missing_values.index
data.drop(columns=high_missing_value_columns, inplace=True)

# Looking at remaining columns with missing values
metadata_matrix(data)
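The `metadata_matrix` helper is defined outside this section, so its implementation is not shown here. As a rough sketch of how such a summary and the threshold-based drop could work, the block below uses a hypothetical stand-in named `null_summary`. Its column names are copied from the output above, but the function body is an assumption, not the helper actually used in this analysis.

```python
import numpy as np
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for metadata_matrix: one summary row per column of df."""
    summary = pd.DataFrame(
        {
            "Datatype": df.dtypes.astype(str),
            "Non_Null_Count": df.notna().sum(),
            "Null_Count": df.isna().sum(),
            "Null_Percentage": (df.isna().mean() * 100).round(2),
            "Unique_Values_Count": df.nunique(),
        }
    )
    return summary.sort_values("Null_Percentage", ascending=False)


# Toy data (made-up values): one mostly-empty column, one lightly-missing column.
toy = pd.DataFrame(
    {
        "arpu_6": [10.0, 20.0, 30.0, 40.0],
        "fb_user_6": [1.0, np.nan, np.nan, np.nan],  # 75% missing
        "onnet_mou_6": [5.0, np.nan, 7.0, 8.0],      # 25% missing
    }
)

summary = null_summary(toy)

# Drop every column whose null percentage exceeds the 50% threshold.
to_drop = summary.index[summary["Null_Percentage"] > 50]
toy = toy.drop(columns=to_drop)
print(summary)
print(toy.columns.tolist())
```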
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
std_ic_t2o_mou_9  category  28307  1704  5.68  1
spl_og_mou_9  float64  28307  1704  5.68  2966
isd_og_mou_9  float64  28307  1704  5.68  908
roam_ic_mou_9  float64  28307  1704  5.68  3370
std_og_mou_9  float64  28307  1704  5.68  15900
roam_og_mou_9  float64  28307  1704  5.68  4004
std_ic_t2f_mou_9  float64  28307  1704  5.68  1971
std_og_t2c_mou_9  category  28307  1704  5.68  1
loc_og_t2t_mou_9  float64  28307  1704  5.68  10360
std_og_t2f_mou_9  float64  28307  1704  5.68  1595
std_ic_mou_9  float64  28307  1704  5.68  7745
loc_og_t2m_mou_9  float64  28307  1704  5.68  15585
std_og_t2m_mou_9  float64  28307  1704  5.68  12445
loc_og_t2f_mou_9  float64  28307  1704  5.68  3111
std_og_t2t_mou_9  float64  28307  1704  5.68  11141
loc_ic_mou_9  float64  28307  1704  5.68  18018
loc_og_t2c_mou_9  float64  28307  1704  5.68  1576
offnet_mou_9  float64  28307  1704  5.68  20452
loc_og_mou_9  float64  28307  1704  5.68  18207
spl_ic_mou_9  float64  28307  1704  5.68  287
std_ic_t2m_mou_9  float64  28307  1704  5.68  6168
loc_ic_t2f_mou_9  float64  28307  1704  5.68  4611
ic_others_9  float64  28307  1704  5.68  1284
loc_ic_t2m_mou_9  float64  28307  1704  5.68  15194
loc_ic_t2t_mou_9  float64  28307  1704  5.68  9407
std_ic_t2t_mou_9  float64  28307  1704  5.68  4280
isd_ic_mou_9  float64  28307  1704  5.68  3329
og_others_9  float64  28307  1704  5.68  132
onnet_mou_9  float64  28307  1704  5.68  16674
std_og_mou_8  float64  29073  938  3.13  16864
std_og_t2m_mou_8  float64  29073  938  3.13  13326
og_others_8  float64  29073  938  3.13  133
loc_ic_t2f_mou_8  float64  29073  938  3.13  4705
std_og_t2t_mou_8  float64  29073  938  3.13  11781
loc_og_mou_8  float64  29073  938  3.13  18885
std_ic_t2o_mou_8  category  29073  938  3.13  1
loc_ic_t2m_mou_8  float64  29073  938  3.13  15598
std_ic_t2m_mou_8  float64  29073  938  3.13  6420
std_ic_t2t_mou_8  float64  29073  938  3.13  4486
std_og_t2f_mou_8  float64  29073  938  3.13  1627
std_ic_t2f_mou_8  float64  29073  938  3.13  1941
spl_og_mou_8  float64  29073  938  3.13  3238
loc_ic_t2t_mou_8  float64  29073  938  3.13  9671
std_og_t2c_mou_8  category  29073  938  3.13  1
isd_og_mou_8  float64  29073  938  3.13  940
loc_ic_mou_8  float64  29073  938  3.13  18573
roam_ic_mou_8  float64  29073  938  3.13  3655
isd_ic_mou_8  float64  29073  938  3.13  3493
onnet_mou_8  float64  29073  938  3.13  17604
loc_og_t2c_mou_8  float64  29073  938  3.13  1730
spl_ic_mou_8  float64  29073  938  3.13  85
loc_og_t2f_mou_8  float64  29073  938  3.13  3124
std_ic_mou_8  float64  29073  938  3.13  8033
roam_og_mou_8  float64  29073  938  3.13  4382
ic_others_8  float64  29073  938  3.13  1259
loc_og_t2m_mou_8  float64  29073  938  3.13  16165
loc_og_t2t_mou_8  float64  29073  938  3.13  10772
offnet_mou_8  float64  29073  938  3.13  21513
date_of_last_rech_9  datetime64[ns]  29145  866  2.89  30
date_of_last_rech_8  datetime64[ns]  29417  594  1.98  31
last_date_of_month_9  object  29651  360  1.20  1
std_ic_mou_6  float64  29695  316  1.05  8391
offnet_mou_6  float64  29695  316  1.05  22454
std_ic_t2f_mou_6  float64  29695  316  1.05  2033
isd_ic_mou_6  float64  29695  316  1.05  3429
ic_others_6  float64  29695  316  1.05  1227
onnet_mou_6  float64  29695  316  1.05  18813
std_ic_t2m_mou_6  float64  29695  316  1.05  6680
loc_ic_t2t_mou_6  float64  29695  316  1.05  9872
loc_ic_t2m_mou_6  float64  29695  316  1.05  16015
loc_ic_t2f_mou_6  float64  29695  316  1.05  4817
loc_ic_mou_6  float64  29695  316  1.05  19133
std_ic_t2t_mou_6  float64  29695  316  1.05  4608
og_others_6  float64  29695  316  1.05  862
spl_og_mou_6  float64  29695  316  1.05  3053
roam_ic_mou_6  float64  29695  316  1.05  4338
spl_ic_mou_6  float64  29695  316  1.05  78
std_og_t2t_mou_6  float64  29695  316  1.05  12777
loc_og_t2c_mou_6  float64  29695  316  1.05  1658
std_og_t2m_mou_6  float64  29695  316  1.05  14518
loc_og_t2f_mou_6  float64  29695  316  1.05  3252
std_og_t2f_mou_6  float64  29695  316  1.05  1773
loc_og_t2m_mou_6  float64  29695  316  1.05  16747
std_ic_t2o_mou_6  category  29695  316  1.05  1
std_og_t2c_mou_6  category  29695  316  1.05  1
std_og_mou_6  float64  29695  316  1.05  18325
loc_og_t2t_mou_6  float64  29695  316  1.05  11151
isd_og_mou_6  float64  29695  316  1.05  1113
roam_og_mou_6  float64  29695  316  1.05  5174
loc_og_mou_6  float64  29695  316  1.05  19691
isd_ic_mou_7  float64  29708  303  1.01  3639
std_ic_t2f_mou_7  float64  29708  303  1.01  2075
std_ic_t2m_mou_7  float64  29708  303  1.01  6747
std_ic_t2o_mou_7  category  29708  303  1.01  1
ic_others_7  float64  29708  303  1.01  1371
spl_ic_mou_7  float64  29708  303  1.01  93
std_ic_t2t_mou_7  float64  29708  303  1.01  4706
std_ic_mou_7  float64  29708  303  1.01  8543
loc_ic_t2f_mou_7  float64  29708  303  1.01  4897
og_others_7  float64  29708  303  1.01  123
loc_ic_mou_7  float64  29708  303  1.01  19030
std_og_t2f_mou_7  float64  29708  303  1.01  1714
onnet_mou_7  float64  29708  303  1.01  18938
roam_ic_mou_7  float64  29708  303  1.01  3649
roam_og_mou_7  float64  29708  303  1.01  4431
loc_og_t2t_mou_7  float64  29708  303  1.01  11154
loc_og_t2m_mou_7  float64  29708  303  1.01  16872
loc_og_t2f_mou_7  float64  29708  303  1.01  3267
loc_og_t2c_mou_7  float64  29708  303  1.01  1750
loc_og_mou_7  float64  29708  303  1.01  19880
std_og_t2t_mou_7  float64  29708  303  1.01  12983
std_og_t2m_mou_7  float64  29708  303  1.01  14589
offnet_mou_7  float64  29708  303  1.01  22650
std_og_t2c_mou_7  category  29708  303  1.01  1
loc_ic_t2t_mou_7  float64  29708  303  1.01  9961
isd_og_mou_7  float64  29708  303  1.01  1125
spl_og_mou_7  float64  29708  303  1.01  3399
std_og_mou_7  float64  29708  303  1.01  18445
loc_ic_t2m_mou_7  float64  29708  303  1.01  16068
last_date_of_month_8  object  29854  157  0.52  1
std_og_t2o_mou  category  29897  114  0.38  1
loc_ic_t2o_mou  category  29897  114  0.38  1
date_of_last_rech_7  datetime64[ns]  29897  114  0.38  31
loc_og_t2o_mou  category  29897  114  0.38  1
date_of_last_rech_6  datetime64[ns]  29949  62  0.21  30
last_date_of_month_7  object  29980  31  0.10  1
sachet_3g_6  category  30011  0  0.00  25
monthly_2g_8  category  30011  0  0.00  6
vol_2g_mb_8  float64  30011  0  0.00  7310
vol_2g_mb_9  float64  30011  0  0.00  6984
vol_2g_mb_6  float64  30011  0  0.00  7809
sachet_3g_9  category  30011  0  0.00  27
sachet_3g_8  category  30011  0  0.00  29
monthly_3g_9  category  30011  0  0.00  11
vol_3g_mb_6  float64  30011  0  0.00  7043
vol_3g_mb_7  float64  30011  0  0.00  7440
vol_3g_mb_8  float64  30011  0  0.00  7151
vol_3g_mb_9  float64  30011  0  0.00  7016
monthly_2g_6  category  30011  0  0.00  5
monthly_2g_7  category  30011  0  0.00  6
monthly_2g_9  category  30011  0  0.00  5
sachet_3g_7  category  30011  0  0.00  27
sachet_2g_6  category  30011  0  0.00  30
sachet_2g_7  category  30011  0  0.00  34
sachet_2g_8  category  30011  0  0.00  34
sachet_2g_9  category  30011  0  0.00  29
vbc_3g_9  float64  30011  0  0.00  2171
monthly_3g_8  category  30011  0  0.00  12
monthly_3g_7  category  30011  0  0.00  15
vbc_3g_6  float64  30011  0  0.00  6864
vbc_3g_7  float64  30011  0  0.00  7318
vbc_3g_8  float64  30011  0  0.00  7291
aon  int64  30011  0  0.00  3321
monthly_3g_6  category  30011  0  0.00  12
vol_2g_mb_7  float64  30011  0  0.00  7813
circle_id  category  30011  0  0.00  1
last_day_rch_amt_9  int64  30011  0  0.00  170
last_day_rch_amt_8  int64  30011  0  0.00  179
last_date_of_month_6  object  30011  0  0.00  1
arpu_6  float64  30011  0  0.00  29261
arpu_7  float64  30011  0  0.00  29260
arpu_8  float64  30011  0  0.00  28405
arpu_9  float64  30011  0  0.00  27327
total_og_mou_6  float64  30011  0  0.00  24607
total_og_mou_7  float64  30011  0  0.00  24913
total_og_mou_8  float64  30011  0  0.00  23644
total_og_mou_9  float64  30011  0  0.00  22615
total_ic_mou_6  float64  30011  0  0.00  20602
total_ic_mou_7  float64  30011  0  0.00  20711
total_ic_mou_8  float64  30011  0  0.00  20096
total_ic_mou_9  float64  30011  0  0.00  19437
total_rech_num_6  int64  30011  0  0.00  102
total_rech_num_7  int64  30011  0  0.00  101
total_rech_num_8  int64  30011  0  0.00  96
total_rech_num_9  int64  30011  0  0.00  96
total_rech_amt_6  int64  30011  0  0.00  2241
total_rech_amt_7  int64  30011  0  0.00  2265
total_rech_amt_8  int64  30011  0  0.00  2299
total_rech_amt_9  int64  30011  0  0.00  2248
max_rech_amt_6  int64  30011  0  0.00  170
max_rech_amt_7  int64  30011  0  0.00  151
max_rech_amt_8  int64  30011  0  0.00  182
max_rech_amt_9  int64  30011  0  0.00  186
last_day_rch_amt_6  int64  30011  0  0.00  158
last_day_rch_amt_7  int64  30011  0  0.00  149
Average_rech_amt_6n7  float64  30011  0  0.00  3025
  • The data contains information for 4 months: 6, 7, 8 and 9.
  • For the purpose of missing value treatment, each month’s revenue and usage data is treated as unrelated to the other months.
  • Hence, missing value treatment can be performed month by month.
# Month 6
import re

# Collect the columns whose name ends in "6"
sixth_month_columns = []
for column in data.columns:
    x = re.search("6$", column)
    if x:
        sixth_month_columns.append(column)
# missing_values.loc[sixth_month_columns].sort_values(by='Null_Percentage', ascending=False)
metadata = metadata_matrix(data)
condition = metadata.index.isin(sixth_month_columns)
sixth_month_metadata = metadata[condition]
sixth_month_metadata
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
std_ic_mou_6  float64  29695  316  1.05  8391
offnet_mou_6  float64  29695  316  1.05  22454
std_ic_t2f_mou_6  float64  29695  316  1.05  2033
isd_ic_mou_6  float64  29695  316  1.05  3429
ic_others_6  float64  29695  316  1.05  1227
onnet_mou_6  float64  29695  316  1.05  18813
std_ic_t2m_mou_6  float64  29695  316  1.05  6680
loc_ic_t2t_mou_6  float64  29695  316  1.05  9872
loc_ic_t2m_mou_6  float64  29695  316  1.05  16015
loc_ic_t2f_mou_6  float64  29695  316  1.05  4817
loc_ic_mou_6  float64  29695  316  1.05  19133
std_ic_t2t_mou_6  float64  29695  316  1.05  4608
og_others_6  float64  29695  316  1.05  862
spl_og_mou_6  float64  29695  316  1.05  3053
roam_ic_mou_6  float64  29695  316  1.05  4338
spl_ic_mou_6  float64  29695  316  1.05  78
std_og_t2t_mou_6  float64  29695  316  1.05  12777
loc_og_t2c_mou_6  float64  29695  316  1.05  1658
std_og_t2m_mou_6  float64  29695  316  1.05  14518
loc_og_t2f_mou_6  float64  29695  316  1.05  3252
std_og_t2f_mou_6  float64  29695  316  1.05  1773
loc_og_t2m_mou_6  float64  29695  316  1.05  16747
std_ic_t2o_mou_6  category  29695  316  1.05  1
std_og_t2c_mou_6  category  29695  316  1.05  1
std_og_mou_6  float64  29695  316  1.05  18325
loc_og_t2t_mou_6  float64  29695  316  1.05  11151
isd_og_mou_6  float64  29695  316  1.05  1113
roam_og_mou_6  float64  29695  316  1.05  5174
loc_og_mou_6  float64  29695  316  1.05  19691
date_of_last_rech_6  datetime64[ns]  29949  62  0.21  30
sachet_3g_6  category  30011  0  0.00  25
vol_2g_mb_6  float64  30011  0  0.00  7809
vol_3g_mb_6  float64  30011  0  0.00  7043
monthly_2g_6  category  30011  0  0.00  5
sachet_2g_6  category  30011  0  0.00  30
vbc_3g_6  float64  30011  0  0.00  6864
monthly_3g_6  category  30011  0  0.00  12
last_date_of_month_6  object  30011  0  0.00  1
arpu_6  float64  30011  0  0.00  29261
total_og_mou_6  float64  30011  0  0.00  24607
total_ic_mou_6  float64  30011  0  0.00  20602
total_rech_num_6  int64  30011  0  0.00  102
total_rech_amt_6  int64  30011  0  0.00  2241
max_rech_amt_6  int64  30011  0  0.00  170
last_day_rch_amt_6  int64  30011  0  0.00  158
  • Note that all the *_mou columns have exactly 1.05% of rows with missing values.
  • An identical missing percentage across related columns is an indicator of meaningful missing values.
  • Further note that *_mou columns record minutes of usage, which apply only to customers using calling plans. It is probable that these 1.05% of customers are not using calling plans.
  • This can be confirmed by looking at ‘total_og_mou_6’ and ‘total_ic_mou_6’ for the rows where the *_mou columns are missing. If these totals are zero for a customer, then all of that customer’s *_mou columns should be zero too.
# Columns with meaningful missing values in the 6th month
sixth_month_meaningful_missing_condition = sixth_month_metadata['Null_Percentage'] == 1.05
sixth_month_meaningful_missing_cols = sixth_month_metadata[sixth_month_meaningful_missing_condition].index.values
sixth_month_meaningful_missing_cols

array(['std_ic_mou_6', 'offnet_mou_6', 'std_ic_t2f_mou_6', 'isd_ic_mou_6',
       'ic_others_6', 'onnet_mou_6', 'std_ic_t2m_mou_6',
       'loc_ic_t2t_mou_6', 'loc_ic_t2m_mou_6', 'loc_ic_t2f_mou_6',
       'loc_ic_mou_6', 'std_ic_t2t_mou_6', 'og_others_6', 'spl_og_mou_6',
       'roam_ic_mou_6', 'spl_ic_mou_6', 'std_og_t2t_mou_6',
       'loc_og_t2c_mou_6', 'std_og_t2m_mou_6', 'loc_og_t2f_mou_6',
       'std_og_t2f_mou_6', 'loc_og_t2m_mou_6', 'std_ic_t2o_mou_6',
       'std_og_t2c_mou_6', 'std_og_mou_6', 'loc_og_t2t_mou_6',
       'isd_og_mou_6', 'roam_og_mou_6', 'loc_og_mou_6'], dtype=object)
# Looking at all sixth month columns where rows of *_mou are null
condition = data[sixth_month_meaningful_missing_cols].isnull()
# data.loc[condition, sixth_month_columns]

# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index=data.index)
for column in sixth_month_meaningful_missing_cols:
    missing_rows = missing_rows & data[column].isnull()

# Note: unique()[0] prints only the first distinct value found
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_6'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_6'].unique()[0])

Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
  • Hence, these could be imputed with 0
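The check-then-impute reasoning above can be sketched on a toy frame. The data and the names `toy`, `mou_cols` and `total_cols` are made up for illustration. This variant tests every affected row with `.eq(0).all()` rather than reading the first unique value, which is a slightly stricter check than the one used above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): customers 2 and 4 have no call usage at all.
toy = pd.DataFrame(
    {
        "onnet_mou_6": [12.5, np.nan, 3.0, np.nan],
        "offnet_mou_6": [40.0, np.nan, 8.5, np.nan],
        "total_og_mou_6": [52.5, 0.0, 11.5, 0.0],
        "total_ic_mou_6": [20.0, 0.0, 4.0, 0.0],
    },
    index=pd.Index([1, 2, 3, 4], name="mobile_number"),
)
mou_cols = ["onnet_mou_6", "offnet_mou_6"]
total_cols = ["total_og_mou_6", "total_ic_mou_6"]

# Rows where every component *_mou column is missing at the same time.
missing_rows = toy[mou_cols].isna().all(axis=1)

# Confirm that the totals are zero for every one of those rows.
totals_all_zero = bool(toy.loc[missing_rows, total_cols].eq(0).all().all())

# Only then is zero a safe fill value for the component columns.
if totals_all_zero:
    toy[mou_cols] = toy[mou_cols].fillna(0)
print(toy)
```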
# Imputation
data[sixth_month_meaningful_missing_cols] = data[sixth_month_meaningful_missing_cols].fillna(0)

metadata = metadata_matrix(data)

# Remaining Missing Values
metadata.iloc[metadata.index.isin(sixth_month_columns)]
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
date_of_last_rech_6  datetime64[ns]  29949  62  0.21  30
monthly_2g_6  category  30011  0  0.00  5
vbc_3g_6  float64  30011  0  0.00  6864
max_rech_amt_6  int64  30011  0  0.00  170
sachet_3g_6  category  30011  0  0.00  25
sachet_2g_6  category  30011  0  0.00  30
vol_2g_mb_6  float64  30011  0  0.00  7809
monthly_3g_6  category  30011  0  0.00  12
vol_3g_mb_6  float64  30011  0  0.00  7043
last_day_rch_amt_6  int64  30011  0  0.00  158
total_rech_amt_6  int64  30011  0  0.00  2241
loc_og_t2m_mou_6  float64  30011  0  0.00  16747
isd_og_mou_6  float64  30011  0  0.00  1113
std_og_mou_6  float64  30011  0  0.00  18325
std_og_t2c_mou_6  category  30011  0  0.00  1
std_og_t2f_mou_6  float64  30011  0  0.00  1773
std_og_t2m_mou_6  float64  30011  0  0.00  14518
std_og_t2t_mou_6  float64  30011  0  0.00  12777
loc_og_mou_6  float64  30011  0  0.00  19691
loc_og_t2c_mou_6  float64  30011  0  0.00  1658
loc_og_t2f_mou_6  float64  30011  0  0.00  3252
loc_og_t2t_mou_6  float64  30011  0  0.00  11151
roam_og_mou_6  float64  30011  0  0.00  5174
roam_ic_mou_6  float64  30011  0  0.00  4338
offnet_mou_6  float64  30011  0  0.00  22454
onnet_mou_6  float64  30011  0  0.00  18813
arpu_6  float64  30011  0  0.00  29261
last_date_of_month_6  object  30011  0  0.00  1
spl_og_mou_6  float64  30011  0  0.00  3053
og_others_6  float64  30011  0  0.00  862
total_og_mou_6  float64  30011  0  0.00  24607
total_rech_num_6  int64  30011  0  0.00  102
ic_others_6  float64  30011  0  0.00  1227
isd_ic_mou_6  float64  30011  0  0.00  3429
spl_ic_mou_6  float64  30011  0  0.00  78
total_ic_mou_6  float64  30011  0  0.00  20602
std_ic_mou_6  float64  30011  0  0.00  8391
std_ic_t2o_mou_6  category  30011  0  0.00  1
std_ic_t2f_mou_6  float64  30011  0  0.00  2033
std_ic_t2m_mou_6  float64  30011  0  0.00  6680
std_ic_t2t_mou_6  float64  30011  0  0.00  4608
loc_ic_mou_6  float64  30011  0  0.00  19133
loc_ic_t2f_mou_6  float64  30011  0  0.00  4817
loc_ic_t2m_mou_6  float64  30011  0  0.00  16015
loc_ic_t2t_mou_6  float64  30011  0  0.00  9872
  • It looks like ‘0.21%’ of customers have a missing date of last recharge. Let’s look at the ‘recharge’ related columns for these customers.
# Looking at 'recharge' related 6th month columns for customers with missing 'date_of_last_rech_6'
condition = data['date_of_last_rech_6'].isnull()
data[condition].filter(regex='.*rech.*6$', axis=1).head()

mobile_number  total_rech_num_6  total_rech_amt_6  max_rech_amt_6  date_of_last_rech_6
7001588448  0  0  0  NaT
7001223277  0  0  0  NaT
7000721536  0  0  0  NaT
7001490351  0  0  0  NaT
7000665415  0  0  0  NaT

data[condition].filter(regex='.*rech.*6$', axis=1).nunique()

total_rech_num_6       1
total_rech_amt_6       1
max_rech_amt_6         1
date_of_last_rech_6    0
dtype: int64
  • Notice that the recharge related columns for customers with a missing ‘date_of_last_rech_6’ each have just one unique value. From the first few rows of the output, we see that this value is 0.
  • Hence, ‘date_of_last_rech_6’ is missing because no recharges were made in this month.
  • These are meaningful missing values.
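The same reasoning can be checked directly rather than read off `head()` and `nunique()`. The sketch below uses made-up rows and hypothetical names (`toy`, `no_date`, `amount_cols`) to confirm that every customer with a missing last-recharge date has zero recharge count and amount.

```python
import pandas as pd

# Toy data (hypothetical): customers 2 and 4 made no recharge in June.
toy = pd.DataFrame(
    {
        "total_rech_num_6": [3, 0, 5, 0],
        "total_rech_amt_6": [300, 0, 550, 0],
        "max_rech_amt_6": [110, 0, 150, 0],
        "date_of_last_rech_6": pd.to_datetime(
            ["2014-06-21", None, "2014-06-29", None]
        ),
    },
    index=pd.Index([1, 2, 3, 4], name="mobile_number"),
)
amount_cols = ["total_rech_num_6", "total_rech_amt_6", "max_rech_amt_6"]

# Customers without a recorded last-recharge date.
no_date = toy["date_of_last_rech_6"].isna()

# The missing date is "meaningful" if all of these customers have zero recharges.
all_zero = bool((toy.loc[no_date, amount_cols] == 0).all().all())
print(int(no_date.sum()), all_zero)
```

If `all_zero` were `False`, at least one customer would have recharge activity without a date, and the missing dates could not be explained this way.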
# Check for missing values in 6th month variables
metadata = metadata_matrix(data)
metadata[metadata.index.isin(sixth_month_columns)]
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
date_of_last_rech_6  datetime64[ns]  29949  62  0.21  30
monthly_2g_6  category  30011  0  0.00  5
vbc_3g_6  float64  30011  0  0.00  6864
max_rech_amt_6  int64  30011  0  0.00  170
sachet_3g_6  category  30011  0  0.00  25
sachet_2g_6  category  30011  0  0.00  30
vol_2g_mb_6  float64  30011  0  0.00  7809
monthly_3g_6  category  30011  0  0.00  12
vol_3g_mb_6  float64  30011  0  0.00  7043
last_day_rch_amt_6  int64  30011  0  0.00  158
total_rech_amt_6  int64  30011  0  0.00  2241
loc_og_t2m_mou_6  float64  30011  0  0.00  16747
isd_og_mou_6  float64  30011  0  0.00  1113
std_og_mou_6  float64  30011  0  0.00  18325
std_og_t2c_mou_6  category  30011  0  0.00  1
std_og_t2f_mou_6  float64  30011  0  0.00  1773
std_og_t2m_mou_6  float64  30011  0  0.00  14518
std_og_t2t_mou_6  float64  30011  0  0.00  12777
loc_og_mou_6  float64  30011  0  0.00  19691
loc_og_t2c_mou_6  float64  30011  0  0.00  1658
loc_og_t2f_mou_6  float64  30011  0  0.00  3252
loc_og_t2t_mou_6  float64  30011  0  0.00  11151
roam_og_mou_6  float64  30011  0  0.00  5174
roam_ic_mou_6  float64  30011  0  0.00  4338
offnet_mou_6  float64  30011  0  0.00  22454
onnet_mou_6  float64  30011  0  0.00  18813
arpu_6  float64  30011  0  0.00  29261
last_date_of_month_6  object  30011  0  0.00  1
spl_og_mou_6  float64  30011  0  0.00  3053
og_others_6  float64  30011  0  0.00  862
total_og_mou_6  float64  30011  0  0.00  24607
total_rech_num_6  int64  30011  0  0.00  102
ic_others_6  float64  30011  0  0.00  1227
isd_ic_mou_6  float64  30011  0  0.00  3429
spl_ic_mou_6  float64  30011  0  0.00  78
total_ic_mou_6  float64  30011  0  0.00  20602
std_ic_mou_6  float64  30011  0  0.00  8391
std_ic_t2o_mou_6  category  30011  0  0.00  1
std_ic_t2f_mou_6  float64  30011  0  0.00  2033
std_ic_t2m_mou_6  float64  30011  0  0.00  6680
std_ic_t2t_mou_6  float64  30011  0  0.00  4608
loc_ic_mou_6  float64  30011  0  0.00  19133
loc_ic_t2f_mou_6  float64  30011  0  0.00  4817
loc_ic_t2m_mou_6  float64  30011  0  0.00  16015
loc_ic_t2t_mou_6  float64  30011  0  0.00  9872
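  • No unexplained missing values remain in the 6th month columns. The only remaining nulls are in ‘date_of_last_rech_6’, and these are meaningful.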
# Month : 7
seventh_month_columns = data.filter(regex='7$', axis=1).columns
seventh_month_columns

Index(['last_date_of_month_7', 'arpu_7', 'onnet_mou_7', 'offnet_mou_7',
       'roam_ic_mou_7', 'roam_og_mou_7', 'loc_og_t2t_mou_7',
       'loc_og_t2m_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7',
       'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7',
       'std_og_t2f_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7', 'isd_og_mou_7',
       'spl_og_mou_7', 'og_others_7', 'total_og_mou_7', 'loc_ic_t2t_mou_7',
       'loc_ic_t2m_mou_7', 'loc_ic_t2f_mou_7', 'loc_ic_mou_7',
       'std_ic_t2t_mou_7', 'std_ic_t2m_mou_7', 'std_ic_t2f_mou_7',
       'std_ic_t2o_mou_7', 'std_ic_mou_7', 'total_ic_mou_7', 'spl_ic_mou_7',
       'isd_ic_mou_7', 'ic_others_7', 'total_rech_num_7', 'total_rech_amt_7',
       'max_rech_amt_7', 'date_of_last_rech_7', 'last_day_rch_amt_7',
       'vol_2g_mb_7', 'vol_3g_mb_7', 'monthly_2g_7', 'sachet_2g_7',
       'monthly_3g_7', 'sachet_3g_7', 'vbc_3g_7', 'Average_rech_amt_6n7'],
      dtype='object')
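Note that the pattern `7$` also picks up the derived column ‘Average_rech_amt_6n7’, simply because its name ends in 7. The sketch below shows this on an empty toy frame with a handful of column names taken from the output above. The stricter pattern `_7$` is a suggested alternative, not what this analysis uses.

```python
import pandas as pd

# Empty toy frame: only the column names matter for this illustration.
toy = pd.DataFrame(
    columns=["arpu_6", "arpu_7", "onnet_mou_7", "aon", "Average_rech_amt_6n7"]
)

# Pattern used above: any column name ending in "7".
loose = toy.filter(regex="7$", axis=1).columns.tolist()

# Stricter alternative: require the "_7" month suffix.
strict = toy.filter(regex="_7$", axis=1).columns.tolist()

print(loose)
print(strict)
```

The extra column does not affect the imputation in this section, since it has no missing values, but it does show up in the month 7 tables and in the recharge-column check.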
seventh_month_metadata = metadata[metadata.index.isin(seventh_month_columns)]
seventh_month_metadata
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
loc_ic_t2t_mou_7  float64  29708  303  1.01  9961
og_others_7  float64  29708  303  1.01  123
loc_ic_t2f_mou_7  float64  29708  303  1.01  4897
loc_ic_t2m_mou_7  float64  29708  303  1.01  16068
loc_ic_mou_7  float64  29708  303  1.01  19030
std_ic_t2t_mou_7  float64  29708  303  1.01  4706
std_ic_t2f_mou_7  float64  29708  303  1.01  2075
std_ic_t2o_mou_7  category  29708  303  1.01  1
std_ic_mou_7  float64  29708  303  1.01  8543
spl_ic_mou_7  float64  29708  303  1.01  93
isd_ic_mou_7  float64  29708  303  1.01  3639
ic_others_7  float64  29708  303  1.01  1371
std_ic_t2m_mou_7  float64  29708  303  1.01  6747
isd_og_mou_7  float64  29708  303  1.01  1125
spl_og_mou_7  float64  29708  303  1.01  3399
std_og_t2f_mou_7  float64  29708  303  1.01  1714
onnet_mou_7  float64  29708  303  1.01  18938
offnet_mou_7  float64  29708  303  1.01  22650
roam_ic_mou_7  float64  29708  303  1.01  3649
roam_og_mou_7  float64  29708  303  1.01  4431
loc_og_t2t_mou_7  float64  29708  303  1.01  11154
loc_og_t2f_mou_7  float64  29708  303  1.01  3267
loc_og_t2c_mou_7  float64  29708  303  1.01  1750
loc_og_mou_7  float64  29708  303  1.01  19880
std_og_t2t_mou_7  float64  29708  303  1.01  12983
std_og_t2m_mou_7  float64  29708  303  1.01  14589
loc_og_t2m_mou_7  float64  29708  303  1.01  16872
std_og_t2c_mou_7  category  29708  303  1.01  1
std_og_mou_7  float64  29708  303  1.01  18445
date_of_last_rech_7  datetime64[ns]  29897  114  0.38  31
last_date_of_month_7  object  29980  31  0.10  1
vol_2g_mb_7  float64  30011  0  0.00  7813
max_rech_amt_7  int64  30011  0  0.00  151
vbc_3g_7  float64  30011  0  0.00  7318
sachet_3g_7  category  30011  0  0.00  27
total_rech_amt_7  int64  30011  0  0.00  2265
monthly_2g_7  category  30011  0  0.00  6
sachet_2g_7  category  30011  0  0.00  34
last_day_rch_amt_7  int64  30011  0  0.00  149
monthly_3g_7  category  30011  0  0.00  15
vol_3g_mb_7  float64  30011  0  0.00  7440
total_rech_num_7  int64  30011  0  0.00  101
arpu_7  float64  30011  0  0.00  29260
total_og_mou_7  float64  30011  0  0.00  24913
total_ic_mou_7  float64  30011  0  0.00  20711
Average_rech_amt_6n7  float64  30011  0  0.00  3025
  • Note that all the *_mou columns have exactly 1.01% of rows with missing values.
  • An identical missing percentage across related columns is an indicator of meaningful missing values.
  • Further note that *_mou columns record minutes of usage, which apply only to customers using calling plans. It is probable that these 1.01% of customers are not using calling plans.
  • This can be confirmed by looking at ‘total_og_mou_7’ and ‘total_ic_mou_7’ for the rows where the *_mou columns are missing. If these totals are zero for a customer, then all of that customer’s *_mou columns should be zero too.
# Columns with meaningful missing values in the 7th month
seventh_month_meaningful_missing_condition = seventh_month_metadata['Null_Percentage'] == 1.01
seventh_month_meaningful_missing_cols = seventh_month_metadata[seventh_month_meaningful_missing_condition].index.values
seventh_month_meaningful_missing_cols

array(['loc_ic_t2t_mou_7', 'og_others_7', 'loc_ic_t2f_mou_7',
       'loc_ic_t2m_mou_7', 'loc_ic_mou_7', 'std_ic_t2t_mou_7',
       'std_ic_t2f_mou_7', 'std_ic_t2o_mou_7', 'std_ic_mou_7',
       'spl_ic_mou_7', 'isd_ic_mou_7', 'ic_others_7', 'std_ic_t2m_mou_7',
       'isd_og_mou_7', 'spl_og_mou_7', 'std_og_t2f_mou_7', 'onnet_mou_7',
       'offnet_mou_7', 'roam_ic_mou_7', 'roam_og_mou_7',
       'loc_og_t2t_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7',
       'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7',
       'loc_og_t2m_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7'],
      dtype=object)
# Looking at all 7th month columns where rows of *_mou are null
condition = data[seventh_month_meaningful_missing_cols].isnull()

# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index=data.index)
for column in seventh_month_meaningful_missing_cols:
    missing_rows = missing_rows & data[column].isnull()

# Note: unique()[0] prints only the first distinct value found
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_7'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_7'].unique()[0])

Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
  • Hence, these could be imputed with 0
# Imputation
data[seventh_month_meaningful_missing_cols] = data[seventh_month_meaningful_missing_cols].fillna(0)

metadata = metadata_matrix(data)

# Remaining Missing Values
metadata.iloc[metadata.index.isin(seventh_month_columns)]
Column  Datatype  Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
date_of_last_rech_7  datetime64[ns]  29897  114  0.38  31
last_date_of_month_7  object  29980  31  0.10  1
total_rech_num_7  int64  30011  0  0.00  101
ic_others_7  float64  30011  0  0.00  1371
isd_ic_mou_7  float64  30011  0  0.00  3639
spl_ic_mou_7  float64  30011  0  0.00  93
total_rech_amt_7  int64  30011  0  0.00  2265
sachet_2g_7  category  30011  0  0.00  34
monthly_3g_7  category  30011  0  0.00  15
sachet_3g_7  category  30011  0  0.00  27
vbc_3g_7  float64  30011  0  0.00  7318
max_rech_amt_7  int64  30011  0  0.00  151
last_day_rch_amt_7  int64  30011  0  0.00  149
vol_2g_mb_7  float64  30011  0  0.00  7813
monthly_2g_7  category  30011  0  0.00  6
vol_3g_mb_7  float64  30011  0  0.00  7440
loc_ic_t2f_mou_7  float64  30011  0  0.00  4897
total_ic_mou_7  float64  30011  0  0.00  20711
loc_og_t2t_mou_7  float64  30011  0  0.00  11154
std_og_t2m_mou_7  float64  30011  0  0.00  14589
std_og_t2t_mou_7  float64  30011  0  0.00  12983
loc_og_mou_7  float64  30011  0  0.00  19880
loc_og_t2c_mou_7  float64  30011  0  0.00  1750
loc_og_t2f_mou_7  float64  30011  0  0.00  3267
loc_og_t2m_mou_7  float64  30011  0  0.00  16872
roam_og_mou_7  float64  30011  0  0.00  4431
roam_ic_mou_7  float64  30011  0  0.00  3649
offnet_mou_7  float64  30011  0  0.00  22650
onnet_mou_7  float64  30011  0  0.00  18938
arpu_7  float64  30011  0  0.00  29260
std_og_t2f_mou_7  float64  30011  0  0.00  1714
std_og_t2c_mou_7  category  30011  0  0.00  1
loc_ic_t2m_mou_7  float64  30011  0  0.00  16068
std_ic_mou_7  float64  30011  0  0.00  8543
std_ic_t2o_mou_7  category  30011  0  0.00  1
std_ic_t2f_mou_7  float64  30011  0  0.00  2075
std_ic_t2m_mou_7  float64  30011  0  0.00  6747
std_ic_t2t_mou_7  float64  30011  0  0.00  4706
loc_ic_mou_7  float64  30011  0  0.00  19030
loc_ic_t2t_mou_7  float64  30011  0  0.00  9961
total_og_mou_7  float64  30011  0  0.00  24913
og_others_7  float64  30011  0  0.00  123
spl_og_mou_7  float64  30011  0  0.00  3399
isd_og_mou_7  float64  30011  0  0.00  1125
std_og_mou_7  float64  30011  0  0.00  18445
Average_rech_amt_6n7  float64  30011  0  0.00  3025
  • It looks like ‘0.38%’ of customers have a missing date of last recharge. Let’s look at the ‘recharge’ related columns for these customers.
# Looking at 'recharge' related 7th month columns for customers with missing 'date_of_last_rech_7'
condition = data['date_of_last_rech_7'].isnull()
data[condition].filter(regex='.*rech.*7$', axis=1).head()

mobile_number  total_rech_num_7  total_rech_amt_7  max_rech_amt_7  date_of_last_rech_7  Average_rech_amt_6n7
7000369789  0  0  0  NaT  393.0
7001967148  0  0  0  NaT  500.5
7000066601  0  0  0  NaT  490.0
7001189556  0  0  0  NaT  523.5
7002024450  0  0  0  NaT  493.0

data[condition].filter(regex='.*rech.*7$', axis=1).nunique()

total_rech_num_7         1
total_rech_amt_7         1
max_rech_amt_7           1
date_of_last_rech_7      0
Average_rech_amt_6n7    90
dtype: int64
  • Notice that each 7th-month recharge column (‘total_rech_num_7’, ‘total_rech_amt_7’, ‘max_rech_amt_7’) has just one unique value for customers with missing ‘date_of_last_rech_7’. From the first few rows of the output, we see that this value is 0.
  • Hence, ‘date_of_last_rech_7’ is missing because no recharges were made in this month.
  • These are meaningful missing values.
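The reasoning above can also be checked for every affected row, rather than only the first few. The following is a minimal sketch on a small made-up frame (the values are illustrative, not taken from the dataset), and `no_recharge_explains_missing_date` is a hypothetical helper name.

```python
import pandas as pd

def no_recharge_explains_missing_date(df, month):
    """Return True if every row with a missing last-recharge date also has
    a zero recharge count and a zero recharge amount for that month."""
    missing = df[f'date_of_last_rech_{month}'].isnull()
    rech_cols = [f'total_rech_num_{month}', f'total_rech_amt_{month}']
    return bool((df.loc[missing, rech_cols] == 0).all().all())

# Toy data for illustration only (not from the telecom dataset)
toy = pd.DataFrame({
    'date_of_last_rech_7': pd.to_datetime(['2014-07-15', None, None]),
    'total_rech_num_7': [3, 0, 0],
    'total_rech_amt_7': [450, 0, 0],
})

print(no_recharge_explains_missing_date(toy, 7))
```

If the function returned False for a month, the missing dates in that month would need a different explanation.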
# Month : 8
eighth_month_columns = data.filter(regex="8$", axis=1).columns
metadata = metadata_matrix(data)
condition = metadata.index.isin(eighth_month_columns)
eighth_month_metadata = metadata[condition]
eighth_month_metadata
Column                Datatype        Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
std_og_t2c_mou_8      category        29073           938         3.13             1
std_og_mou_8          float64         29073           938         3.13             16864
isd_og_mou_8          float64         29073           938         3.13             940
loc_ic_mou_8          float64         29073           938         3.13             18573
std_og_t2m_mou_8      float64         29073           938         3.13             13326
loc_ic_t2m_mou_8      float64         29073           938         3.13             15598
loc_og_mou_8          float64         29073           938         3.13             18885
std_og_t2t_mou_8      float64         29073           938         3.13             11781
std_og_t2f_mou_8      float64         29073           938         3.13             1627
loc_ic_t2f_mou_8      float64         29073           938         3.13             4705
loc_og_t2c_mou_8      float64         29073           938         3.13             1730
ic_others_8           float64         29073           938         3.13             1259
loc_og_t2m_mou_8      float64         29073           938         3.13             16165
spl_og_mou_8          float64         29073           938         3.13             3238
roam_ic_mou_8         float64         29073           938         3.13             3655
std_ic_mou_8          float64         29073           938         3.13             8033
spl_ic_mou_8          float64         29073           938         3.13             85
std_ic_t2o_mou_8      category        29073           938         3.13             1
onnet_mou_8           float64         29073           938         3.13             17604
loc_og_t2f_mou_8      float64         29073           938         3.13             3124
offnet_mou_8          float64         29073           938         3.13             21513
std_ic_t2f_mou_8      float64         29073           938         3.13             1941
og_others_8           float64         29073           938         3.13             133
loc_ic_t2t_mou_8      float64         29073           938         3.13             9671
std_ic_t2m_mou_8      float64         29073           938         3.13             6420
std_ic_t2t_mou_8      float64         29073           938         3.13             4486
roam_og_mou_8         float64         29073           938         3.13             4382
isd_ic_mou_8          float64         29073           938         3.13             3493
loc_og_t2t_mou_8      float64         29073           938         3.13             10772
date_of_last_rech_8   datetime64[ns]  29417           594         1.98             31
last_date_of_month_8  object          29854           157         0.52             1
total_rech_num_8      int64           30011           0           0.00             96
total_rech_amt_8      int64           30011           0           0.00             2299
last_day_rch_amt_8    int64           30011           0           0.00             179
sachet_2g_8           category        30011           0           0.00             34
monthly_3g_8          category        30011           0           0.00             12
sachet_3g_8           category        30011           0           0.00             29
vbc_3g_8              float64         30011           0           0.00             7291
monthly_2g_8          category        30011           0           0.00             6
max_rech_amt_8        int64           30011           0           0.00             182
total_ic_mou_8        float64         30011           0           0.00             20096
vol_2g_mb_8           float64         30011           0           0.00             7310
vol_3g_mb_8           float64         30011           0           0.00             7151
arpu_8                float64         30011           0           0.00             28405
total_og_mou_8        float64         30011           0           0.00             23644
# columns with meaningful missing in 8th month
eighth_month_meaningful_missing_condition = eighth_month_metadata['Null_Percentage'] == 3.13
eighth_month_meaningful_missing_cols = eighth_month_metadata[eighth_month_meaningful_missing_condition].index.values
eighth_month_meaningful_missing_cols

array(['std_og_t2c_mou_8', 'std_og_mou_8', 'isd_og_mou_8', 'loc_ic_mou_8',
       'std_og_t2m_mou_8', 'loc_ic_t2m_mou_8', 'loc_og_mou_8',
       'std_og_t2t_mou_8', 'std_og_t2f_mou_8', 'loc_ic_t2f_mou_8',
       'loc_og_t2c_mou_8', 'ic_others_8', 'loc_og_t2m_mou_8',
       'spl_og_mou_8', 'roam_ic_mou_8', 'std_ic_mou_8', 'spl_ic_mou_8',
       'std_ic_t2o_mou_8', 'onnet_mou_8', 'loc_og_t2f_mou_8',
       'offnet_mou_8', 'std_ic_t2f_mou_8', 'og_others_8',
       'loc_ic_t2t_mou_8', 'std_ic_t2m_mou_8', 'std_ic_t2t_mou_8',
       'roam_og_mou_8', 'isd_ic_mou_8', 'loc_og_t2t_mou_8'], dtype=object)
# Looking at all 8th month columns where rows of *_mou are null
condition = data[eighth_month_meaningful_missing_cols].isnull()

# Rows that are null in all of the above columns
missing_rows = condition.all(axis=1)

print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_8'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_8'].unique()[0])

Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
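Since the totals are zero for these customers, filling the component columns with 0 is consistent with the rest of the row. A hedged sketch of a check-then-impute step follows; the frame is made up for illustration, and `impute_zero_if_totals_zero` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def impute_zero_if_totals_zero(df, component_cols, total_cols):
    """Fill component columns with 0, but only after confirming that rows
    where all components are null also have zero totals."""
    all_missing = df[component_cols].isnull().all(axis=1)
    totals_are_zero = (df.loc[all_missing, total_cols] == 0).all().all()
    if not totals_are_zero:
        raise ValueError('Some rows with missing components have non-zero totals')
    out = df.copy()
    out[component_cols] = out[component_cols].fillna(0)
    return out

# Toy data for illustration only (not from the telecom dataset)
toy = pd.DataFrame({
    'onnet_mou_8': [12.5, np.nan],
    'offnet_mou_8': [30.0, np.nan],
    'total_og_mou_8': [42.5, 0.0],
    'total_ic_mou_8': [10.0, 0.0],
})

filled = impute_zero_if_totals_zero(
    toy,
    component_cols=['onnet_mou_8', 'offnet_mou_8'],
    total_cols=['total_og_mou_8', 'total_ic_mou_8'],
)
print(filled.isnull().sum().sum())
```

Raising an error instead of silently filling makes the assumption behind the imputation explicit.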
# Imputation
data[eighth_month_meaningful_missing_cols] = data[eighth_month_meaningful_missing_cols].fillna(0)

metadata = metadata_matrix(data)

# Remaining Missing Values
metadata.iloc[metadata.index.isin(eighth_month_columns)]
Column                Datatype        Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
date_of_last_rech_8   datetime64[ns]  29417           594         1.98             31
last_date_of_month_8  object          29854           157         0.52             1
spl_ic_mou_8          float64         30011           0           0.00             85
total_rech_num_8      int64           30011           0           0.00             96
std_ic_t2f_mou_8      float64         30011           0           0.00             1941
ic_others_8           float64         30011           0           0.00             1259
std_ic_t2o_mou_8      category        30011           0           0.00             1
std_ic_mou_8          float64         30011           0           0.00             8033
total_ic_mou_8        float64         30011           0           0.00             20096
isd_ic_mou_8          float64         30011           0           0.00             3493
sachet_2g_8           category        30011           0           0.00             34
monthly_3g_8          category        30011           0           0.00             12
sachet_3g_8           category        30011           0           0.00             29
vbc_3g_8              float64         30011           0           0.00             7291
monthly_2g_8          category        30011           0           0.00             6
total_rech_amt_8      int64           30011           0           0.00             2299
max_rech_amt_8        int64           30011           0           0.00             182
last_day_rch_amt_8    int64           30011           0           0.00             179
vol_2g_mb_8           float64         30011           0           0.00             7310
vol_3g_mb_8           float64         30011           0           0.00             7151
std_ic_t2m_mou_8      float64         30011           0           0.00             6420
loc_og_t2m_mou_8      float64         30011           0           0.00             16165
loc_og_t2f_mou_8      float64         30011           0           0.00             3124
loc_og_t2c_mou_8      float64         30011           0           0.00             1730
loc_og_mou_8          float64         30011           0           0.00             18885
std_og_t2t_mou_8      float64         30011           0           0.00             11781
loc_og_t2t_mou_8      float64         30011           0           0.00             10772
onnet_mou_8           float64         30011           0           0.00             17604
arpu_8                float64         30011           0           0.00             28405
roam_og_mou_8         float64         30011           0           0.00             4382
offnet_mou_8          float64         30011           0           0.00             21513
roam_ic_mou_8         float64         30011           0           0.00             3655
std_og_t2m_mou_8      float64         30011           0           0.00             13326
loc_ic_t2t_mou_8      float64         30011           0           0.00             9671
loc_ic_t2m_mou_8      float64         30011           0           0.00             15598
loc_ic_t2f_mou_8      float64         30011           0           0.00             4705
loc_ic_mou_8          float64         30011           0           0.00             18573
std_ic_t2t_mou_8      float64         30011           0           0.00             4486
total_og_mou_8        float64         30011           0           0.00             23644
og_others_8           float64         30011           0           0.00             133
std_og_t2f_mou_8      float64         30011           0           0.00             1627
std_og_t2c_mou_8      category        30011           0           0.00             1
std_og_mou_8          float64         30011           0           0.00             16864
isd_og_mou_8          float64         30011           0           0.00             940
spl_og_mou_8          float64         30011           0           0.00             3238
# Looking at 'recharge' related 8th month columns for customers with missing 'date_of_last_rech_8'
condition = data['date_of_last_rech_8'].isnull()
data[condition].filter(regex='.*rech.*8$', axis=1).head()

               total_rech_num_8  total_rech_amt_8  max_rech_amt_8  date_of_last_rech_8
mobile_number
7000340381                    0                 0               0                  NaT
7000608224                    0                 0               0                  NaT
7000369789                    0                 0               0                  NaT
7000248548                    0                 0               0                  NaT
7001967063                    0                 0               0                  NaT

data[condition].filter(regex='.*rech.*8$', axis=1).nunique()

total_rech_num_8       1
total_rech_amt_8       1
max_rech_amt_8         1
date_of_last_rech_8    0
dtype: int64
# Month : 9
ninth_month_columns = data.filter(regex="9$", axis=1).columns
metadata = metadata_matrix(data)
condition = metadata.index.isin(ninth_month_columns)
ninth_month_metadata = metadata[condition]
ninth_month_metadata
Column                Datatype        Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
std_og_t2c_mou_9      category        28307           1704        5.68             1
spl_ic_mou_9          float64         28307           1704        5.68             287
loc_og_t2m_mou_9      float64         28307           1704        5.68             15585
og_others_9           float64         28307           1704        5.68             132
loc_og_t2c_mou_9      float64         28307           1704        5.68             1576
isd_ic_mou_9          float64         28307           1704        5.68             3329
loc_og_t2t_mou_9      float64         28307           1704        5.68             10360
spl_og_mou_9          float64         28307           1704        5.68             2966
loc_ic_t2t_mou_9      float64         28307           1704        5.68             9407
loc_og_mou_9          float64         28307           1704        5.68             18207
roam_og_mou_9         float64         28307           1704        5.68             4004
std_ic_mou_9          float64         28307           1704        5.68             7745
loc_ic_t2m_mou_9      float64         28307           1704        5.68             15194
roam_ic_mou_9         float64         28307           1704        5.68             3370
std_og_t2t_mou_9      float64         28307           1704        5.68             11141
offnet_mou_9          float64         28307           1704        5.68             20452
loc_ic_t2f_mou_9      float64         28307           1704        5.68             4611
std_ic_t2f_mou_9      float64         28307           1704        5.68             1971
isd_og_mou_9          float64         28307           1704        5.68             908
std_og_mou_9          float64         28307           1704        5.68             15900
std_og_t2f_mou_9      float64         28307           1704        5.68             1595
ic_others_9           float64         28307           1704        5.68             1284
std_ic_t2t_mou_9      float64         28307           1704        5.68             4280
std_ic_t2o_mou_9      category        28307           1704        5.68             1
loc_og_t2f_mou_9      float64         28307           1704        5.68             3111
std_og_t2m_mou_9      float64         28307           1704        5.68             12445
loc_ic_mou_9          float64         28307           1704        5.68             18018
std_ic_t2m_mou_9      float64         28307           1704        5.68             6168
onnet_mou_9           float64         28307           1704        5.68             16674
date_of_last_rech_9   datetime64[ns]  29145           866         2.89             30
last_date_of_month_9  object          29651           360         1.20             1
total_rech_num_9      int64           30011           0           0.00             96
total_ic_mou_9        float64         30011           0           0.00             19437
monthly_3g_9          category        30011           0           0.00             11
monthly_2g_9          category        30011           0           0.00             5
sachet_2g_9           category        30011           0           0.00             29
sachet_3g_9           category        30011           0           0.00             27
vbc_3g_9              float64         30011           0           0.00             2171
vol_3g_mb_9           float64         30011           0           0.00             7016
total_rech_amt_9      int64           30011           0           0.00             2248
max_rech_amt_9        int64           30011           0           0.00             186
last_day_rch_amt_9    int64           30011           0           0.00             170
vol_2g_mb_9           float64         30011           0           0.00             6984
arpu_9                float64         30011           0           0.00             27327
total_og_mou_9        float64         30011           0           0.00             22615
# columns with meaningful missing in 9th month
ninth_month_meaningful_missing_condition = ninth_month_metadata['Null_Percentage'] == 5.68
ninth_month_meaningful_missing_cols = ninth_month_metadata[ninth_month_meaningful_missing_condition].index.values
ninth_month_meaningful_missing_cols

array(['std_og_t2c_mou_9', 'spl_ic_mou_9', 'loc_og_t2m_mou_9',
       'og_others_9', 'loc_og_t2c_mou_9', 'isd_ic_mou_9',
       'loc_og_t2t_mou_9', 'spl_og_mou_9', 'loc_ic_t2t_mou_9',
       'loc_og_mou_9', 'roam_og_mou_9', 'std_ic_mou_9',
       'loc_ic_t2m_mou_9', 'roam_ic_mou_9', 'std_og_t2t_mou_9',
       'offnet_mou_9', 'loc_ic_t2f_mou_9', 'std_ic_t2f_mou_9',
       'isd_og_mou_9', 'std_og_mou_9', 'std_og_t2f_mou_9', 'ic_others_9',
       'std_ic_t2t_mou_9', 'std_ic_t2o_mou_9', 'loc_og_t2f_mou_9',
       'std_og_t2m_mou_9', 'loc_ic_mou_9', 'std_ic_t2m_mou_9',
       'onnet_mou_9'], dtype=object)
# Looking at all 9th month columns where rows of *_mou are null
condition = data[ninth_month_meaningful_missing_cols].isnull()

# Rows that are null in all of the above columns
missing_rows = condition.all(axis=1)

print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_9'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_9'].unique()[0])

Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
# Imputation
data[ninth_month_meaningful_missing_cols] = data[ninth_month_meaningful_missing_cols].fillna(0)

metadata = metadata_matrix(data)

# Remaining Missing Values
metadata.iloc[metadata.index.isin(ninth_month_columns)]
Column                Datatype        Non_Null_Count  Null_Count  Null_Percentage  Unique_Values_Count
date_of_last_rech_9   datetime64[ns]  29145           866         2.89             30
last_date_of_month_9  object          29651           360         1.20             1
spl_ic_mou_9          float64         30011           0           0.00             287
total_ic_mou_9        float64         30011           0           0.00             19437
std_ic_mou_9          float64         30011           0           0.00             7745
isd_ic_mou_9          float64         30011           0           0.00             3329
ic_others_9           float64         30011           0           0.00             1284
loc_ic_mou_9          float64         30011           0           0.00             18018
std_ic_t2t_mou_9      float64         30011           0           0.00             4280
std_ic_t2m_mou_9      float64         30011           0           0.00             6168
std_ic_t2f_mou_9      float64         30011           0           0.00             1971
std_ic_t2o_mou_9      category        30011           0           0.00             1
total_rech_amt_9      int64           30011           0           0.00             2248
total_rech_num_9      int64           30011           0           0.00             96
monthly_3g_9          category        30011           0           0.00             11
monthly_2g_9          category        30011           0           0.00             5
sachet_2g_9           category        30011           0           0.00             29
sachet_3g_9           category        30011           0           0.00             27
vbc_3g_9              float64         30011           0           0.00             2171
max_rech_amt_9        int64           30011           0           0.00             186
vol_3g_mb_9           float64         30011           0           0.00             7016
last_day_rch_amt_9    int64           30011           0           0.00             170
vol_2g_mb_9           float64         30011           0           0.00             6984
loc_ic_t2f_mou_9      float64         30011           0           0.00             4611
loc_og_t2t_mou_9      float64         30011           0           0.00             10360
loc_og_t2m_mou_9      float64         30011           0           0.00             15585
loc_og_t2f_mou_9      float64         30011           0           0.00             3111
loc_og_t2c_mou_9      float64         30011           0           0.00             1576
loc_og_mou_9          float64         30011           0           0.00             18207
roam_og_mou_9         float64         30011           0           0.00             4004
onnet_mou_9           float64         30011           0           0.00             16674
arpu_9                float64         30011           0           0.00             27327
offnet_mou_9          float64         30011           0           0.00             20452
roam_ic_mou_9         float64         30011           0           0.00             3370
std_og_t2t_mou_9      float64         30011           0           0.00             11141
spl_og_mou_9          float64         30011           0           0.00             2966
og_others_9           float64         30011           0           0.00             132
total_og_mou_9        float64         30011           0           0.00             22615
loc_ic_t2t_mou_9      float64         30011           0           0.00             9407
loc_ic_t2m_mou_9      float64         30011           0           0.00             15194
isd_og_mou_9          float64         30011           0           0.00             908
std_og_t2m_mou_9      float64         30011           0           0.00             12445
std_og_t2f_mou_9      float64         30011           0           0.00             1595
std_og_t2c_mou_9      category        30011           0           0.00             1
std_og_mou_9          float64         30011           0           0.00             15900
# Looking at 'recharge' related 9th month columns for customers with missing 'date_of_last_rech_9'
condition = data['date_of_last_rech_9'].isnull()
data[condition].filter(regex='.*rech.*9$', axis=1).head()

               total_rech_num_9  total_rech_amt_9  max_rech_amt_9  date_of_last_rech_9
mobile_number
7000340381                    0                 0               0                  NaT
7000854899                    0                 0               0                  NaT
7000369789                    0                 0               0                  NaT
7001967063                    0                 0               0                  NaT
7000066601                    0                 0               0                  NaT

data[condition].filter(regex='.*rech.*9$', axis=1).nunique()

total_rech_num_9       1
total_rech_amt_9       1
max_rech_amt_9         1
date_of_last_rech_9    0
dtype: int64
# Imputing "last_date_of_month_*"
print('Missing Value Percentage in last_date_of_month columns : \n', 100*data.filter(regex='last_date_of_month_.*', axis=1).isnull().sum() / data.shape[0], '\n')
print('The unique values in last_date_of_month_6 : ' , data['last_date_of_month_6'].unique())
print('The unique values in last_date_of_month_7 : ' , data['last_date_of_month_7'].unique())
print('The unique values in last_date_of_month_8 : ' , data['last_date_of_month_8'].unique())
print('The unique values in last_date_of_month_9 : ' , data['last_date_of_month_9'].unique())

Missing Value Percentage in last_date_of_month columns :
 last_date_of_month_6    0.000000
last_date_of_month_7    0.103295
last_date_of_month_8    0.523142
last_date_of_month_9    1.199560
dtype: float64

The unique values in last_date_of_month_6 : ['6/30/2014']
The unique values in last_date_of_month_7 : ['7/31/2014' nan]
The unique values in last_date_of_month_8 : ['8/31/2014' nan]
The unique values in last_date_of_month_9 : ['9/30/2014' nan]
  • The last date of the month is the last calendar date of that month; it is independent of the churn data.
  • Let’s impute these missing values using the mode. Each of these columns has a single non-null value, so the mode is simply that date.
# Imputing last_date_of_month_* values
data['last_date_of_month_7'] = data['last_date_of_month_7'].fillna(data['last_date_of_month_7'].mode()[0])
data['last_date_of_month_8'] = data['last_date_of_month_8'].fillna(data['last_date_of_month_8'].mode()[0])
data['last_date_of_month_9'] = data['last_date_of_month_9'].fillna(data['last_date_of_month_9'].mode()[0])

data['last_date_of_month_7'].unique()

array(['7/31/2014'], dtype=object)
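The three near-identical assignments above could also be written as a loop over the matching columns. This is a minimal sketch on a made-up frame (illustrative values only), not the code used in the analysis.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not from the telecom dataset)
toy = pd.DataFrame({
    'last_date_of_month_7': ['7/31/2014', np.nan, '7/31/2014'],
    'last_date_of_month_8': ['8/31/2014', '8/31/2014', np.nan],
})

for col in toy.filter(regex='^last_date_of_month_').columns:
    # mode() ignores NaN by default, so mode()[0] is the single observed date
    toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy.isnull().sum().sum())
```

After imputation each of these columns holds exactly one value, which is why they are later dropped along with the other single-valued columns.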
metadata = metadata_matrix(data)
metadata
DatatypeNon_Null_CountNull_CountNull_PercentageUnique_Values_Count
date_of_last_rech_9datetime64[ns]291458662.8930
date_of_last_rech_8datetime64[ns]294175941.9831
loc_og_t2o_moucategory298971140.381
date_of_last_rech_7datetime64[ns]298971140.3831
std_og_t2o_moucategory298971140.381
loc_ic_t2o_moucategory298971140.381
date_of_last_rech_6datetime64[ns]29949620.2130
isd_ic_mou_6float643001100.003429
total_ic_mou_6float643001100.0020602
total_ic_mou_7float643001100.0020711
total_ic_mou_8float643001100.0020096
total_ic_mou_9float643001100.0019437
spl_ic_mou_6float643001100.0078
spl_ic_mou_7float643001100.0093
spl_ic_mou_8float643001100.0085
spl_ic_mou_9float643001100.00287
total_rech_num_6int643001100.00102
ic_others_9float643001100.001284
std_ic_mou_8float643001100.008033
isd_ic_mou_7float643001100.003639
isd_ic_mou_8float643001100.003493
isd_ic_mou_9float643001100.003329
ic_others_6float643001100.001227
ic_others_7float643001100.001371
ic_others_8float643001100.001259
std_ic_mou_9float643001100.007745
std_ic_mou_7float643001100.008543
total_rech_num_8int643001100.0096
std_ic_t2m_mou_7float643001100.006747
loc_ic_mou_6float643001100.0019133
loc_ic_mou_7float643001100.0019030
loc_ic_mou_8float643001100.0018573
loc_ic_mou_9float643001100.0018018
std_ic_t2t_mou_6float643001100.004608
std_ic_t2t_mou_7float643001100.004706
std_ic_t2t_mou_8float643001100.004486
std_ic_t2t_mou_9float643001100.004280
std_ic_t2m_mou_6float643001100.006680
std_ic_t2m_mou_8float643001100.006420
std_ic_mou_6float643001100.008391
std_ic_t2m_mou_9float643001100.006168
std_ic_t2f_mou_6float643001100.002033
std_ic_t2f_mou_7float643001100.002075
std_ic_t2f_mou_8float643001100.001941
std_ic_t2f_mou_9float643001100.001971
std_ic_t2o_mou_6category3001100.001
std_ic_t2o_mou_7category3001100.001
std_ic_t2o_mou_8category3001100.001
std_ic_t2o_mou_9category3001100.001
total_rech_num_7int643001100.00101
circle_idcategory3001100.001
total_rech_num_9int643001100.0096
monthly_3g_9category3001100.0011
monthly_2g_9category3001100.005
sachet_2g_6category3001100.0030
sachet_2g_7category3001100.0034
sachet_2g_8category3001100.0034
sachet_2g_9category3001100.0029
monthly_3g_6category3001100.0012
monthly_3g_7category3001100.0015
monthly_3g_8category3001100.0012
sachet_3g_6category3001100.0025
monthly_2g_7category3001100.006
sachet_3g_7category3001100.0027
sachet_3g_8category3001100.0029
sachet_3g_9category3001100.0027
aonint643001100.003321
vbc_3g_8float643001100.007291
vbc_3g_7float643001100.007318
vbc_3g_6float643001100.006864
vbc_3g_9float643001100.002171
monthly_2g_8category3001100.006
monthly_2g_6category3001100.005
total_rech_amt_6int643001100.002241
last_day_rch_amt_7int643001100.00149
loc_ic_t2f_mou_9float643001100.004611
total_rech_amt_8int643001100.002299
total_rech_amt_9int643001100.002248
max_rech_amt_6int643001100.00170
max_rech_amt_7int643001100.00151
max_rech_amt_8int643001100.00182
max_rech_amt_9int643001100.00186
last_day_rch_amt_6int643001100.00158
last_day_rch_amt_8int643001100.00179
vol_3g_mb_9float643001100.007016
last_day_rch_amt_9int643001100.00170
vol_2g_mb_6float643001100.007809
vol_2g_mb_7float643001100.007813
vol_2g_mb_8float643001100.007310
vol_2g_mb_9float643001100.006984
vol_3g_mb_6float643001100.007043
vol_3g_mb_7float643001100.007440
vol_3g_mb_8float643001100.007151
total_rech_amt_7int643001100.002265
loc_ic_t2f_mou_7float643001100.004897
loc_ic_t2f_mou_8float643001100.004705
roam_og_mou_7float643001100.004431
roam_og_mou_9float643001100.004004
loc_og_t2t_mou_6float643001100.0011151
loc_og_t2t_mou_7float643001100.0011154
loc_og_t2t_mou_8float643001100.0010772
loc_og_t2t_mou_9float643001100.0010360
loc_og_t2m_mou_6float643001100.0016747
loc_og_t2m_mou_7float643001100.0016872
loc_og_t2m_mou_8float643001100.0016165
loc_og_t2m_mou_9float643001100.0015585
loc_og_t2f_mou_6float643001100.003252
loc_og_t2f_mou_7float643001100.003267
loc_og_t2f_mou_8float643001100.003124
loc_og_t2f_mou_9float643001100.003111
loc_og_t2c_mou_6float643001100.001658
loc_og_t2c_mou_7float643001100.001750
loc_og_t2c_mou_8float643001100.001730
loc_og_t2c_mou_9float643001100.001576
loc_og_mou_6float643001100.0019691
loc_og_mou_7float643001100.0019880
roam_og_mou_8float643001100.004382
roam_og_mou_6float643001100.005174
loc_og_mou_9float643001100.0018207
roam_ic_mou_9float643001100.003370
last_date_of_month_6object3001100.001
last_date_of_month_7object3001100.001
last_date_of_month_8object3001100.001
last_date_of_month_9object3001100.001
arpu_6float643001100.0029261
arpu_7float643001100.0029260
arpu_8float643001100.0028405
arpu_9float643001100.0027327
onnet_mou_6float643001100.0018813
onnet_mou_7float643001100.0018938
onnet_mou_8float643001100.0017604
onnet_mou_9float643001100.0016674
offnet_mou_6float643001100.0022454
offnet_mou_7float643001100.0022650
offnet_mou_8float643001100.0021513
offnet_mou_9float643001100.0020452
roam_ic_mou_6float643001100.004338
roam_ic_mou_7float643001100.003649
roam_ic_mou_8float643001100.003655
loc_og_mou_8float643001100.0018885
std_og_t2t_mou_6float643001100.0012777
loc_ic_t2f_mou_6float643001100.004817
isd_og_mou_9float643001100.00908
spl_og_mou_7float643001100.003399
spl_og_mou_8float643001100.003238
spl_og_mou_9float643001100.002966
og_others_6float643001100.00862
og_others_7float643001100.00123
og_others_8float643001100.00133
og_others_9float643001100.00132
total_og_mou_6float643001100.0024607
total_og_mou_7float643001100.0024913
total_og_mou_8float643001100.0023644
total_og_mou_9float643001100.0022615
loc_ic_t2t_mou_6float643001100.009872
loc_ic_t2t_mou_7float643001100.009961
loc_ic_t2t_mou_8float643001100.009671
loc_ic_t2t_mou_9float643001100.009407
loc_ic_t2m_mou_6float643001100.0016015
loc_ic_t2m_mou_7float643001100.0016068
loc_ic_t2m_mou_8float643001100.0015598
loc_ic_t2m_mou_9float643001100.0015194
spl_og_mou_6float643001100.003053
isd_og_mou_8float643001100.00940
std_og_t2t_mou_7float643001100.0012983
isd_og_mou_7float643001100.001125
std_og_t2t_mou_8float643001100.0011781
std_og_t2t_mou_9float643001100.0011141
std_og_t2m_mou_6float643001100.0014518
std_og_t2m_mou_7float643001100.0014589
std_og_t2m_mou_8float643001100.0013326
std_og_t2m_mou_9float643001100.0012445
std_og_t2f_mou_6float643001100.001773
std_og_t2f_mou_7float643001100.001714
std_og_t2f_mou_8float643001100.001627
std_og_t2f_mou_9float643001100.001595
std_og_t2c_mou_6category3001100.001
std_og_t2c_mou_7category3001100.001
std_og_t2c_mou_8category3001100.001
std_og_t2c_mou_9category3001100.001
std_og_mou_6float643001100.0018325
std_og_mou_7float643001100.0018445
std_og_mou_8float643001100.0016864
std_og_mou_9float643001100.0015900
isd_og_mou_6float643001100.001113
Average_rech_amt_6n7float643001100.003025
print(data[data['date_of_last_rech_6'].isnull()][['date_of_last_rech_6','total_rech_amt_6','total_rech_num_6']].nunique())
print(data[data['date_of_last_rech_7'].isnull()][['date_of_last_rech_7','total_rech_amt_7','total_rech_num_7']].nunique())
print(data[data['date_of_last_rech_8'].isnull()][['date_of_last_rech_8','total_rech_amt_8','total_rech_num_8']].nunique())
print(data[data['date_of_last_rech_9'].isnull()][['date_of_last_rech_9','total_rech_amt_9','total_rech_num_9']].nunique())

date_of_last_rech_6    0
total_rech_amt_6       1
total_rech_num_6       1
dtype: int64
date_of_last_rech_7    0
total_rech_amt_7       1
total_rech_num_7       1
dtype: int64
date_of_last_rech_8    0
total_rech_amt_8       1
total_rech_num_8       1
dtype: int64
date_of_last_rech_9    0
total_rech_amt_9       1
total_rech_num_9       1
dtype: int64

print("\n",data[data['date_of_last_rech_6'].isnull()][['total_rech_amt_6','total_rech_num_6']].head())
print("\n",data[data['date_of_last_rech_7'].isnull()][['total_rech_amt_7','total_rech_num_7']].head())
print("\n",data[data['date_of_last_rech_8'].isnull()][['total_rech_amt_8','total_rech_num_8']].head())
print("\n",data[data['date_of_last_rech_9'].isnull()][['total_rech_amt_9','total_rech_num_9']].head())

               total_rech_amt_6  total_rech_num_6
mobile_number
7001588448                    0                 0
7001223277                    0                 0
7000721536                    0                 0
7001490351                    0                 0
7000665415                    0                 0

               total_rech_amt_7  total_rech_num_7
mobile_number
7000369789                    0                 0
7001967148                    0                 0
7000066601                    0                 0
7001189556                    0                 0
7002024450                    0                 0

               total_rech_amt_8  total_rech_num_8
mobile_number
7000340381                    0                 0
7000608224                    0                 0
7000369789                    0                 0
7000248548                    0                 0
7001967063                    0                 0

               total_rech_amt_9  total_rech_num_9
mobile_number
7000340381                    0                 0
7000854899                    0                 0
7000369789                    0                 0
7001967063                    0                 0
7000066601                    0                 0
  • The ‘date_of_last_rech’ columns for June, July, August and September have no value only where the customer made no recharges during that month.

Dropping columns with one unique value.

metadata=metadata_matrix(data)
singular_value_cols=metadata[metadata['Unique_Values_Count']==1].index.values
#data.loc[metadata_matrix(data)['Unique_Values_Count']==1].index

#Dropping singular value columns.
data.drop(columns=singular_value_cols,inplace=True)
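A column with a single distinct value cannot help separate churners from non-churners, which is the reason for dropping it. The same step can be sketched with `DataFrame.nunique()` directly, without the `metadata_matrix` helper; the frame below is made up for illustration.

```python
import pandas as pd

# Toy data for illustration only (not from the telecom dataset)
toy = pd.DataFrame({
    'circle_id': [109, 109, 109],
    'arpu_6': [197.4, 34.0, 167.7],
    'std_og_t2c_mou_6': [0.0, 0.0, 0.0],
})

# nunique() ignores NaN by default; keep the names of single-valued columns
n_unique = toy.nunique()
constant_cols = n_unique[n_unique == 1].index
reduced = toy.drop(columns=constant_cols)

print(list(constant_cols))
print(list(reduced.columns))
```

Because `nunique()` skips missing values, a column holding one value plus some nulls is also treated as single-valued here.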
# Dropping date columns
# since they are not usage related columns and can't be used for modelling
date_columns = data.filter(regex='^date.*').columns
data.drop(columns=date_columns, inplace=True)
metadata_matrix(data)
DatatypeNon_Null_CountNull_CountNull_PercentageUnique_Values_Count
arpu_6float643001100.029261
total_ic_mou_6float643001100.020602
total_ic_mou_8float643001100.020096
total_ic_mou_9float643001100.019437
spl_ic_mou_6float643001100.078
spl_ic_mou_7float643001100.093
spl_ic_mou_8float643001100.085
spl_ic_mou_9float643001100.0287
isd_ic_mou_6float643001100.03429
isd_ic_mou_7float643001100.03639
isd_ic_mou_8float643001100.03493
isd_ic_mou_9float643001100.03329
ic_others_6float643001100.01227
ic_others_7float643001100.01371
ic_others_8float643001100.01259
ic_others_9float643001100.01284
total_rech_num_6int643001100.0102
total_rech_num_7int643001100.0101
total_rech_num_8int643001100.096
total_ic_mou_7float643001100.020711
std_ic_mou_9float643001100.07745
total_rech_amt_6int643001100.02241
std_ic_mou_8float643001100.08033
loc_ic_mou_7float643001100.019030
loc_ic_mou_8float643001100.018573
loc_ic_mou_9float643001100.018018
std_ic_t2t_mou_6float643001100.04608
std_ic_t2t_mou_7float643001100.04706
std_ic_t2t_mou_8float643001100.04486
std_ic_t2t_mou_9float643001100.04280
std_ic_t2m_mou_6float643001100.06680
std_ic_t2m_mou_7float643001100.06747
std_ic_t2m_mou_8float643001100.06420
std_ic_t2m_mou_9float643001100.06168
std_ic_t2f_mou_6float643001100.02033
std_ic_t2f_mou_7float643001100.02075
std_ic_t2f_mou_8float643001100.01941
std_ic_t2f_mou_9float643001100.01971
std_ic_mou_6float643001100.08391
std_ic_mou_7float643001100.08543
total_rech_num_9int643001100.096
total_rech_amt_7int643001100.02265
arpu_7float643001100.029260
monthly_2g_8category3001100.06
sachet_2g_6category3001100.030
sachet_2g_7category3001100.034
sachet_2g_8category3001100.034
sachet_2g_9category3001100.029
monthly_3g_6category3001100.012
monthly_3g_7category3001100.015
monthly_3g_8category3001100.012
monthly_3g_9category3001100.011
sachet_3g_6category3001100.025
sachet_3g_7category3001100.027
sachet_3g_8category3001100.029
sachet_3g_9category3001100.027
aonint643001100.03321
vbc_3g_8float643001100.07291
vbc_3g_7float643001100.07318
vbc_3g_6float643001100.06864
vbc_3g_9float643001100.02171
monthly_2g_9category3001100.05
monthly_2g_7category3001100.06
total_rech_amt_8int643001100.02299
monthly_2g_6category3001100.05
total_rech_amt_9int643001100.02248
max_rech_amt_6int643001100.0170
max_rech_amt_7int643001100.0151
max_rech_amt_8int643001100.0182
max_rech_amt_9int643001100.0186
last_day_rch_amt_6int643001100.0158
last_day_rch_amt_7int643001100.0149
last_day_rch_amt_8int643001100.0179
last_day_rch_amt_9int643001100.0170
vol_2g_mb_6float643001100.07809
vol_2g_mb_7float643001100.07813
vol_2g_mb_8float643001100.07310
vol_2g_mb_9float643001100.06984
vol_3g_mb_6float643001100.07043
vol_3g_mb_7float643001100.07440
vol_3g_mb_8float643001100.07151
vol_3g_mb_9float643001100.07016
loc_ic_mou_6float643001100.019133
loc_ic_t2f_mou_9float643001100.04611
loc_ic_t2f_mou_8float643001100.04705
loc_og_t2t_mou_7float643001100.011154
loc_og_t2t_mou_9float643001100.010360
loc_og_t2m_mou_6float643001100.016747
loc_og_t2m_mou_7float643001100.016872
loc_og_t2m_mou_8float643001100.016165
loc_og_t2m_mou_9float643001100.015585
loc_og_t2f_mou_6float643001100.03252
loc_og_t2f_mou_7float643001100.03267
loc_og_t2f_mou_8float643001100.03124
loc_og_t2f_mou_9float643001100.03111
loc_og_t2c_mou_6float643001100.01658
loc_og_t2c_mou_7float643001100.01750
loc_og_t2c_mou_8float643001100.01730
loc_og_t2c_mou_9float643001100.01576
loc_og_mou_6float643001100.019691
loc_og_mou_7float643001100.019880
loc_og_mou_8float643001100.018885
loc_og_mou_9float643001100.018207
loc_og_t2t_mou_8float643001100.010772
loc_og_t2t_mou_6float643001100.011151
loc_ic_t2f_mou_7float643001100.04897
roam_og_mou_9float643001100.04004
arpu_8float643001100.028405
arpu_9float643001100.027327
onnet_mou_6float643001100.018813
onnet_mou_7float643001100.018938
onnet_mou_8float643001100.017604
onnet_mou_9float643001100.016674
offnet_mou_6float643001100.022454
offnet_mou_7float643001100.022650
offnet_mou_8float643001100.021513
offnet_mou_9float643001100.020452
roam_ic_mou_6float643001100.04338
roam_ic_mou_7float643001100.03649
roam_ic_mou_8float643001100.03655
roam_ic_mou_9float643001100.03370
roam_og_mou_6float643001100.05174
roam_og_mou_7float643001100.04431
roam_og_mou_8float643001100.04382
std_og_t2t_mou_6float643001100.012777
std_og_t2t_mou_7float643001100.012983
std_og_t2t_mou_8float643001100.011781
std_og_t2t_mou_9float643001100.011141
og_others_6float643001100.0862
og_others_7float643001100.0123
og_others_8float643001100.0133
og_others_9float643001100.0132
total_og_mou_6float643001100.024607
total_og_mou_7float643001100.024913
total_og_mou_8float643001100.023644
total_og_mou_9float643001100.022615
loc_ic_t2t_mou_6float643001100.09872
loc_ic_t2t_mou_7float643001100.09961
loc_ic_t2t_mou_8float643001100.09671
loc_ic_t2t_mou_9float643001100.09407
loc_ic_t2m_mou_6float643001100.016015
loc_ic_t2m_mou_7float643001100.016068
loc_ic_t2m_mou_8float643001100.015598
loc_ic_t2m_mou_9float643001100.015194
loc_ic_t2f_mou_6float643001100.04817
spl_og_mou_9float643001100.02966
spl_og_mou_8float643001100.03238
spl_og_mou_7float643001100.03399
std_og_t2f_mou_9float643001100.01595
std_og_t2m_mou_6float643001100.014518
std_og_t2m_mou_7float643001100.014589
std_og_t2m_mou_8float643001100.013326
std_og_t2m_mou_9float643001100.012445
std_og_t2f_mou_6float643001100.01773
std_og_t2f_mou_7float643001100.01714
std_og_t2f_mou_8float643001100.01627
std_og_mou_6float643001100.018325
spl_og_mou_6float643001100.03053
std_og_mou_7float643001100.018445
std_og_mou_8float643001100.016864
std_og_mou_9float643001100.015900
isd_og_mou_6float643001100.01113
isd_og_mou_7float643001100.01125
isd_og_mou_8float643001100.0940
isd_og_mou_9float643001100.0908
Average_rech_amt_6n7float643001100.03025

Tagging Churn (TARGET variable)

data['Churn'] = 0
churned_customers = data.query('total_og_mou_9 == 0 & total_ic_mou_9 == 0 & vol_2g_mb_9 == 0 & vol_3g_mb_9 == 0').index
data.loc[churned_customers,'Churn']=1
data['Churn'] = data['Churn'].astype('category')

# Churn proportions
data['Churn'].value_counts(normalize=True).to_frame()

      Churn
0  0.913598
1  0.086402
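The tagging rule above marks a customer as churned only when all four month-9 usage columns are zero. An equivalent vectorised form is sketched below on a made-up frame; this is an illustrative alternative to the `query` call, not the code used in the analysis.

```python
import pandas as pd

usage_cols = ['total_og_mou_9', 'total_ic_mou_9', 'vol_2g_mb_9', 'vol_3g_mb_9']

# Toy data for illustration only (not from the telecom dataset)
toy = pd.DataFrame({
    'total_og_mou_9': [0.0, 12.3, 0.0],
    'total_ic_mou_9': [0.0, 5.0, 0.0],
    'vol_2g_mb_9': [0.0, 0.0, 1.2],
    'vol_3g_mb_9': [0.0, 0.0, 0.0],
})

# A customer is tagged as churned only if every usage column is exactly zero
toy['Churn'] = toy[usage_cols].eq(0).all(axis=1).astype(int)

print(toy['Churn'].tolist())
```

The third toy customer has no calls but some 2G data usage, so it is not tagged as churned.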

Dropping Churn Phase Columns

churn_phase_columns = data.filter(regex='9$').columns
data.drop(columns=churn_phase_columns, inplace=True)
print('Retained Columns')
data.columns.to_frame(index=False)

Retained Columns

     0
0    arpu_6
1    arpu_7
2    arpu_8
3    onnet_mou_6
4    onnet_mou_7
5    onnet_mou_8
6    offnet_mou_6
7    offnet_mou_7
8    offnet_mou_8
9    roam_ic_mou_6
10   roam_ic_mou_7
11   roam_ic_mou_8
12   roam_og_mou_6
13   roam_og_mou_7
14   roam_og_mou_8
15   loc_og_t2t_mou_6
16   loc_og_t2t_mou_7
17   loc_og_t2t_mou_8
18   loc_og_t2m_mou_6
19   loc_og_t2m_mou_7
20   loc_og_t2m_mou_8
21   loc_og_t2f_mou_6
22   loc_og_t2f_mou_7
23   loc_og_t2f_mou_8
24   loc_og_t2c_mou_6
25   loc_og_t2c_mou_7
26   loc_og_t2c_mou_8
27   loc_og_mou_6
28   loc_og_mou_7
29   loc_og_mou_8
30   std_og_t2t_mou_6
31   std_og_t2t_mou_7
32   std_og_t2t_mou_8
33   std_og_t2m_mou_6
34   std_og_t2m_mou_7
35   std_og_t2m_mou_8
36   std_og_t2f_mou_6
37   std_og_t2f_mou_7
38   std_og_t2f_mou_8
39   std_og_mou_6
40   std_og_mou_7
41   std_og_mou_8
42   isd_og_mou_6
43   isd_og_mou_7
44   isd_og_mou_8
45   spl_og_mou_6
46   spl_og_mou_7
47   spl_og_mou_8
48   og_others_6
49   og_others_7
50   og_others_8
51   total_og_mou_6
52   total_og_mou_7
53   total_og_mou_8
54   loc_ic_t2t_mou_6
55   loc_ic_t2t_mou_7
56   loc_ic_t2t_mou_8
57   loc_ic_t2m_mou_6
58   loc_ic_t2m_mou_7
59   loc_ic_t2m_mou_8
60   loc_ic_t2f_mou_6
61   loc_ic_t2f_mou_7
62   loc_ic_t2f_mou_8
63   loc_ic_mou_6
64   loc_ic_mou_7
65   loc_ic_mou_8
66   std_ic_t2t_mou_6
67   std_ic_t2t_mou_7
68   std_ic_t2t_mou_8
69   std_ic_t2m_mou_6
70   std_ic_t2m_mou_7
71   std_ic_t2m_mou_8
72   std_ic_t2f_mou_6
73   std_ic_t2f_mou_7
74   std_ic_t2f_mou_8
75   std_ic_mou_6
76   std_ic_mou_7
77   std_ic_mou_8
78   total_ic_mou_6
79   total_ic_mou_7
80   total_ic_mou_8
81   spl_ic_mou_6
82   spl_ic_mou_7
83   spl_ic_mou_8
84   isd_ic_mou_6
85   isd_ic_mou_7
86   isd_ic_mou_8
87   ic_others_6
88   ic_others_7
89   ic_others_8
90   total_rech_num_6
91   total_rech_num_7
92   total_rech_num_8
93   total_rech_amt_6
94   total_rech_amt_7
95   total_rech_amt_8
96   max_rech_amt_6
97   max_rech_amt_7
98   max_rech_amt_8
99   last_day_rch_amt_6
100  last_day_rch_amt_7
101  last_day_rch_amt_8
102  vol_2g_mb_6
103  vol_2g_mb_7
104  vol_2g_mb_8
105  vol_3g_mb_6
106  vol_3g_mb_7
107  vol_3g_mb_8
108  monthly_2g_6
109  monthly_2g_7
110  monthly_2g_8
111  sachet_2g_6
112  sachet_2g_7
113  sachet_2g_8
114  monthly_3g_6
115  monthly_3g_7
116  monthly_3g_8
117  sachet_3g_6
118  sachet_3g_7
119  sachet_3g_8
120  aon
121  vbc_3g_8
122  vbc_3g_7
123  vbc_3g_6
124  Average_rech_amt_6n7
125  Churn
print('retained no of rows', data.shape[0])
print('retain no of columns', data.shape[1])

retained no of rows 30011
retain no of columns 126
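The Churn label is derived from the month-9 usage columns, so keeping any month-9 column as a feature would leak the target into the model. The sketch below shows the column selection on a made-up set of column names; the stricter pattern `_9$` is an illustrative variation and not the pattern used in the analysis, which is `9$`.

```python
import pandas as pd

# Empty toy frame with a few illustrative column names
toy = pd.DataFrame(columns=[
    'arpu_8', 'arpu_9', 'total_og_mou_9', 'aon', 'Average_rech_amt_6n7', 'Churn',
])

# Select columns whose name ends with the month-9 suffix
churn_phase_cols = toy.filter(regex='_9$').columns
kept = toy.drop(columns=churn_phase_cols)

print(list(churn_phase_cols))
print(list(kept.columns))
```

Anchoring on the underscore guards against accidentally matching a future column whose name merely ends in the digit 9.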

Exploratory Data Analysis

Summary Statistics

data.describe()
arpu_6arpu_7arpu_8onnet_mou_6onnet_mou_7onnet_mou_8offnet_mou_6offnet_mou_7offnet_mou_8roam_ic_mou_6roam_ic_mou_7roam_ic_mou_8roam_og_mou_6roam_og_mou_7roam_og_mou_8loc_og_t2t_mou_6loc_og_t2t_mou_7loc_og_t2t_mou_8loc_og_t2m_mou_6loc_og_t2m_mou_7loc_og_t2m_mou_8loc_og_t2f_mou_6loc_og_t2f_mou_7loc_og_t2f_mou_8loc_og_t2c_mou_6loc_og_t2c_mou_7loc_og_t2c_mou_8loc_og_mou_6loc_og_mou_7loc_og_mou_8std_og_t2t_mou_6std_og_t2t_mou_7std_og_t2t_mou_8std_og_t2m_mou_6std_og_t2m_mou_7std_og_t2m_mou_8std_og_t2f_mou_6std_og_t2f_mou_7std_og_t2f_mou_8std_og_mou_6std_og_mou_7std_og_mou_8isd_og_mou_6isd_og_mou_7isd_og_mou_8spl_og_mou_6spl_og_mou_7spl_og_mou_8og_others_6og_others_7og_others_8total_og_mou_6total_og_mou_7total_og_mou_8loc_ic_t2t_mou_6loc_ic_t2t_mou_7loc_ic_t2t_mou_8loc_ic_t2m_mou_6loc_ic_t2m_mou_7loc_ic_t2m_mou_8loc_ic_t2f_mou_6loc_ic_t2f_mou_7loc_ic_t2f_mou_8loc_ic_mou_6loc_ic_mou_7loc_ic_mou_8std_ic_t2t_mou_6std_ic_t2t_mou_7std_ic_t2t_mou_8std_ic_t2m_mou_6std_ic_t2m_mou_7std_ic_t2m_mou_8std_ic_t2f_mou_6std_ic_t2f_mou_7std_ic_t2f_mou_8std_ic_mou_6std_ic_mou_7std_ic_mou_8total_ic_mou_6total_ic_mou_7total_ic_mou_8spl_ic_mou_6spl_ic_mou_7spl_ic_mou_8isd_ic_mou_6isd_ic_mou_7isd_ic_mou_8ic_others_6ic_others_7ic_others_8total_rech_num_6total_rech_num_7total_rech_num_8total_rech_amt_6total_rech_amt_7total_rech_amt_8max_rech_amt_6max_rech_amt_7max_rech_amt_8last_day_rch_amt_6last_day_rch_amt_7last_day_rch_amt_8vol_2g_mb_6vol_2g_mb_7vol_2g_mb_8vol_3g_mb_6vol_3g_mb_7vol_3g_mb_8aonvbc_3g_8vbc_3g_7vbc_3g_6Average_rech_amt_6n7
count30011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.0000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.0000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.0000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.00000030011.000000
mean587.284404589.135427534.857433296.034461304.343206267.600412417.933372423.924375375.02169117.41276413.52211413.2562729.32164822.03600321.46927294.68069695.72972987.139995181.279583181.271524167.5911996.979337.0972686.4943141.5671601.8622291.712739282.948414284.107492261.233938189.753131199.877508172.196408203.097767213.411914179.5687902.0107662.0342411.789728394.865994415.327988353.5588262.2644252.2074002.0293145.9163647.4254876.8851930.6925070.0476000.059131686.697541709.124730623.77468468.74905470.31135165.936968159.613810160.813032153.62851715.59562916.51002314.706512243.968340247.644401234.28157716.22935016.89372315.05155932.01516333.47715030.4347652.8745062.9929482.68092551.12299253.3678648.170990307.512073314.875472295.4265310.0667310.0180660.02766011.15653012.36019011.7008351.1888031.4768891.23775612.12132211.91346510.225317697.365833695.962880613.638799171.414048175.661058162.869348104.485655105.28712895.65329478.85900978.17138269.209105258.392681278.093737269.8641111264.064776129.439626135.127102121.360548696.664356
std442.722413462.897814492.259586460.775592481.780488466.560947470.588583486.525332477.48937779.15265776.30373674.55207118.57041497.925249106.244774236.849265248.132623234.721938250.132066240.722132234.86246822.6655222.58886420.2200286.8893179.2556457.397562379.985249375.837282366.539171409.716719428.119476410.033964413.489240437.941904416.75283412.45742213.35044111.700376606.508681637.446710616.21969045.91808745.61938144.79492618.62137323.06574322.8934142.2813252.7417863.320320660.356820685.071178685.983313158.647160167.315954155.702334222.001036219.432004217.02634945.82700949.47837143.714061312.805586315.468343307.04380078.86235884.69140372.433104101.084965105.806605105.30889819.92847220.51131720.269535140.504104149.17944140.965196361.159561369.654489360.3431530.1942730.1819440.11657467.25838776.99229374.92860713.98700315.40648312.8898799.5435509.6055329.478572539.325984562.143146601.821630174.703215181.545389172.605809142.767207141.148386145.260363277.445058280.331857268.494284866.195376855.682340859.299266975.263117390.478591408.024394389.726031488.782088
min-2258.709000-2014.045000-945.8080000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000180.0000000.0000000.0000000.000000368.500000
25%364.161000365.004500289.60950041.11000040.95000027.010000137.335000135.68000095.6950000.0000000.0000000.000000.0000000.0000000.0000008.3200009.1300005.79000030.29000033.58000022.4200000.000000.0000000.0000000.0000000.0000000.00000051.01000056.71000038.2700000.0000000.0000000.0000001.6000001.3300000.0000000.0000000.0000000.0000005.9500005.5550001.7800000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000266.170000275.045000188.7900008.2900009.4600006.81000033.46000038.13000029.6600000.0000000.0000000.00000056.70000063.53500049.9850000.0000000.0000000.0000000.4500000.4800000.0000000.0000000.0000000.0000002.6300002.780001.43000089.97500098.82000078.9300000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000006.0000006.0000004.000000432.000000426.500000309.000000110.000000110.00000067.00000030.00000027.0000000.0000000.0000000.0000000.0000000.0000000.0000000.000000480.0000000.0000000.0000000.000000450.000000
50%495.682000493.561000452.091000125.830000125.46000099.440000282.190000281.940000240.9400000.0000000.0000000.000000.0000000.0000000.00000032.59000033.16000028.640000101.240000104.34000089.8100000.330000.4000000.1600000.0000000.0000000.000000166.310000170.440000148.28000012.83000013.3500005.93000037.73000037.53000023.6600000.0000000.0000000.000000126.010000131.73000072.8900000.0000000.0000000.0000000.2100000.7800000.4900000.0000000.0000000.000000510.230000525.580000435.33000029.13000030.13000026.84000093.94000096.83000089.8100001.9600002.2100001.850000151.060000154.830000142.8400001.0500001.2000000.5600007.0800007.4600005.7100000.0000000.0000000.00000015.03000016.1100012.560000205.240000211.190000193.4400000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000000.0000009.0000009.0000008.000000584.000000581.000000520.000000120.000000128.000000130.000000110.00000098.00000050.0000000.0000000.0000000.0000000.0000000.0000000.000000914.0000000.0000000.0000000.000000568.500000
75%703.922000700.788000671.150000353.310000359.925000297.735000523.125000532.695000482.6100000.0000000.0000000.000000.0000000.0000000.00000091.46000091.48000084.670000240.165000239.485000223.5900005.090005.2600004.6800000.0000000.1000000.050000374.475000375.780000348.310000178.085000191.380000132.820000211.210000223.010000164.7250000.0000000.0000000.000000573.090000615.150000481.0300000.0000000.0000000.0000005.1600007.1100006.3800000.0000000.0000000.000000899.505000931.050000833.10000073.64000074.68000070.330000202.830000203.485000196.97500012.44000013.03500011.605000315.500000316.780000302.11000010.28000010.9800008.86000027.54000029.23500025.3300000.1800000.2600000.13000047.54000050.3600043.410000393.680000396.820000380.4100000.0000000.0000000.0000000.0000000.0000000.0000000.0600000.0000000.06000015.00000015.00000013.000000837.000000835.000000790.000000200.000000200.000000198.000000120.000000130.000000130.00000014.45000014.9600009.6200000.0000002.0800000.0000001924.0000001.6000001.9900000.000000795.500000
max27731.08800035145.83400033543.6240007376.7100008157.78000010752.5600008362.3600009667.13000014007.3400002613.3100003813.2900004169.810003775.1100002812.0400005337.0400006431.3300007400.66000010752.5600004729.7400004557.1400004961.3300001466.030001196.430000928.490000342.860000569.710000351.83000010643.3800007674.78000011039.9100007366.5800008133.6600008014.4300008314.7600009284.74000013950.040000628.560000544.630000516.9100008432.99000010936.73000013980.0600005900.6600005490.2800005681.5400001023.2100001265.7900001390.880000100.610000370.130000394.93000010674.03000011365.31000014043.0600006351.4400005709.5900004003.2100004693.8600004388.7300005738.4600001678.4100001983.0100001588.5300006496.1100006466.7400005748.8100005459.5600005800.9300004309.2900004630.2300003470.3800005645.8600001351.1100001136.0800001394.8900005459.6300006745.760005957.1400006798.6400007279.0800005990.71000019.76000021.3300006.2300003965.6900004747.9100004100.3800001344.1400001495.9400001209.860000307.000000138.000000196.00000035190.00000040335.00000045320.0000004010.0000004010.0000004449.0000004010.0000004010.0000004449.00000010285.9000007873.55000011117.61000045735.40000028144.12000030036.0600004321.00000012916.2200009165.60000011166.21000037762.500000
  • The minimum arpu is negative in months 6, 7 and 8, so the telecom company has users with negative average revenue in both phases. These users are likely to churn.
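To put a number on the users with negative average revenue, one could count the rows whose arpu is below zero in each month. The sketch below is only an illustration: `toy` is a small made-up frame standing in for `data`, and only the column names are taken from the analysis.

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    'arpu_6': [350.0, -12.5, 610.2, 0.0],
    'arpu_7': [365.1, -3.0, -40.0, 120.0],
    'arpu_8': [289.6, 15.0, 500.0, -1.0],
})

arpu_columns = ['arpu_6', 'arpu_7', 'arpu_8']

# Number of customers with negative arpu in each month
negative_counts = (toy[arpu_columns] < 0).sum()

# Number of customers with negative arpu in at least one month
any_negative = int((toy[arpu_columns] < 0).any(axis=1).sum())

print(negative_counts)
print('customers with any negative arpu:', any_negative)
```

Run against the real `data`, the same two expressions would show whether negative revenue is a handful of outliers or a sizeable group.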
categorical_columns = data.dtypes[data.dtypes == 'category'].index.values
print('Mode : ')
data[categorical_columns].mode().T

Mode :
              0
monthly_2g_6  0
monthly_2g_7  0
monthly_2g_8  0
sachet_2g_6   0
sachet_2g_7   0
sachet_2g_8   0
monthly_3g_6  0
monthly_3g_7  0
monthly_3g_8  0
sachet_3g_6   0
sachet_3g_7   0
sachet_3g_8   0
Churn         0
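  • The most frequent value in every categorical column is ‘0’, so most customers fall in the ‘0’ category of each monthly and sachet plan column, and most customers did not churn (Churn = 0).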

Univariate Analysis

churned_customers = data[data['Churn'] == 1]
non_churned_customers = data[data['Churn'] == 0]

Age on Network

plt.figure(figsize=(12,8))
sns.violinplot(x='aon', y='Churn', data=data)
plt.title('Age on Network vs Churn')
plt.show()

[Figure: violin plot of Age on Network vs Churn]

  • Customers with a lower ‘aon’ (age on network) are more likely to churn than customers with a higher ‘aon’.
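The violin plot can be backed up with numbers by comparing the quartiles of ‘aon’ in the two groups. A minimal sketch, assuming a frame with ‘aon’ and ‘Churn’ columns like `data`; the `toy` values below are invented and do not come from the dataset:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    'aon':   [200, 350, 400, 900, 1500, 2200, 3100, 450],
    'Churn': [1,   1,   0,   0,   0,    0,    0,    1],
})

# Quartiles of age on network for each churn group
aon_by_churn = toy.groupby('Churn')['aon'].describe()[['25%', '50%', '75%']]
print(aon_by_churn)
```

If the median ‘aon’ of the churned group in the real data is clearly lower than that of the retained group, it supports the reading of the plot.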
# function for numerical variable univariate analysis
from tabulate import tabulate

def num_univariate_analysis(column_names, scale='linear'):
    # violin plot of each of the first three columns vs the target
    fig = plt.figure(figsize=(16, 8))
    for i, column in enumerate(column_names[:3], start=1):
        ax = fig.add_subplot(1, 3, i)
        sns.violinplot(x='Churn', y=column, data=data, ax=ax)
        ax.set(title=column + ' vs Churn')
        if scale == 'log':
            ax.set_yscale('log')
            ax.set(ylabel=column + ' (Log Scale)')

    # summary statistics for each group
    print('Customers who churned (Churn : 1)')
    print(churned_customers[column_names].describe())

    print('\nCustomers who did not churn (Churn : 0)')
    print(non_churned_customers[column_names].describe(), '\n')
# function for categorical variable univariate analysis
!pip install sidetable
import sidetable

def cat_univariate_analysis(column_names, figsize=(16, 4)):
    # count plot of each of the first three columns, split by the target
    fig = plt.figure(figsize=figsize)
    for i, column in enumerate(column_names[:3], start=1):
        ax = fig.add_subplot(1, 3, i)
        sns.countplot(x=column, hue='Churn', data=data, ax=ax)
        ax.set(title=column + ' vs No of Churned Customers')
        ax.legend(loc='upper right')

    # frequency tables (counts and fractions) for each group
    print('Customers who churned (Churn : 1)')
    for column in column_names[:3]:
        print(tabulate(churned_customers.stb.freq([column]), headers='keys', tablefmt='psql'), '\n')

    print('\nCustomers who did not churn (Churn : 0)')
    for column in column_names[:3]:
        print(tabulate(non_churned_customers.stb.freq([column]), headers='keys', tablefmt='psql'), '\n')

arpu_6, arpu_7, arpu_8

columns = ['arpu_6','arpu_7','arpu_8']
num_univariate_analysis(columns,'log')

Customers who churned (Churn : 1)
             arpu_6        arpu_7       arpu_8
count   2593.000000   2593.000000  2593.000000
mean     678.716970    550.511946   243.063343
std      551.792864    517.241221   378.843531
min     -209.465000   -158.963000   -37.887000
25%      396.507000    289.641000     0.000000
50%      573.396000    464.674000   101.894000
75%      819.460000    691.588000   351.028000
max    11505.508000  13224.119000  5228.826000

Customers who did not churn (Churn : 0)
             arpu_6        arpu_7        arpu_8
count  27418.000000  27418.000000  27418.000000
mean     578.637360    592.788162    562.453248
std      429.988265    457.265996    492.802655
min    -2258.709000  -2014.045000   -945.808000
25%      362.218000    369.610500    319.118500
50%      489.324000    496.182500    471.024000
75%      690.891750    701.418000    690.921000
max    27731.088000  35145.834000  33543.624000

[Figure: violin plots of arpu_6, arpu_7 and arpu_8 vs Churn (log scale)]

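  • The plots and tables above show that revenue from customers who go on to churn is unstable: their mean arpu falls from 678.72 in month 6 to 550.51 in month 7 and 243.06 in month 8, while it stays between roughly 560 and 595 for customers who do not churn.
  • Customers whose arpu decreases in the 7th month are more likely to churn than those whose arpu increases.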
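The second observation can be checked directly by comparing churn rates between customers whose arpu fell from month 6 to month 7 and those whose arpu did not. A minimal sketch, assuming a frame with the same column names as `data`; the `toy` values and the helper columns `arpu_change_6_to_7` and `arpu_trend` are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    'arpu_6': [500.0, 700.0, 400.0, 650.0, 300.0, 800.0],
    'arpu_7': [520.0, 450.0, 410.0, 300.0, 350.0, 790.0],
    'Churn':  [0,     1,     0,     1,     0,     0],
})

# Month-over-month change in arpu between months 6 and 7
toy['arpu_change_6_to_7'] = toy['arpu_7'] - toy['arpu_6']
toy['arpu_trend'] = np.where(toy['arpu_change_6_to_7'] < 0, 'fell', 'did not fall')

# Share of churners within each trend group
churn_rate_by_trend = toy.groupby('arpu_trend')['Churn'].mean()
print(churn_rate_by_trend)
```

In the real data ‘Churn’ is stored as a category, so it would need to be cast to an integer (for example with `astype(int)`) before taking the mean.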

total_og_mou_6, total_og_mou_7, total_og_mou_8

columns = ['total_og_mou_6', 'total_og_mou_7', 'total_og_mou_8']
num_univariate_analysis(columns)

Customers who churned (Churn : 1)
       total_og_mou_6  total_og_mou_7  total_og_mou_8
count     2593.000000     2593.000000     2593.000000
mean       867.961342      677.868909      225.083741
std        852.697688      786.961399      471.672718
min          0.000000        0.000000        0.000000
25%        277.880000      110.090000        0.000000
50%        658.360000      466.910000        0.000000
75%       1209.040000      926.760000      255.810000
max       8488.360000     8285.640000     5206.210000

Customers who did not churn (Churn : 0)
       total_og_mou_6  total_og_mou_7  total_og_mou_8
count    27418.000000    27418.000000    27418.000000
mean       669.554896      712.080684      661.480046
std        636.531612      674.580516      691.079113
min          0.000000        0.000000        0.000000
25%        265.682500      284.500000      227.970000
50%        500.410000      529.935000      470.475000
75%        872.070000      931.197500      866.045000
max      10674.030000    11365.310000    14043.060000

[Figure: violin plots of total_og_mou_6, total_og_mou_7 and total_og_mou_8 vs Churn]

  • Customers with a high total_og_mou in the 6th month and a lower total_og_mou in the 7th month are more likely to churn than the rest.

total_ic_mou_6, total_ic_mou_7, total_ic_mou_8

columns = ['total_ic_mou_6', 'total_ic_mou_7', 'total_ic_mou_8']
num_univariate_analysis(columns)

Customers who churned (Churn : 1)
       total_ic_mou_6  total_ic_mou_7  total_ic_mou_8
count     2593.000000     2593.000000     2593.000000
mean       241.954404      193.341076       68.807042
std        360.836586      318.183813      154.450340
min          0.000000        0.000000        0.000000
25%         49.460000       27.890000        0.000000
50%        137.330000       99.980000        0.000000
75%        289.510000      235.740000       70.290000
max       6633.180000     5137.560000     1859.280000

Customers who did not churn (Churn : 0)
       total_ic_mou_6  total_ic_mou_7  total_ic_mou_8
count    27418.000000    27418.000000    27418.000000
mean       313.712052      326.369333      316.858595
std        360.580253      372.112086      366.818717
min          0.000000        0.000000        0.000000
25%         94.460000      107.802500       98.265000
50%        212.160000      222.290000      212.360000
75%        401.602500      410.182500      402.270000
max       6798.640000     7279.080000     5990.710000

[Figure: violin plots of total_ic_mou_6, total_ic_mou_7 and total_ic_mou_8 vs Churn]

  • Customers whose total_ic_mou decreases in the 7th month are more likely to churn than the rest.

vol_2g_mb_6, vol_2g_mb_7, vol_2g_mb_8

columns = ['vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8']
num_univariate_analysis(columns, 'log')

Customers who churned (Churn : 1)
       vol_2g_mb_6  vol_2g_mb_7  vol_2g_mb_8
count  2593.000000  2593.000000  2593.000000
mean     60.775588    49.054393    15.283185
std     243.084276   219.485813   120.975111
min       0.000000     0.000000     0.000000
25%       0.000000     0.000000     0.000000
50%       0.000000     0.000000     0.000000
75%       0.000000     0.000000     0.000000
max    4017.160000  3430.730000  3349.190000

Customers who did not churn (Churn : 0)
        vol_2g_mb_6   vol_2g_mb_7   vol_2g_mb_8
count  27418.000000  27418.000000  27418.000000
mean      80.569210     80.925060     74.309036
std      280.420463    285.265125    277.889339
min        0.000000      0.000000      0.000000
25%        0.000000      0.000000      0.000000
50%        0.000000      0.000000      0.000000
75%       16.937500     18.267500     14.245000
max    10285.900000   7873.550000  11117.610000

[Figure: violin plots of vol_2g_mb_6, vol_2g_mb_7 and vol_2g_mb_8 vs Churn (log scale)]

  • Customers with stable 2g data usage across months 6 and 7 are less likely to churn.
  • Customers whose 2g data consumption falls in the 7th month are more likely to churn.
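Because most quartiles of the 2g volumes are zero in both groups, the summary tables say little about the typical customer. One way to make the comparison more concrete is to look at the share of customers with any 2g usage per month in each group. The sketch below assumes the same column names as `data`; the `toy` values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    'vol_2g_mb_6': [0.0, 120.5, 0.0, 30.0, 0.0, 15.2],
    'vol_2g_mb_7': [0.0, 0.0,   0.0, 45.0, 0.0, 18.0],
    'vol_2g_mb_8': [0.0, 0.0,   0.0, 40.0, 0.0, 14.0],
    'Churn':       [1,   1,     0,   0,    0,   0],
})

volume_columns = ['vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8']

# 1 where the customer used any 2g data in that month, 0 otherwise
used_2g = (toy[volume_columns] > 0).astype(int)

# Share of customers with 2g usage, per month and churn group
share_using_2g = used_2g.groupby(toy['Churn']).mean()
print(share_using_2g)
```

A share that drops from month to month only in the churned group would point to the same fall in consumption that the bullets describe.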

vol_3g_mb_6, vol_3g_mb_7, vol_3g_mb_8

# 'monthly_3g_6' is categorical; it is neither plotted nor summarised here, so it is left out
columns = ['vol_3g_mb_6', 'vol_3g_mb_7', 'vol_3g_mb_8']
num_univariate_analysis(columns, 'log')

Customers who churned (Churn : 1)
       vol_3g_mb_6   vol_3g_mb_7   vol_3g_mb_8
count  2593.000000   2593.000000   2593.000000
mean    188.395461    157.714254     56.776880
std     715.327843    690.773561    446.532769
min       0.000000      0.000000      0.000000
25%       0.000000      0.000000      0.000000
50%       0.000000      0.000000      0.000000
75%       0.000000      0.000000      0.000000
max    9400.120000  15115.510000  13440.720000

Customers who did not churn (Churn : 0)
        vol_3g_mb_6   vol_3g_mb_7   vol_3g_mb_8
count  27418.000000  27418.000000  27418.000000
mean     265.012522    289.478375    290.016390
std      878.846885    868.808831    885.821105
min        0.000000      0.000000      0.000000
25%        0.000000      0.000000      0.000000
50%        0.000000      0.000000      0.000000
75%        0.000000     35.855000     27.120000
max    45735.400000  28144.120000  30036.060000

[Figure: violin plots of vol_3g_mb_6, vol_3g_mb_7 and vol_3g_mb_8 vs Churn (log scale)]

  • Customers with stable 3g data usage across months 6 and 7 are less likely to churn.
  • Customers whose 3g data consumption falls in the 7th month are more likely to churn.

monthly_2g_6, monthly_2g_7, monthly_2g_8

columns = ['monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8']
cat_univariate_analysis(columns)

Customers who churned (Churn : 1)
+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_2g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |    2454 | 0.946394    |               2454 |             0.946394 |
|  1 |              1 |     126 | 0.0485924   |               2580 |             0.994987 |
|  2 |              2 |      11 | 0.00424219  |               2591 |             0.999229 |
|  3 |              4 |       2 | 0.000771307 |               2593 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+------------+--------------------+----------------------+
|    |   monthly_2g_7 |   Count |    Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+------------+--------------------+----------------------|
|  0 |              0 |    2477 | 0.955264   |               2477 |             0.955264 |
|  1 |              1 |     104 | 0.040108   |               2581 |             0.995372 |
|  2 |              2 |      12 | 0.00462784 |               2593 |             1        |
+----+----------------+---------+------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_2g_8 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |    2555 | 0.985345    |               2555 |             0.985345 |
|  1 |              1 |      37 | 0.0142692   |               2592 |             0.999614 |
|  2 |              2 |       1 | 0.000385654 |               2593 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+


Customers who did not churn (Churn : 0)
+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_2g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   24228 | 0.883653    |              24228 |             0.883653 |
|  1 |              1 |    2825 | 0.103035    |              27053 |             0.986688 |
|  2 |              2 |     334 | 0.0121818   |              27387 |             0.998869 |
|  3 |              3 |      26 | 0.000948282 |              27413 |             0.999818 |
|  4 |              4 |       5 | 0.000182362 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_2g_7 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   24079 | 0.878219    |              24079 |             0.878219 |
|  1 |              1 |    2909 | 0.106098    |              26988 |             0.984317 |
|  2 |              2 |     394 | 0.0143701   |              27382 |             0.998687 |
|  3 |              3 |      29 | 0.0010577   |              27411 |             0.999745 |
|  4 |              4 |       5 | 0.000182362 |              27416 |             0.999927 |
|  5 |              5 |       2 | 7.29448e-05 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_2g_8 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   24383 | 0.889306    |              24383 |             0.889306 |
|  1 |              1 |    2724 | 0.0993508   |              27107 |             0.988657 |
|  2 |              2 |     282 | 0.0102852   |              27389 |             0.998942 |
|  3 |              3 |      22 | 0.000802393 |              27411 |             0.999745 |
|  4 |              4 |       5 | 0.000182362 |              27416 |             0.999927 |
|  5 |              5 |       2 | 7.29448e-05 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

[Figure: count plots of monthly_2g_6, monthly_2g_7 and monthly_2g_8 by Churn]
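The frequency tables above give the distribution of each category within the churned and retained groups. The reverse view, the share of churners within each category, can be easier to compare across categories. A minimal sketch using `pd.crosstab`, assuming the same column names as `data`; the `toy` values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are invented for illustration.
toy = pd.DataFrame({
    'monthly_2g_6': [0, 0, 0, 0, 1, 1, 1, 2],
    'Churn':        [1, 1, 0, 0, 0, 0, 1, 0],
})

# Each row sums to 1: the split between Churn = 0 and Churn = 1 within a category
churn_share = pd.crosstab(toy['monthly_2g_6'], toy['Churn'], normalize='index')
print(churn_share)
```

Categories with very few customers, such as the higher counts in the tables above, give noisy shares and should be read with care.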

monthly_3g_6, monthly_3g_7, monthly_3g_8

columns = ['monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8']
cat_univariate_analysis(columns)

Customers who churned (Churn : 1)
+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |    2352 | 0.907057    |               2352 |             0.907057 |
|  1 |              1 |     170 | 0.0655611   |               2522 |             0.972619 |
|  2 |              2 |      49 | 0.018897    |               2571 |             0.991516 |
|  3 |              3 |      13 | 0.0050135   |               2584 |             0.996529 |
|  4 |              5 |       4 | 0.00154261  |               2588 |             0.998072 |
|  5 |              4 |       4 | 0.00154261  |               2592 |             0.999614 |
|  6 |              6 |       1 | 0.000385654 |               2593 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_7 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |    2399 | 0.925183    |               2399 |             0.925183 |
|  1 |              1 |     136 | 0.0524489   |               2535 |             0.977632 |
|  2 |              2 |      48 | 0.0185114   |               2583 |             0.996143 |
|  3 |              3 |       9 | 0.00347088  |               2592 |             0.999614 |
|  4 |              5 |       1 | 0.000385654 |               2593 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_8 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |    2524 | 0.97339     |               2524 |             0.97339  |
|  1 |              1 |      56 | 0.0215966   |               2580 |             0.994987 |
|  2 |              2 |       8 | 0.00308523  |               2588 |             0.998072 |
|  3 |              3 |       4 | 0.00154261  |               2592 |             0.999614 |
|  4 |              4 |       1 | 0.000385654 |               2593 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+


Customers who did not churn (Churn : 0)
+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   24080 | 0.878255    |              24080 |             0.878255 |
|  1 |              1 |    2371 | 0.086476    |              26451 |             0.964731 |
|  2 |              2 |     648 | 0.0236341   |              27099 |             0.988365 |
|  3 |              3 |     194 | 0.00707564  |              27293 |             0.995441 |
|  4 |              4 |      70 | 0.00255307  |              27363 |             0.997994 |
|  5 |              5 |      28 | 0.00102123  |              27391 |             0.999015 |
|  6 |              6 |      10 | 0.000364724 |              27401 |             0.99938  |
|  7 |              7 |       9 | 0.000328252 |              27410 |             0.999708 |
|  8 |              8 |       3 | 0.000109417 |              27413 |             0.999818 |
|  9 |             11 |       2 | 7.29448e-05 |              27415 |             0.999891 |
| 10 |              9 |       2 | 7.29448e-05 |              27417 |             0.999964 |
| 11 |             14 |       1 | 3.64724e-05 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_7 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   23962 | 0.873951    |              23962 |             0.873951 |
|  1 |              1 |    2330 | 0.0849807   |              26292 |             0.958932 |
|  2 |              2 |     774 | 0.0282296   |              27066 |             0.987162 |
|  3 |              3 |     198 | 0.00722153  |              27264 |             0.994383 |
|  4 |              4 |      68 | 0.00248012  |              27332 |             0.996863 |
|  5 |              5 |      38 | 0.00138595  |              27370 |             0.998249 |
|  6 |              6 |      23 | 0.000838865 |              27393 |             0.999088 |
|  7 |              7 |      10 | 0.000364724 |              27403 |             0.999453 |
|  8 |              8 |       5 | 0.000182362 |              27408 |             0.999635 |
|  9 |              9 |       4 | 0.00014589  |              27412 |             0.999781 |
| 10 |             11 |       2 | 7.29448e-05 |              27414 |             0.999854 |
| 11 |             16 |       1 | 3.64724e-05 |              27415 |             0.999891 |
| 12 |             14 |       1 | 3.64724e-05 |              27416 |             0.999927 |
| 13 |             12 |       1 | 3.64724e-05 |              27417 |             0.999964 |
| 14 |             10 |       1 | 3.64724e-05 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

+----+----------------+---------+-------------+--------------------+----------------------+
|    |   monthly_3g_8 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+----------------+---------+-------------+--------------------+----------------------|
|  0 |              0 |   24002 | 0.87541     |              24002 |             0.87541  |
|  1 |              1 |    2347 | 0.0856007   |              26349 |             0.961011 |
|  2 |              2 |     728 | 0.0265519   |              27077 |             0.987563 |
|  3 |              3 |     193 | 0.00703917  |              27270 |             0.994602 |
|  4 |              4 |      86 | 0.00313663  |              27356 |             0.997739 |
|  5 |              5 |      30 | 0.00109417  |              27386 |             0.998833 |
|  6 |              6 |      14 | 0.000510613 |              27400 |             0.999343 |
|  7 |              7 |       9 | 0.000328252 |              27409 |             0.999672 |
|  8 |              9 |       3 | 0.000109417 |              27412 |             0.999781 |
|  9 |              8 |       3 | 0.000109417 |              27415 |             0.999891 |
| 10 |             10 |       2 | 7.29448e-05 |              27417 |             0.999964 |
| 11 |             16 |       1 | 3.64724e-05 |              27418 |             1        |
+----+----------------+---------+-------------+--------------------+----------------------+

[Figure: count plots of monthly_3g_6, monthly_3g_7 and monthly_3g_8 by Churn]

sachet_3g_6, sachet_3g_7, sachet_3g_8

columns = ['sachet_3g_6', 'sachet_3g_7','sachet_3g_8']
print(data[columns].dtypes)
cat_univariate_analysis(columns)

sachet_3g_6    category
sachet_3g_7    category
sachet_3g_8    category
dtype: object
Customers who churned (Churn : 1)
+----+---------------+---------+-------------+--------------------+----------------------+
|    |   sachet_3g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+---------------+---------+-------------+--------------------+----------------------|
|  0 |             0 |    2454 | 0.946394    |               2454 |             0.946394 |
|  1 |             1 |      87 | 0.0335519   |               2541 |             0.979946 |
|  2 |             2 |      16 | 0.00617046  |               2557 |             0.986116 |
|  3 |             4 |      11 | 0.00424219  |               2568 |             0.990359 |
|  4 |             3 |       8 | 0.00308523  |               2576 |             0.993444 |
|  5 |            10 |       4 | 0.00154261  |               2580 |             0.994987 |
|  6 |             7 |       4 | 0.00154261  |               2584 |             0.996529 |
|  7 |             6 |       3 | 0.00115696  |               2587 |             0.997686 |
|  8 |             9 |       2 | 0.000771307 |               2589 |             0.998457 |
|  9 |            23 |       1 | 0.000385654 |               2590 |             0.998843 |
| 10 |            19 |       1 | 0.000385654 |               2591 |             0.999229 |
| 11 |             8 |       1 | 0.000385654 |               2592 |             0.999614 |
| 12 |             5 |       1 | 0.000385654 |               2593 |             1        |
+----+---------------+---------+-------------+--------------------+----------------------+

+----+---------------+---------+-------------+--------------------+----------------------+
|    |   sachet_3g_7 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+---------------+---------+-------------+--------------------+----------------------|
|  0 |             0 |    2458 | 0.947937    |               2458 |             0.947937 |
|  1 |             1 |      82 | 0.0316236   |               2540 |             0.97956  |
|  2 |             2 |      19 | 0.00732742  |               2559 |             0.986888 |
|  3 |             3 |       8 | 0.00308523  |               2567 |             0.989973 |
|  4 |             5 |       7 | 0.00269958  |               2574 |             0.992673 |
|  5 |             4 |       4 | 0.00154261  |               2578 |             0.994215 |
|  6 |             9 |       3 | 0.00115696  |               2581 |             0.995372 |
|  7 |             6 |       3 | 0.00115696  |               2584 |             0.996529 |
|  8 |            10 |       2 | 0.000771307 |               2586 |             0.9973   |
|  9 |            35 |       1 | 0.000385654 |               2587 |             0.997686 |
| 10 |            24 |       1 | 0.000385654 |               2588 |             0.998072 |
| 11 |            17 |       1 | 0.000385654 |               2589 |             0.998457 |
| 12 |            12 |       1 | 0.000385654 |               2590 |             0.998843 |
| 13 |            11 |       1 | 0.000385654 |               2591 |             0.999229 |
| 14 |             8 |       1 | 0.000385654 |               2592 |             0.999614 |
| 15 |             7 |       1 | 0.000385654 |               2593 |             1        |
+----+---------------+---------+-------------+--------------------+----------------------+

+----+---------------+---------+-------------+--------------------+----------------------+
|    |   sachet_3g_8 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+---------------+---------+-------------+--------------------+----------------------|
|  0 |             0 |    2546 | 0.981874    |               2546 |             0.981874 |
|  1 |             1 |      31 | 0.0119553   |               2577 |             0.99383  |
|  2 |             3 |       5 | 0.00192827  |               2582 |             0.995758 |
|  3 |             2 |       3 | 0.00115696  |               2585 |             0.996915 |
|  4 |             8 |       2 | 0.000771307 |               2587 |             0.997686 |
|  5 |             5 |       2 | 0.000771307 |               2589 |             0.998457 |
|  6 |             4 |       2 | 0.000771307 |               2591 |             0.999229 |
|  7 |            16 |       1 | 0.000385654 |               2592 |             0.999614 |
|  8 |            13 |       1 | 0.000385654 |               2593 |             1        |
+----+---------------+---------+-------------+--------------------+----------------------+


Customers who did not churn (Churn : 0)
+----+---------------+---------+-------------+--------------------+----------------------+
|    |   sachet_3g_6 |   Count |     Percent |   Cumulative Count |   Cumulative Percent |
|----+---------------+---------+-------------+--------------------+----------------------|
|  0 |             0 |   25579 | 0.932927    |              25579 |             0.932927 |
|  1 |             1 |    1220 | 0.0444963   |              26799 |             0.977424 |
|  2 |             2 |     297 | 0.0108323   |              27096 |             0.988256 |
|  3 |             3 |     111 | 0.00404844  |              27207 |             0.992304 |
|  4 |             4 |      55 | 0.00200598  |              27262 |             0.99431  |
|  5 |             5 |      36 | 0.00131301  |              27298 |             0.995623 |
|  6 |             6 |      24 | 0.000875337 |              27322 |             0.996499 |
|  7 |             7 |      22 | 0.000802393 |              27344 |             0.997301 |
|  8 |             8 |      14 | 0.000510613 |              27358 |             0.997812 |
|  9 |             9 |      13 | 0.000474141 |              27371 |             0.998286 |
| 10 |            11 |       8 | 0.000291779 |              27379 |             0.998578 |
| 11 |            10 |       7 | 0.000255307 |              27386 |             0.998833 |
| 12 |            15 |       5 | 0.000182362 |              27391 |             0.999015 |
| 13 |            12 |       4 | 0.00014589  |              27395 |             0.999161 |
| 14 |            19 |       3 | 0.000109417 |              27398 |             0.999271 |
| 15 |            18 |       3 | 0.000109417 |              27401 |             0.99938  |
| 16 |            14 |       3 | 0.000109417 |              27404 |             0.999489 |
| 17 |            13 |       3 | 0.000109417 |              27407 |             0.999599 |
| 18 |            29 |       2 | 7.29448e-05 |              27409 |             0.999672 |
| 19 |            23 |       2 | 7.29448e-05 |              27411 |             0.999745 |
| 20 |            22 |       2 | 7.29448e-05 |              27413 |             0.999818 |
| 21 |            16 |       2 | 7.29448e-05 |              27415 |             0.999891 |
| 22 |            28 |       1 | 3.64724e-05 |              27416 |             0.999927 |
| 23 |            21 |       1 | 3.64724e-05 |              27417 |             0.999964 |
| 24 |            17 |       1 | 3.64724e-05 |              27418 |             1        |
+----+---------------+---------+-------------+--------------------+----------------------+

+----+---------------+---------+-------------+--------------------+----------------------+
92| | sachet_3g_7 | Count | Percent | Cumulative Count | Cumulative Percent |
93|----+---------------+---------+-------------+--------------------+----------------------|
94| 0 | 0 | 25595 | 0.933511 | 25595 | 0.933511 |
95| 1 | 1 | 1151 | 0.0419797 | 26746 | 0.975491 |
96| 2 | 2 | 293 | 0.0106864 | 27039 | 0.986177 |
97| 3 | 3 | 107 | 0.00390255 | 27146 | 0.99008 |
98| 4 | 4 | 68 | 0.00248012 | 27214 | 0.99256 |
99| 5 | 5 | 59 | 0.00215187 | 27273 | 0.994712 |
100| 6 | 6 | 39 | 0.00142242 | 27312 | 0.996134 |
101| 7 | 7 | 17 | 0.000620031 | 27329 | 0.996754 |
102| 8 | 9 | 13 | 0.000474141 | 27342 | 0.997228 |
103| 9 | 8 | 13 | 0.000474141 | 27355 | 0.997702 |
104| 10 | 11 | 12 | 0.000437669 | 27367 | 0.99814 |
105| 11 | 12 | 9 | 0.000328252 | 27376 | 0.998468 |
106| 12 | 10 | 8 | 0.000291779 | 27384 | 0.99876 |
107| 13 | 15 | 5 | 0.000182362 | 27389 | 0.998942 |
108| 14 | 14 | 5 | 0.000182362 | 27394 | 0.999125 |
109| 15 | 18 | 4 | 0.00014589 | 27398 | 0.999271 |
110| 16 | 13 | 4 | 0.00014589 | 27402 | 0.999416 |
111| 17 | 22 | 3 | 0.000109417 | 27405 | 0.999526 |
112| 18 | 20 | 3 | 0.000109417 | 27408 | 0.999635 |
113| 19 | 19 | 3 | 0.000109417 | 27411 | 0.999745 |
114| 20 | 21 | 2 | 7.29448e-05 | 27413 | 0.999818 |
115| 21 | 33 | 1 | 3.64724e-05 | 27414 | 0.999854 |
116| 22 | 31 | 1 | 3.64724e-05 | 27415 | 0.999891 |
117| 23 | 24 | 1 | 3.64724e-05 | 27416 | 0.999927 |
118| 24 | 17 | 1 | 3.64724e-05 | 27417 | 0.999964 |
119| 25 | 16 | 1 | 3.64724e-05 | 27418 | 1 |
120+----+---------------+---------+-------------+--------------------+----------------------+
121
122+----+---------------+---------+-------------+--------------------+----------------------+
123| | sachet_3g_8 | Count | Percent | Cumulative Count | Cumulative Percent |
124|----+---------------+---------+-------------+--------------------+----------------------|
125| 0 | 0 | 25736 | 0.938653 | 25736 | 0.938653 |
126| 1 | 1 | 1027 | 0.0374571 | 26763 | 0.976111 |
127| 2 | 2 | 249 | 0.00908163 | 27012 | 0.985192 |
128| 3 | 3 | 124 | 0.00452258 | 27136 | 0.989715 |
129| 4 | 4 | 71 | 0.00258954 | 27207 | 0.992304 |
130| 5 | 5 | 64 | 0.00233423 | 27271 | 0.994639 |
131| 6 | 6 | 26 | 0.000948282 | 27297 | 0.995587 |
132| 7 | 7 | 23 | 0.000838865 | 27320 | 0.996426 |
133| 8 | 8 | 20 | 0.000729448 | 27340 | 0.997155 |
134| 9 | 9 | 12 | 0.000437669 | 27352 | 0.997593 |
135| 10 | 12 | 11 | 0.000401196 | 27363 | 0.997994 |
136| 11 | 10 | 10 | 0.000364724 | 27373 | 0.998359 |
137| 12 | 13 | 9 | 0.000328252 | 27382 | 0.998687 |
138| 13 | 14 | 6 | 0.000218834 | 27388 | 0.998906 |
139| 14 | 11 | 6 | 0.000218834 | 27394 | 0.999125 |
140| 15 | 15 | 5 | 0.000182362 | 27399 | 0.999307 |
141| 16 | 23 | 2 | 7.29448e-05 | 27401 | 0.99938 |
142| 17 | 21 | 2 | 7.29448e-05 | 27403 | 0.999453 |
143| 18 | 20 | 2 | 7.29448e-05 | 27405 | 0.999526 |
144| 19 | 18 | 2 | 7.29448e-05 | 27407 | 0.999599 |
145| 20 | 17 | 2 | 7.29448e-05 | 27409 | 0.999672 |
146| 21 | 16 | 2 | 7.29448e-05 | 27411 | 0.999745 |
147| 22 | 41 | 1 | 3.64724e-05 | 27412 | 0.999781 |
148| 23 | 38 | 1 | 3.64724e-05 | 27413 | 0.999818 |
149| 24 | 30 | 1 | 3.64724e-05 | 27414 | 0.999854 |
150| 25 | 29 | 1 | 3.64724e-05 | 27415 | 0.999891 |
151| 26 | 27 | 1 | 3.64724e-05 | 27416 | 0.999927 |
152| 27 | 25 | 1 | 3.64724e-05 | 27417 | 0.999964 |
153| 28 | 19 | 1 | 3.64724e-05 | 27418 | 1 |
154+----+---------------+---------+-------------+--------------------+----------------------+

[Figure: univariate plots of sachet_3g_6, sachet_3g_7 and sachet_3g_8]

aug_vbc_3g, jul_vbc_3g, jun_vbc_3g

columns = ['vbc_3g_6', 'vbc_3g_7', 'vbc_3g_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
          vbc_3g_6     vbc_3g_7     vbc_3g_8
count  2593.000000  2593.000000  2593.000000
mean     81.564601    71.143880    32.610659
std     320.898511   284.882601   197.998246
min       0.000000     0.000000     0.000000
25%       0.000000     0.000000     0.000000
50%       0.000000     0.000000     0.000000
75%       0.000000     0.000000     0.000000
max    6931.810000  4908.270000  5738.740000

Customers who did not churn (Churn : 0)
           vbc_3g_6      vbc_3g_7      vbc_3g_8
count  27418.000000  27418.000000  27418.000000
mean     125.124167    141.178182    138.597023
std      395.413666    417.292310    402.761779
min        0.000000      0.000000      0.000000
25%        0.000000      0.000000      0.000000
50%        0.000000      0.000000      0.000000
75%        0.000000      9.940000     17.675000
max    11166.210000   9165.600000  12916.220000

[Figure: univariate plots of vbc_3g_6, vbc_3g_7 and vbc_3g_8]

Bivariate Analysis

data.head()
arpu_6arpu_7arpu_8onnet_mou_6onnet_mou_7onnet_mou_8offnet_mou_6offnet_mou_7offnet_mou_8roam_ic_mou_6roam_ic_mou_7roam_ic_mou_8roam_og_mou_6roam_og_mou_7roam_og_mou_8loc_og_t2t_mou_6loc_og_t2t_mou_7loc_og_t2t_mou_8loc_og_t2m_mou_6loc_og_t2m_mou_7loc_og_t2m_mou_8loc_og_t2f_mou_6loc_og_t2f_mou_7loc_og_t2f_mou_8loc_og_t2c_mou_6loc_og_t2c_mou_7loc_og_t2c_mou_8loc_og_mou_6loc_og_mou_7loc_og_mou_8std_og_t2t_mou_6std_og_t2t_mou_7std_og_t2t_mou_8std_og_t2m_mou_6std_og_t2m_mou_7std_og_t2m_mou_8std_og_t2f_mou_6std_og_t2f_mou_7std_og_t2f_mou_8std_og_mou_6std_og_mou_7std_og_mou_8isd_og_mou_6isd_og_mou_7isd_og_mou_8spl_og_mou_6spl_og_mou_7spl_og_mou_8og_others_6og_others_7og_others_8total_og_mou_6total_og_mou_7total_og_mou_8loc_ic_t2t_mou_6loc_ic_t2t_mou_7loc_ic_t2t_mou_8loc_ic_t2m_mou_6loc_ic_t2m_mou_7loc_ic_t2m_mou_8loc_ic_t2f_mou_6loc_ic_t2f_mou_7loc_ic_t2f_mou_8loc_ic_mou_6loc_ic_mou_7loc_ic_mou_8std_ic_t2t_mou_6std_ic_t2t_mou_7std_ic_t2t_mou_8std_ic_t2m_mou_6std_ic_t2m_mou_7std_ic_t2m_mou_8std_ic_t2f_mou_6std_ic_t2f_mou_7std_ic_t2f_mou_8std_ic_mou_6std_ic_mou_7std_ic_mou_8total_ic_mou_6total_ic_mou_7total_ic_mou_8spl_ic_mou_6spl_ic_mou_7spl_ic_mou_8isd_ic_mou_6isd_ic_mou_7isd_ic_mou_8ic_others_6ic_others_7ic_others_8total_rech_num_6total_rech_num_7total_rech_num_8total_rech_amt_6total_rech_amt_7total_rech_amt_8max_rech_amt_6max_rech_amt_7max_rech_amt_8last_day_rch_amt_6last_day_rch_amt_7last_day_rch_amt_8vol_2g_mb_6vol_2g_mb_7vol_2g_mb_8vol_3g_mb_6vol_3g_mb_7vol_3g_mb_8monthly_2g_6monthly_2g_7monthly_2g_8sachet_2g_6sachet_2g_7sachet_2g_8monthly_3g_6monthly_3g_7monthly_3g_8sachet_3g_6sachet_3g_7sachet_3g_8aonvbc_3g_8vbc_3g_7vbc_3g_6Average_rech_amt_6n7Churn
mobile_number
70007016011069.1801349.8503171.48057.8454.6852.29453.43567.16325.9116.2333.4931.6423.7412.5938.0651.3931.3840.28308.63447.38162.2862.1355.1453.230.00.00.00422.16533.91255.794.3023.2912.0149.8931.7649.146.6620.0816.6860.8675.1477.840.00.1810.014.500.006.500.000.00.0487.53609.24350.1658.1432.2627.31217.56221.49121.19152.16101.4639.53427.88355.23188.0436.8911.8330.3991.44126.99141.3352.1934.2422.21180.54173.08193.94626.46558.04428.740.210.00.02.0614.5331.5915.7415.1915.145571580790363815807901580007790.00.00.000.00.000.0000000000000080257.7419.3818.741185.01
7001524846378.721492.223137.362413.69351.0335.0894.6680.63136.480.000.000.000.000.000.00297.13217.5912.4980.9670.5850.540.000.000.000.00.07.15378.09288.1863.04116.56133.4322.5813.6910.0475.690.000.000.00130.26143.4898.280.00.000.000.000.0010.230.000.00.0508.36431.66171.5623.849.840.3157.5813.9815.480.000.000.0081.4323.8315.790.000.580.1022.434.080.650.000.000.0022.434.660.75103.8628.4916.540.000.00.00.000.000.000.000.000.001921144376011209015430500100.0356.00.030.0750.9511.9401001300000031521.03910.65122.16519.00
7002191713492.846205.671593.260501.76108.39534.24413.31119.28482.4623.53144.2472.117.9835.261.4449.636.1936.01151.1347.28294.464.540.0023.510.00.00.49205.3153.48353.99446.4185.98498.23255.3652.94156.940.000.000.00701.78138.93655.180.00.001.290.000.004.780.000.00.0907.09192.411015.2667.887.5852.58142.8818.53195.184.810.007.49215.5826.11255.26115.6838.29154.58308.1329.79317.910.000.001.91423.8168.09474.41968.61172.581144.530.450.00.0245.2862.11393.3983.4816.2421.4464115072537171101101301105000.00.00.020.00.000.0000000300000026070.000.000.00380.00
7000875565430.975299.869187.89450.5174.0170.61296.29229.74162.760.002.830.000.0017.740.0042.6165.1667.38273.29145.99128.280.004.4810.260.00.00.00315.91215.64205.937.892.583.2322.9964.5118.290.000.000.0030.8967.0921.530.00.000.000.003.265.910.000.00.0346.81286.01233.3841.3371.4428.89226.81149.69150.168.718.6832.71276.86229.83211.7868.7978.646.3318.6873.0873.930.510.002.1887.99151.7382.44364.86381.56294.460.000.00.00.000.000.230.000.000.0010625703481601101101301001001300.00.00.000.00.000.000000000000005110.002.4521.89459.00
7000187447690.00818.98025.4991185.919.287.7961.640.005.540.004.764.810.008.4613.3438.990.000.0058.540.000.000.000.000.000.00.00.0097.540.000.001146.910.810.001.550.000.000.000.000.001148.460.810.000.00.000.002.580.000.000.930.00.01249.530.810.0034.540.000.0047.412.310.000.000.000.0081.962.310.008.630.000.001.280.000.000.000.000.009.910.000.0091.882.310.000.000.00.00.000.000.000.000.000.00192481603011003030000.00.00.000.00.000.000000000000006670.000.000.00408.00

‘total_og_mou_6’ vs ‘total_og_mou_8’ with respect to Churn.

sns.scatterplot(x=data['total_og_mou_6'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafdb0f7d0>

[Figure: scatter plot of total_og_mou_6 vs total_og_mou_8, colored by Churn]

‘total_og_mou_7’ vs ‘total_og_mou_8’ with respect to Churn.

# x axis set to total_og_mou_7 to match the heading above
sns.scatterplot(x=data['total_og_mou_7'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafaf9d7d0>

[Figure: scatter plot, colored by Churn]

  • Customers with lower total_og_mou in the 6th and 8th months are more likely to churn than those with higher total_og_mou.

‘aon’ vs ‘total_og_mou_8’ with respect to Churn.

sns.scatterplot(x=data['aon'], y=data['total_og_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafcd5fb50>

[Figure: scatter plot of aon vs total_og_mou_8, colored by Churn]

  • Customers with lower total_og_mou_8 and lower aon are more likely to churn than those with higher total_og_mou_8 and aon.

‘aon’ vs ‘total_ic_mou_8’ with respect to Churn.

sns.scatterplot(x=data['aon'], y=data['total_ic_mou_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafccb9d90>

[Figure: scatter plot of aon vs total_ic_mou_8, colored by Churn]

  • Customers with lower total_ic_mou_8 are more likely to churn, irrespective of aon.
  • Customers with total_ic_mou_8 > 2000 are much less likely to churn.
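
An observation read off a scatter plot, such as the 2000-minute threshold above, can be checked numerically by comparing churn rates within usage bands. The following is a minimal sketch, assuming a DataFrame with a usage column and a binary Churn column. The helper name `churn_rate_by_band`, the band edges and the toy values are illustrative assumptions and are not taken from the case-study data.

```python
import numpy as np
import pandas as pd

def churn_rate_by_band(frame, column, edges, target="Churn"):
    """Share of churned customers within each band of `column` (hypothetical helper)."""
    # Assign each customer to a band delimited by the given edges
    bands = pd.cut(frame[column], bins=edges, include_lowest=True)
    # The mean of a 0/1 target is the churn rate of the band
    return frame.groupby(bands, observed=False)[target].mean()

# Toy data for illustration only, NOT values from the telecom dataset
toy = pd.DataFrame({
    "total_ic_mou_8": [0.0, 50.0, 100.0, 1500.0, 2500.0, 3000.0],
    "Churn":          [1,   1,    0,     0,      0,      0],
})

rates = churn_rate_by_band(toy, "total_ic_mou_8", edges=[0, 2000, np.inf])
print(rates)
```

On the real data, the same call with `data` in place of `toy` would return the churn rate below and above the threshold, which makes the visual impression comparable across features.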

‘max_rech_amt_6’ vs ‘max_rech_amt_8’ with respect to ‘Churn’.

sns.scatterplot(x=data['max_rech_amt_6'], y=data['max_rech_amt_8'], hue=data['Churn'])
<matplotlib.axes._subplots.AxesSubplot at 0x7fbafdd5a950>

[Figure: scatter plot of max_rech_amt_6 vs max_rech_amt_8, colored by Churn]

Correlation Analysis

# function to correlate variables
def correlation(dataframe):
    # correlate every feature except the target
    columns_for_analysis = [col for col in dataframe.columns if col != 'Churn']
    cor0 = dataframe[columns_for_analysis].corr()

    # keep only the strict upper triangle (k=1 excludes the diagonal), so that
    # self-correlations and mirrored duplicate pairs are dropped in one step;
    # the result must be assigned back, and np.bool is replaced by bool
    # because the alias was removed from recent NumPy releases
    upper = np.triu(np.ones(cor0.shape, dtype=bool), k=1)
    cor0 = cor0.where(upper)

    # long format: one row per variable pair
    cor0 = cor0.unstack().dropna().reset_index()
    cor0.columns = ['VAR1', 'VAR2', 'CORR']

    # absolute correlations rounded to 2 decimals, strongest first
    cor0['CORR'] = cor0['CORR'].round(2).abs()
    return cor0.sort_values(by='CORR', ascending=False)
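
To see what the pair de-duplication amounts to, here is a small self-contained sketch of the same upper-triangle idea on a toy frame. The function name `pairwise_abs_corr` and the toy columns are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def pairwise_abs_corr(frame):
    """Return each unordered column pair once, with its absolute correlation."""
    corr = frame.corr()
    # True strictly above the diagonal: removes self-pairs and mirrored duplicates
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).unstack().dropna().reset_index()
    pairs.columns = ["VAR1", "VAR2", "CORR"]
    pairs["CORR"] = pairs["CORR"].abs().round(2)
    return pairs.sort_values("CORR", ascending=False, ignore_index=True)

# Toy data for illustration only
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with a
    "c": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with a
})

result = pairwise_abs_corr(toy)
print(result)
```

Three columns give three unordered pairs, so the result has three rows. Because absolute values are reported, the anti-correlated pair shows up with the same strength as the positively correlated one.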
# Correlations for Churn : 0 - non churn customers
# Absolute values are reported
pd.set_option('display.precision', 2)
cor_0 = correlation(non_churned_customers)

# filtering for correlations > 0.4
condition = cor_0['CORR'] > 0.4
cor_0 = cor_0[condition]
# note: hide_index() was removed in pandas 2.0; on pandas >= 1.4 use .hide(axis='index')
cor_0.style.background_gradient(cmap='GnBu').hide_index()
VAR1 VAR2 CORR
isd_og_mou_7 isd_og_mou_8 0.96
isd_og_mou_6 isd_og_mou_8 0.95
isd_og_mou_6 isd_og_mou_7 0.95
total_rech_amt_8 arpu_8 0.95
total_rech_amt_6 arpu_6 0.94
total_rech_amt_7 arpu_7 0.94
total_rech_amt_7 Average_rech_amt_6n7 0.91
arpu_7 Average_rech_amt_6n7 0.91
loc_ic_mou_6 total_ic_mou_6 0.90
total_rech_amt_6 Average_rech_amt_6n7 0.90
loc_ic_mou_8 total_ic_mou_8 0.89
Average_rech_amt_6n7 arpu_6 0.89
total_ic_mou_7 loc_ic_mou_7 0.88
std_og_t2t_mou_8 onnet_mou_8 0.85
loc_ic_mou_8 loc_ic_t2m_mou_8 0.85
loc_ic_mou_6 loc_ic_t2m_mou_6 0.85
loc_ic_mou_8 loc_ic_mou_7 0.85
std_og_t2m_mou_8 offnet_mou_8 0.85
std_og_t2t_mou_7 onnet_mou_7 0.84
total_og_mou_8 std_og_mou_8 0.84
loc_og_mou_7 loc_og_mou_8 0.84
std_ic_t2m_mou_8 std_ic_mou_8 0.84
std_og_t2t_mou_6 onnet_mou_6 0.84
std_og_t2m_mou_7 offnet_mou_7 0.84
loc_ic_mou_7 loc_ic_t2m_mou_7 0.83
total_og_mou_7 std_og_mou_7 0.83
loc_ic_mou_6 loc_ic_mou_7 0.83
total_ic_mou_7 total_ic_mou_8 0.83
loc_og_t2t_mou_8 loc_og_t2t_mou_7 0.83
loc_og_t2f_mou_7 loc_og_t2f_mou_8 0.82
std_og_t2t_mou_8 std_og_t2t_mou_7 0.82
loc_og_t2m_mou_8 loc_og_t2m_mou_7 0.82
loc_ic_t2m_mou_8 loc_ic_t2m_mou_7 0.82
onnet_mou_8 onnet_mou_7 0.82
std_ic_t2m_mou_6 std_ic_mou_6 0.82
loc_ic_t2t_mou_6 loc_ic_t2t_mou_7 0.81
std_og_mou_7 std_og_mou_8 0.81
offnet_mou_6 std_og_t2m_mou_6 0.81
total_ic_mou_7 total_ic_mou_6 0.81
loc_ic_t2t_mou_7 loc_ic_t2t_mou_8 0.81
std_ic_mou_7 std_ic_t2m_mou_7 0.81
std_og_mou_6 total_og_mou_6 0.80
loc_ic_t2m_mou_6 loc_ic_t2m_mou_7 0.80
loc_og_t2t_mou_6 loc_og_t2t_mou_7 0.80
loc_og_mou_7 loc_og_mou_6 0.80
loc_ic_t2f_mou_7 loc_ic_t2f_mou_8 0.79
loc_og_t2f_mou_7 loc_og_t2f_mou_6 0.79
std_og_t2m_mou_8 std_og_t2m_mou_7 0.79
loc_og_mou_6 loc_og_t2m_mou_6 0.79
total_rech_num_8 total_rech_num_7 0.78
loc_og_t2m_mou_7 loc_og_t2m_mou_6 0.78
offnet_mou_8 offnet_mou_7 0.78
arpu_8 Average_rech_amt_6n7 0.78
loc_og_t2t_mou_8 loc_og_mou_8 0.77
total_rech_amt_7 arpu_8 0.77
std_og_t2f_mou_7 std_og_t2f_mou_8 0.77
total_og_mou_8 total_og_mou_7 0.77
loc_og_t2m_mou_8 loc_og_mou_8 0.77
arpu_7 total_rech_amt_8 0.77
loc_og_mou_7 loc_og_t2t_mou_7 0.77
arpu_7 arpu_8 0.77
loc_ic_t2m_mou_8 total_ic_mou_8 0.76
loc_ic_t2m_mou_6 total_ic_mou_6 0.76
std_ic_mou_8 std_ic_mou_7 0.76
vol_3g_mb_7 vol_3g_mb_8 0.75
std_og_t2m_mou_8 std_og_mou_8 0.75
isd_ic_mou_6 isd_ic_mou_7 0.75
loc_og_t2t_mou_6 loc_og_mou_6 0.75
loc_ic_mou_8 loc_ic_mou_6 0.75
total_rech_amt_8 Average_rech_amt_6n7 0.75
loc_ic_mou_8 total_ic_mou_7 0.75
isd_ic_mou_8 isd_ic_mou_7 0.75
std_og_t2m_mou_7 std_og_t2m_mou_6 0.75
loc_og_mou_7 loc_og_t2m_mou_7 0.75
loc_ic_t2f_mou_6 loc_ic_t2f_mou_7 0.75
std_ic_mou_7 std_ic_mou_6 0.75
loc_ic_mou_7 total_ic_mou_8 0.75
total_ic_mou_7 loc_ic_t2m_mou_7 0.74
std_ic_t2f_mou_7 std_ic_t2f_mou_6 0.74
std_og_mou_7 std_og_t2m_mou_7 0.74
std_og_t2t_mou_6 std_og_mou_6 0.74
loc_ic_mou_7 total_ic_mou_6 0.74
std_og_t2t_mou_8 std_og_mou_8 0.74
std_ic_t2t_mou_7 std_ic_t2t_mou_6 0.74
std_og_mou_6 std_og_t2m_mou_6 0.74
std_og_t2t_mou_6 std_og_t2t_mou_7 0.73
total_ic_mou_8 total_ic_mou_6 0.73
std_ic_t2t_mou_8 std_ic_t2t_mou_7 0.73
std_ic_t2m_mou_8 std_ic_t2m_mou_7 0.73
loc_og_t2f_mou_6 loc_og_t2f_mou_8 0.73
loc_og_mou_8 loc_og_mou_6 0.73
total_rech_amt_7 total_rech_amt_8 0.73
std_og_mou_7 std_og_t2t_mou_7 0.73
std_og_mou_6 std_og_mou_7 0.73
onnet_mou_6 onnet_mou_7 0.73
loc_ic_mou_8 loc_ic_t2m_mou_7 0.72
loc_ic_mou_6 total_ic_mou_7 0.72
total_og_mou_8 offnet_mou_8 0.72
std_ic_t2f_mou_7 std_ic_t2f_mou_8 0.72
loc_og_t2t_mou_8 loc_og_t2t_mou_6 0.72
std_ic_t2m_mou_7 std_ic_t2m_mou_6 0.72
loc_ic_t2m_mou_6 loc_ic_t2m_mou_8 0.72
offnet_mou_6 offnet_mou_7 0.72
ic_others_8 ic_others_7 0.71
std_og_t2f_mou_7 std_og_t2f_mou_6 0.71
vbc_3g_8 vbc_3g_7 0.71
total_og_mou_8 onnet_mou_8 0.71
total_og_mou_7 onnet_mou_7 0.71
vol_3g_mb_7 vol_3g_mb_6 0.71
total_og_mou_7 offnet_mou_7 0.70
loc_ic_mou_7 loc_ic_t2m_mou_8 0.70
arpu_7 arpu_6 0.70
std_ic_mou_7 std_ic_t2t_mou_7 0.70
loc_ic_t2t_mou_6 loc_ic_t2t_mou_8 0.70
offnet_mou_6 total_og_mou_6 0.70
onnet_mou_6 total_og_mou_6 0.70
total_rech_amt_6 arpu_7 0.70
vol_2g_mb_8 vol_2g_mb_7 0.69
std_og_t2t_mou_8 onnet_mou_7 0.69
loc_ic_mou_6 loc_ic_t2m_mou_7 0.69
std_og_t2t_mou_7 onnet_mou_8 0.69
total_rech_num_7 total_rech_num_6 0.69
loc_og_t2m_mou_8 loc_og_t2m_mou_6 0.69
vbc_3g_7 vbc_3g_6 0.69
last_day_rch_amt_8 max_rech_amt_8 0.69
loc_ic_t2t_mou_7 loc_ic_mou_7 0.68
loc_ic_mou_8 total_ic_mou_6 0.68
total_rech_amt_7 arpu_6 0.68
loc_ic_t2m_mou_6 loc_ic_mou_7 0.68
loc_ic_mou_6 loc_ic_t2t_mou_6 0.67
loc_ic_mou_8 loc_ic_t2t_mou_8 0.67
ic_others_6 ic_others_7 0.67
vol_3g_mb_8 vol_3g_mb_6 0.67
total_og_mou_7 total_og_mou_6 0.67
vol_2g_mb_6 vol_2g_mb_7 0.67
loc_ic_t2f_mou_6 loc_ic_t2f_mou_8 0.67
std_ic_t2t_mou_6 std_ic_mou_6 0.67
std_ic_t2f_mou_8 std_ic_t2f_mou_6 0.67
total_og_mou_7 std_og_mou_8 0.66
std_ic_mou_8 std_ic_t2t_mou_8 0.66
loc_ic_mou_6 total_ic_mou_8 0.66
std_og_t2m_mou_8 offnet_mou_7 0.66
std_ic_mou_8 std_ic_mou_6 0.66
vbc_3g_7 vol_3g_mb_7 0.65
offnet_mou_8 std_og_t2m_mou_7 0.65
max_rech_amt_6 last_day_rch_amt_6 0.65
loc_og_mou_7 loc_og_t2t_mou_8 0.65
std_og_t2f_mou_6 std_og_t2f_mou_8 0.65
total_rech_amt_8 arpu_6 0.64
std_ic_mou_8 std_ic_t2m_mou_7 0.64
total_ic_mou_8 loc_ic_t2m_mou_7 0.64
total_rech_amt_6 total_rech_amt_7 0.64
roam_og_mou_6 roam_ic_mou_6 0.64
total_og_mou_8 std_og_mou_7 0.64
loc_og_mou_8 loc_og_t2t_mou_7 0.64
total_og_mou_8 std_og_t2m_mou_8 0.64
loc_ic_mou_8 loc_ic_t2m_mou_6 0.64
arpu_8 arpu_6 0.64
roam_ic_mou_7 roam_og_mou_7 0.63
std_ic_t2m_mou_7 std_ic_mou_6 0.63
total_rech_amt_6 total_rech_amt_8 0.63
loc_og_mou_8 loc_og_t2m_mou_7 0.63
total_rech_amt_6 arpu_8 0.63
std_og_t2t_mou_8 std_og_t2t_mou_6 0.63
loc_og_mou_7 loc_og_t2t_mou_6 0.63
std_ic_t2m_mou_8 std_ic_t2m_mou_6 0.63
total_og_mou_7 std_og_t2m_mou_7 0.63
onnet_mou_6 onnet_mou_8 0.63
loc_ic_t2m_mou_7 total_ic_mou_6 0.63
vbc_3g_8 vol_3g_mb_8 0.63
total_ic_mou_7 loc_ic_t2m_mou_8 0.63
loc_ic_mou_6 loc_ic_t2m_mou_8 0.63
vbc_3g_6 vol_3g_mb_6 0.63
ic_others_8 ic_others_6 0.63
loc_og_mou_7 loc_og_t2m_mou_8 0.63
std_og_t2t_mou_8 std_og_mou_7 0.62
isd_ic_mou_8 isd_ic_mou_6 0.62
vbc_3g_8 vbc_3g_6 0.61
total_og_mou_8 std_og_t2t_mou_8 0.61
std_og_t2t_mou_6 onnet_mou_7 0.61
offnet_mou_7 std_og_t2m_mou_6 0.61
std_og_mou_8 onnet_mou_8 0.61
loc_og_mou_7 loc_og_t2m_mou_6 0.61
max_rech_amt_7 last_day_rch_amt_7 0.61
std_og_mou_6 std_og_mou_8 0.61
loc_og_mou_6 loc_og_t2m_mou_7 0.61
std_ic_t2m_mou_8 std_ic_mou_7 0.61
roam_ic_mou_8 roam_og_mou_8 0.60
std_og_mou_8 std_og_t2t_mou_7 0.60
total_rech_num_8 total_rech_num_6 0.60
total_og_mou_7 std_og_t2t_mou_7 0.60
std_og_mou_8 std_og_t2m_mou_7 0.60
max_rech_amt_8 max_rech_amt_6 0.60
std_og_mou_8 offnet_mou_8 0.60
loc_ic_t2t_mou_7 total_ic_mou_7 0.60
onnet_mou_6 std_og_t2t_mou_7 0.60
std_og_t2m_mou_8 std_og_t2m_mou_6 0.60
total_og_mou_6 std_og_t2m_mou_6 0.60
loc_og_mou_6 loc_og_t2t_mou_7 0.60
loc_ic_t2m_mou_6 total_ic_mou_7 0.60
std_ic_t2t_mou_8 std_ic_t2t_mou_6 0.59
loc_ic_t2t_mou_8 loc_ic_mou_7 0.59
std_og_mou_6 onnet_mou_6 0.59
loc_ic_t2t_mou_8 total_ic_mou_8 0.59
loc_ic_t2t_mou_6 total_ic_mou_6 0.59
std_og_mou_7 onnet_mou_7 0.59
offnet_mou_6 offnet_mou_8 0.59
std_og_mou_7 std_og_t2m_mou_8 0.59
loc_ic_t2m_mou_8 total_ic_mou_6 0.58
roam_og_mou_8 roam_og_mou_7 0.58
std_og_t2t_mou_6 total_og_mou_6 0.58
offnet_mou_6 std_og_t2m_mou_7 0.58
total_og_mou_7 onnet_mou_8 0.58
std_ic_mou_7 std_ic_t2m_mou_6 0.58
loc_ic_t2t_mou_6 loc_ic_mou_7 0.57
total_og_mou_7 std_og_mou_6 0.57
std_og_mou_7 offnet_mou_7 0.57
loc_og_t2t_mou_8 loc_og_mou_6 0.57
spl_og_mou_7 spl_og_mou_8 0.57
max_rech_amt_7 max_rech_amt_8 0.56
std_ic_t2m_mou_8 std_ic_mou_6 0.56
total_og_mou_8 onnet_mou_7 0.56
roam_ic_mou_8 roam_ic_mou_7 0.56
loc_ic_t2m_mou_6 total_ic_mou_8 0.56
loc_og_mou_8 loc_og_t2t_mou_6 0.56
spl_og_mou_6 spl_og_mou_7 0.56
std_og_mou_7 std_og_t2m_mou_6 0.56
loc_og_mou_8 loc_og_t2m_mou_6 0.56
loc_ic_mou_8 loc_ic_t2t_mou_7 0.56
loc_ic_mou_6 loc_ic_t2t_mou_7 0.55
loc_og_t2m_mou_8 loc_og_mou_6 0.55
std_ic_mou_7 std_ic_t2t_mou_6 0.55
total_og_mou_8 total_og_mou_6 0.55
total_og_mou_7 offnet_mou_8 0.54
std_og_mou_6 std_og_t2t_mou_7 0.54
std_ic_mou_8 std_ic_t2m_mou_6 0.54
total_og_mou_8 offnet_mou_7 0.54
std_og_mou_6 offnet_mou_6 0.54
std_og_mou_6 std_og_t2m_mou_7 0.54
std_og_t2t_mou_6 std_og_mou_7 0.53
isd_og_mou_7 Average_rech_amt_6n7 0.53
std_og_t2t_mou_6 onnet_mou_8 0.53
loc_og_t2c_mou_7 spl_og_mou_7 0.53
loc_og_t2c_mou_8 loc_og_t2c_mou_7 0.53
std_ic_mou_7 std_ic_t2t_mou_8 0.53
std_og_mou_7 total_og_mou_6 0.53
std_og_t2t_mou_8 onnet_mou_6 0.52
vol_2g_mb_6 vol_2g_mb_8 0.52
arpu_7 isd_og_mou_7 0.52
total_og_mou_6 arpu_6 0.51
vol_3g_mb_7 vbc_3g_6 0.51
loc_ic_t2t_mou_8 total_ic_mou_7 0.51
loc_ic_mou_6 loc_ic_t2t_mou_8 0.51
total_og_mou_8 arpu_8 0.51
vbc_3g_8 vol_3g_mb_7 0.51
total_rech_amt_7 isd_og_mou_7 0.50
roam_og_mou_6 roam_og_mou_7 0.50
std_og_mou_7 onnet_mou_8 0.50
loc_ic_mou_8 loc_ic_t2t_mou_6 0.50
std_ic_mou_8 std_ic_t2t_mou_7 0.50
Average_rech_amt_6n7 isd_og_mou_8 0.50
loc_ic_t2t_mou_6 total_ic_mou_7 0.50
std_ic_t2t_mou_7 std_ic_mou_6 0.50
loc_ic_t2m_mou_6 loc_og_t2m_mou_6 0.50
isd_og_mou_6 Average_rech_amt_6n7 0.50
max_rech_amt_7 max_rech_amt_6 0.50
total_og_mou_8 total_rech_amt_8 0.49
std_og_t2t_mou_8 total_og_mou_7 0.49
loc_og_t2m_mou_7 loc_ic_t2m_mou_7 0.49
loc_ic_t2t_mou_7 total_ic_mou_8 0.49
vbc_3g_7 vol_3g_mb_8 0.49
total_og_mou_7 std_og_t2m_mou_8 0.49
total_rech_amt_6 total_og_mou_6 0.49
std_og_mou_8 onnet_mou_7 0.49
loc_og_t2m_mou_8 loc_ic_t2m_mou_8 0.49
loc_ic_t2t_mou_7 total_ic_mou_6 0.49
total_rech_amt_8 isd_og_mou_8 0.49
spl_og_mou_6 loc_og_t2c_mou_6 0.48
arpu_7 isd_og_mou_8 0.48
total_rech_amt_8 isd_og_mou_7 0.48
offnet_mou_8 std_og_t2m_mou_6 0.48
max_rech_amt_8 total_rech_amt_8 0.48
arpu_8 isd_og_mou_8 0.48
isd_og_mou_6 arpu_6 0.48
total_og_mou_7 arpu_7 0.48
total_og_mou_8 std_og_t2m_mou_7 0.48
total_og_mou_7 onnet_mou_6 0.48
total_og_mou_6 onnet_mou_7 0.48
total_og_mou_7 offnet_mou_6 0.48
total_og_mou_8 std_og_t2t_mou_7 0.47
offnet_mou_6 loc_og_t2m_mou_6 0.47
vbc_3g_6 vol_3g_mb_8 0.47
isd_og_mou_7 arpu_6 0.47
std_og_t2t_mou_8 std_og_mou_6 0.47
loc_og_t2t_mou_6 onnet_mou_6 0.47
offnet_mou_6 std_og_t2m_mou_8 0.47
arpu_8 offnet_mou_8 0.47
loc_og_t2t_mou_7 onnet_mou_7 0.47
total_og_mou_6 offnet_mou_7 0.47
isd_og_mou_6 total_rech_amt_6 0.47
total_rech_amt_6 isd_og_mou_7 0.46
loc_og_t2c_mou_8 spl_og_mou_8 0.46
roam_ic_mou_7 roam_ic_mou_6 0.46
loc_og_t2t_mou_8 onnet_mou_8 0.46
std_og_mou_8 std_og_t2m_mou_6 0.46
max_rech_amt_7 total_rech_amt_7 0.46
total_og_mou_8 std_og_mou_6 0.46
arpu_6 isd_og_mou_8 0.46
isd_og_mou_6 arpu_7 0.46
std_ic_mou_8 total_ic_mou_8 0.46
total_og_mou_7 total_rech_amt_7 0.46
arpu_8 isd_og_mou_7 0.46
total_rech_amt_8 offnet_mou_8 0.46
offnet_mou_6 arpu_6 0.46
vbc_3g_7 vol_3g_mb_6 0.46
total_rech_amt_7 isd_og_mou_8 0.46
total_og_mou_7 std_og_t2m_mou_6 0.45
loc_ic_t2t_mou_8 total_ic_mou_6 0.45
std_ic_mou_7 total_ic_mou_7 0.45
total_rech_amt_6 isd_og_mou_8 0.45
loc_ic_mou_6 loc_og_mou_6 0.45
std_og_mou_8 offnet_mou_7 0.45
std_og_t2t_mou_6 std_og_mou_8 0.45
total_rech_amt_6 offnet_mou_6 0.45
std_og_mou_6 std_og_t2m_mou_8 0.44
loc_ic_mou_8 loc_og_t2m_mou_8 0.44
std_ic_mou_8 std_ic_t2t_mou_6 0.44
loc_ic_mou_6 loc_og_t2m_mou_6 0.44
loc_og_mou_6 total_og_mou_6 0.44
std_og_mou_7 offnet_mou_8 0.44
std_og_mou_8 total_og_mou_6 0.44
arpu_7 offnet_mou_7 0.44
loc_ic_mou_8 loc_og_mou_8 0.44
isd_og_mou_6 total_rech_amt_8 0.44
loc_og_t2m_mou_8 offnet_mou_8 0.44
std_ic_mou_6 total_ic_mou_6 0.44
std_og_mou_6 onnet_mou_7 0.43
total_rech_amt_7 offnet_mou_7 0.43
isd_og_mou_6 total_rech_amt_7 0.43
loc_ic_t2t_mou_6 total_ic_mou_8 0.43
loc_og_t2m_mou_7 loc_ic_t2m_mou_8 0.43
vbc_3g_8 vol_3g_mb_6 0.43
loc_og_t2m_mou_8 loc_ic_t2m_mou_7 0.43
total_rech_amt_6 max_rech_amt_6 0.43
isd_og_mou_6 arpu_8 0.43
loc_og_t2m_mou_7 loc_ic_mou_7 0.42
loc_og_mou_7 loc_ic_mou_7 0.42
std_ic_t2t_mou_8 std_ic_mou_6 0.42
onnet_mou_8 total_og_mou_6 0.42
Average_rech_amt_6n7 total_og_mou_6 0.42
loc_og_t2m_mou_6 loc_ic_t2m_mou_7 0.42
max_rech_amt_8 last_day_rch_amt_6 0.42
total_og_mou_7 Average_rech_amt_6n7 0.42
total_og_mou_8 loc_og_mou_8 0.42
loc_og_t2m_mou_7 offnet_mou_7 0.42
loc_ic_t2m_mou_6 loc_og_t2m_mou_7 0.42
total_og_mou_8 onnet_mou_6 0.41
spl_og_mou_6 spl_og_mou_8 0.41
offnet_mou_6 Average_rech_amt_6n7 0.41
last_day_rch_amt_8 max_rech_amt_7 0.41
last_day_rch_amt_8 max_rech_amt_6 0.41
loc_ic_t2m_mou_6 loc_og_mou_6 0.41
# Correlations for Churn : 1 - churned customers
# Absolute values are reported
pd.set_option('display.precision', 2)
cor_1 = correlation(churned_customers)

# filtering for correlations > 0.4
condition = cor_1['CORR'] > 0.4
cor_1 = cor_1[condition]
# note: hide_index() was removed in pandas 2.0; on pandas >= 1.4 use .hide(axis='index')
cor_1.style.background_gradient(cmap='GnBu').hide_index()
VAR1 VAR2 CORR
og_others_8 og_others_7 1.00
total_rech_amt_8 arpu_8 0.96
total_rech_amt_6 arpu_6 0.95
total_rech_amt_7 arpu_7 0.95
total_og_mou_8 std_og_mou_8 0.95
std_og_t2t_mou_7 onnet_mou_7 0.95
total_og_mou_7 std_og_mou_7 0.94
loc_og_t2f_mou_6 og_others_8 0.93
loc_og_t2f_mou_7 loc_og_t2f_mou_6 0.93
std_og_t2t_mou_8 onnet_mou_8 0.93
loc_og_t2f_mou_6 og_others_7 0.93
offnet_mou_6 std_og_t2m_mou_6 0.92
std_og_t2t_mou_6 onnet_mou_6 0.92
std_ic_t2m_mou_8 std_ic_mou_8 0.92
std_og_mou_6 total_og_mou_6 0.92
std_og_t2m_mou_7 offnet_mou_7 0.92
loc_og_t2f_mou_7 og_others_8 0.91
loc_og_t2f_mou_7 og_others_7 0.91
loc_ic_mou_8 loc_ic_t2m_mou_8 0.90
loc_ic_mou_6 loc_ic_t2m_mou_6 0.90
loc_ic_mou_8 total_ic_mou_8 0.89
loc_og_t2m_mou_8 loc_og_mou_8 0.88
std_og_t2m_mou_8 offnet_mou_8 0.87
loc_ic_mou_6 total_ic_mou_6 0.87
total_ic_mou_7 loc_ic_mou_7 0.86
loc_og_mou_7 loc_og_t2m_mou_7 0.84
loc_ic_mou_7 loc_ic_t2m_mou_7 0.84
std_ic_mou_7 std_ic_t2m_mou_7 0.82
loc_ic_t2m_mou_8 total_ic_mou_8 0.81
std_og_t2t_mou_8 std_og_mou_8 0.79
std_ic_t2t_mou_7 std_ic_t2t_mou_6 0.78
arpu_7 Average_rech_amt_6n7 0.77
std_ic_t2m_mou_6 std_ic_mou_6 0.77
loc_og_mou_6 loc_og_t2m_mou_6 0.77
loc_ic_t2m_mou_6 total_ic_mou_6 0.77
total_rech_amt_6 Average_rech_amt_6n7 0.76
total_rech_amt_7 Average_rech_amt_6n7 0.76
total_og_mou_8 std_og_t2t_mou_8 0.75
loc_og_t2t_mou_6 loc_og_mou_6 0.75
total_og_mou_8 onnet_mou_8 0.74
std_og_mou_8 onnet_mou_8 0.74
std_og_mou_7 std_og_t2m_mou_7 0.74
loc_og_t2t_mou_8 loc_og_t2t_mou_7 0.73
Average_rech_amt_6n7 arpu_6 0.73
loc_ic_mou_8 loc_ic_t2t_mou_8 0.73
std_ic_t2t_mou_6 std_ic_mou_6 0.72
total_ic_mou_7 loc_ic_t2m_mou_7 0.72
loc_ic_mou_6 loc_ic_t2t_mou_6 0.72
max_rech_amt_6 last_day_rch_amt_6 0.72
total_og_mou_7 offnet_mou_7 0.72
std_og_mou_6 std_og_t2m_mou_6 0.72
roam_ic_mou_8 roam_ic_mou_7 0.72
std_og_t2m_mou_8 std_og_mou_8 0.71
total_og_mou_8 offnet_mou_8 0.70
last_day_rch_amt_8 max_rech_amt_8 0.70
total_og_mou_7 std_og_t2m_mou_7 0.69
loc_og_mou_7 loc_og_t2t_mou_7 0.69
std_og_mou_7 std_og_t2t_mou_7 0.69
loc_ic_t2t_mou_7 loc_ic_mou_7 0.69
max_rech_amt_8 total_rech_amt_8 0.68
std_og_t2t_mou_6 std_og_mou_6 0.68
offnet_mou_6 total_og_mou_6 0.68
loc_og_t2t_mou_8 loc_og_mou_8 0.68
total_og_mou_8 std_og_t2m_mou_8 0.68
loc_og_t2c_mou_7 spl_og_mou_7 0.68
total_og_mou_6 std_og_t2m_mou_6 0.67
std_og_t2t_mou_6 std_og_t2t_mou_7 0.67
vol_3g_mb_7 vol_3g_mb_8 0.67
loc_ic_t2f_mou_6 loc_ic_t2f_mou_7 0.67
std_og_mou_7 offnet_mou_7 0.66
total_og_mou_7 onnet_mou_7 0.66
onnet_mou_6 total_og_mou_6 0.65
roam_og_mou_8 roam_og_mou_7 0.65
loc_og_t2m_mou_8 loc_ic_t2m_mou_8 0.65
std_ic_mou_7 std_ic_t2t_mou_7 0.65
loc_ic_mou_8 loc_og_mou_8 0.65
std_og_mou_7 onnet_mou_7 0.65
total_og_mou_7 std_og_t2t_mou_7 0.64
onnet_mou_6 onnet_mou_7 0.64
loc_og_mou_8 loc_ic_t2m_mou_8 0.64
loc_ic_t2t_mou_8 total_ic_mou_8 0.64
std_og_t2t_mou_6 onnet_mou_7 0.63
loc_og_mou_7 loc_og_mou_8 0.63
std_og_mou_6 offnet_mou_6 0.63
roam_og_mou_6 roam_ic_mou_6 0.63
std_og_mou_8 offnet_mou_8 0.63
loc_ic_t2t_mou_6 total_ic_mou_6 0.63
loc_ic_t2f_mou_7 loc_ic_t2f_mou_8 0.62
vbc_3g_6 vol_3g_mb_6 0.62
onnet_mou_8 onnet_mou_7 0.62
roam_ic_mou_7 roam_ic_mou_6 0.62
std_og_t2t_mou_6 total_og_mou_6 0.62
std_og_t2m_mou_7 std_og_t2m_mou_6 0.62
max_rech_amt_8 arpu_8 0.62
vbc_3g_8 vbc_3g_7 0.61
loc_og_mou_8 total_ic_mou_8 0.61
loc_og_t2m_mou_7 loc_og_t2m_mou_6 0.61
std_og_t2t_mou_8 std_og_t2t_mou_7 0.61
roam_og_mou_6 roam_og_mou_7 0.61
std_og_mou_6 onnet_mou_6 0.61
onnet_mou_6 std_og_t2t_mou_7 0.61
isd_og_mou_7 isd_og_mou_8 0.60
std_ic_mou_7 std_ic_mou_6 0.60
total_og_mou_8 arpu_8 0.60
std_og_t2t_mou_8 onnet_mou_7 0.60
std_og_t2f_mou_7 std_og_t2f_mou_8 0.60
loc_ic_t2m_mou_6 loc_ic_t2m_mou_7 0.60
loc_og_t2m_mou_8 loc_og_t2m_mou_7 0.59
loc_og_mou_7 loc_og_mou_6 0.59
arpu_8 offnet_mou_8 0.59
max_rech_amt_7 last_day_rch_amt_7 0.59
loc_ic_mou_8 loc_ic_mou_7 0.58
std_og_mou_7 std_og_mou_8 0.58
loc_ic_t2t_mou_7 total_ic_mou_7 0.58
loc_og_t2m_mou_7 loc_ic_t2m_mou_7 0.58
std_og_t2m_mou_8 std_og_t2m_mou_7 0.58
std_og_mou_6 std_og_mou_7 0.58
total_og_mou_8 total_rech_amt_8 0.58
loc_ic_mou_8 loc_og_t2m_mou_8 0.58
total_rech_amt_8 offnet_mou_8 0.58
offnet_mou_6 offnet_mou_7 0.57
loc_ic_t2m_mou_8 loc_ic_t2m_mou_7 0.57
total_og_mou_8 total_rech_num_8 0.57
loc_ic_mou_6 loc_ic_mou_7 0.57
loc_og_t2c_mou_8 spl_og_mou_8 0.57
isd_ic_mou_6 isd_ic_mou_7 0.57
arpu_7 isd_og_mou_7 0.57
offnet_mou_8 offnet_mou_7 0.57
vol_3g_mb_7 vol_3g_mb_6 0.57
loc_ic_t2t_mou_7 loc_ic_t2t_mou_8 0.56
loc_og_t2t_mou_6 loc_og_t2t_mou_7 0.56
total_og_mou_8 total_og_mou_7 0.56
std_ic_mou_8 total_ic_mou_8 0.56
std_og_t2t_mou_7 onnet_mou_8 0.56
spl_og_mou_6 loc_og_t2c_mou_6 0.56
total_rech_amt_7 isd_og_mou_7 0.56
vbc_3g_7 vol_3g_mb_7 0.56
ic_others_6 ic_others_7 0.55
total_og_mou_7 std_og_mou_8 0.55
total_ic_mou_7 total_ic_mou_6 0.55
offnet_mou_7 std_og_t2m_mou_6 0.55
total_rech_num_8 total_rech_amt_8 0.55
total_rech_num_8 total_rech_num_7 0.54
loc_og_t2m_mou_8 total_ic_mou_8 0.54
total_rech_num_8 arpu_8 0.54
total_ic_mou_7 total_ic_mou_8 0.54
std_ic_t2t_mou_8 std_ic_t2t_mou_7 0.54
std_ic_mou_7 total_ic_mou_7 0.54
total_rech_num_8 std_og_mou_8 0.54
loc_og_t2c_mou_7 loc_og_t2c_mou_6 0.54
std_ic_mou_8 std_ic_t2t_mou_8 0.54
offnet_mou_6 std_og_t2m_mou_7 0.54
loc_ic_mou_6 loc_ic_t2m_mou_7 0.54
std_ic_t2t_mou_7 std_ic_mou_6 0.54
std_og_t2m_mou_8 offnet_mou_7 0.54
loc_ic_t2m_mou_6 loc_ic_mou_7 0.54
vbc_3g_7 vbc_3g_6 0.53
vol_2g_mb_6 vol_2g_mb_7 0.53
isd_og_mou_6 arpu_6 0.53
std_ic_mou_6 total_ic_mou_6 0.53
loc_og_mou_8 loc_og_t2m_mou_7 0.52
total_og_mou_8 std_og_mou_7 0.52
isd_og_mou_6 total_rech_amt_6 0.52
std_ic_mou_8 std_ic_mou_7 0.52
total_rech_amt_7 arpu_8 0.51
loc_og_mou_7 loc_ic_t2m_mou_7 0.51
total_og_mou_7 arpu_7 0.51
total_og_mou_7 std_og_mou_6 0.51
loc_ic_mou_8 loc_ic_t2m_mou_7 0.51
roam_ic_mou_7 roam_og_mou_7 0.51
arpu_8 std_og_mou_8 0.51
total_og_mou_7 total_og_mou_6 0.51
loc_og_mou_7 loc_og_t2m_mou_6 0.51
loc_og_mou_7 loc_og_t2m_mou_8 0.51
arpu_7 arpu_8 0.50
loc_ic_t2m_mou_6 loc_og_t2m_mou_6 0.50
offnet_mou_8 std_og_t2m_mou_7 0.50
std_ic_t2m_mou_8 std_ic_t2m_mou_7 0.50
std_og_mou_7 total_og_mou_6 0.50
total_rech_amt_8 onnet_mou_8 0.50
last_day_rch_amt_8 total_rech_amt_8 0.50
loc_og_mou_7 loc_og_t2t_mou_8 0.50
loc_ic_mou_8 total_ic_mou_7 0.50
loc_ic_mou_7 loc_ic_t2m_mou_8 0.50
total_rech_amt_8 std_og_mou_8 0.50
arpu_8 onnet_mou_8 0.50
std_og_t2f_mou_7 loc_og_t2f_mou_7 0.50
loc_og_mou_7 loc_ic_mou_7 0.50
max_rech_amt_7 total_rech_amt_7 0.50
std_ic_t2m_mou_7 std_ic_t2m_mou_6 0.50
loc_ic_mou_8 loc_ic_t2f_mou_8 0.49
vbc_3g_8 vol_3g_mb_8 0.49
std_ic_t2t_mou_8 std_ic_t2t_mou_6 0.49
loc_og_t2m_mou_7 loc_ic_mou_7 0.49
vol_2g_mb_8 vol_2g_mb_7 0.49
loc_ic_mou_7 total_ic_mou_6 0.49
loc_ic_mou_7 total_ic_mou_8 0.49
std_og_t2f_mou_7 og_others_7 0.48
vol_3g_mb_8 vol_3g_mb_6 0.48
isd_og_mou_7 isd_ic_mou_7 0.48
std_og_t2f_mou_7 loc_og_t2f_mou_6 0.48
std_og_t2f_mou_7 og_others_8 0.48
loc_og_t2t_mou_8 loc_ic_t2t_mou_8 0.48
std_ic_t2m_mou_8 total_ic_mou_8 0.48
isd_ic_mou_8 isd_ic_mou_7 0.48
arpu_7 total_rech_amt_8 0.48
total_og_mou_7 total_rech_amt_7 0.48
std_ic_mou_7 std_ic_t2t_mou_6 0.48
loc_ic_t2m_mou_7 total_ic_mou_6 0.47
loc_ic_t2t_mou_8 loc_ic_mou_7 0.47
total_rech_num_8 onnet_mou_8 0.47
total_og_mou_6 arpu_6 0.47
total_og_mou_8 loc_og_mou_8 0.47
std_ic_mou_8 std_ic_t2m_mou_7 0.46
total_og_mou_7 total_rech_num_7 0.46
loc_ic_t2f_mou_6 loc_ic_t2f_mou_8 0.46
loc_ic_mou_6 total_ic_mou_7 0.46
last_day_rch_amt_8 roam_og_mou_8 0.46
total_rech_amt_6 max_rech_amt_6 0.46
std_og_mou_8 std_og_t2t_mou_7 0.46
total_rech_amt_6 total_og_mou_6 0.46
isd_og_mou_7 Average_rech_amt_6n7 0.46
spl_og_mou_7 spl_og_mou_8 0.46
loc_og_mou_8 loc_og_t2t_mou_7 0.46
loc_ic_mou_6 loc_og_t2m_mou_6 0.45
max_rech_amt_7 max_rech_amt_8 0.45
std_ic_t2m_mou_8 std_ic_mou_7 0.45
total_rech_amt_7 total_rech_amt_8 0.45
arpu_7 offnet_mou_7 0.45
std_og_t2t_mou_8 total_rech_num_8 0.45
arpu_8 roam_og_mou_8 0.45
std_ic_t2m_mou_7 total_ic_mou_7 0.45
loc_og_mou_6 loc_og_t2m_mou_7 0.45
total_rech_num_7 total_rech_num_6 0.45
std_ic_t2f_mou_7 std_ic_t2f_mou_6 0.45
loc_ic_mou_6 loc_og_mou_6 0.45
loc_og_mou_6 loc_og_t2t_mou_7 0.45
loc_ic_t2m_mou_6 total_ic_mou_7 0.44
isd_ic_mou_6 ic_others_6 0.44
last_day_rch_amt_8 arpu_8 0.44
loc_og_t2c_mou_8 loc_og_t2c_mou_7 0.44
arpu_8 Average_rech_amt_6n7 0.44
roam_ic_mou_8 roam_og_mou_8 0.44
std_og_mou_8 onnet_mou_7 0.44
std_og_mou_7 std_og_t2m_mou_8 0.44
loc_og_t2m_mou_8 offnet_mou_8 0.44
total_rech_amt_8 roam_og_mou_8 0.44
loc_og_mou_7 total_ic_mou_7 0.44
total_og_mou_8 onnet_mou_7 0.43
spl_og_mou_6 spl_og_mou_7 0.43
total_ic_mou_8 loc_ic_t2m_mou_7 0.43
std_ic_t2m_mou_6 total_ic_mou_6 0.43
total_rech_num_8 offnet_mou_8 0.43
loc_og_mou_8 offnet_mou_8 0.43
std_og_t2t_mou_8 std_og_mou_7 0.43
loc_ic_t2f_mou_8 total_ic_mou_8 0.43
std_og_t2f_mou_7 std_ic_t2f_mou_7 0.43
total_og_mou_6 total_rech_num_6 0.43
std_ic_mou_7 std_ic_t2m_mou_6 0.42
loc_ic_mou_8 loc_og_t2t_mou_8 0.42
loc_ic_t2t_mou_8 loc_og_mou_8 0.42
total_ic_mou_7 loc_og_t2m_mou_7 0.42
total_rech_amt_7 offnet_mou_7 0.42
max_rech_amt_8 max_rech_amt_6 0.42
loc_ic_t2m_mou_6 loc_og_mou_6 0.42
last_day_rch_amt_7 max_rech_amt_8 0.42
total_ic_mou_7 loc_ic_t2m_mou_8 0.42
loc_ic_t2f_mou_7 loc_ic_mou_7 0.42
loc_og_t2m_mou_8 loc_ic_t2m_mou_7 0.42
total_rech_amt_8 Average_rech_amt_6n7 0.42
arpu_8 total_ic_mou_8 0.42
offnet_mou_6 arpu_6 0.42
total_og_mou_7 std_og_t2m_mou_8 0.42
std_og_mou_6std_og_t2t_mou_70.42
total_og_mou_7offnet_mou_80.41
std_og_mou_6std_og_t2m_mou_70.41
total_og_mou_8std_og_t2t_mou_70.41
std_og_t2t_mou_6std_og_mou_70.41
std_og_mou_7total_rech_num_70.41
std_og_mou_7arpu_70.41
loc_og_t2t_mou_8loc_og_t2t_mou_60.41
total_og_mou_8total_ic_mou_80.41
std_og_mou_7std_og_t2m_mou_60.41
total_rech_amt_6offnet_mou_60.41
spl_ic_mou_6spl_ic_mou_80.41

Data Preparation

Derived Variables

# Derived variables to measure change in usage

# Usage
data['delta_vol_2g'] = data['vol_2g_mb_8'] - data['vol_2g_mb_6'].add(data['vol_2g_mb_7']).div(2)
data['delta_vol_3g'] = data['vol_3g_mb_8'] - data['vol_3g_mb_6'].add(data['vol_3g_mb_7']).div(2)
data['delta_total_og_mou'] = data['total_og_mou_8'] - data['total_og_mou_6'].add(data['total_og_mou_7']).div(2)
data['delta_total_ic_mou'] = data['total_ic_mou_8'] - data['total_ic_mou_6'].add(data['total_ic_mou_7']).div(2)
data['delta_vbc_3g'] = data['vbc_3g_8'] - data['vbc_3g_6'].add(data['vbc_3g_7']).div(2)

# Revenue
data['delta_arpu'] = data['arpu_8'] - data['arpu_6'].add(data['arpu_7']).div(2)
data['delta_total_rech_amt'] = data['total_rech_amt_8'] - data['total_rech_amt_6'].add(data['total_rech_amt_7']).div(2)

# Removing variables used for derivation
data.drop(columns=[
    'vol_2g_mb_8', 'vol_2g_mb_6', 'vol_2g_mb_7',
    'vol_3g_mb_8', 'vol_3g_mb_6', 'vol_3g_mb_7',
    'total_og_mou_8', 'total_og_mou_6', 'total_og_mou_7',
    'total_ic_mou_8', 'total_ic_mou_6', 'total_ic_mou_7',
    'vbc_3g_8', 'vbc_3g_6', 'vbc_3g_7',
    'arpu_8', 'arpu_6', 'arpu_7',
    'total_rech_amt_8', 'total_rech_amt_6', 'total_rech_amt_7'
], inplace=True)
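
The derivation above can be sketched on toy data: each delta is the month 8 value minus the average of months 6 and 7. The `add_delta` helper and the numbers below are hypothetical and only illustrate the formula; the column naming follows the dataset.

```python
import pandas as pd

# Toy data with hypothetical values; names follow the <metric>_<month> scheme.
toy = pd.DataFrame({
    'arpu_6': [100.0, 200.0],
    'arpu_7': [300.0, 200.0],
    'arpu_8': [150.0, 50.0],
})

def add_delta(df, base):
    """Return a copy with delta_<base> = month 8 minus the mean of months 6 and 7."""
    out = df.copy()
    prior_mean = df[f'{base}_6'].add(df[f'{base}_7']).div(2)
    out[f'delta_{base}'] = df[f'{base}_8'] - prior_mean
    return out

toy_with_delta = add_delta(toy, 'arpu')
print(toy_with_delta['delta_arpu'].tolist())  # [-50.0, -150.0]
```

A negative delta means the month 8 value fell below the average of the two preceding months.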

Outlier Treatment

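# Looking at quantiles from 0.90 to 1.
# numeric_only=True restricts this to numeric columns (the default changed to False in pandas 2.0)
data.quantile(np.arange(0.9, 1.01, 0.01), numeric_only=True).style.bar()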
onnet_mou_6 onnet_mou_7 onnet_mou_8 offnet_mou_6 offnet_mou_7 offnet_mou_8 roam_ic_mou_6 roam_ic_mou_7 roam_ic_mou_8 roam_og_mou_6 roam_og_mou_7 roam_og_mou_8 loc_og_t2t_mou_6 loc_og_t2t_mou_7 loc_og_t2t_mou_8 loc_og_t2m_mou_6 loc_og_t2m_mou_7 loc_og_t2m_mou_8 loc_og_t2f_mou_6 loc_og_t2f_mou_7 loc_og_t2f_mou_8 loc_og_t2c_mou_6 loc_og_t2c_mou_7 loc_og_t2c_mou_8 loc_og_mou_6 loc_og_mou_7 loc_og_mou_8 std_og_t2t_mou_6 std_og_t2t_mou_7 std_og_t2t_mou_8 std_og_t2m_mou_6 std_og_t2m_mou_7 std_og_t2m_mou_8 std_og_t2f_mou_6 std_og_t2f_mou_7 std_og_t2f_mou_8 std_og_mou_6 std_og_mou_7 std_og_mou_8 isd_og_mou_6 isd_og_mou_7 isd_og_mou_8 spl_og_mou_6 spl_og_mou_7 spl_og_mou_8 og_others_6 og_others_7 og_others_8 loc_ic_t2t_mou_6 loc_ic_t2t_mou_7 loc_ic_t2t_mou_8 loc_ic_t2m_mou_6 loc_ic_t2m_mou_7 loc_ic_t2m_mou_8 loc_ic_t2f_mou_6 loc_ic_t2f_mou_7 loc_ic_t2f_mou_8 loc_ic_mou_6 loc_ic_mou_7 loc_ic_mou_8 std_ic_t2t_mou_6 std_ic_t2t_mou_7 std_ic_t2t_mou_8 std_ic_t2m_mou_6 std_ic_t2m_mou_7 std_ic_t2m_mou_8 std_ic_t2f_mou_6 std_ic_t2f_mou_7 std_ic_t2f_mou_8 std_ic_mou_6 std_ic_mou_7 std_ic_mou_8 spl_ic_mou_6 spl_ic_mou_7 spl_ic_mou_8 isd_ic_mou_6 isd_ic_mou_7 isd_ic_mou_8 ic_others_6 ic_others_7 ic_others_8 total_rech_num_6 total_rech_num_7 total_rech_num_8 max_rech_amt_6 max_rech_amt_7 max_rech_amt_8 last_day_rch_amt_6 last_day_rch_amt_7 last_day_rch_amt_8 aon Average_rech_amt_6n7 delta_vol_2g delta_vol_3g delta_total_og_mou delta_total_ic_mou delta_vbc_3g delta_arpu delta_total_rech_amt
0.9794.98824.38723.61915.58935.69853.7932.7318.3618.6864.4841.2037.11207.93207.84196.91435.16437.49416.6618.3818.6616.964.044.844.45661.74657.38633.34630.53663.79567.34604.41645.88531.262.202.181.731140.931177.181057.290.000.000.0015.9319.5118.042.260.000.00154.88156.61148.14368.54364.54360.5439.2341.0437.19559.28558.99549.7934.7336.0132.1473.3875.2868.584.364.583.94115.91118.66108.380.280.000.0015.0118.3015.331.161.591.2323.0023.0021.00297.00300.00252.00250.00250.00225.002846.001118.0029.84170.07345.07147.3069.83257.31319.00
0.91848.97878.35783.49966.74984.02899.2939.6923.2823.3978.4350.0146.44225.96224.87213.83461.10461.81441.8420.2820.6818.844.685.515.11703.11692.67669.63686.26722.84622.13658.47695.77583.422.912.802.281195.611244.401125.280.000.000.0017.5421.2819.692.540.000.00165.79168.03159.84390.64387.11382.2043.5945.3941.21593.13589.65580.5438.2139.9135.9380.4181.9375.545.215.494.71125.98129.29118.240.300.000.0018.3421.8418.831.441.941.5124.0024.0022.00325.00330.00289.00250.00250.00250.002910.101156.0039.88227.15377.46161.8095.33278.90345.50
0.92909.05941.99848.961031.391038.09953.3548.7129.6829.6493.6060.9757.59247.94244.78232.33490.63488.04468.8322.5623.1420.935.456.265.86742.96735.69711.57750.31786.39680.10713.49760.98640.573.743.713.011268.831315.081201.290.130.050.0019.2623.3921.782.860.000.00180.18181.49173.59415.89412.03405.9748.6550.6646.19629.64624.36614.4542.7344.5839.9988.2790.4183.446.336.615.75138.32142.16130.550.330.000.0322.5826.9423.581.782.381.8625.0025.0023.00350.00350.00330.00250.00250.00250.002981.201202.0053.66289.30419.97177.35127.50303.51375.00
0.93990.481016.15920.961094.771103.931017.3560.4237.2837.90111.1575.0072.45275.51271.70254.64523.56519.80500.3825.2526.0023.516.347.156.79794.01786.73759.45812.08856.34753.44777.69828.18706.234.894.704.001358.411404.591283.200.330.250.0021.3325.8424.073.230.000.00195.66196.99188.25444.94439.30434.3554.5957.5051.92671.69667.07654.4148.0350.8245.6498.0099.6893.477.848.087.08153.30158.86146.870.360.000.1128.1833.3329.952.202.932.3827.0027.0025.00350.00398.00350.00252.00252.00250.003055.301257.0071.44361.57466.79195.08166.36331.83408.50
0.94000000000000011066.851097.121007.561168.091186.361096.6273.9649.4448.29137.4094.7390.27307.64305.87282.50563.70559.41537.1029.1329.7526.897.368.457.84855.97849.42817.15888.23933.58842.48856.37907.95786.166.236.115.261456.411503.091385.910.630.510.2323.9228.4926.833.640.000.00216.28214.47207.73478.56478.91472.1562.9665.7658.84717.00711.74704.8955.4958.0452.58110.61113.20105.549.669.898.77174.34179.84165.920.400.000.2135.8040.9736.572.813.732.9828.0028.0026.00400.00455.00398.00252.00252.00252.003107.001317.7097.46463.36524.65218.73217.42366.26447.20
0.95000000000000011153.971208.171115.661271.471286.281188.4694.5963.3462.80168.46119.34114.80348.62346.90324.14614.99608.01585.0633.5934.0931.318.699.959.33935.51920.12883.25986.241029.29936.49960.801004.26886.568.167.927.181558.501624.811518.821.101.010.5526.8132.1530.234.140.000.00243.94238.62232.50520.55518.65516.6772.6176.0567.56773.27781.18767.3164.9466.5561.56126.66130.41121.8812.2412.3110.98200.64205.16191.950.430.060.2546.4551.9846.483.634.833.9330.0030.0028.00500.00500.00455.00252.00274.00252.003179.001406.00129.68562.66604.55245.97284.40404.58499.00
0.96000000000000011282.781344.041256.341406.071407.781305.32120.0883.4382.12211.03153.97145.54411.69412.46380.74674.30670.99646.4839.8440.0537.6110.6111.8611.451025.571016.38975.801099.721146.441066.041101.071136.32998.2811.4310.879.591707.601766.851672.442.202.281.0931.4337.1134.334.780.000.00276.09270.15265.31578.33574.86573.4985.3089.2577.93847.56854.66848.8277.1581.3574.18151.66153.16144.7415.6715.8814.48237.74241.25224.120.460.130.2560.5967.4661.774.986.515.3132.0032.0031.00505.00550.00500.00330.00339.00300.003264.001508.50185.21705.70704.27282.57356.70458.40555.30
0.97000000000000011444.231497.251441.531578.821585.021481.57155.13117.54112.07270.52203.66188.86508.01500.20458.64758.99749.79734.0849.3849.0946.3613.0414.6814.141163.161143.621101.161243.361308.191235.721262.711308.351162.8416.4415.2613.641904.261950.831877.195.015.362.7337.8044.4440.555.560.000.00320.64322.92308.49655.65646.22642.58103.65109.7896.64959.39962.29941.1397.62101.6994.54184.59187.18176.9620.8121.7819.62290.90295.51279.290.530.200.3682.7591.0785.447.039.057.5835.0036.0034.00550.00550.00550.00398.00398.00379.003424.701633.85262.23895.25843.24334.26461.60529.37644.00
0.98000000000000011694.681772.621700.241837.931838.391739.01221.26166.28165.81363.12282.49266.53668.59660.28596.48885.83868.35853.8362.6963.0660.1117.1519.2318.671372.781338.791306.651458.711520.561463.191518.641558.411413.1424.5823.2321.172174.342312.912165.2612.9913.217.7548.0556.1451.156.840.000.00414.27407.34392.53775.61765.13748.24132.06143.85125.211136.291136.251114.04132.11138.47131.28245.19253.11234.7930.0231.4528.10367.71388.61362.840.580.300.50129.13135.43127.0110.8413.9911.5440.0040.0039.00655.00750.00619.00500.00500.00500.003632.001834.70392.111207.691051.14431.92621.75649.14779.30
0.99000000000000012166.372220.372188.502326.292410.102211.64349.35292.54288.49543.71448.13432.741076.241059.88956.501147.051112.661092.5990.8891.0686.6824.8628.2428.871806.941761.431689.071885.201919.191938.131955.612112.661905.8144.3943.8938.882744.492874.652800.8741.2540.4331.2471.3679.8774.119.310.000.00625.35648.79621.671026.441009.29976.09197.17205.25185.621484.991515.871459.55215.64231.15215.20393.73408.58372.6153.3956.5949.41577.89616.89563.890.680.510.61239.60240.13249.8920.7125.2621.5348.0048.0046.001000.001000.00951.00655.00655.00619.003651.002216.30654.311878.121465.10619.69929.64864.341036.40
1.07376.718157.7810752.568362.369667.1314007.342613.313813.294169.813775.112812.045337.046431.337400.6610752.564729.744557.144961.331466.031196.43928.49342.86569.71351.8310643.387674.7811039.917366.588133.668014.438314.769284.7413950.04628.56544.63516.918432.9910936.7313980.065900.665490.285681.541023.211265.791390.88100.61370.13394.936351.445709.594003.214693.864388.735738.461678.411983.011588.536496.116466.745748.815459.565800.934309.294630.233470.385645.861351.111136.081394.895459.636745.765957.1419.7621.336.233965.694747.914100.381344.141495.941209.86307.00138.00196.004010.004010.004449.004010.004010.004449.004321.0037762.508062.3015646.3912768.704862.628254.6212808.6214344.50
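# Looking at percentage change in quantiles from 0.90 to 1.
# numeric_only=True restricts this to numeric columns (the default changed to False in pandas 2.0)
data.quantile(np.arange(0.9, 1.01, 0.01), numeric_only=True).pct_change().mul(100).style.bar()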
onnet_mou_6 onnet_mou_7 onnet_mou_8 offnet_mou_6 offnet_mou_7 offnet_mou_8 roam_ic_mou_6 roam_ic_mou_7 roam_ic_mou_8 roam_og_mou_6 roam_og_mou_7 roam_og_mou_8 loc_og_t2t_mou_6 loc_og_t2t_mou_7 loc_og_t2t_mou_8 loc_og_t2m_mou_6 loc_og_t2m_mou_7 loc_og_t2m_mou_8 loc_og_t2f_mou_6 loc_og_t2f_mou_7 loc_og_t2f_mou_8 loc_og_t2c_mou_6 loc_og_t2c_mou_7 loc_og_t2c_mou_8 loc_og_mou_6 loc_og_mou_7 loc_og_mou_8 std_og_t2t_mou_6 std_og_t2t_mou_7 std_og_t2t_mou_8 std_og_t2m_mou_6 std_og_t2m_mou_7 std_og_t2m_mou_8 std_og_t2f_mou_6 std_og_t2f_mou_7 std_og_t2f_mou_8 std_og_mou_6 std_og_mou_7 std_og_mou_8 isd_og_mou_6 isd_og_mou_7 isd_og_mou_8 spl_og_mou_6 spl_og_mou_7 spl_og_mou_8 og_others_6 og_others_7 og_others_8 loc_ic_t2t_mou_6 loc_ic_t2t_mou_7 loc_ic_t2t_mou_8 loc_ic_t2m_mou_6 loc_ic_t2m_mou_7 loc_ic_t2m_mou_8 loc_ic_t2f_mou_6 loc_ic_t2f_mou_7 loc_ic_t2f_mou_8 loc_ic_mou_6 loc_ic_mou_7 loc_ic_mou_8 std_ic_t2t_mou_6 std_ic_t2t_mou_7 std_ic_t2t_mou_8 std_ic_t2m_mou_6 std_ic_t2m_mou_7 std_ic_t2m_mou_8 std_ic_t2f_mou_6 std_ic_t2f_mou_7 std_ic_t2f_mou_8 std_ic_mou_6 std_ic_mou_7 std_ic_mou_8 spl_ic_mou_6 spl_ic_mou_7 spl_ic_mou_8 isd_ic_mou_6 isd_ic_mou_7 isd_ic_mou_8 ic_others_6 ic_others_7 ic_others_8 total_rech_num_6 total_rech_num_7 total_rech_num_8 max_rech_amt_6 max_rech_amt_7 max_rech_amt_8 last_day_rch_amt_6 last_day_rch_amt_7 last_day_rch_amt_8 aon Average_rech_amt_6n7 delta_vol_2g delta_vol_3g delta_total_og_mou delta_total_ic_mou delta_vbc_3g delta_arpu delta_total_rech_amt
0.9nannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannannan
0.916.796.558.275.595.175.3321.2726.8025.2221.6421.3925.138.678.208.595.965.566.0410.3410.8311.0815.8413.8814.886.255.375.738.848.909.668.947.729.8232.2728.4431.794.795.716.43nannannan10.119.099.1612.39nannan7.057.297.906.006.196.0111.1110.6010.816.055.485.5910.0310.8411.799.588.8410.1519.5019.8919.548.698.969.107.14nannan22.1919.3522.8424.1422.0122.764.354.354.769.4310.0014.680.000.0011.112.253.4033.6833.569.399.8436.518.398.31
0.927.087.258.366.695.496.0122.7227.4926.7319.3421.9024.039.738.858.656.415.686.1111.2411.9111.0916.4513.5714.715.676.216.269.338.799.328.369.379.7928.5232.5032.026.125.686.76infinfnan9.819.9210.6012.60nannan8.688.018.606.466.446.2211.6011.6112.086.155.895.8411.8311.7011.309.7710.3510.4621.5020.3822.129.799.9510.4110.00naninf23.1223.3525.2023.6122.6823.184.174.174.557.696.0614.190.000.000.002.443.9834.5527.3611.269.6133.768.828.54
0.938.967.878.486.146.346.7124.0325.6227.8618.7523.0225.8011.1211.009.606.716.516.7311.9112.3212.3316.3314.2715.796.876.946.738.238.8910.789.008.8310.2530.7526.7732.767.066.816.82153.85400.00nan10.7610.4610.5012.94nannan8.598.548.446.996.626.9912.2113.4912.406.686.846.5012.4013.9814.1311.0210.2512.0123.8522.2423.0910.8311.7512.509.09nan266.6724.7923.7327.0223.6023.2427.968.008.008.700.0013.716.060.800.800.002.494.5833.1324.9811.1510.0030.479.338.93
0.94000000000000017.717.979.406.707.477.7922.4132.6127.4123.6226.3024.5911.6612.5810.947.677.627.3415.3814.4314.3916.0918.1015.527.807.977.609.389.0211.8210.129.6311.3227.4029.9231.637.217.018.0090.91104.00inf12.1210.2611.4912.69nannan10.548.8710.357.569.028.7015.3314.3613.336.756.707.7115.5314.2215.2112.8713.5712.9223.2122.4523.8413.7213.2112.9711.11nan90.9127.0422.9122.1127.7327.1725.213.703.704.0014.2914.3213.710.000.000.801.694.8336.4228.1512.4012.1230.6910.389.47
0.95000000000000018.1710.1210.738.858.428.3827.8928.1030.0322.6125.9727.1813.3213.4114.749.108.698.9315.3314.5816.4318.0717.7218.949.298.328.0911.0310.2511.1612.1910.6112.7730.9829.6236.507.018.109.5974.6098.04139.1312.0912.8512.6713.74nannan12.7911.2611.928.778.309.4315.3315.6514.827.859.768.8517.0414.6617.0814.5115.2115.4826.7124.4225.2315.0914.0815.697.50inf19.0529.7326.8927.1229.1829.4931.887.147.147.6925.009.8914.320.008.730.002.326.7033.0621.4315.2312.4530.8110.4611.58
0.960000000000000111.1611.2512.6110.599.459.8326.9531.7330.7725.2729.0226.7718.0918.9017.469.6410.3610.5018.5817.5120.1322.0919.2622.689.6310.4610.4811.5111.3813.8314.6013.1512.6040.0737.2733.579.578.7410.1199.64125.7498.5517.2315.4313.5715.46nannan13.1813.2214.1111.1010.8411.0017.4817.3715.359.619.4110.6218.8022.2320.5019.7417.4418.7628.0428.9831.8818.4917.5916.766.98116.670.0030.4529.7732.8737.1934.7835.016.676.6710.711.0010.009.8930.9523.7219.052.677.2942.8225.4216.4914.8825.4213.3011.28
0.970000000000000112.5911.4014.7412.2912.5913.5029.1940.8936.4828.1932.2729.7723.4021.2720.4612.5611.7413.5523.9522.5723.2522.9023.7923.5413.4212.5212.8513.0614.1115.9214.6815.1416.4843.8740.3642.2311.5210.4112.24128.14135.09150.0020.2719.7418.1116.32nannan16.1419.5316.2813.3712.4112.0521.5122.9924.0113.1912.5910.8726.5225.0127.4521.7122.2122.2632.7737.1835.5222.3622.4924.6115.2253.8544.0036.5835.0038.3241.1839.0342.869.3812.509.688.910.0010.0020.6117.4026.334.928.3141.5926.8619.7318.2929.4115.4815.97
0.980000000000000117.3418.3917.9516.4115.9917.3842.6341.4747.9534.2338.7141.1231.6132.0030.0616.7115.8116.3126.9728.4529.6631.5030.9532.0418.0217.0718.6617.3216.2318.4120.2719.1121.5349.4652.2055.1914.1818.5615.35159.20146.46183.8827.1126.3426.1523.02nannan29.2026.1427.2418.3018.4016.4427.4131.0429.5718.4418.0818.3735.3436.1738.8732.8335.2232.6844.2944.3943.1826.4031.5129.929.4350.0038.8956.0548.7148.6654.1754.5752.2714.2911.1114.7119.0936.3612.5525.6325.6331.936.0512.2949.5334.9024.6529.2234.7022.6321.01
0.990000000000000127.8325.2628.7226.5731.1027.1857.8975.9373.9949.7358.6362.3660.9760.5260.3629.4928.1427.9644.9644.4044.2044.9646.8654.6431.6331.5729.2729.2426.2232.4628.7735.5734.8680.5988.9583.6826.2224.2929.35217.65206.02303.1048.5042.2744.8836.08nannan50.9559.2758.3832.3431.9130.4549.3042.6948.2430.6933.4131.0163.2266.9363.9260.5861.4258.7077.8279.9475.8557.1658.7455.4117.2470.0022.0085.5577.3096.7591.0380.5486.5420.0020.0017.9552.6733.3353.6331.0031.0023.800.5220.8066.8755.5139.3843.4749.5233.1532.99
1.0240.51267.41391.32259.47301.11533.35648.041203.511345.42594.33527.511133.30497.57598.261024.15312.34309.57354.091513.241213.96971.171279.331917.741118.63489.03335.71553.61290.76323.81313.51325.17339.48631.981316.151141.041229.43207.27280.45399.1314204.6313481.4018086.751333.971484.831776.73980.90infinf915.66780.03543.95357.30334.83487.90751.25866.13755.80337.45326.60293.872431.762409.611902.471076.01749.391415.232430.881907.562723.15844.75993.52956.442805.884082.35921.311555.131877.271540.896390.925822.645519.41539.58187.50326.09301.00301.00367.82512.21512.21618.7418.351603.851132.18733.09771.52684.69787.931381.891284.07
# Columns with outliers: the maximum is more than 100% above the 99th percentile
pct_change_99_1 = data.quantile(np.arange(0.9, 1.01, 0.01), numeric_only=True).pct_change().mul(100).iloc[-1]
outlier_condition = pct_change_99_1 > 100
columns_with_outliers = pct_change_99_1[outlier_condition].index.values
print('Columns with outliers :\n', columns_with_outliers)
Columns with outliers :
 ['onnet_mou_6' 'onnet_mou_7' 'onnet_mou_8' 'offnet_mou_6' 'offnet_mou_7'
 'offnet_mou_8' 'roam_ic_mou_6' 'roam_ic_mou_7' 'roam_ic_mou_8'
 'roam_og_mou_6' 'roam_og_mou_7' 'roam_og_mou_8' 'loc_og_t2t_mou_6'
 'loc_og_t2t_mou_7' 'loc_og_t2t_mou_8' 'loc_og_t2m_mou_6'
 'loc_og_t2m_mou_7' 'loc_og_t2m_mou_8' 'loc_og_t2f_mou_6'
 'loc_og_t2f_mou_7' 'loc_og_t2f_mou_8' 'loc_og_t2c_mou_6'
 'loc_og_t2c_mou_7' 'loc_og_t2c_mou_8' 'loc_og_mou_6' 'loc_og_mou_7'
 'loc_og_mou_8' 'std_og_t2t_mou_6' 'std_og_t2t_mou_7' 'std_og_t2t_mou_8'
 'std_og_t2m_mou_6' 'std_og_t2m_mou_7' 'std_og_t2m_mou_8'
 'std_og_t2f_mou_6' 'std_og_t2f_mou_7' 'std_og_t2f_mou_8' 'std_og_mou_6'
 'std_og_mou_7' 'std_og_mou_8' 'isd_og_mou_6' 'isd_og_mou_7'
 'isd_og_mou_8' 'spl_og_mou_6' 'spl_og_mou_7' 'spl_og_mou_8' 'og_others_6'
 'og_others_7' 'og_others_8' 'loc_ic_t2t_mou_6' 'loc_ic_t2t_mou_7'
 'loc_ic_t2t_mou_8' 'loc_ic_t2m_mou_6' 'loc_ic_t2m_mou_7'
 'loc_ic_t2m_mou_8' 'loc_ic_t2f_mou_6' 'loc_ic_t2f_mou_7'
 'loc_ic_t2f_mou_8' 'loc_ic_mou_6' 'loc_ic_mou_7' 'loc_ic_mou_8'
 'std_ic_t2t_mou_6' 'std_ic_t2t_mou_7' 'std_ic_t2t_mou_8'
 'std_ic_t2m_mou_6' 'std_ic_t2m_mou_7' 'std_ic_t2m_mou_8'
 'std_ic_t2f_mou_6' 'std_ic_t2f_mou_7' 'std_ic_t2f_mou_8' 'std_ic_mou_6'
 'std_ic_mou_7' 'std_ic_mou_8' 'spl_ic_mou_6' 'spl_ic_mou_7'
 'spl_ic_mou_8' 'isd_ic_mou_6' 'isd_ic_mou_7' 'isd_ic_mou_8' 'ic_others_6'
 'ic_others_7' 'ic_others_8' 'total_rech_num_6' 'total_rech_num_7'
 'total_rech_num_8' 'max_rech_amt_6' 'max_rech_amt_7' 'max_rech_amt_8'
 'last_day_rch_amt_6' 'last_day_rch_amt_7' 'last_day_rch_amt_8'
 'Average_rech_amt_6n7' 'delta_vol_2g' 'delta_vol_3g' 'delta_total_og_mou'
 'delta_total_ic_mou' 'delta_vbc_3g' 'delta_arpu' 'delta_total_rech_amt']
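
To make the flagging rule concrete, here is a minimal sketch on toy data: a column is flagged when its maximum is more than 100% above its 99th percentile. The `steady` and `spiky` columns are made up for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Two hypothetical columns of 101 values each.
toy = pd.DataFrame({
    'steady': np.arange(1.0, 102.0),                     # 1..101, no extreme tail
    'spiky': np.append(np.arange(1.0, 101.0), 5000.0),   # 1..100 plus one extreme value
})

# Percentage jump from the 99th percentile to the maximum (quantile 1.0).
jump_pct = toy.quantile([0.99, 1.0]).pct_change().mul(100).iloc[-1]
flagged = jump_pct[jump_pct > 100].index.to_list()
print(flagged)  # ['spiky']
```

The rule reacts only to the gap between the 99th percentile and the maximum, so a column whose values grow smoothly all the way to the maximum is not flagged.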
# Capping outliers at the 99th percentile value of each column
# (DataFrame.append was removed in pandas 2.0, so the summary rows are collected in a list)
outlier_rows = []
for col in columns_with_outliers:
    outlier_threshold = data[col].quantile(0.99)
    condition = data[col] > outlier_threshold
    outlier_rows.append({'Column': col,
                         'Outlier Threshold': outlier_threshold,
                         'Outliers replaced': int(condition.sum())})
    data.loc[condition, col] = outlier_threshold
outlier_treatment = pd.DataFrame(outlier_rows, columns=['Column', 'Outlier Threshold', 'Outliers replaced'])
outlier_treatment
| | Column | Outlier Threshold | Outliers replaced |
| 0 | onnet_mou_6 | 2166.37 | 301 |
| 1 | onnet_mou_7 | 2220.37 | 301 |
| 2 | onnet_mou_8 | 2188.50 | 301 |
| 3 | offnet_mou_6 | 2326.29 | 301 |
| 4 | offnet_mou_7 | 2410.10 | 301 |
| 5 | offnet_mou_8 | 2211.64 | 301 |
| 6 | roam_ic_mou_6 | 349.35 | 301 |
| 7 | roam_ic_mou_7 | 292.54 | 301 |
| 8 | roam_ic_mou_8 | 288.49 | 301 |
| 9 | roam_og_mou_6 | 543.71 | 301 |
| 10 | roam_og_mou_7 | 448.13 | 301 |
| 11 | roam_og_mou_8 | 432.74 | 301 |
| 12 | loc_og_t2t_mou_6 | 1076.24 | 301 |
| 13 | loc_og_t2t_mou_7 | 1059.88 | 301 |
| 14 | loc_og_t2t_mou_8 | 956.50 | 301 |
| 15 | loc_og_t2m_mou_6 | 1147.05 | 301 |
| 16 | loc_og_t2m_mou_7 | 1112.66 | 301 |
| 17 | loc_og_t2m_mou_8 | 1092.59 | 301 |
| 18 | loc_og_t2f_mou_6 | 90.88 | 301 |
| 19 | loc_og_t2f_mou_7 | 91.06 | 301 |
| 20 | loc_og_t2f_mou_8 | 86.68 | 300 |
| 21 | loc_og_t2c_mou_6 | 24.86 | 301 |
| 22 | loc_og_t2c_mou_7 | 28.24 | 301 |
| 23 | loc_og_t2c_mou_8 | 28.87 | 301 |
| 24 | loc_og_mou_6 | 1806.94 | 301 |
| 25 | loc_og_mou_7 | 1761.43 | 301 |
| 26 | loc_og_mou_8 | 1689.07 | 301 |
| 27 | std_og_t2t_mou_6 | 1885.20 | 301 |
| 28 | std_og_t2t_mou_7 | 1919.19 | 301 |
| 29 | std_og_t2t_mou_8 | 1938.13 | 301 |
| 30 | std_og_t2m_mou_6 | 1955.61 | 301 |
| 31 | std_og_t2m_mou_7 | 2112.66 | 301 |
| 32 | std_og_t2m_mou_8 | 1905.81 | 301 |
| 33 | std_og_t2f_mou_6 | 44.39 | 301 |
| 34 | std_og_t2f_mou_7 | 43.89 | 301 |
| 35 | std_og_t2f_mou_8 | 38.88 | 301 |
| 36 | std_og_mou_6 | 2744.49 | 301 |
| 37 | std_og_mou_7 | 2874.65 | 301 |
| 38 | std_og_mou_8 | 2800.87 | 301 |
| 39 | isd_og_mou_6 | 41.25 | 301 |
| 40 | isd_og_mou_7 | 40.43 | 301 |
| 41 | isd_og_mou_8 | 31.24 | 300 |
| 42 | spl_og_mou_6 | 71.36 | 301 |
| 43 | spl_og_mou_7 | 79.87 | 301 |
| 44 | spl_og_mou_8 | 74.11 | 301 |
| 45 | og_others_6 | 9.31 | 301 |
| 46 | og_others_7 | 0.00 | 164 |
| 47 | og_others_8 | 0.00 | 180 |
| 48 | loc_ic_t2t_mou_6 | 625.35 | 301 |
| 49 | loc_ic_t2t_mou_7 | 648.79 | 301 |
| 50 | loc_ic_t2t_mou_8 | 621.67 | 301 |
| 51 | loc_ic_t2m_mou_6 | 1026.44 | 301 |
| 52 | loc_ic_t2m_mou_7 | 1009.29 | 301 |
| 53 | loc_ic_t2m_mou_8 | 976.09 | 301 |
| 54 | loc_ic_t2f_mou_6 | 197.17 | 301 |
| 55 | loc_ic_t2f_mou_7 | 205.25 | 301 |
| 56 | loc_ic_t2f_mou_8 | 185.62 | 301 |
| 57 | loc_ic_mou_6 | 1484.99 | 301 |
| 58 | loc_ic_mou_7 | 1515.87 | 301 |
| 59 | loc_ic_mou_8 | 1459.55 | 301 |
| 60 | std_ic_t2t_mou_6 | 215.64 | 301 |
| 61 | std_ic_t2t_mou_7 | 231.15 | 301 |
| 62 | std_ic_t2t_mou_8 | 215.20 | 301 |
| 63 | std_ic_t2m_mou_6 | 393.73 | 301 |
| 64 | std_ic_t2m_mou_7 | 408.58 | 301 |
| 65 | std_ic_t2m_mou_8 | 372.61 | 301 |
| 66 | std_ic_t2f_mou_6 | 53.39 | 301 |
| 67 | std_ic_t2f_mou_7 | 56.59 | 300 |
| 68 | std_ic_t2f_mou_8 | 49.41 | 301 |
| 69 | std_ic_mou_6 | 577.89 | 301 |
| 70 | std_ic_mou_7 | 616.89 | 301 |
| 71 | std_ic_mou_8 | 563.89 | 301 |
| 72 | spl_ic_mou_6 | 0.68 | 278 |
| 73 | spl_ic_mou_7 | 0.51 | 295 |
| 74 | spl_ic_mou_8 | 0.61 | 293 |
| 75 | isd_ic_mou_6 | 239.60 | 301 |
| 76 | isd_ic_mou_7 | 240.13 | 301 |
| 77 | isd_ic_mou_8 | 249.89 | 301 |
| 78 | ic_others_6 | 20.71 | 301 |
| 79 | ic_others_7 | 25.26 | 301 |
| 80 | ic_others_8 | 21.53 | 300 |
| 81 | total_rech_num_6 | 48.00 | 283 |
| 82 | total_rech_num_7 | 48.00 | 283 |
| 83 | total_rech_num_8 | 46.00 | 287 |
| 84 | max_rech_amt_6 | 1000.00 | 169 |
| 85 | max_rech_amt_7 | 1000.00 | 204 |
| 86 | max_rech_amt_8 | 951.00 | 289 |
| 87 | last_day_rch_amt_6 | 655.00 | 284 |
| 88 | last_day_rch_amt_7 | 655.00 | 300 |
| 89 | last_day_rch_amt_8 | 619.00 | 283 |
| 90 | Average_rech_amt_6n7 | 2216.30 | 301 |
| 91 | delta_vol_2g | 654.31 | 301 |
| 92 | delta_vol_3g | 1878.12 | 301 |
| 93 | delta_total_og_mou | 1465.10 | 301 |
| 94 | delta_total_ic_mou | 619.69 | 301 |
| 95 | delta_vbc_3g | 929.64 | 301 |
| 96 | delta_arpu | 864.34 | 301 |
| 97 | delta_total_rech_amt | 1036.40 | 301 |
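
As an aside, the same upper capping can be written with the pandas `Series.clip` method. This is only a sketch on a hypothetical single-column frame (`mou` is a made-up name), not the code used in this analysis.

```python
import pandas as pd

# Hypothetical column with one extreme value.
toy = pd.DataFrame({'mou': [1.0, 2.0, 3.0, 4.0, 1000.0]})

threshold = toy['mou'].quantile(0.99)              # 99th percentile of the column
n_replaced = int((toy['mou'] > threshold).sum())   # how many values exceed it
toy['mou_capped'] = toy['mou'].clip(upper=threshold)

print(n_replaced)
print(toy)
```

Values at or below the threshold are left unchanged; only values above it are pulled down to the threshold, which matches the behaviour of the loop above.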
categorical = data.dtypes == 'category'
categorical_vars = data.columns[categorical].to_list()
ind_categorical_vars = set(categorical_vars) - {'Churn'}  # independent categorical variables
ind_categorical_vars
{'monthly_2g_6',
 'monthly_2g_7',
 'monthly_2g_8',
 'monthly_3g_6',
 'monthly_3g_7',
 'monthly_3g_8',
 'sachet_2g_6',
 'sachet_2g_7',
 'sachet_2g_8',
 'sachet_3g_6',
 'sachet_3g_7',
 'sachet_3g_8'}

Grouping Categories with Low Contribution

# Finding and grouping categories that contribute at most 1% in each column into "Others"
for col in ind_categorical_vars:
    category_counts = 100 * data[col].value_counts(normalize=True)
    print('\n', tabulate(pd.DataFrame(category_counts), headers='keys', tablefmt='psql'), '\n')
    low_count_categories = category_counts[category_counts <= 1].index.to_list()
    print(f"Replaced {low_count_categories} in {col} with category : Others")
    # Assign the result back; an inplace replace on a selected column may not update `data`
    data[col] = data[col].replace(low_count_categories, 'Others')
1+----+---------------+
2| | sachet_3g_6 |
3|----+---------------|
4| 0 | 93.4091 |
5| 1 | 4.35507 |
6| 2 | 1.04295 |
7| 3 | 0.396521 |
8| 4 | 0.219919 |
9| 5 | 0.123288 |
10| 6 | 0.089967 |
11| 7 | 0.0866349 |
12| 8 | 0.0499817 |
13| 9 | 0.0499817 |
14| 10 | 0.0366532 |
15| 11 | 0.0266569 |
16| 15 | 0.0166606 |
17| 12 | 0.0133284 |
18| 19 | 0.0133284 |
19| 13 | 0.00999633 |
20| 14 | 0.00999633 |
21| 18 | 0.00999633 |
22| 23 | 0.00999633 |
23| 16 | 0.00666422 |
24| 22 | 0.00666422 |
25| 29 | 0.00666422 |
26| 28 | 0.00333211 |
27| 17 | 0.00333211 |
28| 21 | 0.00333211 |
29+----+---------------+
30
31Replaced [3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 12, 19, 13, 14, 18, 23, 16, 22, 29, 28, 17, 21] in sachet_3g_6 with category : Others
32
33 +----+---------------+
34| | sachet_2g_6 |
35|----+---------------|
36| 0 | 82.5631 |
37| 1 | 7.87378 |
38| 2 | 3.3621 |
39| 3 | 2.0126 |
40| 4 | 1.32951 |
41| 5 | 0.703076 |
42| 6 | 0.509813 |
43| 7 | 0.356536 |
44| 8 | 0.286562 |
45| 9 | 0.239912 |
46| 10 | 0.17327 |
47| 12 | 0.146613 |
48| 11 | 0.0999633 |
49| 13 | 0.0566459 |
50| 14 | 0.0533138 |
51| 15 | 0.0433175 |
52| 17 | 0.0366532 |
53| 18 | 0.029989 |
54| 19 | 0.029989 |
55| 16 | 0.0233248 |
56| 22 | 0.0133284 |
57| 20 | 0.00999633 |
58| 21 | 0.00999633 |
59| 24 | 0.00999633 |
60| 25 | 0.00999633 |
61| 39 | 0.00333211 |
62| 27 | 0.00333211 |
63| 30 | 0.00333211 |
64| 32 | 0.00333211 |
65| 34 | 0.00333211 |
66| 28 | 0 |
67| 42 | 0 |
68+----+---------------+
69
70Replaced [5, 6, 7, 8, 9, 10, 12, 11, 13, 14, 15, 17, 18, 19, 16, 22, 20, 21, 24, 25, 39, 27, 30, 32, 34, 28, 42] in sachet_2g_6 with category : Others
71
72 +----+----------------+
73| | monthly_2g_7 |
74|----+----------------|
75| 0 | 88.4876 |
76| 1 | 10.0397 |
77| 2 | 1.35284 |
78| 3 | 0.0966312 |
79| 4 | 0.0166606 |
80| 5 | 0.00666422 |
81+----+----------------+
82
83Replaced [3, 4, 5] in monthly_2g_7 with category : Others
84
85 +----+---------------+
86| | sachet_2g_7 |
87|----+---------------|
88| 0 | 81.8033 |
89| 1 | 7.24068 |
90| 2 | 3.34877 |
91| 3 | 1.96595 |
92| 4 | 1.50945 |
93| 5 | 1.20622 |
94| 6 | 0.843024 |
95| 7 | 0.543134 |
96| 8 | 0.403185 |
97| 10 | 0.239912 |
98| 9 | 0.219919 |
99| 11 | 0.159941 |
100| 12 | 0.0966312 |
101| 14 | 0.0799707 |
102| 13 | 0.0666422 |
103| 15 | 0.0499817 |
104| 16 | 0.0366532 |
105| 18 | 0.0333211 |
106| 17 | 0.029989 |
107| 20 | 0.0266569 |
108| 19 | 0.0233248 |
109| 21 | 0.00999633 |
110| 26 | 0.00999633 |
111| 27 | 0.00999633 |
112| 22 | 0.00666422 |
113| 23 | 0.00666422 |
114| 30 | 0.00666422 |
115| 42 | 0.00333211 |
116| 24 | 0.00333211 |
117| 25 | 0.00333211 |
118| 29 | 0.00333211 |
119| 32 | 0.00333211 |
120| 35 | 0.00333211 |
121| 48 | 0.00333211 |
122| 28 | 0 |
123+----+---------------+
124
125Replaced [6, 7, 8, 10, 9, 11, 12, 14, 13, 15, 16, 18, 17, 20, 19, 21, 26, 27, 22, 23, 30, 42, 24, 25, 29, 32, 35, 48, 28] in sachet_2g_7 with category : Others
126
127 +----+----------------+
128| | monthly_2g_6 |
129|----+----------------|
130| 0 | 88.9074 |
131| 1 | 9.83306 |
132| 2 | 1.14958 |
133| 3 | 0.0866349 |
134| 4 | 0.0233248 |
135+----+----------------+
136
137Replaced [3, 4] in monthly_2g_6 with category : Others
138
139 +----+---------------+
140| | sachet_3g_7 |
141|----+---------------|
142| 0 | 93.4757 |
143| 1 | 4.10849 |
144| 2 | 1.03962 |
145| 3 | 0.383193 |
146| 4 | 0.239912 |
147| 5 | 0.219919 |
148| 6 | 0.139949 |
149| 7 | 0.059978 |
150| 9 | 0.0533138 |
151| 8 | 0.0466496 |
152| 11 | 0.0433175 |
153| 10 | 0.0333211 |
154| 12 | 0.0333211 |
155| 15 | 0.0166606 |
156| 14 | 0.0166606 |
157| 13 | 0.0133284 |
158| 18 | 0.0133284 |
159| 19 | 0.00999633 |
160| 20 | 0.00999633 |
161| 22 | 0.00999633 |
162| 17 | 0.00666422 |
163| 21 | 0.00666422 |
164| 24 | 0.00666422 |
165| 33 | 0.00333211 |
166| 16 | 0.00333211 |
167| 31 | 0.00333211 |
168| 35 | 0.00333211 |
169+----+---------------+
170
171Replaced [3, 4, 5, 6, 7, 9, 8, 11, 10, 12, 15, 14, 13, 18, 19, 20, 22, 17, 21, 24, 33, 16, 31, 35] in sachet_3g_7 with category : Others
172
173 +----+---------------+
174| | sachet_3g_8 |
175|----+---------------|
176| 0 | 94.2388 |
177| 1 | 3.52537 |
178| 2 | 0.839692 |
179| 3 | 0.429842 |
180| 4 | 0.243244 |
181| 5 | 0.219919 |
182| 6 | 0.0866349 |
183| 7 | 0.0766386 |
184| 8 | 0.0733065 |
185| 9 | 0.0399853 |
186| 12 | 0.0366532 |
187| 13 | 0.0333211 |
188| 10 | 0.0333211 |
189| 11 | 0.0199927 |
190| 14 | 0.0199927 |
191| 15 | 0.0166606 |
192| 16 | 0.00999633 |
193| 17 | 0.00666422 |
194| 18 | 0.00666422 |
195| 20 | 0.00666422 |
196| 21 | 0.00666422 |
197| 23 | 0.00666422 |
198| 38 | 0.00333211 |
199| 19 | 0.00333211 |
200| 25 | 0.00333211 |
201| 27 | 0.00333211 |
202| 29 | 0.00333211 |
203| 30 | 0.00333211 |
204| 41 | 0.00333211 |
205+----+---------------+
206
207Replaced [2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15, 16, 17, 18, 20, 21, 23, 38, 19, 25, 27, 29, 30, 41] in sachet_3g_8 with category : Others
208
209 +----+----------------+
210| | monthly_3g_6 |
211|----+----------------|
212| 0 | 88.0744 |
213| 1 | 8.4669 |
214| 2 | 2.32248 |
215| 3 | 0.689747 |
216| 4 | 0.246576 |
217| 5 | 0.106628 |
218| 6 | 0.0366532 |
219| 7 | 0.029989 |
220| 8 | 0.00999633 |
221| 11 | 0.00666422 |
222| 9 | 0.00666422 |
223| 14 | 0.00333211 |
224+----+----------------+
225
226Replaced [3, 4, 5, 6, 7, 8, 11, 9, 14] in monthly_3g_6 with category : Others
227
228 +----+----------------+
229| | monthly_2g_8 |
230|----+----------------|
231| 0 | 89.7604 |
232| 1 | 9.19996 |
233| 2 | 0.942988 |
234| 3 | 0.0733065 |
235| 4 | 0.0166606 |
236| 5 | 0.00666422 |
237+----+----------------+
238
239Replaced [2, 3, 4, 5] in monthly_2g_8 with category : Others
240
241 +----+---------------+
242| | sachet_2g_8 |
243|----+---------------|
244| 0 | 79.7274 |
245| 1 | 8.87008 |
246| 2 | 3.25881 |
247| 3 | 2.19253 |
248| 4 | 1.81267 |
249| 5 | 1.44947 |
250| 6 | 0.88301 |
251| 7 | 0.459831 |
252| 8 | 0.313218 |
253| 9 | 0.249908 |
254| 10 | 0.169938 |
255| 11 | 0.123288 |
256| 12 | 0.113292 |
257| 14 | 0.0766386 |
258| 15 | 0.0566459 |
259| 13 | 0.0499817 |
260| 16 | 0.0433175 |
261| 18 | 0.0266569 |
262| 17 | 0.0233248 |
263| 19 | 0.0233248 |
264| 20 | 0.0133284 |
265| 34 | 0.00666422 |
266| 29 | 0.00666422 |
267| 27 | 0.00666422 |
268| 24 | 0.00666422 |
269| 22 | 0.00666422 |
270| 21 | 0.00666422 |
271| 23 | 0.00333211 |
272| 25 | 0.00333211 |
273| 26 | 0.00333211 |
274| 31 | 0.00333211 |
275| 32 | 0.00333211 |
276| 33 | 0.00333211 |
277| 44 | 0.00333211 |
278+----+---------------+
279
280Replaced [6, 7, 8, 9, 10, 11, 12, 14, 15, 13, 16, 18, 17, 19, 20, 34, 29, 27, 24, 22, 21, 23, 25, 26, 31, 32, 33, 44] in sachet_2g_8 with category : Others
281
282 +----+----------------+
283| | monthly_3g_8 |
284|----+----------------|
285| 0 | 88.3876 |
286| 1 | 8.00706 |
287| 2 | 2.45243 |
288| 3 | 0.656426 |
289| 4 | 0.289894 |
290| 5 | 0.0999633 |
291| 6 | 0.0466496 |
292| 7 | 0.029989 |
293| 9 | 0.00999633 |
294| 8 | 0.00999633 |
295| 10 | 0.00666422 |
296| 16 | 0.00333211 |
297+----+----------------+
298
299Replaced [3, 4, 5, 6, 7, 9, 8, 10, 16] in monthly_3g_8 with category : Others
300
+----+----------------+
|    | monthly_3g_7   |
|----+----------------|
|  0 | 87.8378        |
|  1 | 8.21699        |
|  2 | 2.739          |
|  3 | 0.689747       |
|  4 | 0.226584       |
|  5 | 0.129952       |
|  6 | 0.0766386      |
|  7 | 0.0333211      |
|  8 | 0.0166606      |
|  9 | 0.0133284      |
| 11 | 0.00666422     |
| 16 | 0.00333211     |
| 14 | 0.00333211     |
| 12 | 0.00333211     |
| 10 | 0.00333211     |
+----+----------------+

Replaced [3, 4, 5, 6, 7, 8, 9, 11, 16, 14, 12, 10] in monthly_3g_7 with category : Others
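
The grouping step that produces the messages above is not shown in this part of the post, but the outputs are consistent with a simple rule: any category that accounts for less than 1% of rows is folded into `Others`. The following is a minimal sketch under that assumption. The function name `group_rare_categories`, the `threshold_pct` parameter and the toy data are all hypothetical, not the authors' code.

```python
import pandas as pd

def group_rare_categories(series, threshold_pct=1.0, other_label="Others"):
    """Replace categories whose share of rows is below threshold_pct with other_label."""
    # Percentage contribution of each category, as in the tables above
    pct = series.value_counts(normalize=True) * 100
    rare = pct[pct < threshold_pct].index.to_list()
    # Cast to object so the string label can sit next to the kept values
    grouped = series.astype(object).where(~series.isin(rare), other_label)
    return grouped, rare

# Toy column (hypothetical data): 0 dominates, 2 and 3 are rare
toy = pd.Series([0] * 180 + [1] * 18 + [2] * 1 + [3] * 1, name="monthly_2g_8")
grouped, replaced = group_rare_categories(toy)
print(f"Replaced {replaced} in {toy.name} with category : Others")
print(grouped.value_counts())
```

Grouping the long tail this way keeps the number of dummy variables small, since each rare plan count would otherwise get its own nearly empty column.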

Creating Dummy Variables

# One indicator column per category of each categorical variable ('Others' included for now)
dummy_vars = pd.get_dummies(data[ind_categorical_vars], drop_first=False, prefix=ind_categorical_vars, prefix_sep='_')
dummy_vars.head()
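| mobile_number | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | sachet_3g_6_Others | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | sachet_2g_6_Others | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_7_Others | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_2g_7_Others | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | monthly_2g_6_Others | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | sachet_3g_7_Others | sachet_3g_8_0 | sachet_3g_8_1 | sachet_3g_8_Others | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | monthly_3g_6_Others | monthly_2g_8_0 | monthly_2g_8_1 | monthly_2g_8_Others | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | sachet_2g_8_Others | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_8_Others | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | monthly_3g_7_Others |
| 7000701601 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7001524846 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7002191713 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7000875565 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 7000187447 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |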

# Use the 'Others' category of each variable as the reference level
reference_cols = dummy_vars.filter(regex='.*Others$').columns.to_list()
dummy_vars.drop(columns=reference_cols, inplace=True)
reference_cols
['sachet_3g_6_Others',
 'sachet_2g_6_Others',
 'monthly_2g_7_Others',
 'sachet_2g_7_Others',
 'monthly_2g_6_Others',
 'sachet_3g_7_Others',
 'sachet_3g_8_Others',
 'monthly_3g_6_Others',
 'monthly_2g_8_Others',
 'sachet_2g_8_Others',
 'monthly_3g_8_Others',
 'monthly_3g_7_Others']
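
To see what dropping the `Others` columns does, here is a small self-contained sketch on made-up data. The toy frame and its values are hypothetical; the `pd.get_dummies` and `filter(regex=...)` calls mirror the ones used above. `dtype=int` is passed only so the output prints as 0/1 on recent pandas versions, where the default is boolean.

```python
import pandas as pd

# Hypothetical toy frame with two already-grouped categorical columns
toy = pd.DataFrame(
    {
        "monthly_2g_8": ["0", "1", "Others", "0"],
        "sachet_3g_8": ["0", "0", "1", "Others"],
    },
    index=pd.Index([1, 2, 3, 4], name="mobile_number"),
)
cat_cols = toy.columns.to_list()

# One indicator column per (variable, category) pair, including 'Others'
dummies = pd.get_dummies(
    toy[cat_cols], drop_first=False, prefix=cat_cols, prefix_sep="_", dtype=int
)

# Drop the '<variable>_Others' columns so 'Others' becomes the reference level
reference_cols = dummies.filter(regex=r"_Others$").columns.to_list()
dummies = dummies.drop(columns=reference_cols)

print(reference_cols)
print(dummies)
```

A row whose remaining indicators for a variable are all zero belongs to `Others` for that variable. Choosing the reference explicitly, rather than relying on `drop_first=True`, avoids making the dominant `0` category the baseline.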
# Drop the original categorical columns, then append their dummy variables to 'data'
data.drop(columns=ind_categorical_vars, inplace=True)
data = pd.concat([data, dummy_vars], axis=1)
data.head()
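| mobile_number | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | Churn | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | sachet_3g_8_0 | sachet_3g_8_1 | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | monthly_2g_8_0 | monthly_2g_8_1 | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 |
| 7000701601 | 57.84 | 54.68 | 52.29 | 453.43 | 567.16 | 325.91 | 16.23 | 33.49 | 31.64 | 23.74 | 12.59 | 38.06 | 51.39 | 31.38 | 40.28 | 308.63 | 447.38 | 162.28 | 62.13 | 55.14 | 53.23 | 0.0 | 0.0 | 0.00 | 422.16 | 533.91 | 255.79 | 4.30 | 23.29 | 12.01 | 49.89 | 31.76 | 49.14 | 6.66 | 20.08 | 16.68 | 60.86 | 75.14 | 77.84 | 0.0 | 0.18 | 10.01 | 4.50 | 0.00 | 6.50 | 0.00 | 0.0 | 0.0 | 58.14 | 32.26 | 27.31 | 217.56 | 221.49 | 121.19 | 152.16 | 101.46 | 39.53 | 427.88 | 355.23 | 188.04 | 36.89 | 11.83 | 30.39 | 91.44 | 126.99 | 141.33 | 52.19 | 34.24 | 22.21 | 180.54 | 173.08 | 193.94 | 0.21 | 0.0 | 0.0 | 2.06 | 14.53 | 31.59 | 15.74 | 15.19 | 15.14 | 5.0 | 5.0 | 7.0 | 1000.0 | 790.0 | 951.0 | 0.0 | 0.0 | 619.0 | 802 | 1185.0 | 1 | 0.00 | 0.00 | -198.22 | -163.51 | 38.68 | 864.34 | 1036.4 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7001524846 | 413.69 | 351.03 | 35.08 | 94.66 | 80.63 | 136.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 297.13 | 217.59 | 12.49 | 80.96 | 70.58 | 50.54 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 7.15 | 378.09 | 288.18 | 63.04 | 116.56 | 133.43 | 22.58 | 13.69 | 10.04 | 75.69 | 0.00 | 0.00 | 0.00 | 130.26 | 143.48 | 98.28 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 10.23 | 0.00 | 0.0 | 0.0 | 23.84 | 9.84 | 0.31 | 57.58 | 13.98 | 15.48 | 0.00 | 0.00 | 0.00 | 81.43 | 23.83 | 15.79 | 0.00 | 0.58 | 0.10 | 22.43 | 4.08 | 0.65 | 0.00 | 0.00 | 0.00 | 22.43 | 4.66 | 0.75 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 21.0 | 14.0 | 90.0 | 154.0 | 30.0 | 50.0 | 0.0 | 10.0 | 315 | 519.0 | 0 | -177.97 | -363.54 | -298.45 | -49.63 | -495.38 | -298.11 | -399.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7002191713 | 501.76 | 108.39 | 534.24 | 413.31 | 119.28 | 482.46 | 23.53 | 144.24 | 72.11 | 7.98 | 35.26 | 1.44 | 49.63 | 6.19 | 36.01 | 151.13 | 47.28 | 294.46 | 4.54 | 0.00 | 23.51 | 0.0 | 0.0 | 0.49 | 205.31 | 53.48 | 353.99 | 446.41 | 85.98 | 498.23 | 255.36 | 52.94 | 156.94 | 0.00 | 0.00 | 0.00 | 701.78 | 138.93 | 655.18 | 0.0 | 0.00 | 1.29 | 0.00 | 0.00 | 4.78 | 0.00 | 0.0 | 0.0 | 67.88 | 7.58 | 52.58 | 142.88 | 18.53 | 195.18 | 4.81 | 0.00 | 7.49 | 215.58 | 26.11 | 255.26 | 115.68 | 38.29 | 154.58 | 308.13 | 29.79 | 317.91 | 0.00 | 0.00 | 1.91 | 423.81 | 68.09 | 474.41 | 0.45 | 0.0 | 0.0 | 239.60 | 62.11 | 249.89 | 20.71 | 16.24 | 21.44 | 6.0 | 4.0 | 11.0 | 110.0 | 110.0 | 130.0 | 110.0 | 50.0 | 0.0 | 2607 | 380.0 | 0 | 0.02 | 0.00 | 465.51 | 573.93 | 0.00 | 244.00 | 337.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7000875565 | 50.51 | 74.01 | 70.61 | 296.29 | 229.74 | 162.76 | 0.00 | 2.83 | 0.00 | 0.00 | 17.74 | 0.00 | 42.61 | 65.16 | 67.38 | 273.29 | 145.99 | 128.28 | 0.00 | 4.48 | 10.26 | 0.0 | 0.0 | 0.00 | 315.91 | 215.64 | 205.93 | 7.89 | 2.58 | 3.23 | 22.99 | 64.51 | 18.29 | 0.00 | 0.00 | 0.00 | 30.89 | 67.09 | 21.53 | 0.0 | 0.00 | 0.00 | 0.00 | 3.26 | 5.91 | 0.00 | 0.0 | 0.0 | 41.33 | 71.44 | 28.89 | 226.81 | 149.69 | 150.16 | 8.71 | 8.68 | 32.71 | 276.86 | 229.83 | 211.78 | 68.79 | 78.64 | 6.33 | 18.68 | 73.08 | 73.93 | 0.51 | 0.00 | 2.18 | 87.99 | 151.73 | 82.44 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.00 | 10.0 | 6.0 | 2.0 | 110.0 | 110.0 | 130.0 | 100.0 | 100.0 | 130.0 | 511 | 459.0 | 0 | 0.00 | 0.00 | -83.03 | -78.75 | -12.17 | -177.53 | -299.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 7000187447 | 1185.91 | 9.28 | 7.79 | 61.64 | 0.00 | 5.54 | 0.00 | 4.76 | 4.81 | 0.00 | 8.46 | 13.34 | 38.99 | 0.00 | 0.00 | 58.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 97.54 | 0.00 | 0.00 | 1146.91 | 0.81 | 0.00 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1148.46 | 0.81 | 0.00 | 0.0 | 0.00 | 0.00 | 2.58 | 0.00 | 0.00 | 0.93 | 0.0 | 0.0 | 34.54 | 0.00 | 0.00 | 47.41 | 2.31 | 0.00 | 0.00 | 0.00 | 0.00 | 81.96 | 2.31 | 0.00 | 8.63 | 0.00 | 0.00 | 1.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.91 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 2.0 | 4.0 | 110.0 | 0.0 | 30.0 | 30.0 | 0.0 | 0.0 | 667 | 408.0 | 0 | 0.00 | 0.00 | -625.17 | -47.09 | 0.00 | -329.00 | -378.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |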
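# Store the dummy columns as categoricals
dummy_cols = dummy_vars.columns.to_list()
data[dummy_cols] = data[dummy_cols].astype('category')

data.shape
(30011, 142)

# Move mobile_number from the index back into a column and save the cleaned data
data.reset_index('mobile_number').to_csv('cleaned_churn_data.csv')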
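
Two details of this save are worth keeping in mind when the file is read back. Because `to_csv` is called with its default `index=True`, the fresh integer index is written as an unnamed first column, and a CSV file does not preserve the `category` dtype. The sketch below shows one possible way to reload the file. The miniature frame and the temporary path are hypothetical, and how Part 2 actually loads the data is not shown here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature of the cleaned frame
data = pd.DataFrame(
    {"aon": [802, 315], "monthly_2g_8_0": [1, 0]},
    index=pd.Index([7000701601, 7001524846], name="mobile_number"),
)
data["monthly_2g_8_0"] = data["monthly_2g_8_0"].astype("category")

path = os.path.join(tempfile.mkdtemp(), "cleaned_churn_data.csv")
data.reset_index("mobile_number").to_csv(path)  # same call as in the post

# The unnamed first column is the integer index written by to_csv
reloaded = pd.read_csv(path, index_col=0).set_index("mobile_number")

# CSV stores plain text, so the category dtype has to be restored by hand
reloaded["monthly_2g_8_0"] = reloaded["monthly_2g_8_0"].astype("category")
print(reloaded.dtypes)
```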

Continue to Part 2 for modelling.
