
Sonic Log Prediction

Jayanth Boddu
July 14th, 2021 · 1 min read

Photo by Pawel Czerwinski on Unsplash


Summary

Sonic logs are required for well-seismic ties, an essential step in the exploration workflow. However, due to various resource constraints, sonic logs are often not recorded in every well in a field, whereas basic logs like Gamma Ray, Neutron Porosity, Bulk Density and Resistivity are almost always recorded. Geoscientists therefore approximate missing sonic logs from the sonic logs of nearby wells using empirical techniques. These techniques might not generalise beyond the wells they were calibrated on, and applying a different relation to each well introduces consistency problems across the field.
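As an illustration of such an empirical technique (not part of this analysis), Gardner's relation between bulk density and compressional velocity, rho = a * Vp^0.25, can be inverted to estimate compressional slowness from a recorded density log. A minimal sketch, assuming the textbook coefficient a = 0.23 with rho in g/cc and Vp in ft/s:

# Illustrative only: estimate compressional slowness (DT, in µs/ft) from bulk density
# via Gardner's empirical relation rho = a * Vp**0.25. The coefficient a = 0.23 is a
# textbook value and would normally be recalibrated per field.
def dt_from_density_gardner(rhob, a=0.23):
    vp = (rhob / a) ** 4      # compressional velocity in ft/s
    return 1e6 / vp           # slowness in microseconds per foot

print(dt_from_density_gardner(2.30))   # ~100 µs/ft for a 2.30 g/cc rock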

This analysis uses a data-driven approach to predict missing sonic logs at field scale. This could help provide sonic logs for every well in a field, in turn improving well-seismic ties, seismic resolution, depth migration and related steps. Delta-T Compressional is predicted from GR, RHOB and NPHI using machine-learning algorithms. The dataset consists of 8 wells from the Poseidon field, offshore North West Australia, obtained from Data Underground. LAS files containing spliced well logs sampled at a 0.5 m interval are used for the analysis (spliced logs provided by Occam Technologies).

Three models have been trained, one each using Random Forests, Gradient Boosting and Artificial Neural Networks; a One-Class SVM is used for outlier removal.

The models are trained on 6 of the 8 wells and validated on the remaining 2.

The analysis shows promising results, with mean squared errors of 0.165 and 0.074 on the two validation wells (computed on the normalised logs). Further, the predicted sonic logs show the strong negative correlation with bulk density that physics dictates.

Importing Data

import numpy as np
import pandas as pd
import glob
import lasio as ls
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
log_list = glob.glob('Poseidon_logs/*.LAS')
log_list
['Poseidon_logs/Poseidon1Decim.LAS',
 'Poseidon_logs/Kronos1Decim.LAS',
 'Poseidon_logs/Boreas1Decim.LAS',
 'Poseidon_logs/Proteus1Decim.LAS',
 'Poseidon_logs/Pharos1Decim.LAS',
 'Poseidon_logs/Torosa1LDecim.LAS',
 'Poseidon_logs/PoseidonNorth1Decim.LAS',
 'Poseidon_logs/Poseidon2Decim.LAS']
# Predictors and target
columns_of_interest = ['RHOB', 'TNPH', 'ECGR', 'DTCO']

# Map equivalent curve mnemonics from different wells onto a common set
mnemonic_replacements = {
    'NPHI'  : 'TNPH',
    'HROM'  : 'RHOB',
    'GR'    : 'ECGR',
    'GRD'   : 'ECGR',
    'TNP'   : 'TNPH',
    'RHOZ'  : 'RHOB',
    'HTNP'  : 'TNPH',
    'BATC'  : 'DTCO',
    'GRARC' : 'ECGR',
    'DTC'   : 'DTCO'
}
# Reading LAS files

las_data = {}    # raw lasio objects, keyed by well name
las_df = {}      # one dataframe per well
well_list = []

for las_file in log_list:
    well_name = las_file.split('Decim.LAS')[0].split('/')[1]
    well_list.append(well_name)

    las = ls.read(las_file)

    # saving las in las_data
    las_data[well_name] = las

    df = las.df()

    # mnemonic replacements
    for column in df.columns.values:
        if column in mnemonic_replacements.keys():
            df = df.rename(columns={column: mnemonic_replacements[column]})

    las_df[well_name] = df
    print('\nWell Name : ', well_name)
    print(df.head())
    print("No of datapoints :", df.shape[0])

print(well_list)
1Well Name : Poseidon1
2 ECGR ATRX ATRT TNPH CAL1 HDAR RHOB DTSM DTCO
3DEPT
4579.5 NaN 0.6000 0.6700 NaN NaN NaN NaN NaN NaN
5580.0 NaN 0.6700 0.7000 NaN NaN NaN NaN NaN NaN
6580.5 NaN 0.6200 0.6700 NaN NaN NaN NaN NaN NaN
7581.0 NaN 0.5500 0.6400 NaN NaN NaN NaN NaN NaN
8581.5 NaN 0.5598 0.6295 NaN NaN NaN NaN NaN NaN
9No of datapoints : 9073
10
11Well Name : Kronos1
12 ECGR RS RD TNPH RHOB DCAV DTCO DTSM
13DEPT
14531.0 0.5559 NaN NaN NaN NaN NaN NaN NaN
15531.5 0.9649 NaN NaN NaN NaN NaN NaN NaN
16532.0 0.7309 NaN NaN NaN NaN NaN NaN NaN
17532.5 4.3208 NaN NaN NaN NaN NaN NaN NaN
18533.0 -0.0001 NaN NaN NaN NaN NaN NaN NaN
19No of datapoints : 9584
20
21Well Name : Boreas1
22 ECGR RS RD RHOB TNPH DTCO DTSM HDAR
23DEPT
24200.0 3.8287 NaN NaN NaN NaN NaN NaN NaN
25200.5 3.8287 NaN NaN NaN NaN NaN NaN NaN
26201.0 3.1437 NaN NaN NaN NaN NaN NaN NaN
27201.5 1.4981 NaN NaN NaN NaN NaN NaN NaN
28202.0 3.9696 NaN NaN NaN NaN NaN NaN NaN
29No of datapoints : 10012
30
31Well Name : Proteus1
32 ECGR RS RD DTCO DTSM HDAR RHOB TNPH
33DEPT
34489.5 16.2084 NaN NaN NaN NaN NaN NaN NaN
35490.0 20.6540 NaN NaN NaN NaN NaN NaN NaN
36490.5 19.2559 NaN NaN NaN NaN NaN NaN NaN
37491.0 15.8395 NaN NaN NaN NaN NaN NaN NaN
38491.5 14.7239 NaN NaN NaN NaN NaN NaN NaN
39No of datapoints : 9521
40
41Well Name : Pharos1
42 RHOB TNPH ECGR P16H P34H DTCO DTS
43DEPT
44482.0 NaN NaN 48.9556 NaN NaN NaN NaN
45482.5 NaN NaN 54.2686 NaN NaN NaN NaN
46483.0 NaN NaN 56.0523 NaN NaN NaN NaN
47483.5 NaN NaN 53.8891 NaN NaN NaN NaN
48484.0 NaN NaN 67.2855 NaN NaN NaN NaN
49No of datapoints : 9488
50
51Well Name : Torosa1L
52 ECGR RS RD RHOB TNPH HDAR DTCO DTS
53DEPT
54555.0 12.3003 0.6826 0.7422 NaN NaN NaN NaN NaN
55555.5 24.5442 0.4900 0.7223 NaN NaN NaN NaN NaN
56556.0 25.4925 0.4401 0.4498 NaN NaN NaN NaN NaN
57556.5 25.2527 0.4874 0.7347 NaN NaN NaN NaN NaN
58557.0 28.6476 0.4607 0.6694 NaN NaN NaN NaN NaN
59No of datapoints : 8257
60
61Well Name : PoseidonNorth1
62 ECGR RS RD RHOB TNPH DTCO DTSM HDAR
63DEPT
64508.0 24.4403 NaN NaN NaN NaN NaN NaN NaN
65508.5 29.6471 NaN NaN NaN NaN NaN NaN NaN
66509.0 36.3076 NaN NaN NaN NaN NaN NaN NaN
67509.5 56.3806 NaN NaN NaN NaN NaN NaN NaN
68510.0 54.7321 NaN NaN NaN NaN NaN NaN NaN
69No of datapoints : 9544
70
71Well Name : Poseidon2
72 ECGR RS RD DTCO DTSM DCAV RHOB TNPH
73DEPT
74490.0 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
75490.5 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
76491.0 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
77491.5 16.7792 0.6959 0.3141 NaN NaN NaN NaN NaN
78492.0 15.9579 0.6959 0.3141 NaN NaN NaN NaN NaN
79No of datapoints : 9723
80['Poseidon1', 'Kronos1', 'Boreas1', 'Proteus1', 'Pharos1', 'Torosa1L', 'PoseidonNorth1', 'Poseidon2']

The well names are: ‘Boreas1’, ‘Poseidon1’, ‘Kronos1’, ‘Proteus1’, ‘Poseidon2’, ‘Torosa1L’, ‘PoseidonNorth1’, ‘Pharos1’.
The following curves are extracted from each LAS file; a quick check that every well carries them is sketched below the list.

  • ECGR - EDTC Corrected Gamma Ray
  • RHOB - Bulk Density
  • TNPH - Thermal Neutron Porosity (Ratio Method) in selected lithology
  • DTCO - Delta-T Compressional
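
A hypothetical sanity check (not in the original notebook) can confirm that, after the mnemonic replacements above, every well's dataframe exposes these four standardised curves:

# Sanity check: every well should carry the four standardised mnemonics.
for well_name, df in las_df.items():
    missing = set(columns_of_interest) - set(df.columns)
    assert not missing, f"{well_name} is missing curves: {missing}"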

Visualising Logs

# Plotting Logs
def plot_logs(well):

    fig, ax = plt.subplots(nrows=1, ncols=len(columns_of_interest), figsize=(20, 10))
    colors = ['black', 'red', 'blue', 'green', 'purple', 'black', 'orange']

    for i, log in enumerate(columns_of_interest):
        ax[i].plot(well[log], well['DEPT'], color=colors[i])
        ax[i].set_title(log)
        ax[i].grid(True)

    ax[2].set_xlim(0, 300)
    plt.tight_layout(pad=1.1)
    plt.show()

for well in well_list:
    print('Well Name :', well)
    well1 = las_df[well]
    plot_logs(well1[columns_of_interest].reset_index())
Well Name : Poseidon1
[log plots: RHOB, TNPH, ECGR, DTCO vs depth]

Well Name : Kronos1
[log plots]

Well Name : Boreas1
[log plots]

Well Name : Proteus1
[log plots]

Well Name : Pharos1
[log plots]

Well Name : Torosa1L
[log plots]

Well Name : PoseidonNorth1
[log plots]

Well Name : Poseidon2
[log plots]

Data Preparation

# restricting depth ranges per well to reduce the number of missing values
depth_ranges = {
    'Pharos1'        : [4000, 5000],
    'PoseidonNorth1' : [3800, 4900],
    'Torosa1L'       : [3600, 4600],
    'Poseidon2'      : [4100, 5000],
    'Proteus1'       : [4950, 5200],
    'Kronos1'        : [4750, 5000],
    'Poseidon1'      : [4600, 4950],
    'Boreas1'        : [4000, 5100]
}

for well in well_list:
    df = las_df[well]
    df = df[columns_of_interest].reset_index()
    print("\nWell Name : ", well)
    print("Well Start : ", df['DEPT'].min(), '\t Well Stop :', df['DEPT'].max())
    print("Missing Value % ", 100 * df.isnull().sum() / df.shape[0])

    depth_start = depth_ranges[well][0]
    depth_stop = depth_ranges[well][-1]

    condition = (df['DEPT'] >= depth_start) & (df['DEPT'] <= depth_stop)
    las_df[well] = df[condition]

    assert (las_df[well]['DEPT'].min() >= depth_start) and (las_df[well]['DEPT'].max() <= depth_stop), 'Depths change error'
    print("New Well Start : ", las_df[well]['DEPT'].min(), '\t New Well Stop :', las_df[well]['DEPT'].max(), '\t Datapoints : ', las_df[well].shape[0])
    print("New Missing Value % ", 100 * las_df[well].isnull().sum() / las_df[well].shape[0])
    print('\n--------------------------------------------------------')
1Well Name : Poseidon1
2Well Start : 579.5 Well Stop : 5115.5
3Missing Value % DEPT 0.000000
4RHOB 86.200816
5TNPH 85.781991
6ECGR 17.480436
7DTCO 87.556486
8dtype: float64
9New Well Start : 4600.0 New Well Stop : 4950.0 Datapoints : 701
10New Missing Value % DEPT 0.0
11RHOB 0.0
12TNPH 0.0
13ECGR 0.0
14DTCO 0.0
15dtype: float64
16
17--------------------------------------------------------
18
19Well Name : Kronos1
20Well Start : 531.0 Well Stop : 5322.5
21Missing Value % DEPT 0.000000
22RHOB 74.989566
23TNPH 75.156511
24ECGR 2.493740
25DTCO 87.447830
26dtype: float64
27New Well Start : 4750.0 New Well Stop : 5000.0 Datapoints : 501
28New Missing Value % DEPT 0.000000
29RHOB 6.187625
30TNPH 6.786427
31ECGR 0.000000
32DTCO 0.000000
33dtype: float64
34
35--------------------------------------------------------
36
37Well Name : Boreas1
38Well Start : 200.0 Well Stop : 5205.5
39Missing Value % DEPT 0.000000
40RHOB 76.568118
41TNPH 76.538154
42ECGR 3.016380
43DTCO 63.084299
44dtype: float64
45New Well Start : 4000.0 New Well Stop : 5100.0 Datapoints : 2201
46New Missing Value % DEPT 0.000000
47RHOB 2.089959
48TNPH 1.862790
49ECGR 4.134484
50DTCO 1.135847
51dtype: float64
52
53--------------------------------------------------------
54
55Well Name : Proteus1
56Well Start : 489.5 Well Stop : 5249.5
57Missing Value % DEPT 0.000000
58RHOB 93.225502
59TNPH 92.826384
60ECGR 1.890558
61DTCO 52.221405
62dtype: float64
63New Well Start : 4950.0 New Well Stop : 5200.0 Datapoints : 501
64New Missing Value % DEPT 0.0
65RHOB 0.0
66TNPH 0.0
67ECGR 0.0
68DTCO 0.0
69dtype: float64
70
71--------------------------------------------------------
72
73Well Name : Pharos1
74Well Start : 482.0 Well Stop : 5225.5
75Missing Value % DEPT 0.000000
76RHOB 75.231872
77TNPH 74.936762
78ECGR 0.221332
79DTCO 76.117201
80dtype: float64
81New Well Start : 4000.0 New Well Stop : 5000.0 Datapoints : 2001
82New Missing Value % DEPT 0.000000
83RHOB 0.499750
84TNPH 0.549725
85ECGR 0.000000
86DTCO 5.247376
87dtype: float64
88
89--------------------------------------------------------
90
91Well Name : Torosa1L
92Well Start : 555.0 Well Stop : 4683.0
93Missing Value % DEPT 0.000000
94RHOB 73.610270
95TNPH 70.812644
96ECGR 0.181664
97DTCO 7.036454
98dtype: float64
99New Well Start : 3600.0 New Well Stop : 4600.0 Datapoints : 2001
100New Missing Value % DEPT 0.0
101RHOB 0.0
102TNPH 0.0
103ECGR 0.0
104DTCO 0.0
105dtype: float64
106
107--------------------------------------------------------
108
109Well Name : PoseidonNorth1
110Well Start : 508.0 Well Stop : 5279.5
111Missing Value % DEPT 0.000000
112RHOB 75.901090
113TNPH 75.869656
114ECGR 18.692372
115DTCO 67.026404
116dtype: float64
117New Well Start : 3800.0 New Well Stop : 4900.0 Datapoints : 2201
118New Missing Value % DEPT 0.0
119RHOB 0.0
120TNPH 0.0
121ECGR 0.0
122DTCO 0.0
123dtype: float64
124
125--------------------------------------------------------
126
127Well Name : Poseidon2
128Well Start : 490.0 Well Stop : 5351.0
129Missing Value % DEPT 0.000000
130RHOB 75.079708
131TNPH 74.154068
132ECGR 0.123419
133DTCO 41.571531
134dtype: float64
135New Well Start : 4100.0 New Well Stop : 5000.0 Datapoints : 1801
136New Missing Value % DEPT 0.000000
137RHOB 0.000000
138TNPH 0.000000
139ECGR 0.000000
140DTCO 1.554692
141dtype: float64
142
143--------------------------------------------------------
# Concatenating well data into one dataframe
# Adding a column called 'Well' to distinguish the data points of each well
for well in well_list:
    las_df[well]['Well'] = well

well_df = pd.concat(las_df.values(), axis=0)
well_df = well_df[columns_of_interest + ['Well']]
well_df.head()
        RHOB   TNPH     ECGR     DTCO       Well
8041  2.5211  10.10  34.3997  70.4003  Poseidon1
8042  2.5112  12.40  42.8228  80.4120  Poseidon1
8043  2.5277  15.30  60.1452  82.5646  Poseidon1
8044  2.5127  12.28  36.4007  72.5366  Poseidon1
8045  2.5271  10.16  35.3279  70.7520  Poseidon1

Missing Value Treatment

missing_values_df = pd.DataFrame(index=well_df.columns.values)

for well_name in well_df['Well'].unique():
    condition = well_df['Well'] == well_name
    subset = well_df[condition]
    missing_values_df[well_name] = 100 * subset.isnull().sum() / subset.shape[0]

missing_values_df = missing_values_df.T
missing_values_df.sort_values(by=['DTCO'], ascending=[True]).style.bar()
                    RHOB      TNPH      ECGR      DTCO      Well
Poseidon1       0.000000  0.000000  0.000000  0.000000  0.000000
Kronos1         6.187625  6.786427  0.000000  0.000000  0.000000
Proteus1        0.000000  0.000000  0.000000  0.000000  0.000000
Torosa1L        0.000000  0.000000  0.000000  0.000000  0.000000
PoseidonNorth1  0.000000  0.000000  0.000000  0.000000  0.000000
Boreas1         2.089959  1.862790  4.134484  1.135847  0.000000
Poseidon2       0.000000  0.000000  0.000000  1.554692  0.000000
Pharos1         0.499750  0.549725  0.000000  5.247376  0.000000

Train & Test split

training_wells = ['PoseidonNorth1', 'Torosa1L', 'Boreas1', 'Poseidon2', 'Kronos1', 'Proteus1']
testing_wells = [well for well in well_list if well not in training_wells]

print('Training Wells : ', training_wells)
print('Testing Wells : ', testing_wells)

train_dataset = pd.concat([las_df[well] for well in training_wells], axis=0)
test_dataset = pd.concat([las_df[well] for well in testing_wells], axis=0)
train_dataset.head()

Training Wells :  ['PoseidonNorth1', 'Torosa1L', 'Boreas1', 'Poseidon2', 'Kronos1', 'Proteus1']
Testing Wells :  ['Poseidon1', 'Pharos1']
        DEPT    RHOB     TNPH     ECGR     DTCO            Well
6584  3800.0  2.5913  35.5945  83.2622  98.2931  PoseidonNorth1
6585  3800.5  2.5680  34.2785  73.9849  98.1903  PoseidonNorth1
6586  3801.0  2.6482  29.6083  83.8913  98.1472  PoseidonNorth1
6587  3801.5  2.7102  34.2985  91.5737  95.7302  PoseidonNorth1
6588  3802.0  2.9027  34.7045  86.1709  95.8680  PoseidonNorth1
train_dataset.shape
(9206, 6)
100 * train_dataset.isnull().sum() / train_dataset.shape[0]

DEPT    0.000000
RHOB    0.836411
TNPH    0.814686
ECGR    0.988486
DTCO    0.575711
Well    0.000000
dtype: float64
train_dataset.dropna(subset=columns_of_interest, inplace=True)
test_dataset.dropna(subset=columns_of_interest, inplace=True)

Correlation

# Pairplot
import seaborn as sns
sns.set_style('whitegrid')
sns.pairplot(data=train_dataset, vars=['RHOB', 'TNPH', 'ECGR', 'DTCO'], diag_kind='kde')

[pair plot of RHOB, TNPH, ECGR and DTCO]

fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(train_dataset[columns_of_interest].corr(method='spearman'), annot=True, ax=ax, center=0)
plt.title('Correlations')
plt.show()

[Spearman correlation heatmap of the predictors and DTCO]

Normalization

# move the depth column to the end of each dataframe
depth_train, depth_pred = train_dataset.pop('DEPT'), test_dataset.pop('DEPT')
train_dataset['DEPT'], test_dataset['DEPT'] = depth_train, depth_pred

columns_of_interest
['RHOB', 'TNPH', 'ECGR', 'DTCO']
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer

scaler = PowerTransformer(method='yeo-johnson')
column_drop = ['WELL', 'DEPT']
ct = ColumnTransformer([('transform', scaler, columns_of_interest)], remainder='passthrough')

train_dataset_norm = ct.fit_transform(train_dataset)
train_dataset_norm = pd.DataFrame(train_dataset_norm, columns=train_dataset.columns)
train_dataset_norm
          RHOB      TNPH      ECGR      DTCO            Well    DEPT
0     0.194983   1.41744  0.288585  1.069585  PoseidonNorth1  3800.0
1    -0.024624  1.295228    0.0322  1.061787  PoseidonNorth1  3800.5
2      0.74861  0.851648  0.305598  1.058519  PoseidonNorth1  3801.0
3     1.380549  1.297094  0.509886  0.876203  PoseidonNorth1  3801.5
4     3.543468  1.334914  0.366876  0.886546  PoseidonNorth1  3802.0
...        ...       ...       ...       ...             ...     ...
8969  -3.48235 -0.712419 -1.350794 -1.010424        Proteus1  5198.0
8970  1.775529  -0.25349  0.321077 -1.018072        Proteus1  5198.5
8971  1.506393 -0.751591 -2.436269 -1.541108        Proteus1  5199.0
8972  2.664714 -2.337805 -2.461197 -1.597139        Proteus1  5199.5
8973  0.283779 -2.130702 -1.708333 -1.674272        Proteus1  5200.0

8974 rows × 6 columns

# Pair plot after normalization
sns.pairplot(data=train_dataset_norm, vars=['RHOB', 'TNPH', 'ECGR', 'DTCO'], diag_kind='kde')

[pair plot after normalization]

Outlier Removal

train_dataset_drop = train_dataset_norm.copy()
train_dataset_drop = train_dataset_drop.drop(columns=['Well', 'DEPT'])
train_dataset_drop
          RHOB      TNPH      ECGR      DTCO
0     0.194983   1.41744  0.288585  1.069585
1    -0.024624  1.295228    0.0322  1.061787
2      0.74861  0.851648  0.305598  1.058519
3     1.380549  1.297094  0.509886  0.876203
4     3.543468  1.334914  0.366876  0.886546
...        ...       ...       ...       ...
8969  -3.48235 -0.712419 -1.350794 -1.010424
8970  1.775529  -0.25349  0.321077 -1.018072
8971  1.506393 -0.751591 -2.436269 -1.541108
8972  2.664714 -2.337805 -2.461197 -1.597139
8973  0.283779 -2.130702 -1.708333 -1.674272

8974 rows × 4 columns

from sklearn.svm import OneClassSVM

svm = OneClassSVM(nu=0.1)
yhat = svm.fit_predict(train_dataset_drop)
mask = yhat != -1
train_dataset_svm = train_dataset_norm[mask]
print('Number of points after outliers removed with One-class SVM :', len(train_dataset_svm))

Number of points after outliers removed with One-class SVM : 8076

Modelling

y_train = train_dataset_svm['DTCO']
X_train = train_dataset_svm[['RHOB', 'TNPH', 'ECGR']]

# Derived Features
from scipy.ndimage import median_filter

for col in X_train.columns:
    X_train[col + '_GRAD'] = np.gradient(X_train[col].values)
    X_train[col + '_AVG'] = median_filter(X_train[col].values.tolist(), 10, mode='nearest')
X_train.columns

Index(['RHOB', 'TNPH', 'ECGR', 'RHOB_GRAD', 'RHOB_AVG', 'TNPH_GRAD',
       'TNPH_AVG', 'ECGR_GRAD', 'ECGR_AVG'],
      dtype='object')
# Random Forest
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor()
rf_model.fit(X_train, y_train)
print('Train Accuracy :', rf_model.score(X_train, y_train))   # .score() returns R-squared

# for well in test_wells :
#     X_test, y_test = process_test_set(test_dataset, well, scaler)
#     print('Test Accuracy in ' + well + ' : ', rf_model.score(X_test_scaled, y_test))

Train Accuracy : 0.9917603654437802
# Gradient Boosting
from sklearn.ensemble import GradientBoostingRegressor

gb_model = GradientBoostingRegressor()
gb_model.fit(X_train, y_train)
print('Train Accuracy :', gb_model.score(X_train, y_train))

Train Accuracy : 0.9081315988009683

Hyperparameter Tuning

# estimator = GradientBoostingRegressor()

# ## Hyperparameters
# max_depth = [10, 20, 30, 50, 100]
# min_samples_leaf = [1, 2, 3, 4]
# min_samples_split = [2, 10]
# n_estimators = [100, 200, 300]

# param_grid = {'n_estimators': n_estimators, 'max_depth': max_depth, 'min_samples_leaf': min_samples_leaf, 'min_samples_split': min_samples_split}

# from sklearn.model_selection import RandomizedSearchCV
# search = RandomizedSearchCV(estimator, param_grid, cv=5, verbose=1, n_jobs=-1)
# search.fit(X_train, y_train)
# search.best_params_
# estimator = RandomForestRegressor()
# ## Hyperparameters
# max_depth = [10, 20, 30, 50, 100]
# min_samples_leaf = [1, 2, 3, 4]
# min_samples_split = [2, 10]
# n_estimators = [100, 200, 300]

# param_grid = {'n_estimators': n_estimators, 'max_depth': max_depth, 'min_samples_leaf': min_samples_leaf, 'min_samples_split': min_samples_split}

# from sklearn.model_selection import RandomizedSearchCV
# search = RandomizedSearchCV(estimator, param_grid, cv=5, verbose=1, n_jobs=-1, n_iter=20)
# search.fit(X_train, y_train)
# search.best_params_

# search.best_estimator_
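
The searches above were run once to find the settings used below and then commented out. A minimal sketch of reproducing the gradient-boosting search, mirroring the commented cell (the random_state is an added assumption for repeatability, not in the original):

# Sketch only: re-running the randomized search that produced the tuned settings below.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

param_grid = {
    'n_estimators'      : [100, 200, 300],
    'max_depth'         : [10, 20, 30, 50, 100],
    'min_samples_leaf'  : [1, 2, 3, 4],
    'min_samples_split' : [2, 10],
}

search = RandomizedSearchCV(GradientBoostingRegressor(), param_grid,
                            cv=5, verbose=1, n_jobs=-1, random_state=42)
search.fit(X_train, y_train)
print(search.best_params_)

RandomizedSearchCV samples parameter combinations from the grid rather than evaluating all of them exhaustively, which keeps the search affordable for the tree-ensemble models used here.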
# New Gradient Boosting model with the best hyperparameters
gb_model_tuned = GradientBoostingRegressor(max_depth=20, n_estimators=100, min_samples_leaf=4, min_samples_split=2)
gb_model_tuned.fit(X_train, y_train)
print('Train Accuracy :', gb_model_tuned.score(X_train, y_train))

Train Accuracy : 0.9998761339778823
rf_model_tuned = RandomForestRegressor(max_depth=50, min_samples_leaf=2, min_samples_split=10,
                                       n_estimators=200)
rf_model_tuned.fit(X_train, y_train)

RandomForestRegressor(max_depth=50, min_samples_leaf=2, min_samples_split=10,
                      n_estimators=200)

Test Wells

# normalisation
test_dataset_norm = pd.DataFrame(ct.transform(test_dataset), columns=test_dataset.columns)

test_dataset_norm
          RHOB      TNPH      ECGR      DTCO       Well    DEPT
0    -0.454429 -1.257473 -1.226668 -0.912366  Poseidon1  4600.0
1    -0.543094 -0.975991 -0.928349 -0.233124  Poseidon1  4600.5
2    -0.394922 -0.638131 -0.372358 -0.082209  Poseidon1  4601.0
3    -0.529706 -0.990353 -1.153705 -0.770638  Poseidon1  4601.5
4    -0.400345 -1.249951 -1.192646 -0.889156  Poseidon1  4602.0
...        ...       ...       ...       ...        ...     ...
2579 -0.847423 -1.802829 -2.111444 -1.249448    Pharos1  4998.0
2580 -0.406668 -1.645907 -1.818261 -1.338187    Pharos1  4998.5
2581 -1.341027 -1.628059 -1.834552  -1.27002    Pharos1  4999.0
2582 -2.260172 -2.131092 -2.367431 -1.088848    Pharos1  4999.5
2583 -2.370962 -2.157379  -2.45466 -0.973624    Pharos1  5000.0

2584 rows × 6 columns

# outlier removal
test_dataset_drop = test_dataset_norm.copy()

yhat = svm.predict(test_dataset_drop[['RHOB', 'TNPH', 'ECGR', 'DTCO']])
mask = yhat != -1
test_dataset_svm = test_dataset_norm[mask]
test_dataset_svm_unscaled = test_dataset[mask]
print('Number of points after outliers removed with One-class SVM :', len(test_dataset_svm))

Number of points after outliers removed with One-class SVM : 2152
X_test = []
y_test = []

for i, well in enumerate(testing_wells):
    test = test_dataset_svm.loc[test_dataset_svm['Well'] == well]
    test = test.drop(columns=['Well', 'DEPT'])

    # Derived Features, as for the training set
    for col in ['RHOB', 'TNPH', 'ECGR']:
        test[col + '_GRAD'] = np.gradient(test[col].values)
        test[col + '_AVG'] = median_filter(test[col].values.tolist(), 10, mode='nearest')

    y_test.append(test.pop('DTCO').values)
    X_test.append(test.values)
# predictions
from sklearn.metrics import mean_squared_error

y_test_pred_gb = []
y_test_pred_rf = []

for i in range(len(X_test)):
    y_test_pred_gb.append(gb_model_tuned.predict(X_test[i]))
    y_test_pred_rf.append(rf_model_tuned.predict(X_test[i]))
    print('\n------------------------')
    print('Test Accuracy of Well using Gradient Boosting Model ', testing_wells[i], ' : ', gb_model_tuned.score(X_test[i], y_test[i]))
    print('MSE of Well using Gradient Boosting Model ', testing_wells[i], ' : ', mean_squared_error(y_test[i], y_test_pred_gb[i]))
    print('Test Accuracy of Well using RF Model ', testing_wells[i], ' : ', rf_model_tuned.score(X_test[i], y_test[i]))
    print('MSE of Well using RF Model ', testing_wells[i], ' : ', mean_squared_error(y_test[i], y_test_pred_rf[i]))
------------------------
Test Accuracy of Well using Gradient Boosting Model  Poseidon1  :  0.7519203113851671
MSE of Well using Gradient Boosting Model  Poseidon1  :  0.162851261183905
Test Accuracy of Well using RF Model  Poseidon1  :  0.7812918613913131
MSE of Well using RF Model  Poseidon1  :  0.14357038418775003

------------------------
Test Accuracy of Well using Gradient Boosting Model  Pharos1  :  0.7545701941806116
MSE of Well using Gradient Boosting Model  Pharos1  :  0.09787985129034865
Test Accuracy of Well using RF Model  Pharos1  :  0.7866187967475008
MSE of Well using RF Model  Pharos1  :  0.08509854935011467

Using Artificial Neural Networks

import tensorflow as tf
tf.random.set_seed(42)
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.callbacks import ModelCheckpoint
12021-07-11 19:25:27.814036: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
22021-07-11 19:25:27.814077: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
# Over-parameterised ANN
model = Sequential([
    Dense(128, activation='relu', kernel_initializer='normal', input_shape=(9,)),
    BatchNormalization(),
    Dropout(0.10),

    Dense(64, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(32, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(16, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(8, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(4, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(units=1, kernel_initializer='normal')
])

model.compile(optimizer='adam', loss='mean_squared_error')
print(model.summary())

model_checkpoint = ModelCheckpoint('best_model', monitor='val_loss', mode='min', save_best_only=True, verbose=1)

# keep the initial learning rate for 10 epochs, then decay it exponentially
def scheduler(epoch, lr):
    if epoch <= 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1 * epoch)

lr_callback = tf.keras.callbacks.LearningRateScheduler(scheduler)

X_train_tf = tf.convert_to_tensor(X_train.values, np.float32)
y_train_tf = tf.convert_to_tensor(y_train.values, np.float32)
12021-07-11 19:25:32.081060: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
22021-07-11 19:25:32.081302: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
32021-07-11 19:25:32.081316: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
42021-07-11 19:25:32.081333: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (p-fbd0b10b-3636-4896-bb87-fbf506835cad): /proc/driver/nvidia/version does not exist
52021-07-11 19:25:32.081543: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
6To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
72021-07-11 19:25:32.081688: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
8Model: "sequential"
9_________________________________________________________________
10Layer (type) Output Shape Param #
11=================================================================
12dense (Dense) (None, 128) 1280
13_________________________________________________________________
14batch_normalization (BatchNo (None, 128) 512
15_________________________________________________________________
16dropout (Dropout) (None, 128) 0
17_________________________________________________________________
18dense_1 (Dense) (None, 64) 8256
19_________________________________________________________________
20batch_normalization_1 (Batch (None, 64) 256
21_________________________________________________________________
22dropout_1 (Dropout) (None, 64) 0
23_________________________________________________________________
24dense_2 (Dense) (None, 32) 2080
25_________________________________________________________________
26batch_normalization_2 (Batch (None, 32) 128
27_________________________________________________________________
28dropout_2 (Dropout) (None, 32) 0
29_________________________________________________________________
30dense_3 (Dense) (None, 16) 528
31_________________________________________________________________
32batch_normalization_3 (Batch (None, 16) 64
33_________________________________________________________________
34dropout_3 (Dropout) (None, 16) 0
35_________________________________________________________________
36dense_4 (Dense) (None, 8) 136
37_________________________________________________________________
38batch_normalization_4 (Batch (None, 8) 32
39_________________________________________________________________
40dropout_4 (Dropout) (None, 8) 0
41_________________________________________________________________
42dense_5 (Dense) (None, 4) 36
43_________________________________________________________________
44batch_normalization_5 (Batch (None, 4) 16
45_________________________________________________________________
46dropout_5 (Dropout) (None, 4) 0
47_________________________________________________________________
48dense_6 (Dense) (None, 1) 5
49=================================================================
50Total params: 13,329
51Trainable params: 12,825
52Non-trainable params: 504
53_________________________________________________________________
54None
history = model.fit(X_train_tf, y_train_tf, batch_size=32, validation_split=0.3, epochs=40, callbacks=[model_checkpoint, lr_callback])
1Epoch 1/40
22021-07-11 19:25:32.416134: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
32021-07-11 19:25:32.416620: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2499995000 Hz
4177/177 [==============================] - 2s 6ms/step - loss: 0.6706 - val_loss: 1.0109
5
6Epoch 00001: val_loss improved from inf to 1.01087, saving model to best_model
72021-07-11 19:25:35.935800: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
8INFO:tensorflow:Assets written to: best_model/assets
9Epoch 2/40
10177/177 [==============================] - 1s 3ms/step - loss: 0.2181 - val_loss: 0.6717
11
12Epoch 00002: val_loss improved from 1.01087 to 0.67171, saving model to best_model
13INFO:tensorflow:Assets written to: best_model/assets
14Epoch 3/40
15177/177 [==============================] - 1s 3ms/step - loss: 0.1860 - val_loss: 0.2183
16
17Epoch 00003: val_loss improved from 0.67171 to 0.21830, saving model to best_model
18INFO:tensorflow:Assets written to: best_model/assets
19Epoch 4/40
20177/177 [==============================] - 1s 3ms/step - loss: 0.1927 - val_loss: 0.1277
21
22Epoch 00004: val_loss improved from 0.21830 to 0.12774, saving model to best_model
23INFO:tensorflow:Assets written to: best_model/assets
24Epoch 5/40
25177/177 [==============================] - 1s 4ms/step - loss: 0.1761 - val_loss: 0.1395
26
27Epoch 00005: val_loss did not improve from 0.12774
28Epoch 6/40
29177/177 [==============================] - 1s 3ms/step - loss: 0.1792 - val_loss: 0.1409
30
31Epoch 00006: val_loss did not improve from 0.12774
32Epoch 7/40
33177/177 [==============================] - 1s 3ms/step - loss: 0.1682 - val_loss: 0.1790
34
35Epoch 00007: val_loss did not improve from 0.12774
36Epoch 8/40
37177/177 [==============================] - 1s 3ms/step - loss: 0.1706 - val_loss: 0.1403
38
39Epoch 00008: val_loss did not improve from 0.12774
40Epoch 9/40
41177/177 [==============================] - 1s 3ms/step - loss: 0.1751 - val_loss: 0.1392
42
43Epoch 00009: val_loss did not improve from 0.12774
44Epoch 10/40
45177/177 [==============================] - 1s 3ms/step - loss: 0.1725 - val_loss: 0.1580
46
47Epoch 00010: val_loss did not improve from 0.12774
48Epoch 11/40
49177/177 [==============================] - 1s 3ms/step - loss: 0.1775 - val_loss: 0.1340
50
51Epoch 00011: val_loss did not improve from 0.12774
52Epoch 12/40
53177/177 [==============================] - 1s 3ms/step - loss: 0.1675 - val_loss: 0.1367
54
55Epoch 00012: val_loss did not improve from 0.12774
56Epoch 13/40
57177/177 [==============================] - 1s 3ms/step - loss: 0.1543 - val_loss: 0.1406
58
59Epoch 00013: val_loss did not improve from 0.12774
60Epoch 14/40
61177/177 [==============================] - 1s 3ms/step - loss: 0.1471 - val_loss: 0.1436
62
63Epoch 00014: val_loss did not improve from 0.12774
64Epoch 15/40
65177/177 [==============================] - 1s 4ms/step - loss: 0.1496 - val_loss: 0.1432
66
67Epoch 00015: val_loss did not improve from 0.12774
68Epoch 16/40
69177/177 [==============================] - 1s 3ms/step - loss: 0.1599 - val_loss: 0.1423
70
71Epoch 00016: val_loss did not improve from 0.12774
72Epoch 17/40
73177/177 [==============================] - 1s 3ms/step - loss: 0.1613 - val_loss: 0.1413
74
75Epoch 00017: val_loss did not improve from 0.12774
76Epoch 18/40
77177/177 [==============================] - 1s 3ms/step - loss: 0.1445 - val_loss: 0.1421
78
79Epoch 00018: val_loss did not improve from 0.12774
80Epoch 19/40
81177/177 [==============================] - 1s 3ms/step - loss: 0.1512 - val_loss: 0.1420
82
83Epoch 00019: val_loss did not improve from 0.12774
84Epoch 20/40
85177/177 [==============================] - 1s 3ms/step - loss: 0.1469 - val_loss: 0.1416
86
87Epoch 00020: val_loss did not improve from 0.12774
88Epoch 21/40
89177/177 [==============================] - 1s 3ms/step - loss: 0.1556 - val_loss: 0.1422
90
91Epoch 00021: val_loss did not improve from 0.12774
92Epoch 22/40
93177/177 [==============================] - 1s 3ms/step - loss: 0.1545 - val_loss: 0.1425
94
95Epoch 00022: val_loss did not improve from 0.12774
96Epoch 23/40
97177/177 [==============================] - 1s 3ms/step - loss: 0.1496 - val_loss: 0.1426
98
99Epoch 00023: val_loss did not improve from 0.12774
100Epoch 24/40
101177/177 [==============================] - 1s 3ms/step - loss: 0.1540 - val_loss: 0.1418
102
103Epoch 00024: val_loss did not improve from 0.12774
104Epoch 25/40
105177/177 [==============================] - 1s 3ms/step - loss: 0.1472 - val_loss: 0.1407
106
107Epoch 00025: val_loss did not improve from 0.12774
108Epoch 26/40
109177/177 [==============================] - 1s 3ms/step - loss: 0.1550 - val_loss: 0.1411
110
111Epoch 00026: val_loss did not improve from 0.12774
112Epoch 27/40
113177/177 [==============================] - 1s 3ms/step - loss: 0.1572 - val_loss: 0.1428
114
115Epoch 00027: val_loss did not improve from 0.12774
116Epoch 28/40
117177/177 [==============================] - 1s 3ms/step - loss: 0.1663 - val_loss: 0.1426
118
119Epoch 00028: val_loss did not improve from 0.12774
120Epoch 29/40
121177/177 [==============================] - 1s 3ms/step - loss: 0.1568 - val_loss: 0.1418
122
123Epoch 00029: val_loss did not improve from 0.12774
124Epoch 30/40
125177/177 [==============================] - 1s 3ms/step - loss: 0.1565 - val_loss: 0.1426
126
127Epoch 00030: val_loss did not improve from 0.12774
128Epoch 31/40
129177/177 [==============================] - 1s 3ms/step - loss: 0.1573 - val_loss: 0.1442
130
131Epoch 00031: val_loss did not improve from 0.12774
132Epoch 32/40
133177/177 [==============================] - 1s 3ms/step - loss: 0.1512 - val_loss: 0.1425
134
135Epoch 00032: val_loss did not improve from 0.12774
136Epoch 33/40
137177/177 [==============================] - 1s 4ms/step - loss: 0.1498 - val_loss: 0.1419
138
139Epoch 00033: val_loss did not improve from 0.12774
140Epoch 34/40
141177/177 [==============================] - 1s 3ms/step - loss: 0.1529 - val_loss: 0.1418
142
143Epoch 00034: val_loss did not improve from 0.12774
144Epoch 35/40
145177/177 [==============================] - 1s 3ms/step - loss: 0.1542 - val_loss: 0.1419
146
147Epoch 00035: val_loss did not improve from 0.12774
148Epoch 36/40
149177/177 [==============================] - 1s 3ms/step - loss: 0.1591 - val_loss: 0.1432
150
151Epoch 00036: val_loss did not improve from 0.12774
152Epoch 37/40
153177/177 [==============================] - 1s 3ms/step - loss: 0.1611 - val_loss: 0.1430
154
155Epoch 00037: val_loss did not improve from 0.12774
156Epoch 38/40
157177/177 [==============================] - 1s 3ms/step - loss: 0.1492 - val_loss: 0.1423
158
159Epoch 00038: val_loss did not improve from 0.12774
160Epoch 39/40
161177/177 [==============================] - 1s 3ms/step - loss: 0.1604 - val_loss: 0.1420
162
163Epoch 00039: val_loss did not improve from 0.12774
164Epoch 40/40
165177/177 [==============================] - 1s 3ms/step - loss: 0.1460 - val_loss: 0.1428
166
167Epoch 00040: val_loss did not improve from 0.12774
def plot_model_history(history):
    fig, axes = plt.subplots(nrows=1, ncols=1)
    axes.plot(history.history['loss'])
    axes.plot(history.history['val_loss'])
    axes.legend(['loss', 'val_loss'])

plot_model_history(history)

[training and validation loss curves]

y_test_pred_nn = []

for i in range(len(X_test)):
    print('Test Well : ', testing_wells[i])
    X_test_tf = tf.convert_to_tensor(X_test[i], np.float32)
    y_test_tf = tf.convert_to_tensor(y_test[i], np.float32)
    y_test_pred_nn.append(model.predict(X_test_tf))
    print('MSE :', model.evaluate(X_test_tf, y_test_tf))
    print('\n---------------------\n')
Test Well :  Poseidon1
16/16 [==============================] - 0s 1ms/step - loss: 0.1655
MSE : 0.16550424695014954

---------------------

Test Well :  Pharos1
52/52 [==============================] - 0s 883us/step - loss: 0.0743
MSE : 0.07427336275577545

---------------------
# Rescale the ANN predictions back to DTCO units with the fitted power transformer
for i in range(len(testing_wells)):
    preds = y_test_pred_nn[i]
    rows = preds.shape[0]

    # dummy frame: zeros for the predictors, predictions in the DTCO column
    df = pd.DataFrame(columns=columns_of_interest, index=range(rows))
    for j in range(len(columns_of_interest) - 1):
        df[columns_of_interest[j]] = 0
    df['DTCO'] = preds

    rescaled_df = pd.DataFrame(ct.named_transformers_['transform'].inverse_transform(df), columns=columns_of_interest)

    condition = test_dataset_svm_unscaled['Well'] == testing_wells[i]
    test_dataset_svm_unscaled.loc[condition, 'DTCO_Predicted'] = rescaled_df['DTCO'].values
test_dataset_svm_unscaled

        RHOB     TNPH     ECGR     DTCO       Well    DEPT  DTCO_Predicted
8041  2.5211  10.1000  34.3997  70.4003  Poseidon1  4600.0       73.254893
8042  2.5112  12.4000  42.8228  80.4120  Poseidon1  4600.5       72.147674
8043  2.5277  15.3000  60.1452  82.5646  Poseidon1  4601.0       75.254447
8044  2.5127  12.2800  36.4007  72.5366  Poseidon1  4601.5       72.460500
8045  2.5271  10.1600  35.3279  70.7520  Poseidon1  4602.0       76.343241
...      ...      ...      ...      ...        ...     ...             ...
9030  2.4565   6.8472  16.4082  64.9338    Pharos1  4997.0       65.889312
9031  2.4493   6.0554  13.1265  64.0184    Pharos1  4997.5       65.930052
9032  2.4766   5.9867  13.4912  65.2055    Pharos1  4998.0       66.617091
9033  2.5264   7.1198  19.6898  63.8091    Pharos1  4998.5       66.574747
9034  2.4183   7.2514  19.3252  64.8829    Pharos1  4999.0       66.289676

2152 rows × 7 columns
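
To check the physical-consistency claim from the summary (transit time should fall as bulk density rises), one can inspect the correlation between the predicted DTCO and RHOB on the test wells. A minimal sketch, using the dataframe above (not part of the original notebook):

# Correlation between predicted sonic and bulk density on the test wells.
# A strongly negative coefficient supports the expected physical relationship.
for well in testing_wells:
    subset = test_dataset_svm_unscaled[test_dataset_svm_unscaled['Well'] == well]
    corr = subset['RHOB'].corr(subset['DTCO_Predicted'], method='spearman')
    print(well, ': Spearman correlation RHOB vs DTCO_Predicted =', round(corr, 3))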

Plotting Results

# Plotting Results
def plot_logs(well):
    columns = columns_of_interest + ['DTCO_Predicted']
    fig, ax = plt.subplots(nrows=1, ncols=len(columns), figsize=(20, 10))
    colors = ['black', 'red', 'blue', 'green', 'purple', 'black', 'orange']

    for i, log in enumerate(columns):
        if log not in ['DEPT', 'Well']:
            ax[i].plot(well[log], well['DEPT'], color=colors[i])
            ax[i].set_title(log)
            ax[i].grid(True)

    ax[2].set_xlim(0, 300)
    plt.tight_layout(pad=1.1)
    plt.show()

for well in testing_wells:
    print('Well Name :', well)
    condition = test_dataset_svm_unscaled['Well'] == well

    plot_logs(test_dataset_svm_unscaled[condition])
Well Name : Poseidon1
[log plots including DTCO_Predicted]

Well Name : Pharos1
[log plots including DTCO_Predicted]
