
Sonic Log Prediction

Jayanth Boddu
July 14th, 2021 · 1 min read

Photo by Pawel Czerwinski on Unsplash


Summary

Sonic logs are required for well-seismic ties, an essential step in the exploration workflow. However, due to various resource constraints, sonic logs are often not recorded in every well in a field, whereas basic logs like Gamma Ray, Neutron Porosity, Bulk Density and Resistivity are almost always recorded. Geoscientists therefore approximate missing sonic logs from the sonic logs of nearby wells using empirical techniques. These techniques might not generalise beyond the wells they were calibrated on, and applying a different relation to each well introduces consistency problems across the field.
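As an illustration of such an empirical technique (not part of this analysis), Gardner's relation between bulk density and compressional velocity, rho = a * Vp^0.25, can be inverted to estimate compressional slowness from a recorded density log. A minimal sketch, assuming the textbook coefficient a = 0.23 with rho in g/cc and Vp in ft/s:

# Illustrative only: estimate compressional slowness (DT, in µs/ft) from bulk density
# via Gardner's empirical relation rho = a * Vp**0.25. The coefficient a = 0.23 is a
# textbook value and would normally be recalibrated per field.
def dt_from_density_gardner(rhob, a=0.23):
    vp = (rhob / a) ** 4      # compressional velocity in ft/s
    return 1e6 / vp           # slowness in microseconds per foot

print(dt_from_density_gardner(2.30))   # ~100 µs/ft for a 2.30 g/cc rock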

This analysis uses a data-driven approach to predict missing sonic logs at field scale. This could help provide sonic logs for every well in a field, in turn improving well-seismic ties, seismic resolution, depth migration and related steps. Delta-T Compressional is predicted from GR, RHOB and NPHI using machine-learning algorithms. The dataset consists of 8 wells from the Poseidon field, offshore North West Australia, obtained from Data Underground. LAS files containing spliced well logs sampled at a 0.5 m interval are used for the analysis (spliced logs provided by Occam Technologies).

Three models have been trained, one each using Random Forests, Gradient Boosting and Artificial Neural Networks; a One-Class SVM is used for outlier removal.

The models are trained on 6 of the 8 wells and validated on the remaining 2.

The analysis shows promising results, with mean squared errors of 0.165 and 0.074 on the two validation wells (computed on the normalised logs). Further, the predicted sonic logs show the strong negative correlation with bulk density that physics dictates.

Importing Data

import numpy as np
import pandas as pd
import glob
import lasio as ls
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
log_list = glob.glob('Poseidon_logs/*.LAS')
log_list
['Poseidon_logs/Poseidon1Decim.LAS',
 'Poseidon_logs/Kronos1Decim.LAS',
 'Poseidon_logs/Boreas1Decim.LAS',
 'Poseidon_logs/Proteus1Decim.LAS',
 'Poseidon_logs/Pharos1Decim.LAS',
 'Poseidon_logs/Torosa1LDecim.LAS',
 'Poseidon_logs/PoseidonNorth1Decim.LAS',
 'Poseidon_logs/Poseidon2Decim.LAS']
# Predictors and target
columns_of_interest = ['RHOB', 'TNPH', 'ECGR', 'DTCO']

# Map equivalent curve mnemonics from different wells onto a common set
mnemonic_replacements = {
    'NPHI'  : 'TNPH',
    'HROM'  : 'RHOB',
    'GR'    : 'ECGR',
    'GRD'   : 'ECGR',
    'TNP'   : 'TNPH',
    'RHOZ'  : 'RHOB',
    'HTNP'  : 'TNPH',
    'BATC'  : 'DTCO',
    'GRARC' : 'ECGR',
    'DTC'   : 'DTCO'
}
# Reading LAS files

las_data = {}    # raw lasio objects, keyed by well name
las_df = {}      # one dataframe per well
well_list = []

for las_file in log_list:
    well_name = las_file.split('Decim.LAS')[0].split('/')[1]
    well_list.append(well_name)

    las = ls.read(las_file)

    # saving las in las_data
    las_data[well_name] = las

    df = las.df()

    # mnemonic replacements
    for column in df.columns.values:
        if column in mnemonic_replacements.keys():
            df = df.rename(columns={column: mnemonic_replacements[column]})

    las_df[well_name] = df
    print('\nWell Name : ', well_name)
    print(df.head())
    print("No of datapoints :", df.shape[0])

print(well_list)
1Well Name : Poseidon1
2 ECGR ATRX ATRT TNPH CAL1 HDAR RHOB DTSM DTCO
3DEPT
4579.5 NaN 0.6000 0.6700 NaN NaN NaN NaN NaN NaN
5580.0 NaN 0.6700 0.7000 NaN NaN NaN NaN NaN NaN
6580.5 NaN 0.6200 0.6700 NaN NaN NaN NaN NaN NaN
7581.0 NaN 0.5500 0.6400 NaN NaN NaN NaN NaN NaN
8581.5 NaN 0.5598 0.6295 NaN NaN NaN NaN NaN NaN
9No of datapoints : 9073
10
11Well Name : Kronos1
12 ECGR RS RD TNPH RHOB DCAV DTCO DTSM
13DEPT
14531.0 0.5559 NaN NaN NaN NaN NaN NaN NaN
15531.5 0.9649 NaN NaN NaN NaN NaN NaN NaN
16532.0 0.7309 NaN NaN NaN NaN NaN NaN NaN
17532.5 4.3208 NaN NaN NaN NaN NaN NaN NaN
18533.0 -0.0001 NaN NaN NaN NaN NaN NaN NaN
19No of datapoints : 9584
20
21Well Name : Boreas1
22 ECGR RS RD RHOB TNPH DTCO DTSM HDAR
23DEPT
24200.0 3.8287 NaN NaN NaN NaN NaN NaN NaN
25200.5 3.8287 NaN NaN NaN NaN NaN NaN NaN
26201.0 3.1437 NaN NaN NaN NaN NaN NaN NaN
27201.5 1.4981 NaN NaN NaN NaN NaN NaN NaN
28202.0 3.9696 NaN NaN NaN NaN NaN NaN NaN
29No of datapoints : 10012
30
31Well Name : Proteus1
32 ECGR RS RD DTCO DTSM HDAR RHOB TNPH
33DEPT
34489.5 16.2084 NaN NaN NaN NaN NaN NaN NaN
35490.0 20.6540 NaN NaN NaN NaN NaN NaN NaN
36490.5 19.2559 NaN NaN NaN NaN NaN NaN NaN
37491.0 15.8395 NaN NaN NaN NaN NaN NaN NaN
38491.5 14.7239 NaN NaN NaN NaN NaN NaN NaN
39No of datapoints : 9521
40
41Well Name : Pharos1
42 RHOB TNPH ECGR P16H P34H DTCO DTS
43DEPT
44482.0 NaN NaN 48.9556 NaN NaN NaN NaN
45482.5 NaN NaN 54.2686 NaN NaN NaN NaN
46483.0 NaN NaN 56.0523 NaN NaN NaN NaN
47483.5 NaN NaN 53.8891 NaN NaN NaN NaN
48484.0 NaN NaN 67.2855 NaN NaN NaN NaN
49No of datapoints : 9488
50
51Well Name : Torosa1L
52 ECGR RS RD RHOB TNPH HDAR DTCO DTS
53DEPT
54555.0 12.3003 0.6826 0.7422 NaN NaN NaN NaN NaN
55555.5 24.5442 0.4900 0.7223 NaN NaN NaN NaN NaN
56556.0 25.4925 0.4401 0.4498 NaN NaN NaN NaN NaN
57556.5 25.2527 0.4874 0.7347 NaN NaN NaN NaN NaN
58557.0 28.6476 0.4607 0.6694 NaN NaN NaN NaN NaN
59No of datapoints : 8257
60
61Well Name : PoseidonNorth1
62 ECGR RS RD RHOB TNPH DTCO DTSM HDAR
63DEPT
64508.0 24.4403 NaN NaN NaN NaN NaN NaN NaN
65508.5 29.6471 NaN NaN NaN NaN NaN NaN NaN
66509.0 36.3076 NaN NaN NaN NaN NaN NaN NaN
67509.5 56.3806 NaN NaN NaN NaN NaN NaN NaN
68510.0 54.7321 NaN NaN NaN NaN NaN NaN NaN
69No of datapoints : 9544
70
71Well Name : Poseidon2
72 ECGR RS RD DTCO DTSM DCAV RHOB TNPH
73DEPT
74490.0 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
75490.5 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
76491.0 17.6006 0.6898 0.3094 NaN NaN NaN NaN NaN
77491.5 16.7792 0.6959 0.3141 NaN NaN NaN NaN NaN
78492.0 15.9579 0.6959 0.3141 NaN NaN NaN NaN NaN
79No of datapoints : 9723
80['Poseidon1', 'Kronos1', 'Boreas1', 'Proteus1', 'Pharos1', 'Torosa1L', 'PoseidonNorth1', 'Poseidon2']

The well names are: ‘Boreas1’, ‘Poseidon1’, ‘Kronos1’, ‘Proteus1’, ‘Poseidon2’, ‘Torosa1L’, ‘PoseidonNorth1’, ‘Pharos1’.
The following curves are extracted from each LAS file; a quick check that every well carries them is sketched below the list.

  • ECGR - EDTC Corrected Gamma Ray
  • RHOB - Bulk Density
  • TNPH - Thermal Neutron Porosity (Ratio Method) in selected lithology
  • DTCO - Delta-T Compressional
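
A hypothetical sanity check (not in the original notebook) can confirm that, after the mnemonic replacements above, every well's dataframe exposes these four standardised curves:

# Sanity check: every well should carry the four standardised mnemonics.
for well_name, df in las_df.items():
    missing = set(columns_of_interest) - set(df.columns)
    assert not missing, f"{well_name} is missing curves: {missing}"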

Visualising Logs

# Plotting Logs
def plot_logs(well):

    fig, ax = plt.subplots(nrows=1, ncols=len(columns_of_interest), figsize=(20, 10))
    colors = ['black', 'red', 'blue', 'green', 'purple', 'black', 'orange']

    for i, log in enumerate(columns_of_interest):
        ax[i].plot(well[log], well['DEPT'], color=colors[i])
        ax[i].set_title(log)
        ax[i].grid(True)

    ax[2].set_xlim(0, 300)
    plt.tight_layout(pad=1.1)
    plt.show()

for well in well_list:
    print('Well Name :', well)
    well1 = las_df[well]
    plot_logs(well1[columns_of_interest].reset_index())
Well Name : Poseidon1
[log plots: RHOB, TNPH, ECGR, DTCO vs depth]

Well Name : Kronos1
[log plots]

Well Name : Boreas1
[log plots]

Well Name : Proteus1
[log plots]

Well Name : Pharos1
[log plots]

Well Name : Torosa1L
[log plots]

Well Name : PoseidonNorth1
[log plots]

Well Name : Poseidon2
[log plots]

Data Preparation

# restricting depth ranges per well to reduce the number of missing values
depth_ranges = {
    'Pharos1'        : [4000, 5000],
    'PoseidonNorth1' : [3800, 4900],
    'Torosa1L'       : [3600, 4600],
    'Poseidon2'      : [4100, 5000],
    'Proteus1'       : [4950, 5200],
    'Kronos1'        : [4750, 5000],
    'Poseidon1'      : [4600, 4950],
    'Boreas1'        : [4000, 5100]
}

for well in well_list:
    df = las_df[well]
    df = df[columns_of_interest].reset_index()
    print("\nWell Name : ", well)
    print("Well Start : ", df['DEPT'].min(), '\t Well Stop :', df['DEPT'].max())
    print("Missing Value % ", 100 * df.isnull().sum() / df.shape[0])

    depth_start = depth_ranges[well][0]
    depth_stop = depth_ranges[well][-1]

    condition = (df['DEPT'] >= depth_start) & (df['DEPT'] <= depth_stop)
    las_df[well] = df[condition]

    assert (las_df[well]['DEPT'].min() >= depth_start) and (las_df[well]['DEPT'].max() <= depth_stop), 'Depths change error'
    print("New Well Start : ", las_df[well]['DEPT'].min(), '\t New Well Stop :', las_df[well]['DEPT'].max(), '\t Datapoints : ', las_df[well].shape[0])
    print("New Missing Value % ", 100 * las_df[well].isnull().sum() / las_df[well].shape[0])
    print('\n--------------------------------------------------------')
1Well Name : Poseidon1
2Well Start : 579.5 Well Stop : 5115.5
3Missing Value % DEPT 0.000000
4RHOB 86.200816
5TNPH 85.781991
6ECGR 17.480436
7DTCO 87.556486
8dtype: float64
9New Well Start : 4600.0 New Well Stop : 4950.0 Datapoints : 701
10New Missing Value % DEPT 0.0
11RHOB 0.0
12TNPH 0.0
13ECGR 0.0
14DTCO 0.0
15dtype: float64
16
17--------------------------------------------------------
18
19Well Name : Kronos1
20Well Start : 531.0 Well Stop : 5322.5
21Missing Value % DEPT 0.000000
22RHOB 74.989566
23TNPH 75.156511
24ECGR 2.493740
25DTCO 87.447830
26dtype: float64
27New Well Start : 4750.0 New Well Stop : 5000.0 Datapoints : 501
28New Missing Value % DEPT 0.000000
29RHOB 6.187625
30TNPH 6.786427
31ECGR 0.000000
32DTCO 0.000000
33dtype: float64
34
35--------------------------------------------------------
36
37Well Name : Boreas1
38Well Start : 200.0 Well Stop : 5205.5
39Missing Value % DEPT 0.000000
40RHOB 76.568118
41TNPH 76.538154
42ECGR 3.016380
43DTCO 63.084299
44dtype: float64
45New Well Start : 4000.0 New Well Stop : 5100.0 Datapoints : 2201
46New Missing Value % DEPT 0.000000
47RHOB 2.089959
48TNPH 1.862790
49ECGR 4.134484
50DTCO 1.135847
51dtype: float64
52
53--------------------------------------------------------
54
55Well Name : Proteus1
56Well Start : 489.5 Well Stop : 5249.5
57Missing Value % DEPT 0.000000
58RHOB 93.225502
59TNPH 92.826384
60ECGR 1.890558
61DTCO 52.221405
62dtype: float64
63New Well Start : 4950.0 New Well Stop : 5200.0 Datapoints : 501
64New Missing Value % DEPT 0.0
65RHOB 0.0
66TNPH 0.0
67ECGR 0.0
68DTCO 0.0
69dtype: float64
70
71--------------------------------------------------------
72
73Well Name : Pharos1
74Well Start : 482.0 Well Stop : 5225.5
75Missing Value % DEPT 0.000000
76RHOB 75.231872
77TNPH 74.936762
78ECGR 0.221332
79DTCO 76.117201
80dtype: float64
81New Well Start : 4000.0 New Well Stop : 5000.0 Datapoints : 2001
82New Missing Value % DEPT 0.000000
83RHOB 0.499750
84TNPH 0.549725
85ECGR 0.000000
86DTCO 5.247376
87dtype: float64
88
89--------------------------------------------------------
90
91Well Name : Torosa1L
92Well Start : 555.0 Well Stop : 4683.0
93Missing Value % DEPT 0.000000
94RHOB 73.610270
95TNPH 70.812644
96ECGR 0.181664
97DTCO 7.036454
98dtype: float64
99New Well Start : 3600.0 New Well Stop : 4600.0 Datapoints : 2001
100New Missing Value % DEPT 0.0
101RHOB 0.0
102TNPH 0.0
103ECGR 0.0
104DTCO 0.0
105dtype: float64
106
107--------------------------------------------------------
108
109Well Name : PoseidonNorth1
110Well Start : 508.0 Well Stop : 5279.5
111Missing Value % DEPT 0.000000
112RHOB 75.901090
113TNPH 75.869656
114ECGR 18.692372
115DTCO 67.026404
116dtype: float64
117New Well Start : 3800.0 New Well Stop : 4900.0 Datapoints : 2201
118New Missing Value % DEPT 0.0
119RHOB 0.0
120TNPH 0.0
121ECGR 0.0
122DTCO 0.0
123dtype: float64
124
125--------------------------------------------------------
126
127Well Name : Poseidon2
128Well Start : 490.0 Well Stop : 5351.0
129Missing Value % DEPT 0.000000
130RHOB 75.079708
131TNPH 74.154068
132ECGR 0.123419
133DTCO 41.571531
134dtype: float64
135New Well Start : 4100.0 New Well Stop : 5000.0 Datapoints : 1801
136New Missing Value % DEPT 0.000000
137RHOB 0.000000
138TNPH 0.000000
139ECGR 0.000000
140DTCO 1.554692
141dtype: float64
142
143--------------------------------------------------------
# Concatenating well data into one dataframe
# Adding a column called 'Well' to distinguish the data points of each well
for well in well_list:
    las_df[well]['Well'] = well

well_df = pd.concat(las_df.values(), axis=0)
well_df = well_df[columns_of_interest + ['Well']]
well_df.head()
        RHOB   TNPH     ECGR     DTCO       Well
8041  2.5211  10.10  34.3997  70.4003  Poseidon1
8042  2.5112  12.40  42.8228  80.4120  Poseidon1
8043  2.5277  15.30  60.1452  82.5646  Poseidon1
8044  2.5127  12.28  36.4007  72.5366  Poseidon1
8045  2.5271  10.16  35.3279  70.7520  Poseidon1

Missing Value Treatment

missing_values_df = pd.DataFrame(index=well_df.columns.values)

for well_name in well_df['Well'].unique():
    condition = well_df['Well'] == well_name
    subset = well_df[condition]
    missing_values_df[well_name] = 100 * subset.isnull().sum() / subset.shape[0]

missing_values_df = missing_values_df.T
missing_values_df.sort_values(by=['DTCO'], ascending=[True]).style.bar()
                    RHOB      TNPH      ECGR      DTCO      Well
Poseidon1       0.000000  0.000000  0.000000  0.000000  0.000000
Kronos1         6.187625  6.786427  0.000000  0.000000  0.000000
Proteus1        0.000000  0.000000  0.000000  0.000000  0.000000
Torosa1L        0.000000  0.000000  0.000000  0.000000  0.000000
PoseidonNorth1  0.000000  0.000000  0.000000  0.000000  0.000000
Boreas1         2.089959  1.862790  4.134484  1.135847  0.000000
Poseidon2       0.000000  0.000000  0.000000  1.554692  0.000000
Pharos1         0.499750  0.549725  0.000000  5.247376  0.000000

Train & Test split

training_wells = ['PoseidonNorth1', 'Torosa1L', 'Boreas1', 'Poseidon2', 'Kronos1', 'Proteus1']
testing_wells = [well for well in well_list if well not in training_wells]

print('Training Wells : ', training_wells)
print('Testing Wells : ', testing_wells)

train_dataset = pd.concat([las_df[well] for well in training_wells], axis=0)
test_dataset = pd.concat([las_df[well] for well in testing_wells], axis=0)
train_dataset.head()

Training Wells :  ['PoseidonNorth1', 'Torosa1L', 'Boreas1', 'Poseidon2', 'Kronos1', 'Proteus1']
Testing Wells :  ['Poseidon1', 'Pharos1']
        DEPT    RHOB     TNPH     ECGR     DTCO            Well
6584  3800.0  2.5913  35.5945  83.2622  98.2931  PoseidonNorth1
6585  3800.5  2.5680  34.2785  73.9849  98.1903  PoseidonNorth1
6586  3801.0  2.6482  29.6083  83.8913  98.1472  PoseidonNorth1
6587  3801.5  2.7102  34.2985  91.5737  95.7302  PoseidonNorth1
6588  3802.0  2.9027  34.7045  86.1709  95.8680  PoseidonNorth1
train_dataset.shape
(9206, 6)
100 * train_dataset.isnull().sum() / train_dataset.shape[0]

DEPT    0.000000
RHOB    0.836411
TNPH    0.814686
ECGR    0.988486
DTCO    0.575711
Well    0.000000
dtype: float64
train_dataset.dropna(subset=columns_of_interest, inplace=True)
test_dataset.dropna(subset=columns_of_interest, inplace=True)

Correlation

# Pairplot
import seaborn as sns
sns.set_style('whitegrid')
sns.pairplot(data=train_dataset, vars=['RHOB', 'TNPH', 'ECGR', 'DTCO'], diag_kind='kde')

[pair plot of RHOB, TNPH, ECGR and DTCO]

fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(train_dataset[columns_of_interest].corr(method='spearman'), annot=True, ax=ax, center=0)
plt.title('Correlations')
plt.show()

[Spearman correlation heatmap of the predictors and DTCO]

Normalization

# move the depth column to the end of each dataframe
depth_train, depth_pred = train_dataset.pop('DEPT'), test_dataset.pop('DEPT')
train_dataset['DEPT'], test_dataset['DEPT'] = depth_train, depth_pred

columns_of_interest
['RHOB', 'TNPH', 'ECGR', 'DTCO']
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PowerTransformer

scaler = PowerTransformer(method='yeo-johnson')
column_drop = ['WELL', 'DEPT']
ct = ColumnTransformer([('transform', scaler, columns_of_interest)], remainder='passthrough')

train_dataset_norm = ct.fit_transform(train_dataset)
train_dataset_norm = pd.DataFrame(train_dataset_norm, columns=train_dataset.columns)
train_dataset_norm
          RHOB      TNPH      ECGR      DTCO            Well    DEPT
0     0.194983   1.41744  0.288585  1.069585  PoseidonNorth1  3800.0
1    -0.024624  1.295228    0.0322  1.061787  PoseidonNorth1  3800.5
2      0.74861  0.851648  0.305598  1.058519  PoseidonNorth1  3801.0
3     1.380549  1.297094  0.509886  0.876203  PoseidonNorth1  3801.5
4     3.543468  1.334914  0.366876  0.886546  PoseidonNorth1  3802.0
...        ...       ...       ...       ...             ...     ...
8969  -3.48235 -0.712419 -1.350794 -1.010424        Proteus1  5198.0
8970  1.775529  -0.25349  0.321077 -1.018072        Proteus1  5198.5
8971  1.506393 -0.751591 -2.436269 -1.541108        Proteus1  5199.0
8972  2.664714 -2.337805 -2.461197 -1.597139        Proteus1  5199.5
8973  0.283779 -2.130702 -1.708333 -1.674272        Proteus1  5200.0

8974 rows × 6 columns

# Pair plot after normalization
sns.pairplot(data=train_dataset_norm, vars=['RHOB', 'TNPH', 'ECGR', 'DTCO'], diag_kind='kde')

[pair plot after normalization]

Outlier Removal

train_dataset_drop = train_dataset_norm.copy()
train_dataset_drop = train_dataset_drop.drop(columns=['Well', 'DEPT'])
train_dataset_drop
          RHOB      TNPH      ECGR      DTCO
0     0.194983   1.41744  0.288585  1.069585
1    -0.024624  1.295228    0.0322  1.061787
2      0.74861  0.851648  0.305598  1.058519
3     1.380549  1.297094  0.509886  0.876203
4     3.543468  1.334914  0.366876  0.886546
...        ...       ...       ...       ...
8969  -3.48235 -0.712419 -1.350794 -1.010424
8970  1.775529  -0.25349  0.321077 -1.018072
8971  1.506393 -0.751591 -2.436269 -1.541108
8972  2.664714 -2.337805 -2.461197 -1.597139
8973  0.283779 -2.130702 -1.708333 -1.674272

8974 rows × 4 columns

from sklearn.svm import OneClassSVM

svm = OneClassSVM(nu=0.1)
yhat = svm.fit_predict(train_dataset_drop)
mask = yhat != -1
train_dataset_svm = train_dataset_norm[mask]
print('Number of points after outliers removed with One-class SVM :', len(train_dataset_svm))

Number of points after outliers removed with One-class SVM : 8076

Modelling

y_train = train_dataset_svm['DTCO']
X_train = train_dataset_svm[['RHOB', 'TNPH', 'ECGR']]

# Derived Features
from scipy.ndimage import median_filter

for col in X_train.columns:
    X_train[col + '_GRAD'] = np.gradient(X_train[col].values)
    X_train[col + '_AVG'] = median_filter(X_train[col].values.tolist(), 10, mode='nearest')
X_train.columns

Index(['RHOB', 'TNPH', 'ECGR', 'RHOB_GRAD', 'RHOB_AVG', 'TNPH_GRAD',
       'TNPH_AVG', 'ECGR_GRAD', 'ECGR_AVG'],
      dtype='object')
# Random Forest
from sklearn.ensemble import RandomForestRegressor

rf_model = RandomForestRegressor()
rf_model.fit(X_train, y_train)
print('Train Accuracy :', rf_model.score(X_train, y_train))   # .score() returns R-squared

# for well in test_wells :
#     X_test, y_test = process_test_set(test_dataset, well, scaler)
#     print('Test Accuracy in ' + well + ' : ', rf_model.score(X_test_scaled, y_test))

Train Accuracy : 0.9917603654437802
# Gradient Boosting
from sklearn.ensemble import GradientBoostingRegressor

gb_model = GradientBoostingRegressor()
gb_model.fit(X_train, y_train)
print('Train Accuracy :', gb_model.score(X_train, y_train))

Train Accuracy : 0.9081315988009683

Hyperparameter Tuning

# estimator = GradientBoostingRegressor()

# ## Hyperparameters
# max_depth = [10, 20, 30, 50, 100]
# min_samples_leaf = [1, 2, 3, 4]
# min_samples_split = [2, 10]
# n_estimators = [100, 200, 300]

# param_grid = {'n_estimators': n_estimators, 'max_depth': max_depth, 'min_samples_leaf': min_samples_leaf, 'min_samples_split': min_samples_split}

# from sklearn.model_selection import RandomizedSearchCV
# search = RandomizedSearchCV(estimator, param_grid, cv=5, verbose=1, n_jobs=-1)
# search.fit(X_train, y_train)
# search.best_params_
# estimator = RandomForestRegressor()
# ## Hyperparameters
# max_depth = [10, 20, 30, 50, 100]
# min_samples_leaf = [1, 2, 3, 4]
# min_samples_split = [2, 10]
# n_estimators = [100, 200, 300]

# param_grid = {'n_estimators': n_estimators, 'max_depth': max_depth, 'min_samples_leaf': min_samples_leaf, 'min_samples_split': min_samples_split}

# from sklearn.model_selection import RandomizedSearchCV
# search = RandomizedSearchCV(estimator, param_grid, cv=5, verbose=1, n_jobs=-1, n_iter=20)
# search.fit(X_train, y_train)
# search.best_params_

# search.best_estimator_
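
The searches above were run once to find the settings used below and then commented out. A minimal sketch of reproducing the gradient-boosting search, mirroring the commented cell (the random_state is an added assumption for repeatability, not in the original):

# Sketch only: re-running the randomized search that produced the tuned settings below.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor

param_grid = {
    'n_estimators'      : [100, 200, 300],
    'max_depth'         : [10, 20, 30, 50, 100],
    'min_samples_leaf'  : [1, 2, 3, 4],
    'min_samples_split' : [2, 10],
}

search = RandomizedSearchCV(GradientBoostingRegressor(), param_grid,
                            cv=5, verbose=1, n_jobs=-1, random_state=42)
search.fit(X_train, y_train)
print(search.best_params_)

RandomizedSearchCV samples parameter combinations from the grid rather than evaluating all of them exhaustively, which keeps the search affordable for the tree-ensemble models used here.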
# New Gradient Boosting model with the best hyperparameters
gb_model_tuned = GradientBoostingRegressor(max_depth=20, n_estimators=100, min_samples_leaf=4, min_samples_split=2)
gb_model_tuned.fit(X_train, y_train)
print('Train Accuracy :', gb_model_tuned.score(X_train, y_train))

Train Accuracy : 0.9998761339778823
rf_model_tuned = RandomForestRegressor(max_depth=50, min_samples_leaf=2, min_samples_split=10,
                                       n_estimators=200)
rf_model_tuned.fit(X_train, y_train)

RandomForestRegressor(max_depth=50, min_samples_leaf=2, min_samples_split=10,
                      n_estimators=200)

Test Wells

# normalisation
test_dataset_norm = pd.DataFrame(ct.transform(test_dataset), columns=test_dataset.columns)

test_dataset_norm
          RHOB      TNPH      ECGR      DTCO       Well    DEPT
0    -0.454429 -1.257473 -1.226668 -0.912366  Poseidon1  4600.0
1    -0.543094 -0.975991 -0.928349 -0.233124  Poseidon1  4600.5
2    -0.394922 -0.638131 -0.372358 -0.082209  Poseidon1  4601.0
3    -0.529706 -0.990353 -1.153705 -0.770638  Poseidon1  4601.5
4    -0.400345 -1.249951 -1.192646 -0.889156  Poseidon1  4602.0
...        ...       ...       ...       ...        ...     ...
2579 -0.847423 -1.802829 -2.111444 -1.249448    Pharos1  4998.0
2580 -0.406668 -1.645907 -1.818261 -1.338187    Pharos1  4998.5
2581 -1.341027 -1.628059 -1.834552  -1.27002    Pharos1  4999.0
2582 -2.260172 -2.131092 -2.367431 -1.088848    Pharos1  4999.5
2583 -2.370962 -2.157379  -2.45466 -0.973624    Pharos1  5000.0

2584 rows × 6 columns

# outlier removal
test_dataset_drop = test_dataset_norm.copy()

yhat = svm.predict(test_dataset_drop[['RHOB', 'TNPH', 'ECGR', 'DTCO']])
mask = yhat != -1
test_dataset_svm = test_dataset_norm[mask]
test_dataset_svm_unscaled = test_dataset[mask]
print('Number of points after outliers removed with One-class SVM :', len(test_dataset_svm))

Number of points after outliers removed with One-class SVM : 2152
X_test = []
y_test = []

for i, well in enumerate(testing_wells):
    test = test_dataset_svm.loc[test_dataset_svm['Well'] == well]
    test = test.drop(columns=['Well', 'DEPT'])

    # Derived Features, as for the training set
    for col in ['RHOB', 'TNPH', 'ECGR']:
        test[col + '_GRAD'] = np.gradient(test[col].values)
        test[col + '_AVG'] = median_filter(test[col].values.tolist(), 10, mode='nearest')

    y_test.append(test.pop('DTCO').values)
    X_test.append(test.values)
# predictions
from sklearn.metrics import mean_squared_error

y_test_pred_gb = []
y_test_pred_rf = []

for i in range(len(X_test)):
    y_test_pred_gb.append(gb_model_tuned.predict(X_test[i]))
    y_test_pred_rf.append(rf_model_tuned.predict(X_test[i]))
    print('\n------------------------')
    print('Test Accuracy of Well using Gradient Boosting Model ', testing_wells[i], ' : ', gb_model_tuned.score(X_test[i], y_test[i]))
    print('MSE of Well using Gradient Boosting Model ', testing_wells[i], ' : ', mean_squared_error(y_test[i], y_test_pred_gb[i]))
    print('Test Accuracy of Well using RF Model ', testing_wells[i], ' : ', rf_model_tuned.score(X_test[i], y_test[i]))
    print('MSE of Well using RF Model ', testing_wells[i], ' : ', mean_squared_error(y_test[i], y_test_pred_rf[i]))
------------------------
Test Accuracy of Well using Gradient Boosting Model  Poseidon1  :  0.7519203113851671
MSE of Well using Gradient Boosting Model  Poseidon1  :  0.162851261183905
Test Accuracy of Well using RF Model  Poseidon1  :  0.7812918613913131
MSE of Well using RF Model  Poseidon1  :  0.14357038418775003

------------------------
Test Accuracy of Well using Gradient Boosting Model  Pharos1  :  0.7545701941806116
MSE of Well using Gradient Boosting Model  Pharos1  :  0.09787985129034865
Test Accuracy of Well using RF Model  Pharos1  :  0.7866187967475008
MSE of Well using RF Model  Pharos1  :  0.08509854935011467

Using Artificial Neural Networks

import tensorflow as tf
tf.random.set_seed(42)
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.callbacks import ModelCheckpoint
12021-07-11 19:25:27.814036: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
22021-07-11 19:25:27.814077: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
# Over-parameterised ANN
model = Sequential([
    Dense(128, activation='relu', kernel_initializer='normal', input_shape=(9,)),
    BatchNormalization(),
    Dropout(0.10),

    Dense(64, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(32, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(16, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(8, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(4, kernel_initializer='normal', activation='relu'),
    BatchNormalization(),
    Dropout(0.10),

    Dense(units=1, kernel_initializer='normal')
])

model.compile(optimizer='adam', loss='mean_squared_error')
print(model.summary())

model_checkpoint = ModelCheckpoint('best_model', monitor='val_loss', mode='min', save_best_only=True, verbose=1)

# keep the initial learning rate for 10 epochs, then decay it exponentially
def scheduler(epoch, lr):
    if epoch <= 10:
        return lr
    else:
        return lr * tf.math.exp(-0.1 * epoch)

lr_callback = tf.keras.callbacks.LearningRateScheduler(scheduler)

X_train_tf = tf.convert_to_tensor(X_train.values, np.float32)
y_train_tf = tf.convert_to_tensor(y_train.values, np.float32)
12021-07-11 19:25:32.081060: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
22021-07-11 19:25:32.081302: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
32021-07-11 19:25:32.081316: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
42021-07-11 19:25:32.081333: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (p-fbd0b10b-3636-4896-bb87-fbf506835cad): /proc/driver/nvidia/version does not exist
52021-07-11 19:25:32.081543: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
6To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
72021-07-11 19:25:32.081688: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
8Model: "sequential"
9_________________________________________________________________
10Layer (type) Output Shape Param #
11=================================================================
12dense (Dense) (None, 128) 1280
13_________________________________________________________________
14batch_normalization (BatchNo (None, 128) 512
15_________________________________________________________________
16dropout (Dropout) (None, 128) 0
17_________________________________________________________________
18dense_1 (Dense) (None, 64) 8256
19_________________________________________________________________
20batch_normalization_1 (Batch (None, 64) 256
21_________________________________________________________________
22dropout_1 (Dropout) (None, 64) 0
23_________________________________________________________________
24dense_2 (Dense) (None, 32) 2080
25_________________________________________________________________
26batch_normalization_2 (Batch (None, 32) 128
27_________________________________________________________________
28dropout_2 (Dropout) (None, 32) 0
29_________________________________________________________________
30dense_3 (Dense) (None, 16) 528
31_________________________________________________________________
32batch_normalization_3 (Batch (None, 16) 64
33_________________________________________________________________
34dropout_3 (Dropout) (None, 16) 0
35_________________________________________________________________
36dense_4 (Dense) (None, 8) 136
37_________________________________________________________________
38batch_normalization_4 (Batch (None, 8) 32
39_________________________________________________________________
40dropout_4 (Dropout) (None, 8) 0
41_________________________________________________________________
42dense_5 (Dense) (None, 4) 36
43_________________________________________________________________
44batch_normalization_5 (Batch (None, 4) 16
45_________________________________________________________________
46dropout_5 (Dropout) (None, 4) 0
47_________________________________________________________________
48dense_6 (Dense) (None, 1) 5
49=================================================================
50Total params: 13,329
51Trainable params: 12,825
52Non-trainable params: 504
53_________________________________________________________________
54None
history = model.fit(X_train_tf, y_train_tf, batch_size=32, validation_split=0.3, epochs=40, callbacks=[model_checkpoint, lr_callback])
1Epoch 1/40
22021-07-11 19:25:32.416134: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
32021-07-11 19:25:32.416620: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2499995000 Hz
4177/177 [==============================] - 2s 6ms/step - loss: 0.6706 - val_loss: 1.0109
5
6Epoch 00001: val_loss improved from inf to 1.01087, saving model to best_model
72021-07-11 19:25:35.935800: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
8INFO:tensorflow:Assets written to: best_model/assets
9Epoch 2/40
10177/177 [==============================] - 1s 3ms/step - loss: 0.2181 - val_loss: 0.6717
11
12Epoch 00002: val_loss improved from 1.01087 to 0.67171, saving model to best_model
13INFO:tensorflow:Assets written to: best_model/assets
14Epoch 3/40
15177/177 [==============================] - 1s 3ms/step - loss: 0.1860 - val_loss: 0.2183
16
17Epoch 00003: val_loss improved from 0.67171 to 0.21830, saving model to best_model
18INFO:tensorflow:Assets written to: best_model/assets
19Epoch 4/40
20177/177 [==============================] - 1s 3ms/step - loss: 0.1927 - val_loss: 0.1277
21
22Epoch 00004: val_loss improved from 0.21830 to 0.12774, saving model to best_model
23INFO:tensorflow:Assets written to: best_model/assets
24Epoch 5/40
25177/177 [==============================] - 1s 4ms/step - loss: 0.1761 - val_loss: 0.1395
26
27Epoch 00005: val_loss did not improve from 0.12774
28Epoch 6/40
29177/177 [==============================] - 1s 3ms/step - loss: 0.1792 - val_loss: 0.1409
30
31Epoch 00006: val_loss did not improve from 0.12774
32Epoch 7/40
33177/177 [==============================] - 1s 3ms/step - loss: 0.1682 - val_loss: 0.1790
34
35Epoch 00007: val_loss did not improve from 0.12774
36Epoch 8/40
37177/177 [==============================] - 1s 3ms/step - loss: 0.1706 - val_loss: 0.1403
38
39Epoch 00008: val_loss did not improve from 0.12774
40Epoch 9/40
41177/177 [==============================] - 1s 3ms/step - loss: 0.1751 - val_loss: 0.1392
42
43Epoch 00009: val_loss did not improve from 0.12774
44Epoch 10/40
45177/177 [==============================] - 1s 3ms/step - loss: 0.1725 - val_loss: 0.1580
46
47Epoch 00010: val_loss did not improve from 0.12774
48Epoch 11/40
49177/177 [==============================] - 1s 3ms/step - loss: 0.1775 - val_loss: 0.1340
50
51Epoch 00011: val_loss did not improve from 0.12774
52Epoch 12/40
53177/177 [==============================] - 1s 3ms/step - loss: 0.1675 - val_loss: 0.1367
54
55Epoch 00012: val_loss did not improve from 0.12774
56Epoch 13/40
57177/177 [==============================] - 1s 3ms/step - loss: 0.1543 - val_loss: 0.1406
58
59Epoch 00013: val_loss did not improve from 0.12774
60Epoch 14/40
61177/177 [==============================] - 1s 3ms/step - loss: 0.1471 - val_loss: 0.1436
62
63Epoch 00014: val_loss did not improve from 0.12774
64Epoch 15/40
65177/177 [==============================] - 1s 4ms/step - loss: 0.1496 - val_loss: 0.1432
66
67Epoch 00015: val_loss did not improve from 0.12774
68Epoch 16/40
69177/177 [==============================] - 1s 3ms/step - loss: 0.1599 - val_loss: 0.1423
70
71Epoch 00016: val_loss did not improve from 0.12774
72Epoch 17/40
73177/177 [==============================] - 1s 3ms/step - loss: 0.1613 - val_loss: 0.1413
74
75Epoch 00017: val_loss did not improve from 0.12774
76Epoch 18/40
77177/177 [==============================] - 1s 3ms/step - loss: 0.1445 - val_loss: 0.1421
78
79Epoch 00018: val_loss did not improve from 0.12774
80Epoch 19/40
81177/177 [==============================] - 1s 3ms/step - loss: 0.1512 - val_loss: 0.1420
82
83Epoch 00019: val_loss did not improve from 0.12774
84Epoch 20/40
85177/177 [==============================] - 1s 3ms/step - loss: 0.1469 - val_loss: 0.1416
86
87Epoch 00020: val_loss did not improve from 0.12774
88Epoch 21/40
89177/177 [==============================] - 1s 3ms/step - loss: 0.1556 - val_loss: 0.1422
90
91Epoch 00021: val_loss did not improve from 0.12774
92Epoch 22/40
93177/177 [==============================] - 1s 3ms/step - loss: 0.1545 - val_loss: 0.1425
94
95Epoch 00022: val_loss did not improve from 0.12774
96Epoch 23/40
97177/177 [==============================] - 1s 3ms/step - loss: 0.1496 - val_loss: 0.1426
98
99Epoch 00023: val_loss did not improve from 0.12774
100Epoch 24/40
101177/177 [==============================] - 1s 3ms/step - loss: 0.1540 - val_loss: 0.1418
102
103Epoch 00024: val_loss did not improve from 0.12774
104Epoch 25/40
105177/177 [==============================] - 1s 3ms/step - loss: 0.1472 - val_loss: 0.1407
106
107Epoch 00025: val_loss did not improve from 0.12774
108Epoch 26/40
109177/177 [==============================] - 1s 3ms/step - loss: 0.1550 - val_loss: 0.1411
110
111Epoch 00026: val_loss did not improve from 0.12774
112Epoch 27/40
113177/177 [==============================] - 1s 3ms/step - loss: 0.1572 - val_loss: 0.1428
114
115Epoch 00027: val_loss did not improve from 0.12774
116Epoch 28/40
117177/177 [==============================] - 1s 3ms/step - loss: 0.1663 - val_loss: 0.1426
118
119Epoch 00028: val_loss did not improve from 0.12774
120Epoch 29/40
121177/177 [==============================] - 1s 3ms/step - loss: 0.1568 - val_loss: 0.1418
122
123Epoch 00029: val_loss did not improve from 0.12774
124Epoch 30/40
125177/177 [==============================] - 1s 3ms/step - loss: 0.1565 - val_loss: 0.1426
126
127Epoch 00030: val_loss did not improve from 0.12774
128Epoch 31/40
129177/177 [==============================] - 1s 3ms/step - loss: 0.1573 - val_loss: 0.1442
130
131Epoch 00031: val_loss did not improve from 0.12774
132Epoch 32/40
133177/177 [==============================] - 1s 3ms/step - loss: 0.1512 - val_loss: 0.1425
134
135Epoch 00032: val_loss did not improve from 0.12774
136Epoch 33/40
137177/177 [==============================] - 1s 4ms/step - loss: 0.1498 - val_loss: 0.1419
138
139Epoch 00033: val_loss did not improve from 0.12774
140Epoch 34/40
141177/177 [==============================] - 1s 3ms/step - loss: 0.1529 - val_loss: 0.1418
142
143Epoch 00034: val_loss did not improve from 0.12774
144Epoch 35/40
145177/177 [==============================] - 1s 3ms/step - loss: 0.1542 - val_loss: 0.1419
146
147Epoch 00035: val_loss did not improve from 0.12774
148Epoch 36/40
149177/177 [==============================] - 1s 3ms/step - loss: 0.1591 - val_loss: 0.1432
150
151Epoch 00036: val_loss did not improve from 0.12774
152Epoch 37/40
153177/177 [==============================] - 1s 3ms/step - loss: 0.1611 - val_loss: 0.1430
154
155Epoch 00037: val_loss did not improve from 0.12774
156Epoch 38/40
157177/177 [==============================] - 1s 3ms/step - loss: 0.1492 - val_loss: 0.1423
158
159Epoch 00038: val_loss did not improve from 0.12774
160Epoch 39/40
161177/177 [==============================] - 1s 3ms/step - loss: 0.1604 - val_loss: 0.1420
162
163Epoch 00039: val_loss did not improve from 0.12774
164Epoch 40/40
165177/177 [==============================] - 1s 3ms/step - loss: 0.1460 - val_loss: 0.1428
166
167Epoch 00040: val_loss did not improve from 0.12774
def plot_model_history(history):
    fig, axes = plt.subplots(nrows=1, ncols=1)
    axes.plot(history.history['loss'])
    axes.plot(history.history['val_loss'])
    axes.legend(['loss', 'val_loss'])

plot_model_history(history)

[training and validation loss curves]

y_test_pred_nn = []

for i in range(len(X_test)):
    print('Test Well : ', testing_wells[i])
    X_test_tf = tf.convert_to_tensor(X_test[i], np.float32)
    y_test_tf = tf.convert_to_tensor(y_test[i], np.float32)
    y_test_pred_nn.append(model.predict(X_test_tf))
    print('MSE :', model.evaluate(X_test_tf, y_test_tf))
    print('\n---------------------\n')
Test Well :  Poseidon1
16/16 [==============================] - 0s 1ms/step - loss: 0.1655
MSE : 0.16550424695014954

---------------------

Test Well :  Pharos1
52/52 [==============================] - 0s 883us/step - loss: 0.0743
MSE : 0.07427336275577545

---------------------
# Rescale the ANN predictions back to DTCO units with the fitted power transformer
for i in range(len(testing_wells)):
    preds = y_test_pred_nn[i]
    rows = preds.shape[0]

    # dummy frame: zeros for the predictors, predictions in the DTCO column
    df = pd.DataFrame(columns=columns_of_interest, index=range(rows))
    for j in range(len(columns_of_interest) - 1):
        df[columns_of_interest[j]] = 0
    df['DTCO'] = preds

    rescaled_df = pd.DataFrame(ct.named_transformers_['transform'].inverse_transform(df), columns=columns_of_interest)

    condition = test_dataset_svm_unscaled['Well'] == testing_wells[i]
    test_dataset_svm_unscaled.loc[condition, 'DTCO_Predicted'] = rescaled_df['DTCO'].values
test_dataset_svm_unscaled

        RHOB     TNPH     ECGR     DTCO       Well    DEPT  DTCO_Predicted
8041  2.5211  10.1000  34.3997  70.4003  Poseidon1  4600.0       73.254893
8042  2.5112  12.4000  42.8228  80.4120  Poseidon1  4600.5       72.147674
8043  2.5277  15.3000  60.1452  82.5646  Poseidon1  4601.0       75.254447
8044  2.5127  12.2800  36.4007  72.5366  Poseidon1  4601.5       72.460500
8045  2.5271  10.1600  35.3279  70.7520  Poseidon1  4602.0       76.343241
...      ...      ...      ...      ...        ...     ...             ...
9030  2.4565   6.8472  16.4082  64.9338    Pharos1  4997.0       65.889312
9031  2.4493   6.0554  13.1265  64.0184    Pharos1  4997.5       65.930052
9032  2.4766   5.9867  13.4912  65.2055    Pharos1  4998.0       66.617091
9033  2.5264   7.1198  19.6898  63.8091    Pharos1  4998.5       66.574747
9034  2.4183   7.2514  19.3252  64.8829    Pharos1  4999.0       66.289676

2152 rows × 7 columns
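
To check the physical-consistency claim from the summary (transit time should fall as bulk density rises), one can inspect the correlation between the predicted DTCO and RHOB on the test wells. A minimal sketch, using the dataframe above (not part of the original notebook):

# Correlation between predicted sonic and bulk density on the test wells.
# A strongly negative coefficient supports the expected physical relationship.
for well in testing_wells:
    subset = test_dataset_svm_unscaled[test_dataset_svm_unscaled['Well'] == well]
    corr = subset['RHOB'].corr(subset['DTCO_Predicted'], method='spearman')
    print(well, ': Spearman correlation RHOB vs DTCO_Predicted =', round(corr, 3))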

Plotting Results

# Plotting Results
def plot_logs(well):
    columns = columns_of_interest + ['DTCO_Predicted']
    fig, ax = plt.subplots(nrows=1, ncols=len(columns), figsize=(20, 10))
    colors = ['black', 'red', 'blue', 'green', 'purple', 'black', 'orange']

    for i, log in enumerate(columns):
        if log not in ['DEPT', 'Well']:
            ax[i].plot(well[log], well['DEPT'], color=colors[i])
            ax[i].set_title(log)
            ax[i].grid(True)

    ax[2].set_xlim(0, 300)
    plt.tight_layout(pad=1.1)
    plt.show()

for well in testing_wells:
    print('Well Name :', well)
    condition = test_dataset_svm_unscaled['Well'] == well

    plot_logs(test_dataset_svm_unscaled[condition])
Well Name : Poseidon1
[log plots including DTCO_Predicted]

Well Name : Pharos1
[log plots including DTCO_Predicted]
