236 Geophysics in Geothermal Exploration

Figure 7.10 Supervised technique as a two-step problem.

In machine learning, and in data science in general, it is recommended to prepare the data by splitting the dataset into three subsets, used respectively for training, validation, and assessment of predictive performance. In practice, in geosciences, well data are expensive, and reservoir samples are rare and often underrepresented, which makes these rigorous recommendations difficult to apply. The data are explored by grouping lithologies or facies, by considering one property or another, and by separating intervals using markers and horizons, in order to propose the best model. The number of input variables is often the limiting factor, preventing the model from explaining an overly complex system.

Another issue, often observed in projects, is the consistency between the data on which the training is based (wells) and the seismic resolution. Ideally, upscaling should be performed to ensure compatibility, but the lack of points, especially in the reservoirs, may also lead to high uncertainty in the prediction.

Finally, considering classification only, the outputs of such approaches are not only labels, but also scores or probabilities. These latter outputs are key to proposing scenarios that account for the uncertainties associated with the predictive model in seismic characterization (Yareshchenko et al., 2021).

Unsupervised approaches

Clustering methods (K-means, Self-Organizing Maps, …) are self-trained algorithms that classify the data according to “typical responses”, labelling the input data as “classes”. These algorithms are applied:
• On maps, for example in risk analysis or seismic fracture characterization (Kumar et al., 2017).
• On horizon slices, considering typical trace shapes, for channel or karst identification (Voutay et al., 2002).
• On volumes, considering each sample, for reservoir, salt or igneous rock identification (Cardoso et al., 2022).
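
As an illustration of sample-by-sample clustering, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the workflow of any of the cited studies. The function name `kmeans`, the two-attribute samples, and the synthetic values are assumptions made for the example; in a real project each sample would carry seismic attributes extracted from the map, horizon slice, or volume, and a library implementation would likely be preferred.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, k, n_iter=50, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of attribute vectors.

    Returns one class label per sample and the k class centroids
    (the "typical responses").
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct samples drawn at random.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample takes the class of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the labelling no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical data: two groups of samples described by two standardized
# attributes each (synthetic values, not real seismic measurements).
data_rng = random.Random(42)
group_a = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(50)]
group_b = [[data_rng.gauss(5.0, 0.5), data_rng.gauss(5.0, 0.5)] for _ in range(50)]
samples = group_a + group_b

labels, centroids = kmeans(samples, k=2)
```

The class numbers returned by such an algorithm are arbitrary: the classes only acquire a geological meaning (reservoir, salt, channel, and so on) once they are interpreted against well data or other prior knowledge.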