TURKISH JOURNAL OF ONCOLOGY

Summary

OBJECTIVE
Outcomes in patients with gastric cancer are multifactorial, and no single factor can predict disease survival on its own. Machine learning (ML) algorithms have recently become popular in medicine. We aimed to evaluate the power of ML algorithms for predicting deaths due to gastric cancer.

METHODS
We reevaluated previously published retrospective data. Seven different ML algorithms (logistic regression [LR], artificial neural networks/multilayer perceptron, gradient boosted trees, support vector machine, random forest, naive Bayes, and probabilistic neural network) were used to predict disease-related deaths from the variables that significantly affected disease-specific survival (DSS) in univariate analysis.

RESULTS
Median follow-up time was 34 months (4-156 months), and death with disease occurred in 194 (86.6%) patients during follow-up. The median DSS was 22 (4-139) months. Using the variables that significantly affected DSS in univariate analysis, LR achieved the highest accuracy rate (99%), with only one patient classified incorrectly.

CONCLUSION
ML algorithms can successfully predict treatment outcomes, such as disease-related death, in patients with gastric cancer.

Introduction

Gastric cancer, a common cause of cancer-related death, is the sixth most common cancer worldwide.[1] Although the TNM stage is the most valuable prognostic factor, lymphovascular space invasion (LVSI), grade, surgery type, and performance score are other factors that can affect patient survival.[2-4] Gastrectomy plus regional lymph node dissection remains the primary treatment for surgically removable disease, but surgery without any pre-operative or post-operative treatment provides a 5-year overall survival (OS) rate of only 20-30%.[5,6] Several randomized trials have shown a survival benefit of adjuvant treatment over surgery alone in potentially operable patients.[2,4,5] Adjuvant chemoradiotherapy (ChRT), following the SWOG 9008/INT-0116 trial, and perioperative (pre-operative plus post-operative) chemotherapy (ChT), following the MAGIC trial, are the treatment options that remain in use today.[2,6] While the most significant prognostic factors are tumor spread, tumor size, and lymph node metastasis status, tumor grade, histology, surgical margin, tumor localization, and performance status are also used in treatment decisions and in estimating prognosis.[2-4,7]

As in many other fields, the problem in patients diagnosed with gastric cancer is multifactorial; that is, many variables contribute to treatment results. We cannot use one factor alone to predict disease survival, because disease-, patient-, and treatment-related factors are all related to the survival of patients with cancer. Multivariate analysis aims to find patterns and relationships among several variables simultaneously and lets us predict the effect that a change in one variable will have on other variables. It can provide a more accurate depiction and understanding of the behavior of data that are highly correlated with each other. Multivariate analysis techniques are complex, and a statistical program is necessary to perform them. One of the significant limitations of multivariate analysis is that statistical modeling outputs are not always easy for clinicians to interpret. Furthermore, a large sample of data is necessary to obtain meaningful results from multivariate techniques.

Machine learning (ML) has recently become popular in the health sector. Although there is no consensus on which algorithm is the best, ML applications have been studied in several trials that included patients with cancer.[8] Many ML algorithms learn from the data that investigators provide, and the accuracy and efficiency of their decisions improve with subsequent training as new data are added. Although the most significant advantage of ML is the ability to automate various decision-making tasks, its most difficult aspect is acquiring data and the cost of collecting them.

In our study, we aimed to evaluate the power of ML algorithms for predicting deaths due to gastric cancer. To this end, we used the parameters that were statistically significant for disease-specific survival (DSS) in univariate analysis.

Methods

In this study, we reevaluated the retrospective data published as "Prognostic factors for survival in patients with gastric cancer: Single-center experience" by Yaprak et al.[9] We chose these data because they offered a ready-made data set covering a large number of patients diagnosed with gastric cancer. We excluded patients with Stage 4 disease and those with missing data.

Patient Characteristics
The patient characteristics are summarized in Table 1. In our study, the median age was 57 years (range, 22-87), and 66.5% of the patients were male. Total gastrectomy was performed in 168 (50.3%) patients, and subtotal gastrectomy was performed in 166 (49.7%) patients. Pathologic examination showed node-positive disease in 258 (77.8%) patients and node-negative disease in 76 (22.8%) patients. According to staging, 50 (15%) patients were Stage I, 94 (28.2%) patients were Stage II, and 190 (56.8%) patients were Stage III. Perineural invasion (PNI) was identified in 203 (60.8%) patients, and LVSI was identified in 238 (71.3%) patients. Disease was Grade 1 in 41 (12.3%) patients, Grade 2 in 107 (32.0%) patients, and Grade 3 in 186 (55.7%) patients.

Table 1: Demographic and clinicopathologic characteristics of the patients

Treatment and Relapse Patterns in Follow-up
Two hundred and twelve patients (63.5%) were considered eligible for adjuvant ChRT. RT was administered as 45 Gy in 25 fractions in 172 (81.1%) patients and 50.4 Gy in 28 fractions in 27 (12.7%) patients. Thirteen (6.1%) patients could not complete 45 Gy because of toxicity. Two-dimensional and three-dimensional conformal techniques were used in 52 (24.5%) and 160 (75.5%) patients, respectively. All patients received bolus or infusional 5-FU as one cycle before RT and one cycle after RT. The concomitant ChT regimens were bolus fluorouracil and leucovorin, infusional fluorouracil, or oral capecitabine. The characteristics of the received treatments are summarized in Table 2.

Table 2: Treatment characteristics of the patients

ML
In our study, we used seven different ML algorithms to predict DSS: logistic regression (LR), artificial neural networks/multilayer perceptron (ANN/MLP), gradient boosted trees (GBT), support vector machine (SVM), random forest (RF), naive Bayes (NB), and probabilistic neural network (PNN). The inputs to the ML algorithms were the parameters identified as significant for DSS in the univariate analysis.

The LR algorithm fits a curve bounded between 0 and 1 and uses it to estimate probabilities. In constructing the model, the algorithm uses the natural logarithm of the odds of the target variable.[10] The ANN/MLP algorithm imitates the way nerve cells in the human brain, known as neurons, carry information. As it learns from experience, this algorithm tries to find the relationships in the data and to form a meaningful pattern from them.[11] The GBT algorithm, which performs strongly on many data sets, builds an ensemble of gradient-boosted decision trees; it is preferred for its speed and performance and is used to solve classification and regression problems.[12] The SVM algorithm can perform binary or multiclass classification, can generalize on data whose distribution is unknown, and can use what it has learned to predict new data.[13] The RF algorithm is a classification algorithm that can work with missing data and shows high accuracy on large data sets. Because each tree is built from a different subset of the data and variables, the algorithm is less prone to overfitting.[14] The NB algorithm is a classifier that can achieve good classification accuracy and is frequently used to process continuous values.[15] The PNN algorithm is based on Bayes' rule and class probability estimation and aims to minimize the probability of misclassification. PNN, which is used frequently in classification and pattern recognition problems, is built on feed-forward neural networks. It approximates the parent probability distribution function of each class using the Parzen window, a non-parametric model. PNN then uses the parent probability distribution function of each class to estimate the class probability of a new input and, following the Bayesian approach, assigns the new input to the class with the highest probability.[16]
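The PNN decision rule described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name pnn_classify, the Gaussian form of the Parzen window, the smoothing width sigma, the use of class frequencies as priors, and the toy feature values are all hypothetical.

```python
import math

def pnn_classify(train_x, train_y, x, sigma=0.5):
    """Return (predicted_label, posterior_by_class) for one new input x."""
    kernel_sums = {}
    for features, label in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        # Gaussian Parzen window centred on each training sample (assumed kernel)
        kernel = math.exp(-sq_dist / (2 * sigma ** 2))
        kernel_sums[label] = kernel_sums.get(label, 0.0) + kernel
    # Class density (mean kernel) times class prior (class frequency)
    # reduces to the per-class kernel sum divided by the sample count.
    n = len(train_y)
    scores = {label: s / n for label, s in kernel_sums.items()}
    total = sum(scores.values())
    posteriors = {label: s / total for label, s in scores.items()}
    # Bayesian decision: pick the class with the highest posterior
    predicted = max(posteriors, key=posteriors.get)
    return predicted, posteriors

# Hypothetical, already-scaled feature vectors; 1 = disease-related death
train_x = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 1, 1]
label, posteriors = pnn_classify(train_x, train_y, (0.1, 0.0))
print(label, posteriors)
```

A smaller sigma makes the estimate follow individual training samples more closely, while a larger sigma smooths the class densities; the value would normally be tuned on the training data.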

From the retrospective gastric cancer data, we reevaluated patient, disease, and treatment characteristics, such as age, stage, tumor diameter, LVSI, PNI, grade, surgery status, surgery type, radiation technique, and concomitant ChT status. We divided the data set into two groups, one for training the algorithms and one for testing the accuracy of prediction. Patients were distributed between these two groups in a 70%/30% ratio. The models were constructed using the training set and validated using the testing set.
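The 70%/30% partition can be sketched as follows. The text does not state how patients were assigned to the two groups, so the simple random shuffle, the fixed seed, and the function name split_train_test are assumptions made only for illustration; a stratified split by outcome would be an equally plausible choice.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and testing subsets."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical patient identifiers; 334 is the sum of the two
# surgery-type counts (168 + 166) reported under Patient Characteristics.
patient_ids = list(range(334))
train_ids, test_ids = split_train_test(patient_ids)
print(len(train_ids), len(test_ids))
```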

Statistics and Application
The confusion matrix is a matrix created by comparing the actual and predicted classes after the classification process has been applied to the data. It is used to determine the classification performance of the methods used.[17,18] The accuracy rate measures how accurately a method performs the classification. It is calculated by dividing the number of true-positive and true-negative samples by the total number of samples. The error rate is calculated by dividing the number of false-positive and false-negative samples by the total number of samples.[19]
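The accuracy and error rate definitions above can be written out as a small sketch. The function names and the example labels are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) by comparing actual and predicted binary labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy_rate(tp, fp, fn, tn):
    """Correctly classified samples divided by all samples."""
    return (tp + tn) / (tp + fp + fn + tn)

def error_rate(tp, fp, fn, tn):
    """Misclassified samples divided by all samples."""
    return (fp + fn) / (tp + fp + fn + tn)

# Hypothetical labels: 1 = disease-related death, 0 = no disease-related death
actual    = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 1, 1]
counts = confusion_counts(actual, predicted)
print(counts, accuracy_rate(*counts), error_rate(*counts))
```

By construction, the accuracy rate and the error rate always sum to 1.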

We defined DSS as the period from the date of diagnosis to the date of cancer-related death or the last follow-up date. The Kaplan-Meier method was used for survival analysis. A Cox proportional hazards model was used for multivariate analysis to determine independent prognostic factors. All tests were two-sided, and p<0.05 was considered statistically significant.
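For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is a generic illustration of the technique, not the statistical software used in the study; the function name kaplan_meier and the follow-up values are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration per patient (for example, months)
    events : 1 if disease-related death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve

# Hypothetical follow-up data in months
times  = [4, 10, 10, 22, 30, 34]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients lower the number at risk without producing a step in the curve, which is how the method uses follow-up from patients who were alive at last contact.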

Results

Median follow-up time was 34 months (4-156 months). During follow-up, locoregional and/or distant relapse occurred in 204 (61.1%) patients. Among these, locoregional relapse alone occurred in 48 (23.5%) patients, and distant relapse alone occurred in 145 (71.1%) patients. Death occurred in 224 (67.1%) patients, and death with disease occurred in 194 (86.6%) of them. The median OS was 34.5 (4-156) months, and the median DSS was 22 (4-139) months.

The univariate analysis showed that age (<70 vs. ≥70 years, p=0.042), tumor diameter (<5 vs. ≥5 cm, p=0.006), T stage (p<0.001), N stage (p<0.001), stage (p<0.001), LVSI (p=0.005), grade (p<0.001), adjuvant RT dose (<45 Gy vs. ≥45 Gy, p=0.023), and relapse status (p<0.001) were factors affecting DSS.

Based on the univariate analysis results, two multivariate analysis models were constructed. The first model included age, tumor diameter, T stage, N stage, LVSI, grade, adjuvant RT dose, and relapse status; the second model included age, tumor diameter, TNM stage, LVSI, grade, adjuvant RT dose, and relapse status. In multivariate analysis, the independent prognostic factors were the N stage (p=004) in the first model and the TNM stage (p<0.001) in the second.

Table 3 shows the results for the prediction of DSS obtained from the seven ML algorithms using the parameters from the univariate analysis. The LR algorithm had the best accuracy rate, with only one patient classified incorrectly.

Table 3: Recall, precision, sensitivity, specificity, f-measure, and accuracy of machine learning algorithms
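The measures named in the Table 3 caption all derive from the four confusion-matrix counts. The sketch below shows the standard definitions; the function name classification_metrics and the counts passed to it are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, specificity, and F-measure from raw counts."""
    recall = tp / (tp + fn)        # also called sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=6, fp=2, fn=3, tn=9)
print(metrics)
```

Recall and sensitivity are the same quantity under these definitions, which is why the two columns coincide for a binary outcome.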

Discussion

The mortality rate and incidence of gastric cancer differ throughout the world.[20] Surgery is a curative treatment. Despite improvements in surgical techniques, surgery alone, without any pre- or post-operative treatment, provides only a limited OS rate. Randomized studies demonstrated OS between 20% and 30% following surgery alone in patients with operable gastric cancer.[3,21] The survival rate varies with the T and N stage, from 85-90% in patients with T1 tumors to 15-20% in patients with T4 and node-positive disease. Furthermore, locoregional recurrence is a serious concern in resected patients.[22] Given this, a multimodal approach is necessary to improve surgical results.

Stage has been the most commonly used and most effective factor for predicting prognosis in patients with gastric cancer.[23] Similarly, in our study, we found that the N stage and the AJCC stage were negative prognostic factors for DSS. However, receiving different treatments, such as a combination of surgery, ChT, and RT, introduces many more significant risk factors that influence DSS. Therefore, more comprehensive prognostic models, such as nomograms, have been implemented that include demographics and other significant clinical parameters in addition to stage.[24] The nomograms constructed for this purpose over time include several independent prognostic factors. Zhong et al.[25] first presented a nomogram for predicting 10-year DSS in patients with gastric cancer. Age is an important prognostic factor for DSS in many studies, and lower survival has been demonstrated in elderly patients.[26,27] Consistent with these studies, advanced age was an adverse prognostic factor in our assessment.

There is currently no consensus on the optimal ML algorithm for predicting treatment results. Several studies in the literature used clinical, radiological, tissue, and blood genomic data to predict survival by ML in several cancer types.[28-33] However, to the best of our knowledge, no study to date has evaluated ML algorithms for predicting DSS in patients with gastric cancer. In the present study, using our retrospective data, we aimed to compare seven ML methods commonly used in the literature: LR, ANN/MLP, GBT, SVM, RF, NB, and PNN. Although all algorithms had a high accuracy rate of >90% in our study, the algorithm with the highest accuracy in predicting DSS was LR.

The LR algorithm is highly successful in classification problems where the dependent variable is not continuous.[10] Our findings suggest that the problem posed by the data set used in our study is well suited to the LR method. The ANN/MLP algorithm can work with missing data and is successful in solving both regression and classification problems. The literature suggests that the ANN/MLP algorithm achieves better results in large data sets.[11] The weaker performance of the ANN/MLP algorithm on our data set may therefore be related to the number of patients. Because of the outliers in the data set, the GBT algorithm showed lower classification success than the other methods, as it overfitted to those outliers. Since the model contains many tree structures, it is also more expensive in computation time and requires more memory.[12] The SVM algorithm is a simple, practical, high-performance algorithm based on estimating the most suitable function to separate the data. In this algorithm, the number of samples is of little importance, and its advantage over the other algorithms is that it can classify data not seen during training without difficulty.[13] However, finding the optimal plane to separate the samples is critical for the algorithm, and samples in a multifactorial disease such as cancer may not always be separable by a straight line. Since its ability in probabilistic classification is lower than that of the other methods, it performed less well on this problem. The RF algorithm classifies using many decision tree structures and has a high success rate. To create the tree structures, the number of instances to be used in each node and the number of trees to be created must be specified. Given the class ratios in the data set, the algorithm uses 2/3 of the whole data set as training data and 1/3 as test data. RF generally outperforms single decision trees but has lower accuracy than gradient boosted trees. Since the algorithm contains so many tree structures, it is known to be slow in real-time classification problems.[14] Although the RF algorithm showed high performance in our study, it ran more slowly than the other algorithms. The NB algorithm is a simple probabilistic classification method based on Bayes' theorem. The algorithm treats attributes as independent of each other and all examples as equally important. However, it cannot be assumed that each datum has the same significance in patients with cancer.[15] In our study, the parameters and the class information depend on each other. Since the NB algorithm treats the parameters as independent of each other, its performance was lower than that of the other methods. The PNN algorithm is considered relatively insensitive to outliers. Nevertheless, outliers in the data set used in our study negatively affected the performance of the model, and it was less successful than the other methods. Furthermore, the PNN algorithm is slower than multilayer perceptron networks in classifying new cases and requires more memory to store the model.[16]

Conclusion

Predicting prognosis in patients with cancer is vital because it underlies critical clinical decisions regarding treatment and monitoring. Although our study has several limitations, we assessed the potential of predicting disease-related deaths using ML algorithms trained with prognostic parameters. Our results indicate that treatment outcomes in patients with gastric cancer can be predicted with these algorithms, which learn from different data types and give computers the ability to detect complicated patterns and make rational decisions based on the data.

Peer-review: Externally peer-reviewed.

Conflict of Interest: The authors have no conflicts of interest to declare.

Ethics Committee Approval: The study protocol was approved by the University of Health Science, Dr. Lütfi Kırdar Training and Research Hospital Clinical Research Ethics Committee. (Number: 2018/514/136/1, Date: 28/08/2018).

Financial Support: The authors declared that this study has received no financial support.

Authorship contributions: Concept - U.K., G.Y.; Design - U.K., G.Y.; Supervision - A.Y., A.Ö.; Funding - None; Materials - G.Y.; Data collection and/or processing - G.Y.; Data analysis and/or interpretation - U.K., A.Y.; Literature search - U.K., A.Ö.; Writing - U.K.; Critical review - A.Y., G.Y., A.Ö.
