TURKISH JOURNAL OF ONCOLOGY 2021 , Vol 36 , Num 4
Prediction of Survival and Progression-free Survival Using Machine Learning in Stage III Lung Cancer: A Pilot Study
Melek YAKAR1,Durmuş ETİZ1,Şenay YILMAZ2,Özer ÇELİK3,Güntülü AK2,Muzaffer METİNTAŞ2
1Department of Radiation Oncology, Eskisehir Osmangazi University Faculty of Medicine, Eskisehir-Turkey
2Department of Chest Diseases, Eskisehir Osmangazi University Faculty of Medicine, Eskisehir-Turkey
3Department of Mathematics and Computer Science, Eskisehir Osmangazi University Faculty of Arts and Sciences, Eskisehir-Turkey
DOI : 10.5505/tjo.2021.2788

Summary

OBJECTIVE
This study aimed to predict the overall survival (OS), survival time, and time to progression in cases diagnosed with Stage III lung cancer.

METHODS
The sample consisted of 585 patients who underwent radiotherapy and chemotherapy with the diagnosis of Stage III lung cancer. OS prediction was undertaken in 324 cases, survival time prediction in 241 who died due to lung cancer, and prediction of time to progression in 223 who showed progression during follow-up. Twenty-seven variables were evaluated, and logistic regression, multilayer perceptron classifier (MLP), extreme gradient boosting, support vector classification, random forest classifier (RFC), Gaussian Naive Bayes, and light gradient boosting machine algorithms were used to construct prediction models.

RESULTS
In OS prediction, over a median 21-month follow-up, 255 of 324 cases died and the median OS was 20 (2-101) months. The best predictive algorithms belonged to logistic regression for OS (accuracy rate: 70%, confidence interval [CI]: 0.60-0.82, area under curve [AUC]: 0.76), MLP classifier for 12- and 20-month survival times (67%, CI: 0.54-0.81, AUC: 0.64 and 71%, CI: 0.59-0.84, AUC: 0.61, respectively), and RFC for time to progression (76%, CI: 0.66-0.86, AUC: 0.78).

CONCLUSION
Considering high treatment costs, potential serious toxicity, the harm of early progression, and low survival in cases of ineffective treatment, machine learning-based predictive systems are promising. Personalizing prognosis and treatment using these algorithms can improve oncological results.

Introduction

Lung cancer is the leading cause of cancer-related deaths worldwide.[1] Although multiple treatment modalities are applied, the median overall survival (OS) is 12-23.2 months for non-small-cell lung cancer (non-SCLC) and 16-20 months for limited-stage small-cell lung cancer (SCLC).[2,3] A standard treatment based on the TNM staging system may not be suitable for every patient. Identifying patients at high risk of recurrence and high mortality due to the disease is also valuable in guiding treatment. Therefore, in this complex and heterogeneous disease group, it is important to evaluate prognosis in a personalized manner and plan treatment accordingly.

Artificial intelligence (AI) is a branch of computer science that aims to emulate human-like intelligence in machines using computer software and algorithms without direct human stimuli to perform certain tasks.[4] Machine learning (ML) is a subunit of AI using data-driven algorithms that learn to imitate human behavior based on a previous example or experience.[5] ML uses mathematical algorithms applied with computer programs to identify patterns in large data sets and improve this identification with additional data.[6]

It is important to predict survival and progression in cases diagnosed with cancer to improve treatment and provide patients and clinicians with information. Considering the data set of lung cancer patients with specific demographic, tumor and treatment information, it is essential to determine if any parameter can be used to predict whether the patient will survive or the disease will recur.

The current study aimed to predict OS, survival time, and time to progression using ML in patients diagnosed with Stage III lung cancer and treated at the Radiation Oncology and Chest Diseases departments of Eskişehir Osmangazi University Faculty of Medicine.

Methods

Patient Characteristics
The study included 585 cases diagnosed with Stage III lung cancer from 2007 to 2018. For the application of the ML technique, the cases were determined for each prediction group.

The inclusion criteria were as follows: a histopathological diagnosis of lung cancer, no diagnosis of distant metastasis or multiple primary neoplasia, Karnofsky Performance Scale (KPS) score ≥60, age >18, having completed all planned radiotherapy (RT) and chemotherapy schemes, and regularly attending the follow-up sessions. Staging was performed according to the American Joint Committee on Cancer Staging System, eighth edition.[7] For staging purposes, the thorax-abdomen computed tomography (CT)/fluorodeoxyglucose positron emission tomography (FDG-PET)/CT and brain magnetic resonance (MR) images were reviewed in each case. After the diagnosis, the cases were evaluated at the lung/pleural cancer council of Eskisehir Osmangazi University Faculty of Medicine, and the treatment decision was taken using a multidisciplinary approach. Our study was approved by Eskisehir Osmangazi University Clinical Research Ethics Committee. All patients provided written informed consent before enrollment in the study.

Treatment Characteristics
Radiotherapy and concurrent chemotherapy

The patients were immobilized in a supine position using T-bar/Wingboard with their hands above their head, and planning CT was performed with the Somatom Definition AS® device with a 3-5-mm cross-section. The images were fused with the FDG-PET/thoracic CT images at the time of diagnosis and current thorax CT images after chemotherapy in cases that underwent chemotherapy before RT. The gross tumor volume (GTV) was determined after fusion. In cases receiving chemotherapy before RT, GTVtumor was determined as the post-chemotherapy volume, and GTVlymph node as the pre-chemotherapy volume. The clinical target volume (CTV) margin was set according to tumor histopathology: CTVtumor was taken as 0.8 cm for adenocarcinoma, 0.6 cm for squamous cell carcinoma, and 0.5 cm for other histologies. CTVlymph node was determined as 0.5 cm. No elective nodal irradiation was performed. For the planning target volume (PTV), the CTVtumor and CTVlymph node volumes were given a 0.5-cm margin, and the cases were treated with image-guided radiation therapy after 2014. Radiation therapy was applied in daily fractions of 1.8-2 Gy to a total dose of 45-68 Gy depending on various criteria, such as tumor localization and size, lung volume, and tumor volume, under the guidance of 3DCRT/IMRT/VMAT using a Varian Trilogy®/TrueBeam® or Elekta Precise® device. In SCLC cases with good treatment response, 25 Gy (2.5 Gy/day×10 fractions) prophylactic cranial irradiation was applied.

Concurrent chemotherapy was applied to the appropriate cases. In the non-SCLC group, cisplatin (40 mg/m2) or paclitaxel (45-50 mg/m2)+carboplatin (area under curve [AUC]: 2) was administered weekly. In the patients with SCLC, cisplatin (40 mg/m2) was administered weekly or cisplatin (75 mg/m2)+etoposide (100 mg/m2) every 21 days. The patients attended the outpatient clinic every week.

Chemotherapy
In squamous cell lung cancer, gemcitabine, paclitaxel, or vinorelbine was used in primary and secondary chemotherapy, either alone or in combination with platinum. The first-line chemotherapy of adenocarcinoma was the same as given in the section above, but pemetrexed was applied as the second-line therapy. In patients with epidermal growth factor receptor mutation, anaplastic lymphoma receptor tyrosine kinase gene translocation, or ROS proto-oncogene 1 receptor tyrosine kinase gene rearrangement, first-line chemotherapy was the same as in the previous section, and the second-line therapy was arranged as the targeted therapy specific to the genetic change. In patients with recurrent/progressive disease, a chemotherapy regimen that had not previously been used was applied, taking into account the clinical performance ability and comorbidities of the patient, and the decision to continue this therapy was taken according to the patient response. In the treatment of SCLC, etoposide combined with platinum was used as the first-line chemotherapy regimen, and irinotecan or the combination of vincristine+cyclophosphamide+adriablastina was used as the second-line regimen in cases that did not respond to treatment or recurred.

Post-treatment Follow-up
At the 1st month after the end of treatment, anamnesis, a physical examination, thorax CT, and response to treatment were evaluated. The follow-up evaluations of anamnesis, physical examination, and thorax CT were performed every 3 months for the following 3 years, and every 6 months for the 4th and 5th years. After the 5th year, annual follow-up was undertaken. In suspected cases of recurrence/metastasis, abdominal CT/brain MR and/or PET CT was also conducted.

ML, Statistical Analysis, and Application
In the prediction of both OS and time to progression, the following 27 variables were evaluated: age, gender, KPS score, body mass index, smoking history, presence of chronic obstructive pulmonary disease, histopathology, tumor localization, tumor size, lymph node site, lymph node involvement (single level/multilevel), T stage, N stage, TNM stage, surgical history, presence of concurrent chemotherapy, concurrent chemotherapy scheme, number of chemotherapy cycles before RT, GTV, PTV, total RT dose, RT fraction dose, prognostic nutritional index, pretreatment serum albumin and hemoglobin values, neutrophil-to-lymphocyte ratio (NLR), and advanced lung cancer inflammation index. These parameters were determined by considering previous prognosis studies related to lung cancer.[8-13] For the predictions, the ML algorithms of logistic regression, multilayer perceptron classifier (MLP), extreme gradient boosting (XGB) classifier, support vector classification (SVC), random forest classifier (RFC), Gaussian Naive Bayes (GNB), and light gradient boosting machine (LGBM) classifier were used.
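Three of the variables listed above (NLR, prognostic nutritional index, and advanced lung cancer inflammation index) are derived from routine laboratory values. The sketch below shows how they are commonly computed; the formulas are the widely used definitions (PNI = 10 × albumin in g/dL + 0.005 × lymphocyte count per mm³; ALI = BMI × albumin / NLR), and it is an assumption that the study used these exact forms, since the text does not state them. The `Labs` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Labs:
    """Pretreatment values for one patient (hypothetical container)."""
    neutrophils: float   # absolute neutrophil count, cells/mm^3
    lymphocytes: float   # absolute lymphocyte count, cells/mm^3
    albumin_g_dl: float  # serum albumin, g/dL
    bmi: float           # body mass index, kg/m^2


def nlr(labs: Labs) -> float:
    """Neutrophil-to-lymphocyte ratio."""
    return labs.neutrophils / labs.lymphocytes


def pni(labs: Labs) -> float:
    """Prognostic nutritional index (commonly used definition; assumed here)."""
    return 10.0 * labs.albumin_g_dl + 0.005 * labs.lymphocytes


def ali(labs: Labs) -> float:
    """Advanced lung cancer inflammation index (commonly used definition; assumed here)."""
    return labs.bmi * labs.albumin_g_dl / nlr(labs)


# Illustrative values only, not taken from the study data.
example = Labs(neutrophils=6000.0, lymphocytes=1500.0, albumin_g_dl=4.0, bmi=24.0)
print(nlr(example), pni(example), ali(example))
```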

Statistical Analysis and Application
Extreme value analysis is a branch of statistics that deals with extreme deviations from the median of probability distributions. It aims to assess, from a given ordered sample of a random variable, the likelihood of events more extreme than any previously observed. Extreme values decrease predictive performance. There are different methods for detecting them, but in simple terms, values that deviate by more than a certain amount from the mean are considered extreme.[14] In this study, to increase predictive performance, data points lying more than 1.96 standard deviations from the mean (extreme values) according to the box plot method were excluded. The training-test split was 80-20% for the prediction of OS and OS time (12-month and 20-month) and 70-30% for the prediction of time to progression (12-month).
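The exclusion rule and the training-test split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it applies only the mean ± 1.96 standard deviation rule to a single column, and the column name `gtv_cc`, the toy values, and the helper names are hypothetical. The text does not say which variables the rule was applied to or how the split was randomized.

```python
import random
import statistics


def drop_extreme_rows(rows, column, z=1.96):
    """Keep rows whose value in `column` lies within z sample SDs of the mean."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [row for row in rows if abs(row[column] - mean) <= z * sd]


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of the rows and cut off `test_fraction` of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: one obviously extreme tumor volume among ordinary ones.
rows = [{"gtv_cc": v} for v in [40, 55, 60, 62, 65, 70, 72, 75, 80, 900]]
kept = drop_extreme_rows(rows, "gtv_cc")

# 80-20 split as used for OS; pass test_fraction=0.3 for time to progression.
train, test = train_test_split(kept, test_fraction=0.2)
print(len(kept), len(train), len(test))
```

Because both the mean and the standard deviation are themselves pulled by extreme values, a rule of this kind is sensitive to sample size; that is a property of the rule, not a statement about the study data.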

Synthetic minority over-sampling is a technique for building predictive models on classification data sets with severe class imbalance. The difficulty with unbalanced data sets is that most ML techniques effectively ignore the minority class and therefore perform poorly on it, although performance on the minority class is typically what matters most. One approach is to over-sample the minority class. The simplest way is to duplicate samples in the minority class, but duplicates add no new information to the model; instead, new samples can be synthesized from existing samples.[15]
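The idea of synthesizing new minority samples can be illustrated with a short sketch of the interpolation step described by Chawla et al.[15]: pick a minority sample, pick one of its nearest minority-class neighbours, and place a new point at a random position on the line segment between them. The function name, the neighbour count `k`, and the toy points are assumptions for illustration; the text does not give the settings used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself).
        neighbours = sorted(
            (point for point in minority if point is not base),
            key=lambda point: math.dist(base, point),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature minority class (illustrative values only).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Each synthetic point lies between two real minority samples, so it stays inside the region the minority class already occupies while still differing from every existing sample.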

Cross-validation is a model validation technique that tests what results will be obtained from a statistical analysis performed on an independent data set. Its main use is to estimate what accuracy a prediction system will have in practice. In a prediction problem, the model is usually trained with a "known data set" (training set) and tested with an "unknown data set" (validation or test set). The purpose of this test is to measure the ability of the trained model to generalize to new data and to identify problems of overfitting or selection bias.[16] In the current study, cross-validation was also undertaken. The structure of cross-validation is shown in Supplementary Figure 1.
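A k-fold split of the kind described above can be sketched as below. The number of folds is not stated in the text, so `k=5` is an assumption, and the sample count of 259 simply reflects the 324 OS cases minus the 65 test cases described in the Results. The function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=259, k=5))
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(fold_number, len(training), len(validation))
```

Every sample is used for validation exactly once and for training in the remaining folds, which is what lets the averaged validation score stand in for performance on unseen data.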

Suppl. Fig 1: Cross-validation structure.

Results

Patient, Tumor, and Treatment Characteristics
In OS prediction, 324 Stage III lung cancer cases were evaluated. The median age was 61 (range, 44-79) years. The median RT dose was 60 (range, 50-68) Gy. Concurrent chemotherapy was administered to 239 cases. The median number of concurrent chemotherapy cycles was 4 (min: 0, max: 6). RT timing was with the first cycle of chemotherapy in 25 patients. Patient and tumor characteristics are summarized in Table 1a, and treatment characteristics in Table 1b.

Table 1a: Patient and tumor characteristics for the prediction of survival

Table 1b: Treatment characteristics for the prediction of survival

In the prediction of OS time, 241 Stage III lung cancer cases that died were evaluated. The median age was 62 (range, 44-80) years. The median RT dose was 60 (range, 50-68) Gy. Concurrent chemotherapy was applied in 180 cases. The median number of concurrent chemotherapy cycles was 4 (min: 0, max: 6). RT timing was with the first cycle of chemotherapy in 17 patients. The characteristics of the patients and tumors are summarized in Table 2a, and the treatment characteristics are given in Table 2b.

Table 2a: Patient and tumor characteristics for the prediction of survival time

Table 2b: Treatment characteristics for the prediction of survival time

For the prediction of time to progression, 223 cases that showed progression during the follow-up were evaluated. The median age was 61 (range, 44-80) years. The median RT dose was 60 (range, 50-68) Gy. Concurrent chemotherapy was applied to 172 cases. RT timing was with the first cycle of chemotherapy in 11 patients. The median number of concurrent chemotherapy cycles was 4 (min: 0, max: 6). Patient and tumor characteristics are summarized in Table 3a, and the treatment characteristics are given in Table 3b.

Table 3a: Patient and tumor characteristics for the prediction of time to progression

Table 3b: Treatment characteristics for the prediction of time to progression

OS and Progression-free Survival
The OS evaluation was conducted with 324 cases, and over a median follow-up of 21 months, 255 patients died. The prediction of OS time was performed with 241 of the patients that died, and the median survival time of this group was 20 (2-101) months. The median survival times for substages IIIA, IIIB, and IIIC were 25 (6-101), 19.5 (5-70), and 15 (2-65) months, respectively. The prediction of time to progression was undertaken with 223 cases that showed progression during the follow-up. The median time from the end of treatment to progression was 9 (0-96) months. The median values for substages IIIA, IIIB, and IIIC were 10 (0-96), 9 (0-68), and 7 (1-28) months, respectively.

ML Prediction
OS prediction

Significant variables were determined as PTV, lymph node site, and KPS score. Figure 1a gives the feature importance plot and the correlation matrix of the variables. The best predictive algorithm was identified as logistic regression with 70% accuracy (AUC: 0.76, confidence interval [CI]: 0.597-0.818), 94.44% sensitivity, and 41.38% specificity. The accuracy rates for the MLP, XGB, SVC, RFC, GNB, and LGBM algorithms were calculated as 63%, 53%, 56%, 60%, 66%, and 64%, respectively. The AUC graphs of the algorithms are given in Figure 2a, and the data belonging to the best predictive algorithm are shown in Table 4. The logistic regression algorithm accurately predicted 34 of 51 cases that died and 12 of 14 cases that survived, and the confusion matrix is presented in Table 5a.
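The reported interval of 0.597-0.818 is numerically consistent with a normal-approximation (Wald) 95% interval around the test-set accuracy, using the counts given above (34 + 12 correct out of 51 + 14 cases). The text does not name the interval method, so treating it as a Wald interval is an assumption; the sketch below only shows that the arithmetic agrees. The function name is hypothetical.

```python
import math


def wald_interval(correct, total, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width


# Counts stated in the text for the logistic regression OS model.
correct = 34 + 12
total = 51 + 14
low, high = wald_interval(correct, total)
print(f"accuracy = {correct / total:.3f}, 95% CI = {low:.3f}-{high:.3f}")
```

With only 65 test cases, the interval spans more than 20 percentage points, which is worth keeping in mind when comparing algorithms whose accuracies differ by a few points.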

Fig 1: Feature importance plots and correlation matrices. (a) Prediction of survival. (b) Prediction of 12-month survival. BMI: Body mass index; GTV: Gross tumor volume; PTV: Planning target volume; RT: Radiotherapy; PNI: Prognostic nutritional index; NLR: Neutrophil-to-lymphocyte ratio; KPS: Karnofsky performance scale; ALI: Advanced lung cancer inflammation index.

Table 4: Results of the best performing algorithm for each prediction

Table 5a: Confusion matrix for the prediction of survival

Table 5b: Confusion matrix for the prediction of 12-month survival

Table 5c: Confusion matrix for the prediction of 20-month survival

Table 5d: Confusion matrix for the prediction of time to progression

Fig 1 (continued): Feature importance plots and correlation matrices. (c) Prediction of 20-month survival. (d) Prediction of time to progression. BMI: Body mass index; GTV: Gross tumor volume; PTV: Planning target volume; RT: Radiotherapy; PNI: Prognostic nutritional index; NLR: Neutrophil-to-lymphocyte ratio; KPS: Karnofsky performance scale; ALI: Advanced lung cancer inflammation index.

OS time prediction
Twelve-month survival prediction

Significant variables were identified as GTV, lymph node site, surgical history, and histopathology. Figure 1b presents the feature importance plot and the correlation matrix of the variables. The best predictive algorithm was found to be MLP with 67% accuracy (AUC: 0.64, CI: 0.542-0.805), 66.67% sensitivity, and 67.65% specificity. The accuracy rates for the logistic regression, XGB, SVC, RFC, GNB, and LGBM algorithms were determined as 46%, 57%, 51%, 55%, 59%, and 53%, respectively. The AUC graph of the algorithms is given in Figure 2b. The data on the algorithm with the best predictive results are shown in Table 4. The MLP algorithm accurately predicted 10 of 21 cases that survived for ≤12 months and 23 of 28 cases that survived for >12 months, and the confusion matrix is given in Table 5b.

Twenty-month survival prediction
Significant variables were identified as GTV, lymph node site, and T stage. In Figure 1c, the feature importance plot and the correlation matrix of the variables are shown. The algorithm with the best predictive ability was MLP, which had an accuracy of 71% (AUC: 0.61, CI: 0.588-0.841), sensitivity of 73.17% and specificity of 62.50%. The accuracy rates for the logistic regression, XGB, SVC, RFC, GNB, and LGBM algorithms were determined as 59%, 59%, 71%, 51%, 67%, and 59%, respectively. Figure 2c presents the AUC graph of the algorithms, and Table 4 gives the detailed data of the best predictive algorithm. The MLP algorithm accurately predicted 30 of 33 cases that survived for ≤20 months and 5 of 16 cases that survived for >20 months. The confusion matrix is presented in Table 5c.

Prediction of time to progression
Significant variables were determined as NLR, lymph node site, age, and T stage. In Figure 1d, the feature importance plot and the correlation matrix of the variables are shown. RFC was identified as the best predictive algorithm with 76% accuracy (AUC: 0.79, CI: 0.659-0.863), 90.91% sensitivity, and 61.76% specificity. The accuracy rates for the remaining algorithms were calculated as 61% for logistic regression, 73% for XGB, 56% for SVC, 53% for MLP, 70% for GNB, and 68% for LGBM. Figure 2d presents the AUC graph of all algorithms, and Table 4 shows the detailed data obtained from the best predictive algorithm. The RFC algorithm accurately predicted 30 of 43 cases that showed progression within 12 months and 21 of 24 cases that progressed after 12 months. Finally, the confusion matrix is presented in Table 5d.
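The per-class counts quoted in this section are enough to rebuild each 2×2 confusion matrix and recompute the usual metrics. The sketch below does so under two assumptions: the first-listed class in each sentence (died, ≤12 months, ≤20 months, progression within 12 months) is treated as the positive class, and the cell counts are inferred from the sentences because Tables 5a-5d are not reproduced here. Under these assumptions, the percentages reported as sensitivity and specificity coincide with the positive and negative predictive values, while the class-wise recall values match the 91%, 31%, 48%, and 82% figures quoted in the Discussion. The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positive cases predicted as positive
    fn: int  # positive cases predicted as negative
    tn: int  # negative cases predicted as negative
    fp: int  # negative cases predicted as positive

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self):  # recall of the positive class
        return self.tp / (self.tp + self.fn)

    def specificity(self):  # recall of the negative class
        return self.tn / (self.tn + self.fp)

    def ppv(self):  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def npv(self):  # negative predictive value
        return self.tn / (self.tn + self.fn)


# Counts inferred from the "x of y" statements in the text.
models = {
    "OS (logistic regression)": Confusion(tp=34, fn=17, tn=12, fp=2),
    "12-month survival (MLP)": Confusion(tp=10, fn=11, tn=23, fp=5),
    "20-month survival (MLP)": Confusion(tp=30, fn=3, tn=5, fp=11),
    "Time to progression (RFC)": Confusion(tp=30, fn=13, tn=21, fp=3),
}

for name, cm in models.items():
    print(
        f"{name}: acc={cm.accuracy():.2%} sens={cm.sensitivity():.2%} "
        f"spec={cm.specificity():.2%} PPV={cm.ppv():.2%} NPV={cm.npv():.2%}"
    )
```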

Fig 2: Area under the curve graphs. (a) Prediction of survival. (b) Prediction of 12-month survival. (c) Prediction of 20-month survival. (d) Prediction of time to progression.
ROC: Receiver operating characteristic; SVC: Support vector classification; MLP: Multilayer perceptron classifier; LGBM: Light gradient boosting machine.

Discussion

In the past two decades, there has been an increase in the use of digital footprints to track and predict human behavior. Furthermore, the ML approach is increasingly being adopted in clinical settings. It is considered that using ML techniques will lead to a change in clinical medicine by solving basic problems related to large and complex data sets. ML offers the potential to derive adaptive systems from various data sets, discover hidden connections between data items, and predict results.[17]

Today, many hospitals store data in a digital environment. By evaluating these large data sets with ML techniques, it could become possible to predict the treatment results of patients, plan individualized patient treatment, improve institutional performance, and regulate health insurance. The accurate prediction of survival in cancer patients continues to be a problem due to the increased heterogeneity and complexity of cancer, various treatment options, and different patient characteristics (age, KPS score, comorbidities, etc.). If reliable estimates are obtained by ML, it can help achieve personalized care and treatment.

There is a growing interest in studies on prognosis prediction based on ML using patient, tumor and treatment data.[18,19] In a study conducted with 8,066 patients diagnosed with breast cancer, Ganggayah et al.[20] evaluated 23 variables for the OS prediction. The authors used the algorithms of decision tree, RFC, neural networks, extreme boost, logistic regression and support vector machine. Cancer stage, tumor size, total number of dissected axillary lymph nodes, number of metastatic lymph nodes, and primary treatment applied were determined as significant variables, and the algorithm that had the highest predictive ability was RFC with an accuracy rate of 82.7%. Li et al.[21] examined 515 tumor tissues and 59 adjacent normal tissues and analyzed the gene expression profiles of the cases. They used three different algorithms (sigFeature, RFC, and univariate Cox regression) to assess the prognostic value of survival-associated genes. A risk estimation model was established, and the expression of 16 genes was found to be highly correlated with recurrence-free survival and high-risk group with low OS. In the current study, OS prediction was made using ML in Stage III lung cancer, and significant parameters were determined as PTV, lymph node site, and KPS score with the logistic regression algorithm providing the best predictive results.

Gupta et al.[17] predicted 6-month, 12-month and 24-month OS times in 869 cancer patients, and calculated the AUC values as 0.87 (95% CI: 0.848-0.890), 0.796 (95% CI: 0.774-0.823) and 0.764 (95% CI: 0.737-0.789), respectively. Parikh et al.[19] performed the prediction of 6-month survival in cancer patients. Of the 26,525 cancer cases evaluated, 1,065 died within 180 days. The data of 70% of the cases were used for training and 30% for testing. They reported the positive predictive values of the RFC, XGB and logistic regression algorithms as 51.3%, 49.4%, and 44.7%, respectively, and their AUC (95% CI) values as 0.88 (0.86-0.89), 0.87 (0.85-0.89), and 0.86 (0.84-0.88), respectively. In the current study, 12- and 20-month OS predictions were made, and significant variables affecting survival time were determined as T stage, lymph node site, GTV, surgical history, and histopathology. MLP was the algorithm with the highest accuracy rate in the OS time prediction.

The N stage, which is also used in TNM staging, affects the treatment decision and prognosis. In a previous study, the 5-year OS was examined according to the Nclinical and Npathological stages, and these rates were found to be 60% and 75%, respectively, for N0, 37% and 49%, respectively, for N1, 23% and 36%, respectively, for N2, and 9% and 20%, respectively, for N3.[22] Descriptors of the N stage (lymph node site) in the TNM system, which are routinely used when making the treatment decision, were also determined as a significant variable in the current study for the prediction of OS and OS time using ML. In another study with 157 cases diagnosed with locally advanced lung cancer, Pöttgen et al.[13] considered Nclinical stage, addition of pneumonectomy to treatment, gender, adenocarcinoma histology, age, and Pancoast tumor localization as significant prognostic factors. Firat et al.,[23] evaluating 163 patients with a lung cancer diagnosis, identified comorbidity and KPS score <70 to be prognostic factors for OS. In a review published by Hirsch et al.,[24] the effect of histology on prognosis in lung cancer was investigated by evaluating 408 studies, of which 11 had established a relationship between histology and clinical outcomes and seven had shown that histopathology affected oncological results in locally advanced lung cancer. In the current study, the KPS score was a significant variable for the OS prediction, and surgical history and histopathology were significant variables for the OS time prediction.

In a study conducted with 207 cases diagnosed with inoperable lung cancer and receiving RT, Bradley et al.[25] identified GTV as a prognostic factor for not only OS but also disease-specific survival and local tumor control. Etiz et al.,[26] carrying out a study with a 150-patient sample with Stage I-IIIB lung cancer, reported that total tumor volume, age, KPS score, and gender were significant prognostic factors affecting OS. In the current study, significant variables for the OS and OS time prediction were identified as PTV and GTV, respectively.

In the current study, in the prediction of OS time, the cases that survived for ≤20 months were successfully predicted by the MLP algorithm at an accuracy rate of 91%, and this algorithm had an accuracy of 31% for those surviving for >20 months. The same algorithm had a 48% accuracy rate in predicting patients surviving for ≤12 months and 82% accuracy rate in predicting those surviving for more than 12 months. These results may be associated with the patient data set including a low number of cases surviving for <12 months or more than 20 months. There is a need for larger case studies on ML.

Gupta et al.[27] performed TNM staging and 5-year disease-free survival prediction among 4,021 cases diagnosed with colon cancer. The authors reported that the RFC algorithm had the highest accuracy in both TNM staging (89%) and 5-year disease-free survival prediction (84%). In the current study, the prediction of time to progression was undertaken with the significant variables of age, NLR, T stage, and lymph node site, and as a result, the RFC algorithm had the highest accuracy rate. Inflammation is a known factor for the development and progression of cancer.[8] While the presence of CD8 T cells in the tumor microenvironment is related to better oncological results, neutrophils, M2 polarized macrophages, and FOXP3-positive regulatory T cells are associated with a poor prognosis.[28-30] In many cancer types, such as those of the breast, head and neck, kidney, and stomach, the relationship between high NLR and poor prognosis has been reported.[31-33] In their meta-analysis of 19 studies with a total of 7,283 cases diagnosed with lung cancer, Yang et al.[34] determined that higher NLR was associated with lower OS and progression-free survival. In the same study, tumor invasion depth, extension of lymph node metastasis, poor differentiation, and vascular invasion were associated with high NLR. NLR may show a pro-angiogenic/pro-inflammatory status in tumor tissue, which may reflect the immune system function of patients.[35] A high NLR value indicates high neutrophil and low lymphocyte levels, indirectly associated with low lymphocyte-mediated immune response, accelerated tumor process, and poor prognosis.[36]

ML is becoming part of people's lives day by day, and its use in the health area can both improve treatment outcome and reduce treatment costs. However, large data sets are required for ML, and data size and diversity are important to achieve an effective algorithm. There is still no standard ML algorithm to predict prognosis, treatment outcome, or toxicity rate in oncology, and multicenter large-scale data are required to create the most appropriate algorithm. Thus, in future work, it is planned to establish big data and re-evaluate the results by increasing the number of patients and collaborating with other centers.

Conclusion

Given high treatment costs, potential serious toxicity, harms of early progression, and low survival in cases of ineffective treatment, predictive systems with ML are promising. Multicenter studies with large data sets can provide algorithms with higher accuracy rates.

Peer-review: Externally peer-reviewed.

Conflict of Interest: All authors declared no conflict of interest.

Ethics Committee Approval: The study was approved by the Eskişehir Osmangazi University Non-Invasive Clinical Research Ethics Committee (No: 29, Date: 17/12/2019).

Financial Support: None declared.

Authorship contributions: Concept - D.E., M.Y., M.M., G.A., Ş.Y.; Design - D.E., M.Y., M.M.; Supervision - D.E., M.Y.; Funding - D.E., M.Y., Ş.Y.; Materials - M.Y., Ş.Y.; Data collection and/or processing - M.Y., Ş.Y.; Data analysis and/or interpretation - D.E., M.Y., Ş.Y., M.M., G.A., Ö.Ç.; Literature search - D.E., M.Y., Ş.Y., M.M., G.A.; Writing - D.E., M.Y., Ş.Y., M.M., G.A., Ö.Ç.; Critical review - D.E., M.Y., Ş.Y., M.M., G.A., Ö.Ç.

References

1) Siegel R, Naishadham D, Jemal A. Cancer statistics. CA Cancer J Clin 2013;63(1):11-30.

2) Aupérin A, Le Péchoux C, Rolland E, Curran WJ, Furuse K, Fournel P, et al. Meta-analysis of concomitant versus sequential radiochemotherapy in locally advanced non-small-cell lung cancer. J Clin Oncol 2010;28(13):2181-90.

3) Chen J, Jiang R, Garces YI, Jatoi A, Stoddard SM, Sun Z, et al. Prognostic factors for limited-stage small cell lung cancer: A study of 284 patients. Lung Cancer 2010;67(2):221-6.

4) Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning for radiotherapy. Comput Biol Med 2018;98:126-46.

5) Jarrett D, Stride E, Vallis K, Gooding MJ. Applications and limitations of machine learning in radiation oncology. Br J Radiol 2019;92(1100):20190001.

6) Lynch CM, Abdollahi B, Fuqua JD, de Carlo AR, Bartholomai JA, Balgemann RN, et al. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int J Med Inform 2017;108:1-8.

7) Brierley J, Gospodarowicz MK, Wittekind C. TNM Classification of Malignant Tumours. 8th ed. Hoboken, NJ: John Wiley and Sons, Inc.; 2017.

8) Diem S, Schmid S, Krapf M, Flatz L, Born D, Jochum W, et al. Neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) as prognostic markers in patients with non-small cell lung cancer (NSCLC) treated with nivolumab. Lung Cancer 2017;111:176-81.

9) Hong S, Zhou T, Fang W, Xue C, Hu Z, Qin T, et al. The prognostic nutritional index (PNI) predicts overall survival of small-cell lung cancer patients. Tumor Biol 2015;36(5):3389-97.

10) Zhu H, Zhou Z, Xue Q, Zhang X, He J, Wang L. Treatment modality selection and prognosis of early stage small cell lung cancer: Retrospective analysis from a single cancer institute. Eur J Cancer Care (Engl) 2013;22(6):789-96.

11) Kasmann L, Bolm L, Janssen S, Rades D. Prognostic factors and treatment of early-stage small-cell lung cancer. Anticancer Res 2017;37(3):1535-8.

12) Wang L, Dong T, Xin B, Xu C, Guo M, Zhang H, et al. Integrative nomogram of CT imaging, clinical, and hematological features for survival prediction of patients with locally advanced non-small cell lung cancer. Eur Radiol 2019;29(6):2958-67.

13) Pöttgen C, Stuschke M, Graupner B, Theegarten D, Gauler T, Jendrossek V, et al. Prognostic model for long-term survival of locally advanced non-small-cell lung cancer patients after neoadjuvant radiochemotherapy and resection integrating clinical and histopathologic factors. BMC Cancer 2015;15:363.

14) de Haan L, Ferreira A. Extreme Value Theory: An Introduction. Berlin: Springer Science Business Media; 2007.

15) Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321-57.

16) Kohavi R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann; 1995. p. 1137-43.

17) Gupta S, Tran T, Luo W, Phung D, Kennedy RL, Broad A, et al. Machine learning prediction of cancer survival: A retrospective study using electronic administrative records and a cancer registry. BMJ Open 2014;4(3):e004007.

18) Chen YC, Ke WC, Chiu HW. Risk classification of cancer survival using ANN with gene expression data from multiple laboratories. Comput Biol Med 2014;48:1-7.

19) Parikh RB, Manz C, Chivers C, Regli SH, Braun J, Draugelis ME, et al. Machine learning approaches to predict 6-month mortality among patients with cancer. JAMA Netw Open 2019;2(10):e1915997.

20) Ganggayah MD, Taib NA, Har YC, Lio P, Dhillon SK. Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak 2019;19:48.

21) Li Y, Ge D, Gu J, Xu F, Zhu Q, Lu C, et al. A large cohort study identifying a novel prognosis prediction model for lung adenocarcinoma through machine learning strategies. BMC Cancer 2019;19:886.

22) Asamura H, Chansky K, Crowley J, Goldstraw P, Rusch VW, Vansteenkiste JF, et al. The international association for the study of lung cancer lung cancer staging project: Proposals for the revision of the N descriptors in the forthcoming 8th edition of the TNM classification for lung cancer. J Thorac Oncol 2015;10(12):1675-84.

23) Firat S, Bousamra M, Gore E, Byhardt RW. Comorbidity and KPS are independent prognostic factors in stage I non-small-cell lung cancer. Int J Rad Oncol Biol Phys 2002;52(4):1047-57.

24) Hirsch FR, Spreafico A, Novella S, Wood MD, Simms L, Papotti M. The prognostic and predictive role of histology in advanced non-small cell lung cancer: A literature review. J Thorac Oncol 2008;3(12):1468-81.

25) Bradley JD, Ieumwananonthachai N, Purdy JA, Wasserman TH, Lockett MA, Graham MV, et al. Gross tumor volume, critical prognostic factor in patients treated with three-dimensional conformal radiation therapy for non-small-cell lung carcinoma. Int J Rad Oncol Biol Phys 2002;52(1):49-57.

26) Etiz D, Marks LB, Zhou SM, Bentel GC, Clough R, Hernando ML, et al. Influence of tumor volume on survival in patients irradiated for non-small-cell lung cancer. Int J Rad Oncol Biol Phys 2002;53(4):835-46.

27) Gupta P, Chiang SF, Sahoo PK, Mohapatra SK, You JF, Onthoni DD, et al. Prediction of colon cancer stages and survival period with machine learning approach. Cancers 2019;11(12):2007.

28) Diakos CI, Charles KA, McMillan DC, Clarke SJ. Cancer-related inflammation and treatment effectiveness. Lancet Oncol 2014;15(11):e493-503.

29) Yuan A, Hsiao YJ, Chen HY, Chen HW, Ho CC, Chen YY, et al. Opposite effects of M1 and M2 macrophage subtypes on lung cancer progression. Sci Rep 2015;5:14273.

30) Tao H, Mimura Y, Aoe K, Kobayashi S, Yamamoto H, Matsuda E, et al. Prognostic potential of FOXP3 expression in non-small cell lung cancer cells combined with tumor-infiltrating regulatory T cells. Lung Cancer 2012;75(1):95-101.

31) Pei D, Zhu F, Chen X, Qian J, He S, Qian Y, et al. Preadjuvant chemotherapy leukocyte count may predict the outcome for advanced gastric cancer after radical resection. Biomed Pharmacother 2014;68(2):213-7.

32) Tsai YD, Wang CP, Chen CY, Lin LW, Hwang TZ, Lu LF, et al. Pretreatment circulating monocyte count associated with poor prognosis in patients with oral cavity cancer. Head Neck 2014;36(7):947-53.

33) Forget P, Machiels JP, Coulie PG, Berliere M, Poncelet AJ, Tombal B, et al. Neutrophil: Lymphocyte ratio and intraoperative use of ketorolac or diclofenac are prognostic factors in different cohorts of patients undergoing breast, lung, and kidney cancer surgery. Ann Surg Oncol 2013;20(Suppl 3):S650-60.

34) Yang HB, Xing M, Ma LN, Feng LX, Yu Z. Prognostic significance of neutrophil-lymphocyte ratio/platelet-lymphocyte ratio in lung cancers: A meta-analysis. Oncotarget 2016;7(47):76769-78.

35) Botta C, Barbieri V, Ciliberto D, Rossi A, Rocco D, Addeo R, et al. Systemic inflammatory status at baseline predicts bevacizumab benefit in advanced non-small cell lung cancer patients. Cancer Biol Ther 2013;14(6):469-75.

36) Cho H, Hur HW, Kim SW, Kim SH, Kim JH, Kim YT, et al. Pre-treatment neutrophil to lymphocyte ratio is elevated in epithelial ovarian cancer and predicts survival after treatment. Cancer Immunol Immunother 2009;58(1):15-23.