Baseline characteristics of the study population

The procedure of subject inclusion is shown in Fig. 1. The process of variable selection, nomogram construction and performance validation is depicted in Fig. 2. According to strict inclusion and exclusion criteria, 249 eligible participants were enrolled and divided into two independent cohorts (Fig. 1). Participants who underwent an index colonoscopy between September 29, 2004, and May 6, 2016, were included in the primary cohort (n = 174) to construct the nomogram, and those who underwent an index colonoscopy between May 11, 2016, and April 10, 2019, formed the validation cohort (n = 75) to appraise model performance. At a median (range) follow-up time of 25.40 months (6.07–144.47), 38.15% (95 of 249) of the subjects developed colorectal adenomas. The 12-month, 24-month, and 36-month adenoma-free survival (AFS) percentages were 93.57%, 81.93% and 75.90%, respectively.

Figure 1 Flow diagram depicting the procedure of subject inclusion.

Figure 2 Flow diagram depicting the process of variable selection, nomogram construction and performance validation. LASSO: least absolute shrinkage and selection operator; Cox-PH: Cox proportional hazards; AIC: Akaike information criterion; AUC: area under the curve; C-index: concordance index; DCA: decision curve analysis; KM: Kaplan–Meier; SE: sensitivity; SP: specificity; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio.

The demographic and clinical characteristics of subjects in the primary cohort and the validation cohort are shown in Table 1. In the primary cohort, 58.05% (n = 101) of the subjects were female, with a median (IQR) age of 61.134 (58.579, 64.411) years and a median (IQR) BMI of 23.931 (22.656, 25.952) kg/m². Moreover, 14.94% (n = 26) were current or past smokers, and 14.94% (n = 26) drank occasionally, frequently or even daily.
Additionally, 43.68% (n = 76) were physically inactive, and 12.07% (n = 21) had a family history of colorectal cancer (CRC) in first-degree relatives (FDR). The percentages of individuals with a personal history of chronic constipation, chronic diarrhea, and chronic appendicitis or appendectomy were 29.89% (n = 52), 28.74% (n = 50) and 10.34% (n = 18), respectively. Despite the different enrollment periods, the characteristics of subjects in the validation cohort were similar to those of subjects in the primary cohort. Only age (p < 0.0001) and physical activity (p = 0.003) differed significantly between the two cohorts. The characteristics of subjects in the occurrence group and the non-occurrence group within each cohort are presented in Supplementary Table S1.
Table 1 Baseline information of demographic and clinical characteristics of subjects in the primary cohort and the validation cohort.

Identification of predictive factors and construction of nomogram

Method-1

In the LASSO regression analysis based on tenfold cross-validation (Fig. 3A,B), a λ value of 0.013, corresponding to a log(λ) of −4.360, was chosen according to the minimum criterion. Eight variables with non-zero coefficients were retained: baseline age, BMI, physical activity, history of chronic constipation, gender, smoking status, alcohol consumption, and family history of CRC in FDR. As shown in Fig. 3C, the multivariate Cox-PH regression analysis showed that among these eight variables, age (hazard ratio [HR] = 1.131, 95% confidence interval [CI] = 1.066–1.200, p < 0.0001), BMI (HR = 1.124, 95% CI = 1.020–1.238, p = 0.018), physical activity (HR = 0.526, 95% CI = 0.303–0.914, p = 0.023) and family history of CRC in FDR (HR = 3.335, 95% CI = 1.394–7.977, p = 0.007) were independent predictors for the development of colorectal adenomas, while the other variables showed no significant association (p > 0.05). No VIF ≥ 5 was detected (Fig. 3C), indicating no substantial multicollinearity among the variables and supporting our use of the multivariate Cox-PH regression. As shown in Fig. 4A–H, the results of Schoenfeld’s individual and global tests indicated no violation of the PH assumption (p > 0.05), suggesting that the effects of the covariates did not vary over time. The deviance residuals also showed that no individual observation was highly influential (Fig. 4I–P). We then incorporated these four variables into Nomogram-1 (Fig. 5A), built using multivariate Cox-PH regression; the results are shown in Table 2.
No VIF ≥ 5 (Table 2) or breach of the PH assumption (p > 0.05, Supplementary Figure S1A–D) was detected, and no individual observation was highly influential (Supplementary Figure S1E–H). The AIC, 24-month AUC, 30-month AUC and 36-month AUC values were 502.451, 0.657, 0.638, and 0.642, respectively.

Figure 3 Variable selection through the LASSO-Cox regression for predicting risks of colorectal adenoma occurrence based on the primary cohort. (A) Tuning parameter (λ) selection of deviance in the LASSO regression analysis. The red dot denotes the CVM, and the gray line stands for the SE of the CVM. (B) LASSO coefficient profiles of ten candidate variables. Each colored curve traces the coefficient of one variable. (C) A forest plot showing the results of the multivariate Cox-PH regression analysis. CVM: mean cross-validated error; SE: standard error; HR: hazard ratio; CI: confidence interval; VIF: variance inflation factor; BMI: body mass index; CRC: colorectal cancer; FDR: first-degree relatives.

Figure 4 Schoenfeld’s individual and global tests and deviance residuals were used to appraise whether the eight covariates selected using the LASSO regression were associated with time before performing the multivariate Cox-PH regression analysis. (A–H) Plots of the scaled Schoenfeld residuals against transformed time. The solid line represents a smoothing spline fit to the plot, and the dashed lines represent a ± 2-SE band around the fit. Considerable departures from the horizontal line indicate nonproportional hazards. (I–P) Index graphs of dfbeta for the Cox-PH regression of AFS. Comparing the magnitudes of the largest dfbeta values with the regression coefficients shows that no individual observation is highly influential.
AFS: adenoma-free survival.

Figure 5 Construction of two nomogram models for predicting 12-month, 24-month, and 36-month AFS probabilities.

Table 2 Results of the multivariate Cox-PH regression analysis for construction of two nomogram models.

Method-2

The univariate Cox-PH regression analysis showed that older age (HR = 1.099, 95% CI = 1.048–1.153, p = 0.001), higher BMI (HR = 1.147, 95% CI = 1.038–1.269, p = 0.007) and male gender (HR = 1.743, 95% CI = 1.073–2.832, p = 0.025) were significantly associated with an elevated risk of developing colorectal adenomas (Table 3). The results of Schoenfeld’s tests and deviance residuals are shown in Supplementary Figure S2. The multivariate Cox-PH regression analysis further showed that among the three variables, only age (HR = 1.102, 95% CI = 1.047–1.161, p < 0.0001, Table 3) and BMI (HR = 1.133, 95% CI = 1.031–1.246, p = 0.010) were independent predictors for the development of colorectal adenomas, while gender showed no significant association (p > 0.05). No VIF ≥ 5 was detected (Table 3), and the results of Schoenfeld’s test and deviance residuals suggested no breach of the PH assumption (p > 0.05, Supplementary Figure S3A–C) and no highly influential individual observations (Supplementary Figure S3D–F). We then incorporated these two variables into Nomogram-2 (Fig. 5B), built using multivariate Cox-PH regression; the results are shown in Table 2. No VIF ≥ 5 (Table 2) or breach of the PH assumption (p > 0.05, Supplementary Figure S4A,B) was detected, and no individual observation was highly influential (Supplementary Figure S4C,D). The AIC, 24-month AUC, 30-month AUC and 36-month AUC values were 506.608, 0.590, 0.603 and 0.605, respectively.
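
The hazard ratios reported for Method-1 come with 95% confidence intervals, which is enough to recover the underlying Cox coefficient, its standard error and an approximate p-value. The Python sketch below does this for the four Nomogram-1 predictors. It assumes Wald-type intervals on the log scale (HR = exp(β), CI = exp(β ± 1.96·SE)), which the text does not state explicitly, and the helper name `wald_from_hr` is illustrative rather than part of the authors' analysis.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_from_hr(hr, lower, upper):
    """Recover (beta, se, z, p) from a hazard ratio and its 95% CI.

    Assumes a Wald interval on the log-hazard scale; this is an
    assumption of the sketch, not something the source states.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# (HR, lower, upper) as reported for the multivariate model in Method-1.
reported = {
    "age": (1.131, 1.066, 1.200),
    "bmi": (1.124, 1.020, 1.238),
    "physical_activity": (0.526, 0.303, 0.914),
    "family_history_crc": (3.335, 1.394, 7.977),
}

results = {name: wald_from_hr(*vals) for name, vals in reported.items()}

for name, (beta, se, z, p) in results.items():
    print(f"{name:20s} beta={beta:+.3f} se={se:.3f} z={z:+.2f} p={p:.4f}")
```

Under that assumption, the recovered p-values agree with the reported ones to within rounding, and the negative coefficient for physical activity reflects its protective hazard ratio below 1.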
Table 3 Variable selection through the univariate/multivariate Cox-PH regression analysis for predicting risks of colorectal adenoma occurrence based on the primary cohort.

Consequently, we chose Nomogram-1, which had both a lower AIC and higher AUC values than Nomogram-2, for predicting risks of colorectal adenoma occurrence. Within Nomogram-1, points are assigned by drawing a line upward from the value of each variable to the “Points” line. The sum of these four points, located on the “Total Points” line and projected downward, corresponds to the predicted 12-month, 24-month, and 36-month AFS probabilities.

Validation of nomogram performance

The total score of these predictors was 182.504 points. The optimal cut-off value of the nomogram was 89.026 in the primary cohort and 93.581 in the validation cohort. Under the optimal cut-off value, the SE, SP, PPV, NPV, PLR, and NLR were 69.70%, 57.41%, 50.00%, 75.61%, 1.636 and 0.528, respectively, in the primary cohort, and 31.03%, 86.96%, 60.00%, 66.67%, 2.379, and 0.793, respectively, in the validation cohort.

To further evaluate the performance of Nomogram-1, the time-dependent ROC curves of the primary cohort and the validation cohort were plotted for AFS status (Fig. 6A). The C-index and bootstrapping-corrected C-index were 0.682 and 0.663, respectively, for the primary cohort, indicating reliable predictive accuracy of the nomogram. The ROC curves of the primary cohort for 24-month, 30-month and 36-month AFS are shown in Fig. 6B–D. Calibration plots showed close agreement between the actual probability (y-axis) and the predicted probability (x-axis) of 24-month AFS in the primary cohort and the validation cohort. Agreement was also high between the actual probability (y-axis) and the predicted probability (x-axis) of 30-month AFS in the primary cohort and the validation cohort (Fig. 6E–H).
DCA for 24-month and 30-month AFS also showed that applying our nomogram to identify patients who developed colorectal adenomas after negative index colonoscopy outperformed both the “surveillance colonoscopy for no patients” scheme and the “surveillance colonoscopy for all patients” strategy, suggesting great clinical utility of the nomogram in both cohorts (Fig. 6I,J). To assess the risk stratification ability of Nomogram-1, participants were divided into a high-risk group and a low-risk group based on the optimal cut-off value of risk scores in each cohort. The KM survival curves with individual survival numbers and time data are shown in Fig. 6K,L. Log-rank tests revealed a significantly lower proportion of adenoma-free subjects in the high-risk group than in the low-risk group in the primary cohort (Fig. 6K, p < 0.0001) and the validation cohort (Fig. 6L, p = 0.017).

Figure 6 The predictive accuracy, discriminatory ability, clinical utility, and risk stratification capacity of the nomogram for predicting AFS probabilities were evaluated using the tROC curves, calibration plots, DCA curves, and KM survival curves. (A) The tROC curves indicated stable predictive accuracy of the nomogram over time in the primary cohort (blue) and the validation cohort (red). (B–D) The ROC curves of the nomogram for predicting 24-month (B), 30-month (C), and 36-month (D) AFS probabilities based on the primary cohort (blue) and the validation cohort (red). (E–H) Calibration plots showed close agreement between the actual probability (y-axis) and the predicted probability (x-axis) of 24-month AFS in the primary cohort (E) and the validation cohort (F). Agreement was also high between the actual probability (y-axis) and the predicted probability (x-axis) of 30-month AFS in the primary cohort (G) and the validation cohort (H). The grey line represents the ideal fit.
The blue or red line reflects the nomogram prediction; a closer fit to the grey line suggests better performance. (I,J) DCA of the nomogram in the primary cohort (blue) and the validation cohort (red) at 24-month (I) and 30-month (J) follow-up. The black dotted line represents the screen-none scheme. The red or blue solid line represents the screen-all strategy. The red or blue dotted line represents the nomogram. (K,L) KM survival curves showing the AFS probabilities in the primary cohort (K) and the validation cohort (L) with individual survival numbers and time data. tROC: time-dependent receiver operating characteristic; DCA: decision curve analysis; AUC: area under the curve.
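
The likelihood ratios reported at the optimal cut-off can be cross-checked from sensitivity and specificity alone. The short Python sketch below uses the standard definitions PLR = SE / (1 − SP) and NLR = (1 − SE) / SP; the source does not spell out its formulas, so this is an assumption, and the function name `likelihood_ratios` is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr


# (SE, SP) as reported under the optimal cut-off value in each cohort.
cohorts = {
    "primary": (0.6970, 0.5741),
    "validation": (0.3103, 0.8696),
}

ratios = {name: likelihood_ratios(se, sp) for name, (se, sp) in cohorts.items()}

for name, (plr, nlr) in ratios.items():
    print(f"{name:10s} PLR={plr:.3f} NLR={nlr:.3f}")
```

Because the inputs are the rounded percentages from the text, the results can differ from the reported PLR and NLR in the third decimal place. The drop in sensitivity from the primary to the validation cohort is what pushes the NLR closer to 1 there.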


