The concordance statistic (“c-index”, Harrell et al. (1996)) is a metric that quantifies how consistent the rank order of the observed times is with the model score.
It takes into account censoring and does not depend on a specific evaluation time. The range of values is \([-1, 1]\).
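Because concordance is a static metric, it needs a single score per observation, such as a predicted event time. Here is a minimal sketch with yardstick, assuming the predictions also contain a `.pred_time` column (not shown in the chunks below):

```r
# The c-index compares the ranking of predicted event times to the
# ranking of the observed times, accounting for censoring.
concordance_survival(demo_cat_preds, truth = event_time, estimate = .pred_time)
```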
Brier scores

The Brier score measures accuracy and calibration simultaneously (a value of zero is best). It combines the censoring weights \(w_i(t)\) with the predicted survival probabilities \(\hat{p}_i(t)\) at evaluation time \(t\):

\[
Brier(t) = \frac{1}{N}\sum_{i=1}^N w_i(t)\left[I(y_i \le t \wedge \text{event})\left(0 - \hat{p}_i(t)\right)^2 + I(y_i > t)\left(1 - \hat{p}_i(t)\right)^2\right]
\]

where \(N\) is the number of non-missing rows in the data.
```r
demo_brier <- brier_survival(demo_cat_preds, truth = event_time, .pred)
demo_brier
#> # A tibble: 10 × 4
#>    .metric        .estimator .eval_time .estimate
#>    <chr>          <chr>           <dbl>     <dbl>
#>  1 brier_survival standard           30     0.151
#>  2 brier_survival standard           60     0.174
#>  3 brier_survival standard           90     0.148
#>  4 brier_survival standard          120     0.170
#>  5 brier_survival standard          150     0.159
#>  6 brier_survival standard          180     0.185
#>  7 brier_survival standard          210     0.147
#>  8 brier_survival standard          240     0.156
#>  9 brier_survival standard          270     0.162
#> 10 brier_survival standard          300     0.129
```
Brier scores over evaluation time
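The original shows these scores plotted against time; a minimal sketch of such a plot (assuming ggplot2 is attached, as it is with tidymodels):

```r
# Plot the Brier score at each evaluation time; lower is better.
library(ggplot2)

demo_brier |>
  ggplot(aes(x = .eval_time, y = .estimate)) +
  geom_line() +
  geom_point() +
  labs(x = "Evaluation time", y = "Brier score")
```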
Integrated Brier Score
```r
brier_survival_integrated(demo_cat_preds, truth = event_time, .pred)
#> # A tibble: 1 × 3
#>   .metric                   .estimator .estimate
#>   <chr>                     <chr>          <dbl>
#> 1 brier_survival_integrated standard       0.144
```
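To see what the integration is doing, here is a rough reconstruction using the per-time scores from above. This is a sketch of the idea (trapezoidal area scaled by the largest evaluation time), not necessarily the package's exact implementation:

```r
# Trapezoidal area under the Brier curve, scaled back to the Brier
# score's range; returns roughly 0.144, in line with the metric above.
with(demo_brier, {
  area <- sum(diff(.eval_time) * (head(.estimate, -1) + tail(.estimate, -1)) / 2)
  area / max(.eval_time)
})
```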
Area Under the ROC Curve
We can apply the standard ROC curve machinery to the event indicators, predicted probabilities, and censoring weights at evaluation time \(\tau\).
See Hung and Chiang (2010).
ROC curves measure the separation between events and non-events and are insensitive to how well calibrated the probabilities are.
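For the curves themselves (one per evaluation time), yardstick also provides roc_curve_survival(); a quick sketch with the same predictions:

```r
# Compute a time-dependent ROC curve at each evaluation time.
demo_cat_preds |>
  roc_curve_survival(truth = event_time, .pred)
```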
Area under the ROC curve
```r
demo_roc_auc <- roc_auc_survival(demo_cat_preds, truth = event_time, .pred)
demo_roc_auc
#> # A tibble: 10 × 4
#>    .metric          .estimator .eval_time .estimate
#>    <chr>            <chr>           <dbl>     <dbl>
#>  1 roc_auc_survival standard           30     0.704
#>  2 roc_auc_survival standard           60     0.615
#>  3 roc_auc_survival standard           90     0.556
#>  4 roc_auc_survival standard          120     0.556
#>  5 roc_auc_survival standard          150     0.552
#>  6 roc_auc_survival standard          180     0.552
#>  7 roc_auc_survival standard          210     0.481
#>  8 roc_auc_survival standard          240     0.481
#>  9 roc_auc_survival standard          270     0.481
#> 10 roc_auc_survival standard          300     0.483
```
ROC AUC over evaluation time
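Again, the original shows this as a plot; a sketch along the same lines as the Brier plot:

```r
# ROC AUC at each evaluation time; 0.5 is the no-information level.
demo_roc_auc |>
  ggplot(aes(x = .eval_time, y = .estimate)) +
  geom_line() +
  geom_point() +
  labs(x = "Evaluation time", y = "ROC AUC")
```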
Setting evaluation times
You can get predictions at any value of \(\tau\).
During model development, we suggest picking a smaller, more focused set of evaluation times (to keep the computational cost down).
You should also pick a single time to use for your optimizations/comparisons and list that value first in the vector. If 90 days were of particular interest over a 30-to-120-day span, you’d use `eval_time = c(90, 30, 60, 120)`.
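The fitting call itself isn’t shown here; a minimal sketch of where those times would go, assuming a workflow `wflow` and an initial split `cat_split` from earlier steps (both names are placeholders):

```r
time_points <- c(90, 30, 60, 120)

# The tune functions (e.g., last_fit()) take an `eval_time` argument;
# the first value is used as the primary evaluation time by default.
final_fit <-
  last_fit(
    wflow,
    split = cat_split,
    metrics = metric_set(brier_survival),
    eval_time = time_points
  )
```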
```r
collect_metrics(final_fit)
#> # A tibble: 4 × 5
#>   .metric        .estimator .eval_time .estimate .config             
#>   <chr>          <chr>           <dbl>     <dbl> <chr>               
#> 1 brier_survival standard           30     0.217 Preprocessor1_Model1
#> 2 brier_survival standard           60     0.225 Preprocessor1_Model1
#> 3 brier_survival standard           90     0.160 Preprocessor1_Model1
#> 4 brier_survival standard          120     0.108 Preprocessor1_Model1
```
These metrics were computed with the test set.
What is in final_fit?
```r
extract_workflow(final_fit)
#> ══ Workflow [trained] ════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> event_time ~ .
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> ---------- Oblique random survival forest
#> 
#>      Linear combinations: Accelerated Cox regression
#>           N observations: 1765
#>                 N events: 1116
#>                  N trees: 1000
#>       N predictors total: 18
#>    N predictors per node: 6
#>  Average leaves per tree: 142.739
#> Min observations in leaf: 5
#>       Min events in leaf: 1
#>           OOB stat value: 0.63
#>            OOB stat type: Harrell's C-index
#>      Variable importance: anova
#> 
#> -----------------------------------------
```
Use this object to predict on new data, such as when deploying the model.
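A hedged sketch of scoring new data; `new_cats` is a hypothetical data frame of predictors:

```r
fitted_wflow <- extract_workflow(final_fit)

# Survival probabilities at the evaluation times of interest;
# censored regression models also support type = "time" for
# predicted event times.
predict(fitted_wflow, new_data = new_cats,
        type = "survival", eval_time = c(30, 60, 90, 120))
```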