
EARL 2024



That time is observation time, not time to event.


If we assume that’s time-to-event, we assume everything is an event.
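To illustrate the point above, here is a minimal sketch using the survival package. The six tenure and churn values are copied from the first rows of the data printed later in this deck; the Kaplan-Meier estimator is an assumption chosen for illustration, not something the slides prescribe. It compares the estimated survival curve when the recorded event status is used with the curve obtained when every observation time is treated as an event time.

```r
library(survival)

# Toy data: months observed and churn status (1 = churned, 0 = still a customer).
tenure  <- c(1, 34, 2, 45, 2, 8)
churned <- c(0, 0, 1, 0, 1, 1)

# Kaplan-Meier estimate using the recorded event status.
fit_observed <- survfit(Surv(tenure, churned) ~ 1)

# Same estimator, but (wrongly) treating every observation time as an event.
fit_all_events <- survfit(Surv(tenure, rep(1, length(tenure))) ~ 1)

# Compare the estimated survival probabilities at a few time points.
eval_times <- c(2, 8, 34)
surv_observed   <- summary(fit_observed, times = eval_times)$surv
surv_all_events <- summary(fit_all_events, times = eval_times)$surv

data.frame(eval_times, surv_observed, surv_all_events)
```

In this toy example the "everything is an event" curve sits below the curve that respects censoring at each of the chosen time points, so customers appear to churn sooner than the data supports.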


Who is likely to stop being a customer while we observe them?
Our outcome has two aspects: time and event status.
Our outcome may be censored: incomplete data is not missing data.
Regression and classification are not directly equipped to deal with either challenge.
Survival analysis is unique because it simultaneously considers if events happened (i.e. a binary outcome) and when events happened (e.g. a continuous outcome). [1]
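A two-part outcome like this can be represented with `Surv()` from the survival package. The sketch below is illustrative only: `telco_toy` is a hypothetical data frame whose values mirror the first six rows of the data shown later, and the column name `churn_surv` is chosen to match the outcome used in the modeling code.

```r
library(survival)

# Hypothetical toy data: observation time in months and churn status.
telco_toy <- data.frame(
  tenure = c(1, 34, 2, 45, 2, 8),
  churn  = factor(c("No", "No", "Yes", "No", "Yes", "Yes"))
)

# Surv() pairs each time with its event status (1 = event, 0 = censored).
telco_toy$churn_surv <- Surv(
  time  = telco_toy$tenure,
  event = as.integer(telco_toy$churn == "Yes")
)

# Censored observations are printed with a trailing "+".
print(telco_toy$churn_surv)
```

Keeping time and status together in one object means neither aspect can be dropped by accident when the outcome is passed to a model.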




Model the survival curve (or derivatives) to capture time and event status.
Censored observations are partially included, rather than discarded.
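One way to see what "partially included" means is to compute a survival curve by hand. The slides do not name an estimator at this point; the sketch below assumes the Kaplan-Meier (product-limit) estimator and uses toy values copied from the first six rows of the data. A censored customer counts as "at risk" up to their last observed time, and then leaves the risk set without counting as an event.

```r
library(survival)

tenure  <- c(1, 34, 2, 45, 2, 8)
churned <- c(0, 0, 1, 0, 1, 1)

# Distinct times at which at least one event occurred.
event_times <- sort(unique(tenure[churned == 1]))

# Risk set at time t: everyone still under observation at t, censored or not.
n_at_risk <- sapply(event_times, function(t) sum(tenure >= t))

# Number of events at each event time.
n_events <- sapply(event_times, function(t) sum(tenure == t & churned == 1))

# Product-limit estimate: multiply the conditional survival probabilities.
km_by_hand <- cumprod(1 - n_events / n_at_risk)

# Cross-check against survfit() from the survival package.
km_survfit <- summary(survfit(Surv(tenure, churned) ~ 1), times = event_times)$surv

all.equal(km_by_hand, km_survfit)
```

The customers observed for 34 and 45 months never churn in this toy data, yet they enlarge the risk set at months 2 and 8, which is exactly the information that would be lost by discarding them.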
#> # A tibble: 7,032 × 18
#>   tenure churn female senior_citizen partner dependents phone_service
#>    <int> <fct>  <dbl>          <int>   <dbl>      <dbl>         <dbl>
#> 1      1 No         1              0       1          0             0
#> 2     34 No         0              0       0          0             1
#> 3      2 Yes        0              0       0          0             1
#> 4     45 No         0              0       0          0             0
#> 5      2 Yes        1              0       0          0             1
#> 6      8 Yes        1              0       0          0             1
#> # ℹ 7,026 more rows
#> # ℹ 11 more variables: multiple_lines <chr>, internet_service <fct>,
#> #   online_security <chr>, online_backup <chr>,
#> #   device_protection <chr>, tech_support <chr>, streaming_tv <chr>,
#> #   streaming_movies <chr>, paperless_billing <dbl>,
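#> #   payment_method <fct>, monthly_charges <dbl>

library(tidymodels)
library(censored)  # censored regression engines for parsnip

# Preprocessing: remove zero-variance predictors.
# Assumes telco_train holds a churn_surv outcome column built with Surv().
telco_rec <- recipe(churn_surv ~ ., data = telco_train) %>%
  step_zv(all_predictors())

# Model: proportional hazards regression via the survival engine.
telco_spec <- proportional_hazards() %>%
  set_mode("censored regression") %>%
  set_engine("survival")

# Bundle the recipe and the model into a workflow, then fit it.
telco_wflow <- workflow() %>%
  add_recipe(telco_rec) %>%
  add_model(telco_spec)

telco_fit <- fit(telco_wflow, data = telco_train)

tidymodels for time-to-event data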
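A fitted workflow of this kind can be used to predict survival probabilities at chosen time points. The sketch below is not the deck's own code: because the Telco data is not available here, it uses `survival::lung` as a stand-in, and the object names (`lung_surv`, `lung_fit`, `lung_pred`) are hypothetical. It also assumes a recent parsnip release in which `predict()` takes an `eval_time` argument for censored regression models.

```r
library(tidymodels)
library(censored)
library(survival)

# Stand-in data: in lung, status 2 marks an event (death), 1 marks censoring.
lung_surv <- survival::lung %>%
  mutate(surv = Surv(time, status == 2), .keep = "unused") %>%
  select(surv, age, sex)

# Same model specification as in the slides.
ph_spec <- proportional_hazards() %>%
  set_mode("censored regression") %>%
  set_engine("survival")

lung_fit <- workflow() %>%
  add_formula(surv ~ age + sex) %>%
  add_model(ph_spec) %>%
  fit(data = lung_surv)

# Predicted probability of surviving past 100 and 500 days
# for the first three patients.
lung_pred <- predict(
  lung_fit,
  new_data  = lung_surv[1:3, ],
  type      = "survival",
  eval_time = c(100, 500)
)

# Each row holds a nested tibble with one prediction per evaluation time.
tidyr::unnest(lung_pred, .pred)
```

Predicting at several evaluation times returns a piece of each individual's survival curve rather than a single yes/no label, which matches the two-part outcome described earlier.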
 
  
 
 
  
  
 

 
 


