useR! 2024
#> # A tibble: 7,032 × 18
#>   tenure churn female senior_citizen partner dependents phone_service
#>    <int> <fct>  <dbl>          <int>   <dbl>      <dbl>         <dbl>
#> 1      1 No         1              0       1          0             0
#> 2     34 No         0              0       0          0             1
#> 3      2 Yes        0              0       0          0             1
#> 4     45 No         0              0       0          0             0
#> 5      2 Yes        1              0       0          0             1
#> 6      8 Yes        1              0       0          0             1
#> # ℹ 7,026 more rows
#> # ℹ 11 more variables: multiple_lines <chr>, internet_service <fct>,
#> # online_security <chr>, online_backup <chr>,
#> # device_protection <chr>, tech_support <chr>, streaming_tv <chr>,
#> # streaming_movies <chr>, paperless_billing <dbl>,
#> # payment_method <fct>, monthly_charges <dbl>
Let’s try to predict:
That time is observation time, not time to event.
If we treat it as time-to-event, we assume every observation ended in an event.
Who is likely to stop being a customer while we observe them?
Survival analysis is unique because it simultaneously considers if events happened (i.e. a binary outcome) and when events happened (e.g. a continuous outcome).¹
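A minimal sketch of how the "if" and the "when" can be combined into one outcome, assuming the `tenure` and `churn` columns shown in the printed tibble. The six values are copied from its first rows. The recipe in this deck uses an outcome called `churn_surv`, but how the original slides built it is not shown here, so the construction below is an assumption.

```r
library(survival)

# First six rows of the printed tibble, tenure and churn only
tenure <- c(1, 34, 2, 45, 2, 8)
churn  <- factor(c("No", "No", "Yes", "No", "Yes", "Yes"))

# Surv() pairs the observed time with an event indicator:
# 1 = the customer churned, 0 = still a customer when observation ended (censored)
churn_surv <- Surv(time = tenure, event = ifelse(churn == "Yes", 1, 0))

# Censored observations print with a trailing "+"
print(churn_surv)
```

Customers who have not churned stay in the data as censored observations rather than being dropped or treated as events.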
tidymodels is a framework for modelling and machine learning using tidyverse principles.
Focus on the modelling question, not the infrastructure for empirical validation.
Focus on the modelling question, not the syntax.
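library(tidymodels)
library(censored)  # survival model engines for parsnip

# Preprocessing: drop zero-variance predictors
telco_rec <- recipe(churn_surv ~ ., data = telco_train) %>%
  step_zv(all_predictors())

# Model: parametric survival regression fitted with the survival package
telco_spec <- survival_reg() %>%
  set_mode("censored regression") %>%
  set_engine("survival")

# Bundle preprocessing and model, then fit on the training set
telco_wflow <- workflow() %>%
  add_recipe(telco_rec) %>%
  add_model(telco_spec)
telco_fit <- fit(telco_wflow, data = telco_train)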
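Once a workflow like this is fitted, predictions can be requested as survival times or as survival probabilities at chosen time points. The sketch below is illustrative only: because `telco_train` is not reproduced here, it swaps in `survival::lung` as stand-in data, and the names `lung_surv`, `lung_fit` and `new_patients` are hypothetical. It also assumes a parsnip version recent enough to accept the `eval_time` argument.

```r
library(tidymodels)
library(censored)

# Stand-in data (assumption): survival::lung replaces the telco data.
# In lung, status is coded 1 = censored, 2 = dead, which Surv() accepts.
lung_surv <- survival::lung %>%
  mutate(surv = survival::Surv(time, status)) %>%
  select(surv, age, sex)

# Same structure as the telco workflow: recipe + survival_reg()
lung_fit <- workflow() %>%
  add_recipe(recipe(surv ~ ., data = lung_surv)) %>%
  add_model(
    survival_reg() %>%
      set_mode("censored regression") %>%
      set_engine("survival")
  ) %>%
  fit(data = lung_surv)

new_patients <- lung_surv[1:3, ]

# Predicted survival time, one value per row
pred_time <- predict(lung_fit, new_data = new_patients, type = "time")

# Predicted probability of surviving past each evaluation time;
# the result holds one nested tibble per row in the .pred column
pred_surv <- predict(
  lung_fit,
  new_data = new_patients,
  type = "survival",
  eval_time = c(100, 500)
)
```

The time prediction answers "how long", while the survival-probability prediction answers "how likely is it that the event has not happened by a given time".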
Learn more via