- Set Up
- Data Set: Diamonds
- Separating Testing and Training Data: **rsample**
- Data Pre-Processing and Feature Engineering: **recipes**
- Defining and Fitting Models: **parsnip**
- Summarizing Fitted Models: **broom**
- Evaluating Model Performance: **yardstick**
- Tuning Model Parameters: **tune** and **dials**
- Summary
- Further Resources

caret is a well-known R package for machine learning that covers almost everything from data pre-processing to cross-validation. Its unofficial successor is tidymodels, which takes a modular approach: specific, smaller packages are designed to work hand in hand. Thus, tidymodels is to modeling what the tidyverse is to data wrangling. Herein, I will walk through a machine learning example from start to finish and explain how to use the appropriate tidymodels packages at each step.

## Set Up

Loading the **tidymodels** package loads a bunch of packages for modeling and also a few others from the tidyverse like **ggplot2** and **dplyr**.
Furthermore, we’ll need **tune** and **workflows**, which are also part of the tidymodels ecosystem but are not attached by `library("tidymodels")`.

```
library("conflicted")
library("tidymodels")
#> -- Attaching packages ------------------------------------------------ tidymodels 0.0.3 --
#> v broom 0.5.4 v purrr 0.3.3
#> v dials 0.0.4 v recipes 0.1.9
#> v dplyr 0.8.3 v rsample 0.0.5
#> v ggplot2 3.2.1 v tibble 2.1.3
#> v infer 0.5.1 v yardstick 0.0.5
#> v parsnip 0.0.5
library("tune")
library("workflows")
# Additional packages for dataviz etc.
library("ggrepel") # for geom_label_repel()
library("corrplot") # for corrplot()
#> corrplot 0.84 loaded
conflict_prefer("filter", "dplyr")
#> [conflicted] Will prefer dplyr::filter over any other package
ggplot2::theme_set(theme_light())
```

## Data Set: Diamonds

With **ggplot2** comes the diamonds data set, which has information on the size and quality of diamonds.
Herein, we’ll use these features to predict the price of a diamond.

```
data("diamonds")
diamonds %>%
  sample_n(2000) %>%
  mutate_if(is.factor, as.numeric) %>%
  select(price, everything()) %>%
  cor() %>%
  {.[order(abs(.[, 1]), decreasing = TRUE),
     order(abs(.[, 1]), decreasing = TRUE)]} %>%
  corrplot(method = "number", type = "upper", mar = c(0, 0, 1.5, 0),
           title = "Correlations between price and various features of diamonds")
```

## Separating Testing and Training Data: **rsample**

First of all, we want to extract a data set for testing the predictions in the end.
We’ll use only a small proportion for training (just to speed things up a little).
Furthermore, the training data will be prepared for 3-fold cross-validation (again, only three folds to speed things up).
All this is accomplished using the **rsample** package:

```
set.seed(1243)
dia_split <- initial_split(diamonds, prop = .1, strata = price)
dia_train <- training(dia_split)
dia_test <- testing(dia_split)
dim(dia_train)
#> [1] 5395 10
dim(dia_test)
#> [1] 48545 10
dia_vfold <- vfold_cv(dia_train, v = 3, repeats = 1, strata = price)
dia_vfold %>%
  mutate(df_ana = map(splits, analysis),
         df_ass = map(splits, assessment))
#> # 3-fold cross-validation using stratification
#> # A tibble: 3 x 4
#> splits id df_ana df_ass
#> * <named list> <chr> <named list> <named list>
#> 1 <split [3.6K/1.8K]> Fold1 <tibble [3,596 x 10]> <tibble [1,799 x 10]>
#> 2 <split [3.6K/1.8K]> Fold2 <tibble [3,596 x 10]> <tibble [1,799 x 10]>
#> 3 <split [3.6K/1.8K]> Fold3 <tibble [3,598 x 10]> <tibble [1,797 x 10]>
```

## Data Pre-Processing and Feature Engineering: **recipes**

The **recipes** package can be used to prepare a data set (for modeling) using different `step_*()` functions.
For example, the plot below indicates that there may be a nonlinear relationship between price and carat, and I want to address that using higher-order terms.

```
qplot(carat, price, data = dia_train) +
  scale_y_continuous(trans = log_trans(), labels = function(x) round(x, -2)) +
  geom_smooth(method = "lm", formula = "y ~ poly(x, 4)") +
  labs(title = "Nonlinear relationship between price and carat of diamonds",
       subtitle = "The degree of the polynomial is a potential tuning parameter")
```

The `recipe()` function takes a formula and a data set, and then the different steps are added using the appropriate `step_*()` functions.
The **recipes** package comes with a ton of useful step functions (see, e.g., `vignette("Simple_Example", package = "recipes")`).

Herein, I want to log transform price (`step_log()`), center and scale all numeric predictors (`step_normalize()`), and dummy code the categorical predictors (`step_dummy()`).
Furthermore, a quadratic effect of carat is added using `step_poly()`.

```
dia_rec <-
  recipe(price ~ ., data = dia_train) %>%
  step_log(all_outcomes()) %>%
  step_normalize(all_predictors(), -all_nominal()) %>%
  step_dummy(all_nominal()) %>%
  step_poly(carat, degree = 2) %>%
  prep()
dia_rec
#> Data Recipe
#>
#> Inputs:
#>
#> role #variables
#> outcome 1
#> predictor 9
#>
#> Training data contained 5395 data points and no missing data.
#>
#> Operations:
#>
#> Log transformation on price [trained]
#> Centering and scaling for carat, depth, table, x, y, z [trained]
#> Dummy variables from cut, color, clarity [trained]
#> Orthogonal polynomials on carat [trained]
```

Calling `prep()` on a recipe applies (trains) all the steps.
You can now call `juice()` to extract the transformed data set or call `bake()` on a new data set.

```
# Note the linear and quadratic term for carat and the dummies for e.g. color
dia_juiced <- juice(dia_rec)
dim(dia_juiced)
#> [1] 5395 25
names(dia_juiced)
#> [1] "depth" "table" "x" "y" "z"
#> [6] "price" "cut_1" "cut_2" "cut_3" "cut_4"
#> [11] "color_1" "color_2" "color_3" "color_4" "color_5"
#> [16] "color_6" "clarity_1" "clarity_2" "clarity_3" "clarity_4"
#> [21] "clarity_5" "clarity_6" "clarity_7" "carat_poly_1" "carat_poly_2"
```
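
For illustration, here is a hedged sketch of `bake()` applied to the held-out test data; the object `dia_test_baked` is an assumption for demonstration only and is not used further below.

```
# Apply the trained recipe steps to new data (here: the test set).
# Note: dia_test_baked is only created for illustration and not used later on.
dia_test_baked <- bake(dia_rec, new_data = dia_test)
names(dia_test_baked)
```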

## Defining and Fitting Models: **parsnip**

The **parsnip** package has wrappers around many^{1} popular machine learning algorithms, and you can fit them using a unified interface.
This is extremely helpful, since you have to remember only one interface rather than dozens.

The models are separated into two modes/categories, namely, regression and classification (`set_mode()`).
The model is defined using a function specific to each algorithm (e.g., `linear_reg()`, `rand_forest()`).
Finally, the backend/engine/implementation is selected using `set_engine()`.

Herein, I will start with a basic linear regression model as implemented in `stats::lm()`.

```
lm_model <-
  linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")
```

Furthermore, take the example of a random forest model.
This could be fit using packages **ranger** or **randomForest**.
Both have different interfaces (e.g., argument `ntree` vs. `num.trees`), and **parsnip** removes the hassle of remembering both interfaces.
More general arguments pertaining to the algorithm are specified in the algorithm function (e.g., `rand_forest()`).
Arguments specific to the engine are specified in `set_engine()`.

```
rand_forest(mtry = 3, trees = 500, min_n = 5) %>%
  set_mode("regression") %>%
  set_engine("ranger", importance = "impurity_corrected")
```

Finally, we can `fit()` the model.

```
lm_fit1 <- fit(lm_model, price ~ ., dia_juiced)
lm_fit1
#> parsnip model object
#>
#> Fit time: 20ms
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept) depth table x y
#> 7.711965 0.010871 0.005889 0.251155 0.054422
#> z cut_1 cut_2 cut_3 cut_4
#> 0.054196 0.106701 -0.026356 0.024207 -0.006191
#> color_1 color_2 color_3 color_4 color_5
#> -0.455831 -0.084108 -0.004810 0.009725 -0.005591
#> color_6 clarity_1 clarity_2 clarity_3 clarity_4
#> -0.009730 0.860961 -0.242698 0.132234 -0.052903
#> clarity_5 clarity_6 clarity_7 carat_poly_1 carat_poly_2
#> 0.028996 0.002403 0.022235 51.663971 -17.508316
```

You can use `fit()` with a formula (e.g., `price ~ .`) or by specifying `x` and `y`.
In both cases, I recommend keeping only the variables you need when preparing the data set, since this will prevent forgetting the new variable `d` when using a formula like `y ~ a + b + c`.
Unnecessary variables can easily be dropped in the recipe using `step_rm()`; a small sketch follows below.
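
As a hedged illustration (not part of the analysis above), the two fitting interfaces and `step_rm()` could be used as follows; the objects `lm_fit_xy` and the dropped variable `depth` are assumptions for demonstration only.

```
# Formula interface (as used above)
lm_fit_formula <- fit(lm_model, price ~ ., data = dia_juiced)

# x/y interface via fit_xy()
lm_fit_xy <- fit_xy(lm_model,
                    x = select(dia_juiced, -price),
                    y = dia_juiced$price)

# Dropping an unneeded variable inside a recipe using step_rm()
recipe(price ~ ., data = dia_train) %>%
  step_rm(depth) %>%   # e.g., drop "depth" before modeling (illustration only)
  step_log(all_outcomes())
```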

## Summarizing Fitted Models: **broom**

Many models have implemented `summary()` or `coef()` methods.
However, their output is usually not in a tidy format, and the **broom** package aims to resolve this issue.

`glance()` gives us information about the whole model.
Here, R squared is pretty high and the RMSE equals 0.154.

```
glance(lm_fit1$fit)
#> # A tibble: 1 x 11
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 0.977 0.977 0.154 9607. 0 25 2457. -4863. -4691. 127.
#> # ... with 1 more variable: df.residual <int>
```

`tidy()` gives us information about the model parameters, and we see that we have a significant quadratic effect of carat.

```
tidy(lm_fit1) %>%
  arrange(desc(abs(statistic)))
#> # A tibble: 25 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 7.71 0.00431 1790. 0.
#> 2 carat_poly_2 -17.5 0.259 -67.7 0.
#> 3 clarity_1 0.861 0.0135 64.0 0.
#> 4 color_1 -0.456 0.00738 -61.8 0.
#> 5 carat_poly_1 51.7 1.10 47.0 0.
#> 6 clarity_2 -0.243 0.0127 -19.2 3.63e-79
#> 7 color_2 -0.0841 0.00665 -12.7 3.50e-36
#> 8 clarity_3 0.132 0.0107 12.3 2.51e-34
#> 9 cut_1 0.107 0.00997 10.7 1.88e-26
#> 10 clarity_4 -0.0529 0.00839 -6.30 3.15e-10
#> # ... with 15 more rows
```

Finally, `augment()` can be used to get model predictions, residuals, etc.

```
lm_predicted <- augment(lm_fit1$fit, data = dia_juiced) %>%
  rowid_to_column()
select(lm_predicted, rowid, price, .fitted:.std.resid)
#> # A tibble: 5,395 x 9
#> rowid price .fitted .se.fit .resid .hat .sigma .cooksd .std.resid
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 5.83 5.90 0.0133 -0.0769 0.00744 0.154 0.0000755 -0.502
#> 2 2 5.84 5.80 0.0110 0.0472 0.00507 0.154 0.0000193 0.307
#> 3 3 5.86 5.89 0.0138 -0.0286 0.00801 0.154 0.0000112 -0.187
#> 4 4 5.88 6.24 0.00988 -0.360 0.00413 0.154 0.000910 -2.34
#> 5 5 6.00 6.24 0.0104 -0.244 0.00458 0.154 0.000467 -1.59
#> 6 6 6.00 6.06 0.0100 -0.0615 0.00427 0.154 0.0000275 -0.401
#> 7 7 6.00 6.10 0.0111 -0.0999 0.00520 0.154 0.0000887 -0.651
#> 8 8 6.00 6.06 0.0104 -0.0566 0.00459 0.154 0.0000251 -0.369
#> 9 9 6.00 6.22 0.00895 -0.216 0.00339 0.154 0.000270 -1.41
#> 10 10 6.32 6.55 0.00872 -0.233 0.00321 0.154 0.000297 -1.52
#> # ... with 5,385 more rows
```

A plot of the predicted vs. actual prices shows small residuals with a few outliers, which are not well explained by the model.

```
ggplot(lm_predicted, aes(.fitted, price)) +
  geom_point(alpha = .2) +
  ggrepel::geom_label_repel(aes(label = rowid),
                            data = filter(lm_predicted, abs(.resid) > 2)) +
  labs(title = "Actual vs. Predicted Price of Diamonds")
```

## Evaluating Model Performance: **yardstick**

We already saw the performance measures RMSE and R squared in the output of `glance()` above.
The **yardstick** package is specifically designed for such measures for both numeric and categorical outcomes, and it plays well with grouped predictions (e.g., across the folds of a cross-validation).
Let’s use **rsample**, **parsnip**, and **yardstick** for cross-validation to get a more accurate estimation of RMSE.

In the following pipeline, the model is `fit()` separately to the three *analysis data* sets, and then the fitted models are used to `predict()` on the three corresponding *assessment data* sets (i.e., 3-fold cross-validation).
Before that, `analysis()` and `assessment()` are used to extract the respective folds, and `bake()` is used to apply the recipe steps to these data sets.
(Note that one could have created `dia_vfold` from the prepped rather than the raw data in the first place to get rid of these two `bake()` calls.)

Herein, I use list columns to store all information about the three folds in one data frame, and a combination of `dplyr::mutate()` and `purrr::map()` to “loop” across the rows of the data frame.

```
lm_fit2 <- dia_vfold %>%
  mutate(
    df_ana = map(splits, analysis),
    df_ana = map(df_ana, ~bake(dia_rec, .x)),
    df_ass = map(splits, assessment),
    df_ass = map(df_ass, ~bake(dia_rec, .x)),
    model_fit = map(df_ana, ~fit(lm_model, price ~ ., data = .x)),
    model_pred = map2(model_fit, df_ass, ~predict(.x, new_data = .y)))
lm_fit2
#> # 3-fold cross-validation using stratification
#> # A tibble: 3 x 6
#> splits id df_ana df_ass model_fit model_pred
#> * <named list> <chr> <named list> <named list> <named li> <named list>
#> 1 <split [3.6K/~ Fold1 <tibble [3,596~ <tibble [1,79~ <fit[+]> <tibble [1,799~
#> 2 <split [3.6K/~ Fold2 <tibble [3,596~ <tibble [1,79~ <fit[+]> <tibble [1,799~
#> 3 <split [3.6K/~ Fold3 <tibble [3,598~ <tibble [1,79~ <fit[+]> <tibble [1,797~
```

Now, we can extract the actual prices from the assessment data and compare them to the predicted prices.
Across the three folds, we see that the RMSE is a little higher and R squared a little smaller compared to above (see the output of `glance(lm_fit1$fit)`).
This is expected, since out-of-sample prediction is harder but also way more useful.

```
lm_fit2 %>%
  mutate(res = map2(df_ass, model_pred, ~data.frame(price = .x$price,
                                                    .pred = .y$.pred))) %>%
  select(id, res) %>%
  unnest(res) %>%
  group_by(id) %>%
  metrics(truth = price, estimate = .pred)
#> # A tibble: 9 x 4
#> id .metric .estimator .estimate
#> <chr> <chr> <chr> <dbl>
#> 1 Fold1 rmse standard 0.168
#> 2 Fold2 rmse standard 0.147
#> 3 Fold3 rmse standard 0.298
#> 4 Fold1 rsq standard 0.973
#> 5 Fold2 rsq standard 0.979
#> 6 Fold3 rsq standard 0.918
#> 7 Fold1 mae standard 0.116
#> 8 Fold2 mae standard 0.115
#> 9 Fold3 mae standard 0.110
```

Note that `yardstick::metrics()` has default measures for numeric and categorical outcomes; here, RMSE, R squared, and the mean absolute error (MAE) are returned.
You could also use a single metric directly, like `rmse()`, or define a custom set of metrics via `metric_set()` (a small sketch is given below).
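
As a hedged sketch of the latter, a custom metric set could be applied just like `metrics()`; the object name `my_metrics` is an assumption for illustration.

```
# Define a custom set of regression metrics via metric_set()
my_metrics <- metric_set(rmse, rsq, mae)

# Apply it like metrics(), e.g., to the training predictions from above
lm_predicted %>%
  my_metrics(truth = price, estimate = .fitted)
```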

## Tuning Model Parameters: **tune** and **dials**

Let’s get a little bit more involved and do some hyperparameter tuning. We turn to a different model, namely, a random forest model.

The **tune** package has functions for doing the actual tuning (e.g., via grid search), while all the parameters and their defaults (e.g., `mtry()`, `neighbors()`) are implemented in **dials**.
Thus, the two packages are almost always used in combination.

### Preparing a **parsnip** Model for Tuning

First, I want to tune the `mtry` parameter of a random forest model.
Thus, the model is defined using **parsnip**, just as above.
However, rather than using a default value (i.e., `mtry = NULL`) or one specific value (i.e., `mtry = 3`), we use `tune()` as a placeholder and let cross-validation decide on the best value for `mtry` later on.

As the output indicates, the default minimum of `mtry` is 1 and the maximum depends on the data.

```
rf_model <-
  rand_forest(mtry = tune()) %>%
  set_mode("regression") %>%
  set_engine("ranger")
parameters(rf_model)
#> Collection of 1 parameters for tuning
#>
#> id parameter type object class
#> mtry mtry nparam[?]
#>
#> Model parameters needing finalization:
#> # Randomly Selected Predictors ('mtry')
#>
#> See `?dials::finalize` or `?dials::update.parameters` for more information.
mtry()
#> # Randomly Selected Predictors (quantitative)
#> Range: [1, ?]
```

Thus, this model is not yet ready for fitting.
You can either specify the maximum for `mtry` yourself using `update()`, or you can use `finalize()` to let the data decide on the maximum.

```
rf_model %>%
  parameters() %>%
  update(mtry = mtry(c(1L, 5L)))
#> Collection of 1 parameters for tuning
#>
#> id parameter type object class
#> mtry mtry nparam[+]
rf_model %>%
  parameters() %>%
  # Here, the maximum of mtry equals the number of predictors, i.e., 24.
  finalize(x = select(juice(dia_rec), -price)) %>%
  magrittr::extract2("object")
#> [[1]]
#> # Randomly Selected Predictors (quantitative)
#> Range: [1, 24]
```

### Preparing Data for Tuning: **recipes**

The second thing I want to tune is the degree of the polynomial for the variable carat. As you saw in the plot above, polynomials up to a degree of four seemed well suited for the data. However, a simpler model might do equally well, and we want to use cross-validation to decide on the degree that works best.

Similar to tuning parameters in a model, certain aspects of a recipe can be tuned.
Let’s define a second recipe and use `tune()` inside `step_poly()`.

```
# Note that this recipe cannot be prepped (and juiced), since "degree" is unknown
dia_rec2 <-
  recipe(price ~ ., data = dia_train) %>%
  step_log(all_outcomes()) %>%
  step_normalize(all_predictors(), -all_nominal()) %>%
  step_dummy(all_nominal()) %>%
  step_poly(carat, degree = tune())
dia_rec2 %>%
  parameters() %>%
  magrittr::extract2("object")
#> [[1]]
#> Polynomial Degree (quantitative)
#> Range: [1, 3]
```

### Combine Everything: **workflows**

The **workflows** package is designed to bundle together different parts of a machine learning pipeline like a recipe or a model.

First, let’s create an initial workflow and add the recipe and the random forest model, both of which have a tuning parameter.

```
rf_wflow <-
  workflow() %>%
  add_model(rf_model) %>%
  add_recipe(dia_rec2)
rf_wflow
#> == Workflow ==============================================================================
#> Preprocessor: Recipe
#> Model: rand_forest()
#>
#> -- Preprocessor --------------------------------------------------------------------------
#> 4 Recipe Steps
#>
#> * step_log()
#> * step_normalize()
#> * step_dummy()
#> * step_poly()
#>
#> -- Model ---------------------------------------------------------------------------------
#> Random Forest Model Specification (regression)
#>
#> Main Arguments:
#> mtry = tune()
#>
#> Computational engine: ranger
```

Second, we need to update the parameters in `rf_wflow`, because the maximum of `mtry` is not yet known and the maximum of `degree` should be four (while three is the default).

```
rf_param <-
  rf_wflow %>%
  parameters() %>%
  update(mtry = mtry(range = c(3L, 5L)),
         degree = degree_int(range = c(2L, 4L)))
rf_param$object
#> [[1]]
#> # Randomly Selected Predictors (quantitative)
#> Range: [3, 5]
#>
#> [[2]]
#> Polynomial Degree (quantitative)
#> Range: [2, 4]
```

Third, we want to use cross-validation for tuning, that is, to select the best combination of the hyperparameters.
Bayesian optimization (see `vignette("svm_classification", package = "tune")`) is recommended for complex tuning problems, and this can be done using `tune_bayes()` (a hedged sketch is given below).
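
For illustration only, such a Bayesian search over the same workflow might look roughly like this; the call is a sketch, and the argument values (`initial = 5`, `iter = 10`) are assumptions rather than tuned choices.

```
# Hedged sketch: Bayesian optimization instead of grid search (not run here)
rf_search_bayes <- tune_bayes(rf_wflow,
                              resamples = dia_vfold,
                              param_info = rf_param,
                              initial   = 5,   # size of the initial design (assumption)
                              iter      = 10)  # number of search iterations (assumption)
```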

Herein, however, grid search will suffice. To this end, let’s create a grid of all necessary parameter combinations.

```
rf_grid <- grid_regular(rf_param, levels = 3)
rf_grid
#> # A tibble: 9 x 2
#> mtry degree
#> <int> <int>
#> 1 3 2
#> 2 4 2
#> 3 5 2
#> 4 3 3
#> 5 4 3
#> 6 5 3
#> 7 3 4
#> 8 4 4
#> 9 5 4
```

Cross-validation and hyperparameter tuning can involve fitting many models.
Herein, for example, we have to fit 3 x 9 models (folds x parameter combinations).
To increase speed, we can fit the models in parallel.
This is directly supported by the **tune** package (see `vignette("optimizations", package = "tune")`).

```
library("doFuture")
all_cores <- parallel::detectCores(logical = FALSE)
registerDoFuture()
cl <- makeCluster(all_cores)
plan(future::cluster, workers = cl)
```

Then, we can finally start tuning.

```
rf_search <- tune_grid(rf_wflow, grid = rf_grid, resamples = dia_vfold,
                       param_info = rf_param)
```

The results can be examined using `autoplot()` and `show_best()`:

```
autoplot(rf_search, metric = "rmse") +
  labs(title = "Results of Grid Search for Two Tuning Parameters of a Random Forest")
```

```
show_best(rf_search, "rmse", maximize = FALSE, n = 9)
#> # A tibble: 9 x 7
#> mtry degree .metric .estimator mean n std_err
#> <int> <int> <chr> <chr> <dbl> <int> <dbl>
#> 1 5 2 rmse standard 0.121 3 0.00498
#> 2 5 3 rmse standard 0.121 3 0.00454
#> 3 4 2 rmse standard 0.122 3 0.00463
#> 4 5 4 rmse standard 0.122 3 0.00471
#> 5 4 3 rmse standard 0.123 3 0.00469
#> 6 4 4 rmse standard 0.124 3 0.00496
#> 7 3 3 rmse standard 0.128 3 0.00502
#> 8 3 2 rmse standard 0.128 3 0.00569
#> 9 3 4 rmse standard 0.128 3 0.00501
select_best(rf_search, metric = "rmse", maximize = FALSE)
#> # A tibble: 1 x 2
#> mtry degree
#> <int> <int>
#> 1 5 2
select_by_one_std_err(rf_search, mtry, degree,
                      metric = "rmse", maximize = FALSE)
#> # A tibble: 1 x 9
#> mtry degree .metric .estimator mean n std_err .best .bound
#> <int> <int> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 4 2 rmse standard 0.122 3 0.00463 0.121 0.126
```

With a cross-validation RMSE of ca. 0.12, the random forest model seems to outperform the linear regression from above. Furthermore, 0.12 is (hopefully) a realistic estimate of the out-of-sample error.

### Selecting the Best Model to Make the Final Predictions

We saw above that a quadratic trend was enough to get a good model.
Furthermore, cross-validation revealed that `mtry = 4` seems to perform well.

To use this combination of hyperparameters, we `fit()` the corresponding model (or workflow, more precisely) on the whole training data set `dia_train`.
This model (or workflow) can then be used to `predict()` on data never seen before, namely, `dia_test` in the present case.

```
rf_param_final <-
  select_by_one_std_err(rf_search, mtry, degree,
                        metric = "rmse", maximize = FALSE)
rf_wflow_final <-
  finalize_workflow(rf_wflow, rf_param_final)
rf_wflow_final_fit <-
  fit(rf_wflow_final, data = dia_train)
dia_test$.pred <- predict(rf_wflow_final_fit, new_data = dia_test)$.pred
dia_test$logprice <- log(dia_test$price)
metrics(dia_test, truth = logprice, estimate = .pred)
#> # A tibble: 3 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 rmse standard 0.113
#> 2 rsq standard 0.988
#> 3 mae standard 0.0846
```

As you can see, we get an RMSE of 0.11, which is even slightly better compared to the cross-validation RMSE.

## Summary

The tidymodels ecosystem bundles together a set of packages that work hand in hand to solve machine-learning problems from start to finish.
Together with the data-wrangling facilities in the **tidyverse** and the plotting tools from **ggplot2**, this makes for a rich toolbox for every data scientist working with R.

The only thing that is definitely missing in **tidymodels** is a package for combining different machine learning models (i.e., ensemble/stacking/super learner).
We have **caretEnsemble** for **caret**, and I am sure they are working on something similar for **tidymodels** at RStudio.
Alex Hayes has a related blog post focusing on tidymodels, for those who can’t wait.

## Further Resources

- For further information about each of the tidymodels packages, I recommend the vignettes/articles on the respective package homepage (e.g., https://tidymodels.github.io/recipes/ or https://tidymodels.github.io/tune/).
- Max Kuhn, one of the developers of the tidymodels packages, was interviewed on the R podcast and on the DataFramed podcast.
- Max Kuhn is the author of the books Applied Predictive Modeling (with Kjell Johnson) and The caret Package.
- R for Data Science by Hadley Wickham and Garrett Grolemund covers all the basics of data import, transformation, visualization, and modeling using tidyverse and tidymodels packages.
- Variable importance (plots) are provided by the package vip, which works well in combination with tidymodels packages.
- Recipe steps for dealing with unbalanced data are provided by the themis package.
- There are a few more tidymodels packages that I did not cover herein, like **infer** or **tidytext**. Read more about these at https://tidymodels.github.io/tidymodels/.

For a list of models available via **parsnip**, see https://tidymodels.github.io/parsnip/articles/articles/Models.html.↩