Chapter 8 ML Pipelines
Throughout the course, code has been provided in live code blocks: short snippets that perform particular tasks by calling individual packages directly for each method. However, R (R Core Team, 2021) also offers complete software pipelines for machine learning. These handle common steps such as data splitting, preprocessing, model fitting and evaluation through a single consistent interface, which makes running machine learning analyses much easier. Arguably the two leading ML pipelines in R today are tidymodels (Kuhn and Wickham, 2020) and mlr3 (Lang et al., 2019), both of which we will cover in the labs on the course.
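To show how this differs from calling individual packages directly, here is a minimal sketch of the same task in both frameworks: fitting a classification tree to R's built-in iris data and measuring test-set accuracy. The dataset, the rpart decision-tree learner, the 75/25 split and the seed are illustrative choices, not taken from the course labs. The sketch assumes both packages are installed (for example via `install.packages(c("tidymodels", "mlr3"))`) and R 4.1 or later for the native `|>` pipe.

```r
# Illustrative sketch only: the dataset, learner and split below are
# assumptions chosen for brevity, not the course's lab code.
library(tidymodels)  # attaches rsample, parsnip, workflows, yardstick, dplyr, ...
library(mlr3)

set.seed(1)

## tidymodels ---------------------------------------------------------
# Stratified 75/25 train/test split of the built-in iris data
split <- initial_split(iris, prop = 0.75, strata = Species)
train <- training(split)
test  <- testing(split)

# Model specification: a classification tree fitted via the rpart engine
tree_spec <- decision_tree() |>
  set_engine("rpart") |>
  set_mode("classification")

# A workflow bundles the model formula with the model specification
wf <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(tree_spec)

wf_fit <- fit(wf, data = train)

# Predict on the held-out rows and compute accuracy with yardstick
preds  <- predict(wf_fit, new_data = test) |> bind_cols(test)
acc_tm <- accuracy(preds, truth = Species, estimate = .pred_class)
print(acc_tm)

## mlr3 ---------------------------------------------------------------
# The same problem expressed with mlr3's task / learner / measure objects
task    <- tsk("iris")
learner <- lrn("classif.rpart")

# Random 75/25 split of the task's row ids
ids <- partition(task, ratio = 0.75)

learner$train(task, row_ids = ids$train)
pred_m <- learner$predict(task, row_ids = ids$test)
acc_m  <- pred_m$score(msr("classif.acc"))
print(acc_m)
```

In both versions the data split, the model specification and the evaluation metric are separate objects, so trying a different learner is largely a matter of changing the model specification (or the `lrn()` key) while leaving the rest of the code unchanged.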
8.1 Computer Labs
We recommend using GitHub Codespaces for the computer labs, since it provides a consistent R and RStudio environment irrespective of which computer you are using: everything runs in your web browser, on a remote cloud server hosted by GitHub. If you have never used it before, please see the video below for guidance.
- GitHub Codespaces
- Setup Instructions: In preparation for the course, you will have been sent a link to sign up to GitHub and register for a free student account (which includes the “Pro” tools). If you have not done this yet and plan to use the recommended GitHub Codespaces environment, please complete at least Step 1 of the setup instructions now.
- Running Codespaces: To run a lab in Codespaces, go to the lab repository and launch the environment from there: https://github.com/louisaslett/APTS-StatML. For quick access in future, you can also click the “Github Codespace” link near the bottom of the contents menu on the left.
- Lab 1: Tidyverse
- Lab 2: mlr3
References
Kuhn, M., Wickham, H. (2020). Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles. URL https://www.tidymodels.org
Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019). mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software 4(44), 1903. DOI: 10.21105/joss.01903
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/