class: center, middle, inverse, title-slide

.title[
# Data Science and Statistical Computing
]
.subtitle[
## Practical Lecture 4
Data Visualisation (base)
]
.author[
### Dr Louis Aslett
]
.institute[
### Durham University
]
.date[
### 31 October 2024
]

---

## Plotting data

- R has a wide array of tools for visualising data.
- So-called "exploratory data analysis" is really important, and can help you understand a data set much better.
- Never dive into complicated modelling without first *looking* at your data!
- There are a few plotting frameworks in R; the two most popular are:
  - Base R
  - ggplot2
- ggplot2 can produce publication-quality graphics and is extremely powerful! However, it is a good bit more complicated than base R.

**DEMO:** motivation for looking at the data!

---
class: middle

## Presentation or exploration?

<img src="i/present-vs-explore.jpg" width="100%" />

---

## Base R

Core plotting function:

``` r
plot(x, y, ...)
```

Provide vectors for `x` and `y`, which give the coordinates of the points to plot.

--

Examples of common extra arguments `...`:

- `col` sets the colour of points (can use RGB or a colour name as a string; can be a vector giving a colour for each point)
- `pch` changes the plotting symbol (an integer, [click here for symbol numbers](https://cpb-us-e1.wpmucdn.com/sites.ucsc.edu/dist/d/276/files/2015/10/symbols.jpg))
- `xlab`, `ylab` set the x-axis or y-axis label
- `xlim`, `ylim` change the plotting range of x or y (by default uses the data range)
- `main` sets the plot title
- `type`
  - `"p"` default, points
  - `"l"` connect observations in order by lines
  - `"b"` points and lines simultaneously

--

If you only supply `x` then it will plot the values in `x` on the y-axis, against their index position in the vector on the x-axis.

---

## Base R

Each call to `plot()` makes a brand new plot. To add more points/lines (e.g. in a different colour) to an existing plot:

``` r
points(x, y, ...)
lines(x, y, ...)
# shorthand for points(x, y, type = "l")
```

- Similar options for `...`
- Some functions can be passed straight to `lines()`:
  - `lowess()` to fit a smoothed line (`f` argument controls smoothness)
  - `density()` to fit a smoothed, continuous version of a histogram

--

Other plotting functions include:

- `hist()` for histograms
- `boxplot()` for boxplots
- `barplot()` for categorical bar charts (tip: use `table()` to get a summary first!)
- `abline()` to add `\(y=mx+c\)` type straight lines to an existing plot
  - can pass the result of `lm()` directly to add a fitted straight line

--

All have slightly different interfaces, so part of learning R is learning to read the documentation!

---

## Breakneck tour! (I)

``` r
data("diamonds", package = "ggplot2")
plot(diamonds$carat, diamonds$price)
plot(diamonds$carat, diamonds$price, pch = 20)
plot(diamonds$carat, diamonds$price, pch = 20,
     xlim = c(0,1), ylim = c(0,10000))
plot(diamonds$carat, diamonds$price, pch = 20,
     main = "Dollar price against carat",
     xlab = "Carats", ylab = "Price in $")
abline(lm(price ~ carat, diamonds), col = "red")
lines(lowess(diamonds$carat, diamonds$price, f = 0.05), col = "green")
```

---

## Breakneck tour! (II)

``` r
hist(diamonds$price)
hist(diamonds$price, freq = FALSE)
hgram <- hist(diamonds$price, freq = FALSE)
hgram$breaks
hist(diamonds$price, freq = FALSE, breaks = seq(0, 20000, 2000))
mu.hat <- mean(diamonds$price)
sig.hat <- sd(diamonds$price)
x <- seq(0, 20000, length.out = 300)
lines(x, dnorm(x, mu.hat, sig.hat))
lines(density(diamonds$price), col = "red")
boxplot(diamonds$carat)
boxplot(diamonds$carat ~ diamonds$clarity)
cuts <- table(diamonds$cut)
barplot(cuts)
```

---

<!-- ## Thinking about plots: films example -->
<!-- Note that the deceptive simplicity of plotting in R can make one rush through without thought. -->
<!-- Much can be gleaned from a data set if we stop to think while we analyse it!
-->
<!-- ```{r,eval=FALSE} -->
<!-- data("movies", package = "ggplot2movies") -->
<!-- hist(movies$length) -->
<!-- ``` -->
<!-- -- -->
<!-- ```{r,eval=FALSE} -->
<!-- boxplot(movies$length) -->
<!-- ``` -->
<!-- -- -->
<!-- ```{r,eval=FALSE} -->
<!-- hist(movies$length, -->
<!-- breaks=seq(0,6000,1), xlim=c(0,180)) -->
<!-- ``` -->
<!-- -- -->
<!-- ```{r,eval=FALSE} -->
<!-- plot(movies$votes, movies$rating, -->
<!-- pch=16) -->
<!-- ``` -->
<!-- --- -->

## Multiple plots

To get a grid of all pairwise scatter plots, use `pairs()`

``` r
pairs(mtcars)
pairs(mtcars[,1:4])
```

--

To manually create multiple plots, define the layout in terms of rows and columns

``` r
par(mfrow = c(2,1))
plot(diamonds$carat, diamonds$price)
boxplot(diamonds$carat)
par(mfrow = c(1,1)) # <- need this to reset to a single plot!
```

--

We will see better options with more flexible layouts soon!

<!-- --- -->
<!-- ## 🤔 Programming Quiz -->
<!-- Can you write a function which takes a single argument and returns a draw of that many cards (with replacement) from a deck of 52? -->
<!-- --- -->
<!-- 5. ggplot -->
<!-- 6. dplyr + tidyr -->
<!-- 7. Rmarkdown -->
<!-- 8. Shiny -->
<!-- 9. web scraping + text? -->
<!-- 10. -->
<!-- --- -->
<!-- ## 🏃♀️ Simple exercise -->
<!-- 1. Look at the Github page for the `naniar` project. [https://github.com/njtierney/naniar](https://github.com/njtierney/naniar) -->
<!-- 1. What does the package do? -->
<!-- 1. Is the package on CRAN? -->
<!-- 1. Install the package and load it ready for use. -->
<!-- ## External package: Palmer penguins -->
<!-- ```{r,echo=FALSE,out.extra='style="float:right; padding:10px"',out.width="20%"} -->
<!-- knitr::include_graphics("i/palmerpenguins_hex.png") -->
<!-- ``` -->
<!-- A dataset containing various size measurements and other covariates on 344 penguins of 3 different species, collected at [Palmer Station in Antarctica](https://pal.lternet.edu/) -->
<!-- ```{r,eval=FALSE} -->
<!-- # Only need to install the first time you use!
-->
<!-- install.packages("palmerpenguins") -->
<!-- ``` -->
<!-- ```{r} -->
<!-- # Then each time you want to bring the data in: -->
<!-- data(penguins, package = 'palmerpenguins') -->
<!-- ``` -->
<!-- ```{r, echo=FALSE, fig.align="center", out.width="60%"} -->
<!-- knitr::include_graphics("i/palmerpenguins_culmen_depth.png") -->
<!-- ``` -->
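
---

## Multiple plots

A minimal sketch of the `par(mfrow = ...)` layout idea, not part of the original demo. `par()` invisibly returns the previous values of any settings it changes, so they can be stored and restored instead of resetting `mfrow` by hand. The built-in `mtcars` data, the axis labels and the name `old.par` are illustrative assumptions.

``` r
# Sketch only: mtcars is a built-in data set, used here purely for illustration.
old.par <- par(mfrow = c(1, 2))  # 1 row, 2 columns; previous settings are returned

# Left panel: scatter plot of fuel efficiency against weight
plot(mtcars$wt, mtcars$mpg, pch = 20,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Right panel: boxplots of fuel efficiency split by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

par(old.par)                     # restore the layout that was in place before
```

Restoring from the stored value puts back whatever layout was active beforehand, which is slightly safer than assuming it was `c(1,1)`.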