cm009 Exercises: tidy data

library(tidyverse)
lotr  <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/lotr_tidy.csv")
guest <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/attend.csv")
email <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/wedding/emails.csv")

Exercise 1: Univariate Pivoting

Consider the Lord of the Rings data:

lotr

Would you say this data is in tidy format?
Widen the data so that we see the words spoken by each race, by giving each race its own column.

(lotr_wide <- lotr %>% 
  pivot_wider(id_cols = c(-Race, -Words), 
              names_from = Race, 
              values_from = Words))

Re-lengthen the wide LOTR data from Question 2 above.

lotr_wide %>% 
  pivot_longer(cols = c(-Film, -Gender),
               names_to  = "Race", 
               values_to = "Words")
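
The round trip above can be checked on a small scale. The following is a minimal sketch on a made-up tibble (`toy` and its values are hypothetical, not the LOTR data), showing that widening and then re-lengthening recovers the original rows:

```r
library(tidyverse)

# Hypothetical toy data in tidy (long) format: one row per Film/Race pair.
toy <- tibble(
  Film  = c("A", "A", "B", "B"),
  Race  = c("Elf", "Man", "Elf", "Man"),
  Words = c(10, 20, 30, 40)
)

# Widen: each distinct value of Race becomes its own column holding Words.
toy_wide <- toy %>%
  pivot_wider(names_from = Race, values_from = Words)

# Re-lengthen: gather every column except Film back into Race/Words pairs.
toy_back <- toy_wide %>%
  pivot_longer(cols = -Film, names_to = "Race", values_to = "Words")

toy_wide
toy_back
```

Here `id_cols` is left at its default, which uses every column not named in `names_from` or `values_from`.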

Exercise 2: Multivariate Pivoting

Congratulations, you’re getting married! In addition to the wedding, you’ve decided to hold two other events: a day-of brunch and a day-before round of golf. You’ve made a guest list recording attendance so far, along with meal preferences for the food events (wedding and brunch).

guest %>% 
  DT::datatable(rownames = FALSE)

Put “meal” and “attendance” as their own columns, with the events living in a new column.

(guest_long <- guest %>% 
  pivot_longer(cols      = c(-party, -name), 
               names_to  = c(".value","event"),
               names_sep = "_"))
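
The special `".value"` entry in `names_to` is what keeps “meal” and “attendance” as separate columns. As a rough illustration on a hypothetical two-guest tibble (`toy_guest` is made up for this sketch, not the wedding data), each column name is split at `"_"`: the first piece names the output column and the second piece goes into `event`.

```r
library(tidyverse)

# Hypothetical toy data: columns are named <variable>_<event>.
toy_guest <- tibble(
  name               = c("Ana", "Ben"),
  meal_wedding       = c("fish", "veg"),
  meal_brunch        = c("eggs", "toast"),
  attendance_wedding = c("CONFIRMED", "PENDING"),
  attendance_brunch  = c("PENDING", "PENDING")
)

# ".value" sends the first name piece (meal, attendance) to column names;
# the second piece (wedding, brunch) becomes the values of `event`.
toy_guest_long <- toy_guest %>%
  pivot_longer(cols      = -name,
               names_to  = c(".value", "event"),
               names_sep = "_")

toy_guest_long
```

The result has one row per guest per event, with `meal` and `attendance` side by side.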

Use tidyr::separate() to split the name into two columns: “first” and “last”. Then, re-unite them with tidyr::unite().

guest_long %>% 
  separate(name, into = c("first","last"), sep = " ") %>% 
  unite(col = "name", first, last, sep = " ")

Which parties still have a “PENDING” status for all members and all events?

guest_long %>% 
  group_by(party) %>% 
  summarize(all_pending = all(attendance == "PENDING"))

Which parties still have a “PENDING” status for all members for the wedding?

guest %>% 
  group_by(party) %>% 
  summarize(pending_wedding = all(attendance_wedding == "PENDING"))

Put the data back to the way it was.

guest_long %>% 
  pivot_wider(id_cols     = c(party, name), 
              names_from  = c(event), 
              names_sep   = "_", 
              values_from = c(meal, attendance))

You also have a list of emails for each party, in this worksheet under the variable email. Change this so that each person gets their own row. Use tidyr::separate_rows().

email %>% 
  separate_rows(guest, sep = ", ")
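
To see what `separate_rows()` does without downloading the data, here is a small sketch on a hypothetical tibble (`toy_email` and its addresses are invented for illustration). Each comma-separated entry in `guest` becomes its own row, and the other columns are repeated as needed.

```r
library(tidyverse)

# Hypothetical toy data: one row per party, guests packed into one string.
toy_email <- tibble(
  guest = c("Ana Lee, Ben Lee", "Cy Ng"),
  email = c("lees@example.com", "cy@example.com")
)

# Split `guest` on ", " so that every person gets a separate row;
# the party's email is copied onto each new row.
toy_split <- toy_email %>%
  separate_rows(guest, sep = ", ")

toy_split
```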

Exercise 3: Making tibbles

Create a tibble that has the following columns:

A label column with "Sample A" in its entries.
A column x of 100 random observations drawn from the N(0,1) distribution.
A column y calculated as the x values plus N(0,1) error.

n <- 100
tibble(label = "Sample A",
             x = rnorm(n),
             y = x + rnorm(n))

Generate a Gaussian sample of size 100 for each combination of the following means (mu) and standard deviations (sd).

n <- 100
mu <- c(-5, 0, 5)
sd <- c(1, 3, 10)
crossing(mu = mu, sd = sd) %>%   # one row for every combination of mu and sd
  group_by_all() %>% 
  mutate(z = list(rnorm(n, mu, sd))) %>% 
  unnest(z)                      # one row per simulated value

Fix the experiment tibble below (originally defined in the documentation of the tidyr::expand() function) so that all three repeats are displayed for each person, and the measurements are kept. The code is given, but needs one adjustment. What is it?

experiment <- tibble(
  name = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
  trt  = rep(c("a", "b", "a"), c(3, 2, 1)),
  rep = c(1, 2, 3, 1, 2, 1),
  measurement_1 = runif(6),
  measurement_2 = runif(6)
)

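# Given code: crosses every name with every trt and drops the measurements.
experiment %>% expand(name, trt, rep)

# Adjusted code: nesting() keeps only the name/trt pairs present in the data,
# and complete() retains the measurement columns, filling missing repeats with NA.
experiment %>% complete(nesting(name, trt), rep)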