R basics

Objectives

  • Knowing what types exist in R

  • Knowing the most common data structures: vectors, lists, data frames, and sets

  • Creating and using functions

  • Knowing what a library is

  • Knowing what library() does

  • Being able to “read” an error

Motivation for R

  • Free

  • Huge ecosystem of examples, libraries, and tools

  • Relatively easy to read and understand

  • Similar in scope and use cases to Python, Julia, and MATLAB

Basic types

# integer
num_measurements <- 13L

# numeric (float)
some_fraction <- 0.25

# character (string)
name <- "Bruce Wayne"

# logical (bool)
value_is_missing <- FALSE
skip_verification <- TRUE

# we can print values
print(name)

# arithmetic operations with integers and numeric values
print(5 * num_measurements)
print(1.0 - some_fraction)

  • R is dynamically typed: we do not have to declare the type of a variable. R determines it from the value we assign. Note that a plain number such as 13 is stored as numeric; the L suffix, as in 13L, makes it an integer.

  • Comments in R use the # symbol.
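
To check which type R has assigned to a value, we can ask with class(). The short sketch below reuses the variables from the example above; plain_number and product are names introduced here only for illustration.

```r
# values from the example above
num_measurements <- 13L
some_fraction <- 0.25
name <- "Bruce Wayne"
value_is_missing <- FALSE

# plain_number is a hypothetical name used only in this sketch
plain_number <- 13

# class() reports the type of a value
print(class(num_measurements))  # "integer" because of the L suffix
print(class(plain_number))      # "numeric": without L, numbers are stored as doubles
print(class(some_fraction))     # "numeric"
print(class(name))              # "character"
print(class(value_is_missing))  # "logical"

# multiplying an integer by a plain number gives a numeric result
product <- 5 * num_measurements
print(class(product))           # "numeric"
```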

Data structures for collections: vectors, lists, data frames, and sets

# vectors (similar to Python lists, important for ordered elements)
scores <- c(13, 5, 2, 3, 4, 3)

# first element
print(scores[1])

# add items to vectors
scores <- c(scores, 4)

# sort vectors
scores <- sort(scores)
print(scores)

# lists are useful to store collections with named elements
experiment <- list(location = "Svalbard", date = "2021-03-23", num_measurements = 23)

print(experiment$date)

# add items to lists
experiment$instrument <- "a particular brand"
print(experiment)

if ("instrument" %in% names(experiment)) {
  print("yes, the list 'experiment' contains the element 'instrument'")
} else {
  print("no, it doesn't")
}
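
  • Vectors are good when order matters and all elements have the same type.

  • Lists can hold elements of different types, and the elements can be named.

  • Data frames are like tables: each column is a vector, and all columns have the same length.

  • R has no built-in set type. For collections without repetition, use unique() and the set functions union(), intersect(), and setdiff() on vectors, or the sets package.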
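
Data frames and set-like operations do not appear in the code above, so here is a minimal sketch of both. The data frame readings and its values are made up for illustration.

```r
# a data frame is a table: each column is a vector of the same length
# (readings and its values are hypothetical)
readings <- data.frame(
  location = c("Svalbard", "Oslo", "Svalbard"),
  temperature = c(-12.5, 1.0, -10.1)
)

# access one column by name
print(readings$temperature)

# number of rows
print(nrow(readings))

# select the rows that satisfy a condition
svalbard_readings <- readings[readings$location == "Svalbard", ]
print(svalbard_readings)

# set-like operations on ordinary vectors
scores <- c(13, 5, 2, 3, 4, 3)
unique_scores <- unique(scores)        # drops the repeated 3
print(unique_scores)

print(union(c(1, 2, 3), c(3, 4)))      # elements that are in either vector
print(intersect(c(1, 2, 3), c(3, 4)))  # elements that are in both vectors
print(setdiff(c(1, 2, 3), c(3, 4)))    # elements only in the first vector
```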

You can nest:

  • lists inside lists (an atomic vector cannot hold a list; combining them with c() produces a list)

  • vectors inside lists

  • data frames inside lists

  • lists inside data frames
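
As a rough sketch of what nesting can look like, the example below stores a vector, a data frame, and another list inside one list, and then adds a list column to a data frame. All names and values are hypothetical.

```r
# a list holding a vector, a data frame, and another list
experiment <- list(
  location = "Svalbard",
  scores = c(13, 5, 2),
  readings = data.frame(day = 1:3, temperature = c(-12.5, -11.0, -10.1)),
  instrument = list(brand = "a particular brand", calibrated = TRUE)
)

# reach into the nested structures step by step
print(experiment$scores[2])
print(experiment$readings$temperature[1])
print(experiment$instrument$brand)

# a data frame column can itself be a list (one list element per row)
samples <- data.frame(id = 1:2)
samples$tags <- list(c("ice", "snow"), "rock")
print(samples$tags[[1]])
```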

Iterating over collections

Iterating over a vector:

scores <- c(13, 5, 2, 3, 4, 3)

for (score in scores) {
  print(score)
}

# example with string formatting
for (score in scores) {
  print(sprintf("the score is %s", score))
}

Iterating over a list:

experiment <- list(location = "Svalbard", date = "2021-03-23", num_measurements = 23)

# iterating over values
for (value in experiment) {
  print(value)
}

# iterating over names and values
for (key in names(experiment)) {
  print(paste(key, experiment[[key]]))
}
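
Two common variations on the loops above, shown as a small sketch: iterating with an index via seq_along(), and avoiding the loop altogether. The names doubled and score_texts are introduced here for illustration.

```r
scores <- c(13, 5, 2, 3, 4, 3)

# iterating with an index: seq_along() gives 1, 2, ..., length(scores)
for (i in seq_along(scores)) {
  print(sprintf("score number %d is %s", i, scores[i]))
}

# many operations work on the whole vector at once, without a loop
doubled <- scores * 2
print(doubled)

# sapply() applies a function to each element and collects the results
score_texts <- sapply(scores, function(score) sprintf("the score is %s", score))
print(score_texts)
```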

Functions

  • Functions are like reusable recipes. They receive input arguments, perform operations, and return a result.

    add <- function(a, b) {
      result <- a + b
      return(result)
    }
    
  • Function summing elements in a vector:

    add_all_elements <- function(sequence) {
      s <- 0.0
      for (element in sequence) {
        s <- s + element
      }
      return(s)
    }
    
    measurements <- c(1,2,3,4,5,6,7,8,9,10)
    print(add_all_elements(measurements))
    
  • Function computing the mean:

    arithmetic_mean <- function(sequence) {
      s <- add_all_elements(sequence)
      n <- length(sequence)
      return(s / n)
    }
    
    measurements <- c(1,2,3,4,5,6,7,8,9,10)
    # named mean_value so it is not confused with R's built-in mean() function
    mean_value <- arithmetic_mean(measurements)
    print(mean_value)
    
  • Functions can call other functions. An R function returns a single object, so to return multiple values we bundle them in a list:

    uppercase_and_lowercase <- function(text) {
      u <- toupper(text)
      l <- tolower(text)
      return(list(upper = u, lower = l))
    }
    
    some_text <- "SequenceOfCharacters"
    text_cases <- uppercase_and_lowercase(some_text)
    
    print(text_cases$upper)
    print(text_cases$lower)
    

Why functions? They reduce repetition and make the code easier to understand.
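
A short sketch of how such functions are called: arguments can be passed by position or by name, and an argument can have a default value. The function scale_by is hypothetical and only serves to show a default argument.

```r
# the add function from above
add <- function(a, b) {
  result <- a + b
  return(result)
}

# arguments passed by position and by name give the same result
print(add(2, 3))
print(add(a = 2, b = 3))

# scale_by is a hypothetical example; factor defaults to 2 if not given
scale_by <- function(sequence, factor = 2) {
  return(sequence * factor)
}

print(scale_by(c(1, 2, 3)))               # uses the default factor
print(scale_by(c(1, 2, 3), factor = 10))  # overrides the default
```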

Reading error messages

Here we introduce a mistake and try to understand the traceback:

Example error traceback. Can you explain the error?
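
As a small, hypothetical example of an error to read, the sketch below makes a deliberate mistake inside a function called compute_total (a name invented for this illustration) and captures the resulting message with tryCatch() so that the script keeps running.

```r
# hypothetical function with a deliberate mistake:
# it tries to add a number and a string
compute_total <- function(values) {
  total <- sum(values) + "4"
  return(total)
}

# tryCatch() runs the code and, if an error occurs, passes it to the handler,
# which here returns the text of the error message
error_message <- tryCatch(
  compute_total(c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)

print(error_message)
```

Without tryCatch(), R stops at the failing line and prints the message together with the call that failed. With English messages, the text here says that a non-numeric argument was given to a binary operator, which points at the + applied to "4". In an interactive session, calling traceback() right after an error lists the chain of function calls that led to it.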

Libraries

Libraries (called packages in R) are collections of functions. We load a package with library() to reuse the functions defined in it.

Try this:

# stats ships with R and is attached by default; library() is called here only to show the syntax
library(stats)

measurements <- c(1,2,3,4,5,6,7,8,9,10)
result <- sd(measurements)
print(result)

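Packages that do not ship with R, such as dplyr, must first be installed once with install.packages("dplyr"); after that they are loaded the same way: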

library(dplyr)

You can create your own libraries (packages) with functions for reuse.
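
Two related idioms, sketched under the assumption that only base R is available: calling a function with the package::function form without loading the package, and checking whether a package is installed before trying to load it. The variable dplyr_available is a name chosen for this example.

```r
measurements <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# package::function calls a function from a package without attaching it
result <- stats::sd(measurements)
print(result)

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
dplyr_available <- requireNamespace("dplyr", quietly = TRUE)

if (dplyr_available) {
  print("dplyr is installed and can be loaded with library(dplyr)")
} else {
  print("dplyr is not installed; install.packages(\"dplyr\") would install it")
}
```

The package::function form makes it explicit where a function comes from, which can help when two packages define functions with the same name.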

Great resources to learn more

Exercises

Exercise: create a function that computes the standard deviation

  • Arithmetic mean:

    \[\bar{x} = \frac{1}{N} \sum_{i=1}^N x_i\]

  • Standard deviation (this form divides by N; R's built-in sd() divides by N - 1, so its result differs slightly):

    \[\sqrt{ \frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2 }\]

  • Take this as a starting point:

    arithmetic_mean <- function(sequence) {
      mean(sequence)
    }
    
    standard_deviation <- function(sequence) {
      # your code here
    }
    

Exercise: working with a named list

  • Add more names and grades:

    grades <- list(Alice = 80, Bob = 95)
    
  • Print, modify, and explore the list.