R.1: Introduction to R and RStudio

Laurent Modolo laurent.modolo@ens-lyon.fr; Hélène Polvèche hpolveche@istem.fr

2022

https://can.gitbiopages.ens-lyon.fr/R_basis/

1 Introduction

The goal of this practical is to familiarize yourself with R and the RStudio environment.

The objectives of this session will be to:

  • Understand the purpose of each pane in RStudio
  • Do basic computation with R
  • Define variables and assign data to variables
  • Manage a workspace in R
  • Call functions
  • Manage packages
  • Be ready to draw graphics!

1.2 Some R background

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

  • Created by Ross Ihaka and Robert Gentleman
  • Initial version released in 1995
  • Free and open-source implementation of the S programming language
  • Currently developed by the R Development Core Team

Reasons to use it:

  • It’s open source, which means that we have access to every bit of underlying computer code to prove that our results are correct (which is always a good point in science).

  • It’s free, well documented, and runs almost everywhere

  • It has a large (and growing) user base among scientists

  • It has a large library of external packages available for performing diverse tasks.

  • 19910 available packages on https://cran.r-project.org/

  • 2230 available packages on http://www.bioconductor.org

  • >500k available repositories using R on https://github.com/

1.3 How do I use R ?

Unlike other statistical software programs like Excel, SPSS, or Minitab that provide point-and-click interfaces, R is an interpreted language.

This means that you have to write instructions for R, and so you are going to learn to write code (programs) in R.

R is usually used in a terminal in which you can type or paste your R code:

But navigating between your terminal, your code and your plots can be tedious. This is why, in 2023, there is a better way to use R!

1.4 RStudio, the R Integrated development environment (IDE)

An IDE is an application that provides comprehensive facilities to computer programmers for software development. RStudio is free and open-source.

To open RStudio, you can install the RStudio application and open the app.

Otherwise you can use the link and the login details provided to you by email. The web version of RStudio is the same as the application, except that you can open it in any recent browser.

1.4.1 RStudio interface

1.4.2 An R console

The same console as before (in the red box)

1.4.3 A code editor

We are now going to write our first commands. We could do it directly in the R console, with multi-line commands, but this process is tedious.

Instead we are going to use the RStudio code editor panel to write our code. You can go to File > New File > R script to open your editor panel.

Besides, this way you can keep a history of your code.

1.5 How to execute R code in RStudio ?

RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can

  • click on the Run button above the editor panel, or
  • select Run Selected Lines from the Code menu, or
  • hit Ctrl+Return in Windows or Linux or Cmd+Return on OS X.

To run a block of code, select it and then Run.

If you have modified a line of code within a block of code you have just run, there is no need to reselect the section and Run. You can use the next button along, Rerun the previous region. This will run the previous code block including the modifications you have made.

2 R as a Calculator

Now that we know what we should do and what to expect, we are going to try some basic R instructions. A computer can perform all the operations that a calculator can do, so let’s start with that:

  • Add: +
  • Divide: /
  • Multiply: *
  • Subtract: -
  • Exponents: ^ or **
  • Parentheses: (, )

Now Open RStudio.

You can copy and paste, but I advise you to practice writing directly in the console. As with any language, you will become more familiar with R by using it.

To validate the line at the end of your command, press Return.

2.1 First commands

You should see a > character before a blinking cursor. The > is called a prompt. The prompt is shown when you can enter a new line of R code.

1 + 100

For classical output, R prefixes each line of results with [N], where N is the position of the first value displayed on that line. Here the result has a single value, so the line starts with [1].

[1] 101

Do the same thing, but press Return after typing +.

1 +

The console displays +.
The > can become a + in the case of multi-line code. As there are two sides to the + operator, R knows that you still need to enter the right side of your formula. It is waiting for the rest of the command. Write just 100 and press Return:

100
[1] 101

2.2 Errors, warnings, and messages

The R console is a textual interface, which means that you will enter code, but it also means that R is going to write information back to you and that you will have to pay attention to what is written.

There are 3 categories of messages that R can send you: Errors prefaced with Error in…, Warnings prefaced with Warning: and Messages which don’t start with either Error or Warning.

  • Errors: consider them as a red light. You must figure out what is causing them. Usually you can find useful clues in the error message about how to solve the problem.
  • Warnings: these are a yellow light. The code is running but you have to pay attention. It’s almost always a good idea to try to fix warnings.
  • Messages: these are just friendly messages from R telling you how things are running.
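The three categories can be triggered on purpose to see what each one looks like. The sketch below is only an illustration using base R functions; the names w, e and cond are ours, and tryCatch() is used here simply to capture the text of the warning and the error so that the script does not stop.

```r
# A message: purely informative, the code keeps running
message("Loading data...")

# A warning: log() of a negative number runs but R flags the result.
# tryCatch() returns the warning text instead of the value.
w <- tryCatch(log(-1), warning = function(cond) conditionMessage(cond))
w

# An error: log() of a character string cannot be computed.
# tryCatch() returns the error text instead of stopping.
e <- tryCatch(log("a"), error = function(cond) conditionMessage(cond))
e
```

Typing log(-1) or log("a") directly in the console shows the same texts, prefaced with Warning or Error.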

2.3 R keeps to the mathematical order

The order of operation is the natural mathematical order in R:

3 + 5 * 2
[1] 13

You can use parentheses ( ) to change this order.

(3 + 5) * 2
[1] 16

But too many parentheses can be hard to read:

(3 + (5 * (2 ^ 2))) # hard to read
[1] 23
3 + 5 * (2 ^ 2)     # if you forget some rules, this might help
[1] 23

Note: The text following a # is a comment. It will not be interpreted by R. In the future, I advise you to use comments a lot to explain in your own words what the command means.
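A few cases of the order of operations are easy to get wrong. The lines below are a small illustrative set (our own choice of examples) that you can type in the console to check your intuition.

```r
-2^2        # the exponent is applied first, then the minus sign: -4
(-2)^2      # parentheses force the sign to be applied first: 4
2^3^2       # exponents are grouped from right to left, 2^(3^2): 512
10 / 2 * 5  # same priority, evaluated from left to right: 25
```

When in doubt, adding one pair of parentheses costs nothing and makes the intent explicit.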

2.4 Scientific notation

For small or large numbers, R will automatically switch to scientific notation.

2/10000
[1] 2e-04

2e-4 is shorthand for 2 * 10^(-4). You can use e to write your own scientific notation.

5e3
[1] 5000
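Scientific notation only changes how a number is displayed, not its value. As a small sketch, the base R function format() can be used to ask for one display or the other (it returns text, hence the quotes in the results).

```r
1e3 == 1000                            # TRUE: two notations, same number
format(2 / 10000, scientific = FALSE)  # "0.0002"
format(5000, scientific = TRUE)        # "5e+03"
```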

2.5 Mathematical functions

R is distributed with a large number of existing functions. To call a mathematical function, you must use the syntax function_name(<number>).

For example, for logarithms and the exponential:

log(2)  # natural logarithm
[1] 0.6931472
log10(10) # base-10 logarithm
[1] 1
exp(0.5)
[1] 1.648721

Compute the factorial of 9 (9!)

9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
[1] 362880

or

factorial(9)
[1] 362880
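Many other mathematical functions follow the same function_name(<number>) pattern. The selection below is ours and only meant as a quick tour of a few base R functions.

```r
sqrt(16)           # square root: 4
abs(-3.5)          # absolute value: 3.5
round(3.14159, 2)  # rounding to 2 decimal places: 3.14
log2(8)            # base-2 logarithm: 3
exp(log(10))       # exp() undoes log(): 10, up to rounding error
```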

2.6 Comparing things

We have seen through some examples that R can do everything a calculator can do. But when we speak of a programming language, we think of writing computer programs. Programs are collections of instructions that perform specific tasks. If we want our future programs to be able to make automatic choices, we need them to be able to perform comparisons.

Comparisons can be made with R. The result is a TRUE or FALSE value (which is not a number as before, but a boolean, called logical in R).

Try the following operators to get a TRUE, then change your command to get a FALSE.

You can use the up arrow key to edit the last command and go through your history of commands.

  • equality (note two equal signs read as “is equal to”)
1 == 1
[1] TRUE
  • inequality (read as “is not equal to”)
1 != 2
[1] TRUE
  • less than
1 < 2
[1] TRUE
  • less than or equal to
1 <= 1
[1] TRUE
  • greater than
1 > 0
[1] TRUE
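The list above can be completed with "greater than or equal to". The sketch below also shows a classic trap when comparing decimal numbers with ==; the use of all.equal() is our suggestion for that case, not something imposed by this practical.

```r
2 >= 1                             # greater than or equal to: TRUE

# Decimal numbers are stored with a tiny rounding error,
# so an exact comparison can be surprising:
0.1 + 0.2 == 0.3                   # FALSE

# all.equal() compares with a small tolerance instead:
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
```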

Summary so far

  • R is a programming language and free software environment for statistical computing and graphics (free & opensource) with a large library of external packages available for performing diverse tasks.
  • RStudio is an IDE application that provides comprehensive facilities to computer programmers for software development.
  • R can be used as a calculator
  • R can perform comparisons

3 Variables and assignment

In addition to being able to perform a huge number of computations very fast, computers can also store information in memory. This is mandatory to load your data and store intermediate states in your analysis.

In R, <- is the assignment operator (read as: the left member takes the value of the right member).

= also exists but is not recommended for assignment! It is used preferentially in other cases (we will see them later). If you really don’t want to press two consecutive keys for assignment, you can press alt + - to write <-. RStudio provides lots of such shortcuts (you can display them by pressing alt + shift + k).

We assign a value to x; x is called a variable.

x <- 1 / 40

We can then ask R to display the value of x.

x
[1] 0.025

3.1 The environment

You now see the x value in the environment box (in red).

This variable is present in your work environment. You can use it to perform different mathematical operations.

log(x)
[1] -3.688879

You can assign another value to x.

x <- 100
log(x)
[1] 4.60517
x <- x + 1  # x becomes 101 (100 + 1)
y <- x * 2
y
[1] 202

A variable can be assigned a numeric value as well as a character value.

Just put your character (or string) between double quotes " when you assign this value.

z <- "x"  # One character
z
[1] "x"
a <- "Hello world"  # Multiple characters == String
a
[1] "Hello world"

You cannot mix different types of variables in the same operation:

x + z
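To see what R answers when a number and a character are added, the sketch below redefines x and z so that it can be run on its own, and uses tryCatch() (our addition) to capture the error text instead of stopping.

```r
x <- 101
z <- "x"

# x + z fails; we keep the text of the error in result
result <- tryCatch(x + z, error = function(cond) conditionMessage(cond))
result  # "non-numeric argument to binary operator" (in an English session)
```

Typing x + z directly in the console shows the same text, prefaced with Error in x + z.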

How do you test the type of a variable?

is.character(z)
[1] TRUE
b <- 1 / 40
b
[1] 0.025
typeof(b)
[1] "double"
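Besides testing a type, you can convert a value from one type to another. The example below is a small sketch with variable names of our own (n and n_num), using the base R as.numeric() function.

```r
n <- "42"               # a character value, because of the quotes
is.character(n)         # TRUE
is.numeric(n)           # FALSE

n_num <- as.numeric(n)  # explicit conversion to a number
n_num + 1               # 43
typeof(n_num)           # "double"
typeof(TRUE)            # "logical"
```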

You can type is. and press the Tab key. RStudio will show you a list of functions whose names start with is.. This is called autocompletion; don’t hesitate to use your Tab key freely as you write R code.

3.2 Variable names

Variable names can contain letters, numbers, underscores and periods.

They cannot start with a number or an underscore, nor contain spaces.

Different people use different conventions for long variable names. These include:

periods.between.words
underscores_between_words
camelCaseToSeparateWords

What you use is up to you, but be consistent.

Which of the following are valid R variable names?
min_height
max.height
_age
.mass
MaxLength
min-length
2widths
celsius2kelvin
Solution

min_height
max.height
.mass
MaxLength
celsius2kelvin
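You can also let R check the answer. The sketch below is one possible approach, not the only one: the base R function make.names() turns any text into a valid name, so a candidate that comes back unchanged was already valid. It relies on vectors, which are presented in section 4; the names candidates and is_valid are ours.

```r
candidates <- c("min_height", "max.height", "_age", ".mass",
                "MaxLength", "min-length", "2widths", "celsius2kelvin")

# A name is valid if make.names() has nothing to fix
is_valid <- candidates == make.names(candidates)

candidates[is_valid]
```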

3.3 Functions are also variables

logarithm <- log

Try to use the logarithm variable.

An R function can have different arguments

function (x, base = exp(1))
  • base is a named argument with a default value; unnamed arguments are read from left to right
  • named arguments break the reading order
  • named arguments make your code more readable
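The points above can be checked directly with log. The calls below are a short illustration; the numbers are arbitrary.

```r
log(100, 10)             # read from left to right: x = 100, base = 10, gives 2
log(100, base = 10)      # naming base makes the intent clear, gives 2
log(base = 10, x = 100)  # named arguments can come in any order, gives 2
log(10, 100)             # unnamed and swapped: x = 10, base = 100, gives 0.5
```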

To know more about the log function we can read its manual.

help(log)

or

?log

This pane lets you view different outputs (help pages, graphs, etc.).

Test that your logarithm function can work in base 10

Solution

10^logarithm(12, base = 10)

3.4 Writing functions

We can define our own function with:

  • a function name,
  • the declaration of the function type: function,
  • arguments: between ( ),
  • { and } to open and close the function body,

Here is an example of a function declaration with two arguments a and b. Any argument name can be used.

function_name <- function(a, b){


}
  • a series of operations,

The arguments a and b are accessible from within the function body as the variables a and b. In the function body, arguments are independent of the global environment.

function_name <- function(a, b){
  result_1 <- operation1(a, b)
  result_2 <- operation2(result_1, b)
  
}
  • return operation

At the end of a function we want to return a result, so function calls will be equal to this result.

function_name <- function(a, b){
  result_1 <- operation1(a, b)
  result_2 <- operation2(result_1, b)
  return(result_2)
}

Note: if you don’t use return, by default the value of the last expression evaluated in your function body is returned.

Note: The function variables (here a and b) are independent of the global environment: they define to which values the operation will be applied in the function body.
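Both notes can be seen at work in a few lines. The function add_one below is a hypothetical example of ours: it has no return, and it modifies its argument a without touching the variable a of the global environment.

```r
a <- 1  # a variable in the global environment

add_one <- function(a) {
  a <- a + 1  # only the copy of a inside the function is modified
  a           # last expression evaluated: this is the value returned
}

add_one(10)  # 11
a            # still 1: the global a was not modified
```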

  • The order of arguments is important

Predict the result of R1, R2 and R3.

minus <- function(a, b){
  result_1 <- a - b 
  return(result_1)
}

#R1:
R1 <- minus(4,2)

#R2
R2 <- minus(2,4)

#R3
a <- 2
b <- 10
R3  <-  minus(b,a)
Solution 1

minus <- function(a, b){
  result_1 <- a - b 
  return(result_1)
}
minus(4,2)
[1] 2

Solution 2

minus(2,4)
[1] -2

Solution 3

a <- 2
b <- 10
minus(b,a)
[1] 8

  • Naming arguments is more explicit and bypasses the order.

Predict the result of R1, R2, R3 and R4.

a <- 10
b <- 2

minus <- function(a, b){
  result_1 <- a - b 
  return(result_1)
}

#R1:
R1 <- minus(a=6,b=3)

#R2
R2 <- minus(b=3,a=6)

#R3
R3 <- a

#R4
R4  <- minus(b=b,a=a)
Solution 1

a <- 10
b <- 2
minus <- function(a, b){
  result_1 <- a - b 
  return(result_1)
}
R1 <- minus(a=6,b=3)
R1
[1] 3

Solution 2

R2 <- minus(b=3,a=6)
R2
[1] 3

Solution 3

R3 <- a
R3
[1] 10

Solution 4

R4  <- minus(b=b,a=a)
R4
[1] 8

  • Default values for arguments may be set at definition. The default value is used when the argument is not provided.
minus_10 <- function(a, b=10){
  result_1 <- a - b 
  return(result_1)
}
minus_10(40)
[1] 30
minus_10(40,b=5)
[1] 35
minus_10(40,5)
[1] 35
  • Functions can be defined without arguments
print_hw <- function(){
  print("Hello world!")
  print("How R U?")
}

What is the difference between print_hw and print_hw() ?

Solution

print_hw is a variable in the environment, and R returns its definition. You need to add () to execute the function.

print_hw
function(){
  print("Hello world!")
  print("How R U?")
}
print_hw()
[1] "Hello world!"
[1] "How R U?"

3.5 Some exercises

  1. Write a function (rect_area) to calculate the area of a rectangle of length “L” and width “W”.

  2. (more difficult) Write a function (even_test) to test if a number is even. For that, you can use the %% modulo operator to get the remainder of a Euclidean division and use the == comparison to test if the result of the modulo is equal to 0.

13 %% 2
[1] 1
  3. Using your even_test function, write a new function even_print which will print “This number is even” or “This number is odd”. You will need the if else statement and the function print. Find help on how to use them.
Solution 1

rect_area <- function(L,W){
  area <- L * W
  return(area)
}
rect_area(4,3)
[1] 12

Solution 2

even_test <- function(x){
  modulo_result <- x %% 2
  is_even <- modulo_result == 0
  return(is_even)
}
even_test(4)
[1] TRUE
even_test(3)
[1] FALSE

Note : A function can be written in several forms.

even_test2 <- function(x){
  (x %% 2) == 0
}
even_test2(4)
[1] TRUE
even_test2(3)
[1] FALSE

Solution 3

even_print <- function(x){
  if(even_test(x) == TRUE) {
    print("This number is even")
  } else {
    print("This number is odd")
  }
}
even_print(4)
[1] "This number is even"
even_print(3)
[1] "This number is odd"

Note : There is no need to test whether a boolean variable (TRUE/FALSE) is TRUE or FALSE.

even_print <- function(x){
  if(even_test(x)) {
    print("This number is even")
  } else {
    print("This number is odd")
  }
}
even_print(4)
[1] "This number is even"
even_print(3)
[1] "This number is odd"

3.6 Cleaning up

We can now clean our environment

rm(minus)

What happened in the Environment panel? Check the documentation of this command.

Solution

?rm

ls()
 [1] "a"                     "b"                     "bioconductor_packages"
 [4] "biocPackages"          "cran_packages"         "even_print"           
 [7] "even_test"             "even_test2"            "logarithm"            
[10] "minus_10"              "print_hw"              "R1"                   
[13] "R2"                    "R3"                    "R4"                   
[16] "rect_area"             "url"                   "x"                    
[19] "y"                     "z"                    

Combine rm and ls to clean up your environment

Solution

rm(list = ls())

ls()
character(0)
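rm(list = ls()) removes everything. If you only want to remove some variables, one possible approach (a sketch, with hypothetical variable names) is to give ls() a pattern so that only the matching names are passed to rm.

```r
tmp_a <- 1
tmp_b <- 2
keep_me <- 3

# Remove only the variables whose name starts with "tmp_"
rm(list = ls(pattern = "^tmp_"))

exists("tmp_a")    # FALSE
exists("tmp_b")    # FALSE
exists("keep_me")  # TRUE
```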

Summary so far:

  • Assigning a variable is done with <-.
  • The assigned variables are listed in the environment box.
  • Variable names can contain letters, numbers, underscores and periods.
  • Functions are also variables and can be written in several forms.
  • A code editor panel is available in RStudio.

4 Complex variable types

You can only go so far with the variables we have already seen. In R there are also complex variable types, which can be seen as combinations of simple variable types.

4.1 Vector (aka list)

Vectors are ordered collections of values of the same type (R also has a separate list type, which can mix types).

c(1, 2, 3, 4, 5)
[1] 1 2 3 4 5

or

c(1:5)
[1] 1 2 3 4 5

A mathematical calculation can be performed on the elements of the vector:

2^c(1:5)
[1]  2  4  8 16 32
x <- c(1:5)
2^x
[1]  2  4  8 16 32

Note: this kind of operation is called vectorisation and is very powerful in R.

To determine the type of the elements of a vector:

typeof(x)
[1] "integer"
typeof(x + 0.5)
[1] "double"
x + 0.5
[1] 1.5 2.5 3.5 4.5 5.5
is.vector(x)
[1] TRUE
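Because all the elements of a vector share the same type, R silently converts the values when you mix types inside c(). The lines below are a short illustration of this behaviour; 1L is the R notation for the integer 1.

```r
typeof(c(1L, 2L))   # "integer"
typeof(c(1L, 2.5))  # "double": the integer is converted to a double
typeof(c(TRUE, 2))  # "double": TRUE is converted to 1
typeof(c(1, "a"))   # "character": the number is converted to text
c(1, "a", TRUE)     # "1" "a" "TRUE"
```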

Vectors can be extended to named vectors:

y <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
y
a b c d e 
1 2 3 4 5 

We can compare the elements of two vectors:

x
[1] 1 2 3 4 5
y
a b c d e 
1 2 3 4 5 
x == y
   a    b    c    d    e 
TRUE TRUE TRUE TRUE TRUE 
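A comparison between two vectors returns one TRUE or FALSE per element. To summarise such a result, base R provides all(), any(), which() and sum(). The vectors u and v below are our own example values.

```r
u <- c(1, 2, 3, 4, 5)
v <- c(1, 0, 3, 0, 5)

u == v         # TRUE FALSE TRUE FALSE TRUE
all(u == v)    # FALSE: not every element is equal
any(u == v)    # TRUE: at least one element is equal
which(u == v)  # 1 3 5: positions where the elements are equal
sum(u == v)    # 3: number of equal elements (TRUE counts as 1)
```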

4.2 Accessing values

There are multiple ways to access or replace values in vectors or other data structures. The most common approach is to use “indexing”. Below, note that brackets [ ] are used for indexing, whereas you have already seen that parentheses ( ) are used to call a function and braces { } to define a function body. It is very important not to mix these up.

Here are some examples that show how elements of vectors can be obtained by indexing.

You can use the position(s) of the value(s) in the vector

x <- c(1,5,7,8)
x[4]
[1] 8
x[c(1,3,4)]
[1] 1 7 8

You can use booleans to define which values should be kept.

x <- c(1,5,7,8,15)
x[c(TRUE,FALSE,TRUE,FALSE,TRUE)]
[1]  1  7 15
x[c(FALSE,TRUE)] # Boolean vector is recycled if it is shorter than the vector to index
[1] 5 8
y <- c(TRUE,FALSE,FALSE,FALSE,TRUE)
x[y]
[1]  1 15

You can use names in the case of a named vector.

x <-c(a = 1, b = 2, c = 3, d = 4, e = 5)
x[c("a","c")]
a c 
1 3 
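The names of a vector can be read or modified with the base R function names(). The short example below uses a vector of our own.

```r
x <- c(a = 1, b = 2, c = 3)

names(x)            # "a" "b" "c"
names(x)[2] <- "B"  # rename the second element
x["B"]              # the element now called B, whose value is 2
unname(x)           # the values without their names: 1 2 3
```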

You can also use an index to change values

x <- c(1,5,7,8,15)
x[1] <- 3
x
[1]  3  5  7  8 15
x[x>5] <- 13
x
[1]  3  5 13 13 13
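Two more indexing tools are worth knowing; they are shown here as a complement, with a fresh copy of x. length() gives the number of elements, and a negative position means "everything except this position".

```r
x <- c(1, 5, 7, 8, 15)

length(x)     # 5
x[length(x)]  # last element: 15
x[-1]         # everything except the first element: 5 7 8 15
x[-c(1, 2)]   # everything except the first two elements: 7 8 15
x             # x itself is unchanged: 1 5 7 8 15
```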

Summary so far

  • A variable can be of different types: numeric, character, vector, function, etc.
  • Calculations and comparisons apply to vectors.
  • Do not hesitate to use the help box to understand functions!

We will see other complex variable types during this course.

5 Packages

Base R is like a new smartphone: you can do lots of things with it, but you can also install new apps to do a huge range of other things. In R those apps are called packages.

There are different sources to get packages from:

  • CRAN, which is the default source
  • Bioconductor, which is another source specialized in biology packages
  • Directly from GitHub

To install packages from Bioconductor and GitHub you will need to install specific packages from CRAN.

5.1 Installing packages

5.1.1 From CRAN

To install packages, you can use the install.packages function (don’t forget to use the Tab key for long function names).

install.packages("tidyverse")

or you can click on Tools and Install Packages...

Also install the ggplot2 package.

Solution

install.packages("ggplot2")

5.1.2 From Bioconductor

To install packages from Bioconductor you must first install a package called “BiocManager”. This package provides a function called “install”, which allows you to install packages hosted on Bioconductor by name.

To install “BiocManager” you must type:

install.packages("BiocManager")

Then to install, for example “tximport”, you just have to write:

BiocManager::install("tximport")
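Installing a package takes time, so it can be useful to check whether it is already installed first. The sketch below shows one way to do it with the base R function requireNamespace(); the helper is_installed and the package name "aPackageThatDoesNotExist" are hypothetical, and the installation itself is left as a comment so that running the example installs nothing.

```r
# TRUE if the package can be loaded, FALSE otherwise
# (note: this loads the package namespace as a side effect)
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")                     # TRUE: stats is distributed with R
is_installed("aPackageThatDoesNotExist")  # FALSE

# Typical use:
# if (!is_installed("BiocManager")) {
#   install.packages("BiocManager")
# }
```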

5.1.3 From GitHub

If you need to install a package that is not available on CRAN but is in a GitHub repository, you can do so using the “remotes” package. This package provides functions that allow you to install a package available on GitHub, Bitbucket or GitLab directly on your computer.

To use the “remotes” package, you must first install it:

install.packages("remotes")

Once “remotes” is installed, you will be able to install any R package from GitHub or from its URL.

For example, if you want to install the latest version of “gganimate”, which allows you to animate ggplot2 graphs, you can use:

remotes::install_github("thomasp85/gganimate")

By default the latest version of the package is installed. If you want a given version, you can specify it:

remotes::install_github("thomasp85/gganimate@v1.0.7")

You can find more information in the documentation of remotes: https://remotes.r-lib.org

5.2 Loading packages

Once a package is installed, you need to load it in your R session to be able to use it. The command sessionInfo displays your session information.

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rvest_1.0.3       klippy_0.0.0.9500 fontawesome_0.3.0

loaded via a namespace (and not attached):
 [1] xml2_1.3.3       knitr_1.40       magrittr_2.0.3   R6_2.5.1        
 [5] rlang_1.0.5      fastmap_1.1.0    fansi_1.0.3      stringr_1.4.1   
 [9] httr_1.4.4       tools_4.2.1      xfun_0.32        utf8_1.2.2      
[13] cli_3.3.0        jquerylib_0.1.4  htmltools_0.5.3  yaml_2.3.5      
[17] digest_0.6.29    assertthat_0.2.1 tibble_3.1.8     lifecycle_1.0.1 
[21] bookdown_0.28    vctrs_0.4.1      sass_0.4.2       curl_4.3.2      
[25] glue_1.6.2       cachem_1.0.6     evaluate_0.16    rmarkdown_2.16  
[29] stringi_1.7.8    pillar_1.8.1     compiler_4.2.1   bslib_0.4.0     
[33] rmdformats_1.0.4 jsonlite_1.8.0   pkgconfig_2.0.3 

Use the command library to load the ggplot2 package and check your session.

Solution

library("ggplot2")
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.6     rvest_1.0.3       klippy_0.0.0.9500 fontawesome_0.3.0

loaded via a namespace (and not attached):
 [1] bslib_0.4.0      compiler_4.2.1   pillar_1.8.1     jquerylib_0.1.4 
 [5] rmdformats_1.0.4 tools_4.2.1      digest_0.6.29    jsonlite_1.8.0  
 [9] evaluate_0.16    lifecycle_1.0.1  tibble_3.1.8     gtable_0.3.0    
[13] pkgconfig_2.0.3  rlang_1.0.5      DBI_1.1.3        cli_3.3.0       
[17] curl_4.3.2       yaml_2.3.5       xfun_0.32        fastmap_1.1.0   
[21] withr_2.5.0      dplyr_1.0.9      httr_1.4.4       stringr_1.4.1   
[25] knitr_1.40       xml2_1.3.3       generics_0.1.3   sass_0.4.2      
[29] vctrs_0.4.1      tidyselect_1.1.2 grid_4.2.1       glue_1.6.2      
[33] R6_2.5.1         fansi_1.0.3      rmarkdown_2.16   bookdown_0.28   
[37] purrr_0.3.4      magrittr_2.0.3   scales_1.2.1     htmltools_0.5.3 
[41] assertthat_0.2.1 colorspace_2.0-3 utf8_1.2.2       stringi_1.7.8   
[45] munsell_0.5.0    cachem_1.0.6    
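Loading a package with library attaches it to your session. As an alternative for a one-off call, the package::function notation already used above for BiocManager::install works without attaching the package. The sketch below uses only packages distributed with R, so that it can be run without installing anything; search() lists what is currently attached.

```r
# base is always attached
"package:base" %in% search()   # TRUE

# stats is attached in a default R session
"package:stats" %in% search()

# tools is distributed with R but is usually not attached;
# :: lets us call one of its functions anyway
tools::file_ext("script.R")    # "R"
```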

5.3 Unloading packages

Sometimes, you may want to unload a package from your session instead of relaunching R.

unloadNamespace("ggplot2")
sessionInfo()

5.4 Help on packages (when existing)

browseVignettes("ggplot2")