4 Vectors

R operates on named data structures. These include vectors, factors, matrices, arrays, lists and data frames. For now, let’s think about data structures as containers that store different types of data.

This chapter focuses on vectors. Vectors are not just another data structure. They are a central component of R.

4.1 Generating sequences

The simplest structure in R is the numeric vector. It consists of an ordered collection of numbers.

A vector can also contain strings, or logical values, but not a mixture.

Let’s start with numeric vectors and how to generate them in R.

`:` operator

: generates a sequence from a number to another number in steps of 1 or -1.

: is an operator. Operators are used to perform operations on variables and values.

Below let’s see a few quick examples of sequence generation.

Sequence 1 to 4 (step 1):

1:4

## [1] 1 2 3 4

Sequence 1 to -4 (step -1):

1:-4

## [1]  1  0 -1 -2 -3 -4

Sequence 8.5 down to 4.5 (stops before going below 4):

8.5:4

## [1] 8.5 7.5 6.5 5.5 4.5

Sequence 4 up to 8 (stops before exceeding 8.5):

4:8.5

## [1] 4 5 6 7 8

`c()`

c() combines values into a vector.

c(1, 2, 3, 4, 5)

## [1] 1 2 3 4 5

The example below combines a value with a sequence.

c(18, 9:5)

## [1] 18  9  8  7  6  5

If the arguments to c() are themselves vectors, c() flattens the vectors and combines them into one single vector.

c0 <- c() # Creates empty vector
c1 <- 1:3
c2 <- c(4, 5, 6)
c3 <- c(c0, c1, c2) # Flattens and combines multiple vectors
c3

## [1] 1 2 3 4 5 6

`seq(from, to)`

seq(from, to) is a generic function to generate regular sequences. It has five arguments, but not all of them will be specified in one call.

The two arguments from and to specify the beginning and end of the sequence.

seq(from = 0, to = 20)

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

If these are the only two arguments given, then the result is the same as the colon operator :.

0:20

##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

The colon operator : only produces sequences in steps of 1 or -1. The seq() function accepts an optional argument, by, that specifies the increment of the sequence.

seq(from = 0, to = 20, by = 2)

##  [1]  0  2  4  6  8 10 12 14 16 18 20

length.out is another optional argument, which specifies a length for the output sequence, and then R will calculate the necessary increment. The increment need not be an integer; R can create sequences with fractional increments.

The example below calculates increment to fit exactly 9 elements.

seq(from = 0, to = 20, length.out = 9)

## [1]  0.0  2.5  5.0  7.5 10.0 12.5 15.0 17.5 20.0
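As a small illustration of fractional increments, the sketch below passes a non-integer by and then requests the same sequence through length.out. The variable names a and b are only for illustration.

```r
# Fractional step given directly
a <- seq(from = 0, to = 1, by = 0.25)
a

# Same sequence, letting R work out the step from the desired length
b <- seq(from = 0, to = 1, length.out = 5)
b

# Compare with a tolerance rather than ==, since these are floating-point values
isTRUE(all.equal(a, b))
```

Both calls should give 0, 0.25, 0.5, 0.75 and 1.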

There are three other specialist sequence functions that are faster and easier to use, which cover specific use cases.

`seq.int()`

seq.int() lets us create a sequence from one number to another.

seq.int(from = 1, to = 10)

##  [1]  1  2  3  4  5  6  7  8  9 10

seq.int(from = 1, to = 10, by = 2)

## [1] 1 3 5 7 9

seq.int(from = 1, by = 2, length.out = 10)

##  [1]  1  3  5  7  9 11 13 15 17 19

Note: Arguments to seq(), and to other R functions, can be given in the named form, where the order of arguments in which they appear is irrelevant. For instance, seq.int(from = 1, to = 10) is the same as seq.int(to = 10, from = 1).

However, unnamed arguments are matched by position, and how they are interpreted can depend on how many of them are supplied. Therefore, it is recommended to always name the arguments.
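A brief sketch of argument matching, using only seq(): named arguments can appear in any order, while unnamed arguments are matched by position. The names by_name, by_position and single are hypothetical and used only for this example.

```r
# Named arguments: order does not matter
by_name <- identical(seq(from = 1, to = 10), seq(to = 10, from = 1))
by_name        # TRUE

# Unnamed arguments are matched by position: from, to, by
by_position <- seq(2, 10, 4)
by_position    # 2 6 10

# A single unnamed argument is treated differently: seq(5) counts from 1 to 5
single <- seq(5)
single         # 1 2 3 4 5
```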

`seq_len()`

seq_len() creates a sequence from 1 up to its input.

seq_len(5)

## [1] 1 2 3 4 5

seq_len(0) # Returns empty sequence

## integer(0)

seq_len(5) returns the same result as 1:5.

`seq_along()`

seq_along() creates a sequence from 1 up to the length of its input.

x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
seq_along(x)

## [1] 1 2 3 4 5

The code below generates indices based on length of x.

for(i in seq_along(x)) print(x[i])

## [1] 10.4
## [1] 5.6
## [1] 3.1
## [1] 6.4
## [1] 21.7

This returns the same result as below.

for(i in 1:length(x)) print(x[i])

## [1] 10.4
## [1] 5.6
## [1] 3.1
## [1] 6.4
## [1] 21.7

Note: There are times when iterating over 1:length(x) goes wrong. That’s when x is empty and length(x) is 0: 1:0 counts down and yields c(1, 0) instead of an empty sequence.

x <- vector("numeric")
1:length(x)

## [1] 1 0

Therefore, it is recommended that we use seq_along(x) whenever we can. It always returns a value the same length as x.

x <- vector("numeric")
seq_along(x)

## integer(0)
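To make the difference concrete, the sketch below counts how many times a loop body runs for an empty vector. The counters runs_colon and runs_along are hypothetical names introduced only for this example.

```r
x <- numeric(0)   # an empty numeric vector

runs_colon <- 0
for (i in 1:length(x)) runs_colon <- runs_colon + 1    # iterates over c(1, 0)

runs_along <- 0
for (i in seq_along(x)) runs_along <- runs_along + 1   # iterates over nothing

runs_colon   # 2
runs_along   # 0
```

With 1:length(x) the body runs twice on a vector that has no elements, which is rarely what we want.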

`rep()`

rep() replicates the values in a vector.

Below let’s see a group of examples.

The first example repeats the whole vector 1:4 twice.

rep(1:4, times = 2)

## [1] 1 2 3 4 1 2 3 4

Here rep() repeats each individual element twice before moving to the next.

rep(1:4, each = 2)

## [1] 1 1 2 2 3 3 4 4

The next example repeats 1 once, 2 twice, 3 three times, and 4 four times.

rep(1:4, times = c(1, 2, 3, 4))

##  [1] 1 2 2 3 3 3 4 4 4 4

The example below first repeats each element twice, then repeats that entire sequence 3 times.

rep(1:4, each = 2, times = 3)

##  [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

The last example repeats the vector as needed but stops after producing exactly 10 values.

rep(1:4, length.out = 10)

##  [1] 1 2 3 4 1 2 3 4 1 2

4.2 Arithmetic operations

Vectors can be used in arithmetic expressions. The arithmetic operators include +, -, *, / and ^ (raising to a power).

In these cases, the operations are performed element by element on entire vectors.

In the two examples below, there are two vectors of equal length; R adds each pair of elements.

1:5 + 6:10

## [1]  7  9 11 13 15

c(1, 3, 6, 10, 15) + c(0, 1, 3, 6, 10)

## [1]  1  4  9 16 25

Specifically, for 1:5 + 6:10, here is how it works behind the scenes:

Operation	Result
1 + 6	7
2 + 7	9
3 + 8	11
4 + 9	13
5 + 10	15

Note: The colon operator : has high priority within an expression. This means R evaluates the : sequence before performing arithmetic like +, -, *, or /.

Compare the two examples below:

In the first example, R creates the sequence 1:5 first. Then, R subtracts 1 from each element.

1:5 - 1

## [1] 0 1 2 3 4

In the second example, parentheses override priority. 5 - 1 is computed first. Then, R performs 1:4.

1:(5 - 1)

## [1] 1 2 3 4

We will explain operator precedence in more detail below.

vector recycling

So far the vectors we’ve seen occurring in the same expression are of the same length. What happens if we try to do arithmetic on vectors of different lengths? R will recycle elements in the shorter vector to match the longer one.

1:5 + 1:15

##  [1]  2  4  6  8 10  7  9 11 13 15 12 14 16 18 20

In the case above, the vector 1:5 is repeated three times to match 1:15.

If the length of the longer vector isn’t a multiple of the length of the shorter one, R recycles but a warning will be given.

1:5 + 1:7

## Warning in 1:5 + 1:7: longer object length is not a multiple of shorter object length

## [1]  2  4  6  8 10  7  9

Here’s how everything works under the hood:

Longer Vector (1:7)	Shorter Vector Recycled (1:5)	Operation
1	1	1 + 1
2	2	2 + 2
3	3	3 + 3
4	4	4 + 4
5	5	5 + 5
6	1	6 + 1
7	2	7 + 2

The operations below are performed between every vector element and the scalar. The scalar is repeated. (In R we do not really have scalars; a “scalar” is simply a numeric vector with one element.)

c(2, 3, 5, 7, 11, 13) - 2

## [1]  0  1  3  5  9 11

1:10 / 3

##  [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333 2.6666667
##  [9] 3.0000000 3.3333333

(1:5)^2

## [1]  1  4  9 16 25

A peek at the behind‑the‑scenes mechanics for c(2, 3, 5, 7, 11, 13) - 2:

Original Vector Element	Recycled Value	Operation
2	2	2 - 2
3	2	3 - 2
5	2	5 - 2
7	2	7 - 2
11	2	11 - 2
13	2	13 - 2
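One way to convince ourselves of how recycling works is to write it out by hand with rep() and compare the results. This is only a sketch of the idea; the names short, long, implicit and explicit are hypothetical.

```r
short <- 1:5
long  <- 1:15

# What R does implicitly ...
implicit <- short + long

# ... written out by hand: stretch the shorter vector to the longer length first
explicit <- rep(short, length.out = length(long)) + long

identical(implicit, explicit)   # TRUE

# A single value is recycled in the same way
scalar_same <- identical(c(2, 3, 5) - 2, c(2, 3, 5) - rep(2, 3))
scalar_same                     # TRUE
```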

common arithmetic functions

Common arithmetic functions are available. For example, length(x) is the number of elements in x. sum(x) gives the total of the elements in x.

max() and min() select the largest and smallest elements of a vector respectively. range() is a function whose value is a vector of length two, namely c(min(x), max(x)).

Other useful functions include log(), exp(), sin(), cos(), tan(), sqrt() etc.
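A quick sketch of these functions in action, using a small made-up vector (primes is a hypothetical name chosen for this example):

```r
primes <- c(2, 3, 5, 7, 11, 13)

length(primes)   # 6
sum(primes)      # 41
min(primes)      # 2
max(primes)      # 13
range(primes)    # 2 13

# The mathematical functions work on whole vectors too
roots <- sqrt(c(1, 4, 9))
roots            # 1 2 3
```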

Below are functions to get descriptive statistics that help characterize the distribution of your data.

u <- c(0,1,1,2,3,5,8,13,21,34)
mean(u)

## [1] 8.8

median(u)

## [1] 4

sd(u) # Sample standard deviation

## [1] 11.03328

var(u) # Sample variance

## [1] 121.7333
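To see what "sample" means here, we can rebuild var() and sd() from their definitions: the sample variance divides the sum of squared deviations by n - 1 rather than n. The names var_by_hand and sd_by_hand are hypothetical.

```r
u <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
n <- length(u)

# Sample variance: squared deviations from the mean, divided by n - 1
var_by_hand <- sum((u - mean(u))^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
sd_by_hand <- sqrt(var_by_hand)

all.equal(var_by_hand, var(u))   # TRUE
all.equal(sd_by_hand, sd(u))     # TRUE
```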

4.3 R is vectorized

All the arithmetic operators in R are vectorized. This means that an operator or a function will act on each element of a vector without the need for us to explicitly write a loop. Vector operations are one of R’s great strengths.

Let’s see an example. We want to write an R program to multiply two numeric vectors of length 6, element by element. We can of course write a for loop …

vec1 <- 1:6
vec2 <- c(4, 5, 6, 7, 8, 9)

vec <- c() # Start with an empty result
for (i in seq_along(vec1)) {
  vec <- c(vec, vec1[i] * vec2[i]) # Append the product of the i-th pair
}

print(vec)

## [1]  4 10 18 28 40 54

Except that we don’t actually need a for loop! The built-in implicit looping over elements is much faster than explicitly writing our own loop. As we see below, the operator is applied to corresponding elements from both vectors. Specifically, 1 is multiplied by 4, 2 by 5, 3 by 6, etc.

vec1 <- 1:6
vec2 <- c(4, 5, 6, 7, 8, 9)
vec <- vec1 * vec2
vec

## [1]  4 10 18 28 40 54
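If you would like to check the speed claim on your own machine, a rough sketch with system.time() is shown below. The vector length and the names a, b, out_loop and out_vec are arbitrary choices for this example, and the timings will vary from machine to machine, so no expected numbers are given.

```r
n <- 1e5
a <- runif(n)   # random inputs; the values themselves do not matter
b <- runif(n)

# Explicit loop, with the result vector allocated up front
loop_time <- system.time({
  out_loop <- numeric(n)
  for (i in seq_len(n)) out_loop[i] <- a[i] * b[i]
})

# Vectorized version
vec_time <- system.time(out_vec <- a * b)

identical(out_loop, out_vec)   # both approaches compute the same numbers
loop_time["elapsed"]
vec_time["elapsed"]
```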

Vectorized operators also combine naturally with functions that summarize a vector. We can transform an entire dataset in a single expression. For example, recentering a vector simply subtracts the mean from each element; no loop required.

u <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
u - mean(u)

##  [1] -8.8 -7.8 -7.8 -6.8 -5.8 -3.8 -0.8  4.2 12.2 25.2

mean(u)

## [1] 8.8

Many statistical functions take whole vectors as arguments. For instance, cor() measures the correlation between two vectors by internally applying its formula across all paired elements, and returns a single number.

x <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
y <- log(x + 1)
cor(x, y)

## [1] 0.9068053
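As a sketch of what "applying its formula across all paired elements" involves, we can compute the Pearson correlation (the default method of cor()) by hand with vectorized arithmetic. The names dx, dy and r_by_hand are hypothetical.

```r
x <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
y <- log(x + 1)

dx <- x - mean(x)   # deviations from the mean, for every element at once
dy <- y - mean(y)

# Sum of paired products, divided by the root of the product of the sums of squares
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

all.equal(r_by_hand, cor(x, y))   # TRUE
```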

4.4 Logical vectors

R allows for manipulation of logical values. R has two logical values, TRUE and FALSE.⁵ These are often called Boolean values in other programming languages.

The logical operators are !, & and |.

Most logical vectors aren’t typed out by hand; they are generated by comparison operators. These operators compare two values and return TRUE or FALSE.

The comparison operators are <, <=, >, >=, == (exact equality) and != (inequality).

At its simplest, we compare a single value:

pi > 3

## [1] TRUE

We can also compare a vector against a single scalar, as in arithmetic operations. R will expand the scalar to the vector’s length and then perform the element-wise comparison.

c(3, 4 - 1, 1 + 1 + 1) == 3

## [1] TRUE TRUE TRUE

(1:5)^2 >= 16

## [1] FALSE FALSE FALSE  TRUE  TRUE

Again, as in arithmetic operations, we can compare entire vectors at once. R will perform an element-by-element comparison and return a vector of logical values, one for each comparison.

1:3 != 3:1

## [1]  TRUE FALSE  TRUE

v <- c(3, pi, 4)
w <- c(pi, pi, pi)
v <= w

## [1]  TRUE  TRUE FALSE

Once we have logical results, we often need to combine them. For example, “Is x > 5 AND x < 10?”

Let’s use our vectors v and w to create two logical conditions. We will store these conditions in variables c1 and c2 so we can combine them.

v <- c(3, pi, 4)
w <- c(pi, pi, pi)
c1 <- v <= w
c2 <- v >= w

c1 and c2 are logical vectors.

c1

## [1]  TRUE  TRUE FALSE

c2

## [1] FALSE  TRUE  TRUE

c1 & c2 is their intersection (“and”); it returns TRUE only if both sides are TRUE.

c1 & c2

## [1] FALSE  TRUE FALSE

c1 | c2 is their union (“or”); it returns TRUE if at least one side is TRUE.

c1 | c2

## [1] TRUE TRUE TRUE

!c1 is the negation of c1; it flips the value.

!c1

## [1] FALSE FALSE  TRUE

Note: In R, you will sometimes see TRUE and FALSE abbreviated as T and F. While this saves a few keystrokes, it’s a dangerous habit. Unlike TRUE and FALSE, which are reserved words, T and F are just variables. This means a user (or a messy script) could accidentally overwrite them: T <- FALSE. Therefore, we should always use TRUE and FALSE.
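The short sketch below shows the problem. It temporarily overwrites T, so it is best run in a scratch session. The name attempt is hypothetical.

```r
T <- FALSE    # allowed: T is an ordinary variable that can be reassigned
isTRUE(T)     # FALSE, so any code relying on T now misbehaves

rm(T)         # remove our copy; T refers to the built-in value again
isTRUE(T)     # TRUE

# TRUE itself is a reserved word, so trying to assign to it fails
attempt <- try(eval(parse(text = "TRUE <- FALSE")), silent = TRUE)
inherits(attempt, "try-error")   # TRUE
```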

logical vectors coerced into numeric vectors

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors. FALSE becomes 0 and TRUE becomes 1.

sum(c(TRUE, FALSE, TRUE))

## [1] 2

mean(c(TRUE, FALSE, TRUE))

## [1] 0.6666667

The above expressions can be useful if we want to find out whether any case meets a condition, or how many do. For instance, if at least one case evaluates to TRUE, then sum(c(TRUE, FALSE, TRUE)) is greater than or equal to 1.
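Here is a small sketch of that idea applied to a comparison. The vector u is the one used earlier in the chapter; big is a hypothetical name.

```r
u <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
big <- u > 10     # logical vector: TRUE where the condition holds

sum(big)          # how many elements exceed 10: 3
mean(big)         # what proportion exceed 10: 0.3
sum(big) >= 1     # does at least one element exceed 10? TRUE
```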

`any()` and `all()`

Two useful functions for dealing with logical vectors are any() and all(). Both test a logical vector.

any() returns TRUE if the given vector contains at least one TRUE value.

v <- c(3, pi, 4)
any(v == pi)

## [1] TRUE

all() returns TRUE if all of the values are TRUE in the given vector.

all(v == pi)

## [1] FALSE

4.5 Operators

So far we have discussed assignment, arithmetic, logical and comparison operations. We use operators to perform these operations on variables and values. More formally, an operator is a function that takes one or two arguments and can be written without parentheses.

To sum up, in R we have:

assignment operator: <-
arithmetic operators: +, -, *, /, ^ (raising to a power) and %% (modulus; remainder from division)
comparison operators: <, <=, >, >=, == (exact equality), and != (inequality)
logical operators: ! (not), & (and) and | (or)

R also has special operators like %in%, which returns a logical vector indicating, for each element of its left operand, whether it matches any element of its right operand.

c(1:10) %in% c(1:5)

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
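The modulus operator %% from the list above is vectorized like the other arithmetic operators. A short sketch (rem, cyc and even are hypothetical names):

```r
rem <- 17 %% 5
rem     # 2, the remainder of 17 divided by 5

cyc <- 1:10 %% 3
cyc     # 1 2 0 1 2 0 1 2 0 1

even <- 1:10 %% 2 == 0
even    # TRUE at the even numbers
```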

operator precedence

When we enter an expression in R, R evaluates some operations before others. We call this order of operations operator precedence. Operators with higher precedence are evaluated first, and operators with the lowest precedence are evaluated last. Operators of equal precedence are evaluated from left to right, except for exponentiation (^) and assignment (<-), which are evaluated from right to left.

We can use ?Syntax to find out the precedence of operators. The operator precedence, from high to low, is:

[, [[ indexing
^ exponentiation
-, + unary minus and plus
: sequence operator
%any% special operators (including %% and %in%)
*, / multiply, divide
+, - add, subtract
<, >, <=, >=, ==, != comparison
! negation
& and
| or
<- assignment
? help

What are the results of the following operations?

Compare (1:5)^2 and 1:5^2, for instance. (1:5)^2 squares each element of 1:5.

(1:5)^2

## [1]  1  4  9 16 25

In 1:5^2, ^ has higher precedence, so this is essentially 1:25.

1:5^2

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

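In -2:2 * -2:2, unary minus and : both have higher precedence than *, so each -2:2 sequence is built first and the two sequences are then multiplied element-wise.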

-2:2 * -2:2

## [1] 4 1 0 1 4

The example below checks membership, and then negates the result.

!c("a","b") %in% letters

## [1] FALSE FALSE
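When in doubt, we can add parentheses to spell out the grouping we believe R uses and check that the result is unchanged. The sketch below does this for the examples above, following the ordering documented in ?Syntax. The names p1 to p4 are hypothetical.

```r
# ^ binds tighter than :
p1 <- identical(1:5^2, 1:(5^2))

# Unary minus binds tighter than :, which binds tighter than *
p2 <- identical(-2:2 * -2:2, ((-2):2) * ((-2):2))

# %in% binds tighter than !
p3 <- identical(!c("a", "b") %in% letters, !(c("a", "b") %in% letters))

c(p1, p2, p3)   # TRUE TRUE TRUE

# ^ binds tighter than unary minus, so this is -(2^2)
p4 <- -2^2
p4              # -4
```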

4.6 Character vectors

In R, we can also enter expressions with characters using a pair of double or single quotes. Characters are printed using double quotes.

"Hello World!"

## [1] "Hello World!"

'Hello again!'

## [1] "Hello again!"

In R, the character vector is the basic unit of text.

You may also have come across the term “string”. How is “string” related to “character”? In R, “string” is an informal term meaning “element of a character vector”. Most string manipulation functions operate on vectors of strings, in the same way that arithmetic operations are vectorized.

`c()`, `paste()`

Strings may be concatenated into a vector by c().

c("this", "is", "a", "character", "vector", ".")

## [1] "this"      "is"        "a"         "character" "vector"    "."

We can also use the paste() function to concatenate strings. paste() concatenates one or more objects; by default, they are separated in the result by a single blank character.

paste(c("NYU"), c("Shanghai", "New York", "Abu Dhabi"))

## [1] "NYU Shanghai"  "NYU New York"  "NYU Abu Dhabi"

We can change the separator using the argument sep, which takes a character string.

paste("1st", "2nd", "3rd", sep = ", ")

## [1] "1st, 2nd, 3rd"

Vector recycling also applies here. Below c("X", "Y") is repeated 5 times to match the sequence 1:10.

paste(c("X","Y"), 1:10, sep = "")

##  [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"

paste0(...) is equivalent to paste(..., sep = ""), but slightly more efficient.

nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
paste(month.abb, "is the", nth, "month of the year.")

##  [1] "Jan is the 1st month of the year."  "Feb is the 2nd month of the year." 
##  [3] "Mar is the 3rd month of the year."  "Apr is the 4th month of the year." 
##  [5] "May is the 5th month of the year."  "Jun is the 6th month of the year." 
##  [7] "Jul is the 7th month of the year."  "Aug is the 8th month of the year." 
##  [9] "Sep is the 9th month of the year."  "Oct is the 10th month of the year."
## [11] "Nov is the 11th month of the year." "Dec is the 12th month of the year."
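All the paste() calls above return one string per element. If we instead want a single string, paste() has a further argument, collapse, which is not covered in the text above; the sketch below is only a brief pointer. The vector parts is a hypothetical example.

```r
parts <- c("X", "Y", "Z")

s1 <- paste(parts, 1:3, sep = "-")
s1   # three strings: "X-1" "Y-2" "Z-3"

s2 <- paste(parts, collapse = "+")
s2   # one string: "X+Y+Z"

s3 <- paste0(parts, 1:3, collapse = ", ")
s3   # one string: "X1, Y2, Z3"
```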

`nchar()`, `length()`

Unlike some programming languages, R does not distinguish between whole strings and individual characters. A string containing one character is treated the same as any other string. Therefore, “character” in R refers to the type of data, not an individual character within a string.

To count the number of single characters in a string, use nchar(). To get the length of a character vector, use length().

string <- "Hello"
nchar(string)

## [1] 5

length(string)

## [1] 1
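The difference is easier to see with a vector holding more than one string. In this sketch, fruits is a hypothetical example vector.

```r
fruits <- c("apple", "kiwi", "")

length(fruits)   # 3: the number of strings in the vector
nchar(fruits)    # 5 4 0: the number of characters in each string
```

Note that nchar() is itself vectorized: it returns one count per string.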

4.7 Useful built-in functions operating on vectors

`head()`, `tail()`

head() and tail() return the first or last parts of an object (such as a vector, matrix, or data frame).

By default, head() returns the first 6 elements, and tail() returns the last 6 elements.

head(letters)

## [1] "a" "b" "c" "d" "e" "f"

tail(letters)

## [1] "u" "v" "w" "x" "y" "z"

We can also control how many elements are returned by specifying the argument n.

head(letters, n = 10)

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

`sort()`

sort() sorts a vector into ascending or descending order.

Sort ascending:

sort(c(10:3, 2:12))

##  [1]  2  3  3  4  4  5  5  6  6  7  7  8  8  9  9 10 10 11 12

Sort descending:

sort(c(10:3, 2:12), decreasing = TRUE)

##  [1] 12 11 10 10  9  9  8  8  7  7  6  6  5  5  4  4  3  3  2

set operations

union(), intersect(), setdiff(), setequal(), and is.element() perform set operations.

union(), intersect(), setdiff() and setequal() will discard any duplicated values in the arguments.

Set union

union(x, y) returns all unique elements from both sets.

x <- 1:15
y <- 10:25
union(x, y)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Intersection

intersect(x, y) returns elements common to both x and y.

intersect(x, y)

## [1] 10 11 12 13 14 15

Difference

setdiff(x, y) returns elements in x not in y.

setdiff(x, y)

## [1] 1 2 3 4 5 6 7 8 9

setdiff(y, x) returns elements in y not in x.

setdiff(y, x)

##  [1] 16 17 18 19 20 21 22 23 24 25

Equality

setequal(x, y) returns TRUE if the two sets contain the same elements.

setequal(x, y)

## [1] FALSE

Membership on two vectors

is.element(x, y) returns which elements of x appear in y.

is.element(x, y)

##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [15]  TRUE

is.element(y, x) returns which elements of y appear in x.

is.element(y, x)

##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [15] FALSE FALSE
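Two points from this section can be checked directly: is.element() gives the same answer as the %in% operator seen earlier, and duplicates in the inputs do not survive a set operation. A minimal sketch (same_as_in, u1, i1 and eq are hypothetical names):

```r
x <- 1:15
y <- 10:25

# is.element(x, y) gives the same answer as x %in% y
same_as_in <- identical(is.element(x, y), x %in% y)
same_as_in   # TRUE

# Duplicates in the inputs are discarded
u1 <- union(c(1, 1, 2), c(2, 3, 3))
u1           # 1 2 3
i1 <- intersect(c(1, 1, 2), c(1, 1, 5))
i1           # 1

# setequal() ignores order and duplicates
eq <- setequal(c(1, 2, 2, 3), c(3, 2, 1))
eq           # TRUE
```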

`unique()`

unique() returns a vector with duplicate elements removed. It keeps only the first occurrence of each value.

unique(c(10:3, 2:12))

##  [1] 10  9  8  7  6  5  4  3  2 11 12

4.8 Missing values

`NA`

In some cases, the elements of a vector may not be completely known. R assigns the special value NA to these elements to indicate that they are “not available” or “missing”.

NA is a logical constant of length 1.

In general, any operation on an NA becomes an NA. The motivation for this rule is that if the specification of an operation is incomplete, the result cannot be known and hence is not available.

NA + 1

## [1] NA

NA > 1

## [1] NA

The function is.na() evaluates whether an element is NA.

x <- c(1:3, NA)
is.na(x)

## [1] FALSE FALSE FALSE  TRUE

The logical expression x == NA is different from is.na(x). As said earlier, any operation on an NA becomes an NA; hence x == NA will return a vector of the same length as x, whose values are all NA.

x == NA

## [1] NA NA NA NA
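Because is.na() returns a logical vector, it combines with sum() to count missing values. The sketch below also uses anyNA(), a base R function not discussed in the text, as a shortcut for asking whether anything is missing. The names n_missing and has_missing are hypothetical.

```r
x <- c(1:3, NA, 7, NA)

is.na(x)                  # FALSE FALSE FALSE TRUE FALSE TRUE

n_missing <- sum(is.na(x))
n_missing                 # 2

has_missing <- anyNA(x)
has_missing               # TRUE
```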

Note: Functions are very careful about values that are not available. An NA value in a vector passed as an argument may cause a function to return NA or raise an error.

x <- c(0, 1, 1, 2, 3, NA)
mean(x)

## [1] NA

sd(x)

## [1] NA

We can choose to ignore the NA values by setting na.rm to be TRUE.

x <- c(0, 1, 1, 2, 3, NA)
mean(x, na.rm = TRUE)

## [1] 1.4

sd(x, na.rm = TRUE)

## [1] 1.140175

`NaN`

There is a second kind of “missing” value, NaN, which is produced by numerical computation. NaN is short for “not-a-number”, which means that a calculation either does not make mathematical sense or cannot be performed properly.

0 / 0

## [1] NaN

Inf - Inf

## [1] NaN
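NA and NaN can be told apart with is.na() and is.nan(). A minimal sketch, with vals as a hypothetical example vector:

```r
vals <- c(1, NA, NaN)

is.na(vals)    # FALSE TRUE TRUE: NaN also counts as missing
is.nan(vals)   # FALSE FALSE TRUE: only the NaN itself
```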

`NULL`

In R, we also have a special value NULL. It represents an empty object. Unlike NA, which occupies one position in a vector, NULL has length zero.

length(NA)

## [1] 1

length(NaN)

## [1] 1

length(NULL)

## [1] 0
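The practical consequence shows up when we combine values with c(). In this sketch, with_na and with_null are hypothetical names.

```r
with_na   <- c(1, NA, 3)
with_null <- c(1, NULL, 3)

length(with_na)     # 3: NA occupies a position
length(with_null)   # 2: NULL contributes nothing
is.null(NULL)       # TRUE
```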

4.9 Indexing

Sometimes we want to access only part of a vector. This operation is called indexing (also known as subsetting, subscripting, or slicing). In R, we access the vector elements by placing an index vector inside square brackets [], which serve as the indexing operator.

v[index vector]

Below we discuss the rules of indexing.

A vector of positive numbers selects elements by their position.

The corresponding elements of the vector are selected, concatenated, and returned in the order that they are referenced.

Let’s create a vector and see some examples.

x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

Extract the first element:

x[1]

## [1] 1

Extract a sequence of elements from position 1 to 3:

x[1:3]

## [1] 1 4 9

Extract elements in a specific, non-sequential order:

x[c(1, 5, 7, 3, 7)]

## [1]  1 25 49  9 49

Very important: The first element has an index of 1, not 0 as in some other programming languages.

A vector of negative numbers excludes elements at specified locations.

Think of the minus sign as a “drop” command. R will return everything except the indices you provide.

In the example below, the code returns everything except the second element.

x[-2]

## [1]   1   9  16  25  36  49  64  81 100

Drop the 2nd and 4th elements:

x[c(-2, -4)]

## [1]   1   9  25  36  49  64  81 100

Drop positions 1 through 5:

x[-(1:5)]

## [1]  36  49  64  81 100

Note: You cannot mix positive and negative integers in one index vector. R cannot “keep” and “drop” simultaneously, so x[c(1, -1)] triggers an error.
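A small sketch of this restriction and one way around it. tryCatch() is used here only so that the example keeps running after the error; result and second are hypothetical names.

```r
x <- c(1, 4, 9, 16, 25)

# Keeping and dropping in the same index vector is rejected
result <- tryCatch(x[c(1, -1)], error = function(e) "error")
result    # "error"

# Do it in two steps instead: drop first, then keep
second <- x[-1][1]
second    # 4
```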

A logical vector selects elements based on a condition.

R looks for where the condition is TRUE and keeps those elements, while discarding the FALSE positions.

The example below selects only the elements that are less than 10:

x[x < 10]

## [1] 1 4 9

Behind the scenes, R creates a TRUE/FALSE mask to find the values:

x[c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)]

## [1] 1 4 9

Remember that logical operations are element-wise. The index vector is recycled to the same length as x. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
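Recycling of a logical index is easiest to see with an index that is deliberately shorter than x. A minimal sketch (every_other and every_third are hypothetical names):

```r
x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

# A short logical index is recycled along x
every_other <- x[c(TRUE, FALSE)]
every_other   # 1 9 25 49 81

every_third <- x[c(FALSE, FALSE, TRUE)]
every_third   # 9 36 81
```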

Below are a few more examples.

Select only the elements that are greater than the mean:

x[x > mean(x)]

## [1]  49  64  81 100

Select values in the bottom 10% or top 10%:

x[(x < quantile(x, 0.1)) | (x > quantile(x, 0.9))]

## [1]   1 100

Keep only the values that are not missing:

x[!is.na(x)]

##  [1]   1   4   9  16  25  36  49  64  81 100

Use the modulo operator to select only even numbers or odd numbers:

x[x %% 2 == 0]

## [1]   4  16  36  64 100

x[x %% 2 == 1]

## [1]  1  9 25 49 81

Using names to access named elements

This only applies to named vectors. Label access allows us to retrieve data by its label rather than its position.

We can use a character vector of names to access the part of the vector containing the elements with those names.

# Nobel laureates in Literature
years <- c(2016, 2012, 1954, 1953, 1950)
names(years) <- c("Bob Dylan", "Mo Yan", "Ernest Hemingway", "Winston Churchill", "Bertrand Russell")
years

##         Bob Dylan            Mo Yan  Ernest Hemingway Winston Churchill  Bertrand Russell 
##              2016              2012              1954              1953              1950

years["Bob Dylan"]

## Bob Dylan 
##      2016

years[c("Bob Dylan", "Winston Churchill")]

##         Bob Dylan Winston Churchill 
##              2016              1953

This option is particularly useful in connection with data frames, as we shall see in later chapters.

Not passing any index will return the whole of the vector.

x[]

##  [1]   1   4   9  16  25  36  49  64  81 100

To change the value of a specific item, refer to the index number.

The example below locates the 6th position and overwrites it with -36.

x[6] <- -36
x

##  [1]   1   4   9  16  25 -36  49  64  81 100

The second example finds every NA in x and replaces it with 0.

x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100, NA)

x[is.na(x)] <- 0
x

##  [1]   1   4   9  16  25  36  49  64  81 100   0

Appending value(s) to a vector

There are two common ways to grow a vector. You can either rebuild it using the c() constructor or assign values to positions that don’t exist yet.

Vector constructor:

x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
x <- c(x, 121, 144) # Append 121 and 144 to the end of the existing vector
x

##  [1]   1   4   9  16  25  36  49  64  81 100 121 144

Element assignment:

x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
x[c(11,12)] <- c(121, 144) # Assign new values specifically to positions 11 and 12
x

##  [1]   1   4   9  16  25  36  49  64  81 100 121 144

If we assign value(s) to a position past the end of the vector, R extends the vector and fills the gap with NAs.

x <- c(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
x[15] <- 225  # Assign 225 to the 15th slot, skipping slots 11 through 14
x

##  [1]   1   4   9  16  25  36  49  64  81 100  NA  NA  NA  NA 225

`which()`

The which() function is the bridge between logical vectors (TRUE/FALSE) and integer indices (positions). It essentially answers the question: “At which positions is my condition met?”

which() returns the locations where a logical vector is TRUE.

which(c(TRUE, FALSE, TRUE, NA, FALSE, FALSE, TRUE))

## [1] 1 3 7

The example below finds the position of the letter “R” in the built-in alphabet vector.

which(LETTERS == "R")

## [1] 18
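As the first which() example suggests, NA is not reported as a position. That makes which() handy when a condition may evaluate to NA, because indexing with the raw logical vector behaves differently. A sketch with a hypothetical vector y:

```r
y <- c(5, NA, 12)

y > 6              # FALSE NA TRUE
y[y > 6]           # NA 12: an NA in a logical index yields an NA element
which(y > 6)       # 3: which() skips the NA
y[which(y > 6)]    # 12
```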

4.10 Data types

A vector can contain numbers, strings, or logical values, but not a mixture. All elements in a vector must share the same mode, which describes how an object is stored in memory.

In R, every object has a mode. This tells us whether something is stored as a number, a character string, a function, or another basic type.

More often, you may hear the term class. In R, every object also has a class, which determines how R interprets the object and what operations apply to it.

To confuse you even further, an object may have a mode “numeric”, but it has the class “Date”. In the example below, d consists of a single number (the number of days since January 1, 1970), but is interpreted as a date.

d <- as.Date("2010-03-15")
mode(d)

## [1] "numeric"

class(d)

## [1] "Date"
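We can peek at the number underneath a date with unclass(), which strips the class and leaves the stored value. A minimal sketch (days and next_day are hypothetical names):

```r
d <- as.Date("2010-03-15")

days <- unclass(d)
days              # 14683, the number of days since 1970-01-01

next_day <- d + 1 # arithmetic works on that number but keeps the class
next_day          # "2010-03-16"
class(next_day)   # "Date"
```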

Does this sound too complicated? Don’t worry! Modes mostly exist for legacy purposes, so in practice you should only ever need to work with an object’s class.

To check a vector’s class, we use class().

class(TRUE)

## [1] "logical"

class("a")

## [1] "character"

class(2021)

## [1] "numeric"

To test whether an object belongs to a particular class, we use the is.* family of functions.

is.logical(TRUE)

## [1] TRUE

is.character("a")

## [1] TRUE

is.numeric(2021)

## [1] TRUE

coercion

Now that we know vectors must contain elements of a single type, what happens when we try to combine mixed types? In that case, R automatically coerces the elements to a single type. The general rule is that R converts from more specific types to more general ones.

Logical values become numbers: TRUE is converted to 1, and FALSE to 0.

v1 <- c(1, 2, 3)
v2 <- c("1", "2", "3")
v3 <- c(TRUE, FALSE)
v4 <- c("A", "B", "C")

c(v1, v3)

## [1] 1 2 3 1 0

Numbers become strings:

c(v1, v2)

## [1] "1" "2" "3" "1" "2" "3"

Logical values become strings:

c(v3, v4)

## [1] "TRUE"  "FALSE" "A"     "B"     "C"

Numbers and logical values become strings:

c(v1, v3, v4)

## [1] "1"     "2"     "3"     "TRUE"  "FALSE" "A"     "B"     "C"

From these examples, we can see the basic coercion rules. The ordering is roughly logical < numeric < character.

We can also explicitly convert an object’s type using the as.* functions.

as.character(1)

## [1] "1"

as.numeric(TRUE)

## [1] 1

as.numeric("1")

## [1] 1
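Explicit conversion does not always succeed. The sketch below shows a few cases worth knowing; suppressWarnings() is used only to keep the output quiet, since a failed conversion normally produces a warning. The names n1, i1, l1 and bad are hypothetical.

```r
n1 <- as.numeric("3.14")
n1    # 3.14

i1 <- as.integer(3.9)
i1    # 3: the fractional part is dropped, not rounded

l1 <- as.logical(c("TRUE", "false", "maybe"))
l1    # TRUE FALSE NA

# Text that is not a number becomes NA, with a warning
bad <- suppressWarnings(as.numeric("abc"))
bad   # NA
```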

Note: When an object is coerced, its attributes are dropped. We’ll return to attributes shortly.

empty vectors

A vector can be empty and still have a mode.

character(0)

## character(0)

numeric(0)

## numeric(0)

If we need a vector of a specific type and length, filled with that type’s default value, we can create one with vector().

vector("numeric", 6)

## [1] 0 0 0 0 0 0

vector("logical", 6)

## [1] FALSE FALSE FALSE FALSE FALSE FALSE

vector("character", 6)

## [1] "" "" "" "" "" ""

vector("complex", 6)

## [1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i

vector("raw", 6)

## [1] 00 00 00 00 00 00

4.11 Attributes

Attributes are properties of an object. We’ve already encountered one attribute, which is mode. But objects can have several others.

All vectors also have a length attribute, which tells us how many elements they contain.

length(1:5)

## [1] 5

length(c(TRUE, FALSE, NA))

## [1] 3

length(NA)

## [1] 1

length(c("banana", "apple", "orange", "mango", "lemon"))

## [1] 5

Vectors can also have names, stored in the names attribute.

x <- 1:4
names(x) <- c("banana", "apple", "kiwi fruit", "")
x

##     banana      apple kiwi fruit            
##          1          2          3          4

We can use names() to retrieve the names of a vector, in addition to assigning names to elements.

names(x)

## [1] "banana"     "apple"      "kiwi fruit" ""
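To see the attributes an object carries, we can use attributes() and attr(). The sketch below also illustrates the earlier note that explicit coercion drops them. The names a and stripped are hypothetical.

```r
x <- 1:3
names(x) <- c("a", "b", "c")

a <- attributes(x)
a                     # a list with one component, $names

attr(x, "names")      # "a" "b" "c", the same as names(x)

# Explicit coercion drops the names
stripped <- as.numeric(x)
names(stripped)       # NULL
```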

⁵ Although some may say there are three. As we will see shortly, NA is also a logical value, meaning “not available”.