SoFunction
Updated on 2025-04-09

In-depth explanation of R language data types

R stores data in the following kinds of objects: vectors, factors, arrays, matrices, data frames, time series (ts), and lists.

What each type means

1. Vector (one-dimensional data): Only data of the same type can be stored

Syntax: c(data1, data2, ...). Indexing starts at 1 (as in Matlab). A vector holds a single data type, so mixed inputs are coerced to a common type.

> x <- c(1,5,8,9,1,2,5)
> x
[1] 1 5 8 9 1 2 5
> y <- c(1,"zhao") # A number and a string are mixed, so the number is automatically converted to a character string
> y[1]
[1] "1"
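The coercion shown above follows a fixed order. As a minimal sketch (the variable names v1, v2 and v3 are made up for illustration), mixed inputs are converted to the most general type present:

```r
# Mixed inputs are coerced to the most general type present:
# logical -> integer -> double -> character
v1 <- c(TRUE, 2L)        # logical + integer
v2 <- c(2L, 3.5)         # integer + double
v3 <- c(3.5, "zhao")     # double + character

class(v1)  # "integer"
class(v2)  # "numeric"
class(v3)  # "character"
v1         # TRUE became 1, so the vector is 1 2
```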

Access:

> x[-(1:2)]  # Drop elements 1 and 2
[1] 8 9 1 2 5
> x[2:4]    # Elements 2, 3 and 4
[1] 5 8 9
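Besides ranges and negative indices, a vector can be indexed by position vectors and by logical conditions. The following is a small sketch using the same x; the result names (first, picked, big, where, n) are only illustrative:

```r
x <- c(1, 5, 8, 9, 1, 2, 5)

first  <- x[1]           # indexing starts at 1, so this is 1
picked <- x[c(1, 3)]     # elements 1 and 3: 1 8
big    <- x[x > 4]       # logical indexing: 5 8 9 5
where  <- which(x > 4)   # positions of those elements: 2 3 4 7
n      <- length(x)      # 7
```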

2. Factors: Provide a more concise way to handle categorical data

Throughout a computation, factors are not treated as numeric values but as "symbols" (category labels).

factor(x=character(), levels, labels=levels, exclude=NA, ordered=is.ordered(x), nmax=NA)

x: a data vector, which will be converted into a factor;

levels: specifies the possible levels of the factor (by default the distinct values in the vector x, i.e. sort(unique(x))). It is a character vector, that is, a vector whose elements are character strings; the variable b below is one (it can be generated with the as.character() function).

labels: the names to give to the levels;

> a <- c(6,1,3,0)
> b = as.character(a)
> b
[1] "6" "1" "3" "0"

exclude: a vector of values to leave out when forming the levels; matching elements of x become NA.

nmax: an upper bound on the number of levels.

> factor(1:3)
[1] 1 2 3
Levels: 1 2 3
> factor(1:3, levels=1:6)
[1] 1 2 3
Levels: 1 2 3 4 5 6
> factor(1:6, exclude = 2)
[1] 1    <NA> 3    4    5    6
Levels: 1 3 4 5 6

Unordered factors vs. ordered factors

Factors store categorical (nominal) or ordinal variables. These variables cannot be used in arithmetic; they can only be used for classification or counting. Ordinary factors represent categorical variables, and ordered factors represent ordinal variables.
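To make "classification or counting" concrete, here is a minimal sketch. The names f, counts, n_levels, codes and res are illustrative, and the warning that R raises for arithmetic on a factor is suppressed only to keep the example quiet:

```r
colour <- c('G', 'G', 'R', 'Y', 'G', 'Y', 'Y', 'R', 'Y')
f <- factor(colour)

counts   <- table(f)       # frequency of each level: G 3, R 2, Y 4
n_levels <- nlevels(f)     # 3
codes    <- as.integer(f)  # internal integer codes: 1 1 2 3 1 3 3 2 3

# Arithmetic on a factor is not meaningful: R warns and returns NA
res <- suppressWarnings(f + 1)
all(is.na(res))            # TRUE
```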

Create a factor:

> colour <- c('G', 'G', 'R', 'Y', 'G', 'Y', 'Y', 'R', 'Y')
> col <- factor(colour)  # Create a factor
> # The entries in labels replace the levels at the corresponding positions
> col1 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('Green', 'Red', 'Yellow'))
> levels(col)
[1] "G" "R" "Y"
> levels(col1)
[1] "Green"  "Red"    "Yellow"
> col2 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('1', '2', '3'))
> levels(col2)
[1] "1" "2" "3"
> col_vec <- as.vector(col2)
> class(col_vec)
[1] "character"
> col2
[1] 1 1 2 3 1 3 3 2 3
Levels: 1 2 3
> col_num <- as.numeric(col2)
> col_num
[1] 1 1 2 3 1 3 3 2 3
> col3 <- factor(colour, levels = c('G', 'R'))  # 'Y' is not in levels, so every 'Y' in col3 becomes <NA>
> col3
[1] G    G    R    <NA> G    <NA> <NA> R    <NA>
Levels: G R
> colour
[1] "G" "G" "R" "Y" "G" "Y" "Y" "R" "Y"
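Converting a factor to numbers returns the internal level codes, not the labels. In the transcript above the two happen to coincide. The sketch below uses a hypothetical factor f whose labels differ from its codes to show the difference:

```r
f <- factor(c("10", "20", "10", "30"))   # levels are "10" "20" "30"

as.numeric(f)                 # level codes: 1 2 1 3
as.numeric(as.character(f))   # original values: 10 20 10 30
as.numeric(levels(f))[f]      # same result, converting each level only once
```

Going through as.character() first is the usual way to recover the numeric values that the labels spell out.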

Create an ordered factor:

> score <- c('A', 'B', 'A', 'C', 'B')
> score1 <- ordered(score, levels = c('C', 'B', 'A'));
> score1
[1] A B A C B
Levels: C < B < A
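Because the levels of an ordered factor have a defined order, comparisons and sorting work on them. A short sketch with the same score data (the result names above_c, best and sorted are illustrative):

```r
score  <- c('A', 'B', 'A', 'C', 'B')
score1 <- ordered(score, levels = c('C', 'B', 'A'))

above_c <- score1 > 'C'               # TRUE TRUE TRUE FALSE TRUE
best    <- as.character(max(score1))  # "A", the highest level present
sorted  <- sort(score1)               # C B B A A
is.ordered(score1)                    # TRUE
```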

3. Matrix (matrix, two-dimensional data): Only the same type can be stored

Syntax: matrix(data, nrow = , ncol = , byrow = F) -- byrow = F (the default) fills the matrix column by column; byrow = T fills it row by row.

> xx = matrix(1:10, 2, 5)
> xx
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
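The effect of byrow is easiest to see side by side. The following sketch builds the same data both ways and reads back single elements; m_col and m_row are illustrative names:

```r
m_col <- matrix(1:10, nrow = 2, ncol = 5)                 # filled column by column
m_row <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)   # filled row by row

m_col[2, 3]     # 6
m_row[2, 3]     # 8
m_row[1, ]      # first row: 1 2 3 4 5
dim(m_row)      # 2 5
t(m_col)[3, 2]  # transposing swaps the indices: 6
```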

4. Array (data with any number of dimensions, typically three or more): Only the same type can be stored

Syntax: array(data, dim) -- data: must all be of the same type; dim: a vector giving the size of each dimension. (This resembles the reshape function in Matlab.)

> a = array(1:10,c(2,5))
> a
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
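The example above is two-dimensional, so it is really a matrix. A genuinely three-dimensional array could look like the following sketch (a3 is an illustrative name):

```r
a3 <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

dim(a3)        # 2 3 4
a3[1, 1, 1]    # 1
a3[2, 3, 4]    # 24, the last element
a3[, , 2]      # the second 2 x 3 layer, holding 7 to 12

# A matrix is simply a two-dimensional array
is.matrix(array(1:10, c(2, 5)))   # TRUE
```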

5. Data frame

A data frame arranges data in a matrix-like table (similar to an Excel sheet), but unlike a matrix, each column can hold a different data type.

Syntax: data.frame(data1, data2, ...) -- data1, ... are the data for each column.

> name <- c("Mr A", "Mr B", "Mr C")
> group <- rep(1,3)
> scort <- c(58,15,41)
> df <- data.frame(name, group, scort)
> df
  name group scort
1 Mr A     1    58
2 Mr B     1    15
3 Mr C     1    41

Data access:

> df$name   # Output from R < 4.0, where data.frame() converted strings to factors by default
[1] Mr A Mr B Mr C
Levels: Mr A Mr B Mr C
> df[1]
  name
1 Mr A
2 Mr B
3 Mr C
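A few other common ways to read from a data frame are sketched below. The example passes stringsAsFactors = FALSE explicitly, which is an assumption made here so that the name column is a character vector on every R version; the names col_vals, cell, high and types are illustrative:

```r
name  <- c("Mr A", "Mr B", "Mr C")
group <- rep(1, 3)
scort <- c(58, 15, 41)
df <- data.frame(name, group, scort, stringsAsFactors = FALSE)

col_vals <- df[["scort"]]         # a column as a plain vector: 58 15 41
cell     <- df[2, "name"]         # a single cell: "Mr B"
high     <- df[df$scort > 20, ]   # rows where scort exceeds 20 (Mr A and Mr C)
types    <- sapply(df, class)     # column types: character, numeric, numeric
nrow(df); ncol(df)                # 3 and 3
```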

6. List: Can store different types of data

Syntax: list(name1=component1, name2=component2, ...)

> xx <- rep(1:2, 3:4)
> yy <- c('Mr A', 'Mr B', 'Mr C', 'Mr D', 'Mr E', 'Mr D', 'Mr F')
> zz <- 'discussion group'
> lst <- list(group = xx, name = yy, decription = zz)  # The variable name is missing in the source; lst is a placeholder
> lst
$group
[1] 1 1 1 2 2 2 2

$name
[1] "Mr A" "Mr B" "Mr C" "Mr D" "Mr E" "Mr D" "Mr F"

$decription
[1] "discussion group"
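List components can be read by name or by position. The sketch below rebuilds the list under the hypothetical name lst (the original variable name is not given) and shows the difference between the $, [[ ]] and [ ] accessors:

```r
xx <- rep(1:2, 3:4)
yy <- c('Mr A', 'Mr B', 'Mr C', 'Mr D', 'Mr E', 'Mr D', 'Mr F')
zz <- 'discussion group'
lst <- list(group = xx, name = yy, decription = zz)   # 'lst' is a hypothetical name

lst$name[2]          # "Mr B"
lst[["group"]]       # the component itself: 1 1 1 2 2 2 2
class(lst["group"])  # single brackets return a sub-list: "list"
names(lst)           # "group" "name" "decription"
length(lst)          # 3
```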

Reference:

/s/blog_4d9814240102vigp.html
