SoFunction
Updated on 2025-03-01

Summary of other object types in the R language

Other objects

Matrix

A matrix is a vector extended to two dimensions.

A matrix behaves like a vector, not like a vector of vectors or a list of vectors.

Subscripts can be used to refer to elements, but they do not reflect how the matrix is stored (as a single vector, filled column by column).

A matrix has no explicit class attribute.
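The points above can be checked in a short session. This is only a minimal sketch; the variable name m is chosen here for illustration and does not come from the original notes.

```r
# A matrix is a vector with a dim attribute (m is an illustrative name).
m <- matrix(1:6, nrow = 2, ncol = 3)

# Subscripts address elements by row and column ...
m[2, 3]            # 6: row 2, column 3

# ... but the data live in one vector, filled column by column.
as.vector(m)       # 1 2 3 4 5 6
m[5]               # 5: single-index access into the underlying vector

# No explicit class attribute is stored on the object.
attr(m, "class")   # NULL
attributes(m)      # only $dim
```

Note that class(m) still reports an implicit class derived from the dim attribute, even though no class attribute is stored.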

Array

An array is a vector extended to more than two dimensions.

Arrays can be used to represent data of the same type in multiple dimensions.

The underlying storage mechanism of an array is a vector.

Like a matrix, an array has no explicit class attribute.
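A small sketch of the same idea for arrays, assuming an illustrative 2 x 3 x 4 array named a (the names a and v are not from the original):

```r
# Build a three-dimensional array from the integers 1..24.
a <- array(1:24, dim = c(2, 3, 4))

dim(a)             # 2 3 4
a[2, 3, 4]         # 24: the last element

# No explicit class attribute is stored.
attr(a, "class")   # NULL

# Dropping the dim attribute leaves the plain underlying vector.
v <- a
dim(v) <- NULL
identical(v, 1:24) # TRUE
```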

Factor

Factor variables represent categorical information.

A factor variable is an ordered collection of items.

The distinct values that a factor variable can take are called factor levels.

When a factor is printed, its values appear without quotation marks and its levels are listed explicitly.

> # The variable name is missing in the source; eye.colors is assumed here.
> (eye.colors <- factor(c("brown", "blue", "blue", "green", "brown", "brown", "brown")))
[1] brown blue  blue  green brown brown brown
Levels: blue brown green
> class(eye.colors)
[1] "factor"

When an ordered factor is printed, R also shows the order of the factor levels.
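As a minimal illustration of an ordered factor, the sketch below uses made-up size labels; the name sizes and its values are assumptions for the example only.

```r
# An ordered factor with an explicit level order (illustrative data).
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

# Printing shows the ordering in the Levels line:
# Levels: small < medium < large
print(sizes)

# Because the levels are ordered, comparisons are meaningful.
sizes < "large"    # TRUE FALSE TRUE TRUE
```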

Internally, a factor is stored as an integer vector.

The levels attribute maps each integer to a factor level.

Because an integer takes less storage space than a string, this representation can use less memory than a character vector.
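The storage claim can be checked roughly with object.size(). This is a sketch under assumptions: the data are randomly generated for illustration, and the exact sizes depend on the R build, so no figures are quoted here.

```r
# Illustrative data: 100,000 labels drawn from three distinct values.
set.seed(1)
labels_chr <- sample(c("blue", "brown", "green"), 1e5, replace = TRUE)
labels_fct <- factor(labels_chr)

typeof(labels_fct)   # "integer": the factor is stored as integer codes
levels(labels_fct)   # "blue" "brown" "green"

# Compare the memory footprint of the two representations.
size_chr <- object.size(labels_chr)
size_fct <- object.size(labels_fct)
print(size_chr, units = "Kb")
print(size_fct, units = "Kb")
```

On a typical 64-bit build the factor comes out noticeably smaller, although the gap is narrower than the raw string lengths suggest because R stores each distinct string only once.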

You can convert a factor variable into an integer vector.

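> # Names are missing in the source; eye.colors (the factor) and eye.colors.integer.vector are assumed.
> (eye.colors.integer.vector <- unclass(eye.colors))
[1] 2 1 1 3 2 2 2
attr(,"levels")
[1] "blue"  "brown" "green"
> class(eye.colors.integer.vector)
[1] "integer"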

Setting the class attribute converts the integer vector back into a factor.

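> # eye.colors.integer.vector is an assumed name for the unclassed vector.
> (class(eye.colors.integer.vector) <- "factor")
[1] "factor"
> eye.colors.integer.vector
[1] brown blue  blue  green brown brown brown
Levels: blue brown green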

(Note: I don't understand how the integer value matches the factor level internally)
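On the question in the note above: each integer code is simply a position in the levels vector, so code 2 stands for the second level. The sketch below shows this with the same eye-color data; the names eye.colors, codes, lv and rebuilt are illustrative.

```r
# Same data as the factor example (variable names are illustrative).
eye.colors <- factor(c("brown", "blue", "blue", "green",
                       "brown", "brown", "brown"))

codes <- as.integer(eye.colors)   # 2 1 1 3 2 2 2
lv    <- levels(eye.colors)       # "blue" "brown" "green"

# Indexing the levels by the codes recovers the original labels.
lv[codes]
identical(lv[codes], as.character(eye.colors))   # TRUE

# Rebuilding a factor by hand from codes, levels and a class attribute.
rebuilt <- structure(codes, levels = lv, class = "factor")
all(rebuilt == eye.colors)                       # TRUE
```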

Data frame

A data frame represents tabular data; each column can hold a different type.

Each column in the data frame must have the same length.

Usually every column in a data frame has a name, and sometimes the rows are named too.

Columns in data frames are often used to represent variables.

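> # The dataset name is missing in the source; top.bacon.searching.cities is assumed here.
> library(nutshell)
> data(top.bacon.searching.cities)
> top.bacon.searching.cities
            city rank
1        Seattle  100
2     Washington   96
3        Chicago   94
4       New York   93
5       Portland   93
6       St Louis   92
7         Denver   90
8         Boston   90
9    Minneapolis   89
10        Austin   87
11  Philadelphia   85
12 San Francisco   84
13       Atlanta   82
14   Los Angeles   80
15    Richardson   80
> typeof(top.bacon.searching.cities)
[1] "list"
> class(top.bacon.searching.cities)
[1] "data.frame"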

Because a data frame is a list, the ways of referring to list elements also work on a data frame.
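A short sketch of list-style access on a data frame. The data frame cities is a small hand-built stand-in (three rows copied from the table above), used so that the example runs without the nutshell package.

```r
# A small hand-built data frame (illustrative stand-in).
cities <- data.frame(city = c("Seattle", "Washington", "Chicago"),
                     rank = c(100, 96, 94))

typeof(cities)      # "list"
length(cities)      # 2: a data frame is a list of columns
names(cities)       # "city" "rank"

# List-style element access returns a column as a vector.
cities$rank         # 100 96 94
cities[["rank"]]    # same column, selected by name
cities[[2]]         # same column, selected by position

# Single brackets return a sub-list, which is again a data frame.
cities["rank"]
```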

Formula

A formula (class formula) describes the relationship between variables.

For example, the following states that y is a function of x1, x2 and x3:

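> # The variable name is missing in the source; sample.formula is assumed here.
> sample.formula <- (y ~ x1 + x2 + x3)
> class(sample.formula)
[1] "formula"
> typeof(sample.formula)
[1] "language"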

Meaning of the items that can appear in a formula (item, description, example):

Variable name: the name of a variable.
Tilde (~): connects the response variable (left of the tilde) to the independent variables (right of the tilde).
Plus sign (+): expresses a linear relationship between variables.
Zero (0): when added to a formula, indicates that the model has no intercept term. Example: y ~ u + w + v + 0
Vertical bar (|): specifies conditioning variables; commonly used in lattice plotting formulas.
Protection function I(): the expression inside I() is interpreted in its arithmetic sense. Example: a + b means that both variable a and variable b are included in the formula, whereas I(a + b) means that the sum of a and b is included as a single term.
Asterisk (*): represents interactions between variables (the main effects are included as well). Example: y ~ (u + v) * w
Caret (^): crosses terms up to the given degree. Example: y ~ (u + v)^2 is equivalent to y ~ (u + v) * (u + v)
Function of a variable: a function applied to a variable is included in the formula as an independent variable. Example: y ~ log(u) + sin(v) + w
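One way to see how R expands these operators is to inspect the terms of a formula. The helper labels_of below is a hypothetical convenience wrapper written for this sketch; terms() itself is part of the stats package that ships with R.

```r
# Hypothetical helper: list the term labels R derives from a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ a + b)                 # "a" "b"
labels_of(y ~ I(a + b))              # "I(a + b)": one term, the sum
labels_of(y ~ (u + v) * w)           # "u" "v" "w" "u:w" "v:w"
labels_of(y ~ (u + v)^2)             # "u" "v" "u:v"
labels_of(y ~ log(u) + sin(v) + w)   # "log(u)" "sin(v)" "w"

# Adding 0 removes the intercept (1 = intercept present, 0 = absent).
attr(terms(y ~ u + w + v), "intercept")       # 1
attr(terms(y ~ u + w + v + 0), "intercept")   # 0
```

No data are needed for this check, because terms() works on the formula alone.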
