Explicit loops in R are comparatively slow, so loops, and nested loops in particular, are generally discouraged. To write iteration concisely while keeping efficiency in mind, R provides the apply family of functions, which process regularly structured data (vectors, matrices, arrays, lists) in a functional style.
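As a minimal illustration of what "functional style" means here, the sketch below computes the same squares twice: once with a preallocated for loop and once with sapply (one of the apply-family functions). The variable names are only for illustration, and the example makes no claim about relative speed.

```r
x <- 1:5

# Loop version: preallocate the result, then fill it element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# apply-family version: sapply calls the function once per element
# and collects the results into a vector
squares_apply <- sapply(x, function(v) v^2)

# Both give the values 1, 4, 9, 16, 25
print(squares_apply)
```

The apply version states only what happens to one element; the bookkeeping (preallocation, indexing, collecting results) is handled by sapply.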
apply
The apply function works on a matrix or an array with two or more dimensions. It takes three required arguments: the data to process (X), the dimension(s) to iterate over (MARGIN), and the function to apply (FUN). For example:

```r
data <- matrix(1:20, nrow = 5, ncol = 4)
apply(data, 1, mean)
# [1]  8.5  9.5 10.5 11.5 12.5
```
This code applies mean over the first dimension of data, that is, it averages each row. Since data is a matrix with 5 rows and 4 columns, averaging each row yields a vector of 5 elements.
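Setting MARGIN = 2 iterates over columns instead. The sketch below recreates the same data matrix and also shows what happens when FUN returns more than one value per row: apply then stores each result as a column of a new matrix. The names col_means and row_ranges are just illustrative; colMeans() is the built-in base R shortcut for column means.

```r
data <- matrix(1:20, nrow = 5, ncol = 4)

# MARGIN = 2: one mean per column (4 values)
col_means <- apply(data, 2, mean)

# For plain row/column means, base R also offers rowMeans()/colMeans()
same_as_colMeans <- isTRUE(all.equal(col_means, colMeans(data)))

# FUN returns 2 values (min and max) per row, so apply returns a
# 2 x 5 matrix: one column per row of `data`
row_ranges <- apply(data, 1, range)

print(col_means)        # values 3, 8, 13, 18
print(dim(row_ranges))  # 2 5
```

Note the orientation: with MARGIN = 1 and a multi-valued FUN, the rows of data end up as columns of the result, so a t() is often needed afterwards.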
MARGIN can also cover several dimensions at once. With MARGIN = 1:2, the function is applied to every individual element of data, so taking the square root of each element this way gives the same result as sqrt(data):

```r
> apply(data, 1:2, sqrt)
         [,1]     [,2]     [,3]     [,4]
[1,] 1.000000 2.449490 3.316625 4.000000
[2,] 1.414214 2.645751 3.464102 4.123106
[3,] 1.732051 2.828427 3.605551 4.242641
[4,] 2.000000 3.000000 3.741657 4.358899
[5,] 2.236068 3.162278 3.872983 4.472136
```
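The same idea extends to arrays with more than two dimensions. In this sketch the 2 x 3 x 4 array arr3 is made up for illustration: MARGIN = 3 reduces each 2 x 3 slice to one sum, while MARGIN = c(1, 2) keeps the first two dimensions and sums across the third.

```r
arr3 <- array(1:24, dim = c(2, 3, 4))

# One value per slice along the 3rd dimension: sum of each 2 x 3 matrix
slice_sums <- apply(arr3, 3, sum)

# Keep dimensions 1 and 2, summing over dimension 3: a 2 x 3 matrix
cell_sums <- apply(arr3, c(1, 2), sum)

# Element-wise application over all dimensions matches the vectorised call
elementwise_ok <- isTRUE(all.equal(apply(arr3, 1:3, sqrt), sqrt(arr3)))

print(slice_sums)  # values 21, 57, 93, 129
```

The dimensions listed in MARGIN are the ones that survive in the result; every other dimension is collapsed by FUN.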
lapply, sapply, vapply
apply cannot be used on one-dimensional data such as a plain vector, because a vector has no dim attribute. lapply and sapply fill this gap:

```r
> arr <- apply(data, 1, mean)
> apply(arr, 1, sqrt)
Error in apply(arr, 1, sqrt) : dim(X) must have a positive length
> sapply(arr, sqrt)
[1] 2.915476 3.082207 3.240370 3.391165 3.535534
> lapply(arr, sqrt)
[[1]]
[1] 2.915476

[[2]]
[1] 3.082207

[[3]]
[1] 3.24037

[[4]]
[1] 3.391165

[[5]]
[1] 3.535534
```
As the output shows, the main difference between the two is the return value: lapply always returns a list with one element per input element, while sapply tries to simplify the result according to what the function returns:
- every result has length 1 -> vector
- every result has the same length greater than 1 -> matrix (one column per input element)
- results have different lengths -> list
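The three cases can be checked directly. This is a small sketch with arbitrary toy functions; simplify = FALSE is a standard sapply argument that switches simplification off, in which case the result is the same list that lapply returns.

```r
# Case 1: each call returns one value -> numeric vector
r1 <- sapply(1:3, function(i) i^2)

# Case 2: each call returns two values -> 2 x 3 matrix, one column per input
r2 <- sapply(1:3, function(i) c(i, i^2))

# Case 3: results have different lengths -> no simplification, a list
r3 <- sapply(1:3, function(i) seq_len(i))

# With simplify = FALSE, sapply behaves like lapply
r4 <- sapply(1:3, function(i) i^2, simplify = FALSE)
```

Because the shape of sapply's result depends on the data, code that must always receive a vector or always a list is safer with vapply or lapply respectively.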
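vapply is similar to sapply, but you must declare the type and length of each result through its third argument, FUN.VALUE. Every result is checked against this template, so the type of the return value is predictable rather than inferred:

```r
> vapply(arr, sqrt, numeric(1))
[1] 2.915476 3.082207 3.240370 3.391165 3.535534
```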
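A sketch of what the template buys you, reusing the row means from earlier as arr. Here numeric(2) declares two numbers per call; the floor/ceiling function and the tryCatch wrapper are only illustrative. Note also that on an empty input sapply returns an empty list, whereas vapply still returns a numeric vector.

```r
arr <- c(8.5, 9.5, 10.5, 11.5, 12.5)

# FUN.VALUE = numeric(2): each call must return exactly two doubles,
# so the results are stacked into a 2 x 5 matrix
bounds <- vapply(arr, function(x) c(floor(x), ceiling(x)), numeric(2))

# A result of the wrong type is rejected with an error instead of
# silently changing the return type
type_check <- tryCatch(
  vapply(arr, function(x) as.character(x), numeric(1)),
  error = function(e) "rejected"
)

# Empty input: sapply gives list(), vapply still gives numeric(0)
empty_sapply <- sapply(numeric(0), sqrt)
empty_vapply <- vapply(numeric(0), sqrt, numeric(1))
```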
rapply
rapply (recursive apply) is designed for nested lists. Compare it with sapply:

```r
> x <- list(1, 2, c(1:5))
> sapply(x, sqrt)
[[1]]
[1] 1

[[2]]
[1] 1.414214

[[3]]
[1] 1.000000 1.414214 1.732051 2.000000 2.236068

> rapply(x, sqrt)
[1] 1.000000 1.414214 1.000000 1.414214 1.732051 2.000000 2.236068
```
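In other words, rapply checks each element as it goes: whenever an element is itself a list, it descends into it, and f is applied to the non-list elements (the leaves); by default the results are flattened into a single vector. Besides the data (object) and the function (f), rapply accepts classes, a character vector naming the classes of leaves that f should be applied to (the default "ANY" matches every leaf), and how, which takes one of three values:
- "replace": each matching leaf is replaced by f(leaf); all other elements are kept unchanged and the list structure is preserved
- "list": a new list with the same structure is built; matching leaves become f(leaf) and non-matching leaves are replaced by the deflt argument (NULL by default)
- "unlist" (the default): equivalent to the "list" mode followed by unlist(recursive = TRUE) on the result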
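To see classes and how together, here is a sketch on a genuinely nested list (the list nested and the function times_ten are made up for illustration). Only numeric leaves are transformed; the character leaf d is handled differently by each mode.

```r
nested <- list(a = c(1, 2), b = list(c = 3, d = "text"))
times_ten <- function(x) x * 10

# "unlist" (default): non-matching leaves get deflt (NULL) and disappear
# when the result is flattened
res_unlist <- rapply(nested, times_ten, classes = "numeric", how = "unlist")

# "replace": same structure, non-matching leaves kept as they were
res_replace <- rapply(nested, times_ten, classes = "numeric", how = "replace")

# "list": same structure, non-matching leaves replaced by deflt (NULL)
res_list <- rapply(nested, times_ten, classes = "numeric", how = "list")

str(res_replace)
```

"replace" is the natural choice for transforming some leaves of a configuration-like list in place, while "unlist" suits collecting numeric results into one vector.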
tapply
tapply groups the input data by one or more factors and applies a function to each group. The following example uses the built-in iris dataset:

```r
> tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica
     5.006      5.936      6.588
```

The iris dataset contains sepal and petal length and width measurements for 50 flowers from each of three iris species, and iris$Species records each flower's species. The code above groups the sepal lengths by species and computes the mean of each group.
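tapply's grouping argument (INDEX) can also be a list of several factors, in which case the result is a table with one cell per combination of groups. The sketch below uses small made-up vectors (scores, class_id, term) so the group means are easy to check by hand; character vectors are converted to factors automatically.

```r
scores   <- c(90, 75, 60, 85, 70, 95)
class_id <- c("A", "A", "B", "B", "A", "B")
term     <- c("T1", "T2", "T1", "T2", "T2", "T1")

# One factor: mean score per class (A: 90, 75, 70; B: 60, 85, 95)
by_class <- tapply(scores, class_id, mean)

# Two factors: a 2 x 2 matrix of means, rows = class, columns = term
by_class_term <- tapply(scores, list(class_id, term), mean)

print(by_class_term)
```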
mapply
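mapply applies a function element-wise to two or more vectors or lists: the first elements are passed together in one call, then the second elements, and so on, after which the results are simplified in the same way as with sapply. Conceptually it performs the following loop:

```r
# Conceptual equivalent of mapply(func, L1, L2) for inputs of length N
result <- vector("list", N)
for (i in seq_len(N)) {
  result[[i]] <- func(L1[[i]], L2[[i]])
}
```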
The following example uses the data for two iris species:

```r
L1 <- iris[iris$Species == "setosa", ]
L2 <- iris[iris$Species == "virginica", ]
max(L1$Sepal.Length, L2$Sepal.Length)
# [1] 7.9   (max() returns a single maximum over all 100 values)
```
With mapply, by contrast, the 50 flowers of each species are compared pairwise (first with first, second with second, and so on), and the larger value of each pair is returned:

```r
> mapply(max, L1$Sepal.Length, L2$Sepal.Length)
 [1] 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8
[16] 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
[31] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7
[46] 6.7 6.3 6.5 6.2 5.9
```
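A few more mapply behaviours, sketched with toy inputs: results of different lengths come back as a list, a shorter argument is recycled, and MoreArgs passes arguments that stay fixed across calls. For a pairwise maximum like the one above, base R's vectorised pmax() gives the same result without mapply.

```r
# Different result lengths -> list: rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)

# The length-1 second argument is recycled against 1:4
scaled <- mapply(function(x, y) x * y, 1:4, 2)

# MoreArgs supplies arguments that are the same for every call
joined <- mapply(function(x, y, sep) paste(x, y, sep = sep),
                 c("a", "b"), c("x", "y"),
                 MoreArgs = list(sep = "-"))

# Pairwise maximum: mapply(max, ...) matches the vectorised pmax()
a <- c(1, 5, 2)
b <- c(3, 2, 4)
same_as_pmax <- identical(mapply(max, a, b), pmax(a, b))
```

When the first argument is a character vector, mapply uses it as names for the result (USE.NAMES = TRUE by default), which is why joined carries the names "a" and "b".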