Data reshaping in R means changing how data is organized into rows and columns. Most data processing in R takes its input as data frames. Extracting data from the rows and columns of a data frame is easy, but sometimes we need the data in a different layout from the one we received. R provides functions for splitting and merging data frames and for turning rows into columns and vice versa.
Add columns and rows to data frames
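We can use the cbind() function to join several vectors column by column. Note that on plain vectors cbind() returns a matrix rather than a data frame, which is why the first printout in the result shows quoted strings. We can then use the rbind() function to append the rows of one data frame to another.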
    # Create vector objects.
    city <- c("Tampa", "Seattle", "Hartford", "Denver")
    state <- c("FL", "WA", "CT", "CO")
    # Numeric zip codes lose their leading zero: 06161 is stored as 6161.
    zipcode <- c(33602, 98104, 06161, 80294)

    # Combine the three vectors column-wise (the result is a character matrix).
    addresses <- cbind(city, state, zipcode)

    # Print a header.
    cat("# # # # The First data frame\n")

    # Print the combined object.
    print(addresses)

    # Create another data frame with similar columns.
    new.address <- data.frame(
      city = c("Lowry", "Charlotte"),
      state = c("CO", "FL"),
      zipcode = c("80230", "33949"),
      stringsAsFactors = FALSE
    )

    # Print a header.
    cat("# # # The Second data frame\n")

    # Print the data frame.
    print(new.address)

    # Combine the rows from both objects.
    all.addresses <- rbind(addresses, new.address)

    # Print a header.
    cat("# # # The combined data frame\n")

    # Print the result.
    print(all.addresses)
When we execute the above code, it produces the following result:
    # # # # The First data frame
         city       state zipcode
    [1,] "Tampa"    "FL"  "33602"
    [2,] "Seattle"  "WA"  "98104"
    [3,] "Hartford" "CT"  "6161"
    [4,] "Denver"   "CO"  "80294"
    # # # The Second data frame
           city state zipcode
    1     Lowry    CO   80230
    2 Charlotte    FL   33949
    # # # The combined data frame
           city state zipcode
    1     Tampa    FL   33602
    2   Seattle    WA   98104
    3  Hartford    CT    6161
    4    Denver    CO   80294
    5     Lowry    CO   80230
    6 Charlotte    FL   33949
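The printout above shows two side effects: cbind() on vectors yields a character matrix, and the numeric zip code 06161 loses its leading zero. The following is a minimal sketch of one way to avoid both, assuming it is acceptable to store zip codes as strings. The names as_matrix, new_address and all_addresses are illustrative only.

```r
city    <- c("Tampa", "Seattle", "Hartford", "Denver")
state   <- c("FL", "WA", "CT", "CO")
# Assumption: zip codes are kept as strings so "06161" keeps its leading zero.
zipcode <- c("33602", "98104", "06161", "80294")

# cbind() on plain vectors returns a character matrix, not a data frame.
as_matrix <- cbind(city, state, zipcode)
print(is.matrix(as_matrix))
print(is.data.frame(as_matrix))

# data.frame() builds a data frame with one column per vector.
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)

new_address <- data.frame(
  city = c("Lowry", "Charlotte"),
  state = c("CO", "FL"),
  zipcode = c("80230", "33949"),
  stringsAsFactors = FALSE
)

# With two data frames, rbind() matches columns by name and appends the rows.
all_addresses <- rbind(addresses, new_address)
print(all_addresses)
```

Building both inputs with data.frame() keeps the column types explicit, so the combined result does not depend on how rbind() converts a matrix.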
Merge data frames
We can use the merge() function to join two data frames. Both data frames must contain the columns being merged on; by default, merge() uses the columns whose names the two data frames share.
The following example uses the diabetes data sets for Pima Indian women from the "MASS" library. We merge two of these data sets on blood pressure ("bp") and body mass index ("bmi"). With these two columns chosen as keys, records whose values match on both variables in the two data sets are combined into rows of a single data frame.
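    library(MASS)

    # Join the two Pima data sets on blood pressure and body mass index.
    merged.Pima <- merge(x = Pima.te, y = Pima.tr,
      by.x = c("bp", "bmi"),
      by.y = c("bp", "bmi")
    )
    print(merged.Pima)
    nrow(merged.Pima)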
When we execute the above code, it produces the following result:
       bp  bmi npreg.x glu.x skin.x ped.x age.x type.x npreg.y glu.y skin.y ped.y
    1  60 33.8       1   117     23 0.466    27     No       2   125     20 0.088
    2  64 29.7       2    75     24 0.370    33     No       2   100     23 0.368
    3  64 31.2       5   189     33 0.583    29    Yes       3   158     13 0.295
    4  64 33.2       4   117     27 0.230    24     No       1    96     27 0.289
    5  66 38.1       3   115     39 0.150    28     No       1   114     36 0.289
    6  68 38.5       2   100     25 0.324    26     No       7   129     49 0.439
    7  70 27.4       1   116     28 0.204    21     No       0   124     20 0.254
    8  70 33.1       4    91     32 0.446    22     No       9   123     44 0.374
    9  70 35.4       9   124     33 0.282    34     No       6   134     23 0.542
    10 72 25.6       1   157     21 0.123    24     No       4    99     17 0.294
    11 72 37.7       5    95     33 0.370    27     No       6   103     32 0.324
    12 74 25.9       9   134     33 0.460    81     No       8   126     38 0.162
    13 74 25.9       1    95     21 0.673    36     No       8   126     38 0.162
    14 78 27.6       5    88     30 0.258    37     No       6   125     31 0.565
    15 78 27.6      10   122     31 0.512    45     No       6   125     31 0.565
    16 78 39.4       2   112     50 0.175    24     No       4   112     40 0.236
    17 88 34.5       1   117     24 0.403    40    Yes       4   127     11 0.598
       age.y type.y
    1     31     No
    2     21     No
    3     24     No
    4     21     No
    5     21     No
    6     43    Yes
    7     36    Yes
    8     40     No
    9     29    Yes
    10    28     No
    11    55     No
    12    39     No
    13    39     No
    14    49    Yes
    15    49    Yes
    16    38     No
    17    28     No
    [1] 17
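To see the matching rule on a smaller scale, here is a minimal sketch using two hypothetical toy tables, left and right, that are not part of MASS. It assumes only base R and contrasts the default inner join with a left join requested through the all.x argument of merge().

```r
# Hypothetical toy data: both tables share the key columns "bp" and "bmi".
left <- data.frame(bp = c(60, 64, 70), bmi = c(33.8, 29.7, 27.4),
                   age = c(27, 33, 21))
right <- data.frame(bp = c(60, 70, 88), bmi = c(33.8, 27.4, 34.5),
                    age = c(31, 21, 28))

# Default (inner join): only key pairs present in both tables are kept.
# The non-key column "age" appears twice, suffixed with .x and .y.
inner <- merge(left, right, by = c("bp", "bmi"))
print(inner)

# all.x = TRUE keeps every row of `left`; rows without a match in `right`
# get NA in the columns that come from `right`.
left_join <- merge(left, right, by = c("bp", "bmi"), all.x = TRUE)
print(left_join)
```

The .x and .y suffixes in the Pima result come from the same mechanism: any non-key column name that occurs in both inputs is made unique this way.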
Sometimes spreadsheet data comes in a compact format that gives the covariates for each subject, followed by all the observations on that subject. R's modeling functions need the observations in a single column. Consider the following sample of data from repeated MRI brain measurements:
    Status   Age    V1    V2    V3    V4
         P 23646 45190 50333 55166 56271
        CC 26174 35535 38227 37911 41184
        CC 27723 25691 25712 26144 26398
        CC 27193 30949 29693 29754 30772
        CC 24370 50542 51966 54341 54273
        CC 28359 58591 58803 59435 61292
        CC 25136 45801 45389 47197 47126
There are two covariates and up to four measurements on each subject. The data was exported from Excel to a file.
We can use the stack() function to reshape this data so that all measurements form a single response column.
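    # "mr.csv" is a placeholder name for the file exported from Excel.
    zz <- read.csv("mr.csv", strip.white = TRUE)
    # Repeat the two covariate columns once per measurement and attach the stacked values.
    zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))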
The result is:
        Status   Age values ind
    X1       P 23646  45190  V1
    X2      CC 26174  35535  V1
    X3      CC 27723  25691  V1
    X4      CC 27193  30949  V1
    X5      CC 24370  50542  V1
    X6      CC 28359  58591  V1
    X7      CC 25136  45801  V1
    X11      P 23646  50333  V2
    ...
The unstack() function works in the opposite direction and may be useful for exporting data.
Another way to do this is to use the reshape() function:
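The round trip between the two functions can be sketched without the exported file. The example below types the first three subjects of the sample table directly into a data frame, which is an assumption made only to keep it self-contained, and uses the names long and wide for illustration.

```r
# First three subjects from the sample table, entered by hand.
zz <- data.frame(
  Status = c("P", "CC", "CC"),
  Age = c(23646, 26174, 27723),
  V1 = c(45190, 35535, 25691),
  V2 = c(50333, 38227, 25712),
  V3 = c(55166, 37911, 26144),
  V4 = c(56271, 41184, 26398)
)

measurements <- c("V1", "V2", "V3", "V4")

# stack() puts all measurements into one column named `values` and records
# the source column of each value in the factor `ind`.
long <- stack(zz[, measurements])
print(long)

# unstack() splits `values` by `ind` and rebuilds one column per level.
wide <- unstack(long, values ~ ind)
print(wide)
```

Because every subject here has the same number of measurements, unstack() can return a data frame again; with unequal group sizes it returns a list instead.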
    > reshape(zz, idvar="id", timevar="var",
        varying=list(c("V1","V2","V3","V4")), direction="long")
        Status   Age var    V1 id
    1.1      P 23646   1 45190  1
    2.1     CC 26174   1 35535  2
    3.1     CC 27723   1 25691  3
    4.1     CC 27193   1 30949  4
    5.1     CC 24370   1 50542  5
    6.1     CC 28359   1 58591  6
    7.1     CC 25136   1 45801  7
    1.2      P 23646   2 50333  1
    2.2     CC 26174   2 38227  2
    ...
The reshape() function has a more complicated syntax than stack(), but it can handle data whose "long" form has more than one measurement column. With direction = "wide", reshape() can also perform the opposite transformation.
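As a rough illustration of both directions, the sketch below converts a hand-typed copy of the first three subjects to long form and back. The v.names value "value" and the variable names long and wide are choices made for this example, not part of the original call.

```r
# First three subjects from the sample table, entered by hand.
zz <- data.frame(
  Status = c("P", "CC", "CC"),
  Age = c(23646, 26174, 27723),
  V1 = c(45190, 35535, 25691),
  V2 = c(50333, 38227, 25712),
  V3 = c(55166, 37911, 26144),
  V4 = c(56271, 41184, 26398)
)

# Wide to long: the four measurement columns become one column, "value".
# "id" does not exist in zz, so reshape() creates it as a row counter.
long <- reshape(zz, idvar = "id", timevar = "var",
                varying = list(c("V1", "V2", "V3", "V4")),
                v.names = "value", direction = "long")
print(long)

# Long to wide: one row per id, one "value.<var>" column per time point.
wide <- reshape(long, idvar = "id", timevar = "var",
                v.names = "value", direction = "wide")
print(wide)
```

Naming the measurement column through v.names makes the wide column names predictable (value.1 to value.4), at the cost of not restoring the original V1 to V4 labels.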