Summary of key points on reshaping data in R

Data reshaping in R is about changing the way data are organized into rows and columns. Most data processing in R takes its input as data frames. It is easy to extract data from the rows and columns of a data frame, but sometimes we need the data in a format different from the one in which we received it. R has many functions to split, merge, and convert rows to columns (and vice versa) in a data frame.

Add columns and rows to data frames

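We can use the cbind() function to join multiple vectors column-wise. Note that cbind() applied to plain vectors returns a matrix rather than a data frame, which is why the first result printed below has row labels such as [1,]. We can then use the rbind() function to append the rows of a data frame.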

# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
# Note: the numeric literal 06161 is stored as 6161, so the leading zero is lost.
zipcode <- c(33602,98104,06161,80294)

# Combine the three vectors column-wise; cbind() returns a character matrix here.
addresses <- cbind(city,state,zipcode)

# Print a header.
cat("# # # # The First data frame\n")

# Print the result.
print(addresses)

# Create another data frame with similar columns.
new.address <- data.frame(
  city = c("Lowry","Charlotte"),
  state = c("CO","FL"),
  zipcode = c("80230","33949"),
  stringsAsFactors = FALSE
)

# Print a header.
cat("# # # The Second data frame\n")

# Print the data frame.
print(new.address)

# Combine rows from both objects; the matrix is converted to a data frame first.
all.addresses <- rbind(addresses,new.address)

# Print a header.
cat("# # # The combined data frame\n")

# Print the result.
print(all.addresses)

When we execute the above code, it produces the following result:

# # # # The First data frame
     city       state zipcode
[1,] "Tampa"    "FL"  "33602"
[2,] "Seattle"  "WA"  "98104"
[3,] "Hartford" "CT"  "6161"
[4,] "Denver"   "CO"  "80294"

# # # The Second data frame
       city state zipcode
1     Lowry    CO   80230
2 Charlotte    FL   33949

# # # The combined data frame
       city state zipcode
1     Tampa    FL   33602
2   Seattle    WA   98104
3  Hartford    CT    6161
4    Denver    CO   80294
5     Lowry    CO   80230
6 Charlotte    FL   33949
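
The missing leading zero in Hartford's ZIP code comes from storing ZIP codes as numbers. The following is a minimal sketch of one way to avoid that, assuming it is acceptable to build the first table with data.frame() and to keep ZIP codes as character strings. The names addresses_df, new_address, all_addresses and the extra country column are illustrative and not part of the original example.

```r
# Keep ZIP codes as character strings so leading zeros survive.
addresses_df <- data.frame(
  city    = c("Tampa", "Seattle", "Hartford", "Denver"),
  state   = c("FL", "WA", "CT", "CO"),
  zipcode = c("33602", "98104", "06161", "80294"),
  stringsAsFactors = FALSE
)

new_address <- data.frame(
  city    = c("Lowry", "Charlotte"),
  state   = c("CO", "FL"),
  zipcode = c("80230", "33949"),
  stringsAsFactors = FALSE
)

# rbind() matches data frame columns by name, so both inputs
# need the same column names.
all_addresses <- rbind(addresses_df, new_address)

# cbind() on a data frame adds a column and still returns a data frame;
# the single value "US" (a made-up column) is recycled to every row.
all_addresses <- cbind(all_addresses, country = "US")

str(all_addresses)
```

Because both inputs to rbind() are data frames with matching column types, no conversion from a matrix is needed and the zipcode column stays character throughout.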

Merge data frames

We can use the merge() function to merge two data frames. The data frames must contain the columns on which the merge is performed; in the simplest case these key columns have the same names in both.

The following example uses the diabetes data sets on Pima Indian women that ship with the "MASS" package. We merge two of these data sets on the values of blood pressure ("bp") and body mass index ("bmi"). Records whose values for both variables match in the two data sets are combined into single rows of the resulting data frame.

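library(MASS)

# Merge the Pima test and training sets on blood pressure and BMI.
merged.Pima <- merge(x = Pima.te, y = Pima.tr,
  by.x = c("bp", "bmi"),
  by.y = c("bp", "bmi")
)
print(merged.Pima)
nrow(merged.Pima)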

When we execute the above code, it produces the following result:

   bp  bmi npreg.x glu.x skin.x ped.x age.x type.x npreg.y glu.y skin.y ped.y
1  60 33.8       1   117     23 0.466    27     No       2   125     20 0.088
2  64 29.7       2    75     24 0.370    33     No       2   100     23 0.368
3  64 31.2       5   189     33 0.583    29    Yes       3   158     13 0.295
4  64 33.2       4   117     27 0.230    24     No       1    96     27 0.289
5  66 38.1       3   115     39 0.150    28     No       1   114     36 0.289
6  68 38.5       2   100     25 0.324    26     No       7   129     49 0.439
7  70 27.4       1   116     28 0.204    21     No       0   124     20 0.254
8  70 33.1       4    91     32 0.446    22     No       9   123     44 0.374
9  70 35.4       9   124     33 0.282    34     No       6   134     23 0.542
10 72 25.6       1   157     21 0.123    24     No       4    99     17 0.294
11 72 37.7       5    95     33 0.370    27     No       6   103     32 0.324
12 74 25.9       9   134     33 0.460    81     No       8   126     38 0.162
13 74 25.9       1    95     21 0.673    36     No       8   126     38 0.162
14 78 27.6       5    88     30 0.258    37     No       6   125     31 0.565
15 78 27.6      10   122     31 0.512    45     No       6   125     31 0.565
16 78 39.4       2   112     50 0.175    24     No       4   112     40 0.236
17 88 34.5       1   117     24 0.403    40    Yes       4   127     11 0.598
   age.y type.y
1     31     No
2     21     No
3     24     No
4     21     No
5     21     No
6     43    Yes
7     36    Yes
8     40     No
9     29    Yes
10    28     No
11    55     No
12    39     No
13    39     No
14    49    Yes
15    49    Yes
16    38     No
17    28     No
[1] 17
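
To make the matching behaviour easier to see, here is a small sketch on made-up data. The data frames left and right, their columns, and the suffixes are all hypothetical and chosen only for illustration. It contrasts the default inner join with a full outer join (all = TRUE) and shows how merge() renames non-key columns that occur in both inputs.

```r
# Two small, made-up tables that share the key columns "id" and "visit".
left <- data.frame(
  id    = c(1, 2, 3),
  visit = c(1, 1, 1),
  bp    = c(60, 64, 70)
)
right <- data.frame(
  id    = c(2, 3, 4),
  visit = c(1, 1, 1),
  bp    = c(66, 72, 78)
)

# Inner join (the default): only key combinations present in both tables
# are kept. The non-key column "bp" occurs in both inputs, so merge()
# appends the given suffixes to tell the two apart.
inner <- merge(left, right, by = c("id", "visit"),
               suffixes = c(".left", ".right"))

# Full outer join: all = TRUE keeps unmatched rows and fills gaps with NA.
outer <- merge(left, right, by = c("id", "visit"), all = TRUE,
               suffixes = c(".left", ".right"))

print(inner)
print(outer)
```

When the key columns have different names in the two tables, by.x and by.y can be used in place of by, as in the Pima example above.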

Sometimes spreadsheet data come in a compact format that gives the covariates for each subject, followed by all the observations on that subject. R's modeling functions need the observations in a single column. Consider the following sample of data from repeated MRI brain measurements:

 Status   Age    V1    V2    V3    V4
      P 23646 45190 50333 55166 56271
     CC 26174 35535 38227 37911 41184
     CC 27723 25691 25712 26144 26398
     CC 27193 30949 29693 29754 30772
     CC 24370 50542 51966 54341 54273
     CC 28359 58591 58803 59435 61292
     CC 25136 45801 45389 47197 47126

There are two covariates and up to four measurements on each subject. The data were exported from Excel to a file.

We can use stack() to help rearrange these data so that all the measurements form a single response column.

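# Read the exported file; the file name "mr.csv" is assumed here, so substitute your own path.
zz <- read.csv("mr.csv", strip.white = TRUE)
# Repeat the two covariate columns once per measurement and bind them to the stacked values.
zzz <- cbind(zz[gl(nrow(zz), 1, 4*nrow(zz)), 1:2], stack(zz[, 3:6]))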

The result is:

    Status   Age values ind
X1       P 23646  45190  V1
X2      CC 26174  35535  V1
X3      CC 27723  25691  V1
X4      CC 27193  30949  V1
X5      CC 24370  50542  V1
X6      CC 28359  58591  V1
X7      CC 25136  45801  V1
X11      P 23646  50333  V2
...

The function unstack() goes in the opposite direction and may be useful for exporting data.
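
The round trip between the two functions can be sketched without an input file. The example below is only an illustration: it types in the first three subjects of the MRI sample directly, and it uses rep() as a plainer substitute for the gl() index to repeat the covariate rows. The names long and wide are illustrative.

```r
# First three subjects of the MRI sample, typed in directly
# instead of being read from a file.
zz <- data.frame(
  Status = c("P", "CC", "CC"),
  Age    = c(23646, 26174, 27723),
  V1     = c(45190, 35535, 25691),
  V2     = c(50333, 38227, 25712),
  V3     = c(55166, 37911, 26144),
  V4     = c(56271, 41184, 26398)
)

# stack() concatenates the selected columns into "values" and records
# the source column of each value in the factor "ind".
long <- stack(zz[, c("V1", "V2", "V3", "V4")])

# Repeat the covariate rows once per measurement column so that they
# line up with "values".
zzz <- cbind(zz[rep(seq_len(nrow(zz)), times = 4), c("Status", "Age")], long)

# unstack() splits "values" by "ind" and restores one column per measurement.
wide <- unstack(long)

print(zzz)
print(wide)
```

Note that unstack() only recovers the measurement columns; the covariates have to be carried along separately, as the cbind() step does for the long form.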

Another way to do this is to use the function reshape():

> reshape(zz, idvar="id",timevar="var",
  varying=list(c("V1","V2","V3","V4")),direction="long")
    Status   Age var    V1 id
1.1      P 23646   1 45190  1
2.1     CC 26174   1 35535  2
3.1     CC 27723   1 25691  3
4.1     CC 27193   1 30949  4
5.1     CC 24370   1 50542  5
6.1     CC 28359   1 58591  6
7.1     CC 25136   1 45801  7
1.2      P 23646   2 50333  1
2.2     CC 26174   2 38227  2
...

The syntax of reshape() is more complicated than that of stack(), but it can also handle data whose "long" form has more than one column of values. With direction = "wide", reshape() can also perform the opposite transformation.
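
As a rough illustration of both directions, the sketch below converts a small inline copy of the MRI sample to long form and back. The v.names value "value" and the object names long and wide are choices made for this example only, not part of the original.

```r
# First three subjects of the MRI sample, typed in directly.
zz <- data.frame(
  Status = c("P", "CC", "CC"),
  Age    = c(23646, 26174, 27723),
  V1     = c(45190, 35535, 25691),
  V2     = c(50333, 38227, 25712),
  V3     = c(55166, 37911, 26144),
  V4     = c(56271, 41184, 26398)
)

# Wide to long: V1..V4 are collapsed into one column named "value".
# "var" records which measurement a row holds (1 to 4), and "id" is
# created automatically because zz has no such column.
long <- reshape(zz, idvar = "id", timevar = "var",
                varying = list(c("V1", "V2", "V3", "V4")),
                v.names = "value", direction = "long")

# Long to wide: one row per id again, with one "value.<var>" column
# for each level of the time variable.
wide <- reshape(long, idvar = "id", timevar = "var",
                v.names = "value", direction = "wide")

print(long)
print(wide)
```

Naming the value column through v.names keeps the long form from reusing the name of the first measurement column (V1), which is what happens in the call shown earlier.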
