SoFunction
Updated on 2025-04-10

R language sharing of skills for data set ranks and queues interchange

The data processing technique I will introduce to you now is length to width, which is equivalent to transposition in Excel. However, the length to width and data merging implemented in R language are naturally much more powerful than Excel.

Here are 4 functions, among which melt() and dcast() come from the reshape2 package, gather() and spread() come from the tidyr package

1. Width to length—melt(), gather()

mydata<-(
 name=c("store1","store2","store3","store4"),
 address=c("Putuo District","Huangpu District","Xuhui District","Pudong New Area"),
 sale2014=c(3000,2500,2100,1000),
 sale2015=c(3020,2800,3900,2000),
 sale2016=c(5150,3600,2700,2500),
 sale2017=c(4450,4100,4000,3200)
)
#Wide to length—meltmydata1<-melt(
 mydata,
 =c("address","name"),#The main field to be retained  = "Year",#Converted classification field name (dimension)  = "Sale" #Converted metric value name)

Output result

> mydata1<-melt(
+ mydata,
+ =c("address","name"),#The main field to be retained+  = "Year",#Converted classification field name (dimension)+  = "Sale" #Converted metric value name+ )
> mydata1
 address name  Year Sale
1 Putuo District store1 sale2014 3000
2 Huangpu District store2 sale2014 2500
3 Xuhui District store3 sale2014 2100
4 Pudong New Area store4 sale2014 1000
5 Putuo District store1 sale2015 3020
6 Huangpu District store2 sale2015 2800
7 Xuhui District store3 sale2015 3900
8 Pudong New Area store4 sale2015 2000
9 Putuo District store1 sale2016 5150
10 Huangpu District store2 sale2016 3600
11 Xuhui District store3 sale2016 2700
12 Pudong New Area store4 sale2016 2500
13 Putuo District store1 sale2017 4450
14 Huangpu District store2 sale2017 4100
15 Xuhui District store3 sale2017 4000
16 Pudong New Area store4 sale2017 3200

Let's see how to use the gather() function

> #Wide to length—gather> mydata1<-tidyr::gather(
+ data=mydata,
+ key="Year",
+ value="sale",
+ sale2014:sale2017
+ )
> mydata1
  name address  Year sale
1 store1 Putuo District sale2014 3000
2 store2 Huangpu District sale2014 2500
3 store3 Xuhui District sale2014 2100
4 store4 Pudong New Area sale2014 1000
5 store1 Putuo District sale2015 3020
6 store2 Huangpu District sale2015 2800
7 store3 Xuhui District sale2015 3900
8 store4 Pudong New Area sale2015 2000
9 store1 Putuo District sale2016 5150
10 store2 Huangpu District sale2016 3600
11 store3 Xuhui District sale2016 2700
12 store4 Pudong New Area sale2016 2500
13 store1 Putuo District sale2017 4450
14 store2 Huangpu District sale2017 4100
15 store3 Xuhui District sale2017 4000
16 store4 Pudong New Area sale2017 3200

Unlike melt() function, the gather() function needs to specify the key field key and the corresponding value of the key field, but the gather() function is easier to understand.

2. Length to width—dcast() and spread()

Let’s use the data1 dataset above, let’s take a look at the dcast() function first

#Length to width—dcastdcast(
 data=mydata1,
 name+address~Year
 #The left side is the field to be retained, the right side is the categorical variable to be divided, the number of columns is equal to the expression #Number of categories of categorical variables on the right)
> #Length to width—dcast> dcast(
+ data=mydata1,
+ name+address~Year
+ #The left side is the field to be retained, the right side is the categorical variable to be divided, the number of columns is equal to the expression+ #Number of categories of categorical variables on the right+ )
Using sale as value column: use  to override.
 name address sale2014 sale2015 sale2016 sale2017
1 store1 Putuo District  3000  3020  5150  4450
2 store2 Huangpu District  2500  2800  3600  4100
3 store3 Xuhui District  2100  3900  2700  4000
4 store4 Pudong New Area  1000  2000  2500  3200

The rules of using dcast() function need to be pondered before you can understand it. Please take a look at the comments section and then take a look at spread()

#Length to width—spreadtidyr::spread(
 data=mydata1,
 key=Year,
 value=sale
)
> #Length to width—spread> tidyr::spread(
+ data=mydata1,
+ key=Year,
+ value=sale
+ )
 name address sale2014 sale2015 sale2016 sale2017
1 store1 Putuo District  3000  3020  5150  4450
2 store2 Huangpu District  2500  2800  3600  4100
3 store3 Xuhui District  2100  3900  2700  4000
4 store4 Pudong New Area  1000  2000  2500  3200

To call tidyr::spread directly, you need to specify the key field key and the corresponding value.

But from an understanding point of view, I personally prefer the functions of the tidyr package. The use is very clear. You can choose according to the actual situation. Okay, today's sharing ends, see you next time!

Supplement: Use R language to convert data ranks and queues

Software used: R language, mysql

System of use: ubuntu16.04

Comparison before and after use:

Before use

x1 x2 y
a A 1
b B 2
c C 3
a A 4

After use (the values ​​inside can undergo multiple transformations, mainly changing x1 and x2 to their crosstab)

A B C
a 5(1+4) NA NA
b NA 2 NA
c NA NA 3

Language and mysql database connection

Download RMysql (because I'm using ubuntu, so I use RMysql):

("RMySQL")

Load the required program package:

library(RMySQL)

Import mysql data into R:

con<-dbConnect(MySQL(),dbname="database_name",user="user_name",password="password")

After importing, the resulting data type is Dataframe()

For the RMySQL package, there are several more important statements:

summary(con) : used to view the mysql information corresponding to con

dbListTables(con) : Used to view all tables in the con

dbRemoveTable(con,”table_name”)

dbReadTable(con,”table_name”)

dbWriteTable(con,"table_name",value) where the value type needs to be a data frame

2. Implement data transformation in R

We use the above statement to read the table that needs to be transformed

data=dbReadTable(con,"table_name")

Then use an important function: taply()

Note that I am using the table I used in the example I gave below

data2=tapply(data\$y,list(data\$x1,data$x2),sum)

For taply(x,list(a,b),fac)

a and b are the row list headers that form the crosstab

And x is the value, of course we can not use the value in data as x

fac is a function operation on x. For example, using sum, which is to add the same y of x1 and x2

3. Import the data back to mysql

We have transformed the data into the desired table form above. However, after using taply, the data2 class will become a matrix. If you want to know the data type, you can view it using class() in R. The RMySQL package we use requires a data frame to write data in r to mysql, so we need to convert the matrix into a data frame. So how to do it? It's very simple.

data3=(data2)

In this way, our data3 is a data frame, using statements

dbWriteTable(con,test,data3)

This writes a table in mysql, called test, and the data value is data3.

The above is personal experience. I hope you can give you a reference and I hope you can support me more. If there are any mistakes or no complete considerations, I would like to give you advice.