SoFunction
Updated on 2025-03-01

Recoding and renaming variables in R

1. Recoding variables

Recoding means creating new values of a variable from the existing values of the same variable and/or other variables, for example by reassigning the values that meet a certain condition. Here are two common methods:

# The first method
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"),
                  age = c(23,45,34,1000))
per
per$age[per$age == 1000] <- NA                          # set the impossible age 1000 to missing
per$age1[per$age < 30] <- "young"                       # generate a new variable
per$age1[per$age >= 30 & per$age < 50] <- "middle age"
per
# The second method
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"),
                  age = c(23,45,34,1000))
per <- within(per,{
   age1 <- NA
   age1[age < 30] <- "young"
   age1[age>=30 & age<50] <- "middle age"
})
per
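Base R's cut() is another common way to recode a numeric variable into ranges. It is not one of the two methods above, so treat the following as a sketch. It assumes ages are non-negative (the lowest break is 0) and uses right = FALSE so that the bins match the conditions age < 30 and 30 <= age < 50.

```r
# Sketch only: base R's cut() as an alternative to indexed assignment.
per <- data.frame(name = c("Zhang San", "Li Si", "Wang Wu", "Zhao Liu"),
                  age = c(23, 45, 34, 1000))
per$age[per$age == 1000] <- NA   # same missing-value rule as the first method

# right = FALSE makes the intervals [0, 30) and [30, 50), matching
# age < 30 and 30 <= age < 50. Values outside every interval, and NA
# values, become NA.
per$age1 <- as.character(cut(per$age,
                             breaks = c(0, 30, 50),
                             labels = c("young", "middle age"),
                             right = FALSE))
per
```

cut() returns a factor; as.character() is applied here only so that age1 has the same type as in the first method.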

2. Renaming variables

If you are not satisfied with the name of an existing variable, you can rename it in any of the following ways:

Manual entry: call fix() to open the data editor and type the new variable name in it.

The names() function: the syntax is names(x) <- value. Index into names(x) to specify which variable to rename, e.g. names(x)[2] <- "newname".

The rename() function in the plyr package: the syntax is rename(x, replace, warn_missing = TRUE, warn_duplicated = TRUE), where replace is a named character vector of the form c(oldname = "newname") that specifies which variables to rename.

per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"),
                  age = c(23,45,34,1000))
per
# The first method: manual input
fix(per)  # opens the data editor; type the new name there
# The second method: the names() function
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"),
                  age = c(23,45,34,1000))
names(per)[2] <- "Age"  # rename the second variable
per
# The third method: rename() from the plyr package
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"),
                  age = c(23,45,34,1000))
library(plyr)
per <- rename(per, c(age = "Age"))  # old name = "new name"
per
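A small variation on the names() method, sketched in plain base R: instead of hard-coding the position 2, select the column by its current name. The new name "Age" is only an illustrative choice.

```r
per <- data.frame(name = c("Zhang San", "Li Si", "Wang Wu", "Zhao Liu"),
                  age = c(23, 45, 34, 1000))

# Select the column to rename by its current name rather than its position,
# so the code keeps working if the columns are reordered.
names(per)[names(per) == "age"] <- "Age"
names(per)
```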

Supplement: Processing R variables (creating new variables and reassigning values)

Create a new variable:

Method 1:

# Create a new variable sum in the mydata data frame: the sum of x1 and x2
mydata$sum <- mydata$x1 + mydata$x2
# Create a new variable mean in the mydata data frame: the average of x1 and x2
mydata$mean <- (mydata$x1 + mydata$x2)/2
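Since the article never defines mydata, the sketch below builds a small hypothetical data frame with columns x1 and x2 so the two commands can be run as-is. It also checks the result against base R's rowMeans(), an alternative not used in the original that is convenient when more than two columns are averaged.

```r
# Hypothetical data: the article does not define mydata.
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

mydata$sum  <- mydata$x1 + mydata$x2          # row-wise sum
mydata$mean <- (mydata$x1 + mydata$x2) / 2    # row-wise average

# rowMeans() computes the same average and extends to any number of columns.
stopifnot(isTRUE(all.equal(mydata$mean,
                           unname(rowMeans(mydata[, c("x1", "x2")])))))
mydata
```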

For example, with the built-in women data set:

> newwomen=women
> newwomen$bmi=women$weight/women$height^2;
> newwomen
   height weight        bmi
1      58    115 0.03418549
2      59    117 0.03361103
3      60    120 0.03333333
4      61    123 0.03305563
5      62    126 0.03277836
6      63    129 0.03250189
7      64    132 0.03222656
8      65    135 0.03195266
9      66    139 0.03191001
10     67    142 0.03163288
11     68    146 0.03157439
12     69    150 0.03150599
13     70    154 0.03142857
14     71    159 0.03154136
15     72    164 0.03163580
> 

Method 2:

attach(mydata)              # attach the data frame mydata
mydata$sum <- x1 + x2       # create a new variable sum in mydata
mydata$mean <- (x1 + x2)/2  # create a new variable mean in mydata
detach(mydata)              # after every attach(), call detach() to remove the data from the search path

Here x1 and x2 need no mydata$ prefix, because mydata has been attached with attach(): R already knows to look up x1 and x2 in the attached data. The new variables, however, must still be written as mydata$sum and mydata$mean. Without the mydata$ prefix, R would still compute sum, but it would store it as a separate vector in the workspace instead of adding it as a column of mydata.
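The pitfall described above can be checked directly. This sketch uses a hypothetical mydata and a hypothetical variable name, total. It also shows with(), a base R function that reads columns by name like attach() but does not change the search path, so there is nothing to detach.

```r
# Hypothetical data: the article does not define mydata.
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

# Pitfall: after attach(), an assignment without mydata$ creates a separate
# vector in the global environment; mydata itself is unchanged.
attach(mydata)
total <- x1 + x2
detach(mydata)
print("total" %in% names(mydata))   # FALSE

# with() reads mydata's columns like attach() does, but leaves the search
# path alone, so no detach() is needed.
mydata$total <- with(mydata, x1 + x2)
print("total" %in% names(mydata))   # TRUE
```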

Method 3:

mydata <- transform(mydata, sum = x1 + x2, mean = (x1 + x2)/2)
# transform() creates several new variables with a single command.
> newwomen=transform(women,bmi=weight/height^2)
> newwomen
   height weight        bmi
1      58    115 0.03418549
2      59    117 0.03361103
3      60    120 0.03333333
4      61    123 0.03305563
5      62    126 0.03277836
6      63    129 0.03250189
7      64    132 0.03222656
8      65    135 0.03195266
9      66    139 0.03191001
10     67    142 0.03163288
11     68    146 0.03157439
12     69    150 0.03150599
13     70    154 0.03142857
14     71    159 0.03154136
15     72    164 0.03163580
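To confirm that transform() gives the same columns as Method 1, the sketch below runs both on a small hypothetical mydata. One design point worth knowing: each expression in transform() is evaluated against the original columns, so mean cannot be written as sum / 2 within the same call.

```r
# Hypothetical data: the article does not define mydata.
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

# Method 1: one $ assignment per new variable.
m1 <- mydata
m1$sum  <- m1$x1 + m1$x2
m1$mean <- (m1$x1 + m1$x2) / 2

# Method 3: transform() adds both columns in one call. Each expression sees
# only the original columns, so mean is written out in full.
m3 <- transform(mydata, sum = x1 + x2, mean = (x1 + x2) / 2)

stopifnot(identical(m1$sum, m3$sum), identical(m1$mean, m3$mean))
names(m3)
```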

Variable reassignment

Method 1:

mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")  # create a two-group age variable

This command uses the ifelse() function, which is similar to if...else in other languages but is vectorized: it tests every element of mydata$age. The left-hand side tells R to create a new variable agecat (age group) in the mydata data frame. Where age > 70, agecat is set to "older"; otherwise (age <= 70) it is set to "younger". A missing age produces a missing agecat. For more information and examples, see help(ifelse).
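A runnable version of the command, assuming a hypothetical mydata with a single age column. It exercises two edge cases: an age of exactly 70 is not greater than 70, so it falls into "younger", and an NA age gives an NA agecat.

```r
# Hypothetical ages; NA stands for an unknown age.
mydata <- data.frame(age = c(82, 45, NA, 70))

# ifelse() tests each element: TRUE -> "older", FALSE -> "younger", NA -> NA.
mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")
mydata$agecat
```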

Method 2:

attach(mydata)
mydata$agecat[age > 75] <- "Elder"
mydata$agecat[age > 45 & age <= 75] <- "Middle Aged"
mydata$agecat[age <= 45] <- "Young"
detach(mydata)
# Create the variable agecat and assign its values directly according to age.
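The same recode can also be written without attach(), by indexing with mydata$age directly, which removes the risk of forgetting detach(). The sketch assumes a hypothetical mydata whose ages sit on and around the boundaries 45 and 75; agecat is first initialized to NA so the column exists for every row before the conditional assignments.

```r
# Hypothetical ages covering each boundary.
mydata <- data.frame(age = c(30, 45, 60, 75, 80))

# Same three-way recode as above, without attach()/detach().
mydata$agecat <- NA_character_                      # start every row as missing
mydata$agecat[mydata$age > 75] <- "Elder"
mydata$agecat[mydata$age > 45 & mydata$age <= 75] <- "Middle Aged"
mydata$agecat[mydata$age <= 45] <- "Young"
mydata
```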

Rename

Method 1

fix(mydata)  # rename directly in the editor; changes are saved when it is closed

Method 2

library(reshape)
mydata <- rename(mydata, c(oldname="newname"))
# Use the rename() function in the reshape package to rename directly.
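For readers without the reshape package, the c(oldname = "newname") convention can be reproduced in base R. rename_cols() below is a hypothetical helper written for this illustration, not a function from reshape, plyr, or any other package; it warns about old names that are not found instead of failing.

```r
# Hypothetical helper, not part of any package: renames columns given a
# named character vector c(oldname = "newname"), in base R only.
rename_cols <- function(df, map) {
  idx <- match(names(map), names(df))        # position of each old name
  if (anyNA(idx)) {
    warning("columns not found: ",
            paste(names(map)[is.na(idx)], collapse = ", "))
  }
  found <- !is.na(idx)
  names(df)[idx[found]] <- unname(map[found])  # apply only the matches
  df
}

mydata <- data.frame(oldname = 1:3, other = 4:6)   # hypothetical data
mydata <- rename_cols(mydata, c(oldname = "newname"))
names(mydata)
```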

The above is based on personal experience; I hope it serves as a useful reference. If there are any mistakes or omissions, corrections are welcome.