1. Recoding variables
Recoding means creating new values from the existing values of the same variable and/or other variables, for example reassigning the values that meet a certain condition. Two common methods are:
# The first method: indexed assignment with $
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"), age = c(23,45,34,1000))
per
per$age[per$age == 1000] <- NA                           # set the impossible age to missing
per$age1 <- NA                                           # create the new variable first
per$age1[per$age < 30] <- "young"                        # then fill in values by condition
per$age1[per$age >= 30 & per$age < 50] <- "middle age"
per

# The second method: within()
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"), age = c(23,45,34,1000))
per <- within(per, {
  age1 <- NA
  age1[age < 30] <- "young"
  age1[age >= 30 & age < 50] <- "middle age"
})
per
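The following sketch checks that the two methods above are interchangeable. It builds the per data frame once, applies both recodes, and compares the resulting age1 columns. The copy names per1 and per2 are hypothetical and used only for the comparison.

```r
# Build the example data once; age 1000 is treated as a data-entry error
per <- data.frame(name = c("Zhang San", "Li Si", "Wang Wu", "Zhao Liu"),
                  age  = c(23, 45, 34, 1000))
per$age[per$age == 1000] <- NA

# Method 1: indexed assignment with $ on a copy (per1 is a hypothetical name)
per1 <- per
per1$age1 <- NA                                   # initialize the new column
per1$age1[per1$age < 30] <- "young"               # NA ages are skipped
per1$age1[per1$age >= 30 & per1$age < 50] <- "middle age"

# Method 2: the same logic inside within() (per2 is a hypothetical name)
per2 <- within(per, {
  age1 <- NA
  age1[age < 30] <- "young"
  age1[age >= 30 & age < 50] <- "middle age"
})

identical(per1$age1, per2$age1)  # TRUE: both methods give the same column
```

Rows that match no condition, including the missing age, keep the initial NA in both versions.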
2. Rename variables
If a variable already exists but you are not satisfied with its name, you can rename it. Three methods are available:
Manual entry: the fix() function opens an editor window in which you type the new name by hand.
The names() function: the format is names(x) <- value. Specify which variable name to change by indexing, e.g. names(x)[2].
The rename() function in the plyr package: the format is rename(x, replace, warn_missing = TRUE, warn_duplicated = TRUE). Here replace is a named character vector whose names are the old variable names and whose values are the new ones.
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"), age = c(23,45,34,1000))
per

# The first method: manual input
fix(per)                             # opens the data editor; edit the column name by hand

# The second method: the names() function
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"), age = c(23,45,34,1000))
names(per)[2] <- "Age"               # rename the second variable, age, to Age
per

# The third method: rename() from the plyr package
per <- data.frame(name = c("Zhang San","Li Si","Wang Wu","Zhao Liu"), age = c(23,45,34,1000))
library(plyr)
per <- rename(per, c(age = "Age"))   # old name = "new name"
per
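Indexing by position, as in names(per)[2], renames the wrong column if the column order ever changes. The sketch below matches on the old name instead and uses only base R. The copies per_a and per_b and the new names are illustrative.

```r
per <- data.frame(name = c("Zhang San", "Li Si", "Wang Wu", "Zhao Liu"),
                  age  = c(23, 45, 34, 1000))

# Rename one column by matching its current name instead of its position
per_a <- per
names(per_a)[names(per_a) == "age"] <- "Age"

# Rename several columns at once; match() looks up each old name's position
per_b <- per
old_names <- c("name", "age")
new_names <- c("Name", "Age")
names(per_b)[match(old_names, names(per_b))] <- new_names

names(per_a)
names(per_b)
```

If an old name is misspelled, the single-name form silently renames nothing. The match() form stops with an error instead, because match() returns NA for that name.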
Supplement: processing variables in R (creating new variables and reassigning values)
Create a new variable:
Method 1:
# Create a new variable sum in the mydata data frame: the sum of x1 and x2
mydata$sum <- mydata$x1 + mydata$x2
# Create a new variable mean in the mydata data frame: the average of x1 and x2
mydata$mean <- (mydata$x1 + mydata$x2)/2
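In the code above, mydata, x1 and x2 are placeholders. The self-contained sketch below uses a small made-up data frame to show that the arithmetic is applied row by row across the whole column. The values are hypothetical.

```r
# Hypothetical data standing in for mydata
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

# Column arithmetic is vectorized: each row is computed element-wise
mydata$sum  <- mydata$x1 + mydata$x2
mydata$mean <- (mydata$x1 + mydata$x2) / 2
mydata
```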
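For example, with the built-in women data set (height in inches, weight in pounds). Because of these units, the ratio below is not the conventional BMI, which would also be multiplied by 703: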
> newwomen=women
> newwomen$bmi=women$weight/women$height^2
> newwomen
   height weight        bmi
1      58    115 0.03418549
2      59    117 0.03361103
3      60    120 0.03333333
4      61    123 0.03305563
5      62    126 0.03277836
6      63    129 0.03250189
7      64    132 0.03222656
8      65    135 0.03195266
9      66    139 0.03191001
10     67    142 0.03163288
11     68    146 0.03157439
12     69    150 0.03150599
13     70    154 0.03142857
14     71    159 0.03154136
15     72    164 0.03163580
Method 2:
attach(mydata)              # attach the data frame mydata
mydata$sum <- x1 + x2       # create a new variable sum in mydata
mydata$mean <- (x1 + x2)/2  # create a new variable mean in mydata
detach(mydata)              # after every attach(), detach() removes the data from the search path
Here x1 and x2 do not need the mydata$ prefix because mydata has been attached, so R finds x1 and x2 in the attached data and uses them directly. The new variable, however, must still be written as mydata$sum. Otherwise R creates a separate object named sum in the workspace instead of a new column of mydata. Note also that attach() places a copy of mydata on the search path, so columns added after attaching are not visible by their bare names until the data is detached and attached again.
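The sketch below illustrates the point about the mydata$ prefix with a small hypothetical data frame. The stand-alone vector is called total here, a hypothetical name, rather than sum, to avoid confusion with the base function sum().

```r
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

attach(mydata)
mydata$sum <- x1 + x2   # x1, x2 come from the attached copy; result becomes a column of mydata
total <- x1 + x2        # no mydata$ prefix: a stand-alone vector in the workspace
detach(mydata)

"sum" %in% names(mydata)    # TRUE: the new column is part of the data frame
"total" %in% names(mydata)  # FALSE: total is not a column of mydata
```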
Method 3:
mydata <- transform(mydata, sum = x1 + x2, mean = (x1 + x2)/2)   # create several new variables in one command
> newwomen=transform(women,bmi=weight/height^2)
> newwomen
   height weight        bmi
1      58    115 0.03418549
2      59    117 0.03361103
3      60    120 0.03333333
4      61    123 0.03305563
5      62    126 0.03277836
6      63    129 0.03250189
7      64    132 0.03222656
8      65    135 0.03195266
9      66    139 0.03191001
10     67    142 0.03163288
11     68    146 0.03157439
12     69    150 0.03150599
13     70    154 0.03142857
14     71    159 0.03154136
15     72    164 0.03163580
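Two properties of transform() are worth knowing, shown here on a small hypothetical mydata. First, it returns a modified copy, so the result must be assigned back. Second, its expressions see only the columns that already exist, so a column created in one call can only be used in a later call.

```r
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

# transform() returns a modified copy; without assignment mydata is unchanged
transform(mydata, sum = x1 + x2)
ncol(mydata)  # 2

# Assign the result back to keep the new columns
mydata <- transform(mydata, sum = x1 + x2, mean = (x1 + x2) / 2)

# sum and mean now exist as columns, so a second call can build on them
# (in the first call, "sum" would have referred to the function sum())
mydata <- transform(mydata, ratio = sum / mean)
names(mydata)
```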
Variable reassignment
Method 1:
mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")   # create a two-group age variable
This command uses the ifelse() function. It works somewhat like if...else in other languages but is applied element by element to a whole vector. The left side of the assignment tells R to create a new variable agecat (age group) in the mydata data frame. Where age > 70, agecat is set to "older"; in all other cases (age <= 70) it is set to "younger". For more information and examples, see help(ifelse).
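A quick check of how ifelse() treats the boundary and missing values, using hypothetical ages. An age of exactly 70 falls into "younger". A missing age yields NA rather than either label, because the condition itself is NA.

```r
# Hypothetical ages, including the boundary value 70 and a missing value
mydata <- data.frame(age = c(65, 70, 71, NA))

mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")
mydata
```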
Method 2:
attach(mydata)
mydata$agecat <- NA                                   # create the variable agecat first
mydata$agecat[age > 75] <- "Elder"                    # then assign values according to age
mydata$agecat[age > 45 & age <= 75] <- "Middle Aged"
mydata$agecat[age <= 45] <- "Young"
detach(mydata)
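For several age bands, base R's cut() is an alternative not used in the text above. Its intervals are right-closed by default, so breaks of -Inf, 45, 75 and Inf reproduce the conditions age <= 45, 45 < age <= 75 and age > 75. The sketch below compares the two approaches on hypothetical ages. It writes the indexed version without attach() so the example is self-contained. The column name agecat2 is illustrative.

```r
mydata <- data.frame(age = c(30, 45, 46, 75, 76, NA))

# Indexed assignment, as in the text but without attach()
mydata$agecat <- NA
mydata$agecat[mydata$age > 75] <- "Elder"
mydata$agecat[mydata$age > 45 & mydata$age <= 75] <- "Middle Aged"
mydata$agecat[mydata$age <= 45] <- "Young"

# cut(): intervals (-Inf,45], (45,75], (75,Inf]; returns a factor
mydata$agecat2 <- cut(mydata$age,
                      breaks = c(-Inf, 45, 75, Inf),
                      labels = c("Young", "Middle Aged", "Elder"))

identical(mydata$agecat, as.character(mydata$agecat2))  # TRUE
```

Because cut() returns a factor, its levels keep the age order (Young, Middle Aged, Elder) in tables and plots. The character version would sort alphabetically.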
Rename
Method 1
fix(mydata)   # rename directly in the data editor; changes are saved when the window is closed
Method 2
library(reshape)
mydata <- rename(mydata, c(oldname = "newname"))   # rename() from the reshape package renames directly