Now that you understand R's basic data types, how do you get large amounts of data into R for processing? How is that data stored once it is in R, and what methods are available for working with it? Let's go through this below.
First of all, the most direct and intuitive way to input data is the keyboard. Earlier articles showed how to create vectors with c, matrices with matrix, data frames, and so on, but in practice we often process a lot of data, and keyboard input is clearly unrealistic at that scale. You could spend several days typing the data in and still not be sure it is free of errors. Moreover, the data to be processed is usually already stored in Excel files, web pages, databases, or other media. Reading data in large batches, accurately and efficiently, is therefore the first problem R needs to solve.
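First: if you are teaching yourself to write code, you can practice on example data such as cars. The cars dataset itself ships with R in the datasets package and is available without loading anything. A package that provides additional data or functions, such as car, is installed and loaded the same way as any other package. The specific code is as follows:
> install.packages("car")
> library(car)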
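As a small sketch of working with a built-in dataset: cars is a data frame in the datasets package, which R attaches by default, so it can be inspected directly. The calls below are only one way to take a first look.

```r
# cars lives in the 'datasets' package, which is attached by default
data(cars)        # optional: puts an explicit copy in the workspace

str(cars)         # structure: a data frame with columns speed and dist
head(cars, 3)     # first three rows
summary(cars)     # basic summary statistics per column
```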
Second: external data is generally read with the read.***( ) family of functions, where *** stands for the type of file to be read. The following explains in detail how each type of file is read:
read.table(file, header = FALSE, sep = "", quote = "\"'",
           dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#",
           allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)
read.csv2(file, header = TRUE, sep = ";", quote = "\"",
          dec = ",", fill = TRUE, comment.char = "", ...)
read.delim(file, header = TRUE, sep = "\t", quote = "\"",
           dec = ".", fill = TRUE, comment.char = "", ...)
read.delim2(file, header = TRUE, sep = "\t", quote = "\"",
            dec = ",", fill = TRUE, comment.char = "", ...)
Where:
Parameter file: the name and path of the file to be read. If the current working path is the folder where the file is stored, you can write just the file name; remember to enclose it in double quotes. So how do you set the working path? First, use the getwd( ) function to check the current working path, then use setwd( ) to change it. A backslash starts an escape sequence inside an R string, so every "\" in a Windows path must be written as "\\" (forward slashes "/" also work). The specific operation is as follows:
> getwd()                    # View the current working path
[1] "E:/Zhao Zhibo/R"
> setwd("E:\Zhao Zhibo")     # Incorrect way to set it
Error: '\Z' is an unrecognized escape in character string starting ""E:\Z"
> setwd("E:\\Zhao Zhibo")    # Correct way to set it
> getwd()                    # View the working path after setting
[1] "E:/Zhao Zhibo"
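The following is a minimal sketch of the same idea that can be run on any machine. It uses tempdir() as a stand-in for a real folder (an assumption made only so the example runs anywhere) and compares three ways of writing the Windows path from the text.

```r
old_wd <- getwd()          # remember the starting working path
setwd(tempdir())           # tempdir() stands in for a real folder here
getwd()                    # now points at the temporary folder
setwd(old_wd)              # restore the original working path

# Three ways to write the same Windows path as an R string
p1 <- "E:\\Zhao Zhibo"                # escaped backslash
p2 <- "E:/Zhao Zhibo"                 # forward slash
p3 <- file.path("E:", "Zhao Zhibo")   # built from components with "/"

nchar(p1)           # the doubled backslash is stored as one character
identical(p2, p3)   # file.path() joins its arguments with "/"
```

Using file.path( ) or forward slashes avoids the escape problem entirely, which can be convenient when paths are assembled from pieces.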
Once the working path is set, files in that folder can be read by name alone, without giving their storage location. In the code below, the current working path is "E:/Zhao Zhibo". A file "" is created under this path, so it can be read directly. Another file "" is created in "D:/"; because it lies outside the working path, it cannot be read by name alone and the complete file path must be given. The backslashes in that path also need to be written as "\\".
> mydata <- read.table("", sep = ',')
> mydata
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  8  9
> mydataD <- read.table("", sep = ',')
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '': No such file or directory
> mydataD <- read.table("D:\\", sep = ',')
> mydataD
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  8  9
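A self-contained sketch of the difference between reading by file name and reading by full path is given below. The file name demo.txt and the temporary folder are hypothetical, chosen only so the example can run without the files from the text.

```r
# Create a small comma-separated file in a fresh temporary folder
demo_dir <- tempfile("demo_dir")
dir.create(demo_dir)
demo_path <- file.path(demo_dir, "demo.txt")   # hypothetical file name
writeLines(c("1,2,3", "4,5,6", "7,8,9"), demo_path)

# Case 1: the working path is the folder holding the file,
# so the bare file name is enough
old_wd <- setwd(demo_dir)      # setwd() returns the previous path
mydata <- read.table("demo.txt", sep = ",")
setwd(old_wd)

# Case 2: the working path is somewhere else,
# so the complete path has to be supplied
mydataD <- read.table(demo_path, sep = ",")

mydata
```

Both calls return the same data frame; only the way the file is located differs.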
Setting the path every time is too tedious in some situations. R therefore provides the function file.choose( ), which lets you pick the file location interactively and is used as read.***(file.choose( )). For example, to read txt data:
> mydatachoose <- read.table(file.choose())   # Pick the file to read interactively; file.choose( ) is called here without arguments
Parameter header: determines whether the first line of the file is treated as column names. The default is FALSE for read.table( ) and TRUE for read.csv( ) and read.delim( ). One point needs explaining: data frames were covered in the previous section, and the return value of read.***( ) is also a data frame, organized column by column. The column names are therefore particularly important; they correspond to the header row in Excel. Rows are numbered automatically from 1 to N, and these row names can also be changed.
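The effect of header can be sketched with the text argument of read.table( ), which reads from a string instead of a file. The column names and values below are made up for illustration.

```r
# Two data rows preceded by a line of column names (illustrative data)
raw_text <- "name,score\nann,90\nbob,85"

# header = FALSE: the first line is treated as data, columns become V1, V2
no_header <- read.table(text = raw_text, sep = ",", header = FALSE)

# header = TRUE: the first line supplies the column names
with_header <- read.table(text = raw_text, sep = ",", header = TRUE)

names(no_header)
names(with_header)

# Row names default to 1..N and can be replaced
rownames(with_header) <- c("r1", "r2")
with_header
```

With header = FALSE the name line is counted as a row of data, so the result has one row more than intended and the score column is no longer numeric.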
Parameter sep: determines the separator between fields in the file to be read. Common choices are whitespace and commas. The data is only read the way you expect when sep matches the separator actually used in the file.
These are the most commonly used parameters. The others work on similar principles; look them up yourself if you need them.
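A short sketch of how sep changes the result, again using the text argument so that no file is needed. The sample strings are illustrative only.

```r
space_text <- "1 2 3\n4 5 6"
comma_text <- "1,2,3\n4,5,6"

# Default sep = "" splits on any whitespace
by_space <- read.table(text = space_text)

# Comma-separated data needs sep = ","
by_comma <- read.table(text = comma_text, sep = ",")

# Wrong separator: each line is read as a single field
wrong_sep <- read.table(text = comma_text)

dim(by_space)    # rows and columns with the whitespace separator
dim(by_comma)    # rows and columns with the comma separator
dim(wrong_sep)   # rows and columns when the separator does not match
```

When the separator does not match, R raises no error; the data simply arrives in a single column, so it is worth checking dim( ) or str( ) after reading.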
File: mydata <- ( )
xlsx format data: mydata <- read.xlsx( ). Before using this function, you need to install the xlsx package; it is installed the same way as other packages. Reading xlsx files is generally slow, so the Excel file is usually saved in csv format first and then read with mydata <- read.csv( ).
XML data: before reading XML data, you must load the XML package, and then read the file with mydata <- xmlRoot(xmlTreeParse("***.xml")).
That is all on reading data for now; other useful topics will be covered separately.