A data frame is a table or two-dimensional array-like structure, where each column contains the value of a variable, and each row contains a set of values from each column.
The following are the characteristics of the data frame.
- The column name should be non-empty.
- The line name should be unique.
- The data stored in the data frame can be a number, factor, or character type.
- Each column should contain the same number of data items.
Create data frames
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Print the data frame. print()
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date 1 1 Rick 623.30 2012-01-01 2 2 Dan 515.20 2013-09-23 3 3 Michelle 611.00 2014-11-15 4 4 Ryan 729.00 2014-05-11 5 5 Gary 843.25 2015-03-27
Get the structure of the data frame
The structure of the data frame can be seen by using the str() function.
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Get the structure of the data frame. str()
When we execute the above code, it produces the following result −
'': 5 obs. of 4 variables: $ emp_id : int 1 2 3 4 5 $ emp_name : chr "Rick" "Dan" "Michelle" "Ryan" ... $ salary : num 623 515 611 729 843 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...
Data summary in the data frame
Can be appliedsummary()Functions obtain statistical summary and properties of data.
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Print the summary. print(summary())
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date Min. :1 Length:5 Min. :515.2 Min. :2012-01-01 1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23 Median :3 Mode :character Median :623.3 Median :2014-05-11 Mean :3 Mean :664.4 Mean :2014-01-14 3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15 Max. :5 Max. :843.2 Max. :2015-03-27
Extract data from data frame
Use the column name to extract specific columns from the data frame.
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01","2013-09-23","2014-11-15","2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Extract Specific columns. result <- ($emp_name,$salary) print(result)
When we execute the above code, it produces the following result −
.emp_name 1 Rick 623.30 2 Dan 515.20 3 Michelle 611.00 4 Ryan 729.00 5 Gary 843.25
Extract the first two rows first, then extract all columns
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Extract first two rows. result <- [1:2,] print(result)
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date 1 1 Rick 623.3 2012-01-01 2 2 Dan 515.2 2013-09-23
Extract rows 3 and 5 with columns 2 and 4
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Extract 3rd and 5th row with 2nd and 4th column. result <- [c(3,5),c(2,4)] print(result)
When we execute the above code, it produces the following result −
emp_name start_date 3 Michelle 2014-11-15 5 Gary 2015-03-27
Extended data frames
Dataframes can be extended by adding columns and rows.
Add column
Simply add the column vector with the new column name.
# Create the data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), stringsAsFactors = FALSE ) # Add the "dept" coulmn. $dept <- c("IT","Operations","IT","HR","Finance") v <- print(v)
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date dept 1 1 Rick 623.30 2012-01-01 IT 2 2 Dan 515.20 2013-09-23 Operations 3 3 Michelle 611.00 2014-11-15 IT 4 4 Ryan 729.00 2014-05-11 HR 5 5 Gary 843.25 2015-03-27 Finance
Add rows
To permanently add more rows to an existing dataframe, we need to introduce new rows with the same structure as the existing dataframe and userbind()function.
In the following example, we create a data frame with a new row and merge it with an existing data frame to create the final data frame.
# Create the first data frame. <- ( emp_id = c (1:5), emp_name = c("Rick","Dan","Michelle","Ryan","Gary"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = (c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), dept = c("IT","Operations","IT","HR","Finance"), stringsAsFactors = FALSE ) # Create the second data frame <- ( emp_id = c (6:8), emp_name = c("Rasmi","Pranab","Tusar"), salary = c(578.0,722.5,632.8), start_date = (c("2013-05-21","2013-07-30","2014-06-17")), dept = c("IT","Operations","Fianance"), stringsAsFactors = FALSE ) # Bind the two data frames. <- rbind(,) print()
When we execute the above code, it produces the following result −
emp_id emp_name salary start_date dept 1 1 Rick 623.30 2012-01-01 IT 2 2 Dan 515.20 2013-09-23 Operations 3 3 Michelle 611.00 2014-11-15 IT 4 4 Ryan 729.00 2014-05-11 HR 5 5 Gary 843.25 2015-03-27 Finance 6 6 Rasmi 578.00 2013-05-21 IT 7 7 Pranab 722.50 2013-07-30 Operations 8 8 Tusar 632.80 2014-06-17 Fianance
This is the end of this article about the detailed explanation of R language knowledge points about data frames. For more related R language data frame content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!