Joining tables is a common task in data processing. This post introduces three ways to perform a left join in R. All three perform the same left join, but they differ in speed, so choose carefully when the tables are large.
Method 1:
> data0 <- merge(a, c, all.x = TRUE, by = 'CELLPHONE')
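To show what all.x = TRUE does, here is a minimal sketch with two small hypothetical tables standing in for a and c. Only the CELLPHONE key comes from the text; the name and plan columns and their values are made up for illustration. Every row of a is kept, and a row with no match in c gets NA.

```r
# Hypothetical stand-ins for the tables a and c in the text
a <- data.frame(CELLPHONE = c("111", "222", "333"),
                name      = c("Ann", "Bob", "Cid"))
c <- data.frame(CELLPHONE = c("111", "222", "444"),
                plan      = c("basic", "pro", "basic"))

# all.x = TRUE keeps every row of a (left join); "333" has no match, so plan is NA.
# merge() sorts the result by the key by default (sort = TRUE).
data0 <- merge(a, c, all.x = TRUE, by = "CELLPHONE")
print(data0)
```

Note that all = TRUE would give a full join and all.y = TRUE a right join, so the argument name matters.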
Method 2:
> library(sqldf)
> data1 <- sqldf('select a.*, c.* from a left join c on a.CELLPHONE = c.CELLPHONE')
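A minimal sketch of the same left join through sqldf, assuming the sqldf package is installed and reusing the hypothetical toy tables (the name and plan columns are illustrative only). Selecting c.plan rather than c.* is a small design choice: it avoids carrying the CELLPHONE column twice in the result.

```r
library(sqldf)

# Hypothetical stand-ins for the tables a and c in the text
a <- data.frame(CELLPHONE = c("111", "222", "333"),
                name      = c("Ann", "Bob", "Cid"))
c <- data.frame(CELLPHONE = c("111", "222", "444"),
                plan      = c("basic", "pro", "basic"))

# SQL left join: every row of a is kept; unmatched rows get NA (SQL NULL)
data1 <- sqldf("select a.*, c.plan
                from a left join c on a.CELLPHONE = c.CELLPHONE")
print(data1)
```

SQL does not guarantee the row order of the result, so add an order by clause if the order matters.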
Method 3:
> library(data.table)
> data2 <- c[a, on = 'CELLPHONE']
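Note: in the third method, c must be a data.table, and the order cannot be reversed. data.table's X[Y, on = ...] keeps every row of Y, so the left table a must go inside the brackets; writing a[c, on = 'CELLPHONE'] would keep every row of c instead.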
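A small sketch of why the order matters, assuming the data.table package is installed and using the same hypothetical toy tables as data.tables. The two calls differ only in which table sits inside the brackets, and that table is the one whose rows are all kept.

```r
library(data.table)

# Hypothetical stand-ins for the tables a and c in the text
a <- data.table(CELLPHONE = c("111", "222", "333"),
                name      = c("Ann", "Bob", "Cid"))
c <- data.table(CELLPHONE = c("111", "222", "444"),
                plan      = c("basic", "pro", "basic"))

# X[i, on = ...] keeps every row of i: here all rows of a (left join of a with c)
data2 <- c[a, on = "CELLPHONE"]

# Reversed: keeps every row of c instead (key "444" appears, "333" does not)
reversed <- a[c, on = "CELLPHONE"]

print(data2)
print(reversed)
```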
Supplement: inner_join, full_join, left_join and right_join in R
R for Data Science explains these four joins with a very intuitive example, summarized below.
Our dataset looks like this:
library(tidyverse)
x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)
Both x and y contain the keys 1 and 2, but key 3 appears only in x and key 4 only in y.
Let’s look at these four concepts:
1. inner_join
x %>% inner_join(y, by = "key")
The result is
    key val_x val_y
  <dbl> <chr> <chr>
1     1 x1    y1
2     2 x2    y2
The inner join keeps only the rows whose key appears in both tables, i.e. the common keys 1 and 2.
2. full_join
x %>% full_join(y, by = "key")
The result is
    key val_x val_y
  <dbl> <chr> <chr>
1     1 x1    y1
2     2 x2    y2
3     3 x3    NA
4     4 NA    y3
The full join keeps the rows for every key that appears in either table; where one table has no matching row, its columns are filled with NA.
3. left_join
x %>% left_join(y, by = "key")
The result is
    key val_x val_y
  <dbl> <chr> <chr>
1     1 x1    y1
2     2 x2    y2
3     3 x3    NA
The left join keeps every key of x (1, 2 and 3); key 3 has no match in y, so val_y is NA.
4. right_join
x %>% right_join(y, by = "key")
The result is
    key val_x val_y
  <dbl> <chr> <chr>
1     1 x1    y1
2     2 x2    y2
3     4 NA    y3
The right join keeps every key of y (1, 2 and 4); key 4 has no match in x, so val_x is NA.
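The four joins also map onto the base R merge() call from Method 1. As a sketch (using plain data frames instead of tribble so no packages are needed), the all, all.x and all.y arguments select the join type. One difference to keep in mind: merge() sorts the result by the key by default, whereas the dplyr verbs keep the row order of x.

```r
# Same data as the tribble example, as base R data frames
x <- data.frame(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- data.frame(key = c(1, 2, 4), val_y = c("y1", "y2", "y3"))

inner <- merge(x, y, by = "key")                # like inner_join: keys 1, 2
full  <- merge(x, y, by = "key", all = TRUE)    # like full_join: keys 1, 2, 3, 4
left  <- merge(x, y, by = "key", all.x = TRUE)  # like left_join: keys 1, 2, 3
right <- merge(x, y, by = "key", all.y = TRUE)  # like right_join: keys 1, 2, 4

print(full)
```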
The above is based on personal experience; I hope it serves as a useful reference. If there are any mistakes or omissions, corrections are welcome.