1

This is regarding data manipulation and cleaning in R.


I have dataset 1:

Date   time Range Waterconsumption    
1/1/01 0300 31km  2.0liters
2/1/01 0800 30km  1.8liters
3/1/01 0300 33km  1.7liters
4/1/01 0600 32km  1.8liters
5/1/01 0800 28km  1.7liters
6/1/01 0300 35km  1.6liters
7/1/01 0800 31km  1.8liters

And also dataset 2:

Date   time heatlost weight    
1/1/01 0300 0.27     61.5kg
2/1/01 0800 0.33     62.0kg
5/1/01 0800 0.69     61.7kg
6/1/01 0300 0.15     61.8kg
7/1/01 0800 0.63     62.0kg

As you can see, dataset 2 has lost some dates (from 3/1/01 to 4/1/01).

So how can I combine dataset 1 and 2 using cbind i.e. inserting heatlost and weight behind waterconsumption (dataset1) according to date?

hpesoj626
  • 3,529
  • 1
  • 17
  • 25
MT32
  • 677
  • 1
  • 11
  • 24

1 Answers1

2

You can use the library dplyr::left_join(df1, df2, "time")

First let's generate some data to work with, reflecting the variables in your project above:

df1 <- 
  data.frame(
    id = c(1:4),
    time = c(1:4), 
    range = floor(runif(4, 28,32)),
    watercon = round(runif(4,1.5,1.7),2)
  )

df2 <- 
  data.frame(
    id = c(1,4),
    time = c(1,4), 
    heatlost = c(0.25,0.33),
    weight = c(62.5,61.4)
  )

df2 has some missing values as per your original questions, and when we apply a left_join these values will be replaced with an NA.

If you apply left_join to join by "time" and then keep only the variables you want using select:

library(dplyr)
left_join(df1, df2, "time") %>% 
  select(time, range, watercon, heatlost, weight)

You'll get the returning dataframe:

time       range   watercon heatlost    weight
    1          30      1.52     0.25      62.5
    2          29      1.55       NA        NA
    3          29      1.51       NA        NA
    4          30      1.53     0.33       61.4
onlyphantom
  • 8,606
  • 4
  • 44
  • 58