0

I am working from two Excel sheets. This is my code:

# Read the first dataset
df1 <- read.csv("ATLweather.csv")

# Read the second dataset
df2 <- read.csv("electricityprices.csv")

 # Merge the datasets
 library("dplyr")
 merged_df <- left_join(df1, df2, by = "Timestamp")
  head(merged_df)

 library(writexl)

 # Save the merged data frame as an Excel file
  write_xlsx(merged_df, path = "C:/Users/Nathaniel/Documents/newfile.xlsx")

This successfully merges the data sets, but replaces all the data in my "cost" column with "NA" (pictured below in my environment).

enter image description here

This is my first time doing any coding, so I'm at a loss as to how to fix this. What am I doing wrong?

jpsmith
  • 11,023
  • 5
  • 15
  • 36
Hades
  • 13
  • 3
  • 2
    From your image it looks like the time stamp is in a different format. Slashes instead of dashes.. this means that the join can't find exact matches. I'm uniforming the Timestamp column should fix this. – Claudio Paladini Feb 21 '23 at 02:23
  • 3
    It would be more helpful if you used dput() function on the data frames and posted a sample of the actual data. Likely the times are not matching up. If you left join D1 to D2 then if there is no match it will create a null value, that can also happen if the data types are not the same e.g. numeric vs integer64. For time values you may need to round one/both sides so that it is possible to get an exact match. – alexrai93 Feb 21 '23 at 02:26
  • @ClaudioPaladini Thanks! So you're saying the problem is that the dates in the "Timestamp" columns are not formatted the same? – Hades Feb 21 '23 at 02:26
  • Test the time stamps using class(timestamp) in each and see if it agrees, if they are the same class then consider rounding to the nearest hour/day or whatever makes sense for the data. – alexrai93 Feb 21 '23 at 02:30
  • 1
    Please provide enough code so others can better understand or reproduce the problem. – Community Feb 21 '23 at 10:24

1 Answers1

1

As @Claudio pointed out, the Timestamp vectors are character vectors that are formatted differently (you can tell they are characters by the "chr" in your screenshot). R doesn't know that characters should be interpreted as dates. You can change them both to date objects and merge them that way, but a simple way would be to convert one format to the other and leave them as characters (sometimes playing with dates in R is more work than it is worth, but see the package lubridate if you want to play with them as dates).

Here is a quick fix:

Timestamp<-c("1/1/2010 0:00","1/1/2010 1:00")  ## create example data

Timestamp<-as.POSIXct(Timestamp,format="%m/%d/%Y %H:%M") ## can enter timezone here if you want
Timestamp<-as.character(Timestamp)

Timestamp  # look at output
"2010-01-01 00:00:00" "2010-01-01 01:00:00"

Then just change Timestamp here to df2$Timestamp for your purposes.

Dylan_Gomes
  • 2,066
  • 14
  • 29
  • Thank you! I will try to follow these instructions. Quick question: the dates in the original Excel sheets were actually formatted exactly the same way (I just checked). Would you be able to tell me why one was changed when imported to R? – Hades Feb 21 '23 at 02:39
  • Can't tell you for certain. Sometimes Excel automatically formats dates, so they might look different in Excel than they do in the csv file (viewed outside of Excel). Try opening each of them in a text editor and see if the format is still the same. – Dylan_Gomes Feb 21 '23 at 02:42
  • @Hades did this solve your issue? – Dylan_Gomes Feb 21 '23 at 15:48