0

I read in a bunch of csv files into R and binded them together so I had one giant data file that looked like this

This file is called data

Date         Time   pH
1976-01-26   6:00   4.00
1976-01-26   7:30   4.05
1976-01-26   13:30  4.50
1976-01-27   5:00   4.50

for 2000 rows

I then tried

data$DateTime = as.POSIXct(data$Date, data$Time)

However this gives me this result

Date         Time   pH     DateTime
1976-01-26   6:00   4.00   1976-01-25 19:00:00
1976-01-26   7:30   4.05   1976-01-25 19:00:00
1976-01-26   13:30  4.50   1976-01-25 19:00:00
1976-01-27   5:00   4.50   1976-01-26 19:00:00

Any clue as to why this happens?

Here is my exact code to get to the dataframe listed here

## BD files
setwd(DIR)
setwd("Acid Rain BD")
DIR.BD = getwd()
files = list.files(DIR.BD)

# read the files into a list of data.frames
data.list <- lapply(files, read.csv);
# concatenate into one big data.frame
data.cat.BD <- do.call(rbind, data.list);

#fix the dates
data.cat.BD$Date <- as.Date(data.cat.BD$Date, "%m/%d/%Y")

#Create the datetime variable (IMPORTANT ONLY DO THIS ONCE OR REDO)

data.cat.BD$DateTime = as.POSIXct(paste(data.cat.BD$Date, data.cat.BD$Time))

This is what happens when I input the listed code

> sample(unique(data.cat.BD$Date), min (length(data.cat.BD$Date),100))
  [1] "1984-08-20" "1979-09-06" "1984-07-17" "1982-01-23"
  [5] "1977-11-24" "1980-05-14" "1980-10-25" "1980-04-28"
  [9] "1981-09-15" "1979-03-06" "1976-03-16" "1985-09-08"
 [13] "1982-10-21" "1984-01-14" "1976-07-24" "1985-08-31"
 [17] "1976-02-19" "1985-11-29" "1984-03-14" "1983-11-25"
 [21] "1984-02-11" "1982-01-04" "1982-01-13" "1980-07-21"
 [25] "1976-08-27" "1982-03-04" "1983-09-30" "1977-03-19"
 [29] "1980-04-27" "1982-10-09" "1985-02-02" "1984-07-18"
 [33] "1977-10-15" "1984-03-06" "1983-05-01" "1983-03-11"
 [37] "1984-03-17" "1978-03-26" "1979-08-02" "1977-12-19"
 [41] "1984-12-22" "1982-03-17" "1977-10-17" "1985-10-02"
 [45] "1981-07-04" "1980-06-08" "1981-02-20" "1982-01-20"
 [49] "1979-02-07" "1985-07-21" "1977-01-14" "1982-09-03"
 [53] "1984-10-02" "1976-05-07" "1984-12-27" "1980-11-27"
 [57] "1985-01-31" "1978-01-26" "1979-08-10" "1985-10-04"
 [61] "1984-05-13" "1979-12-07" "1977-01-24" "1982-09-27"
 [65] "1980-03-21" "1982-10-26" "1984-06-30" "1981-04-28"
 [69] "1984-04-04" "1979-10-05" "1985-08-19" "1976-05-01"
 [73] "1976-02-05" "1984-02-03" "1982-04-17" "1977-04-25"
 [77] "1977-10-16" "1980-04-15" "1984-10-10" "1976-03-09"
 [81] "1979-05-03" "1980-11-18" "1976-02-17" "1983-03-20"
 [85] "1982-11-26" "1982-03-12" "1983-03-19" "1983-04-30"
 [89] "1985-12-07" "1981-02-02" "1981-02-08" "1982-02-09"
 [93] "1977-02-24" "1976-09-18" "1977-03-22" "1985-10-13"
 [97] "1981-06-13" "1985-01-15" "1976-09-20" "1977-12-15"
Brian W
  • 31
  • 2
  • 10

1 Answers1

2

You can use paste to combine the date and time:

df$DateTime <- as.POSIXct(paste(df$Date, df$Time))
df
#         Date  Time   pH            DateTime
# 1 1976-01-26  6:00 4.00 1976-01-26 06:00:00
# 2 1976-01-26  7:30 4.05 1976-01-26 07:30:00
# 3 1976-01-26 13:30 4.50 1976-01-26 13:30:00
# 4 1976-01-27  5:00 4.50 1976-01-27 05:00:00

Reproducible Data

df <- read.table(text="
Date         Time   pH
1976-01-26   6:00   4.00
1976-01-26   7:30   4.05
1976-01-26   13:30  4.50
1976-01-27   5:00   4.50", header=T)
Pierre L
  • 28,203
  • 6
  • 47
  • 69
  • I just tried this and got "Character string is not in a standard unambiguous format" – Brian W Oct 17 '15 at 21:58
  • I posted proof of the data you provided and the correct output. – Pierre L Oct 17 '15 at 22:00
  • That is really odd, I did used your code and got the correct output for the reproducible data section. But I have the exact same code you wrote getting that error for my dataframe. Is there any chance it's because it was read in using a csv? I can copy and paste my exact code to get to the dataframe I have if that will help – Brian W Oct 17 '15 at 22:04
  • I can't reproduce your error. You can locate the error by running the command on portions of your data frame at a time. Try it on `data.cat.BD[1:10,]` then try it on `data.cat.BD[1:50,]`. See if you can find the row that is throwing the error. Or you can use a for loop and see where it stops. `test <- c(); for (i in 1:nrow(data.cat.BD)) test[i] <- as.POSIXct(paste(data.cat.BD$Date[i], data.cat.BD$Time[i]))` – Pierre L Oct 17 '15 at 22:16
  • In the for loop, when the error is thrown. you can look at `test` and see the last time it ran successfully. – Pierre L Oct 17 '15 at 22:17
  • The dates look fine. The `Time` column may be the problem. Use the for loop approach I mentioned to locate which row it fails on. `data.cat.BD[length(test)+1,]` will be your first problem row. – Pierre L Oct 17 '15 at 22:19
  • So I fixed two errors, and I thought I got it working. But now it is only pasting my Date instead of my dateTime in the DateTime column. Did I mess something up? **test <- c(); for (i in 1:nrow(data.cat.BD)) test[i] <- as.POSIXct(paste(data.cat.BD$Date[i], data.cat.BD$Time[i]))** then **data.cat.BD = data.cat.BD[-c(389,1652),]** then **data.cat.BD$DateTime = as.POSIXct(paste(data.cat.BD$Date, data.cat.BD$Time))** – Brian W Oct 17 '15 at 22:43
  • Then it is definitely the `Time` column. It's impossible to see the data from my vantage point. There's something non-standard about one of your times. Now you may have to use my other trouble-shooting method. Run the command on portions of your data frame and look for where it stops giving the time in the output to locate your problem rows. You have some data cleaning to do. – Pierre L Oct 17 '15 at 22:46
  • All of your times should have two numbers separated by a colon. `"num1:num2"`. To help locate the problem, try `which(!grepl(":", data.cat.BD$Time))` to locate any times without colons. – Pierre L Oct 17 '15 at 22:49
  • You can also use `which(!grepl("^\\d+:\\d+$", data.cat.BD$Time))` to check for any entries that do not have the right layout. – Pierre L Oct 17 '15 at 22:50
  • I have been attempting to fix it and haven't found my errors yet. It's odd because as.POSIXct(paste(data.cat.BD[1,1], data.cat.BD[1,2])) and as.POSIXct(paste(data.cat.BD[1,1], data.cat.BD[2,2])) both work. But when I attempt to change out it to test4 = c(); for (i in 2) test4[i] = as.POSIXct(paste(data.cat.BD[i,1], data.cat.BD[i,2])) It is returning NA values. – Brian W Oct 17 '15 at 23:25
  • I also just tried both of those codes and both say 1496. I looked through the data and they all appear to have colons. (there are a little over 2000 data points) – Brian W Oct 17 '15 at 23:26
  • I'm about to give up for the night. I can't seem to get this to work well. Using a for loop gives me a numerical number while using a specific row data.cat.BD[1,1], data.cat.BD[1,2] gives me what I need. I also cannot find any times that are wrong when I look through the data. I still am only getting the dates when i use the data.cat.BD$DateTime = "your code" I am unsure of what to do next. – Brian W Oct 17 '15 at 23:42
  • Just post the whole dataset. I'll take look at it. do `dput(data.cat.BD)` and add it to your question. It will be a lot but post it anyway. We'll delete it after we fix it. – Pierre L Oct 17 '15 at 23:44
  • By the way, the reason you got NA values is because you tried to assign the output to `test4[i]` which is `test4[2]`. It can't put the value in test4[2] if test4[1] doesn't even exist yet. – Pierre L Oct 17 '15 at 23:50
  • 1
    So I just figured everything out. Thank you so much for all the help Pierre. – Brian W Oct 18 '15 at 00:12