
I have something like this within a function:

x <- as.POSIXct(substr((dataframe[z, ])$variable, 1, 8), tz = "GMT",
                format = "%H:%M:%S")
print(x)
if ((x >= as.POSIXct("06:00:00", tz = "GMT", format = "%H:%M:%S")) &
    (x < as.POSIXct("12:00:00", tz = "GMT", format = "%H:%M:%S"))) {
  position <- "first"
}

But I get this output:

character(0)
Error in if ((as.numeric(departure) - as.numeric(arrival)) < 0) { : argument is of length zero

How can I fix this so my comparison works and it prints the correct thing?

Some example values from the dataframe$variable column: 16:33:00, 15:34:00, 14:51:00, 07:26:00, 05:48:00, 11:10:00, 17:48:00, 06:17:00, 08:22:00, 11:31:00

Gregor Thomas
Deb Martin
  • If anyone know how I can compare as.POSIXct objects in if statements I would be so grateful. – Deb Martin Jun 11 '16 at 01:25
  • Can I assume you've narrowed the issue to the section of code you're showing us? Your error says otherwise, so I'm hoping you've ruled that "departure - arrival" section out? – rosscova Jun 11 '16 at 04:33
  • Any reason you're calling substr, given that your values are already in the correct format? – rosscova Jun 11 '16 at 04:37
  • Sorry, I forgot to change the output (I switched the variable names): Error in if ((x >= as.POSIXct("05:00:00", tz = "GMT", format = "%H:%M:%S")) & : argument is of length zero – Deb Martin Jun 11 '16 at 04:37
  • Oh yes, I noticed I don't need substr, but removing it doesn't seem to fix the problem. – Deb Martin Jun 11 '16 at 04:40

1 Answer


Welcome to Stack Overflow!

First, the reason you've gotten some down votes is most likely because you haven't given much in your question to go on. For one thing, you haven't shown us what

(dataframe[z, ])$variable

is, which makes it hard for us to formulate a complete answer. You seem to be trying to extract a single value from a dataframe, is that right? If so, I've never seen it done that way. Try replacing the above with:

dataframe$variable[z]

My guess is what you're trying to achieve is a comparison of an entire column of the dataframe called "variable", since that's generally more useful...
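Whichever way the single value is extracted, the "argument is of length zero" error means that the condition handed to if() had no elements at all, which happens when the indexing step returns nothing (for example, if z is empty or the column name does not exist). The following is only a sketch in base R under stated assumptions: the sample data is taken from the values in the question, and the helper name position_for_row is hypothetical. It checks the length of the extracted value before comparing, so the failure is reported where it actually occurs.

```r
# Hypothetical sample data, using values shown in the question
dataframe <- data.frame(variable = c("16:33:00", "07:26:00", "05:48:00"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: classify a single row, failing loudly on bad input
position_for_row <- function(df, z) {
  value <- df$variable[z]
  # An empty index or a missing column yields a length-zero result here,
  # which is what makes if() stop with "argument is of length zero"
  if (length(value) != 1L) {
    stop("expected exactly one time value, got length ", length(value))
  }
  x <- as.POSIXct(value, tz = "GMT", format = "%H:%M:%S")
  if (is.na(x)) {
    stop("could not parse time value for row ", z)
  }
  lower <- as.POSIXct("06:00:00", tz = "GMT", format = "%H:%M:%S")
  upper <- as.POSIXct("12:00:00", tz = "GMT", format = "%H:%M:%S")
  # && is the scalar "and", appropriate inside if()
  if (x >= lower && x < upper) "first" else "not first"
}

result_ok <- position_for_row(dataframe, 2)

# An empty index reproduces the length-zero situation in a controlled way
result_bad <- tryCatch(position_for_row(dataframe, integer(0)),
                       error = function(e) conditionMessage(e))
print(result_ok)
print(result_bad)
```

If the guard fires inside your function but not in the console, the thing to inspect is the value of z (and the dataframe) that the function actually receives.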

Having said that, I often come up against issues with time data, and from what I've heard, my experience is not uncommon. When I'm dealing with just times, as it appears you are here, I prefer the chron::times format over POSIXct. POSIXct is a date-time format, so a date is always included; it also tries to correct for time zone and daylight saving changes, which tends to get in my way more than it helps. If your data is already in the format you specified in your first as.POSIXct call (%H:%M:%S), you won't even need to specify a format when calling the times function instead.

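# Convert the whole column at once; "h:m:s" is the default format for chron::times
x <- chron::times( dataframe$variable )
print(x)
# Vectorised comparison: one result per row
position <- ifelse( x >= chron::times( "06:00:00" ) &
                    x < chron::times( "12:00:00" ),
                    "first", "not first" )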

This will output a vector "position", with a result for all values taken from dataframe$variable. Does that achieve what you're hoping for?

From here, if you did want to extract the comparison result for the particular row "z" in dataframe, you can still do that with

position[z]

EDIT to add: It might be worth checking for missing values in "variable". This should return TRUE:

sum( is.na( dataframe$variable ) ) == 0

Also check for any that aren't correctly formatted. Again, this should return TRUE:

sum( is.na( chron::times( dataframe$variable ) ) ) == 0

EDIT to add: As per the comments, it looks like some values in your "variable" column aren't converting properly. You should be able to find them with

subset( dataframe, is.na( chron::times( variable ) ) )

That should let you see what's wrong. It may be a single cell, or it may be a number of them. You'll need to tidy up that data, which you can do in a few ways. You could go through and fix the values manually, or you could add a function to your script that repairs them before the conversion. The latter might be a good idea if the bad values share a common issue, or if you expect the same issue to recur as new data comes in (if indeed you need to allow for that).
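If the conversion itself stops with an error on the bad values (as reported in the comments), one alternative is to look for them without converting anything. The sketch below uses only base R and assumes the intended format is strictly HH:MM:SS on a 24-hour clock; the sample data, including the malformed entries, is made up for illustration.

```r
# Hypothetical data with a few deliberately malformed entries
dataframe <- data.frame(
  variable = c("16:33:00", "7:26", "", "05:48:00", NA, "25:00:00"),
  stringsAsFactors = FALSE
)

# Assumed format: two-digit hour 00-23, two-digit minute and second 00-59
time_pattern <- "^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$"

# grepl() returns FALSE for NA, so missing values are flagged as invalid too
is_valid <- grepl(time_pattern, dataframe$variable)

# Rows that need repairing (or excluding), and the rows that are safe to convert
bad_rows <- dataframe[!is_valid, , drop = FALSE]
clean <- dataframe[is_valid, , drop = FALSE]

print(bad_rows)
# Comparing row counts shows how many rows were set aside
print(c(before = nrow(dataframe), after = nrow(clean)))
```

Looking at bad_rows first should make it clearer whether the values share a pattern that a small repair function could handle.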

The other option is simply to exclude those rows from your analysis. If you go this route, make sure it's appropriate to the analysis you're running. If it is appropriate in your case, you can add a step to clean up the dataframe before running the steps in your question:

dataframe <- subset( dataframe, !is.na( chron::times( variable ) ) )

NOTE: there's a good chance this will come up with a warning the first time. If you run the same line again and the warning is still there, even though the offending rows should have been removed, you may need to look further into it.

That should drop the offending values, leaving only values that are properly converting to the times format, which should help with the steps you're trying to run. Check how your dataframe dimensions change before and after that step; that'll tell you how many rows you're dropping.

You could do the same thing with POSIXct if that's what you're comfortable with; I'm just personally more comfortable with times for what you're doing.
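For reference, a vectorised POSIXct version might look roughly like the following. This is a sketch using base R only, with the sample values from the question placed in a made-up dataframe; because no date is given, as.POSIXct fills in the current date for every value, so the comparisons stay consistent as long as the bounds are built the same way.

```r
# Hypothetical dataframe built from the sample values in the question
dataframe <- data.frame(
  variable = c("16:33:00", "15:34:00", "14:51:00", "07:26:00", "05:48:00",
               "11:10:00", "17:48:00", "06:17:00", "08:22:00", "11:31:00"),
  stringsAsFactors = FALSE
)

# Convert the whole column at once
x <- as.POSIXct(dataframe$variable, tz = "GMT", format = "%H:%M:%S")

# Bounds are created with the same tz and format so they share the same date
lower <- as.POSIXct("06:00:00", tz = "GMT", format = "%H:%M:%S")
upper <- as.POSIXct("12:00:00", tz = "GMT", format = "%H:%M:%S")

# ifelse() with the vectorised & gives one result per row
position <- ifelse(x >= lower & x < upper, "first", "not first")
print(data.frame(variable = dataframe$variable, position = position))
```

Note the use of the vectorised & inside ifelse(); a plain if() only accepts a single TRUE or FALSE.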

rosscova
  • I tried using chron::times as you suggested, but when I print x within my function I am still getting something incorrect (times(0)), and I got character(0) using POSIXct. Do you have any idea why this may be happening? All the printing and comparisons work fine when I copy it in my console, but when I run the page nothing works. – Deb Martin Jun 11 '16 at 03:00
  • That sounds like an error somewhere else in your code, or more likely a format mismatch somewhere in your data. Again, not having a complete picture makes it very hard to give a good answer. – rosscova Jun 11 '16 at 03:21
  • What does dataframe$variable look like? – rosscova Jun 11 '16 at 03:25
  • dataframe$variable is referencing one column/variable of the dataset. The dataset has thousands of rows. – Deb Martin Jun 11 '16 at 04:16
  • You don't need to show us the entire column, but something to let us see it would be very useful in answering your question. Perhaps include a small sample of, say, 10 values from dataframe$variable ? – rosscova Jun 11 '16 at 04:18
  • Also, what does class(dataframe$variable) return? – rosscova Jun 11 '16 at 04:19
  • I've added "as.character" to the substr call. That's one place where you may be getting a failure. – rosscova Jun 11 '16 at 04:25
  • class(dataframe$variable) returns character. I will update the question to show a few samples of values from the dataframe$variable column. – Deb Martin Jun 11 '16 at 04:30
  • I've added an explanation to the change I made to your (dataframe[z,])$variable. That's the next place I'd suspect your issue, but it should have disappeared if you changed your code to what I put in my original answer. – rosscova Jun 11 '16 at 04:59
  • Yes, it is still giving me the same error. When I paste the comparison into my console and assign x and then print it, I get the correct results. It's only when I run my function and plug in variables that these errors arise. Is there something that's causing the two areas to act differently? – Deb Martin Jun 11 '16 at 05:11
  • Well, since everything here seems to be working fine, I'd be looking elsewhere for your issue. – rosscova Jun 11 '16 at 05:29
  • I've added something else to try, just to make sure your whole "variable" column does look like the section you've shown. – rosscova Jun 11 '16 at 07:10
  • The first expression you wrote above returns TRUE, but when I substitute the correct data and variable and I run chron::times( dataframe$variable ), I get Error in convert.times(times., fmt) : format h:m:s may be incorrect In addition: Warning message: In convert.times(times., fmt) : NAs introduced by coercion. Is this relevant to my original problem? – Deb Martin Jun 11 '16 at 17:40
  • Yes. You probably have some values in your "variable" column that are not in the correct format. Try finding them with subset(dataframe,is.na(chron::times(variable))) – rosscova Jun 12 '16 at 00:06
  • I've added some more to the answer about finding and repairing or removing rows that might be causing you issues. – rosscova Jun 12 '16 at 02:04
  • I tried finding the problematic rows using the code you supplied but I'm getting this error: Error in convert.times(times., fmt) : format h:m:s may be incorrect In addition : warning message: in convert.times(times., fmt) : NAs introduced by coercion – Deb Martin Jun 12 '16 at 19:50
  • Those errors and warnings are telling you in no uncertain terms that you've got some bad values in your data. You can ask for help on that here, but we're getting a long way from your original question. Maybe close this one and start a new one: "How to find incorrectly formatted time values in a large dataset" or something like that. – rosscova Jun 12 '16 at 23:05
  • okay I'll do that. The thing is when I limit my function to the first row which is formatted correctly I still get the error in my original problem. Is there anything else I can try? – Deb Martin Jun 13 '16 at 00:53
  • Did you correct the formatting as per my answer? You need to change "(dataframe[z, ])$variable" to "dataframe$variable[z]". I'm also assuming you've defined z (eg: "z <- 1"). – rosscova Jun 13 '16 at 00:57