
My dataframes sometimes contain NA values. These were originally blanks, strings such as 'BAD', or literal 'NA' entries in the imported .csv file. I have converted every column in my dataframes to numeric, which turns all non-numeric entries into NA. So far, so good.
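For reference, the conversion step can be sketched in base R as below. The data frame `raw` and its values are made-up stand-ins for an imported .csv file, and `to_numeric()` is a hypothetical helper name. As an alternative, read.csv() can treat these strings as missing at import time through its `na.strings` argument (e.g. `na.strings = c("", "NA", "BAD")`), which avoids the coercion warnings entirely.

```r
# Hypothetical raw import: blanks, 'BAD' and 'NA' strings mixed with numbers
raw <- data.frame(
  Time = c("0", "1", "2", "3", "4"),
  pH   = c("7.1", "", "BAD", "NA", "6.9"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric; anything that is not a number becomes NA.
# as.character() first guards against factor columns, where as.numeric()
# would return the internal level codes instead of the printed values.
to_numeric <- function(df) {
  df[] <- lapply(df, function(col) suppressWarnings(as.numeric(as.character(col))))
  df
}

df <- to_numeric(raw)
str(df)
```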

I know I can plot dataframe 'df' like this so that a line is always drawn between data points, with no gaps:

library(ggplot2)
ggplot(na.omit(df), aes(x = Time, y = pH)) +   # na.omit() drops rows containing NA
  geom_line()
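To see why this works, here is a small sketch on made-up data (the values below are hypothetical). geom_line() breaks the line wherever y is NA; once those rows are removed, the remaining points are consecutive and get joined.

```r
library(ggplot2)

# Hypothetical dummy data with a missing pH reading at Time 2
df <- data.frame(Time = 0:4, pH = c(7.1, 7.0, NA, 6.8, 6.9))

# With the NA row present, geom_line() leaves a gap between Time 1 and Time 3
p_gap <- ggplot(df, aes(x = Time, y = pH)) +
  geom_line()

# After na.omit(), Time 1 is joined directly to Time 3
p_joined <- ggplot(na.omit(df), aes(x = Time, y = pH)) +
  geom_line()

print(p_joined)
```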

However, sometimes I want to plot two or more dataframes in a single ggplot2 plot. The x axis (Time) measures the same quantity in every dataframe, but the actual time values differ between them. I had a lot of trouble merging these dataframes because they have different numbers of rows; otherwise I would merge them, melt the result and use ggplot2 as usual to make a line plot with multiple lines.
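A common alternative to merging is to stack the dataframes instead: rename the columns so they match, add a label column, and rbind() them. Stacking only requires matching column names, not matching row counts. The sketch below assumes each dataframe has one time column and one pH column; the dummy data, the `standardise()` helper and the `Series` labels are hypothetical.

```r
library(ggplot2)

# Hypothetical dummy data: four series with different times and row counts
df1 <- data.frame(Time1 = c(0, 1, 2, 3),         pH1 = c(7.0, NA, 6.8, 6.7))
df2 <- data.frame(Time2 = c(0, 1.5, 3),          pH2 = c(7.2, 7.1, NA))
df3 <- data.frame(Time3 = c(0.5, 1, 2, 2.5, 3),  pH3 = c(6.9, 6.9, 6.8, NA, 6.6))
df4 <- data.frame(Time4 = c(0, 2),               pH4 = c(7.3, 7.0))

# Give each data frame the same column names plus a label for the legend
standardise <- function(df, time_col, ph_col, label) {
  data.frame(Time = df[[time_col]], pH = df[[ph_col]], Series = label)
}

# Stack into one long data frame
long <- rbind(
  standardise(df1, "Time1", "pH1", "Variable 1"),
  standardise(df2, "Time2", "pH2", "Variable 2"),
  standardise(df3, "Time3", "pH3", "Variable 3"),
  standardise(df4, "Time4", "pH4", "Variable 4")
)

# A single na.omit() now covers every series
long <- na.omit(long)

# One geom_line() call; colour by Series gives one line and legend entry each
p <- ggplot(long, aes(x = Time, y = pH, colour = Series)) +
  geom_line()
print(p)
```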

I have since learnt that you can plot several dataframes in one ggplot by supplying each one at the 'geom level':

ggplot() + 
  geom_line(data = df1, aes(x = Time1, y = pH1), colour = 'green') + 
  geom_line(data = df2, aes(x = Time2, y = pH2), colour = 'red') +
  geom_line(data = df3, aes(x = Time3, y = pH3), colour = 'blue') +
  geom_line(data = df4, aes(x = Time4, y = pH4), colour = 'yellow')

However, how can I now make sure the NA values are omitted so that the lines are connected? The plot itself works, but each of my 4 lines has gaps where the NA values are.
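For what it's worth, na.omit() can also be applied directly inside each layer's `data` argument. One likely pitfall: geom_line()'s first positional argument is `mapping`, not `data`, so the dataframe has to be passed by name as `data =`. A minimal sketch with two hypothetical dummy dataframes:

```r
library(ggplot2)

# Hypothetical dummy data standing in for df1 and df2
df1 <- data.frame(Time1 = 0:4, pH1 = c(7.0, 6.9, NA, 6.8, 6.7))
df2 <- data.frame(Time2 = c(0, 2, 4), pH2 = c(7.2, NA, 7.0))

# Each layer gets its own NA-free data; `data =` must be named because
# the first positional argument of geom_line() is `mapping`
p <- ggplot() +
  geom_line(data = na.omit(df1), aes(x = Time1, y = pH1), colour = "green") +
  geom_line(data = na.omit(df2), aes(x = Time2, y = pH2), colour = "red")
print(p)
```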

I am new to R, but enjoying it so far and realise there are usually multiple solutions to an issue. Any help or advice appreciated.

EDIT (for anyone who later sees this)

So, after playing around for 30 minutes, I realised I could first apply na.omit() to each dataframe separately, assign the results to new objects, and then plot those in ggplot instead. This works fine. The code above was also wrong if I wanted a proper legend.

New, correct code:

df1.omit <- na.omit(df1)
df2.omit <- na.omit(df2)
df3.omit <- na.omit(df3)
df4.omit <- na.omit(df4)

ggplot() + 
  geom_line(data = df1.omit, aes(x = Time1, y = pH1, colour = "Variable 1")) + 
  geom_line(data = df2.omit, aes(x = Time2, y = pH2, colour = "Variable 2")) +
  geom_line(data = df3.omit, aes(x = Time3, y = pH3, colour = "Variable 3")) +
  geom_line(data = df4.omit, aes(x = Time4, y = pH4, colour = "Variable 4"))
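If the original green/red/blue/yellow colours are still wanted with this legend, one option is scale_colour_manual() with a named vector keyed by the same labels used inside aes(). The sketch below uses small hypothetical stand-ins for the `.omit` dataframes; labs() only tidies the axis and legend titles.

```r
library(ggplot2)

# Hypothetical stand-ins for df1.omit ... df4.omit (already NA-free)
df1.omit <- data.frame(Time1 = c(0, 1, 3),      pH1 = c(7.0, 6.9, 6.7))
df2.omit <- data.frame(Time2 = c(0, 1.5),       pH2 = c(7.2, 7.1))
df3.omit <- data.frame(Time3 = c(0.5, 1, 2),    pH3 = c(6.9, 6.9, 6.8))
df4.omit <- data.frame(Time4 = c(0, 2),         pH4 = c(7.3, 7.0))

# Labels inside aes() create the legend; the named values pick the colours
p <- ggplot() +
  geom_line(data = df1.omit, aes(x = Time1, y = pH1, colour = "Variable 1")) +
  geom_line(data = df2.omit, aes(x = Time2, y = pH2, colour = "Variable 2")) +
  geom_line(data = df3.omit, aes(x = Time3, y = pH3, colour = "Variable 3")) +
  geom_line(data = df4.omit, aes(x = Time4, y = pH4, colour = "Variable 4")) +
  scale_colour_manual(values = c("Variable 1" = "green", "Variable 2" = "red",
                                 "Variable 3" = "blue",  "Variable 4" = "yellow")) +
  labs(x = "Time", y = "pH", colour = NULL)
print(p)
```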
Ronak Shah
WilldoesR
  • Hi there and welcome to stackoverflow. Does it not work if you use the `na.omit(df*)` at the geom level? If not, could you include some dummy data for us to try out your code? – teunbrand Feb 04 '21 at 13:36
  • Thank you. See my edit for the solution. I tried using na.omit at the geom level but it didn't seem to work, although I was unsure whether my syntax was correct. – WilldoesR Feb 06 '21 at 12:33
  • Feel free to post it as an answer to your own question. That way, people who search for similar problems can see that this question has an answer. – teunbrand Feb 06 '21 at 12:38

1 Answer


So, after playing around for 30 minutes, I realised I could first apply na.omit() to each dataframe separately, assign the results to new objects, and then plot those in ggplot instead. This works fine. The code above was also wrong if I wanted a proper legend.

df1.omit <- na.omit(df1)
df2.omit <- na.omit(df2)
df3.omit <- na.omit(df3)
df4.omit <- na.omit(df4)

ggplot() + 
  geom_line(data = df1.omit, aes(x = Time1, y = pH1, colour = "Variable 1")) + 
  geom_line(data = df2.omit, aes(x = Time2, y = pH2, colour = "Variable 2")) +
  geom_line(data = df3.omit, aes(x = Time3, y = pH3, colour = "Variable 3")) +
  geom_line(data = df4.omit, aes(x = Time4, y = pH4, colour = "Variable 4"))
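One caveat: na.omit() removes a row if any column in it is NA, not only the columns being plotted. If the dataframes carry other columns, a sketch like the one below keeps every point where the plotted columns are present, using base R's complete.cases(). The `Temp1` column and the `keep_plotted()` helper are hypothetical.

```r
# Hypothetical df1 with an extra, unplotted column that has its own NA
df1 <- data.frame(
  Time1 = 0:4,
  pH1   = c(7.0, 6.9, NA, 6.8, 6.7),
  Temp1 = c(20, NA, 21, 21, 22)   # hypothetical column, not plotted
)

# na.omit() checks every column, so it also drops Time1 = 1
nrow(na.omit(df1))   # 3

# Check only the plotted columns instead
keep_plotted <- function(df, cols) df[complete.cases(df[cols]), , drop = FALSE]
df1.plot <- keep_plotted(df1, c("Time1", "pH1"))
nrow(df1.plot)       # 4
```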
WilldoesR