I have a problem with my ggplot whose cause I am unable to find. I have a data frame that looks roughly as follows and wish to plot it color coded by regime:

df <- data.frame(
  CO = c(0, 4, 5, 23, 25, 28, 31, 2), 
  time = structure(c(1670585739, 1670586039, 1670586339, 1670586639, 1670586939, 1670587239, 1670587539, 1670587839), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
  regime = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L), levels = c("LA", "PAn", "So"), class = "factor")
  )
df
#>   CO                time regime
#> 1  0 2022-12-09 11:35:39    PAn
#> 2  4 2022-12-09 11:40:39    PAn
#> 3  5 2022-12-09 11:45:39    PAn
#> 4 23 2022-12-09 11:50:39     LA
#> 5 25 2022-12-09 11:55:39     LA
#> 6 28 2022-12-09 12:00:39     LA
#> 7 31 2022-12-09 12:05:39     LA
#> 8  2 2022-12-09 12:10:39     So

The full data looks as follows:

> sapply(df,class)

$CO
[1] "integer"

$time
[1] "POSIXct" "POSIXt"

$regime
[1] "factor"

> summary(df)

      CO             time                        regime   
Min.   :  0.0   Min.   :2022-12-09 11:35:39.00   LA : 431
1st Qu.:124.0   1st Qu.:2022-12-10 17:59:09.00   PAn:2384
Median :142.0   Median :2022-12-11 23:41:39.00   SoL:  37
Mean   :134.5   Mean   :2022-12-11 23:31:39.06
3rd Qu.:149.0   3rd Qu.:2022-12-13 05:24:09.00
Max.   :176.0   Max.   :2022-12-14 11:04:09.00

However, when I plot this with ggplot, the points are not colored by regime:

ggplot(df, aes(x=time, y=CO, color = factor(regime))) +
  geom_point() +
  theme(legend.position = "bottom")

gives me:

[Image: the resulting ggplot2 plot]

I am lost here since the factors of "regime" are shown in different colors at the bottom, but not color coded in the plot itself despite the fact that the class is correctly "factor" and there are no error messages whatsoever. What am I missing?

I already tried to recreate the problem with an artificial data set but could not reproduce it. Using + geom_point(color = factor(regime)) did not solve the problem either.

shs
Jono
  • 2
    Is it possible that the number of points for some levels is so small that they are being overplotted by the green level? Try adding `facet_wrap(~ regime)` and see what you get. – Roman Luštrik Feb 14 '23 at 10:07
  • Hi, I tried your suggestion; it results in three panels, not one. I added summary info. The green level is the largest, but the others should still be visible, no? – Jono Feb 14 '23 at 10:14
  • However, your suggestion shows the PAn factor to be continuous, whereas the other two factors only appear where they should be. That gives me a hint: the PAn level seems to be attached to every data point, not only the ones it should be. I have to look into that (the data set was merged from two other data frames, so maybe there is a problem during the merge?) – Jono Feb 14 '23 at 10:17
  • Have you tried your code on the 8 observations you have shared here? For those, I see points of all 3 colors on my machine. – shs Feb 14 '23 at 10:19
  • Yes, I have tried that and it works fine for me too. – Jono Feb 14 '23 at 10:24
  • 2
    @RomanLuštrik gave me the right hint to find my problem. For some reason, when I merged the datasets before plotting, the PAn factor was associated with every data point, so I have two factors for each point: PAn and the actual factor. When plotting the data, PAn is plotted on top and thus really overshadowed the other levels. Thanks! – Jono Feb 14 '23 at 10:27

1 Answer

Try this:

ggplot(df, aes(x=time, y=CO, color = regime,size=regime)) +
  geom_point(alpha=.5) +
  theme(legend.position = "bottom") + 
  scale_size_manual(values=c("LA"=5,"PAn"=1,"So"=5))

Tinker with the size and alpha values as needed.
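
If changing sizes is not enough, another option may be to control the draw order. As far as I know, geom_point() draws rows in the order they appear in the data, so sorting the dominant level to the front puts the rarer levels on top. A minimal sketch on made-up data (the toy data frame and the names df_ordered and p are my own, not from the question):

```r
library(ggplot2)

# Hypothetical stand-in for the real data: two "LA" points share their
# time stamp and CO value with "PAn" points that come later in the data.
t0 <- as.POSIXct("2022-12-09 11:35:39", tz = "UTC")
df <- data.frame(
  CO     = c(23, 25, 0, 4, 5, 23, 25),
  time   = t0 + 300 * c(3, 4, 0, 1, 2, 3, 4),
  regime = factor(c("LA", "LA", "PAn", "PAn", "PAn", "PAn", "PAn"),
                  levels = c("LA", "PAn", "So"))
)

# Put the dominant level first so it is drawn first; rows of the rarer
# levels end up last and are drawn on top of it.
df_ordered <- df[order(df$regime != "PAn"), ]

p <- ggplot(df_ordered, aes(x = time, y = CO, color = regime)) +
  geom_point() +
  theme(legend.position = "bottom")
# print(p) to display the plot
```

In the unsorted df the "PAn" points would cover the two "LA" points; in df_ordered the "LA" points should be the ones left visible.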

Warning: be consistent about whether your third regime is So or SoL, as you've shown us both.
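
Since the comments point to the merge as the culprit (time stamps that carry a PAn row in addition to their actual regime), it may be worth checking for that directly. This is only a sketch on made-up data; the names pairs, conflicted and df_clean are mine, and treating the PAn rows as the spurious ones is an assumption you should verify against your own merge:

```r
# Hypothetical stand-in for the merged data: every time stamp has a "PAn"
# row, and two time stamps also have a row for their actual regime ("LA").
t0 <- as.POSIXct("2022-12-09 11:35:39", tz = "UTC")
df <- data.frame(
  CO     = c(0, 4, 5, 23, 25, 23, 25),
  time   = t0 + 300 * c(0, 1, 2, 3, 4, 3, 4),
  regime = factor(c("PAn", "PAn", "PAn", "PAn", "PAn", "LA", "LA"),
                  levels = c("LA", "PAn", "So"))
)

# Distinct (time, regime) combinations; a time stamp that appears more
# than once here is labelled with more than one regime.
pairs <- unique(df[, c("time", "regime")])
conflicted <- unique(pairs$time[duplicated(pairs$time)])

# Inspect the affected rows
df[df$time %in% conflicted, ]

# Assumption: at conflicted time stamps the "PAn" row is the spurious one
df_clean <- df[!(df$time %in% conflicted & df$regime == "PAn"), ]
```

If conflicted is empty for your real data, overplotting from the merge is not the cause and the size/alpha approach above is the place to look.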

Nir Graham