I am trying to connect the points (geom_point) in my ggplot with geom_path. The lines should have the same color as the fill color of the points. However, geom_path has no fill aesthetic, and color is already used for a different grouping.

I am also highlighting certain points with a black outline using

scale_color_manual(values = c("NA", "black"), labels = c("No Buy Box", "Buy Box"))

What can I do? I want to plot the dots in a different color (fill) per seller_id, highlight some of these dots with colour = black if bbox = 1, and in addition connect the dots in their fill color using geom_path. I assume there are some more general issues in how I layered the plot in terms of the subsetting. If geom_path understood fill, that would have been the easiest solution. A data snippet is at the end of this post.

Thank you!!

ggplot(data = subset(algo_pricing,bbox_product == 9200000096286280), aes(x = bbox_time2)) +
  geom_point(mapping = aes(y = price_total, colour = as.factor(bbox), fill = seller_id), shape = 21) +
  geom_line(data = subset(algo_pricing, bbox ==1 & bbox_product == 9200000096286280), 
            mapping = aes(y = bbox_price, linetype = as.factor(bbox)),colour = "black") +
  geom_path(mapping = aes(y = price_total, colour = seller_id), linetype = "dotted") +
  scale_linetype_manual(values = "dotted", labels = "Buy Box Price") +
  scale_color_manual(values = c("NA", "black"), labels = c("No Buy Box", "Buy Box"))
example <- wrapr::build_frame(
   "bbox_time2"           , "bbox_price", "price_total", "seller_id"            , "bbox", "min_price", "bbox_product" |
     as.Date("2019-01-07"), 151         , 169.9        , "linkerlisse"          , 0L    , 129.5      , 4.641e-308     |
     as.Date("2019-01-18"), 125         , 169.9        , "linkerlisse"          , 0L    , 112        , 4.641e-308     |
     as.Date("2019-01-20"), 125         , 169.9        , "goedslapennl"         , 0L    , 118.5      , 4.641e-308     |
     as.Date("2019-01-14"), 120         , 169.9        , "decoware"             , 0L    , 114.3      , 4.641e-308     |
     as.Date("2019-01-18"), 125         , 169.9        , "goedslapennl"         , 0L    , 112        , 4.641e-308     |
     as.Date("2019-01-19"), 125         , 125          , "bol.com"              , 1L    , 125        , 4.641e-308     |
     as.Date("2019-01-20"), 125         , 169.9        , "decoware"             , 0L    , 121        , 4.641e-308     |
     as.Date("2019-01-19"), 125         , 169.9        , "decoware"             , 0L    , 124.2      , 4.641e-308     |
     as.Date("2019-01-10"), 135         , 120.3        , "hetbestebeddengoed.nl", 0L    , 120.3      , 4.641e-308     |
     as.Date("2019-01-11"), 135         , 135          , "bol.com"              , 1L    , 115.5      , 4.641e-308     |
     as.Date("2018-12-31"), 151         , 151          , "bol.com"              , 1L    , 143.8      , 4.641e-308     |
     as.Date("2019-01-17"), 125         , 169.9        , "goedslapennl"         , 0L    , 116.2      , 4.641e-308     |
     as.Date("2019-01-20"), 125         , 169.9        , "goedslapennl"         , 0L    , 119.8      , 4.641e-308     |
     as.Date("2019-01-17"), 125         , 169.9        , "goedslapennl"         , 0L    , 115.5      , 4.641e-308     |
     as.Date("2019-01-22"), 112.3       , 112.3        , "hetbestebeddengoed.nl", 1L    , 112.3      , 4.641e-308     |
     as.Date("2019-01-01"), 151         , 169.9        , "linkerlisse"          , 0L    , 142.1      , 4.641e-308     |
     as.Date("2019-01-21"), 125         , 127.5        , "sleepworld"           , 0L    , 117.8      , 4.641e-308     |
     as.Date("2018-12-31"), 151         , 151          , "bol.com"              , 1L    , 142.8      , 4.641e-308     |
     as.Date("2019-01-18"), 125         , 169.9        , "smulderstextiel.nl"   , 0L    , 125        , 4.641e-308     |
     as.Date("2019-01-01"), 151         , 169.9        , "linkerlisse"          , 0L    , 141.2      , 4.641e-308     )
  • Hi - can you post your data via `dput(your.data.frame)` or at least a portion of it so that we can have a reproducible example? – chemdork123 Apr 14 '20 at 13:28
  • Sure, I forgot that. Added a snippet to the OP. – marcellobello Apr 14 '20 at 13:52
  • Thank you for posting, but unfortunately I cannot generate a sensible graph using that data (I think it's only representing a single date). Possibly would be a good idea to use `sample()` to grab a few rows from your original dataframe: (e.g. `algo_pricing[sample(1:nrow(algo_pricing), 20),]` would work to grab 20 rows randomly), but you would want to make sure your example works with your dataset you post. Additionally, what are you getting as a result of your code, and where does it seem to go wrong? Are you generating a plot that doesn't look right or are you getting an error message? – chemdork123 Apr 14 '20 at 14:12
  • Ah obviously, sorry. Made the edit above. I am getting the "Insufficient values in manual scale" error. That is because color is already defined with two values. – marcellobello Apr 14 '20 at 14:20
  • Okay - that's what I was getting too. I'll have a look. – chemdork123 Apr 14 '20 at 14:26
  • Remove your `scale_color_manual` and `scale_linetype_manual` calls. That will generate a plot, but probably it's not what you want to see. `ggplot` will combine scales when you specify the same aesthetic: so for example, you specify to apply the linetype of `geom_line` and outside color of `geom_point` to the same factor (`as.factor(bbox)`). A legend is created that indicates bbox as being either dotted with one color or solid as another color (combining the two). Remove those `scale_...` calls, post your plot, then clarify your question - probably concerning the legend and labeling. – chemdork123 Apr 14 '20 at 14:34
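
The error mentioned in the comments can be reproduced on a small made-up data frame. This is only a sketch: `toy` and its columns are assumptions for illustration, not the OP's data. The single colour scale is shared by the point outlines (levels of bbox) and the paths (levels of seller_id), so it needs one value for every level of both.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  day       = rep(1:3, times = 2),
  price     = c(10, 11, 12, 20, 19, 18),
  seller_id = rep(c("a", "b"), each = 3),
  bbox      = c(0L, 1L, 0L, 0L, 0L, 1L)
)

p <- ggplot(toy, aes(x = day, y = price)) +
  geom_point(aes(colour = as.factor(bbox), fill = seller_id), shape = 21) +
  geom_path(aes(colour = seller_id), linetype = "dotted") +
  scale_colour_manual(values = c("NA", "black"))

# Levels the single colour scale has to cover: "0", "1", "a", "b"
needed <- union(levels(as.factor(toy$bbox)), unique(toy$seller_id))

# Building the plot fails because only two colours were supplied
msg <- tryCatch(
  { ggplot_build(p); "no error" },
  error = function(e) conditionMessage(e)
)
```

Here `needed` has more entries than the two colours passed to the manual scale, which is what the error in `msg` complains about.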

2 Answers

Give this a try. It is hard to replicate your data frame, but I had similar issues and the following worked.

First define your colours and values. (I do not understand exactly what you are trying to do with the "NA" here; you need to supply colours, not NAs.) You also have one colour defined for two different layers, line and path; make sure you set it for the two separately.

Also take a look at this solution: plot below showing 2 legends when controling scale color manual


# Named vector: name = legend label, value = colour
cl <- c("Buy Box" = "black", "No Buy Box" = "blue")
ggplot(data = subset(algo_pricing, bbox_product == 9200000096286280), aes(x = bbox_time2)) +
  geom_point(mapping = aes(y = price_total, colour = ifelse(bbox == 1, "Buy Box", "No Buy Box"), fill = seller_id), shape = 21) +
  # a label inside aes() is looked up in the scale; outside aes() it would have to be a real colour
  geom_line(data = subset(algo_pricing, bbox == 1 & bbox_product == 9200000096286280),
            mapping = aes(y = bbox_price, linetype = as.factor(bbox), colour = "Buy Box")) +
  # group keeps one path per seller now that colour is a constant label
  geom_path(mapping = aes(y = price_total, group = seller_id, colour = "No Buy Box"), linetype = "dotted") +
  scale_color_manual(values = cl)

Sally_ar

That's probably the most convoluted plot of the day! To put in my two cents' worth: the more visualisation I do, the more I think that if you struggle hard to get things done with ggplot, it is possibly a sign that your visualisation is not ideal. Maybe think about reducing the number of dimensions that you want to show in your plot.

However, it is a very nice exercise to map and control aesthetics.

It requires a bit of data wrangling, factor level control, and a decision on how many points to plot per time point. I solved this by just removing duplicate entries for each seller and time. You need to decide how you will manage this.

I also added a bit of jitter to the prices, so that you can see the lines better. It distorts the values slightly, but you can change the jitter amount.

Further comments are in the code.

library(tidyverse)
example <- example %>% 
  distinct(seller_id, bbox_time2,.keep_all = TRUE) %>%
  mutate(bbox_sell  = paste(seller_id, bbox, sep = '_'),
         price_total = jitter(price_total, amount = 1)) %>%
  arrange(bbox_time2, seller_id)   # arranging is important for geom_path


ggplot( # setting the general aesthetics. You could do this in each geom call, but I am a bit lazy, so I define the main aesthetics here. 
  data = example,
  aes(
    x = bbox_time2,
    y = price_total,
    group = seller_id
  )
) +
  geom_point(
    aes(colour = seller_id) # color aesthetic matches the following geom_path.
    # also, the shape defaults to 16. This avoids messing with both fill and color aesthetic.
  ) +
  geom_path(
    aes(colour = seller_id),
    linetype = "dotted"
  ) +
  geom_path(
    data = filter(example, bbox == 1),
    aes(linetype = "box1", group = bbox_sell),
    colour = "black"   # color defined outside of aesthetic!
  ) +
  scale_linetype_manual(name = NULL, values = "dotted", labels = "Buy Box Price") +
  ggnewscale::new_scale_colour() + # now here's an option how to easily create two color scales. 
  geom_point(
    data = filter(example, bbox == 1),
    aes(color = as.character(bbox)), # you can now use a new color scale. 
    shape = 21 # using a different shape for the highlighted points
  ) +
  scale_color_manual(name = NULL, values = "black", labels = "Buy Box") 

The legend order is of course peculiar. Controlling legend order is quite a thing, and the 'usual' way of controlling it with + guides(xxx = guide_legend(order =...)) does not seem to work with ggnewscale.

Created on 2020-04-15 by the reprex package (v0.3.0)
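
If a separate legend entry for the highlighted points is not needed, a second colour scale can possibly be avoided altogether. The following is a minimal sketch, not part of the reprex above, and `toy` is a made-up data frame rather than the OP's data. The outline colour is set as a constant outside aes(), so the colour aesthetic stays free for the paths and fill carries the seller.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  day       = rep(1:3, times = 2),
  price     = c(10, 11, 12, 20, 19, 18),
  seller_id = rep(c("a", "b"), each = 3),
  bbox      = c(0L, 1L, 0L, 0L, 0L, 1L)
)

p <- ggplot(toy, aes(x = day, y = price, group = seller_id)) +
  geom_path(aes(colour = seller_id), linetype = "dotted") +
  # all points: filled by seller, no outline
  geom_point(aes(fill = seller_id), shape = 21, colour = NA, size = 3) +
  # highlighted points: same fill, constant black outline set outside aes()
  geom_point(data = subset(toy, bbox == 1), aes(fill = seller_id),
             shape = 21, colour = "black", size = 3)

built <- ggplot_build(p)
```

The trade-off is that the black outline gets no legend entry of its own, so this only fits when the highlight is explained elsewhere, for example in the caption.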

tjebo
  • `ggnewscale::new_scale_colour()` is what I was looking for all along!! This is very helpful, thank you so much for your help and the valuable comments. I do realize the plot is a bit over the top but I thought it is useful nonetheless to learn how to define several manual color scales. Cheers! – marcellobello Apr 15 '20 at 11:13