Let's say I have this:

Customer Server Size
Cus_a    Ser_1  3
Cus_a    Ser_1  4
Cus_a    Ser_2  2
Cus_b    Ser_2  1
Cus_b    Ser_2  3
Cus_b    Ser_2  2
Cus_c    Ser_2  4
Cus_c    Ser_2  1
Cus_c    Ser_3  4

I need to aggregate this into a new data frame that shows the total size of every customer on each server, like:

Cus_a Ser_1 7
Cus_a Ser_2 2
Cus_b Ser_2 6
Cus_c Ser_2 5
Cus_c Ser_3 4

After that, I need to put everything in a geom_col that shows the bars visually grouped by server. :) So again: one client can appear as more than one bar in the chart if it is located on more than one server.

Thank you very much

Yavor I

2 Answers

Another solution, where the aggregation is done in base R, is this:

df3 <- aggregate(df$Size, list(df$Customer, df$Server), sum)

Note the changed column names:

df3
  Group.1 Group.2 x
1   Cus_a   Ser_1 7
2   Cus_a   Ser_2 2
3   Cus_b   Ser_2 6
4   Cus_c   Ser_2 5
5   Cus_c   Ser_3 4

For convenience, rename the columns using the column names in df:

names(df3) <- names(df)

Now draw the stacked barplot:

library(ggplot2)
ggplot(df3) + aes(x = Server, y = Size, fill = Customer) + geom_col()
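
As a side note, the renaming step can probably be avoided with the formula interface of aggregate, which keeps the original column names. This is only a minimal sketch: it rebuilds the sample data from the question so that it runs on its own, and it reuses the name df3 from above.

```r
# Sample data from the question, rebuilt here so the block is self-contained
df <- data.frame(
  Customer = c("Cus_a", "Cus_a", "Cus_a", "Cus_b", "Cus_b",
               "Cus_b", "Cus_c", "Cus_c", "Cus_c"),
  Server   = c("Ser_1", "Ser_1", "Ser_2", "Ser_2", "Ser_2",
               "Ser_2", "Ser_2", "Ser_2", "Ser_3"),
  Size     = c(3L, 4L, 2L, 1L, 3L, 2L, 4L, 1L, 4L)
)

# Formula interface: the left-hand side is summed,
# the right-hand side lists the grouping columns
df3 <- aggregate(Size ~ Customer + Server, data = df, FUN = sum)

# The result keeps the names Customer, Server and Size
names(df3)
```

With this form, df3 can be passed to ggplot directly, without the names(df3) <- names(df) step.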

Chris Ruehlemann
  • I receive an error: Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector – Yavor I Apr 05 '20 at 09:25
  • Thank you. Now the code works even with my data :) The problem is still the graph itself: in your example the bar is the customer (80 in total) and the colouring is per vault. This is almost what I need. If I had a smaller number of customers, just swapping x and y would be sufficient, because it would be clear which customers fill each vault. But there are too many of them, so there will be no space for labels, and colour coding with 80 different colours will not be readable. Please check the ugly example here: https://drive.google.com/open?id=1X4MWIU4kpEU4ejbEg9itO4eg1eybJHXA – Yavor I Apr 05 '20 at 15:05
  • I'm not quite sure I understand. Perhaps, if the number of `Server` values is smaller than the number of customers, you could do this: `ggplot(df3) + aes(x = Customer, y = Size, fill = Server) + geom_col()`. Please try it out and let me know if it works for you. – Chris Ruehlemann Apr 05 '20 at 17:02
  • Yes, I already did that, but in that case the focus is on where each customer keeps their data. That is useful, and I will keep this one. But I also need the opposite: the content of each server in terms of clients. I need bars of clients, but visually grouped by server. In this case there will be more than one bar per customer, because sometimes a customer is present on 2 or 3 different servers at the same time. – Yavor I Apr 05 '20 at 17:12
  • "I need bars of clients, but visually grouped by server."--that's what the stacked bar plot shows. So, I'm not sure I understand your problem. – Chris Ruehlemann Apr 05 '20 at 17:17
  • The same, but separated into different bars next to each other and grouped by server. Imagine 30 clients on a single server (we have such cases): if they are placed in a single bar, it will be very difficult to tell which one is which just from the colors. – Yavor I Apr 06 '20 at 08:51
  • Is there perhaps a possibility to group the customers into, say, two or three large groups and, based on this grouping, draw two or three separate stacked barplots? – Chris Ruehlemann Apr 06 '20 at 11:13
  • Maybe I just need to produce separate charts per server, as there are only 6 servers but 80 clients. Thank you very much for your help! I clicked UP, but I have less than 15 reputation so it's not visible :) – Yavor I Apr 06 '20 at 15:21
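
The idea from the comments above (one labelled bar per client, with the bars grouped by server) could be sketched with facets instead of fill colours. This is an illustration under assumptions, not a tested solution for the real 80-client data: it assumes ggplot2 3.3.0 or later (where geom_col draws horizontal bars when y is discrete), it rebuilds the aggregated df3 by hand, and the object name p is arbitrary.

```r
library(ggplot2)

# Aggregated totals from the question, typed in so the block runs on its own
df3 <- data.frame(
  Customer = c("Cus_a", "Cus_a", "Cus_b", "Cus_c", "Cus_c"),
  Server   = c("Ser_1", "Ser_2", "Ser_2", "Ser_2", "Ser_3"),
  Size     = c(7, 2, 6, 5, 4)
)

# One horizontal bar per customer, one panel per server.
# scales = "free_y" drops customers that are absent from a server;
# space = "free_y" makes each panel as tall as its number of customers.
p <- ggplot(df3, aes(x = Size, y = Customer)) +
  geom_col() +
  facet_grid(Server ~ ., scales = "free_y", space = "free_y")

p
```

Because every bar carries its customer name on the axis, no colour legend is needed, and a customer stored on several servers simply appears once in each of those panels.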
You can try this:

library(dplyr)
library(ggplot2)

df %>%
  group_by(Customer, Server) %>%
  summarise(Size = sum(Size)) %>%
  ggplot() + aes(x = Server, y = Size, fill = Customer) + geom_col()

data

df <- structure(list(Customer = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("Cus_a", "Cus_b", "Cus_c"), class = "factor"), 
Server = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("Ser_1", 
"Ser_2", "Ser_3"), class = "factor"), Size = c(3L, 4L, 2L, 
1L, 3L, 2L, 4L, 1L, 4L)), class = "data.frame", row.names = c(NA,-9L))
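
If side-by-side bars are preferred over stacked ones, a possible variant is position = "dodge". The sketch below is self-contained, so it rebuilds the same data with plain character columns; the names totals and p are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Same values as the data above, as plain character columns
df <- data.frame(
  Customer = c("Cus_a", "Cus_a", "Cus_a", "Cus_b", "Cus_b",
               "Cus_b", "Cus_c", "Cus_c", "Cus_c"),
  Server   = c("Ser_1", "Ser_1", "Ser_2", "Ser_2", "Ser_2",
               "Ser_2", "Ser_2", "Ser_2", "Ser_3"),
  Size     = c(3L, 4L, 2L, 1L, 3L, 2L, 4L, 1L, 4L)
)

# Total size per customer and server
totals <- df %>%
  group_by(Customer, Server) %>%
  summarise(Size = sum(Size)) %>%
  ungroup()

# position = "dodge" places the customers of one server next to each other
p <- ggplot(totals, aes(x = Server, y = Size, fill = Customer)) +
  geom_col(position = "dodge")

p
```

This still relies on fill colours to tell customers apart, so it is likely to help only when each server holds a small number of customers.
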
Ronak Shah
  • Thank you. I am trying to understand why your code works but does not when I implement it with my data :). Please give me some time to investigate. Regarding the chart: this will work with a small number of clients, but in my case there are more than 70-80 in total, and simple color coding will not be readable. That's why I am imagining this with separate (but grouped by server) clients, with a label on each bar. Ugly, but readable. It will be a tall chart with horizontal bars, so that one can scroll down. – Yavor I Apr 05 '20 at 09:08
  • @YavorI What error do you get? Can you check with the data that I have shared at the end of my post and see if it works with that data? – Ronak Shah Apr 05 '20 at 09:41
  • Yes, your data is fine, which means I am doing something wrong: `ggplot(df2) + aes(x = Server, y = x, fill = Customer) + geom_col()` gives Error: `data` must be a data frame, or other object coercible by `fortify()`, not an integer vector Run `rlang::last_error()` to see where the error occurred. And: > source('C:/00-HOME/R/backups_byclientandvault.R', echo=TRUE) Error in source("C:/00-HOME/R/backups_byclientandvault.R", echo = TRUE) : C:/00-HOME/R/backups_byclientandvault.R:38:34: unexpected '=' 37: group_by(backups$CustomerName, backups$Vault) %>% 38: summarise(backups$OriginalSize = – Yavor I Apr 05 '20 at 11:35
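
Regarding the `unexpected '='` error in the last comment: inside a function call R only accepts a plain name on the left of `=`, so `summarise(backups$OriginalSize = ...)` cannot be parsed. Inside group_by and summarise the columns are normally referred to by their bare names, without `backups$`. A minimal sketch of that pattern follows. The column names CustomerName, Vault and OriginalSize are taken from the comment; the rows of backups and the result name backups_total are made up for illustration.

```r
library(dplyr)

# Made-up stand-in for the real `backups` data;
# only the column names come from the comment above
backups <- data.frame(
  CustomerName = c("Cus_a", "Cus_a", "Cus_b"),
  Vault        = c("Ser_1", "Ser_1", "Ser_2"),
  OriginalSize = c(3, 4, 1)
)

backups_total <- backups %>%
  group_by(CustomerName, Vault) %>%                # bare column names, no backups$
  summarise(OriginalSize = sum(OriginalSize)) %>%  # plain name on the left of `=`
  ungroup()

backups_total
```

The other message (`data` must be a data frame ... not an integer vector) says that the object passed to ggplot was not a data frame at that point, so it may be worth checking it with class() before plotting.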