
I am plotting the data below and want to include one line per tech, without using facet_wrap. The plot should show both tech and file count per enm:

Date emmm task coll_task tech filescount
2023-06-01 vnmenm1 tr e3g_tr_vnmenm1 3g 1136
2023-06-01 vnmenm1 tr e4g_tr_vnmenm1 4g 3475
2023-06-01 vnmsgsn1 tr e4g_tr_vnmsgsn1 4g 317
2023-06-03 vnmenm1 tr e3g_tr_vnmenm1 3g 1136
2023-06-03 vnmenm1 tr e4g_tr_vnmenm1 4g 8899
2023-06-03 vnmsgsn1 tr e4g_tr_vnmsgsn1 4g 296
2023-06-04 vnmenm1 tr e3g_tr_vnmenm1 3g 1136
2023-06-04 vnmenm1 tr e4g_tr_vnmenm1 4g 9034
2023-06-04 vnmsgsn1 tr e4g_tr_vnmsgsn1 4g 292

Using the code below:

data %>%
    group_by(Date, emmm, tech, filescount) %>%
    summarize(filescount = sum(filescount)) %>%
    ggplot(aes(Date, emmm, tech, color = filescount)) +
    geom_point(size = 2.4, alpha = 0.5) +
    geom_line(aes(x = Date, y = emmm), size = 1, alpha = 0.4, stat = "identity", na.rm = TRUE)

This gives a plot like the following: [image]

Is there any way to show both tech and filescount, i.e. one geom_line per tech plotted against filescount?

leoin86

1 Answer

library(ggplot2)
library(dplyr)

# Your given test data as a dataframe
df <- data.frame(
    Date = as.Date(c("2023-06-01", "2023-06-01", "2023-06-01", "2023-06-03", "2023-06-03", "2023-06-03", "2023-06-04", "2023-06-04", "2023-06-04")),
    emmm = c("vnmenm1", "vnmenm1", "vnmsgsn1", "vnmenm1", "vnmenm1", "vnmsgsn1", "vnmenm1", "vnmenm1", "vnmsgsn1"),
    task = rep("tr", 9),
    coll_task = c("e3g_tr_vnmenm1", "e4g_tr_vnmenm1", "e4g_tr_vnmsgsn1", "e3g_tr_vnmenm1", "e4g_tr_vnmenm1", "e4g_tr_vnmsgsn1", "e3g_tr_vnmenm1", "e4g_tr_vnmenm1", "e4g_tr_vnmsgsn1"),
    tech = c("3g", "4g", "4g", "3g", "4g", "4g", "3g", "4g", "4g"),
    filescount = c(1136, 3475, 317, 1136, 8899, 296, 1136, 9034, 292)
)

# Create a dataframe to plot
df_sum <- df %>%
    group_by(emmm, tech, Date) %>%
    summarize(total_files = sum(filescount), .groups = 'drop')

# Plot the data
# interaction(emmm, tech) gives one line, color, and linetype per emmm/tech pair
ggplot(df_sum, aes(
    x = Date,
    y = total_files,
    group = interaction(emmm, tech),
    color = interaction(emmm, tech),
    linetype = interaction(emmm, tech)
)) +
    geom_line() +
    geom_point() +
    labs(x = "Date", y = "Files per Tech", color = "Tech", linetype = "Tech") +
    scale_color_discrete(name = "Emmm - Tech", labels = c("vnmenm1 - 3g", "vnmenm1 - 4g", "vnmsgsn1 - 4g")) +
    scale_linetype_discrete(name = "Emmm - Tech", labels = c("vnmenm1 - 3g", "vnmenm1 - 4g", "vnmsgsn1 - 4g")) +
    theme_minimal()

Output:

[image]

BrJ
  • The requirement is to plot per emmm, and then per tech and filecount. – leoin86 Jun 06 '23 at 08:54
  • Not 100% sure if this is what you mean, but I changed my answer. Let me know if this is what you mean. – BrJ Jun 06 '23 at 09:24
  • Amazing, this is what I meant. scale_color_discrete and scale_linetype_discrete are hardcoded, so I am looking into automating the labels, since the emmm names always change. – leoin86 Jun 06 '23 at 11:20
  • I added automatic labels: apply(unique(df_sum[c("enm", "tech")]), 1, function(x) paste(paste(x[1], x[2] , sep='-'))) – leoin86 Jun 06 '23 at 12:59
  • Is there any way to remove the blank date space at Jun 02? – leoin86 Jun 06 '23 at 15:13
  • You could use 'scale_x_date(labels = function(x) format(x, "%b%d")) +' in your plot to format your date without a space (or however you want to format the date). – BrJ Jun 07 '23 at 11:33
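The two follow-ups in the comments (labels that are not hardcoded, and no empty slot for Jun 02) could be handled roughly as sketched below. This is only a sketch, not the answerer's method: it assumes a data frame shaped like df_sum from the answer, and the columns `series` and `date_lbl` are hypothetical names introduced here for illustration. It swaps the continuous date axis for a discrete one, which is a different technique from the scale_x_date suggestion above.

```r
library(ggplot2)
library(dplyr)

# Same shape as df_sum in the answer (values copied from the question's data)
df_sum <- data.frame(
    Date = as.Date(rep(c("2023-06-01", "2023-06-03", "2023-06-04"), each = 3)),
    emmm = rep(c("vnmenm1", "vnmenm1", "vnmsgsn1"), times = 3),
    tech = rep(c("3g", "4g", "4g"), times = 3),
    total_files = c(1136, 3475, 317, 1136, 8899, 296, 1136, 9034, 292)
)

# `series` and `date_lbl` are hypothetical column names, not from the original post
df_plot <- df_sum %>%
    mutate(
        # Legend label built from the data, so new emmm names need no code change
        series = paste(emmm, tech, sep = " - "),
        # A discrete axis only has slots for dates present in the data;
        # levels are taken from the sorted dates to keep chronological order
        date_lbl = factor(
            format(Date, "%b %d"),
            levels = unique(format(sort(Date), "%b %d"))
        )
    )

# Using the same legend title for color and linetype merges them into one legend
p <- ggplot(df_plot, aes(
    x = date_lbl,
    y = total_files,
    group = series,
    color = series,
    linetype = series
)) +
    geom_line() +
    geom_point() +
    labs(x = "Date", y = "Files per Tech", color = "Emmm - Tech", linetype = "Emmm - Tech") +
    theme_minimal()

print(p)
```

One trade-off to keep in mind: with a discrete axis the dates are spaced evenly, so the distance between points no longer reflects elapsed time. Month abbreviations from "%b" also depend on the R session's locale.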