Hi R and ggplot experts,
I have started learning R and experimenting with ggplot.
I have a use case, explained below.
Reproducible R code:
library(ggplot2)
library(ggrepel)
# Create the data frame.
sales_data <- data.frame(
  emp_name = rep(c("Sam", "Dave", "John", "Harry", "Clark",
                   "Kent", "Kenneth", "Richard", "Clement", "Toby",
                   "Jonathan"), times = 3),
  month = as.factor(rep(c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar",
                          "Jan", "Feb", "Mar", "Jan", "Jan"), times = 3)),
  dept_name = as.factor(rep(c("Production", "Services", "Support",
                              "Support", "Services", "Production", "Production",
                              "Support", "Support", "Support", "Production"),
                            times = 3)),
  revenue = rep(c(100, 200, 300, 400, 500, 600, 500, 400, 300,
                  200, 500), times = 3),
  status = rep(c("Low", "Medium", "Medium", "High", "Very High",
                 "Very High", "Very High", "High", "Medium", "Medium",
                 "Low"), times = 3)
)
# Order the months chronologically, keep the labels for the axis,
# then convert to integer codes (Jan = 1, Feb = 2, Mar = 3).
sales_data$month <- factor(sales_data$month, levels = c("Jan",
                                                        "Feb", "Mar"))
month_vector <- levels(sales_data$month)
sales_data$month <- as.integer(sales_data$month)
# Order the status levels from lowest to highest.
sales_data$status <- factor(sales_data$status, levels = c("Low",
                                                          "Medium", "High", "Very High"))
# Keep the department labels for the axis, then convert to integer codes.
dept_vector <- levels(sales_data$dept_name)
sales_data$dept_name <- as.integer(sales_data$dept_name)
ggplot(sales_data, aes(x = month, y = dept_name)) +
  geom_raster(data = expand.grid(sales_data$month, sales_data$dept_name),
              aes(x = Var1, y = Var2, width = 1, height = 1),
              fill = NA, col = 'gray50', lty = 1) +
  geom_point(aes(size = status), shape = 16,
             position = position_jitter(seed = 0), show.legend = FALSE) +
  scale_color_manual(name = "revenue") +
  geom_text(aes(label = revenue), size = 4, vjust = 1.6,
            position = position_jitter(seed = 0)) +
  theme_bw() +
  theme(
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    plot.background = element_blank(),
    axis.line = element_blank(),
    panel.border = element_blank(),
    panel.grid = element_blank(),
    axis.text = element_text(colour = "blue", face = "plain",
                             size = 11)
  ) +
  scale_x_continuous(limits = c(0.5, 3.5), expand = c(0, 0),
                     breaks = seq_along(month_vector), labels = month_vector) +
  scale_y_continuous(limits = c(0.5, 3.5), expand = c(0, 0),
                     breaks = seq_along(dept_vector), labels = dept_vector) +
  geom_hline(yintercept = as.numeric(sales_data$dept_name) + 0.5) +
  geom_vline(xintercept = as.numeric(sales_data$month) - 0.5,
             color = "grey")
Output plot:
As one can see, the points plotted by geom_point often overlap.
To solve the overlap, I have a solution in mind, but I am not sure how to implement it in R, so I need some guidance.
Step 1) Introduce a new column in the dataset (sales_data) that holds the number of points in each category combination. For example, the combination Feb and Services has 6 entries/points, so for every row belonging to that combination the new column's value should be 6.
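A minimal sketch of how Step 1 might look in base R, shown on a small toy data frame rather than on sales_data. The names `df` and `n_points` are hypothetical, and `ave()` is just one possible way to get a per-group count aligned with the original rows.

```r
# Toy data with the same two grouping columns as sales_data
# (hypothetical values, only for illustration).
df <- data.frame(
  month     = c("Feb", "Feb", "Jan", "Feb", "Jan", "Mar"),
  dept_name = c("Services", "Services", "Production", "Services",
                "Production", "Support")
)

# ave() applies FUN within each month/dept_name combination and returns
# a vector as long as the input, so every row receives its group size.
df$n_points <- ave(seq_len(nrow(df)), df$month, df$dept_name, FUN = length)

print(df)
```

The same call should work unchanged when `month` and `dept_name` are the integer codes used in the question, since `ave()` only needs them as grouping values.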
Step 2) Take the square root of the number of entries in each category combination and round it up to the next integer. For example, the combination Feb and Services has 6 points, so ceiling(sqrt(6)) = 3. The x and y range of that tile is then split into a 3 x 3 grid, and the 6 points are plotted on the first 6 of the 9 grid positions inside the tile.
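The grid placement described in Step 2 could be sketched as follows. This is only one possible reading of the idea: it assumes each tile is a unit square centred on its integer month/department code, and the helper `grid_offset()` and the columns `idx`, `x_plot` and `y_plot` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: offset of the i-th point (1-based) among n points,
# placed on the cell centres of a k x k grid inside a unit tile centred
# on (0, 0), with k = ceiling(sqrt(n)). Vectorised over i and n.
grid_offset <- function(i, n) {
  k   <- ceiling(sqrt(n))
  col <- (i - 1) %% k    # grid column, 0 .. k - 1
  row <- (i - 1) %/% k   # grid row,    0 .. k - 1
  list(dx = (col + 0.5) / k - 0.5,
       dy = (row + 0.5) / k - 0.5)
}

# Toy data: six points in tile (2, 2) and one point in tile (1, 1).
df <- data.frame(month     = c(2, 2, 2, 2, 2, 2, 1),
                 dept_name = c(2, 2, 2, 2, 2, 2, 1))

rows <- seq_len(nrow(df))
# Number of points per tile, and each point's running index within its tile.
df$n_points <- ave(rows, df$month, df$dept_name, FUN = length)
df$idx      <- ave(rows, df$month, df$dept_name, FUN = seq_along)

# Shift every point from the tile centre to its grid position.
off <- grid_offset(df$idx, df$n_points)
df$x_plot <- df$month + off$dx
df$y_plot <- df$dept_name + off$dy

print(df)
```

With columns like these, the points and labels could presumably be drawn with aes(x = x_plot, y = y_plot) instead of position_jitter(), so that every point lands on its own grid position inside the tile.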
Can someone please guide me on how this can be done? I am sure it is possible, but I am not sure how to approach it.