I created the following visualization in Python and would like to re-create it in R. The data is included further down, in the R code.
This is the Python code I wrote to generate the graph from the data:
```python
import matplotlib.pyplot as plt
import pandas as pd

# Load the data
portfolio_data = pd.read_excel("Data.xlsx")

# Define colors for each Therapeutic Area (TA)
ta_colors = {
    'Malaria': 'lightblue',
    'HIV': 'lightgreen',
    # Additional colors can be added for other TAs if present in the dataset
}

# Width of the bars, also used to place the diamond symbol on the right border
bar_width = 0.8

plt.figure(figsize=(12, 8))
phases = portfolio_data['Phase'].unique()

# For each phase, plot the projects, label them, color them by TA,
# add a symbol for external funding, and draw a border for NME type
for idx, phase in enumerate(phases):
    phase_data = portfolio_data[portfolio_data['Phase'] == phase]
    bottom_offset = 0
    for _, row in phase_data.iterrows():
        edge_color = 'black' if row['Type'] == 'NME' else None  # Add border if project type is NME
        plt.bar(idx, 1, width=bar_width, bottom=bottom_offset, color=ta_colors[row['TA']],
                edgecolor=edge_color, linewidth=1.2)
        plt.text(idx, bottom_offset + 0.5, row['Project'], ha='center', va='center', fontsize=10)
        # Add diamond symbol for projects with external funding, positioned on the right border of the bar
        if row['Funding'] == 'External':
            plt.text(idx + bar_width / 2, bottom_offset + 0.5, '\u25C6', ha='right', va='center',
                     fontsize=10, color='red')
        bottom_offset += 1

# Adjust x-ticks to match phase names
plt.xticks(range(len(phases)), phases)

# Create legends for the TAs, external funding and NME type separately
legend_handles_ta = [plt.Rectangle((0, 0), 1, 1, color=ta_colors[ta], label=ta) for ta in ta_colors]
legend_external_funding = [plt.Line2D([0], [0], marker='D', color='red', markersize=10,
                                      label='External Funding', linestyle='None')]
legend_nme = [plt.Rectangle((0, 0), 1, 1, edgecolor='black', facecolor='none', linewidth=1.2,
                            label='NME Type')]

# Add legends to the plot (add_artist keeps earlier legends when plt.legend is called again)
legend1 = plt.legend(handles=legend_handles_ta, title="Therapeutic Area (TA)", loc='upper left')
plt.gca().add_artist(legend1)
legend2 = plt.legend(handles=legend_external_funding, loc='upper right')
plt.gca().add_artist(legend2)
plt.legend(handles=legend_nme, loc='upper center')

plt.title('Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type')
plt.xlabel('Phase')
plt.ylabel('Number of Projects')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
```
Here is what the result looks like: [image: matplotlib output]
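Because Data.xlsx isn't attached, here is a minimal standard-library sketch of just the layout arithmetic in the matplotlib loop (no plotting). It assumes the spreadsheet holds the same five columns and rows as the read.table() block in the R code further down; DATA, parse_table and stack_layout are hypothetical names used only for illustration. Each project becomes one unit-high segment stacked within its phase in row order, so its label, NME border and funding diamond are all computed from the same row and always land on the same segment.

```python
from collections import defaultdict

# Hypothetical inline copy of the data, assumed to match Data.xlsx
DATA = """\
Project Phase Funding TA Type
Project1 I Internal Malaria NME
Project2 I Internal Malaria NME
Project3 I Internal Malaria NME
Project4 I External HIV NME
Project5 I Internal HIV NME
Project10 II Internal Malaria NME
Project11 II Internal Malaria NME
Project12 II Internal Malaria NME
Project17 II External Malaria LCM
Project18 II External HIV LCM
Project19 II Internal HIV LCM
Project20 III External Malaria NME
Project21 III Internal Malaria NME
Project22 III External Malaria LCM
Project23 III Internal HIV LCM
Project24 III External HIV NME
Project25 III Internal Malaria LCM
Project26 III External HIV LCM
Project27 III Internal HIV NME
"""


def parse_table(text):
    """Parse a whitespace-separated table with a header row into a list of dicts."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def stack_layout(rows):
    """Reproduce the segment positions computed by the matplotlib loop (no drawing)."""
    # Phase order = order of first appearance, like pandas' Series.unique()
    phases = list(dict.fromkeys(row["Phase"] for row in rows))
    next_bottom = defaultdict(int)  # running bottom_offset per phase
    layout = []
    for row in rows:
        phase = row["Phase"]
        layout.append({
            "Project": row["Project"],
            "x": phases.index(phase),                  # idx in the loop
            "bottom": next_bottom[phase],              # bottom_offset in the loop
            "border": row["Type"] == "NME",            # black edge for NME projects
            "diamond": row["Funding"] == "External",   # red diamond for external funding
        })
        next_bottom[phase] += 1
    return layout


rows = parse_table(DATA)
layout = stack_layout(rows)
for segment in layout:
    if segment["x"] == 2:  # show the Phase III bar as an example
        print(segment)
```

Running it prints the eight Phase III segments; Project24, for instance, sits at x = 2 with bottom = 4 and has both a border and a diamond.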
To replicate the output in R, I tried the following code:
```r
library(ggplot2)
library(dplyr)

portfolio_data <- read.table(text = "Project Phase Funding TA Type
Project1 I Internal Malaria NME
Project2 I Internal Malaria NME
Project3 I Internal Malaria NME
Project4 I External HIV NME
Project5 I Internal HIV NME
Project10 II Internal Malaria NME
Project11 II Internal Malaria NME
Project12 II Internal Malaria NME
Project17 II External Malaria LCM
Project18 II External HIV LCM
Project19 II Internal HIV LCM
Project20 III External Malaria NME
Project21 III Internal Malaria NME
Project22 III External Malaria LCM
Project23 III Internal HIV LCM
Project24 III External HIV NME
Project25 III Internal Malaria LCM
Project26 III External HIV LCM
Project27 III Internal HIV NME
", header = TRUE)

# Every project contributes a segment of height 1
portfolio_data <- portfolio_data %>%
  mutate(dummy = 1)

ta_colors <- c(
  Malaria = "lightblue",
  HIV = "lightgreen"
)

type_colors <- c(
  NME = "black",
  LCM = "white"
)

# Create the plot
plot <- ggplot(portfolio_data, aes(x = Phase, y = dummy, fill = TA, label = Project)) +
  geom_col() +
  # add project names as labels
  geom_text(aes(label = Project), position = position_stack(vjust = 0.5)) +
  # add borders by Type (linewidth replaces the deprecated size since ggplot2 3.4.0)
  geom_col(aes(color = Type), fill = NA, linewidth = 1) +
  # colors for TA (fill) and Type (border)
  scale_fill_manual(values = ta_colors) +
  scale_color_manual(values = type_colors) +
  # diamonds for projects with external funding (NA_character_ keeps if_else() type-stable)
  geom_text(aes(label = if_else(Funding == "External", "\u25C6", NA_character_)),
            vjust = 0.5, hjust = -6.8, color = "red", size = 5,
            position = position_stack(vjust = 0.5)) +
  # theme and labels
  labs(title = "Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type",
       x = "Phase",
       y = "Number of Projects") +
  theme_minimal()

print(plot)
```
I got the following result: [image: ggplot2 output]
The problem is that the borders are wrong. For example, Project24 is an NME project, but its border does not reflect that. It seems that the second geom_col() call stacks the projects in a different order, so the link between each project and its Type is lost. Is there a way around this? I wanted to use the built-in functionality to draw borders, but maybe I should add a separate layer with boxes around the labels instead? I also tried geom_bar(), without success. Perhaps there are even better approaches. Any help is appreciated.
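The suspected re-ordering can be illustrated with a simplified model. As far as I understand ggplot2's grouping rules, position_stack() orders the pieces of each bar by the group aesthetic, and when group isn't mapped it is built from the discrete aesthetics of each layer, with label excluded. In the code above that would mean TA (through fill) for the first geom_col() and both geom_text() layers, but Type (through colour) for the second geom_col(), since fill = NA is set as a constant there. The sketch below is plain Python, not ggplot2's actual algorithm (the real stacking direction, for example, may be reversed); slots, mismatched_slots and the key functions are hypothetical helpers. It sorts the Phase III rows by each layer's key and lists the positions where the labelled project and the project whose Type sets the border differ in Type.

```python
from operator import itemgetter

# Phase III rows from the R data: (Project, TA, Type)
PHASE_III = [
    ("Project20", "Malaria", "NME"),
    ("Project21", "Malaria", "NME"),
    ("Project22", "Malaria", "LCM"),
    ("Project23", "HIV", "LCM"),
    ("Project24", "HIV", "NME"),
    ("Project25", "Malaria", "LCM"),
    ("Project26", "HIV", "LCM"),
    ("Project27", "HIV", "NME"),
]

# Hypothetical grouping keys, one per layer in the simplified model
by_project = itemgetter(0)  # one shared key for every layer
by_ta = itemgetter(1)       # fill = TA: first geom_col() and the geom_text() layers
by_type = itemgetter(2)     # colour = Type: second geom_col()


def slots(rows, key):
    """Project at each stack slot after a stable sort by the layer's grouping key."""
    return [project for project, _ta, _type in sorted(rows, key=key)]


def mismatched_slots(rows, label_key, border_key):
    """Slots where the labelled project and the bordered project differ in Type."""
    type_of = {project: type_ for project, _ta, type_ in rows}
    label_layer = slots(rows, label_key)
    border_layer = slots(rows, border_key)
    return [
        (i, labelled, bordered)
        for i, (labelled, bordered) in enumerate(zip(label_layer, border_layer))
        if type_of[labelled] != type_of[bordered]
    ]


for i, labelled, bordered in mismatched_slots(PHASE_III, by_ta, by_type):
    print(f"slot {i}: label {labelled}, border follows {bordered}")

print(mismatched_slots(PHASE_III, by_project, by_project))  # → []
```

With the TA and Type keys, slot 1 carries the label Project24 but takes its border from Project23 (an LCM project), which is consistent with the problem described above. Sorting every layer by the same key (here the project name) leaves no mismatches; my assumption, not verified here, is that mapping group = Project in the shared aes() would have the same effect in ggplot2.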