I am new to graphs and R, and I was given an exercise to analyse a dataset. I want to make a graph (facet_wrap?) that shows the distribution of profits per month for different countries. I have a subset with 19 countries, 12 months and profit values. Something like this (example):

Country   Month  Profit
Brazil    Jan     50
Brazil    fev     80
Brazil    mar     15
Austria   Jan     35
Austria   fev     80
Austria   mar     47
France    Jan     21
France    fev     66
France    mar     15
[...]
Germany   Dez     40 

I have played around with the graph a bit, but I am still struggling to understand how it works. So far I have:

library(ggplot2)
library(scales) # provides comma() for the legend labels

test <- ggplot(sub, aes(x = Month, y = Profit, fill = Profit)) +
  geom_bar(stat = 'identity') +
  facet_wrap(~Country) +
  scale_fill_gradient(low = "red", high = "green", name = "Profit grade", labels = comma) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  theme_bw()

which comes out like this: enter image description here

There are a few problems, though, that I can't understand.

  1. Why aren't all the bars starting from the same line? They are not straight; is that normal? Or why is the x axis fluctuating? (Check France for a clear picture.)
  2. How can I fill it correctly? On my graph, when I use fill, the shading varies within each bar and not between different bars (which would make more sense). Could you explain what this filling (from the picture) could mean?
  3. Would there be another/better way to present such information? I wanted to try a line graph, like the mountain graphs about economics shown on TV, but I don't know how to do it. I tried some points, but it didn't make sense to me (or I did it wrong, haha).
  • 0) Your months are stored as character data, so they appear alphabetically. You probably want them as ordered factors. 1) You have multiple values per month-country combination. Perhaps different years or different industries? Some of your Profit values are negative, so those components are plotted going down from zero. – Jon Spring Jan 26 '22 at 00:09
  • 2) The fill is colouring the different components in your data. `sub` seems to have at least half a dozen rows for each Country-month combination, and these are stacked on top of each other by default. – Jon Spring Jan 26 '22 at 00:11
  • @JonSpring about the month (0), you're right! I kind of forgot this line; I reordered it and it worked fine! Thanks =) 1. Oh... I see. Well, the profit refers to the sum of 20 different Products! Do you know how I can show it as a total, or another way to represent it? Thank you for your answer! – lkasquilici Jan 26 '22 at 09:14

1 Answer

A larger sample of the data would probably help, but the snippet is sufficient to work with. Regarding your questions:

First, let's understand the graph: we are looking at profit per country and month (the x axis is crowded). We see multiple occurrences of each month-country combination. The reason could be missing year information, or an additional variable that is not shown and is therefore unknown. Most probably the year information was lost somewhere in your data analysis. You may have to control for that in your chart.

  1. Since we are looking at profit, it is quite possible that some values are indeed negative (the full data is needed to confirm, but you could sort ascending or filter for values < 0 to check).

  2. The fill reflects the unknown variable (so some months in some countries, possibly in some years, are indeed negative).

  3. Let's mock up some data from your snippet (year included).

library(tidyverse) # you could call only ggplot2 and dplyr
# read data from plain text as data.table/data.frame
dt <- data.table::fread('Year  Country   Month  Profit
2012   Brazil    Jan     50
2012   Brazil   fev     80
2012   Brazil    mar     15
2012   Austria   Jan     35
2012   Austria   fev     80
2012   Austria   mar     47
2012   France    Jan     21
2012   France    fev     66
2012   France    mar     15
2012   Germany   Dez     40
2013   Brazil    Jan     50
2013   Brazil   fev     80
2013   Brazil    mar     15
2013   Austria   Jan     35
2013   Austria   fev     80
2013   Austria   mar     47
2013   France    Jan     21
2013   France    fev     66
2013   France    mar     15
2013   Germany   Dez     40')

# the months have to be adjusted so the date parser recognises them - the abbreviations look Portuguese (Brazilian); a reference table is here:
# http://metodologia.lilacs.bvsalud.org/docs/pt/tabela-abreviatura-meses.htm
dtc <- dt %>%
    dplyr::mutate(Month = stringr::str_to_lower(Month), # text to lower case - fewer substitutions needed
                  # correct months for date parsing (the lower-cased Month from the previous line is overwritten here)
                  # only 'fev' and 'dez' occur in this mock data; the full data will likely need more mappings (e.g. 'abr', 'mai', 'ago', 'set', 'out')
                  Month = dplyr::case_when(Month == 'fev' ~ 'feb',
                                           Month == 'dez' ~ 'dec',
                                           TRUE ~ Month),
                  # parse year + month with lubridate - yields the first of each month, which helps to organise the data as a timeline
                  newdate = lubridate::ym(paste(Year, Month)))

# a very simple point chart that you can convert to a line chart
dtc %>%
    ggplot2::ggplot(aes(newdate, Profit)) +
    ggplot2::geom_point() +
    # ggplot2::geom_line() +
    ggplot2::facet_wrap(~Country) +
    ggplot2::theme(axis.text.x = element_text(angle = 90))

enter image description here
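
Since the question mentions the "mountain graphs" seen on TV, here is a minimal sketch of an area chart with a line on top. It assumes there is already exactly one profit value per country and month. The data frame `mock` and its 2015 dates are made up for illustration (the profit values are borrowed from the question's snippet), and the colours are arbitrary choices.

```r
library(ggplot2)

# Hypothetical mock data: one profit value per country and month
mock <- data.frame(
  Country = rep(c("Brazil", "Austria", "France"), each = 3),
  newdate = rep(as.Date(c("2015-01-01", "2015-02-01", "2015-03-01")), times = 3),
  Profit  = c(50, 80, 15, 35, 80, 47, 21, 66, 15)
)

# Filled area ("mountain") with the profit line drawn on top, one panel per country
area_plot <- ggplot(mock, aes(x = newdate, y = Profit)) +
  geom_area(fill = "steelblue", alpha = 0.4) +
  geom_line(colour = "steelblue") +
  facet_wrap(~Country) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  theme_bw()

area_plot
```

With several rows per country and month (e.g. one per product), the areas would be stacked, so the data would need to be summed first.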

# or the option of a flipped column chart
dtc %>%
    ggplot2::ggplot(aes(newdate, Profit)) +
    ggplot2::geom_col() +
    ggplot2::facet_wrap(~Country) +
    ggplot2::coord_flip()

enter image description here
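
If the repeated month-country rows turn out to be different products rather than years (as the comments suggest), one option is to sum them into a single total before plotting. The following is only a sketch under that assumption: `sub_mock`, its `Product` column and all its values are invented for illustration, and `MarketProfit` is just a name for the summed column. It also shows the check for negative values and the ordered month factor mentioned in the comments.

```r
library(dplyr)
library(ggplot2)

# Hypothetical mock data: two products per country and month (values invented)
sub_mock <- data.frame(
  Country = rep(c("Brazil", "Austria"), each = 6),
  Month   = rep(rep(c("Jan", "Feb", "Mar"), each = 2), times = 2),
  Product = rep(c("A", "B"), times = 6),
  Profit  = c(50, -10, 80, 5, 15, -20, 35, 10, 80, -30, 47, 3)
)

# Rows with negative profit: these are the segments drawn below zero
negatives <- dplyr::filter(sub_mock, Profit < 0)

# One total per country and month, with months as a factor in calendar order
totals <- sub_mock %>%
  mutate(Month = factor(Month, levels = c("Jan", "Feb", "Mar"))) %>%
  group_by(Country, Month) %>%
  summarise(MarketProfit = sum(Profit), .groups = "drop")

# One bar per month; the fill now varies between bars, not within them
ggplot(totals, aes(x = Month, y = MarketProfit, fill = MarketProfit)) +
  geom_col() +
  facet_wrap(~Country) +
  theme_bw()
```

For the real data the `levels` vector would list all twelve month abbreviations in the order and spelling used in the dataset.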

DPH
  • HEY @DPH! thank you for your answer! Yeah, there is another "missing" column! The dataset is all from the year of 2015 and is about 20 different Products. My idea was to summarize them as a MarketProfit per Month. I will go later through your code and see if I can edit something! Thank you so far :) – lkasquilici Jan 26 '22 at 09:20