Reference code and image below: [image]

I have a dataframe that is grouped by company name that looks like so:


Company | tweet
AMZN      @115827 Thanks for your patience.
AMZN      @115826 I'm sorry for the wait. You'll receive an email as soon as possible.
APPL      @115818 Glad to check.
APPL      @115853 Happy to assist any way I can.
DASH      @116109 We have followed up via DM.
DASH      @116269 We've been in touch via DM!

After splitting the tweet field into individual words with the code below, I ended up with one row for each word found per tweet. An example of the new table follows the code.

CODE

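library(dplyr)     # %>%, mutate, select, anti_join
library(stringr)   # str_remove_all
library(tidytext)  # unnest_tokens, stop_words

Supp_cleaned_tweets <- Customer_df %>%
  # drop @mentions such as "@115827" together with any leading whitespace
  mutate(Cleaned_Tweet = str_remove_all(tweet, "\\s*@\\S+")) %>%
  select(Company, Cleaned_Tweet) %>%
  mutate(line = row_number()) %>%
  unnest_tokens(word, Cleaned_Tweet) %>%
  anti_join(stop_words, by = "word")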


Company | word

AMZN      Thanks
AMZN      for
AMZN      your
AMZN      patience
APPL      Glad
APPL      to
APPL      check

What I am having trouble with is creating a graph that shows, for each company, its top 10 most common words in descending order. Each company will have different words. I would like to use facet_wrap so everything is on one image, but the y-axis is getting messed up.

Supp_cleaned_tweets %>%
  group_by(Company) %>%
  count(word, sort = TRUE) %>%
  top_n(10) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n, fill = Company)) +
  geom_col() +
  facet_wrap(~ Company) +
  xlab(NULL) +
  coord_flip() +
  labs(y = "Count",
       x = "Unique words",
       title = "Most frequent words found in the tweets",
       subtitle = "Stop words removed from the list")
Dinho
  • Try `facet_wrap(~ Company, scales = "free")` – Vinícius Félix Sep 15 '21 at 00:59
  • That worked great! Now my only issue is getting the geom_col bars in descending order in each plot. That only seems to be the case in 2 of the 5 figures... @ViníciusFélix – Dinho Sep 15 '21 at 01:06
  • That happens because the factor is applied to all data. A solution is to use this function: https://rdrr.io/github/dgrtwo/drlib/man/reorder_within.html – Vinícius Félix Sep 15 '21 at 01:07
  • Gotcha - so it looks like I need to do the reorder_within in the ggplot(aes(x = word, y = n, fill = author_id)) statement? I thought my mutate line would do it – Dinho Sep 15 '21 at 01:10

1 Answer

UPDATE

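Solved with the code below, based on the reorder_within link shared in the comments. The changes from the question are: ungroup after top_n, reorder_within in place of reorder, scale_x_reordered to clean up the axis labels, and free y scales in facet_wrap.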

Supp_cleaned_tweets %>%
  group_by(Company) %>%
  count(word, sort = TRUE) %>%
  top_n(10) %>%
  ungroup() %>%
  # reorder_within() and scale_x_reordered() come from tidytext
  mutate(word = reorder_within(word, n, Company)) %>%
  ggplot(aes(x = word, y = n, fill = Company)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ Company, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Count",
       x = "Unique words",
       title = "Most frequent words found in the tweets",
       subtitle = "Stop words removed from the list")
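
For anyone wondering why the plain reorder() in the question was not enough: it builds a single set of factor levels for the whole data frame, so a word that appears for several companies gets only one position. Below is a minimal base R sketch of the idea behind reorder_within. The counts are made up, the variable names are hypothetical, and the "___" separator is assumed to match the tidytext default; this is an illustration of the technique, not the package's actual implementation.

```r
# Toy counts (made up): both words appear for both companies,
# but their ranking differs between the two.
counts <- data.frame(
  Company = c("AMZN", "AMZN", "APPL", "APPL"),
  word    = c("help", "email", "help", "email"),
  n       = c(9, 5, 2, 8),
  stringsAsFactors = FALSE
)

# Plain reorder(): one shared ordering based on the mean of n per word.
# "help" averages 5.5 and "email" 6.5, so "help" sorts first everywhere,
# which is wrong for AMZN (where help = 9 is larger than email = 5).
shared <- reorder(counts$word, counts$n)
levels(shared)

# Per-facet idea: make each label unique to its company, then reorder.
# Every company now gets its own ordering of the same words.
key <- paste(counts$word, counts$Company, sep = "___")
per_facet <- reorder(key, counts$n)
levels(per_facet)
```

In the plot, scale_x_reordered() is what removes the company suffix from the axis labels again, and scales = "free_y" lets each panel show only its own words.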
Dinho