0

I would like to add totals to my data frame but am having difficulties because the data is quite messy (as ever!) - some columns are text, some dates, some numeric. I can't post the actual data as it is sensitive but I will show a representative example instead which is same structure (below - needed columns are in yellow. I have been trying to do this with dplyr and pipes but run into problems due to mix of text and numbers....

data:

date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020","18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")

type <- c("type A", "type B", "type A", "type B","type A", "type B","type A", "type B")

location <- c("USA","USA","India","India","USA","USA","India","India")

value <- c("10","10","frak","frak","15","15","open","open")

df <- data.frame(date, type, location, value)

Basically, I need total summarised by date, by type and by location. enter image description here

itsMeInMiami
  • 2,324
  • 1
  • 13
  • 34
  • Not sure what you want exactly since the image you provide and your description differ about the grouping. Should "frak" and "open" be filtered out , should they be NA or should they be in the summarized data? – Humpelstielzchen Aug 17 '20 at 11:13
  • the text data such as frak and open cannot be summed, so "n/a" or blank is fine for those.... all the numeric data I want to sum to give the yellow rows in my image – R_debutante Aug 17 '20 at 11:40

2 Answers2

1

Not sure if that's what you're after.

df %>%
  group_by(date, type = "total_type", location) %>%
  summarise("value" = sum(as.numeric(value), na.rm = F)) %>%
  mutate(value = as.character(value)) %>%
  bind_rows(df)

# A tibble: 12 x 4
# Groups:   date, type [6]
   date       type       location value
   <chr>      <chr>      <chr>    <chr>
 1 17/08/2020 total_type India    NA   
 2 17/08/2020 total_type USA      20   
 3 18/08/2020 total_type India    NA   
 4 18/08/2020 total_type USA      30   
 5 17/08/2020 type A     USA      10   
 6 17/08/2020 type B     USA      10   
 7 17/08/2020 type A     India    frak 
 8 17/08/2020 type B     India    frak 
 9 18/08/2020 type A     USA      15   
10 18/08/2020 type B     USA      15   
11 18/08/2020 type A     India    open 
12 18/08/2020 type B     India    open 

Grouping by all columns but value reproduces your original table and in your image summarised rows have type = total_type. On the other hand all your summarised rows in the image have location USA which doesn't make sense either, so I just let that as it is.

Humpelstielzchen
  • 6,126
  • 3
  • 14
  • 34
  • This looks good, thanks. Although I would like the new rows added to the original data frame? – R_debutante Aug 17 '20 at 12:05
  • write the `group_by` statement this way: `group_by(date, type = "total_type", location) %>%` and then use `bind_rows` to bind `df` to the dataframe with the totals – Edo Aug 17 '20 at 12:08
0

I would suggest next approach, which is also close to the one proposed by @Humpelstielzchen, that is close to what you show in the image:

library(dplyr)

df %>% bind_rows(df %>% group_by(date,location) %>%
                   mutate(value=as.numeric(value)) %>% 
                   summarise(value=sum(value,na.rm=F)) %>%
                   mutate(type='total type',value=as.character(value)))

Output:

         date       type location value
1  17/08/2020     type A      USA    10
2  17/08/2020     type B      USA    10
3  17/08/2020     type A    India  frak
4  17/08/2020     type B    India  frak
5  18/08/2020     type A      USA    15
6  18/08/2020     type B      USA    15
7  18/08/2020     type A    India  open
8  18/08/2020     type B    India  open
9  17/08/2020 total type    India  <NA>
10 17/08/2020 total type      USA    20
11 18/08/2020 total type    India  <NA>
12 18/08/2020 total type      USA    30

Update: Here an approach that could works because of OP'issues with version of package:

library(dplyr)
#Data
date <- c("17/08/2020", "17/08/2020", "17/08/2020", "17/08/2020","18/08/2020", "18/08/2020", "18/08/2020", "18/08/2020")

type <- c("type A", "type B", "type A", "type B","type A", "type B","type A", "type B")

location <- c("USA","USA","India","India","USA","USA","India","India")

value <- c("10","10","frak","frak","15","15","open","open")

df <- data.frame(date, type, location, value,stringsAsFactors = F)
#Mutate for summary
df1 <- df %>% group_by(date,location) %>%
  mutate(value=as.numeric(value)) %>% 
  summarise(value=sum(value,na.rm=F)) %>%
  mutate(type='total type') %>% ungroup()
df1$value <- as.character(df1$value)
#Bind
df2 <- rbind(df,df1)

Output:

         date       type location value
1  17/08/2020     type A      USA    10
2  17/08/2020     type B      USA    10
3  17/08/2020     type A    India  frak
4  17/08/2020     type B    India  frak
5  18/08/2020     type A      USA    15
6  18/08/2020     type B      USA    15
7  18/08/2020     type A    India  open
8  18/08/2020     type B    India  open
9  17/08/2020 total type    India  <NA>
10 17/08/2020 total type      USA    20
11 18/08/2020 total type    India  <NA>
12 18/08/2020 total type      USA    30
Duck
  • 39,058
  • 13
  • 42
  • 84
  • This looks perfect but I get an error - Error: Column `value` can't be converted from numeric to character – R_debutante Aug 17 '20 at 12:40
  • @R_debutante That is strange could you try `df %>% bind_rows(df %>% group_by(date,location) %>% mutate(value=as.numeric(value)) %>% summarise(value=sum(value,na.rm=F)) %>% ungroup() %>% mutate(type='total type',value=as.character(value)))` – Duck Aug 17 '20 at 12:46
  • Hmmm. Yes, I still get the same error message? Error: Column 'value' can't be converted from numeric to character – R_debutante Aug 17 '20 at 13:40
  • @R_debutante Could you please re start `R` and see if the issue persists? I believe is an issue of packages or maybe update `dplyr` to last version! – Duck Aug 17 '20 at 13:49
  • I am running dplyr 0.8, think this is the issue - I cannot upgrade though as is workplace server... – R_debutante Aug 18 '20 at 12:45
  • @R_debutante Could you please try: `df %>% bind_rows(df %>% group_by(date,location) %>% mutate(value=as.numeric(value)) %>% summarise(value=sum(value,na.rm=F)) %>% mutate(type='total type') %>% mutate_all(.funs = as.character))` – Duck Aug 18 '20 at 12:50
  • Hmm, that seems to generate an extra column called 'type'. Also the 'name' column now populated with 'name' and value' and the correct names are gone? – R_debutante Aug 18 '20 at 13:52
  • I am trying a clunkier way... 2 df's separately then rbind. However, my 2nd df is grouped class: 1] "grouped_df" "tbl_df" "tbl" "data.frame" so when I bind them it comes out a mess, THE OUTPUT LOOKS LIKE THIS: names Date name value df Character,207176 Character,207176 Character,207176 Character,207176 df2 Character,164 Character,164 Character,164 Numeric,164 – R_debutante Aug 18 '20 at 13:55
  • R is way harder than I thought! :-) – R_debutante Aug 18 '20 at 13:56
  • @R_debutante I am solving the issue now, just a moment please :) – Duck Aug 18 '20 at 13:57
  • @R_debutante I have added an update, could you please check and see if that works for you? – Duck Aug 18 '20 at 14:02
  • Thank you - still not working somehow.... I changed track and doing it little differently. Thank you though, appreciate the help – R_debutante Aug 19 '20 at 08:22
  • @R_debutante Oh sorry I think is because of package version. If you believe this answer helped you in any sense you could upvote. It is up to you and thanks for your time! – Duck Aug 19 '20 at 12:06