-2

I come across problems when trying to make rowSums in dplyr.

After grouping the data via

data <- data %>%
  group_by(location, category) %>% 
  summarise(amount = sum(amount)) %>%
  spread(key = "category", value = "amount", fill = 0)

The output is:

# A tibble: 4,211 x 140
# Groups:   location [4,211]
                       location    art      books     cars 
 *                        <chr>    <dbl>    <dbl>     <dbl>                         
 1                 New York, NY    0         10       0                    
 2              Los Angeles, CA    12        0        2           
 ...

Then trying to make the rowSum didn't work:

data %>% mutate(sum=rowSums(.))

Error in mutate_impl(.data, dots) : 
  Evaluation error: 'x' must be numeric.

> class(ks)
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

I tried to change the pivot like below, but it didn't help either:

data <- data %>%
  group_by(location, category) %>% 
  summarise(amount = as.numeric(sum(amount))) %>%        # Changed
  spread(key = "category", value = "amount", fill = 0)

str(data.frame(data))

'data.frame':   4211 obs. of  140 variables:
 $ location                    : chr  "New York, NY" "Los Angeles, CA" ... ...
 $ art                         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ books                       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ cars                        : num  0 0 0 0 0 0 0 0 0 0 ...
 ...     

It would be great to have some help here.

After calculating the sum of each row, I need to filter locations that have a rowsum < 1000. It would also be great to know how to do this and if dplyr is the right approach in general.

Christopher
  • 2,120
  • 7
  • 31
  • 58
  • 1
    `location` is a character, `data %>% mutate(sum=rowSums(.[-1]))` should work – Sotos Nov 24 '17 at 13:46
  • Returns: "Error in mutate_impl(.data, dots) : Column `sum` must be length 1 (the group size), not 4211" – Christopher Nov 24 '17 at 13:48
  • 1
    Well, provide a reproducible example then. That should work (You can try it with `iris` to see, i.e. `iris %>% mutate(sum = rowSums(.[-5]))`) – Sotos Nov 24 '17 at 13:52
  • I wish I could but I can't share the data publicly. I added some additional information on top, maybe that helps? I'm puzzles because the data appears to be numeric. – Christopher Nov 24 '17 at 14:01
  • `rowSums(data.frame(data)) Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be numeric"` – Christopher Nov 24 '17 at 14:03
  • 3
    You can create mock data with the same structure as your original and share that. – Sotos Nov 24 '17 at 14:03
  • 1
    maybe?? `mutate(sum=rowSums(.[,-1]))` ?? – Ben Bolker Nov 24 '17 at 14:07
  • Same error, I'm sorry. – Christopher Nov 24 '17 at 14:09
  • @Sotos: The csv roughly goes: `"location","amount","category", "Los Angeles, CA",140,"art", "Los Angeles, CA",17,"books", "Los Angeles, CA",20,"cars", "Los Angeles, CA",50,"art", "New York, NY",60,"cars", "New York, NY",52,"art"` – Christopher Nov 24 '17 at 14:10
  • 1
    I am sorry, I can't help you any further until you provide reproducible example. – Sotos Nov 24 '17 at 14:25
  • The data above is actually a mockup of the original data. Unfortunately, I can't provide the originals. What puzzles me is that `$amount : num 141 105 116 182 943 ...` So I thought the problem is more in the syntax than in the data :( – Christopher Nov 24 '17 at 14:40
  • @BenBolker @Sotos: I tried `data <- data.frame(data) data <- data %>% mutate(sums=rowSums(.[-1]))` For some reason, that worked , but only after setting as a data.frame. Any explanation for this? – Christopher Nov 24 '17 at 14:52

1 Answers1

2
test %>% 
    mutate(row = 1:n()) %>% 
    gather(variable, value, -Species, -row) %>% 
    group_by(Species, row) %>% 
    summarize(value = sum(value))

Here I'm using the Iris data set for an example ("Species" is a character vector, everything else is numeric). This uses gather to collect all the columns you want to sum up. It also creates a unique identifier for each row. You could merge the resulting data frame back into your original data and then filter by the summed column.

jdobres
  • 11,339
  • 1
  • 17
  • 37