0

I have a dataframe where one column is the amount spent. In the amount spent column there are the values for amount spent and also negative values for any returns. For example.

ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50

I want to remove the negative value then one of its positive counter parts - the idea is to only keep fully completed spend amounts so I can look at total spend.

Right now I am thinking something like this - where I have the data frame sorted by spend

if spend < 0 {
  take absolute value of spend
  if diff between abs(spend) and spend+1 = 0 then both are NA}

I would like to have something like

df[df$spend < 0] <- NA

where I can also set one positive counterpart to NA as well. Any suggestions?

user2113499
  • 1,001
  • 3
  • 13
  • 23
  • 2
    How do you know if a given row is a "positive counterpart"? Can you just pick a row at random with matching ID and identical value (sans pos/neg sign)? – IceCreamToucan Jul 22 '19 at 14:59
  • I can delete any of the so called "positive counterparts" . I am trying to get total spend for all ID's and not by individual ID so if I have a negative value somewhere I just need a positive removed as well. I don't even need to do this with a dataframe. I can take the column as a vector and work with that. – user2113499 Jul 22 '19 at 15:07
  • Are you trying to get the net expenditures per store? Or per store-id pairing? – Fred Boehm Jul 22 '19 at 15:26
  • I already have the data subset by store. I just need to sum(spend) but want to remove the returns (negative values) and the original transaction associated with that return. Because I am not containing this to individual ID's I don't necessarily need to remove the actual original transaction that is associated with a return. Just need to remove the negative value and one positive along with it. – user2113499 Jul 22 '19 at 15:34
  • 1
    Is it sufficient to calculate the net expenditure per store? – Fred Boehm Jul 22 '19 at 15:36
  • @IceCreamToucan You posted a solution using the data.table library and were able to do this but ID number. I didn't get a chance to use that to see how it works. Could you post that again? – user2113499 Jul 24 '19 at 20:52

3 Answers3

2

There should be a simpler solution to this but here is one way. Also created my own example since the one shared did not have sufficient data points to test

#Original vector
x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
#Count the frequency of negative numbers, keeping all the unique numbers
vals <- table(factor(abs(x[x < 0]), levels = unique(abs(x))))   
#Count the frequency of absolute value of original vector
vals1 <- table(abs(x)) 
#Subtract the frequencies between two vectors
new_val <- vals1 - (vals * 2 )
#Recreate the new vector
as.integer(rep(names(new_val), new_val))
#[1] 1 2 3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

If you add a rowid column you can do this with data.table ant-joins.

Here's an example which takes ID into account, not deleting "positive counterparts" unless they're the same ID

First create more interesting sample data

df <- fread('
ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50
123    A       -19.50
123    A        19.50
123    A       -99.50
124    A       -94.50
124    A        99.50
124    A        94.50
124    A        94.50
')

Now remove all the negative values with positive counterparts, and remove those counterparts

negs <- df[Spent < 0][, Spent := -Spent][, rid := rowid(ID, Spent)]
pos <- df[Spent > 0][, rid := rowid(ID, Spent)]
pos[!negs, on = .(ID, Spent, rid), -'rid']
#     ID Store Spent rid
# 1: 123     A  18.5   2
# 2: 124     A  99.5   1
# 3: 124     A  94.5   2

And as applied to Ronak's x vector example

x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
negs <- data.table(x = -x[x<0])[, rid := rowid(x)]
pos <- data.table(x = x[x>0])[, rid := rowid(x)]
pos[!negs, on = names(pos), -'rid']

#    x
# 1: 2
# 2: 3
# 3: 1
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
0

I used the following code.

library(dplyr)
store <- rep(LETTERS[1:3], 3)
id <- c(1:4, 1:3, 1:2)
expense <- runif(9, -10, 10)
tibble(store, id, expense) %>%
  group_by(store) %>%
  summarise(net_expenditure = sum(expense))

to get this output:

# A tibble: 3 x 2
  store net_expenditure
  <chr>           <dbl>
1 A               13.3 
2 B                8.17
3 C               16.6 

Alternatively, if you wanted the net expenditure per store-id pairing, then you could use this code:

tibble(store, id, expense) %>%
  group_by(store, id) %>%
  summarise(net_expenditure = sum(expense))

I've approached your question from a slightly different perspective. I'm not sure that my code answers your question, but it might help.

Fred Boehm
  • 656
  • 4
  • 11