Removing negative values and one positive value from R dataframe

Question

I have a dataframe where one column is the amount spent. In the amount spent column there are the values for amount spent and also negative values for any returns. For example.

ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50

I want to remove the negative value then one of its positive counter parts - the idea is to only keep fully completed spend amounts so I can look at total spend.

Right now I am thinking something like this - where I have the data frame sorted by spend

if spend < 0 {
  take absolute value of spend
  if diff between abs(spend) and spend+1 = 0 then both are NA}

I would like to have something like

df[df$spend < 0] <- NA

where I can also set one positive counterpart to NA as well. Any suggestions?

How do you know if a given row is a "positive counterpart"? Can you just pick a row at random with matching ID and identical value (sans pos/neg sign)? — IceCreamToucan, Jul 22 '19 at 14:59
I can delete any of the so called "positive counterparts" . I am trying to get total spend for all ID's and not by individual ID so if I have a negative value somewhere I just need a positive removed as well. I don't even need to do this with a dataframe. I can take the column as a vector and work with that. — user2113499, Jul 22 '19 at 15:07
Are you trying to get the net expenditures per store? Or per store-id pairing? — Fred Boehm, Jul 22 '19 at 15:26
I already have the data subset by store. I just need to sum(spend) but want to remove the returns (negative values) and the original transaction associated with that return. Because I am not containing this to individual ID's I don't necessarily need to remove the actual original transaction that is associated with a return. Just need to remove the negative value and one positive along with it. — user2113499, Jul 22 '19 at 15:34
Is it sufficient to calculate the net expenditure per store? — Fred Boehm, Jul 22 '19 at 15:36
@IceCreamToucan You posted a solution using the data.table library and were able to do this but ID number. I didn't get a chance to use that to see how it works. Could you post that again? — user2113499, Jul 24 '19 at 20:52

score 2 · Accepted Answer · answered Jul 22 '19 at 15:22

There should be a simpler solution to this but here is one way. Also created my own example since the one shared did not have sufficient data points to test

#Original vector
x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
#Count the frequency of negative numbers, keeping all the unique numbers
vals <- table(factor(abs(x[x < 0]), levels = unique(abs(x))))   
#Count the frequency of absolute value of original vector
vals1 <- table(abs(x)) 
#Subtract the frequencies between two vectors
new_val <- vals1 - (vals * 2 )
#Recreate the new vector
as.integer(rep(names(new_val), new_val))
#[1] 1 2 3

score 2 · Answer 2 · answered Jul 22 '19 at 16:52

If you add a rowid column you can do this with data.table ant-joins.

Here's an example which takes ID into account, not deleting "positive counterparts" unless they're the same ID

First create more interesting sample data

df <- fread('
ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50
123    A       -19.50
123    A        19.50
123    A       -99.50
124    A       -94.50
124    A        99.50
124    A        94.50
124    A        94.50
')

Now remove all the negative values with positive counterparts, and remove those counterparts

negs <- df[Spent < 0][, Spent := -Spent][, rid := rowid(ID, Spent)]
pos <- df[Spent > 0][, rid := rowid(ID, Spent)]
pos[!negs, on = .(ID, Spent, rid), -'rid']
#     ID Store Spent rid
# 1: 123     A  18.5   2
# 2: 124     A  99.5   1
# 3: 124     A  94.5   2

And as applied to Ronak's x vector example

x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
negs <- data.table(x = -x[x<0])[, rid := rowid(x)]
pos <- data.table(x = x[x>0])[, rid := rowid(x)]
pos[!negs, on = names(pos), -'rid']

#    x
# 1: 2
# 2: 3
# 3: 1

score 0 · Answer 3 · answered Jul 22 '19 at 15:35

I used the following code.

library(dplyr)
store <- rep(LETTERS[1:3], 3)
id <- c(1:4, 1:3, 1:2)
expense <- runif(9, -10, 10)
tibble(store, id, expense) %>%
  group_by(store) %>%
  summarise(net_expenditure = sum(expense))

to get this output:

# A tibble: 3 x 2
  store net_expenditure
  <chr>           <dbl>
1 A               13.3 
2 B                8.17
3 C               16.6

Alternatively, if you wanted the net expenditure per store-id pairing, then you could use this code:

tibble(store, id, expense) %>%
  group_by(store, id) %>%
  summarise(net_expenditure = sum(expense))

I've approached your question from a slightly different perspective. I'm not sure that my code answers your question, but it might help.

Removing negative values and one positive value from R dataframe

3 Answers3

Linked