1

I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.

For example:

df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
   Supplier Invoice
0       ABC     101
1       ABC     102
2       ABC     105
3       ABC     109
4       ABC     110
5       ABC     114
6       ABC     120
7       ABC     121
8       ABC     122
9       ABC     130
10      ABC     131

And what I want is a third column like this:

   Supplier Invoice Consecutive
0       ABC     101           0
1       ABC     102           1
2       ABC     105           0
3       ABC     109           0
4       ABC     110           1
5       ABC     114           0
6       ABC     120           0
7       ABC     121           1
8       ABC     122           1
9       ABC     130           0
10      ABC     131           1

EDIT

Thanks for your answers, this options works great, but when I tried it in a real database I realized that I need to compare the suppliers, if the next row is a new supplier the consecutive must be 0.

For example:

   Supplier Invoice Consecutive
0       ABC     101           0
1       ABC     102           1
2       ABC     105           0
3       ABC     109           0
4       ABC     110           1
5       ABC     114           0
6       ABC     120           0
7       ABC     121           1
8       ABC     122           1
9       ABC     130           0
10      ABC     131           1
11      DEF     132           0
12      DEF     133           1
13      GHI     134           0

Thanks in advance!

  • Observe that the difference between consecutive invoices is always 1. So, you can logically add another column with the difference computed between the current row and the previous, and then add yet another column that indicates that the diff column has a value of 1. – Mark Sep 28 '22 at 16:31

4 Answers4

2

Make sure your 'Invoice' column is of a numeric type first. Then you can use:

df1['Consecutive'] = (df1.Invoice - df1.Invoice.shift(1) == 1).astype(int)

prints:

   Supplier  Invoice  Consecutive
0       ABC      101            0
1       ABC      102            1
2       ABC      105            0
3       ABC      109            0
4       ABC      110            1
5       ABC      114            0
6       ABC      120            0
7       ABC      121            1
8       ABC      122            1
9       ABC      130            0
10      ABC      131            1
sophocles
  • 13,593
  • 3
  • 14
  • 33
2

using npwhere and diff

# take difference b/w consecutive rows using diff
# update to 0 when difference is not 1

df1['Consecutive']=(df1.groupby('Supplier')['Invoice']
                    .transform(lambda x: np.where(
                                           x.astype(int).diff() !=1, 0, 1)) )
df1
df1
    Supplier    Invoice     Consecutive
0   ABC     101     0
1   ABC     102     1
2   ABC     105     0
3   ABC     109     0
4   ABC     110     1
5   ABC     114     0
6   ABC     120     0
7   ABC     121     1
8   ABC     122     1
9   ABC     130     0
10  ABC     131     1
11  DEF     132     0
12  DEF     133     1
13  GHI     134     0
Naveed
  • 11,495
  • 2
  • 14
  • 21
1
df1['Consecutive']=df1.Invoice.astype(int).diff().eq(1).astype(int)
   Supplier Invoice  Consecutive
0       ABC     101            0
1       ABC     102            1
2       ABC     105            0
3       ABC     109            0
4       ABC     110            1
5       ABC     114            0
6       ABC     120            0
7       ABC     121            1
8       ABC     122            1
9       ABC     130            0
10      ABC     131            1
Алексей Р
  • 7,507
  • 2
  • 7
  • 18
0

Thanks to all for your answers, here is the code for the case we have multiple suppliers.

df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','GHI']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131','132','133','134']
df1['Consecutive']= np.where((df1['Invoice'].astype('int64').diff() ==1) 
                                             & (df1.Supplier.eq(df1.Supplier.shift())), 1, 0)

Result:

   Supplier Invoice  Consecutive
0       ABC     101            0
1       ABC     102            1
2       ABC     105            0
3       ABC     109            0
4       ABC     110            1
5       ABC     114            0
6       ABC     120            0
7       ABC     121            1
8       ABC     122            1
9       ABC     130            0
10      ABC     131            1
11      DEF     132            0
12      DEF     133            1
13      GHI     134            0