7

How can I iterate over pairs of rows of a Pandas DataFrame?

For example:

import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])
print(df)

output:

   a  b interval
0  1  2   [1, 3]
1  3  4   [2, 4]
2  5  6   [6, 9]
3  7  8  [9, 10]

Now I would like to do something like

for (indx1,row1), (indx2,row2) in df.?
    print "row1:\n", row1
    print "row2:\n", row2
    print "\n"

which should output

row1:
a    1
b    2
interval    [1,3]
Name: 0, dtype: int64
row2:
a    3
b    4
interval    [2,4]
Name: 1, dtype: int64

row1:
a    3
b    4
interval    [2,4]
Name: 1, dtype: int64
row2:
a    5
b    6
interval    [6,9]
Name: 2, dtype: int64

row1:
a    5
b    6
interval    [6,9]
Name: 2, dtype: int64
row2:
a    7
b    8
interval    [9,10]
Name: 3, dtype: int64

Is there a built-in way to achieve this? I looked at df.groupby(df.index // 2) and df.itertuples, but neither of these seems to do what I want.

Edit: The overall goal is to get a list of bools indicating whether the intervals in column "interval" of consecutive rows overlap. In the above example the list would be

overlaps = [True, False, False]

So one bool for each pair.

Lxndr
  • You can try shift, which essentially returns a dataframe of "the next rows". – xyzjayne Jul 20 '18 at 13:36
  • How would one then combine df and df.shift(1)? – Lxndr Jul 20 '18 at 13:39
  • Why do you want to loop? Post your larger problem; you probably don't need the loop. – rafaelc Jul 20 '18 at 13:41
  • One column of the dataframe contains an interval in each row and I want to check if the intervals overlap pairwise. – Lxndr Jul 20 '18 at 13:42
  • @Lxndr Your problem (the interval problem) has actually been asked plenty of times. You definitely don't need a loop for this; it will get very slow as your DataFrame grows. – rafaelc Jul 20 '18 at 14:16
  • Do you have a link? I'm not really sure what to search for. – Lxndr Jul 20 '18 at 14:21
  • Does this answer your question? [Apply function on pairs of rows in Pandas dataframe](https://stackoverflow.com/questions/52711358/apply-function-on-pairs-of-rows-in-pandas-dataframe) – AMC Feb 22 '20 at 19:17

4 Answers

15

Shift the DataFrame up by one row with shift(-1) and concatenate it with the original along axis=1, so that each interval and the next interval end up in the same row:

df_merged = pd.concat([df, df.shift(-1).add_prefix('next_')], axis=1)
df_merged
#Out:
   a  b interval     next_a     next_b    next_interval
0  1  2   [1, 3]        3.0        4.0           [2, 4]
1  3  4   [2, 4]        5.0        6.0           [6, 9]
2  5  6   [6, 9]        7.0        8.0          [9, 10]
3  7  8  [9, 10]        NaN        NaN              NaN

Then define an intersects function that works with your list representation and apply it to the merged DataFrame, skipping the last row, where next_interval is null:

def intersects(left, right):
    return left[1] > right[0]

df_merged[:-1].apply(lambda x: intersects(x.interval, x.next_interval), axis=1)
#Out:
0     True
1    False
2    False
dtype: bool
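
As a rough sketch, the same comparison can also be written without apply by unpacking the lists into numeric columns first. The names bounds, start and end are made up for illustration, and the sketch assumes, like intersects above, that the rows are already sorted by interval start.

```python
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])

# Unpack each [start, end] list into two numeric columns (illustrative names).
bounds = pd.DataFrame(df["interval"].tolist(), columns=["start", "end"], index=df.index)

# Compare each row's end with the next row's start.
# The last row has no successor (its shifted start is NaN), so drop it.
overlaps = (bounds["end"] > bounds["start"].shift(-1))[:-1].tolist()
print(overlaps)  # [True, False, False]
```

This keeps the strict > test from intersects, so intervals that only touch at an endpoint, such as [6, 9] and [9, 10], are not counted as overlapping.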
Haleemur Ali
  • This is awesome! I will leave @Ben.T's answer as the accepted one, though, since it better answers my original question. I will use this for my problem! – Lxndr Jul 20 '18 at 14:36
2

If you want to keep the for loop, combining zip and iterrows is one way to do it:

# Pair every row except the last with every row except the first.
for (indx1, row1), (indx2, row2) in zip(df[:-1].iterrows(), df[1:].iterrows()):
    print("row1:")
    print(row1)
    print("row2:")
    print(row2)
    print("\n")

To access the next row at the same time, start the second iterator one row later with df[1:].iterrows(). You then get the output in the form you want:

row1:
a    1
b    2
Name: 0, dtype: int64
row2:
a    3
b    4
Name: 1, dtype: int64


row1:
a    3
b    4
Name: 1, dtype: int64
row2:
a    5
b    6
Name: 2, dtype: int64


row1:
a    5
b    6
Name: 2, dtype: int64
row2:
a    7
b    8
Name: 3, dtype: int64

But as @RafaelC said, a for loop might not be the best approach for your underlying problem.
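
If the end goal is only the list of bools from the edit to the question, the same zip idea can be applied to the "interval" column directly, which avoids building a Series for every row. This is a minimal sketch. It assumes the rows are sorted by interval start and uses left[1] > right[0] as the overlap test, so intervals that merely touch do not count as overlapping.

```python
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])

intervals = df["interval"]

# zip stops at the shorter input, so pairing the column with itself
# offset by one row yields exactly the consecutive pairs.
overlaps = [left[1] > right[0] for left, right in zip(intervals, intervals.iloc[1:])]
print(overlaps)  # [True, False, False]
```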

Ben.T
0

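To get the output you've shown, use:

# Iterate over positions rather than index labels, so this also works
# when the index is not the default 0..n-1.
for i in range(len(df) - 1):
    print('row 1:')
    print(df.iloc[i].squeeze())
    print('row 2:')
    print(df.iloc[i + 1].squeeze())
    print()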
zipa
  • I was hoping for a more 'pythonic' solution. This to me looks like for i in xrange(len(iterable)): print iterable[i] – Lxndr Jul 20 '18 at 13:49
  • @Lxndr Your request is to `print` in certain way, and this approach uses `pandas` specific methods to generate desired output – zipa Jul 20 '18 at 13:52
0

You could try iloc indexing.

Example:

for i in range(df.shape[0] - 1):
    idx1, idx2 = i, i + 1
    row1, row2 = df.iloc[idx1], df.iloc[idx2]
    print(row1)
    print(row2)
    print()
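
If the index arithmetic feels unpythonic, one possible alternative is a small generic helper built on itertools.tee. The name pairwise is chosen here for illustration; on Python 3.10+ the standard library's itertools.pairwise does the same thing. A sketch, using the DataFrame from the question:

```python
from itertools import tee

import pandas as pd


def pairwise(iterable):
    """Yield consecutive overlapping pairs: (s0, s1), (s1, s2), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)


content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])

# Collect the index labels of each consecutive pair of rows.
pairs = [(idx1, idx2) for (idx1, row1), (idx2, row2) in pairwise(df.iterrows())]
print(pairs)  # [(0, 1), (1, 2), (2, 3)]
```

Inside the loop, row1 and row2 are the same Series objects that iterrows yields, so they can be printed or compared just as in the examples above.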