I have a Koalas DataFrame with a column, "index", that I would expect to be a unique ID. The column was created by resetting the index of another DataFrame: `breakdf2 = breakdf.reset_index()`

I believe I can show that breakdf2 has only one record with index = 0: `len(breakdf2.loc[breakdf2['index'] == 0])` returns 1.
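As a minimal sketch of that check: the snippet below uses plain pandas as a stand-in for Koalas (whose API mirrors pandas) so that it runs without a Spark session. The sample rows are hypothetical apart from the column names, and `match_count` is an illustrative variable name.

```python
import pandas as pd

# Hypothetical stand-in data; only the column names come from the question.
breakdf = pd.DataFrame(
    {
        "Primary_dp_margin": [-0.297432, 0.15, 0.02],
        "Sell_dp_margin": [None, 0.30, 0.10],
    }
)

# reset_index() moves the existing index into a new column named "index".
breakdf2 = breakdf.reset_index()

# Count the rows whose "index" value is 0.
match_count = len(breakdf2.loc[breakdf2["index"] == 0])
print(match_count)

# A stricter check: is every value in the column unique?
print(breakdf2["index"].is_unique)
```

Note that this only shows the check on an eagerly evaluated pandas frame; it does not reproduce the behaviour seen with Koalas.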

However, when I run

for i in range(10):
  print(breakdf2.loc[breakdf2['index'] == 0][['index', 'Primary_dp_margin', 'Sell_dp_margin']])

I get

[screenshot of the printed output; image not preserved]

How is it that I'm getting 2 pairs of values for a seemingly unique identifier "index"?

L. Taylor
  • What does `breakdf2[breakdf2['index'].duplicated(keep=False)]` return? – It_is_Chris Apr 12 '22 at 15:18
  • After you `reset_index`, what does `breakdf2.loc[breakdf2['index']==0]` return? – not_speshal Apr 12 '22 at 15:20
  • Chris, I had to modify your code to be ```breakdf2[ks.DataFrame(breakdf2['index']).duplicated(keep=False)]```, but got a blank result from there – L. Taylor Apr 12 '22 at 15:22
  • Speshal, I already reset the index of breakdf to create breakdf2. I take your question to mean what happens if I just run ```breakdf2.loc[breakdf2['index']==0]``` on the newly created breakdf2 dataframe. It returns many columns that I omitted from my question, but index is 0, Primary_dp_margin is -0.297432, and Sell_dp_margin is NaN (yet another result...) – L. Taylor Apr 12 '22 at 15:24
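The duplicate check suggested in the comments can be sketched as follows. This again uses plain pandas rather than Koalas so it runs on its own, and the frame `demo` with its values is made up purely for illustration; a repeated `index` value is included on purpose to show what a non-empty result looks like.

```python
import pandas as pd

# Made-up data with one deliberately repeated "index" value (1).
demo = pd.DataFrame(
    {
        "index": [0, 1, 1, 2],
        "Primary_dp_margin": [-0.30, 0.15, 0.20, 0.02],
    }
)

# keep=False marks every occurrence of a repeated value, not just the later ones.
dupes = demo[demo["index"].duplicated(keep=False)]
print(dupes)

# After dropping the repeats, the same check selects no rows.
unique_demo = demo.drop_duplicates(subset="index")
no_dupes = unique_demo[unique_demo["index"].duplicated(keep=False)]
print(no_dupes.empty)
```

An empty result, like the one reported above, means no `index` value occurs more than once in the frame that was checked.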

0 Answers