
I have a dataframe that looks like this:

issue_id       repo_id
101             10365
102             10543
103             11001  

df = pd.DataFrame({"issue_id":[101,102,103],"repo_id":[10365,10543,11001]})

I want to iterate through the dataframe, use the values of issue_id and repo_id in each row to request data from an API, and store the response in a new column.

Here is what I have done so far (it works for the sample df). It assigns the payload received from the get_issue_data method of the ZenHub API to df['new'] at that index.

df['new'] = 'na'
for i in df.index:
    df['new'][i]=zh.get_issue_data(df.repo_id[i],df.issue_id[i])['pipelines']

(zh is just the namespace for the pyzenhub library I am using to pull issue data from ZenHub.)

When I run it on the small sample df above, it works. But when I use it in my actual code, inside another nested loop, the code still runs but df['new'] contains only the 'na' values assigned earlier.

My question is: do I need to structure the above code differently for it to run properly within a loop?

Devarshi Goswami

1 Answer

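The assignment df['new'][i] = ... is chained indexing: df['new'] may return a copy rather than a view, so the write is not guaranteed to reach df. More details are in the pandas indexing documentation ("Returning a view versus a copy").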
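If you want to keep the explicit loop, a minimal sketch is to replace the chained assignment with a single .loc call on the frame itself. Here fake_get_issue_data is a hypothetical stand-in for zh.get_issue_data so the snippet runs without the API, and the string payload it returns is an assumption, not the real ZenHub response shape:

```python
import pandas as pd

df = pd.DataFrame({"issue_id": [101, 102, 103], "repo_id": [10365, 10543, 11001]})


def fake_get_issue_data(repo_id, issue_id):
    # Hypothetical stand-in for zh.get_issue_data; the real payload will differ.
    return {"pipelines": f"{repo_id}-{issue_id}"}


df["new"] = "na"
for i in df.index:
    # One indexing operation on df itself, so the write always lands in df.
    df.loc[i, "new"] = fake_get_issue_data(df.loc[i, "repo_id"], df.loc[i, "issue_id"])["pipelines"]
```

This sketch only covers scalar payloads such as strings; if 'pipelines' is a list or dict, building the whole column at once is the safer route.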

You could do:

# you don't need this
# df['new'] = 'na'

df['new'] = [zh.get_issue_data(repo_id, issue_id)['pipelines']
                for repo_id, issue_id in zip(df.repo_id, df.issue_id)]

Or use apply:

df['new'] = df.apply(lambda x: zh.get_issue_data(x.repo_id, x.issue_id)['pipelines'],
                     axis=1)
Quang Hoang