Python column retains original updated 'NA'; never gets updated with float

Question

When updating the DataFrame column FractionOfVote, my first step was to add it as a new column with a default value of 'NA'. I then parse the Votes column using split.

The following two functions work fine: 1) add_new_column_fraction(), 2) add_new_column_votes().

def add_new_column_fraction(df):
    df['FractionOfVote'] = 'NA'

def add_new_column_votes(df):
    df[['YesVotes','NumVotes']] = df['Votes'].str.split('/',expand=True)[[0,1]]

The problem code is in the function calc_fraction_ratio_for_votes():

def calc_fraction_ratio_for_votes(df):
    for idx, row in df.iterrows():
        numerator = row['YesVotes']
        denomerator = row['NumVotes']
        try:
            row['FractionOfVote'] = float(numerator) / float(denomerator)
        except ZeroDivisionError:
            row['FractionOfVote'] = 'NaN'

This function takes two other DataFrame columns, YesVotes and NumVotes, and calculates a float value for the column FractionOfVote, which was created earlier in add_new_column_fraction().

The logical error is that the FractionOfVote column retains its original 'NA' value. It never receives the update from "row['FractionOfVote'] = float(numerator) / float(denomerator)", neither the calculated float nor the 'NaN' from the "except ZeroDivisionError" branch.

dataista · Answer 1 · 2018-11-22T17:11:46.333

1

Why are you using iterrows() in the first place? You can achieve the same result with a vectorized implementation, as below:

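 # Create the column and fill every value with NaN by default
 df['FractionOfVote'] = np.nan  # requires: import numpy as np

 # YesVotes and NumVotes come from str.split, so they hold strings; convert them first
 yes = df['YesVotes'].astype(float)
 num = df['NumVotes'].astype(float)

 # Populate only the rows with a positive denominator; the rest stay NaN
 df.loc[num > 0, 'FractionOfVote'] = yes / num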
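As for why the loop in the question had no effect: the pandas documentation warns that the row yielded by iterrows() may be a copy rather than a view, so writing to it is not guaranteed to reach the DataFrame. If a loop is really wanted, a minimal sketch is to write back through df.at instead. The sample data and the function name calc_fraction_with_at below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's 'Votes' column
df = pd.DataFrame({'Votes': ['3/5', '0/0', '10/30']})
df[['YesVotes', 'NumVotes']] = df['Votes'].str.split('/', expand=True)[[0, 1]]
df['FractionOfVote'] = np.nan


def calc_fraction_with_at(df):
    """Loop version that writes to the DataFrame itself, not to the row object."""
    for idx, row in df.iterrows():
        try:
            # df.at[idx, col] assigns into df; row[col] = ... may only change a copy
            df.at[idx, 'FractionOfVote'] = float(row['YesVotes']) / float(row['NumVotes'])
        except ZeroDivisionError:
            df.at[idx, 'FractionOfVote'] = np.nan


calc_fraction_with_at(df)
print(df)
```

This keeps the structure of the original function, but the vectorized version above is still the better choice on anything but a small frame.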

edited Nov 22 '18 at 17:11

answered Nov 22 '18 at 17:09

dataista


1

Why was I using iterrows()? Too many years of Java iteration programming; it's still in my head :) – manager_matt Nov 22 '18 at 17:28

score 0 · Accepted Answer · answered Nov 22 '18 at 17:06

You should try to avoid Python-level loops. First ensure your series are numeric (if necessary):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Yes': [0, 3, 0, 10, 0],
                   'Num': [0, 5, 0, 30, 2]})

num_cols = ['Yes', 'Num']
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')

Then divide and replace inf with NaN (a nonzero count divided by zero gives inf, while 0 / 0 already gives NaN):

print((df['Yes'] / df['Num']).replace(np.inf, np.nan))

0         NaN
1    0.600000
2         NaN
3    0.333333
4    0.000000
dtype: float64
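Applied to the question's own column names, the same idea might look like the sketch below. It assumes Votes holds strings such as '3/5'; the helper name add_fraction_of_vote and the sample data are hypothetical, and the sketch also replaces -inf, which the snippet above does not need for non-negative counts.

```python
import numpy as np
import pandas as pd


def add_fraction_of_vote(df):
    """Return a copy of df with YesVotes, NumVotes and FractionOfVote added."""
    out = df.copy()
    # Split 'yes/total' strings into two columns and coerce them to numbers
    parts = out['Votes'].str.split('/', expand=True)
    out['YesVotes'] = pd.to_numeric(parts[0], errors='coerce')
    out['NumVotes'] = pd.to_numeric(parts[1], errors='coerce')
    # Vectorized division; a zero denominator yields inf or NaN instead of raising
    ratio = out['YesVotes'] / out['NumVotes']
    out['FractionOfVote'] = ratio.replace([np.inf, -np.inf], np.nan)
    return out


# Hypothetical sample data
votes = pd.DataFrame({'Votes': ['0/0', '3/5', '10/30', '0/2']})
result = add_fraction_of_vote(votes)
print(result)
```

Returning a new frame rather than mutating the argument is only a design choice for the sketch; assigning the columns in place works just as well.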

jpp


Thanks, right on. Python-level loops on DataFrames appear to behave somewhat irregularly. Thanks for catching that, and for the recommendation to avoid Python loops on a DataFrame when a DataFrame-level function is more appropriate. – manager_matt Nov 22 '18 at 17:26
