
To populate the DataFrame column FractionOfVote, my first step was to add it as a new column with a default value of 'NA'. I then parse the Votes column using split.

The following two functions work fine: 1) add_new_column_fraction() and 2) add_new_column_votes().

def add_new_column_fraction(df):
    df['FractionOfVote'] = 'NA'

def add_new_column_votes(df):
    df[['YesVotes','NumVotes']] = df['Votes'].str.split('/',expand=True)[[0,1]]
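Running these two helpers on a small frame shows what they produce. The sample Votes values below are hypothetical, assuming the "yes/total" format implied by split('/'). The thing to notice is that YesVotes and NumVotes come out as strings, not numbers.

```python
import pandas as pd

def add_new_column_fraction(df):
    df['FractionOfVote'] = 'NA'

def add_new_column_votes(df):
    df[['YesVotes', 'NumVotes']] = df['Votes'].str.split('/', expand=True)[[0, 1]]

# Hypothetical sample data in the "yes/total" format implied by split('/')
df = pd.DataFrame({'Votes': ['3/5', '0/0', '10/30']})
add_new_column_fraction(df)
add_new_column_votes(df)
print(df)

# The split pieces are strings, so they need converting before any arithmetic
print(type(df.at[0, 'YesVotes']))
```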

The problem is in the function calc_fraction_ratio_for_votes():

def calc_fraction_ratio_for_votes(df):
    for idx, row in df.iterrows():
        numerator = row['YesVotes']
        denominator = row['NumVotes']
        try:
            row['FractionOfVote'] = float(numerator) / float(denominator)
        except ZeroDivisionError:
            row['FractionOfVote'] = 'NaN'

This function takes two other DataFrame columns, YesVotes and NumVotes, and calculates a float value for the new column FractionOfVote, which is created earlier in add_new_column_fraction().

The problem is that FractionOfVote keeps its initial 'NA' value and never receives the value assigned inside the loop: neither the computed float nor the 'NaN' string from the except ZeroDivisionError branch.

jpp
manager_matt

2 Answers


Why are you using iterrows() in the first place? You can achieve the same result with a vectorized implementation, as below:

import numpy as np

# Create the column and fill every row with NaN by default
df['FractionOfVote'] = np.nan
# YesVotes/NumVotes are strings after str.split, so convert them first
yes, num = df['YesVotes'].astype(float), df['NumVotes'].astype(float)
# Populate only the rows with a positive denominator
df.loc[num > 0, 'FractionOfVote'] = yes / num
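As for why the original loop never updates FractionOfVote: iterrows() yields each row as a Series, and the pandas documentation warns that, depending on the data types, this can be a copy rather than a view, so writing to row has no effect on df. The sketch below uses a hypothetical two-row frame to contrast assigning to row with writing back by label via df.at. It defaults the column to np.nan rather than 'NA' (an assumption, not the question's setup), which keeps the column numeric and gives the frame mixed dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the question's after the split step.
# The float NaN column gives the frame mixed dtypes, so iterrows()
# builds every row from a fresh array rather than a view.
df = pd.DataFrame({'YesVotes': ['3', '0'], 'NumVotes': ['5', '0']})
df['FractionOfVote'] = np.nan

# Assigning to `row` only changes the Series that iterrows() yielded
for idx, row in df.iterrows():
    row['FractionOfVote'] = 1.0
before = df['FractionOfVote'].tolist()  # still [nan, nan]

# Writing through the DataFrame by label with df.at does update it
for idx, row in df.iterrows():
    numerator = float(row['YesVotes'])
    denominator = float(row['NumVotes'])
    try:
        df.at[idx, 'FractionOfVote'] = numerator / denominator
    except ZeroDivisionError:
        df.at[idx, 'FractionOfVote'] = np.nan

print(before)
print(df['FractionOfVote'].tolist())
```

The df.at loop gives the right values, but a vectorized expression is usually much faster on anything larger than a toy frame.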
dataista

You should try to avoid Python-level loops. First, ensure your series are numeric (if necessary):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Yes': [0, 3, 0, 10, 0],
                   'Num': [0, 5, 0, 30, 2]})

num_cols = ['Yes', 'Num']
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')

Then divide, and replace the inf produced by a nonzero count divided by zero with NaN (0/0 already gives NaN):

print((df['Yes'] / df['Num']).replace(np.inf, np.nan))

0         NaN
1    0.600000
2         NaN
3    0.333333
4    0.000000
dtype: float64
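Applied to the question's own columns, the same idea can replace the iterrows() loop entirely. This is a sketch, assuming YesVotes and NumVotes hold the strings produced by add_new_column_votes(); it reuses the name calc_fraction_ratio_for_votes only to show it as a drop-in replacement, and it maps both inf and -inf to NaN.

```python
import numpy as np
import pandas as pd

def calc_fraction_ratio_for_votes(df):
    """Vectorized replacement for the iterrows() loop (a sketch)."""
    # Coerce the split strings to numbers; unparseable entries become NaN
    yes = pd.to_numeric(df['YesVotes'], errors='coerce')
    num = pd.to_numeric(df['NumVotes'], errors='coerce')
    # x/0 gives inf and 0/0 gives NaN; map both infinities to NaN
    df['FractionOfVote'] = (yes / num).replace([np.inf, -np.inf], np.nan)

# Hypothetical string columns, as str.split would leave them
df = pd.DataFrame({'YesVotes': ['3', '0', '10', '0', '4'],
                   'NumVotes': ['5', '0', '30', '2', '0']})
calc_fraction_ratio_for_votes(df)
print(df['FractionOfVote'])
```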
jpp
  • Thanks, right on. Python-level loops over DataFrames seem to behave somewhat irregularly; thanks for catching this, and for the recommendation to avoid Python loops on a DataFrame when a DataFrame-level function is more appropriate. – manager_matt Nov 22 '18 at 17:26