So I'm iterating through Excel columns containing numbers and trying to round all of them using .apply(pd.to_numeric).round()

This has always worked for me, but recently some of the Excel files contain columns where numbers are mixed with fractions (e.g. 27 3/8, 50 17/32). When my script runs, I get "Unable to parse string "50 17/32" at position 0".

Suppose this is my series:

0           250.25
1           32.75
2           64
3           50 17/32
4           16 3/8
Name: Qty, dtype: object

Desired result:

0           250
1           33
2           64
3           51
4           16
Name: Qty, dtype: object

I'm trying to split the column on the whitespace and then add the two resulting columns together, but I'm running into all sorts of issues. The code below sort of works, but my original 'Qty' column comes back with a bunch of NaNs instead of the original numbers in rows where there is no delimiter character:

df['Qty'] = df['Qty'].fillna(value=np.nan)
df[['Qty','Fraction']] = df['Qty'].str.split(' ', expand=True)

Here's my original ['Qty'] column:

[image: original 'Qty' column]

Here are the same rows after running that split code on it:

[image: 'Qty' and 'Fraction' columns after the split]

Interestingly, it does properly split the rows with an integer-fraction mix, but it turns certain rows into NaN for reasons I don't understand, which is throwing me off. Another thing I've tried is lambda functions, but from what I can gather, those work best when the value is just a plain fraction like 3/8, without an integer in front of it. I've been researching for hours and I'm close to giving up, so if anyone has a clue how to go about this, I'd love to know.

Thanks

user1452574
  • Does this answer your question? [Python- Convert a mixed number to a float](https://stackoverflow.com/questions/2878727/python-convert-a-mixed-number-to-a-float) – Nick ODell Apr 21 '22 at 18:39
  • To elaborate: the linked question shows you how to do this for a single number. If you use `Series.apply()`, you can apply a function across many rows. – Nick ODell Apr 21 '22 at 18:40
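
The suggestion in the comment above (a per-value converter applied across rows) could look roughly like this. It is a minimal sketch, not the linked answer's code: `mixed_to_float` is a hypothetical helper name, and it assumes non-negative values written as a plain number, a bare fraction, or "whole numerator/denominator".

```python
from fractions import Fraction

def mixed_to_float(value):
    """Convert '50 17/32', '3/8', '250.25' or a plain number to a float.

    Hypothetical helper for illustration; assumes non-negative values.
    """
    # str() lets real numeric cells (int/float) go through the same path
    parts = str(value).split()
    # Fraction accepts '50', '250.25' and '17/32', so summing the parts works
    return float(sum(Fraction(part) for part in parts))

values = ["250.25", "32.75", 64, "50 17/32", "16 3/8"]
floats = [mixed_to_float(v) for v in values]
rounded = [round(f) for f in floats]
print(floats)   # [250.25, 32.75, 64.0, 50.53125, 16.375]
print(rounded)  # [250, 33, 64, 51, 16]
```

On a DataFrame this would presumably be used as `df['Qty'].apply(mixed_to_float).round().astype(int)`. Missing values (NaN) would need to be dropped or filled first, since `Fraction('nan')` raises a `ValueError`.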

1 Answer

Here is one approach using fractions.Fraction:

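from fractions import Fraction
import pandas as pd

# Column 0 captures the whole/decimal part, column 1 the optional fraction
df2 = df['Qty'].str.extract(r'(\d+(?:\.\d+)?)?\s*(\d+/\d+)?')

# Add the numeric part to the fraction; rows without a fraction contribute 0
out = (pd.to_numeric(df2[0], errors='coerce')
      +df2[1].fillna(0).apply(lambda x: float(Fraction(x)))
      )
df['float'] = out
df['int'] = out.round().astype(int)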

output:

        Qty      float  int
0    250.25  250.25000  250
1     32.75   32.75000   33
2        64   64.00000   64
3  50 17/32   50.53125   51
4    16 3/8   16.37500   16

Alternative using arithmetic:

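# Capture the whole part, numerator and denominator separately, then cast to float
df2 = df['Qty'].str.extract(r'(\d+(?:\.\d+)?)?\s*(?:(\d+)/(\d+))?').astype(float)
# A missing numerator becomes 0 and a missing denominator 1, so the fraction term is 0
df['int'] = (df2[0]+df2[1].fillna(0)/df2[2].fillna(1)).round().astype(int)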
mozway
  • This works fine, but for some reason the surrounding numbers that do not have a / are getting converted into NaNs. It's probably some issue with Excel; I suppose I'll just figure out a way to compare the 2 columns and get the max value. Regardless, your code works based on my specific question, so thank you – user1452574 Apr 21 '22 at 19:26
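
One possible cause of the NaNs described in the comment above: when a column mixes real numeric cells with text cells, the numeric cells can arrive as int/float objects rather than strings, and the `.str` accessor returns NaN for non-string elements. The sketch below assumes that is what is happening here (the sample data is made up to mimic it) and casts to `str` before extracting.

```python
import pandas as pd

# Hypothetical data: numeric cells arrive as real numbers, mixed cells as text
df = pd.DataFrame({'Qty': [250.25, 32.75, 64, '50 17/32', '16 3/8']})

pattern = r'(\d+(?:\.\d+)?)?\s*(?:(\d+)/(\d+))?'

# Without the cast, .str methods yield NaN for the non-string rows
raw = df['Qty'].str.extract(pattern)

# Casting to str first lets every row go through the same regex
parts = df['Qty'].astype(str).str.extract(pattern).astype(float)

# Missing numerator -> 0, missing denominator -> 1, so plain numbers are unchanged
df['int'] = (parts[0] + parts[1].fillna(0) / parts[2].fillna(1)).round().astype(int)
print(df)
```

If the column also contains genuinely empty cells, `astype(str)` turns them into the text 'nan', which the regex does not match, so those rows would still come out as NaN and need handling before the final `astype(int)`.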