Excel formula evaluation in Pandas

Question

Say i have a dict like this whose key's are cell references in excel and value's are either formula or integer.

input_dict = {
    "B25": "=B2*B4",
    "C25": "=C2*C4",
    "D25": "=D2*D4",
    "E25": "=E2*E4",
    "F25": "=F2*F4",
    "G25": "=G2*G4",
    "B22": 0,
    "C22": "=SUM(B22:B28)",
    "D22": "=SUM(C22:C28)",
    "E22": "=SUM(D22:D28)",
    "F22": "=SUM(E22:E28)",
    "G22": "=SUM(F22:F28)",
    "B28": "=B19*B20*B22",
    "C28": "=C19*C20*C22",
    "D28": "=D19*D20*D22",
    "E28": "=E19*E20*E22",
    "F28": "=F19*F20*F22",
    "G28": "=G19*G20*G22",
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

How to perform the calculation on this type of data.

My approach was to convert the dict to DataFrame and perform the calculations on it but i am stuck.

df = pd.DataFrame(list(input_dict.items()))

The output of df is shown below.

    0   1
0   G22 =SUM(F22:F28)
1   G4  0.6
2   F2  840
3   D2  880
4   C20 3
5   C4  0.9
6   B28 =B19*B20*B22
7   F25 =F2*F4
8   B25 =B2*B4
9   G25 =G2*G4
10  C28 =C19*C20*C22
11  G28 =G19*G20*G22
12  F22 =SUM(E22:E28)
13  C25 =C2*C4
14  B19 0.001
15  E4  0.8
16  D22 =SUM(C22:C28)
17  D4  0.8
18  G2  800
19  E28 =E19*E20*E22
20  D20 4
21  G20 4
22  E25 =E2*E4
23  F20 3
24  G19 0.002
25  E22 =SUM(D22:D28)
26  C2  900
27  D25 =D2*D4
28  E2  860
29  D28 =D19*D20*D22
30  C19 0.001
31  F28 =F19*F20*F22
32  B20 4
33  B2  1000
34  F4  0.7
35  E19 0.001
36  D19 0.001
37  B4  0.95
38  B22 0
39  F19 0.001
40  C22 =SUM(B22:B28)
41  E20 4

How to perform excel like calculations in Python ?

The expected output is as below

{
    "B25": "950",
    "C25": "810",
    "D25": "704",
    "E25": "688",
    "F25": "588",
    "G25": "480",
    "B22": 0,
    "C22": 950,
    "D22": 1757.15,
    "E22": 2454.1214,
    "F22": 3710.908,
    "G22": 4161.220736,
    "B28": 0,
    "C28": -2.85,
    "D28": -7.0286,
    "E28": -9.8164856,
    "F28": -9.396914743,
    "G28": -29.687264,
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

bradbase · Answer 1 · 2020-08-07T03:15:07.280

I have written a library, xlcalculator, that reads Excel files, interprets formulas with supported functions into Python and can subsequently evaluates the formulas.

An example if its use using the problem as you've stated;

input_dict = {
    "B4": 0.95,
    "B2": 1000,
    "B19": 0.001,
    "B20": 4,
    # B21
    "B22": 1,
    "B23": 2,
    "B24": 3,
    "B25": "=B2*B4",
    "B26": 5,
    "B27": 6,
    "B28": "=B19*B20*B22",
    "C22": "=SUM(B22:B28)",
  }

from xlcalculator import ModelCompiler
from xlcalculator import Model
from xlcalculator import Evaluator

compiler = ModelCompiler()
my_model = compiler.read_and_parse_dict(input_dict)
evaluator = Evaluator(my_model)

for formula in my_model.formulae:
    print("Formula", formula, "evaluates to", evaluator.evaluate(formula))

# cells need a sheet and Sheet1 is default.
evaluator.set_cell_value("Sheet1!B22", 100)
print("Formula B28 now evaluates to", evaluator.evaluate("Sheet1!B28"))
print("Formula C22 now evaluates to", evaluator.evaluate("Sheet1!C22"))

produces this output;

Formula Sheet1!B25 evaluates to 950.0
Formula Sheet1!B28 evaluates to 0.004
Formula Sheet1!C22 evaluates to 967.004
Formula B28 now evaluates to 0.4
Formula C22 now evaluates to 1066.4

Thanks @ggorlen for picking this up. I have edited the answer to ensure there is no ambiguity. — bradbase, Aug 07 '20 at 03:16
This sounds great, but unfortunately I get an `AttributeError: module 'xlcalculator.xlfunctions.xl' has no attribute 'is_float'` ? — rvrvrv, Sep 23 '20 at 19:25
@rvrvrv Thank you for bringing this up. This bug has been fixed. The fix will be released in xlcalculator 0.2.9. This use case is now part of the suite of tests in the project. — bradbase, Sep 24 '20 at 04:46

score 2 · Accepted Answer · answered Feb 11 '17 at 19:45

You could use regex (regular expressions) and Python's eval function.

Let's assume we have

d = {'A1': '=A2+A3',
     'A2': '=SUM(A3:A5)',
     'A3': 3,
     'A4': 6,
     'A5': -1,
     ...}

the entire function would look like

import re
def g(s):
    """Excel-like evaluation with recurrence"""
    if isinstance(s,(int, float)):
        return s
    s=re.sub(r'=', '', s)
    s=re.sub(r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)','sum([g(d[chr(i)+str(j)]) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)])',s)
    s=re.sub(r'([A-Z][0-9])',r'g(d["\1"])',s)
    return eval(s)

For example

>>> print(g(d['A1']))
11

Let's have a look at the single steps:

First of all we get rid of the =. One could alternatively write a test and only evaluate the formula if it starts with a =, up to the reader.
With re.sub(r'([A-Z][0-9])', r'g(d["\g<1>"])', any_string) one replaces a group with one capital letter and a number (e.x. 'A3') with the dictionary look-up of it (e.x. 'g(d["A3"])')
- we need to apply g() again if the new cell value is still a formula (recurrence)
- Note: If there are entries like 'A$3' or '$AB$4' as they occur in Excel, one could use r'$?([A-Z]+)$?([0-9])' as search pattern and r'd["\g<1>\g<2>"]' as substitution.
Then we can evaluate this string with eval(). Up to now one can use all implemented Python operations such as +, -, *, /, //, %, etc.
All other functions need to be implemented manually by substituting them with Python expressions. Her is an example for SUM(A3:B10):

With r'SUM$([A-Z])([0-9]):([A-Z])([0-9])$' we search for the formula.

[chr(i)+str(j) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)]] gives us all table indices that are in the sum. Then we apply g(d[...]) on each of them (recurrence) and take the sum.

This can be extended to any Excel formula of course.

Kind of digging this up from the far past, but trying to get it to work while passing on the dictionary as a variable as well (because I want to use this function for different dictionaries). The reference to `d` seems hardcoded, so I thought if I pass it on to the function like so `def g(s, d)` it might work -- unfortunately it doesn't... (`TypeError: g() missing 1 required positional argument: 'd'`) Would you happen to know how to solve this? Thanks! — rvrvrv, Sep 21 '20 at 17:17
Ah wait I found it, need to add the extra variable also to the calls generated *within* the function! On line 7: `g(d[chr(i)+str(j)], d)` and on line 8: `r'g(d["\1"], d)'` — rvrvrv, Sep 21 '20 at 17:27

score 0 · Answer 3 · edited May 23 '17 at 11:53

You'll need something to parse Excel formulas and convert them into a form which allows executing the calculations.

A quick search brings up pycel as the most promising Python library for this. It does not support all of Excel's functions and syntax, but it should probably support what you need, and it definitely supports the formulas in the example you posted.

Also see this answer to a similar SO question. As it mentions, you can also actually connect to Excel, have it execute all of the calculations, and then just read the results. One way to do this is using the win32com library, as detailed in the previously mentioned answer.

Excel formula evaluation in Pandas

3 Answers3

Linked