I am parsing formulas written in an internal syntax, using Lark. It's the first time I'm doing this, so please bear with me.
The formulas look something like this:
MEAN(1,SUM({T(F_01.01)R(0100)C(0100)S(AT)[T-1Y]},{T(F_01.01)R(0100,0120)C(0100)S(AT)[T-1Y]}))
As a first step, I would like to convert it into the following, where each {...} reference is expanded into one name per row listed in R(...):
MEAN(1,SUM(F_01.01_r0100_c0100_sAT[T-1Y],F_01.01_r0100_c0100_sAT[T-1Y],F_01.01_r0120_c0100_sAT[T-1Y]))
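The per-reference expansion shown above can be sketched as a plain Python function, independent of Lark, to pin down the target format. The function name expand_data_point is hypothetical, and expanding a list in C(...) the same way as R(...) (one name per row/column pair) is an assumption; the transformer code below only splits the rows.

```python
from itertools import product


def expand_data_point(table, rows, columns, sheet, time_shift=""):
    """Expand one {T(..)R(..)C(..)S(..)[..]} reference into flat names.

    rows and columns are the raw comma-separated strings from R(...) and C(...).
    Assumption: a list in C(...) is expanded like R(...), i.e. one name per
    (row, column) pair; the original transformer only splits the rows.
    """
    return [
        f"{table}_r{row}_c{col}_s{sheet}{time_shift}"
        for row, col in product(rows.split(","), columns.split(","))
    ]


names = expand_data_point("F_01.01", "0100,0120", "0100", "AT", "[T-1Y]")
print(",".join(names))
# F_01.01_r0100_c0100_sAT[T-1Y],F_01.01_r0120_c0100_sAT[T-1Y]
```

Rows vary in the outer loop, so the names come out in the same order as the rows in R(...).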
Here is my code:
```python
from lark import Lark, Transformer

grammar = r"""
?start:
    | NUMBER
    | [symbols] datapoints ([symbols]+ datapoints)* [symbols]

?symbols.1:
    | /\+/
    | /\-/
    | /\//
    | /\*/
    | /\*\*/
    | /\,/
    | /\(/
    | /\)/
    | /\w+/

?datapoints.2:
    | "{" "T" "(" TABLE ")" [ "R" "(" ROW ")"] ["C" "(" COLUMN ")"] ["S" "(" SHEETS ")"] [TIME_SHIFT] "}" -> its_data_point
    | "{" "SPE.DPI" "(" CNAME ")" [TIME_SHIFT] "}" -> ste_data_point

TIME_UNIT: "M" | "Q" | "Y"
TIME_SHIFT: /\[T\-/ INT TIME_UNIT /\]/ | /\[PYE\]/
TABLE: /[A-Z]{1}/ "_" (/\d{3}/ | /\d{2}/) "." /\d{2}/ ["." /[a-z]/]
ROW: /\d{4}/ (/\,\d{4}/)*
COLUMN: /\d{4}/ (/\,\d{4}/)*
SHEETS: /[a-zA-T0-9_]+/ ("," /[a-zA-T0-9_]+/)*
OTHER: /[a-zA-Z]+/

%import common.WS_INLINE
%import common.INT
%import common.CNAME
%import common.NUMBER
%ignore WS_INLINE
"""

sp = Lark(grammar)


class MyTransFormer(Transformer):
    def __init__(self):
        super().__init__()
        # Collects every expanded data point name seen during the transform
        self.its_data_points = []

    def its_data_point(self, items):
        # Named terminals of the data point: table, rows, column, sheet, time shift
        t, r, c, s, ts = items
        res = []
        # One flat name per row listed in R(...)
        for row in r.split(','):
            res.append(str(t) + '_r' + str(row) + '_c' + str(c) + '_s' + str(s) + str(ts))
        self.its_data_points += res
        return ','.join(res)

    def __default_token__(self, token):
        return str(token.value)

    def __default__(self, data, children, meta):
        # Any other rule: concatenate the already-transformed children
        return ''.join(children)


teststr = "MEAN(1,SUM({T(F_01.01)R(0100,0120)C(0100)S(AT)[T-1Y]},{T(F_01.01)R(0100)C(0100)S(AT)[T-1Y]}))"
tree = sp.parse(teststr)
mt = MyTransFormer()
print(mt.transform(tree))
```
but with this I get:
MEANMEAN(1,SUM(F_01.01_r0100_c0100_sAT[T-1Y],F_01.01_r0120_c0100_sAT[T-1Y],F_01.01_r0100_c0100_sAT[T-1Y]))
Why do I get 'MEAN' twice at the start of the output?
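For comparison, a parser-free sketch of the expected output, using Python's re module instead of Lark, can serve as a cross-check for what the transformer returns. The names REF and flatten_references are hypothetical, and the pattern assumes every reference has R(...), C(...) and S(...) parts with an optional [...] time shift; {SPE.DPI(...)} references and everything else are left untouched.

```python
import re
from itertools import product

# Hypothetical pattern for one reference such as
# {T(F_01.01)R(0100,0120)C(0100)S(AT)[T-1Y]}.
# Assumption: R(...), C(...) and S(...) are always present; [...] is optional.
REF = re.compile(
    r"\{T\((?P<table>[^)]+)\)"
    r"R\((?P<rows>[^)]+)\)"
    r"C\((?P<cols>[^)]+)\)"
    r"S\((?P<sheet>[^)]+)\)"
    r"(?P<shift>\[[^\]]*\])?\}"
)


def _expand(m):
    # An absent time shift group is None, so fall back to an empty string
    shift = m["shift"] or ""
    names = [
        f"{m['table']}_r{row}_c{col}_s{m['sheet']}{shift}"
        for row, col in product(m["rows"].split(","), m["cols"].split(","))
    ]
    return ",".join(names)


def flatten_references(formula):
    """Replace every matching {...} reference with its flat names."""
    return REF.sub(_expand, formula)


teststr = "MEAN(1,SUM({T(F_01.01)R(0100,0120)C(0100)S(AT)[T-1Y]},{T(F_01.01)R(0100)C(0100)S(AT)[T-1Y]}))"
print(flatten_references(teststr))
# MEAN(1,SUM(F_01.01_r0100_c0100_sAT[T-1Y],F_01.01_r0120_c0100_sAT[T-1Y],F_01.01_r0100_c0100_sAT[T-1Y]))
```

Because the surrounding text is copied through unchanged, its result can be compared string-for-string with the output of mt.transform(tree).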