Seeking some guidance on the best way of structuring an extensive ETL process. My pipeline has a reasonably sleek extract section and loads into a designated file succinctly, but the only way I can think to express the transformation steps is a series of variable assignments:
a = ['some','form','of','petl','data']
b = petl.addfield(a, 'NewStrField', str(a))
c = petl.addrownumbers(b)
d = petl.rename(c, 'row', 'ID')
.......
Reassigning to the same variable name makes some sense, but doesn't aid readability:
a = ['some','form','of','petl','data']
a = petl.addfield(a, 'NewStrField', str(a))
a = petl.addrownumbers(a)
a = petl.rename(a, 'row', 'ID')
.......
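For concreteness, that reassignment pattern is just folding a list of steps over the data. A minimal pure-Python sketch of the same shape (using hypothetical stand-in functions rather than real petl calls):

```python
from functools import partial, reduce

# Stand-in transformation steps; in the real pipeline these would be
# petl calls such as partial(petl.rename, 'row', 'ID').
def add_suffix(table, suffix):
    return [row + suffix for row in table]

def number_rows(table):
    return [f"{i}:{row}" for i, row in enumerate(table, start=1)]

steps = [
    partial(add_suffix, suffix='!'),
    number_rows,
]

data = ['some', 'form', 'of', 'data']
# reduce feeds each step the result of the previous one,
# exactly like the repeated `a = step(a)` assignments above.
result = reduce(lambda table, step: step(table), steps, data)
print(result)  # ['1:some!', '2:form!', '3:of!', '4:data!']
```

That at least turns the pipeline into a data structure (a list of steps) rather than a wall of assignments, but it still feels like I'm working around the library.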
I've read up on chaining multiple method calls, something like this:
a = ['some','form','of','data']
result = (petl.addfield(a, 'NewStrField', str(a))
          .addrownumbers(a)
          .rename(a, 'row', 'ID'))
.......
but that won't work, as the functions require the table as the first parameter passed.
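The closest I can sketch is a thin wrapper that holds the table and feeds it as the first argument to each function. This is purely hypothetical (the Chain class and the step functions here are my own invention, not petl's API), and I'd hope a mature library has something built in rather than requiring this:

```python
class Chain:
    """Hold a table and pipe it through functions that take the table first."""
    def __init__(self, table):
        self.table = table

    def pipe(self, func, *args, **kwargs):
        # Pass the current table as the first positional argument,
        # matching the petl-style call signature.
        return Chain(func(self.table, *args, **kwargs))

    def unwrap(self):
        return self.table

# Hypothetical stand-in steps for illustration.
def upper_all(table):
    return [s.upper() for s in table]

def tag(table, label):
    return [f"{label}:{s}" for s in table]

result = (Chain(['a', 'b'])
          .pipe(upper_all)
          .pipe(tag, 'x')
          .unwrap())
print(result)  # ['x:A', 'x:B']
```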
Is there something fundamental I'm missing? I'm loath to believe that the right way of doing this commercially involves 1000+ LOC of one-line assignments.