
Rather than writing for-loops/nested for-loops, is there a more Pythonic way to build a fixed-width string from a meta-data dictionary?

My input is as follows:

{
 't_order': 11112014,
 't_date': 20150101,
 't_external': 'from sample',
 't_mode': 'A',
 'message_id': 'ID01',
 't_value': 123.45
}

And my meta-dict (actually a list of dicts) looks like this:

[{'field': 'message_id',
  'decimalPrecision': '0',
  'isTypeOf': 'C',
  'Length': '8',
  'Level': '0',
  'Start': '1'},
 {'field': 't_mode',
  'decimalPrecision': '0',
  'isTypeOf': 'C',
  'Length': '1',
  'Level': '0',
  'Start': '9'},
 {'field': 't_order',
  'decimalPrecision': '0',
  'isTypeOf': '9',
  'Length': '8',
  'Level': '0',
  'Start': '10'},
 {'field': 't_external',
  'decimalPrecision': '0',
  'isTypeOf': 'C',
  'Length': '25',
  'Level': '0',
  'Start': '18'},
 {'field': 't_date',
  'decimalPrecision': '0',
  'isTypeOf': '9',
  'Length': '8',
  'Level': '0',
  'Start': '43'},
 {'field': 't_value',
  'decimalPrecision': '4',
  'isTypeOf': '9',
  'Length': '18',
  'Level': '0',
  'Start': '51'}]

Anything with isTypeOf == 'C' is a str, and anything with isTypeOf == '9' is a number (an int, or a float such as t_value). Start is the position where the field begins in the string, and Length is the width of the field, which is left-space-padded. Numeric fields don't include a decimal point, and the fractional part is right-zero-padded to decimalPrecision digits. That being said, the given sample would read:

ID01 A11112014from sample 20150101 1234500
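The per-field rules above can be sketched for a single field as follows. This is an illustration rather than the asker's code: `format_field` is a hypothetical name, and because the sample output shows 'C' fields left-aligned (padding after the text), that is what the sketch assumes; swap `ljust` for `rjust` if right alignment is what's actually wanted.

```python
def format_field(value, meta):
    """Render one value as a fixed-width field (hypothetical helper).

    Assumes 'C' fields are left-aligned and space-padded, as in the sample
    output, and '9' fields have the decimal point removed and are
    right-aligned with leading spaces.
    """
    width = int(meta['Length'])
    if meta['isTypeOf'] == 'C':
        text = str(value).ljust(width)
    else:
        precision = int(meta['decimalPrecision'])
        # Fixed-point text with exactly `precision` decimals, then drop the point:
        # 123.45 with precision 4 -> '123.4500' -> '1234500'.
        text = '{:.{p}f}'.format(value, p=precision).replace('.', '').rjust(width)
    if len(text) > width:
        raise ValueError('%r does not fit in %d characters' % (value, width))
    return text


print(repr(format_field('ID01', {'isTypeOf': 'C', 'Length': '8'})))
# 'ID01    '
print(repr(format_field(123.45, {'isTypeOf': '9', 'Length': '18', 'decimalPrecision': '4'})))
# '           1234500'
```

Raising on overflow is a design choice here; silently truncating would corrupt the positions of every later field.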

What would be a more efficient way instead of loops/nested for-statements?

Carlos
  • If all fields are necessarily filled, you could iterate through the meta dictionary and build a format string. Then you'd just need to pass the dictionary to the format method, and it would build the string accordingly. – Fran Borcic Mar 11 '15 at 00:29
  • Have you made any attempt at a solution? – jedwards Mar 11 '15 at 00:35
  • Oh, I have an idea of how to do this, but it would involve for-loops (nested and otherwise), and doing so hurts performance since I'm looking to process upwards of hundreds of thousands of messages. I'm trying to get some ideas from the community on how they would approach this problem. Thanks! – Carlos Mar 11 '15 at 00:41
  • @FranBorcic, that is sort of the idea I was initially going with, but it led me down a path of nested for-loops that looked really nasty (and didn't run as fast as I'd like). Perhaps you can elaborate? – Carlos Mar 11 '15 at 00:43
  • Do you actually mean "left-space-padded" = "right aligned, with spaces padding"? Or do you mean "left aligned, with spaces padding"? Same for "right-zero-padded" – jedwards Mar 11 '15 at 00:45
  • @jedwards, let me clarify: left-space-padded means right-aligned, with spaces padding. Numeric fields are right-aligned (zero-padded) with spaces padding on the left. Hope that helps. – Carlos Mar 11 '15 at 00:47
  • If numerics are right aligned with zeros padding on the left -- why does `123.45` become `1234500`? Also, the length says 18 but you have it at 7. – jedwards Mar 11 '15 at 00:58
  • The numeric starts at the first white-space after the date. By including the white-space, you'll see that it's space-padded, right-aligned. – Carlos Mar 11 '15 at 01:19
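A minimal sketch of the format-string idea from the first comment, assuming the meta list is fixed for the whole batch. The template is built once, so each record then costs one dict copy plus one `str.format` call. `build_formatter` and `format_record` are hypothetical names; numeric values are scaled by 10**decimalPrecision and rounded to integers so the template only has to pad them, and character fields are assumed left-aligned as in the sample output. The unused `Level` key is omitted from the data.

```python
def build_formatter(meta_list):
    """Compile the meta list into a single str.format template, once.

    Hypothetical helper. Numeric fields are pre-scaled to integers per record
    (123.45 with decimalPrecision 4 -> 1234500) so the template only pads.
    Gaps between fields are filled with spaces; overlapping fields are not handled.
    """
    parts = []
    scales = {}
    pos = 1  # next free 1-based column
    for m in sorted(meta_list, key=lambda item: int(item['Start'])):
        name, start, width = m['field'], int(m['Start']), int(m['Length'])
        if start > pos:
            parts.append(' ' * (start - pos))  # literal spaces for any gap
        if m['isTypeOf'] == 'C':
            parts.append('{%s:<%d}' % (name, width))   # e.g. '{message_id:<8}'
        else:
            scales[name] = 10 ** int(m['decimalPrecision'])
            parts.append('{%s:>%dd}' % (name, width))  # e.g. '{t_value:>18d}'
        pos = start + width
    template = ''.join(parts)

    def format_record(record):
        values = dict(record)
        for name, scale in scales.items():
            values[name] = int(round(record[name] * scale))
        return template.format(**values)

    return format_record


META = [
    {'field': 'message_id', 'decimalPrecision': '0', 'isTypeOf': 'C', 'Length': '8', 'Start': '1'},
    {'field': 't_mode', 'decimalPrecision': '0', 'isTypeOf': 'C', 'Length': '1', 'Start': '9'},
    {'field': 't_order', 'decimalPrecision': '0', 'isTypeOf': '9', 'Length': '8', 'Start': '10'},
    {'field': 't_external', 'decimalPrecision': '0', 'isTypeOf': 'C', 'Length': '25', 'Start': '18'},
    {'field': 't_date', 'decimalPrecision': '0', 'isTypeOf': '9', 'Length': '8', 'Start': '43'},
    {'field': 't_value', 'decimalPrecision': '4', 'isTypeOf': '9', 'Length': '18', 'Start': '51'},
]
RECORD = {'t_order': 11112014, 't_date': 20150101, 't_external': 'from sample',
          't_mode': 'A', 'message_id': 'ID01', 't_value': 123.45}

to_line = build_formatter(META)
print(to_line(RECORD))
```

With the question's data this should print the same 68-character line shown in the first answer's output. Whether it is actually faster than a plain loop is worth measuring with `timeit` on real messages.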


I have to imagine there's a better way to do this, but this seems to work:

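def extract_elem(input_dict, meta_elem):
    val  = input_dict[meta_elem['field']]
    off  = int(meta_elem['Start']) - 1      # 'Start' is 1-based; list indices are 0-based
    flen = int(meta_elem['Length'])
    if meta_elem['isTypeOf'] == 'C':        # String: left-aligned, space-padded
        return off, val.ljust(flen, ' ')
    if meta_elem['isTypeOf'] == '9':        # Float / Int: decimal point removed, right-aligned
        prec = int(meta_elem['decimalPrecision'])
        fmt = "%%.%df" % prec               # e.g. "%.4f" when prec is 4
        val = (fmt % val).replace('.','')
        return off, val.rjust(flen, ' ')

def extract(input_dict, meta_list):
    # Line width is the end of the furthest field
    width = max(int(m['Start']) + int(m['Length']) - 1 for m in meta_list)
    s = [' '] * width                       # one space per output column
    for m in meta_list:
        off, val = extract_elem(input_dict, m)
        end = off + len(val)
        s[off:end] = val

    return ''.join(s)

# _input_dict and _meta_list hold the record and meta list from the question
print(extract(_input_dict, _meta_list))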

Output:

ID01    A11112014from sample              20150101           1234500
jedwards
  • This is what I was looking for, some code to help me think outside the norm. Much appreciated! I'll try it out, throw it against the wall and see how it sticks. – Carlos Mar 11 '15 at 01:18

I would strongly suggest that for loops are the most Pythonic way to do this. Python favors readability over brevity:

Complex is better than complicated

That said, if your metadata is a dictionary of dictionaries with the field names as keys, you can index into it directly and avoid nesting loops.
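A sketch of the dict-of-dicts idea, assuming the meta list can be re-keyed by its 'field' values and that every key in a record has an entry in it. Each record is then built with one flat loop and a direct lookup per field. `index_meta`, `render` and `to_fixed_width` are hypothetical names, and the formatting rules follow the sample output (left-aligned text, right-aligned numbers without the decimal point).

```python
def index_meta(meta_list):
    """Turn the list of field specs into a dict of dicts keyed by field name."""
    return {m['field']: m for m in meta_list}


def render(value, m):
    """Format one value according to its spec."""
    width = int(m['Length'])
    if m['isTypeOf'] == 'C':
        return str(value).ljust(width)
    text = '{:.{p}f}'.format(value, p=int(m['decimalPrecision']))
    return text.replace('.', '').rjust(width)


def to_fixed_width(record, meta_by_field):
    """Build one line with a single loop over the record's fields."""
    # Line width = furthest field end; 'Start' is 1-based.
    width = max(int(m['Start']) + int(m['Length']) - 1 for m in meta_by_field.values())
    line = [' '] * width
    for name, value in record.items():
        m = meta_by_field[name]          # direct lookup instead of scanning the meta list
        start = int(m['Start']) - 1      # convert to a 0-based index
        text = render(value, m)
        line[start:start + len(text)] = text
    return ''.join(line)
```

Because each field is written at its own offset, the order of keys in the record doesn't matter. `index_meta` only needs to run once per batch.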

The really pythonic thing to do would be to make sure the original records are being generated by a generator rather than loaded into a list.
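A minimal sketch of the generator suggestion. `fixed_width_lines` and `demo_format` are hypothetical names, and the generator expression stands in for whatever actually produces the messages; in practice `format_record` would be a full formatter built from the meta list.

```python
import io


def fixed_width_lines(records, format_record):
    """Yield one formatted line per record, lazily (a generator).

    `records` may itself be a generator that parses messages as they arrive,
    so the whole batch never has to sit in a list. `format_record` is any
    callable that turns one record dict into a fixed-width string.
    """
    for record in records:
        yield format_record(record) + '\n'


def demo_format(record):
    # Hypothetical stand-in formatter: message_id (width 8) then t_mode (width 1).
    return record['message_id'].ljust(8) + record['t_mode']


# A generator expression stands in for the real message source.
records = ({'message_id': 'ID%02d' % i, 't_mode': 'A'} for i in range(3))
out = io.StringIO()
out.writelines(fixed_width_lines(records, demo_format))  # pulls one line at a time
print(out.getvalue(), end='')
```

Writing to a real file object works the same way, so memory use stays flat regardless of how many messages pass through.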

Nick Bailey
  • Well, you do raise a valid point. I'm trying to keep the code as readable as possible, it's just that I'm looking to consider other `built-in` functions that I might not have otherwise thought of. – Carlos Mar 11 '15 at 00:51
  • Have you thought about using map to apply your processing? It might or might not improve performance. – Nick Bailey Mar 11 '15 at 00:53
  • Yes, I have thought about map/lambda, but I'm not too familiar with those functions. I've been reading up on them to see how I can apply them to my problem. – Carlos Mar 11 '15 at 01:34
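One place `map` could fit, sketched under the assumption that the fields are contiguous once sorted by Start (true for the question's data). One small function per field is built up front with `map`, then the resulting record formatter is mapped over the records. The helper names are hypothetical, and as the comment says, this may or may not be faster than an explicit loop.

```python
def make_field_formatter(m):
    """Return a small function that renders one field; built once per field."""
    name, width = m['field'], int(m['Length'])
    if m['isTypeOf'] == 'C':
        return lambda rec: str(rec[name]).ljust(width)
    prec = int(m['decimalPrecision'])
    return lambda rec: '{:.{p}f}'.format(rec[name], p=prec).replace('.', '').rjust(width)


def make_record_formatter(meta_list):
    # Assumes fields are contiguous once sorted by 'Start'.
    ordered = sorted(meta_list, key=lambda item: int(item['Start']))
    field_fns = list(map(make_field_formatter, ordered))
    return lambda rec: ''.join(fn(rec) for fn in field_fns)


demo_meta = [
    {'field': 't_mode', 'isTypeOf': 'C', 'Length': '1', 'Start': '9', 'decimalPrecision': '0'},
    {'field': 'message_id', 'isTypeOf': 'C', 'Length': '8', 'Start': '1', 'decimalPrecision': '0'},
]
records = [{'message_id': 'ID01', 't_mode': 'A'}, {'message_id': 'ID02', 't_mode': 'B'}]
lines = map(make_record_formatter(demo_meta), records)  # lazy in Python 3
for line in lines:
    print(line)
```

Each lambda closes over its own `name`, `width` and `prec`, because every call to `make_field_formatter` gets a fresh scope.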