I receive daily data feeds that are only loosely structured. I need to import them into a database so I can run a report that finds new records and changes to existing records.
The data looks like this:
--------------------------------
blah:
foo
bar
lorum: ipsum
dolor: sit
foo: bar
bar: foo
123-555-1212
Lorum / Ipsum / Dolor / Sit
Foo / Bar
--------------------------------
As you can see, some fields have headings like "blah" and "lorum", but others, such as the phone number and the slash-delimited lists, have no heading at all. Some headings share a line with their value, while others have the value on the lines that follow.
Just to keep us on our toes, the records do not have the same number of fields.
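One way to cope with the varying field counts is to split the feed on the dashed separator lines first and keep each record as its own list of lines, so nothing depends on a fixed number of fields. A minimal Python sketch, assuming a separator is any line of three or more dashes (the exact length is a guess) and that blank lines carry no meaning; `split_records` is a hypothetical name:

```python
import re

# Assumption: a separator is a line of 3+ dashes and nothing else.
SEPARATOR = re.compile(r"^-{3,}\s*$")


def split_records(text):
    """Split a feed into records, each a list of stripped, non-blank lines.

    Records may contain different numbers of lines; nothing here assumes
    a fixed field count.
    """
    records, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current:  # ignore empty stretches between separators
                records.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:  # the feed may not end with a separator
        records.append(current)
    return records
```

Each inner list can then be parsed on its own, which keeps record splitting separate from the field-by-field rules.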
So I'm thinking the parser needs at least three rules:
- If a line is a heading, it is either "heading:" alone (grab the following lines as the value until the next line matching ".*:") or "heading: value" (take the value from the same line).
- If a line starts with a number, assume the heading "phone".
- If a line contains a slash-delimited list, assume the heading "features", until the "--------..." line that ends the record.
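The rules above could be sketched in Python as a loop that classifies each line of one record. This is a minimal sketch, not a tested parser: the regular expressions are assumptions inferred only from the sample record, `parse_record` is a hypothetical name, the `features` field keeps each slash-delimited line as its own group, and any line that fits no rule goes into a hypothetical `unparsed` field instead of being silently dropped.

```python
import re

# Assumed patterns, inferred only from the sample record.
HEADING = re.compile(r"^([A-Za-z][\w ]*):\s*(.*)$")  # "name:" or "name: value"
PHONE = re.compile(r"^\d")                            # line starts with a digit
SLASH_LIST = re.compile(r"\S\s*/\s*\S")               # e.g. "Foo / Bar"


def parse_record(lines):
    """Turn the lines of one record into a dict of field name -> value."""
    record = {}
    pending = None  # heading that appeared alone; collects the following lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = HEADING.match(line)
        if m:
            name, value = m.group(1).strip(), m.group(2).strip()
            if value:  # "heading: value" on one line
                record[name] = value
                pending = None
            else:      # "heading:" alone; value is on the next lines
                record[name] = []
                pending = name
        elif PHONE.match(line):
            record["phone"] = line
            pending = None
        elif SLASH_LIST.search(line):
            items = [part.strip() for part in line.split("/")]
            record.setdefault("features", []).append(items)
            pending = None
        elif pending is not None:
            record[pending].append(line)  # continuation of a bare heading
        else:
            # Fits no rule and no heading is open: keep it for inspection.
            record.setdefault("unparsed", []).append(line)
    return record


sample = [
    "blah:", "foo", "bar",
    "lorum: ipsum", "dolor: sit", "foo: bar", "bar: foo",
    "123-555-1212",
    "Lorum / Ipsum / Dolor / Sit",
    "Foo / Bar",
]
print(parse_record(sample))
```

On the sample record this gives `blah` as `["foo", "bar"]`, `lorum` as `"ipsum"`, `phone` as `"123-555-1212"`, and two `features` groups. The phone and slash checks close any open heading, so those lines are never swallowed as continuation lines; the trade-off is that a continuation value starting with a digit would be misread as a phone number.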
But I have no idea how to start coding something like this. The language is open at this point, although the code has to run on macOS.
I suppose Perl might be good for this, but my Perl-fu is very weak.
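For the report of new and changed records, one option is Python's built-in `sqlite3` module: store each parsed record as JSON text keyed on some identifying field, then compare each day's records with what is already stored. This is a sketch under assumptions: `report_changes` and `key_field` are hypothetical names, it assumes a single field (here `phone`) uniquely identifies a record, and it does not report records that disappear from the feed.

```python
import json
import sqlite3


def report_changes(conn, records, key_field):
    """Compare today's parsed records with those saved from earlier feeds.

    Returns (new_keys, changed_keys), then saves today's records so the
    next run compares against them. Each record is stored as JSON text,
    so records with different sets of fields share one simple table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " record_key TEXT PRIMARY KEY,"
        " data TEXT NOT NULL)"
    )
    new_keys, changed_keys = [], []
    for rec in records:
        key = rec.get(key_field)
        if not isinstance(key, str):
            continue  # missing or multi-line key: needs separate handling
        data = json.dumps(rec, sort_keys=True)  # canonical form for comparison
        row = conn.execute(
            "SELECT data FROM records WHERE record_key = ?", (key,)
        ).fetchone()
        if row is None:
            new_keys.append(key)
        elif row[0] != data:
            changed_keys.append(key)
        conn.execute(
            "INSERT OR REPLACE INTO records (record_key, data) VALUES (?, ?)",
            (key, data),
        )
    conn.commit()
    return new_keys, changed_keys


# Example: two daily feeds, keyed on the (assumed unique) phone field.
conn = sqlite3.connect(":memory:")
day1 = [{"phone": "123-555-1212", "lorum": "ipsum"}]
day2 = [
    {"phone": "123-555-1212", "lorum": "dolor"},
    {"phone": "555-000-0000", "foo": "bar"},
]
print(report_changes(conn, day1, "phone"))
print(report_changes(conn, day2, "phone"))
```

The first call reports `123-555-1212` as new; the second reports `555-000-0000` as new and `123-555-1212` as changed. Storing the whole record as JSON sidesteps the varying number of fields, whereas one column per heading would need a schema change whenever a new heading appeared. In real use the connection would point at a file (for example a hypothetical `feeds.db`) so the history survives between runs.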