
I have rows of records with multiple columns. Different columns can contain values of different types (Long, String, Date, etc.), but all values within one column have the same type. I am trying to write a generic parser that can deal with this: if I get a different set of records, I should be able to configure my parser for them.

Input1

1, Jitendra, 2011-02-12
2, Xyz, 2011-02-13

Input2

XYZ, 34.00, 1
ABC, 56.00, 3

Something like this.

class Parser {
  int columnNo;
  String columnName;
  <Something here to identify data type of column> dataType;
}

class ParsedRecord {
  String name;
  <Datatype> value;
}
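One way the two placeholders could be filled in, offered as an assumption rather than the asker's code: a `Class<T>` token identifies the column's type and a `Function<String, T>` converts the raw text. `ColumnSpec`, `TypedValue`, `INPUT1` and `parseLine` are hypothetical names, and the sketch assumes Java 9 or later (`List.of`, `java.time`).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ColumnSpecDemo {

    // Hypothetical stand-in for the question's Parser: column position, name,
    // a Class token that identifies the data type, and a text-to-value converter.
    static final class ColumnSpec<T> {
        final int columnNo;
        final String columnName;
        final Class<T> dataType;
        final Function<String, T> converter;

        ColumnSpec(int columnNo, String columnName, Class<T> dataType, Function<String, T> converter) {
            this.columnNo = columnNo;
            this.columnName = columnName;
            this.dataType = dataType;
            this.converter = converter;
        }

        // Converts this column's (trimmed) field of an already split line.
        TypedValue<T> parse(String[] fields) {
            return new TypedValue<>(columnName, dataType, converter.apply(fields[columnNo].trim()));
        }
    }

    // Hypothetical stand-in for ParsedRecord: a name, the column's type and a value of that type.
    static final class TypedValue<T> {
        final String name;
        final Class<T> type;
        final T value;

        TypedValue(String name, Class<T> type, T value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    // Configuration for the question's Input1: Long, String, ISO date.
    static final List<ColumnSpec<?>> INPUT1 = List.of(
            new ColumnSpec<>(0, "id", Long.class, Long::valueOf),
            new ColumnSpec<>(1, "name", String.class, s -> s),
            new ColumnSpec<>(2, "date", LocalDate.class, LocalDate::parse));

    static List<TypedValue<?>> parseLine(String line, List<ColumnSpec<?>> specs) {
        String[] fields = line.split(",");
        List<TypedValue<?>> result = new ArrayList<>();
        for (ColumnSpec<?> spec : specs) {
            result.add(spec.parse(fields));
        }
        return result;
    }

    public static void main(String[] args) {
        for (TypedValue<?> v : parseLine("1, Jitendra, 2011-02-12", INPUT1)) {
            System.out.println(v.name + " : " + v.type.getSimpleName() + " = " + v.value);
        }
    }
}
```

Configuring a parser for Input2 would then only mean building a different list of `ColumnSpec` objects; the parsing loop stays the same.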

Does this approach seem to be feasible? Any suggestions?

Thanks

Vitaly Olegovitch
RandomQuestion
  • This would be a better fit on http://codereview.stackexchange.com. – Kirk Woll Mar 30 '12 at 21:10
  • Normally, while your data may contain different values in each column, the data types are uniform across all rows (that is to say, you won't have data that's `String | String | int | Date` then `Date | String | int | String`). Is your data formed properly in the sense that your data is uniform, or do you want to handle malformed data? – Makoto Mar 30 '12 at 21:11
  • @KirkWoll, I'm not sure I agree. Code Review handles peer review of code (efficiency, correctness, etc), but this is a valid design question for SO. – Makoto Mar 30 '12 at 21:12
  • @Makoto, this question is open-ended and subjective. It's a very poor fit for SO. (if you're saying "look at my code, any suggestions?" then the question doesn't belong here) – Kirk Woll Mar 30 '12 at 21:13
  • @KirkWoll, again I disagree. It's not a question of "any suggestions on what I can do", it's still a design question in my eyes. You're free to disagree though. – Makoto Mar 30 '12 at 21:16
  • @Makoto The data would be uniform. For example, it would always be `String | String | int | Date`. But a column value itself could be malformed, e.g. a date not in the proper format or a String value in an int column. In such cases I would throw an exception. – RandomQuestion Mar 30 '12 at 21:17
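A minimal sketch of the "throw some exception" idea for malformed values, assuming each column has a `Function<String, T>` converter: any unchecked parse failure is wrapped in one checked exception that remembers the column. `MalformedColumnException` and `convert` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.function.Function;

public class StrictConvert {

    // Hypothetical checked exception that records which column held the bad value.
    static final class MalformedColumnException extends Exception {
        final int columnNo;
        final String rawValue;

        MalformedColumnException(int columnNo, String rawValue, Throwable cause) {
            super("Column " + columnNo + ": cannot parse '" + rawValue + "'", cause);
            this.columnNo = columnNo;
            this.rawValue = rawValue;
        }
    }

    // Applies the column's converter and turns any unchecked parse failure
    // (NumberFormatException, DateTimeParseException, ...) into one checked exception.
    static <T> T convert(int columnNo, String raw, Function<String, T> converter)
            throws MalformedColumnException {
        try {
            return converter.apply(raw.trim());
        } catch (RuntimeException e) {
            throw new MalformedColumnException(columnNo, raw, e);
        }
    }

    public static void main(String[] args) {
        try {
            LocalDate d = convert(2, "2011-13-45", LocalDate::parse); // month 13 is invalid
            System.out.println(d);
        } catch (MalformedColumnException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Catching `RuntimeException` is deliberately broad here so that every converter behaves the same way; a stricter version could catch only the exceptions each converter is known to throw.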

4 Answers


There are frameworks that let you describe the record structure in an XML configuration file and will then parse the files for you. You can use:

Your question is related to: Converting Flat File to Java Objects

Vitaly Olegovitch
  • Unfortunately, none of these frameworks are currently available in my company. Getting them approved and imported would be a time-consuming task. – RandomQuestion Mar 30 '12 at 21:22
  • So you have to build your own. I would suggest using a small XML file for the file-format configuration. The generic field would be: . – Vitaly Olegovitch Mar 30 '12 at 21:30
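The comment's own example of the generic field did not survive, so the XML vocabulary below (`<record>` containing `<field index=".." name=".." type=".."/>`) is purely an assumption. The sketch only shows that a small configuration file like this can be read with the JDK's built-in DOM parser, with no third-party framework to get approved. `FieldConfigReader` and `FieldDef` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FieldConfigReader {

    // One configured field; the XML vocabulary used here is hypothetical.
    static final class FieldDef {
        final int index;
        final String name;
        final String type;

        FieldDef(int index, String name, String type) {
            this.index = index;
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return index + " " + name + " " + type;
        }
    }

    // Reads every <field index=".." name=".." type=".."/> element, in document order.
    static List<FieldDef> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("field");
        List<FieldDef> defs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            defs.add(new FieldDef(Integer.parseInt(e.getAttribute("index")),
                    e.getAttribute("name"), e.getAttribute("type")));
        }
        return defs;
    }

    public static void main(String[] args) throws Exception {
        // A possible description of the question's Input2.
        String xml = "<record>"
                + "<field index=\"0\" name=\"code\" type=\"String\"/>"
                + "<field index=\"1\" name=\"amount\" type=\"BigDecimal\"/>"
                + "<field index=\"2\" name=\"count\" type=\"Integer\"/>"
                + "</record>";
        for (FieldDef def : read(xml)) {
            System.out.println(def);
        }
    }
}
```

Mapping each `type` string to an actual converter (for example via a `Map<String, Function<String, ?>>`) would be the next step.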

Sure, it seems feasible if you have a limited set of possible data types. Just scan all the information in as strings, then iterate over all the values in a column to determine its type, and then either re-parse the values or convert the strings you've already scanned.
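A minimal sketch of the "scan first, then decide the column type" idea, assuming the only candidate types are Long, Double, ISO date and String, checked from most to least specific. `ColumnTypeGuesser`, `ColumnType` and `guess` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;

public class ColumnTypeGuesser {

    // Candidate types, checked from most to least specific.
    enum ColumnType { LONG, DOUBLE, DATE, STRING }

    static boolean isLong(String s) {
        try { Long.parseLong(s); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDate(String s) {
        try { LocalDate.parse(s); return true; } catch (DateTimeParseException e) { return false; }
    }

    // A column gets a type only if every scanned value in it fits that type;
    // anything that fits nothing more specific falls back to STRING.
    static ColumnType guess(List<String> columnValues) {
        if (columnValues.stream().allMatch(ColumnTypeGuesser::isLong)) return ColumnType.LONG;
        if (columnValues.stream().allMatch(ColumnTypeGuesser::isDouble)) return ColumnType.DOUBLE;
        if (columnValues.stream().allMatch(ColumnTypeGuesser::isDate)) return ColumnType.DATE;
        return ColumnType.STRING;
    }

    public static void main(String[] args) {
        // Columns of the question's Input2, already split and trimmed.
        System.out.println(guess(List.of("XYZ", "ABC")));     // STRING
        System.out.println(guess(List.of("34.00", "56.00"))); // DOUBLE
        System.out.println(guess(List.of("1", "3")));         // LONG
    }
}
```

Because the whole column is scanned before a type is chosen, a column such as `1, 2.5` ends up as DOUBLE rather than failing on the second row. Note that an empty column would be reported as LONG, since `allMatch` is true for an empty stream.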

Dave

Your bean would obviously contain an Object for the column's value. But you could inject a Parser that reads the data from the record and converts it to the appropriate type of object. For example, you would have an IntegerParser, a LongParser, a DateParser, etc. Then construct the overall parser by assembling the correct list of column parsers:

<bean id="myparser" class="com.example.RecordParser"> <!-- class name is hypothetical -->
   <constructor-arg name="columns">
       <list>
          <ref bean="DateParser"/>
          <ref bean="IntegerParser"/>
          <!-- .... -->
       </list>
   </constructor-arg>
</bean>
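The same wiring can be sketched in plain Java, assuming a hypothetical `ColumnParser` interface with the `Object parse(String)` method mentioned in the comments and a hypothetical `RecordParser` that receives the ordered list of column parsers, as the Spring `<list>` would inject it. `IntegerParser`, `LongParser` and `DateParser` follow the answer's names; `StringParser` is an added assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class RecordParserDemo {

    // Hypothetical per-column contract, matching the `Object parse(String)` idea in the comments.
    interface ColumnParser {
        Object parse(String raw);
    }

    static final class IntegerParser implements ColumnParser {
        public Object parse(String raw) { return Integer.valueOf(raw.trim()); }
    }

    static final class LongParser implements ColumnParser {
        public Object parse(String raw) { return Long.valueOf(raw.trim()); }
    }

    static final class DateParser implements ColumnParser {
        // Assumes ISO-8601 dates such as 2011-02-12.
        public Object parse(String raw) { return LocalDate.parse(raw.trim()); }
    }

    static final class StringParser implements ColumnParser {
        public Object parse(String raw) { return raw.trim(); }
    }

    // Overall parser built from an ordered list of column parsers, i.e. what the
    // Spring <list> would inject through the constructor.
    static final class RecordParser {
        private final List<ColumnParser> columns;

        RecordParser(List<ColumnParser> columns) {
            this.columns = new ArrayList<>(columns);
        }

        List<Object> parse(String line) {
            String[] fields = line.split(",");
            if (fields.length != columns.size()) {
                throw new IllegalArgumentException(
                        "Expected " + columns.size() + " fields but got " + fields.length + ": " + line);
            }
            List<Object> values = new ArrayList<>();
            for (int i = 0; i < fields.length; i++) {
                values.add(columns.get(i).parse(fields[i]));
            }
            return values;
        }
    }

    public static void main(String[] args) {
        RecordParser input1 = new RecordParser(
                List.of(new LongParser(), new StringParser(), new DateParser()));
        System.out.println(input1.parse("1, Jitendra, 2011-02-12")); // [1, Jitendra, 2011-02-12]
    }
}
```

The naive `split(",")` does not handle quoted fields containing commas; that would need a real CSV tokenizer in front of the column parsers.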
John B
  • One question on this: assuming all these parser implementations implement some Parser interface with a method `Object parse(String)`, how would I distinguish the actual data type of the returned object? – RandomQuestion Mar 30 '12 at 22:22
  • Your business logic would have to be written to expect / handle certain data types. You can only get so far with a completely generic solution. At some point you have to know what you want to do with your data. – John B Mar 31 '12 at 02:08
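One possible answer to the question in this comment, sketched as an assumption rather than either commenter's design: make the parser generic and let it report its `Class<T>`, so that business logic which expects a particular type can check it once and get a typed value via `Class.cast`. `TypedParser`, `parseAs` and `COLUMNS` are hypothetical names; the sketch assumes Java 9 or later (`Map.of`).

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.function.Function;

public class TypedParserDemo {

    // Hypothetical generic variant of the column parser: besides parsing,
    // it reports the Java type it produces.
    interface TypedParser<T> {
        Class<T> type();

        T parse(String raw);

        static <T> TypedParser<T> of(Class<T> type, Function<String, T> fn) {
            return new TypedParser<T>() {
                public Class<T> type() { return type; }
                public T parse(String raw) { return fn.apply(raw.trim()); }
            };
        }
    }

    // Business logic still states which type it expects; the type token lets
    // a mismatch fail fast instead of surfacing later as a ClassCastException.
    static <T> T parseAs(TypedParser<?> parser, String raw, Class<T> expected) {
        if (!expected.isAssignableFrom(parser.type())) {
            throw new IllegalStateException("Column holds " + parser.type().getSimpleName()
                    + ", not " + expected.getSimpleName());
        }
        return expected.cast(parser.parse(raw));
    }

    static final Map<String, TypedParser<?>> COLUMNS = Map.of(
            "id", TypedParser.of(Long.class, Long::valueOf),
            "date", TypedParser.of(LocalDate.class, LocalDate::parse));

    public static void main(String[] args) {
        Long id = parseAs(COLUMNS.get("id"), "42", Long.class);
        LocalDate date = parseAs(COLUMNS.get("date"), "2011-02-12", LocalDate.class);
        System.out.println(id + " " + date.getYear()); // 42 2011
    }
}
```

This does not remove the point made in the reply: the caller still has to know which type it wants from each column. The type token only makes that expectation explicit and checkable.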

Based on your clarification to your question (I'd encourage editing that into the question so others answering can help), a generic parser might work, but I would be more concerned about how your data was coming in, and how it was formatted. (Why is it inconsistent?)

Your data, as you clarified, would be uniform across all rows (that is, you would expect exactly two Strings, one int and one Date). However, if the values in the file can appear in a different order, you would have malformed data, and the file shouldn't be trusted at face value. It would be better to guarantee data integrity instead, since two Strings in either order are highly ambiguous (is it a name? An address?).

If you really wish to continue, it may be inefficient, but I would encourage data sanity checking instead of polymorphism in this case. Whenever you're reading from the file, send it to a helper method, and see if it fits a specific type and format. If it doesn't, move it to the next helper until all data can be successfully parsed and read in.
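A minimal sketch of the chain of helper methods described above, assuming three checks tried in a fixed order (Long, ISO date, non-empty String); the first helper that accepts a value wins, and a value no helper accepts is rejected. `SanityCheckChain`, `HELPERS` and `classify` are hypothetical names; the sketch assumes Java 9 or later (`List.of`).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SanityCheckChain {

    // Each helper either recognises the value (returning it converted) or declines.
    static Optional<Object> asLong(String s) {
        try { return Optional.of(Long.valueOf(s)); } catch (NumberFormatException e) { return Optional.empty(); }
    }

    static Optional<Object> asDate(String s) {
        try { return Optional.of(LocalDate.parse(s)); } catch (DateTimeParseException e) { return Optional.empty(); }
    }

    static Optional<Object> asNonEmptyString(String s) {
        return s.isEmpty() ? Optional.empty() : Optional.of(s);
    }

    // Helpers are tried in this order; the order decides ambiguous cases.
    static final List<Function<String, Optional<Object>>> HELPERS = List.of(
            SanityCheckChain::asLong, SanityCheckChain::asDate, SanityCheckChain::asNonEmptyString);

    static Object classify(String raw) {
        String s = raw.trim();
        for (Function<String, Optional<Object>> helper : HELPERS) {
            Optional<Object> result = helper.apply(s);
            if (result.isPresent()) {
                return result.get();
            }
        }
        throw new IllegalArgumentException("No helper accepted value: '" + raw + "'");
    }

    public static void main(String[] args) {
        for (String field : "1, Jitendra, 2011-02-12".split(",")) {
            Object v = classify(field);
            System.out.println(v + " -> " + v.getClass().getSimpleName());
        }
    }
}
```

This illustrates the ambiguity the answer warns about: a name that happens to be numeric, such as `2011`, would be classified as a Long, so the order of the helpers effectively becomes part of the data format.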

Makoto