I need to parse plain text that contains information about users, including which browser (and version) and which operating system each user used, and extract that information (browser/version, operating system/version, ...).

Is there a general framework or library (Java preferred, or C++) aimed at this kind of problem? I realize that each parsing problem has its own properties and may need a slightly different approach, but if you are aware of one, please share or suggest it. It would help clarify the steps of the problem, avoid repeated work and bugs, and possibly improve efficiency.

I found a schema by Oracle for text analysis, which seems pretty interesting: (http://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/processor_library/text_analysis/parse.htm)

The input can be fairly large, but to keep the question simple, we can ignore the size of the data for now.
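To make the task concrete, here is a minimal sketch of the kind of extraction described above, using only `java.util.regex`. It assumes the text contains User-Agent-style tokens such as `Firefox/21.0` and `Windows NT 6.1`; that input format, the class name `UserInfoExtractor`, the method `extract`, and the short lists of browser and OS names are all hypothetical and would need to be adapted to the real data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserInfoExtractor {
    // Assumed token shapes (not taken from real data): "<Browser>/<version>" or "<Browser> <version>".
    private static final Pattern BROWSER =
        Pattern.compile("\\b(Firefox|Chrome|Opera|MSIE)[/ ](\\d+(?:\\.\\d+)*)");

    // Assumed OS token shapes, e.g. "Windows NT 6.1" or "Mac OS X 10_8_3".
    private static final Pattern OS =
        Pattern.compile("\\b(Windows NT|Mac OS X|Android)[ /]?(\\d+(?:[._]\\d+)*)");

    /** Returns the first browser and OS match found in the line; missing fields are simply absent. */
    public static Map<String, String> extract(String line) {
        Map<String, String> info = new LinkedHashMap<>();

        Matcher browser = BROWSER.matcher(line);
        if (browser.find()) {
            info.put("browser", browser.group(1));
            info.put("browserVersion", browser.group(2));
        }

        Matcher os = OS.matcher(line);
        if (os.find()) {
            info.put("os", os.group(1));
            // Normalize "10_8_3" style versions to dotted form.
            info.put("osVersion", os.group(2).replace('_', '.'));
        }
        return info;
    }

    public static void main(String[] args) {
        // Hypothetical input line for illustration only.
        String line = "user=42 Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20100101 Firefox/21.0";
        System.out.println(extract(line));
    }
}
```

Hand-written patterns like these only cover the names listed in them, which is part of why a maintained library would be preferable if one exists for this problem.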

Simo
  • In general, text parsing is unique to the data file format. Common data file formats may have libraries; otherwise you have to write your own. Search for "c++ read file" or "c++ parse text file". – Thomas Matthews May 21 '13 at 22:59
  • Yeah. I am aware of that. So what I look for is something similar to the text analysis of Oracle (URL in the main post). Unfortunately, could not find its concrete implementation or jar. Not sure if it's available for public though. – Simo May 21 '13 at 23:04
  • If you are defining the format then XML, JSON, or protocol-buffers are all good starting points. If someone else defined the format, then please explain it. – brian beuning May 21 '13 at 23:55

0 Answers