I need to parse plain text that contains information about users, including which browser (and version) and which operating system (and version) each user used, and extract those fields.
Is there a general framework or library (Java preferred, or C++) that targets this kind of problem? I realize that each parsing problem has its own properties and may need a slightly different approach, but if you know of one, please share it. It would help me clarify the steps involved, avoid duplicated work and bugs, and possibly get a more efficient solution.
I found a text-analysis scheme by Oracle that looks quite interesting: http://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/processor_library/text_analysis/parse.htm
The input can be fairly large, but to keep the question simple, the data size can be ignored for now.
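To make the extraction task concrete, here is a minimal sketch of the kind of result I am after, written with plain `java.util.regex`. It assumes the text contains user-agent-style tokens such as `Firefox/115.0` or `Windows NT 10.0`; the class name `UserAgentSketch`, the two patterns, and the sample line are all made up for illustration, and real input would need a much more complete rule set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentSketch {
    // Hypothetical patterns covering only a few browsers and systems.
    // They take the leftmost match in the line, which is naive: for example,
    // Safari's token carries the WebKit build number, not the browser version.
    private static final Pattern BROWSER =
        Pattern.compile("(Firefox|Chrome|Safari|MSIE)[/ ]([\\d.]+)");
    private static final Pattern OS =
        Pattern.compile("(Windows NT|Mac OS X|Android|Linux)(?: ([\\d._]+))?");

    // Returns the fields found in one line of text; missing fields are omitted.
    public static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher b = BROWSER.matcher(line);
        if (b.find()) {
            result.put("browser", b.group(1));
            result.put("browserVersion", b.group(2));
        }
        Matcher o = OS.matcher(line);
        if (o.find()) {
            result.put("os", o.group(1));
            if (o.group(2) != null) {
                result.put("osVersion", o.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the real data.
        String line = "user=alice agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
                    + "Gecko/20100101 Firefox/115.0";
        System.out.println(parse(line));
        // prints {browser=Firefox, browserVersion=115.0, os=Windows NT, osVersion=10.0}
    }
}
```

Hand-written patterns like these get brittle quickly, which is why I am asking whether an existing framework or library already handles this.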