3

I am looking for a toolkit or ready-made tool that will parse a given document and generate a brief summary or, better still, a mind map of it. I know Python has NLTK and Perl has quite a few modules that help with natural language parsing. It would even be feasible to write such a tool myself with a toolkit like NLTK, but I lack the time. If you know of such a tool, or have a pointer to one, I would appreciate it if you could post it here. Thanks in advance.

datta

2 Answers

1

Someone (here on SO) has already written it for you (discussion). Another option would be TexLexAn (Text Analyzer Classifier Summarizer).
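For anyone wondering what a summarizer of this kind does at its simplest, here is a rough sketch of frequency-based extractive summarization using only the Python standard library. It is not how TexLexAn or any other tool mentioned here works internally; the stopword list, the sentence splitter, and the function names (`split_sentences`, `tokenize`, `summarize`) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use far larger ones).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "it",
    "that", "this", "for", "on", "with", "as", "by", "be", "was", "were",
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = ("Genes encode proteins. Proteins fold into structures. "
              "The weather was nice. Genes and proteins interact in pathways.")
    for line in summarize(sample, 2):
        print(line)
```

Sentences that reuse the document's most frequent content words score highest, so off-topic sentences tend to drop out. Swapping in NLTK's tokenizers and stopword lists would be a natural next step.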

Community
TryPyPy
0

Google people may already be working on such a thing. ;-)

If I get you right, you want a tool that will read a book for you and then briefly summarize what it was all about, so you can save yourself the time of reading it. Or maybe you're not interested in the contents as such but rather want to categorize the material, as a librarian would.

That may be technically possible for very structured text with many very similar documents in a very specialized area, say mathematical proofs of theses or experimental results or medical reports. Surely it would be possible to have a tool that can distinguish between a novel and a phone book to roughly sort through literature. Obviously it's very easy to provide page or word counts, identify the written language etc. because these parameters can be clearly defined.

Quite surely, though, computers will fail when trying to grasp actual stories or anything more conversational or casual. To decide who's the good guy and who's the bad one, or whether the piece at hand is a romance novel featuring detectives or a crime thriller where a detective is in love with somebody, a machine would have no chance of telling what's what with any feasible amount of memory, CPU power, and knowledge database.

Maybe it would help if you could be more specific regarding the actual purpose for which you want to use this tool.

Olfan
  • Thanks for the reply. Actually, I am more interested in parsing documents/papers in the bioinformatics and genomics domain at the moment, so the domain is "constrained", if we could say so :-). At the moment I am looking for a simple utility that will do a simple parse of the content and generate a map of the document in a tree fashion. Over time I could provide it with a list of phrases or words that should be considered associated, etc. If nothing comes up I might have to put something together. – datta Jan 19 '11 at 11:34
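
The "map of the document in a tree fashion" described in the comment above could start out as something like the sketch below. It assumes plain-text input whose section headings are numbered ("1 Introduction", "1.1 Background"), and it takes a user-supplied phrase list; both are assumptions, and the names `build_outline` and `render` are hypothetical, not from any existing tool.

```python
import re

# Assumed heading format: "1 Title", "1.1 Title", "2.3.1 Title", ...
# Limitation: a body line that starts with a number would also match.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def build_outline(lines, phrases):
    """Build a nested dict tree of sections, tagging each with matched phrases."""
    root = {"title": "document", "depth": 0, "phrases": [], "children": []}
    stack = [root]  # path from the root to the current section
    for raw in lines:
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            depth = match.group(1).count(".") + 1
            node = {"title": match.group(2), "depth": depth,
                    "phrases": [], "children": []}
            # Climb back up until the top of the stack is a shallower section.
            while stack[-1]["depth"] >= depth:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Body text: record which known phrases occur in this section.
            lowered = line.lower()
            for phrase in phrases:
                if phrase.lower() in lowered and phrase not in stack[-1]["phrases"]:
                    stack[-1]["phrases"].append(phrase)
    return root


def render(node, indent=0):
    """Return the tree as a list of indented text lines."""
    label = node["title"]
    if node["phrases"]:
        label += " [" + ", ".join(node["phrases"]) + "]"
    out = ["  " * indent + label]
    for child in node["children"]:
        out.extend(render(child, indent + 1))
    return out


if __name__ == "__main__":
    text = [
        "1 Introduction",
        "Genome assembly is hard.",
        "1.1 Background",
        "Sequence alignment and genome assembly.",
        "2 Methods",
        "We used sequence alignment.",
    ]
    print("\n".join(render(build_outline(text, ["genome assembly",
                                                  "sequence alignment"]))))
```

The phrase list can grow over time, as the comment suggests, without changing the parsing code. Real papers would need a more robust heading detector than the numbered-heading pattern assumed here.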