Is there any library, preferably in Python but at least open source, that can summarize and/or simplify natural-language text?
- There is another library, based on the TextRank algorithm, which you can find here: https://github.com/RaRe-Technologies/gensim – prashanth Oct 21 '16 at 20:57
- There is hardly any program which can do this. – too honest for this site Dec 04 '17 at 13:22
7 Answers
Maybe you can try sumy. It's a fairly small library that I wrote in Python. It implements Luhn's and Edmundson's approaches, the LSA method, SumBasic, KL-Sum, LexRank and TextRank. It's Apache2 licensed and supports Czech, Slovak, English, French, Japanese, Chinese, Portuguese, Spanish and German.
Feel free to open an issue or send a pull request if there is something you are missing.
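To give a feel for what one of the listed methods does, here is a minimal standard-library sketch of the SumBasic idea: score each sentence by the average probability of its words, pick the best sentence that contains the currently most probable word, then square the probabilities of the words just used so later picks favour new content. This is not sumy's code or API; the function names, the naive sentence splitter and the tiny stop-word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real libraries ship fuller lists).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def sumbasic(text, count=2):
    """Return up to `count` sentences chosen SumBasic-style, in original order."""
    sentences = split_sentences(text)
    words = [content_words(s) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    total = sum(freq.values())
    prob = {w: n / total for w, n in freq.items()}

    chosen = []
    candidates = [i for i, ws in enumerate(words) if ws]
    while candidates and len(chosen) < count:
        # Prefer sentences containing the currently most probable word.
        top_word = max(prob, key=prob.get)
        pool = [i for i in candidates if top_word in words[i]] or candidates
        # Among those, take the one with the highest average word probability.
        best = max(pool, key=lambda i: sum(prob[w] for w in words[i]) / len(words[i]))
        chosen.append(best)
        candidates.remove(best)
        # Down-weight words that are now covered by the summary.
        for w in set(words[best]):
            prob[w] **= 2
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats chase mice. Dogs bark. The weather is nice."
    for line in sumbasic(sample, count=2):
        print(line)
```

For real work the library's own summarizers are the better choice, since they handle tokenization, stemming and stop words per language.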

- I love Sumy. It's very easy to use. What is your preferred method? Isn't `LSA` the most recent natural language methodology and theoretically the best one compared to the other options? – Houman Jul 29 '15 at 19:45
- Hi, thanks. As with almost everything there is no silver bullet, but LSA is the most advanced method in sumy. – Mišo Aug 01 '15 at 10:52
- I've done a ton of testing with sumy on Wikipedia articles and peer-reviewed articles, and I personally get by far the best results with KL, but it also takes about 200 times longer than any of the other summarizers. – Xodarap777 Apr 02 '20 at 01:24
- @Xodarap777 can you say which other summarizers you tried? And maybe even link to your code with experiments? – Mišo Apr 02 '20 at 06:38
- @Xodarap777 can you please share what type of measurement you used to compare the produced summaries? – CtrlMj Aug 29 '20 at 15:51
I'm not sure there are currently any libraries that do this, as text summarization, or at least understandable text summarization, isn't something that can easily be accomplished by a simple plug & play library.
Here are a few links that I managed to find regarding projects / resources that are related to text summarization to get you started:
- The Lemur Project
- Python Natural Language Toolkit
- O'Reilly's Book on Natural Language Processing in Python
- Google Resource on Natural Language Processing
- Tutorial : How to create a keyword summary of text in Python
Hope that helps :)

- Some dead links in the answer, replaced with cached pages from https://archive.org/web/ – Nick Bull Sep 12 '16 at 13:31
I needed the same thing, but I couldn't find anything in Python that gave me a comprehensive result.
I did find this web service really useful, though. They have a free API that returns a JSON result, so I wanted to share it with you.
Check it out here: http://smmry.com

Try Open Text Summarizer which is released under the GPL open source license. It works reasonably well but there has been no development work on it since 2007.
The original code is written in C (both a library and a command line utility) but there are wrappers to it in a number of languages:

Take a look at this article which does a detailed study of these methods and packages:
- Lex_rank (sumy)
- LSA (sumy)
- Luhn (sumy)
- PyTeaser
- Gensim TextRank
- PyTextRank
- Google TextSum
The article ends with a 'summary' of its own.
The author of sumy, @miso.belica, has given a description in an answer above.
Other ML-based approaches have emerged, such as Facebook/NAMAS and Google/TextSum, but they still need extensive training on the Gigaword dataset and about 7000 GPU hours. The dataset itself is quite costly.
In conclusion, I would say sumy is the best option available right now if you don't have access to high-end machines. Thanks a lot @miso.belica for this wonderful package.
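Since several of the packages listed above are TextRank variants, here is a rough standard-library sketch of the idea: sentences are graph nodes, edges are weighted by word overlap, and a weighted PageRank ranks the sentences. It is only an illustration under stated assumptions, not the code of Gensim, PyTextRank or sumy: the names are hypothetical, the sentence splitter is naive, unique words are counted instead of all words, no stop words are removed, and a fixed number of iterations stands in for a convergence check.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count normalised by the log of the sentence lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences have exactly one word
        return 0.0
    return len(a & b) / denom


def textrank(text, count=2, damping=0.85, iterations=50):
    """Return the `count` highest-ranked sentences, in original order."""
    sentences = split_sentences(text)
    bags = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)

    # Symmetric edge weights; a sentence is not linked to itself.
    weights = [
        [similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Weighted PageRank by simple repeated updates.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "The cat sat on the mat. The cat ate the fish on the mat. "
        "Stock prices fell sharply today."
    )
    for line in textrank(sample, count=2):
        print(line)
```

None of this needs training data or a GPU, which is why these graph-based methods remain practical on ordinary machines.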

It's not Python, but MEAD (written in Perl) will do text summarization. The output is usually comprehensible, if not always particularly fluent. Also check out summarization.com for a lot of good information on the text summarization task.

A while back, I wrote a summarization library for Python using NLTK, based on an algorithm from the Classifier4J library. It's pretty simple, but it may suit anyone who needs basic summarization: https://github.com/thavelick/summarize
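For readers who just want the gist, the kind of simple frequency-based approach that Classifier4J's summarizer is generally understood to take can be sketched in plain Python: find the most frequent words, keep the first sentence that contains each of them until enough sentences are collected, and output those sentences in their original order. This is an approximation written for illustration, not the code from the linked repository; the function name, the regex tokenizer (used here instead of NLTK) and the small stop-word list are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from any library).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def summarize_by_frequency(text, count=2, top_words=100):
    """Pick the first sentence containing each of the most frequent words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    freq = Counter(w for tokens in tokenized for w in tokens)

    picked = set()
    for word, _ in freq.most_common(top_words):
        if len(picked) >= count:
            break
        # Keep only the first sentence in which this word appears.
        for i, tokens in enumerate(tokenized):
            if word in tokens:
                picked.add(i)
                break
    # Emit the selected sentences in their original order.
    return [sentences[i] for i in sorted(picked)]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Dogs bark loudly. Cats chase mice. Dogs chase cats."
    for line in summarize_by_frequency(sample, count=2):
        print(line)
```

Because it always takes the first sentence containing a frequent word, this approach tends to favour the opening of a document, which is often acceptable for news-style text.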
