I want to write a learning algorithm that can automatically create summaries of articles.

For example, there are some fiction novels (one category, which can be treated as a filter) in PDF format, and I want to automate the process of creating their summaries. We can provide some sample data to implement this with a supervised learning approach. Kindly suggest how I can implement this properly.

I am a beginner. I am taking Andrew Ng's course and am aware of some common algorithms (linear regression, logistic regression, neural networks), plus the Udacity statistics courses, and I am ready to dive deeper into NLP, deep learning, etc., but my motive is to solve this problem. :) Thanks in advance.

Rahul Saxena
    This is a broad and unsolved topic. I do not think it is a good idea to tackle this kind of problem as a beginner. If you really feel that you have to, simply google any recent paper on the topic and try to reimplement its idea (as I said, this is a broad and unsolved issue; there are hundreds of "solutions" which do something, and not a single one which **really** works). – lejlot Jul 01 '16 at 08:36

1 Answer

The keyword is Automatic Summarization.

Generally, there are two approaches to automatic summarization: extraction and abstraction.

  • Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary.
  • Abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might generate.

Abstractive summarization is a lot more difficult. An interesting approach is described in "A Neural Attention Model for Abstractive Sentence Summarization" by Alexander M. Rush, Sumit Chopra, and Jason Weston (source code based on the paper is also available).

A "simple" extractive approach is used in Microsoft Word (the AutoSummarize tool):

AutoSummarize determines key points by analyzing the document and assigning a score to each sentence. Sentences that contain words used frequently in the document are given a higher score. You then choose a percentage of the highest-scoring sentences to display in the summary.

You can select whether to highlight key points in a document, insert an executive summary or abstract at the top of a document, create a new document and put the summary there, or hide everything but the summary.

If you choose to highlight key points or hide everything but the summary, you can switch between displaying only the key points in a document (the rest of the document is hidden) and highlighting them in the document. As you read, you can also change the level of detail at any time.
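The sentence scoring described in the quote above can be sketched in a few lines of Python. This is only a rough illustration of frequency-based extractive summarization, not Word's actual implementation: the function names, the tiny `STOP_WORDS` list, the naive regex sentence splitter, and the choice to average word frequencies over sentence length are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "on", "was", "for"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words, with stop words removed.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, ratio=0.3):
    """Return roughly `ratio` of the sentences, highest-scoring first picked,
    then restored to their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []

    # Count how often each word occurs in the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's words
        # (the averaging is an assumption, so long sentences do not always win).
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:keep]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    sample = ("The dragon guarded the gold. The knight wanted the gold. "
              "Rain fell on the hills. The dragon fought the knight for the gold.")
    for line in summarize(sample, ratio=0.5):
        print(line)
```

For real novels you would first need to extract the text from the PDFs and would probably want a proper sentence tokenizer and stop-word list; the `ratio` parameter plays the role of the percentage of highest-scoring sentences you choose to display.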

In any case, automatic text summarization is an active area of machine learning and data mining with a lot of ongoing research. You should start by reading some good overviews of the field.

manlio