
How can I delete the common words from two documents that were extracted from two websites? I have already extracted the news from two sites, and now I want to delete the words that the two documents have in common. I used the following code to extract news from two different websites:

from __future__ import print_function, unicode_literals
import feedparser
import re

# First feed: BBC News
d = feedparser.parse('http://feeds.bbci.co.uk./news/rss.xml')
i = 0
for post in d.entries:
  titl = post.title
  desc = post.description
  # Replace backslashes and slashes with spaces
  titl2 = titl.replace('\\', " ")
  desc1 = desc.replace('/', " ")
  print(str(i) + ' ' + titl2)
  i = i + 1
print("indian Express")

# Second feed: RSS Micro search results for "Android"
g = feedparser.parse('http://www.rssmicro.com/rss.web?q=Android')
i = 0
for post in g.entries:
  tit = post.title
  #desc=post.description
  tit4 = tit.replace('\\', " ")
  print(str(i) + ' ' + tit4)
  i = i + 1
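
To make the goal concrete, one possible reading of "deleting the common words" is: remove from each document every word that occurs in both. The following is only a minimal sketch under that assumption. The helper names `tokenize` and `remove_common_words` and the two sample strings are hypothetical, and the sample strings stand in for the text collected from the two feeds.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_common_words(doc_a, doc_b):
    """Drop every word that occurs in both documents.

    Returns the two filtered documents and the set of shared words.
    """
    words_a = tokenize(doc_a)
    words_b = tokenize(doc_b)
    common = set(words_a) & set(words_b)  # words present in both documents
    kept_a = [w for w in words_a if w not in common]
    kept_b = [w for w in words_b if w not in common]
    return " ".join(kept_a), " ".join(kept_b), common

# Hypothetical sample text standing in for the joined feed titles.
bbc_text = "New Android phone launched in London"
android_text = "Android update released for the new tablet"

only_bbc, only_android, common = remove_common_words(bbc_text, android_text)
print(only_bbc)
print(only_android)
print(sorted(common))
```

With the feeds above, the two input strings could be built by joining the titles, for example `" ".join(post.title for post in d.entries)`. The comparison here is case-insensitive and ignores punctuation; whether that is the desired behaviour depends on what "common words" is meant to cover.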
SergGr
Anila
  • Although I improved the code formatting, it is still not at all clear what exactly you are trying to achieve. What are the "common words" that you are trying to remove? Could you provide an example with some data? – SergGr Mar 15 '17 at 06:19
  • It looks like you are asking the same question from multiple accounts. Please read and follow [I accidentally created two accounts; how do I merge them?](http://stackoverflow.com/help/merging-accounts) – JosefZ Mar 16 '17 at 00:31

0 Answers