
I extracted data from blogs using boilerpipe's ArticleExtractor, which returns each article as a string. Some pages contain sub-links that lead to the full news content, and I want that data extracted too. How can I access the content behind those sub-links? My code is this:

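import java.net.URL;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLFetcher;

String news = " ";
try
{
    URL url = new URL("http://www.firstpost.com/tag/crime-in-india");
    InputSource is = HTMLFetcher.fetch(url).toInputSource();   // download the page
    BoilerpipeSAXInput in = new BoilerpipeSAXInput(is);
    TextDocument doc = in.getTextDocument();
    news = ArticleExtractor.INSTANCE.getText(doc);             // main article text as a plain string
}
catch (Exception e)
{
    e.printStackTrace();
}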
chopu
  • What part of the data is to be converted to JSON? – StoopidDonut Jan 25 '14 at 17:27
  • @PopoFibo The end result "news" is to be converted to JSON. But before that, I want to extract the articles/news behind the sub-links, which contain the entire main news content. If a news article contains a sub-link, that link should also be extracted to get the complete content. – chopu Jan 25 '14 at 17:49
  • URL url; url = new URL("blogs.timesofindia.indiatimes.com/mellowdrama/entry/…); InputSource is = HTMLFetcher.fetch(url).toInputSource(); BoilerpipeSAXInput in = new BoilerpipeSAXInput(is); TextDocument doc = in.getTextDocument(); news1=ArticleExtractor.INSTANCE.getText(doc); XMLSerializer xmlSerializer = new XMLSerializer(); JSON json = xmlSerializer.read(news1); } catch(Exception e) { e.printStackTrace(); } – This is the code snippet, but it throws an exception on the last line. – chopu Jan 25 '14 at 17:57
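
One way to reach the sub-links is to collect the href values from the fetched page, resolve them against the page URL, and then run each resulting URL through the same HTMLFetcher / ArticleExtractor calls used in the question. The sketch below covers only the link-collection step and uses nothing but the JDK. The class name SubLinkCollector, the method collectLinks, the regex-based matching and the same-host filter are all assumptions made for illustration; a real HTML parser would be more robust than a regular expression.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of boilerpipe.
public class SubLinkCollector {
    // Naive matcher for <a ... href="..."> attributes (assumption: quoted href values).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns absolute, de-duplicated links from html that stay on the host of baseUrl. */
    public static List<String> collectLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();   // keeps first-seen order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Turn relative hrefs such as "/crime/story.html" into absolute URLs
                URI resolved = base.resolve(m.group(1).trim());
                // Assumption: only links on the same host are treated as sub-links
                if (base.getHost() != null
                        && base.getHost().equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/crime/story-1.html\">One</a> "
                + "<a href='http://example.org/other'>Elsewhere</a> "
                + "<a href=\"/crime/story-1.html\">Duplicate</a>";
        System.out.println(collectLinks(html, "http://www.firstpost.com/tag/crime-in-india"));
    }
}
```

Each URL in the returned list could then be passed to new URL(...) and extracted the same way as the tag page, with the resulting strings appended to news.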

2 Answers

import net.sf.json.JSON;
import net.sf.json.xml.XMLSerializer;

// read(String) parses an XML string and returns its JSON representation
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read(news);
java seeker
  • URL url; url = new URL("http://blogs.timesofindia.indiatimes.com/mellowdrama/entry/india-needs-a-law-against-community-crime"); InputSource is = HTMLFetcher.fetch(url).toInputSource(); BoilerpipeSAXInput in = new BoilerpipeSAXInput(is); TextDocument doc = in.getTextDocument(); news1=ArticleExtractor.INSTANCE.getText(doc); XMLSerializer xmlSerializer = new XMLSerializer(); JSON json = xmlSerializer.read(news1); } catch(Exception e) { e.printStackTrace(); } – chopu Jan 25 '14 at 17:26
  • Are you getting an exception? – java seeker Jan 25 '14 at 17:28
  • It shows an error and prints an exception for the last line: Exception in thread "main" java.lang.Error: Unresolved compilation problem: The method read(String) is undefined for the type XMLSerializer at Blog2.main(Blog2.java:44) – chopu Jan 25 '14 at 17:41
  • I have this error too: The method read(String) is undefined for the type XMLSerializer. Did you solve it? – Elena Mar 10 '15 at 09:55

Check the library imports on your build path, especially in Eclipse.

I had this issue in two separate projects, and it turned out that older versions of the net.sf.json library were on the build path in addition to json-lib-2.4-jdk15.jar.
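
A quick way to see which jar a class actually comes from is to print its code source at runtime. This is only a sketch: the class name ClassOrigin and the method describe are made up for illustration, and the demo uses classes that are always available so it runs without json-lib. In the affected project you would pass XMLSerializer.class instead.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of json-lib.
public class ClassOrigin {
    /** Returns the fully qualified class name plus the jar or directory it was loaded from. */
    public static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader (JDK classes) have no code source
            return type.getName() + " <- (no code source, e.g. a JDK class)";
        }
        return type.getName() + " <- " + source.getLocation();
    }

    public static void main(String[] args) {
        // In the project from the question, use describe(XMLSerializer.class) here
        System.out.println(describe(ClassOrigin.class));
        System.out.println(describe(String.class));
    }
}
```

The fully qualified name in the output also shows whether the XMLSerializer being resolved is net.sf.json.xml.XMLSerializer or a same-named class from a different package, and the location shows which jar version supplied it.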