
My main problem is parsing the content of a website in my program. So far I can fetch the page and print its source code. I also need to add 'http://' to the URL if the user left it out. I really don't understand how to parse strings.

import java.net.*;
import java.io.*;
import java.util.Scanner;

public class Project6 {
  public static void main(String[] args) throws Exception {

    Scanner sc = new Scanner(System.in);
    System.out.print("Please enter the URL. ");
    String web = sc.nextLine();
    String foo = "http://allrecipes.com/";

    // TODO: does "web" start with "http://"? If it doesn't, add it.

    // Does "web" contain an allrecipes.com URL? If it doesn't, exit.
    if (web.equals(foo)) {
      StringBuilder s = new StringBuilder();
      URL recipes = new URL(web);
      BufferedReader in = new BufferedReader(new InputStreamReader(recipes.openStream()));

      String inputLine;

      // Print the page source line by line.
      while ((inputLine = in.readLine()) != null)
        System.out.println(inputLine);
      in.close();
    } else {
      System.out.println("I'm sorry, but that is not a valid allrecipes.com URL.");
      System.exit(0);
    }
  }
}
  • Look here, it's a similar question: http://stackoverflow.com/questions/9580684/how-to-retrieve-title-of-a-html-with-the-help-of-htmleditorkit – Balaji Krishnan Oct 18 '13 at 07:48
  • You shouldn't be using `web.equals(foo)`, because you need to handle the cases where the user forgot http:// or entered a subdomain. A better check would be `web.indexOf("allrecipes") != -1`, which makes sure at least the domain is there. – William Gaul Oct 18 '13 at 07:50
  • Use pattern matching, i.e. a regular expression, to print out specific parts of the website's source. – Prateek Oct 18 '13 at 07:59

4 Answers


Parsing HTML on your own is not a good idea. I would suggest using the jsoup library, which really helps with parsing and selecting elements.

Your code could look something like this with jsoup:

Document doc = Jsoup.connect(web).get();
Elements title = doc.select("title");

It is concise and readable, and you can easily parse or select other elements if you need to (e.g. with more complex CSS selectors like #recipes > div #recipe-title).

Juraj Blahunka

You are looking for a web crawler. A few options: JSoup and Selenium (both let you retrieve elements with CSS selectors), and crawler4j (which I haven't used).

Silviu Burcea

Then your if condition should be

if (web.equals(foo) || web.equals(foo.replace("http://", ""))) {


}

The above test passes if web is equal to either

http://allrecipes.com/

or

allrecipes.com/

As a side note, I don't think the trailing / in http://allrecipes.com/ is needed.
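To also cover longer URLs and the subdomain case mentioned in the comments, here is a minimal sketch that adds http:// when it is missing and then checks the host with java.net.URI. The class name UrlCheck and the helpers withScheme and isAllrecipes are hypothetical names made up for illustration, and treating https:// as acceptable is an assumption as well.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

class UrlCheck {
    // Hypothetical helper: prepend "http://" when the input has no scheme.
    // Assumption: an input that already starts with "https://" is left alone.
    static String withScheme(String input) {
        String trimmed = input.trim();
        if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            return trimmed;
        }
        return "http://" + trimmed;
    }

    // Hypothetical helper: true when the host is allrecipes.com or a subdomain of it.
    static boolean isAllrecipes(String input) {
        try {
            String host = new URI(withScheme(input)).getHost();
            if (host == null) {
                return false;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.equals("allrecipes.com") || host.endsWith(".allrecipes.com");
        } catch (URISyntaxException e) {
            // Input that cannot be parsed as a URI is treated as invalid.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(withScheme("allrecipes.com/"));                 // http://allrecipes.com/
        System.out.println(isAllrecipes("www.allrecipes.com/recipes"));    // true
        System.out.println(isAllrecipes("http://example.com/allrecipes")); // false
    }
}
```

Checking the parsed host instead of the whole string means a URL that merely contains the word allrecipes somewhere in its path is rejected.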

Suresh Atta

Compare the input with foo:

Scanner sc = new Scanner(System.in);
System.out.print("Please enter the URL. ");
String web = sc.nextLine(); // e.g. "allrecipes.com"
String foo = "http://allrecipes.com"; // no trailing / needed, unlike http://allrecipes.com/

// Does "web" contain an allrecipes.com URL?
// If it doesn't, then exit.
if (foo.equals(web) || foo.equals("http://" + web)) {
 ..........
}

With the check above, the program proceeds only if the user entered allrecipes.com or http://allrecipes.com.
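One thing to be careful about when comparing user input: String.matches treats its argument as a regular expression, so it is not a drop-in replacement for equals. Here is a small sketch of the difference. The class name MatchDemo and the helpers regexCheck and literalCheck are hypothetical, made up for illustration only.

```java
class MatchDemo {
    // Treats the user input as a regular expression (usually not what you want).
    static boolean regexCheck(String foo, String web) {
        return foo.matches(web) || foo.matches("http://" + web);
    }

    // Compares the user input literally, character by character.
    static boolean literalCheck(String foo, String web) {
        return foo.equals(web) || foo.equals("http://" + web);
    }

    public static void main(String[] args) {
        String foo = "http://allrecipes.com";

        // ".*" is a regex that matches any string, so the regex check passes.
        System.out.println(regexCheck(foo, ".*"));               // true
        System.out.println(literalCheck(foo, ".*"));             // false

        // Ordinary input behaves the same with both checks.
        System.out.println(literalCheck(foo, "allrecipes.com")); // true
    }
}
```

Input such as "(" is not even a valid regular expression and makes matches throw a PatternSyntaxException, which is another reason to prefer equals here.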

Prateek