I want to scan an HTML page and count how many times the period character (".") appears. I already have some code that reads a page's HTML and prints the links it finds.
My first thought was to modify this code, but since the task is straightforward, it may be simpler to write a new program from scratch.
Here is the code that reads the HTML of a web page (most of it is probably unnecessary for this task):
import edu.duke.*;

public class URLFinder {
    // Collects every href value on the page that contains a period.
    public StorageResource findURLs(String url) {
        URLResource page = new URLResource(url);
        String source = page.asString();
        StorageResource store = new StorageResource();
        int start = 0;
        while (true) {
            int index = source.indexOf("href=", start);
            if (index == -1) {
                break;
            }
            int firstQuote = index + 6; // after href="
            int endQuote = source.indexOf("\"", firstQuote);
            if (endQuote == -1) {
                break; // no closing quote, so substring would throw
            }
            String sub = source.substring(firstQuote, endQuote);
            if (sub.contains(".")) {
                store.add(sub);
            }
            start = endQuote + 1;
        }
        return store;
    }

    public void testURL() {
        StorageResource s1 = findURLs("http://www.dukelearntoprogram.com/course2/data/newyorktimes.html");
        //StorageResource s2 = findURLs("http://www.doctorswithoutborders.org");
        for (String link : s1.data()) {
            System.out.println(link);
        }
        System.out.println("size = " + s1.size());
        //System.out.println("size = " + s2.size());
    }
}
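
For reference, here is a rough sketch of what I imagine the new program could look like. It only uses plain Java, so it runs without the edu.duke library. The class name PeriodCounter and the methods countChar and countPeriods are names I made up for illustration; in my setup the string would come from page.asString() rather than a literal.

```java
// Hypothetical helper class; none of these names come from the code above.
class PeriodCounter {
    // Counts how many times target appears in text by checking each character.
    static int countChar(String text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Convenience wrapper for the specific case of counting periods.
    static int countPeriods(String source) {
        return countChar(source, '.');
    }

    public static void main(String[] args) {
        // Assumption: with the edu.duke library this string would instead be
        // new URLResource(url).asString(); a literal keeps the sketch standalone.
        String source = "<a href=\"http://www.example.com/index.html\">Home.</a>";
        System.out.println("periods = " + countPeriods(source));
    }
}
```

This counts every period in the raw HTML, including those inside tags and URLs, not only the ones in the visible text.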