
I want to scan through an HTML page and count the number of times "." (period) appears. Here I have some code that reads the HTML and prints its output.

I was thinking of modifying this code, but since this is a straightforward question, it may be simpler to write a new program from scratch.

Here's the code where I read the HTML of a webpage (much of it is probably unnecessary for this task):

import edu.duke.*;


public class URLFinder {
    public StorageResource findURLs(String url) {
        URLResource page = new URLResource(url);
        String source = page.asString();
        StorageResource store = new StorageResource();
        int start = 0;
        while (true) {
            int index = source.indexOf("href=", start);
            if (index == -1) {
                break;
            }
            int firstQuote = index+6; // after href="
            int endQuote = source.indexOf("\"", firstQuote);
            String sub = source.substring(firstQuote, endQuote);
            if (sub.contains(".")) {
                store.add(sub);
            }
            start = endQuote + 1;
        }
        return store;
    }

    public void testURL() {
        StorageResource s1 = findURLs("http://www.dukelearntoprogram.com/course2/data/newyorktimes.html");
        //StorageResource s2 = findURLs("http://www.doctorswithoutborders.org");
        for (String link : s1.data()) {
            System.out.println(link);
        }
        System.out.println("size = " + s1.size());
        //System.out.println("size = " + s2.size());
    }
}
CodeWizard
A. K.
  • Is the HTML contained within the `source` variable? – npinti Dec 29 '15 at 07:11
  • For the Java folks answering the question: note that content can be added to a web page on the client side using `css`, `javascript`, etc. ;) – T J Dec 29 '15 at 07:16

2 Answers


You could do something like this:

int count = 0;
for (char c : source.toCharArray()) {
    if (c == '.') {
        count++;
    }
}

Alternatively, use the Apache Commons Lang library and its StringUtils class: StringUtils.countMatches(String string, String subStringToCount). You would then just call StringUtils.countMatches(source, "."); to get the count of periods.

If you were putting this into your current program, you'd want to edit your findURLs function, inserting the counting right after String source = page.asString();.
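
If adding a dependency is not an option, the same count can be obtained with the standard library alone. The following is a minimal sketch, assuming Java 8 or later; the class name PeriodCounter and the method countChar are hypothetical names chosen for illustration, not part of the original program.

```java
// Hypothetical helper class for illustration; not part of the original program.
final class PeriodCounter {
    private PeriodCounter() {}

    // Counts how many times `target` occurs in `text` by streaming over
    // the string's chars and keeping only the ones equal to `target`.
    static long countChar(String text, char target) {
        return text.chars().filter(c -> c == target).count();
    }
}
```

With the question's code, this would be called as PeriodCounter.countChar(source, '.') once source holds the page contents.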

Or if you just wanted this in its own function:

public int countPeriods(String url) {
    URLResource page = new URLResource(url);
    String source = page.asString();
    int count = 0;
    for (char c : source.toCharArray()) {
        if (c == '.') {
            count++;
        }
    }
    return count;
}

Now all you need to do is pass a URL as a string to the function, and it returns the count.

MasterOdin

One way to do it would be to use the indexOf method:

int index = -1;
int count = 0;
String source = ...;
// Search from just past the previous match until indexOf finds nothing more.
while ((index = source.indexOf(".", ++index)) != -1) {
    count++;
}
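
The same indexOf idea can be wrapped in a reusable method. This is only a sketch under stated assumptions: the class name SubstringCounter and the method countOccurrences are hypothetical, and matches are counted without overlap.

```java
// Hypothetical helper class for illustration; not part of the original program.
final class SubstringCounter {
    private SubstringCounter() {}

    // Counts non-overlapping occurrences of `needle` in `text` by repeatedly
    // calling indexOf, resuming the search just past the previous match.
    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty needle would otherwise match at every position
        }
        int count = 0;
        int index = text.indexOf(needle);
        while (index != -1) {
            count++;
            index = text.indexOf(needle, index + needle.length());
        }
        return count;
    }
}
```

For the period count in the question, the call would be SubstringCounter.countOccurrences(source, ".").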

As pointed out by @TJCrowder, it might be the case that you need to let some script execute. If that is the case, please refer to this previous SO question.

Community
npinti