
I'm working through some exercises on Streams and ran into the following problem:

I have a text file, and I want to compute the average number of words per line. Can someone tell me if my way of thinking is correct? Here's some pseudocode I think should do the trick once implemented:

double wordCount(String filepath) {
  return Files.lines(Paths.get(filepath))
                      // turn each line into an array of words
                      // average the sizes of those word arrays, with something like
                      // .collect(Collectors.averagingDouble(...));
}

Can someone please help me with that?

How can I convert a line to a String array of words?

How do I get the size of that array?

shmosel
helloWorld
  • Actually you mention several tasks. How to convert a line to an array of Strings. How to get the size of this array. Solve these sub tasks and bring your work to this question. –  May 13 '18 at 18:38
  • *"How do I get the size of that array?"* Really? `arr.length`!!! – Andreas May 13 '18 at 18:54
  • @Andreas Yes, I know that's `arr.length`... Somehow I overcomplicated it and didn't think it would be that easy with streams... – helloWorld May 14 '18 at 06:16

3 Answers


The two missing steps are:

  • How can I convert a line to a String array of words: split on spaces >> s.split(" ");

  • How do I get the size of that array: get its length >> arr.length
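One caveat with `split(" ")` is worth illustrating: a blank line still yields a one-element array, and consecutive spaces produce empty strings that are counted as words. The sketch below compares that behaviour with a hypothetical helper, `countWords` (a name introduced here, not part of the original answer), which trims the line and splits on runs of whitespace instead.

```java
class SplitDemo {
    // Hypothetical helper: counts whitespace-separated words in one line.
    static int countWords(String line) {
        String trimmed = line.trim();
        // "".split(...) returns [""], so blank lines are handled explicitly
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Plain split(" ") counts empty strings as words
        System.out.println("".split(" ").length);     // 1
        System.out.println("a  b".split(" ").length); // 3
        // The helper ignores blank lines and repeated spaces
        System.out.println(countWords(""));           // 0
        System.out.println(countWords("  a  b "));    // 2
    }
}
```

Whether blank lines should count as zero words or be skipped entirely depends on how "average per line" is meant; the helper assumes they count as zero.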


  1. Using the specialized operation on IntStream

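    double wordCount(String filepath) {
        // try-with-resources closes the file once the stream is consumed
        try (Stream<String> lines = Files.lines(Paths.get(filepath))) {
            return lines
                    .map(s -> s.split(" "))
                    .mapToInt(arr -> arr.length)
                    .average()
                    .orElse(-1.0); // empty file: no lines to average
        } catch (IOException e) {
            e.printStackTrace();
        }
        return -1.0; // the file could not be read
    }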
    
  2. Using the less specific Collectors operation (prefer option 1 over this one)

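    double wordCount2(String filepath) {
        // try-with-resources closes the file once the stream is consumed
        try (Stream<String> lines = Files.lines(Paths.get(filepath))) {
            return lines
                    .map(s -> s.split(" "))
                    .map(arr -> arr.length)
                    // Integer::doubleValue avoids the deprecated Double constructor
                    .collect(Collectors.averagingDouble(Integer::doubleValue));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return -1.0; // the file could not be read
    }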
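If a collector is wanted anyway, `Collectors.averagingInt` can read the array length directly, which avoids the intermediate boxing step. The sketch below runs both variants on an in-memory list rather than a file; the class name `AverageDemo` and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class AverageDemo {
    // Variant 1: primitive IntStream; empty input is signalled by -1.0
    static double viaIntStream(List<String> lines) {
        return lines.stream()
                .mapToInt(s -> s.split(" ").length)
                .average()
                .orElse(-1.0);
    }

    // Variant 2: collector; note that it yields 0.0 for empty input
    static double viaCollector(List<String> lines) {
        return lines.stream()
                .collect(Collectors.averagingInt(s -> s.split(" ").length));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("one two three", "four five", "six");
        System.out.println(viaIntStream(lines)); // 2.0
        System.out.println(viaCollector(lines)); // 2.0
    }
}
```

One practical difference: `average()` returns an `OptionalDouble`, so the caller chooses what an empty input means, while the collector silently returns 0.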
    
azro
  • Nice, although I'd avoid using the second approach for this specific task. See https://stackoverflow.com/questions/49965753/when-to-use-specialised-stream-methods-over-methods-in-the-collectors-class-or-v – Ousmane D. May 13 '18 at 20:10
  • @azro Thank you very much! – helloWorld May 14 '18 at 06:18
  • @Aominè I’d avoid using `String.split` at all when all I want is the number of words/matches… – Holger May 14 '18 at 09:26

You are doing unnecessary work by splitting a string into a bunch of strings, one for each word, populating an array with them, just to ask for the array’s size afterwards.

If you want to get the number of words, consider a straightforward method that only counts the words:

private static final Pattern WORD = Pattern.compile("\\w+");
public static int wordCount(String s) {
    int count = 0;
    for(Matcher m = WORD.matcher(s); m.find(); ) count++;
    return count;
}

You may then use this method inside a Stream operation, to get the average word count:

Files.lines(Paths.get(filePath)).mapToInt(YourClass::wordCount).average().orElse(0)
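Put together, the pieces above might look like the following self-contained sketch. The class name `AverageWordsPerLine`, the wrapper method `averageWords`, and the temporary sample file are assumptions made for the demonstration; the try-with-resources block is added so that the file opened by `Files.lines` gets closed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class AverageWordsPerLine {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Counts regex matches without building an array of words
    static int wordCount(String s) {
        int count = 0;
        for (Matcher m = WORD.matcher(s); m.find(); ) count++;
        return count;
    }

    // Hypothetical wrapper: average words per line, 0 for an empty file
    static double averageWords(Path file) throws IOException {
        // try-with-resources closes the file when the stream is done
        try (Stream<String> lines = Files.lines(file)) {
            return lines.mapToInt(AverageWordsPerLine::wordCount)
                        .average()
                        .orElse(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data written to a temporary file for the demonstration
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("one two three", "", "four"), StandardCharsets.UTF_8);
        // Three lines with 3, 0 and 1 words
        System.out.println(averageWords(tmp));
        Files.delete(tmp);
    }
}
```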

With Java 9, you could rewrite the wordCount method to use a Stream, like this:

private static final Pattern WORD = Pattern.compile("\\w+");
public static int wordCount(String s) {
    return (int)WORD.matcher(s).results().count();
}

But the loop is likely more efficient, as it avoids constructing MatchResult instances.
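As a quick sanity check, the two variants can be run side by side. The sketch below assumes Java 9 or later for `Matcher.results()`; the class name `CountCompare` and the sample strings are invented for illustration. Note that `\w` by default matches only ASCII letters, digits and underscore, so an apostrophe splits a word in two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountCompare {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Loop variant: advances the matcher and counts hits
    static int loopCount(String s) {
        int count = 0;
        for (Matcher m = WORD.matcher(s); m.find(); ) count++;
        return count;
    }

    // Stream variant: requires Java 9+ for Matcher.results()
    static int streamCount(String s) {
        return (int) WORD.matcher(s).results().count();
    }

    public static void main(String[] args) {
        // "don't stop" yields 3 matches: "don", "t", "stop"
        String[] samples = { "", "   ", "hello world", "don't stop" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + loopCount(s) + " / " + streamCount(s));
        }
    }
}
```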

Holger
private static void wordcount(String filePath) throws IOException {
    Path path = Paths.get(filePath);
    // try-with-resources closes the underlying file handle
    try (Stream<String> lines = Files.lines(path)) {
        double result = lines.mapToInt(s -> {
            String trimmed = s.trim();
            // a blank line contains no words
            return trimmed.isEmpty() ? 0 : trimmed.split(" ").length;
        }).average().orElse(0); // 0 for an empty file instead of an exception

        System.out.println(result);
    }
}
Liju John