1

Please have a look at the following code

//Devide the has into set of 3 pieces
    private void devideHash(String str)
    {
        int lastIndex = 0;

        for(int i=0;i<=str.length();i=i+3)
            {
                lastIndex = i;
                try
                {
                    String stringPiece = str.substring(i, i+3);
                  //  pw.println(stringPiece);
                    hashSet.add(stringPiece);
                }
                catch(Exception arr)
                {
                    String stringPiece = str.substring(lastIndex, str.length());
                  //  pw.println(stringPiece);
                    hashSet.add(stringPiece);
                }
            }
    }

The above method receives String like abcdefgjijklmnop as the parameter. Inside the method, its job is to divide this sets of 3 letters. So when the operation is completed, the hashset will have pieces like abc def ghi jkl mno p

But the problem is that if the input String is big, then this loop takes noticeable amount of time to complete. Is there any way I can use to speed this process?

halfer
  • 19,824
  • 17
  • 99
  • 186
PeakGen
  • 21,894
  • 86
  • 261
  • 463
  • 1
    How long is "noticeable amount of time" and how long is the string? I would personally not use exception handling to catch "whoops we messed up the last part" but other than that it looks okay... – Jon Skeet Apr 05 '14 at 08:35
  • 1
    You don't really need `lastIndex`. – Bex Apr 05 '14 at 08:38
  • Strings can be very slow. Do you really need to split it or can you just process the string without having to copy portions of it out and insert into a hashset? – Jesus Ramos Apr 05 '14 at 08:40
  • 1
    I agree with @JonSkeet. What do you mean by "noticeable amount of time" since that code should have a runtime behavior of `O(n)`, meaning it takes longer if the string increases. This though depend on the implementation of `String::substring`, as it is discussed in this [question](http://stackoverflow.com/questions/16123446/java-7-string-substring-complexity). –  Apr 05 '14 at 08:40
  • Try and split it up to the available number of cores on your system and parallelize it – spinlok Apr 05 '14 at 08:40
  • In my opinion, if you want this to be faster... you should not do this. Maybe you could process your `String` while you consume it, without creating many new `String`s and putting them into a `Map`. If you really need to have all those substrings put into a `Map` (for ordering or unicity purpose for example), then I guess that there is nothing else to do that would be significant. –  Apr 05 '14 at 08:43
  • As b52 writes, there is a significant change in this area in java 7. I would expect this to run a _lot_ faster in java 6 than in java 7. Maybe that's worth a shot? – Bex Apr 05 '14 at 08:45
  • Actually, the more I think about this, the more I wonder why you would like to do it in the first place. Maybe if you would share a little bit more about the problem, we can find a better solution? – Bex Apr 05 '14 at 08:49
  • @JonSkeet: "how long the string" really depends. It could be 500-600 words without spaces. – PeakGen Apr 05 '14 at 09:18
  • @b52: Yes, if a long string, then more time. – PeakGen Apr 05 '14 at 09:18
  • @GloryOfSuccess: 500-600 words should be done in the blink of an eye. Again - how long is this taking? Can you show a short but complete program demonstrating the problem, with appropriate timings? – Jon Skeet Apr 05 '14 at 09:31
  • You first string chunk is empty. – Hannes Apr 05 '14 at 10:15

2 Answers2

4

As an option, you could replace all your code with this line:

private void divideHash(String str) {
    hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
}

Which will perform well.


Here's some test code:

String str = "abcdefghijklmnop";
hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
System.out.println(hashSet);

Output:

[jkl, abc, ghi, def, mno, p]
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • I'm not sure about that actually. It seems that, for some obscure reason, it splits the string into 1 group of 3 chars, then groups of 4 chars. Could you check [that link](http://regex101.com/r/cR3sK0) and "fix it" or explain what's wrong? –  Apr 05 '14 at 09:04
  • @user3419378 I just tested it and it works perfectly. See edited answer for test code and output. The reason you think it "doesn't work" is that you are using that regex site to find *matches*, but my code uses it to *split*, which is a different thing. The regex site is *consuming* the character after the match, but `split()` does *not* consume that character. – Bohemian Apr 05 '14 at 09:06
  • Thanks for the reply. "Perform well" means improve the speed? – PeakGen Apr 05 '14 at 09:16
  • @glory try it. It should perform very well actually. – Bohemian Apr 05 '14 at 09:20
  • I can't imagine the regex being faster than a simple loop, but I love this one-liner. – maaartinus Apr 05 '14 at 09:37
  • oops! No. This is 3 times slower. Anyway I found out the reason for that loop seems to be slow is it is getting called 170,000 times. – PeakGen Apr 05 '14 at 10:13
  • @GloryOfSuccess I don't know how you're testing speed, but I ran it 1M times, and put in a warm up, and timing just that line of code: Average time was 0.000011 seconds. That's fast in anyone's language. – Bohemian Apr 05 '14 at 14:09
  • @maaartinus I timed it: Average time was 0.000011 seconds. That's "fast enough" :) – Bohemian Apr 05 '14 at 14:11
  • I wasn't claiming it's slow. It's just that it's most probably much slower than the manual approach. Still probably more than good enough. – maaartinus Apr 05 '14 at 14:20
  • Using regex is going to be slower than doing it manually, given the extra overhead for Regex parsing etc, and benefit of sharing char array for substring. However, I just cannot imagine how *Slow* OP is facing. It doesn't seems to me that the code can be slow even the string is with length in thousands magnitude. – Adrian Shum Apr 07 '14 at 04:44
2

There is nothing we can really tell unless you tell us what the "noticeable large amount" is, and what is the expected time. It is recommended that you start a profiler to find what logic takes most time.

Some recommendations I can give from briefly reading your code is:

  1. If the result Set is going to be huge, it will involve lots of resize and rehashing when your HashSet resize. It is recommended you first allocate required size. e.g.

    HashSet hashSet = new HashSet<String>(input.size() / 3 + 1, 1.0);

    This will save you lots of time for unnecessary rehashing

  2. Never use exception to control your program flow.

Why not simply do:

int i = 0;
for (int i = 0; i < input.size(); i += 3) {
    if (i + 3 > input.size()) {
        // substring from i to end
    } else {
        // subtring from i to i+3
    }
}
Adrian Shum
  • 38,812
  • 10
  • 83
  • 131