
I have this sequence "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga" and I want to break it into units of three characters, like ggt acc tcc, etc. How can I do that?

ROMANIA_engineer
Raed Tabani

4 Answers


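Try splitting on a lookbehind that uses \G (the end of the previous match):

String[] str = s.split("(?<=\\G...)"); // s holds the input sequence
System.out.println(java.util.Arrays.toString(str));
Output: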
[ggt, acc, tcc, tac, ggg, agg, cag, cag, tga, gga, att, ttc, cgc, aat, ggg, cga, aag, cct, gac, gga]
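The same \G idiom can take the group size as a parameter by building the pattern as "(?<=\\G.{n})". The sketch below assumes a hypothetical helper named splitEvery; it is not part of any library. The quantifier .{n} has a fixed length, so Java accepts it inside the lookbehind.

```java
import java.util.Arrays;

public class RegexChunks {
    // Hypothetical helper: splits s into pieces of n characters using the
    // \G lookbehind idiom; the last piece is shorter if n does not divide the length.
    static String[] splitEvery(String s, int n) {
        return s.split("(?<=\\G.{" + n + "})");
    }

    public static void main(String[] args) {
        String s = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
        System.out.println(Arrays.toString(splitEvery(s, 3)));
        // Length not a multiple of 3: the trailing group is kept as "gh".
        System.out.println(Arrays.toString(splitEvery("abcdefgh", 3)));
    }
}
```

Note that . does not match line terminators by default, so input containing newlines should have them removed before splitting.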
SMA

Don't use a StringTokenizer. Splitting with a regular expression is comparatively inefficient, and DNA/RNA strings can be very long.

In Java 8 you could do the following:

import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.Collectors;

public static void main(String[] args) {
    String str = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
    List<String> collect = str.chars()      // sequential IntStream of chars
        .mapToObj(accumulator(3))           // a String every 3rd char, null otherwise
        .filter(s -> s != null)             // drop the nulls for incomplete groups
        .collect(Collectors.toList());
    System.out.println(collect);
}

private static IntFunction<String> accumulator(final int size) {
    return new CharAccumulator(size);
}

// Stateful function: buffers characters until `size` of them have been seen.
private static final class CharAccumulator implements IntFunction<String> {
    private final StringBuilder builder;
    private final int size;

    private CharAccumulator(int size) {
        this.builder = new StringBuilder();
        this.size = size;
    }

    @Override
    public String apply(int value) {
        builder.append((char) value);
        if (builder.length() == size) {
            String result = builder.toString();
            builder.setLength(0); // start the next group
            return result;
        } else {
            return null; // group not complete yet
        }
    }
}

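It is harder to understand and may not be faster, but it also works with lazily produced char streams, which saves memory. Because the function keeps state between calls, it is only correct on a sequential stream, and a trailing group shorter than three characters is never emitted.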
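To get the memory benefit for input that never exists as one String (for example a long sequence file), the same buffering idea can read directly from a java.io.Reader. This is a minimal sketch; the helper name forEachChunk is hypothetical, and unlike CharAccumulator it also emits a final, shorter group.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class ReaderChunks {
    // Hypothetical helper: reads characters from any Reader and passes each
    // complete group of `size` characters to `sink`, so only one group is
    // held in memory at a time.
    static void forEachChunk(Reader in, int size, Consumer<String> sink) {
        char[] buf = new char[size];
        int filled = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                buf[filled++] = (char) c;
                if (filled == size) {
                    sink.accept(new String(buf));
                    filled = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit a trailing group shorter than `size`, if any.
        if (filled > 0) {
            sink.accept(new String(buf, 0, filled));
        }
    }

    public static void main(String[] args) {
        String str = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
        forEachChunk(new StringReader(str), 3, System.out::println);
    }
}
```

Reading one char per read() call is only reasonable on a buffered source, so a file reader would normally be wrapped in a BufferedReader before being passed in.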

CoronA

You could convert the String to a char[] and loop over it in steps of three, building one String per group:

String str = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
char[] array = str.toCharArray();
List<String> result = new ArrayList<>();
// Step through the array three characters at a time
for (int i = 0; i < array.length; i += 3)
{
    StringBuilder s = new StringBuilder();
    // j < array.length guards the last, possibly shorter, group
    for (int j = i; j < array.length && j < i + 3; j++)
    {
        s.append(array[j]);
    }
    result.add(s.toString());
}

The list result now contains the three-character strings. It does not break if the length is not a multiple of three; the last entry is simply shorter.
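The inner StringBuilder loop can be replaced by the String(char[] value, int offset, int count) constructor, which copies a whole group at once. A minimal sketch, with chunk as a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;

public class CharArrayChunks {
    // Hypothetical helper: one String per group of `size` chars, built with
    // new String(char[], offset, count) instead of appending char by char.
    static List<String> chunk(char[] array, int size) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < array.length; i += size) {
            // The last group may be shorter when size does not divide the length.
            int count = Math.min(size, array.length - i);
            result.add(new String(array, i, count));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
        System.out.println(chunk(str.toCharArray(), 3));
        System.out.println(chunk("abcdefgh".toCharArray(), 3));
    }
}
```

The second call should print [abc, def, gh], matching the behavior of the loop above for lengths that are not a multiple of three.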

Gregory Basior

Here is another solution that uses the substring method (without StringTokenizer):

public static void main(String[] args) {
    String s = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
    // One row per complete group of three; a trailing partial group is ignored
    char[][] c = new char[s.length() / 3][];
    // i + 3 <= s.length() keeps substring(i, i + 3) in range even when
    // the length is not a multiple of three
    for (int i = 0; i + 3 <= s.length(); i += 3) {
        c[i / 3] = s.substring(i, i + 3).toCharArray();
    }
    // test: print each group on its own line
    for (char[] group : c) {
        System.out.println(new String(group));
    }
}
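The index arithmetic in this answer (group k starts at 3 * k) can also be written as a stream over group indices. This is a sketch with a hypothetical helper named chunk; it uses ceiling division so a shorter final group is kept rather than dropped. Since each element depends only on its index, the mapping holds no shared state, unlike the CharAccumulator approach.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamChunks {
    // Hypothetical helper: group k covers [k * size, min((k + 1) * size, length)).
    static List<String> chunk(String s, int size) {
        int groups = (s.length() + size - 1) / size; // ceiling division
        return IntStream.range(0, groups)
                .mapToObj(k -> s.substring(k * size, Math.min((k + 1) * size, s.length())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "ggtacctcctacgggaggcagcagtgaggaattttccgcaatgggcgaaagcctgacgga";
        System.out.println(chunk(s, 3));
        System.out.println(chunk("abcdefgh", 3));
    }
}
```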
ROMANIA_engineer