Getting all positions of an occurring String using StringBuilder.indexOf()

Question

Java beginner over here. I'm currently working on a program that searches part of the human DNA. Specifically, I want to find all occurrences of a String within a StringBuilder, using StringBuilder.indexOf(). But I need all occurrences, not just the first.

Code:

public void search(String motive){
    int count = 0;
    gene.indexOf(motive);   // gene is the StringBuilder
    count++;
}

I need all occurrences of motive in the gene StringBuilder, plus a counter of how often motive appears in gene. Any help? indexOf() only returns the first occurrence.

Well, yes, there is `indexOf(String str, int fromIndex)`, but that isn't very helpful either, since I don't yet know where `str` appears (obviously). — Smunfr, Jan 06 '17 at 16:16
[Getting unix timestamp from Date()](//stackoverflow.com/q/7784421) — Chetan, Jan 06 '17 at 16:16
Then you start at 0. After that, you start at the position you found plus an offset (which might be the length of the string), and so on. — Sotirios Delimanolis, Jan 06 '17 at 16:19
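The loop suggested in the comment above can be sketched as follows, assuming the sequence is already held in a StringBuilder as in the question. The class `MotifSearch`, the method `findAll` and its `overlapping` flag are hypothetical names chosen for illustration. After each hit, the search restarts either at the hit plus the motif length (non-overlapping, the offset the comment mentions) or at the hit plus one (overlapping); for a DNA motif such as `AA` inside `AAAA` the two choices give different counts. The count asked for in the question is simply the size of the returned list.

```java
import java.util.ArrayList;
import java.util.List;

public class MotifSearch {

    // Hypothetical helper: returns every start index of motif in gene.
    // overlapping == false resumes the search after the end of each hit
    // (offset = motif length); overlapping == true resumes one character later.
    public static List<Integer> findAll(StringBuilder gene, String motif, boolean overlapping) {
        List<Integer> positions = new ArrayList<>();
        if (motif.isEmpty()) {
            return positions; // an empty motif would "match" at every index
        }
        int step = overlapping ? 1 : motif.length();
        int from = gene.indexOf(motif, 0);
        while (from != -1) {              // -1 means no further occurrence
            positions.add(from);
            from = gene.indexOf(motif, from + step);
        }
        return positions;
    }

    public static void main(String[] args) {
        StringBuilder gene = new StringBuilder("AAAAGATTACAAA");

        List<Integer> hits = findAll(gene, "AA", false);
        System.out.println(hits + " count=" + hits.size());        // prints [0, 2, 10] count=3

        List<Integer> allHits = findAll(gene, "AA", true);
        System.out.println(allHits + " count=" + allHits.size());  // prints [0, 1, 2, 10, 11] count=5
    }
}
```

Searching the existing StringBuilder directly avoids copying the sequence and satisfies a "use StringBuilder.indexOf()" requirement.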

Frelling · Accepted Answer · 2017-01-07T17:27:09.983

I take it that you are looking for the indices of a specific nucleotide sequence within a gene sequence or sub-sequence. The following example class demonstrates a generic approach that uses Java's regular expression library to find them:

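package jcc.tj.dnamatch;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Gene {
   // Empty by default so find() never runs against a null sequence.
   private String gene = "";

   public Gene() {}

   public Gene( String gene ) {
      this.gene = gene;
   }

   /**
    * Returns the start index of every match of seq in the gene.
    * seq is compiled as a regular expression, so plain strings such as
    * "ATG" work as-is and character classes such as "GA[AT]TC" are allowed.
    * Matches are reported left to right and do not overlap.
    */
   public List<Integer> find( String seq ) {
      List<Integer> indices = new ArrayList<>();

      Pattern pat = Pattern.compile( seq );
      Matcher m = pat.matcher( gene );

      // Each call to find() resumes the search after the previous match.
      while ( m.find() )
         indices.add( m.start() );

      return indices;
   }

   public String getGene() {
      return gene;
   }

   public void setGene( String gene ) {
      this.gene = gene;
   }
}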

The above example uses a Matcher to find patterns. There are other String-based algorithms that may be more efficient, but as a starting point, the Matcher offers a generic solution to any kind of text pattern search.
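Two details of the Matcher approach are worth knowing: `Pattern.compile(seq)` treats `seq` as a regular expression, and `find()` reports only non-overlapping matches. A minimal sketch, assuming a literal motif and that overlapping hits are wanted (the names `RegexMotifSearch` and `findOverlapping` are hypothetical), wraps the motif in `Pattern.quote()` and a zero-width lookahead so a match can be reported at every starting position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMotifSearch {

    // Hypothetical helper: start indices of every (possibly overlapping)
    // occurrence of a literal motif. Pattern.quote() disables regex
    // metacharacters, and the zero-width lookahead (?=...) consumes no
    // characters, so the matcher advances one position after each hit.
    public static List<Integer> findOverlapping(String gene, String motif) {
        List<Integer> positions = new ArrayList<>();
        Pattern pattern = Pattern.compile("(?=" + Pattern.quote(motif) + ")");
        Matcher m = pattern.matcher(gene);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Plain find() skips past each match, so hits cannot overlap...
        Matcher plain = Pattern.compile("AA").matcher("AAAA");
        List<Integer> plainHits = new ArrayList<>();
        while (plain.find()) {
            plainHits.add(plain.start());
        }
        System.out.println(plainHits);                        // prints [0, 2]

        // ...whereas the lookahead version reports every start position.
        System.out.println(findOverlapping("AAAA", "AA"));    // prints [0, 1, 2]
    }
}
```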

Encoding nucleotides as characters (ATCG) is very flexible and convenient, since it allows String-based tools to analyze and characterize sequences and/or sub-sequences. Unfortunately, this does not scale well to very large sequences. In such cases, it would be better to consider more specific bioinformatics techniques for representing and managing sequences.
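As one illustration of a more compact representation (not necessarily one of the techniques in the reference that follows), each nucleotide can be packed into 2 bits, so a single `long` holds 32 bases instead of the 8 or more bits per base a String uses. This is a minimal sketch assuming the sequence contains only uppercase `A`, `C`, `G` and `T`; the class name `TwoBitSequence` is hypothetical, and ambiguity codes such as `N` would need a different scheme.

```java
public class TwoBitSequence {
    private final long[] words; // 32 bases per 64-bit word, 2 bits each
    private final int length;

    public TwoBitSequence(String bases) {
        length = bases.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            // Base i goes into word i / 32 at bit offset 2 * (i % 32).
            words[i / 32] |= encode(bases.charAt(i)) << (2 * (i % 32));
        }
    }

    // Assumed mapping: A=0, C=1, G=2, T=3.
    private static long encode(char base) {
        switch (base) {
            case 'A': return 0L;
            case 'C': return 1L;
            case 'G': return 2L;
            case 'T': return 3L;
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    public char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
        int code = (int) ((words[i / 32] >>> (2 * (i % 32))) & 3L);
        return "ACGT".charAt(code);
    }

    public int length() {
        return length;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TwoBitSequence seq = new TwoBitSequence("GATTACA");
        System.out.println(seq.length() + " bases: " + seq); // prints 7 bases: GATTACA
    }
}
```

The trade-off is that String.indexOf() no longer applies; searching a packed sequence needs its own routines.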

A good reference on some of these techniques is Chapter 2, "Algorithms and Data Structures in Next-Generation Sequencing", of the book Next Generation Sequencing Technologies and Challenges in Sequence Assembly. A more detailed PDF preview is available from this Google link, though I can't guarantee it will keep working.

You may also want to look at BioJava. While I wouldn't want to steer you away from Java, Perl is another good option for sequence analysis; see Beginning Perl for Bioinformatics, Perl and Bioinformatics, or BioPerl.

I realize that this answer may be TMI, but if it helps you or others find more appropriate solutions, it has served its purpose.

Edit:

Based on the comment below, this appears to be a homework question, given the requirement that the search be accomplished by StringBuilder.indexOf(). The following method would accomplish the search accordingly.

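public List<Integer> findBySb( String seq ) {
    List<Integer> indices = new ArrayList<>();
    if ( seq.isEmpty() )
        return indices;   // an empty seq would never advance strIdx

    StringBuilder sb = new StringBuilder( gene );
    int strIdx = 0;

    while ( strIdx < sb.length() ) {
        // Search from strIdx onward; -1 means no further occurrences.
        int idx = sb.indexOf( seq, strIdx );
        if ( idx == -1 )
            break;
        indices.add( idx );
        // Resume just past this match (non-overlapping search).
        strIdx = idx + seq.length();
    }

    return indices;   // indices.size() is the number of occurrences
}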

The same indexOf() approach can be used with the string directly.

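public List<Integer> findByString( String seq ) {
    List<Integer> indices = new ArrayList<>();
    if ( seq.isEmpty() )
        return indices;   // an empty seq would never advance strIdx

    int strIdx = 0;

    while ( strIdx < gene.length() ) {
        // Search from strIdx onward; -1 means no further occurrences.
        int idx = gene.indexOf( seq, strIdx );
        if ( idx == -1 )
            break;
        indices.add( idx );
        // Resume just past this match (non-overlapping search).
        strIdx = idx + seq.length();
    }

    return indices;   // indices.size() is the number of occurrences
}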

Both StringBuilder.indexOf() and String.indexOf() delegate to the same static search routine inside String, so functionally there is no difference. However, instantiating a StringBuilder just for searching is overkill and a little wasteful, since its constructor copies the whole string into a new internal buffer. I could go on :), but it doesn't add to the answer.
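A quick way to check the "no functional difference" claim yourself is to compare both indexOf() methods at every starting offset. This is only an empirical spot check on a hypothetical sample sequence, not a proof; the class `IndexOfEquivalence` and the method `sameResults` are illustrative names.

```java
public class IndexOfEquivalence {

    // Returns true if String.indexOf and StringBuilder.indexOf agree
    // for every fromIndex from 0 up to and including gene.length().
    public static boolean sameResults(String gene, String motif) {
        StringBuilder sb = new StringBuilder(gene);
        for (int from = 0; from <= gene.length(); from++) {
            if (gene.indexOf(motif, from) != sb.indexOf(motif, from)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String gene = "ATGCGATACGCTTGAATGCA";
        for (String motif : new String[] {"ATG", "GC", "TGA", "CCC"}) {
            System.out.println(motif + ": " + sameResults(gene, motif)); // each line prints true
        }
    }
}
```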

Thank you very much, that'll help a lot for now to move on with my project (since I need that method in further tasks). The only problem is that my task was to specifically use the indexOf() method of StringBuilder. If I can't figure it out, your solution will serve the purpose anyway. — Smunfr, Jan 07 '17 at 10:18
@Smunfr See the additional method added to the solution above for a StringBuilder-based search. — Frelling, Jan 07 '17 at 17:06
