I take it that you are looking for the indices of a specific nucleotide sequence within a gene sequence or sub-sequence. The following example class demonstrates a generic approach that uses Java's regular expression library to find them:
package jcc.tj.dnamatch;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Gene {
    private String gene;
    public Gene() {}
    public Gene( String gene ) {
        this.gene = gene;
    }
    // Returns the start index of every non-overlapping match of seq.
    // Note that seq is interpreted as a regular expression.
    public List<Integer> find( String seq ) {
        List<Integer> indices = new ArrayList<Integer>();
        Pattern pat = Pattern.compile( seq );
        Matcher m = pat.matcher( gene );
        while ( m.find() )
            indices.add( m.start() );
        return indices;
    }
    public String getGene() {
        return gene;
    }
    public void setGene( String gene ) {
        this.gene = gene;
    }
}
The above example uses a Matcher to find patterns. There are other String-based algorithms that may be more efficient, but as a starting point, the Matcher offers a generic solution for any type of text pattern search.
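To illustrate what "any type of text pattern" buys you, here is a minimal standalone sketch of the same Matcher loop. The class name RegexMotifDemo and the sequence are made up for the demonstration; only the Pattern/Matcher calls come from the class above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it mirrors the logic of Gene.find() above.
class RegexMotifDemo {
    // Start indices of every non-overlapping match of regex in gene.
    static List<Integer> find(String gene, String regex) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(gene);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        String gene = "ATGCGTAAATGTAGC"; // made-up sequence, not real data
        System.out.println(find(gene, "ATG"));    // literal motif, prints [0, 8]
        System.out.println(find(gene, "TA[AG]")); // TAA or TAG, prints [5, 11]
    }
}
```

Because the argument is compiled as a regular expression, a single call can search for a family of motifs (here TAA or TAG), which a plain indexOf() cannot do.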
Encoding nucleotides as characters (ATCG) is very flexible and convenient, allowing the use of String-based tools to analyze and characterize sequences and/or sub-sequences. Unfortunately, String-based representations do not scale well to very long sequences. In such cases, it would be better to consider more specific bioinformatics techniques for representing and managing sequences.
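As one illustration of a more compact representation, the sketch below packs each base into 2 bits, so 32 bases fit in one long. This is only an assumption about what "more specific" might look like, not a technique taken from the references in this answer; the class PackedSequence and its methods are hypothetical names, and it handles only the four unambiguous bases.

```java
// Hypothetical sketch: 2-bit packing of A, C, G, T into a long[].
class PackedSequence {
    private final long[] words; // 32 bases per 64-bit word
    private final int length;

    PackedSequence(String bases) {
        length = bases.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            long code = encode(bases.charAt(i));
            words[i / 32] |= code << ((i % 32) * 2);
        }
    }

    // Maps a base to its 2-bit code; anything else is rejected.
    private static long encode(char base) {
        switch (base) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    // Decodes the base stored at position i.
    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i / 32] >>> ((i % 32) * 2)) & 3L);
        return "ACGT".charAt(code);
    }

    int length() {
        return length;
    }

    public static void main(String[] args) {
        PackedSequence p = new PackedSequence("GATTACA");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < p.length(); i++) {
            sb.append(p.baseAt(i));
        }
        System.out.println(sb); // prints GATTACA
    }
}
```

The trade-off is that the String tools no longer apply directly, so searching has to be written against the packed form.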
A good reference on some of these techniques is Chapter 2 – Algorithms and Data Structures in Next-Generation Sequencing of the book Next Generation Sequencing Technologies and Challenges in Sequence Assembly. A more detailed PDF preview of it is available from this Google link, though I can't guarantee that it will work forever.
You may also want to look at BioJava. While I wouldn't want to steer you away from Java, Perl is another good alternative for sequence analysis; see Beginning Perl for Bioinformatics, Perl and Bioinformatics, or BioPerl.
I realize that this answer may be TMI, but if it helps you or others find more appropriate solutions, it has served its purpose.
Edit:
Based on the comment below, this appears to be a homework question, given the requirement that the search be accomplished by StringBuilder.indexOf(). The following method would accomplish the search accordingly.
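public List<Integer> findBySb( String seq ) {
    List<Integer> indices = new ArrayList<Integer>();
    // An empty seq matches everywhere and would never advance strIdx.
    if ( seq.isEmpty() )
        return indices;
    StringBuilder sb = new StringBuilder( gene );
    int strIdx = 0;
    while ( strIdx < sb.length() ) {
        int idx = sb.indexOf( seq, strIdx );
        if ( idx == -1 )
            break;
        indices.add( idx );
        // Resume after the match, so overlapping matches are skipped.
        strIdx = idx + seq.length();
    }
    return indices;
}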
The same indexOf() approach can be used with the string directly.
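public List<Integer> findByString( String seq ) {
    List<Integer> indices = new ArrayList<Integer>();
    // An empty seq matches everywhere and would never advance strIdx.
    if ( seq.isEmpty() )
        return indices;
    int strIdx = 0;
    while ( strIdx < gene.length() ) {
        int idx = gene.indexOf( seq, strIdx );
        if ( idx == -1 )
            break;
        indices.add( idx );
        // Resume after the match, so overlapping matches are skipped.
        strIdx = idx + seq.length();
    }
    return indices;
}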
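One caveat, in case it matters for your data: both indexOf() loops above move strIdx past the whole match, so overlapping occurrences are not reported. If you need those as well, a variant that resumes one position after each match would look roughly like this. The class and method names (OverlapDemo, findOverlapping) are hypothetical, and I am assuming plain literal matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of findByString() that also reports overlapping matches.
class OverlapDemo {
    static List<Integer> findOverlapping(String gene, String seq) {
        List<Integer> indices = new ArrayList<>();
        if (seq.isEmpty()) {
            return indices; // an empty pattern is treated as "no matches"
        }
        int idx = gene.indexOf(seq);
        while (idx != -1) {
            indices.add(idx);
            // Resume one character later instead of skipping the whole match.
            idx = gene.indexOf(seq, idx + 1);
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("AAAA", "AA")); // prints [0, 1, 2]
    }
}
```

For the same input, the loops above would return only 0 and 2.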
Both StringBuilder and String use the same static implementation of String.indexOf(), so functionally there is no difference. However, instantiating a StringBuilder just for searching is overkill and a little more wasteful, since it also allocates buffers to manage string operations. I could go on :), but it wouldn't add to the answer.