Try the pattern `"\\b([GCATgcat]+)\\b"`, which matches any run of the characters G, C, A and T (upper or lowercase) surrounded by word boundaries. Because of the boundaries, it won't match those characters when they are embedded in other words, such as the "cata" in "catalog". If you scan your sample file with this regex repeatedly (calling `Matcher.find()` in a loop), you will extract each sequence.
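To see the word-boundary behaviour in isolation, here is a small self-contained sketch. The class and method names (`GcatBoundaryDemo`, `findSequences`) are hypothetical and exist only for illustration; the pattern is the same one described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcatBoundaryDemo {
    // Same pattern as above: one or more G/C/A/T letters (either case) bounded by \b.
    private static final Pattern GCAT = Pattern.compile("\\b([GCATgcat]+)\\b");

    // Hypothetical helper: returns every whole-word GCAT run in the input, in order.
    static List<String> findSequences(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = GCAT.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // "catalog" is skipped: "cata" is followed by "l", another word character,
        // so there is no \b there and the match fails.
        System.out.println(findSequences("gattaca catalog TTAGGC")); // prints [gattaca, TTAGGC]
    }
}
```

Note that the pattern would still match a standalone word made only of those letters (for example "cat" or "a"), so it helps to restrict the search to the part of the file that actually holds the sequence data.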
Here's a working example for your sample file:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Locate the substring between "ORIGIN" and "//" in the file.
String fileContents = getSampleFileContents();
int indexOfOrigin = fileContents.indexOf("ORIGIN");
String pertinentSection = fileContents.substring(
    indexOfOrigin, fileContents.indexOf("//", indexOfOrigin));

// Search for sequences within the pertinent substring.
Pattern p = Pattern.compile("\\b([GCATgcat]+)\\b");
Matcher m = p.matcher(pertinentSection);
List<String> sequences = new ArrayList<>();
while (m.find()) {
    sequences.add(m.group(1));
}
System.out.println(sequences); // => [acagatgaag, acagatgaag, ..., acagatgaag]
```
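Since `getSampleFileContents()` isn't defined here, the following is a runnable sketch of the same approach, assuming your file follows the GenBank-style layout where the sequence sits between an `ORIGIN` line and a closing `//`. The class name `OriginExtractor`, the method `extractSequences`, and the sample record in `main` are hypothetical stand-ins, not your actual file. It also adds a guard for missing markers (an assumption on my part), since `substring` would otherwise throw when `indexOf` returns -1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginExtractor {
    private static final Pattern GCAT = Pattern.compile("\\b([GCATgcat]+)\\b");

    // Hypothetical helper: extracts GCAT runs from the text between "ORIGIN"
    // and the next "//". Returns an empty list if either marker is missing,
    // instead of letting substring() throw StringIndexOutOfBoundsException.
    static List<String> extractSequences(String fileContents) {
        List<String> sequences = new ArrayList<>();
        int start = fileContents.indexOf("ORIGIN");
        if (start < 0) {
            return sequences;
        }
        int end = fileContents.indexOf("//", start);
        if (end < 0) {
            return sequences;
        }
        Matcher m = GCAT.matcher(fileContents.substring(start, end));
        while (m.find()) {
            sequences.add(m.group(1));
        }
        return sequences;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for getSampleFileContents(): a tiny GenBank-style record.
        String sample = String.join("\n",
            "LOCUS       EXAMPLE",
            "DEFINITION  cat data.",
            "ORIGIN",
            "        1 acagatgaag ttcagcatga",
            "       21 ggtacc",
            "//");
        System.out.println(extractSequences(sample)); // prints [acagatgaag, ttcagcatga, ggtacc]
    }
}
```

The word "cat" on the `DEFINITION` line is not picked up because the search only covers the `ORIGIN` section; scanning the whole file would have matched it.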