Hello, I am trying to use regular expressions in a Java program. I would like the regex to identify a String of unknown length whose characters are only 'C', 'A', 'G' or 'T'. Thanks for your help.
3 Answers
7
Easy, just use a character class:
[CAGT]+
Or, if the entire string has to consist only of the characters C, A, G and T for it to match:
^[CAGT]+$
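Since the question is about Java, here is a minimal sketch of how the pattern above might be applied. The class name `DnaMatcher` and the method `isDna` are made up for illustration. Note that `Matcher.matches()` already requires the whole input to match, so the `^` and `$` anchors are optional with it.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaMatcher {
    // Compile once and reuse; the pattern itself comes from the answer above.
    private static final Pattern DNA = Pattern.compile("[CAGT]+");

    // Returns true only if every character of s is C, A, G or T
    // and s contains at least one character.
    static boolean isDna(String s) {
        return DNA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA"));  // true
        System.out.println(isDna("GATTXACA")); // false: contains 'X'
        System.out.println(isDna(""));         // false: '+' needs at least one character
    }
}
```

For a one-off check, `"GATTACA".matches("[CAGT]+")` behaves the same way, but it recompiles the pattern on every call.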

bluepnume
The entire String has to be made up of only 'C', 'A', 'T' or 'G' – Julio Diaz May 16 '11 at 15:12
2
Adding to the above:
^[CAGTcagt]+$
This matches lowercase as well as uppercase characters.
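In Java, an alternative to listing both cases in the character class is the `Pattern.CASE_INSENSITIVE` flag. A small sketch, with the class name `DnaMatcherIgnoreCase` and the method `isDna` invented for this example:

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaMatcherIgnoreCase {
    // CASE_INSENSITIVE makes [CAGT] accept c, a, g and t as well.
    private static final Pattern DNA =
            Pattern.compile("[CAGT]+", Pattern.CASE_INSENSITIVE);

    // matches() checks the whole string, so no ^ or $ is needed here.
    static boolean isDna(String s) {
        return DNA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GattaCA")); // true
        System.out.println(isDna("gattaca")); // true
        System.out.println(isDna("gattaxa")); // false: contains 'x'
    }
}
```

Which form to prefer is mostly a matter of taste; spelling out `[CAGTcagt]` keeps the accepted alphabet visible in the pattern itself.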

phoxis
2
I disagree with the most-voted answer. With [ACGT]+, a large string will lead to a lot of memory usage. So I would use a negated regex instead and check that the string contains no characters outside [ACGT]:
str !~ [^ACGTacgt]
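Java has no `!~` operator, so the line above reads as pseudocode there. A rough Java equivalent, assuming the intent is "reject the string as soon as one invalid character is found", could look like this (the names `DnaNegatedCheck` and `isDna` are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaNegatedCheck {
    // Matches any single character that is NOT one of the allowed letters.
    private static final Pattern NON_DNA = Pattern.compile("[^ACGTacgt]");

    static boolean isDna(String s) {
        // find() stops at the first offending character.
        // The negated check alone would accept "", so the empty string
        // is excluded explicitly to stay in line with the '+' versions.
        return !s.isEmpty() && !NON_DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA"));  // true
        System.out.println(isDna("gatTACa"));  // true
        System.out.println(isDna("GATT-ACA")); // false: contains '-'
        System.out.println(isDna(""));         // false: excluded explicitly
    }
}
```

Whether this actually uses less memory than `[ACGT]+` depends on the regex engine, so it is worth measuring on realistic input before choosing one over the other.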

Denis de Bernardy