2

Hello, I am trying to use regular expressions in a Java program. I would like the regex to identify a String of unknown length whose characters are only 'C', 'A', 'G' or 'T'. Thanks for your help.

Julio Diaz

3 Answers

7

Easy, just use a character class:

[CAGT]+

Or, if the entire string has to consist only of the characters C, A, G and T for it to match:

^[CAGT]+$
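
Since the question is about Java, here is a minimal sketch of how the two patterns above might be used there. The class and method names (`DnaRegexDemo`, `isDna`, `containsDna`) are made up for illustration. Note that `Matcher.matches()` only succeeds when the whole input matches, so the `^` and `$` anchors are implied, while `Matcher.find()` looks for a match anywhere in the input.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
final class DnaRegexDemo {
    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern DNA = Pattern.compile("[CAGT]+");

    // True only if the whole string consists of C, A, G, T (one or more characters).
    static boolean isDna(String s) {
        return DNA.matcher(s).matches();
    }

    // True if a run of C, A, G, T occurs anywhere in the string.
    static boolean containsDna(String s) {
        return DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA"));        // true
        System.out.println(isDna("GATTXACA"));       // false, 'X' is not allowed
        System.out.println(containsDna("GATTXACA")); // true, "GATT" matches
    }
}
```

If only a one-off check is needed, `"GATTACA".matches("[CAGT]+")` should behave the same as `isDna`, at the cost of recompiling the pattern on every call.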
bluepnume
2

Adding to the above:

^[CAGTcagt]+$

To ensure detection of both lowercase and uppercase characters.
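
In Java, one alternative to listing both cases in the character class is the `Pattern.CASE_INSENSITIVE` flag. A small sketch, assuming a made-up helper class `DnaIgnoreCaseDemo`:

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
final class DnaIgnoreCaseDemo {
    // CASE_INSENSITIVE makes [CAGT] accept c, a, g, t as well.
    private static final Pattern DNA =
            Pattern.compile("[CAGT]+", Pattern.CASE_INSENSITIVE);

    // True only if the whole string consists of C, A, G, T in either case.
    static boolean isDna(String s) {
        return DNA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDna("gattaca")); // true
        System.out.println(isDna("GaTtAcA")); // true
        System.out.println(isDna("gattaxa")); // false, 'x' is not allowed
    }
}
```

The inline form `(?i)[CAGT]+` should have the same effect when passing flags is inconvenient.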

phoxis
2

I disagree with the most-voted answer. With [ACGT]+, a large string will lead to a lot of memory usage. I would use a negated character class instead and check that the string contains no character outside [ACGT]:

str !~ [^ACGTacgt]
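
The `!~` line above reads like Perl-style "does not match" notation rather than Java. A rough Java equivalent might look like the sketch below, where `DnaNegatedDemo` and `isDna` are made-up names. It searches for the first disallowed character and stops as soon as one is found. One difference worth keeping in mind: an empty string passes this check, whereas `[ACGT]+` rejects it.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are for illustration only.
final class DnaNegatedDemo {
    // Matches any single character that is NOT one of A, C, G, T (either case).
    private static final Pattern NOT_DNA = Pattern.compile("[^ACGTacgt]");

    // True if no disallowed character is found anywhere in the string.
    static boolean isDna(String s) {
        return !NOT_DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDna("acgtACGT")); // true
        System.out.println(isDna("ACGU"));     // false, 'U' is not allowed
        System.out.println(isDna(""));         // true, nothing disallowed was found
    }
}
```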
Denis de Bernardy