
I am writing a program that should check if:

  • a user input string contains a specific word
  • OR contains 4 designated characters and has a length divisible by 3.

I can get the specific string check and the divisibility check to work but I can't seem to get the letter check to work.

validationcheck = False
while not validationcheck:
    InputSequence = input("Input: ")
    if (InputSequence == 'EXAMPLE' or len(InputSequence) % 3 == 0 and 'C', 'A', 'G', 'T', 't', 'g', 'a','c' in InputSequence):
        validationcheck = True
    else:
        print("Invalid input")
        InputSequence = input("Input: ")

My desired output is that if the user types EXAMPLE, or types a sequence of letters containing C, A, G, T whose length is divisible by 3, no invalid input message appears. Otherwise the program should print an invalid input message and prompt the user to re-enter.

Update: I ended up solving it, but thanks for the responses. I used the following, in case it is helpful for anyone:

            import re

            validationcheck = False
            while not validationcheck:
                InputSequence = input("Input: ")
                stringToCheck = InputSequence
                # re.search finds the first C, A, G or T anywhere in the string
                found = re.search("[CAGT]", stringToCheck)
                if len(InputSequence) % 3 == 0 and found:
                    validationcheck = True
  • Checking the specification on the second option: (1) must the input string contain all the designated characters, and (2) is it allowed to contain other characters? (since CAGT sounds like it might be a gene sequence which would not in fact have anything other than those characters, reinforced by the "divisibility by 3" which is amino acid coding length) – Joffan Apr 27 '21 at 14:34

3 Answers


I propose doing this in two steps: first check whether the line is EXAMPLE or its length is a multiple of 3, and then check for unexpected characters inside the line. To find characters inside a line you can iterate over its characters with a for loop:

validationcheck = False
while not validationcheck:
    InputSequence = input("Input: ")
    if (InputSequence == 'EXAMPLE' or len(InputSequence) % 3 == 0):
        # check for single chars inside string
        input_ok = True
        for ch in InputSequence:
            # check if character is not in given options
            if (ch not in [...]):  # to be completed
                input_ok = False
        if (input_ok):
            validationcheck = True
    else:
        print("Invalid input")

A side note: Python variable names conventionally use snake_case, e.g. input_sequence rather than InputSequence.
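For illustration, here is one way the per-character loop above could look once an allowed set is supplied. This is only a sketch: the helper name `has_only_allowed_chars` and the choice of `"CAGTcagt"` as the allowed characters are assumptions based on the letters listed in the question, not something the question confirms.

```python
# Assumption: the allowed characters are C, A, G, T in either case,
# taken from the letters listed in the question.
ALLOWED_CHARS = "CAGTcagt"

def has_only_allowed_chars(input_sequence):
    """Return True if every character of input_sequence is allowed."""
    for ch in input_sequence:
        if ch not in ALLOWED_CHARS:
            return False  # found an unexpected character
    return True

print(has_only_allowed_chars("CAGcat"))  # True
print(has_only_allowed_chars("CAGXYZ"))  # False
```

Returning early from the function avoids the separate `input_ok` flag. Note that an empty string passes this check, because there is no character to reject.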

piertoni

You can use any() to check whether any of the letters 't', 'g', 'a', 'c' occurs in the input, after converting it to lower case with str.lower():

validationcheck = False
while not validationcheck:
    InputSequence = input("Input: ")
    if InputSequence == 'EXAMPLE':
        validationcheck = True
    elif not len(InputSequence) % 3 and any(i in InputSequence.lower() for i in 'tgac'):
        validationcheck = True
    else:
        # the loop prompts again, so no second input() is needed here
        print("Invalid input")
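Keep in mind that any() only requires at least one matching letter, so a string that also contains other characters still passes. A small sketch of the difference between any() and all() for this check (both helper names are hypothetical, introduced only for this comparison):

```python
def contains_any_base(text):
    # True if at least one of t, g, a, c occurs anywhere in text
    return any(letter in text.lower() for letter in "tgac")

def contains_only_bases(text):
    # True if every character of text is one of t, g, a, c
    return all(ch in "tgac" for ch in text.lower())

print(contains_any_base("xyzabc"))    # True: 'a' and 'c' are present
print(contains_only_bases("xyzabc"))  # False: 'x', 'y', 'z', 'b' are not allowed
print(contains_only_bases("GATtac"))  # True
```

Which of the two is wanted depends on how the requirement in the question is read.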

You can also check whether the input contains only the characters 't', 'g', 'a' or 'c' by using a regex:

import re
validationcheck = False
while not validationcheck:
    InputSequence = input("Input: ")
    if InputSequence == 'EXAMPLE':
        validationcheck = True
    elif not len(InputSequence) % 3 and re.fullmatch(r'[tgac]+[\r\n]*', InputSequence.lower()):
        validationcheck = True
    else:
        # the loop prompts again, so no second input() is needed here
        print("Invalid input")

The regex [tgac]+[\r\n]* says: one of 't', 'g', 'a' or 'c', one or more times, followed by a line separator (which can be '\n', '\r' or '\r\n') zero or more times.
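To see how the pattern behaves, the following sketch runs it against a few sample strings. The sample inputs and the name `PATTERN` are made up for illustration; only `re.fullmatch` and `re.search` from the standard library are used.

```python
import re

# Pattern from the answer: one or more of t/g/a/c, then optional line breaks.
PATTERN = r'[tgac]+[\r\n]*'

for candidate in ["gattac", "GATTAC", "gattax", ""]:
    # fullmatch succeeds only if the whole (lower-cased) string fits the pattern
    matched = re.fullmatch(PATTERN, candidate.lower()) is not None
    print(repr(candidate), matched)

# By contrast, re.search succeeds as soon as one matching character is found.
print(re.search(r'[tgac]', "xyza") is not None)  # True
```

The loop prints True for "gattac" and "GATTAC" and False for "gattax" and the empty string, since `+` requires at least one letter.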

Ido
  • You should evaluate the programming level of the person who is asking and answer accordingly... both patterns you gave are perfectly correct, but a bit difficult for entry level: `any(i in InputSequence.lower() for i in 'tgac'])` (btw you can remove a parenthesis as you will still have a generator) and the `regex` concept/use – piertoni Apr 27 '21 at 12:59
  • You're right! I will add more details and explanations about the code I wrote. Thanks! – Ido Apr 27 '21 at 14:00

It sounds to me like you need to ensure that in the second option, the string consists of only the characters CAGT. This is a different requirement of course - part of good programming is identifying and eliminating ambiguities in the specification.

Proceeding on that basis, the keyword option is more cleanly handled as a separate if statement, using in to check the input against a keyword list. The following code uses all() with a generator expression that checks each character against the allowed letters, so that non-keyword entries must consist only of the characters CAGT. The test does not check that all of those letters are present (so AGCAGC would pass as valid despite having no T). Finally, I gave divisibility by 3 its own error message, since that is potentially harder to spot when looking at an otherwise-good entry.

You don't need a second input statement, since the loop will bring you back to that for invalid input.

ExactKeywords = ['DOTHIS','DOTHAT']
validationcheck = False
while not validationcheck:
    InputSequence = input("Input: ").upper()
    if ( InputSequence in ExactKeywords ):
        validationcheck = True
    elif all(ch in 'CAGT' for ch in InputSequence):
        if len(InputSequence)%3 == 0:
            validationcheck = True
        else:
            print("Input string length not a multiple of 3")
    else:
        print("Invalid input")
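
One way to make this logic easier to test is to move the checks into a function that returns a message instead of printing, leaving input() in the loop. This is a sketch only: the function name `validate_sequence` and its return convention (`None` for valid input, otherwise an error message) are assumptions made for illustration.

```python
EXACT_KEYWORDS = ['DOTHIS', 'DOTHAT']  # placeholder keywords from the code above

def validate_sequence(raw_input):
    """Return None if raw_input is valid, otherwise an error message."""
    sequence = raw_input.upper()
    if sequence in EXACT_KEYWORDS:
        return None
    if not all(ch in 'CAGT' for ch in sequence):
        return "Invalid input"
    if len(sequence) % 3 != 0:
        return "Input string length not a multiple of 3"
    return None

print(validate_sequence("dothis"))  # None
print(validate_sequence("agcagc"))  # None
print(validate_sequence("agca"))    # Input string length not a multiple of 3
print(validate_sequence("hello"))   # Invalid input
```

Note that an empty string passes both checks here (all() over no characters is True and 0 is a multiple of 3), so an explicit check would be needed if empty input should be rejected.
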
Joffan
  • As I mentioned in a comment above, ACGT sounds like it might denote a nucleotide sequence which would not have anything other than those characters, reinforced by the "divisibility by 3" which corresponds to amino acid coding ([codon](https://en.wikipedia.org/wiki/Genetic_code#Codons)) length. – Joffan Apr 27 '21 at 21:21