
I have written code to split a string on the symbols in a symbol list, but it is buggy and does not work properly. I hope someone can clarify what is wrong and help me.

What I would like to do is split a string such as 'game.run();' into this list of strings: ['game', '.', 'run', '(', ')', ';'], where the symbol list is:

Symbollst = [
        '{' , '}' , '(' , ')' , '[' , ']' , '.' ,
        ',' , ';' , '+' , '-' , '*' , '/' , '&' ,
        '|' , '<' , '>' , '=' , '~'
        ]

My initial code:

for token in r_splitted:
    if any(x in token for x in Symbollst):
        TokenInSymbol = [i in token for i in Symbollst]
        new_token = token.split(Symbollst[TokenInSymbol.index(True)])
        new_token.insert(1, Symbollst[TokenInSymbol.index(True)])
        for i in new_token:
            if i == '': continue
            self.TokenList.append(i)
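To see why this loop misbehaves, here is a minimal standalone reproduction. It assumes (the question does not say) that r_splitted is a list of whitespace-separated strings, and it replaces self.TokenList with a local list; the names initial_split and token_list are hypothetical. The excerpt has no else branch, so tokens without any symbol are not appended here either.

```python
# Jack language symbols, including '|'
Symbollst = [
    '{', '}', '(', ')', '[', ']', '.',
    ',', ';', '+', '-', '*', '/', '&',
    '|', '<', '>', '=', '~',
]


def initial_split(r_splitted):
    """Reproduce the question's loop, collecting into a local list."""
    token_list = []
    for token in r_splitted:
        if any(x in token for x in Symbollst):
            TokenInSymbol = [i in token for i in Symbollst]
            # index(True) picks only the FIRST symbol, in Symbollst order
            sym = Symbollst[TokenInSymbol.index(True)]
            new_token = token.split(sym)
            # the symbol is inserted once, at index 1, however many pieces split() made
            new_token.insert(1, sym)
            for i in new_token:
                if i == '':
                    continue
                token_list.append(i)
    return token_list


print(initial_split(['game.run();']))
# ['game.run', '(', ');']
```

Because '(' comes before '.' in Symbollst, the token is split only on '(', so 'game.run' and ');' are never broken up further.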

Note: this is part of the Nand2Tetris compiler project.

ggorlen
Daniel Sapir
  • Is this the only type of thing you're parsing, or are there more cases? I assume whitespace is ignored? Please provide a variety of examples of expected input and output that cover various cases. How about `1 >= 1`? I assume we want to tokenize this as `["1", ">=", "1"]` rather than `["1", ">", "=", "1"]`, just to give one example. Thanks. – ggorlen Sep 29 '19 at 14:50
  • Sounds like you need a lexical scanner + tokenizer. Do look for existing implementations before trying to DIY. You'll inadvertently miss lots of cases. – rdas Sep 29 '19 at 14:52
  • I am actually building a tokenizer here... – Daniel Sapir Sep 29 '19 at 17:00
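Along the lines of rdas's suggestion to lean on existing tools, Python's standard re module can perform this kind of scan with a single regular expression: each match is either one symbol character or a maximal run of characters that are neither whitespace nor symbols. This is a minimal sketch, not a full Jack tokenizer; the names SYMBOLS, TOKEN_RE and tokenize are hypothetical, the symbol set is assumed to be Jack's 19 symbols, and string constants and comments are not handled.

```python
import re

# Jack language symbols, including '|'
SYMBOLS = '{}()[].,;+-*/&|<>=~'

# re.escape makes every symbol safe to place inside a character class.
_ESCAPED = re.escape(SYMBOLS)
# Match either a single symbol, or a run of non-whitespace, non-symbol characters.
TOKEN_RE = re.compile('[' + _ESCAPED + ']|[^\\s' + _ESCAPED + ']+')


def tokenize(text):
    """Split text into symbol tokens and runs of other non-whitespace characters."""
    return TOKEN_RE.findall(text)


print(tokenize('game.run();'))
# ['game', '.', 'run', '(', ')', ';']
print(tokenize('let x = a[i] + 1;'))
# ['let', 'x', '=', 'a', '[', 'i', ']', '+', '1', ';']
```

Whitespace is simply never matched, so it is skipped without a separate split step.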

1 Answer


OK, I thought about it overnight and came up with a solution that uses list(token) to split the token into individual characters and handle each one separately:

Symbollst = [
        '{' , '}' , '(' , ')' , '[' , ']' , '.' ,
        ',' , ';' , '+' , '-' , '*' , '/' , '&' ,
        '|' , '<' , '>' , '=' , '~'
        ]

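token = 'game.run();'
temp_token = list(token)
new_token = []
string = ''

for i in temp_token:
    if i in Symbollst:
        # flush any text collected before this symbol
        if string:
            new_token.append(string)
        new_token.append(i)
        string = ''
    else:
        string += i

# flush text after the last symbol (e.g. 'run' in 'game.run')
if string:
    new_token.append(string)

print(new_token)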

So for the input:

token = 'game.run();'

the output is:

new_token = ['game', '.', 'run', '(', ')', ';']
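One possible way to plug this character scan back into the question's loop over r_splitted is to wrap it in a helper and call it once per whitespace-separated piece. This is a sketch under assumptions: the names split_symbols and tokenize_line are hypothetical, splitting the line with str.split() is a guess at how r_splitted was produced, and the symbol set is assumed to be Jack's 19 symbols.

```python
# Jack language symbols, including '|'; a set gives constant-time membership tests
SYMBOLS = set('{}()[].,;+-*/&|<>=~')


def split_symbols(token):
    """Split one whitespace-free token into symbols and the text between them."""
    parts = []
    current = ''
    for ch in token:
        if ch in SYMBOLS:
            if current:
                parts.append(current)
            parts.append(ch)
            current = ''
        else:
            current += ch
    if current:  # keep any text after the last symbol
        parts.append(current)
    return parts


def tokenize_line(line):
    """Apply split_symbols to every whitespace-separated piece of a line."""
    tokens = []
    for piece in line.split():
        tokens.extend(split_symbols(piece))
    return tokens


print(tokenize_line('do game.run();'))
# ['do', 'game', '.', 'run', '(', ')', ';']
print(split_symbols('game.run'))
# ['game', '.', 'run']
```

Pieces with no symbols at all, such as 'do', pass through unchanged.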
Daniel Sapir