Using Python, how can I split a file containing code (methods, variables, etc.) into words while keeping each string literal in the code as a single token?

For example, given the following Python code inside a file:

def example():
    a = 5
    b = "Hello World"

The result should be:

['def', 'example', '(', ')', ':', 'a', '=', '5', 'b', '=', '"Hello World"']

where "Hello World" is a single token.

Thanks...

lior_13
  • Related: [Parsing Python Code From Within Python?](http://stackoverflow.com/q/1978515) – Bhargav Rao Jun 20 '16 at 15:55
  • Possible duplicate of [Parsing Python Code From Within Python?](http://stackoverflow.com/questions/1978515/parsing-python-code-from-within-python) – Mo H. Jun 20 '16 at 16:02
  • Don't think it's a duplicate, it's more of a general question about Lexers. – advance512 Jun 20 '16 at 16:33

1 Answer

You can use the shlex module.

For example, take a file containing the following text:

This string has embedded "double quotes" and 'single quotes' in it,
and even "a 'nested example'".

Using the shlex library, we construct a simple lexical parser:

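import shlex
import sys

if len(sys.argv) != 2:
    print('Please specify one filename on the command line.')
    sys.exit(1)

filename = sys.argv[1]
# Read the whole file as text; the context manager closes it afterwards.
with open(filename, 'rt') as f:
    body = f.read()
print('ORIGINAL:', repr(body))
print()

print('TOKENS:')
# In the default (non-POSIX) mode, shlex keeps the quotes on quoted tokens.
lexer = shlex.shlex(body)
for token in lexer:
    print(repr(token))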

This generates the output:

ORIGINAL: 'This string has embedded "double quotes" and \'single quotes\' in it,\nand even "a \'nested example\'".\n'

TOKENS:
'This'
'string'
'has'
'embedded'
'"double quotes"'
'and'
"'single quotes'"
'in'
'it'
','
'and'
'even'
'"a \'nested example\'"'
'.'

More information and a nice tutorial can be found here.
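
Applied to the code from the question, the same approach might look like the sketch below. The helper name `split_keeping_strings` and its `strip_quotes` flag are assumptions made for illustration and are not part of shlex. The flag covers the case where the quotation marks around each string are not wanted.

```python
import shlex


def split_keeping_strings(source, strip_quotes=False):
    """Split source into tokens, keeping each quoted string as one token.

    Hypothetical helper built on shlex's default (non-POSIX) mode.
    """
    tokens = list(shlex.shlex(source))
    if strip_quotes:
        # Drop one matching pair of surrounding quotes, if present.
        tokens = [
            t[1:-1] if len(t) >= 2 and t[0] == t[-1] and t[0] in '"\'' else t
            for t in tokens
        ]
    return tokens


source = 'def example():\n    a = 5\n    b = "Hello World"\n'
tokens = split_keeping_strings(source)
print(tokens)
```

Note that shlex follows shell-like rules rather than Python's: it treats `#` as the start of a comment, and triple-quoted strings or strings with escaped quotes may not come out as a single token.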

advance512
  • Well, the shlex module solved my problem. I just needed to remove the quotation marks on either side of the strings... – lior_13 Jun 21 '16 at 15:54
  • More information to update the answer would be helpful, as would marking it as the "accepted answer". – advance512 Jun 21 '16 at 16:05