Lexer for Parsing to the end of a line

Question

Once my lexer encounters a keyword, how can I make it grab the rest of the line and return it as a string? That is, when it reaches the end of the line, it should return everything on that line after the keyword.

Here is the line I'm looking at:

  description here is the rest of my text to collect

Thus, when the lexer encounters description, I would like "here is the rest of my text to collect" returned as a string.

I have the following defined, but it seems to be throwing an error:

states = (
     ('bcdescription', 'exclusive'),
)

def t_bcdescription(t):
    r'description '
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t

This is part of the error being returned:

  File "/Users/me/Coding/wm/wm_parser/ply/lex.py", line 393, in token
raise LexError("Illegal character '%s' at index %d" % (lexdata[lexpos],lexpos), lexdata[lexpos:])
ply.lex.LexError: Illegal character ' ' at index 40

Finally, if I wanted this functionality for more than one token, how could I accomplish that?

Thanks for your time

How will other keywords work? Will they all be "command" + "rest of line"? Without knowing, it might not make sense to even use a parser / lexer here. — FogleBird, Dec 23 '12 at 01:20
That is correct. command = "description" and the rest of the line would be "....the rest of the line...." — KingFish, Dec 23 '12 at 05:29

woshifyz · Accepted Answer · 2013-11-25T13:12:39.077

There is no big problem with your code. In fact, I copied it, added a tokens tuple and a rule that consumes the text inside the bcdescription state, and it works well:

import ply.lex as lex

# Exclusive state: while it is active, only the t_bcdescription_* rules apply.
states = (
     ('bcdescription', 'exclusive'),
)

tokens = ("BCDESCRIPTION",)

def t_bcdescription(t):
    r'\bdescription\b'
    # Remember where the keyword ends, then switch to the capture state.
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    # End of line: emit everything after the keyword, newline included.
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t

def t_bcdescription_content(t):
    r'[^\n]+'
    # Consume the text inside the state without emitting a token.
    pass

lexer = lex.lex()
data = 'description here is the rest of my text to collect\n'
lexer.input(data)

while True:
    tok = lexer.token()
    if not tok:
        break
    print(tok)

The result is:

LexToken(BCDESCRIPTION,' here is the rest of my text to collect\n',1,50)

So you may want to check the other parts of your code.

If you want this functionality for more than one token, you can match words in general and, whenever a word is one of those keywords, start capturing the rest of the line with the code above.
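As a rough sketch of that idea, one exclusive state can be shared by several keywords. Everything named here is an assumption for illustration: the keywords title and author (only description appears in the question), the state name restofline, the pending_type attribute stored on the lexer, and the tokenize helper. It relies on PLY's TOKEN decorator so the keyword pattern can be built from a tuple, and it strips the surrounding whitespace from the captured text.

```python
import ply.lex as lex

# Hypothetical command keywords; only "description" comes from the question.
COMMANDS = ("description", "title", "author")

# One token type per keyword, e.g. DESCRIPTION, TITLE, AUTHOR.
tokens = tuple(name.upper() for name in COMMANDS)

# A single exclusive state shared by every keyword.
states = (
    ("restofline", "exclusive"),
)

t_ignore = " \t"
t_restofline_ignore = ""

@lex.TOKEN(r"\b(?:" + "|".join(COMMANDS) + r")\b")
def t_command(t):
    # Remember which keyword was seen, then capture the rest of the line.
    # Nothing is returned, so the keyword itself produces no token.
    t.lexer.pending_type = t.value.upper()
    t.lexer.begin("restofline")

def t_restofline_text(t):
    r"[^\n]+"
    # Emit the remainder of the line under the pending keyword's token type.
    t.type = t.lexer.pending_type
    t.value = t.value.strip()
    t.lexer.begin("INITIAL")
    return t

def t_ANY_newline(t):
    r"\n+"
    # Newlines are tracked in every state and always return to INITIAL.
    t.lexer.lineno += len(t.value)
    t.lexer.begin("INITIAL")

def t_error(t):
    # Skip any character that no rule recognises.
    t.lexer.skip(1)

t_restofline_error = t_error

lexer = lex.lex()

def tokenize(text):
    """Return (type, value) pairs for every command line in text."""
    lexer.begin("INITIAL")
    lexer.input(text)
    return [(tok.type, tok.value) for tok in lexer]
```

Calling tokenize on the line from the question should yield a single DESCRIPTION token whose value is the text after the keyword, without the leading space or the trailing newline.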

score -1 · Answer 2 · edited Oct 08 '21 at 07:09

It is not obvious why you need to use a lexer/parser for this without further information.

>>> x = 'description here is the rest of my text to collect'
>>> a, b = x.split(' ', 1)
>>> a
'description'
>>> b
'here is the rest of my text to collect'
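If every command really is "keyword + rest of line", the same split can be applied line by line. This is only a sketch: the function name parse_commands and its commands parameter are made up for illustration, and it uses str.partition instead of split so that a keyword with nothing after it does not raise an error.

```python
def parse_commands(text, commands=("description",)):
    """Return (keyword, rest_of_line) pairs for lines that start with a known keyword."""
    results = []
    for line in text.splitlines():
        # partition never raises: a line without a space gives rest == "".
        keyword, _, rest = line.strip().partition(" ")
        if keyword in commands:
            results.append((keyword, rest.strip()))
    return results
```

Lines that do not start with one of the given keywords are skipped, so this only fits input where nothing else needs to be tokenized.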

FogleBird · answered Dec 23 '12 at 20:45 · edited by Richard Dally, Oct 08 '21 at 07:09

I have a high level language which was written and I want to parse and convert it to another language. The "description" is an example of a command. – KingFish Dec 24 '12 at 10:47
