pyparsing: NotAny(FollowedBy()) failing

Question

I have some input data like this:

[gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6]

and I want to find every gog that is not immediately followed by a G. So in this case I want to get gog2 and gog3 (and maybe gog6).

Looks pretty simple, right? But I failed :(

import pyparsing as pp
from pyparsing import *

def pyparsing_test():
    # this doesn't help either
    # ParserElement.enable_left_recursion(force=True)

    data=""" [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """

    poi_type = Word(alphas).set_results_name('type')
    poi = Suppress('[') + poi_type + Char(nums) + Suppress(']')

    def cnd_is_type(val):
        return lambda toks: toks.type==val

    def cnd_is_not_type(val):
        return lambda toks: toks.type!=val

    poi_gog=poi('gog').add_condition(cnd_is_type('gog'))
    poi_g=poi('g').add_condition(cnd_is_type('G'))
    poi_not_g=poi('not_g').add_condition(cnd_is_not_type('G'))

    pattern = poi_gog + ~poi_g
    #WTF this finds only `gog6`, why??

    pattern = poi_gog + NotAny(FollowedBy(poi_g))
    #WTF same, only `gog6`

    pattern = poi_gog + poi_not_g.suppress()
    #WTF this works better but find only `gog2`, why not `gog3` also?

    r=pattern.search_string(data)
    print(data)
    print('='*10)
    print(r)

pyparsing_test()

I also tried `pattern = poi_gog + ~FollowedBy(poi_g)`, but it also captures only `gog6` — Jhon BYaka, May 11 '23 at 11:31
Could you give an example of the output you want to achieve? — Malo, May 11 '23 at 12:18
You are grabbing the first gog only. You need to continue to search through the string. — Jatin Morar, May 11 '23 at 17:58
Sorry, how? As I understand it, the `search_string` method finds all matches, and with `pattern=poi_gog` it returns all items. Maybe you mean `scan_string`? — Jhon BYaka, May 11 '23 at 18:16

Malo · Answer 1 · 2023-05-12T15:36:26.043

I would use the standard regular-expression module, re:

import re
data=""" [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] """
# raw string, so that \[ reaches the regex engine as an escaped bracket
m = re.findall(r'\[(gog.)(?!...G)', data)
print(m)

the result is:

['gog2', 'gog3', 'gog6']

The regexp can still be improved if you want to exclude the last gog, handle numbers larger than 9, or make it more robust.
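
The improvements mentioned above could look roughly like this. It is only a sketch under a few assumptions: items are always written as `[name+digits]`, items are separated by optional whitespace, and the names `not_before_g` and `before_non_g` are made up for illustration.

```python
import re

data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "

# Keep a gog unless the very next bracketed item is a G.
# \d+ allows numbers above 9; \s* skips whitespace between items.
not_before_g = re.compile(r"\[(gog\d+)\](?!\s*\[G\d)")

# Stricter variant: a following non-G item must exist,
# so a gog at the end of the input is dropped.
before_non_g = re.compile(r"\[(gog\d+)\](?=\s*\[(?!G\d))")

print(not_before_g.findall(data))  # ['gog2', 'gog3', 'gog6']
print(before_non_g.findall(data))  # ['gog2', 'gog3']
```

The first pattern keeps the trailing gog6, the second one excludes it, so pick whichever matches the "maybe gog6" requirement.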

edited May 12 '23 at 15:36

answered May 11 '23 at 12:09


Malo, thanks for the help! But I need to use pyparsing; my parsing logic is much more complex, and this is just an example of the problem. Anyway, I need to extract a gog only if it is not followed by a G, so in this example the correct result is gog2 and gog3, not all gogs – Jhon BYaka May 11 '23 at 13:29
@JhonBYaka OK, I misunderstood what you said and thought you wanted to exclude "gogG", but you need to exclude any gog that is followed by [G]. I edited my response to match this behaviour. You can do pretty complex stuff with regexp, and you can still improve it if needed. – Malo May 11 '23 at 14:53
Yep, like this, but unfortunately I need to do this with pyparsing – Jhon BYaka May 11 '23 at 15:31

score 0 · Accepted Answer · answered May 27 '23 at 10:06

Finally, we know what is happening, thanks to @ptmcg! His original answer is on GitHub: https://github.com/pyparsing/pyparsing/issues/482#issuecomment-1546779260

Summary:

First of all, the lookahead needs to use Group together with StringEnd(), and this version works:

pattern = poi_gog + FollowedBy(Group(poi_not_g) | StringEnd())

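As for the problem in the title: NotAny() has a bug, it skips parse actions (and therefore conditions) on the expression it negates. That means `~poi_g` ignores the `type == 'G'` condition and rejects any following item, which would explain why only `gog6` was found. This is with the current version, pyparsing 3.0.9.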
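
For completeness, here is a self-contained sketch of the working pattern applied to the sample data from the question. It assumes pyparsing 3.0 or later (for the snake_case method names). Using `poi.copy()` instead of the original results names, and collecting the matches into a `found` list, are choices made only for this illustration.

```python
import pyparsing as pp

data = " [gog1] [G1] [gog2] [gog3] [gog4] [G2] [gog5] [G3] [gog6] "

# One bracketed item: a name followed by a single digit, e.g. [gog1]
poi_type = pp.Word(pp.alphas).set_results_name("type")
poi = pp.Suppress("[") + poi_type + pp.Char(pp.nums) + pp.Suppress("]")

# copy() so that each condition is attached to its own expression
poi_gog = poi.copy().add_condition(lambda toks: toks.type == "gog")
poi_not_g = poi.copy().add_condition(lambda toks: toks.type != "G")

# Positive lookahead: the next item must be a non-G item, or the input ends.
# FollowedBy consumes nothing, so the following gog can still be matched.
pattern = poi_gog + pp.FollowedBy(pp.Group(poi_not_g) | pp.StringEnd())

found = ["".join(toks) for toks in pattern.search_string(data)]
print(found)  # ['gog2', 'gog3', 'gog6']
```

Because the lookahead is positive, the condition on `poi_not_g` is evaluated normally, which sidesteps the NotAny() behaviour described above. Dropping the `| pp.StringEnd()` alternative should exclude the trailing gog6 if that is preferred.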
