
Friends: in PostgreSQL PL/Python, I am trying to do an iterative search-and-replace in a text block, 'data'.

I'm using re.sub with a match pattern and a callback function, 'replace', that does the work. The objective is to have 'replace' called repeatedly, because some replacements generate further 'rule' matches, which in turn require further replacements.

Everything works well through many, many replacements, and the repeat loop does reach its second pass. Then something seems to make the regex return an integer(?), apparently at the point where it finds no more matches. I've tried testing for 'None' and '0', with no luck. Ideas?

```python
import re  # plpy is provided by PL/Python; re must be imported

data = "..."  # (a_huge_block of_text)

# ======================  THE FUNCTION  ==============
def replace(matchobj):
    tag = matchobj.group(1)
    plpy.info("-------- matchobj.group(1), tag: ", tag)
    if matchobj.group(1) != '':
        pass  # (do all the replacement work in here)
# ======================  END FUNCTION  ==============

passnumber = 0
# If _any_ pattern match is found, process all of data for _all_ matches:
while re.search('(rule:[A-Za-z#]+)', data) != '':
    # BEGIN repeat loop:
    passnumber = passnumber + 1
    plpy.info(' ================================  BEGIN PASS: ', passnumber)

    data = re.sub('(rule:[A-Za-z#]+)', replace, data)
    plpy.info(' =================================== END PASS: ', passnumber)
```

The code above seems to run fine into a second iteration, and then:

```text
ERROR:  TypeError: sequence item 21: expected string, int found
CONTEXT:  Traceback (most recent call last):
  PL/Python function "myfunction", line 201, in <module>
    data = re.sub('(rule:[A-Za-z#]+)', replace, data)
  PL/Python function "myfunction", line 150, in sub
PL/Python function "myfunction"
```
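For reference, a "sequence item N: expected string, int found" TypeError is what re.sub raises when the replacement callback returns something other than a string (here an int) for one of the matches, so one plausible cause is a code path in 'replace' that returns a number. Below is a minimal reproduction, assuming standalone Python 3 outside the database (Python 3 words the message as "expected str instance" instead of "expected string"); the callbacks bad_replace and good_replace are hypothetical, not taken from the question.

```python
import re

# Same pattern as in the question, precompiled.
PATTERN = re.compile(r'(rule:[A-Za-z#]+)')

def bad_replace(matchobj):
    # Hypothetical buggy callback: returns an int, not a str.
    return len(matchobj.group(1))

def good_replace(matchobj):
    # Hypothetical fixed callback: converts the result to a str.
    return str(len(matchobj.group(1)))

text = "x rule:abc y"

try:
    PATTERN.sub(bad_replace, text)
except TypeError as exc:
    # re.sub rejects the int returned for the match.
    print("TypeError:", exc)

print(PATTERN.sub(good_replace, text))  # x 8 y
```

Wrapping whatever the callback computes in str() before returning it keeps re.sub happy.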

I have also tried `re.search(...) != ''` and `re.search(...) != 'None'`, with the same result. I realize I need to find a way to display the match object in some readable form...
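One way to see why neither comparison ever becomes false is to print what re.search actually returns: a match object when the pattern is found, and the None object when it is not. Neither is ever equal to the string '' or the string 'None'. A minimal sketch, assuming plain Python 3 with print; inside PL/Python the same repr() strings could be passed to plpy.info instead.

```python
import re

PATTERN = r'(rule:[A-Za-z#]+)'

found = re.search(PATTERN, "text with rule:abc in it")
missing = re.search(PATTERN, "text with no rules")

# repr() gives a readable form of both results.
print(repr(found))    # e.g. <re.Match object; span=(10, 18), match='rule:abc'>
print(repr(missing))  # None

# Comparing against strings never detects "no match":
print(missing != '')      # True
print(missing != 'None')  # True

# Comparing against the None object itself does:
print(missing is not None)  # False
print(found is not None)    # True
```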

DrLou
  • There's a trick to figuring this kind of thing out. It's called the `print` function. In the body of the `while` loop, add enough `print` functions to display `repr(replace)` and `repr(data)`. Don't guess about what's going on. `print` stuff. Include the output in your question so we can all see what's **actually** happening. Proof is better than speculation. – S.Lott Sep 09 '11 at 02:14
  • Better yet, use [`pdb`](http://docs.python.org/dev/library/pdb.html) and inspect the stack at the point of the error to see what it's really stuck on. – Ross Patterson Sep 09 '11 at 02:36
  • @S.Lott: I don't think I have any print output capability from plpython; I do have the plpy.info call, which I use extensively. – DrLou Sep 09 '11 at 03:48
  • @Ross Patterson: Can I use pdb from within a plpython call? (I thought we couldn't.) Will research. – DrLou Sep 09 '11 at 03:48
  • Well, I don't know plpython, so maybe not. But if you can invoke the Python process yourself, you can use `python -m pdb /file/to/run.py`, or if the Python process can control `std(in|out|err)` then you can use `pdb.set_trace()`. – Ross Patterson Sep 09 '11 at 03:57
  • Please post the entire test case and the versions of the things you use. – Peter Eisentraut Sep 09 '11 at 07:16
  • @DrLou: http://developer.postgresql.org/pgdocs/postgres/plpython-util.html "The plpy module also provides the functions plpy.debug(msg), plpy.log(msg), plpy.info(msg), plpy.notice(msg), plpy.warning(msg), plpy.error(msg), and plpy.fatal(msg)" You have plenty of printing capability. Please use it to provide us the missing information. – S.Lott Sep 09 '11 at 10:18
  • @Peter Eisentraut - Great to see you weighing in! – DrLou Sep 10 '11 at 00:44

1 Answer


The answer to this turned out to be quite simple, of course, once you know Python! (I don't!)

To initiate the repeat loop, I had been doing this test:

```python
while re.search('(rule:[A-Za-z#]+)', data) != '':
```

I had also tried this one, which also does not work:

```python
while re.search('(rule:[A-Za-z#]+)', data) != 'None':
```

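re.search returns a match object when it finds a match and the None object when it doesn't; it never returns the empty string or the string 'None'. So the test has to compare against None itself, without the quotes. It's as simple as that: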

```python
while re.search('(rule:[A-Za-z#]+)', data) != None:
```

It's all so simple, once you know!
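Putting the fix together, here is a minimal standalone sketch of the whole iterative loop in plain Python 3. It uses `is not None`, which PEP 8 recommends over `!= None` for comparisons with None. The RULES table, the expand_rules name and the max_passes guard are hypothetical additions for illustration, not part of the original function; the guard only stops a rule that keeps regenerating itself from looping forever. Inside PL/Python, progress messages would go through plpy.info.

```python
import re

# Precompiled form of the pattern used in the question.
RULE_PATTERN = re.compile(r'(rule:[A-Za-z#]+)')

# Hypothetical rule table: some expansions contain further rules.
RULES = {
    'rule:greeting': 'Hello, rule:name!',
    'rule:name': 'world',
}

def expand_rules(data, max_passes=50):
    """Repeatedly replace rule:... tags until none remain.

    Returns the expanded text and the number of passes made.
    """
    def replace(matchobj):
        tag = matchobj.group(1)
        # Always return a str; unknown tags are replaced with ''.
        return RULES.get(tag, '')

    passnumber = 0
    # re.search returns None (not '' or 'None') when nothing matches.
    while RULE_PATTERN.search(data) is not None:
        passnumber += 1
        if passnumber > max_passes:
            raise RuntimeError('rules still present after %d passes' % max_passes)
        data = RULE_PATTERN.sub(replace, data)
    return data, passnumber
```

Calling expand_rules('Say: rule:greeting') takes two passes: the first expands rule:greeting into text that contains rule:name, and the second expands that. An alternative that avoids scanning the text twice per pass is re.subn, which returns the new string together with the number of substitutions made, so the loop can stop once that count is 0.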

DrLou