I'm using WebHarvest to parse some HTML. The function below, which is meant to trim a string, produces the following error in the WebHarvest IDE, and I don't understand what's wrong.

Error:

Error executing XQuery expression (Xquery=[declare variable $xqsource external; let $result := normalize-space($xqsource) return $result])!

Edit2: The log reports the following SAX Error:

[...] Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog

I don't understand what this means in this case.

Function's parameters: sourceString, the string to trim

<function name="trim">
    <return>
        <xquery>
            <xq-param name="xqsource">
                <var name="sourceString" />
            </xq-param>
            <xq-expression><![CDATA[
                declare variable $xqsource external;

                let $result := normalize-space($xqsource)
                    return 
                     $result
                ]]>
            </xq-expression>
        </xquery>
    </return>
</function>

Edit: sourceString is a string composed of alphanumeric chars, new lines and spaces, like

" blabla - bla2

"

Dimitre Novatchev
cdarwin
  • I can reproduce the error testing the XQuery expression with Saxon. What's the `sourceString` value? – Alejandro Dec 11 '10 at 23:30
  • @Alejandro: do you think the XQuery code is correct? Anyway, the strings passed contain new lines, alphanumeric chars and spaces – cdarwin Dec 12 '10 at 11:59

1 Answer

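The default type of xq-param is node() (see the manual). WebHarvest therefore tries to parse the content of your variable as XML, and a plain string such as " blabla - bla2 " is not a well-formed XML document. That is what "Content is not allowed in prolog" means here: the parser found text where it expected the start of an XML document. SAXParseException is an XML parsing error, not an XQuery error as such.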

You should add a string type declaration to your param:

<xq-param name="xqsource" type="string">
  <var name="sourceString" />
</xq-param>
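
For reference, here is a sketch of how the whole function from the question might look with that one change applied. It assumes nothing else needs to change: the function name, the variable names and the XQuery expression are copied from the question as they are, and only the type attribute is new.

```xml
<!-- Sketch: the "trim" function from the question with the parameter typed as a string -->
<function name="trim">
    <return>
        <xquery>
            <!-- type="string" stops WebHarvest from parsing the value as an XML node -->
            <xq-param name="xqsource" type="string">
                <var name="sourceString" />
            </xq-param>
            <xq-expression><![CDATA[
                declare variable $xqsource external;

                let $result := normalize-space($xqsource)
                    return
                     $result
                ]]>
            </xq-expression>
        </xquery>
    </return>
</function>
```

Note that normalize-space does more than trim: it also collapses internal runs of whitespace, including new lines, into single spaces.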

Does that help?

Dennis Münkle
  • Argh!! I missed that parameter type because the example used an xml var which is the default!!! Thank you VERY MUCH, now it works! – cdarwin Dec 12 '10 at 16:03