0

My test table has two columns, one is message_xml of type XML and another is company_names of type VARCHAR. I would like to search for a word in each XML row and if the XML has it then export that XML.

Here are 2 samples of xml stored in DB2:

<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>
</breakfast_menu>

<breakfast_menu>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles </description>
    <calories>900</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices of bread</description>
    <calories>600</calories>
  </food>
</breakfast_menu>

In these XML documents I want to find the word "bacon" (it can be anywhere in the XML) and just export that XML to a text file.

I tried using CONTAINS by first converting the XML to string but I get an error regarding VARCHAR.

xmlcast(test.message_xml as varchar(255)) as export_XML is the code I wrote to convert XML to string

[Error Code: -16061, SQL State: 10608] The value "429541527005540133404021548131000109999-12-312..." cannot be constructed as, or cast (using an implicit or explicit cast) to the data type "VARCHAR_255". Error QName=err:FORG0001.. SQLCODE=-16061, SQLSTATE=10608, DRIVER=4.15.82

Next I directly tried to use CONTAINS but I get an error regarding no text index was found. So, I tried to create an index but I get an error there too and the error reads

"An unexpected token "idx1" was found following "L) as ( create index". Expected tokens may include: "JOIN""

for the code: create index idx1 on test(message_xml) generate key using xmlpattern '/XML' as varchar(9999)

My code is:

@export on;
@export set filename="D:\temp\searchResults.txt";
@set maxrows 10;
with Tempresult(export_xml)
as
(
create index idx1 on test(message_xml)
generate key using xmlpattern '/XML'
as varchar(9999)
select 
    test.message_xml as export_XML
from test where source_id = 14
and trans_timestamp between '2015-10-01' and '2016-04-30'
)
select
    export_XML
    from Tempresult
    //where LOCATE('bacon',export_XML) > 0;
    where CONTAINS(export_XML, ' "bacon" ') = 1;
@export off;

with the above code WITHOUT creating an index, I get this error:

[Error Code: -443, SQL State: 38799] Routine "*RCH_8K64" (specific name "") has returned an error SQLSTATE with diagnostic text "CTE0199 No text index corresponding to column "MESSAGE_XML"".. SQLCODE=-443, SQLSTATE=38799, DRIVER=4.15.82

I even tried to use LOCATE, CONVERT, CAST but of no use. Can someone please help me in solving this?

I think solution might be by converting XML to string and apply CONTAINS or LOCATE or create an index for the xml column. Please correct me if I am wrong.

mustaccio
  • 18,234
  • 16
  • 48
  • 57
user3814614
  • 31
  • 1
  • 8
  • code I used to convert the xml to string is: "xmlcast(test.message_xml as varchar(9999)) as export_XML". I used a large number in varchar is because I want to export the entire xml that the word I am searching for. – user3814614 May 25 '16 at 18:17
  • Instead of showing your attempts to solve the problem, you should probably better describe the problem itself. Show the table definition, some sample data, and the desired result. Instead of saying "error regarding VARCHAR" post the actual error code and complete message. I suspect there may be an easier solution to it. By the way, XML indexes are different from text indexes, which are required for the `CONTAINS` function to work. – mustaccio May 25 '16 at 20:11
  • I listed my attempts because I am afraid that I may get the solution I tried. If I list the error, then someone can suggest me corrections in that. I agree that I did not provide the details you requested and I am sorry for that. I will provide it. Can you please help me in creating a text index? – user3814614 May 26 '16 at 12:35
  • If you decide to use text search functionality, [read the documentation](http://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0052052.html?lang=en) about enabling it and creating text indices; you cannot embed text search commands as part of a query. – mustaccio May 26 '16 at 14:03
  • Thank you mustaccio for editing my question :) . Sure I will go through the documentation. Thank you once again. – user3814614 May 26 '16 at 14:13

2 Answers2

2

You should take advantage of the PureXML features of DB2. For example, the following WHERE clause will search for bacon anywhere in elements name or description.

WHERE XMLEXISTS('/breakfast_menu/food[contains(name,"bacon") or contains(description,"bacon")] ' PASSING message_xml)
Stavr00
  • 3,219
  • 1
  • 16
  • 28
  • Awesome. Thanks for the reply Stavr00. What if "bacon" is a name attribute in a tag. Like this: ` 1 ` . How can I edit the WHERE clause? Is there a way that I can do as text-search? Sometimes I need to deal with really really huge xmls. – user3814614 May 26 '16 at 13:55
  • ... then this becomes an `XPATH` question ;) – Stavr00 May 26 '16 at 13:56
  • Oh got it. I will try your suggestions and get back to you. Thank you. In the mean time can you suggest a way using text-search? When I try to do that I get no text index on the column. – user3814614 May 26 '16 at 14:00
  • 1
    `XMLSERIALIZE(message_xml AS CLOB)` will convert the XML into a searchable type. But I still recommend using XPATH to search for data. – Stavr00 May 26 '16 at 14:19
  • I tried this Stavr00: `WHERE XMLEXISTS('$clam//breakfast_menu/*[contains(name,"bacon")]' PASSING message_xml as "clam")` and it returned 0 results though I have the xml that has "bacon" attribute. Am I missing something? Thank you Stavr00. – user3814614 May 26 '16 at 15:08
  • I recommend you debug your XPATH using something like `Notepad++` or an online tester (http://www.webtoolkitonline.com/xml-xpath-tester.html). In your case above it should read `$clam//breakfast_menu/food/*[contains(name,"bacon")]` – Stavr00 May 26 '16 at 15:35
  • Yes I tried it and when I do that I just get the values but not tags inside it. And so when I do using XPath, I get an error: `1) [Error Code: -16011, SQL State: 10507] The result of an intermediate step expression in an XQuery path expression contains an atomic value. Error QName=err:XPTY0019.. SQLCODE=-16011, SQLSTATE=10507, DRIVER=4.15.82. 2) [Error Code: -727, SQL State: 56098] An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-16011", SQLSTATE "10507" and message tokens "".. SQLCODE=-727, SQLSTATE=56098, DRIVER=4.15.82` – user3814614 May 26 '16 at 16:19
  • meaning of error is that there are no nodes after the path and so XMLEXISTS cannot be computed – user3814614 May 26 '16 at 16:20
  • This is a silly question, but are you *certain* `message_xml` is indeed an `XML` type and not `CLOB` or `BLOB` ? – Stavr00 May 26 '16 at 17:02
  • Yes Stavr00. I am sure it is of type `XML` only. – user3814614 May 26 '16 at 17:10
  • You are very close to the answer, unfortunately I can't sit behind you and watch your work. You're gonna have to persevere and debug your XPATH yourself... – Stavr00 May 26 '16 at 17:23
  • Yeah I can understand that I have to keep on trying. I am doing that :). Thank you Stavr00. I will let you know if I find anything – user3814614 May 27 '16 at 14:39
0

I found a solution at the expense of some run time due to XMLSERIALIZE (I think).

select message_xml AS export_xml from test where LOCATE('bacon',XMLSERIALIZE(xmlquery('$clam//breakfast_menu ' passing test.message_xml as "clam") as CLOB)) > 0

For now this worked for me. Sometimes I am getting an error which reads:

[Error Code: -433, SQL State: 22001] Value "et">0.00" is too long.. SQLCODE=-433, SQLSTATE=22001, DRIVER=4.15.82

I couldn't figure out what this error is.

user3814614
  • 31
  • 1
  • 8