1. Alexa API response for google.com: http://pastebin.com/C5yjSjCf. In other words, it represents one row from 12 simple tables called "ContactInfo", "Rank by Country", ...

And one more example (facebook.com): http://pastebin.com/mP813jYS

2. Schema/data type information: http://awis.amazonaws.com/AWSAlexa/AWSAlexa.xsd

I can do basic XQuery with XQilla.

query.txt:

declare namespace aws="http://alexa.com";

/aws:UrlInfoResponse/aws:Response/aws:UrlInfoResult/aws:Alexa/aws:ContentData/aws:DataUrl

xqilla -i alexa.xml query.txt
Error parsing resource: file:///var/www/google  Error message: invalid content after root element's end tag [err:FODC0002]

xqilla -i google.xml query.txt
Error parsing resource: file:///var/www/Error message: invalid content after root element's end tag [err:FODC0002]

alexa.xml (the file I actually want to query) is many of these API responses one after another.

I also tried deleting the first 3 lines and the last one from google.xml, and search-and-replacing 'aws:' and leading spaces, just to make it simpler, but I still get the same error :(

iloveregex

1 Answer

You said...

alexa.xml (the file I actually want to query) is many of these API responses one after another.

Is this what it sounds like? Does your file look something like this?

<aws:UrlInfoResponse xmlns:aws="http://alexa.com">
    <!--...-->    
</aws:UrlInfoResponse>
<aws:UrlInfoResponse xmlns:aws="http://alexa.com">
    <!--...-->    
</aws:UrlInfoResponse>

That would also explain your error. You're only allowed to have one root element. You'd need to wrap it all in another element.

Example:

<responses>
    <aws:UrlInfoResponse xmlns:aws="http://alexa.com">
        <!--...-->    
    </aws:UrlInfoResponse>
    <aws:UrlInfoResponse xmlns:aws="http://alexa.com">
        <!--...-->    
    </aws:UrlInfoResponse>    
</responses>

XPath:

/responses/aws:UrlInfoResponse/aws:Response/aws:UrlInfoResult/aws:Alexa/aws:ContentData/aws:DataUrl
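
If editing the file by hand is impractical, the wrapping can also be done in code. The following is only a minimal sketch in Python, assuming each response may carry its own XML declaration (which would also be invalid inside a wrapper element) and that the namespace URI is the one from the question's query. The sample data, the `responses` wrapper name, and the helper names `wrap_responses` and `data_urls` are illustrative, not part of the Alexa API.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI copied from the question's query; adjust it if your
# responses declare a different one.
NS = {"aws": "http://alexa.com"}

# Hypothetical stand-in for alexa.xml: two responses back to back,
# each with its own XML declaration.
ONE = (
    '<?xml version="1.0"?>\n'
    '<aws:UrlInfoResponse xmlns:aws="http://alexa.com"><aws:Response>'
    '<aws:UrlInfoResult><aws:Alexa><aws:ContentData>'
    '<aws:DataUrl>{}</aws:DataUrl>'
    '</aws:ContentData></aws:Alexa></aws:UrlInfoResult>'
    '</aws:Response></aws:UrlInfoResponse>\n'
)
SAMPLE = ONE.format("google.com") + ONE.format("facebook.com")


def wrap_responses(raw):
    """Drop per-response XML declarations and add a single root element."""
    body = re.sub(r"<\?xml[^>]*\?>", "", raw)
    return "<responses>" + body + "</responses>"


def data_urls(raw):
    """Return the text of every DataUrl element in the wrapped document."""
    root = ET.fromstring(wrap_responses(raw))
    # Path is relative to the <responses> root added above.
    path = ("aws:UrlInfoResponse/aws:Response/aws:UrlInfoResult/"
            "aws:Alexa/aws:ContentData/aws:DataUrl")
    return [(el.text or "").strip() for el in root.findall(path, NS)]


if __name__ == "__main__":
    print(data_urls(SAMPLE))
```

For a real file you would read its contents as text and pass them to `data_urls` in place of `SAMPLE`. This loads the whole document into memory, so it is a sketch for moderately sized files rather than a tuned solution.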
Daniel Haley
  • Yay, got it to work, but only after putting it in a root tag, removing the aws: prefix from all tags, and removing xmlns:aws from the opening tags. @daniel-haley – iloveregex Apr 08 '14 at 23:11
  • /root/Alexa/ContentData/DataUrl now runs through the file, but it is crazy slow and lists only 5 lines a second! :D Any fix for that? I might have been faster with regex or Xerces; I just thought this was closer to the way it's meant to be done. – iloveregex Apr 08 '14 at 23:12
  • How would I actually do it the smartest way: validated, including the data types from the .xsd? Isn't there any tool to automatically import the whole XML and XSD into a database table or CSV? That would be so much easier... – iloveregex Apr 08 '14 at 23:14