
I'm trying to parse a Google Atom entry with XmlSlurper. My use case is something like this:

1) Send an Atom XML document to the server with a REST client.

2) Handle the request and parse it on the server side.

I develop my server in Groovy and use XmlSlurper as the parser, but I couldn't get it to work: I get a "Content is not allowed in prolog" exception. To find out why, I saved my Atom XML to a UTF-8 encoded file, then read the file and parsed it, and got the same exception. When I saved the same XML to an ANSI-encoded file instead, it parsed successfully. So I think the problem is related to XmlSlurper and UTF-8.

Do you have any idea about this limitation? My Atom XML has to be UTF-8, so how can I parse it? Thanks for your help.

XML:

<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns:atom='http://www.w3.org/2005/Atom'
    xmlns:gd='http://schemas.google.com/g/2005'>
  <category scheme='http://schemas.google.com/g/2005#kind'
    term='http://schemas.google.com/contact/2008#contact' />
  <title type='text'>Elizabeth Bennet</title>
  <content type='text'>Notes</content>
  <gd:email rel='http://schemas.google.com/g/2005#work'
    address='liz@gmail.com' />
  <gd:email rel='http://schemas.google.com/g/2005#home'
    address='liz@example.org' />
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    (206)555-1212
  </gd:phoneNumber>
  <gd:phoneNumber rel='http://schemas.google.com/g/2005#home'>
    (206)555-1213
  </gd:phoneNumber>
  <gd:im address='liz@gmail.com'
    protocol='http://schemas.google.com/g/2005#GOOGLE_TALK'
    rel='http://schemas.google.com/g/2005#home' />
  <gd:postalAddress rel='http://schemas.google.com/g/2005#work'
    primary='true'>
    1600 Amphitheatre Pkwy Mountain View
  </gd:postalAddress>
</entry>

Read file and parse:

 String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml";
 String line = "";
 StringBuilder sb = new StringBuilder();
 BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
 while ((line = br.readLine()) !=null) {
     sb.append(line);
 }
 System.out.println("sb.toString() = " + sb.toString());

 def xmlf = new XmlSlurper().parseText(sb.toString())
    .declareNamespace(gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005')

   println xmlf.title  
Dan Lowe
erimerturk
  • What exactly do you mean by "i saved atom xml to a file which is encoded with ansi"? How are you parsing the XML, exactly? Some code would be helpful... – Jon Skeet Oct 18 '11 at 12:48
  • I mean I created a file with Notepad++ whose encoding type is ANSI, then copied and pasted the XML into it. – erimerturk Oct 18 '11 at 12:50
  • Do you also have an example of the XML that is failing? – tim_yates Oct 18 '11 at 12:57
  • @erimerturk: That means you've already applied multiple decoding/encoding passes - I'm not surprised it's wrong. Wherever possible, try *not* to get into the encoding business. See my answer for more details - but it's not clear why you've even *got* a file. I'm assuming that in reality the XML comes from a network stream - so get XmlSlurper to parse that stream *as an InputStream*. – Jon Skeet Oct 18 '11 at 12:58

2 Answers


Try:

String file = "C:\\Documents and Settings\\user\\Desktop\\create.xml"

def xmlf = new XmlSlurper().parse( new File( file ) ).declareNamespace( 
        gContact:'http://schemas.google.com/contact/2008',
        gd:'http://schemas.google.com/g/2005' )
println xmlf.title  

You're going the long way round.

tim_yates
  • As I said before, I have to send this Atom XML to the server with a REST client. I only used a file to find the problem. This approach worked on the file; I will try it with ServletInputStream and then give feedback. Thank you – erimerturk Oct 18 '11 at 13:07
  • @erimerturk `XmlSlurper` [can `parse` an `InputStream`](http://groovy.codehaus.org/api/groovy/util/XmlSlurper.html#parse%28java.io.InputStream%29), so you don't need to run everything through a chained sequence of `Readers` and `InputStreams` to get it into a `String` – tim_yates Oct 18 '11 at 13:11
  • I tried `XmlSlurper().parse(request.getInputStream())` but now I get a "Premature end of file" exception, although I parsed the same XML with `XmlSlurper().parse( new File( file ) )`. What am I missing? – erimerturk Oct 18 '11 at 14:19

This is the problem:

BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream(file)));
while ((line = br.readLine()) !=null) {
    sb.append(line);
}

That's reading the file with the platform default encoding. If the encoding is wrong, you'll be reading the data incorrectly.
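To illustrate why the decoding charset matters, here is a minimal Java sketch. It assumes, hypothetically, that the UTF-8 file starts with a byte order mark; the question does not confirm this, although some editors write one when saving as UTF-8. The class `DecodeDemo` and its sample bytes are made up for illustration, and ISO-8859-1 stands in for a single-byte platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode raw bytes with an explicitly named charset
    // (InputStreamReader without a charset argument uses the platform default instead).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Hypothetical sample: a UTF-8 byte order mark (EF BB BF) followed by "<?xml".
    static byte[] sampleBytes() {
        byte[] xml = "<?xml".getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[3 + xml.length];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, out, 3, xml.length);
        return out;
    }

    public static void main(String[] args) {
        String wrong = decode(sampleBytes(), StandardCharsets.ISO_8859_1);
        String right = decode(sampleBytes(), StandardCharsets.UTF_8);
        // Wrong charset: the three BOM bytes become three stray characters before '<'.
        System.out.println(wrong.indexOf('<')); // 3
        // Right charset: Java keeps the BOM as the single character U+FEFF before '<'.
        System.out.println(right.indexOf('<')); // 1
    }
}
```

In both cases something ends up in front of `<?xml` once the bytes have been turned into a `String`, which is one plausible way to get a "Content is not allowed in prolog" error from a parser that is handed that string.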

What you should do is let the XML parser handle it for you. It should be able to detect the encoding itself, based on the first line of data.

I'm not familiar with XmlSlurper but I'd expect it to either be able to parse an input stream (in which case just give it the FileInputStream) or handle the name of the file itself.
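As a rough sketch of that advice in plain Java, the raw bytes can go straight to the parser with no `Reader` in between. This uses the JDK's built-in `DocumentBuilder` rather than XmlSlurper, a substitution made here only to keep the example self-contained. The class name `StreamParseDemo`, the helper `textOf`, and the cut-down sample entry are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class StreamParseDemo {
    // Hand the byte stream to the parser and let it work out the encoding
    // from the XML declaration; no manual decoding to a String takes place.
    static String textOf(InputStream in, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Hypothetical, trimmed-down entry encoded as UTF-8, with one non-ASCII character.
    static byte[] sampleUtf8() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<entry><title type='text'>Elizabeth Bennet</title>"
                + "<content type='text'>Notes \u00e9</content></entry>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(textOf(new ByteArrayInputStream(sampleUtf8()), "title"));
    }
}
```

The same idea should carry over to a `FileInputStream` or a servlet request stream, as long as nothing else has consumed the stream first.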

Jon Skeet