20

I am using node-xml2js to parse XML. I am having trouble parsing XML from a URL. I've used this SO answer as a model. But when I use the code below, I get null as the result. Any thoughts on what I'm doing wrong?

UPDATE: I've updated the URL to the actual XML endpoint used.

var eyes = require('eyes');
var https = require('https');
var fs = require('fs');
var xml2js = require('xml2js');
var parser = new xml2js.Parser();

parser.addListener('end', function(result) {
  eyes.inspect(result);
  console.log('Done.');
});

https.get('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function(result) {
  result.on('data', function (data) {
    parser.parseString(data);
    });
  }).on('error', function(e) {
  console.log('Got error: ' + e.message);
});
Ken
  • You have a typo, should be ```var https = require('https')```. After fixing this it works for me. – edin-m Dec 07 '15 at 22:07
  • @Edin M Thank you for pointing that out. I forgot to fix that before copying/pasting after troubleshooting. Unfortunately, I'm still encountering the same problem when trying to access the actual site. – Ken Dec 07 '15 at 22:16
  • Endpoint is reachable again, my answer works. Hopefully to your satisfaction – Ferry Kobus Dec 12 '15 at 19:16

6 Answers

24

xml2js doesn't appear to be able to parse XML chunk by chunk, so we need to buffer the entire HTTP response. I've used a global variable here, but it's better to use something like concat-stream (example further down).

I have tried this one and it works for me:

var https = require('https');
var xml2js = require('xml2js');
var parser = new xml2js.Parser();

parser.on('error', function(err) { console.log('Parser error', err); });

var data = '';
https.get('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function(res) {
  if (res.statusCode >= 200 && res.statusCode < 400) {
    // Accumulate every chunk; parse only once the response has ended
    res.on('data', function(chunk) { data += chunk.toString(); });
    res.on('end', function() {
      parser.parseString(data, function(err, result) {
        console.log('FINISHED', err, result);
      });
    });
  }
});

We parse the XML only after the response has finished arriving. xml2js is built on sax, which does have streaming support, but xml2js does not seem to take advantage of it.

I also tried a small chunk-by-chunk version (similar to your example), but it fails with a parse error because an individual chunk is not valid XML on its own; that's why the entire response has to be buffered.

If your XML is very large, try a streaming parser such as sax.

You can also attach an error handler to the parser so it prints any errors it encounters.

Concat stream

With concat-stream you can collect all the .on('data', ...) chunks more elegantly:

var https = require('https');
var xml2js = require('xml2js');
var parser = new xml2js.Parser();
var concat = require('concat-stream');

parser.on('error', function(err) { console.log('Parser error', err); });

https.get('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function(resp) {

    resp.on('error', function(err) {
      console.log('Error while reading', err);
    });

    resp.pipe(concat(function(buffer) {
      var str = buffer.toString();
      parser.parseString(str, function(err, result) {
        console.log('Finished parsing:', err, result);
      });
    }));

});

You can use sax to avoid buffering the entire file (in case your XML files are big). It is more low-level, but piping into it as a stream looks very similar.

edin-m
    Probably file is small enough and is reported to ```.on('data', ...``` in one chunk. Larger files are reported with multiple chunks. This is because tcp is a streaming protocol and underlying implementations give us chunks as soon as they have it. Details of when and how, and chunk sizes are matters of implementations. – edin-m Dec 08 '15 at 12:54
  • I'll be using a number of other files that are quite large. Do both of your examples buffer the entire file? I thought that xml2js took advantage of sax. So I'm a little confused. – Ken Dec 12 '15 at 17:06
  • @Ken, Yes they do. I recommend using ```sax``` directly to be able to use streaming support since ```xml2js``` does not take advantage of it. There is an [opened issue](https://github.com/Leonidas-from-XIV/node-xml2js/issues/137) for stream support. – edin-m Dec 12 '15 at 20:50
  • res.statusCode >= 200 && res.statusCode < 400 – Watchmaker Jun 22 '16 at 12:12
5

Based on your question, the solution should be something like this.

Both options work as expected and give a valid JSON object of the XML. You can configure how the XML is parsed as described in the README of xml2js.

Native


var eyes = require('eyes'),
    https = require('https'),
    fs = require('fs'),
    xml2js = require('xml2js'),
    parser = new xml2js.Parser();


https.get('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function(res) {
    var response_data = '';
    res.setEncoding('utf8');
    res.on('data', function(chunk) {
        response_data += chunk;
    });
    res.on('end', function() {
        parser.parseString(response_data, function(err, result) {
            if (err) {
                console.log('Got error: ' + err.message);
            } else {
                eyes.inspect(result);
                console.log('Done.');
            }
        });
    });
    res.on('error', function(err) {
        console.log('Got error: ' + err.message);
    });
});

Async (without the callback hell)


var eyes = require('eyes'),
    https = require('https'),
    async = require('async'),
    xml2js = require('xml2js');

async.waterfall([
    function(callback) {
        https.get('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function(res) {
            var response_data = '';
            res.setEncoding('utf8');
            res.on('data', function(chunk) {
                response_data += chunk;
            });
            res.on('end', function() {
                callback(null, response_data);
            });
            res.on('error', function(err) {
                callback(err);
            });
        });
    },
    function(xml, callback) {
        var parser = new xml2js.Parser();
        parser.parseString(xml, function(err, result) {
            if (err) {
                callback(err);
            } else {
                callback(null, result);
            }
        });
    }, 
    function(json, callback) {
        // do something usefull with the json
        eyes.inspect(json);
        callback();
    }
], function(err, result) {
    if (err) {
        console.log('Got error');
        console.log(err);
    } else {
        console.log('Done.');
    }
});
Ferry Kobus
4

Using xml2js, it's very simple.

var parseString = require('xml2js').parseString;

var xmldata = "XML output from the url";
console.log(xmldata);
parseString(xmldata, function (err, result) {
 // Result contains XML data in JSON format
});
Abdul Manaf
2
var https = require('https');
var parseString = require('xml2js').parseString;

function xmlToJson(url, callback) {
  var req = https.get(url, function(res) {
    var xml = '';

    res.on('data', function(chunk) {
      xml += chunk;
    });

    res.on('error', function(e) {
      callback(e, null);
    }); 

    res.on('end', function() {
      parseString(xml, function(err, result) {
        callback(err, result);
      });
    });
  });

  // Note: 'timeout' is not emitted on the response; set it on the request instead
  req.setTimeout(10000, function() {
    req.abort();
    callback(new Error('Request timed out'), null);
  });
}

var url = "https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml"

xmlToJson(url, function(err, data) {
  if (err) {
    // Handle this however you like
    return console.error(err);
  }

  // Do whatever you want with the data here
  // Following just pretty-prints the object
  console.log(JSON.stringify(data, null, 2));
});
Chris
  • Hey, you can view the full explanation here: http://antrikshy.com/blog/fetch-xml-url-convert-to-json-nodejs/ I just basically changed the protocol from http to https in order to get the xml from the url the original question was based off. – Chris Feb 02 '16 at 14:08
  • @Chris, all that can be viewed at your link is a 404 error page with some prose about what you could do to find what you where looking for. That's why it's better to explain things here. (By the way, the page still exists (currently), try without trailing slash: http://antrikshy.com/blog/fetch-xml-url-convert-to-json-nodejs) – jox Nov 01 '16 at 21:39
1

I recommend using the library request (note: the request package has since been deprecated); the code is quite simple. You can also use the library cheerio along with it if you need to extract information from the result for further processing.

var request = require('request');
const cheerio = require('cheerio');

request('https://tsdrapi.uspto.gov/ts/cd/casestatus/sn78787878/info.xml', function (error, response, html) {
    if (!error && response.statusCode == 200) {
        // xmlMode tells cheerio to parse the document as XML rather than HTML
        var $ = cheerio.load(html, {
            xmlMode: true
        });
        const nodes = $('div'); // Just an example
    }
});
Bravo Yeung
0
var Request = require("request");
const express = require("express");
const app = express();
app.use(express.urlencoded({extended: true}));
app.use(express.text());

app.post("/api/getXML", (req, res) => {
    Request.post({
        "headers": { "content-type": "text/plain; charset=utf-8"},
        "url": "<url which return xml response>",
        "body": req.body
    }, (error, response, body) => {
        if(error) {
            console.error(error);
            return res.send(error);
        }
        console.log("XML body:", body);
        res.send(body);
    });
});

The idea came from https://www.thepolyglotdeveloper.com/2017/10/consume-remote-api-data-nodejs-application/

Alwin Jose