I am trying to get data from the Bing search API. Since the existing libraries seem to be based on old, discontinued APIs, I thought I'd try it myself using the request library, which appears to be the most common library for this. My code looks like this:

var SKEY           =  "myKey...." , 
    ServiceRootURL =  'https://api.datamarket.azure.com/Bing/Search/v1/Composite';

function getBingData(query, top, skip, cb) {
    var params = {
         Sources: "'web'", 
         Query: "'"+query+"'", 
         '$format': "JSON", 
         '$top': top, '$skip': skip
       },
       req = request.get(ServiceRootURL).auth(SKEY, SKEY, false).qs(params);
    request(req, cb)
}

getBingData("bookline.hu", 50, 0, someCallbackWhichParsesTheBody)

Bing returns some JSON and I can work with it some of the time, but if the response body contains a large number of non-ASCII characters, JSON.parse complains that the string is malformed. I tried switching to an ATOM content type, but there was no difference: the XML was invalid too. Inspecting the response body as available in the request() callback shows that the text is already corrupted at that point.

So I tried the same request with some python code, and that appears to work fine all the time. For reference:

r = requests.get(
       'https://api.datamarket.azure.com/Bing/Search/v1/Composite?Sources=%27web%27&Query=%27sexy%20cosplay%20girls%27&$format=json', 
        auth=HTTPBasicAuth(SKEY,SKEY))
stuffWithResponse(r.json())

I am unable to reproduce the problem with smaller responses (e.g. by limiting the number of results), and I am unable to identify a single result that causes the issue (by stepping up the offset). My impression is that the response gets read in chunks, transcoded somehow and reassembled incorrectly, so the JSON/ATOM data becomes invalid when a multibyte character gets split across chunks. That would happen on larger responses but not on small ones.
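To illustrate the suspicion above, here is a minimal sketch of what happens when a multibyte UTF-8 character is split across two chunks and each chunk is decoded on its own. It only uses Node's built-in `Buffer` and `string_decoder`, and assumes a current Node version (for `Buffer.from`). Whether request 2.14.0 actually did this internally is not confirmed; the sample string and the split point are made up for the demonstration.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// "\u00e9" (é) is two bytes in UTF-8: 0xC3 0xA9. Cut the buffer between them.
var original = '{"name": "fee\u00e9"}';
var whole = Buffer.from(original, 'utf8');
var cut = whole.indexOf(0xa9); // index of the second byte of "é"
var chunks = [whole.slice(0, cut), whole.slice(cut)];

// Decoding each chunk separately replaces the split character with U+FFFD.
var naive = chunks.map(function (c) { return c.toString('utf8'); }).join('');

// Joining the raw bytes first and decoding once keeps the character intact.
var joined = Buffer.concat(chunks).toString('utf8');

// A StringDecoder holds back incomplete byte sequences between writes.
var decoder = new StringDecoder('utf8');
var streamed = chunks.map(function (c) { return decoder.write(c); }).join('') + decoder.end();

console.log(naive === original);    // false
console.log(joined === original);   // true
console.log(streamed === original); // true
```

The point of the sketch is only that per-chunk decoding is lossy while decoding after concatenation is not.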

Being new to node, I am not sure if there is something I should be doing (setting the encoding somewhere? Bing returns UTF-8, so this doesn't seem needed).

Does anyone have any idea what is going on?

FWIW, I'm on OSX 10.8, node is v0.8.20 installed via macports, request is v2.14.0 installed via npm.

riffraff

3 Answers

1

I'm not sure about the request library, but the default Node.js one works well for me. It also seems a lot easier to read than your library, and the response does indeed come back in chunks.

http://nodejs.org/api/http.html#http_http_request_options_callback or for https (like your req) http://nodejs.org/api/https.html#https_https_request_options_callback (the same really though)

A little tip for the options: use `url.parse`.

var url = require('url');

var params = '{}'

var dataURL = url.parse(ServiceRootURL);
var post_options = {  
    hostname: dataURL.hostname,
    port: dataURL.port || 443, // ServiceRootURL is https, so default to 443 rather than 80
    path: dataURL.path,
    method: 'GET',  
    headers: {  
        'Content-Type': 'application/json; charset=utf-8',  
        'Content-Length': params.length  
    }  
};

Obviously, params needs to be the data you want to send.
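Putting the pieces together, a request with the built-in `https` module might look roughly like the sketch below. The helper names `buildOptions` and `collectBody` are hypothetical, made up for illustration; `SKEY`, `ServiceRootURL` and the parameter names are copied from the question. The `auth` option is the standard `'user:password'` form accepted by `https.request`. The body is collected as raw Buffers and decoded once at the end, so a multibyte character split across chunks should not matter.

```javascript
var https = require('https');
var querystring = require('querystring');
var url = require('url');

var SKEY = 'myKey....'; // placeholder, as in the question
var ServiceRootURL = 'https://api.datamarket.azure.com/Bing/Search/v1/Composite';

// Hypothetical helper: https.request options for a GET with Basic auth.
function buildOptions(rootUrl, key, params) {
    var parsed = url.parse(rootUrl);
    return {
        hostname: parsed.hostname,
        port: parsed.port || 443,
        path: parsed.pathname + '?' + querystring.stringify(params),
        method: 'GET',
        auth: key + ':' + key // the key is used as both user and password
    };
}

// Hypothetical helper: gather raw chunks, decode once when the stream ends.
function collectBody(res, cb) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
        cb(null, Buffer.concat(chunks).toString('utf8'));
    });
    res.on('error', cb);
}

function getBingData(query, top, skip, cb) {
    var params = {
        Sources: "'web'",
        Query: "'" + query + "'",
        '$format': 'JSON',
        '$top': top,
        '$skip': skip
    };
    var req = https.request(buildOptions(ServiceRootURL, SKEY, params), function (res) {
        collectBody(res, function (err, body) {
            if (err) { return cb(err); }
            var data;
            try { data = JSON.parse(body); } catch (e) { return cb(e); }
            cb(null, data);
        });
    });
    req.on('error', cb);
    req.end();
}
```

Usage would be something like `getBingData("bookline.hu", 50, 0, function (err, data) { ... })`. Since this is a GET with no body, no `Content-Length` header is set here.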

rob_james
  • TBH I had tried to do it this way (though using `https.get` rather than `.request`) too but I couldn't get it to work, I must have got something wrong. Anyway it appears to work now, so I'll accept your answer anyway if someone doesn't provide a fix for using the `request` module. Thanks! – riffraff Feb 22 '13 at 16:03
  • 1
    It may have more to do with the fact that the JSON is in fact malformed. If you have a string with a multibyte character in it and you pass the `Content-Length` as `params.length`, then you're saying the content has the same byte length as the number of characters in the string. This isn't true with multibyte characters. Instead of `{"name": "feeé"}`, your api is probably getting `{"name": "feeé"` – amsross Mar 11 '16 at 22:32
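The difference between character count and byte count described in the comment above can be checked directly. A minimal sketch, using the comment's own example string and Node's built-in `Buffer.byteLength`; the `headers` object mirrors the one in the answer and is only illustrative:

```javascript
var body = '{"name": "fee\u00e9"}'; // same string as in the comment, with é escaped

console.log(body.length);                     // 16 (characters)
console.log(Buffer.byteLength(body, 'utf8')); // 17 (bytes, "é" takes two)

// Content-Length must be the byte length, not the character count.
var headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'Content-Length': Buffer.byteLength(body, 'utf8')
};
```

With `body.length` as the `Content-Length`, the receiver would stop reading one byte early, which matches the truncated JSON shown in the comment.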
0

I think your request authentication is incorrect. Authentication has to be provided before request.get; see the request documentation on HTTP authentication. qs is an object that has to be passed in the request options, just like url and auth. You are also reusing the same req for a second request. Note that request.get returns a stream for the GET of the given URL, so your next request using req will go wrong.

If you only need HTTP Basic auth, this should also work:

//remove req = request.get and subsequent request
request.get('http://some.server.com/', {
  'auth': {
    'user': 'username',
    'pass': 'password',
    'sendImmediately': false
  }
 },function (error, response, body) {
});

The callback argument gets 3 arguments. The first is an error when applicable (usually from the http.Client option not the http.ClientRequest object). The second is an http.ClientResponse object. The third is the response body String or Buffer. The second object is the response stream. To use it you must use events 'data', 'end', 'error' and 'close'.

Be sure to use the arguments correctly.

user568109
  • no, the fluent syntax sets the options correctly, it's just quite poorly documented. My problem is not authentication, I can see that it works and I get an authenticated response. My issue is the mangled response body. – riffraff Feb 22 '13 at 22:31
0

You have to pass the option {json:true} to enable JSON parsing of the response.

Duane Fields
  • the problem is encoding, not format, if you read the question I also tried with ATOM. But the question is 18 months old, so hopefully they fixed it. – riffraff Sep 19 '14 at 06:17