-1

I am trying to parse a Hebrew RSS feed like this one: http://rss.walla.co.il/?w=/3/0/12/@rss.e

I am using feedparser and request. The problem is that the feed is encoded as windows-1255, not UTF-8,

so the text comes out like ����� ������� instead of regular Hebrew text.

I tried a few converters (such as iconv-lite), but without success.

This is my code:

function getAll(URL) {
    var request = require('request');
    request(URL, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            var allXML = body.substring(body.indexOf('<title>') + ('<title>').length, body.indexOf('</title>'));
            var text = iconv.decode(new Buffer(allXML), 'win1255');
            console.log("text = ", text);
        }
    });
}

and this is what it prints: text = ן¿½ן¿½ן¿½ן¿½ן¿½! ן¿½ן¿½ן¿½ן¿½ן¿½ - ן¿½ן¿½ן¿½ן¿½ן¿½

nir

2 Answers

0

You can use a module such as iconv or iconv-lite to convert between encodings, since Node currently supports only utf8, utf16le, latin1/binary, ascii, hex, and base64 natively.
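
A likely reason the question's code still prints garbage even with iconv-lite: by default request hands the callback a body that has already been decoded as UTF-8, so the original windows-1255 bytes are gone before the conversion runs. The sketch below reproduces that effect with plain Buffers. `decodeWin1255Subset` is a hypothetical stand-in that covers only ASCII and the Hebrew letters, not the full windows-1255 table, and the sample bytes are an assumed example rather than data from the feed.

```javascript
// Hypothetical helper, NOT a full windows-1255 table: it maps the Hebrew
// letters (0xE0-0xFA -> U+05D0-U+05EA) and treats every other byte as
// Latin-1, which happens to be correct for ASCII, 0xBD (½) and 0xBF (¿).
function decodeWin1255Subset(buf) {
    var out = '';
    for (var i = 0; i < buf.length; i++) {
        var b = buf[i];
        if (b >= 0xE0 && b <= 0xFA) {
            out += String.fromCharCode(0x05D0 + (b - 0xE0));
        } else {
            out += String.fromCharCode(b);
        }
    }
    return out;
}

// The word "שלום" as it would arrive on the wire in windows-1255.
var wire = Buffer.from([0xF9, 0xEC, 0xE5, 0xED]);

// Decoding the raw bytes directly recovers the Hebrew text.
var good = decodeWin1255Subset(wire);

// Going through a UTF-8 string first replaces every invalid byte with
// U+FFFD, which is re-encoded as the three bytes EF BF BD.
var asString = wire.toString('utf8');
var bad = decodeWin1255Subset(Buffer.from(asString, 'utf8'));

console.log(good);
console.log(bad);
```

The second line printed has the same ן¿½ pattern as the output in the question. With request, passing `encoding: null` in the options makes `body` a Buffer, which can then be given to `iconv.decode` directly.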

mscdex
0

I used this source: github

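var http = require('http');
var iconv = require('iconv-lite');

function getText(URL) {
    http.get(URL, function(res) {
        // Collect the raw Buffers; do not turn them into strings yet.
        var chunks = [];
        res.on('data', function(chunk) {
            chunks.push(chunk);
        });
        res.on('end', function() {
            // Decode the complete body from windows-1255 in one step.
            var decodedBody = iconv.decode(Buffer.concat(chunks), 'win1255');
            console.log(decodedBody);
        });
    });
}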
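
As a possible alternative to iconv-lite, the same "collect Buffers, decode once" step can be sketched with the built-in `TextDecoder`. This swaps in a different technique from the one in the answer and assumes a Node build that ships full ICU data; on builds without it, constructing the decoder for 'windows-1255' throws. `decodeChunks` and the sample chunks are hypothetical names and data used only for illustration.

```javascript
// Hypothetical helper: join the raw chunks and decode them in one step.
// Assumes the runtime's TextDecoder supports the requested encoding.
function decodeChunks(chunks, encoding) {
    var decoder = new TextDecoder(encoding);
    return decoder.decode(Buffer.concat(chunks));
}

// Two chunks that split the word "שלום" in the middle, as a stream might.
var chunks = [Buffer.from([0xF9, 0xEC]), Buffer.from([0xE5, 0xED])];
console.log(decodeChunks(chunks, 'windows-1255'));
```

Concatenating before decoding matters because a chunk boundary can fall anywhere in the byte stream; decoding the whole body at once avoids having to track partial input between chunks.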

nir