0

I have an HTML document that I would like to parse, and I am trying to use cheerio to do it.

<ul data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0">
    <li class="_1ht1 _1ht2" data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz">
        .
        .
        .
        .
        <span data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0">
            My Random Text
        </span>
    </li>
</ul>

From my HTML, I am trying to extract the first instance of the ul tag with data-reactid=".0.1.0.0.1.1.0.0.0.0.1.0".

Within the very first li tag of that ul, I want to extract the user (xyz in this case). After that, I want to find the text inside the span shown in the markup above.

Through Cheerio I tried the following:

var cheerio = require('cheerio'), 
fs = require('fs'); 

fs.readFile('index.html', 'utf8', dataLoaded);

function dataLoaded(err, data) {
    $ = cheerio.load(data);
    console.log("Trying out " + JSON.stringify($("<ul data-reactid=\".0.1.0.0.1.1.0.0.0.0.1.0\">").data()));
}   

It prints:

Trying out {"reactid":".0.1.0.0.1.1.0.0.0.0.1.0"}

How do I get the value inside the HTML?

Note: xyz is dynamic and will change.

user1692342

3 Answers

1

I think this will work for you, if I understood your question correctly:

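var myDataReactId = '.0.1.0.0.1.1.0.0.0.0.1.0';
// First li inside the ul with the given data-reactid
var firstLi = $("ul[data-reactid='" + myDataReactId + "'] li")[0];
// Full data-reactid of that li, e.g. ".0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz"
var user = $(firstLi).data('reactid');
// Text of the span(s) inside that li whose data-reactid contains the li's id
var text = $(firstLi).find("span[data-reactid*='" + user + "']").text();
console.log(text);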
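In the snippet above, `user` ends up holding the li's full data-reactid rather than the bare user name. A minimal sketch of pulling out just the name follows. `extractUser` is a hypothetical helper, not part of cheerio, and it assumes the name comes right after `$user=` and contains no dots:

```javascript
// Hypothetical helper: pull the user name out of a data-reactid value.
// Assumption: the name follows "$user=" and ends at the next "." or at the
// end of the string, so a user name containing "." would be cut short.
function extractUser(reactId) {
    var match = /\$user=([^.]+)/.exec(reactId);
    return match ? match[1] : null; // null when no "$user=" segment is present
}

// The li's data-reactid from the question
var liReactId = '.0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz';
console.log(extractUser(liReactId)); // prints "xyz"
```

The same function should also work on the span's longer id, since the regular expression stops at the first dot after the name.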
Patel
  • You are almost right! Could you explain what data-reactid* does? What is the significance of the *? I think it is currently searching for any span tag whose data-reactid contains the user variable. Am I right? – user1692342 Jul 18 '15 at 10:17
  • 1
    @user1692342 Yes, it's searching for a `span` tag (whose data-reactid *contains* the user variable) *inside* that *specific* `li`. – Patel Jul 18 '15 at 10:20
  • thanks! I have got it working now!! :) This is working in my node js module, however I am facing an issue with nodewebkit. Could you check this question of mine as well :) http://stackoverflow.com/questions/31489279/uncaught-error-cannot-find-module-cheerio-nodewebkit – user1692342 Jul 18 '15 at 10:22
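
On the `*=` operator discussed in these comments: in CSS attribute selectors, `*=` matches a substring, `^=` a prefix, and `$=` a suffix. In the question's markup, the span's data-reactid begins with the li's data-reactid. If other spans in the same li follow the same pattern (an assumption), `*=` with the li's id would match all of them, and `.text()` would return their combined text. The sketch below models the three operators on plain strings, with no cheerio involved. `attrMatches` and the sibling span id are made up for illustration.

```javascript
// Plain-string model of three CSS attribute operators:
//   [attr*="v"] substring, [attr^="v"] prefix, [attr$="v"] suffix.
// attrMatches is a hypothetical helper, not a cheerio API.
function attrMatches(value, op, needle) {
    if (needle === '') return false; // an empty value matches nothing in CSS
    if (op === '*=') return value.indexOf(needle) !== -1;
    if (op === '^=') return value.indexOf(needle) === 0;
    if (op === '$=') return value.slice(-needle.length) === needle;
    throw new Error('unsupported operator: ' + op);
}

var liId = '.0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz';
var targetSuffix = '.0.0.$right.0.0.1.$left.0.1:0'; // suffix of the question's span id
var spanIds = [
    liId + '.0.0.$left',  // made-up sibling span, for illustration only
    liId + targetSuffix   // the span holding "My Random Text"
];

spanIds.forEach(function (id) {
    // Columns: id, matched by *= with the li id, matched by $= with the suffix
    console.log(id, attrMatches(id, '*=', liId), attrMatches(id, '$=', targetSuffix));
});
```

If the part after the user name is stable across users (again an assumption), a suffix selector of the form `span[data-reactid$="<suffix>"]` scoped to the li might narrow the match to the one span.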
0

Try this. Basically, it turns your HTML into something jQuery can work with and then finds the unordered list; of course, you can make the find more specific. Using .data(), it extracts the value of the data-reactid attribute.

reactid = $($(data).find('ul>li>span')).data('reactid');
Chol Nhial
  • With this I will get the value .0.1.0.0.1.1.0.0.0.0.1.0, which is not what I asked. I already know how to get that value. I am trying to find value in the span tag under the ul tag. I tried this console.log($(data).find('li._1ht1 _1ht2').data('reactid')); & it shows as undefined – user1692342 Jul 18 '15 at 09:19
  • Check the edit I've made; I'm using the child selector to locate the `span`. Hopefully it works. – Chol Nhial Jul 18 '15 at 09:28
  • Maybe my question phrasing is wrong and you were not able to get it. I am not looking for the first span inside the ul class. From the first ul class, I want to look at the first li tag, extract the data-reactid & then search for the text inside of the span class with the data-reactid received from the li tag – user1692342 Jul 18 '15 at 09:31
0

The problem with my first answer is that I didn't actually find the element you would like to extract the reactid from. With some JS fiddling I was able to put together something that resembles your scenario. Notice that in the fiddle I use .html(). Without further ado, here we go: http://jsfiddle.net/0r5k9egu/. Run the fiddle and in the console you should see .0.1.0.0.1.1.0.0.0.0.1.0.1:$user=xyz.0.0.$right.0.0.1.$left.0.1:0

Chol Nhial
  • Yes, I can see that output. Is it possible to get the value inside the span, which is "My Random Text" in this case? Could you briefly explain how your code works? I am not able to follow it. – user1692342 Jul 18 '15 at 09:52
  • I tried this with my HTML file and was not able to get it working. My HTML file has a lot more span tags inside the li tag! There are also multiple ul tags before the one I mentioned. I need to search in the ul tag with that specific data-reactid. – user1692342 Jul 18 '15 at 09:55