
I am new to CsQuery and I am having trouble scraping HTML like the following:

<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">1 pound</span>
    <span id="Name" class="ingredient-name">sweet Italian Sausage</span>
</li>
<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">3/4 pound</span>
    <span id="Name" class="ingredient-name">lean ground beef</span>
</li>

I want to extract the text inside the span tags and format it as follows:

1 pound sweet Italian sausage
3/4 pound lean ground beef

This is my code:

for (int i = 0; i < dom.Select("#Ingredient").Length; ++i) {
    // Write (not WriteLine) so the amount and name end up on the same line
    if (dom.Select("#Ingredient span#Amount")[i] != null)
        Console.Write(dom.Select("#Ingredient span#Amount")[i].InnerHTML + " ");
    if (dom.Select("#Ingredient span#Name")[i] != null)
        Console.WriteLine(dom.Select("#Ingredient span#Name")[i].InnerHTML);
    Console.WriteLine(Environment.NewLine);
}

It works fine with the HTML above, but the problem arises when one of the spans is missing. For example, if <span id="Name" class="ingredient-name">sweet Italian Sausage</span> were missing from the HTML, my code would return:

1 pound lean ground beef
3/4 pound

As you can see, "lean ground beef" moved up a line. I want it to stay with "3/4 pound" no matter what, and "1 pound" can stay alone. How can I do that? I have tried a lot of approaches, but none of them worked. I want to do something like this: for each "#Ingredient", write its "#Amount" if it exists and its "#Name" if it exists, without picking up anything from another Ingredient.

wingerse
  • the html is invalid anyways. duplicate ids are not permitted, and your loop would only ever return the FIRST element with the matching id. if you work around that, then why even deal with the spans? Get the ingredient divs and extract the entirety of their innerText, which could give you '1 pound sweet italian sausage' already. – Marc B May 22 '15 at 19:53
  • "and your loop would only ever return the FIRST element with the matching id" No. `dom.Select("#Ingredient span#Amount")` returns all elements with matching ids and I am using `[i]` to get the correct element. – wingerse May 22 '15 at 20:02
  • I will try the second suggestion. Thanks – wingerse May 22 '15 at 20:03
  • then c# is doing it wrong. a DOM id must be unique throughout the document. since it HAS to be unique, an id select should only ever return one element. – Marc B May 22 '15 at 20:03
  • But @MarcB, how can I extract "1 pound sweet italian sausage" from `1 pound sweet Italian Sausage`? I have tried a lot of ways but none worked. – wingerse May 23 '15 at 15:14
  • Nevermind, I created a CQ object from the innerHTML of the "#ingredient" li – wingerse May 23 '15 at 15:40
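
The approach described in the last comment might look roughly like the sketch below. It is only an illustration under a few assumptions: it relies on `CQ.Create`, `Select`, and `Text()` from CsQuery, it assumes (as the question reports) that `dom.Select("#Ingredient")` returns every matching `li` despite the duplicate ids, and it looks up the spans by class rather than by id, which is a choice made here and not something stated in the thread. The sample markup is the question's HTML with the first name span removed.

```csharp
using System;
using CsQuery;

class Program
{
    static void Main()
    {
        // Markup from the question, with the name span missing from the first item
        string html = @"
<ul>
  <li id=""Ingredient"">
    <span id=""Amount"" class=""ingredient-amount"">1 pound</span>
  </li>
  <li id=""Ingredient"">
    <span id=""Amount"" class=""ingredient-amount"">3/4 pound</span>
    <span id=""Name"" class=""ingredient-name"">lean ground beef</span>
  </li>
</ul>";

        CQ dom = CQ.Create(html);

        // Iterate over the list items themselves instead of indexing
        // into two separate document-wide span selections.
        foreach (IDomObject li in dom.Select("#Ingredient"))
        {
            // Build a CQ object from this item's inner HTML so that the
            // span lookups below cannot match spans of another item.
            CQ item = CQ.Create(li.InnerHTML);

            // Assumption: selecting by class sidesteps the duplicate ids.
            string amount = item.Select("span.ingredient-amount").Text().Trim();
            string name = item.Select("span.ingredient-name").Text().Trim();

            // Join whichever parts are present; Trim drops the stray
            // space when one of the two spans is missing.
            Console.WriteLine((amount + " " + name).Trim());
        }
    }
}
```

Because each item is queried on its own, a missing span only affects the line for that item, so "lean ground beef" should stay on the "3/4 pound" line.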
