I've been going through these jsoup docs to get some information from a div:

http://jsoup.org/cookbook/extracting-data/dom-navigation

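// Fetch the page and look up the element with id "category_1".
Document doc = Jsoup.connect(path).get();
Element cat = doc.getElementById("category_1");
// Collect every <a> inside that element.
Elements links = cat.getElementsByTag("a");
for (Element link : links) {
    // Append each link's href and visible text to rstring, one link per line.
    rstring += link.attr("href");
    rstring += link.text() + "\n";
}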

The code I wrote does not work, and I've been working on this for hours.

I can get some of what I want with other jsoup functions, but I need to get the links in this particular section so I can populate an array of certain things for my Android app.

I'm attempting to parse http://android.myfewclicks.com for testing while I build an app for my real site.

Any assistance at all would be wonderful. jsoup just won't cooperate.

    <table class="table_list">
        <tbody class="header" id="category_1">
            <tr>
                <td colspan="4">
                    <div class="cat_bar">
                        <h3 class="catbg">
                            <a class="collapse" href="http://android.myfewclicks.com/index.php?action=collapse;c=1;sa=collapse;c707bdb315=de9d7f201a0964cbab3d56e683507ad7#c1"><img src="http://android.myfewclicks.com/Themes/default/images/collapse.gif" alt="-" /></a>
                            <a class="unreadlink" href="http://android.myfewclicks.com/index.php?action=unread;c=1">Unread Posts</a>
                            <a id="c1"></a><a href="http://android.myfewclicks.com/index.php?action=collapse;c=1;sa=collapse;c707bdb315=de9d7f201a0964cbab3d56e683507ad7#c1">Category A</a>
                        </h3>
                    </div>
                </td>
            </tr>
        </tbody>

On my test forum there are four categories. The three links inside this particular part are one set of the four. If I can figure out how to adequately parse these out, I should be able to make a big leap on my app. But jsoup isn't behaving the way I think it should, or I'm missing something very crucial.

BalusC
texasman1979
  • could you post the exact HTML snippet that you're trying to parse? – MarcoS Sep 29 '11 at 09:43
  • It is the source of my test site: http://android.myfewclicks.com. – texasman1979 Sep 29 '11 at 10:11
  • The basic rundown is that I'm making an app for the main site, but I can't get jsoup to act right at all. – texasman1979 Sep 29 '11 at 10:14
  • let me rephrase the question: could you please post a **minimal** snippet of HTML that demonstrates the problem that you're trying to solve with jsoup? (the source of [http://android.myfewclicks.com/](http://android.myfewclicks.com/) is a bit too long :) ) – MarcoS Sep 29 '11 at 12:35
  • I updated the question above. I wish there was just a way for me to maintain the session and parse the document the way an XML parser parses an XML file. Any help will be appreciated. – texasman1979 Sep 29 '11 at 15:47

1 Answer

You apparently need to log in first in order to get the links with an href. When I open the site in my browser while not logged in, I see

<tbody class="header" id="category_1">
    <tr>
        <td colspan="4">
            <div class="cat_bar">
                <h3 class="catbg">
                    <a id="c1"></a>Category A
                </h3>
            </div>
        </td>
    </tr>
</tbody>

I can get the links as follows:

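Document document = Jsoup.connect("http://android.myfewclicks.com/").get();
// CSS selector: every <a> that is a descendant of the element with id "category_1".
Elements category1links = document.select("#category_1 a");

for (Element category1link : category1links) {
    // Print the current element, not the whole collection.
    System.out.println(category1link);
}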

Which prints

<a id="c1"></a>

Note that there's no href or text!

Jsoup does not log in for you automatically, nor does it take over the cookies of an arbitrary browser which is already installed on your machine. You need to log in and maintain the session cookie yourself. See also "Sending POST request with username and password and save session cookie" for an example.
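
To make "maintain the session cookie yourself" a bit more concrete, here is a minimal sketch using only the JDK, so it runs without a network or a login. It turns the `Set-Cookie` headers of a login response into a name/value map and then builds the `Cookie` header to send with later requests. The class name `SessionCookies`, its two helper methods and the cookie names and values are hypothetical placeholders, not anything the site is known to send. With jsoup you would normally not parse headers by hand: `Connection.Response.cookies()` returns a map of this shape, and `Connection.cookies(Map)` sends it back on the next request.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
class SessionCookies {

    // Collect cookie names and values from raw Set-Cookie header values.
    // Attributes such as path or expires are parsed but not kept.
    static Map<String, String> fromSetCookieHeaders(List<String> setCookieHeaders) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                cookies.put(cookie.getName(), cookie.getValue());
            }
        }
        return cookies;
    }

    // Build the value of the Cookie request header for follow-up requests.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joined = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joined.add(entry.getKey() + "=" + entry.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Placeholder headers, as a login response might send them (assumption).
        List<String> headers = Arrays.asList(
                "PHPSESSID=abc123; path=/",
                "theme=dark; path=/");
        Map<String, String> cookies = fromSetCookieHeaders(headers);
        System.out.println(toCookieHeader(cookies));
    }
}
```

The point is only that the session lives in those name/value pairs: whatever the login response sets has to be sent back on every later request, otherwise the server treats you as a guest again and you get the guest version of the HTML.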

BalusC
  • I really appreciate your help. I'm logged in on the browser, and I never would have thought that the site would hide the links like that. I'm not sure how the site works for guests. But that raises a question: how could I get the important stuff with the app running as a guest? The reason is that, if the app works out, I'm going to publish it for all SMF forums. After looking at it, though, I may have answered my own question: guest browsing doesn't require as much stuff to navigate. – texasman1979 Sep 29 '11 at 16:36
  • Any other input that you might have on this project, I would gladly appreciate. :) Thanks again. – texasman1979 Sep 29 '11 at 16:37
  • Sorry, I'm not familiar with the forum engine your site is using. This is also not really programming related. Consider asking a question on the support site of the forum engine's company. – BalusC Sep 29 '11 at 16:42