17

Let's say I have the URL http://java.sun.com/j2se/1.5/pdf and I want to get a list of all files and directories under the pdf directory.

I'm using Java 5.

I can get the directory listing with this program, http://www.httrack.com/, but I don't know whether it is possible in Java.

Does anybody know how to do this in Java? Or, if Java can't, how does this program do the job?

Lii
itro
  • Do you know what kind of HTTP requests you would make to obtain them? Do you know whether the target HTTP server even supports directory listing? – wrschneider Jul 19 '12 at 13:13
  • I don't have much information about the server or the HTTP requests; the only thing I have is a URL like the one mentioned above. – itro Jul 20 '12 at 08:41

2 Answers

23

There are some conditions:

  1. The server must have directory listing enabled for you to see the directory's contents.
  2. There is no way I know of (no API or HTTP verb) to retrieve the listing directly, so the listing is generally served as a normal HTML page.
  3. You will have to parse this HTML page in order to find the entries.

The parsing can be done easily with a library such as JSoup.

For example, using JSoup you can list the entries at the URL http://howto.unixdev.net/ like this:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Sample {
    public static void main(String[] args) throws IOException {
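        // Fetch the directory-listing page and parse it into a DOM.
        Document doc = Jsoup.connect("http://howto.unixdev.net").get();
        // This selector is specific to this site's listing markup; adjust it for other servers.
        for (Element file : doc.select("td.right td a")) {
            System.out.println(file.attr("href"));
        }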
    }
}

This outputs:

beignets.html
beignets.pdf
bsd-pam-ldap.html
ddns-updates.html
Debian_on_HP_dv6z.html
dextop-slackware.html
dirlist.html
downloads/
ldif/
Linux-SharePoint.html
rhfc3-apt.html
rhfc3-apt.tar.bz2
SUNWdsee-Debian.html
SUNWdtdte-b69.html
SUNWdtdte-b69.tar.bz2
tcshrc.html
Test_LVM_Trim_Ext4.html
Tru64-CS20-HOWTO.html

As for your sample URL, http://java.sun.com/j2se/1.5/pdf returns a "page not found" error, so I think you're out of luck.

Alex
  • I can get the directory listing with this program: http://www.httrack.com/. I think there must be a way to do it with Java too. – itro Jul 20 '12 at 08:35
7

If the URL uses the file: protocol, you can convert it to a java.io.File and use that class's methods, such as listFiles(), to list the directory.
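A minimal sketch of the file: case, assuming the URL points at a local directory. The class name FileUrlListing and the method list are hypothetical names chosen for illustration, and the code sticks to APIs available in Java 5.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class FileUrlListing {
    // Returns the names of the entries directly under the directory that the
    // file: URL points to. Directories get a trailing "/" appended.
    public static List<String> list(URL url) throws URISyntaxException {
        if (!"file".equals(url.getProtocol())) {
            throw new IllegalArgumentException("Not a file: URL: " + url);
        }
        File dir = new File(url.toURI());
        // listFiles() returns null if the path is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        List<String> names = new ArrayList<String>();
        if (entries != null) {
            for (File entry : entries) {
                names.add(entry.isDirectory() ? entry.getName() + "/" : entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Example: list the JVM's temporary directory through a file: URL.
        URL url = new File(System.getProperty("java.io.tmpdir")).toURI().toURL();
        for (String name : list(url)) {
            System.out.println(name);
        }
    }
}
```

The order of the returned names is whatever listFiles() yields, which is not guaranteed to be sorted.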

If the URL is for the http: protocol, then there is no concept of directories of files, and you fundamentally cannot do what you think you want to do. You will have to step back and look at the higher-level requirement you are trying to fulfill.

If you control the server, deploy a Servlet that retrieves the list of files in the folder named by the request it receives. On the client side, your application sends the server a request containing the path (virtual or relative) you want to list. The servlet reads the list of files in that path from the server's file system and sends it back to the client for further processing.
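As a rough sketch of the server-side half of that idea, leaving out the servlet plumbing: the hypothetical class DirectoryListingService below resolves a requested relative path against a base directory, refuses paths that escape it, and returns the entry names one per line. A real servlet would call something like this from its request handler and write the result to the response; the names and the response format here are assumptions, not part of any servlet API.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

class DirectoryListingService {
    private final File baseDir;

    public DirectoryListingService(File baseDir) throws IOException {
        // Canonicalize once so later prefix checks compare like with like.
        this.baseDir = baseDir.getCanonicalFile();
    }

    // Returns the sorted entry names, one per line, or null if the requested
    // path lies outside the base directory or is not a readable directory.
    public String list(String relativePath) throws IOException {
        File target = new File(baseDir, relativePath).getCanonicalFile();
        String basePath = baseDir.getPath();
        String targetPath = target.getPath();
        boolean inside = targetPath.equals(basePath)
                || targetPath.startsWith(basePath + File.separator);
        if (!inside) {
            return null; // e.g. "../.." style requests
        }
        String[] names = target.list();
        if (names == null) {
            return null; // not a directory, or an I/O error occurred
        }
        Arrays.sort(names);
        StringBuilder body = new StringBuilder();
        for (String name : names) {
            body.append(name).append('\n');
        }
        return body.toString();
    }
}
```

The canonical-path check matters because the path comes from the client; without it, a request could list directories outside the one you meant to expose.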

If you only have HTTP access to the server, fetch the HTML page that shows the directory listing and parse it, for example with a regular expression, to extract the file names.
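A minimal sketch of the regular-expression approach, assuming the listing page has already been downloaded into a string and that entries appear as double-quoted href attributes on anchor tags, as in a typical Apache-style index page. The class name ListingParser and the filtering rules are assumptions for illustration; real listing pages vary, and an HTML parser is more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ListingParser {
    // Matches <a ... href="..."> and captures the double-quoted href value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the relative hrefs found in the page, in document order.
    public static List<String> extractEntries(String html) {
        List<String> entries = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Skip links that are not directory entries: empty values,
            // column-sort links ("?C=N;O=D"), parent or absolute paths,
            // and links to other sites.
            if (href.length() == 0 || href.startsWith("?") || href.startsWith("/")
                    || href.startsWith("../") || href.contains("://")) {
                continue;
            }
            entries.add(href);
        }
        return entries;
    }

    public static void main(String[] args) {
        String html = "<a href=\"?C=N;O=D\">Name</a>"
                + "<a href=\"/\">Parent Directory</a>"
                + "<a href=\"guide.pdf\">guide.pdf</a>"
                + "<a href=\"docs/\">docs/</a>";
        for (String entry : extractEntries(html)) {
            System.out.println(entry);
        }
    }
}
```

Entries ending in "/" are subdirectories, so the same extraction can be repeated on each of them to walk the tree.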

user207421
GingerHead