Android - How to extract HTML from a FTP Website

Question

I have looked everywhere but cannot find a solution for my particular case.

I have a website that is essentially just a directory listing of a bunch of files (directory listing is enabled on the server). The website URL scheme is 'ftp://'. All I want to do is extract the HTML so that I can get the names and URLs of the files within the directory. I have tried the following code (sorry, I can't post the actual FTP URL):

String ftpURL = "ftp://blah.com";
URL url = new URL(ftpURL);
URLConnection urlc = url.openConnection();

// open the stream and wrap it in a BufferedInputStream
BufferedInputStream bis = new BufferedInputStream(urlc.getInputStream()); // ERROR HERE

int inputLine;
StringBuilder outputHtml = new StringBuilder();

while ((inputLine = bis.read()) != -1) {
    outputHtml.append((char) inputLine); // append the character, not its numeric value
}

bis.close();

When I run this code I get this error on the 4th line of code:

java.io.IOException: Unable to connect to server: Unable to retrieve file: 550

EDIT: If extracting the HTML from the FTP site isn't a possibility, how would I go about getting a list of the names and URLs of each file in the directory specified in the FTP URL? Also, I should note that I can access the FTP site publicly and can view all sub files and directories without any authentication required.

Any ideas? Thank you!

What makes you think an `ftp://` link returns HTML? The FTP protocol is separate and distinct from HTTP. Anything you see in a browser when browsing an `ftp://` link is the browser talking to the FTP server and then rendering the results internally as HTML. No HTML is sent over the wire. — Jim Garrison, Aug 23 '16 at 05:18
Try using an FTP client like http://commons.apache.org/proper/commons-net/ — Diego Torres Milano, Aug 23 '16 at 05:23
@JimGarrison: you can transfer any kind of data with FTP, i.e. images, programs .. and also HTML files. The contents of the data (image, HTML...) is not related to the protocol (HTTP, FTP, ...) — Steffen Ullrich, Aug 23 '16 at 05:23
Not with the JDK libraries. There are extended libraries like the Apache Commons FTPClient. — Jim Garrison, Aug 23 '16 at 05:25
@SteffenUllrich "All I want to do is extract the HTML so that I can get the names and URLs" this indicates that OP is not talking about HTML files — Diego Torres Milano, Aug 23 '16 at 05:26
@JimGarrison this is news to me, I just assumed it was similar to HTTP. I guess then I need to know how I can get a list of all the sub files in the directory — n00bAppDev, Aug 23 '16 at 05:29
Possible duplicate of [URLConnection FTP list files](http://stackoverflow.com/questions/14200106/urlconnection-ftp-list-files) — Steffen Ullrich, Aug 23 '16 at 05:36
@DiegoTorresMilano: I see - the OP actually wants a directory listing and not a HTML file stored at the server. The last one is possible with URLConnection, the first one not. — Steffen Ullrich, Aug 23 '16 at 05:37

score 0 · Accepted Answer · edited May 23 '17 at 12:15

java.io.IOException: Unable to connect to server: Unable to retrieve file: 550

"550" is the code sent by the FTP server in response to your request for the file. According to the FTP standard this means:

     550 Requested action not taken.
         File unavailable (e.g., file not found, no access)

This simply means that your URL is probably wrong, i.e. that no file with this name exists on the server or that you don't have permission to retrieve it. In that case you should also not be able to retrieve the same URL in a web browser. Note that the case of the file name matters with most FTP servers.
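As a rough illustration of reading such a code in an app, the sketch below pulls a trailing three-digit reply code out of an exception message and names its class by first digit, following the reply-code grouping in the FTP standard. It assumes the message ends with the code, as in the error quoted above; that format is an implementation detail rather than a documented contract, and `FtpReplyCodes`, `extractCode` and `describe` are hypothetical names made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FtpReplyCodes {
    // A three-digit code (first digit 1-5) at the very end of the message.
    private static final Pattern TRAILING_CODE = Pattern.compile("\\b([1-5]\\d{2})\\s*$");

    /** Returns the trailing FTP reply code in the message, or -1 if there is none. */
    static int extractCode(String message) {
        if (message == null) {
            return -1;
        }
        Matcher m = TRAILING_CODE.matcher(message);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Names the reply class indicated by the first digit of the code. */
    static String describe(int code) {
        switch (code / 100) {
            case 1: return "positive preliminary reply";
            case 2: return "positive completion reply";
            case 3: return "positive intermediate reply";
            case 4: return "transient negative completion reply";
            case 5: return "permanent negative completion reply";
            default: return "not an FTP reply code";
        }
    }
}
```

A 5xx code such as 550 is a permanent failure, so repeating the same request unchanged is not expected to help.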

Note that you cannot get the directory contents in FTP by trying to access ftp://hostname/directory, because a directory is not a file. What you see in the browser is the result of the browser requesting a directory listing from the FTP server, which is a different operation from retrieving a file. To get a directory listing you need to use an FTP library instead, i.e. URLConnection will not help. See [URLConnection FTP list files](http://stackoverflow.com/questions/14200106/urlconnection-ftp-list-files) for more information.
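To show what a directory listing involves at the protocol level, here is a minimal sketch that logs in anonymously, switches to passive mode and issues NLST over a plain socket. It is only an illustration under several assumptions: the server accepts anonymous logins on port 21, supports PASV, and sends one name per line. In practice a library such as Apache Commons Net does this for you and handles many more cases. The class and method names (`FtpListingSketch`, `parsePasvPort`, `toUrl`, `listNames`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FtpListingSketch {
    private static final Pattern PASV =
            Pattern.compile("\\(\\d+,\\d+,\\d+,\\d+,(\\d+),(\\d+)\\)");

    /** Extracts the data port from "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". */
    static int parsePasvPort(String reply) {
        Matcher m = PASV.matcher(reply);
        if (!m.find()) throw new IllegalArgumentException("Unexpected PASV reply: " + reply);
        return Integer.parseInt(m.group(1)) * 256 + Integer.parseInt(m.group(2));
    }

    /** Builds an ftp:// URL for one listed name (no percent-encoding is attempted). */
    static String toUrl(String host, String dir, String name) {
        String file = name.substring(name.lastIndexOf('/') + 1); // some servers return full paths
        String path = dir.startsWith("/") ? dir : "/" + dir;
        if (!path.endsWith("/")) path += "/";
        return "ftp://" + host + path + file;
    }

    /** Sends a command (or none, if cmd is null) and returns the final line of the reply. */
    private static String reply(BufferedReader in, OutputStream out, String cmd) throws IOException {
        if (cmd != null) {
            out.write((cmd + "\r\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
        String line;
        while ((line = in.readLine()) != null) {
            // The last line of a reply is "NNN text"; continuation lines use "NNN-text".
            if (line.matches("\\d{3}( .*)?")) return line;
        }
        throw new IOException("Server closed the control connection");
    }

    private static String expect(String line, String prefix) throws IOException {
        if (!line.startsWith(prefix)) throw new IOException("Unexpected FTP reply: " + line);
        return line;
    }

    /** Logs in anonymously and returns the names reported by NLST for the directory. */
    static List<String> listNames(String host, String dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (Socket control = new Socket(host, 21)) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(control.getInputStream(), StandardCharsets.UTF_8));
            OutputStream out = control.getOutputStream();
            expect(reply(in, out, null), "220");                    // server greeting
            String login = reply(in, out, "USER anonymous");
            if (login.startsWith("331")) login = reply(in, out, "PASS anonymous@");
            expect(login, "230");
            int port = parsePasvPort(expect(reply(in, out, "PASV"), "227"));
            // Reuse the control host; the address in the PASV reply may be a private one.
            try (Socket data = new Socket(host, port)) {
                expect(reply(in, out, "NLST " + dir), "1");         // 125 or 150
                BufferedReader listing = new BufferedReader(
                        new InputStreamReader(data.getInputStream(), StandardCharsets.UTF_8));
                String name;
                while ((name = listing.readLine()) != null) {
                    if (!name.isEmpty()) names.add(name);
                }
            }
            expect(reply(in, out, null), "2");                      // 226 or 250
            reply(in, out, "QUIT");
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: FtpListingSketch <host> <directory>");
            return;
        }
        for (String name : listNames(args[0], args[1])) {
            System.out.println(toUrl(args[0], args[1], name));
        }
    }
}
```

The listing travels over a second (data) connection after an NLST command, whereas fetching a file uses RETR; asking to retrieve a directory as if it were a file is what produces the 550. On Android, `listNames` would have to run off the main thread, since network calls there throw NetworkOnMainThreadException.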

answered Aug 23 '16 at 05:26

Steffen Ullrich

@Steffen The weird thing is that when I visit the ftp URL in my web browser I can view all the sub files in the directory and can open them fine. And on the server the ftp URL is set to public, there is no authentication needed... – n00bAppDev Aug 23 '16 at 05:31
@n00bAppDev: see edited response: It's not an HTML file stored on the server that you see. Instead the browser is doing the directory listing for you. – Steffen Ullrich Aug 23 '16 at 05:41
I have tried the solution in the link you provided although I am getting an error message when trying to call FTPClient.listFiles() – n00bAppDev Aug 23 '16 at 06:40
@n00bAppDev: in this case open a different question [with enough information to reproduce your problem](http://stackoverflow.com/help/mcve). Note that just "getting an error message" and not showing code does not count as enough information. Your original question on "How to extract HTML from a FTP Website" is answered, the new one is about getting a directory listing and also has different code. – Steffen Ullrich Aug 23 '16 at 07:11
