Right now, I have some code that reads a web page and saves everything to an HTML file. However, there is a problem: some punctuation and special characters show up as question marks.
Of course, if I did this manually, I'd save the .txt file with Unicode encoding rather than the default ANSI. I looked around, and all I can find on this is either complaints that it's impossible in Java or half-explanations that I don't understand.
In any case, can anyone help me fix the question marks? Here is the part of my code that downloads the page. (The lister creates an array of URLs to download, for use with sites that span multiple pages. You can ignore that; it works fine.)
    public void URLDownloader(String site, int startPage, int endPage) throws Exception {
        String[] pages = URLLister(site, startPage, endPage);
        String webPage = pages[0];
        int fileNumber = startPage;
        if (startPage == 0)
            fileNumber++;
        //change pages
        for (int i = 0; i < pages.length; i++) {
            webPage = pages[i];
            URL url = new URL(webPage);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            PrintWriter out = new PrintWriter(name + (fileNumber + i) + ".html");
            String inputLine;
            //while stuff to read on current page
            while ((inputLine = in.readLine()) != null) {
                out.println(inputLine); //write line of text
            }
            out.close(); //end writing text
            if (startPage == 0)
                startPage++;
            console.append("Finished page " + startPage + "\n");
            startPage++;
        }
    }
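
For anyone trying to reproduce this without a network connection, here is a minimal sketch of what might be going on. It assumes the pages are served as UTF-8, which is only a guess. The class `EncodingSketch` and the helper `copyText` are hypothetical names, not part of my program, and an in-memory byte array stands in for `url.openStream()`. The sketch names the charset explicitly on both the reading and the writing side, and also shows that encoding to a charset that lacks a character turns it into a question mark.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSketch {

    // Hypothetical helper: copies text line by line, decoding the input and
    // encoding the output with the same explicitly named charset.
    static void copyText(InputStream source, OutputStream target, Charset charset)
            throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(source, charset));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(target, charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.write('\n'); // fixed newline so the result does not depend on the OS
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for url.openStream(); ASSUMES the page is UTF-8.
        String page = "caf\u00e9 \u201cquoted\u201d";
        byte[] served = page.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream saved = new ByteArrayOutputStream();
        copyText(new ByteArrayInputStream(served), saved, StandardCharsets.UTF_8);
        String roundTrip = new String(saved.toByteArray(), StandardCharsets.UTF_8);

        // Encoding to a charset that cannot represent these characters
        // replaces each of them with '?'.
        String lossy = new String(page.getBytes(StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII);

        System.out.println("UTF-8 round trip intact: " + roundTrip.equals(page + "\n"));
        System.out.println("ASCII round trip: " + lossy);
    }
}
```

In the downloader above, the equivalent change would presumably be to pass a charset to both `InputStreamReader` and `PrintWriter` instead of relying on the platform default, but that only helps if the charset matches what the site actually sends.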