
Is it possible to get all of a website's content into an XML file? That is, if I provide a website URL, all of the website's content should be written into an XML file using Java.

For example, if I give the URL of this page, then all the content of this page should end up in the XML file.

Sagar Patel
  • Yes. If this is an unsatisfying answer, please take the time to read http://stackoverflow.com/help/how-to-ask - thanks! – reto Apr 11 '16 at 11:01
  • What do you mean by "Website URL to XML file"? In what way exactly do you want to put the website content into the XML file? Could you expand a bit and describe your problem? – Jonas Czech Apr 11 '16 at 11:02
  • @JonasCz I mean that if I give the URL of this page, http://stackoverflow.com/questions/36546808/write-website-content-into-xml-file-in-java?noredirect=1#comment60694415_36546808, then all the data of this page should be downloaded into an XML file. – Sagar Patel Apr 11 '16 at 12:07

1 Answer


A very simplified approach to download a website (only static content) could be

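import java.io.InputStream;
import java.net.URL;
import java.nio.file.*;

// read the website from this URL
URL urlIn = new URL("http://www.example.com/index.html");
// save the content as file "/tmp/example.out"
Path pathOut = Paths.get("/tmp/example.out");
// read and write the data; try-with-resources closes the stream afterwards
try (InputStream in = urlIn.openStream()) {
    Files.copy(in, pathOut, StandardCopyOption.REPLACE_EXISTING);
}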
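
The download alone only gives you the raw page. If "into an XML file" simply means wrapping that raw content in XML, a minimal sketch using the JDK's built-in DOM and Transformer classes might look like the following. The class name `PageToXml` and the element names `page`, `url` and `content` are made up for illustration, since the question does not specify a target format, and the HTML string stands in for the downloaded data.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PageToXml {

    // Wraps a page's URL and its raw text in a small XML document.
    // The element names "page", "url" and "content" are hypothetical.
    static String toXml(String url, String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element page = doc.createElement("page");
            doc.appendChild(page);

            Element urlEl = doc.createElement("url");
            urlEl.setTextContent(url);
            page.appendChild(urlEl);

            // The page is stored as a text node, so its markup is escaped
            // when the document is serialized and cannot break the XML.
            Element contentEl = doc.createElement("content");
            contentEl.setTextContent(content);
            page.appendChild(contentEl);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the content saved by the download step.
        String html = "<html><body><p>Hello & welcome</p></body></html>";
        System.out.println(toXml("http://www.example.com/index.html", html));
    }
}
```

This stores the whole page as one escaped text node. It does not split the page into meaningful fields; that would require knowing the structure of each site.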
SubOptimal
  • @SagarPatel In that case you should reply to the comments already given and provide an example. **1)** What does the website you want to download look like? **2)** What should the XML look like for this example HTML page? Without this information it's nearly impossible to guess what you want to achieve. – SubOptimal Apr 11 '16 at 13:15
  • 1) The website can be of any kind; it is not fixed. 2) The XML format is not fixed either; it will depend on the website. For example, if I give the URL of Stackoverflow.com, then my XML will consist of question and answer pairs. – Sagar Patel Apr 13 '16 at 08:18
  • @SagarPatel So you expect some `artificial intelligence` which knows how to separate the content of different webpages, e.g. for `stackoverflow.com` into `request` and `answers`, and for `some.music.shop` into `CD name` and `tracks`? I believe this is not possible, except when the webpages provide some [RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) data. – SubOptimal Apr 13 '16 at 09:19
  • Yes, you are exactly right. I need the data in the format you described. – Sagar Patel Apr 14 '16 at 06:13
  • @SagarPatel For StackOverflow you might have a look here http://meta.stackexchange.com/questions/146481/stack-overflow-api-for-java and https://api.stackexchange.com/. For any other page you have to check for other solutions. – SubOptimal Apr 14 '16 at 06:42
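
For the Stack Overflow case, the question and answer pairing asked for above maps onto two Stack Exchange API requests. The sketch below only builds the request URIs. The class and method names are hypothetical, and the API version `2.3` and the `withbody` filter are assumptions to check against the documentation at https://api.stackexchange.com/. The API answers with compressed JSON, so fetching, decompressing and converting the result to XML would be separate steps that are not shown here.

```java
import java.net.URI;

class StackExchangeRequest {

    // Assumed API root; the version segment may differ, see the API docs.
    static final String API_ROOT = "https://api.stackexchange.com/2.3";

    // URI for a single question, including its body text.
    static URI questionUri(long questionId, String site) {
        return URI.create(API_ROOT + "/questions/" + questionId
                + "?site=" + site + "&filter=withbody");
    }

    // URI for all answers that belong to that question.
    static URI answersUri(long questionId, String site) {
        return URI.create(API_ROOT + "/questions/" + questionId
                + "/answers?site=" + site + "&filter=withbody");
    }

    public static void main(String[] args) {
        // 36546808 is the id of this question, taken from its URL.
        System.out.println(questionUri(36546808L, "stackoverflow"));
        System.out.println(answersUri(36546808L, "stackoverflow"));
    }
}
```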