I am quite new to Hadoop and I want to import semi-structured data (XML) into HDFS. What are the ways to import XML data from a remote location into HDFS, and are there any open-source tools for it? Can Flume import XML data into HDFS? Thanks in advance.
You could try using the HDFS Java API to create files in HDFS and write the whole content of each XML file into its own HDFS file.
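The HDFS Java API mentioned above is Hadoop's `org.apache.hadoop.fs.FileSystem` class (`FileSystem.create` returns a stream you write the XML into), which needs the Hadoop client jars on the machine doing the upload. As a JDK-only alternative, here is a minimal sketch using WebHDFS, HDFS's REST API, instead. It assumes Java 11+, WebHDFS enabled on the cluster, and simple (non-Kerberos) authentication via the `user.name` parameter. The class name, host names and paths are hypothetical. Uploading is a two-step `op=CREATE`: the NameNode answers a bodyless PUT with a 307 redirect to a DataNode, and the file content is then PUT to that DataNode.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class WebHdfsXmlUpload {

    // Builds the WebHDFS CREATE URI sent to the NameNode (step 1).
    // WebHDFS paths are absolute, so a leading '/' is added if missing.
    // Assumes hdfsPath contains only URL-safe characters (no spaces etc.).
    public static URI createUri(String nameNode, String hdfsPath, String user) {
        String path = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
        return URI.create(nameNode + "/webhdfs/v1" + path
                + "?op=CREATE&overwrite=true&user.name="
                + URLEncoder.encode(user, StandardCharsets.UTF_8));
    }

    // Uploads one local XML file to HDFS using the two-step WebHDFS CREATE flow.
    public static void upload(HttpClient client, String nameNode, Path localXml,
                              String hdfsPath, String user)
            throws IOException, InterruptedException {
        // Step 1: PUT with no body; the NameNode replies 307 with a DataNode Location.
        HttpRequest ask = HttpRequest.newBuilder(createUri(nameNode, hdfsPath, user))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> redirect = client.send(ask, HttpResponse.BodyHandlers.discarding());
        if (redirect.statusCode() != 307) {
            throw new IOException("Expected 307 from NameNode, got " + redirect.statusCode());
        }
        URI dataNode = URI.create(redirect.headers().firstValue("Location")
                .orElseThrow(() -> new IOException("No Location header in redirect")));

        // Step 2: PUT the XML bytes to the DataNode; 201 Created means success.
        HttpRequest put = HttpRequest.newBuilder(dataNode)
                .header("Content-Type", "application/octet-stream")
                .PUT(HttpRequest.BodyPublishers.ofFile(localXml))
                .build();
        HttpResponse<String> created = client.send(put, HttpResponse.BodyHandlers.ofString());
        if (created.statusCode() != 201) {
            throw new IOException("Upload failed: " + created.statusCode() + " " + created.body());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 4) {
            System.err.println("usage: WebHdfsXmlUpload <namenode-url> <local.xml> <hdfs-path> <user>");
            System.exit(1);
        }
        // The default client never follows redirects, so the 307 is handled manually above.
        HttpClient client = HttpClient.newHttpClient();
        upload(client, args[0], Path.of(args[1]), args[2], args[3]);
        System.out.println("Uploaded " + args[1] + " to " + args[2]);
    }
}
```

For example: `java WebHdfsXmlUpload.java http://namenode.example.com:9870 orders.xml /user/avinash/xml/orders.xml avinash`. The NameNode HTTP port is typically 50070 on Hadoop 2.x and 9870 on Hadoop 3.x. Because this only needs HTTP access to the cluster, it can run directly on the server where the XML files are produced.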
Yes, you could also go with Flume if a large number of XML files are being generated at the source and need to be sunk into HDFS.
You can have a look at this link: http://www.dummies.com/how-to/content/log-data-with-flume-in-hdfs.html
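For the Flume route, a minimal agent configuration sketch might look like the following. It assumes the XML files are dropped, fully written and never modified afterwards, into a local directory on a machine running the Flume agent; the agent and component names, directories and NameNode address are hypothetical. The `spooldir` source's default LINE deserializer would split each XML document into one event per line (and truncate lines longer than 2048 characters), so this sketch swaps in `BlobDeserializer`, which reads each file as a single event and buffers it in memory. The roll settings make the HDFS sink close a file after every event, giving one HDFS file per XML document.

```properties
# Hypothetical agent "agent1": spooling-directory source -> memory channel -> HDFS sink
agent1.sources = xmlSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Read each completed file in the spool directory as one event (one XML document)
agent1.sources.xmlSrc.type = spooldir
agent1.sources.xmlSrc.spoolDir = /data/incoming/xml
agent1.sources.xmlSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent1.sources.xmlSrc.channels = memCh

agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 1000

# Write raw event bodies (no SequenceFile wrapping), one HDFS file per event
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com:8020/user/avinash/xml/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.fileSuffix = .xml
agent1.sinks.hdfsSink.hdfs.rollCount = 1
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
```

It could be started with something like `flume-ng agent --conf conf --conf-file xml-agent.conf --name agent1`. For very large XML files, a file channel is usually a safer choice than the memory channel, since each whole file is held as a single event.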

Jinith
Hello Jinith, thank you for the response. It's quite hard to find an example on the net of importing XML files using Flume; however, I am going to give it a try. One more question: in a scenario where many large XML files are generated every day and stored on the client's server, and I would like to load/import these files into HDFS, should I use the "wget" command to pull these files into the Hadoop cluster and then use the "put" command to write the XML files to HDFS? Or is there a better way? – avinash Dec 27 '15 at 20:56