How to work on a specific part of a CSV file uploaded into HDFS? I'm new to Hadoop and I have a question: if I export a relational database into a CSV file and then upload it into HDFS, how do I work on a specific part (table) of the file using MapReduce? Thanks in advance.
- What do you mean by specific part of the table? – Praveen Sripati Apr 17 '13 at 16:27
- The exported CSV file contains all the tables, so how do I handle a specific table that may sit anywhere in the file? – Samy Louize Hanna Apr 18 '13 at 07:42
3 Answers
I assume that the RDBMS tables are exported to individual CSV files, one per table, and stored in HDFS. I presume that by 'specific part (table)' you are referring to column data within the table(s). If so, place each table's CSV file in its own path, say /user/userName/dbName/tables/table1.csv, so a job can be pointed at exactly one table, as in the driver sketch below.
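A minimal driver sketch under those assumptions. The paths, the property names, and the CsvFieldMapper class (defined in the next sketch) are illustrative, not part of any standard API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Table1JobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical properties telling the mapper which CSV columns to read.
        conf.setInt("csv.key.col", 0);
        conf.setInt("csv.value.col", 2);

        Job job = Job.getInstance(conf, "process-table1");
        job.setJarByClass(Table1JobDriver.class);
        job.setMapperClass(CsvFieldMapper.class);
        job.setNumReduceTasks(0);           // map-only, to keep the sketch small
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Only table1's file is fed to the job, so no other table's rows
        // ever reach the mappers.
        FileInputFormat.addInputPath(job, new Path("/user/userName/dbName/tables/table1.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/user/userName/dbName/output/table1"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```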
Now you can configure the job with that input path and the field positions. Consider using the default input format (TextInputFormat) so that your mapper receives one line at a time as input. Based on the configured properties, you can then read the specific fields and process the data, as in the mapper sketch below.
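A matching mapper sketch, assuming the default TextInputFormat so each map() call receives one CSV line; the "csv.key.col" / "csv.value.col" property names are the same hypothetical ones set in the driver above:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CsvFieldMapper extends Mapper<LongWritable, Text, Text, Text> {
    private int keyCol;
    private int valueCol;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        keyCol = conf.getInt("csv.key.col", 0);     // hypothetical property names
        valueCol = conf.getInt("csv.value.col", 1);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // TextInputFormat hands the mapper one line of the CSV file at a time.
        String[] fields = line.toString().split(",");
        if (fields.length > Math.max(keyCol, valueCol)) {
            // Emit only the configured columns; everything else is ignored.
            context.write(new Text(fields[keyCol]), new Text(fields[valueCol]));
        }
    }
}
```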

Cascading allows you to get started with MapReduce very quickly. It is a framework that lets you set up Taps to access sources (your CSV file) and process them inside a pipeline, for example to add column A to column B and place the sum into column C by selecting them as Fields, as in the sketch below.
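A sketch of what that might look like with the Cascading 2.x API (the HDFS paths and field names are assumptions, and exact constructors vary between Cascading versions):

```java
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.expression.ExpressionFunction;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class SumColumnsFlow {
    public static void main(String[] args) {
        // Taps: where the data comes from and where it goes.
        Tap source = new Hfs(new TextDelimited(new Fields("A", "B"), ","),
                             "/user/userName/dbName/tables/table1.csv");
        Tap sink = new Hfs(new TextDelimited(new Fields("A", "B", "C"), ","),
                           "/user/userName/dbName/output/table1", SinkMode.REPLACE);

        // Pipeline: for each tuple, evaluate A + B (arguments coerced to int)
        // and append the result as field C.
        Pipe pipe = new Pipe("sum");
        pipe = new Each(pipe,
                        new Fields("A", "B"),
                        new ExpressionFunction(new Fields("C"), "A + B", Integer.class),
                        Fields.ALL);

        FlowDef flowDef = FlowDef.flowDef()
                                 .addSource(pipe, source)
                                 .addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(flowDef).complete();
    }
}
```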
