How to work on a specific part of a CSV file uploaded into HDFS? I'm new to Hadoop and I have a question: if I export a relational database into a CSV file and then upload it into HDFS, how do I work on a specific part (table) of the file using MapReduce? Thanks in advance.
- What do you mean by specific part of the table? – Praveen Sripati Apr 17 '13 at 16:27
- The exported CSV file contains all the tables, so how do I handle a specific table that may sit anywhere in the file? – Samy Louize Hanna Apr 18 '13 at 07:42
3 Answers
I assume that the RDBMS tables are exported to individual CSV files, one per table, and stored in HDFS. I presume that by 'specific part (table)' you are referring to column data within the table(s). If so, place each table's CSV file in its own path, say /user/userName/dbName/tables/table1.csv, so a job can be pointed at exactly one table, as in the driver sketch below.
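A minimal driver sketch under those assumptions. The paths, the property names, and the CsvFieldMapper class (defined in the next sketch) are illustrative, not part of any standard API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Table1JobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical properties telling the mapper which CSV columns to read.
        conf.setInt("csv.key.col", 0);
        conf.setInt("csv.value.col", 2);

        Job job = Job.getInstance(conf, "process-table1");
        job.setJarByClass(Table1JobDriver.class);
        job.setMapperClass(CsvFieldMapper.class);
        job.setNumReduceTasks(0);           // map-only, to keep the sketch small
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Only table1's file is fed to the job, so no other table's rows
        // ever reach the mappers.
        FileInputFormat.addInputPath(job, new Path("/user/userName/dbName/tables/table1.csv"));
        FileOutputFormat.setOutputPath(job, new Path("/user/userName/dbName/output/table1"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```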
Now you can configure the job with that input path and the field positions. Consider using the default input format (TextInputFormat) so that your mapper receives one line at a time as input. Based on the configured properties, you can then read the specific fields and process the data, as in the mapper sketch below.
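A matching mapper sketch, assuming the default TextInputFormat so each map() call receives one CSV line; the "csv.key.col" / "csv.value.col" property names are the same hypothetical ones set in the driver above:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CsvFieldMapper extends Mapper<LongWritable, Text, Text, Text> {
    private int keyCol;
    private int valueCol;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        keyCol = conf.getInt("csv.key.col", 0);     // hypothetical property names
        valueCol = conf.getInt("csv.value.col", 1);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // TextInputFormat hands the mapper one line of the CSV file at a time.
        String[] fields = line.toString().split(",");
        if (fields.length > Math.max(keyCol, valueCol)) {
            // Emit only the configured columns; everything else is ignored.
            context.write(new Text(fields[keyCol]), new Text(fields[valueCol]));
        }
    }
}
```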

Cascading allows you to get started with MapReduce very quickly. It is a framework that lets you set up Taps to access sources (your CSV file) and process them inside a pipeline, for example to add column A to column B and place the sum into column C by selecting them as Fields, as in the sketch below.
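A sketch of what that might look like with the Cascading 2.x API (the HDFS paths and field names are assumptions, and exact constructors vary between Cascading versions):

```java
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.expression.ExpressionFunction;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class SumColumnsFlow {
    public static void main(String[] args) {
        // Taps: where the data comes from and where it goes.
        Tap source = new Hfs(new TextDelimited(new Fields("A", "B"), ","),
                             "/user/userName/dbName/tables/table1.csv");
        Tap sink = new Hfs(new TextDelimited(new Fields("A", "B", "C"), ","),
                           "/user/userName/dbName/output/table1", SinkMode.REPLACE);

        // Pipeline: for each tuple, evaluate A + B (arguments coerced to int)
        // and append the result as field C.
        Pipe pipe = new Pipe("sum");
        pipe = new Each(pipe,
                        new Fields("A", "B"),
                        new ExpressionFunction(new Fields("C"), "A + B", Integer.class),
                        Fields.ALL);

        FlowDef flowDef = FlowDef.flowDef()
                                 .addSource(pipe, source)
                                 .addTailSink(pipe, sink);
        new HadoopFlowConnector().connect(flowDef).complete();
    }
}
```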
