
I am creating a Greenplum external table and populating it with data from my MapReduce output files in HDFS. I am able to connect the external table to HDFS and access all the files in those directories. All files contain comma-separated values.

For example, I have two files, Employee and Student, both using a comma as the delimiter:

Employee:

id, name, company, status
1, XYZ, Greenplumb, Online
2, ABC, Big Data, Available

Student:

name, courses, description
ABC, Hadoop, This course is about hadoop. (newline character)
It will help understand what hadoop is and how to play with big data using hadoop.

When I create an external table for the Employee file, it works properly: every row in the Employee file becomes a row in the external table (the delimiter is a comma).

But when I try to create an external table for the Student file, I get an error, because values in the description column contain newline characters. Whenever the external table encounters a newline, it treats it as the end of the record and starts a new record with the text that follows.
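To illustrate why a newline inside a value breaks line-based parsing, here is a minimal Python sketch. It assumes the Student data is written as standard CSV with the description wrapped in double quotes; it contrasts naive splitting on newlines with a CSV-aware reader. On the Greenplum side this would correspond to declaring the external table with FORMAT 'CSV' rather than FORMAT 'TEXT', though whether embedded newlines are accepted may depend on the Greenplum version and protocol used.

```python
import csv
import io

# Student data as raw text: the description field is quoted and
# contains an embedded newline.
raw = (
    'name, courses, description\n'
    'ABC, Hadoop, "This course is about hadoop.\n'
    'It will help understand what hadoop is."\n'
)

# Naive approach: every newline ends a record, so the one logical
# data row is split into two physical "records".
naive_records = raw.splitlines()

# CSV-aware approach: a newline inside a quoted field stays part of the
# field. skipinitialspace drops the blank after each comma so that the
# opening quote is recognised as the start of a quoted field.
rows = list(csv.reader(io.StringIO(raw), skipinitialspace=True))

print(len(naive_records))  # 3 lines, but only 2 real records
print(len(rows))           # 2 rows: header + one data row
print(repr(rows[1][2]))    # the description, newline included
```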

Things I already tried:

  1. Wrapping the description value in double quotes so that it is treated as a single string. This did not work.
  2. Removing the newline characters from the data in MapReduce itself. This makes my data unreadable, so it is not an option.
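An alternative to removing the newlines is escaping them. The sketch below assumes the external table is declared with FORMAT 'TEXT', a comma delimiter, and the default backslash escape. In that format, as in PostgreSQL's COPY text format, the two-character sequence backslash-n is read back as a newline, and a backslash before the delimiter marks it as data, so the original value survives the load. The helper names escape_text_field and to_text_line are hypothetical; in practice this logic would live in the MapReduce output step.

```python
from typing import Iterable


def escape_text_field(value: str, delimiter: str = ",") -> str:
    """Escape one field for a text-format load (hypothetical helper).

    Backslash is escaped first so the escapes added below are not doubled.
    Newline and carriage return become the two-character sequences
    backslash-n and backslash-r; the delimiter gets a leading backslash.
    """
    value = value.replace("\\", "\\\\")
    value = value.replace("\n", "\\n").replace("\r", "\\r")
    return value.replace(delimiter, "\\" + delimiter)


def to_text_line(fields: Iterable[str], delimiter: str = ",") -> str:
    """Join escaped fields so each record occupies exactly one physical line."""
    return delimiter.join(escape_text_field(f, delimiter) for f in fields) + "\n"


line = to_text_line([
    "ABC",
    "Hadoop",
    "This course is about hadoop.\nIt will help understand what hadoop is.",
])
print(line, end="")
# ABC,Hadoop,This course is about hadoop.\nIt will help understand what hadoop is.
```

Unlike stripping the newlines, this keeps the data recoverable: the loader turns each escape back into the original character.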

Can anyone suggest how to handle this problem? Thanks in advance.

Chris Travers
user1188611
  • Maybe try the classic way, using COPY http://www.postgresql.org/docs/9.2/static/sql-copy.html ? If it's really a 3-column table with all text columns, that's the fastest way to do it, and once the data is in a table you can do whatever you want with it. – Borys May 02 '13 at 18:46
  • Thanks, but that is not an option. It was a dummy example I created here to explain the problem I am facing. I have more than 3 columns and many such files. Please suggest something else. – user1188611 May 02 '13 at 18:53
  • Sounds like an ideal use for a simple Python script or similar. Import with `csv`, write with `psycopg2`. – Craig Ringer May 03 '13 at 00:27
  • @CraigRinger: Can you share the Python code you are talking about? – user1188611 May 08 '13 at 21:05
  • @user1188611 I don't have any canned code to hand that isn't part of a bigger and more complex tool. Start with http://docs.python.org/2/library/csv.html and http://initd.org/psycopg/docs/ – Craig Ringer May 09 '13 at 00:12
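Following Craig Ringer's suggestion (read with `csv`, write with `psycopg2`), here is a minimal sketch, not a canned tool. It assumes the first CSV row is a header whose names match the target table's columns and that the table name (student here, hypothetical) and header names are trusted, because they are pasted into the SQL unquoted. A small recording cursor stands in for a real database cursor so the example runs on its own.

```python
import csv
import io
from typing import TextIO


def load_csv(cursor, table: str, source: TextIO) -> int:
    """Insert every CSV record from `source` into `table`; return the row count.

    Assumes the first row is a header whose names match the table's columns.
    `table` and the header names go into the SQL text unquoted, so they must
    be trusted; the values are passed as %s query parameters, the
    placeholder style psycopg2 uses.
    """
    # csv.reader keeps a newline inside a quoted field as part of the value.
    reader = csv.reader(source, skipinitialspace=True)
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join(["%s"] * len(header))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    rows = [tuple(row) for row in reader]
    cursor.executemany(sql, rows)
    return len(rows)


class RecordingCursor:
    """Stand-in for a DB-API cursor that records what would be executed."""

    def __init__(self):
        self.sql = None
        self.rows = []

    def executemany(self, sql, rows):
        self.sql = sql
        self.rows = list(rows)


sample = io.StringIO(
    'name, courses, description\n'
    'ABC, Hadoop, "This course is about hadoop.\n'
    'It will help understand what hadoop is."\n'
)
cur = RecordingCursor()
count = load_csv(cur, "student", sample)
print(count)    # 1
print(cur.sql)  # INSERT INTO student (name, courses, description) VALUES (%s, %s, %s)
```

With a real database, the recording cursor would be replaced by a cursor from a psycopg2 connection, and the input file opened with `newline=""` as the Python `csv` documentation recommends. `executemany` is the simplest route; for large files, psycopg2's `cursor.copy_expert` with a `COPY ... FROM STDIN WITH CSV` statement is usually much faster and ties back to Borys's COPY suggestion.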
