I am creating a Greenplum external table and populating it with data from my MapReduce output files in HDFS. I can connect the external table to HDFS and access all the files in those directories. All files have values in comma-separated format.
For example, I have two comma-delimited files, Employee and Student:
Employee:
id, name, company, status
1, XYZ, Greenplumb, Online
2, ABC, Big Data, Available
Student:
name, courses, description
ABC, Hadoop, This course is about hadoop. (newline character) It will help
understand what hadoop is and how to play with big data using hadoop.
When I create an external table for the Employee file, it works properly: every row in the Employee file becomes a row in the external table (the delimiter is a comma).
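For reference, the working Employee table definition looks roughly like this (a sketch only; the protocol, host, and path are placeholders for my actual setup):

```sql
-- Placeholder host/path; delimiter is a comma, format is plain text.
CREATE EXTERNAL TABLE ext_employee (
    id      int,
    name    text,
    company text,
    status  text
)
LOCATION ('gphdfs://namenode:8020/output/employee/*')
FORMAT 'TEXT' (DELIMITER ',');
```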
But when I try to create an external table for the Student file, it gives an error, because the description column contains NEWLINE characters. Whenever the external table encounters a newline, it treats it as the end of the record and parses the value after each newline as the start of a new record.
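The Student table is defined the same way (again a sketch with placeholder paths), and this is the one that breaks on the embedded newline in description:

```sql
-- Same placeholder setup; the embedded newline in "description"
-- splits one logical record across two physical lines.
CREATE EXTERNAL TABLE ext_student (
    name        text,
    courses     text,
    description text
)
LOCATION ('gphdfs://namenode:8020/output/student/*')
FORMAT 'TEXT' (DELIMITER ',');
```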
Things I already tried:
- Wrapping the description value in double quotes so it is treated as a single string. This did not work.
- Removing the newline characters from the data in MapReduce itself, but this makes my data unreadable, so it is not an option.
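For the double-quoting attempt, the table variant I tried looked roughly like this (again with placeholder paths; I switched to the CSV format so the quote character would be honored, but the newline inside the quoted field still terminated the record):

```sql
-- Attempted CSV variant: quoted fields, comma delimiter.
-- The embedded newline inside the quoted description still ended the record.
CREATE EXTERNAL TABLE ext_student_csv (
    name        text,
    courses     text,
    description text
)
LOCATION ('gphdfs://namenode:8020/output/student/*')
FORMAT 'CSV' (DELIMITER ',' QUOTE '"');
```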
Can anyone suggest how I can handle this problem? Thanks in advance.