I have removed some of the columns from a CSV using a Pig script:
Cleaned = FOREACH data generate $0 .. $8,$11 .. $27, $31 .. $41, $45 .. $97, $99 .. $111;
In the columns that I have kept, I need to remove any newline character that may corrupt my data in Hive, be it \n, \r, \r\n, or <br>. Since this is user-entered data, I believe the line breaks created by pressing the Enter key while typing would be one of the sequences mentioned above. I would appreciate it if you could also specify which one they are converted to, but mostly I need to make sure that every kind of line break is removed so that my data is mapped properly by the Hive CSV parser. How do I do this in the Pig script I am using to filter out the columns?
Edit: 1. I wish to continue using column ranges instead of having to specify each column. 2. The example pointed to does not handle all types of newline characters.
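
For reference, the per-column form I am trying to avoid would look roughly like the sketch below. It assumes the kept fields are loaded as chararray and uses Pig's built-in REPLACE with a regular expression covering the four sequences listed above; the aliases c0 and c1 are made up for illustration, and replacing each break with a single space is just one possible choice.

```pig
-- Sketch only: assumes $0 and $1 are chararray fields of the relation `data`.
-- The regex matches \r\n first, then a lone \r or \n, then a literal <br>.
-- c0 and c1 are hypothetical aliases, not names from my real schema.
Cleaned = FOREACH data GENERATE
    REPLACE($0, '\\r\\n|\\r|\\n|<br>', ' ') AS c0,
    REPLACE($1, '\\r\\n|\\r|\\n|<br>', ' ') AS c1;
```

Written this way, every kept column needs its own REPLACE call, which is exactly the repetition the column ranges were meant to avoid.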