Task: I want to split a column called "website" in a Hive table to get all the websites, which are delimited by either a space or \n.
Issue: When I use either of the following queries:
SELECT website,split(website, '[\\s]') as websites FROM temp_pages
SELECT website,split(website, '[\\s, \\n]') as websites FROM temp_pages
I don't get the desired results. Here is what I get:
Expected output (delimited on space):
Input: http://www.insync4all.com http://www.insync4all.nl
Output: ["http://www.insync4all.com","http://www.insync4all.nl"]
Unexpected output (delimited on \n):
When the value contains \n, the websites are not split on it. Instead, the whole value comes back as a single element that shows \\n:
Input: www.imtherealthing.com\nwww.childmodelmagazine.com
Output: ["www.imtherealthing.com\\nwww.childmodelmagazine.com"]
Can someone help me split the website field on \n? It would also help to understand what is going wrong in the \n case.
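A minimal Python sketch of one possible explanation, assuming the stored value contains the two characters backslash and n rather than an actual newline character. Hive's split() takes a Java regular expression, and for the simple patterns below Python's re module behaves the same way, so this only illustrates the regex semantics, not Hive itself. The sample values and variable names are hypothetical.

```python
import re

# Hypothetical sample values (not taken from the real table).
real_newline = "www.imtherealthing.com\nwww.childmodelmagazine.com"          # contains an actual newline character
literal_backslash_n = "www.imtherealthing.com\\nwww.childmodelmagazine.com"  # contains '\' followed by 'n'

# Regex \s, assumed to be what the Hive literal '[\\s]' becomes after Hive unescapes it.
whitespace = re.compile(r"[\s]")

# Regex that matches a literal backslash followed by 'n', or any whitespace character.
backslash_n_or_whitespace = re.compile(r"\\n|\s")

print(whitespace.split(real_newline))
# ['www.imtherealthing.com', 'www.childmodelmagazine.com']

print(whitespace.split(literal_backslash_n))
# ['www.imtherealthing.com\\nwww.childmodelmagazine.com']  (no split; same shape as the Hive output above)

print(backslash_n_or_whitespace.split(literal_backslash_n))
# ['www.imtherealthing.com', 'www.childmodelmagazine.com']
```

If this is the cause, \s never matches because there is no whitespace in the value, and the pattern has to match a backslash followed by n instead. How many backslashes that takes inside a Hive string literal is worth verifying against your Hive version, since Hive unescapes the literal before the regex engine sees it.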