I want to check whether a column contains any of a set of special characters and then do further evaluation. I was able to handle most of the special characters, but not one in particular: Å. Has anyone come across this character and handled it? I even tried escaping it with a backslash, but in vain, as seen in the code below.


    -- Load pipe-delimited input; t1data is the column being checked.
    t1 = LOAD '$input_track1' USING PigStorage('|') AS (t1data:chararray, sec_col:int);
    t1_output = FOREACH t1 GENERATE $0,
        (CASE WHEN SUBSTRING($0, 0, 1) IN ('F','S') THEN 1 ELSE 0 END) AS ORR,
        -- '\Å' at the end of the list is the backslash escape that does not work.
        (CASE WHEN SUBSTRING($0, 0, 1) IN ('^','@','|','[',']','-','`','{','}','~','!','#','$','%','&','(',')','*','<','>',':','=','?','"','\'','\Å') THEN 1 ELSE 0 END) AS ORR2;
    DUMP t1_output;

Sample data:

ÅSecond|456

mercuryman
  • `Å` is a Norwegian letter. If you found that, you may also find `Æ` and `Ø`. There are many other "special" characters like that which you need to consider. – melwil Jul 10 '15 at 08:26
  • These are the only special characters expected in the data, so I only need to handle these. – mercuryman Jul 10 '15 at 16:55
  • Have you tried using Unicode notation? For that character it should be "\u00C5". You might also want to consider writing a UDF for this; it should be way easier and more readable :) – LiMuBei Jul 14 '15 at 15:50
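
The UDF idea from the last comment could look roughly like the sketch below. It is plain Python, and it assumes the value reaches the function as an already-decoded Unicode string. The names `SPECIAL_CHARS` and `starts_with_special` are hypothetical, and registering the function with Pig (for example as a Jython UDF with an output schema) is left out. Writing Å as the escape `\u00C5` keeps the check independent of the encoding the script file is saved in.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: these names are assumptions, not from the original post.

# The characters listed in the question. Å is written as a Unicode escape so
# the encoding of this source file cannot change which character is meant.
SPECIAL_CHARS = frozenset(u"^@|[]-`{}~!#$%&()*<>:=?\"'" + u"\u00C5")


def starts_with_special(value):
    """Return 1 if the first character of value is in SPECIAL_CHARS, else 0."""
    if not value:  # None or empty string: there is no first character to test
        return 0
    return 1 if value[0] in SPECIAL_CHARS else 0
```

For the sample row, `starts_with_special(u"\u00C5Second")` gives 1, matching what `ORR2` is meant to produce. If the input bytes are decoded with the wrong charset before they reach this check, the first character will not be U+00C5 and the comparison will fail no matter how the literal is written.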

0 Answers