1

I am trying to import data from a CSV file (latlong.csv) and I want to remove all of the quotes from my columns. Please Refer to first image.

First image

This is the code I used to import the data

CREATE TABLE IF NOT EXISTS latlong
    (COUNTRY String, ALPHA2 String, ALPHA3 String, NUMERICCODE String,
    LATITUDE String, LONGITUDE String)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    tblproperties("skip.header.line.count"="1");

LOAD DATA LOCAL INPATH '/tmp/project2/latlong.csv' INTO TABLE latlong; 

I tried to use the command below but I get an error. Error saying I can only insert into the table and not update it (i think).

Update latlong set country = replace(country, '"', '')

error message

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Alex
  • 11
  • 3
  • Does this answer help? [How to replace characters in Hive](https://stackoverflow.com/questions/18090481/how-to-replace-characters-in-hive/18098996) – ldz Nov 30 '19 at 23:37

1 Answers1

1

To update table which is not in the transaction mode use INSERT OVERWRITE. Double quote needs shielding. Use ["] or double-slash \\":

insert overwrite table latlong
select regexp_replace(COUNTRY, '["]', '') COUNTRY, --this will remove double-qutes from COUNTRY column
       ALPHA2, ALPHA3, NUMERICCODE, LATITUDE, LONGITUDE
  from latlong;

This solution is applicable if you have quotes inside strings and you want to remove them. It seems this is not your case.

If you have quoted columns, like in your data example, then use SerDe to remove quotes during de-serialization, this is far more efficient. Just create table with proper SerDe and properties:

drop table latlong;

CREATE TABLE latlong
(COUNTRY String, ALPHA2 String, ALPHA3 String, NUMERICCODE String,
LATITUDE String, LONGITUDE String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES 
(
"separatorChar" = ",",
"quoteChar"     = "\""
)  
STORED AS TEXTFILE
tblproperties("skip.header.line.count"="1");
;

LOAD DATA LOCAL INPATH '/tmp/project2/latlong.csv' INTO TABLE latlong; 

SerDe will remove quotes during select, no need to update the table.

leftjoin
  • 36,950
  • 8
  • 57
  • 116