I am trying to load data from s3 bucket to spectrum table but data is not showing correct. Could you please help me to resolve the below issue.
Example:
website
is one column where data is - www.calidadhh.com/\
but while loading into table it is taking other column value - www.calidadhh.com/Home Health Agency Corporation
I am running below query -
drop table schema.dhs_account_ovrvw;
create external table schema.dhs_account_ovrvw(
hospital_id INTEGER
,hospital_name VARCHAR(255)
,hq_address VARCHAR(255)
,hq_address1 VARCHAR(255)
,hq_city VARCHAR(255)
,hq_state VARCHAR(50)
,hq_zip_code VARCHAR(50)
,hq_county VARCHAR(255)
,website VARCHAR(20000)
,firm_type VARCHAR(510)
,rec_add_user_nm VARCHAR(20)
,rec_add_tms TIMESTAMP
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ('quoteChar' = '\"','escapeChar' = '\\','separatorChar' = '\001')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://*******/Refined_Layer/DHS_ACC_OVRVW/'
TABLE PROPERTIES ('skip.header.line.count' = '1');
Output it is generating -
hospital_id,hospital_name,hq_address,hq_address1,hq_city,hq_state,hq_zip_code,hq_county,website,firm_type,rec_add_user_nm,rec_add_tms
929719 Calidad Home Health & Hospice 1600 E Expy 83 La Feria TX 78559 TX - Cameron www.calidadhh.com/Home Health Agency Corporation CDP_USER 2021-11-22 07:39:57
Output should be -
hospital_id,hospital_name,hq_address,hq_address1,hq_city,hq_state,hq_zip_code,hq_county,website,firm_type,rec_add_user_nm,rec_add_tms
929719 Calidad Home Health & Hospice 1600 E Expy 83 La Feria TX 78559 TX - Cameron www.calidadhh.com/\ Home Health Agency Corporation CDP_USER 2021-11-22 07:39:57