0

How can I make the following table source delimiter by one or more white spaces:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
LOCATION '/data/Weather';

For example, the number of spaces between USAF and WBAN is just one space, while between 'Date' to DIR there are 3 spaces.

I have tried the following:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "\s+"
)
STORED AS TEXTFILE
LOCATION '/data/Weather';

but when running simple query like select * from weather limit 100; I got the following error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: Number of matching groups doesn't match the number of columns
Time taken: 2.958 seconds
dtolnay
  • 9,621
  • 5
  • 41
  • 62
abutmah
  • 63
  • 1
  • 3
  • 9
  • my suggestion is, you have to clean your data to have necessary columns or change the delimiter. In your case having space will not work because having regex will count all the continuous space so the columns will not match. – Sathiyan S Dec 13 '16 at 10:43

1 Answers1

0

Use multi-delimiter serde

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="\\s+")

Your final query may look as below:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="\\s+")
STORED AS TEXTFILE
LOCATION '/data/Weather';

Hope this help you!!!

Farooque
  • 3,616
  • 2
  • 29
  • 41