1

I am trying to create a table in hive 3.0 using the following schema i found online:

    CREATE TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT< text : STRING, user : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
entities STRUCT< urls : ARRAY<STRUT<expanded_url : STRING>>,
user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
hashtags : ARRAY<STRUCT<text : STRING>>>,
text STRING,
user STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>, 
in_reply_to_screen_name STRING
) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JSONSerDe';

enter image description here

when i press enter the NoViableAltException. I am using hive first time have no experience can someone tell me what is wrong with the schema?

Shivam...
  • 409
  • 1
  • 8
  • 21

1 Answers1

1

User is Reserved keyword in case if we are using keywords in hive then we need to enclose the keyword with `(backticks)

Example:

`user`

Try with below create table statement

    CREATE TABLE tweets (
    id BIGINT,
    created_at STRING,
    source STRING,
    favorited BOOLEAN,
    retweeted_status STRUCT< text : STRING, `user` : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
    entities STRUCT< urls : ARRAY<STRUCT<expanded_url : STRING>>,
    user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
    hashtags : ARRAY<STRUCT<text : STRING>>>,
    text STRING,
    `user` STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>, 
    in_reply_to_screen_name STRING
    ) 
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    Location '/user/flume/tweets/';

I am able to create table with above ddl:

desc tweets;
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
|         col_name         |                                                                     data_type                                                                     |      comment       |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
| id                       | bigint                                                                                                                                            | from deserializer  |
| created_at               | string                                                                                                                                            | from deserializer  |
| source                   | string                                                                                                                                            | from deserializer  |
| favorited                | boolean                                                                                                                                           | from deserializer  |
| retweeted_status         | struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int>                                                                 | from deserializer  |
| entities                 | struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>>   | from deserializer  |
| text                     | string                                                                                                                                            | from deserializer  |
| user                     | struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string>  | from deserializer  |
| in_reply_to_screen_name  | string                                                                                                                                            | from deserializer  |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+

UPDATE:

Hive works as schema on read when we run select statement hive looks for files in the directory where table is pointing(/user/hive/warehouse/tweets/) to then read those data as per your ddl statement but in this case data is not present in the directory so select statement is not returning any records.

To Fix this issue:

Option1. Move the data from /user/flume/tweets/ to /user/hive/warehouse/tweets/ directory then you are able to select the data from the table.

`hadoop fs -mv /user/flume/tweets/  /user/hive/warehouse/tweets/`

(or)

Option2. We need to create hive table on top of /user/flume/tweets/ this directory then you are able to see the data in tweets table(use the above create table statement for that).

notNull
  • 30,258
  • 4
  • 35
  • 50
  • i tried and got another error: FAILED: ParseException line 11:0 cannot recognize input near 'user' 'STRUCT' '<' in column name or constraint – Shivam... Oct 07 '18 at 15:40
  • @ShivamSingh, try with the **updated create table statement**. i tried and able to create table on my end. – notNull Oct 07 '18 at 16:24
  • thanks a lot bro, i got it! <3 can you tell what was wrong? – Shivam... Oct 07 '18 at 16:29
  • @ShivamSingh, 1. There is one spelling mistake for struct in **entities column**, 2. Sede is case sensitive **JsonSerDe** but you are having **JSONSerDe**. – notNull Oct 07 '18 at 16:41
  • Oh ok.Thanks :) – Shivam... Oct 07 '18 at 16:44
  • I loaded the data stored using flume in this table but when i run select * from tweets i get no results displayed, could you tell what is wrong? – Shivam... Oct 07 '18 at 17:13
  • @ShivamSingh, Make sure your table is created on top of the HDFS directory that you have stored from flume. If that doesn't worked please Update your question with some sample data. – notNull Oct 07 '18 at 17:29
  • My flume data is in directory /user/flume/tweets/. And hive table is in /user/hive/warehouse/tweets – Shivam... Oct 07 '18 at 17:43
  • @ShivamSingh, please check my edited answer's Update section. – notNull Oct 07 '18 at 18:06
  • https://stackoverflow.com/questions/52702996/data-moved-from-hdfs-to-hive-directory-on-performing-load-operation-on-hive-tabl can you check this? – Shivam... Oct 08 '18 at 18:07