16

Using a Cygwin distribution, I've installed Hadoop 0.20.3 and Hive 0.11.0.

First of all, I don't understand how to use the Hive CLI:

hive> show tables;

Then enter and nothing happens. I can execute queries using hive -e/-f.

Then, I've created a table:

CREATE TABLE tweet_table(
tweet STRING
)
COMMENT 'Table of string'

But how can I insert data into this table? I see some INSERT INTO examples but when I try:

INSERT INTO TABLE tweet_table (tweet) VALUES ("data")

I've got an error:

FAILED: ParseException line 1:30 cannot recognize input near '(' 'tweet' ')' in select clause

How can I append data in my table?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Apaachee
  • 900
  • 2
  • 10
  • 32

5 Answers5

43

You can insert new data into table by two ways.

  1. Load the data of a file into table using load command.

    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename.
    
  2. You can insert new data into table by using select query.

    INSERT INTO table tablename1 select columnlist FROM secondtable;
    
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Balaswamy Vaddeman
  • 8,360
  • 3
  • 30
  • 40
  • 1
    So the LOAD ways for news data and the insert INTO for data already existing in another table ? – Apaachee Jun 07 '13 at 11:17
  • If I have a file where one line = one row (STRING tweet) of my Hive table, how can i insert my data properly ? – Apaachee Jun 07 '13 at 11:20
  • you will have single column table so that you can load every line as a row – Balaswamy Vaddeman Jun 07 '13 at 11:22
  • Thanks Balaswamy. But 2 points are not clear for me. The file must be uploaded on the cluster in a first time to be loaded in a second time, i don't know how to do. And second point, I must store my data in a file every time I want to store data in Hive ? And if i have 3 load query per second...? Its not optimal I guess ? – Apaachee Jun 07 '13 at 11:28
  • that is the process of hive as per my understanding you need not upload your data to cluster you can even mention local so that local data is uploaded to hive.to answer second question yes it is files every time we are dealing with.but all these operations are done very fastly you rarely face performance issues here. – Balaswamy Vaddeman Jun 07 '13 at 11:35
5

Try to use this with single quotes in data:

insert into table test_hive values ('1','puneet');
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Puneet Sharma
  • 51
  • 1
  • 2
4

If you already have a table pre_loaded_tbl with some data. You can use a trick to load the data into your table with following query

INSERT INTO TABLE tweet_table 
  SELECT  "my_data" AS my_column 
    FROM   pre_loaded_tbl 
   LIMIT   5;

Also please note that "my_data" is independent of any data in the pre_loaded_tbl. You can select any data and write any column name (here my_data and my_column). Hive does not require it to have same column name. However structure of select statement should be same as that of your tweet_table. You can use limit to determine how many times you can insert into the tweet_table.

However if you haven't' created any table, you will have to load the data using file copy or load data commands in above answers.

René Nyffenegger
  • 39,402
  • 33
  • 158
  • 293
Animesh Raj Jha
  • 2,704
  • 1
  • 21
  • 25
2

If table is without partition then code will be,

Insert into table table_name select col_a,col_b,col_c from another_table(source table)

--here any condition can be applied such as limit, group by, order by etc...

If table is with partitions then code will be,

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert into table table_name partition(partition_col1, paritition_col2) select col_a,col_b,col_c,partition_col1,partition_col2 from another_table(source table)

--here any condition can be applied such as limit, group by, order by etc...

0

Although there is an accepted answer I would want to add that as of Hive 0.14, record level operations are allowed. The correct syntax and query would be:

INSERT INTO TABLE tweet_table VALUES ('data');