0

I am trying to load pipe separated csv file in hive table using python without success. Please assist.

Full code:

from pyhive import hive 
host_name = "192.168.220.135" 
port = 10000 
user = "cloudera" 
password = "cloudera" 
database = "default" 
conn = hive.Connection(host=host_name, port=port, username=user, database=database) 
print('Connected to DB: {}'.format(host_name)) 
cursor = conn.cursor() 
Query = """LOAD DATA LOCAL inpath '/home/cloudera/Desktop/ccna_test/RERATING_EMMCCNA.csv' INTO TABLE python_testing fields terminated by '|' lines terminated by '\n' """ 
cursor.execute(Query)
user8834780
  • 1,620
  • 3
  • 21
  • 48
waleed khalid
  • 11
  • 1
  • 2
  • 2
    It is not clear what you are asking I am afraid. – Dean Dec 21 '18 at 11:23
  • i want wo upload pipe separated csv in hive table using python code.. from pyhive import hive host_name = "192.168.220.135" port = 10000 user = "cloudera" password = "cloudera" database = "default" conn = hive.Connection(host=host_name, port=port, username=user, database=database) print('Connected to DB: {}'.format(host_name)) cursor = conn.cursor() Query = """LOAD DATA LOCAL inpath '/home/cloudera/Desktop/ccna_test/RERATING_EMMCCNA.csv' INTO TABLE python_testing fields terminated by '|' lines terminated by '\n' """ cursor.execute(Query) – waleed khalid Dec 21 '18 at 11:56
  • what is the error ? if not are you able to see files in HDFS location? – Gaurang Shah Dec 21 '18 at 19:33

1 Answers1

4

From your question, I assume the csv format is like below and you want a query to load data into hive table.

value1|value2|value3
value4|value5|value6
value7|value8|value9

First there should be a hive table and could be created using below query.

create table python_testing ( col1 string, col2 string, col3 string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' with SERDEPROPERTIES ( "separatorChar" = "|") stored as textfile;

Note that separator character and input file format is explicitly given on table creation.

Also table is stored in TEXTFILE format. This is due to the format of input file.

If you want ORC table, then input file should be in ORC format(Hive 'load data' command just copies the files to hive data files and does not do any transformations on data). A possible workaround is to create a temporary table with STORED AS TEXTFILE, LOAD DATA into it, and then copy data from this table to the ORC table.

Use 'load' command to load the data.

load data local inpath '/home/hive/data.csv' into table python_testing;

/home/hive/data.csv should be your file path.

For more details visit blog post - http://forkedblog.com/load-data-to-hive-database-from-csv-file/

arunkvelu
  • 1,631
  • 10
  • 21