I'm trying to load data to a Kudu table but getting a strange result.

In the Impala console I created an external table from the four HDFS files imported by Sqoop:

drop table if exists hdfs_datedim;
create external table hdfs_datedim
( ... )
row format
 delimited fields terminated by ','
location '/user/me/DATEDIM';

A SELECT COUNT(*) tells me there are lots of rows present. The data looks good when queried.

I use a standard INSERT INTO ... SELECT to copy the results:

INSERT INTO impala_kudu.DATEDIM
SELECT * FROM hdfs_datedim;

A SELECT COUNT(*) tells me impala_kudu.DATEDIM has four rows (the number of files in HDFS, not the number of rows in the table).

Any Ideas?

Jay
  • Can you do a `select * from hdfs_datedim limit 10` to see if the result is indeed in the correct form? – Amos Dec 20 '17 at 01:53
  • Yes. 'Select Count(*)' returns 17,000 not four. 'Select * ... limit 10' returns ten rows that look perfect. I thought the same thing too. The source table appears correct but I'm so inexperienced I could easily be wrong – Jay Dec 20 '17 at 13:48
  • Does this only happen to kudu tables? Sounds like a bug to me. – Amos Dec 20 '17 at 13:50
  • I'll try other sources of data. Good suggestions – Jay Dec 20 '17 at 19:37

3 Answers

Sqoop does not currently support Kudu. You can import to HDFS and then use Impala to write the data to Kudu.

  • I tried that as well. It gives me a generic "permission denied" message. Unfortunately I don't know which permission was denied. Another user claimed it gives that error message for any sort of failure. – Jay Mar 26 '18 at 16:52
Under the covers, the data created by Sqoop was a sequence of poorly formatted CSV files. The import failed without an error because of bad data in the flat files. Watch out for date formats and for text strings with the delimiter embedded in the string.
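One way to catch this before loading (a hedged sketch, not from the original post; the sample rows and the expected column count are assumptions) is to split each line on the delimiter the same naive way and flag any row whose field count disagrees with the table's column count, which is the symptom an embedded comma produces:

```python
# Assumption: DATEDIM has 5 columns; adjust EXPECTED_FIELDS to your table.
EXPECTED_FIELDS = 5

def find_bad_rows(lines, expected=EXPECTED_FIELDS, delimiter=","):
    """Return (line_number, field_count) for rows whose naive
    delimiter split disagrees with the expected column count --
    a symptom of delimiters embedded in string values."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected:
            bad.append((lineno, len(fields)))
    return bad

sample = [
    "20171220,2017,12,20,Wednesday\n",        # clean row
    "20171221,2017,12,21,Thu, pre-holiday\n"  # embedded comma -> 6 fields
]
print(find_bad_rows(sample))  # -> [(2, 6)]
```

Running this over the Sqoop output files (e.g. after `hdfs dfs -cat`) shows exactly which lines Impala will mis-parse.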

Jay
If you have the data in HDFS in CSV/Avro/Parquet format, then you can use the command below to import the files into a Kudu table.

Prerequisites: Kudu jar with compatible version (1.6 or higher)

spark2-submit \
  --master yarn/local \
  --class org.apache.kudu.spark.tools.ImportExportFiles \
  <path of kudu jar>/kudu-spark2-tools_2.11-1.6.0.jar \
  --operation=import \
  --format=<parquet/avro/csv> \
  --master-addrs=<kudu master host>:<port number> \
  --path=<hdfs path for data> \
  --table-name=impala::<table name>
Subash