I'm trying to load data to a Kudu table but getting a strange result.

In the Impala console I created an external table from the four HDFS files imported by Sqoop:

drop table if exists hdfs_datedim;
create external table hdfs_datedim
( ... )
row format
 delimited fields terminated by ','
location '/user/me/DATEDIM';

A SELECT COUNT(*) tells me there are lots of rows present. The data looks good when queried.

I use a standard INSERT INTO ... SELECT to copy the results:

INSERT INTO impala_kudu.DATEDIM
SELECT * FROM hdfs_datedim;

A SELECT COUNT(*) tells me impala_kudu.DATEDIM has four rows (the number of files in HDFS, not the number of rows in the table).

Any Ideas?

Jay
  • Can you do a `select * from hdfs_datedim limit 10` to see if the result is indeed in the correct form? – Amos Dec 20 '17 at 01:53
  • Yes. 'Select Count(*)' returns 17,000 not four. 'Select * ... limit 10' returns ten rows that look perfect. I thought the same thing too. The source table appears correct but I'm so inexperienced I could easily be wrong – Jay Dec 20 '17 at 13:48
  • Does this only happen to kudu tables? Sounds like a bug to me. – Amos Dec 20 '17 at 13:50
  • I'll try other sources of data. Good suggestions – Jay Dec 20 '17 at 19:37

3 Answers

Sqoop does not currently support Kudu. You can import to HDFS and then use Impala to write the data to Kudu.

  • I tried that as well. It gives me a generic "permission denied" message. Unfortunately I don't know which permission was denied. Another user claimed it gives that error message for any sort of failure. – Jay Mar 26 '18 at 16:52
Under the covers, the data created by Sqoop was a sequence of poorly formatted CSV files. The import failed without an error because of bad data in the flat files. Watch out for date formats and for text strings with the delimiter embedded in the string.
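One way to catch this before loading (a hedged sketch, not from the original post; the sample rows and the expected column count are assumptions) is to split each line on the delimiter the same naive way and flag any row whose field count disagrees with the table's column count, which is the symptom an embedded comma produces:

```python
# Assumption: DATEDIM has 5 columns; adjust EXPECTED_FIELDS to your table.
EXPECTED_FIELDS = 5

def find_bad_rows(lines, expected=EXPECTED_FIELDS, delimiter=","):
    """Return (line_number, field_count) for rows whose naive
    delimiter split disagrees with the expected column count --
    a symptom of delimiters embedded in string values."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != expected:
            bad.append((lineno, len(fields)))
    return bad

sample = [
    "20171220,2017,12,20,Wednesday\n",        # clean row
    "20171221,2017,12,21,Thu, pre-holiday\n"  # embedded comma -> 6 fields
]
print(find_bad_rows(sample))  # -> [(2, 6)]
```

Running this over the Sqoop output files (e.g. after `hdfs dfs -cat`) shows exactly which lines Impala will mis-parse.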

Jay
If you have the data in HDFS in CSV/Avro/Parquet format, then you can use the command below to import the files into a Kudu table.

Prerequisites: Kudu jar with compatible version (1.6 or higher)

spark2-submit \
  --master yarn/local \
  --class org.apache.kudu.spark.tools.ImportExportFiles \
  <path of kudu jar>/kudu-spark2-tools_2.11-1.6.0.jar \
  --operation=import \
  --format=<parquet/avro/csv> \
  --master-addrs=<kudu master host>:<port number> \
  --path=<hdfs path for data> \
  --table-name=impala::<table name>
Subash