
I am trying to create a saved incremental Sqoop job, using the Sqoop metastore as the repository. (For security reasons, "xyz" is the Unix user running the job and SAMPLE_TABLE is used as a reference table name.)

sqoop job -Dhadoop.security.credential.provider.path=jceks://path/to/pwd \
--meta-connect "jdbc:hsqldb:file:/tmp/sqoop-meta.db;shutdown=true" \
--create sample_job -- import -Dmapred.job.queue.name=realtime \
--connect jdbc:mysql://hostname/db?zeroDateTimeBehavior=convertToNull \
--driver com.mysql.jdbc.Driver \
--table SAMPLE_TABLE -m 1 \
--username tony \
--password-alias pwd \
--incremental lastmodified \
--check-column ts \
--last-value 2018-04-24 \
--target-dir /some/tmp/location/ \
--hive-import \
--map-column-hive XYZ=tinyint \
--null-string '' \
--null-non-string ''

I am using a local Sqoop metastore, and the same metastore is referenced when executing the stored job:

sqoop job --meta-connect "jdbc:hsqldb:file:/tmp/sqoop-meta.db;shutdown=true" -exec sample_job
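
As a sanity check, the stored job can be confirmed in the metastore with the standard --list and --show options of sqoop job (same meta-connect string as above):

sqoop job --meta-connect "jdbc:hsqldb:file:/tmp/sqoop-meta.db;shutdown=true" --list
sqoop job --meta-connect "jdbc:hsqldb:file:/tmp/sqoop-meta.db;shutdown=true" --show sample_job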

Sqoop runs fine, and I can see data getting pulled to the HDFS location. However, at the end I get the error below. I can't seem to find a fix for it anywhere. I tried giving full permissions to the _sqoop directory, but the problem persists.

    Logging initialized using configuration in jar:file:/usr/hdp/2.6.4.0-91/hive/lib/hive-common-1.2.1000.2.6.4.0-91.jar!/hive-log4j.properties
    OK
    Time taken: 3.028 seconds
    FAILED: SemanticException Line 2:17 Invalid path ''hdfs://hostname/user/xyz/_sqoop/ad240610c282442cb65b61b14f582935_sample_table'': No files matching path hdfs://hostname/user/xyz/_sqoop/ad240610c282442cb65b61b14f582935_sample_table

But the above path is not present in the _sqoop location. Instead, every time I run the Sqoop operation, I see a new, differently named directory created under _sqoop/:

$ hadoop fs -ls /user/xyz/_sqoop/
drwxrwxrwx   - xyz hdfs          0 2018-04-26 16:31 /user/xyz/_sqoop/1ab0f2de2ace4984b7b978af5f8f8336_sample_table
drwxr-xr-x   - xyz hdfs          0 2018-04-27 10:50 /user/xyz/_sqoop/3aedb9fc0857433f8388ae2c70019d93_sample_table
drwxrwxrwx   - xyz hdfs          0 2018-04-26 16:54 /user/xyz/_sqoop/6dfb90bcf7854f5e82efebbcd317cabe_sample_table
drwxrwxrwx   - xyz hdfs          0 2018-04-26 15:29 /user/xyz/_sqoop/a59b80e855734c35b785bdc35dcbd5a6_sample_table
drwxrwxrwx   - xyz hdfs          0 2018-04-26 14:30 /user/xyz/_sqoop/b77d4b88140e469daa5aa0962b56b562_sample_table
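
(If it helps anyone reproducing this: the stale directories can be cleared manually between runs with plain HDFS housekeeping, path as in the listing above; this does not make the error go away, though.)

hadoop fs -rm -r /user/xyz/_sqoop/*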

Can someone shed some light on what exactly is happening here? I'm fairly new to this field, and this is my first time launching Sqoop incremental jobs using the metastore.

StrangerThinks
  • On further inspection, I see the folder created actually contains the sqooped data. E.g., on the last run I manually edited a row in the source and found that row in the /_sqoop/_table_name HDFS directory. Somehow, Sqoop is pulling that data and merging it into the target-dir. But why is it throwing the error? – StrangerThinks Apr 27 '18 at 20:41
  • `_sqoop` is a temporary directory created to store temporary data. It looks like you have not provided sufficient permissions on your `hdfs` home directory. Can you try granting permission to `create, read, write, delete` files on your `hdfs` home directory? – Sandeep Singh Apr 28 '18 at 04:47
  • You can try `chmod -R 755 /user/xyz` – Sandeep Singh Apr 28 '18 at 04:52
  • @SandeepSingh Hi. Sorry I couldn't test any sooner because I was away from my workstation. I checked: /user/xyz already has 755 permissions set. Any other insight? – StrangerThinks Apr 30 '18 at 13:40
  • UPDATE: I tried running the Sqoop operation without the metastore, just a plain sqoop import command with the incremental lastmodified clause and a merge-key. The problem still exists; at the end I get the same error. – StrangerThinks Apr 30 '18 at 18:08
  • Logging initialized using configuration in jar:file:/usr/hdp/2.6.4.0-91/hive/lib/hive-common-1.2.1000.2.6.4.0-91.jar!/hive-log4j.properties OK Time taken: 2.92 seconds FAILED: SemanticException Line 2:17 Invalid path ''hdfs://hostname/user/xyz/_sqoop/55cc1038f2924cc398e5e014061eb0f2_sample_table'': No files matching path hdfs://hostname/user/xyz/_sqoop/55cc1038f2924cc398e5e014061eb0f2_sample_table – StrangerThinks Apr 30 '18 at 18:08

1 Answer


I dug through the official docs and Hortonworks resources, and was finally able to solve the issue by removing the parameters below from the Sqoop command:

--hive-import \
--map-column-hive XYZ=tinyint \

I realized the tinyint conversion can be skipped in this case by letting the column come through as a string. When --map-column-hive is used, Sqoop needs the --hive-import clause as well, and that is where it was failing, because Hive then expects an underlying staging file every time the table is sqooped. Once both the --hive-import clause and the --map-column-hive parameter were removed, Sqoop finished successfully and was able to update the metastore with the new last-value.
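
For reference, here is a sketch of the trimmed job definition that worked for me, i.e. the original command from the question minus those two lines:

sqoop job -Dhadoop.security.credential.provider.path=jceks://path/to/pwd \
--meta-connect "jdbc:hsqldb:file:/tmp/sqoop-meta.db;shutdown=true" \
--create sample_job -- import -Dmapred.job.queue.name=realtime \
--connect jdbc:mysql://hostname/db?zeroDateTimeBehavior=convertToNull \
--driver com.mysql.jdbc.Driver \
--table SAMPLE_TABLE -m 1 \
--username tony \
--password-alias pwd \
--incremental lastmodified \
--check-column ts \
--last-value 2018-04-24 \
--target-dir /some/tmp/location/ \
--null-string '' \
--null-non-string ''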

StrangerThinks