
On HDP 2.3.4, using Oozie 4.2.0 and Sqoop 1.4.6, I'm trying to create a coordinator app that will execute Sqoop jobs on a daily basis. I need the sqoop action to execute saved jobs because these are incremental imports.

I've configured sqoop-site.xml and started the sqoop-metastore, and I'm able to create, list, and delete jobs via the command line, but the workflow fails with the error: Cannot restore job: streamsummary_incremental
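For reference, this is roughly how the saved job is managed from the command line (a sketch: the connect string, table, and check column shown here are hypothetical placeholders, not my actual values):

```shell
# Create a saved incremental-import job in the shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:12345/sqoop \
  --create streamsummary_incremental \
  -- import \
  --connect jdbc:mysql://db-host/mydb \
  --table stream_summary \
  --incremental append \
  --check-column id \
  --last-value 0

# Verify, run, or remove it
sqoop job --list
sqoop job --exec streamsummary_incremental
sqoop job --delete streamsummary_incremental
```

All of these work when run manually on the node where sqoop-site.xml is configured.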

stderr

Sqoop command arguments :
             job
             --exec
             streamsummary_incremental
Fetching child yarn jobs
tag id : oozie-26fcd4dc0afd8f53316fc929ac38eae2
2016-02-03 09:46:47,193 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at <myHost>/<myIP>:8032
Child yarn jobs are found - 
=================================================================

>>> Invoking Sqoop command line now >>>

2241 [main] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2016-02-03 09:46:47,404 WARN  [main] tool.SqoopTool (SqoopTool.java:loadPluginsFromConfDir(177)) - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2263 [main] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6.2.3.4.0-3485
2016-02-03 09:46:47,426 INFO  [main] sqoop.Sqoop (Sqoop.java:<init>(97)) - Running Sqoop version: 1.4.6.2.3.4.0-3485
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - Cannot restore job: streamsummary_incremental
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(254)) - Cannot restore job: streamsummary_incremental
2552 [main] ERROR org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage  - (No such job)
2016-02-03 09:46:47,715 ERROR [main] hsqldb.HsqldbJobStorage (HsqldbJobStorage.java:read(255)) - (No such job)
2553 [main] ERROR org.apache.sqoop.tool.JobTool  - I/O error performing job operation: java.io.IOException: Cannot restore missing job streamsummary_incremental
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.read(HsqldbJobStorage.java:256)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:198)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
    at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:197)
    at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:177)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
    at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:241)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

sqoop-site.xml

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://<myhost>:12345</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>SA</value>
    <description>The username to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value></value>
    <description>The password to bind to the metastore.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>12345</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

workflow.xml

  <action name="sqoop-import-job">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${outputDir}"/>
      </prepare>
      <arg>job</arg>
      <arg>--exec</arg>
      <arg>${jobId}</arg>
    </sqoop>
    <ok to="hive-load"/>
    <error to="kill-sqoop"/>
  </action>

Additional info:

  • We're only running a single-node cluster.
  • Only Sqoop Client is installed.

I'm thinking maybe Oozie isn't able to connect to the metastore because we don't have a Sqoop server. Could anyone confirm this? If not, what else could I have missed?

Thanks!

Ton Torres
  • The first thing that Sqoop complains about is `$SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.` Did you try to set it somehow? – Samson Scharfrichter Feb 03 '16 at 07:35
  • 1
    Also, did you try to put the Metastore URL as a Sqoop argument *(in command-line then in Oozie)*, and see if that works? And did you connect to the Metastore via Squirrel SQL or sthg like that, to check that your tests were really against *that* Metastore and not against the default (a local file)? – Samson Scharfrichter Feb 03 '16 at 07:38
  • @SamsonScharfrichter no, I didn't try to set `$SQOOP_CONF_DIR`. This seems to be a Hortonworks thing, because none of the usual Hadoop environment variables seem to be set. I'm not sure if this affects anything other than that warning though, as I'm able to run Sqoop normally via command line. – Ton Torres Feb 03 '16 at 07:45
  • @SamsonScharfrichter I haven't tried your suggestion of providing the Metastore URL as a command-line argument, though I did try configuring MySQL to be the metastore following this [guide](http://techuniqe.blogspot.com/2014/10/sqoop-mysql-metastore.html) and upon creating the jobs, I checked and found that the jobs were in MySQL. – Ton Torres Feb 03 '16 at 07:47
  • Side note: I don't really get what you mean by *"This seems to be a Hortonworks thing"* -- a quick search on Google will show a few hits of this error message with Cloudera clusters. But no mention of the variable itself. Might be a rogue message label left over from an unreleased patch... – Samson Scharfrichter Feb 03 '16 at 21:39
  • Where exactly did you update your `oozie-site.xml` -- on **every node of the cluster**? Or just on an edge node?? >> remember that Oozie will run the Sqoop command inside a YARN container *(i.e. a kind of Virtual Machine)* on a node that maybe you have no way to connect to and check how it was configured. – Samson Scharfrichter Feb 03 '16 at 21:46
  • Quoting the Oozie documentation: *"Sqoop configuration can be specified with a file, using the job-xml element"* >> upload your conf file on HDFS somewhere and tell Oozie to inject the contents in the Sqoop config with `/some/where/on/hdfs/sqoop-site.xml` – Samson Scharfrichter Feb 03 '16 at 21:49
  • @SamsonScharfrichter I finally tried your suggestion with regards to explicitly passing the metastore URL in the Oozie workflow and it works as expected now. After reading through your other comments, I suppose that Oozie tries to connect to a local metastore because it doesn't have a copy of `sqoop-site.xml` and doesn't know the autoconnect url – Ton Torres Feb 04 '16 at 01:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102541/discussion-between-ton-torres-and-samson-scharfrichter). – Ton Torres Feb 04 '16 at 01:42

1 Answer


I managed to resolve this issue with the help of @SamsonScharfrichter in the comments. I explicitly passed the metastore URL in the Oozie workflow and it worked:

<arg>job</arg>
<arg>--meta-connect</arg>
<arg>jdbc:hsqldb:hsql://<myhost>:12345/sqoop</arg>
<arg>--exec</arg>
<arg>myjob</arg>

It seems that Oozie tries to connect to a local metastore because the launcher doesn't have a copy of sqoop-site.xml, and so it doesn't know the autoconnect URL (even though I'm running a single-node configuration).
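Alternatively, as suggested in the comments, the Oozie documentation says Sqoop configuration can be supplied via the `job-xml` element. A sketch of that approach (the HDFS path is a hypothetical example; upload your own sqoop-site.xml there first):

```xml
<action name="sqoop-import-job">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Inject sqoop-site.xml (with the metastore autoconnect URL)
         into the Sqoop configuration inside the YARN container -->
    <job-xml>/user/oozie/conf/sqoop-site.xml</job-xml>
    <arg>job</arg>
    <arg>--exec</arg>
    <arg>${jobId}</arg>
  </sqoop>
  <ok to="hive-load"/>
  <error to="kill-sqoop"/>
</action>
```

I haven't verified this variant myself, but it should avoid hard-coding the metastore URL in every action.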

Ton Torres