I have successfully installed Kudu on Ubuntu (Trusty) following the official Kudu documentation (see http://kudu.apache.org/docs/installation.html ). The setup has one node running the master and a tablet server, and another node running a tablet server only.

I am having issues installing impala-kudu without Cloudera Manager on the node running the Kudu master. I followed the CDH installation instructions on this page (see http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_cdh5_install.html ) up to Step 3. I skipped installing CDH with YARN and MRv1 because I don't need to run any MapReduce jobs and will not be using Hadoop. impala-kudu and impala-kudu-shell installed without errors. When I launch impala-shell, it returns:

Starting Impala Shell without Kerberos authentication
Error connecting: TTransportException, Could not connect to kudu_test:21000
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.7.0-cdh5-IMPALA_KUDU-cdh5 (48f1ad3) built on Thu Aug 18 12:15:44 PDT 2016)

Want to know what version of Impala you're connected to? Run the VERSION command to
find out!
***********************************************************************************
[Not connected] > 

I have tried to use the CONNECT option to connect to the kudu-master node, without success. Both impala-kudu and Kudu are running on the same machine. Are there additional configuration settings that need to be changed, or are Hadoop and YARN a strict requirement to make impala-kudu work?

After running ps -ef | grep -i impalad I can confirm the Impala daemon is not running. In the Impala logs at /var/log/impala I found a few error and warning files. Here is the output of impalad.ERROR:

Log file created at: 2016/09/13 13:26:24
Running on machine: kudu_test
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0913 13:26:24.084389  3021 logging.cc:118] stderr will be logged to this file.
E0913 13:26:25.406966  3021 impala-server.cc:249] Currently configured default filesystem: LocalFileSystem. fs.defaultFS (file:///) is not supported.
ERROR: block location tracking is not properly enabled because
  - dfs.datanode.hdfs-blocks-metadata.enabled is not enabled.
  - dfs.client.file-block-storage-locations.timeout.millis is too low. It should be at least 10 seconds.

E0913 13:26:25.406990  3021 impala-server.cc:252] Aborting Impala Server startup due to improper configuration. Impalad exiting.

Maybe I need to revisit HDFS and the Hive Metastore to ensure I have these services configured properly?

  • What command did you use to start impala? Did you check the output of that command or the impalad logs for error messages? – Zoltan Sep 12 '16 at 12:10
  • I used `impala-shell` to start Impala. That command produces the output shown above. In the impalad logs at /var/log/impala there are a few errors and warnings, but they are mostly duplicates. I will update the post above to reflect this. –  Sep 13 '16 at 13:19

1 Answer

According to the log, impalad quits because the default filesystem is configured to be LocalFileSystem, which is not supported. You have to set a distributed filesystem, such as HDFS, as the default.
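
As a rough illustration of what the log is complaining about, here is a minimal Python sketch that checks a Hadoop-style configuration for the three settings named in impalad.ERROR. The property names and the 10-second threshold come from the log itself; the function names, the namenode host and port in the sample, and the idea of merging everything into one XML document are my own assumptions for the example (these properties normally live in core-site.xml and hdfs-site.xml). It is not how Impala performs the check internally.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def impala_startup_problems(conf):
    """List the problems from the impalad.ERROR log that this config would hit."""
    problems = []

    # The log shows file:/// as the value in effect when nothing is configured.
    default_fs = conf.get("fs.defaultFS", "file:///")
    if default_fs.startswith("file:"):
        problems.append("fs.defaultFS (%s) is a local filesystem" % default_fs)

    enabled = conf.get("dfs.datanode.hdfs-blocks-metadata.enabled", "false")
    if enabled.lower() != "true":
        problems.append("dfs.datanode.hdfs-blocks-metadata.enabled is not enabled")

    # "At least 10 seconds" in the log, i.e. 10000 milliseconds.
    timeout = conf.get("dfs.client.file-block-storage-locations.timeout.millis", "0")
    if not timeout.isdigit() or int(timeout) < 10000:
        problems.append(
            "dfs.client.file-block-storage-locations.timeout.millis is below 10000"
        )
    return problems


# Hypothetical sample: the namenode host and port are placeholders.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.file-block-storage-locations.timeout.millis</name>
    <value>10000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for problem in impala_startup_problems(read_hadoop_conf(SAMPLE)):
        print(problem)
```

Running the same check against an empty configuration reports all three problems, which matches the situation in the question where no HDFS was set up at all.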

Although Kudu is a separate storage system and does not rely on HDFS, Impala still seems to require a non-local default FS even when used with Kudu. The Impala_Kudu documentation explicitly lists the following requirement:

Before installing Impala_Kudu, you must have already installed and configured services for HDFS (though it is not used by Kudu), the Hive Metastore (where Impala stores its metadata), and Kudu.

I can even imagine that HDFS is not really needed for any reason other than to make Impala happy, but this is just speculation on my part. Update: I found IMPALA-1850, which confirms my suspicion that HDFS should no longer be needed for Impala, but it's not just a single check that has to be removed.

Zoltan
  • Thank you, Zoltan. I will try to install it and see how I go. –  Sep 14 '16 at 00:08
  • Zoltan, that seems to have worked. Could you explain why HDFS is required and how Kudu uses it (if at all)? –  Sep 18 '16 at 13:27
  • Hi GNettlefold, Kudu does not need HDFS, but apparently Impala does. I suspect that it isn't really necessary and is probably only required for historical reasons (i.e. before Kudu support was added, it didn't make any sense not to use a remote FS). I extended my answer with this information. – Zoltan Sep 18 '16 at 17:29
  • Fair enough. Thanks –  Sep 19 '16 at 07:01
  • Another question for you: do Kudu and impala-kudu need to run on the same machine, or can they be separate and connected over the network? –  Sep 24 '16 at 10:50
  • I'm sorry, I don't know the answer to your last question. My guess would be that they can be separate, but I don't know what the performance implications are. – Zoltan Sep 26 '16 at 13:35