I tried to configure Impala to run on top of Alluxio, but it failed.

Here are the Impala configurations:

/etc/impala/conf/core-site.xml (from http://www.alluxio.org/docs/1.6/en/Running-Hadoop-MapReduce-on-Alluxio.html)

<configuration>
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
  <description>The Alluxio FileSystem (Hadoop 1.x and 2.x)</description>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
  <description>The Alluxio AbstractFileSystem (Hadoop 2.x)</description>
</property>
</configuration>

/etc/impala/conf/hive-site.xml (from http://www.alluxio.org/docs/1.6/en/Running-Hive-with-Alluxio.html)

<property>
   <name>fs.defaultFS</name>
   <value>alluxio://master_hostname:port</value>
</property>

Then I started Impala (impala-server, impala-catalogd, impala-state-store), but found this in the log:

...impala-server.cc:282] Currently configured default file system: FileSystem. fs.defaultFS (alluxio://192.168.1.10:19998/) is not supported.
...impala-server.cc:285] Aborting Impala Server startup due to improper configuration. Impalad exiting.

I have searched a lot on Bing with no luck; there are very few results for the keywords 'impala on alluxio'. Can Impala run on top of Alluxio? Any suggestions will be appreciated.

My Impala version: 2.10.0-cdh5.13.0 RELEASE, Alluxio version: alluxio-1.8.0-hadoop-2.7

dtolnay
Allen Xu
  • Impala is a C++ application and does not use the same HDFS client lib as regular Hadoop components (Java apps). Unless Alluxio has a specific how-to that explains how to plug Impala on specific libs, you are screwed. – Samson Scharfrichter Sep 28 '18 at 07:52
  • Based on a search of the Impala repo, I found that the log message comes from https://github.com/cloudera/Impala/blob/cdh5.13.0-release/fe/src/main/java/org/apache/impala/service/JniFrontend.java. It appears that Impala checks the default file system in the config file, finds that Alluxio's FileSystem implementation is not one of DistributedFileSystem, S3AFileSystem or AdlFileSystem, and so impalad fails to start. – Allen Xu Sep 28 '18 at 09:37

1 Answer

Have you tried running Hive with external tables on Alluxio? Instead of setting Alluxio as defaultFS, remove

<property>
   <name>fs.defaultFS</name>
   <value>alluxio://master_hostname:port</value>
</property>

and use something like the following to create a table on Alluxio:

hive> CREATE TABLE u_user (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master_hostname:port/table_path';

That might help work around Impala's filesystem implementation check. Also, there is a bug in CDH 5.13 and below that prevents Impala from reading data in Alluxio. You might want to upgrade to CDH 5.14, which fixed that issue.
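
To confirm that the fs.defaultFS entry is really gone before restarting impalad, a small pre-flight check along these lines may help. This is only a sketch: the function names are hypothetical, and the set of accepted schemes is an assumption inferred from the class names mentioned in the comments (DistributedFileSystem, S3AFileSystem, AdlFileSystem), not a list taken from Impala itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumption: schemes guessed from the class names quoted in the comments
# (DistributedFileSystem, S3AFileSystem, AdlFileSystem); not an official list.
ASSUMED_SUPPORTED_SCHEMES = {"hdfs", "s3a", "adl"}


def read_default_fs(config_xml):
    """Return the fs.defaultFS value from Hadoop-style XML text, or None."""
    root = ET.fromstring(config_xml)
    # iter() also visits the root, so a bare <property> snippet works too.
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            return prop.findtext("value", "").strip()
    return None


def default_fs_problem(config_xml):
    """Return a warning string if fs.defaultFS looks unsupported, else None."""
    default_fs = read_default_fs(config_xml)
    if default_fs is None:
        return None  # nothing set in this file, so nothing to flag here
    scheme = urlparse(default_fs).scheme
    if scheme not in ASSUMED_SUPPORTED_SCHEMES:
        return f"fs.defaultFS uses scheme '{scheme}' ({default_fs})"
    return None
```

It could be run against the text of /etc/impala/conf/hive-site.xml and /etc/impala/conf/core-site.xml; a non-None result means the file still sets a default file system that the startup check would probably reject.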

Bin Feng
  • Please provide the elemental parts of your answer in the text, as links may expire. – leonardkraemer Oct 01 '18 at 18:22
  • Yes I have tried Hive on Alluxio and it worked well. I noticed that our Impala version is too old. When the newer version is available, I will give it another try. Thanks. @binfeng – Allen Xu Oct 10 '18 at 07:17