On a CentOS 7 VM, we installed SQL Server 2019 (mssql-server) following the instructions here: https://learn.microsoft.com/en-us/sql/linux/quickstart-install-connect-red-hat?view=sql-server-linux-ver15
We then installed mssql-server-polybase following the instructions here: https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-linux-setup?view=sql-server-ver15
After running a query against PolyBase, we received the following error message:
Msg 110813, Level 16, State 1, Line 26
The Remote Java Bridge has not been attached yet.
This is the query we attempted to run:
use testdb
GO
CREATE EXTERNAL DATA SOURCE [HadoopSouthMLI] WITH (TYPE = HADOOP, LOCATION = N'hdfs://servername:8020', RESOURCE_MANAGER_LOCATION = N'rm-servername:8032')
GO
CREATE EXTERNAL FILE FORMAT [parquetz_gz_file_format] WITH (FORMAT_TYPE = PARQUET, DATA_COMPRESSION = N'org.apache.hadoop.io.compress.GzipCodec')
GO
CREATE EXTERNAL TABLE [dbo].[test] (
[primary_key] nvarchar(64) NOT NULL,
[type] nvarchar(64) NOT NULL,
[track_key] nvarchar(64) NOT NULL,
[id_number] nvarchar(6),
[additional_id_number] nvarchar(6),
[score] float NULL,
[id] nvarchar(7),
)
WITH (LOCATION='/path/to/file/',
DATA_SOURCE = HadoopSouthMLI,
FILE_FORMAT = parquetz_gz_file_format
);
GO
The query works without issues on Windows machines running SQL Server 2017, so I don't believe there is a problem with the query itself.
After receiving the error above, I listed the other mssql-server packages available and found mssql-server-polybase-hadoop, which also pulls in two other packages: mssql-zulu-jre-11 and mssql-zulu-jre-8. We installed the package and restarted mssql-server, but we still receive "The Remote Java Bridge has not been attached yet" when attempting the connection.
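For reference, here is a minimal sketch of the instance-level checks that seem relevant, assuming the standard SERVERPROPERTY property and sp_configure option names from the PolyBase documentation. Whether any of these settings is related to the error is an assumption we have not confirmed:

```sql
-- Returns 1 if the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Show the current values of the PolyBase-related server options.
-- (Assumption: both options are visible without 'show advanced options'.)
EXEC sp_configure @configname = 'polybase enabled';
EXEC sp_configure @configname = 'hadoop connectivity';
```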
Are there any additional packages or configuration steps needed to attach the bridge so that this query will work?