1

I am connecting to an impala server to retrieve some data. However I want to change the length of the variable in the impala pass-through sql statement because otherwise it will retrieve the variable with a length of 32767 which is not optimized.

This is how I would do it in SAS:

proc sql;
    connect to impala (dsn="somedsn");
    create table want as
    select * 
    from connection to impala
        (select var1 length=50  
            from &disc_table.);
    disconnect from impala;
quit;

However, as IMPALA does not have the same sql statement structure, it does not work.

What is the correct statement to retrieve the var1 with a length=50 in IMPALA SQL statement ?

Kermit
  • 3,112
  • 2
  • 10
  • 34

2 Answers2

1

You could ask IMPALA to CAST the variable to one with the correct length.

... (select cast(var1 as varchar(50)) as var1 from &disc_table.) ...
Tom
  • 47,574
  • 2
  • 16
  • 29
  • The length is still 32767 using `(select cast(var1 as varchar(50)) as var1 from &disc_table.)` – Kermit Jan 25 '21 at 18:12
1

One option is to change the length in the SAS-side select:

proc sql;
    connect to impala (dsn="somedsn");
    create table want as
    select var1 length=50 
    from connection to impala
        (select var1 
            from &disc_table.);
    disconnect from impala;
quit;

You could also use the DBMAX_TEXT option I believe in the connection step, though I'm not familiar with Impala's options specifically.


Given the comments below, it looks like the problem is that Impala doesn't support CHAR/VARCHAR, and so you get 32767 length strings. Your solution might be to do both things Tom and I suggest, ie:

proc sql;
    connect to impala (dsn="somedsn");
    create table want as
    select var1 length=50 
    from connection to impala
        (select cast(var1 as varchar(50)) as var1 
           from &disc_table.);
    disconnect from impala;
quit;

Which might mean you get less data transferred back AND you get a smaller column; or you might add DBMAX_TEXT here; or you can also do as the kb article suggests and additionally add a limit in the ODBC connection options.

Another option is to use SAS DS2, though it looks like it still has the same issue ultimately (in that older versions of Impala don't support varchar/char). That document suggests upgrading to CDH 5.2 (which I think is a Cloudera implementation of Impala). I'm not sure if non-Cloudera versions support VARCHAR, though there are at least some mentions in the Impala change notes for version 3.4, specifically this issue which mentions adding it to Kudu tables.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Yeah that's how I used it. However I want to put it in the SQL pass-through statement so that the data I download from Impala would be smaller. Will have a look at the `DBMAX_TEXT` option. Thanks. – Kermit Jan 26 '21 at 08:00
  • Does the combination of Tom's answer and mine solve it? SAS may be allocating 32767 for it, but Impala may be only doing 50 on its side if you use `cast`. – Joe Jan 26 '21 at 15:43
  • No. I think it's just that this is pretty bad (cf.https://support.sas.com/kb/53/835.html) – Kermit Jan 26 '21 at 17:38
  • 1
    Added this plus some other details in the answer. – Joe Jan 26 '21 at 17:48