
I have successfully created a Python UDF that accepts a varchar value from a table and extracts a substring of that value based on a regex.

The max size of that varchar column in the DDL is set to 20000 bytes, and on some occasions the UDF raises an error when I call it:

ERROR: Value too long for character type

Detail: 
  -----------------------------------------------
  error:  Value too long for character type
  code:      8001
  context:   Value too long for type character varying(256)
  query:     1127608
  location:  funcs_string.hpp:390
  process:   query0_73 [pid=25345]
  -----------------------------------------------

Is it the case that the max varchar limit for a UDF input variable is 256 bytes? If so, is this something I can change in my code?

Thanks,

and_apo

2 Answers


It turns out that you have to specify the length in your function's parameters if it is more than 256 bytes, which is the default. So I fixed this with something similar to:

CREATE OR REPLACE FUNCTION f_xxxxxx(val VARCHAR(20000)) RETURNS VARCHAR(20000) 
IMMUTABLE AS $$
<python function here>
$$ LANGUAGE plpythonu;
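For illustration, here is a hypothetical sketch of the kind of Python body the `<python function here>` placeholder stands in for; the regex pattern and the function name `f_extract` are assumptions, not the asker's actual code:

```python
import re

# Hypothetical pattern: extract the first run of digits from the input.
# A real UDF would use whatever regex the question's function applies.
PATTERN = re.compile(r'\d+')

def f_extract(val):
    # Redshift passes SQL NULL as None; return None to stay NULL-safe.
    if val is None:
        return None
    m = PATTERN.search(val)
    return m.group(0) if m else None
```

Note that the length constraint in the error comes from the SQL signature (`VARCHAR(256)` by default), not from anything in the Python body itself.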
Faiz
    FYI: You can use `VARCHAR(MAX)` if your function accepts the largest string possible. You also want to make sure that you use `RETURNS VARCHAR(MAX)` if it does because apparently, VARCHAR == VARCHAR(256) in Redshift. – ZiggyTheHamster Aug 02 '17 at 06:41
  • this answer is right, can reference [aws official answer](https://forums.aws.amazon.com/thread.jspa?threadID=219840) – Vanjor May 31 '18 at 04:46
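Applying the comment's suggestion, a sketch of the declaration with `VARCHAR(MAX)` on both the parameter and the return type might look like this (the function name is the one from the answer above; the body placeholder is unchanged):

```sql
CREATE OR REPLACE FUNCTION f_xxxxxx(val VARCHAR(MAX)) RETURNS VARCHAR(MAX)
IMMUTABLE AS $$
<python function here>
$$ LANGUAGE plpythonu;
```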

Are you using TEXT or NVARCHAR? Redshift does not support these data types natively and converts them to VARCHAR(256).

More details can be found here
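One way to check whether such a conversion has happened is to inspect the catalog; this is a sketch with a hypothetical table name, using Redshift's `pg_table_def` view:

```sql
-- Hypothetical table declared with TEXT
CREATE TABLE t_demo (c TEXT);

-- Inspect the stored column type
SELECT "column", type
FROM pg_table_def
WHERE tablename = 't_demo';
-- If the conversion described above applies,
-- the type shows as: character varying(256)
```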

Paladin