I want to use Python libraries to create UDFs in Redshift, specifically the ua-parser library.
The process of using custom Python libraries on Redshift is described here: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_LIBRARY.html
To get the library with all its dependencies, I used PipLibraryInstaller by AWS Labs, which should put all the dependent libraries on S3, the same way a regular pip command would resolve them.
But I cannot make the ua-parser library work with this approach.
I created the library and uploaded it to S3 using the following command:
./installPipModuleAsRedshiftLibrary.sh -m ua-parser -s s3://bucket_location -r region_name
I then used the following command to create the library in Redshift:
CREATE OR REPLACE LIBRARY ua_parser
LANGUAGE plpythonu
FROM 's3://bucket/ua-parser.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AWS_key;aws_secret_access_key=secret_key'
REGION 'region_name';
Then I created the function:
create function f_user_agent_parse (user_agent varchar) returns varchar IMMUTABLE
as $$
from ua_parser import user_agent_parser as parser
parsed_string = parser.Parse(user_agent)
return type(parsed_string)
$$
language plpythonu;
When I try to execute the following:
select f_user_agent_parse('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)') as s;
I get the following error:
ERROR: XX000: ImportError: No module named _regexes. Please look at svl_udf_log for more information
It looks like the _regexes module is not in the library. But when I downloaded the library from S3 and looked inside it, I saw the following files:
What is the problem here? Am I doing something wrong, or is there a problem with the library?
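For anyone trying to reproduce this, here is a minimal sketch of how the downloaded archive can be checked for the module named in the error. It assumes the S3 object was saved locally as ua-parser.zip and that the missing module would show up as a member whose name contains "_regexes"; both the local file name and the helper name find_members are my own assumptions for illustration, not anything produced by the installer script.

```python
import os
import zipfile


def find_members(zip_path, needle):
    """Return the names of archive members whose path contains `needle`."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if needle in name]


if __name__ == "__main__":
    # Assumed local copy of the archive downloaded from S3; adjust as needed.
    local_zip = "ua-parser.zip"
    if os.path.exists(local_zip):
        matches = find_members(local_zip, "_regexes")
        # An empty list would mean the module from the ImportError is not packaged.
        print(matches)
    else:
        print("archive not found: " + local_zip)
```

If the list comes back empty, the module was never packaged into the zip; if it is present, the problem is more likely in how the archive is laid out or imported on the Redshift side.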