2

I want to use Python libraries in Redshift UDFs, specifically the ua-parser library.

The process of using custom Python libraries on Redshift is described here: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_LIBRARY.html

To get the library with all its dependencies, I used PipLibraryInstaller by aws-labs, which should put all the dependent libraries on S3, the same way a regular pip command would install them.

But I cannot get the ua-parser library to work this way.

I created the library and uploaded it to S3 using the following command:

./installPipModuleAsRedshiftLibrary.sh -m ua-parser -s s3://bucket_location -r region_name

I then used the following statement to create the library in Redshift:

CREATE OR REPLACE LIBRARY ua_parser
LANGUAGE plpythonu
from 's3://bucket/ua-parser.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AWS_key;aws_secret_access_key=secret_key'
region 'region_name'

Then I created function:

create function f_user_agent_parse (user_agent varchar) returns varchar IMMUTABLE 
as $$
from ua_parser import user_agent_parser as parser

parsed_string = parser.Parse(user_agent)

return type(parsed_string)
$$ 
language plpythonu;

When I try to execute the following:

select f_user_agent_parse('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)') as s

I get the following error:

ERROR: XX000: ImportError: No module named _regexes. Please look at svl_udf_log for more information

It looks like _regexes is not in the library. But when I downloaded the library from S3 and looked inside it, I saw the following files: [screenshot of the archive contents]

What is the problem here? Am I doing something wrong, or is there a problem with the library?

Srdjan Nikitovic

2 Answers

1

Actually, the problem was that I was running this command on Windows, and it does not work from a Windows environment.

This is really strange: the native client for Redshift is Aginity, which runs only on Windows, yet from Windows we cannot use the Python functionality that Redshift offers.
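
One way to check an archive on the machine that built it, before uploading, is to put the zip on Python's import path and try the same import that fails in Redshift. This is only a local sketch: `can_import_from_zip` is a hypothetical helper, and it assumes Redshift resolves modules from the archive roughly the way Python's standard zip import does, which nothing in this thread confirms.

```python
import importlib
import sys


def can_import_from_zip(zip_path, module_name):
    """Try to import module_name with zip_path at the front of sys.path.

    Returns (True, None) on success, or (False, error_message) if the
    import raises ImportError.
    """
    sys.path.insert(0, zip_path)
    try:
        importlib.import_module(module_name)
        return True, None
    except ImportError as exc:
        return False, str(exc)
    finally:
        # Leave sys.path as we found it.
        sys.path.remove(zip_path)
```

For example, `can_import_from_zip('ua-parser.zip', 'ua_parser._regexes')` would exercise the import named in the error message. Run it in a fresh interpreter: if `ua_parser` has already been imported from somewhere else, Python reuses the cached module and the result says nothing about the archive.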

Srdjan Nikitovic
0

Works for me with:

$ python --version
Python 2.7.10
$ pip --version
pip 7.1.2 from /Library/Python/2.7/site-packages/pip-7.1.2-py2.7.egg (python 2.7)

And executing the script from aws-labs:

Collecting ua-parser
  Using cached ua_parser-0.7.1-py2.py3-none-any.whl
  Saved /private/var/folders/ty/fw4v8qq54330h_b6tz47c8r40000gn/T/.ua-parser/ua_parser-0.7.1-py2.py3-none-any.whl

However, I have another problem executing the code you posted.
Upon executing the query in Redshift I got:

ERROR:  TypeError: expected string or Unicode object, type found. Please look at svl_udf_log for more information

I changed return type(parsed_string) to return parsed_string['user_agent']['family']:

db=# select f_user_agent_parse('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'::varchar(200));
 f_user_agent_parse
--------------------
 FacebookBot
(1 row)
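
The TypeError comes from the function being declared with `returns varchar` while its body returns a type object, so the body has to hand back a string. A minimal sketch of two ways to do that, written as plain Python so it can be tried outside Redshift: the helper names and the sample dictionary are made up for illustration, and the dictionary only mimics the `['user_agent']['family']` shape used above rather than being real ua-parser output.

```python
import json


def family_only(parsed):
    """Return a single field, as in the corrected UDF above."""
    return parsed["user_agent"]["family"]


def whole_result_as_json(parsed):
    """Return the entire parsed structure serialised as one JSON string."""
    return json.dumps(parsed, sort_keys=True)


# Hand-written stand-in for the parser's result (an assumption, not real output).
sample = {"user_agent": {"family": "FacebookBot"}}

print(family_only(sample))
print(whole_result_as_json(sample))
```

Returning JSON keeps every field available to the caller, at the cost of having to parse the string again on the SQL side.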

Folder structure inside ua-parser.zip:

$ unzip ua-parser.zip
Archive:  ua-parser.zip
  inflating: ua_parser/__init__.py
  inflating: ua_parser/_regexes.py
  inflating: ua_parser/user_agent_parser.py
  inflating: ua_parser/user_agent_parser_test.py
  inflating: ua_parser-0.7.1.dist-info/DESCRIPTION.rst
  inflating: ua_parser-0.7.1.dist-info/metadata.json
  inflating: ua_parser-0.7.1.dist-info/top_level.txt
  inflating: ua_parser-0.7.1.dist-info/WHEEL
  inflating: ua_parser-0.7.1.dist-info/METADATA
  inflating: ua_parser-0.7.1.dist-info/RECORD
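
To compare another archive against this listing without unpacking it, something like the following could be used. It is a sketch only: `missing_entries` is a hypothetical helper, and the expected paths are simply copied from the listing above.

```python
import zipfile


def missing_entries(zip_path, expected):
    """Return the expected entry names that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    return [name for name in expected if name not in names]


# Module files taken from the folder structure shown above.
EXPECTED = [
    "ua_parser/__init__.py",
    "ua_parser/_regexes.py",
    "ua_parser/user_agent_parser.py",
]
```

Calling `missing_entries('ua-parser.zip', EXPECTED)` returns an empty list when all three module files sit at the paths shown; anything it returns points at a file that is missing or stored under a different path.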
moertel
  • I know the code might not be correct, but I never get far enough to see that kind of error. I get the error that "_regexes" is not an available module. I have Python 2.7.12 and pip 8.1.2. So after you execute the script from aws-labs, does it directly upload the library to S3, and do you create the library using the same statement as the one I provided above? – Srdjan Nikitovic Aug 04 '16 at 08:25
  • I created myself a new env with Python 2.7.12 and pip 8.1.2 but everything still works perfectly fine. I used the script from `aws-labs` as well and it can upload to S3 without problems. From your question it sounds like you get an error upon executing the function but now you write that it already dies upon uploading to S3? Can you clarify? – moertel Aug 04 '16 at 11:53
  • No, uploading to S3 works fine. Here are the steps I performed: 1) using the aws-labs script, I created and uploaded ua_parser.zip to S3 -> works fine; 2) I created the library in Redshift using the Aginity tool -> succeeded without errors; 3) I created the function in Redshift using the Aginity client -> succeeded without errors; 4) I tried to execute the select statement using the previously created function (as described in the question) -> I get the error that there is no module called _regexes. Thank you – Srdjan Nikitovic Aug 04 '16 at 13:08
  • I'm afraid I cannot reproduce your error then. The only difference between the steps we have performed is the client we've used (I issued all commands and statements from the command line via `psql`). I'd suggest deleting everything, i.e. issuing `drop function f_user_agent_parse (user_agent varchar)` and `drop library ua_parser`, and trying again. If all else fails I could send you my S3 file; maybe the files differ in some detail. – moertel Aug 04 '16 at 13:41
  • Can you send me the file, please? I am interested in the folder structure; it might be different. Can you send it to me via personal message? – Srdjan Nikitovic Aug 04 '16 at 14:31
  • I added the folder structure to my post. Attaching or sending files does not seem possible here. – moertel Aug 04 '16 at 15:05
  • Thanks for your help, but it doesn't work. I guess we will contact Amazon for support on this – Srdjan Nikitovic Aug 05 '16 at 10:49
  • The error is the same: ImportError, no module named _regexes – Srdjan Nikitovic Aug 05 '16 at 10:55
  • create library for regexes – Alex B Aug 20 '21 at 21:54