Questions tagged [aws-data-wrangler]

AWS Data Wrangler offers abstracted functions for common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. It integrates with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).

Project: awswrangler · PyPI

69 questions
0 votes · 0 answers

Why am I getting a client error while running a Data Wrangler processing job in SageMaker?

I am creating a feature store with AWS Data Wrangler, a feature of Amazon SageMaker Studio. When I try to run the Data Wrangler job (to ingest data into the feature store), I encounter the following error: "ClientError: API…
0 votes · 1 answer

AWS Wrangler S3 reading parquet, writing to DynamoDB - Unsupported type numpy.ndarray

I am trying to read Parquet into a DataFrame with AWS Wrangler. While writing this data to DynamoDB, it errors out with an unsupported type error: Unsupported type numpy.ndarray for value....... wr.s3.read_parquet(path=s3_path, dataset=dataset,…
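The error in this excerpt names numpy.ndarray as the unsupported value type. As a minimal sketch, and assuming the arrays come from list-typed Parquet columns, one possible workaround is to convert array-like values to plain Python lists before writing. The helper name `to_plain_python` is hypothetical and is not part of awswrangler; whether this resolves the asker's case is not confirmed by the excerpt.

```python
from typing import Any


def to_plain_python(value: Any) -> Any:
    """Recursively replace array-like values with plain Python lists.

    Hypothetical helper for illustration; not an awswrangler function.
    """
    # numpy.ndarray (and numpy scalars) expose tolist(); duck typing
    # keeps this sketch free of a hard numpy dependency.
    if hasattr(value, "tolist"):
        return to_plain_python(value.tolist())
    if isinstance(value, dict):
        return {key: to_plain_python(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain_python(item) for item in value]
    return value


# Assumed usage with a pandas DataFrame `df` read from Parquet, where
# "tags" is a hypothetical list-typed column:
#   df["tags"] = df["tags"].map(to_plain_python)
```

Applying the conversion per column keeps the rest of the DataFrame untouched, so only the columns that actually hold arrays are rewritten.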
0 votes · 1 answer

Use AWS Wrangler without the AWS CLI

I'm running the following code through AWS wrangler import awswrangler as wr my_query = wr.athena.read_sql_query( sql="""select "$path" as path from table""", database='db', workgroup='workgroup' ) But I don't wish to use static methods…
gseph
0 votes · 0 answers

awswrangler Error: could not find a version satisfies the requirement pyarrow<6.0.1 (from awswrangler)

Python 3.7.9, EMR 5.33.1. Trying to install libraries onto EMR using bootstrap.sh, I get the following error: Error: could not find a version satisfies the requirement pyarrow<6.1.0,>=2.0.0 (from awswrangler) The weird thing is I've uploaded pyarrow…
haneulkim
0 votes · 0 answers

AWS SageMaker Data Wrangler processing job

I am creating a feature store. I am doing the transformations using Data Wrangler, and by using the option "export to sagemaker feature store" I am trying to ingest all the data I got after transformation into the feature store. One of my feature…
0 votes · 1 answer

Having issues with my Excel to CSV conversion in AWS Lambda using the AWSDataWrangler layer

I have a function that reads an Excel file into a DataFrame, and then I save that DataFrame to an S3 bucket using the awswrangler API's to_csv function. The Excel file has data starting from different rows and columns. My conversion code looks…
0 votes · 2 answers

Update Athena Table using AWS Data Wrangler

I started using AWS Data Wrangler and Athena to upload my data files to S3 and to query them, respectively. My question is about the procedure to "safely" update the data in the table. Here is what I did: I used the AWS Data…
0 votes · 0 answers

Why is AWS Athena slower than reading Parquet directly?

I have created a table in AWS Athena. It is partitioned both in S3 and in Athena. I am now trying to load the table into a pandas DataFrame using two methods from the awswrangler library: AWS Athena read_sql_query vs reading Parquet directly as…
0 votes · 1 answer

Unable to find AWS Wrangler 2.10.0 Layer in US-WEST-2 under public-artifacts

Please consider the example shown at the link below: https://github.com/awslabs/aws-data-wrangler/issues/923 I am trying to get the public artifacts bucket for Wrangler in US-West-2, but I am unable to find it. Can someone help me figure it out? Also,…
0 votes · 1 answer

AWS Wrangler Error HIVE_METASTORE_ERROR: Table is missing storage descriptor

I hope you can help me with an error I get with awswrangler. This is the case: I have two AWS accounts, AccountA and AccountB, both with Lake Formation enabled. I have a set of databases in AccA and another set in AccB, so we share AccountB…
0 votes · 3 answers

How to read the sheet names of an Excel file from S3 in AWS Wrangler?

I have an Excel file stored in S3 and I want to read its sheet names. I have read the file with AWS Wrangler using awswrangler.s3.read_excel(path). How can I read the sheet names using AWS Wrangler in Python?
0 votes · 1 answer

Automate the date parameter while deploying the model on AWS Wrangler

I have built an XGBoost model on my local machine that takes training data and validates the model on a testing dataset. However, I have hard-coded the date values, as the training data is created monthly. The training data gets created based on…
0 votes · 1 answer

Split SQL WHERE IN Clause When List Is Too Big into Smaller Requests in Python

I have set up an AWS Lambda function with Python to ingest requests from a CSV and then query an AWS Serverless Aurora PostgreSQL database based on this request. The function works when there are fewer than 1K requests, but I get errors due to a hard…
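The splitting this question asks about can be sketched in plain Python: break the ID list into fixed-size chunks and build one query per chunk. This is only an illustration under stated assumptions: the IDs are integers, the names `chunked`, `build_in_queries`, `my_table`, and `id` are hypothetical, and the chunk size of 500 is arbitrary, since the excerpt does not state the actual limit.

```python
from typing import Iterable, Iterator, List


def chunked(values: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def build_in_queries(
    ids: Iterable[int],
    chunk_size: int = 500,      # arbitrary; the real limit is not given
    table: str = "my_table",    # hypothetical table name
    column: str = "id",         # hypothetical column name
) -> List[str]:
    """Build one SELECT ... WHERE ... IN (...) statement per chunk of IDs."""
    # int() raises on non-numeric input, so only integers are
    # interpolated into the SQL text.
    clean_ids = [int(value) for value in ids]
    queries = []
    for chunk in chunked(clean_ids, chunk_size):
        id_list = ", ".join(str(value) for value in chunk)
        queries.append(f"SELECT * FROM {table} WHERE {column} IN ({id_list})")
    return queries
```

Each resulting statement could then be run separately and the results concatenated. Table and column names are interpolated directly here, so they should come from trusted code rather than from the incoming CSV.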
0 votes · 1 answer

AWS Maximum BadRequestException retries reached for query Using Data API to Query RDS Serverless Aurora

I have created a Lambda function that uses the awswrangler Data API to read data from an RDS Serverless Aurora PostgreSQL database with a query. The query contains a conditional that is a list of IDs. If the query has fewer than 1K IDs it works great,…
0 votes · 1 answer

Sagemaker AWSWrangler>2.3.0

I am trying to use the read_excel function in AWS Wrangler, available as of version 2.3.0, in SageMaker JupyterLab on Amazon Web Services, but the library does not install properly. The Python version of the Conda instance is 3.6. When running !pip…