Questions tagged [aws-data-wrangler]

AWS Data Wrangler (now the AWS SDK for pandas) offers abstracted functions for common ETL tasks such as loading and unloading data between data lakes, data warehouses, and databases. It integrates with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).

Project: awswrangler · PyPI

69 questions
0 votes · 0 answers

Is it possible to omit header rows when exporting a SageMaker Data Wrangler flow to S3 (via a Jupyter Notebook)?

I am exporting a Data Wrangler flow to S3 via a Jupyter Notebook using SageMaker Studio. Each of the resulting CSV files (each containing a part of the transformed dataset) includes a header row with the column names. However, when using a CSV file as…
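One possible workaround, if the exported part files can be rewritten after the flow runs, is sketched below. The paths are hypothetical, and `header=False` is a plain pandas kwarg that awswrangler forwards to `DataFrame.to_csv()`:

```python
import awswrangler as wr

# Hypothetical locations -- replace with the flow's actual export prefix.
src = "s3://my-bucket/datawrangler-export/"
dst = "s3://my-bucket/datawrangler-export-noheader/part-0.csv"

# Read every exported part file under the prefix back into one dataframe...
df = wr.s3.read_csv(path=src)

# ...and rewrite it without the header row.
wr.s3.to_csv(df, path=dst, index=False, header=False)
```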
0 votes · 0 answers

Which pandas module can be used to read parquet files in parallel?

I am using Data Wrangler to read parquet datasets. The partition has 300 files, and each file is around 256 MB. I am using a SageMaker ml.r5.24xlarge instance, which has 96 cores. The processing job does three tasks: read the parquet file, execute the model, write the…
user3858193 • 1,320 • 5 • 18 • 50
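awswrangler parallelizes S3 reads itself when asked. A minimal sketch, assuming a hypothetical dataset prefix:

```python
import awswrangler as wr

# use_threads=True fans the part files out across all available cores;
# newer versions also accept an int to cap the thread count.
df = wr.s3.read_parquet(
    path="s3://my-bucket/my-dataset/",  # hypothetical prefix
    dataset=True,
    use_threads=True,
)
```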
0 votes · 0 answers

How to connect to Amazon Athena using Simba ODBC and Python

I have Python code that reads data from Athena, and it works fine in the AWS portal, but it does not work from my local computer because of security policies (using a secret key is forbidden for us). This code uses awswrangler and boto3 to read the data.…
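A sketch of one approach using pyodbc with profile-based authentication, so no secret key is embedded. The connection-string keys below (AwsRegion, S3OutputLocation, AuthenticationType, AWSProfile) follow the Simba Athena ODBC driver's documented options, but they vary by driver version, so treat them as assumptions to verify:

```python
import pyodbc

conn = pyodbc.connect(
    "Driver=Simba Athena ODBC Driver;"
    "AwsRegion=us-east-1;"
    "S3OutputLocation=s3://my-query-results/;"  # hypothetical bucket
    "AuthenticationType=IAM Profile;"           # no embedded keys
    "AWSProfile=my-sso-profile;",               # hypothetical profile name
    autocommit=True,
)
for row in conn.cursor().execute("SELECT 1"):
    print(row)
```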
0 votes · 0 answers

AWS Wrangler WaiterError: Waiter BucketExists failed: Max attempts exceeded. Previously accepted state: Matched expected HTTP status code: 404

I've been trying to query information from Athena using the following bit of code: import boto3 import awswrangler as wr sessAWS_Test = boto3.session.Session( aws_access_key_id = 'id', aws_secret_access_key = 'key', region_name = 'region…
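This waiter typically fires when awswrangler has to create (and then poll for) its default results bucket. A minimal sketch of one way around it, assuming a results bucket that already exists in the session's region:

```python
import boto3
import awswrangler as wr

sess = boto3.Session(region_name="us-east-1")  # match the bucket's region

# An explicit, pre-existing s3_output means awswrangler never has to
# create a bucket, so the BucketExists waiter is never run.
df = wr.athena.read_sql_query(
    sql="SELECT * FROM my_table LIMIT 10",         # hypothetical query
    database="my_database",                        # hypothetical database
    s3_output="s3://my-existing-results-bucket/",  # must already exist
    boto3_session=sess,
)
```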
0 votes · 1 answer

AWS Wrangler (pandas layer): problem with path to S3 bucket

Here is my Python code in my Lambda layer. Shout out to John R for some of this paginator code. From API Gateway, I pass in a path param (bucket) and query string params (fmt & date), such…
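A minimal sketch of the pattern being described, with hypothetical parameter names, showing how to turn the API Gateway params into fully qualified s3:// paths:

```python
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Path and query-string parameters as API Gateway delivers them.
    bucket = event["pathParameters"]["bucket"]
    fmt = event["queryStringParameters"]["fmt"]
    date = event["queryStringParameters"]["date"]

    # Paginate so prefixes with more than 1,000 objects are not truncated.
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{date}/"):
        keys += [o["Key"] for o in page.get("Contents", [])
                 if o["Key"].endswith(f".{fmt}")]

    # awswrangler and pandas expect full s3:// URLs, not bare keys.
    return [f"s3://{bucket}/{k}" for k in keys]
```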
0 votes · 1 answer

awswrangler query Athena: AttributeError: Can only use .dt accessor with datetimelike values

I have one table in Athena in which all columns have proper datatypes (date, bigint, int, decimal(28,2), string, etc.). I need to query the data via the AWS Wrangler API athena.read_sql_query. I write: athena.read_sql_query(sql=test_query,…
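The error means the column arrived as plain objects rather than datetimes (NULLs or the query approach can cause this). A minimal sketch of the usual fix, with hypothetical names, is to coerce before using `.dt`:

```python
import pandas as pd
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM my_table",  # hypothetical query
    database="my_database",
)

# Coerce explicitly; unparseable values become NaT instead of raising.
df["my_date_col"] = pd.to_datetime(df["my_date_col"], errors="coerce")
print(df["my_date_col"].dt.year.head())
```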
0 votes · 0 answers

How to remove extra space in JSON output from Lambda

I have a Lambda reading CSVs in an S3 bucket. API Gateway calls the Lambda. The data in the CSV looks like this: Ticker Exchange Date Open High Low Close Volume 6A BATS 12/2/2021 0.9 0.95 0.83 0.95 …
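A minimal sketch of one way to produce compact JSON from such a CSV (the path is hypothetical): strip padded string cells, then serialize without the spaces `json.dumps` inserts by default:

```python
import json
import awswrangler as wr

df = wr.s3.read_csv("s3://my-bucket/quotes.csv")  # hypothetical path

# Trim stray padding in string cells, a common source of extra
# whitespace when a fixed-width-looking file is parsed as CSV.
for col in df.select_dtypes(include="object"):
    df[col] = df[col].str.strip()

# separators=(",", ":") removes the default ", " and ": " spacing.
body = json.dumps(df.to_dict(orient="records"), separators=(",", ":"))
```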
0 votes · 0 answers

AWS Wrangler wr.athena.read_sql_query, s3_additional_kwargs not tagging s3 objects

I am trying to add cost-allocation tags to the S3 resources created by Athena queries, so that I can analyze the S3 costs of different applications related to Athena usage. To achieve this, I am making use of the parameter s3_additional_kwargs when…
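One likely explanation is that the result objects are written by Athena itself, not by awswrangler, so `s3_additional_kwargs` never touches them. A sketch of a workaround under that assumption: run the query, find its output object, and tag it afterwards:

```python
import boto3
import awswrangler as wr

# wait=True returns the query-execution metadata, which includes the
# S3 location of the result object.
resp = wr.athena.start_query_execution(
    sql="SELECT * FROM my_table",  # hypothetical query
    database="my_database",
    wait=True,
)
output = resp["ResultConfiguration"]["OutputLocation"]  # s3://bucket/key.csv
bucket, key = output.replace("s3://", "").split("/", 1)

# Tag the result object directly, since Athena wrote it.
boto3.client("s3").put_object_tagging(
    Bucket=bucket,
    Key=key,
    Tagging={"TagSet": [{"Key": "application", "Value": "my-app"}]},
)
```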
0 votes · 0 answers

How to test awswrangler with local data

I am working with awswrangler to execute Athena queries and transform the results with pandas. I want to test my code locally without any actual AWS instance. Is there a way to mock AWS services, or another way to work with awswrangler locally?
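moto can stand in for S3 locally; awswrangler's S3 functions go through boto3, so they pick the mock up. A minimal sketch (on moto < 5 the decorator is `mock_s3` instead of `mock_aws`):

```python
import os

import boto3
import pandas as pd
import awswrangler as wr
from moto import mock_aws  # moto >= 5; older versions: from moto import mock_s3

# Fake credentials so nothing reaches a real account.
os.environ["AWS_ACCESS_KEY_ID"] = "testing"
os.environ["AWS_SECRET_ACCESS_KEY"] = "testing"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

@mock_aws
def test_roundtrip():
    boto3.client("s3").create_bucket(Bucket="test-bucket")
    df = pd.DataFrame({"a": [1, 2, 3]})
    wr.s3.to_parquet(df, "s3://test-bucket/df.parquet")
    out = wr.s3.read_parquet("s3://test-bucket/df.parquet")
    assert out.equals(df)

test_roundtrip()
```

Athena calls are harder to mock this way; for those, putting the SQL execution behind an interface you can stub in tests is a more reliable route.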
0 votes · 0 answers

How do I merge several parquet files into one using awswrangler?

I am trying to use awswrangler.s3.merge_datasets() with a glob source string, but it isn't working for me. https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.merge_datasets.html import glob import awswrangler as…
jtlz2 • 7,700 • 9 • 64 • 114
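Worth noting: `merge_datasets` takes plain S3 prefixes, not glob patterns, and it copies the part files rather than concatenating them. A minimal sketch with hypothetical prefixes, including the read-and-rewrite needed to get literally one file:

```python
import awswrangler as wr

# Copies the objects under source_path to target_path (no glob expansion).
wr.s3.merge_datasets(
    source_path="s3://my-bucket/staging/",  # hypothetical prefix
    target_path="s3://my-bucket/curated/",  # hypothetical prefix
    mode="append",                          # or "overwrite"
)

# To physically combine the parts into a single parquet file instead,
# read the prefix back and rewrite it as one object.
df = wr.s3.read_parquet("s3://my-bucket/curated/", dataset=True)
wr.s3.to_parquet(df, "s3://my-bucket/merged/all.parquet")
```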
0 votes · 0 answers

Unable to query parquet data which has array datatype

When using awswrangler and writing to S3 in parquet format, the data files are not queryable using S3 Select (for CSV) or Athena. For example: events = [{"c1": "12", "c2": [1, 2, 3, 6], "c3": 1234}] df = pd.DataFrame.from_dict(events) wr.s3.to_parquet( …
Raman • 665 • 1 • 15 • 38
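One plausible route for the Athena side is to register the table in the Glue catalog at write time, so the list column is recorded as `array<bigint>` (S3 Select has no comparable support for nested parquet types). A sketch, with hypothetical database/table names:

```python
import pandas as pd
import awswrangler as wr

events = [{"c1": "12", "c2": [1, 2, 3, 6], "c3": 1234}]
df = pd.DataFrame.from_dict(events)

# dataset=True plus database/table creates or updates the Glue table,
# giving Athena the schema it needs to resolve the array column.
wr.s3.to_parquet(
    df,
    path="s3://my-bucket/events/",  # hypothetical prefix
    dataset=True,
    database="my_database",
    table="events",
)
```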
0 votes · 1 answer

How to specify the location of Athena query results when using awswrangler

The Python code below can fetch data from a pre-configured Athena table when it is run on a local computer. But it automatically creates an S3 bucket to store temporary tables and metadata. The automatically created bucket name looks like…
d.b • 32,245 • 6 • 36 • 77
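The `s3_output` parameter covers this; a minimal sketch with hypothetical names (`ctas_approach=False` also avoids the temporary tables mentioned above):

```python
import awswrangler as wr

# Pointing s3_output at your own bucket stops awswrangler from creating
# its default results/staging bucket.
df = wr.athena.read_sql_query(
    sql="SELECT * FROM my_table LIMIT 10",       # hypothetical query
    database="my_database",
    s3_output="s3://my-results-bucket/athena/",  # pre-existing bucket
    ctas_approach=False,
)
```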
0 votes · 0 answers

How to check which column a value is from when using awswrangler to write a parquet file to S3?

I have some dataframes with various columns and rows, pulled from worksheets in Google Sheets using pygsheets and from Postgres tables in several databases, and I am trying to write these to S3 buckets using awswrangler. For most of them I don't have to…
fmvio • 1 • 2
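A hedged way to pinpoint the offending column is to convert columns one at a time with pyarrow (the engine awswrangler writes parquet with), so the conversion error names a specific column instead of the whole frame:

```python
import pandas as pd
import pyarrow as pa

def find_bad_columns(df: pd.DataFrame) -> list[str]:
    """Try each column separately so a pyarrow conversion error
    can be attributed to a specific column."""
    bad = []
    for col in df.columns:
        try:
            pa.Table.from_pandas(df[[col]])
        except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
            print(f"column {col!r}: {exc}")
            bad.append(col)
    return bad
```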
0 votes · 0 answers

Missing s3 package from AWS Wrangler

I installed the latest version of AWS Wrangler, 2.19.0. When I run the import, this happens: import awswrangler as wr File ~/opt/anaconda3/lib/python3.9/site-packages/awswrangler/lakeformation/_utils.py:13, in 11 from awswrangler import…
0 votes · 1 answer

How can I apply a unique filter to the partition column of a parquet file using wr.s3.read_parquet?

I have a parquet dataset stored in S3, and I want to read it and apply a filter to the partition field, specifically unique. I tried the following, but the unique function cannot be applied. Here's my attempt: query_fecha_dato =…
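A minimal sketch of the usual pattern, with hypothetical names: filter the partitions with `partition_filter`, then call `.unique()` on the resulting column (it is a Series method, not something the reader itself applies):

```python
import awswrangler as wr

# partition_filter prunes partitions before any data is downloaded;
# the lambda receives the partition values as strings.
df = wr.s3.read_parquet(
    path="s3://my-bucket/my-dataset/",  # hypothetical prefix
    dataset=True,
    partition_filter=lambda p: p["fecha_dato"] == "2021-01-28",
)

print(df["fecha_dato"].unique())
```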