Questions tagged [aws-glue-spark]

244 questions
0
votes
2 answers

AWS Glue: Deploy model in AWS environment

In our AWS environment, we have two different types of SAGs (service account groups) for data storage. One SAG is for generic storage; the other is for secure data and will hold only PII or restricted data. In our environment, we are planning to…
0
votes
1 answer

Read FASTQ file into an AWS Glue Job Script

I need to read a FASTQ file into an AWS Glue job script, but I'm getting this error: Traceback (most recent call last): File "/opt/amazon/bin/runscript.py", line 59, in runpy.run_path(script, run_name='main') File "/usr/lib64/python3.7/runpy.py", line…
0
votes
1 answer

AWS Glue: Data Skewed or not Skewed?

I have a job in AWS Glue that fails with: An error occurred while calling o567.pyWriteDynamicFrame. Job aborted due to stage failure: Task 168 in stage 31.0 failed 4 times, most recent failure: Lost task 168.3 in stage 31.0 (TID 39474,…
0
votes
0 answers

AWS Glue write_dynamic_frame_from_options encounters schema exception

I'm new to PySpark and AWS Glue, and I'm having an issue writing out a file with Glue. When I try to write output to S3 using Glue's write_dynamic_frame_from_options, it throws an exception saying:…
SGolds
0
votes
0 answers

external library error while running AWS Glue job

I have placed external Python libraries (*.whl) in S3 and am accessing them by specifying the path in the AWS Glue job's 'Python library path' argument. This works for a few external modules but fails for others with the error below: Traceback…
0
votes
1 answer

How to merge CSV files from an S3 bucket and save them back into S3 using AWS Glue

The objective is to transform data (CSV files) from one S3 bucket to another using Glue. What I already tried: I created a CSV classifier. I created a crawler that scans the data arriving in the S3 bucket. Where I am stuck: Unable to find how…
0
votes
2 answers

Glue job schema inference issue

Requirement: I need a Glue job to export AWS DynamoDB data (a nested structure combining maps and lists) to S3. My approach: First, I used a Glue dynamic frame to load all the data from DynamoDB into one dynamic frame. datasource =…
0
votes
1 answer

PySpark query one column name with the value present in another column

Input_pyspark_dataframe:
id   name  collection  student.1.price  student.2.price  student.3.price
111  aaa   1           100              999              232
222  bbb   2           200              888              656
333  ccc   1           …
siva
0
votes
1 answer

How to write to multiple S3 buckets based on distinct values of a dataframe in an AWS Glue job?

I have a dataframe with an account_id column. I want to group the rows by distinct account_id and write each group to a different S3 bucket. Writing to a new folder for each account_id within a given S3 bucket works too.
pnhegde
0
votes
0 answers

AWS Glue bookmark with multiple folders in one job run not working

I have my job code like this:
sc = SparkContext()
glueContext = GlueContext(sc)
s3_paths = ['01', '02', '03']  # these paths are in the same folder and are partitioned under the source_path
s3_source_path = 'bucket_name/'
for sub_path in s3_paths:
    …
phoebe
0
votes
3 answers

Joining two dataframes in spark scala based on OR condition

I have two data frames: 1) Accounts and 2) Customers. The schema of Accounts is:
Name  Id  Telehone  Mob   email
AR    1   123       1234  test1@gmail.com
BR    2   213       4123  test2@gmail.com
CR    3   231       …
0
votes
1 answer

AWS Glue spark-submit: use Spark Avro

How do I specify or pass package parameters to an AWS Glue Spark job? I am using Glue version 1, which supports Spark 2.4.3, and want to use Spark Avro to read some Avro files.
vkt
0
votes
1 answer

Issues using mergeDynamicFrame on AWS Glue

I need to merge two dynamic frames in Glue. I tried to use the mergeDynamicFrame function, but I keep getting the same error: AnalysisException: "cannot resolve 'id' given input columns: [];;\n'Project ['id]\n+- LogicalRDD false\n" Right…
Hlemos
0
votes
1 answer

Kafka Integration with AWS Glue

I could not find a specific group for this particular integration. I work for a retail organisation and am trying to integrate Kafka streams directly with Glue, that is, using a Kafka topic as an input source for AWS Glue. I am using Apache Kafka…
0
votes
2 answers

How to overwrite S3 data using a Glue job in AWS

I have a DynamoDB table and I am sending its data to S3 using a Glue job. Whenever I run the Glue job to write new data to S3, it also appends the old data. It should overwrite the old data. Job script below: import sys from…
htyagi1