Questions tagged [aws-glue-spark]
244 questions

1 vote, 0 answers
How to work with schema returned by 'get_catalog_schema_as_spark_schema'?
Example:
schema = glueContext.get_catalog_schema_as_spark_schema(database=args['Database'], table_name=args['Table'])
If I simply print the returned schema, I can see the StructType/StructField structure, something similar to:
StructType(
…

GSazheniuk
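A minimal sketch, assuming you first convert the returned StructType to plain Python with schema.jsonValue() (a method on PySpark DataType objects), so nested fields can be walked as ordinary dicts. The helper name flatten_schema and the sample schema are hypothetical.

```python
def flatten_schema(schema_json, prefix=""):
    """Return (dotted_path, type) pairs for every leaf field.

    schema_json is the dict form of a Spark StructType, e.g. the result of
    schema.jsonValue(); the helper itself is pure Python.
    """
    leaves = []
    for field in schema_json.get("fields", []):
        path = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested struct: recurse, using the parent name as a prefix.
            leaves.extend(flatten_schema(ftype, path + "."))
        elif isinstance(ftype, dict):
            # Arrays and maps: report the container kind instead of descending.
            leaves.append((path, ftype["type"]))
        else:
            # Primitive types are plain strings such as "string" or "long".
            leaves.append((path, ftype))
    return leaves


example = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {
            "name": "address",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "city", "type": "string", "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": True},
            "nullable": True,
            "metadata": {},
        },
    ],
}

for path, dtype in flatten_schema(example):
    print(path, dtype)
```

The dotted paths can then be used to select or rename columns without hard-coding the schema.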

1 vote, 2 answers
DataFrame remove rows existing in another DataFrame
I have two data frames:
df1:
+----------+-------------+-------------+--------------+---------------+
|customerId| fullName| telephone1| telephone2| email|
+----------+-------------+-------------+--------------+---------------+
| …

TurboAza
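In PySpark, the usual way to keep only the rows of df1 whose key does not appear in df2 is a left anti join: df1.join(df2, on="customerId", how="left_anti"). The pure-Python sketch below models the same semantics on lists of dicts so the behaviour is easy to check; the sample rows are made up, and matching on customerId alone is an assumption about what "existing" means here.

```python
def left_anti_join(left_rows, right_rows, key):
    """Keep rows from left_rows whose key value does not occur in right_rows.

    Mirrors Spark's how="left_anti": only left-side columns are returned and
    duplicates on the left are preserved. Null keys never match in SQL, so
    they are excluded from the lookup set (left rows with a null key are kept).
    """
    right_keys = {row[key] for row in right_rows if row[key] is not None}
    return [row for row in left_rows if row[key] not in right_keys]


df1 = [
    {"customerId": 1, "fullName": "Ann Lee"},
    {"customerId": 2, "fullName": "Bo Chen"},
    {"customerId": 3, "fullName": "Cy Diaz"},
]
df2 = [{"customerId": 2, "fullName": "Bo Chen"}]

print(left_anti_join(df1, df2, "customerId"))
```

If rows should match on every column rather than on a key, df1.exceptAll(df2) (Spark 2.4+) or df1.subtract(df2) are alternatives; note that subtract also drops duplicate rows.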

1 vote, 1 answer
AWS glue pyspark: java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
I'm trying to read a CSV file from S3 in my AWS Glue PySpark script.
Here is a snippet of the code:
import sys
import os
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import…

Harsh P Waghela

1 vote, 2 answers
AWS Glue PySpark: remove struct in an array but keep the data and save into DynamoDB
A DynamoDB table is exported to S3, and an AWS Glue crawler crawls the S3 data.
AWS Glue jobs take the crawled data as the source, and here's the schema after the transformation by MergeLineItems:
def MergeLineItems(rec):
rec["lineItems1"] = {}
a =…

Minah
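If the export is in DynamoDB JSON format (an assumption; the excerpt does not say), every attribute is wrapped in a type descriptor such as {"S": ...}, {"N": ...}, {"M": {...}} or {"L": [...]}, and that wrapper is what shows up as a struct inside each array element. A minimal pure-Python sketch of stripping those wrappers from one record, for example inside a per-record function like MergeLineItems, might look like this; unwrap_dynamodb and unwrap_item are hypothetical helper names.

```python
from decimal import Decimal


def unwrap_dynamodb(value):
    """Recursively strip DynamoDB JSON type descriptors from one typed value.

    Handles S, N, BOOL, NULL, M, L, SS and NS; binary types (B, BS) are
    left out of this sketch and returned unchanged.
    """
    if not isinstance(value, dict) or len(value) != 1:
        return value
    tag, inner = next(iter(value.items()))
    if tag == "S":
        return inner
    if tag == "N":
        return Decimal(inner)  # numbers are exported as strings
    if tag == "BOOL":
        return bool(inner)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: unwrap_dynamodb(v) for k, v in inner.items()}
    if tag == "L":
        return [unwrap_dynamodb(v) for v in inner]
    if tag == "SS":
        return list(inner)
    if tag == "NS":
        return [Decimal(n) for n in inner]
    return value  # unknown descriptor: leave untouched


def unwrap_item(item):
    """Unwrap every top-level attribute of an exported item."""
    return {k: unwrap_dynamodb(v) for k, v in item.items()}


record = {
    "orderId": {"S": "o-1"},
    "lineItems": {"L": [
        {"M": {"sku": {"S": "A1"}, "qty": {"N": "2"}}},
        {"M": {"sku": {"S": "B2"}, "qty": {"N": "1"}}},
    ]},
}
print(unwrap_item(record))
```

After unwrapping, each line item is a plain map of its attributes, which is the shape usually expected when writing back to DynamoDB.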

1 vote, 1 answer
AWS Glue Bad value for type BigDecimal : NaN
I'm trying to export a table I crawled from a Postgres (RDS) database into Glue. There's one field with a decimal(10, 2) type. Now I have several problems.
Exporting the table from Glue (using Spark 2.4 and 3.1, Python 3) into S3 with the following…

Mugiwara
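PostgreSQL's numeric type can store 'NaN', but Java's BigDecimal has no NaN value, which matches the error in the title. One workaround, sketched below under the assumption that you can read through a SQL query instead of the bare table (for example via Spark's JDBC query option, available since Spark 2.4), is to turn NaN into NULL inside PostgreSQL before the value reaches the driver. The helper, table and column names are hypothetical.

```python
def nan_safe_select(table, columns, numeric_columns):
    """Build a PostgreSQL SELECT that maps numeric 'NaN' to NULL.

    NULLIF(col, 'NaN') works because PostgreSQL treats numeric NaN values
    as equal to each other. Identifiers are inserted unquoted, so this is
    only suitable for trusted, simple column names.
    """
    parts = []
    for col in columns:
        if col in numeric_columns:
            parts.append(f"NULLIF({col}, 'NaN') AS {col}")
        else:
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {table}"


query = nan_safe_select("public.orders", ["id", "amount"], {"amount"})
print(query)
```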

1 vote, 1 answer
AWS Glue assigned all tasks to the same worker
I have an AWS Glue job whose work is very simple: split large gzipped CSV files into 1 GB ones.
In my test, I uploaded 4 files into the bucket, each around 5 GB.
Yet, the job always assigns all files to a single worker instead of distributing across…

user1888955
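Two facts may be relevant: gzip is not a splittable codec, so Spark reads each .gz file with a single task, and the number of output files follows the number of partitions at write time. A rough sketch of the arithmetic for choosing a partition count, assuming you call df.repartition(n) after reading and want roughly 1 GB per output file; the function name is hypothetical and the sizes are the ones from the question, not measurements.

```python
import math

GIB = 1024 ** 3


def target_partitions(total_input_bytes, target_file_bytes=GIB):
    """Number of partitions needed for about target_file_bytes per output file.

    Assumes the output compresses about as well as the input, which is only
    an approximation for gzip-compressed CSV.
    """
    return max(1, math.ceil(total_input_bytes / target_file_bytes))


total = 4 * 5 * GIB  # four input files of roughly 5 GB each
print(target_partitions(total))
```

With about 20 GB of input this gives 20 partitions, and the repartition shuffle lets those write tasks run on different workers instead of one task per input file.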

1 vote, 0 answers
Convert Glue column datatype to Spark metadata
I have a Glue column whose datatype in Glue is struct.
However, when Spark infers this schema, it converts this Glue type to Spark metadata
and saves it to the Glue table properties as follows:
"name": "columnName",
"type":…

aishwarya murkute
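Glue/Hive type strings such as struct<name:string,age:int> and the Spark JSON schema stored in table properties describe the same type in different notations. The pure-Python parser below is a sketch of that translation, not the converter Spark itself uses: it covers only struct, array, map, decimal and common primitives, marks every field nullable, and maps names to Spark's JSON type names (int to integer, bigint to long, and so on). The function names are hypothetical.

```python
import re

# Hive/Glue primitive names -> Spark JSON type names.
PRIMITIVES = {
    "string": "string", "int": "integer", "bigint": "long",
    "smallint": "short", "tinyint": "byte", "double": "double",
    "float": "float", "boolean": "boolean", "date": "date",
    "timestamp": "timestamp", "binary": "binary",
}


def glue_type_to_spark(type_str):
    """Convert a Glue/Hive type string to Spark's JSON schema form."""
    parsed, rest = _parse(type_str.replace(" ", ""))
    if rest:
        raise ValueError(f"unexpected trailing text: {rest!r}")
    return parsed


def _parse(s):
    """Parse one type at the start of s; return (json_value, remaining_text)."""
    if s.startswith("struct<"):
        s = s[len("struct<"):]
        fields = []
        while not s.startswith(">"):
            name, s = s.split(":", 1)
            ftype, s = _parse(s)
            fields.append({"name": name, "type": ftype,
                           "nullable": True, "metadata": {}})
            if s.startswith(","):
                s = s[1:]
        return {"type": "struct", "fields": fields}, s[1:]
    if s.startswith("array<"):
        elem, s = _parse(s[len("array<"):])
        return {"type": "array", "elementType": elem, "containsNull": True}, s[1:]
    if s.startswith("map<"):
        key, s = _parse(s[len("map<"):])
        value, s = _parse(s[1:])  # skip the comma between key and value
        return {"type": "map", "keyType": key, "valueType": value,
                "valueContainsNull": True}, s[1:]
    m = re.match(r"decimal\(\d+,\d+\)", s)
    if m:
        return m.group(0), s[m.end():]
    m = re.match(r"[a-z]+", s)
    if m and m.group(0) in PRIMITIVES:
        return PRIMITIVES[m.group(0)], s[m.end():]
    raise ValueError(f"unsupported type at: {s!r}")


print(glue_type_to_spark("struct<id:bigint,price:decimal(10, 2)>"))
```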

1 vote, 0 answers
Error running Spark Glue jobs after creating a temp view from a DataFrame
Explanation
When I create a DataFrame from a DynamicFrame, it works fine, and I am able to write
the DataFrame back to a DynamicFrame, but when I call createOrReplaceTempView on the
DataFrame, it throws this error. The number of…

bigDataArtist

1 vote, 1 answer
Glue PySpark Job: An error occurred while calling o100.pyWriteDynamicFrame
I am building a data pipeline for migrating data from an S3 bucket to Snowflake via AWS Glue by creating a custom connector in AWS Glue.
I get the error below when running the Glue job:
An error occurred while calling o100.pyWriteDynamicFrame. Glue ETL…

Lavish Patodi

1 vote, 1 answer
How do you specify Project ID in the AWS Glue to BigQuery connector?
I'm trying to use the AWS Glue connector to BigQuery following the tutorial at https://aws.amazon.com/blogs/big-data/migrating-data-from-google-bigquery-to-amazon-s3-using-aws-glue-custom-connectors/, but after following all the steps I get a:
:…

tonicebrian

1 vote, 1 answer
Issue creating a dynamic frame without a pushdown predicate
I'm new to AWS Glue, so pardon my question:
Why do I get an error when I don't include a pushdown predicate when creating the dynamic frame? I'm trying to create it without the predicate because I will be using job bookmarks, so only new files will be processed regardless…

marcia12
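For reference, the push_down_predicate argument of glueContext.create_dynamic_frame.from_catalog takes a Spark SQL boolean expression over the table's partition columns. A small sketch that builds such an expression from a dict of partition values is below; the helper is hypothetical, the partition names year and month are assumptions, and values are assumed not to contain quote characters.

```python
def partition_predicate(partitions):
    """Build a Spark SQL predicate such as "year == '2021' and month == '05'".

    Partition values are compared as strings, which is how Hive-style
    partition values are usually stored in the Glue catalog.
    """
    clauses = [f"{column} == '{value}'" for column, value in partitions.items()]
    return " and ".join(clauses)


print(partition_predicate({"year": "2021", "month": "05"}))
```

The resulting string would be passed as push_down_predicate=... alongside the database and table_name arguments.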

1 vote, 0 answers
AWS Glue not able to access database in VPC
I have an AWS Glue job that uses Spark and Scala, with JDBC connections specified in the script for custom ETL and data decryption. When the job runs in an environment where the databases are not publicly accessible, it fails with…

Trojan Developer

1 vote, 1 answer
How would changing the read in AWS Glue change a column's data type?
I have an AWS Glue job that was slightly modified; only the read was changed. The job runs fine, but the data types of my columns have changed: where I previously had BigInt, I now have Int. This is causing an EMR job that depends on these files…

sgallagher
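One plausible explanation (an assumption, since the changed read isn't shown) is that the new read infers the schema from the data: Spark's CSV schema inference, for instance, chooses int when every value fits in a 32-bit signed integer and only widens to bigint (long) otherwise, whereas a catalog-based read uses the declared type. The plain-Python sketch below reproduces that int-versus-bigint rule; infer_integer_type is a hypothetical name.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def infer_integer_type(values):
    """Mimic the int-vs-bigint choice of Spark's CSV schema inference.

    Returns "int" if every value fits in a 32-bit signed integer,
    otherwise "bigint".
    """
    for v in values:
        if not INT_MIN <= int(v) <= INT_MAX:
            return "bigint"
    return "int"


print(infer_integer_type(["1", "42", "2147483647"]))
print(infer_integer_type(["1", "2147483648"]))
```

If the downstream EMR job needs bigint, casting explicitly after the read (for example df.withColumn("id", df["id"].cast("bigint")) in PySpark, or ApplyMapping on a DynamicFrame) removes the dependence on whatever values happen to be in the files.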

1 vote, 0 answers
Why does Spark SQL add double quotes to some string concat() but not to others? I do not want quotes around numeric fields
Please note that I do not want double quotes around all fields, just strings.
Working in AWS Glue Studio, if I run select concat(ref_alpha, '!', ref_beta) and write it to a CSV file, I get
"AB12!RT45"
but if I have concat(ref_alpha, 'T', ref_beta) I…

corisco

1 vote, 1 answer
Unable to access csv file generated by a jar file in AWS Glue
This is my first question here!
We're working on some MDM-related work in which we need to run a JAR file provided by our MDM partner to merge the records. We are able to call subprocess() in our AWS Glue script to run the JAR file.…

Elhan Shaji
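A common reason an external program's output "can't be found" is that the program writes relative to its own working directory, which may not be where the script looks. A minimal sketch, assuming the JAR writes its CSV to a relative path (the question doesn't say): run it with an explicit cwd and resolve the output from that same directory. The helper name is hypothetical, and the demo swaps in a tiny Python command for the real java -jar call so it can run anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_locate(cmd, output_name, workdir=None):
    """Run cmd in workdir and return the absolute path of output_name.

    Raises CalledProcessError if the command fails and FileNotFoundError
    if it did not create the expected file.
    """
    workdir = Path(workdir or tempfile.mkdtemp())
    subprocess.run(cmd, cwd=workdir, check=True)
    output = (workdir / output_name).resolve()
    if not output.exists():
        raise FileNotFoundError(output)
    return output


# In the Glue job, cmd might be ["java", "-jar", "merge.jar"] (hypothetical).
demo_cmd = [sys.executable, "-c", "open('merged.csv', 'w').write('id,name\\n1,a\\n')"]
path = run_and_locate(demo_cmd, "merged.csv")
print(path.read_text())
```

The file lives on the local disk of the machine running the script, not in S3, so it has to be read locally or uploaded (for example with boto3's upload_file) before Spark or other jobs can use it from S3.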