Highest Voted 'aws-glue-spark' Questions

1

vote

0 answers

How to work with schema returned by 'get_catalog_schema_as_spark_schema'?

Example: schema = glueContext.get_catalog_schema_as_spark_schema(database=args['Database'], table_name=args['Table']). If I simply print the returned schema, I can see the StructType/StructField structure, something similar to: StructType( …

aws-glue-spark

asked Oct 06 '21 at 00:05

GSazheniuk

1,340
10
16

1

vote

2 answers

DataFrame remove rows existing in another DataFrame

pandas dataframe pyspark aws-glue aws-glue-spark

asked Sep 17 '21 at 21:55

TurboAza

75
1
9

1

vote

1 answer

AWS glue pyspark: java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

I'm trying to read a CSV file from S3 in my AWS Glue PySpark script. The following is a snippet of the code: import sys import os from awsglue.transforms import * from awsglue.utils import getResolvedOptions from pyspark.context import…

amazon-s3 pyspark aws-glue aws-glue-spark

asked Sep 17 '21 at 16:06

Harsh P Waghela

63
1
9

1

vote

2 answers

aws glue pyspark remove struct in an array but keep the data and save into dynamodb

A DynamoDB table is exported to S3 and an AWS Glue crawler crawls the S3 data. AWS Glue jobs take the source from the crawled data, and here's the schema that was transformed by MergeLineItems: def MergeLineItems(rec): rec["lineItems1"] = {} a =…

pyspark amazon-dynamodb aws-glue aws-glue-spark

asked Sep 15 '21 at 05:13

Minah

81
1
9

1

vote

1 answer

AWS Glue Bad value for type BigDecimal : NaN

I'm trying to export a table I crawled from a Postgres (RDS) database into Glue. There's one field with a decimal(10, 2) type. Now I have several problems. Exporting the table from Glue (using Spark 2.4, 3.1 Python 3) into S3 with the following…

apache-spark pyspark aws-glue aws-glue-data-catalog aws-glue-spark

asked Sep 13 '21 at 13:03

Mugiwara

46
1
4

1

vote

1 answer

AWS Glue assigned all tasks to the same worker

I have an AWS Glue job whose work is very simple: break large CSV gzip files into 1GB ones. In my test, I uploaded 4 files into the bucket, each is around 5GB. Yet, the job always assigns all files to a single worker instead of distributing across…

aws-glue aws-glue-data-catalog aws-glue-spark aws-glue-connection

asked Sep 01 '21 at 19:35

user1888955

626
1
9
27

1

vote

0 answers

Convert Glue column datatype to Spark metadata

I have a Glue column whose datatype in Glue is struct. However, when Spark infers this schema, it converts this Glue type to Spark metadata and saves it to the Glue table properties as follows: "name": "columnName", "type":…

apache-spark aws-glue aws-glue-spark

asked Aug 21 '21 at 16:13

aishwarya murkute

11
1

1

vote

0 answers

Error Running Spark Glue jobs after created DF to tempView

Explanation: when I create a DataFrame from a dynamic frame it works fine, and I am able to write the DataFrame back to a dynamic frame, but when I register the DataFrame with createOrReplaceTempView it throws this error. The number of…

amazon-web-services apache-spark aws-glue aws-glue-data-catalog aws-glue-spark

asked Jul 21 '21 at 16:26

bigDataArtist

141
1
12

1

vote

1 answer

Glue PySpark Job: An error occurred while calling o100.pyWriteDynamicFrame

I am building a data pipeline for migrating data from an S3 bucket to Snowflake via AWS Glue by creating a custom connector in AWS Glue. I am getting the error below when running the Glue job: An error occurred while calling o100.pyWriteDynamicFrame. Glue ETL…

pyspark snowflake-cloud-data-platform aws-glue snowflake-schema aws-glue-spark

asked Jul 06 '21 at 12:08

Lavish Patodi

11
1
4

1

vote

1 answer

How do you specify Project ID in the AWS Glue to BigQuery connector?

I'm trying to use the AWS Glue connector to BigQuery following the tutorial in https://aws.amazon.com/blogs/big-data/migrating-data-from-google-bigquery-to-amazon-s3-using-aws-glue-custom-connectors/ but after following all steps I get a: :…

google-bigquery aws-glue aws-glue-spark aws-glue-connection

asked Jul 02 '21 at 12:31

tonicebrian

4,715
5
41
65

1

vote

1 answer

Creating dynamic frame issue without the pushdown predicate

New to AWS Glue, so pardon my question: why do I get an error when I don't include a pushdown predicate when creating the dynamic frame? I am trying to use it without the predicate because I will be using bookmarks, so only new files will be processed regardless…

apache-spark pyspark apache-spark-sql aws-glue aws-glue-spark

asked Jul 02 '21 at 03:10

marcia12

159
1
2
12

1

vote

0 answers

AWS Glue not able to access database in VPC

I have an AWS Glue job which uses Spark and Scala, with JDBC connections specified in the script for custom ETL and data decryption. While running the job in an environment where the databases are not publicly available, the jobs are failing with…

scala apache-spark aws-glue aws-glue-spark aws-glue-connection

asked Jun 22 '21 at 14:56

Trojan Developer

13
5

1

vote

1 answer

How would changing the read in AWS Glue change a column's data type?

I have an AWS Glue job that was slightly modified; only the read was changed. The job runs fine, however the datatypes on my columns have changed. Where I previously had BigInt, I now just have Ints. This is causing an EMR Job dependent on these files…

scala aws-glue aws-glue-spark

asked Jun 10 '21 at 00:45

sgallagher

137
10

1

vote

0 answers

Why does Spark SQL add double quotes to some string concat() but not to others? I do not want quotes around numeric fields

Please note that I do not want double quotes around all fields, just strings. Working in AWS Glue Studio, if I have select concat(ref_alpha, '!', ref_beta) and send it to a csv file I get "AB12!RT45" but if I have concat(ref_alpha, 'T', ref_beta) I…

apache-spark apache-spark-sql aws-glue aws-glue-spark

asked Jun 01 '21 at 00:09

corisco

115
5

1

vote

1 answer

Unable to access csv file generated by a jar file in AWS Glue

This is my first question here! So we're working on some MDM related stuff wherein we need to run a jar file provided by our MDM partner to merge the records. We are able to call the subprocess() method in our AWS Glue script to run the jar file.…

amazon-web-services aws-glue executable-jar aws-glue-spark reltio

asked May 21 '21 at 12:05

Elhan Shaji

11
2
