Highest Voted 'aws-glue-spark' Questions

0

votes

0 answers

Error reading a CSV file from an AWS Glue catalog table when the data also contains commas

I have an S3 file in CSV format, which is read by an AWS Glue job using the AWS Glue catalog. There are 3 fields in the S3 file, as follows: ID,NAME,COMMENT 1,"XYZ","COMMENT1,COMMENT2" 2,"abc","COMMENT3" 3,"mno","COMMENT4" The issue is while reading…

asked Aug 19 '21 at 15:23

Data girl

101
1
7
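
For the quoted-comma question above, a minimal sketch using Python's standard csv module (not Glue's own reader) shows how the sample rows should split once double quotes are honored as the quote character. The function and variable names are illustrative only. In a Glue job the analogous control is, as far as I know, the quoteChar CSV format option or a catalog table backed by OpenCSVSerde; treat that mapping as an assumption to check against the Glue documentation.

```python
import csv
import io

# Sample rows copied from the question; the COMMENT field of row 1 contains a comma.
SAMPLE = '''ID,NAME,COMMENT
1,"XYZ","COMMENT1,COMMENT2"
2,"abc","COMMENT3"
3,"mno","COMMENT4"
'''

def parse_quoted_csv(text):
    """Parse CSV text, treating double quotes as the quote character."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",", quotechar='"')
    return list(reader)

rows = parse_quoted_csv(SAMPLE)
for row in rows:
    # Each row keeps exactly three fields, even when COMMENT holds a comma.
    print(row["ID"], row["NAME"], row["COMMENT"])
```

If a reader splits row 1 into four fields instead of three, the quote character is probably not being applied.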

0

votes

1 answer

How to drop a duplicate column in a Glue job, as Glue is creating a duplicate column

I have created a Glue job, and it creates a duplicate column once I run the crawler on the transformed file. How can I drop the duplicate column? I know there is a DropNullFields function, but it drops null fields, not duplicate columns. What…

pyspark aws-glue aws-glue-spark

asked Aug 17 '21 at 09:33

Parag Shahade

57
3
8

0

votes

1 answer

How to debug an AWS Glue PySpark job

I have an AWS Glue PySpark job that runs for a long time after a certain command. After that command it writes nothing to the log, not even a simple "print hello" statement. How can I debug an AWS Glue PySpark job which is long running and not even…

amazon-web-services pyspark aws-lambda aws-glue-spark aws-glue-workflow

asked Aug 14 '21 at 07:59

Data girl

101
1
7

0

votes

0 answers

Strange behavior when editing AWS Glue driver script in AWS console

So basically I have a couple of different Glue jobs (all created from Terraform, but with different workspaces for testing purposes). The Glue driver scripts are a little bit different, and they are stored in an S3 bucket, then pointed to the targeted…

amazon-web-services aws-glue aws-glue-spark

asked Aug 13 '21 at 12:18

wawawa

2,835
6
44
105

0

votes

1 answer

Is there a way to define AWS Glue input path with wildcard?

I have a Glue job that looks at the files for the current date (each date has a folder in S3) and processes the data in this folder (e.g. "s3://bucket_name/year/month/day"). Now I want to find a way to define an input S3 path which tells Glue to look…

amazon-web-services amazon-s3 wildcard aws-glue aws-glue-spark

asked Aug 12 '21 at 15:37

wawawa

2,835
6
44
105
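
One way to approach the date-folder question above is to compute the S3 prefixes in the driver script and hand them to the reader as a list of paths. The sketch below only builds the strings. The bucket name comes from the question's example, while the zero-padded year/month/day layout and the function name daily_prefixes are assumptions. Whether glob characters in a path are expanded depends on the reader being used, so this sketch avoids relying on them.

```python
from datetime import date, timedelta

def daily_prefixes(bucket, start, end):
    """Return one 's3://bucket/YYYY/MM/DD/' prefix per day, start to end inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        "s3://{}/{:%Y/%m/%d}/".format(bucket, start + timedelta(days=offset))
        for offset in range(days)
    ]

# Three consecutive daily folders, assuming zero-padded month and day names.
paths = daily_prefixes("bucket_name", date(2021, 8, 10), date(2021, 8, 12))
for p in paths:
    print(p)
```

A list like this could then be supplied as the paths entry of the S3 connection options, assuming the job reads directly from S3 rather than through a catalog table.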

0

votes

1 answer

An error occurred while calling o79.getDynamicFrame. [Amazon](500310) Invalid operation: syntax error at or near "s_next_of_kin"

I have a table in Redshift with a column named agent's_next_of_kin. As you can see, the name contains an apostrophe followed by "s". When I read the table into my DynamicFrame with Glue, I get the above error about a syntax issue. How can I make…

pyspark apache-spark-sql aws-glue aws-glue-data-catalog aws-glue-spark

asked Jul 28 '21 at 19:53

bigDataArtist

141
1
12
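
The error in the question above is consistent with the column name reaching Redshift unquoted, so that the apostrophe ends the identifier. A minimal sketch of the usual SQL remedy, a double-quoted (delimited) identifier, follows. The table name agents and both helper functions are hypothetical, and whether a Glue read can be given a hand-written query like this depends on the connection options in use, which the excerpt does not show.

```python
def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def build_select(table, columns):
    """Build a SELECT statement in which every identifier is delimited."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    return "SELECT {} FROM {}".format(cols, quote_identifier(table))

# "agents" is a made-up table name; the column name is the one from the question.
query = build_select("agents", ["id", "agent's_next_of_kin"])
print(query)
```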

0

votes

1 answer

AWS Glue Studio inner join gives an error when one of the Data Catalog tables has no records

I am new to AWS Glue Studio. I have created two tables in the AWS Glue database, with the current date as the partition. I am doing an inner join and a left anti join to process the job. If there is no match, my Glue job fails with the error AnalysisException:…

amazon-web-services aws-glue aws-glue-data-catalog aws-glue-spark aws-glue-workflow

asked Jul 17 '21 at 07:05

Mahen Nakar

376
2
15

0

votes

2 answers

Create a Glue job that splits an array into rows?

I currently have data arriving from Firehose into an Athena table. When I view the data, it is an array of JSON. Is it possible to use a Glue job to split the arrays into separate rows, so each row is its own JSON log? For example: Data…

amazon-web-services aws-glue aws-glue-spark

asked Jul 08 '21 at 16:22

ikjot dhillon

1
1
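
The transformation asked about above can be sketched in plain Python: parse each array and emit one JSON string per element. The input below is invented, since the question's real data is cut off, and split_array_records is an illustrative name. In a Spark-based Glue job the comparable operation would normally be explode from pyspark.sql.functions applied to the array column, which this stdlib sketch only imitates.

```python
import json

def split_array_records(line):
    """Turn one JSON array string into a list of single-object JSON strings."""
    records = json.loads(line)
    if not isinstance(records, list):
        records = [records]  # tolerate a line that already holds one object
    return [json.dumps(r, sort_keys=True) for r in records]

# Hypothetical input; the question's real data is not shown.
raw = '[{"level": "info", "msg": "a"}, {"level": "warn", "msg": "b"}]'
rows = split_array_records(raw)
for r in rows:
    print(r)
```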

0

votes

0 answers

How to run a Python shell Glue job using Glue resources?

Python shell jobs run on AWS Glue, so they use the DPUs assigned to Glue. I was going through some tutorials where they ran SQL queries that were triggering Redshift. My concern was that the computation is happening on Redshift, which…

python aws-glue aws-glue-data-catalog aws-glue-spark

asked Jul 05 '21 at 05:54

bigDataArtist

141
1
12

0

votes

1 answer

resolveChoice for Glue data frame not working

I have a Glue data frame with the following structure. Due to some historical data, we have differences in the structure. When I try to change the structure, resolveChoice does not work. |-- logs: array | |-- element: struct | | |--…

pyspark aws-glue aws-glue-spark

asked Jun 30 '21 at 01:12

Tobias Bruckert

348
2
12

0

votes

1 answer

Executing Spark SQL in AWS Glue returns the column names in the query rather than values

Running Spark SQL in AWS Glue returns the column names in the query. Data: product,price,quantityinKG mango,100,1 apple,200,3 peach,200,2 mango,200,2 My test query, e.g.: select product,sum(price) from myDataSource …

apache-spark-sql aws-glue aws-glue-data-catalog aws-glue-spark

asked Jun 29 '21 at 12:09

bigDataArtist

141
1
12

0

votes

1 answer

Glue: map/process a source table's column data and write it to columns in a pre-existing Redshift table

I am very new to Glue and came across a scenario where we have a source table in the Glue catalog and need to write its data to specific columns in a pre-existing table in Redshift, e.g. source_table_name[source_table_column_name]. …

python-3.x aws-glue aws-glue-spark

asked Jun 18 '21 at 05:43

newbieitTech

57
1
7

0

votes

1 answer

AWS Glue: Unable to process data from multiple sources (S3 bucket and PostgreSQL DB) with AWS Glue using Scala-Spark

For my requirement, I need to join data present in a PostgreSQL DB (hosted in RDS) with a file present in an S3 bucket. I have created a Glue job (spark-scala) which should connect to both PostgreSQL and the S3 bucket and complete processing. But the Glue job encounters…

postgresql scala amazon-s3 aws-glue aws-glue-spark

asked Jun 12 '21 at 16:59

Swapnil

11
1
2

0

votes

1 answer

Unable to convert from Spark DataFrame to AWS Glue DynamicFrame

I have a Spark DataFrame named cost_matrix. I am trying to convert this Spark DataFrame to an AWS Glue DynamicFrame using the following line of code: glue_cost_matrix = DynamicFrame.fromDF(cost_matrix, glueContext, 'glue_cost_matrix') However, I'm…

apache-spark-sql aws-glue aws-glue-spark

asked Jun 02 '21 at 19:38

brenda

656
8
24

0

votes

0 answers

Loading data from AWS EMR to Redshift using Glue is very slow

I am trying to load data from AWS EMR (with S3 for data storage and the Glue catalog as the metastore) to Redshift. import sys import boto3 from datetime import datetime,date from awsglue.transforms import * from awsglue.utils import getResolvedOptions from…

amazon-redshift aws-glue aws-glue-spark amazon-emr

asked May 31 '21 at 18:04

Rohan Singh Dhaka

173
2
8
33

Questions tagged [aws-glue-spark]