Questions tagged [aws-glue-spark]

244 questions
2
votes
1 answer

How to use a function from one Glue script in another in AWS Glue

I have an AWS Glue PySpark script, for example scriptA.py. In this script I have defined a few generic functions, such as readSourceData(): def readSourceData(parameter1, parameter2): //logic of function. Now I want to use this generic function in my second…
2
votes
1 answer

How to Read Filename from S3 using AWS Glue ETL Tools

I have some files in S3 that look like this (all in the same path): group1_20210415.csv, group2_20210415.csv, group1_20210416.csv, group2_20210416.csv. The schema for each file is rather simple: group1_name, group1_id; group2_name, group2_id. I want…
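A minimal sketch of one way to pull the group and date out of file names shaped like the ones in this question. The pattern, the `FileInfo` type, the `parse_filename` helper, and the bucket name in the usage line are all hypothetical; the excerpt is truncated, so what the asker actually wants to do with the name is an assumption.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Assumed naming convention, taken from the example names: <group>_<YYYYMMDD>.csv
FILENAME_PATTERN = re.compile(r"(?P<group>group\d+)_(?P<day>\d{8})\.csv$")


class FileInfo(NamedTuple):
    """Hypothetical container for the parts encoded in a file name."""
    group: str
    day: date


def parse_filename(path: str) -> FileInfo:
    """Extract the group label and date from the end of an S3 key or URI."""
    match = FILENAME_PATTERN.search(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path!r}")
    day = datetime.strptime(match.group("day"), "%Y%m%d").date()
    return FileInfo(match.group("group"), day)


# Hypothetical bucket and prefix, for illustration only.
info = parse_filename("s3://my-bucket/data/group1_20210415.csv")
```

In a Spark job, the source path for each row is typically obtained with `input_file_name()` from `pyspark.sql.functions`; wiring the parser into a DataFrame column is left out of this sketch.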
2
votes
4 answers

AWS Glue: how to write a dynamic frame to S3 as a .txt file and use '|' as the delimiter

I want to write a dynamic frame to S3 as a text file and use '|' as the delimiter. How can I modify the code below so that Glue saves the frame as a .txt file and uses '|' as the delimiter? glue_context.write_dynamic_frame.from_options( …
Beginner
2
votes
1 answer

How to make an existing column NOT NULL in AWS REDSHIFT?

I dynamically created a table through a Glue job and it is working fine. But as per a new requirement, I need to add a new column that generates unique values and should be the primary key in Redshift. I had implemented the same using…
2
votes
1 answer

What is the Scala and Java version for AWS Glue ETL job?

So far I'm using Scala 2.11 with Java 8 to build the library used by the Glue ETL job. We're planning to upgrade to Scala 2.12 with Java 11, but we're not sure whether they are supported by Glue ETL.
seiya
2
votes
1 answer

Is there a way to know the last partition written to an S3 table, to use in a push down predicate in an AWS Glue job?

I'm trying to read just the last partition written to a table in S3 from a Glue job, reading the DynamicFrame with a push down predicate. The table I want to read from gets loaded every day, and therefore a new partition gets created for that…
2
votes
2 answers

How to pass input parameter to AWS Glue Map.apply function

I am working on an AWS Glue job where I have a function "some_function" that I want to apply to DynamicFrame dy_f, but I also want to pass an input parameter to some_function. Map.apply(frame=products_combination, f=search) where some_function's…
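One common way to hand an extra argument to a single-argument callback is to bind it ahead of time with `functools.partial`. The sketch below uses plain dicts in place of Glue records, and the body of `search`, the `keyword` parameter, and the sample data are hypothetical; only the `Map.apply(frame=..., f=...)` call shape comes from the question.

```python
from functools import partial


def search(record, keyword):
    """Record-level function that needs one extra parameter (hypothetical logic)."""
    record["matched"] = keyword in record.get("name", "")
    return record


# Bind the extra parameter so the result takes only the record.
search_for_glue = partial(search, keyword="glue")

# In a Glue job the bound function would be passed as the callback, e.g.
#   Map.apply(frame=products_combination, f=search_for_glue)
# Here plain dicts stand in for the records so the sketch runs anywhere.
records = [{"name": "aws glue"}, {"name": "redshift"}]
results = [search_for_glue(dict(r)) for r in records]
```

A closure or a lambda (`lambda rec: search(rec, "glue")`) binds the parameter equally well; `partial` simply keeps the binding explicit.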
2
votes
0 answers

AWS Glue: Column "column_name" not found in schema

I'm trying to create an ETL job in AWS Glue. The use case is as follows: when a column is added to one of the source tables after the ETL job has run and we then re-run the job, it fails saying the column was not found (in target…
2
votes
1 answer

Unable to convert AWS Glue DynamicFrame into Spark DataFrame

I'm trying to convert a Glue dynamic frame into a Spark dataframe using DynamicFrame.toDF, but I'm getting this exception: Traceback (most recent call last): File "/tmp/ManualJOB", line 62, in <module> df1 = datasource0.toDF() File…
Akhil
1
vote
0 answers

Transferring the latest data from Redshift to DynamoDB with AWS Glue

I'm new to DynamoDB and AWS Glue, and I'm trying to transfer data from a Redshift cluster to DynamoDB tables using AWS Glue, but I want to keep only the most recent data from the cluster table. As I understand, dropping the entire DynamoDB table and…
1
vote
0 answers

Failed to start Glue Notebook server

I am trying to create a Glue Studio job using the Jupyter Notebook option, but I am getting this error: Role arn:aws:iam::role/AWSGlueServiceNotebookRoleDefault should be given assume role permissions for Glue Service. (Service: AWSGlueJobExecutor;…
abd
1
vote
0 answers

AWS Glue - fixed width text file - with header and footer

I'm a beginner in AWS, so please bear with me if certain things are a bit off :) I have a task where I need to load a fixed-width text file that contains both a header record and a footer record, and of course a lot of data in between. The data…
1
vote
0 answers

Cross-Region AWS Glue Data Catalog access with Glue ETL

I have a Glue ETL job in the us-west-2 region that reads from a database in the AWS Glue Data Catalog of that region. Example: datasource0 = glueContext.create_dynamic_frame.from_catalog(database='my-database', …
1
vote
0 answers

Data load from Arena (DMS) to AWS S3

I'm building a unified data platform on the AWS cloud and need to get the necessary information from the Arena document management source system. Has anyone developed code or a service for ingesting documents or data from the Arena solution…
Mukesh
1
vote
0 answers

How do I run crawlers for an AWS Glue job that reads an Excel file?

I am trying to import an Excel file with multiple sheets. Based on what I read, Glue 2.0 can read Excel files. I have tried this code and the job was successful, but I am lost as to how I am supposed to run crawlers for the Data Catalog; I cannot seem to…