Questions tagged [aws-glue-spark]
244 questions
0
votes
3 answers
Decrypt records using KMS in pySpark in AWS Glue
We are performing client-side encryption on certain text content and storing it in individual files in S3. We are looking to read these files and process the content in AWS Glue. We are able to read the contents, but during decryption we get a…

justlikethat
- 329
- 2
- 12
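
A minimal PySpark sketch of the usual fix for this class of problem: boto3 clients are not picklable, so the KMS client should be created inside mapPartitions rather than on the driver. The bucket path and column layout are hypothetical, and this assumes the ciphertext was encrypted directly with KMS (envelope encryption would additionally require decrypting a data key).

    import base64
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def decrypt_partition(rows):
        kms = boto3.client("kms")  # one client per partition; clients cannot be pickled to executors
        for row in rows:
            plaintext = kms.decrypt(
                CiphertextBlob=base64.b64decode(row.value)
            )["Plaintext"].decode("utf-8")
            yield (plaintext,)

    df = spark.read.text("s3://my-bucket/encrypted/")  # hypothetical path; one ciphertext per line
    decrypted = df.rdd.mapPartitions(decrypt_partition).toDF(["plaintext"])
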
0
votes
1 answer
AWS Glue ETL - Converting Epoch to timestamp
As the title states, I'm having trouble converting a column on a Dynamic Frame from epoch to a timestamp.
I have tried converting it into a Data Frame and back to a Dynamic Frame but it is not working.
This is what I'm working with:
import sys
from…

parmigiano
- 95
- 1
- 14
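
A sketch of one common way to do this conversion, assuming the epoch column holds seconds and that glueContext is already set up as in a standard Glue script (column names are hypothetical): convert via a DataFrame and wrap back into a DynamicFrame.

    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.functions import col, from_unixtime, to_timestamp

    df = dyf.toDF()  # dyf: the existing DynamicFrame
    df = df.withColumn("event_ts", to_timestamp(from_unixtime(col("epoch_col"))))
    # If the epoch is in milliseconds, divide first: from_unixtime(col("epoch_col") / 1000)
    dyf_converted = DynamicFrame.fromDF(df, glueContext, "converted")
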
0
votes
0 answers
Rename file in AWS Glue Scala
I am writing files to S3 using Glue Scala code, but it saves the CSV file as run-part-000… I want it to be saved or renamed as something.csv. How can this be done? Code snippet below -
gluecontext.getSinkWithFormat(
connectionType = "S3",
options =…

Abhishek Mishra
- 21
- 8
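
Spark itself always writes part-files, so the usual workaround is to coalesce to a single file and rename it with an S3 copy afterwards. A PySpark equivalent sketch (bucket, prefix, and target name are hypothetical):

    import boto3

    df.coalesce(1).write.mode("overwrite").option("header", True).csv("s3://my-bucket/tmp-out/")

    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket="my-bucket", Prefix="tmp-out/")
    part_key = next(o["Key"] for o in listing["Contents"] if o["Key"].endswith(".csv"))
    s3.copy_object(Bucket="my-bucket",
                   CopySource={"Bucket": "my-bucket", "Key": part_key},
                   Key="something.csv")          # the desired final name
    s3.delete_object(Bucket="my-bucket", Key=part_key)
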
0
votes
1 answer
AWS Glue Spark job error: "ModuleNotFoundError: You need to install pyodbc respectively the AWS Data Wrangler package with the sqlserver"
I am using an AWS Glue Spark Python job to sync data from S3 to an on-prem SQL Server, using AWS Data Wrangler with a pyodbc wheel file attached. When I run my job I get this error: "ModuleNotFoundError: You need to install pyodbc…

nithin
- 11
- 4
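
The usual cause here is that pyodbc is a compiled package that also needs native ODBC driver libraries, so attaching the wheel alone is often not enough. A hedged sketch of the common approach on Glue 2.0+: install both packages via the --additional-python-modules job parameter, then connect (server, database, and credentials are hypothetical, and the Microsoft ODBC driver libraries still have to be available on the workers).

    # Glue job parameter (Glue 2.0+), set in the job configuration:
    #   --additional-python-modules  awswrangler,pyodbc
    import awswrangler as wr
    import pyodbc

    con = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my-host,1433;DATABASE=mydb;"   # hypothetical server/database
        "UID=user;PWD=secret"
    )
    # then e.g. wr.sqlserver.to_sql(df=pandas_df, con=con, table="target", schema="dbo")
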
0
votes
1 answer
How to create a filter on an aws glue dynamicframe that filters out set of (literal) values
In a Glue script (running in a Zeppelin notebook forwarding to a dev endpoint in Glue), I've created a DynamicFrame from a Glue table that I would like to filter on the field "name" not being in a static list of values, i.e. ("a","b","c").
Filtering…

Anske
- 1
- 3
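
A minimal sketch using Glue's own Filter transform, which takes a plain Python predicate (the frame name is hypothetical):

    from awsglue.transforms import Filter

    excluded = {"a", "b", "c"}
    filtered = Filter.apply(frame=dyf, f=lambda row: row["name"] not in excluded)
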
0
votes
1 answer
Pyspark - dataframe.write - AttributeError: 'NoneType' object has no attribute 'mode'
I am trying to convert csv files into parquet using pyspark.
parquet_file = s3://bucket-name/prefix/
parquet_df.write.format("parquet").option("compression", "gzip").save(parquet_file).mode(SaveMode.Overwrite)
I am trying to overwrite parquet…

Mahantesh Angadi
- 1
- 1
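
The error comes from call order: .save() executes the write and returns None, so .mode() must be chained before it, and in PySpark the mode is a string rather than Scala's SaveMode. A corrected sketch, keeping the question's path as a placeholder:

    parquet_df.write \
        .format("parquet") \
        .option("compression", "gzip") \
        .mode("overwrite") \
        .save("s3://bucket-name/prefix/")  # the path must also be a quoted string
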
0
votes
1 answer
Connecting to Presto database using AWS Glue. Unable to pass SSL Keystore or Certificate
I have an issue connecting to Presto from an AWS Glue job. The code is written in Spark Scala. I am trying to connect to Presto using the code below.
val datanot_in_hz = sqlcontext.read.format("jdbc").option("url", jdbcConUrl).option("driver",…

roshaga
- 257
- 3
- 11
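
With the Presto JDBC driver, the keystore/truststore is typically passed as URL parameters, and the .jks file has to exist on the executors (e.g. shipped via the job's --extra-files and referenced by its local path). A PySpark equivalent sketch; host, table, and paths are hypothetical:

    url = ("jdbc:presto://presto-host:8443/hive/default"
           "?SSL=true"
           "&SSLTrustStorePath=/tmp/truststore.jks"      # local path on the executors
           "&SSLTrustStorePassword=changeit")

    df = (spark.read.format("jdbc")
          .option("url", url)
          .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
          .option("dbtable", "my_table")                 # hypothetical table
          .load())
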
0
votes
1 answer
AWS Glue maximum and transform rows
I am trying to load data into a table created using AWS Glue from source bucket S1.
The source bucket has 4 columns (session_id, Date, type, action) with the values below. A purchase transaction lasted for 1 min and we get 2 records for the same.…

Ganesh
- 7
- 4
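
The excerpt is cut off, but collapsing the two records per transaction usually comes down to a groupBy aggregation. A heavily hedged sketch, assuming session_id identifies the transaction and using the column names from the excerpt:

    from pyspark.sql import functions as F

    agg = (df.groupBy("session_id")
             .agg(F.max("Date").alias("Date"),            # keep the later of the two records
                  F.first("type").alias("type"),
                  F.collect_list("action").alias("actions")))
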
0
votes
1 answer
How to choose python version 3 while deploying AWS glue Job with glue version 1.0 using YAML(serverless)
How to choose python version 3 while deploying AWS glue Job with glue version 1.0 using YAML(serverless)?
I'm deploying AWS Glue using serverless YAML code. AWS provides a GlueVersion parameter to choose the Glue version, which I'm…

Rajan Beri
- 1
- 4
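
Whatever the deployment tool, the Python version is a property of the job's Command block, alongside the top-level GlueVersion. A sketch of the same shape through boto3 (names, role, and script location are hypothetical); in CloudFormation/serverless YAML the PythonVersion key nests under Command the same way.

    import boto3

    glue = boto3.client("glue")
    glue.create_job(
        Name="my-job",                                           # hypothetical
        Role="arn:aws:iam::123456789012:role/GlueRole",          # hypothetical
        GlueVersion="1.0",
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/job.py",   # hypothetical
            "PythonVersion": "3",                                # the knob in question
        },
    )
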
0
votes
1 answer
How to add an index to an RDS database/table after AWS Glue script imports the data therein?
I have a typical AWS Glue-generated script that loads data from an S3 bucket to my Aurora database available through a JDBC Connection. For reference, it looks like this:
import sys
from awsglue.transforms import *
from awsglue.utils import…

onkami
- 8,791
- 17
- 90
- 176
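
Glue's JDBC sink writes rows but offers no option for index DDL, so the CREATE INDEX has to be issued separately after the load, e.g. with a direct database connection at the end of the script. A sketch assuming Aurora MySQL and the PyMySQL package (host, credentials, and table are hypothetical):

    import pymysql

    conn = pymysql.connect(host="my-aurora-host",      # hypothetical
                           user="admin", password="secret", database="mydb")
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
        conn.commit()
    finally:
        conn.close()
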
0
votes
0 answers
How does Partitioning work with AWS Glue Jobs
If I have a Glue Job running every hour but partitioned by day, what is the expected functionality? Will the job first create a partition for that day and then subsequent jobs append to that partition? Is there any documentation that provides…

sgallagher
- 137
- 10
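
In practice, partitioned S3 writes from Glue are additive: in the standard sink sketched below, the first hourly run creates the day=... prefix and each later run simply adds more files under it (paths and keys are hypothetical). Overwriting a partition has to be handled yourself.

    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={
            "path": "s3://my-bucket/output/",   # hypothetical
            "partitionKeys": ["day"],           # produces .../day=2020-01-01/ prefixes
        },
        format="parquet",
    )
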
0
votes
1 answer
Concat / Join / Transform multiple columns to one struct column
I have a very big, legacy file with ~5000 columns and a very large number of records.
Many columns are named like a_1, a_2, ..., a_200 etc.
I want to concatenate a number of columns into a struct (for better data manipulation later), so instead of:
_| a_1 | a_2 |…

Aylard
- 33
- 5
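
A minimal sketch of the usual pattern: select the columns by prefix and fold them into one struct column (the prefix and names are taken from the excerpt):

    from pyspark.sql import functions as F

    a_cols = [c for c in df.columns if c.startswith("a_")]
    df2 = df.withColumn("a", F.struct(*[F.col(c) for c in a_cols])).drop(*a_cols)
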
0
votes
1 answer
How to run pySpark with snowflake JDBC connection driver in AWS glue
I am trying to run the below code in AWS glue:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from…

Gaurav Gangwar
- 467
- 3
- 11
- 24
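
The excerpt stops before the connection code, but the Snowflake Spark connector in Glue generally needs the snowflake-jdbc and spark-snowflake jars supplied via --extra-jars, plus a read along these lines (account, credentials, and table are hypothetical):

    SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"
    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",   # hypothetical
        "sfUser": "user",
        "sfPassword": "secret",
        "sfDatabase": "MYDB",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "MYWH",
    }

    df = (spark.read.format(SNOWFLAKE_SOURCE)
          .options(**sf_options)
          .option("dbtable", "MY_TABLE")               # hypothetical
          .load())
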
0
votes
1 answer
How to write the dataframe to S3 after filter
I am trying to write the data frame after filtering to S3 in CSV format, in script editing, with the Scala code below.
Current status:
It does not show any error after running, but it is just not writing to S3.
The log screen prints Start, however I cannot see print…

foy
- 387
- 4
- 15
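
When a Glue write "silently" produces nothing, the usual suspects are an empty result (job bookmarks can make re-runs read zero new records) or the write never being reached. A PySpark equivalent sketch of the filter-then-write, with a count to confirm there is something to write (predicate and path are hypothetical):

    filtered = df.filter(df["status"] == "active")     # hypothetical predicate
    print("rows to write:", filtered.count())          # confirm the filter is not empty

    (filtered.write
        .mode("overwrite")
        .option("header", True)
        .csv("s3://my-bucket/filtered/"))              # hypothetical path
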
0
votes
0 answers
Is it compulsory to convert a Glue dynamic frame to a Spark dataframe before writing to Snowflake?
Is it always necessary to convert a Glue dynamic frame to a Spark dataframe before writing to Snowflake? I didn't find any other way anywhere. This conversion for 20 million records takes most of the time; writing only takes 2 mins.
Has anyone done…
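
For what it's worth, the Spark-Snowflake connector is a Spark data source, so a DataFrame is what it writes; but dyf.toDF() is largely a lazy change of representation, so the minutes observed are usually the upstream scan and transforms executing at write time rather than the conversion itself. A sketch, reusing a hypothetical connector options dict named sf_options:

    sdf = dyf.toDF()  # largely lazy; schema resolution may do some work, but not a full 20M-row copy

    (sdf.write.format("net.snowflake.spark.snowflake")
        .options(**sf_options)        # hypothetical: sfURL, sfUser, sfPassword, ...
        .option("dbtable", "TARGET")  # hypothetical
        .mode("append")
        .save())
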