Questions tagged [aws-glue-spark]
244 questions
0
votes
3 answers
Decrypt records using KMS in pySpark in AWS Glue
We are performing client-side encryption on certain text content and storing it in individual files in S3. We are looking to read these files and process the content in AWS Glue. We are able to read the contents, but during decryption we get a…

justlikethat
- 329
- 2
- 12
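
A minimal PySpark sketch of the usual fix for this class of problem: boto3 clients are not picklable, so the KMS client should be created inside mapPartitions rather than on the driver. The bucket path and column layout are hypothetical, and this assumes the ciphertext was encrypted directly with KMS (envelope encryption would additionally require decrypting a data key).

    import base64
    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def decrypt_partition(rows):
        kms = boto3.client("kms")  # one client per partition; clients cannot be pickled to executors
        for row in rows:
            plaintext = kms.decrypt(
                CiphertextBlob=base64.b64decode(row.value)
            )["Plaintext"].decode("utf-8")
            yield (plaintext,)

    df = spark.read.text("s3://my-bucket/encrypted/")  # hypothetical path; one ciphertext per line
    decrypted = df.rdd.mapPartitions(decrypt_partition).toDF(["plaintext"])
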
0
votes
1 answer
AWS Glue ETL - Converting Epoch to timestamp
As the title states, I'm having trouble converting a column on a Dynamic Frame from epoch to a timestamp.
I have tried converting it into a Data Frame and back to a Dynamic Frame but it is not working.
This is what I'm working with:
import sys
from…

parmigiano
- 95
- 1
- 14
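
A sketch of one common way to do this conversion, assuming the epoch column holds seconds and that glueContext is already set up as in a standard Glue script (column names are hypothetical): convert via a DataFrame and wrap back into a DynamicFrame.

    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.functions import col, from_unixtime, to_timestamp

    df = dyf.toDF()  # dyf: the existing DynamicFrame
    df = df.withColumn("event_ts", to_timestamp(from_unixtime(col("epoch_col"))))
    # If the epoch is in milliseconds, divide first: from_unixtime(col("epoch_col") / 1000)
    dyf_converted = DynamicFrame.fromDF(df, glueContext, "converted")
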
0
votes
0 answers
Rename file in AWS Glue Scala
I am writing files to S3 using Glue Scala code, but it saves the CSV file as run-part-000… I want it to be saved or renamed as something.csv. How can this be done? Code snippet below -
gluecontext.getSinkWithFormat(
connectionType = "S3",
options =…

Abhishek Mishra
- 21
- 8
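
Spark itself always writes part-files, so the usual workaround is to coalesce to a single file and rename it with an S3 copy afterwards. A PySpark equivalent sketch (bucket, prefix, and target name are hypothetical):

    import boto3

    df.coalesce(1).write.mode("overwrite").option("header", True).csv("s3://my-bucket/tmp-out/")

    s3 = boto3.client("s3")
    listing = s3.list_objects_v2(Bucket="my-bucket", Prefix="tmp-out/")
    part_key = next(o["Key"] for o in listing["Contents"] if o["Key"].endswith(".csv"))
    s3.copy_object(Bucket="my-bucket",
                   CopySource={"Bucket": "my-bucket", "Key": part_key},
                   Key="something.csv")          # the desired final name
    s3.delete_object(Bucket="my-bucket", Key=part_key)
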
0
votes
1 answer
AWS Glue Spark job error: "ModuleNotFoundError: You need to install pyodbc respectively the AWS Data Wrangler package with the sqlserver"
I am using an AWS Glue Spark Python job to sync data from S3 to an on-prem SQL Server, using AWS Data Wrangler with a pyodbc wheel file attached. When I run my job I get this error: "ModuleNotFoundError: You need to install pyodbc…

nithin
- 11
- 4
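
The usual cause here is that pyodbc is a compiled package that also needs native ODBC driver libraries, so attaching the wheel alone is often not enough. A hedged sketch of the common approach on Glue 2.0+: install both packages via the --additional-python-modules job parameter, then connect (server, database, and credentials are hypothetical, and the Microsoft ODBC driver libraries still have to be available on the workers).

    # Glue job parameter (Glue 2.0+), set in the job configuration:
    #   --additional-python-modules  awswrangler,pyodbc
    import awswrangler as wr
    import pyodbc

    con = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=my-host,1433;DATABASE=mydb;"   # hypothetical server/database
        "UID=user;PWD=secret"
    )
    # then e.g. wr.sqlserver.to_sql(df=pandas_df, con=con, table="target", schema="dbo")
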
0
votes
1 answer
How to create a filter on an aws glue dynamicframe that filters out set of (literal) values
In a Glue script (running in a Zeppelin notebook forwarding to a dev endpoint in Glue), I've created a DynamicFrame from a Glue table that I would like to filter on the field "name" not being in a static list of values, i.e. ("a","b","c").
Filtering…

Anske
- 1
- 3
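
A minimal sketch using Glue's own Filter transform, which takes a plain Python predicate (the frame name is hypothetical):

    from awsglue.transforms import Filter

    excluded = {"a", "b", "c"}
    filtered = Filter.apply(frame=dyf, f=lambda row: row["name"] not in excluded)
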
0
votes
1 answer
Pyspark - dataframe.write - AttributeError: 'NoneType' object has no attribute 'mode'
I am trying to convert csv files into parquet using pyspark.
parquet_file = s3://bucket-name/prefix/
parquet_df.write.format("parquet").option("compression", "gzip").save(parquet_file).mode(SaveMode.Overwrite)
I am trying to overwrite parquet…

Mahantesh Angadi
- 1
- 1
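
The error comes from call order: .save() executes the write and returns None, so .mode() must be chained before it, and in PySpark the mode is a string rather than Scala's SaveMode. A corrected sketch, keeping the question's path as a placeholder:

    parquet_df.write \
        .format("parquet") \
        .option("compression", "gzip") \
        .mode("overwrite") \
        .save("s3://bucket-name/prefix/")  # the path must also be a quoted string
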
0
votes
1 answer
Connecting to Presto database using AWS Glue. Unable to pass SSL Keystore or Certificate
I have an issue connecting to Presto from an AWS Glue job. The code is written in Spark Scala. I am trying to connect to Presto using the code below.
val datanot_in_hz = sqlcontext.read.format("jdbc").option("url", jdbcConUrl).option("driver",…

roshaga
- 257
- 3
- 11
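
With the Presto JDBC driver, the keystore/truststore is typically passed as URL parameters, and the .jks file has to exist on the executors (e.g. shipped via the job's --extra-files and referenced by its local path). A PySpark equivalent sketch; host, table, and paths are hypothetical:

    url = ("jdbc:presto://presto-host:8443/hive/default"
           "?SSL=true"
           "&SSLTrustStorePath=/tmp/truststore.jks"      # local path on the executors
           "&SSLTrustStorePassword=changeit")

    df = (spark.read.format("jdbc")
          .option("url", url)
          .option("driver", "com.facebook.presto.jdbc.PrestoDriver")
          .option("dbtable", "my_table")                 # hypothetical table
          .load())
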
0
votes
1 answer
AWS Glue maximum and transform rows
I am trying to load data into a table created using AWS Glue from source bucket S1.
The source bucket has 4 columns (session_id, Date, type, action) with the values below. A purchase transaction lasted for 1 min and we get 2 records for the same.…

Ganesh
- 7
- 4
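
The excerpt is cut off, but collapsing the two records per transaction usually comes down to a groupBy aggregation. A heavily hedged sketch, assuming session_id identifies the transaction and using the column names from the excerpt:

    from pyspark.sql import functions as F

    agg = (df.groupBy("session_id")
             .agg(F.max("Date").alias("Date"),            # keep the later of the two records
                  F.first("type").alias("type"),
                  F.collect_list("action").alias("actions")))
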
0
votes
1 answer
How to choose python version 3 while deploying AWS glue Job with glue version 1.0 using YAML(serverless)
How to choose python version 3 while deploying AWS glue Job with glue version 1.0 using YAML(serverless)?
I'm deploying AWS Glue using serverless YAML code. AWS provides a GlueVersion parameter to choose the Glue version, which I'm…

Rajan Beri
- 1
- 4
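
Whatever the deployment tool, the Python version is a property of the job's Command block, alongside the top-level GlueVersion. A sketch of the same shape through boto3 (names, role, and script location are hypothetical); in CloudFormation/serverless YAML the PythonVersion key nests under Command the same way.

    import boto3

    glue = boto3.client("glue")
    glue.create_job(
        Name="my-job",                                           # hypothetical
        Role="arn:aws:iam::123456789012:role/GlueRole",          # hypothetical
        GlueVersion="1.0",
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://my-bucket/scripts/job.py",   # hypothetical
            "PythonVersion": "3",                                # the knob in question
        },
    )
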
0
votes
1 answer
How to add an index to an RDS database/table after AWS Glue script imports the data therein?
I have a typical AWS Glue-generated script that loads data from an S3 bucket to my Aurora database available through a JDBC Connection. For reference, it looks like this:
import sys
from awsglue.transforms import *
from awsglue.utils import…

onkami
- 8,791
- 17
- 90
- 176
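
Glue's JDBC sink writes rows but offers no option for index DDL, so the CREATE INDEX has to be issued separately after the load, e.g. with a direct database connection at the end of the script. A sketch assuming Aurora MySQL and the PyMySQL package (host, credentials, and table are hypothetical):

    import pymysql

    conn = pymysql.connect(host="my-aurora-host",      # hypothetical
                           user="admin", password="secret", database="mydb")
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
        conn.commit()
    finally:
        conn.close()
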
0
votes
0 answers
How does Partitioning work with AWS Glue Jobs
If I have a Glue Job running every hour but partitioned by day, what is the expected functionality? Will the job first create a partition for that day and then subsequent jobs append to that partition? Is there any documentation that provides…

sgallagher
- 137
- 10
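
In practice, partitioned S3 writes from Glue are additive: in the standard sink sketched below, the first hourly run creates the day=... prefix and each later run simply adds more files under it (paths and keys are hypothetical). Overwriting a partition has to be handled yourself.

    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={
            "path": "s3://my-bucket/output/",   # hypothetical
            "partitionKeys": ["day"],           # produces .../day=2020-01-01/ prefixes
        },
        format="parquet",
    )
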
0
votes
1 answer
Concat / Join / Transform multiple columns to one struct column
I have a very big, legacy file with ~5000 columns and a very large number of records.
Many columns are named like a_1, a_2, ..., a_200 etc.
I want to concatenate a number of columns into a struct (for better data manipulation later), so instead of:
_| a_1 | a_2 |…

Aylard
- 33
- 5
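
A minimal sketch of the usual pattern: select the columns by prefix and fold them into one struct column (the prefix and names are taken from the excerpt):

    from pyspark.sql import functions as F

    a_cols = [c for c in df.columns if c.startswith("a_")]
    df2 = df.withColumn("a", F.struct(*[F.col(c) for c in a_cols])).drop(*a_cols)
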
0
votes
1 answer
How to run pySpark with snowflake JDBC connection driver in AWS glue
I am trying to run the below code in AWS glue:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from…

Gaurav Gangwar
- 467
- 3
- 11
- 24
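
The excerpt stops before the connection code, but the Snowflake Spark connector in Glue generally needs the snowflake-jdbc and spark-snowflake jars supplied via --extra-jars, plus a read along these lines (account, credentials, and table are hypothetical):

    SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"
    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",   # hypothetical
        "sfUser": "user",
        "sfPassword": "secret",
        "sfDatabase": "MYDB",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "MYWH",
    }

    df = (spark.read.format(SNOWFLAKE_SOURCE)
          .options(**sf_options)
          .option("dbtable", "MY_TABLE")               # hypothetical
          .load())
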
0
votes
1 answer
How to write the dataframe to S3 after filter
I am trying to write the data frame after filtering to S3 in CSV format, in script editing, with the Scala code below.
Current status:
It does not show any error after running, but it is just not writing to S3.
The log screen prints Start, however I cannot see print…

foy
- 387
- 4
- 15
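
When a Glue write "silently" produces nothing, the usual suspects are an empty result (job bookmarks can make re-runs read zero new records) or the write never being reached. A PySpark equivalent sketch of the filter-then-write, with a count to confirm there is something to write (predicate and path are hypothetical):

    filtered = df.filter(df["status"] == "active")     # hypothetical predicate
    print("rows to write:", filtered.count())          # confirm the filter is not empty

    (filtered.write
        .mode("overwrite")
        .option("header", True)
        .csv("s3://my-bucket/filtered/"))              # hypothetical path
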
0
votes
0 answers
Is it compulsory to convert a Glue dynamic frame to a Spark dataframe before writing to Snowflake?
Is it always necessary to convert a Glue dynamic frame to a Spark dataframe before writing to Snowflake? I didn't find any other way anywhere. This conversion for 20 million records takes most of the time; writing only takes 2 mins.
Has anyone done…
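
For what it's worth, the Spark-Snowflake connector is a Spark data source, so a DataFrame is what it writes; but dyf.toDF() is largely a lazy change of representation, so the minutes observed are usually the upstream scan and transforms executing at write time rather than the conversion itself. A sketch, reusing a hypothetical connector options dict named sf_options:

    sdf = dyf.toDF()  # largely lazy; schema resolution may do some work, but not a full 20M-row copy

    (sdf.write.format("net.snowflake.spark.snowflake")
        .options(**sf_options)        # hypothetical: sfURL, sfUser, sfPassword, ...
        .option("dbtable", "TARGET")  # hypothetical
        .mode("append")
        .save())
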