Highest Voted 'aws-glue-spark' Questions

0

votes

0 answers

SQL Server bcp tool on AWS Glue job

Has anyone tried the SQL Server bcp utility in an AWS Glue job (Python shell or Spark type)? The bcp tool needs to be installed using sudo yum commands, but those are not supported on Glue. sudo yum install mssql-tools unixODBC-devel reference link from…

asked Aug 04 '22 at 16:33

Sam

392
1
6
18

0

votes

0 answers

How to trigger a Glue job from another Glue job

Is it possible to trigger a Glue job (PySpark) from another Glue job (PySpark) using boto3? Everything seems to work fine (no syntax or code errors) except the boto3 method glue_client.start_job_run(). I tested similar code in Lambda and it's…

python amazon-web-services boto3 aws-glue aws-glue-spark

asked Aug 03 '22 at 11:41

Ravi Teja Surla

1
3

0

votes

1 answer

Reading Spark Dataframe from Partitioned Parquet data

I have Parquet data stored on S3 and an Athena table partitioned by id and date. The Parquet files are stored in s3://bucket_name/table_name/id=x/date=y/. The Parquet files themselves contain the partition columns (id, date), because of which I am not…

apache-spark pyspark parquet aws-glue aws-glue-spark

asked Jul 19 '22 at 23:13

AswinRajaram

1,519
7
18

0

votes

0 answers

How to catch an exception thrown from an imported module in PySpark

I want to catch an exception thrown from an imported module and re-raise it to fail the job with the same exception. For example: ------a.py---------- def check(a, b): try: # Check something except Exception as e: raise…

python pyspark aws-glue aws-glue-spark

asked Jul 15 '22 at 09:55

Tushar Patil

748
4
13

0

votes

1 answer

AWS Glue issue causing a PicklingError

I'm running into an issue with AWS Glue: when I apply a Map.apply function to a DataFrame to decrypt a given column value, it throws an error. The error I'm getting is PicklingError: Could not serialize object: TypeError: can't pickle…

python apache-spark pyspark aws-glue aws-glue-spark

asked Jul 14 '22 at 23:48

Ryan Shea

1
1

0

votes

1 answer

Glue/Spark: Filter a large dynamic frame with thousands of conditions

I am trying to filter a time-series Glue dynamic frame with millions of rows, containing data like:
id  val  ts
a   1.3  2022-05-03T14:18:00.000Z
a   9.2  2022-05-03T12:18:00.000Z
c   8.2  2022-05-03T13:48:00.000Z
I have another pandas dataframe with thousands…

apache-spark pyspark apache-spark-sql aws-glue aws-glue-spark

asked Jul 06 '22 at 21:34

Azeem Akhter

497
7
19

0

votes

2 answers

Is it possible to use Spark 3.3.0 in AWS Glue 3.0?

I would like to use Spark 3.3.0 features such as Trigger.availableNow in AWS Glue 3.0 with Scala, but AWS Glue 3.0 uses Apache Spark 3.1.1. Is there any way to use Apache Spark 3.3.0 in AWS Glue 3.0 with Scala?

apache-spark aws-glue aws-glue-spark aws-glue3.0

asked Jun 25 '22 at 13:07

krishna Prasad

3,541
1
34
44

0

votes

1 answer

How to calculate the number of G.1X workers in AWS Glue for processing 1 TB of data?

I have 1 TB of Parquet data on S3 to be loaded in AWS Glue Spark jobs. I am trying to figure out the number of workers needed for this requirement. As I understand it, below are the details of the G.1X configuration: 1 DPU added for MasterNode …

amazon-web-services apache-spark aws-glue aws-glue-spark

asked Jun 23 '22 at 05:34

RushHour

494
6
25

0

votes

0 answers

File conversion XML to JSON in S3 through AWS Glue

My bucket structure is as below, and XML files land in this S3 bucket folder: S3:/Fin-app-ops/data-ops/raw-d. I need to convert those XML files to JSON files and put them back in S3, in the same bucket but different…

python amazon-s3 aws-glue aws-glue-spark xml-to-json

asked Jun 17 '22 at 12:19

Sarath

35
3

0

votes

3 answers

AWS Glue job (PySpark) to AWS Glue Data Catalog

We know that the procedure for writing from a PySpark script (AWS Glue job) to the AWS Glue Data Catalog is to write to an S3 bucket (e.g. as CSV), then use a crawler and schedule it. Is there any other way of writing to the AWS Glue Data Catalog? I am looking for a direct way…

amazon-web-services aws-glue aws-glue-data-catalog aws-glue-spark

asked Jun 02 '22 at 13:04

Mehedee Hassan

133
1
9

0

votes

0 answers

PySpark: input_file_name() returns empty string when reading json.gz file

I am trying to get filenames (file format: json.gz) using the input_file_name() function in PySpark. Below is the code: df.withColumn("source_file", sql_f.element_at(sql_f.split(sql_f.input_file_name(), "/"), -1)) It returns an empty string. Below is the…

amazon-web-services pyspark apache-spark-sql aws-glue aws-glue-spark

asked Jun 02 '22 at 11:46

Nabeel Khan Ghauri

125
1
4
15

0

votes

1 answer

Writing each row in a Spark dataframe to a separate JSON file

I have a fairly large dataframe (million rows), and the requirement is to store each row in a separate JSON file. For this dataframe: root |-- uniqueID: string |-- moreData: array The output should be stored like below for all the…

scala apache-spark apache-spark-sql aws-glue-spark

asked May 30 '22 at 22:50

Thal

93
2
7

0

votes

1 answer

AWS Glue - IllegalArgumentException: Duplicate value for path

I have a messy data source where some field values can come in under two different names but should map to one conformed field name on the output. For example, the data source contains update_date or modified_date, and both should map to timestamp. Both field…

python aws-glue aws-glue-spark

asked May 14 '22 at 19:36

Alex R

11,364
15
100
180

0

votes

0 answers

Glue DynamicFrame: parse text file with ¶ delimiter

I have a text file which looks like below: HDR¶20200101 BDY¶1¶Jimmy BDY¶1¶Something TRL¶123 I would like to parse it into a Glue DynamicFrame, filtering out the header and trailer, and assign the column headers ID and Name. I tried the code below and it…

pyspark apache-spark-sql aws-glue aws-glue-spark

asked May 13 '22 at 21:51

need_the_buzz

423
2
9
18

0

votes

1 answer

AWS Glue image certificate-related issue

I am new to Docker. Please help in resolving the issue. I have created the Docker Compose file below: version: "2" services: spark: image: glue/spark:latest container_name: spark build: ./spark hostname: spark ports: -…

aws-glue-spark

asked May 12 '22 at 16:38

pbh

186
1
9
