
I'm new to AWS Glue and PySpark. Below is a code sample from the guide Managing Partitions for ETL Output in AWS Glue:

    glue_context.create_dynamic_frame.from_catalog(
        database="my_S3_data_set",
        table_name="catalog_data_table",
        push_down_predicate=my_partition_predicate)

Suppose the SQL query I want to use to filter the data is as below:

    select * from catalog_data_table
    where timestamp >= '2018-1-1'

How do I do this pre-filtering in AWS Glue?

— seven (edited by Vadim Kotov)
Comment: https://stackoverflow.com/questions/57925034/aws-push-down-predicate-not-working-when-reading-hive-partitions/70453286#70453286 – vaquar khan Dec 29 '21 at 16:16

1 Answer

Generally speaking, your data should be partitioned, and then you can reference those partition columns in the `push_down_predicate` expression. A push-down predicate can only filter on partition keys; it cannot filter on regular data columns such as a `timestamp` field inside the records, so the table must be partitioned by the values you want to filter on.

Please take a look at this answer.
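For intuition, here is a minimal sketch (plain Python, no AWS connection) of what Glue does with a push-down predicate: it evaluates the predicate string against each partition's key values and only lists and reads the S3 partitions that match. The partition keys `year` and `month` below are assumptions for illustration; substitute whatever partition keys your catalog table actually has.

    # Hypothetical partition layout for a table partitioned by year/month.
    partitions = [
        {"year": "2017", "month": "12"},
        {"year": "2018", "month": "01"},
        {"year": "2018", "month": "06"},
    ]

    # Rough equivalent of: push_down_predicate = "year >= '2018'"
    def matches(p):
        return p["year"] >= "2018"

    # Glue reads only the matching partitions; the rest are never scanned.
    selected = [p for p in partitions if matches(p)]
    # selected -> the two 2018 partitions

The real call would then look something like this (sketch, assuming the database and table names from the question):

    glue_context.create_dynamic_frame.from_catalog(
        database="my_S3_data_set",
        table_name="catalog_data_table",
        push_down_predicate="year >= '2018'")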

— Yuriy Bondaruk