I have a Glue job that reads from an S3 bucket, applies some transformations, and writes the result to another S3 bucket.
Here's what aws glue get-job-bookmark --job-name xx returns:
"JobBookmark": "{\"datasource0\":{\"jsonClass\":\"HadoopDataSourceJobBookmarkState\",\"timestamps\":{\"RUN\":\"4\",\"HIGH_BAND\":\"900000\",\"CURR_LATEST_PARTITION\":\"1618957000000\",\"CURR_LATEST_PARTITIONS\":\"s3://XXYY/2021/04/20/16/\",\"CURR_RUN_START_TIME\":\"2021-04-20T22:43:19.304Z\",\"INCLUDE_LIST\":\"\"}}}"
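The "JobBookmark" value is itself a JSON string embedded in the CLI's JSON output, so it is easier to read once decoded twice. Here is a small sketch (standard library only) that does that for the field shown above and converts the numeric timestamps. Reading CURR_LATEST_PARTITION as epoch milliseconds and HIGH_BAND as a duration in milliseconds is my assumption, not something the CLI output states.

```python
import json
from datetime import datetime, timedelta, timezone

# Only the "JobBookmark" field from the output above, wrapped in braces so it
# parses as a JSON object. The raw string keeps the \" escapes intact.
cli_output = r'''{"JobBookmark": "{\"datasource0\":{\"jsonClass\":\"HadoopDataSourceJobBookmarkState\",\"timestamps\":{\"RUN\":\"4\",\"HIGH_BAND\":\"900000\",\"CURR_LATEST_PARTITION\":\"1618957000000\",\"CURR_LATEST_PARTITIONS\":\"s3://XXYY/2021/04/20/16/\",\"CURR_RUN_START_TIME\":\"2021-04-20T22:43:19.304Z\",\"INCLUDE_LIST\":\"\"}}}"}'''


def decode_bookmark(cli_json: str) -> dict:
    """Decode the outer CLI JSON, then the JSON string stored in "JobBookmark"."""
    outer = json.loads(cli_json)
    return json.loads(outer["JobBookmark"])


state = decode_bookmark(cli_output)["datasource0"]["timestamps"]

# Assumption: CURR_LATEST_PARTITION is epoch milliseconds; convert it to UTC.
latest = datetime.fromtimestamp(int(state["CURR_LATEST_PARTITION"]) / 1000, tz=timezone.utc)
# Assumption: HIGH_BAND is a duration expressed in milliseconds.
high_band = timedelta(milliseconds=int(state["HIGH_BAND"]))

print(state["CURR_LATEST_PARTITIONS"])  # s3://XXYY/2021/04/20/16/
print(latest.isoformat())               # 2021-04-20T22:16:40+00:00
print(high_band)                        # 0:15:00
```

Decoded this way, CURR_LATEST_PARTITION is 22:16:40 UTC, shortly before CURR_RUN_START_TIME (22:43:19 UTC), and not hour 16, so it does not seem to be derived from the folder name itself.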
As you can see, my bucket is laid out as bucketname/yyyy/mm/dd/HH, and the bookmark above is set at the prefix 2021/04/20/16.
If another file is added under that exact same prefix, it is picked up and processed.
However, if there's a newer partition, say 2021/04/20/17, containing a file, the bookmarked job doesn't pick it up.
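To make the expectation concrete, here is a sketch that maps a yyyy/mm/dd/HH key to its hour and lists the objects in partitions later than the bookmarked prefix, i.e. the files I expect the next run to read. This encodes my expectation only, not how Glue actually decides what to process; partition_hour, newer_partition_keys and the object names are hypothetical.

```python
from datetime import datetime


def partition_hour(uri: str) -> datetime:
    """Map s3://bucket/yyyy/mm/dd/HH/... to the hour it belongs to.

    Assumes the four path segments after the bucket name are the
    zero-padded year, month, day and hour (the layout described above).
    """
    path = uri.split("://", 1)[1]         # drop the "s3://" scheme
    segments = path.split("/")            # [bucket, yyyy, mm, dd, HH, ...]
    year, month, day, hour = (int(s) for s in segments[1:5])
    return datetime(year, month, day, hour)


def newer_partition_keys(bookmarked_prefix: str, keys: list[str]) -> list[str]:
    """Return the keys whose hour partition is later than the bookmarked one."""
    cutoff = partition_hour(bookmarked_prefix)
    return [k for k in keys if partition_hour(k) > cutoff]


bookmarked = "s3://XXYY/2021/04/20/16/"   # CURR_LATEST_PARTITIONS from the bookmark
keys = [
    "s3://XXYY/2021/04/20/16/part-0001.json",  # hypothetical file, same prefix
    "s3://XXYY/2021/04/20/17/part-0002.json",  # hypothetical file, newer hour
]
print(newer_partition_keys(bookmarked, keys))
```

Because every segment is zero-padded, comparing the parsed hours gives the same order as comparing the prefixes as strings; parsing just makes the comparison explicit.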
My script is very straightforward; most of it is auto-generated, since I am only testing this feature.
The table's location is set to s3://xxyy, i.e. the very top level of the bucket.
Thanks for reading.