Oracle Autonomous Database provides the DBMS_CLOUD package for accessing object store files.
DBMS_CLOUD package has a function LIST_OBJECTS, which provides a list of files in an object store location (bucket or subfolder). This function provides the object name , size in bytes and checksum, which can be used to compare files in the 2 buckets with the help of Oracle SQL FULL OUTER JOIN and WITH Clause.
SQL Script:
define bucket1_uri='https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder1/'
define bucket2_uri='https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder2/'
define credname1='OCI$RESOURCE_PRINCIPAL'
define credname2='OCI$RESOURCE_PRINCIPAL'
set linesize 120
set pages 1000
col object_name_A format a30
col object_name_B format a30
col object_checksum_A format a10
col object_checksum_B format a10
-- Use SQL FULL OUTER JOIN and compare the checksum exclude matching rows
WITH bucket1 AS
(SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname1', '&bucket1_uri')
ORDER BY object_name, bytes, checksum),
bucket2 AS
(SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname2', '&bucket2_uri')
ORDER BY object_name, bytes, checksum)
SELECT rownum id, diff.* FROM
(SELECT a.object_name object_name_A, a.bytes object_size_A, a.checksum object_checksum_A,
b.object_name object_name_B, b.bytes object_size_B, b.checksum object_checksum_B
FROM bucket1 a FULL OUTER JOIN bucket2 b
ON a.object_name = b.object_name) diff
WHERE diff.object_name_A IS NULL OR
diff.object_name_B IS NULL OR
(diff.object_name_A = diff.object_name_B AND
diff.object_checksum_A != diff.object_checksum_B)
/
Explanation:
In the above script:
- The assumption is to provide 2 Object store URLs and a credential name. One can use Resource principal for OCI Object store in Autonomous Database to avoid specifying passwords.
- Using Oracle SQL WITH Clause, there are 2 inline views to provide the object_name, bytes and checksum columns from DBMS_CLOUD.LIST_OBJECTS in the two buckets.
- The script uses file checksum to compare files in the two buckets, which is provided by cloud object store providers as a hash of the file content.
- Using SQL Full Outer Join, we can get objects existing in bucket1 and not in bucket2, or vice-versa, as well as objects that have a same name in both buckets but different file checksum.
- Finally, the SQL above does not show objects that have a same name and same size in both buckets.
Use Cases:
The DBMS_CLOUD package works for multiple object stores such as Oracle OCI, AWS S3, Azure BLOB, Google Cloud, Wasabi, so the above script can be used to compare buckets in multiple scenarios such as -
- Compare buckets in same region in OCI
- Compare buckets in different regions in OCI
- Compare buckets in different accounts in OCI object store
- Compare buckets in different cloud providers - OCI bucket vs S3 bucket
References: