Thank you for reading!

We are running Presto along with Minio using docker-compose:

  minio:
    image: minio/minio:RELEASE.2021-08-31T05-46-54Z
    container_name: minio
    ports:
      - 9094:9000
      - 9001:9001
    environment:
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
      MINIO_REGION: us-east-1
    command: server /data --console-address ":9001"
    mem_limit: 512m

  presto:
    image: ahanaio/prestodb-sandbox:0.250
    container_name: presto
    ports:
      - 9095:8080
    volumes: 
      - ./presto/minio.properties:/opt/presto-server/etc/catalog/minio.properties
      - ./presto/create-table.sql:/opt/create-table.sql
      - ./presto/sync-partitions.sql:/opt/sync-partitions.sql
    mem_limit: 1024m

We upload a batch of data to Minio, then run the following Presto SQL, which we expect to create the json subdirectory:

use minio.default;

CREATE TABLE orders ( 
  "name" array(varchar),
  "year" int, 
  "month" tinyint, 
  "day" tinyint) 
WITH (
  format='JSON', 
  external_location='s3a://orders-local/json/',
  partitioned_by=ARRAY['year', 'month', 'day']
  );

After executing the SQL statement, developers on our team see inconsistent results in the orders-local bucket: in some cases the json directory is present, and in others it is missing.

We are all running the same docker containers. So far, we have only seen the issue on some macOS machines, but not all of them. We have also ruled out the OS version, since we see both successes and failures on Big Sur as well as Catalina.

Does anyone have experience with this kind of inconsistency?

Simon Tower
  • Do you want to see if Trino has the same issue for you? I never had this issue running with Trino. https://github.com/bitsondatadev/trino-getting-started/tree/main/hive/trino-minio – Brian Olsen Sep 03 '21 at 01:01
  • If you're not familiar with the project, it is the project driven by the creators and majority code contributors of Presto and is, for all intents and purposes, an upgrade. https://trino.io/blog/2020/12/27/announcing-trino.html – Brian Olsen Sep 03 '21 at 01:02
  • We have a workaround for this. SSH into the hadoop container and run `hdfs dfs -mkdir s3a://orders-local/json/` followed by `hdfs dfs -touchz s3a://orders-local/json/test.txt`. This manually creates the subdirectory and places a dummy file in it, which guarantees that Presto sees the subdirectory. – Simon Tower Sep 03 '21 at 20:27
  • Uff sorry you have to run that hack just to get this working. Seems like you may be stuck on Presto for now? Let me know if you want to make the hop over to Trino. – Brian Olsen Sep 03 '21 at 21:22

1 Answer

Our environment variables are being overridden by values in ~/.aws.

We have not worked out why the env vars are not being honored, but we know this line is the culprit:

config.WithDefaultRegion(os.Getenv("STORAGE_REGION"))

We get the right region from the os.Getenv() call, but config still pulls the value from our $HOME/.aws/config file.

So it seems that a region set in a .aws/config file takes precedence over the value we pass in from the environment variable.
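
One possible explanation, offered as an assumption rather than something we have verified: in the AWS SDK for Go v2, config.WithDefaultRegion appears to supply only a fallback that is used when no other source provides a region, while config.WithRegion sets the region explicitly. If that is right, STORAGE_REGION loses to ~/.aws/config because we pass it in as a fallback. The sketch below does not call the SDK. It is a stdlib-only model of that assumed precedence order, and regionSources, resolveRegion, and the eu-west-1 value are hypothetical names and data used only for illustration.

```go
package main

import "fmt"

// regionSources is a hypothetical model of where a region value can come from.
// The precedence order used below is an assumption about the SDK, not its code.
type regionSources struct {
	Explicit     string // e.g. a value passed via config.WithRegion
	Env          string // e.g. the AWS_REGION environment variable
	SharedConfig string // e.g. the region entry in $HOME/.aws/config
	Default      string // e.g. a value passed via config.WithDefaultRegion
}

// resolveRegion returns the first non-empty value, highest precedence first.
func resolveRegion(s regionSources) string {
	for _, v := range []string{s.Explicit, s.Env, s.SharedConfig, s.Default} {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// STORAGE_REGION passed as a fallback: the shared config file wins.
	asDefault := regionSources{SharedConfig: "eu-west-1", Default: "us-east-1"}
	fmt.Println(resolveRegion(asDefault)) // prints "eu-west-1"

	// STORAGE_REGION passed explicitly: it wins over the shared config file.
	asExplicit := regionSources{Explicit: "us-east-1", SharedConfig: "eu-west-1"}
	fmt.Println(resolveRegion(asExplicit)) // prints "us-east-1"
}
```

If the model holds, swapping config.WithDefaultRegion for config.WithRegion on the culprit line should make the value from os.Getenv("STORAGE_REGION") win, but that is worth confirming against the SDK documentation before relying on it.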

Simon Tower