What is the proper way to connect DVC to Min.IO that is connected to some buckets on S3?

AWS-S3(My_Bucket) > Min.io(MY_Bucket aliased as S3)

Right now I am accessing my bucket using mc, for example `mc cp s3/my_bucket/datasets datasets`, to copy data from there. But I need to set up my DVC to work with min.io as a hub between AWS.S3 and DVC, so that I can use, for example, "DVC mc-S3 pull" and "DVC AWS-S3 pull".

How do I go about it? While googling I couldn't find anything that I could easily follow.

  • Why do you need to access Minio? To download data? To store data? To use as the DVC project cache? Please look at **S3-compatible storage** in https://dvc.org/doc/command-reference/remote/add#supported-storage-types for now. – Jorge Orpinel Pérez Jul 28 '21 at 19:45
  • I need to do everything that is done on S3 and that is possible on min.io, treating min.io as a hub. – Niewasz Biznes Jul 29 '21 at 09:32

1 Answer

It looks like you are looking for a combination of things.

First, as Jorge mentioned, you can set `endpointurl` to access Minio the same way as you would access regular S3:

dvc remote add -d minio-remote s3://mybucket/path
dvc remote modify minio-remote endpointurl https://minio.example.com
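
If the Minio server requires its own credentials, they can be attached to the same remote. This is a minimal sketch, assuming the remote name `minio-remote` from above. The key values are placeholders, not real settings, and `--local` is used so they land in `.dvc/config.local` rather than in the Git-tracked config:

```shell
# Store Minio credentials in .dvc/config.local (not committed to Git).
# Both values below are placeholders; substitute the keys of your Minio server.
dvc remote modify --local minio-remote access_key_id 'my-access-key'
dvc remote modify --local minio-remote secret_access_key 'my-secret-key'
```

Without these settings, DVC falls back to the usual AWS credential sources (AWS CLI config files, S3 environment variables).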

Second, it seems you can create two remotes, one for S3 and one for Minio, and use the `-r` option, which is available for many data-management commands:

dvc pull -r minio-remote
dvc pull -r s3-remote
dvc push -r minio-remote
...

This way you could push/pull data to/from a specific storage.
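
The commands above assume that a second remote called `s3-remote` already exists. A sketch of how it might be defined, where the bucket and path are hypothetical and should be replaced with your own:

```shell
# Add a plain AWS S3 remote. No endpointurl is set, so DVC talks to AWS directly.
# The bucket name and path are hypothetical placeholders.
dvc remote add s3-remote s3://mybucket/path

# List the configured remotes to confirm that both are present.
dvc remote list
```

Only the remote added with `-d` is the default, so any command without `-r` would still go to `minio-remote`.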

> But I need to setup my DVC to work with min.io as a hub between AWS.S3 and DVC

I think there are other possible ways to organize this. It really depends on what semantics you expect from DVC mc-S3 pull. Please let us know if `-r` is not enough and clarify the question; that would help us here.

Shcheklein
  • `dvc pull -r X/Y` etc. is more than enough, but I still don't understand one thing. If I have a server, I can add min.io as you mentioned, and that's great. But to access my S3 bucket over min.io, I had to install the client for min.io and set it up to add s3 storage, and then I can use it with a command like `mc cp S3/my_bucket/dataset /dataset`. I am not sure how I can do that over DVC. – Niewasz Biznes Jul 29 '21 at 09:35
  • `access my S3 bucket over min.io ... add s3 storage` - could you please clarify this a bit? Send me some links to the documentation; maybe I'm not that familiar with all the things that Minio can do. – Shcheklein Jul 29 '21 at 21:05
  • The links will be a problem, because I cannot find them; that is why I created this topic. I have one server hosted on min.io and one server hosted on S3. On another PC I got a min.io client that has access to S3 and to my min.io server. Thanks to it I can, for example, run `mc cp s3-server/dataset dataset` to copy the dataset from S3, and I can do the same for my min.io server with `mc cp min-io-server/dataset dataset`. Since I have no other access to the min.io server and the S3 server, I thought it would be possible to use the mc client as a hub for DVC. Maybe now it's clearer? – Niewasz Biznes Aug 02 '21 at 13:44
  • Thanks! I think I have a better idea now, but I'm still lost on this `as a hub for DVC` :( What does hub mean? Could you give some examples of the DVC commands that should/would work for you? It feels like what you are trying to do is make DVC use `mc` or its configuration in `pull` and `push`. DVC relies on the common AWS CLI configs only and supports the regular S3 env vars, etc. – Shcheklein Aug 03 '21 at 02:40
  • Yeah, that was my mistake: I tried to make DVC use mc, and for some reason I thought that should be possible, but I guess it's not. I still have a problem with `dvc pull -r X/Y`, though. I added two storages to my DVC: X (S3 storage) and Y (min.io storage). I have two different folders uploaded there, X_folder and Y_folder, inside a folder dataset. When I run `dvc pull -r X` I get the folder dataset/X_folder, and when I run `dvc pull -r Y` I expect to get dataset/Y_folder, but I am getting X_folder again. Any chance you could tell me how to get that? – Niewasz Biznes Aug 03 '21 at 13:24
  • Re `got an min.io client that has access to S3` - Maybe related to https://docs.min.io/docs/minio-gateway-for-s3.html ? – Jorge Orpinel Pérez Aug 03 '21 at 23:59
  • Re `but i am getting X_folder again` - @NiewaszBiznes please open a separate follow-up question with all the details to reproduce the issue, or reach out in https://dvc.org/chat – Jorge Orpinel Pérez Aug 04 '21 at 00:04