You're misunderstanding the API. When you add a transform to your dataset, it is essentially a function which is applied to every sample from that dataset, and its result is what gets returned. transforms.Compose applies its sub-transforms sequentially, rather than returning multiple results (one for each combination of transforms being applied or not). So
transforms.Compose([
    transforms.RandomRotation(degrees=(90, 90)),
    transforms.RandomRotation(degrees=(180, 180)),
])
will just rotate your image once by a random angle between 90 and 90 degrees (in other words, by exactly 90 degrees) and then again by 180 degrees. This is equivalent to a single RandomRotation(degrees=(270, 270)) (it is actually worse, because every extra rotation can lose image data, for example to cropping or interpolation).
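To make the sequential behaviour concrete, here is a minimal pure-Python sketch. SequentialCompose and rotate_by are hypothetical stand-ins, not torchvision classes, and the "image" is reduced to its orientation in degrees so that the arithmetic is visible.

```python
class SequentialCompose:
    """Toy stand-in for transforms.Compose: each output feeds the next transform."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for transform in self.transforms:
            sample = transform(sample)
        return sample  # exactly one output per input


def rotate_by(angle):
    """Hypothetical fixed rotation; the 'image' is just its orientation in degrees."""
    return lambda orientation: (orientation + angle) % 360


pipeline = SequentialCompose([rotate_by(90), rotate_by(180)])
print(pipeline(0))  # 270
```

One sample goes in and one sample comes out, rotated by 270 degrees in total; at no point are separate 90 and 180 degree copies produced.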
So, most transforms are like the one above: "linear", with one input and one output. There are some "forking" transforms which produce more outputs than inputs. An example is FiveCrop. Please pay attention to the note in its documentation on how to deal with that. Even with "forking" transforms, you will still get the same number of items in your dataset; it's just that your batches will be bigger.
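A toy illustration of what "same number of items, bigger batches" means. Everything here is made up for the example (fork, toy_score, the string "images"); with the real FiveCrop, the note in its documentation does the analogous thing by reshaping tensors and averaging the predictions over the crops.

```python
NUM_VIEWS = 5


def fork(sample):
    """Hypothetical forking transform: one input, a list of NUM_VIEWS outputs."""
    return [f"{sample}-view{i}" for i in range(NUM_VIEWS)]


def toy_score(view):
    """Placeholder for a model's prediction on a single view."""
    return float(view[-1])  # just the view index, for illustration


samples = ["img0", "img1", "img2", "img3"]
dataset = [fork(s) for s in samples]
print(len(dataset))  # 4: same number of items as before

batch = dataset[:2]                               # batch size 2
flat = [view for item in batch for view in item]  # 2 * 5 = 10 model inputs
print(len(flat))  # 10

# Fold the per-view scores back into one score per original sample.
scores = [toy_score(v) for v in flat]
per_sample = [
    sum(scores[i * NUM_VIEWS:(i + 1) * NUM_VIEWS]) / NUM_VIEWS
    for i in range(len(batch))
]
print(per_sample)  # [2.0, 2.0]
```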
If you specifically want a dataset which contains 4 differently rotated copies of each item and yields them randomly (i.e. possibly with each rotated variant arriving in a different batch), you will have to write some custom data loading logic. For that, you may want to base your work on the source of DatasetFolder.
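As a rough sketch of such custom logic, here is a wrapper around an existing dataset rather than a modified DatasetFolder. FourRotationsDataset and its rotate_fn parameter are names invented for this example; it assumes the base dataset yields (image, label) pairs and that rotate_fn takes (image, angle).

```python
class FourRotationsDataset:
    """Map-style wrapper exposing each base item at 0, 90, 180 and 270 degrees."""

    ANGLES = (0, 90, 180, 270)

    def __init__(self, base, rotate_fn):
        self.base = base            # anything with __len__ and __getitem__ -> (image, label)
        self.rotate_fn = rotate_fn  # assumed signature: rotate_fn(image, angle)

    def __len__(self):
        return len(self.base) * len(self.ANGLES)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Consecutive indices walk through the four angles of one base item.
        base_index, angle_index = divmod(index, len(self.ANGLES))
        image, label = self.base[base_index]
        return self.rotate_fn(image, self.ANGLES[angle_index]), label
```

With torchvision, rotate_fn could be torchvision.transforms.functional.rotate, and a DataLoader over the wrapper with shuffle=True should then scatter the four variants of an image across different batches.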
Why is the API made the way it is? In practice, most people are fine with the transforms as they currently are. In your place, they would just write a transform which randomly rotates by 0, 90, 180 or 270 degrees and then train their network for 4 times as many epochs as you would, which on average shows the network each rotated variant as often as your 4-fold dataset would.
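Such a transform could look roughly like this. RandomRightAngleRotation is a hypothetical name, and the rotation itself is delegated to a rotate_fn(image, angle) callable that you supply (torchvision.transforms.functional.rotate has that shape), so treat it as a sketch rather than a drop-in class.

```python
import random


class RandomRightAngleRotation:
    """Rotates each sample by one angle picked uniformly from `angles`."""

    def __init__(self, rotate_fn, angles=(0, 90, 180, 270), rng=None):
        self.rotate_fn = rotate_fn        # assumed signature: rotate_fn(image, angle)
        self.angles = angles
        self.rng = rng or random.Random()  # pass a seeded Random for reproducibility

    def __call__(self, image):
        angle = self.rng.choice(self.angles)
        return self.rotate_fn(image, angle)
```

Since it is just a callable taking and returning one image, it can be placed inside transforms.Compose like any other "linear" transform.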