0

I am a PyTorch newbie and am very interested in using the DCGAN architecture to feed in .npy files and hopefully generate new .npy files. The shape of the numpy files is (128, 7752).

Could anyone help me dissect the DCGAN architecture to try and solve this problem?

At this point it is very theoretical. I know it is possible, I just don't know enough to alter the DCGAN architecture.

1 Answer

0

First of all, what is inside your .npy files?

Although DCGAN is a powerful architecture, it is especially helpful for image data. Of course, you could consider any 2D data as images, but the results may be disappointing. In your case, even if the .npy files are images, they are likely too big to be generated with a vanilla DCGAN. You will run out of memory or simply get extremely poor results.

GANs are known to be quite challenging to train when it comes to large data and great diversity; here is an article that reviews the main difficulties of training GANs.

Nevertheless, if you are willing to try anyway, here are the main steps that you should follow:

  1. Crop your data into smaller pieces if it is possible. In any case, for an easier design of the models, the spatial dimension of your data should ideally be a power of 2.
  2. Build a generator that outputs images of the right size from a random patch of noise, e.g. if your data is 128x4096, you could build a generator with enough deconvolution layers to have a total stride of 128. Then you should sample your noise with dimension 1x32xC, where C is the number of channels.
  3. Build a discriminator as a CNN as well; it need not have the same stride. The discriminator will produce a classification map that says whether each patch in the input image is real or fake.
  4. Implement the loss and the training loop.
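The steps above could be sketched roughly as follows. This is a hypothetical illustration, not a tested recipe: the layer widths, kernel sizes, and channel counts are all assumptions, chosen so that seven stride-2 deconvolutions give the total stride of 128 mentioned in step 2, and so that the discriminator outputs a patch-wise real/fake map as in step 3.

```python
# Hypothetical DCGAN-style sketch for 128x4096 inputs (all widths are
# illustrative assumptions). The generator upsamples noise of shape
# (N, C, 1, 32) by a factor of 128 in each spatial dimension via seven
# stride-2 ConvTranspose2d layers (2**7 = 128). The discriminator is a
# PatchGAN-style CNN that returns one real/fake logit per spatial patch.
import torch
import torch.nn as nn

NOISE_CHANNELS = 512  # C in the answer; an arbitrary choice, tune as needed

class Generator(nn.Module):
    def __init__(self, noise_channels=NOISE_CHANNELS):
        super().__init__()
        layers, in_ch = [], noise_channels
        for out_ch in (256, 128, 64, 32, 16, 8):  # 6 stride-2 deconvs...
            layers += [
                nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ]
            in_ch = out_ch
        # ...plus a final one: 7 layers total, overall stride 2**7 = 128
        layers += [nn.ConvTranspose2d(in_ch, 1, 4, stride=2, padding=1),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, z):   # z: (N, C, 1, 32)
        return self.net(z)  # -> (N, 1, 128, 4096)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in (16, 32, 64, 128):  # 4 stride-2 convs -> stride 16
            layers += [
                nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            in_ch = out_ch
        # 1-channel output: a classification map, one logit per patch
        layers.append(nn.Conv2d(in_ch, 1, 3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):   # x: (N, 1, 128, 4096)
        return self.net(x)  # -> (N, 1, 8, 256)

z = torch.randn(2, NOISE_CHANNELS, 1, 32)
fake = Generator()(z)
print(fake.shape)                   # torch.Size([2, 1, 128, 4096])
print(Discriminator()(fake).shape)  # torch.Size([2, 1, 8, 256])
```

With kernel 4, stride 2, padding 1, each ConvTranspose2d exactly doubles the spatial size and each Conv2d exactly halves it, which is what makes the shape bookkeeping work out; the patch-wise discriminator output would then be trained against all-real/all-fake target maps in the loss of step 4.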

I hope this helps.

pierlj
  • This is so helpful, thank you. It is not image data that I am using, but mel-spectrogram data generated from audio files. Do you think there is a better generative method to use? – Mark Stent Jan 19 '23 at 12:32
  • Unfortunately, I have never worked with spectrograms, so this is just a wild guess. The spectrograms are sequences of 128-d vectors that represent a fixed-length audio signal (probably a fraction of a second). Therefore, you could simply choose to generate smaller audio signals (i.e. narrower spectrograms); for instance, you could start with 128x512 spectrograms. Although, if you really need to generate longer audio signals with coherence, DCGAN will probably not be appropriate. – pierlj Jan 20 '23 at 09:16
  • For longer signal generation, I guess you could use transformer-based approaches, as they are particularly well suited for sequence generation. Also check out recent approaches such as VALL-E; you might find useful design tricks in them. – pierlj Jan 20 '23 at 09:16
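The cropping suggested in step 1 of the answer and in the comments (narrower, power-of-2-width spectrograms) could be sketched like this. The file name is a placeholder and the window width of 512 is just the value suggested in the comments:

```python
# Hypothetical preprocessing sketch: slice a (128, 7752) mel-spectrogram
# into non-overlapping (128, 512) windows. 512 is a power of 2, as
# recommended in step 1 of the answer; leftover frames are dropped.
import numpy as np

def crop_spectrogram(spec, width=512):
    """Split a (n_mels, n_frames) array into (n_mels, width) patches."""
    n_patches = spec.shape[1] // width  # 7752 // 512 = 15 full windows
    return [spec[:, i * width:(i + 1) * width] for i in range(n_patches)]

# Stand-in for np.load("mel.npy") — random data with the question's shape
spec = np.random.rand(128, 7752).astype(np.float32)
patches = crop_spectrogram(spec)
print(len(patches), patches[0].shape)  # 15 (128, 512)
```

Each patch could then be treated as one training sample, which also sidesteps the problem that 7752 is not a power of 2.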