
I am writing a calibration pipeline to learn the hyperparameters of neural networks that detect properties of DNA sequences*. This therefore requires training a large number of models on the same dataset with different hyperparameters.

I am trying to optimise this to run on a GPU. DNA sequence datasets are quite small compared to image datasets (typically tens or hundreds of base pairs in 4 'channels' representing the four DNA bases A, C, G and T, compared to tens of thousands of pixels in 3 RGB channels), so a single model cannot make full use of the parallelisation available on a GPU unless multiple models are trained at the same time.

Is there a way to do this in nolearn, lasagne or, at worst, Theano?

* It's based on the DeepBind model for detecting where transcription factors bind to DNA, if you're interested.
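
For concreteness, here is a minimal sketch of the 4-channel representation described above, encoded into the `(batch, channels, length)` layout that Lasagne's 1-D convolution layers expect. The base ordering and helper name are illustrative assumptions, not taken from the DeepBind code:

```python
# Illustrative one-hot encoding of DNA sequences into a (n_sequences, 4, length)
# float32 array -- the layout expected by lasagne.layers.Conv1DLayer.
import numpy as np

BASE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot_encode(sequences):
    """Encode equal-length DNA strings as a (n_sequences, 4, length) array."""
    length = len(sequences[0])
    encoded = np.zeros((len(sequences), 4, length), dtype=np.float32)
    for i, seq in enumerate(sequences):
        for j, base in enumerate(seq.upper()):
            encoded[i, BASE_INDEX[base], j] = 1.0
    return encoded

X = one_hot_encode(['ACGTACGTAC', 'TTGACCATGA'])
print(X.shape)  # (2, 4, 10)
```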

Andrew Steele
  • Hi @Statto, very good question. Unfortunately I can't help from my own experience, but I hope someone can. The only advice I can offer is that you could try adding new representations to your feature space (for instance a 3-mer composition), which would enrich the description of the system. Do you have access to the DeepBind model code? I would love to take a look. – tbrittoborges Feb 02 '16 at 19:31
  • @Statto, it's possible with Caffe – ypx Feb 03 '16 at 16:22
  • @ypx How would you do it with Caffe? Any links or advice appreciated! – Andrew Steele Feb 16 '16 at 12:16
  • @Statto, you merge the network definitions into the same trainval.prototxt and give the layers and blobs unique names. Demonstrated in [this](https://github.com/kashefy/caffe_sandbox/blob/master/examples/parallel_train/verify_parallel_training.ipynb) IPython notebook example. – ypx Mar 04 '16 at 17:31
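
For reference, a rough Lasagne/Theano analogue of the merged-network approach ypx describes: build several small, independent networks in one Theano graph (each with its own parameters), combine their losses and update rules, and compile a single training function so that one call steps every model on the same mini-batch. This is only a sketch under assumed hyperparameters and a simplified DeepBind-style architecture; Theano may still serialise the kernels on a single GPU stream, so the main gain is amortising per-call and transfer overhead rather than guaranteed concurrent execution.

```python
from collections import OrderedDict

import numpy as np
import theano
import theano.tensor as T
import lasagne

SEQ_LEN = 100          # base pairs per sequence (assumed)
HYPERPARAMS = [        # one entry per model in the calibration sweep (illustrative)
    {'num_filters': 16, 'filter_size': 11, 'learning_rate': 1e-3},
    {'num_filters': 32, 'filter_size': 15, 'learning_rate': 1e-4},
]

X = T.tensor3('X')     # shared input: (batch, 4 channels, SEQ_LEN)
y = T.matrix('y')      # shared binary targets: (batch, 1)

losses, updates = [], OrderedDict()
for hp in HYPERPARAMS:
    # Each model gets its own layer objects, and therefore its own parameters.
    net = lasagne.layers.InputLayer((None, 4, SEQ_LEN), input_var=X)
    net = lasagne.layers.Conv1DLayer(net, num_filters=hp['num_filters'],
                                     filter_size=hp['filter_size'],
                                     nonlinearity=lasagne.nonlinearities.rectify)
    net = lasagne.layers.GlobalPoolLayer(net, pool_function=T.max)
    net = lasagne.layers.DenseLayer(net, num_units=1,
                                    nonlinearity=lasagne.nonlinearities.sigmoid)

    prediction = lasagne.layers.get_output(net)
    loss = lasagne.objectives.binary_crossentropy(prediction, y).mean()
    params = lasagne.layers.get_all_params(net, trainable=True)
    # The parameter sets are disjoint, so the update dictionaries can be merged.
    updates.update(lasagne.updates.adam(loss, params,
                                        learning_rate=hp['learning_rate']))
    losses.append(loss)

# One compiled function trains all models at once and returns each model's loss.
train_all = theano.function([X, y], losses, updates=updates)

X_batch = np.random.rand(64, 4, SEQ_LEN).astype(theano.config.floatX)
y_batch = np.random.randint(0, 2, size=(64, 1)).astype(theano.config.floatX)
per_model_losses = train_all(X_batch, y_batch)
```

Giving each model its own layer objects plays the same role here as the unique layer and blob names in the merged Caffe prototxt.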

0 Answers