How do you know which layers and how many layers to use?

Question

I am new to machine learning and have spent some time learning Python. I have started to learn TensorFlow and Keras for machine learning, and I literally have no clue about, nor any understanding of, the process of building a model. How do you know which models to use? Which activation functions to use? How many layers, and what dimensions for the output space?

I've noticed that most models are of the Sequential type and tend to have 3 layers; why is that? I couldn't find any resources that explain which to use, why we use them, and when. The best I could find was TensorFlow's function documentation. Any elaboration or resources to clarify this would be greatly appreciated.

Thanks.

score 1 · Answer 1 · answered Mar 05 '19 at 20:37

I'd suggest you continue reading more about machine learning. The series linked below is a multi-part explanation. Disclaimer: I don't know the author, and this is not my own work.

https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12

Also, I suggest a simple thought experiment: take a binary classification problem and consider how the different shapes of the activation functions may affect your results.
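To make the activation-function thought experiment concrete, here is a small sketch that evaluates three common activation functions (sigmoid, tanh, ReLU) written out by hand in NumPy. NumPy is my own choice here, not something the answer prescribes; in Keras you would normally just pass the name as a string to a layer's `activation` argument.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1), so the output can be read as a
    # probability; a common choice for a binary classifier's output unit.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes into (-1, 1) and is centred on zero.
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive ones; unbounded above.
    return np.maximum(0.0, z)

z = np.linspace(-5.0, 5.0, 11)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    out = f(z)
    print(f"{name:>7}: min={out.min():.3f}  max={out.max():.3f}")

# Thought experiment for binary labels: sigmoid outputs can be thresholded
# at 0.5 as probabilities, while ReLU outputs can exceed 1 and are not
# directly interpretable as probabilities.
logits = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", np.round(sigmoid(logits), 3))
print("relu:   ", relu(logits))
```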

Regarding your model choice: This is highly dependent on your data and what you wish to explore. If I were you, I'd try to visualize your data first to see if there are any interesting relationships.

For example, a seaborn pairplot (https://seaborn.pydata.org/generated/seaborn.pairplot.html) is one way to visualize relationships between pairs of variables. If you have a lot of data points, I'd suggest plotting only a random sample of at most a few hundred, since otherwise this plot can take a long time to draw. You can also try Datashader, but I haven't used it personally.
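A minimal sketch of the sample-then-pairplot idea, assuming your data is in a pandas DataFrame. The helper names `sample_for_pairplot` and `plot_pairs` are hypothetical, and `max_points=300` is just one arbitrary value in the "few hundred" range. seaborn is imported inside the plotting function so the sampling helper works even where seaborn isn't installed.

```python
import pandas as pd

def sample_for_pairplot(df: pd.DataFrame, max_points: int = 300, seed: int = 0) -> pd.DataFrame:
    """Return at most `max_points` random rows (all rows if the frame is smaller)."""
    n = min(max_points, len(df))  # DataFrame.sample raises if n > len(df) without replacement
    return df.sample(n=n, random_state=seed)

def plot_pairs(df: pd.DataFrame, hue=None, max_points: int = 300):
    """Draw a seaborn pairplot of a random sample of `df`."""
    import seaborn as sns  # imported lazily; only needed for the actual plot
    sample = sample_for_pairplot(df, max_points=max_points)
    # One scatter plot per pair of numeric columns, with each column's
    # distribution on the diagonal; `hue` colours points by a column.
    return sns.pairplot(sample, hue=hue)
```

With a DataFrame `df` that has, say, a hypothetical `"label"` column, `plot_pairs(df, hue="label").savefig("pairplot.png")` would write the figure to disk.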

Once you have visualized your data, think about what the relationships between the variables might actually mean. Doing all of this before applying machine learning models will guide you later as you try to implement some of the models in the post above.

Also, a deep learning algorithm is sometimes not the best approach. Often, depending on whether you are doing a classification or a regression problem, a simple linear model will suffice. For a regression problem, I often start with (multiple) linear regression as my baseline model and then improve upon it with regularization before I try fancy deep neural networks.
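Here is a rough sketch of that baseline-then-regularize workflow in plain NumPy, so it doesn't depend on any particular ML library (scikit-learn's `LinearRegression` and `Ridge` would be the usual tools). The synthetic data, the train/validation split, and the `alpha` values are all arbitrary assumptions for illustration; regularization here means an L2 (ridge) penalty.

```python
import numpy as np

def add_bias(X):
    # Prepend a column of ones so the model learns an intercept.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_linear(X, y, alpha=0.0):
    """Least-squares fit; alpha > 0 adds an L2 (ridge) penalty on the slopes."""
    Xb = add_bias(X)
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not penalise the intercept
    # Solve the regularized normal equations (X^T X + P) w = X^T y.
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(w, X):
    return add_bias(X) @ w

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data: 20 features, only 3 of which matter, and few samples.
rng = np.random.default_rng(0)
n, p = 60, 20
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [3.0, -2.0, 1.5]
y = X @ true_w + rng.normal(scale=1.0, size=n)

# Hold out the last 20 rows for validation.
X_tr, X_va, y_tr, y_va = X[:40], X[40:], y[:40], y[40:]

for alpha in [0.0, 1.0, 10.0]:  # alpha = 0.0 is the plain linear baseline
    w = fit_linear(X_tr, y_tr, alpha=alpha)
    print(f"alpha={alpha:>5}: validation MSE = {mse(y_va, predict(w, X_va)):.3f}")
```

Whichever `alpha` gives the lowest validation MSE becomes the number a neural network would have to beat.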

Deep neural networks are slower to train than linear models, can easily overfit your data, and can give the same (or even worse!) results as simpler linear regression. Before applying deep learning to a multitude of problems from the get-go, ask yourself whether you are being a hammer in search of a nail.

Hope this helps.

TLDR:

Visualize your data and figure out if you want to do regression or classification
Start with simple linear models as a baseline and compute a performance metric (e.g., MSE)
Improve (hopefully) with neural networks and see whether the additional gain is worth it in your case. At some point, you may have to experiment with different activation functions to see which suits your case best.
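The last two steps boil down to comparing one metric on the same validation data. A minimal sketch, assuming you already have validation predictions from the linear baseline and from a candidate model; `worth_switching` and its 5% `min_relative_gain` threshold are hypothetical choices, and the right threshold depends on how much extra training cost and complexity are worth to you.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two array-likes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def worth_switching(y_val, baseline_pred, candidate_pred, min_relative_gain=0.05):
    """Return (baseline MSE, candidate MSE, relative gain, keep candidate?)."""
    base = mse(y_val, baseline_pred)
    cand = mse(y_val, candidate_pred)
    # Fraction of the baseline's error removed by the candidate (negative if worse).
    gain = (base - cand) / base if base > 0 else 0.0
    return base, cand, gain, gain >= min_relative_gain

# Toy validation targets and predictions, purely for illustration.
y_val = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [1.5, 2.5, 2.5, 3.5]
nn_pred = [1.1, 2.1, 2.9, 3.9]
base, cand, gain, keep = worth_switching(y_val, baseline_pred, nn_pred)
print(f"baseline MSE={base:.3f}, NN MSE={cand:.3f}, gain={gain:.1%}, keep NN: {keep}")
```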

score 1 · Answer 2 · answered Mar 05 '19 at 20:40

Nobody really knows why certain architectures work; that is still a topic of ongoing discussion (see, e.g., this paper).

Finding architectures that work is mostly a matter of trial and error, and of adopting or modifying existing architectures that seem to work well for related tasks and dataset sizes.
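The trial-and-error part can at least be made systematic. Below is a rough sketch, assuming a recent TensorFlow 2.x with its bundled Keras; the function names `build_mlp` and `sweep`, the synthetic data, and the candidate depths, widths and activations are my own illustrative choices, not a recommended search space. It builds Sequential models with different numbers of hidden Dense layers and activations and records each one's final validation MSE.

```python
import numpy as np
import tensorflow as tf

def build_mlp(n_features, hidden_layers=2, units=32, activation="relu"):
    """Sequential model: `hidden_layers` Dense layers plus one linear output unit."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for _ in range(hidden_layers):
        layers.append(tf.keras.layers.Dense(units, activation=activation))
    layers.append(tf.keras.layers.Dense(1))  # linear output for regression
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model

def sweep(X, y, depths=(1, 2, 3), activations=("relu", "tanh"), epochs=20):
    """Train every (depth, activation) combination; return final validation MSEs."""
    results = {}
    for depth in depths:
        for act in activations:
            tf.keras.utils.set_random_seed(0)  # same initialisation seed for each run
            model = build_mlp(X.shape[1], hidden_layers=depth, activation=act)
            # validation_split holds out the last 20% of the rows for validation.
            history = model.fit(X, y, epochs=epochs, validation_split=0.2, verbose=0)
            results[(depth, act)] = history.history["val_loss"][-1]
    return results

if __name__ == "__main__":
    # Synthetic nonlinear regression data, purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4)).astype("float32")
    y = (np.sin(X[:, 0]) + X[:, 1] ** 2).astype("float32").reshape(-1, 1)
    for (depth, act), val_mse in sorted(sweep(X, y).items(), key=lambda kv: kv[1]):
        print(f"hidden layers={depth}, activation={act}: val MSE={val_mse:.4f}")
```

One reasonable way to use the results is to prefer the smallest model whose validation score is close to the best, and to compare it against a simple linear baseline before committing to it.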

I would refer you to Goodfellow, Bengio, and Courville's book Deep Learning; it is a great resource for getting started with machine learning, and deep learning in particular.
