
I recently downloaded Deeplearning4j (DL4J) and am now experimenting a bit with convolutional nets. I found some working samples on the homepage and the internet, for example how to classify images, e.g. recognizing faces. I understand roughly how the training data is read and that the images are labeled with the correct label the net should recognize ("Barack Obama - Picture 1"). However, from here on I am a little stuck: I cannot make sense of the output. As described on http://deeplearning4j.org/image-data-pipeline, an ImageRecordReader is used to read in the images and create training data from them. But how does the program then know, for example, to classify an image as "barack obama" and not as "barack obama - sample picture 1"? Or does the net do that? I don't think so.

My next problem is to change the application so that it does not merely recognize an object but evaluates an image, much like AlphaGo evaluates a board position represented as an image. How would I input the data then? I could, for example, label training board states with their score, but I do not know whether that is a good approach at all.

I hope this was understandable; help and minimal samples would be greatly appreciated!

Thanks and have a good day, Oliver

Gemini

1 Answer


The example you cite uses the Labeled Faces in the Wild dataset, which has the following folder structure:

lfw
├── Aaron_Eckhart
├── Aaron_Guiel
├── Aaron_Patterson
│   ├── Aaron_Patterson_0001.jpg

The ImageRecordReader class extends the abstract BaseImageRecordReader class, which in its initialize() method uses the following lines (131-134) to create the labels array:

File parentDir = imgFile.getParentFile();
String name = parentDir.getName();
if(!labels.contains(name))
    labels.add(name);

In other words, it does not use the names of the JPEG files themselves, but rather the name of each file's parent folder.
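To see this labeling rule in isolation, here is a small plain-Java sketch (my own illustration, not DL4J code) that derives a label from an image file's parent directory and collects the distinct labels, the same way the snippet from BaseImageRecordReader above does:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class Labels {
    // Mirrors BaseImageRecordReader: the label is the parent folder's name.
    static String labelFor(File imgFile) {
        return imgFile.getParentFile().getName();
    }

    // Collect distinct labels, preserving first-seen order.
    static List<String> collectLabels(File... imgFiles) {
        List<String> labels = new ArrayList<>();
        for (File f : imgFiles) {
            String name = labelFor(f);
            if (!labels.contains(name))
                labels.add(name);
        }
        return labels;
    }

    public static void main(String[] args) {
        File img = new File("lfw/Aaron_Patterson/Aaron_Patterson_0001.jpg");
        System.out.println(labelFor(img)); // prints "Aaron_Patterson"
    }
}
```

So with the LFW layout above, every image under lfw/Aaron_Patterson/ gets the label "Aaron_Patterson", regardless of the file name.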

As for your second question:

My next problem is to change the application so that it does not merely recognize an object but evaluates an image, much like AlphaGo evaluates a board position represented as an image. [..] I could, for example, label training board states with their score, but I do not know whether that is a good approach at all.

I would suggest starting with reading the following paper: http://www.nature.com/nature/journal/v529/n7587/fig_tab/nature16961_F1.html, and the following outline: https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/ (especially starting from the AlphaGo section).

AlphaGo relies on two different components: A tree search procedure, and convolutional networks that guide the tree search procedure. [..] In total, three convolutional networks are trained, of two different kinds: two policy networks and one value network. Both types of networks take as input the current game state, represented as an image.

[..]

The value network provides an estimate of the value of the current state of the game: what is the probability of the black player to ultimately win the game, given the current state? The input to the value network is the whole game board, and the output is a single number, representing the probability of a win.

The policy networks provide guidance regarding which action to choose, given the current state of the game. The output is a probability value for each possible legal move (i.e. the output of the network is as large as the board). Actions (moves) with higher probability values correspond to actions that have a higher chance of leading to a win.
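To make the two output shapes concrete, here is a minimal plain-Java sketch (my own illustration, not the AlphaGo or DL4J code): the policy head turns one raw score per board point into a probability distribution with a softmax, while the value head squashes a single score into a win probability with a sigmoid:

```java
public class GoHeads {
    // Policy head: softmax over raw scores, one per board point.
    // The output is as large as the board and sums to 1.
    static double[] policy(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l); // for numerical stability
        double sum = 0.0;
        double[] probs = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }

    // Value head: a single number squashed into (0, 1),
    // interpreted as the probability of (say) black winning.
    static double value(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }
}
```

In a real network these two heads sit on top of the convolutional layers; the point here is only the shape of the outputs: an array the size of the board for the policy, a single probability for the value.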

appel