Unify text and image classification (Python)

Question

I am writing code to classify scientific articles from their text (the title and the abstract). For this I'm using an SVM, which gives good accuracy (83%). Separately, I used a CNN to classify the images from these articles. My idea is to merge the text classifier with the image classifier to improve the accuracy.

Is this possible? If so, do you have any idea how I could implement it, or some kind of guideline?

Thank you!

score 1 · Answer 1 · answered Feb 04 '19 at 16:39

You could use a CNN for both. For this you'd need two (or even three) inputs: one for the text (or two, with the title and the abstract kept separate) and one for the image. Each input would pass through its own convolution and max-pooling layers before the branches are merged at some point. After the merge you plug in some additional convolutional or dense layers.

You could also give this model multiple outputs, e.g. a combined one, one for the text and one for the images. If you're using Keras, you would need the functional API. A picture of an example model can be found here. (They're using an LSTM in the example, but I guess you should stick to a CNN.)
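To make the idea concrete, here is a minimal sketch of such a two-input, three-output model with the Keras functional API. It is only an illustration under several assumptions that are not in the question: each text is encoded as a sequence of 1000 integer token ids (if the 1000 columns are TF-IDF features instead, a dense text branch would be a better fit), and the function name `build_model`, the vocabulary size, and all layer sizes are placeholders.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_classes, seq_len=1000, vocab_size=20000,
                img_shape=(256, 256, 3)):
    """Hypothetical two-branch model; all sizes are illustrative."""
    # Text branch: token ids -> embedding -> 1D convolution -> max pooling.
    text_in = keras.Input(shape=(seq_len,), dtype="int32", name="text")
    t = layers.Embedding(vocab_size, 64)(text_in)
    t = layers.Conv1D(64, 5, activation="relu")(t)
    t = layers.GlobalMaxPooling1D()(t)

    # Image branch: 2D convolutions with max pooling.
    image_in = keras.Input(shape=img_shape, name="image")
    i = layers.Conv2D(16, 3, activation="relu")(image_in)
    i = layers.MaxPooling2D(4)(i)
    i = layers.Conv2D(32, 3, activation="relu")(i)
    i = layers.GlobalMaxPooling2D()(i)

    # Merge the two branches, then add a dense layer on top.
    merged = layers.concatenate([t, i])
    merged = layers.Dense(64, activation="relu")(merged)

    # Three outputs: combined, text only, image only.
    combined = layers.Dense(n_classes, activation="softmax",
                            name="combined")(merged)
    text_out = layers.Dense(n_classes, activation="softmax",
                            name="text_only")(t)
    image_out = layers.Dense(n_classes, activation="softmax",
                             name="image_only")(i)

    # Every Input used above must be listed in `inputs`.
    model = keras.Model(inputs=[text_in, image_in],
                        outputs=[combined, text_out, image_out])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```

Training would then look roughly like `model.fit([x_text, x_image], [y, y, y])`, where row k of `x_text` and row k of `x_image` come from the same article. Leaving an `Input` out of the `inputs` list is a common cause of Keras's "Graph disconnected" error.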

Syrius

Thank you for your input as well :) I'm trying to apply a similar approach, but I always get an error about the array dimensions, because an image array looks like this: [256, 256, 3] and the text like this: [9460, 1000]. My error at the moment is: `ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_19:0", shape=(?, 256, 256, 3), dtype=float32) at layer "input_19". The following previous layers were accessed without issue: []` – Bruna B Feb 05 '19 at 11:00
Sorry I can't help you with that problem without the code – Syrius Feb 06 '19 at 12:54
No problem, I solved it, thank you. But now I have the following doubt: does each text in my training set need to come with its corresponding image? I selected the texts and the images for my training set randomly and independently, and now I'm confused about how the two sets should match. Do you understand? – Bruna B Feb 06 '19 at 13:55
You need to make sure that you feed the image with the corresponding text to the neural network for training. If you give a text that does not belong to the image it will learn something else. So it is good to randomly pick the samples but you need to make sure that the image corresponds to the text in the training/test set. – Syrius Feb 06 '19 at 14:46
Ok, I got it. Thank you very much =] – Bruna B Feb 06 '19 at 15:33
Sorry to bother you again, but I still have a question. My data is stored separately: the texts are in a .csv file and the images are in separate folders (one per class). Would you have a suggestion for how to split off the training set so that the texts and the images match? – Bruna B Feb 07 '19 at 08:09
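One way to keep texts and images matched, as the comments above recommend, is to split by article rather than by file. The sketch below assumes something the thread does not confirm: that every article has a shared identifier (here a hypothetical `doc_id`) that can be read from both the .csv rows and the image file names. The helper names `split_by_document` and `pair_samples` are also made up for illustration.

```python
import random


def split_by_document(doc_ids, val_fraction=0.2, seed=0):
    """Split article ids into disjoint training and validation sets."""
    ids = sorted(set(doc_ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return set(ids[n_val:]), set(ids[:n_val])


def pair_samples(text_by_id, image_path_by_id, ids):
    """Return aligned text and image-path lists for the given article ids.

    Articles that lack either a text or an image are skipped.
    """
    keep = [d for d in sorted(ids)
            if d in text_by_id and d in image_path_by_id]
    texts = [text_by_id[d] for d in keep]
    image_paths = [image_path_by_id[d] for d in keep]
    return texts, image_paths
```

Because both lists are built from the same ordered ids, element k of `texts` and element k of `image_paths` always belong to the same article, and no article ends up in both sets.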

score 0 · Answer 2 · answered Feb 04 '19 at 17:33

If you get probabilities from both classifiers, you can average them and take the combined result. However, a weighted average might be a better approach, in which case you can use a validation set to find a suitable value for the weight.

prob_svm = probability from SVM text classifier
prob_cnn = probability from CNN image classifier
prob_total = alpha * prob_svm + (1-alpha) * prob_cnn  # fine-tune alpha with validation set
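A runnable version of the idea above might look like the following sketch. It assumes that both classifiers output arrays of shape (n_samples, n_classes) with the rows in the same article order and the columns in the same class order; the function names `fuse` and `tune_alpha` and the grid of candidate weights are illustrative choices, not part of the answer.

```python
import numpy as np


def fuse(prob_svm, prob_cnn, alpha):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    return alpha * prob_svm + (1.0 - alpha) * prob_cnn


def tune_alpha(prob_svm, prob_cnn, y_true, grid=None):
    """Pick the alpha from `grid` with the best validation accuracy.

    On ties the smallest such alpha is kept.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)  # assumed grid: 0.0, 0.1, ..., 1.0
    y_true = np.asarray(y_true)
    best_alpha, best_acc = None, -1.0
    for alpha in grid:
        pred = fuse(prob_svm, prob_cnn, alpha).argmax(axis=1)
        acc = float((pred == y_true).mean())
        if acc > best_acc:
            best_alpha, best_acc = float(alpha), acc
    return best_alpha, best_acc
```

The weight should be tuned on validation articles only and then applied unchanged to the test set. Note that with scikit-learn, `SVC` only provides `predict_proba` when it is created with `probability=True`.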

If you can get another classifier (maybe a different version of either of these two), you can also do majority voting, i.e. take the class on which two or all three classifiers agree.
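Majority voting over three classifiers can be sketched as below. The answer does not say what to do when all three disagree; falling back to the first classifier's label is an assumption made here, and `majority_vote` is a hypothetical helper name.

```python
from collections import Counter


def majority_vote(*predictions):
    """Combine the label lists of three classifiers by majority vote.

    Each argument is one classifier's predicted labels, in the same
    article order. If no label appears at least twice, the first
    classifier's label is used (an assumed tie-break rule).
    """
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > 1 else labels[0])
    return voted
```

Passing the strongest classifier first makes the tie-break fall back to its prediction.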

xashru

Thank you for your input! In the case of majority voting... what should my validation set look like? Should it be a mixed set, or just texts, or just images? This leaves me quite confused about the implementation. – Bruna B Feb 05 '19 at 10:48
It should consist of a mixed set. You would split the data by document, so that both the text and the image of a document end up either in the training data or in the validation data (not one in training and the other in validation). – xashru Feb 05 '19 at 11:08
I think my data types may be a problem then. Because I have separate images and texts. Images in folders according to their classes and texts in a .csv file. Do you think it's still possible? Thank you again, @xashru =) – Bruna B Feb 05 '19 at 12:48
You can train the individual models separately, but you need to make sure that the image and the text of the same document are not split across the training and validation sets. – xashru Feb 05 '19 at 12:55
Thank you! I got it, I will certainly try. – Bruna B Feb 05 '19 at 15:05
