I've got a bunch of images (~3000) which have been manually classified (approved/rejected) based on some business criteria. I've processed these images with Google Cloud Platform obtaining annotations and SafeSearch results, for example (csv format):
file name; approved/rejected; adult; spoof; medical; violence; annotations A.jpg;approved;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;boat|0.9,vehicle|0.8 B.jpg;rejected;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;text|0.9,font|0.8
I want to use machine learning to be able to predict if a new image should be approved or rejected (second column in the csv file).
Which algorithm should I use?
How should I format the data, especially the annotations column? Should I obtain first all the available annotation types and use them as a feature with the numerical value (0 if it doesn't apply)? Or would it be better to just process the annotation column as text?