TL;DR: How can I convert a folder of PDFs into a list of CMYK values (or RGB, or any other colour-scale values), preferably in Python?
I have a folder containing roughly 100,000 PDF documents. To make sampling them easier, I want to run some data analysis on them (clustering and anomaly detection), and one of the metrics I want is the CMYK coverage of each document. Is there a method or package, preferably in Python, that can calculate the CMYK coverage of a PDF?
Edit:
After some research I have found that Ghostscript should provide the functionality I need. If anyone could help me with the implementation, I would still really appreciate it.
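
For reference, here is a minimal sketch of what the Ghostscript route might look like from Python. It assumes Ghostscript is installed and on the PATH as `gs` (on Windows the console executable is usually `gswin64c`), and it uses Ghostscript's `inkcov` device, which prints one line of C, M, Y and K figures per page. The function names (`parse_inkcov`, `pdf_ink_coverage`, `folder_ink_coverage`) are my own, not part of any library, and as far as I understand `inkcov` counts pixels that carry any amount of each ink rather than measuring ink density, so treat the numbers accordingly.

```python
import re
import subprocess
from pathlib import Path

# One inkcov result line looks roughly like:
#   " 0.10022  0.09563  0.10071  0.06259 CMYK OK"
_INKCOV_LINE = re.compile(
    r"^[ \t]*([\d.]+)[ \t]+([\d.]+)[ \t]+([\d.]+)[ \t]+([\d.]+)[ \t]+CMYK",
    re.MULTILINE,
)


def parse_inkcov(output):
    """Turn inkcov text output into a list of (C, M, Y, K) tuples, one per page."""
    return [
        tuple(float(value) for value in match.groups())
        for match in _INKCOV_LINE.finditer(output)
    ]


def pdf_ink_coverage(pdf_path, gs_executable="gs"):
    """Run Ghostscript's inkcov device on one PDF and return per-page CMYK figures.

    gs_executable is an assumption about the local install ("gs" on Linux/macOS).
    """
    result = subprocess.run(
        [gs_executable, "-q", "-o", "-", "-sDEVICE=inkcov", str(pdf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Ghostscript fails
    )
    return parse_inkcov(result.stdout)


def folder_ink_coverage(folder, gs_executable="gs"):
    """Yield (filename, per-page CMYK list) for every PDF directly inside folder.

    Files that Ghostscript cannot process are reported with None instead of a list.
    """
    for pdf in sorted(Path(folder).glob("*.pdf")):
        try:
            yield pdf.name, pdf_ink_coverage(pdf, gs_executable)
        except subprocess.CalledProcessError:
            yield pdf.name, None
```

Something like `dict(folder_ink_coverage("path/to/pdfs"))` would then give a mapping from filename to per-page values. With around 100,000 files it would probably be worth running the per-file calls in parallel (for example with `concurrent.futures`), since each call starts a separate Ghostscript process.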