I have a list similar to this:
list = ['Opinion, Journal, Editorial',
'Opinion, Magazine, Evidence-based',
'Evidence-based']
where the commas separate categories, e.g. Opinion and Journal are two separate categories. The real list is much larger and has more possible categories. I would like to use one-hot encoding to transform the list so that it can be used for machine learning. For example, from that list I would like to produce a sparse matrix containing data like:
list = [[1, 1, 1, 0, 0],
[1, 0, 0, 1, 1],
[0, 0, 0, 0, 1]]
Ideally, I would like to use scikit-learn's one-hot encoder, as I presume this would be the most efficient.
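For reference, scikit-learn's OneHotEncoder expects one category per feature column, so for rows that carry several comma-separated labels, MultiLabelBinarizer may be the closer fit. The following is only a sketch under that assumption: the variable names (rows, labels, mlb, encoded) are made up for illustration, and the columns come out in alphabetical order rather than in the order the categories first appear.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Example data from the question (renamed to avoid shadowing the built-in `list`).
rows = ['Opinion, Journal, Editorial',
        'Opinion, Magazine, Evidence-based',
        'Evidence-based']

# Split each string on commas and strip the surrounding whitespace.
labels = [[part.strip() for part in row.split(',')] for row in rows]

# sparse_output=True returns a SciPy sparse matrix instead of a dense array.
mlb = MultiLabelBinarizer(sparse_output=True)
encoded = mlb.fit_transform(labels)

print(mlb.classes_)       # column labels, sorted alphabetically
print(encoded.toarray())  # dense view, for inspection only
```

If a specific column order matters, the classes parameter of MultiLabelBinarizer accepts an explicit list of labels.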
In response to @nbrayns's comment:
The idea is to transform each entry in the list from text to a vector, whereby each position is assigned 1 if the entry belongs to that category and 0 otherwise. For the above example, the headings would be:
headings = ['Opinion', 'Journal', 'Editorial', 'Magazine', 'Evidence-based']
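To make the intended mapping concrete, here is a minimal plain-Python sketch of it, assuming the headings order given above. The helper name encode and the variable rows are hypothetical, and this produces dense lists rather than a sparse matrix. Note that 'Magazine' is the fourth heading, so the second row gets a 1 in the fourth column.

```python
rows = ['Opinion, Journal, Editorial',
        'Opinion, Magazine, Evidence-based',
        'Evidence-based']

headings = ['Opinion', 'Journal', 'Editorial', 'Magazine', 'Evidence-based']

def encode(row, headings):
    """Return a 0/1 list with a 1 for every heading present in `row`."""
    # Split on commas and strip whitespace to get the set of categories.
    present = {part.strip() for part in row.split(',')}
    return [1 if heading in present else 0 for heading in headings]

encoded = [encode(row, headings) for row in rows]
print(encoded)
# [[1, 1, 1, 0, 0], [1, 0, 0, 1, 1], [0, 0, 0, 0, 1]]
```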