I have a large list of strings. Each string is a different example in the training dataset and contains a comma-separated list of categories. E.g.
mesh = ['alligator, dog, cat', 'cat, mouse, alligator', '']
Some examples may not belong to any category and so will be represented as an empty string.
I wish to use one-hot encoding to encode these categories for use in machine learning.
How can I do this? I do not have a complete list of categories and there are approximately 5,000 possible categories.
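One possible approach, as a minimal sketch rather than a definitive answer: since the full category list is unknown, the vocabulary can be collected from the data itself, and each example is then turned into a 0/1 vector with one position per category seen. The helper names `split_categories` and `encode` are made up for illustration, and the sketch assumes that whitespace around category names is not significant.

```python
mesh = ['alligator, dog, cat', 'cat, mouse, alligator', '']

def split_categories(example):
    # Split on commas and trim whitespace; an empty string yields [].
    return [c.strip() for c in example.split(',') if c.strip()]

labels = [split_categories(s) for s in mesh]

# Build the vocabulary from the data, since no complete list is available.
categories = sorted({c for row in labels for c in row})
index = {c: i for i, c in enumerate(categories)}

def encode(row):
    # One position per known category; 1 if the example has that category.
    vec = [0] * len(categories)
    for c in row:
        vec[index[c]] = 1
    return vec

encoded = [encode(row) for row in labels]

print(categories)
for vec in encoded:
    print(vec)
```

With around 5,000 categories, dense Python lists like these may become large. If scikit-learn is available, `sklearn.preprocessing.MultiLabelBinarizer` performs the same kind of encoding on the split lists, and its `sparse_output=True` option returns a sparse matrix instead.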