
My dataset consists of video game titles from various websites, formatted in different ways. Here's my example:

"The Legend Of Zelda: Wind Waker, Nintendo"
"The Legend Of Zelda: The Wind Waker"
"The Legend Of Zelda: Wind Waker, Nintendo"
"The Legend Of Zelda: Wind Waker, Nintendo"
"Zelda: Wind Waker Hd Nintendo Wii U Game"
"The Legend Of Zelda: The Wind Waker"
"Legend Of Zelda: The Wind Waker Hd (nintendo Wii"
"The Legend Of Zelda: Wind Waker Of Game (nintendo"
"The Legend Of Zelda: The Wind Waker Nintendo Wii"
"Nintendo Wii U Game Zelda: Wind Waker Hd"
"The Legend Of Zelda: The Wind Waker Hd Wii U"
"The Legend Of Zelda: Wind Waker, Nintendo Pinterest"
"Zelda: Hd (nintendo Wii The"
"The Legend Of Zelda: The Wind Waker Hd Wii U Pinterest"
"The Legend Of Zelda: The Wind Waker Hd"
"Legend Of Zelda: Wind Waker Hd (nintendo Wii"
"The Legend Of Zelda: The Wind Waker Hd"
"The Legend Of Zelda: Wind Waker, Nintendo Wii U"
"The Legend Of Zelda Wind Hd"
"Zelda Wind Waker Hd"
"The Legend Of Zelda: Wind Waker, Nintendo Pinterest"
"The Legend Of Zelda Wind Waker Wii U Nintendo"
"Wii U The Legend Of Zelda: The Wind Waker Hd"
"Zelda: Wind Waker Hd"
"The Legend Of Zelda: The Wind Waker Hd Game Wii"
"The Legend Of Zelda: The Wind Waker Hd Nintendo Wii U"
"Zelda: Wind Waker Hd"
"The Legend Of Zelda The Wind Waker Hd Wii U"

The correct output for this data would be:

The Legend Of Zelda: The Wind Waker HD - Title

Wii U - Platform

Nintendo - Publisher

I can feed a model hundreds of these datasets, each paired with the output I expect, in the hope that the model "learns" what the expected output should be for future sets of titles.

Is this something that machine learning can do? What model should I use? I have never done anything with ML before, so I'm unsure whether this is a good use case for it.

Ethan Allen

1 Answer


As your question shows, the Title, Platform and Publisher (the outputs) are extracted from the original data (the input), so you can use something similar to Named Entity Recognition (NER). Look at the literature to find out more, but this is the most likely direction to take.
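
To make the NER framing concrete, here is a minimal sketch of how one title from the question could be turned into a training example. It assumes whitespace tokenization and the common BIO tagging scheme (B = first token of an entity, I = continuation, O = outside any entity). The helper `bio_tags` and the label names `TITLE`, `PUBLISHER` and `PLATFORM` are made up for illustration, and the annotation is done by hand; none of this comes from a particular library.

```python
def bio_tags(tokens, spans):
    """Turn labelled token spans into one BIO tag per token.

    spans: list of (start, end, label) tuples indexing into tokens,
    with end exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# One raw title from the question, split on whitespace (an assumption;
# a real pipeline would likely use a proper tokenizer).
text = "The Legend Of Zelda: The Wind Waker Hd Nintendo Wii U"
tokens = text.split()

# Hand-made annotation with hypothetical label names.
spans = [(0, 8, "TITLE"), (8, 9, "PUBLISHER"), (9, 11, "PLATFORM")]

tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token:10} {tag}")
```

A sequence-labelling model would be trained on many such (tokens, tags) pairs and asked to predict the tags for titles it has not seen.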

ESDAIRIM
  • Correct, the output would be built from the various inputs in the dataset. I'll look into that, thank you! – Ethan Allen Nov 08 '20 at 21:59
  • To clarify further: doing Named Entity Recognition is analogous to taking a marker and highlighting the parts of your input text that become outputs, so keep that in mind. – ESDAIRIM Nov 08 '20 at 22:14
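
Since highlighting only ever marks text that is present in a single input string, while the expected output is built from many strings, some step has to combine the per-string results. The sketch below shows one simple possibility, a majority vote per label. It is an assumption rather than something proposed in the thread: the function `most_common_per_label` is hypothetical, and the `extractions` list stands in for whatever an entity tagger would return for each title.

```python
from collections import Counter


def most_common_per_label(extractions):
    """Pick the most frequent extracted value for each label.

    extractions: one dict per input string, mapping label -> extracted text.
    A label missing from a dict means nothing was highlighted for it.
    """
    votes = {}
    for entities in extractions:
        for label, value in entities.items():
            votes.setdefault(label, Counter())[value] += 1
    return {label: counter.most_common(1)[0][0] for label, counter in votes.items()}


# Hypothetical tagger output for three of the titles in the question.
extractions = [
    {"TITLE": "The Legend Of Zelda: The Wind Waker Hd", "PLATFORM": "Wii U"},
    {"TITLE": "Zelda: Wind Waker Hd", "PLATFORM": "Wii U", "PUBLISHER": "Nintendo"},
    {"TITLE": "The Legend Of Zelda: The Wind Waker Hd", "PUBLISHER": "Nintendo"},
]

result = most_common_per_label(extractions)
print(result)
```

A plain vote like this picks whichever variant appears most often, so truncated titles would win if they outnumbered the full one; a real pipeline might normalize case or weight longer matches.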