more than one target column in MLDataTable - CreateML framework in Swift

Question

I want to create an MLDataTable with one feature and three targets using the Create ML framework. For example, assume I'm building a calendar app with a quick-event feature like the one in the native Mac Calendar app. I have a feature column, text, which contains strings like "Club game at Nehru Stadium, Chennai on Saturday Morning". I want the three target columns title, location and time to get the values "Club game", "Nehru Stadium, Chennai" and "24 Nov 2018, 08:00".

Also, kindly let me know if there are other ways to implement this using the Create ML framework.

Ozgur Sahin · Answer 1 · 2018-12-11T18:29:19.023

You can train an MLWordTagger for this task. Create a training data file (JSON) in this format:

[
  {
    "tokens": [
      "Club game",
      "at",
      "Nehru Stadium Chennai",
      "on",
      "Saturday Morning"
    ],
    "labels": [
      "TITLE",
      "NONE",
      "LOCATION",
      "NONE",
      "TIME"
    ]
  },
  ... other sample records...
]
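
If the samples already live in code or in another format, a file of this shape can be generated instead of written by hand. The following is a minimal sketch, not part of the original answer: `TaggedSample` and `encodeTrainingData` are hypothetical names, and the output path in the last comment is a placeholder.

```swift
import Foundation

// Hypothetical record type; its property names mirror the JSON keys above.
struct TaggedSample: Codable {
    let tokens: [String]
    let labels: [String]
}

// Encodes the samples as a JSON array. Stops with a precondition failure
// if any sample does not have exactly one label per token.
func encodeTrainingData(_ samples: [TaggedSample]) throws -> Data {
    for sample in samples {
        precondition(sample.tokens.count == sample.labels.count,
                     "each token needs exactly one label")
    }
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted]
    return try encoder.encode(samples)
}

let samples = [
    TaggedSample(
        tokens: ["Club game", "at", "Nehru Stadium Chennai", "on", "Saturday Morning"],
        labels: ["TITLE", "NONE", "LOCATION", "NONE", "TIME"])
]
let jsonData = try encodeTrainingData(samples)
// Placeholder path; write the data wherever the training file should live:
// try jsonData.write(to: URL(fileURLWithPath: "train.json"))
```

Checking the token and label counts up front is a design choice made here so that a mismatched sample is caught before training starts.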

You can train the model in a Playground with the code below.

import CreateML
import Foundation

// Load the token/label records, then train the tagger on them.
let trainingData = try MLDataTable(contentsOf: URL(fileURLWithPath: "/pathto..train.json"))
let model = try MLWordTagger(trainingData: trainingData, tokenColumn: "tokens", labelColumn: "labels")

Then use this prediction method to predict a label for each token in the sentence.

func prediction(from tokens: [MLWordTagger.Token]) throws -> [String]

This method returns an array of tags, one for each token.
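
Because the result has one tag per token, the three target values from the question can be recovered by grouping the tokens by their predicted tag. The following is a minimal sketch under stated assumptions: `fields(tokens:labels:)` is a hypothetical helper, the `labels` array is hard-coded to stand in for the output of the prediction method, and the `NONE` tag follows the training data above.

```swift
// Hypothetical helper: joins the tokens that share a label, skipping "NONE".
func fields(tokens: [String], labels: [String]) -> [String: String] {
    var result: [String: String] = [:]
    for (token, label) in zip(tokens, labels) where label != "NONE" {
        if let existing = result[label] {
            // Several tokens can carry the same label; join them with a space.
            result[label] = existing + " " + token
        } else {
            result[label] = token
        }
    }
    return result
}

let tokens = ["Club game", "at", "Nehru Stadium Chennai", "on", "Saturday Morning"]
// Stand-in for the tagger output, e.g. try model.prediction(from: tokens)
let labels = ["TITLE", "NONE", "LOCATION", "NONE", "TIME"]
let event = fields(tokens: tokens, labels: labels)
// event["TITLE"] is "Club game"; event["LOCATION"] is "Nehru Stadium Chennai"
```

The `TIME` value here is still the raw phrase ("Saturday Morning"); turning it into a concrete date such as the one in the question would be a separate step.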

An alternative is to use NLTagger, which can already detect personal names, place names and organization names, but not times.

import NaturalLanguage

let text = "Club game at Nehru Stadium, Chennai on Saturday Morning."
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}

This prints the output below, so you only need to train a model to detect the adverb of time.

Nehru Stadium: PlaceName
Chennai: OrganizationName

I'm not sure how much this will work. I guess, the accuracy of prediction will depend on the volume and variety of the data set. I'm thinking about using NLTagger in combination with the above. I'll try out and let you know. Thanks. — imthath, Dec 10 '18 at 12:44
Yes as you said the key is the volume and variety of the dataset. I hope this will work. — Ozgur Sahin, Dec 10 '18 at 19:55
