FastText and Datasets in Azure ML with Python

Question

I am running an experiment (a custom model built with PyTorch) in Azure ML and using FastText (not the gensim version), but I have run into a problem:

In the experiment, I have a (rather large) text file in a dataset and need to train FastText with it, but fasttext.train_unsupervised only takes a file name as an input.

Please, how do I work with FastText in the context of Azure ML datasets?

Thanks in advance!

score 0 · Answer 1 · answered Feb 07 '20 at 21:30

Well, just found out:

You can mount an Azure ML dataset as a directory and have FastText read from it in this fashion:

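import fasttext
from azureml.core import Dataset
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
dset = Dataset.get_by_name(workspace=ws, name='thenameofyourdataset')

mount_dir = 'afoldernameyoujustinvented'

# mount() only returns a mount context; the files become visible once it is
# started, which the with-block does (and it unmounts again on exit)
with dset.mount(mount_dir):
    embedding = fasttext.train_unsupervised(mount_dir + '/myfilename.txt')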

In other words: you mount your dataset to a virtual folder and use that virtual folder as if it were a real folder containing the files in your dataset. Under the hood the files are not copied up front; they are fetched as they are read.
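If mounting is not an option (as far as I know it relies on FUSE and so needs a Linux-based compute), downloading the dataset first should work too, since FastText only needs a local file path. Below is a minimal sketch of that alternative, not something I have run against a real workspace. The names pick_file, train_from_downloaded_dataset and the fasttext_data folder are made up for illustration; Dataset.get_by_name, download and fasttext.train_unsupervised are the real calls.

```python
import os


def pick_file(paths, filename):
    """Return the first path in `paths` whose base name equals `filename`."""
    for path in paths:
        if os.path.basename(path) == filename:
            return path
    raise FileNotFoundError(f"{filename} not found among {len(paths)} downloaded files")


def train_from_downloaded_dataset(dataset_name, filename, target_dir="fasttext_data"):
    """Download a file dataset locally, then train FastText on one of its files."""
    # Imported lazily so pick_file can be used without these packages installed.
    import fasttext
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()
    dset = Dataset.get_by_name(workspace=ws, name=dataset_name)

    # download() copies every file in the dataset and returns the local paths.
    local_paths = dset.download(target_path=target_dir, overwrite=True)
    return fasttext.train_unsupervised(pick_file(local_paths, filename))
```

You would call it as train_from_downloaded_dataset('thenameofyourdataset', 'myfilename.txt'). The trade-off is that the whole dataset is copied to local disk before training starts, which may matter for a large file.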

Cheers!
