I needed to train (fine-tune) an NER token classifier to recognize our custom tokens. The easiest way I found to do that was the tutorial Token Classification with W-NUT Emerging Entities.
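
For context, the fine-tuning setup from that tutorial looks roughly like this (a minimal sketch; the checkpoint and label set here are placeholders, not my exact configuration):

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder label set standing in for the custom entity tags
labels = ["O", "B-corporation", "I-corporation", "B-product", "I-product"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-cased", num_labels=len(labels)
)
# ... fine-tune with the Trainer API as in the tutorial ...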

But now I've encountered a problem. The plan was to follow HuggingFace in Spark NLP - BERT Sentence.ipynb, but when I try:

model.save_pretrained(<path on DBFS>)

I get a file write error. As far as I understand, this is because transformers/Keras can't write directly to distributed file systems like DBFS.

Is there any workaround for this?

I cannot move the training away from Databricks because I'm using data (entities) from the database to create the training file.

PS. Maybe I can do the same using only Spark NLP? How? Preferably using the same "tag only" format.

1 Answer

You should save the model to the driver's local file system first and then copy it to DBFS (the /dbfs path is the local FUSE mount of DBFS):

from distutils.dir_util import copy_tree

# Save to the driver's local disk first; transformers can write here normally
local_path = "./tmp/model"
# /dbfs exposes DBFS as a local path, so a plain recursive copy works
dbfs_path = "/dbfs/tmp/model"

model.save_pretrained(local_path)
copy_tree(local_path, dbfs_path)  # creates dbfs_path if it doesn't exist
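
Alternatively, the same copy can be done with Databricks utilities (a sketch assuming you are in a Databricks notebook, where dbutils is available; the paths are illustrative):

# "file:" addresses the driver-local disk, "dbfs:" the distributed store
dbutils.fs.cp("file:/tmp/model", "dbfs:/tmp/model", recurse=True)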