
I am using the following models in my application:

en_core_web_sm

xx_ent_wiki_sm

I'd like to know which date and time formats the default spaCy models can extract.

Python version used: 3.6. spaCy version used: 2.0.x.

manish Prasad

1 Answer


The English models were trained on the OntoNotes 5 corpus, whose more extensive label scheme includes DATE and TIME.

The xx_ent_wiki_sm model was trained on a Wikipedia corpus with a more limited label scheme and only recognises PER, LOC, ORG and MISC out of the box (model details here).
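To check which spans the English model tags as DATE or TIME on your own text, a minimal sketch along these lines may help. The helper names `date_time_mentions` and `extract_with_spacy` are made up for illustration, and the second function assumes spaCy and the `en_core_web_sm` model are installed:

```python
DATE_TIME_LABELS = {"DATE", "TIME"}


def date_time_mentions(ents):
    """Return (text, label) pairs for entities labelled DATE or TIME.

    `ents` can be any iterable of objects with `.text` and `.label_`,
    such as `doc.ents` from a spaCy Doc.
    """
    return [(ent.text, ent.label_) for ent in ents if ent.label_ in DATE_TIME_LABELS]


def extract_with_spacy(text, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over `text` and keep only DATE/TIME entities.

    Assumes spaCy and the named model are installed.
    """
    import spacy  # imported lazily so the filter above works without spaCy

    nlp = spacy.load(model_name)
    return date_time_mentions(nlp(text).ents)
```

Calling `extract_with_spacy("We met on 14 March 2018 at 9:30 am.")` returns whatever the model predicts for that sentence. With `xx_ent_wiki_sm` as `model_name` the list should come back empty, since that model's label scheme has no DATE or TIME.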

When using the models to extract mentions of date and time, it's important to keep in mind that it's a statistical process – so the results you see will depend on the context and the data the models were trained on. Depending on the texts you're working with, you likely want to update and fine-tune the pre-trained models with more examples specific to your application, or try a rule-based approach instead. Also see this thread for more details on date and time parsing.
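As a rough illustration of the rule-based route, here is a small sketch using only Python's `re` module. The two patterns (ISO-style `YYYY-MM-DD` dates and 24-hour `HH:MM` times) and the function name `find_date_time` are assumptions chosen for the example, not formats spaCy itself defines:

```python
import re

# Illustrative patterns only; extend them to cover the formats in your data.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),               # e.g. 2018-03-14
    "TIME": re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b"),      # e.g. 09:30, 23:59
}


def find_date_time(text):
    """Return (matched_text, label) pairs in order of appearance."""
    matches = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((m.start(), m.group(), label))
    matches.sort()  # order by start offset in the text
    return [(matched, label) for _, matched, label in matches]
```

Rules like these are predictable but only catch the formats you write down, so they tend to work best alongside the statistical model rather than as a full replacement.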

Ines Montani
  • I'd like to point to the model [releases](https://github.com/explosion/spacy-models/releases) page as a reference: it makes it very simple to find the different models and their installation steps. Using the site is kind of a mess. – Dec 30 '20 at 02:59