I am using the following models in my application:
en_core_web_sm
xx_ent_wiki_sm
I would like to know which date and time formats the default spaCy models can extract.
Python version used: 3.6. spaCy version used: 2.0.x
The English models were trained on the OntoNotes 5 corpus, which supports the more extensive label scheme including DATE and TIME.
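As a rough illustration of how to see what the English model picks up on your own text, here is a minimal sketch that filters a processed document down to its DATE and TIME entities. The helper name date_time_mentions and the sample sentence are made up for this example; the usage lines assume spaCy 2.x and en_core_web_sm are installed, and the predictions you get will vary with the model version and the input.

```python
def date_time_mentions(doc):
    """Return (text, label) pairs for entities labelled DATE or TIME.

    `doc` is expected to be a spaCy Doc, i.e. an object whose `ents`
    have `text` and `label_` attributes. The helper name is hypothetical.
    """
    wanted = {"DATE", "TIME"}
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in wanted]


# Usage (assumes spaCy 2.x and the en_core_web_sm model are installed):
#
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp("The meeting is on 12 March 2018 at 5pm.")
#   print(date_time_mentions(doc))
```

Running this over a sample of your real texts is probably the quickest way to find out which formats the model handles well and which it misses.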
The xx_ent_wiki_sm model was trained on a Wikipedia corpus with a more limited label scheme and only recognises PER, LOC, ORG and MISC out of the box (model details here).
When using the models to extract mentions of date and time, it's important to keep in mind that this is a statistical process, so the results you see will depend on the context and on the data the models were trained on. There is no fixed list of supported formats. Depending on the texts you're working with, you will likely want to update and fine-tune the pre-trained models with more examples specific to your application, or try a rule-based approach instead. Also see this thread for more details on date and time parsing.
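To give an idea of what a rule-based approach could look like, here is a small sketch using only Python's re module. The two patterns and the function name find_dates_and_times are assumptions chosen for illustration; they cover just a few numeric formats and are not meant as a complete list. You would extend them to match the formats that actually occur in your data.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive list of formats).
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}"          # ISO style, e.g. 2018-03-12
    r"|\d{1,2}/\d{1,2}/\d{2,4})\b"     # slash style, e.g. 12/03/2018
)
TIME_PATTERN = re.compile(
    r"\b\d{1,2}:\d{2}(?:\s?[ap]m)?\b"  # 17:30 or 5:30 pm
    r"|\b\d{1,2}\s?[ap]m\b",           # 5pm or 9 am
    re.IGNORECASE,
)


def find_dates_and_times(text):
    """Return (matched_text, label) pairs in order of appearance."""
    matches = [(m.start(), m.group(), "DATE") for m in DATE_PATTERN.finditer(text)]
    matches += [(m.start(), m.group(), "TIME") for m in TIME_PATTERN.finditer(text)]
    # Sort by start offset so dates and times come back in reading order.
    return [(value, label) for _, value, label in sorted(matches)]


print(find_dates_and_times("Call me on 2018-03-12 at 5:30 pm or 12/03/2018 at 9am."))
```

Rules like these are predictable and easy to debug, but they only find what you explicitly write a pattern for, so they tend to work best alongside the statistical model rather than as a full replacement.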