I'm trying to use a regex in entity extraction to extend the training set, but it only recognizes words that appear in the training examples. For "check asana for new data" it only recognized "new".

Your input ->  check asana for new data
Are you asking for new

My intent and entity in nlu.md:

## regex:task_name
-  [a-zA-Z0-9\\s]*

## intent: check_asana
- check asana for [jobs on linkedin](task_name)
- check asana for [job interviews](task_name)
- check asana for [new opportunities](task_name)
- check asana for [finding job opportunities](task_name)
- check asana for [finding jobs on linkedin](task_name)
- check asana for [handling the rasa example](task_name)
- check asana
Updated it from:
## intent: check_asana
- check asana for [jobs on linkedin](task_name)
- check asana for [job interviews](task_name)
- check asana for [new opportunities](task_name)
- check asana


## regex:task_name
-  [^\\s]*

In domain.yml, the entities and slots:


entities:
  - task_name

slots:
  task_name:
    type: text
#    auto_fill: false

In config.yml I use the pipeline:

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
AsmerAmen

1 Answer

Regex features are appropriate for entities with very regular patterns, like zip codes or room numbers. Using them in this case does not make sense; there's no distinct pattern to your entities.

A regex for entity extraction provides the entity extractor with a feature, i.e. "yes, this regex is matched in this sentence/for this phrase". Your regex will match basically every single sentence, so it's not doing anything useful for you.
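
To see why the pattern carries no signal, here is a minimal sketch using plain Python `re` (not Rasa itself). It assumes the pattern was meant to be read with a single backslash (`\s`), and the zip-code pattern and the sample sentences are made up for contrast.

```python
import re

# Pattern from the question, assuming a single backslash was intended.
TASK_NAME = re.compile(r"[a-zA-Z0-9\s]*")
# Hypothetical contrast: a pattern with a distinct shape (five-digit zip code).
ZIP_CODE = re.compile(r"\b\d{5}\b")

# Made-up sample sentences; only the first comes from the question.
sentences = [
    "check asana for new data",
    "check asana",
    "hello there",
    "send it to 94107 please",
]


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# One boolean per sentence: did the regex fire?
task_name_hits = [matches(TASK_NAME, s) for s in sentences]
zip_code_hits = [matches(ZIP_CODE, s) for s in sentences]

print(task_name_hits)
print(zip_code_hits)
```

Because of the `*`, the first pattern also matches the empty string, so it fires on every sentence and the resulting feature cannot separate a task name from anything else. The zip-code pattern fires only on the one sentence that contains five consecutive digits.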

I would suggest removing the regex entirely and providing more examples of your entities.
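
If writing many examples by hand is tedious, a small script can generate them. This is only a sketch: the task names below are hypothetical placeholders, `to_nlu_lines` is a made-up helper, and the output follows the Markdown format shown in the question.

```python
# Hypothetical task names; replace with real ones from your own data.
task_names = [
    "new data",
    "weekly report",
    "client follow ups",
]


def to_nlu_lines(intent, template, values, entity):
    """Build an intent section with one annotated example per value."""
    lines = [f"## intent: {intent}"]
    for value in values:
        # Wrap the value in the [text](entity) annotation used in nlu.md.
        lines.append("- " + template.format(f"[{value}]({entity})"))
    return lines


nlu_lines = to_nlu_lines(
    "check_asana", "check asana for {}", task_names, "task_name"
)
print("\n".join(nlu_lines))
```

Varying the sentence template as well as the task names should help more than varying the names alone, since every example in the question uses the same "check asana for ..." wording.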

Melinda