Using regex with entities in Rasa NLU

Question

I'm trying to use a regex in entity extraction to generalize beyond the training set. The model only recognizes words that appear in the training examples. For "check asana for new data" it only recognized "new".

Your input ->  check asana for new data
Are you asking for new

My intent and entity in nlu.md:

## regex:task_name
-  [a-zA-Z0-9\\s]*

## intent: check_asana
- check asana for [jobs on linkedin](task_name)
- check asana for [job interviews](task_name)
- check asana for [new opportunities](task_name)
- check asana for [finding job opportunities](task_name)
- check asana for [finding jobs on linkedin](task_name)
- check asana for [handling the rasa example](task_name)
- check asana

I updated it from:

## intent: check_asana
- check asana for [jobs on linkedin](task_name)
- check asana for [job interviews](task_name)
- check asana for [new opportunities](task_name)
- check asana


## regex:task_name
-  [^\\s]*

In domain.yml, the entities and slots:


entities:
  - task_name

slots:
  task_name:
    type: text
#    auto_fill: false

In config.yml I use the pipeline:

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

So, if you need to match any text, use `.*`, why use `[^\\s]*`? — Wiktor Stribiżew, Jun 21 '20 at 22:21
I have tried it, it didn't work. I think the problem is with my configuration, not the regex. — AsmerAmen, Jun 21 '20 at 22:44

score 0 · Answer 1 · answered Jun 26 '20 at 12:50

Regex features are appropriate for entities with very regular patterns, like zip codes or room numbers. Using them in this case does not make sense, because there's no distinct pattern to your entities.

A regex for entity extraction provides the entity extractor with a feature, i.e. "yes, this regex is matched in this sentence/for this phrase". It does not extract the entity by itself. Your regex matches basically every single sentence, so the feature carries no information and is not doing anything useful for you.
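To make this concrete, here is a rough sketch in plain Python (standard `re` module only) of the kind of per-token "did the regex match here" flag described above. It is a simplification for illustration, not Rasa's actual `RegexFeaturizer` code. The helper `token_flags` and the five-digit zip code pattern are hypothetical and not part of the question, and the question's pattern is written with a single backslash (`\s`) on the assumption that whitespace was intended.

```python
import re

# The catch-all pattern from the question (assuming `\s` was intended)
# and, for contrast, a hypothetical five-digit zip code pattern.
CATCH_ALL = re.compile(r"[a-zA-Z0-9\s]*")
ZIPCODE = re.compile(r"\b[0-9]{5}\b")


def token_flags(pattern, sentence):
    """Return (token, matched) pairs for whitespace-separated tokens.

    `matched` is True when some match of `pattern` overlaps the token.
    This only approximates the idea of a regex feature; it is not
    Rasa's implementation.
    """
    matches = [m.span() for m in pattern.finditer(sentence)]
    flags = []
    for tok in re.finditer(r"\S+", sentence):
        hit = any(start < tok.end() and end > tok.start()
                  for start, end in matches)
        flags.append((tok.group(), hit))
    return flags


if __name__ == "__main__":
    # The catch-all pattern flags every token, so the flag cannot tell
    # the entity apart from the rest of the sentence.
    print(token_flags(CATCH_ALL, "check asana for new data"))
    # The zip code pattern flags only the token that fits the pattern.
    print(token_flags(ZIPCODE, "send it to 10115 please"))
```

With the catch-all pattern the flag is on for "check", "asana" and "for" just as much as for "new data", so the model learns nothing from it. A selective pattern like the zip code one switches the flag on only where it matters, which is the situation regex features are meant for.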

I would suggest removing the regex entirely and providing more examples of your entities.
