
I am trying to build a simple chatbot application using Rasa, but my bot returns a confidence of 0 whenever a word contains an underscore.

Below is my `config.yml`:

```yaml
language: en
pipeline: supervised_embeddings
policies:
  - name: KerasPolicy
  #- name: MappingPolicy
  #- name: MemoizationPolicy
  #- name: FallbackPolicy
```

And here is my `nlu.md` training data:

```md
## intent:name
- name
- nmae
- nme
- what is my name?

## intent: firstname
- firstName
- FName
- first name

## intent: gender
- gender
- sex
- gnder
- gendr
- sx

## intent: lastname
- lastName
- lname
- surname
- lstnme
- lstname

## intent: username
- userName
- uname
- usrnme
- usernme
- userid
```

If I pass `firstname`, I get the correct intent and confidence, but if I try `_firstname` or `first_name`, I get the following result:

Input: `first_name`

```json
{
  "intent": {
    "name": null,
    "confidence": 0.0
  },
  "entities": [],
  "intent_ranking": [],
  "text": "first_name"
}
```

Secespitus
Karthik Mannava

1 Answer


You're getting 0 confidence precisely because your word contains an underscore. The token `first_name` never appears in your training data, so it is unknown to your model, and that's why it doesn't predict anything for it. (By default, the pipeline uses a whitespace tokenizer, so text is split into tokens on whitespace only: `first_name` stays one token instead of becoming `first` and `name`.)
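To make this concrete, here is a minimal Python sketch (not Rasa's actual code) that tokenizes your training examples on whitespace only and collects the resulting vocabulary. It assumes tokens are lowercased, which I believe is the default for Rasa's CountVectorsFeaturizer; `whitespace_tokenize`, `build_vocabulary` and `known_tokens` are hypothetical helpers for illustration.

```python
# Minimal sketch (not Rasa's code): whitespace-only tokenization means
# "first_name" is one token that never occurs in the training examples.
# Assumption: tokens are lowercased before building the vocabulary.

TRAINING_EXAMPLES = [
    "name", "nmae", "nme", "what is my name?",
    "firstName", "FName", "first name",
    "gender", "sex", "gnder", "gendr", "sx",
    "lastName", "lname", "surname", "lstnme", "lstname",
    "userName", "uname", "usrnme", "usernme", "userid",
]


def whitespace_tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace only (no punctuation handling)."""
    return text.lower().split()


def build_vocabulary(examples: list[str]) -> set[str]:
    """Collect every token seen in the training examples."""
    vocab: set[str] = set()
    for example in examples:
        vocab.update(whitespace_tokenize(example))
    return vocab


VOCAB = build_vocabulary(TRAINING_EXAMPLES)


def known_tokens(text: str) -> list[str]:
    """Return only the tokens of `text` that appeared during training."""
    return [tok for tok in whitespace_tokenize(text) if tok in VOCAB]


if __name__ == "__main__":
    for query in ["firstname", "first name", "first_name", "_firstname"]:
        print(query, "->", whitespace_tokenize(query), "known:", known_tokens(query))
```

`first_name` and `_firstname` each come out as a single token that never occurs in training, so nothing known is left to classify, whereas `first name` yields two known tokens.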

So, to fix the issue, either avoid underscores in your input, or modify the whitespace tokenizer so that it splits on both whitespace and underscores.
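A sketch of the second option in plain Python: a hypothetical tokenizer that splits on runs of whitespace and underscores. This is not a Rasa API; using the rule in a real pipeline would mean writing a custom tokenizer component against your Rasa version's component interface, which isn't shown here.

```python
import re

# Hypothetical tokenizer for illustration only, NOT part of Rasa's API.
# It applies the rule suggested above: split on whitespace *and* underscores.
_SPLIT_PATTERN = re.compile(r"[\s_]+")


def whitespace_underscore_tokenize(text: str) -> list[str]:
    """Lowercase, split on runs of whitespace/underscores, drop empty pieces."""
    return [tok for tok in _SPLIT_PATTERN.split(text.lower()) if tok]


# Vocabulary from the `firstname` intent's examples, tokenized the same way.
FIRSTNAME_EXAMPLES = ["firstName", "FName", "first name"]
VOCAB = {tok for ex in FIRSTNAME_EXAMPLES for tok in whitespace_underscore_tokenize(ex)}

if __name__ == "__main__":
    for query in ["first_name", "_firstname", "first__name "]:
        toks = whitespace_underscore_tokenize(query)
        print(query, "->", toks, "known:", [t for t in toks if t in VOCAB])
```

Because the same rule runs on the training examples and on incoming text, `first_name` and `first name` end up as identical tokens, and a leading underscore no longer hides `firstname`.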

Hope that helps.

lahsuk
- Thanks for the help. I was able to solve this issue by using **CountVectorsFeaturizer** with analyzer `char_wb`. – Karthik Mannava Sep 05 '19 at 10:13
- That works too, but I don't really like the `char_wb` option. I don't think it works that well, because I suspect it gives more false positives. – lahsuk Sep 05 '19 at 11:07
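For comparison with the `char_wb` approach from the comments, here is a pure-Python approximation of character n-grams taken within word boundaries, the idea behind scikit-learn's `CountVectorizer(analyzer="char_wb")`, which Rasa's CountVectorsFeaturizer builds on. The n-gram range (3 to 4) and the Jaccard `overlap` helper are arbitrary choices for illustration, not Rasa defaults, and the edge-case handling differs slightly from scikit-learn.

```python
from collections import Counter

# Approximation (for illustration only) of scikit-learn's analyzer="char_wb":
# every whitespace-separated word is lowercased and padded with one space on
# each side, and character n-grams are taken only inside that padded word.
# Unlike scikit-learn, words shorter than n simply produce no n-gram of size n.


def char_wb_ngrams(text: str, min_n: int = 1, max_n: int = 4) -> Counter:
    """Count character n-grams (min_n..max_n) within word boundaries."""
    grams: Counter = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            for start in range(len(padded) - n + 1):
                grams[padded[start:start + n]] += 1
    return grams


def overlap(a: str, b: str, min_n: int = 3, max_n: int = 4) -> float:
    """Jaccard similarity of the distinct n-grams of two strings (hypothetical helper)."""
    grams_a = set(char_wb_ngrams(a, min_n, max_n))
    grams_b = set(char_wb_ngrams(b, min_n, max_n))
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    for left, right in [("first_name", "firstname"),
                        ("first_name", "lastname"),
                        ("firstname", "lastname")]:
        print(f"{left!r} vs {right!r}: {overlap(left, right):.2f}")
```

With these settings `first_name` shares half of its n-grams with `firstname` (0.50) but far fewer with `lastname` (5/29, about 0.17), which is why `char_wb` rescues the underscore case. Yet `firstname` and `lastname` also overlap by 9/23 (about 0.39), which illustrates how character n-grams can pull different intents closer together, the false-positive concern raised above.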