
I'm currently writing a chatbot that tries to simulate a conversation. Being new to Python, I rely on lists and dictionaries to hold standard responses to a standard set of queries, and I just keep adding items to them as I encounter new questions from the user. As I learn more, I realise that lists/dictionaries/functions aren't going to be enough and that I'll have to use some sort of database. My question is: what database should I use to store and query data from the user? I went through this and the links in its answers, but I found no mention of which DB was used. (This little project of mine is aimed at teaching myself the concepts of machine learning and NLP.)

Thanks in advance.

rahuL

2 Answers


Google n-grams is probably one of the best databases you can get: it gives you not only the frequencies of individual words but also n-grams with their frequencies, which lets you work with phrases!
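To make the idea of n-grams with frequencies concrete, here is a minimal sketch that counts them over a tiny hand-written corpus. The corpus, the whitespace tokenization, and the helper names `ngrams` and `count_ngrams` are assumptions for illustration only; this is not the Google n-grams data set or its format.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, n):
    """Count n-grams over an iterable of sentences (naive whitespace tokenization)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical toy corpus standing in for a real data set.
corpus = [
    "how are you",
    "how are you doing today",
    "where are you from",
]

bigram_counts = count_ngrams(corpus, 2)
print(bigram_counts.most_common(2))
```

The most frequent bigrams in a larger corpus give a rough picture of which phrases, rather than single words, occur often.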

You could also use a Wikipedia dump file for various purposes, such as semantic analysis of words/terms, as described by Markovitch & Gabrilovich in their (brilliant) paper: Wikipedia-based Semantic Interpretation for Natural Language Processing

amit

Might want to look into Redis. It's extremely fast (which is important for a chatbot) and very easy to use. It's just a key-value store, though, so if you're looking for layered logic like that example had with XML, this isn't necessarily your answer. Then again, you probably wouldn't want to store the logic in the database anyway.

Basically, look at Redis, but without more detail about exactly what you're doing, it's a little hard to help.
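As a rough sketch of what a key-value layout for standard question/answer pairs could look like: the `ResponseStore` class, the `response:` key prefix, and the `normalize` helper below are all hypothetical names, and `InMemoryClient` is only a stand-in so the example runs without a server. It relies on nothing more than the `get` and `set` calls that the redis-py client also provides.

```python
class InMemoryClient:
    """Stand-in that mimics the get/set calls of a key-value client."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = value
        return True

    def get(self, name):
        # Like a key-value store, return None for a missing key.
        return self._data.get(name)


def normalize(query):
    """Lowercase and collapse whitespace so similar queries share one key."""
    return " ".join(query.lower().split())


class ResponseStore:
    """Maps normalized user queries to canned responses (hypothetical design)."""

    def __init__(self, client, prefix="response:"):
        self.client = client
        self.prefix = prefix

    def learn(self, query, response):
        self.client.set(self.prefix + normalize(query), response)

    def reply(self, query, default="Sorry, I don't know that one yet."):
        value = self.client.get(self.prefix + normalize(query))
        if value is None:
            return default
        # A real client may hand back bytes unless told to decode responses.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        return value


store = ResponseStore(InMemoryClient())
store.learn("How are you?", "I'm fine, thanks!")
print(store.reply("how   ARE you?"))
print(store.reply("What is your name?"))
```

Assuming redis-py is installed and a server is running locally, passing `redis.Redis(host="localhost", port=6379, decode_responses=True)` in place of `InMemoryClient()` should work with the same class, since only `get` and `set` are used.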

jdotjdot
  • Sorry if the question looks a little vague. The aim is to first build a chatbot that interacts in a standard question-answer mode. For this, having read through the Redis how-tos, Redis looks OK. However, the next step after that is to analyze each sentence/phrase and make the bot give an educated response based purely on the query and not on a standard set of responses. The step after that would probably be for it to learn/train (continually if possible), but I'll cross that bridge when I get there :) – rahuL Dec 05 '12 at 20:30
  • I chose this answer because it suits my simplistic needs for now. n-grams are a little too complex at this time – rahuL Feb 25 '13 at 11:40