You cannot mix languages directly.
Speech recognition roughly consists of 3 parts: an acoustic model, a language model, and a dictionary.
The acoustic model is the result of training on data; it captures the relationship between the audio signal and phonetic units.
The dictionary contains words and how they are pronounced, e.g. the word TOP is pronounced "T AH P" in the general speech recognition dictionary.
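To make the dictionary idea concrete, here is a minimal sketch in Python. It assumes a plain-text format with one entry per line (the word, followed by its phones separated by spaces), which is the general shape of the TOP example above. The function name `parse_dictionary` and the sample entries other than TOP are made up for illustration.

```python
def parse_dictionary(text):
    """Parse lines of the form 'WORD PH1 PH2 ...' into {word: [phones]}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        entries[word] = phones
    return entries


# Illustrative entries only; TOP is the example from the text above.
SAMPLE = """\
TOP T AH P
I AY
AM AE M
"""

lexicon = parse_dictionary(SAMPLE)
print(lexicon["TOP"])
```

The recognizer can only output words that appear in this lookup, which is why every language needs its own word list.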
The language model describes how words connect to form sentences, e.g. the word "I" is connected with "am", so the speech recognizer will very rarely (or never) give the result "I are" or "I is".
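The "I" followed by "am" idea can be sketched as a tiny bigram model in Python. This is only an illustration of the principle, not how any particular recognizer implements it: the four-sentence corpus and the helper names `train_bigrams` and `bigram_prob` are assumptions made for the example, and real language models are trained on far more text and use smoothing.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Unsmoothed estimate of P(nxt | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Toy corpus, made up for this example.
corpus = ["I am here", "I am tired", "you are here", "he is tired"]
model = train_bigrams(corpus)

print(bigram_prob(model, "i", "am"))
print(bigram_prob(model, "i", "are"))
```

In this toy corpus "am" always follows "I" and "are" never does, so a recognizer scoring hypotheses with such a model would strongly prefer "I am" over "I are".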
Every language has its own acoustic model (phonetics), dictionary (words), and language model (sentences), so we can't just mix them up.
The question is: is it still possible?
The answer is: YES!
You can build your own language setup (in this case Hindi + English) using a number of tools; one I have already tried is CMU Sphinx / PocketSphinx. You can build your own model, train it, and create a dictionary for it. It will be a lot of work, but you can configure everything you need for speech recognition.
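As a rough sketch of what combining the dictionaries for two languages could look like, the Python below merges two word-to-phones mappings and lists every phone the merged dictionary uses. Everything here is hypothetical: the helper names, the single-entry lexicons, and especially the phone sequence for the Hindi word, which is a placeholder rather than a real transcription. It does not use any CMU Sphinx API.

```python
def merge_dictionaries(*lexicons):
    """Merge {word: [phones]} mappings, keeping every distinct pronunciation."""
    merged = {}
    for lexicon in lexicons:
        for word, phones in lexicon.items():
            pronunciations = merged.setdefault(word, [])
            if phones not in pronunciations:
                pronunciations.append(phones)
    return merged


def phone_set(merged):
    """Return the sorted set of phones used anywhere in the merged dictionary."""
    return sorted({p for prons in merged.values() for pron in prons for p in pron})


# Placeholder entries; the Hindi phone sequence is NOT a real transcription.
english = {"top": ["T", "AH", "P"]}
hindi = {"namaste": ["N", "AH", "M", "AH", "S", "T", "EY"]}

combined = merge_dictionaries(english, hindi)
print(phone_set(combined))
```

The phone list is the part that makes this a lot of work: the acoustic model you train has to cover every phone that appears in the combined dictionary.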
Implementations for various platforms: https://github.com/cmusphinx