I need to tell Brazilian Portuguese apart from European Portuguese, whether by character sets, Unicode or ASCII letters, regular expressions, or the trigrams that characterize each variant. Most language detectors, such as NTextCat and guessLanguage.js, do not distinguish between the two. Does anyone have a solution for this issue?

Thanks in Advance :)

jean
Arun Kumar
  • No need to re-invent the wheel - Google Translate is among the best there is: https://www.google.com/search?q=c-sharp+google+translate+api&ie=&oe= – Shannon Holsinger Sep 17 '16 at 20:30
  • @ShannonHolsinger Google Translate can't differentiate Portuguese from Brazilian Portuguese – Almis Sep 17 '16 at 21:18
  • I have a large set of words, so checking each word against a translation service may hurt the application's performance. Can you please point me to the most-used trigrams of Brazilian and European Portuguese, which would solve my issue? – Arun Kumar Sep 17 '16 at 21:41
  • Bummer - didn't know that. – Shannon Holsinger Sep 18 '16 at 03:49
  • I'm voting to close this question as off-topic because, as currently written, it has nothing to do with programming. Ask rather at http://pt.stackoverflow.com/ or (better) at http://portuguese.stackexchange.com/ or even at http://linguistics.stackexchange.com/. Please take the [2-minute tour](http://stackoverflow.com/tour). Moreover, open [Help Center](http://stackoverflow.com/help) and read at least [_What topics can I ask about here_](http://stackoverflow.com/help/on-topic)? – JosefZ Sep 19 '16 at 09:18
  • This has been **[cross-posted](http://portuguese.stackexchange.com/questions/3700/how-can-i-automatically-distinguish-brazilian-portuguese-from-european-portugues)** to portuguese.stackexchange.com. It was edited there to conform to the topic/theme of the Portuguese language; the technical sides of this question will not be addressed there. – ANeves Sep 27 '16 at 23:47
  • Guys, as ANeves mentioned, I got some comments on Portuguese Stack Exchange: http://portuguese.stackexchange.com/questions/3700/how-can-i-automatically-distinguish-brazilian-portuguese-from-european-portugues?noredirect=1#comment9111_3700. Again it was redirected, so please help me identify the difference. – Arun Kumar Sep 28 '16 at 17:35
  • You're basically asking the volunteers who answer questions at Stack Overflow to build a language detector from whole cloth. That's the very definition of too broad. Please see [ask] and [help/on-topic]. – Heretic Monkey Sep 30 '16 at 14:51
  • Is this dealing with user input? If so, there are probably other ways to achieve what you are looking for. For example, you can ask for the user's location, or if you have a database that stores each user's country, use that to decide whether the text is Brazilian or European. – Jarod Moser Sep 30 '16 at 14:57

1 Answer

1

It's no different from telling US English and UK English apart.

You must know both variants and look for very specific differences. It's a tricky and inaccurate approach. You may also need the context of the message to get the meaning of the words.
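
As a rough illustration of looking for specific differences, here is a minimal Python sketch that counts marker words typical of each variant. The word lists are a small hand-picked assumption (pairs such as "ônibus"/"autocarro" or "usuário"/"utilizador"), far from complete, and the function name `guess_variant` is made up for this example. It returns "unknown" whenever no marker shows up, which will be common for short texts.

```python
import re
import unicodedata
from collections import Counter

# Illustrative marker words only (assumption): a usable list would be far larger.
BR_MARKERS = {"ônibus", "trem", "celular", "usuário", "equipe",
              "registro", "econômico", "gênero", "geladeira"}
PT_MARKERS = {"autocarro", "comboio", "telemóvel", "utilizador", "equipa",
              "registo", "económico", "género", "frigorífico"}


def guess_variant(text: str) -> str:
    """Return 'pt-BR', 'pt-PT', or 'unknown' based on marker-word counts."""
    # NFC keeps accented letters as single code points so they match the sets.
    normalized = unicodedata.normalize("NFC", text).lower()
    # [^\W\d_]+ matches runs of Unicode letters, including accented ones.
    counts = Counter(re.findall(r"[^\W\d_]+", normalized))
    br_hits = sum(counts[word] for word in BR_MARKERS)
    pt_hits = sum(counts[word] for word in PT_MARKERS)
    if br_hits > pt_hits:
        return "pt-BR"
    if pt_hits > br_hits:
        return "pt-PT"
    return "unknown"


print(guess_variant("O usuário perdeu o ônibus e ligou do celular."))
print(guess_variant("O utilizador perdeu o autocarro e ligou do telemóvel."))
```

A word list like this ignores context, so a word that exists in both variants with different meanings would still mislead it.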

Even a native Portuguese speaker can have a hard time telling them apart, and it's even worse for short texts.

For an example, search for the same topic (say, the Clinton vs. Trump debate) on Brazilian and Portuguese news sites, read the articles, and look for the differences. You will get an idea.

Also keep in mind that if you are processing casual chat, you will need to handle slang, misspellings, and region-specific expressions from each country.

After reading how guesslanguage uses trigram analysis, I see it will have a bad time telling the dialects apart. There are few words with different spelling.
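
To see why, you can compare character-trigram profiles yourself. The Python sketch below is a simplified stand-in, not guesslanguage's actual implementation: it builds trigram counts for two made-up sentences, one in each variant, and compares them with cosine similarity. The function names and the sentences are assumptions for illustration only.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams after collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two trigram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example sentences with the same meaning in each variant.
br = trigram_profile("A equipe registrou o usuário no sistema econômico.")
pt = trigram_profile("A equipa registou o utilizador no sistema económico.")
print(round(cosine_similarity(br, pt), 2))
```

Most trigrams are shared by both sentences, and only the few that touch a differently spelled word change, so a detector would need a lot of text before that signal becomes reliable.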

jean