Questions tagged [language-detection]

Language detection or language identification is the task of identifying the language(s) in a fragment of text.

From Wikipedia:

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.

...

One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Serbian and Croatian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them.

Input data and results from the DSL shared task of the COLING 2014 VarDial workshop are available at http://corporavm.uni-koeln.de/vardial/sharedtask.html.
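
As a rough illustration of the text-categorization view described in the Wikipedia excerpt, here is a minimal Python sketch that builds a character-trigram profile per language and picks the profile with the highest cosine similarity to the input. The three one-sentence training samples and the names `trigram_profile`, `cosine`, and `guess_language` are invented for this example. It is not the method of any library or competition entry mentioned on this page, and real systems train on far larger corpora.

```python
import math
from collections import Counter

def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy training data (an assumption for illustration only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is the text that we are reading",
    "de": "der schnelle braune fuchs springt über den faulen hund und das ist der text den wir lesen",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et ceci est le texte que nous lisons",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}

def guess_language(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))

if __name__ == "__main__":
    for sentence in ("the dog is reading the text",
                     "der hund und der fuchs",
                     "le chien est paresseux"):
        print(sentence, "->", guess_language(sentence))
```

With profiles this small, the sketch would be expected to struggle with exactly the cases the excerpt calls out: very short inputs and closely related languages whose trigram distributions largely overlap.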

142 questions
3
votes
1 answer

Solr language detection update processor for denormalized mixed-language documents

I have a database of things, with each thing being able to have several names in different languages. This is currently normalized to a "thing has-many names" schema: things (id, ...) and names (id, thing_id, language, name). I am indexing this…
deceze
  • 510,633
  • 85
  • 743
  • 889
3
votes
2 answers

Python unable to install guesslang

I'm trying to install guesslang with pip, but it seems that the latest version (released in August 2021) depends on an obsolete version of TensorFlow (2.5.0). The problem is that I can't find this version anywhere. So, how can I install it?…
pasta64
  • 302
  • 2
  • 11
3
votes
3 answers

RuntimeError: Failed to init API, possibly an invalid tessdata path: C:\Users\hp\Anaconda3\/tessdata/

I am using Windows 10 with tesserocr version 2.4. I want to detect text from an image and then the language of that text. While running this piece of code: from tesserocr import PyTessBaseAPI import argparse parser = argparse.ArgumentParser("Enter…
3
votes
0 answers

Converting Language Detection Score of CLD2 to CLD3 Accuracy

For the input sentence to classify, my cld2 language detection model (langID) returns the following values: { reliable: true, textBytes: 181, languages: [ { name: 'ITALIAN', code: 'it', percent: 61, score: 774 }, { name: 'ENGLISH', code:…
loretoparisi
  • 15,724
  • 11
  • 102
  • 146
3
votes
1 answer

iOS VoiceOver/accessibility foreign words pronunciation

Some of the texts I assign to accessibilityLabel(s) in my iOS app contain "mixed language". For example, in German the text would be "Bier und guter Sound". The word "Sound" spoken with German VoiceOver language doesn't make sense (it should say…
kampfgnu
  • 107
  • 3
  • 10
3
votes
2 answers

What are programming-language-specific settings called in Vim, and how to detect + overwrite them?

The editor Vim comes with syntax highlighting for many different programming languages. Questions: In Emacs, language-specific settings are called "modes". In Vim, however, the term "mode" refers to command or insert mode. So what is the Vim term…
Joachim W
  • 7,290
  • 5
  • 31
  • 59
3
votes
1 answer

Detect language changes in file using Python

I need to detect language changes in a file, and tag each word accordingly. I've come up with a hacky way that works for two languages (English and Greek). The script is this: #!/usr/bin/env python # -*- coding: utf-8 -*- import sys #open…
themistoklik
  • 880
  • 1
  • 8
  • 19
3
votes
2 answers

Use browser's language settings to set language (rlmp_language_detection)

I am trying to automatically set the language in my TYPO3 6.2 one-tree page. In my setup, I use RealURL to add the language to the URL, with the default L parameter. I DON'T use ISO codes for the languages, but I use static_info_tables to set the ISO code.…
nbar
  • 6,028
  • 2
  • 24
  • 65
3
votes
1 answer

Language detection code in Python

So, we have built a language detection program in Python that just detects different languages. Our code seems fine; there is no error, but I am not getting the desired result. Whenever I run it in Eclipse, it runs and terminates giving us the…
Rabia Khan
  • 53
  • 1
  • 4
3
votes
3 answers

PHP get current client OS language

I wanted to know whether there is any way to get the current client OS language from PHP/JavaScript. I tried to use $_SERVER["HTTP_ACCEPT_LANGUAGE"], but sometimes it gets the wrong language. For example in Google Chrome: My OS: Windows 7 Language:…
user430926
  • 4,017
  • 13
  • 53
  • 77
2
votes
4 answers

Fast Java library for language detection of Tweets?

According to this bug, Twitter's search API has been broken with regard to Language for at least 2 years: http://bit.ly/GQ244g so it seems unlikely they're going to fix it. I've looked at the libraries mentioned on the other language detection…
George
  • 579
  • 1
  • 5
  • 12
2
votes
2 answers

Detecting the current tab language using Chrome extension?

Is there a way to use the Chrome API to detect the language of the content in the current tab?
Max
  • 4,152
  • 4
  • 36
  • 52
2
votes
1 answer

How to extract only those rows of the DataFrame where the values of two columns of the DataFrame are in English Language?

I have a dataframe which has 27 columns including columns FonctionsStagiaire and ExigencesParticulieres. The dataframe has 13774 rows which are either entirely in English or French. The csv file can be found here: GDrive link I am trying to keep…
2
votes
2 answers

Language detection for short strings in a user-generated content context

I have a question about language detection for short strings. I need to detect the language of text sent in a chat, and I am faced with two problems: the length of the message; the errors that may be in it and the noise (emoji etc...), but for the noise,…
Jourdelune
  • 131
  • 8
2
votes
0 answers

Language detection using pycld2

I am trying to use the pycld2 package to detect multiple languages in text. This is the example I am testing out: import pycld2 as cld2 text = '''The universal connection with an additional advantage: Push-in connection. Terminate solid and…