Questions tagged [language-detection]

Language detection or language identification is the task of identifying the language(s) in a fragment of text.

From Wikipedia:

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.

...

One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Serbian and Croatian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them.
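The excerpt above frames language identification as text categorization solved with statistical methods. As a rough illustration, here is a minimal Python sketch of one such method: a multinomial naive Bayes classifier over character trigrams. The class name `NGramLanguageIdentifier`, the toy one-sentence training corpus, and the Laplace smoothing choice are hypothetical, chosen for illustration only. None of them come from any library or tool mentioned on this page.

```python
import math
from collections import Counter

# Hypothetical toy training corpus: one sentence per language.
# Real identifiers are trained on much larger, cleaner corpora.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat dort dans la maison",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
}


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams of the lowercased text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NGramLanguageIdentifier:
    """Multinomial naive Bayes over character n-grams (illustrative sketch)."""

    def __init__(self, n: int = 3, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha  # additive (Laplace) smoothing so unseen n-grams get nonzero probability
        self.counts: dict[str, Counter] = {}
        self.vocab: set[str] = set()

    def train(self, samples: dict[str, str]) -> None:
        """Accumulate n-gram counts per language code."""
        for lang, text in samples.items():
            grams = Counter(char_ngrams(text, self.n))
            self.counts[lang] = self.counts.get(lang, Counter()) + grams
            self.vocab.update(grams)

    def scores(self, text: str) -> dict[str, float]:
        """Log-likelihood of the text under each language model (uniform prior)."""
        grams = char_ngrams(text, self.n)
        v = len(self.vocab) + 1  # one extra slot shared by all unseen n-grams
        result = {}
        for lang, counts in self.counts.items():
            denom = sum(counts.values()) + self.alpha * v
            result[lang] = sum(math.log((counts[g] + self.alpha) / denom) for g in grams)
        return result

    def predict(self, text: str) -> str:
        """Return the language code with the highest log-likelihood."""
        s = self.scores(text)
        return max(s, key=s.get)


if __name__ == "__main__":
    clf = NGramLanguageIdentifier()
    clf.train(TRAINING)
    for sample in ["the cat sleeps in the house", "le chat dort dans la maison"]:
        print(sample, "->", clf.predict(sample))
```

Character n-grams are used here rather than whole words because they still give evidence for short or misspelled inputs. With closely related languages such as Serbian and Croatian, however, most n-grams are shared, so a model this small would likely fail to separate them. That is the difficulty the shared task below targets.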

The COLING 2014 VarDial workshop ran a shared task on Discriminating between Similar Languages (DSL); its input data and results are available at http://corporavm.uni-koeln.de/vardial/sharedtask.html.

142 questions
0 votes, 4 answers

How to detect if a text is in a given language?

I have a kind of Q&A site (very approximately) where users enter questions to be answered by our Staff. I am quite concerned about users posting non-questions, which are an annoyance. The best I have thought of so far is a system to detect whether the text…
Giulio Muscarello
0 votes, 1 answer

Installing the CLD library on Windows and binding it to Python

I need to use Chromium's Compact Language Detector library within a Python script. AFAIK, there are two projects that leverage this library, but I have been having trouble getting either of them set up on a Windows 7 machine. I…
jakc
0 votes, 1 answer

How to crawl English sites and avoid crawling other languages?

I need to crawl only sites whose language is English. I know Nutch can detect the language of sites with plugins like language detector, but I need to prevent Nutch from crawling non-English sites. Although I know we need to crawl a page to…
a.toraby
-1 votes, 1 answer

How do I translate a string from one language to another in PHP without making any external network requests?

I have this: $English_string = 'Hello. I am a robot.'; Now I want this: $Swedish_string = 'Hej. Jag är en robot.'; I imagine the code to be like this: $Swedish_string = translate_me($English_string, 'en', 'sv'); // text, from, to This…
-1 votes, 3 answers

Detecting language of string value from a database column

I have a school project idea and would like to ask for your advice on how to implement it. I would like to create an application that lets users upload a data file. The application should be able to detect the language (French, English,…
elmify
-1 votes, 2 answers

How to detect the language of UTF-8 encoded text?

Redis stores UTF-8 text, and for my project I need to detect the language of that UTF-8 encoded text. Is there any way to get a clue about the language of the text? EDIT: My project is in Node.js. In Redis, maybe a Lua script…
uzay95
-5 votes, 2 answers

Detecting if text is non-English

What is the most accurate method for detecting whether a text (specifically Instagram comments) is non-English? I am happy to use any high-level language, such as Python, PHP, etc. $ sudo pip2 install guess_language >>> from guess_language import…
Mona Jalal