Questions tagged [korean-nlp]
9 questions
8
votes
2 answers
Is there a way to programmatically combine Korean unicode into one?
Using a Korean Input Method Editor (IME), it's possible to type 버리 + 어 and it will automatically become 버려.
Is there a way to programmatically do that in Python?
>>> x, y = '버리', '어'
>>> z = '버려'
>>> ord(z[-1])
47140
>>> ord(x[-1]), ord(y)
(47532,…
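One way to approach the contraction in this example is to use the arithmetic layout of precomposed Hangul syllables in Unicode: each syllable's code point is `0xAC00 + (lead * 21 + vowel) * 28 + tail`. The sketch below is only an illustration under that assumption, not an answer taken from the thread. The helper names `decompose`, `compose`, and `contract_i_eo` are hypothetical, and the code handles just the single ㅣ + 어 → ㅕ pattern from the question, not Korean conjugation in general.

```python
# Minimal sketch: contract a stem ending in the vowel ㅣ with an ending
# that starts with 어 (for example 버리 + 어 -> 버려).
# All helper names here are hypothetical, chosen for illustration only.

HANGUL_BASE = 0xAC00          # code point of 가, the first precomposed syllable
NUM_LEADS, NUM_VOWELS, NUM_TAILS = 19, 21, 28

SILENT_LEAD = 11              # index of the initial consonant ㅇ
VOWEL_EO = 4                  # index of the vowel ㅓ
VOWEL_YEO = 6                 # index of the vowel ㅕ
VOWEL_I = 20                  # index of the vowel ㅣ


def decompose(syllable):
    """Return (lead, vowel, tail) indices of one precomposed Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < NUM_LEADS * NUM_VOWELS * NUM_TAILS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(index, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    return lead, vowel, tail


def compose(lead, vowel, tail=0):
    """Build one precomposed Hangul syllable from its three indices."""
    return chr(HANGUL_BASE + (lead * NUM_VOWELS + vowel) * NUM_TAILS + tail)


def contract_i_eo(stem, ending):
    """Join stem and ending, contracting ㅣ + 어 into ㅕ when that pattern applies."""
    lead, vowel, tail = decompose(stem[-1])
    e_lead, e_vowel, e_tail = decompose(ending[0])
    if (vowel == VOWEL_I and tail == 0
            and e_lead == SILENT_LEAD and e_vowel == VOWEL_EO):
        merged = compose(lead, VOWEL_YEO, e_tail)
        return stem[:-1] + merged + ending[1:]
    # Pattern does not apply: fall back to plain concatenation.
    return stem + ending


result = contract_i_eo('버리', '어')
print(result, ord(result[-1]))
```

Any real conjugation logic would need many more rules (other vowel contractions, irregular stems), so a dedicated morphological library is likely a better fit beyond this one case.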

alvas
2
votes
1 answer
Error while reading CSV containing Korean language
I am trying to read a CSV file in which one column contains Korean text, using the lines below:
Sys.setlocale(category="LC_ALL", locale = "Korean")
old <- read.csv("Past-Korean.csv", encoding = "utf-8",header=T,na.strings=c(""))
But I am getting…

user3734568
1
vote
0 answers
Using spaCy to Lemmatize Korean?
I'm trying to lemmatize some Korean sentences using some pretrained models. I'm very much a beginner with this sort of thing, so I could be missing something obvious, but following examples I found for other languages and the Korean model's…

Anteater
1
vote
0 answers
How to package mecab-ko as an AWS Lambda layer?
Following the answer to "how to add mecab library in aws lambda", I was able to build a Lambda layer for the mecab library.
However, mecab-ko does not seem to be working in the same way.
Could anyone please guide me?

Yj Cho
1
vote
2 answers
Extracting a word (of variable length) ending with 동 from a string in R
I have a data frame in R with one column containing an address in Korean. I need to extract one of the words (a word ending with 동), if it's there (it's possible that it's missing) and create a new column named "dong" that will contain this word. So…
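The question is about R, but the matching idea carries over to any regex engine: look for a whitespace-delimited token whose last character is 동. Below is a minimal Python sketch of that idea. The function name `extract_dong` and the sample addresses are made up for illustration, and the code assumes the words in the address are separated by whitespace.

```python
import re

# A whitespace-delimited token that ends with 동:
#   (?<!\S)  the token starts at the beginning of the string or after whitespace
#   \S*동    any run of non-space characters followed by 동
#   (?!\S)   the token ends at the end of the string or before whitespace
DONG_PATTERN = re.compile(r'(?<!\S)\S*동(?!\S)')


def extract_dong(address):
    """Return the first word ending in 동, or None when there is none."""
    match = DONG_PATTERN.search(address)
    return match.group(0) if match else None


# Hypothetical sample addresses, not taken from the question.
print(extract_dong('서울특별시 강남구 역삼동 123-45'))
print(extract_dong('서울특별시 강남구 테헤란로 152'))
```

Returning `None` for rows without a match mirrors the requirement that the word may be missing; in a data frame that would typically become a missing value in the new column.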

carpediem
0
votes
0 answers
Missing words in TDM when using KoNLP in R
I'm currently preprocessing a Korean corpus using KoNLP in R.
library(stringr)
library(tm)
library(KoNLP)
library(dplyr)
library(rJava)
useNIADic()
myfunc_extract <- function(doc){
doc <- as.character(doc)
doc2 <- paste(SimplePos22(doc))
…

K.K.SAN
0
votes
0 answers
R - how to create DocumentTermMatrix for Korean words
I hope the text mining gurus here, including those who are not Korean, can help me with my very specific question.
I'm currently trying to create a Document Term Matrix (DTM) on a free-text variable that contains a mix of English and Korean words.
First of…

Brian
0
votes
0 answers
Issues tagging POS
I'm trying to tag parts of speech (POS), but I get an error that I don't know how to handle.
Can anyone help me find out what went wrong?
tagging pos
api = KhaiiiApi()
significant_tags = ['NNG', 'NNP', 'NNB', 'VV', 'VA', 'VX', 'MAG', 'MAJ', 'XSV',…

chocoscone
0
votes
0 answers
Convert mojibake to Korean in Python
(Edited: now referencing Unbaking mojibake)
Source file: Android phone .vcf contacts file
Destination: Windows 7 User Contacts file (imported .vcf)
Resulting contact info:
Korean mojibake for the Name Field:
'_곗퐫 李쏀__ㅻ━肄섏떎留_'
The result should be…

hippo