Questions tagged [korean-nlp]

9 questions
8
votes
2 answers

Is there a way to programmatically combine Korean unicode into one?

Using a Korean Input Method Editor (IME), it's possible to type 버리 + 어 and it will automatically become 버려. Is there a way to programmatically do that in Python?

>>> x, y = '버리', '어'
>>> z = '버려'
>>> ord(z[-1])
47140
>>> ord(x[-1]), ord(y)
(47532,…
alvas
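The merge asked about in the question above is a spelling contraction (ㅣ + ㅓ → ㅕ), not something Unicode normalization does on its own. The following is a minimal sketch of one way to approach it, assuming only the standard arithmetic layout of precomposed Hangul syllables starting at U+AC00. The names `decompose`, `compose`, `contract`, and the one-entry `CONTRACTIONS` table are hypothetical and only cover the example in the question; real conjugation needs many more rules.

```python
HANGUL_BASE = 0xAC00          # code point of the first precomposed syllable
N_LEADS, N_VOWELS, N_TAILS = 19, 21, 28
SILENT_LEAD = 11              # index of the placeholder initial consonant ㅇ

# Hypothetical, deliberately tiny rule table: (stem vowel, ending vowel) -> merged vowel.
# Indices follow the Unicode vowel order: ㅓ = 4, ㅕ = 6, ㅣ = 20.
CONTRACTIONS = {(20, 4): 6}   # ㅣ + ㅓ -> ㅕ


def decompose(ch):
    """Split one precomposed syllable into (lead, vowel, tail) indices."""
    idx = ord(ch) - HANGUL_BASE
    if not 0 <= idx < N_LEADS * N_VOWELS * N_TAILS:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    lead, rest = divmod(idx, N_VOWELS * N_TAILS)
    vowel, tail = divmod(rest, N_TAILS)
    return lead, vowel, tail


def compose(lead, vowel, tail=0):
    """Build a precomposed syllable from (lead, vowel, tail) indices."""
    return chr(HANGUL_BASE + (lead * N_VOWELS + vowel) * N_TAILS + tail)


def contract(stem, ending):
    """Merge the last syllable of stem with the first of ending when a rule applies."""
    if not stem or not ending:
        return stem + ending
    try:
        lead, vowel, tail = decompose(stem[-1])
        e_lead, e_vowel, e_tail = decompose(ending[0])
    except ValueError:
        return stem + ending  # non-Hangul input: plain concatenation
    merged = CONTRACTIONS.get((vowel, e_vowel))
    # Only merge an open syllable with a vowel-initial (ㅇ-led) ending.
    if tail != 0 or e_lead != SILENT_LEAD or merged is None:
        return stem + ending
    return stem[:-1] + compose(lead, merged, e_tail) + ending[1:]


print(contract('버리', '어'))
```

With the inputs from the question, `contract('버리', '어')` yields a final syllable with code point 47140, matching `ord(z[-1])` in the excerpt. Inputs that match no rule, such as a stem ending in a final consonant, are simply concatenated.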
2
votes
1 answer

Error while reading CSV containing Korean language

I am trying to read a CSV file in which one column contains Korean text, using the lines below:

Sys.setlocale(category="LC_ALL", locale = "Korean")
old <- read.csv("Past-Korean.csv", encoding = "utf-8",header=T,na.strings=c(""))

But I am getting…
user3734568
1
vote
0 answers

Using spaCy to Lemmatize Korean?

I'm trying to lemmatize some Korean sentences using some pretrained models. I'm very much a beginner with this sort of thing so I'm sure I could be missing something obvious but following examples I found for other languages and the Korean model's…
Anteater
1
vote
0 answers

How to make mecab-ko an AWS Lambda Layer?

From the answer to how to add mecab library in aws lambda, I could make a lambda layer of mecab library. However, mecab-ko does not seem to be working in the same way. Could anyone please guide me?
Yj Cho
1
vote
2 answers

extracting a word (of variable length) ending with 동 from a string in R

I have a data frame in R with one column containing an address in Korean. I need to extract one of the words (a word ending with 동), if it's there (it's possible that it's missing) and create a new column named "dong" that will contain this word. So…
carpediem
0
votes
0 answers

missing words in tdm, using konlp, R

I'm currently preprocessing a Korean corpus using KoNLP in R.

library(stringr)
library(tm)
library(KoNLP)
library(dplyr)
library(rJava)
useNIADic()
myfunc_extract <- function(doc){
  doc <- as.character(doc)
  doc2 <- paste(SimplePos22(doc))
  …
K.K.SAN
0
votes
0 answers

R - how to create DocumentTermMatrix for Korean words

I hope text mining gurus who are also non-Koreans can help me with my very specific question. I'm currently trying to create a Document Term Matrix (DTM) on a free-text variable that contains a mix of English and Korean words. First of…
Brian
0
votes
0 answers

issues tagging POS

I'm trying to tag POS, but the result is an error that I don't know what to do about. Can anyone help me find out where it went wrong?

tagging pos
api = KhaiiiApi()
significant_tags = ['NNG', 'NNP', 'NNB', 'VV', 'VA', 'VX', 'MAG', 'MAJ', 'XSV',…
0
votes
0 answers

Convert mojibake to Korean in Python

(Edited: now referencing Unbaking mojibake)
Source file: Android phone .vcf contacts file
Destination: Windows 7 User Contacts file (imported .vcf)
Resulting contact info: Korean mojibake for the Name Field: '_곗퐫 李쏀__ㅻ━肄섏떎留_'
The result should be…
hippo