Training a computer for word auto-segmentation (non-English language)

Asked Nov 26 '19 at 09:51

I have been given a set of 80 non-English words in an Excel file. The first column contains each word after a crude automatic segmentation has been applied to it, and the second column contains the same word after it has been segmented manually. Below are three rows from the file:

Auto segmentation      Manually segmented

[%D-Ik--(is$)          [%D-Ik]--(is$)
[%D-Ip-t-eR]-(u$)      [%D-I]-[pt-eR]-(u$)
[%D-Om-(a$)            [%D-Om]-(a$)

My question is: is there a way to train a model on this set of examples so that it can segment new words (that start with d) automatically?

Georgy90

This is a sequence labeling problem: for every character in the sequence, you assign a flag indicating whether it ends a segment. However, 80 examples are too few for any machine learning. – Jindřich Nov 26 '19 at 12:17
Perhaps I can ask for more data. Setting that aside, is there a particular algorithm that would be most appropriate for this task (e.g. a hidden Markov model)? – Georgy90 Nov 26 '19 at 13:20
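
To make the sequence-labeling framing from the comments concrete, here is a minimal sketch of how one manually segmented word might be turned into per-character labels. The notation is not explained in the question, so this rests on two assumptions: that `[`, `]`, `-`, `(` and `)` are segmentation markup rather than part of the word, and that a closing `]` marks the end of a segment. The function name `to_labels` is hypothetical.

```python
# Assumption: these characters are segmentation markup, not part of the word.
MARKUP = set("[]-()")


def to_labels(segmented, end_marker="]"):
    """Return (symbols, labels) for one manually segmented word.

    labels[i] is 1 if a segment is assumed to end right after symbols[i],
    and 0 otherwise.
    """
    symbols, labels = [], []
    for ch in segmented:
        if ch in MARKUP:
            # Assumption: a closing bracket flags the most recent symbol
            # as the last character of a segment.
            if ch == end_marker and labels:
                labels[-1] = 1
        else:
            symbols.append(ch)
            labels.append(0)
    return symbols, labels


symbols, labels = to_labels("[%D-I]-[pt-eR]-(u$)")
print("".join(symbols), labels)
# prints: %DIpteRu$ [0, 0, 1, 0, 0, 0, 1, 0, 0]
```

Pairs like these are the kind of input a sequence labeler (a hidden Markov model, as asked about, or a similar model) would be trained on. If the hyphens or parentheses also carry information that the model should predict, the label set would need to be richer than a single 0/1 flag.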
