
I have been given a set of 80 non-English words in an Excel file. The first column contains each word after a crude automatic segmentation has been applied to it, and the second column contains the same word after being segmented manually. Below are 3 rows from the file:

Auto segmentation ......... Manually segmented

  1. [%D-Ik--(is$) ........... [%D-Ik]--(is$)
  2. [%D-Ip-t-eR]-(u$) .... [%D-I]-[pt-eR]-(u$)
  3. [%D-Om-(a$) ........... [%D-Om]-(a$)

My question is: is there a way to train a model on this set of examples so that it can segment new words (that start with d) automatically?

Georgy90
  • It is a sequence labeling problem. For every character in the sequence, you want to assign a flag indicating whether it is the end of a segment. However, 80 examples are too few for any machine learning. – Jindřich Nov 26 '19 at 12:17
  • Perhaps I can ask for more data. Setting that aside, is there a particular algorithm that would be the most appropriate for this task (e.g. hidden Markov model)? – Georgy90 Nov 26 '19 at 13:20
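
The sequence-labeling framing suggested in the first comment can be sketched roughly as follows. This is only an illustration, not a tested solution: it assumes that `[`, `]` and `-` are the boundary markers and that every other symbol (including `(`, `)`, `%` and `$`) belongs to the word itself, which may not match the real notation. The names `MARKERS`, `to_labels`, `from_labels` and `ROWS` are hypothetical. Each base character gets as its label the run of markers that directly follows it, so a tagger (an HMM, a CRF, or anything else) would take the characters plus the labels from the automatic column as input and predict the labels from the manual column.

```python
# Assumption: these symbols are boundary markers; all other symbols are
# treated as characters of the word. Adjust to the real notation.
MARKERS = set("[]-")

# The three example rows from the question: (automatic, manual).
ROWS = [
    ("[%D-Ik--(is$)", "[%D-Ik]--(is$)"),
    ("[%D-Ip-t-eR]-(u$)", "[%D-I]-[pt-eR]-(u$)"),
    ("[%D-Om-(a$)", "[%D-Om]-(a$)"),
]


def to_labels(segmented, markers=MARKERS):
    """Split a segmented word into (prefix, chars, labels).

    prefix: markers that appear before the first base character.
    chars:  the base characters, in order.
    labels: for each base character, the run of markers that directly
            follows it ('' when nothing follows).
    """
    prefix, chars, labels = "", [], []
    for symbol in segmented:
        if symbol in markers:
            if chars:
                labels[-1] += symbol
            else:
                prefix += symbol
        else:
            chars.append(symbol)
            labels.append("")
    return prefix, chars, labels


def from_labels(prefix, chars, labels):
    """Rebuild the segmented string from the output of to_labels."""
    return prefix + "".join(c + lab for c, lab in zip(chars, labels))


def training_pairs(rows=ROWS):
    """Yield (chars, auto_labels, manual_labels) for each row."""
    for auto, manual in rows:
        _, chars, auto_labels = to_labels(auto)
        _, manual_chars, manual_labels = to_labels(manual)
        # Under the marker assumption both columns share the same characters.
        assert chars == manual_chars
        yield chars, auto_labels, manual_labels


if __name__ == "__main__":
    for chars, auto_labels, manual_labels in training_pairs():
        # The binary "end of segment" flag from the comment is simply
        # whether the label is non-empty.
        flags = [lab != "" for lab in manual_labels]
        print(chars, auto_labels, manual_labels, flags)
```

With the data in this shape, the per-character label sequences could be passed to whichever sequence model is chosen; the round trip through `from_labels` turns predicted labels back into a segmented string.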

0 Answers