This is regarding the R package udpipe for NLP. I am using it to tokenize, tag, lemmatize and perform dependency parsing on text files.

I am not sure what format the CoNLL-U file needs to have for the function udpipe_accuracy.

I passed it a CSV file with 10 columns, but the error persists.

I could not find any questions about this package on SO, and there is no udpipe tag either.

phiver
Lazarus Thurston
  • What are you trying to do? Can you give a sample of your problem and the expected output? There may be another (maybe better) package that can solve your problem. – YOLO Feb 25 '18 at 16:27
  • @ManishSaraswat, I am working on summarising large text documents. Before I can use any package like [textrank](https://github.com/bnosac/textrank/blob/master/vignettes/textrank.Rmd) I need to convert the text into CoNLLU format. I guess that's the standard format for any NLP work on text. – Lazarus Thurston Feb 26 '18 at 04:58
  • Can you provide a sample data set? I think there might be another way to do it that doesn't require the CoNLL-U format. – YOLO Feb 26 '18 at 11:29
  • I guess I no longer need to run udpipe_accuracy, as I sorted out the fundamental problem of accurately creating a terminology file. But I now have a problem ignoring page headers and footers in the text file imported from a PDF document, since the CoNLL-U output also includes the headers as sentences. I will close this question and open a new one, if that is fine, @ManishSaraswat – Lazarus Thurston Feb 26 '18 at 17:16
  • That's perfectly fine. – YOLO Feb 26 '18 at 17:45

1 Answer

udpipe_accuracy is used in combination with udpipe_train. If you trained a custom udpipe model with udpipe_train on data in CoNLL-U format, you can measure how well it performs by running udpipe_accuracy on hold-out CoNLL-U data that was not used to build the model.
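The train-then-evaluate workflow described above can be sketched roughly as follows. This is a minimal illustration, not a recommended setup: it assumes the small sample file `traindata.conllu` that ships in the package's `dummydata` folder, uses deliberately tiny training settings so it finishes quickly, and skips the parser. The output file name `toymodel.udpipe` is made up for the example. For brevity it evaluates on the same file it trained on; a real evaluation needs a separate hold-out file.

```r
library(udpipe)

# Sample CoNLL-U file bundled with the udpipe package (assumed to be installed).
# For a real evaluation, use separate training and hold-out files.
file_conllu <- system.file(package = "udpipe", "dummydata", "traindata.conllu")

# Train a toy model. The settings are intentionally minimal so the example
# runs quickly; the parser is not trained at all.
m <- udpipe_train(
  file = file.path(tempdir(), "toymodel.udpipe"),  # hypothetical output name
  files_conllu_training = file_conllu,
  annotation_tokenizer = list(dimension = 16, epochs = 1,
                              batch_size = 100, dropout = 0.7),
  annotation_tagger = list(iterations = 1, models = 1,
                           provide_xpostag = 1, provide_lemma = 0,
                           provide_feats = 0),
  annotation_parser = "none"
)

# Load the trained model from disk.
model <- udpipe_load_model(m$file_model)

# Evaluate tokenizer and tagger against gold-standard CoNLL-U annotations.
# The parser is set to "none" because the toy model has no parser.
metrics <- udpipe_accuracy(model, file_conllu,
                           tokenizer = "default",
                           tagger = "default",
                           parser = "none")
metrics$accuracy
```

Note that the file passed to udpipe_accuracy has to be real CoNLL-U, not a CSV that happens to have 10 columns. CoNLL-U is a plain-text format with one token per line, ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence.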