Questions tagged [conll]

Use this tag for questions concerning the CoNLL data format, e.g. for CoNLL-X or CoNLL-U data.

CoNLL stands for Conference on Computational Natural Language Learning. The CoNLL-X data format was introduced for the shared task of the tenth edition of this conference (CoNLL-X, 2006), which was on multilingual dependency parsing. CoNLL-U is a revised version of this format that is used to annotate Universal Dependencies treebanks.
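For orientation, a CoNLL-U file has one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Lines starting with `#` are sentence-level comments, and a blank line ends a sentence. The following is a minimal standard-library sketch of reading that layout. It is not a validator, and the function name `read_conllu` is hypothetical.

```python
# Column names of the ten CoNLL-U fields, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc"]


def read_conllu(text):
    """Yield (comment_lines, tokens) per sentence; each token maps field name -> raw string."""
    comments, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if comments or tokens:
                yield comments, tokens
            comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line)
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected 10 tab-separated fields, got {len(cols)}: {line!r}")
            tokens.append(dict(zip(FIELDS, cols)))
    # Tolerate a missing final blank line.
    if comments or tokens:
        yield comments, tokens


sample = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)
for comments, tokens in read_conllu(sample):
    print(comments, [t["form"] for t in tokens])
```

Values are kept as raw strings. Converting HEAD to an integer, or splitting FEATS and MISC, is left to the caller. The conllu package on PyPI does this for real projects.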

46 questions
0 votes, 1 answer

Problem with for loop, break statement does not do what I thought it would

This is my first time posting here, so please be gentle. I have written the following code:
import pandas as pd
import spacy
df = pd.read_csv('../../../Data/conll2003.dev.conll', sep='\t', on_bad_lines='skip', header=None)
nlp =…
Ayro
0 votes, 0 answers

What is the way used to split text file of CoNLL format into train, valid and test sets

Does anyone know how to split a CoNLL dataset into train and test sets? Thank you.
0 votes, 0 answers

How to convert data to CoNLL09 format?

I have biology data, but only the predicates are marked in its examples, e.g.: Both RAP1 and 2 are important vaccine candidates because it has been shown that Alanine can block the action of a…
robocon20x
0 votes, 0 answers

What is the way used to split text file of CoNLL format into train, valid and test sets?

I have a text file that contains data for the NER model, the data is in CoNLL format. The CoNLL format is a text file with one word per line with sentences separated by an empty line. The first word in a line should be the word and the last word…
Mai
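The two splitting questions above share a key point. The unit to shuffle and split is the sentence, meaning a block of lines ended by a blank line, and not the individual line. Otherwise sentences get torn across train, valid and test. The following is a minimal sketch assuming the 80/10/10 ratios and the seed are free choices. The helper names `read_sentences`, `split_sentences` and `to_conll` are hypothetical.

```python
import random


def read_sentences(text):
    """Split CoNLL-style text into sentence blocks (lists of lines) on blank lines."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def split_sentences(sentences, train=0.8, valid=0.1, seed=42):
    """Shuffle whole sentences (never single lines) and cut them into three parts."""
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])   # the remainder goes to test


def to_conll(sentences):
    """Serialize sentence blocks back to text, one blank line after each sentence."""
    return "".join("\n".join(s) + "\n\n" for s in sentences)


text = "\n\n".join(f"Word{i} O\n. O" for i in range(10)) + "\n"
train, valid, test = split_sentences(read_sentences(text))
print(len(train), len(valid), len(test))  # 8 1 1
```

Each part can then be written to its own file with `to_conll`. If the file contains `-DOCSTART-` markers, shuffling at document level instead may be preferable to avoid leakage between related sentences.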
0 votes, 2 answers

How to convert annotated text in XML to CoNLL?

I need to preprocess XML files for a NER task and I am struggling with the conversion of the XML files. I guess there is a nice and easy way to solve the following problem. Given an annotated text in XML with the following structure as input:
coreehi
0 votes, 1 answer

Removing rows from a pandas DataFrame if one of its cells contains a list of all-caps strings

I was working with the conll2003 dataset. It contains articles from various news sources, among other things. It contains sentences, part-of-speech tags for each word in those sentences, chunk IDs for those words, etc. Some sentences are all caps. I simply…
Rnj
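The core of the all-caps question is a per-sentence predicate. Python's `str.isupper()` returns True only when every cased character is uppercase and at least one cased character exists, so punctuation-only tokens do not trigger it. The following is a plain-Python sketch with the hypothetical helper `is_all_caps`. With pandas, the same predicate could presumably be applied as `df[~df["tokens"].apply(is_all_caps)]`, where the column name `tokens` is an assumption about the DataFrame.

```python
def is_all_caps(tokens):
    """True if every cased character in the sentence is uppercase (e.g. a headline)."""
    return " ".join(tokens).isupper()


# Made-up example sentences, one token list per row.
sentences = [
    ["BREAKING", "NEWS", "TODAY"],
    ["Markets", "rose", "."],
    ["The", "EU", "said", "."],
]
kept = [s for s in sentences if not is_all_caps(s)]
print(kept)
```

Note that `-DOCSTART-` rows also count as all caps under this rule, which is usually what one wants when cleaning CoNLL-2003.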
0 votes, 1 answer

Count the number of labels on IOB corpus with Pandas

From my IOB corpus such as: mention Tag 170 171 467 O 172 173 Vincennes B-LOCATION 174 . O 175 176 Confirmation O 177 des O 178 privilèges O 179 de O 180 la O 181 ville B-ORGANISATION 182 de I-ORGANISATION 183 Tournai…
Lter
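Counting labels in an IOB corpus can mean two things: counting raw tags, or counting entities. In the IOB2 scheme, each entity begins with exactly one `B-` tag. In pandas, `df["Tag"].value_counts()` gives the first kind of count. The following is a standard-library sketch of both kinds, using a few rows from the excerpt above. `count_labels` is a hypothetical helper, and it assumes IOB2 tagging.

```python
from collections import Counter

# (token, tag) pairs copied from the excerpt above.
rows = [("Vincennes", "B-LOCATION"), (".", "O"), ("Confirmation", "O"),
        ("des", "O"), ("privilèges", "O"), ("de", "O"), ("la", "O"),
        ("ville", "B-ORGANISATION"), ("de", "I-ORGANISATION")]


def count_labels(tags):
    """Return (counts per raw tag, counts per entity type with one entity per B- tag)."""
    tag_counts = Counter(tags)
    entity_counts = Counter(tag[2:] for tag in tags if tag.startswith("B-"))
    return tag_counts, entity_counts


tag_counts, entity_counts = count_labels([tag for _, tag in rows])
print(tag_counts.most_common())
print(entity_counts)
```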
0 votes, 0 answers

How to convert IOB to CoNLL-U?

I am trying to convert a simple IOB file to CoNLL-U, since the model I am trying to use requires the CoNLL-U format. Is there a simple and fast way to do so? The file looks like this: Thanks in advance!
darned7
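An IOB file only supplies the token and its tag. A direct conversion can therefore fill ID and FORM and leave the other CoNLL-U columns as `_`. Whether that is enough depends on the model, since it may need lemmas, POS tags or a dependency parse. The following is a minimal sketch assuming "TOKEN ... TAG" lines with blank lines between sentences. Storing the tag in MISC under the `NER=` key is a hypothetical choice; check which key the target model actually reads. `iob_to_conllu` is likewise a hypothetical name.

```python
def iob_to_conllu(iob_text):
    """Convert 'TOKEN ... TAG' lines (blank line = sentence break) to CoNLL-U text.

    Only ID and FORM are filled; the IOB tag goes into MISC as 'NER=<tag>'
    (a hypothetical key), every other column is '_'.
    """
    out, idx = [], 0
    for line in iob_text.splitlines():
        if not line.strip():
            if idx:            # close the sentence once, even after several blank lines
                out.append("")
            idx = 0
            continue
        parts = line.split()
        token, tag = parts[0], parts[-1]
        idx += 1
        cols = [str(idx), token] + ["_"] * 7 + [f"NER={tag}"]
        out.append("\t".join(cols))
    if idx:                    # file did not end with a blank line
        out.append("")
    return "\n".join(out) + "\n"


print(iob_to_conllu("Paris B-LOC\nis O\nnice O\n"))
```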
0 votes, 1 answer

jsonl-to-conll conversion tool application error

I need to convert a JSONL file to CoNLL, and I found this tool: https://pypi.org/project/jsonl-to-conll/. However, there are no examples or detailed documentation. I tried this command on the command prompt: C:\Users\Downloads>jsonl-to-conll…
eya_bklt
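If the tool does not cooperate, converting by hand is short. The following sketch does not reproduce the jsonl-to-conll package's behaviour. It assumes a doccano-style record such as `{"text": ..., "label": [[start, end, type], ...]}` with character offsets, and it assumes whitespace tokenization whose token boundaries line up with the span edges. Both are assumptions about the input, and `jsonl_line_to_conll` is a hypothetical name.

```python
import json
import re


def jsonl_line_to_conll(line):
    """Convert one JSON record with character-offset spans into 'TOKEN TAG' lines.

    Assumes {"text": ..., "label": [[start, end, type], ...]} and whitespace
    tokenization; a token is tagged only if it lies fully inside a span.
    """
    record = json.loads(line)
    text, spans = record["text"], record.get("label", [])
    rows = []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for s_start, s_end, s_type in spans:
            if start >= s_start and end <= s_end:
                tag = ("B-" if start == s_start else "I-") + s_type
                break
        rows.append(f"{match.group()} {tag}")
    return "\n".join(rows) + "\n\n"   # blank line ends the sentence


sample = json.dumps({"text": "Alice lives in New York .",
                     "label": [[0, 5, "PER"], [15, 23, "LOC"]]})
print(jsonl_line_to_conll(sample), end="")
```

For a whole file, each line of the JSONL file would be passed through this function and the results concatenated.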
0 votes, 0 answers

How to load .conll file in Python?

I tried the three ways below that I found online to read a .conll file in Python but only got error reports that I don't understand. I also read about different types of .conll file, yet I don't know which one is my dataset. How can I find out? Is…
Paw in Data
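Most .conll variants share one convention: one token per line, whitespace-separated columns, and blank lines between sentences. They differ mainly in how many columns there are and what those columns mean. A forgiving loader that keeps all columns is therefore a reasonable first step. Looking at the column counts it produces is one way to guess the variant. For example, CoNLL-2003 NER files have four columns, and CoNLL-U files have ten columns plus `#` comment lines. The following is a sketch with the hypothetical function `load_conll`. It skips `-DOCSTART-` lines as used in CoNLL-2003.

```python
def load_conll(lines):
    """Group whitespace-separated CoNLL rows into sentences (lists of column lists).

    Blank lines and '-DOCSTART-' lines end the current sentence.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


# Usage with a real file (the path is hypothetical):
# with open("data.conll", encoding="utf-8") as f:
#     sentences = load_conll(f)
# print({len(row) for sent in sentences for row in sent})  # column counts hint at the variant
print(load_conll(["EU NNP B-NP B-ORG\n", "rejects VBZ B-VP O\n", "\n"]))
```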
0 votes, 1 answer

Change Named Entity Recognition Format from ENAMEX to CoNLL

I have a dataset which is in ENAMEX format like this: Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job…
Ash
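ENAMEX markup wraps each entity in a tag such as `<ENAMEX TYPE="PERSON">...</ENAMEX>`. Converting to CoNLL therefore comes down to tokenizing the text between tags as `O`, and tokenizing inside each tag as `B-TYPE` followed by `I-TYPE`. The following is a minimal regex-based sketch that assumes non-nested tags and whitespace tokenization. `enamex_to_conll` is a hypothetical name. The tag placement in the sample sentence is illustrative, since the excerpt shows the text without its markup.

```python
import re

# Non-nested ENAMEX tags; group 1 = entity type, group 2 = entity text.
ENAMEX = re.compile(r'<ENAMEX\s+TYPE="([^"]+)"[^>]*>(.*?)</ENAMEX>')


def enamex_to_conll(text):
    """Turn ENAMEX-annotated text into (token, IOB tag) pairs via whitespace tokenization."""
    pairs, pos = [], 0
    for m in ENAMEX.finditer(text):
        # Text between the previous tag and this one is outside any entity.
        pairs += [(tok, "O") for tok in text[pos:m.start()].split()]
        entity_tokens = m.group(2).split()
        pairs += [(tok, ("B-" if i == 0 else "I-") + m.group(1))
                  for i, tok in enumerate(entity_tokens)]
        pos = m.end()
    pairs += [(tok, "O") for tok in text[pos:].split()]
    return pairs


sample = ('<ENAMEX TYPE="LOCATION">Italy</ENAMEX>' "'s business world was rocked by the announcement "
          'last Thursday that <ENAMEX TYPE="PERSON">Mr. Verdi</ENAMEX> would leave his job')
print("\n".join(f"{tok} {tag}" for tok, tag in enamex_to_conll(sample)))
```

A real tokenizer would split "Mr." or trailing punctuation differently. Whitespace splitting keeps the sketch short and guarantees that entity boundaries stay on token boundaries.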
0 votes, 1 answer

Parsing CoNLL-U missing annotation (misc)

I'm trying to parse .conll files from this GitHub repo. Here is an example of my parsing code:
from io import open
from conllu import parse_tree_incr
import glob
import os
for filename in…
Troy
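In CoNLL-U, MISC is the tenth tab-separated column. It holds either `_` or `|`-separated `Key=Value` items such as `SpaceAfter=No`. When MISC seems to vanish after parsing, it is worth checking that the lines really have ten tab-separated fields, not spaces, and that the column is not simply `_`. The following stdlib sketch shows how such a field decomposes. `parse_misc` is a hypothetical helper, not part of the conllu package.

```python
def parse_misc(field):
    """Parse a CoNLL-U MISC column ('_' or 'Key=Value|Key2=Value2') into a dict."""
    if field == "_":
        return {}
    result = {}
    for item in field.split("|"):
        key, sep, value = item.partition("=")   # split on the first '=' only
        result[key] = value if sep else None    # bare flags get None
    return result


line = "3\tVerdi\tVerdi\tPROPN\t_\t_\t2\tflat\t_\tSpaceAfter=No|NER=I-PER"
misc = parse_misc(line.split("\t")[9])
print(misc)  # {'SpaceAfter': 'No', 'NER': 'I-PER'}
```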
0 votes, 1 answer

Append in for-loop not working for storing the token lists

In the for loop below, I'm reading .dat files from a folder, parsing each file to extract the token list, and then storing it in a list. My code does this, but only for individual files. I have 1187 files, but the ud_file.append() just adds the tokens…
Shreya Agarwal
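Since the excerpt is truncated, the exact bug is unknown. Two frequent causes of "only one file's tokens survive" are re-creating the result list inside the loop, and using `append`, which nests one list per file, where `extend`, which flattens, was intended. The following is a sketch of the accumulation pattern. `collect_tokens` and its `extract_tokens` callback are hypothetical stand-ins for the question's per-file parsing code.

```python
def collect_tokens(documents, extract_tokens):
    """Accumulate tokens from every document into one flat list.

    The list is created once, before the loop; creating it inside the loop
    would keep only the last document's tokens.
    """
    all_tokens = []
    for doc in documents:
        tokens = extract_tokens(doc)
        all_tokens.extend(tokens)  # append(tokens) would add one nested list per document
    return all_tokens


# Here each "document" is a string and str.split plays the role of the parser;
# in the question, documents would be .dat file paths.
print(collect_tokens(["a b", "c d e"], str.split))
```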
0 votes, 1 answer

How to merge three CoNLL-U files with the conllu Python library?

This is my first time working with CoNLL-U files. I'm not able to find any way to merge these files in the conllu Python library. Any leads would be helpful. Thanks.
Shreya Agarwal
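A CoNLL-U file is a sequence of independent sentence blocks, each ended by a blank line. Merging files therefore needs no parsing: concatenating the texts works as long as every sentence stays blank-line terminated. The conllu library's `parse` and `TokenList.serialize()` could also round-trip the data, but that is not required. The following is a sketch with hypothetical helper names `merge_conllu` and `merge_files`. Note that `# sent_id` values are not renumbered, so duplicates across files may need attention.

```python
from pathlib import Path


def merge_conllu(texts):
    """Concatenate CoNLL-U documents so that every sentence stays blank-line terminated."""
    parts = [t.strip("\n") for t in texts if t.strip()]  # drop empty inputs
    return ("\n\n".join(parts) + "\n\n") if parts else ""


def merge_files(paths, out_path):
    """Read CoNLL-U files (paths are up to the caller) and write their concatenation."""
    texts = [Path(p).read_text(encoding="utf-8") for p in paths]
    Path(out_path).write_text(merge_conllu(texts), encoding="utf-8")


a = "# sent_id = 1\n1\tA\t_\t_\t_\t_\t0\troot\t_\t_\n\n"
b = "1\tB\t_\t_\t_\t_\t0\troot\t_\t_\n"  # missing its final blank line
print(merge_conllu([a, b]), end="")
```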
0 votes, 1 answer

How to use spaCy's convert to keep paragraph information from conllu files?

I'm trying to convert conllu files to spaCy's jsonl format. These conllu files contain paragraph information as specified on the Universal Dependencies website. The problem is that the paragraph information is not carrying over to the jsonl converted…
Fábio Reale
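How spaCy's converter treats paragraphs is not confirmed here. In the UD format itself, a paragraph boundary is a `# newpar` comment before the first sentence of the paragraph, and a document boundary is `# newdoc`. If the converter drops these, they can be recovered from the source file in a pre- or post-processing step. The following is a sketch that groups each sentence's `# text =` value into paragraphs. `paragraphs` is a hypothetical helper.

```python
import re


def paragraphs(conllu_text):
    """Group sentence texts into paragraphs using '# newpar' / '# newdoc' comments."""
    paras, current = [], []
    for block in re.split(r"\n\s*\n", conllu_text.strip()):
        if not block.strip():
            continue
        lines = block.splitlines()
        # A newpar (or newdoc) comment marks the first sentence of a new paragraph.
        if any(l.startswith(("# newpar", "# newdoc")) for l in lines) and current:
            paras.append(current)
            current = []
        text = next((l.split("=", 1)[1].strip() for l in lines if l.startswith("# text =")), "")
        current.append(text)
    if current:
        paras.append(current)
    return paras


sample = (
    "# newpar\n# text = Hi\n1\tHi\t_\t_\t_\t_\t0\troot\t_\t_\n\n"
    "# text = Bye\n1\tBye\t_\t_\t_\t_\t0\troot\t_\t_\n\n"
    "# newpar\n# text = Again\n1\tAgain\t_\t_\t_\t_\t0\troot\t_\t_\n\n"
)
print(paragraphs(sample))  # [['Hi', 'Bye'], ['Again']]
```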