Questions tagged [conll]

Use this tag for questions about the CoNLL data formats, e.g. CoNLL-X or CoNLL-U.

CoNLL stands for Conference on Computational Natural Language Learning. The CoNLL-X format originated in the shared task of the tenth edition of this conference (CoNLL-X, 2006). CoNLL-U is a revised version of this format that is used to annotate Universal Dependencies treebanks.
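As a rough illustration of the CoNLL-U layout, here is a minimal reader sketch. It assumes only the basic structure: `#` comment lines, one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), and a blank line after each sentence. The names `parse_conllu` and `FIELDS` are illustrative, not part of any library, and multiword-token ranges such as `1-2` or empty nodes such as `1.1` are simply kept as ordinary rows. For real projects the `conllu` Python package is the more complete option.

```python
# The ten CoNLL-U columns, in file order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences.

    Each sentence is a list of dicts keyed by FIELDS. Comment lines
    (starting with '#') are skipped; a blank line ends a sentence.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level metadata such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} tab-separated fields, "
                f"got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


sample = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)
sents = parse_conllu(sample)
print(len(sents), [t["form"] for t in sents[0]])  # 1 ['Dogs', 'bark', '.']
```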

46 questions
1 vote, 1 answer

AllenNLP BERT SRL input format ("OntoNotes v. 5.0 formatted")

The goal is to train BERT SRL on another data set. According to the configuration, it requires conll-formatted-ontonotes-5.0. Natively, my data comes in a CoNLL format, and I converted it to the conll-formatted-ontonotes-5.0 format of the GitHub edition…
Chiarcos
1 vote, 0 answers

Can you use CoNLL-U formatted files for neuralcoref training?

This guide shows how one can train the neuralcoref package using CoNLL-2012 formatted files. I have prepared CoNLL-U formatted files using the spacy_conll package. Does anyone know if one can use CoNLL-U files for neuralcoref training instead? Does…
Onias
1 vote, 1 answer

Convert .CSV data into CoNLL BIO format for NER

I have some data in a .csv file that looks like this: sent_num = [0, 1, 2] text = [['Jack', 'in', 'the', 'box'], ['Jack', 'in', 'the', 'box'], ['Jack', 'in', 'the', 'box']] tags = [['B-ORG', 'I-ORG', 'I-ORG', 'I-ORG'], ['B-ORG', 'I-ORG', 'I-ORG',…
GSA
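A minimal sketch of this conversion, assuming the target is the common two-column CoNLL layout (token, a space, the BIO tag; a blank line between sentences). The helper names `to_conll_bio` and `_as_list` are hypothetical. Because a Python list written to CSV comes back as a string such as `"['Jack', 'in', 'the', 'box']"`, the sketch assumes cells may be such strings and parses them with `ast.literal_eval`; `sent_num` is not needed as long as the rows are already in sentence order.

```python
import ast


def _as_list(cell):
    # Lists stored in a CSV are read back as strings like "['Jack', 'in']";
    # literal_eval safely turns them back into real lists (assumption about
    # how the file was written). Real lists pass through unchanged.
    return ast.literal_eval(cell) if isinstance(cell, str) else list(cell)


def to_conll_bio(texts, tags, sep=" "):
    """Return CoNLL-style BIO text: one 'token<sep>tag' per line,
    with a blank line after each sentence."""
    lines = []
    for words, labels in zip(texts, tags):
        words, labels = _as_list(words), _as_list(labels)
        if len(words) != len(labels):
            raise ValueError(f"{len(words)} tokens but {len(labels)} tags")
        lines.extend(f"{w}{sep}{t}" for w, t in zip(words, labels))
        lines.append("")  # sentence boundary
    return "\n".join(lines) + "\n"


print(to_conll_bio(["['Jack', 'in', 'the', 'box']"],
                   ["['B-ORG', 'I-ORG', 'I-ORG', 'I-ORG']"]))
```

The usage line prints the four token lines of the first sentence from the excerpt, followed by a blank line. Passing `sep="\t"` gives a tab-separated variant, which some tools expect instead.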
1 vote, 3 answers

How to convert CoNLL format into a list of sentences?

I have a .txt file that is, in theory, in CoNLL format, like this: a O nivel B-INDC de O la O columna B-ANAT anterior I-ANAT del I-ANAT acetabulo I-ANAT existiendo O minimos B-INDC cambios B-INDC edematosos B-DISO en O la O medular B-ANAT (...) I need…
Andrea NR
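One way to get a list of sentences out of such a file, sketched under the assumption that there is one `token tag` pair per line and that sentences are separated by blank lines. If the file has no blank lines, everything ends up in a single sentence and a different boundary rule (for example splitting after `.` tokens) would be needed. `read_conll_sentences` is a hypothetical helper, and the sample is a small made-up file built from tokens in the excerpt.

```python
def read_conll_sentences(text):
    """Group 'token tag' lines into sentences; blank lines separate sentences.

    Returns a list of (tokens, tags) pairs.
    """
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:              # blank line -> sentence boundary
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        tokens.append(parts[0])    # first column: the token
        tags.append(parts[-1])     # last column: the BIO tag
    if tokens:                     # the file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


sample = "a O\nnivel B-INDC\nde O\nla O\ncolumna B-ANAT\n\nen O\nla O\nmedular B-ANAT\n"
sentences = [" ".join(tokens) for tokens, _ in read_conll_sentences(sample)]
print(sentences)  # ['a nivel de la columna', 'en la medular']
```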
1 vote, 1 answer

Converting pandas dataframe to CoNLL

I have a processed dataframe which is used as an input to train an NLP model: sentence_id words labels 0 0 a B-ORG 1 0 b I-ORG 2 0 c I-ORG 5 1 d B-ORG 6 1 e …
Shyam
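A sketch for the dataframe case, assuming the columns are named `sentence_id`, `words` and `labels` as in the excerpt and that all rows of one sentence are contiguous. To keep it dependency-free, the hypothetical helper `rows_to_conll` takes plain `(sentence_id, word, label)` tuples; with pandas, such tuples can be produced with `df[["sentence_id", "words", "labels"]].itertuples(index=False, name=None)`.

```python
from itertools import groupby
from operator import itemgetter


def rows_to_conll(rows):
    """rows: iterable of (sentence_id, word, label) tuples in which all
    rows of one sentence are adjacent. Returns 'word label' lines with a
    blank line after each sentence."""
    out = []
    # groupby merges only *adjacent* rows with the same sentence_id.
    for _, group in groupby(rows, key=itemgetter(0)):
        out.extend(f"{word} {label}" for _, word, label in group)
        out.append("")  # blank line between sentences
    return "\n".join(out) + "\n"


rows = [(0, "a", "B-ORG"), (0, "b", "I-ORG"), (0, "c", "I-ORG"), (1, "d", "B-ORG")]
print(rows_to_conll(rows))
```

Because `groupby` only merges neighbouring rows, a dataframe whose sentences are interleaved should first be sorted with a stable sort, e.g. `df.sort_values("sentence_id", kind="stable")`, so that token order inside each sentence is preserved.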
1 vote, 0 answers

Why do I get "ValueError: Inconsistent number of columns" when reading sentences from a .conll file?

from nltk.corpus.reader.conll import ConllCorpusReader READER = ConllCorpusReader(root="./", fileids=".conll", columntypes=('words','pos','tree','chunk','ne','srl','ignore') …
Paw in Data
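A small diagnostic sketch for this error. It rests on an assumption (a reading of the error message, not something the question confirms): that the reader splits each line on whitespace and complains when rows inside one blank-line-separated block have different numbers of fields, which typically happens when a token contains a space or a column is left empty. `find_inconsistent_rows` is a hypothetical helper that reports such rows so they can be fixed in the file.

```python
def find_inconsistent_rows(text):
    """Yield (line_number, expected, found, line) for each row whose number
    of whitespace-separated fields differs from the first row of its block.
    Blocks are separated by blank lines; line numbers start at 1."""
    expected = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            expected = None        # a new block starts after a blank line
            continue
        if expected is None:
            expected = len(fields)  # first row of the block sets the width
        elif len(fields) != expected:
            yield lineno, expected, len(fields), line


sample = "a b c\nd e f\ng h\n\ni j\nk l\n"
for hit in find_inconsistent_rows(sample):
    print(hit)  # (3, 3, 2, 'g h')
```

To check a real file, pass it the file's contents, e.g. the result of reading it with `open(path, encoding="utf-8").read()`, where `path` is whatever file the reader was given.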
1 vote, 1 answer

How to create a TokenList using the conllu library?

I'm trying to create a CoNLL-U file using the conllu library as part of a Universal Dependencies tagging project I'm working on. I have a number of sentences in Python lists. These contain sub-lists of tokens, lemmata, POS tags, features, etc. For…
AdeDoyle
1 vote, 1 answer

How to import text from CoNLL format with named entities into spaCy, infer entities with my model and write them to the same dataset (with Python)?

I have a dataset in CoNLL NER format which is basically a TSV file with two fields. The first field contains tokens from some text - one token per line (each punctuation symbol is also considered a token there) and the second field contains named…
Sergey Zakharov
1 vote, 1 answer

Converting text sentences to CoNLL format

I want to convert plain English text into CoNLL-U format for MaltParser in Python, in order to find the dependencies in the text. I tried in Java but failed to do so. Below is the format I'm looking for: String[] tokens = new String[11]; tokens[0] =…
Shubham
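A first step, sketched under loose assumptions: split raw text into tokens and emit one ten-column, tab-separated row per token with only ID and FORM filled and `_` everywhere else. The regex tokenizer is a deliberately crude stand-in (runs of word characters, plus each punctuation character as its own token), and `text_to_conll_rows` is a hypothetical helper. A dependency parser such as MaltParser would normally also need at least a POS column filled in by a tagger, which this sketch does not produce.

```python
import re

# Crude, hypothetical tokenizer: a run of word characters, or any single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def text_to_conll_rows(sentence):
    """Return ten-column, tab-separated lines for one sentence, followed by
    a blank line. Only ID and FORM are filled; the other eight columns
    hold the placeholder '_'."""
    tokens = TOKEN_RE.findall(sentence)
    return "\n".join(
        "\t".join([str(i), tok] + ["_"] * 8)
        for i, tok in enumerate(tokens, start=1)
    ) + "\n\n"


print(text_to_conll_rows("Dogs bark."), end="")
```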
1 vote, 1 answer

spaCy identifying blank spaces as entities

I am just starting to work with spaCy and have run a text through it to test how it works on a PDF I OCR'd with AntFileConverter. The .txt file (sample below; I would attach it but am unsure how) seems fine and is in UTF-8. However, when I output the file in…
Sandra Young
0 votes, 0 answers

Generating CoNLL-U Format from Excel Data - Duplicate 'id' Line Issue

Description: I have a script that aims to convert data from an Excel file to the CoNLL-U format. However, I'm encountering an issue where the line containing 'id form lemma upos xpos feats head deprel deps misc' appears twice…
cande5
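The CoNLL-U format has no column-header line at all: a file consists only of `#` comment lines, tab-separated token lines and blank lines. So one way to avoid the duplicated `id form lemma ...` line is not to write it in the first place. The sketch below assumes each Excel row has already been read into a dict keyed by the lowercase column names from the question; `write_conllu`, `_cell` and the `# sent_id` / `# text` numbering are illustrative choices, not the asker's script.

```python
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def _cell(value):
    # Empty cells (None or "") become the CoNLL-U placeholder "_".
    # Note: 0 (e.g. head=0 for the root) is kept, not treated as empty.
    return "_" if value is None or value == "" else str(value)


def write_conllu(sentences):
    """sentences: list of (text, rows), where rows are dicts keyed by FIELDS.
    Emits comment lines and token lines only; no header row is written."""
    out = []
    for sent_id, (text, rows) in enumerate(sentences, start=1):
        out.append(f"# sent_id = {sent_id}")
        out.append(f"# text = {text}")
        for row in rows:
            out.append("\t".join(_cell(row.get(f)) for f in FIELDS))
        out.append("")  # a blank line terminates each sentence
    return "\n".join(out) + "\n"


print(write_conllu([("Dogs bark", [
    {"id": 1, "form": "Dogs", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "bark", "head": 0, "deprel": "root"},
])]), end="")
```

Writing the header-free output in one place like this also makes it easy to see that every line is either a comment, a ten-field token line, or a sentence-ending blank line.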
0 votes, 0 answers

Getting the "ACHTUNG! No gold labels and no all_predicted_values found" message

This is what my training data looks like (for testing/debugging): Sentence[8]: "ziedona B-LOC iela I-LOC ane B-LOC latvia B-LOC" Sentence[4]: "ziedona B-LOC iela I-LOC" Why do I get this message? What is wrong with the data? 2023-08-17 13:34:43,943…
realPro
0 votes, 1 answer

Problems with reproducing the training of the spaCy pipeline

I'm trying to reproduce the training of one of the spaCy pipelines for the Italian language, it_core_news_sm. This pipeline is trained on two datasets: UD_Italian-ISDT for the CoNLL-U tasks and WikiNER for NER tagging. Where can I find more info about the data…
0 votes, 0 answers

Label Studio: Importing Txt Files as Whole Files & Exporting the Result

I am trying to export the result of the file that I imported to Label Studio. This is my labeling interface:
0 votes, 0 answers

Is there a way to convert multiple spacy docs to one conllu file in Python?

I want to parse sentences with a spacy pipeline and then convert the docs into a single conllu file. But with texts = ["First sentence.", "Second sentence.", "Third sentence."] nlp = init_parser(language, parser, …