I have a corpus of newspaper articles in a .txt file, and I'm trying to split it into sentences and write them to a .csv file so that I can annotate each sentence. I was told to use NLTK for this purpose, and I found the following code for sentence splitting:
import nltk
from nltk.tokenize import sent_tokenize

nltk.download('punkt')  # Punkt tokenizer models, needed once before sent_tokenize works
sent_tokenize("Here is my first sentence. And that's a second one.")
# -> ['Here is my first sentence.', "And that's a second one."]
However, I'm wondering:
- How does one use a .txt file as input for the tokenizer (so that I don't have to just copy and paste everything), and
- How does one output a .csv file instead of just printing the sentences in my terminal?
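I imagine the solution looks something like the sketch below, which uses Python's built-in open() together with the csv module (the file names corpus.txt and sentences.csv, the annotation column, and the utf-8 encoding are just assumptions on my part), but I'm not sure whether this is the right approach:

import csv
import nltk
from nltk.tokenize import sent_tokenize

nltk.download('punkt')  # Punkt models for sent_tokenize (only needed on the first run)

# Read the whole corpus from the plain-text file
# ('corpus.txt' is a placeholder for the real file name).
with open('corpus.txt', encoding='utf-8') as infile:
    text = infile.read()

sentences = sent_tokenize(text)

# Write one sentence per row, with an empty column for the annotation.
with open('sentences.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['sentence', 'annotation'])  # header row
    for sentence in sentences:
        writer.writerow([sentence, ''])

The idea would be that each sentence ends up in its own row of the .csv, with an empty column next to it where the annotation can go. Is this roughly how it should be done, or is there a better way?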