
I need help with something simple! I've tried several approaches, but none of them worked properly.

I have a .txt file with two columns separated by a space. The file looks like this:

screenshot of file contents

I want to split these strings into a single flat list to obtain the result below:

my_list = ['1', 'abacaxi', '1', 'abalo', '1', 'abalos', '0', 'abacate']

How can I do this? The code below runs, but the result is not what I need.

import os
import io
import sys
from pathlib import Path

while True:
    try:
        file_to_open =Path(input("Please, insert your file path: "))
        with open(file_to_open,'r', encoding="utf-8") as f:
            words = f.read().lower()
            break         
    except FileNotFoundError:
        print("\nFile not found. Better try again")
    except IsADirectoryError:
        print("\nIncorrect Directory path.Try again")


print('total number of words + articles: ', len(words))
corpus=words.split(' ')
print(corpus[0:20])
martineau
Natalia Resende

1 Answer


Here you go:

with open(file_to_open, 'r', encoding="utf-8") as f:
    words = f.read().lower()

# Join the lines into one line, with single spaces between them
words = " ".join(words.split(sep='\n'))

# Collapse runs of spaces into a single space
while "  " in words:
    words = words.replace("  ", " ")

# Split on the delimiter ' ' (a space) to get a list;
# strip() avoids an empty string at the end if the file ends with a newline
word_li = words.strip().split(sep=' ')
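A shorter alternative, as a rough sketch: str.split() called with no arguments splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings, so the join and replace steps may not be needed at all. The function names tokenize and read_tokens are made up for this example, and it assumes the file is small enough to read in one go.

```python
def tokenize(text):
    """Return every whitespace-separated token in text as a flat list."""
    # split() with no arguments treats any run of spaces, tabs, or
    # newlines as one separator and never yields empty strings.
    return text.lower().split()


def read_tokens(path):
    """Read a UTF-8 text file and return its tokens as a flat list.

    Hypothetical helper; assumes the whole file fits in memory.
    """
    with open(path, "r", encoding="utf-8") as f:
        return tokenize(f.read())
```

With file contents like the list in the question (an assumption, since only a screenshot was posted), tokenize("1 abacaxi\n1 abalo\n") gives ['1', 'abacaxi', '1', 'abalo'].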
scsanty