Hi, I want to classify a dataset using a Naive Bayes classifier. For that I want to use an external dataset that I downloaded from Google. The dataset contains two folders, one for positive reviews and one for negative reviews, and each folder contains 1000 .txt files. How can I import these files into my Python code as a training dataset? I am new to machine learning, so I have very little idea how to do this. Please help me out.
1 Answer
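You can use os.listdir (see https://docs.python.org/2/library/os.html), e.g.:
import os
train_dir = 'train_directory'
dataset = []
for name in os.listdir(train_dir):
    # os.listdir returns bare file names, so join each one with the directory
    with open(os.path.join(train_dir, name)) as f:
        # add content of file to dataset
        dataset.append(f.read())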
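
Since the question describes one folder per class, the same idea can be extended to build labelled training pairs. The following is only a minimal sketch, assuming Python 3, UTF-8 files, and sub-folders named "pos" and "neg" (only "pos" appears in this thread; "neg" and the function name load_labeled_reviews are hypothetical):

```python
import os

def load_labeled_reviews(root_dir, labels=("pos", "neg")):
    """Return a list of (text, label) pairs, one per .txt file."""
    dataset = []
    for label in labels:
        # assumption: each class lives in a sub-folder named after its label
        folder = os.path.join(root_dir, label)
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a review file
            path = os.path.join(folder, name)
            # assumption: files are UTF-8; change this to match your data
            with open(path, encoding="utf-8") as f:
                dataset.append((f.read(), label))
    return dataset
```

Each pair can then be turned into whatever feature representation the classifier expects before training.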

Thomas Pinetz
-
The os.listdir code works, thanks for the guidance. I want to read every .txt file, extract all the positive words, and tag the words as positive at the end. I tried the code below, but it raises an error saying that the file 0_9.txt does not exist, even though it is there in the folder:
posfilenames = os.listdir("C:/Users/Sharmili/Desktop/movie_reviews/pos")
print(posfilenames)
for filename in posfilenames:
    f = open(filename,'r')
    reviews = f.read()
    pos_reviews = reviews.split()
    pos_reviews.append((create_word_feature(words),"positive"))
print(len(pos_reviews))
– Sharmili Nag Dec 06 '16 at 22:03
-
Can you please help me out? – Sharmili Nag Dec 06 '16 at 22:23
-
You need to use f = open(dir + "/" + filename), because os.listdir returns only the file names, not their full paths. – Thomas Pinetz Dec 07 '16 at 07:40
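
The fix in the comment above can be sketched as a small helper. This is an illustration rather than the answerer's own code: the name read_all is hypothetical, and os.path.join is used in place of string concatenation so the separator is handled for you.

```python
import os

def read_all(directory):
    """Read every file in `directory`, returning {file name: contents}."""
    contents = {}
    for filename in os.listdir(directory):
        # open(filename) on its own is resolved against the current working
        # directory, which is why a file that exists in `directory` is
        # reported as missing
        path = os.path.join(directory, filename)
        if os.path.isfile(path):  # ignore sub-folders
            with open(path, "r") as f:
                contents[filename] = f.read()
    return contents
```

Calling read_all with the folder path from the comments would return one entry per review file, keyed by names such as 0_9.txt.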
-
I tried your code, but it did not work, so I tried the following code instead:
for filename in posfilenames:
    f = open(os.path.join("C:/Users/Sharmili/Desktop/movie_reviews/pos",filename),'r')
    pos_reviews = f.read().split(" ")
    print(pos_reviews)
– Sharmili Nag Dec 09 '16 at 05:59
-
It prints the words, but it also raises this error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 803: character maps to <undefined>
– Sharmili Nag Dec 09 '16 at 05:59
-
Are you sure you are using the right encoding? Maybe those .txt files are not really text files but binary data, in which case you would have to use open(file, 'rb'). This is hard to answer without knowing the content of the files, but I can assure you that I use the code above regularly for my own files and it works. In any case, you can open a single file to see what differs between my code and the code that works for you. – Thomas Pinetz Dec 09 '16 at 08:06
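
One way to follow up on the encoding question is to read the raw bytes and try a few encodings explicitly. Byte 0x9d has no character in Windows code page 1252, and it is also the last byte of the UTF-8 encoding of the right curly quote (”), so one plausible cause, though not a confirmed one, is UTF-8 files being decoded with the Windows default codec. The sketch below assumes Python 3; the helper name read_text and the list of candidate encodings are hypothetical:

```python
def read_text(path, encodings=("utf-8", "cp1252")):
    """Decode a file by trying each encoding in turn."""
    # read bytes first so that opening the file can never raise a decode error
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # last resort: keep the text, replacing undecodable bytes with U+FFFD
    return raw.decode(encodings[0], errors="replace")
```

If the encoding is already known, passing it directly, e.g. open(path, encoding="utf-8"), is simpler than guessing.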