I am trying to work with a zip archive in Kaggle and access the files inside train.zip so that I can train my model. The archive contains images of cats and dogs, and each filename reveals whether the image shows a cat or a dog. I think I can do this by reading the zip archive and then creating a list of labels for the cat and dog images.
I know I can use this code to read the zip archive:
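import zipfile
from subprocess import check_output
# extractall(".") writes the archive contents into the current working directory
with zipfile.ZipFile("../input/dogs-vs-cats/train.zip", "r") as z:
    z.extractall(".")
print(check_output(["ls", "train"]).decode("utf8"))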
Also, the code below can be used to categorize the files, provided that they are unzipped. However, it seems the archive is not unzipped and has only been read by the code above, so I don't know how to combine these two snippets to read the file names.
import pandas as pd

categories = []
for filename in filenames:
    # The part of the filename before the first dot is the label
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)
df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
print(categories)
The problem is that it seems filenames can only be a string, and I cannot assign the output of the first snippet (the one containing the ZipFile command) to it. I think that by adding the following line I can read the directory and assign its contents to filenames; however, that requires the files to be unzipped.
filenames = os.listdir("../input/dogs-vs-cats/")
So, I wonder how I can feed the zip file to the categorization code, or how I can unzip the file in Kaggle so that the files can be found in a directory.
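For reference, here is a minimal sketch of the two routes I have in mind. It assumes the archive stores its images under a train/ folder with names such as dog.0.jpg, which the ls call above suggests but which I have not verified. The tiny archive built here is hypothetical stand-in data so the snippet runs anywhere, and label_names is an illustrative helper of my own, not part of any library.

```python
import os
import tempfile
import zipfile


def label_names(names):
    """Return (filenames, categories), with 1 for dog and 0 for anything else."""
    # Skip directory entries and keep only the base name,
    # so 'train/dog.1.jpg' becomes 'dog.1.jpg'.
    filenames = [os.path.basename(n) for n in names if not n.endswith("/")]
    categories = [1 if f.split(".")[0] == "dog" else 0 for f in filenames]
    return filenames, categories


# Hypothetical stand-in for ../input/dogs-vs-cats/train.zip.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "train.zip")
with zipfile.ZipFile(zip_path, "w") as z:
    for name in ["train/cat.0.jpg", "train/dog.0.jpg", "train/dog.1.jpg"]:
        z.writestr(name, b"")

# Route 1: read the member names straight from the archive, without extracting.
with zipfile.ZipFile(zip_path, "r") as z:
    zip_files, zip_cats = label_names(z.namelist())

# Route 2: extract, then list the extracted folder (not the input folder).
with zipfile.ZipFile(zip_path, "r") as z:
    z.extractall(workdir)
extracted = sorted(os.listdir(os.path.join(workdir, "train")))
dir_files, dir_cats = label_names(extracted)

print(zip_files, zip_cats)
print("dogs:", sum(zip_cats), "cats:", len(zip_cats) - sum(zip_cats))
```

The two lists returned by either route could then be passed to pd.DataFrame exactly as in the categorization code above. Route 1 only gives the names, so the images would presumably still need to be extracted before training.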