How to write feature vectors for all images of a folder to a txt file for future TSNE processing

Question

I am creating image feature vectors using resnet50 in PyTorch. Each feature vector has length 2048. To write a vector to a txt file I have to convert it to a string, which I do in the code below. The problem is that only a few of the 2048 numbers in each vector end up in the txt file. How can I fix this?

Additionally, each of my images (filenames) has an associated label from 1 to 9 (9 classes).

The code below is a modification from this repo: https://github.com/christiansafka/img2vec

import numpy as np
import sys
import os
sys.path.append("..")  # Adds higher directory to python modules path.
from img_to_vec import Img2Vec
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity
import glob


input_path = "my image folder**"
img2vec = Img2Vec()

# For each image in the folder, compute its feature vector and write it as one line
filenames = glob.glob(input_path + "/*.*")

with open('resnet50_feature_vectors.txt', 'w') as vector_fh:
    for filename in filenames:
        print(filename)
        img = Image.open(filename)
        nd_arr = img2vec.get_vec(img)
        #str_arr = nd_arr.tostring()
        str_arr = np.array2string(nd_arr, formatter={'float_kind':lambda x: "%.2f" % x})
        vector_fh.write(str_arr+"\n")

Here is what I get:

$ head resnet50_feature_vectors.txt

[0.22 1.54 0.40 ... 0.15 0.56 0.22]
[0.57 1.34 1.78 ... 0.26 1.19 1.30]
[0.01 2.81 0.15 ... 0.28 0.41 0.27]
[0.30 0.80 0.15 ... 0.02 0.08 0.03]
[0.10 1.39 0.60 ... 0.13 0.25 0.04]
[0.62 0.71 0.72 ... 0.36 0.15 0.51]
[0.43 0.44 0.52 ... 0.40 0.29 0.33]
[0.07 1.14 0.40 ... 0.09 0.08 0.10]
[0.13 1.45 0.96 ... 0.19 0.03 0.11]
[0.06 1.84 0.19 ... 0.11 0.11 0.03]
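The `...` comes from NumPy's summarization rather than from the file writing: `np.array2string` (like `print`) abbreviates any array with more than `threshold` elements (1000 by default) to `edgeitems` values at each end (3 by default), which matches the 3 + 3 numbers per line seen above. A minimal sketch, assuming a random float32 vector stands in for `img2vec.get_vec(img)`:

```python
import sys

import numpy as np

# Stand-in for img2vec.get_vec(img) (assumption): a random 2048-element vector.
vec = np.random.default_rng(0).random(2048).astype(np.float32)
fmt = {'float_kind': lambda x: "%.2f" % x}

# Default settings: 2048 > threshold (1000), so only 3 edge items per side are kept.
truncated = np.array2string(vec, formatter=fmt)

# Raising threshold disables summarization; a huge max_line_width keeps the
# whole vector on one line instead of wrapping at the default 75 characters.
full = np.array2string(vec, formatter=fmt,
                       threshold=sys.maxsize, max_line_width=sys.maxsize)

print(truncated)                       # summarized, contains "..."
print(len(full.strip("[]").split()))   # 2048
```

Passing the same two keyword arguments in the loop would write every value of each vector on a single line. Note that the surrounding `[` and `]` are still written, so a reader such as `np.loadtxt` would not parse these lines directly.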

How should I fix the way feature vectors are saved in the txt file?

I am trying to follow the tutorial below, which uses one .txt file containing the feature vectors and another .txt file containing the label for each feature vector.

Specifically, I mean the Python implementation for the MNIST dataset: https://lvdmaaten.github.io/tsne/code/tsne_python.zip
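For that two-file layout, `np.savetxt` writes every element of a 2-D array (one row per image) without any `...` summarization, and a 1-D label array one value per line. A minimal sketch under stated assumptions: random data stands in for the real ResNet-50 vectors, the labels are random integers from 1 to 9 rather than labels derived from the filenames, both file names are hypothetical, and the t-SNE script is assumed to read its inputs with `np.loadtxt` (worth checking in the downloaded code).

```python
import os
import tempfile

import numpy as np

# Stand-in data (assumption): random vectors instead of real ResNet-50 features,
# and random labels 1..9 instead of labels derived from the filenames.
rng = np.random.default_rng(0)
n_images = 5
features = rng.random((n_images, 2048)).astype(np.float32)
labels = rng.integers(1, 10, size=n_images)  # high is exclusive, so values are 1..9

out_dir = tempfile.mkdtemp()
# Hypothetical file names: one file for the vectors, one for the labels
feature_path = os.path.join(out_dir, "resnet50_feature_vectors.txt")
label_path = os.path.join(out_dir, "resnet50_labels.txt")

# One row per image, 2048 space-separated values per row.
np.savetxt(feature_path, features, fmt="%.6f")
# One integer label per line, in the same order as the feature rows.
np.savetxt(label_path, labels, fmt="%d")

# Read both files back with np.loadtxt.
X = np.loadtxt(feature_path)
y = np.loadtxt(label_path)
print(X.shape, y.shape)  # (5, 2048) (5,)
```

Keeping the rows of the feature file and the lines of the label file in the same order is what ties each vector to its class.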

Your question is not clear. When you say "only a few numbers from the vector of length 2048 is saved in the txt file", do you mean ***"when writing a [PIL.Image file](https://pillow.readthedocs.io/en/3.1.x/reference/Image.html), long lines are truncated with dots/ellipsis, like `[0.22 1.54 0.40 ... 0.15 0.56 0.22]`"?*** If so, check other questions on "PIL Image file long line". Also, the rest of your code is not needed, please reduce your example to about three lines, that's all we need to replicate (***minimal*** complete verification example). You can use junk or random-seeded data. — smci, Oct 24 '18 at 00:41
Also, only apply tags directly related to the code issue, not thematically describing the (unrelated) rest of your code. I untagged [tag:data-visualization] [tag:visualization] [tag:pytorch] and tagged [tag:python-imaging-library] [tag:file-writing] [tag:long-lines]. We don't need to know it's for TSNE, or resnet50 in PyTorch. Just strip the question down to the minimum needed. — smci, Oct 24 '18 at 00:45
If by fix you mean something like csv, use `np.savetxt(filename, array, delimiter=',')`. While if you want to process it using numpy in the future, a better solution would be `np.save` which saves binary format — ZisIsNotZis, Oct 24 '18 at 00:48
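The binary route mentioned in the comment above can be sketched as follows, assuming the per-image vectors have already been stacked into one array (random data and the `.npy` file name are illustrative assumptions). Unlike a text format, `np.save`/`np.load` round-trip the exact values, dtype and shape.

```python
import os
import tempfile

import numpy as np

# Stand-in for the stacked feature vectors (assumption: random data, 5 images x 2048).
features = np.random.default_rng(1).random((5, 2048)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "resnet50_features.npy")  # hypothetical file name
np.save(path, features)   # binary .npy: exact values, dtype and shape preserved
restored = np.load(path)

print(restored.shape, restored.dtype)       # (5, 2048) float32
print(np.array_equal(restored, features))   # True
```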
@smci What I am trying to say is that not all 2048 numbers in the vector are present: each line of the txt file contains only 6 numbers, 3 before the `...` and 3 after it. Is it clearer now? — Mona Jalal, Oct 24 '18 at 02:36

