How do I apply a function to every row of a csv file and save the new data into a new file?

Question

I have the MNIST test set of 10,000 rows and I'm trying to apply a convolution kernel to every single row, but my code only produces the last row once it's done. Each row has been reshaped to 28,28. This is a snippet of the raw original data set: 10,000 rows of 784 numbers that correspond to MNIST data.

test_data_file = open("mnist_test.csv", 'r')      
test_data_list = test_data_file.readlines()    
test_data_file.close() 

for record in test_data_list:                  # test_data_list is all the values in the test file
        all_values = record.split(',')             # split each record (image) into values separated by commas
        correct_label = int(all_values[0])         # the first value is the label
        inputs = (numpy.asfarray(all_values[1:]))    
    
        original = numpy.asfarray(inputs.reshape((28,28)))    # the list is made into an array
        sharpen_kernel = np.array([
                    [0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]])  
    
        matplotlib.rcParams['figure.figsize'] = 20,20      # convolve your image with the kernel
        conv_image = numpy.ones((28,28))
    
    # make a subarray and convolve it with the kernel
        step = 3
        i=0
        while i < 25:
            i+=1
            j = 0
            while j < 25 :
                sub_image = original[i:(i+step),j:(j+step):]    
                sub_image = numpy.reshape(sub_image,(1,(step ** 2)))
                kernel = numpy.reshape(sharpen_kernel, ((step ** 2),1))
                conv_scalar = numpy.dot(sub_image,kernel)
                sharpened[i,j] = conv_scalar
                j+=1
            pass

This is what I get when I np.savetxt it into a new file. You see that it's only one single line. I want to produce a new csv file with ALL 10,000 rows after applying the kernel.

And when I plot my 'sharpened' image with matplotlib, I only get a single image. Do I have to use a count += 1 counter or add a new loop somewhere after the 'for record in ...' line? A very confused newbie.

you should start to use pandas library (https://pandas.pydata.org/) — bAN, Nov 04 '22 at 09:46
So basically, I'm trying to apply that convolution kernel to all 10,000 rows in my test_data_list csv - it has 10,000 rows of 784 numbers that correspond to the MNIST dataset. Each row has already been reshaped to 28,28. When I run the current code and print(sharpened), it only gives me the last row (the 10,000th), post kernel. What I want is a new csv file, probably written with np.savetxt after reshaping back to (1,784), but with the kernel applied to EVERY one of the 10,000 rows of the initial csv file. Am very new to python if you can't tell. — Moe, Nov 04 '22 at 09:59
I'm trying to get that kernel to apply to every MNIST row in the datafile so that I can get a new csv with the post-kernel images. — Moe, Nov 04 '22 at 14:05

Caleth · Answer 1 · 2022-11-04T10:15:10.363

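I'd recommend taking your loop body and moving it into a function that takes one record and returns one line of text. You can use numpy's array2string to get one line of output per line of input, as long as you flatten the 28x28 result and raise the line width first; otherwise the text wraps over several lines and keeps the nested brackets.

def process(record: str) -> str:
    # your loop's body, leaving the filtered 28x28 array in `sharpened`
    flat = sharpened.ravel()
    text = numpy.array2string(flat, separator=',', max_line_width=100000, threshold=flat.size + 1)
    return text.strip('[]') + '\n'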

with open("mnist_test.csv", 'r') as test_data_file:
    test_data_list = test_data_file.readlines()

with open("output.csv", 'w') as output_file:
    for record in test_data_list:
        output_file.write(process(record))
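
For reference, here is a fuller sketch of the same idea with the loop body filled in. It rests on a few assumptions that are not in the question: each row starts with the label followed by 784 pixel values (as the question's `all_values[0]` suggests), the image is zero-padded by one pixel so the filtered result stays 28x28 and can be written back as 784 values, and the label is kept as the first column of the output. The names `sharpen` and `process_file` are made up for illustration, and a plain `','.join(...)` is swapped in for `array2string` to build each line.

```python
import numpy as np

SHARPEN_KERNEL = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]], dtype=float)


def sharpen(image, kernel=SHARPEN_KERNEL):
    """Apply the 3x3 kernel at every pixel of a 2D image.

    Assumption: the border is handled by zero-padding, so the output has
    the same shape as the input. The kernel is symmetric, so sliding it
    without flipping gives the same result as a true convolution.
    """
    k = kernel.shape[0]
    padded = np.pad(image, k // 2)              # zeros around the border
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + k, j:j + k]   # 3x3 neighbourhood of (i, j)
            out[i, j] = np.sum(window * kernel)
    return out


def process(record):
    """Turn one CSV line (label + 784 pixels) into one filtered CSV line."""
    values = record.strip().split(',')
    label = values[0]
    image = np.array(values[1:], dtype=float).reshape(28, 28)
    filtered = sharpen(image)
    # Flatten back to 784 values and join them into a single line.
    return label + ',' + ','.join(f"{v:g}" for v in filtered.ravel()) + '\n'


def process_file(in_path, out_path):
    """Write one output line per input line, so no row overwrites another."""
    with open(in_path, 'r') as src, open(out_path, 'w') as dst:
        for record in src:
            if record.strip():                  # skip blank lines
                dst.write(process(record))
```

Calling `process_file("mnist_test.csv", "output.csv")` would then write one filtered row per input row. The original code only kept the last image because `sharpened` was overwritten on every pass through the `for` loop and nothing was written or stored inside it; writing inside the loop avoids that.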
