
I am not a Python programmer, but I need to use a method from the SciPy library. I just want to repeat the inner loop a few times, each time with a different index into the table. Here is my code so far:

from scipy.stats import pearsonr

fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []

for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))

And I am getting the following output:

0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)

As you can see, the index changes, but the function always returns the result I would get with index 0.

When I run the script several times, changing the index value each time like this:

attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))

everything is OK and I get the correct results, but I can't make it work in a single loop. What am I doing wrong?

EDIT: Test file:

62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1

Expected results of pearsonr for the 10 script runs:

data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)
  • put the `print(index)` and `print(pearsonr(attributeValue, classValue))` inside the second for loop by indenting it – Stack Jan 11 '18 at 19:55
  • But I can't call the **pearsonr** method on every inner-loop iteration. I want to fill those tables first and then call the method. It should be called 10 times. – Krzysiek Zienkiewicz Jan 11 '18 at 20:02
  • can you add a single line from `ILPDataset.txt` to your question to better understand – Stack Jan 11 '18 at 20:03
  • `65,0,0.7,0.1,187,16,18,6.8,3.3,0.9,1` – Krzysiek Zienkiewicz Jan 11 '18 at 20:04
  • Can you provide a few lines of the file and the expected result for `attributeValue`? – wwii Jan 11 '18 at 20:05
  • Or do you want to correlate item 0 with item 10 for each line then item 1 with item 10 for each line then item 2 with item 10 for each line? ...?? – wwii Jan 11 '18 at 20:11
  • I've edited the post and added a test file and the expected results for the 10 script runs. But I want to get those values in one script run. As you wrote, I want to correlate item 0 with item 10, then item 1 with item 10, and so on. – Krzysiek Zienkiewicz Jan 11 '18 at 20:17
  • Is there a reason you're not using the built in `csv` module to parse that file? – jpmc26 Jan 11 '18 at 23:25
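A minimal sketch of the csv suggestion in the last comment, assuming the file has no header row and only numeric fields, as in the test file above. io.StringIO stands in for the real file so the snippet runs on its own; for the real data you would pass open('ILPDataset.txt', newline='') instead.

```python
import csv
import io

# Three rows copied from the test file in the question; swap in
# open('ILPDataset.txt', newline='') to read the real file.
sample = io.StringIO(
    "62,1,6.8,3,542,116,66,6.4,3.1,0.9,1\n"
    "40,1,1.9,1,231,16,55,4.3,1.6,0.6,1\n"
    "63,1,0.9,0.2,194,52,45,6,3.9,1.85,2\n"
)

# csv.reader splits each line on commas (and handles quoting), so no manual
# strip()/split() is needed; blank lines come back as [] and are skipped.
rows = [[float(field) for field in row] for row in csv.reader(sample) if row]

print(rows[2])
```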

2 Answers


Turn each line of the file into a list of floats:

data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        line = [float(item) for item in line[:11]]
        data.append(line)
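Reading the file once into a list is also what fixes the loop in the question. A file object is an iterator: the first readlines() call consumes it, so on every later pass readlines() returns an empty list, the inner loop body never runs, and attributeValue and classValue still hold only the column-0 values from the first pass. A small sketch, with io.StringIO standing in for ILPDataset.txt, shows the effect:

```python
import io

# io.StringIO behaves like an open text file, so it stands in for ILPDataset.txt.
f = io.StringIO("62,1,6.8\n40,1,1.9\n")

first = f.readlines()   # consumes every line
second = f.readlines()  # the file position is already at the end

print(len(first), len(second))  # 2 0

# Reading the lines once into a list lets you loop over them as often as you like.
f.seek(0)
lines = f.readlines()
for index in range(3):
    # a fresh list on every pass, so columns do not pile up in one list
    column = [float(line.split(',')[index]) for line in lines]
    print(index, column)
```

Note that the question's code also never resets attributeValue and classValue between passes, so even a re-readable file would mix columns together; creating the lists inside the outer loop avoids that.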

Transpose the data so that each element of data holds the values of one column of the original file: data --> [[column 0 items], [column 1 items], [column 2 items], ...]

data = list(zip(*data))    # for Python 3.x (zip returns an iterator)
#data = zip(*data)    # for Python 2.7x (zip already returns a list)
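To see what zip(*data) does, here is a tiny example on the first three rows of the test file, cut down to four columns to keep it short. The * unpacks data so that each row becomes a separate argument to zip, which then groups the first items of every row, then the second items, and so on. In Python 3, zip returns a lazy iterator, which is why the list(...) call is needed before indexing with data[n].

```python
# First three rows of the test file, first four columns only.
data = [
    [62.0, 1.0, 6.8, 3.0],
    [40.0, 1.0, 1.9, 1.0],
    [63.0, 1.0, 0.9, 0.2],
]

columns = list(zip(*data))  # same as list(zip(data[0], data[1], data[2]))

print(columns[0])  # (62.0, 40.0, 63.0)
print(columns[2])  # (6.8, 1.9, 0.9)

# Indexing a bare zip object fails in Python 3, because it is an iterator.
try:
    zip(*data)[0]
except TypeError as exc:
    print("TypeError:", exc)
```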

Correlate:

from scipy.stats import pearsonr

for n in range(10):
    corr = pearsonr(data[n], data[10])
    print('data[{}], {}'.format(n, corr))
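Putting the three steps together, here is a self-contained sketch that runs on the 16-line test file from the question. It assumes Python 3 and SciPy, and embeds the test data with io.StringIO so it runs without ILPDataset.txt; replace that with open('ILPDataset.txt') for the real data. Unpacking the result into r and p works whether pearsonr returns a plain tuple (older SciPy) or a result object (newer SciPy).

```python
import io
from scipy.stats import pearsonr

# The test file from the question; use open('ILPDataset.txt') for the real data.
TEST_FILE = """\
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
"""

# Step 1: read every line exactly once into a list of float rows.
data = []
with io.StringIO(TEST_FILE) as f:
    for line in f:
        line = line.strip()
        if not line:  # ignore blank lines, e.g. a trailing newline at the end
            continue
        data.append([float(item) for item in line.split(',')[:11]])

# Step 2: transpose, so that data[n] holds column n of the file.
data = list(zip(*data))

# Step 3: correlate each attribute column (0-9) with the class column (10).
results = []
for n in range(10):
    r, p = pearsonr(data[n], data[10])
    results.append((r, p))
    print('data[{}] ({}, {})'.format(n, r, p))
```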
wwii

@wwii's answer is very good.

Just one suggestion: list(zip(*data)) seems like a bit of overkill to me. zip is really meant for combining lists of potentially different types and lengths into tuples, which in this case then have to be turned back into a list with list().

So why not just use a plain transpose operation, which is what this is?

import numpy

# ...

data = numpy.transpose(data)

which does the same job, probably faster (I haven't measured it) and more deterministically.
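A minimal sketch of the numpy variant on the first three rows of the test file. numpy.transpose turns the list of rows into a 2-D float array whose rows are the original columns, so arr[10] is the class column and arr[n] can be passed straight to pearsonr. One concrete difference from zip: zip silently stops at the shortest row, while numpy expects every row to have the same length (recent NumPy versions raise a ValueError for ragged rows).

```python
import numpy

# First three rows of the test file (all 11 columns).
rows = [
    [62, 1, 6.8, 3, 542, 116, 66, 6.4, 3.1, 0.9, 1],
    [40, 1, 1.9, 1, 231, 16, 55, 4.3, 1.6, 0.6, 1],
    [63, 1, 0.9, 0.2, 194, 52, 45, 6, 3.9, 1.85, 2],
]

arr = numpy.transpose(rows)  # one row per original column

print(arr.shape)  # (11, 3)
print(arr[10])    # the class column
print(arr[0])     # the first attribute column
```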

Oliver Schönrock