
I have a set of text files that I read using np.genfromtxt. Usually they are in a standard format: one text file for each plate measured, with each plate having 300 holes. This gives me headers of:

headers =['ID','Diameter','Radius','Xpos','Ypos']
#the data looks like
[1,105,53.002,784.023,91.76],
[2,104,51.552,787.023,91.71],
...
[300,104,51.552,787.023,91.71]

Now I have a set of text files that, instead of holding one measurement per hole, measure each hole twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[2,104,51.552,787.023,91.71],
[2,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or one hole in every two measured twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[3,104,51.552,787.023,91.71],
[3,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

or one hole in every three measured twice:

[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[4,104,51.552,787.023,91.71],
[4,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]

What I would like is a single method that takes the first value in each row, the 'ID', and averages however many rows share that same ID, so that I can then proceed with the rest of my code to analyse the results.

This is how I usually read in the 1 of 1 data:

dataA=np.genfromtxt(fname,dtype=float, delimiter='\t', names=True)
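
With names=True this returns a structured array, so the columns can be pulled out by header name, e.g.:

IDs = dataA['ID']        # first column, selected by header name
radii = dataA['Radius']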

And this line works fine when every hole in the text file has a duplicate row, i.e. a second measurement:

lines = open( 'filename.txt', "r" ).readlines()[::2]

Any ideas on how to get an output array with no duplicated IDs? Ideally it would hold the averages of the rows that share an ID, but unique rows may suffice.

Windy71
    What do you mean by "1 of 1 data"? And why are you skipping the duplicates if you want to "take an average of how ever many rows have that same ID"? – Akaisteph7 Jul 23 '19 at 14:04
    Also please clearly provide an expected input and output. Would you want the code to work in all four cases or just one? – Akaisteph7 Jul 23 '19 at 14:06
  • Hi, 1 of 1 means 1 measurement of 1 hole, 1 of 3 means 1 measurement and then skip the next two holes, etc. I would prefer to have averages of the rows with the same ID or, if that's not possible, to skip the measurements after the first one with that ID. – Windy71 Jul 23 '19 at 14:10
    That "output" is the same as the input. Didn't you say you wanted averages? – Akaisteph7 Jul 23 '19 at 14:13
  • All four cases in one method ideally. Sorry, I was giving the output format, output format with averages is: [1, 104.5, 53.007, 784.0235, 91.76], [4, 105, 51.542, 786.923, 91.7], [300, 104, 51.5545, 786.508, 91.655] – Windy71 Jul 23 '19 at 14:28
    Possible duplicate of [Group and Average Numpy Matrix](https://stackoverflow.com/questions/29291279/group-and-average-numpy-matrix) – Georgy Jul 23 '19 at 14:34
  • You still haven't shown what the expected output would be.. – Akaisteph7 Jul 23 '19 at 15:02
  • Hi, I edited the output comment to show the format and averages, three comments above this one. – Windy71 Jul 23 '19 at 16:23

1 Answer


You can use the code below. It will not average the rows, but it gets rid of the duplicate IDs by keeping the first row for each:

import numpy as np
a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]])
# return_index gives the position of the first occurrence of each unique ID;
# indexing with those positions keeps exactly one row per ID.
a[np.unique(a[:,0], return_index=True, axis=0)[1]]
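
If you also need the averages asked for in the question, one way is np.unique with return_inverse, which labels each row with the index of its ID group; the rows can then be summed per group and divided by the group sizes. A minimal sketch (the sample data and variable names here are illustrative, not from the question's files):

import numpy as np

data = np.array([[1, 105, 53.002, 784.023, 91.76],
                 [1, 104, 53.012, 784.024, 91.76],
                 [3, 104, 51.552, 787.023, 91.71],
                 [3, 106, 51.532, 786.823, 91.69]])

# Label each row with the index of its ID group (0, 0, 1, 1 here).
ids, inverse = np.unique(data[:, 0], return_inverse=True)

# Sum the rows of each group, then divide by the group sizes.
sums = np.zeros((ids.size, data.shape[1]))
np.add.at(sums, inverse, data)
counts = np.bincount(inverse)
averaged = sums / counts[:, None]
averaged[:, 0] = ids  # keep the IDs themselves un-averaged

This gives [[1, 104.5, 53.007, 784.0235, 91.76], [3, 105, 51.542, 786.923, 91.7]], and it handles all four layouts, because a hole measured only once simply forms a group of size one.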
Pritesh Gohil
  • Hi Pritesh, when I tried it with `import numpy as np` `a=[[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]]` `a[np.unique(a[:,0],return_index=True,axis=0)[1]]` `print(a)` I got "TypeError: list indices must be integers or slices, not tuple" as the error message; not sure what I've done wrong here. – Windy71 Jul 23 '19 at 16:29
  • Sorry that I only wrote the final solution. `import numpy as np` `a = np.array([[2,8,3,1], [3,2,3,3], [5,3,2,1], [1,4,2,3], [3,6,3,4], [2,4,5,6], [4,1,1,1]])`. We are using numpy.unique, so your data must be in a NumPy array; a plain Python list cannot be indexed with `[:,0]`. – Pritesh Gohil Jul 24 '19 at 07:58