I have a set of text files that I read using np.genfromtxt. Usually they are in a standard format, one text file for each plate measured, with each plate having 300 holes. This gives me headers of:
headers = ['ID', 'Diameter', 'Radius', 'Xpos', 'Ypos']
# the data looks like this:
[1,105,53.002,784.023,91.76],
[2,104,51.552,787.023,91.71],
...
[300,104,51.552,787.023,91.71]
Now I have a set of text files where, instead of one measurement per hole, each hole on the plate is measured twice:
[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[2,104,51.552,787.023,91.71],
[2,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]
or only one in every two holes, each measured twice:
[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[3,104,51.552,787.023,91.71],
[3,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]
or only one in every three holes, each measured twice:
[1,105,53.002,784.023,91.76],
[1,104,53.012,784.024,91.76],
[4,104,51.552,787.023,91.71],
[4,106,51.532,786.823,91.69],
...
[300,104,51.552,787.023,91.71],
[300,104,51.557,785.993,91.6]
What I would like is one method that takes the first value in each row (the 'ID') and, based on that, averages however many rows share that same ID, so that I can then proceed with the rest of my code to analyse the results.
This is how I usually read in the 1 of 1 data:
dataA=np.genfromtxt(fname,dtype=float, delimiter='\t', names=True)
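That gives me a structured array I index by column name, so my downstream analysis looks roughly like this (just an illustration of the kind of access I rely on, not my actual analysis code):

diameters = dataA['Diameter']   # one value per hole in the standard files
radii = dataA['Radius']
print(diameters.mean(), radii.mean())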
And this line works fine for the files where every hole has a duplicate row, i.e. exactly one second measurement:
lines = open( 'filename.txt', "r" ).readlines()[::2]
Any ideas on how to get a unique array as output, with no duplicated IDs? Ideally each row would be the average of the rows sharing that ID, but simply keeping one unique row per ID may suffice.
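I was imagining something along these lines, built around np.unique and np.bincount, but it's only a rough sketch (untested against my real files) and I'm not sure it's the right approach:

import numpy as np

dataA = np.genfromtxt(fname, dtype=float, delimiter='\t', names=True)

# unique IDs, plus an index mapping every row back to its unique ID
ids, inverse = np.unique(dataA['ID'], return_inverse=True)
counts = np.bincount(inverse)                # rows per ID: 1, 2, ...

# one averaged row per ID, same dtype and field names as the input
averaged = np.zeros(ids.size, dtype=dataA.dtype)
averaged['ID'] = ids
for name in dataA.dtype.names[1:]:           # every column except 'ID'
    sums = np.bincount(inverse, weights=dataA[name])
    averaged[name] = sums / counts

If that is sensible, it would presumably also leave the single-measurement files unchanged (every count is 1), so the same code path could handle all the file types above.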