My data looks like this:
    1516268134 49.95 99.982 49.95 0 0 0 0 0 0 0
    1516268134 49.95 99.966 49.95 0 0 0 0 0 0 0
    1516268134 49.95 100.28 49.95 0 0 0 0 0 0 0
    1516268134 49.95 100.01 49.95 0 0 0 0 0 0 0
    1516268134 49.95 100.10 49.95 0 0 0 0 0 0 0
    1516268134 49.95 99.773 49.95 0 0 0 0 0 0 0
    1516268134 49.95 99.246 49.95 0 0 0 0 0 0 0
    1516268134 49.95 144.89 49.95 0 0 0 0 0 0 0
    1516268135 49.95 55.700 49.95 0 0 0 0 0 0 0
    1516268135 49.95 99.441 49.95 0 0 0 0 0 0 0
The 2nd, 3rd and 4th columns are floats; the rest are integers. The separator is a tab.
I need to take N lines at a time and calculate the min/mean/max of each column, writing three output lines (min, mean, max) per chunk, like this:
    1516268134 49.950 55.700 49.950 0 0 0 0 0 0 0
    1516268134 49.950 99.939 49.950 0 0 0 0 0 0 0
    1516268135 49.9500 144.890 49.950 0 0 0 0 0 0 0
Again, the 2nd, 3rd and 4th columns are floats; the rest need to be integers. The separator is still a tab.
The code looks like this:
    import sys
    import pandas

    file=open(sys.argv[2], "w")
    for data in pandas.read_table(sys.argv[1], delim_whitespace=True, header=None, chunksize=int(sys.argv[3])):
        file.write("%d\t%f\t%f\t%f\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n" % (data[0].min(), data[1].min(), data[2].min(), data[3].min(), data[4].min(), data[5].min(), data[6].min(), data[7].min(), data[8].min(), data[9].min(), data[10].min()))
        file.write("%d\t%f\t%f\t%f\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n" % (data[0].mean(), data[1].mean(), data[2].mean(), data[3].mean(), data[4].mean(), data[5].mean(), data[6].mean(), data[7].mean(), data[8].mean(), data[9].mean(), data[10].mean()))
        file.write("%d\t%f\t%f\t%f\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n" % (data[0].max(), data[1].max(), data[2].max(), data[3].max(), data[4].max(), data[5].max(), data[6].max(), data[7].max(), data[8].max(), data[9].max(), data[10].max()))
    file.close()
I'd like to make the code shorter and cleaner (and easier to understand and maintain).
I tried replacing the 11 data[X].FUNC() calls with a single data.FUNC(), but that gave me the error "TypeError: %d format: a number is required, not Series".
Next I tried data.FUNC().convert_objects(convert_numeric=True), but that gave me the same error.
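A minimal reproduction of that error, assuming a small made-up two-column DataFrame in place of a real chunk (the exact wording of the TypeError message may differ between Python versions):

```python
import pandas as pd

# Made-up stand-in for one chunk: an integer column and a float column.
data = pd.DataFrame([[1, 2.5], [3, 4.5]])

# data.max() returns one Series holding a value per column, not separate numbers.
result = data.max()

try:
    # The % operator only unpacks tuples; a Series is treated as a single
    # argument, so the first %d receives the whole Series.
    "%d\t%f\n" % result
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

So the problem seems to be in how the Series is handed to the format string, not in the column types.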
How can I replace
data[0].max(), data[1].max(), data[2].max(), data[3].max(), data[4].max(), data[5].max(), data[6].max(), data[7].max(), data[8].max(), data[9].max(), data[10].max()
with something short and simple, and keep the float/int format in the data?
I was looking for a solution to convert data.FUNC() to 11 individual numbers, but failed.
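For illustration, here is a sketch of the kind of conversion meant above. It assumes that `tuple()` over the Series is acceptable and that `%d` truncating a float result (for example the mean of an integer column) is the desired behaviour. `FMT`, `format_row` and the two-row sample frame are hypothetical, not part of the original script.

```python
import pandas as pd

# Hypothetical format string: one integer, three floats, seven integers.
FMT = "%d\t%f\t%f\t%f" + "\t%d" * 7 + "\n"


def format_row(series):
    # tuple() turns the Series into positional values in column order,
    # which is what the % operator expects for several format fields.
    return FMT % tuple(series)


# Made-up two-row chunk with the same column layout as the real data.
data = pd.DataFrame([
    [1516268134, 49.95, 99.982, 49.95, 0, 0, 0, 0, 0, 0, 0],
    [1516268135, 49.95, 55.700, 49.95, 0, 0, 0, 0, 0, 0, 0],
])

# One formatted line per aggregate, in the order min, mean, max.
lines = [format_row(getattr(data, name)()) for name in ("min", "mean", "max")]
print("".join(lines), end="")
```

In the real loop, each string in `lines` would be passed to file.write() instead of being printed.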
-Paavo