I have some data stored in a fixed-width text file that I can read into pandas. However, I need to be able to save it back in the same format. I'm almost able to do this with numpy.savetxt(), but I can't get the format string to left-pad the values with zeros so that the column widths are preserved. I will describe the problem generally because I wouldn't mind seeing whether other solutions exist, within pandas for example. Here is what the data looks like:

19570010008.3980008.3150004.8380003.8390003.8470002.7150007.313
19570020008.7610008.8500009.0170009.1870009.3030009.4630004.479
19570030008.7090008.2880008.2660008.6920005.9340006.3410002.832
19570040008.5750008.9160009.2570009.7800009.9960010.4320009.518
19570050009.2030008.9530008.7690009.3770009.9450009.5650009.554
19570060009.5840008.9930009.4220010.0380009.8050010.4230009.965
19570070009.2030009.1210009.3770009.4600010.0290010.2850009.726
19570080002.6520002.5970002.6850003.9650002.7860002.8100003.657
19570090009.3830009.2140007.6890007.0390005.8230005.1310002.922
19570100008.0510008.6540009.1620008.4300008.9810009.0460005.027
19570110008.6200007.9140005.8870006.4840008.0130006.1190009.438
19570120009.5460009.3730009.3560009.7090009.4510009.1450008.531
19570130008.3750008.6330006.2340006.4720006.5210004.9730003.002
19570140005.2490004.5890002.8050002.8340002.9050002.9300003.024
19570150008.5760009.6430009.6230010.2590010.3760010.9220010.722
19570160009.8880009.6180009.7790009.8600010.6320006.6980011.374
19570170010.1370009.7760007.0580009.8330010.0330010.8690010.364
19570180010.3010009.9380010.1940010.8420010.6760010.9410011.221

and here is how I read it into my dataframe:

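from datetime import datetime, timedelta
import pandas as pd

#Define function to parse the dates (4-digit year followed by 3-digit day of year)
parse = lambda x: pd.Timestamp(datetime(int(x[0:4]), 1, 1) + timedelta(int(x[4:7]) - 1))

#Get the overall width (including the trailing newline)
with open("file.txt") as f:
    L = len(f.readline())

#Define column specifications: a 7-character date key, then 8-character value fields
specs = [(0,7)] + [(7+8*i, 15+8*i) for i in range((L-8)//8)]

#Load in the data
df = pd.read_fwf("file.txt", colspecs=specs, index_col=0, header=[0,1,2], parse_dates=True, date_parser=parse)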

I get a frame that looks like this:

In [62]: df

Out[62]:
            1       2       3       4       5       6       7
0
1957-01-01  8.398   8.315   4.838   3.839   3.847   2.715   7.313
1957-01-02  8.761   8.850   9.017   9.187   9.303   9.463   4.479
1957-01-03  8.709   8.288   8.266   8.692   5.934   6.341   2.832
1957-01-04  8.575   8.916   9.257   9.780   9.996   10.432  9.518
1957-01-05  9.203   8.953   8.769   9.377   9.945   9.565   9.554
1957-01-06  9.584   8.993   9.422   10.038  9.805   10.423  9.965
1957-01-07  9.203   9.121   9.377   9.460   10.029  10.285  9.726
1957-01-08  2.652   2.597   2.685   3.965   2.786   2.810   3.657
1957-01-09  9.383   9.214   7.689   7.039   5.823   5.131   2.922
1957-01-10  8.051   8.654   9.162   8.430   8.981   9.046   5.027
1957-01-11  8.620   7.914   5.887   6.484   8.013   6.119   9.438
1957-01-12  9.546   9.373   9.356   9.709   9.451   9.145   8.531
1957-01-13  8.375   8.633   6.234   6.472   6.521   4.973   3.002
1957-01-14  5.249   4.589   2.805   2.834   2.905   2.930   3.024
1957-01-15  8.576   9.643   9.623   10.259  10.376  10.922  10.722
1957-01-16  9.888   9.618   9.779   9.860   10.632  6.698   11.374
1957-01-17  10.137  9.776   7.058   9.833   10.033  10.869  10.364
1957-01-18  10.301  9.938   10.194  10.842  10.676  10.941  11.221
1957-01-19  6.731   10.010  6.034   9.781   10.556  10.336  10.798
1957-01-20  8.070   10.178  10.435  10.710  11.310  10.799  11.170
1957-01-21  10.720  10.256  10.513  10.788  11.195  11.465  11.750
1957-01-22  10.990  10.336  10.688  10.676  11.276  11.251  11.022
1957-01-23  10.890  10.418  10.577  11.729  11.261  11.532  11.712

This is fine; however, I need to be able to save it back in the same form that I got it, i.e., every field must occupy the same positions in each row, with the correct zero padding to make that happen. Is there a simple way to do this from pandas or numpy?

Here is what I've tried using numpy.savetxt():

import numpy as np

#Convert first column back to the way it was found, using the index
df.index = [int(str(d.year) + str(d.dayofyear).zfill(3)) for d in df.index]
df = df.reset_index()

#List of format strings for each column
formats = ['%i'] + ['%04.3f' for i in range((L-8)//8)]
#Save using empty string as delimiter
np.savetxt("testing.txt", df.values, fmt=formats, delimiter='')

The output of this attempt is something like this:

19570018.3988.3154.8383.8393.8472.7157.313
19570028.7618.8509.0179.1879.3039.4634.479
19570038.7098.2888.2668.6925.9346.3412.832
19570048.5758.9169.2579.7809.99610.4329.518
19570059.2038.9538.7699.3779.9459.5659.554
19570069.5848.9939.42210.0389.80510.4239.965
19570079.2039.1219.3779.46010.02910.2859.726
19570082.6522.5972.6853.9652.7862.8103.657
19570099.3839.2147.6897.0395.8235.1312.922
19570108.0518.6549.1628.4308.9819.0465.027
19570118.6207.9145.8876.4848.0136.1199.438
19570129.5469.3739.3569.7099.4519.1458.531
19570138.3758.6336.2346.4726.5214.9733.002
19570145.2494.5892.8052.8342.9052.9303.024
19570158.5769.6439.62310.25910.37610.92210.722
19570169.8889.6189.7799.86010.6326.69811.374

So, as I mentioned, the left padding with zeros is not occurring, although I thought I had specified it in the format string.
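A quick way to see what the format string is doing is to apply it to a single value. This is only a minimal sketch using plain Python `%` formatting (which is what `fmt` in numpy.savetxt() is applied with); the variable names are illustrative.

```python
value = 8.398

# In '%0W.Pf', W is the minimum TOTAL field width (digits, '.', and decimals),
# not the number of digits before the decimal point.
too_narrow = '%04.3f' % value   # '8.398' is already 5 characters, so nothing is padded
full_width = '%08.3f' % value   # zero-padded up to 8 characters

print(repr(too_narrow))  # '8.398'
print(repr(full_width))  # '0008.398'
```

With a width of 4 the formatted number is already wider than the field, so no zeros are ever added.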

pbreach
  • Try `'%08.3f'`; the 8 means the total field width, not the number of digits before the "." – HYRY Jan 18 '15 at 11:55
  • Possible duplicate of: http://stackoverflow.com/questions/34010451/specific-column-widths-and-alignment-with-savetxt – albert Dec 01 '15 at 00:37
  • @albert Since this question was asked nearly a year before, wouldn't that technically make the question you referenced a duplicate? – pbreach Dec 01 '15 at 00:57
  • That's true. However, I found this question after answering the other and mixed those two questions up a bit. Is there a way to tidy up this mess? – albert Dec 01 '15 at 01:03
  • Idk, I would just leave it seeing as the answer you gave in the other was clear assuming it solves OP's question. It would have solved this one as well (even though all I needed was @HYRY 's comment to solve but I must have been too busy to respond at the time...) – pbreach Dec 01 '15 at 01:11
  • I mean you could just answer this one and mark other as duplicate if you really wanted, doesn't matter to me. – pbreach Dec 01 '15 at 01:13
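
HYRY's suggestion from the comments can be sketched roughly as follows. This is a minimal example, not a tested drop-in for the real file: the two rows are copied from the sample data in the question, the array stands in for `df.values` after `reset_index()`, and an in-memory buffer stands in for "testing.txt".

```python
import io
import numpy as np

# Two sample rows: the YYYYDDD key followed by seven readings.
rows = np.array([
    [1957001, 8.398, 8.315, 4.838, 3.839, 3.847, 2.715, 7.313],
    [1957004, 8.575, 8.916, 9.257, 9.780, 9.996, 10.432, 9.518],
])

# '%08.3f' = zero-padded, total width 8, three decimals.
# The key is already 7 digits, so '%i' needs no padding here.
formats = ['%i'] + ['%08.3f'] * (rows.shape[1] - 1)

buf = io.StringIO()  # stand-in for open("testing.txt", "w")
np.savetxt(buf, rows, fmt=formats, delimiter='')
text = buf.getvalue()
print(text)
```

The printed lines should match the first and fourth rows of the original file, which suggests the only change needed in the question's attempt is the width in the format string.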

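
If a route that stays within pandas and plain Python is preferred over numpy.savetxt(), one possibility is to format each row by hand. The sketch below assumes a frame shaped like the one in the question (dates as the index, float readings as columns); the small frame and the helper name `to_fixed_width` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the frame in the question.
df = pd.DataFrame(
    [[8.398, 8.315, 4.838], [8.575, 8.916, 10.432]],
    index=pd.to_datetime(["1957-01-01", "1957-01-04"]),
    columns=[1, 2, 3],
)

def to_fixed_width(frame):
    """Render each row as YYYYDDD followed by zero-padded 8-character fields."""
    lines = []
    for stamp, row in zip(frame.index, frame.to_numpy()):
        key = stamp.strftime("%Y%j")  # year plus zero-padded day of year
        fields = "".join(f"{v:08.3f}" for v in row)
        lines.append(key + fields)
    return "\n".join(lines) + "\n"

text = to_fixed_width(df)
print(text)
```

The resulting string can be written out with an ordinary `open(...).write(text)`. Using `strftime("%Y%j")` also avoids rebuilding the key by hand from `year` and `dayofyear`.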