Saving header from CSV file using `numpy.genfromtxt()`

Question

I'm using numpy.genfromtxt() to read in a CSV file, and I'd like to save the header separately from the data below the header.

I know that the skip_header=1 parameter lets me skip the header, but then the header is lost, and I'd like to keep it. I tried using the skip_footer parameter to skip everything below the header and keep only the header, by setting skip_footer to one less than the length of the CSV file: skip_footer=(len('filename.csv')-1). The code runs, but it doesn't give the right output. Somehow, numpy.genfromtxt() doesn't count the rows of the CSV file the way I'm imagining.

header = numpy.genfromtxt('filename.csv', delimiter=',', skip_footer=(len('filename.csv')-1))

I expected to get just the header as a 1-D NumPy array; instead, I get something resembling the whole array:

[[      nan       nan       nan ...       nan       nan       nan]
 [2.016e+03 1.000e+00 1.000e+00 ... 1.165e+01 6.999e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 8.000e+00 5.430e+01 1.000e+00]
 ...
 [2.016e+03 6.000e+00 3.000e+01 ... 0.000e+00 4.630e+01 2.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.750e+00 5.255e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.880e+00 5.822e+01 1.000e+00]]

I want to keep just what's in the top row of nans.

Use of `max_rows` makes more sense than `skip_footer` (which requires reading the whole file). — hpaulj, Jun 09 '19 at 18:30
Thanks, @hpaulj, that worked. You just have to be careful to specify the `dtype` parameter as `dtype=str` in this case. See answer below. — Data2Dollars, Jun 09 '19 at 18:58
See https://stackoverflow.com/a/48706350/1021819 for a clever use of `json.dumps()` for populating such a header — jtlz2, May 27 '22 at 07:42

score 2 · Accepted Answer · answered Jun 09 '19 at 19:01

SOLUTION:

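import numpy as np
# Read only the first row, as strings, so the column names are not converted to nan.
header = np.genfromtxt('filename.csv', delimiter=',', dtype=str, max_rows=1)
print(header)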

OUTPUT:

['pickup_year' 'pickup_month' 'pickup_day' 'pickup_dayofweek'
 'pickup_time' 'pickup_location_code' 'dropoff_location_code'
 'trip_distance' 'trip_length' 'fare_amount' 'fees_amount' 'tolls_amount'
 'tip_amount' 'total_amount' 'payment_type']
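
For anyone wanting both the header and the data, here is a minimal sketch of reading them in two passes. It also shows why the attempt in the question kept nearly every row: `len('filename.csv')` measures the file name string, not the number of rows in the file. The three-column sample file written below is a hypothetical stand-in for the real CSV, and its values are illustrative only.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample file standing in for 'filename.csv' (illustrative values).
csv_text = "pickup_year,pickup_month,fare_amount\n2016,1,11.65\n2016,6,8.75\n"
path = os.path.join(tempfile.mkdtemp(), "filename.csv")
with open(path, "w") as f:
    f.write(csv_text)

# len() on the file name counts characters in the string, not rows in the file.
print(len("filename.csv"))  # 12

# First pass: read only the first row, as strings, to capture the header.
header = np.genfromtxt(path, delimiter=",", dtype=str, max_rows=1)

# Second pass: skip the header row and parse the rest as floats.
data = np.genfromtxt(path, delimiter=",", skip_header=1)

print(header)
print(data.shape)
```

Reading the file twice is simple but not the only option; it keeps the header as a plain array of strings while the data stays a regular float array.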
