This is a confusing problem, and it stems from the fact that there are many layers of legacy classes and data structures in astropy.io.fits
(stemming back from earlier versions of PyFITS). For example, you can see in your example that hdu.data
is a FITS_rec
object, which is like a Numpy recarray (itself a soft-deprecated legacy class), but it also has a .columns
attribute (as you've noted):
>>> bug.columns
ColDefs(
name = 'target'; format = '20A'
name = 'V_mag'; format = 'E'
)
This in turn actually holds references back to the original arrays from which you described the columns. For example:
>>> bug.columns['target'].array
chararray(['NGC1001', 'NGC1002', 'NGC1003'],
dtype='|S20')
You can see here that even though bug
is a "slice" of your original table, the arrays referenced through bug.columns
are still contain the original, unsliced array data. So when you do something like in your original post
>>> col0 = bug.columns
>>> col1 = fits.ColDefs([fits.Column(name='new',format='D',array=bug.field("V_mag")+1.)])
it's doing its best here to figure out the intent, but col0
here has no idea that bug
is a slice of the original table anymore, it only has the original "coldefs" with the full columns to rely on here.
Most of these classes, including FITS_rec
, Column
, and especially ColDefs
almost never need to be used directly anymore. Unfortunately not all of the documentation has been updated to reflect this fact, and there are a lot of older tutorials and example code that show usage of these classes. Nobody with the requisite expertise has been able to take the time to update the docs and clarify this point.
On occasion Column
is useful if you already have columnar data with each column in a separate array, and you want to build a table from it and give some specific FITS attributes to the table columns. But I have redesigned much of the API so that you can take native Python data structures like Numpy arrays and save them to FITS files without worrying about the details of how FITS is implemented or annoying things like FITS data format codes in many cases.
This work is slightly incomplete, because it seems if you want to define a FITS table from some columnar arrays, you still need to use the Column
class and specify a FITS format at a minimum (but you never need to use ColDefs
directly):
>>> hdu = fits.BinTableHDU.from_columns([fits.Column(name='target', format='20A', array=a1), fits.Column(name='V_mag', format='E', array=a2)])
>>> hdu.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
However, you can also work with Numpy structured arrays directly, and I usually find that simpler personally, as it allows you to ignore most FITS-isms and just focus on your data, for those cases where it's not important to finely tweak the FITS-specific stuff. For example, to define a structured array for your data, there are several ways to go about that, but you might try:
>>> nrows = 3
>>> data = np.empty(nrows, dtype=[('target', 'S20'), ('V_mag', np.float32)])
>>> data['target'] = a1
>>> data['V_mag'] = a2
>>> data
array([('NGC1001', 11.100000381469727), ('NGC1002', 12.300000190734863),
('NGC1003', 15.199999809265137)],
dtype=[('target', 'S20'), ('V_mag', '<f4')])
and then you can instantiate a BinTableHDU
directly from this array:
>>> hdu = fits.BinTableHDU(data)
>>> hdu.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
>>> hdu.header
XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 24 / length of dimension 1
NAXIS2 = 3 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 2 / number of table fields
TTYPE1 = 'target '
TFORM1 = '20A '
TTYPE2 = 'V_mag '
TFORM2 = 'E '
Likewise when it comes to things like masking and slicing and adding new columns, working directly with the native Numpy data structures is best.
Or, as suggested in the answers to other question, use the Astropy Table
API and don't mess with low-level FITS stuff at all if you can help it. Because as I discussed, it contains several layers of legacy interfaces that make things confusing (and that long term should probably be cleaned up, but it's hard to do because code that uses them in some way or another are pervasive). The Table API was designed from the ground-up to make table manupulations, including things like masking rows and adding columns, relatively easy. Whereas the old PyFITS APIs never quite worked for many simple cases.
I hope this answer was edifying--I know it's maybe a bit long and confusing. If there is anything specific I can clear up let me know.