I am working on a Python utility to get data from the Tycho-2 star catalogue. One of its functions queries the catalogue and returns all the information for a given star ID (or set of star IDs).
I'm currently doing this by looping through the lines of the catalogue file and parsing a line into a numpy structured array only if it matches the query. (If there is a better way to do this, feel free to let me know, even though it is not what this question is about. I'm doing it this way because the catalogue is too big to load into memory all at once.)
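A rough sketch of that line-by-line loop is below. The function name `iter_matching_records` and the `(TYC1, TYC2, TYC3)` tuple format for the IDs are just illustrative choices, and it assumes every line starts with the three space-separated ID numbers followed by `|`:

```python
def iter_matching_records(path, wanted_ids):
    """Lazily yield catalogue lines whose star id is in wanted_ids.

    wanted_ids is an iterable of (TYC1, TYC2, TYC3) integer tuples
    (a hypothetical convention chosen for this sketch).
    """
    wanted = set(wanted_ids)
    with open(path, 'r', encoding='ascii') as catalogue:
        # Iterating over the file object reads one line at a time,
        # so the whole catalogue is never held in memory.
        for line in catalogue:
            # The text before the first '|' holds the three id numbers.
            id_text = line.split('|', 1)[0]
            star_id = tuple(int(part) for part in id_text.split())
            if star_id in wanted:
                yield line.rstrip('\n')
```

Each yielded line is still a plain string, so the parsing step is left untouched here.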
Anyway, once I have identified a record that I want to keep, I run into a problem: I can't figure out how to parse it into a structured array.
For instance, say the record I want to keep is:
record = '0002 00038 1| | 3.64121230| 1.08701186| 14.1| -23.0| 69| 82| 1.8| 1.9|1968.56|1957.30| 3|1.0|3.0|0.9|3.0|12.444|0.213|11.907|0.189|999| | | 3.64117944| 1.08706861|1.83|1.73| 81.0|104.7| | 0.0'
Now, I am trying to parse this into a numpy structured array with dtype:
dform = [('starid', [('TYC1', int), ('TYC2', int), ('TYC3', int)]),
         ('pflag', str),
         ('starBearing', [('rightAscension', float), ('declination', float)]),
         ('properMotion', [('rightAscension', float), ('declination', float)]),
         ('uncertainty', [('rightAscension', int), ('declination', int), ('pmRA', float), ('pmDc', float)]),
         ('meanEpoch', [('rightAscension', float), ('declination', float)]),
         ('numPos', int),
         ('fitGoodness', [('rightAscension', float), ('declination', float), ('pmRA', float), ('pmDc', float)]),
         ('magnitude', [('BT', [('mag', float), ('err', float)]), ('VT', [('mag', float), ('err', float)])]),
         ('starProximity', int),
         ('tycho1flag', str),
         ('hipparcosNumber', str),
         ('observedPos', [('rightAscension', float), ('declination', float)]),
         ('observedEpoch', [('rightAscension', float), ('declination', float)]),
         ('observedError', [('rightAscension', float), ('declination', float)]),
         ('solutionType', str),
         ('correlation', float)]
This seems like it should be a fairly simple thing to do but everything I try breaks...
I've tried:
np.genfromtxt(BytesIO(record.encode()),dtype=dform,delimiter=(' ','|'))
np.genfromtxt(BytesIO(record.encode()),dtype=dform,delimiter=(' ','|'),missing_values=' ',filling_values=None)
both of which give me
{TypeError}cannot perform accumulate with flexible type
which makes no sense since it shouldn't be doing any accumulation.
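My only guess, which I have not confirmed, comes from the genfromtxt documentation: it says `delimiter` may be a string, an integer, or a sequence of integers giving the width of each field. So a tuple like `(' ','|')` may be getting read as field widths rather than as two alternative separators, and presumably those widths are summed somewhere to find the column boundaries. A small check of the two documented forms, using made-up sample data:

```python
import numpy as np
from io import StringIO

# A sequence of integers is treated as fixed field widths:
# each line is cut into a 3-character column and another 3-character column.
fixed = np.genfromtxt(StringIO("123456\n789012"), delimiter=(3, 3))

# A single string is treated as the separator between values.
piped = np.genfromtxt(StringIO("1.5|2.5|4.0"), delimiter='|')

print(fixed)
print(piped)
```

If that reading is right, there is no way to pass two different separator characters through `delimiter`.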
I've also tried
np.array(re.split('\|| ',record),dtype=dform)
which complains
{TypeError}a bytes-like object is required, not 'str'
and another variant
np.array([x.encode() for x in re.split('\|| ',record)],dtype=dform)
which doesn't throw an error, but certainly doesn't return the correct results either:
[ ((842018864, 0, 0), '', (0.0, 0.0), (0.0, 0.0), (0, 0, 0.0, 0.0), (0.0, 0.0), 0, (0.0, 0.0, 0.0, 0.0), ((0.0, 0.0), (0.0, 0.0)), 0, '', '', (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), '', 0.0)...
So how can I do this? I think genfromtxt is the way to go (especially since some fields may occasionally be missing), but I don't understand why it isn't working. Am I just going to have to write my own parser?
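For reference, the kind of hand-written parser I have in mind would look roughly like the sketch below. It makes several assumptions that are mine and not from the catalogue documentation: the string fields get explicit widths (`'U1'` and `'U9'` are guesses, since a bare `str` field has zero width), blank floats become NaN, blank integers become -1, and the helper names `to_float`, `to_int` and `parse_record` are made up for illustration.

```python
import numpy as np

radec = [('rightAscension', float), ('declination', float)]
four = radec + [('pmRA', float), ('pmDc', float)]
mag = [('mag', float), ('err', float)]

# Same layout as dform, but with explicit (assumed) string widths.
dform_fixed = [
    ('starid', [('TYC1', int), ('TYC2', int), ('TYC3', int)]),
    ('pflag', 'U1'),
    ('starBearing', radec),
    ('properMotion', radec),
    ('uncertainty', [('rightAscension', int), ('declination', int),
                     ('pmRA', float), ('pmDc', float)]),
    ('meanEpoch', radec),
    ('numPos', int),
    ('fitGoodness', four),
    ('magnitude', [('BT', mag), ('VT', mag)]),
    ('starProximity', int),
    ('tycho1flag', 'U1'),
    ('hipparcosNumber', 'U9'),
    ('observedPos', radec),
    ('observedEpoch', radec),
    ('observedError', radec),
    ('solutionType', 'U1'),
    ('correlation', float),
]


def to_float(text):
    """Convert a field to float; blank fields become NaN (assumed sentinel)."""
    text = text.strip()
    return float(text) if text else np.nan


def to_int(text, missing=-1):
    """Convert a field to int; blank fields become `missing` (assumed sentinel)."""
    text = text.strip()
    return int(text) if text else missing


def parse_record(record):
    """Parse one '|'-delimited catalogue line into a 1-element structured array."""
    f = record.split('|')

    def floats(start, stop):
        return tuple(to_float(x) for x in f[start:stop])

    # The nesting of this tuple mirrors the nesting of dform_fixed.
    row = (
        tuple(int(part) for part in f[0].split()),  # TYC1, TYC2, TYC3
        f[1].strip(),
        floats(2, 4),
        floats(4, 6),
        (to_int(f[6]), to_int(f[7]), to_float(f[8]), to_float(f[9])),
        floats(10, 12),
        to_int(f[12]),
        floats(13, 17),
        (floats(17, 19), floats(19, 21)),
        to_int(f[21]),
        f[22].strip(),
        f[23].strip(),
        floats(24, 26),
        floats(26, 28),
        floats(28, 30),
        f[30].strip(),
        to_float(f[31]),
    )
    return np.array([row], dtype=dform_fixed)


record = '0002 00038 1| | 3.64121230| 1.08701186| 14.1| -23.0| 69| 82| 1.8| 1.9|1968.56|1957.30| 3|1.0|3.0|0.9|3.0|12.444|0.213|11.907|0.189|999| | | 3.64117944| 1.08706861|1.83|1.73| 81.0|104.7| | 0.0'
parsed = parse_record(record)
print(parsed['starid'], parsed['magnitude']['VT']['mag'])
```

This works on the sample record, but I would much rather rely on something built in than maintain the field indices by hand.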