I am downloading a star catalog from Vizier (using astroquery). The catalog concerned does not include star names, so I am getting these from SIMBAD (also using astroquery) by querying all SIMBAD stars within 1 arcsec of each of my Vizier catalog stars.
I then need to match the two sets of stars by their ra/dec coordinates. However, both the Vizier and SIMBAD coordinates may be slightly inaccurate, so I can't do an exact match.
My current solution is to specify a tolerance and, for each Vizier star, call the function below to loop through the SIMBAD stars, testing whether the coordinates agree within the specified tolerance. As a double-check, because stars can be very close together, I also check that the V magnitudes agree when rounded to one decimal place (i.e. to roughly 0.1 mag).
This all works, but for a Vizier catalog of c. 2,000 stars and a SIMBAD dataset of similar size it takes over 2 minutes to run. I'm looking for ideas to speed this up.
def get_simbad_name(self, vizier_star, simbad_stars, tolerance):
    """
    Searches simbad_stars to find the SIMBAD name of the star
    referenced in vizier_star.

    A match is deemed to exist if a star in simbad_stars has both
    ra and dec within +/- tolerance of the target vizier_star and if
    their V magnitudes, rounded to one decimal place, also match.

    Parameters
    ==========
    vizier_star : astropy.table.Row
        Row of results from Vizier query, corresponding to a star in a
        Vizier catalog. Columns of interest to this function are:
            '_RAJ2000' : float [Right ascension in decimal degrees]
            '_DEJ2000' : float [Declination in decimal degrees]
            'Vmag' : float [V magnitude (to 3 decimal places)]
    simbad_stars : list of dict
        List of star data derived from a SIMBAD query. Keys of interest
        to this function are:
            'ra' : float [Right ascension in decimal degrees (ICRS/J2000)]
            'dec' : float [Declination in decimal degrees (ICRS/J2000)]
            'Vmag' : float [V magnitude (to 3 decimal places)]
            'name' : str [SIMBAD primary id of star]
    tolerance : float
        The tolerance, in degrees, to be used in determining whether
        the ra/dec coordinates match.

    Returns
    =======
    name : str
        If match then returns the SIMBAD name. If no match returns
        an empty string.

    Notes
    =====
    simbad_stars are not all guaranteed to have Vmag. Any that don't are
    ignored.
    """
    for item in simbad_stars:
        # Skip SIMBAD entries that have no V magnitude.
        try:
            approx_Vmag = round(item['Vmag'], 1)
        except KeyError:
            continue
        # Box test on ra and dec, plus the rounded-magnitude check.
        if ((vizier_star['_RAJ2000'] > item['ra'] - tolerance) and
                (vizier_star['_RAJ2000'] < item['ra'] + tolerance) and
                (vizier_star['_DEJ2000'] > item['dec'] - tolerance) and
                (vizier_star['_DEJ2000'] < item['dec'] + tolerance) and
                (round(vizier_star['Vmag'], 1) == approx_Vmag)):
            return item['name']
    return ''
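
One thing worth noting about the magnitude test above: comparing values rounded to one decimal place is not quite the same as requiring them to agree within 0.1 mag, because two close values can fall either side of a rounding boundary. The short sketch below illustrates the difference. The helper names `same_rounded_mag` and `within_mag_tolerance` and the `mag_tolerance` parameter are hypothetical and are not part of the function above.

```python
def same_rounded_mag(mag_a, mag_b):
    """Rule used in the function above: equal after rounding to 1 decimal place."""
    return round(mag_a, 1) == round(mag_b, 1)


def within_mag_tolerance(mag_a, mag_b, mag_tolerance=0.1):
    """Possible alternative: absolute difference no larger than mag_tolerance."""
    return abs(mag_a - mag_b) <= mag_tolerance


print(same_rounded_mag(5.04, 5.06))      # False: the values round to 5.0 and 5.1
print(within_mag_tolerance(5.04, 5.06))  # True: they differ by only 0.02 mag
```

Whether this matters in practice depends on how closely the Vizier and SIMBAD magnitudes agree for the catalog in question.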
Some more thoughts after the comments:
The match success rate is very high (c. 99%), so the loop exits early in almost all cases. It doesn’t have to iterate over all of simbad_stars.
I could improve things further if I pre-sort simbad_stars by ra and use a binary chop to get the index of where to start the loop.
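
The pre-sort and binary chop idea might look something like the sketch below, which uses the standard-library `bisect` module. The names `build_ra_index` and `get_simbad_name_sorted` are made up for illustration, and they are written as standalone functions rather than methods. The matching rule is intended to be the same as in `get_simbad_name` (apart from floating-point rounding at the very edges of the window), but where several SIMBAD stars match, this version returns the first in ra order rather than the first in the original list order. Like the original, it does not handle ra wrap-around at 0/360 degrees.

```python
from bisect import bisect_left, bisect_right


def build_ra_index(simbad_stars):
    """Sort the SIMBAD entries by ra once; return (sorted_stars, ra_values)."""
    sorted_stars = sorted(simbad_stars, key=lambda star: star['ra'])
    ra_values = [star['ra'] for star in sorted_stars]
    return sorted_stars, ra_values


def get_simbad_name_sorted(vizier_star, sorted_stars, ra_values, tolerance):
    """Apply the original matching rule, but only inside the ra window."""
    ra = vizier_star['_RAJ2000']
    dec = vizier_star['_DEJ2000']
    approx_vmag = round(vizier_star['Vmag'], 1)

    # Binary search for the slice where ra - tolerance < item['ra'] < ra + tolerance.
    lo = bisect_right(ra_values, ra - tolerance)
    hi = bisect_left(ra_values, ra + tolerance)

    for item in sorted_stars[lo:hi]:
        # Entries without a V magnitude are ignored, as in the original.
        if 'Vmag' not in item:
            continue
        if (abs(dec - item['dec']) < tolerance and
                round(item['Vmag'], 1) == approx_vmag):
            return item['name']
    return ''
```

The index would be built once before the loop over the Vizier stars, for example `sorted_stars, ra_values = build_ra_index(simbad_stars)`, and then passed to `get_simbad_name_sorted` for each star. Each lookup then only inspects the few SIMBAD entries whose ra falls inside the tolerance window instead of scanning the list from the start.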