If you work with astronomy data for more than a few days, you eventually run into TAP.
TAP stands for Table Access Protocol. It is an IVOA standard for querying astronomical catalogs through web services. In practice, it lets archives expose their tables as remote relational databases, and you query them with ADQL, which is basically SQL with celestial-coordinate functions.
For a data astronomer, knowing TAP is roughly what knowing SQL is for ordinary data engineering. Gaia, NOIRLab Astro Data Lab, VizieR, MAST and several radio archives use it.
The basic architecture
The shape is simple:
Client (Python, TOPCAT, browser)
|
| HTTP POST/GET
v
TAP server ---> relational database
|
v
Result (VOTable, FITS, CSV, JSON)
A TAP service usually exposes:
- {base_url}/sync for synchronous queries, good for small and fast requests
- {base_url}/async for asynchronous jobs, needed when the query is slow or large
- {base_url}/tables to inspect schemas and tables
- {base_url}/capabilities to see what the server supports
For quick tests, /sync is fine. For real catalog work, /async is safer.
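To make the endpoint layout concrete, here is a minimal sketch that builds the four URLs and the form body for a synchronous query using only the standard library. The helper names `tap_endpoints` and `sync_request` are made up for illustration, and the `FORMAT` parameter follows TAP 1.0 (TAP 1.1 servers may prefer `RESPONSEFORMAT`), so treat it as an assumption to check against the service's /capabilities.

```python
from urllib.parse import urlencode


def tap_endpoints(base_url):
    """Map each standard TAP resource name to its full URL (hypothetical helper)."""
    base = base_url.rstrip("/")
    return {name: f"{base}/{name}"
            for name in ("sync", "async", "tables", "capabilities")}


def sync_request(base_url, adql, fmt="csv"):
    """Return (url, form_body) for a synchronous TAP query; nothing is sent."""
    params = {
        "REQUEST": "doQuery",  # standard TAP request type
        "LANG": "ADQL",        # query language
        "FORMAT": fmt,         # assumption: TAP 1.0 parameter name
        "QUERY": adql,
    }
    return tap_endpoints(base_url)["sync"], urlencode(params)


url, body = sync_request(
    "https://gea.esac.esa.int/tap-server/tap",
    "SELECT TOP 5 source_id FROM gaiadr3.gaia_source",
)
print(url)
print(body)
```

Nothing here touches the network. Passing `url` and `body.encode()` to `urllib.request.urlopen` would send the query as an HTTP POST, which is roughly what the client libraries do for you.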
ADQL
ADQL is based on a subset of SQL-92, with astronomy extensions. The part that matters most is geometry:
| Function | Use |
|---|---|
| POINT('ICRS', ra, dec) | A point on the sky |
| CIRCLE('ICRS', ra, dec, r) | A cone centered at RA/Dec with radius r in degrees |
| BOX('ICRS', ra, dec, w, h) | A rectangular sky region centered at RA/Dec, width w and height h in degrees |
| POLYGON('ICRS', ...) | An arbitrary polygon |
| CONTAINS(a, b) | Returns 1 if geometry a lies inside geometry b, otherwise 0 |
| INTERSECTS(a, b) | Returns 1 if the two geometries overlap, otherwise 0 |
| DISTANCE(a, b) | Angular distance between two points, in degrees |
One thing that always bites people coming from SQL: use TOP, not LIMIT.
SELECT TOP 100 source_id, ra, dec, parallax
FROM gaiadr3.gaia_source
WHERE parallax > 10
Without TOP or a strong filter, it is very easy to request a huge table and hit a timeout.
Cone search
A cone search is the standard “give me everything around this coordinate” query:
SELECT TOP 1000 *
FROM survey.catalog
WHERE CONTAINS(
POINT('ICRS', ra, dec),
CIRCLE('ICRS', 153.26, -1.61, 1.5)
) = 1
The radius is in degrees. If you want one arcsecond, use about 0.000278.
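Since the degree-only radius is an easy place to slip, a small helper can do the unit conversion and assemble the query text. This is only a sketch: `arcsec_to_deg` and `cone_query` are illustrative names, and the default column names `ra` and `dec` are assumptions that will not hold for every table.

```python
def arcsec_to_deg(arcsec):
    """Convert an angle from arcseconds to degrees."""
    return arcsec / 3600.0


def cone_query(table, ra_deg, dec_deg, radius_deg, top=1000,
               ra_col="ra", dec_col="dec"):
    """Build the ADQL text for a cone search (hypothetical helper).

    Column names differ between archives, so ra_col and dec_col
    are parameters rather than hard-coded.
    """
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM {table}\n"
        f"WHERE CONTAINS(\n"
        f"  POINT('ICRS', {ra_col}, {dec_col}),\n"
        f"  CIRCLE('ICRS', {ra_deg!r}, {dec_deg!r}, {radius_deg!r})\n"
        f") = 1"
    )


# The 1.5 degree cone from the text, then a 1 arcsecond cone on the same spot.
print(cone_query("survey.catalog", 153.26, -1.61, 1.5))
print(cone_query("survey.catalog", 153.26, -1.61, arcsec_to_deg(1.0), top=10))
```

The helper only does string formatting, so it is safe for values you control but should not be fed untrusted input.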
Cross-matching catalogs
Spatial joins are where TAP becomes really useful. For example, matching a DESI source to nearby Gaia sources:
SELECT d.targetid, d.rv_adop, g.source_id, g.pmra, g.pmdec,
DISTANCE(POINT('ICRS', d.target_ra, d.target_dec),
POINT('ICRS', g.ra, g.dec)) AS sep_deg
FROM desi_dr1.mws AS d
JOIN gaiadr3.gaia_source AS g
ON CONTAINS(
POINT('ICRS', g.ra, g.dec),
CIRCLE('ICRS', d.target_ra, d.target_dec, 0.000278)
) = 1
WHERE CONTAINS(
POINT('ICRS', d.target_ra, d.target_dec),
CIRCLE('ICRS', 153.26, -1.61, 1.5)
) = 1
This kind of query is the backbone of a lot of data-driven astronomy: take one catalog, find counterparts in another, then build a physical picture from the combined measurements.
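It can help to sanity-check a few sep_deg values locally after a cross-match. The sketch below computes the great-circle separation with the haversine formula, which is what I would expect DISTANCE to return up to floating-point differences. The function name and the example offset are illustrative, not taken from any archive.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


MATCH_RADIUS_DEG = 1.0 / 3600.0  # the 1 arcsecond radius used in the join

# Two positions offset by half an arcsecond in declination.
sep = angular_separation_deg(153.26, -1.61, 153.26, -1.61 + 0.5 / 3600.0)
print(sep * 3600.0)             # separation in arcseconds
print(sep <= MATCH_RADIUS_DEG)  # whether the pair falls inside the match radius
```

Haversine stays numerically stable at the tiny separations typical of cross-matches, which is why it is used here instead of the plain spherical law of cosines.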
Useful TAP servers
| Service | TAP URL | Main use |
|---|---|---|
| Gaia Archive | https://gea.esac.esa.int/tap-server/tap | Gaia DR3 and DR2 |
| NOIRLab Astro Data Lab | https://datalab.noirlab.edu/tap | DESI, DECaLS, NSC, DES |
| MAST | https://mast.stsci.edu/tap | HST, TESS, Kepler, K2 |
| VizieR | https://tapvizier.cds.unistra.fr/TAPVizieR/tap | Published catalogs |
| SIMBAD | https://simbad.cds.unistra.fr/simbad/sim-tap | Named objects and object types |
| CASDA | https://casda.csiro.au/casda_vo_tools/tap | ASKAP and Parkes data |
| ARI Gaia mirror | https://gaia.ari.uni-heidelberg.de/tap | Gaia mirror and distance tables |
Python with astroquery
For day-to-day work I usually start with astroquery:
from astroquery.utils.tap.core import TapPlus
tap = TapPlus(url="https://datalab.noirlab.edu/tap")
job = tap.launch_job("""
SELECT TOP 100 targetid, rv_adop, feh
FROM desi_dr1.mws
WHERE success = 'True'
""")
df = job.get_results().to_pandas()
For large jobs, use async:
job = tap.launch_job_async("""
SELECT targetid, target_ra, target_dec, rv_adop, feh, logg, teff
FROM desi_dr1.mws
WHERE CONTAINS(
POINT('ICRS', target_ra, target_dec),
CIRCLE('ICRS', 153.26, -1.61, 1.5)
) = 1
AND success = 'True'
AND rvs_warn = 0
""")
job.wait_for_job_end()
df = job.get_results().to_pandas()
Gaia also has a dedicated helper:
from astroquery.gaia import Gaia
Gaia.MAIN_GAIA_TABLE = "gaiadr3.gaia_source"
job = Gaia.launch_job_async("""
SELECT source_id, ra, dec, parallax, pmra, pmdec, phot_g_mean_mag
FROM gaiadr3.gaia_source
WHERE CONTAINS(
POINT('ICRS', ra, dec),
CIRCLE('ICRS', 153.26, -1.61, 1.5)
) = 1
AND parallax < 0.5
""")
df = job.get_results().to_pandas()
Python with pyvo
pyvo is lower level, but it is very useful when you want direct control over IVOA services:
import pyvo
service = pyvo.dal.TAPService("https://gea.esac.esa.int/tap-server/tap")
results = service.search("""
SELECT TOP 1000 source_id, ra, dec, parallax
FROM gaiadr3.gaia_source
WHERE parallax > 10
""")
df = results.to_table().to_pandas()
It also supports other VO protocols such as cone search, spectral access and image access.
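pyvo can also drive the /async endpoint step by step, which makes the job lifecycle visible. The sketch below assumes `service` behaves like a `pyvo.dal.TAPService` and uses its `submit_job` method together with the job's `run`, `wait`, `raise_if_error`, `fetch_result` and `delete` methods. The wrapper name `run_async_query` is my own, not part of pyvo.

```python
def run_async_query(service, adql):
    """Run an ADQL query as an asynchronous TAP job and return its results.

    `service` is assumed to behave like pyvo.dal.TAPService.
    """
    job = service.submit_job(adql)  # create the job on the server
    try:
        job.run()                   # ask the server to start executing it
        # Block until the job reaches a terminal phase.
        job.wait(phases=["COMPLETED", "ERROR", "ABORTED"])
        job.raise_if_error()        # surface server-side failures as exceptions
        return job.fetch_result()   # download the result set
    finally:
        job.delete()                # always clean up the server-side job
```

With a real service this would be called as `run_async_query(pyvo.dal.TAPService(url), query).to_table()`. For most cases `service.run_async(query)` wraps the same steps, so the explicit version is mainly useful when you want to log phases or keep the job around.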
TOPCAT
TOPCAT is still one of the best tools for interactive catalog work. It has a TAP tab, autocompletes tables and columns, does spatial matching, plots results and exports FITS, CSV and VOTable files. If I am exploring a new archive, I often start in TOPCAT and move the final query to Python later.
Common errors
| Error | Usual cause | Fix |
|---|---|---|
| OverflowError: query exceeded row limit | Result is too large | Add TOP or stronger filters |
| 400 Bad Request | Invalid ADQL | Check quotes, geometry syntax and TOP |
| Timeout | Query is too slow | Use async TAP |
| Wrong coordinate column names | Table schemas vary | Inspect /tables first |
| CONTAINS gives no rows | Region is outside the footprint | Test with TOP 10 without geometry first |
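For the column-name problem in particular, a quick lookup before writing the query saves a round of 400 errors. The sketch below assumes `tables` behaves like the `.tables` attribute of a `pyvo.dal.TAPService`: a mapping from table name to an object whose `.columns` items have a `.name`. The helper `find_columns` is illustrative, and the stand-in data exists only so the example runs offline.

```python
from types import SimpleNamespace


def find_columns(tables, table_name, keywords=("ra", "dec")):
    """Return column names whose underscore-separated parts match a keyword."""
    wanted = {key.lower() for key in keywords}
    return [
        col.name
        for col in tables[table_name].columns
        # Token match, so "target_ra" hits but "parallax" and "pmra" do not.
        if wanted & set(col.name.lower().split("_"))
    ]


# Offline stand-in for service.tables; a real call would hit /tables.
fake_tables = {
    "desi_dr1.mws": SimpleNamespace(columns=[
        SimpleNamespace(name="targetid"),
        SimpleNamespace(name="target_ra"),
        SimpleNamespace(name="target_dec"),
        SimpleNamespace(name="rv_adop"),
    ])
}
print(find_columns(fake_tables, "desi_dr1.mws"))
```

Against a live archive you would pass `service.tables` instead of `fake_tables`.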
TAP is not glamorous, but it is one of those boring standards that quietly makes modern astronomy possible. Once you get comfortable with it, the web becomes a giant astronomical database.