Data formats for Astronomy

by S. Prasad Ganti

A recent article in the journal “Nature” piqued my interest in data formats for microscope observations. Wearing my Information Technology hat, I read the article and then explored the data formats in use for astronomical data.

The article pointed out that there is no single standard for data coming out of microscopes. Thousands of biological researchers use microscopes made by different manufacturers, and unfortunately each vendor writes data in its own proprietary format. As a result, researchers cannot readily exchange data with each other.

The importance of data standards becomes obvious when a recipient cannot open an image I share by email. We can all read a PDF file or a JPEG picture. We take these standards for granted, but without them the network effect would be very feeble. The network effect means that the value of a network, such as the phone network, increases as more people can share information (voice or data) over it. We need only remember the VHS vs. Betamax wars of yesteryear over video cassettes: a player built for one format could not play the other. VHS eventually won the battle but lost the war when video cassettes themselves became history.

A standard called OME (Open Microscopy Environment) is coming to the fore so that observations made by one researcher can be viewed and studied by another in a different lab or a different country. The key to such a data format is metadata, which describes the data in the file: the specimen observed, the conditions prevailing at the time of observation, and so on. Increasingly, data formats combine the metadata, the actual data, and chunking of the data, so that the whole data set need not be transmitted at once, which could overwhelm networks as well as the resources of devices like cell phones. As with Google Maps, more data is requested when the user moves the cursor to a different area or magnifies a certain area. The data is transmitted in chunks, only as needed.
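The chunking idea can be sketched in a few lines of Python. This is a toy illustration only and not the OME format: the tile size, the TiledImage class, its method names and the metadata values are all made up for the example.

```python
# Toy illustration of chunked (tiled) access; this is NOT the actual OME layout.
# TILE, TiledImage, read_tile and tiles_for_view are hypothetical names.

TILE = 4  # tile edge length in pixels (assumed value for the sketch)


class TiledImage:
    """A large image stored as independent square tiles plus metadata."""

    def __init__(self, width, height, metadata):
        self.width = width
        self.height = height
        self.metadata = metadata   # e.g. specimen, observing conditions
        self.tiles_sent = 0        # counts how many tiles were "transmitted"

    def read_tile(self, col, row):
        """Pretend to fetch one tile; a real viewer would request it over a network."""
        self.tiles_sent += 1
        # Synthetic pixel values so the example needs no real data.
        return [[(row * TILE + y) * self.width + (col * TILE + x)
                 for x in range(TILE)] for y in range(TILE)]

    def tiles_for_view(self, x0, y0, x1, y1):
        """Return only the tiles overlapping the viewport [x0, x1) x [y0, y1)."""
        cols = range(x0 // TILE, (x1 - 1) // TILE + 1)
        rows = range(y0 // TILE, (y1 - 1) // TILE + 1)
        return {(c, r): self.read_tile(c, r) for r in rows for c in cols}


image = TiledImage(64, 64, {"specimen": "example", "objective": "40x"})
view = image.tiles_for_view(5, 5, 11, 7)   # a small window into the image
total_tiles = (image.width // TILE) * (image.height // TILE)
print(f"fetched {image.tiles_sent} of {total_tiles} tiles")
```

For this small window the sketch fetches 2 of the 256 tiles. Moving or enlarging the window would request a different set of tiles, much as panning or zooming does in a map viewer.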

Telescopes and spectroscopes, by comparison with microscopes, are custom built. I am not talking about backyard telescopes, but about instruments like the Keck telescopes in Hawaii or the JWST (James Webb Space Telescope). It is easier to standardize the data format for astronomical data than for biological data from microscopes.

In the 1970s, astronomers came up with the Flexible Image Transport System (FITS) standard, which became a widely used data format. It incorporates metadata and binary data in the same file. The FITS standard has several limitations that make it difficult to use for complicated, hierarchical metadata. It is an older standard, defined when computing and networking resources were at a premium. It needed a change.
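To make "metadata and binary data in the same file" concrete, here is a minimal Python sketch of the FITS layout as I understand it: the header is a series of 80-character text records ("cards"), and header and data are each padded to a multiple of 2880 bytes. The helper names (card, pad, read_header) and the pixel values are invented for the illustration, and the sketch handles only the simplest case. Real files are best read with an established library such as astropy.

```python
import struct

CARD = 80      # each header record ("card") is 80 ASCII characters
BLOCK = 2880   # header and data are each padded to a multiple of 2880 bytes


def card(keyword, value):
    """Format one simplified 'KEYWORD = value' card, padded to 80 characters."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD).encode("ascii")


def pad(raw, fill):
    """Pad a byte string up to the next multiple of BLOCK."""
    return raw + fill * (-len(raw) % BLOCK)


# Metadata describing a 3 x 2 image of 16-bit integers.
header = b"".join([
    card("SIMPLE", "T"),
    card("BITPIX", 16),
    card("NAXIS", 2),
    card("NAXIS1", 3),
    card("NAXIS2", 2),
    "END".ljust(CARD).encode("ascii"),
])
pixels = [1, 2, 3, 4, 5, 6]                   # made-up pixel values
data = struct.pack(">6h", *pixels)            # FITS stores integers big-endian
blob = pad(header, b" ") + pad(data, b"\0")   # header and data in one "file"


def read_header(raw):
    """Walk the 80-character cards until END and return them as a dict."""
    cards = {}
    for i in range(0, len(raw), CARD):
        text = raw[i:i + CARD].decode("ascii")
        keyword = text[:8].strip()
        if keyword == "END":
            break
        cards[keyword] = text[10:].strip()
    return cards


meta = read_header(blob)
count = int(meta["NAXIS1"]) * int(meta["NAXIS2"])
# The header fits in a single block here, so the data starts at offset BLOCK.
values = struct.unpack(f">{count}h", blob[BLOCK:BLOCK + 2 * count])
print(meta)
print(values)
```

The flat list of keyword and value pairs is what makes the header easy to read, and it is also why deeply nested metadata is awkward to express in this format.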

The Advanced Scientific Data Format (ASDF) was originally developed in 2015. The format consists of a YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) header optionally followed by one or more binary blocks containing binary data. This format is not limited to astronomy and is being proposed as suitable for much scientific and engineering data. NASA has adopted the ASDF and FITS data formats for JWST. The upcoming Nancy Grace Roman Space Telescope is slated to be launched in 2026. Although its mirror size is the same as that of the Hubble Space Telescope (HST), its field of view is much larger. It will be used mostly for surveys of the sky. ASDF will be the primary data format for this telescope.
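The "readable text header, then binary" layout can be sketched with nothing but the Python standard library. What follows is a hand-written, simplified stand-in and not a valid ASDF file: real ASDF files add version comment lines, YAML tags and a structured header in front of each binary block, and would normally be read with the asdf Python library. The instrument name, temperature and pixel values are hypothetical.

```python
import struct

# Simplified stand-in for an ASDF-style file (NOT a valid ASDF file):
# a human-readable, hierarchical YAML text header followed by raw binary.
# All names and values below are made up for the illustration.
yaml_header = """\
instrument:
  name: example-camera
  detector:
    temperature_k: 40.0
image:
  source: 0
  datatype: float64
  shape: [2, 2]
...
"""
pixels = [1.0, 2.5, 4.0, 5.5]
blob = yaml_header.encode("utf-8") + struct.pack("<4d", *pixels)

# A reader splits at the YAML end-of-document marker ("..." on its own line).
marker = b"\n...\n"
split = blob.index(marker) + len(marker)
text, binary = blob[:split].decode("utf-8"), blob[split:]

print(text)                             # metadata is readable in any text editor
values = struct.unpack("<4d", binary)   # four little-endian 64-bit floats
print(values)
```

Unlike a flat list of keywords, the indented header can nest metadata to any depth (here, a detector inside an instrument), which is the kind of hierarchy that is hard to express in FITS.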

Libraries for reading and processing the data exist for the Python and C++ programming languages. ASDF, like FITS, is a transport format rather than a storage format; high performance computing environments use different storage formats. Given that a lot of research gets done after, sometimes long after, the data is captured by the telescope, storing and transporting data in a standard format is very important. Data is at the center of new astronomical discoveries. With the launch of more sophisticated telescopes carrying increasingly complex instruments, and with more powerful computing facilities and standard data formats on Earth, many more discoveries await us in the future.

This entry was posted in November 2023, Sidereal Times.