
P10.5: Dowler, Patrick
Patrick Dowler (National Research Council Canada)
María Arévalo (European Space Astronomy Centre)
Javier Durán (European Space Astronomy Centre)
Daniel Durand (National Research Council Canada)
Séverin Gaudet (National Research Council Canada)
Jonathan Hargis, Brian McLean, Oliver Oberdorf, David Rodriguez (Mikulski Archive for Space Telescopes)
Brian Major (National Research Council Canada)


Theme: Databases and Archives: Challenges and Solutions in the Big Data Era
Title: Archive-2.0: Metadata and Data Synchronisation between MAST, CADC, and ESAC

The Canadian Astronomy Data Centre (CADC) and the European Space Astronomy Centre (ESAC) maintain mirrors of the HST data collection and provide user access to it. A new mirroring approach was needed to improve consistency and to support future missions such as JWST. The Common Archive Observation Model (CAOM) is the core model for all data holdings at the CADC and the Mikulski Archive for Space Telescopes (MAST). It was extended to support a metadata and data synchronization system that allows the partners to maintain a complete copy of the entire HST collection using metadata and files generated at MAST.

The metadata synchronization process relies on a simple RESTful web service operated by the metadata source (MAST) and a metadata harvesting tool run by the mirror centres (CADC and ESAC). The harvesting tool normally operates in incremental mode (harvesting recent changes) to maintain an up-to-date copy of the metadata. Consistency of the metadata is ensured by a robust metadata checksum algorithm; a full validation mode can be used to detect and fix cases where incremental harvest events or deleted-observation events were missed (which is rare) or where source and destination metadata checksums do not match.

We have also developed a new file synchronization tool that uses CAOM metadata to discover files at the source (MAST) and retrieve them to the mirror sites. Through backend plugins, the CADC and ESAC have extended the file synchronization tool to interface with their respective site-specific storage systems. Like the metadata harvesting tool, file synchronization normally operates in incremental mode: it uses local CAOM metadata to discover new or modified files and schedule downloads. A separate mode performs the downloads. A validation mode performs a full comparison of the files referenced in CAOM with those in the local storage system and (optionally) schedules downloads to fix any discrepancies.
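The abstract does not specify the checksum algorithm, the web-service interface, or the plugin API, so the following is only a rough Python sketch of the general pattern it describes: incremental harvesting with a last-modified cursor, plus a full validation pass that compares per-item checksums between source and mirror. All names (`metadata_checksum`, `incremental_harvest`, `reconcile`, `StorageBackend`, `InMemoryStorage`, `validate_files`) are hypothetical, and the MD5-over-canonical-JSON checksum is a stand-in, not the CAOM metadata checksum algorithm.

```python
import hashlib
import json
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Protocol, Set, Tuple


def metadata_checksum(fields: Dict[str, object]) -> str:
    """Stand-in checksum: MD5 of a canonical JSON encoding of the metadata.

    Keys are sorted and unset (None) fields are dropped so that the same
    metadata always yields the same digest. This is NOT the CAOM algorithm.
    """
    present = {k: v for k, v in fields.items() if v is not None}
    canonical = json.dumps(present, sort_keys=True, separators=(",", ":"))
    return "md5:" + hashlib.md5(canonical.encode("utf-8")).hexdigest()


def incremental_harvest(
    changes: Iterable[Tuple[str, datetime]],
    since: Optional[datetime],
    fetch: Callable[[str], Dict[str, object]],
    local: Dict[str, Dict[str, object]],
) -> Optional[datetime]:
    """Copy observations modified after `since` into `local`, oldest first.

    `changes` is a (URI, last-modified) listing from the source service and
    `fetch` retrieves one observation's metadata. Returns the new cursor.
    """
    cursor = since
    for uri, modified in sorted(changes, key=lambda c: c[1]):
        if since is not None and modified <= since:
            continue  # already harvested in an earlier run
        local[uri] = fetch(uri)
        cursor = modified
    return cursor


def reconcile(source: Dict[str, str], local: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Full validation: compare URI -> checksum maps from the source and the mirror.

    Returns (to_fetch, to_delete): items missing or different on the mirror,
    and items the mirror still holds that the source no longer has.
    """
    to_fetch = {uri for uri, cs in source.items() if local.get(uri) != cs}
    to_delete = set(local) - set(source)
    return to_fetch, to_delete


class StorageBackend(Protocol):
    """Hypothetical plugin interface wrapping a site-specific storage system."""

    def list_files(self) -> Dict[str, str]:  # file URI -> content checksum
        ...

    def schedule_download(self, uri: str) -> None:
        ...


class InMemoryStorage:
    """Toy backend for illustration; a real site would wrap its own storage."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.queue: List[str] = []

    def list_files(self) -> Dict[str, str]:
        return dict(self.files)

    def schedule_download(self, uri: str) -> None:
        self.queue.append(uri)


def validate_files(caom_files: Dict[str, str], storage: StorageBackend) -> Set[str]:
    """Queue downloads for files referenced in CAOM but missing or different locally."""
    to_fetch, _ = reconcile(caom_files, storage.list_files())
    for uri in sorted(to_fetch):
        storage.schedule_download(uri)
    return to_fetch
```

For example, `reconcile({"a": "md5:1"}, {"a": "md5:2", "b": "md5:3"})` returns `({"a"}, {"b"})`: `a` must be re-fetched and `b` removed. In this sketch the same comparison serves both the metadata validation mode (observation URI to metadata checksum, recomputed locally with `metadata_checksum`) and the file validation mode (file URI to content checksum, obtained through the storage plugin). A timestamp cursor alone can skip a change that arrives with an already-passed timestamp, which is the kind of gap a periodic full validation catches.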
Apart from the common data collection, services, and tools described above, partners may extend CAOM metadata with additional information intended to provide added-value features. ESAC, for instance, has added information about publications to its instance of the archive.

Link to PDF (may not be available yet): P10-5.pdf