
O4.1: Toledo, Ignacio
Ignacio Toledo (Joint ALMA Observatory)
Jorge Ibsen (Joint ALMA Observatory)

Time: Tue 11.45 - 12.00
Theme: Data Science: Workflows Hardware Software Humanware
Title: Data Science =! Software Engineering. Exploring a workflow for ALMA operations.

In the last few years, data science has emerged as a discipline of its own to address problems where data is usually heterogeneous, complex, and abundant. In a nutshell, data science provides answers in situations where a hypothesis can be formulated and later confirmed or rejected following standard scientific methodology, using data as the raw material. Data science goes by different names depending on the domain (business intelligence, operational management, astroinformatics), and it has recently been at the center of the hype surrounding artificial intelligence and machine learning. It has been quickly adopted by the digital industry as the tool to distill information from massive operational data sets. Among the many tools data science requires (mathematics, statistics, domain knowledge of the data sets, …), IT infrastructure and software are by far the most visible, and there is at present a whole ecosystem available as open source projects. The downside is that data science is commonly confused with IT and software development, which creates conflicts between engineering and scientific mindsets and leads to software development methodologies being wrongly applied to it, neglecting the experimental nature of the problem. In short, creating the data lab becomes more important than answering questions with it. In the domain of ALMA operations, many instances can be identified and described as data science cases or projects, ranging from monitoring array elements to understand their performance and predict faults for engineering operations, to routine monitoring of calibrators for science operations. We have already identified around 30 different initial questions (or data science cases) and found that several of them have been addressed through individual efforts.
In parallel, several enabling platforms and frameworks have appeared in the ecosystem that provide data scientists with both the “laboratory equipment” to conduct their “experiments” and the tools for collaboration, version control, and deploying results in production with a quick turnaround. This talk aims to summarize the results of our exploration of applying data science workflows to resolve ALMA operations issues, identify suitable platforms that are already in use by industry, share our experience in addressing specific ALMA operations data cases, and discuss the technical and sociological challenges we encountered along the way.

Link to PDF (may not be available yet): O4-1.pdf