The abstract "Wayang AgoraEO Plugin: The Framework for Scalable EO Workflows" by Rodrigo Pardo Meza, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, and Volker Markl has been accepted for Session ESSI1.9 – "GeoML-Ops: Frameworks & methods for automated geospatial machine-learning at scale on hybrid systems" at the EGU General Assembly 2023.
https://egu23.eu/
Title: Wayang AgoraEO Plugin: The Framework for Scalable EO Workflows
Authors: Rodrigo Pardo Meza, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, and Volker Markl.
Abstract: Currently, Earth Observation (EO) platforms provide datasets, algorithms, and processing
capabilities. Nevertheless, each platform proposes its own exclusive habitat to discover, process,
and run EO elements. We recently proposed AgoraEO [2], a decentralized, open, and unified
ecosystem, where users can find EO elements, compose cross-platform EO pipelines, and execute
them efficiently. To support cross-platform federated analytics, AgoraEO
relies on Apache Wayang [1] as its main analytical processing platform. Within AgoraEO, we are
extending Apache Wayang with EO features and exposing the internals of BigEarthNet
[2] to the Earth Observation community. Here we present our Wayang AgoraEO plugin that follows
the BigEarthNet workflow to achieve all its benefits in a scalable and parameterizable (reusable)
way. The Wayang AgoraEO plugin lets users create EO workflows on any EO platform in a simple
way: through operators and an intuitive API that mirrors the behavior of the EO platforms it
builds on. Each sub-task executes in isolation on whichever data processing system it requires,
coordinated with the rest of the platform. In addition, one can fetch datasets
from several independent sources. By design, Apache Wayang works as a declarative framework
for ML: Users specify ML tasks at a high level, using the most convenient API to write a workflow
(Java-Scala, Python, and Postgres are supported). Wayang then models an ML task as a
mathematical optimization problem and uses its gradient descent-based optimizer to invoke the
appropriate physical algorithms and system configurations to execute a given ML task, thereby
decoupling the user's specification of an ML task from its execution. We believe the Wayang AgoraEO
plugin can be a game changer in the tedious task of implementing and deploying EO workflows
within EO platforms today: it makes resources easy to reuse and share. Likewise, it is
easily extensible: new operators can be added to cover new EO platforms and tasks.
As a result, this solution can be a great leap in the democratization of EO technologies,
contributing to their integration, scalability, and access to high-performance computing.
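
The decoupling of task specification from execution described in the abstract can be illustrated with a small, self-contained Python sketch. This is not Apache Wayang's API or its optimizer: the names `LinearRegressionTask`, `choose_plan`, `execute`, and the size threshold are hypothetical and exist only for illustration. The user declares what to fit, and a toy planner picks which gradient descent variant (the "physical algorithm") runs it.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LinearRegressionTask:
    """Declarative task description (hypothetical): what to fit, not how."""
    points: List[Tuple[float, float]]  # (x, y) samples
    learning_rate: float = 0.05
    iterations: int = 2000


def batch_gd(task: LinearRegressionTask) -> Tuple[float, float]:
    """Full-batch gradient descent on the mean squared error of y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(task.points)
    for _ in range(task.iterations):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in task.points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in task.points) / n
        w -= task.learning_rate * grad_w
        b -= task.learning_rate * grad_b
    return w, b


def stochastic_gd(task: LinearRegressionTask, seed: int = 0) -> Tuple[float, float]:
    """Stochastic gradient descent: one randomly drawn sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(task.iterations):
        x, y = rng.choice(task.points)
        err = w * x + b - y
        w -= task.learning_rate * 2 * err * x
        b -= task.learning_rate * 2 * err
    return w, b


Plan = Callable[[LinearRegressionTask], Tuple[float, float]]


def choose_plan(task: LinearRegressionTask, batch_threshold: int = 1000) -> Plan:
    """Toy planner (an assumption, not Wayang's cost model):
    small inputs get full-batch updates, large inputs get stochastic updates."""
    return batch_gd if len(task.points) <= batch_threshold else stochastic_gd


def execute(task: LinearRegressionTask) -> Tuple[float, float]:
    """Run the task with whichever plan the planner selects."""
    return choose_plan(task)(task)


if __name__ == "__main__":
    # Samples drawn from y = 2x + 1; the user never names an algorithm.
    task = LinearRegressionTask(points=[(float(x), 2.0 * x + 1.0) for x in range(5)])
    print(execute(task))
```

The caller only builds a `LinearRegressionTask` and calls `execute`; swapping or adding execution strategies changes `choose_plan` without touching user code, which is the property the abstract attributes to a declarative framework.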
References
[1] S. Kruse, Z. Kaoudi, J.-A. Quiané-Ruiz, S. Chawla, F. Naumann, and B. Contreras-Rojas,
"Optimizing Cross-Platform Data Movement," IEEE 35th International Conference on Data
Engineering, 2019, pp. 1642-1645.
[2] A. Wall, B. Deiseroth, E. Tzirita Zacharatou, J.-A. Quiané-Ruiz, B. Demir, and V. Markl, "AGORA-EO: A
Unified Ecosystem for Earth Observation - A Vision For Boosting EO Data Literacy," Big Data from
Space Conference, 2021.