AIREO Presentation at the European Geosciences Union General Assembly, 2021
The AIREO project's work on producing training datasets was presented at the online EGU General Assembly in April 2021. Alastair McKinstry presented on behalf of the AIREO team, on “AI-Ready Training Datasets for Earth Observation: Enabling FAIR data principles for EO training data”. He showed that suitable training datasets for Earth observation are lacking because of limited availability, limited interoperability, and the absence of standards and guides on how to generate and describe a training dataset (TDS).
The EGU General Assembly was held online in 2021, and the event was well attended, with breakout question-and-answer sessions. There were 10 presentations on creating EO training datasets, covering topics from automated dataset generation to the scaling challenges of new algorithmic approaches.
The AIREO project was previously presented in public to the AIREO Network to capture community requirements. Presenting the work done since then, McKinstry described the pilot datasets on biomass retrieval, sea ice, the Common Agricultural Policy and urban development. The work included specification development, best practice guidelines, and library and notebook development.
The AIREO specification is built on STAC (SpatioTemporal Asset Catalog) and defines STAC extensions that provide cloud-native metadata and datasets. It enables users to quickly build and benchmark machine learning models on Earth observation training datasets.
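To make the idea of a STAC-based training dataset record concrete, the following is a minimal sketch of a STAC Item written as a plain Python dictionary. The core fields (type, stac_version, id, geometry, bbox, properties, links, assets) follow the STAC Item layout, but the extension URL and the "tds:task" property are hypothetical placeholders and are not taken from the AIREO specification.

```python
import json

# Hypothetical extension schema URL; NOT the real AIREO extension.
HYPOTHETICAL_EXTENSION = (
    "https://example.org/stac-extensions/training-data/v0.1.0/schema.json"
)


def make_training_item(item_id, bbox, datetime_utc, task, label_href):
    """Build a minimal STAC Item (as a dict) describing one training sample."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [HYPOTHETICAL_EXTENSION],
        "id": item_id,
        "bbox": [west, south, east, north],
        # Closed polygon ring derived from the bounding box.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {
            "datetime": datetime_utc,
            "tds:task": task,  # hypothetical extension field
        },
        "links": [],
        "assets": {
            "labels": {"href": label_href, "roles": ["labels"]},
        },
    }


item = make_training_item(
    "sample-0001",
    (-10.5, 51.4, -5.4, 55.4),
    "2021-04-01T00:00:00Z",
    "segmentation",
    "https://example.org/data/sample-0001-labels.tif",
)
print(json.dumps(item, indent=2))
```

Because the record is ordinary JSON, it can be stored alongside the data in object storage and read by generic STAC tooling, which is the sense in which such metadata is cloud-native.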
Innovations in the project include quality assurance automation in the library, with a concept of conformance levels for metadata quality and provenance, full data traceability, and compliance with the FAIR (Findability, Accessibility, Interoperability and Reusability) principles. The ability to embed “feature engineering recipes” in the training datasets will enable automated recreation of the TDSs.
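One way to picture conformance levels is as cumulative sets of required metadata fields, where a dataset reaches a level only if it also satisfies every lower level. The sketch below illustrates that idea only; the level numbers and field names (including "feature_recipe") are hypothetical and are not the levels defined by AIREO.

```python
# Hypothetical levels and field names, ordered from lowest to highest.
REQUIRED_BY_LEVEL = [
    (1, {"id", "description", "license"}),
    (2, {"provenance", "quality_checks"}),
    (3, {"feature_recipe"}),
]


def conformance_level(metadata):
    """Return the highest level whose fields, and all lower levels' fields,
    are present and non-empty in the metadata dict (0 if none)."""
    level = 0
    for candidate, fields in REQUIRED_BY_LEVEL:
        if all(metadata.get(name) for name in fields):
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first gap
    return level


basic = {"id": "tds-1", "description": "Sea ice pilot", "license": "CC-BY-4.0"}
traceable = dict(
    basic,
    provenance=["source imagery", "label digitisation"],
    quality_checks=["label coverage"],
)
print(conformance_level(basic))
print(conformance_level(traceable))
```

A check of this kind is simple to automate in a library, which is what makes it usable as a quality assurance gate before a dataset is published.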
The Specification and best practice guides, along with pilot datasets, are to be made public in June 2021.