Satellogic open-source release: A large dataset of high-resolution imagery for AI model training

By Emiliano Kargieman, CEO & Founder

Pioneering Foundation Models for Earth Observation with Open Data

In a previous post, I shared the vision of how interacting with large Earth Observation (EO) AI models with access to high-resolution, real-time imagery of our planet will fundamentally change the way we derive insights of what happens on Earth. Today, we’d like to announce the release of a large open dataset of high-resolution imagery curated from our catalog, to support training of AI models.

The dataset contains around 3 million 384 m by 384 m Satellogic images of unique locations (6 million images, including location revisits) from around the world – totaling 900 Gigapixels spanning different land-use types, objects, geographies, and seasons. Satellogic data is released under a Creative Commons CC-BY 4.0 license, allowing for commercial use of the data with attribution.

The full dataset can now be accessed on Hugging Face as part of EarthView. In order to train EO foundation models it is crucial to use datasets that include diverse sources, sensors, and scales. EarthView combines Satellogic’s newly released archive data at 1-meter spatial resolution with data from Neon, Sentinel1, and Sentinel2 (for model training and evaluation purposes).

Sample Satellogic data

Satellogic data included in this dataset is a subset of our entire Mark IV imagery archive. Satellogic has already collected over 70M km²of 1-meter resolution imagery with its Mark IV satellites since 2020 – data is available in a STAC-compliant archive for our customers in its native resolution, or as a 70 cm super-resolved product. STAC tiles are 2 km by 2 km, with red, green, blue, and near-infrared bands, and key capture metadata.

Geographic distribution of Satellogic’s Mark IV archive

Soon, a paper presenting the EarthView dataset will be published along with the release of a baseline foundation model, a masked autoencoder (scalable self-supervised learners for computer vision), built on top of it. The paper describes how we built the full dataset, the model architecture and experimental setup. This work is the result of Satellogic’s collaboration with an exceptional team of researchers led by Alexandre Lacoste at ServiceNow under Yoshua Bengio’s guidance.

Spatial coverage for each source in the EarthView dataset. Note that a colored area may contain more than one source.

We are excited by the potential of multimodal foundation models applied to Earth Observation, and hope that by making this data available for training we can accelerate both open-source and commercial developments in this space. We expect to help open a path for research on models trained on large datasets that many will follow. We’ll have much more to share soon. Stay tuned!

Access the Open Dataset

Emiliano Kargieman

CEO & Founder

Emiliano Kargieman serves as Chief Executive Officer at Satellogic. He has spent the last 25+ years building technology and technology companies. In 2010, after spending some time at the NASA Ames Campus in Mountain View, he started developing the concepts that would become Satellogic. Prior to co-founding the firm, Mr. Kargieman co-founded Core Security Technologies, […]

View Full Bio

Satellogic open-source release: A large dataset of high-resolution imagery for AI model training

Related Posts

Pushing Intelligence to the Edge: Satellogic’s Vision for AI-Powered Earth Observation

Revolutionizing Earth Observation: How AI is Transforming Geospatial Intelligence

Unlocking New Possibilities with Mark V Satellite Imagery

Satellogic’s Mark V Satellite: Unit Economics to Enable Foundation Models for Earth Observation

Satellogic open-source release: A large dataset of high-resolution imagery for AI model training

Share

Share

Related Posts