AI & ML interests

Earth Observation Datasets

Recent Activity

mikonvergence posted an update 2 days ago
🥕 Introducing BetaEarth - your own Earth embedding emulator [𝐏𝐫𝐞-𝐑𝐞𝐥𝐞𝐚𝐬𝐞]

The past year has brought many notable embedding products, like AlphaEarth, TESSERA or OlmoEarth. We are entering a phase where embeddings begin to act as a substitute for real observation data.

BetaEarth is an attempt to explore how much one can learn from a model based on its embeddings alone, and whether those embeddings can serve as a useful training target for other models. Huge credit to the AlphaEarth team for releasing the embedding archive openly — it's what made this kind of community-built extension possible.

[𝐁𝐞𝐭𝐚𝐄𝐚𝐫𝐭𝐡 𝐢𝐬 𝐧𝐨𝐭 𝐚 𝐟𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐦𝐨𝐝𝐞𝐥 𝐛𝐮𝐭 𝐢𝐭 𝐭𝐫𝐢𝐞𝐬 𝐢𝐭𝐬 𝐛𝐞𝐬𝐭]

BetaEarth is a flexible (and relatively lightweight) emulator of the AlphaEarth annual product. It reproduces neither AlphaEarth's exact outputs nor the product itself, but it reaches ~0.87 cosine similarity on held-out data and retains 97% of downstream land-cover classification accuracy. Training took only 1-2 days.
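
For anyone wanting to sanity-check a number like the ~0.87 above, one plausible reading is a per-pixel cosine similarity between emulated and reference embedding vectors, averaged over a held-out tile. Here is a minimal NumPy sketch of that kind of metric. The function name, the (H, W, D) array layout, the 64-dimensional size, and the synthetic arrays are all assumptions for illustration, not BetaEarth's actual evaluation code.

```python
import numpy as np


def mean_cosine_similarity(pred, target, eps=1e-12):
    """Mean per-pixel cosine similarity between two (H, W, D) embedding maps.

    Hypothetical helper: the name and array layout are illustrative assumptions.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Dot product and vector norms are taken along the embedding axis (last axis).
    dot = np.sum(pred * target, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    # eps guards against division by zero for all-zero (e.g. masked) pixels.
    return float(np.mean(dot / np.maximum(norms, eps)))


# Synthetic stand-ins for a reference tile and a noisy emulation of it.
rng = np.random.default_rng(0)
reference = rng.normal(size=(8, 8, 64))  # assumed 64-dim embeddings
emulated = reference + 0.5 * rng.normal(size=reference.shape)
print(f"mean cosine similarity: {mean_cosine_similarity(reference, emulated):.3f}")
```

Because cosine similarity ignores vector magnitude, a score like this only says how well the directions of the embeddings agree, which is why it is worth reading alongside a downstream metric such as classification accuracy.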

It can encode any combination (including multi-temporal) of:
- Sentinel-2 L1C
- Sentinel-2 L2A
- Sentinel-1 RTC
- COP-DEM 30 product

The model weights are open, just like its training data (built exclusively using Major TOM). The GitHub repository provides a script for automated generation of embeddings across any footprint.
You can also try the workflow over small bounding boxes on the free Hugging Face web app!

⚙️ GitHub: https://github.com/asterisk-labs/beta-earth
🖥️ Web App: asterisk-labs/betaearth
🏭 Models: https://huggingface.co/collections/asterisk-labs/beta-earth
🟨 Colab: https://colab.research.google.com/github/asterisk-labs/beta-earth/blob/main/examples/generate_demo.ipynb
🗞️ Pre-print: https://github.com/asterisk-labs/beta-earth/blob/main/docs/beta_earth_preprint.pdf

Nymbo posted an update about 1 month ago
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.