# TerraTorch Model Zoo Overview
This guide provides a comprehensive overview of the Geospatial Foundation Models (GeoFMs) available in the TerraTorch toolkit. The models take different approaches to pre-training on Earth observation data, with varying architectures, data requirements, and downstream task performance.
## Model Comparison Metrics

For consistency, we evaluate each model using these standardized metrics:

- **Architecture Type**: Base neural network architecture (ResNet, ViT, Swin)
- **Parameter Count**: Total trainable parameters
- **Pre-training Method**: Self-supervised learning approach used
- **Input Resolution**: Spatial resolution of training data
- **Spectral Bands**: Number and type of input channels
- **Temporal Handling**: How the model processes time-series data
- **Pre-training Dataset Size**: Scale of training data
- **Patch Size**: For ViT models, the size of image patches
- **Embedding Dimension**: Size of learned representations
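To keep comparisons consistent in code as well as in prose, these fields can be collected in one record per model. The sketch below is purely illustrative; the `ModelCard` dataclass and its field names are ours, not part of TerraTorch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelCard:
    """Hypothetical container for the standardized comparison metrics."""
    name: str
    architecture: str          # e.g. "ResNet50", "ViT-Large", "Swin"
    parameters_m: int          # total trainable parameters, in millions
    pretraining_method: str    # self-supervised (or supervised) approach
    input_resolution: str      # spatial resolution of training data
    spectral_bands: str        # number/type of input channels
    temporal_handling: str     # how time-series data is processed
    dataset_size: str          # scale of pre-training data
    patch_size: Optional[int]  # ViT patch size; None for CNN backbones
    embedding_dim: int         # size of learned representations


# Example entry, filled in from the MOCOv2 metrics listed below
mocov2 = ModelCard(
    name="MOCOv2", architecture="ResNet50", parameters_m=25,
    pretraining_method="Momentum Contrastive Learning",
    input_resolution="10m (Sentinel-2)", spectral_bands="13 (Sentinel-2 MSI)",
    temporal_handling="Multi-seasonal contrasts", dataset_size="1M samples",
    patch_size=None, embedding_dim=2048,
)
```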
---

## Contrastive Learning Models
### MOCOv2

**Paper**: [Momentum Contrast for Unsupervised Visual Representation Learning](https://arxiv.org/abs/1911.05722)

**Repository**: Available through TerraTorch backbone registry

**Description**: MOCOv2 applies momentum-based contrastive learning to Sentinel-2 imagery, learning representations by maximizing agreement between different augmented views of the same scene across multiple seasons.

**Standard Metrics**:

- Architecture Type: ResNet50
- Parameter Count: 25M
- Pre-training Method: Momentum Contrastive Learning
- Input Resolution: 10m (Sentinel-2)
- Spectral Bands: 13 (Sentinel-2 MSI)
- Temporal Handling: Multi-seasonal contrasts
- Pre-training Dataset Size: 1M samples
- Patch Size: N/A (CNN-based)
- Embedding Dimension: 2048
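To make the pre-training method concrete, here is a minimal sketch of the momentum-contrast objective MoCo-style models optimize: a query encoder and a slowly updated key encoder embed two augmented views, and an InfoNCE loss contrasts the matching pair against a queue of negatives. This is a generic illustration, not the TerraTorch implementation.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """EMA update: the key encoder slowly follows the query encoder."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)


def info_nce(q, k, queue, temperature=0.07):
    """q, k: L2-normalized embeddings of two augmented views of the same
    scene (e.g. the same Sentinel-2 tile in different seasons), shape (N, D).
    queue: embeddings of other scenes used as negatives, shape (D, K)."""
    l_pos = torch.einsum("nd,nd->n", q, k).unsqueeze(-1)   # (N, 1) positive logits
    l_neg = torch.einsum("nd,dk->nk", q, queue)            # (N, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)                 # positive is class 0
```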
### DINO

**Paper**: [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294)

**Repository**: Integrated via TerraTorch

**Description**: DINO (self-DIstillation with NO labels) learns visual representations through self-distillation, adapted for Sentinel-2 imagery with multi-seasonal temporal patterns.

**Standard Metrics**:

- Architecture Type: ResNet50
- Parameter Count: 25M
- Pre-training Method: Self-Distillation
- Input Resolution: 10m (Sentinel-2)
- Spectral Bands: 13 (Sentinel-2 MSI)
- Temporal Handling: Multi-seasonal processing
- Pre-training Dataset Size: 1M samples
- Patch Size: N/A (CNN-based)
- Embedding Dimension: 2048
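The self-distillation objective is simple to state: a student network matches the softened, centered output of an EMA teacher on a different view of the same scene, with no labels involved. Below is a simplified single-pair version of the DINO loss for illustration only; the full recipe averages over multiple crops and also updates the centering term by EMA.

```python
import torch
import torch.nn.functional as F


def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the sharpened, centered teacher distribution
    and the student distribution for two views of the same scene."""
    t = F.softmax((teacher_out - center) / tau_t, dim=-1).detach()
    log_s = F.log_softmax(student_out / tau_s, dim=-1)
    return -(t * log_s).sum(dim=-1).mean()


@torch.no_grad()
def update_teacher(student, teacher, m=0.996):
    """The teacher is an exponential moving average of the student."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1 - m)
```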
### DeCUR

**Paper**: [Decoupling Common and Unique Representations for Multimodal Self-Supervised Learning](https://arxiv.org/abs/2309.05300)

**Repository**: Available in TerraTorch

**Description**: DeCUR jointly learns from Sentinel-1 (radar) and Sentinel-2 (optical) data by decoupling common and unique representations between modalities, enabling robust multi-modal Earth observation.

**Standard Metrics**:

- Architecture Type: ResNet50
- Parameter Count: 25M
- Pre-training Method: Multi-modal Contrastive Learning
- Input Resolution: 10m
- Spectral Bands: 13 (S2) + 2 (S1 VV/VH polarizations)
- Temporal Handling: Single timestamp
- Pre-training Dataset Size: 1M samples
- Patch Size: N/A (CNN-based)
- Embedding Dimension: 2048
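The core idea can be sketched in a few lines: each modality's embedding is split into a "common" subspace, which is aligned across SAR and optical views of the same location, and a "unique" subspace, which is left modality-specific. This is a conceptual simplification that assumes a cosine-alignment term in place of DeCUR's actual Barlow Twins-style cross-correlation objective.

```python
import torch.nn.functional as F


def common_unique_alignment(z_s1, z_s2, common_dims=1024):
    """z_s1: Sentinel-1 embedding, z_s2: Sentinel-2 embedding, shape (N, D).
    Only the first `common_dims` dimensions are pulled together; the
    remaining dimensions stay free to encode modality-specific ("unique")
    information."""
    c1 = z_s1[:, :common_dims]
    c2 = z_s2[:, :common_dims]
    return 1 - F.cosine_similarity(c1, c2, dim=-1).mean()
```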
---

## Masked Autoencoding Models
### ScaleMAE

**Paper**: [Scale-Aware Masked Autoencoder for Multi-scale Geospatial Representation Learning](https://arxiv.org/abs/2212.14532)

**Repository**: [GitHub](https://github.com/bair-climate-initiative/scale-mae)

**Description**: ScaleMAE introduces scale-aware positional encodings to handle the variable ground sampling distances in remote sensing, training on RGB imagery across multiple resolutions.

**Standard Metrics**:

- Architecture Type: ViT-Large
- Parameter Count: 300M
- Pre-training Method: Masked Autoencoding with scale awareness
- Input Resolution: 0.1m to 30m (variable)
- Spectral Bands: 3 (RGB)
- Temporal Handling: Single timestamp
- Pre-training Dataset Size: 360k samples
- Patch Size: 16x16
- Embedding Dimension: 1024
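The scale-aware trick boils down to feeding the positional encoding coordinates in ground units rather than pixel units, so a patch's encoding reflects its physical footprint. Below is a simplified 1-D sketch (ScaleMAE itself uses 2-D sine-cosine encodings); the function names and reference GSD are our choices.

```python
import torch


def gsd_scaled_positions(num_patches, gsd, reference_gsd=1.0):
    """Patch-grid coordinates scaled by ground sample distance (m/pixel),
    so position reflects extent on the ground rather than pixel index."""
    return torch.arange(num_patches, dtype=torch.float32) * (gsd / reference_gsd)


def sincos_encoding(pos, dim):
    """Standard 1-D sine-cosine encoding of (possibly GSD-scaled) positions."""
    half = dim // 2
    omega = 1.0 / (10000 ** (torch.arange(half, dtype=torch.float32) / half))
    angles = pos[:, None] * omega[None, :]                        # (N, dim/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], -1)  # (N, dim)


# The same patch index gets different encodings for a 0.3 m aerial tile and
# a 10 m Sentinel-2 tile, because their ground footprints differ.
enc_aerial = sincos_encoding(gsd_scaled_positions(14, gsd=0.3), dim=64)
enc_s2 = sincos_encoding(gsd_scaled_positions(14, gsd=10.0), dim=64)
```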
### DOFA (Dynamic One-For-All)

**Paper**: [Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities](https://arxiv.org/abs/2403.15356)

**Repository**: Available through TerraTorch

**Description**: DOFA employs dynamic wavelength encoding to handle arbitrary combinations of spectral bands, making it adaptable to various Earth observation sensors without retraining.

**Standard Metrics**:

- Architecture Type: ViT-Large
- Parameter Count: 300M
- Pre-training Method: Masked Autoencoding with dynamic encoding
- Input Resolution: 1-30m (variable)
- Spectral Bands: Dynamic (any combination)
- Temporal Handling: Single timestamp
- Pre-training Dataset Size: 8M samples
- Patch Size: 16x16
- Embedding Dimension: 1024
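Dynamic wavelength encoding can be illustrated with a small hypernetwork that maps each band's central wavelength to that band's patch-embedding weights, so any combination of bands can be projected into the token space without a fixed-channel stem. This is a simplified sketch of the idea, not DOFA's actual implementation; the module name and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WavelengthConditionedPatchEmbed(nn.Module):
    """Illustrative sketch: a hypernetwork generates per-band patch-embedding
    weights from each band's central wavelength (micrometers), so the stem
    adapts to arbitrary band combinations without retraining."""

    def __init__(self, patch_size=16, embed_dim=256, hidden=128):
        super().__init__()
        self.patch_size, self.embed_dim = patch_size, embed_dim
        self.hypernet = nn.Sequential(
            nn.Linear(1, hidden), nn.GELU(),
            nn.Linear(hidden, embed_dim * patch_size * patch_size),
        )

    def forward(self, x, wavelengths):
        # x: (B, C, H, W); wavelengths: (C,) central wavelength per band
        num_bands = x.shape[1]
        w = self.hypernet(wavelengths[:, None])                  # (C, D*P*P)
        w = w.view(num_bands, self.embed_dim, self.patch_size, self.patch_size)
        w = w.permute(1, 0, 2, 3).contiguous()                   # (D, C, P, P)
        tokens = F.conv2d(x, w, stride=self.patch_size)          # (B, D, H/P, W/P)
        return tokens.flatten(2).transpose(1, 2)                 # (B, N, D)


# Example: a 4-band (RGB + NIR) input with Sentinel-2-like central wavelengths
embed = WavelengthConditionedPatchEmbed()
x = torch.randn(2, 4, 224, 224)
wavelengths = torch.tensor([0.490, 0.560, 0.665, 0.842])
print(embed(x, wavelengths).shape)  # torch.Size([2, 196, 256])
```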
### Clay v1

**Paper**: [Clay Foundation Model Technical Report](https://arxiv.org/abs/2406.13030)

**Repository**: [HuggingFace](https://huggingface.co/made-with-clay/Clay)

**Description**: Clay combines masked autoencoding with DINO for self-supervised learning, incorporating location and temporal encodings alongside dynamic wavelength handling for comprehensive Earth observation.

**Standard Metrics**:

- Architecture Type: ViT-Base
- Parameter Count: 100M
- Pre-training Method: MAE + DINO hybrid
- Input Resolution: 1-500m (highly variable)
- Spectral Bands: Dynamic (Sentinel-2, Landsat, NAIP)
- Temporal Handling: Temporal position encodings
- Pre-training Dataset Size: 70M samples
- Patch Size: 8x8
- Embedding Dimension: 768
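The location and temporal encodings can be illustrated with simple cyclic features: acquisition date and longitude are periodic, so sine/cosine pairs are a natural representation to feed alongside the image tokens. This is only a conceptual sketch under our own simplifications; Clay's actual encodings differ in detail.

```python
import math
import torch


def cyclic_encode(value, period):
    """Encode a periodic quantity (day-of-year, longitude) as a sin/cos pair."""
    angle = 2 * math.pi * value / period
    return torch.tensor([math.sin(angle), math.cos(angle)])


def scene_metadata_encoding(lat, lon, day_of_year):
    """Toy location + time encoding to pair with image tokens."""
    return torch.cat([
        cyclic_encode(day_of_year, 365.25),           # time of year
        cyclic_encode(lon, 360.0),                    # longitude wraps around
        torch.tensor([math.sin(math.radians(lat))]),  # latitude (not periodic)
    ])


meta = scene_metadata_encoding(lat=34.4, lon=-119.8, day_of_year=172)
print(meta.shape)  # torch.Size([5])
```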
### Prithvi-EO-1.0

**Paper**: [Foundation Models for Generalist Geospatial Artificial Intelligence](https://arxiv.org/abs/2310.18660)

**Repository**: [HuggingFace](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M)

**Description**: Developed by IBM and NASA, Prithvi-EO-1.0 is trained on Harmonized Landsat and Sentinel-2 (HLS) data with multi-temporal inputs for comprehensive Earth system understanding.

**Standard Metrics**:

- Architecture Type: ViT-Base
- Parameter Count: 100M
- Pre-training Method: Masked Autoencoding
- Input Resolution: 30m (HLS)
- Spectral Bands: 6 (HLS bands)
- Temporal Handling: Multi-temporal stacking (3 timestamps)
- Pre-training Dataset Size: 250k samples
- Patch Size: 16x16
- Embedding Dimension: 768
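The multi-temporal stacking above translates into a 5-D input tensor. The sketch below assumes the common (batch, bands, time, height, width) layout and a 224x224 chip size; check the specific checkpoint's documentation for its exact expectations.

```python
import torch

# 6 HLS bands, 3 timestamps, 224x224 chips (an assumed, commonly used size)
B, C, T, H, W = 4, 6, 3, 224, 224
hls_batch = torch.randn(B, C, T, H, W)

# With 16x16 spatial patches, each timestamp yields (224 // 16) ** 2 = 196
# patches, so the transformer sees roughly 3 * 196 = 588 tokens per chip.
tokens_per_frame = (H // 16) * (W // 16)
print(tokens_per_frame, T * tokens_per_frame)  # 196 588
```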
### Prithvi-EO-2.0

**Paper**: [Prithvi-EO-2.0: A Versatile Multi-Temporal Foundation Model](https://arxiv.org/abs/2412.02732)

**Repository**: [HuggingFace](https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M)

**Description**: The second generation of Prithvi models, offering both 300M and 600M parameter variants with enhanced temporal and location encodings for improved global Earth observation capabilities.

**Standard Metrics**:

- Architecture Type: ViT-Large (300M) / ViT-Huge (600M)
- Parameter Count: 300M / 600M
- Pre-training Method: Masked Autoencoding with temporal encoding
- Input Resolution: 30m (HLS)
- Spectral Bands: 6 (HLS bands)
- Temporal Handling: Enhanced multi-temporal (3+ timestamps)
- Pre-training Dataset Size: 4.2M samples
- Patch Size: 16x16
- Embedding Dimension: 1024 (300M) / 1280 (600M)
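As a starting point, the backbone can usually be pulled from TerraTorch's backbone registry. The registry key and keyword arguments below are assumptions (they change between TerraTorch releases), so inspect `terratorch.registry.BACKBONE_REGISTRY` in your installed version before relying on them.

```python
# Hedged sketch: registry key and kwargs are assumptions, not guaranteed names.
from terratorch.registry import BACKBONE_REGISTRY

backbone = BACKBONE_REGISTRY.build(
    "prithvi_eo_v2_300",   # assumed key for the 300M variant; verify locally
    pretrained=True,       # load released weights if available
    num_frames=3,          # multi-temporal input, matching the metrics above
)
```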
---

## Multi-Task Supervised Models
### Satlas

**Paper**: [SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding](https://arxiv.org/abs/2211.15660)

**Repository**: [GitHub](https://github.com/allenai/satlas)

**Description**: Satlas uses supervised multi-task learning across various label types and resolutions, creating a generalist model for diverse remote sensing applications.

**Standard Metrics**:

- Architecture Type: Swin Transformer
- Parameter Count: 100M
- Pre-training Method: Supervised Multi-task Learning
- Input Resolution: ~10m (various sources)
- Spectral Bands: Variable (RGB + multispectral)
- Temporal Handling: Single timestamp
- Pre-training Dataset Size: Not specified (labeled data)
- Patch Size: 4x4 (Swin patches)
- Embedding Dimension: 768
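Supervised multi-task pre-training amounts to one shared backbone feeding several task-specific heads, with the per-task losses combined into a single objective. The sketch below is a generic illustration with placeholder modules, not the Satlas codebase.

```python
import torch
import torch.nn as nn


class MultiTaskModel(nn.Module):
    """Shared backbone + per-task heads (classification and segmentation here);
    modules and dimensions are placeholders for illustration."""

    def __init__(self, backbone, feat_dim, n_cls_classes, n_seg_classes):
        super().__init__()
        self.backbone = backbone  # e.g. a Swin feature extractor -> (B, D, H', W')
        self.cls_head = nn.Linear(feat_dim, n_cls_classes)
        self.seg_head = nn.Conv2d(feat_dim, n_seg_classes, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                            # (B, D, H', W')
        cls_logits = self.cls_head(feats.mean(dim=(2, 3)))  # global pooled features
        seg_logits = self.seg_head(feats)                   # per-pixel logits
        return cls_logits, seg_logits


# Training then sums (optionally weighted) per-task losses, e.g.
# loss = w_cls * ce(cls_logits, y_cls) + w_seg * ce(seg_logits, y_seg)
```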
---

## Model Selection Guide

### Best for Multi-Modal Applications

- **DeCUR**: Optimized for combined SAR-optical analysis
- **Clay v1**: Flexible wavelength handling for diverse sensors
- **DOFA**: Dynamic adaptation to any spectral configuration

### Best for Temporal Analysis

- **Prithvi-EO-2.0**: Enhanced temporal encodings
- **Prithvi-EO-1.0**: Native multi-temporal support
- **MOCOv2/DINO**: Multi-seasonal contrastive learning

### Best for High-Resolution Tasks

- **ScaleMAE**: Scale-aware design for variable resolutions
- **Satlas**: Multi-resolution supervised training

### Best for Limited Compute Resources

- **MOCOv2/DINO/DeCUR**: 25M parameters (ResNet50)
- **Prithvi-EO-1.0**: 100M parameters with proven efficiency
- **Clay v1**: 100M parameters with 8x8 patches for detail

### Best for Production Deployment

- **Prithvi-EO-2.0**: Extensive validation and NASA/IBM support
- **Clay v1**: Active development and community support
- **Satlas**: Supervised training for predictable performance
---

## Implementation Example

```python
import terratorch
from terratorch.models import PrithviModelFactory

# Build a Prithvi-EO-2.0 backbone with a UPerNet decoder for 10-class
# segmentation on 6-band, 3-timestep inputs.
# NOTE: this block is an illustrative sketch; factory and trainer APIs differ
# between TerraTorch releases (recent releases expose fine-tuning through
# terratorch.tasks, e.g. SemanticSegmentationTask, together with a PyTorch
# Lightning Trainer), so check your installed version.
factory = PrithviModelFactory()
model = factory.build_model(
    backbone="prithvi_eo_v2_300m",
    decoder="upernet",
    num_classes=10,
    in_channels=6,
    bands=["B02", "B03", "B04", "B08", "B11", "B12"],
    num_frames=3,
)

# Fine-tune on your dataset
trainer = terratorch.Trainer(
    model=model,
    task="semantic_segmentation",
    learning_rate=1e-4,
    batch_size=16,
)
# ...then call the trainer's fit method with your dataloaders or datamodule.
```