# GEO-Bench Datasets

What they are, how to access them, and practical usage patterns

## What is GEO-Bench?

GEO-Bench is a curated benchmark suite for evaluating geospatial foundation models and related methods across diverse Earth observation tasks. It provides standardized data splits, consistent preprocessing, and a simple Python API for loading tasks as PyTorch-ready datasets. See the paper: [GEO-Bench: Toward Foundation Models for Earth Monitoring](https://arxiv.org/abs/2306.03831).
### Why it matters

- **Comparability**: Common splits and metrics across datasets and tasks
- **Breadth**: Classification, segmentation, and other downstream tasks
- **Relevance**: Real-world sensors (e.g., Sentinel-2, Landsat) across many geographies
## Task suites in GEO-Bench

GEO-Bench groups tasks into benchmark suites. Common suites include:

- **classification_v1.0**: Scene/patch classification tasks drawn from multiple sources
- **segmentation_v1.0**: Pixel-wise land cover/semantic segmentation tasks
- (Some releases include additional tracks; consult the repository for the current list.)

Each suite consists of multiple tasks. A "task" defines a dataset, its split protocol, input bands, and target type.
## Install and download

GEO-Bench provides a pip package and a CLI for data download. The full suite is large, so check that you have enough disk space before starting.

```bash
pip install geobench

# optional: choose where data are stored
export GEO_BENCH_DIR="$HOME/datasets/geobench"

# download selected benchmark(s); will prompt/stream progress
geobench-download
```

Notes:

- If `GEO_BENCH_DIR` is not set, GEO-Bench defaults to `$HOME/dataset/geobench/` (as configured by the package). Note that the default uses `dataset` (singular), unlike the example path above.
- Downloading every benchmark can take roughly 65 GB or more.
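The directory lookup described in the notes above can be checked from Python before starting a long download. The following is a minimal sketch, not part of the `geobench` package: `resolve_geobench_dir` and `check_geobench_storage` are hypothetical helper names, and the 65 GB threshold simply mirrors the estimate above.

```python
import os
import shutil
from pathlib import Path


def resolve_geobench_dir(env=None):
    """Return GEO_BENCH_DIR if set, otherwise the documented default location."""
    env = os.environ if env is None else env
    configured = env.get("GEO_BENCH_DIR")
    if configured:
        return Path(configured).expanduser()
    # Default stated in the notes: $HOME/dataset/geobench
    return Path.home() / "dataset" / "geobench"


def check_geobench_storage(data_dir, required_gb=65.0):
    """Report free space and any benchmark folders already present (hypothetical helper)."""
    data_dir = Path(data_dir)
    # disk_usage needs an existing path, so walk up to the nearest existing parent
    probe = data_dir
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    benchmarks = []
    if data_dir.is_dir():
        benchmarks = sorted(p.name for p in data_dir.iterdir() if p.is_dir())
    return {
        "data_dir": str(data_dir),
        "free_gb": round(free_gb, 1),
        "enough_space": free_gb >= required_gb,
        "benchmarks": benchmarks,
    }


print(check_geobench_storage(resolve_geobench_dir()))
```

After a successful download, the `benchmarks` entry should list folders such as `classification_v1.0`; an empty list suggests the data landed somewhere else.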
## Loading tasks in Python

The `geobench` package exposes an iterator over tasks in a benchmark suite. Each task can yield a PyTorch-style dataset per split.

```python
import os

# Try to import geobench, but don't fail the notebook if it is unavailable
try:
    import geobench  # type: ignore
    print("geobench available:", getattr(geobench, "__version__", "unknown"))
except Exception as e:
    geobench = None
    print("geobench not usable in this environment:", e)

train_ds = None
if geobench is not None:
    try:
        data_dir = os.environ.get("GEO_BENCH_DIR")
        if not data_dir or not os.path.exists(data_dir):
            print("GEO_BENCH_DIR not set or path missing; skipping dataset load")
        else:
            # List tasks and take the first one as an example
            iterator = geobench.task_iterator(benchmark_name="classification_v1.0")
            first_task = None
            for task in iterator:
                print("Task:", task.name, "| Splits:", task.splits)
                if first_task is None:
                    first_task = task
            if first_task is not None:
                train_ds = first_task.get_dataset(split="train")
                sample = train_ds[0]
                print("Loaded dataset:", type(train_ds).__name__)
                print("Num bands:", len(getattr(sample, "bands", [])))
            else:
                print("No tasks found in this benchmark; check your local data")
    except Exception as e:
        print("Failed to enumerate/load GEO-Bench tasks:", e)
```

Output when the package is installed but no data directory is configured:

```
geobench available: 0.0.3
GEO_BENCH_DIR not set or path missing; skipping dataset load
```
### Wrapping for model training

Convert GEO-Bench samples to channels-first tensors and pair them with labels or masks for PyTorch training.

```python
from types import SimpleNamespace

from torch.utils.data import DataLoader
import numpy as np
import torch


def to_chw_tensor(sample):
    # Stack per-band arrays into [C, H, W]
    band_arrays = [torch.from_numpy(band.data).float() for band in sample.bands]
    x = torch.stack(band_arrays, dim=0)
    # Normalize per band (simple min-max as example)
    x_min = x.amin(dim=(1, 2), keepdim=True)
    x_max = x.amax(dim=(1, 2), keepdim=True)
    x = (x - x_min) / torch.clamp(x_max - x_min, min=1e-6)
    return x


def collate_classification(batch):
    xs = [to_chw_tensor(s) for s in batch]
    ys = [torch.tensor(s.label, dtype=torch.long) for s in batch]
    return {"image": torch.stack(xs), "label": torch.stack(ys)}


def make_synthetic_samples(num=8, bands=3, size=64):
    rng = np.random.default_rng(0)
    samples = []
    for i in range(num):
        arrays = [rng.random((size, size), dtype=np.float32) for _ in range(bands)]
        band_objs = [
            SimpleNamespace(data=a, band_info=SimpleNamespace(name=f"B{j}"))
            for j, a in enumerate(arrays)
        ]
        label = int(i % 4)
        samples.append(SimpleNamespace(bands=band_objs, label=label))
    return samples


if "train_ds" in globals() and train_ds is not None:
    loader = DataLoader(train_ds, batch_size=8, shuffle=True, collate_fn=collate_classification)
    batch = next(iter(loader))
    print("From GEO-Bench ->", batch["image"].shape, batch["label"].shape)
else:
    # Fallback: demonstrate the collate with synthetic samples so this cell always runs
    synthetic = make_synthetic_samples()
    batch = collate_classification(synthetic)
    print("Synthetic demo ->", batch["image"].shape, batch["label"].shape)
```
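The per-sample min-max scaling above is only an example; because each sample is rescaled to its own range, absolute intensity differences between samples are lost. A common alternative is to standardize each band with statistics computed once over the training split. The sketch below illustrates that idea on plain NumPy arrays. `compute_band_stats` and `standardize` are hypothetical helpers, not `geobench` functions, and the random arrays stand in for real samples.

```python
import numpy as np


def compute_band_stats(images):
    """Per-band mean and std over a list of [C, H, W] arrays."""
    stacked = np.stack(images, axis=0).astype(np.float64)  # [N, C, H, W]
    mean = stacked.mean(axis=(0, 2, 3))
    std = stacked.std(axis=(0, 2, 3))
    return mean, std


def standardize(image, mean, std, eps=1e-6):
    """Apply (x - mean) / std per band; eps guards against constant bands."""
    mean = mean[:, None, None]
    std = np.maximum(std, eps)[:, None, None]
    return ((image - mean) / std).astype(np.float32)


# Synthetic stand-in for a training split: 8 images, 3 bands, 16 x 16 pixels
rng = np.random.default_rng(0)
train_images = [rng.random((3, 16, 16), dtype=np.float32) * 1000 for _ in range(8)]

mean, std = compute_band_stats(train_images)
normalized = standardize(train_images[0], mean, std)
print(normalized.shape, normalized.dtype)
```

The same `mean` and `std` would then be reused unchanged for the validation and test splits, so that evaluation inputs are scaled exactly like training inputs.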
For segmentation tasks, use a separate collate function that returns a `mask` (or `target`) tensor:

```python
from types import SimpleNamespace

import numpy as np
import torch


def collate_segmentation(batch):
    # Reuses to_chw_tensor from the classification example above
    xs = [to_chw_tensor(s) for s in batch]
    ys = [torch.from_numpy(s.mask).long() for s in batch]  # H x W
    return {"image": torch.stack(xs), "mask": torch.stack(ys)}


# Minimal runnable demo with synthetic masks (does not depend on geobench)
class _SegSample(SimpleNamespace):
    pass


def _make_synthetic_segmentation(num=4, bands=3, size=32, classes=5):
    rng = np.random.default_rng(1)
    samples = []
    for i in range(num):
        arrays = [rng.random((size, size), dtype=np.float32) for _ in range(bands)]
        band_objs = [
            SimpleNamespace(data=a, band_info=SimpleNamespace(name=f"B{j}"))
            for j, a in enumerate(arrays)
        ]
        mask = rng.integers(0, classes, size=(size, size), dtype=np.int64)
        samples.append(_SegSample(bands=band_objs, mask=mask))
    return samples


seg_batch = collate_segmentation(_make_synthetic_segmentation())
print("Synthetic segmentation demo ->", seg_batch["image"].shape, seg_batch["mask"].shape)
```
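Training-time augmentation can be layered on top of any indexable dataset without touching the collate functions. The wrapper below is a minimal sketch in plain Python. `TransformedDataset` and `make_random_hflip` are hypothetical names, and samples are modeled as dicts of nested lists rather than GEO-Bench sample objects so that the example runs without extra dependencies. For segmentation, the key point is that image and mask must be flipped together.

```python
import random


class TransformedDataset:
    """Wrap an indexable dataset and apply an optional transform on access."""

    def __init__(self, base, transform=None):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]
        return self.transform(sample) if self.transform else sample


def make_random_hflip(p=0.5, seed=None):
    """Build a transform that flips image and mask together with probability p."""
    rng = random.Random(seed)

    def transform(sample):
        if rng.random() >= p:
            return sample  # leave the sample unchanged
        return {
            # image: list of bands, each a list of rows; reverse every row
            "image": [[row[::-1] for row in band] for band in sample["image"]],
            "mask": [row[::-1] for row in sample["mask"]],
        }

    return transform


# One toy sample: a single 2 x 2 band and its 2 x 2 mask
base = [{"image": [[[1, 2], [3, 4]]], "mask": [[0, 1], [1, 0]]}]
train_view = TransformedDataset(base, make_random_hflip(p=1.0))  # always flip
eval_view = TransformedDataset(base)  # no augmentation, deterministic
print(train_view[0]["mask"], eval_view[0]["mask"])
```

Using two views over the same base dataset keeps the evaluation path free of randomness while the training path stays augmented.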
## Practical tips

- **Storage layout**: After `geobench-download`, verify that `GEO_BENCH_DIR` contains the benchmark folders (e.g., `classification_v1.0/`, `segmentation_v1.0/`). The Python API will find them automatically.
- **Band handling**: Different tasks expose different sensors and band sets. Always inspect `task.input_bands` and adapt normalization and band ordering accordingly.
- **Train/val/test**: Use `task.get_dataset(split=...)` with `"train"`, `"val"`, or `"test"` to obtain consistent splits. Do not re-partition the data unless the benchmark protocol explicitly allows it.
- **Transforms**: Wrap datasets with on-the-fly augmentations for training. Keep evaluation preprocessing deterministic.
- **Reproducibility**: Fix random seeds in your training loop and log the exact `benchmark_name` and task list used.
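The reproducibility tip can be sketched as two small helpers. This is an illustration under stated assumptions, not a GEO-Bench utility: `seed_everything` and `run_manifest` are hypothetical names, the manifest format is invented for this example, and `task_a`/`task_b` are placeholder task names. NumPy and PyTorch are seeded only if they are installed.

```python
import json
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def run_manifest(benchmark_name, task_names, seed):
    """Serialize the settings needed to repeat a run (hypothetical format)."""
    record = {
        "benchmark_name": benchmark_name,
        "tasks": sorted(task_names),
        "seed": seed,
    }
    return json.dumps(record, sort_keys=True)


# Re-seeding with the same value reproduces the same random draw
seed_everything(0)
first_draw = random.random()
seed_everything(0)
second_draw = random.random()
print(first_draw == second_draw)

# Placeholder task names; log the real ones from your benchmark run
print(run_manifest("classification_v1.0", ["task_b", "task_a"], seed=0))
```

Writing the manifest string next to your results makes it possible to tell later which suite, tasks, and seed produced a given number.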
## How GEO-Bench fits our course

- Weeks 6–9 use standardized benchmarks for evaluation, fine-tuning, and deployment demos
- You can swap a custom dataset for a GEO-Bench task to compare against baselines quickly
## References

- GitHub repository: https://github.com/ServiceNow/geo-bench
- Paper: https://arxiv.org/abs/2306.03831