Analysis-Ready Data

The concept that satellite imagery should arrive ready for science, not ready for more preprocessing

DAT-004

Analysis-Ready Data (ARD) is satellite imagery that has been processed to a standard where it can be used directly for analysis without additional preprocessing. This means the image has been geometrically corrected (pixels are in the right geographic locations), radiometrically calibrated (pixel values represent meaningful physical quantities like surface reflectance rather than arbitrary digital numbers), atmospherically corrected (the atmosphere's distortion has been removed), and often cloud-masked (unusable pixels are flagged). ARD is the difference between receiving raw ingredients and receiving a prepared, measured, recipe-ready mise en place.

Why It Matters

The vast majority of time spent working with satellite data is not spent doing analysis. It is spent getting the data ready for analysis.

A remote sensing analyst who wants to study vegetation health across a region over five years using Sentinel-2 imagery faces this sequence before any actual science begins: download the scenes, check cloud cover, apply atmospheric correction to convert top-of-atmosphere reflectance to surface reflectance, apply geometric correction to align pixels to a coordinate reference system, resample to a common grid, apply cloud and shadow masks, verify radiometric consistency across scenes, and handle any data gaps. For a single scene, this takes minutes to hours depending on tooling. For a time series across a large area, it can take days to weeks.

This preprocessing burden is the single largest barrier to wider use of earth observation data. Not the cost of the data — much of it is free through programs like Copernicus. Not the complexity of the analysis — vegetation indices are straightforward arithmetic on spectral bands. The barrier is the gap between what the data provider delivers and what the analyst needs.

ARD closes this gap. When data is delivered as ARD, the analyst receives surface reflectance values in a known projection with quality flags already applied. They can start computing NDVI immediately. They can compare scenes from different dates because the radiometry is consistent. They can composite across sensors because the data has been harmonized to a common standard.
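As a minimal sketch of how little code the analysis itself needs once surface reflectance is in hand, the snippet below computes NDVI as (NIR - Red) / (NIR + Red). The reflectance values are made up for illustration, and a real workflow would read them from the ARD product's red and near-infrared bands.

```python
def ndvi(red, nir):
    """Return NDVI for one pixel, or None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical surface reflectance values (fractions between 0 and 1).
red_band = [0.05, 0.10, 0.30, 0.0]
nir_band = [0.45, 0.30, 0.30, 0.0]

# One NDVI value per pixel; the last pixel has no signal and yields None.
ndvi_values = [ndvi(r, n) for r, n in zip(red_band, nir_band)]
```

The same arithmetic applied to top-of-atmosphere values would run without error, which is exactly why the preprocessing has to be trusted before the index is.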

The Committee on Earth Observation Satellites (CEOS) formalized ARD requirements in its CEOS Analysis Ready Data for Land (CARD4L) framework, which defines minimum specifications for what constitutes analysis-ready optical and radar data. These specifications cover geometric accuracy, radiometric consistency, atmospheric correction quality, and metadata completeness.

Processing Levels

Satellite data is conventionally described in processing levels that indicate how much correction has been applied:

Level 0 — Raw instrument data. Unprocessed digital numbers as recorded by the sensor. Useful only for specialized calibration work.

Level 1A — Reconstructed, unprocessed instrument data at full resolution with radiometric and geometric coefficients appended but not applied. The data is organized but uncorrected.

Level 1B / Level 1C — Radiometric calibration applied. Level 1B is typically calibrated top-of-atmosphere (TOA) radiance still in sensor geometry. For Sentinel-2, Level-1C is TOA reflectance: the sensor's measurement includes both the surface signal and the atmosphere's contribution, and pixels are orthorectified (geometrically corrected using a digital elevation model).

Level 2A — Surface reflectance. Atmospheric correction has been applied to remove the atmosphere's contribution, yielding an estimate of what the surface actually reflects. For Sentinel-2, this is produced by the Sen2Cor processor. Cloud and shadow masks are included. This is the starting point for most ARD products.

Level 3 — Temporally composited or spatially mosaicked products. Monthly composites, seasonal averages, gap-filled time series. These aggregate multiple Level-2 scenes into summary products.

Level 4 — Derived geophysical variables. Not reflectance but interpreted products: vegetation indices, land cover maps, burned area, water extent. These are the outputs of analysis, not its input.

ARD typically corresponds to Level 2A: surface reflectance with quality flags, in a known projection, ready for direct use. Some definitions extend ARD to include temporal compositing (Level 3), but the core requirement is that the data represents a physically meaningful quantity at the Earth's surface with documented uncertainty.
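To make the step from Level 2A to a Level 3 composite concrete, here is a rough sketch of a per-pixel median composite over cloud-masked scenes. The median is only one common compositing choice and is an assumption here, as are the function name and the tiny example grids.

```python
from statistics import median


def median_composite(scenes, masks):
    """Per-pixel median over scenes, using only pixels flagged usable.

    scenes: list of 2D lists of reflectance values, all the same shape.
    masks:  list of 2D lists of booleans, True where the pixel is usable.
    Pixels with no usable observation come back as None (a data gap).
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for i in range(rows):
        row = []
        for j in range(cols):
            valid = [s[i][j] for s, m in zip(scenes, masks) if m[i][j]]
            row.append(median(valid) if valid else None)
        composite.append(row)
    return composite


# Three hypothetical 1x3 scenes; the last pixel is cloudy on every date.
scenes = [[[0.10, 0.20, 0.50]], [[0.30, 0.90, 0.60]], [[0.20, 0.40, 0.70]]]
masks = [[[True, True, False]], [[True, False, False]], [[True, True, False]]]
composite = median_composite(scenes, masks)
```

The None in the result is deliberate: a composite that silently fills gaps hides the fact that no clear observation existed.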

What Makes Data "Analysis Ready"

CEOS CARD4L defines specific requirements across several dimensions:

Geometric Accuracy

Pixels must be located correctly on the Earth's surface. This requires orthorectification using a digital elevation model and ground control points. Sub-pixel accuracy (better than one pixel's width) is the standard for most ARD products. Without geometric accuracy, time series analysis is meaningless — you would be comparing different patches of ground across dates.
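One way to check the sub-pixel criterion is to express the root-mean-square positional error of independent check points in units of pixel width. The sketch below assumes residuals have already been measured in metres. The values and the function name are hypothetical.

```python
import math


def rmse_in_pixels(residuals_m, pixel_size_m):
    """Root-mean-square positional error, expressed in pixel widths.

    residuals_m: list of (east, north) offsets in metres between where a
    check point appears in the image and its surveyed position.
    """
    squared = [dx * dx + dy * dy for dx, dy in residuals_m]
    return math.sqrt(sum(squared) / len(squared)) / pixel_size_m


# Hypothetical check-point residuals for a product with 10 m pixels.
residuals = [(3.0, 4.0), (-3.0, 4.0), (0.0, -5.0)]
error_px = rmse_in_pixels(residuals, pixel_size_m=10.0)
is_subpixel = error_px < 1.0
```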

Radiometric Consistency

Pixel values must represent a consistent physical quantity. For optical data, this means surface reflectance: the fraction of incoming sunlight reflected by the surface, with the atmosphere removed. For SAR, this means calibrated backscatter coefficients (sigma nought, gamma nought) that account for incidence angle and terrain effects.
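In practice, reflectance is usually stored as scaled integers, and the product metadata states how to convert them back to a fraction. The sketch below assumes a linear encoding with a scale factor of 10000 and an optional additive offset. Both defaults are assumptions for illustration, and the real values should always be read from the product's own metadata.

```python
def dn_to_reflectance(dn, quantification=10000.0, add_offset=0.0, nodata=0):
    """Convert a stored integer to a reflectance fraction.

    quantification, add_offset and nodata are assumed defaults, not values
    taken from any specific product; read them from the metadata instead.
    """
    if dn == nodata:
        return None
    return (dn + add_offset) / quantification


# Same physical reflectance under two hypothetical encodings.
plain = dn_to_reflectance(2500)
with_offset = dn_to_reflectance(3500, add_offset=-1000.0)
```

Mixing scenes encoded with different offsets without applying them is a classic way to introduce an artificial step into a time series.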

Radiometric consistency across time is essential for change detection. If two scenes of the same unchanged area produce different pixel values because of different atmospheric conditions or sensor calibration drift, any change detection algorithm will produce false positives.
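A simple sanity check for consistency is to compare reflectance over a target that should not change between dates, such as a stable bare surface. The sketch below computes the mean relative difference between two dates. The 5% tolerance, the sample values, and the function name are all assumptions chosen for illustration.

```python
def mean_relative_difference(reference, candidate):
    """Mean of |candidate - reference| / reference over paired pixels.

    Pixels with non-positive reference reflectance are skipped.
    Returns None if no pixel can be compared.
    """
    diffs = [abs(c - r) / r for r, c in zip(reference, candidate) if r > 0]
    if not diffs:
        return None
    return sum(diffs) / len(diffs)


# Hypothetical reflectance of the same stable target on two dates.
date_a = [0.20, 0.25, 0.40]
date_b = [0.21, 0.25, 0.38]

drift = mean_relative_difference(date_a, date_b)
is_consistent = drift is not None and drift <= 0.05  # assumed tolerance
```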

Atmospheric Correction

The atmosphere absorbs and scatters sunlight between the sun, the surface, and the sensor. Aerosols, water vapor, and atmospheric gases all contribute. Atmospheric correction models (such as Sen2Cor for Sentinel-2, LaSRC for Landsat, or 6S radiative transfer code) estimate and remove this atmospheric contribution to recover the surface signal.

This is not a trivial step. Atmospheric correction can change pixel values by 10-30% depending on conditions, and the accuracy of the correction depends on the quality of the atmospheric model and auxiliary data (aerosol optical depth, water vapor content, ozone concentration). Poor atmospheric correction propagates errors into every downstream analysis.
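The core inversion can be sketched with the simplified relation used in radiative transfer modeling for a uniform Lambertian surface: TOA reflectance equals path reflectance plus T * rho_s / (1 - S * rho_s), where T is the total two-way transmittance and S is the atmosphere's spherical albedo. The sketch ignores gas absorption and adjacency effects, uses made-up atmospheric parameters, and is not the full method of Sen2Cor, LaSRC, or 6S.

```python
def toa_from_surface(rho_s, rho_path, transmittance, spherical_albedo):
    """Forward model: surface reflectance to top-of-atmosphere reflectance."""
    return rho_path + transmittance * rho_s / (1.0 - spherical_albedo * rho_s)


def surface_from_toa(rho_toa, rho_path, transmittance, spherical_albedo):
    """Inverse model: remove the atmospheric terms to recover the surface."""
    y = (rho_toa - rho_path) / transmittance
    return y / (1.0 + spherical_albedo * y)


# Hypothetical atmospheric state for one band (not from any real scene).
atmosphere = dict(rho_path=0.05, transmittance=0.8, spherical_albedo=0.1)

rho_toa = toa_from_surface(0.30, **atmosphere)
rho_surface = surface_from_toa(rho_toa, **atmosphere)
```

The hard part in a real processor is estimating the three atmospheric terms per pixel from aerosol, water vapor, and ozone data. The algebra above is the easy part.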

Cloud and Quality Masking

Clouds, cloud shadows, cirrus, snow, and other obstructions must be identified and flagged. These pixels are not usable for surface analysis and must be excluded. Cloud masking algorithms (such as Fmask, s2cloudless, or the Sentinel-2 Scene Classification Layer) are imperfect — they miss thin cirrus, misclassify bright surfaces, and struggle with cloud edges.

ARD includes quality assessment bands or layers that encode per-pixel confidence in the classification. This allows analysts to choose their quality threshold rather than relying on a binary mask.
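As an illustration of choosing a quality threshold, the sketch below builds a usable-pixel mask from a scene classification layer with two levels of strictness. The class codes follow the Sentinel-2 Scene Classification Layer as commonly documented (0 no data, 1 saturated or defective, 3 cloud shadow, 8 and 9 medium and high cloud probability, 10 thin cirrus, 11 snow or ice), but they should be verified against the product specification. The set names and the choice of which classes to drop are assumptions.

```python
# Classes to reject under a cautious policy and under a permissive one.
EXCLUDE_STRICT = {0, 1, 3, 8, 9, 10, 11}
EXCLUDE_LENIENT = {0, 1, 3, 9}


def usable_mask(scl, exclude):
    """Return a 2D boolean grid: True where the class is not excluded."""
    return [[code not in exclude for code in row] for row in scl]


def apply_mask(band, mask):
    """Replace rejected pixels with None so they drop out of later stats."""
    return [
        [value if keep else None for value, keep in zip(b_row, m_row)]
        for b_row, m_row in zip(band, mask)
    ]


# Hypothetical 2x2 classification and red-band reflectance.
scl = [[4, 8], [9, 5]]
red = [[0.04, 0.35], [0.60, 0.12]]

strict = usable_mask(scl, EXCLUDE_STRICT)
lenient = usable_mask(scl, EXCLUDE_LENIENT)
red_clean = apply_mask(red, strict)
```

The strict policy loses more observations but admits fewer contaminated ones. Which trade-off is right depends on the analysis.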

Metadata Completeness

Every ARD product must include sufficient metadata to understand what it represents and how it was produced. This includes: sensor identification, acquisition date and time, processing chain description, quality assessment information, coordinate reference system, and per-pixel uncertainty where available.

Without metadata, data is orphaned. It cannot be trusted, combined with other data, or traced back to its source when problems arise.
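A completeness check along these lines can run before any scene enters a pipeline. The sketch below uses hypothetical field names that mirror the list above. Real products use their own metadata schemas, so the keys would need mapping.

```python
# Hypothetical keys mirroring the required items; not a real schema.
REQUIRED_FIELDS = (
    "sensor",
    "acquisition_datetime",
    "processing_chain",
    "quality_assessment",
    "crs",
)


def missing_metadata(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Example record with one field left out and one left blank.
record = {
    "sensor": "Sentinel-2A MSI",
    "acquisition_datetime": "2023-06-01T10:30:00Z",
    "processing_chain": "",
    "crs": "EPSG:32633",
}
gaps = missing_metadata(record)
```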

The Harmonization Challenge

ARD solves the preprocessing problem for individual sensors. The harder problem is harmonization across sensors.

Sentinel-2 and Landsat 8/9 both produce multispectral imagery of the Earth's surface, but their bands are not identical. Sentinel-2's Band 4 (Red, 665 nm center) is not the same as Landsat 8's Band 4 (Red, 655 nm center). Their spatial resolutions differ (10 m vs 30 m for the red band), their spectral response functions differ, and their radiometric characteristics differ.

Computing NDVI from Sentinel-2 and NDVI from Landsat for the same location will produce different values — not because the vegetation changed, but because the sensors measure slightly different things. For time series that span both sensors, or for analyses that require combining them for temporal density, this inconsistency is a fundamental problem.

Harmonization goes beyond ARD. It is the process of making data from different sensors comparable: adjusting for spectral response differences, resampling to common grids, cross-calibrating radiometry, and documenting every adjustment. This is the core problem Fabric was built to solve.
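Two of these adjustments can be sketched in a few lines: a per-band linear spectral adjustment and block averaging from a fine grid to a coarser one. The slope and intercept below are placeholders, not published cross-calibration coefficients, and the function names are hypothetical. This is an illustration of the general idea rather than a description of how any particular system, Fabric included, implements it.

```python
def adjust_band(reflectance, slope, intercept):
    """Linear spectral adjustment of one sensor's band toward another's."""
    return slope * reflectance + intercept


def block_mean(grid, factor):
    """Aggregate a 2D grid by averaging non-overlapping factor x factor blocks.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(0, rows - rows % factor, factor):
        out_row = []
        for j in range(0, cols - cols % factor, factor):
            block = [
                grid[i + di][j + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical 3x3 patch of 10 m red reflectance, aggregated to one 30 m pixel.
fine = [[0.10, 0.20, 0.30], [0.20, 0.20, 0.20], [0.30, 0.20, 0.10]]
coarse = block_mean(fine, factor=3)

# Placeholder coefficients; real ones come from cross-calibration studies.
harmonized = [[adjust_band(v, slope=0.98, intercept=0.004) for v in row]
              for row in coarse]
```

Recording the coefficients and the resampling method alongside the output is part of the job, since an undocumented adjustment is indistinguishable from an error later on.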
