Chain of Custody in Multi-Sensor Fusion

A single satellite image has a straightforward history. It was captured by a specific sensor, at a specific time, from a specific orbit. It was downlinked, processed, corrected, and delivered. The chain is linear. One thing happened, then the next thing happened, then the next. You can write it down as a list.

The interesting work in Earth observation almost never uses a single image.

A forest health assessment might begin with Sentinel-2 optical imagery to measure vegetation indices, add Sentinel-1 SAR data to detect structural changes beneath the canopy, incorporate a digital elevation model to account for terrain effects on the signal, and pull in climate data from ground stations to contextualise what the satellites observed. Each of those inputs has its own provenance — its own capture time, processing history, coordinate system, and chain of custody. The assessment itself is a fifth thing, born from the convergence of four independent lineages.

The European Forest Fire Information System (EFFIS) and the Copernicus Emergency Management Service (CEMS) both operationally fuse Sentinel-1 and Sentinel-2 for vegetation monitoring and damage assessment. Their processing chains are among the better-documented fusion workflows in the public sector. See CEMS Mapping Portfolio.

This is where most provenance systems break.

Chains, Trees, and Graphs

The language we use for provenance is revealing. "Chain of custody" implies a sequence — A handed to B, B handed to C, C handed to you. This works when you are tracking a single object through time. A painting. A piece of evidence in a criminal case. A dataset from a single sensor.

But fusion is not sequential. It is convergent. Multiple independent streams arrive at a single operation, and that operation produces something that did not exist before — a product whose properties are derived from all of its parents but identical to none of them.

The provenance of a fused product is not a chain. It is a directed acyclic graph. Each input has its own subgraph of processing history. The fusion operation is a node where those subgraphs converge. And the output inherits the full history of every branch.

This distinction matters because the tools and standards that most of the geospatial industry uses for provenance were designed for chains. Metadata schemas that store lineage as a text description of processing steps. Pipeline logs that record a sequence of operations. STAC items with a single derived_from link pointing to one parent item. These work fine when the history is linear. They fall apart when it branches and merges.

STAC's Processing Extension supports fields like processing:software, processing:lineage, and processing:level, but adoption is inconsistent. A survey of public STAC catalogs shows that most items populate processing:level (e.g., "L2A") while leaving processing:software and processing:lineage empty. The extension exists; the convention of filling it does not.
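To make the gap concrete, here is a minimal sketch of a STAC Item for a fused product that fills in the processing fields and carries one derived_from link per parent. It is written as a plain Python dictionary. The item id, software name, version numbers, datetime, and every href are hypothetical placeholders, and the extension schema version is an assumption that should be checked against whatever your catalogue targets.

```python
import json

# Hypothetical STAC Item for a fused flood-extent product.
# Geometry and assets are left empty to keep the sketch short;
# all ids, hrefs, and version strings are placeholders.
flood_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "stac_extensions": [
        # Assumed schema version of the Processing Extension.
        "https://stac-extensions.github.io/processing/v1.1.0/schema.json"
    ],
    "id": "flood-extent-example",
    "geometry": None,
    "properties": {
        "datetime": "2025-06-01T06:00:00Z",
        "processing:level": "L3",
        "processing:software": {"example-fusion-tool": "0.1.0"},
        "processing:lineage": (
            "Change detection on terrain-corrected Sentinel-1 backscatter, "
            "masked with Sentinel-2 land cover and constrained by Copernicus DEM."
        ),
    },
    "links": [
        {"rel": "derived_from", "href": "https://example.com/items/s1-post-event.json"},
        {"rel": "derived_from", "href": "https://example.com/items/s1-pre-event.json"},
        {"rel": "derived_from", "href": "https://example.com/items/cop-dem-tile.json"},
        {"rel": "derived_from", "href": "https://example.com/items/land-cover.json"},
    ],
    "assets": {},
}


def parent_hrefs(item):
    """Return the href of every derived_from link on a STAC Item dictionary."""
    return [link["href"] for link in item["links"] if link["rel"] == "derived_from"]


if __name__ == "__main__":
    print(json.dumps(parent_hrefs(flood_item), indent=2))
```

Even this only records the first layer of ancestry. Each parent item would need its own links for the graph to be traversable further back.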

The Ancestry Problem

Consider a concrete example. You are building a flood extent map for an emergency response agency. Your inputs are:

A Sentinel-1 SAR scene, captured six hours after the flood event. It has been radiometrically calibrated, terrain-corrected using a DEM, and converted to backscatter coefficients in decibels.

A pre-event Sentinel-1 SAR scene from twelve days earlier, processed through the same pipeline, to serve as the baseline for change detection.

A Copernicus DEM tile at 30-metre resolution, used both for the terrain correction of the SAR scenes and as an independent input to the flood model for identifying low-lying areas.

A land cover classification derived from Sentinel-2 optical imagery, used to mask out permanent water bodies and distinguish flood water from lakes and reservoirs.

Now. The DEM appears twice in this graph — once as an input to the SAR terrain correction and once as a direct input to the flood modelling. The pre-event SAR scene shares a processing lineage with the post-event scene but diverges at the point of acquisition. The land cover map has its own entire provenance tree rooted in a different satellite, different spectral bands, different processing algorithms.

The Copernicus DEM (GLO-30) is derived from the WorldDEM product, itself generated from TanDEM-X interferometric SAR. Its own provenance chain — from raw X-band SAR to calibrated elevation model — adds another subgraph beneath the node. See Copernicus DEM documentation.

The flood extent map you deliver to the emergency responder is a descendant of all of these. If anyone asks "where did this map come from?" the honest answer is a graph with at least four root nodes, multiple shared intermediate nodes, and a set of fusion operations that each need their own documentation.
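The shape of that graph can be sketched in a few lines of Python. The node names below are illustrative labels for the flood example, not real dataset identifiers, and the DEM is treated as a root for simplicity even though it has its own history. Each product maps to the products it was directly derived from, and a simple traversal recovers the full ancestry.

```python
# Each key is a product; each value lists the products it was directly derived from.
# Names are illustrative labels for the flood example, not real identifiers.
PARENTS = {
    "s1_post_raw": [],
    "s1_pre_raw": [],
    "dem_glo30": [],
    "s2_optical": [],
    "s1_post_rtc": ["s1_post_raw", "dem_glo30"],  # terrain-corrected post-event scene
    "s1_pre_rtc": ["s1_pre_raw", "dem_glo30"],    # terrain-corrected baseline scene
    "land_cover": ["s2_optical"],
    "change_map": ["s1_post_rtc", "s1_pre_rtc"],
    "flood_extent": ["change_map", "dem_glo30", "land_cover"],
}


def ancestors(node, parents=PARENTS):
    """Return every product that the given node depends on, directly or indirectly."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def roots(node, parents=PARENTS):
    """Return the ancestors of a node that have no parents of their own."""
    return sorted(n for n in ancestors(node, parents) if not parents[n])


def direct_uses(target, parents=PARENTS):
    """Return every product that consumed the target directly."""
    return sorted(child for child, used in parents.items() if target in used)


if __name__ == "__main__":
    print(roots("flood_extent"))
    print(direct_uses("dem_glo30"))
```

In this sketch the DEM has three outgoing edges, because the terrain correction of each SAR scene counts separately. A linear lineage field has no way to express that.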

Most systems would record this as: "Flood extent derived from Sentinel-1 SAR and ancillary data." That is technically true and practically useless.

What Gets Lost at the Junction

The fusion operation itself is where provenance most often goes missing. The individual inputs may have decent metadata. The output may have a description of what it represents. But the operation that combined them — the specific algorithm, its parameters, how it handled conflicts between sources, what it did when spatial resolutions did not match, how it weighted different inputs — is frequently undocumented or described only in general terms.

This matters because the fusion operation is where the most consequential decisions are made. When SAR-derived flood extent disagrees with the terrain model about whether an area should be flooded, something has to give. When the land cover mask says a pixel is a permanent water body but the SAR change detection shows backscatter change, the system has to decide which input to trust. These decisions shape the final product as much as any individual input does.

If your provenance record captures every detail of how the SAR image was terrain-corrected but says nothing about how the fusion algorithm resolved conflicting evidence from different sources, you have documented the easy part and skipped the hard part.
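One way to document the hard part is to have the fusion step report which rule decided each pixel. The sketch below is a hypothetical rule set, not a recommended flood algorithm. The thresholds, the rule names, and the height_above_channel_m input (standing in for whatever low-lying-area measure the terrain model provides) are all assumptions made for illustration.

```python
from collections import Counter


def classify_pixel(sar_change_db, is_permanent_water, height_above_channel_m,
                   change_threshold_db, max_flood_height_m):
    """Return (label, rule) for one pixel so the deciding rule is never implicit."""
    sar_says_flood = sar_change_db <= change_threshold_db
    if is_permanent_water:
        # Land cover wins over SAR; record whether there was an actual conflict.
        rule = "R1a_mask_overrides_sar" if sar_says_flood else "R1b_mask_no_conflict"
        return "permanent_water", rule
    if sar_says_flood and height_above_channel_m > max_flood_height_m:
        # Terrain wins over SAR: backscatter dropped, but the pixel sits too high.
        return "not_flooded", "R2_terrain_overrides_sar"
    if sar_says_flood:
        return "flooded", "R3_sar_and_terrain_agree"
    return "not_flooded", "R4_no_sar_change"


def fuse(pixels, change_threshold_db=-3.0, max_flood_height_m=5.0):
    """Classify all pixels and return labels plus a receipt of the fusion logic."""
    results = [
        classify_pixel(*p, change_threshold_db=change_threshold_db,
                       max_flood_height_m=max_flood_height_m)
        for p in pixels
    ]
    receipt = {
        "algorithm": "rule_based_fusion_sketch",
        "parameters": {
            "change_threshold_db": change_threshold_db,  # illustrative value
            "max_flood_height_m": max_flood_height_m,    # illustrative value
        },
        "rule_counts": dict(Counter(rule for _, rule in results)),
    }
    return [label for label, _ in results], receipt


# (sar_change_db, is_permanent_water, height_above_channel_m)
example_pixels = [(-5.0, True, 1.0), (-5.0, False, 20.0), (-5.0, False, 1.0), (0.0, False, 1.0)]
labels, receipt = fuse(example_pixels)
```

The rule counts are the useful part. They show how often each source was overruled, which is exactly the information a general description of the method leaves out.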

Temporal Alignment and the Provenance of Time

Multi-sensor fusion introduces a problem that single-source processing does not have: the inputs were not captured at the same time.

The post-event SAR scene is from six hours after the flood. The land cover classification is derived from optical imagery captured weeks or months earlier. The DEM represents terrain as it was when the elevation data was acquired, which might be years ago. The pre-event SAR baseline is twelve days old.

When you fuse these into a single product, you are implicitly making claims about temporal relationships. You are asserting that the land cover classification is still valid, that the DEM has not changed due to erosion or construction, that the pre-event baseline represents normal conditions for this location.

These are assumptions, and they are rarely recorded in the provenance. The metadata might say "land cover source: Sentinel-2 2025 composite" but it will not say "we assumed land cover had not changed between the composite date and the flood event, and if it had, the flood mask may include or exclude areas incorrectly."

Temporal assumptions are a form of provenance debt. They accumulate silently, and they only become visible when something goes wrong — when someone asks why the flood map shows water where there is a new housing development that was not in the land cover classification.
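Recording that debt does not need to be elaborate. The sketch below writes one record per input, with its offset from the event and the assumption being made about it. The event time and all acquisition dates are hypothetical, chosen only to match the six-hour and twelve-day gaps in the example, and the field names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event time; not a real flood event.
EVENT = datetime(2025, 6, 1, 0, 0, tzinfo=timezone.utc)

# name -> (acquisition time, assumption made when treating it as current)
# All dates are placeholders for illustration.
INPUTS = {
    "s1_post_event": (EVENT + timedelta(hours=6),
                      "Flood extent has not changed materially since the event."),
    "s1_pre_event": (EVENT - timedelta(days=12),
                     "Baseline represents normal conditions for this location."),
    "land_cover": (datetime(2025, 1, 1, tzinfo=timezone.utc),
                   "Land cover has not changed since the composite was built."),
    "dem": (datetime(2014, 1, 1, tzinfo=timezone.utc),
            "Terrain has not changed through erosion or construction."),
}


def temporal_assumptions(event_time, inputs):
    """Return one explicit record per input: its offset from the event and what is assumed."""
    records = []
    for name, (acquired, assumption) in inputs.items():
        offset_days = (acquired - event_time).total_seconds() / 86400
        records.append({
            "input": name,
            "acquired": acquired.isoformat(),
            "offset_days": round(offset_days, 2),
            "assumption": assumption,
            "validated": False,  # flip only when a check was actually performed
        })
    return records


records = temporal_assumptions(EVENT, INPUTS)
```

The validated flag defaults to false on purpose. An assumption that was never checked should be visible as such.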

The Resolution Mismatch

Sensor fusion also forces decisions about spatial resolution that carry provenance implications. Suppose Sentinel-1 SAR data on a 10-metre grid is fused with a 30-metre DEM and a 10-metre land cover map. The output can be written on a 10-metre grid, but wherever it depends on the DEM its effective resolution is no finer than 30 metres, and how you handle the mismatch changes the result.

Do you upsample the DEM to 10 metres? That creates synthetic detail — the 30-metre elevation values are interpolated, not measured. Do you downsample everything to 30 metres? That discards real information from the SAR and land cover data. Do you process at native resolutions and resample only at the final step? That preserves more information but introduces potential misregistration artifacts.

Bilinear interpolation of a 30-metre DEM to 10 metres is common practice in Sentinel-1 terrain correction (e.g., in ESA's SNAP toolbox). The resulting elevation values between measured posts are mathematically smooth but physically unverified, a distinction that matters for slope-dependent flood modelling in complex terrain.

Each of these choices produces a different product. Each is defensible. And each should be recorded in the provenance — not as "resampled to common grid" but as a specific description of the method, the target resolution, the interpolation algorithm, and the order in which resampling was applied.
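A small worked example shows what such a record could look like. The sketch below upsamples a tiny elevation grid by a factor of three using bilinear interpolation, keeps a mask of which output cells coincide with measured posts, and emits a provenance entry for the step. It is pure Python for readability and assumes the output grid is aligned with the input posts, which is a simplification since real resampling works in map coordinates. The record fields are hypothetical.

```python
def upsample_bilinear(grid, factor):
    """Bilinearly upsample a 2D list (at least 2x2) by an integer factor.

    Returns (values, measured), where measured[i][j] is True only for cells
    that coincide with an original post.
    """
    rows, cols = len(grid), len(grid[0])
    out_rows, out_cols = (rows - 1) * factor + 1, (cols - 1) * factor + 1
    values, measured = [], []
    for i in range(out_rows):
        r0 = min(i // factor, rows - 2)  # upper source row of the enclosing cell
        fr = i / factor - r0             # fractional position between the two rows
        value_row, mask_row = [], []
        for j in range(out_cols):
            c0 = min(j // factor, cols - 2)
            fc = j / factor - c0
            top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
            bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
            value_row.append(top * (1 - fr) + bottom * fr)
            mask_row.append(i % factor == 0 and j % factor == 0)
        values.append(value_row)
        measured.append(mask_row)
    return values, measured


dem_30m = [[100.0, 130.0], [160.0, 190.0]]  # made-up elevations in metres
dem_10m, is_measured = upsample_bilinear(dem_30m, 3)

n_cells = sum(len(row) for row in is_measured)
n_measured = sum(cell for row in is_measured for cell in row)

resampling_record = {
    "operation": "resample",
    "input": "dem_30m",
    "method": "bilinear",
    "source_spacing_m": 30,
    "target_spacing_m": 10,
    "applied": "before_fusion",
    "measured_cells": n_measured,
    "interpolated_cells": n_cells - n_measured,
}
```

On this 2x2 input, twelve of the sixteen output cells are interpolated. On a large tile the interpolated share approaches eight in nine, which is worth stating in the record rather than leaving to the reader to work out.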

Representing Fusion Provenance

The W3C PROV model is expressive enough to represent fusion provenance as a graph. An Activity (the fusion operation) can consume multiple Entities (the input datasets) and produce a new Entity (the fused product). Each input can have its own chain of Activities and Entities extending back to the original sensor reading. The model supports attribution, delegation, and derivation relationships that can capture who did what to which data and when.
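As a minimal sketch, the structure can be written out as a plain dictionary using the PROV relation names used, wasGeneratedBy, and wasDerivedFrom. The layout is loosely modelled on the PROV-JSON serialisation and has not been validated against its schema. The ex: prefix and all identifiers are hypothetical.

```python
import json


def fusion_prov(input_ids, activity_id, output_id):
    """Build a PROV-style document: one activity uses many entities and generates one."""
    doc = {
        "prefix": {"ex": "https://example.com/prov/"},  # hypothetical namespace
        "entity": {eid: {} for eid in [*input_ids, output_id]},
        "activity": {activity_id: {}},
        "used": {},
        "wasGeneratedBy": {
            "_:g0": {"prov:entity": output_id, "prov:activity": activity_id}
        },
        "wasDerivedFrom": {},
    }
    for n, eid in enumerate(input_ids):
        # The fusion activity consumed this input...
        doc["used"][f"_:u{n}"] = {"prov:activity": activity_id, "prov:entity": eid}
        # ...and the output is therefore derived from it.
        doc["wasDerivedFrom"][f"_:d{n}"] = {
            "prov:generatedEntity": output_id,
            "prov:usedEntity": eid,
        }
    return doc


flood_prov = fusion_prov(
    ["ex:s1_post_rtc", "ex:s1_pre_rtc", "ex:dem_glo30", "ex:land_cover"],
    "ex:flood_fusion_run",
    "ex:flood_extent",
)

if __name__ == "__main__":
    print(json.dumps(flood_prov, indent=2))
```

Each input entity can itself be the output of an earlier activity recorded the same way, which is how the subgraphs extend back toward the sensor.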

The problem is not the data model. It is the tooling.

Most geospatial processing platforms do not generate PROV-compatible provenance records. Most pipeline orchestration tools — Airflow, Prefect, Argo — track task execution for operational monitoring, not for scientific reproducibility. The logs they produce can tell you that a task ran, how long it took, and whether it succeeded. They cannot tell you the scientific meaning of what that task did to the data.

Bridging this gap requires provenance-aware processing — systems that generate structured, machine-readable provenance records as a native byproduct of computation, not as an afterthought bolted onto the logging framework. The processing receipt should be as automatic as the log file, and considerably more detailed.
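One lightweight way to get a receipt as a byproduct is to wrap each processing function so that calling it also records what went in, what parameters were used, and what came out. The sketch below uses a Python decorator and assumes a convention in which positional arguments are data and keyword arguments are parameters. The receipt fields, the in-memory RECEIPTS list, and the threshold_change function are hypothetical.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

RECEIPTS = []  # in a real system this would be durable storage, not a list


def digest(obj):
    """Return a SHA-256 hex digest of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def with_receipt(func):
    """Wrap a processing step so that every call appends a structured receipt."""
    @functools.wraps(func)
    def wrapper(*data, **params):
        result = func(*data, **params)
        RECEIPTS.append({
            "operation": func.__name__,
            "input_digests": [digest(d) for d in data],
            "parameters": dict(params),
            "output_digest": digest(result),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@with_receipt
def threshold_change(change_db, *, threshold_db):
    """Flag values whose backscatter change is at or below the threshold."""
    return [value <= threshold_db for value in change_db]


flags = threshold_change([-5.0, -1.0, -3.5], threshold_db=-3.0)
```

The analyst calls threshold_change exactly as before. The receipt is written whether or not anyone remembers to ask for it.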

The Trust Boundary

There is a deeper issue with multi-sensor fusion that purely technical provenance cannot solve: the trust boundary.

When you process a single dataset from a single source, the provenance chain exists within one institutional context. ESA processes Sentinel data. You download it. You trust ESA's pipeline because ESA is a known entity with published methodologies and a reputation to protect.

When you fuse data from multiple sources, you are implicitly trusting multiple institutions. ESA for the Sentinel data. The DLR or another agency for the DEM. A commercial provider for the land cover classification. A government weather service for the climate data. Each of these has different standards, different levels of transparency, and different incentives.

The provenance of a fused product is only as trustworthy as the least trustworthy input. If one of your four sources has poor provenance — if you cannot verify how its data was processed — then your fused product inherits that uncertainty regardless of how well you documented your own fusion operation.

This is why the movement toward cryptographic provenance — verifiable proofs of processing rather than metadata claims — is particularly important for fusion workflows. When data crosses institutional boundaries, the trust model changes. You are no longer trusting a colleague down the hall. You are trusting an organisation you may never have interacted with, whose processing pipeline you cannot inspect. Cryptographic attestation provides a mechanism for verification that does not depend on institutional trust.
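The simplest building block for this is hash linking, sketched below. Each provenance record is sealed with a digest that covers both its own content and the seals of its parents, so a change anywhere upstream changes the seal of the final product. This gives tamper evidence only. It says nothing about who produced a record, which would need digital signatures from each institution and is omitted here. The record contents and version strings are hypothetical.

```python
import hashlib
import json


def seal(record, parent_seals=()):
    """Return a SHA-256 digest covering a record and the seals of its parents."""
    body = {"record": record, "parents": sorted(parent_seals)}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def flood_extent_seal(dem_version):
    """Seal the whole flood-example graph for a given (hypothetical) DEM version."""
    dem = seal({"product": "dem_glo30", "version": dem_version})
    s1_post = seal({"product": "s1_post_rtc", "step": "terrain_correction"}, [dem])
    s1_pre = seal({"product": "s1_pre_rtc", "step": "terrain_correction"}, [dem])
    land_cover = seal({"product": "land_cover", "source": "sentinel-2"})
    change = seal({"product": "change_map"}, [s1_post, s1_pre])
    return seal({"product": "flood_extent"}, [change, dem, land_cover])


original = flood_extent_seal("version-a")
recomputed = flood_extent_seal("version-a")
swapped = flood_extent_seal("version-b")  # a different DEM was silently substituted
```

Here original and recomputed match, while swapped does not. A recipient who recomputes the seals from the delivered records can detect the substitution without trusting anyone's description of the pipeline.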

What a Solution Looks Like

A provenance-complete fusion system would do several things that most current systems do not.

It would represent provenance as a graph, not a chain. Every input, every intermediate product, every operation would be a node, and the relationships between them would be explicit and machine-readable.

It would capture the fusion operation with the same rigour as the input processing. The algorithm, its parameters, its conflict resolution logic, its resampling decisions — all of it recorded, not summarised.

It would make temporal assumptions explicit. When inputs from different times are treated as contemporaneous, that assumption would be part of the provenance record, along with whatever validation was performed to justify it.

It would track resolution handling. Every resampling decision — what was upsampled, what was downsampled, by what method — would be documented as a transformation with its own provenance entry.

It would propagate uncertainty. Each input carries uncertainty from its own processing history. The fusion operation introduces additional uncertainty. The output's provenance should include not just the lineage of values but the lineage of confidence.

And it would be automatic. If generating provenance requires manual effort from the analyst, it will not happen. It must be a byproduct of processing, generated by the system, not by the person using it.
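Of the requirements above, uncertainty propagation is the easiest to illustrate with arithmetic. The sketch below covers only the simplest case: a continuous output formed as a weighted linear combination of inputs whose errors are assumed independent, so the variances add. Categorical products such as flood masks and correlated inputs need different treatment. The input names and numbers are made up.

```python
import math


def combined_sigma(inputs, fusion_sigma=0.0):
    """Standard deviation of a weighted sum, assuming independent input errors.

    inputs maps a name to (weight, sigma); fusion_sigma is extra uncertainty
    attributed to the fusion step itself.
    """
    variance = sum((w * s) ** 2 for w, s in inputs.values()) + fusion_sigma ** 2
    return math.sqrt(variance)


def variance_budget(inputs, fusion_sigma=0.0):
    """Return each contributor's share of the output variance."""
    parts = {name: (w * s) ** 2 for name, (w, s) in inputs.items()}
    parts["fusion_step"] = fusion_sigma ** 2
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


# Made-up weights and standard deviations, in the same units as the output.
example_inputs = {"sar_derived": (1.0, 3.0), "dem_derived": (1.0, 4.0)}

sigma = combined_sigma(example_inputs)
budget = variance_budget(example_inputs)
```

With these numbers sigma is 5.0, and the budget attributes 36 percent of the variance to the first input and 64 percent to the second. Storing the budget alongside the value is one way to record the lineage of confidence as well as the lineage of values.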

None of these requirements are technically impossible. Some of them are being implemented in various research systems and experimental platforms. The challenge is making them standard — making comprehensive fusion provenance the default rather than the exception.