Data Provenance and Verification in Streams

When consuming data from any source, especially in a decentralized environment, the most critical question is: "Can I trust this data?"

This question is not just about the data's content, but its origin. How do you know that data claiming to be from a trusted oracle, a specific device, or another user actually came from them and not from an imposter?

This is the challenge of Data Provenance.

In Somnia Data Streams, provenance is not an optional feature or a "best practice". It is a fundamental, cryptographic guarantee built into the core smart contract. This article explains how Streams ensures authenticity via publisher signatures and how you can verify data origin.

The Cryptographic Guarantee: msg.sender as Provenance

The trust layer of Somnia Streams is simple. It does not rely on off-chain signature checking or on self-reported data fields like senderName. Instead, it relies on a basic EVM primitive: msg.sender, the address of the account that called the contract.

All data published to Streams is stored in the core Streams smart contract. The data storage mapping has a specific structure:

Conceptual Contract Storage

// mapping: schemaId => publisherAddress => dataId => data
mapping(bytes32 => mapping(address => mapping(bytes32 => bytes))) public dsstore;

When a publisher calls sdk.streams.set(...) or sdk.streams.setAndEmitEvents(...), their wallet signs a transaction. The Streams smart contract receives this transaction and reads the caller's address from msg.sender, which for a direct call from a wallet is the address that signed the transaction.

The contract then stores the data under msg.sender's address within the schema's mapping.

This is the cryptographic guarantee.

0xPublisher_A cannot send a transaction that writes data into the slot for 0xPublisher_B, because msg.sender is set by the EVM and not by the caller. The data is automatically and immutably tied to the address of the account that signed and paid the gas for the publishing transaction.

  • An attacker cannot write data as if it came from a trusted oracle.

  • A user cannot send a chat message pretending to be another user.

  • Data integrity is linked directly to wallet security.
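The storage rule described above can be illustrated with a small in-memory model. This is a sketch for intuition only, not the Streams contract: StreamsModel, its set and get methods, and the sample addresses and values are all hypothetical. The point it shows is that the writer's address is supplied as the storage key, so a write can only ever land in the writer's own slot.

```typescript
// Toy in-memory model of: schemaId => publisherAddress => dataId => data.
// NOT the Streams contract. `StreamsModel` and its methods are hypothetical.
type Address = string;

class StreamsModel {
  private store = new Map<string, Map<Address, Map<string, string>>>();

  // `sender` plays the role of msg.sender. On-chain, the EVM supplies it,
  // so a caller has no parameter for choosing another publisher's slot.
  set(sender: Address, schemaId: string, dataId: string, data: string): void {
    const publisher = sender.toLowerCase();
    let bySchema = this.store.get(schemaId);
    if (!bySchema) {
      bySchema = new Map<Address, Map<string, string>>();
      this.store.set(schemaId, bySchema);
    }
    let byPublisher = bySchema.get(publisher);
    if (!byPublisher) {
      byPublisher = new Map<string, string>();
      bySchema.set(publisher, byPublisher);
    }
    byPublisher.set(dataId, data);
  }

  // Reads always name the publisher whose slot is wanted.
  get(schemaId: string, publisher: Address, dataId: string): string | undefined {
    return this.store.get(schemaId)?.get(publisher.toLowerCase())?.get(dataId);
  }
}

const model = new StreamsModel();
model.set("0xPublisher_A", "oraclePrice", "ETH-USD", "3000");
// Publisher B writes under the same schema and dataId, but into its own slot.
model.set("0xPublisher_B", "oraclePrice", "ETH-USD", "1");
```

After both writes, reading the slot for 0xPublisher_A still yields the value that 0xPublisher_A wrote, regardless of what 0xPublisher_B published under the same schema and key.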

Verification Is Implicit in the Read Operation

Because the publisher address is a fundamental key in the storage mapping, you don't need to perform complex "verification" steps. Verification is implicit in the read operation.

When you use the SDK to read data, you must specify which publisher you are interested in:

  • sdk.streams.getByKey(schemaId, publisher, key)

  • sdk.streams.getAllPublisherDataForSchema(schemaId, publisher)

When you call getAllPublisherDataForSchema(schemaId, '0xTRUSTED_ORACLE_ADDRESS'), you are not filtering data. You are asking the smart contract to retrieve data from the specific storage slot that only 0xTRUSTED_ORACLE_ADDRESS could have written to.

If an imposter (0xIMPOSTER_ADDRESS) publishes data using the same schemaId, their data is stored in a completely different location (dsstore[schemaId]['0xIMPOSTER_ADDRESS']). It will never be returned when you query for the trusted address.
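To make the difference between a keyed lookup and a filter concrete, here is a minimal sketch that uses a local Map as a stand-in for on-chain storage. The dsstore variable, the readPublisherSlot helper, and the sample records are assumptions for illustration; in a real application the lookup would be the SDK call named above.

```typescript
// Hypothetical stand-in for on-chain storage: schemaId -> publisher -> records.
const dsstore = new Map<string, Map<string, string[]>>([
  [
    "oraclePrice",
    new Map<string, string[]>([
      ["0xtrusted_oracle_address", ["price=3000"]],
      ["0ximposter_address", ["price=1"]],
    ]),
  ],
]);

// Local helper mirroring the shape of a per-publisher read.
// The publisher is part of the lookup key, not a filter applied afterwards.
function readPublisherSlot(schemaId: string, publisher: string): string[] {
  return dsstore.get(schemaId)?.get(publisher.toLowerCase()) ?? [];
}

// Only the trusted oracle's slot is read; the imposter's slot is never touched.
const trusted = readPublisherSlot("oraclePrice", "0xTRUSTED_ORACLE_ADDRESS");

// An address that never published has an empty slot.
const neverPublished = readPublisherSlot("oraclePrice", "0xNEVER_PUBLISHED");
```

The imposter's record exists under the same schema, but because the read names the trusted address as a key, that record is never a candidate for the result.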

Deliverable: Building a Verification Script

Let's build a utility to prove this concept.

Scenario: We have a shared oraclePrice schema. Two different, trusted oracles (0xOracle_A and 0xOracle_B) publish prices to it. We will build a script that verifies the origin of data and proves that an imposter cannot pollute their feeds.

Project Setup

We will use the same project setup as the "Multi-Publisher Aggregator" tutorial. You will need a .env file with at least one private key to act as a publisher, and we will simulate the other addresses.

src/lib/clients.ts (No changes needed from the previous tutorial. We just need publicClient.)

src/lib/schema.ts

The Verification Script

This script will not publish data. We will assume our two trusted oracles (PUBLISHER_1_PK and PUBLISHER_2_PK from the previous tutorial) have already published data using the oraclePriceSchema.

Our script will:

  1. Define a list of TRUSTED_ORACLES.

  2. Define an IMPOSTER_ORACLE (a random address that has not published).

  3. Create a verifyPublisher function that fetches data only for a specific publisher address.

  4. Run verification for all addresses and show that data is only returned for the correct publishers.
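The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the tutorial's own script: the StreamsReader interface is an assumed, simplified shape of the SDK read call (its return type in particular is a guess, so check the SDK typings), and verifyAll, stubReader, and the placeholder addresses are hypothetical. The reader is injected so the logic can be dry-run without a network connection.

```typescript
// Assumed minimal shape of the SDK read call named in this article.
// The return type is a guess; check the SDK's typings before relying on it.
interface StreamsReader {
  getAllPublisherDataForSchema(
    schemaId: string,
    publisher: string
  ): Promise<unknown[] | null>;
}

interface VerificationResult {
  publisher: string;
  recordCount: number;
  hasData: boolean;
}

// Step 3: fetch data for exactly one publisher address and summarize it.
async function verifyPublisher(
  reader: StreamsReader,
  schemaId: string,
  publisher: string
): Promise<VerificationResult> {
  const records =
    (await reader.getAllPublisherDataForSchema(schemaId, publisher)) ?? [];
  return { publisher, recordCount: records.length, hasData: records.length > 0 };
}

// Step 4: run the check for every trusted oracle plus the imposter.
async function verifyAll(
  reader: StreamsReader,
  schemaId: string,
  trustedOracles: string[],
  imposter: string
): Promise<VerificationResult[]> {
  const addresses = [...trustedOracles, imposter];
  return Promise.all(addresses.map((a) => verifyPublisher(reader, schemaId, a)));
}

// Hypothetical stub for a dry run: only the two oracles have published.
const stubReader: StreamsReader = {
  async getAllPublisherDataForSchema(_schemaId, publisher) {
    const published = new Map<string, unknown[]>([
      ["0xOracle_A", [{ price: 3000 }]],
      ["0xOracle_B", [{ price: 3001 }, { price: 3002 }]],
    ]);
    return published.get(publisher) ?? null;
  },
};
```

In a real run, an object wrapping sdk.streams would take the place of stubReader, and the trusted list would hold the addresses derived from PUBLISHER_1_PK and PUBLISHER_2_PK. The imposter address is expected to come back with hasData set to false.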

src/scripts/verifyOrigin.ts

Expected Output

To run this, first publish some data (using the script from the previous tutorial, but adapted for oraclePriceSchema) from both PUBLISHER_1_PK and PUBLISHER_2_PK. Then, run the verification script.

The output should list the published records for each trusted oracle and show no records for the imposter address.

Conclusion: Key Takeaways

  • Provenance is Built-In: Data provenance in Somnia Streams is not an optional feature; it is a core cryptographic guarantee of the Streams smart contract, enforced by msg.sender.

  • Verification is Implicit: You verify data origin every time you perform a read operation with getAllPublisherDataForSchema or getByKey. The publisher address acts as the ultimate verification key.

  • Trust Layer: This architecture creates a robust trust layer. As long as a publisher's private key is secure, your application logic can rely on any data returned for that publisher's address having been signed and submitted by that publisher's wallet.
