Data Provenance and Verification in Streams

When consuming data from any source, especially in a decentralized environment, the most critical question is: "Can I trust this data?"

This question is not just about the data's content, but its origin. How do you know that data claiming to be from a trusted oracle, a specific device, or another user actually came from them and not from an imposter?

This is the challenge of Data Provenance.

In Somnia Data Streams, provenance is not an optional feature or a "best practice". It is a fundamental, cryptographic guarantee built into the core smart contract. This article explains how Streams ensures authenticity via publisher signatures and how you can verify data origin.

The Cryptographic Guarantee: `msg.sender` as Provenance

The trust layer of Somnia Streams is elegantly simple. It does not rely on complex off-chain signature checking or data fields like senderName. Instead, it leverages the most basic and secure primitive of the EVM: msg.sender.

All data published to Streams is stored in the core Streams smart contract. The data storage mapping has a specific structure:

Conceptual Contract Storage

// mapping: schemaId => publisherAddress => dataId => data
mapping(bytes32 => mapping(address => mapping(bytes32 => bytes))) public dsstore;

When a publisher calls sdk.streams.set(...) or sdk.streams.setAndEmitEvents(...), their wallet signs a transaction. The Streams smart contract receives this transaction and identifies the signer's address via the msg.sender variable.

The contract then stores the data at the msg.sender's address within the schema's mapping.

This is the cryptographic guarantee.

It is impossible for 0xPublisher_A to send a transaction that writes data into the slot for 0xPublisher_B. They cannot fake their msg.sender. The data is automatically and immutably tied to the address of the account that paid the gas to publish it.

An attacker cannot write data as if it came from a trusted oracle.
A user cannot send a chat message pretending to be another user.
Data integrity is linked directly to wallet security.

Verification Is Implicit in the Read Operation

Because the publisher address is a fundamental key in the storage mapping, you don't need to perform complex "verification" steps. Verification is implicit in the read operation.

When you use the SDK to read data, you must specify which publisher you are interested in:

sdk.streams.getByKey(schemaId, publisher, key)
sdk.streams.getAllPublisherDataForSchema(schemaId, publisher)

When you call getAllPublisherDataForSchema(schemaId, '0xTRUSTED_ORACLE_ADDRESS'), you are not filtering data. You are asking the smart contract to retrieve data from the specific storage slot that only 0xTRUSTED_ORACLE_ADDRESS could have written to.

If an imposter (0xIMPOSTER_ADDRESS) publishes data using the same schemaId, their data is stored in a completely different location (dsstore[schemaId]['0xIMPOSTER_ADDRESS']). It will never be returned when you query for the trusted address.

Deliverable: Building a Verification Script

Let's build a utility to prove this concept.

Scenario: We have a shared oraclePrice schema. Two different, trusted oracles (0xOracle_A and 0xOracle_B) publish prices to it. We will build a script that verifies the origin of data and proves that an imposter cannot pollute their feeds.

Project Setup

We will use the same project setup as the "Multi-Publisher Aggregator" tutorial. You will need a .env file with at least one private key to act as a publisher, and we will simulate the other addresses.

src/lib/clients.ts (No changes needed from the previous tutorial. We just need publicClient.)

src/lib/schema.ts

export const oraclePriceSchema = 'uint256 price, uint64 timestamp'

The Verification Script

This script will not publish data. We will assume our two trusted oracles (PUBLISHER_1_PK and PUBLISHER_2_PK from the previous tutorial) have already published data using the oraclePriceSchema.

Our script will:

Define a list of TRUSTED_ORACLES.
Define an IMPOSTER_ORACLE (a random address that has not published).
Create a verifyPublisher function that fetches data only for a specific publisher address.
Run verification for all addresses and show that data is only returned for the correct publishers.

src/scripts/verifyOrigin.ts

import 'dotenv/config'
import { SDK, SchemaDecodedItem } from '@somnia-chain/streams'
import { publicClient } from '../lib/clients' // Assuming you have clients.ts from previous tutorial
import { oraclePriceSchema } from '../lib/schema'
import { Address, createWalletClient, http } from 'viem'
import { privateKeyToAccount } from 'viem/accounts'

// --- Setup: Define our trusted and untrusted addresses ---

function getEnv(key: string): string {
  const value = process.env[key]
  if (!value) throw new Error(`Missing environment variable: ${key}`)
  return value
}

// These are the addresses we trust for this schema.
// We get them from our .env file for this example.
const TRUSTED_ORACLES: Address[] = [
  privateKeyToAccount(getEnv('PUBLISHER_1_PK') as `0x${string}`).address,
  privateKeyToAccount(getEnv('PUBLISHER_2_PK') as `0x${string}`).address,
]

// This is a random, untrusted address.
const IMPOSTER_ORACLE: Address = '0x1234567890123456789012345678901234567890'

// --- Helper Functions ---

// Helper to decode the oracle data
function decodePriceRecord(row: SchemaDecodedItem[]): { price: bigint, timestamp: number } {
  const val = (field: any) => field?.value?.value ?? field?.value ?? ''
  return {
    price: BigInt(val(row[0])),
    timestamp: Number(val(r[1])),
  }
}

/**
 * Verification Utility
 * Fetches data for a *single* publisher to verify its origin.
 */
async function verifyPublisher(sdk: SDK, schemaId: `0x${string}`, publisherAddress: Address) {
  console.log(`\n--- Verifying Publisher: ${publisherAddress} ---`)
  
  try {
    const data = await sdk.streams.getAllPublisherDataForSchema(schemaId, publisherAddress)
    
    if (!data || data.length === 0) {
      console.log('[VERIFIED] No data found for this publisher.')
      return
    }

    const records = (data as SchemaDecodedItem[][]).map(decodePriceRecord)
    console.log(`[VERIFIED] Found ${records.length} record(s) cryptographically signed by this publisher:`)
    
    records.forEach(record => {
      console.log(`  - Price: ${record.price}, Time: ${new Date(record.timestamp).toISOString()}`)
    })

  } catch (error: any) {
    console.error(`Error during verification: ${error.message}`)
  }
}

// --- Main Execution ---

async function main() {
  const sdk = new SDK({ public: publicClient })
  
  const schemaId = await sdk.streams.computeSchemaId(oraclePriceSchema)
  if (!schemaId) throw new Error('Could not compute schemaId')

  console.log('Starting Data Provenance Verification...')
  console.log(`Schema: oraclePriceSchema (${schemaId})`)

  // 1. Verify our trusted oracles
  for (const oracleAddress of TRUSTED_ORACLES) {
    await verifyPublisher(sdk, schemaId, oracleAddress)
  }

  // 2. Verify the imposter
  // This will securely return NO data, even if the imposter
  // published data to the same schemaId under their *own* address.
  await verifyPublisher(sdk, schemaId, IMPOSTER_ORACLE)
}

main().catch((e) => {
  console.error(e)
  process.exit(1)
})

Expected Output

To run this, first publish some data (using the script from the previous tutorial, but adapted for oraclePriceSchema) from both PUBLISHER_1_PK and PUBLISHER_2_PK. Then, run the verification script.

# Add to package.json
"verify": "ts-node src/scripts/verifyOrigin.ts"

# Run it
npm run verify

You will see an output similar to this:

Starting Data Provenance Verification...
Schema: oraclePriceSchema (0x...)

--- Verifying Publisher: 0xPublisher1Address... ---
[VERIFIED] Found 2 record(s) cryptographically signed by this publisher:
  - Price: 3200, Time: 2025-10-31T12:30:00.000Z
  - Price: 3201, Time: 2025-10-31T12:31:00.000Z

--- Verifying Publisher: 0xPublisher2Address... ---
[VERIFIED] Found 1 record(s) cryptographically signed by this publisher:
  - Price: 3199, Time: 2025-10-31T12:30:30.000Z

--- Verifying Publisher: 0x1234567890123456789012345678901234567890 ---
[VERIFIED] No data found for this publisher.

Conclusion: Key Takeaways

Provenance is Built-In: Data provenance in Somnia Streams is not an optional feature; it is a core cryptographic guarantee of the Streams smart contract, enforced by msg.sender.
Verification is Implicit: You verify data origin every time you perform a read operation with getAllPublisherDataForSchema or getByKey. The publisher address acts as the ultimate verification key.
Trust Layer: This architecture creates a robust trust layer. Your application logic can be certain that any data returned for a specific publisher was, without question, signed and submitted by that publisher's wallet.

PreviousAdvanced: The DApp Publisher Proxy Pattern NextTutorial: Integrating Chainlink Oracles with Streams

Last updated 4 months ago

Good afternoon

hashtagThe Cryptographic Guarantee: msg.sender as Provenance

hashtagVerification Is Implicit in the Read Operation

hashtagDeliverable: Building a Verification Script

hashtagProject Setup

hashtagThe Verification Script

hashtagExpected Output

hashtagConclusion: Key Takeaways