
Claim Check Pattern

The claim-check pattern reduces the size, and therefore the cost, of large messages: the producer first stores the payload in an external data store and then sends the consumer only a small reference (the claim check) to that data. The consumer uses the reference to retrieve the payload when it needs it.

```mermaid
%%{init: { "flowchart": { "useMaxWidth": true } } }%%

graph LR

A[[Message with data]] -->|1.| B((Producer))
B -->|2. Store data, save key| C[(Storage)]
D[[Smaller message with key only]]
B -->|3.| D -->|4.| E((Consumer))
C -->|5. Get data with key| E -->|6.| F[[Message with data]]
```
  1. Send the message
  2. Store the message payload in the data store
  3. Enqueue the message’s reference (i.e. the key)
  4. Read the message’s reference
  5. Retrieve the payload from storage using the key
  6. Process the message
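
A minimal, self-contained sketch of steps 1–6 above, using an in-memory dictionary as a stand-in for the external store and a queue as a stand-in for the message bus (both are placeholders for a real object store and broker):

```python
import uuid
from queue import Queue

storage = {}   # stand-in for the external data store
bus = Queue()  # stand-in for the message bus / stream


def produce(payload: bytes) -> None:
    # 2. Store the payload and generate a claim-check key.
    key = str(uuid.uuid4())
    storage[key] = payload
    # 3. Enqueue only the small reference message.
    bus.put({"claim_check": key})


def consume() -> bytes:
    # 4. Read the reference message.
    message = bus.get()
    # 5. Retrieve the payload from storage using the key.
    payload = storage[message["claim_check"]]
    # 6. Process the full message.
    return payload


produce(b"a very large payload " * 10_000)  # 1. Send the message
print(len(consume()))
```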

Claim Check Pattern Advantages

  • Reduces the cost of transferring data through the messaging/streaming layer, because object storage is usually much cheaper than messaging/streaming resources (memory, broker throughput).
  • Helps protect the message bus and client from being overwhelmed or slowed down by large messages.
  • Lets you process the payload asynchronously, which can help with scalability and performance.

Claim Check Pattern Disadvantages

  • If the external store used for the payload fails, the payload cannot be stored or retrieved, so the message effectively cannot be delivered or processed.
  • Requires additional storage space and adds latency for storing and retrieving the payload.

Claim Check Pattern Examples

  • A Kafka producer writes the payload to S3/Azure Blob Storage/GCS and then sends a small notification message containing the object’s key. The consumer receives the message and fetches the payload from object storage using that key (see the first sketch after this list).
  • In Airflow, you sometimes need to pass data between tasks. You can do this with XComs, but there is a limit on the size of the value you can pass. For large payloads you can apply the claim-check pattern, for example via a custom XCom backend (see the second sketch after this list).
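
A minimal sketch of the Kafka example, assuming the kafka-python and boto3 clients, a broker at localhost:9092, and placeholder bucket/topic names (`my-claim-check-bucket`, `claim-checks`):

```python
import json
import uuid

import boto3
from kafka import KafkaConsumer, KafkaProducer

BUCKET = "my-claim-check-bucket"  # placeholder bucket name
TOPIC = "claim-checks"            # placeholder topic name
s3 = boto3.client("s3")


def send_large_payload(payload: bytes) -> None:
    # Store the payload in object storage, then publish only its key.
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(TOPIC, json.dumps({"s3_key": key}).encode("utf-8"))
    producer.flush()


def process_messages() -> None:
    # Read the reference message and fetch the full payload from object storage.
    consumer = KafkaConsumer(TOPIC, bootstrap_servers="localhost:9092")
    for message in consumer:
        key = json.loads(message.value)["s3_key"]
        payload = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        ...  # process the full payload here
```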
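
For the Airflow example, one way to apply the pattern is a custom XCom backend. The sketch below assumes Airflow 2.x, boto3, and a placeholder bucket name; it is a rough illustration rather than a drop-in implementation:

```python
import json
import uuid

import boto3
from airflow.models.xcom import BaseXCom


class S3XComBackend(BaseXCom):
    """Stores XCom payloads in S3 and passes only the object key between tasks."""

    BUCKET = "my-xcom-bucket"  # placeholder bucket name

    @staticmethod
    def serialize_value(value, **kwargs):
        # Store the payload in S3 and hand Airflow only the claim-check key.
        key = f"xcom/{uuid.uuid4()}.json"
        boto3.client("s3").put_object(
            Bucket=S3XComBackend.BUCKET,
            Key=key,
            Body=json.dumps(value).encode("utf-8"),
        )
        return BaseXCom.serialize_value(key)

    @staticmethod
    def deserialize_value(result):
        # Resolve the key back to the full payload when a task pulls the XCom.
        key = BaseXCom.deserialize_value(result)
        obj = boto3.client("s3").get_object(Bucket=S3XComBackend.BUCKET, Key=key)
        return json.loads(obj["Body"].read())
```

Pointing Airflow’s xcom_backend setting (for example via the AIRFLOW__CORE__XCOM_BACKEND environment variable) at such a class routes every XCom through the claim-check path.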
