
The Log

The Log is a data structure that acts as a unifying abstraction for real-time data.
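As a rough illustration of the data structure itself, here is a minimal in-memory sketch: an append-only sequence in which every record gets a monotonically increasing offset. The `Log` class and its `append` / `read` methods are hypothetical names chosen for this example; they are not the API of Kafka, Nats Streaming or any other product.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Append-only sequence of records addressed by offset (illustrative only)."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records records starting at offset, in append order."""
        return self._records[offset:offset + max_records]


log = Log()
first = log.append({"type": "user_created", "id": 1})
second = log.append({"type": "user_renamed", "id": 1, "name": "Ada"})

# Records are never updated in place; readers only move forward by offset.
tail = log.read(second)
```

Real log stores add persistence, partitioning and replication on top of this idea, but the contract (append at the end, read in order from an offset) stays the same.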

The most important background article about The Log can be found on the LinkedIn Blog. It is a great introduction that explains the rationale behind building Kafka, which is the de facto standard for distributed logs.

A much more recent article series is Building a Distributed Log from Scratch (Part1, Part2, Part3, Part4, Part5), which gives great background info on Kafka and Nats Streaming.

We are assessing this pattern as TRY instead of ADOPT (for now): I see it as highly relevant in many architectures, but we have not yet applied it in full depth in many real-world projects.

Benefits

  • the log can be used as a unifying structure in a distributed system, decoupling data producers and consumers
  • captures state changes (through events, which are the entries in the log), which contain more information than just storing the current state.
  • also related to Event Sourcing / CQRS, which apply very similar principles in the application architecture domain.
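To make the first two benefits concrete, the sketch below replays one list of events into two independent views. The event shapes and the function names `balances` and `withdrawal_count` are invented for this example; the point is only that consumers derive their own state from the same log, and that the log keeps information the current state alone would lose.

```python
from typing import Dict, List

# The log: every state change is recorded as an event, in order.
events: List[dict] = [
    {"type": "deposited", "account": "a1", "amount": 100},
    {"type": "withdrawn", "account": "a1", "amount": 30},
    {"type": "deposited", "account": "a1", "amount": 5},
]


def balances(log: List[dict]) -> Dict[str, int]:
    """Consumer 1: fold the events into the current balance per account."""
    state: Dict[str, int] = {}
    for event in log:
        sign = 1 if event["type"] == "deposited" else -1
        state[event["account"]] = state.get(event["account"], 0) + sign * event["amount"]
    return state


def withdrawal_count(log: List[dict]) -> Dict[str, int]:
    """Consumer 2: a view that cannot be computed from the balances alone."""
    counts: Dict[str, int] = {}
    for event in log:
        if event["type"] == "withdrawn":
            counts[event["account"]] = counts.get(event["account"], 0) + 1
    return counts


current = balances(events)          # state after all events
earlier = balances(events[:2])      # state as of an earlier point in the log
withdrawals = withdrawal_count(events)
```

Neither consumer knows about the other or about the producer; a new view can be added later by replaying the log from the start.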

Drawbacks

  • for legacy (e.g. database) systems, it can sometimes be hard to extract the event / log information. Projects which can help there:
    • Debezium: Reads the Postgres WAL or the MySQL transaction log and publishes changes to Kafka.
    • Lapidus does the same as Debezium, but can also publish to Nats Streaming.
    • The Postgres Replication Slots feature together with wal2json or jsoncdc can be used (low-level).
    • Transicator seems to do something similar, but also contains a complete HTTP server.
  • To ensure the log contains all information, you should use the log as the primary communication pattern between systems. The utility of the log shrinks with every additional communication side channel that is created.
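For the low-level route, the extraction step mostly consists of turning decoded database changes into log events. The sketch below assumes a payload shaped like the JSON that wal2json emits (a `change` list with `kind`, `table`, `columnnames` / `columnvalues`, and `oldkeys` for deletes). The sample is hand-written, not captured from a real database, and the exact fields depend on the wal2json version and options, so treat the shape as an assumption to verify. The `to_events` function and the event naming scheme are hypothetical.

```python
import json
from typing import List

# Hand-written sample modelled on wal2json-style output (assumed shape).
SAMPLE = """
{"change": [
  {"kind": "insert", "schema": "public", "table": "users",
   "columnnames": ["id", "name"], "columnvalues": [1, "Ada"]},
  {"kind": "delete", "schema": "public", "table": "users",
   "oldkeys": {"keynames": ["id"], "keyvalues": [1]}}
]}
"""


def to_events(payload: str) -> List[dict]:
    """Map each decoded row change to one event suitable for appending to a log."""
    events = []
    for change in json.loads(payload)["change"]:
        if "columnnames" in change:
            # Inserts and updates carry the new row as parallel name/value lists.
            row = dict(zip(change["columnnames"], change["columnvalues"]))
        else:
            # Deletes only carry the key of the removed row.
            old = change.get("oldkeys", {})
            row = dict(zip(old.get("keynames", []), old.get("keyvalues", [])))
        events.append({"type": f'{change["table"]}.{change["kind"]}', "data": row})
    return events


extracted = to_events(SAMPLE)
```

Tools such as Debezium do this mapping (plus snapshotting, schema handling and delivery) for you, which is usually preferable to maintaining it by hand.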

Log Storage Solutions

  • Apache Kafka: The "grandfather" of distributed logs, the de facto standard in many bigger deployments. Written in Java. Has a fat client with more logic, as it has to remember the position in the stream. Needs ZooKeeper, which is why I personally would refrain from using it in smaller deployments.
  • Nats Streaming: Lightweight log storage written in Go. Not only can the client remember the stream position; the server can also remember each client's position in the stream, which makes the clients thinner. HA/clustering support is coming soon! Currently my recommendation for our kinds of projects.
  • Redis Streams (soon, starting with Redis 5.0): If you use Redis already, very easy to get started with. Drawback: The full stream has to fit in memory, so for infinite logs this is not an option. May be an option for limited-size logs (with a retention time of e.g. a few days).
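The difference between a "fat" and a "thin" client comes down to who remembers the read position. The following sketch contrasts the two models with hypothetical classes; it does not use or imitate the Kafka or Nats Streaming client APIs, and it advances the position on read, whereas real systems typically advance it only after the consumer acknowledges processing.

```python
from typing import Dict, List


class ClientTrackedLog:
    """The server stores only records; each client keeps its own offset."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def append(self, record: str) -> None:
        self.records.append(record)

    def read_from(self, offset: int) -> List[str]:
        return self.records[offset:]


class ServerTrackedLog(ClientTrackedLog):
    """The server additionally remembers one position per named client."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: Dict[str, int] = {}

    def poll(self, client_id: str) -> List[str]:
        start = self.positions.get(client_id, 0)
        batch = self.records[start:]
        self.positions[client_id] = len(self.records)  # advance for next poll
        return batch


# Fat client: the offset bookkeeping lives in client code.
fat = ClientTrackedLog()
fat.append("a")
fat.append("b")
my_offset = 0
batch = fat.read_from(my_offset)
my_offset += len(batch)
fat.append("c")
fat_next = fat.read_from(my_offset)

# Thin client: it only needs a stable name; the server does the bookkeeping.
thin = ServerTrackedLog()
thin.append("a")
thin.append("b")
thin_first = thin.poll("billing")
thin.append("c")
thin_next = thin.poll("billing")
thin_other = thin.poll("search")  # a new client starts from the beginning
```

With client-side tracking, the client must persist `my_offset` itself to survive restarts; with server-side tracking, a restarted client can simply reconnect under the same name.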

For broader context and related tools, see queues.io, which links to many projects in this space (Kafka and Nats are on that list as well).