The Log is a data structure which acts as a unifying abstraction for real-time data.
The most important background article about The Log can be found on the LinkedIn Blog - a great introduction which explains the rationale behind building Kafka, the de facto standard for distributed logs!
We are assessing this pattern as TRY instead of ADOPT (for now): I see it as highly relevant in many architectures, but we have not yet applied it in full depth in many real-world projects.
- The log can be used as a unifying structure in a distributed system, decoupling data producers from consumers.
- It captures state changes (through events, which are the entries in the log); these contain more information than just the current state.
- It is also related to Event Sourcing / CQRS, which apply very similar principles to the application architecture domain.
- For legacy systems (e.g. databases), it can sometimes be hard to extract the event / log information. Projects which can help there:
  - Debezium: Reads the Postgres WAL or the MySQL transaction log (binlog) and publishes changes to Kafka.
  - Lapidus does the same as Debezium, but can also publish to NATS Streaming.
  - The Postgres Replication Slots feature together with wal2json or jsoncdc can be used (low-level).
  - Transicator appears to do something similar, but also contains a complete HTTP server.
- To ensure the log contains all information, you should use the log as the primary communication pattern between systems. The utility of the log shrinks with every additional communication side channel that is created.
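
The points above can be illustrated with a minimal in-memory sketch. It is not tied to Kafka or any other product; the `Log` class, the event shapes, and the `current_balances` helper are all made up for illustration. It shows a producer appending events, two consumers reading independently from their own offsets, and the current state being derived from the events rather than stored directly.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only sequence of events; entries are never changed or removed."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Store an event and return its offset (its position in the log)."""
        self.entries.append(event)
        return len(self.entries) - 1

    def read(self, offset: int) -> list:
        """Return all events from the given offset onwards."""
        return self.entries[offset:]


def current_balances(events) -> dict:
    """Fold the events into the current state: account -> balance."""
    balances = {}
    for e in events:
        sign = 1 if e["type"] == "deposited" else -1
        balances[e["account"]] = balances.get(e["account"], 0) + sign * e["amount"]
    return balances


# The producer only knows about the log, not about who consumes it.
log = Log()
log.append({"type": "deposited", "account": "a", "amount": 100})
log.append({"type": "withdrawn", "account": "a", "amount": 30})
log.append({"type": "deposited", "account": "b", "amount": 5})

# Consumer 1 replays the log from the start to derive the current state.
balances = current_balances(log.read(0))

# Consumer 2 starts at offset 1 and only cares about what happened, not the totals.
audit = [e["type"] for e in log.read(1)]
```

The events keep the history ("a" deposited 100, then withdrew 30), while a table holding only the current balance of 70 would have lost it.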
Log Storage Solutions
- Apache Kafka: The "grandfather" of distributed logs and the de facto standard in many bigger deployments. Written in Java. Has a fat client with more logic, as the client has to remember its position in the stream. Needs ZooKeeper, which is why I personally would refrain from using it in smaller deployments.
- NATS Streaming: Lightweight log storage written in Go. Not only can the client remember the stream position; the server can also remember the client's position in the stream, which makes the clients thinner. HA/clustering support is coming soon! Currently my recommendation for our kinds of projects.
- Redis Streams (soon, starting with Redis 5.0): If you already use Redis, it is very easy to get started with. Drawback: The full stream has to fit in memory, so this is not an option for infinite logs. It may be an option for limited-size logs (with a retention time of e.g. a few days).
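
The difference between a client that remembers its own stream position and a server that remembers it for the client can be sketched with a toy in-memory model. This is only an illustration of the idea as described above; `LogServer`, `FatClient`, and `ThinClient` are hypothetical names and do not correspond to the Kafka or NATS Streaming client APIs.

```python
class LogServer:
    """Toy log server that supports both ways of tracking positions."""

    def __init__(self):
        self.entries = []
        self.positions = {}  # client id -> next offset to deliver

    def append(self, entry):
        self.entries.append(entry)

    def read_from(self, offset):
        """Client-tracked mode: the caller says where to start."""
        return self.entries[offset:]

    def read_next(self, client_id):
        """Server-tracked mode: look up the client's position and advance it."""
        start = self.positions.get(client_id, 0)
        batch = self.entries[start:]
        self.positions[client_id] = len(self.entries)
        return batch


class FatClient:
    """Remembers its own offset, so it carries more logic and state."""

    def __init__(self, server):
        self.server = server
        self.offset = 0

    def poll(self):
        batch = self.server.read_from(self.offset)
        self.offset += len(batch)
        return batch


class ThinClient:
    """Only knows its id; the server remembers where it left off."""

    def __init__(self, server, client_id):
        self.server = server
        self.client_id = client_id

    def poll(self):
        return self.server.read_next(self.client_id)


server = LogServer()
server.append("e1")
server.append("e2")

fat = FatClient(server)
thin = ThinClient(server, "reporting")
first_fat = fat.poll()
first_thin = thin.poll()

server.append("e3")
second_fat = fat.poll()

# A restarted thin client resumes where the server says it left off ...
second_thin = ThinClient(server, "reporting").poll()
# ... while a restarted fat client that did not persist its offset starts over.
restarted_fat = FatClient(server).poll()
```

In this sketch the fat client has to persist `offset` somewhere itself to survive a restart, whereas the thin client gets that for free from the server.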
For broader context and related tools, see queues.io, which links many projects in this space (Kafka and NATS are on that list as well).