Martin Krasser's Blog: Introduction to Akka Persistence

Akka Persistence is a new module in Akka 2.3. At the time of writing this post, it is available as milestone release (2.3-M2). Akka Persistence adds actor state persistence and at-least-once message delivery semantics to Akka. It is inspired by and the successor of the eventsourced project. They share many high-level concepts but completely differ on API and implementation level.

To persist an actor's state, only changes to that actor's state are written to a journal, not current state directly. These changes are appended as immutable facts to a journal, nothing is ever mutated, which allows for very high transaction rates and efficient replication. Actor state can be recovered by replaying stored changes and projecting them again. This not only allows state recovery after an actor has been restarted by a supervisor but also after JVM or node crashes, for example. State changes are defined in terms of messages an actor receives (or generates).

Persistence of messages also forms the basis for supporting at-least-once message delivery semantics. This requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end (see Message Delivery Guarantees in Akka). Akka Persistence supports that for point-to-point communications. Reliable point-to-point communications are an important part of highly scalable applications (see also Pet Helland's position paper Life beyond Distributed Transactions).

The following gives a high-level overview of the current features in Akka Persistence. Links to more detailed documentation are included.

Processors

Processors are persistent actors. They internally communicate with a journal to persist messages they receive or generate. They may also request message replay from a journal to recover internal state in failure cases. Processors may either persist messages

before an actor's behavior is executed (command sourcing) or
during an actor's behavior is executed (event sourcing)

Command sourcing is comparable to using a write-ahead-log. Messages are persisted before it is known whether they can be successfully processed or not. In failure cases, they can be (logically) removed from the journal so that they won't be replayed during next recovery. During recovery, command sourced processors show the same behavior as during normal operation. They can achieve high throughput rates by dynamically increasing the size of write batches under high load.

Event sourced processors do not persist commands. Instead they allow application code to derive events from a command and atomically persist these events. After persistence, they are applied to current state. During recovery, events are replayed and only the state-changing behavior of an event sourced processor is executed again. Other side effects that have been executed during normal operation are not performed again.

Processors automatically recover themselves. Akka Persistence guarantees that new messages sent to a processor never interleave with replayed messages. New messages are internally buffered until recovery completes, hence, an application may send messages to a processor immediately after creating it.

Snapshots

The recovery time of a processor increases with the number of messages that have been written by that processor. To reduce recovery time, applications may take snapshots of processor state which can be used as starting points for message replay. Usage of snapshots is optional and only needed for optimization.

Channels

Channels are actors that provide at-least-once message delivery semantics between a sending processor and a receiver that acknowledges the receipt of messages on application level. They also ensure that successfully acknowledged messages are not delivered again to receivers during processor recovery (i.e. replay of messages). Applications that want to have reliable message delivery without application-defined sending processors should use persistent channels. A persistent channel is like a normal channel that additionally persists messages before sending them to a receiver.

Journals

Journals (and snapshot stores) are pluggable in Akka Persistence. The default journal plugin is backed by LevelDB which writes messages to the local filesystem. A replicated journal is planned but not yet part of the distribution. Replicated journals allow stateful actors to be migrated in a cluster, for example. For testing purposes, a remotely shared LevelDB journal can be used instead of a replicated journal to experiment with stateful actor migration. Application code doesn't need to change when switching to a replicated journal later.

Updates:

Views have been added after 2.3-M2.
A replicated journal backed by Apache Cassandra is available here.
A complete list of community-contributed plugins is maintained here.

10 comments:

UnknownDecember 16, 2013 at 4:48 PM
Nice Post. Interesting stuff.
UnknownDecember 17, 2013 at 2:05 AM
Very Nice！ Thank Martin

Best Regard
tiGer
UnknownDecember 17, 2013 at 11:54 PM
Great Post!
Manas KarFebruary 10, 2014 at 7:26 PM
Nice post
UnknownMarch 19, 2014 at 6:30 PM
nice
UnknownMarch 24, 2014 at 4:21 PM
Cool stuff!
markApril 3, 2014 at 3:40 AM
during recovery, are the messages/events being replayed in the in the correct order (right sequence)? is this guaranteed by akka persistence?
UnknownJuly 25, 2014 at 9:44 AM
Hello Martin, thanks for the post and for your good job!

Im totally newcomer to akka and im very confused about the concept of "view" in the akka-persistence context.

The documentation does not helps at all since it does not explains the concept.
I understand the view concept from a database perspective but it's hard to me to translate this to the akka framework.

Can you please give me some hints about this?
Amy WongAugust 25, 2014 at 10:43 AM
Hi Martin,

Thanks for the brief and clear introduction to Akka persistence.

I am new to NoSql.

Comparing with events, it seems a snapshot is a large object and less frequently written to a storage. Therefore, is journal better stored in a database while snapshot is better stored in a file storage? However, when I checked the akka community, all the snapshot plugins are also implemented for databases. Is it because database is still preferred over file storage for snapshot storage?

Thank you.

Monday, December 16, 2013

Introduction to Akka Persistence

Processors

Snapshots

Channels

Journals

10 comments: