At the Linux Foundation’s recent End User Summit, I had the pleasure of meeting K.S. Bhaskar from FIS. He recently wrote an article on his blog about Eventual State Consistency vs. Eventual Path Consistency, in which he has some particularly interesting things to say about different kinds of consistency guarantees.

there are applications where detection of colliding updates does not suffice to ensure Consistency (where Consistency is defined as the database always being within the design parameters of the application logic – for example, the “in balance” guarantee that the sums of the assets and liabilities columns of a balance sheet are always equal).

He then gives an example showing an apparent problem with two financial transactions and their associated service charges, across two sites while a service-charge rate change is still “in flight” between them. I originally responded there, but my reply seems to have disappeared. Maybe it got lost due to a conflict with a subsequent update. ;) In any case, I might as well respond here because I think his example highlights an important issue. I don’t think Bhaskar’s example really demonstrates the problem he had described. In the last step he says that

B detects a collision, since it observes that the data that was read by the application logic on A to compute the result of transaction P has changed on B

How could B observe such a thing? Only if it knew either the data that was read on A (i.e. the service-charge rate in effect for the transaction was included as part of the replication request) or the exact replication state on A at the time P was processed there (e.g. by using vector clocks or similar). Either way, it would have enough information to replicate the transaction in a consistent fashion.
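To make the vector-clock option concrete, here is a minimal Python sketch of how B could tell whether the rate change had already reached A when P was processed there. It assumes every replicated event carries a vector clock; the function names, site labels, and clock values are my own illustration, not anything taken from Bhaskar's example or a real replication product.

```python
def happened_before(a, b):
    """True if vector clock a is strictly earlier than vector clock b."""
    sites = set(a) | set(b)
    return (all(a.get(s, 0) <= b.get(s, 0) for s in sites)
            and any(a.get(s, 0) < b.get(s, 0) for s in sites))

# Hypothetical clocks: the rate change originated at B as B's first event.
rate_change = {"A": 0, "B": 1}

# P processed on A before the rate change arrived there.
p_before = {"A": 1, "B": 0}
# P processed on A after the rate change arrived there.
p_after = {"A": 1, "B": 1}

def rate_seen_by(p_clock):
    # A had applied the rate change only if the change happened before P.
    return happened_before(rate_change, p_clock)

print(rate_seen_by(p_before), rate_seen_by(p_after))
```

With that information attached to P, B knows which rate A used and can apply the same one, which is the point: a site that can detect the collision can also resolve it.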

The real problem would be if B didn’t know whether or not the rate change had reached A yet when P was processed there. That would result in B needing to distinguish between two possible states that would have to be handled differently, but with no way to make that distinction. The general rule to avoid these kinds of unresolvable conflicts is: don’t pass around references to values that might be inconsistent across systems. It’s like passing a pointer from one address space to a process in another; you just shouldn’t expect it to work. Either pass around the actual values or do calculations involving those values and replicate the result. For example, consider the following replication requests.

# $var indicates immediate substitution from the original context
# %var indicates a transaction-local variable

# Wrong: sc_rate is passed by reference and interpreted at destination
replicate transaction "transfer #zzz" {
    acct_x -= $amt * (1.0 + sc_rate);
    acct_y += $amt;
}

# Right: sc_rate is interpreted at source and passed by value
replicate transaction "transfer #zzz" {
    %sc_rate = $sc_rate;
    acct_x -= $amt * (1.0 + %sc_rate);
    acct_y += $amt;
}

# Right: service charge is calculated at source
# works, but not good for auditing
amt_with_sc = amt * (1.0 + sc_rate);
replicate transaction "transfer #zzz" {
    acct_x -= $amt_with_sc;
    acct_y += $amt;
}

# Right: service charge as separate transaction
sc = amt * sc_rate;
replicate transaction "transfer #zzz" {
    acct_x -= $amt;
    acct_y += $amt;
}
replicate transaction "service charge for #zzz" {
    acct_x -= $sc;
}
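The difference between the first two forms can be checked with a small simulation. The following Python sketch is only an illustration under assumed numbers: two sites modeled as dictionaries, a rate change from 1% to 2% that has reached B but not A, and starting balances I made up. The helper names are hypothetical.

```python
from decimal import Decimal

def make_site(sc_rate):
    return {"sc_rate": Decimal(sc_rate),
            "acct_x": Decimal("1000"), "acct_y": Decimal("0")}

def transfer_by_reference(site, amt):
    # Wrong form: each site looks up its own sc_rate when applying the request.
    site["acct_x"] -= amt * (1 + site["sc_rate"])
    site["acct_y"] += amt

def transfer_by_value(site, amt, sc_rate):
    # Right form: the rate read at the source travels with the request.
    site["acct_x"] -= amt * (1 + sc_rate)
    site["acct_y"] += amt

amt = Decimal("100")

# By reference: the rate change has reached B but is still in flight to A.
a, b = make_site("0.01"), make_site("0.02")
transfer_by_reference(a, amt)   # processed at source A
transfer_by_reference(b, amt)   # replayed at destination B
by_ref = (a["acct_x"], b["acct_x"])

# By value: same starting state, but A's rate is shipped with the request.
a, b = make_site("0.01"), make_site("0.02")
rate_at_source = a["sc_rate"]
transfer_by_value(a, amt, rate_at_source)
transfer_by_value(b, amt, rate_at_source)
by_val = (a["acct_x"], b["acct_x"])

print("by reference:", by_ref)
print("by value:    ", by_val)
```

In the by-reference run the two sites end up with different balances for acct_x, while in the by-value run they agree, even though B's local rate is still different from A's.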

In an ideal world, the interface and behavior for the replication subsystem would disallow or strongly discourage the wrong form. For example, it could require that any values meant to be interpreted or modified at the destination must be explicitly listed or tagged, and reject anything that abuses “extraneous” variables as in the first form above. (Auto-conversion of the first form into the second is likely to introduce its own kinds of unexpected behavior.) That would force people to use one of the methods that actually works.
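As a rough sketch of what such a check might look like, the Python below rejects any request that mentions a name which is neither bound to a value at the source nor explicitly declared as destination state. Everything here is an assumption for illustration: `check_request`, `UnboundReference`, and the request layout are hypothetical, and the statements are written as Python-compatible strings only so that the standard `ast` module can stand in for a real parser of the notation above.

```python
import ast

class UnboundReference(ValueError):
    pass

def check_request(statements, bound_values, destination_names):
    """Reject a request that mentions any name not bound at the source
    or explicitly declared as destination state."""
    allowed = set(bound_values) | set(destination_names)
    for stmt in statements:
        for node in ast.walk(ast.parse(stmt)):
            if isinstance(node, ast.Name) and node.id not in allowed:
                raise UnboundReference(f"{node.id!r} in {stmt!r}")
    return {"statements": list(statements), "values": dict(bound_values)}

destination = {"acct_x", "acct_y"}
stmts = ["acct_x -= amt * (1.0 + sc_rate)", "acct_y += amt"]

# Accepted: every non-destination name carries a value from the source.
ok = check_request(stmts, {"amt": 100, "sc_rate": 0.01}, destination)

# Rejected: sc_rate would have to be looked up at the destination.
try:
    check_request(stmts, {"amt": 100}, destination)
    rejected = False
except UnboundReference:
    rejected = True

print(rejected)
```

Note that the check refuses the request outright rather than silently capturing the source's current `sc_rate`, in keeping with the caution above about auto-conversion.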