Data Consistency and Transactions

dbzero is built around a transactional model that guarantees serializable consistency. This means all your data modifications are grouped into atomic operations that appear to run sequentially, one after another. This design eliminates race conditions and ensures your application state is always predictable and consistent.

Committing Changes: autocommit and dbzero.commit()

By default, changes to your objects are buffered in memory and then periodically persisted to storage. This autocommit feature makes data visible to other processes (readers) with minimal delay. It is the recommended mechanism for most use cases because it maximizes throughput; when stronger durability guarantees are needed in complex environments, they can be layered on top with higher-level distributed transactions.

For scenarios where your process must ensure data is immediately written to disk, you can explicitly call dbzero.commit(). This is useful for batch operations, where you can add multiple objects in a loop and persist them together with a single commit.

def test_append_list_in_multiple_transactions(db0_fixture):
    # Create a persistent list
    tasks = db0.list()
    db0.commit()
 
    # In 10 iterations, append 5 tasks each, then commit the batch
    for _ in range(10):
        for _ in range(5):
            tasks.append(MemoTask("etl", "processor1"))
        # Persist the 5 new tasks to disk
        db0.commit()
    
    # The list now durably contains all 50 tasks
    assert len(tasks) == 50

When you call dbzero.close(), any pending changes in the current transaction are automatically committed as well.
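As a minimal sketch of this behavior, the snippet below relies on close() to flush the final pending transaction. It assumes the dbzero API shown elsewhere in this document; DB0_DIR, "my-prefix", and MemoTask are placeholder names, not part of the library.

# Sketch: relying on close() to commit pending changes.
# DB0_DIR, "my-prefix", and MemoTask are illustrative assumptions.
import dbzero as db0

db0.init(DB0_DIR)
db0.open("my-prefix")

tasks = db0.list()
tasks.append(MemoTask("etl", "processor1"))

# No explicit db0.commit() here: close() commits the pending
# transaction before releasing the prefix.
db0.close()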

Automatic Commits

To simplify data persistence, dbzero enables autocommit by default. This feature periodically saves data changes at a configurable interval, striking a balance between performance and durability without requiring manual dbzero.commit() calls. If no data has changed since the last commit, the autocommit operation is skipped.

You can observe this behavior in action. The transaction state number, which you can get with dbzero.get_state_num(), will automatically increase after a change is made and the autocommit interval passes.

def test_db0_starts_autocommit_by_default(db0_fixture):
    # Creating an object counts as a data change, so a commit is pending
    object_1 = MemoTestClass(951)
    state_1 = db0.get_state_num()
 
    # Wait for a period longer than the autocommit interval
    time.sleep(0.3) # Configured interval is 250ms
    state_2 = db0.get_state_num()
 
    # The state number increased because autocommit fired
    assert state_2 > state_1

You can easily configure or disable autocommit, either globally during initialization or for a specific data prefix.

# Disable autocommit for a specific prefix
db0.open("my-prefix", autocommit=False)
 
# Or configure it globally during initialization
db0.init(DB0_DIR, config={'autocommit': True, 'autocommit_interval': 1000}) # 1s interval

Multi-Process Synchronization 🔄

dbzero implements a single-writer, multiple-reader concurrency model for each data prefix. When a writer process commits a transaction, its changes are made visible to all reader processes automatically and transparently, typically within a fraction of a second.

Readers don't need to call any special functions to refresh their view of the data. dbzero handles the updates in the background, so your code can simply access object attributes in a loop to detect changes made by another process.

test_auto_refresh.py
# SETUP: A separate writer process changes object_1.value1 from 123 to 124 and commits.
# ...
 
# In the reader process:
dbzero.open(prefix_name, "r")
object_1 = MemoClassX()
assert object_1.value1 == 123 # Initial value
 
# Loop and wait for the change to become visible automatically
max_repeat = 10
while object_1.value1 != 124 and max_repeat > 0:
    time.sleep(0.1)
    max_repeat -= 1
 
# The reader now sees the updated value
assert object_1.value1 == 124

Waiting for Specific Updates

While data appears automatically, sometimes a reader needs to ensure it has processed a specific transaction before proceeding. For this, dbzero provides dbzero.wait() (and its asyncio counterpart, dbzero.async_wait()). These methods efficiently block the reader until the writer reaches a target transaction state, avoiding the need for polling loops.

test_wait_for_updates.py
# Get the reader's current state number
current_num = dbzero.get_state_num(prefix)
 
# A writer process makes and commits 5 transactions in the background...
 
# Block until the state has advanced by 5, with a 1-second timeout.
# This is much more efficient than a polling loop.
assert dbzero.wait(prefix, current_num + 5, 1000)
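In an asyncio application, dbzero.async_wait() serves the same purpose without blocking the event loop. The sketch below assumes async_wait() accepts the same (prefix, target state, timeout in milliseconds) arguments as dbzero.wait() and returns a truthy value on success; the prefix name is illustrative.

# Sketch of the asyncio variant; assumes dbzero.async_wait() mirrors
# the dbzero.wait() signature shown above.
import asyncio
import dbzero

async def wait_for_writer(prefix):
    current_num = dbzero.get_state_num(prefix)
    # Awaiting yields to the event loop instead of blocking the thread.
    ok = await dbzero.async_wait(prefix, current_num + 5, 1000)
    if not ok:
        raise TimeoutError("writer did not advance within 1 second")

asyncio.run(wait_for_writer("my-prefix"))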

Durability and Crash Safety

dbzero is designed for resilience. All committed data is durable and safe, even if your application crashes. If a writer process terminates mid-transaction, before a commit is finalized, the partial changes are automatically rolled back. This ensures the storage is never left in a corrupt or inconsistent state.

test_crash_safety.py
def open_prefix_then_crash():
    dbzero.open("new-prefix-1")
    dbzero.tags(MemoTestClass(123)).add("tag1")
    # Process crashes before changes are committed
    raise Exception("Crash!")
 
# After the writer process crashes...
p = multiprocessing.Process(target=open_prefix_then_crash)
p.start()
p.join()
 
# Another process can safely open the same prefix
dbzero.open("new-prefix-1", "r")
 
# The uncommitted data from the crashed process is gone
assert len(list(dbzero.find("tag1"))) == 0