Indexes 🗂️
Introduction
In dbzero, an index is a specialized object that stores other objects in a key-value format. Its primary purpose is to provide fast, sorted access to your data, acting like an in-memory, ordered dictionary. This allows you to efficiently organize and retrieve objects based on criteria like priority, date, or any other comparable value.
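Indexes are created using the dbzero.index() factory function. The examples on this page refer to the module as db0, which assumes an import such as import dbzero as db0.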
The core idea is to replicate the behavior of database indexes directly on your Python objects, avoiding the need for a separate database system for many use cases.
Unlike in a traditional database, an index can be bound to a specific context—for example, as an attribute on a Client object to store only the orders for that specific client. This localization greatly improves cache locality and can make lookups orders of magnitude faster than searching a global index.
Basic Operations
Creating and Populating an Index
You can create an empty index and add items to it. An index can be a standalone object or a member of another class.
# Create a new index
my_index = db0.index()
# Add items using index.add(key, value)
# The value must be a dbzero-managed object.
task1 = Task(description="Finish docs")
task2 = Task(description="Write tests")
my_index.add(1, task1) # key=1 (priority)
my_index.add(2, task2) # key=2 (priority)
print(f"Index now contains {len(my_index)} items.")
You can add the same object multiple times with different keys. An index can also store objects with None as their key.
Removing Elements
To remove an item, you must provide both the key and the value that were originally added.
# Remove the item with key=1
my_index.remove(1, task1)
print(f"Index now contains {len(my_index)} items.")
Removing items with a None key works the same way:
null_key_obj = Task(description="Low priority")
my_index.add(None, null_key_obj)
# ...
my_index.remove(None, null_key_obj)
Operations like add and remove are transactional. The changes are staged in memory until they are persisted.
Data Persistence 💾
Changes made to an index (adding or removing elements) are not immediately written to disk. They are held in memory and persisted to the underlying storage only on autocommit, or when you call dbzero.commit() or dbzero.close().
index = db0.index()
obj = MyObject(value=123)
index.add(1, obj)
# The change is currently only in memory.
# To save it permanently:
db0.commit()
# Alternatively, closing the database session also flushes changes.
# db0.close()
If the program exits or the index object is destroyed without a commit or close, any pending changes will be discarded.
Querying and Sorting ⚙️
The real power of indexes comes from querying and sorting.
Sorting with index.sort()
The sort() method takes an iterable (like the result of a dbzero.find() query) and returns its elements sorted according to their keys in the index.
# Setup: Add objects with tags and priorities to an index
priority_index = db0.index()
tasks = [Task(priority=p) for p in [99, 66, 55, 88]]
for t in tasks:
    db0.tags(t).add("project-alpha")
    priority_index.add(t.priority, t)
# Find all tasks with "project-alpha" and sort them by priority
all_tasks = db0.find("project-alpha")
sorted_tasks = priority_index.sort(all_tasks)
# sorted_tasks will be in ascending order of priority: [55, 66, 88, 99]
print([t.priority for t in sorted_tasks])
Sorting Order
By default, sort() orders items in ascending order of their keys. Items with None keys are placed at the end.
You can customize the sorting behavior with these arguments:
- desc=True: Sorts in descending order.
- null_first=True: Places None keys at the beginning of the result.
- null_first=False: Places None keys at the end (this is the default for ascending sort).
# Example data with keys: [666, None, 555, 888, None]
tasks = db0.find("all-tasks")
# Descending order (None keys are first by default in descending)
result_desc = priority_index.sort(tasks, desc=True)
# -> [None, None, 888, 666, 555]
# Ascending order, but with None keys first
result_null_first = priority_index.sort(tasks, null_first=True)
# -> [None, None, 555, 666, 888]
index.sort() is efficient because it doesn't re-sort the data each time. It uses the pre-ordered structure of the index to quickly arrange the given results.
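To make the ordering rules concrete, here is a minimal plain-Python sketch of how None keys are positioned. It does not call dbzero; sort_by_key is a hypothetical helper that models the defaults described above (None keys last when ascending, first when descending). How desc=True combines with null_first=False is an assumption of this model.

```python
def sort_by_key(pairs, desc=False, null_first=None):
    """Order (key, value) pairs by key and return the values.

    Hypothetical model of the None-placement rules, not dbzero code.
    """
    # Assumed default: None keys go last when ascending, first when descending.
    if null_first is None:
        null_first = desc
    nulls = [value for key, value in pairs if key is None]
    keyed = sorted(
        (pair for pair in pairs if pair[0] is not None),
        key=lambda pair: pair[0],
        reverse=desc,
    )
    ordered = [value for _, value in keyed]
    return nulls + ordered if null_first else ordered + nulls


# Made-up sample data: (key, label) pairs.
pairs = [(666, "a"), (None, "b"), (555, "c"), (888, "d"), (None, "e")]

print(sort_by_key(pairs))                   # ['c', 'a', 'd', 'b', 'e']
print(sort_by_key(pairs, desc=True))        # ['b', 'e', 'd', 'a', 'c']
print(sort_by_key(pairs, null_first=True))  # ['b', 'e', 'c', 'a', 'd']
```

The model sorts the non-None keys normally and then attaches the None-keyed items to one end, so the relative order of the None-keyed items is left unchanged.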
Multi-level Sorting
You can achieve multi-level sorting by chaining sort() calls. The last sort() call determines the primary sort order.
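Chaining works the way repeated stable sorts do in plain Python: a later sort keeps the earlier order among items whose keys are equal. The sketch below uses Python's built-in sorted() rather than dbzero, and assumes that index.sort() likewise preserves the incoming order for equal keys. The task dictionaries are made-up sample data.

```python
from operator import itemgetter

# Made-up sample data for illustration only.
tasks = [
    {"name": "beta", "priority": 2},
    {"name": "alpha", "priority": 2},
    {"name": "alpha", "priority": 1},
    {"name": "beta", "priority": 1},
]

# Secondary criterion first ...
by_priority = sorted(tasks, key=itemgetter("priority"))
# ... primary criterion last. sorted() is stable, so ties on name
# keep the priority order established by the previous pass.
by_name_then_priority = sorted(by_priority, key=itemgetter("name"))

result = [(t["name"], t["priority"]) for t in by_name_then_priority]
print(result)
# [('alpha', 1), ('alpha', 2), ('beta', 1), ('beta', 2)]
```

The same pattern applies to the dbzero calls: sort by the least significant key first and by the most significant key last.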
# Sort by name, then by priority
sorted_by_priority = priority_index.sort(all_tasks)
final_sort = name_index.sort(sorted_by_priority)
Range Queries with index.select()
The select() method retrieves all objects whose keys fall within a specified range.
index.select(min_key, max_key)
The range is inclusive by default. You can use None to specify an unbounded range.
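The inclusive, optionally unbounded range semantics can be modelled with binary search over a sorted list of keys. This is a plain-Python sketch, not dbzero's implementation; select_range is a hypothetical helper, and it returns keys rather than stored objects to keep the example short.

```python
from bisect import bisect_left, bisect_right


def select_range(sorted_keys, min_key=None, max_key=None):
    """Return the keys k with min_key <= k <= max_key.

    None for either bound means "unbounded on that side".
    Hypothetical helper; sorted_keys must already be in ascending order.
    """
    # bisect_left finds the first position whose key is >= min_key.
    lo = 0 if min_key is None else bisect_left(sorted_keys, min_key)
    # bisect_right finds the first position whose key is > max_key,
    # which makes the upper bound inclusive.
    hi = len(sorted_keys) if max_key is None else bisect_right(sorted_keys, max_key)
    return sorted_keys[lo:hi]


keys = list(range(10))
print(select_range(keys, 2, 5))     # [2, 3, 4, 5]
print(select_range(keys, 7, None))  # [7, 8, 9]
print(select_range(keys, None, 3))  # [0, 1, 2, 3]
print(select_range(keys))           # all ten keys
```

Because the keys are kept in order, locating a range costs two binary searches instead of a scan over every entry.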
# Setup: index with keys from 0 to 9
numeric_index = db0.index()
for i in range(10):
    numeric_index.add(i, MyObject(i))
# Select items with keys between 2 and 5 (inclusive)
# Result values: {2, 3, 4, 5}
results = numeric_index.select(2, 5)
# Select all items with keys >= 7 (high-unbounded)
results = numeric_index.select(7, None)
# Select all items with keys <= 3 (low-unbounded)
results = numeric_index.select(None, 3)
# Select all items in the index
all_items = numeric_index.select(None, None)
# or just:
all_items = numeric_index.select()
The keys can be numbers, dates, or datetimes. Strings are currently not supported as keys in dbzero indexes. For string-based indexing, consider using dbzero.dict instead.
from datetime import datetime, timedelta
date_index = db0.index()
# ... populate with datetime keys ...
# Get items from the last 24 hours
one_day_ago = datetime.now() - timedelta(days=1)
recent_items = date_index.select(one_day_ago, None)
Advanced Queries
Combining select and find
You can combine the results of index.select() with other queries using dbzero.find(). This allows for powerful, multi-faceted filtering. dbzero.find() performs an intersection of the results from all provided iterables.
# Find objects with 'tag1' AND a priority between 500 and 800
query_result = db0.find(
    "tag1",
    priority_index.select(500, 800)
)
# Find objects that are in both index ranges
query_result = db0.find(
    priority_index.select(100, 200),
    date_index.select(start_date, end_date)
)
Finding Specific Objects in an Index
The in operator and dbzero.find() also work on the results of an index query, making it easy to check for existence or retrieve specific items.
# Check if a specific object is in a range
my_object = ...  # placeholder: some object previously added to the index
if my_object in index.select(100, 200):
    print("Object found in range!")
# Find a list of specific objects within a range
objects_to_find = [obj1, obj3, obj5]
found_objects = db0.find(index.select(), objects_to_find)
Object Lifecycle & Memory Management 🗑️
dbzero automates memory management through reference counting. Indexes play a crucial role in this.
- Adding an object to an index creates a reference to it, preventing it from being garbage collected.
- Removing an object from an index removes that reference.
- Deleting an entire index with dbzero.delete(index) will also remove its references to all the objects it contains.
If an object's last reference is removed (e.g., it's removed from the only index that holds it), the object itself will be automatically deleted from the prefix upon the next commit.
index = db0.index()
obj = MyObject()
obj_uuid = db0.uuid(obj)
# Add object to index, creating a reference
index.add(1, obj)
db0.commit()
# Now, remove the object from the index
index.remove(1, obj)
db0.commit()
# If no other references to 'obj' exist, it's now gone.
# This will raise an exception because the object was deleted.
db0.fetch(obj_uuid)
When an object needs to add itself to an index during initialization (__init__), you must wrap self with dbzero.materialized(). This ensures that the object is fully created before its reference is added to the index.
@db0.memo
class SelfAwareObject:
    def __init__(self, key, index):
        # Correct way to add self to an index from __init__
        index.add(key, db0.materialized(self))