Testing Repositories

The Repository pattern is an abstraction over the data storage layer. It wraps database operations in an interface and hides the complexity and mechanics of the database. The domain layer uses the Repository to query and persist domain objects. From the domain layer's point of view, the Repository's interface looks like an in-memory collection of domain objects.

The Repository's responsibility is object retrieval and persistence - it contains no business logic. All business logic therefore resides in the domain objects, not the data layer. This separation of responsibilities makes the core application's behavior easier to test and reason about. In tests, you can hand the domain logic you want to test an in-memory Repository without the burden of setting up test data in a production database.

The Repository pattern is described in detail in Patterns of Enterprise Application Architecture (PoEAA) and Domain-Driven Design (DDD). Other great resources are the cosmicpython book's chapter on Repositories, Repositories for DDD on AWS, and the designing persistence layer with .NET example.

The following sections will explore testing Repositories separately from the rest of the application.

Implementing a simple Repository

The Repository implementation will differ depending on the database technology (relational, NoSQL, file store, etc.) and the framework (ORM, datastore client library, etc.). The most important part is not exposing the underlying technology in the Repository's interface.

Implementing a domain object

Before exploring how to test a Repository, we must create a domain object that the Repository will save and query. We'll use a Customer object as an example.

There are two ways to create a new Customer object instance - with the default constructor (auto-generated with @dataclass) or a create factory method. The former is used for object reconstruction and the latter for new object creation.

customers/domain.py
import uuid
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    name: str
    email: str

    @staticmethod
    def create(name: str, email: str) -> "Customer":
        return Customer(
            id=str(uuid.uuid4()),
            name=name,
            email=email,
        )

When you're querying an existing object, e.g., by customer identifier or email address, the object is reconstructed from the data already stored in the datastore. When the data layer is separated from the domain layer, the Repository is responsible for reconstructing objects. The domain layer provides an interface for object reconstruction - the object constructor (__init__).

To create a new, unique customer, e.g., when a user registers in your application, we'll use the new object creation method - create. The create method requires the data needed to create a new customer - name and email. Unlike __init__, it doesn't require the id; a new random UUID is generated inside the method. The create method thus encapsulates the rules of new object creation - in this example, Customer.id generation - and offloads this responsibility from the data layer.
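
A minimal illustration of the two paths (the values are made up for the example):

# New object creation - create generates the id internally.
customer = Customer.create(name="John Doe", email="john.doe@example.com")

# Object reconstruction - the id comes from data that already exists,
# as a Repository would do when loading the object from a datastore.
reconstructed = Customer(id=customer.id, name=customer.name, email=customer.email)
assert reconstructed == customer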

We'll see how separating the object reconstruction from the new object creation is useful when implementing a sample Repository.

Implementing DynamoDB Repository

In this example, we'll use AWS DynamoDB to implement DynamoDBCustomerRepository. The sample Repository has two methods - the constructor (__init__) and save.

The constructor takes two dependencies - DynamoDBClient and table_name. Explicitly passing dependencies increases flexibility: the Repository is easy to configure both in tests and in production code, as we'll see in the following sections. The save method persists a Customer object in the database; its implementation is a minimal working version that showcases the idea.

adapters/repository.py
from types_aiobotocore_dynamodb import DynamoDBClient

from .domain001 import Customer


class DynamoDBCustomerRepository:
    def __init__(self, client: DynamoDBClient, table_name: str) -> None:
        self._client = client
        self._table_name = table_name

    async def save(self, customer: Customer) -> None:
        await self._client.put_item(
            TableName=self._table_name,
            Item={
                "PK": {"S": f"CUSTOMER#{customer.id}"},
                "Id": {"S": customer.id},
                "Name": {"S": customer.name},
                "Email": {"S": customer.email},
            },
        )

Testing with a production-like database

As we learned in the Testing Databases section, we want to test the Repository with a production-like database. To test the DynamoDB Repository, we'll use Moto AWS service mocks. An alternative is using a real AWS account - that would make the tests accurate but slower and more complicated to configure securely due to permission and account management; if we're not careful, we can accidentally run the tests against a production AWS account. Service mocks like Moto or LocalStack are good enough for most use cases.

To test the Repository, we need to instantiate it with a DynamoDBClient and a table_name. We'll get the DynamoDBClient from the Tomodachi Testcontainers library with the moto_dynamodb_client fixture; the fixture automatically starts a MotoContainer. For the table_name, any string will do; the example includes a random UUID in the name as a namespace to avoid table name clashes between tests.

Dependency injection increases testability

Being able to pass a different DynamoDBClient to the Repository in tests is powerful - it makes the code testable and explicit about its dependencies. To configure the Repository in the production code, we'd create a new DynamoDBClient instance with configuration values from the environment variables.
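
For illustration, the production wiring with aiobotocore might look like the sketch below. The environment variable names are hypothetical, and the import path is assumed from the example layout; adjust both to your setup:

import os

from aiobotocore.session import get_session

from adapters.repository import DynamoDBCustomerRepository  # path assumed from the example layout


async def run_app() -> None:
    session = get_session()
    # AWS_REGION and DYNAMODB_TABLE_NAME are hypothetical environment variables.
    async with session.create_client("dynamodb", region_name=os.environ["AWS_REGION"]) as client:
        repository = DynamoDBCustomerRepository(client, os.environ["DYNAMODB_TABLE_NAME"])
        ...  # wire the repository into the application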

The first test, test_save_customer, creates a new Customer object and calls the save method to persist it in the database. The assertion is missing for now - we'll look into what to assert in the next section.

tests/test_repository.py
import uuid
from typing import AsyncGenerator

import pytest
import pytest_asyncio
from types_aiobotocore_dynamodb import DynamoDBClient

from .create_customers_table import create_customers_table
from .domain001 import Customer
from .repository001 import DynamoDBCustomerRepository


@pytest_asyncio.fixture(loop_scope="session")
async def repository(moto_dynamodb_client: DynamoDBClient) -> AsyncGenerator[DynamoDBCustomerRepository, None]:
    table_name = f"autotest-{uuid.uuid4()}-customers"
    await create_customers_table(moto_dynamodb_client, table_name)
    yield DynamoDBCustomerRepository(moto_dynamodb_client, table_name)
    await moto_dynamodb_client.delete_table(TableName=table_name)


@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: DynamoDBCustomerRepository) -> None:
    # Arrange
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    # Act
    await repository.save(customer)

    # Assert
    ...

For completeness, the function below creates the DynamoDB table used in the tests.

tests/create_customers_table.py
from types_aiobotocore_dynamodb import DynamoDBClient


async def create_customers_table(client: DynamoDBClient, table_name: str) -> None:
    await client.create_table(
        TableName=table_name,
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )

Test the interface, not the implementation

To test that the Repository has saved an object in a database, we can query the database and assert that the data is stored correctly. This approach has a significant drawback - the tests know about the Repository's implementation details, such as how and where the data is stored. As more functionality is added to the Repository, the tests will become brittle, lengthy, and difficult to maintain.

tests/test_repository.py
@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: DynamoDBCustomerRepository, moto_dynamodb_client: DynamoDBClient) -> None:
    # Arrange
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    # Act
    await repository.save(customer)

    # Assert
    item = await moto_dynamodb_client.get_item(
        TableName=repository._table_name,
        Key={"PK": {"S": f"CUSTOMER#{customer.id}"}},
    )
    assert item["Item"] == {
        "PK": {"S": f"CUSTOMER#{customer.id}"},
        "Id": {"S": customer.id},
        "Name": {"S": "John Doe"},
        "Email": {"S": "john.doe@example.com"},
    }

To test the Repository, verify its behavior by calling only its public API - test the interface, not the implementation. The intent of test_save_customer is to assert that the Customer object is saved to the Repository - that it's possible to retrieve it back and that its data is unchanged. This way, the tests are not concerned with the database's internal data structure, which can now change independently without breaking the tests.

DynamoDBCustomerRepository.get reconstructs a Customer object from the data stored in the database.

adapters/repository.py
class DynamoDBCustomerRepository:
    ...

    async def get(self, customer_id: str) -> Customer:
        response = await self._client.get_item(
            TableName=self._table_name,
            Key={"PK": {"S": f"CUSTOMER#{customer_id}"}},
        )
        item = response["Item"]
        return Customer(
            id=item["Id"]["S"],
            name=item["Name"]["S"],
            email=item["Email"]["S"],
        )
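
With get in place, test_save_customer can round-trip through the public API alone - this is the version that appears in the combined test suite later:

tests/test_repository.py
@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: DynamoDBCustomerRepository) -> None:
    # Arrange
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    # Act
    await repository.save(customer)

    # Assert - retrieve the object back and compare it with the original.
    assert await repository.get(customer.id) == customer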

Round-trip testing through the Repository's public API helps avoid testing implementation details

You can think of the pattern of saving an object and querying it in the same test as a "round-trip" test. The same test verifies a complete cycle of domain object persistence - the object is saved in the datastore and retrieved back. The example doesn't include updating the domain object, but the same idea applies - create (arrange), update (act), query (assert).
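
For example, an update round-trip could follow the same shape. Note that the update method below is hypothetical - the example Repository doesn't implement it - so this is only a sketch of the pattern:

@pytest.mark.asyncio(loop_scope="session")
async def test_update_customer_email(repository: DynamoDBCustomerRepository) -> None:
    # Arrange - create and save a customer.
    customer = Customer.create(name="John Doe", email="john.doe@example.com")
    await repository.save(customer)

    # Act - change the domain object and persist the change.
    # `update` is hypothetical - the example Repository doesn't implement it.
    customer.email = "john.new@example.com"
    await repository.update(customer)

    # Assert - query through the public API and check the change.
    assert (await repository.get(customer.id)).email == "john.new@example.com"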

To test the negative case when a Customer is not found in the Repository, we can test that the get method raises an exception. The current Repository implementation will raise a KeyError because the Item key won't be present in the DynamoDB GetItem response. This test has the same problem as the first example - it asserts on an implementation detail, the KeyError.

The test shouldn't care that the internal data structure is a dictionary that raises a KeyError when a key is not found. In addition, a KeyError doesn't necessarily mean that the customer is not found in the Repository. If the Repository has a bug and doesn't save one of the Customer's fields, the same error is raised when accessing the missing field, e.g., email=item["Email"]["S"]. Error handling code that catches the KeyError would then treat this bug as the "customer not found" case and return misleading results to the application's end user.

tests/test_repository.py
@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: DynamoDBCustomerRepository) -> None:
    with pytest.raises(KeyError):
        await repository.get("123456")

To hide the exception's implementation details, we introduce a new domain exception - CustomerNotFoundError - to identify and handle the error unambiguously. The domain exception is part of the Repository's public API: when a customer with a given customer_id is not found, CustomerNotFoundError is raised. All Repository implementations must adhere to this contract, regardless of the underlying database technology.
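
The exception class itself isn't shown in the snippets; a minimal definition might look like this (the later examples import it from the domain module):

class CustomerNotFoundError(Exception):
    """Raised when no customer exists for the given customer_id."""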

adapters/repository.py
class DynamoDBCustomerRepository:
    ...

    async def get(self, customer_id: str) -> Customer:
        response = await self._client.get_item(
            TableName=self._table_name,
            Key={"PK": {"S": f"CUSTOMER#{customer_id}"}},
        )
        item = response.get("Item")
        if item is None:
            raise CustomerNotFoundError(customer_id)
        return Customer(
            id=item["Id"]["S"],
            name=item["Name"]["S"],
            email=item["Email"]["S"],
        )
tests/test_repository.py
from .repository005 import CustomerNotFoundError


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: DynamoDBCustomerRepository) -> None:
    with pytest.raises(CustomerNotFoundError):
        await repository.get("123456")

Implementing a fake Repository

The database implementation details obscure the intent of the application's business logic, so we have hidden the details behind the Repository's interface - a contract between the application's domain layer and persistence layer.

To further ease domain layer testing, instead of using the production DynamoDBCustomerRepository, we can replace it with an in-memory fake Repository. The fake Repository will store the data in an in-memory dictionary. The in-memory version is unsuitable for real-world use because the data is lost on application shutdown, and the Repository is not scalable. However, if the fake Repository behaves the same as the real one, it's a good choice for unit testing, prototyping, and demos. There's no database or Testcontainers to manage, and the tests will be fast.

adapters/repository.py
class InMemoryRepository:
    def __init__(self, customers: list[Customer]) -> None:
        self.customers = {customer.id: customer for customer in customers}

    async def save(self, customer: Customer) -> None:
        if customer.id in self.customers:
            raise CustomerIdentifierAlreadyExistsError(customer.id)
        if customer.email in (existing.email for existing in self.customers.values()):
            raise CustomerEmailAlreadyExistsError(customer.email)
        self.customers[customer.id] = customer

    async def get(self, customer_id: str) -> Customer:
        try:
            return self.customers[customer_id]
        except KeyError as e:
            raise CustomerNotFoundError(customer_id) from e
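
The uniqueness errors raised by save aren't shown in the snippets either; minimal sketches in the same style as CustomerNotFoundError:

class CustomerIdentifierAlreadyExistsError(Exception):
    """Raised when saving a customer whose id is already taken."""


class CustomerEmailAlreadyExistsError(Exception):
    """Raised when saving a customer whose email is already taken."""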

The in-memory Repository is useful not just for testing: when modeling a complex and unknown domain, you can postpone the decision of which database technology to use and focus development efforts on the problem domain. With the Repository behind a clean interface, you can quickly evolve the domain layer without being slowed down by the accidental complexities of a production database - mapping domain objects to the datastore format and back, managing schema and data migrations, handling infrastructure errors, etc. You can commit to a specific technology later, when the problem domain is better explored and it's clearer how the data is queried and used.

Let the problem domain drive your technological choices

By focusing development on the problem domain first and keeping infrastructure concerns on the periphery, you can later make informed choices about which technologies are better suited to your needs. That's a significant benefit of the broader Ports & Adapters pattern - hiding the accidental complexity of low-level components. Ports & Adapters applies to all kinds of infrastructure - databases, file stores, external services, message brokers, etc.

Testing other Repository implementations with the same test suite

To ensure that the in-memory Repository works, we must test it with the same test suite as the production Repository. Since the interface is the same and the tests exercise the interface, not the implementation, the test suite doesn't care which Repository it's given - in-memory, DynamoDB, PostgreSQL, AWS S3, etc. - as long as the behavior is the same. Knowing this property, we could have implemented the in-memory Repository and its tests before the DynamoDB version.
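
Note that the DynamoDB save shown earlier doesn't enforce these uniqueness rules yet, so it wouldn't pass the uniqueness tests as-is. One way to enforce the identifier constraint is a DynamoDB condition expression - a sketch is below; enforcing email uniqueness is harder and typically requires a transactional write against a separate unique-constraint item:

class DynamoDBCustomerRepository:
    ...

    async def save(self, customer: Customer) -> None:
        try:
            await self._client.put_item(
                TableName=self._table_name,
                Item={
                    "PK": {"S": f"CUSTOMER#{customer.id}"},
                    "Id": {"S": customer.id},
                    "Name": {"S": customer.name},
                    "Email": {"S": customer.email},
                },
                # Reject the write if an item with the same PK already exists.
                ConditionExpression="attribute_not_exists(PK)",
            )
        except self._client.exceptions.ConditionalCheckFailedException as e:
            raise CustomerIdentifierAlreadyExistsError(customer.id) from e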

To reuse the same test suite for testing multiple Repository versions, we'll use pytest parametrized fixtures. Any other popular test runner should have a similar concept. We'll define two fixtures, dynamodb_repository and fake_repository, and use them in the generic repository fixture. The repository fixture is parametrized; when a test case uses the fixture, it will be run twice - with dynamodb and fake parameters. Depending on the passed parameter, the fixture will return DynamoDBCustomerRepository or InMemoryRepository.

tests/test_repository.py
import uuid
from typing import AsyncGenerator, Generator

import pytest
import pytest_asyncio
from types_aiobotocore_dynamodb import DynamoDBClient

from .create_customers_table import create_customers_table
from .domain006 import (
    Customer,
    CustomerEmailAlreadyExistsError,
    CustomerIdentifierAlreadyExistsError,
    CustomerNotFoundError,
)
from .ports006 import CustomerRepository
from .repository006 import DynamoDBCustomerRepository, InMemoryRepository


@pytest_asyncio.fixture(loop_scope="session")
async def dynamodb_repository(moto_dynamodb_client: DynamoDBClient) -> AsyncGenerator[DynamoDBCustomerRepository, None]:
    table_name = f"autotest-{uuid.uuid4()}-customers"
    await create_customers_table(moto_dynamodb_client, table_name)
    yield DynamoDBCustomerRepository(moto_dynamodb_client, table_name)
    await moto_dynamodb_client.delete_table(TableName=table_name)


@pytest.fixture
def fake_repository() -> InMemoryRepository:
    return InMemoryRepository([])


@pytest.fixture(params=["dynamodb", "fake"])
def repository(
    request: pytest.FixtureRequest,
    dynamodb_repository: DynamoDBCustomerRepository,
    fake_repository: InMemoryRepository,
) -> Generator[CustomerRepository, None, None]:
    if request.param == "dynamodb":
        yield dynamodb_repository
    elif request.param == "fake":
        yield fake_repository
    else:
        raise NotImplementedError

The tests use the generic repository fixture instead of a specific implementation and are run twice - with DynamoDBCustomerRepository and InMemoryRepository.
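
When the parametrized suite runs, pytest appends the fixture parameter to each test id, e.g., test_save_customer[dynamodb] and test_save_customer[fake], which makes it easy to see which implementation a failure belongs to.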

tests/test_repository.py
from .ports006 import CustomerRepository


@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: CustomerRepository) -> None:
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    await repository.save(customer)

    assert await repository.get(customer.id) == customer


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: CustomerRepository) -> None:
    with pytest.raises(CustomerNotFoundError, match="123456"):
        await repository.get("123456")


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_id_should_be_unique(repository: CustomerRepository) -> None:
    customer_id = str(uuid.uuid4())
    customer_1 = Customer(id=customer_id, name="John Doe", email="john.doe@example.com")
    customer_2 = Customer(id=customer_id, name="Mary Doe", email="mary.doe@example.com")
    await repository.save(customer_1)

    with pytest.raises(CustomerIdentifierAlreadyExistsError, match=customer_id):
        await repository.save(customer_2)


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_email_should_be_unique(repository: CustomerRepository) -> None:
    customer_1 = Customer.create(name="John Doe", email="john.doe@example.com")
    customer_2 = Customer.create(name="John Doe", email="john.doe@example.com")
    await repository.save(customer_1)

    with pytest.raises(CustomerEmailAlreadyExistsError, match="john.doe@example.com"):
        await repository.save(customer_2)

For the type hint, the repository tests use the generic Repository protocol - the "Port" part of the Ports & Adapters pattern.

customers/ports.py
from typing import Protocol

from .domain006 import Customer


class CustomerRepository(Protocol):
    async def save(self, customer: Customer) -> None: ...

    async def get(self, customer_id: str) -> Customer: ...
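
Since CustomerRepository is a typing.Protocol, DynamoDBCustomerRepository and InMemoryRepository satisfy it structurally - they don't need to inherit from it; matching save and get signatures are enough.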

Running the same test suite with different pytest fixture implementations.

Decoupling and Testing Infrastructure Layer with Ports & Adapters Pattern

This section used the Repository pattern to decouple the persistence layer from the domain layer. The Repository pattern is a specific application for another, more general pattern - Ports & Adapters. The Ports & Adapters pattern helps decouple all sorts of components, not just databases. The following section Decoupling and Testing Infrastructure Layer with Ports & Adapters Pattern describes in more detail the applications of Ports & Adapters, and how Testcontainers help to implement and test the "Adapters" part of the pattern.

The C4 diagrams below show how the Ports & Adapters pattern helps to implement the example Repository.

Container Diagram - Application with DynamoDB Database

Ports are the interfaces of our infrastructure components; they reside in the domain layer. Adapters - the infrastructure layer - implement the Ports. There are two implementations - the in-memory Repository and the DynamoDB Repository. Testcontainers help to test the DynamoDB Repository in a production-like environment by provisioning AWS service mocks, e.g., LocalStack.

The diagram shows that all dependencies (arrows) point towards the domain layer, meaning the domain layer doesn't depend on the infrastructure components. This makes the domain layer testable in isolation and the code easier to understand.

