# Testing Repositories
The Repository pattern is an abstraction over the data storage layer. It wraps database operations in an interface and hides the complexity and mechanics of the database. The domain layer uses the Repository to query and persist domain objects. From the domain layer's point of view, the Repository's interface looks like an in-memory collection of domain objects.

The Repository's responsibility is object retrieval and persistence - it contains no business logic. All business logic resides in the domain objects, not the data layer. This separation of responsibilities makes it easier to test and reason about the core application's behavior. In tests, you can pass in-memory objects to the domain logic under test without being burdened by test data setup in a production database.
The Repository pattern is described in detail in Patterns of Enterprise Application Architecture (PoEAA) and Domain-Driven Design (DDD). Other great resources I know of are the cosmicpython book's chapter on Repositories, Repositories for DDD on AWS, and the .NET persistence layer design example.
The following sections will explore testing Repositories separately from the rest of the application.
## Implementing a simple Repository
The Repository implementation will differ depending on the database technology (relational, NoSQL, file store, etc.) and the framework (ORM, datastore client library, etc.). The most important part is not exposing the underlying technology in the Repository's interface.
### Implementing a domain object
Before exploring how to test a Repository, we must create a domain object that the Repository will save and query.
We'll use a `Customer` object as an example. There are two ways to create a new `Customer` instance - with the default constructor (auto-generated by `@dataclass`) or with the `create` factory method. The former is used for object reconstruction and the latter for new object creation.
```python
import uuid
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    name: str
    email: str

    @staticmethod
    def create(name: str, email: str) -> "Customer":
        return Customer(
            id=str(uuid.uuid4()),
            name=name,
            email=email,
        )
```
When you query an existing object, e.g., by customer identifier or email address, the object is reconstructed from existing data in a datastore. When the data layer is separated from the domain layer, the Repository is responsible for reconstructing the objects. The domain layer provides an interface for object reconstruction - the object constructor (`__init__`).

To create a new, unique customer, e.g., when a user registers in your application, we'll use the new object creation method - `create`. The `create` method requires the data needed to create a new customer - `name` and `email`. Unlike `__init__`, the `create` method doesn't require the `id`; a new random UUID is generated inside the `create` method. Therefore, the new object creation method encapsulates the rules of new object creation - in this example, the `Customer.id` generation - and offloads this responsibility from the data layer.
We'll see how separating the object reconstruction from the new object creation is useful when implementing a sample Repository.
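To make the distinction concrete, here's a short usage sketch (restating the `Customer` class from above): the factory generates the identifier, while the constructor accepts an identifier that already exists in the datastore.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    name: str
    email: str

    @staticmethod
    def create(name: str, email: str) -> "Customer":
        return Customer(id=str(uuid.uuid4()), name=name, email=email)


# New object creation - the factory generates a random id internally.
new_customer = Customer.create(name="John Doe", email="john.doe@example.com")

# Object reconstruction - e.g., a Repository rebuilding the object
# from data already stored in the database, id included.
reconstructed = Customer(
    id=new_customer.id, name="John Doe", email="john.doe@example.com"
)

# The reconstructed object is equal to the originally created one.
assert new_customer == reconstructed
```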
### Implementing a DynamoDB Repository
In this example, we'll use AWS DynamoDB to implement `DynamoDBCustomerRepository`. The sample Repository has two methods - the constructor (`__init__`) and the `save` method.

The constructor takes two dependencies - `DynamoDBClient` and `table_name`. Explicitly passing the dependencies increases flexibility - the Repository will be easy to configure in tests and in production code, as we'll see in the following sections.

The `save` method persists a `Customer` object in the database; its implementation is the minimal working version to showcase the example.
```python
from types_aiobotocore_dynamodb import DynamoDBClient

from .domain001 import Customer


class DynamoDBCustomerRepository:
    def __init__(self, client: DynamoDBClient, table_name: str) -> None:
        self._client = client
        self._table_name = table_name

    async def save(self, customer: Customer) -> None:
        await self._client.put_item(
            TableName=self._table_name,
            Item={
                "PK": {"S": f"CUSTOMER#{customer.id}"},
                "Id": {"S": customer.id},
                "Name": {"S": customer.name},
                "Email": {"S": customer.email},
            },
        )
```
## Testing with a production-like database
As we learned in the Testing Databases section, we want to test the Repository with a production-like database. To test the DynamoDB Repository, we'll use Moto AWS service mocks. An alternative is using a real AWS account - that will make the tests accurate but slower and more complicated to configure securely due to permission and account management; if we're not careful, we can accidentally run the tests against a production AWS account. Service mocks like Moto or LocalStack are good enough for most use cases.
To test the Repository, we need to instantiate it with a `DynamoDBClient` and a `table_name`. We'll get the `DynamoDBClient` from the Tomodachi Testcontainers library with the `moto_dynamodb_client` fixture; the fixture will automatically start the `MotoContainer`. For the `table_name`, any string value will suffice; the example uses a value with a random UUID suffix as a namespace to avoid table name clashes during tests.
**Dependency injection increases testability**

Being able to pass a different `DynamoDBClient` to the Repository in tests is powerful - it makes the code testable and explicit about its dependencies. To configure the Repository in production code, we'd create a new `DynamoDBClient` instance with configuration values from environment variables.
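The production wiring could read its configuration from the environment along these lines - a minimal sketch, assuming hypothetical variable names (`DYNAMODB_TABLE_NAME` is invented for this example; `AWS_REGION` and `AWS_ENDPOINT_URL` follow common AWS SDK conventions):

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DynamoDBConfig:
    """Configuration for the production Repository, read from the environment."""

    table_name: str
    region_name: str
    endpoint_url: Optional[str]  # e.g., a LocalStack URL in local development

    @staticmethod
    def from_environment() -> "DynamoDBConfig":
        return DynamoDBConfig(
            # Required - fail fast at startup if the table name is missing.
            table_name=os.environ["DYNAMODB_TABLE_NAME"],
            # Optional - fall back to sensible defaults.
            region_name=os.environ.get("AWS_REGION", "us-east-1"),
            endpoint_url=os.environ.get("AWS_ENDPOINT_URL"),
        )


# In production wiring code (sketch), the config would be used to create
# the client and inject it into the Repository, e.g.:
#   client = session.create_client("dynamodb", region_name=config.region_name, ...)
#   repository = DynamoDBCustomerRepository(client, config.table_name)
```

In tests, the same Repository class is instead handed the `moto_dynamodb_client` and a throwaway table name, so no environment configuration is needed.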
The first test, `test_save_customer`, creates a new `Customer` object and calls the `save` method to persist it in the database. The assertion is missing for now - we'll look into what to assert in the next section.
```python
import uuid
from typing import AsyncGenerator

import pytest
import pytest_asyncio
from types_aiobotocore_dynamodb import DynamoDBClient

from .create_customers_table import create_customers_table
from .domain001 import Customer
from .repository001 import DynamoDBCustomerRepository


@pytest_asyncio.fixture(loop_scope="session")
async def repository(moto_dynamodb_client: DynamoDBClient) -> AsyncGenerator[DynamoDBCustomerRepository, None]:
    table_name = f"autotest-{uuid.uuid4()}-customers"
    await create_customers_table(moto_dynamodb_client, table_name)
    yield DynamoDBCustomerRepository(moto_dynamodb_client, table_name)
    await moto_dynamodb_client.delete_table(TableName=table_name)


@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: DynamoDBCustomerRepository) -> None:
    # Arrange
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    # Act
    await repository.save(customer)

    # Assert
    ...
```
For completeness, the function below creates a new DynamoDB table.
```python
from types_aiobotocore_dynamodb import DynamoDBClient


async def create_customers_table(client: DynamoDBClient, table_name: str) -> None:
    await client.create_table(
        TableName=table_name,
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )
```
## Test the interface, not the implementation
To test that the Repository has saved an object in a database, we can query the database and assert that the data is stored correctly. This approach has a significant drawback - the tests know about the Repository's implementation details, such as how and where the data is stored. As more functionality is added to the Repository, the tests will become brittle, lengthy, and difficult to maintain.
```python
@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: DynamoDBCustomerRepository, moto_dynamodb_client: DynamoDBClient) -> None:
    # Arrange
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    # Act
    await repository.save(customer)

    # Assert
    item = await moto_dynamodb_client.get_item(
        TableName=repository._table_name,
        Key={"PK": {"S": f"CUSTOMER#{customer.id}"}},
    )
    assert item["Item"] == {
        "PK": {"S": f"CUSTOMER#{customer.id}"},
        "Id": {"S": customer.id},
        "Name": {"S": "John Doe"},
        "Email": {"S": "john.doe@example.com"},
    }
```
To test the Repository, verify its behavior by calling only its public API - test the interface, not the implementation. The intent of `test_save_customer` is to assert that the `Customer` object is saved to the Repository - that it's possible to retrieve it back from the Repository and that its data is the same. This way, the tests are not concerned with the database's internal data structure, which can now change independently without breaking the tests.
The `DynamoDBCustomerRepository.get` method reconstructs a customer object from existing data in the database.
```python
class DynamoDBCustomerRepository:
    ...

    async def get(self, customer_id: str) -> Customer:
        response = await self._client.get_item(
            TableName=self._table_name,
            Key={"PK": {"S": f"CUSTOMER#{customer_id}"}},
        )
        item = response["Item"]
        return Customer(
            id=item["Id"]["S"],
            name=item["Name"]["S"],
            email=item["Email"]["S"],
        )
```
**Round-trip testing through the Repository's public API helps to avoid testing implementation details**

You can think of the pattern of saving an object and querying it in the same test as a "round-trip" test. The same test verifies a complete cycle of domain object persistence - the object is saved in the datastore and retrieved back. The example doesn't include updating the domain object, but the same idea applies - create (arrange), update (act), query (assert).
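An update round-trip could look like the following sketch - a self-contained example with a minimal, hypothetical in-memory Repository (not the one from the chapter) and a simplified `Customer`:

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass
class Customer:
    id: str
    name: str
    email: str


class InMemoryCustomerRepository:
    """A hypothetical minimal Repository; save() both creates and updates."""

    def __init__(self) -> None:
        self._customers: dict[str, Customer] = {}

    async def save(self, customer: Customer) -> None:
        self._customers[customer.id] = customer

    async def get(self, customer_id: str) -> Customer:
        return self._customers[customer_id]


async def test_update_customer_round_trip() -> None:
    repository = InMemoryCustomerRepository()
    # Arrange - create and save the initial object.
    customer = Customer(id="1", name="John Doe", email="john.doe@example.com")
    await repository.save(customer)
    # Act - update the object and save it again.
    updated = replace(customer, email="john@example.com")
    await repository.save(updated)
    # Assert - query through the public API only.
    assert (await repository.get("1")).email == "john@example.com"


asyncio.run(test_update_customer_round_trip())
```

The assertion never inspects the internal dictionary - only `get`, the public API, is used to verify the update.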
To test the negative case when the `Customer` is not found in the Repository, we can test that the `get` method raises an exception. The current Repository implementation will throw a `KeyError` because the `Item` key will not exist in the DynamoDB `GetItem` API response. This test has the same problem as the first example - it asserts on an implementation detail, the `KeyError`. The test shouldn't care that the internal data structure is a dictionary that throws a `KeyError` when a key is not found.

In addition, a `KeyError` doesn't necessarily mean that the customer is not found in the Repository. If the Repository has a bug and is not saving one of the customer object's fields, the same error will be raised when trying to access the unsaved field, e.g., `email=item["Email"]["S"]`. In this case, error handling code catching the `KeyError` will always treat it as the "customer not found" case and return misleading results to the application's end user.
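A quick sketch of the ambiguity, with plain dictionaries standing in for DynamoDB `GetItem` responses - both failure modes raise the same exception type:

```python
# 1. Customer genuinely not found: GetItem omits the "Item" key entirely.
response_not_found: dict = {}

# 2. Repository bug: the item was saved without the "Email" attribute.
response_missing_field = {
    "Item": {
        "PK": {"S": "CUSTOMER#1"},
        "Id": {"S": "1"},
        "Name": {"S": "John Doe"},
    }
}


def caught_key_error(response: dict) -> str:
    try:
        item = response["Item"]  # raises KeyError in case 1
        return item["Email"]["S"]  # raises KeyError in case 2
    except KeyError as e:
        return f"KeyError: {e}"


print(caught_key_error(response_not_found))  # KeyError: 'Item'
print(caught_key_error(response_missing_field))  # KeyError: 'Email'
```

A caller catching `KeyError` cannot distinguish "not found" from "corrupt data" - which is exactly why a dedicated domain exception is introduced next.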
```python
@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: DynamoDBCustomerRepository) -> None:
    with pytest.raises(KeyError):
        await repository.get("123456")
```
To hide the exception's implementation details, we introduce a new domain exception - `CustomerNotFoundError` - to identify and handle the error unambiguously. The domain exception is part of the Repository's public API - when a customer with a given `customer_id` is not found, the `CustomerNotFoundError` is raised. All Repository implementations must adhere to this public API, or contract, regardless of the underlying database technology.
```python
class DynamoDBCustomerRepository:
    ...

    async def get(self, customer_id: str) -> Customer:
        response = await self._client.get_item(
            TableName=self._table_name,
            Key={"PK": {"S": f"CUSTOMER#{customer_id}"}},
        )
        item = response.get("Item")
        if item is None:
            raise CustomerNotFoundError(customer_id)
        return Customer(
            id=item["Id"]["S"],
            name=item["Name"]["S"],
            email=item["Email"]["S"],
        )
```
```python
from .repository005 import CustomerNotFoundError


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: DynamoDBCustomerRepository) -> None:
    with pytest.raises(CustomerNotFoundError):
        await repository.get("123456")
```
## Implementing a fake Repository
The database implementation details obscure the intent of the application's business logic, so we have hidden the details behind the Repository's interface - a contract between the application's domain layer and persistence layer.
To further ease domain layer testing, instead of using the production `DynamoDBCustomerRepository`, we can replace it with an in-memory fake Repository that stores the data in an in-memory dictionary. The in-memory version is unsuitable for real-world use because the data is lost on application shutdown and the Repository is not scalable. However, if the fake Repository behaves the same as the real one, it's a good choice for unit testing, prototyping, and demos. There's no database or Testcontainers to manage, and the tests will be fast.
```python
class InMemoryRepository:
    def __init__(self, customers: list[Customer]) -> None:
        self.customers = {customer.id: customer for customer in customers}

    async def save(self, customer: Customer) -> None:
        if customer.id in self.customers:
            raise CustomerIdentifierAlreadyExistsError(customer.id)
        if customer.email in (existing.email for existing in self.customers.values()):
            raise CustomerEmailAlreadyExistsError(customer.email)
        self.customers[customer.id] = customer

    async def get(self, customer_id: str) -> Customer:
        try:
            return self.customers[customer_id]
        except KeyError as e:
            raise CustomerNotFoundError(customer_id) from e
```
The in-memory Repository is useful not just for testing; when modeling a complex and unknown domain, you can postpone the decision of which database technology to use and focus the development efforts on the problem domain. By using the Repository with a clean interface, you'll be able to quickly evolve the domain layer without being slowed down by the accidental complexities of a production database - mapping domain objects to the datastore format and back, managing schema and data migrations, handling infrastructure errors, etc. You can commit to a specific technology later, when the problem domain is better explored and it's apparent how the data is queried and used.
**Let the problem domain drive your technological choices**

By focusing development on the problem domain first and keeping infrastructure concerns on the periphery, you can later make informed choices about which specific technologies are better suited for your needs. It's a significant benefit of the broader Ports & Adapters pattern - hiding the accidental complexity of low-level components. Ports & Adapters applies to all kinds of components - databases, file stores, external services, message brokers, etc.
## Testing other Repository implementations with the same test suite
To ensure that the in-memory Repository works, we must test it with the same test suite as the production Repository. Since the interface is the same and the tests exercise the interface, not the implementation, the test suite doesn't care which Repository it's given - in-memory, DynamoDB, PostgreSQL, AWS S3, etc. - as long as the behavior is the same. Knowing this property, we could have implemented the in-memory Repository and its tests before the DynamoDB version.
To reuse the same test suite for testing multiple Repository versions, we'll use `pytest` parametrized fixtures; any other popular test runner should have a similar concept. We'll define two fixtures, `dynamodb_repository` and `fake_repository`, and use them in the generic `repository` fixture. The `repository` fixture is parametrized; when a test case uses the fixture, it will run twice - with the `dynamodb` and `fake` parameters. Depending on the parameter, the fixture returns `DynamoDBCustomerRepository` or `InMemoryRepository`.
```python
import uuid
from typing import AsyncGenerator, Generator

import pytest
import pytest_asyncio
from types_aiobotocore_dynamodb import DynamoDBClient

from .create_customers_table import create_customers_table
from .domain006 import (
    Customer,
    CustomerEmailAlreadyExistsError,
    CustomerIdentifierAlreadyExistsError,
    CustomerNotFoundError,
)
from .ports006 import CustomerRepository
from .repository006 import DynamoDBCustomerRepository, InMemoryRepository


@pytest_asyncio.fixture(loop_scope="session")
async def dynamodb_repository(moto_dynamodb_client: DynamoDBClient) -> AsyncGenerator[DynamoDBCustomerRepository, None]:
    table_name = f"autotest-{uuid.uuid4()}-customers"
    await create_customers_table(moto_dynamodb_client, table_name)
    yield DynamoDBCustomerRepository(moto_dynamodb_client, table_name)
    await moto_dynamodb_client.delete_table(TableName=table_name)


@pytest.fixture
def fake_repository() -> InMemoryRepository:
    return InMemoryRepository([])


@pytest.fixture(params=["dynamodb", "fake"])
def repository(
    request: pytest.FixtureRequest,
    dynamodb_repository: DynamoDBCustomerRepository,
    fake_repository: InMemoryRepository,
) -> Generator[CustomerRepository, None, None]:
    if request.param == "dynamodb":
        yield dynamodb_repository
    elif request.param == "fake":
        yield fake_repository
    else:
        raise NotImplementedError
```
The tests use the generic `repository` fixture instead of a specific implementation and run twice - with `DynamoDBCustomerRepository` and `InMemoryRepository`.
```python
from .ports006 import CustomerRepository


@pytest.mark.asyncio(loop_scope="session")
async def test_save_customer(repository: CustomerRepository) -> None:
    customer = Customer.create(name="John Doe", email="john.doe@example.com")

    await repository.save(customer)

    assert await repository.get(customer.id) == customer


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_not_found(repository: CustomerRepository) -> None:
    with pytest.raises(CustomerNotFoundError, match="123456"):
        await repository.get("123456")


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_id_should_be_unique(repository: CustomerRepository) -> None:
    customer_id = str(uuid.uuid4())
    customer_1 = Customer(id=customer_id, name="John Doe", email="john.doe@example.com")
    customer_2 = Customer(id=customer_id, name="Mary Doe", email="mary.doe@example.com")

    await repository.save(customer_1)

    with pytest.raises(CustomerIdentifierAlreadyExistsError, match=customer_id):
        await repository.save(customer_2)


@pytest.mark.asyncio(loop_scope="session")
async def test_customer_email_should_be_unique(repository: CustomerRepository) -> None:
    customer_1 = Customer.create(name="John Doe", email="john.doe@example.com")
    customer_2 = Customer.create(name="John Doe", email="john.doe@example.com")

    await repository.save(customer_1)

    with pytest.raises(CustomerEmailAlreadyExistsError, match="john.doe@example.com"):
        await repository.save(customer_2)
```
For the type hint, the repository tests use the generic Repository's protocol - the "Port" part of the "Ports & Adapters" pattern.
```python
from typing import Protocol

from .domain006 import Customer


class CustomerRepository(Protocol):
    async def save(self, customer: Customer) -> None: ...

    async def get(self, customer_id: str) -> Customer: ...
```
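Python Protocols use structural typing - a class satisfies `CustomerRepository` simply by having matching method signatures, without inheriting from it. A small self-contained sketch (with a simplified `Customer` and a trimmed-down in-memory Repository):

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Customer:
    id: str
    name: str
    email: str


@runtime_checkable
class CustomerRepository(Protocol):
    async def save(self, customer: Customer) -> None: ...

    async def get(self, customer_id: str) -> Customer: ...


class InMemoryRepository:
    """Satisfies the Protocol structurally - no inheritance required."""

    def __init__(self) -> None:
        self._customers: dict[str, Customer] = {}

    async def save(self, customer: Customer) -> None:
        self._customers[customer.id] = customer

    async def get(self, customer_id: str) -> Customer:
        return self._customers[customer_id]


# @runtime_checkable allows isinstance() checks; note that it only verifies
# that the methods exist, not their signatures.
assert isinstance(InMemoryRepository(), CustomerRepository)
```

Static type checkers like mypy additionally verify the full method signatures wherever an object is passed as a `CustomerRepository`, so both the DynamoDB and the in-memory implementations are checked against the same Port.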
## Decoupling and Testing Infrastructure Layer with Ports & Adapters Pattern
This section used the Repository pattern to decouple the persistence layer from the domain layer. The Repository pattern is a specific application for another, more general pattern - Ports & Adapters. The Ports & Adapters pattern helps decouple all sorts of components, not just databases. The following section Decoupling and Testing Infrastructure Layer with Ports & Adapters Pattern describes in more detail the applications of Ports & Adapters, and how Testcontainers help to implement and test the "Adapters" part of the pattern.
The diagrams below (C4) showcase how the Ports & Adapters pattern helps to implement the example Repository.
Ports are the interfaces of our infrastructure components; they reside in the domain layer. Adapters or the Infrastructure layer implement the Ports. There are two implementations - in-memory Repository and DynamoDB Repository. Testcontainers help to test the DynamoDB Repository in a production-like environment by provisioning AWS service mocks, e.g., LocalStack.
The diagram shows that all dependencies (arrows) flow towards the domain layer. This means the domain layer doesn't depend on infrastructure components, which makes it testable in isolation and easier to understand.
## References
- https://martinfowler.com/eaaCatalog/repository.html
- https://martinfowler.com/bliki/DomainDrivenDesign.html
- https://www.cosmicpython.com/book/chapter_02_repository.html
- https://ddd.mikaelvesavuori.se/tactical-ddd/repositories
- https://learn.microsoft.com/en-us/dotnet/architecture/microservices/microservice-ddd-cqrs-patterns/infrastructure-persistence-layer-design
- https://en.wikipedia.org/wiki/Hexagonal_architecture_(software)