Mocks vs TestContainers

Ivan Ponomarev

ivan

Ivan Ponomarev

  • Staff Engineer @ Synthesized.io

  • Teaching Java at universities

Why Did I Decide to Make This Presentation?

  • I have been a user and enthusiast of the TestContainers library since 2016

  • However, I quite often argue with engineers who consider TestContainers to be
    the ultimate solution to every problem

  • I would like to share some insights based on my personal experience
    in integration testing in Java/Kotlin

  • However, the takeaways of this talk should be useful not only to Java/Kotlin developers

Modern integration test

Diagram

What do we have?

Mocks

vs

Real Systems

Using mocks is like learning chemistry from cartoons…​

Nothing beats a real experiment, though…​

real chemistry

What do we have for Real Experiments?

testcontainers logo

What do we have for Real Experiments?

testcontainers logo

"An open source framework for providing throwaway, lightweight instances of databases, message brokers, web browsers, or just about anything that can run in a Docker container."

tc support

TestContainers in action

GenericContainer<?> redis = new GenericContainer<>(
        DockerImageName.parse("redis:5.0.3-alpine"))
        .withExposedPorts(6379);
//start explicitly (omitted when the lifecycle is managed by @Container/@Rule)
redis.start();

String address = redis.getHost();
Integer port = redis.getFirstMappedPort();

underTest = new RedisBackedCache(address, port);

Set of stereotypes

  •  — Mocks are unreliable

  •  — Let’s run everything in test containers and test it!

  •  — Using H2 database for testing is an outdated practice!

  • … we will address all of them later,
    but first let’s consider the case of external REST/gRPC services

Mocks of external services

Diagram

Mocks of external services

Mocks

vs

Real Systems

thumbs up We have control over the mock and can easily simulate all the corner cases.

thumbs down We lack control over the external service and its functionality.

wiremock mountebank

The importance of proper testing of corner cases

Gojko Adzic, Humans vs Computers, 2017

"Let’s release from prisons everyone who has not committed serious crimes"

try {
  murders = restClient.loadMurders();
}
catch (IOException e) {
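  //Corner case: a failed call silently turns "could not load the list"
  //into "nobody has committed serious crimes", releasing everyone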
  logger.error("failed to load", e);
  murders = emptyList();
}

WireMock features

wiremock

Simulate a response from the service

Diagram
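
A minimal sketch of such a stub (the /murders endpoint and the WireMock server setup are assumptions for illustration):

//import static com.github.tomakehurst.wiremock.client.WireMock.*;
//import com.github.tomakehurst.wiremock.http.Fault;

//Return a canned JSON body for GET /murders
stubFor(get(urlEqualTo("/murders"))
        .willReturn(okJson("[]")));

//...or simulate a dropped connection to exercise the error-handling branch
stubFor(get(urlEqualTo("/murders"))
        .willReturn(aResponse().withFault(Fault.CONNECTION_RESET_BY_PEER)));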

WireMock features

wiremock

Verify calls to the service

Diagram
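
A sketch of request verification (the endpoint and header are assumptions):

//import static com.github.tomakehurst.wiremock.client.WireMock.*;

//After exercising the system under test, check that it called
//the endpoint exactly once...
verify(1, getRequestedFor(urlEqualTo("/murders")));

//...and that it sent the expected header
verify(getRequestedFor(urlEqualTo("/murders"))
        .withHeader("Accept", equalTo("application/json")));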

WireMock features

wiremock

"Spy" by intercepting calls to the real service

Diagram
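
A sketch of proxying unmatched requests to the real service so that its actual responses can be observed (the target URL is an assumption):

//import static com.github.tomakehurst.wiremock.client.WireMock.*;

//Anything not matched by a more specific stub is forwarded
//to the real service and can be recorded for later replay
stubFor(get(urlMatching(".*"))
        .willReturn(aResponse()
                .proxiedFrom("https://real-service.example.com")));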

What about RDBMS/NoSQL/message brokers etc?

Diagram

What about RDBMS/NoSQL/message brokers etc?

  •  — They are too complicated to mock!

  •  — Hurray for Testcontainers!

Mocks vs TestContainers: compatibility

Mocks

vs

Testcontainers

  • thumbs down No guarantee of replicating the behavior of the real system.

  • thumbs down Writing a mock that is bug-to-bug compatible is more challenging than building the actual system, and rarely achieved.

Mocks vs TestContainers: compatibility

Mocks

vs

Testcontainers

  • thumbs down No guarantee of replicating the behavior of the real system.

  • thumbs down Writing a mock that is bug-to-bug compatible is more challenging than building the actual system, and rarely achieved.

thumbs up This is the real system itself!

Mocks vs TestContainers: Ease of use and start-up speed

Mocks

vs

Testcontainers

  • thumbs up Just a regular dependency

  • thumbs up Starts instantly, runs in the same process as the test

Mocks vs TestContainers: Ease of use and start-up speed

bepatient

Mocks vs TestContainers: Ease of use and start-up speed

Mocks

vs

Testcontainers

  • thumbs up Just a regular dependency

  • thumbs up Starts instantly, runs in the same process as the test

  • shrug Requires Docker (OK, nowadays it’s available everywhere)

  • shrug Needs to download the image (depends on image size and internet speed)

  • shrug Needs to start the container (from a few seconds to a couple of minutes)

TC Startup time

startup

Worst situation

  • thumbs down "I have a laptop with an ARM-based CPU and the Docker image is not compatible!"

Testcontainers Cloud

tc cloud
  • shrug Paid service

  • shrug Requires good Internet connection (won’t work on a train)

Convenience

Mocks

vs

Testcontainers

thumbs up Fast and reliable tests make one run them more often and write more of them

thumbs down "Heavy" tests are too time-consuming to run and debug, making one want to skip their execution.

Integration Mocks vs TestContainers

Mocks

vs

Testcontainers

  • thumbs up White-box testing (enables verification of calls from the system under test).

  • thumbs up Simulation of any state, including failures, facilitating testing of corner cases.

  • thumbs up Support synchronous execution (more details to follow).

Integration Mocks vs TestContainers

Mocks

vs

Testcontainers

  • thumbs up White-box testing (enables verification of calls from the system under test).

  • thumbs up Simulation of any state, including failures, facilitating testing of corner cases.

  • thumbs up Support synchronous execution (more details to follow).

  • thumbs down Challenges in setting up the system in the desired state

  • thumbs down Difficulties in verifying which commands were called.

Availability

Mocks

vs

Testcontainers

shrug Sometimes they are available, but most often they are not.

thumbs up Can be used for anything that can be run in containers.

Story № 1. JedisMock and Call Verification

redis

Mocks of Redis in various programming languages

  • Python: 240 stars

  • NodeJS: 210 stars

  • Java: 151 stars

JedisMock

  • Reimplementation of Redis in pure Java (works at the network protocol level)

  • As of April 2024, supports 153 out of 237 commands (64%)

supported redis operations

JedisMock

  • Tested with Comparison tests (running identical scenarios on Jedis-Mock and on containerized Redis)

comparison

JedisMock

  • Also tested with a subset of native Redis tests written in Tcl (the tests used for regression testing of Redis itself)

native redis tests

JedisMock

  • Still, users constantly report behavior that differs from real Redis (and it is quickly fixed)

jedis mock bugs

Why mock if we have TestContainers?

GenericContainer<?> redis = new GenericContainer<>(
        DockerImageName.parse("redis:5.0.3-alpine"))
        .withExposedPorts(6379);
//start explicitly (omitted when the lifecycle is managed by @Container/@Rule)
redis.start();

String address = redis.getHost();
Integer port = redis.getFirstMappedPort();

underTest = new RedisBackedCache(address, port);

JedisMock is a regular Maven dependency

//build.gradle.kts
testImplementation("com.github.fppt:jedis-mock:1.1.1")
//This binds the mock Redis server to a random port
RedisServer server = RedisServer
        .newRedisServer()
        .start();

//Jedis connection:
Jedis jedis = new Jedis(server.getHost(), server.getBindPort());
//Lettuce connection:
RedisClient redisClient = RedisClient
        .create(String.format("redis://%s:%s",
        server.getHost(), server.getBindPort()));

RedisCommandInterceptor: explicitly specified response

RedisServer server = RedisServer.newRedisServer()
  .setOptions(ServiceOptions.withInterceptor((state, cmd, params) -> {
    if ("get".equalsIgnoreCase(cmd)) {
      //explicitly specify the response
      return Response.bulkString(Slice.create("MOCK_VALUE"));
    } else {
      //delegate to the mock
      return MockExecutor.proceed(state, cmd, params);
    }
})).start();

RedisCommandInterceptor: verification

RedisServer server = RedisServer.newRedisServer()
  .setOptions(ServiceOptions.withInterceptor((state, cmd, params) -> {
    if ("echo".equalsIgnoreCase(cmd)) {
      //check the request
      assertEquals("hello", params.get(0).toString());
    }
    //delegate to the mock
    return MockExecutor.proceed(state, cmd, params);
})).start();

RedisCommandInterceptor: failure simulation

RedisServer server = RedisServer.newRedisServer()
  .setOptions(ServiceOptions.withInterceptor((state, cmd, params) -> {
    if ("echo".equalsIgnoreCase(cmd)) {
      //simulate a failure
      return MockExecutor.breakConnection(state);
    } else {
      //delegate to the mock
      return MockExecutor.proceed(state, cmd, params);
    }
})).start();

Working as a Test Proxy

Diagram

Conclusions on Jedis-Mock

  • index up For most Redis testing tasks, TestContainers works.

  • index up But if you want to verify the behavior of your own system or study it in situations when Redis itself fails — JedisMock is helpful.

Story № 2. Kafka Streams TopologyTestDriver
and the Hell of Asynchronous Testing

Kafka Streams testing: possible options

Diagram
vs
testcontainers logo

TopologyTestDriver

  • thumbs up Simple (just a regular Maven dependency)

  • thumbs up Fast

  • thumbs up Convenient (good API for Arrange and Assert)

The Main Difference:

TopologyTestDriver

vs

Real Kafka

Works synchronously (single thread and event loop)

Works asynchronously in multiple threads on multiple containers

Thought Experiment: Limitations of Asynchronous Tests

  • We send "ping" and expect the system to return a single response "pong".

  • yellow circle 2 seconds. No response.

  • yellow circle 3 seconds. No response.

  • checkmark 4 seconds. "pong". Are we done?

  • checkmark 5 seconds. Silence.

  • checkmark 6 seconds. Silence.

  • cross mark 7 seconds. "boom!"

The Problem with Polling

//5 seconds?? maybe 6? maybe 4?
ConsumerRecords<String, String> records;
while (!(records =
           consumer.poll(Duration.ofSeconds(5))).isEmpty()) {
    for (ConsumerRecord<String, String> rec : records) {
        values.add(rec.value());
    }
}

Fundamental problem: is this the final result, or have we not waited long enough?

Awaitility: A partial solution to the problem with asynchronous testing

Awaitility.await().atMost(10, SECONDS).until(() ->
                  { // returns true
                  });

yellow circle

step1

Awaitility: A partial solution to the problem with asynchronous testing

Awaitility.await().atMost(10, SECONDS).until(() ->
                  { // returns true
                  });

yellow circle

step2

Awaitility: A partial solution to the problem with asynchronous testing

Awaitility.await().atMost(10, SECONDS).until(() ->
                  { // returns true
                  });

checkmark

awaitility pass

Awaitility: Test failure

Awaitility.await().atMost(10, SECONDS).until(()->
                  { // returns false for more than 10 seconds
                  });
cross mark
awaitility fail

Awaitility DSL Capabilities

  • atLeast (should not happen earlier)

  • atMost (should happen before expiration)

  • during (should occur throughout the interval)

  • poll interval (see the sketch after this list):

    • fixed (1, 1, 1, 1…​)

    • Fibonacci (1, 2, 3, 5…​)

    • exponential (1, 2, 4, 8…​)

  • Awaitility speeds up asynchronous tests but does not overcome the fundamental problem of asynchronous testing
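
A sketch combining these options (the timings are arbitrary and the checked condition is a placeholder):

//import static java.util.concurrent.TimeUnit.SECONDS;
//import static org.awaitility.pollinterval.FibonacciPollInterval.fibonacci;

Awaitility.await()
        .atLeast(1, SECONDS)               //must not be satisfied earlier than 1 s
        .atMost(10, SECONDS)               //must be satisfied within 10 s
        .pollInterval(fibonacci(SECONDS))  //re-check after 1, 2, 3, 5... seconds
        .until(() -> consumedValues.size() == 2);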

Meanwhile, in browser automation…​

  • Modern frameworks, such as Selenide or Playwright, provide implicit waits for conditions to be met, for example:
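
A Selenide-style check (the selector is hypothetical): the call keeps polling the element until the condition holds or the default timeout (4 seconds) expires, with no explicit waiting code in the test.

//import static com.codeborne.selenide.Selenide.$;
//import static com.codeborne.selenide.Condition.text;

//Implicitly waits until the element contains "pong"
$("#response").shouldHave(text("pong"));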

Problems with Awaitility

  • Arbitrary choice of waiting times leads to flakiness

  • We enter the slippery path of concurrent Java programming

Real Test with Awaitility: Part 1

//This must be a thread-safe data structure!
List<String> actual = new CopyOnWriteArrayList<>();
ExecutorService service = Executors.newSingleThreadExecutor();
Future<?> consumingTask = service.submit(() -> {
    //We must take cooperative termination into account!
    while (!Thread.currentThread().isInterrupted()) {
        ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> rec : records) {
            actual.add(rec.value());
        }
    }
});

Real Test with Awaitility: Part 2

try {
    Awaitility.await().atMost(5, SECONDS)
            .until(() -> List.of("A", "B").equals(actual));
} finally {
    //We should not forget to finalize the execution
    //even in case of errors!
    consumingTask.cancel(true);
    service.shutdown();
    service.awaitTermination(200, MILLISECONDS);
}

Test with TopologyTestDriver

List<String> values = outputTopic.readValuesToList();
Assertions.assertEquals(List.of("A", "B"), values);
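
For context, a sketch of how such a test is typically wired up (topic names, serdes, and the topology itself are assumptions):

//Arrange: the topology is driven synchronously, no broker involved
TopologyTestDriver testDriver = new TopologyTestDriver(topology, props);
TestInputTopic<String, String> inputTopic = testDriver.createInputTopic(
        "input-topic", new StringSerializer(), new StringSerializer());
TestOutputTopic<String, String> outputTopic = testDriver.createOutputTopic(
        "output-topic", new StringDeserializer(), new StringDeserializer());

//Act
inputTopic.pipeInput("key1", "A");
inputTopic.pipeInput("key2", "B");

//Assert: the result is already there, no polling or waiting required
List<String> values = outputTopic.readValuesToList();
Assertions.assertEquals(List.of("A", "B"), values);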

What’s the pitfall?

  • shrug synchronous nature and lack of caching lead to differences in behavior

  • shrug it’s possible to construct simple code examples that pass the "green" test on TTD, but operate completely incorrectly on a real cluster (refer to https://www.confluent.io/blog/testing-kafka-streams/)

Conclusions on KafkaStreams:

  • index up TopologyTestDriver (TTD) remains essential, despite its limitations.

  • index up Understand that TTD may not fully mimic the behavior of a real Kafka cluster.

  • shrug A failure in TTD indicates problems in the code; however, passing tests in TTD do not guarantee code reliability.

  • index up Conduct a limited number of tests on a containerized Kafka cluster when necessary.

Story № 3. Mock as One of the Supported Backends

Diagram

Apache Beam

beam logo
  • Apache Beam is a unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing.

  • SDKs: Java, Python, Go

Apache Beam Runners

  • Runners:

    • Apache Flink,

    • Apache Nemo,

    • Apache Samza,

    • Apache Spark,

    • Google Cloud Dataflow,

    • Hazelcast Jet,

    • Direct Runner

Diagram

--runner=DirectRunner
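
For context, a minimal sketch of how a Beam pipeline picks its runner from the command line (the pipeline body itself is omitted):

public static void main(String[] args) {
    //DirectRunner is the default; the same code can be pointed at Flink,
    //Spark, Dataflow, etc. just by changing the --runner option
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);
    //...define PCollections and transforms here...
    pipeline.run().waitUntilFinish();
}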

Sets of Functional Capabilities

beam backends

Matrix of Supported Features (Fragment)

beam backends
beam capability matrix

Direct Runner

beam direct

"Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model…​ Using the Direct Runner helps ensure that pipelines are robust across different Beam runners."

Apache Beam’s Direct Runner

  • index up Failure on Direct Runner means the code is bad.

  • shrug Successful execution on Direct Runner doesn’t mean the code is good.

  • shrug How to test Google Cloud Dataflow without Google Cloud is beyond me.

Celesta

celesta duke

Celesta

celesta duke
  • Database-first: the user defines the desired database structure, Celesta generates the data access API and handles automatic migration.

  • Database-agnostic: table and view structures are described in CelestaSQL, which is then transpiled into one of the supported dialects.

Celesta

Diagram

Celesta

celesta backends
  • 5 databases, incompatible in the details

Celesta

celesta sql
  • CelestaSQL is transpiled into specific dialects.

  • A subset of features is supported.

  • Celesta itself is tested with Comparison tests (running identical scenarios on all types of databases).

Celesta Comparison Tests

comparison celesta

Celesta

  • Works on H2 ⇒ will work on PostgreSQL, MS SQL, etc.

Celesta

  • Works on H2 ⇒ will work on PostgreSQL, MS SQL, etc.

Diagram

In-Memory H2 Capabilities

  • thumbs up Starts with an empty database instantly

  • thumbs up Migrates instantly

  • thumbs up Queries are easily traced
    (SET TRACE_LEVEL_SYSTEM_OUT 2, see the sketch below)

  • thumbs up After the test, the state is "forgotten"
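
A minimal illustration of the points above (the database name is arbitrary; the contents vanish when the last connection is closed):

//In-memory H2 database, created instantly on first connection
Connection conn = DriverManager.getConnection("jdbc:h2:mem:test");
try (Statement st = conn.createStatement()) {
    st.execute("SET TRACE_LEVEL_SYSTEM_OUT 2"); //echo every statement to stdout
}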

Using the spy JDBC Driver to trace SQL queries

Instead of

jdbc:postgresql://host/database

use

jdbc:log4jdbc:postgresql://host/database
Diagram
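
A sketch of how the spy URL might be used (assuming a log4jdbc artifact is on the classpath; the exact driver class name depends on the fork you use):

//Explicitly register the spy driver (newer forks also register it via SPI)
Class.forName("net.sf.log4jdbc.DriverSpy");

//Every statement sent through this connection is logged before being
//delegated to the real PostgreSQL driver
Connection conn = DriverManager.getConnection(
        "jdbc:log4jdbc:postgresql://host/database", "user", "password");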

CelestaTest: Arrange

@CelestaTest
class OrderDaoTest {
  OrderDao orderDao = new OrderDao();
  CustomerCursor customer;
  ItemCursor item;

CelestaTest: Arrange

@CelestaTest
class OrderDaoTest {
  OrderDao orderDao = new OrderDao();
  CustomerCursor customer;
  ItemCursor item;

  @BeforeEach
  void setUp(CallContext ctx) {
    customer = new CustomerCursor(ctx);
    customer.setName("John Doe")
                 .setEmail("john@example.com").insert();

    item = new ItemCursor(ctx);
    item.setId("12345")
           .setName("cheese").setDefaultPrice(42).insert();
  }

CelestaTest: Local arrange

@Test
void orderedItemsMethodReturnsAggregatedValues(CallContext ctx)
    throws Exception {
  //ARRANGE
  ItemCursor item2 = new ItemCursor(ctx);
  item2.setId("2")
    .setName("item 2").insert();

  OrderCursor orderCursor = new OrderCursor(ctx);
  orderCursor.setId(null)
    .setItemId(item.getId())
    .setCustomerId(customer.getId())
    .setQuantity(1).insert();

  //and so on

CelestaTest: Act & Assert

//ACT
List<ItemDto> result = orderDao.getItems(ctx);

//ASSERT
Approvals.verifyJson(new ObjectMapper()
  .writer().writeValueAsString(result));

CelestaTest

  • thumbs up Works instantly

  • thumbs up Creates an empty database with the required structure for each test

  • thumbs up Encourages writing a large number of tests for all database-related logic

  • shrug The price we pay is being limited to the functionality that Celesta supports.

Conclusions on TestContainers

  • index up Can pose problems with startup speed and developer machine configuration.

  • index up Real services are "black boxes," and it’s difficult to force them into the desired state.

  • index up Integration tests with "real" services are asynchronous, which brings difficulties that cannot be fully overcome and must be understood.

Conclusions on Mocks

  • index up Specialized mocks are easier to wire in, start faster, and run faster.

  • index up Mocks have special functionality that facilitates testing.

  • index up Mocks do not behave the same way as the real system. This fact needs to be understood and accepted.

  • index up For your system, mocks may simply not exist.

General Conclusions

  • index up When forming a testing strategy, one should rely not on stereotypes but on a deep understanding of the system’s characteristics and available tools. The strategy will be different every time!

  • index up The testability of the system as a whole should be one of the criteria when choosing technologies.

The Most Important Conclusion

index up You should use both mocks and containers,
but above all, use your own head.


@inponomarev