Tests That Survive Change

David Morales
[Image: A red safety net catches falling code pieces, with a ruby at the center.]

Writing changeable code rests on three skills. The first is object-oriented design: poorly designed code is hard to change by nature. The second is knowing how to refactor: transforming the structure of code without altering its external behavior. The third is tests, which give you the confidence to refactor without fear.

All three support each other. Well-designed code is easy to change, refactoring is how you move from one design to the next, and tests are what let you refactor with impunity. Without tests, you’re afraid to touch anything; without design, tests cost more than they’re worth.

The real purpose of a test, just like the real purpose of design, is to reduce costs. If writing, maintaining, and running tests takes more time than it saves, tests aren’t worth having. Many people who’ve had a bad experience with testing don’t lack tests. They have a massive, out-of-date suite that nobody runs anymore. The problem isn’t testing; it’s testing badly.

This article closes the loop of the series: design → implement → verify. We’ll revisit examples from previous articles (the payment processors and the report generator) to focus on writing tests that survive refactoring.

The Cost of a Poorly Designed Test

A test is just another object in your application that uses an existing class. And like any object, the more it couples to that class, the more fragile it becomes. If a test knows the internal details of what it’s testing, any change to those details breaks it, even if the behavior is still correct.

Take this test for CreditCardPayment. It looks reasonable, but hides a problem:

RSpec.describe CreditCardPayment do
  it "charges the card" do
    payment = CreditCardPayment.new(card_number: "...")
    order = double("Order", total: 100, currency: "EUR")
    expect(payment).to receive(:charge_card).with(100, "EUR")
    payment.process_payment(order)
  end
end

The test doesn’t check what process_payment does; it checks how it does it. It knows that internally a private method charge_card is called with those arguments. But charge_card is an implementation detail: if it gets renamed to charge, or the logic gets split into two methods, the test breaks even though the behavior hasn’t changed.

That’s a test coupled to the implementation. It punches holes in the object’s walls to peek inside, and in return forces you to rewrite it with every refactoring. It proves nothing about the application’s correctness; it just raises the cost of changing it.

The underlying rule is the same one you apply when designing: limit coupling, and the few couplings you allow should be to stable things. The most stable thing about any object is its public interface. The most expensive and least useful tests are those coupled to unstable internal details, because they break with every refactoring of the underlying code.

The practical takeaway is straightforward: test along the edges of the object, not from the inside. A test should only know about the messages that come in and go out, just like any other collaborator.
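As a rough sketch of what testing along the edges buys you, assume CreditCardPayment receives its gateway through the constructor. The `gateway:` argument, the `charge` message, `RecordingGateway`, and `RefactoredCreditCardPayment` are all hypothetical names introduced for this illustration, and the check is written in plain Ruby rather than RSpec so it runs on its own. The same check passes before and after the private method is renamed:

```ruby
# Stands in for the article's order double.
Order = Struct.new(:total, :currency)

# Hypothetical collaborator that records the commands it receives.
class RecordingGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(amount, currency)
    @charges << [amount, currency]
    :ok
  end
end

class CreditCardPayment
  # gateway: is an assumption of this sketch, not part of the original class.
  def initialize(card_number:, gateway:)
    @card_number = card_number
    @gateway = gateway
  end

  def process_payment(order)
    charge_card(order.total, order.currency)
  end

  private

  # Implementation detail: free to be renamed or split.
  def charge_card(amount, currency)
    @gateway.charge(amount, currency)
  end
end

# Same behavior after a refactoring: the private method was renamed.
class RefactoredCreditCardPayment
  def initialize(card_number:, gateway:)
    @card_number = card_number
    @gateway = gateway
  end

  def process_payment(order)
    charge(order.total, order.currency)
  end

  private

  def charge(amount, currency)
    @gateway.charge(amount, currency)
  end
end

# The check knows only the incoming message and the outgoing command.
def charges_the_card?(payment_class)
  gateway = RecordingGateway.new
  payment = payment_class.new(card_number: "...", gateway: gateway)
  payment.process_payment(Order.new(100, "EUR"))
  gateway.charges == [[100, "EUR"]]
end

puts charges_the_card?(CreditCardPayment)
puts charges_the_card?(RefactoredCreditCardPayment)
```

Both calls print true. A check that expected charge_card to be received would have failed on the second class, even though its behavior is identical.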

What to Test and What to Ignore

If you think of an application as a series of messages traveling between objects (each one a black box that only exposes the few messages that cross its boundaries), the question of what to test has a clear answer. It depends on the type of message and whose interface it belongs to.

There are three possible origins for a message, from the perspective of the object under test:

      received            sent            sent
     from others        to self        to others
          |                |               |
          v                v               v
┌───────────────── Object under test ──────────────────┐
│  public interface     private        outgoing        │
└──────────────────────────────────────────────────────┘

Incoming messages make up the object’s public interface. Messages sent to self invoke private methods. Outgoing messages are, by definition, incoming to another object. Each type is handled differently:

Message                | Whose interface?      | Test it? | Type of test
Incoming               | The object under test | Yes      | State (return value)
Outgoing command       | Another object        | Yes      | Behavior (that it gets sent)
Outgoing query         | Another object        | No       | None
Private (sent to self) | The object under test | No       | None

That’s the whole strategy. Let’s see how to apply each one.

Incoming Messages: Test the State

Incoming messages are the object’s public interface. Other objects depend on their signature and the results they return, so they’re tested by making assertions about the value (the state) they return.

Take the composite Report from the composition article. Its main incoming message is render:

class TextSection
  def initialize(data:)
    @data = data
  end

  def render = @data.to_s
end

class Report
  attr_reader :title, :sections

  def initialize(title:, sections:)
    @title = title
    @sections = sections
  end

  def render
    ["# #{title}", *sections.map(&:render)].join("\n\n")
  end
end

RSpec.describe Report do
  it "renders the title and its sections" do
    report = Report.new(
      title: "Q2",
      sections: [TextSection.new(data: "Strong growth.")]
    )
    expect(report.render).to eq("# Q2\n\nStrong growth.")
  end
end

The test creates the object, sends a message to its public interface, and checks the result. It knows nothing about how render builds the string internally. If you change the implementation of render tomorrow, the test keeps passing as long as the result is the same. It’s coupled to the interface, not the implementation.

The rule is: a spec makes state assertions only about messages in the public interface of the object it describes. TextSection#render is tested in TextSection’s spec, not in Report’s. Keeping return value assertions in a single place eliminates duplication and lowers maintenance costs.

Outgoing Command Messages: Test That They Get Sent

Sometimes it does matter that a message gets sent, because other parts of the application depend on what happens as a result: a file gets written, a record gets saved, money moves. These messages are commands, and it’s the responsibility of the object that sends them to prove it does so.

In the Transaction hierarchy, Purchase#perform tells its payment method to execute the charge:

class Transaction
  attr_reader :amount, :payment_method, :order_id

  def initialize(amount:, payment_method:, order_id:, **opts)
    @amount = amount
    @payment_method = payment_method
    @order_id = order_id
    post_initialize(opts)
  end

  def post_initialize(opts)
  end

  def execute
    validate
    perform
    record
  end

  def validate
    # ...
    extra_validations
  end

  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end

  def record
    # ...
  end

  def log_type
    raise NotImplementedError, "#{self.class} must implement log_type"
  end

  # optional hook; subclasses may override
  def extra_validations
  end
end

class Purchase < Transaction
  def perform
    @payment_method.process_payment(self)
  end

  def log_type
    "purchase"
  end
end

That message getting sent isn’t an internal detail; it’s what justifies the class’s existence. We prove it with a mock, which is a test of behavior, not state:

RSpec.describe Purchase do
  it "tells the payment method to process the payment" do
    payment_method = double("PaymentMethod")
    purchase = Purchase.new(amount: 100, payment_method: payment_method, order_id: 1)
    expect(payment_method).to receive(:process_payment).with(purchase)
    purchase.perform
  end
end

Instead of asserting what the message returns, the mock defines an expectation: that process_payment will be received, with those arguments. Notice what we’re not doing: we don’t check what process_payment returns. That’s the payment method’s responsibility, tested in its own spec. Purchase’s only job is to send the message; the test’s only job is to prove it does.

If you’ve injected your dependencies properly, swapping the real collaborator for a mock is trivial. Testing outgoing messages in a well-designed application is that straightforward.

Outgoing Query Messages: Don’t Test

The other kind of outgoing message has no side effects; it matters only to the object that sends it. These are queries.

When Report#render iterates over its sections and calls section.render on each one, those render calls are outgoing messages from Report’s point of view. But they’re queries: they leave no trace, and no other object in the application cares whether Report sends them. Report only uses the values they return.

That’s why we don’t mock the sections to verify that render gets sent to them:

# Unnecessary and counterproductive:
expect(section).to receive(:render)

Doing so would couple Report’s test to an internal detail of how Report produces its output. The right test is the one we already wrote: we check the result of Report#render (state). That implicitly proves the sections were used, without locking us into how. And the correctness of each individual render lives in each section’s spec, where that message is incoming.

The rule ties both sections together: an outgoing query is incoming to another object, and that object is the one that tests its return value. Duplicating that assertion in the sender only adds cost.
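To make the distinction concrete, here is a minimal plain-Ruby sketch in which the sections are replaced by stubs that answer the render query with a canned value. `StubSection` is a hypothetical name used only for this illustration. The stubs carry no expectations; the only assertion is on the value returned by the incoming message:

```ruby
class Report
  attr_reader :title, :sections

  def initialize(title:, sections:)
    @title = title
    @sections = sections
  end

  def render
    ["# #{title}", *sections.map(&:render)].join("\n\n")
  end
end

# Stub: answers the render query with a canned value and records nothing.
StubSection = Struct.new(:render)

report = Report.new(
  title: "Q2",
  sections: [StubSection.new("Strong growth."), StubSection.new("Costs fell.")]
)

# State assertion on the incoming message only; nothing is expected of the stubs.
raise "unexpected output" unless report.render == "# Q2\n\nStrong growth.\n\nCosts fell."
```

If Report later builds its output with a different iteration strategy, this check keeps passing as long as the rendered string is the same.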

Private Methods: Don’t Test

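Messages an object sends to itself invoke private methods. As far as the rest of the application is concerned, they don’t exist. There are good reasons not to test them: they are already exercised through the public methods that call them, and they are the least stable part of the object, so tests coupled to them break first.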

That charge_card test from the beginning was a test of a private method in disguise. The charge is already covered through process_payment. Testing charge_card separately only adds fragility.

The warning sign isn’t having private methods; it’s feeling like you need to test them directly. If you get there, the private method is probably carrying too much responsibility and asking you to extract it into its own object. Sometimes, to defer a design decision, you deliberately write an ugly, unstable private method, and testing it saves you pain during refactoring by pointing precisely to what broke. That’s the exception, not the rule: be biased against these tests, but don’t be afraid of them if they genuinely improve your situation.

Testing Duck Types

Now we get to the first genuinely interesting case. A duck type is a virtual agreement: it has no representation in the code, so it can erode easily. Someone adds a new class, forgets to implement the role’s message, and the system blows up in production. The way to protect that contract is to test it.

Shared Examples to Verify the Contract

In the duck typing article we saw the Payment role: any object that responds to process_payment is a Payment, even if no class carries that name. And we saw how to document that contract with shared_examples:

RSpec.shared_examples "a payment" do
  it { is_expected.to respond_to(:process_payment) }

  it "accepts an order and processes it" do
    order = double("Order", total: 100, currency: "EUR")
    expect { subject.process_payment(order) }.not_to raise_error
  end
end

RSpec.describe CreditCardPayment do
  subject { CreditCardPayment.new(card_number: "...") }
  it_behaves_like "a payment"
end

RSpec.describe PayPalPayment do
  subject { PayPalPayment.new(email: "...") }
  it_behaves_like "a payment"
end

Every implementation of the role runs through the same set of examples. The test is written once and reused across every player. It serves as both verification and documentation: it raises the visibility of a role that would otherwise be invisible. If someone creates ApplePayPayment and forgets process_payment, the shared example fails and the contract is protected.
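The core of that contract check can be sketched in a few lines of plain Ruby. This is only an illustration of what the respond_to example catches, not a replacement for the shared examples; `PAYMENT_ROLE`, `missing_role_messages`, and the `pay` method on ApplePayPayment are hypothetical names:

```ruby
# The messages that make up the Payment role.
PAYMENT_ROLE = [:process_payment].freeze

class CreditCardPayment
  def process_payment(order)
    :ok
  end
end

# Hypothetical new player whose author forgot the role's message.
class ApplePayPayment
  def pay(order)
    :ok
  end
end

# Returns the role messages the player does not respond to.
def missing_role_messages(player, role = PAYMENT_ROLE)
  role.reject { |message| player.respond_to?(message) }
end

p missing_role_messages(CreditCardPayment.new) # prints []
p missing_role_messages(ApplePayPayment.new)   # prints [:process_payment]
```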

The Problem with Doubles

This case is more subtle. Imagine we’re testing PaymentProcessor (the object that iterates over payments and tells each one to process) without real payments, using instead a double that stubs process_payment:

class PaymentProcessor
  def process(payments, order)
    payments.each { |payment| payment.process_payment(order) }
  end
end

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order")
    payment = double("Payment", process_payment: :ok)
    processor = PaymentProcessor.new
    expect { processor.process([payment], order) }.not_to raise_error
  end
end

The test is fast and isolated. What could go wrong?

Suppose the role’s contract changes: process_payment gains an argument:

class PaymentProcessor
  def process(payments, order)
    payments.each { |payment| payment.process_payment(order, order.currency) }
  end
end

All the real players of the role get updated, and PaymentProcessor updates its call too (the "a payment" shared example would be updated to reflect the new contract as well). The payment double, though, stays anchored to the old contract:

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order", currency: "EUR")
    payment = double("Payment", process_payment: :ok)
    processor = PaymentProcessor.new
    expect { processor.process([payment], order) }.not_to raise_error
  end
end

Because double("Payment", process_payment: :ok) responds to process_payment with any number of arguments, the test keeps passing. And it will keep passing no matter what: if someone adds a CreditCardPayment with the wrong arity tomorrow, this test won’t catch it.

This is the trap that leads people to say mocks and stubs produce brittle tests. But the blame isn’t on the tool; it’s on having a role player (the double) that is never verified against the contract.

Validating Doubles Against the Role

The problem wasn’t the double itself; it was that nothing tied it to the contract. The fix is to stop using a naive double and switch to a verifying double, RSpec’s native answer to this trap.

instance_double creates a double anchored to a real class and verifies, against it, that the method you’re stubbing actually exists and that you’re calling it with a compatible signature:

class CreditCardPayment
  def process_payment(order, currency)
    # charges the card via the gateway
  end
end

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order", currency: "EUR")
    payment = instance_double(CreditCardPayment, process_payment: :ok)
    processor = PaymentProcessor.new
    expect { processor.process([payment], order) }.not_to raise_error
  end
end

Unlike the naive double, this one doesn’t accept just anything. The naive double responded to process_payment with any number of arguments. instance_double, by contrast, validates the signature against the real CreditCardPayment class. The blind spot disappears: the double can no longer get stuck on an old contract, because its contract is the real class’s contract, verified on every run.
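To see roughly what that verification amounts to, here is a hand-rolled sketch in plain Ruby. It is not how rspec-mocks is implemented, and it only considers positional arguments; `NaiveDouble` and `VerifyingDouble` are hypothetical classes written for this illustration. The idea is simply to compare each stubbed call against the real class's method signature:

```ruby
class CreditCardPayment
  def process_payment(order, currency)
    # charges the card via the gateway
  end
end

# Naive double: answers process_payment with any number of arguments.
class NaiveDouble
  def process_payment(*)
    :ok
  end
end

# Hand-rolled verifying double (positional arguments only, for brevity).
class VerifyingDouble
  def initialize(klass, stubs)
    @klass = klass
    @stubs = stubs
    # Refuse to stub a message the real class does not implement.
    stubs.each_key do |name|
      next if klass.method_defined?(name)
      raise ArgumentError, "#{klass} does not implement ##{name}"
    end
  end

  def method_missing(name, *args)
    return super unless @stubs.key?(name)

    # Compare the call against the real method's parameter list.
    params = @klass.instance_method(name).parameters
    required = params.count { |type, _| type == :req }
    optional = params.count { |type, _| type == :opt }
    max = params.any? { |type, _| type == :rest } ? Float::INFINITY : required + optional
    unless args.size.between?(required, max)
      raise ArgumentError, "#{name} called with #{args.size} arguments"
    end
    @stubs[name]
  end

  def respond_to_missing?(name, include_private = false)
    @stubs.key?(name) || super
  end
end

naive = NaiveDouble.new
naive.process_payment(:order)                # accepted, although the contract changed

verified = VerifyingDouble.new(CreditCardPayment, process_payment: :ok)
verified.process_payment(:order, "EUR")      # accepted: matches the real signature
begin
  verified.process_payment(:order)           # old contract
rescue ArgumentError => e
  puts e.message
end
```

The last call raises because the real method requires two arguments, which is the failure the naive double silently swallowed.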

Testing Inheritance

Inheritance adds its own challenge. The Transaction → Purchase, Refund hierarchy we built with the Template Method pattern has two things to prove: that every subclass honors a common contract, and that each one does its own specialization correctly.

Specifying the Inherited Interface

The first goal is to prove that every object in the hierarchy respects its contract. The Liskov Substitution Principle requires that a subtype be substitutable for its supertype; the simplest way to verify this is to write a shared test for the common contract and include it in the spec of every class in the hierarchy.

RSpec.shared_examples "a transaction" do
  it { is_expected.to respond_to(:execute) }
  it { is_expected.to respond_to(:validate) }
  it { is_expected.to respond_to(:perform) }
  it { is_expected.to respond_to(:record) }
  it { is_expected.to respond_to(:log_type) }
end

Any object that passes these examples can be treated as a Transaction. The shared example documents the interface, prevents accidental regressions, and lets new developers write new subclasses safely:

RSpec.describe Purchase do
  let(:payment_method) { double("PaymentMethod") }
  subject(:purchase) { Purchase.new(amount: 100, payment_method: payment_method, order_id: 1) }

  it_behaves_like "a transaction"

  it "tells the payment method to process the payment" do
    expect(payment_method).to receive(:process_payment).with(purchase)
    purchase.perform
  end
end

Testing Subclass Responsibilities

The abstract superclass also imposes requirements on its subclasses. In our Template Method, every subclass must fill in the perform and log_type hooks. That’s also worth documenting in a shared test:

RSpec.shared_examples "a transaction subclass" do
  it "provides its own log_type" do
    expect(subject.log_type).to be_a(String)
  end

  it "overrides the abstract perform hook" do
    expect(subject.method(:perform).owner).not_to eq(Transaction)
  end
end

A subclass must behave both as a Transaction (the common interface) and as a subclass of Transaction (filling in the required hooks). Purchase includes both:

RSpec.describe Purchase do
  let(:payment_method) do
    # ...
  end

  subject(:purchase) do
    # ...
  end

  it_behaves_like "a transaction"
  it_behaves_like "a transaction subclass"

  it "tells the payment method to process the payment" do
    # ...
  end
end

Together, these two shared example groups make testing the common behavior of subclasses painless.
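The owner check in "a transaction subclass" relies on Ruby's Method#owner, which reports the class or module where a method is actually defined. A short plain-Ruby sketch shows the difference between a subclass that fills in the hook and one that does not; `ForgetfulTransaction` is a hypothetical class added for the illustration, and Transaction is cut down to the one hook:

```ruby
class Transaction
  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end
end

class Purchase < Transaction
  def perform
    :charged
  end
end

# Hypothetical subclass that forgot to fill in the hook.
class ForgetfulTransaction < Transaction
end

p Purchase.new.method(:perform).owner             # prints Purchase
p ForgetfulTransaction.new.method(:perform).owner # prints Transaction
```

Because the inherited method still belongs to Transaction, the shared example fails for the forgetful subclass without ever calling perform.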

Testing Unique Behavior

What remains is each subclass’s own specializations. There’s one rule to follow: test the specialization without leaking knowledge of the superclass into the test.

Refund specializes log_type and adds its own validations:

class Refund < Transaction
  attr_reader :original_transaction

  def post_initialize(opts)
    @original_transaction = opts[:original_transaction]
  end

  def perform
    # ...
  end

  def log_type
    "refund"
  end

  def extra_validations
    raise "Missing original" unless @original_transaction
    raise "Refund exceeds original" if @amount > @original_transaction.amount
  end
end

That’s what we test directly:

RSpec.describe Refund do
  subject(:refund) do
    Refund.new(
      amount: 50,
      payment_method: double("PaymentMethod"),
      order_id: 1,
      original_transaction: double("Transaction", amount: 100)
    )
  end

  it_behaves_like "a transaction"
  it_behaves_like "a transaction subclass"

  it "logs as a refund" do
    expect(subject.log_type).to eq("refund")
  end

  it "rejects a refund larger than the original" do
    refund = Refund.new(
      amount: 50,
      payment_method: double("PaymentMethod"),
      order_id: 1,
      original_transaction: double("Transaction", amount: 30)
    )
    expect { refund.validate }.to raise_error(/exceeds original/)
  end
  # ...
end

Notice what’s missing: we don’t test execute here. execute is the superclass’s algorithm; the "a transaction" shared example already proves that Refund responds to it. Referencing it directly in the subclass spec would be redundant and would tie Refund’s test to a detail that belongs to Transaction.

Testing the Abstract Superclass

Shifting focus to Transaction reintroduces a familiar problem: it’s an abstract class. Creating an instance of it is tricky, and even if you can, it may not have all the behavior the test needs. Its hooks are unfilled.

We can prove that the superclass forces subclasses to implement the hooks. That logic lives in Transaction:

RSpec.describe Transaction do
  it "forces subclasses to implement perform" do
    transaction = Transaction.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1)
    expect { transaction.perform }.to raise_error(NotImplementedError)
  end
end

And to test the algorithm that execute defines in the superclass (the template every subtype inherits), we use Liskov to our advantage: we create a subclass that exists only for the test, fills in the hooks with trivial implementations, and gives us a concrete object to verify that execute behaves as it should.

class TransactionDouble < Transaction
  def perform
    true
  end

  def log_type
    "double"
  end
end

RSpec.describe Transaction do
  subject { TransactionDouble.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1) }

  it_behaves_like "a transaction"

  it "runs validation as part of execute" do
    invalid = TransactionDouble.new(amount: 0, payment_method: double("PaymentMethod"), order_id: 1)
    expect { invalid.execute }.to raise_error(/Invalid amount/)
  end

  it "forces subclasses to implement perform" do
    # ...
  end
end

As long as your test subclass doesn’t violate Liskov, you can use this technique anywhere. And if you’re worried that TransactionDouble will grow stale and let failing tests slip through, you can require it to pass "a transaction subclass" as well:

RSpec.describe TransactionDouble do
  subject { TransactionDouble.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1) }
  it_behaves_like "a transaction subclass"
end

Carefully written inheritance hierarchies are easy to test: one shared test for the interface, another for subclass responsibilities, and clean isolation of what’s unique to each.
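A test-only subclass can also show that the template runs its steps in the intended order. The sketch below is plain Ruby with a cut-down Transaction; the validation rule (a positive amount) and the `performed` flag on TransactionDouble are assumptions made for the illustration, since the original validation is elided:

```ruby
class Transaction
  attr_reader :amount

  def initialize(amount:)
    @amount = amount
  end

  # The template: validate, then perform, then record.
  def execute
    validate
    perform
    record
  end

  def validate
    # Assumed rule; the article elides the real validation.
    raise ArgumentError, "Invalid amount" unless amount.positive?
  end

  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end

  def record
    # elided in the article
  end
end

# Test-only subclass: fills in the hooks and remembers whether perform ran.
class TransactionDouble < Transaction
  attr_reader :performed

  def perform
    @performed = true
  end

  def log_type
    "double"
  end
end

valid = TransactionDouble.new(amount: 100)
valid.execute
p valid.performed   # prints true

invalid = TransactionDouble.new(amount: 0)
begin
  invalid.execute
rescue ArgumentError => e
  puts e.message    # prints Invalid amount
end
p invalid.performed # prints nil
```

The invalid transaction never reaches perform, which is evidence that execute validates before it acts, observed entirely through the public interface of the test subclass.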

The Pain Signal

There’s one final benefit of tests: they expose design flaws in the underlying code. When the design isn’t good, testing hurts, and that pain has a grammar worth learning to read:

This is where all the design work pays off. Imagine a tightly coupled object: one that knows its collaborators by their concrete class, or worse, creates them itself. A test of that object runs all those collaborators in every example, whether it cares about them or not: the suite becomes slow, brittle, and prone to breaking in distant places whenever any of those pieces change.

As soon as you invert that relationship (inject the dependencies and make the object depend on a role or abstraction instead of the concrete classes behind it), the test becomes trivial: a double, a message, an expectation. You inject fake collaborators and check the result. That’s the Dependency Inversion Principle at work. Depending on abstractions is what lets you isolate an object, and an object that can be isolated is cheap to test.
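The contrast can be sketched in plain Ruby. `CoupledCheckout`, `Checkout`, and `FakePayment` are hypothetical classes invented for this illustration, and the exception raised by CreditCardPayment merely stands in for a slow call to a real external gateway:

```ruby
class CreditCardPayment
  def process_payment(order)
    # Stands in for a slow call to a real external gateway.
    raise "real gateway reached"
  end
end

# Tightly coupled: creates its collaborator, so every test runs the real thing.
class CoupledCheckout
  def complete(order)
    CreditCardPayment.new.process_payment(order)
  end
end

# Inverted: depends on anything that plays the Payment role.
class Checkout
  def initialize(payment:)
    @payment = payment
  end

  def complete(order)
    @payment.process_payment(order)
  end
end

# Fake role player that records the orders it was asked to process.
class FakePayment
  attr_reader :orders

  def initialize
    @orders = []
  end

  def process_payment(order)
    @orders << order
    :ok
  end
end

fake = FakePayment.new
result = Checkout.new(payment: fake).complete(:order)
p result      # prints :ok
p fake.orders # prints [:order]
```

CoupledCheckout cannot be exercised without reaching the gateway, while Checkout is tested with one fake and one assertion.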

But watch the direction of the arrow. Testing, by itself, doesn’t force good design. Nothing about writing tests compels you to break coupling and inject the dependency. You can absolutely write expensive, duplicated tests around tightly coupled code. Good design makes tests cheaper, not the other way around. A costly test doesn’t necessarily mean the application is poorly designed (you can write bad tests for good code), but a test that hurts in all three ways above is almost always pointing at the code, not the test.

Conclusion

Well-written tests aren’t a separate layer bolted on after the code is done; they’re the same design discipline applied one more time. The same principles that guide how you write the code guide how you test it.

The whole strategy can be summed up like this:

The best test is coupled only to the interface, tests each thing once and in its proper place, and survives refactoring. As long as the public interfaces stay stable, you write the test once and it protects you forever.

Test your knowledge

  1. Why is an incoming message tested by its state (return value), but an outgoing query message isn’t tested from the sender’s side?

  2. You have a method that sends a message to another object. When do you use a mock to verify it gets sent?

  3. Why is a test double that stubs an obsolete method of the role it claims to play dangerous?

  4. When testing a concrete subclass like Refund, what should you avoid?

  5. A test needs a painful setup and drags half the application along to run. What is it telling you?