Tests That Survive Change

Jun 11, 2026

Writing changeable code rests on three skills. The first is object-oriented design: poorly designed code is hard to change by nature. The second is refactoring: transforming the structure of code without altering its external behavior. The third is testing, which gives you the confidence to refactor without fear.

All three support each other. Well-designed code is easy to change, refactoring is how you move from one design to the next, and tests are what let you refactor with impunity. Without tests, you’re afraid to touch anything; without design, tests cost more than they’re worth.

The real purpose of a test, just like the real purpose of design, is to reduce costs. If writing, maintaining, and running tests takes more time than it saves, tests aren’t worth having. Many people who’ve had a bad experience with testing don’t lack tests. They have a massive, out-of-date suite that nobody runs anymore. The problem isn’t testing; it’s testing badly.

This article closes the loop of the series: design → implement → verify. We’ll revisit examples from previous articles (the payment processors and the report generator) to focus on writing tests that survive refactoring.

The Cost of a Poorly Designed Test

A test is just another object in your application that uses an existing class. And like any object, the more it couples to that class, the more fragile it becomes. If a test knows the internal details of what it’s testing, any change to those details breaks it, even if the behavior is still correct.

Take this test for CreditCardPayment. It looks reasonable, but hides a problem:

RSpec.describe CreditCardPayment do
  it "charges the card" do
    payment = CreditCardPayment.new(card_number: "...")
    order = double("Order", total: 100, currency: "EUR")

    expect(payment).to receive(:charge_card).with(100, "EUR")
    payment.process_payment(order)
  end
end

The test doesn’t check what process_payment does; it checks how it does it. It knows that internally a private method charge_card is called with those arguments. But charge_card is an implementation detail: if it gets renamed to charge, or the logic gets split into two methods, the test breaks even though the behavior hasn’t changed.

That’s a test coupled to the implementation. It punches holes in the object’s walls to peek inside, and in return forces you to rewrite it with every refactoring. It proves nothing about the application’s correctness; it just raises the cost of changing it.

The underlying rule is the same one you apply when designing: limit coupling, and the few couplings you allow should be to stable things. The most stable thing about any object is its public interface. The most expensive and least useful tests are those coupled to unstable internal details, because they break with every refactoring of the underlying code.

The practical takeaway is straightforward: test along the edges of the object, not from the inside. A test should only know about the messages that come in and go out, just like any other collaborator.
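As a rough illustration of testing at the edges, the sketch below performs the same check in plain Ruby, without RSpec. It assumes CreditCardPayment receives its gateway by injection and that the gateway responds to a charge message; the gateway: argument, the charge method, FakeGateway, the Order struct, and the "4111" card number are all hypothetical and do not come from the earlier articles.

```ruby
# Hypothetical gateway stand-in: records every charge it is asked to make.
class FakeGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(card_number, amount, currency)
    @charges << [card_number, amount, currency]
    :ok
  end
end

# Assumed shape of the class: the gateway is injected, not created inside.
class CreditCardPayment
  def initialize(card_number:, gateway:)
    @card_number = card_number
    @gateway = gateway
  end

  def process_payment(order)
    charge_card(order.total, order.currency)
  end

  private

  # Implementation detail: free to be renamed or split.
  def charge_card(amount, currency)
    @gateway.charge(@card_number, amount, currency)
  end
end

# Hypothetical minimal order with the two readers the payment uses.
Order = Struct.new(:total, :currency)

gateway = FakeGateway.new
payment = CreditCardPayment.new(card_number: "4111", gateway: gateway)
payment.process_payment(Order.new(100, "EUR"))

# Only the outgoing command is observed; charge_card is never mentioned.
puts gateway.charges.inspect
```

Renaming charge_card or splitting it in two leaves this check untouched, because the check only looks at what crosses the object's boundary.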

What to Test and What to Ignore

If you think of an application as a series of messages traveling between objects (each one a black box that only exposes the few messages that cross its boundaries), the question of what to test has a clear answer. It depends on the type of message and whose interface it belongs to.

From the perspective of the object under test, every message falls into one of three categories:

     received         sent             sent
     from others      to self          to others
         |                |                |
         v                v                v
┌───────────────── Object under test ──────────────────┐
│  public interface    private        outgoing         │
└──────────────────────────────────────────────────────┘

Incoming messages make up the object’s public interface. Messages sent to self invoke private methods. Outgoing messages are, by definition, incoming to another object. Each type is handled differently:

Message                  Whose interface?        Test it?   Type of test
Incoming                 The object under test   Yes        State (return value)
Outgoing command         Another object          Yes        Behavior (that it gets sent)
Outgoing query           Another object          No         n/a
Private (sent to self)   The object under test   No         n/a

That’s the whole strategy. Let’s see how to apply each one.

Incoming Messages: Test the State

Incoming messages are the object’s public interface. Other objects depend on their signature and the results they return, so they’re tested by making assertions about the value (the state) they return.

Take the composite Report from the composition article. Its main incoming message is render:

class TextSection
  def initialize(data:)
    @data = data
  end

  def render = @data.to_s
end

class Report
  attr_reader :title, :sections

  def initialize(title:, sections:)
    @title = title
    @sections = sections
  end

  def render
    ["# #{title}", *sections.map(&:render)].join("\n\n")
  end
end

RSpec.describe Report do
  it "renders the title and its sections" do
    report = Report.new(
      title: "Q2",
      sections: [TextSection.new(data: "Strong growth.")]
    )

    expect(report.render).to eq("# Q2\n\nStrong growth.")
  end
end

The test creates the object, sends a message to its public interface, and checks the result. It knows nothing about how render builds the string internally. If you change the implementation of render tomorrow, the test keeps passing as long as the result is the same. It’s coupled to the interface, not the implementation.

The rule is: an object only makes state assertions about messages in its own public interface. TextSection#render is tested in TextSection’s spec, not in Report’s. Keeping return value assertions in a single place eliminates duplication and lowers maintenance costs.
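To make the "keeps passing" claim concrete, here is a plain-Ruby sketch in which render is rebuilt with a different internal strategy. StreamedReport is a hypothetical name used only for the comparison; the other two classes follow the listing above.

```ruby
class TextSection
  def initialize(data:)
    @data = data
  end

  def render
    @data.to_s
  end
end

class Report
  attr_reader :title, :sections

  def initialize(title:, sections:)
    @title = title
    @sections = sections
  end

  def render
    ["# #{title}", *sections.map(&:render)].join("\n\n")
  end
end

# Hypothetical refactoring: same public interface, different internals.
class StreamedReport < Report
  def render
    sections.reduce(+"# #{title}") do |output, section|
      output << "\n\n" << section.render
    end
  end
end

sections = [TextSection.new(data: "Strong growth.")]
expected = "# Q2\n\nStrong growth."

# The same state assertion holds for both implementations.
[Report, StreamedReport].each do |klass|
  report = klass.new(title: "Q2", sections: sections)
  puts "#{klass}: #{report.render == expected}"
end
```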

Outgoing Command Messages: Test That They Get Sent

Sometimes it does matter that a message gets sent, because other parts of the application depend on what happens as a result: a file gets written, a record gets saved, money moves. These messages are commands, and it’s the responsibility of the object that sends them to prove it does so.

In the Transaction hierarchy, Purchase#perform tells its payment method to execute the charge:

class Transaction
  attr_reader :amount, :payment_method, :order_id

  def initialize(amount:, payment_method:, order_id:, **opts)
    @amount = amount
    @payment_method = payment_method
    @order_id = order_id
    post_initialize(opts)
  end

  def post_initialize(opts)
  end

  def execute
    validate
    perform
    record
  end

  def validate
    # ...
    extra_validations
  end

  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end

  def record
    # ...
  end

  def log_type
    raise NotImplementedError, "#{self.class} must implement log_type"
  end

  # optional hook; subclasses may override
  def extra_validations
  end
end

class Purchase < Transaction
  def perform
    @payment_method.process_payment(self)
  end

  def log_type
    "purchase"
  end
end

That message getting sent isn’t an internal detail; it’s what justifies the class’s existence. We prove it with a mock, which is a test of behavior, not state:

RSpec.describe Purchase do
  it "tells the payment method to process the payment" do
    payment_method = double("PaymentMethod")
    purchase = Purchase.new(amount: 100, payment_method: payment_method, order_id: 1)

    expect(payment_method).to receive(:process_payment).with(purchase)

    purchase.perform
  end
end

Instead of asserting what the message returns, the mock defines an expectation: that process_payment will be received, with the purchase as its argument. Notice what we’re not doing: we don’t check what process_payment returns. That’s the payment method’s responsibility, tested in its own spec. Purchase’s only job is to send the message; the test’s only job is to prove it does.

If you’ve injected your dependencies properly, swapping the real collaborator for a mock is trivial. Testing outgoing messages in a well-designed application is that straightforward.
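Roughly speaking, a mock is a stand-in that records what it receives so the test can check it afterwards. A hand-rolled sketch of the same check, without RSpec, might look like this. RecordingPaymentMethod is a hypothetical helper, and Transaction is trimmed to its constructor for brevity.

```ruby
# Trimmed stand-in for the article's Transaction: constructor only.
class Transaction
  attr_reader :amount, :payment_method, :order_id

  def initialize(amount:, payment_method:, order_id:)
    @amount = amount
    @payment_method = payment_method
    @order_id = order_id
  end
end

class Purchase < Transaction
  def perform
    @payment_method.process_payment(self)
  end
end

# Hypothetical hand-rolled mock: remembers each argument it was sent.
class RecordingPaymentMethod
  attr_reader :received

  def initialize
    @received = []
  end

  def process_payment(transaction)
    @received << transaction
    nil # the return value is deliberately uninteresting
  end
end

payment_method = RecordingPaymentMethod.new
purchase = Purchase.new(amount: 100, payment_method: payment_method, order_id: 1)
purchase.perform

# The command was sent exactly once, with the purchase itself.
puts payment_method.received == [purchase]
```

The recorder only works because the payment method is injected; a Purchase that built its own collaborator would leave nothing to swap.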

Outgoing Query Messages: Don’t Test

The other kind of outgoing message has no side effects; it matters only to the object that sends it. These are queries.

When Report#render iterates over its sections and calls section.render on each one, those render calls are outgoing messages from Report’s point of view. But they’re queries: they leave no trace, and no other object in the application cares whether Report sends them. Report only uses the values they return.

That’s why we don’t mock the sections to verify that render gets sent to them:

# Unnecessary and counterproductive:
expect(section).to receive(:render)

Doing so would couple Report’s test to an internal detail of how Report produces its output. The right test is the one we already wrote: we check the result of Report#render (state). That implicitly proves the sections were used, without locking us into how. And the correctness of each individual render lives in each section’s spec, where that message is incoming.

The rule ties both sections together: an outgoing query is incoming to another object, and that object is the one that tests its return value. Duplicating that assertion in the sender only adds cost.

Private Methods: Don’t Test

Messages an object sends to itself invoke private methods. As far as the rest of the application is concerned, they don’t exist. There are three good reasons not to test them:

They’re redundant. A private method is invoked by some public method that already has tests. A bug in the private method will surface through an existing test.
They’re unstable. This is exactly the code that changes most often. Testing it can condemn you to rewriting the test with every refactoring.
They mislead. Tests document how the object expects to interact with the world. Exposing private methods invites others to depend on them and breaks encapsulation.

That charge_card test from the beginning was a test of a private method in disguise. The charge should be verified through process_payment, the public message that triggers it; testing charge_card separately only adds fragility.

The warning sign isn’t having private methods; it’s feeling like you need to test them directly. If you get there, the private method is probably carrying too much responsibility and should be extracted into its own object. There is one exception: sometimes, to defer a design decision, you deliberately write an ugly, unstable private method, and testing it saves you pain during refactoring by pointing precisely to what broke. Be biased against these tests, but don’t be afraid of them if they genuinely improve your situation.

Testing Duck Types

Now we get to the first genuinely interesting case. A duck type is a virtual agreement: it has no representation in the code, so it can erode easily. Someone adds a new class, forgets to implement the role’s message, and the system blows up in production. The way to protect that contract is to test it.

Shared Examples to Verify the Contract

In the duck typing article we saw the Payment role: any object that responds to process_payment is a Payment, even if no class carries that name. And we saw how to document that contract with shared_examples:

RSpec.shared_examples "a payment" do
  it { is_expected.to respond_to(:process_payment) }

  it "accepts an order and processes it" do
    order = double("Order", total: 100, currency: "EUR")
    expect { subject.process_payment(order) }.not_to raise_error
  end
end

RSpec.describe CreditCardPayment do
  subject { CreditCardPayment.new(card_number: "...") }
  it_behaves_like "a payment"
end

RSpec.describe PayPalPayment do
  subject { PayPalPayment.new(email: "...") }
  it_behaves_like "a payment"
end

Every implementation of the role runs through the same set of examples. The test is written once and reused across every player. It serves as both verification and documentation: it raises the visibility of a role that would otherwise be invisible. If someone creates ApplePayPayment and forgets process_payment, the shared example fails and the contract is protected.
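A bare-bones version of the same safety net, written without RSpec, shows what the shared example is guarding against. ApplePayPayment is the forgetful class imagined above, and PAYMENT_ROLE and missing_payment_messages are illustrative names, not part of any library.

```ruby
# The role's contract, reduced to the messages every player must answer.
PAYMENT_ROLE = [:process_payment].freeze

class CreditCardPayment
  def process_payment(order)
    :charged
  end
end

# Hypothetical new player whose author forgot the role's message.
class ApplePayPayment
  def pay(order)
    :charged
  end
end

# Illustrative helper: lists the role messages a class fails to implement.
def missing_payment_messages(klass)
  PAYMENT_ROLE.reject { |message| klass.method_defined?(message) }
end

puts missing_payment_messages(CreditCardPayment).inspect
puts missing_payment_messages(ApplePayPayment).inspect
```

The first line prints an empty list and the second names the missing message, which is the failure the shared example would report for the new class.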

The Problem with Doubles

This case is more subtle. Imagine we’re testing PaymentProcessor (the object that iterates over payments and tells each one to process) not with real payments, but with a double that stubs process_payment:

class PaymentProcessor
  def process(payments, order)
    payments.each { |payment| payment.process_payment(order) }
  end
end

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order")
    payment = double("Payment", process_payment: :ok)
    processor = PaymentProcessor.new

    expect { processor.process([payment], order) }.not_to raise_error
  end
end

The test is fast and isolated. What could go wrong?

Suppose the role’s contract changes: process_payment gains an argument:

class PaymentProcessor
  def process(payments, order)
    payments.each { |payment| payment.process_payment(order, order.currency) }
  end
end

All the real players of the role get updated, and PaymentProcessor updates its call too (the "a payment" shared example would be updated to reflect the new contract as well). The payment double, though, stays anchored to the old contract:

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order", currency: "EUR")
    payment = double("Payment", process_payment: :ok)
    processor = PaymentProcessor.new

    expect { processor.process([payment], order) }.not_to raise_error
  end
end

Because double("Payment", process_payment: :ok) responds to process_payment with any number of arguments, the test keeps passing. And it will keep passing no matter what: if someone adds a new payment class tomorrow whose process_payment has the wrong arity, this test won’t catch it.

This is the trap that leads people to say mocks and stubs produce brittle tests. But the blame isn’t on the tool; it’s on having a role player (the double) that is never verified against the contract.
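The blind spot can be reproduced in plain Ruby. This sketch uses a variant of the scenario in which the sender and the real player disagree: the real class is already on the two-argument contract while the processor still sends one argument. StalePaymentFake is a hypothetical hand-rolled equivalent of the naive double.

```ruby
# Real player, already on the new two-argument contract.
class CreditCardPayment
  def process_payment(order, currency)
    :charged
  end
end

# Hypothetical stand-in for double("Payment", process_payment: :ok):
# it accepts any arguments, so it can never disagree with the caller.
class StalePaymentFake
  def process_payment(*)
    :ok
  end
end

# Sender that was NOT updated: still sends the old one-argument message.
class PaymentProcessor
  def process(payments, order)
    payments.each { |payment| payment.process_payment(order) }
  end
end

order = Object.new
processor = PaymentProcessor.new

# Against the permissive fake, nothing looks wrong.
processor.process([StalePaymentFake.new], order)
puts "fake: passes"

# Against the real class, the mismatch surfaces immediately.
begin
  processor.process([CreditCardPayment.new], order)
  puts "real: passes"
rescue ArgumentError => error
  puts "real: #{error.class}"
end
```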

Validating Doubles Against the Role

The problem wasn’t the double itself; it was that nothing tied it to the contract. The fix is to stop using a naive double and switch to a verifying double, RSpec’s native answer to this trap.

instance_double creates a double anchored to a real class and verifies, against it, that the method you’re stubbing actually exists and that you’re calling it with a compatible signature:

class CreditCardPayment
  def process_payment(order, currency)
    # charges the card via the gateway
  end
end

RSpec.describe PaymentProcessor do
  it "processes every payment" do
    order = double("Order", currency: "EUR")
    payment = instance_double(CreditCardPayment, process_payment: :ok)
    processor = PaymentProcessor.new

    expect { processor.process([payment], order) }.not_to raise_error
  end
end

Unlike the naive double, this one doesn’t accept just anything. The naive double responded to process_payment with any number of arguments. instance_double, by contrast, validates the signature against the real CreditCardPayment class. The blind spot disappears: the double can no longer get stuck on an old contract, because its contract is the real class’s contract, verified on every run.
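To get a feel for the kind of check a verifying double performs, here is a hand-rolled approximation in plain Ruby. It is not how RSpec implements instance_double; VerifiedFake is a hypothetical helper that compares each call against the real method's arity, and it only handles methods with a fixed number of positional arguments.

```ruby
class CreditCardPayment
  def process_payment(order, currency)
    :charged
  end
end

# Hypothetical helper: answers only messages the real class defines,
# and only with an argument count the real method would accept.
class VerifiedFake
  def initialize(real_class, responses)
    @real_class = real_class
    @responses = responses
  end

  def method_missing(name, *args)
    return super unless @responses.key?(name)

    unless @real_class.method_defined?(name)
      raise NoMethodError, "#{@real_class} does not implement #{name}"
    end

    arity = @real_class.instance_method(name).arity
    # Simplification: only fixed-arity methods are checked here.
    if arity >= 0 && arity != args.length
      raise ArgumentError, "#{name} expects #{arity} arguments, got #{args.length}"
    end

    @responses[name]
  end

  def respond_to_missing?(name, include_private = false)
    @responses.key?(name) || super
  end
end

payment = VerifiedFake.new(CreditCardPayment, process_payment: :ok)

# New two-argument contract: accepted.
puts payment.process_payment(:order, "EUR")

# Old one-argument contract: rejected by the fake itself.
begin
  payment.process_payment(:order)
rescue ArgumentError => error
  puts error.message
end
```

In a real suite you would reach for instance_double rather than maintain something like this; the sketch only shows why anchoring a fake to a real class closes the gap.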

Testing Inheritance

Inheritance adds its own challenge. The Transaction → Purchase, Refund hierarchy we built with the Template Method pattern has two things to prove: that every subclass honors a common contract, and that each one does its own specialization correctly.

Specifying the Inherited Interface

The first goal is to prove that every object in the hierarchy respects its contract. The Liskov Substitution Principle requires that a subtype be substitutable for its supertype; the simplest way to verify this is to write a shared test for the common contract and include it in the spec of every class in the hierarchy.

RSpec.shared_examples "a transaction" do
  it { is_expected.to respond_to(:execute) }
  it { is_expected.to respond_to(:validate) }
  it { is_expected.to respond_to(:perform) }
  it { is_expected.to respond_to(:record) }
  it { is_expected.to respond_to(:log_type) }
end

Any object that passes these examples can be treated as a Transaction. The shared example documents the interface, prevents accidental regressions, and lets new developers write new subclasses safely:

RSpec.describe Purchase do
  let(:payment_method) { double("PaymentMethod") }
  subject(:purchase) { Purchase.new(amount: 100, payment_method: payment_method, order_id: 1) }

  it_behaves_like "a transaction"

  it "tells the payment method to process the payment" do
    expect(payment_method).to receive(:process_payment).with(purchase)
    purchase.perform
  end
end

Testing Subclass Responsibilities

The abstract superclass also imposes requirements on its subclasses. In our Template Method, every subclass must fill in the perform and log_type hooks. That’s also worth documenting in a shared test:

RSpec.shared_examples "a transaction subclass" do
  it "provides its own log_type" do
    expect(subject.log_type).to be_a(String)
  end

  it "overrides the abstract perform hook" do
    expect(subject.method(:perform).owner).not_to eq(Transaction)
  end
end

A subclass must behave both as a Transaction (the common interface) and as a subclass of Transaction (filling in the required hooks). Purchase includes both:

RSpec.describe Purchase do
  let(:payment_method) { double("PaymentMethod") }
  subject(:purchase) { Purchase.new(amount: 100, payment_method: payment_method, order_id: 1) }

  it_behaves_like "a transaction"
  it_behaves_like "a transaction subclass"

  it "tells the payment method to process the payment" do
    # ...
  end
end

Together, these two shared examples make testing the common behavior of subclasses painless.
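The owner check in "a transaction subclass" relies on Ruby's Method#owner, which reports the class or module where a method is actually defined. A short plain-Ruby sketch, with Transaction trimmed to the one method that matters and a hypothetical ForgetfulTransaction that skips the hook:

```ruby
class Transaction
  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end
end

class Purchase < Transaction
  def perform
    :charged
  end
end

# Hypothetical subclass whose author forgot to fill in the hook.
class ForgetfulTransaction < Transaction
end

# Overridden hook: the subclass owns the method.
puts Purchase.new.method(:perform).owner

# Inherited hook: the abstract superclass still owns it.
puts ForgetfulTransaction.new.method(:perform).owner
```

The check never calls perform, so it can flag the missing override without triggering any side effects.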

Testing Unique Behavior

What remains is each subclass’s own specializations. There’s one rule to follow: test the specialization without leaking knowledge of the superclass into the test.

Refund specializes log_type and adds its own validations:

class Refund < Transaction
  attr_reader :original_transaction

  def post_initialize(opts)
    @original_transaction = opts[:original_transaction]
  end

  def perform
    # ...
  end

  def log_type
    "refund"
  end

  def extra_validations
    raise "Missing original" unless @original_transaction
    raise "Refund exceeds original" if @amount > @original_transaction.amount
  end
end

That’s what we test directly:

RSpec.describe Refund do
  subject(:refund) do
    Refund.new(
      amount: 50,
      payment_method: double("PaymentMethod"),
      order_id: 1,
      original_transaction: double("Transaction", amount: 100)
    )
  end

  it_behaves_like "a transaction"
  it_behaves_like "a transaction subclass"

  it "logs as a refund" do
    expect(subject.log_type).to eq("refund")
  end

  it "rejects a refund larger than the original" do
    refund = Refund.new(
      amount: 50,
      payment_method: double("PaymentMethod"),
      order_id: 1,
      original_transaction: double("Transaction", amount: 30)
    )

    expect { refund.validate }.to raise_error(/exceeds original/)
  end

  # ...
end

Notice what’s missing: we don’t test execute here. execute is the superclass’s algorithm; the "a transaction" shared example already proves that Refund responds to it. Referencing it directly in the subclass spec would be redundant and would tie Refund’s test to a detail that belongs to Transaction.

Testing the Abstract Superclass

Shifting focus to Transaction reintroduces a familiar problem: it’s an abstract class. Creating an instance of it is tricky, and even if you can, it may not have all the behavior the test needs. Its hooks are unfilled.

We can prove that the superclass forces subclasses to implement the hooks. That logic lives in Transaction:

RSpec.describe Transaction do
  it "forces subclasses to implement perform" do
    transaction = Transaction.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1)
    expect { transaction.perform }.to raise_error(NotImplementedError)
  end
end

And to test the algorithm that execute defines in the superclass (the template every subtype inherits), we use Liskov to our advantage: we create a subclass that exists only for the test, fills in the hooks with trivial implementations, and gives us a concrete object to verify that execute behaves as it should.

class TransactionDouble < Transaction
  def perform
    true
  end

  def log_type
    "double"
  end
end

RSpec.describe Transaction do
  subject { TransactionDouble.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1) }

  it_behaves_like "a transaction"

  it "runs validation as part of execute" do
    invalid = TransactionDouble.new(amount: 0, payment_method: double("PaymentMethod"), order_id: 1)
    expect { invalid.execute }.to raise_error(/Invalid amount/)
  end

  it "forces subclasses to implement perform" do
    # ...
  end
end

As long as your test subclass doesn’t violate Liskov, you can use this technique anywhere. And if you’re worried that TransactionDouble will grow stale and keep tests passing after the real contract has changed, you can require it to pass "a transaction subclass" as well:

RSpec.describe TransactionDouble do
  subject { TransactionDouble.new(amount: 100, payment_method: double("PaymentMethod"), order_id: 1) }
  it_behaves_like "a transaction subclass"
end

Carefully written inheritance hierarchies are easy to test: one shared test for the interface, another for subclass responsibilities, and clean isolation of what’s unique to each.
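A test-only subclass also lets you observe the template from the outside. The plain-Ruby sketch below makes two assumptions that the listings above leave elided: that validate raises "Invalid amount" for a non-positive amount, and that record does nothing observable. PerformSpy is a hypothetical test-only subclass in the spirit of TransactionDouble, and Transaction is trimmed to a single constructor argument.

```ruby
class Transaction
  attr_reader :amount

  def initialize(amount:)
    @amount = amount
  end

  # The template: the order of steps lives here, once.
  def execute
    validate
    perform
    record
  end

  def validate
    # Assumption: the elided validation rejects non-positive amounts.
    raise ArgumentError, "Invalid amount" unless amount.positive?
  end

  def perform
    raise NotImplementedError, "#{self.class} must implement perform"
  end

  def record
    # Assumption: no observable effect in this sketch.
  end
end

# Hypothetical test-only subclass: fills the hook and counts its calls.
class PerformSpy < Transaction
  attr_reader :performed

  def perform
    @performed = (@performed || 0) + 1
  end
end

valid = PerformSpy.new(amount: 100)
valid.execute
puts valid.performed

invalid = PerformSpy.new(amount: 0)
begin
  invalid.execute
rescue ArgumentError => error
  puts error.message
end

# Validation failed first, so the hook was never reached.
puts invalid.performed.inspect
```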

The Pain Signal

There’s one final benefit of tests: they expose design flaws in the underlying code. When the design isn’t good, testing hurts, and that pain has a grammar worth learning to read:

If a test needs a painful setup, the code expects too much context.
If testing one object drags half the application along with it, the code has too many dependencies.
If a test is hard to write, other objects will find the code hard to reuse. (A test is, after all, the first reuser of any code.)

This is where all the design work pays off. Imagine a tightly coupled object: one that knows its collaborators by their concrete class, or worse, creates them itself. A test of that object runs all those collaborators in every example, whether it cares about them or not: the suite becomes slow, brittle, and prone to breaking in distant places whenever any of those pieces change.

As soon as you invert that relationship (inject the dependencies and make the object depend on a role or abstraction instead of the concrete classes behind it), the test becomes trivial: a double, a message, an expectation. You inject fake collaborators and check the result. That’s the Dependency Inversion Principle at work. Depending on abstractions is what lets you isolate an object, and an object that can be isolated is cheap to test.
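The contrast can be sketched in a few lines of plain Ruby. CoupledCheckout, Checkout, and FakePayment are hypothetical names introduced only for this illustration, and the real payment class is reduced to a method that stands in for slow, external work.

```ruby
class CreditCardPayment
  def process_payment(order)
    # Stand-in for a slow, external gateway call.
    raise "network unavailable in tests"
  end
end

# Tightly coupled: creates its collaborator, so every test runs it.
class CoupledCheckout
  def complete(order)
    CreditCardPayment.new.process_payment(order)
  end
end

# Inverted: depends on anything that plays the payment role.
class Checkout
  def initialize(payment_method:)
    @payment_method = payment_method
  end

  def complete(order)
    @payment_method.process_payment(order)
  end
end

# Hypothetical fake role player used only in tests.
class FakePayment
  attr_reader :orders

  def initialize
    @orders = []
  end

  def process_payment(order)
    @orders << order
    :ok
  end
end

# The injected version is isolated with one fake and one message.
fake = FakePayment.new
Checkout.new(payment_method: fake).complete(:order_42)
puts fake.orders.inspect

# The coupled version drags the real collaborator into the test.
begin
  CoupledCheckout.new.complete(:order_42)
rescue RuntimeError => error
  puts "coupled: #{error.message}"
end
```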

But watch the direction of the arrow. Testing, by itself, doesn’t force good design. Nothing about writing tests compels you to break coupling and inject the dependency. You can absolutely write expensive, duplicated tests around tightly coupled code. Good design makes tests cheaper, not the other way around. A costly test doesn’t necessarily mean the application is poorly designed (you can write bad tests for good code), but a test that hurts in all three ways above is almost always pointing at the code, not the test.

Conclusion

Well-written tests aren’t a separate layer bolted on after the code is done; they’re the same design discipline applied one more time. The same principles that guide how you write the code guide how you test it.

The Dependency Inversion Principle is what makes tests cheap: depending on roles (abstractions) rather than concrete classes is what lets you inject doubles and isolate the object under test.
The Liskov Substitution Principle shows up in shared_examples: every player of a role must be interchangeable with any other.
The Single Responsibility Principle is what lets you test each thing once and in the right place: state in the receiver’s spec; the command in the sender’s spec.

The whole strategy can be summed up like this:

Test the state returned by incoming messages and the sending of outgoing commands.
Ignore outgoing queries and private methods.
Validate your doubles against the same contract as the real code.

The best test is coupled only to the interface, tests each thing once and in its proper place, and survives refactoring. As long as the public interfaces stay stable, you write the test once and it protects you forever.

Test your knowledge

Why is an incoming message tested by its state (return value), but an outgoing query message isn't tested from the sender's side?

You have a method that sends a message to another object. When do you use a mock to verify it gets sent?

Why is a test double that stubs an obsolete method of the role it claims to play dangerous?

When testing a concrete subclass like `Refund`, what should you avoid?

A test needs a painful setup and drags half the application along to run. What is it telling you?
