Week 12 Lecture: Unit Testing and Test-Driven Development

Why Automated Testing Matters

When you finish writing a piece of code, you probably run it once or twice to confirm it behaves as expected. That is a form of testing, but it does not scale. Imagine a project with fifty classes: every time you change one class, code somewhere else might break. Manually re-running every feature after every change is impractical.

This is why we write automated tests — small programs whose only job is to test our programs.

A Real-World Scenario

In professional software development, several programmers usually work on the same project at once. Each developer writes code on their own machine and then pushes it to a shared repository on GitHub, making their changes visible to the rest of the team. When two people change different parts of the code, those changes are merged (combined). Sometimes, however, those changes conflict or interact in unexpected ways.

Consider this example:

A developer writes a BankAccount class. He tests it manually — deposit works, withdraw works — and pushes the code.
The next day, another developer adds a new feature that changes how the balance is calculated. She tests her own code, sees that it works, and pushes it.
On Monday morning, the whole system is broken and the server is down. The first developer’s code and the second developer’s code do not work well together, but neither of them knows whose change caused the failure. They each tested their own work in isolation.

If the first developer had written automated tests for BankAccount, the moment the second developer’s change broke something the tests would have failed and immediately reported the problem. Bugs would be caught instantly, not three days later.

What Is a Unit Test?

A unit test is a test that checks one small piece of your code — usually a single method or a single behavior of a class.

The key word is unit. You are not testing the whole application; you are testing one tiny unit in isolation. Each test checks one thing, so when it fails, you know exactly where the bug is.

Choosing a Testing Framework: `pytest`

Python ships with a built-in testing module called unittest. It works, but it is somewhat verbose: it requires you to write classes and use special assertion methods.

Instead, we will use pytest, the most popular testing framework in Python. With pytest, tests are just plain functions — no special classes or complicated setup.

pytest is not built in, so you need to install it:

pip install pytest

Writing Your First Test

Let’s create a file called calculator.py with a simple add function:

def add(a, b):
    return a + b

Now we will write a test for it. Tests live in a separate file, and the file name must start with test_. We will call it test_calculator.py.

We import the function under test, define a function whose name also must start with test_, and use the assert keyword to check the result. (We will look at assert in detail in a moment.)

from calculator import add

def test_add_positive_numbers():
    assert add(2, 3) == 5

To run the tests, type pytest in your terminal:

pytest

pytest automatically:

Finds every file whose name starts with test_.
Finds every function inside those files whose name starts with test_.
Runs them and reports which passed and which failed.

A typical run looks like this:

============================= test session starts =============================
platform win32 -- Python 3.14.2, pytest-9.0.2, pluggy-1.6.0
rootdir: C:\Users\Admin\AppData\Local\Temp\pytest_demo
plugins: anyio-4.12.0, asyncio-1.3.0, cov-7.0.0, mock-3.15.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item

test_calculator.py .                                                     [100%]

============================== 1 passed in 0.01s ==============================

The header lines describe the platform, the project root, and any installed plugins (the asynchronous-programming lines can be ignored for now). collected 1 item means pytest found one test function — you can also see it as a single dot after the file name. Each dot represents a passing test; an F means a failed test.

Adding More Tests

def test_add_negative_numbers():
    assert add(-1, -1) == -2

def test_add_zero():
    assert add(0, 5) == 5

Running pytest again:

============================= test session starts =============================
platform win32 -- Python 3.14.2, pytest-9.0.2, pluggy-1.6.0
rootdir: C:\Users\Admin\AppData\Local\Temp\pytest_demo
plugins: anyio-4.12.0, asyncio-1.3.0, cov-7.0.0, mock-3.15.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items

test_calculator.py ...                                                   [100%]

============================== 3 passed in 0.01s ==============================

Now we have three collected items and three dots.

A Failing Test

Let’s add one more test, this time with floating-point numbers:

def test_add_floats():
    assert add(0.1, 0.2) == 0.3

============================= test session starts =============================
platform win32 -- Python 3.14.2, pytest-9.0.2, pluggy-1.6.0
rootdir: C:\Users\Admin\AppData\Local\Temp\pytest_demo
plugins: anyio-4.12.0, asyncio-1.3.0, cov-7.0.0, mock-3.15.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 4 items

test_calculator.py ...F                                                  [100%]

================================== FAILURES ===================================
_______________________________ test_add_floats _______________________________

    def test_add_floats():
>       assert add(0.1, 0.2) == 0.3
E       assert 0.30000000000000004 == 0.3
E        +  where 0.30000000000000004 = add(0.1, 0.2)

test_calculator.py:13: AssertionError
=========================== short test summary info ===========================
FAILED test_calculator.py::test_add_floats - assert 0.30000000000000004 == 0.3
========================= 1 failed, 3 passed in 0.07s =========================

The F instead of a dot tells us one test failed. The cause is the classic floating-point representation issue: 0.1 + 0.2 is not exactly 0.3 in binary floating-point math.

Understanding `assert`

In programming, assert is a way of saying:

“I am confident this should be true — check it for me.”

assert <condition>

If the condition is True, nothing happens.
If the condition is False, Python raises an AssertionError.

In testing, this is everything. You are essentially saying, “I assert that this is true; if it isn’t, something is broken.”

Common patterns:

assert a == b                       # equality
assert a != b                       # inequality
assert a                            # truthiness
assert isinstance(obj, MyClass)     # type check
assert 'some key' in some_dict      # membership check

One of the great features of pytest is that when an assertion fails, it does not just say AssertionError; it shows both sides of the comparison:

assert add(2, 3) == 6
AssertionError: assert 5 == 6

It tells you, in effect: “your function returned 5, but you expected 6.” These richer error messages are one of the main reasons people prefer pytest over unittest.

Testing a Class

Let’s apply what we know to a class — our familiar BankAccount:

# bank_account.py

class BankAccount:
    def __init__(self, owner: str, balance: float = 0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float):
        if amount <= 0:
            raise ValueError("Deposit amount must be positive")
        self.balance += amount

    def withdraw(self, amount: float):
        if amount <= 0:
            raise ValueError("Withdrawal amount must be positive")
        if amount > self.balance:
            raise ValueError("Insufficient funds")
        self.balance -= amount

    def __repr__(self):
        return f"BankAccount('{self.owner}', {self.balance})"

We will test:

Creating an account with the default balance (it should start at 0).
Creating an account with an initial balance.
Depositing money.
Withdrawing money.

# test_bank_account.py

from bank_account import BankAccount

def test_create_account_default_balance():
    account = BankAccount("Alisher")
    assert account.owner == "Alisher"
    assert account.balance == 0

def test_create_account_with_balance():
    account = BankAccount("Sevara", 1000)
    assert account.balance == 1000

def test_deposit():
    account = BankAccount("Jasur", 500)
    account.deposit(200)
    assert account.balance == 700

def test_withdraw():
    account = BankAccount("Nodira", 1000)
    account.withdraw(300)
    assert account.balance == 700

Each test creates its own BankAccount, performs an action, and checks the result. Simple.

Testing for Exceptions

How do we check that our code correctly raises an error when it should? pytest provides a special context manager, pytest.raises.

import pytest
from bank_account import BankAccount

def test_deposit_negative_amount():
    account = BankAccount("Alisher")
    with pytest.raises(ValueError):
        account.deposit(-100)

def test_withdraw_insufficient_funds():
    account = BankAccount("Sevara", 500)
    with pytest.raises(ValueError):
        account.withdraw(1000)

Reading this in plain English: “I expect the code inside this block to raise a ValueError. If it does, the test passes. If it doesn’t, the test fails.”

You can even check the error message:

def test_withdraw_insufficient_funds_message():
    account = BankAccount('sevara', 500)
    with pytest.raises(ValueError, match="Insufficient funds."):
        account.withdraw(1000)

The match parameter checks that the actual error message contains the specified text.

Removing Repetition with Fixtures

Look at the tests above — every single one creates a BankAccount. That is repetition. It would be nice to tell pytest:

“Before each test, build a BankAccount and hand it to me.”

That is exactly what a fixture does. A fixture is a function decorated with @pytest.fixture that returns a value the tests can use.

import pytest
from bank_account import BankAccount

@pytest.fixture
def empty_account():
    return BankAccount("Alisher")

@pytest.fixture
def funded_account():
    return BankAccount("Sevara", 1000)

To use a fixture, simply list its name as a parameter of the test function:

def test_deposit(empty_account):
    empty_account.deposit(500)
    assert empty_account.balance == 500

def test_withdraw(funded_account):
    funded_account.withdraw(300)
    assert funded_account.balance == 700

def test_withdraw_insufficient(funded_account):
    with pytest.raises(ValueError):
        funded_account.withdraw(2000)

Notice we never explicitly call empty_account(). When pytest sees a parameter named empty_account, it looks for a fixture with that name, calls it, and passes the returned value into the test.

Each test gets a fresh fixture. If test_deposit modifies its empty_account, the next test that uses empty_account receives a brand-new one.

Parameterized Tests

Sometimes you want to test the same behavior with many different inputs. For example, suppose we want to verify that deposits work correctly for various amounts. We could write one test per amount:

def test_deposit_100(empty_account):
    empty_account.deposit(100)
    assert empty_account.balance == 100

def test_deposit_500(empty_account):
    empty_account.deposit(500)
    assert empty_account.balance == 500

# ... and so on

But this is repetitive — only the numbers change. The @pytest.mark.parametrize decorator handles this elegantly. It takes:

A string of parameter names separated by commas, e.g. "amount, expected".
A list of tuples, each tuple representing one test case (it can be any iterable — a list, dict, etc.).

@pytest.mark.parametrize("amount, expected", [
    (100, 100),
    (500, 500),
    (0.5, 0.5),
    (999999, 999999),
])
def test_deposit_various_amounts(empty_account, amount, expected):
    empty_account.deposit(amount)
    assert empty_account.balance == expected

pytest will run this function four times, once per tuple. The order of arguments does not matter — you could also write (amount, empty_account, expected). pytest resolves them by name.

When you run pytest -v (verbose mode, which prints more details), the output shows each parameterized case separately:

test_bank_account.py::test_deposit_various_amounts[100-100]     PASSED
test_bank_account.py::test_deposit_various_amounts[500-500]     PASSED
test_bank_account.py::test_deposit_various_amounts[0.5-0.5]     PASSED
test_bank_account.py::test_deposit_various_amounts[999999-999999] PASSED

Parameterization is also great for testing invalid inputs:

@pytest.mark.parametrize("invalid_amount", [0, -1, -100, -0.5])
def test_deposit_invalid_amounts(empty_account, invalid_amount):
    with pytest.raises(ValueError):
        empty_account.deposit(invalid_amount)

Writing Code That Is Easy to Test

Knowing how to write tests is half the story. The other half is writing code that is easy to test in the first place. Consider this class:

# hard to test!
class OrderProcessor:
    def process(self, items):
        total = sum(item.price for item in items)
        tax = total * 0.12
        grand_total = total + tax

        # directly sends email... how do we test this?
        import smtplib
        server = smtplib.SMTP("mail.example.com")
        server.send_message(f"Your total is {grand_total}")

        # directly writes to database... how do we test this?
        import sqlite3
        conn = sqlite3.connect("orders.db")
        conn.execute("INSERT INTO orders VALUES (?)", (grand_total,))

        return grand_total

Why is this hard to test? Because process() does three things at once:

Calculates the total — this is what we want to test.
Sends an email — we do not want to send real emails during tests.
Writes to a database — we do not want to mess with real data either.

Everything is mixed together; you cannot test the calculation without triggering email and database side effects. This is a design problem, not a testing problem. The underlying principle is the Single Responsibility Principle: a class should do one thing. We will study design next week, but here is a more testable version that focuses purely on calculating totals:

# easy to test!
class PriceCalculator:
    def __init__(self, tax_rate: float = 0.12):
        self.tax_rate = tax_rate

    def calculate_total(self, items):
        subtotal = sum(item.price for item in items)
        tax = subtotal * self.tax_rate
        return subtotal + tax

Notice that we do not hardcode the tax rate — it is passed in. Now we can test the calculation without worrying about emails or databases:

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: float

def test_calculate_total():
    calc = PriceCalculator(tax_rate=0.10)
    items = [Item("Laptop", 1000), Item("Mouse", 50)]
    assert calc.calculate_total(items) == 1155.0

def test_calculate_total_empty():
    calc = PriceCalculator()
    assert calc.calculate_total([]) == 0

def test_custom_tax_rate():
    calc = PriceCalculator(tax_rate=0.20)
    items = [Item("Book", 100)]
    assert calc.calculate_total(items) == 120.0

What Makes a Class Testable

Small, focused methods — each method does one thing.
Variables passed in, not created inside — like tax_rate being a parameter rather than a hardcoded constant.
No side effects in calculations — computing values and sending emails should be separate concerns.

We will explore these ideas in greater depth next week.

Test-Driven Development (TDD)

So far we have been writing the code first and the tests afterwards. Test-Driven Development (TDD) flips that around: you write the test before you write the code.

How can you test something that does not exist? You can’t — the test will fail. And that’s the point.

TDD follows a cycle called Red–Green–Refactor:

Red — Write a test for the behavior you want. Run it. It fails. The test is “red.”
Green — Write the minimum code needed to make the test pass. Just make it work. The test turns “green.”
Refactor — Now clean up the code. Improve the design. The test should still pass after refactoring.

TDD by Example: Building a `GradeBook`

Suppose we want to build a GradeBook class for Al-Khwarizmi University.

Step 1: Red — write the test first.

What should our gradebook do? At minimum, it should start empty and let us add students.

# test_gradebook.py

from gradebook import GradeBook

def test_add_student():
    gb = GradeBook()
    gb.add_student("Alisher")
    assert "Alisher" in gb.students

Run pytest. It fails:

ModuleNotFoundError: No module named 'gradebook'

Step 2: Green — write just enough code to pass.

# gradebook.py

class GradeBook:
    def __init__(self):
        self.students = []

    def add_student(self, name):
        self.students.append(name)

Run pytest. It passes. Green.

Step 3: Refactor. Nothing to refactor yet — the code is simple. Move on.

Now go back to Red and add more behavior:

def test_add_grade():
    gb = GradeBook()
    gb.add_student("Sevara")
    gb.add_grade("Sevara", 85)
    assert gb.get_grades("Sevara") == [85]

Run it. Red — add_grade and get_grades do not exist. Fix it:

class GradeBook:
    def __init__(self):
        self.students = {}

    def add_student(self, name):
        self.students[name] = []

    def add_grade(self, name, grade):
        self.students[name].append(grade)

    def get_grades(self, name):
        return self.students[name]

We changed self.students from a list to a dict. Does the original test still pass?

def test_add_student():
    gb = GradeBook()
    gb.add_student("Alisher")
    assert "Alisher" in gb.students  # 'in' works for dicts too!

Yes — because in checks dict keys. We got lucky here, but this is exactly why we run all tests every time. If the first test had broken, we would notice immediately and fix it before moving on.

Let’s add another behavior — calculating averages:

def test_average_grade():
    gb = GradeBook()
    gb.add_student("Jasur")
    gb.add_grade("Jasur", 80)
    gb.add_grade("Jasur", 90)
    gb.add_grade("Jasur", 100)
    assert gb.average("Jasur") == 90.0

Red. Make it green:

def average(self, name):
    grades = self.students[name]
    return sum(grades) / len(grades)

Green. What about edge cases?

def test_average_no_grades():
    gb = GradeBook()
    gb.add_student("Nodira")
    with pytest.raises(ValueError, match="no grades"):
        gb.average("Nodira")

Red. Fix it:

def average(self, name):
    grades = self.students[name]
    if not grades:
        raise ValueError(f"{name} has no grades")
    return sum(grades) / len(grades)

Green. All tests pass.

Notice the rhythm: red → green → red → green. You build the class one behavior at a time, and you are always testing it.

Full Code

After the full Red–Green–Refactor cycle, here is the complete GradeBook class together with all of its tests:

# gradebook.py

class GradeBook:
    def __init__(self):
        self.students = {}

    def add_student(self, name):
        self.students[name] = []

    def add_grade(self, name, grade):
        self.students[name].append(grade)

    def get_grades(self, name):
        return self.students[name]

    def average(self, name):
        grades = self.students[name]
        if not grades:
            raise ValueError(f"{name} has no grades")
        return sum(grades) / len(grades)

# test_gradebook.py

import pytest
from gradebook import GradeBook

def test_add_student():
    gb = GradeBook()
    gb.add_student("Alisher")
    assert "Alisher" in gb.students  # 'in' works for dicts too!

def test_add_grade():
    gb = GradeBook()
    gb.add_student("Sevara")
    gb.add_grade("Sevara", 85)
    assert gb.get_grades("Sevara") == [85]

def test_average_grade():
    gb = GradeBook()
    gb.add_student("Jasur")
    gb.add_grade("Jasur", 80)
    gb.add_grade("Jasur", 90)
    gb.add_grade("Jasur", 100)
    assert gb.average("Jasur") == 90.0

def test_average_no_grades():
    gb = GradeBook()
    gb.add_student("Nodira")
    with pytest.raises(ValueError, match="no grades"):
        gb.average("Nodira")

Mocking External Dependencies

Let’s return to the OrderProcessor example that sends emails and writes to a database:

# hard to test!
class OrderProcessor:
    def process(self, items):
        total = sum(item.price for item in items)
        tax = total * 0.12
        grand_total = total + tax

        # directly sends email... how do we test this?
        import smtplib
        server = smtplib.SMTP("mail.example.com")
        server.send_message(f"Your total is {grand_total}")

        # directly writes to database... how do we test this?
        import sqlite3
        conn = sqlite3.connect("orders.db")
        conn.execute("INSERT INTO orders VALUES (?)", (grand_total,))

        return grand_total

We said this is hard to test because of the side effects. But sometimes you genuinely have to work with external systems. How do you test code that depends on things you cannot fully control — a web API, a database, an email server?

You mock them. You may already know the term “mock” from mock exams: a mock is a fake stand-in that pretends to be the real thing.

Python has a built-in module for this: unittest.mock. Yes, it lives inside unittest, but it works perfectly with pytest.

A Simple Mocking Example

Suppose we have a class that fetches user data from an API:

# user_service.py

import requests

class UserService:
    def __init__(self, api_url: str):
        self.api_url = api_url

    def get_user(self, user_id: int):
        response = requests.get(f"{self.api_url}/users/{user_id}")
        data = response.json()
        return data["name"]

We cannot call the real API during tests — it might be slow, unreliable, or cost money. So we mock it.

First we import patch and MagicMock from unittest.mock:

# test_user_service.py

from unittest.mock import patch, MagicMock
from user_service import UserService

patch is a decorator that replaces a real object with a fake during the test.
MagicMock is a fake object that accepts any method call or attribute access without complaining.

We apply patch to our test function, telling it:

“Inside the user_service module, whenever requests.get is called, use this fake instead.”

@patch("user_service.requests.get")

The fake is then passed into our test function as a parameter:

def test_get_user(mock_get):

Here mock_get is the fake replacement for requests.get during this test.

Next, we create a fake response with MagicMock:

mock_response = MagicMock()

Then we tell the mock response: “When someone calls .json() on you, return this dictionary.”

mock_response.json.return_value = {'name': 'Alisher', 'age': 20}

Then we tell mock_get (our fake requests.get): “When called, return our fake response.”

mock_get.return_value = mock_response

Now we just create the object under test and call its method:

service = UserService('https://fake-api.com')
name = service.get_user(42)

When get_user runs, it would normally call requests.get(...), but our patch redirects that call to mock_get, which returns mock_response. When the method calls .json() on the response, it gets the dictionary we configured. Finally, we assert the result:

assert name == 'Alisher'

No HTTP request is made. The test runs instantly, offline, and reliably.

Full Code

Putting all the pieces together, here is the complete test file:

# test_user_service.py

from unittest.mock import patch, MagicMock
from user_service import UserService

@patch("user_service.requests.get")
def test_get_user(mock_get):
    mock_response = MagicMock()
    mock_response.json.return_value = {'name': 'Alisher', 'age': 20}
    mock_get.return_value = mock_response

    service = UserService('https://fake-api.com')
    name = service.get_user(42)

    assert name == 'Alisher'

Mocking can get complicated, and this is just an introduction. The important idea is the concept itself:

When your code depends on something external, you can replace that thing with a fake during tests.

A useful design hint: if you find yourself needing lots of mocking, that is often a sign that your code’s design could be improved. Well-designed code requires less mocking because its pieces are more independent.

Why Automated Testing Matters

A Real-World Scenario

What Is a Unit Test?

Choosing a Testing Framework: pytest

Writing Your First Test

Adding More Tests

A Failing Test

Understanding assert

Testing a Class

Testing for Exceptions

Removing Repetition with Fixtures

Parameterized Tests

Writing Code That Is Easy to Test

What Makes a Class Testable

Test-Driven Development (TDD)

TDD by Example: Building a GradeBook

Full Code

Mocking External Dependencies

A Simple Mocking Example

Full Code

Choosing a Testing Framework: `pytest`

Understanding `assert`

TDD by Example: Building a `GradeBook`