Adding test cases to GitHub Echo

Ensuring Reliability Through Test Cases with Pytest

Introduction

In our open-source course at Seneca this week, we focused on adding tests to the projects we've been building. If you've read my previous blogs, you might know that I've been working on GitHub Echo for the past couple of weeks (check it out here: GitHub Echo). Until now, I thought the project was flawless, but that's easy to assume when you haven't written any tests, so there's nothing to fail! This week, my goal was to integrate testing with pytest (pytest documentation) and incorporate it into my CI pipeline. Now, whenever there's a new change, I can quickly verify that nothing breaks.

In this blog, I will walk you through the steps and my thought process while doing this.

Getting started

The first step in the course was to choose a testing framework, and I selected Pytest, as it’s known to be one of the most popular Python testing frameworks. Since I was new to Pytest, I began exploring and learning more about it to ensure I could apply it effectively in my project. I primarily learned the essentials through a Codecademy video available here and from Real Python’s comprehensive guide on testing in Python.

As I manage my project dependencies with Poetry, I added Pytest to the project with the command poetry add pytest. Additionally, I included requests-mock to mock API requests, following the requests-mock documentation for setup and usage.

The goal of this week’s lab was not to cover the entire project but to focus on a few utility functions, particularly the core LLM functionality. I started by testing some parser utility functions in my project’s codebase, which allowed me to get comfortable with writing and structuring tests. You can view the specific utility functions I worked with here in the parser module. This initial setup gave me a good feel for Pytest and laid the foundation for more extensive testing as the project progresses.

Here are some important things I learned while doing this:

Using pytest.fixture to Reuse Data Across Tests

In my tests, I use pytest.fixture to set up reusable data. Fixtures are great for situations where I need a consistent setup across multiple tests, such as when I need the same configuration data for validating my load_toml_config function.

Here's how it works:

  • I created config_file_content, a fixture that provides a sample configuration in TOML format. Any test that needs this fixture can just accept it as a parameter. This makes my tests cleaner since I don’t need to redefine the configuration in each test function.

  • I also have an expected_config fixture that holds the expected dictionary output. With pytest, I can keep both config_file_content and expected_config reusable and separate from the tests, which makes my setup more flexible.

For example, here’s how I use config_file_content in a test:

def test_load_valid_config(self, config_file_content, expected_config):
    # use config_file_content as the input and expected_config to verify the output

Using fixtures this way keeps my tests more readable and avoids duplicate setup code.
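
For reference, here's roughly what those two fixtures might look like. The keys and values here are illustrative rather than the exact configuration the project uses:

@pytest.fixture
def config_file_content(self):
    # Sample configuration in TOML format, used as the mocked file contents
    return 'model = "gemini"\nmodel_temperature = 0.5\n'

@pytest.fixture
def expected_config(self):
    # The dictionary load_toml_config should produce from the TOML above
    return {'model': 'gemini', 'model_temperature': 0.5}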

Mocking with patch to Simulate File and Path Operations

In my tests, I need to interact with the filesystem to load configuration files. But I don’t want to rely on real files because they might not exist or could be difficult to set up. That’s where unittest.mock.patch comes in handy. I can use patch to replace these file and path operations with mock objects that simulate their behavior.

For instance, I mock Path.exists to simulate whether a file exists. If I want to test a missing config file, I can patch Path.exists to return False, like this:

with patch('pathlib.Path.exists', return_value=False):
    # Now, load_toml_config will think the file doesn't exist

In cases where I need to mock the file read itself, I patch open. This way, I can simulate different file contents without creating any actual files. For example, here’s how I mock open to provide specific content:

with patch('builtins.open') as mock_file:
    mock_file.return_value.__enter__.return_value.read.return_value = config_file_content
    # Now, when load_toml_config reads the file, it gets config_file_content

Using patch like this makes my tests more reliable and faster, and I have full control over the input data.
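
Putting the two patches together, the happy-path test from earlier looks roughly like this. It's a sketch that assumes load_toml_config opens the file and reads its contents, as the patches above imply:

def test_load_valid_config(self, config_file_content, expected_config):
    with patch('pathlib.Path.exists', return_value=True), \
         patch('builtins.open') as mock_file:
        # The mocked file handle hands back the fixture content on read()
        mock_file.return_value.__enter__.return_value.read.return_value = config_file_content
        result = load_toml_config('.github-echo-config.toml')

    assert result == expected_config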

Testing Error Handling with pytest.raises

Error handling is a crucial part of any function, and pytest makes it easy to verify errors with pytest.raises. In my code, I use pytest.raises to check that certain inputs or file errors raise the expected exceptions. For example, when I want to ensure a PermissionError is raised if there’s a file permission issue, I can use pytest.raises like this:

with pytest.raises(PermissionError):
    load_toml_config('.github-echo-config.toml')

This is clean and expressive: I can check the exception type and, if needed, inspect the error message as well. pytest.raises helps me confirm that my code correctly handles various exceptional cases.
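
For instance, to assert on the message as well, pytest.raises accepts a match argument, a regular expression checked against the string form of the exception. The message below is just an example, and the permission failure is simulated by giving open a side_effect:

with patch('builtins.open', side_effect=PermissionError('Permission denied')):
    with pytest.raises(PermissionError, match='Permission denied'):
        load_toml_config('.github-echo-config.toml')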

Asserting Outputs and Structures in JSON to Markdown Conversion

In my json_to_markdown function tests, I use straightforward assertions to confirm the output structure. I don’t need to mock anything here, as the function works on JSON data directly. I assert that the Markdown conversion matches expected text.

For example:

assert result == expected_result

Using assertions like this verifies the function’s logic without extra setup. If the function were more complex, I could add more assertions for intermediate states, but for now, a simple check against expected_result does the job.
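
As a rough illustration (the input shape and headings are assumptions based on the summaries shown later in this post, not the function's exact contract):

def test_json_to_markdown_basic(self):
    data = {'Branch Protection': 'No branch protection rules are configured.'}
    result = json_to_markdown(data)

    # The exact formatting is project-specific; this only checks that the key
    # becomes a Markdown heading and the value shows up in the body
    assert '## Branch Protection' in result
    assert 'No branch protection rules are configured.' in result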

Testing the core LLM functionality

Writing tests for two different Large Language Model (LLM) integrations, Google Gemini and Groq, was a complex task. Testing Google Gemini was relatively straightforward, but testing Groq required more work because mocking its responses wasn’t feasible without significant setup. Here's how I approached testing each model, the challenges involved, and some example code to illustrate the process.

Why Test LLM Integration?

Testing LLMs, especially when integrating multiple models, is crucial to ensure consistent outputs and to handle unexpected errors gracefully. In this case, my project leverages both Google Gemini and Groq LLMs to generate summaries of GitHub repositories. Each model has unique capabilities and limitations, so testing them individually was essential.

Mocking Google Gemini

For Google Gemini, I was able to directly mock the API responses using unittest.mock.patch. This approach was simple because the response structure of the model was predictable and didn’t require the complex setup Groq did. Here’s a test that checks if the summary generation function (get_gemini_summary) works as expected:

# Test function to ensure successful summary generation using Gemini
@patch('google.generativeai.GenerativeModel.generate_content')
def test_get_gemini_summary(
    self, mock_generate_content, mock_gemini_response, mock_usage_metadata
):
    # Set up a mock response to simulate the API output
    mock_response = MagicMock()
    mock_response.text = json.dumps(mock_gemini_response)
    mock_response.usage_metadata = mock_usage_metadata
    mock_generate_content.return_value = mock_response

    # Call the function to generate a summary with mocked data
    github_data = {'repo_name': 'example-repo', 'owner': 'user'}
    model_temperature = 0.7
    result = get_gemini_summary(github_data, model_temperature)

    # Verify if the formatted response matches the expected markdown format
    expected_formatted_response = json_to_markdown(mock_gemini_response)
    assert result['formatted_response'] == expected_formatted_response
    assert result['usage'] == mock_usage_metadata

In this example, I:

  1. Mocked the Gemini API response using mock_generate_content to simulate the output.

  2. Compared the function output to the expected response, verifying that the formatted_response field in the result matches the expected markdown format.

The simplicity of mocking here allowed me to avoid setting up an entire client, keeping the test lightweight.
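
The mock_gemini_response and mock_usage_metadata fixtures referenced above are plain data fixtures. Here's a sketch of what they might contain; the field names and values are illustrative, not the project's exact ones:

@pytest.fixture
def mock_gemini_response(self):
    # A small JSON-serializable summary shaped like the model's output
    return {'Branch Protection': 'No branch protection rules are configured.'}

@pytest.fixture
def mock_usage_metadata(self):
    # Token accounting attached to the mocked response
    return {
        'prompt_token_count': 456,
        'candidates_token_count': 123,
        'total_token_count': 579,
    }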

Testing Groq with a Mock Client

For Groq, I ran into challenges when trying to mock specific responses. Unlike with Google Gemini, the response structure and API setup were harder to simulate. Instead of mocking just the response, I had to mock the entire Groq client and simulate its behavior, which took time and required additional setup.

Here’s how I tested Groq using a mock client:

# Mock Groq client setup
@pytest.fixture
def mock_groq_client(self, mock_claude_response):
    mock_client = MagicMock()
    mock_choice = MagicMock()
    mock_choice.message.content = json.dumps(mock_claude_response)

    mock_response = MagicMock()
    mock_response.choices = [mock_choice]
    mock_response.usage = {
        'completion_tokens': 123,
        'prompt_tokens': 456,
        'total_tokens': 579,
    }

    mock_client.chat.completions.create.return_value = mock_response
    return mock_client

# Test function to ensure successful summary generation using Groq
def test_get_groq_summary(self, mock_groq_client):
    with patch('application.core.models.groq_model.client', mock_groq_client):
        repo_data = {'some_key': 'some_value'}
        temperature = 0.5
        result = get_groq_summary(repo_data, temperature)

        # Check if the Groq client was called with correct parameters
        mock_groq_client.chat.completions.create.assert_called_once()
        call_args = mock_groq_client.chat.completions.create.call_args[1]
        assert call_args['model'] == 'mixtral-8x7b-32768'
        assert 'formatted_response' in result
        assert 'usage' in result

        formatted_response = result['formatted_response']
        assert '## Branch Protection' in formatted_response
        assert 'No branch protection' in formatted_response

Here’s a breakdown of the steps and reasoning:

  1. Mock the Groq client: The mock_groq_client fixture sets up a complete mock of the client, including a simulated completions.create response. This level of detail was necessary because mocking only the response itself led to inconsistencies.

  2. Use patch to replace the actual client: I patched the client in the get_groq_summary function to use the mock client during testing.

  3. Validate the client call: After calling get_groq_summary, I confirmed the client was called with the right parameters (model name, response format, and temperature).

  4. Check the output structure: I verified the presence of key fields like formatted_response and ensured it contained expected elements, such as "Branch Protection."

Setting Up Test Coverage with pytest-cov and Custom Testing Commands

In this section of my project, I focused on improving the testing infrastructure and making it more accessible to contributors. One of the main steps I took was integrating pytest-cov into my testing workflow. pytest-cov is a plugin for pytest that enables automatic generation of code coverage reports, helping developers identify areas of their codebase that need more test coverage. To install pytest-cov, I used the following command:

poetry add pytest-cov

This command adds pytest-cov as a dependency in my project. Once it was installed, I learned how to generate coverage reports with it and documented the process for other contributors.
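
If you prefer to call pytest-cov directly rather than through the project scripts, the equivalent command looks like this (the application package path is my assumption about the project layout):

poetry run pytest --cov=application --cov-report=term-missing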

Running Tests in Various Scenarios

To make it easier for contributors to run tests, I set up several custom scripts in the pyproject.toml file. These scripts allow users to run tests in different scenarios, whether they want to test only a specific file, class, or run tests continuously as the code changes.

The following scripts were added under the [tool.poetry.scripts] section in pyproject.toml:

[tool.poetry.scripts]
lint = "_scripts:lint"
format = "_scripts:format_code"
lint-and-format = "_scripts:lint_and_format"
run-tests = "_scripts:run_tests"
run-tests-on-files = "_scripts:run_tests_on_files"
run-tests-on-classes = "_scripts:run_tests_on_classes"
run-coverage = "_scripts:run_coverage"
run-coverage-report = "_scripts:run_coverage_report"
run-coverage-html = "_scripts:run_coverage_html"
watch-tests = "_scripts:watch_tests"
watch-tests-coverage = "_scripts:watch_tests_with_coverage"

These scripts provide functionality for various testing and coverage needs:

  • lint: Runs the code linter (using tools like Ruff) to ensure the code adheres to style guidelines.

  • format: Automatically formats the code to adhere to style conventions.

  • lint-and-format: Runs both linting and formatting.

  • run-tests: Runs all tests in the project.

  • run-tests-on-files: Runs tests on specific files.

  • run-tests-on-classes: Runs tests on specific classes.

  • run-coverage: Runs tests and generates a code coverage report.

  • run-coverage-report: Generates a coverage report in the terminal.

  • run-coverage-html: Generates a detailed HTML report of code coverage.

  • watch-tests: Runs tests continuously as files change, useful during development.

  • watch-tests-coverage: Similar to watch-tests, but with coverage reporting.

These scripts are documented in the Contributing Guide to ensure that other developers can easily run tests, check coverage, and keep their code in good shape.
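
For context, each entry above points at a small helper in a _scripts.py module that shells out to the underlying tool. The real helpers in the repository may differ, but a minimal sketch looks something like this:

# _scripts.py (sketch)
import subprocess
import sys

def run_tests():
    # Run the whole suite and propagate pytest's exit code so CI fails on test failures
    sys.exit(subprocess.call(['pytest', '-v']))

def run_coverage():
    # Run the suite with coverage collection for the application package
    sys.exit(subprocess.call(['pytest', '--cov=application', '--cov-report=term-missing']))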

Example: Running Tests on Specific Files or Classes

If you want to run tests on a specific file or class, you can use the following commands:

  • To run tests on specific files:

      poetry run run-tests-on-files path/to/file.py
    
  • To run tests on specific classes:

      poetry run run-tests-on-classes test_module.TestClass
    

These commands are especially helpful when you're working on a specific part of the codebase and want to quickly verify that your changes don't break anything.
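
These wrappers map onto pytest's normal selection syntax, so the equivalent raw commands look roughly like this (the file and class names are placeholders):

poetry run pytest path/to/file.py
poetry run pytest path/to/file.py::TestClass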

Automatically Running Tests in Watch Mode

Another useful feature I set up is the ability to run tests automatically when code changes. This is done using the watch-tests script, which watches for changes in your Python files and reruns the tests as soon as those files are saved. This is extremely helpful for continuous testing during development. You can use it as follows:

poetry run watch-tests

If you also want to track test coverage in real-time while running the tests, you can use:

poetry run watch-tests-coverage

This will run your tests continuously and generate coverage reports on the fly.
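
The watch scripts need a file watcher under the hood; if you want the same behavior without the Poetry scripts, a plugin such as pytest-watch offers a comparable workflow. This is an assumption on my part rather than a description of the project's exact setup:

poetry add pytest-watch
poetry run ptw -- --cov=application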

Setting Up Continuous Integration (CI) with GitHub Actions

To ensure that tests run automatically in a CI environment, I set up a GitHub Actions pipeline. This pipeline runs whenever changes are pushed to the main branch or when a pull request is created. Here's the full configuration for the pipeline in the .github/workflows/ci.yml file:

name: CI Pipeline

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
  workflow_dispatch:

jobs:
  code-lint:
    name: Lint with Ruff
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff
      - name: Run Ruff
        run: ruff check .

  test:
    name: Run Tests
    runs-on: ubuntu-latest
    needs: ["code-lint"]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install poetry
          poetry install
      - name: Run tests
        run: |
          poetry run run-tests

Explanation:

  1. code-lint job: This job runs the Ruff linter to ensure that the code is well-formed and adheres to style guidelines.

  2. test job: This job installs dependencies using Poetry and runs the tests using the poetry run run-tests command.

The CI pipeline ensures that every change to the main branch is linted and tested automatically, providing confidence that the code is always in a good state.

If you'd like to learn more about testing with pytest, you can check out the official documentation here. To learn more about integrating code coverage, refer to pytest-cov's documentation.

Conclusion

From this process, I’ve learned a lot about the value of testing and how it improves the reliability of code. While I’ve done testing before, specifically with Jest for JavaScript projects, this experience with pytest was a bit more challenging, but also more rewarding in the end. I’ve realized that testing isn’t just about catching bugs but also about ensuring that my code behaves as expected, even as it evolves.

Before diving into testing with pytest, I had primarily worked with Jest, which is known for its simplicity, especially in front-end applications. Jest’s setup is minimal, and it often handles mocking and assertions in a more automated way, which made it easier for me to get started. In contrast, pytest requires a bit more effort in terms of manual setup, such as managing fixtures and organizing test cases. While the learning curve for pytest was steeper, I found it much more powerful once I became familiar with its features, particularly when it comes to managing complex testing scenarios and providing more granular control over test execution.

I believe testing is essential, and I definitely plan to incorporate it into all my future projects. Whether using Jest for JavaScript or pytest for Python, testing ensures that my code remains robust and maintainable. Additionally, with tools like pytest, I now appreciate the importance of writing tests early on and integrating them into the development process rather than retrofitting them later. So yes, I’ll absolutely be doing more testing in the future, and I’ll continue to learn and refine my testing skills along the way.