Skip to content

A modular template for extensible (Python) projects

For every programming problem that I have had in the past, there may have been a solution that I found and implemented. In most cases, the code is written and forgotten about pretty easily and we can move on.

Now, when you get to a problem a second time, you might have a déjà-vu moment and hopefully remember how you solved it. I had it the other day when creating the codebase for a new project, and I absolutely could not remember how I did it. Fortunately, all my code is saved in some Git repositories, so I could easily look up my own code. And I have a rule for these moments: If you encounter the same problem twice, you should totally write a blog post about it! The next time I will hopefully just have to look for this post instead of digging up old codebases.

This is about a project structure I have used in the past which allows me and others to easily extend the functionality of my software. When I start a new project, I spend a considerable amount of time thinking about the structure of my project. What are my requirements, how do I deal with changes in later stages of development, how do I make my code flexible and more readable? What I do often is look for “best practices” in the specific language or field, and use that as orientation on what to do (e.g. with Django).

For this blog post, I will be describing a structure that allows for an extension of functionality by external contributors.

Why a modular approach?

In most of my personal project, I start with the idea of open-sourcing the code. Sometimes it is just to show off some ideas, sometimes I just imagine people wanting to contribute to it. Therefore, you have to make it as straighforward as possible to be able to extend the software. In my specific case, I have modules that I would like to be implemented by other people. Similar to a Plugin software pattern (described in Martin Fowler’s Patterns of Enterprise Application Architecture), a modular approach allows others to add their own implementation of a functionality without knowing all the code.

As an example, consider a software that generates different files based on user-provided parameters, such as:

generator create pdf
-> file.pdf created
generator create jpeg
-> file.jpeg created
# ...

In such a case, the same executable is able to create multiple types of files. Now, what if I want it to create a mp3 file? I could ask the developers to implement it, but they could not have the time or resources to do that. Or maybe the implementation requires special knowledge that only selected people can provide (think proprietary formats). In this case, I could be willing to provide the implementation myself and share it with the developers to improve the software for everyone.

Now, in most cases, the software is complex and not documented well enough to be able to dive into the code immediately. So I would need to know where to change the CLI interface, where the code to generate stuff is called, how files are written to the hard drive etc. But, if the software used a modular approach as described below, you could simply add your own files, without modifying the core code, and the software would include it into its execution.

Okay, enough introduction into what and why, let us talk about the actual implementation.

A modular plugin pattern for Python projects

I am using Python for this, and the concept is not easily transferable to other languages for reasons. As I make use of dynamic code loading functionality, you could achieve the same with Java and Reflection / Class loaders, for example.
The great thing about Python is: At execution time, you can choose to load code from a file and include it into your execution environment. So what the code does is:

  1. Look for modules inside some directories
  2. Include them into the current process memory and Python environment
  3. Execute the module

First, let’s take a look at the project structure:

project
├── __init__.py
├── tests.py
├── utils.py
├── findmodules.py
├── moduleloader.py
└── templates
    ├── empty-template
    │   ├── empty.py
    │   └── empty-template.py
    ├── templ1
    │   ├── jpeg.cpp
    │   ├── jpeg.h
    │   └── jpeg_gen.py
    ├── templ2
    │   ├── pdf.py
    │   └── pdf_creator.cpp
    └── templ3
        └── textfiles.py

As you can see, you have the regular code inside the project’s folder. In addition, a templates folder contains all the modules that are loaded at runtime. There, you can also see how the implementation can include code in different languages. You just have to include some Python glue code. Now, the special parts of this are the .py files inside templates. They contain some boilerplate code that is used to load them into our application, but also custom code by individual contributors. Such a file looks like this:

from moduleloader import CustomModule

class ExampleModule(CustomModule):
    TEMPLATE_NAME = '' # required
    VERSION = '1.0'
    AUTHORS = []

    def run(self, **kwargs):
        super(type(self), self).run(**kwargs)
        # your custom code here

I chose a class-based approach here, as classes are easier to handle than individual functions. Also, using class inheritance and abstract classes, you can define how such a CustomModule is structured, which methods, attributes or fields it contains. When a module has this base structure, you rely on the developer implementing only the relevant code to extend the functionality. In the run method, developers can freely implement their ideas without having to know how the code is run. Basically, you only have to agree on some common structure.

The counterpart in the core code looks like this:

from abc import ABCMeta, abstractmethod

class CustomModule(metaclass=ABCMeta):
    TEMPLATE_NAME = ''  # required
    VERSION = '1.0'
    AUTHORS = []

    def __init__(self):
        self.__check_integrity()

    def __check_integrity(self):
        assert hasattr(self, 'TEMPLATE_NAME'), 'TEMPLATE_NAME must be set'
        assert hasattr(self, 'VERSION'), 'VERSION must be set'
        # ...

    @abstractmethod
    def run(self, **kwargs):
        # called from module

In this code snippet, I used Abstract Base Classes to protect my base class from instantiation by a module, but you do not actually need this. The interesting parts are most likely the constructor __init__, where you can do all kinds of stuff whenever a module is loaded (like populate some fields), and the abstract methods that have to be overwritten by the module. With this, you have a predictable base and can work with different module implementation without having to look into the actual code yourself. There are some other cool tricks you can use, such as disabling print inside a function to avoid modules messing with standard output, or automatic integrity checks for new modules.

Searching and loading custom modules

What is left now is loading these modules into your code. For this, you need to know where the modules are located on the hard drive, and what exactly you want to import.
For this, I created a findmodules function that does exactly this. It makes use of the importlib standard library of Python 3. More precisely, the importlib.machinery.SourceFileLoader function is used to import single source files. You could modify it to include any class, function or structure into your code, but what I did was one step further: Only include classes that are subclasses of CustomModule. The full code can be found on GitHub and here is the relevant code for loading modules:

for root, d, files in os.walk(paths):
    candidates = [fname for fname in files if fname.endswith('.py') and
    not fname.startswith('__')]
    for c in candidates:
    modname = os.path.splitext(c)[0]
    try:
    module = SourceFileLoader(
            modname,
            os.path.join(root, c)).load_module()
    except (SystemError, ImportError,
            NotImplementedError, SyntaxError):
        # error
        continue

And for extracting the relevant classes:

import inspect

# get classes from module
# ... 
cls = getattr(module, cls)
if (inspect.isclass(cls) and
        inspect.getmodule(cls) == module and
        issubclass(cls, base) and
        cls != base):
    classes.append(cls)

The invocation of this function is:

class_list = search(CustomModule)
for cl in class_list:
    cl()  # create instance

Conclusion

Now your software can be easily extended by other developers that are not familiar with your code (and do not need to be). Also, depending on your use case, you can allow others to implement code in their favourite language (see project structure above), as the modular approach allows to do mostly everything (like calling other interpreters/complers/binaries) inside the environment you expose as a core developer. Splitting up work and extending your software has never been made easier!

Published inTips and Tricks

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *