Hacking python's import system for single file packages

The last couple of days I have been playing around with the Kaggle Halite environment. Quickly, I ran into issues with their requirement that any agent has to be a single python file. Initially, my code was short and easy to keep self-contained. However, after adding more and more functionality, it soon became unwieldly and unintelligible. At this point, I decided to look for alternatives. In this blog post, I describe how to hack Python's import system to embed full packages into a single python module.

In the end, the functionality was surprisingly simple to get working. There are two parts required to build a custom import logic:

For simplicity, I just implemented both interfaces in the same class:

class Bootstraper:
    def find_spec(self, name, path, target=None):
        from importlib.machinery import ModuleSpec
        
        mangled_name = self.mangle("module", name)
        if not mangled_name in globals():
            return None
        
        is_pkg = self.mangle("pkg", name)
        return ModuleSpec(
            name, 
            self, 
            is_package=globals().get(is_pkg, False),
        )
    
    def create_module(self, spec):
        return None
    
    def exec_module(self, module):
        exec(
            globals()[self.mangle("module", module.__name__)], 
            module.__dict__,
        )
    
    def mangle(self, type, name):
        return type + "__" + name.replace(".", "__")

    
import sys
sys.meta_path.insert(0, Bootstraper())

And that all there is to it. With this bootstrapping code, module definitions can be embedded as strings into global variables. Each variable has to have name of the form module__{a}__{b}__{c}, where each __ represents a . inside the qualified module name. To mark packages, a boolean variable with the prefix pkg can be used.

Using this bootstrapping logic, a simple package can be defined and used in a single file as follows (download example):

# NOTE: this snippet needs to be prefixed with above's bootstrap code

pkg__foo = True
module__foo = '''
'''

module__foo__bar = '''
def bar_func():
    print("bar")
'''

module__foo__baz = '''
def baz_func():
    print("baz")
'''

from foo.bar import bar_func
bar_func()

from foo.baz import baz_func
baz_func()

While it works, it is admittedly quite easy to break. For example, having a module or package named foo in your Python path, will break this setup in hard to debug ways. So will I use this code? Probably not in any "real" project, but in the context of the Kaggle challenge, it gets the job done and is easy to adapt if new issues pop up. In case, you're looking for more battle tested implementations with similar approaches, checkout the following packages on PyPI:

  • pinliner: is using an approach similar to the one outlined in this post, but uses a dictionary for storage
  • stickytape: is re-creating the package in a temporary directory inserted into the Python path