How to Bootstrap a Python Project
Created On: 2016-11-10 Updated On: 2016-12-16
In this post, I will talk about how to bootstrap a new Python project. This is obviously not the only way to organize a Python project; I will just cover the important things you should consider when creating one.
Python is often misunderstood to be a scripting language. There is no doubt Python can be used for scripting, but it is far from a scripting-only language. In fact, Python is used in many more areas than scripting, such as web programming, scientific computing, GUI programming, data processing, and machine learning.
When all you want is to write a simple script, there is no need to create a project. Just create a Python file and run it right away.

When you are working on something serious, something that will be maintained for a long time or that depends on external libraries, you should consider creating a Python project.
Defining Python Packages
When you create a project, the first question is where to put your source files. Python has no strong directory layout convention like Maven's standard directory layout for Java.
In a project, I suggest you always create Python packages and put source files inside them. If you bootstrap a project from a template, packages are usually created automatically, such as in the skeleton project generated by `django-admin startproject`.
Here is how it may look for a simple Python package:
| File Name | Description |
|---|---|
| README.rst | project README file |
| requirements.txt | package dependencies |
| mypkg/ | package directory |
| mypkg/__init__.py | |
| mypkg/main.py | source file |
| mypkg/test_main.py | test file |
| mypkg/mylib.py | |
| mypkg/test_mylib.py | |
A Python package is a directory with a proper name and an `__init__.py` file. Short, all-lowercase names are usually used. Underscores can be used but are discouraged.
When a file is put in a package, you can import the module using the `<pkg_name>.<module_name>` syntax.
Packages can be nested and used as a namespace tool. For example:
| File Name | Description |
|---|---|
| mypkg/ | |
| mypkg/__init__.py | |
| mypkg/main.py | source file |
| mypkg/subpkg/ | |
| mypkg/subpkg/__init__.py | |
| mypkg/subpkg/sublib.py | source file |
Each package directory should have an `__init__.py` in it.
Packages are understood by many tools. For example, py.test can find all tests in a package for you, and setuptools can find packages and install them.
Here is a sample setup.py file for the `mypkg` package; notice the use of `find_packages()`:
```python
#!/usr/bin/env python
# coding=utf-8
"""
python distribute file
"""
from __future__ import (absolute_import, division, print_function,
                        unicode_literals, with_statement)

from setuptools import setup, find_packages


def requirements_file_to_list(fn="requirements.txt"):
    """read a requirements file and create a list that can be used in setup.
    """
    with open(fn, 'r') as f:
        return [x.rstrip() for x in list(f) if x and not x.startswith('#')]


setup(
    name="mypkg",
    version="0.1.0",
    packages=find_packages(),
    install_requires=requirements_file_to_list(),
    dependency_links=[
        # If your project has dependencies on some internal packages that are
        # not on PyPI, you may list package index urls here. Then you can just
        # mention package name and version in the requirements.txt file.
    ],
    entry_points={
        # 'console_scripts': [
        #     'main = mypkg.main:main',
        # ]
    },
    package_data={
        'mypkg': ['logger.conf']
    },
    author="FIXME add author",
    author_email="FIXME add email",
    maintainer="FIXME add maintainer",
    maintainer_email="FIXME add email",
    description="FIXME add description",
    long_description=open('README.rst').read(),
    license="GPLv2+",
    url="https://pypi.python.org/pypi/mypkg",
    classifiers=[
        'Development Status :: 3 - Alpha',
        'License :: OSI Approved :: GNU General Public License (GPL)',
        'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.4',
    ]
)
```
To learn more about package distribution, read Distributing Python Modules and Packaging and Distributing Projects. To learn more about modules, packages and distribution, you may read the official document links I collected.
Using Git
Any software project should use a version control system (VCS) these days. Git is fast and popular. You can also use another tool if you are already familiar with it.
When using a VCS, commit your packages and the requirements.txt file. Do not commit the virtualenv, Python bytecode files, or dist files.
Here is my `.gitignore` file for Python projects:

```
.venv/
bin/
dist/
build/
wheelhouse/
*.egg-info/
*.py[co]
__pycache__/
.tox
*.deb
doc/_build/
```
Preferring Python 3
Since Python 3 is not compatible with Python 2, you may need to decide which one to use. If you can't decide for yourself, you should usually prefer Python 3 as of 2016. If you have more specific requirements, talk to experts for advice.
Writing code that works in both Python 2 and Python 3 is not difficult. Tox is a tool that can create virtualenvs, install dependencies, and run tests for you across multiple Python versions. It's very useful for testing libraries that should work on both Python 2 and Python 3.
Here is a sample tox.ini file for testing a simple library with package name `foo`:
```ini
[tox]
envlist = py27,py34

[testenv]
commands = pep8 --ignore=E202,E501 foo
           pylint -E foo
           py.test foo
whitelist_externals = make
deps = pep8
       pylint
       pytest
```
To run the tests, simply type `tox`. Tox will create multiple virtualenvs in the .tox directory, install the deps packages into each virtualenv, and run the test and check commands.
Using Virtualenv
Virtualenv should be used for any serious Python project. It's a tool that creates isolated Python environments, including the interpreter, pip, and library dependencies. It works with CPython, PyPy, etc.
If you only care about systems that have python3 installed by default, the venv module can also be used. I usually prefer the older virtualenv, which works with both Python 2 and Python 3.
Here is an example of using virtualenv:
```shell
virtualenv --python=python3 -q .venv
. .venv/bin/activate
pip install -r requirements.txt
```
If you don't work in a shell, running the python and pip binaries inside the virtualenv directory also works:
```shell
virtualenv --python=python3 -q .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python -V
```
Handling Dependencies With Pip
Pip is a tool to install, upgrade, and uninstall Python packages. When creating a project, you should create a requirements.txt file to list the dependencies of your project.
Example requirements.txt file for a python project:
```
bottle==0.12.9
gevent==1.1.2
psycopg2==2.6.2
boto3==1.4.0
fake-factory==0.6.0
celery==3.1.23
redis==2.10.5
lxml==3.6.4
```
As you can see, it's recommended that you list both the package name and the package version. You can find out which versions you are running with the `pip freeze` command. However, I don't like using its output directly, because intermediate dependencies are also listed by `pip freeze`, which makes the list messy.
When developing software, you can install dependencies from PyPI using pip. When deploying software, however, I suggest against it. I recommend you get the wheels offline beforehand (using `pip wheel`) and distribute them with your app.
Choosing a Testing Framework
Most projects need some testing to make sure the logic is correct and to make maintenance and refactoring easier.
A testing framework simplifies how you write tests and how you run them. py.test is the tool I recommend: it requires no boilerplate to define and run tests, and it produces excellent text reports.
Here is an example:
```python
#!/usr/bin/env python
# coding=utf-8
"""
example of using py.test
"""


def fib(n):
    """return the nth number in the Fibonacci sequence.

    Args:
        n: a non-negative integer

    Return:
        the nth number in the Fibonacci sequence, starting with 1, 1, ...
    """
    if n <= 0:
        return -1
    i = j = 1
    for _ in range(n - 1):  # range, not xrange, so it works on python 3 too
        i, j = j, i + j
    return i


def test_fib():
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(3) == 2
    assert fib(4) == 3
    assert fib(5) == 5


def test_fib_bad_input():
    assert fib(0) == -1
    assert fib(-34) == -1
```
As you can see, defining a test is just like defining a regular function, and you use plain `assert` to write assertions. Assertions work on both atoms and containers. Tests can be put either in the same source file or in separate files. By convention, test file names and test function names start with `test_`. You can just run `py.test <PKGNAME>` to run all tests.
Integrating pep8 and pylint
pep8 is a tool that checks whether your code conforms to the PEP 8 style guide. A style guide makes your code easier to read for yourself and other developers.
Pylint is a source code analyzer that can find common programming errors and code smells. For example, pylint can find function redefinitions, unused parameters, undefined symbols, bad imports, etc. There are newer tools in this category, but pylint is what I use. When using pylint, I usually run `pylint -E`, which only lists warnings and errors. Without the `-E` option, the output is too long to digest. If you do read the full output, you might be surprised how much pylint understands about your code, especially on a medium or large project. Another reason for using `-E` is that it runs much faster, since it does less analysis.
pep8 and pylint are configurable; not all checks are good for all projects, and I myself disable some checks by default. Some pylint checks can be disabled at the line level, function level, or module level. Only disable a check globally when you think it brings more trouble than benefit.
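For example, a check can be disabled with a `# pylint: disable=...` comment; a trailing comment applies to that line only, while a standalone comment applies from there to the end of the enclosing scope. The message symbols below are standard pylint symbols, but the functions are only illustrative:

```python
import time  # pylint: disable=unused-import


def handler(event, context):  # pylint: disable=unused-argument
    """A trailing disable comment silences the named check on this line."""
    return 42


# pylint: disable=missing-docstring
# A standalone disable comment like the one above applies from here to
# the end of the enclosing scope (the whole module, at the top level).
def helper():
    return handler(None, None)
```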
Here is a Makefile that integrates virtualenv, pep8, pylint and pytest:
```makefile
PYTHON_MODULES := mypkg
PYTHONPATH := .
VENV := .venv
PYTEST := env PYTHONPATH=$(PYTHONPATH) PYTEST=1 $(VENV)/bin/py.test
PYLINT := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/pylint --disable=I0011 --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}"
PEP8 := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/pep8 --repeat --ignore=E202,E501,E402
PYTHON := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/python
PIP := $(VENV)/bin/pip

DEFAULT_PYTHON := /usr/bin/python3
VIRTUALENV := /usr/local/bin/virtualenv

REQUIREMENTS := -r requirements.txt

default: check-coding-style

venv:
	test -d $(VENV) || $(VIRTUALENV) -p $(DEFAULT_PYTHON) -q $(VENV)

requirements:
	@if [ -d wheelhouse ]; then \
		$(PIP) install -q --no-index --find-links=wheelhouse $(REQUIREMENTS); \
	else \
		$(PIP) install -q $(REQUIREMENTS); \
	fi

bootstrap: venv requirements

check-coding-style: bootstrap
	$(PEP8) $(PYTHON_MODULES)
	$(PYLINT) -E $(PYTHON_MODULES)

pylint-full: check-coding-style
	$(PYLINT) $(PYTHON_MODULES)

test: check-coding-style
	$(PYTEST) $(PYTHON_MODULES)

check:
	$(PYTEST) $(PYTHON_MODULES)

.PHONY: default venv requirements bootstrap check-coding-style pylint-full test check
```
To run all of pep8/pylint/pytest, just type `make test`. It will create a virtualenv in the .venv directory, install dependencies, and run the checkers. To run only pytest, type `make check`. You may want to run only pytest on a large project, where pylint runs can significantly slow down your develop-test cycle.
This Makefile is a trimmed version that focuses only on the tools mentioned in this section, to make it easier to understand. You may also read the full version. That version has more make targets, and some targets depend on external scripts that I may write posts about later on. You will need to adapt it yourself.
If you use git, you can configure a pre-commit hook that automatically runs `make test` for you when you run `git commit`. If any tests fail, the commit aborts. To do that, create a file `.git/hooks/pre-commit` with the following content:
```shell
#!/bin/sh
make test
```
Then make it executable:

```shell
chmod +x .git/hooks/pre-commit
```
You can also configure post-commit and post-merge hooks that trigger a Jenkins CI build, but this post is already too long, so I won't go into the details.
Writing Sanity-Check Script
Sometimes, a project has external dependencies at runtime. If an external service is down, the project cannot run.
These dependencies include external databases, network connectivity, system commands, credentials to third-party systems, disk space requirements, etc.
As a responsible developer, you should provide a sanity-check script that checks all external environmental dependencies. This will help the sysadmin team locate problems when deploying your application for the first time, or when something goes wrong.
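A minimal sketch of such a script might look like this. The individual checks, hosts, ports, and thresholds are all illustrative, and the disk-space check assumes a Unix-like system:

```python
#!/usr/bin/env python
"""Sanity-check external runtime dependencies before starting the app."""
from __future__ import print_function

import os
import shutil
import socket
import sys


def check_disk_space(path="/", min_free_bytes=1 << 30):
    """Require at least min_free_bytes free on the given filesystem (Unix)."""
    stat = os.statvfs(path)
    return stat.f_bavail * stat.f_frsize >= min_free_bytes


def check_tcp_service(host, port, timeout=3):
    """Check that a TCP service (e.g. a database) accepts connections."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False


def check_command(name):
    """Check that a required system command is on PATH (Python 3.3+)."""
    return shutil.which(name) is not None


def main():
    # Adapt this list to your project's actual external dependencies.
    checks = [
        ("disk space on /", lambda: check_disk_space("/")),
        ("database reachable", lambda: check_tcp_service("localhost", 5432)),
        ("git installed", lambda: check_command("git")),
    ]
    failed = [name for name, check in checks if not check()]
    for name in failed:
        print("FAIL: %s" % name, file=sys.stderr)
    sys.exit(1 if failed else 0)


if __name__ == "__main__":
    main()
```

The script exits non-zero when any check fails, so it can be called from deployment scripts or the pre-commit hook above.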
Enabling Logging
Logging should be part of every project. Python's logging module is powerful and flexible, but it's difficult to learn and configure.
My suggestion is to configure it once and forget about it. I learned it from the official logging module documentation. It has a lot of content, but it is not that difficult to understand.
Using the logging module is quite easy compared to configuring it. Here is how you use logging in any module:
```python
import logging

logger = logging.getLogger(__name__)


def my_function(param1):
    logger.info(u"running my_function()")
    # ...
    logger.error(u"something goes wrong, param1=%s", param1)
    # ...
```
When writing logs in your app, make sure your logs are useful. Specifically, try not to flood your log file, and make every line helpful.
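The configure-once part can be done at program startup with `logging.config.dictConfig`. Here is a minimal sketch; the format string, handler, and level are just example choices:

```python
import logging
import logging.config

# Configure logging once, at program startup; every module then just
# calls logging.getLogger(__name__) as shown above.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "default",
        },
    },
    "root": {
        "handlers": ["console"],
        "level": "INFO",
    },
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger(__name__)
logger.info("logging configured")
```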
Do Not Crash
If the program you wrote is a daemon, a script that runs for a long time, or a script that runs over many items (e.g. files), it's important that it does not crash halfway. Writing code that anticipates failure like this is sometimes called defensive programming.
In Python, you should know where the program could fail and deal with the failure accordingly. Here are some ways to avoid crashing:
- Capture exceptions properly
KeyError, IndexError, IOError, DatabaseError, and UnicodeDecodeError are very common exceptions. You should know when an operation could raise them and handle them in the correct place.
- Validate user inputs
Check your assumptions and rule out bad user input. Write total functions whenever you can; a total function returns a sensible result for every possible input instead of raising on unexpected ones.
- Implement retry
When the error is environmental, such as the network being down temporarily, implementing a retry can save a lot of hassle. Automatic retry can be implemented with a decorator, which makes it easy to use.
- Fail fast
When you notice you can't fulfill a request, you can choose to fail fast. For example, when an action requires a long computation whose result is then written to a database, it's best to check that the database connection is okay before doing any computation. Otherwise, the computation may be wasted.
In a web app, when the user's authentication or a request parameter is invalid, you should fail fast and return a 4xx immediately.
- Deploy daemons using upstart, systemd or supervisord
Nowadays, when you need to run a daemon, you don't need to do it the traditional Unix way anymore. Use one of these service management tools. They can start a process for you, monitor it, and restart it when it crashes. Some provide stdout and stderr logging as well.
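The auto-retry idea mentioned above can be sketched as a decorator. The names and defaults here are illustrative, not a standard library API:

```python
import functools
import time


def retry(times=3, delay=1, exceptions=(IOError, OSError)):
    """Retry the wrapped function on the given exception types.

    Makes up to `times` attempts, sleeping `delay` seconds between them,
    and re-raises the last exception if every attempt fails.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_error = e
                    if attempt < times - 1:
                        time.sleep(delay)
            raise last_error
        return wrapper
    return decorator


@retry(times=3, delay=0, exceptions=(ValueError,))
def flaky(counter=[0]):
    """Demo function that fails twice, then succeeds on the third call."""
    counter[0] += 1
    if counter[0] < 3:
        raise ValueError("temporary failure")
    return "ok"
```

With the decorator applied, calling `flaky()` once succeeds, because the two transient failures are retried transparently.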
Summary
May your new project go well. If you have questions, leave a comment.