How to Bootstrap a Python Project
Created On: 2016-11-10 Updated On: 2016-12-16
In this post, I will talk about how to bootstrap a new Python project. This is obviously not the only way to organize a Python project; I will just cover the important things you should consider when creating one.
Python is often misunderstood to be a scripting language. There is no doubt Python can be used for scripting, but it is far from a scripting-only language. In fact, Python is used in many more areas than scripting, such as web programming, scientific computing, GUI programming, data processing, and machine learning.
When all you want is to write a simple script, there is no need to create a project. Just create a Python file and run it right away.

When you are working on something serious, something that will be maintained for a long time or that depends on external libraries, you should consider creating a Python project.
Defining Python Packages
When you create a project, the first question is where to put your source files. Python has no strong directory layout convention like Maven's standard directory layout for Java.
In a project, I suggest you always create Python packages and put source files inside them. If you bootstrap a project from a template, packages are usually created automatically, such as in the skeleton project generated by `django-admin startproject`.
Here is how it may look for a simple Python package:
| File Name | Description |
|---|---|
| README.rst | project README file |
| requirements.txt | package dependencies |
| mypkg/ | package directory |
| mypkg/__init__.py | |
| mypkg/main.py | source file |
| mypkg/test_main.py | test file |
| mypkg/mylib.py | |
| mypkg/test_mylib.py | |
A Python package is a directory with a proper name and an `__init__.py` file. Short, all-lowercase names are usually used. Underscores can be used but are discouraged.
When a file is put in a package, you can import the module using the `<pkg_name>.<module_name>` syntax.
Packages can be nested and used as a namespace tool. For example:
| File Name | Description |
|---|---|
| mypkg/ | |
| mypkg/__init__.py | |
| mypkg/main.py | source file |
| mypkg/subpkg/ | |
| mypkg/subpkg/__init__.py | |
| mypkg/subpkg/sublib.py | source file |
Each package directory should have an `__init__.py` in it.
Packages are understood by many tools. For example, py.test can find all tests in a package for you, and setuptools can find packages and install them.
Here is a sample setup.py file for the `mypkg` package; notice the use of `find_packages()`:
```python
#!/usr/bin/env python
# coding=utf-8
"""
python distribute file
"""
from __future__ import (absolute_import, division, print_function,
                        unicode_literals, with_statement)

from setuptools import setup, find_packages


def requirements_file_to_list(fn="requirements.txt"):
    """read a requirements file and create a list that can be used in setup.
    """
    with open(fn, 'r') as f:
        return [x.rstrip() for x in list(f) if x and not x.startswith('#')]


setup(
    name="mypkg",
    version="0.1.0",
    packages=find_packages(),
    install_requires=requirements_file_to_list(),
    dependency_links=[
        # If your project has dependencies on some internal packages that are
        # not on PyPI, you may list package index urls here. Then you can just
        # mention package name and version in the requirements.txt file.
    ],
    entry_points={
        # 'console_scripts': [
        #     'main = mypkg.main:main',
        # ]
    },
    package_data={
        'mypkg': ['logger.conf']
    },
    author="FIXME add author",
    author_email="FIXME add email",
    maintainer="FIXME add maintainer",
    maintainer_email="FIXME add email",
    description="FIXME add description",
    long_description=open('README.rst').read(),
    license="GPLv2+",
    url="https://pypi.python.org/pypi/mypkg",
    classifiers=[
        'Development Status :: 3 - Alpha',
        'License :: OSI Approved :: GNU General Public License (GPL)',
        'License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.4',
    ]
)
```
To learn more about package distribution, read Distributing Python Modules and Packaging and Distributing Projects. To learn more about modules, packages and distribution, you may read the official document links I collected.
Using Git
Any software project should use a version control system (VCS) these days. Git is fast and popular. You can also use another tool if you are already familiar with it.
When using a VCS, commit your packages and the requirements.txt file. Do not commit the virtualenv, Python bytecode files, or dist files.
Here is my `.gitignore` file for Python projects:

```
.venv/
bin/
dist/
build/
wheelhouse/
*.egg-info/
*.py[co]
__pycache__/
.tox
*.deb
doc/_build/
```
Preferring Python 3
Since Python 3 is not compatible with Python 2, you may need to decide which one to use. If you can't decide for yourself, you should usually prefer Python 3 as of 2016. If you have more specific requirements, talk to experts for advice.
Writing code that works in both Python 2 and Python 3 is not difficult. Tox is a tool that can create virtualenvs, install dependencies, and run tests for you across multiple Python versions. It's very useful for testing libraries that should work on both Python 2 and Python 3.
Here is a sample tox.ini file for testing a simple library with package name `foo`:
```ini
[tox]
envlist = py27,py34

[testenv]
commands = pep8 --ignore=E202,E501 foo
           pylint -E foo
           py.test foo
whitelist_externals = make
deps = pep8
       pylint
       pytest
```
To run the tests, simply type `tox`. Tox will create multiple virtualenvs in the .tox directory, install the deps packages into each virtualenv, and run the test and check commands.
Using Virtualenv
Virtualenv should be used for any serious Python project. It's a tool that creates isolated Python environments, including the interpreter, pip, and library dependencies. It works with CPython, PyPy, etc.
If you only care about systems that have python3 installed by default, the venv module can also be used. I usually prefer the older virtualenv, which works with both Python 2 and Python 3.
Here is an example of using virtualenv:
```shell
virtualenv --python=python3 -q .venv
. .venv/bin/activate
pip install -r requirements.txt
```
If you don't work in a shell, running the python and pip binaries inside the virtualenv directory also works:
```shell
virtualenv --python=python3 -q .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python -V
```
Handling Dependencies With Pip
Pip is a tool to install, upgrade, and uninstall Python packages. When creating a project, you should create a requirements.txt file to list the dependencies of your project.
Example requirements.txt file for a python project:
```
bottle==0.12.9
gevent==1.1.2
psycopg2==2.6.2
boto3==1.4.0
fake-factory==0.6.0
celery==3.1.23
redis==2.10.5
lxml==3.6.4
```
As you can see, it's recommended that you list both the package name and the package version. You can find out which versions you are running with the `pip freeze` command. However, I don't like using its output directly, because intermediate dependencies are also listed by `pip freeze`, which makes the list messy.
When developing software, you can install dependencies from PyPI using pip. When deploying software, however, I suggest against it. I recommend you get the wheels offline beforehand (using `pip wheel`) and distribute them with your app.
Choosing a Testing Framework
Most projects need some testing to make sure the logic is correct and to make maintenance and refactoring easier.
A testing framework simplifies how you write tests and how you run them. py.test is the tool I recommend: it requires no boilerplate to define and run tests, and it produces excellent text reports.
Here is an example:
```python
#!/usr/bin/env python
# coding=utf-8
"""
example of using py.test
"""


def fib(n):
    """return the nth number in the Fibonacci sequence.

    Args:
        n: a non-negative integer

    Return:
        the nth number in the Fibonacci sequence, starting with 1, 1, ...
    """
    if n <= 0:
        return -1
    i = j = 1
    for _ in range(n - 1):  # range, not xrange, so it works on python 3 too
        i, j = j, i + j
    return i


def test_fib():
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(3) == 2
    assert fib(4) == 3
    assert fib(5) == 5


def test_fib_bad_input():
    assert fib(0) == -1
    assert fib(-34) == -1
```
As you can see, defining a test is just like defining a regular function, and you use plain `assert` to write assertions. Assertions work on both atoms and containers. Tests can be put either in the same source file or in separate files. By convention, test file names and test function names start with `test_`. You can just run `py.test <PKGNAME>` to run all tests.
Integrating pep8 and pylint
pep8 is a tool that checks whether your code conforms to the PEP 8 style guide. A style guide makes your code easier to read for yourself and other developers.
Pylint is a source code analyzer that can find common programming errors and code smells. For example, pylint can find function redefinitions, unused parameters, undefined symbols, bad imports, etc. There are newer tools in this category, but pylint is what I use. When using pylint, I usually run `pylint -E`, which only lists warnings and errors. Without the `-E` option, the output is too long to digest. If you do read the full output, you might be surprised how much pylint understands about your code, especially on a medium or large project. Another reason for using `-E` is that it runs much faster, since it does less analysis.
pep8 and pylint are configurable; not all checks are good for all projects, and I myself disable some checks by default. Some pylint checks can be disabled at the line level, function level, or module level. Only disable a check globally when you think it brings more trouble than benefit.
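For example, a check can be disabled with a `# pylint: disable=...` comment; a trailing comment applies to that line only, while a standalone comment applies from there to the end of the enclosing scope. The message symbols below are standard pylint symbols, but the functions are only illustrative:

```python
import time  # pylint: disable=unused-import


def handler(event, context):  # pylint: disable=unused-argument
    """A trailing disable comment silences the named check on this line."""
    return 42


# pylint: disable=missing-docstring
# A standalone disable comment like the one above applies from here to
# the end of the enclosing scope (the whole module, at the top level).
def helper():
    return handler(None, None)
```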
Here is a Makefile that integrates virtualenv, pep8, pylint and pytest:
```makefile
PYTHON_MODULES := mypkg
PYTHONPATH := .
VENV := .venv
PYTEST := env PYTHONPATH=$(PYTHONPATH) PYTEST=1 $(VENV)/bin/py.test
PYLINT := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/pylint --disable=I0011 --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}"
PEP8 := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/pep8 --repeat --ignore=E202,E501,E402
PYTHON := env PYTHONPATH=$(PYTHONPATH) $(VENV)/bin/python
PIP := $(VENV)/bin/pip

DEFAULT_PYTHON := /usr/bin/python3
VIRTUALENV := /usr/local/bin/virtualenv

REQUIREMENTS := -r requirements.txt

default: check-coding-style

venv:
	test -d $(VENV) || $(VIRTUALENV) -p $(DEFAULT_PYTHON) -q $(VENV)

requirements:
	@if [ -d wheelhouse ]; then \
		$(PIP) install -q --no-index --find-links=wheelhouse $(REQUIREMENTS); \
	else \
		$(PIP) install -q $(REQUIREMENTS); \
	fi

bootstrap: venv requirements

check-coding-style: bootstrap
	$(PEP8) $(PYTHON_MODULES)
	$(PYLINT) -E $(PYTHON_MODULES)

pylint-full: check-coding-style
	$(PYLINT) $(PYTHON_MODULES)

test: check-coding-style
	$(PYTEST) $(PYTHON_MODULES)

check:
	$(PYTEST) $(PYTHON_MODULES)

.PHONY: default venv requirements bootstrap check-coding-style pylint-full test check
```
To run all of pep8/pylint/pytest, just type `make test`. It will create a virtualenv in the .venv directory, install dependencies, and run the checkers. To run only pytest, type `make check`. You may want to run only pytest on a large project, where pylint runs can significantly slow down your develop-test cycle.
This Makefile is a trimmed version that focuses only on the tools mentioned in this section, to make it easier to understand. You may also read the full version. That version has more make targets, and some targets depend on external scripts that I may write posts about later on. You will need to adapt it yourself.
If you use git, you can configure a pre-commit hook that automatically runs `make test` for you when you run `git commit`. If any tests fail, the commit aborts. To do that, create a file `.git/hooks/pre-commit` with the following content:
```shell
#!/bin/sh
make test
```
Then make it executable:

```shell
chmod +x .git/hooks/pre-commit
```
You can also configure post-commit and post-merge hooks that trigger a Jenkins CI build, but this post is already too long, so I won't go into the details.
Writing Sanity-Check Script
Sometimes, a project has external dependencies at runtime. If an external service is down, the project cannot run.
These dependencies include external databases, network connectivity, system commands, credentials to third-party systems, disk space requirements, etc.
As a responsible developer, you should provide a sanity-check script that checks all external environmental dependencies. This will help the sysadmin team locate problems when deploying your application for the first time, or when something goes wrong.
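A minimal sketch of such a script might look like this. The individual checks, hosts, ports, and thresholds are all illustrative, and the disk-space check assumes a Unix-like system:

```python
#!/usr/bin/env python
"""Sanity-check external runtime dependencies before starting the app."""
from __future__ import print_function

import os
import shutil
import socket
import sys


def check_disk_space(path="/", min_free_bytes=1 << 30):
    """Require at least min_free_bytes free on the given filesystem (Unix)."""
    stat = os.statvfs(path)
    return stat.f_bavail * stat.f_frsize >= min_free_bytes


def check_tcp_service(host, port, timeout=3):
    """Check that a TCP service (e.g. a database) accepts connections."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False


def check_command(name):
    """Check that a required system command is on PATH (Python 3.3+)."""
    return shutil.which(name) is not None


def main():
    # Adapt this list to your project's actual external dependencies.
    checks = [
        ("disk space on /", lambda: check_disk_space("/")),
        ("database reachable", lambda: check_tcp_service("localhost", 5432)),
        ("git installed", lambda: check_command("git")),
    ]
    failed = [name for name, check in checks if not check()]
    for name in failed:
        print("FAIL: %s" % name, file=sys.stderr)
    sys.exit(1 if failed else 0)


if __name__ == "__main__":
    main()
```

The script exits non-zero when any check fails, so it can be called from deployment scripts or the pre-commit hook above.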
Enabling Logging
Logging should be part of every project. Python's logging module is powerful and flexible, but it's difficult to learn and configure.
My suggestion is to configure it once and forget about it. I learned it from the official logging module documentation. It has a lot of content, but it is not that difficult to understand.
Using the logging module is quite easy compared to configuring it. Here is how you use logging in any module:
```python
import logging

logger = logging.getLogger(__name__)


def my_function(param1):
    logger.info(u"running my_function()")
    # ...
    logger.error(u"something goes wrong, param1=%s", param1)
    # ...
```
When writing logs in your app, make sure your logs are useful. Specifically, try not to flood your log file, and make every line helpful.
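The configure-once part can be done at program startup with `logging.config.dictConfig`. Here is a minimal sketch; the format string, handler, and level are just example choices:

```python
import logging
import logging.config

# Configure logging once, at program startup; every module then just
# calls logging.getLogger(__name__) as shown above.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "default",
        },
    },
    "root": {
        "handlers": ["console"],
        "level": "INFO",
    },
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger(__name__)
logger.info("logging configured")
```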
Do Not Crash
If the program you wrote is a daemon, a script that runs for a long time, or a script that runs over many items (e.g. files), it's important that it does not crash halfway. Writing code that anticipates failure like this is sometimes called defensive programming.
In Python, you should know where the program could fail and deal with the failure accordingly. Here are some ways to avoid crashing:
- Capture exceptions properly
KeyError, IndexError, IOError, DatabaseError, and UnicodeDecodeError are very common exceptions. You should know when an operation could raise them and handle them in the correct place.
- Validate user inputs
Check your assumptions and rule out bad user input. Write total functions whenever you can; a total function returns a sensible result for every possible input instead of raising on unexpected ones.
- Implement retry
When the error is environmental, such as the network being down temporarily, implementing a retry can save a lot of hassle. Automatic retry can be implemented with a decorator, which makes it easy to use.
- Fail fast
When you notice you can't fulfill a request, you can choose to fail fast. For example, when an action requires a long computation whose result is then written to a database, it's best to check that the database connection is okay before doing any computation. Otherwise, the computation may be wasted.
In a web app, when the user's authentication or a request parameter is invalid, you should fail fast and return a 4xx immediately.
- Deploy daemons using upstart, systemd or supervisord
Nowadays, when you need to run a daemon, you don't need to do it the traditional Unix way anymore. Use one of these service management tools. They can start a process for you, monitor it, and restart it when it crashes. Some provide stdout and stderr logging as well.
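The auto-retry idea mentioned above can be sketched as a decorator. The names and defaults here are illustrative, not a standard library API:

```python
import functools
import time


def retry(times=3, delay=1, exceptions=(IOError, OSError)):
    """Retry the wrapped function on the given exception types.

    Makes up to `times` attempts, sleeping `delay` seconds between them,
    and re-raises the last exception if every attempt fails.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_error = e
                    if attempt < times - 1:
                        time.sleep(delay)
            raise last_error
        return wrapper
    return decorator


@retry(times=3, delay=0, exceptions=(ValueError,))
def flaky(counter=[0]):
    """Demo function that fails twice, then succeeds on the third call."""
    counter[0] += 1
    if counter[0] < 3:
        raise ValueError("temporary failure")
    return "ok"
```

With the decorator applied, calling `flaky()` once succeeds, because the two transient failures are retried transparently.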
Summary
May your new project go well. If you have questions, leave a comment.