This post was written by Marcus McCurdy, Software Engineer for Toptal

Discussions criticizing Python often talk about how it is difficult to use Python for multithreaded work, pointing fingers at what is known as the global interpreter lock (affectionately referred to as the GIL) that prevents multiple threads of Python code from running simultaneously. Due to this, the Python multithreading module doesn’t quite behave the way you would expect it to if you’re not a Python developer and you are coming from other languages such as C++ or Java. It must be made clear that one can still write code in Python that runs concurrently or in parallel and make a stark difference in resulting performance, as long as certain things are taken into consideration. If you haven’t read it yet, I suggest you take a look at Eqbal Quran’s article on concurrency and parallelism in Ruby here on the Toptal Engineering Blog.

In this Python concurrency tutorial, we will write a small Python script to download the top popular images from Imgur. We will start with a version that downloads images sequentially, or one at a time. As a prerequisite, you will have to register an application on Imgur. If you do not have an Imgur account already, please create one first.

The scripts in this tutorial have been tested with Python 3.6.4. With some changes, they should also run with Python 2—urllib is what has changed the most between these two versions of Python.

Getting Started with Python Multithreading

Let us start by creating a Python module, named This file will contain all the functions necessary to fetch the list of images and download them. We will split these functionalities into three separate functions:

  • get_links
  • download_link
  • setup_download_dir

The third function, setup_download_dir, will be used to create a download destination directory if it doesn’t already exist.

Imgur’s API requires HTTP requests to bear the Authorization header with the client ID. You can find this client ID from the dashboard of the application that you have registered on Imgur, and the response will be JSON encoded. We can use Python’s standard JSON library to decode it. Downloading the image is an even simpler task, as all you have to do is fetch the image by its URL and write it to a file.

This is what the script looks like:

import json
import logging
import os
from pathlib import Path
from urllib.request import urlopen, Request

logger = logging.getLogger(__name__)

def get_links(client_id):
   headers = {'Authorization': 'Client-ID {}'.format(client_id)}
   req = Request('', headers=headers, method='GET')
   with urlopen(req) as resp:
       data = json.loads(resp.readall().decode('utf-8'))
   return map(lambda item: item['link'], data['data'])

def download_link(directory, link):'Downloading %s', link)
   download_path = directory / os.path.basename(link)
   with urlopen(link) as image,'wb') as f:

def setup_download_dir():
   download_dir = Path('images')
   if not download_dir.exists():
   return download_dir

Next, we will need to write a module that will use these functions to download the images, one by one. We will name this This will contain the main function of our first, naive version of the Imgur image downloader. The module will retrieve the Imgur client ID in the environment variable IMGUR_CLIENT_ID. It will invoke the setup_download_dir to create the download destination directory. Finally, it will fetch a list of images using the get_links function, filter out all GIF and album URLs, and then use download_link to download and save each of those images to the disk. Here is what looks like:

import logging
import os
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def main():
   ts = time()
   client_id = os.getenv('IMGUR_CLIENT_ID')
   if not client_id:
       raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
   download_dir = setup_download_dir()
   links = [l for l in get_links(client_id) if l.endswith('.jpg')]
   for link in links:
       download_link(download_dir, link)
   print('Took {}s'.format(time() - ts))

if __name__ == '__main__':

On my laptop, this script took 19.4 seconds to download 91 images. Please do note that these numbers may vary based on the network you are on. 19.4 seconds isn’t terribly long, but what if we wanted to download more pictures? Perhaps 900 images, instead of 90. With an average of 0.2 seconds per picture, 900 images would take approximately 3 minutes. For 9000 pictures it would take 30 minutes. The good news is that by introducing concurrency or parallelism, we can speed this up dramatically.

All subsequent code examples will only show import statements that are new and specific to those examples. For convenience, all of these Python scripts can be found in this GitHub repository.

Using Threads for Concurrency and Parallelism

Threading is one of the most well-known approaches to attaining Python concurrency and parallelism. Threading is a feature usually provided by the operating system. Threads are lighter than processes, and share the same memory space.

Python multithreading memory model

In our Python threading tutorial, we will write a new module to replace This module will create a pool of eight threads, making a total of nine threads including the main thread. I chose eight worker threads because my computer has eight CPU cores and one worker thread per core seemed a good number for how many threads to run at once. In practice, this number is chosen much more carefully based on other factors, such as other applications and services running on the same machine.

This is almost the same as the previous one, with the exception that we now have a new class, DownloadWorker, which is a descendent of the Python Thread class. The run method has been overridden, which runs an infinite loop. On every iteration, it calls self.queue.get() to try and fetch a URL to from a thread-safe queue. It blocks until there is an item in the queue for the worker to process. Once the worker receives an item from the queue, it then calls the same download_link method that was used in the previous script to download the image to the images directory. After the download is finished, the worker signals the queue that that task is done. This is very important, because the Queue keeps track of how many tasks were enqueued. The call to queue.join() would block the main thread forever if the workers did not signal that they completed a task.

from queue import Queue
from threading import Thread

class DownloadWorker(Thread):
   def __init__(self, queue):
       self.queue = queue

   def run(self):
       while True:
           # Get the work from the queue and expand the tuple
           directory, link = self.queue.get()
           download_link(directory, link)

def main():
   ts = time()
   client_id = os.getenv('IMGUR_CLIENT_ID')
   if not client_id:
       raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
   download_dir = setup_download_dir()
   links = [l for l in get_links(client_id) if l.endswith('.jpg')]
   # Create a queue to communicate with the worker threads
   queue = Queue()
   # Create 8 worker threads
   for x in range(8):
       worker = DownloadWorker(queue)
       # Setting daemon to True will let the main thread exit even though the workers are blocking
       worker.daemon = True
   # Put the tasks into the queue as a tuple
   for link in links:'Queueing {}'.format(link))
       queue.put((download_dir, link))
   # Causes the main thread to wait for the queue to finish processing all the tasks
   print('Took {}'.format(time() - ts))

Running this Python threading example script on the same machine used earlier results in a download time of 4.1 seconds! That’s 4.7 times faster than the previous example. While this is much faster, it is worth mentioning that only one thread was executing at a time throughout this process due to the GIL. Therefore, this code is concurrent but not parallel. The reason it is still faster is because this is an IO bound task. The processor is hardly breaking a sweat while downloading these images, and the majority of the time is spent waiting for the network. This is why Python multithreading can provide a large speed increase. The processor can switch between the threads whenever one of them is ready to do some work. Using the threading module in Python or any other interpreted language with a GIL can actually result in reduced performance. If your code is performing a CPU bound task, such as decompressing gzip files, using the threading module will result in a slower execution time. For CPU bound tasks and truly parallel execution, we can use the multiprocessing module.

While the de facto reference Python implementation—CPython–has a GIL, this is not true of all Python implementations. For example, IronPython, a Python implementation using the .NET framework, does not have a GIL, and neither does Jython, the Java-based implementation. You can find a list of working Python implementations here.

Related: Python Best Practices and Tips by Toptal Developers

Spawning Multiple Processes

The multiprocessing module is easier to drop in than the threading module, as we don’t need to add a class like the Python threading example. The only changes we need to make are in the main function.

Python multiprocessing tutorial: Modules

To use multiple processes, we create a multiprocessing Pool. With the map method it provides, we will pass the list of URLs to the pool, which in turn will spawn eight new processes and use each one to download the images in parallel. This is true parallelism, but it comes with a cost. The entire memory of the script is copied into each subprocess that is spawned. In this simple example, it isn’t a big deal, but it can easily become serious overhead for non-trivial programs.

from functools import partial
from multiprocessing.pool import Pool

def main():
   ts = time()
   client_id = os.getenv('IMGUR_CLIENT_ID')
   if not client_id:
       raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
   download_dir = setup_download_dir()
   links = [l for l in get_links(client_id) if l.endswith('.jpg')]
   download = partial(download_link, download_dir)
   with Pool(8) as p:, links)
   print('Took {}s'.format(time() - ts))

Distributing to Multiple Workers

While the threading and multiprocessing modules are great for scripts that are running on your personal computer, what should you do if you want the work to be done on a different machine, or you need to scale up to more than the CPU on one machine can handle? A great use case for this is long-running back-end tasks for web applications. If you have some long-running tasks, you don’t want to spin up a bunch of sub-processes or threads on the same machine that need to be running the rest of your application code. This will degrade the performance of your application for all of your users. What would be great is to be able to run these jobs on another machine, or many other machines.

A great Python library for this task is RQ, a very simple yet powerful library. You first enqueue a function and its arguments using the library. This pickles the function call representation, which is then appended to a Redis list. Enqueueing the job is the first step, but will not do anything yet. We also need at least one worker to listen on that job queue.

Model of the RQ Python queue library

The first step is to install and run a Redis server on your computer, or have access to a running Redis server. After that, there are only a few small changes made to the existing code. We first create an instance of an RQ Queue and pass it an instance of a Redis server from the redis-py library. Then, instead of just calling our download_link method, we call q.enqueue(download_link, download_dir, link). The enqueue method takes a function as its first argument, then any other arguments or keyword arguments are passed along to that function when the job is actually executed.

One last step we need to do is to start up some workers. RQ provides a handy script to run workers on the default queue. Just run rqworker in a terminal window and it will start a worker listening on the default queue. Please make sure your current working directory is the same as where the scripts reside in. If you want to listen to a different queue, you can run rqworker queue_name and it will listen to that named queue. The great thing about RQ is that as long as you can connect to Redis, you can run as many workers as you like on as many different machines as you like; therefore, it is very easy to scale up as your application grows. Here is the source for the RQ version:

from redis import Redis
from rq import Queue

def main():
   client_id = os.getenv('IMGUR_CLIENT_ID')
   if not client_id:
       raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
   download_dir = setup_download_dir()
   links = [l for l in get_links(client_id) if l.endswith('.jpg')]
   q = Queue(connection=Redis(host='localhost', port=6379))
   for link in links:
       q.enqueue(download_link, download_dir, link)

However, RQ is not the only Python job queue solution. RQ is easy to use and covers simple use cases extremely well, but if more advanced options are required, other Python 3 queue solutions (such as Celery) can be used.

Python Multithreading vs. Multiprocessing

If your code is IO bound, both multiprocessing and multithreading in Python will work for you. Multiprocessing is a easier to just drop in than threading but has a higher memory overhead. If your code is CPU bound, multiprocessing is most likely going to be the better choice—especially if the target machine has multiple cores or CPUs. For web applications, and when you need to scale the work across multiple machines, RQ is going to be better for you.


Python concurrent.futures

Something new since Python 3.2 that wasn’t touched upon in the original article is the concurrent.futurespackage. This package provides yet another way to use concurrency and parallelism with Python.

In the original article, I mentioned that Python’s multiprocessing module would be easier to drop into existing code than the threading module. This was because the Python 3 threading module required subclassing the Thread class and also creating a Queue for the threads to monitor for work.

Using a concurrent.futures.ThreadPoolExecutor makes the code almost identical to the multiprocessing module.

import logging
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from time import time

from download import setup_download_dir, get_links, download_link

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)

def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    links = get_links(client_id)

    # By placing the executor inside a with block, the executors shutdown method
    # will be called cleaning up threads.
    # By default, the executor sets number of workers to 5 times the number of
    # CPUs.
    with ThreadPoolExecutor() as executor:

        # Create a new partially applied function that stores the directory
        # argument.
        # This allows the download_link function that normally takes two
        # arguments to work with the map function that expects a function of a
        # single argument.
        fn = partial(download_link, download_dir)

        # Executes fn concurrently using threads on the links iterable. The
        # timeout is for the entire process, not a single call, so downloading
        # all images must complete within 30 seconds., links, timeout=30)

if __name__ == '__main__':

Now that we have all these images downloaded with our Python ThreadPoolExecutor, we can use them to test a CPU-bound task. We can create thumbnail versions of all the images in both a single-threaded, single-process script and then test a multiprocessing-based solution.

We are going to use the Pillow library to handle the resizing of the images.

Here is our initial script.

import logging
from pathlib import Path
from time import time

from PIL import Image

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)

def create_thumbnail(size, path):
    Creates a thumbnail of an image with the same name as image but with
    _thumbnail appended before the extension.  E.g.:

    >>> create_thumbnail((128, 128), 'image.jpg')

    A new thumbnail image is created with the name image_thumbnail.jpg

    :param size: A tuple of the width and height of the image
    :param path: The path to the image file
    :return: None
    image =
    path = Path(path)
    name = path.stem + '_thumbnail' + path.suffix
    thumbnail_path = path.with_name(name)

def main():
    ts = time()
    for image_path in Path('images').iterdir():
        create_thumbnail((128, 128), image_path)'Took %s', time() - ts)

if __name__ == '__main__':

This script iterates over the paths in the images folder and for each path it runs the create_thumbnail function. This function uses Pillow to open the image, create a thumbnail, and save the new, smaller image with the same name as the original but with _thumbnail appended to the name.

Running this script on 160 images totaling 36 million takes 2.32 seconds. Lets see if we can speed this up using a ProcessPoolExecutor.

import logging
from pathlib import Path
from time import time
from functools import partial

from concurrent.futures import ProcessPoolExecutor

from PIL import Image

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

logger = logging.getLogger(__name__)

def create_thumbnail(size, path):
    Creates a thumbnail of an image with the same name as image but with
    _thumbnail appended before the extension. E.g.:

    >>> create_thumbnail((128, 128), 'image.jpg')

    A new thumbnail image is created with the name image_thumbnail.jpg

    :param size: A tuple of the width and height of the image
    :param path: The path to the image file
    :return: None
    path = Path(path)
    name = path.stem + '_thumbnail' + path.suffix
    thumbnail_path = path.with_name(name)
    image =

def main():
    ts = time()
    # Partially apply the create_thumbnail method, setting the size to 128x128
    # and returning a function of a single argument.
    thumbnail_128 = partial(create_thumbnail, (128, 128))

    # Create the executor in a with block so shutdown is called when the block
    # is exited.
    with ProcessPoolExecutor() as executor:, Path('images').iterdir())'Took %s', time() - ts)

if __name__ == '__main__':

The create_thumbnail method is identical to the last script. The main difference is the creation of a ProcessPoolExecutor. The executor’s map method is used to create the thumbnails in parallel. By default, the ProcessPoolExecutor creates one subprocess per CPU. Running this script on the same 160 images took 1.05 seconds—2.2 times faster!

Async/Await (Python 3.5+ only)

One of the most requested items in the comments on the original article was for an example using Python 3’s asyncio module. Compared to the other examples, there is some new Python syntax that may be new to most people and also some new concepts. An unfortunate additional layer of complexity is caused by Python’s built-in urllib module not being asynchronous. We will need to use an async HTTP library to get the full benefits of asyncio. For this, we’ll use aiohttp.

Let’s jump right into the code and a more detailed explanation will follow.

import asyncio
import logging
import os
from time import time

import aiohttp

from download import setup_download_dir, get_links

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

async def async_download_link(session, directory, link):
    Async version of the download_link method we've been using in the other examples.
    :param session: aiohttp ClientSession
    :param directory: directory to save downloads
    :param link: the url of the link to download
    download_path = directory / os.path.basename(link)
    async with session.get(link) as response:
        with'wb') as f:
            while True:
                # await pauses execution until the 1024 (or less) bytes are read from the stream
                chunk = await
                if not chunk:
                    # We are done reading the file, break out of the while loop
                f.write(chunk)'Downloaded %s', link)

# Main is now a coroutine
async def main():
    client_id = os.getenv('IMGUR_CLIENT_ID')
    if not client_id:
        raise Exception("Couldn't find IMGUR_CLIENT_ID environment variable!")
    download_dir = setup_download_dir()
    # We use a session to take advantage of tcp keep-alive
    # Set a 3 second read and connect timeout. Default is 5 minutes
    async with aiohttp.ClientSession(conn_timeout=3, read_timeout=3) as session:
        tasks = [(async_download_link(session, download_dir, l)) for l in get_links(client_id)]
        # gather aggregates all the tasks and schedules them in the event loop
        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == '__main__':
    ts = time()
    # Create the asyncio event loop
    loop = asyncio.get_event_loop()
        # Shutdown the loop even if there is an exception
        loop.close()'Took %s seconds to complete', time() - ts)

There is quite a bit to unpack here. Let’s start with the main entry point of the program. The first new thing we do with the asyncio module is to obtain the event loop. The event loop handles all of the asynchronous code. Then, the loop is run until complete and passed the main function. There is a piece of new syntax in the definition of main: async def. You’ll also notice await and with async.

The async/await syntax was introduced in PEP492. The async def syntax marks a function as a coroutine. Internally, coroutines are based on Python generators, but aren’t exactly the same thing. Coroutines return a coroutine object similar to how generators return a generator object. Once you have a coroutine, you obtain its results with the await expression. When a coroutine calls await, execution of the coroutine is suspended until the awaitable completes. This suspension allows other work to be completed while the coroutine is suspended “awaiting” some result. In general, this result will be some kind of I/O like a database request or in our case an HTTP request.

The download_link function had to be changed pretty significantly. Previously, we were relying on urllib to do the brunt of the work of reading the image for us. Now, to allow our method to work properly with the async programming paradigm, we’ve introduced a while loop that reads chunks of the image at a time and suspends execution while waiting for the I/O to complete. This allows the event loop to loop through downloading the different images as each one has new data available during the download.

There Should Be One—Preferably Only One—Obvious Way to Do It

While the zen of Python tells us there should be one obvious way to do something, there are many ways in Python to introduce concurrency into our programs. The best method to choose is going to depend on your specific use case. The asynchronous paradigm scales better to high-concurrency workloads (like a webserver) compared to threading or multiprocessing, but it requires your code (and dependencies) to be async in order to fully benefit.

Hopefully this article—and update—will point you in the right direction so you have an idea of where to look in the Python standard library if you need to introduce concurrency into your programs.

About the Author: Marcus has a Bachelor’s in Computer Engineering and a Master’s in Computer Science. He is a talented programmer, and excels most at back-end development. However, he is comfortable creating polished products as a full stack developer.

This post was written by Shanglung Wang, Python Developer for Toptal.

Operations research, the science of using computers to make optimal decisions, has been used and applied in many areas of software and programming. To give a few examples, logistics companies use it to determine the optimal route to get from point A to point B, power companies use it to determine power production schedules, and manufacturing companies use it to find optimal staffing patterns for their factories.

Mixed-integer programming

Today, we will explore the power of operations research by walking through a hypothetical problem. Specifically, we will use mixed-integer programming (MIP) to determine the optimal staffing pattern for a hypothetical factory.

Operations Research Algorithms

Before we dive into our example problem, it is helpful to go over some basic concepts in operations research and understand a bit of the tools we will be using today.

Linear Programming Algorithms

Linear programming is an operations research technique used to determine the best outcome in a mathematical model where the objective and the constraints are expressed as a system of linear equations. An example linear programming model might look like this:

Maximize a + b (objective)
Subject to:
a <= 2 (constraint 1)
b <= 3 (constraint 2)

In our very simple example, we can see that the optimal outcome is 5, with a = 2 and b = 3. While this is a rather trivial example, you can probably imagine a linear programming model that utilizes thousands of variables and hundreds of constraints.

A good example might be an investment portfolio problem, where you might end up with something like the below, in pseudo-code:

Maximize <expected profit from all stock investments>

Subject to:
<investment in the technology sector must be between 10% - 20% of portfolio>
<investment in the consumer staples must not exceed investment in financials>

Which would be rather complex and difficult to solve by hand or trial and error. Operations research software will use several algorithms to solve these problems quickly. If you’re interested in the underlying algorithms, I recommend learning about the simplex method here and the interior point method here.

Integer Programming Algorithms

Integer programming is like linear programming with an additional allowance for some or all of the variables to be integer values. While this may not seem like a large improvement at first, it allows us to solve many problems that could have remained unsolved using linear programming alone.

One such problem is the knapsack problem, where we are given a set of items with assigned values and weights and are asked to find the highest value combination of items we can fit into our knapsack. A linear programming model will not be able to solve this because there is no way to express the idea that you can either put an item into your knapsack or not, but you cannot put part of an item into your knapsack—every variable is a continuous variable!

An example knapsack problem might have the following parameters:

ObjectValue (units of $10)Size (generic units)
Mysterious figurine74
Huge bottle of apple cider27
French horn1010

And the MIP model will look like this:

Maximize 5a + 7b + 2c + 10d (objective: maximize value of items take)
Subject to:
    2a + 4b + 7c + 10d <= 15 (space constraint)

The optimal solution, in this case, is a=0, b=1, c=0, d=1, with the value of the total item being 17.

The problem we will solve today will also require integer programming since an employee at a factory can either be scheduled for a shift or not—for the sake of simplicity, you cannot schedule an employee for 2/3 of a shift at this factory.

To make integer programming possible, several mathematical algorithms are used. If you are interested in the underlying algorithms, I recommend studying the cutting-planes algorithm and the branch-and-bound algorithm here.

Example Problem: Scheduling

Problem Description

Today, we will explore the problem of staffing a factory. As the management of the factory, we will want to minimize labor costs, but we want to ensure sufficient coverage for every shift to meet production demand.

Suppose we have five shifts with the following staffing demands:

Shift 1Shift 2Shift 3Shift 4Shift 5
1 person4 people3 people5 people2 people

And, suppose we have the following workers:

NameAvailabilityCost per Shift ($)
Melisandre1, 2, 520
Bran2, 3, 4, 515
Cersei3, 435
Daenerys4, 535
Theon2, 4, 510
Jon1, 3, 525
Tyrion2, 4, 530
Jaime2, 3, 520
Arya1, 2, 420

To keep the problem simple, let us assume for the moment that availability and cost are the sole concerns.

Our Tools

For today’s problem, we will use a piece of open source branch-and-cut software called CBC. We will interface with this software using PuLP, which is a popular operations research modeling library for Python. You can find information about it here.

PuLP allows us to construct MIP models in a very Pythonic fashion and integrates nicely with existing Python code. This is very useful as some of the most popular data manipulation and analysis tools are written in Python, and most commercial operations research systems require extensive data processing before and after the optimization.

To demonstrate the simplicity and elegance of PuLP, here is the knapsack problem from earlier and solved in PuLP:

import pulp as pl

# declare some variables
# each variable is a binary variable that is either 0 or 1
# 1 means the item will go into the knapsack
a = pl.LpVariable("a", 0, 1, pl.LpInteger)
b = pl.LpVariable("b", 0, 1, pl.LpInteger)
c = pl.LpVariable("c", 0, 1, pl.LpInteger)
d = pl.LpVariable("d", 0, 1, pl.LpInteger)

# define the problem
prob = pl.LpProblem("knapsack", pl.LpMaximize)

# objective function - maximize value of objects in knapsack
prob += 5 * a + 7 * b + 2 * c + 10 * d

# constraint - weight of objects cannot exceed 15
prob += 2 * a + 4 * b + 7 * c + 10 * d <= 15

status = prob.solve()  # solve using the default solver, which is cbc
print(pl.LpStatus[status])  # print the human-readable status

# print the values
print("a", pl.value(a))
print("b", pl.value(b))
print("c", pl.value(c))
print("d", pl.value(d))

Run this, and you should get the output:

a 0.0
b 1.0
c 0.0
d 1.0

Now onto our scheduling problem!

Coding Our Solution

The coding of our solution is fairly straightforward. First, we will want to define our data:

import pulp as pl
import collections as cl

# data
shift_requirements = [1, 4, 3, 5, 2]
workers = {
    "Melisandre": {
        "availability": [0, 1, 4],
        "cost": 20
    "Bran": {
        "availability": [1, 2, 3, 4],
        "cost": 15
    "Cersei": {
        "availability": [2, 3],
        "cost": 35
    "Daenerys": {
        "availability": [3, 4],
        "cost": 35
    "Theon": {
        "availability": [1, 3, 4],
        "cost": 10
    "Jon": {
        "availability": [0, 2, 4],
        "cost": 25
    "Tyrion": {
        "availability": [1, 3, 4],
        "cost": 30
    "Jaime": {
        "availability": [1, 2, 4],
        "cost": 20
    "Arya": {
        "availability": [0, 1, 3],
        "cost": 20

Next, we will want to define the model:

# define the model: we want to minimize the cost
prob = pl.LpProblem("scheduling", pl.LpMinimize)

# some model variables
cost = []
vars_by_shift = cl.defaultdict(list)

for worker, info in workers.items():
    for shift in info['availability']:
        worker_var = pl.LpVariable("%s_%s" % (worker, shift), 0, 1, pl.LpInteger)
        cost.append(worker_var * info['cost'])

# set the objective to be the sum of cost
prob += sum(cost)

# set the shift requirements
for shift, requirement in enumerate(shift_requirements):
    prob += sum(vars_by_shift[shift]) >= requirement

And now we just ask it to solve and print the results!

status = prob.solve()
print("Result:", pl.LpStatus[status])
results = []
for shift, vars in vars_by_shift.items():
        "shift": shift,
        "workers": [ for var in vars if var.varValue == 1],

for result in sorted(results, key=lambda x: x['shift']):
    print("Shift:", result['shift'], 'workers:', ', '.join(result['workers']))

Once you run the code, you should see the following outputs:

Result: Optimal
Shift: 0 workers: Arya_0
Shift: 1 workers: Melisandre_1, Bran_1, Theon_1, Arya_1
Shift: 2 workers: Bran_2, Jon_2, Jaime_2
Shift: 3 workers: Bran_3, Daenerys_3, Theon_3, Tyrion_3, Arya_3
Shift: 4 workers: Bran_4, Theon_4

Crank Up the Difficulty a Bit: Additional Constraints

Even though the previous model was interesting and useful, it does not fully demonstrate the power of mixed-integer programming. We could also easily write a for loop to find the cheapest x workers for every shift, where x is the number of workers needed for a shift. To demonstrate how MIP can be used to solve a problem that would be challenging to solve in an imperative fashion, let us examine what would happen if we add a few extra constraints.

Additional Constraints

Suppose that, due to new labor regulations, no workers can be assigned to more than 2 shifts. To account for the increased work, we have recruited the help of Dothraki Staffing Group, who will supply up to 10 Dothraki workers for each shift at a rate of $40 per shift.

Additionally suppose that, due to some ongoing interpersonal conflicts outside of our factory, Cersei and Jaime are unable to work any shifts alongside either Daenerys or Jon. Additionally, Jaime and Cersei are unable to work any shifts together. Finally, Arya, who has particularly complex interpersonal relationships, is unable to work in the same shift with Jaime, Cersei, or Melisandre.

The addition of these two new constraints and resources by no means makes the problem unsolvable through imperative means, but it makes the solution much harder, as one will have to account for opportunity costs of scheduling a worker to a particular shift.


With mixed-integer programming, however, it is much easier. We simply need to amend our code like so:

Define the ban list and some constraints:

ban_list = {
    ("Daenerys", "Jaime"),
    ("Daenerys", "Cersei"),
    ("Jon", "Jaime"),
    ("Jon", "Cersei"),
    ("Arya", "Jaime"),
    ("Arya", "Cersei"),
    ("Arya", "Melisandre"),
    ("Jaime", "Cersei")


Populate variables for implementing the ban and max shift constraints:

for worker, info in workers.items():
    for shift in info['availability']:
        worker_var = pl.LpVariable("%s_%d" % (worker, shift), 0, 1, pl.LpInteger)
        # store some variable data so we can implement the ban constraint
        var_data = (worker,)
        vars_by_shift[shift].append((worker_var, var_data))
        # store vars by variable so we can implement the max shift constraint
        cost.append(worker_var * info['cost'])

Add the Dothraki variables:

for shift in range(len(shift_requirements)):
    dothraki_var = pl.LpVariable('dothraki_%d' % shift, 0, DOTHRAKI_MAX, pl.LpInteger)
    cost.append(dothraki_var * DOTHRAKI_COST)
    dothrakis_by_shift[shift] = dothraki_var

We will also need a slightly modified loop for computing shift and ban requirements:

# set the shift requirements
for shift, requirement in enumerate(shift_requirements):
    prob += sum([var[0] for var in vars_by_shift[shift]]) + dothrakis_by_shift[shift] >= requirement

# set the ban requirements
for shift, vars in vars_by_shift.items():
    for var1 in vars:
        for var2 in vars:
            worker_pair = var1[1][0], var2[1][0]
            if worker_pair in ban_list:
                prob += var1[0] + var2[0] <= 1

# set the labor standards:
for worker, vars in vars_by_worker.items():
    prob += sum(vars) <= 2

And finally, to print the results:

status = prob.solve()
print("Result:", pl.LpStatus[status])
results = []
for shift, vars in vars_by_shift.items():
        "shift": shift,
        "workers": [var[1][0] for var in vars if var[0].varValue == 1],
        "dothrakis": dothrakis_by_shift[shift].varValue

for result in sorted(results, key=lambda x: x['shift']):
    print("Shift:", result['shift'], 'workers:', ', '.join(result['workers']), 'dothrakis hired:', int(result['dothrakis']))

And we should be good to go. Run the code and you should see output as below:

Result: Optimal
Shift: 0 workers: Arya dothrakis hired: 0
Shift: 1 workers: Melisandre, Theon, Tyrion, Jaime dothrakis hired: 0
Shift: 2 workers: Bran, Jon dothrakis hired: 1
Shift: 3 workers: Bran, Daenerys, Theon, Tyrion, Arya dothrakis hired: 0
Shift: 4 workers: Melisandre, Jaime dothrakis hired: 0

And there we have it, a result that respects the banned workers’ list, follows labor regulations, and uses Dothraki workers judiciously.


Today, we explored using mixed-integer programming to make better decisions. We discussed the underlying algorithms and techniques used in operations research and looked at an example problem that is representative of how mixed-integer programming is used in the real world.

I hope this article inspired you to learn more about operations research and made you think about how this technology can be applied to your projects. We’ve only really seen the tip of the iceberg when it comes to the exciting world of optimization algorithms and operations research.

You can find all the code related to this article on GitHub. If you would like to discuss more, share your comments below!

This post was written by Shanglung Wang, Python Developer for Toptal.

Here are the top 15 places to find a Python developer:

1. Toptal

Toptal is a matching service, initially created with only tech talent in mind. Although it has expanded its pool of talent to include designers and finance experts, the company’s bread and butter is its developer vertical. If you want to be sure that a Python developer is up to the job, hiring an exceptional developer from Toptal is likely your best option.

Why? Toptal boasts an elite developer base. Their trademark system for vetting talent allows for only the best to become a part of their community. According to Toptal, only 3% of applicants make it through their battery of technical tests and their comprehensive vetting process.

2. Hired

Hired helps employers find software engineers and developers. On Hired, you can use their pipeline to find custom matches. You can create a company profile, search for candidates using their search algorithm (which can eliminate gender and racial identifiers for fairer hiring), and request interviews with candidates.

What’s best about Hired? It’s great for finding specialized Python developers who are actively searching for new opportunities, have relevant experience (as most candidates on Hired have at least two years of experience), and are in your area.

3. We Work Remotely

We Work Remotely is a job board dedicated to remote listings. As a result of the remote-only restriction, there is a higher than average amount of tech and tech-creative hybrid job postings, which include front-end web developer and Python developer positions.

Posting a job listing may be a bit more expensive than other job boards listed at $299 a month. However, if you’re looking to fill a remote position and are uninterested in recruiting local employees or freelancers, you should strongly consider utilizing We Work Remotely.

4. GitHub Jobs

Don’t waste your time perusing large job boards like Monster and Indeed. You’ll have far better luck with job boards geared toward tech talent. GitHub has a massive front-end developer community as it’s one of the largest open-source online repositories for coders. For a relatively small fee, you can post a Python developer job listing and gain a great deal of exposure on GitHub’s huge developer community.


The official Python job board is one of the surest ways to find a qualified Python developer. While you won’t be personally matched by tech professionals as is the case with Toptal or Hired, it is one of the best communities focused solely on advertising Python job openings.

If you’re open to writing up a job description, querying interested candidates, and conducting a technical interview yourself, and would like to forgo a massive general job board like Indeed, Monster, and GitHub, you may want to strongly consider’s job board.

6. Authentic Jobs

Authentic Jobs is a job board for leading web, design, and creative talent. It has been steadily rising in popularity since its inception; the board is used by The New York Times in part of its acquisition process.

As Python development often involves both creative and technical aspects, Authentic Jobs is a great place to begin your search. Much of their job board is populated with listings of web developer and Python developer positions which are likely similar to your own needs.

Authentic Jobs allows for posting developer positions remote or local, so you won’t be limited to remote employees or freelancers.

7. Remote Python

Remote Python allows you to find remote Python developers by posting job ads that fit their guidelines. If you’re open to hiring remote Python developers and to performing the vetting and screening yourself, the Remote Python job board could help supplement your search.

In addition to posting job ads on the site, you’ll be able to search Python developers registered with the site. It’s important to take into account that these developers aren’t vetted, however. If you want developers selected based on stringent technical criteria, you should consider using a talent matching service.

8. Stack Overflow

Stack Overflow has an online community that rivals GitHub. Arguably, it’s the absolute largest and most trusted community of developers on the web. Stack Overflow is often used as a resource for all kinds of developers, novice to expert, seeking to learn more about coding. Their job board, like GitHub’s, allows for an incredible amount of exposure to dedicated Python developers around the world.

9. Upwork

If recruiting services and job boards aren’t your thing, you might want to consider a freelance marketplace like Upwork.

Upwork has one of the largest marketplaces with millions of registered freelancers. You can hire contractors for a few simple coding tasks or begin a long-term relationship with a series of complex Python projects. If you like the idea of finding, interviewing, and managing freelancers, Upwork’s colossal marketplace will likely meet your needs.

10. People Per Hour

People Per Hour is another freelance marketplace akin to Upwork. What makes People Per Hour unique is that it holds contests and allows Freelancers to post their own job postings called hourlies.

People Per Hour has millions of members, thousands of confirmed hours, and a plethora of success stories from freelancers and entrepreneurs alike. The ease of posting jobs, contacting freelancers, and paying for hours worked makes People Per Hour a superb choice for employers interested in searching for and vetting freelance candidates themselves.

Additionally, with People Per Hour, you can connect with local freelancers, so you aren’t necessarily limited to remote talent.

11. has a growing community of developers over 25,000 strong. Like Toptal, their service is designed to take the tedium out of hiring. vets their talent and ensures that their freelancers are committed to each and every project. prides itself on its humanism. Every employer is connected with a VP, instead of a sales representative, and freelancers are provided with the resources to succeed.

What’s most alluring about the network? manages and replaces talent – with no risk to you – and back every single hour worked with a 100% money-back guarantee.

12. Guru

Guru has a large global network of freelancers—albeit smaller than Upwork’s and People Per Hour’s massive pool. You can explore the profiles of 1.5 million gurus, propose projects, and pay your hired talent with their secure SafePay system.

Guru isn’t focused on developers, let alone Angular developers, as it is a freelance network comprised of every sort of professional. So, like with Upwork and People Per Hour, you’ll have to narrow your search yourself. Also, as with many freelance networks, the vetting and interviewing will be up to you.

13. Freelancer

Freelancer is an absolutely massive marketplace with 25 million registered users, 12 million total posted jobs, and thousands of completed projects. With the size of the marketplace comes a unique challenge, however. Finding the perfect Python developer is like finding a needle in a haystack.

Although website development is one of the most popular job categories on Freelancer, you still will have to search through thousands of Freelancer profiles, vet and interview candidates yourself, and manage payments yourself.

If you’re looking for an affordable option, however, Freelancer is a wonderful hiring solution. For more long-term commitments, you’ll want to consider matching services like Toptal or

14. Find Bacon

Find Bacon is a job board aimed at eliminating the hassle of searching for design and development jobs. Find Bacon is a pleasantalternative to massive job boards like Indeed and Monster and is highly affordable. Posting a job posting for a 30-day period is only $99 dollars.

They also offer subscription packs which allow for 10 job posts a month. If you’re a company looking to fill multiple positions or are planning on hiring freelancers on an ongoing basis, you may want to consider investing in a subscription pack for a niche job boardlike Find Bacon.

15. X-Team

X-Team matches you with qualified Python developers who receive mentorship and educational resources just for being a part of X-Team. Like Toptal and, they do the heavy lifting of hiring. You won’t be saddled with rifling through resumes or preparing personalized interview scripts.

The major downside of using X-Team is that, as their name suggests, they are adept at organizing teams. If you’re only looking to hire an individual Python developer, you’ll want to use a different matching service.

Honorable Mention: SimplyHired

SimplyHired is a large job board. It’s similar to big, general job boards like Indeed or Monster. The site comes with loads of resources from salary recommendations to hiring guides, and offers low prices for job listings. Like with Indeed and Monster, you’ll get a great deal of exposure. With over a billion job applications delivered, SimplyHired is a highly-respected job board worth investigating.

Begin exploring salary estimators, post within a network of over a hundred job boards in record time, and browse through the collated jobs by cities to see if posting a job listing on SimplyHired is worth the time and money for you.

Choosing the right site

Finding the best sites to find developers is no easy endeavor. Unless you’re a battle-worn recruiter, you likely won’t know how to navigate the complexities of hiring a Python developer. That’s completely okay—there’s plenty of sites and services to help you along the way.

Matching services like Toptal, and to a lesser extent, Hired and, are great solutions for employers searching for tech talent, and for those who are looking to place their trust in experienced tech professionals. For those short on time with high-quality developers as a priority, Toptal and Hired are superb choices.

On the other end of the spectrum, there are freelance marketplaces like Upwork, People Per Hour, and Freelancer that allow you to cast a much wider net for Python developers.

Employers looking for full-time developers may also benefit from utilizing Stack Overflow and GitHub’s job boards, which can provide wonderful exposure to the Python developer community.

Freelance marketplaces like Upwork, Freelancer, and Guru allow you to instantly connect with developers, but you’ll have to care for the hiring details yourself. If you have ample time to devote to screening candidates and are confident in your ability to interview Python developers, they are a great choice. Otherwise, you should steer clear from marketplaces and job boards alike.

Job boards, marketplaces, and matching services all have their uses. Which site will best serve you will depend on your specific situation.

Ultimately, which sites you employ depends on a multitude of factors, such as:

  • How quickly you need to hire a Python developer (i.e. your timeline)
  • How much experience you have hiring Python developers
  • Whether or not you’re equipped to test technical skills
  • How many developers you need to bring on
  • What level of experience those Python developers need
  • Whether or not you’re open to remote workers
  • What your budget constraints are
  • How important quality is to your project(s)