BBP - Better Boto Paginator

A Python library I wrote for per-resource pagination of AWS resources



boto3 is the official Python SDK for Amazon Web Services (AWS). It includes pagination functionality: if you're trying to enumerate a long list of resources, a paginator gives you an easier way to fetch the resource list chunk after chunk than making the raw list_ calls yourself and passing the pagination token back on every request.
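For comparison, here is roughly what enumerating every key in a bucket looks like with raw list_objects_v2 calls: you have to feed the continuation token back in yourself until the response says it is no longer truncated. This is a minimal sketch; list_all_keys is a hypothetical helper, and client is assumed to be a boto3 S3 client (or anything with the same list_objects_v2 method).

```python
def list_all_keys(client, bucket):
    """Return every object key in `bucket` using raw list_objects_v2 calls."""
    keys = []
    request = {'Bucket': bucket}
    while True:
        response = client.list_objects_v2(**request)
        # 'Contents' is absent when a page (or the whole bucket) is empty
        for obj in response.get('Contents', []):
            keys.append(obj['Key'])
        if not response.get('IsTruncated'):
            break
        # Ask for the next chunk, starting where this one left off
        request['ContinuationToken'] = response['NextContinuationToken']
    return keys
```

Every caller ends up repeating this token bookkeeping, which is what the paginator hides.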

The problem with how the module exposes these pages is that you end up with a list of lists. For example, to get a list of all objects within an S3 bucket, you can do:

import boto3
client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
# One inner list per page; a page with no objects has no 'Contents' key
objects = [p.get('Contents', []) for p in paginator.paginate(Bucket='my-bucket')]

This returns a list of lists of object information, one inner list per page. Do you remember off the top of your head how to flatten a list of lists into a single list with a list comprehension? I sure don't. Yes, I could write a for loop and append to a list on each iteration, but that feels like more effort than should be required.
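For the record, a nested comprehension lists its for clauses in the same order as the equivalent nested loops, and itertools.chain.from_iterable from the standard library does the same job. A minimal sketch, using made-up pages shaped like list_objects_v2 responses rather than a real bucket:

```python
from itertools import chain

# Stand-ins for what paginator.paginate(Bucket='my-bucket') yields (made-up data)
pages = [
    {'Contents': [{'Key': 'a.txt'}, {'Key': 'b.txt'}]},
    {'Contents': [{'Key': 'c.txt'}]},
    {},  # an empty page has no 'Contents' key at all
]

# Nested comprehension: outer loop first, inner loop second
flat = [obj for page in pages for obj in page.get('Contents', [])]

# Equivalent, using the standard library
flat_chain = list(chain.from_iterable(page.get('Contents', []) for page in pages))

print([obj['Key'] for obj in flat])  # ['a.txt', 'b.txt', 'c.txt']
```

Both work, but neither is something I want to retype every time I list a resource.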

Even if you're not loading the whole resource list into memory, and are instead processing each resource inside a for loop, you end up with a messy nested loop:

for page in paginator.paginate(Bucket='my-bucket'):
    if 'Contents' in page:
        for element in page['Contents']:
            print(element)  # process one object at a time

I find this a bit awkward. What I really want is:

for element in function(Bucket='my-bucket'):
    print(element)  # process one object at a time

Here, function would be smart enough to either return the next item from the page it already holds in memory, or fetch the next page with a new API call and return that page's first item.
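A Python generator gives exactly this behaviour: it yields items from the current page, and the next page is only requested once the consumer asks for an item past the end of it. The sketch below is one way to build it, not necessarily how bbp is implemented; iter_items and fake_pages are hypothetical names, and pages can be any iterable of response dicts, such as the result of paginator.paginate(Bucket='my-bucket'), which itself only makes API calls as it is iterated.

```python
from itertools import islice


def iter_items(pages, key):
    """Yield individual items from a sequence of paginated responses.

    `pages` is consumed lazily, so the next page is only fetched once every
    item on the current page has been yielded.
    """
    for page in pages:
        # Some pages omit the result key entirely when they are empty
        yield from page.get(key, [])


def fake_pages(log):
    # Hypothetical stand-in for paginate(): records each simulated API call
    for i in range(3):
        log.append(i)
        yield {'Contents': [f'obj-{i}-{j}' for j in range(2)]}


calls = []
first_three = list(islice(iter_items(fake_pages(calls), 'Contents'), 3))
print(first_three)  # ['obj-0-0', 'obj-0-1', 'obj-1-0']
print(calls)        # [0, 1]  (the third page was never requested)
```

Because nothing is fetched ahead of time, stopping early (with break or islice) also saves the remaining API calls.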

I wrote the bbp library to solve this problem. (The code is published on GitHub.)


pip install bbp


Here's an example of how to use it for the Lambda ListFunctions paginator.

from wrapper import paginator
from pprint import pprint
for lam in paginator('lambda', 'list_functions', 'Functions'):
    pprint(lam)  # process just one element at a time

Here's another example, using the S3 ListObjectsV2 paginator. In this example we need to pass in the bucket name as an extra argument. Just specify it as a keyword argument (name=value) at the end of the argument list, exactly as you would when calling paginate() directly.

for obj in paginator('s3', 'list_objects_v2', 'Contents', Bucket='mybucket'):
    pprint(obj)  # process a single resource
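If you are curious how a function with this signature can work, here is a minimal sketch along the same lines. It is a hypothetical reimplementation, not bbp's actual source: the name my_paginator and the keyword-only client argument are assumptions, the latter added so the sketch can be exercised without AWS credentials. Any extra keyword arguments, such as Bucket, are passed straight through to boto3's paginate().

```python
def my_paginator(service, operation, key, *, client=None, **kwargs):
    """Hypothetical flat paginator that yields one resource at a time.

    service:   boto3 service name, e.g. 'lambda' or 's3'
    operation: paginated operation name, e.g. 'list_functions'
    key:       response key holding the resources, e.g. 'Functions'
    kwargs:    forwarded unchanged to paginate(), e.g. Bucket='mybucket'
    """
    if client is None:
        import boto3  # imported lazily so the sketch also runs with a fake client
        client = boto3.client(service)
    for page in client.get_paginator(operation).paginate(**kwargs):
        # A page may omit `key` entirely when it holds no resources
        yield from page.get(key, [])
```

Called as my_paginator('s3', 'list_objects_v2', 'Contents', Bucket='mybucket'), it should behave like the example above.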