Two Simple Algorithms for Chunking a List in Python
Convert a list to evenly sized segments
The Challenge
Create a function that converts a list to a two-dimensional “list of lists” where each nested structure is a specified equal length.
Here are some example inputs and expected outputs:# INPUT LIST: [1,2,3,4,5,6]
# INPUT SIZE: 3
# OUTPUT: [[1,2,3],[4,5,6]]# INPUT LIST: [1,2,3,4,5]
# INPUT SIZE: 2
# OUTPUT: [[1,2],[3,4],[5]]# INPUT LIST: [1,2,3]
# INPUT SIZE: 4
# OUTPUT: [[1,2,3]]
Reviewing our examples, there are some important distinctions for scenarios where the list and chunk size do not perfect match up.
- When the list does not evenly divide by the chunk size, the last index will be shorter than the evenly sized chunks.
- When the chunk size is larger than the list itself, a chunk will still be created with the list in the first index.
We’re going to solve this two ways. The first function will be written generically — using techniques that can be applied in any programming language. The second function will be optimized for Python — using techniques that are specific to the language.
Runner-Up
For a generic chunking algorithm, we’re going to utilize basic programming techniques. We begin by creating an empty list, setting a counting variable to zero, and a variable for the length of our list.
A while loop is an iteration (looping) structure that will execute its body when the expression evaluates to True. We want to loop through the entire list, so we will iterate as long as i is less than arr_length.
Inside the while loop, we need a condition to add new chunks at the appropriate time. We’ll use the modulus (remainder) operator here. If you’re unfamiliar with this arithmetic operator, it calculates the remainder from the division of two values. When the remainder is equal to zero — beginning of the loop and every n-th iteration where n is equal to the chunk size — we nest a new list.
Now, we know we will always be populating the newest chunk created, which has the largest index. We use len(chunked_list)-1 for the index value to represent this knowledge and add the currently iterated item from our original list.
Finally, we increment i to avoid an infinite loop and return chunked_list once the loop completes.def genericChunk(arr, chunk_size):
chunked_list = []
i = 0
arr_length = len(arr)while i < arr_length:
if i % chunk_size == 0:
chunked_list.append([])
chunked_list[len(chunked_list)-1].append(arr[i])i = i + 1return chunked_list
Optimal Solution
We can improve our generic function using tools available in Python. Specifically, we’ll use list comprehension and slice notation to efficiently grab sections of the original list. Additionally, we’ll import the ceil() function from the math library to keep our function as a one-liner.
List comprehensions are an ultra-fast replacement for simple for loops. We’re being so efficient that we don’t even need the chunked_list variable, we can return directly from the list comprehension.
The number of iterations will be equal to the length of the list, divided by the chunk size, rounded up. This accounts for a list such as our second example where the list does not divide evenly by the chunk size.
We define each item in the returned list using slice notation. The starting index of the slice will be equal to our current iterator i multiplied by the chunk size. The ending index will be the starting index plus the chunk size.from math import ceildef chunkList2(arr, chunk_size):
return [
arr[i*chunk_size:i*chunk_size+chunk_size]
for i in range(ceil(len(arr)/chunk_size))
]
Update: Shared by Naquad, the following function removes the need for offsetting i and importing math.def batch(seq, size):
return [
seq[i:i + size]
for i in range(0, len(seq), size)
]
Do you have your own solution for chunking a list? Share your experiences, questions, and feedback below. Thanks for reading and be sure to follow Code 85 for more plain language programming guides!