Use Sets to Count Occurrences in Python
Refactor a three-step algorithm into a one-liner
I love Python’s flexibility. Its elegant “ah-ha” moments, where a clever use of native tools yields a simpler result, put a smile on my face.
Recently, I had to analyze a data set: a list of dictionaries in which the values of one key, status, needed to be counted. I’ve done this in other languages before and had an algorithm that I ported over to Python. But it dawned on me that there was a better way.
Sample Input & Output
Before we jump into the algorithms, here’s a sample input and output to help frame the following solutions.

```python
records = [
    { "id": 1, "status": "Pass" },
    { "id": 2, "status": "Fail" },
    { "id": 3, "status": "Pass" },
    { "id": 4, "status": "Exempt" }
]

counts = countOccurrences(records, "status")
"""
{
    "Pass": 2,
    "Fail": 1,
    "Exempt": 1
}
"""
```
Notice how the function identifies every distinct value of status and returns a dictionary mapping each value to its count. Now, let’s review the language-agnostic algorithm, then dive into the better Python solution.
Occurrence Counting Algorithm
The following algorithm can be written in a few different ways, but essentially we need to keep track of all the different values of status and increment their counts as we go. Written inside a function, it looks something like this:

```python
def countOccurrences(my_list, term):
    counts = {}
    for row in my_list:
        # First time we see this value: start its count at 0
        if row[term] not in counts:
            counts[row[term]] = 0
        # Increment the count for this row's value (not the key name itself)
        counts[row[term]] += 1
    return counts
```
Tracing the function, we begin by passing two arguments: the list to iterate over and the name of the key we will be analyzing. Within the function, we create an empty dictionary, and as we iterate over the list we check whether row[term], which is the value of status, already exists as a key in the counts dictionary. If it does not, we add it and initialize its count to 0. Then we increment the count for that value and, once the loop finishes, return the dictionary.
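Since the loop can be written in a few different ways, here is one common variant for comparison. It replaces the membership check with dict.get, which returns a default (here 0) when the key is missing. The name countOccurrencesGet is hypothetical, chosen only to distinguish it from the version above.

```python
def countOccurrencesGet(my_list, term):
    counts = {}
    for row in my_list:
        # dict.get returns 0 for a value we haven't seen yet
        counts[row[term]] = counts.get(row[term], 0) + 1
    return counts


records = [
    {"id": 1, "status": "Pass"},
    {"id": 2, "status": "Fail"},
    {"id": 3, "status": "Pass"},
    {"id": 4, "status": "Exempt"},
]
print(countOccurrencesGet(records, "status"))
# {'Pass': 2, 'Fail': 1, 'Exempt': 1}
```

Both versions make a single pass over the list; the difference is purely stylistic.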
Notice that I didn’t hardcode status but instead used a variable so that the function is a little easier to use in multiple scenarios.
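Because the key is a parameter, the same function works for any field. The sketch below reuses the loop version on a hypothetical list of orders counted by a hypothetical region key; the data is made up purely for illustration.

```python
def countOccurrences(my_list, term):
    counts = {}
    for row in my_list:
        if row[term] not in counts:
            counts[row[term]] = 0
        counts[row[term]] += 1
    return counts


# Hypothetical data: only the key name differs from the status example
orders = [
    {"id": 10, "region": "North"},
    {"id": 11, "region": "South"},
    {"id": 12, "region": "North"},
]
print(countOccurrences(orders, "region"))
# {'North': 2, 'South': 1}
```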
Using Sets and Comprehensions
The condensed approach I realized uses sets and comprehensions. In Python, a set is a collection in which each element must be unique, so duplicates cannot exist in a set. Comprehensions are a shorthand for building a new collection by iterating over an existing one. List comprehensions are the most common, but set comprehensions and dictionary comprehensions exist as well.
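To make these definitions concrete, here is a small sketch showing that a set drops duplicates, and what each kind of comprehension produces from the same list. The statuses list is illustrative data. Sets are unordered, so the set result is sorted before printing.

```python
statuses = ["Pass", "Fail", "Pass", "Exempt"]

# A set keeps only unique elements, so the duplicate "Pass" disappears
unique = set(statuses)
print(len(unique))  # 3

# List comprehension: builds a list; duplicates and order are kept
passes = [s for s in statuses if s == "Pass"]
print(passes)  # ['Pass', 'Pass']

# Set comprehension: same syntax in curly braces, produces a set
lowered = {s.lower() for s in statuses}
print(sorted(lowered))  # ['exempt', 'fail', 'pass']

# Dictionary comprehension: key: value pairs in curly braces
lengths = {s: len(s) for s in statuses}
print(lengths)  # {'Pass': 4, 'Fail': 4, 'Exempt': 6}
```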
Even though this solution is a one-liner, we’ll use the same function setup for a seamless refactor:

```python
def countOccurrences(my_list, term):
    return {
        val: len([True for row in my_list if row[term] == val])
        for val in {x[term] for x in my_list}
    }
```
Now, this is a dense one-liner, so let’s deconstruct it. We use all three comprehensions for the following purposes:
- Dictionary: Create the return object
- Set: Identify unique values of status
- List: Iterate over data set and filter based on status
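One way to see how the three comprehensions fit together is to pull each into its own named step. The sketch below should behave the same as the one-liner, with the outer dictionary comprehension written as an explicit loop; the names countOccurrencesUnrolled, unique_values, and matches are hypothetical, added only for readability.

```python
def countOccurrencesUnrolled(my_list, term):
    # Set comprehension: the distinct values of the key (e.g. each status)
    unique_values = {x[term] for x in my_list}

    counts = {}
    for val in unique_values:
        # List comprehension: one True per row whose value matches val
        matches = [True for row in my_list if row[term] == val]
        counts[val] = len(matches)
    return counts


records = [
    {"id": 1, "status": "Pass"},
    {"id": 2, "status": "Fail"},
    {"id": 3, "status": "Pass"},
    {"id": 4, "status": "Exempt"},
]
result = countOccurrencesUnrolled(records, "status")
print(sorted(result.items()))  # [('Exempt', 1), ('Fail', 1), ('Pass', 2)]
```

Because this version and the one-liner both iterate over a set, the key order of the returned dictionary can vary between runs, which is why the result is sorted before printing. Compare such dictionaries by equality rather than by printed order.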
Notice the part of line 4 that reads {x[term] for x in my_list}. This creates a set of the values of status, which the dictionary comprehension iterates over when building the return dictionary. That is why line 3 begins with val:, the temporary element used while iterating over the set.
As for the actual count, the value stored under each val key: we already have a value of status, so we need to find the rows that match it. This is where the list comprehension comes into play. We iterate over my_list, assigning the temporary element to row, but only include the elements where row[term] is equal to val. It doesn’t really matter what each element of that list is, so I just used the boolean value True. Finally, we take the length of the resulting list with the len() function.
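Here is that inner step in isolation for a single value, using the sample records. The sum() line at the end is an alternative that counts matches without building an intermediate list; it is not what the one-liner above uses.

```python
records = [
    {"id": 1, "status": "Pass"},
    {"id": 2, "status": "Fail"},
    {"id": 3, "status": "Pass"},
    {"id": 4, "status": "Exempt"},
]
term = "status"
val = "Pass"

# Keep one True for each row whose status equals val
matches = [True for row in records if row[term] == val]
print(matches)       # [True, True]
print(len(matches))  # 2

# Alternative: sum a generator of 1s, avoiding the intermediate list
print(sum(1 for row in records if row[term] == val))  # 2
```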
Conclusion
I admit that I did not speed test both solutions, so for all I know, the former is actually more efficient. It may well be: the loop makes a single pass over the list, while the one-liner scans the entire list once for every unique value of status. Still, these types of mental exercises are great for flexing the problem-solving part of the brain and learning to view problems through the lens of the Python toolkit.
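If you do want to compare the two, here is a timing sketch using the standard library’s timeit module. The function names loop_count, comprehension_count, and counter_count, the generated data, and the run counts are arbitrary assumptions; no results are shown because timings depend on your machine and data. The third candidate uses collections.Counter, the standard library’s built-in tool for counting occurrences, as a reference point.

```python
import random
import timeit
from collections import Counter


def loop_count(my_list, term):
    # Single pass: increment a running count per value
    counts = {}
    for row in my_list:
        if row[term] not in counts:
            counts[row[term]] = 0
        counts[row[term]] += 1
    return counts


def comprehension_count(my_list, term):
    # One pass to build the set, then one full pass per unique value
    return {
        val: len([True for row in my_list if row[term] == val])
        for val in {x[term] for x in my_list}
    }


def counter_count(my_list, term):
    # Standard-library alternative: Counter tallies in a single pass
    return dict(Counter(row[term] for row in my_list))


# Hypothetical test data: 10,000 rows drawn from three statuses
random.seed(0)
data = [
    {"id": i, "status": random.choice(["Pass", "Fail", "Exempt"])}
    for i in range(10_000)
]

# All three must agree before the timings mean anything
assert loop_count(data, "status") == comprehension_count(data, "status")
assert loop_count(data, "status") == counter_count(data, "status")

for fn in (loop_count, comprehension_count, counter_count):
    seconds = timeit.timeit(lambda: fn(data, "status"), number=20)
    print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

With only three statuses the extra passes are cheap, so try data with many distinct values to see the gap widen.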
Please share your experiences, questions, and feedback below. Follow Code 85 for more plain language programming guides. Thanks for reading!