Algorithms: Find the First Duplicate
Search a Python list for the first duplicated value
The Challenge
Write a function to find the first duplicate in a list of values. The values are allowed to be integers inclusively between 1 and the length of the list. If no duplicate value is found, return -1.
Here are some valid and invalid sample lists with explanations.[1,2,3] # valid, no duplicate
[1,2,3,3,2,1] # valid, 3 is the first duplicate
[0,1,2] # invalid, 0 is less than 1
[1,2,4] # invalid, 4 is greater than the list's length
Let’s take a look at the second example, which has a duplicate value. The first duplicate is not the first value that happens to have a duplicate — which would be 1 — but rather, the first value that is a duplicate of a previous value in the list — which is 3.
Now that we understand the challenge, I’d encourage you to try on your own.
If you’ve already tried or you want to skip ahead, then let’s get to the solution!
Runner-Up
This algorithm utilizes sets — a mutable data type that contains only unique values.
The premise of the solution is to iterate over the list, adding new values to a set. Before adding each value, we check to see if our current value exists in the set. If it does exist, then it is the first duplicate and will be returned.
A few notes about the above solution:
- An empty set cannot be defined with curly braces. Python will interpret it as an empty dictionary.
- This algorithm also works for strings and other iterables.
- While the solution has a time complexity of O(n), which is linear and optimal for this challenge, the algorithm takes up extra memory creating the set.
Optimal Solution
The best algorithm for the challenge takes advantage of the stipulation that the values in the list must be between 1 and the length of the list. This is significant because by shifting any value in the list by -1, we can guarantee that the shifted value is a valid index.
The foundation of this algorithm lies in leveraging each value as an index in the list and flagging values to signify that they have been accessed. We’ll convert the value to a negative, which preserves its original value.
The first time we come across a negative value, we know it’s been accessed before and thus is our first duplicate.
A few notes about the above solution:
- We shift the value by -1 because indexes are zero-based and a value that is equal to the length of the list would be out of bounds.
- The absolute value is returned since the value has been previously flagged.
- This solution has the same time complexity of O(n), but does not take the excess space that using a set would.
Conclusion
The first time I tried this challenge, I used a for loop to keep track of the current value, a nested for loop to look for a duplicate, and a variable to keep track of the lowest index of a duplicate. Did it work? Yes. Was it efficient? No.
How did it go for you? What was your initial solution? Share your experiences below!
A special thanks to Nick White for his explanation of this algorithm!