06 Sep 2021

How to Find Duplicates in Nested Arrays of Objects

A flexible algorithm that works for any depth of nesting

Parsing through data is one of life’s great joys/headaches. One of the most common requests I receive is identifying if a value — which is buried deep in a labyrinth of nested arrays/objects — is unique or a duplicate.

Well, I’m tired of reinventing the wheel over and over so I’ve developed a standard strategy that will work for a key-value pair at any depth.

We’ll be using JavaScript for the code examples; however, the principles in the algorithm are universal and can be applied to any language.

Assumptions About Our Data

Making the claim that an algorithm will work with “any data” is bold and most likely overstated. The reality is that we always have assumptions about our data, some of which are even unrealized assumptions.

For this exercise, we’ll assume that the top-level of our data structure is an object and not an array of objects. Additionally, nested structures will always be an array of objects.

This may seem restrictive, but it’s actually a very common data structure. So again, our data will be an object with nested arrays of objects.

The General Strategy

The plan is to maintain a simple array of values that are extracted by recursing through our data. From this simple array, we are able to easily identify if each new value already exists in our array, meaning it is a duplicate.

Sample Data

Before we dive into the algorithm, let’s take a look at the data we’ll be using as a sample/test in the article.let record = {
"first_name": "Jonathan",
"last_name": "Hsu",
"customers": [
{
"name": "Acme Corp",
"contacts": [
{
"name": "Justin",
"emails": [
{
"email": "justin@acme-labs.com",
"type": "work"
},
{
"email": "surfer@gmail.com",
"type": "personal"
}
]
}
]
},
{
"name": "Acme Labs",
"contacts": [
{
"name": "Jane Doe",
"emails": [
{
"email": "jdoe@acme-labs.com",
"type": "work"
},
{
"email": "jane@gmail.com",
"type": "personal"
}
]
},
{
"name": "John Doe",
"emails": [
{
"email": "jdoe@acme-labs.com",
"type": "work"
},
{
"email": "john@gmail.com",
"type": "personal"
}
]
}
]
}
]
};

Notice that the outer-most structure is an object and that we have multiple nested arrays of objects. While our algorithm will work at any level, we’ll aim high and check to see if any of the email addresses are duplicates.

Building the Algorithm

Let’s start by establishing that our function will be recursive, meaning it will call itself. This is how we accommodate the “of any depth” requirement.

Now, to make this work we’ll need five arguments in each function call:

The top-level object
The name(s) of nested array(s)
The key of the field we are checking
An array used as a container for new values
A boolean used to keep track of whether a duplicate has been found

The first step is writing the skeleton of the function definition including the recursive and base cases.function duplicateSearch(obj, fields, key, list=[], duplicate_found = false) {
if(fields.length > 1) {
// recursive case
} else {
// base case
}return duplicate_found;
}

In our skeleton, we assume that no duplicates have been found. Like turning on a light switch, we’ll flip this variable to true if a duplicate is identified.

Building out the Recursive Case

We can determine that more recursions are necessary if the length of fields is greater than one. If that is the case, then we need to continue to traverse our data structure until we get to the last level where our search key exists.

Since we know that all nested data are arrays of objects, we can loop accordingly.for(let row of obj[fields[0]]) {
duplicate_found = duplicateSearch(row, fields.slice(1), key, list);
}

In our recursive function call, we iterate over the first value in the fields array and pass in the remaining fields.

Building out the Base Case

Once we get to the end of our fields array (length is equal to 1) then it’s time to look for the actual duplicates.

We’ll loop through each object in our final array and for each object, we search for the value in our list. If it is not present then we have a unique value that must be added to the list as we continue the search.

If we find that the value is present in the list, then we return “true” because we have found a duplicate. That value of true flips our variable representing the assumed false (lack of duplicates).for(let row of obj[fields[0]]) {
if(list.indexOf(row[key]) == -1) {
list.push(row[key]);
} else {
return true;
}
}

The Completed Function

Just in case you need to check your work, here’s the algorithm in its entirety.

Testing the Algorithm

We’ll run two tests with our algorithm and sample data. The first will search for duplicate email addresses, which exist. The second will search for duplicate contact names, which do not exist.console.log(
duplicateSearch(
record,
['customers','contacts','emails'],
'email'
)
); // trueconsole.log(
duplicateSearch(
record,
['customers','contacts'],
'name'
)
); // false

Conclusion

As you can see from the algorithm tests, the function call can be easily modified to accommodate any depth of nested arrays of objects. This algorithm returns a simple “yes/no” to the question, are there duplicate values for a given key.

Use this as a starting point to modify for your exact needs. Best wishes and happy coding!

I hope you liked this article! Subscribe to be notified when I publish new articles or even better, consider joining the medium community.