As in the previous article of this series, I want to compare Linq with a classical loop. This time we look at input that contains nested data. Again we use two kinds of input: validated, clean data and raw data that includes null references.
Use case one: clean data
Let’s start with the first use case. We have a simple data class for a person and a data class for a person group. The person group contains a list of persons, which gives us a data structure with nested data. Out of a list of person groups we want to find a specific person by name. To keep it simple we just look for the first person with that name. If the list does not contain the person we are looking for, we return a default person object.
The data classes are defined as follows:
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }

    // Returned by the query methods when no matching person is found.
    public static readonly Person Default = new Person() { Name = "new person", Age = 0 };
}

public class PersonGroup
{
    public string GroupName { get; set; }
    public List<Person> Persons { get; set; }
}
Our demo console application creates a list of data and calls the data query method, first with an existing person and second with a non-existent one.
List<Person> persons;
List<PersonGroup> groups = new List<PersonGroup>();

persons = new List<Person>();
persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "John Foe", Age = 47 });
groups.Add(new PersonGroup() { GroupName = "male", Persons = persons });

persons = new List<Person>();
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });
groups.Add(new PersonGroup() { GroupName = "female", Persons = persons });

//---------
Person person;

//search existing person
person = FindPerson(groups, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(groups, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();
The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:
private static Person FindPerson(List<PersonGroup> groups, string name)
{
    foreach (PersonGroup group in groups)
    {
        foreach (Person person in group.Persons)
        {
            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }
    }

    return Person.Default;
}
And we implement the same function by using Linq.
private static Person FindPerson(List<PersonGroup> groups, string name)
{
    var result = from personGroup in groups
                 from person in personGroup.Persons
                 where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                 select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}
Code Review for use case one
The loop method is implemented as a simple nested loop containing the data comparison. The source code is clean and easy to understand. The same can be said of the Linq implementation. The only difficult part is the “DefaultIfEmpty” statement: it replaces an empty result sequence with a sequence that contains only the default person, so the following “First” call always has an element to return. If the developer adds a short comment explaining why the “DefaultIfEmpty” call is made, the Linq query is easy to understand too, and I have no preference between the two implementations.
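For comparison, the same query can also be written in method syntax. The following is only a sketch for the clean data of use case one, not part of the original implementation: the name FindPersonMethodSyntax is hypothetical, and FirstOrDefault combined with the ?? operator is swapped in as an alternative to the DefaultIfEmpty and First pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }
    public static readonly Person Default = new Person() { Name = "new person", Age = 0 };
}

public class PersonGroup
{
    public string GroupName { get; set; }
    public List<Person> Persons { get; set; }
}

public static class MethodSyntaxSketch
{
    // Hypothetical variant of FindPerson. Assumes clean data (no null checks).
    public static Person FindPersonMethodSyntax(List<PersonGroup> groups, string name)
    {
        return groups
            // Flatten the nested person lists into one sequence.
            .SelectMany(personGroup => personGroup.Persons)
            // For a reference type, FirstOrDefault returns null if nothing matches.
            .FirstOrDefault(person => string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            // Replace the null result with the default person.
            ?? Person.Default;
    }

    public static void Main()
    {
        var groups = new List<PersonGroup>
        {
            new PersonGroup
            {
                GroupName = "female",
                Persons = new List<Person> { new Person { Name = "Jane Doe", Age = 41 } }
            }
        };

        Console.WriteLine("Name: " + FindPersonMethodSyntax(groups, "Jane Doe").Name);
        Console.WriteLine("Name: " + FindPersonMethodSyntax(groups, "???").Name);
    }
}
```

Whether this reads better than the query syntax is a matter of taste. The two from clauses of the query syntax correspond to the SelectMany call shown here.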
Use case two: dirty data
The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return a default person if the one we are looking for is not found. Null values or an uninitialized list must not cause an exception. In these cases the default person shall be returned as well.
In our test console application we create some dirty data. We also add tests that call the function with this data and even with a null parameter.
List<Person> persons;
List<PersonGroup> groups = new List<PersonGroup>();

persons = new List<Person>();
persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "John Foe", Age = 47 });
groups.Add(new PersonGroup() { GroupName = "male", Persons = persons });

groups.Add(null);
groups.Add(new PersonGroup() { GroupName = "female", Persons = null });

persons = new List<Person>();
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 41 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });
groups.Add(new PersonGroup() { GroupName = "female", Persons = persons });

//---------
Person person;

//search existing person
person = FindPerson(groups, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(groups, "???");
Console.WriteLine("Name: " + person.Name);

//search in a list which is not yet initialized
person = FindPerson(null, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();
The loop-based query must be adapted to handle all these special cases. The following source code shows a corresponding implementation. Both the list and its content are checked for null values.
private static Person FindPerson(List<PersonGroup> groups, string name)
{
    if (groups == null)
    {
        return Person.Default;
    }

    foreach (PersonGroup group in groups)
    {
        if ((group == null) || (group.Persons == null))
        {
            continue;
        }

        foreach (Person person in group.Persons)
        {
            if (person == null)
            {
                continue;
            }

            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }
    }

    return Person.Default;
}
The implementation of the Linq query must be adapted too. A check of the whole list, as well as of each single element, is added.
private static Person FindPerson(List<PersonGroup> groups, string name)
{
    if (groups == null)
    {
        return Person.Default;
    }

    var result = from personGroup in groups
                 where personGroup != null
                 where personGroup.Persons != null
                 from person in personGroup.Persons
                 where person != null
                 where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                 select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}
Code Review for use case two
The nested loop gets more complex because the different special cases must be handled, which requires additional if-statements. In such a case you may consider extracting the inner loop into a method of its own. With such an additional method the code stays very easy to understand, even with the additional if-statements.
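The extraction of the inner loop could look roughly like the following sketch. The helper name FindPersonInGroup and its convention of returning null for "not found" are assumptions made for illustration, they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }
    public static readonly Person Default = new Person() { Name = "new person", Age = 0 };
}

public class PersonGroup
{
    public string GroupName { get; set; }
    public List<Person> Persons { get; set; }
}

public static class ExtractedLoopSketch
{
    public static Person FindPerson(List<PersonGroup> groups, string name)
    {
        if (groups == null)
        {
            return Person.Default;
        }

        foreach (PersonGroup personGroup in groups)
        {
            Person match = FindPersonInGroup(personGroup, name);
            if (match != null)
            {
                return match;
            }
        }

        return Person.Default;
    }

    // Hypothetical helper: searches a single group and returns null if the
    // group is invalid or contains no person with the given name.
    private static Person FindPersonInGroup(PersonGroup personGroup, string name)
    {
        if ((personGroup == null) || (personGroup.Persons == null))
        {
            return null;
        }

        foreach (Person person in personGroup.Persons)
        {
            if ((person != null) &&
                string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }

        return null;
    }

    public static void Main()
    {
        var groups = new List<PersonGroup>
        {
            null,
            new PersonGroup { GroupName = "female", Persons = null },
            new PersonGroup
            {
                GroupName = "female",
                Persons = new List<Person> { null, new Person { Name = "Jane Doe", Age = 41 } }
            }
        };

        Console.WriteLine("Name: " + FindPerson(groups, "Jane Doe").Name);
        Console.WriteLine("Name: " + FindPerson(null, "???").Name);
    }
}
```

Each method now has only one loop level, and the null checks for groups and for persons are kept apart.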
The Linq implementation uses a single query with a second from clause that iterates over the persons of each group. To handle all the special cases with uninitialized data, some additional where-clauses were added. This expands the query a little, but it stays understandable.
In this case I like the Linq implementation a little more than the nested loop. On the other hand, the Linq query has a major disadvantage: finding an error is difficult and time-consuming. The query is only executed when its result is enumerated, so an exception caused by invalid data does not point to the clause that is missing a check. You can try this yourself by removing one or several of the where-clauses that filter out invalid data.
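One reason why such errors are hard to locate is that a Linq query is executed lazily. The following sketch illustrates this with the unchecked query from use case one and a list that contains a null group. The helper name BuildUncheckedQuery is hypothetical and exists only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }
    public static readonly Person Default = new Person() { Name = "new person", Age = 0 };
}

public class PersonGroup
{
    public string GroupName { get; set; }
    public List<Person> Persons { get; set; }
}

public static class DeferredExecutionSketch
{
    // Hypothetical helper: the query of use case one, without any null checks.
    public static IEnumerable<Person> BuildUncheckedQuery(List<PersonGroup> groups, string name)
    {
        return from personGroup in groups
               from person in personGroup.Persons
               where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
               select person;
    }

    public static void Main()
    {
        // Dirty data: the only group in the list is null.
        var groups = new List<PersonGroup> { null };

        // Building the query does not touch the data yet, so nothing fails here.
        IEnumerable<Person> query = BuildUncheckedQuery(groups, "Jane Doe");
        Console.WriteLine("Query built without error");

        try
        {
            // The query runs only now, when First() starts to enumerate it.
            Person person = query.DefaultIfEmpty(Person.Default).First();
            Console.WriteLine("Name: " + person.Name);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException while enumerating");
        }
    }
}
```

The exception is raised at the First call rather than at the from clause that dereferences the null group, which is why the failing clause has to be found by inspecting the data or by adding the where-clauses back one at a time.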