Linq vs Loop: Join

As in the previous article of this series, I want to compare Linq with a classical loop. This time we look at data objects that must be joined to create a result. Again we use two kinds of input: validated, clean data and raw data that may contain null references.

Use case one: clean data

Let’s start with the first use case. We have a simple data class for a person and a data class for an address. The data classes are linked together by the AddressIdentifier property.

From a list of persons and a list of addresses we want to find a specific person by name. The result shall contain the person's name and the city of the address. To keep it simple, we just take the first person with that name. If the lists do not contain the data we are looking for, we return a default person and a default address.

The data classes are defined as follows:

public class Person
{        
    public string Name { get; set; }
    public uint Age { get; set; }
    public uint AddressIdentifier { get; set; }

    public static readonly Person Default = new Person()
    {
        Name = "new person",
        Age = 0,
        AddressIdentifier = 0
    };        
}

public class Address
{
    public uint AddressIdentifier { get; set; }

    public string City { get; set; }

    public static readonly Address Default = new Address()
    {
        AddressIdentifier = 0,
        City = "new city"            
    };
}

 

Our demo console application creates the lists and calls the data query method, first with an existing person and then with a non-existent one.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35, AddressIdentifier = 1 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41, AddressIdentifier = 1 });

List<Address> addresses = new List<Address>();
addresses.Add(new Address() { AddressIdentifier = 1, City = "Chicago" });

//---------

string information;

//search existing person
information = GetPersonInformation(persons, addresses, "Jane Doe");
Console.WriteLine(information);

//search for a non-existent person
information = GetPersonInformation(persons, addresses, "???");
Console.WriteLine(information);

Console.ReadKey();

The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    Person actualPerson = Person.Default;
    Address actualAddress = Address.Default;

    foreach (Person person in persons)
    {
        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            actualPerson = person;
            break;
        }
    }

    foreach (Address address in addresses)
    {
        if (actualPerson.AddressIdentifier == address.AddressIdentifier)
        {
            actualAddress = address;
            break;
        }
    }

    return actualPerson.Name + ", " + actualAddress.City;
}

And we implement the same function by using Linq.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    var result = from person in persons
                    join address in addresses
                    on person.AddressIdentifier equals address.AddressIdentifier
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select new
                    {
                        Name = person.Name,
                        City = address.City
                    };

    var element = result
        .DefaultIfEmpty(new { Name = Person.Default.Name, City = Address.Default.City })
        .First();

    return element.Name + ", " + element.City;
}

 

Code Review for use case one

The query using the loops is easy to understand and the code is clean. You could extract the two loops into separate methods, one to find a person and one to find an address. This refactoring yields three very short and simple methods, at the cost of a slightly more complex structure. Therefore, in my opinion, a single method with both loops is fine too.
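
The refactoring into three short methods could look like the following minimal sketch. The names LoopQueries, FindPerson and FindAddress are my own choice for illustration and are not part of the original example; Person and Address are reduced copies of the data classes shown earlier (Age is omitted).

```csharp
using System;
using System.Collections.Generic;

// Reduced copies of the data classes from above (assumption: Age is not needed here).
public class Person
{
    public string Name { get; set; }
    public uint AddressIdentifier { get; set; }

    public static readonly Person Default = new Person() { Name = "new person", AddressIdentifier = 0 };
}

public class Address
{
    public uint AddressIdentifier { get; set; }
    public string City { get; set; }

    public static readonly Address Default = new Address() { AddressIdentifier = 0, City = "new city" };
}

public static class LoopQueries
{
    // Hypothetical helper: first person with the given name, or the default person.
    public static Person FindPerson(List<Person> persons, string name)
    {
        foreach (Person person in persons)
        {
            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }

        return Person.Default;
    }

    // Hypothetical helper: first address with the given identifier, or the default address.
    public static Address FindAddress(List<Address> addresses, uint addressIdentifier)
    {
        foreach (Address address in addresses)
        {
            if (address.AddressIdentifier == addressIdentifier)
            {
                return address;
            }
        }

        return Address.Default;
    }

    // Managing method: runs both lookups and builds the result string.
    public static string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
    {
        Person actualPerson = FindPerson(persons, name);
        Address actualAddress = FindAddress(addresses, actualPerson.AddressIdentifier);

        return actualPerson.Name + ", " + actualAddress.City;
    }
}
```

Each helper returns as soon as it finds a match, so the break statements and the two local default variables of the single-method version are no longer needed.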

The Linq query is easy to understand too. You do have to know some details about Linq; for example, the need for the DefaultIfEmpty call may not be obvious at first glance. It therefore helps to add comments to the query that explain why such statements are needed.
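
To illustrate why DefaultIfEmpty is needed: First() throws an InvalidOperationException on an empty sequence, while DefaultIfEmpty replaces an empty sequence with a one-element sequence holding the given fallback value. The following minimal sketch shows both effects with plain strings; the class and method names are hypothetical and only serve this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DefaultIfEmptyDemo
{
    // Returns the first match, or the fallback value if nothing matches.
    public static string FirstOrFallback(IEnumerable<string> source, string searched, string fallback)
    {
        return source
            .Where(s => string.Equals(s, searched, StringComparison.OrdinalIgnoreCase))
            .DefaultIfEmpty(fallback)   // an empty result becomes a sequence with one element: fallback
            .First();                   // so First() always finds an element
    }

    // Shows the failure mode: without DefaultIfEmpty, First() throws on an empty result.
    public static bool ThrowsWithoutDefault(IEnumerable<string> source, string searched)
    {
        try
        {
            source.Where(s => s == searched).First();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

FirstOrDefault() without arguments would avoid the exception as well, but for reference types it returns null, which would then need a separate check before the result is used.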

I don’t favor either of the two implementations. From my point of view they are equally good.

Use case two: dirty data

The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return default data if the person we are looking for is not found. Null values or an uninitialized list must not cause an exception; in these cases the default data shall be returned as well.

In our test console application we create some dirty data and add tests that call the function with this data and even with null parameters.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35, AddressIdentifier = 1 });
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 38, AddressIdentifier = 2 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41, AddressIdentifier = 3 });
persons.Add(new Person() { Name = "Jane Foe", Age = 41, AddressIdentifier = 4 });

List<Address> addresses = new List<Address>();
addresses.Add(new Address() { AddressIdentifier = 1, City = "Chicago" });
addresses.Add(new Address() { AddressIdentifier = 2, City = null });
addresses.Add(null);
addresses.Add(new Address() { AddressIdentifier = 3, City = "Chicago" });            

//---------

string information;

//search existing person
information = GetPersonInformation(persons, addresses, "Jane Doe");
Console.WriteLine(information);

information = GetPersonInformation(persons, addresses, "Jane Foe");
Console.WriteLine(information);

//search for a non-existent person
information = GetPersonInformation(persons, addresses, "???");
Console.WriteLine(information);

//search in a list which is not yet initialized
information = GetPersonInformation(null, addresses, "???");
Console.WriteLine(information);

information = GetPersonInformation(persons, null, "???");
Console.WriteLine(information);

information = GetPersonInformation(null, null, "???");
Console.WriteLine(information);

Console.ReadKey();  

The loop implementation must be adapted to handle all these special cases. The following source code shows a corresponding implementation. Both the lists and their elements are checked for null.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    Person actualPerson = Person.Default;
    Address actualAddress = Address.Default;

    if (persons != null)
    {
        foreach (Person person in persons)
        {
            if (person == null)
            {
                continue;
            }

            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                actualPerson = person;
                break;
            }
        }
    }

    if (addresses != null)
    {
        foreach (Address address in addresses)
        {
            if (address == null)
            {
                continue;
            }

            if (actualPerson.AddressIdentifier == address.AddressIdentifier)
            {
                actualAddress = address;
                break;
            }
        }
    }

    return actualPerson.Name + ", " + actualAddress.City;
}

 

The Linq implementation must be adapted too. A check of each list as a whole, as well as of its single elements, is added.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    if(persons == null)
    {
        persons = new List<Person>();
    }

    if(addresses == null)
    {
        addresses = new List<Address>();
    }

    var result = from person in persons.Where(p => p != null)
                    join address in addresses.Where(a => a != null) 
                    on person.AddressIdentifier equals address.AddressIdentifier                         
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select new
                    {
                        Name = person.Name,
                        City = address.City
                    };

    var element = result
        .DefaultIfEmpty(new { Name = Person.Default.Name, City = Address.Default.City })
        .First();

    return element.Name + ", " + element.City;
}      

Code Review for use case two

The method containing the loops gets more complex with all the if statements. Therefore you should extract the two loops into separate methods, one looking for a person and one for an address. With this little refactoring the loop implementation becomes very easy to understand.

The Linq implementation did not change much. Before the query is executed, the lists are checked for null. But there is one small detail: the additional small queries within the in parts, which are needed to remove null objects. There are several ways to refactor this implementation: you may extract the nested queries or the data checks. Or, if you want to leave the complex query as it is, you should add comments that explain it.
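
One detail is worth checking here. As far as I can tell from the code, the join clause is an inner join, so a person without a matching address produces no result row. In the test data this applies to "Jane Foe" (AddressIdentifier 4): the Linq method returns both defaults, whereas the loop version returns the found person together with the default city. If the behavior of the loop is wanted, a left outer join via "join ... into" and DefaultIfEmpty is one option. The following is a sketch under that assumption, not a replacement prescribed by the original example; the class name LeftJoinQuery and the variable names are hypothetical, and the data classes are reduced copies (Age omitted).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced copies of the data classes from above (assumption: Age is not needed here).
public class Person
{
    public string Name { get; set; }
    public uint AddressIdentifier { get; set; }

    public static readonly Person Default = new Person() { Name = "new person", AddressIdentifier = 0 };
}

public class Address
{
    public uint AddressIdentifier { get; set; }
    public string City { get; set; }

    public static readonly Address Default = new Address() { AddressIdentifier = 0, City = "new city" };
}

public static class LeftJoinQuery
{
    public static string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
    {
        // Treat uninitialized lists as empty and drop null elements up front.
        IEnumerable<Person> cleanPersons = (persons ?? new List<Person>()).Where(p => p != null);
        IEnumerable<Address> cleanAddresses = (addresses ?? new List<Address>()).Where(a => a != null);

        var result = from person in cleanPersons
                     where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                     join address in cleanAddresses
                     on person.AddressIdentifier equals address.AddressIdentifier
                     into matches
                     // Left outer join: keep the person even if no address matches.
                     from match in matches.DefaultIfEmpty(Address.Default)
                     select new
                     {
                         Name = person.Name,
                         City = match.City
                     };

        // No person found at all: fall back to both defaults.
        var element = result
            .DefaultIfEmpty(new { Name = Person.Default.Name, City = Address.Default.City })
            .First();

        return element.Name + ", " + element.City;
    }
}
```

Unlike the loop, this sketch does not look up an address for the default person. With the test data above this makes no difference, because no address uses the identifier 0.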

Without refactoring I don’t like either of these two implementations, as both are somewhat complex. I would prefer two separate query methods, one for the person and one for the address, and an additional managing method that calls these two query methods and joins the results. The single query methods as well as the join can be implemented with simple Linq statements.
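
A minimal sketch of this structure could look as follows. The names SplitQueries, FindPerson and FindAddress are hypothetical, and the data classes are again reduced copies of the ones shown earlier (Age omitted). Note that here the "join" is simply the second lookup using the AddressIdentifier of the person that was found.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced copies of the data classes from above (assumption: Age is not needed here).
public class Person
{
    public string Name { get; set; }
    public uint AddressIdentifier { get; set; }

    public static readonly Person Default = new Person() { Name = "new person", AddressIdentifier = 0 };
}

public class Address
{
    public uint AddressIdentifier { get; set; }
    public string City { get; set; }

    public static readonly Address Default = new Address() { AddressIdentifier = 0, City = "new city" };
}

public static class SplitQueries
{
    // Query method for the person: tolerates a null list and null elements.
    public static Person FindPerson(IEnumerable<Person> persons, string name)
    {
        return (persons ?? Enumerable.Empty<Person>())
            .Where(p => p != null)
            .Where(p => string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase))
            .DefaultIfEmpty(Person.Default)
            .First();
    }

    // Query method for the address: tolerates a null list and null elements.
    public static Address FindAddress(IEnumerable<Address> addresses, uint addressIdentifier)
    {
        return (addresses ?? Enumerable.Empty<Address>())
            .Where(a => a != null && a.AddressIdentifier == addressIdentifier)
            .DefaultIfEmpty(Address.Default)
            .First();
    }

    // Managing method: calls both queries and combines the results.
    public static string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
    {
        Person person = FindPerson(persons, name);
        Address address = FindAddress(addresses, person.AddressIdentifier);

        return person.Name + ", " + address.City;
    }
}
```

Because the address is looked up after the person, this variant behaves like the loop implementation: a person without a matching address is still returned, combined with the default city.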
