Linq vs Loop: Check whether item exists

As in the previous article of this series, I want to compare Linq with a classical loop. This time we look at a common use case: checking whether an item exists. Again we use two kinds of input: validated, clean data and raw data that may contain null values.

Use case one: clean data

Let’s start with the first use case. We have a simple data class for a person. Out of a list of persons we want to check whether a specific person exists.

The data class is defined as follows:

public class Person
{        
    public string Name { get; set; }
    public uint Age { get; set; }        
}

Our demo console application creates a list of data and calls the data query method, first with an existing person and then with a non-existing one.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });

//---------

bool exist;

//search existing person
exist = DoesExist(persons, "Jane Doe");
Console.WriteLine("Exist: " + exist.ToString());

//search not existing person
exist = DoesExist(persons, "???");
Console.WriteLine("Exist: " + exist.ToString());

Console.ReadKey();

The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:

static private bool DoesExist(List<Person> persons, string name)
{
    foreach (Person person in persons)
    {
        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }
    }

    return false;
}

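And we implement the same function as a one-line query. Strictly speaking, Exists is a method of List<T> and not a Linq operator; the Linq equivalent is Enumerable.Any, which accepts the same lambda expression.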

static private bool DoesExist(List<Person> persons, string name)
{
    return persons.Exists(x => string.Equals(x.Name, name, StringComparison.OrdinalIgnoreCase));
}
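For comparison, here is a minimal sketch of the same check written with the Linq operator Any. It assumes the Person class from above; the wrapper class name PersonQueries is only illustrative and not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same data class as in the article.
public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }
}

// Illustrative wrapper class (hypothetical name).
public static class PersonQueries
{
    // Any stops at the first match, just like the early return in the loop.
    public static bool DoesExist(List<Person> persons, string name)
    {
        return persons.Any(p => string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase));
    }
}
```

Unlike Exists, Any works on every IEnumerable<T>, so the parameter type could be relaxed if the method should accept arrays or other sequences as well.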

Code Review for use case one

Both methods will pass a code review without any issues. The loop is clean and well structured. The Linq query contains a long predicate inside the Exists call, but as this string comparison is easy to understand, I think it is fine to write the whole statement in one line of code.

In this use case I prefer the Linq method because I think it is somewhat easier and quicker to read and understand.

Use case two: dirty data

The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return whether the person exists. A null entry counts as a non-existing person, so it shall be ignored and must not cause an exception.

In our test console application we create some dirty data, and we add tests that call the function with this data and even with a null list.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 38 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });

//---------
bool exist;

//search existing person
exist = DoesExist(persons, "Jane Doe");
Console.WriteLine("Exist: " + exist.ToString());

//search not existing person
exist = DoesExist(persons, "???");
Console.WriteLine("Exist: " + exist.ToString());

//search in a list which is not yet initialized
exist = DoesExist(null, "???");
Console.WriteLine("Exist: " + exist.ToString());

Console.ReadKey();

The loop-based query must be adapted to handle these special cases. The following source code shows a corresponding implementation. Both the list and its content are checked for null values.

static private bool DoesExist(List<Person> persons, string name)
{
    if (persons == null)
    {
        return false;
    }

    foreach (Person person in persons)
    {
        if (person == null)
        {
            continue;
        }

        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            return true;
        }
    }

    return false;
}

The implementation of the Linq query must be adapted too. A check of the whole list, as well as of each single element, is added.

static private bool DoesExist(List<Person> persons, string name)
{
    if (persons == null)
    {
        return false;
    }

    Predicate<Person> personFinder = (Person p) =>
    {
        if (p == null)
        {
            return false;
        }

        return string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase);
    };

    return persons.Exists(personFinder);
}
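If the named predicate feels too heavy, the two null checks can also be folded into a single expression. The following is only a sketch under the same assumptions as before (Person class from above, illustrative class name RobustPersonQueries), using the Linq operator Any.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same data class as in the article.
public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }
}

// Illustrative wrapper class (hypothetical name).
public static class RobustPersonQueries
{
    public static bool DoesExist(IEnumerable<Person> persons, string name)
    {
        // A null list counts as "not found"; null entries fail the p != null check
        // before their Name is ever accessed.
        return persons != null
            && persons.Any(p => p != null
                && string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase));
    }
}
```

Whether this compact form is easier to read than the explicit predicate is a matter of team taste; the behavior for the test data above is the same.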

Code Review for use case two

I think both implementations are fine. They are self-explanatory and easy to understand. This time I don’t see any advantage in using Linq: the required predicate definition and the null checks lead to an implementation that is nearly identical to the loop. Therefore, in my opinion, the two solutions are equivalent.

Posted in .NET, C#, Clean Code, LINQ

Design patterns: Adapter

The Adapter design pattern, also known as the Wrapper design pattern, is used to convert the interface of an existing class into the interface expected by the client. In such a case the existing class has an incompatible interface. The adapter implements the compatible interface and executes its functionality by using the existing class. This can be necessary if you use existing components which cannot or should not be changed but which should be integrated into your application.

For example, you implement several data serializers which have the following interface. The data is represented as a string in this example. The connection settings are use case specific and can be a file name, a database connection string or similar.

public interface IDataSerializer
{
    string ConnectionSettings { get; set; }
    string Read();
    void Write(string data);
}

In an existing library you already have a component which allows file access. It offers the following interface.

public interface IFileAccess
{
    void Open(string fileName);
    void Create(string fileName);
    void Close();
    string Read();
    void Write(string data);
}

You cannot change this existing class but you want to use it within your application. So you can implement an adapter which reuses the existing component but offers the expected interface.

public class FileSerializer : IDataSerializer
{
    private IFileAccess _fileAccess = new FileAccess();

    public string ConnectionSettings { get; set; }

    public string Read()
    {
        string data;

        _fileAccess.Open(ConnectionSettings);
        data = _fileAccess.Read();
        _fileAccess.Close();

        return data;
    }

    public void Write(string data)
    {
        _fileAccess.Create(ConnectionSettings);
        _fileAccess.Write(data);
        _fileAccess.Close();
    }
}
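To see the adapter from the client side, the following sketch runs a write and read round trip through IDataSerializer. The article does not show the FileAccess class, so InMemoryFileAccess is a hypothetical stand-in that keeps file contents in a dictionary. As a variation on the listing above, the adapter here receives the IFileAccess instance through its constructor and closes the file in a finally block; both are my assumptions, not part of the original component.

```csharp
using System;
using System.Collections.Generic;

public interface IDataSerializer
{
    string ConnectionSettings { get; set; }
    string Read();
    void Write(string data);
}

public interface IFileAccess
{
    void Open(string fileName);
    void Create(string fileName);
    void Close();
    string Read();
    void Write(string data);
}

// Hypothetical stand-in for the library component: "files" live in a dictionary.
public class InMemoryFileAccess : IFileAccess
{
    private readonly Dictionary<string, string> _files = new Dictionary<string, string>();
    private string _current;

    public void Open(string fileName)
    {
        if (!_files.ContainsKey(fileName))
        {
            throw new InvalidOperationException("File not found: " + fileName);
        }
        _current = fileName;
    }

    public void Create(string fileName)
    {
        _files[fileName] = string.Empty;
        _current = fileName;
    }

    public void Close() { _current = null; }
    public string Read() { return _files[_current]; }
    public void Write(string data) { _files[_current] += data; }
}

// The adapter: offers IDataSerializer, delegates to IFileAccess.
public class FileSerializer : IDataSerializer
{
    private readonly IFileAccess _fileAccess;

    public FileSerializer(IFileAccess fileAccess) { _fileAccess = fileAccess; }

    public string ConnectionSettings { get; set; }

    public string Read()
    {
        _fileAccess.Open(ConnectionSettings);
        try { return _fileAccess.Read(); }
        finally { _fileAccess.Close(); } // close even if reading fails
    }

    public void Write(string data)
    {
        _fileAccess.Create(ConnectionSettings);
        try { _fileAccess.Write(data); }
        finally { _fileAccess.Close(); } // close even if writing fails
    }
}

public static class AdapterDemo
{
    // The client only knows IDataSerializer and never sees IFileAccess.
    public static string RoundTrip(IDataSerializer serializer, string data)
    {
        serializer.Write(data);
        return serializer.Read();
    }
}
```

Passing the adapted component in through the constructor makes it easy to swap the real file access for a test double like the one shown here.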
Posted in .NET, C#, Design Pattern, Software Design

Don’t return null

If you write a function returning an object instance you have to think about the special cases. This may be error cases or maybe situations where no object instance can be found or created. In such cases you will often see functions returning “null”.

For example you implement a product management object which shall offer a function to get the product by a product identifier. So you create the following function:

Product GetProduct(int productIdentifier)

But what do you want to do if there is no product with the given identifier? Or what to do if the product management component is not initialized yet? You may return “null” in these cases. But this may result in several issues. For example the following code will break in such a case.

Product myProduct = GetProduct(123);

myProduct.SetUnitPrice(5,7);

If “myProduct” is “null”, the call results in an exception. Returning null therefore creates work in the calling code, and it becomes a source of errors whenever the calling code is missing the null check.

A better way is to throw a specific exception or to return a special case object. Throwing an exception allows standard error handling with try-catch blocks in the calling code or in a higher-level error handling routine. Returning a special case object allows the use case to continue. For example, in the case above the function may return a new product with uninitialized parameters if the specific product is not found. Of course, such a special case object must fit the use cases, and it may cause issues of its own.
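Both alternatives can be sketched in a few lines. The following example is only an illustration: the Product properties, the ProductNotFoundException type, the ProductManagement class and the GetProductOrDefault method are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal UnitPrice { get; set; }
}

// A specific exception type (hypothetical) that tells the caller exactly what went wrong.
public class ProductNotFoundException : Exception
{
    public ProductNotFoundException(int productIdentifier)
        : base("No product with identifier " + productIdentifier + ".")
    {
        ProductIdentifier = productIdentifier;
    }

    public int ProductIdentifier { get; private set; }
}

public class ProductManagement
{
    private readonly Dictionary<int, Product> _products = new Dictionary<int, Product>();

    public void Add(Product product)
    {
        _products[product.Id] = product;
    }

    // Option 1: throw a specific exception instead of returning null.
    public Product GetProduct(int productIdentifier)
    {
        Product product;
        if (_products.TryGetValue(productIdentifier, out product))
        {
            return product;
        }

        throw new ProductNotFoundException(productIdentifier);
    }

    // Option 2: return a special case object so the caller can continue.
    public Product GetProductOrDefault(int productIdentifier)
    {
        Product product;
        return _products.TryGetValue(productIdentifier, out product)
            ? product
            : new Product { Id = productIdentifier, Name = "unknown product" };
    }
}
```

With the first option the calling code decides in a catch block how to react; with the second it can keep working on the returned object, which only makes sense if an "unknown product" is a valid state in the use case.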

In summary you should not return null. Instead you should throw a specific error. In defined use cases it may also be possible to return a special case object.

Posted in .NET, C#, Clean Code

Design patterns: Iterator

An Iterator provides a way to access the elements of an object sequentially without exposing the underlying representation. It is a basic design pattern which we use all the time. For example, a loop over a list is done by using the list iterator. But of course, the Iterator pattern is not limited to simple loops over lists. You can use complex structures too, and you can provide several ways to loop over elements. A complex structure can be a tree. In this case it is possible to implement an Iterator for the tree nodes. And of course there are several ways to loop over elements. Besides the standard iteration from front to back, you may provide a way to loop from back to front or from one specific index to another, you can change the step width, and so on.

The .NET Framework offers the IEnumerable and IEnumerator interfaces to implement the Iterator design pattern. Moreover the yield return keyword makes implementing this pattern even easier.

IEnumerable is an interface that defines one method, GetEnumerator, which returns an IEnumerator. This allows read-only access to a collection. A collection that implements IEnumerable can be used within a foreach statement.

An IEnumerator is the object that performs the actual enumeration. It has the Current property and the MoveNext and Reset methods.

In the following example we create a generic collection which provides several Iterators. In this case it is sufficient to implement the IEnumerable interface for the standard front to back iteration and to add more iterators by providing enumerable objects using the yield return feature.

First we create a little data class to store information about a month. To keep it simple, we use the month name only.
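To make the roles of the two interfaces concrete, the following sketch walks a sequence by hand with GetEnumerator, MoveNext and Current. This is roughly what a foreach statement does behind the scenes. The class and method names are only illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Copies all items into a list by driving the enumerator manually.
    public static List<string> Collect(IEnumerable<string> items)
    {
        var result = new List<string>();

        // The enumerator is disposable, so it is wrapped in a using block.
        using (IEnumerator<string> enumerator = items.GetEnumerator())
        {
            // MoveNext advances to the next element and returns false at the end.
            while (enumerator.MoveNext())
            {
                result.Add(enumerator.Current);
            }
        }

        return result;
    }
}
```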

public class Month
{
    public string Name { get; set; }
}

Next, we implement a generic ItemCollection class. Internally it stores the items in a standard list. To access the data we add the methods Add and Reset and the property Count. Then we implement the IEnumerable interface. For the standard iteration from start to end we can use the enumerator of the list.

public class ItemCollection<T> : IEnumerable<T>
{
    private List<T> _items = new List<T>();
        
    public void Add(T item)
    {
        _items.Add(item);
    }

    public void Reset()
    {
        _items = new List<T>();
    }

    public int Count
    {
        get
        {
            return _items.Count;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator();
    }
        
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
} 

Within a simple test console application we may now use our ItemCollection class.

static void Main(string[] args)
{
    ItemCollection<Month> months = new ItemCollection<Month>();
            
    months.Add(new Month() { Name = "January" });
    months.Add(new Month() { Name = "February" });
    months.Add(new Month() { Name = "March" });
    months.Add(new Month() { Name = "April" });
    months.Add(new Month() { Name = "May" });
    months.Add(new Month() { Name = "June" });
    months.Add(new Month() { Name = "July" });
    months.Add(new Month() { Name = "August" });
    months.Add(new Month() { Name = "September" });
    months.Add(new Month() { Name = "October" });
    months.Add(new Month() { Name = "November" });
    months.Add(new Month() { Name = "December" });

    Console.WriteLine("---standard---");
    foreach (Month month in months)
    {
        Console.WriteLine(month.Name);
    }

    Console.ReadKey();
}

So far our ItemCollection class does not provide any benefit compared to a standard list. So let us add some more iterators. The following code shows additional iterators to loop from back to front, from a start index to an end index, or with a specific step width.

public class ItemCollection<T> : IEnumerable<T>
{
    private List<T> _items = new List<T>();
        
    public void Add(T item)
    {
        _items.Add(item);
    }

    public void Reset()
    {
        _items = new List<T>();
    }

    public int Count
    {
        get
        {
            return _items.Count;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        return _items.GetEnumerator();
    }
        
    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    public IEnumerable<T> FrontToBack
    {
        get
        {
            return this;
        }
    }

    public IEnumerable<T> BackToFront
    {
        get
        {
            for (int index = Count - 1; index >= 0; index--)
            {
                yield return _items[index];
            }                
        }
    }

    public IEnumerable<T> FromTo(int fromIndex, int toIndex)
    {
        for (int index = fromIndex; index <= toIndex; index++)
        {
            yield return _items[index];
        }
    }

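    public IEnumerable<T> StepWidth(int stepWidth)
    {
        // A step width of zero would loop forever; a negative one would leave the list.
        if (stepWidth <= 0)
        {
            throw new ArgumentOutOfRangeException("stepWidth");
        }

        for (int index = 0; index < Count; index = index + stepWidth)
        {
            yield return _items[index];
        }
    }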
}  

The new iterators can be used within the foreach loop.

static void Main(string[] args)
{
    ItemCollection<Month> months = new ItemCollection<Month>();
            
    months.Add(new Month() { Name = "January" });
    months.Add(new Month() { Name = "February" });
    months.Add(new Month() { Name = "March" });
    months.Add(new Month() { Name = "April" });
    months.Add(new Month() { Name = "May" });
    months.Add(new Month() { Name = "June" });
    months.Add(new Month() { Name = "July" });
    months.Add(new Month() { Name = "August" });
    months.Add(new Month() { Name = "September" });
    months.Add(new Month() { Name = "October" });
    months.Add(new Month() { Name = "November" });
    months.Add(new Month() { Name = "December" });

    Console.WriteLine("---standard---");
    foreach (Month month in months)
    {
        Console.WriteLine(month.Name);
    }

    Console.WriteLine("---front to back---");
    foreach(Month month in months.FrontToBack)
    {
        Console.WriteLine(month.Name);
    }

    Console.WriteLine("---back to front---");
    foreach (Month month in months.BackToFront)
    {
        Console.WriteLine(month.Name);
    }

    Console.WriteLine("---from index to index---");
    foreach (Month month in months.FromTo(5,6))
    {
        Console.WriteLine(month.Name);
    }

    Console.WriteLine("---with step width---");
    foreach (Month month in months.StepWidth(3))
    {
        Console.WriteLine(month.Name);
    }

    Console.ReadKey();
}

This little example shows the principle of the Iterator design pattern. Of course, for simple collections like the one in the example you can use the collections provided by the .NET Framework. But if you use complex structures, for example a tree, you may want to implement your own Iterators.
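As a sketch of the tree case mentioned above, the following generic node class offers two iterators built with yield return: one depth-first and one breadth-first. TreeNode and its members are hypothetical names for this illustration, not an existing framework type.

```csharp
using System.Collections.Generic;

public class TreeNode<T>
{
    private readonly List<TreeNode<T>> _children = new List<TreeNode<T>>();

    public TreeNode(T value)
    {
        Value = value;
    }

    public T Value { get; private set; }

    // Adds a child and returns it, so that deeper levels can be attached.
    public TreeNode<T> AddChild(T value)
    {
        var child = new TreeNode<T>(value);
        _children.Add(child);
        return child;
    }

    // Depth-first, pre-order: the node itself, then each subtree in turn.
    public IEnumerable<T> DepthFirst()
    {
        yield return Value;

        foreach (TreeNode<T> child in _children)
        {
            foreach (T value in child.DepthFirst())
            {
                yield return value;
            }
        }
    }

    // Breadth-first: level by level, using a queue of nodes still to visit.
    public IEnumerable<T> BreadthFirst()
    {
        var queue = new Queue<TreeNode<T>>();
        queue.Enqueue(this);

        while (queue.Count > 0)
        {
            TreeNode<T> node = queue.Dequeue();
            yield return node.Value;

            foreach (TreeNode<T> child in node._children)
            {
                queue.Enqueue(child);
            }
        }
    }
}
```

The client simply writes a foreach loop over root.DepthFirst() or root.BreadthFirst() and never needs to know how the nodes are linked internally.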

Posted in .NET, C#, Clean Code, Design Pattern

Design patterns: Facade

A Façade class provides a higher level interface to a set of sub level interfaces. This will make the subsystem easier to use.

The high level interface will hide details of the low level interfaces. This may be complex workflows to use the low level interfaces, for example a mandatory initialization process. The Façade also hides the needed connections and interactions between the low level interfaces. Furthermore it will reduce complexity by removing parameters or result values which are not needed on the high level.

In other words the Façade is a class that provides a set of methods and properties that makes it easier for clients to use a complex subsystem of classes.

The following code shows a possible implementation of the Façade design pattern. In this example some configuration data shall be stored in a file. The data is stored in XML format, and for security reasons the file is encrypted. The low level system offers classes to convert the data, to do the encryption and to handle the file.

class ConfigurationData
{
    public string MyFirstValue { get; set; }
    public int MySecondValue { get; set; }
}

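class XmlConverter
{
    public string ToXml(ConfigurationData data)
    {
        //...serialization
        throw new System.NotImplementedException();
    }

    public ConfigurationData ToConfigurationData(string xml)
    {
        //...deserialization
        throw new System.NotImplementedException();
    }
}

class EncryptionService
{
    public string Encrypt(string rawData)
    {
        //...encryption
        throw new System.NotImplementedException();
    }

    public string Decrypt(string encryptedData)
    {
        //...decryption
        throw new System.NotImplementedException();
    }
}

class FileService
{
    public void WriteFile(string fileName, string data)
    {
        //...write file
        throw new System.NotImplementedException();
    }

    public string ReadFile(string fileName)
    {
        //...read file
        throw new System.NotImplementedException();
    }
}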

The client does not need to know these details and therefore we offer a simple to use façade class.

class ConfigurationManager
{
    public void SaveConfiguration(ConfigurationData data)
    {
        string fileName = "MyFile.cfg";

        string rawData;
        string encryptedData;

        XmlConverter converter = new XmlConverter();
        EncryptionService encryptionService = new EncryptionService();
        FileService fileService = new FileService();

        rawData = converter.ToXml(data);
        encryptedData = encryptionService.Encrypt(rawData);
        fileService.WriteFile(fileName, encryptedData);
    }
}
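The façade above only saves data. The following self-contained sketch adds the matching load operation so that the whole round trip can be tried out. The method bodies of the low level classes are my own placeholders: XML conversion uses XmlSerializer, the file access uses System.IO.File, and the "encryption" is plain Base64 encoding, which is NOT real encryption and only marks the place where a real cipher would go. The constructor parameter for the file name and the LoadConfiguration method are assumptions as well.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class ConfigurationData
{
    public string MyFirstValue { get; set; }
    public int MySecondValue { get; set; }
}

public class XmlConverter
{
    private readonly XmlSerializer _serializer = new XmlSerializer(typeof(ConfigurationData));

    public string ToXml(ConfigurationData data)
    {
        using (var writer = new StringWriter())
        {
            _serializer.Serialize(writer, data);
            return writer.ToString();
        }
    }

    public ConfigurationData ToConfigurationData(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (ConfigurationData)_serializer.Deserialize(reader);
        }
    }
}

// Placeholder only: Base64 is an encoding, not encryption.
public class EncryptionService
{
    public string Encrypt(string rawData)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(rawData));
    }

    public string Decrypt(string encryptedData)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(encryptedData));
    }
}

public class FileService
{
    public void WriteFile(string fileName, string data) { File.WriteAllText(fileName, data); }
    public string ReadFile(string fileName) { return File.ReadAllText(fileName); }
}

// The façade: two simple methods hide conversion, encryption and file handling.
public class ConfigurationManager
{
    private readonly string _fileName;
    private readonly XmlConverter _converter = new XmlConverter();
    private readonly EncryptionService _encryptionService = new EncryptionService();
    private readonly FileService _fileService = new FileService();

    public ConfigurationManager(string fileName)
    {
        _fileName = fileName;
    }

    public void SaveConfiguration(ConfigurationData data)
    {
        string rawData = _converter.ToXml(data);
        string encryptedData = _encryptionService.Encrypt(rawData);
        _fileService.WriteFile(_fileName, encryptedData);
    }

    // Reverses the three steps of SaveConfiguration in the opposite order.
    public ConfigurationData LoadConfiguration()
    {
        string encryptedData = _fileService.ReadFile(_fileName);
        string rawData = _encryptionService.Decrypt(encryptedData);
        return _converter.ToConfigurationData(rawData);
    }
}
```

The client needs only two calls, SaveConfiguration and LoadConfiguration, and never touches the three low level classes directly.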

Due to its simple design and its huge benefit, the Façade is a very common design pattern. It is one of the most commonly used design patterns in a 3-tier architectural model. In such an architecture the presentation layer is the client. The client calls to the business layer take place via a Façade, often called the service layer. This service layer hides the complexity of the business layer objects and their interactions.

Furthermore, a Façade can be used if you have to deal with confusing or messy legacy classes. You can hide such classes behind a Façade which exposes only what is necessary and presents it in an easy to use and well-organized interface.

Last but not least, I want to mention that the Façade pattern is sometimes mixed up with the Wrapper pattern. But it is easy to distinguish between the two if you look at the functionality they provide. A Wrapper offers the same functionality as the sub level interface it hides. It only changes the interface itself, for example to solve compatibility issues. In contrast, a Façade offers a simplified and easy to use interface to a complex subsystem. A Façade therefore does not contain the full set of features and possibilities of the subsystem. It offers only the functionality needed by the client. In summary, a Wrapper is used for compatibility and a Façade is used to simplify interfaces.

Posted in .NET, C#, Design Pattern

Use factory methods instead of overloaded constructors

If a class can be created and initialized in different ways or by using different kinds of values, you will often see overloaded constructors. This is a standard way to implement such an object creation, but it has a major disadvantage: the constructor itself is not self-explanatory. Very often you need good code documentation to understand the behavior of the different constructors.

A possible solution to create cleaner code may be to use factory methods instead of the overloaded constructors. These factory methods should have names explaining their use case. This will make the code clean, and additional documentation may become unnecessary.

Within the .NET Framework you will find some classes following this guideline, for example the TimeSpan class, which offers factory methods such as FromTicks, FromSeconds and FromMinutes. To illustrate the problem, imagine a TimeSpan class that could also be created from a combination of concrete values for a duration, e.g. years, months, days and so on, or from ticks, seconds or minutes as single values. The real TimeSpan has no year or month values, so the following signatures are hypothetical.

If you use overloaded constructors, you may implement the following ones for such a TimeSpan class.

TimeSpan(long ticks)

TimeSpan(int years, int months, int days)

TimeSpan(int hours, int minutes, int seconds, int milliseconds)

TimeSpan(int years, int months, int days, int hours, int minutes, int seconds, int milliseconds)

In such a case it is necessary to write good documentation and explain all functions. For example, if you want to create a TimeSpan of one month, will it be allowed to call the following constructors?

TimeSpan(0,1,0)

TimeSpan(0,0,31)

TimeSpan(744,0,0,0)

TimeSpan(0,1,0,0,0,0,0)

Maybe the class will support all these ways to create the instance. But you will have two major disadvantages: to write the code you have to read the documentation, and the resulting code is hard to read.

An alternative way to implement such a TimeSpan class is by using factory methods instead of constructors. For example, take a look at the following methods. FromMinutes and FromTicks exist in the real TimeSpan class; FromMonths is hypothetical and continues the example from above.

FromMonths(int months)

FromMinutes(double minutes)

FromTicks(long ticks)

The factory methods create TimeSpan instances too, but they remove the disadvantages shown above. They are clean and self-explanatory, and the resulting code is easy to read. The task above, creating a TimeSpan of one month, is implemented as follows:

TimeSpan duration = TimeSpan.FromMonths(1);

Such factory methods will make the overloaded constructors obsolete. Therefore you should remove them or make them private. In summary you should use the following guideline:

When constructors are overloaded, use static factory methods with names that describe the arguments and make the corresponding constructors private.
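A minimal sketch of this guideline could look as follows. The Duration class is a hypothetical example type, not part of the .NET Framework; it stores whole seconds only to keep the code short.

```csharp
using System;

public sealed class Duration
{
    private readonly long _totalSeconds;

    // The constructor is private: instances can only be created
    // through the named factory methods below.
    private Duration(long totalSeconds)
    {
        _totalSeconds = totalSeconds;
    }

    public static Duration FromSeconds(long seconds)
    {
        return new Duration(seconds);
    }

    public static Duration FromMinutes(long minutes)
    {
        return new Duration(minutes * 60);
    }

    public static Duration FromHours(long hours)
    {
        return new Duration(hours * 3600);
    }

    public long TotalSeconds
    {
        get { return _totalSeconds; }
    }
}
```

A call such as Duration.FromMinutes(90) states the unit at the call site, whereas new Duration(90) would force the reader to look up what the number means.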

Posted in .NET, C#, Clean Code

Linq vs Loop: Simple Query

In this article and in further articles I want to compare Linq and classical loops. During code reviews you may sometimes hear the suggestion to use language specific features because they simplify the source code. Linq is such a feature. On the other hand, you may ask yourself whether such a feature always creates better code. Of course, to compare two different implementations of the same use case, you have to look at different criteria: complexity, maintainability, robustness (risk of errors), comprehensibility and so on. One of the most important factors, or even the most important one, is the wish to have code which is easy to understand. This factor depends on the knowledge of the developers and on the best practices and clean code principles of the team or the company. Therefore you cannot generally define whether one or the other solution is best practice for all companies.

In the following I want to compare Linq queries and classical loops for the use case of a simple query. I want to show the pros and cons of each solution and give my personal opinion on which solution I prefer. You may use my input to make your own decision.

The query will be implemented for two different use cases. In the first use case we have known and well-defined data handling. The data handling is responsible for the data validation and ensures that we never get uninitialized data. So we don’t have to check for null pointers or data ranges.

In the second use case we don’t know the data handling components. We want to offer a query which can be used by everyone. Our query itself shall be robust enough to deal with null pointers and other data issues.

Use case one: clean data

Let’s start with the first use case. We have a simple data class for a person. Out of a list of persons we want to find a specific one by name. To keep it simple we just look for the first person with the name. If the list does not contain the data we are looking for, we shall return a default person object.

The data class is defined as follows:

public class Person
{        
    public string Name { get; set; }
    public uint Age { get; set; }

    public static readonly Person Default = new Person()
    {
        Name = "new person",
        Age = 0
    };        
}

Our demo console application creates a list of data and calls the data query method, first with an existing person and then with a non-existing one.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });

//---------

Person person;

//search existing person
person = FindPerson(persons, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(persons, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();

The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:

static private Person FindPerson(List<Person> persons, string name)
{
    foreach (Person person in persons)
    {
        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            return person;
        }
    }

    return Person.Default;
}

And we implement the same function by using Linq.

static private Person FindPerson(List<Person> persons, string name)
{
    var result = from person in persons
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}
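As an alternative sketch, the same query can be written in method syntax with FirstOrDefault and the null-coalescing operator, which avoids the DefaultIfEmpty call. It assumes the Person class from above and clean data without null entries; the wrapper class name PersonFinder is only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same data class as in the article.
public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }

    public static readonly Person Default = new Person()
    {
        Name = "new person",
        Age = 0
    };
}

// Illustrative wrapper class (hypothetical name).
public static class PersonFinder
{
    public static Person FindPerson(List<Person> persons, string name)
    {
        // FirstOrDefault returns null if no person matches;
        // the ?? operator then falls back to the default person.
        return persons.FirstOrDefault(
            p => string.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase))
            ?? Person.Default;
    }
}
```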

Code Review for use case one

Let us compare the two implementations.

The loop is very clean, simple and easy to understand. With one look into the code I understand its functionality. The code does not contain any comments. And I don’t miss comments as the code is self-explaining.

The Linq implementation is simple too. The Linq query is a simple “from-where-select” query which is easy to understand. But when I read the code I stumble over the “DefaultIfEmpty” function call. At this line I have to think for a moment to see that it is needed for the case in which we don’t find a matching value in our data list.

As a result of the code review I actually prefer the first implementation using the loop because it is clear and easy to understand. Reading the Linq query made me stop and think about the “DefaultIfEmpty” part.

If the developer adds a short comment explaining why the “DefaultIfEmpty” call is made, the Linq query is easy to understand too, and I don’t prefer either of the two implementations.

Use case two: dirty data

The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return a default person if the one we are looking for is not found. Null values or an uninitialized list shall not cause an exception. In these cases the default person shall be returned as well.

In our test console application we create some dirty data. And we add additional tests to call the function with the data or even with null parameters.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 38 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });

//---------
Person person;

//search existing person
person = FindPerson(persons, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(persons, "???");
Console.WriteLine("Name: " + person.Name);

//search in a list which is not yet initialized
person = FindPerson(null, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();

The loop-based query must be adapted to handle all these special cases. The following source code shows a corresponding implementation. Both the list and its content are checked for null values.

static private Person FindPerson(List<Person> persons, string name)
{
    if (persons == null)
    {
        return Person.Default;
    }

    foreach (Person person in persons)
    {
        if (person == null)
        {
            continue;
        }

        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            return person;
        }
    }

    return Person.Default;
}

The implementation of the Linq query must be adapted too. A check of the whole list, as well as of each single element, is added.

static private Person FindPerson(List<Person> persons, string name)
{
    if (persons == null)
    {
        return Person.Default;
    }

    var result = from person in persons
                    where person != null
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}

Code Review for use case two

I think we have the same situation as in the first case. The adapted methods are quite easy to understand. They are clean and self-explanatory. The “DefaultIfEmpty” part may need a comment, although such a comment may be unnecessary if the team uses Linq regularly. Therefore both implementations are equivalent from my point of view.

Posted in .NET, C#, Clean Code, LINQ