Prefer non-member functions

In C++ you can write a function as a member function of a class or as a non-member function outside of any class. But which of these two designs is appropriate? Which one should you prefer? The answer is easy: it depends! In this article I want to give you some guidelines you can use to make this design decision. This will help you in your day-to-day work, because as software developers we write functions every day, and so it is a very fundamental question where to place these functions: inside or outside of a class.

To think about this question we will use a simple example. Let’s say we want to implement database access. Typical features of such a component are connection management and creating, reading, updating and deleting values. The following example shows a basic implementation.

class DatabaseConnection;
class DatabaseValue;

namespace Database
{
  class DatabaseAccess
  {
  public:
    void Connect(std::string connectionSettings);
    void Disconnect();

    void InsertValue(std::string sqlQuery);
    DatabaseValue ReadValue(std::string sqlQuery);
    void UpdateValue(std::string sqlQuery);
    void DeleteValue(std::string sqlQuery);

  private:
    DatabaseConnection mConnection;
  };
}

I have implemented all functions as member functions of a class. Why did I choose this design? It would also be possible to write non-member functions instead.

namespace Database
{
  DatabaseConnection Connect(std::string connectionSettings);
  void Disconnect(DatabaseConnection connection);

  void InsertValue(DatabaseConnection connection, std::string sqlQuery);
  DatabaseValue ReadValue(DatabaseConnection connection, std::string sqlQuery);
  void UpdateValue(DatabaseConnection connection, std::string sqlQuery);
  void DeleteValue(DatabaseConnection connection, std::string sqlQuery);
}

As you can see, all functions share a common piece of data, the database connection object. If you use non-member functions you have to pass this connection object around or, even worse, use a global variable. Therefore this use case is a perfect candidate for a class with member functions. The class has its internal data and resource management, which are used by the member functions, and it offers a public interface with the member functions a client needs for database access.

Once the class is implemented you will use it in client code. At this point you may notice that most clients perform only a few database accesses and most often read single values. So we now need a function which establishes a database connection, reads the value and closes the connection. You may add this function to the existing database class or write a non-member function. The following source code shows both implementations.

namespace Database
{
  class DatabaseAccess
  {
  public:
    void Connect(std::string connectionSettings);
    void Disconnect();

    void InsertValue(std::string sqlQuery);
    DatabaseValue ReadValue(std::string sqlQuery);
    void UpdateValue(std::string sqlQuery);
    void DeleteValue(std::string sqlQuery);

    // member function
    DatabaseValue ReadSingleValue(std::string connectionSettings, std::string sqlQuery)
    {
      DatabaseValue dbValue;
      DatabaseAccess dbAccess = DatabaseAccess();

      dbAccess.Connect(connectionSettings);
      dbValue = dbAccess.ReadValue(sqlQuery);
      dbAccess.Disconnect();

      return dbValue;
    }

  private:
    DatabaseConnection mDatabaseConnection;
  };

  // non-member function
  DatabaseValue ReadSingleValue(std::string connectionSettings, std::string sqlQuery)
  {
    DatabaseValue dbValue;
    DatabaseAccess dbAccess = DatabaseAccess();

    dbAccess.Connect(connectionSettings);
    dbValue = dbAccess.ReadValue(sqlQuery);
    dbAccess.Disconnect();

    return dbValue;
  }
}

Which do you think is the better solution: implementing the function as a member function or as a non-member function?

I think this function should not be a member of the class, for several reasons. First, it adds a new concept to the class which does not fit into the existing interface. The existing interface is based on a connection which you have to open and close, and several database access functions you may use in between. The new function offers a completely different functionality which is independent of the existing ones. It needs its own connection settings string, and it may even result in errors if you have already called the Connect function. So this new function does not fit into the existing interface and therefore makes the class harder to use for a client.

Another important aspect is the implementation of the function. As you can see, we don’t use any class-internal members within the function. This is a strong indication that the function should not be part of the class.

Therefore I prefer to implement this function as a non-member function. It is strongly connected with the database access class because both belong to the same group of functionality. But it is not part of the database access class itself, because it does not extend the class functionality; it adds a new group of functionality instead.

Non-member functions and classes which belong to the same topic can be grouped within the same namespace. In this example they all belong to the group of database-specific functionality, and therefore I have grouped them together in the Database namespace.

Guidelines for non-member functions

After this comparison between member functions and non-member functions I want to think about some important aspects of non-member functions. At the beginning of this article I wrote a class, and in the next code example I simply took the class methods and put them outside of the class as non-member functions. But this is dangerous and creates error-prone code which is hard to maintain. That’s because these functions are non-member functions from a technical point of view but not from a logical point of view. They all belong together, can only be called in the right order and may have side effects or must pass resources around.

Therefore I want to talk about some guidelines you should keep in mind whenever you write a non-member function. From my point of view a non-member function should be:

  • Atomic
  • Stateless
  • Without side effects

A function is atomic if it can be executed without the need for other functions. In the above source code with the database functions as non-members, the functions depend on each other. You have to call the Connect function before you can use the Insert, Read, Update and Delete functions, and at the end you must not forget to call the Disconnect function. So these functions have a strong relationship. You cannot use one of the functions on its own; you always have to use the right combination of several functions.

If you have such an atomic function, it should be stateless. Stateless functions are sometimes also called service functions. Stateless means that if you call the function several times with the same parameters, it always returns the same result, independent of the state of the application or the system where it is executed. For example, you may try to break up the strong relationship between the database non-member functions by putting the database connection into a global variable, which removes the need to pass the connection around. This reduces the visible relationship between the functions. But you are still far away from atomic functions, and the global variable brings issues of its own. Above all, your functions become state-controlled. For example, the Read function takes an SQL query as input and returns the database value. But its behavior is now state-controlled, as it has to check the state of the database connection stored in the global variable. If the connection is established the function returns the result of the SQL query, otherwise it may throw an exception. The non-member function has become a state-controlled non-service function.
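To make the difference concrete, here is a minimal sketch. All names in it (`gConnected`, `ReadValueStateful`, `BuildSelectQuery`) are hypothetical and not part of the database example above, and the connection is only simulated by a plain flag.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical global state that stands in for a shared database connection.
bool gConnected = false;

// State-controlled: the outcome depends on gConnected, not only on the argument.
std::string ReadValueStateful(const std::string& sqlQuery)
{
  if (!gConnected)
  {
    throw std::runtime_error("not connected");
  }
  return "result of: " + sqlQuery;
}

// Stateless: the same arguments always produce the same result.
std::string BuildSelectQuery(const std::string& table, const std::string& column)
{
  return "SELECT " + column + " FROM " + table;
}
```

Calling `BuildSelectQuery("Person", "Name")` gives the same string no matter what happened before, while `ReadValueStateful` throws or succeeds depending on who last touched `gConnected`.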

Another very important aspect is whether the function has side effects. It may be the most important of the three aspects, because violating it can cause big issues and make the source code error-prone and very hard to maintain. A function without side effects does not change anything within your software system. So, for example, it should not write to a global variable, set an application property or create any kind of output file.

This may surprise you. We have implemented a non-member function which reads one value from the database, so what if we want to write a value? Is it bad style to put this into a non-member function? The output file example may be surprising too, as many non-member functions are written precisely as such helper or tool functions. You are right as far as the functionality of the non-member function is concerned. But there is a huge difference between the two main ways to implement this functionality.

For example, say you have a data object which you want to serialize into a file in XML format. You may implement this directly within a non-member function. This contravenes the guideline that a non-member function shall not have side effects, because we create a file and therefore change our system. So think about the other possibility and put all functionality with side effects into classes. This is a good idea anyway, because a side effect always means that we access some resource, and resource management is one of the main reasons why we have classes.

Going back to the serialization example, in my opinion it is a good design to implement a file management class which does the generic file handling without analyzing the file content. The de/serialization of the data, which means converting an object into an XML stream and vice versa, is implemented within a second class. With these two classes in place we can write our non-member function which writes an object into the file. This non-member function uses both classes and combines their features to create the output file. Of course this still creates a file and therefore has a side effect. The big difference to the direct implementation is that the non-member function itself does not create the file and is therefore not the source of the side effect. It delegates this work to the responsible experts, in this case the classes which are responsible for the resource management.

This long explanation shows how small the difference between owning a side effect and delegating it looks from an implementation point of view. Developers may even argue that they want to create the file directly within the non-member function and avoid the overhead of implementing additional classes. But once you have to understand, maintain or extend such source code, or have to listen to user complaints about application errors and write bug fixes for functions with side effects, you will understand why I think this aspect is so important.
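The serialization design described above could be sketched as follows. This is only an illustration under my own assumptions: the `Point` data object, the `TextFile` and `PointXmlSerializer` classes and the `SavePointAsXml` function are hypothetical names, and the XML layout is made up for the example.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical data object used only for this sketch.
struct Point
{
  int X;
  int Y;
};

// Owns the file resource; knows nothing about the content it writes.
class TextFile
{
public:
  explicit TextFile(const std::string& path) : mPath(path) {}

  void Write(const std::string& content) const
  {
    std::ofstream stream(mPath);
    if (!stream)
    {
      throw std::runtime_error("cannot open file: " + mPath);
    }
    stream << content;
  }

private:
  std::string mPath;
};

// Converts the object to XML text; knows nothing about files.
class PointXmlSerializer
{
public:
  std::string Serialize(const Point& point) const
  {
    std::ostringstream xml;
    xml << "<Point><X>" << point.X << "</X><Y>" << point.Y << "</Y></Point>";
    return xml.str();
  }
};

// Non-member function: combines both classes and delegates the side effect.
void SavePointAsXml(const Point& point, const std::string& path)
{
  PointXmlSerializer serializer;
  TextFile file(path);
  file.Write(serializer.Serialize(point));
}
```

The non-member function contains no file handling and no XML formatting of its own. If the file access has to change, only `TextFile` is touched; if the format has to change, only the serializer.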

Do not put everything in the class

In the previous section I touched on an interesting aspect: a non-member function can use one or more classes to fulfill its task. I even think that this is the standard use case. An interesting special case is a function which uses the member functions of a single class. Such functions are often implemented as members. This is understandable: if you implement a class and want to add some functionality based on it, it feels natural to put this function into the class. But as we have seen so far, some of these functions are good candidates for non-member functions. Fortunately there is a really easy way to recognize them. If a function uses only the public interface of the class, it is a perfect candidate for a non-member function. So if you write a class and its member functions you may ask yourself: does this function use the internal members and resource handling, or does it only use the public interface functions? In the first case it should be implemented as a member function, in the second case as a non-member function.
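A small sketch of this check, using a hypothetical `Measurements` class and `Average` function which I made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class: it owns its data and exposes a small public interface.
class Measurements
{
public:
  void Add(double value) { mValues.push_back(value); }
  std::size_t Count() const { return mValues.size(); }
  double At(std::size_t index) const { return mValues.at(index); }

private:
  std::vector<double> mValues;
};

// Uses only Count() and At(), so it does not need to be a member.
double Average(const Measurements& measurements)
{
  if (measurements.Count() == 0)
  {
    return 0.0;
  }

  double sum = 0.0;
  for (std::size_t i = 0; i < measurements.Count(); ++i)
  {
    sum += measurements.At(i);
  }
  return sum / static_cast<double>(measurements.Count());
}
```

`Add`, `Count` and `At` touch `mValues` directly and therefore belong to the class. `Average` never does, so it can live next to the class in the same namespace, and the class interface stays small.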

Advantages of non-member functions

So far we have seen use cases and guidelines when and how we can implement non-member functions. But why should we implement them at all? We could also put them together into a class. In our example the function to read a single database value can be grouped together with some other database tool functions and we can put them into a database tool or helper class. And if you look into existing code you will sometimes find a lot of “helper”, “tool” or “common” classes.

If we implement the non-member functions as atomic functions without side effects we don’t have any advantage if we put them together in one class. There is no relationship between the functions, except that they belong together from a logical point of view. But we can respect this aspect by putting them into the same namespace and there is no need to put them into a class and therefore connect them in a technical way.

If we leave them disconnected we have fewer dependencies. A client which uses a function depends on this function only. As a result, only a change of that function makes it necessary to adapt the client. If we put several functions into one class we increase the probability of changes. Clients now use a function of the class and therefore depend on the whole class. They must be changed or recompiled every time the class changes, which will of course occur more often than a change of a single function. By using non-member functions we increase the independence of each function, as we create fewer dependencies for clients.

The same aspect will be relevant if we want to add new functionality. In case we have non-member functions we can add new functionality at any time as this will not affect the clients. In contrast, if we add something to a class it may become necessary to update or recompile the client code. By using non-member functions we facilitate the implementation of new functionality.

The atomicity of non-member functions allows us to group them as we want. We can use one common namespace for all non-member functions, several sub-namespaces with one common parent namespace, or any other kind of namespace structure. Furthermore we can put these function groups into several files, so we can create function packages and deploy them independently. By using classes, or a system of parent classes and subclasses, you may be able to create the same grouping, but it will become more complex and less maintainable than using namespaces. And with classes you cannot split the functions of one group across different files and deploy them separately: you can reopen the same namespace in different files, but C++ has no partial classes, so a class definition cannot be spread over several files. By using non-member functions you can group functions easily and you are able to create and deploy function packages.
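As a small illustration of reopening a namespace: the two parts below would normally live in two separate header files. The file names in the comments and the functions `Quote` and `Square` are hypothetical; they are shown in one listing only to keep the sketch short.

```cpp
#include <string>

// --- StringTools.h (hypothetical file) ---
namespace Tools
{
  // Wraps the text in double quotes.
  inline std::string Quote(const std::string& text) { return "\"" + text + "\""; }
}

// --- MathTools.h (hypothetical file) ---
namespace Tools
{
  // Reopens the same namespace and adds an unrelated function to it.
  inline int Square(int value) { return value * value; }
}
```

A client which only includes the first header never sees `Square` and does not need to be recompiled when the second header changes.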

Summary

Member functions of classes as well as non-member functions have their own advantages and disadvantages. If you want to benefit from these advantages you have to choose the right kind of function for your use case.

If you want to write functions which are strongly coupled, as they share the same resources and will be used together by a client, you should implement a corresponding class and write member functions.

If you have atomic functions which do not share or need any resources and can be executed as service-functions without any side effects, you should implement these methods as non-member functions. Several non-member functions which belong to the same topic can be grouped within one namespace.

Posted in C++

Returning handles to private members makes them public

Some key features of object-oriented programming can be bundled together into something I want to call the “black box idea”. An object is a black box offering a public interface. All internal implementation details, like state management, data control and resource management, are not visible to the client. That’s why we implement a public interface and private class members.

In this article I want to think about whether we can return handles to class-internal members while respecting the black box idea, or whether this violates object-oriented principles.

First we will create a small example application which we will later use to test several ways of returning handles. In the example we have a class “Person” which manages person-specific data. The data is stored in a struct, and the Person class has an internal member holding an instance of this struct. The struct contains the name and address of the person, both of which are implemented as structs too. Furthermore, for testing purposes, we add some functionality to the Person class: in the constructor we create the person data, and we add a method which writes the person’s name to the console.

#include <iostream>
#include <memory>
#include <string>

struct Name
{
  std::string FirstName;
  std::string LastName;
};

struct Address
{
  std::string City;
  std::string Street;
};

struct PersonData
{
  Name mName;
  Address mAddress;
};

class Person
{
public:
  Person()
  {
    mPersonData = std::make_shared<PersonData>();
    mPersonData->mName.FirstName = "John";
    mPersonData->mName.LastName = "Doe";
    mPersonData->mAddress.City = "Springfield";
    mPersonData->mAddress.Street = "Main Street";
  }

  void ShowName()
  {
    std::cout << mPersonData->mName.FirstName << " " << mPersonData->mName.LastName << std::endl;
  }

private:
  std::shared_ptr<PersonData> mPersonData;
};

 

We will use this implementation for all following examples. To keep the examples clear I don’t want to show the whole source code every time. Therefore the next examples show only an excerpt of the Person class with the additional methods we want to implement. So keep in mind that we always have the constructor which creates the person data and the ShowName method which prints the name to the console.

 

Returning handle

The topic of this article is the question whether returning a handle to an internal member is possible without the risk of violating object-oriented concepts. So we start by adding a corresponding method to the Person class. Let’s say we want to get the name of the person. Therefore we return a handle to the corresponding data member.

class Person
{
public:
  ...

  Name& GetName() { return mPersonData->mName; }

  ...
};

 

Within a client application we can use this method to get the name of the person. But this design has a big disadvantage: the client also gets the possibility to change the object’s internal member, by accident or on purpose. The following code shows a corresponding example. The client can set the name on the data object it received, and if we call the ShowName method we will see that the class-internal member was changed too.

int main()
{
  Person person = Person();
  person.ShowName();

  person.GetName().LastName = "Smith";
  person.ShowName();

  return 0;
}

 

By returning a handle to the internal member we make this member public. If we disclose class-internal details we violate object-oriented concepts and create error-prone code. We have created a private member which, on closer inspection, is public. Unfortunately the compiler does not detect such unsafe implementations, so we do not get a compiler warning.

 

Const function

Constness is a strong and important C++ feature. By making a method const we will ensure that this method cannot change class internal members. That’s exactly what we want. So let’s make the method const and try again whether we can change the internal member.

class Person
{
public:
  ...

  Name& GetName() const { return mPersonData->mName; }

  ...
};

int main()
{
  Person person = Person();
  person.ShowName();

  person.GetName().LastName = "Smith";
  person.ShowName();

  return 0;
}

 

Unfortunately this has no effect. We can compile and execute the code, and the internal member can still be changed. Is there something wrong with the const feature? No, everything is fine, as we don’t change the person object itself. The person object stores its data through a pointer to the person data object. The GetName method follows this pointer and returns a reference to the data it points to, and through this reference we change the data. The pointer itself is not changed, so the bitwise constness of the person object is guaranteed. As a result, from the compiler’s point of view this code is fine. From a user’s point of view we have ignored the logical constness of the object, but that’s something the compiler cannot detect.
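The difference between bitwise and logical constness can be seen in a reduced sketch. The classes `ViaPointer` and `ByValue` and the struct `Data` are hypothetical and exist only for this comparison.

```cpp
#include <memory>
#include <string>

struct Data
{
  std::string Text;
};

// Stores its data behind a pointer, like the Person class above.
class ViaPointer
{
public:
  ViaPointer() : mData(std::make_shared<Data>()) {}

  // Compiles: inside a const method only the pointer member is const,
  // not the Data object it points to.
  std::string& Text() const { return mData->Text; }

private:
  std::shared_ptr<Data> mData;
};

// Stores its data directly as a member.
class ByValue
{
public:
  // Here const also covers the Data member itself, so the method
  // can only hand out a const reference.
  const std::string& Text() const { return mData.Text; }

private:
  Data mData;
};
```

With `ViaPointer`, even a `const` object can be modified through `Text()`. With `ByValue`, returning a non-const `std::string&` from the const method would be rejected by the compiler.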

 

Const handle

Another possibility may be to make the returned handle itself const. If we do so we have a read only handle and cannot change the internal member anymore. So let’s try this possibility.

class Person
{
public:
  ...

  const Name& GetName() const { return mPersonData->mName; }

  ...
};

int main()
{
  Person person = Person();
  person.ShowName();

  person.GetName().LastName = "Smith";  // compilation error
  person.ShowName();

  return 0;
}

 

This change seems to work. We now get a compiler error and cannot change the data object anymore. In conclusion this seems to be an easy solution for our issue: just make the object handle read-only, and we cannot change the object by mistake or on purpose.

But unfortunately there is another issue which can result in error-prone code and unstable applications. We now have read access to an object which is managed by the Person class, and the Person class assumes that this data object is one of its private members. So the Person class can change, invalidate and delete the data object and the underlying memory without regard for any existing client.

The following source code shows a corresponding example. I have created an Invalidate method which resets the data object. This method is for test purposes only, but such a reset of the internal member can occur at any time, as the person object can change its internal state whenever needed.

class Person
{
public:
  ...

  const Name& GetName() const { return mPersonData->mName; }

  void Invalidate()
  {
    mPersonData.reset();
  }

  ...
};

int main()
{
  Person person = Person();
  person.ShowName();

  // get handle to the name object
  const Name* pName = &(person.GetName());

  // invalidate the internal member
  // such a reset can be done internally within the person object 
  // or it may be triggered by another thread or...
  person.Invalidate();

  // try to access the name handle data
  std::cout << pName->LastName;   // undefined behavior: pName is dangling

  return 0;
}

 

If we execute the test application we run into an error at runtime. We try to access the data member “pName->LastName”, but the pointer refers to a data object which has already been destroyed. Formally this is undefined behavior; it may show up as an access violation, a crash or garbage output.

Returning a handle is dangerous, even if it is a const one. Of course there may be exceptions, but as a general guideline you should avoid returning handles.

If you have to make an exception to this rule, for example to create resource-optimized or speed-optimized core components for your application or framework, you should minimize the chance of errors. In such a case you should use the corresponding interface in a small and well-defined internal scope only. You should not return handles in an interface which is used by external clients. And of course you have to add source code documentation which explains how to use the code and which risks arise if it is not used as intended.
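One way to follow the guideline is to return a copy instead of a handle. The following is a minimal sketch with a simplified, hypothetical variant of the Person class (it holds only the name and is not the class from the listings above); returning an empty name after the reset is my own assumption for the example.

```cpp
#include <memory>
#include <string>

struct Name
{
  std::string FirstName;
  std::string LastName;
};

// Simplified, hypothetical variant of the Person class for this sketch.
class Person
{
public:
  Person() : mName(std::make_shared<Name>(Name{"John", "Doe"})) {}

  // Returns a copy; the caller never sees the internal object.
  // An empty Name is returned if the internal data was released.
  Name GetName() const { return mName ? *mName : Name(); }

  // Stands in for any internal state change that releases the data.
  void Invalidate() { mName.reset(); }

private:
  std::shared_ptr<Name> mName;
};
```

The client can change its copy without affecting the person object, and the copy stays valid after `Invalidate()`. The price is one copy per call, which is usually acceptable for small objects like this one.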

 

Summary

Avoid returning handles to object internals. Keeping internal members private increases encapsulation and helps to write robust and clean code.

Posted in C++

Design patterns: Command

The Command design pattern encapsulates a request as an object. This will allow adding additional functionality to the request. A typical example is the undo/redo functionality.

The command object is used as a mediator between the client and the receiver and adds its additional behavior. For example, we could have a calculator application. The client does some calculation by calling the corresponding functions of the receiver, in this case a calculation object. If we want to implement undo/redo functions, we can add a corresponding command object. The client then calls the command instead of the calculator methods. The command itself calls the calculator methods and offers additional behavior like the undo/redo functionality.

The following example shows a possible implementation of this pattern. First we implement the client and the receiver for the calculator example. So let’s start by defining the interface for the calculator.

enum Operation
{
    Summation,
    Subtraction,
    Multiplication,
    Division
}

interface ICalculator
{
    int ExecuteOperation(Operation operation, int operand);
}

Next we implement the calculator.

class Calculator : ICalculator
{
    private int _currentValue = 0;

    public int ExecuteOperation(Operation operation, int operand)
    {
        switch (operation)
        {
            case Operation.Summation: _currentValue = _currentValue + operand; break;
            case Operation.Subtraction: _currentValue = _currentValue - operand; break;
            case Operation.Multiplication: _currentValue = _currentValue * operand; break;
            case Operation.Division: _currentValue = _currentValue / operand; break;
            default: throw new ArgumentException();
        }

        return _currentValue;
    }
}

And we use the implemented calculator within a console application.

static void Main(string[] args)
{
    ICalculator calculator = new Calculator();

    Console.WriteLine("+ 2 = " + calculator.ExecuteOperation(Operation.Summation, 2));
    Console.WriteLine("* 5 = " + calculator.ExecuteOperation(Operation.Multiplication, 5));
    Console.WriteLine("- 3 = " + calculator.ExecuteOperation(Operation.Subtraction, 3));

    Console.ReadKey();
}

Now we want to add undo/redo features. This can be done by using the Command design pattern. So we add a calculator command which offers an undo function. We don’t need an explicit redo function, as a redo is equal to a repeated execution of the command.

interface ICalculatorCommand
{
    int ExecuteOperation();
    int UndoOperation();
}

The command itself is implemented by using the calculator object. It forwards the client calls to the calculator and adds the undo function.

class CalculatorCommand : ICalculatorCommand
{
    private ICalculator _calculator;
    private Operation _operation;
    private int _operand;

    public CalculatorCommand(
        ICalculator calculator,
        Operation operation,
        int operand)
    {
        _calculator = calculator;
        _operation = operation;
        _operand = operand;
    }

    public int ExecuteOperation()
    {
        return _calculator.ExecuteOperation(_operation, _operand);
    }

    public int UndoOperation()
    {
        Operation undoOperation = GetUndoOperation();

        return _calculator.ExecuteOperation(undoOperation, _operand);
    }

    private Operation GetUndoOperation()
    {
        switch (_operation)
        {
            case Operation.Summation: return Operation.Subtraction;
            case Operation.Subtraction: return Operation.Summation;
            case Operation.Multiplication: return Operation.Division;
            case Operation.Division: return Operation.Multiplication;
            default: throw new ArgumentException();
        }            
    }
}

The client, in our example the console application, now has the possibility to store the calculation commands. This allows an unlimited number of undo and redo steps. In this example we undo and redo the last two calculations.

static void Main(string[] args)
{
    ICalculator calculator = new Calculator();
    ICalculatorCommand command;
    List<ICalculatorCommand> commands = new List<ICalculatorCommand>();

    command = new CalculatorCommand(calculator, Operation.Summation, 2);
    Console.WriteLine("+ 2 = " + command.ExecuteOperation());
    commands.Add(command);

    command = new CalculatorCommand(calculator, Operation.Multiplication, 5);
    Console.WriteLine("* 5 = " + command.ExecuteOperation());
    commands.Add(command);

    command = new CalculatorCommand(calculator, Operation.Subtraction, 3);
    Console.WriteLine("- 3 = " + command.ExecuteOperation());
    commands.Add(command);

    // undo the last two commands
    Console.WriteLine("undo = " + commands[2].UndoOperation());
    Console.WriteLine("undo = " + commands[1].UndoOperation());

    // redo the last two commands
    Console.WriteLine("redo = " + commands[1].ExecuteOperation());
    Console.WriteLine("redo = " + commands[2].ExecuteOperation());

    Console.ReadKey();
}
Posted in .NET, C#, Design Pattern

Define a variable inside or outside of a loop

Whatever you implement, you will write a lot of loops, and within these loops you will often use variables. This is very common, but it isn’t as easy as it looks at first glance, because you have to make one decision: whether to define the needed variable(s) inside or outside of the loop.

// define variable outside of loop
MyClass myInstance;

for (int i = 0; i < n; i++)
{
  myInstance = DoExecute();
}

// define variable inside of loop  
for (int i = 0; i < n; i++)
{
  MyClass myInstance(DoExecute());
}

 

I have heard many arguments, pros and cons about both approaches. Therefore I want to use this article to write about the most important aspects of this decision. I think the most important aspects are: readability, maintainability and costs (speed and memory). These are basic criteria which you can use for any kind of design decision.

 

Readability

In my opinion both implementations are equally readable.

I have already heard the argument that the definition outside of the loop is less readable, because when you read the assignment you sometimes have to look up the variable definition and scroll up until you find it. This may be necessary to understand what kind of variable you are dealing with. But there is something wrong with this argument. The root cause of the issue isn’t where we define the variable. There are two other design issues. First, your variable should have a name which makes it readable without the need to know its type. And second, your loop and your function should not contain so much code that you have to scroll.

 

Maintainability

I think there is one difference between the two implementations which may influence the maintainability of the code: the scope of the variable. If you define the variable outside of the loop, it can be accessed in a bigger scope. A variable which was created for use within the loop only may then be used in other parts of the function, which creates dependencies and reduces the maintainability of the source code.

But unfortunately this argument isn’t that solid either. If this issue occurs, your function probably does more than one thing. If a function does one thing only, the function-wide scope of the variable should not be an issue.

 

Costs

As we have not seen any important differences so far we will now look at the costs of the different implementations. So we want to think about memory usage and execution speed.

One common guideline says: “Define a variable when it is needed”. The idea behind this guideline is that you should not define a variable which may never be used. A function may return early, for example because of an error or because of explicit return statements in the parameter checks at the beginning of the function. If you define a variable before such a possible return, it may never be used, and the time and memory spent on defining it are wasted. In the case of the loop the probability of this is small, because parameter checks which may lead to an early return should already be done before the loop. But if you call functions within the loop which may fail or throw an exception, the argument becomes relevant. Therefore, with respect to this guideline, we should define the variable when it is needed and prefer the definition inside of the loop.
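The effect of this guideline can be made visible with a small sketch. The names ExpensiveObject, ProcessDefinedEarly, ProcessDefinedWhenNeeded and CompareEarlyReturns are made up for this illustration, and the static counter exists only to show how often the constructor runs.

```cpp
#include <cassert>
#include <string>

// Hypothetical type; the counter only makes the constructor calls visible.
struct ExpensiveObject
{
  static int constructions;
  ExpensiveObject() { ++constructions; }
};

int ExpensiveObject::constructions = 0;

// Defines the object before the parameter check:
// it is constructed even if the function returns early.
bool ProcessDefinedEarly(const std::string& input)
{
  ExpensiveObject worker;

  if (input.empty())
  {
    return false;
  }

  // ... use worker ...
  return true;
}

// Defines the object when it is needed: an early return costs nothing.
bool ProcessDefinedWhenNeeded(const std::string& input)
{
  if (input.empty())
  {
    return false;
  }

  ExpensiveObject worker;
  // ... use worker ...
  return true;
}

// Calls both variants with an invalid (empty) parameter.
void CompareEarlyReturns()
{
  ExpensiveObject::constructions = 0;
  ProcessDefinedEarly("");
  assert(ExpensiveObject::constructions == 1);  // constructed although never used

  ExpensiveObject::constructions = 0;
  ProcessDefinedWhenNeeded("");
  assert(ExpensiveObject::constructions == 0);  // never constructed
}
```

Both functions return the same results; they differ only in the work done on the early-return path.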

Next I want to think about the execution speed. In terms of operations the two approaches create the following costs (‘n’ is the number of loop iterations).

Operation | Definition outside of the loop | Definition inside of the loop
Constructor calls | 1 | n
Assignments | n | 0
Destructor calls | 1 | n

 

These costs of constructors, assignments and destructors depend on the programming language and on the variable type. Therefore it is not possible to say you shall always use the first or second approach. But if you implement the loop you may think about the used object. Is it a large object which needs to manage resources? In this case construction and destruction may be very expensive. Or is it a lightweight object with small construction cost? In this case construction and destruction may be very fast.
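If you don’t know these costs for a given type, you can measure the number of operations. The following is a minimal sketch, assuming a hypothetical instrumented type Counted and a stand-in for DoExecute(); none of these helpers are part of the original example. Note that it also counts the temporary object returned by DoExecute(), which the overview above leaves out.

```cpp
#include <cassert>

// Hypothetical instrumented type: every special member function bumps a counter.
struct Counted
{
  static int constructions;
  static int assignments;
  static int destructions;

  Counted() { ++constructions; }
  Counted(const Counted&) { ++constructions; }
  Counted& operator=(const Counted&) { ++assignments; return *this; }
  ~Counted() { ++destructions; }
};

int Counted::constructions = 0;
int Counted::assignments = 0;
int Counted::destructions = 0;

// Stand-in for the DoExecute() function of the loop example.
Counted DoExecute()
{
  return Counted();
}

struct Counts
{
  int constructions;
  int assignments;
  int destructions;
};

void ResetCounters()
{
  Counted::constructions = 0;
  Counted::assignments = 0;
  Counted::destructions = 0;
}

Counts ReadCounters()
{
  Counts counts = { Counted::constructions, Counted::assignments, Counted::destructions };
  return counts;
}

// Variable defined outside of the loop. The extra block ends the
// lifetime of myInstance before the counters are read.
Counts RunWithVariableOutside(int n)
{
  assert(n >= 0);
  ResetCounters();
  {
    Counted myInstance;
    for (int i = 0; i < n; i++)
    {
      myInstance = DoExecute();
    }
  }
  return ReadCounters();
}

// Variable defined inside of the loop.
Counts RunWithVariableInside(int n)
{
  assert(n >= 0);
  ResetCounters();
  for (int i = 0; i < n; i++)
  {
    Counted myInstance(DoExecute());
  }
  return ReadCounters();
}
```

With a compiler that elides the copy of the return value (guaranteed since C++17), the outside variant reports n + 1 constructions, n assignments and n + 1 destructions, because each call of DoExecute() creates a temporary which is assigned and then destroyed. The inside variant reports n constructions, no assignments and n destructions. So for this particular loop the assignment comes on top of a constructor-destructor pair instead of replacing it.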

 

Summary

In terms of readability and maintainability you should slightly prefer to define the variable inside the loop. But the arguments for this decision are weak, and therefore the main aspect is the execution cost.

If you don’t have any performance issues, or if you don’t know the costs of the constructor, destructor and assignment of the object, you should also prefer to define the variable inside the loop.

Only if you are dealing with a performance-sensitive part of your application and you know that the constructor-destructor pair costs more than an assignment should you prefer to define the variable outside of the loop.

Posted in C++

Slicing problem

By default C++ will pass parameters by value. Therefore when you pass an object to a function, a copy of this object will be created. In other languages this is different: in C#, for example, a variable of a class type holds a reference, so passing it to a method does not copy the object.

If you work with interfaces and pass parameters by value, the slicing problem can occur. “Working with interfaces” means you normally have a superclass defining virtual functions and a couple of subclasses overriding these functions.

Within the following example we implement a window base class with a drawing function. To keep it easy we implement a console output as a replacement for a complex drawing algorithm. The window base class will be used by a couple of subclasses. The example contains one subclass which of course will override the drawing function and add its own visualization.

#include <iostream>

class Window
{
public:
  virtual void Draw() const
  {
    std::cout << "window" << std::endl;
  }
};

class ToolWindow : public Window
{
public:
  void Draw() const override
  {
    std::cout << "tool window" << std::endl;
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  ToolWindow toolWindow = ToolWindow();

  toolWindow.Draw();

  return 0;
}

If we execute the application the output “tool window” is shown.

Now we add a generic drawing function which gets the window interface and calls the window’s drawing method. So we can use any kind of window subclass and pass it to the generic function.

void Draw(Window window)
{
  window.Draw();
}

int _tmain(int argc, _TCHAR* argv[])
{
  ToolWindow toolWindow = ToolWindow();

  Draw(toolWindow);

  return 0;
}

But now the output is “window”. What’s wrong within this example implementation? We have created a tool window and passed it to the function which expected a window (interface).

If we go back to the start of this article we will find the explanation: “C++ passes parameters by value and therefore creates a copy of the object passed to a method”. As our function expects a window object, a new window object was created as a copy of the given tool window. But of course the window object does not know the additional implementation of the tool window, and therefore this part of the object was “sliced” away. The “slicing problem” occurs when an object of a subclass type is copied to an object of the superclass type and thereby loses the part of the information which was contained in the subclass.

To implement the functionality we want, a generic function expecting an interface (superclass), we have to pass the value by reference. This can be done by using one of the most important implementation patterns in C++: “pass by reference to const”. The following source code shows the adapted example. Now we pass the value by using a const reference.

void Draw(const Window& window)
{
  window.Draw();
}

int _tmain(int argc, _TCHAR* argv[])
{
  ToolWindow toolWindow = ToolWindow();

  Draw(toolWindow);

  return 0;
}

This time the output is as expected: “tool window”. A reference to the tool window object was passed to the function and nothing was sliced away.

Of course, the slicing problem does not only occur for function parameters. Each type cast to a superclass object may slice the subclass part away. Therefore I want to finish this article by showing a casting example. The following source code contains two casts: one to a superclass object and one to a superclass pointer.

int _tmain(int argc, _TCHAR* argv[])
{
  ToolWindow toolWindow = ToolWindow();

  static_cast<Window>(toolWindow).Draw();
  static_cast<Window*>(&toolWindow)->Draw();

  return 0;
}

Of course we can expect the same slicing problem as in the examples above. The first cast creates a copy of type Window, slices the subclass implementation away and outputs “window”. The second cast only converts the pointer, the object itself stays a tool window, and the output is “tool window”.
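Containers are another place where objects are copied into a superclass type. The following sketch shows this under some assumptions: NamedWindow and NamedToolWindow are hypothetical variants of the classes above which return their name instead of printing it, and the helper functions are made up for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical variants of Window and ToolWindow which return their name.
class NamedWindow
{
public:
  virtual ~NamedWindow() {}
  virtual std::string Name() const { return "window"; }
};

class NamedToolWindow : public NamedWindow
{
public:
  std::string Name() const override { return "tool window"; }
};

// The vector stores objects of type NamedWindow:
// the subclass part is sliced away when the element is copied in.
std::string NameFromValueContainer()
{
  std::vector<NamedWindow> windows;
  windows.push_back(NamedToolWindow());
  return windows.front().Name();
}

// The vector stores pointers: the objects keep their dynamic type.
std::string NameFromPointerContainer()
{
  std::vector<std::unique_ptr<NamedWindow>> windows;
  windows.push_back(std::unique_ptr<NamedWindow>(new NamedToolWindow()));
  return windows.front()->Name();
}

// Compares both containers.
void CompareContainers()
{
  assert(NameFromValueContainer() == "window");
  assert(NameFromPointerContainer() == "tool window");
}
```

As with function parameters, the fix is to avoid the copy into the superclass type, here by storing pointers instead of values.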

Posted in C++

Methods should do one thing only

According to the Single Responsibility Principle a class should have one, and only one, reason to change. The same statement is valid for methods. A method should do one thing only and therefore have only one reason to change.

Unfortunately this principle is often disregarded. I think the main reason is that it isn’t that easy to say whether a method does one or several things. Within this article I want to give you some hints which may help you to count the responsibilities of a method.

Does your method delegate or implement tasks?

In general, source code can contain code which executes a task or code which delegates the task execution to another method or class. For example, you want to create a file with some content. You may use the File class of the .NET framework and use its open, write and close methods to implement your file handling task. This implementation contains execution code: your source code implements the task itself. This implementation will normally be part of a method, and this method is used within another class. For example, prior to closing your application you want to store all data. So your shutdown module will call the implemented file creation method. The shutdown method therefore delegates the work to another method.

Methods which delegate the work to other methods will most often contain several of such delegations. They will call them by using a logical order or workflow. Therefore I normally call these two types of code: “execution code” and “logical code”. Execution code is the implementation of a task and logical code contains a workflow which uses the methods of the execution code.
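The distinction can be sketched in a few lines. The sketch is written in C++, like the examples of the other articles, and all names in it (WriteData, WriteLogEntry, Shutdown, ShutdownToString) are made up for this illustration. The first two functions are execution code, Shutdown is logical code.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Execution code: implements the task of writing the data itself.
void WriteData(std::ostream& output, const std::string& data)
{
  output << data << '\n';
}

// Execution code: implements the task of writing a log entry itself.
void WriteLogEntry(std::ostream& log, const std::string& message)
{
  log << "[log] " << message << '\n';
}

// Logical code: contains only the workflow and delegates every step.
void Shutdown(std::ostream& output, std::ostream& log, const std::string& data)
{
  WriteData(output, data);
  WriteLogEntry(log, "data stored");
}

// In-memory usage example: returns what Shutdown() wrote to both streams.
std::string ShutdownToString(const std::string& data)
{
  std::ostringstream output;
  std::ostringstream log;
  Shutdown(output, log, data);
  return output.str() + log.str();
}
```

Shutdown knows the order of the steps but nothing about how data or log entries are written, so a change of the output format does not touch it.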

In my opinion the separation between execution code and logical code is a base concept for clean software architecture. And this concept will help us to solve the issue of this article. Therefore I want to ask the following question: “Is a method which contains execution code and logical code able to do one thing only or will it always do several things?”

I think it will help to look at a little example. The following example implements a report generation. The report will contain the salaries of all employees. To keep it easy we will not implement every detail. Let’s say we have the following components: a data class for an employee and a report component which can create pages and add headers and text elements. Furthermore it is possible to add a new page by defining its content.

interface IReportContent
{
	//...
}

interface IReport
{
	void CreateNewPage();
	void AddHeader(string header);
	void AddText(string content);
	void AddPage(IReportContent content);
}

interface IEmployee
{
	string Name { get; }
	double Salary { get; }
}

Our report shall contain a page with some base information, pages for all employees and a summary page. Therefore we will implement the following method.

class ReportGenerator
{
	public IReport CreateSalaryReport(List<IEmployee> employees)
	{
		IReport report = new Report();
					
		//create header
		report.CreateNewPage();
		report.AddHeader("Employee Report");
		report.AddText("Date: " + DateTime.Now);
		report.AddText("Created by: " + "...");

		//add all employees
		foreach (IEmployee employee in employees)
		{
			IReportContent content = GetEmployeeSalaryReportPage(employee);
			report.AddPage(content);
		}
			
		//add summary
		report.CreateNewPage();
		report.AddHeader("Total Pays:");
		report.AddText("....");

		return report;
	}

	private IReportContent GetEmployeeSalaryReportPage(IEmployee employee)
	{
		//...
	}
}

What do you think if you look at this method? How many responsibilities does this method have? Or in the words of the Single Responsibility Principle: How many reasons for a change exist?

I think the method does three things. It creates the header page, it creates the summary page and it concatenates all pages to create the report. Or in other words, the method has three reasons for a change: A change of the content of the header page, a change of the content of the summary page or a change of the report structure.

Therefore we have to refactor this method. In this case you can extract the header page creation and the summary page creation into their own methods.

class ReportGenerator
{
	public IReport CreateSalaryReport(List<IEmployee> employees)
	{
		IReport report = new Report();
		IReportContent content;

		content = GetReportHeader();
		report.AddPage(content);

		foreach (IEmployee employee in employees)
		{
			content = GetEmployeeSalaryReportPage(employee);
			report.AddPage(content);
		}

		content = GetReportSummary();
		report.AddPage(content);

		return report;
	}

	private IReportContent GetReportHeader()
	{
		// ReportContent is an assumed implementation of IReportContent
		IReportContent content = new ReportContent();
		content.AddHeader("Employee Report");
		content.AddText("Date: " + DateTime.Now);
		content.AddText("Created by: " + "...");

		return content;
	}

	private IReportContent GetEmployeeSalaryReportPage(IEmployee employee)
	{
		//...
	}

	private IReportContent GetReportSummary()
	{
		// ReportContent is an assumed implementation of IReportContent
		IReportContent content = new ReportContent();
		content.AddHeader("Total Pays:");
		content.AddText("....");

		return content;
	}
}

With this small refactoring the responsibilities of the method have changed greatly. Now the method has one responsibility only: to create a report by combining the report parts. As a result there is only one reason for a change: the page structure of the report changes, for example because the summary page shall be removed.

After this example I want to come back to the topic and question of this chapter: “Does your method delegate or implement tasks?”

As you can see the first version of the method does both. It implements the creation of the header and summary pages and it delegates the creation of the employee salary page. This mix of execution code and logical code is a clear signal that the method does several things. In summary I want to make the following statement:

A method which contains execution code and logical code will always do several things. Therefore you should never mix execution code and logical code within a method.

Does your method contain several separated logical workflows?

If your method contains execution code only or logical code only, you need another way to see whether it does several things. In this case you should look for the logical workflows within the method. If you can find more than one workflow, or if you see a chance to split up the existing workflow into different parts, then there is a high probability that your method does more than one thing.

I want to show a little example. Let’s say we have a settings engine which should store settings data. The data is stored as encrypted string. To make it easy we will look at the settings engine only and hide the details of the database component and the settings data component.

class SettingsEngine
{
	IDatabaseController _database;
	IConfigurationController _configuration;

	public void SaveSettings()
	{
		//connect or create database
		if(_database.Connect() == false)
		{
			_database.Create();
		}

		//get settings data as encrypted data
		string settings;
		settings = _configuration.GetEncryptedSettingsData();

		//write to database
		if(_database.WriteSettingsData(settings) == false)
		{
			throw new DatabaseWriteException(...);
		}
	}
}

The method contains logical code only. So far so good, but let us check the second criterion. Does the method contain several logical workflows? Unfortunately: yes! There are three workflows. First, we have the overall workflow to store the settings data, which is the main workflow of the method. Second, we have the procedure to initialize the database by connecting to the database or creating a new one if necessary. And there is a small third workflow which tries to write the data and throws an exception if the data update fails. As a result of this design there are three possible reasons to change the method. The connection procedure could be changed, for example to throw an error instead of creating the database. The data update could be changed, for example to repeat the write step with additional rights instead of throwing an error. And the whole storage workflow could be changed, for example by removing the connection step because it is done by another service class.

In summary we have two separate workflows embedded in the main workflow of the method. We can therefore refactor this method by extracting these workflows.

class SettingsEngine
{
	IDatabaseController _database;
	IConfigurationController _configuration;

	public void SaveSettings()
	{
		InitializeDatabase();

		string settings;
		settings = _configuration.GetEncryptedSettingsData();

		WriteSettingsToDatabase(settings);
	}

	private void InitializeDatabase()
	{
		if (_database.Connect() == false)
		{
			_database.Create();
		}            
	}

	private void WriteSettingsToDatabase(string settings)
	{
		if (_database.WriteSettingsData(settings) == false)
		{
			throw new DatabaseWriteException(...);
		}
	}
}

Now the method contains no embedded workflows. There is only the main workflow to store the settings. The method does one thing only now, and therefore there is only one reason to change the method.

In summary I can give the following recommendation:

A method which contains several separate logical workflows will always do several things. Therefore you should never implement more than one workflow within one method.

Summary

A basic software concept is: a method should do one thing only, or in other words, there should be only one reason to change a method. But the source code of nearly every application contains methods which violate this rule. One reason may be that it isn’t that easy to see whether your method does more than one thing or not. Within this article I have introduced two easy checks which you can use to identify methods with several responsibilities.

Posted in .NET, C#, Clean Code

Manage resources by using smart pointer RAII objects

A common way to create objects is by using a factory method or an abstract factory. Such a factory will normally create and return the object and hand over the responsibility for the object to the client. Therefore the client has to release the created object to free resources which are no longer needed. The following source code shows a typical implementation pattern. The object is created by using a factory and released at the end of the function.

class MyObject
{
};

MyObject* CreateMyObject()
{
  return new MyObject();
}

int _tmain(int argc, _TCHAR* argv[])
{
  MyObject *pMyObject = CreateMyObject();

  // ...

  delete pMyObject;

  return 0;
}

 

This looks fine at first glance, but this pattern is a source of errors. It may happen that the “delete” statement is never executed. For example, an exception may be thrown in one of the statements before “delete”, so that the function is left early. But not only runtime errors may occur. This pattern also increases the probability of implementation errors, especially in huge functions where the factory call and the delete statement are far away from each other. For example, a developer may add parameter checks or checks of the return values of sub-function calls, and return in case of wrong parameters or results without releasing the object.

To avoid such issues you should manage resources in dedicated objects. If the scope of the object is left, the object will be destroyed and its destructor is called. This is done in any case, no matter whether the function is executed completely, returns early or is left because of an exception.

By using a resource management object you can free resources in the destructor and you don’t have to think about all the possible ways the object scope is left.

This concept is often called RAII (resource acquisition is initialization). It means that objects acquire resources in their constructors and release them in their destructors.
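A resource management object of this kind could look like the following minimal sketch. The handle functions AcquireHandle and ReleaseHandle, the counter gOpenHandles and the class HandleGuard are hypothetical and only stand for some resource with explicit acquire and release calls.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource with explicit acquire and release functions.
// The counter shows how many handles are currently open.
int gOpenHandles = 0;

int AcquireHandle()
{
  ++gOpenHandles;
  return 42;
}

void ReleaseHandle(int /*handle*/)
{
  --gOpenHandles;
}

// Resource management object: acquires the handle in the constructor
// and releases it in the destructor.
class HandleGuard
{
public:
  HandleGuard() : mHandle(AcquireHandle()) {}
  ~HandleGuard() { ReleaseHandle(mHandle); }

  // A copy would release the same handle twice, so copying is disabled.
  HandleGuard(const HandleGuard&) = delete;
  HandleGuard& operator=(const HandleGuard&) = delete;

  int Get() const { return mHandle; }

private:
  int mHandle;
};

// Leaves the scope of the guard in three different ways.
bool UseHandle(bool returnEarly, bool throwError)
{
  HandleGuard guard;
  assert(gOpenHandles == 1);  // the handle is held while guard is alive

  if (returnEarly)
  {
    return false;
  }

  if (throwError)
  {
    throw std::runtime_error("failure while using the handle");
  }

  return true;
}
```

Whether UseHandle finishes normally, returns early or throws, gOpenHandles is back to zero once the call has been left, provided the exception is caught by the caller.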

For simple cases like the factory example above you don’t have to implement your own object. Instead you can use an existing one offered by the C++ standard library, for example a shared pointer.

int _tmain(int argc, _TCHAR* argv[])
{
  std::shared_ptr<MyObject> pMyObject(CreateMyObject());

  // ...

  return 0;
}

 

Within its destructor, the shared pointer will delete the contained object. So the object’s destructor is invoked and you have the possibility to release resources. You have to keep one important thing in mind whenever you use a shared pointer: by default it calls “delete” but not “delete[]”. Therefore you cannot use it this way for dynamically allocated arrays. This is rarely a limitation, because vector and string can almost always replace dynamically allocated arrays and should be used instead. In case you need a shared pointer for arrays, you can find one within the Boost library (boost::shared_array); since C++17, std::shared_ptr<T[]> supports arrays as well.
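Both options for arrays can be sketched as follows. Tracked is a hypothetical type which counts its living instances, and the two helper functions are made up for this illustration. The second function passes std::default_delete<Tracked[]> as a custom deleter, so the shared pointer releases the array with “delete[]”.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical type which counts its living instances.
struct Tracked
{
  static int alive;
  Tracked() { ++alive; }
  Tracked(const Tracked&) { ++alive; }
  ~Tracked() { --alive; }
};

int Tracked::alive = 0;

// Preferred: the vector owns its elements and destroys them at scope exit.
// Returns the number of living instances while the vector exists.
int CountAliveInsideVectorScope(int n)
{
  assert(n >= 0);
  std::vector<Tracked> items(n);
  return Tracked::alive;
}

// If a shared pointer to an array is really needed:
// the custom deleter makes the shared pointer call delete[].
std::shared_ptr<Tracked> MakeSharedArray(int n)
{
  assert(n > 0);
  return std::shared_ptr<Tracked>(new Tracked[n], std::default_delete<Tracked[]>());
}
```

In both cases all elements are destroyed when the owner goes out of scope: the vector at the end of the function, the array when the last shared pointer to it is destroyed.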

Another important guideline is to create the shared pointer within its own statement and not inside another statement. Let us use the previous example to explain why. For example, you want to execute a function “DoSomething” which has two parameters: the object pointer and an object with application settings. These settings are read by a function “GetApplicationSettings()”. So you may call the DoSomething function and execute the creation of the shared pointer and the function call for the application settings as nested statements.

int _tmain(int argc, _TCHAR* argv[])
{
  DoSomething(std::shared_ptr<MyObject>(CreateMyObject()), GetApplicationSettings());

  return 0;
}

 

Beside the fact that such nested function calls are difficult to read, this call may leak resources. But why? We use an object to manage the resources and as we have learned so far this should solve the resource leaking issue.

To understand this behavior we have to think about what the compiler is allowed to do. For the function call the compiler has to generate the following steps.

  • call GetApplicationSettings()
  • call CreateMyObject()
  • call std::shared_ptr ctor

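But the compiler does not have to execute them in this order. The C++ standard leaves the evaluation order of function arguments unspecified, so the compiler may pick the order which results in the most efficient code. Before C++17 the compiler was even allowed to interleave the evaluation of different arguments, so an older compiler or language mode can choose the following order: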

  • call CreateMyObject()
  • call GetApplicationSettings()
  • call std::shared_ptr ctor

What will happen if “GetApplicationSettings()” throws an exception? In this case we have created our object by calling the factory function, but we have not yet stored it within the shared pointer. So we end up with a resource leak, as the destructor of the object is never called. To avoid this issue we should create new objects and store them within the smart pointer in a standalone statement. Furthermore, to increase source code readability, I would recommend avoiding nested function calls in general. So I prefer to call “GetApplicationSettings()” in a standalone statement too.

int _tmain(int argc, _TCHAR* argv[])
{
  std::shared_ptr<MyObject> pMyObject(CreateMyObject());
  ApplicationSettings mySettings = GetApplicationSettings();

  DoSomething(pMyObject, mySettings);

  return 0;
}
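The behavior of the standalone statements in the error case can be checked with a small sketch. Everything in it is a stand-in made up for this illustration: MyObject counts its living instances, and GetApplicationSettings always throws to simulate the failure.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Stand-in for MyObject which counts its living instances.
struct MyObject
{
  static int alive;
  MyObject() { ++alive; }
  ~MyObject() { --alive; }
};

int MyObject::alive = 0;

MyObject* CreateMyObject()
{
  return new MyObject();
}

struct ApplicationSettings {};

// Stand-in for the settings reader: always fails in this sketch.
ApplicationSettings GetApplicationSettings()
{
  throw std::runtime_error("settings not available");
}

void DoSomething(std::shared_ptr<MyObject>, const ApplicationSettings&)
{
}

// Returns false if reading the settings failed.
bool CallWithStandaloneStatements()
{
  try
  {
    // The object is owned by the shared pointer before anything else runs.
    std::shared_ptr<MyObject> pMyObject(CreateMyObject());
    ApplicationSettings mySettings = GetApplicationSettings();

    DoSomething(pMyObject, mySettings);
  }
  catch (const std::runtime_error&)
  {
    // The shared pointer has already released the object at this point.
    assert(MyObject::alive == 0);
    return false;
  }

  return true;
}
```

If the object can be constructed directly instead of by a factory which returns a raw pointer, std::make_shared<MyObject>() creates the object and the shared pointer in a single call, so there is no moment in which the object exists without an owner.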

 

Summary

Follow the RAII concept and use objects to manage resources. Implement the resource creation and the management object creation in a standalone statement.

Posted in C++