logger class in C++

In this article I want to show some basic implementation techniques and demonstrate them in the context of a logging class. The logging class should be able to write logging information into a file. The file access should be thread-safe. Therefore, we have to manage a file and a locking resource. With respect to the resource management needs and the multithreaded use of the logger, we will use the following basic implementation techniques: RAII, error handling, move ctor and move assignment, copy ctor and copy assignment, multithreading support and the singleton pattern.

 

Resource management with RAII

To manage the file and the lock resources, we will use the RAII concept. The resources will be managed by objects. We will start by implementing the logger class and creating a parametrized ctor. According to the RAII concept, this ctor will initialize the file stream. The second resource, the locking object, is needed in the output function. This output function, which adds a logging message, can be called by different threads. Therefore, we use a locking mechanism. The std class “lock_guard” implements the RAII concept and fulfills our needs.

The following source code shows a possible implementation including error handling, which is explained next. The implementation is not complete yet: depending on how the class is used, it will cause compiler errors. In the next chapter we will look at the different constructors and extend the example so that it can be used in a client application.

// ----- header file -----

#pragma once

#include <fstream>
#include <mutex>
#include <string>

namespace Logging
{
  class MyLogger
  {
  public:
    // parametrized ctor
    MyLogger(std::string fileName);


    // dtor
    ~MyLogger()     
    {
      mStream.close();
    }

    // write to log file
    void WriteLine(std::string content);

  private:
    std::ofstream mStream;
    std::mutex mMutex;
  };
}

// ----- source file -----

#include "stdafx.h"
#include <chrono>
#include <ctime>
#include <iostream>

#include "MyLogger.h"

namespace Logging
{
  MyLogger::MyLogger(std::string fileName)
  {
    mStream.open(fileName);

    if (mStream.fail())
    {
      throw std::iostream::failure("Cannot open file: " + fileName);
    }
  }  

  void MyLogger::WriteLine(std::string content)
  {
    std::lock_guard<std::mutex> lock(mMutex);
    
    std::time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    char timestamp[26];
    ctime_s(timestamp, sizeof timestamp, &now);

    std::string timestampWithoutEndl(timestamp);    
    timestampWithoutEndl = timestampWithoutEndl.substr(0, 24);

    mStream << timestampWithoutEndl << ": " << content << std::endl;
  }
}

 

Furthermore, we have to think about error handling. Within the parametrized ctor we want to open the file stream. This may fail. According to the documentation of “ofstream”, the “open” function will normally not throw an exception, so we have to check for success by using the “fail” method. However, a stream can be configured to throw exceptions by setting its exception mask with the “exceptions” method. This is a setting of the individual stream object, not a global configuration, so a stream only throws if exceptions were enabled for it explicitly. In case logging is a mandatory feature and we ensure correct file handling, including access rights on directories, we may decide that an error on opening the file is an exceptional case. So, we should throw an exception in this case. As a stream with enabled exceptions would throw an exception of type “ios_base::failure”, we check for the non-exception behavior and throw our own exception of the same type.

 

Move vs. copy

With respect to the resource management we have to think about an important question: what do we expect if a copy of a class instance is created? The instance manages a file stream. Should the copy access the same file stream too? Of course not. This would bypass the RAII concept and lead to several issues. Therefore, we will not allow copies of the instance to be created. If someone wants to write to two different files, two different instances have to be created. But we may allow the class instance to be passed into another scope, e.g. within a function call. As we don’t want to copy the class instance, we have to use another technique: move ctor and move assignment. If we move the class instance, we can steal the resources from the source instance and transfer them to the target instance. As the source instance is typically about to be destroyed anyway, stealing its resources is allowed.

The following source code shows a corresponding implementation. The copy ctor and copy assignment operator are disabled and the move ctor and move assignment operator are implemented.


#pragma once

#include <fstream>
#include <mutex>
#include <string>
#include <utility>

namespace Logging
{
  class MyLogger
  {
  public:
    // parametrized ctor
    MyLogger(std::string fileName);

    // disable copy ctor and copy assignment
    MyLogger(const MyLogger&) = delete;    
    MyLogger& operator= (const MyLogger&) = delete;
    
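    // move ctor and move assignment
    // only the stream is transferred; each instance keeps its own mutex
    MyLogger(MyLogger&& other)
      : mStream(std::move(other.mStream))
    {
    }
    
    MyLogger& operator=(MyLogger&& other)
    {
      // guard against self-assignment
      if (this != &other)
      {
        mStream.close();
        mStream = std::move(other.mStream);
      }

      return *this;
    }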

    // dtor
    ~MyLogger()     
    {
      mStream.close();
    }

    // write to log file
    void WriteLine(std::string content);

  private:
    std::ofstream mStream;
    std::mutex mMutex;
  };
}

 

Client application

Now we can compile the source code and use it within a client application. The following console application contains some examples of how to use the logger.

int _tmain(int argc, _TCHAR* argv[])
{
  // compiler error as no default ctor exists
  MyLogger logger;  
  
  // calls parametrized ctor
  MyLogger logger1(R"(d:\test1.txt)");    

  logger1.WriteLine("Hello");
  logger1.WriteLine("World");

  // calls parametrized ctor and move ctor (the compiler may elide the move)
  // use exception handler
  try
  {    
    MyLogger logger2 = MyLogger(R"(d:\test2.txt)");   

    logger2.WriteLine("Hello");
    logger2.WriteLine("World");
  }
  catch (std::ios_base::failure& e)
  {
    std::cout << e.what() << std::endl;

    return -1;
  }  

  // move assignment
  MyLogger logger3 = MyLogger(R"(d:\test3.txt)");

  logger3 = MyLogger(R"(d:\test4.txt)");  // calls move assignment operator
  logger3.WriteLine("Hello again");  // writes to test4.txt

  // copy ctor and copy assignment
  logger3 = MyLogger(logger1);  // compiler error as copy ctor is deleted
  logger3 = logger1;  // compiler error as copy assignment is deleted

  return 0;
}

 

Multithreading

As the “WriteLine” function already uses locks it can be used within multithreading scenarios. The following source code shows an example with two threads using the same logger instance.

MyLogger gLogger1(R"(d:\test.txt)");

void DoSomething(std::string input)
{
  //...do something

  //log function execution and results
  gLogger1.WriteLine("DoSomething was called with parameter: " + input);
}

void ExecuteThread(std::string threadNumber)
{
  for (int i = 0; i < 10; i++)
  {
    DoSomething(threadNumber + "_" + std::to_string(i));
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  DoSomething("in front of threads");

  std::thread thread1(ExecuteThread, "1");
  std::thread thread2(ExecuteThread, "2");

  thread1.join();
  thread2.join();
  
  DoSomething("behind threads");

  return 0;
}

 

Singleton

If there should be one global logging instance which writes into a predefined file, you may use the singleton pattern to provide a globally accessible logging instance. The following source code shows a possible implementation of the singleton pattern.

#include "MyLogger.h"

namespace Logging
{
  class MyLoggerSingleton
  {
  public:
    MyLoggerSingleton(MyLoggerSingleton const&) = delete;             // Copy construct
    MyLoggerSingleton(MyLoggerSingleton&&) = delete;                  // Move construct
    MyLoggerSingleton& operator=(MyLoggerSingleton const&) = delete;  // Copy assign
    MyLoggerSingleton& operator=(MyLoggerSingleton &&) = delete;      // Move assign

    static MyLogger& Instance()
    {
      static MyLogger myInstance(R"(d:\test.txt)");
      return myInstance;
    }  

  protected:
    MyLoggerSingleton() {}
    ~MyLoggerSingleton() {}
  };
}

 

You can use this singleton within the client application.

void DoSomething(std::string input)
{
  //...do something

  //log function execution and results
  MyLoggerSingleton::Instance().WriteLine("DoSomething was called with parameter: " + input);
}

void ExecuteThread(std::string threadNumber)
{
  for (int i = 0; i < 10; i++)
  {
    DoSomething(threadNumber + "_" + std::to_string(i));
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  DoSomething("in front of threads");

  std::thread thread1(ExecuteThread, "1");
  std::thread thread2(ExecuteThread, "2");

  thread1.join();
  thread2.join();
  
  DoSomething("behind threads");

  return 0;
}

 

The singleton class is implemented by using a static local variable within the “Instance” function. Its initialization is thread-safe since C++11. I know you will find many extensive discussions about the singleton pattern, whether it is thread-safe in C++ and whether you should use it at all. I don’t want to take part in these sometimes misleading discussions, but I want to add some thoughts about the implemented singleton for the logger, as you may want to implement something similar and may be confused about whether a singleton is good or bad.

Singletons, in general, are widely used and they are one of the basic software implementation patterns. They offer a lot of advantages. But of course, they will not fit every use case. For example, the logging approach shown in the implemented examples has one major disadvantage: the production code is mixed up with logging code. If you have a complex application and the need for detailed logging in all situations, your code will be bloated very quickly with a lot of logging code. In such cases it may be better to use other techniques, for example aspect-oriented programming.

Other discussions about the thread safety of singletons and about a wrong cleanup order of static variables are, in my opinion, outdated. Since C++11 the initialization of static local variables is thread-safe, and since C++98 the cleanup order of statics is defined: they are destroyed in the reverse order of their construction.

 

Summary

This article has shown some basic implementation techniques like RAII, error handling, copy and move ctors and the singleton pattern. These techniques were used to implement a simple logging class. You may use this implementation as a template for implementing your own classes which have to manage a resource, for example a file, a database connection or a network stream.

Posted in C++

int * const x; (const pointer vs. const data)

The example within the title shows a typical pointer declaration containing the const keyword: “int * const x;”. If you write or read such a declaration you may be confused and ask yourself whether the pointer or the data is constant.

There is a simple principle which helps to read such declarations: Read it backwards. In the example, you therefore read: x is a constant pointer to an int.

The following examples show some declarations with constant pointers and/or constant data. By using the “read it backwards” principle, these declarations are easy to understand.

int* x;   // pointer to int
  
const int* x;   // pointer to constant int
int const* x;   // pointer to constant int

int* const x;   // constant pointer to int
  
int const * const x;   // constant pointer to constant int
const int * const x;   // constant pointer to constant int
  
int** x;    // pointer to pointer to int

const int** x;   // pointer to pointer to constant int
int const** x;   // pointer to pointer to constant int

int* * const x;   // constant pointer to pointer to int

int* const * const x;   // constant pointer to constant pointer to int

Fast way to return a large object

In this article I want to look at how we can move objects between different scopes without expensive copying and without error-prone pointer handling. To analyze this topic, we want to write a function which returns a large object. Returning an object of a small built-in type, like an integer value, carries little to no overhead. Returning a larger object of class type may require more expensive copying from one memory location to another.

To keep it simple I will use a vector-object within the example of this article. But the shown principles will be the same if you use your own classes.

 

Return a pointer

The traditional approach to returning a large object is to use a pointer. Therefore, we will start with an example application implementing this approach.

std::vector<int>* CreateInstance()
{
  std::vector<int>* instance = new std::vector<int>();

  //...fill vector with data
  
  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int>* data = CreateInstance();

  //...use data class

  delete data;

  return 0;
}

 

This works fine but comes with a big disadvantage: the raw pointer undermines the resource safety concept of C++. This leads to typical issues such as memory leaks.

 

Return a smart pointer

To improve the example application, we could use a smart pointer. This will eliminate the resource safety violation as we now follow the RAII principle (resource acquisition is initialization).

std::unique_ptr<std::vector<int>> CreateInstance()
{
  std::unique_ptr<std::vector<int>> instance = std::make_unique<std::vector<int>>();

  //...fill vector with data

  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::unique_ptr<std::vector<int>> data = CreateInstance();

  //...use data class

  return 0;
}

 

This implementation is much better compared to the first one. But it also raises the question: why use a pointer at all? Often, I don’t want to use a pointer, even if it is a smart pointer. Pointers distract from the conventional use of an object.

 

Return an object

What I want is to use the object itself. I want to implement a function which creates and returns the object without the need to use pointers. The following source code shows the adapted example.

std::vector<int> CreateInstance()
{
  std::vector<int> instance = std::vector<int>();

  //...fill vector with data

  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int> data = CreateInstance();

  //...use data class

  return 0;
}

 

Without further measures, this copies the elements of “instance” into “data”. And of course, such a copy is expensive for large objects. But since “instance” is just about to be destroyed and the memory holding its elements is to be freed, there is no need to copy this object. Instead, it is possible to steal the elements. C++11 directly supports this “stealing mechanism” by using move constructors. Therefore, with C++11 this implementation is cheap, as it simply moves the ownership of the elements from “instance” to “data”. So, we don’t have to fear expensive copy mechanisms and can return the object directly, without the need for pointers.

 

Return value optimization

The shown move of the object will only work if the object has a move constructor. This could be an explicitly implemented one or an implicitly declared one. Maybe you will get into a situation where you have to use an object which does not have a move constructor. Which of the above implementations will now be the best one? Good news everyone: you can still return the object and don’t have to use pointers. This is possible because the compiler itself provides an optimization: the return value optimization. If the compiler finds the code shown above, it can optimize it and change it to something like this:

void CreateInstance(std::vector<int>* instance)
{
  // "instance" points to the object of the caller

  //...fill vector with data directly within the object of the caller
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int> data;  
  CreateInstance(&data);

  //...use data class

  return 0;
}

 

This optimization is called “copy elision”. It omits calls of copy and move constructors. Therefore, it will work in both cases: for objects with a move constructor and for objects without a move constructor. In one case, it omits the expensive copy constructor and in the other case it omits the cheap move and may make it even a little cheaper.

 

Summary

Don’t be afraid to return large objects directly. Moving an object is very cheap if your object supports it. Most objects have the needed move constructor implicitly, or you can implement one explicitly. Furthermore, in case your object does not have a move constructor at all, the return value optimization of the compiler will eliminate the expensive copy.


Type Safety and Resource Safety in C++

If I should summarize the main advantages of C++ in one short sentence I would say: C++ is a completely type safe and resource safe language without performance loss.

Type safety and resource safety are two very powerful features if you use them right. Because of the very important principle “no performance loss”, C++ will often give responsibility to the developer. The language itself offers powerful and usually easy to use concepts which allow developers to implement in an efficient way. But on the other hand, the language will not add expensive management and validation layers to prevent programming errors, because such concepts would compromise the performance. For example, C++ offers simple and lightweight pointer types. As a consequence, programming errors can lead to issues like dangling pointers and undefined behavior of the application. Other languages offer managed pointers to prevent such issues, but at the cost of performance loss.

Within this article I want to talk about the type-safety and resource-safety concepts and think about programming techniques we should use and programming errors we should avoid.

 

Type safety

C++ has a static and strict type system. “Static” means that types are already known at compile time. Of course, C++ also offers techniques for late binding at runtime. But these programming techniques only extend the static type system by additional dynamic features. “Strict”, on the other hand, means that type compatibility is checked. For example, you cannot add an integer and a string variable, and you cannot pass a const variable to a function expecting a non-const reference. Most of these compatibility checks can be done at compile time as the type system is static. Of course, if dynamic features are used, the compatibility check is moved to runtime.

The type system of C++ allows definitions which go beyond the specification of the data type. Of course, you will define the data type, like “unsigned int”, “double”, “string” and so on, but you can define additional type characteristics like constness and non-constness, or for example whether the type is a pointer, a reference or an rvalue reference. This allows you to create use case specific data types which limit their usage to the needs of the use case. For example, a function which only needs to read a value could get this value as a const parameter. This use case specific limitation will prevent wrong usage, for example you cannot set the parameter to a new value by mistake in this case.

The static and strict type system will allow to avoid programming errors without any performance loss. As mentioned before, all type system checks are done during compile time and therefore do not influence the runtime performance. C++ will not restrict the developer to use this static type system only. It is also possible to use dynamic types but with the downside that programming errors are not detectable during compile time. Therefore, related issues will occur during runtime. To prevent performance loss, there are typically not many type checks during runtime. Therefore, such programming errors will most often lead to undefined behavior.

If you want to use the powerful type system of C++, you should pay attention to the following guidelines. First, you should think about the type itself. For example: do you need an integer value or a long? Next, you have to define the constness of the type: do you want to read or write the parameter? And of course, you have to define whether you want to use the value directly or whether you want to use a pointer or reference. In summary: choose the right type according to the needs of the use case.

As mentioned before, C++ offers the flexibility to bypass this static and strict type system and use dynamic features. As this comes with some major disadvantages you should avoid such language features. Use such features only if there is no other possibility, for example in case you have to deal with third party components offering an interface which needs dynamic type system features. But aside from such special use cases, you should never bypass the strict type system. This means you must not use things like casts, void pointers or unions. The misuse of casts and unions can lead to type and memory violations. Therefore, you should avoid casts and use a variant class rather than a plain union in most cases.

 

Resource safety

Many articles about comparison of programming languages contain a topic about resource management. But most of them only compare the memory management. Therefore, you will find definitions like: C++ has an explicit memory management and C# has an implicit garbage collection. But unfortunately, such considerations limit the view on one special case and don’t explain the base resource management concepts of the languages. Of course, memory management is an important topic, but as we want to think about resource safety in general, we should keep in mind that there are several other resources. For example: files, streams, network connections, mutexes, database connections and many more.

C++’s model of resource management is based on the use of constructors and destructors. Constructors specify the meaning of object initialization and destructors define the object cleanup. For scoped objects, destruction is implicitly executed at the end of the scope. For objects placed in the free store (heap, dynamic memory) using “new”, a call of “delete” is required. This model has been part of C++ since the very earliest days.

This simple but efficient constructor/destructor mechanism was introduced to handle the resource management part. But there is one big issue: there are two ways to call the destructor. As mentioned before, one way is the automatic call at the end of the object scope. The other way is the explicit call of “delete” for objects created by using “new”. This second possibility for object management is a big source of errors. Programming errors may lead to situations where “delete” is never called and therefore resources will not be cleaned up. Or they may lead to situations where “delete” was executed and the resources are already cleaned up, but the calling object nevertheless wants to work with these resources.

In conclusion, C++ offers a nice and lightweight resource management system, but we have to use it in the right way. A common technique is called RAII (resource acquisition is initialization). This programming technique says that all resources needed by an object instance must be initialized within the constructor and cleaned up within the destructor. C++ supports the RAII principle in a nearly perfect way, except for the fact that the destructor of an object instance may never be called if the life cycle of that instance is not implemented correctly. For example, we can create object instances and never delete them.

Fortunately, there is a simple solution for this issue. As mentioned before, there exists an automatic lifetime management for objects. Scoped objects will be destroyed automatically when the scope is left. So, let’s use this nice mechanism. That means: think about the “new” and “delete” functionality and the resource safe way to use this feature. According to the RAII principle, resources must be managed by objects. And of course, memory is an important resource. So, if you have to allocate and free memory by using “new” and “delete”, this resource must be managed within an object. That’s all! This simple concept avoids a lot of the typical implementation errors in C++. And of course, there are already built-in features within the language and its standard library which support this programming principle, for example smart pointers, lock classes and so on.

That means, don’t use wild “new” and “delete” commands somewhere in your implementation. Manage the needed memory within an object and use “new” in the constructor and “delete” in the destructor. Even if “new” and “delete” are used in a small scope, near to each other, they should not be used. For example, a popular implementation is to call “new” at the start of a method to initialize a resource and “delete” at the end of the method. It seems to be harmless if you do this wild memory management within the small scope of a method call. But each instruction within the method could result in an early end of the method, for example caused by an exception. In such situations, “delete” is never called. Therefore, even in such supposedly easily manageable situations, use the RAII principle and do the resource management by using an object. The standard “unique_ptr” class is most often a perfect solution for resource management within method scope.

 

Summary

C++ offers a very strong and strict type system and an easy to implement and performant resource safety concept. If you use these concepts consistently, you can write efficient code. Of course, source code will never be flawless, but if you bypass these type safety and resource safety concepts, your source code will become bad and error-prone very quickly.


Arrays and inheritance, a source of errors

If you work with arrays of objects and offer some functions to execute on these arrays, it is common to pass an array pointer and the array size as function parameters. Arrays will often hold a huge amount of data. You normally don’t want to pass them by value and create a copy of the array. Therefore, if you pass an array as parameter, it is transferred as a pointer to the first element. This C++ feature is known as “array decay”. Once you convert an array into a pointer, you lose the ability of the sizeof operator to count the elements in the array. This lost ability is referred to as “decay” as the array decays into a pointer.

The following source code shows an example of array decay. We have a class with a print function and we create an array of class instances. As we want to call the print function for all array elements, we create a corresponding helper function. So, we pass the array and the array size as parameters to this function. As mentioned before, the array is passed as a pointer to avoid copying the whole array. Therefore, the array decays into a pointer.

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};


void PrintAll(const MyClass* elements, const int numberOfElements)
{
  for (int index = 0; index < numberOfElements; ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  MyClass elements[10];

  PrintAll(elements, 10);

  return 0;
}

 

The example implementation works fine and prints the right results. So, array decay seems to be a good and straightforward feature. But unfortunately, it is also a source of errors. As mentioned before, the decay comes together with a loss of size information. But size information is needed if we want to step through the array elements. The element size is derived from the given pointer type, in this case “MyClass”. So, we have to think about the question of what happens if the passed array is of a different type, for example of a class derived from “MyClass”. The following example shows a corresponding implementation.

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};

class MyDerivedClass : public MyClass
{
private:
  double y;
};

void PrintAll(const MyClass* elements, const int numberOfElements)
{
  for (int index = 0; index < numberOfElements; ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  MyDerivedClass elements[10];

  PrintAll(elements, 10);

  return 0;
}

 

If you execute this test application, you will see strange output. This undefined behavior is a result of the array decay. The size information of the original type “MyDerivedClass” gets lost. Within the function the size of “MyClass” is used to go to the next array position. Therefore, a wrong memory location is interpreted as a class instance and so we end up with undefined behavior of the application.

The way that array names decay into pointers is fundamental to their use in C and C++. However, array decay interacts very badly with inheritance, a feature which isn’t available in C. A logical guideline may be to use arrays in C only, because array decay works fine if there is no inheritance. In C++ we should use alternatives to arrays instead, for example the standard vector type. The following source code shows the example application changed from array to vector.

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};

class MyDerivedClass : public MyClass
{
private:
  double y;
};

void PrintAll(const std::vector<MyClass>& elements)
{
  for (std::size_t index = 0; index < elements.size(); ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<MyClass> elements(10);
  PrintAll(elements);   // ok

  std::vector<MyDerivedClass> derivedElements(10);
  PrintAll(derivedElements);  // compiler error, cannot convert vector<MyDerivedClass> to vector<MyClass>

  return 0;
}

 

This time the implementation error can be detected by the compiler. The strong type safety system of C++ prevents the function call with an incompatible type. So we can pass a vector of “MyClass” instances but no vector of “MyDerivedClass” instances.

 

Summary

Array decay is the concept of passing an array as a pointer to the first element. As a side effect, the size information is lost. This will lead to undefined behavior if the passed array contains elements of a different size, as in most inheritance scenarios. Therefore, it is recommended to use arrays in C code or in scenarios without inheritance only. In all other cases, you should use type safe collections like vectors.


Exception-safe code

Over the last few years the way we handle exceptions has fundamentally changed. If we look a few years back, we will find a lot of applications with exception handling in a bigger context only. For example, a module executing a bigger task may have been executed in its own thread or process; exceptions thrown by this module were caught and the module was restarted after an error. Nowadays exception handling has moved to smaller parts of the application. Ideally, we work with exception-safe methods now.

In this article I want to show the basic ideas behind exception handling on method level and think about the corresponding programming concepts. I will show general development patterns and don’t want to explain implementation details like the try-catch syntax.

 

Typical issues

First, we should think about typical issues occurring as a result of the erroneous execution of a method and implement an example application containing features which correspond to these issues.

If an error occurs the execution of a method will be interrupted. Often, several data objects will be updated within a method. If execution is interrupted and only a part of the data is changed we will have invalid or corrupted data.

The method interruption may also result in resource management issues. If a resource is created at the beginning on the execution and released at the end, an interruption will have the side effect that the resource is not released at all. Such a resource may be for example memory, a file, a database connection or a locking object. Therefore, we could see long term issues, like continuously rising memory workload and immediately occurring issues like an application freeze as result of a deadlock.

Within the example application I want to address the two common issues we identified so far: data corruption and resource management. As resource management is a broad topic I want to add different concerns: a database resource and a synchronization resource for multithreading.

The example application should manage a list of customers. Each customer has a name and a list of orders. The orders are stored within a database. A management class should allow a thread-safe access to the customer data objects. To keep it simple we will omit implementation details of the single components, e.g. the database access, and look at the exception-safe topics. So, we want to implement a single method only: adding a new customer. During the article, we will try to implement this method in an exception-safe manner.

The following source code shows the base implementation of the example application.

#include <memory>
#include <mutex>
#include <string>
#include <vector>

class Orders;
class DatabaseConnection;
class OrderFactory;

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
  Orders* pOrders;
};

class CustomerManagement
{
public:
  CustomerManagement() : mNextAvailableIdentifier(1) {};

  void AddCustomer(std::string name);

private:
  std::mutex mMutex;
  unsigned int mNextAvailableIdentifier;

  std::vector<Customer> mCustomers;
};

int main()
{
  // direct initialization: the std::mutex member makes the class non-copyable
  CustomerManagement manager;

  manager.AddCustomer("John Doe");

  return 0;
}

 

The customer manager has some private members: a list of customers, a mutex for thread-safe access to the customer list and a counter for the customer identifier. Each customer should get a unique identifier, so we use an internal counter for this purpose. There are better ways to implement such an identifier, but this simple solution helps to show issues like data corruption caused by exceptions.

The three forward declarations stand for functionality we want to use but, as explained before, don’t want to implement. The orders class is a data class that stores the orders of a customer, the database connection class provides access to a database, and the order factory creates a new order list and stores it in the database.

 

First implementation

Now we want to implement the “AddCustomer” method. At first, we don’t think about exception safety and implement the method in a straightforward manner: lock, get resources, update data, release resources and unlock.

void CustomerManagement::AddCustomer(std::string name)
{
  mMutex.lock();

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  mCustomers.push_back(customer);

  DatabaseConnection* pConnection = new DatabaseConnection();
  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection);
  delete pConnection;

  mNextAvailableIdentifier++;

  mMutex.unlock();
}

 

I am sure you have seen such method patterns quite often. But if we think about exception safety, we can identify some critical issues. What may happen if the method is interrupted by an exception, e.g. during a database access? Before you continue reading, think about this possibility and the issues which may occur.

I think we can identify three issues: a deadlock occurs because unlock is never executed, the database workload increases because the database connection is never closed, and the internal data may be corrupted because the customer is not completely created or the internal identifier counter is not increased.

 

Requirements for exception safety

There are two common requirements for exception safety: leak no resources and don’t allow data structures to become corrupted. The deadlock issue belongs to the first requirement, as it concerns the management of a synchronization resource, so we don’t add thread safety as a separate requirement. Next, we look at the two requirements and change the implementation of the “AddCustomer” method accordingly.

 

Leak no resources

Resource leaks can cause serious errors and strange, hard-to-reproduce behavior of the application. Fortunately, you can avoid them in an amazingly simple way: use resource management objects.

The C++ language ensures that all local objects created within the method are destroyed when the method ends. This holds regardless of whether the method returns normally or is left prematurely because of an exception.

Within the example we use two resources: the locking object and the database connection object. We can use existing standard library classes as resource management objects for these resources: “std::lock_guard” for the mutex and “std::unique_ptr” for the connection. The following code shows an implementation of the “AddCustomer” method with respect to the “leak no resources” requirement.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);
  
  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  mCustomers.push_back(customer);

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());
  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());

  mNextAvailableIdentifier++;
}
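
The effect of the resource management object can be observed with a small experiment. The following sketch replaces the mutex with a hypothetical “TrackedLock” type that only records whether it is locked; “AccessDatabase” and the two update functions are likewise invented for this illustration.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical lock type that only records its state, so the effect is observable.
struct TrackedLock
{
  bool mLocked = false;
  void lock() { mLocked = true; }
  void unlock() { mLocked = false; }
};

// Stand-in for a sub-function call which may fail.
void AccessDatabase(bool fail)
{
  if (fail)
  {
    throw std::runtime_error("database access failed");
  }
}

// Manual locking as in the first implementation.
void UpdateWithManualLock(TrackedLock& lock, bool fail)
{
  lock.lock();
  AccessDatabase(fail);
  lock.unlock();  // skipped if AccessDatabase throws
}

// Locking with a resource management object.
void UpdateWithLockGuard(TrackedLock& lock, bool fail)
{
  std::lock_guard<TrackedLock> guard(lock);
  AccessDatabase(fail);
}  // the destructor of guard calls unlock(), also during stack unwinding

// Runs a failing update and reports whether the lock is still held afterwards.
template <typename Function>
bool StaysLockedAfterFailure(Function update)
{
  TrackedLock lock;
  try
  {
    update(lock, true);
  }
  catch (const std::runtime_error&)
  {
  }
  return lock.mLocked;
}
```

With the manual version the lock stays held after the exception; with the lock guard it is released.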

 

Don’t allow data structures to become corrupted

If you look for implementation patterns that avoid data corruption, you will find several solutions. But (nearly) all of them follow one of two basic ideas and can therefore be grouped. The first group ensures that the objects remain in a valid state. The data may not be correct, for example it may be initialized incompletely, but the object state is fine and the data structures themselves are not corrupted. In such cases the client can decide whether to undo the erroneous step or not. The methods of the second group behave atomically: they succeed completely or, if they fail, the data and application state are the same as before the function call.

Next, I want to show the concept of exception-safety guarantees. This well-known and widely used concept is based on the two groups described above. It extends them by a third one: methods which never throw any exception.

 

Exception-safe guarantees

This programming concept says that every method must offer one of three exception-safety guarantees.

  • Basic guarantee
  • Strong guarantee
  • No-throw guarantee

In the next sections I want to explain each of the three guarantees and change the example method according to these concepts.

 

Basic guarantee

A method which gives the basic guarantee ensures that everything remains in a valid state and that no data is corrupted. But the precise program state may not be predictable. Depending on the type of error and the moment it occurs, the data and program state may differ between two failed calls, but they are always valid and not corrupted. The client is responsible for handling errors and may clean up data and repeat the method call if necessary.

The following source code shows the adapted “AddCustomer” method, which will give the basic guarantee now.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = nullptr;

  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;

  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());
}

 

Let us think about this implementation. What can happen in case of exceptions?

Locking and database access are implemented with resource management objects. Regardless of the moment at which an error occurs, the resources are released and the objects stay valid. So, next, let us have a look at the data structures. The internal data consists of a list of customers and a counter used to create the customer identifier. Two sub-function calls may throw: creating the database connection and creating the empty order collection. Whichever of the two throws, the data structure remains valid, but it will be in a different state. If creating the database connection fails, the method call returns with an error. In this case no customer was added at all and the internal identifier counter was not changed. If the second sub-function call fails, the customer was already added with default values (a null pointer for the order list) and the identifier counter was already incremented. So, the data structure is valid, but its state differs from the first error case.

Of course, our customer management object will offer some more functions, like searching for a customer and deleting a customer. Therefore, after an error the client is able to check the actual object state and continue accordingly.

Methods offering the “basic guarantee” ensure a valid object state and uncorrupted data, but they do not ensure a defined state or data content after an exception.
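
To make the second error case visible, the method can be run against stand-in classes. The following is a minimal sketch under several assumptions that are not part of the article: the stub classes, the “failOrderCreation” test hook, the inspection functions, and a “std::shared_ptr” for the order list instead of the raw pointer.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the classes the article only forward-declares.
class Orders {};
class DatabaseConnection {};

struct OrderFactory
{
  // Assumption: creating the collection may throw, here controlled by a flag.
  static std::shared_ptr<Orders> CreateEmptyCollection(DatabaseConnection*, bool fail)
  {
    if (fail)
    {
      throw std::runtime_error("order collection could not be created");
    }
    return std::make_shared<Orders>();
  }
};

struct Customer
{
  unsigned int mIdentifier = 0;
  std::string mName;
  std::shared_ptr<Orders> pOrders;  // assumption: shared ownership instead of a raw pointer
};

class CustomerManagement
{
public:
  // failOrderCreation is a test hook and not part of the article's interface.
  void AddCustomer(const std::string& name, bool failOrderCreation = false)
  {
    std::lock_guard<std::mutex> guard(mMutex);
    std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());

    Customer customer;
    customer.mIdentifier = mNextAvailableIdentifier;
    customer.mName = name;

    mCustomers.push_back(customer);
    mNextAvailableIdentifier++;

    // If this call throws, the customer above stays in the list without orders.
    mCustomers.back().pOrders =
      OrderFactory::CreateEmptyCollection(pConnection.get(), failOrderCreation);
  }

  // Inspection helpers for the sketch; they are not synchronized.
  std::size_t CustomerCount() const { return mCustomers.size(); }
  unsigned int NextIdentifier() const { return mNextAvailableIdentifier; }
  bool LastCustomerHasOrders() const
  {
    return !mCustomers.empty() && mCustomers.back().pOrders != nullptr;
  }

private:
  std::mutex mMutex;
  unsigned int mNextAvailableIdentifier = 1;
  std::vector<Customer> mCustomers;
};
```

After a failed call the object is still usable: the customer exists, the counter has moved on and only the order list is missing. This is exactly the valid but not predictable state of the basic guarantee.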

 

Strong guarantee

The strong guarantee eliminates the disadvantage of the basic guarantee, the unpredictable state. A method giving the strong guarantee always ends in one of two defined states: either it was executed completely or the state is the same as before the execution. Such methods therefore have atomic or transactional behavior. If an error occurs, the state and data of the object are unchanged. If the method succeeds, it succeeds completely. In case of an error the client no longer has to analyze the object state or clean up data.

The following code shows the adapted implementation of the “AddCustomer” method.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = nullptr;

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());
  customer.pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());
  
  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;    
}

 

As in this example method, the strong guarantee is often implemented with temporary data objects. At the end of the method, if no error occurred, the actual object data and the temporary data are swapped or, as here, the temporary object is added to the actual data in a final step.
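
The swap variant can be sketched with a method that adds several customers at once. The class “CustomerList”, the method “AddCustomers” and the rule that an empty name is rejected are assumptions made only for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
};

class CustomerList
{
public:
  // Adds all names or none of them (strong guarantee).
  // Assumption for this sketch: an empty name is rejected with an exception.
  void AddCustomers(const std::vector<std::string>& names)
  {
    // Work on temporary copies; the members stay untouched while this can fail.
    std::vector<Customer> customers(mCustomers);
    unsigned int nextIdentifier = mNextAvailableIdentifier;

    for (const std::string& name : names)
    {
      if (name.empty())
      {
        throw std::invalid_argument("customer name must not be empty");
      }
      customers.push_back(Customer{nextIdentifier, name});
      ++nextIdentifier;
    }

    // Commit: swapping two vectors and assigning an integer do not throw.
    mCustomers.swap(customers);
    mNextAvailableIdentifier = nextIdentifier;
  }

  std::size_t Count() const { return mCustomers.size(); }
  unsigned int NextIdentifier() const { return mNextAvailableIdentifier; }

private:
  unsigned int mNextAvailableIdentifier = 1;
  std::vector<Customer> mCustomers;
};
```

The price of this pattern is the copy of the customer list on every call.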

 

No-throw guarantee

A method giving this guarantee promises to never throw an application exception. It always does what it promises to do and fails only on serious errors, like running out of memory. For example, all operations on built-in types offer the no-throw guarantee.
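
Since C++11 this promise can be written down with the “noexcept” specifier. Note that it is a hard promise: if an exception escapes a “noexcept” function, “std::terminate” is called. The following sketch uses two hypothetical setter functions to show the difference between an operation on a built-in type and one that may allocate memory.

```cpp
#include <string>
#include <utility>

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
};

// Only touches a built-in type, so nothing in here can throw.
void SetIdentifier(Customer& customer, unsigned int identifier) noexcept
{
  customer.mIdentifier = identifier;
}

// Copying a std::string may allocate memory and therefore may throw
// std::bad_alloc, so this function is deliberately not marked noexcept.
void SetName(Customer& customer, const std::string& name)
{
  customer.mName = name;
}

// The noexcept operator reports at compile time what a function promises.
constexpr bool kSetIdentifierIsNoThrow =
  noexcept(SetIdentifier(std::declval<Customer&>(), 0u));
constexpr bool kSetNameIsNoThrow =
  noexcept(SetName(std::declval<Customer&>(), std::declval<const std::string&>()));
```

The two constants make the promise checkable, for example in a “static_assert” next to code that relies on it.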

If we try to implement our example method according to this guarantee, we see two obstacles: the sub-function call that opens the database connection and the sub-function call that creates the order list. With this design limitation, it is very difficult to implement a no-throw method. But we can, for example, implement a queue for database commands. This queue accepts a command and returns immediately without any error. The command itself is executed by the queue manager later. With such a design change our “AddCustomer” method uses only sub-functions which give a no-throw guarantee, and therefore we are able to give this guarantee too.

class DatabaseQueryQueue;

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = DatabaseQueryQueue::CreateEmptyCollection();

  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;
}

 

Now we can offer the no-throw guarantee. But the required design change is expensive. We must implement a new manager layer for the database access and adapt all existing code accordingly. This brings us to the question of which of the three guarantees we should offer.

 

Choose between the three guarantees

Exception-safe code must offer one of the three guarantees. In my opinion, every function you write should be exception-safe. Resource management should be done by management objects anyway, and data access should be done in a logical order. Therefore, if you follow some basic programming guidelines, your methods should already give the basic guarantee, or need only small modifications to do so.

If a method, and therefore the application using it, is not exception-safe, the result can be resource leaks or corrupt data, which lead to unexpected application behavior and errors. You don’t want this, and therefore exception safety is a basic need for your code.

Which guarantee a function gives should be an individual choice. You can weigh the risk of exceptions against the costs of an exception-safe implementation. No-throw methods are wonderful from a client’s point of view, but they may be very expensive. If the implementation effort for the different guarantees is nearly the same, you should choose the strongest one. But this guideline may be wrong in some situations, as a stronger guarantee is not always practical. For example, the strong guarantee often uses temporary objects or stores the previous state, so it must create additional objects and execute additional copy or move operations. This kind of exception-safety guarantee may not be reasonable for time-critical parts of your application.

In summary, the choice of a suitable guarantee depends on the requirements of the application, the module, the object and the individual method. These requirements normally limit the decision to one or two of the three guarantees. Among the remaining ones, you must balance benefit against implementation effort to make your decision.

 

Sub-function calls

As seen in our example method, we often depend on existing functions which we want to use in our method. If we call a function within our method, we are limited to its exception guarantee level and in general cannot give a higher one. If we call several functions, the guarantee level may even be reduced. For example, we call two functions that each give the strong guarantee. If the second function fails, the changes of the first have already been made. Therefore, we can give the basic guarantee only.
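
This reduction can be reproduced with a small sketch. The class “Registry” and both functions are hypothetical; the second function shows one possible way, under the assumption that copies are affordable, to win the strong guarantee back.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical container whose Add() gives the strong guarantee:
// it either appends the value or throws and leaves the container unchanged.
class Registry
{
public:
  void Add(int value)
  {
    if (value < 0)
    {
      throw std::invalid_argument("negative values are rejected");
    }
    mValues.push_back(value);
  }

  std::size_t Size() const { return mValues.size(); }

private:
  std::vector<int> mValues;
};

// Basic guarantee only: if the second Add() throws, the first one is already done.
void AddToBoth(Registry& first, Registry& second, int a, int b)
{
  first.Add(a);
  second.Add(b);
}

// Strong guarantee again: work on copies and commit with swaps at the end.
void AddToBothAtomic(Registry& first, Registry& second, int a, int b)
{
  Registry firstCopy(first);
  Registry secondCopy(second);
  firstCopy.Add(a);
  secondCopy.Add(b);

  // Moving a std::vector<int> does not throw, so the commit cannot fail halfway.
  std::swap(first, firstCopy);
  std::swap(second, secondCopy);
}
```

The second variant pays for the stronger guarantee with two additional copies.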

You should keep this in mind when you estimate the costs of an implementation. The components you use limit the safety guarantee you are able to give. If you must give a higher guarantee, you may not be able to use the existing component directly and may have to, for example, implement an exception-safe proxy for the sub-component. This has a great impact on the implementation costs of your component.

 

Exception-safe application

The exception safety of the whole application depends on the guarantees the individual functions give. Even one function without exception safety makes the whole application unsafe, because if that single function throws, the whole application is in an undefined and unsafe state. The same is true for an object or module. The exception-safety guarantee of the whole software or of a software component corresponds to the lowest guarantee of all its functions.

 

Summary

It is not difficult to write exception-safe code. You simply must pay attention to two concepts. First, use objects to manage resources. This prevents resource leaks. Second, think about the three safety guarantees and decide which one you want to give and can give. Implement your internal data handling according to this decision to prevent data corruption.

Exception safety is an important concept. It should be a visible part of your object interface. Therefore, you should think about exception safety at the moment you define the object interface. Furthermore, document your decision. This will be important for clients and for future maintainers.


Pure Interfaces

In my opinion, one disadvantage of C++ is that it has no explicit interface concept or language feature. Of course, you can implement interfaces in C++, but you must use an abstract class. The downside is that an abstract class can already contain implementations as well as private or protected members. Therefore, it allows elements which should not be part of an interface.

But before we analyze this in detail, we want to step back and define some terms. In most type-safe object-oriented programming languages we will find the concepts of interfaces, abstract classes and concrete classes. An interface is a syntactical contract only. It defines the methods a class must contain. An abstract class also contains contract definitions, and in addition it already contains implementations. As an abstract class contains contract definitions for methods which still must be implemented, it cannot be instantiated. Instead, it is intended as a base class for concrete classes. If a concrete class is based on an abstract class, it can use the implementations of the abstract class and it must implement the methods that are defined as a contract but not implemented in the abstract class. A concrete class contains implementations only and therefore can be instantiated. It can be derived from an interface and/or an abstract class, and of course it can be implemented without any parent element.

In C++ we don’t have an explicit language feature to implement interfaces. But we have abstract and concrete classes. A class becomes abstract by adding a contract for a method which must be implemented by a concrete class. In C++ such a contract is expressed with a pure virtual method. If we want to implement an interface, we can do this with an abstract class that contains pure virtual functions only. As described at the beginning, abstract classes and interfaces are two independent elements of object-oriented design. As C++ does not distinguish these two concepts, the compiler cannot prevent the corresponding implementation errors. For example, if you want to implement an interface, you can add elements (e.g. implemented methods) which turn the interface into an abstract class, and C++ cannot prevent this smelly software design. So, as a developer, you are responsible for writing interfaces which follow object-oriented concepts. Next, we will look at an example and think about a possible code design.

Let’s start with an example. We want to implement an application which is used to show and edit documents of different types. Therefore, we need some kind of document class which offers methods to manage a document. Within the example we want to use a base class “Document” and two derived classes “TextFile” and “HtmlFile”. We start with the load and save features of the document. In this article, we consider the document data management only and offer a corresponding interface which can be used by the client application. The following source code shows a possible implementation.

class Document
{
public:   // interface for clients of Document
  virtual void Load(std::string fileName);
  virtual void Save();

protected:  // common functions for implementers of Document
  std::string Serialize();
  void DeSerialize(std::string data);
  std::string CalculateHash();

protected:  // common data for implementers of Document
  std::string mFileName;
};

class TextFile : public Document
{
public:
  void Load(std::string fileName);
  void Save();

protected:
  std::string mEncoding;
};

class HtmlFile : public Document
{
public:
  void Load(std::string fileName);
  void Save();
};

 

The base class “Document” contains public implementations of the load and save methods. These public methods are our interface for the client application. Furthermore, it contains protected methods to serialize and deserialize data, a function to calculate a hash code which is needed for some security features, and internal data about the loaded document. These protected methods and data members are an implementation aid for the derived classes. The derived classes can use the already implemented load and save methods or, if needed, override them. Furthermore, they can add new elements like the “mEncoding” member in the “TextFile” class.

This implementation seems to meet our needs. And of course, you will find implementations of this kind very often in existing code. But based on the thoughts from the start of the article we must ask: is this a good software design? What do you think?

 

Single Responsibility

In my opinion, there is one major issue with the above implementation: the class “Document” has four responsibilities. It provides the interface for clients, it contains document-specific functions used by derived classes, it contains generic functions which could be needed by other classes as well, not only derived ones, and it manages the document data.

According to the “single responsibility principle”, such a design has many disadvantages. Due to the unnecessary dependencies, the code is hard to maintain, changes require more effort and compiling takes longer. So, we should try to split the document class into four separate classes, each responsible for one topic.

The following source code shows a possible implementation.

class DocumentInterface
{
public:   // interface for clients of Document
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

class DocumentTools
{
protected:  // common functions for implementers of Document
  std::string CalculateHash();
};

class DocumentData
{
protected:  // common data for implementers of Document
  std::string mFileName;
};

class Serializer
{
public:  // common functions 
  std::string Serialize();
  void DeSerialize(std::string data);
};

 

We have three document-specific classes: one for the interface, one for common helper functions used to implement derived classes and one for the data management. Furthermore, we have created a generic serializer class, as it offers common serialization features which may be used in other use cases too. So, this implementation can be reused in other scenarios and is no longer limited to documents.

 

Composition vs. Inheritance

The separation of concerns principle led us to four independent classes. To implement our document-specific features, we will use and connect these classes. This connection can be created with two main concepts: composition and inheritance.

We know that our classes “TextFile” and “HtmlFile” are both documents which must implement the document interface. Therefore, we have found a first need for inheritance. But most often it isn’t that easy, and it isn’t in this case either. We could think about two main designs: implement the interface directly or implement a base class. “TextFile” as well as “HtmlFile” are documents. Maybe we gain some advantages if we use a base class “DocumentBase” which implements the interface and derive from this class. The following source code shows both possibilities.

// inheritance without base class
class DocumentInterface {};
class TextFile : public DocumentInterface {};
class HtmlFile : public DocumentInterface {};

// inheritance with base class
class DocumentInterface {};
class DocumentBase : public DocumentInterface {};
class TextFile : public DocumentBase {};
class HtmlFile : public DocumentBase {};

 

Besides the decision about the kind of inheritance, we should decide whether we want to use inheritance at all. This leads us to the well-known “composition vs. inheritance” topic. Let’s assume we implement a document base class. This base class can derive from the tool and data classes, or it can use them as members. The following code shows these possibilities.

// inheritance for additional classes
class DocumentInterface {};
class DocumentTools {};
class DocumentData {};
class Serializer {};
class DocumentBase : public DocumentInterface, public DocumentTools,
  public DocumentData, public Serializer {};

// composition for additional classes
class DocumentInterface {};
class DocumentTools {};
class DocumentData {};
class Serializer {};
class DocumentBase : public DocumentInterface
{
private:
  DocumentTools mTools;
  DocumentData mData;
  Serializer mSerializer;
};

 

As we can see, we must make some fundamental design decisions before we start to implement the document feature. First, we should decide whether we want to implement a document base class. Such a base class makes sense if some common functionality is needed in several subclasses. For example, if we can implement the “Load” and “Save” functions in a generic way, we should implement them only once. In this case these functions should be implemented in a base class and can be used by all derived classes.

Next, we should think about the composition vs. inheritance topic. If you have two classes and you want to choose the type of dependency, you could ask: “Is x also a y? Or does x only use y?”. For example, think about the dependency between “TextFile” and “Document”. Is the “TextFile” a “Document”? I think: yes, it is. Therefore, we have an inheritance relationship between them. What about “HtmlFile” and “Serializer”? Is the “HtmlFile” a “Serializer”? I don’t think so. But the “HtmlFile” may use the “Serializer”. Therefore, we should use composition in this case.
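
The two answers translate directly into code. The following is a minimal sketch; the string-based “Serialize” signature, the placeholder content and the “LastSaved” accessor are assumptions made only to keep the example self-contained.

```cpp
#include <string>
#include <type_traits>

class DocumentInterface
{
public:
  virtual ~DocumentInterface() = default;
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

// Assumption: the serializer works on a plain string payload.
class Serializer
{
public:
  std::string Serialize(const std::string& data) const
  {
    return "<doc>" + data + "</doc>";
  }
};

// HtmlFile is a document (inheritance) and uses a serializer (composition).
class HtmlFile : public DocumentInterface
{
public:
  void Load(std::string fileName) override
  {
    mFileName = fileName;
    mContent = "content of " + fileName;  // placeholder instead of real file access
  }

  void Save() override
  {
    mLastSaved = mSerializer.Serialize(mContent);  // placeholder instead of writing a file
  }

  const std::string& LastSaved() const { return mLastSaved; }

private:
  Serializer mSerializer;
  std::string mFileName;
  std::string mContent;
  std::string mLastSaved;
};

static_assert(std::is_base_of<DocumentInterface, HtmlFile>::value,
              "HtmlFile is a document");
static_assert(!std::is_base_of<Serializer, HtmlFile>::value,
              "HtmlFile is not a serializer, it only uses one");
```

The serializer stays an implementation detail of “HtmlFile” and can be exchanged without any effect on clients of the document interface.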

 

Possible class design

Based on the thoughts so far, I want to implement a suitable class design. As a general rule, I recommend avoiding dependencies. Each functionality should be a feature of its own and implemented independently of the other parts of the software. This increases reusability and maintainability, makes the software easier to understand and unit testing much easier, and so leads to higher software quality.

Of course, dependencies are needed sometimes. To implement a complex feature, we need to combine the functionalities of the independent classes to solve a more complex task. That’s the point where we want to create dependencies. Depending on the use case, we select the individual classes needed to solve it and combine them into a more complex system.

The following source code shows an implementation according to this rule. There are several independent classes. As we want to use the features of these classes to manage documents, we combine the single tasks (classes) by using them in one complex task (class). So, we create a document base class which creates the dependency between the document-specific data class, the document-specific tool class, the independent tools and the client interface. This base class is then the starting point for concrete document classes. These are derived from the base class and may use its functionality or implement their own features.

// interface for clients of Document
class DocumentInterface
{
public:   
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

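// common functions for implementers of Document
class DocumentTools
{
public:  // public, so that DocumentBase can call it through its member
  std::string CalculateHash();
};

// common data for implementers of Document
class DocumentData
{
public:  // public, so that DocumentBase can access it through its member
  std::string mFileName;
};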

// service class which is independent of document features
// and may be used in other application units too
class Serializer
{
public:  
  std::string Serialize();
  void DeSerialize(std::string data);
};

// base class for all documents with
// base implementation of the document interface
class DocumentBase : public DocumentInterface
{
public:
  virtual void Load(std::string fileName) override;
  virtual void Save() override;

protected:
  DocumentTools mTools;
  DocumentData mData;

private:
  Serializer mSerializer;
};

// document class, derived from document base
// may use the implementations of the base class
// may add additional functions
class TextFile : public DocumentBase
{
protected:
  std::string mEncoding;
};

// document class, derived from document base
// may use the implementations of the base class
// may add additional functions
class HtmlFile : public DocumentBase
{
};

 

Pure interfaces

Starting from fundamental considerations about interfaces in object-oriented languages, we have analyzed a messy implementation example and have thought about the interface concept in C++. This first concept, the separation between interfaces, abstract classes and concrete classes, was the basis for the further design decisions and design guidelines: implement classes without dependencies on each other and connect them only at a higher level in use-case-specific scenarios.

In summary, the basis of the design was a clear separation between interfaces, base classes and concrete classes. As we don’t have an explicit language concept for interfaces in C++, we should follow an implicit rule: “Make pure interfaces”. If you want to implement an interface for a client, you should use an abstract class, but it must contain public pure virtual functions only.
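
A pure interface following this rule could look like the sketch below. The virtual destructor is an addition of this sketch and not part of the listings above; it is the one non-pure member I would allow, because it lets clients destroy an implementation through a pointer to the interface. The “TextFile” implementation and the client function “SaveTwice” are hypothetical.

```cpp
#include <string>
#include <type_traits>

// Pure interface: public pure virtual functions only, no data members.
class DocumentInterface
{
public:
  virtual ~DocumentInterface() = default;
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

// Hypothetical implementation which only counts the calls to keep the sketch small.
class TextFile : public DocumentInterface
{
public:
  void Load(std::string fileName) override { mFileName = fileName; }
  void Save() override { ++mSaveCount; }
  int SaveCount() const { return mSaveCount; }

private:
  std::string mFileName;
  int mSaveCount = 0;
};

// Client code depends on the interface only, not on a concrete document class.
void SaveTwice(DocumentInterface& document)
{
  document.Save();
  document.Save();
}

// The compiler confirms that the interface itself cannot be instantiated.
static_assert(std::is_abstract<DocumentInterface>::value,
              "DocumentInterface must stay abstract");
```

The “static_assert” only checks that the class is abstract. Whether it is also pure, without data members and implemented methods, remains a convention the developer must follow.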
