Fast way to return a large object

In this article I want to look at how we can move objects between different scopes without expensive copying and without error-prone pointer handling. To analyze this topic, we will write a function which returns a large object. Returning an object of a small built-in type, like an integer value, carries little to no overhead. Returning a larger object of class type may require more expensive copying from one memory location to another.

To keep it simple I will use a vector object in the examples of this article. But the principles shown are the same if you use your own classes.

 

Return a pointer

The traditional approach to returning a large object is to use a pointer. Therefore, we will start with an example application implementing this approach.

// the element type "int" is chosen for illustration only
std::vector<int>* CreateInstance()
{
  std::vector<int>* instance = new std::vector<int>();

  //...fill vector with data
  
  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int>* data = CreateInstance();

  //...use data class

  delete data;

  return 0;
}

 

This works fine but comes with a big disadvantage: the raw pointer undermines the resource safety concept of C++. This leads to typical issues such as memory leaks.

 

Return a smart pointer

To improve the example application, we could use a smart pointer. This will eliminate the resource safety violation as we now follow the RAII principle (resource acquisition is initialization).

// the element type "int" is chosen for illustration only
std::unique_ptr<std::vector<int>> CreateInstance()
{
  std::unique_ptr<std::vector<int>> instance = std::make_unique<std::vector<int>>();

  //...fill vector with data

  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::unique_ptr<std::vector<int>> data = CreateInstance();

  //...use data class

  return 0;
}

 

This implementation is much better than the first one. But it also raises the question: why use a pointer at all? Often, I don’t want to use a pointer, even if it is a smart pointer. Pointers distract from the conventional use of an object.

 

Return an object

What I want to do is use the object itself. I want to implement a function which creates and returns the object without the need for pointers. The following source code shows the adapted example.

// the element type "int" is chosen for illustration only
std::vector<int> CreateInstance()
{
  std::vector<int> instance = std::vector<int>();

  //...fill vector with data

  return instance;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int> data = CreateInstance();

  //...use data class

  return 0;
}

 

Without move semantics, this would copy the elements of “instance” into “data”. And of course, such a copy is expensive for large objects. But since “instance” is just about to be destroyed and the memory holding its elements is to be freed, there is no need to copy this object. Instead it is possible to steal the elements. C++11 directly supports this “stealing mechanism” by using move constructors. Therefore, with C++11 this implementation performs a cheap move instead of a copy, as it simply transfers the ownership of the elements. So, we don’t have to fear expensive copy mechanisms and can return the object directly, without the need for pointers.
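The “stealing” can be observed with a small sketch. The helper name “MoveKeepsBuffer” is made up for this illustration. It moves a vector and checks that the new vector owns the very same element buffer, which means no element was copied.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper for illustration: returns true if moving the vector
// transferred the existing element buffer instead of copying the elements.
bool MoveKeepsBuffer()
{
  std::vector<int> source(1000000, 42);
  const int* bufferBeforeMove = source.data();

  // Move construction hands the buffer over to "target".
  std::vector<int> target = std::move(source);

  // "source" is left empty, "target" owns the original buffer.
  return target.data() == bufferBeforeMove
      && target.size() == 1000000
      && source.empty();
}
```

Returning a local vector from a function uses the same move constructor, unless the compiler removes the move completely.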

 

Return value optimization

The move shown above only works if the object has a move constructor. This could be an explicitly implemented one or an implicitly declared one. Maybe you will come into a situation where you have to use an object which does not have a move constructor. Which of the above implementations is the best one then? Good news everyone: you can still return the object and don’t have to use pointers. This is possible because the compiler itself provides an optimization: the return value optimization. If the compiler finds the code shown above, it can optimize it and change it to something like this:

void CreateInstance(std::vector<int>* instance)
{
  // "instance" points to the storage of "data" in the caller

  //...fill *instance with data
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<int> data;
  CreateInstance(&data);

  //...use data class

  return 0;
}

 

This optimization is a form of “copy elision”. It omits calls to copy and move constructors. Therefore, it works in both cases: for objects with a move constructor and for objects without one. In one case, it omits the expensive copy constructor and in the other case it omits the cheap move and may make it even a little cheaper. Note that for a named local object like “instance” the compiler is allowed, but not required, to apply this optimization.
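To see what the compiler does with a returned object, one can count the constructor calls. The following is a minimal sketch; the class “Tracker” and both functions are invented for this purpose. The copy constructor is never needed for the return. Whether the move constructor is called depends on whether the compiler elides it.

```cpp
// Hypothetical class that counts how often it is copied or moved.
struct Tracker
{
  static int copies;
  static int moves;

  Tracker() {}
  Tracker(const Tracker&) { ++copies; }
  Tracker(Tracker&&) noexcept { ++moves; }
};

int Tracker::copies = 0;
int Tracker::moves = 0;

// Returns a named local object, just like CreateInstance above.
Tracker CreateTracker()
{
  Tracker instance;
  return instance;
}

// Creates one object and reports how many copies were made on the way.
int CountCopiesOnReturn()
{
  Tracker::copies = 0;
  Tracker::moves = 0;
  Tracker data = CreateTracker();
  (void)data;  // only the counters are of interest here
  return Tracker::copies;
}
```

After a call of “CountCopiesOnReturn”, the value of “Tracker::moves” shows whether your compiler elided the move as well.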

 

Summary

Don’t be afraid to return large objects directly. Returning the object is very cheap if it supports moving. Most classes get the needed move constructor implicitly, or you can implement one explicitly. Furthermore, if your object does not have a move constructor at all, the return value optimization of the compiler can usually eliminate the expensive copy.


Type Safety and Resource Safety in C++

If I had to summarize the main advantages of C++ in one short sentence I would say: C++ is a completely type safe and resource safe language without performance loss.

Type safety and resource safety are two very powerful features if you use them right. Because of the very important principle “no performance loss”, C++ will often give responsibility to the developer. The language itself offers powerful and usually easy to use concepts which allow developers to write efficient implementations. But on the other hand, the language will not add expensive management and validation layers to prevent programming errors, because such layers would compromise performance. For example, C++ offers simple and lightweight pointer types. As a consequence, programming errors can lead to issues like dangling pointers and undefined behavior of the application. Other languages offer managed pointers to prevent such issues, but at the cost of performance.

In this article I want to talk about the type safety and resource safety concepts and think about programming techniques we should use and programming errors we should avoid.

 

Type safety

C++ has a static and strict type system. “Static” means that types are already known at compile time. Of course, C++ also offers techniques for late binding at runtime. But these programming techniques only extend the static type system by additional dynamic features. “Strict”, on the other hand, means that type compatibility is checked. For example, you cannot add an integer and a string variable, and you cannot pass a const object to a function expecting a non-const reference. Most of these compatibility checks can be done at compile time as the type system is static. Of course, if dynamic features are used the compatibility check is moved to runtime.

The type system of C++ allows definitions which go beyond the specification of the data type. Of course, you will define the data type, like “unsigned int”, “double”, “string” and so on, but you can define additional type characteristics like constness or, for example, whether the type is a pointer, a reference or an rvalue reference. This allows you to create use case specific data types which limit their usage to the needs of the use case. For example, a function which only needs to read a value could take this value as a const parameter. This use case specific limitation prevents wrong usage: in this case you cannot set the parameter to a new value by mistake.
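A short sketch of this idea, with function names invented for the illustration: the first function only reads its parameter and says so in its signature, the second one has to ask for write access explicitly.

```cpp
#include <cstddef>
#include <string>

// Read-only access: the parameter is a reference to const.
std::size_t CountSpaces(const std::string& text)
{
  std::size_t count = 0;
  for (char character : text)
  {
    if (character == ' ')
    {
      ++count;
    }
  }
  // text += "!";   // would not compile: "text" is const
  return count;
}

// Write access must be requested explicitly with a non-const reference.
void AppendExclamationMark(std::string& text)
{
  text += "!";
}
```

Passing a const string to “AppendExclamationMark” is rejected by the compiler, so the wrong usage never reaches runtime.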

The static and strict type system helps to avoid programming errors without any performance loss. As mentioned before, the static type checks are done at compile time and therefore do not influence runtime performance. C++ does not restrict the developer to this static type system. It is also possible to use dynamic types, but with the downside that programming errors are not detectable at compile time. Therefore, related issues will occur at runtime. To prevent performance loss, there are typically not many type checks at runtime. Therefore, such programming errors will most often lead to undefined behavior.

If you want to use the powerful type system of C++ you should pay attention to the following guidelines. First, think about the type itself. For example: do you need an integer value or a long? Next, define the constness of the type: do you want to read from or write to the parameter? And of course, you have to define whether you want to use the value directly or whether you want to use a pointer or reference. In summary: choose the right type according to the needs of the use case.

As mentioned before, C++ offers the flexibility to bypass this static and strict type system and use dynamic features. As this comes with some major disadvantages you should avoid such language features. Use such features only if there is no other possibility, for example in case you have to deal with third party components offering an interface which needs dynamic type system features. But aside from such special use cases, you should never bypass the strict type system. This means you must not use things like casts, void pointers or unions. The misuse of casts and unions can lead to type and memory violations. Therefore, you should avoid casts and use a variant class rather than a plain union in most cases.

 

Resource safety

Many articles comparing programming languages contain a section about resource management. But most of them only compare memory management. Therefore, you will find definitions like: C++ has explicit memory management and C# has implicit garbage collection. But unfortunately, such considerations limit the view to one special case and don’t explain the basic resource management concepts of the languages. Of course, memory management is an important topic, but as we want to think about resource safety in general, we should keep in mind that there are several other resources. For example: files, streams, network connections, mutexes, database connections and many more.

C++’s model of resource management is based on the use of constructors and destructors. Constructors specify the meaning of object initialization and destructors define the object cleanup. For scoped objects, destruction is executed implicitly at the end of the scope. For objects placed in the free store (heap, dynamic memory) using new, delete is required. This model has been part of C++ since the very earliest days.

This simple but efficient constructor/destructor mechanism was introduced to handle the resource management part. But there is one big issue: there are two ways to call the destructor. As mentioned before, one way is the automatic call at the end of the object scope. The other way is the explicit call of “delete” for objects created with “new”. This second possibility for object management is a big source of errors. Programming errors may lead to situations where “delete” is never called and therefore resources are not cleaned up. Or they may lead to situations where “delete” was executed and the resources are already cleaned up, but the calling code nevertheless wants to work with these resources.

In conclusion, C++ offers a nice and lightweight resource management system, but we have to use it in the right way. A common technique is called RAII (resource acquisition is initialization). This programming technique says that all resources needed by an object instance must be acquired within the constructor and released within the destructor. C++ supports the RAII principle in a nearly perfect way, except for the fact that the destructor of an object instance may never be called if the life cycle of that instance is not implemented correctly. For example, we can create object instances with “new” and never delete them.

Fortunately, there is a simple solution for this issue. As mentioned before, there is an automatic lifetime management for objects. A scoped object is destroyed automatically when its scope is left. So, let’s use this nice mechanism. That means: think about the “new” and “delete” functionality and the resource safe way to use this feature. According to the RAII principle, resources must be managed by objects. And of course, memory is an important resource. So, if you have to allocate and free memory by using “new” and “delete”, this resource must be managed within an object. That’s all! This simple concept avoids a lot of the typical implementation errors in C++. And of course, the standard library already offers classes which support this programming principle, for example smart pointers, lock classes and so on.
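As a minimal sketch of this principle, the following class manages a piece of dynamic memory. The class “Buffer” and its instance counter are invented for the illustration; in real code a standard container or smart pointer would usually do this job.

```cpp
#include <cstddef>

// Hypothetical RAII class for illustration: it owns a dynamically allocated
// buffer. "new" appears only in the constructor, "delete" only in the destructor.
class Buffer
{
public:
  explicit Buffer(std::size_t size) : mData(new int[size]()), mSize(size)
  {
    ++sLiveBuffers;
  }

  ~Buffer()
  {
    delete[] mData;
    --sLiveBuffers;
  }

  // Copying is disabled so that two objects never free the same memory.
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;

  std::size_t Size() const { return mSize; }

  static int sLiveBuffers;  // number of buffers currently alive

private:
  int* mData;
  std::size_t mSize;
};

int Buffer::sLiveBuffers = 0;

// The buffer is released automatically when "buffer" leaves its scope.
int LiveBuffersAfterScope()
{
  {
    Buffer buffer(1024);
    // ...use buffer
  }
  return Buffer::sLiveBuffers;
}
```

The user of “Buffer” never writes “delete”. The cleanup is tied to the scope of the object.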

That means: don’t use naked “new” and “delete” somewhere in your implementation. Manage the needed memory within an object and use “new” in the constructor and “delete” in the destructor. Even if “new” and “delete” are used in a small scope, near to each other, they should not be used. For example, a popular pattern is to call “new” at the start of a method to initialize a resource and “delete” at the end of the method. This manual memory management seems to be harmless within the small scope of a method call. But each instruction within the method could result in an early end of the method, for example caused by an exception. In such situations, “delete” is never called. Therefore, even in such supposedly easy to manage situations, use the RAII principle and do the resource management by using an object. The standard “unique_ptr” class is most often a perfect solution for resource management within method scope.
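The early exit can be simulated easily. In the following sketch the type “Resource” and both functions are hypothetical. The method throws right after the resource was created, and the “unique_ptr” still releases it.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts its live instances.
struct Resource
{
  static int sLiveInstances;
  Resource() { ++sLiveInstances; }
  ~Resource() { --sLiveInstances; }
};

int Resource::sLiveInstances = 0;

// The method ends early because of an exception. The unique_ptr still
// destroys the resource while the stack is unwound.
void UseResourceAndFail()
{
  std::unique_ptr<Resource> resource(new Resource());
  throw std::runtime_error("something went wrong");
}

// Returns the number of resources that are still alive after the failure.
int LiveInstancesAfterFailure()
{
  try
  {
    UseResourceAndFail();
  }
  catch (const std::runtime_error&)
  {
    // the error is handled here, the resource is already released
  }
  return Resource::sLiveInstances;
}
```

With a raw pointer and a “delete” at the end of “UseResourceAndFail”, the counter would stay at one because the “delete” would never be reached.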

 

Summary

C++ offers a very strong and strict type system and an easy to implement and performant resource safety system. If you use these concepts consistently you can write efficient code. Of course, source code will never be flawless, but if you bypass these type safety and resource safety concepts your source code will become bad and error-prone very quickly.


Arrays and inheritance, a source of errors

If you work with arrays of objects and offer some functions to execute on these arrays, it is common to pass an array pointer and the array size as function parameters. Arrays will often hold a huge amount of data. You normally don’t want to pass them by value and create a copy of the array. Therefore, if you pass an array as a parameter it is transferred as a pointer to the first element. This C++ feature is known as “array decay”. Once you convert an array into a pointer you lose the ability of the sizeof operator to count the elements in the array. This lost ability is referred to as “decay” as the array decays into a pointer.

The following source code shows an example of array decay. We have a class with a print function and we create an array of class instances. As we want to call the print function for all array elements, we create a corresponding helper function. So, we pass the array and the array size as parameters to this function. As mentioned before, the array is passed as a pointer to avoid copying the whole array. Therefore, the array decays into a pointer.
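The lost size information can be made visible with the sizeof operator. The two function names in this small sketch are made up for the illustration.

```cpp
#include <cstddef>

// Inside the function only a pointer is left, so sizeof yields the size
// of the pointer, not the size of the array.
std::size_t SizeSeenByFunction(const int* values)
{
  return sizeof(values);
}

// In the scope of the array itself, sizeof still knows the full array size.
std::size_t ElementCountInScope()
{
  int values[10] = {};
  return sizeof(values) / sizeof(values[0]);
}
```

This is the reason why the element count has to be passed as a separate parameter.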

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};


void PrintAll(const MyClass* elements, const int numberOfElements)
{
  for (int index = 0; index < numberOfElements; ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  MyClass elements[10];

  PrintAll(elements, 10);

  return 0;
}

 

The example implementation works fine and prints the right results. So, array decay seems to be a good and straightforward feature. But unfortunately, it is also a source of errors. As mentioned before, the decay comes together with a loss of size information. But size information is needed if we want to step through the array elements. The element size is calculated from the given pointer type, in this case “MyClass”. So, we have to think about what happens if the passed array is of a different type, for example of a class derived from “MyClass”. The following example shows a corresponding implementation.

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};

class MyDerivedClass : public MyClass
{
private:
  double y;
};

void PrintAll(const MyClass* elements, const int numberOfElements)
{
  for (int index = 0; index < numberOfElements; ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  MyDerivedClass elements[10];

  PrintAll(elements, 10);

  return 0;
}

 

If you execute this test application you will see strange outputs. This undefined behavior is a result of the array decay. The size information of the original type “MyDerivedClass” gets lost. Within the function the size of “MyClass” is used to go to the next array position. Therefore, a wrong memory location is interpreted as a class instance and so we end up with undefined behavior of the application.

The way that array names decay into pointers is fundamental to their use in C and C++. However, array decay interacts very badly with inheritance, a feature which isn’t available in C. A logical guideline may be to use arrays in C only, because array decay works fine if there is no inheritance. In C++ we should instead use alternatives to arrays, for example the standard vector type. The following source code shows the example application with the change from array to vector.
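The mismatch of the step width can be shown without running into undefined behavior by comparing the two element sizes. The following sketch repeats the two classes in shortened form. The two helper functions are added for the illustration only.

```cpp
#include <cstddef>

class MyClass
{
public:
  MyClass() : x(5) {}
private:
  int x;
};

class MyDerivedClass : public MyClass
{
public:
  MyDerivedClass() : y(0.0) {}
private:
  double y;
};

// Step width the compiler uses for "elements[index]" inside PrintAll.
std::size_t StrideUsedByPrintAll() { return sizeof(MyClass); }

// Step width the array of derived objects actually has in memory.
std::size_t StrideOfDerivedArray() { return sizeof(MyDerivedClass); }
```

As the second value is larger than the first one, “elements[1]” inside “PrintAll” points somewhere into the middle of the first derived object.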

class MyClass
{
public:
  MyClass() : x(5)
  {
  }

  void Print() const
  {
    std::cout << x << std::endl;
  }

private:
  int x;
};

class MyDerivedClass : public MyClass
{
private:
  double y;
};

void PrintAll(const std::vector<MyClass>& elements)
{
  for (std::size_t index = 0; index < elements.size(); ++index)
  {
    elements[index].Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector<MyClass> elements(10);
  PrintAll(elements);   // ok

  std::vector<MyDerivedClass> derivedElements(10);
  PrintAll(derivedElements);  // compiler error, cannot convert vector<MyDerivedClass> to vector<MyClass>

  return 0;
}

 

This time the implementation error is detected by the compiler. The strong type system of C++ prevents the function call with an incompatible type. So we can pass a vector of “MyClass” instances but not a vector of “MyDerivedClass” instances.
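If one function should work for both vector types, one possible approach is a function template. The following is only a sketch under some assumptions: the member “Value” is added to the class, and the function returns the output as a string instead of printing it, so that the result is easy to check.

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

class MyClass
{
public:
  MyClass() : x(5) {}
  int Value() const { return x; }  // accessor added for this sketch
private:
  int x;
};

class MyDerivedClass : public MyClass
{
private:
  double y = 0.0;
};

// A vector of derived objects is not convertible to a vector of base objects.
static_assert(!std::is_convertible<std::vector<MyDerivedClass>,
                                   std::vector<MyClass>>::value,
              "the two vector types are unrelated");

// Hypothetical generic variant: the element type is a template parameter,
// so the correct element size is always used.
template <typename Element>
std::string PrintAll(const std::vector<Element>& elements)
{
  std::ostringstream output;
  for (const Element& element : elements)
  {
    output << element.Value() << '\n';
  }
  return output.str();
}
```

The compiler creates one version of “PrintAll” per element type, so no size information gets lost.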

 

Summary

Array decay is the concept of passing an array as a pointer to the first element. As a side effect, the size information is lost. This will lead to undefined behavior if the passed array contains elements with a different size, as in most inheritance scenarios. Therefore, it is recommended to use arrays in C code or in scenarios without inheritance only. In all other cases, you should use type safe collections like vectors.


Exception-safe code

During the last years, the way we handle exceptions has fundamentally changed. If we look a few years back we will find a lot of applications with exception handling in a bigger context only. For example, a module executing a bigger task may have been executed in its own thread or process, exceptions raised by this module were caught, and the module was restarted after an error. Nowadays exception handling has moved to smaller parts of the application. Ideally, we work with exception-safe methods now.

In this article I want to show the basic ideas behind exception handling on method level and think about the corresponding programming concepts. I will show general development patterns and don’t want to explain implementation details like the try-catch syntax.

 

Typical issues

First, we should think about typical issues occurring as a result of the erroneous execution of a method and implement an example application containing features related to these issues.

If an error occurs the execution of a method will be interrupted. Often, several data objects will be updated within a method. If execution is interrupted and only a part of the data is changed we will have invalid or corrupted data.

The method interruption may also result in resource management issues. If a resource is created at the beginning of the execution and released at the end, an interruption will have the side effect that the resource is not released at all. Such a resource may be, for example, memory, a file, a database connection or a locking object. Therefore, we could see long term issues, like continuously rising memory usage, and immediately occurring issues, like an application freeze as a result of a deadlock.

Within the example application I want to address the two common issues we identified so far: data corruption and resource management. As resource management is a broad topic I want to cover two different kinds of resources: a database resource and a synchronization resource for multithreading.

The example application should manage a list of customers. Each customer has a name and a list of orders. The orders are stored within a database. A management class should allow thread-safe access to the customer data objects. To keep it simple we will omit implementation details of the single components, e.g. the database access, and focus on the exception safety topics. So, we want to implement a single method only: adding a new customer. During the article, we will try to implement this method in an exception-safe manner.

The following source code shows the basic implementation of the example application.

class Orders;
class DatabaseConnection;
class OrderFactory;

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
  Orders* pOrders;
};

class CustomerManagement
{
public:
  CustomerManagement() : mNextAvailableIdentifier(1) {}

  void AddCustomer(std::string name);

private:
  std::mutex mMutex;
  unsigned int mNextAvailableIdentifier;

  std::vector<Customer> mCustomers;
};

int _tmain(int argc, _TCHAR* argv[])
{
  // constructed in place: the mutex member makes the class non-copyable
  CustomerManagement manager;

  manager.AddCustomer("John Doe");

  return 0;
}

 

The customer manager has some private members: a list of customers, a mutex to implement a thread-safe access to the customer list and a counter for the customer identifier. Each customer should get a unique identifier so we use an internal counter which is used for this purpose. I know, there are better solutions to implement such an identifier but this simple solution will help to show issues like data corruption on exceptions.

The three forward declarations stand for functionality we want to use but, as explained before, don’t want to implement. The orders class is a data class to store orders, the database connection class allows access to a database and the order factory creates a new order list and stores it in the database.

 

First implementation

Now we want to implement the “AddCustomer” method. For a start, we don’t think about exception safety and implement the method in a straightforward manner: lock, get resources, update data, release resources and unlock.

void CustomerManagement::AddCustomer(std::string name)
{
  mMutex.lock();

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  mCustomers.push_back(customer);

  DatabaseConnection* pConnection = new DatabaseConnection();
  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection);
  delete pConnection;

  mNextAvailableIdentifier++;

  mMutex.unlock();
}

 

I am sure you have seen such method patterns quite often. But if we think about exception safety we will identify some critical issues. What may happen if the method gets interrupted due to an exception, e.g. during a database access? Before you continue reading, think about this possibility and the issues which may occur.

I think we can identify three issues: a deadlock occurs as unlock will not be executed, the database workload increases as the database connection will not be closed and the internal data may be corrupted as the customer is not completely created or the internal counter for the identifier is not increased.

 

Requirements for exception safety

There are two common requirements for exception safety: leak no resources and don’t allow data structures to become corrupted. The deadlock issue belongs to the first topic as it is about the management of a synchronization resource. So, we don’t add thread safety as a separate requirement. Next, we want to think about the two requirements and change the implementation of the “AddCustomer” method accordingly.

 

Leak no resources

Resource leaks can cause serious errors and strange application behavior which is difficult to reproduce. But fortunately, you can avoid resource leaks in an amazingly simple way: use resource management objects.

The C++ language itself ensures that all local object instances created in the context of the method are destroyed at the end of the method execution. And this is independent of whether we have a normal execution or a premature interruption due to an exception.

Within the example we use two resources: the locking object and the database connection object. We can use already existing implementations to instantiate resource management objects for these resources. The following code shows an implementation of the “AddCustomer” method with respect to the “leak no resources” requirement.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);
  
  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  mCustomers.push_back(customer);

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());    
  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());

  mNextAvailableIdentifier++;  
}
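The effect of the lock guard can be observed with a small sketch. The lock type “ObservableLock” and the two functions are hypothetical. The lock only records its state, real code would use “std::mutex” as shown above.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical lock type for illustration. It only records its state, so the
// effect of std::lock_guard can be observed. Real code would use std::mutex.
class ObservableLock
{
public:
  void lock() { mLocked = true; }
  void unlock() { mLocked = false; }
  bool IsLocked() const { return mLocked; }

private:
  bool mLocked = false;
};

// Stands for any step that may fail, e.g. a database access.
void FailingDatabaseAccess()
{
  throw std::runtime_error("database not reachable");
}

// Returns true if the lock is released although the method was interrupted.
bool LockIsReleasedAfterException()
{
  ObservableLock lock;
  try
  {
    std::lock_guard<ObservableLock> guard(lock);
    FailingDatabaseAccess();
  }
  catch (const std::runtime_error&)
  {
    // the guard has already called unlock() during stack unwinding
  }
  return !lock.IsLocked();
}
```

With the manual “lock” and “unlock” calls of the first implementation, the lock would still be held after the exception.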

 

Don’t allow data structures to become corrupted

If you are looking for implementation patterns avoiding data corruption, you will find several solutions. But (nearly) all of them follow two basic ideas and can therefore be grouped together. The first group ensures that the object remains in a valid state. The data may not be correct, for example it may be initialized incompletely, but the object state is fine and the data structures themselves are not corrupted. In such cases the client can decide whether to undo the erroneous step or not. The methods of the second group have an atomic behavior. They succeed completely or, if they fail, the data and application state is the same as before the function call.

Next, I want to show the concept of exception safety guarantees. This is a well-known and often used concept which is based on the two groups of data handling concepts described above. It extends them by a third one: methods which never throw any exceptions.

 

Exception-safe guarantees

This programming concept says that each method must implement one of three exception safety guarantees.

  • Basic guarantee
  • Strong guarantee
  • No-throw guarantee

In the next sections I want to explain each of the three guarantees and change the example method according to these concepts.

 

Basic guarantee

A method which gives the basic guarantee ensures that everything remains in a valid state and that there is no corrupted data. But the precise program state may not be predictable. Depending on the type of error or the moment when it occurs, the data and program state may differ between two function calls, but the data and state are always valid and not corrupted. The client is responsible for handling errors and may clean up data and repeat the method call if necessary.

The following source code shows the adapted “AddCustomer” method, which will give the basic guarantee now.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = nullptr;

  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;

  mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());
}

 

Let us think about this implementation. What can happen in case of exceptions?

Locking and database access are implemented by using resource management objects. Independent of the moment at which an error occurs, these resources will be released and nothing leaks. So, next, let us have a look at the data structures. The internal data structure contains a list of customers and a counter to create the customer identifier. We have two sub-function calls which may potentially throw an error: creating the database connection and creating the empty order collection. Independent of which one throws an error, the data structure remains valid, but it will have different states. If the creation of the database connection fails, the method call will return with an error. In this case no customer was added at all and the internal counter for the identifier creation was not changed. If the second sub-function call fails, the customer was already added with standard values (null pointer for the orders list) and the internal counter for the identifier creation was changed too. So, the data structure is valid but it has a state different from the first error case.

Of course, our customer management object will offer some more functions, like searching for a customer and deleting a customer. Therefore, after an error the client will be able to check the actual object state and continue accordingly.

Methods offering the “basic guarantee” ensure a valid object state and data which is not corrupted, but they do not ensure a defined state or data content after an exception.
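The following compilable sketch simulates the second error case. All helper types are stand-ins invented for the illustration: the order factory always throws, the database connection is left out, and two query methods are added to inspect the state afterwards.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// All types below are hypothetical stand-ins for the omitted classes.
struct Orders {};

struct OrderFactory
{
  // Simulates a database error while the order list is created.
  static Orders* CreateEmptyCollection()
  {
    throw std::runtime_error("database error");
  }
};

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
  Orders* pOrders;
};

class CustomerManagement
{
public:
  void AddCustomer(const std::string& name)
  {
    std::lock_guard<std::mutex> guard(mMutex);

    Customer customer{mNextAvailableIdentifier, name, nullptr};
    mCustomers.push_back(customer);
    mNextAvailableIdentifier++;

    // If this call throws, the customer stays in the list without orders.
    mCustomers.back().pOrders = OrderFactory::CreateEmptyCollection();
  }

  std::size_t Count() const { return mCustomers.size(); }
  bool LastCustomerHasOrders() const { return mCustomers.back().pOrders != nullptr; }

private:
  std::mutex mMutex;
  unsigned int mNextAvailableIdentifier = 1;
  std::vector<Customer> mCustomers;
};
```

After the failed call the list contains one customer without an order list. The state is valid, but the client has to inspect it to know what happened.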

 

Strong guarantee

The strong guarantee eliminates the disadvantage of the undefined state of the basic guarantee. A method giving the strong guarantee will always end in one of two defined states: it is executed completely or the state is the same as before the execution. Therefore, such methods have an atomic or transactional behavior. If an error occurs, the state and data of the object are unchanged. Everything remains in the same state as it was before. If the method succeeds, it succeeds completely. In case of an error the client no longer has to analyze the object state or clean up any data.

The following code shows the adapted implementation of the “AddCustomer” method.

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = nullptr;

  std::unique_ptr<DatabaseConnection> pConnection(new DatabaseConnection());
  customer.pOrders = OrderFactory::CreateEmptyCollection(pConnection.get());
  
  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;    
}

 

As in this example method, the strong guarantee is often implemented by using temporary data objects. At the end of the method, if no error occurred, the temporary object is committed to the actual object data, for example by a swap or, as here, by adding it to the list.
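The transactional behavior can be checked with a compilable sketch. Again, all helper types are stand-ins invented for the illustration: the order factory fails on demand, the database connection is left out, and two query methods expose the internal state.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// All types below are hypothetical stand-ins for the omitted classes.
struct Orders {};

struct OrderFactory
{
  static bool sSimulateError;

  static Orders* CreateEmptyCollection()
  {
    if (sSimulateError) throw std::runtime_error("database error");
    static Orders emptyCollection;  // placeholder for a real order list
    return &emptyCollection;
  }
};

bool OrderFactory::sSimulateError = false;

struct Customer
{
  unsigned int mIdentifier;
  std::string mName;
  Orders* pOrders;
};

class CustomerManagement
{
public:
  void AddCustomer(const std::string& name)
  {
    std::lock_guard<std::mutex> guard(mMutex);

    // Step 1: do everything that may fail on a temporary object.
    Customer customer{mNextAvailableIdentifier, name, nullptr};
    customer.pOrders = OrderFactory::CreateEmptyCollection();

    // Step 2: commit. If push_back throws, the vector is unchanged
    // and the counter has not been touched yet.
    mCustomers.push_back(customer);
    mNextAvailableIdentifier++;
  }

  std::size_t Count() const { return mCustomers.size(); }
  unsigned int NextIdentifier() const { return mNextAvailableIdentifier; }

private:
  std::mutex mMutex;
  unsigned int mNextAvailableIdentifier = 1;
  std::vector<Customer> mCustomers;
};
```

A failed call leaves both the customer list and the identifier counter exactly as they were before the call.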

 

No-throw guarantee

A method giving this guarantee promises to never throw an application exception. It will always do what it promises to do and will throw serious errors only, like an out of memory exception. For example, all operations on built-in types offer the no-throw guarantee.

If we try to implement our example method according to this guarantee we will see two issues: the sub-function call to open the database connection and the sub-function call to create the order list. With this design limitation, it is very difficult to implement a no-throw method. But, for example, we can implement a queue for database commands. This queue takes a command and returns immediately without any error. The command itself will be executed by the queue manager later. With such a design change our “AddCustomer” method uses sub-functions which give a no-throw guarantee and therefore we are able to give this guarantee too.

class DatabaseQueryQueue;

void CustomerManagement::AddCustomer(std::string name)
{
  std::lock_guard<std::mutex> guard(mMutex);

  Customer customer = Customer();
  customer.mIdentifier = mNextAvailableIdentifier;
  customer.mName = name;
  customer.pOrders = DatabaseQueryQueue.CreateEmptyCollection();

  mCustomers.push_back(customer);
  mNextAvailableIdentifier++;
}

 

Now we can offer the no-throw guarantee. But the needed software design change is expensive. We must implement a new manager layer to access the database and adapt all our existing code accordingly. This brings us to the question which of the three guarantees we should offer.

 

Choose between the three guarantees

Exception-safe code must offer one of the three guarantees. In my opinion, every function you write should be exception-safe. Resource management should be done by management objects anyway and the data access should be done in a logical manner. Therefore, if you follow some basic programming guidelines your methods should already give the basic guarantee, or only a few small modifications are necessary.

If a method and therefore the application using this method is not exception-safe, it can result in resource leaks or corrupt data which results in unexpected application behavior and errors. You don’t want this and therefore exception safety is a basic need for your code.

Which guarantee is given by a function should be an individual choice. You can compare the risk of exceptions with the costs of an exception-safe implementation. No-throw methods are wonderful from a client point of view but they may be very expensive. If the implementation effort for the different guarantees is nearly the same, you should choose the strongest one. But this guideline may be wrong in some situations as a stronger guarantee is not always practical. For example, the strong guarantee often uses temporary objects or stores the previous state and therefore it must create additional objects and must execute additional copy or move operations. This kind of exception-safety guarantee may not be reasonable for time-critical parts of your application.

In summary, the choice of the suitable guarantee depends on the requirements of the application, the module, the object and the single method. These requirements normally limit the decision to one or two of the three guarantees. For the remaining ones, you must balance benefit against implementation effort to make your decision.

 

Sub-function calls

As seen in our example method, we often depend on existing functions which we want to use in our method. If we call a function within our method we are limited to its exception guarantee level and cannot give a higher one. If we call several functions the guarantee level may even be reduced. For example, we call two functions which each give the strong guarantee. If the second function fails, the changes of the first have already been made. Therefore, we can give the basic guarantee only.
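The effect of combining two strong functions can be shown with a small sketch. The type "Registry" and its members are hypothetical. "AddBasic" calls the two strong steps one after the other and therefore gives the basic guarantee only. "AddStrong" runs the same steps on a copy and commits at the end, which restores the strong guarantee at the cost of copying.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical example: each single step offers the strong guarantee.
struct Registry
{
  std::vector<std::string> names;
  std::vector<int> ids;

  void AddName(const std::string& name) { names.push_back(name); }

  void AddId(int id)
  {
    if (id < 0) throw std::invalid_argument("negative id");
    ids.push_back(id);
  }

  // Basic guarantee only: if AddId throws, the name has already been added.
  void AddBasic(const std::string& name, int id)
  {
    AddName(name);
    AddId(id);
  }

  // Strong guarantee: both steps run on a copy which is committed at the end.
  void AddStrong(const std::string& name, int id)
  {
    Registry temp = *this;
    temp.AddName(name);
    temp.AddId(id);
    names.swap(temp.names);
    ids.swap(temp.ids);
  }
};
```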

You should keep this in mind when you estimate the costs of an implementation. The components you are using will limit the safety guarantee you are able to give. If you must give a higher guarantee you may not be able to use the existing component directly and must, for example, implement an exception-safe proxy for the sub-component. This will have a great impact on the implementation costs of your component.

 

Exception-safe application

The exception-safety of the whole application depends on the guarantees the single functions can give. Even one function without exception safety will make the whole application unsafe, because if that single function throws, the whole application is in an undefined and unsafe state. The same is true for an object or module. The exception-safety guarantee of the whole software or of a software component corresponds to the lowest guarantee of all its functions.

 

Summary

It is not difficult to write exception-safe code. You simply must pay attention to two concepts. First, use objects to manage resources. This will prevent resource leaks. Second, think about the three safety guarantees and which one you want to give and can give. Then implement your internal data handling according to this decision to prevent data corruption.

Exception safety is an important concept. It should be a visible part of your object interface. Therefore, you should think about exception safety at the same moment as you define the object interface. Furthermore, document your decision. This will be important for clients and for future maintainers.

Posted in C++

Pure Interfaces

In my opinion, one disadvantage of c++ is that there exists no explicit interface concept or language feature. Of course, you can implement interfaces in c++ but you must use an abstract class. The downside of this concept is the fact that an abstract class can already contain implementations as well as definitions or implementations of private or protected members or functions. Therefore, it allows elements which should not be part of an interface.

But before we start to analyze this in detail we want to step back and define some terms. In most type-safe object-oriented programming languages we will find the concepts of interfaces, abstract classes and concrete classes. An interface is a syntactical contract only. It defines the methods a class must contain. An abstract class also contains contract definitions and additionally it already contains implementations. As an abstract class contains contract definitions for methods which must be implemented, it cannot be instantiated. Instead it is intended as a base class for concrete classes. If a concrete class is based on an abstract class, it can use the implementations of the abstract class and it must implement the methods defined as contract but not implemented in the abstract class. A concrete class contains implementations only and therefore it can be instantiated. It can be derived from an interface and/or an abstract class and of course it can be implemented without using a parent element.

In c++ we don’t have an explicit language feature to implement interfaces. But we have abstract and concrete classes. An abstract class can be implemented by adding a contract for a method which must be implemented by a concrete class. In c++ we can implement such a contract by using a pure virtual method. If we want to implement an interface, we can do this by implementing an abstract class with pure virtual functions only. Therefore, you can implement interfaces in c++ but you must use an abstract class. As described at the beginning, abstract classes and interfaces are two independent elements in object-oriented concepts. As c++ does not distinguish these two concepts, the compiler cannot prevent the corresponding implementation errors. For example, if you want to implement an interface, you can add elements (e.g. implemented methods) which turn the interface into an abstract class, and c++ cannot prevent this smelly software design. So, as a developer, you are responsible for writing interfaces which follow object-oriented concepts. Next we will look at an example and think about a possible code design.
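A minimal sketch of such an interface, built from an abstract class with pure virtual functions only, could look like this. The names "Printable" and "Greeting" are illustrative assumptions. The defaulted virtual destructor is an addition which is commonly made so that objects can be destroyed through a pointer to the interface. It is not a contract method.

```cpp
#include <string>
#include <type_traits>

// Sketch of an interface: public pure virtual functions only.
// The names are illustrative and not taken from a real library.
class Printable
{
public:
  virtual ~Printable() = default;   // allows deletion through the interface
  virtual std::string Print() const = 0;
};

// Concrete class implementing the contract.
class Greeting : public Printable
{
public:
  std::string Print() const override { return "hello"; }
};

// The compiler can at least confirm which class can be instantiated.
static_assert(std::is_abstract<Printable>::value, "interface cannot be instantiated");
static_assert(!std::is_abstract<Greeting>::value, "concrete class can be instantiated");
```

The compiler can check that "Printable" is abstract, but it cannot check that it is a pure interface. Adding an implemented method to "Printable" would still compile.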

Let’s start with an example. We want to implement an application which is used to show and edit documents of different types. Therefore, we need some kind of document class which offers methods to manage a document. Within the example we want to use a base class “document” and two derived classes “TextFile” and “HtmlFile”. We will start with the load and save features of the document. In this article, we will think about the document data management only and offer an according interface which can be used by the client application. The following source code shows a possible implementation.

class Document
{
public:   // interface for clients of Document
  virtual void Load(std::string fileName);
  virtual void Save();

protected:  // common functions for implementers of Document
  std::string Serialize();
  void DeSerialize(std::string data);
  std::string CalculateHash();

protected:  // common data for implementers of Document
  std::string mFileName;
};

class TextFile : public Document
{
public:
  void Load(std::string fileName);
  void Save();

protected:
  std::string mEncoding;
};

class HtmlFile : public Document
{
public:
  void Load(std::string fileName);
  void Save();
};

 

The base class “Document” contains public implementations for the load and the save methods. These public methods are our interface for the client application. Furthermore, it contains protected methods to serialize and de-serialize data, a function to calculate a hash code which is needed for some security features, and internal data about the loaded document. These protected methods and protected data members are an implementation help for the derived classes. The derived classes can use the already implemented load and save methods or, if needed, they can override them. Furthermore, they can add new elements like the “mEncoding” member in the “TextFile” class.

This implementation seems to meet our needs. And of course, you will find implementations of this kind very often in existing code. But based on the thoughts we had at the start of the article we must ask: is this a good software design? What do you think?

 

Single Responsibility

In my opinion, there is one major issue regarding the above implementation: the class “Document” has four responsibilities. It provides the interface for clients, it contains document specific functions used by derived classes, it contains generic functions which could be needed by other classes and not only derived ones, and it manages the document data.

According to the “single responsibility principle” such a design has many disadvantages. Due to the unnecessary dependencies, the code will be hard to maintain, changes will require more effort and compiling takes longer. So, we should try to split the document class into four separate classes, each responsible for one topic.

The following source code shows a possible implementation.

class DocumentInterface
{
public:   // interface for clients of Document
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

class DocumentTools
{
protected:  // common functions for implementers of Document
  std::string CalculateHash();
};

class DocumentData
{
protected:  // common data for implementers of Document
  std::string mFileName;
};

class Serializer
{
public:  // common functions 
  std::string Serialize();
  void DeSerialize(std::string data);
};

 

We have three document specific classes, one for the interface, one for common helper functions used to implement derived classes and one for the data management. Furthermore, we have created a generic serializer class as it offers common serialization features which may be used in other use cases too. So, this implementation can be reused in other scenarios and is no longer limited to documents.

 

Composition vs. Inheritance

The nice separation of concerns principle forces us to implement four independent classes. To implement our document specific features, we will use and connect these classes. This connection can be created by using two main concepts: composition and inheritance.

We know our classes “TextFile” and “HtmlFile” are both documents which must implement the document interface. Therefore, we found a first need for inheritance. But most often it isn’t that easy, and it isn’t in this case either. We could think about two main designs: implement the interface directly or implement a base class. “TextFile” as well as “HtmlFile” are both documents. Maybe we have some advantages if we use a base class “DocumentBase” which implements the interface and we derive from this class. The following source code shows both possibilities.

// inheritance without base class
class DocumentInterface {};
class TextFile : public DocumentInterface {};
class HtmlFile : public DocumentInterface {};

// inheritance with base class
class DocumentInterface {};
class DocumentBase : public DocumentInterface {};
class TextFile : public DocumentBase {};
class HtmlFile : public DocumentBase {};

 

Beside the decision about the kind of inheritance we want to use, we should decide whether we want to use inheritance at all. This will lead us to the well-known “composition vs. inheritance” topic. Let’s assume we implement a document base class. This base class can derive from the tool and data classes or it can use the tool and data classes. The following code shows these possibilities.

// inheritance for additional classes
class DocumentInterface {};
class DocumentTools {};
class DocumentData {};
class Serializer {};
class DocumentBase : public DocumentInterface, public DocumentTools,
  public DocumentData, public Serializer {};

// composition for additional classes
class DocumentInterface {};
class DocumentTools {};
class DocumentData {};
class Serializer {};
class DocumentBase : public DocumentInterface
{
private:
  DocumentTools mTools;
  DocumentData mData;
  Serializer mSerializer;
};

 

As we can see we must make some fundamental design decisions before we start to implement the document feature. At first, we should decide whether we want to implement a base document class. Such a base class makes sense if we have some common functionality which is needed in several sub classes. For example, if we can implement the “Load” and “Save” functions in a generic way, we should implement them only once. In this case these functions should be implemented in a base class and can be used by all derived classes.

Next we should think about the composition vs. inheritance topic. If you have two classes and you want to choose the type of dependency, you could ask: “Is x also a y? Or does x only use y?”. For example, think about the dependency between “TextFile” and “Document”. Is the “TextFile” a “Document”? I think: Yes, it is. Therefore, we have an inheritance connection between them. What about “HtmlFile” and “Serializer”? Is the “HtmlFile” a “Serializer”? I don’t think so. But the “HtmlFile” may use the “Serializer”. Therefore, we should use a composition in this case.

 

Possible class design

Based on the thoughts so far, I want to implement a suitable class design. As a general rule, I would recommend avoiding dependencies. Each functionality should be a feature of its own and implemented independently of the other parts of the software. This will increase reusability and maintainability, the software will be easier to understand, unit-testing will be much easier and so the software gets a higher quality.

Of course, dependencies are needed sometimes. If a complex feature should be implemented, it needs to combine the different functionalities of the independent classes to solve a more complex task. That’s the point where we want to create dependencies. Depending on the use case, we will select the single classes needed to solve the use case and combine them into a more complex system.

The following source code shows an implementation according to this rule. There are several independent classes. As we want to use the features of these classes to manage documents, we will combine the single tasks (classes) by using them in one complex task (class). So, we create a document base class which creates the dependency between the document specific data class, the document specific tool class, the independent tools and the client interface. This base class will then be the starting point for concrete document classes. These will be derived from the base class and may use its functionality or implement their own features.

// interface for clients of Document
class DocumentInterface
{
public:   
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

// common functions for implementers of Document
class DocumentTools
{
protected:  
  std::string CalculateHash();
};

// common data for implementers of Document
class DocumentData
{
protected:  
  std::string mFileName;
};

// service class which is independent from document features 
// and may be used in other application units too
class Serializer
{
public:  
  std::string Serialize();
  void DeSerialize(std::string data);
};

// base class for all documents with
// base implementation of the document interface
class DocumentBase : public DocumentInterface
{
public:   
  void Load(std::string fileName) override;
  void Save() override;

protected:
  DocumentTools mTools;
  DocumentData mData;

private:
  Serializer mSerializer;
};

// document class, derived from document base
// may use the implementations of the base class
// may add additional functions
class TextFile : public DocumentBase
{
protected:
  std::string mEncoding;
};

// document class, derived from document base
// may use the implementations of the base class
// may add additional functions
class HtmlFile : public DocumentBase
{
};

 

Pure interfaces

Based on the fundamental considerations about interfaces in object-oriented languages, we have analyzed a messy implementation example and have thought about the interface concept in c++. This first concept, the separation between interfaces, abstract classes and concrete classes, was the base idea of the further design decisions and design guidelines: implement classes without dependencies on each other and create a connection between them only at a higher level in use case specific scenarios.

In summary, the basis of the design was a clear separation between interfaces, base classes and concrete classes. As we don’t have an explicit language concept for interfaces in c++, we should create an implicit rule: “Make pure Interfaces”. If you want to implement an interface for a client, you should use an abstract class, and it must contain public pure virtual functions only.
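The benefit for the client can be sketched as follows. The class "MemoryDocument" and the function "SaveAll" are hypothetical and only serve as illustration. The virtual destructor is an addition for safe destruction through the interface. The client function depends on the pure interface only and never sees a concrete document class.

```cpp
#include <string>
#include <vector>

// Pure interface: the only thing the client depends on.
class DocumentInterface
{
public:
  virtual ~DocumentInterface() = default;
  virtual void Load(std::string fileName) = 0;
  virtual void Save() = 0;
};

// Hypothetical in-memory document used for illustration only.
class MemoryDocument : public DocumentInterface
{
public:
  void Load(std::string fileName) override { mFileName = fileName; }
  void Save() override { ++mSaveCount; }

  int SaveCount() const { return mSaveCount; }

private:
  std::string mFileName;
  int mSaveCount = 0;
};

// Client code: knows the interface, not the concrete class.
void SaveAll(const std::vector<DocumentInterface*>& documents)
{
  for (DocumentInterface* document : documents)
  {
    document->Save();
  }
}
```

New document types can be added without touching "SaveAll", as long as they implement the interface.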


Casting in C++, Part 2

Within the first part of the article we have seen the existing types of casts and we have analyzed the c++ style operators for static cast and dynamic cast. We will now continue with the const cast and reinterpret cast operators and finish with some more advanced topics like the comparison between dynamic and static cast and the disadvantages of c style casts.

const_cast

The const cast operator is typically used to cast away the constness of an object. It is the only c++ style cast that can do this. Const cast is considered safer than a c style cast because it changes the const (or volatile) qualification only: the code does not compile if the target type differs from the type of the original object in any other way.

Like any other cast operator, the const cast should be used wisely. A constant object is normally explicitly constant to avoid misuse and possible application errors or undefined behavior. The following example shows a possible use case. Let's say we have an object which allows executing data queries. Such a query should analyze the data only and therefore the corresponding method is defined as const, so the this-pointer is const and the data of the object cannot be changed within the method.

class DataQuery
{
public:
  DataQuery()
  {
  };

  int ExecuteQuery(const std::string query) const
  {
    return 42;
  }
};

Now you want to extend the actual implementation with a query counter. This may be easy as you can add a new object member and increase it by every call to the query method. But unfortunately, you are not allowed to change the object interface (for example to avoid conflicts with clients). As the method is defined as const and therefore the this-pointer is const, you cannot increase your member variable. In such a case, you can use the const cast operator to cast away the constness of the this-pointer. The following source code shows the adapted example.

class DataQuery
{
public:
  DataQuery() : mCounter{ 0 }
  {
  };

  int ExecuteQuery(const std::string query) const
  {
    (const_cast<DataQuery*>(this))->mCounter++;

    return 42;
  }

private:
  int mCounter;
};

Of course, in such a case you may ask whether this is a clean solution of the issue or just a workaround to bypass an environment limitation, in this case the fixed interface. In my opinion, a const cast is always a workaround. A clean architecture and implementation should not have a need to use a const cast operator.
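One common alternative which avoids the cast is to declare the counter as "mutable". The following is a sketch under the assumption that adding a private member is acceptable. The signature of the public query method stays the same, although the memory layout of the class changes. The method "QueryCount" is a hypothetical addition to make the counter visible.

```cpp
#include <string>

class DataQuery
{
public:
  int ExecuteQuery(const std::string query) const
  {
    (void)query;    // the query text is not used in this sketch
    ++mCounter;     // allowed in a const method because the member is mutable
    return 42;
  }

  // Hypothetical accessor, added for illustration only.
  int QueryCount() const { return mCounter; }

private:
  mutable int mCounter = 0;   // bookkeeping only, not part of the logical state
};
```

In contrast to the const cast on the this-pointer, this variant is also well defined if the "DataQuery" object itself was declared const.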

But why is it sometimes dangerous to cast away the constness and sometimes ok? And how can you recognize and distinguish these two situations? The following two examples show an interesting aspect which may help you to answer these questions. Let's say we have a variable and a constant pointer to this variable. That’s typical in many situations, for example in the above example with the query counter we had an internal data member and a constant pointer to the data (the this-pointer). Furthermore, we have a function which changes data and therefore expects a non-const pointer to the data. The following source code shows a possible implementation of such a use case.

void fun(int* pValue)
{
  *pValue = *pValue + 10;
}

int _tmain(int argc, _TCHAR* argv[])
{
  int value = 10;

  const int *pValue = &value;
  int* pValueNotConst = const_cast<int*>(pValue);
  fun(pValueNotConst);

  std::cout << value;
  return 0;
}

What do you think: Is it allowed to cast away the constness in this case? Please think about this question for a short moment…

And now we will do a little modification and define the origin integer value as const too.

void fun(int* pValue)
{
  *pValue = *pValue + 10;
}

int _tmain(int argc, _TCHAR* argv[])
{
  const int value = 10;

  const int *pValue = &value;
  int* pValueNotConst = const_cast<int*>(pValue);
  fun(pValueNotConst);
  
  std::cout << value;
  return 0;
}

What do you think now: Will this change your opinion whether the cast is allowed or not? Is it allowed in both cases, just in one or maybe in none of the two examples?

Within the first of the two examples the original variable is not const. You can modify the value and therefore it is fine to cast away the constness of the pointer to change the data of the variable. In the second example, the original data is const and therefore you must never change it. If you bypass this limitation and try to change the value, the result is undefined behavior. In the above example, for instance, the printed value of the variable may still be “10” even after the function call.

Of course, the implementation of the first example is dangerous too. Even if it works now it may be an issue in future, for example if someone changes the constness of the origin variable. In such cases the variable and the client code with the cast are normally not within three lines of code inside a method. Instead they are often far away from each other, in different classes and modules. Such a change will result in annoying and costly troubleshooting.

reinterpret_cast

By using the reinterpret cast operator the given data is interpreted as if it had the new type. This cast does not convert the value. Instead, it reads the memory you passed in a different way: you give it a memory location and ask it to read that sequence of bits as if it had the new type. Therefore, it is mainly used with pointers and references. Reinterpret cast is intended for low-level casts that yield implementation-dependent results, for example casting a pointer to an integer type. Such casts should be rare outside low-level code.

The following code shows an example where the given unsigned short integer value is reinterpreted as signed short integer.

int _tmain(int argc, _TCHAR* argv[])
{
  unsigned short int value1 = 30;
  unsigned short int value2 = 40000;
  
  short int value3 = *reinterpret_cast<short int*>(&value1);
  short int value4 = *reinterpret_cast<short int*>(&value2);

  std::cout << value3 << std::endl;   // value is as expected
  std::cout << value4 << std::endl;   // unexpected value: 40000 does not fit into a short int

  return 0;
}

Such a reinterpretation may work or it may produce an unexpected result, and in many other cases (for example if the pointed-to types are unrelated) it results in undefined behavior. Within this example the result is unexpected if the value exceeds the value range of the target data type: 40000 does not fit into a 16-bit short int and is typically read as a negative number. Therefore, if you use reinterpret cast, you must know what you are doing as the compiler cannot detect such logical issues. Reinterpret cast is very dangerous, which is why it should not be used in this type of case. You should only use it when you have a pointer and you need to read that memory location in a certain way and you know that the memory can be read in that way.
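One case where the memory is known to be readable in the requested way is the access to the single bytes of an object. The language permits reading any object through a pointer to unsigned char. The following sketch uses a hypothetical helper function "SumOfBytes" to show this.

```cpp
#include <cstddef>
#include <cstdint>

// Sums the bytes of a 32-bit value by reading its object representation.
// Reading any object through 'unsigned char' is permitted by the language.
unsigned int SumOfBytes(const std::uint32_t& value)
{
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);

  unsigned int sum = 0;
  for (std::size_t i = 0; i < sizeof(value); ++i)
  {
    sum += bytes[i];
  }
  return sum;
}
```

The order of the bytes depends on the platform, which is why the sketch calculates a sum and does not rely on the position of a single byte.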

dynamic vs. static cast

In some situations, you may not be sure whether to use a dynamic cast or a static cast. The advantage of using a dynamic cast is that it allows you to check at run-time whether a conversion has succeeded. The disadvantage is that there is a performance overhead associated with this check.

If you want to cast a derived class to its base class, a dynamic cast as well as a static cast will give you the right result. As the dynamic cast comes with a performance overhead you should use a static cast in this case.

If you want to cast a base class to its derived class the conversion may succeed or fail depending on whether your object is of the expected type or not. A dynamic cast will check whether the conversion is possible or not. Therefore, you should prefer a dynamic cast in this situation as a static cast may result in undefined behavior.

The following example contains both situations, a derived-to-base cast and a base-to-derived cast.

class Animal
{
public:
  virtual void Print() const
  {
    std::cout << "animal" << std::endl;
  };
};

class Bird : public Animal
{
public:
  void Print() const
  {
    std::cout << "bird" << std::endl;
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  
  Animal animal1 = static_cast<Animal>(bird);  // OK
  Animal* animal2 = dynamic_cast<Animal*>(&bird);  // OK, but slower

  Bird* bird1 = static_cast<Bird*>(animal2);   // may result in undefined behavior
  Bird* bird2 = dynamic_cast<Bird*>(animal2);  // checks whether conversion is possible
    
  return 0;
}
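How a failed dynamic cast is reported depends on the target type: the pointer form returns a null pointer, the reference form throws std::bad_cast. The following sketch shows both forms. The helper functions "IsBird" and "IsBirdByReference" and the class "Fish" are hypothetical additions.

```cpp
#include <typeinfo>   // std::bad_cast

class Animal
{
public:
  virtual ~Animal() = default;   // polymorphic base, required for dynamic_cast
};

class Bird : public Animal {};
class Fish : public Animal {};

// Pointer form: a failed conversion yields nullptr.
bool IsBird(Animal* animal)
{
  return dynamic_cast<Bird*>(animal) != nullptr;
}

// Reference form: a failed conversion throws std::bad_cast.
bool IsBirdByReference(Animal& animal)
{
  try
  {
    Bird& bird = dynamic_cast<Bird&>(animal);
    (void)bird;   // only the success of the cast is of interest here
    return true;
  }
  catch (const std::bad_cast&)
  {
    return false;
  }
}
```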

Disadvantages of c style casts

As mentioned before you should avoid c style casts. They are not bad per se and will not inevitably lead to errors, but compared to the new c++ style casts they have some disadvantages and therefore a higher possibility of misuse. In the following I want to show some examples to explain the disadvantages of c style casts.

We will start with a typical situation: existing code will be changed. Of course, this will happen often in daily business and unfortunately such changes, even little ones, may introduce new errors. The following example shows a typical situation where a function “Foo” is called which calls another function “Bar”. As “Bar” expects a derived class we must convert the parameter. Later, the function “Foo” must be changed a little bit and the parameter should be passed as const parameter. The function “Foo2” shows this change. Unfortunately, by using the c style cast, we will cast the constness of the parameter away. Depending on the implementation details of “Foo2” and “Bar” this may lead to unexpected behavior.

class Animal{};
class Bird : public Animal{};

void Bar(Bird* bird)
{
}

// origin function with c style cast
void Foo(Animal* animal)
{  
  Bird* bird = (Bird*)animal;
  Bar(bird);
}

// new function with c style cast
void Foo2(const Animal* animal)
{
  Bird* bird = (Bird*)animal;
  Bar(bird);
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();  
  Foo(&bird);
  
  return 0;
}

If we have the same situation but implement it by using a c++ style cast, the compiler will detect such an issue and you get a compiler error. The following code shows the same example with a c++ style cast.

class Animal
{ 
public:
  virtual ~Animal(){}; 
};
class Bird : public Animal{};

void Bar(Bird* bird)
{
}

// origin function with c++ style cast
void Foo(Animal* animal)
{  
  Bird* bird = dynamic_cast<Bird*>(animal);
  Bar(bird);
}

// new function with c++ style cast
void Foo2(const Animal* animal)
{
  Bird* bird = dynamic_cast<Bird*>(animal);  // compiler error
  Bar(bird);
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();  
  Foo(&bird);
  
  return 0;
}

In c we have no classes. But we can use a c style cast to convert base-to-derived class types and vice versa. How is this possible? If you think about this question you will find another issue regarding c style casts. As seen within the section “dynamic vs. static cast” a derived-to-base cast is harmless. That’s also true if you use c style casts. But a base-to-derived cast may result in undefined behavior. If you use a c style cast in this case, it behaves like a static cast (combined with a const cast if needed) and, for unrelated or incomplete types, even like a reinterpret cast. In no case is there a run-time check. Unfortunately, this will often result in undefined behavior, as many developers expect a checked conversion and do not expect an unchecked cast in this situation. The following example shows such a situation.

class Animal
{
public:
  virtual void Print() const
  {
    std::cout << "animal" << std::endl;
  };
};

class Bird : public Animal
{
public:
  void Print() const
  {
    std::cout << "bird" << std::endl;
  }
};

class Fish : public Animal
{
public:
  void Print() const
  {
    std::cout << "fish" << std::endl;
  }
};

void Foo(const Animal* animal)
{
  animal->Print();

  const Bird* bird = (Bird*)(animal);
  if (bird)
  {
    bird->Print();
  }

  const Fish* fish = (Fish*)(animal);
  if (fish)
  {
    fish->Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  Fish fish = Fish();

  Foo(&bird);
  Foo(&fish);

  return 0;
}

If you implement the above example by using c++ style casts you will explicitly select one of the four cast operators. In this case the dynamic cast will be your choice. This explicit operator selection is a big advantage of c++ style casts over c style casts. It will make the code more robust, as the compiler is now able to detect programming errors, and it will make the code more readable as it now contains an explicit information what we want to do.
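A possible variant of the function with dynamic casts is sketched below. The function "Describe" and the methods "Name", "Fly" and "Swim" are hypothetical. The function returns a text instead of printing it, which keeps the sketch easy to check.

```cpp
#include <string>

class Animal
{
public:
  virtual ~Animal() = default;
  virtual std::string Name() const { return "animal"; }
};

class Bird : public Animal
{
public:
  std::string Name() const override { return "bird"; }
  std::string Fly() const { return "flies"; }
};

class Fish : public Animal
{
public:
  std::string Name() const override { return "fish"; }
  std::string Swim() const { return "swims"; }
};

// Returns a description instead of printing, to keep the sketch testable.
std::string Describe(const Animal* animal)
{
  // The target type must keep the const qualifier, otherwise this does not compile.
  if (const Bird* bird = dynamic_cast<const Bird*>(animal))
  {
    return bird->Name() + " " + bird->Fly();
  }
  if (const Fish* fish = dynamic_cast<const Fish*>(animal))
  {
    return fish->Name() + " " + fish->Swim();
  }
  return "unknown animal";
}
```

In contrast to the c style version, only the branch which matches the real type of the object is executed, as a failed dynamic cast returns a null pointer.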

If you use c style casts, you sign a contract with your compiler and promise: “I know what I am doing”. This may be fine for you and I’m sure you really know what you are doing. But unfortunately, in the same moment you add something more to this contract: “I also know what other programmers are doing and I know they know what I am doing.” This sounds like a strange promise which cannot be fulfilled. But as you are most often not the only developer ever touching this code, you will implicitly give this promise by using c style casts.

Summary

Using the four c++ style casting operators makes the code more readable and more maintainable. It makes the logic behind the code more explicit. And it makes the code less error-prone by having the compiler catch errors either as you’re making them or later as you go back and change old code.

But on the other hand, you should not often have to use cast operators as they indicate software design issues.

In summary, I want to define three simple guidelines:

First guideline: Do not use casts. Instead check whether the software design is suitable.

Second guideline: If you must cast, use the c++ style cast operators.

Third guideline: Avoid dynamic cast in performance sensitive code.


Casting in C++, Part 1

The rules of C++ are designed to guarantee that type errors are impossible. Unfortunately, casts subvert the type system. That can lead to all kinds of trouble. Casting is a fundamental concept but it should be used with caution. A good software design helps to reduce the number of needed casts but sometimes they are needed. Within this article I want to analyze the different types of casting, how they are used and what kinds of issues may occur on unconsidered use of casts.

 

Types of casts

In c++ applications, you can find some different kinds of casts. There are implicit casts, c style casts, c++ functional cast expressions and c++ style casts. The following source code shows a conversion from a float into an int value by using the different kinds of casts.

int _tmain(int argc, _TCHAR* argv[])
{
  float a = 17.4f;
  int b;
  
  b = a;                      // implicit conversion
  b = (int)a;                 // c style cast
  b = int(a);                 // c++ functional cast expression
  b = static_cast<int>(a);    // c++ style cast

  return 0;
}

 

At the end of the article we will see this example again and identify the cast type to prefer. Now, let’s start by thinking about the different types of casts and understand their behavior.

The implicit conversion “b = a” may or may not lose information, depending on the types of a and b. For example, an implicit conversion from float to int discards the fractional part. As this may not be intended by the developer, most compilers can warn about it, depending on the warning level.

C-style casts “(T)expression” are available to ensure backward compatibility with C. Unlike C++, C has only one operator to convert data types. Since C does not know any classes and therefore no methods, the type conversion is correspondingly limited to the basic data types and pointers. In C++, the C-style cast was extended to the C++ structures. Thus, a C-style cast can do almost everything that the four C++ cast operators can do.

The functional cast expression used in this example, “T(expression)”, consists of a simple type specifier followed by a single expression in parentheses. This cast expression is exactly equivalent to the corresponding C-style cast. The functional cast expression is also available in some other variants; for example, more than one expression can be used inside the parentheses. This cast looks like a constructor call, and for class types it often is one. Conversion is a form of initialization: when a type is convertible to another, a functional cast is a form of direct initialization. The compiler knows which types are convertible.
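To make the equivalence a bit more tangible, here is a minimal sketch. The class “Meters” and the function “SumOfConvertedValues” are hypothetical names which I introduce only for this illustration. All three notations end up calling the same constructor.

```cpp
// Hypothetical type, introduced only for this illustration.
class Meters
{
public:
  explicit Meters(double value) : m_value(value) {}
  double Value() const { return m_value; }

private:
  double m_value;
};

// Builds the same object with three equivalent cast notations
// and returns the sum of the stored values.
double SumOfConvertedValues()
{
  Meters a = Meters(2.5);               // functional cast expression
  Meters b = (Meters)2.5;               // C-style cast
  Meters c = static_cast<Meters>(2.5);  // C++-style cast

  return a.Value() + b.Value() + c.Value();
}
```

As the constructor is marked “explicit”, an implicit conversion like “Meters d = 2.5;” would not compile. One of the explicit notations is required.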

The C++-style cast, also called new-style cast, offers four cast forms: const_cast<T>(expression), dynamic_cast<T>(expression), reinterpret_cast<T>(expression) and static_cast<T>(expression). Each form permits only a limited set of conversions, so many incorrect uses lead to compiler errors. In addition, readers of the source code learn the intention of the cast because one of the four cast forms is named explicitly.
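The following sketch shows what such a compiler error can look like. The function name “ReadThroughCasts” is my own assumption for this example. The C-style cast silently removes the const qualifier, while the static cast refuses to do so and the intention must be spelled out with a const cast.

```cpp
// Hypothetical helper: reads an int through differently cast pointers.
int ReadThroughCasts(const int* source)
{
  // C-style cast: silently removes the const qualifier.
  int* viaCStyle = (int*)source;

  // C++-style cast: static_cast is not allowed to remove const.
  // int* viaStatic = static_cast<int*>(source);  // ERROR - Won't compile

  // Removing const must be requested explicitly with const_cast.
  int* viaConstCast = const_cast<int*>(source);

  // We only read here. Writing through these pointers would be
  // undefined behavior if the original object is const.
  return *viaCStyle + *viaConstCast;
}
```

Called with a pointer to the value 21, the function returns 42. The interesting part is the commented line: the compiler catches it, while the C-style variant passes without any hint.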

In C++ you should use the new-style casts. We will see the reasons for this guideline in the further course of the article. Therefore, the following paragraphs focus on the C++-style casts. First, we want to understand the four new cast forms. Afterwards, we will come back to the C-style cast and see some of the issues which may occur and how these are avoided by using the new-style casts.

 

static_cast

The static cast operator is the most commonly used one. It converts between data types by using an existing conversion rule. Such a rule may be, for example, a constructor or a user-defined conversion operator. The conversion is selected at compile time.

The elementary data types can largely be converted into one another. However, this also shows the greatest disadvantage of a type conversion: it often goes along with data loss. For example, if you convert a floating-point number to an integer, all decimal places are lost.
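A small sketch of this data loss, using two hypothetical helper functions which are not part of the original example. The first one drops the decimal places, the second one keeps only the lowest eight bits of the value.

```cpp
#include <cstdint>

// Hypothetical helper: the decimal places are cut off (truncation toward zero).
// Note: the value must fit into an int, otherwise the behavior is undefined.
int TruncateToInt(double value)
{
  return static_cast<int>(value);
}

// Hypothetical helper: conversion to an unsigned 8-bit type wraps modulo 256.
std::uint8_t NarrowToByte(int value)
{
  return static_cast<std::uint8_t>(value);
}
```

For example, TruncateToInt(17.9) returns 17, TruncateToInt(-17.9) returns -17 and NarrowToByte(300) returns 44. The static cast does not prevent the data loss. It only documents that the conversion is intended.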

The following example shows a typical use case for the static cast operator: a division of two integers where the result shall be a floating-point number.

int _tmain(int argc, _TCHAR* argv[])
{
  int x = 10;
  int y = 11;
  
  double z = x / y;
  std::cout << z << std::endl;

  z = static_cast<double>(x) / y;
  std::cout << z << std::endl;

  return 0;
}

 

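The first output is “0” because both operands are integers: the integer division discards the decimal places before the result is implicitly converted to double. The second output is “0.909091” because the static cast converts x to a floating-point number before the division, so a floating-point division is performed.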

 

The static cast can be used to force implicit conversions, for example from int to double or from non-const to const objects. So, you can use it to add constness to an object. Furthermore, the static cast can be used to perform conversions such as void* pointers to typed pointers and pointer-to-base to pointer-to-derived. The following example shows a type conversion from a derived child class to the base class.

class Animal 
{
public:
  virtual void Print() const
  {
    std::cout << "animal" << std::endl;
  };
};

class Bird : public Animal 
{
public:
  void Print() const
  {
    std::cout << "bird" << std::endl;
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  Animal animal = static_cast<Animal>(bird);
  
  bird.Print();
  animal.Print();

  return 0;
}
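One detail of this conversion is easy to overlook: a static cast to the value type “Animal” creates a new object, and the “Bird” part is cut off (this is often called slicing). The following sketch is a hypothetical variant of the classes above. I assume a function “Name” which returns the text instead of printing it, so that the difference between a value cast and a reference cast becomes visible.

```cpp
#include <string>

// Hypothetical variant of the example classes: Name() returns the text.
class Animal
{
public:
  virtual ~Animal() {}
  virtual std::string Name() const { return "animal"; }
};

class Bird : public Animal
{
public:
  std::string Name() const override { return "bird"; }
};

// Cast to a value: a new Animal object is created, the Bird part is lost.
std::string NameAfterValueCast(const Bird& bird)
{
  Animal copy = static_cast<Animal>(bird);
  return copy.Name();
}

// Cast to a reference: still the same Bird object, only seen as an Animal.
std::string NameAfterReferenceCast(const Bird& bird)
{
  const Animal& same = static_cast<const Animal&>(bird);
  return same.Name();
}
```

The value cast yields “animal”, the reference cast yields “bird”. So, if the virtual functions of the child class shall still be used, cast to a reference or a pointer and not to a value.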

 

Within the above example we converted an object of a child class to an object of a base class. As this conversion creates a new “Animal” object, the second call prints “animal”. What do you think will happen if we use pointers instead and additionally add the opposite use case and convert from base to child? The following example will show some interesting results.

class Animal
{
public:
  virtual void Print() const
  {
    std::cout << "animal" << std::endl;
  };
};

class Bird : public Animal
{
public:
  void Print() const
  {
    std::cout << "bird" << std::endl;
  }

  void Print2() const
  {
    std::cout << "bird2" << std::endl;
  }
};

class Car
{
public:
  void Print() const
  {
    std::cout << "car" << std::endl;
  }
};

int _tmain(int argc, _TCHAR* argv[])
{  
  Bird* pBird = new Bird();
  Animal* pAnimal = static_cast<Animal*>(pBird);  // this works
  
  pBird->Print();
  pAnimal->Print();
  //pAnimal->Print2();  // ERROR - Won't compile   

  pBird = static_cast<Bird*>(pAnimal); // this works
  pBird->Print();
  
  pAnimal = new Animal();
  pBird = static_cast<Bird*>(pAnimal); // compiles, but the object is not a Bird
  pBird->Print();   // undefined behavior
  pBird->Print2();  // undefined behavior
  
  //Car* pCar = static_cast<Car*>(pBird); // ERROR - Won't compile   
  
  return 0;
}

 

A typical output of the application looks like this (the last two lines result from undefined behavior and are not guaranteed):

bird
bird
bird
animal
bird2

 

The first conversion turns the pointer into a base class pointer. Of course, we cannot call the child class method “Print2” through it, but if we call “Print” we can see that the child class function is still used because “Print” is virtual. Next, we convert this pointer back to a child class pointer, which is fine as the underlying object is of type “Bird”.

In the last use case we create a base class object and convert the pointer to it into a pointer to the child class. This compiles, and in the output shown above the calls ended up in the base class function “Print” and the child class function “Print2”. But the object is not a “Bird”, so the behavior is undefined and you cannot rely on this output. This example shows a very important fact: you can use the static cast operator to cast a base class pointer to a child class pointer, but this may result in unexpected or even undefined behavior. Later, when we look at the dynamic cast, we will see another example which compares dynamic and static casts when converting from base to child class.

You may now think the compiler should show a warning or an error in this case, but the compiler allows downcasts to a derived class. It cannot report a compile-time error because it generally does not know which type of object a base class pointer will refer to at runtime. Therefore, such static casts always succeed at compile time but cause undefined behavior at runtime if the object is not of the target type. If we want to detect casting errors at runtime, we must use a dynamic cast.

 

dynamic_cast

The dynamic cast operator is the only cast operator whose check is performed at runtime. This allows you to write code which depends on the runtime type of an object. Pointers and references can refer to objects of their own type and to objects of a derived class type. Therefore, the actual type of the object may not be known at compile time. The dynamic cast operator allows a safe downcast to one of the derived types.

The dynamic cast is the only cast operator which cannot be expressed with the old C-style syntax. It is also the only cast that may have a significant runtime cost. Therefore, you should not use it in situations where a static cast is sufficient.

If you want to downcast types of an inheritance hierarchy by using the dynamic cast, you must implement polymorphic classes, which means they must have at least one virtual method. For example, you can write a virtual destructor.

The following source code shows an object hierarchy with polymorphic classes.
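The requirement can be checked with the standard type trait std::is_polymorphic. The following sketch uses hypothetical class names which I chose only for this illustration. The downcast compiles for the hierarchy with a virtual destructor and is rejected for the plain one.

```cpp
#include <type_traits>

// Hypothetical hierarchy without any virtual function.
struct PlainBase {};
struct PlainDerived : PlainBase {};

// Hypothetical hierarchy with a virtual destructor.
struct PolymorphicBase { virtual ~PolymorphicBase() {} };
struct PolymorphicDerived : PolymorphicBase {};

// Works: PolymorphicBase has at least one virtual function.
PolymorphicDerived* Downcast(PolymorphicBase* base)
{
  return dynamic_cast<PolymorphicDerived*>(base);
}

// PlainDerived* Downcast(PlainBase* base)
// {
//   return dynamic_cast<PlainDerived*>(base);  // ERROR - Won't compile
// }

static_assert(std::is_polymorphic<PolymorphicBase>::value, "has a virtual function");
static_assert(!std::is_polymorphic<PlainBase>::value, "has no virtual function");
```

If “Downcast” is called with a pointer to a “PolymorphicDerived” object, it returns that pointer. If it is called with a pointer to a plain “PolymorphicBase” object, it returns a null pointer.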

class Animal
{
public:
  virtual void Print() const
  {
    std::cout << "animal" << std::endl;
  };
};

class Bird : public Animal
{
public:
  void Print() const
  {
    std::cout << "bird" << std::endl;
  }
};

class Fish : public Animal
{
public:
  void Print() const
  {
    std::cout << "fish" << std::endl;
  }
};

 

We can use these classes to write an example application. Within this example we will write a function with an input parameter of type pointer to “Animal”. Of course, we can pass pointers to objects of the derived classes “Bird” and “Fish” to this function too. Within the function we can convert the animal pointer back to the derived class by executing a downcast using the dynamic cast operator. If the cast is successful, the result is a pointer to the same object, but now with the type given within the cast. If the object has another type than given in the cast operator, the result is a null pointer. Therefore, after executing a dynamic cast, you must check whether the resulting pointer is valid or a null pointer. The following source code shows a corresponding example.

void Foo(const Animal* animal)
{
  animal->Print();

  const Bird* bird = dynamic_cast<const Bird*>(animal);
  if (bird)
  {
    bird->Print();
  }

  const Fish* fish = dynamic_cast<const Fish*>(animal);
  if (fish)
  {
    fish->Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  Fish fish = Fish();

  Foo(&bird);
  Foo(&fish);

  return 0;
}
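The cast and the null check are often combined in the condition of the if statement, so the casted pointer only exists where it is valid. The following sketch shows this pattern. The functions “Fly”, “Swim” and “Describe” are hypothetical and not part of the example above.

```cpp
#include <string>

class Animal
{
public:
  virtual ~Animal() {}
};

class Bird : public Animal
{
public:
  std::string Fly() const { return "flying"; }    // hypothetical Bird-only function
};

class Fish : public Animal
{
public:
  std::string Swim() const { return "swimming"; } // hypothetical Fish-only function
};

std::string Describe(const Animal* animal)
{
  // The pointer 'bird' is only visible inside the if statement.
  if (const Bird* bird = dynamic_cast<const Bird*>(animal))
  {
    return bird->Fly();
  }

  if (const Fish* fish = dynamic_cast<const Fish*>(animal))
  {
    return fish->Swim();
  }

  // Neither a Bird nor a Fish. A null pointer input ends up here too,
  // because a dynamic cast of a null pointer yields a null pointer.
  return "unknown";
}
```

This way it is not possible to accidentally use the pointer after a failed cast.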

 

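Of course, it is also possible to use references instead of pointers. As there is no such thing as a null reference, the dynamic cast throws an exception of type std::bad_cast if the cast is not possible. If the exception is not caught, as in this short example, the program terminates at the first failing cast. The source code shows the adapted example.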

void Foo(const Animal& animal)
{
  animal.Print();

  const Bird& bird = dynamic_cast<const Bird&>(animal);  // throws std::bad_cast if called with 'fish'
  const Fish& fish = dynamic_cast<const Fish&>(animal);  // throws std::bad_cast if called with 'bird'
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  Fish fish = Fish();

  Foo(bird);
  Foo(fish);

  return 0;
}
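In real code the exception should be handled. The following sketch shows one possible way to do this. The functions “Name” and “DescribeAsBird” are hypothetical names for this illustration; std::bad_cast is declared in the header <typeinfo>.

```cpp
#include <string>
#include <typeinfo>  // std::bad_cast

class Animal
{
public:
  virtual ~Animal() {}
  virtual std::string Name() const { return "animal"; }
};

class Bird : public Animal
{
public:
  std::string Name() const override { return "bird"; }
};

class Fish : public Animal
{
public:
  std::string Name() const override { return "fish"; }
};

std::string DescribeAsBird(const Animal& animal)
{
  try
  {
    // Throws std::bad_cast if 'animal' does not refer to a Bird.
    const Bird& bird = dynamic_cast<const Bird&>(animal);
    return bird.Name();
  }
  catch (const std::bad_cast&)
  {
    return "not a bird";
  }
}
```

If a failed cast is an expected situation and not an error, the pointer variant with a null check is usually the better choice, as exceptions should be reserved for exceptional cases.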

 

As mentioned above, dynamic casts are only available for polymorphic classes and can have significant runtime costs. This is because the RTTI (run-time type information) features are needed to safely convert the types. In practice, you will often have polymorphic classes anyway because base classes should have a virtual destructor to allow objects of derived classes to perform proper cleanup if they are deleted through a base pointer. So, this is rarely a limitation. But due to the runtime costs you should not execute dynamic casts if they are not needed.

The above source code shows a typical situation where a dynamic cast is used. You will find such functions in a lot of example applications, tutorials, books and of course in professional applications. But I think the function shown in the example is also an example of bad software design. The interface of the function tells the client that it expects an “Animal” object, but the implementation expects a “Bird” or a “Fish”. This contradiction between interface design and functionality can lead to many issues and even unexpected behavior of your application. As this article is about casting and not about software design, I don’t want to go further into this topic. But you should keep in mind, and that’s true for all four cast operators, that the need for a cast operator may be an indication of bad software design. In source code with a clean design there is nearly no need for cast operators.
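The role of the virtual destructor can be shown with a small sketch. The constructor parameter “cleanedUp” and the function “DeleteThroughBasePointer” are assumptions for this illustration: the child class destructor sets a flag, so we can observe whether it runs when the object is deleted through a base pointer.

```cpp
class Animal
{
public:
  virtual ~Animal() {}  // virtual: the destructor of the derived class is called too
};

class Bird : public Animal
{
public:
  explicit Bird(bool& cleanedUp) : m_cleanedUp(cleanedUp) {}
  ~Bird() override { m_cleanedUp = true; }  // stands for the cleanup work of the child class

private:
  bool& m_cleanedUp;
};

// Returns true if the Bird destructor was executed.
bool DeleteThroughBasePointer()
{
  bool cleanedUp = false;

  Animal* animal = new Bird(cleanedUp);
  delete animal;  // runs ~Bird() first, then ~Animal()

  return cleanedUp;
}
```

With the virtual destructor the function returns true. Without the keyword “virtual” in the base class, deleting a “Bird” through an “Animal” pointer would be undefined behavior.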

Within the previous chapter, about the static cast, I mentioned that the static cast should not be used for a downcast. Therefore, to finish this topic we can test the behavior of the example application in case we use the static cast instead of the dynamic cast. The following source code shows a possible modification of the example application.

void Foo(const Animal* animal)
{
  const Bird* bird = static_cast<const Bird*>(animal);
  if (bird)
  {
    bird->Print();
  }

  const Fish* fish = static_cast<const Fish*>(animal);
  if (fish)
  {
    fish->Print();
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  Bird bird = Bird();
  Fish fish = Fish();

  Foo(&bird);
  Foo(&fish);

  return 0;
}

 

This looks like a valid implementation and it compiles without errors. The compiler cannot report an error because it does not know which type of object the pointer will refer to at runtime. A static cast always succeeds, so the null checks never detect a wrong type (they only catch a null pointer input), and the behavior is undefined if the object is not of the target type. If you execute the example application, you will get outputs you may not expect: typically both branches are executed in each call, so “bird” is printed twice for the bird and “fish” is printed twice for the fish.
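The difference between the two casts can be summarized in a short sketch. The function names “CheckedDowncast” and “UncheckedDowncast” are hypothetical. The unchecked variant is only called with objects which really are birds, because anything else would be undefined behavior.

```cpp
class Animal
{
public:
  virtual ~Animal() {}
};

class Bird : public Animal {};
class Fish : public Animal {};

// Checked at runtime: returns a null pointer if 'animal' is not a Bird.
const Bird* CheckedDowncast(const Animal* animal)
{
  return dynamic_cast<const Bird*>(animal);
}

// Not checked: the caller must guarantee that 'animal' points to a Bird
// (or is a null pointer). For any other object the behavior is undefined.
const Bird* UncheckedDowncast(const Animal* animal)
{
  return static_cast<const Bird*>(animal);
}
```

For a bird both functions return the pointer to the object. For a fish the checked variant returns a null pointer, while the unchecked variant must not be called at all. A null check after a static cast therefore only detects a null pointer input and never a wrong type.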

 

Preview on part 2

As we have now seen two of the four cast operators, I want to finish this first part of the topic. In the next blog article, I want to continue this topic and explain the const cast operator and the reinterpret cast operator. Furthermore, I want to include a comparison of dynamic cast and static cast based on an example with base-to-derived and derived-to-base class conversion. Moreover, part 2 will contain some more examples showing the disadvantages of the C-style cast.
