std::atomic

In multithreaded applications, even trivial operations such as reading and writing a value can cause problems. The std::atomic template is designed for these situations. In this article I want to give a short introduction to the topic.

 

Atomic operations

Let’s start with an easy example. In a loop we increase a value with the “++” operator. The loop is executed by several threads.

#include <iostream>
#include <thread>

int value;

void increase()
{
	for (int i = 0; i < 100000; i++)
	{
		value++;
	}
}

int main()
{
	std::thread thread1(increase);
	std::thread thread2(increase);
	std::thread thread3(increase);
	std::thread thread4(increase);
	std::thread thread5(increase);

	thread1.join();
	thread2.join();
	thread3.join();
	thread4.join();
	thread5.join();

	std::cout << value << '\n';

	return 0;
}

 

We expect the output “500000”, but unfortunately the actual output is usually lower and differs from run to run.

That’s because the “++” operation is not atomic. To keep it simple, we can say the operation consists of three steps: read the value, increase it, and write it back. If these steps run in parallel, several threads may read the same value and each increase it by one, so increments get lost and the result is lower than expected. Formally, such unsynchronized concurrent access to a non-atomic variable is a data race, and the C++ standard leaves the behavior undefined.

The same issue may occur with plain reads and writes. For example, thread 1 sets a value and thread 2 reads it. If the value type does not fit into a processor word, even such trivial read and write operations are not atomic. Accessing a 64-bit variable on a 32-bit system may yield a partially written value.

In such cases we need a synchronization mechanism that prevents the parallel execution. Of course, we could use a standard locking mechanism such as std::mutex. Or we could use the std::atomic template. This template defines an atomic type and guarantees the atomicity of the operations on this type. To adapt the previous example, we just have to change the data type.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> value{0};

void increase()
{
	for (int i = 0; i < 100000; i++)
	{
		value++;
	}
}

int main()
{
	std::thread thread1(increase);
	std::thread thread2(increase);
	std::thread thread3(increase);
	std::thread thread4(increase);
	std::thread thread5(increase);

	thread1.join();
	thread2.join();
	thread3.join();
	thread4.join();
	thread5.join();

	std::cout << value << '\n';

	return 0;
}

 

Now the result is as expected. The output of the application is “500000”.

 

std::atomic

As mentioned before, the std::atomic template defines an atomic type. If one thread writes to an atomic object while another thread reads from it, the behavior is well defined. Each operation is guaranteed to be atomic. Furthermore, reads and writes to several different objects can have a sequential consistency guarantee. This guarantee depends on the selected memory order. We will see corresponding examples later.

 

Read and write values

A common scenario in multithreaded applications is that different threads read and write the same variables in parallel. Let’s create a simple example with two threads: one writes the variable values and the other one reads them.

#include <iostream>
#include <thread>

int x;
int y;

void write()
{
	x = 10;
	y = 20;
}

void read()
{
	std::cout << y << '\n';
	std::cout << x << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

With this kind of implementation, the behavior is undefined. Depending on the timing of the threads, one might expect the output “20/10”, “0/0”, “0/10” or “20/0”. But besides these expected results, a read may happen during a write, so that an incompletely written value is used. Within this example that should not happen, but as described before it depends on the processor and the value type. More importantly, the unsynchronized access is a data race, so the C++ standard gives no guarantee about the result at all. Therefore, we can say that the behavior of the application is undefined.

By using the std::atomic template we can turn the undefined behavior into defined behavior. We just have to declare the variables as atomic and use the corresponding load and store functions.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x;
std::atomic<int> y;

void write()
{
	x.store(10);
	y.store(20);
}

void read()
{
	std::cout << y.load() << '\n';
	std::cout << x.load() << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

Now the behavior of the application is well defined. Independent of the processor and of the data type (we could change from “int” to any other trivially copyable type), we have a defined set of results. Depending on the timing of the threads, the output may be “20/10”, “0/0” or “0/10”. The output “20/0” is not possible because the default mode for atomic loads and stores enforces sequential consistency. Within this example that means x is always written before y is changed, so a reader that already sees the new y also sees the new x. Therefore, the output “0/10” is possible but “20/0” is not. As the std::atomic template ensures atomic execution of the load and store functions, we don’t have to fear undefined behavior due to incomplete data updates. So, we have defined behavior with three possible results.

 

Memory model

As mentioned before, the selected memory order changes the behavior of atomic read and write functions. By default, atomic operations are sequentially consistent. This may be expensive because the compiler may have to emit memory barriers around the accesses; how many are needed depends on the target architecture. If your application or algorithm does not need sequential consistency, you can select a more relaxed memory order.

For example, if it is fine to get “20/0” as a result in our application, we can set the memory order to “memory_order_relaxed”. This removes the synchronization and ordering constraints, but the atomicity of each operation is still guaranteed.

void write()
{
	x.store(10, std::memory_order_relaxed);
	y.store(20, std::memory_order_relaxed);
}

void read()
{
	std::cout << y.load(std::memory_order_relaxed) << '\n';
	std::cout << x.load(std::memory_order_relaxed) << '\n';
}

 

Another interesting memory order is the Release-Acquire ordering. You can set the store functions in the first thread to “memory_order_release” and the load functions in the second thread to “memory_order_acquire”. In this case, once a load in thread 2 observes the value written by a store in thread 1, all memory writes (non-atomic and relaxed atomic) that thread 1 performed before that store are visible to thread 2. This takes us back to ordered loads and stores, so “20/0” is no longer a possible output. But it typically does so with less overhead than sequential consistency. Within this trivial example, the result is the same as with full sequential consistency. In a more complex example with several threads reading and writing all or some of the variables, the result may differ from the default sequential consistency.

void write()
{
	x.store(10, std::memory_order_release);
	y.store(20, std::memory_order_release);
}

void read()
{
	std::cout << y.load(std::memory_order_acquire) << '\n';
	std::cout << x.load(std::memory_order_acquire) << '\n';
}

 

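As mentioned before, the Release-Acquire ordering also covers non-atomic writes that precede the atomic store. So we could change the application and use a non-atomic type for x. The atomic store of y still ensures the correct write order. There is one condition: the reader may touch x only after its acquire load has actually observed the value 20. If it read x while y is still 0, nothing would order that read against the write in thread 1, and the program would contain a data race.

#include <atomic>
#include <iostream>
#include <thread>

int x;
std::atomic<int> y;

void write()
{
	x = 10;
	y.store(20, std::memory_order_release);
}

void read()
{
	int observedY = y.load(std::memory_order_acquire);
	std::cout << observedY << '\n';

	// x may only be read once the release store of y has been observed.
	if (observedY == 20)
	{
		std::cout << x << '\n';
	}
}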

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

But again, in a more complex scenario with several threads, or with a thread that reads x without checking y first, it is necessary to define x as atomic as well.

 

Synchronization of threads

A common implementation technique for thread synchronization is locking. With the atomic template, you can implement such synchronization without locks. Depending on the selected memory order, this may increase the performance of your application.

The following application shows how to use the atomic template to synchronize the execution of two threads. Thread 2 waits until thread 1 has finished its work. This is done with a flag variable that contains the execution state, which thread 2 polls in a loop. Furthermore, thread 1 passes its result in a data variable. The Release-Acquire ordering ensures that the data is written before the synchronization flag is set, so thread 2 reads the complete result as soon as it sees the flag.

#include <atomic>
#include <iostream>
#include <thread>

int data;
std::atomic<bool> ready{false};

void write()
{
	data = 10;
	ready.store(true, std::memory_order_release);
}

void read()
{
	// Wait until thread 1 has published its result.
	while (!ready.load(std::memory_order_acquire))
	{
		std::this_thread::yield();
	}

	std::cout << data << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

Summary

This article gave a short introduction to the std::atomic template based on some common use cases. The examples name some major issues regarding data access in multithreaded applications and introduce corresponding implementations based on the atomic template. Of course, this article is an introduction only. The atomic template offers more features; for example, there are more memory orders besides the three orderings shown in this article.

Posted in C++

Deconstruction in C# 7

C# 7 introduces a convenient syntax to deconstruct an object and access its members. To support deconstruction for a class, you just have to write a “Deconstruct” method with one or more out parameters, which assign the class properties to variables. Deconstruction is also supported by the new tuple type (System.ValueTuple) introduced with C# 7. The following example shows deconstruction of a tuple and of a custom class.

static void Main(string[] args)
{
  (int x, int y) = GetTuple();

  (double width, double height) = GetRectangle();

  Rectangle rect = new Rectangle(1, 2);
  (double a, double b) = rect;
}

private static (int x, int y) GetTuple()
{
  return (1, 2);
}

private static Rectangle GetRectangle()
{
  return new Rectangle(5, 8);
}

class Rectangle
{
  public Rectangle(double width, double height)
  {
    Width = width;
    Height = height;
  }

  public double Width { get; }
  public double Height { get; }

  public void Deconstruct(out double width, out double height)
  {
    width = Width;
    height = Height;
  }
}

 

The class properties are assigned to the variables by position. So, the variable names do not have to match the class property names or tuple element names. Furthermore, it is possible to write several “Deconstruct” methods with different numbers of parameters.

At first glance this looks like a nice feature which improves the readability of the source code. But it comes with a big issue. As deconstruction is done by element position only, it is a source of errors. If you switch elements by mistake, you create a bug which the compiler cannot detect if the types are equal. This may cost you a lot of debugging time because such a mistake isn’t easy to find. The following code shows this disadvantage. Both statements compile, but the second one leads to wrong results at runtime because the parameters are mixed up.

static void Main(string[] args)
{
  {
    // Correct: matches Deconstruct(out width, out height).
    (double width, double height) = GetRectangle();
  }
  {
    // Swapped by mistake: compiles, but "height" now holds the width.
    (double height, double width) = GetRectangle();
  }
}

 

Furthermore, you can get into trouble when someone changes a class you are using. For example, several class properties are changed during a refactoring, so their names and meanings change. If you access the class properties directly in your code, the compiler reports errors because the property names have changed. But if you use deconstruction and the number and types of the parameters haven’t changed, you will get into trouble. Names and meanings of the class elements have changed, but deconstruction only considers the number and types of the parameters. So, neither the compiler nor you will suspect any issue when you use the refactored class. At runtime, however, you can expect some unpleasant surprises followed by hours of debugging and bug fixing.

Let’s look at the rectangle example. The developer of this class has refactored it, and now the rectangle is represented as a vector with an angle and a length. Unfortunately, the signature of the Deconstruct method hasn’t changed, so you do not get any compilation errors.

static void Main(string[] args)
{
  (double width, double height) = GetRectangle();
}

private static Rectangle GetRectangle()
{
  return new Rectangle(5, 8);
}

class Rectangle
{
  public Rectangle(double vectorAngle, double vectorLength)
  {
    VectorAngle = vectorAngle;
    VectorLength = vectorLength;
  }

  public double VectorAngle { get; }
  public double VectorLength { get; }

  public void Deconstruct(out double vectorAngle, out double vectorLength)
  {
    vectorAngle = VectorAngle;
    vectorLength = VectorLength;
  }
}

 

In my opinion the deconstruction feature comes with more disadvantages than advantages. The advantage of slightly shorter source code is far too small compared to the disadvantage of possible hard-to-find implementation errors.

Posted in C#

Local or nested functions in C# 7

With C# 7.0 it is possible to create functions nested within other functions. This feature is called “local functions”. The following source code shows a corresponding example.

static void Main(string[] args)
{
  int x = DoSomething();

  //int y = Calc(1, 2, 3);  // compiler error because Calc does not exist within this context
}

private static int DoSomething()
{
  int shared = 15;

  int x = Calc(1, 2, 3);

  return x;

  int Calc(int a, int b, int c)
  {
    return a * b + c + shared;
  }
}

 

The function “Calc” is nested within the function “DoSomething”. As a result, “Calc” is available in the function scope only and cannot be used outside of “DoSomething”. Inside the enclosing function, you can place the nested function wherever you want. Furthermore, the nested function has access to the variables of the enclosing function.

In my opinion local functions are a nice feature. A common software design concept is to hide internal implementation details of software components by limiting the scope of internal components. If you have functions which are needed inside a class only, you can limit the scope to this class by defining a private function. But so far this was the lowest possible scope. If you had sub-functions which were needed by a single function only, you had to publish these functions at class scope too, or use other techniques like lambda functions. With C# 7 you are now able to limit the scope of such nested functions to the scope of the enclosing function.

I like local functions because you are now able to define a proper scope. But there are two features which I don’t like because they may result in source code which is difficult to understand.

One of these two features is the possibility to place the nested functions wherever you want. In my opinion you should always place them at the beginning or at the end. I prefer the end, after the return statement. This increases the readability of the source code because at first you have the code of the main function itself and then the code of the internal sub-functions. So, you create a clear separation between these different concerns and you have a reading order from the enclosing main function to the nested local functions.

The second feature which I don’t like is the possibility to access variables of the enclosing function. From a technical point of view this feature is fine, because the nested function is part of the enclosing function’s scope and therefore can access elements of this scope. But from a software design point of view it is a questionable feature because you can create a lot of dependencies between the enclosing function and several local functions. This can result in difficult code with a spaghetti-like data flow. An alternative is to pass variables as parameters to the local functions and create a clear separation between the different functions and their concerns. In my opinion you should therefore use this feature very carefully and only if it really comes with some advantages.

A typical use case for local functions is recursion. Often you have a public main function which performs some calculation and uses recursive function calls to do so. So, you create an internal function for these recursive calls, and this internal function often has parameters for interim results or recursion status information. Therefore, such a function should never be part of the public interface. On the contrary, it can only be used meaningfully within the scope of the main function. Now you have the possibility to implement it accordingly as a local function. The following example shows such a use case: we determine the maximum depth of a tree structure.

static void Main(string[] args)
{
  TreeElement tree = new TreeElement();
  TreeElement sub1a = new TreeElement();
  TreeElement sub1b = new TreeElement();
  TreeElement sub2 = new TreeElement();

  sub1b.SubElements = new List<TreeElement>();
  sub1b.SubElements.Add(sub2);

  tree.SubElements = new List<TreeElement>();
  tree.SubElements.Add(sub1a);
  tree.SubElements.Add(sub1b);

  int depth = GetDepth(tree);
  Console.WriteLine("Depth: " + depth);
}

public class TreeElement
{
  public List<TreeElement> SubElements;
}

private static int GetDepth(TreeElement tree)
{
  return GetDepth(tree, 1);

  int GetDepth(TreeElement subTree, int actualDepth)
  {
    int maxDepth = actualDepth;

    if (subTree.SubElements == null)
    {
      return actualDepth;
    }

    foreach (TreeElement element in subTree.SubElements)
    {
      int elementDepth = GetDepth(element, actualDepth + 1);
      if (maxDepth < elementDepth)
      {
        maxDepth = elementDepth;
      }
    }

    return maxDepth;
  }
}

 

The depth is calculated by recursive function calls which step into the sub-elements. If a tree leaf is reached, the recursive function returns its current depth. Therefore, the current depth of the sub-tree is passed as a parameter to the recursive function. Of course, you could use the recursive function in the public interface too, but in this case the user of the interface would also see the depth parameter, which is needed internally only. You could set the default value of this parameter to “1” and explain it in the documentation, but nevertheless your interface gets dirty as it contains parameters which exist due to internal needs only. By using a local function, you can offer a clean interface and hide the internal implementation details. Furthermore, this internal helper function is visible within the containing function scope only and will therefore be visible neither in the public interface nor in the private interface of the surrounding class.

Posted in C#

The Visitor Pattern – part 9: summary

This article is part of a series about the Visitor pattern. Please read the previous articles to get an overview of the pattern and the basics needed to understand this article.

As this article is the last one of the series, I want to give a short retrospective and summarize the advantages and disadvantages of the Visitor pattern.

 

Retrospective

The article series started with a short introduction to the Visitor pattern, showed its strengths, and explained the reasons for the existing confusion and misunderstandings regarding this pattern. To clear up these misunderstandings, we started by investigating the technical needs behind this pattern and explored the difference between single dispatch and double dispatch.

Based on the technical needs we could derive the aims of the pattern. So, the article series continued with explanations and examples for the two main needs: enumeration and dispatching. A further article combined these two aspects to create a complete Visitor pattern example.

Based on this Visitor pattern example we investigated some additional questions often seen in practical applications of the pattern. We created an extended enumerator Visitor with access to container elements which are not part of the visible objects, and we analyzed the pros and cons of a monolithic single Visitor versus small specialized Visitors.

Next came articles about code quality. We learned how to create reusable enumerator and query visitors following the separation of concerns rule. And we saw some examples where using the Visitor pattern leads to over-engineering.
 

Based on these articles I want to summarize the advantages and disadvantages of the pattern. Additionally, as the pattern is a good candidate for over-engineering, I want to name some use cases for alternative patterns.

Within the articles of this series we have seen several different implementations of the pattern. They all follow the same main principles but differ in their implementation according to the practical needs of the respective use cases. Therefore, I don’t want to finish the article with a default implementation of the pattern, as there is no single default implementation. Instead, I want to summarize the implementation aspect by listing the involved components and the concepts to connect these components, and I want to give some recommendations regarding the naming of the Visitor interfaces and methods.

 

Advantages

In object-oriented implementations, data classes often inherit from base classes or interfaces. Data containers which hold collections of these data classes often store pointers to the base classes. This object-oriented concept works fine as long as no type specific methods are needed. If the client wants to call type specific methods, he must cast the base class pointer to the specific type. This may result in code which is hard to understand, hard to maintain and error-prone.

The Visitor pattern resolves this issue. It allows type-safe access to the original data type without the need for casting. Furthermore, the Visitor pattern offers a traversing mechanism. As a result, the pattern allows easy access to data elements, independent of whether they are stored in a simple list or a complex container, and independent of whether they are stored as their original type or as pointers to a base type.

 

Disadvantages

It isn’t easy to find the right balance between many small specialized Visitor interfaces and a few wide or even a single monolithic Visitor interface. Both approaches come with drawbacks. Many small interfaces increase the complexity of the system, and few wide interfaces lead to implementations containing empty method implementations.

The Visitor pattern traverses the elements of a data container. This traversing is done via double dispatch. As a result of this dispatching, the traversing is a kind of hidden feature. A client implements the dispatching callbacks, but he may easily overlook that the callback is one step of a traversal over a data container. On the one hand that’s fine because the client should not have to think about the data structures, but on the other hand this may cause issues. Like in any other traversing mechanism, the client should not change the data container during traversing. So, the dispatching callback methods of the client should not change the data container itself. Furthermore, a traversing step should normally not execute long-running complex functionality. In summary, the traversing is a kind of hidden feature for a client developer, but it is an important software design fact which has to be respected by the client developer. This contradiction may result in bad software designs, especially if the developer isn’t that familiar with the Visitor pattern.

 

Alternatives

The Visitor pattern is a powerful but complex pattern. It unites two functionalities: type conversion by dispatching and traversing of complex data structures. But if only one of these functionalities is needed, you will find more specialized patterns which are less complex. If you only want to traverse a data structure, you should implement an Iterator pattern. And if you only want to do the dispatching, you may use callback mechanisms.

 

Components and Dependencies

The Visitor pattern contains the following components.

  • Data Elements
  • Data Container
  • Enumerator
  • Algorithm

The Data Elements are the Visitable components, and the Enumerator and the Algorithm together form the Visitor component, which executes some functionality based on the data elements. As explained and shown within the article series, I prefer a clear separation of concerns. Therefore, I recommend implementing the Algorithm aspect and the Enumeration aspect in different and independent components. Different Algorithms and different Enumerators can be loosely coupled by composition according to the use case specific needs. Strong coupling by inheritance should only be used if there is no need to reuse any of the components.

 

Naming

At the very beginning of the article series I mentioned that, in my opinion, the Visitor pattern is one of the most misused and misunderstood patterns, and that the main source of this issue is the misleading naming of the pattern interfaces and methods.

Names of methods, interfaces and components should reflect their purpose. Therefore, I recommend avoiding the naming used in the original Visitor pattern as this naming is meaningless. In the following I will give you some names I like to use. But of course, feel free to find your own naming.

As the name Visitor is well known and most software developers know the pattern, you should use this name anyway, even if it is generic and therefore somewhat meaningless. You should use the name but extend it with a more descriptive part. As mentioned before, we have a separation of concerns and we may create specialized visitors. Therefore, we may create visitors for the following purposes: enumeration, queries, data updates and so on. So, you should not implement an “IVisitor” interface or a “Visitor” component, but specialized ones like a “CustomerEnumerationVisitor” or an “OrderQueryVisitor”. Some may argue that implementation details should not be part of the naming. That’s totally fine, but as the Visitor contains the hidden enumeration feature, it may be very useful for a client developer to know that he uses a component which is implemented as a Visitor. Therefore, in this case it is fine to add the “Visitor” suffix to the component name.

The data elements will be passed to the Visitors. Therefore, I would call them “Visitable”. This results in corresponding interface names like “IVisitableCustomer” or “IVisitableOrder”.

But what about the method names? The original method names are “accept” and “visit”. To be honest, I think these names are terrible. They don’t say anything about their purpose. So, let’s think about the purpose of the methods and find some better names. The Visitor pattern is implemented by double dispatch. For this purpose, the data elements offer a method to get the instance of the data element. This getter method passes the element instance to an instance receiver method which is provided by the Visitor. The method names should reflect this dispatching process. So, let’s call the method to get the element instance “GetInstance” and the method which receives this instance “InstanceReceiver”.

By using such clear naming, you can avoid some of the misunderstandings and errors which result from the complexity of the Visitor pattern.

 

Conclusion

The Visitor pattern is one of the basic implementation patterns in software development. So, it should be part of the tool kit of a good software engineer. But unfortunately, the Visitor pattern comes with a big disadvantage: it is one of the most misunderstood and misused patterns I know. This article series provided an extensive overview of the Visitor pattern and gave you the knowledge needed to use the Visitor pattern within your applications.

Posted in C++

The Visitor Pattern – part 8: over engineering

This article is part of a series about the Visitor pattern. Please read the previous articles to get an overview of the pattern and the basics needed to understand this article.

 

Motivation

Normally, I would like to limit myself to explaining the use cases of a pattern and not waste time writing about the cases where you should not use it. But if you read about the Visitor pattern, you will find a lot of statements about its purpose and advantages which are not directly related to the pattern. This may result in implementations using the Visitor pattern instead of some other pattern which is more suitable for the specific situation. This kind of over-engineering increases the complexity of the code and therefore creates more effort and is more error-prone. Within this article I want to mention some of the statements you will find about the Visitor pattern, analyze them and, where possible, find better alternatives.

 

Extensibility of the elements

A lot of articles about the Visitor pattern contain the following statement: “The Visitor pattern is used to extend the functionality of data elements without changing these data elements”.

Of course, this statement isn’t wrong. But this is only a side effect of the pattern. If you want to get this feature, you don’t have to use the Visitor pattern at all. There are better alternatives.

By using the Visitor pattern, you pass the element instance to a dispatcher which uses the data to execute some query. If you want to add some new functionality, you can create a new query and you don’t have to extend the data element. But think about an implementation without the Visitor pattern. Would you implement the data query within the data class in this case? Of course not. According to the “separation of concerns” principle you create a separate query class. This class uses the data element or data structure to execute its functionality. Therefore, if you follow basic software design concepts like “separation of concerns”, you can extend the functionality without changing the data elements. Using the Visitor pattern is over-engineering in this case.

 

Extensibility of the queries

In connection with the previous statement it is also mentioned that the Visitor allows you to add new queries without changing the existing implementation. Again, this statement is true but only a side effect. The separation between data elements, data structures, enumerators and queries is a basic concept of the object-oriented paradigm and therefore a basic concept in object-oriented programming languages. No additional or special implementation pattern is needed. Again, the Visitor pattern is over-engineering if the separation is the only reason for using it.

 

Use case specific Visitable methods in data elements

So far, we have implemented several examples with different visitors and visitable elements. But in all examples, we have used the same base interface: the functions “GetInstance” and “InstanceReceiver” do not have a return value and the only function parameter is the object instance.

Now we want to think about the question whether it is advantageous to break this strict concept and offer use case specific functions. For example, a Visitor can be used to validate all data elements. So, we could extend the “InstanceReceiver” function and add a new parameter which contains validation information. Or we can even define a return value for this method and the element changes its internal state according the return value, for example set a validation flag.

At first, these use case specific interfaces may sound fine as they fulfill the specific needs. But I would not recommend such a specialization of the interface as it comes with big disadvantages. The main purpose of the Visitor pattern is the dispatching of elements. This is a very common functionality which can be used in a lot of situations. If we now mix in a use case specific interface, we limit the visitor to exactly this single case and therefore we reduce maintainability and extensibility of the implementation. Furthermore, the strength of the Visitor interface is based on the strict separation of concerns. The involved object instances are only loosely coupled. If we extend the interface and add use case specific parameters and return values, we create a strong relationship between the elements and again reduce maintainability and extensibility. Furthermore, we add some “hidden” functionality. Within the example above we added a return value to set some data within the data element after the Visitor executes something. Such object changes should be done by using setter functions or properties but not by evaluating the return value of a function whose purpose is to pass on the object instance. Therefore, I call it “hidden” functionality or a “side effect” of the method, as something is done which does not correspond with the main purpose of the method. Such side effects reduce maintainability and are a common source of errors.
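A validation can also be implemented with the unchanged generic interface. The following sketch shows one possible way and is based on my own assumptions: the names “SetValid”, “IsValid” and “ValidationVisitor” as well as the validation rule itself are hypothetical.

```cpp
#include <string>

class Book;

class IVisitor
{
public:
  virtual ~IVisitor() = default;
  virtual void InstanceReceiver(Book* book) = 0;
};

class IVisitable
{
public:
  virtual ~IVisitable() = default;
  virtual void GetInstance(IVisitor* visitor) = 0;
};

class Book : public IVisitable
{
public:
  explicit Book(std::string title) : mTitle(title) {}

  // generic dispatch only: no extra parameter and no return value
  void GetInstance(IVisitor* visitor) override
  {
    visitor->InstanceReceiver(this);
  }

  // explicit setter and getter (hypothetical names) for the validation flag
  void SetValid(bool valid) { mValid = valid; }
  bool IsValid() const { return mValid; }

  std::string mTitle;

private:
  bool mValid = false;
};

// Hypothetical validation visitor: the rule and the state change are both
// visible here instead of being hidden behind a return value.
class ValidationVisitor : public IVisitor
{
public:
  void InstanceReceiver(Book* book) override
  {
    book->SetValid(!book->mTitle.empty());
  }
};
```

The interface stays usable for every other visitor, and the change of the element state is an explicit call of a setter.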

 

High effort in case a new data element is added

One disadvantage often mentioned about the Visitor pattern is the high effort resulting from a change of the underlying data elements. If you add a new data element, you must add it to the Visitor interface(s) and of course adapt all visitors which implement the interface. But is this a real disadvantage of the pattern? I think it isn’t. Maybe it is even an advantage.

If you have a list of elements with type specific element interfaces and you want to evaluate or change the elements, you always have to traverse the list and implement type specific functionality. Independent of the used pattern or way of implementation, you must adapt or extend this implementation in case a new element type is added or an existing one is removed. So, this is a use case dependent need and not a pattern specific disadvantage.

If you use the Visitor pattern you have the advantage of a central Visitor interface. You can add the new element type to this interface, by adding a new “InstanceReceiver” function, and the compiler will show you all source code elements – all visitors – which must be adapted. If you use other implementations, like switch cases in combination with type casts, you may have to find all code elements by yourself, which is a very error prone process.
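To illustrate the error-prone alternative, here is a minimal sketch with type casts. All type names and the “Describe” function are hypothetical and only used for this illustration.

```cpp
#include <string>

// Hypothetical element types with a common base class.
struct Element
{
  virtual ~Element() = default;
};

struct Book : Element {};
struct Magazine : Element {};
struct Newspaper : Element {};  // element type which was added later

// Type specific handling by type casts. Newspaper was never added here,
// and the compiler does not complain about the missing branch.
std::string Describe(const Element& element)
{
  if (dynamic_cast<const Book*>(&element) != nullptr)
  {
    return "book";
  }
  if (dynamic_cast<const Magazine*>(&element) != nullptr)
  {
    return "magazine";
  }
  return "unknown";
}
```

The code compiles without any warning although the new element type is not handled. You have to find functions like this one by yourself.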

In conclusion, this often-mentioned disadvantage of the Visitor pattern isn’t valid because it is a use case specific need and not a pattern specific result. On the contrary, you can say the Visitor pattern has the advantage of supporting this use case in a very easy way. You just have to change the corresponding Visitor interface and the compiler will do the critical work and find all code elements you have to change.
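The compiler support can be sketched as follows. This is a reduced example with hypothetical element types “Book” and “Magazine” and a hypothetical “DescribeVisitor”. The “IVisitable” interface is left out to keep it short.

```cpp
#include <string>

class Book;
class Magazine;

class IVisitor
{
public:
  virtual ~IVisitor() = default;
  virtual void InstanceReceiver(Book* book) = 0;
  virtual void InstanceReceiver(Magazine* magazine) = 0;
  // Adding a further pure virtual overload here, for example for a new
  // Newspaper type, makes every concrete visitor abstract until it
  // implements the overload. Each place where such a visitor is
  // instantiated then fails to compile.
};

class Book
{
public:
  void GetInstance(IVisitor* visitor) { visitor->InstanceReceiver(this); }
};

class Magazine
{
public:
  void GetInstance(IVisitor* visitor) { visitor->InstanceReceiver(this); }
};

// Concrete visitor which must implement one overload per element type.
class DescribeVisitor : public IVisitor
{
public:
  void InstanceReceiver(Book*) override { mLast = "book"; }
  void InstanceReceiver(Magazine*) override { mLast = "magazine"; }

  std::string mLast;  // description of the last visited element
};
```

With this design the list of supported element types exists in one place only, the Visitor interface, and a missing implementation is reported at compile time instead of being silently ignored at runtime.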

 

Outlook

The next article will finish the series with a summary of the whole topic.

Posted in C++

The Visitor Pattern – part 7: reusable enumerator and query visitors

This article is one part of a series about the Visitor pattern. Please read the previous articles to get an overview about the pattern and to get the needed basics to understand this article.

The example implementation of this article is focused on the Visitor pattern. To keep the example as short as possible I intentionally ignored some mandatory programming guidelines. Therefore, please keep in mind that the implementation should show the idea of the Visitor pattern but it ignores things like const correctness, private members or RAII (resource acquisition is initialization).

 

Reusable visitors

Within the examples so far, we created enumerator and query visitors. This separation of concerns leads to clean, easy to understand and maintainable code. The query visitors inherit from the enumerator visitors and can use their traversing features. This allows us to implement different queries based on one enumerator. But so far it is not possible to use one query based on different enumerators. As C++ supports multiple inheritance it would be possible to inherit from several enumerators. But this would result in a complex and difficult interface of the query visitor, because depending on the data structure we would have to use an individual subset of the interface only.

It would be better to have independent components and create a loose coupling depending on the actual use case. Therefore, we should remove the inheritance which creates a very strong coupling and use composition instead.

If we use composition, we will implement independent enumerators and queries. The enumerators will be implemented as concrete classes and no longer as abstract base classes. Furthermore, the enumerators will support an enumeration interface. The queries will no longer be coupled with a concrete data structure. Instead they use the enumeration interface to access the elements, independent of the concrete structure.

Let’s implement a corresponding example. We will use the example application of the previous articles with the order history data structure. Additionally, we create a new data structure which represents the article stock. To keep it simple we will use one article type only. Please keep in mind that the Visitor pattern isn’t the best choice in this kind of use case, as the dispatching aspect disappears. I removed the dispatching aspect only to focus on the composition-based software design.

A query should be created which lists all book titles. This query should be used together with each of the two data structures. Therefore, we want to create independent components and create a use case specific loose coupling only.

First, we define the interfaces. Besides the well-known default Visitor interfaces, we add an additional interface for the enumerator visitor.

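#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// forward declarations: the interfaces refer to types which are defined later
class Book;
class IVisitor;

class IVisitable
{
public:
  virtual void GetInstance(IVisitor* visitor) = 0;
};

class IVisitor
{
public:
  virtual void InstanceReceiver(Book* book) = 0;
};

class IVisitorEnumerator : public IVisitor
{
public:
  virtual void EnumerateAll() = 0;
};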

 

Next, we add the data element and two data structures. The stock is a simple list and the order history is a tree-like structure.

//-------------------------------------
// Element
//-------------------------------------

class Book : public IVisitable
{
public:
  Book(std::string title) : mTitle(title) {};

  std::string mTitle;

  void GetInstance(IVisitor* visitor)
  {
    visitor->InstanceReceiver(this);
  }
};

//-------------------------------------
// container 1
//-------------------------------------

class StockItem
{
public:
  StockItem(Book book, int count) : mBook(book), mCount(count) {};

  Book mBook;
  int mCount;
};

class Stock
{
public:
  std::vector<StockItem> mStockItems;
};

//-------------------------------------
// container 2
//-------------------------------------

class Order
{
public:
  Order(Book book, int count) : mBook(book), mCount(count) {};

  Book mBook;
  int mCount;
};

class DailyOrders
{
public:
  std::vector<Order> mOrders;
  std::string mDate;
};

class OrdersHistory
{
public:
  std::vector<DailyOrders> mDailyOrders;
};

 

Based on the enumerator visitor interface we will now be able to implement the enumerators for the two data structures. Within the constructor we will pass the query visitor and the data structure. Of course, this is a kind of implicit design definition and you may use another way to set the dependencies between the instances. For example, instead of the generic enumerator interface you can create two individual interfaces and specify these methods.

class OrderHistoryEnumerator : public IVisitorEnumerator
{
public:
  OrderHistoryEnumerator(IVisitor& elementsReceiver, OrdersHistory& ordersHistory) 
    : mElementsReceiver(elementsReceiver), mOrdersHistory(ordersHistory) {};

  void EnumerateAll()
  {
    for (auto& dailyOrders : mOrdersHistory.mDailyOrders)
    {
      std::for_each(dailyOrders.mOrders.begin(), dailyOrders.mOrders.end(),
        [&](Order& order){order.mBook.GetInstance(this); });
    }
  }

  void InstanceReceiver(Book* book)
  {
    mElementsReceiver.InstanceReceiver(book);
  }

private:
  IVisitor& mElementsReceiver;
  OrdersHistory& mOrdersHistory;
};


class StockEnumerator : public IVisitorEnumerator
{
public:
  StockEnumerator(IVisitor& elementsReceiver, Stock& stock) 
    : mElementsReceiver(elementsReceiver), mStock(stock) {};

  void EnumerateAll()
  {
    std::for_each(mStock.mStockItems.begin(), mStock.mStockItems.end(),
      [&](StockItem& item){item.mBook.GetInstance(this); });
  }

  void InstanceReceiver(Book* book)
  {
    mElementsReceiver.InstanceReceiver(book);
  }

private:
  IVisitor& mElementsReceiver;
  Stock& mStock;
};

 

The query itself is very easy to implement. You pass the enumerator visitor to the query, the query executes the enumeration, and the instance receiver method collects the data. As before, you may use implicit design rules or you may define a query visitor interface which declares the “ExecuteQuery” method.
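The mentioned query visitor interface could look like the following sketch. It is one possible variant only: the “IQueryVisitor” interface and the “BookListEnumerator” are hypothetical and not part of the example application, and I assume a fixed result type. To keep the sketch short, the enumerator passes the books directly to the receiver and skips the “GetInstance” step.

```cpp
#include <string>
#include <vector>

class Book
{
public:
  explicit Book(std::string title) : mTitle(title) {}
  std::string mTitle;
};

class IVisitor
{
public:
  virtual ~IVisitor() = default;
  virtual void InstanceReceiver(Book* book) = 0;
};

class IVisitorEnumerator : public IVisitor
{
public:
  virtual void EnumerateAll() = 0;
};

// Hypothetical interface which makes the "ExecuteQuery" contract explicit.
// Assumption: every query returns a list of strings.
class IQueryVisitor : public IVisitor
{
public:
  virtual std::vector<std::string> ExecuteQuery(IVisitorEnumerator& enumerator) = 0;
};

class BookTitlesQuery : public IQueryVisitor
{
public:
  std::vector<std::string> ExecuteQuery(IVisitorEnumerator& enumerator) override
  {
    mTitles.clear();
    enumerator.EnumerateAll();
    return mTitles;
  }

  void InstanceReceiver(Book* book) override
  {
    mTitles.push_back(book->mTitle);
  }

private:
  std::vector<std::string> mTitles;
};

// Hypothetical enumerator over a plain list, only used to exercise the query.
class BookListEnumerator : public IVisitorEnumerator
{
public:
  BookListEnumerator(IVisitor& receiver, std::vector<Book>& books)
    : mReceiver(receiver), mBooks(books) {}

  void EnumerateAll() override
  {
    for (Book& book : mBooks)
    {
      mReceiver.InstanceReceiver(&book);
    }
  }

  void InstanceReceiver(Book* book) override
  {
    mReceiver.InstanceReceiver(book);
  }

private:
  IVisitor& mReceiver;
  std::vector<Book>& mBooks;
};
```

An explicit interface documents the contract and allows client code to work with any query. The price is the fixed result type, which may not fit queries returning other kinds of data.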

class BookTitlesQuery : public IVisitor
{
public:

  std::vector<std::string> ExecuteQuery(IVisitorEnumerator& enumerator)
  {
    mTitles.clear();
    enumerator.EnumerateAll();
    return mTitles;
  }

  void InstanceReceiver(Book* book)
  {
    mTitles.push_back(book->mTitle);
  }

private:
  std::vector<std::string> mTitles;
};

 

As mentioned before, this implementation approach allows a loose and use case specific coupling of the components. Within a test application we can therefore create the data structures, the enumerators and the queries and we use composition to create the connection between the components. As you can see, this connection is created within the context of the test method only. Such a loose coupling will allow an easy maintenance and extension of the source code. It will be easy to add new queries based on the existing enumerators as well as add new data structures and enumerators and use them within existing queries.

int main()
{
  // prepare data
  Book book1("My book 1");
  Book book2("My book 2");
  Book book3("My book 3");
  Book book4("My book 4");
  Book book5("My book 5");

  Stock stock;
  stock.mStockItems.push_back(StockItem(book1, 20));
  stock.mStockItems.push_back(StockItem(book2, 30));
  stock.mStockItems.push_back(StockItem(book3, 10));
  stock.mStockItems.push_back(StockItem(book4, 50));
  stock.mStockItems.push_back(StockItem(book5, 10));

  DailyOrders dailyOrders1;
  DailyOrders dailyOrders2;

  dailyOrders1.mDate = "20180101";
  dailyOrders1.mOrders.push_back(Order(book1, 3));
  dailyOrders1.mOrders.push_back(Order(book2, 4));
  dailyOrders1.mOrders.push_back(Order(book4, 2));

  dailyOrders2.mDate = "20180102";
  dailyOrders2.mOrders.push_back(Order(book2, 3));
  dailyOrders2.mOrders.push_back(Order(book4, 2));
  dailyOrders2.mOrders.push_back(Order(book5, 2));

  OrdersHistory ordersHistory;
  ordersHistory.mDailyOrders.push_back(dailyOrders1);
  ordersHistory.mDailyOrders.push_back(dailyOrders2);

  // execute queries
  std::vector<std::string> titles;

  BookTitlesQuery bookTitlesQuery;

  StockEnumerator stockEnumerator(bookTitlesQuery, stock);
  OrderHistoryEnumerator historyEnumerator(bookTitlesQuery, ordersHistory);

  titles = bookTitlesQuery.ExecuteQuery(stockEnumerator);
  
  std::cout << std::endl;
  std::cout << "---Stock---" << std::endl;

  std::for_each(titles.begin(), titles.end(),
    [](const std::string& title){std::cout << title << std::endl; });

  titles = bookTitlesQuery.ExecuteQuery(historyEnumerator);

  std::cout << std::endl;
  std::cout << "---Order History---" << std::endl;

  std::for_each(titles.begin(), titles.end(),
    [](const std::string& title){std::cout << title << std::endl; });

  return 0;
}

 

Assessment

The enumerator and query visitors together form a unit which is needed to solve a use case. Instead of implementing fixed units we are now able to implement independent components and connect them depending on the use case. This advantage of maintainable and extensible code comes with a minor disadvantage. As you can see within the example, we have to define and implement some additional interfaces. But this additional work is negligible. Furthermore, the enumerator visitors must implement all visitor methods even if they are only used to pass on the element instance to the query visitor.

I would recommend using composition over inheritance as the advantages far outweigh the disadvantages. Of course, if you have a very fixed use case with data structure specific queries only, you could use the inheritance concept. If you don’t use the flexibility of independent enumerators and queries, then you don’t need to implement such flexibility.

 

Outlook

Within the next articles we will think about use cases which are often used as examples for the Visitor pattern but which could be implemented more easily by using other patterns. After that, we will finish the article series with a summary of the whole topic.

Posted in C++

C# 7: binary literals, digit separators, out variables

Within this article I want to introduce some of the minor but helpful new features. These are binary literals, digit separators and out variables.

Binary Literal

So far, we could use decimal and hexadecimal literals in C#. With C# 7.0 a third type is supported, the binary literal.

static void Main(string[] args)
{
  int x = 42;       // decimal literal
  int y = 0x2A;     // hexadecimal literal
  int z = 0b101010; // binary literal
}

Digit Separators

Decimal, hexadecimal and binary literals may be difficult to read if they are long. With C# 7.0 we can use the underscore character within such literals as a digit separator. The use of digit separators does not change the meaning of the literal. But you can increase the readability of your literals if you use them wisely.

static void Main(string[] args)
{
  int x = 1_542;          // decimal literal
  int y = 0x2A_BF_71_4D;  // hexadecimal literal
  int z = 0b1011_1010;    // binary literal

  int max = Math.Max(1_513, 2_712);
}

Out Variables inline declaration

With C# 7, output variables can be declared inline, directly where they are passed to the method.

static void Main(string[] args)
{
  // C# 6 style
  int x;
  DoSomething(out x);
  Console.WriteLine(x);

  // C# 7 style
  DoSomething(out int y);
  Console.WriteLine(y);
}

private static void DoSomething(out int value)
{
  value = 5;
}

In my opinion this feature is a matter of taste. On the one hand you get the possibility to slim down the source code, but on the other hand the variable declaration is hidden inside the method call. In some situations, the source code readability may be increased by using the inline declaration, but in other cases the explicit declaration outside of the method call will increase the readability. For example, in cases where the variable is used in several places within a complex function it may be better to declare it explicitly at the beginning of the function instead of using the hidden inline declaration.

Posted in C#