Collaborative Software Testing

Introduction

Over the past decades, software development and software testing have changed greatly. Modern software systems have become very complex and at the same time software release cycles have kept shortening. Additional aspects like cyber security, distributed and connected systems, artificial intelligence or cloud-based software add yet another level of complexity.

As a result of this development, software testing needs to be creative, cover a lot of functionality and go beyond what may be described in requirements documents, use cases or user stories. Of course, a good software test engineer masters a lot of test design techniques like requirements-based testing, decision tables, boundary-value analysis or fuzz testing. But these techniques must be applied with respect to the software under test. A good test design results from an efficient combination of testing techniques tailored to the actual software system.
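
As a small illustration of one of these techniques, here is a minimal boundary-value analysis sketch in C# (the "IsValidAge" function and its valid range of 0 to 120 are invented for this example). The interesting inputs sit directly on and next to the boundaries:

using System;

class BoundaryValueExample
{
  // Hypothetical function under test: accepts ages from 0 to 120.
  static bool IsValidAge(int age) => age >= 0 && age <= 120;

  static void Main()
  {
    // Boundary-value analysis: test directly on and next to each boundary.
    (int input, bool expected)[] cases =
    {
      (-1, false), (0, true), (1, true),      // lower boundary
      (119, true), (120, true), (121, false)  // upper boundary
    };

    foreach (var (input, expected) in cases)
    {
      bool actual = IsValidAge(input);
      Console.WriteLine($"IsValidAge({input}) = {actual}, expected {expected}: " + (actual == expected ? "PASS" : "FAIL"));
    }
  }
}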

Unfortunately, the conditions in many companies do not support an efficient test workflow. Very often test designs are created by one person or team, working from documents or user stories. So, the software test designers often work in isolation.

When test design is created in isolation, it is easy to miss important perspectives and conditions that should be tested. Therefore, it is important to change this workflow and establish a collaborative approach. This will strongly increase the quality of the test process. A collaborative approach has many benefits:

  • Increased knowledge of the application to be tested
  • Shared knowledge and experience in test design
  • Shared problem-solving ability
  • Increased creativity in ways to create tests
  • Better planning and prioritizing of tests
  • Better communication about the project, the tests and the problems to be solved

Collaboration

Working in isolation is a no-go in modern software development. The complexity of software systems in combination with short release cycles can only be managed through an efficient test procedure. The basis for an efficient test procedure is collaboration. Test engineers need to work together within their team and with other roles like developers, users and managers.

Collaboration is necessary in all areas of the software testing workflow:

  • Test case creation
  • Test execution
  • Test evaluation and documentation

The test case creation part in particular is critical. It is the most difficult task of the whole software testing process: it is very complex and requires a high level of knowledge of software testing principles and a strong understanding of the software under test. Therefore, working in isolation is not viable, as the needed knowledge is distributed over different departments and people. This fact is often underestimated in daily practice. In my opinion, the test case creation process has the highest improvement potential of all software testing tasks.

Testing process during software implementation

Software developers normally test their implementations during the development process. These unit tests can be executed before the new feature is even integrated into the product. Even though these unit tests have a technical focus, each software unit is created with respect to the final use cases of the system. Therefore, unit tests should not be created and managed in isolation by the development team; they must be part of the overall test process.
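
As a minimal sketch of such a use-case-aware unit test (the "ApplyDiscount" function and its rounding rule are assumptions for illustration), a developer test can encode requirements agreed on with the whole team rather than only the technical happy path:

using System;
using System.Diagnostics;

class PriceCalculatorTests
{
  // Hypothetical unit under test: applies a discount and rounds to two decimals.
  static decimal ApplyDiscount(decimal price, decimal percent)
    => Math.Round(price * (1 - percent / 100m), 2);

  static void Main()
  {
    // Technical happy path ...
    Debug.Assert(ApplyDiscount(100m, 10m) == 90m);

    // ... plus cases derived from the overall, collaborative test design:
    Debug.Assert(ApplyDiscount(0m, 10m) == 0m);        // empty cart
    Debug.Assert(ApplyDiscount(9.99m, 33m) == 6.69m);  // rounding rule from the requirements

    Console.WriteLine("All unit tests passed.");
  }
}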

As a consequence, collaboration should be the basis of the whole test process, regardless of whether it is a unit test, an integration test or a system test. There are several established development processes that treat collaboration as a central element, for example agile methods like Scrum. Within this article I will not explain a single development process, but you should keep in mind that testing during development should get the same attention as any other test phase. The software implementation phase and the test process are strongly coupled and belong to one overall process. Even unit tests should be developed with respect to a collaborative test process.

Brainstorming

Unfortunately, there are a lot of projects where people work in isolation. The developers create their unit tests and the test engineers create their integration and system tests. Sometimes they work together within their groups, sometimes totally independently of each other. As we already know, this inefficient process will often result in poor test quality.

Luckily, there is an easy solution for this issue: just talk with each other! It isn’t always necessary to introduce a new workflow within your company or define new development rules. It is sufficient to spend a little time and bring all project members together, for example in brainstorming meetings. You can start each software iteration with such a brainstorming session and talk about the features for the next iteration and the ways to test these features.

With this kind of collaboration all project members can benefit from each other’s knowledge, and together it will be possible to define an efficient test procedure. Test engineers and software developers benefit a lot from each other if they work in close collaboration. Test engineers bring in expertise in understanding and analyzing requirements from a system perspective. Software developers have the application knowledge which is needed to write effective test cases.

Of course, after such an initial brainstorming, the project members should not go back to working in isolation. But now everyone can do their work with respect to the whole testing workflow and with the benefit of the common knowledge of the whole team. The collaboration must continue during the software iteration, for example through working groups and synchronization meetings. Again, this can be done without changing the development process at all. You can hold such meetings independently of your actual development process.

Brainstorming with product owner

Especially for critical or complex functionality, it may be advantageous to expand the brainstorming and consult additional project roles, like managers, product owners or users. Everyone has their own view of the expected functionality and their own quality criteria. But keep in mind that a brainstorming session may become inefficient if there are too many participants. In this case you can split the group into smaller working groups or use other communication techniques.

Tools

There are numerous tools usable for test case development, execution and documentation, as well as tools for running and documenting meetings and for collaboration within teams. Since cooperation is important in many areas and not only in testing, there is a wide selection of tools that cover general or test-specific requirements. So, from a tool perspective, you are well covered, and you should use this fact to your advantage. Establish a development and testing process which fulfils your needs; due to the huge number of software tools you will find the tools that match your process. Unfortunately, some companies or teams start with the selection of a tool and create their process around that tool. This will normally result in an inefficient process. So, the tool should follow the process and not the other way around.

Collaboration during test execution

So far, we have seen the benefits of collaboration for the test design and test development process. But what about test execution?

Manual tests or partially automated tests are never executed twice in exactly the same way. No matter whether you have a very detailed test specification or not, the execution always depends on the tester. Each tester will have their own interpretation of the test description, and each tester will execute the test a little differently. Even if two testers repeat the exact same steps, there may be a difference in timing, for example the timespan between two user interactions.

Now, you may think it is a disadvantage to have such imprecise test cases. But it is a chance to find more issues. If your test case is executed by different persons, it is executed in slightly different variations. And one of these variations may reveal a bug which might never have been found if the test were always executed by the same person.

In many companies, tests or test groups are always executed by the same persons. So, each tester has their own tests, and after a while the tester no longer needs to think during execution and even their attention wears off. Break this dreary and inefficient daily routine! Work together with the other team members and other teams and distribute the tests. A collaborative test execution will increase the test quality and make the daily work more varied.

Summary

Testing is one of the most important steps in software development. It is part of the whole process, from requirements analysis to delivery of the final product. Modern software systems have become very complex and at the same time software release cycles have kept shortening. Even experienced software test engineers are not able to handle this complexity if they work in isolation. Collaboration is the basis for modern software testing. This collaboration of different teams should therefore be a primary part of the overall software development process.

Fuzz Testing

Introduction

Software quality strongly depends on software testing. Due to the huge complexity of software systems, they will always contain errors. Even for a single function with a few parameters you would need thousands or millions of tests to check the entire functionality. Software systems contain a huge number of functions; in addition, these functions may have dependencies and will behave differently depending on the system status and the sequence of called functions. Therefore, it is impossible to create a complete test for a software system and find all errors.

As a consequence, software tests focus on use-case-specific situations. They primarily test the software according to its intended use. But as soon as the software is released, there will be users who do not use the software in the intended way. This may result in situations not covered by the software tests. Such unexpected inputs or processes may uncover errors.

Fuzz Testing

As described before, the main part of software testing should focus on the intended use of the software and check concrete use cases, inputs and results. Focusing on the intended use results in a manageable number of use cases which can be covered by corresponding test cases.

But how can we simulate and test unexpected user inputs which do not correspond to the intended use? Here we come back to the issue that this results in a nearly unlimited number of possible test cases.

One idea to solve this issue is testing with random input parameters and random execution sequences. Such test procedures are called fuzz testing or robustness testing. They call software interfaces or functions with random data and check the result in some general way. For example, if a function returns an execution status, this status can be checked. Another common check is whether the function executes within a defined timespan and without a software crash.

Therefore, fuzz tests are not used to find specific errors, like wrong results. They are used to find general issues like software crashes or freezes. This makes it possible to create a huge number of automated tests with random input parameters and random execution sequences.
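
The following sketch illustrates this principle in C# (the "ParseMessage" function and its status codes are placeholders): the fuzzer feeds random data into an interface and checks only the generic criteria described above, i.e. a plausible status, a bounded execution time and no crash.

using System;
using System.Diagnostics;

class SimpleFuzzer
{
  // Hypothetical function under test: returns a status code for arbitrary input.
  static int ParseMessage(byte[] data) => data.Length == 0 ? -1 : 0;

  static void Main()
  {
    var random = new Random();

    for (int i = 0; i < 10000; i++)
    {
      // Random input data of random length.
      var data = new byte[random.Next(0, 1024)];
      random.NextBytes(data);

      var watch = Stopwatch.StartNew();
      try
      {
        int status = ParseMessage(data);

        // Generic checks only: plausible status and bounded execution time.
        if (status != 0 && status != -1)
          Console.WriteLine($"Test {i}: unexpected status {status}");
        if (watch.ElapsedMilliseconds > 100)
          Console.WriteLine($"Test {i}: timeout after {watch.ElapsedMilliseconds} ms");
      }
      catch (Exception ex)
      {
        // A crash (unhandled exception) is exactly what fuzzing looks for.
        Console.WriteLine($"Test {i}: crash: {ex.Message}");
      }
    }
  }
}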

Another advantage of automatically generated tests is the possibility to create long test sequences. Standard test cases which check the intended behavior are normally very short. They often test one functionality only and then reset the software system to a start state. Of course, there should also be test cases which check longer execution sequences, but these tests are relatively rare as it is much more time-consuming to develop such complex test cases.

Fuzz tests offer a good way to compensate for this. With fuzz tests it becomes easy to randomly test short and long execution sequences. Of course, fuzz tests for execution sequences are no alternative to the normal use-case-specific test cases, as they don’t find specific errors. But again, they are a good complement to the standard tests as they can find general errors like software crashes or freezes.

Evaluation and documentation of fuzz tests

Fuzz testing allows you to create and execute a huge number of test cases within a short time span. As these tests are executed with random values and execution sequences, each test run results in new test cases. Therefore, fuzz testing tools should normally be able to record and repeat the test cases. Of course, in case of an error, it is mandatory to record and store the test case. But depending on the specific project needs it may not be necessary to store the details of successful test cases if you don’t want to repeat them in the same way.
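
A common lightweight way to achieve this repeatability, sketched here in C#, is to record only the seed of the random generator per test case; a failing case can then be replayed deterministically without storing all of its input data (the test oracle below is a dummy):

using System;

class ReproducibleFuzzing
{
  // Placeholder for one fuzz test case driven by a seeded random generator.
  static bool RunFuzzCase(int seed)
  {
    var random = new Random(seed);
    int input = random.Next();
    return input % 997 != 0;  // dummy check standing in for the real test oracle
  }

  static void Main()
  {
    for (int seed = 0; seed < 1000; seed++)
    {
      if (!RunFuzzCase(seed))
      {
        // Only the seed of a failing case needs to be stored ...
        Console.WriteLine($"Failure found, recorded seed: {seed}");

        // ... because the identical test case can be replayed from it.
        bool reproduced = !RunFuzzCase(seed);
        Console.WriteLine($"Reproduced: {reproduced}");
      }
    }
  }
}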

The documentation of fuzz tests may follow the same principle. It should be possible to create a short report with detailed information for the failing test cases and summarized information for the successful test cases. And it should be possible to create a full test report with detailed information about all executed test cases. Of course, such a full report may become unmanageable very fast as it contains a lot of data.

Smart Fuzzing

Simple fuzz tests are based on random test parameters. This will lead to redundant test cases. Furthermore, there is no distinction between practical and impractical data. These disadvantages may result in inefficient fuzz tests. Of course, fuzz tests should use random data because they should find error situations that nobody thought of. But fuzz testing becomes more efficient if it is possible to create test data with a stronger focus on covering the areas around the main use cases. Such an optimization is called smart fuzzing.

Smart fuzzing is a kind of gray-box testing. The test system must have knowledge about the software system under test. For example, it may know the internal state machine and the meaning of input parameters. This makes it possible to create practical test data and reduce the number of redundant test cases.
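
The following C# sketch shows this idea in its simplest form (the valid parameter range is an invented piece of "knowledge" about the system under test): instead of drawing values from the full integer range, the generator concentrates on the valid range and the areas around its boundaries.

using System;

class SmartFuzzing
{
  // Knowledge about the system under test: the valid range of one input parameter.
  const int ValidMin = 0;
  const int ValidMax = 120;

  static int NextSmartValue(Random random)
  {
    // Bias the random data toward the practically relevant areas.
    switch (random.Next(3))
    {
      case 0:  return random.Next(ValidMin, ValidMax + 1);  // inside the valid range
      case 1:  return ValidMin - 2 + random.Next(4);        // around the lower boundary
      default: return ValidMax - 2 + random.Next(4);        // around the upper boundary
    }
  }

  static void Main()
  {
    var random = new Random(42);
    for (int i = 0; i < 10; i++)
      Console.WriteLine(NextSmartValue(random));
  }
}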

With smart fuzzing it is possible to increase the test quality. But, of course, you don’t get this for free. The test system must acquire the knowledge about the software system under test. So smart fuzzing comes with a high effort for setting up and teaching the test system.

Summary

Fuzz testing improves software quality. It allows automated tests of software functionality outside of the main use cases. It is mainly focused on tests with uncommon or unexpected data and unusual sequences of events. Simple fuzz tests can be created with little effort. Corresponding testing tools allow you to create, execute, record and document such tests with random input data. If you want to create efficient and high-quality fuzz tests you should spend more time and use tools which allow smart fuzzing. This kind of test needs knowledge about the system under test, and therefore you should schedule a higher effort for configuration, teaching or even implementation of such smart fuzzing tools.

Automated cognitive testing

In this article I would like to share a few basic thoughts on the topic of cognitive testing and clarify what is meant by this kind of testing, what cognitive software systems are, how to test such systems and what is meant by cognitive testing tools.

Cognition

It isn’t that easy to find an exact definition for the term “cognition” because it depends on the context. For example, in contrast to cognitive software systems, human cognition involves thinking. So, I want to use a simplified definition which is applicable to man and machine.

Cognition is based on knowledge. This knowledge can be built up by analyzing historical data (learning) and by analyzing current situations (self-observation). A cognitive action starts with an analysis of the current situation and therefore the current information. This information can contain a lot of unstructured data. This data is analyzed based on the knowledge of the cognitive system. The result of this process is an appropriate action or reaction.

A cognitive system accesses many sources and data, combines that data, transforms information, filters and evaluates that information in its context, interacts with other systems (such as a human being), and learns from information, context and interaction. The system makes something new out of this data, even if the underlying and available information and its relationships are complex.

Cognitive Software Systems

The term “artificial intelligence” has been a popular catchphrase for a long time. But in recent years this worn-out catchphrase has been replaced by the terms “cognitive system” and “cognitive software”.

Cognitive software systems process unstructured data and use artificial intelligence to draw conclusions. Currently this artificial intelligence is often based on neural networks, and the knowledge is built up by machine learning or its special form, deep learning.

We use such intelligent software systems nearly every day. For example, every speech recognition system in our cars or smart homes uses technology based on neural networks.

But I think there is a major issue concerning the use of the term “cognitive software”: it is used for a wide range of software system types, starting with simple systems which simulate intelligence and ending with systems which think and act in a human way, for example systems for autonomous driving. Only the second type is real cognitive software. So, in my opinion, the term “cognitive software” is used too often, even for systems which have some simple kind of artificial intelligence, like speech recognition, but which don’t have real cognitive capabilities.

Cognitive Testing

After this short introduction of the term cognition and of cognitive software systems, we can start with the main topic of this article: cognitive testing. So, first let’s think about the main work steps of a software tester.

The software tester will specify a set of basic tests. These basic tests are very specific and strict. They contain a detailed description of the preconditions, the input data, the execution steps and the expected result. These basic tests are executed for each test candidate. This type of testing is widespread and used in many companies. Of course, the creation of the tests isn’t easy. The tester will use his knowledge, creativity and experience to develop the test cases. So, this working step is strongly based on his cognitive skills. In contrast, the execution of these basic tests is very easy and does not depend on cognitive skills. The work of a tester is often supported by software tools. You will find a lot of tools for the execution of basic tests. But interestingly it is very difficult to find tools which strongly support the development of tests. Maybe because this part is strongly based on cognitive skills?
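
To make the structure of such a basic test tangible, here is a minimal sketch in C# (the field contents are invented): a basic test is essentially a strict, fully specified record.

using System;
using System.Collections.Generic;

// Minimal sketch of a basic test case as a strict, fully specified record.
class BasicTestCase
{
  public string Preconditions;
  public string InputData;
  public List<string> ExecutionSteps;
  public string ExpectedResult;
}

class Example
{
  static void Main()
  {
    var test = new BasicTestCase
    {
      Preconditions = "user is logged in",
      InputData = "order quantity = 3",
      ExecutionSteps = new List<string> { "open order dialog", "enter quantity", "confirm" },
      ExpectedResult = "order is listed with quantity 3"
    };

    Console.WriteLine($"Executing test with {test.ExecutionSteps.Count} steps ...");
  }
}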

There is a second type of testing which is called “free testing” or “exploratory testing”. This type of testing is rarely used because it is time-consuming and therefore expensive. Exploratory or free testing means the tester does not work according to a fixed test specification. Instead he uses his knowledge about the system under test and executes new test sequences without previous planning. Of course, these new sequences must be documented, and the tester must evaluate the result carefully, because the expectation about the result depends on the knowledge of the tester. Therefore, such free testing can be very difficult even for an experienced software tester. This work, again, strongly depends on cognitive skills, and so it is not a big surprise to find nearly no software tools for exploratory testing.

Cognitive Testing Tools

The complexity of software systems is constantly rising. Therefore, there is a need for intelligent testing tools which strongly support the tester during development and execution of software tests. As a result of the rising complexity of the software under test, the testing tools must become smarter too. So, we need testing tools with artificial intelligence and cognitive capabilities.

Such testing tools must offer some kind of intelligence, for example based on neural networks. Therefore, a new and important part of the daily routine of a tester will be to train these systems. This training or teaching must be done with use-case-specific data to create a wide and deep knowledge about the software under test. A well-trained testing tool will be able to support the tester in the development and execution of test cases, or it will even be able to do this work alone without the need for a tester.

Testing of cognitive systems

As mentioned before, the complexity of software systems is constantly rising. So, we can expect the same growth for intelligent software systems. Over time the number of these software systems will rise, and they will evolve from systems with simple artificial intelligence to systems with real cognitive capabilities.

Nowadays, most software systems are complex but have a well-defined behavior. Furthermore, it is rather rare to execute exploratory tests. Therefore, the development and execution of test cases isn’t that complex, but it is a time-consuming task. As a result, there are a lot of powerful testing tools which support the work of the tester.

In the future, the number of intelligent or even cognitive software systems will increase. This will lead to a big change in the testing process. On the one hand, the development of the basic test cases will get more complex, and on the other hand, the importance of exploratory testing will increase sharply. Today’s testing tools are not suitable for this new testing process. Cognitive software systems are a new kind of software which will lead to a new testing process and which will need a new type of tool: cognitive testing tools.

Current Situation

Software systems with artificial intelligence keep moving into more and more areas. But the current systems are limited to specific and simple use cases and are far away from real cognitive software systems. It will take a few more years before we see and use cognitive software systems in our daily life. Therefore, there is nearly no need for cognitive testing tools yet. But over the next years, systems with artificial intelligence will increase in number and complexity. This will lead to an evolution within the test tools too. They will also become smarter, and as soon as we start to use real cognitive software systems, we will need cognitive testing tools too.

String View in C++17 (std::string_view)

The C++ string (std::string) is a thin wrapper which stores its data on the heap. When you deal with strings, it happens very often that a string copy must be created, and therefore a memory allocation is done. But there are many use cases which do not require a copy. In these situations, you want to analyze the string or do some calculations based on it, but you don’t want to change its content.

With C++17 the string view (std::string_view) was introduced. The string view is designed for use cases where a non-owning reference to a string is needed. It represents a view of a sequence of characters. This sequence of characters can be a C++ string or a C-string. A string view offers nearly the same methods as a standard string. The following example shows how to create and use a string view.

#include <iostream>
#include <string>
#include <string_view>

int main()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	std::string_view substr = strview.substr(strview.find('B'));

	std::cout << substr << '\n';
}

A typical implementation of a string view holds only two members: a pointer to a constant character array and a size. Therefore, it is quite cheap to copy a string view. The main purpose of a string view is to avoid copying data if only a non-mutating view is required. So, the main reason for string view is performance.

The substr method is a good example to show the difference between string and string view performance. Both offer the substr method to get a substring. But there is a large difference in performance. The substr method of the string has linear complexity and therefore directly depends on the size of the content, because it copies the substring into a new string. The string view method has constant complexity and is therefore independent of the content size, because it only adjusts the pointer and the size of the view. So, if you have to deal with large strings and substrings you may get a huge performance gain by using string view.

But when we speak about performance, we should never make a general statement like: “string view is always more performant than string”. Standard library implementations are very smart when handling strings, especially when strings are short. For short strings a “small string optimization” is done. For example, in MSVC and GCC strings with a size of up to 15 characters are stored on the stack and not on the heap. Because of such optimizations, in some use cases there might be no advantage if you use string view instead of string.

Non-owning reference

The string view holds a non-owning reference to a character array. Therefore, the lifetime of the referenced object must exceed the lifetime of the string view. Otherwise, the result is undefined behavior. The following code shows an erroneous implementation. The returned string view references a string which is valid in method scope only. So, the use of the string view within the main-method scope will result in undefined behavior.

std::string_view Create()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	return strview;
}

int main()
{
	std::string_view strview = Create();

	std::cout << strview << '\n';
}

Not null-terminated

There is one major pitfall if you use a string view: the string view content may not be null-terminated. This becomes relevant if you want to use functions like “atoi” or “printf” which expect a null-terminated C-string. You can easily pass the content of the string view via its “data()” method. It returns a pointer to the underlying character array. But this character array may not be null-terminated. Unfortunately, the string view does not have a “c_str()” method like the string class. And to confuse us completely, the “data()” method of the string view may behave differently than the “data()” method of the string, which returns a null-terminated character array in C++11 and later. In my opinion it isn’t a good choice to have such different behaviors in string and string view, as they have nearly equal interfaces and as a result users expect equal behavior.

If you have to get the underlying character array of a string view as a null-terminated string, you have to explicitly convert it to a string, so that you are able to use the “c_str()” method. The following example shows a corresponding use case. The first string view contains a null-terminated character array, so we can pass it to the “strlen” function. The second string view is not null-terminated, so the “strlen” call will result in undefined behavior. The third call shows the conversion to a null-terminated string.

#include <cstring>
#include <iostream>
#include <string>
#include <string_view>

int main()
{
	// null terminated string
	std::string str = "FooBar";
	std::string_view strview(str);

	std::cout << std::strlen(strview.data()) << '\n';

	// not null terminated string
	char str2[6] = { 'F','o','o','B','a','r' };
	std::string_view strview2(str2, sizeof str2);

	std::cout << std::strlen(strview2.data()) << '\n';

	// convert to null terminated string
	std::cout << std::strlen(std::string(strview2).c_str()) << '\n';
}

Use string view as method parameter

After this short excursion into the risks of using string view, let’s come back to its strengths. As we have learned so far, the string view may be favorable whenever we would otherwise create a constant copy of the string content. So, it may be a perfect choice for a read-only function parameter. In the following we will compare three variants of a method implementation. Within these variants the string is passed as a C-string, a C++ string or a string view.

Let’s start with a C-string. To keep it simple, the method contains only the extraction of a substring, as an example of accessing the input parameter.

void AnalyzeCharArray(const char* s)
{
	auto x = strchr(s, 'B');
}

int main()
{
	char arr[] = "FooBar";	// null-terminated C-string
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c string
	AnalyzeCharArray(arr);
	AnalyzeCharArray(str.c_str());
	AnalyzeCharArray(std::string(strview).c_str());
}

This implementation has some disadvantages. We must use “c_str()” if the method is called with a std::string. And if the method is called with a string view, we must first construct a null-terminated string from it.

So, let’s change the method to use a C++ string.

void AnalyzeStdString(const std::string& s)
{ 
	auto x = s.substr(s.find('B')); 
}

int main()
{
	char arr[] = "FooBar";	// null-terminated C-string
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c++ string
	AnalyzeStdString(arr);
	AnalyzeStdString(str);
	AnalyzeStdString(std::string(strview));
}

That looks better. But there are still some downsides. If the method is called with a string view, a conversion to string is necessary. Furthermore, there are additional memory allocations: the conversion to a C++ string when the method is called with a C-string, and the substr call itself.

As a third alternative we use a string view as the method parameter.


void AnalyzeStringView(const std::string_view s)
{
	auto x = s.substr(s.find('B'));
}

int main()
{
	char arr[] = "FooBar";
	std::string str = "FooBar";
	std::string_view strview(str);

	// with string view
	AnalyzeStringView(arr);
	AnalyzeStringView(str);
	AnalyzeStringView(strview);
}

This seems to be a good choice. The method can be called directly with a char array, a string or a string view, and there is no additional memory allocation. Furthermore, the string view offers the same member functions as a string and can be used with algorithms.

So, the use of a string view as method parameter is the best choice in such use cases.

Use string view as return value

Of course, the string view can be used as a return value too. As shown previously, you just have to be careful with the lifetime of the referenced object. The following source code shows a corresponding example.

std::string_view GetSubstring(const std::string_view s)
{
	return s.substr(s.find('B'));
}

int main()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	std::cout << GetSubstring(strview) << '\n';
}

Summary

The string view represents a view of a sequence of characters. As the string view is a lightweight object stored on the stack, it solves some performance issues of a standard string. Both objects have a nearly identical interface, so you can easily exchange them. There are two major pitfalls if you use a string view. First, you must think about the lifetime of the referenced object instance. As the string view holds a non-owning reference, the lifetime of the referenced object must exceed the lifetime of the string view. Second, the string view may contain a string which is not null-terminated. So, you may have to convert it to a C-string if you want to pass it to a function which expects a null-terminated character array.

“if constexpr” in C++17 (static if)

With C++17 the “if constexpr” statement was introduced. This so-called “static if” or “compile-time if-expression” can be used to conditionally compile code. The feature allows branches of an if statement to be discarded at compile time based on a constant expression condition.

if constexpr(condition)
	statement1; 
else
	statement2;

Depending on the condition, statement1 or statement2 is discarded at compile time. A discarded statement inside a template is not instantiated. Therefore, this feature is mainly used in templates. It allows specific statements to be compiled only depending on the template type. This can greatly simplify template code as it makes it possible to express intent similarly to “run-time” code. Later we will see an example where static-if is used instead of template specialization.

We already have a feature for conditional code compilation: the “#ifdef” directive. So, will static-if replace this directive? No, it will not, as these two constructs are not identical. Both conditionally compile code, but “#ifdef” does this based on conditions that can be evaluated at preprocessing time. For example, #ifdef cannot be used to conditionally compile code depending on the value of a template parameter. On the other hand, static-if cannot be used to discard syntactically invalid code, while “#ifdef” can. So, there are use cases where you can use either of them, and there are use cases which are specific to one of them.

Example

As mentioned before, the static-if feature is very interesting for template implementation, for example as an alternative to template specialization. The following example shows a template implementation with specific code depending on the template type.

#include <iostream>
#include <string>
#include <type_traits>

template <typename T>
void PrintInfo(T x)
{
	if constexpr (std::is_same_v<T, std::string>)
	{
		std::cout << "string with length " << x.length() << std::endl;
	}
	else if constexpr (std::is_same_v<T, int>)
	{
		std::cout << "int" << std::endl;
	}
	else
	{
		std::cout << "some other type" << std::endl;
	}
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);	
}

Next you will see an identical implementation without static-if. In this case template specialization is used.

template <typename T>
void PrintInfo(T x)
{
	std::cout << "some other type" << std::endl;
}

template <>
void PrintInfo(std::string x)
{
	std::cout << "string with length " << x.length() << std::endl;
}

template <>
void PrintInfo(int x)
{
	std::cout << "int" << std::endl;
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);	
}

If you compare the two implementations you may say that the one with template specialization is easier to read and to understand. Of course, most real implementations will be more complex so it isn’t possible to say which of the two implementation concepts is favorable. In my opinion it depends on the use case. Therefore, the static-if will not replace existing concepts like template specialization.

Compile-time discard

At the beginning of the article I mentioned that static-if performs a conditional discard and that a discarded statement inside a template is not instantiated. What does this mean? And what difference is there between static-if inside and outside of templates?

The following example shows the template implementation we saw previously, but this time the static-if is replaced with a normal if-statement.

template <typename T>
void PrintInfo(T x)
{
	if (std::is_same_v<T, std::string>)
	{
		std::cout << "string with length " << x.length() << std::endl;
	}
	else if (std::is_same_v<T, int>)
	{
		std::cout << "int" << std::endl;
	}
	else
	{
		std::cout << "some other type" << std::endl;
	}
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);
}

Of course, this code is invalid and results in a compiler error, as the “int” type does not have a “length” method. But if we use static-if, the example can be compiled. That’s because the discarded branches are not instantiated at all.

If we use static-if outside of a template, we see a different behavior: the discarded branches are still compiled. The following code shows a simple example with an undeclared identifier within the discarded branch. The code within the template can be compiled, but the code outside the template produces a corresponding compiler error.

template <typename T>
void DoSomething(T x)
{
	if constexpr(true)
	{
		std::cout << x << std::endl;
	}
	else
	{
		std::cout << y << std::endl;	// OK as code is not instantiated at all
	}
}

int main()
{
	int x = 2;

	DoSomething(x);

	if constexpr(true)
	{
		std::cout << x << std::endl;
	}
	else
	{
		std::cout << y << std::endl;	// error C2065: 'y': undeclared identifier
	}
}

static-if vs. template specialization

We already seen a comparison between static-if and template specialization within the short example at the beginning. Let’s have a look at a more complex example to get a better feeling for the differences of the concepts.

Let’s say we have the following use case. We should implement a calculation which consists of three steps: a preparation, a transformation and a result creation. The three steps are already implemented, so we use a given interface. The preparation and result creation steps are available as type-specific methods. Therefore, we want to implement the calculation function as a template and use the corresponding type-specific method variants.

The following implementation contains the given interface as a dummy implementation and the template implementation. The template is implemented in two variants: one uses static-if and the other one uses template specialization. (Since both variants share the same signature, only one of them may be defined at a time.)

// some given interface
int PrepareByString(std::string x) { return x.length(); }
int PrepareByInt(int x) { return x - 10; };
void Transform(int* x) { *x = *x + 5; }
std::string CreateStringResult(int x) { return std::to_string(x); }
int CreateIntResult(int x) { return x + 5; }

// template with constexpr
template <typename T>
T Calculate(T x)
{
	int temp = 0;
	if constexpr (std::is_same_v<T, std::string>) temp = PrepareByString(x);
	else if constexpr (std::is_same_v<T, int>) temp = PrepareByInt(x);
	else return x;

	Transform(&temp);

	if constexpr (std::is_same_v<T, std::string>) return CreateStringResult(temp);
	else if constexpr (std::is_same_v<T, int>) return CreateIntResult(temp);
}

// template with specialization
template <typename T>
T Calculate(T x)
{
	return x;
}

template <>
std::string Calculate(std::string x)
{
	int temp = PrepareByString(x);

	Transform(&temp);

	return CreateStringResult(temp);
}

template <>
int Calculate(int x)
{
	int temp = PrepareByInt(x);

	Transform(&temp);

	return CreateIntResult(temp);
}

// main function
int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;
	
	std::cout << Calculate(val1) << std::endl;
	std::cout << Calculate(val2) << std::endl;
	std::cout << Calculate(val3) << std::endl;
}

Based on this short example we can see the advantages and disadvantages of both solutions. We have implemented a fixed calculation algorithm which is the same for all data types. If we use template specialization and implement each data type separately, we must duplicate this algorithm. And of course, duplicated code comes with the well-known disadvantages. By using static-if we must implement the algorithm only once. But we have to add two if-statements. So instead of the straight procedure we add code branches. Therefore, the complexity of the single calculation method increases, but the complexity of the template is reduced, as the template contains one method only instead of three.

This is still an easy example with a few lines of code, but it shows the main difference between the concepts and the resulting code. Whether you use the one or the other concept may be use-case-specific. And of course, there are many other alternatives too, or you can even mix several concepts. In summary, I recommend the use of static-if in templates. It often improves the code quality as it makes the source code easier to read and to maintain.

Auto Type Deduction in Range-Based For Loops

Range-based For Loops offer a nice way to loop over the elements of a container. In combination with Auto Type Deduction the source code will become very clean and easy to read and write.

for (auto element : container) 
{ 
    // ...
}

The Auto Type Deduction is available in different variants:

  • auto
  • const auto
  • auto&
  • const auto&
  • auto&&
  • const auto&&
  • decltype(auto)

Of course, these variants result in different behaviors and you should choose the right one according to your needs. In the following I will give a short overview of these variants and explain their behavior and standard use case.

auto

This will create a copy of the container element. This variant is used when you want to get and modify the element content, for example to pass it to a function, but leave the original container element as it is.

const auto

Like the first variant, this creates a copy of the element content. But this time the copy is constant and cannot be changed. In most cases “const auto” isn’t a good choice: if you want to work with an immutable value you can use “const auto&” as well and don’t have to create a copy. There are only a few use cases for this variant. For example, it could be useful in multithreading scenarios. Let’s say you want to use the element several times within the loop, but in parallel another thread may change the container element. By using “const auto” you create a copy of the element and can use this copy in your loop several times. If you use “const auto&” you will see the updated element instead. So, there are scenarios where “const auto” and “const auto&” produce different results. Therefore, we have a need for “const auto” even if it is used very rarely.

auto&

This will create a reference to the original container element. So, it is used in case you want to modify the container content.

const auto&

This creates a constant reference to the original container element. So that’s the perfect choice if you need read-only access to the elements.

auto&&

Like “auto&”, this variant with the double “&” is used when you want to modify the original container elements. There are some special cases where it isn’t possible to use the normal “auto&” variant. For example, a loop over “std::vector<bool>” yields a temporary proxy object, which cannot bind to an lvalue reference (auto&). In such cases “auto&&” can be used. It is a forwarding reference: if it is initialized with an lvalue, it becomes an lvalue reference, and if it is initialized with an rvalue, it becomes an rvalue reference. As a result, “auto&&” is a good candidate for generic code and therefore it is most often used in templates.

Of course, as the forwarding reference “auto&&” covers the use cases of the standard reference “auto&”, we may ask ourselves whether we should always use the variant with the double “&”. But I would not recommend this. The syntax with the double “&” is more confusing. Developers are familiar with the standard reference syntax and will expect the forwarding reference syntax in special cases only. So, I recommend using “auto&” outside of templates and “auto&&” within templates.

const auto&&

This creates a read-only forwarding reference which binds to rvalues only. Note that for read-only access we don’t need it even for containers which yield a temporary proxy object: in contrast to a write access, “const auto&” already works for containers like “std::vector<bool>”, because a const lvalue reference can bind to temporaries. There are only a few theoretical use cases for “const auto&&”, and therefore you will normally not use this variant in your applications.

decltype(auto)

This variant should not be used in Range-Based For Loops. “decltype(auto)” is primarily useful for deducing the return type of forwarding functions. Using it to declare a loop variable is an antipattern. Therefore, I don’t want to go into detail about “decltype” and just mention it for completeness. Even if the compiler allows you to write “decltype(auto)”, you should not use it in Range-Based For Loops.

Summary

  • Use “auto” when you want to work with a copy of the elements
  • Use “auto&” when you want to modify elements
  • Use “auto&&” when you want to modify elements in generic code
  • Use “const auto&” when you want read-only access to elements
  • Use “const auto” in multithreading scenarios when you need a stable read-only copy of elements that other threads may modify

C# Protected Internal vs Private Protected

C# offers the composed access modifier “protected internal”. With C# 7.2 a new composed access modifier was added: “private protected”. Unfortunately, these modifiers are hard to understand as their names don’t reflect their meaning. Within this article I want to explain the two modifiers and their technical background.

Within the CLR you will find the single access modifiers “Family” and “Assembly”. “Family” means that the defining class or a derived class has access. In C# and many other programming languages this “Family” modifier is implemented with the “protected” keyword. The CLR “Assembly” modifier means that a member is accessible by everyone within the defining assembly. The corresponding implementation in C# is done with the “internal” keyword. So far, as long as we have the single modifiers, it is quite easy. But what if we combine those two modifiers?

Within the CLR it stays simple. It offers two compound access modifiers: “Family and Assembly” and “Family or Assembly”. It combines the single modifiers by “And” to create an intersection or by “Or” to create a union.

“Family and Assembly” allows access only from derived classes within the defining assembly.

“Family or Assembly” allows access from any object within the defining assembly and additionally from any derived class outside of the assembly.
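
A short sketch may make the difference clearer (the class names are invented, and the last class would live in a second project that references the first one):

// Assembly A
public class Base
{
  protected internal int mFamilyOrAssembly;  // CLR: "Family or Assembly"
  private protected int mFamilyAndAssembly;  // CLR: "Family and Assembly" (C# 7.2)
}

public class SameAssemblyUser
{
  void Access(Base b)
  {
    int x = b.mFamilyOrAssembly;     // OK: same assembly is sufficient
    // int y = b.mFamilyAndAssembly; // compile error: not a derived class
  }
}

public class SameAssemblyDerived : Base
{
  void Access()
  {
    int x = mFamilyOrAssembly;       // OK
    int y = mFamilyAndAssembly;      // OK: derived class AND same assembly
  }
}

// Assembly B (a different project referencing assembly A)
public class OtherAssemblyDerived : Base
{
  void Access()
  {
    int x = mFamilyOrAssembly;       // OK: a derived class is sufficient
    // int y = mFamilyAndAssembly;   // compile error: derived, but wrong assembly
  }
}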

In C# things become difficult, as an awkward syntax was chosen. At first, support for the “Family or Assembly” CLR access modifier was added to C#. But the C# syntax was not “protected or internal”; it was “protected internal” without any conjunction. I think that’s a good choice as it keeps things simple. An additional keyword would be disturbing and unnecessary.

But with C# 7.2 the “Family and Assembly” CLR access modifier was to become part of C#. Now the language designers had the issue that the “protected internal” modifier without any conjunction was already part of the language. So how should they name the new modifier? If they had chosen a syntax like “protected and internal”, it would be easy to understand, but with the downside that the existing “protected internal” would then implicitly be read as “protected AND internal”, and therefore with the risk that developers use the wrong modifier by mistake, as the two would have nearly the same syntax.

So, the language designers decided to use the syntax “private protected”. This means we have the well-known protected relationship between base and derived class, and additionally the “private” keyword means that this relationship is limited to derived classes within the same assembly. In my opinion that wasn’t a good decision. Of course, the composed keywords are now clearly different and cannot be mixed up by mistake, but they are awkward and confusing as they don’t reflect their meaning.

But of course, we should not criticize this decision too much, because it was a decision between bad alternatives only. There was no possibility to add the “Family and Assembly” feature in a clean way. The crucial mistake was already made when the “Family or Assembly” feature was added to C#: the language designers chose a syntax without respect to the possibility that the “Family and Assembly” feature might be added in the future.

Ref Return and Ref Locals in C# 7

The C# language has supported passing arguments by value or by reference since the first language version. But returning a value was possible by value only. This changed in C# 7 with the introduction of two new features: ref returns and ref locals. With these new features it is possible to return by reference. ‘Ref return’ allows a method to return an alias to an existing variable, and ‘ref local’ can store this alias in a local variable.

The main goal of this language extension is to allow developers to pass around references to value types instead of copies of the values. This is important when working with large data structures implemented as value types. Of course, the new feature can be used with reference types too, but reference types are already returned as pointers, and you will gain no advantage by returning them as a reference to a pointer.

Return by value

The following source code shows an example of a return by value. The function returns the second element of the list. As it is a return by value, a copy of the element is returned. A modification of the returned element is done within the copy and not within the original object instance.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
  new Person() {mName = "John Doe", mAge = 31 },
  new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  // output:
  // 'John Doe (31)'
  // 'Jane Doe (27)'
  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

struct Person
{
  public string mName;
  public int mAge;
}

static Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return persons[1];
}

If you want to find and modify the original element, you would have to implement the method in a different way, for example return the found index and use this index to modify the original list. With the new ‘ref return’ feature it becomes possible to implement the needed behavior very easily. You just have to change the return from a value to a reference.

Return by reference

The following source code shows the same example adapted. This time the found list item is returned as a reference. The reference is stored within the ref local variable. Changes made to this variable are made to the referenced object instance. So the original list item will be changed.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (41)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Use a List instead of an Array

Within the previous example I used an array of struct objects. What do you think will happen if we change to another container type, for example to a list?

static void Main(string[] args)
{
  List<Person> persons = new List<Person>
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(List<Person> persons)
{
  if (persons.Count < 2) throw new ArgumentException();

  // error CS8156: An expression cannot be used in this context
  // because it may not be passed or returned by reference
  return ref persons[1];
}

The code no longer compiles. The ‘return ref persons[1]’ statement results in a compiler error. That’s because of a difference between the implementation of the array indexer and the list indexer. The array indexer returns a reference to the item, so we can use this reference as return value. The list indexer instead returns a copy of the value. As it is allowed to return neither the indexer expression itself nor the returned temporary local variable, the compiler shows a corresponding error message. Within the following article you can find further information about this issue.

Ref local

Within the example application we stored the returned reference within a ‘ref local’ variable. A ref local variable is an alias to the original object instance. It is initialized by the ref return value. The reference itself is constant after this initialization. Therefore, an assignment to a ref local will not change the reference, but it will change the content of the referenced object.

The following source code shows a corresponding example.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref persons[1];
  person.mAge = 41;
  person = persons[0];
  person.mAge = 51;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'John Doe (51)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

At first the ref local is initialized with a reference to the second list item. Then the age is changed to 41. Now comes another assignment which is very interesting. ‘persons[0]’ refers to the first list item. But an assignment to our ref local variable will not change the reference; the reference is set during initialization and stays constant. The assignment changes the value of the referenced object. Therefore, the second list item – which is referenced by the ref local variable – is overwritten with the values stored within the first list item, which are ‘John Doe’ and ‘31’. Next we set the age to ‘51’. So, the output is as shown in the comment within the example application. The first list item is not changed at all, and the second list item was updated with the name stored within the first item and the age set by the last assignment.

Ref vs. Pointer

The previous example and this topic in general may raise the question about the difference between a reference and a pointer. So, we should take a minute and think about this question as it is important for a deep understanding of the ref local and ref return mechanism.

Briefly summarized you can say: References and pointers do both refer to an object instance. But references are constant after initialization where pointers can be changed.

This single difference is an important one, as it results in different characteristics and possibilities of the two concepts. In the following I want to mention some important ones. As references are constant, they cannot be null. Pointers instead can be reassigned and consequently can also be set to null. You can have pointers to pointers and create extra levels of indirection, whereas references offer only one level of indirection. As pointers can be reassigned, various arithmetic operations can be performed on them, which is called ‘pointer arithmetic’. It is easier to work with references, as they cannot be null and you don’t have to think about indirection. But it is not safer to work with references, because pointers as well as references can refer to invalid objects or memory locations.

Please look at the following functions and think about the parameter kind: is it a reference or a pointer or a value copy?

Function                     Parameter kind
void Foo(MyStruct x)         Copy of the value passed by the caller
void Foo(MyClass x)          Pointer to the original object instance available at caller level
void Foo(ref MyStruct x)     Reference to the original object instance available at caller level
void Foo(ref MyClass x)      Reference to the pointer to the original object instance available at caller level

C# developers often say the last case is a “pointer to a pointer”, but in fact it is a reference to a pointer. As you know from the above comparison of the two concepts, that is a small but important difference.

Use ref return without ref local

You can define a method with ref return and assign the result to a variable which is not a ref local. In this case the content referenced by the method result is copied into the local variable. So your local variable is a copy of the original list item, and changes will therefore not affect the list item.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (27)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Ref return of method local variable

A ref return creates a reference to an existing object instance. If the lifetime of the object instance is shorter than the lifetime of the reference, the reference will refer to an invalid object or memory location. This would result in critical runtime errors. Therefore, the referenced object instance must be in a higher scope or in the same scope as the ref local variable. You cannot create a method-local object instance and return a reference to it. The following source code shows a corresponding example with the compiler error messages as comments.

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2)
  {
    // error: 
    // An expression cannot be used in this context 
    // because it may not be passed or returned by reference
    return ref new Person();

    // error:
    // Cannot return local 'person' by reference because it is not a ref local
    Person person = new Person();
    return ref person;
  }

  return ref persons[1];
}

Return null

As we already learned, a reference cannot be null. Therefore, a method with a ref return cannot return null. Within the examples so far, we have thrown an exception if the parameter is invalid. But exceptions should be thrown in exceptional cases only. In my opinion it is not an exceptional case if we pass a list with less than two elements to the ‘GetSecond’ method. So, I don’t want to throw an exception; instead, as no list item is found, I want to return an invalid element. For reference types I want to return null and for value types I want to return a default value. But as we have seen, it is neither possible to create a local default value and return it by reference, nor is it possible to return null. It is, however, possible to return a reference to an object instance if the object lives in a higher scope. We can use this possibility and define a default value for an invalid list item.

The following source code shows a corresponding example with a list of value types.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
    new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;
}

static Person gDefaultPerson = new Person();

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

And we can adapt the example for a list of reference types.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
        new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);

  if (person != null) person.mAge = 41;
}

static Person gDefaultPerson = null;

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

But there is one very critical design fault within this implementation: the returned default value can be changed. So, a subsequent method call, or a method call by another client, may return a default value with changed content. This may lead to undefined behavior in the application. But what if we use an immutable object for the default value? This solves the issue and allows us to use this implementation concept. So, you must implement an immutable object and return a reference to this constant object instance. With C# 7.2 it will be possible to use the readonly modifier for structs and ref returns, which will make it even more comfortable to create and use immutable structs.
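
The following minimal sketch shows this idea, assuming C# 7.2; the ‘DefaultablePerson’ struct is hypothetical and only used for illustration. Because the struct is readonly and the method returns ‘ref readonly’, no caller can change the shared default instance.

readonly struct DefaultablePerson
{
  public DefaultablePerson(string name, int age) { mName = name; mAge = age; }

  public readonly string mName;
  public readonly int mAge;
}

static DefaultablePerson gDefaultPerson = new DefaultablePerson("n/a", 0);

static ref readonly DefaultablePerson GetSecond(DefaultablePerson[] persons)
{
  // the shared default instance is safe to hand out,
  // as it is immutable and returned as a readonly reference
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

A caller binds the result with ‘ref readonly DefaultablePerson person = ref GetSecond(persons);’; a write access like ‘person.mAge = 41;’ is then rejected by the compiler.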

‘In’ modifier and ‘Readonly’ modifier for struct and ref return

The code examples of this article were created with C# 7.0. With C# 7.2 you can use two additional features which allow you to write more performant code. These features are the ‘in’ modifier for method parameters and the ‘readonly’ modifier for ref returns and for structs.

Method parameters are often used as input for the method only, so they will not be changed within the method. If you use a struct as method parameter, it is passed by value. In this case the runtime creates a copy of the struct instance and passes the copy to the method. This language design concept allows you to use the method parameter as a method local value without any side effect on the original struct instance outside of the method scope. But of course, this comes with the disadvantage of a performance loss, as it may be expensive to create the copy of the struct.
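
A small sketch of this copy semantics, reusing the illustrative ‘MyStruct’ type from the table above:

struct MyStruct
{
  public int Value;
}

static void Increment(MyStruct x)
{
  // 'x' is a copy of the caller's struct instance
  x.Value++;
}

static void Main(string[] args)
{
  MyStruct m = new MyStruct() { Value = 1 };

  Increment(m);
  Console.WriteLine(m.Value);  // output: '1' - the original instance is unchanged
}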

But do we need a copy at all if we just read the parameter values? Of course not! In this case it would be fine to pass a reference to the original struct, as long as it is guaranteed that the reference is used to read values only. This is exactly the idea of the ‘in’ modifier. As with the ‘out’ and ‘ref’ modifiers, the parameter is passed by reference. Additionally, an ‘in’ parameter becomes read-only, so you cannot assign a new value to the parameter. This is comparable to the ‘pass by const reference’ principle in C++.
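
The following sketch, assuming C# 7.2 and a hypothetical ‘Vector3’ struct, shows an ‘in’ parameter in action:

struct Vector3
{
  public double X, Y, Z;
}

// the struct is passed by reference, but the method may only read it
static double Length(in Vector3 v)
{
  // v.X = 1.0;  // compiler error: an 'in' parameter cannot be assigned
  return Math.Sqrt(v.X * v.X + v.Y * v.Y + v.Z * v.Z);
}

static void Main(string[] args)
{
  Vector3 v = new Vector3() { X = 3, Y = 4, Z = 0 };

  Console.WriteLine(Length(v));  // output: '5'
}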

In theory the ‘in’ modifier is a nice and easy way to improve the performance of method calls with struct parameters. But unfortunately, it isn’t that easy. Depending on the implementation of the struct, the compiler must create a copy of the parameter even if you use the ‘in’ modifier. This procedure is called a ‘defensive copy’. It is used in case the compiler cannot guarantee that the parameter will not be changed inside the method. Of course, the compiler can prevent direct assignments. But if you call a struct method, the compiler may not know whether the member method changes the internal state of the struct. In such situations the defensive copy is created.

To prevent the creation of a defensive copy you can implement an immutable struct. In this case you must use the ‘readonly’ modifier for the struct declaration. A readonly struct cannot be changed; even member methods cannot change the internal state. If you pass such a readonly struct instance as an ‘in’ parameter to a method, the compiler knows that the value stays constant and does not have to create a defensive copy.
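
A minimal sketch of this, assuming C# 7.2; the ‘Vector3R’ struct and the ‘ScaledLength’ method are again just illustrative:

readonly struct Vector3R
{
  public Vector3R(double x, double y, double z) { X = x; Y = y; Z = z; }

  public readonly double X;
  public readonly double Y;
  public readonly double Z;

  public double Length() { return Math.Sqrt(X * X + Y * Y + Z * Z); }
}

static double ScaledLength(in Vector3R v, double factor)
{
  // 'Vector3R' is a readonly struct, so the compiler knows that Length()
  // cannot change the instance and creates no defensive copy
  return v.Length() * factor;
}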

The ‘readonly’ modifier is furthermore available for the ref return value of a method. The reference itself is constant by definition, so this readonly modifier means that the referenced object instance is constant.

Summary

Ref returns and ref locals help to write more performant code, as there is no need to move copies of values between methods. These enhancements are designed for performance-critical algorithms where minimizing memory allocations is a major factor. For the same reason the ‘in’ modifier and the ‘readonly’ modifier for structs were introduced. Passing readonly structs as ‘in’ parameters to methods may increase the application performance.


Pattern Matching in C# 7

Patterns are used to test whether a value matches a specific expectation and, if it matches, to extract information from the value. You already perform such pattern matching by writing if and switch statements: with these statements you test values, and if they match the expectation you extract and use the value’s information.

With C# 7 we got an extension to the syntax of is and case statements. This syntax extension allows you to combine the two steps: testing a value and extracting its information.

Introduction

Let’s start with a basic example to see what we are talking about. The following source code shows how to test whether a value is of a specific type and then use the value for a console output. The code shows the old and the new syntax so you can compare the two implementations. As you can see, the new syntax combines the value testing and the information extraction in one short statement.

static void Main(string[] args)
{
  WriteValueCS7("abc");
  WriteValueCS6(15);
  WriteValueCS7(18.4);
}

static void WriteValueCS7(dynamic x)
{
  //C# 7
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

static void WriteValueCS6(dynamic x)
{
  //C# 6
  if (x is int)
  {
    var i = (int)x;
    Console.WriteLine("integer: " + i);
  }
  else if (x is string)
  {
    var s = x as string;
    Console.WriteLine("string: " + s);
  }
  else
  {
    Console.WriteLine("not supported type");
  }
}

The example shows pattern matching used in an is-expression to do a type check. The new pattern matching syntax is furthermore supported in case-expressions, and it allows three different types of patterns: the type pattern, the const pattern and the var pattern. We will see these different possibilities within the next paragraphs.

Type Pattern

We have already seen the type pattern within the previous example. It is used to check whether a value is of a specific type. If the type matches, a new variable of this type is created and can be used to extract the value information. If a value is null, the type check always returns false. The following source code shows a corresponding example.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

Const Pattern

Pattern matching can be used to check whether the value matches a constant. Within this pattern you cannot create a new variable with the value information, as the value already matches a constant and can be used as it is.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  int d = 17;

  WriteValue(a);  // output: 'const: abc'
  WriteValue(b);  // output: 'const: null'
  WriteValue(c);  // output: 'const: 15'
  WriteValue(d);  // output: 'unknown'
}

static void WriteValue(dynamic x)
{
  if (x is 15) Console.WriteLine("const: 15");
  else if (x is "abc") Console.WriteLine("const: abc");
  else if (x is null) Console.WriteLine("const: null");
  else Console.WriteLine("unknown");
}

Var Pattern

The var pattern is a special case of the type pattern with one major distinction: the pattern matches any value, even if the value is null. Below we see the example previously used for the type pattern, extended with the var pattern.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if (x is var v) Console.WriteLine("not supported type");
}

If we look at this example we may ask two critical questions: why do we have to specify a temporary variable for the var pattern if we don’t use it? And why do we use the var pattern at all if it is the same as the empty (default) else-statement?

The first question is easy to answer: if we use the var pattern and don’t need the target variable, we can use the discard wildcard ‘_’, which was also introduced with C# 7.

The second question is more difficult. As described, the var pattern always matches. So, it represents a default case, which is the empty else in an if-else statement. Therefore, if we just want to write the default else-case, we should not use the var pattern at all. But the var pattern proves to be practical if we want to distinguish between different groups of default cases. The following code shows a corresponding example. It uses more than one var pattern to handle the default case in more detail. As mentioned above, the last var pattern is unnecessary and you could write an empty else instead. I used the var pattern anyway to show you how to use the discard character.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if ((x is var v) && (v == null)) Console.WriteLine("'null' is not supported");
  else if ((x is var o) && (o.GetType().IsPrimitive)) Console.WriteLine("not supported primitive type");
  else if (x is var _) Console.WriteLine("not supported type");
}

Switch-case

At the beginning of the article I mentioned that pattern matching can be used in if-statements and switch-statements. Now we know the three types of patterns and have used them in if-statements. Next we will see how to use the patterns in switch-statements.

So far, the switch-statement supported the const pattern only and was limited to numeric types and the string type. With C# 7 those restrictions have been removed: the switch-statement now supports pattern matching, so all three patterns can be used, and a variable of any type may be used in a switch-statement.

The new possibilities have a side effect which made it necessary to change the behavior of the switch-case-statement. So far, the switch-statement supported the const pattern only, and therefore the case-clauses were unique. With the new pattern matching the case-clauses can overlap and may not be unique anymore. Therefore, the order of the case-clauses matters; for example, the compiler emits an error if a previous clause matches a base type and a following clause matches a derived type, as shown in the sketch below. Because of the possible overlapping case-clauses, each case must end with a break or return. This prevents code execution from ‘falling through’ from one case-clause to the next.
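
A small sketch with hypothetical types illustrates this ordering rule; the second case can never be reached, because the first one already matches every non-null ‘Animal’ instance:

class Animal { }
class Dog : Animal { }

static void Describe(Animal animal)
{
  switch (animal)
  {
    case Animal a: Console.WriteLine("some animal"); break;

    // compiler error: this case has already been handled by a previous case
    case Dog d: Console.WriteLine("a dog"); break;
  }
}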

The following example shows the type pattern used in a switch-case-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Switch-case with predicates

Another feature related to pattern matching is the ability to use predicates within the switch-case-statement: within a case-clause, a when-clause can be used to do more specific checks.

The following source code shows the use case we have already seen in the var pattern example. But this time we use the switch-case-statement with when-clauses instead of the if-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    case var v when v == null: Console.WriteLine("'null' is not supported"); break;
    case var o when o.GetType().IsPrimitive: Console.WriteLine("not supported primitive type"); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Scope of pattern variables

A variable introduced by a type pattern or var pattern in an if-statement is lifted to the outer scope. This leads to strange compiler behavior. On the one hand it is not meaningful to use the variable outside the if-statement, because it may not be initialized. On the other hand, the compiler behavior is different for an if-statement and an else-if-statement. Maybe this strange behavior will be fixed in a future compiler version. The following source code shows a corresponding example with the compiler errors as comments.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");

  Console.WriteLine(i); // error: Use of unassigned local variable 'i'
  i = 15; // ok      

  // s = "abc";  // error: The name 's' does not exist in the current context
  string s = "abc"; // error: 's' cannot be declared in this scope because that name is used in a local or parameter
}

Pattern variables created inside a case-clause are only valid within this case-clause. They are not lifted outside the switch-case scope. In my opinion this leads to a clean separation of concerns, and it would be nice to have the same behavior in if-statements.
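
A short sketch of this scoping rule:

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i:
      Console.WriteLine("integer: " + i);  // 'i' is valid inside this case-clause only
      break;
  }

  // Console.WriteLine(i);  // error: The name 'i' does not exist in the current context
}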

Summary

Pattern matching is a powerful concept. The pattern matching possibilities introduced with C# 7 offer nice ways to write complex if-statements and switch-statements in a clean way. The patterns introduced so far are just some basic ones, and with C# 8 it is planned to add more advanced ones like the recursive pattern, the positional pattern and the property pattern. So, this programming concept is not just syntactic sugar; it will become an important concept in C# and introduces more and more functional programming techniques to the language.


C++17: initializers in if-statement and switch-statement

With C++17 it is possible to initialize a variable inside an if-statement and a switch-statement. We already know and use this concept in the for-statement. To be honest: I don’t like this feature. Within this article I want to introduce the feature and explain my doubts. In the following I will write about the if-statement only, because everything also applies to the switch-statement, so it is sufficient to show one of the two.

The new syntax with the initializer inside the if-statement comes with a big improvement: the variable is moved inside the scope of the if-block. An important software design concept is to use the smallest scope possible, and the new syntax helps to implement according to this design concept.

But you must pay dearly for this advantage. As the initialization moves into the if-statement, initialization and comparison are mixed up. This violates two other software design concepts: ‘separation of concerns’ and ‘keep it simple’. Depending on the complexity of the initialization and the comparison, you may create a very complex if-statement. This may result in hard-to-read and error-prone code. Only if both the initialization and the comparison are very simple may the combination of the two stay simple as well. In all other cases I recommend avoiding the new feature and clearly separating the initialization and the comparison in order to increase the code readability.

Next I want to show some examples. Within the examples a couple of functions are called. If you want to compile the example source code you can use the following implementations of these functions.

int CalcCount() { return 1000; }
int CalcExpectedCount() { return 1000; }
int CalcOldCount() { return 1000; }
bool IsInitialized() { return true; }

Let’s have a look at a simple example. The following source code shows an if-statement with an included variable initialization, and the same if-statement with a separation of the initialization and the comparison. Furthermore, just for fun, I removed the line breaks from the second variant to compare it with the new syntax.

// init inside if
if (int count = CalcCount(); count > 100)
{
    std::cout << "count: " << count << std::endl;
}

// init outside if
int count = CalcCount();

if (count > 100)
{
    std::cout << "count: " << count << std::endl;
}

// init outside if without line break
int count = CalcCount(); if (count > 100)
{
    std::cout << "count: " << count << std::endl;
}

If we compare the first and the second implementation, so if we compare the new and the classical syntax, we could say that the differences are small. In my opinion both variants are easy to read. Even the third one, with classical syntax but without line breaks, may be easy to read, even if it is unusual. You can see that the new syntax isn’t that different from the classical one without the line break; just the if-statement moved to the front. Of course, this little change increases the readability a lot.

So, we must look at a more complex example. Let’s see how things look if we increase the complexity of the initialization but leave the comparison as simple as before.

// init inside if
if (int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2; count > 100)
{
    std::cout << "count: " << count << std::endl;
}

// init outside if
int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2;

if (count > 100)
{
    std::cout << "count: " << count << std::endl;
}

// init outside if, separate different init variants
int count = 0;

if (IsInitialized())
{
    count = CalcCount();
}
else
{
    count = (CalcExpectedCount() + CalcOldCount()) / 2;
}

if (count > 100)
{
    std::cout << "count: " << count << std::endl;
}

First you see the new syntax. In my opinion this if-statement is very hard to read. You have to stop at this line of code, read it several times and look closely to understand the meaning of the code.

The second implementation separates the initialization and the comparison. I think this makes it a little bit easier to read the code.

The third example clearly separates the different concerns. We have a standard initialization, an initialization for fallback cases and a comparison. This source code is easy to read. You don’t have to stop at any line of code and read it again to understand it. The complex initialization and comparison is split into simple parts.

Summary

Initializers in if-statements and switch-statements allow a clear assignment of the variable to the scope of the statement. But mixing the two concerns of initialization and comparison will often result in complex code. Therefore, in my opinion, the new syntax should be used with caution. If the initialization as well as the comparison is short and simple, the resulting combination of both may be simple too, and in this case the new syntax can be used.
