Automated cognitive testing

In this article I would like to share a few basic thoughts on the topic of cognitive testing and clarify what is meant by this kind of testing, what cognitive software systems are, how to test such systems and what cognitive testing tools are.

Cognition

It isn’t easy to find an exact definition for the term “cognition” because it depends on the context. For example, in contrast to cognitive software systems, human cognition involves far more than information processing. So, I want to use a simplified definition which is applicable to man and machine.

Cognition is based on knowledge. This knowledge can be built up by analyzing historical data (learning) and by analyzing current situations (self-observation). A cognitive action starts with an analysis of the current situation and therefore the current information. This information can contain a lot of unstructured data. This data is analyzed based on the knowledge of the cognitive system. The result of this process is an appropriate action or reaction.

A cognitive system accesses many sources and data, combines that data, transforms information, filters and evaluates that information within its context, interacts with other systems (such as a human being), and learns from information, context, and interaction. The system makes something new out of this data, even if the underlying and available information and its relationships are complex.

Cognitive Software Systems

The term “artificial intelligence” has been a popular catchphrase for a long time. But in recent years this worn-out catchphrase has been replaced by the terms “cognitive system” and “cognitive software”.

Cognitive software systems process unstructured data and use artificial intelligence to draw conclusions. Currently this artificial intelligence is often based on neural networks, and the knowledge is built up by machine learning or its special form, deep learning.

We use such intelligent software systems nearly every day. For example, the speech recognition in our cars or smart homes uses technology based on neural networks.

But I think there is a major issue concerning the use of the term “cognitive software”: it is used for a wide range of software system types, starting with simple systems which simulate intelligence and ending with systems which think and act in a human way, for example systems for autonomous driving. Only the second type is real cognitive software. So, in my opinion, the term “cognitive software” is used too often, even for systems which have some simple kind of artificial intelligence, like speech recognition, but which don’t have real cognitive capabilities.

Cognitive Testing

After this short introduction to the term cognition and to cognitive software systems, we can start with the main topic of this article: cognitive testing. So, at first let’s think about the main work steps of a software tester.

The software tester specifies a set of basic tests. These basic tests are very specific and strict. They contain a detailed description of the preconditions, the input data, the execution steps and the expected result. These basic tests are executed for each test candidate. This type of testing is widespread and used in many companies. Of course, the creation of the tests isn’t easy. The tester uses his knowledge, creativity and experience to develop the test cases. So, this working step is strongly based on his cognitive skills. In contrast, the execution of these basic tests is very easy and does not depend on cognitive skills. The work of a tester is often supported by software tools. You will find a lot of tools for the execution of basic tests. But interestingly it is very difficult to find tools which strongly support the development of tests. Maybe because this part is strongly based on cognitive skills?

There is a second type of testing which is called “free testing” or “explorative testing”. This type of testing is rarely used because it is time consuming and therefore expensive. Explorative or free testing means the tester does not work according to a fixed test specification. Instead he uses his knowledge about the system under test and executes new test sequences without prior planning. Of course, these new sequences must be documented, and the tester must evaluate the result carefully because the expectation about the result depends on the knowledge of the tester. Therefore, such free testing can be very difficult even for an experienced software tester. This work, again, strongly depends on cognitive skills, so it is no big surprise to find nearly no software tools for explorative testing.

Cognitive Testing Tools

The complexity of software systems is constantly rising. Therefore, there is a need for intelligent testing tools which strongly support the tester during development and execution of software tests. As a result of the rising complexity of the software under test, the testing tools must become smarter too. So, we need testing tools with artificial intelligence and cognitive capabilities.

Such testing tools must offer some kind of intelligence, for example based on neural networks. Therefore, a new and important part of the daily routine of a tester will be to train these systems. This training must be done with use-case-specific data to create a wide and deep knowledge about the software under test. A well-trained testing tool will be able to support the tester during the development and execution of test cases, or it will even be able to do this work alone without the need of a tester.

Testing of cognitive systems

As mentioned before, the complexity of software systems is constantly rising. So, we can expect the same growth for intelligent software systems. Over time the number of these software systems will rise, and they will evolve from systems with simple artificial intelligence to systems with real cognitive capabilities.

Nowadays, most software systems are complex but have a well-defined behavior. Furthermore, it is rather rare to execute explorative tests. Therefore, the development and execution of test cases isn’t that complex, but it is a time-consuming task. As a result, there are a lot of powerful testing tools which support the work of the tester.

In the future, the number of intelligent or even cognitive software systems will increase. This will lead to a big change in the testing process. On the one hand the development of the basic test cases will get more complex, and on the other hand the importance of explorative tests will increase sharply. Today’s testing tools are not suitable for this new testing process. Cognitive software systems are a new kind of software which will lead to a new testing process and which will need a new type of tools: cognitive testing tools.

Current Situation

Software systems with artificial intelligence keep moving into more and more areas. But the current systems are limited to specific and simple use cases and are far away from real cognitive software systems. It will take a few more years before we see and use cognitive software systems in our daily lives. Therefore, there is nearly no need for cognitive testing tools yet. But over the next years the systems with artificial intelligence will increase in number and complexity. This will lead to an evolution of the testing tools too. They will also become smarter, and as soon as we start to use real cognitive software systems, we will need cognitive testing tools as well.


String View in C++17 (std::string_view)

The C++ string (std::string) is a thin wrapper which stores its data on the heap. When you deal with strings, it happens very often that a copy of a string must be created and therefore a memory allocation is done. But many use cases do not require a copy at all. In these situations, you want to analyze the string or do some calculations based on it, but you don’t want to change its content.

With C++17 the string view (std::string_view) was introduced. The string view is designed for use cases where a non-owning reference to a string is needed. It represents a view of a sequence of characters. This sequence of characters can be a C++ string or a C string. A string view offers nearly the same methods as a standard string. The following example shows how to create and use a string view.

#include <iostream>
#include <string>
#include <string_view>

int main()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	std::string_view substr = strview.substr(strview.find('B'));

	std::cout << substr << '\n';
}

A typical implementation of a string view holds only two members: a pointer to a constant character array and a size. Therefore, it is quite cheap to copy a string view. The main purpose of a string view is to avoid copying data if only a non-mutating view is required. So, the main reason for string view is performance.
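
To illustrate this idea, here is a minimal sketch of such a view type (a hypothetical BasicView class showing just the concept, not the actual library implementation):

#include <cstddef>
#include <cstring>

// minimal sketch: a non-owning pointer into someone else's character data plus a length
class BasicView
{
public:
	BasicView(const char* data, std::size_t size) : mData(data), mSize(size) {}
	BasicView(const char* cstr) : mData(cstr), mSize(std::strlen(cstr)) {}

	const char* data() const { return mData; }
	std::size_t size() const { return mSize; }

	// substr only adjusts pointer and size, no copy and no allocation
	BasicView substr(std::size_t pos, std::size_t count) const
	{
		return BasicView(mData + pos, count);
	}

private:
	const char* mData;	// pointer to constant character data, not owned
	std::size_t mSize;	// number of characters in the view
};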

The substr method is a good example to show the performance difference between string and string view. Both offer a substr method to get a substring, but with very different costs. The substr method of the string has linear complexity because it copies the characters, so its cost depends directly on the size of the substring. The string view method has constant complexity because it only adjusts the pointer and the size, so it is independent of the content size. So, if you have to deal with large strings and substrings you may get a huge performance gain by using string view.
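
A small sketch of the difference (the helper function Compare is just an illustration):

void Compare(const std::string& text)
{
	// std::string::substr copies the characters of the substring:
	// linear complexity and a memory allocation
	std::string s1 = text.substr(0);

	// std::string_view::substr only adjusts pointer and size:
	// constant complexity, no allocation
	std::string_view view(text);
	std::string_view s2 = view.substr(0);
}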

But if we speak about performance, we should never make a general statement like: “string view is always more performant than string”. Standard library implementations are very smart when handling strings, especially when strings are short. Such a “small string optimization” is done for short strings. For example, in MSVC and GCC strings with a size of up to 15 characters are stored directly inside the string object and not on the heap. Because of these optimizations, in some use cases there might be no advantage if you use string view instead of string.

Non-owning reference

The string view holds a non-owning reference to a character array. Therefore, the lifetime of the referenced object must be longer than the lifetime of the string view. Otherwise you get undefined behavior. The following code shows an erroneous implementation. The returned string view references a string which is valid within the function scope only. So, the use of the string view within the main function results in undefined behavior.

std::string_view Create()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	return strview;
}

int main()
{
	std::string_view strview = Create();

	std::cout << strview << '\n';
}

Not null-terminated

There is one major pitfall if you use a string view: the string view content may not be null-terminated. This becomes relevant if you want to use functions like “atoi” or “printf” which expect a null-terminated C string. You can easily get the content of the string view by its “data()” method. It returns a pointer to the underlying character array. But this character array may not be null-terminated. Unfortunately, the string view does not have a “c_str()” method like the string class. And to confuse us completely, the “data()” method of the string view may behave differently from the “data()” method of the string, which returns a null-terminated character array in C++11 and later. In my opinion it isn’t a good choice to have such different behaviors in string and string view, as they have nearly equal interfaces and as a result users expect equal behavior.

If you have to get the underlying character array of a string view as a null-terminated string, you have to explicitly convert it to a string so you can use the “c_str()” method. The following example shows a corresponding use case. The first string view references a null-terminated character array, so we can pass it to the “strlen” function. The second string view is not null-terminated, so the “strlen” call results in undefined behavior. The third call shows the conversion to a null-terminated string.

#include <cstring>
#include <iostream>
#include <string>
#include <string_view>

int main()
{
	// null-terminated string
	std::string str = "FooBar";
	std::string_view strview(str);

	std::cout << std::strlen(strview.data()) << '\n';

	// not null-terminated
	char str2[6] = { 'F','o','o','B','a','r' };
	std::string_view strview2(str2, sizeof str2);

	std::cout << std::strlen(strview2.data()) << '\n';	// undefined behavior

	// convert to a null-terminated string
	std::cout << std::strlen(std::string(strview2).c_str()) << '\n';
}

Use string view as method parameter

After this short excursion into the risks of using string view, we come back to its strengths. As we have learned so far, the string view may be favorable whenever we only need read access to string content and want to avoid a copy. So, it may be a perfect choice for a read-only function parameter. In the following we will compare three variants of a function implementation. Within these variants the string is passed as C string, C++ string or string view.

Let’s start with a C string. To keep it simple, the function only performs a character lookup, as an example of accessing the input parameter.

void AnalyzeCharArray(const char* s)
{
	auto x = strchr(s, 'B');
}

int main()
{
	char arr[] = "FooBar";
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c string
	AnalyzeCharArray(arr);
	AnalyzeCharArray(str.c_str());
	AnalyzeCharArray(std::string(strview).c_str());
}

This implementation has some disadvantages. We must use “c_str()” if the function is called with a std::string. And we must construct a null-terminated std::string if the function is called with a string view.

So, let’s change the method and use a C++ string.

void AnalyzeStdString(const std::string& s)
{ 
	auto x = s.substr(s.find('B')); 
}

int main()
{
	char arr[] = "FooBar";
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c++ string
	AnalyzeStdString(arr);
	AnalyzeStdString(str);
	AnalyzeStdString(std::string(strview));
}

That looks better. But there are still some downsides. If the function is called with a string view, a conversion to string is necessary. Furthermore, there are additional memory allocations: the conversion to a C++ string if the function is called with a C string, and the substr call itself.

As a third alternative we use a string view as the function parameter.

void AnalyzeStringView(const std::string_view s)
{
	auto x = s.substr(s.find('B'));
}

int main()
{
	char arr[] = "FooBar";
	std::string str = "FooBar";
	std::string_view strview(str);

	// with string view
	AnalyzeStringView(arr);
	AnalyzeStringView(str);
	AnalyzeStringView(strview);
}

This seems to be a good choice. The function can be called directly with a char array, a string or a string view, and there is no additional memory allocation. Furthermore, the string view offers the same member functions as a string and can be used with algorithms.

So, a string view as function parameter is the best choice in such use cases.

Use string view as return value

Of course, the string view can be used as a return value too. As shown previously, you just have to be careful with the lifetime of the referenced object. The following source code shows a corresponding example.

std::string_view GetSubstring(const std::string_view s)
{
	return s.substr(s.find('B'));
}

int main()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	std::cout << GetSubstring(strview) << '\n';
}

Summary

The string view represents a view of a sequence of characters. As the string view is a lightweight object which does not own its data, it solves some performance issues of the standard string. Both types have a nearly identical interface, so you can easily exchange them. There are two major pitfalls if you use a string view. First, you must think about the lifetime of the referenced object. As the string view holds a non-owning reference, the lifetime of the referenced object must be longer than the lifetime of the string view. Second, the string view may refer to a character sequence which is not null-terminated. So, you may have to convert it to a C string if you want to pass it to a function which expects a null-terminated character array.


“if constexpr” in C++17 (static if)

With C++17 the “if constexpr” statement was introduced. This so-called “static if” or “compile-time if” can be used to conditionally compile code. The feature allows discarding branches of an if statement at compile time based on a constant expression condition.

if constexpr(condition)
	statement1; 
else
	statement2;

Depending on the condition, either statement1 or statement2 is discarded at compile time. A discarded statement inside a template is not instantiated. Therefore, this feature is mainly used in templates. It allows compiling specific statements only, depending on the template type. This can greatly simplify template code as it becomes possible to express intent similarly to “run-time” code. Later we will see an example where static-if is used instead of template specialization.

We already have a feature for conditional code compilation: the “#ifdef” directive. So, will static-if replace this directive? No, it will not, as these two mechanisms are not identical. Both conditionally compile code, but “#ifdef” does this based on conditions that can be evaluated at preprocessing time. For example, #ifdef cannot be used to conditionally compile code depending on the value of a template parameter. On the other hand, static-if cannot be used to discard syntactically invalid code, while “#ifdef” can. So, there are use cases where you can use either of them and there are use cases which are specific to one of them.
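
A small sketch of the difference; the template parameter N and the preprocessor symbol ENABLE_LOGGING are only assumptions for this illustration:

#include <iostream>

template <int N>
void Describe()
{
	// decided at compile time, based on the template parameter
	if constexpr (N > 10)
		std::cout << "large" << std::endl;
	else
		std::cout << "small" << std::endl;

#ifdef ENABLE_LOGGING
	// decided at preprocessing time, based on a defined symbol;
	// a template parameter cannot be used here
	std::cout << "logging enabled" << std::endl;
#endif
}

int main()
{
	Describe<5>();	// output: small
	Describe<42>();	// output: large
}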

Example

As mentioned before, the static-if feature is very interesting for template implementations, for example as an alternative to template specialization. The following example shows a template implementation with specific code depending on the template type.

#include <iostream>
#include <string>
#include <type_traits>

template <typename T>
void PrintInfo(T x)
{
	if constexpr (std::is_same_v<T, std::string>)
	{
		std::cout << "string with length " << x.length() << std::endl;
	}
	else if constexpr (std::is_same_v<T, int>)
	{
		std::cout << "int" << std::endl;
	}
	else
	{
		std::cout << "some other type" << std::endl;
	}
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);	
}

In the following you will see an equivalent implementation without static-if. In this case template specialization is used.

template <typename T>
void PrintInfo(T x)
{
	std::cout << "some other type" << std::endl;
}

template <>
void PrintInfo(std::string x)
{
	std::cout << "string with length " << x.length() << std::endl;
}

template <>
void PrintInfo(int x)
{
	std::cout << "int" << std::endl;
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);	
}

If you compare the two implementations, you may say that the one with template specialization is easier to read and to understand. Of course, most real implementations will be more complex, so it isn’t possible to say in general which of the two concepts is favorable. In my opinion it depends on the use case. Therefore, static-if will not replace existing concepts like template specialization.

Compile-time discard

At the beginning of the article I mentioned that static-if performs a conditional discard and that a discarded statement inside a template is not instantiated. What does this mean? And what is the difference between static-if inside and outside of templates?

The following example shows the template implementation we have seen previously, but this time the static-if is replaced with a normal if statement.

template <typename T>
void PrintInfo(T x)
{
	if (std::is_same_v<T, std::string>)
	{
		std::cout << "string with length " << x.length() << std::endl;
	}
	else if (std::is_same_v<T, int>)
	{
		std::cout << "int" << std::endl;
	}
	else
	{
		std::cout << "some other type" << std::endl;
	}
}

int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;

	PrintInfo(val1);
	PrintInfo(val2);
	PrintInfo(val3);
}

Of course, this code is invalid and results in a compiler error because the “int” type does not have a “length” method. But if we use static-if, the example compiles, because the discarded branches are not instantiated at all.

If we use static-if outside of a template, we see a different behavior. There is no template instantiation involved, so the discarded branch is still fully checked by the compiler and must be valid code. The following code shows a simple example with an undeclared identifier within the discarded branch. The code within the template compiles, but the same code outside the template produces the corresponding compiler error.

template <typename T>
void DoSomething(T x)
{
	if constexpr(true)
	{
		std::cout << x << std::endl;
	}
	else
	{
		std::cout << y << std::endl;	// OK as code is not instantiated at all
	}
}

int main()
{
	int x = 2;

	DoSomething(x);

	if constexpr(true)
	{
		std::cout << x << std::endl;
	}
	else
	{
		std::cout << y << std::endl;	// error C2065: 'y': undeclared identifier
	}
}

static-if vs. template specialization

We have already seen a comparison between static-if and template specialization in the short example at the beginning. Let’s have a look at a more complex example to get a better feeling for the differences between the two concepts.

Let’s say we have the following use case. We have to implement a calculation which consists of three steps: a preparation, a transformation and a result creation. The three steps are already implemented, so we use a given interface. The preparation and result creation steps are available as type-specific functions. Therefore, we want to implement the calculation function as a template and call the according type-specific variants.

The following listing contains the given interface as a dummy implementation and the template implementation. The template is implemented in two variants: one uses static-if and the other one uses template specialization.

#include <iostream>
#include <string>
#include <type_traits>

// some given interface
int PrepareByString(std::string x) { return static_cast<int>(x.length()); }
int PrepareByInt(int x) { return x - 10; };
void Transform(int* x) { *x = *x + 5; }
std::string CreateStringResult(int x) { return std::to_string(x); }
int CreateIntResult(int x) { return x + 5; }

// template with constexpr
template <typename T>
T Calculate(T x)
{
	int temp = 0;
	if constexpr (std::is_same_v<T, std::string>) temp = PrepareByString(x);
	else if constexpr (std::is_same_v<T, int>) temp = PrepareByInt(x);
	else return x;

	Transform(&temp);

	if constexpr (std::is_same_v<T, std::string>) return CreateStringResult(temp);
	else if constexpr (std::is_same_v<T, int>) return CreateIntResult(temp);
}

// template with specialization
// (shown for comparison only; in a real program only one of the two
//  Calculate variants may be defined)
template <typename T>
T Calculate(T x)
{
	return x;
}

template <>
std::string Calculate(std::string x)
{
	int temp = PrepareByString(x);

	Transform(&temp);

	return CreateStringResult(temp);
}

template <>
int Calculate(int x)
{
	int temp = PrepareByInt(x);

	Transform(&temp);

	return CreateIntResult(temp);
}

// main function
int main()
{
	std::string val1 = "foo";
	int val2 = 42;
	double val3 = 5.8;
	
	std::cout << Calculate(val1) << std::endl;
	std::cout << Calculate(val2) << std::endl;
	std::cout << Calculate(val3) << std::endl;
}

Based on this short example we can see the advantages and disadvantages of both solutions. We have implemented a fixed calculation algorithm which is the same for all data types. If we use template specialization and implement each data type separately, we must duplicate this algorithm. And of course, duplicated code comes with the well-known disadvantages. By using static-if we have to implement the algorithm only once, but we had to add two if-statement chains. So instead of a straight procedure we get code branches. Therefore, the complexity of the single calculation function increases, but the complexity of the template as a whole is reduced, as it contains one function only instead of three.

This is still a simple example with a few lines of code, but it shows the main difference between the concepts and the resulting code. Whether you use one or the other concept may be use-case specific. And of course, there are many other alternatives, or you can even mix several concepts. In summary, I recommend the use of static-if in templates. It often improves the code quality as it makes the source code easier to read and to maintain.


Auto Type Deduction in Range-Based For Loops

Range-based For Loops offer a nice way to loop over the elements of a container. In combination with Auto Type Deduction the source code will become very clean and easy to read and write.

for (auto element : container) 
{ 
    // ...
}

The Auto Type Deduction is available in different variants:

  • auto
  • const auto
  • auto&
  • const auto&
  • auto&&
  • const auto&&
  • decltype(auto)

Of course, these variants result in different behaviors and you should choose the right one according to your needs. In the following I will give a short overview of these variants and explain their behavior and typical use case.

auto

This creates a copy of the container element. This variant is used if you want to get and modify the element content, for example to pass it to a function, but leave the original container element as it is.

const auto

Like the first variant, this creates a copy of the element content. But this time the copy is constant and cannot be changed. In most cases “const auto” isn’t a good choice. If you want to work with an immutable value you can use “const auto&” as well and don’t have to create a copy. There are only a few use cases for this variant. For example, it could be useful in multithreading scenarios. Let’s say you want to use the element several times within the loop body, but in parallel another thread may change the container element. By using “const auto” you create a copy of the element and can use this copy several times. If you use “const auto&” you will see the updated element instead. So, there are scenarios where “const auto” and “const auto&” produce different results. Therefore, there is a need for “const auto” even if it is used very rarely.
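
The following single-threaded sketch shows the same effect; here the element is changed inside the loop body instead of by another thread:

#include <iostream>
#include <vector>

int main()
{
	std::vector<int> values = { 1, 2, 3 };

	for (const auto value : values)
	{
		values[0] = 99;             // simulate a change made "from outside"
		std::cout << value << ' ';  // output: 1 2 3 (the copy was taken before the change)
	}
	std::cout << '\n';

	values = { 1, 2, 3 };

	for (const auto& value : values)
	{
		values[0] = 99;
		std::cout << value << ' ';  // output: 99 2 3 (the reference shows the change)
	}
	std::cout << '\n';
}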

auto&

This will create a reference to the original container element. So, it is used in case you want to modify the container content.

const auto&

This creates a constant reference to the original container element. So that’s the perfect choice if you need read-only access to the elements.

auto&&

Like “auto&”, this variant with the double “&” is used if you want to modify the original container elements. There are some special cases where it isn’t possible to use the normal “auto&” variant. For example, a loop over “std::vector<bool>” yields a temporary proxy object, which cannot bind to an lvalue reference (auto&). In such cases “auto&&” can be used. It is a forwarding reference: if it is initialized with an lvalue, it becomes an lvalue reference, and if it is initialized with an rvalue, it becomes an rvalue reference. As a result, “auto&&” is a good candidate for generic code and therefore it is most often used in templates.
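
A minimal example of the “std::vector<bool>” case; the commented-out loop shows the variant that does not compile:

#include <iostream>
#include <vector>

int main()
{
	std::vector<bool> flags = { true, false, true };

	// for (auto& flag : flags)	// does not compile: the temporary proxy object
	//				// cannot bind to an lvalue reference
	for (auto&& flag : flags)	// the forwarding reference binds to the proxy
	{
		flag = true;		// modifies the element inside the vector
	}

	for (bool flag : flags)
	{
		std::cout << flag << ' ';	// output: 1 1 1
	}
	std::cout << '\n';
}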

Of course, as the forwarding reference “auto&&” covers the use cases of the standard reference “auto&”, we may ask ourselves whether we should always use the variant with the double “&”. But I would not recommend this. The syntax with the double “&” is more confusing. Developers are familiar with the standard reference syntax and expect the forwarding reference syntax in special cases only. So, I recommend using “auto&” outside of templates and “auto&&” within templates.

const auto&&

This creates a read-only forwarding reference which binds to rvalues only. For read-only access, a plain “const auto&” already works for containers which yield a temporary proxy object, so in contrast to write access we don’t have to use the double “&” variant for containers like “std::vector<bool>”. There are only a few theoretical use cases for “const auto&&” and therefore you will normally not use this variant in your applications.

decltype(auto)

This variant should not be used in range-based for loops. “decltype(auto)” is primarily useful for deducing the return type of forwarding functions. Using it to declare a local variable is an antipattern. Therefore, I don’t want to go into detail about “decltype” and just mention it for completeness. Even if the compiler allows you to write “decltype(auto)”, you should not use it in range-based for loops.

Summary

  • Use “auto” when you want to work with a copy of the elements
  • Use “auto&” when you want to modify elements
  • Use “auto&&” when you want to modify elements in generic code
  • Use “const auto&” when you want read-only access to elements
  • Use “const auto” in multithreading scenarios when you need read-only access to elements that may be changed concurrently

C# Protected Internal vs Private Protected

C# offers the composed access modifier “protected internal”. With C# 7.2 a new composed access modifier was added: “private protected”. Unfortunately, these modifiers are hard to understand as their names don’t reflect their meaning. In this article I want to explain the two modifiers and their technical background.

Within the CLR you will find the single access modifiers “Family” and “Assembly”. “Family” means that the defining type or a derived type has access. In C# and many other programming languages this “Family” modifier is implemented with the “protected” keyword. The CLR “Assembly” modifier means that a member is accessible by everyone within the defining assembly. The corresponding implementation in C# is the “internal” keyword. So far, as long as we use the single modifiers, it is quite easy. But what if we combine those two modifiers?

Within the CLR it stays simple. It offers two compound access modifiers: “Family and Assembly” and “Family or Assembly”. It combines the single modifiers by “and” to create an intersection or by “or” to create a union.

“Family and Assembly” allows access from types within the defining assembly only if they are derived types.

“Family or Assembly” allows access from every type within the defining assembly and additionally from any derived type outside of the assembly.

In C# things become difficult because an awkward syntax was chosen. At first, support for the “Family or Assembly” CLR access modifier was added to C#. But the C# syntax was not “protected or internal”, it was simply “protected internal” without any connecting keyword. I think that was a good choice as it keeps things simple; a connecting keyword would be disturbing and unnecessary.

But with C# 7.2 the “Family and Assembly” CLR access modifier was to become part of C# as well. Now the language designers had the issue that the “protected internal” modifier without a connecting keyword was already part of the language. So how should they name the new modifier? If they had chosen a syntax like “protected and internal”, it would have been easy to understand, but with the downside that the existing “protected internal” could then implicitly be read as “protected AND internal” as well, and therefore with the risk that developers use the wrong modifier by mistake, as both would have nearly the same syntax.

So, the language designers decided to use the syntax “private protected”. This is supposed to mean: we have the well-known protected relationship between base and derived type, and additionally the “private” keyword means that this relationship is limited to derived types within the same assembly. In my opinion that wasn’t a good decision. Of course, the composed keywords are now distinct and cannot be mixed up by mistake, but they are awkward and confusing as they don’t reflect their meaning.

But of course, we should not criticize this decision too much, because it was a decision between bad alternatives only. There was no possibility to add the “Family and Assembly” feature in a clean way. The crucial mistake was already made when the “Family or Assembly” feature was added to C#: the language designers chose a syntax without considering the possibility that the “Family and Assembly” feature might be added in the future.
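
To summarize the resulting access rules, here is a small sketch with hypothetical classes; Base and SameAssemblyNotDerived are assumed to live in one assembly, DerivedInOtherAssembly in another:

// assembly A
public class Base
{
  protected internal int FamilyOrAssembly;   // accessible within assembly A OR from derived classes anywhere
  private protected int FamilyAndAssembly;   // accessible from derived classes only AND only within assembly A
}

public class SameAssemblyNotDerived
{
  void Access(Base b)
  {
    int x = b.FamilyOrAssembly;      // OK: same assembly is enough
    // int y = b.FamilyAndAssembly;  // error: not a derived class
  }
}

// assembly B
public class DerivedInOtherAssembly : Base
{
  void Access()
  {
    int x = FamilyOrAssembly;        // OK: derived class is enough
    // int y = FamilyAndAssembly;    // error: derived class, but defined in another assembly
  }
}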


Ref Return and Ref Locals in C# 7

The C# language has supported passing arguments by value or by reference since the first language version. But returning a value was possible by value only. This has changed in C# 7 with two new features: ref returns and ref locals. With these features it is possible to return by reference. ‘Ref return’ allows returning an alias to an existing variable and ‘ref local’ can store this alias in a local variable.

The main goal of this language extension is to allow developers to pass around references to value types instead of copies of the values. This is important when working with large data structures implemented as value types. Of course, the new feature can be used with reference types too, but reference types are already returned as pointers and there is no advantage in returning them as a reference to a pointer.

Return by value

The following source code shows an example of a return by value. The function returns the second element of the array. As it is a return by value, a copy of the element is returned. A modification of the returned element is done on the copy and not on the original object instance.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  // output:
  // 'John Doe (31)'
  // 'Jane Doe (27)'
  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

struct Person
{
  public string mName;
  public int mAge;
}

static Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return persons[1];
}

If you want to find and modify the original element, you would have to implement the method in a different way, for example return the found index and use this index to modify the original array. With the new ‘ref return’ feature it becomes possible to implement the needed behavior very easily. You just have to change the return from a value to a reference.

Return by reference

The following source code shows the adapted example. This time the found item is returned as a reference. The reference is stored within a ref local variable. Changes made to this variable are made to the referenced object instance. So the original array item is changed.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (41)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Use a List instead of an Array

Within the previous example I used an array of struct objects. What do you think will happen if we change to another container type, for example to a list?

static void Main(string[] args)
{
  List<Person> persons = new List<Person>
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(List<Person> persons)
{
  if (persons.Count < 2) throw new ArgumentException();

  // error CS8156: An expression cannot be used in this context
  // because it may not be passed or returned by reference
  return ref persons[1];
}

It is no longer possible to compile the code. The ‘return ref persons[1]’ statement results in a compiler error. That’s because of the different implementations of the array indexer and the list indexer. The array indexer returns a reference to the array element, so we can use this reference as return value. The list indexer instead returns a copy of the value. As it isn’t allowed to return the indexer expression itself nor the returned temporary local variable, the compiler shows a corresponding error message.

Ref local

Within the example application we have stored the returned reference in a ‘ref local’ variable. A ref local variable is an alias for the original object instance. It is initialized by the ref return value. The reference itself is constant after this initialization. Therefore, an assignment to a ref local will not change the reference but will change the content of the referenced object.

The following source code shows a corresponding example.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref persons[1];
  person.mAge = 41;
  person = persons[0];
  person.mAge = 51;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'John Doe (51)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

At first the ref local is initialized with a reference to the second array item. Then the age is changed to 41. Now comes another assignment which is very interesting. ‘persons[0]’ returns the first array item. But an assignment to our ref local variable does not change the reference. The reference is set during initialization and stays constant. The assignment changes the value of the referenced object. Therefore, the second array item, which is referenced by the ref local variable, is overwritten with the values stored in the first array item: ‘John Doe’ and ‘31’. Next we set the age to ‘51’. So, the output is as shown in the comment within the example application. The first array item is not changed at all, and the second array item was updated with the name stored in the first item and the age set by the last assignment.

Ref vs. Pointer

The previous example and this topic in general may raise the question about the difference between a reference and a pointer. So, we should take a minute and think about this question as it is important for a deep understanding of the ref local and ref return mechanism.

Briefly summarized you can say: references and pointers both refer to an object instance. But references are constant after initialization, whereas pointers can be changed.

This single difference is an important one as it results in different characteristics and possibilities of the two concepts. In the following I want to mention some important ones. As references are constant, they cannot be null. Pointers instead can be reassigned and as a consequence they can also be set to null. You can have pointers to pointers and create extra levels of indirection, whereas references offer only one level of indirection. As pointers can be reassigned, various arithmetic operations can be performed on them, which is called ‘pointer arithmetic’. It is easier to work with references as they cannot be null and you don’t have to think about indirection. But it is not safer to work with references, because pointers as well as references can refer to invalid objects or memory locations.

Please look at the following functions and think about the parameter kind: is it a reference, a pointer or a value copy?

  • void Foo(MyStruct x): copy of the value passed by the caller
  • void Foo(MyClass x): pointer to the original object instance which is available on caller level
  • void Foo(ref MyStruct x): reference to the original object instance which is available on caller level
  • void Foo(ref MyClass x): reference to the pointer to the original object instance which is available on caller level

Often C# developers say the last one is a “pointer to a pointer”, but in fact it is a reference to a pointer. As you know after reading the above comparison of the two concepts, that’s a small but important difference.

Use ref return without ref local

You can define a method with ref return and assign the result to a variable which is not a ref local. In this case the content referenced by the returned reference is copied into the local variable. So your local variable is a copy of the original array item and changes will therefore not affect the array item.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (27)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Ref return of method local variable

A ref return creates a reference to an existing object instance. If the lifetime of the object instance is shorter than the lifetime of the reference, the reference will refer to an invalid object or memory location. This would result in critical runtime errors. Therefore, the referenced object instance must be in a higher scope or in the same scope as the ref local variable. You cannot create a method-local object instance and return a reference to it. The following source code shows a corresponding example with the compiler error messages as comments.

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2)
  {
    // error: 
    // An expression cannot be used in this context 
    // because it may not be passed or returned by reference
    return ref new Person();

    // error:
    // Cannot return local 'person' by reference because it is not a ref local
    Person person = new Person();
    return ref person;
  }

  return ref persons[1];
}

Return null

As we already learned, a reference cannot be null. Therefore, a method with ref return cannot return null. Within the examples so far, we have thrown an exception if the parameter is invalid. But exceptions should be thrown in exceptional cases only. In my opinion it is not an exceptional case if we pass an array with fewer than two elements to the ‘GetSecond’ method. So, I don’t want to throw an exception; if no item is found I want to return an invalid element instead. For reference types I want to return null and for value types I want to return a default value. But as we have seen, it is neither possible to create a local default value and return it by reference nor to return null. However, it is possible to return a reference to an object instance in a higher scope. We can use this possibility and define a default value for an invalid item.

The following source code shows a corresponding example with an array of value types.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
    new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;
}

static Person gDefaultPerson = new Person();

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

And we can adapt the example for an array of reference types.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
        new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);

  if (person != null) person.mAge = 41;
}

static Person gDefaultPerson = null;

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

But there is one very critical design flaw within this implementation: the returned default value can be changed. So, the next method call, or a method call by another client, may return a default value with changed content. This may lead to undefined behavior in the application. But what if we use an immutable object for the default value? This solves the issue and allows us to use this implementation concept. So, you must implement an immutable object and return a reference to this constant object instance. With C# 7.2 it is possible to use the readonly modifier for structs and ref returns. This makes it even more comfortable to create and use immutable structs.

‘In’ modifier and ‘readonly’ modifier for structs and ref returns

The code examples of this article were created with C# 7.0. With C# 7.2 you can use two additional features which allow you to write more performant code. These features are the ‘in’ modifier for method parameters and the ‘readonly’ modifier for ref returns and for structs.

Method parameters are often used as pure input for the method, so they are not changed within the method. If you use a struct as method parameter it is passed by value. In this case the runtime creates a copy of the struct instance and passes the copy to the method. This language design allows using the method parameter as a method-local value without any side effect on the original struct instance outside of the method scope. But of course, this comes with the disadvantage of a performance loss, as it may be expensive to create the copy of the struct.

But do we need a copy at all if we just read the parameter values? Of course not! In this case it would be fine to pass a reference to the original struct. But it must be guaranteed that it is used to read values only. This is exactly the idea of the ‘in’ modifier. Just like with the ‘out’ and ‘ref’ modifiers, the parameter is passed as a reference. Additionally, an ‘in’ parameter becomes read-only, so you cannot assign a new value to the parameter. This is comparable to the “pass by const reference” principle in C++.

In theory the ‘in’ modifier is a nice and easy way to improve the performance of method calls with struct parameters. But unfortunately, it isn’t that easy. Depending on the implementation of the struct, the compiler must create a copy of the parameter even if you use the ‘in’ modifier. This is called a ‘defensive copy’. It is created in case the compiler cannot guarantee that the parameter will not be changed inside the method. Of course, the compiler can prevent direct assignments. But if you call a struct method, the compiler may not know whether the member method changes the internal state of the struct. In such situations the defensive copy is created.

To prevent the creation of a defensive copy you can implement an immutable struct. In this case you use the ‘readonly’ modifier in the struct declaration. A readonly struct cannot be changed; even member methods cannot change the internal state. If you pass such a readonly struct instance as an ‘in’ parameter to a method, the compiler knows that the value stays constant and does not have to create a defensive copy.
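
A short sketch of the two modifiers, using a hypothetical Vector struct:

using System;

// a readonly struct cannot be changed after construction,
// even member methods cannot modify the internal state
readonly struct Vector
{
  public Vector(double x, double y) { X = x; Y = y; }

  public double X { get; }
  public double Y { get; }

  public double Length() => Math.Sqrt(X * X + Y * Y);
}

static class Geometry
{
  // 'in' parameter: passed by reference, but read-only inside the method;
  // as Vector is a readonly struct, the call v.Length() needs no defensive copy
  public static bool IsLongerThan(in Vector v, double limit)
  {
    // v = new Vector(0, 0);  // error: cannot assign to an 'in' parameter
    return v.Length() > limit;
  }
}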

The ‘readonly’ modifier is moreover available for the ref return value of a method. The reference itself is constant by definition, so this readonly modifier means that the referenced object instance cannot be modified through the returned reference.

Summary

Ref returns and ref locals help to write more performant code as there is no need to pass copies of values between methods. These enhancements are designed for performance-critical algorithms where minimizing memory allocations and copies is a major factor. For the same reason the ‘in’ modifier and the ‘readonly’ modifier for structs were introduced. Passing readonly structs as ‘in’ parameters to methods may increase the application performance.


Pattern Matching in C# 7

Patterns are used to test whether a value matches a specific expectation, and if it matches, they allow extracting information from the value. You already create such pattern matching by writing if and switch statements: you test values, and if they match the expectation, you extract and use the value’s information.

With C# 7 we got an extension of the syntax for is and case expressions. This syntax extension allows combining the two steps: testing a value and extracting its information.

Introduction

Let’s start with a basic example to see what we are talking about. The following source code shows how to test whether a value is of a specific type and then use the value for a console output. The code shows the old and the new syntax so you can compare the two implementations. As you can see, the new syntax combines the value test and the information extraction in one short statement.

static void Main(string[] args)
{
  WriteValueCS7("abc");
  WriteValueCS6(15);
  WriteValueCS7(18.4);
}

static void WriteValueCS7(dynamic x)
{
  //C# 7
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

static void WriteValueCS6(dynamic x)
{
  //C# 6
  if (x is int)
  {
    var i = (int)x;
    Console.WriteLine("integer: " + i);
  }
  else if (x is string)
  {
    var s = x as string;
    Console.WriteLine("string: " + s);
  }
  else
  {
    Console.WriteLine("not supported type");
  }
}

The example shows pattern matching used in an is-expression to do a type check. The new pattern matching syntax is furthermore supported in case expressions, and it allows three different types of patterns: the type pattern, the const pattern and the var pattern. We will see these different possibilities in the next paragraphs.

Type Pattern

We have already seen the type pattern in the previous example. It is used to check whether a value is of a specific type. If the type matches, a new variable of this type is created and can be used to extract the value information. If a value is null, the type check always returns false. The following source code shows a corresponding example.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

Const Pattern

Pattern matching can also be used to check whether the value matches a constant. With this pattern you cannot create a new variable holding the value information, as the value already matches a constant and can be used as it is.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  int d = 17;

  WriteValue(a);  // output: 'const: abc'
  WriteValue(b);  // output: 'const: null'
  WriteValue(c);  // output: 'const: 15'
  WriteValue(d);  // output: 'unknown'
}

static void WriteValue(dynamic x)
{
  if (x is 15) Console.WriteLine("const: 15");
  else if (x is "abc") Console.WriteLine("const: abc");
  else if (x is null) Console.WriteLine("const: null");
  else Console.WriteLine("unknown");
}

Var Pattern

The var pattern is a special case of the type pattern with one major distinction: the pattern matches any value, even if the value is null. Below we see the example previously used for the type pattern, extended with the var pattern.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if (x is var v) Console.WriteLine("not supported type");
}

If we look at this example, we may ask two critical questions: Why do we have to specify a temporary variable for the var pattern if we don’t use it? And why do we use the var pattern at all if it is the same as the empty (default) else statement?

The first question is easy to answer. If we use the var pattern and don’t need the target variable, we can use the discard wildcard „_“, which was also introduced with C# 7.

The second question is more difficult. As described, the var pattern always matches. So, it represents a default case, which is the empty else in an if-else statement. Therefore, if we just want to write the default else case, we should not use the var pattern at all. But the var pattern proves to be practical if we want to distinguish between different groups of default cases. The following code shows a corresponding example. It uses more than one var pattern to handle the default case in more detail. As mentioned above, the last var pattern is unnecessary and you could write an empty else instead. I used the var pattern anyway to show you how to use the discard character.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if ((x is var v) && (v == null)) Console.WriteLine("'null' is not supported");
  else if ((x is var o) && (o.GetType().IsPrimitive)) Console.WriteLine("not supported primitive type");
  else if (x is var _) Console.WriteLine("not supported type");
}

Switch-case

At the beginning of the article I mentioned that pattern matching can be used in if statements and switch statements. Now we know the three types of patterns and have used them in if statements. Next we will see how to use the patterns in switch statements.

So far, the switch statement supported the const pattern only and was limited to numeric types and the string type. With C# 7 those restrictions have been removed. Now the switch statement supports pattern matching and therefore all three patterns can be used. Furthermore, a variable of any type may be used in a switch statement.

The new possibilities have a side effect which made it necessary to change the behavior of the switch-case statement. So far, the switch statement supported the const pattern only and therefore the case clauses were unique. With the new pattern matching the case clauses can overlap and may no longer be unique. Therefore, the order of the case clauses matters. For example, the compiler emits an error if a previous clause matches a base type and a following clause matches a derived type, because the second clause could never be reached. Because of the possibly overlapping case clauses, each case must end with a break or return. This prevents the code execution from „falling through“ from one case expression to the next.
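
A short sketch of the ordering rule, using a hypothetical Describe method with a base and a derived case:

static void Describe(object x)
{
  switch (x)
  {
    // the more specific (derived) type must come before the more general one
    case string s: Console.WriteLine("string: " + s); break;
    case object o: Console.WriteLine("some object: " + o); break;
  }

  // swapping the two cases would not compile: 'case object' already matches
  // every non-null value, so the 'case string' clause would be unreachable
}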

The following example shows the type pattern used in a switch-case statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Switch-case with predicates

Another feature related to pattern matching is the ability to use predicates within the switch-case statement. Within a case clause a when clause can be used to do more specific checks.

The following source code shows the use case we have already seen in the var pattern example. But this time we use switch-case with when clauses instead of the if statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    case var v when v == null: Console.WriteLine("'null' is not supported"); break;
    case var o when o.GetType().IsPrimitive: Console.WriteLine("not supported primitive type"); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Scope of pattern variables

A variable introduced by a type pattern or var pattern in an if statement is lifted to the outer scope. This leads to strange compiler behavior. On the one hand it is not meaningful to use the variable outside the if statement because it may not be initialized. And on the other hand, the compiler behavior is different for a variable introduced in an if clause and one introduced in an else-if clause. But maybe this strange behavior will be fixed in a future compiler version. The following source code shows a corresponding example with the compiler errors as comments.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");

  Console.WriteLine(i); // error: Use of unassigned local variable 'i'
  i = 15; // ok      

  // s = "abc";  // error: The name 's' does not exist in the current context
  string s = "abc"; // error: 's' cannot be declared in this scope because that name is used in a local or parameter
}

Pattern variables created inside a case clause are only valid within that case clause. They are not lifted outside the switch-case scope. In my opinion this leads to a clean separation of concerns, and it would be nice to have the same behavior in if statements.

Summary

Pattern matching is a powerful concept. The pattern matching possibilities introduced with C# 7 offer nice ways to write complex if statements and switch statements in a clean way. The patterns introduced so far are just some basic ones, and with C# 8 it is planned to add more advanced ones like the recursive pattern, the positional pattern and the property pattern. So, this programming concept is not just syntactic sugar; it will become an important concept in C# and introduces more and more functional programming techniques to the language.
