Ref Return and Ref Locals in C# 7

C# has supported passing arguments by value or by reference since the first language version, but a method could return its result by value only. C# 7 changes this by introducing two new features: ref returns and ref locals. With these features it is possible to return by reference. A ‘ref return’ allows a method to return an alias to an existing variable, and a ‘ref local’ can store this alias in a local variable.

The main goal of this language extension is to allow developers to pass around references to value types instead of copies of the values. This is important when working with large data structures implemented as value types. The new features can be used with reference types too, but a reference type is already returned as a pointer to the object, so returning a reference to that pointer brings no advantage.

Return by value

The following source code shows an example of a return by value. The function returns the second element of the array. As it is a return by value, a copy of the element is returned. A modification of the returned element therefore changes the copy and not the original element in the array.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
  new Person() {mName = "John Doe", mAge = 31 },
  new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  // output:
  // 'John Doe (31)'
  // 'Jane Doe (27)'
  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

struct Person
{
  public string mName;
  public int mAge;
}

static Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return persons[1];
}

If you wanted to find and modify the original element, you had to implement the method in a different way, for example by returning the index of the element and using this index to modify the original array. With the new ‘ref return’ feature the needed behavior becomes very easy to implement. You just have to change the return from a value to a reference.

Return by reference

The following source code shows the same example, adapted accordingly. This time the found array element is returned as a reference. The reference is stored in a ref local variable. Changes made to this variable are made to the referenced object instance, so the original array element is changed.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (41)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Use a List instead of an Array

Within the previous example I used an array of struct objects. What do you think will happen if we change to another container type, for example to a list?

static void Main(string[] args)
{
  List<Person> persons = new List<Person>
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(List<Person> persons)
{
  if (persons.Count < 2) throw new ArgumentException();

  // error CS8156: An expression cannot be used in this context
  // because it may not be passed or returned by reference
  return ref persons[1];
}

The code no longer compiles. The ‘return ref persons[1]’ statement results in a compiler error. The reason is the different implementation of array element access and the list indexer. Accessing an array element yields a reference to the element itself, so we can use this reference as return value. The list indexer instead returns a copy of the value. As neither the indexer expression itself nor the temporary copy it produces may be returned by reference, the compiler shows a corresponding error message. Within the following article you can find further information about this issue.

Ref local

In the example application we stored the returned reference in a ‘ref local’ variable. A ref local variable is an alias to the original object instance. It is initialized by the ref return value. In C# 7.0 the reference itself is constant after this initialization. Therefore, an assignment to a ref local will not change the reference but it will change the content of the referenced object. (C# 7.3 later added a separate ‘= ref’ syntax to rebind a ref local; a plain assignment still writes to the referenced object.)

The following source code shows a corresponding example.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  ref Person person = ref persons[1];
  person.mAge = 41;
  person = persons[0];
  person.mAge = 51;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'John Doe (51)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

At first the ref local is initialized with a reference to the second array element. Then the age is changed to 41. The next assignment is the interesting one. ‘persons[0]’ refers to the first array element, but an assignment to our ref local variable will not change the reference. The reference is set during initialization and stays constant. Instead, the assignment changes the value of the referenced object. Therefore, the second array element, which is referenced by the ref local variable, is overwritten with the values stored in the first array element, which are ‘John Doe’ and ‘31’. Next, we set the age to ‘51’. So, the output is as shown in the comment within the example application. The first array element is not changed at all, and the second array element now contains the name copied from the first element and the age set by the last assignment.

Ref vs. Pointer

The previous example and this topic in general may raise the question about the difference between a reference and a pointer. So, we should take a minute and think about this question as it is important for a deep understanding of the ref local and ref return mechanism.

Briefly summarized: references and pointers both refer to an object instance, but a reference is constant after initialization whereas a pointer can be changed.

This single difference is an important one, as it results in different characteristics and possibilities of the two concepts. In the following I want to mention some important ones. As references are constant, they cannot be null. Pointers, in contrast, can be reassigned and as a consequence they can also be set to null. You can have pointers to pointers and create extra levels of indirection, whereas references only offer one level of indirection. As pointers can be reassigned, various arithmetic operations can be performed on them, which is called ‘pointer arithmetic’. It is easier to work with references as they cannot be null and you don’t have to think about indirection. But it is not automatically safer to work with references, because pointers as well as references can refer to invalid objects or memory locations.
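The differences described above can be made concrete in C++, which offers both concepts side by side. The following is only a minimal sketch and not C# code; the function names are hypothetical and chosen for illustration. It shows that an assignment to a reference writes through to the referenced object, while a pointer can be reseated, nested, set to null and moved by arithmetic.

```cpp
#include <cassert>

// Hypothetical helper: assigning to a reference changes the referenced
// object; the reference itself is never rebound.
int AssignThroughReference()
{
    int a = 1;
    int b = 2;

    int& ref = a;        // bound to 'a' at initialization
    ref = b;             // copies the value of 'b' into 'a'
    assert(&ref == &a);  // 'ref' still refers to 'a'

    b = 5;               // has no effect on 'a'
    return a;            // 2
}

// Hypothetical helper: a pointer can be reseated, nested and set to null.
int ReseatPointer()
{
    int a = 1;
    int b = 2;

    int* ptr = &a;
    ptr = &b;            // the pointer now refers to 'b'
    *ptr = 7;            // changes 'b', not 'a'

    int** pp = &ptr;     // pointer to pointer: a second level of indirection
    **pp += 1;           // 'b' becomes 8

    ptr = nullptr;       // a pointer may refer to nothing at all
    return a * 100 + b;  // 108
}

// Hypothetical helper: pointer arithmetic moves a pointer between elements.
int SecondByPointerArithmetic()
{
    int values[3] = { 10, 20, 30 };
    int* p = values;     // points to the first element
    ++p;                 // now points to the second element
    return *p;           // 20
}
```

Calling `AssignThroughReference()` yields 2 because the assignment copied a value instead of rebinding the reference, which is the same behavior the ref local showed in the C# example above.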

Please look at the following functions and think about the parameter kind: is it a reference, a pointer or a copy of the value?

Function                    Parameter kind
void Foo(MyStruct x)        Copy of the value passed by the caller
void Foo(MyClass x)         Pointer to the original object instance, which is available on caller level
void Foo(ref MyStruct x)    Reference to the original object instance, which is available on caller level
void Foo(ref MyClass x)     Reference to the pointer to the original object instance, which is available on caller level

C# developers often call the last case a “pointer to a pointer”, but in fact it is a reference to a pointer. As the comparison of the two concepts above shows, that’s a small but important difference.

Use ref return without ref local

You can define a method with a ref return and assign the result to a variable which is not a ref local. In this case the content referenced by the method result is copied to the local variable. So, your local variable is a copy of the original array element and changes will therefore not affect the array element.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
      new Person() {mName = "John Doe", mAge = 31 },
      new Person() {mName = "Jane Doe", mAge = 27 },
  };

  Person person = GetSecond(persons);
  person.mAge = 41;

  foreach (Person p in persons)
  {
    // output:
    // 'John Doe (31)'
    // 'Jane Doe (27)'
    Console.WriteLine(p.mName + " (" + p.mAge + ")");
  }
}

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) throw new ArgumentException();

  return ref persons[1];
}

Ref return of method local variable

A ref return creates a reference to an existing object instance. If the lifetime of the object instance were shorter than the lifetime of the reference, the reference would refer to an invalid object or memory location, which would result in critical runtime errors. Therefore, the referenced object instance must live in the same scope as the ref local variable or in an enclosing one. You cannot create a method-local object instance and return a reference to it. The following source code shows a corresponding example with the compiler error messages as comments.

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2)
  {
    // error: 
    // An expression cannot be used in this context 
    // because it may not be passed or returned by reference
    return ref new Person();

    // error:
    // Cannot return local 'person' by reference because it is not a ref local
    Person person = new Person();
    return ref person;
  }

  return ref persons[1];
}

Return null

As we already learned, a reference cannot be null. Therefore, a method with a ref return cannot return null. In the examples so far, we have thrown an exception if the parameter is invalid. But exceptions should be thrown in exceptional cases only. In my opinion it is not an exceptional case if we pass an array with fewer than two elements to the ‘GetSecond’ method. So, I don’t want to throw an exception; as no element is found, I want to return an invalid element instead. For reference types I want to return null and for value types I want to return a default value. But as we have seen, it is possible neither to create a local default value and return it by reference nor to return null. It is, however, possible to return a reference to an object instance that lives in a higher scope. We can use this possibility and define a default value for an invalid element.

The following source code shows a corresponding example with an array of value types.

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
    new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);
  person.mAge = 41;
}

static Person gDefaultPerson = new Person();

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

And we can adapt the example for an array of reference types (here ‘Person’ is assumed to be declared as a class instead of a struct).

static void Main(string[] args)
{
  Person[] persons = new Person[]
  {
        new Person() {mName = "John Doe", mAge = 31 }
  };

  ref Person person = ref GetSecond(persons);

  if (person != null) person.mAge = 41;
}

static Person gDefaultPerson = null;

static ref Person GetSecond(Person[] persons)
{
  if (persons.Length < 2) return ref gDefaultPerson;

  return ref persons[1];
}

But there is one very critical design flaw in this implementation: the returned default value can be changed. So, a later method call, or a method call by another client, may return a default value with changed content. This may lead to unexpected behavior in the application. But what if we use an immutable object for the default value? This solves the issue and makes the implementation concept usable. So, you must implement an immutable object and return a reference to this constant object instance. With C# 7.2 it is possible to use the readonly modifier for structs and ref returns, which makes it even more comfortable to create and use immutable structs.

‘In’ modifier and ‘Readonly’ modifier for struct and ref return

The code examples of this article were created with C# 7.0. With C# 7.2 you can use two additional features which allow you to write more performant code. These features are the ‘in’ modifier for method parameters and the ‘readonly’ modifier for ref returns and for structs.

Method parameters are often used as input for the method only, so they will not be changed within the method. If you use a struct as method parameter it is passed by value. In this case the runtime creates a copy of the struct instance and passes the copy to the method. This language design concept allows you to use the method parameter as a method-local value without any side effect on the original struct instance outside of the method scope. But of course, this comes with the disadvantage of a performance loss, as it may be expensive to create the copy of the struct.

But do we need a copy at all if we just read the parameter values? Of course not! In this case it would be fine to pass a reference to the original struct. But it must be guaranteed that it is used to read values only. This is exactly the idea of the ‘in’ modifier. Like the ‘out’ and ‘ref’ modifiers, it causes the parameter to be passed by reference. In addition, an ‘in’ parameter is read-only, so you cannot assign a new value to the parameter. This is comparable to the “pass by const reference” principle in C++.
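Since the paragraph above compares the ‘in’ modifier with C++, here is a small C++ sketch of the “pass by const reference” principle. It is an illustration under assumptions, not a description of how the C# compiler works: the type ‘Tracked’ and all helper functions are hypothetical, and the copy constructor exists only to count how often a copy is created.

```cpp
#include <cassert>

// Hypothetical value type that counts how often it is copied.
struct Tracked
{
    static int copies;
    int value = 0;

    Tracked() = default;
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};

int Tracked::copies = 0;

// Pass by value: the function receives its own copy of the argument.
int ReadByValue(Tracked t)
{
    return t.value;
}

// Pass by const reference: no copy is created and the parameter is read-only.
int ReadByConstRef(const Tracked& t)
{
    // t.value = 41;  // would not compile, 't' refers to a const object here
    return t.value;
}

// Returns the number of copies created by a single call.
int CopiesMadeBy(bool byValue)
{
    Tracked t;
    t.value = 31;

    const int before = Tracked::copies;
    const int result = byValue ? ReadByValue(t) : ReadByConstRef(t);
    assert(result == 31);  // both variants read the same value

    return Tracked::copies - before;
}
```

In this sketch `CopiesMadeBy(true)` reports one copy and `CopiesMadeBy(false)` reports none, which is the saving an ‘in’ parameter aims for.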

In theory the ‘in’ modifier is a nice and easy way to improve the performance of method calls with struct parameters. But unfortunately, it isn’t that easy. Depending on the implementation of the struct, the compiler must create a copy of the parameter even if you use the ‘in’ modifier. This copy is called a ‘defensive copy’. It is created whenever the compiler cannot guarantee that the parameter will not be changed inside the method. Of course, the compiler can prevent direct assignments. But if you call a member method of the struct, the compiler may not know whether this method changes the internal state of the struct. In such situations the defensive copy is created.

To prevent the creation of a defensive copy you can implement an immutable struct. In this case you must use the ‘readonly’ modifier for the struct declaration. A readonly struct cannot be changed. Even member methods cannot change the internal state. If you pass such a readonly struct instance as ‘in’ parameter to a method, the compiler knows that the value stays constant and does not have to create a defensive copy.

The ‘readonly’ modifier is also available for the ref return value of a method (‘ref readonly’). Of course, the reference itself is constant by definition, so this readonly modifier means that the referenced object instance is constant.

Summary

Ref returns and ref locals help to write more performant code as there’s no need to move copies of values between methods. These enhancements are designed for performance-critical algorithms where avoiding copies of large value types is a major factor. For the same reason the ‘in’ modifier and the ‘readonly’ modifier for structs were introduced. Passing readonly structs as ‘in’ parameters to methods may increase the application performance.

Posted in .NET, C#

Pattern Matching in C# 7

Patterns are used to test whether a value matches a specific expectation and, if it matches, to extract information from the value. You already implement such pattern matching when you write if and switch statements: you test values and, if they match the expectation, you extract and use their information.

With C# 7 we got an extension to the syntax of is-expressions and case-clauses. This syntax extension allows us to combine the two steps: testing a value and extracting its information.

Introduction

Let’s start with a basic example to see what we are talking about. The following source code shows how to test whether a value is of a specific type and then use the value for a console output. The code shows the old and the new syntax so you can compare the two implementations. As you can see, the new syntax combines the value testing and the information extraction in one short statement.

static void Main(string[] args)
{
  WriteValueCS7("abc");
  WriteValueCS6(15);
  WriteValueCS7(18.4);
}

static void WriteValueCS7(dynamic x)
{
  //C# 7
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

static void WriteValueCS6(dynamic x)
{
  //C# 6
  if (x is int)
  {
    var i = (int)x;
    Console.WriteLine("integer: " + i);
  }
  else if (x is string)
  {
    var s = x as string;
    Console.WriteLine("string: " + s);
  }
  else
  {
    Console.WriteLine("not supported type");
  }
}

The example shows pattern matching used in an is-expression to do a type check. The new pattern matching syntax is furthermore supported in case-clauses, and it allows three different types of patterns: the type pattern, the const pattern and the var pattern. We will see these different possibilities in the next paragraphs.

Type Pattern

We have already seen the type pattern in the previous example. It is used to check whether a value is of a specific type. If the type matches, a new variable of this type is created and can be used to extract the value information. If the value is null, the type check always returns false. The following source code shows a corresponding example.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

Const Pattern

The pattern matching can be used to check whether the value matches a constant. Within this pattern you cannot create a new variable with the value information as the value already matches a constant and can be used as it is.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  int d = 17;

  WriteValue(a);  // output: 'const: abc'
  WriteValue(b);  // output: 'const: null'
  WriteValue(c);  // output: 'const: 15'
  WriteValue(d);  // output: 'unknown'
}

static void WriteValue(dynamic x)
{
  if (x is 15) Console.WriteLine("const: 15");
  else if (x is "abc") Console.WriteLine("const: abc");
  else if (x is null) Console.WriteLine("const: null");
  else Console.WriteLine("unknown");
}

Var Pattern

The var pattern is a special case of the type pattern with one major distinction: the pattern will match any value, even if the value is null. Below we see the example previously used for the type pattern, extended with the var pattern.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if (x is var v) Console.WriteLine("not supported type");
}

If we look at this example we may ask two critical questions: Why do we have to specify a temporary variable for the var pattern if we don’t use it? And why do we use the var pattern at all if it is the same as the empty (default) else-statement?

The first question is easy to answer. If we use the var pattern and don’t need the target variable, we can use the discard ‘_’, which was also introduced with C# 7.

The second question is more difficult. As described, the var pattern always matches. So, it represents a default case, which is the empty else in an if-else statement. Therefore, if we just want to write the default else-case we should not use the var pattern at all. But the var pattern proves to be practical when we want to distinguish between different groups of default cases. The following code shows a corresponding example. It uses more than one var pattern to handle the default case in more detail. As mentioned above, the last var pattern is unnecessary and you can write an empty else. I used the var pattern anyway to show you how to use the discard character.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if ((x is var v) && (v == null)) Console.WriteLine("'null' is not supported");
  else if ((x is var o) && (o.GetType().IsPrimitive)) Console.WriteLine("not supported primitive type");
  else if (x is var _) Console.WriteLine("not supported type");
}

Switch-case

At the beginning of the article I mentioned that pattern matching can be used in if-statements and switch-statements. Now we know the three types of patterns and have used them in if-statements. Next, we will see how to use the patterns in switch-statements.

So far, the switch-statement was already a kind of pattern matching, but it supported the const pattern only and was limited to a few built-in types such as the integral types, char, bool, enums and string. With C# 7 those restrictions have been removed. Now the switch-statement supports pattern matching and therefore all three patterns can be used. Furthermore, a variable of any type may be used in a switch-statement.

The new possibilities have a side effect which made it necessary to change the behavior of the switch-case-statement. So far, the switch-statement supported the const pattern only and therefore the case-clauses were unique. With the new pattern matching the case-clauses can overlap and may not be unique anymore. Therefore, the order of the case-clauses matters. For example, the compiler emits an error if an earlier clause matches a base type and a later clause matches a derived type, because the later clause could never be reached. As before, each non-empty case section must end with a break, return or another jump statement, so code execution cannot ‘fall through’ from one case to the next.

The following example shows the type pattern used in a switch-case-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Switch-case with predicates

Another feature related to pattern matching is the ability to use predicates within the switch-case-statement. Within a case-clause a when-clause can be used to do more specific checks.

The following source code shows the use case we have already seen in the var pattern example. But this time we use the switch-case-statement with when-clauses instead of the if-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    case var v when v == null: Console.WriteLine("'null' is not supported"); break;
    case var o when o.GetType().IsPrimitive: Console.WriteLine("not supported primitive type"); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Scope of pattern variables

A variable introduced by a type pattern or var pattern in an if-statement is lifted to the outer scope. This leads to strange behavior of the compiler. On the one hand it is not meaningful to use the variable outside the if-statement because it may not be initialized. On the other hand, the compiler behavior is different for an if-statement and an else-if statement. But maybe this strange behavior will be changed in a future compiler version. The following source code shows a corresponding example with the compiler errors as comments.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");

  Console.WriteLine(i); // error: Use of unassigned local variable 'i'
  i = 15; // ok      

  // s = "abc";  // error: The name 's' does not exist in the current context
  string s = "abc"; // error: 's' cannot be declared in this scope because that name is used in a local or parameter
}

Pattern variables created inside a case-clause are only valid within the case-clause. They are not lifted outside the switch-case scope. In my opinion this leads to a clean separation of concerns and it would be nice to have the same behavior in if-statements.

Summary

Pattern matching is a powerful concept. The pattern matching possibilities introduced with C# 7 offer nice ways to write complex if-statements and switch-statements in a clean way. The patterns introduced so far are just some basic ones, and for C# 8 it is planned to add some more advanced ones like the recursive pattern, the positional pattern and the property pattern. So, this programming concept is not just syntactic sugar. It will become an important concept in C# and brings more and more functional programming techniques to the language.

Posted in .NET, C#

C++17: initializers in if-statement and switch-statement

With C++17 it is possible to initialize a variable inside an if-statement and a switch-statement. We already know and use this concept in the for-statement. To be honest: I don’t like this feature. In this article I want to introduce the feature and explain my doubts. In the following I will write about the if-statement only, because everything also applies to the switch-statement and so it is sufficient to show one of the two.

The new syntax with the initializer inside the if-statement comes with a big improvement: the variable is moved inside the scope of the if-statement. An important software design concept is to use the smallest possible scope, and the new syntax helps to implement this design concept.
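To make the scoping rule tangible, here is a minimal sketch that requires a C++17 compiler. The functions ‘ScaleCount’ and ‘Describe’ are hypothetical and exist only for this illustration. The variable declared in the initializer is visible in the condition, in the if-block and in the else-block, but not after the statement.

```cpp
#include <string>

// Hypothetical stand-in for a calculation whose result is needed
// only inside the if-statement.
int ScaleCount(int base)
{
    return base * 10;
}

std::string Describe(int base)
{
    if (int count = ScaleCount(base); count > 100)
    {
        return "large: " + std::to_string(count);
    }
    else
    {
        // 'count' is also visible in the else-block.
        return "small: " + std::to_string(count);
    }

    // 'count' is out of scope here; using it would not compile.
}
```

For example, `Describe(20)` returns "large: 200" and `Describe(5)` returns "small: 50".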

But you must pay dearly for this advantage. As the initialization moves into the if-statement, initialization and comparison are mixed up. This violates two other software design concepts, namely “separation of concerns” and “keep it simple”. Depending on the complexity of the initialization and the comparison you may create a very complex if-statement. This may result in hard-to-read and error-prone code. Only if you have a very simple initialization and a very simple comparison may the combination of both stay simple as well. In all other cases I recommend avoiding the new feature and clearly separating the initialization and the comparison in order to increase the code readability.

Next, I want to show some examples. The examples call a couple of functions. If you want to compile the example source code you could use the following implementations of these functions.

int CalcCount() { return 1000; }
int CalcExpectedCount() { return 1000; }
int CalcOldCount() { return 1000; }
bool IsInitialized() { return true; }

Let’s have a look at a simple example. The following source code shows an if-statement with an included variable initialization and the same if-statement with a separation of the initialization and the comparison. Furthermore, just for fun, I added a third variant, which is the second one without the line break, to compare it with the new syntax.

Due to an issue with the WordPress code block element, I could not use my original code examples. I had to remove all insertion operators ‘<<’ and use ‘..’ as placeholder. So, the examples contain some standard outputs in which the two dots ‘..’ must be read as the insertion operator ‘<<’.

// init inside if
if (int count = CalcCount(); count > 100)
{
	std::cout .. "count: " .. count .. std::endl;  
}

// init outside if
int count = CalcCount();

if(count > 100)
{
	std::cout .. "count: " .. count .. std::endl;
}

// init outside if without line break
int count = CalcCount(); if (count > 100)
{
	std::cout .. "count: " .. count .. std::endl;
}

If we compare the first and the second implementation, that is the new and the classical syntax, the differences are small. In my opinion both variants are easy to read. Even the third one, with classical syntax but without the line break, may be easy to read, even if it is unusual. And you can see that the new syntax isn’t that different from the classical syntax without the line break: only the ‘if’ keyword moved to the front. Of course, this little change increased the readability a lot.

So, we must look at a more complex example. Let’s see what things look like if we increase the complexity of the initialization but leave the comparison as simple as before.

// init inside if
if (int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2; count > 100)
{
	std::cout .. "count: " .. count .. std::endl;
}

// init outside if
int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2;

if (count > 100)
{
	std::cout .. "count: " .. count .. std::endl;
}

// init outside if, separate different init variants
int count = 0;

if (IsInitialized())
{
	count = CalcCount();
}
else
{
	count = (CalcExpectedCount() + CalcOldCount()) / 2;
}

if (count > 100)
{
	std::cout .. "count: " .. count .. std::endl;
}

At first you can see the new syntax. In my opinion this if-statement is very hard to read. You have to stop at this line of code, read it several times and look closely to understand the meaning of the code.

The second implementation separates the initialization and the comparison. I think this will make it a little bit easier to read the code.

The third example clearly separates the different concerns. We have a standard initialization, an initialization for fallback cases and a comparison. This source code is easy to read. You never have to stop at a line of code and read it again to understand it. The complex initialization and comparison is split into simple parts.

Summary

Initializers in if-statements and switch-statements allow a clear assignment of the variable to the scope of the statement. But mixing the two concerns of initialization and comparison will often result in complex code. Therefore, in my opinion, the new syntax should be used with caution. If the initialization as well as the comparison is short and simple the resulting combination of both may be simple too and in this case the new syntax should be used.

Posted in C++ | 2 comments

Name hiding in inheritance

The C++ name hiding rules for variables are well known by software developers. In contrast, name hiding in inheritance sometimes leads to issues although it follows the same rules. Such issues are therefore not a result of the rules but an effect due to different expectations in these different programming scenarios.

Name hiding rules for variables

Let’s start with the well-known rules for variables. The following example shows a typical scenario.

#include <iostream>

double x;
	
int main()
{
	int x;

	x = 5.5;	// conversion from double to int

	std::cout << x << std::endl;	// prints 5

	::x = 8.8;	// changes the global x

	std::cout << x << std::endl;	// prints 5
	
	return 0;
}

Within this example you see a double variable within the global scope and an integer variable within the local scope. As they have the same names, the local variable hides the global one, even if they have different types. If we want to access the double variable we must use the global namespace explicitly.

Name hiding rules in inheritance

Next, let us try the same name hiding within an inheritance scenario.

#include <iostream>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc was called" << std::endl; }
};

class Derived : public Base
{
public:
	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// derived is called, conversion of double to int
	d.Calc(8);		// derived is called

	return 0;
}

Within this example we have a function with a double parameter in the base class and a function with an integer parameter in the derived class. The same name hiding rules as in the previous example apply. The scope of the derived class is nested inside the scope of the base class, so the function of the derived class hides the function of the base class, even though the function parameters are different. Name lookup stops as soon as the name is found in the derived class, and overload resolution only considers the functions found there.

Some developers are surprised by this behavior, as they expect that the public functions of the base class become functions of the derived class too and that the call is resolved with respect to the parameter types. That’s a valid and comprehensible expectation because, from a software architectural point of view, public inheritance represents an “is-a” relationship.

From a technical point of view, the different kinds of inheritance just result in different visibilities of interfaces. Therefore, name hiding should be seen with respect to this technical point of view. So, the behavior of the example application is correct.

Make hidden names visible

Of course, there are many use cases where you want to keep the hidden names visible. For example, in public inheritance scenarios you normally want the interface of the base class to be visible, and it should be a rare case to keep it hidden. As expected, C++ supports this. With a “using” declaration the hidden names become visible again. The following example shows the corresponding modification of the derived class. This time the base class method is called if we pass a double parameter.

#include <iostream>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc was called" << std::endl; }
};

class Derived : public Base
{
public:
	using Base::Calc;	// make base class function name visible in derived class

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	return 0;
}

Make specific hidden name visible

Within the previous example we have seen the “using” declaration to make hidden names visible. As we already learned, names are independent of data types. Therefore, if the base class implements several functions with the same name, all of them become visible. The following source code shows a corresponding example. Furthermore, I have changed to private inheritance to show that the concept is independent of the kind of inheritance.

#include <iostream>
#include <string>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc double was called" << std::endl; }

	void Calc(std::string x) { std::cout << "Base calc string was called" << std::endl; }
};

class Derived : private Base
{
public:
	using Base::Calc;	// make base class function name visible in derived class

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	d.Calc("abc");	// base (string overload) is called

	return 0;
}

From a technical point of view the example looks fine. But from a software architectural point of view you may argue that it is bad design to make the private interface public in the derived class. And you are totally right. Nevertheless, such design decisions are sometimes made for several reasons. In this case you want to keep the design fault as small as possible and make only one or a few of the available functions public. You can do this by writing a forwarding function instead of the “using” declaration. The following source code shows the adapted example.

#include <iostream>
#include <string>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc double was called" << std::endl; }

	void Calc(std::string x) { std::cout << "Base calc string was called" << std::endl; }
};

class Derived : private Base
{
public:
	void Calc(double x) { Base::Calc(x); }	// forwards only the double overload

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	d.Calc("abc");	// compiler error

	return 0;
}

Summary

Names in derived classes hide names of base classes. This behavior is correct from a technical point of view. But in case of public inheritance it contradicts our expectations from a software architectural point of view. Fortunately, we can easily make the hidden names visible again with the “using” declaration or with forwarding functions.

Posted in C++ | Leave a comment

Expression Bodied Members in C# 7

The concept of Expression Bodied Members (EBM) was introduced with C# 6 and, as it became popular, many enhancements were added with C# 7. Within this article I want to give you the full picture of this feature, so I explain both the C# 6 and the C# 7 EBM features.

With C# 6 the following EBM were introduced:

  • Expression bodied Methods
  • Expression bodied Properties

With C# 7 the following EBM were added:

  • Expression bodied Property Getter
  • Expression bodied Property Setter
  • Expression bodied Indexer
  • Expression bodied Operators Overloading
  • Expression bodied Constructor
  • Expression bodied Destructor (Finalizer)

Syntax

Member methods as well as property getters and property setters are sometimes implemented with a single instruction. In such cases the syntax overhead, like brackets or getter and setter syntax, is larger than the syntax for the real functionality. EBM reduce this overhead and therefore bring the focus back to the real functionality. This will increase the readability of the source code. EBM use a syntax which is already known from lambda expressions: the “=>” sign. In contrast to lambda expressions, which may also have a statement block as body, you are limited to a single expression.

I think that’s a really good design decision because the EBM syntax only makes sense in such special cases. If your class members like constructors, property getters or methods contain several instructions, the bracket syntax used so far is more suitable. The instructions which belong together are written into one block and therefore you can easily see that they belong together. But if you have one instruction only, there is nothing which must be grouped. So, in this case it makes sense to leave out the block syntax and use a more lightweight style.

Examples

The following paragraphs show examples for each EBM type. As the examples are quite simple and self-explanatory, you will not find further descriptions or explanations. But at the end of the article you will find a summary and my personal thoughts about the EBM feature.

Each example contains the same implementation twice: one time in EBM syntax and one time in standard block syntax. This allows an easy comparison of both implementation styles. But of course, if you want to compile the source code you must comment out one of the two implementations.

Expression bodied Methods

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  int result = myClass.Sum(3, 5);

  Console.WriteLine(result);
}

class MyClass
{
  // C# 6
  public int Sum(int a, int b) => a + b;

  // C# 5
  public int Sum(int a, int b)
  {
    return a + b;
  }
}

Expression bodied Properties

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 6
  public int Value => mValue;

  // C# 5
  public int Value
  {
    get { return mValue; }
  }

  private int mValue = 12;
}

Expression bodied Property Getter and Setter

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  myClass.Value = 42;
  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 7
  public int Value
  {
    get => mValue;
    set => mValue = value;
  }

  // C# 6
  public int Value
  {
    get { return mValue; }
    set { mValue = value; }
  }

  private int mValue = 12;
}

Expression bodied Indexer

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  Console.WriteLine(myClass[2]);
}

class MyClass
{
  // C# 7
  public int this[int index] => mValues[index];

  // C# 6
  public int this[int index]
  {
    get { return mValues[index]; }
  }

  private int[] mValues = new int[] { 11, 12, 13, 14 };
}

Expression bodied Operators Overloading

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  myClass.Value = 15;
  myClass++;

  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 7
  public static MyClass operator ++(MyClass myClass) => new MyClass() { Value = myClass.Value + 1 };

  // C# 6
  public static MyClass operator ++(MyClass myClass)
  {
    return new MyClass() { Value = myClass.Value + 1 };
  }

  public int Value { get; set; }
}

Expression bodied Constructor and Destructor (Finalizer)

static void Main(string[] args)
{
  MyClass myClass = new MyClass();
}

class MyClass
{
  // C# 7
  public MyClass() => Init();
  ~MyClass() => CleanUp();

  // C# 6
  public MyClass() { Init(); }
  ~MyClass() { CleanUp(); }

  private void Init() { }
  private void CleanUp() { }
}

Summary and Assessment

As you can see within the examples, the source code becomes more readable as unnecessary syntax overhead is removed. From my point of view EBM is a quite nice feature. But of course, you should not overuse it. EBM should only be used if you have a single simple instruction. Furthermore, I don’t like to use EBM in a ctor or finalizer in cases where you want to manage a single resource only. This feels a little bit inappropriate and most often you have more than one resource within a class. If you just want to call a single method in a ctor or finalizer, EBM is still fine.

Disadvantage of EBM

I think there is one minor disadvantage of EBM. The used ‘=>’ sign looks nearly like the ‘=’ sign. If you mix up these two signs you may write source code with another behavior than expected. The following example shows such an issue.

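static void Main(string[] args)
{
  MyClass x = new MyClass();

  // loop bound and loop body are assumed; this part of the listing was lost
  for (int i = 0; i < 1000; i++) { var value = x.Value; }
}

class MyClass
{
  // initializer: getter and setter, always returns the same instance
  public MyLargeClass Value { get; set; } = new MyLargeClass();
  // expression body: getter only, creates a new instance on every access
  public MyLargeClass Value => new MyLargeClass();
}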

class MyLargeClass
{
  // ...
}

Both implementations look nearly the same but behave differently. One is a property initializer, the other one is a getter. So, in one case the property has a getter and setter and in the other case a getter only. Furthermore, one getter will always return the same object instance and the other one will always create a new object instance. Within the shown example this may result in huge performance differences depending on the kind of the returned object. Of course, this is a rare issue and it will not result in runtime errors. So, this theoretical disadvantage should not stop you from using EBM.

Posted in .NET, C# | Leave a comment

C# Array indexer vs. List indexer

The .NET Framework contains a lot of nice collection classes. They are optimized for their individual use case. But from a common perspective they all have the same behavior we expect from a collection.

But there is one exception to this rule: the array class. The array has been part of the CLR from the very beginning and you can think of this class as a built-in generic. The array class has many differences compared to the other containers. One major difference is the indexer. The indexer of the array returns the element by reference and not by value like all other collections.

This difference may be very important when you choose the right collection for your use case. Furthermore, if an array indexer is used without respect to the fact that it returns a reference, it can result in unexpected behavior.

The following example shows the difference between an array indexer and a list indexer.

static void Main(string[] args)
{
  var array = new[] { new MyStruct(42) };
  var list = new List<MyStruct> { new MyStruct(42) };

  // array[0].mValue = 15;
  // list[0].mValue = 15;  // error CS1612: Cannot modify the return value because it is not a variable

  array[0].SetValue(15);
  list[0].SetValue(15);

  Console.WriteLine("Array value: " + array[0].mValue);   // output: 'Array value: 15'
  Console.WriteLine("List value: " + list[0].mValue);     // output: 'List value: 42'
}

public struct MyStruct
{
  public MyStruct(int x) { mValue = x; }

  public void SetValue(int x) { mValue = x; }

  public int mValue;
}

The example shows two important differences. If we try to set a member value, we will get an error message in case of the list collection. This is because the list indexer returns a copy of the object. As we don’t create a variable to store this copy, we are not able to set the member value. If we instead use a member function to set the value, the function call will be successful in both cases. But it has a different behavior. The array value will be changed but the list value will not. That’s because the array indexer returns a reference to the array element. The member function is called for this element. The list indexer returns a copy of the element. The member function is called on this temporary object. So, the original list member is not changed at all.

The different behavior of array indexer and list indexer isn’t an error in the .NET Framework. On the contrary, it offers advantages because you can choose the right collection type according to your needs. So, you should keep this special behavior of the array indexer in mind and use it if you have corresponding use cases.

Posted in .NET, C# | 1 comment

Object Instance Creation

Whenever you write an application in C++ you will create a lot of object instances. So, this is a basic development task. C++ offers several ways to initialize variables. These are not just syntactical variations for the same task. The different kinds of initialization may result in different behaviors. This rich variety can result in some pitfalls, wrong expectations and programming errors.

Within this article I want to show the different object instance creation methods and explain their differences. To fully understand the examples of this article you should know the different types of constructors. You can get or refresh knowledge about constructors within this article.

Example Class

Within the examples of this article, a class “MyClass” is used. As we want to focus on the object instance creation, this class does not have any functionality. But it will provide several standard constructors and assignment operators. The ctors and operators just contain console outputs. This will help to see which ctor or operator is called. The following source code shows “MyClass”.

#include "stdafx.h"
#include <iostream>
#include <initializer_list>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

	MyClass(const MyClass& obj);		// copy ctor
	MyClass& operator=(const MyClass& obj);		// copy assignment operator

	MyClass(MyClass&& obj);		// move ctor
	MyClass& operator=(MyClass&& obj);		// move assignment operator

	MyClass(const std::initializer_list<int>& list);	// initializer_list ctor
};

MyClass::MyClass() 
{
	std::cout << "default ctor" << std::endl;
}

MyClass::~MyClass()
{
}

MyClass::MyClass(const int size) 
{
	std::cout << "parameterized ctor" << std::endl;	
}

MyClass::MyClass(const MyClass& obj) 
{
	std::cout << "copy ctor" << std::endl;
}

MyClass& MyClass::operator=(const MyClass& obj)
{
	std::cout << "copy assignment operator" << std::endl;
	return *this;
}

MyClass::MyClass(MyClass&& obj)
{
	std::cout << "move ctor" << std::endl;
}

MyClass& MyClass::operator=(MyClass&& obj)
{
	std::cout << "move assignment operator" << std::endl;
	return *this;
}

MyClass::MyClass(const std::initializer_list<int>& list) 
{
	std::cout << "initializer_list ctor" << std::endl;
}

Quiz

As a developer you have already implemented a huge number of object instantiations. Therefore, I want to start with a quiz. The following source code shows several ways to initialize MyClass. Please take a few minutes and think about these initializations. Try to answer the following question for each line of code: Which ctor and/or assignment operator is called?

int main()
{
  MyClass test1;
  MyClass test2();
  MyClass test3{};

  MyClass test4(42);
  MyClass test5{ 42 };

  MyClass test6(42.5);
  MyClass test7{ 42.5 };

  MyClass test8 = 42;
  MyClass test9 = 42.5;
  MyClass test10 = { 42 };
  MyClass test11 = { 42.5 };

  MyClass test12 = MyClass();
  MyClass test13 = MyClass(42);
  MyClass test14 = MyClass{ 42 };

  MyClass test15(test1);
  MyClass test16{ test1 };

  MyClass test17 = test1;
  MyClass test18 = { test1 };

  return 0;
}

MyClass test1

This is the simplest way to create an object instance. The default constructor will be called. Within the default constructor you should initialize all class members, otherwise they may contain garbage values.

MyClass test2()

Like before, this looks quite simple and we may expect that the default constructor is called. But not even close. This isn’t an object initialization at all. It is a function declaration: a function “test2” without parameters and with the return type “MyClass” is declared. This C++ pitfall results from the redundant use of the parentheses. For backward compatibility the meaning of this code is still the same as in C++98, so it is still a function declaration. To avoid this pitfall, you should not use this syntax at all. Instead use the version seen above without parentheses or use the braces syntax introduced with C++11 (as you can see in the next paragraph). On the other hand, it is not a big issue because it will not result in hidden errors. If you try to use the supposed object instance you will get a compiler error, and if you don’t use “test2” you will usually get a compiler warning too.

MyClass test3{}

This syntax was introduced with C++11. An object initialization with braces “{}” will call the default ctor. So, this syntax is equivalent to “MyClass test1”.

MyClass test4(42)

This will call the parameterized ctor and pass “42” as parameter. If we assume that the example class is a container type, this object initialization will provide a container for 42 elements initialized with a default value.

MyClass test5{ 42 }

If we use the braces syntax the values inside the braces will be converted to an initializer_list and therefore the initializer_list ctor is called. If we again think about the created container object we can say this time a container with one element was created and the element value was set to 42.

So “MyClass test4(42)” and “MyClass test5{ 42 }” will have different results but the syntax is nearly the same. This is a very important aspect and unfortunately a source for errors. Therefore, we should analyze this topic in more detail.

Furthermore, the braces syntax is still allowed even if we don’t have an initializer_list ctor. In this case the parameterized ctor is called according to the values given inside the braces.

In case a parameterized ctor and an initializer_list ctor both exist, the initializer_list ctor is preferred and will hide the parameterized ctor. This means if we get such a collision of two possible ctors, the one with the initializer_list is preferred automatically and we don’t get any compiler warning. This may be a source of errors.

For example, we may use a container class “MyContainer”. This container class offers a parameterized ctor with two parameters: number of elements and initial value. We can create an object instance with “MyContainer x{10, 5}”. This will create a container with 10 elements, all initialized with value 5. After some time, the class MyContainer is extended by the nice feature of an initializer_list ctor. But this new feature changes the behavior of the user code which uses the class. The existing initialization “MyContainer x{10, 5}” will now create a container with two elements of value 10 and 5. To fix this error we have to change the initialization and use parentheses to call the hidden parameterized ctor: “MyContainer x(10, 5)”.

This example shows the issues you may get with the initializer_list ctor. If you add this ctor to an existing class which has had parameterized ctors so far, they will get hidden and as a result you may break user code.

You will find a corresponding example in the C++ standard library. The vector class offers an initializer_list ctor and it offers a parameterized ctor with two parameters, number of elements and initial value, which is hidden when the braces syntax is used.

MyClass test6(42.5)

The example class contains a parameterized ctor with an integer value as parameter. This parameterized ctor will be called even if the type of the argument does not match exactly. A corresponding value conversion is done. This narrowing conversion is allowed for built-in types, but it may result in a loss of data and therefore a compiler warning may be shown.

MyClass test7{ 42.5 }

An object instance creation with braces does not allow narrowing conversions, in contrast to the version above with parentheses syntax. In our example the braces select the initializer_list ctor, as for “test5”, and the conversion of 42.5 to an integer element is rejected. Therefore, this object instantiation will result in a compiler error.

MyClass test8 = 42

MyClass test9 = 42.5

MyClass test10 = { 42 }

MyClass test11 = { 42.5 }

These initializations are nearly the same as the ones explained above, with the syntactical difference that we use the “=” sign. But what’s the consequence of this different syntax? Will the assignment operator of MyClass be called?

The answer is simple: The “=” sign is just a syntactical difference here. These initializations are therefore equal to the ones explained above (see “test4” to “test7”).

So for test8 and test9 the parameterized ctor is called. Test10 will call the initializer_list ctor. Test11 will result in a compiler error as the narrowing conversion is not allowed.

MyClass test12 = MyClass()

MyClass test13 = MyClass(42)

MyClass test14 = MyClass{ 42 }

Now it becomes a little more difficult. What will happen in these cases? If we look at the different parts of the syntax, for example for the first case “MyClass test12 = MyClass()”, we may think the following: The “MyClass()” command creates a temporary object instance by calling the default ctor, and “=” will call the assignment operator and assign the temporary object to “test12”, which was previously created due to the command “MyClass test12”. But this assumption is wrong. Unfortunately, I have heard it a few times, especially when people say you can optimize your code by eliminating the supposed temporary object and the calls of several ctors and assignments.

So, what’s happening by using this kind of syntax? Nothing special! It has the same meaning as the syntax used for “test1”, “test4” and “test5”. Therefore, for test12 the default ctor is called, for test13 the parameterized ctor is called and for test14 the initializer_list ctor is used. No temporary object is created and the assignment operator is never called. Since C++17 the omission of the temporary object is guaranteed by the language; older compilers usually remove it as an optimization.

In summary of the examples seen so far, we can say there is no difference between the following three initializations which will all call the parameterized ctor. Same is true for the default ctor and initializer_list ctor examples seen so far.

  • MyClass test4(42)
  • MyClass test8 = 42
  • MyClass test13 = MyClass(42)

These three spellings will create an instance of MyClass by calling the parameterized ctor. If you have read the article mentioned at the beginning, you will object that there may be a theoretical difference. If we use explicit ctors, the second syntax will no longer be allowed. But that’s a restriction for explicit ctors only. In terms of common concepts, the three spellings will have the same result. But which one should be preferred? This depends on the coding guidelines of your company, your project team or your personal preferences. At the end of the article I will mention some coding guidelines.

MyClass test15(test1)

MyClass test16{ test1 }

These cases will call the copy ctor. As the given parameter is of type MyClass, the braces will not create an initializer_list.

MyClass test17 = test1

MyClass test18 = { test1 }

And again, the copy ctor will be called. As explained before, even if the syntax suggests that the assignment operator is involved, it will never be called. So these initializations are nearly equal to the previous ones (test15 and test16), with the small difference that an explicit ctor cannot be called.

Summary

As you can see there are many ways to initialize an object. These different initializations could have big differences in syntax but they have the same behavior. But unfortunately, there are some pitfalls too, like the initializer_list ctor which may hide a parameterized ctor. The braces syntax offers a uniform way to initialize objects. It should be used as the preferred syntax as it can be used in nearly all cases. Below you will find some guidelines for object initializations, but of course you may have your own coding guidelines or preferences.

Guidelines

Prefer object initialization with braces “{…}”, because it’s more consistent, more correct, can be used in nearly all cases and avoids the old-style pitfalls altogether.

In single-argument cases, especially on initialization of built-in types, it is fine to omit the braces, for example “int i = 8;”.

In rare cases use parentheses “(…)” to explicitly call a ctor which is otherwise hidden by an initializer_list ctor.

When you design a class, avoid providing a ctor that ambiguously overloads with an initializer_list ctor. Users of your class should never need to use parentheses syntax to reach such a hidden ctor.

Posted in C++ | Leave a comment