std::atomic

In multithreaded applications even trivial operations like reading and writing a value may cause issues. The std::atomic template can be used in such situations. In this article I want to give a short introduction to this topic.

 

Atomic operations

Let’s start with an easy example. In a loop we will increment a value using the “++” operator. The loop will be executed by several threads.

#include <iostream>
#include <thread>

int value;

void increase()
{
	for (int i = 0; i < 100000; i++)
	{
		value++;
	}
}

int main()
{
	std::thread thread1(increase);
	std::thread thread2(increase);
	std::thread thread3(increase);
	std::thread thread4(increase);
	std::thread thread5(increase);

	thread1.join();
	thread2.join();
	thread3.join();
	thread4.join();
	thread5.join();

	std::cout << value << '\n';

	return 0;
}

 

We expect an output of “500000”, but unfortunately the output is below this value and differs on each execution of the application.

That’s because the “++” operation is not atomic. To keep it simple, we can say this operation consists of three steps: read the value, increase it and write it back. If several threads execute these steps in parallel, they may read the same value and each increase it by one. Therefore, the result is much lower than expected.
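
Conceptually, each increment behaves like the following three separate statements (a simplified sketch, not the actual machine code). This makes it easy to see how two threads can interleave and lose an update:

int temp = value;   // 1. read the current value
temp = temp + 1;    // 2. increase the local copy
value = temp;       // 3. write the result back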

The same issue may occur with simple reads and writes too. For example, thread 1 sets a value and thread 2 reads it. If the value type does not fit into a processor word, even such trivial read and write operations are not atomic. Accessing a 64-bit variable on a 32-bit system may result in incomplete (torn) values.
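
As a side note, you can query at runtime whether a given atomic type can be handled without an internal lock on the target platform via the is_lock_free member function. A small sketch:

#include <atomic>
#include <cstdint>
#include <iostream>

int main()
{
	std::atomic<std::int64_t> value(0);

	// prints whether this atomic is handled without a hidden lock,
	// typically false for a 64-bit atomic on a plain 32-bit target
	std::cout << std::boolalpha << value.is_lock_free() << '\n';

	return 0;
}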

In such cases we need a synchronization mechanism to prevent the conflicting parallel accesses. Of course, we could use a standard lock mechanism. Or we could use the std::atomic template. This template defines an atomic type and guarantees the atomicity of the operations on this type. To adapt the previous example, we just have to change the data type.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> value(0);

void increase()
{
	for (int i = 0; i < 100000; i++)
	{
		value++;
	}
}

int main()
{
	std::thread thread1(increase);
	std::thread thread2(increase);
	std::thread thread3(increase);
	std::thread thread4(increase);
	std::thread thread5(increase);

	thread1.join();
	thread2.join();
	thread3.join();
	thread4.join();
	thread5.join();

	std::cout << value << '\n';

	return 0;
}

 

Now the result is as expected. The output of the application is “500000”.
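
For comparison, the lock mechanism mentioned above could look like the following sketch based on std::mutex (the mutex name is of course arbitrary). It produces the same result, but every increment has to acquire and release the lock:

#include <iostream>
#include <mutex>
#include <thread>

int value;
std::mutex valueMutex;

void increase()
{
	for (int i = 0; i < 100000; i++)
	{
		// the lock_guard locks the mutex and releases it at the end of each iteration
		std::lock_guard<std::mutex> lock(valueMutex);
		value++;
	}
}

int main()
{
	std::thread thread1(increase);
	std::thread thread2(increase);
	std::thread thread3(increase);
	std::thread thread4(increase);
	std::thread thread5(increase);

	thread1.join();
	thread2.join();
	thread3.join();
	thread4.join();
	thread5.join();

	std::cout << value << '\n';

	return 0;
}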

 

std::atomic

As mentioned before, the std::atomic template defines an atomic type. If one thread writes to an atomic object while another thread reads from it, the behavior is well-defined. Each operation itself is guaranteed to be atomic. Furthermore, reads and writes to several different objects can be given a sequential consistency guarantee. Whether this guarantee applies depends on the selected memory model. We will see corresponding examples later.
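
Besides the increment operator used above, std::atomic offers further atomic operations, for example store, load, fetch_add, exchange and compare_exchange_strong. The following small sketch shows these member functions:

#include <atomic>
#include <iostream>

int main()
{
	std::atomic<int> counter(0);

	counter.store(5);                     // atomic write
	int current = counter.load();         // atomic read, yields 5
	int previous = counter.fetch_add(2);  // atomically adds 2, returns the old value 5
	int old = counter.exchange(10);       // atomically sets 10, returns the old value 7

	// sets 20 only if the current value is still 10, otherwise "expected" is updated
	int expected = 10;
	bool changed = counter.compare_exchange_strong(expected, 20);

	std::cout << current << ' ' << previous << ' ' << old << ' '
		<< changed << ' ' << counter.load() << '\n';

	return 0;
}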

 

Read and write values

A common scenario in multithreaded applications is the parallel reading and writing of variables by different threads. Let’s create a simple example with two threads, one writing variable values and the other one reading these values.

#include <iostream>
#include <thread>

int x;
int y;

void write()
{
	x = 10;
	y = 20;
}

void read()
{
	std::cout << y << '\n';
	std::cout << x << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

With this kind of implementation, the behavior is undefined. Depending on the order in which the threads run, the output may be “20/10”, “0/0”, “0/10” or “20/0”. But besides these expected results it may also happen that a read is done during a write and therefore an incompletely written value is used. In this example that should not happen, but as described before, depending on the processor and the value types used it may. Therefore, we can say that the behavior of the application is undefined.

By using the std::atomic template we can change the undefined behavior into a defined one. We just have to define the variables as atomic and use the corresponding read and write functions.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x(0);
std::atomic<int> y(0);

void write()
{
	x.store(10);
	y.store(20);
}

void read()
{
	std::cout << y.load() << '\n';
	std::cout << x.load() << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

Now the behavior of the application is well defined. Independent of the processor used and independent of the data type (we could change from “int” to any other type), we have a defined set of results. Depending on the order in which the threads run, the output may be “20/10”, “0/0” or “0/10”. The output “20/0” is not possible because the default mode for atomic loads and stores enforces sequential consistency. That means, within this example, the write to x always becomes visible before the write to y. Therefore, the output “0/10” is possible but not “20/0”. As the std::atomic template ensures atomic execution of the read and write functions, we don’t have to fear undefined behavior due to incomplete data updates. So, we have defined behavior with three possible results.

 

Memory model

As mentioned before, the selected memory model changes the behavior of the atomic read and write functions. By default, the memory model ensures sequential consistency. This may be expensive because the compiler is likely to emit memory barriers between every access. If your application or algorithm does not need this sequential consistency, you can choose a more relaxed memory model.

For example, if it is fine to get “20/0” as a result in our application, we can set the memory order to “memory_order_relaxed”. This removes the synchronization and ordering constraints, but the atomicity of each operation is still guaranteed.

void write()
{
	x.store(10, std::memory_order_relaxed);
	y.store(20, std::memory_order_relaxed);
}

void read()
{
	std::cout << y.load(std::memory_order_relaxed) << '\n';
	std::cout << x.load(std::memory_order_relaxed) << '\n';
}
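
A typical use case for the relaxed memory order is a simple event counter where only the final total matters and no other data depends on the ordering of the increments. A minimal sketch (the counter name is only illustrative):

std::atomic<int> counter(0);

void count()
{
	for (int i = 0; i < 100000; i++)
	{
		// still atomic, only the ordering constraints are dropped
		counter.fetch_add(1, std::memory_order_relaxed);
	}
}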

 

Another interesting memory model is the Release-Acquire ordering. You can set the store functions in the first thread to “memory_order_release” and the load functions in the second thread to “memory_order_acquire”. In this case, all memory writes (non-atomic and relaxed atomic) that happened before the atomic store in thread 1 become visible in thread 2 as soon as its acquire load observes the stored value. This takes us back to ordered loads and stores, so “20/0” is no longer a possible output, but with less overhead and therefore better execution performance. In this trivial example, the result is the same as with full sequential consistency. In a more complex example, with several threads reading and writing all or some of the variables, the results may differ from those under the default sequential consistency.

void write()
{
	x.store(10, std::memory_order_release);
	y.store(20, std::memory_order_release);
}

void read()
{
	std::cout << y.load(std::memory_order_acquire) << '\n';
	std::cout << x.load(std::memory_order_acquire) << '\n';
}

 

As mentioned before, the Release-Acquire ordering ensures that all memory writes before the atomic release store, even non-atomic ones, become visible to the thread that performs the matching acquire load. So we could change the application and use a non-atomic type for x. The release store of y still ensures the correct write order.

#include <atomic>
#include <iostream>
#include <thread>

int x;
std::atomic<int> y(0);

void write()
{
	x = 10;
	y.store(20, std::memory_order_release);
}

void read()
{
	std::cout << y.load(std::memory_order_acquire) << '\n';
	std::cout << x << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

But note that the non-atomic read of x is only free of data races if the acquire load actually observed the value written by the release store. In a more complex scenario with several threads, or if x is read on its own, it could be necessary to define x as atomic as well.

 

Synchronization of threads

A common implementation technique for thread synchronization is the use of locks. By using the atomic template, you can create such thread synchronization without locks. Depending on the selected memory model this may increase the performance of your application.

The following application shows an example of how to use the atomic template to synchronize the execution of two threads. Thread 2 waits until thread 1 has finished its work. This is done by using a flag variable which contains the execution state. Furthermore, thread 1 publishes its result in a data variable. The Release-Acquire ordering ensures that the data is written before the synchronization flag becomes visible as set.

#include <atomic>
#include <iostream>
#include <thread>

int data;
std::atomic<bool> ready(false);

void write()
{
	data = 10;
	ready.store(true, std::memory_order_release);
}

void read()
{
	// wait until thread 1 has published its result
	while (!ready.load(std::memory_order_acquire))
	{
		std::this_thread::yield();
	}

	std::cout << data << '\n';
}

int main()
{
	std::thread thread1(write);
	std::thread thread2(read);

	thread1.join();
	thread2.join();

	return 0;
}

 

Summary

This article gave a short introduction to the std::atomic template based on some common use cases. The examples highlight some major issues regarding data access in multithreaded applications and show corresponding implementations based on the atomic template. Of course, this article is an introduction only. The atomic template offers more features; for example, there are more memory order configurations besides the three shown in this article.
