# Weak memory consistency Lecture 1

Viktor Vafeiadis MPI-SWS

2022-11-14

# The illusion of sequential consistency

### Sequential consistency (SC)

- The standard simplistic concurrency model.
- ▶ Threads access shared memory in an interleaved fashion.



# The illusion of sequential consistency

### Sequential consistency (SC)

- The standard simplistic concurrency model.
- ► Threads access shared memory in an interleaved fashion.



#### But...

- No multicore processor implements SC.
- Compiler optimizations invalidate SC.

# Weak consistency

Hardware provides weak consistency.

- ▶ Weak memory models ~> semantics of shared memory.
- Every hardware architecture has its own WMM: x86-TSO, ARM, Power, Itanium.





# Weak consistency examples

Store buffering (SB) Initially, x = y = 0x := 1; $a := y \ //0 \ \| \ y := 1;$  $b := x \ //0$ 



# Load buffering (LB) Initially, x = y = 0a := y; //1 || b := x; //1x := 1 || y := 1



# Weak consistency in "real life"

Messages may be delayed.



Messages may be sent/received out of order.



$$Email := 1;$$
 $a := Sms; //1$ 
 $Sms := 1;$ 
 $b := Email; //0$ 



# Operational memory models

# A simple concurrent programming language

### Basic domains:

$$r \in \text{Reg}$$
- Registers (local variables) $x \in \text{Loc}$ - Locations $v \in \text{Val}$ - Values including 0 $i \in \text{Tid} = \{1, \dots, N\}$ - Thread identifiers

#### Expressions and commands:

$$e ::= r | v | e + e | ...$$
  
 $c ::= skip | if e then c else c | while e do c |$   
 $c ; c | r := e | r := x | x := e |$   
 $r := FAA(x, e) | r := CAS(x, e, e) | fence$ 

**Programs**, P : Tid  $\rightarrow$  Cmd, written as  $P = c_1 \parallel ... \parallel c_N$ 

# Basic set up

### Thread subsystem

- ▶ Thread-local steps:  $c, s \xrightarrow{l} c', s'$ .
- Interpret sequential programs.
- Lift them to program steps:  $P, S \xrightarrow{i:l} P', S'$ .

### Storage subsystem (defined by the memory model)

- Describe the effect of memory accesses and fences.
- $M \xrightarrow{i:l} M'$  where M is the state of the storage subsystem.

#### Linking the two

Either the thread or the storage subsystem make an internal step, ε; or they make matching i:l steps.

$$\blacktriangleright P, S, M \Rightarrow P', S', M'.$$

# The thread subsystem

Store:  $s : \text{Reg} \rightarrow \text{Val}$  (Initial store:  $s_0 \triangleq \lambda r. 0$ )State:  $\langle c, s \rangle \in \text{Command} \times \text{Store}$ 

#### Transitions:

$$\frac{c_{1}, s \stackrel{i}{\rightarrow} c'_{1}, s'}{skip; c, s \stackrel{\varepsilon}{\rightarrow} c, s} \qquad \frac{c_{1}, s \stackrel{i}{\rightarrow} c'_{1}, s'}{c_{1}; c_{2}, s \stackrel{i}{\rightarrow} c'_{1}; c_{2}, s'} \qquad \frac{s' = s[r \mapsto s(e)]}{r := e, s \stackrel{\varepsilon}{\rightarrow} skip, s'}$$

$$\frac{l = R(x, v)}{r := x, s \stackrel{l}{\rightarrow} skip, s[r \mapsto v]} \qquad \frac{l = W(x, s(e))}{x := e, s \stackrel{l}{\rightarrow} skip, s}$$

$$\frac{s(e) \neq 0}{if e \text{ then } c_{1} \text{ else } c_{2}, s \stackrel{\varepsilon}{\rightarrow} c_{1}, s} \qquad \frac{s(e) = 0}{if e \text{ then } c_{1} \text{ else } c_{2}, s \stackrel{\varepsilon}{\rightarrow} c_{2}, s}$$

while  $e \text{ do } c, s \xrightarrow{\varepsilon} \text{ if } e \text{ then } (c; \text{ while } e \text{ do } c) \text{ else skip}, s$ 

# The thread subsystem: RMW and fence commands

Fetch-and-add:

$$\frac{l = U(x, v, v + s(e))}{r := \mathsf{FAA}(x, e), s \xrightarrow{l} \mathsf{skip}, s[r \mapsto v]}$$

**Compare-and-swap:** 

$$\frac{l = \mathbb{R}(x, v) \quad v \neq s(e_r)}{r := \mathsf{CAS}(x, e_r, e_w), s \stackrel{l}{\rightarrow} \mathsf{skip}, s[r \mapsto 0]}$$
$$\frac{l = \mathbb{U}(x, s(e_r), s(e_w))}{r := \mathsf{CAS}(x, e_r, e_w), s \stackrel{l}{\rightarrow} \mathsf{skip}, s[r \mapsto 1]}$$

Fence:

fence, 
$$s \xrightarrow{F} skip, s$$

# Lifting to concurrent programs

**State:**  $\langle P, S \rangle \in \text{Program} \times (\text{Tid} \rightarrow \text{Store})$ 

▶ Initial stores: 
$$S_0 \triangleq \lambda i. s_0$$

▶ Initial state:  $\langle P, S_0 \rangle$ 

#### Transition:

$$\frac{P(i), S(i) \xrightarrow{l} c, s}{P, S \xrightarrow{i:l} P[i \mapsto c], S[i \mapsto s]}$$

# SC storage subsystem



# SC storage subsystem

Machine state:  $M : Loc \rightarrow Val$ 

Maps each location to its value.

► Initial state: 
$$M_0 \triangleq \lambda x$$
. 0  
(*i.e.*, the memory that maps every location to 0)

#### Transitions:

 $\frac{l = W(x, v)}{M \xrightarrow{i:l} M[x \mapsto v]} \qquad \frac{l = R(x, v) \qquad M(x) = v}{M \xrightarrow{i:l} M}$  $\frac{l = U(x, v_r, v_w) \qquad M(x) = v_r}{M \xrightarrow{i:l} M[x \mapsto v_w]} \qquad \frac{l = F}{M \xrightarrow{i:l} M}$ 

# SC: Linking the thread and storage subsystems

$$\frac{P, S \xrightarrow{i:\varepsilon} P', S'}{P, S, M \Rightarrow P', S', M} \qquad \frac{P, S \xrightarrow{i:l} P', S' \qquad M \xrightarrow{i:l} M'}{P, S, M \Rightarrow P', S', M'}$$

#### Definition (Allowed outcome)

- An outcome is a function O : Tid → Store.
- An outcome O is allowed for a program P under SC if there exists M such that P, S<sub>0</sub>, M<sub>0</sub> ⇒<sup>\*</sup> skip|....||skip, O, M.

# TSO storage subsystem



# TSO storage subsystem

#### The state consists of:

- ► A memory M : Loc  $\rightarrow$  Val
- A function B : Tid → (Loc × Val)\* assigning a *store buffer* to every thread.

### Initial state: $\langle M_0, B_0 \rangle$ where

- $M_0 = \lambda x. 0$  (the memory maps 0 to every location)
- $B_0 = \lambda i. \epsilon$  (all store buffers are empty)

## TSO storage subsystem transitions

$$\frac{\overset{\text{WRITE}}{I = \mathbb{W}(x, v)}}{M, B \xrightarrow{i:I} M, B[i \mapsto \langle x, v \rangle \cdot B(i)]} \qquad \frac{\overset{\text{PROPAGATE}}{B(i) = b \cdot \langle x, v \rangle}}{M, B \xrightarrow{i:\varepsilon} M[x \mapsto v], B[i \mapsto b]}$$

READ  

$$I = \mathbf{R}(x, v)$$

$$B(i) = \langle x_n, v_n \rangle \dots \langle x_2, v_2 \rangle \cdot \langle x_1, v_1 \rangle$$

$$\underline{M[x_1 \mapsto v_1][x_2 \mapsto v_2] \dots [x_n \mapsto v_n](x) = v}$$

$$M, B \xrightarrow{i:l} M, B$$

$$\frac{\stackrel{\text{RMW}}{I = U(x, v_r, v_w)} \quad B(i) = \epsilon \quad M(x) = v_r}{M, B \xrightarrow{i:I} M[x \mapsto v_w], B} \qquad \frac{\stackrel{\text{FENCE}}{I = F} \quad B(i) = \epsilon}{M, B \xrightarrow{i:I} M, B}$$

# TSO: linking thread and storage subsystems

$$\begin{array}{c} \text{SILENT-THREAD} \\ \hline P, S \xrightarrow{i:\varepsilon} P', S' \\ \hline P, S, M, B \Rightarrow P', S', M, B \end{array} \qquad \begin{array}{c} \text{SILENT-STORAGE} \\ \hline M, B \xrightarrow{i:\varepsilon} M', B' \\ \hline P, S, M, B \Rightarrow P, S, M', B' \end{array}$$

$$\frac{P, S \xrightarrow{i:l} P', S' \qquad M, B \xrightarrow{i:l} M', B'}{P, S, M, B \Rightarrow P', S', M', B'}$$

#### Definition (Allowed outcome)

An outcome *O* is *allowed* for a program *P* under TSO if there exists *M* such that  $P, S_0, M_0, B_0 \Rightarrow^* \mathbf{skip} \parallel ... \parallel \mathbf{skip}, O, M, B_0$ .