#### Weak Memory

Anton Podkopaev

St. Petersburg University, JetBrains Research, Russia

13.12.2017

CPU and PL MMs. Promise MM

Compilation correctness for Promise MM

1







• CPU and PL MMs. Promise MM



#### • CPU and PL MMs. Promise MM

 Compilation correctness for Promise MM to {x86, Power, ARM}



#### CPU and PL MMs. Promise MM

 Compilation correctness for Promise MM to {x86, Power, ARM}



#### • CPU and PL MMs. Promise MM

 Compilation correctness for Promise MM to {x86, Power, ARM}

# Memory model (MM) is concurrent system's semantics

# Sequential consistency [Lamport, 1979]

# Sequential consistency [Lamport, 1979]

## system's behaviors program interleavings

$$\overrightarrow{[x] := 1}; \qquad \overrightarrow{[y] := 1}; \\ a := [y]; \qquad b := [x]; \end{cases}$$

ValuesMemory
$$a = \bot; b = \bot$$
 $[x] \leftarrow 0; [y] \leftarrow 0$ 

ValuesMemory
$$a = \bot; b = \bot$$
 $[x] \leftarrow 1; [y] \leftarrow 0$ 

ValuesMemory
$$a = \bot; b = \bot$$
 $[x] \leftarrow 1; [y] \leftarrow 1$ 

$$\begin{array}{c} [x] := 1; \\ \hline a := [y]; \end{array} \qquad \begin{bmatrix} [y] := 1; \\ b := [x]; \\ \hline \end{array}$$

ValuesMemory
$$a = \bot; b = 1$$
 $[x] \leftarrow 1; [y] \leftarrow 1$ 

$$\begin{array}{c} [x] := 1; \\ a := [y]; \\ \end{array} \begin{array}{c} [y] := 1; \\ b := [x]; \\ \end{array} \end{array}$$

Values  
$$a = 1; b = 1$$
Memory  
 $[x] \leftarrow 1; [y] \leftarrow 1$ 

$$\begin{array}{c} [x] := 1; \\ a := [y]; \\ \end{array} \begin{array}{c} [y] := 1; \\ b := [x]; \\ \end{array} \end{array}$$

Impossible to get 
$$a = b = 0$$

Values  
$$a = 1; b = 1$$
Memory  
 $[x] \leftarrow 1; [y] \leftarrow 1$ 

## Is it the same in reality?

## Is it the same in reality? Let's check!

$$\begin{array}{c} [x] := 1; \\ a := [y]; \\ \end{array} \begin{array}{c} [y] := 1; \\ b := [x]; \\ \end{array} \end{array}$$

- -

Possible to get a = b = 0 on GCC+x86!

ValuesMemory
$$a = 1; b = 1$$
 $[x] \leftarrow 1; [y] \leftarrow 1$ 

#### Features

- reorderings
- cache
- buffers
- read-after-write elimination
- speculative execution
- fake dependency elimination
- • •

#### Features

- reorderings
- cache
- buffers
- read-after-write elimination
- speculative execution
- fake dependency elimination

• • • •

#### Correct

#### Features

- reorderings
- cache
- buffers
- read-after-write elimination
- speculative execution
- fake dependency elimination

• • • •

#### Correct for **one** thread

#### Features

- reorderings
- cache
- buffers
- read-after-write elimination
- speculative execution
- fake dependency elimination

. . . .

Correct for **one** thread

Lead to strange concurrent behaviors

#### Features

- reorderings
- cache
- buffers
- read-after-write elimination
- speculative execution
- fake dependency elimination

. . .

Lead to weak concurrent behaviors

Correct for **one** thread

#### Weak MMs allow weak behaviors

#### Weak MMs allow weak behaviors

#### Real systems have weak MMs

#### Weak MMs allow weak behaviors

# Real systems have weak MMs (i.e., x86, Power, ARM, C++, Java)

# Realistic weak MMs are subtle ...and different to each other

## But most have *data race freedom* (DRF) results

## But most have *data race freedom* (DRF) results:

No data races  $\Rightarrow$  only SC behaviors

### When **not** to care about Weak MMs

# Writing/verifing a program, whichhas immutable data only

- is single-threaded
- is properly locked multi-threaded

### When to care about Weak MMs

## Writing/verifing lock-free code (i.e., locks themselves)

## Agenda

• Memory models. What, why, and when to care

• CPU and PL MMs. Promise MM

 Compilation correctness for Promise MM to {x86, Power, ARM}

# [Store Buffering in ×86-TSO [Owens et al., 2009]]

L

# Load Buffering

$$a := [x];$$
   
  $[y] := 1;$    
  $b := [y];$    
  $[x] := 1;$ 

Final values 
$$a = 1$$
,  $b = 1$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = \_$ 



Final values 
$$a = \_$$
,  $b = 1$ 



Final values 
$$a = \_$$
,  $b = 1$ 



Final values 
$$a = 1$$
,  $b = 1$ 

$$a := [x];$$
  $\| b := [x];$   $\| c := [y];$   
 $[x] := 1;$   $\| [y] := b;$   $\| [x] := c;$ 

*a* = 1?















































18

# CPU MM should:

# CPU MM should: 1. describe real CPUs

#### CPU MM should:

- 1. describe real CPUs
- 2. save room for future optimizations

# CPU MM should:

- 1. describe real CPUs
- 2. save room for future optimizations
- 3. provide reasonable guarantees for PLs

# MM for PL?

# MM for PL? Has to satisfy 3 requirements

Compilation correctness for Promise MM

### 1. Efficient Compilation

20

### 1. Efficient Compilation

$$\begin{array}{c} [x] := 1; \\ a := [y]; \end{array} \left| \begin{array}{c} [y] := 1; \\ b := [x]; \end{array} \right|$$

### 1. Efficient Compilation

Source (SC MM)

### 1. Efficient Compilation

Source (SC MM)

Target (x86 MM)



. .

### 1. Efficient Compilation

| Source (SC MM)  | $egin{array}{rll} [x] &:= 1; \ {\it a} &:= [y]; \end{array}$ | $egin{array}{rll} [y] &:= 1; \ b &:= [x]; \end{array}$ |
|-----------------|--------------------------------------------------------------|--------------------------------------------------------|
|                 |                                                              | <i>,</i>                                               |
| Target (x86 MM) | [x] := 1;<br>mfence;<br>a := [y];                            | [y] := 1;<br>mfence;<br>b := [x];                      |
|                 | -                                                            | 20                                                     |

### 1. Efficient Compilation



$$\begin{array}{c|c} [x] & := & 1; \\ a & := & [y]; \end{array} \begin{array}{c|c} [y] & := & 1; \\ b & := & [x]; \end{array}$$

21

Original

$$\begin{array}{c|c} [x] & := & 1; \\ a & := & [y]; \end{array} \begin{array}{c|c} [y] & := & 1; \\ b & := & [x]; \end{array}$$

21

Original

$$\begin{array}{c} [x] := 1; \\ a := [y]; \end{array} \left| \begin{array}{c} [y] := 1; \\ b := [x]; \end{array} \right|$$

Optimized

$$a := [y];$$
  $[y] := 1;$   
 $[x] := 1;$   $b := [x];$ 

Original



Optimized

$$\begin{bmatrix} a := [y]; \\ [x] := 1; \end{bmatrix} \begin{bmatrix} [y] := 1; \\ b := [x]; \end{bmatrix}$$

UI

#### 3. No Out-Of-Thin-Air

#### 3. No Out-Of-Thin-Air

$$a := [x];$$
  $b := [y];$   
 $[y] := a;$   $[x] := b;$ 

. .

22

#### 3. No Out-Of-Thin-Air

$$a := [x];$$
  $b := [y];$   
 $[y] := a;$   $[x] := b;$ 

. .

# C/C++11 MM allows a = b = 42

22

# Memory Models for PLs

## Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]

EC

Х

# Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]

- allow Efficient Compilation (x86, Power, ARM)
- validate Compiler Optimizations (merging, rearranging, etc)
- no Out-Of-Thin-Air

EC CO

 $\begin{array}{c|c} X & X \\ \checkmark & X \\ \checkmark & X \\ \checkmark & \checkmark^* \end{array}$ 

# Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]

- allow Efficient Compilation (x86, Power, ARM)
- validate Compiler Optimizations (merging, rearranging, etc)
- no Out-Of-Thin-Air

# Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]



- allow Efficient Compilation (x86, Power, ARM)
- validate Compiler Optimizations (merging, rearranging, etc)
- no Out-Of-Thin-Air

# Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]



- allow Efficient Compilation (x86, Power, ARM)
- validate Compiler Optimizations (merging, rearranging, etc)
- no Out-Of-Thin-Air

# C/C++11 MM has OOTA

# C/C++11 MM has OOTA. Why?

# C/C++11 MM is **axiomatic** MM



26



26

R*y*1 R*x*1 ↓ po po  $W \times 1$ WV

Axioms: 1. hb is acyclic

. . .



$$a := [x]; \mid b := [y]; [y] := a; \mid [x] := b;$$

Axioms: 1. hb is acyclic

. . .



$$a := [x]; \mid b := [y]; [y] := a; \mid [x] := b;$$

Axioms: 1. hb is acyclic

. . .



# [OOTA-if example]

### Solutions

[?]

[**?**]

[?]

[Jeffrey and Riely, 2016]

[Kang et al., 2017]

[Pichon-Pharabod and Sewell, 2016]

[Podkopaev et al., 2016]

28

## Solutions

[?]

[?]

[?]

[Jeffrey and Riely, 2016]

Promise MM, [Kang et al., 2017]

[Pichon-Pharabod and Sewell, 2016]

[Podkopaev et al., 2016]

# Memory Models for PLs

SC MM, [Lamport, 1979] Java MM, [Manson et al., 2005] C/C++11 MM, [Batty et al., 2011]



Requirements:

- allow Efficient Compilation (x86, Power, ARM)
- validate Compiler Optimizations (merging, rearranging, etc)
- no Out-Of-Thin-Air



# [Store Buffering in Promise]

. .

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle]$$

Final values 
$$a = \_$$
,  $b = \_$ 

$$a := [x];$$
   
 $[y] := 1;$    
 $[x] := 1;$ 

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle]$$

Final values 
$$a = \_$$
,  $b = \_$ 

Promised 
$$\begin{bmatrix} y \end{bmatrix} := 1;$$
  $\begin{bmatrix} b \\ y \end{bmatrix} := 1;$ 

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle, \langle y: 1@0_{\tau} \rangle]$$

Final values 
$$a = \_$$
,  $b = \_$ 

Promised 
$$\overbrace{[y] := 1;} \stackrel{\bullet}{[x];}$$
  $\begin{bmatrix} b := [y]; \\ \hline [x] := 1; \end{bmatrix}$ 

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle, \langle y: 1@0_{\tau} \rangle]$$

Final values 
$$a = \_$$
,  $b = 1$ 

Promised 
$$\overbrace{[y] := 1;} \downarrow b := [y];$$
  
 $[x] := 1;$ 

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle, \langle y: 1@0_{\tau} \rangle, \langle x: 1@1_{\tau} \rangle]$$

Final values 
$$a = \_$$
,  $b = 1$ 

Promised 
$$\begin{bmatrix} a := [x]; \\ \hline [y] := 1; \end{bmatrix} \begin{bmatrix} b := [y]; \\ \hline [x] := 1; \end{bmatrix}$$

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle, \langle y: 1@0_{\tau} \rangle, \langle x: 1@0_{\tau} \rangle]$$

Final values 
$$a = 1$$
,  $b = 1$ 

-

$$a := [x];$$
   
 $[y] := 1;$    
 $[x] := 1;$ 

Memory: 
$$[\langle x: 0@0_{\tau} \rangle, \langle y: 0@0_{\tau} \rangle, \langle y: 1@0_{\tau} \rangle, \langle x: 1@1_{\tau} \rangle]$$

Final values 
$$a = 1$$
,  $b = 1$ 

$$a := [x];$$
  $b := [x];$   $c := [y];$   
 $[x] := 1;$   $[y] := b;$   $[x] := c;$ 

$$a = 1?$$



.

$$\overrightarrow{a} := [x]; \qquad \overrightarrow{b} := [x]; \qquad \overrightarrow{c} := [y]; \\ [x] := 1; \qquad [y] := b; \qquad [x] := c;$$

V1: 
$$[x@0_{\tau}, y@0_{\tau}]$$
 V2:  $[x@0_{\tau}, y@0_{\tau}]$  V3:  $[x@0_{\tau}, y@0_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle]$ 

Values:
$$a = \bot$$
 $b = \bot$  $c = \bot$ 

V1:  $[x@0_{\tau}, y@0_{\tau}]$  V2:  $[x@0_{\tau}, y@0_{\tau}]$  V3:  $[x@0_{\tau}, y@0_{\tau}]$ Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle]$ 

Values:
$$a = \bot$$
 $b = \bot$  $c = \bot$ 

$$\overrightarrow{a := [x];} \| \underbrace{b := [x];}_{[y] := b;} \| \overrightarrow{c := [y];} \\ \overrightarrow{[x] := 1;} \| \overrightarrow{[y] := b;} \| \overrightarrow{[x] := c;}$$

V1:  $[x@0_{\tau}, y@0_{\tau}]$  V2:  $[x@2_{\tau}, y@0_{\tau}]$  V3:  $[x@0_{\tau}, y@0_{\tau}]$ Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle]$ 

Values:
$$a = \bot$$
 $b = 1$  $c = \bot$ 

$$\overrightarrow{a := [x];} \| \underbrace{b := [x];}_{[y] := b;} \| \overrightarrow{c := [y];} \\ [x] := c;$$
  
Promised

V1: 
$$[x@0_{\tau}, y@0_{\tau}]$$
 V2:  $[x@2_{\tau}, y@1_{\tau}]$  V3:  $[x@0_{\tau}, y@0_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle, \langle y : 1@1_{\tau} \rangle]$ 

Values:
$$a = \bot$$
 $b = 1$  $c = \bot$ 

$$\overrightarrow{a := [x];}_{\substack{[x] := 1;\\ \text{Promised}}} \left\| \begin{array}{c} b := [x];\\ \underline{[y]} := b;\\ \end{array} \right\| \begin{array}{c} c := [y];\\ \underline{[x]} := c;\\ \end{array}$$

V1: 
$$[x@0_{\tau}, y@0_{\tau}]$$
 V2:  $[x@2_{\tau}, y@1_{\tau}]$  V3:  $[x@0_{\tau}, y@1_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle, \langle y : 1@1_{\tau} \rangle]$ 

Values: 
$$a = \bot$$
  $b = 1$   $c = 1$ 

$$\begin{array}{c|c} \overrightarrow{a} := [x];\\ (x] := 1;\\ Promised \end{array} \end{array} \begin{array}{c|c} b := [x];\\ (y] := b;\\ (x] := c;\\ (x] := c;\\ (x) := c;\\ ($$

V1: 
$$[x@0_{\tau}, y@0_{\tau}]$$
 V2:  $[x@2_{\tau}, y@1_{\tau}]$  V3:  $[x@1_{\tau}, y@1_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle, \langle y : 1@1_{\tau} \rangle, \langle x : 1@1_{\tau} \rangle]$ 

Values: 
$$a = \bot$$
  $b = 1$   $c = 1$ 

$$\begin{array}{c|c} a := [x];\\ \hline [x] := 1;\\ Promised \end{array} \end{array} \begin{array}{c|c} b := [x];\\ \hline [y] := b;\\ \hline [x] := c; \end{array}$$

V1: 
$$[x@1_{\tau}, y@0_{\tau}]$$
 V2:  $[x@2_{\tau}, y@1_{\tau}]$  V3:  $[x@1_{\tau}, y@1_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle, \langle y : 1@1_{\tau} \rangle, \langle x : 1@1_{\tau} \rangle]$ 

Values: 
$$a = 1$$
  $b = 1$   $c = 1$ 

V1: 
$$[x@2_{\tau}, y@0_{\tau}]$$
 V2:  $[x@2_{\tau}, y@1_{\tau}]$  V3:  $[x@1_{\tau}, y@1_{\tau}]$   
Memory:  $[\langle x : 0@0_{\tau} \rangle, \langle y : 0@0_{\tau} \rangle, \langle x : 1@2_{\tau} \rangle, \langle y : 1@1_{\tau} \rangle, \langle x : 1@1_{\tau} \rangle]$ 

Values: 
$$a = 1$$
  $b = 1$   $c = 1$ 



#### • Memory models. What, why, and when to care

#### • CPU and PL MMs. Promise MM

 Compilation correctness for Promise MM to {x86, Power, ARM} CPU and PL MMs. Promise MM

Compilation correctness for Promise MM

# **Compilation Correctness**

# Source language

# **Compilation Correctness**



# **Compilation Correctness**

### *compile* : $S \rightarrow T$



# **Compilation Correctness**

### *compile* : $S \rightarrow T$





# **Compilation Correctness**

### *compile* : $S \rightarrow T$

# $\begin{bmatrix} & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & & \\ & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\$

# **Compilation Correctness**

# $\begin{array}{l} \textit{compile} : S \to T \\ \forall \textit{Prog} \in S. \\ [[\textit{compile}(\textit{Prog})]]_T \subseteq [[\textit{Prog}]]_S. \end{array}$

# **Compilation Targets**

- x86-TSO, [Owens et al., 2009]
- Power, [Alglave et al., 2014]
- ARMv8 POP, [Flur et al., 2016]
- ARMv8.3, [Pulte et al., 2018]

• x86-TSO = SC + transformations[Lahav and Vafeiadis, 2016]

- x86-TSO = SC + transformations[Lahav and Vafeiadis, 2016]
  - Reordering of independent write-read
  - Read-after-write elimination

- x86-TSO = SC + transformations [Lahav and Vafeiadis, 2016]
  - Reordering of independent write-read
  - Read-after-write elimination

• Transformations are sound in Promise [Kang et al., 2017]

- x86-TSO = SC + transformations [Lahav and Vafeiadis, 2016]
  - Reordering of independent write-read
  - Read-after-write elimination

• Transformations are sound in Promise [Kang et al., 2017]

• SC  $\subset$  Promise

## Compilation to Power

# Compilation to Power

• Power = StrongPower + transformation [Lahav and Vafeiadis, 2016]

# Compilation to Power

- Power = StrongPower + transformation [Lahav and Vafeiadis, 2016]
  - Reordering of independent instructions

#### Compilation to Power

- Power = StrongPower + transformation [Lahav and Vafeiadis, 2016]
  - Reordering of independent instructions

• Transformation is sound in Promise [Kang et al., 2017]

#### Compilation to Power

- Power = StrongPower + transformation [Lahav and Vafeiadis, 2016]
  - Reordering of independent instructions

• Transformation is sound in Promise [Kang et al., 2017]

• StrongPower  $\subseteq$  promise-free version of Promise

# Scheme isn't applicable to ARM POP

## Counterexample. ARM-Weak

$$a := [x]; \ // \ 1 \ || \ b := [x]; \ || \ c := [y];$$
  
 $[x] := 1; \ || \ [y] := b; \ || \ [x] := c;$ 

## Counterexample. ARM-Weak

$$a := [x]; \ // 1 \ | \ b := [x]; \ | \ c := [y];$$
  
 $[x] := 1; \ | \ [y] := b; \ | \ [x] := c;$ 

#### Allowed in $\operatorname{ARM}\,\mathsf{POP}$

## Counterexample. ARM-Weak

$$a := [x]; \ // 1 \ | \ b := [x]; \ | \ c := [y];$$
  
 $[x] := 1; \ | \ [y] := b; \ | \ [x] := c;$ 

**Allowed** in ARM POP; **Cannot** be explained by transformations over a stronger model

#### Main Differences between Promise and ARMv8 POP

# 1. Promise can execute only writes out-of-order

# 2. ARMv8 POP doesn't totally order writes to a specific location

## Main Proof Ingredients

# 1. "Lagging" simulation

# 2. ARMv8 POP + timestamps

## Main Proof Ingredients

# 1. "Lagging" simulation

# 2. ARMv8 POP + timestamps



### "Lagging" Simulation

a := [x]; b := [y]; c := [x];[y] := 1;

## "Lagging" Simulation

$$\begin{array}{rcl} a & := & [x]; \\ b & := & [y]; \\ \hline c & := & [x]; \\ [y] & := & 1; \end{array}$$

42



$$a := [x]; \leftarrow$$
 Fully executed by ARM  
 $b := [y];$   
 $c := [x]; \leftarrow$  Partially executed by ARM  
 $[y] := 1;$ 

$$a := [x]; \leftarrow$$
 Fullyexecuted by ARM $b := [y];$  $c := [x]; \leftarrow$  Partially executed by ARM $[y] := 1; \leftarrow$  Notexecuted by ARM



### "Lagging" Simulation



#### $\mathcal{I}$ — simulation relation

### "Lagging" Simulation



 $\mathcal{I} = \mathcal{I}_{Promise \text{ is waiting}} \cup \mathcal{I}_{Promise \text{ is executing}}$ 

### "Lagging" Simulation



## $\mathcal{I} = \mathcal{I}_{\text{Promise is waiting}} \cup \mathcal{I}_{\text{Promise is executing}}$

### "Lagging" Simulation



# $\mathcal{I} = \boxed{\mathcal{I}_{\text{Promise is waiting}}} \cup \mathcal{I}_{\text{Promise is executing}}$

### "Lagging" Simulation



 $\mathcal{I} = \mathcal{I}_{\text{Promise is waiting}} \cup \mathcal{I}_{\text{Promise is executing}}$ 



### "Lagging" Simulation



### $\mathcal{I} = \mathcal{I}_{Promise \text{ is waiting}} \cup \mathcal{I}_{Promise \text{ is executing}}$

### "Lagging" Simulation



42







## Main Proof Ingredients

# 1. "Lagging" simulation

# 2. ARMv8 POP + timestamps

Compilation correctness for  $\operatorname{Promise}\,\mathsf{MM}$ 

#### ARMv8 POP

$$[x] := 1; || [x] := 2;$$

44

Compilation correctness for  $\operatorname{Promise}\,\mathsf{MM}$ 

$$[x] := 1;$$
  $[x] := 2;$ 



CPU and PL MMs. Promise MM

Compilation correctness for Promise MM



CPU and PL MMs. Promise MM

Compilation correctness for Promise MM





CPU and PL MMs. Promise MM

Compilation correctness for Promise MM



CPU and PL MMs. Promise MM

Compilation correctness for Promise MM



### ARMv8 POP



CPU and PL MMs. Promise MM

Compilation correctness for Promise MM

### ARMv8 POP



### Let's determine the order beforehand!

### ARMv8 POP + Timestamps

$$[x] := 1;$$
  $[x] := 2;$ 

44

### ARMv8 POP + Timestamps

$$\begin{bmatrix} x \end{bmatrix} := 1; \qquad \begin{bmatrix} x \end{bmatrix} := 2;$$
$$\begin{bmatrix} x \end{bmatrix} \leftarrow 1 @ \mathbf{3}_{\tau} \end{bmatrix}$$



### ARMv8 POP + Timestamps

$$[x] := 1; \qquad || \qquad [x] := 2;$$
$$[x] \leftarrow 1 @ \mathbf{3}_{\tau} \qquad | \qquad | \qquad [x] \leftarrow 2 @ \mathbf{8}_{\tau}$$



### ARMv8 POP + Timestamps



### May propagate from left

### ARMv8 POP + Timestamps



### **Cannot** propagate from right!

### **Proof Structure**

### 1. Introduce $ARM + \tau$

## 2. Prove equivalence between $ARM+\tau$ and ARMv8 POP

3. Show "lagging" simulation from Promise to  $ARM+\tau$ 

# How to prove correctness of compilation?

## How to prove correctness of compilation?

### Standard technique: Simulation

## Simulation works for operational semantics

# Simulation works for operational semantics

### How to simulate graphs?

# Simulation works for operational semantics

### How to **simulate graphs**?

Traverse in proper order!



























### Traversal formally

### Cover step:

### $G \vdash \langle C, I \rangle \rightarrow \langle C \cup \{a\}, I \rangle$

Issue step:

### $G \vdash \langle C, I \rangle \rightarrow \langle C, I \cup \{w\} \rangle$

### Traversal formally

### Cover step:

### $G \vdash \langle C, I \rangle \rightarrow \langle C \cup \{a\}, I \rangle$

Mimics Promise requirements!

*Issue step*:

### $G \vdash \langle C, I \rangle \rightarrow \langle C, I \cup \{w\} \rangle$

### **Proof Structure**

#### 1. Prove Promise simulates traversal

### **Proof Structure**

#### 1. Prove Promise simulates traversal

### 2. Show completeness of traversal

### **Proof Structure**

#### 1. Prove Promise simulates traversal

### 2. Show completeness of traversal $\forall G. \ G \in \text{Consistent}(ARMv8.3) \Rightarrow$ $G \vdash \langle \emptyset, \emptyset \rangle \rightarrow^* \langle G.Events, G.Writes \rangle$



- MMs are important and complicated, but locks help
- Problems in existing MMs, but there are solutions
- Not all MMs might be explained by reorderings



- MMs are important and complicated, but locks help
- Problems in existing MMs, but there are solutions
- Not all MMs might be explained by reorderings

#### http://podkopaev.net

Thank you!

#### Links I

| Alglave, J., Maranget, L., and Tautschnig, M. (2014).<br>Herding cats: Modelling, simulation, testing, and data mining for weak memory.<br><i>ACM Trans. Program. Lang. Syst.</i> , 36(2):7:1–7:74.                                 |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Batty, M., Owens, S., Sarkar, S., Sewell, P., and Weber, T. (2011).<br>Mathematizing C++ concurrency.<br>In <i>POPL 2011</i> , pages 55–66. ACM.                                                                                    |
| Flur, S., Gray, K. E., Pulte, C., Sarkar, S., Sezgin, A., Maranget, L., Deacon, W., and<br>Sewell, P. (2016).<br>Modelling the ARMv8 architecture, operationally: Concurrency and ISA.<br>In <i>POPL 2016</i> , pages 608–621. ACM. |
| Jeffrey, A. and Riely, J. (2016).<br>On thin air reads: Towards an event structures model of relaxed memory.<br>In <i>LICS 2016</i> . IEEE.                                                                                         |
| Kang, J., Hur, CK., Lahav, O., Vafeiadis, V., and Dreyer, D. (2017).<br>A promising semantics for relaxed-memory concurrency.<br>In <i>POPL 2017</i> . ACM.                                                                         |
| Lahav, O. and Vafeiadis, V. (2016).<br>Explaining relaxed memory models with program transformations.<br>In <i>FM 2016</i> . Springer.                                                                                              |

#### Links II

Lamport, L. (1979).

How to make a multiprocessor computer that correctly executes multiprocess programs.

*IEEE Trans. Computers*, 28(9):690–691.



Manson, J., Pugh, W., and Adve, S. V. (2005). The Java memory model. In *POPL 2005*, pages 378–391. ACM.



Owens, S., Sarkar, S., and Sewell, P. (2009). A better x86 memory model: x86-TSO. In *TPHOL 2009*, pages 391–407.



Pichon-Pharabod, J. and Sewell, P. (2016).

A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions.

In POPL 2016, pages 622-633. ACM.



Podkopaev, A., Sergey, I., and Nanevski, A. (2016). Operational aspects of C/C++ concurrency. *CoRR*, abs/1606.01400.

#### Links III

Pulte, C., Flur, S., Deacon, W., French, J., Sarkar, S., and Sewell, P. (2018). Simplifying ARM concurrency: Multicopy-atomic axiomatic and operational models for ARMv8.