# Consistency, Transactions, Transactional Memory

Chris Rossbach

# Outline for Today

- Questions?
- Administrivia
  - Have you started the next lab yet?
- Agenda
  - Consistency
  - Transactions
  - Transactional Memory
- Acks: Yoav Cohen for some STM slides

#### Faux Quiz questions

- How are promises and futures related? Since there is disagreement on the nomenclature, don't worry about which is which—just describe what the different objects are and how they function.
- How does HTM resemble or differ from Load-Linked/Store-Conditional (LL/SC)?
- What are some pros and cons of HTM vs STM?
- What is Open Nesting? Closed Nesting? Flat Nesting?
- How does 2PL differ from 2PC?
- Define ACID properties: which, if any, of these properties does TM relax?

#### Memory Consistency

- Formal specification of memory semantics
  - Statement of how shared memory will behave with multiple CPUs
  - Ordering of reads and writes
- Memory Consistency != Cache Coherence
  - Coherence: propagate updates to cached copies
    - Invalidate vs. Update
  - Coherence vs. Consistency?
    - **Coherence:** ordering of ops. at a single location
    - **Consistency:** ordering of ops. at multiple locations



#### Sequential Consistency

- Result of *any* execution is the same as if all operations executed on a uniprocessor
- Operations on each processor are *totally ordered* in the sequence and respect program order for each processor
  - Operations are issued in program order
  - A read returns the value of the last write
- Why do modern CPUs not implement SC?
- Requirements: *program order, write atomicity*



- All operations are executed in *some* sequential order
- Each process issues operations in program order
  - Any valid interleaving is allowed
  - All *agree* on the same interleaving
  - Each process preserves its program order

| (a)  |       |       |       |       | (b)  |       |       |       |       |
|------|-------|-------|-------|-------|------|-------|-------|-------|-------|
| P1:  | W(x)a |       |       |       | P1:  | W(x)a |       |       |       |
| P2:  |       | W(x)b |       |       | P2:  |       | W(x)b |       |       |
| P3:  |       |       | R(x)b | R(x)a | P3:  |       |       | R(x)b | R(x)a |
| P4:  |       |       | R(x)b | R(x)a | P4:  |       |       | R(x)a | R(x)b |

Are either of these SC?
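One way to approach the question mechanically: a history can be SC only if a single order of the two writes explains what every reader saw. The sketch below is an illustrative checker for just these two histories. The names `history_is_sc` and `reader_agrees` are made up for this example, and it assumes exactly two writes to `x` (values `a` and `b`, from different processes) and two reads per reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Position of value v in a candidate global write order, or -1 if absent. */
static int position_in(const char *write_order, size_t n_writes, char v) {
    for (size_t i = 0; i < n_writes; i++)
        if (write_order[i] == v) return (int)i;
    return -1;
}

/* A reader's successive reads of x are compatible with a write order if
 * the writes it observes never move backwards in that order. */
static bool reader_agrees(const char *write_order, size_t n_writes,
                          const char *reads, size_t n_reads) {
    int last = -1;
    for (size_t i = 0; i < n_reads; i++) {
        int pos = position_in(write_order, n_writes, reads[i]);
        if (pos < last) return false;
        last = pos;
    }
    return true;
}

/* Writes: a (by P1) and b (by P2). Readers: P3 and P4, two reads each.
 * The history is SC iff one order of the writes satisfies both readers. */
bool history_is_sc(const char p3_reads[2], const char p4_reads[2]) {
    assert(p3_reads != NULL && p4_reads != NULL);
    const char orders[2][2] = { {'a', 'b'}, {'b', 'a'} };
    for (size_t o = 0; o < 2; o++) {
        if (reader_agrees(orders[o], 2, p3_reads, 2) &&
            reader_agrees(orders[o], 2, p4_reads, 2))
            return true;
    }
    return false;
}
```

Under these assumptions, history (a) passes because both readers are consistent with the order `b`, `a`. History (b) fails because P3 needs `b` before `a` while P4 needs `a` before `b`, so the readers cannot agree on one interleaving.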

#### Sequential Consistency: Canonical Example

Initially, Flag1 = Flag2 = 0

| P1 | P2 |
|----|----|
| `Flag1 = 1` | `Flag2 = 1` |
| `if (Flag2 == 0)` | `if (Flag1 == 0)` |
| enter CS | enter CS |

Can both P1 and P2 wind up in the critical section at the same time?
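Under SC the answer can be checked by brute force, since there are only six interleavings that respect program order. The following is a minimal sketch, not a proof technique from the lecture: `both_enter` and `count_double_entries_under_sc` are hypothetical helper names, and each process is modeled as exactly two steps (write own flag, then read the other flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Run one interleaving. schedule[i] is 0 (P1 steps) or 1 (P2 steps).
 * Each process executes: write own flag, then test the other's flag.
 * Returns true if both processes enter the critical section. */
static bool both_enter(const int schedule[4]) {
    int flag[2] = {0, 0};        /* Flag1, Flag2 */
    int next_op[2] = {0, 0};     /* 0 = write pending, 1 = read pending */
    bool in_cs[2] = {false, false};
    for (int i = 0; i < 4; i++) {
        int p = schedule[i];
        assert(p == 0 || p == 1);
        if (next_op[p] == 0)
            flag[p] = 1;                     /* FlagP = 1 */
        else
            in_cs[p] = (flag[1 - p] == 0);   /* if (FlagOther == 0) enter CS */
        next_op[p]++;
    }
    return in_cs[0] && in_cs[1];
}

/* Count the SC interleavings in which both processes enter the CS. */
int count_double_entries_under_sc(void) {
    int count = 0;
    /* Every 4-bit mask with exactly two 1 bits is one valid interleaving:
     * each process takes two steps, in its own program order. */
    for (int mask = 0; mask < 16; mask++) {
        int schedule[4];
        int ones = 0;
        for (int i = 0; i < 4; i++) {
            schedule[i] = (mask >> i) & 1;
            ones += schedule[i];
        }
        if (ones != 2) continue;
        if (both_enter(schedule)) count++;
    }
    return count;
}
```

In this model the count is zero: for both to enter, each read would have to precede the other process's write, which contradicts each process writing before it reads.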

#### Do we need Sequential Consistency?

Initially, Flag1 = Flag2 = 0

| P1 | P2 |
|----|----|
| `Flag1 = 1` | `Flag2 = 1` |
| `if (Flag2 == 0)` | `if (Flag1 == 0)` |
| `data++` | `data++` |

Key issue:

- P1 and P2 may not see each other's writes in the same order
- Implication: both in critical section, which is incorrect
- Why would this happen?

#### Write Buffers

- $P_0$ write $\rightarrow$ queue op in write buffer, proceed
- $P_0$ read $\rightarrow$ look in write buffer
- $P_{x \neq 0}$ read $\rightarrow$ old value: write buffer hasn't drained
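The effect of these three rules on the flag example can be seen in a small single-threaded simulation. This is a sketch under simplifying assumptions: a one-entry write buffer per processor, and a fixed schedule in which both stores are issued before either load. The `struct cpu` type and the `store`, `load`, `drain`, and `both_enter_cs` names are invented for illustration and do not model any real hardware.

```c
#include <assert.h>
#include <stdbool.h>

enum { FLAG1 = 0, FLAG2 = 1, NUM_ADDRS = 2 };

/* Hypothetical processor with a one-entry write buffer. */
struct cpu {
    bool buffered;   /* is a store waiting in the write buffer? */
    int  addr;       /* address of the buffered store */
    int  value;      /* value of the buffered store */
};

/* Write: queue the op in the write buffer and proceed. */
static void store(struct cpu *c, int addr, int value) {
    assert(!c->buffered);
    c->buffered = true;
    c->addr = addr;
    c->value = value;
}

/* Read: check own write buffer first, otherwise read memory. */
static int load(const struct cpu *c, const int memory[], int addr) {
    if (c->buffered && c->addr == addr) return c->value;
    return memory[addr];
}

/* Make the buffered store visible to everyone. */
static void drain(struct cpu *c, int memory[]) {
    if (c->buffered) {
        memory[c->addr] = c->value;
        c->buffered = false;
    }
}

/* Returns true if both processors enter the critical section.
 * fence_before_load models draining the buffer between store and load. */
bool both_enter_cs(bool fence_before_load) {
    int memory[NUM_ADDRS] = {0, 0};
    struct cpu p1 = {0}, p2 = {0};

    store(&p1, FLAG1, 1);
    store(&p2, FLAG2, 1);
    if (fence_before_load) {
        drain(&p1, memory);
        drain(&p2, memory);
    }
    bool p1_in = (load(&p1, memory, FLAG2) == 0);   /* if (Flag2 == 0) */
    bool p2_in = (load(&p2, memory, FLAG1) == 0);   /* if (Flag1 == 0) */
    drain(&p1, memory);
    drain(&p2, memory);
    return p1_in && p2_in;
}
```

Without the drain, each processor reads the other's flag from memory while the store is still buffered, so both see 0 and both enter. With the drain, both stores reach memory before either load, so in this schedule neither processor sees 0.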

#### Sequential Consistency: Requirements

- Program Order
  - Processor's memory operations must complete in program order
- Write Atomicity
  - Writes to the same location must be seen in the same order by all other CPUs
  - Subsequent reads must not return the value of a write until it has propagated to all CPUs
- Write acknowledgements are necessary
  - Cache coherence provides these properties for a cache-only system

Disadvantages:

- Difficult to implement!
  - Coherence to (e.g.) write buffers is hard
- Sacrifices many potential optimizations
  - Hardware (cache) and software (compiler)
  - Major performance hit

# Why Relax Consistency?

- Motivation, originally
  - Allow in-order processors to overlap store latency with other work
  - "Other work" depends on loads, so loads bypass stores using a *store queue*
- PC (processor consistency), SPARC TSO, IBM/370
  - Just relax the write-to-read ($W \rightarrow R$) program order requirement
- Subsequently
  - Hide latency of one store with latency of other stores
  - Allow stores to be performed out of order (OOO) with respect to each other
  - Breaks SC even further
- This led to the definition of SPARC PSO/RMO, WO, PowerPC WC, Itanium
- What's the problem with relaxed consistency?
  - Shared-memory programs can break if not written for the specific consistency model

- **Program Order** relaxations (different locations)
  - $W \rightarrow R$ ;  $W \rightarrow W$ ;  $R \rightarrow R/W$
- Write Atomicity relaxations
  - Read returns another processor's Write early
- *Requirement:* synchronization primitives for safety
  - Fence, barrier instructions etc

#### **Relaxed** Consistency

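```
/* Linux x86 write lock: the LOCK prefix makes the subtract an atomic RMW. */
static inline void arch_write_lock(arch_rwlock_t *rw)
{
    asm volatile(LOCK_PREFIX WRITE_LOCK_SUB(%1) "(%0)\n\t"
                 "jz 1f\n"                       /* result zero: lock acquired */
                 "call __write_lock_failed\n\t"  /* otherwise take the slow path */
                 "1:\n"
                 ::LOCK_PTR_REG (&rw->write), "i" (RW_LOCK_BIAS)
                 : "memory");                    /* compiler barrier */
}
```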

| Relaxation      | $W \rightarrow R$ Order | $W \rightarrow W$ Order | $R \rightarrow RW$ Order | Read Others' Write Early | Read Own Write Early | Safety net                   |
|-----------------|-------------------------|-------------------------|--------------------------|--------------------------|----------------------|------------------------------|
| SC [16]         |                         |                         |                          |                          | $\checkmark$         |                              |
| IBM 370 [14]    | $\checkmark$            |                         |                          |                          |                      | serialization instructions   |
| TSO [20]        | $\checkmark$            |                         |                          |                          | $\checkmark$         | RMW                          |
| PC [13, 12]     | $\checkmark$            |                         |                          | $\checkmark$             | $\checkmark$         | RMW                          |
| PSO [20]        | $\checkmark$            | $\checkmark$            |                          |                          | $\checkmark$         | RMW, STBAR                   |
| WO [5]          | $\checkmark$            | $\checkmark$            | $\checkmark$             |                          | $\checkmark$         | synchronization              |
| RCsc [13, 12]   | $\checkmark$            | $\checkmark$            | $\checkmark$             |                          | $\checkmark$         | release, acquire, nsync, RMW |
| RCpc [13, 12]   | $\checkmark$            | $\checkmark$            | $\checkmark$             | $\checkmark$             | $\checkmark$         | release, acquire, nsync, RMW |
| Alpha [19]      | $\checkmark$            | $\checkmark$            | $\checkmark$             |                          | $\checkmark$         | MB, WMB                      |
| RMO [21]        | $\checkmark$            | $\checkmark$            | $\checkmark$             |                          | $\checkmark$         | various MEMBARs              |
| PowerPC [17, 4] | $\checkmark$            | $\checkmark$            | $\checkmark$             | $\checkmark$             | $\checkmark$         | SYNC                         |

PowerPC:

```
static inline unsigned long __arch_spin_trylock(arch_spinlock_t *lock)
{
    unsigned long tmp, token;

    token = LOCK_TOKEN;
    __asm__ __volatile__(
"1: " PPC_LWARX(%0,0,%2,1) "\n\
    cmpwi   0,%0,0\n\
    bne-    2f\n\
    stwcx.  %1,0,%2\n\
    bne-    1b\n"
    PPC_ACQUIRE_BARRIER
"2:"
    : "=&r" (tmp)
    : "r" (token), "r" (&lock->slock)
    : "cr0", "memory");

    return tmp;
}
```


# Some Key Consistency Models

#### TSO

- x86
- Stores are totally ordered, reads are not
- Differs from IBM 370 by allowing early reads of a processor's own writes; differs from PC by not allowing early reads of other processors' writes

#### **RC: Release Consistency**

- Key insight: only synchronization references need to be ordered
- Hence, relax memory ordering for all other references
  - Enable high-performance OOO implementation
- Programmer labels synchronization references
  - Hardware must carefully order these labeled references
- Labeling schemes:
  - Explicit synchronization ops (acquire/release)
  - Memory fence or memory barrier ops:
    - All preceding ops must finish before following ones begin
- Fence ops drain pipeline
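As a rough illustration of the labeling idea, C11 atomics let the programmer mark only the synchronization accesses with `memory_order_release` and `memory_order_acquire` and leave the data accesses ordinary. The sketch below is an assumption-laden example rather than anything from the lecture: `payload`, `ready`, `publish`, and `try_consume` are hypothetical names, and it shows a single producer handing one value to a consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary data: not labeled */
static atomic_bool ready;    /* labeled synchronization variable, starts false */

/* Producer: ordinary write, then a release store on the flag.
 * The release keeps the payload write from moving after the flag write. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: acquire load on the flag, then an ordinary read.
 * The acquire keeps the payload read from moving before the flag read. */
bool try_consume(int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;        /* nothing published yet */
    *out = payload;
    return true;
}
```

Only the accesses to `ready` carry ordering constraints, so the compiler and hardware remain free to reorder everything else. A fence-based variant would instead place `atomic_thread_fence` calls around relaxed accesses to the flag.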

#### Transactions and Transactional Memory

- 3 Programming Model Dimensions:
  - How to specify computation
  - How to specify communication
  - How to specify coordination/control transfer
- Threads, Futures, Events etc.
  - Mostly about how to express control
- Transactions
  - Mostly about how to deal with shared state



## Transactions

*Core issue: multiple updates* 

Canonical examples:

```
move(file, old-dir, new-dir) {
    delete(file, old-dir)
    add(file, new-dir)
}

create(file, dir) {
    alloc-disk(file, header, data)
    write(header)
    add(file, dir)
}
```

Problems: crash in the middle / visibility of intermediate state

- Modified data in memory/caches
- Even if in-memory data is durable, multiple disk updates

- Want reliable update of two resources (e.g. in two disks, machines...)
  - Move file from A to B
  - Create file (update free list, inode, data block)
  - Bank transfer (move \$100 from my account to VISA account)
  - Move directory from server A to B
- Machines can crash, messages can be lost

Can we use messages? E.g. with retries over unreliable medium to synchronize with guarantees?

## No.

Not even if all messages get through!

- Two generals on separate mountains
- Can only communicate via messengers
- Messengers can get lost or captured
- Need to coordinate attack
  - attack at same time good, different times bad!



General A → General B: let's attack at dawn

General B → General A: OK, dawn.

General A → General B: Check. Dawn it is.

General B → General A: Alright already, dawn.

- Even if all messages are delivered, neither general can be sure: maybe some message didn't get through.
- No solution: one of the few CS impossibility results.



# Transactions can help (but can't solve it)

- Solves weaker problem:
  - 2 things will either happen or not
  - not necessarily at the same time
- Core idea: one entity has the power to say yes or no for all
  - Local txn: one final update (TxEND) irrevocably triggers several others
  - Distributed transactions
    - 2 phase commit
    - One machine has final say for all machines
    - Other machines bound to comply

What is the role of synchronization here?

#### Transactional Programming Model

    begin transaction;
    x = read("x-values", ....);
    y = read("y-values", ....);
    z = x+y;
    write("z-values", z, ....);
    commit transaction;

What has changed from previous programming models?

#### **ACID Semantics**

#### What are they?

- A
- C
- I
- D
- D

- Atomic: all updates happen or none do
- Consistent: system invariants maintained across updates
- Isolated: no visibility into partial updates
- Durable: once done, stays done
- Are subsets ever appropriate?
  - When would ACI be useful?
  - ACD?
  - Isolation only?

- Key idea: turn multiple updates into a single one
- Many implementation techniques
  - Two-phase locking
  - Timestamp ordering
  - Optimistic concurrency control
  - Journaling
  - Two- and three-phase commit
  - Speculation-rollback
  - Single global lock
  - Compensating transactions

Key problems:

- output commit
- synchronization



```
BEGIN_TXN();
x = read("x-values", ....);
y = read("y-values", ....);
z = x+y;
write("z-values", z, ....);
COMMIT_TXN();
```

    BEGIN_TXN() {
        LOCK(single-global-lock);
    }

    COMMIT_TXN() {
        UNLOCK(single-global-lock);
    }

Pros/Cons?

- Phase 1: only acquire locks in order
- Phase 2: unlock at commit
- avoids deadlock

```
BEGIN_TXN();
Lock x, y
x = x + 1
y = y - 1
unlock y, x
COMMIT_TXN();
```

    BEGIN_TXN() {
        rwset = Union(rset, wset);
        rwset = sort(rwset);
        forall x in rwset
            LOCK(x);
    }

    COMMIT_TXN() {
        forall x in rwset
            UNLOCK(x);
    }

One failure interleaving of transactions A and B:

1. A: grab locks
2. A: modify x, y
3. A: unlock y, x
4. B: grab locks
5. B: update x, y
6. B: unlock y, x
7. **B: COMMIT**
8. A: CRASH

B commits changes that depend on A's updates, but A crashes before committing.

Pros/Cons? What happens on failures?
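The ordered-acquisition idea in `BEGIN_TXN`/`COMMIT_TXN` can be sketched in Python. This is an illustration under assumptions: the `locks` and `data` dictionaries and the `transfer` transaction are hypothetical, and the sketch holds every lock until commit, so it does not reproduce the early-unlock failure scenario.

```python
import threading

locks = {"x": threading.Lock(), "y": threading.Lock()}
data = {"x": 0, "y": 0}

def begin_txn(rset, wset):
    rwset = sorted(set(rset) | set(wset))   # union, then one global order
    for name in rwset:
        locks[name].acquire()               # growing phase: acquire only
    return rwset

def commit_txn(rwset):
    for name in reversed(rwset):
        locks[name].release()               # shrinking phase: release only

def transfer():
    # The read and write sets are listed in different orders on purpose;
    # sorting makes every transaction lock x before y.
    rwset = begin_txn(rset=["y", "x"], wset=["x", "y"])
    try:
        data["x"] += 1
        data["y"] -= 1
    finally:
        commit_txn(rwset)

def run(n=8):
    data.update(x=0, y=0)
    threads = [threading.Thread(target=transfer) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data["x"], data["y"]
```

Because all transactions acquire locks in the same sorted order, no cycle of waiting transactions can form, which is why the scheme avoids deadlock.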

## Two-phase commit

- N participants agree or don't (atomicity)
- Phase 1: everyone "prepares"
- Phase 2: Master decides and tells everyone to actually commit
- What if the master crashes in the middle?

## 2PC: Phase 1

1. Coordinator sends REQUEST to all participants
2. Participants receive request and
3. Execute locally
4. Write VOTE\_COMMIT or VOTE\_ABORT to local log
5. Send VOTE\_COMMIT or VOTE\_ABORT to coordinator

Example—move:  $C \rightarrow S1$ : delete foo from /,  $C \rightarrow S2$ : add foo to /

| Failure case:                         | Success case:                         |
|---------------------------------------|---------------------------------------|
| S1 writes rm /foo, VOTE_COMMIT to log | S1 writes rm /foo, VOTE_COMMIT to log |
| S1 sends VOTE_COMMIT                  | S1 sends VOTE_COMMIT                  |
| S2 decides permission problem         | S2 writes add foo to /                |
| S2 writes/sends VOTE_ABORT            | S2 writes/sends VOTE_COMMIT           |

## 2PC: Phase 2

- Case 1: receive VOTE\_ABORT or timeout
  - Write GLOBAL\_ABORT to log
  - send GLOBAL\_ABORT to participants
- Case 2: receive VOTE\_COMMIT from all
  - Write GLOBAL\_COMMIT to log
  - send GLOBAL\_COMMIT to participants
- Participants receive decision, write GLOBAL\_\* to log

# 2PC corner cases

#### Phase 1

1. Coordinator sends REQUEST to all participants
2. **[X]** Participants receive request and
3. Execute locally
4. Write VOTE\_COMMIT or VOTE\_ABORT to local log
5. Send VOTE\_COMMIT or VOTE\_ABORT to coordinator

#### Phase 2

- **[Y]** Case 1: receive VOTE\_ABORT or timeout
  - Write GLOBAL\_ABORT to log
  - Send GLOBAL\_ABORT to participants
- Case 2: receive VOTE\_COMMIT from all
  - Write GLOBAL\_COMMIT to log
  - Send GLOBAL\_COMMIT to participants
- **[Z]** Participants receive decision, write GLOBAL\_\* to log

- What if participant crashes at X?
- Coordinator crashes at Y?
- Participant crashes at Z?
- Coordinator crashes at W?

- Coordinator crashes at W, never wakes up
- All nodes block forever!
- Can participants ask each other what happened?
- 2PC: always has risk of indefinite blocking
- Solution: (yes) 3 phase commit!
  - Reliable replacement of crashed "leader"
  - 2PC often good enough in practice
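The two phases can be sketched as a failure-free, in-process simulation. This is only a sketch under assumptions: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and crashes, timeouts, and message loss (the interesting corner cases) are not modeled.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit     # stands in for "local execution succeeded"
        self.log = []

    def prepare(self, request):
        # Phase 1: execute locally, log the vote, then send it.
        vote = "VOTE_COMMIT" if self.can_commit else "VOTE_ABORT"
        self.log.append((request, vote))
        return vote

    def decide(self, decision):
        # Phase 2: record the coordinator's global decision.
        self.log.append(decision)

def two_phase_commit(requests, coordinator_log):
    """requests: list of (participant, request) pairs. Returns the decision."""
    votes = [p.prepare(req) for p, req in requests]          # Phase 1
    if all(v == "VOTE_COMMIT" for v in votes):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    coordinator_log.append(decision)                          # log before sending
    for p, _ in requests:                                     # Phase 2
        p.decide(decision)
    return decision
```

For the move example, a participant created with `can_commit=False` plays the role of S2 hitting a permission problem, and the decision becomes `GLOBAL_ABORT` for both servers.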

## Nested Transactions

- Composition of transactions
  - E.g. interact with multiple organizations, each supporting txns
  - Travel agency: canonical example
- Nesting: view transaction as collection of:
  - actions on unprotected objects
  - protected actions that may be undone or redone
  - real actions that may be deferred but not undone
  - nested transactions that may be undone
- 3 basic flavors:
  - **Flat:** subsume inner transactions
  - **Closed:** subsume with partial rollback
  - **Open:** pause transactional context
- Open nesting details:
  - Nested transaction returns name and parameters of compensating transaction
  - Parent includes compensating transaction in log of parent transaction
  - Invoke compensating transactions from log if parent transaction aborted
  - Consistent, atomic, durable, but not isolated

## Nesting Semantics Exercise

    1  BeginTX()
    2    X = read(x)
    3    Y = read(y)
    4    write(x, X+1+Y)
    5    BeginTX()
    6      Z = read(z) + X + Y
    7    EndTX()
    8  EndTX()

#### What if the TX aborts between lines 7 and 8?

- Under flat nesting?
- Under closed nesting?
- Under open nesting?
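For the open-nesting case, the compensating-transaction log can be sketched in the spirit of the travel agency example. The `Transaction` class, the `seats` inventory, and the `reserve`/`cancel` pair are hypothetical names used only for illustration; a real system would also make the log durable.

```python
class Transaction:
    def __init__(self):
        self.compensations = []          # parent's log of (undo function, args)

    def open_nested(self, action, compensation, *args):
        action(*args)                    # effect is visible immediately (not isolated)
        self.compensations.append((compensation, args))

    def abort(self):
        # Invoke compensating transactions from the log, newest first.
        while self.compensations:
            undo, args = self.compensations.pop()
            undo(*args)

    def commit(self):
        self.compensations.clear()       # nothing left to undo

seats = {"flight": 2, "hotel": 1}

def reserve(item):
    seats[item] -= 1

def cancel(item):
    seats[item] += 1

def book_trip(fail):
    tx = Transaction()
    tx.open_nested(reserve, cancel, "flight")
    tx.open_nested(reserve, cancel, "hotel")
    if fail:
        tx.abort()
    else:
        tx.commit()
    return dict(seats)
```

Between the reservation and the abort, other observers could see the reduced seat counts, which is the sense in which open nesting gives up isolation.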

## Transactional Memory: ACI

Transactional Memory:

- Make multiple memory accesses atomic
- All or nothing: Atomicity
- No interference: Isolation
- Correctness: Consistency
- No durability, for obvious reasons

Keywords: Commit, Abort, Speculative access, Checkpoint

```
remove(list, x) {
  lock(list);
  pos = find(list, x);
  if(pos)
    erase(list, pos);
  unlock(list);
}
```

```
remove(list, x) {
  TXBEGIN();
  pos = find(list, x);
  if(pos)
    erase(list, pos);
  TXEND();
}
```

## The **Real** Goal

```
remove(list, x) {
   atomic {
     pos = find(list, x);
     if(pos)
        erase(list, pos);
   }
}
```

- Transactions: super-awesome
- Transactional Memory: also super-awesome, but:
  - Transactions != TM
  - TM is an *implementation technique*
  - Often presented as programmer abstraction
  - Remember Optimistic Concurrency Control

### A Simple TM

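```
#include <pthread.h>

/* One lock protects every transaction. */
pthread_mutex_t g_global_lock = PTHREAD_MUTEX_INITIALIZER;

void begin_tx(void) {
    pthread_mutex_lock(&g_global_lock);
}

void end_tx(void) {
    pthread_mutex_unlock(&g_global_lock);
}

/* Called abort_tx rather than abort to avoid clashing with the C library's abort(). */
void abort_tx(void) {
    // can't happen
}
```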

```
remove(list, x) {
    begin_tx();
    pos = find(list, x);
    if(pos)
        erase(list, pos);
    end_tx();
}
```

Actually, this works fine... But how can we improve it?

Consider a hash-table

### Pessimistic concurrency control

### Optimistic concurrency control
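A minimal sketch of optimistic concurrency control on a hash table: read without locking, compute, then validate at commit time and retry on conflict. The `VersionedTable` class and its per-key version numbers are assumptions made for this illustration; a pessimistic version would instead hold a lock for the whole read-modify-write.

```python
import threading

class VersionedTable:
    """Hash table with a version per key (hypothetical helper for this sketch)."""
    def __init__(self):
        self._data = {}                         # key -> (version, value)
        self._commit_lock = threading.Lock()

    def read(self, key):
        return self._data.get(key, (0, None))   # no lock on the read path

    def try_commit(self, key, seen_version, new_value):
        # Validate-and-write is the only step that takes a lock.
        with self._commit_lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != seen_version:
                return False                    # someone committed first: conflict
            self._data[key] = (current_version + 1, new_value)
            return True

def increment(table, key):
    while True:                                 # retry until validation succeeds
        version, value = table.read(key)
        if table.try_commit(key, version, (value or 0) + 1):
            return

def run(n_threads=8, per_thread=100):
    table = VersionedTable()

    def worker():
        for _ in range(per_thread):
            increment(table, "hits")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.read("hits")
```

No update is lost even though readers never block: a stale read simply fails validation and the increment is retried.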



# **TM Primer**

#### Key Ideas:

- Critical sections execute concurrently
- Conflicts are detected dynamically
- If conflict serializability is violated, rollback

#### Key Abstractions:

- Primitives
  - xbegin, xend, xabort
  - Conflict  $\emptyset \neq \{W_a\} \cap \{R_b \cup W_b\}$
- Contention Manager
  - Need flexible policy























C is in the read set of cpu0, and in the write set of cpu1
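The conflict condition $\emptyset \neq \{W_a\} \cap \{R_b \cup W_b\}$ translates directly into set operations. In this sketch the dictionary layout and the addresses other than C are made up for illustration; only the cpu0/cpu1 overlap on C comes from the example above.

```python
def conflicts(writes_a, reads_b, writes_b):
    """True if a's write set intersects b's read set or write set."""
    return bool(set(writes_a) & (set(reads_b) | set(writes_b)))

def has_conflict(tx_a, tx_b):
    # Check both directions: either transaction's writes may hit the other.
    return (conflicts(tx_a["writes"], tx_b["reads"], tx_b["writes"]) or
            conflicts(tx_b["writes"], tx_a["reads"], tx_a["writes"]))

# C is in the read set of cpu0 and in the write set of cpu1.
cpu0 = {"reads": {"A", "C"}, "writes": {"B"}}
cpu1 = {"reads": {"D"}, "writes": {"C"}}
# cpu2 shares only a read of A with cpu0, so there is no conflict.
cpu2 = {"reads": {"A"}, "writes": {"E"}}
```

Two transactions that only read the same location never conflict under this definition; at least one side must write.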





### Data Versioning

- Eager Versioning
- Lazy Versioning

### **Conflict Detection and Resolution**

- Eager Detection (Pessimistic)
- Lazy Detection (Optimistic)

### **Conflict Detection Granularity**

- Object Granularity
- Word Granularity
- Cache line Granularity
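The two data versioning choices can be contrasted in a short sketch, using the usual definitions: eager versioning writes in place and keeps an undo log, while lazy versioning buffers writes and applies them at commit. The class names `EagerTx` and `LazyTx` and the dictionary standing in for memory are assumptions for illustration, and conflict detection is left out.

```python
class EagerTx:
    """Eager versioning: update memory in place, keep an undo log."""
    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))  # save the old value
        self.memory[addr] = value                         # new value goes to memory now

    def commit(self):
        self.undo_log.clear()                             # cheap: nothing to copy

    def abort(self):
        while self.undo_log:                              # restore newest first
            addr, old = self.undo_log.pop()
            self.memory[addr] = old

class LazyTx:
    """Lazy versioning: buffer writes, apply them at commit."""
    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value                   # memory untouched until commit

    def read(self, addr):
        if addr in self.write_buffer:                     # see our own speculative writes
            return self.write_buffer[addr]
        return self.memory[addr]

    def commit(self):
        self.memory.update(self.write_buffer)             # publish all writes
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()                         # cheap: just drop the buffer
```

The trade-off shows up in which path is cheap: eager versioning makes commit fast and abort expensive, while lazy versioning does the opposite.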



# TM Design Alternatives

- Hardware (HTM)
  - Caches track RW set, HW speculation/checkpoint
- Software (STM)
  - Instrument RW
  - Inherit TX Object



# Hardware Transactional Memory

- Idea: Track read / write sets in HW
  - commit / rollback in hardware as well
- Cache coherent hardware already manages much of this
- Basic idea: cache == speculative storage
  - HTM ~= smarter cache
- Can support many different TM paradigms
  - Eager, lazy
  - optimistic, pessimistic

- "Small" modification to cache

### Key ideas

- Checkpoint architectural state
- Caches: 'versioning' for memory
- Change coherence protocol
  - Conflict detection in hardware
- 'Commit' transactions if no conflict
- 'Abort' on conflict (or special cond)
- 'Retry' aborted transaction

# Coherence for Conflict Detection and Versioning



- Lines in TMI state are speculative
- Lines in TS, TE have been read
- Invalidations/Upgrades for T\* → transactional conflicts
- Commit: T\* -> \*
- Abort:  $T^* \rightarrow I$ , rollback registers

**Pros/Cons?** 

# Case Study: SUN Rock

- Major challenge: diagnosing cause of Transaction aborts
  - Necessary for intelligent scheduling of transactions
  - Also for debugging code
  - debugging the processor architecture / µarchitecture
- Many unexpected causes of aborts
- Rock v1 diagnostics unable to distinguish distinct failure modes

| Mask  | Name  | Description and example cause                                                            |
|-------|-------|------------------------------------------------------------------------------------------|
| 0x001 | EXOG  | Exogenous - Intervening code has run: cps register contents are invalid.                 |
| 0x002 | COH   | Coherence - Conflicting memory operation.                                                |
| 0x004 | TCC   | Trap Instruction - A trap instruction evaluates to "taken".                              |
| 0x008 | INST  | Unsupported Instruction - Instruction not supported inside transactions.                 |
| 0x010 | PREC  | Precise Exception - Execution generated a precise exception.                             |
| 0x020 | ASYNC | Async - Received an asynchronous interrupt.                                              |
| 0x040 | SIZ   | Size - Transaction write set exceeded the size of the store queue.                       |
| 0x080 | LD    | Load - Cache line in read set evicted by transaction.                                    |
| 0x100 | ST    | Store - Data TLB miss on a store.                                                        |
| 0x200 | CTI   | Control transfer - Mispredicted branch.                                                  |
| 0x400 | FP    | Floating point - Divide instruction.                                                     |
| 0x800 | UCTI  | Unresolved control transfer - branch executed without resolving load on which it depends |

Table 1. cps register: bit definitions and example failure reasons that set them.
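Decoding a cps value against Table 1 is a simple bitmask walk. The sketch below assumes that a value may have more than one bit set; the `decode_cps` helper is a hypothetical name, while the masks and mnemonics are copied from the table.

```python
# Masks and names from Table 1.
CPS_BITS = {
    0x001: "EXOG", 0x002: "COH", 0x004: "TCC", 0x008: "INST",
    0x010: "PREC", 0x020: "ASYNC", 0x040: "SIZ", 0x080: "LD",
    0x100: "ST", 0x200: "CTI", 0x400: "FP", 0x800: "UCTI",
}

def decode_cps(value):
    """Return the names of all bits set in a cps register value, lowest bit first."""
    return [name for mask, name in CPS_BITS.items() if value & mask]
```

For example, `decode_cps(0x042)` reports both a coherence conflict (COH) and a write-set overflow (SIZ), two causes that would likely call for different retry decisions.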

| Anomaly              | Initially | Thread 1                             | Thread 2 | Question      |
|----------------------|-----------|--------------------------------------|----------|---------------|
| Non-repeatable reads |           | atomic {<br>r1 = x;<br>r2 = x;<br>}  | x = 1;   | Can r1 != r2? |
| Lost updates         | x == 0    | atomic {<br>r = x;<br>x = r+1;<br>}  | x = 10;  | Can x == 1?   |
| Dirty reads          | x is even | atomic {<br>x++;<br>x++;<br>}        | r = x;   | Can r be odd? |

## TM Tricks

- Lock Elision
  - In many data structures, accesses are contention-free in the common case
  - But locks are still needed for the uncommon case where contention does occur
  - For example, a double-ended queue
  - Can replace the lock with an atomic section and default to the lock when needed
  - Allows extra parallelism in the average case

# Lock Elision

```
hashTable.lock();
var = hashTable.lookup(X);
if (!var) hashTable.insert(X);
hashTable.unlock();
```

```
hashTable.lock();
var = hashTable.lookup(Y);
if (!var) hashTable.insert(Y);
hashTable.unlock();
```

Hardware notices the lock instruction sequence!

**Parallel Execution**

```
atomic {
    if (!hashTable.isUnlocked()) abort;
    var = hashTable.lookup(X);
    if (!var) hashTable.insert(X);
} orElse ...
```
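Java has no `atomic` block and no portable HTM intrinsics, so the following is only a software analogy of the elision pattern, not the hardware mechanism itself. It uses the standard `java.util.concurrent.locks.StampedLock`: a reader runs speculatively without acquiring the lock, validates afterwards, and falls back to the real lock only if a writer interfered. The class `ElidedRange` and its fields are hypothetical names chosen for illustration.

```java
import java.util.concurrent.locks.StampedLock;

/** Illustrative only: optimistic reads that fall back to a lock on conflict. */
class ElidedRange {
    private final StampedLock lock = new StampedLock();
    private long lo = 0;
    private long hi = 0; // invariant: lo <= hi

    /** Writers always take the lock (the uncommon, contended path). */
    void widen(long newLo, long newHi) {
        long stamp = lock.writeLock();
        try {
            lo = Math.min(lo, newLo);
            hi = Math.max(hi, newHi);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    /** Readers first try to run without the lock, like an elided critical section. */
    long width() {
        long stamp = lock.tryOptimisticRead(); // speculative: nothing is acquired
        long l = lo;
        long h = hi;
        if (!lock.validate(stamp)) {           // "abort": a writer got in the way
            stamp = lock.readLock();           // default to the lock
            try {
                l = lo;
                h = hi;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return h - l;
    }
}
```

As with hardware elision, the common uncontended case never touches the lock word in a way that serializes readers; only a detected conflict pays for the lock.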

## Privatization

```
atomic {
    var = getWorkUnit();
    do_long_computation(var);
}
```

vs.

```
atomic {
    var = getWorkUnit();
}
do_long_computation(var);
```

The second version may only work correctly in TMs that support strong isolation. (Why?)

## Work Deferral

```
atomic {
    do_lots_of_work();
    update_global_statistics(); // effectively serializes transactions
}
```

```
atomic {
    do_lots_of_work();
    atomic open {
        update_global_statistics();
    }
}
```

```
atomic {
    do_lots_of_work();
    update_local_statistics();
}
atomic {
    update_global_statistics_using_local_statistics();
}
```
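The third variant (accumulate locally, then publish in a separate short section) can be sketched in plain Java. This is a minimal illustration under stated assumptions: Java has no `atomic` block, so a `synchronized` block stands in for the short transaction, and the class `DeferredStats`, its methods, and the trivial counting loop are all hypothetical stand-ins for `do_lots_of_work()` and the statistics update.

```java
/** Illustrative only: defer the shared update to one short critical section. */
class DeferredStats {
    private final Object statsLock = new Object();
    private long globalTotal = 0; // shared statistics

    /** Long-running work touches only thread-local state. */
    void work(int items) {
        long local = 0;
        for (int i = 0; i < items; i++) {
            local++; // stands in for do_lots_of_work() plus a local statistic
        }
        synchronized (statsLock) { // short deferred update of the shared total
            globalTotal += local;
        }
    }

    long total() {
        synchronized (statsLock) {
            return globalTotal;
        }
    }

    /** Runs the given number of workers and returns the merged total. */
    static long runWorkers(int threads, int items) {
        DeferredStats stats = new DeferredStats();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> stats.work(items));
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats.total();
    }
}
```

Each worker conflicts with the others only during the final one-line update, rather than for the whole duration of its work.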



System == <threads, memory>

Each memory cell supports four operations:

- Write<sup>i</sup>(L,v): thread i writes v to L
- Read<sup>i</sup>(L,v): thread i reads v from L
- LL<sup>i</sup>(L,v): thread i reads v from L and marks L as read by i
- SC<sup>i</sup>(L,v): thread i writes v to L
  - Returns *success* if L is marked as read by i.
  - Otherwise it returns *failure*.
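The four operations can be mimicked in plain Java to make the success and failure rule for SC concrete. This is a single-lock simulation, not real hardware LL/SC. The class name `LLSCCell` and its method names are hypothetical, and clearing every mark whenever the cell is written is an assumption based on the usual LL/SC convention (the slide only states that SC succeeds when L is marked as read by i).

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: one memory cell supporting Read, Write, LL and SC. */
class LLSCCell {
    private int value;
    // Threads that have load-linked this cell since it was last written.
    private final Set<Integer> markedBy = new HashSet<>();

    LLSCCell(int initial) {
        value = initial;
    }

    /** Read: thread i reads the current value. */
    synchronized int read(int thread) {
        return value;
    }

    /** Write: thread i stores v; assumed to invalidate all outstanding marks. */
    synchronized void write(int thread, int v) {
        value = v;
        markedBy.clear();
    }

    /** LL: thread i reads the value and marks the cell as read by i. */
    synchronized int ll(int thread) {
        markedBy.add(thread);
        return value;
    }

    /** SC: stores v and returns true only if the cell is still marked by i. */
    synchronized boolean sc(int thread, int v) {
        if (!markedBy.contains(thread)) {
            return false; // someone wrote the cell after our LL
        }
        value = v;
        markedBy.clear();
        return true;
    }
}
```

A typical use is a retry loop: `do { v = cell.ll(i); } while (!cell.sc(i, v + 1));` increments the cell atomically without holding a lock across the read and the write.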



# STM Design Overview









# Threads: Rec Objects

```
class Rec {
    boolean stable = false;
    boolean, int status = (false, 0); // can have two values...
    boolean allWritten = false;
    int version = 0;
    int size = 0;
    int locs[] = {null};
    int oldValues[] = {null};
}
```

Each thread has an instance of the Rec class (short for record).

The Rec instance defines the current transaction on that thread.

# Memory: STM Object

```
public class STM {
    int memory[];
    Rec ownerships[];

    public boolean, int[] startTransaction(Rec rec, int[] dataSet) {...};

    private void initialize(Rec rec, int[] dataSet) {...};
    private void transaction(Rec rec, int version, boolean isInitiator) {...};
    private void acquireOwnerships(Rec rec, int version) {...};
    private void releaseOwnerships(Rec rec, int version) {...};
    private void agreeOldValues(Rec rec, int version) {...};
    private void updateMemory(Rec rec, int version, int[] newvalues) {...};
}
```









































### Flow of a transaction



```
public boolean, int[] startTransaction(Rec rec, int[] dataSet) {
    initialize(rec, dataSet);
    rec.stable = true;
    transaction(rec, rec.version, true);
    rec.stable = false;
    rec.version++;
    if (rec.status) return (true, rec.oldValues);
    else return false;
}
```





`transaction` takes three parameters:

- rec: the thread that executes this transaction.
- version: serial number of the transaction.
- isInitiator: am I the initiating thread or the helper?

```
private void transaction(Rec rec, int version, boolean isInitiator) {
    acquireOwnerships(rec, version); // try to own locations
    (status, failedLoc) = LL(rec.status);
    ...
    else { // failed in acquireOwnerships
        releaseOwnerships(rec, version);
        if (isInitiator) {
            Rec failedTrans = ownerships[failedLoc];
            if (failedTrans == null) return;
            else { // execute the transaction that owns the location you want
                int failedVer = failedTrans.version;
                if (failedTrans.stable) transaction(failedTrans, failedVer, false);
            }
        }
    }
}
```

Another thread owns the locations I need and it hasn't finished its transaction yet, so I go out and execute its transaction in order to help it.

```
private void acquireOwnerships(Rec rec, int version) {
    for (int j = 1; j <= rec.size; j++) {
        while (true) {
            int loc = rec.locs[j];
            if (LL(rec.status) != null) return; // transaction completed by some other thread
            Rec owner = LL(ownerships[loc]);
            if (rec.version != version) return;
            if (owner == rec) break; // location is already mine
            if (owner == null) { // acquire location
                if (SC(rec.status, (null, 0))) {
                    if (SC(ownerships[loc], rec)) {
                        break;
                    }
                }
            }
            else { // location is taken by someone else
                if (SC(rec.status, (false, j))) return;
            }
        }
    }
}
```

If I'm not the last one to read this field, it means that another thread is trying to execute this transaction. Keep looping until I succeed or until the other thread completes the transaction.
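The core idea of the loop (claim every location in the data set, and report the first one already owned by another transaction) can be tried out in runnable Java. This sketch is a simplification under stated assumptions: Java has no LL/SC, so `AtomicReferenceArray.compareAndSet` is swapped in for the LL/SC pair, and the version check, the `status` field, and helping are all omitted. `OwnershipTable`, `acquireAll`, `releaseAll`, and `ownerOf` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Illustrative only: per-location ownership records claimed with CAS. */
class OwnershipTable {
    private final AtomicReferenceArray<Object> owners;

    OwnershipTable(int size) {
        owners = new AtomicReferenceArray<>(size); // all entries start as null (free)
    }

    /**
     * Tries to own every location in locs on behalf of tx.
     * Returns -1 on success, or the index j of the first location in locs
     * that is owned by someone else (after releasing what tx had claimed).
     */
    int acquireAll(Object tx, int[] locs) {
        for (int j = 0; j < locs.length; j++) {
            int loc = locs[j];
            boolean mine = owners.compareAndSet(loc, null, tx) // free: claim it
                    || owners.get(loc) == tx;                  // already mine
            if (!mine) {
                releaseAll(tx, locs);
                return j;
            }
        }
        return -1;
    }

    /** Frees every location in locs that tx currently owns. */
    void releaseAll(Object tx, int[] locs) {
        for (int loc : locs) {
            owners.compareAndSet(loc, tx, null);
        }
    }

    Object ownerOf(int loc) {
        return owners.get(loc);
    }
}
```

The returned index plays the role of `failedLoc` in the slide code: a caller could look up `ownerOf(locs[j])` to find the transaction it would need to help.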



# HTM vs. STM

| Hardware                               | Software                                 |
|----------------------------------------|------------------------------------------|
| Fast (due to hardware operations)      | Slow (due to software validation/commit) |
| Light code instrumentation             | Heavy code instrumentation               |
| HW buffers keep amount of metadata low | Lots of metadata                         |
| No middleware needed                   | Runtime library needed                   |
| Only short transactions allowed (why?) | Large transactions possible              |

How would you get the best of both?

# Hybrid-TM

- Best-effort HTM (use STM for long transactions)
- Possible conflicts between HW, SW, and HW-SW transactions
  - What kind of conflicts do SW transactions care about?
  - What kind of conflicts do HW transactions care about?
- Some initial proposals:
  - HyTM: uses an ownership record per memory location (overhead?)
  - PhTM: HTM-only or (heavy) STM-only, low instrumentation

### Questions?