- What's vertical scalability?
- Architectures we care about.
- Why do we even care about vertical vs. horizontal scalability?
  - the number of cores keeps growing
  - latency, latency, latency

Past bottlenecks:

- Heavyweight locks
  - dynamic "lock identities"
  - fair
  - shared lock table
- Fast-path locking - 9.2
  - most relation locks don't conflict with each other
  - if no conflicting lock exists, acquire the lock locally, per backend
  - no shared locks!
  - when acquiring a conflicting lock: check all fast-path owners

  2x E5-2676, pgbench read-only:

  nclients       fastpath          plain
         1   15767.613029   14648.321682
         2   30181.087726   31050.552320
         4   58022.919055   53580.015882
         8   99929.592542   79902.211657
        16  197863.674721  116520.584584
        32  402500.476521  125716.291930
        64  516190.013200  103557.810298
        96  524033.645406   98968.013220
       128  526929.427577   97300.116397
       196  518387.253132   94800.963682
       256  513310.256754   89605.783986

- XLOG insertion - 9.4
  - WAL is a sequential structure
  - buffered in memory
  - old way: insertions serialized behind an exclusive lock
  - new way: reserve space first, then fill it in in parallel

   1 -  52.711939
   8 - 286.496054
  16 - 346.113313
  24 - 363.242111

  nclients    old way     new way
         1  45.054616   44.896155
         2  61.825701   63.758291
         4  84.540911  101.886975
         8  86.992427  123.295212
        16  81.344399  142.994028
        32  82.789298  180.576711
        64  71.673193  186.595098
        96  60.401167  188.734743
       128  57.654713  183.889669
       196  50.175884  175.800850
       256  48.403708  175.850582

- LWLock scalability - 9.5
  - queued reader/writer lock
  - used for buffer locks, protecting hash tables, and a lot of other things
  - often used in shared mode
  - but the lock's state was itself protected by a spinlock => massive spinlock contention

    89.53%  postgres  postgres  [.] s_lock
     2.53%  postgres  postgres  [.] LWLockAcquire
     1.79%  postgres  postgres  [.] LWLockRelease
     0.63%  postgres  postgres  [.] hash_search_with_hash_value

  - now: protect the lock state using only atomics
  - complexities around queueing

       old code  new code
    1     11466     11395
    4     53846     53876
    8    102673    102040
   16    174818    176274
   32    293249    295961
   48    348542    377979
   64    217754    447015
   96    149011    461657
  128    135191    457799

- Buffer replacement - 9.5
  - used to be protected by a single lock
  - first made to use a spinlock in a granular way
  - then removed the lock entirely

Future problems:

- Extension lock
  - diagram
  - somewhat simple
- Snapshot computation
  - cost is linear in the number of connections
  - worse in practice due to cache-hierarchy effects
  - conflicts with commits
  - two different approaches discussed
- Cache replacement
- Buffer pinning
- Root pages