Why:
    - Speed
    - bottlenecked on branches, indirect calls
    - branch predictor overwhelmed
    - pure instruction count
How:
    - manually emit valid instructions, map/remap yourself
        - extremely fast to do
        - no optimization
        - a lot of work per-arch
    - use compiler framework
        - licensing
        - maturity
        - optimization
    - => LLVM
What:
    - Expression Evaluation
    - Tuple Deforming
    - Sorting
    - Whole executor
    - copy input / output?
Issues:
    - C-API
    - Speed of emission
    - Error Handling
Design:
    - mostly C API
    - shared library
    - simplistic planner integration in core
    - type syncing
Expression Eval:
    - emit base steps manually as IR
    - more complex operations are emitted as plain function calls
    - Issues:
      - too many pointers
      - Function call interface
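
The split between "base steps" and "function calls" can be illustrated with a small step-based evaluator. This is only a sketch in plain C, not the actual executor code: the names `ExprStep`, `StepOp`, `eval_steps`, `datum_abs` and `eval_jit_equivalent` are hypothetical, and `Datum` is simplified to a 64-bit integer. It evaluates `abs(col0 + 5)` once through an interpreted step array, and once as the straight-line code a JIT would ideally produce for the same expression.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t Datum;                 /* simplified: every value is an int64 */
typedef Datum (*ExprFn)(Datum arg);    /* "complex" operations stay function calls */

typedef enum { STEP_VAR, STEP_CONST, STEP_ADD, STEP_CALL, STEP_DONE } StepOp;

/* One step of a flattened expression (hypothetical layout).
 * Every step carries pointers to where its inputs and output live,
 * which is the "too many pointers" problem once these get baked into code. */
typedef struct ExprStep {
    StepOp op;
    Datum *result;
    union {
        int attnum;                                  /* STEP_VAR   */
        Datum constval;                              /* STEP_CONST */
        struct { Datum *left; Datum *right; } add;   /* STEP_ADD   */
        struct { ExprFn fn; Datum *arg; } call;      /* STEP_CALL  */
    } d;
} ExprStep;

/* Interpreter: one dispatch branch per step, plus an indirect call for STEP_CALL. */
static Datum eval_steps(const ExprStep *steps, const Datum *row)
{
    for (const ExprStep *s = steps;; s++) {
        switch (s->op) {
        case STEP_VAR:   *s->result = row[s->d.attnum]; break;
        case STEP_CONST: *s->result = s->d.constval; break;
        case STEP_ADD:   *s->result = *s->d.add.left + *s->d.add.right; break;
        case STEP_CALL:  *s->result = s->d.call.fn(*s->d.call.arg); break;
        case STEP_DONE:  return *s->result;
        default:         assert(!"unknown step opcode"); return 0;
        }
    }
}

static Datum datum_abs(Datum v) { return v < 0 ? -v : v; }

/* abs(col0 + 5), built as a step array and run through the interpreter. */
static Datum eval_demo(const Datum *row)
{
    Datum tmp[3];
    const ExprStep steps[] = {
        { .op = STEP_VAR,   .result = &tmp[0], .d.attnum = 0 },
        { .op = STEP_CONST, .result = &tmp[1], .d.constval = 5 },
        { .op = STEP_ADD,   .result = &tmp[2], .d.add = { &tmp[0], &tmp[1] } },
        { .op = STEP_CALL,  .result = &tmp[2], .d.call = { datum_abs, &tmp[2] } },
        { .op = STEP_DONE,  .result = &tmp[2] },
    };
    return eval_steps(steps, row);
}

/* What emitted code for the same expression would ideally boil down to:
 * base steps inlined, only the complex part left as a direct call. */
static Datum eval_jit_equivalent(const Datum *row)
{
    return datum_abs(row[0] + 5);
}
```

Both `eval_demo` and `eval_jit_equivalent` compute the same value for a given row; the second avoids the per-step dispatch and the indirect call, which is where the interpreted form spends its branches.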
Tuple deforming:
    - convert on-disk tuple to in-memory representation
    - requires alignment computations, column width checks, null bitmap checks
    - "naive" approach:
      - provide static tupledesc, rely on compiler to unroll loops, remove unnecessary branches.
      - slow-ish (requires a lot of optimization passes)
      - compiler heuristics don't always trigger
    - manually write out IR
      - remove alignment checks if unnecessary
      - remove NOT NULL checks if unnecessary
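
A rough sketch of what specialization buys for deforming, in plain C rather than IR. Everything here is an assumption made for illustration: the `AttrDesc` and `DiskTuple` layouts, the restriction to fixed-width 4- and 8-byte columns, and the convention that a set bit in the bitmap means "value present". The generic function does the null-bitmap, alignment and width work per column at run time; the second function is what could be emitted for a known descriptor of `(int4 NOT NULL, int8 NOT NULL)`, where all of that folds into constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor: fixed-width columns only. */
typedef struct {
    int  len;      /* width in bytes, 4 or 8 in this sketch */
    int  align;    /* required alignment, a power of two */
    bool notnull;  /* column declared NOT NULL */
} AttrDesc;

/* Hypothetical on-disk tuple: optional null bitmap plus packed column data. */
typedef struct {
    const uint8_t *nullbits;  /* bit set = value present; NULL if tuple has no nulls */
    const uint8_t *data;
} DiskTuple;

/* Generic deforming: every check is repeated for every column of every tuple. */
static void deform_generic(const AttrDesc *desc, int natts, const DiskTuple *tup,
                           int64_t *values, bool *isnull)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++) {
        /* null bitmap check, skipped only when the column cannot be NULL */
        if (!desc[i].notnull && tup->nullbits != NULL &&
            !(tup->nullbits[i >> 3] & (1 << (i & 7)))) {
            values[i] = 0;
            isnull[i] = true;
            continue;           /* NULL columns occupy no space in the data area */
        }

        /* alignment computation: round the offset up to the column's alignment */
        off = (off + (size_t) desc[i].align - 1) & ~((size_t) desc[i].align - 1);

        /* column width check */
        assert(desc[i].len == 4 || desc[i].len == 8);
        if (desc[i].len == 4) {
            int32_t v;
            memcpy(&v, tup->data + off, sizeof v);
            values[i] = v;
        } else {
            int64_t v;
            memcpy(&v, tup->data + off, sizeof v);
            values[i] = v;
        }
        isnull[i] = false;
        off += (size_t) desc[i].len;
    }
}

/* Specialized for (int4 NOT NULL, int8 NOT NULL): no loop, no bitmap checks,
 * and the offsets 0 and 8 are compile-time constants. */
static void deform_int4_int8_notnull(const DiskTuple *tup,
                                     int64_t *values, bool *isnull)
{
    int32_t a;
    int64_t b;

    memcpy(&a, tup->data + 0, sizeof a);
    memcpy(&b, tup->data + 8, sizeof b);   /* 4 rounded up to 8-byte alignment */
    values[0] = a;
    values[1] = b;
    isnull[0] = false;
    isnull[1] = false;
}
```

The "naive" approach corresponds to handing `deform_generic` a constant descriptor and hoping the compiler unrolls and folds it into something like the second function; writing the IR by hand produces that shape directly, without depending on optimization passes or heuristics.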
Future:
    - improve generated code
      - fewer pointer constants
      - improve function call interface
    - better planner integration
    - out-of-process optimization
    - in-process / shared caching