Why:
- Speed
  - bottlenecked on branches, indirect calls
  - branch predictor overwhelmed
  - pure instruction count

How:
- manually emit valid instructions, map/remap yourself
  - extremely fast to do
  - no optimization
  - a lot of work per-arch
- use compiler framework
  - licensing
  - maturity
  - optimization
  - => LLVM

What:
- Expression Evaluation
- Tuple Deforming
- sorting
- Whole of Executor
- copy input / output?

Issues:
- C-API
- Speed of emission and error handling
- Error Handling

Design:
- mostly C API
- shared library
- simplistic planner integration in core
- type syncing

Expression Eval:
- emit base steps manually as IR
- more complex stuff just function calls
- Issues:
  - too many pointers
  - Function call interface

Tuple deforming:
- convert on-disk tuple to in-memory representation
- requires alignment computations, column width checks, null bitmap checks
- "naive" approach:
  - provide static tupledesc, rely on compiler to unroll loops, remove unnecessary branches
  - slow-ish (requires a lot of optimization passes)
  - compiler heuristics don't always trigger
- manually write out IR
  - remove alignment checks if unnecessary
  - remove NOT NULL checks if unnecessary

Future:
- improve generated code
  - fewer pointer constants
  - improve function call interface
- better planner integration
- off-process optimization
- in-process / shared-process caching
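
Sketch (tuple deforming):
A minimal C illustration of the tuple-deforming point above, not the actual implementation. All names here (ColumnDesc, deform_generic, deform_int4_int8_notnull) are hypothetical, and the tuple layout is simplified to fixed-width columns. The generic routine pays for a null-bitmap test, an alignment computation and a width check on every column. The specialized routine is a hand-written stand-in for what emitted IR could compute once the layout (int4, int8, both NOT NULL) is known: offsets become constants and the null checks disappear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified column descriptor (assumption for this sketch). */
typedef struct {
    size_t width;    /* fixed width in bytes; only 4 and 8 are handled here */
    size_t align;    /* required alignment, must be a power of two */
    bool   not_null; /* column is declared NOT NULL */
} ColumnDesc;

/* Round off up to the next multiple of align (align is a power of two). */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Generic deforming: per column, test the null bitmap, compute the
 * aligned offset and branch on the column width.
 * nullbits may be NULL (tuple has no NULLs); otherwise a set bit means
 * "value present". NULL columns occupy no bytes in data.
 */
static void deform_generic(const ColumnDesc *desc, int natts,
                           const uint8_t *nullbits, const uint8_t *data,
                           int64_t *values, bool *isnull)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++) {
        if (nullbits != NULL && !desc[i].not_null &&
            !(nullbits[i >> 3] & (1 << (i & 7)))) {
            values[i] = 0;
            isnull[i] = true;
            continue;
        }

        off = align_up(off, desc[i].align);
        if (desc[i].width == 4) {
            int32_t v;
            memcpy(&v, data + off, sizeof v);
            values[i] = v;
        } else {
            int64_t v;
            assert(desc[i].width == 8); /* other widths not handled in this sketch */
            memcpy(&v, data + off, sizeof v);
            values[i] = v;
        }
        isnull[i] = false;
        off += desc[i].width;
    }
}

/*
 * Specialized for the layout (int4 NOT NULL, int8 NOT NULL):
 * no loop, no null-bitmap test, offsets 0 and 8 folded to constants.
 */
static void deform_int4_int8_notnull(const uint8_t *data,
                                     int64_t *values, bool *isnull)
{
    int32_t a;
    int64_t b;

    memcpy(&a, data + 0, sizeof a);
    memcpy(&b, data + 8, sizeof b); /* 4 rounded up to 8-byte alignment */
    values[0] = a;
    values[1] = b;
    isnull[0] = false;
    isnull[1] = false;
}
```

Both routines should produce the same values for a tuple with that layout. The difference is only in how much per-column work is done at run time.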