- Performance Problems
  - tuple deforming
  - expression evaluation
    - WHERE clauses
    - aggregates
    - ...
- Expression Evaluation
  - a lot faster in v10
    - no point describing older state
  - still a massive bottleneck for many use cases
    - when evaluating expressions for many rows
    - instantiation (expression setup) overhead
  - minor details can still be optimized with "plain techniques", but larger gains are out of reach that way
  - biggest problem: jumps, branches, function calls
    - switch (op->opcode)
      {
          case EEOP_CONST:
          {
              /* constant: copy the precomputed value and null flag */
              *op->resnull = op->d.constval.isnull;
              *op->resvalue = op->d.constval.value;

              op++;
              continue;
          }
          case EEOP_FUNCEXPR:
          {
              FunctionCallInfo fcinfo = op->d.func.fcinfo_data;

              /* indirect call through a function pointer */
              fcinfo->isnull = false;
              *op->resvalue = (op->d.func.fn_addr) (fcinfo);
              *op->resnull = fcinfo->isnull;

              op++;
              continue;
          }
          case EEOP_BOOL_OR_STEP_LAST:
          {
              if (*op->resnull)
              {
                  /* result is already set to NULL, need not change it */
              }
              else if (DatumGetBool(*op->resvalue))
              {
                  /* result is already set to TRUE, need not change it */

                  /*
                   * No point jumping to jumpdone - would be same target (as
                   * this is the last argument to the OR expression), except
                   * more expensive.
                   */
              }
              else if (*op->d.boolexpr.anynull)
              {
                  *op->resvalue = (Datum) 0;
                  *op->resnull = true;
              }
              else
              {
                  /* result is already set to FALSE, need not change it */
              }

              op++;
              continue;
          }
      }
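
To make the cost of this dispatch pattern concrete, here is a minimal, self-contained sketch of a switch-based step interpreter. It is not PostgreSQL code: the `OP_*` opcodes, the `Step` struct, `interp_eval` and `demo_steps` are all hypothetical names invented for illustration, and the value stack stands in for the `resvalue`/`resnull` pointers used by the real steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; these are NOT PostgreSQL's EEOP_* steps. */
typedef enum { OP_CONST, OP_ADD, OP_GT, OP_DONE } OpCode;

typedef struct
{
    OpCode opcode;
    long   constval;            /* only used by OP_CONST */
} Step;

/* Evaluate a postfix step program using a small value stack. */
static long
interp_eval(const Step *op)
{
    long   stack[8];
    size_t sp = 0;

    for (;;)
    {
        /* one hard-to-predict jump per step, however trivial the step is */
        switch (op->opcode)
        {
            case OP_CONST:
                assert(sp < 8);
                stack[sp++] = op->constval;
                op++;
                continue;
            case OP_ADD:
                assert(sp >= 2);
                sp--;
                stack[sp - 1] += stack[sp];
                op++;
                continue;
            case OP_GT:
                assert(sp >= 2);
                sp--;
                stack[sp - 1] = stack[sp - 1] > stack[sp];
                op++;
                continue;
            case OP_DONE:
                assert(sp == 1);
                return stack[0];
            default:
                assert(0 && "unknown opcode");
                return 0;
        }
    }
}

/* Steps for the expression (3 + 4) > 5 */
static const Step demo_steps[] = {
    {OP_CONST, 3}, {OP_CONST, 4}, {OP_ADD, 0},
    {OP_CONST, 5}, {OP_GT, 0}, {OP_DONE, 0}
};
```

Calling `interp_eval(demo_steps)` walks six steps, and so takes six trips through the `switch`, to compute what a compiler would reduce to a single comparison. That gap is the overhead the JIT section below is aimed at.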
- JIT
  - convert some "interpreted code" into natively executed code, either for frequently executed functions or on first call
  - usually a *lot* more expensive for a single execution, but the resulting code is ~5-10x faster
  - in the context of postgres, when to JIT:
    - after repeated execution of a function
      - reliable
      - but slows execution down unnecessarily
    - when the plan is expensive
      - wrong cost estimates lead to missed optimizations
  - LLVM - compiler framework
    - "intermediate language"
    - converted to native code
  - Difficulties:
    - memory management & error handling
    - API stability
    - LLVM limitations
  - 0-60% faster
    - other bottlenecks
    - JITing limitations
  - further improvement:
    - inline functions / operators
      - use clang to generate IR for functions
      - 0-180%
    - cache generated code
      - reduce overhead / use more JIT
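
The trade-off noted above (code generation is expensive once, but the generated code is cheaper per execution) can be sketched as simple break-even arithmetic. This is an illustrative model only: `JitCostModel`, `jit_pays_off` and `jit_break_even_rows` are hypothetical names, the cost units are arbitrary, and none of it reflects how postgres actually decides whether to JIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost model; all numbers fed into it are assumptions. */
typedef struct
{
    double compile_cost;    /* one-time cost of generating native code */
    double interp_per_row;  /* per-row cost of interpreted evaluation */
    double jit_per_row;     /* per-row cost of the generated code */
} JitCostModel;

/* True if JIT compilation is cheaper overall for the given row count. */
static bool
jit_pays_off(const JitCostModel *m, double rows)
{
    double interp_total;
    double jit_total;

    assert(rows >= 0);
    interp_total = rows * m->interp_per_row;
    jit_total = m->compile_cost + rows * m->jit_per_row;
    return jit_total < interp_total;
}

/* Row count at which both strategies cost the same; -1 if JIT never wins. */
static double
jit_break_even_rows(const JitCostModel *m)
{
    double saving = m->interp_per_row - m->jit_per_row;

    return saving > 0 ? m->compile_cost / saving : -1.0;
}
```

With an assumed compile cost of 1000 units and per-row costs of 1.0 (interpreted) versus 0.5 (generated), the break-even point is 2000 rows. This is also why caching generated code matters: it lowers the effective `compile_cost`, which lets JIT pay off for smaller queries.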