LCOV - differential code coverage report
Current view: top level - src/include/storage - checksum_impl.h (source / functions) Coverage Total Hit UBC CBC
Current: Differential Code Coverage 16@8cea358b128 vs 17@8cea358b128 Lines: 100.0 % 21 21 21
Current Date: 2024-04-14 14:21:10 Functions: 100.0 % 2 2 2
Baseline: 16@8cea358b128 Branches: 91.7 % 12 11 1 11
Baseline Date: 2024-04-14 14:21:09 Line coverage date bins:
Legend: Lines: hit not hit | Branches: + taken - not taken # not executed (240..) days: 100.0 % 21 21 21
Function coverage date bins:
(240..) days: 100.0 % 2 2 2
Branch coverage date bins:
(240..) days: 91.7 % 12 11 1 11

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * checksum_impl.h
                                  4                 :                :  *    Checksum implementation for data pages.
                                  5                 :                :  *
                                  6                 :                :  * This file exists for the benefit of external programs that may wish to
                                  7                 :                :  * check Postgres page checksums.  They can #include this to get the code
                                  8                 :                :  * referenced by storage/checksum.h.  (Note: you may need to redefine
                                  9                 :                :  * Assert() as empty to compile this successfully externally.)
                                 10                 :                :  *
                                 11                 :                :  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
                                 12                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 13                 :                :  *
                                 14                 :                :  * src/include/storage/checksum_impl.h
                                 15                 :                :  *
                                 16                 :                :  *-------------------------------------------------------------------------
                                 17                 :                :  */
                                 18                 :                : 
                                 19                 :                : /*
                                 20                 :                :  * The algorithm used to checksum pages is chosen for very fast calculation.
                                 21                 :                :  * Workloads where the database working set fits into OS file cache but not
                                 22                 :                :  * into shared buffers can read in pages at a very fast pace and the checksum
                                 23                 :                :  * algorithm itself can become the largest bottleneck.
                                 24                 :                :  *
                                 25                 :                :  * The checksum algorithm itself is based on the FNV-1a hash (FNV is shorthand
                                 26                 :                :  * for Fowler/Noll/Vo).  The primitive of a plain FNV-1a hash folds in data 1
                                 27                 :                :  * byte at a time according to the formula:
                                 28                 :                :  *
                                 29                 :                :  *     hash = (hash ^ value) * FNV_PRIME
                                 30                 :                :  *
                                 31                 :                :  * FNV-1a algorithm is described at http://www.isthe.com/chongo/tech/comp/fnv/
                                 32                 :                :  *
                                 33                 :                :  * PostgreSQL doesn't use FNV-1a hash directly because it has bad mixing of
                                 34                 :                :  * high bits - high order bits in input data only affect high order bits in
                                 35                 :                :  * output data. To resolve this we xor in the value prior to multiplication
                                 36                 :                :  * shifted right by 17 bits. The number 17 was chosen because it doesn't
                                 37                 :                :  * have common denominator with set bit positions in FNV_PRIME and empirically
                                 38                 :                :  * provides the fastest mixing for high order bits of final iterations quickly
                                 39                 :                :  * avalanche into lower positions. For performance reasons we choose to combine
                                 40                 :                :  * 4 bytes at a time. The actual hash formula used as the basis is:
                                 41                 :                :  *
                                 42                 :                :  *     hash = (hash ^ value) * FNV_PRIME ^ ((hash ^ value) >> 17)
                                 43                 :                :  *
                                 44                 :                :  * The main bottleneck in this calculation is the multiplication latency. To
                                 45                 :                :  * hide the latency and to make use of SIMD parallelism multiple hash values
                                 46                 :                :  * are calculated in parallel. The page is treated as a 32 column two
                                 47                 :                :  * dimensional array of 32 bit values. Each column is aggregated separately
                                 48                 :                :  * into a partial checksum. Each partial checksum uses a different initial
                                 49                 :                :  * value (offset basis in FNV terminology). The initial values actually used
                                 50                 :                :  * were chosen randomly, as the values themselves don't matter as much as that
                                 51                 :                :  * they are different and don't match anything in real data. After initializing
                                 52                 :                :  * partial checksums each value in the column is aggregated according to the
                                 53                 :                :  * above formula. Finally two more iterations of the formula are performed with
                                 54                 :                :  * value 0 to mix the bits of the last value added.
                                 55                 :                :  *
                                 56                 :                :  * The partial checksums are then folded together using xor to form a single
                                 57                 :                :  * 32-bit checksum. The caller can safely reduce the value to 16 bits
                                 58                 :                :  * using modulo 2^16-1. That will cause a very slight bias towards lower
                                 59                 :                :  * values but this is not significant for the performance of the
                                 60                 :                :  * checksum.
                                 61                 :                :  *
                                 62                 :                :  * The algorithm choice was based on what instructions are available in SIMD
                                 63                 :                :  * instruction sets. This meant that a fast and good algorithm needed to use
                                 64                 :                :  * multiplication as the main mixing operator. The simplest multiplication
                                 65                 :                :  * based checksum primitive is the one used by FNV. The prime used is chosen
                                 66                 :                :  * for good dispersion of values. It has no known simple patterns that result
                                 67                 :                :  * in collisions. Test of 5-bit differentials of the primitive over 64bit keys
                                 68                 :                :  * reveals no differentials with 3 or more values out of 100000 random keys
                                 69                 :                :  * colliding. Avalanche test shows that only high order bits of the last word
                                 70                 :                :  * have a bias. Tests of 1-4 uncorrelated bit errors, stray 0 and 0xFF bytes,
                                 71                 :                :  * overwriting page from random position to end with 0 bytes, and overwriting
                                 72                 :                :  * random segments of page with 0x00, 0xFF and random data all show optimal
                                 73                 :                :  * 2e-16 false positive rate within margin of error.
                                 74                 :                :  *
                                 75                 :                :  * Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
                                 76                 :                :  * multiplication instruction. As of 2013 the corresponding instruction is
                                 77                 :                :  * available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
                                 78                 :                :  * Vectorization requires a compiler to do the vectorization for us. For recent
                                 79                 :                :  * GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
                                 80                 :                :  * to achieve vectorization.
                                 81                 :                :  *
                                 82                 :                :  * The optimal amount of parallelism to use depends on CPU specific instruction
                                 83                 :                :  * latency, SIMD instruction width, throughput and the amount of registers
                                 84                 :                :  * available to hold intermediate state. Generally, more parallelism is better
                                 85                 :                :  * up to the point that state doesn't fit in registers and extra load-store
                                 86                 :                :  * instructions are needed to swap values in/out. The number chosen is a fixed
                                 87                 :                :  * part of the algorithm because changing the parallelism changes the checksum
                                 88                 :                :  * result.
                                 89                 :                :  *
                                 90                 :                :  * The parallelism number 32 was chosen based on the fact that it is the
                                 91                 :                :  * largest state that fits into architecturally visible x86 SSE registers while
                                 92                 :                :  * leaving some free registers for intermediate values. For future processors
                                 93                 :                :  * with 256bit vector registers this will leave some performance on the table.
                                 94                 :                :  * When vectorization is not available it might be beneficial to restructure
                                 95                 :                :  * the computation to calculate a subset of the columns at a time and perform
                                 96                 :                :  * multiple passes to avoid register spilling. This optimization opportunity
                                 97                 :                :  * is not used. Current coding also assumes that the compiler has the ability
                                 98                 :                :  * to unroll the inner loop to avoid loop overhead and minimize register
                                 99                 :                :  * spilling. For less sophisticated compilers it might be beneficial to
                                100                 :                :  * manually unroll the inner loop.
                                101                 :                :  */
                                102                 :                : 
                                103                 :                : #include "storage/bufpage.h"
                                104                 :                : 
                                105                 :                : /* number of checksums to calculate in parallel */
                                106                 :                : #define N_SUMS 32
                                107                 :                : /* prime multiplier of FNV-1a hash */
                                108                 :                : #define FNV_PRIME 16777619
                                109                 :                : 
                                110                 :                : /* Use a union so that this code is valid under strict aliasing */
                                111                 :                : typedef union
                                112                 :                : {
                                113                 :                :     PageHeaderData phdr;
                                114                 :                :     uint32      data[BLCKSZ / (sizeof(uint32) * N_SUMS)][N_SUMS];
                                115                 :                : } PGChecksummablePage;
                                116                 :                : 
                                117                 :                : /*
                                118                 :                :  * Base offsets to initialize each of the parallel FNV hashes into a
                                119                 :                :  * different initial state.
                                120                 :                :  */
                                121                 :                : static const uint32 checksumBaseOffsets[N_SUMS] = {
                                122                 :                :     0x5B1F36E9, 0xB8525960, 0x02AB50AA, 0x1DE66D2A,
                                123                 :                :     0x79FF467A, 0x9BB9F8A3, 0x217E7CD2, 0x83E13D2C,
                                124                 :                :     0xF8D4474F, 0xE39EB970, 0x42C6AE16, 0x993216FA,
                                125                 :                :     0x7B093B5D, 0x98DAFF3C, 0xF718902A, 0x0B1C9CDB,
                                126                 :                :     0xE58F764B, 0x187636BC, 0x5D7B3BB1, 0xE73DE7DE,
                                127                 :                :     0x92BEC979, 0xCCA6C0B2, 0x304A0979, 0x85AA43D4,
                                128                 :                :     0x783125BB, 0x6CA8EAA2, 0xE407EAC6, 0x4B5CFC3E,
                                129                 :                :     0x9FBF8C76, 0x15CA20BE, 0xF2CA9FD3, 0x959BD756
                                130                 :                : };
                                131                 :                : 
                                132                 :                : /*
                                133                 :                :  * Calculate one round of the checksum.
                                134                 :                :  */
                                135                 :                : #define CHECKSUM_COMP(checksum, value) \
                                136                 :                : do { \
                                137                 :                :     uint32 __tmp = (checksum) ^ (value); \
                                138                 :                :     (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17); \
                                139                 :                : } while (0)
                                140                 :                : 
                                141                 :                : /*
                                142                 :                :  * Block checksum algorithm.  The page must be adequately aligned
                                143                 :                :  * (at least on 4-byte boundary).
                                144                 :                :  */
                                145                 :                : static uint32
 2053 tgl@sss.pgh.pa.us         146                 :CBC      116980 : pg_checksum_block(const PGChecksummablePage *page)
                                147                 :                : {
                                148                 :                :     uint32      sums[N_SUMS];
 3958                           149                 :         116980 :     uint32      result = 0;
                                150                 :                :     uint32      i,
                                151                 :                :                 j;
                                152                 :                : 
                                153                 :                :     /* ensure that the size is compatible with the algorithm */
                                154                 :                :     Assert(sizeof(PGChecksummablePage) == BLCKSZ);
                                155                 :                : 
                                156                 :                :     /* initialize partial checksums to their corresponding offsets */
                                157                 :         116980 :     memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));
                                158                 :                : 
                                159                 :                :     /* main checksum calculation */
 2053                           160         [ +  + ]:        7603700 :     for (i = 0; i < (uint32) (BLCKSZ / (sizeof(uint32) * N_SUMS)); i++)
 3958                           161         [ +  + ]:      247061760 :         for (j = 0; j < N_SUMS; j++)
 2053                           162                 :      239575040 :             CHECKSUM_COMP(sums[j], page->data[i][j]);
                                163                 :                : 
                                164                 :                :     /* finally add in two rounds of zeroes for additional mixing */
 3958                           165         [ +  + ]:         350940 :     for (i = 0; i < 2; i++)
                                166         [ +  + ]:        7720680 :         for (j = 0; j < N_SUMS; j++)
                                167                 :        7486720 :             CHECKSUM_COMP(sums[j], 0);
                                168                 :                : 
                                169                 :                :     /* xor fold partial checksums together */
                                170         [ +  + ]:        3860340 :     for (i = 0; i < N_SUMS; i++)
                                171                 :        3743360 :         result ^= sums[i];
                                172                 :                : 
                                173                 :         116980 :     return result;
                                174                 :                : }
                                175                 :                : 
                                176                 :                : /*
                                177                 :                :  * Compute the checksum for a Postgres page.
                                178                 :                :  *
                                179                 :                :  * The page must be adequately aligned (at least on a 4-byte boundary).
                                180                 :                :  * Beware also that the checksum field of the page is transiently zeroed.
                                181                 :                :  *
                                182                 :                :  * The checksum includes the block number (to detect the case where a page is
                                183                 :                :  * somehow moved to a different location), the page header (excluding the
                                184                 :                :  * checksum itself), and the page data.
                                185                 :                :  */
                                186                 :                : uint16
                                187                 :         116980 : pg_checksum_page(char *page, BlockNumber blkno)
                                188                 :                : {
 2053                           189                 :         116980 :     PGChecksummablePage *cpage = (PGChecksummablePage *) page;
                                190                 :                :     uint16      save_checksum;
                                191                 :                :     uint32      checksum;
                                192                 :                : 
                                193                 :                :     /* We only calculate the checksum for properly-initialized pages */
  643 peter@eisentraut.org      194         [ -  + ]:         116980 :     Assert(!PageIsNew((Page) page));
                                195                 :                : 
                                196                 :                :     /*
                                197                 :                :      * Save pd_checksum and temporarily set it to zero, so that the checksum
                                198                 :                :      * calculation isn't affected by the old checksum stored on the page.
                                199                 :                :      * Restore it after, because actually updating the checksum is NOT part of
                                200                 :                :      * the API of this function.
                                201                 :                :      */
 2053 tgl@sss.pgh.pa.us         202                 :         116980 :     save_checksum = cpage->phdr.pd_checksum;
                                203                 :         116980 :     cpage->phdr.pd_checksum = 0;
                                204                 :         116980 :     checksum = pg_checksum_block(cpage);
                                205                 :         116980 :     cpage->phdr.pd_checksum = save_checksum;
                                206                 :                : 
                                207                 :                :     /* Mix in the block number to detect transposed pages */
 3958                           208                 :         116980 :     checksum ^= blkno;
                                209                 :                : 
                                210                 :                :     /*
                                211                 :                :      * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
                                212                 :                :      * one. That avoids checksums of zero, which seems like a good idea.
                                213                 :                :      */
 1500 michael@paquier.xyz       214                 :         116980 :     return (uint16) ((checksum % 65535) + 1);
                                215                 :                : }
        

Generated by: LCOV version 2.1-beta2-3-g6141622