LCOV - differential code coverage report
Current view: top level - src/backend/access/table - tableam.c (source / functions)
Current: Differential Code Coverage 16@8cea358b128 vs 17@8cea358b128
Current Date: 2024-04-14 14:21:10
Baseline: 16@8cea358b128
Baseline Date: 2024-04-14 14:21:09
Legend: Lines: hit, not hit | Branches: + taken, - not taken, # not executed

                 Coverage Total   Hit   UBC   GNC   CBC   DCB
Lines:             89.4 %   180   161    19     2   159     7
Functions:        100.0 %    18    18           1    17     1
Branches:          64.4 %   118    76    42     2    74

Line coverage date bins:
  [..60] days:    100.0 %     4     4           1     3
  (240..) days:    89.2 %   176   157    19     1   156
Function coverage date bins:
  [..60] days:    100.0 %     2     2                 2
  (240..) days:   100.0 %    16    16           1    15
Branch coverage date bins:
  [..60] days:    100.0 %     2     2           2
  (240..) days:    63.8 %   116    74    42          74

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*----------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * tableam.c
                                  4                 :                :  *      Table access method routines too big to be inline functions.
                                  5                 :                :  *
                                  6                 :                :  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
                                  7                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                  8                 :                :  *
                                  9                 :                :  *
                                 10                 :                :  * IDENTIFICATION
                                 11                 :                :  *    src/backend/access/table/tableam.c
                                 12                 :                :  *
                                 13                 :                :  * NOTES
                                 14                 :                :  *    Note that most functions in here are documented in tableam.h, rather than
                                 15                 :                :  *    here. That's because there are a lot of inline functions in tableam.h and
                                 16                 :                :  *    it'd be harder to understand if one constantly had to switch between files.
                                 17                 :                :  *
                                 18                 :                :  *----------------------------------------------------------------------
                                 19                 :                :  */
                                 20                 :                : #include "postgres.h"
                                 21                 :                : 
                                 22                 :                : #include <math.h>
                                 23                 :                : 
                                 24                 :                : #include "access/syncscan.h"
                                 25                 :                : #include "access/tableam.h"
                                 26                 :                : #include "access/xact.h"
                                 27                 :                : #include "optimizer/plancat.h"
                                 28                 :                : #include "port/pg_bitutils.h"
                                 29                 :                : #include "storage/bufmgr.h"
                                 30                 :                : #include "storage/shmem.h"
                                 31                 :                : #include "storage/smgr.h"
                                 32                 :                : 
                                 33                 :                : /*
                                 34                 :                :  * Constants to control the behavior of block allocation to parallel workers
                                 35                 :                :  * during a parallel seqscan.  Technically these values do not need to be
                                 36                 :                :  * powers of 2, but having them as powers of 2 makes the math more optimal
                                 37                 :                :  * and makes the ramp-down stepping more even.
                                 38                 :                :  */
                                 39                 :                : 
                                 40                 :                : /* The number of I/O chunks we try to break a parallel seqscan down into */
                                 41                 :                : #define PARALLEL_SEQSCAN_NCHUNKS            2048
                                 42                 :                : /* Ramp down size of allocations when we've only this number of chunks left */
                                 43                 :                : #define PARALLEL_SEQSCAN_RAMPDOWN_CHUNKS    64
                                 44                 :                : /* Cap the size of parallel I/O chunks to this number of blocks */
                                 45                 :                : #define PARALLEL_SEQSCAN_MAX_CHUNK_SIZE     8192
                                 46                 :                : 
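
The three constants above can be read together as a sizing rule. The following is a minimal sketch of how an initial chunk size might be derived from them; it is not taken from this listing. It assumes the relation's block count is divided by PARALLEL_SEQSCAN_NCHUNKS, rounded up to a power of 2, and capped at PARALLEL_SEQSCAN_MAX_CHUNK_SIZE. The helpers next_power_of_2 and initial_chunk_size are hypothetical names introduced only for illustration, and the ramp-down step governed by PARALLEL_SEQSCAN_RAMPDOWN_CHUNKS is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PARALLEL_SEQSCAN_NCHUNKS            2048
#define PARALLEL_SEQSCAN_MAX_CHUNK_SIZE     8192

/*
 * Hypothetical helper: smallest power of 2 that is >= v.
 * Only valid for 1 <= v <= 2^31; larger inputs would overflow.
 */
static uint32_t
next_power_of_2(uint32_t v)
{
    uint32_t    p = 1;

    assert(v >= 1 && v <= UINT32_C(0x80000000));
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Hypothetical sketch (an assumption, not the code in this file):
 * aim for roughly PARALLEL_SEQSCAN_NCHUNKS chunks per relation,
 * round the chunk size up to a power of 2, then apply the cap.
 */
static uint32_t
initial_chunk_size(uint32_t nblocks)
{
    uint32_t    target = nblocks / PARALLEL_SEQSCAN_NCHUNKS;
    uint32_t    chunk;

    /* Small relations still get at least one block per chunk. */
    if (target < 1)
        target = 1;

    chunk = next_power_of_2(target);

    /* Never hand out more than the configured maximum at once. */
    if (chunk > PARALLEL_SEQSCAN_MAX_CHUNK_SIZE)
        chunk = PARALLEL_SEQSCAN_MAX_CHUNK_SIZE;

    return chunk;
}
```

Under these assumptions, a relation of 2,048,000 blocks has a target of 1000 blocks per chunk, which rounds up to 1024. A relation of 100 blocks falls back to single-block chunks, and very large relations are held at 8192 blocks per chunk.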
                                 47                 :                : /* GUC variables */
                                 48                 :                : char       *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
                                 49                 :                : bool        synchronize_seqscans = true;
                                 50                 :                : 
                                 51                 :                : 
                                 52                 :                : /* ----------------------------------------------------------------------------
                                 53                 :                :  * Slot functions.
                                 54                 :                :  * ----------------------------------------------------------------------------
                                 55                 :                :  */
                                 56                 :                : 
                                 57                 :                : const TupleTableSlotOps *
 1861 andres@anarazel.de         58                 :CBC    12542720 : table_slot_callbacks(Relation relation)
                                 59                 :                : {
                                 60                 :                :     const TupleTableSlotOps *tts_cb;
                                 61                 :                : 
                                 62         [ +  + ]:       12542720 :     if (relation->rd_tableam)
                                 63                 :       12538831 :         tts_cb = relation->rd_tableam->slot_callbacks(relation);
                                 64         [ +  + ]:           3889 :     else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
                                 65                 :                :     {
                                 66                 :                :         /*
                                 67                 :                :          * Historically FDWs expect to store heap tuples in slots. Continue
                                 68                 :                :          * handing them one, to make it less painful to adapt FDWs to new
                                 69                 :                :          * versions. The cost of a heap slot over a virtual slot is pretty
                                 70                 :                :          * small.
                                 71                 :                :          */
                                 72                 :            207 :         tts_cb = &TTSOpsHeapTuple;
                                 73                 :                :     }
                                 74                 :                :     else
                                 75                 :                :     {
                                 76                 :                :         /*
                                 77                 :                :          * These need to be supported, as some parts of the code (like COPY)
                                 78                 :                :          * need to create slots for such relations too. It seems better to
                                 79                 :                :          * centralize the knowledge that a heap slot is the right thing in
                                 80                 :                :          * that case here.
                                 81                 :                :          */
                                 82   [ +  +  -  + ]:           3682 :         Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
                                 83                 :                :                relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
                                 84                 :           3682 :         tts_cb = &TTSOpsVirtual;
                                 85                 :                :     }
                                 86                 :                : 
                                 87                 :       12542720 :     return tts_cb;
                                 88                 :                : }
                                 89                 :                : 
                                 90                 :                : TupleTableSlot *
                                 91                 :       12345074 : table_slot_create(Relation relation, List **reglist)
                                 92                 :                : {
                                 93                 :                :     const TupleTableSlotOps *tts_cb;
                                 94                 :                :     TupleTableSlot *slot;
                                 95                 :                : 
                                 96                 :       12345074 :     tts_cb = table_slot_callbacks(relation);
                                 97                 :       12345074 :     slot = MakeSingleTupleTableSlot(RelationGetDescr(relation), tts_cb);
                                 98                 :                : 
                                 99         [ +  + ]:       12345074 :     if (reglist)
                                100                 :         132968 :         *reglist = lappend(*reglist, slot);
                                101                 :                : 
                                102                 :       12345074 :     return slot;
                                103                 :                : }
                                104                 :                : 
                                105                 :                : 
                                106                 :                : /* ----------------------------------------------------------------------------
                                107                 :                :  * Table scan functions.
                                108                 :                :  * ----------------------------------------------------------------------------
                                109                 :                :  */
                                110                 :                : 
                                111                 :                : TableScanDesc
                                112                 :          43165 : table_beginscan_catalog(Relation relation, int nkeys, struct ScanKeyData *key)
                                113                 :                : {
 1792                           114                 :          43165 :     uint32      flags = SO_TYPE_SEQSCAN |
                                115                 :                :         SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | SO_TEMP_SNAPSHOT;
 1861                           116                 :          43165 :     Oid         relid = RelationGetRelid(relation);
                                117                 :          43165 :     Snapshot    snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
                                118                 :                : 
 1842                           119                 :          43165 :     return relation->rd_tableam->scan_begin(relation, snapshot, nkeys, key,
                                120                 :                :                                             NULL, flags);
                                121                 :                : }
                                122                 :                : 
                                123                 :                : 
                                124                 :                : /* ----------------------------------------------------------------------------
                                125                 :                :  * Parallel table scan related functions.
                                126                 :                :  * ----------------------------------------------------------------------------
                                127                 :                :  */
                                128                 :                : 
                                129                 :                : Size
 1861                           130                 :            523 : table_parallelscan_estimate(Relation rel, Snapshot snapshot)
                                131                 :                : {
                                132                 :            523 :     Size        sz = 0;
                                133                 :                : 
                                134   [ +  +  -  + ]:            523 :     if (IsMVCCSnapshot(snapshot))
                                135                 :            450 :         sz = add_size(sz, EstimateSnapshotSpace(snapshot));
                                136                 :                :     else
                                137         [ -  + ]:             73 :         Assert(snapshot == SnapshotAny);
                                138                 :                : 
                                139                 :            523 :     sz = add_size(sz, rel->rd_tableam->parallelscan_estimate(rel));
                                140                 :                : 
                                141                 :            523 :     return sz;
                                142                 :                : }
                                143                 :                : 
                                144                 :                : void
                                145                 :            523 : table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
                                146                 :                :                               Snapshot snapshot)
                                147                 :                : {
                                148                 :            523 :     Size        snapshot_off = rel->rd_tableam->parallelscan_initialize(rel, pscan);
                                149                 :                : 
                                150                 :            523 :     pscan->phs_snapshot_off = snapshot_off;
                                151                 :                : 
                                152   [ +  +  -  + ]:            523 :     if (IsMVCCSnapshot(snapshot))
                                153                 :                :     {
                                154                 :            450 :         SerializeSnapshot(snapshot, (char *) pscan + pscan->phs_snapshot_off);
                                155                 :            450 :         pscan->phs_snapshot_any = false;
                                156                 :                :     }
                                157                 :                :     else
                                158                 :                :     {
                                159         [ -  + ]:             73 :         Assert(snapshot == SnapshotAny);
                                160                 :             73 :         pscan->phs_snapshot_any = true;
                                161                 :                :     }
                                162                 :            523 : }
                                163                 :                : 
                                164                 :                : TableScanDesc
  573 pg@bowt.ie                165                 :           1929 : table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
                                166                 :                : {
                                167                 :                :     Snapshot    snapshot;
 1792 andres@anarazel.de        168                 :           1929 :     uint32      flags = SO_TYPE_SEQSCAN |
                                169                 :                :         SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
                                170                 :                : 
  573 pg@bowt.ie                171         [ -  + ]:           1929 :     Assert(RelationGetRelid(relation) == pscan->phs_relid);
                                172                 :                : 
                                173         [ +  + ]:           1929 :     if (!pscan->phs_snapshot_any)
                                174                 :                :     {
                                175                 :                :         /* Snapshot was serialized -- restore it */
                                176                 :           1783 :         snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 1861 andres@anarazel.de        177                 :           1783 :         RegisterSnapshot(snapshot);
 1792                           178                 :           1783 :         flags |= SO_TEMP_SNAPSHOT;
                                179                 :                :     }
                                180                 :                :     else
                                181                 :                :     {
                                182                 :                :         /* SnapshotAny passed by caller (not serialized) */
 1861                           183                 :            146 :         snapshot = SnapshotAny;
                                184                 :                :     }
                                185                 :                : 
 1842                           186                 :           1929 :     return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
                                187                 :                :                                             pscan, flags);
                                188                 :                : }
                                189                 :                : 
                                190                 :                : 
                                191                 :                : /* ----------------------------------------------------------------------------
                                192                 :                :  * Index scan related functions.
                                193                 :                :  * ----------------------------------------------------------------------------
                                194                 :                :  */
                                195                 :                : 
                                196                 :                : /*
                                197                 :                :  * To perform that check simply start an index scan, create the necessary
                                198                 :                :  * slot, do the heap lookup, and shut everything down again. This could be
                                199                 :                :  * optimized, but is unlikely to matter from a performance POV. If there
                                200                 :                :  * frequently are live index pointers also matching a unique index key, the
                                201                 :                :  * CPU overhead of this routine is unlikely to matter.
                                202                 :                :  *
                                203                 :                :  * Note that *tid may be modified when we return true if the AM supports
                                204                 :                :  * storing multiple row versions reachable via a single index entry (like
                                205                 :                :  * heap's HOT).
                                206                 :                :  */
                                207                 :                : bool
 1847                           208                 :        5704770 : table_index_fetch_tuple_check(Relation rel,
                                209                 :                :                               ItemPointer tid,
                                210                 :                :                               Snapshot snapshot,
                                211                 :                :                               bool *all_dead)
                                212                 :                : {
                                213                 :                :     IndexFetchTableData *scan;
                                214                 :                :     TupleTableSlot *slot;
                                215                 :        5704770 :     bool        call_again = false;
                                216                 :                :     bool        found;
                                217                 :                : 
                                218                 :        5704770 :     slot = table_slot_create(rel, NULL);
                                219                 :        5704770 :     scan = table_index_fetch_begin(rel);
                                220                 :        5704770 :     found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
                                221                 :                :                                     all_dead);
                                222                 :        5704770 :     table_index_fetch_end(scan);
                                223                 :        5704770 :     ExecDropSingleTupleTableSlot(slot);
                                224                 :                : 
                                225                 :        5704770 :     return found;
                                226                 :                : }
                                227                 :                : 
                                228                 :                : 
                                229                 :                : /* ------------------------------------------------------------------------
                                230                 :                :  * Functions for non-modifying operations on individual tuples
                                231                 :                :  * ------------------------------------------------------------------------
                                232                 :                :  */
                                233                 :                : 
                                234                 :                : void
 1788                           235                 :            153 : table_tuple_get_latest_tid(TableScanDesc scan, ItemPointer tid)
                                236                 :                : {
 1789 tgl@sss.pgh.pa.us         237                 :            153 :     Relation    rel = scan->rs_rd;
 1794 andres@anarazel.de        238                 :            153 :     const TableAmRoutine *tableam = rel->rd_tableam;
                                239                 :                : 
                                240                 :                :     /*
                                241                 :                :      * We don't expect direct calls to table_tuple_get_latest_tid with valid
                                242                 :                :      * CheckXidAlive for catalog or regular tables.  See detailed comments in
                                243                 :                :      * xact.c where these variables are declared.
                                244                 :                :      */
 1345 akapila@postgresql.o      245   [ -  +  -  -  :            153 :     if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
                                              -  + ]
 1345 akapila@postgresql.o      246         [ #  # ]:UBC           0 :         elog(ERROR, "unexpected table_tuple_get_latest_tid call during logical decoding");
                                247                 :                : 
                                248                 :                :     /*
                                249                 :                :      * Since this can be called with user-supplied TID, don't trust the input
                                250                 :                :      * too much.
                                251                 :                :      */
 1794 andres@anarazel.de        252         [ +  + ]:CBC         153 :     if (!tableam->tuple_tid_valid(scan, tid))
                                253         [ +  - ]:              6 :         ereport(ERROR,
                                254                 :                :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                255                 :                :                  errmsg("tid (%u, %u) is not valid for relation \"%s\"",
                                256                 :                :                         ItemPointerGetBlockNumberNoCheck(tid),
                                257                 :                :                         ItemPointerGetOffsetNumberNoCheck(tid),
                                258                 :                :                         RelationGetRelationName(rel))));
                                259                 :                : 
 1793 tgl@sss.pgh.pa.us         260                 :            147 :     tableam->tuple_get_latest_tid(scan, tid);
 1794 andres@anarazel.de        261                 :            147 : }
                                262                 :                : 
                                263                 :                : 
                                264                 :                : /* ----------------------------------------------------------------------------
                                265                 :                :  * Functions to make modifications a bit simpler.
                                266                 :                :  * ----------------------------------------------------------------------------
                                267                 :                :  */
                                268                 :                : 
                                269                 :                : /*
                                270                 :                :  * simple_table_tuple_insert - insert a tuple
                                271                 :                :  *
                                272                 :                :  * Currently, this routine differs from table_tuple_insert only in supplying a
                                273                 :                :  * default command ID and not allowing access to the speedup options.
                                274                 :                :  */
                                275                 :                : void
    3 akorotkov@postgresql      276                 :          75742 : simple_table_tuple_insert(Relation rel, TupleTableSlot *slot)
                                277                 :                : {
                                278                 :          75742 :     table_tuple_insert(rel, slot, GetCurrentCommandId(true), 0, NULL);
 1849 andres@anarazel.de        279                 :          75742 : }
                                280                 :                : 
                                281                 :                : /*
                                282                 :                :  * simple_table_tuple_delete - delete a tuple
                                283                 :                :  *
                                284                 :                :  * This routine may be used to delete a tuple when concurrent updates of
                                285                 :                :  * the target tuple are not expected (for example, because we have a lock
                                286                 :                :  * on the relation associated with the tuple).  Any failure is reported
                                287                 :                :  * via ereport().
                                288                 :                :  */
                                289                 :                : void
    3 akorotkov@postgresql      290                 :          40311 : simple_table_tuple_delete(Relation rel, ItemPointer tid, Snapshot snapshot)
                                291                 :                : {
                                292                 :                :     TM_Result   result;
                                293                 :                :     TM_FailureData tmfd;
                                294                 :                : 
 1788 andres@anarazel.de        295                 :          40311 :     result = table_tuple_delete(rel, tid,
                                296                 :                :                                 GetCurrentCommandId(true),
                                297                 :                :                                 snapshot, InvalidSnapshot,
                                298                 :                :                                 true /* wait for commit */ ,
                                299                 :                :                                 &tmfd, false /* changingPart */ );
                                300                 :                : 
 1849                           301   [ -  +  -  -  :          40311 :     switch (result)
                                                 - ]
                                302                 :                :     {
 1849 andres@anarazel.de        303                 :UBC           0 :         case TM_SelfModified:
                                304                 :                :             /* Tuple was already updated in current command? */
                                305         [ #  # ]:              0 :             elog(ERROR, "tuple already updated by self");
                                306                 :                :             break;
                                307                 :                : 
 1849 andres@anarazel.de        308                 :CBC       40311 :         case TM_Ok:
                                309                 :                :             /* done successfully */
                                310                 :          40311 :             break;
                                311                 :                : 
 1849 andres@anarazel.de        312                 :UBC           0 :         case TM_Updated:
                                313         [ #  # ]:              0 :             elog(ERROR, "tuple concurrently updated");
                                314                 :                :             break;
                                315                 :                : 
                                316                 :              0 :         case TM_Deleted:
                                317         [ #  # ]:              0 :             elog(ERROR, "tuple concurrently deleted");
                                318                 :                :             break;
                                319                 :                : 
                                320                 :              0 :         default:
 1788                           321         [ #  # ]:              0 :             elog(ERROR, "unrecognized table_tuple_delete status: %u", result);
                                322                 :                :             break;
                                323                 :                :     }
 1849 andres@anarazel.de        324                 :CBC       40311 : }
                                325                 :                : 
                                326                 :                : /*
                                327                 :                :  * simple_table_tuple_update - replace a tuple
                                328                 :                :  *
                                329                 :                :  * This routine may be used to update a tuple when concurrent updates of
                                330                 :                :  * the target tuple are not expected (for example, because we have a lock
                                331                 :                :  * on the relation associated with the tuple).  Any failure is reported
                                332                 :                :  * via ereport().
                                333                 :                :  */
                                334                 :                : void
 1788                           335                 :          31923 : simple_table_tuple_update(Relation rel, ItemPointer otid,
                                336                 :                :                           TupleTableSlot *slot,
                                337                 :                :                           Snapshot snapshot,
                                338                 :                :                           TU_UpdateIndexes *update_indexes)
                                339                 :                : {
                                340                 :                :     TM_Result   result;
                                341                 :                :     TM_FailureData tmfd;
                                342                 :                :     LockTupleMode lockmode;
                                343                 :                : 
                                344                 :          31923 :     result = table_tuple_update(rel, otid, slot,
                                345                 :                :                                 GetCurrentCommandId(true),
                                346                 :                :                                 snapshot, InvalidSnapshot,
                                347                 :                :                                 true /* wait for commit */ ,
                                348                 :                :                                 &tmfd, &lockmode, update_indexes);
                                349                 :                : 
 1849                           350   [ -  +  -  -  :          31923 :     switch (result)
                                                 - ]
                                351                 :                :     {
 1849 andres@anarazel.de        352                 :UBC           0 :         case TM_SelfModified:
                                353                 :                :             /* Tuple was already updated in current command? */
                                354         [ #  # ]:              0 :             elog(ERROR, "tuple already updated by self");
                                355                 :                :             break;
                                356                 :                : 
 1849 andres@anarazel.de        357                 :CBC       31923 :         case TM_Ok:
                                358                 :                :             /* done successfully */
                                359                 :          31923 :             break;
                                360                 :                : 
 1849 andres@anarazel.de        361                 :UBC           0 :         case TM_Updated:
                                362         [ #  # ]:              0 :             elog(ERROR, "tuple concurrently updated");
                                363                 :                :             break;
                                364                 :                : 
                                365                 :              0 :         case TM_Deleted:
                                366         [ #  # ]:              0 :             elog(ERROR, "tuple concurrently deleted");
                                367                 :                :             break;
                                368                 :                : 
                                369                 :              0 :         default:
 1788                           370         [ #  # ]:              0 :             elog(ERROR, "unrecognized table_tuple_update status: %u", result);
                                371                 :                :             break;
                                372                 :                :     }
 1849 andres@anarazel.de        373                 :CBC       31923 : }
                                374                 :                : 
                                375                 :                : 
                                376                 :                : /* ----------------------------------------------------------------------------
                                377                 :                :  * Helper functions to implement parallel scans for block oriented AMs.
                                378                 :                :  * ----------------------------------------------------------------------------
                                379                 :                :  */
                                380                 :                : 
                                381                 :                : Size
 1861                           382                 :            523 : table_block_parallelscan_estimate(Relation rel)
                                383                 :                : {
                                384                 :            523 :     return sizeof(ParallelBlockTableScanDescData);
                                385                 :                : }
                                386                 :                : 
                                387                 :                : Size
                                388                 :            523 : table_block_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan)
                                389                 :                : {
                                390                 :            523 :     ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;
                                391                 :                : 
                                392                 :            523 :     bpscan->base.phs_relid = RelationGetRelid(rel);
                                393                 :            523 :     bpscan->phs_nblocks = RelationGetNumberOfBlocks(rel);
                                394                 :                :     /* compare phs_syncscan initialization to similar logic in initscan */
                                395                 :           1399 :     bpscan->base.phs_syncscan = synchronize_seqscans &&
                                396   [ +  +  +  - ]:            876 :         !RelationUsesLocalBuffers(rel) &&
                                397         [ +  + ]:            353 :         bpscan->phs_nblocks > NBuffers / 4;
                                398                 :            523 :     SpinLockInit(&bpscan->phs_mutex);
                                399                 :            523 :     bpscan->phs_startblock = InvalidBlockNumber;
                                400                 :            523 :     pg_atomic_init_u64(&bpscan->phs_nallocated, 0);
                                401                 :                : 
                                402                 :            523 :     return sizeof(ParallelBlockTableScanDescData);
                                403                 :                : }
                                404                 :                : 
                                405                 :                : void
                                406                 :            114 : table_block_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
                                407                 :                : {
                                408                 :            114 :     ParallelBlockTableScanDesc bpscan = (ParallelBlockTableScanDesc) pscan;
                                409                 :                : 
                                410                 :            114 :     pg_atomic_write_u64(&bpscan->phs_nallocated, 0);
                                411                 :            114 : }
                                412                 :                : 
                                413                 :                : /*
                                414                 :                :  * find and set the scan's startblock
                                415                 :                :  *
                                416                 :                :  * Determine where the parallel seq scan should start.  This function may be
                                417                 :                :  * called many times, once by each parallel worker.  We must be careful only
                                418                 :                :  * to set the startblock once.
                                419                 :                :  */
                                420                 :                : void
 1358 drowley@postgresql.o      421                 :           1544 : table_block_parallelscan_startblock_init(Relation rel,
                                422                 :                :                                          ParallelBlockTableScanWorker pbscanwork,
                                423                 :                :                                          ParallelBlockTableScanDesc pbscan)
                                424                 :                : {
 1861 andres@anarazel.de        425                 :           1544 :     BlockNumber sync_startpage = InvalidBlockNumber;
                                426                 :                : 
                                427                 :                :     /* Reset the state we use for controlling allocation size. */
 1358 drowley@postgresql.o      428                 :           1544 :     memset(pbscanwork, 0, sizeof(*pbscanwork));
                                429                 :                : 
                                430                 :                :     StaticAssertStmt(MaxBlockNumber <= 0xFFFFFFFE,
                                431                 :                :                      "pg_nextpower2_32 may be too small for non-standard BlockNumber width");
                                432                 :                : 
                                433                 :                :     /*
                                434                 :                :      * We determine the chunk size based on the size of the relation.  First we
                                435                 :                :      * split the relation into PARALLEL_SEQSCAN_NCHUNKS chunks, then we round
                                436                 :                :      * the resulting chunk size up to the next power of 2.  This means we
                                437                 :                :      * split the relation into somewhere between PARALLEL_SEQSCAN_NCHUNKS / 2
                                438                 :                :      * and PARALLEL_SEQSCAN_NCHUNKS chunks.
                                439                 :                :      */
                                440         [ +  + ]:           1544 :     pbscanwork->phsw_chunk_size = pg_nextpower2_32(Max(pbscan->phs_nblocks /
                                441                 :                :                                                        PARALLEL_SEQSCAN_NCHUNKS, 1));
                                442                 :                : 
                                443                 :                :     /*
                                444                 :                :      * Ensure we don't go over the maximum chunk size with larger tables. This
                                445                 :                :      * means we may get many more than PARALLEL_SEQSCAN_NCHUNKS chunks for
                                446                 :                :      * larger tables.  Too large a chunk size has been shown to be detrimental
                                447                 :                :      * to synchronous scan performance.
                                448                 :                :      */
                                449                 :           1544 :     pbscanwork->phsw_chunk_size = Min(pbscanwork->phsw_chunk_size,
                                450                 :                :                                       PARALLEL_SEQSCAN_MAX_CHUNK_SIZE);
                                451                 :                : 
 1861 andres@anarazel.de        452                 :           1545 : retry:
                                453                 :                :     /* Grab the spinlock. */
                                454         [ +  + ]:           1545 :     SpinLockAcquire(&pbscan->phs_mutex);
                                455                 :                : 
                                456                 :                :     /*
                                457                 :                :      * If the scan's startblock has not yet been initialized, we must do so
                                458                 :                :      * now.  If this is not a synchronized scan, we just start at block 0, but
                                459                 :                :      * if it is a synchronized scan, we must get the starting position from
                                460                 :                :      * the synchronized scan machinery.  We can't hold the spinlock while
                                461                 :                :      * doing that, though, so release the spinlock, get the information we
                                462                 :                :      * need, and retry.  If nobody else has initialized the scan in the
                                463                 :                :      * meantime, we'll fill in the value we fetched on the second time
                                464                 :                :      * through.
                                465                 :                :      */
                                466         [ +  + ]:           1545 :     if (pbscan->phs_startblock == InvalidBlockNumber)
                                467                 :                :     {
                                468         [ +  + ]:            514 :         if (!pbscan->base.phs_syncscan)
                                469                 :            512 :             pbscan->phs_startblock = 0;
                                470         [ +  + ]:              2 :         else if (sync_startpage != InvalidBlockNumber)
                                471                 :              1 :             pbscan->phs_startblock = sync_startpage;
                                472                 :                :         else
                                473                 :                :         {
                                474                 :              1 :             SpinLockRelease(&pbscan->phs_mutex);
                                475                 :              1 :             sync_startpage = ss_get_location(rel, pbscan->phs_nblocks);
                                476                 :              1 :             goto retry;
                                477                 :                :         }
                                478                 :                :     }
                                479                 :           1544 :     SpinLockRelease(&pbscan->phs_mutex);
                                480                 :           1544 : }
                                481                 :                : 
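The chunk-size calculation described in the comments above can be sketched in isolation. This is a minimal illustration, not the code from this file: the `SKETCH_*` constants are placeholder values standing in for PARALLEL_SEQSCAN_NCHUNKS and PARALLEL_SEQSCAN_MAX_CHUNK_SIZE (their definitions are not shown in this listing), and `sketch_nextpower2_32` is a hypothetical stand-in for pg_nextpower2_32.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values, assumed for illustration only. */
#define SKETCH_NCHUNKS        2048u
#define SKETCH_MAX_CHUNK_SIZE 8192u

/*
 * Smallest power of 2 that is >= v, for 1 <= v <= 2^31.
 * Hypothetical stand-in for pg_nextpower2_32.
 */
static uint32_t
sketch_nextpower2_32(uint32_t v)
{
    uint32_t    p = 1;

    assert(v > 0 && v <= 0x80000000u);
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Initial chunk size for a relation of nblocks blocks: divide into
 * SKETCH_NCHUNKS pieces (at least one block each), round the piece size
 * up to a power of 2, then cap it at SKETCH_MAX_CHUNK_SIZE.
 */
static uint32_t
sketch_initial_chunk_size(uint32_t nblocks)
{
    uint32_t    per_chunk = nblocks / SKETCH_NCHUNKS;
    uint32_t    size;

    if (per_chunk < 1)
        per_chunk = 1;
    size = sketch_nextpower2_32(per_chunk);
    if (size > SKETCH_MAX_CHUNK_SIZE)
        size = SKETCH_MAX_CHUNK_SIZE;
    return size;
}
```

With these placeholder constants, a 6144-block relation gets chunks of 4 blocks (6144 / 2048 = 3, rounded up to 4), while a 100,000,000-block relation is capped at 8192 blocks per chunk.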
                                482                 :                : /*
                                483                 :                :  * get the next page to scan
                                484                 :                :  *
                                485                 :                :  * Get the next page to scan.  Even if there are no pages left to scan,
                                486                 :                :  * another backend could have grabbed a page to scan and not yet finished
                                487                 :                :  * looking at it, so it doesn't follow that the scan is done when the first
                                488                 :                :  * backend gets an InvalidBlockNumber return.
                                489                 :                :  */
                                490                 :                : BlockNumber
 1358 drowley@postgresql.o      491                 :         100170 : table_block_parallelscan_nextpage(Relation rel,
                                492                 :                :                                   ParallelBlockTableScanWorker pbscanwork,
                                493                 :                :                                   ParallelBlockTableScanDesc pbscan)
                                494                 :                : {
                                495                 :                :     BlockNumber page;
                                496                 :                :     uint64      nallocated;
                                497                 :                : 
                                498                 :                :     /*
                                499                 :                :      * The logic below allocates block numbers out to parallel workers in a
                                500                 :                :      * way that each worker will receive a set of consecutive block numbers to
                                501                 :                :      * scan.  Earlier versions of this would allocate the next highest block
                                502                 :                :      * number to the next worker to call this function.  This would generally
                                503                 :                :      * result in workers never receiving consecutive block numbers.  Some
                                504                 :                :      * operating systems would not detect the sequential I/O pattern due to
                                505                 :                :      * each backend being a different process which could result in poor
                                506                 :                :      * performance due to inefficient or no readahead.  To work around this
                                507                 :                :      * issue, we now allocate a range of block numbers for each worker and
                                508                 :                :      * when they come back for another block, we give them the next one in
                                509                 :                :      * that range until the range is complete.  When the worker completes the
                                510                 :                :      * range of blocks we then allocate another range for it and return the
                                511                 :                :      * first block number from that range.
                                512                 :                :      *
                                513                 :                :      * Here we name these ranges of blocks "chunks".  The initial size of
                                514                 :                :      * these chunks is determined in table_block_parallelscan_startblock_init
                                515                 :                :      * based on the size of the relation.  Towards the end of the scan, we
                                516                 :                :      * start making reductions in the size of the chunks in order to attempt
                                517                 :                :      * to divide the remaining work over all the workers as evenly as
                                518                 :                :      * possible.
                                519                 :                :      *
                                520                 :                :      * Here pbscanwork is local worker memory.  phsw_chunk_remaining tracks
                                521                 :                :      * the number of blocks remaining in the chunk.  When that reaches 0 then
                                522                 :                :      * we must allocate a new chunk for the worker.
                                523                 :                :      *
                                524                 :                :      * phs_nallocated tracks how many blocks have been allocated to workers
                                525                 :                :      * already.  When phs_nallocated >= phs_nblocks, all blocks have been
                                526                 :                :      * allocated.
                                527                 :                :      *
                                528                 :                :      * Because we use an atomic fetch-and-add to fetch the current value, the
                                529                 :                :      * phs_nallocated counter can exceed phs_nblocks: workers still increment
                                530                 :                :      * the value when they try to allocate the next block after all blocks
                                531                 :                :      * have been allocated already.  The counter must be 64 bits wide because
                                532                 :                :      * of that, to avoid wrapping around when phs_nblocks is close to
                                533                 :                :      * 2^32.
                                534                 :                :      *
                                535                 :                :      * The actual block to return is calculated by adding the counter to the
                                536                 :                :      * starting block number, modulo nblocks.
                                537                 :                :      */
                                538                 :                : 
                                539                 :                :     /*
                                540                 :                :      * First check if we have any remaining blocks in a previous chunk for
                                541                 :                :      * this worker.  We must consume all of the blocks from that before we
                                542                 :                :      * allocate a new chunk to the worker.
                                543                 :                :      */
                                544         [ +  + ]:         100170 :     if (pbscanwork->phsw_chunk_remaining > 0)
                                545                 :                :     {
                                546                 :                :         /*
                                547                 :                :          * Give them the next block in the range and update the remaining
                                548                 :                :          * number of blocks.
                                549                 :                :          */
                                550                 :           6513 :         nallocated = ++pbscanwork->phsw_nallocated;
                                551                 :           6513 :         pbscanwork->phsw_chunk_remaining--;
                                552                 :                :     }
                                553                 :                :     else
                                554                 :                :     {
                                555                 :                :         /*
                                556                 :                :          * When we've only got PARALLEL_SEQSCAN_RAMPDOWN_CHUNKS chunks
                                557                 :                :          * remaining in the scan, we halve the chunk size.  Since we reduce the
                                558                 :                :          * chunk size here, we'll hit this again after doing
                                559                 :                :          * PARALLEL_SEQSCAN_RAMPDOWN_CHUNKS at the new size.  After a few
                                560                 :                :          * iterations of this, we'll end up doing the last few blocks with the
                                561                 :                :          * chunk size set to 1.
                                562                 :                :          */
                                563         [ +  + ]:          93657 :         if (pbscanwork->phsw_chunk_size > 1 &&
                                564                 :           2215 :             pbscanwork->phsw_nallocated > pbscan->phs_nblocks -
                                565         [ +  + ]:           2215 :             (pbscanwork->phsw_chunk_size * PARALLEL_SEQSCAN_RAMPDOWN_CHUNKS))
                                566                 :              4 :             pbscanwork->phsw_chunk_size >>= 1;
                                567                 :                : 
                                568                 :          93657 :         nallocated = pbscanwork->phsw_nallocated =
                                569                 :          93657 :             pg_atomic_fetch_add_u64(&pbscan->phs_nallocated,
                                570                 :          93657 :                                     pbscanwork->phsw_chunk_size);
                                571                 :                : 
                                572                 :                :         /*
                                573                 :                :          * Set the remaining number of blocks in this chunk so that subsequent
                                574                 :                :          * calls from this worker continue on with this chunk until it's done.
                                575                 :                :          */
                                576                 :          93657 :         pbscanwork->phsw_chunk_remaining = pbscanwork->phsw_chunk_size - 1;
                                577                 :                :     }
                                578                 :                : 
 1861 andres@anarazel.de        579         [ +  + ]:         100170 :     if (nallocated >= pbscan->phs_nblocks)
                                580                 :           1544 :         page = InvalidBlockNumber;  /* all blocks have been allocated */
                                581                 :                :     else
                                582                 :          98626 :         page = (nallocated + pbscan->phs_startblock) % pbscan->phs_nblocks;
                                583                 :                : 
                                584                 :                :     /*
                                585                 :                :      * Report scan location.  Normally, we report the current page number.
                                586                 :                :      * When we reach the end of the scan, though, we report the starting page,
                                587                 :                :      * not the ending page, just so the starting positions for later scans
                                588                 :                :      * don't slew backwards.  We only report the position at the end of the
                                589                 :                :      * scan once, though: subsequent callers will report nothing.
                                590                 :                :      */
                                591         [ +  + ]:         100170 :     if (pbscan->base.phs_syncscan)
                                592                 :                :     {
                                593         [ +  + ]:           8852 :         if (page != InvalidBlockNumber)
                                594                 :           8850 :             ss_report_location(rel, page);
                                595         [ +  + ]:              2 :         else if (nallocated == pbscan->phs_nblocks)
                                596                 :              1 :             ss_report_location(rel, pbscan->phs_startblock);
                                597                 :                :     }
                                598                 :                : 
                                599                 :         100170 :     return page;
                                600                 :                : }
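
The block-selection step at the end of the function above can be illustrated in isolation. This is a minimal sketch, not the PostgreSQL implementation: `next_page`, `INVALID_BLOCK`, and the plain integer parameters are hypothetical stand-ins for `InvalidBlockNumber` and the shared-state fields `phs_startblock` and `phs_nblocks`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for InvalidBlockNumber. */
#define INVALID_BLOCK UINT32_MAX

/*
 * Map a running allocation counter onto a block number.  The scan starts
 * at startblock and wraps around modulo nblocks; once the counter reaches
 * nblocks, every block has been handed out.
 */
uint32_t
next_page(uint64_t nallocated, uint32_t startblock, uint32_t nblocks)
{
    if (nallocated >= nblocks)
        return INVALID_BLOCK;   /* all blocks have been allocated */

    /* Assumed precondition of this sketch only. */
    assert(startblock < nblocks);

    return (uint32_t) ((nallocated + startblock) % nblocks);
}
```

With a 10-block relation and a start block of 7, the counter values 0, 3, and 9 map to blocks 7, 0, and 6 respectively, and a counter of 10 ends the scan.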
                                601                 :                : 
                                602                 :                : /* ----------------------------------------------------------------------------
                                603                 :                :  * Helper functions to implement relation sizing for block oriented AMs.
                                604                 :                :  * ----------------------------------------------------------------------------
                                605                 :                :  */
                                606                 :                : 
                                607                 :                : /*
                                608                 :                :  * table_block_relation_size
                                609                 :                :  *
                                610                 :                :  * If a table AM uses the various relation forks as the sole place where data
                                611                 :                :  * is stored, and if it uses them in the expected manner (e.g. the actual data
                                612                 :                :  * is in the main fork rather than some other), it can use this implementation
                                613                 :                :  * of the relation_size callback rather than implementing its own.
                                614                 :                :  */
                                615                 :                : uint64
 1742 rhaas@postgresql.org      616                 :        1161069 : table_block_relation_size(Relation rel, ForkNumber forkNumber)
                                617                 :                : {
                                618                 :        1161069 :     uint64      nblocks = 0;
                                619                 :                : 
                                620                 :                :     /* InvalidForkNumber indicates returning the size for all forks */
                                621         [ -  + ]:        1161069 :     if (forkNumber == InvalidForkNumber)
                                622                 :                :     {
 1742 rhaas@postgresql.org      623         [ #  # ]:UBC           0 :         for (int i = 0; i < MAX_FORKNUM; i++)
 1007 tgl@sss.pgh.pa.us         624                 :              0 :             nblocks += smgrnblocks(RelationGetSmgr(rel), i);
                                625                 :                :     }
                                626                 :                :     else
 1007 tgl@sss.pgh.pa.us         627                 :CBC     1161069 :         nblocks = smgrnblocks(RelationGetSmgr(rel), forkNumber);
                                628                 :                : 
 1742 rhaas@postgresql.org      629                 :        1161050 :     return nblocks * BLCKSZ;
                                630                 :                : }
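
The size computation above is a block count multiplied by the block size, accumulated in a 64-bit integer. The sketch below illustrates that arithmetic under stated assumptions: `relation_size_sketch`, `SKETCH_BLCKSZ`, the `fork_blocks` array, and the use of a negative fork index to mean "all forks" are hypothetical and do not mirror the smgr API or PostgreSQL's fork numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block size for this sketch; BLCKSZ is a build-time setting. */
#define SKETCH_BLCKSZ 8192

/*
 * Return the size in bytes of one fork, or of all forks when fork < 0.
 * fork_blocks[i] holds the (hypothetical) block count of fork i.
 */
uint64_t
relation_size_sketch(const uint32_t *fork_blocks, int nforks, int fork)
{
    uint64_t    nblocks = 0;

    assert(fork < nforks);

    if (fork < 0)
    {
        for (int i = 0; i < nforks; i++)
            nblocks += fork_blocks[i];
    }
    else
        nblocks = fork_blocks[fork];

    /* 64-bit multiply: the result can exceed what fits in 32 bits. */
    return nblocks * SKETCH_BLCKSZ;
}
```

Keeping the accumulator 64 bits wide matters: 2^20 blocks of 8192 bytes already come to 2^33 bytes.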
                                631                 :                : 
                                632                 :                : /*
                                633                 :                :  * table_block_relation_estimate_size
                                634                 :                :  *
                                635                 :                :  * This function can't be directly used as the implementation of the
                                636                 :                :  * relation_estimate_size callback, because it has a few additional parameters.
                                637                 :                :  * Instead, it is intended to be used as a helper function; the caller can
                                638                 :                :  * pass through the arguments to its relation_estimate_size function plus the
                                639                 :                :  * additional values required here.
                                640                 :                :  *
                                641                 :                :  * overhead_bytes_per_tuple should contain the approximate number of bytes
                                642                 :                :  * of storage required to store a tuple above and beyond what is required for
                                643                 :                :  * the tuple data proper. Typically, this would include things like the
                                644                 :                :  * size of the tuple header and item pointer. This is only used for query
                                645                 :                :  * planning, so a table AM where the value is not constant could choose to
                                646                 :                :  * pass a "best guess".
                                647                 :                :  *
                                648                 :                :  * usable_bytes_per_page should contain the approximate number of bytes per
                                649                 :                :  * page usable for tuple data, excluding the page header and any anticipated
                                650                 :                :  * special space.
                                651                 :                :  */
                                652                 :                : void
                                653                 :         193702 : table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
                                654                 :                :                                    BlockNumber *pages, double *tuples,
                                655                 :                :                                    double *allvisfrac,
                                656                 :                :                                    Size overhead_bytes_per_tuple,
                                657                 :                :                                    Size usable_bytes_per_page)
                                658                 :                : {
                                659                 :                :     BlockNumber curpages;
                                660                 :                :     BlockNumber relpages;
                                661                 :                :     double      reltuples;
                                662                 :                :     BlockNumber relallvisible;
                                663                 :                :     double      density;
                                664                 :                : 
                                665                 :                :     /* it should have storage, so we can call the smgr */
                                666                 :         193702 :     curpages = RelationGetNumberOfBlocks(rel);
                                667                 :                : 
                                668                 :                :     /* coerce values in pg_class to more desirable types */
                                669                 :         193702 :     relpages = (BlockNumber) rel->rd_rel->relpages;
                                670                 :         193702 :     reltuples = (double) rel->rd_rel->reltuples;
                                671                 :         193702 :     relallvisible = (BlockNumber) rel->rd_rel->relallvisible;
                                672                 :                : 
                                673                 :                :     /*
                                674                 :                :      * HACK: if the relation has never yet been vacuumed, use a minimum size
                                675                 :                :      * estimate of 10 pages.  The idea here is to avoid assuming a
                                676                 :                :      * newly-created table is really small, even if it currently is, because
                                677                 :                :      * that may not be true once some data gets loaded into it.  Once a vacuum
                                678                 :                :      * or analyze cycle has been done on it, it's more reasonable to believe
                                679                 :                :      * the size is somewhat stable.
                                680                 :                :      *
                                681                 :                :      * (Note that this is only an issue if the plan gets cached and used again
                                682                 :                :      * after the table has been filled.  What we're trying to avoid is using a
                                683                 :                :      * nestloop-type plan on a table that has grown substantially since the
                                684                 :                :      * plan was made.  Normally, autovacuum/autoanalyze will occur once enough
                                685                 :                :      * inserts have happened and cause cached-plan invalidation; but that
                                686                 :                :      * doesn't happen instantaneously, and it won't happen at all for cases
                                687                 :                :      * such as temporary tables.)
                                688                 :                :      *
                                689                 :                :      * We test "never vacuumed" by seeing whether reltuples < 0.
                                690                 :                :      *
                                691                 :                :      * If the table has inheritance children, we don't apply this heuristic.
                                692                 :                :      * Totally empty parent tables are quite common, so we should be willing
                                693                 :                :      * to believe that they are empty.
                                694                 :                :      */
                                695   [ +  +  +  + ]:         193702 :     if (curpages < 10 &&
 1323 tgl@sss.pgh.pa.us         696                 :          51944 :         reltuples < 0 &&
 1742 rhaas@postgresql.org      697         [ +  + ]:          51944 :         !rel->rd_rel->relhassubclass)
                                698                 :          50726 :         curpages = 10;
                                699                 :                : 
                                700                 :                :     /* report estimated # pages */
                                701                 :         193702 :     *pages = curpages;
                                702                 :                :     /* quick exit if rel is clearly empty */
                                703         [ +  + ]:         193702 :     if (curpages == 0)
                                704                 :                :     {
                                705                 :           8028 :         *tuples = 0;
                                706                 :           8028 :         *allvisfrac = 0;
                                707                 :           8028 :         return;
                                708                 :                :     }
                                709                 :                : 
                                710                 :                :     /* estimate number of tuples from previous tuple density */
 1323 tgl@sss.pgh.pa.us         711   [ +  +  +  + ]:         185674 :     if (reltuples >= 0 && relpages > 0)
 1742 rhaas@postgresql.org      712                 :         121218 :         density = reltuples / (double) relpages;
                                713                 :                :     else
                                714                 :                :     {
                                715                 :                :         /*
                                716                 :                :          * When we have no data because the relation was never yet vacuumed,
                                717                 :                :          * estimate tuple width from attribute datatypes.  We assume here that
                                718                 :                :          * the pages are completely full, which is OK for tables but is
                                719                 :                :          * probably an overestimate for indexes.  Fortunately
                                720                 :                :          * get_relation_info() can clamp the overestimate to the parent
                                721                 :                :          * table's size.
                                722                 :                :          *
                                723                 :                :          * Note: this code intentionally disregards alignment considerations,
                                724                 :                :          * because (a) that would be gilding the lily considering how crude
                                725                 :                :          * the estimate is, (b) it creates platform dependencies in the
                                726                 :                :          * default plans which are kind of a headache for regression testing,
                                727                 :                :          * and (c) different table AMs might use different padding schemes.
                                728                 :                :          */
                                729                 :                :         int32       tuple_width;
                                730                 :                :         int         fillfactor;
                                731                 :                : 
                                732                 :                :         /*
                                733                 :                :          * Without reltuples/relpages, we also need to consider fillfactor.
                                734                 :                :          * The other branch considers it implicitly by calculating density
                                735                 :                :          * from actual relpages/reltuples statistics.
                                736                 :                :          */
    3 akorotkov@postgresql      737         [ +  + ]:GNC       64456 :         fillfactor = RelationGetFillFactor(rel, HEAP_DEFAULT_FILLFACTOR);
                                738                 :                : 
 1742 rhaas@postgresql.org      739                 :CBC       64456 :         tuple_width = get_rel_data_width(rel, attr_widths);
                                740                 :          64456 :         tuple_width += overhead_bytes_per_tuple;
                                741                 :                :         /* note: integer division is intentional here */
  286 tomas.vondra@postgre      742                 :GNC       64456 :         density = (usable_bytes_per_page * fillfactor / 100) / tuple_width;
                                743                 :                :     }
 1742 rhaas@postgresql.org      744                 :CBC      185674 :     *tuples = rint(density * (double) curpages);
                                745                 :                : 
                                746                 :                :     /*
                                747                 :                :      * We use relallvisible as-is, rather than scaling it up like we do for
                                748                 :                :      * the pages and tuples counts, on the theory that any pages added since
                                749                 :                :      * the last VACUUM are most likely not marked all-visible.  But costsize.c
                                750                 :                :      * wants it converted to a fraction.
                                751                 :                :      */
                                752   [ +  +  -  + ]:         185674 :     if (relallvisible == 0 || curpages <= 0)
                                753                 :          91420 :         *allvisfrac = 0;
                                754         [ +  + ]:          94254 :     else if ((double) relallvisible >= curpages)
                                755                 :          49179 :         *allvisfrac = 1;
                                756                 :                :     else
                                757                 :          45075 :         *allvisfrac = (double) relallvisible / curpages;
                                758                 :                : }
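
The three estimation steps in the function above (the never-vacuumed page floor, the tuple density, and the all-visible fraction) can be sketched as standalone helpers. This is an illustrative sketch only: the function names are hypothetical, the fill factor is passed in directly rather than read from relation options, and the final rounding via `rint()` is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Apply the 10-page floor to a never-vacuumed table without children. */
uint32_t
clamp_never_vacuumed(uint32_t curpages, double reltuples, bool has_children)
{
    if (curpages < 10 && reltuples < 0 && !has_children)
        return 10;
    return curpages;
}

/*
 * Tuples per page: taken from stored statistics when available, otherwise
 * derived from widths.  The fallback deliberately uses integer division.
 */
double
tuple_density(double reltuples, uint32_t relpages,
              size_t usable_bytes_per_page, int fillfactor,
              size_t tuple_width)
{
    if (reltuples >= 0 && relpages > 0)
        return reltuples / (double) relpages;

    assert(tuple_width > 0);
    return (double) ((usable_bytes_per_page * fillfactor / 100) / tuple_width);
}

/* Convert the all-visible page count to a fraction in [0, 1]. */
double
all_visible_fraction(uint32_t relallvisible, uint32_t curpages)
{
    if (relallvisible == 0 || curpages == 0)
        return 0.0;
    if (relallvisible >= curpages)
        return 1.0;
    return (double) relallvisible / curpages;
}
```

As a worked case with assumed numbers (8168 usable bytes per page and a 32-byte tuple including overhead, chosen only for illustration): a fill factor of 100 gives 8168 / 32 = 255 tuples per page, while a fill factor of 90 gives (8168 * 90 / 100) / 32 = 7351 / 32 = 229.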
        

Generated by: LCOV version 2.1-beta2-3-g6141622