LCOV - differential code coverage report
Current view: top level - src/backend/storage/smgr - md.c (source / functions)
Current: Differential Code Coverage 16@8cea358b128 vs 17@8cea358b128
Current Date: 2024-04-14 14:21:10
Baseline: 16@8cea358b128
Baseline Date: 2024-04-14 14:21:09
Legend: Lines: hit not hit | Branches: + taken - not taken # not executed

Coverage Total Hit UNC LBC UBC GNC CBC DUB DCB
Lines: 75.4 % 480 362 23 3 92 84 278 10 25
Functions: 97.1 % 34 33 1 7 26 3
Branches: 51.7 % 350 181 30 3 136 36 145

Line coverage date bins:
[..60] days: 73.3 % 15 11 4 11
(60,120] days: 79.3 % 92 73 19 73
(240..) days: 74.5 % 373 278 3 92 278
Function coverage date bins:
[..60] days: 100.0 % 1 1 1
(60,120] days: 100.0 % 4 4 4
(240..) days: 96.6 % 29 28 1 2 26
Branch coverage date bins:
[..60] days: 66.7 % 6 4 2 4
(60,120] days: 53.3 % 60 32 28 32
(240..) days: 51.1 % 284 145 3 136 145

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * md.c
                                  4                 :                :  *    This code manages relations that reside on magnetic disk.
                                  5                 :                :  *
                                  6                 :                :  * Or at least, that was what the Berkeley folk had in mind when they named
                                  7                 :                :  * this file.  In reality, what this code provides is an interface from
                                  8                 :                :  * the smgr API to Unix-like filesystem APIs, so it will work with any type
                                  9                 :                :  * of device for which the operating system provides filesystem support.
                                 10                 :                :  * It doesn't matter whether the bits are on spinning rust or some other
                                 11                 :                :  * storage technology.
                                 12                 :                :  *
                                 13                 :                :  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
                                 14                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 15                 :                :  *
                                 16                 :                :  *
                                 17                 :                :  * IDENTIFICATION
                                 18                 :                :  *    src/backend/storage/smgr/md.c
                                 19                 :                :  *
                                 20                 :                :  *-------------------------------------------------------------------------
                                 21                 :                :  */
                                 22                 :                : #include "postgres.h"
                                 23                 :                : 
                                 24                 :                : #include <unistd.h>
                                 25                 :                : #include <fcntl.h>
                                 26                 :                : #include <sys/file.h>
                                 27                 :                : 
                                 28                 :                : #include "access/xlogutils.h"
                                 29                 :                : #include "commands/tablespace.h"
                                 30                 :                : #include "common/file_utils.h"
                                 31                 :                : #include "miscadmin.h"
                                 32                 :                : #include "pg_trace.h"
                                 33                 :                : #include "pgstat.h"
                                 34                 :                : #include "storage/bufmgr.h"
                                 35                 :                : #include "storage/fd.h"
                                 36                 :                : #include "storage/md.h"
                                 37                 :                : #include "storage/relfilelocator.h"
                                 38                 :                : #include "storage/smgr.h"
                                 39                 :                : #include "storage/sync.h"
                                 40                 :                : #include "utils/memutils.h"
                                 41                 :                : 
                                 42                 :                : /*
                                 43                 :                :  * The magnetic disk storage manager keeps track of open file
                                 44                 :                :  * descriptors in its own descriptor pool.  This is done to make it
                                 45                 :                :  * easier to support relations that are larger than the operating
                                 46                 :                :  * system's file size limit (often 2GBytes).  In order to do that,
                                 47                 :                :  * we break relations up into "segment" files that are each shorter than
                                 48                 :                :  * the OS file size limit.  The segment size is set by the RELSEG_SIZE
                                 49                 :                :  * configuration constant in pg_config.h.
                                 50                 :                :  *
                                 51                 :                :  * On disk, a relation must consist of consecutively numbered segment
                                 52                 :                :  * files in the pattern
                                 53                 :                :  *  -- Zero or more full segments of exactly RELSEG_SIZE blocks each
                                 54                 :                :  *  -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
                                 55                 :                :  *  -- Optionally, any number of inactive segments of size 0 blocks.
                                 56                 :                :  * The full and partial segments are collectively the "active" segments.
                                 57                 :                :  * Inactive segments are those that once contained data but are currently
                                 58                 :                :  * not needed because of an mdtruncate() operation.  The reason for leaving
                                 59                 :                :  * them present at size zero, rather than unlinking them, is that other
                                 60                 :                :  * backends and/or the checkpointer might be holding open file references to
                                 61                 :                :  * such segments.  If the relation expands again after mdtruncate(), such
                                 62                 :                :  * that a deactivated segment becomes active again, it is important that
                                 63                 :                :  * such file references still be valid --- else data might get written
                                 64                 :                :  * out to an unlinked old copy of a segment file that will eventually
                                 65                 :                :  * disappear.
                                 66                 :                :  *
                                 67                 :                :  * File descriptors are stored in the per-fork md_seg_fds arrays inside
                                 68                 :                :  * SMgrRelation. The length of these arrays is stored in md_num_open_segs.
                                 69                 :                :  * Note that a fork's md_num_open_segs having a specific value does not
                                 70                 :                :  * necessarily mean the relation doesn't have additional segments; we may
                                 71                 :                :  * just not have opened the next segment yet.  (We could not have "all
                                 72                 :                :  * segments are in the array" as an invariant anyway, since another backend
                                 73                 :                :  * could extend the relation while we aren't looking.)  We do not have
                                 74                 :                :  * entries for inactive segments, however; as soon as we find a partial
                                 75                 :                :  * segment, we assume that any subsequent segments are inactive.
                                 76                 :                :  *
                                 77                 :                :  * The entire MdfdVec array is palloc'd in the MdCxt memory context.
                                 78                 :                :  */
                                 79                 :                : 
                                 80                 :                : typedef struct _MdfdVec
                                 81                 :                : {
                                 82                 :                :     File        mdfd_vfd;       /* fd number in fd.c's pool */
                                 83                 :                :     BlockNumber mdfd_segno;     /* segment number, from 0 */
                                 84                 :                : } MdfdVec;
                                 85                 :                : 
                                 86                 :                : static MemoryContext MdCxt;     /* context for all MdfdVec objects */
                                 87                 :                : 
                                 88                 :                : 
                                 89                 :                : /* Populate a file tag describing an md.c segment file. */
                                 90                 :                : #define INIT_MD_FILETAG(a,xx_rlocator,xx_forknum,xx_segno) \
                                 91                 :                : ( \
                                 92                 :                :     memset(&(a), 0, sizeof(FileTag)), \
                                 93                 :                :     (a).handler = SYNC_HANDLER_MD, \
                                 94                 :                :     (a).rlocator = (xx_rlocator), \
                                 95                 :                :     (a).forknum = (xx_forknum), \
                                 96                 :                :     (a).segno = (xx_segno) \
                                 97                 :                : )
                                 98                 :                : 
                                 99                 :                : 
                                100                 :                : /*** behavior for mdopen & _mdfd_getseg ***/
                                101                 :                : /* ereport if segment not present */
                                102                 :                : #define EXTENSION_FAIL              (1 << 0)
                                103                 :                : /* return NULL if segment not present */
                                104                 :                : #define EXTENSION_RETURN_NULL       (1 << 1)
                                105                 :                : /* create new segments as needed */
                                106                 :                : #define EXTENSION_CREATE            (1 << 2)
                                107                 :                : /* create new segments if needed during recovery */
                                108                 :                : #define EXTENSION_CREATE_RECOVERY   (1 << 3)
                                109                 :                : /*
                                110                 :                :  * Allow opening segments which are preceded by segments smaller than
                                111                 :                :  * RELSEG_SIZE, e.g. inactive segments (see above). Note that this breaks
                                112                 :                :  * mdnblocks() and related functionality henceforth - which currently is ok,
                                113                 :                :  * because this is only required in the checkpointer which never uses
                                114                 :                :  * mdnblocks().
                                115                 :                :  */
                                116                 :                : #define EXTENSION_DONT_CHECK_SIZE   (1 << 4)
                                117                 :                : /* don't try to open a segment, if not already open */
                                118                 :                : #define EXTENSION_DONT_OPEN         (1 << 5)
                                119                 :                : 
                                120                 :                : 
                                121                 :                : /* local routines */
                                122                 :                : static void mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                123                 :                :                          bool isRedo);
                                124                 :                : static MdfdVec *mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior);
                                125                 :                : static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
                                126                 :                :                                    MdfdVec *seg);
                                127                 :                : static void register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                128                 :                :                                     BlockNumber segno);
                                129                 :                : static void register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                130                 :                :                                     BlockNumber segno);
                                131                 :                : static void _fdvec_resize(SMgrRelation reln,
                                132                 :                :                           ForkNumber forknum,
                                133                 :                :                           int nseg);
                                134                 :                : static char *_mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
                                135                 :                :                            BlockNumber segno);
                                136                 :                : static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forknum,
                                137                 :                :                               BlockNumber segno, int oflags);
                                138                 :                : static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forknum,
                                139                 :                :                              BlockNumber blkno, bool skipFsync, int behavior);
                                140                 :                : static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
                                141                 :                :                               MdfdVec *seg);
                                142                 :                : 
                                143                 :                : static inline int
  372 tmunro@postgresql.or      144                 :CBC     1237959 : _mdfd_open_flags(void)
                                145                 :                : {
                                146                 :        1237959 :     int         flags = O_RDWR | PG_BINARY;
                                147                 :                : 
                                148         [ +  + ]:        1237959 :     if (io_direct_flags & IO_DIRECT_DATA)
                                149                 :            319 :         flags |= PG_O_DIRECT;
                                150                 :                : 
                                151                 :        1237959 :     return flags;
                                152                 :                : }
                                153                 :                : 
                                154                 :                : /*
                                155                 :                :  * mdinit() -- Initialize private state for magnetic disk storage manager.
                                156                 :                :  */
                                157                 :                : void
 8327 tgl@sss.pgh.pa.us         158                 :          19578 : mdinit(void)
                                159                 :                : {
 8691                           160                 :          19578 :     MdCxt = AllocSetContextCreate(TopMemoryContext,
                                161                 :                :                                   "MdSmgr",
                                162                 :                :                                   ALLOCSET_DEFAULT_SIZES);
 5407 heikki.linnakangas@i      163                 :          19578 : }
                                164                 :                : 
                                165                 :                : /*
                                166                 :                :  * mdexists() -- Does the physical file exist?
                                167                 :                :  *
                                168                 :                :  * Note: this will return true for lingering files, with pending deletions
                                169                 :                :  */
                                170                 :                : bool
  573 pg@bowt.ie                171                 :        1123745 : mdexists(SMgrRelation reln, ForkNumber forknum)
                                172                 :                : {
                                173                 :                :     /*
                                174                 :                :      * Close it first, to ensure that we notice if the fork has been unlinked
                                175                 :                :      * since we opened it.  As an optimization, we can skip that in recovery,
                                176                 :                :      * which already closes relations when dropping them.
                                177                 :                :      */
  738 tmunro@postgresql.or      178         [ +  + ]:        1123745 :     if (!InRecovery)
  573 pg@bowt.ie                179                 :         526893 :         mdclose(reln, forknum);
                                180                 :                : 
                                181                 :        1123745 :     return (mdopenfork(reln, forknum, EXTENSION_RETURN_NULL) != NULL);
                                182                 :                : }
                                183                 :                : 
                                184                 :                : /*
                                185                 :                :  * mdcreate() -- Create a new relation on magnetic disk.
                                186                 :                :  *
                                187                 :                :  * If isRedo is true, it's okay for the relation to exist already.
                                188                 :                :  */
                                189                 :                : void
                                190                 :        3127289 : mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
                                191                 :                : {
                                192                 :                :     MdfdVec    *mdfd;
                                193                 :                :     char       *path;
                                194                 :                :     File        fd;
                                195                 :                : 
                                196   [ +  +  +  + ]:        3127289 :     if (isRedo && reln->md_num_open_segs[forknum] > 0)
 6311 tgl@sss.pgh.pa.us         197                 :        2982072 :         return;                 /* created and opened already... */
                                198                 :                : 
  573 pg@bowt.ie                199         [ -  + ]:         145217 :     Assert(reln->md_num_open_segs[forknum] == 0);
                                200                 :                : 
                                201                 :                :     /*
                                202                 :                :      * We may be using the target table space for the first time in this
                                203                 :                :      * database, so create a per-database subdirectory if needed.
                                204                 :                :      *
                                205                 :                :      * XXX this is a fairly ugly violation of module layering, but this seems
                                206                 :                :      * to be the best place to put the check.  Maybe TablespaceCreateDbspace
                                207                 :                :      * should be here and not in commands/tablespace.c?  But that would imply
                                208                 :                :      * importing a lot of stuff that smgr.c oughtn't know, either.
                                209                 :                :      */
  648 rhaas@postgresql.org      210                 :         145217 :     TablespaceCreateDbspace(reln->smgr_rlocator.locator.spcOid,
                                211                 :                :                             reln->smgr_rlocator.locator.dbOid,
                                212                 :                :                             isRedo);
                                213                 :                : 
  573 pg@bowt.ie                214                 :         145217 :     path = relpath(reln->smgr_rlocator, forknum);
                                215                 :                : 
  372 tmunro@postgresql.or      216                 :         145217 :     fd = PathNameOpenFile(path, _mdfd_open_flags() | O_CREAT | O_EXCL);
                                217                 :                : 
 9716 bruce@momjian.us          218         [ +  + ]:         145217 :     if (fd < 0)
                                219                 :                :     {
 8424                           220                 :           6182 :         int         save_errno = errno;
                                221                 :                : 
 1903 akapila@postgresql.o      222         [ +  - ]:           6182 :         if (isRedo)
  372 tmunro@postgresql.or      223                 :           6182 :             fd = PathNameOpenFile(path, _mdfd_open_flags());
 9716 bruce@momjian.us          224         [ -  + ]:           6182 :         if (fd < 0)
                                225                 :                :         {
                                226                 :                :             /* be sure to report the error reported by create, not open */
 8700 tgl@sss.pgh.pa.us         227                 :UBC           0 :             errno = save_errno;
 6311                           228         [ #  # ]:              0 :             ereport(ERROR,
                                229                 :                :                     (errcode_for_file_access(),
                                230                 :                :                      errmsg("could not create file \"%s\": %m", path)));
                                231                 :                :         }
                                232                 :                :     }
                                233                 :                : 
 8558 tgl@sss.pgh.pa.us         234                 :CBC      145217 :     pfree(path);
                                235                 :                : 
  573 pg@bowt.ie                236                 :         145217 :     _fdvec_resize(reln, forknum, 1);
                                237                 :         145217 :     mdfd = &reln->md_seg_fds[forknum][0];
 2775 andres@anarazel.de        238                 :         145217 :     mdfd->mdfd_vfd = fd;
                                239                 :         145217 :     mdfd->mdfd_segno = 0;
                                240                 :                : 
  285 heikki.linnakangas@i      241         [ +  + ]:         145217 :     if (!SmgrIsTemp(reln))
                                242                 :         142168 :         register_dirty_segment(reln, forknum, mdfd);
                                243                 :                : }
                                244                 :                : 
                                245                 :                : /*
                                246                 :                :  * mdunlink() -- Unlink a relation.
                                247                 :                :  *
                                248                 :                :  * Note that we're passed a RelFileLocatorBackend --- by the time this is called,
                                249                 :                :  * there won't be an SMgrRelation hashtable entry anymore.
                                250                 :                :  *
                                251                 :                :  * forknum can be a fork number to delete a specific fork, or InvalidForkNumber
                                252                 :                :  * to delete all forks.
                                253                 :                :  *
                                254                 :                :  * For regular relations, we don't unlink the first segment file of the rel,
                                255                 :                :  * but just truncate it to zero length, and record a request to unlink it after
                                256                 :                :  * the next checkpoint.  Additional segments can be unlinked immediately,
                                257                 :                :  * however.  Leaving the empty file in place prevents that relfilenumber
                                258                 :                :  * from being reused.  The scenario this protects us from is:
                                259                 :                :  * 1. We delete a relation (and commit, and actually remove its file).
                                260                 :                :  * 2. We create a new relation, which by chance gets the same relfilenumber as
                                261                 :                :  *    the just-deleted one (OIDs must've wrapped around for that to happen).
                                262                 :                :  * 3. We crash before another checkpoint occurs.
                                263                 :                :  * During replay, we would delete the file and then recreate it, which is fine
                                264                 :                :  * if the contents of the file were repopulated by subsequent WAL entries.
                                265                 :                :  * But if we didn't WAL-log insertions, but instead relied on fsyncing the
                                266                 :                :  * file after populating it (as we do at wal_level=minimal), the contents of
                                267                 :                :  * the file would be lost forever.  By leaving the empty file until after the
                                268                 :                :  * next checkpoint, we prevent reassignment of the relfilenumber until it's
                                269                 :                :  * safe, because relfilenumber assignment skips over any existing file.
                                270                 :                :  *
                                271                 :                :  * Additional segments, if any, are truncated and then unlinked.  The reason
                                272                 :                :  * for truncating is that other backends may still hold open FDs for these at
                                273                 :                :  * the smgr level, so that the kernel can't remove the file yet.  We want to
                                274                 :                :  * reclaim the disk space right away despite that.
                                275                 :                :  *
                                276                 :                :  * We do not need to go through this dance for temp relations, though, because
                                277                 :                :  * we never make WAL entries for temp rels, and so a temp rel poses no threat
                                278                 :                :  * to the health of a regular rel that has taken over its relfilenumber.
                                279                 :                :  * The fact that temp rels and regular rels have different file naming
                                280                 :                :  * patterns provides additional safety.  Other backends shouldn't have open
                                281                 :                :  * FDs for them, either.
                                282                 :                :  *
                                283                 :                :  * We also don't do it while performing a binary upgrade.  There is no reuse
                                284                 :                :  * hazard in that case, since after a crash or even a simple ERROR, the
                                285                 :                :  * upgrade fails and the whole cluster must be recreated from scratch.
                                286                 :                :  * Furthermore, it is important to remove the files from disk immediately,
                                287                 :                :  * because we may be about to reuse the same relfilenumber.
                                288                 :                :  *
                                289                 :                :  * All the above applies only to the relation's main fork; other forks can
                                290                 :                :  * just be removed immediately, since they are not needed to prevent the
                                291                 :                :  * relfilenumber from being recycled.  Also, we do not carefully
                                292                 :                :  * track whether other forks have been created or not, but just attempt to
                                293                 :                :  * unlink them unconditionally; so we should never complain about ENOENT.
                                294                 :                :  *
                                295                 :                :  * If isRedo is true, it's unsurprising for the relation to be already gone.
                                296                 :                :  * Also, we should remove the file immediately instead of queuing a request
                                297                 :                :  * for later, since during redo there's no possibility of creating a
                                298                 :                :  * conflicting relation.
                                299                 :                :  *
                                300                 :                :  * Note: we currently just never warn about ENOENT at all.  We could warn in
                                301                 :                :  * the main-fork, non-isRedo case, but it doesn't seem worth the trouble.
                                302                 :                :  *
                                303                 :                :  * Note: any failure should be reported as WARNING not ERROR, because
                                304                 :                :  * we are usually not in a transaction anymore when this is called.
                                305                 :                :  */
                                306                 :                : void
  573 pg@bowt.ie                307                 :         169924 : mdunlink(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                                308                 :                : {
                                309                 :                :     /* Now do the per-fork work */
                                310         [ -  + ]:         169924 :     if (forknum == InvalidForkNumber)
                                311                 :                :     {
  573 pg@bowt.ie                312         [ #  # ]:UBC           0 :         for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
                                313                 :              0 :             mdunlinkfork(rlocator, forknum, isRedo);
                                314                 :                :     }
                                315                 :                :     else
  573 pg@bowt.ie                316                 :CBC      169924 :         mdunlinkfork(rlocator, forknum, isRedo);
 4287 tgl@sss.pgh.pa.us         317                 :         169924 : }
                                318                 :                : 
                                319                 :                : /*
                                320                 :                :  * Truncate a file to release disk space.
                                321                 :                :  */
                                322                 :                : static int
 1230 tmunro@postgresql.or      323                 :         199495 : do_truncate(const char *path)
                                324                 :                : {
                                325                 :                :     int         save_errno;
                                326                 :                :     int         ret;
                                327                 :                : 
                                328                 :         199495 :     ret = pg_truncate(path, 0);
                                329                 :                : 
                                330                 :                :     /* Log a warning here to avoid repetition in callers. */
                                331   [ +  +  -  + ]:         199495 :     if (ret < 0 && errno != ENOENT)
                                332                 :                :     {
 1230 tmunro@postgresql.or      333                 :UBC           0 :         save_errno = errno;
                                334         [ #  # ]:              0 :         ereport(WARNING,
                                335                 :                :                 (errcode_for_file_access(),
                                336                 :                :                  errmsg("could not truncate file \"%s\": %m", path)));
                                337                 :              0 :         errno = save_errno;
                                338                 :                :     }
                                339                 :                : 
 1230 tmunro@postgresql.or      340                 :CBC      199495 :     return ret;
                                341                 :                : }
                                342                 :                : 
                                343                 :                : static void
  573 pg@bowt.ie                344                 :         169924 : mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                                345                 :                : {
                                346                 :                :     char       *path;
                                347                 :                :     int         ret;
                                348                 :                :     int         save_errno;
                                349                 :                : 
                                350                 :         169924 :     path = relpath(rlocator, forknum);
                                351                 :                : 
                                352                 :                :     /*
                                353                 :                :      * Truncate and then unlink the first segment, or just register a request
                                354                 :                :      * to unlink it later, as described in the comments for mdunlink().
                                355                 :                :      */
  522 tgl@sss.pgh.pa.us         356   [ +  +  +  +  :         169924 :     if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM ||
                                              +  + ]
                                357         [ +  + ]:          35454 :         RelFileLocatorBackendIsTemp(rlocator))
                                358                 :                :     {
  648 rhaas@postgresql.org      359         [ +  + ]:         137373 :         if (!RelFileLocatorBackendIsTemp(rlocator))
                                360                 :                :         {
                                361                 :                :             /* Prevent other backends' fds from holding on to the disk space */
 1230 tmunro@postgresql.or      362                 :         125761 :             ret = do_truncate(path);
                                363                 :                : 
                                364                 :                :             /* Forget any pending sync requests for the first segment */
  524 tgl@sss.pgh.pa.us         365                 :         125761 :             save_errno = errno;
  573 pg@bowt.ie                366                 :         125761 :             register_forget_request(rlocator, forknum, 0 /* first seg */ );
  524 tgl@sss.pgh.pa.us         367                 :         125761 :             errno = save_errno;
                                368                 :                :         }
                                369                 :                :         else
 1230 tmunro@postgresql.or      370                 :          11612 :             ret = 0;
                                371                 :                : 
                                372                 :                :         /* Next unlink the file, unless it was already found to be missing */
  522 tgl@sss.pgh.pa.us         373   [ +  +  -  + ]:         137373 :         if (ret >= 0 || errno != ENOENT)
                                374                 :                :         {
 1230 tmunro@postgresql.or      375                 :          20248 :             ret = unlink(path);
                                376   [ +  +  -  + ]:          20248 :             if (ret < 0 && errno != ENOENT)
                                377                 :                :             {
  522 tgl@sss.pgh.pa.us         378                 :UBC           0 :                 save_errno = errno;
 1230 tmunro@postgresql.or      379         [ #  # ]:              0 :                 ereport(WARNING,
                                380                 :                :                         (errcode_for_file_access(),
                                381                 :                :                          errmsg("could not remove file \"%s\": %m", path)));
  522 tgl@sss.pgh.pa.us         382                 :              0 :                 errno = save_errno;
                                383                 :                :             }
                                384                 :                :         }
                                385                 :                :     }
                                386                 :                :     else
                                387                 :                :     {
                                388                 :                :         /* Prevent other backends' fds from holding on to the disk space */
 1230 tmunro@postgresql.or      389                 :CBC       32551 :         ret = do_truncate(path);
                                390                 :                : 
                                391                 :                :         /* Register request to unlink first segment later */
  522 tgl@sss.pgh.pa.us         392                 :          32551 :         save_errno = errno;
                                393                 :          32551 :         register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
                                394                 :          32551 :         errno = save_errno;
                                395                 :                :     }
                                396                 :                : 
                                397                 :                :     /*
                                398                 :                :      * Delete any additional segments.
                                399                 :                :      *
                                400                 :                :      * Note that because we loop until getting ENOENT, we will correctly
                                401                 :                :      * remove all inactive segments as well as active ones.  Ideally we'd
                                402                 :                :      * continue the loop until getting exactly that errno, but that risks an
                                403                 :                :      * infinite loop if the problem is directory-wide (for instance, if we
                                404                 :                :      * suddenly can't read the data directory itself).  We compromise by
                                405                 :                :      * continuing after a non-ENOENT truncate error, but stopping after any
                                406                 :                :      * unlink error.  If there is indeed a directory-wide problem, additional
                                407                 :                :      * unlink attempts wouldn't work anyway.
                                408                 :                :      */
                                409   [ +  +  -  + ]:         169924 :     if (ret >= 0 || errno != ENOENT)
                                410                 :                :     {
 8558                           411                 :          44222 :         char       *segpath = (char *) palloc(strlen(path) + 12);
                                412                 :                :         BlockNumber segno;
                                413                 :                : 
  522 tgl@sss.pgh.pa.us         414                 :UBC           0 :         for (segno = 1;; segno++)
                                415                 :                :         {
  522 tgl@sss.pgh.pa.us         416                 :CBC       44222 :             sprintf(segpath, "%s.%u", path, segno);
                                417                 :                : 
  648 rhaas@postgresql.org      418         [ +  + ]:          44222 :             if (!RelFileLocatorBackendIsTemp(rlocator))
                                419                 :                :             {
                                420                 :                :                 /*
                                421                 :                :                  * Prevent other backends' fds from holding on to the disk
                                422                 :                :                  * space.  We're done if we see ENOENT, though.
                                423                 :                :                  */
 1230 tmunro@postgresql.or      424   [ +  -  +  - ]:          41183 :                 if (do_truncate(segpath) < 0 && errno == ENOENT)
                                425                 :          41183 :                     break;
                                426                 :                : 
                                427                 :                :                 /*
                                428                 :                :                  * Forget any pending sync requests for this segment before we
                                429                 :                :                  * try to unlink.
                                430                 :                :                  */
  573 pg@bowt.ie                431                 :UBC           0 :                 register_forget_request(rlocator, forknum, segno);
                                432                 :                :             }
                                433                 :                : 
 8558 tgl@sss.pgh.pa.us         434         [ +  - ]:CBC        3039 :             if (unlink(segpath) < 0)
                                435                 :                :             {
                                436                 :                :                 /* ENOENT is expected after the last segment... */
                                437         [ -  + ]:           3039 :                 if (errno != ENOENT)
 6311 tgl@sss.pgh.pa.us         438         [ #  # ]:UBC           0 :                     ereport(WARNING,
                                439                 :                :                             (errcode_for_file_access(),
                                440                 :                :                              errmsg("could not remove file \"%s\": %m", segpath)));
 8558 tgl@sss.pgh.pa.us         441                 :CBC        3039 :                 break;
                                442                 :                :             }
                                443                 :                :         }
                                444                 :          44222 :         pfree(segpath);
                                445                 :                :     }
                                446                 :                : 
                                447                 :         169924 :     pfree(path);
10141 scrappy@hub.org           448                 :         169924 : }
                                449                 :                : 
                                450                 :                : /*
                                451                 :                :  * mdextend() -- Add a block to the specified relation.
                                452                 :                :  *
                                453                 :                :  * The semantics are nearly the same as mdwrite(): write at the
                                454                 :                :  * specified position.  However, this is to be used for the case of
                                455                 :                :  * extending a relation (i.e., blocknum is at or beyond the current
                                456                 :                :  * EOF).  Note that we assume writing a block beyond current EOF
                                457                 :                :  * causes intervening file space to become filled with zeroes.
                                458                 :                :  */
                                459                 :                : void
 5725 heikki.linnakangas@i      460                 :         106809 : mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                461                 :                :          const void *buffer, bool skipFsync)
                                462                 :                : {
                                463                 :                :     off_t       seekpos;
                                464                 :                :     int         nbytes;
                                465                 :                :     MdfdVec    *v;
                                466                 :                : 
                                467                 :                :     /* If this build supports direct I/O, the buffer must be I/O aligned. */
                                468                 :                :     if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
  372 tmunro@postgresql.or      469         [ -  + ]:         106809 :         Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer));
                                470                 :                : 
                                471                 :                :     /* This assert is too expensive to have on normally ... */
                                472                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                                473                 :                :     Assert(blocknum >= mdnblocks(reln, forknum));
                                474                 :                : #endif
                                475                 :                : 
                                476                 :                :     /*
                                477                 :                :      * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
                                478                 :                :      * more --- we mustn't create a block whose number actually is
                                479                 :                :      * InvalidBlockNumber.  (Note that this failure should be unreachable
                                480                 :                :      * because of upstream checks in bufmgr.c.)
                                481                 :                :      */
 6311 tgl@sss.pgh.pa.us         482         [ -  + ]:         106809 :     if (blocknum == InvalidBlockNumber)
 6311 tgl@sss.pgh.pa.us         483         [ #  # ]:UBC           0 :         ereport(ERROR,
                                484                 :                :                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                485                 :                :                  errmsg("cannot extend file \"%s\" beyond %u blocks",
                                486                 :                :                         relpath(reln->smgr_rlocator, forknum),
                                487                 :                :                         InvalidBlockNumber)));
                                488                 :                : 
 4993 rhaas@postgresql.org      489                 :CBC      106809 :     v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
                                490                 :                : 
 2489 tgl@sss.pgh.pa.us         491                 :         106809 :     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                492                 :                : 
 5879                           493         [ -  + ]:         106809 :     Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                494                 :                : 
 1985 tmunro@postgresql.or      495         [ -  + ]:         106809 :     if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
                                496                 :                :     {
 6311 tgl@sss.pgh.pa.us         497         [ #  # ]:UBC           0 :         if (nbytes < 0)
                                498         [ #  # ]:              0 :             ereport(ERROR,
                                499                 :                :                     (errcode_for_file_access(),
                                500                 :                :                      errmsg("could not extend file \"%s\": %m",
                                501                 :                :                             FilePathName(v->mdfd_vfd)),
                                502                 :                :                      errhint("Check free disk space.")));
                                503                 :                :         /* short write: complain appropriately */
                                504         [ #  # ]:              0 :         ereport(ERROR,
                                505                 :                :                 (errcode(ERRCODE_DISK_FULL),
                                506                 :                :                  errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u",
                                507                 :                :                         FilePathName(v->mdfd_vfd),
                                508                 :                :                         nbytes, BLCKSZ, blocknum),
                                509                 :                :                  errhint("Check free disk space.")));
                                510                 :                :     }
                                511                 :                : 
 4993 rhaas@postgresql.org      512   [ +  +  +  - ]:CBC      106809 :     if (!skipFsync && !SmgrIsTemp(reln))
 5725 heikki.linnakangas@i      513                 :             29 :         register_dirty_segment(reln, forknum, v);
                                514                 :                : 
                                515         [ -  + ]:         106809 :     Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
10141 scrappy@hub.org           516                 :         106809 : }
                                517                 :                : 
                                518                 :                : /*
                                519                 :                :  * mdzeroextend() -- Add new zeroed out blocks to the specified relation.
                                520                 :                :  *
                                521                 :                :  * Similar to mdextend(), except the relation can be extended by multiple
                                522                 :                :  * blocks at once and the added blocks will be filled with zeroes.
                                523                 :                :  */
                                524                 :                : void
  375 andres@anarazel.de        525                 :         195784 : mdzeroextend(SMgrRelation reln, ForkNumber forknum,
                                526                 :                :              BlockNumber blocknum, int nblocks, bool skipFsync)
                                527                 :                : {
                                528                 :                :     MdfdVec    *v;
                                529                 :         195784 :     BlockNumber curblocknum = blocknum;
                                530                 :         195784 :     int         remblocks = nblocks;
                                531                 :                : 
                                532         [ -  + ]:         195784 :     Assert(nblocks > 0);
                                533                 :                : 
                                534                 :                :     /* This assert is too expensive to have on normally ... */
                                535                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                                536                 :                :     Assert(blocknum >= mdnblocks(reln, forknum));
                                537                 :                : #endif
                                538                 :                : 
                                539                 :                :     /*
                                540                 :                :      * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
                                541                 :                :      * more --- we mustn't create a block whose number actually is
                                542                 :                :      * InvalidBlockNumber or larger.
                                543                 :                :      */
                                544         [ -  + ]:         195784 :     if ((uint64) blocknum + nblocks >= (uint64) InvalidBlockNumber)
  375 andres@anarazel.de        545         [ #  # ]:UBC           0 :         ereport(ERROR,
                                546                 :                :                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                547                 :                :                  errmsg("cannot extend file \"%s\" beyond %u blocks",
                                548                 :                :                         relpath(reln->smgr_rlocator, forknum),
                                549                 :                :                         InvalidBlockNumber)));
                                550                 :                : 
  375 andres@anarazel.de        551         [ +  + ]:CBC      391568 :     while (remblocks > 0)
                                552                 :                :     {
  331 tgl@sss.pgh.pa.us         553                 :         195784 :         BlockNumber segstartblock = curblocknum % ((BlockNumber) RELSEG_SIZE);
  375 andres@anarazel.de        554                 :         195784 :         off_t       seekpos = (off_t) BLCKSZ * segstartblock;
                                555                 :                :         int         numblocks;
                                556                 :                : 
                                557         [ -  + ]:         195784 :         if (segstartblock + remblocks > RELSEG_SIZE)
  375 andres@anarazel.de        558                 :UBC           0 :             numblocks = RELSEG_SIZE - segstartblock;
                                559                 :                :         else
  375 andres@anarazel.de        560                 :CBC      195784 :             numblocks = remblocks;
                                561                 :                : 
                                562                 :         195784 :         v = _mdfd_getseg(reln, forknum, curblocknum, skipFsync, EXTENSION_CREATE);
                                563                 :                : 
                                564         [ -  + ]:         195784 :         Assert(segstartblock < RELSEG_SIZE);
                                565         [ -  + ]:         195784 :         Assert(segstartblock + numblocks <= RELSEG_SIZE);
                                566                 :                : 
                                567                 :                :         /*
                                568                 :                :          * If available and useful, use posix_fallocate() (via
                                569                 :                :          * FileFallocate()) to extend the relation. That's often more
                                570                 :                :          * efficient than using write(), as it commonly won't cause the kernel
                                571                 :                :          * to allocate page cache space for the extended pages.
                                572                 :                :          *
                                573                 :                :          * However, we don't use FileFallocate() for small extensions, as it
                                574                 :                :          * defeats delayed allocation on some filesystems. Not clear where
                                575                 :                :          * that decision should be made though? For now just use a cutoff of
                                576                 :                :          * 8, anything between 4 and 8 worked OK in some local testing.
                                577                 :                :          */
                                578         [ +  + ]:         195784 :         if (numblocks > 8)
                                579                 :                :         {
                                580                 :                :             int         ret;
                                581                 :                : 
                                582                 :            509 :             ret = FileFallocate(v->mdfd_vfd,
                                583                 :                :                                 seekpos, (off_t) BLCKSZ * numblocks,
                                584                 :                :                                 WAIT_EVENT_DATA_FILE_EXTEND);
                                585         [ -  + ]:            509 :             if (ret != 0)
                                586                 :                :             {
  375 andres@anarazel.de        587         [ #  # ]:UBC           0 :                 ereport(ERROR,
                                588                 :                :                         errcode_for_file_access(),
                                589                 :                :                         errmsg("could not extend file \"%s\" with FileFallocate(): %m",
                                590                 :                :                                FilePathName(v->mdfd_vfd)),
                                591                 :                :                         errhint("Check free disk space."));
                                592                 :                :             }
                                593                 :                :         }
                                594                 :                :         else
                                595                 :                :         {
                                596                 :                :             int         ret;
                                597                 :                : 
                                598                 :                :             /*
                                599                 :                :              * Even if we don't want to use fallocate, we can still extend a
                                600                 :                :              * bit more efficiently than writing each 8kB block individually.
                                601                 :                :              * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry()
                                602                 :                :              * to avoid multiple writes or needing a zeroed buffer for the
                                603                 :                :              * whole length of the extension.
                                604                 :                :              */
  375 andres@anarazel.de        605                 :CBC      195275 :             ret = FileZero(v->mdfd_vfd,
                                606                 :                :                            seekpos, (off_t) BLCKSZ * numblocks,
                                607                 :                :                            WAIT_EVENT_DATA_FILE_EXTEND);
                                608         [ -  + ]:         195275 :             if (ret < 0)
  375 andres@anarazel.de        609         [ #  # ]:UBC           0 :                 ereport(ERROR,
                                610                 :                :                         errcode_for_file_access(),
                                611                 :                :                         errmsg("could not extend file \"%s\": %m",
                                612                 :                :                                FilePathName(v->mdfd_vfd)),
                                613                 :                :                         errhint("Check free disk space."));
                                614                 :                :         }
                                615                 :                : 
  375 andres@anarazel.de        616   [ +  -  +  + ]:CBC      195784 :         if (!skipFsync && !SmgrIsTemp(reln))
                                617                 :         186740 :             register_dirty_segment(reln, forknum, v);
                                618                 :                : 
                                619         [ -  + ]:         195784 :         Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
                                620                 :                : 
                                621                 :         195784 :         remblocks -= numblocks;
                                622                 :         195784 :         curblocknum += numblocks;
                                623                 :                :     }
                                624                 :         195784 : }
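The sketch below is standalone C that reproduces the per-segment planning done by the loop above, so it can be compiled outside PostgreSQL. It assumes the default build values of 8 kB blocks and 131072 blocks per segment (1 GB segment files). SKETCH_BLCKSZ, SKETCH_RELSEG_SIZE, ExtendChunk and plan_zero_extend() are hypothetical names that do not appear in md.c. Each chunk is clipped at the segment boundary, which matches the Assert(segstartblock + numblocks <= RELSEG_SIZE) above. Chunks of more than 8 blocks are marked for FileFallocate(), and smaller ones are marked for FileZero().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default build values: 8 kB blocks, 1 GB segment files. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_RELSEG_SIZE 131072u
#define SKETCH_FALLOCATE_CUTOFF 8   /* mirrors "numblocks > 8" in md.c */

typedef struct ExtendChunk
{
    uint32_t segno;          /* which segment file receives the blocks */
    uint32_t segstartblock;  /* first new block, relative to that segment */
    uint32_t numblocks;      /* number of blocks added in this segment */
    int64_t  seekpos;        /* byte offset within the segment file */
    bool     use_fallocate;  /* true: FileFallocate(); false: FileZero() */
} ExtendChunk;

/*
 * Split an extension of nblocks starting at blocknum into per-segment
 * chunks.  Returns the number of chunks stored in out (at most maxchunks).
 */
static int
plan_zero_extend(uint32_t blocknum, uint32_t nblocks,
                 ExtendChunk *out, int maxchunks)
{
    uint32_t curblocknum = blocknum;
    uint32_t remblocks = nblocks;
    int      n = 0;

    while (remblocks > 0 && n < maxchunks)
    {
        uint32_t segstartblock = curblocknum % SKETCH_RELSEG_SIZE;
        uint32_t numblocks = remblocks;

        /* A single chunk never crosses into the next segment file. */
        if (segstartblock + numblocks > SKETCH_RELSEG_SIZE)
            numblocks = SKETCH_RELSEG_SIZE - segstartblock;
        assert(segstartblock + numblocks <= SKETCH_RELSEG_SIZE);

        out[n].segno = curblocknum / SKETCH_RELSEG_SIZE;
        out[n].segstartblock = segstartblock;
        out[n].numblocks = numblocks;
        out[n].seekpos = (int64_t) SKETCH_BLCKSZ * segstartblock;
        out[n].use_fallocate = numblocks > SKETCH_FALLOCATE_CUTOFF;
        n++;

        remblocks -= numblocks;
        curblocknum += numblocks;
    }
    return n;
}
```

For example, an extension of 20 blocks starting at block 131070 produces two chunks. The first is 2 blocks at the end of segment 0, which are written as zeroes. The second is 18 blocks at the start of segment 1, which are large enough for fallocate.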
                                625                 :                : 
                                626                 :                : /*
                                627                 :                :  * mdopenfork() -- Open one fork of the specified relation.
                                628                 :                :  *
                                629                 :                :  * Note we only open the first segment, when there are multiple segments.
                                630                 :                :  *
                                631                 :                :  * If first segment is not present, either ereport or return NULL according
                                632                 :                :  * to "behavior".  We treat EXTENSION_CREATE the same as EXTENSION_FAIL;
                                633                 :                :  * EXTENSION_CREATE means it's OK to extend an existing relation, not to
                                634                 :                :  * invent one out of whole cloth.
                                635                 :                :  */
                                636                 :                : static MdfdVec *
 1733 tmunro@postgresql.or      637                 :        3245026 : mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
                                638                 :                : {
                                639                 :                :     MdfdVec    *mdfd;
                                640                 :                :     char       *path;
                                641                 :                :     File        fd;
                                642                 :                : 
                                643                 :                :     /* No work if already open */
 2775 andres@anarazel.de        644         [ +  + ]:        3245026 :     if (reln->md_num_open_segs[forknum] > 0)
                                645                 :        2180455 :         return &reln->md_seg_fds[forknum][0];
                                646                 :                : 
  648 rhaas@postgresql.org      647                 :        1064571 :     path = relpath(reln->smgr_rlocator, forknum);
                                648                 :                : 
  372 tmunro@postgresql.or      649                 :        1064571 :     fd = PathNameOpenFile(path, _mdfd_open_flags());
                                650                 :                : 
 9716 bruce@momjian.us          651         [ +  + ]:        1064571 :     if (fd < 0)
                                652                 :                :     {
 1903 akapila@postgresql.o      653         [ +  + ]:         359003 :         if ((behavior & EXTENSION_RETURN_NULL) &&
                                654         [ +  - ]:         358981 :             FILE_POSSIBLY_DELETED(errno))
                                655                 :                :         {
                                656                 :         358981 :             pfree(path);
                                657                 :         358981 :             return NULL;
                                658                 :                :         }
                                659         [ +  - ]:             22 :         ereport(ERROR,
                                660                 :                :                 (errcode_for_file_access(),
                                661                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                                662                 :                :     }
                                663                 :                : 
 8558 tgl@sss.pgh.pa.us         664                 :         705568 :     pfree(path);
                                665                 :                : 
 2775 andres@anarazel.de        666                 :         705568 :     _fdvec_resize(reln, forknum, 1);
                                667                 :         705568 :     mdfd = &reln->md_seg_fds[forknum][0];
 7258 tgl@sss.pgh.pa.us         668                 :         705568 :     mdfd->mdfd_vfd = fd;
                                669                 :         705568 :     mdfd->mdfd_segno = 0;
                                670                 :                : 
 5725 heikki.linnakangas@i      671         [ -  + ]:         705568 :     Assert(_mdnblocks(reln, forknum, mdfd) <= ((BlockNumber) RELSEG_SIZE));
                                672                 :                : 
 7258 tgl@sss.pgh.pa.us         673                 :         705568 :     return mdfd;
                                674                 :                : }
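mdopenfork() either opens the file or returns NULL, and plain POSIX open(2) is enough to illustrate that contract. This is a hedged sketch. SKETCH_RETURN_NULL, SKETCH_FAIL and open_fork_or_null() are hypothetical stand-ins for the EXTENSION_* flags and the PathNameOpenFile() call. O_RDWR is assumed to approximate _mdfd_open_flags() when direct I/O is off. file_possibly_deleted() assumes the non-Windows meaning of FILE_POSSIBLY_DELETED(), under which only ENOENT counts. As in the code above, any other errno is still an error even when the caller asked for NULL.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for md.c's EXTENSION_* values. */
#define SKETCH_RETURN_NULL 0x01
#define SKETCH_FAIL        0x02

/* Assumed non-Windows meaning of FILE_POSSIBLY_DELETED(): only ENOENT. */
static bool
file_possibly_deleted(int err)
{
    return err == ENOENT;
}

/*
 * Returns a file descriptor (>= 0) on success; -1 when the file is missing
 * and the caller passed SKETCH_RETURN_NULL (where mdopenfork() returns
 * NULL); -2 for every other failure (where mdopenfork() would ereport).
 */
static int
open_fork_or_null(const char *path, int behavior)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR);    /* assumed stand-in for _mdfd_open_flags() */
    if (fd >= 0)
        return fd;
    if ((behavior & SKETCH_RETURN_NULL) && file_possibly_deleted(errno))
        return -1;
    return -2;
}
```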
                                675                 :                : 
                                676                 :                : /*
                                677                 :                :  * mdopen() -- Initialize newly-opened relation.
                                678                 :                :  */
                                679                 :                : void
 1733 tmunro@postgresql.or      680                 :         976733 : mdopen(SMgrRelation reln)
                                681                 :                : {
                                682                 :                :     /* mark it not open */
                                683         [ +  + ]:        4883665 :     for (int forknum = 0; forknum <= MAX_FORKNUM; forknum++)
                                684                 :        3906932 :         reln->md_num_open_segs[forknum] = 0;
                                685                 :         976733 : }
                                686                 :                : 
                                687                 :                : /*
                                688                 :                :  * mdclose() -- Close the specified relation, if it isn't closed already.
                                689                 :                :  */
                                690                 :                : void
 5725 heikki.linnakangas@i      691                 :        3357189 : mdclose(SMgrRelation reln, ForkNumber forknum)
                                692                 :                : {
 2775 andres@anarazel.de        693                 :        3357189 :     int         nopensegs = reln->md_num_open_segs[forknum];
                                694                 :                : 
                                695                 :                :     /* No work if already closed */
                                696         [ +  + ]:        3357189 :     if (nopensegs == 0)
 6311 tgl@sss.pgh.pa.us         697                 :        2882951 :         return;
                                698                 :                : 
                                699                 :                :     /* close segments starting from the end */
 2775 andres@anarazel.de        700         [ +  + ]:         948476 :     while (nopensegs > 0)
                                701                 :                :     {
                                702                 :         474238 :         MdfdVec    *v = &reln->md_seg_fds[forknum][nopensegs - 1];
                                703                 :                : 
 1556 noah@leadboat.com         704                 :         474238 :         FileClose(v->mdfd_vfd);
                                705                 :         474238 :         _fdvec_resize(reln, forknum, nopensegs - 1);
 2775 andres@anarazel.de        706                 :         474238 :         nopensegs--;
                                707                 :                :     }
                                708                 :                : }
                                709                 :                : 
                                710                 :                : /*
                                711                 :                :  * mdprefetch() -- Initiate asynchronous read of the specified blocks of a relation
                                712                 :                :  */
                                713                 :                : bool
  120 tmunro@postgresql.or      714                 :GNC      137257 : mdprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                715                 :                :            int nblocks)
                                716                 :                : {
                                717                 :                : #ifdef USE_PREFETCH
                                718                 :                : 
  372 tmunro@postgresql.or      719         [ -  + ]:CBC      137257 :     Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
                                720                 :                : 
  120 tmunro@postgresql.or      721         [ -  + ]:GNC      137257 :     if ((uint64) blocknum + nblocks > (uint64) MaxBlockNumber + 1)
 1467 tmunro@postgresql.or      722                 :UBC           0 :         return false;
                                723                 :                : 
  120 tmunro@postgresql.or      724         [ +  + ]:GNC      274514 :     while (nblocks > 0)
                                725                 :                :     {
                                726                 :                :         off_t       seekpos;
                                727                 :                :         MdfdVec    *v;
                                728                 :                :         int         nblocks_this_segment;
                                729                 :                : 
                                730                 :         137257 :         v = _mdfd_getseg(reln, forknum, blocknum, false,
                                731         [ +  + ]:         137257 :                          InRecovery ? EXTENSION_RETURN_NULL : EXTENSION_FAIL);
                                732         [ -  + ]:         137257 :         if (v == NULL)
  120 tmunro@postgresql.or      733                 :UNC           0 :             return false;
                                734                 :                : 
  120 tmunro@postgresql.or      735                 :GNC      137257 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                736                 :                : 
                                737         [ -  + ]:         137257 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                738                 :                : 
                                739                 :         137257 :         nblocks_this_segment =
                                740                 :         137257 :             Min(nblocks,
                                741                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                742                 :                : 
                                743                 :         137257 :         (void) FilePrefetch(v->mdfd_vfd, seekpos, BLCKSZ * nblocks_this_segment,
                                744                 :                :                             WAIT_EVENT_DATA_FILE_PREFETCH);
                                745                 :                : 
                                746                 :         137257 :         blocknum += nblocks_this_segment;
                                747                 :         137257 :         nblocks -= nblocks_this_segment;
                                748                 :                :     }
                                749                 :                : #endif                          /* USE_PREFETCH */
                                750                 :                : 
 1467 tmunro@postgresql.or      751                 :CBC      137257 :     return true;
                                752                 :                : }
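The segment arithmetic in mdprefetch() can be checked on its own. The sketch below assumes the same 131072-blocks-per-segment layout. It reproduces the overflow guard and the per-segment split, but it returns byte ranges instead of calling FilePrefetch(). PrefetchRange and plan_prefetch() are hypothetical names, and SKETCH_MAX_BLOCK_NUMBER assumes that MaxBlockNumber is 0xFFFFFFFE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build constants; see the lead-in. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_RELSEG_SIZE 131072u
#define SKETCH_MAX_BLOCK_NUMBER 0xFFFFFFFEu

typedef struct PrefetchRange
{
    uint32_t segno;   /* segment file the range falls in */
    int64_t  offset;  /* byte offset within that segment file */
    int64_t  length;  /* bytes to hint; never crosses the segment end */
} PrefetchRange;

/*
 * Returns the number of ranges written to out, or -1 when the request runs
 * past the last valid block (the case where mdprefetch() returns false).
 */
static int
plan_prefetch(uint32_t blocknum, int nblocks, PrefetchRange *out, int maxranges)
{
    int n = 0;

    /* Widen to 64 bits so blocknum + nblocks cannot wrap around. */
    if ((uint64_t) blocknum + nblocks > (uint64_t) SKETCH_MAX_BLOCK_NUMBER + 1)
        return -1;

    while (nblocks > 0 && n < maxranges)
    {
        uint32_t inseg = blocknum % SKETCH_RELSEG_SIZE;
        uint32_t room = SKETCH_RELSEG_SIZE - inseg;
        int      this_seg = (uint32_t) nblocks < room ? nblocks : (int) room;

        out[n].segno = blocknum / SKETCH_RELSEG_SIZE;
        out[n].offset = (int64_t) SKETCH_BLCKSZ * inseg;
        out[n].length = (int64_t) SKETCH_BLCKSZ * this_seg;
        assert(out[n].offset < (int64_t) SKETCH_BLCKSZ * SKETCH_RELSEG_SIZE);
        n++;

        blocknum += this_seg;
        nblocks -= this_seg;
    }
    return n;
}
```

A 4-block prefetch that starts 2 blocks before a segment boundary becomes two hints of 2 blocks each, one for each segment file. This matches the two iterations that the while loop above would make.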
                                753                 :                : 
                                754                 :                : /*
                                755                 :                :  * Convert an array of buffer addresses into an array of iovec objects, and
                                756                 :                :  * return the number that were required.  'iov' must have enough space for up
                                757                 :                :  * to 'nblocks' elements, but the number used may be less depending on
                                758                 :                :  * merging.  In the case of a run of fully contiguous buffers, a single iovec
                                759                 :                :  * will be populated that can be handled as a plain non-vectored I/O.
                                760                 :                :  */
                                761                 :                : static int
  118 tmunro@postgresql.or      762                 :GNC     1636798 : buffers_to_iovec(struct iovec *iov, void **buffers, int nblocks)
                                763                 :                : {
                                764                 :                :     struct iovec *iovp;
                                765                 :                :     int         iovcnt;
                                766                 :                : 
                                767         [ -  + ]:        1636798 :     Assert(nblocks >= 1);
                                768                 :                : 
                                769                 :                :     /* If this build supports direct I/O, buffers must be I/O aligned. */
                                770         [ +  + ]:        3311021 :     for (int i = 0; i < nblocks; ++i)
                                771                 :                :     {
                                772                 :                :         if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
                                773         [ -  + ]:        1674223 :             Assert((uintptr_t) buffers[i] ==
                                774                 :                :                    TYPEALIGN(PG_IO_ALIGN_SIZE, buffers[i]));
                                775                 :                :     }
                                776                 :                : 
                                777                 :                :     /* Start the first iovec off with the first buffer. */
                                778                 :        1636798 :     iovp = &iov[0];
                                779                 :        1636798 :     iovp->iov_base = buffers[0];
                                780                 :        1636798 :     iovp->iov_len = BLCKSZ;
                                781                 :        1636798 :     iovcnt = 1;
                                782                 :                : 
                                783                 :                :     /* Try to merge the rest. */
                                784         [ +  + ]:        1674223 :     for (int i = 1; i < nblocks; ++i)
                                785                 :                :     {
                                786                 :          37425 :         void       *buffer = buffers[i];
                                787                 :                : 
                                788         [ +  + ]:          37425 :         if (((char *) iovp->iov_base + iovp->iov_len) == buffer)
                                789                 :                :         {
                                790                 :                :             /* Contiguous with the last iovec. */
                                791                 :          35683 :             iovp->iov_len += BLCKSZ;
                                792                 :                :         }
                                793                 :                :         else
                                794                 :                :         {
                                795                 :                :             /* Need a new iovec. */
                                796                 :           1742 :             iovp++;
                                797                 :           1742 :             iovp->iov_base = buffer;
                                798                 :           1742 :             iovp->iov_len = BLCKSZ;
                                799                 :           1742 :             iovcnt++;
                                800                 :                :         }
                                801                 :                :     }
                                802                 :                : 
                                803                 :        1636798 :     return iovcnt;
                                804                 :                : }
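The iovec merging can be exercised outside PostgreSQL with a stripped-down copy of the logic above that omits the direct-I/O alignment check. The sketch assumes 8 kB blocks. sketch_buffers_to_iovec() and SKETCH_BLCKSZ are hypothetical names, and struct iovec comes from POSIX <sys/uio.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_BLCKSZ 8192   /* assumed block size */

/*
 * Merge BLCKSZ-sized buffers into as few iovecs as possible: a buffer that
 * starts exactly where the current iovec ends extends it, and anything else
 * starts a new iovec.  iov must have room for nblocks entries.
 */
static int
sketch_buffers_to_iovec(struct iovec *iov, void **buffers, int nblocks)
{
    struct iovec *iovp = &iov[0];
    int           iovcnt = 1;

    assert(nblocks >= 1);

    iovp->iov_base = buffers[0];
    iovp->iov_len = SKETCH_BLCKSZ;

    for (int i = 1; i < nblocks; ++i)
    {
        if ((char *) iovp->iov_base + iovp->iov_len == (char *) buffers[i])
            iovp->iov_len += SKETCH_BLCKSZ;    /* contiguous: grow */
        else
        {
            iovp++;                            /* gap: start a new iovec */
            iovp->iov_base = buffers[i];
            iovp->iov_len = SKETCH_BLCKSZ;
            iovcnt++;
        }
    }
    return iovcnt;
}
```

Given buffers at a, a + 8192, b and a + 16384, where b is a separate allocation, the function returns 3 iovecs of 16384, 8192 and 8192 bytes. A single vectored call such as preadv() can then transfer all four blocks at once.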
                                805                 :                : 
                                806                 :                : /*
                                807                 :                :  * mdreadv() -- Read the specified blocks from a relation.
                                808                 :                :  */
                                809                 :                : void
                                810                 :        1105525 : mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                811                 :                :         void **buffers, BlockNumber nblocks)
                                812                 :                : {
                                813         [ +  + ]:        2211035 :     while (nblocks > 0)
                                814                 :                :     {
                                815                 :                :         struct iovec iov[PG_IOV_MAX];
                                816                 :                :         int         iovcnt;
                                817                 :                :         off_t       seekpos;
                                818                 :                :         int         nbytes;
                                819                 :                :         MdfdVec    *v;
                                820                 :                :         BlockNumber nblocks_this_segment;
                                821                 :                :         size_t      transferred_this_segment;
                                822                 :                :         size_t      size_this_segment;
                                823                 :                : 
                                824                 :        1105525 :         v = _mdfd_getseg(reln, forknum, blocknum, false,
                                825                 :                :                          EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
                                826                 :                : 
                                827                 :        1105510 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                828                 :                : 
                                829         [ -  + ]:        1105510 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                830                 :                : 
                                831                 :        1105510 :         nblocks_this_segment =
                                832                 :        1105510 :             Min(nblocks,
                                833                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                834                 :        1105510 :         nblocks_this_segment = Min(nblocks_this_segment, lengthof(iov));
                                835                 :                : 
                                836                 :        1105510 :         iovcnt = buffers_to_iovec(iov, buffers, nblocks_this_segment);
                                837                 :        1105510 :         size_this_segment = nblocks_this_segment * BLCKSZ;
                                838                 :        1105510 :         transferred_this_segment = 0;
                                839                 :                : 
                                840                 :                :         /*
                                841                 :                :          * Inner loop to continue after a short read.  We'll keep going until
                                842                 :                :          * we hit EOF rather than assuming that a short read means we hit the
                                843                 :                :          * end.
                                844                 :                :          */
                                845                 :                :         for (;;)
                                846                 :                :         {
  118 tmunro@postgresql.or      847                 :UNC           0 :             TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                848                 :                :                                                 reln->smgr_rlocator.locator.spcOid,
                                849                 :                :                                                 reln->smgr_rlocator.locator.dbOid,
                                850                 :                :                                                 reln->smgr_rlocator.locator.relNumber,
                                851                 :                :                                                 reln->smgr_rlocator.backend);
  118 tmunro@postgresql.or      852                 :GNC     1105510 :             nbytes = FileReadV(v->mdfd_vfd, iov, iovcnt, seekpos,
                                853                 :                :                                WAIT_EVENT_DATA_FILE_READ);
                                854                 :                :             TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                855                 :                :                                                reln->smgr_rlocator.locator.spcOid,
                                856                 :                :                                                reln->smgr_rlocator.locator.dbOid,
                                857                 :                :                                                reln->smgr_rlocator.locator.relNumber,
                                858                 :                :                                                reln->smgr_rlocator.backend,
                                859                 :                :                                                nbytes,
                                860                 :                :                                                size_this_segment - transferred_this_segment);
                                861                 :                : 
                                862                 :                : #ifdef SIMULATE_SHORT_READ
                                863                 :                :             nbytes = Min(nbytes, 4096);
                                864                 :                : #endif
                                865                 :                : 
                                866         [ -  + ]:        1105510 :             if (nbytes < 0)
  118 tmunro@postgresql.or      867         [ #  # ]:UNC           0 :                 ereport(ERROR,
                                868                 :                :                         (errcode_for_file_access(),
                                869                 :                :                          errmsg("could not read blocks %u..%u in file \"%s\": %m",
                                870                 :                :                                 blocknum,
                                871                 :                :                                 blocknum + nblocks_this_segment - 1,
                                872                 :                :                                 FilePathName(v->mdfd_vfd))));
                                873                 :                : 
  118 tmunro@postgresql.or      874         [ -  + ]:GNC     1105510 :             if (nbytes == 0)
                                875                 :                :             {
                                876                 :                :                 /*
                                877                 :                :                  * We are at or past EOF, or we read a partial block at EOF.
                                878                 :                :                  * Normally this is an error; upper levels should never try to
                                879                 :                :                  * read a nonexistent block.  However, if zero_damaged_pages
                                880                 :                :                  * is ON or we are InRecovery, we should instead return zeroes
                                881                 :                :                  * without complaining.  This allows, for example, the case of
                                882                 :                :                  * trying to update a block that was later truncated away.
                                883                 :                :                  */
  118 tmunro@postgresql.or      884   [ #  #  #  # ]:UNC           0 :                 if (zero_damaged_pages || InRecovery)
                                885                 :                :                 {
                                886                 :              0 :                     for (BlockNumber i = transferred_this_segment / BLCKSZ;
                                887         [ #  # ]:              0 :                          i < nblocks_this_segment;
                                888                 :              0 :                          ++i)
                                889                 :              0 :                         memset(buffers[i], 0, BLCKSZ);
                                890                 :              0 :                     break;
                                891                 :                :                 }
                                892                 :                :                 else
                                893         [ #  # ]:              0 :                     ereport(ERROR,
                                894                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                                895                 :                :                              errmsg("could not read blocks %u..%u in file \"%s\": read only %zu of %zu bytes",
                                896                 :                :                                     blocknum,
                                897                 :                :                                     blocknum + nblocks_this_segment - 1,
                                898                 :                :                                     FilePathName(v->mdfd_vfd),
                                899                 :                :                                     transferred_this_segment,
                                900                 :                :                                     size_this_segment)));
                                901                 :                :             }
                                902                 :                : 
                                903                 :                :             /* One loop should usually be enough. */
  118 tmunro@postgresql.or      904                 :GNC     1105510 :             transferred_this_segment += nbytes;
                                905         [ -  + ]:        1105510 :             Assert(transferred_this_segment <= size_this_segment);
                                906         [ +  - ]:        1105510 :             if (transferred_this_segment == size_this_segment)
                                907                 :        1105510 :                 break;
                                908                 :                : 
                                909                 :                :             /* Adjust position and vectors after a short read. */
  118 tmunro@postgresql.or      910                 :UNC           0 :             seekpos += nbytes;
                                911                 :              0 :             iovcnt = compute_remaining_iovec(iov, iov, iovcnt, nbytes);
                                912                 :                :         }
                                913                 :                : 
  118 tmunro@postgresql.or      914                 :GNC     1105510 :         nblocks -= nblocks_this_segment;
                                915                 :        1105510 :         buffers += nblocks_this_segment;
                                916                 :        1105510 :         blocknum += nblocks_this_segment;
                                917                 :                :     }
10141 scrappy@hub.org           918                 :CBC     1105510 : }
                                919                 :                : 
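After a short read, the loop above advances seekpos and trims the iovec array with compute_remaining_iovec() before retrying. Below is a minimal sketch of that trimming step. It assumes the step only needs to do two things: skip iovecs that were transferred completely, and shorten the first one that was transferred partially. The helper name remaining_iovec_sketch() is hypothetical. It is not the PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical stand-in for compute_remaining_iovec(): drop the first
 * "transferred" bytes from the iovec array src, write what is left to dst,
 * and return how many iovecs remain.  dst may be the same array as src.
 */
static int
remaining_iovec_sketch(struct iovec *dst, const struct iovec *src,
                       int iovcnt, size_t transferred)
{
    /* Skip iovecs that were transferred completely. */
    while (iovcnt > 0 && transferred >= src->iov_len)
    {
        transferred -= src->iov_len;
        src++;
        iovcnt--;
    }

    /* Everything was transferred; nothing may be left over. */
    if (iovcnt == 0)
    {
        assert(transferred == 0);
        return 0;
    }

    /* Copy the remaining iovecs forward (safe when dst == original src). */
    for (int i = 0; i < iovcnt; i++)
        dst[i] = src[i];

    /* Trim the bytes already transferred from the first remaining iovec. */
    dst[0].iov_base = (char *) dst[0].iov_base + transferred;
    dst[0].iov_len -= transferred;
    return iovcnt;
}
```

md.c passes the same array as both source and destination. In this sketch that is safe because the copy only moves entries toward the front.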
                                920                 :                : /*
                                921                 :                :  * mdwritev() -- Write the supplied blocks at the appropriate location.
                                922                 :                :  *
                                923                 :                :  * This is to be used only for updating already-existing blocks of a
                                924                 :                :  * relation (ie, those before the current EOF).  To extend a relation,
                                925                 :                :  * use mdextend().
                                926                 :                :  */
                                927                 :                : void
  118 tmunro@postgresql.or      928                 :GNC      531288 : mdwritev(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                929                 :                :          const void **buffers, BlockNumber nblocks, bool skipFsync)
                                930                 :                : {
                                931                 :                :     /* This assert is too expensive to have on normally ... */
                                932                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                                933                 :                :     Assert(blocknum < mdnblocks(reln, forknum));
                                934                 :                : #endif
                                935                 :                : 
                                936         [ +  + ]:        1062576 :     while (nblocks > 0)
                                937                 :                :     {
                                938                 :                :         struct iovec iov[PG_IOV_MAX];
                                939                 :                :         int         iovcnt;
                                940                 :                :         off_t       seekpos;
                                941                 :                :         int         nbytes;
                                942                 :                :         MdfdVec    *v;
                                943                 :                :         BlockNumber nblocks_this_segment;
                                944                 :                :         size_t      transferred_this_segment;
                                945                 :                :         size_t      size_this_segment;
                                946                 :                : 
                                947                 :         531288 :         v = _mdfd_getseg(reln, forknum, blocknum, skipFsync,
                                948                 :                :                          EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
                                949                 :                : 
                                950                 :         531288 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                951                 :                : 
                                952         [ -  + ]:         531288 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                953                 :                : 
                                954                 :         531288 :         nblocks_this_segment =
                                955                 :         531288 :             Min(nblocks,
                                956                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                957                 :         531288 :         nblocks_this_segment = Min(nblocks_this_segment, lengthof(iov));
                                958                 :                : 
                                959                 :         531288 :         iovcnt = buffers_to_iovec(iov, (void **) buffers, nblocks_this_segment);
                                960                 :         531288 :         size_this_segment = nblocks_this_segment * BLCKSZ;
                                961                 :         531288 :         transferred_this_segment = 0;
                                962                 :                : 
                                963                 :                :         /*
                                964                 :                :          * Inner loop to continue after a short write.  If the reason is that
                                965                 :                :          * we're out of disk space, a future attempt should get an ENOSPC
                                966                 :                :          * error from the kernel.
                                967                 :                :          */
                                968                 :                :         for (;;)
                                969                 :                :         {
  118 tmunro@postgresql.or      970                 :UNC           0 :             TRACE_POSTGRESQL_SMGR_MD_WRITE_START(forknum, blocknum,
                                971                 :                :                                                  reln->smgr_rlocator.locator.spcOid,
                                972                 :                :                                                  reln->smgr_rlocator.locator.dbOid,
                                973                 :                :                                                  reln->smgr_rlocator.locator.relNumber,
                                974                 :                :                                                  reln->smgr_rlocator.backend);
  118 tmunro@postgresql.or      975                 :GNC      531288 :             nbytes = FileWriteV(v->mdfd_vfd, iov, iovcnt, seekpos,
                                976                 :                :                                 WAIT_EVENT_DATA_FILE_WRITE);
                                977                 :                :             TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
                                978                 :                :                                                 reln->smgr_rlocator.locator.spcOid,
                                979                 :                :                                                 reln->smgr_rlocator.locator.dbOid,
                                980                 :                :                                                 reln->smgr_rlocator.locator.relNumber,
                                981                 :                :                                                 reln->smgr_rlocator.backend,
                                982                 :                :                                                 nbytes,
                                983                 :                :                                                 size_this_segment - transferred_this_segment);
                                984                 :                : 
                                985                 :                : #ifdef SIMULATE_SHORT_WRITE
                                986                 :                :             nbytes = Min(nbytes, 4096);
                                987                 :                : #endif
                                988                 :                : 
                                989         [ -  + ]:         531288 :             if (nbytes < 0)
                                990                 :                :             {
  118 tmunro@postgresql.or      991                 :UNC           0 :                 bool        enospc = errno == ENOSPC;
                                992                 :                : 
                                993   [ #  #  #  # ]:              0 :                 ereport(ERROR,
                                994                 :                :                         (errcode_for_file_access(),
                                995                 :                :                          errmsg("could not write blocks %u..%u in file \"%s\": %m",
                                996                 :                :                                 blocknum,
                                997                 :                :                                 blocknum + nblocks_this_segment - 1,
                                998                 :                :                                 FilePathName(v->mdfd_vfd)),
                                999                 :                :                          enospc ? errhint("Check free disk space.") : 0));
                               1000                 :                :             }
                               1001                 :                : 
                               1002                 :                :             /* One loop should usually be enough. */
  118 tmunro@postgresql.or     1003                 :GNC      531288 :             transferred_this_segment += nbytes;
                               1004         [ -  + ]:         531288 :             Assert(transferred_this_segment <= size_this_segment);
                               1005         [ +  - ]:         531288 :             if (transferred_this_segment == size_this_segment)
                               1006                 :         531288 :                 break;
                               1007                 :                : 
                               1008                 :                :             /* Adjust position and iovecs after a short write. */
  118 tmunro@postgresql.or     1009                 :UNC           0 :             seekpos += nbytes;
                               1010                 :              0 :             iovcnt = compute_remaining_iovec(iov, iov, iovcnt, nbytes);
                               1011                 :                :         }
                               1012                 :                : 
  118 tmunro@postgresql.or     1013   [ +  +  +  + ]:GNC      531288 :         if (!skipFsync && !SmgrIsTemp(reln))
                               1014                 :         530338 :             register_dirty_segment(reln, forknum, v);
                               1015                 :                : 
                               1016                 :         531288 :         nblocks -= nblocks_this_segment;
                               1017                 :         531288 :         buffers += nblocks_this_segment;
                               1018                 :         531288 :         blocknum += nblocks_this_segment;
                               1019                 :                :     }
 8771 tgl@sss.pgh.pa.us        1020                 :CBC      531288 : }
                               1021                 :                : 
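Like mdreadv(), mdwritev() never lets a single FileWriteV() call cross into the next segment file or overflow the iovec array. The block count is capped first by the blocks left in the current segment, then by lengthof(iov). The sketch below shows that arithmetic under these assumptions:

- the default 8 kB BLCKSZ;
- 1 GB segments (131072 blocks);
- a hypothetical iovec limit of 32.

The SKETCH_* constants and chunk_for_request() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t BlockNumber;

/* Assumed values: 8 kB blocks, 1 GB segments, at most 32 iovecs per call. */
#define SKETCH_BLCKSZ      8192
#define SKETCH_RELSEG_SIZE ((BlockNumber) 131072)
#define SKETCH_IOV_MAX     ((BlockNumber) 32)

#define Min(x, y) ((x) < (y) ? (x) : (y))

/*
 * For a request of nblocks blocks starting at blocknum, set *seekpos to the
 * byte offset inside the segment file and return how many blocks a single
 * vectored write may cover (one iovec per block).
 */
static BlockNumber
chunk_for_request(BlockNumber blocknum, BlockNumber nblocks, off_t *seekpos)
{
    BlockNumber in_seg = blocknum % SKETCH_RELSEG_SIZE;
    BlockNumber n;

    assert(nblocks > 0);
    *seekpos = (off_t) SKETCH_BLCKSZ * in_seg;

    n = Min(nblocks, SKETCH_RELSEG_SIZE - in_seg);  /* stop at segment end */
    n = Min(n, SKETCH_IOV_MAX);                      /* stop at iovec limit */
    return n;
}
```

A caller would loop: write n blocks at *seekpos in segment blocknum / SKETCH_RELSEG_SIZE, then advance blocknum by n. This mirrors how mdwritev() advances blocknum, buffers and nblocks.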
                               1022                 :                : 
                               1023                 :                : /*
                               1024                 :                :  * mdwriteback() -- Tell the kernel to write pages back to storage.
                               1025                 :                :  *
                               1026                 :                :  * This accepts a range of blocks because flushing several pages at once is
                               1027                 :                :  * considerably more efficient than doing so individually.
                               1028                 :                :  */
                               1029                 :                : void
  331 peter@eisentraut.org     1030                 :          85655 : mdwriteback(SMgrRelation reln, ForkNumber forknum,
                               1031                 :                :             BlockNumber blocknum, BlockNumber nblocks)
                               1032                 :                : {
                               1033         [ -  + ]:          85655 :     Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
                               1034                 :                : 
                               1035                 :                :     /*
                               1036                 :                :      * Issue flush requests in as few requests as possible; have to split at
                               1037                 :                :      * segment boundaries though, since those are actually separate files.
                               1038                 :                :      */
                               1039         [ +  + ]:         170842 :     while (nblocks > 0)
                               1040                 :                :     {
                               1041                 :          85655 :         BlockNumber nflush = nblocks;
                               1042                 :                :         off_t       seekpos;
                               1043                 :                :         MdfdVec    *v;
                               1044                 :                :         int         segnum_start,
                               1045                 :                :                     segnum_end;
                               1046                 :                : 
                               1047                 :          85655 :         v = _mdfd_getseg(reln, forknum, blocknum, true /* not used */ ,
                               1048                 :                :                          EXTENSION_DONT_OPEN);
                               1049                 :                : 
                               1050                 :                :         /*
                                1051                 :                :          * We might be flushing buffers of already-removed relations; that's
                                1052                 :                :          * OK, just ignore that case.  If the segment file wasn't open already
                                1053                 :                :          * (i.e. from a recent mdwritev()), then we don't want to re-open it, to
                               1054                 :                :          * avoid a race with PROCSIGNAL_BARRIER_SMGRRELEASE that might leave
                               1055                 :                :          * us with a descriptor to a file that is about to be unlinked.
                               1056                 :                :          */
                               1057         [ +  + ]:          85655 :         if (!v)
                               1058                 :            468 :             return;
                               1059                 :                : 
                                1060                 :                :         /* compute the segment number of the first requested block */
                               1061                 :          85187 :         segnum_start = blocknum / RELSEG_SIZE;
                               1062                 :                : 
                                1063                 :                :         /* if the range spills into a later segment, flush only to this segment's end */
                               1064                 :          85187 :         segnum_end = (blocknum + nblocks - 1) / RELSEG_SIZE;
                               1065         [ -  + ]:          85187 :         if (segnum_start != segnum_end)
  331 peter@eisentraut.org     1066                 :UBC           0 :             nflush = RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1067                 :                : 
  331 peter@eisentraut.org     1068         [ -  + ]:CBC       85187 :         Assert(nflush >= 1);
                               1069         [ -  + ]:          85187 :         Assert(nflush <= nblocks);
                               1070                 :                : 
                               1071                 :          85187 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1072                 :                : 
                               1073                 :          85187 :         FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush, WAIT_EVENT_DATA_FILE_FLUSH);
                               1074                 :                : 
                               1075                 :          85187 :         nblocks -= nflush;
                               1076                 :          85187 :         blocknum += nflush;
                               1077                 :                :     }
                               1078                 :                : }
                               1079                 :                : 
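mdwriteback() detects a segment crossing by comparing the segment numbers of the first and last requested blocks. When they differ, it flushes only up to the end of the current segment file. The loop below records those per-segment ranges instead of calling FileWriteback().

The sketch makes these assumptions:

- 8 kB blocks and 131072-block segments;
- FlushRange and plan_writeback() are hypothetical names;
- the maxout cap exists only to bound the output array.

It also omits md.c's early return when _mdfd_getseg() finds no open segment.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t BlockNumber;

/* Assumed values: 8 kB blocks and 1 GB (131072-block) segments. */
#define SKETCH_BLCKSZ      8192
#define SKETCH_RELSEG_SIZE ((BlockNumber) 131072)

typedef struct FlushRange
{
    BlockNumber segno;          /* which segment file */
    off_t       offset;         /* byte offset inside that file */
    off_t       nbytes;         /* bytes to hand to the writeback call */
} FlushRange;

/*
 * Split [blocknum, blocknum + nblocks) into per-segment flush ranges,
 * mirroring mdwriteback()'s segnum_start/segnum_end test.  Returns the
 * number of ranges stored in out (at most maxout).
 */
static int
plan_writeback(BlockNumber blocknum, BlockNumber nblocks,
               FlushRange *out, int maxout)
{
    int         n = 0;

    while (nblocks > 0 && n < maxout)
    {
        BlockNumber nflush = nblocks;
        BlockNumber segnum_start = blocknum / SKETCH_RELSEG_SIZE;
        BlockNumber segnum_end = (blocknum + nblocks - 1) / SKETCH_RELSEG_SIZE;

        /* Range crosses a segment boundary: stop at the end of this file. */
        if (segnum_start != segnum_end)
            nflush = SKETCH_RELSEG_SIZE - (blocknum % SKETCH_RELSEG_SIZE);

        assert(nflush >= 1 && nflush <= nblocks);
        out[n].segno = segnum_start;
        out[n].offset = (off_t) SKETCH_BLCKSZ * (blocknum % SKETCH_RELSEG_SIZE);
        out[n].nbytes = (off_t) SKETCH_BLCKSZ * nflush;
        n++;

        nblocks -= nflush;
        blocknum += nflush;
    }
    return n;
}
```

Consider a five-block request starting two blocks before a segment boundary. It yields two ranges: two blocks at the tail of segment 0, then three blocks at the start of segment 1.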
                               1080                 :                : /*
                               1081                 :                :  * mdnblocks() -- Get the number of blocks stored in a relation.
                               1082                 :                :  *
                               1083                 :                :  * Important side effect: all active segments of the relation are opened
                               1084                 :                :  * and added to the md_seg_fds array.  If this routine has not been
                               1085                 :                :  * called, then only segments up to the last one actually touched
                               1086                 :                :  * are present in the array.
                               1087                 :                :  */
                               1088                 :                : BlockNumber
 5725 heikki.linnakangas@i     1089                 :        1937888 : mdnblocks(SMgrRelation reln, ForkNumber forknum)
                               1090                 :                : {
                               1091                 :                :     MdfdVec    *v;
                               1092                 :                :     BlockNumber nblocks;
                               1093                 :                :     BlockNumber segno;
                               1094                 :                : 
 1319 bruce@momjian.us         1095                 :        1937888 :     mdopenfork(reln, forknum, EXTENSION_FAIL);
                               1096                 :                : 
                                1097                 :                :     /* mdopenfork has opened the first segment */
 2775 andres@anarazel.de       1098         [ -  + ]:        1937869 :     Assert(reln->md_num_open_segs[forknum] > 0);
                               1099                 :                : 
                               1100                 :                :     /*
                                1101                 :                :      * Start from the last open segment, to avoid redundant seeks.  We have
                               1102                 :                :      * previously verified that these segments are exactly RELSEG_SIZE long,
                               1103                 :                :      * and it's useless to recheck that each time.
                               1104                 :                :      *
                               1105                 :                :      * NOTE: this assumption could only be wrong if another backend has
                               1106                 :                :      * truncated the relation.  We rely on higher code levels to handle that
                               1107                 :                :      * scenario by closing and re-opening the md fd, which is handled via
                               1108                 :                :      * relcache flush.  (Since the checkpointer doesn't participate in
                               1109                 :                :      * relcache flush, it could have segment entries for inactive segments;
                               1110                 :                :      * that's OK because the checkpointer never needs to compute relation
                               1111                 :                :      * size.)
                               1112                 :                :      */
                               1113                 :        1937869 :     segno = reln->md_num_open_segs[forknum] - 1;
                               1114                 :        1937869 :     v = &reln->md_seg_fds[forknum][segno];
                               1115                 :                : 
                               1116                 :                :     for (;;)
                               1117                 :                :     {
 5725 heikki.linnakangas@i     1118                 :UBC           0 :         nblocks = _mdnblocks(reln, forknum, v);
 8327 tgl@sss.pgh.pa.us        1119         [ -  + ]:CBC     1937869 :         if (nblocks > ((BlockNumber) RELSEG_SIZE))
 7570 tgl@sss.pgh.pa.us        1120         [ #  # ]:UBC           0 :             elog(FATAL, "segment too big");
 8327 tgl@sss.pgh.pa.us        1121         [ +  - ]:CBC     1937869 :         if (nblocks < ((BlockNumber) RELSEG_SIZE))
                               1122                 :        1937869 :             return (segno * ((BlockNumber) RELSEG_SIZE)) + nblocks;
                               1123                 :                : 
                               1124                 :                :         /*
                               1125                 :                :          * If segment is exactly RELSEG_SIZE, advance to next one.
                               1126                 :                :          */
 8375 tgl@sss.pgh.pa.us        1127                 :UBC           0 :         segno++;
                               1128                 :                : 
                               1129                 :                :         /*
                               1130                 :                :          * We used to pass O_CREAT here, but that has the disadvantage that it
                               1131                 :                :          * might create a segment which has vanished through some operating
                               1132                 :                :          * system misadventure.  In such a case, creating the segment here
                               1133                 :                :          * undermines _mdfd_getseg's attempts to notice and report an error
                               1134                 :                :          * upon access to a missing segment.
                               1135                 :                :          */
 2775 andres@anarazel.de       1136                 :              0 :         v = _mdfd_openseg(reln, forknum, segno, 0);
                               1137         [ #  # ]:              0 :         if (v == NULL)
                               1138                 :              0 :             return segno * ((BlockNumber) RELSEG_SIZE);
                               1139                 :                :     }
                               1140                 :                : }
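mdnblocks() relies on every segment before the last being exactly RELSEG_SIZE blocks long. The relation size is therefore the number of full segments times RELSEG_SIZE, plus the length of the first short segment.

Below is a simplified sketch over an in-memory array of per-segment lengths, assuming 131072-block segments. It differs from md.c in two ways:

- it scans from segment 0 rather than from the last open segment;
- abort() stands in for elog(FATAL).

relation_nblocks() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;

/* Assumed value: 1 GB segments of 8 kB blocks. */
#define SKETCH_RELSEG_SIZE ((BlockNumber) 131072)

/*
 * seg_blocks[i] is the length in blocks of segment file i; nsegs is the
 * number of segment files that exist.  Returns the relation size in blocks.
 */
static BlockNumber
relation_nblocks(const BlockNumber *seg_blocks, int nsegs)
{
    for (BlockNumber segno = 0; (int) segno < nsegs; segno++)
    {
        BlockNumber nblocks = seg_blocks[segno];

        /* A segment longer than RELSEG_SIZE means on-disk corruption. */
        if (nblocks > SKETCH_RELSEG_SIZE)
        {
            fprintf(stderr, "segment too big\n");
            abort();
        }
        /* A short segment is the last one: full segments plus remainder. */
        if (nblocks < SKETCH_RELSEG_SIZE)
            return segno * SKETCH_RELSEG_SIZE + nblocks;
    }
    /* Every existing segment is full and the next file does not exist. */
    return (BlockNumber) nsegs * SKETCH_RELSEG_SIZE;
}
```

For segment lengths {131072, 5}, the sketch returns 131077: one full segment plus five blocks.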
                               1141                 :                : 
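mdtruncate(), which follows, visits the open segments from last to first. It classifies each one by priorblocks, the number of blocks held in all earlier segments. The sketch below returns the byte length a segment file should end up with.

- The inactive case (priorblocks > nblocks, length 0) matches the code that follows.
- The partial case is an assumption. It supposes that the branch taken when priorblocks + RELSEG_SIZE > nblocks keeps nblocks - priorblocks blocks, because that branch's body is not shown in this excerpt.

The constants and segment_length_after_truncate() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

typedef uint32_t BlockNumber;

/* Assumed values: 8 kB blocks and 131072-block segments. */
#define SKETCH_BLCKSZ      8192
#define SKETCH_RELSEG_SIZE ((BlockNumber) 131072)

/*
 * For segment segno of a relation being truncated to nblocks blocks,
 * return the new byte length of that segment file, or -1 if the segment
 * lies wholly before the cut point and needs no change.
 */
static off_t
segment_length_after_truncate(BlockNumber segno, BlockNumber nblocks)
{
    BlockNumber priorblocks = segno * SKETCH_RELSEG_SIZE;

    if (priorblocks > nblocks)
        return 0;               /* inactive segment: emptied, not deleted */
    if (priorblocks + SKETCH_RELSEG_SIZE > nblocks)
        return (off_t) SKETCH_BLCKSZ * (nblocks - priorblocks); /* cut point is here */
    return -1;                  /* wholly before the cut point: untouched */
}
```

For example, truncating to 100 blocks leaves segment 0 at 819200 bytes and empties segment 1.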
                               1142                 :                : /*
                               1143                 :                :  * mdtruncate() -- Truncate relation to specified number of blocks.
                               1144                 :                :  */
                               1145                 :                : void
 4993 rhaas@postgresql.org     1146                 :CBC         851 : mdtruncate(SMgrRelation reln, ForkNumber forknum, BlockNumber nblocks)
                               1147                 :                : {
                               1148                 :                :     BlockNumber curnblk;
                               1149                 :                :     BlockNumber priorblocks;
                               1150                 :                :     int         curopensegs;
                               1151                 :                : 
                               1152                 :                :     /*
                               1153                 :                :      * NOTE: mdnblocks makes sure we have opened all active segments, so that
                               1154                 :                :      * truncation loop will get them all!
                               1155                 :                :      */
 5725 heikki.linnakangas@i     1156                 :            851 :     curnblk = mdnblocks(reln, forknum);
 8327 tgl@sss.pgh.pa.us        1157         [ -  + ]:            851 :     if (nblocks > curnblk)
                               1158                 :                :     {
                               1159                 :                :         /* Bogus request ... but no complaint if InRecovery */
 6311 tgl@sss.pgh.pa.us        1160         [ #  # ]:UBC           0 :         if (InRecovery)
                               1161                 :              0 :             return;
                               1162         [ #  # ]:              0 :         ereport(ERROR,
                               1163                 :                :                 (errmsg("could not truncate file \"%s\" to %u blocks: it's only %u blocks now",
                               1164                 :                :                         relpath(reln->smgr_rlocator, forknum),
                               1165                 :                :                         nblocks, curnblk)));
                               1166                 :                :     }
 8991 tgl@sss.pgh.pa.us        1167         [ +  + ]:CBC         851 :     if (nblocks == curnblk)
 6311                          1168                 :            340 :         return;                 /* no work */
                               1169                 :                : 
                               1170                 :                :     /*
                               1171                 :                :      * Truncate segments, starting at the last one. Starting at the end makes
                               1172                 :                :      * managing the memory for the fd array easier, should there be errors.
                               1173                 :                :      */
 2775 andres@anarazel.de       1174                 :            511 :     curopensegs = reln->md_num_open_segs[forknum];
                               1175         [ +  + ]:           1022 :     while (curopensegs > 0)
                               1176                 :                :     {
                               1177                 :                :         MdfdVec    *v;
                               1178                 :                : 
                               1179                 :            511 :         priorblocks = (curopensegs - 1) * RELSEG_SIZE;
                               1180                 :                : 
                               1181                 :            511 :         v = &reln->md_seg_fds[forknum][curopensegs - 1];
                               1182                 :                : 
 8991 tgl@sss.pgh.pa.us        1183         [ -  + ]:            511 :         if (priorblocks > nblocks)
                               1184                 :                :         {
                               1185                 :                :             /*
                               1186                 :                :              * This segment is no longer active. We truncate the file, but do
                               1187                 :                :              * not delete it, for reasons explained in the header comments.
                               1188                 :                :              */
 2584 rhaas@postgresql.org     1189         [ #  # ]:UBC           0 :             if (FileTruncate(v->mdfd_vfd, 0, WAIT_EVENT_DATA_FILE_TRUNCATE) < 0)
 6311 tgl@sss.pgh.pa.us        1190         [ #  # ]:              0 :                 ereport(ERROR,
                               1191                 :                :                         (errcode_for_file_access(),
                               1192                 :                :                          errmsg("could not truncate file \"%s\": %m",
                               1193                 :                :                                 FilePathName(v->mdfd_vfd))));
                               1194                 :                : 
 4993 rhaas@postgresql.org     1195         [ #  # ]:              0 :             if (!SmgrIsTemp(reln))
 5725 heikki.linnakangas@i     1196                 :              0 :                 register_dirty_segment(reln, forknum, v);
                               1197                 :                : 
                               1198                 :                :             /* we never drop the 1st segment */
 2775 andres@anarazel.de       1199         [ #  # ]:              0 :             Assert(v != &reln->md_seg_fds[forknum][0]);
                               1200                 :                : 
                               1201                 :              0 :             FileClose(v->mdfd_vfd);
                               1202                 :              0 :             _fdvec_resize(reln, forknum, curopensegs - 1);
                               1203                 :                :         }
 8327 tgl@sss.pgh.pa.us        1204         [ +  - ]:CBC         511 :         else if (priorblocks + ((BlockNumber) RELSEG_SIZE) > nblocks)
                               1205                 :                :         {
                               1206                 :                :             /*
                               1207                 :                :              * This is the last segment we want to keep. Truncate the file to
                               1208                 :                :              * the right length. NOTE: if nblocks is exactly a multiple K of
                               1209                 :                :              * RELSEG_SIZE, we will truncate the K+1st segment to 0 length but
                               1210                 :                :              * keep it. This adheres to the invariant given in the header
                               1211                 :                :              * comments.
                               1212                 :                :              */
 8207 bruce@momjian.us         1213                 :            511 :             BlockNumber lastsegblocks = nblocks - priorblocks;
                               1214                 :                : 
 2584 rhaas@postgresql.org     1215         [ -  + ]:            511 :             if (FileTruncate(v->mdfd_vfd, (off_t) lastsegblocks * BLCKSZ, WAIT_EVENT_DATA_FILE_TRUNCATE) < 0)
 6311 tgl@sss.pgh.pa.us        1216         [ #  # ]:UBC           0 :                 ereport(ERROR,
                               1217                 :                :                         (errcode_for_file_access(),
                               1218                 :                :                          errmsg("could not truncate file \"%s\" to %u blocks: %m",
                               1219                 :                :                                 FilePathName(v->mdfd_vfd),
                               1220                 :                :                                 nblocks)));
 4993 rhaas@postgresql.org     1221         [ +  + ]:CBC         511 :             if (!SmgrIsTemp(reln))
 5725 heikki.linnakangas@i     1222                 :            374 :                 register_dirty_segment(reln, forknum, v);
                               1223                 :                :         }
                               1224                 :                :         else
                               1225                 :                :         {
                               1226                 :                :             /*
                               1227                 :                :              * We still need this segment, so nothing to do for this and any
                               1228                 :                :              * earlier segment.
                               1229                 :                :              */
 2775 andres@anarazel.de       1230                 :UBC           0 :             break;
                               1231                 :                :         }
 2775 andres@anarazel.de       1232                 :CBC         511 :         curopensegs--;
                               1233                 :                :     }
                               1234                 :                : }
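The truncation loop above walks backward from the last open segment and handles each segment in one of three ways. The sketch below replays that decision on plain integers so the boundary cases are easy to check, including the "exact multiple of RELSEG_SIZE" note. It is an illustration, not PostgreSQL code. `SKETCH_RELSEG_SIZE`, `SketchBlockNumber`, `SegAction`, `plan_segment()` and `open_segments_after_truncate()` are hypothetical names, and the 4-block segment size is chosen only to keep the numbers small.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segment size; the real RELSEG_SIZE is a build-time constant. */
#define SKETCH_RELSEG_SIZE 4u

typedef uint32_t SketchBlockNumber;     /* stand-in for BlockNumber */

typedef enum
{
    SEG_CLOSE_EMPTY,    /* wholly past the new end: truncate to 0, then close */
    SEG_TRUNCATE,       /* last segment to keep: truncate to a partial length */
    SEG_UNTOUCHED       /* still fully needed: stop scanning */
} SegAction;

/* Classify segment `segno` the way the if / else-if / else chain above does. */
static SegAction
plan_segment(SketchBlockNumber nblocks, int segno, SketchBlockNumber *newlen)
{
    SketchBlockNumber priorblocks = (SketchBlockNumber) segno * SKETCH_RELSEG_SIZE;

    if (priorblocks > nblocks)
    {
        *newlen = 0;
        return SEG_CLOSE_EMPTY;
    }
    if (priorblocks + SKETCH_RELSEG_SIZE > nblocks)
    {
        *newlen = nblocks - priorblocks;    /* may be 0 at an exact multiple */
        return SEG_TRUNCATE;
    }
    *newlen = SKETCH_RELSEG_SIZE;
    return SEG_UNTOUCHED;
}

/* Walk from the last open segment down; return how many segments stay open. */
static int
open_segments_after_truncate(SketchBlockNumber nblocks, int nopen)
{
    int open = nopen;

    for (int cur = nopen; cur > 0; cur--)
    {
        SketchBlockNumber newlen;
        SegAction action = plan_segment(nblocks, cur - 1, &newlen);

        if (action == SEG_UNTOUCHED)
            break;                  /* this and every earlier segment are needed */
        if (action == SEG_CLOSE_EMPTY)
        {
            assert(cur - 1 != 0);   /* the first segment is never dropped */
            open = cur - 1;         /* mirrors _fdvec_resize(reln, forknum, cur - 1) */
        }
    }
    return open;
}
```

With 4-block segments, truncating to 8 blocks leaves segment 2 open but zero-length, which is the K+1st-segment case the comment describes. Truncating to 6 blocks closes segment 2 and shortens segment 1 to 2 blocks.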
                               1235                 :                : 
                               1236                 :                : /*
                               1237                 :                :  * mdregistersync() -- Mark whole relation as needing fsync
                               1238                 :                :  */
                               1239                 :                : void
   51 heikki.linnakangas@i     1240                 :GNC       21978 : mdregistersync(SMgrRelation reln, ForkNumber forknum)
                               1241                 :                : {
                               1242                 :                :     int         segno;
                               1243                 :                :     int         min_inactive_seg;
                               1244                 :                : 
                               1245                 :                :     /*
                               1246                 :                :      * NOTE: mdnblocks makes sure we have opened all active segments, so that
                               1247                 :                :      * the loop below will get them all!
                               1248                 :                :      */
                               1249                 :          21978 :     mdnblocks(reln, forknum);
                               1250                 :                : 
                               1251                 :          21978 :     min_inactive_seg = segno = reln->md_num_open_segs[forknum];
                               1252                 :                : 
                               1253                 :                :     /*
                               1254                 :                :      * Temporarily open inactive segments, then close them after sync.  There
                               1255                 :                :      * may be some inactive segments left opened after error, but that is
                               1256                 :                :      * harmless.  We don't bother to clean them up and take a risk of further
                               1257                 :                :      * trouble.  The next mdclose() will soon close them.
                               1258                 :                :      */
                               1259         [ -  + ]:          21978 :     while (_mdfd_openseg(reln, forknum, segno, 0) != NULL)
   51 heikki.linnakangas@i     1260                 :UNC           0 :         segno++;
                               1261                 :                : 
   51 heikki.linnakangas@i     1262         [ +  + ]:GNC       43956 :     while (segno > 0)
                               1263                 :                :     {
                               1264                 :          21978 :         MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
                               1265                 :                : 
                               1266                 :          21978 :         register_dirty_segment(reln, forknum, v);
                               1267                 :                : 
                               1268                 :                :         /* Close inactive segments immediately */
                               1269         [ -  + ]:          21978 :         if (segno > min_inactive_seg)
                               1270                 :                :         {
   51 heikki.linnakangas@i     1271                 :UNC           0 :             FileClose(v->mdfd_vfd);
                               1272                 :              0 :             _fdvec_resize(reln, forknum, segno - 1);
                               1273                 :                :         }
                               1274                 :                : 
   51 heikki.linnakangas@i     1275                 :GNC       21978 :         segno--;
                               1276                 :                :     }
                               1277                 :          21978 : }
                               1278                 :                : 
                               1279                 :                : /*
                               1280                 :                :  * mdimmedsync() -- Immediately sync a relation to stable storage.
                               1281                 :                :  *
                               1282                 :                :  * Note that only writes already issued are synced; this routine knows
                               1283                 :                :  * nothing of dirty buffers that may exist inside the buffer manager.  We
                               1284                 :                :  * sync active and inactive segments; smgrDoPendingSyncs() relies on this.
                               1285                 :                :  * Consider a relation skipping WAL.  Suppose a checkpoint syncs blocks of
                               1286                 :                :  * some segment, then mdtruncate() renders that segment inactive.  If we
                               1287                 :                :  * crash before the next checkpoint syncs the newly-inactive segment, that
                               1288                 :                :  * segment may survive recovery, reintroducing unwanted data into the table.
                               1289                 :                :  */
                               1290                 :                : void
 5725 heikki.linnakangas@i     1291                 :CBC          11 : mdimmedsync(SMgrRelation reln, ForkNumber forknum)
                               1292                 :                : {
                               1293                 :                :     int         segno;
                               1294                 :                :     int         min_inactive_seg;
                               1295                 :                : 
                               1296                 :                :     /*
                               1297                 :                :      * NOTE: mdnblocks makes sure we have opened all active segments, so that
                               1298                 :                :      * the loop below will get them all!
                               1299                 :                :      */
 4752 peter_e@gmx.net          1300                 :             11 :     mdnblocks(reln, forknum);
                               1301                 :                : 
 1471 noah@leadboat.com        1302                 :             11 :     min_inactive_seg = segno = reln->md_num_open_segs[forknum];
                               1303                 :                : 
                               1304                 :                :     /*
                               1305                 :                :      * Temporarily open inactive segments, then close them after sync.  There
                               1306                 :                :      * may be some inactive segments left opened after fsync() error, but that
                               1307                 :                :      * is harmless.  We don't bother to clean them up and take a risk of
                               1308                 :                :      * further trouble.  The next mdclose() will soon close them.
                               1309                 :                :      */
                               1310         [ -  + ]:             11 :     while (_mdfd_openseg(reln, forknum, segno, 0) != NULL)
 1471 noah@leadboat.com        1311                 :UBC           0 :         segno++;
                               1312                 :                : 
 2775 andres@anarazel.de       1313         [ +  + ]:CBC          22 :     while (segno > 0)
                               1314                 :                :     {
                               1315                 :             11 :         MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
                               1316                 :                : 
                               1317                 :                :         /*
                               1318                 :                :          * fsyncs done through mdimmedsync() should be tracked in a separate
                               1319                 :                :          * IOContext than those done through mdsyncfiletag() to differentiate
                               1320                 :                :          * between unavoidable client backend fsyncs (e.g. those done during
                               1321                 :                :          * index build) and those which ideally would have been done by the
                               1322                 :                :          * checkpointer. Since other IO operations bypassing the buffer
                               1323                 :                :          * manager could also be tracked in such an IOContext, wait until
                               1324                 :                :          * these are also tracked to track immediate fsyncs.
                               1325                 :                :          */
 2584 rhaas@postgresql.org     1326         [ -  + ]:             11 :         if (FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC) < 0)
 1973 tmunro@postgresql.or     1327         [ #  # ]:UBC           0 :             ereport(data_sync_elevel(ERROR),
                               1328                 :                :                     (errcode_for_file_access(),
                               1329                 :                :                      errmsg("could not fsync file \"%s\": %m",
                               1330                 :                :                             FilePathName(v->mdfd_vfd))));
                               1331                 :                : 
                               1332                 :                :         /* Close inactive segments immediately */
 1471 noah@leadboat.com        1333         [ -  + ]:CBC          11 :         if (segno > min_inactive_seg)
                               1334                 :                :         {
 1471 noah@leadboat.com        1335                 :UBC           0 :             FileClose(v->mdfd_vfd);
                               1336                 :              0 :             _fdvec_resize(reln, forknum, segno - 1);
                               1337                 :                :         }
                               1338                 :                : 
 2775 andres@anarazel.de       1339                 :CBC          11 :         segno--;
                               1340                 :                :     }
 7256 tgl@sss.pgh.pa.us        1341                 :             11 : }
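mdregistersync() and mdimmedsync() share the same loop shape. First they probe past the active segments and temporarily open any inactive segment files that still exist. Then they walk back down, syncing (or registering) every segment and closing the temporary ones so the open count returns to its starting value. The toy model below reproduces only that bookkeeping. `SketchFork`, `try_open_segment()` and `sketch_sync_all_segments()` are hypothetical, "file exists" is just an integer bound, and no real I/O happens.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_SEGS 16

/* Toy model of one fork: which segment files exist and how many are open. */
typedef struct
{
    int nexisting;                  /* files for segments 0..nexisting-1 exist */
    int nopen;                      /* segments currently open (the active ones) */
    int synced[SKETCH_MAX_SEGS];    /* segment numbers, in the order synced */
    int nsynced;
} SketchFork;

/* Stand-in for _mdfd_openseg(): fails once the segment file does not exist. */
static bool
try_open_segment(SketchFork *f, int segno)
{
    if (segno >= f->nexisting)
        return false;
    f->nopen = segno + 1;           /* opening appends to the open-segment array */
    return true;
}

static void
sketch_sync_all_segments(SketchFork *f)
{
    assert(f->nexisting <= SKETCH_MAX_SEGS);

    int min_inactive_seg = f->nopen;
    int segno = f->nopen;

    /* Temporarily open inactive segments left behind by earlier truncations. */
    while (try_open_segment(f, segno))
        segno++;

    /* Sync from the last segment down, closing the temporary ones as we go. */
    while (segno > 0)
    {
        f->synced[f->nsynced++] = segno - 1;    /* FileSync() or register_dirty_segment() */
        if (segno > min_inactive_seg)
            f->nopen = segno - 1;               /* FileClose() + _fdvec_resize() */
        segno--;
    }
}
```

For a fork with two active segments and four segment files on disk, all four segments are synced (3, 2, 1, 0) and the fork ends with two open segments again, which is why the inactive segments cannot be missed by the sync.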
                               1342                 :                : 
                               1343                 :                : /*
                               1344                 :                :  * register_dirty_segment() -- Mark a relation segment as needing fsync
                               1345                 :                :  *
                               1346                 :                :  * If there is a local pending-ops table, just make an entry in it for
                               1347                 :                :  * ProcessSyncRequests to process later.  Otherwise, try to pass off the
                               1348                 :                :  * fsync request to the checkpointer process.  If that fails, just do the
                               1349                 :                :  * fsync locally before returning (we hope this will not happen often
                               1350                 :                :  * enough to be a performance problem).
                               1351                 :                :  */
                               1352                 :                : static void
 5725 heikki.linnakangas@i     1353                 :         881627 : register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
                               1354                 :                : {
                               1355                 :                :     FileTag     tag;
                               1356                 :                : 
  648 rhaas@postgresql.org     1357                 :         881627 :     INIT_MD_FILETAG(tag, reln->smgr_rlocator.locator, forknum, seg->mdfd_segno);
                               1358                 :                : 
                               1359                 :                :     /* Temp relations should never be fsync'd */
 4289 tgl@sss.pgh.pa.us        1360         [ -  + ]:         881627 :     Assert(!SmgrIsTemp(reln));
                               1361                 :                : 
 1837 tmunro@postgresql.or     1362         [ -  + ]:         881627 :     if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
                               1363                 :                :     {
                               1364                 :                :         instr_time  io_start;
                               1365                 :                : 
  373 andres@anarazel.de       1366         [ #  # ]:LBC       (454) :         ereport(DEBUG1,
                               1367                 :                :                 (errmsg_internal("could not forward fsync request because request queue is full")));
                               1368                 :                : 
  120 michael@paquier.xyz      1369                 :UNC           0 :         io_start = pgstat_prepare_io_time(track_io_timing);
                               1370                 :                : 
  373 andres@anarazel.de       1371         [ #  # ]:LBC       (454) :         if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
  373 andres@anarazel.de       1372         [ #  # ]:UBC           0 :             ereport(data_sync_elevel(ERROR),
                               1373                 :                :                     (errcode_for_file_access(),
                               1374                 :                :                      errmsg("could not fsync file \"%s\": %m",
                               1375                 :                :                             FilePathName(seg->mdfd_vfd))));
                               1376                 :                : 
                               1377                 :                :         /*
                               1378                 :                :          * We have no way of knowing if the current IOContext is
                               1379                 :                :          * IOCONTEXT_NORMAL or IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] at this
                               1380                 :                :          * point, so count the fsync as being in the IOCONTEXT_NORMAL
                               1381                 :                :          * IOContext. This is probably okay, because the number of backend
                               1382                 :                :          * fsyncs doesn't say anything about the efficacy of the
                               1383                 :                :          * BufferAccessStrategy. And counting both fsyncs done in
                               1384                 :                :          * IOCONTEXT_NORMAL and IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] under
                               1385                 :                :          * IOCONTEXT_NORMAL is likely clearer when investigating the number of
                               1386                 :                :          * backend fsyncs.
                               1387                 :                :          */
  373 andres@anarazel.de       1388                 :LBC       (454) :         pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
                               1389                 :                :                                 IOOP_FSYNC, io_start, 1);
                               1390                 :                :     }
10141 scrappy@hub.org          1391                 :CBC      881627 : }
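register_dirty_segment() prefers to hand the fsync to the checkpointer and only fsyncs locally when the request cannot be forwarded. The sketch below models that fallback with a fixed-size queue. It ignores the local pending-ops table case and everything RegisterSyncRequest() does internally. `SKETCH_QUEUE_CAP`, `SketchSyncState`, `try_forward_request()` and `sketch_register_dirty_segment()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_QUEUE_CAP 2      /* hypothetical capacity of the request queue */

typedef struct
{
    int pending[SKETCH_QUEUE_CAP];  /* segment numbers waiting for the checkpointer */
    int npending;
    int local_fsyncs;               /* fsyncs this backend had to do itself */
} SketchSyncState;

/* Stand-in for RegisterSyncRequest(&tag, SYNC_REQUEST, false): no retry. */
static bool
try_forward_request(SketchSyncState *s, int segno)
{
    if (s->npending >= SKETCH_QUEUE_CAP)
        return false;               /* queue full: the caller must cope */
    s->pending[s->npending++] = segno;
    return true;
}

static void
sketch_register_dirty_segment(SketchSyncState *s, int segno)
{
    if (!try_forward_request(s, segno))
        s->local_fsyncs++;          /* stands in for FileSync() plus I/O stats */
}
```

Passing `retryOnError = false` is what makes the local fsync reachable: the backend does the work itself rather than waiting for queue space.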
                               1392                 :                : 
                               1393                 :                : /*
                               1394                 :                :  * register_unlink_segment() -- Schedule a file to be deleted after next checkpoint
                               1395                 :                :  */
                               1396                 :                : static void
  648 rhaas@postgresql.org     1397                 :          32551 : register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
                               1398                 :                :                         BlockNumber segno)
                               1399                 :                : {
                               1400                 :                :     FileTag     tag;
                               1401                 :                : 
                               1402                 :          32551 :     INIT_MD_FILETAG(tag, rlocator.locator, forknum, segno);
                               1403                 :                : 
                               1404                 :                :     /* Should never be used with temp relations */
                               1405         [ -  + ]:          32551 :     Assert(!RelFileLocatorBackendIsTemp(rlocator));
                               1406                 :                : 
 1837 tmunro@postgresql.or     1407                 :          32551 :     RegisterSyncRequest(&tag, SYNC_UNLINK_REQUEST, true /* retryOnError */ );
 5995 tgl@sss.pgh.pa.us        1408                 :          32551 : }
                               1409                 :                : 
                               1410                 :                : /*
                               1411                 :                :  * register_forget_request() -- forget any fsyncs for a relation fork's segment
                               1412                 :                :  */
                               1413                 :                : static void
  648 rhaas@postgresql.org     1414                 :         125761 : register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
                               1415                 :                :                         BlockNumber segno)
                               1416                 :                : {
                               1417                 :                :     FileTag     tag;
                               1418                 :                : 
                               1419                 :         125761 :     INIT_MD_FILETAG(tag, rlocator.locator, forknum, segno);
                               1420                 :                : 
 1837 tmunro@postgresql.or     1421                 :         125761 :     RegisterSyncRequest(&tag, SYNC_FORGET_REQUEST, true /* retryOnError */ );
 6297 tgl@sss.pgh.pa.us        1422                 :         125761 : }
                               1423                 :                : 
                               1424                 :                : /*
                               1425                 :                :  * ForgetDatabaseSyncRequests -- forget any fsyncs and unlinks for a DB
                               1426                 :                :  */
                               1427                 :                : void
 1837 tmunro@postgresql.or     1428                 :             59 : ForgetDatabaseSyncRequests(Oid dbid)
                               1429                 :                : {
                               1430                 :                :     FileTag     tag;
                               1431                 :                :     RelFileLocator rlocator;
                               1432                 :                : 
  648 rhaas@postgresql.org     1433                 :             59 :     rlocator.dbOid = dbid;
                               1434                 :             59 :     rlocator.spcOid = 0;
                               1435                 :             59 :     rlocator.relNumber = 0;
                               1436                 :                : 
                               1437                 :             59 :     INIT_MD_FILETAG(tag, rlocator, InvalidForkNumber, InvalidBlockNumber);
                               1438                 :                : 
 1837 tmunro@postgresql.or     1439                 :             59 :     RegisterSyncRequest(&tag, SYNC_FILTER_REQUEST, true /* retryOnError */ );
 8569 vadim4o@yahoo.com        1440                 :             59 : }
                               1441                 :                : 
                               1442                 :                : /*
                               1443                 :                :  * DropRelationFiles -- drop files of all given relations
                               1444                 :                :  */
                               1445                 :                : void
  648 rhaas@postgresql.org     1446                 :           2493 : DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
                               1447                 :                : {
                               1448                 :                :     SMgrRelation *srels;
                               1449                 :                :     int         i;
                               1450                 :                : 
 2110 fujii@postgresql.org     1451                 :           2493 :     srels = palloc(sizeof(SMgrRelation) * ndelrels);
                               1452         [ +  + ]:           9524 :     for (i = 0; i < ndelrels; i++)
                               1453                 :                :     {
   42 heikki.linnakangas@i     1454                 :GNC        7031 :         SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
                               1455                 :                : 
 2110 fujii@postgresql.org     1456         [ +  + ]:CBC        7031 :         if (isRedo)
                               1457                 :                :         {
                               1458                 :                :             ForkNumber  fork;
                               1459                 :                : 
                               1460         [ +  + ]:          35035 :             for (fork = 0; fork <= MAX_FORKNUM; fork++)
                               1461                 :          28028 :                 XLogDropRelation(delrels[i], fork);
                               1462                 :                :         }
                               1463                 :           7031 :         srels[i] = srel;
                               1464                 :                :     }
                               1465                 :                : 
                               1466                 :           2493 :     smgrdounlinkall(srels, ndelrels, isRedo);
                               1467                 :                : 
 1845 tomas.vondra@postgre     1468         [ +  + ]:           9524 :     for (i = 0; i < ndelrels; i++)
 2110 fujii@postgresql.org     1469                 :           7031 :         smgrclose(srels[i]);
                               1470                 :           2493 :     pfree(srels);
                               1471                 :           2493 : }
                               1472                 :                : 
                               1473                 :                : 
                               1474                 :                : /*
                               1475                 :                :  * _fdvec_resize() -- Resize the fork's open segments array
                               1476                 :                :  */
                               1477                 :                : static void
 2775 andres@anarazel.de       1478                 :        1325023 : _fdvec_resize(SMgrRelation reln,
                               1479                 :                :               ForkNumber forknum,
                               1480                 :                :               int nseg)
                               1481                 :                : {
                               1482         [ +  + ]:        1325023 :     if (nseg == 0)
                               1483                 :                :     {
                               1484         [ +  - ]:         474238 :         if (reln->md_num_open_segs[forknum] > 0)
                               1485                 :                :         {
                               1486                 :         474238 :             pfree(reln->md_seg_fds[forknum]);
                               1487                 :         474238 :             reln->md_seg_fds[forknum] = NULL;
                               1488                 :                :         }
                               1489                 :                :     }
                               1490         [ +  - ]:         850785 :     else if (reln->md_num_open_segs[forknum] == 0)
                               1491                 :                :     {
                               1492                 :         850785 :         reln->md_seg_fds[forknum] =
                               1493                 :         850785 :             MemoryContextAlloc(MdCxt, sizeof(MdfdVec) * nseg);
                               1494                 :                :     }
                               1495                 :                :     else
                               1496                 :                :     {
                               1497                 :                :         /*
                               1498                 :                :          * It doesn't seem worthwhile complicating the code to amortize
                               1499                 :                :          * repalloc() calls.  Those are far faster than PathNameOpenFile() or
                               1500                 :                :          * FileClose(), and the memory context internally will sometimes avoid
                               1501                 :                :          * doing an actual reallocation.
                               1502                 :                :          */
 2775 andres@anarazel.de       1503                 :UBC           0 :         reln->md_seg_fds[forknum] =
                               1504                 :              0 :             repalloc(reln->md_seg_fds[forknum],
                               1505                 :                :                      sizeof(MdfdVec) * nseg);
                               1506                 :                :     }
                               1507                 :                : 
 2775 andres@anarazel.de       1508                 :CBC     1325023 :     reln->md_num_open_segs[forknum] = nseg;
 9824 vadim4o@yahoo.com        1509                 :        1325023 : }
                               1510                 :                : 
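The three branches of _fdvec_resize() are easier to follow outside the coverage listing. The sketch below is an illustration under stated assumptions, not md.c's implementation:

- malloc/realloc/free stand in for MemoryContextAlloc/repalloc/pfree.
- SegEntry and ForkSegs are hypothetical stand-ins for MdfdVec and the per-fork fields of SMgrRelation.
- Allocation failure returns -1, because plain C allocators do not ereport.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MdfdVec: one entry per open segment. */
typedef struct SegEntry
{
    int      fd;     /* stand-in for the virtual file descriptor */
    unsigned segno;  /* segment number */
} SegEntry;

/* Hypothetical stand-in for one fork's fields in SMgrRelation. */
typedef struct ForkSegs
{
    SegEntry *segs;   /* like md_seg_fds[forknum] */
    int       nsegs;  /* like md_num_open_segs[forknum] */
} ForkSegs;

/*
 * Resize the array to exactly nseg entries, mirroring the three cases in
 * _fdvec_resize(): free when shrinking to zero, allocate when growing from
 * zero, reallocate otherwise.  Returns 0 on success, -1 if allocation fails.
 */
static int
fork_segs_resize(ForkSegs *f, int nseg)
{
    assert(nseg >= 0);

    if (nseg == 0)
    {
        free(f->segs);      /* free(NULL) is a no-op */
        f->segs = NULL;
    }
    else if (f->nsegs == 0)
    {
        SegEntry *p = malloc(sizeof(SegEntry) * (size_t) nseg);

        if (p == NULL)
            return -1;
        f->segs = p;
    }
    else
    {
        /* No amortized growth: resize to the exact size every time. */
        SegEntry *p = realloc(f->segs, sizeof(SegEntry) * (size_t) nseg);

        if (p == NULL)
            return -1;      /* old array is still valid on failure */
        f->segs = p;
    }

    f->nsegs = nseg;
    return 0;
}
```

As in md.c, the array is sized exactly rather than grown geometrically. The source comment argues this is acceptable because reallocation is cheap next to opening or closing a file.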
                               1511                 :                : /*
                               1512                 :                :  * Return the filename for the specified segment of the relation. The
                               1513                 :                :  * returned string is palloc'd.
                               1514                 :                :  */
                               1515                 :                : static char *
 5366 heikki.linnakangas@i     1516                 :          22001 : _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
                               1517                 :                : {
                               1518                 :                :     char       *path,
                               1519                 :                :                *fullpath;
                               1520                 :                : 
  648 rhaas@postgresql.org     1521                 :          22001 :     path = relpath(reln->smgr_rlocator, forknum);
                               1522                 :                : 
 9716 bruce@momjian.us         1523         [ +  - ]:          22001 :     if (segno > 0)
                               1524                 :                :     {
 3751 peter_e@gmx.net          1525                 :          22001 :         fullpath = psprintf("%s.%u", path, segno);
 8771 tgl@sss.pgh.pa.us        1526                 :          22001 :         pfree(path);
                               1527                 :                :     }
                               1528                 :                :     else
 9716 bruce@momjian.us         1529                 :UBC           0 :         fullpath = path;
                               1530                 :                : 
 5366 heikki.linnakangas@i     1531                 :CBC       22001 :     return fullpath;
                               1532                 :                : }
                               1533                 :                : 
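_mdfd_segpath() applies a simple naming rule. Segment 0 of a fork uses the fork's base path unchanged, and segment N > 0 appends ".N".

The sketch below reproduces that rule under two assumptions:

- It writes into a caller-supplied buffer with snprintf instead of returning a palloc'd string.
- The base path is passed in directly. The example path base/16384/16385 is purely illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the file name of segment `segno` of a fork whose base path is
 * `path`: segment 0 is the base path itself, later segments get ".<segno>".
 * Returns 0 on success, -1 if `out` is too small (output would truncate).
 */
static int
seg_path(const char *path, unsigned segno, char *out, size_t outlen)
{
    int n;

    if (segno > 0)
        n = snprintf(out, outlen, "%s.%u", path, segno);
    else
        n = snprintf(out, outlen, "%s", path);

    /* snprintf returns the length it wanted to write; detect truncation. */
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    return 0;
}
```

With this rule, block 131072 of a relation built with 1 GB segments lands in the file whose name ends in ".1".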
                               1534                 :                : /*
                               1535                 :                :  * Open the specified segment of the relation,
                               1536                 :                :  * and make a MdfdVec object for it.  Returns NULL on failure.
                               1537                 :                :  */
                               1538                 :                : static MdfdVec *
                               1539                 :          21989 : _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
                               1540                 :                :               int oflags)
                               1541                 :                : {
                               1542                 :                :     MdfdVec    *v;
                               1543                 :                :     File        fd;
                               1544                 :                :     char       *fullpath;
                               1545                 :                : 
                               1546                 :          21989 :     fullpath = _mdfd_segpath(reln, forknum, segno);
                               1547                 :                : 
                               1548                 :                :     /* open the file */
  372 tmunro@postgresql.or     1549                 :          21989 :     fd = PathNameOpenFile(fullpath, _mdfd_open_flags() | oflags);
                               1550                 :                : 
 8771 tgl@sss.pgh.pa.us        1551                 :          21989 :     pfree(fullpath);
                               1552                 :                : 
 9716 bruce@momjian.us         1553         [ +  - ]:          21989 :     if (fd < 0)
 7403 neilc@samurai.com        1554                 :          21989 :         return NULL;
                               1555                 :                : 
                               1556                 :                :     /*
                               1557                 :                :      * Segments are always opened in order from lowest to highest, so we must
                               1558                 :                :      * be adding a new one at the end.
                               1559                 :                :      */
 1539 tmunro@postgresql.or     1560         [ #  # ]:UBC           0 :     Assert(segno == reln->md_num_open_segs[forknum]);
                               1561                 :                : 
                               1562                 :              0 :     _fdvec_resize(reln, forknum, segno + 1);
                               1563                 :                : 
                               1564                 :                :     /* fill the entry */
 2775 andres@anarazel.de       1565                 :              0 :     v = &reln->md_seg_fds[forknum][segno];
 9716 bruce@momjian.us         1566                 :              0 :     v->mdfd_vfd = fd;
 7258 tgl@sss.pgh.pa.us        1567                 :              0 :     v->mdfd_segno = segno;
                               1568                 :                : 
 5725 heikki.linnakangas@i     1569         [ #  # ]:              0 :     Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
                               1570                 :                : 
                               1571                 :                :     /* all done */
 9357 bruce@momjian.us         1572                 :              0 :     return v;
                               1573                 :                : }
                               1574                 :                : 
                               1575                 :                : /*
                               1576                 :                :  * _mdfd_getseg() -- Find the segment of the relation holding the
                               1577                 :                :  *                   specified block.
                               1578                 :                :  *
                               1579                 :                :  * If the segment doesn't exist, we ereport, return NULL, or create the
                               1580                 :                :  * segment, according to "behavior".  Note: skipFsync is only used in the
                               1581                 :                :  * EXTENSION_CREATE case.
                               1582                 :                :  */
                               1583                 :                : static MdfdVec *
 5725 heikki.linnakangas@i     1584                 :CBC     2162318 : _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
                               1585                 :                :              bool skipFsync, int behavior)
                               1586                 :                : {
                               1587                 :                :     MdfdVec    *v;
                               1588                 :                :     BlockNumber targetseg;
                               1589                 :                :     BlockNumber nextsegno;
                               1590                 :                : 
                               1591                 :                :     /* some way to handle non-existent segments needs to be specified */
 2902 andres@anarazel.de       1592         [ -  + ]:        2162318 :     Assert(behavior &
                               1593                 :                :            (EXTENSION_FAIL | EXTENSION_CREATE | EXTENSION_RETURN_NULL |
                               1594                 :                :             EXTENSION_DONT_OPEN));
                               1595                 :                : 
 6311 tgl@sss.pgh.pa.us        1596                 :        2162318 :     targetseg = blkno / ((BlockNumber) RELSEG_SIZE);
                               1597                 :                : 
                               1598                 :                :     /* if an existing and opened segment, we're done */
 2775 andres@anarazel.de       1599         [ +  + ]:        2162318 :     if (targetseg < reln->md_num_open_segs[forknum])
                               1600                 :                :     {
                               1601                 :        1978445 :         v = &reln->md_seg_fds[forknum][targetseg];
                               1602                 :        1978445 :         return v;
                               1603                 :                :     }
                               1604                 :                : 
                               1605                 :                :     /* The caller only wants the segment if we already had it open. */
  708 tmunro@postgresql.or     1606         [ +  + ]:         183873 :     if (behavior & EXTENSION_DONT_OPEN)
                               1607                 :            468 :         return NULL;
                               1608                 :                : 
                               1609                 :                :     /*
                               1610                 :                :      * The target segment is not yet open. Iterate over all the segments
                               1611                 :                :      * between the last opened and the target segment. This way missing
                               1612                 :                :      * segments either raise an error, or get created (according to
                               1613                 :                :      * 'behavior'). Start with either the last opened, or the first segment if
                               1614                 :                :      * none was opened before.
                               1615                 :                :      */
 2775 andres@anarazel.de       1616         [ +  + ]:         183405 :     if (reln->md_num_open_segs[forknum] > 0)
                               1617                 :             12 :         v = &reln->md_seg_fds[forknum][reln->md_num_open_segs[forknum] - 1];
                               1618                 :                :     else
                               1619                 :                :     {
 1733 tmunro@postgresql.or     1620                 :         183393 :         v = mdopenfork(reln, forknum, behavior);
 2775 andres@anarazel.de       1621         [ -  + ]:         183390 :         if (!v)
 2775 andres@anarazel.de       1622                 :UBC           0 :             return NULL;        /* if behavior & EXTENSION_RETURN_NULL */
                               1623                 :                :     }
                               1624                 :                : 
 2775 andres@anarazel.de       1625                 :CBC      183402 :     for (nextsegno = reln->md_num_open_segs[forknum];
                               1626         [ +  + ]:         183402 :          nextsegno <= targetseg; nextsegno++)
                               1627                 :                :     {
                               1628                 :             12 :         BlockNumber nblocks = _mdnblocks(reln, forknum, v);
                               1629                 :             12 :         int         flags = 0;
                               1630                 :                : 
                               1631         [ -  + ]:             12 :         Assert(nextsegno == v->mdfd_segno + 1);
                               1632                 :                : 
                               1633         [ -  + ]:             12 :         if (nblocks > ((BlockNumber) RELSEG_SIZE))
 2775 andres@anarazel.de       1634         [ #  # ]:UBC           0 :             elog(FATAL, "segment too big");
                               1635                 :                : 
 2775 andres@anarazel.de       1636         [ +  - ]:CBC          12 :         if ((behavior & EXTENSION_CREATE) ||
                               1637   [ -  +  -  - ]:             12 :             (InRecovery && (behavior & EXTENSION_CREATE_RECOVERY)))
                               1638                 :                :         {
                               1639                 :                :             /*
                               1640                 :                :              * Normally we will create new segments only if authorized by the
                               1641                 :                :              * caller (i.e., we are doing mdextend()).  But when doing WAL
                               1642                 :                :              * recovery, create segments anyway; this allows cases such as
                               1643                 :                :              * replaying WAL data that has a write into a high-numbered
                               1644                 :                :              * segment of a relation that was later deleted. We want to go
                               1645                 :                :              * ahead and create the segments so we can finish out the replay.
                               1646                 :                :              *
                               1647                 :                :              * We have to maintain the invariant that segments before the last
                               1648                 :                :              * active segment are of size RELSEG_SIZE; therefore, if
                               1649                 :                :              * extending, pad them out with zeroes if needed.  (This only
                               1650                 :                :              * matters if in recovery, or if the caller is extending the
                               1651                 :                :              * relation discontiguously, but that can happen in hash indexes.)
                               1652                 :                :              */
 2775 andres@anarazel.de       1653         [ #  # ]:UBC           0 :             if (nblocks < ((BlockNumber) RELSEG_SIZE))
                               1654                 :                :             {
  372 tmunro@postgresql.or     1655                 :              0 :                 char       *zerobuf = palloc_aligned(BLCKSZ, PG_IO_ALIGN_SIZE,
                               1656                 :                :                                                      MCXT_ALLOC_ZERO);
                               1657                 :                : 
 2775 andres@anarazel.de       1658                 :              0 :                 mdextend(reln, forknum,
                               1659                 :              0 :                          nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
                               1660                 :                :                          zerobuf, skipFsync);
                               1661                 :              0 :                 pfree(zerobuf);
                               1662                 :                :             }
                               1663                 :              0 :             flags = O_CREAT;
                               1664                 :                :         }
 2775 andres@anarazel.de       1665   [ +  -  +  - ]:CBC          12 :         else if (!(behavior & EXTENSION_DONT_CHECK_SIZE) &&
                               1666                 :                :                  nblocks < ((BlockNumber) RELSEG_SIZE))
                               1667                 :                :         {
                               1668                 :                :             /*
                               1669                 :                :              * When not extending (or explicitly including truncated
                               1670                 :                :              * segments), only open the next segment if the current one is
                               1671                 :                :              * exactly RELSEG_SIZE.  If not (this branch), either return NULL
                               1672                 :                :              * or fail.
                               1673                 :                :              */
                               1674         [ -  + ]:             12 :             if (behavior & EXTENSION_RETURN_NULL)
                               1675                 :                :             {
                               1676                 :                :                 /*
                               1677                 :                :                  * Some callers discern between reasons for _mdfd_getseg()
                               1678                 :                :                  * returning NULL based on errno. As there's no failing
                               1679                 :                :                  * syscall involved in this case, explicitly set errno to
                               1680                 :                :                  * ENOENT, as that seems the closest interpretation.
                               1681                 :                :                  */
 2775 andres@anarazel.de       1682                 :UBC           0 :                 errno = ENOENT;
                               1683                 :              0 :                 return NULL;
                               1684                 :                :             }
                               1685                 :                : 
 2775 andres@anarazel.de       1686         [ +  - ]:CBC          12 :             ereport(ERROR,
                               1687                 :                :                     (errcode_for_file_access(),
                               1688                 :                :                      errmsg("could not open file \"%s\" (target block %u): previous segment is only %u blocks",
                               1689                 :                :                             _mdfd_segpath(reln, forknum, nextsegno),
                               1690                 :                :                             blkno, nblocks)));
                               1691                 :                :         }
                               1692                 :                : 
 2775 andres@anarazel.de       1693                 :UBC           0 :         v = _mdfd_openseg(reln, forknum, nextsegno, flags);
                               1694                 :                : 
                               1695         [ #  # ]:              0 :         if (v == NULL)
                               1696                 :                :         {
                               1697         [ #  # ]:              0 :             if ((behavior & EXTENSION_RETURN_NULL) &&
                               1698         [ #  # ]:              0 :                 FILE_POSSIBLY_DELETED(errno))
                               1699                 :              0 :                 return NULL;
                               1700         [ #  # ]:              0 :             ereport(ERROR,
                               1701                 :                :                     (errcode_for_file_access(),
                               1702                 :                :                      errmsg("could not open file \"%s\" (target block %u): %m",
                               1703                 :                :                             _mdfd_segpath(reln, forknum, nextsegno),
                               1704                 :                :                             blkno)));
                               1705                 :                :         }
                               1706                 :                :     }
                               1707                 :                : 
 9357 bruce@momjian.us         1708                 :CBC      183390 :     return v;
                               1709                 :                : }
                               1710                 :                : 
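_mdfd_getseg() locates a block's segment by integer division, then walks forward one segment at a time. It only moves past a segment that is exactly RELSEG_SIZE blocks long, unless it is creating segments or was told to skip the size check.

The sketch below isolates that arithmetic and the per-segment decision, under these assumptions:

- The PostgreSQL defaults of 8 kB blocks and 1 GB segments, i.e. 131072 blocks per segment. Both are build-time settings.
- The flag arguments and the NextSegAction enum are hypothetical simplifications of the EXTENSION_* behavior bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB blocks, 1 GB segments (131072 blocks each). */
#define SKETCH_BLCKSZ       8192u
#define SKETCH_RELSEG_SIZE  131072u

typedef uint32_t BlockNo;   /* stand-in for BlockNumber */

/* Segment holding block blkno: the same division _mdfd_getseg() uses. */
static BlockNo
target_segment(BlockNo blkno)
{
    return blkno / SKETCH_RELSEG_SIZE;
}

/* Block index of blkno inside its segment file. */
static BlockNo
block_in_segment(BlockNo blkno)
{
    return blkno % SKETCH_RELSEG_SIZE;
}

/* Byte position of blkno inside its segment file. */
static uint64_t
segment_byte_offset(BlockNo blkno)
{
    return (uint64_t) block_in_segment(blkno) * SKETCH_BLCKSZ;
}

/* Hypothetical outcomes of one iteration of the _mdfd_getseg() loop. */
typedef enum
{
    OPEN_NEXT,              /* open the next segment normally */
    CREATE_NEXT,            /* open it with O_CREAT */
    PAD_AND_CREATE_NEXT,    /* zero-fill this segment first, then create */
    SEGMENT_MISSING         /* return NULL (errno = ENOENT) or ereport */
} NextSegAction;

/*
 * Decide how to proceed past a segment holding `nblocks` blocks.
 * `creating` models EXTENSION_CREATE (or EXTENSION_CREATE_RECOVERY while
 * in recovery); `dont_check_size` models EXTENSION_DONT_CHECK_SIZE.
 * Invariant kept: every segment before the last has RELSEG_SIZE blocks.
 */
static NextSegAction
next_segment_action(BlockNo nblocks, int creating, int dont_check_size)
{
    assert(nblocks <= SKETCH_RELSEG_SIZE);  /* md.c raises FATAL here */

    if (creating)
        return nblocks < SKETCH_RELSEG_SIZE ? PAD_AND_CREATE_NEXT
                                            : CREATE_NEXT;
    if (!dont_check_size && nblocks < SKETCH_RELSEG_SIZE)
        return SEGMENT_MISSING;
    return OPEN_NEXT;
}
```

Padding a short segment before creating the next one is what preserves the "all earlier segments are full" invariant. Without it, a later size computation that sums full segments would be wrong.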
                               1711                 :                : /*
                               1712                 :                :  * Get number of blocks present in a single disk file
                               1713                 :                :  */
                               1714                 :                : static BlockNumber
 5725 heikki.linnakangas@i     1715                 :        2946042 : _mdnblocks(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
                               1716                 :                : {
                               1717                 :                :     off_t       len;
                               1718                 :                : 
 1985 tmunro@postgresql.or     1719                 :        2946042 :     len = FileSize(seg->mdfd_vfd);
 8768 bruce@momjian.us         1720         [ -  + ]:        2946042 :     if (len < 0)
 6311 tgl@sss.pgh.pa.us        1721         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1722                 :                :                 (errcode_for_file_access(),
                               1723                 :                :                  errmsg("could not seek to end of file \"%s\": %m",
                               1724                 :                :                         FilePathName(seg->mdfd_vfd))));
                               1725                 :                :     /* note that this calculation will ignore any partial block at EOF */
 6311 tgl@sss.pgh.pa.us        1726                 :CBC     2946042 :     return (BlockNumber) (len / BLCKSZ);
                               1727                 :                : }
                               1728                 :                : 
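The note in _mdnblocks() about ignoring "any partial block at EOF" follows from integer division of the file length by BLCKSZ. The minimal sketch below shows the effect, assuming the default 8 kB block size. It returns -1 for a failed size query, where md.c would ereport.

```c
#include <assert.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ 8192  /* assumed default block size */

/*
 * Whole blocks contained in a file of `len` bytes.  Integer division drops
 * a trailing partial block.  A negative length means the size query
 * failed; this sketch reports it as -1 instead of raising an error.
 */
static long long
blocks_in_file(off_t len)
{
    if (len < 0)
        return -1;
    return (long long) (len / SKETCH_BLCKSZ);
}
```

For example, a file of 3 * 8192 + 100 bytes counts as 3 blocks. The trailing 100 bytes are simply not reported.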
                               1729                 :                : /*
                               1730                 :                :  * Sync a file to disk, given a file tag.  Write the path into an output
                               1731                 :                :  * buffer so the caller can use it in error messages.
                               1732                 :                :  *
                               1733                 :                :  * Return 0 on success, -1 on failure, with errno set.
                               1734                 :                :  */
                               1735                 :                : int
 1837 tmunro@postgresql.or     1736                 :UBC           0 : mdsyncfiletag(const FileTag *ftag, char *path)
                               1737                 :                : {
   42 heikki.linnakangas@i     1738                 :UNC           0 :     SMgrRelation reln = smgropen(ftag->rlocator, INVALID_PROC_NUMBER);
                               1739                 :                :     File        file;
                               1740                 :                :     instr_time  io_start;
                               1741                 :                :     bool        need_to_close;
                               1742                 :                :     int         result,
                               1743                 :                :                 save_errno;
                               1744                 :                : 
                               1745                 :                :     /* See if we already have the file open, or need to open it. */
 1583 tmunro@postgresql.or     1746         [ #  # ]:UBC           0 :     if (ftag->segno < reln->md_num_open_segs[ftag->forknum])
                               1747                 :                :     {
                               1748                 :              0 :         file = reln->md_seg_fds[ftag->forknum][ftag->segno].mdfd_vfd;
                               1749                 :              0 :         strlcpy(path, FilePathName(file), MAXPGPATH);
                               1750                 :              0 :         need_to_close = false;
                               1751                 :                :     }
                               1752                 :                :     else
                               1753                 :                :     {
                               1754                 :                :         char       *p;
                               1755                 :                : 
                               1756                 :              0 :         p = _mdfd_segpath(reln, ftag->forknum, ftag->segno);
                               1757                 :              0 :         strlcpy(path, p, MAXPGPATH);
                               1758                 :              0 :         pfree(p);
                               1759                 :                : 
  372                          1760                 :              0 :         file = PathNameOpenFile(path, _mdfd_open_flags());
 1583                          1761         [ #  # ]:              0 :         if (file < 0)
                               1762                 :              0 :             return -1;
                               1763                 :              0 :         need_to_close = true;
                               1764                 :                :     }
                               1765                 :                : 
  120 michael@paquier.xyz      1766                 :UNC           0 :     io_start = pgstat_prepare_io_time(track_io_timing);
                               1767                 :                : 
                               1768                 :                :     /* Sync the file. */
 1583 tmunro@postgresql.or     1769                 :UBC           0 :     result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
                               1770                 :              0 :     save_errno = errno;
                               1771                 :                : 
                               1772         [ #  # ]:              0 :     if (need_to_close)
                               1773                 :              0 :         FileClose(file);
                               1774                 :                : 
  373 andres@anarazel.de       1775                 :              0 :     pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
                               1776                 :                :                             IOOP_FSYNC, io_start, 1);
                               1777                 :                : 
 1583 tmunro@postgresql.or     1778                 :              0 :     errno = save_errno;
                               1779                 :              0 :     return result;
                               1780                 :                : }
                               1781                 :                : 
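mdsyncfiletag() reuses an already open segment when it can, opens the file otherwise, and closes it afterwards only if it opened it. It also saves errno right after the sync, so that the cleanup close cannot overwrite the error the caller reports.

The sketch below shows the same pattern with plain POSIX calls, under these assumptions:

- open/fsync/close stand in for PathNameOpenFile/FileSync/FileClose.
- O_RDWR stands in for _mdfd_open_flags().
- The cached_fd parameter is a hypothetical substitute for looking up md_seg_fds.
- I/O timing statistics are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * fsync `path`.  If cached_fd >= 0 it is used (and left open); otherwise
 * the file is opened here and closed afterwards.  Returns 0 or -1; on
 * failure errno holds the open() or fsync() error, never close()'s.
 */
static int
sync_file(const char *path, int cached_fd)
{
    int fd = cached_fd;
    int need_to_close = 0;
    int result;
    int save_errno;

    if (fd < 0)
    {
        fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;          /* errno already set by open() */
        need_to_close = 1;
    }

    result = fsync(fd);
    save_errno = errno;         /* close() below may clobber errno */

    if (need_to_close)
        close(fd);

    errno = save_errno;         /* report the fsync outcome to the caller */
    return result;
}
```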
                               1782                 :                : /*
                               1783                 :                :  * Unlink a file, given a file tag.  Write the path into an output
                               1784                 :                :  * buffer so the caller can use it in error messages.
                               1785                 :                :  *
                               1786                 :                :  * Return 0 on success, -1 on failure, with errno set.
                               1787                 :                :  */
                               1788                 :                : int
 1837 tmunro@postgresql.or     1789                 :CBC       30677 : mdunlinkfiletag(const FileTag *ftag, char *path)
                               1790                 :                : {
                               1791                 :                :     char       *p;
                               1792                 :                : 
                               1793                 :                :     /* Compute the path. */
  648 rhaas@postgresql.org     1794                 :          30677 :     p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
 1837 tmunro@postgresql.or     1795                 :          30677 :     strlcpy(path, p, MAXPGPATH);
                               1796                 :          30677 :     pfree(p);
                               1797                 :                : 
                               1798                 :                :     /* Try to unlink the file. */
                               1799                 :          30677 :     return unlink(path);
                               1800                 :                : }
                               1801                 :                : 
                               1802                 :                : /*
                               1803                 :                :  * Check if a given candidate request matches a given tag, when processing
                               1804                 :                :  * a SYNC_FILTER_REQUEST request.  This will be called for all pending
                               1805                 :                :  * requests to find out whether to forget them.
                               1806                 :                :  */
                               1807                 :                : bool
                               1808                 :          19329 : mdfiletagmatches(const FileTag *ftag, const FileTag *candidate)
                               1809                 :                : {
                               1810                 :                :     /*
                               1811                 :                :      * For now we only use filter requests as a way to drop all scheduled
                               1812                 :                :      * callbacks relating to a given database, when dropping the database.
                               1813                 :                :      * We'll return true for all candidates that have the same database OID as
                               1814                 :                :      * the ftag from the SYNC_FILTER_REQUEST request, so they're forgotten.
                               1815                 :                :      */
  648 rhaas@postgresql.org     1816                 :          19329 :     return ftag->rlocator.dbOid == candidate->rlocator.dbOid;
                               1817                 :                : }
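mdfiletagmatches() treats a SYNC_FILTER_REQUEST as "every pending request in the same database". Only dbOid is compared, so the tablespace, relation, fork and segment are ignored.

The sketch below applies that predicate to a hypothetical array of pending requests and drops the matches, under these assumptions:

- SketchLocator and SketchTag are simplified stand-ins for RelFileLocator and FileTag.
- The array-compaction queue is invented for illustration. PostgreSQL's sync machinery keeps its own bookkeeping for forgotten requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Simplified stand-ins for RelFileLocator and FileTag. */
typedef struct SketchLocator { Oid spcOid; Oid dbOid; Oid relNumber; } SketchLocator;
typedef struct SketchTag { SketchLocator rlocator; int forknum; uint32_t segno; } SketchTag;

/* Same test as mdfiletagmatches(): only the database OID matters. */
static int
tag_matches(const SketchTag *ftag, const SketchTag *candidate)
{
    return ftag->rlocator.dbOid == candidate->rlocator.dbOid;
}

/*
 * Remove every pending request matching `filter`, compacting the array in
 * place and keeping survivors in their original order.  Returns the new
 * number of pending requests.
 */
static size_t
forget_matching(SketchTag *pending, size_t n, const SketchTag *filter)
{
    size_t kept = 0;

    assert(pending != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!tag_matches(filter, &pending[i]))
            pending[kept++] = pending[i];
    return kept;
}
```

Dropping a database this way forgets all of its queued sync work in one pass, without building a separate request for each relation.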
        

Generated by: LCOV version 2.1-beta2-3-g6141622