LCOV - differential code coverage report
Current view: top level - src/backend/access/transam - xlog.c (source / functions)
Current: Differential Code Coverage 16@8cea358b128 vs 17@8cea358b128
Current Date: 2024-04-14 14:21:10
Baseline: 16@8cea358b128
Baseline Date: 2024-04-14 14:21:09
Legend: Lines: hit not hit | Branches: + taken - not taken # not executed
Coverage Total Hit UNC LBC UBC GBC GNC CBC DUB DCB
Lines: 88.7 % 2512 2228 30 7 247 8 220 2000 21 118
Functions: 99.2 % 120 119 1 37 82 1
Branches: 63.8 % 1807 1152 55 9 591 16 115 1021
Line coverage date bins:
[..60] days: 94.6 % 56 53 3 53
(60,120] days: 97.0 % 66 64 2 2 61 1
(120,180] days: 91.7 % 84 77 7 75 2
(180,240] days: 60.0 % 45 27 18 26 1
(240..) days: 88.8 % 2261 2007 2 7 245 6 5 1996
Function coverage date bins:
(60,120] days: 100.0 % 2 2 2
(120,180] days: 100.0 % 1 1 1
(180,240] days: 100.0 % 2 2 2
(240..) days: 99.1 % 115 114 1 32 82
Branch coverage date bins:
[..60] days: 62.5 % 24 15 9 15
(60,120] days: 76.2 % 42 32 3 7 2 29 1
(120,180] days: 73.3 % 60 44 16 44
(180,240] days: 54.5 % 44 24 20 24
(240..) days: 63.3 % 1637 1037 7 9 584 14 3 1020

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * xlog.c
                                  4                 :                :  *      PostgreSQL write-ahead log manager
                                  5                 :                :  *
                                  6                 :                :  * The Write-Ahead Log (WAL) functionality is split into several source
                                  7                 :                :  * files, in addition to this one:
                                  8                 :                :  *
                                  9                 :                :  * xloginsert.c - Functions for constructing WAL records
                                 10                 :                :  * xlogrecovery.c - WAL recovery and standby code
                                 11                 :                :  * xlogreader.c - Facility for reading WAL files and parsing WAL records
                                 12                 :                :  * xlogutils.c - Helper functions for WAL redo routines
                                 13                 :                :  *
                                 14                 :                :  * This file contains functions for coordinating database startup and
                                 15                 :                :  * checkpointing, and managing the write-ahead log buffers when the
                                 16                 :                :  * system is running.
                                 17                 :                :  *
                                 18                 :                :  * StartupXLOG() is the main entry point of the startup process.  It
                                 19                 :                :  * coordinates database startup, performing WAL recovery, and the
                                 20                 :                :  * transition from WAL recovery into normal operations.
                                 21                 :                :  *
                                 22                 :                :  * XLogInsertRecord() inserts a WAL record into the WAL buffers.  Most
                                 23                 :                :  * callers should not call this directly, but use the functions in
                                 24                 :                :  * xloginsert.c to construct the WAL record.  XLogFlush() can be used
                                 25                 :                :  * to force the WAL to disk.
                                 26                 :                :  *
                                 27                 :                :  * In addition to those, there are many other functions for interrogating
                                 28                 :                :  * the current system state, and for starting/stopping backups.
                                 29                 :                :  *
                                 30                 :                :  *
                                 31                 :                :  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
                                 32                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 33                 :                :  *
                                 34                 :                :  * src/backend/access/transam/xlog.c
                                 35                 :                :  *
                                 36                 :                :  *-------------------------------------------------------------------------
                                 37                 :                :  */
                                 38                 :                : 
                                 39                 :                : #include "postgres.h"
                                 40                 :                : 
                                 41                 :                : #include <ctype.h>
                                 42                 :                : #include <math.h>
                                 43                 :                : #include <time.h>
                                 44                 :                : #include <fcntl.h>
                                 45                 :                : #include <sys/stat.h>
                                 46                 :                : #include <sys/time.h>
                                 47                 :                : #include <unistd.h>
                                 48                 :                : 
                                 49                 :                : #include "access/clog.h"
                                 50                 :                : #include "access/commit_ts.h"
                                 51                 :                : #include "access/heaptoast.h"
                                 52                 :                : #include "access/multixact.h"
                                 53                 :                : #include "access/rewriteheap.h"
                                 54                 :                : #include "access/subtrans.h"
                                 55                 :                : #include "access/timeline.h"
                                 56                 :                : #include "access/transam.h"
                                 57                 :                : #include "access/twophase.h"
                                 58                 :                : #include "access/xact.h"
                                 59                 :                : #include "access/xlog_internal.h"
                                 60                 :                : #include "access/xlogarchive.h"
                                 61                 :                : #include "access/xloginsert.h"
                                 62                 :                : #include "access/xlogreader.h"
                                 63                 :                : #include "access/xlogrecovery.h"
                                 64                 :                : #include "access/xlogutils.h"
                                 65                 :                : #include "backup/basebackup.h"
                                 66                 :                : #include "catalog/catversion.h"
                                 67                 :                : #include "catalog/pg_control.h"
                                 68                 :                : #include "catalog/pg_database.h"
                                 69                 :                : #include "common/controldata_utils.h"
                                 70                 :                : #include "common/file_utils.h"
                                 71                 :                : #include "executor/instrument.h"
                                 72                 :                : #include "miscadmin.h"
                                 73                 :                : #include "pg_trace.h"
                                 74                 :                : #include "pgstat.h"
                                 75                 :                : #include "port/atomics.h"
                                 76                 :                : #include "port/pg_iovec.h"
                                 77                 :                : #include "postmaster/bgwriter.h"
                                 78                 :                : #include "postmaster/startup.h"
                                 79                 :                : #include "postmaster/walsummarizer.h"
                                 80                 :                : #include "postmaster/walwriter.h"
                                 81                 :                : #include "replication/origin.h"
                                 82                 :                : #include "replication/slot.h"
                                 83                 :                : #include "replication/snapbuild.h"
                                 84                 :                : #include "replication/walreceiver.h"
                                 85                 :                : #include "replication/walsender.h"
                                 86                 :                : #include "storage/bufmgr.h"
                                 87                 :                : #include "storage/fd.h"
                                 88                 :                : #include "storage/ipc.h"
                                 89                 :                : #include "storage/large_object.h"
                                 90                 :                : #include "storage/latch.h"
                                 91                 :                : #include "storage/predicate.h"
                                 92                 :                : #include "storage/proc.h"
                                 93                 :                : #include "storage/procarray.h"
                                 94                 :                : #include "storage/reinit.h"
                                 95                 :                : #include "storage/spin.h"
                                 96                 :                : #include "storage/sync.h"
                                 97                 :                : #include "utils/guc_hooks.h"
                                 98                 :                : #include "utils/guc_tables.h"
                                 99                 :                : #include "utils/injection_point.h"
                                100                 :                : #include "utils/memutils.h"
                                101                 :                : #include "utils/ps_status.h"
                                102                 :                : #include "utils/relmapper.h"
                                103                 :                : #include "utils/snapmgr.h"
                                104                 :                : #include "utils/timeout.h"
                                105                 :                : #include "utils/timestamp.h"
                                106                 :                : #include "utils/varlena.h"
                                107                 :                : 
                                108                 :                : extern uint32 bootstrap_data_checksum_version;
                                109                 :                : 
                                110                 :                : /* timeline ID to be used when bootstrapping */
                                111                 :                : #define BootstrapTimeLineID     1
                                112                 :                : 
                                113                 :                : /* User-settable parameters */
                                114                 :                : int         max_wal_size_mb = 1024; /* 1 GB */
                                115                 :                : int         min_wal_size_mb = 80;   /* 80 MB */
                                116                 :                : int         wal_keep_size_mb = 0;
                                117                 :                : int         XLOGbuffers = -1;
                                118                 :                : int         XLogArchiveTimeout = 0;
                                119                 :                : int         XLogArchiveMode = ARCHIVE_MODE_OFF;
                                120                 :                : char       *XLogArchiveCommand = NULL;
                                121                 :                : bool        EnableHotStandby = false;
                                122                 :                : bool        fullPageWrites = true;
                                123                 :                : bool        wal_log_hints = false;
                                124                 :                : int         wal_compression = WAL_COMPRESSION_NONE;
                                125                 :                : char       *wal_consistency_checking_string = NULL;
                                126                 :                : bool       *wal_consistency_checking = NULL;
                                127                 :                : bool        wal_init_zero = true;
                                128                 :                : bool        wal_recycle = true;
                                129                 :                : bool        log_checkpoints = true;
                                130                 :                : int         wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
                                131                 :                : int         wal_level = WAL_LEVEL_REPLICA;
                                132                 :                : int         CommitDelay = 0;    /* precommit delay in microseconds */
                                133                 :                : int         CommitSiblings = 5; /* # concurrent xacts needed to sleep */
                                134                 :                : int         wal_retrieve_retry_interval = 5000;
                                135                 :                : int         max_slot_wal_keep_size_mb = -1;
                                136                 :                : int         wal_decode_buffer_size = 512 * 1024;
                                137                 :                : bool        track_wal_io_timing = false;
                                138                 :                : 
                                139                 :                : #ifdef WAL_DEBUG
                                140                 :                : bool        XLOG_DEBUG = false;
                                141                 :                : #endif
                                142                 :                : 
                                143                 :                : int         wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
                                144                 :                : 
                                145                 :                : /*
                                146                 :                :  * Number of WAL insertion locks to use. A higher value allows more insertions
                                147                 :                :  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
                                148                 :                :  * which needs to iterate all the locks.
                                149                 :                :  */
                                150                 :                : #define NUM_XLOGINSERT_LOCKS  8
                                151                 :                : 
                                152                 :                : /*
                                153                 :                :  * Max distance from last checkpoint, before triggering a new xlog-based
                                154                 :                :  * checkpoint.
                                155                 :                :  */
                                156                 :                : int         CheckPointSegments;
                                157                 :                : 
                                158                 :                : /* Estimated distance between checkpoints, in bytes */
                                159                 :                : static double CheckPointDistanceEstimate = 0;
                                160                 :                : static double PrevCheckPointDistance = 0;
                                161                 :                : 
                                162                 :                : /*
                                163                 :                :  * Track whether there were any deferred checks for custom resource managers
                                164                 :                :  * specified in wal_consistency_checking.
                                165                 :                :  */
                                166                 :                : static bool check_wal_consistency_checking_deferred = false;
                                167                 :                : 
                                168                 :                : /*
                                169                 :                :  * GUC support
                                170                 :                :  */
                                171                 :                : const struct config_enum_entry wal_sync_method_options[] = {
                                172                 :                :     {"fsync", WAL_SYNC_METHOD_FSYNC, false},
                                173                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                                174                 :                :     {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
                                175                 :                : #endif
                                176                 :                :     {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
                                177                 :                : #ifdef O_SYNC
                                178                 :                :     {"open_sync", WAL_SYNC_METHOD_OPEN, false},
                                179                 :                : #endif
                                180                 :                : #ifdef O_DSYNC
                                181                 :                :     {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
                                182                 :                : #endif
                                183                 :                :     {NULL, 0, false}
                                184                 :                : };
                                185                 :                : 
                                186                 :                : 
                                187                 :                : /*
                                188                 :                :  * Although only "on", "off", and "always" are documented,
                                189                 :                :  * we accept all the likely variants of "on" and "off".
                                190                 :                :  */
                                191                 :                : const struct config_enum_entry archive_mode_options[] = {
                                192                 :                :     {"always", ARCHIVE_MODE_ALWAYS, false},
                                193                 :                :     {"on", ARCHIVE_MODE_ON, false},
                                194                 :                :     {"off", ARCHIVE_MODE_OFF, false},
                                195                 :                :     {"true", ARCHIVE_MODE_ON, true},
                                196                 :                :     {"false", ARCHIVE_MODE_OFF, true},
                                197                 :                :     {"yes", ARCHIVE_MODE_ON, true},
                                198                 :                :     {"no", ARCHIVE_MODE_OFF, true},
                                199                 :                :     {"1", ARCHIVE_MODE_ON, true},
                                200                 :                :     {"0", ARCHIVE_MODE_OFF, true},
                                201                 :                :     {NULL, 0, false}
                                202                 :                : };
                                203                 :                : 
                                204                 :                : /*
                                205                 :                :  * Statistics for current checkpoint are collected in this global struct.
                                206                 :                :  * Because only the checkpointer or a stand-alone backend can perform
                                207                 :                :  * checkpoints, this will be unused in normal backends.
                                208                 :                :  */
                                209                 :                : CheckpointStatsData CheckpointStats;
                                210                 :                : 
                                211                 :                : /*
                                212                 :                :  * During recovery, lastFullPageWrites keeps track of full_page_writes that
                                213                 :                :  * the replayed WAL records indicate. It's initialized with full_page_writes
                                214                 :                :  * that the recovery starting checkpoint record indicates, and then updated
                                 215                 :                :  * each time an XLOG_FPW_CHANGE record is replayed.
                                216                 :                :  */
                                217                 :                : static bool lastFullPageWrites;
                                218                 :                : 
                                219                 :                : /*
                                 220                 :                :  * Local copy of the state tracked by SharedRecoveryState in shared memory.
                                221                 :                :  * It is false if SharedRecoveryState is RECOVERY_STATE_DONE.  True actually
                                222                 :                :  * means "not known, need to check the shared state".
                                223                 :                :  */
                                224                 :                : static bool LocalRecoveryInProgress = true;
                                225                 :                : 
                                226                 :                : /*
                                227                 :                :  * Local state for XLogInsertAllowed():
                                228                 :                :  *      1: unconditionally allowed to insert XLOG
                                229                 :                :  *      0: unconditionally not allowed to insert XLOG
                                230                 :                :  *      -1: must check RecoveryInProgress(); disallow until it is false
                                231                 :                :  * Most processes start with -1 and transition to 1 after seeing that recovery
                                232                 :                :  * is not in progress.  But we can also force the value for special cases.
                                233                 :                :  * The coding in XLogInsertAllowed() depends on the first two of these states
                                234                 :                :  * being numerically the same as bool true and false.
                                235                 :                :  */
                                236                 :                : static int  LocalXLogInsertAllowed = -1;
                                237                 :                : 
                                238                 :                : /*
                                239                 :                :  * ProcLastRecPtr points to the start of the last XLOG record inserted by the
                                240                 :                :  * current backend.  It is updated for all inserts.  XactLastRecEnd points to
                                241                 :                :  * end+1 of the last record, and is reset when we end a top-level transaction,
                                242                 :                :  * or start a new one; so it can be used to tell if the current transaction has
                                243                 :                :  * created any XLOG records.
                                244                 :                :  *
                                245                 :                :  * While in parallel mode, this may not be fully up to date.  When committing,
                                246                 :                :  * a transaction can assume this covers all xlog records written either by the
                                247                 :                :  * user backend or by any parallel worker which was present at any point during
                                248                 :                :  * the transaction.  But when aborting, or when still in parallel mode, other
                                249                 :                :  * parallel backends may have written WAL records at later LSNs than the value
                                250                 :                :  * stored here.  The parallel leader advances its own copy, when necessary,
                                251                 :                :  * in WaitForParallelWorkersToFinish.
                                252                 :                :  */
                                253                 :                : XLogRecPtr  ProcLastRecPtr = InvalidXLogRecPtr;
                                254                 :                : XLogRecPtr  XactLastRecEnd = InvalidXLogRecPtr;
                                255                 :                : XLogRecPtr  XactLastCommitEnd = InvalidXLogRecPtr;
                                256                 :                : 
                                257                 :                : /*
                                258                 :                :  * RedoRecPtr is this backend's local copy of the REDO record pointer
                                259                 :                :  * (which is almost but not quite the same as a pointer to the most recent
                                260                 :                :  * CHECKPOINT record).  We update this from the shared-memory copy,
                                261                 :                :  * XLogCtl->Insert.RedoRecPtr, whenever we can safely do so (ie, when we
                                262                 :                :  * hold an insertion lock).  See XLogInsertRecord for details.  We are also
                                263                 :                :  * allowed to update from XLogCtl->RedoRecPtr if we hold the info_lck;
                                264                 :                :  * see GetRedoRecPtr.
                                265                 :                :  *
                                266                 :                :  * NB: Code that uses this variable must be prepared not only for the
                                267                 :                :  * possibility that it may be arbitrarily out of date, but also for the
                                268                 :                :  * possibility that it might be set to InvalidXLogRecPtr. We used to
                                269                 :                :  * initialize it as a side effect of the first call to RecoveryInProgress(),
                                270                 :                :  * which meant that most code that might use it could assume that it had a
                                271                 :                :  * real if perhaps stale value. That's no longer the case.
                                272                 :                :  */
                                273                 :                : static XLogRecPtr RedoRecPtr;
                                274                 :                : 
                                275                 :                : /*
                                276                 :                :  * doPageWrites is this backend's local copy of (fullPageWrites ||
                                277                 :                :  * runningBackups > 0).  It is used together with RedoRecPtr to decide whether
                                 278                 :                :  * a full-page image of a page needs to be taken.
                                279                 :                :  *
                                280                 :                :  * NB: Initially this is false, and there's no guarantee that it will be
                                281                 :                :  * initialized to any other value before it is first used. Any code that
                                282                 :                :  * makes use of it must recheck the value after obtaining a WALInsertLock,
                                283                 :                :  * and respond appropriately if it turns out that the previous value wasn't
                                284                 :                :  * accurate.
                                285                 :                :  */
                                286                 :                : static bool doPageWrites;
                                287                 :                : 
                                288                 :                : /*----------
                                289                 :                :  * Shared-memory data structures for XLOG control
                                290                 :                :  *
                                291                 :                :  * LogwrtRqst indicates a byte position that we need to write and/or fsync
                                292                 :                :  * the log up to (all records before that point must be written or fsynced).
                                293                 :                :  * The positions already written/fsynced are maintained in logWriteResult
                                294                 :                :  * and logFlushResult using atomic access.
                                295                 :                :  * In addition to the shared variables, each backend has a private copy of
                                296                 :                :  * both in LogwrtResult, which is updated when convenient.
                                297                 :                :  *
                                298                 :                :  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
                                299                 :                :  * (protected by info_lck), but we don't need to cache any copies of it.
                                300                 :                :  *
                                301                 :                :  * info_lck is only held long enough to read/update the protected variables,
                                302                 :                :  * so it's a plain spinlock.  The other locks are held longer (potentially
                                303                 :                :  * over I/O operations), so we use LWLocks for them.  These locks are:
                                304                 :                :  *
                                305                 :                :  * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
                                306                 :                :  * It is only held while initializing and changing the mapping.  If the
                                307                 :                :  * contents of the buffer being replaced haven't been written yet, the mapping
                                308                 :                :  * lock is released while the write is done, and reacquired afterwards.
                                309                 :                :  *
                                310                 :                :  * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
                                311                 :                :  * XLogFlush).
                                312                 :                :  *
                                313                 :                :  * ControlFileLock: must be held to read/update the control file or create
                                314                 :                :  * a new log file.
                                315                 :                :  *
                                316                 :                :  *----------
                                317                 :                :  */
                                318                 :                : 
                                319                 :                : typedef struct XLogwrtRqst
                                320                 :                : {
                                321                 :                :     XLogRecPtr  Write;          /* last byte + 1 to write out */
                                322                 :                :     XLogRecPtr  Flush;          /* last byte + 1 to flush */
                                323                 :                : } XLogwrtRqst;
                                324                 :                : 
                                325                 :                : typedef struct XLogwrtResult
                                326                 :                : {
                                327                 :                :     XLogRecPtr  Write;          /* last byte + 1 written out */
                                328                 :                :     XLogRecPtr  Flush;          /* last byte + 1 flushed */
                                329                 :                : } XLogwrtResult;
                                330                 :                : 
                                331                 :                : /*
                                332                 :                :  * Inserting to WAL is protected by a small fixed number of WAL insertion
                                333                 :                :  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
                                334                 :                :  * matter which one. To lock out other concurrent insertions, you must hold
                                335                 :                :  * all of them. Each WAL insertion lock consists of a lightweight lock, plus
                                336                 :                :  * an indicator of how far the insertion has progressed (insertingAt).
                                337                 :                :  *
                                338                 :                :  * The insertingAt values are read when a process wants to flush WAL from
                                339                 :                :  * the in-memory buffers to disk, to check that all the insertions to the
                                340                 :                :  * region the process is about to write out have finished. You could simply
                                341                 :                :  * wait for all currently in-progress insertions to finish, but the
                                342                 :                :  * insertingAt indicator allows you to ignore insertions to later in the WAL,
                                343                 :                :  * so that you only wait for the insertions that are modifying the buffers
                                344                 :                :  * you're about to write out.
                                345                 :                :  *
                                346                 :                :  * This isn't just an optimization. If all the WAL buffers are dirty, an
                                347                 :                :  * inserter that's holding a WAL insert lock might need to evict an old WAL
                                348                 :                :  * buffer, which requires flushing the WAL. If it's possible for an inserter
                                349                 :                :  * to block on another inserter unnecessarily, deadlock can arise when two
                                350                 :                :  * inserters holding a WAL insert lock wait for each other to finish their
                                351                 :                :  * insertion.
                                352                 :                :  *
                                353                 :                :  * Small WAL records that don't cross a page boundary never update the value;
                                354                 :                :  * the WAL record is just copied to the page and the lock is released. But
                                355                 :                :  * to avoid the deadlock-scenario explained above, the indicator is always
                                356                 :                :  * updated before sleeping while holding an insertion lock.
                                357                 :                :  *
                                358                 :                :  * lastImportantAt contains the LSN of the last important WAL record inserted
                                359                 :                :  * using a given lock. This value is used to detect if there has been
                                360                 :                :  * important WAL activity since the last time some action, like a checkpoint,
                                361                 :                :  * was performed, so that the action can be skipped if not. The LSN is
                                362                 :                :  * updated for all insertions, unless the XLOG_MARK_UNIMPORTANT flag was
                                363                 :                :  * set. lastImportantAt is never cleared, only overwritten by the LSN of newer
                                364                 :                :  * records.  Tracking the WAL activity directly in WALInsertLock has the
                                365                 :                :  * advantage of not needing any additional locks to update the value.
                                366                 :                :  */
                                367                 :                : typedef struct
                                368                 :                : {
                                369                 :                :     LWLock      lock;
                                370                 :                :     pg_atomic_uint64 insertingAt;
                                371                 :                :     XLogRecPtr  lastImportantAt;
                                372                 :                : } WALInsertLock;
                                373                 :                : 
                                374                 :                : /*
                                375                 :                :  * All the WAL insertion locks are allocated as an array in shared memory. We
                                376                 :                :  * force the array stride to be a power of 2, which saves a few cycles in
                                377                 :                :  * indexing, but more importantly also ensures that individual slots don't
                                378                 :                :  * cross cache line boundaries. (Of course, we have to also ensure that the
                                379                 :                :  * array start address is suitably aligned.)
                                380                 :                :  */
                                381                 :                : typedef union WALInsertLockPadded
                                382                 :                : {
                                383                 :                :     WALInsertLock l;
                                384                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                385                 :                : } WALInsertLockPadded;
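
The padding trick described above can be tried in isolation. The following is a minimal sketch, not the PostgreSQL definition: the `Sketch*` names are hypothetical, the lock payload is reduced to three 64-bit words, and the cache line size (128) and lock count (8) are assumed values chosen for illustration. It only shows that a union with a `char pad[...]` member forces the array stride to the pad size, so each slot starts at a multiple of that size from the array base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real constants are platform/config defined. */
#define SKETCH_CACHE_LINE_SIZE 128
#define SKETCH_NUM_LOCKS 8

/* Hypothetical stand-in for the lock payload (smaller than one cache line). */
typedef struct
{
    uint64_t    lock_word;
    uint64_t    insertingAt;
    uint64_t    lastImportantAt;
} SketchInsertLock;

/* The union is as large as its largest member, i.e. the pad array. */
typedef union
{
    SketchInsertLock l;
    char        pad[SKETCH_CACHE_LINE_SIZE];
} SketchInsertLockPadded;

/* Compile-time check that the stride equals the assumed cache line size. */
_Static_assert(sizeof(SketchInsertLockPadded) == SKETCH_CACHE_LINE_SIZE,
               "padded lock must occupy exactly one cache line");

/* Return slot i of the array; consecutive slots are one stride apart. */
static inline SketchInsertLockPadded *
sketch_get_slot(SketchInsertLockPadded *base, size_t i)
{
    assert(i < SKETCH_NUM_LOCKS);
    return &base[i];
}
```

Note that the sketch says nothing about the alignment of the array's start address; as the comment above points out, that has to be ensured separately.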
                                386                 :                : 
                                387                 :                : /*
                                388                 :                :  * Session status of running backup, used for sanity checks in SQL-callable
                                389                 :                :  * functions to start and stop backups.
                                390                 :                :  */
                                391                 :                : static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;
                                392                 :                : 
                                393                 :                : /*
                                394                 :                :  * Shared state data for WAL insertion.
                                395                 :                :  */
                                396                 :                : typedef struct XLogCtlInsert
                                397                 :                : {
                                398                 :                :     slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
                                399                 :                : 
                                400                 :                :     /*
                                401                 :                :      * CurrBytePos is the end of reserved WAL. The next record will be
                                402                 :                :      * inserted at that position. PrevBytePos is the start position of the
                                403                 :                :      * previously inserted (or rather, reserved) record - it is copied to the
                                404                 :                :      * prev-link of the next record. These are stored as "usable byte
                                405                 :                :      * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
                                406                 :                :      */
                                407                 :                :     uint64      CurrBytePos;
                                408                 :                :     uint64      PrevBytePos;
                                409                 :                : 
                                410                 :                :     /*
                                411                 :                :      * Make sure the above heavily-contended spinlock and byte positions are
                                412                 :                :      * on their own cache line. In particular, the RedoRecPtr and full page
                                413                 :                :      * write variables below should be on a different cache line. They are
                                414                 :                :      * read on every WAL insertion, but updated rarely, and we don't want
                                415                 :                :      * those reads to steal the cache line containing Curr/PrevBytePos.
                                416                 :                :      */
                                417                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                418                 :                : 
                                419                 :                :     /*
                                420                 :                :      * fullPageWrites is the authoritative value used by all backends to
                                421                 :                :      * determine whether to write a full-page image to WAL. This shared value,
                                422                 :                :      * instead of the process-local fullPageWrites, is required because, when
                                423                 :                :      * full_page_writes is changed by SIGHUP, we must WAL-log it before it
                                424                 :                :      * actually affects WAL-logging by backends.  The checkpointer sets it at
                                425                 :                :      * startup or after SIGHUP.
                                426                 :                :      *
                                427                 :                :      * To read these fields, you must hold an insertion lock. To modify them,
                                428                 :                :      * you must hold ALL the locks.
                                429                 :                :      */
                                430                 :                :     XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
                                431                 :                :     bool        fullPageWrites;
                                432                 :                : 
                                433                 :                :     /*
                                434                 :                :      * runningBackups is a counter indicating the number of backups currently
                                435                 :                :      * in progress. lastBackupStart is the latest checkpoint redo location
                                436                 :                :      * used as a starting point for an online backup.
                                437                 :                :      */
                                438                 :                :     int         runningBackups;
                                439                 :                :     XLogRecPtr  lastBackupStart;
                                440                 :                : 
                                441                 :                :     /*
                                442                 :                :      * WAL insertion locks.
                                443                 :                :      */
                                444                 :                :     WALInsertLockPadded *WALInsertLocks;
                                445                 :                : } XLogCtlInsert;
                                446                 :                : 
                                447                 :                : /*
                                448                 :                :  * Total shared-memory state for XLOG.
                                449                 :                :  */
                                450                 :                : typedef struct XLogCtlData
                                451                 :                : {
                                452                 :                :     XLogCtlInsert Insert;
                                453                 :                : 
                                454                 :                :     /* Protected by info_lck: */
                                455                 :                :     XLogwrtRqst LogwrtRqst;
                                456                 :                :     XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
                                457                 :                :     FullTransactionId ckptFullXid;  /* nextXid of latest checkpoint */
                                458                 :                :     XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
                                459                 :                :     XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */
                                460                 :                : 
                                461                 :                :     XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */
                                462                 :                : 
                                463                 :                :     /* Fake LSN counter, for unlogged relations. */
                                464                 :                :     pg_atomic_uint64 unloggedLSN;
                                465                 :                : 
                                466                 :                :     /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
                                467                 :                :     pg_time_t   lastSegSwitchTime;
                                468                 :                :     XLogRecPtr  lastSegSwitchLSN;
                                469                 :                : 
                                470                 :                :     /* These are accessed using atomics -- info_lck not needed */
                                471                 :                :     pg_atomic_uint64 logInsertResult;   /* last byte + 1 inserted to buffers */
                                472                 :                :     pg_atomic_uint64 logWriteResult;    /* last byte + 1 written out */
                                473                 :                :     pg_atomic_uint64 logFlushResult;    /* last byte + 1 flushed */
                                474                 :                : 
                                475                 :                :     /*
                                476                 :                :      * Latest initialized page in the cache (last byte position + 1).
                                477                 :                :      *
                                478                 :                :      * To change the identity of a buffer (and InitializedUpTo), you need to
                                479                 :                :      * hold WALBufMappingLock.  To change the identity of a buffer that's
                                480                 :                :      * still dirty, the old page needs to be written out first, and for that
                                481                 :                :      * you need WALWriteLock, and you need to ensure that there are no
                                482                 :                :      * in-progress insertions to the page by calling
                                483                 :                :      * WaitXLogInsertionsToFinish().
                                484                 :                :      */
                                485                 :                :     XLogRecPtr  InitializedUpTo;
                                486                 :                : 
                                487                 :                :     /*
                                488                 :                :      * These values do not change after startup, although the pointed-to pages
                                489                 :                :      * and xlblocks values certainly do.  xlblocks values are protected by
                                490                 :                :      * WALBufMappingLock.
                                491                 :                :      */
                                492                 :                :     char       *pages;          /* buffers for unwritten XLOG pages */
                                493                 :                :     pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
                                494                 :                :     int         XLogCacheBlck;  /* highest allocated xlog buffer index */
                                495                 :                : 
                                496                 :                :     /*
                                497                 :                :      * InsertTimeLineID is the timeline into which new WAL is being inserted
                                498                 :                :      * and flushed. It is zero during recovery, and does not change once set.
                                499                 :                :      *
                                500                 :                :      * If we created a new timeline when the system was started up,
                                501                 :                :      * PrevTimeLineID is the old timeline's ID that we forked off from.
                                502                 :                :      * Otherwise it's equal to InsertTimeLineID.
                                503                 :                :      */
                                504                 :                :     TimeLineID  InsertTimeLineID;
                                505                 :                :     TimeLineID  PrevTimeLineID;
                                506                 :                : 
                                507                 :                :     /*
                                508                 :                :      * SharedRecoveryState indicates if we're still in crash or archive
                                509                 :                :      * recovery.  Protected by info_lck.
                                510                 :                :      */
                                511                 :                :     RecoveryState SharedRecoveryState;
                                512                 :                : 
                                513                 :                :     /*
                                514                 :                :      * InstallXLogFileSegmentActive indicates whether the checkpointer should
                                515                 :                :      * arrange for future segments by recycling and/or PreallocXlogFiles().
                                516                 :                :      * Protected by ControlFileLock.  Only the startup process changes it.  If
                                517                 :                :      * true, anyone can use InstallXLogFileSegment().  If false, the startup
                                518                 :                :      * process owns the exclusive right to install segments, by reading from
                                519                 :                :      * the archive and possibly replacing existing files.
                                520                 :                :      */
                                521                 :                :     bool        InstallXLogFileSegmentActive;
                                522                 :                : 
                                523                 :                :     /*
                                524                 :                :      * WalWriterSleeping indicates whether the WAL writer is currently in
                                525                 :                :      * low-power mode (and hence should be nudged if an async commit occurs).
                                526                 :                :      * Protected by info_lck.
                                527                 :                :      */
                                528                 :                :     bool        WalWriterSleeping;
                                529                 :                : 
                                530                 :                :     /*
                                531                 :                :      * During recovery, we keep a copy of the latest checkpoint record here.
                                532                 :                :      * lastCheckPointRecPtr points to start of checkpoint record and
                                533                 :                :      * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
                                534                 :                :      * checkpointer when it wants to create a restartpoint.
                                535                 :                :      *
                                536                 :                :      * Protected by info_lck.
                                537                 :                :      */
                                538                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                                539                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                                540                 :                :     CheckPoint  lastCheckPoint;
                                541                 :                : 
                                542                 :                :     /*
                                543                 :                :      * lastFpwDisableRecPtr points to the start of the last replayed
                                544                 :                :      * XLOG_FPW_CHANGE record that indicates full_page_writes is disabled.
                                545                 :                :      */
                                546                 :                :     XLogRecPtr  lastFpwDisableRecPtr;
                                547                 :                : 
                                548                 :                :     slock_t     info_lck;       /* locks shared variables shown above */
                                549                 :                : } XLogCtlData;
                                550                 :                : 
                                551                 :                : /*
                                552                 :                :  * Classification of XLogInsertRecord operations.
                                553                 :                :  */
                                554                 :                : typedef enum
                                555                 :                : {
                                556                 :                :     WALINSERT_NORMAL,
                                557                 :                :     WALINSERT_SPECIAL_SWITCH,
                                558                 :                :     WALINSERT_SPECIAL_CHECKPOINT
                                559                 :                : } WalInsertClass;
                                560                 :                : 
                                561                 :                : static XLogCtlData *XLogCtl = NULL;
                                562                 :                : 
                                563                 :                : /* a private copy of XLogCtl->Insert.WALInsertLocks, for convenience */
                                564                 :                : static WALInsertLockPadded *WALInsertLocks = NULL;
                                565                 :                : 
                                566                 :                : /*
                                567                 :                :  * We maintain an image of pg_control in shared memory.
                                568                 :                :  */
                                569                 :                : static ControlFileData *ControlFile = NULL;
                                570                 :                : 
                                571                 :                : /*
                                572                 :                :  * Calculate the amount of space left on the page after 'endptr'. Beware
                                573                 :                :  * multiple evaluation!
                                574                 :                :  */
                                575                 :                : #define INSERT_FREESPACE(endptr)    \
                                576                 :                :     (((endptr) % XLOG_BLCKSZ == 0) ? 0 : (XLOG_BLCKSZ - (endptr) % XLOG_BLCKSZ))
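
As a small illustration of the arithmetic in INSERT_FREESPACE, here is a sketch that rewrites it as an inline function, which avoids the multiple-evaluation hazard the comment warns about. The name `sketch_insert_freespace` and the block size of 8192 bytes are assumptions made for this example, not part of the file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block size for illustration; XLOG_BLCKSZ is a build-time setting. */
#define SKETCH_XLOG_BLCKSZ 8192

/*
 * Bytes left on the page after 'endptr'.  A position exactly on a page
 * boundary yields 0, matching the macro's first branch.  Being a function,
 * it evaluates its argument only once.
 */
static inline uint64_t
sketch_insert_freespace(uint64_t endptr)
{
    uint64_t    offset = endptr % SKETCH_XLOG_BLCKSZ;

    assert(offset < SKETCH_XLOG_BLCKSZ);
    return (offset == 0) ? 0 : (SKETCH_XLOG_BLCKSZ - offset);
}
```

With the assumed block size, a position 100 bytes into a page leaves 8092 bytes, and a position on a page boundary leaves 0.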
                                577                 :                : 
                                578                 :                : /* Macro to advance to next buffer index. */
                                579                 :                : #define NextBufIdx(idx)     \
                                580                 :                :         (((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
                                581                 :                : 
                                582                 :                : /*
                                583                 :                :  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
                                584                 :                :  * would hold if it was in cache, the page containing 'recptr'.
                                585                 :                :  */
                                586                 :                : #define XLogRecPtrToBufIdx(recptr)  \
                                587                 :                :     (((recptr) / XLOG_BLCKSZ) % (XLogCtl->XLogCacheBlck + 1))
                                588                 :                : 
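The two macros above treat the WAL buffers as a ring indexed by page number. A minimal sketch of that mapping follows, assuming an 8192-byte page and passing the highest buffer index as an explicit parameter instead of reading XLogCtl->XLogCacheBlck. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed WAL page size for this sketch only. */
#define SKETCH_XLOG_BLCKSZ 8192u

/*
 * Hypothetical stand-in for XLogRecPtrToBufIdx.  'cache_blck' is the highest
 * buffer index, so the ring holds cache_blck + 1 pages.
 */
static inline int
recptr_to_buf_idx(uint64_t recptr, int cache_blck)
{
    return (int) ((recptr / SKETCH_XLOG_BLCKSZ) % (uint64_t) (cache_blck + 1));
}

/* Hypothetical stand-in for NextBufIdx: step forward, wrapping to 0 after the last buffer. */
static inline int
next_buf_idx(int idx, int cache_blck)
{
    return (idx == cache_blck) ? 0 : idx + 1;
}
```

With four buffers (highest index 3), any position on page 5 maps to buffer 1, and stepping forward from buffer 3 wraps around to buffer 0.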
                                589                 :                : /*
                                590                 :                :  * These are the number of bytes in a WAL page usable for WAL data.
                                591                 :                :  */
                                592                 :                : #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
                                593                 :                : 
                                594                 :                : /*
                                595                 :                :  * Convert values of GUCs measured in megabytes to equiv. segment count.
                                596                 :                :  * Rounds down.
                                597                 :                :  */
                                598                 :                : #define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))
                                599                 :                : 
                                600                 :                : /* The number of bytes in a WAL segment usable for WAL data. */
                                601                 :                : static int  UsableBytesInSegment;
                                602                 :                : 
                                603                 :                : /*
                                604                 :                :  * Private, possibly out-of-date copy of shared LogwrtResult.
                                605                 :                :  * See discussion above.
                                606                 :                :  */
                                607                 :                : static XLogwrtResult LogwrtResult = {0, 0};
                                608                 :                : 
                                609                 :                : /*
                                610                 :                :  * Update local copy of shared XLogCtl->log{Write,Flush}Result
                                611                 :                :  *
                                612                 :                :  * It's critical that Flush always trails Write, so the order of the reads is
                                613                 :                :  * important, as is the barrier.  See also XLogWrite.
                                614                 :                :  */
                                615                 :                : #define RefreshXLogWriteResult(_target) \
                                616                 :                :     do { \
                                617                 :                :         _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
                                618                 :                :         pg_read_barrier(); \
                                619                 :                :         _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
                                620                 :                :     } while (0)
                                621                 :                : 
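The read ordering in RefreshXLogWriteResult can be approximated with C11 atomics. This is only a sketch: the variables and function names are hypothetical, the fences are a stand-in for pg_read_barrier() and its write-side counterpart, and the writer side shown here (Write published before Flush) is an assumption about how the values are advanced, on top of the assumption that both values only ever move forward.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogCtl->logWriteResult and logFlushResult. */
static _Atomic uint64_t shared_write;
static _Atomic uint64_t shared_flush;

typedef struct
{
    uint64_t Write;
    uint64_t Flush;
} SketchResult;

/* Assumed writer side: advance Write first, then Flush, with a release fence between. */
static void
publish_result(uint64_t write, uint64_t flush)
{
    atomic_store_explicit(&shared_write, write, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&shared_flush, flush, memory_order_relaxed);
}

/*
 * Reader side, mirroring the macro: read Flush first, fence, then Write.
 * Whatever Flush value is seen, the Write value read afterwards is at least
 * as new, so the snapshot does not show Flush ahead of Write.
 */
static SketchResult
refresh_result(void)
{
    SketchResult r;

    r.Flush = atomic_load_explicit(&shared_flush, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    r.Write = atomic_load_explicit(&shared_write, memory_order_relaxed);
    return r;
}
```

Reading in the opposite order would allow a snapshot in which an old Write is paired with a newer Flush, which is the situation the comment above warns against.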
                                622                 :                : /*
                                623                 :                :  * openLogFile is -1 or a kernel FD for an open log file segment.
                                624                 :                :  * openLogSegNo identifies the segment, and openLogTLI the corresponding TLI.
                                625                 :                :  * These variables are only used to write the XLOG, and so will normally refer
                                626                 :                :  * to the active segment.
                                627                 :                :  *
                                628                 :                :  * Note: call Reserve/ReleaseExternalFD to track consumption of this FD.
                                629                 :                :  */
                                630                 :                : static int  openLogFile = -1;
                                631                 :                : static XLogSegNo openLogSegNo = 0;
                                632                 :                : static TimeLineID openLogTLI = 0;
                                633                 :                : 
                                634                 :                : /*
                                635                 :                :  * Local copies of equivalent fields in the control file.  When running
                                636                 :                :  * crash recovery, LocalMinRecoveryPoint is set to InvalidXLogRecPtr as we
                                637                 :                :  * expect to replay all the WAL available, and updateMinRecoveryPoint is
                                638                 :                :  * switched to false to prevent any updates while replaying records.
                                639                 :                :  * Those values are kept consistent as long as crash recovery runs.
                                640                 :                :  */
                                641                 :                : static XLogRecPtr LocalMinRecoveryPoint;
                                642                 :                : static TimeLineID LocalMinRecoveryPointTLI;
                                643                 :                : static bool updateMinRecoveryPoint = true;
                                644                 :                : 
                                645                 :                : /* For WALInsertLockAcquire/Release functions */
                                646                 :                : static int  MyLockNo = 0;
                                647                 :                : static bool holdingAllLocks = false;
                                648                 :                : 
                                649                 :                : #ifdef WAL_DEBUG
                                650                 :                : static MemoryContext walDebugCxt = NULL;
                                651                 :                : #endif
                                652                 :                : 
                                653                 :                : static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
                                654                 :                :                                         XLogRecPtr EndOfLog,
                                655                 :                :                                         TimeLineID newTLI);
                                656                 :                : static void CheckRequiredParameterValues(void);
                                657                 :                : static void XLogReportParameters(void);
                                658                 :                : static int  LocalSetXLogInsertAllowed(void);
                                659                 :                : static void CreateEndOfRecoveryRecord(void);
                                660                 :                : static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
                                661                 :                :                                                   XLogRecPtr pagePtr,
                                662                 :                :                                                   TimeLineID newTLI);
                                663                 :                : static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
                                664                 :                : static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
                                665                 :                : static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
                                666                 :                : 
                                667                 :                : static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
                                668                 :                :                                   bool opportunistic);
                                669                 :                : static void XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible);
                                670                 :                : static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                                671                 :                :                                    bool find_free, XLogSegNo max_segno,
                                672                 :                :                                    TimeLineID tli);
                                673                 :                : static void XLogFileClose(void);
                                674                 :                : static void PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli);
                                675                 :                : static void RemoveTempXlogFiles(void);
                                676                 :                : static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr,
                                677                 :                :                                XLogRecPtr endptr, TimeLineID insertTLI);
                                678                 :                : static void RemoveXlogFile(const struct dirent *segment_de,
                                679                 :                :                            XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                                680                 :                :                            TimeLineID insertTLI);
                                681                 :                : static void UpdateLastRemovedPtr(char *filename);
                                682                 :                : static void ValidateXLOGDirectoryStructure(void);
                                683                 :                : static void CleanupBackupHistory(void);
                                684                 :                : static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
                                685                 :                : static bool PerformRecoveryXLogAction(void);
                                686                 :                : static void InitControlFile(uint64 sysidentifier);
                                687                 :                : static void WriteControlFile(void);
                                688                 :                : static void ReadControlFile(void);
                                689                 :                : static void UpdateControlFile(void);
                                690                 :                : static char *str_time(pg_time_t tnow);
                                691                 :                : 
                                692                 :                : static int  get_sync_bit(int method);
                                693                 :                : 
                                694                 :                : static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
                                695                 :                :                                 XLogRecData *rdata,
                                696                 :                :                                 XLogRecPtr StartPos, XLogRecPtr EndPos,
                                697                 :                :                                 TimeLineID tli);
                                698                 :                : static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
                                699                 :                :                                       XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
                                700                 :                : static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                                701                 :                :                               XLogRecPtr *PrevPtr);
                                702                 :                : static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
                                703                 :                : static char *GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli);
                                704                 :                : static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
                                705                 :                : static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
                                706                 :                : static uint64 XLogRecPtrToBytePos(XLogRecPtr ptr);
                                707                 :                : 
                                708                 :                : static void WALInsertLockAcquire(void);
                                709                 :                : static void WALInsertLockAcquireExclusive(void);
                                710                 :                : static void WALInsertLockRelease(void);
                                711                 :                : static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
                                712                 :                : 
                                713                 :                : /*
                                714                 :                :  * Insert an XLOG record represented by an already-constructed chain of data
                                715                 :                :  * chunks.  This is a low-level routine; to construct the WAL record header
                                716                 :                :  * and data, use the higher-level routines in xloginsert.c.
                                717                 :                :  *
                                718                 :                :  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
                                719                 :                :  * WAL record applies to, that were not included in the record as full page
                                720                 :                :  * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
                                721                 :                :  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
                                722                 :                :  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
                                723                 :                :  * record is always inserted.
                                724                 :                :  *
                                725                 :                :  * 'flags' gives more in-depth control on the record being inserted. See
                                726                 :                :  * XLogSetRecordFlags() for details.
                                727                 :                :  *
                                728                 :                :  * 'topxid_included' tells whether the top-transaction id is logged along with
                                729                 :                :  * the current subtransaction. See XLogRecordAssemble().
                                730                 :                :  *
                                731                 :                :  * The first XLogRecData in the chain must be for the record header, and its
                                732                 :                :  * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
                                733                 :                :  * xl_crc fields in the header, the rest of the header must already be filled
                                734                 :                :  * by the caller.
                                735                 :                :  *
                                736                 :                :  * Returns XLOG pointer to end of record (beginning of next record).
                                737                 :                :  * This can be used as LSN for data pages affected by the logged action.
                                738                 :                :  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
                                739                 :                :  * before the data page can be written out.  This implements the basic
                                740                 :                :  * WAL rule "write the log before the data".)
                                741                 :                :  */
                                742                 :                : XLogRecPtr
 2670 andres@anarazel.de        743                 :CBC    13613601 : XLogInsertRecord(XLogRecData *rdata,
                                744                 :                :                  XLogRecPtr fpw_lsn,
                                745                 :                :                  uint8 flags,
                                746                 :                :                  int num_fpi,
                                747                 :                :                  bool topxid_included)
                                748                 :                : {
 8424 bruce@momjian.us          749                 :       13613601 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                                750                 :                :     pg_crc32c   rdata_crc;
                                751                 :                :     bool        inserted;
 3447 heikki.linnakangas@i      752                 :       13613601 :     XLogRecord *rechdr = (XLogRecord *) rdata->data;
 2718 tgl@sss.pgh.pa.us         753                 :       13613601 :     uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
  178 rhaas@postgresql.org      754                 :GNC    13613601 :     WalInsertClass class = WALINSERT_NORMAL;
                                755                 :                :     XLogRecPtr  StartPos;
                                756                 :                :     XLogRecPtr  EndPos;
 2040 akapila@postgresql.o      757                 :CBC    13613601 :     bool        prevDoPageWrites = doPageWrites;
                                758                 :                :     TimeLineID  insertTLI;
                                759                 :                : 
                                760                 :                :     /* Does this record type require special handling? */
  178 rhaas@postgresql.org      761         [ +  + ]:GNC    13613601 :     if (unlikely(rechdr->xl_rmid == RM_XLOG_ID))
                                762                 :                :     {
                                763         [ +  + ]:         175374 :         if (info == XLOG_SWITCH)
                                764                 :            350 :             class = WALINSERT_SPECIAL_SWITCH;
                                765         [ +  + ]:         175024 :         else if (info == XLOG_CHECKPOINT_REDO)
                                766                 :            530 :             class = WALINSERT_SPECIAL_CHECKPOINT;
                                767                 :                :     }
                                768                 :                : 
                                769                 :                :     /* we assume that all of the record header is in the first chunk */
 3433 heikki.linnakangas@i      770         [ -  + ]:CBC    13613601 :     Assert(rdata->len >= SizeOfXLogRecord);
                                771                 :                : 
                                772                 :                :     /* cross-check on whether we should be here or not */
 5406 tgl@sss.pgh.pa.us         773         [ -  + ]:       13613601 :     if (!XLogInsertAllowed())
 5406 tgl@sss.pgh.pa.us         774         [ #  # ]:UBC           0 :         elog(ERROR, "cannot make new WAL entries during recovery");
                                775                 :                : 
                                776                 :                :     /*
                                777                 :                :      * Given that we're not in recovery, InsertTimeLineID is set and can't
                                778                 :                :      * change, so we can read it without a lock.
                                779                 :                :      */
  886 rhaas@postgresql.org      780                 :CBC    13613601 :     insertTLI = XLogCtl->InsertTimeLineID;
                                781                 :                : 
                                782                 :                :     /*----------
                                783                 :                :      *
                                784                 :                :      * We have now done all the preparatory work we can without holding a
                                785                 :                :      * lock or modifying shared state. From here on, inserting the new WAL
                                786                 :                :      * record to the shared WAL buffer cache is a two-step process:
                                787                 :                :      *
                                788                 :                :      * 1. Reserve the right amount of space from the WAL. The current head of
                                789                 :                :      *    reserved space is kept in Insert->CurrBytePos, and is protected by
                                790                 :                :      *    insertpos_lck.
                                791                 :                :      *
                                792                 :                :      * 2. Copy the record to the reserved WAL space. This involves finding the
                                793                 :                :      *    correct WAL buffer containing the reserved space, and copying the
                                794                 :                :      *    record in place. This can be done concurrently in multiple processes.
                                795                 :                :      *
                                796                 :                :      * To keep track of which insertions are still in-progress, each concurrent
                                797                 :                :      * inserter acquires an insertion lock. In addition to just indicating that
                                798                 :                :      * an insertion is in progress, the lock tells others how far the inserter
                                799                 :                :      * has progressed. There is a small fixed number of insertion locks,
                                800                 :                :      * determined by NUM_XLOGINSERT_LOCKS. When an inserter crosses a page
                                801                 :                :      * boundary, it updates the value stored in the lock to how far it has
                                802                 :                :      * inserted, to allow the previous buffer to be flushed.
                                803                 :                :      *
                                804                 :                :      * Holding onto an insertion lock also protects RedoRecPtr and
                                805                 :                :      * fullPageWrites from changing until the insertion is finished.
                                806                 :                :      *
                                807                 :                :      * Step 2 can usually be done completely in parallel. If the required WAL
                                808                 :                :      * page is not initialized yet, you have to grab WALBufMappingLock to
                                809                 :                :      * initialize it, but the WAL writer tries to do that ahead of insertions
                                810                 :                :      * to keep that from happening in the critical path.
                                811                 :                :      *
                                812                 :                :      *----------
                                813                 :                :      */
 4477 heikki.linnakangas@i      814                 :       13613601 :     START_CRIT_SECTION();
                                815                 :                : 
  178 rhaas@postgresql.org      816         [ +  + ]:GNC    13613601 :     if (likely(class == WALINSERT_NORMAL))
                                817                 :                :     {
  187 rhaas@postgresql.org      818                 :CBC    13612721 :         WALInsertLockAcquire();
                                819                 :                : 
                                820                 :                :         /*
                                821                 :                :          * Check to see if my copy of RedoRecPtr is out of date. If so, may
                                822                 :                :          * have to go back and have the caller recompute everything. This can
                                823                 :                :          * only happen just after a checkpoint, so it's better to be slow in
                                824                 :                :          * this case and fast otherwise.
                                825                 :                :          *
                                826                 :                :          * Also check to see if fullPageWrites was just turned on or there's a
                                827                 :                :          * running backup (which forces full-page writes); if we weren't
                                828                 :                :          * already doing full-page writes then go back and recompute.
                                829                 :                :          *
                                830                 :                :          * If we aren't doing full-page writes then RedoRecPtr doesn't
                                831                 :                :          * actually affect the contents of the XLOG record, so we'll update
                                832                 :                :          * our local copy but not force a recomputation.  (If doPageWrites was
                                833                 :                :          * just turned off, we could recompute the record without full pages,
                                834                 :                :          * but we choose not to bother.)
                                835                 :                :          */
  187 rhaas@postgresql.org      836         [ +  + ]:GNC    13612721 :         if (RedoRecPtr != Insert->RedoRecPtr)
                                837                 :                :         {
                                838         [ -  + ]:           5414 :             Assert(RedoRecPtr < Insert->RedoRecPtr);
                                839                 :           5414 :             RedoRecPtr = Insert->RedoRecPtr;
                                840                 :                :         }
                                841   [ +  +  +  + ]:       13612721 :         doPageWrites = (Insert->fullPageWrites || Insert->runningBackups > 0);
                                842                 :                : 
                                843         [ +  + ]:       13612721 :         if (doPageWrites &&
                                844   [ +  +  +  + ]:       13401892 :             (!prevDoPageWrites ||
                                845         [ +  + ]:       12352919 :              (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
                                846                 :                :         {
                                847                 :                :             /*
                                848                 :                :              * Oops, some buffer now needs to be backed up that the caller
                                849                 :                :              * didn't back up.  Start over.
                                850                 :                :              */
                                851                 :           6216 :             WALInsertLockRelease();
                                852         [ -  + ]:           6216 :             END_CRIT_SECTION();
                                853                 :           6216 :             return InvalidXLogRecPtr;
                                854                 :                :         }
                                855                 :                : 
                                856                 :                :         /*
                                857                 :                :          * Reserve space for the record in the WAL. This also sets the xl_prev
                                858                 :                :          * pointer.
                                859                 :                :          */
 3447 heikki.linnakangas@i      860                 :CBC    13606505 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                861                 :                :                                   &rechdr->xl_prev);
                                862                 :                : 
                                863                 :                :         /* Normal records are always inserted. */
 3933 heikki.linnakangas@i      864                 :GNC    13606505 :         inserted = true;
                                865                 :                :     }
  178 rhaas@postgresql.org      866         [ +  + ]:            880 :     else if (class == WALINSERT_SPECIAL_SWITCH)
                                867                 :                :     {
                                868                 :                :         /*
                                869                 :                :          * In order to insert an XLOG_SWITCH record, we need to hold all of
                                870                 :                :          * the WAL insertion locks, not just one, so that no one else can
                                871                 :                :          * begin inserting a record until we've figured out how much space
                                872                 :                :          * remains in the current WAL segment and claimed all of it.
                                873                 :                :          *
                                874                 :                :          * Nonetheless, this case is simpler than the normal cases handled
                                875                 :                :          * above, which must check for changes in doPageWrites and RedoRecPtr.
                                876                 :                :          * Those checks are only needed for records that can contain buffer
                                877                 :                :          * references, and an XLOG_SWITCH record never does.
                                878                 :                :          */
  187                           879         [ -  + ]:            350 :         Assert(fpw_lsn == InvalidXLogRecPtr);
                                880                 :            350 :         WALInsertLockAcquireExclusive();
                                881                 :            350 :         inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
                                882                 :                :     }
                                883                 :                :     else
                                884                 :                :     {
  178                           885         [ -  + ]:            530 :         Assert(class == WALINSERT_SPECIAL_CHECKPOINT);
                                886                 :                : 
                                887                 :                :         /*
                                888                 :                :          * We need to update both the local and shared copies of RedoRecPtr,
                                889                 :                :          * which means that we need to hold all the WAL insertion locks.
                                890                 :                :          * However, there can't be any buffer references, so as above, we need
                                891                 :                :          * not check RedoRecPtr before inserting the record; we just need to
                                892                 :                :          * update it afterwards.
                                893                 :                :          */
                                894         [ -  + ]:            530 :         Assert(fpw_lsn == InvalidXLogRecPtr);
                                895                 :            530 :         WALInsertLockAcquireExclusive();
                                896                 :            530 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                897                 :                :                                   &rechdr->xl_prev);
                                898                 :            530 :         RedoRecPtr = Insert->RedoRecPtr = StartPos;
  178 rhaas@postgresql.org      899                 :CBC         530 :         inserted = true;
                                900                 :                :     }
                                901                 :                : 
 3933 heikki.linnakangas@i      902         [ +  + ]:       13607385 :     if (inserted)
                                903                 :                :     {
                                904                 :                :         /*
                                905                 :                :          * Now that xl_prev has been filled in, calculate CRC of the record
                                906                 :                :          * header.
                                907                 :                :          */
 3433                           908                 :       13607330 :         rdata_crc = rechdr->xl_crc;
                                909                 :       13607330 :         COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 3449                           910                 :       13607330 :         FIN_CRC32C(rdata_crc);
 3933                           911                 :       13607330 :         rechdr->xl_crc = rdata_crc;
                                912                 :                : 
                                913                 :                :         /*
                                914                 :                :          * All the record data, including the header, is now ready to be
                                915                 :                :          * inserted. Copy the record in the space reserved.
                                916                 :                :          */
  178 rhaas@postgresql.org      917                 :GNC    13607330 :         CopyXLogRecordToWAL(rechdr->xl_tot_len,
                                918                 :                :                             class == WALINSERT_SPECIAL_SWITCH, rdata,
                                919                 :                :                             StartPos, EndPos, insertTLI);
                                920                 :                : 
                                921                 :                :         /*
                                922                 :                :          * Unless record is flagged as not important, update LSN of last
                                923                 :                :          * important record in the current slot. When holding all locks, just
                                924                 :                :          * update the first one.
                                925                 :                :          */
 2670 andres@anarazel.de        926         [ +  + ]:CBC    13607330 :         if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
                                927                 :                :         {
 2524 bruce@momjian.us          928         [ +  + ]:       13532805 :             int         lockno = holdingAllLocks ? 0 : MyLockNo;
                                929                 :                : 
 2670 andres@anarazel.de        930                 :       13532805 :             WALInsertLocks[lockno].l.lastImportantAt = StartPos;
                                931                 :                :         }
                                932                 :                :     }
                                933                 :                :     else
                                934                 :                :     {
                                935                 :                :         /*
                                936                 :                :          * This was an xlog-switch record, but the current insert location was
                                937                 :                :          * already exactly at the beginning of a segment, so there was no need
                                938                 :                :          * to do anything.
                                939                 :                :          */
                                940                 :                :     }
                                941                 :                : 
                                942                 :                :     /*
                                943                 :                :      * Done! Let others know that we're finished.
                                944                 :                :      */
 3677 heikki.linnakangas@i      945                 :       13607385 :     WALInsertLockRelease();
                                946                 :                : 
 3933                           947         [ -  + ]:       13607385 :     END_CRIT_SECTION();
                                948                 :                : 
  894 akapila@postgresql.o      949                 :       13607385 :     MarkCurrentTransactionIdLoggedIfAny();
                                950                 :                : 
                                951                 :                :     /*
                                952                 :                :      * Mark the top transaction ID as logged (if needed) so that we do not try
                                953                 :                :      * to log it again with the next WAL record in the current subtransaction.
                                954                 :                :      */
                                955         [ +  + ]:       13607385 :     if (topxid_included)
                                956                 :            217 :         MarkSubxactTopXidLogged();
                                957                 :                : 
                                958                 :                :     /*
                                959                 :                :      * Update shared LogwrtRqst.Write, if we crossed a page boundary.
                                960                 :                :      */
 3933 heikki.linnakangas@i      961         [ +  + ]:       13607385 :     if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                962                 :                :     {
 3492 andres@anarazel.de        963         [ +  + ]:         287190 :         SpinLockAcquire(&XLogCtl->info_lck);
                                964                 :                :         /* advance global request to include new block(s) */
                                965         [ +  + ]:         287190 :         if (XLogCtl->LogwrtRqst.Write < EndPos)
                                966                 :         287125 :             XLogCtl->LogwrtRqst.Write = EndPos;
                                967                 :         287190 :         SpinLockRelease(&XLogCtl->info_lck);
    9 alvherre@alvh.no-ip.      968                 :GNC      287190 :         RefreshXLogWriteResult(LogwrtResult);
                                969                 :                :     }
                                970                 :                : 
                                971                 :                :     /*
                                972                 :                :      * If this was an XLOG_SWITCH record, flush the record and the empty
                                973                 :                :      * padding space that fills the rest of the segment, and perform
                                974                 :                :      * end-of-segment actions (eg, notifying archiver).
                                975                 :                :      */
  178 rhaas@postgresql.org      976         [ +  + ]:       13607385 :     if (class == WALINSERT_SPECIAL_SWITCH)
                                977                 :                :     {
                                978                 :                :         TRACE_POSTGRESQL_WAL_SWITCH();
 3933 heikki.linnakangas@i      979                 :CBC         350 :         XLogFlush(EndPos);
                                980                 :                : 
                                981                 :                :         /*
                                982                 :                :          * Even though we reserved the rest of the segment for us, which is
                                983                 :                :          * reflected in EndPos, we return a pointer to just the end of the
                                984                 :                :          * xlog-switch record.
                                985                 :                :          */
                                986         [ +  + ]:            350 :         if (inserted)
                                987                 :                :         {
                                988                 :            295 :             EndPos = StartPos + SizeOfXLogRecord;
                                989         [ -  + ]:            295 :             if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                990                 :                :             {
 2399 andres@anarazel.de        991                 :LBC         (1) :                 uint64      offset = XLogSegmentOffset(EndPos, wal_segment_size);
                                992                 :                : 
                                993         [ #  # ]:            (1) :                 if (offset == EndPos % XLOG_BLCKSZ)
 3933 heikki.linnakangas@i      994                 :UBC           0 :                     EndPos += SizeOfXLogLongPHD;
                                995                 :                :                 else
 3933 heikki.linnakangas@i      996                 :LBC         (1) :                     EndPos += SizeOfXLogShortPHD;
                                997                 :                :             }
                                998                 :                :         }
                                999                 :                :     }
                               1000                 :                : 
                               1001                 :                : #ifdef WAL_DEBUG
                               1002                 :                :     if (XLOG_DEBUG)
                               1003                 :                :     {
                               1004                 :                :         static XLogReaderState *debug_reader = NULL;
                               1005                 :                :         XLogRecord *record;
                               1006                 :                :         DecodedXLogRecord *decoded;
                               1007                 :                :         StringInfoData buf;
                               1008                 :                :         StringInfoData recordBuf;
                               1009                 :                :         char       *errormsg = NULL;
                               1010                 :                :         MemoryContext oldCxt;
                               1011                 :                : 
                               1012                 :                :         oldCxt = MemoryContextSwitchTo(walDebugCxt);
                               1013                 :                : 
                               1014                 :                :         initStringInfo(&buf);
                               1015                 :                :         appendStringInfo(&buf, "INSERT @ %X/%X: ", LSN_FORMAT_ARGS(EndPos));
                               1016                 :                : 
                               1017                 :                :         /*
                               1018                 :                :          * We have to piece together the WAL record data from the XLogRecData
                               1019                 :                :          * entries, so that we can pass it to the rm_desc function as one
                               1020                 :                :          * contiguous chunk.
                               1021                 :                :          */
                               1022                 :                :         initStringInfo(&recordBuf);
                               1023                 :                :         for (; rdata != NULL; rdata = rdata->next)
                               1024                 :                :             appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
                               1025                 :                : 
                               1026                 :                :         /* We also need temporary space to decode the record. */
                               1027                 :                :         record = (XLogRecord *) recordBuf.data;
                               1028                 :                :         decoded = (DecodedXLogRecord *)
                               1029                 :                :             palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
                               1030                 :                : 
                               1031                 :                :         if (!debug_reader)
                               1032                 :                :             debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
                               1033                 :                :                                               XL_ROUTINE(.page_read = NULL,
                               1034                 :                :                                                          .segment_open = NULL,
                               1035                 :                :                                                          .segment_close = NULL),
                               1036                 :                :                                               NULL);
                               1037                 :                :         if (!debug_reader)
                               1038                 :                :         {
                               1039                 :                :             appendStringInfoString(&buf, "error decoding record: out of memory while allocating a WAL reading processor");
                               1040                 :                :         }
                               1041                 :                :         else if (!DecodeXLogRecord(debug_reader,
                               1042                 :                :                                    decoded,
                               1043                 :                :                                    record,
                               1044                 :                :                                    EndPos,
                               1045                 :                :                                    &errormsg))
                               1046                 :                :         {
                               1047                 :                :             appendStringInfo(&buf, "error decoding record: %s",
                               1048                 :                :                              errormsg ? errormsg : "no error message");
                               1049                 :                :         }
                               1050                 :                :         else
                               1051                 :                :         {
                               1052                 :                :             appendStringInfoString(&buf, " - ");
                               1053                 :                : 
                               1054                 :                :             debug_reader->record = decoded;
                               1055                 :                :             xlog_outdesc(&buf, debug_reader);
                               1056                 :                :             debug_reader->record = NULL;
                               1057                 :                :         }
                               1058                 :                :         elog(LOG, "%s", buf.data);
                               1059                 :                : 
                               1060                 :                :         pfree(decoded);
                               1061                 :                :         pfree(buf.data);
                               1062                 :                :         pfree(recordBuf.data);
                               1063                 :                :         MemoryContextSwitchTo(oldCxt);
                               1064                 :                :     }
                               1065                 :                : #endif
                               1066                 :                : 
                               1067                 :                :     /*
                               1068                 :                :      * Update our global variables
                               1069                 :                :      */
 3933 heikki.linnakangas@i     1070                 :CBC    13607385 :     ProcLastRecPtr = StartPos;
                               1071                 :       13607385 :     XactLastRecEnd = EndPos;
                               1072                 :                : 
                               1073                 :                :     /* Report WAL traffic to the instrumentation. */
 1471 akapila@postgresql.o     1074         [ +  + ]:       13607385 :     if (inserted)
                               1075                 :                :     {
                               1076                 :       13607330 :         pgWalUsage.wal_bytes += rechdr->xl_tot_len;
                               1077                 :       13607330 :         pgWalUsage.wal_records++;
 1440                          1078                 :       13607330 :         pgWalUsage.wal_fpi += num_fpi;
                               1079                 :                :     }
                               1080                 :                : 
 3933 heikki.linnakangas@i     1081                 :       13607385 :     return EndPos;
                               1082                 :                : }
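
The header CRC step in XLogInsertRecord above continues a checksum that was started over the record data, and folds in the header only after xl_prev is known. The following is a minimal sketch of that two-phase pattern, not PostgreSQL's implementation: DemoRecordHeader, demo_crc32c_update, demo_crc_payload and demo_finish_header are hypothetical names, and the bitwise CRC-32C loop stands in for whatever the real macros expand to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header, laid out so that no padding precedes crc. */
typedef struct DemoRecordHeader
{
    uint64_t    prev;       /* position of the previous record */
    uint32_t    tot_len;    /* total record length */
    uint32_t    crc;        /* checksum; covers everything except itself */
} DemoRecordHeader;

/* Fold len bytes into a running CRC-32C (bitwise, reflected polynomial). */
static uint32_t
demo_crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Phase 1: checksum the payload before the header is complete. */
static uint32_t
demo_crc_payload(const void *payload, size_t len)
{
    return demo_crc32c_update(0xFFFFFFFFu, payload, len);
}

/*
 * Phase 2: once prev is known, fold in the header bytes that precede the
 * crc field, then apply the final inversion and store the result.
 */
static void
demo_finish_header(DemoRecordHeader *hdr, uint32_t payload_crc, uint64_t prev)
{
    uint32_t    crc = payload_crc;

    hdr->prev = prev;
    crc = demo_crc32c_update(crc, hdr, offsetof(DemoRecordHeader, crc));
    hdr->crc = crc ^ 0xFFFFFFFFu;
}
```

Because the running value is carried across calls, the stored checksum equals a one-shot CRC-32C over the payload bytes followed by the header prefix. This lets the expensive payload pass happen before the insert position is reserved.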
                               1083                 :                : 
                               1084                 :                : /*
                               1085                 :                :  * Reserves the right amount of space for a record of given size from the WAL.
                               1086                 :                :  * *StartPos is set to the beginning of the reserved section, *EndPos to
                               1087                 :                :  * its end+1. *PrevPtr is set to the beginning of the previous record; it is
                               1088                 :                :  * used to set the xl_prev of this record.
                               1089                 :                :  *
                               1090                 :                :  * This is the performance critical part of XLogInsert that must be serialized
                               1091                 :                :  * across backends. The rest can happen mostly in parallel. Try to keep this
                               1092                 :                :  * section as short as possible; insertpos_lck can be heavily contended on a
                               1093                 :                :  * busy system.
                               1094                 :                :  *
                               1095                 :                :  * NB: The space calculation here must match the code in CopyXLogRecordToWAL,
                               1096                 :                :  * where we actually copy the record to the reserved space.
                               1097                 :                :  *
                               1098                 :                :  * NB: Testing shows that XLogInsertRecord runs faster if this code is inlined;
                               1099                 :                :  * however, because there are two call sites, the compiler is reluctant to
                               1100                 :                :  * inline. We use pg_attribute_always_inline here to try to convince it.
                               1101                 :                :  */
                               1102                 :                : static pg_attribute_always_inline void
                               1103                 :       13607035 : ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                               1104                 :                :                           XLogRecPtr *PrevPtr)
                               1105                 :                : {
 3492 andres@anarazel.de       1106                 :       13607035 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1107                 :                :     uint64      startbytepos;
                               1108                 :                :     uint64      endbytepos;
                               1109                 :                :     uint64      prevbytepos;
                               1110                 :                : 
 3933 heikki.linnakangas@i     1111                 :       13607035 :     size = MAXALIGN(size);
                               1112                 :                : 
                               1113                 :                :     /* All (non xlog-switch) records should contain data. */
                               1114         [ -  + ]:       13607035 :     Assert(size > SizeOfXLogRecord);
                               1115                 :                : 
                               1116                 :                :     /*
                               1117                 :                :      * The duration the spinlock needs to be held is minimized by minimizing
                               1118                 :                :      * the calculations that have to be done while holding the lock. The
                               1119                 :                :      * current tip of reserved WAL is kept in CurrBytePos, as a byte position
                               1120                 :                :      * that only counts "usable" bytes in WAL, that is, it excludes all WAL
                               1121                 :                :      * page headers. The mapping between "usable" byte positions and physical
                               1122                 :                :      * positions (XLogRecPtrs) can be done outside the locked region, and
                               1123                 :                :      * because the usable byte position doesn't include any headers, reserving
                               1124                 :                :      * X bytes from WAL is almost as simple as "CurrBytePos += X".
                               1125                 :                :      */
                               1126         [ +  + ]:       13607035 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1127                 :                : 
                               1128                 :       13607035 :     startbytepos = Insert->CurrBytePos;
                               1129                 :       13607035 :     endbytepos = startbytepos + size;
                               1130                 :       13607035 :     prevbytepos = Insert->PrevBytePos;
                               1131                 :       13607035 :     Insert->CurrBytePos = endbytepos;
                               1132                 :       13607035 :     Insert->PrevBytePos = startbytepos;
                               1133                 :                : 
                               1134                 :       13607035 :     SpinLockRelease(&Insert->insertpos_lck);
                               1135                 :                : 
                               1136                 :       13607035 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1137                 :       13607035 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1138                 :       13607035 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1139                 :                : 
                               1140                 :                :     /*
                               1141                 :                :      * Check that the conversions between "usable byte positions" and
                               1142                 :                :      * XLogRecPtrs work consistently in both directions.
                               1143                 :                :      */
                               1144         [ -  + ]:       13607035 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1145         [ -  + ]:       13607035 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1146         [ -  + ]:       13607035 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1147                 :       13607035 : }
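
The comment in ReserveXLogInsertLocation above describes reserving space in "usable" bytes and mapping to physical positions outside the locked region. The sketch below illustrates that idea under stated assumptions: the DEMO_* layout constants, DemoInsert, demo_reserve, demo_usable_to_physical and demo_physical_to_usable are all hypothetical, the segment is deliberately tiny, locking is omitted, and the separate end-of-record conversion used for *EndPos is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants for illustration; not PostgreSQL's values. */
#define DEMO_PAGE_SIZE       8192u
#define DEMO_SHORT_HDR       24u    /* header on every page */
#define DEMO_LONG_HDR        40u    /* header on a segment's first page */
#define DEMO_PAGES_PER_SEG   4u
#define DEMO_SEG_SIZE        (DEMO_PAGE_SIZE * DEMO_PAGES_PER_SEG)
#define DEMO_USABLE_PER_PAGE (DEMO_PAGE_SIZE - DEMO_SHORT_HDR)
#define DEMO_USABLE_PER_SEG \
    (DEMO_PAGES_PER_SEG * DEMO_USABLE_PER_PAGE - (DEMO_LONG_HDR - DEMO_SHORT_HDR))

typedef struct DemoInsert
{
    uint64_t    curr_bytepos;   /* tip of reserved space, in usable bytes */
    uint64_t    prev_bytepos;   /* start of the previous reservation */
} DemoInsert;

/* Reserve size usable bytes; a concurrent version would hold a lock here. */
static void
demo_reserve(DemoInsert *ins, uint64_t size,
             uint64_t *start, uint64_t *end, uint64_t *prev)
{
    *start = ins->curr_bytepos;
    *end = *start + size;
    *prev = ins->prev_bytepos;
    ins->curr_bytepos = *end;
    ins->prev_bytepos = *start;
}

/* Map a usable byte position to the physical position where data begins. */
static uint64_t
demo_usable_to_physical(uint64_t bytepos)
{
    uint64_t    fullsegs = bytepos / DEMO_USABLE_PER_SEG;
    uint64_t    left = bytepos % DEMO_USABLE_PER_SEG;
    uint64_t    seg_offset;

    if (left < DEMO_PAGE_SIZE - DEMO_LONG_HDR)
        seg_offset = left + DEMO_LONG_HDR;      /* first page of the segment */
    else
    {
        left -= DEMO_PAGE_SIZE - DEMO_LONG_HDR; /* skip the first page */
        seg_offset = DEMO_PAGE_SIZE
            + (left / DEMO_USABLE_PER_PAGE) * DEMO_PAGE_SIZE
            + DEMO_SHORT_HDR + left % DEMO_USABLE_PER_PAGE;
    }
    return fullsegs * DEMO_SEG_SIZE + seg_offset;
}

/* Inverse mapping: strip the page headers back out. */
static uint64_t
demo_physical_to_usable(uint64_t ptr)
{
    uint64_t    seg_offset = ptr % DEMO_SEG_SIZE;
    uint64_t    fullpages = seg_offset / DEMO_PAGE_SIZE;
    uint64_t    offset = seg_offset % DEMO_PAGE_SIZE;
    uint64_t    result = (ptr / DEMO_SEG_SIZE) * DEMO_USABLE_PER_SEG;

    if (fullpages == 0)
    {
        if (offset > 0)
            result += offset - DEMO_LONG_HDR;
    }
    else
    {
        result += (DEMO_PAGE_SIZE - DEMO_LONG_HDR)
            + (fullpages - 1) * DEMO_USABLE_PER_PAGE;
        if (offset > 0)
            result += offset - DEMO_SHORT_HDR;
    }
    return result;
}
```

Keeping headers out of the counter means the critical section is only a handful of loads and stores. The division-heavy mapping runs afterwards, and a round-trip check like the Asserts above can confirm that the two conversions agree.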
                               1148                 :                : 
                               1149                 :                : /*
                               1150                 :                :  * Like ReserveXLogInsertLocation(), but for an xlog-switch record.
                               1151                 :                :  *
                               1152                 :                :  * A log-switch record is handled slightly differently. The rest of the
                               1153                 :                :  * segment will be reserved for this insertion, as indicated by the returned
                               1154                 :                :  * *EndPos value. However, if we are already at the beginning of the current
                               1155                 :                :  * segment, *StartPos and *EndPos are set to the current location without
                               1156                 :                :  * reserving any space, and the function returns false.
                               1157                 :                :  */
                               1158                 :                : static bool
                               1159                 :            350 : ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos, XLogRecPtr *PrevPtr)
                               1160                 :                : {
 3492 andres@anarazel.de       1161                 :            350 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1162                 :                :     uint64      startbytepos;
                               1163                 :                :     uint64      endbytepos;
                               1164                 :                :     uint64      prevbytepos;
 3433 heikki.linnakangas@i     1165                 :            350 :     uint32      size = MAXALIGN(SizeOfXLogRecord);
                               1166                 :                :     XLogRecPtr  ptr;
                               1167                 :                :     uint32      segleft;
                               1168                 :                : 
                               1169                 :                :     /*
                               1170                 :                :      * These calculations are a bit heavy-weight to be done while holding a
                               1171                 :                :      * spinlock, but since we're holding all the WAL insertion locks, there
                               1172                 :                :      * are no other inserters competing for it. GetXLogInsertRecPtr() does
                               1173                 :                :      * compete for it, but that's not called very frequently.
                               1174                 :                :      */
 3933                          1175         [ -  + ]:            350 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1176                 :                : 
                               1177                 :            350 :     startbytepos = Insert->CurrBytePos;
                               1178                 :                : 
                               1179                 :            350 :     ptr = XLogBytePosToEndRecPtr(startbytepos);
 2399 andres@anarazel.de       1180         [ +  + ]:            350 :     if (XLogSegmentOffset(ptr, wal_segment_size) == 0)
                               1181                 :                :     {
 3933 heikki.linnakangas@i     1182                 :             55 :         SpinLockRelease(&Insert->insertpos_lck);
                               1183                 :             55 :         *EndPos = *StartPos = ptr;
                               1184                 :             55 :         return false;
                               1185                 :                :     }
                               1186                 :                : 
                               1187                 :            295 :     endbytepos = startbytepos + size;
                               1188                 :            295 :     prevbytepos = Insert->PrevBytePos;
                               1189                 :                : 
                               1190                 :            295 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1191                 :            295 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1192                 :                : 
 2399 andres@anarazel.de       1193                 :            295 :     segleft = wal_segment_size - XLogSegmentOffset(*EndPos, wal_segment_size);
                               1194         [ +  - ]:            295 :     if (segleft != wal_segment_size)
                               1195                 :                :     {
                               1196                 :                :         /* consume the rest of the segment */
 3933 heikki.linnakangas@i     1197                 :            295 :         *EndPos += segleft;
                               1198                 :            295 :         endbytepos = XLogRecPtrToBytePos(*EndPos);
                               1199                 :                :     }
                               1200                 :            295 :     Insert->CurrBytePos = endbytepos;
                               1201                 :            295 :     Insert->PrevBytePos = startbytepos;
                               1202                 :                : 
                               1203                 :            295 :     SpinLockRelease(&Insert->insertpos_lck);
                               1204                 :                : 
                               1205                 :            295 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1206                 :                : 
 2399 andres@anarazel.de       1207         [ -  + ]:            295 :     Assert(XLogSegmentOffset(*EndPos, wal_segment_size) == 0);
 3933 heikki.linnakangas@i     1208         [ -  + ]:            295 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1209         [ -  + ]:            295 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1210         [ -  + ]:            295 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1211                 :                : 
                               1212                 :            295 :     return true;
                               1213                 :                : }
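
The "consume the rest of the segment" step above can be sketched in isolation. This is only an illustration under simplifying assumptions: `seg_offset` and `switch_end_pos` are hypothetical names, positions are plain 64-bit integers, and the conversion between byte positions and record pointers (which accounts for page headers) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of a WAL position within its segment. */
static uint64_t
seg_offset(uint64_t pos, uint64_t seg_size)
{
    assert(seg_size > 0);
    return pos % seg_size;
}

/*
 * Hypothetical helper: given the end of the switch record itself, return
 * the end position after also reserving the remainder of the segment.
 * If the record already ends exactly on a segment boundary, nothing more
 * is reserved.
 */
static uint64_t
switch_end_pos(uint64_t rec_end, uint64_t seg_size)
{
    uint64_t    segleft = seg_size - seg_offset(rec_end, seg_size);

    if (segleft != seg_size)
        rec_end += segleft;     /* consume the rest of the segment */
    return rec_end;
}
```

With a 16 MB segment, a switch record ending 100 bytes into the second segment would yield an end position at the start of the third segment.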
                               1214                 :                : 
                               1215                 :                : /*
                               1216                 :                :  * Subroutine of XLogInsertRecord.  Copies a WAL record to an already-reserved
                               1217                 :                :  * area in the WAL.
                               1218                 :                :  */
                               1219                 :                : static void
                               1220                 :       13607330 : CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
                               1221                 :                :                     XLogRecPtr StartPos, XLogRecPtr EndPos, TimeLineID tli)
                               1222                 :                : {
                               1223                 :                :     char       *currpos;
                               1224                 :                :     int         freespace;
                               1225                 :                :     int         written;
                               1226                 :                :     XLogRecPtr  CurrPos;
                               1227                 :                :     XLogPageHeader pagehdr;
                               1228                 :                : 
                               1229                 :                :     /*
                               1230                 :                :      * Get a pointer to the right place in the right WAL buffer to start
                               1231                 :                :      * inserting to.
                               1232                 :                :      */
                               1233                 :       13607330 :     CurrPos = StartPos;
  891 rhaas@postgresql.org     1234                 :       13607330 :     currpos = GetXLogBuffer(CurrPos, tli);
 3933 heikki.linnakangas@i     1235         [ +  - ]:       13607330 :     freespace = INSERT_FREESPACE(CurrPos);
                               1236                 :                : 
                               1237                 :                :     /*
                               1238                 :                :      * There should be enough space for at least the first field (xl_tot_len)
                               1239                 :                :      * on this page.
                               1240                 :                :      */
                               1241         [ -  + ]:       13607330 :     Assert(freespace >= sizeof(uint32));
                               1242                 :                : 
                               1243                 :                :     /* Copy record data */
                               1244                 :       13607330 :     written = 0;
                               1245         [ +  + ]:       59884839 :     while (rdata != NULL)
                               1246                 :                :     {
                               1247                 :       46277509 :         char       *rdata_data = rdata->data;
                               1248                 :       46277509 :         int         rdata_len = rdata->len;
                               1249                 :                : 
                               1250         [ +  + ]:       46642851 :         while (rdata_len > freespace)
                               1251                 :                :         {
                               1252                 :                :             /*
                               1253                 :                :              * Write what fits on this page, and continue on the next page.
                               1254                 :                :              */
                               1255   [ +  +  -  + ]:         365342 :             Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || freespace == 0);
                               1256                 :         365342 :             memcpy(currpos, rdata_data, freespace);
                               1257                 :         365342 :             rdata_data += freespace;
                               1258                 :         365342 :             rdata_len -= freespace;
                               1259                 :         365342 :             written += freespace;
                               1260                 :         365342 :             CurrPos += freespace;
                               1261                 :                : 
                               1262                 :                :             /*
                               1263                 :                :              * Get pointer to beginning of next page, and set the xlp_rem_len
                               1264                 :                :              * in the page header. Set XLP_FIRST_IS_CONTRECORD.
                               1265                 :                :              *
                               1266                 :                :              * It's safe to set the contrecord flag and xlp_rem_len without a
                               1267                 :                :              * lock on the page. All the other flags were already set when the
                               1268                 :                :              * page was initialized, in AdvanceXLInsertBuffer, and we're the
                               1269                 :                :              * only backend that needs to set the contrecord flag.
                               1270                 :                :              */
  891 rhaas@postgresql.org     1271                 :         365342 :             currpos = GetXLogBuffer(CurrPos, tli);
 3933 heikki.linnakangas@i     1272                 :         365342 :             pagehdr = (XLogPageHeader) currpos;
                               1273                 :         365342 :             pagehdr->xlp_rem_len = write_len - written;
                               1274                 :         365342 :             pagehdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
                               1275                 :                : 
                               1276                 :                :             /* skip over the page header */
 2399 andres@anarazel.de       1277         [ +  + ]:         365342 :             if (XLogSegmentOffset(CurrPos, wal_segment_size) == 0)
                               1278                 :                :             {
 3933 heikki.linnakangas@i     1279                 :            470 :                 CurrPos += SizeOfXLogLongPHD;
                               1280                 :            470 :                 currpos += SizeOfXLogLongPHD;
                               1281                 :                :             }
                               1282                 :                :             else
                               1283                 :                :             {
                               1284                 :         364872 :                 CurrPos += SizeOfXLogShortPHD;
                               1285                 :         364872 :                 currpos += SizeOfXLogShortPHD;
                               1286                 :                :             }
                               1287         [ +  - ]:         365342 :             freespace = INSERT_FREESPACE(CurrPos);
                               1288                 :                :         }
                               1289                 :                : 
                               1290   [ +  +  -  + ]:       46277509 :         Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
                               1291                 :       46277509 :         memcpy(currpos, rdata_data, rdata_len);
                               1292                 :       46277509 :         currpos += rdata_len;
                               1293                 :       46277509 :         CurrPos += rdata_len;
                               1294                 :       46277509 :         freespace -= rdata_len;
                               1295                 :       46277509 :         written += rdata_len;
                               1296                 :                : 
                               1297                 :       46277509 :         rdata = rdata->next;
                               1298                 :                :     }
                               1299         [ -  + ]:       13607330 :     Assert(written == write_len);
                               1300                 :                : 
                               1301                 :                :     /*
                               1302                 :                :      * If this was an xlog-switch, it's not enough to write the switch record;
                               1303                 :                :      * we also have to consume all the remaining space in the WAL segment.  We
                               1304                 :                :      * have already reserved that space, but we need to actually fill it.
                               1305                 :                :      */
 2399 andres@anarazel.de       1306   [ +  +  +  - ]:       13607330 :     if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
                               1307                 :                :     {
                               1308                 :                :         /* An xlog-switch record doesn't contain any data besides the header */
 3933 heikki.linnakangas@i     1309         [ -  + ]:            295 :         Assert(write_len == SizeOfXLogRecord);
                               1310                 :                : 
                               1311                 :                :         /* Assert that we did reserve the right amount of space */
 2399 andres@anarazel.de       1312         [ -  + ]:            295 :         Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);
                               1313                 :                : 
                               1314                 :                :         /* Use up all the remaining space on the current page */
 3933 heikki.linnakangas@i     1315                 :            295 :         CurrPos += freespace;
                               1316                 :                : 
                               1317                 :                :         /*
                               1318                 :                :          * Cause all remaining pages in the segment to be flushed, leaving the
                               1319                 :                :          * XLog position where it should be, at the start of the next segment.
                               1320                 :                :          * We do this one page at a time, to make sure we don't deadlock
                               1321                 :                :          * against ourselves if wal_buffers < wal_segment_size.
                               1322                 :                :          */
                               1323         [ +  + ]:         467819 :         while (CurrPos < EndPos)
                               1324                 :                :         {
                               1325                 :                :             /*
                               1326                 :                :              * The minimal action to flush the page would be to call
                               1327                 :                :              * WALInsertLockUpdateInsertingAt(CurrPos) followed by
                               1328                 :                :              * AdvanceXLInsertBuffer(...).  The page would be left initialized
                               1329                 :                :              * mostly to zeros, except for the page header (always the short
                               1330                 :                :              * variant, as this is never a segment's first page).
                               1331                 :                :              *
                               1332                 :                :              * The large vistas of zeros are good for compressibility, but the
                               1333                 :                :              * headers interrupting them every XLOG_BLCKSZ (with values that
                               1334                 :                :              * differ from page to page) are not.  The effect varies with
                               1335                 :                :              * compression tool, but bzip2 for instance compresses about an
                               1336                 :                :              * order of magnitude worse if those headers are left in place.
                               1337                 :                :              *
                               1338                 :                :              * Rather than complicating AdvanceXLInsertBuffer itself (which is
                               1339                 :                :              * called in heavily-loaded circumstances as well as this lightly-
                               1340                 :                :              * loaded one) with variant behavior, we just use GetXLogBuffer
                               1341                 :                :              * (which itself calls the two methods we need) to get the pointer
                               1342                 :                :              * and zero most of the page.  Then we just zero the page header.
                               1343                 :                :              */
  891 rhaas@postgresql.org     1344                 :         467524 :             currpos = GetXLogBuffer(CurrPos, tli);
 2207 tgl@sss.pgh.pa.us        1345   [ +  -  +  -  :        1870096 :             MemSet(currpos, 0, SizeOfXLogShortPHD);
                                     +  -  +  -  +  
                                                 + ]
                               1346                 :                : 
 3933 heikki.linnakangas@i     1347                 :         467524 :             CurrPos += XLOG_BLCKSZ;
                               1348                 :                :         }
                               1349                 :                :     }
                               1350                 :                :     else
                               1351                 :                :     {
                               1352                 :                :         /* Align the end position, so that the next record starts aligned */
 3433                          1353                 :       13607035 :         CurrPos = MAXALIGN64(CurrPos);
                               1354                 :                :     }
                               1355                 :                : 
 3933                          1356         [ -  + ]:       13607330 :     if (CurrPos != EndPos)
   11 dgustafsson@postgres     1357         [ #  # ]:UNC           0 :         ereport(PANIC,
                               1358                 :                :                 errcode(ERRCODE_DATA_CORRUPTED),
                               1359                 :                :                 errmsg_internal("space reserved for WAL record does not match what was written"));
 3933 heikki.linnakangas@i     1360                 :CBC    13607330 : }
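
The page-spanning copy loop above can be illustrated with a much smaller model. The sketch below is an assumption-laden toy, not the real routine: `PAGE_SZ`, `HDR_SZ`, `Chunk`, `page_freespace`, and `copy_chain` are hypothetical names, the buffer is one flat array, every page uses the same header size, and no continuation fields are written into the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 32              /* hypothetical page size */
#define HDR_SZ   8              /* hypothetical page header size */

/* Hypothetical stand-in for a chain of record data fragments. */
typedef struct Chunk
{
    struct Chunk *next;
    const char *data;
    size_t      len;
} Chunk;

/* Free bytes left on the page containing pos; 0 on a page boundary. */
static size_t
page_freespace(size_t pos)
{
    return (pos % PAGE_SZ == 0) ? 0 : PAGE_SZ - pos % PAGE_SZ;
}

/*
 * Copy every chunk into buf starting at pos, skipping the header area at
 * the start of each new page.  Returns the position just past the last
 * byte written.  The caller must make buf large enough.
 */
static size_t
copy_chain(char *buf, size_t pos, const Chunk *c)
{
    size_t      freespace = page_freespace(pos);

    for (; c != NULL; c = c->next)
    {
        const char *src = c->data;
        size_t      len = c->len;

        while (len > freespace)
        {
            /* Write what fits on this page, continue on the next one. */
            assert(pos % PAGE_SZ >= HDR_SZ || freespace == 0);
            memcpy(buf + pos, src, freespace);
            src += freespace;
            len -= freespace;
            pos += freespace;   /* now at the start of the next page */
            pos += HDR_SZ;      /* skip over that page's header */
            freespace = page_freespace(pos);
        }
        memcpy(buf + pos, src, len);
        pos += len;
        freespace -= len;
    }
    return pos;
}
```

Starting at offset `HDR_SZ` with a 30-byte chunk, the first 24 bytes fill the first page, the second page's header area is left untouched, and the remaining 6 bytes land after it.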
                               1361                 :                : 
                               1362                 :                : /*
                               1363                 :                :  * Acquire a WAL insertion lock, for inserting to WAL.
                               1364                 :                :  */
                               1365                 :                : static void
 3677                          1366                 :       13612731 : WALInsertLockAcquire(void)
                               1367                 :                : {
                               1368                 :                :     bool        immed;
                               1369                 :                : 
                               1370                 :                :     /*
                               1371                 :                :      * It doesn't matter which of the WAL insertion locks we acquire, so try
                               1372                 :                :      * the one we used last time.  If the system isn't particularly busy, it's
                               1373                 :                :      * a good bet that it's still available, and it's good to have some
                               1374                 :                :      * affinity to a particular lock so that you don't unnecessarily bounce
                               1375                 :                :      * cache lines between processes when there's no contention.
                               1376                 :                :      *
                               1377                 :                :      * If this is the first time through in this backend, pick a lock
                               1378                 :                :      * (semi-)randomly.  This allows the locks to be used evenly if you have a
                               1379                 :                :      * lot of very short connections.
                               1380                 :                :      */
                               1381                 :                :     static int  lockToTry = -1;
                               1382                 :                : 
                               1383         [ +  + ]:       13612731 :     if (lockToTry == -1)
   52 heikki.linnakangas@i     1384                 :GNC        6750 :         lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
 3677 heikki.linnakangas@i     1385                 :CBC    13612731 :     MyLockNo = lockToTry;
                               1386                 :                : 
                               1387                 :                :     /*
                               1388                 :                :      * The insertingAt value is initially set to 0, as we don't know our
                               1389                 :                :      * insert location yet.
                               1390                 :                :      */
 3180 andres@anarazel.de       1391                 :       13612731 :     immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 3677 heikki.linnakangas@i     1392         [ +  + ]:       13612731 :     if (!immed)
                               1393                 :                :     {
                               1394                 :                :         /*
                               1395                 :                :          * If we couldn't get the lock immediately, try another lock next
                               1396                 :                :          * time.  On a system with more insertion locks than concurrent
                               1397                 :                :          * inserters, this causes all the inserters to eventually migrate to a
                               1398                 :                :          * lock that no-one else is using.  On a system with more inserters
                               1399                 :                :          * than locks, it still helps to distribute the inserters evenly
                               1400                 :                :          * across the locks.
                               1401                 :                :          */
 3483                          1402                 :            904 :         lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
                               1403                 :                :     }
 3933                          1404                 :       13612731 : }
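
The lock-selection policy described in the comments above (stick with the last lock, move on only after having had to wait) can be expressed as two small pure functions. This is a hedged sketch: `NUM_LOCKS`, `initial_lock`, and `next_lock_to_try` are hypothetical names, the lock count of 8 is an assumption, and no real locking is performed.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOCKS 8             /* assumed number of insertion locks */

/* First use in a process: spread processes across locks by number. */
static int
initial_lock(int proc_number)
{
    assert(proc_number >= 0);
    return proc_number % NUM_LOCKS;
}

/*
 * Decide which lock to try next time.  If the last acquisition succeeded
 * without waiting, keep the same lock for cache affinity; otherwise move
 * to the next lock, wrapping around at the end.
 */
static int
next_lock_to_try(int lock_to_try, bool acquired_immediately)
{
    assert(lock_to_try >= 0 && lock_to_try < NUM_LOCKS);
    if (!acquired_immediately)
        return (lock_to_try + 1) % NUM_LOCKS;
    return lock_to_try;
}
```

Under this policy an uncontended inserter keeps reusing one lock, while contended inserters drift apart over successive acquisitions.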
                               1405                 :                : 
                               1406                 :                : /*
                               1407                 :                :  * Acquire all WAL insertion locks, to prevent other backends from inserting
                               1408                 :                :  * to WAL.
                               1409                 :                :  */
                               1410                 :                : static void
 3677                          1411                 :           3255 : WALInsertLockAcquireExclusive(void)
                               1412                 :                : {
                               1413                 :                :     int         i;
                               1414                 :                : 
                               1415                 :                :     /*
                               1416                 :                :      * When holding all the locks, the insertingAt indicator of every lock
                               1417                 :                :      * except the last is set to 0xFFFFFFFFFFFFFFFF, which is higher than any
                               1418                 :                :      * real XLogRecPtr value, to make sure that no-one blocks waiting on those.
                               1419                 :                :      */
 3483                          1420         [ +  + ]:          26040 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
                               1421                 :                :     {
 3180 andres@anarazel.de       1422                 :          22785 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1423                 :          22785 :         LWLockUpdateVar(&WALInsertLocks[i].l.lock,
                               1424                 :          22785 :                         &WALInsertLocks[i].l.insertingAt,
                               1425                 :                :                         PG_UINT64_MAX);
                               1426                 :                :     }
                               1427                 :                :     /* No update needed for the last lock: insertingAt is reset to 0 at release */
                               1428                 :           3255 :     LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1429                 :                : 
 3677 heikki.linnakangas@i     1430                 :           3255 :     holdingAllLocks = true;
 3933                          1431                 :           3255 : }
                               1432                 :                : 
                               1433                 :                : /*
                               1434                 :                :  * Release our insertion lock (or locks, if we're holding them all).
                               1435                 :                :  *
                               1436                 :                :  * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
                               1437                 :                :  * next time the lock is acquired.
                               1438                 :                :  */
                               1439                 :                : static void
 3677                          1440                 :       13615986 : WALInsertLockRelease(void)
                               1441                 :                : {
                               1442         [ +  + ]:       13615986 :     if (holdingAllLocks)
                               1443                 :                :     {
                               1444                 :                :         int         i;
                               1445                 :                : 
 3483                          1446         [ +  + ]:          29295 :         for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 3180 andres@anarazel.de       1447                 :          26040 :             LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
                               1448                 :          26040 :                                   &WALInsertLocks[i].l.insertingAt,
                               1449                 :                :                                   0);
                               1450                 :                : 
 3677 heikki.linnakangas@i     1451                 :           3255 :         holdingAllLocks = false;
                               1452                 :                :     }
                               1453                 :                :     else
                               1454                 :                :     {
 3180 andres@anarazel.de       1455                 :       13612731 :         LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
                               1456                 :       13612731 :                               &WALInsertLocks[MyLockNo].l.insertingAt,
                               1457                 :                :                               0);
                               1458                 :                :     }
 3933 heikki.linnakangas@i     1459                 :       13615986 : }
                               1460                 :                : 
                               1461                 :                : /*
                               1462                 :                :  * Update our insertingAt value, to let others know that we've finished
                               1463                 :                :  * inserting up to that point.
                               1464                 :                :  */
                               1465                 :                : static void
 3677                          1466                 :         683968 : WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
                               1467                 :                : {
                               1468         [ +  + ]:         683968 :     if (holdingAllLocks)
                               1469                 :                :     {
                               1470                 :                :         /*
                               1471                 :                :          * We use the last lock to mark our actual position, see comments in
                               1472                 :                :          * WALInsertLockAcquireExclusive.
                               1473                 :                :          */
 3483                          1474                 :         465303 :         LWLockUpdateVar(&WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.lock,
 2489 tgl@sss.pgh.pa.us        1475                 :         465303 :                         &WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.insertingAt,
                               1476                 :                :                         insertingAt);
                               1477                 :                :     }
                               1478                 :                :     else
 3677 heikki.linnakangas@i     1479                 :         218665 :         LWLockUpdateVar(&WALInsertLocks[MyLockNo].l.lock,
                               1480                 :         218665 :                         &WALInsertLocks[MyLockNo].l.insertingAt,
                               1481                 :                :                         insertingAt);
 3933                          1482                 :         683968 : }
                               1483                 :                : 
                               1484                 :                : /*
                               1485                 :                :  * Wait for any WAL insertions < upto to finish.
                               1486                 :                :  *
                               1487                 :                :  * Returns the location of the oldest insertion that is still in-progress.
                               1488                 :                :  * Any WAL prior to that point has been fully copied into WAL buffers, and
                               1489                 :                :  * can be flushed out to disk. Because this waits for any insertions older
                               1490                 :                :  * than 'upto' to finish, the return value is always >= 'upto'.
                               1491                 :                :  *
                               1492                 :                :  * Note: When you are about to write out WAL, you must call this function
                               1493                 :                :  * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
                               1494                 :                :  * need to wait for an insertion to finish (or at least advance to next
                               1495                 :                :  * uninitialized page), and the inserter might need to evict an old WAL buffer
                               1496                 :                :  * to make room for a new one, which in turn requires WALWriteLock.
                               1497                 :                :  */
                               1498                 :                : static XLogRecPtr
                               1499                 :         641343 : WaitXLogInsertionsToFinish(XLogRecPtr upto)
                               1500                 :                : {
                               1501                 :                :     uint64      bytepos;
                               1502                 :                :     XLogRecPtr  inserted;
                               1503                 :                :     XLogRecPtr  reservedUpto;
                               1504                 :                :     XLogRecPtr  finishedUpto;
 3492 andres@anarazel.de       1505                 :         641343 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1506                 :                :     int         i;
                               1507                 :                : 
 3933 heikki.linnakangas@i     1508         [ -  + ]:         641343 :     if (MyProc == NULL)
 3933 heikki.linnakangas@i     1509         [ #  # ]:UBC           0 :         elog(PANIC, "cannot wait without a PGPROC structure");
                               1510                 :                : 
                               1511                 :                :     /*
                               1512                 :                :      * Check if there's any work to do.  Use a barrier to ensure we get the
                               1513                 :                :      * freshest value.
                               1514                 :                :      */
    7 alvherre@alvh.no-ip.     1515                 :GNC      641343 :     inserted = pg_atomic_read_membarrier_u64(&XLogCtl->logInsertResult);
                               1516         [ +  + ]:         641343 :     if (upto <= inserted)
                               1517                 :         474598 :         return inserted;
                               1518                 :                : 
                               1519                 :                :     /* Read the current insert position */
 3933 heikki.linnakangas@i     1520         [ +  + ]:CBC      166745 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1521                 :         166745 :     bytepos = Insert->CurrBytePos;
                               1522                 :         166745 :     SpinLockRelease(&Insert->insertpos_lck);
                               1523                 :         166745 :     reservedUpto = XLogBytePosToEndRecPtr(bytepos);
                               1524                 :                : 
                               1525                 :                :     /*
                               1526                 :                :      * No one should request to flush a piece of WAL that hasn't even been
                               1527                 :                :      * reserved yet. However, it can happen if there is a block with a bogus
                               1528                 :                :      * LSN on disk, for example. XLogFlush checks for that situation and
                               1529                 :                :      * complains, but only after the flush. Here we just assume that to mean
                               1530                 :                :      * that all WAL that has been reserved needs to be finished. In this
                               1531                 :                :      * corner-case, the return value can be smaller than 'upto' argument.
                               1532                 :                :      */
                               1533         [ -  + ]:         166745 :     if (upto > reservedUpto)
                               1534                 :                :     {
 1227 peter@eisentraut.org     1535         [ #  # ]:UBC           0 :         ereport(LOG,
                               1536                 :                :                 (errmsg("request to flush past end of generated WAL; request %X/%X, current position %X/%X",
                               1537                 :                :                         LSN_FORMAT_ARGS(upto), LSN_FORMAT_ARGS(reservedUpto))));
 3933 heikki.linnakangas@i     1538                 :              0 :         upto = reservedUpto;
                               1539                 :                :     }
                               1540                 :                : 
                               1541                 :                :     /*
                               1542                 :                :      * Loop through all the locks, sleeping on any in-progress insert older
                               1543                 :                :      * than 'upto'.
                               1544                 :                :      *
                               1545                 :                :      * finishedUpto is our return value, indicating the point up to which all
                               1546                 :                :      * the WAL insertions have been finished. Initialize it to the head of
                               1547                 :                :      * reserved WAL, and as we iterate through the insertion locks, back it
                               1548                 :                :      * out for any insertion that's still in progress.
                               1549                 :                :      */
 3933 heikki.linnakangas@i     1550                 :CBC      166745 :     finishedUpto = reservedUpto;
 3483                          1551         [ +  + ]:        1500705 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               1552                 :                :     {
 3631 bruce@momjian.us         1553                 :        1333960 :         XLogRecPtr  insertingat = InvalidXLogRecPtr;
                               1554                 :                : 
                               1555                 :                :         do
                               1556                 :                :         {
                               1557                 :                :             /*
                               1558                 :                :              * See if this insertion is in progress.  LWLockWaitForVar will
                               1559                 :                :              * wait for the lock to be released, or for the 'value' to be set
                               1560                 :                :              * by a LWLockUpdateVar call.  When a lock is initially acquired,
                               1561                 :                :              * its value is 0 (InvalidXLogRecPtr), which means that we don't
                               1562                 :                :              * know where it's inserting yet.  We will have to wait for it. If
                               1563                 :                :              * it's a small insertion, the record will most likely fit on the
                               1564                 :                :              * same page and the inserter will release the lock without ever
                               1565                 :                :              * calling LWLockUpdateVar.  But if it has to sleep, it will
                               1566                 :                :              * advertise the insertion point with LWLockUpdateVar before
                               1567                 :                :              * sleeping.
                               1568                 :                :              *
                               1569                 :                :              * In this loop we are only waiting for insertions that started
                               1570                 :                :              * before WaitXLogInsertionsToFinish was called.  The lack of
                               1571                 :                :              * memory barriers in the loop means that we might see locks as
                               1572                 :                :              * "unused" that have since become used.  This is fine because
                               1573                 :                :              * they can only be used for later insertions that we would not
                               1574                 :                :              * want to wait on anyway.  Not taking a lock to acquire the
                               1575                 :                :              * current insertingAt value means that we might see older
                               1576                 :                :              * insertingAt values.  This is also fine, because if we read a
                               1577                 :                :              * value too old, we will add ourselves to the wait queue, which
                               1578                 :                :              * contains atomic operations.
                               1579                 :                :              */
 3677 heikki.linnakangas@i     1580         [ +  + ]:        1334368 :             if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                               1581                 :        1334368 :                                  &WALInsertLocks[i].l.insertingAt,
                               1582                 :                :                                  insertingat, &insertingat))
                               1583                 :                :             {
                               1584                 :                :                 /* the lock was free, so no insertion in progress */
                               1585                 :         960975 :                 insertingat = InvalidXLogRecPtr;
                               1586                 :         960975 :                 break;
                               1587                 :                :             }
                               1588                 :                : 
                               1589                 :                :             /*
                               1590                 :                :              * This insertion is still in progress. Have to wait, unless the
                               1591                 :                :              * inserter has proceeded past 'upto'.
                               1592                 :                :              */
                               1593         [ +  + ]:         373393 :         } while (insertingat < upto);
                               1594                 :                : 
                               1595   [ +  +  +  + ]:        1333960 :         if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto)
                               1596                 :          52208 :             finishedUpto = insertingat;
                               1597                 :                :     }
                               1598                 :                : 
                               1599                 :                :     /*
                               1600                 :                :      * Advance the limit we know to have been inserted and return the freshest
                               1601                 :                :      * value we know of, which might be beyond what we requested if somebody
                               1602                 :                :      * is concurrently doing this with an 'upto' pointer ahead of us.
                               1603                 :                :      */
    7 alvherre@alvh.no-ip.     1604                 :GNC      166745 :     finishedUpto = pg_atomic_monotonic_advance_u64(&XLogCtl->logInsertResult,
                               1605                 :                :                                                    finishedUpto);
                               1606                 :                : 
 3933 heikki.linnakangas@i     1607                 :CBC      166745 :     return finishedUpto;
                               1608                 :                : }
                               1609                 :                : 
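The tail of WaitXLogInsertionsToFinish() can be pictured with the short C11 sketch below. It is a simplified illustration under stated assumptions, not the PostgreSQL implementation: `finished_upto()` and `monotonic_advance_u64()` are hypothetical stand-ins, the lock array is replaced by a plain snapshot of insertingAt values in which 0 means "lock free", and no waiting is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: start from the head of reserved WAL and back out to
 * the oldest insertion still in progress.  A snapshot value of 0 is assumed
 * to mean that the lock was free.
 */
static uint64_t
finished_upto(uint64_t reserved_upto, const uint64_t *inserting_at, size_t nlocks)
{
    uint64_t    result = reserved_upto;

    assert(inserting_at != NULL);
    for (size_t i = 0; i < nlocks; i++)
    {
        if (inserting_at[i] != 0 && inserting_at[i] < result)
            result = inserting_at[i];
    }
    return result;
}

/*
 * Hypothetical stand-in for a monotonic advance: raise *ptr to 'target'
 * unless it is already at or beyond it, and return the larger value.
 */
static uint64_t
monotonic_advance_u64(_Atomic uint64_t *ptr, uint64_t target)
{
    uint64_t    cur = atomic_load(ptr);

    while (cur < target)
    {
        /* On failure, 'cur' is reloaded with the value now stored in *ptr. */
        if (atomic_compare_exchange_weak(ptr, &cur, target))
            return target;
    }
    return cur;
}
```

The compare-and-swap loop means a caller with a smaller target never moves the shared value backwards and instead gets back the fresher value published by someone else.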
                               1610                 :                : /*
                               1611                 :                :  * Get a pointer to the right location in the WAL buffer containing the
                               1612                 :                :  * given XLogRecPtr.
                               1613                 :                :  *
                               1614                 :                :  * If the page is not initialized yet, it is initialized. That might require
                               1615                 :                :  * evicting an old dirty buffer from the buffer cache, which means I/O.
                               1616                 :                :  *
                               1617                 :                :  * The caller must ensure that the page containing the requested location
                               1618                 :                :  * isn't evicted yet, and won't be evicted. The way to ensure that is to
                               1619                 :                :  * hold onto a WAL insertion lock with the insertingAt position set to
                               1620                 :                :  * something <= ptr. GetXLogBuffer() will update insertingAt if it needs
                               1621                 :                :  * to evict an old page from the buffer. (This means that once you call
                               1622                 :                :  * GetXLogBuffer() with a given 'ptr', you must not access anything before
                               1623                 :                :  * that point anymore, and must not call GetXLogBuffer() with an older 'ptr'
                               1624                 :                :  * later, because older buffers might have been recycled already.)
                               1625                 :                :  */
                               1626                 :                : static char *
  891 rhaas@postgresql.org     1627                 :       14440206 : GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
                               1628                 :                : {
                               1629                 :                :     int         idx;
                               1630                 :                :     XLogRecPtr  endptr;
                               1631                 :                :     static uint64 cachedPage = 0;
                               1632                 :                :     static char *cachedPos = NULL;
                               1633                 :                :     XLogRecPtr  expectedEndPtr;
                               1634                 :                : 
                               1635                 :                :     /*
                               1636                 :                :      * Fast path for the common case that we need to access again the same
                               1637                 :                :      * page as last time.
                               1638                 :                :      */
 3933 heikki.linnakangas@i     1639         [ +  + ]:       14440206 :     if (ptr / XLOG_BLCKSZ == cachedPage)
                               1640                 :                :     {
                               1641         [ -  + ]:       13436384 :         Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1642         [ -  + ]:       13436384 :         Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1643                 :       13436384 :         return cachedPos + ptr % XLOG_BLCKSZ;
                               1644                 :                :     }
                               1645                 :                : 
                               1646                 :                :     /*
                               1647                 :                :      * The XLog buffer cache is organized so that a page is always loaded to a
                               1648                 :                :      * particular buffer.  That way we can easily calculate the buffer a given
                               1649                 :                :      * page must be loaded into, from the XLogRecPtr alone.
                               1650                 :                :      */
                               1651                 :        1003822 :     idx = XLogRecPtrToBufIdx(ptr);
                               1652                 :                : 
                               1653                 :                :     /*
                               1654                 :                :      * See what page is loaded in the buffer at the moment. It could be the
                               1655                 :                :      * page we're looking for, or something older. It can't be anything newer
                               1656                 :                :      * - that would imply the page we're looking for has already been written
                               1657                 :                :      * out to disk and evicted, and the caller is responsible for making sure
                               1658                 :                :      * that doesn't happen.
                               1659                 :                :      *
                               1660                 :                :      * We don't hold a lock while we read the value. If someone is just about
                               1661                 :                :      * to initialize or has just initialized the page, it's possible that we
                               1662                 :                :      * get InvalidXLogRecPtr. That's ok, we'll grab the mapping lock (in
                               1663                 :                :      * AdvanceXLInsertBuffer) and retry if we see anything other than the page
                               1664                 :                :      * we're looking for.
                               1665                 :                :      */
                               1666                 :        1003822 :     expectedEndPtr = ptr;
                               1667                 :        1003822 :     expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
                               1668                 :                : 
  117 jdavis@postgresql.or     1669                 :GNC     1003822 :     endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 3933 heikki.linnakangas@i     1670         [ +  + ]:CBC     1003822 :     if (expectedEndPtr != endptr)
                               1671                 :                :     {
                               1672                 :                :         XLogRecPtr  initializedUpto;
                               1673                 :                : 
                               1674                 :                :         /*
                               1675                 :                :          * Before calling AdvanceXLInsertBuffer(), which can block, let others
                               1676                 :                :          * know how far we're finished with inserting the record.
                               1677                 :                :          *
                               1678                 :                :          * NB: If 'ptr' points to just after the page header, advertise a
                               1679                 :                :          * position at the beginning of the page rather than 'ptr' itself. If
                               1680                 :                :          * there are no other insertions running, someone might try to flush
                               1681                 :                :          * up to our advertised location. If we advertised a position after
                               1682                 :                :          * the page header, someone might try to flush the page header, even
                               1683                 :                :          * though the page might not actually be initialized yet. As the first
                               1684                 :                :          * inserter on the page, we are effectively responsible for making
                               1685                 :                :          * sure that it's initialized, before we let insertingAt move past
                               1686                 :                :          * the page header.
                               1687                 :                :          */
 3178                          1688         [ +  + ]:         683968 :         if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
 2399 andres@anarazel.de       1689         [ +  - ]:           4483 :             XLogSegmentOffset(ptr, wal_segment_size) > XLOG_BLCKSZ)
 3178 heikki.linnakangas@i     1690                 :           4483 :             initializedUpto = ptr - SizeOfXLogShortPHD;
                               1691         [ +  + ]:         679485 :         else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
 2399 andres@anarazel.de       1692         [ +  + ]:            462 :                  XLogSegmentOffset(ptr, wal_segment_size) < XLOG_BLCKSZ)
 3178 heikki.linnakangas@i     1693                 :            299 :             initializedUpto = ptr - SizeOfXLogLongPHD;
                               1694                 :                :         else
                               1695                 :         679186 :             initializedUpto = ptr;
                               1696                 :                : 
                               1697                 :         683968 :         WALInsertLockUpdateInsertingAt(initializedUpto);
                               1698                 :                : 
  891 rhaas@postgresql.org     1699                 :         683968 :         AdvanceXLInsertBuffer(ptr, tli, false);
  117 jdavis@postgresql.or     1700                 :GNC      683968 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1701                 :                : 
 3933 heikki.linnakangas@i     1702         [ -  + ]:CBC      683968 :         if (expectedEndPtr != endptr)
 3933 heikki.linnakangas@i     1703         [ #  # ]:UBC           0 :             elog(PANIC, "could not find WAL buffer for %X/%X",
                               1704                 :                :                  LSN_FORMAT_ARGS(ptr));
                               1705                 :                :     }
                               1706                 :                :     else
                               1707                 :                :     {
                               1708                 :                :         /*
                               1709                 :                :          * Make sure the initialization of the page is visible to us, and
                               1710                 :                :          * won't arrive later to overwrite the WAL data we write on the page.
                               1711                 :                :          */
 3933 heikki.linnakangas@i     1712                 :CBC      319854 :         pg_memory_barrier();
                               1713                 :                :     }
                               1714                 :                : 
                               1715                 :                :     /*
                               1716                 :                :      * Found the buffer holding this page. Return a pointer to the right
                               1717                 :                :      * offset within the page.
                               1718                 :                :      */
                               1719                 :        1003822 :     cachedPage = ptr / XLOG_BLCKSZ;
                               1720                 :        1003822 :     cachedPos = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1721                 :                : 
                               1722         [ -  + ]:        1003822 :     Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1723         [ -  + ]:        1003822 :     Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1724                 :                : 
                               1725                 :        1003822 :     return cachedPos + ptr % XLOG_BLCKSZ;
                               1726                 :                : }
                               1727                 :                : 
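The page arithmetic in GetXLogBuffer() can be tried out in isolation with the sketch below. It rests on assumptions: `SKETCH_BLCKSZ` is an assumed 8192-byte page size standing in for XLOG_BLCKSZ, and `sketch_buf_idx()` assumes that the buffer slot is the page number modulo the number of WAL buffers, which is a guess at what XLogRecPtrToBufIdx does rather than a copy of it. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192u     /* assumed WAL page size */

/* End pointer of the page containing 'ptr', as computed for expectedEndPtr. */
static uint64_t
sketch_page_end(uint64_t ptr)
{
    return ptr + (SKETCH_BLCKSZ - ptr % SKETCH_BLCKSZ);
}

/* Assumed mapping: each page always lands in one particular buffer slot. */
static uint64_t
sketch_buf_idx(uint64_t ptr, uint64_t nbuffers)
{
    assert(nbuffers > 0);
    return (ptr / SKETCH_BLCKSZ) % nbuffers;
}

/* Offset of 'ptr' within its page, as added to cachedPos on return. */
static uint64_t
sketch_page_offset(uint64_t ptr)
{
    return ptr % SKETCH_BLCKSZ;
}
```

Because the slot depends only on the pointer, a reader can compare the end pointer stored for that slot with `sketch_page_end(ptr)` to tell whether the slot currently holds the wanted page.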
                               1728                 :                : /*
                               1729                 :                :  * Read WAL data directly from WAL buffers, if available. Returns the number
                               1730                 :                :  * of bytes read successfully.
                               1731                 :                :  *
                               1732                 :                :  * Fewer than 'count' bytes may be read if some of the requested WAL data has
                               1733                 :                :  * already been evicted.
                               1734                 :                :  *
                               1735                 :                :  * No locks are taken.
                               1736                 :                :  *
                               1737                 :                :  * Caller should ensure that it reads no further than LogwrtResult.Write
                               1738                 :                :  * (which should have been updated by the caller when determining how far to
                               1739                 :                :  * read). The 'tli' argument is only used as a convenient safety check so that
                               1740                 :                :  * callers do not read from WAL buffers on a historical timeline.
                               1741                 :                :  */
                               1742                 :                : Size
   62 jdavis@postgresql.or     1743                 :GNC       30959 : WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
                               1744                 :                :                    TimeLineID tli)
                               1745                 :                : {
                               1746                 :          30959 :     char       *pdst = dstbuf;
                               1747                 :          30959 :     XLogRecPtr  recptr = startptr;
                               1748                 :                :     XLogRecPtr  inserted;
   58                          1749                 :          30959 :     Size        nbytes = count;
                               1750                 :                : 
   62                          1751   [ +  +  +  + ]:          30959 :     if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
                               1752                 :            923 :         return 0;
                               1753                 :                : 
                               1754         [ -  + ]:          30036 :     Assert(!XLogRecPtrIsInvalid(startptr));
                               1755                 :                : 
                               1756                 :                :     /*
                               1757                 :                :      * Caller should ensure that the requested data has been inserted into WAL
                               1758                 :                :      * buffers before we try to read it.
                               1759                 :                :      */
    7 alvherre@alvh.no-ip.     1760                 :          30036 :     inserted = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               1761         [ -  + ]:          30036 :     if (startptr + count > inserted)
    7 alvherre@alvh.no-ip.     1762         [ #  # ]:UNC           0 :         ereport(ERROR,
                               1763                 :                :                 errmsg("cannot read past end of generated WAL: requested %X/%X, current position %X/%X",
                               1764                 :                :                        LSN_FORMAT_ARGS(startptr + count),
                               1765                 :                :                        LSN_FORMAT_ARGS(inserted)));
                               1766                 :                : 
                               1767                 :                :     /*
                               1768                 :                :      * Loop through the buffers without a lock. For each buffer, atomically
                               1769                 :                :      * read and verify the end pointer, then copy the data out, and finally
                               1770                 :                :      * re-read and re-verify the end pointer.
                               1771                 :                :      *
                               1772                 :                :      * Once a page is evicted, it never returns to the WAL buffers, so if the
                               1773                 :                :      * end pointer matches the expected end pointer before and after we copy
                               1774                 :                :      * the data, then the right page must have been present during the data
                               1775                 :                :      * copy. Read barriers are necessary to ensure that the data copy actually
                               1776                 :                :      * happens between the two verification steps.
                               1777                 :                :      *
                               1778                 :                :      * If either verification fails, we simply terminate the loop and return
                               1779                 :                :      * with the data that had already been copied out successfully.
                               1780                 :                :      */
   62 jdavis@postgresql.or     1781         [ +  + ]:GNC       73191 :     while (nbytes > 0)
                               1782                 :                :     {
                               1783                 :          52916 :         uint32      offset = recptr % XLOG_BLCKSZ;
                               1784                 :          52916 :         int         idx = XLogRecPtrToBufIdx(recptr);
                               1785                 :                :         XLogRecPtr  expectedEndPtr;
                               1786                 :                :         XLogRecPtr  endptr;
                               1787                 :                :         const char *page;
                               1788                 :                :         const char *psrc;
                               1789                 :                :         Size        npagebytes;
                               1790                 :                : 
                               1791                 :                :         /*
                               1792                 :                :          * Calculate the end pointer we expect in the xlblocks array if the
                               1793                 :                :          * correct page is present.
                               1794                 :                :          */
                               1795                 :          52916 :         expectedEndPtr = recptr + (XLOG_BLCKSZ - offset);
                               1796                 :                : 
                               1797                 :                :         /*
                               1798                 :                :          * First verification step: check that the correct page is present in
                               1799                 :                :          * the WAL buffers.
                               1800                 :                :          */
                               1801                 :          52916 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1802         [ +  + ]:          52916 :         if (expectedEndPtr != endptr)
                               1803                 :           9752 :             break;
                               1804                 :                : 
                               1805                 :                :         /*
                               1806                 :                :          * The correct page is present (or was at the time the endptr was
                               1807                 :                :          * read; must re-verify later). Calculate pointer to source data and
                               1808                 :                :          * determine how much data to read from this page.
                               1809                 :                :          */
                               1810                 :          43164 :         page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1811                 :          43164 :         psrc = page + offset;
                               1812                 :          43164 :         npagebytes = Min(nbytes, XLOG_BLCKSZ - offset);
                               1813                 :                : 
                               1814                 :                :         /*
                               1815                 :                :          * Ensure that the data copy and the first verification step are not
                               1816                 :                :          * reordered.
                               1817                 :                :          */
                               1818                 :          43164 :         pg_read_barrier();
                               1819                 :                : 
                               1820                 :                :         /* data copy */
                               1821                 :          43164 :         memcpy(pdst, psrc, npagebytes);
                               1822                 :                : 
                               1823                 :                :         /*
                               1824                 :                :          * Ensure that the data copy and the second verification step are not
                               1825                 :                :          * reordered.
                               1826                 :                :          */
                               1827                 :          43164 :         pg_read_barrier();
                               1828                 :                : 
                               1829                 :                :         /*
                               1830                 :                :          * Second verification step: check that the page we read from wasn't
                               1831                 :                :          * evicted while we were copying the data.
                               1832                 :                :          */
                               1833                 :          43164 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1834         [ +  + ]:          43164 :         if (expectedEndPtr != endptr)
                               1835                 :              9 :             break;
                               1836                 :                : 
                               1837                 :          43155 :         pdst += npagebytes;
                               1838                 :          43155 :         recptr += npagebytes;
                               1839                 :          43155 :         nbytes -= npagebytes;
                               1840                 :                :     }
                               1841                 :                : 
                               1842         [ -  + ]:          30036 :     Assert(pdst - dstbuf <= count);
                               1843                 :                : 
                               1844                 :          30036 :     return pdst - dstbuf;
                               1845                 :                : }
                               1846                 :                : 
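The lock-free read protocol in WALReadFromBuffers above (verify the end pointer, copy, verify again) can be illustrated in isolation. The following is a minimal sketch, not PostgreSQL code: `pages`, `page_end`, `install_page` and `read_from_buffers` are hypothetical stand-ins for XLogCtl->pages, XLogCtl->xlblocks and the real functions, the page size and slot count are assumed values, and C11 fences stand in for pg_read_barrier()/pg_write_barrier(). It is exercised single-threaded only; a truly concurrent memcpy of plain memory would need the guarantees the real barriers provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ   8192u          /* assumed page size (stands in for XLOG_BLCKSZ) */
#define NBUFFERS 4u             /* assumed number of buffer slots */

/* Hypothetical stand-ins for XLogCtl->pages and XLogCtl->xlblocks. */
static char pages[NBUFFERS * BLCKSZ];
static _Atomic uint64_t page_end[NBUFFERS];

/* Install the page starting at 'start', evicting whatever used the slot. */
static void
install_page(uint64_t start, int fill)
{
    size_t      idx = (size_t) ((start / BLCKSZ) % NBUFFERS);

    assert(start % BLCKSZ == 0);
    atomic_store(&page_end[idx], 0);    /* mark invalid before rewriting */
    atomic_thread_fence(memory_order_release);
    memset(pages + idx * BLCKSZ, fill, BLCKSZ);
    atomic_thread_fence(memory_order_release);
    atomic_store(&page_end[idx], start + BLCKSZ);
}

/* Copy up to 'count' bytes from 'start'; return how many were copied. */
static size_t
read_from_buffers(char *dst, uint64_t start, size_t count)
{
    char       *pdst = dst;
    uint64_t    pos = start;
    size_t      left = count;

    while (left > 0)
    {
        uint32_t    off = (uint32_t) (pos % BLCKSZ);
        size_t      idx = (size_t) ((pos / BLCKSZ) % NBUFFERS);
        uint64_t    expected = pos + (BLCKSZ - off);
        size_t      n = left < BLCKSZ - off ? left : BLCKSZ - off;

        /* First verification: is the expected page in this slot? */
        if (atomic_load(&page_end[idx]) != expected)
            break;
        atomic_thread_fence(memory_order_acquire);

        memcpy(pdst, pages + idx * BLCKSZ + off, n);

        /* Second verification: was the page evicted during the copy? */
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load(&page_end[idx]) != expected)
            break;

        /* Only now do the copied bytes count as returned data. */
        pdst += n;
        pos += n;
        left -= n;
    }
    return (size_t) (pdst - dst);
}
```

As in the listing, a failed check on either side of the copy ends the loop, so the caller receives only bytes whose page was verified both before and after the copy and must fetch the remainder elsewhere.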
                               1847                 :                : /*
                               1848                 :                :  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
                               1849                 :                :  * is the position starting from the beginning of WAL, excluding all WAL
                               1850                 :                :  * page headers.
                               1851                 :                :  */
                               1852                 :                : static XLogRecPtr
 3933 heikki.linnakangas@i     1853                 :CBC    27217531 : XLogBytePosToRecPtr(uint64 bytepos)
                               1854                 :                : {
                               1855                 :                :     uint64      fullsegs;
                               1856                 :                :     uint64      fullpages;
                               1857                 :                :     uint64      bytesleft;
                               1858                 :                :     uint32      seg_offset;
                               1859                 :                :     XLogRecPtr  result;
                               1860                 :                : 
                               1861                 :       27217531 :     fullsegs = bytepos / UsableBytesInSegment;
                               1862                 :       27217531 :     bytesleft = bytepos % UsableBytesInSegment;
                               1863                 :                : 
                               1864         [ +  + ]:       27217531 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1865                 :                :     {
                               1866                 :                :         /* fits on first page of segment */
                               1867                 :          47455 :         seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1868                 :                :     }
                               1869                 :                :     else
                               1870                 :                :     {
                               1871                 :                :         /* account for the first page on segment with long header */
                               1872                 :       27170076 :         seg_offset = XLOG_BLCKSZ;
                               1873                 :       27170076 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1874                 :                : 
                               1875                 :       27170076 :         fullpages = bytesleft / UsableBytesInPage;
                               1876                 :       27170076 :         bytesleft = bytesleft % UsableBytesInPage;
                               1877                 :                : 
                               1878                 :       27170076 :         seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1879                 :                :     }
                               1880                 :                : 
 2106 alvherre@alvh.no-ip.     1881                 :       27217531 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1882                 :                : 
 3933 heikki.linnakangas@i     1883                 :       27217531 :     return result;
                               1884                 :                : }
                               1885                 :                : 
                               1886                 :                : /*
                               1887                 :                :  * Like XLogBytePosToRecPtr, but if the position is at a page boundary,
                               1888                 :                :  * returns a pointer to the beginning of the page (i.e. before the page header),
                               1889                 :                :  * not to where the first xlog record on that page would go. This is used
                               1890                 :                :  * when converting a pointer to the end of a record.
                               1891                 :                :  */
                               1892                 :                : static XLogRecPtr
                               1893                 :       13774425 : XLogBytePosToEndRecPtr(uint64 bytepos)
                               1894                 :                : {
                               1895                 :                :     uint64      fullsegs;
                               1896                 :                :     uint64      fullpages;
                               1897                 :                :     uint64      bytesleft;
                               1898                 :                :     uint32      seg_offset;
                               1899                 :                :     XLogRecPtr  result;
                               1900                 :                : 
                               1901                 :       13774425 :     fullsegs = bytepos / UsableBytesInSegment;
                               1902                 :       13774425 :     bytesleft = bytepos % UsableBytesInSegment;
                               1903                 :                : 
                               1904         [ +  + ]:       13774425 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1905                 :                :     {
                               1906                 :                :         /* fits on first page of segment */
                               1907         [ +  + ]:          70440 :         if (bytesleft == 0)
                               1908                 :          46128 :             seg_offset = 0;
                               1909                 :                :         else
                               1910                 :          24312 :             seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1911                 :                :     }
                               1912                 :                :     else
                               1913                 :                :     {
                               1914                 :                :         /* account for the first page on segment with long header */
                               1915                 :       13703985 :         seg_offset = XLOG_BLCKSZ;
                               1916                 :       13703985 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1917                 :                : 
                               1918                 :       13703985 :         fullpages = bytesleft / UsableBytesInPage;
                               1919                 :       13703985 :         bytesleft = bytesleft % UsableBytesInPage;
                               1920                 :                : 
                               1921         [ +  + ]:       13703985 :         if (bytesleft == 0)
                               1922                 :          13366 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft;
                               1923                 :                :         else
                               1924                 :       13690619 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1925                 :                :     }
                               1926                 :                : 
 2106 alvherre@alvh.no-ip.     1927                 :       13774425 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1928                 :                : 
 3933 heikki.linnakangas@i     1929                 :       13774425 :     return result;
                               1930                 :                : }
                               1931                 :                : 
                               1932                 :                : /*
                               1933                 :                :  * Convert an XLogRecPtr to a "usable byte position".
                               1934                 :                :  */
                               1935                 :                : static uint64
                               1936                 :       40823743 : XLogRecPtrToBytePos(XLogRecPtr ptr)
                               1937                 :                : {
                               1938                 :                :     uint64      fullsegs;
                               1939                 :                :     uint32      fullpages;
                               1940                 :                :     uint32      offset;
                               1941                 :                :     uint64      result;
                               1942                 :                : 
 2399 andres@anarazel.de       1943                 :       40823743 :     XLByteToSeg(ptr, fullsegs, wal_segment_size);
                               1944                 :                : 
                               1945                 :       40823743 :     fullpages = (XLogSegmentOffset(ptr, wal_segment_size)) / XLOG_BLCKSZ;
 3933 heikki.linnakangas@i     1946                 :       40823743 :     offset = ptr % XLOG_BLCKSZ;
                               1947                 :                : 
                               1948         [ +  + ]:       40823743 :     if (fullpages == 0)
                               1949                 :                :     {
                               1950                 :          71547 :         result = fullsegs * UsableBytesInSegment;
                               1951         [ +  + ]:          71547 :         if (offset > 0)
                               1952                 :                :         {
                               1953         [ -  + ]:          70925 :             Assert(offset >= SizeOfXLogLongPHD);
                               1954                 :          70925 :             result += offset - SizeOfXLogLongPHD;
                               1955                 :                :         }
                               1956                 :                :     }
                               1957                 :                :     else
                               1958                 :                :     {
                               1959                 :       40752196 :         result = fullsegs * UsableBytesInSegment +
 3631 bruce@momjian.us         1960                 :       40752196 :             (XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
 2489 tgl@sss.pgh.pa.us        1961                 :       40752196 :             (fullpages - 1) * UsableBytesInPage;    /* full pages */
 3933 heikki.linnakangas@i     1962         [ +  + ]:       40752196 :         if (offset > 0)
                               1963                 :                :         {
                               1964         [ -  + ]:       40738923 :             Assert(offset >= SizeOfXLogShortPHD);
                               1965                 :       40738923 :             result += offset - SizeOfXLogShortPHD;
                               1966                 :                :         }
                               1967                 :                :     }
                               1968                 :                : 
                               1969                 :       40823743 :     return result;
                               1970                 :                : }
                               1971                 :                : 
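The mapping between "usable byte positions" and XLogRecPtr values in XLogBytePosToRecPtr and XLogRecPtrToBytePos above can be checked with concrete numbers. The sketch below re-implements both directions as standalone functions with hypothetical names (`bytepos_to_lsn`, `lsn_to_bytepos`). The constants are assumptions chosen for illustration (8 kB pages, 16 MB segments, a 40-byte long page header and a 24-byte short page header); an actual build may use different values.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ     8192u                /* assumed XLOG_BLCKSZ */
#define SEG_SIZE   (16u * 1024 * 1024)  /* assumed wal_segment_size */
#define LONG_PHD   40u                  /* assumed SizeOfXLogLongPHD */
#define SHORT_PHD  24u                  /* assumed SizeOfXLogShortPHD */

/* Payload bytes per page, and per segment (first page has the long header). */
#define USABLE_IN_PAGE (BLCKSZ - SHORT_PHD)
#define USABLE_IN_SEG \
    ((uint64_t) (SEG_SIZE / BLCKSZ) * USABLE_IN_PAGE - (LONG_PHD - SHORT_PHD))

/* Usable byte position -> byte address in the WAL stream (headers included). */
static uint64_t
bytepos_to_lsn(uint64_t bytepos)
{
    uint64_t    segs = bytepos / USABLE_IN_SEG;
    uint64_t    left = bytepos % USABLE_IN_SEG;
    uint64_t    seg_off;

    if (left < BLCKSZ - LONG_PHD)
        seg_off = left + LONG_PHD;      /* fits on first page of segment */
    else
    {
        left -= BLCKSZ - LONG_PHD;      /* skip the first page's payload */
        seg_off = BLCKSZ + (left / USABLE_IN_PAGE) * BLCKSZ
            + left % USABLE_IN_PAGE + SHORT_PHD;
    }
    return segs * SEG_SIZE + seg_off;
}

/* Inverse mapping: strip every page header that precedes the address. */
static uint64_t
lsn_to_bytepos(uint64_t lsn)
{
    uint64_t    segs = lsn / SEG_SIZE;
    uint64_t    fullpages = (lsn % SEG_SIZE) / BLCKSZ;
    uint64_t    off = lsn % BLCKSZ;
    uint64_t    result = segs * USABLE_IN_SEG;

    if (fullpages == 0)
    {
        if (off > 0)
        {
            assert(off >= LONG_PHD);
            result += off - LONG_PHD;
        }
    }
    else
    {
        result += (BLCKSZ - LONG_PHD) + (fullpages - 1) * USABLE_IN_PAGE;
        if (off > 0)
        {
            assert(off >= SHORT_PHD);
            result += off - SHORT_PHD;
        }
    }
    return result;
}
```

Under these assumed sizes, byte position 0 maps to offset 40 (just past the long header), and position 8152, the first byte that no longer fits on the first page, maps to offset 8216 (page 1 plus the short header). Converting offset 8192, the raw page boundary, back to a byte position also gives 8152, which is why a separate end-pointer variant such as XLogBytePosToEndRecPtr is needed to tell the two addresses apart.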
                               1972                 :                : /*
                               1973                 :                :  * Initialize XLOG buffers, writing out old buffers if they still contain
                               1974                 :                :  * unwritten data, up to the page containing 'upto'. Or if 'opportunistic' is
                               1975                 :                :  * true, initialize as many pages as we can without having to write out
                               1976                 :                :  * unwritten data. Any new pages are initialized to zeros, with page headers
                               1977                 :                :  * initialized properly.
                               1978                 :                :  */
                               1979                 :                : static void
  891 rhaas@postgresql.org     1980                 :         687587 : AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
                               1981                 :                : {
 8433 tgl@sss.pgh.pa.us        1982                 :         687587 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1983                 :                :     int         nextidx;
                               1984                 :                :     XLogRecPtr  OldPageRqstPtr;
                               1985                 :                :     XLogwrtRqst WriteRqst;
 3933 heikki.linnakangas@i     1986                 :         687587 :     XLogRecPtr  NewPageEndPtr = InvalidXLogRecPtr;
                               1987                 :                :     XLogRecPtr  NewPageBeginPtr;
                               1988                 :                :     XLogPageHeader NewPage;
  572 tgl@sss.pgh.pa.us        1989                 :         687587 :     int         npages pg_attribute_unused() = 0;
                               1990                 :                : 
 3933 heikki.linnakangas@i     1991                 :         687587 :     LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               1992                 :                : 
                               1993                 :                :     /*
                               1994                 :                :      * Now that we have the lock, check if someone initialized the page
                               1995                 :                :      * already.
                               1996                 :                :      */
 3924                          1997   [ +  +  +  + ]:        2141357 :     while (upto >= XLogCtl->InitializedUpTo || opportunistic)
                               1998                 :                :     {
                               1999                 :        1457389 :         nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
                               2000                 :                : 
                               2001                 :                :         /*
                               2002                 :                :          * Get ending-offset of the buffer page we need to replace (this may
                               2003                 :                :          * be zero if the buffer hasn't been used yet).  Fall through if it's
                               2004                 :                :          * already written out.
                               2005                 :                :          */
  117 jdavis@postgresql.or     2006                 :GNC     1457389 :         OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
 4125 alvherre@alvh.no-ip.     2007         [ +  + ]:CBC     1457389 :         if (LogwrtResult.Write < OldPageRqstPtr)
                               2008                 :                :         {
                               2009                 :                :             /*
                               2010                 :                :              * Nope, got work to do. If we just want to pre-initialize as much
                               2011                 :                :              * as we can without flushing, give up now.
                               2012                 :                :              */
 3933 heikki.linnakangas@i     2013         [ +  + ]:         532187 :             if (opportunistic)
                               2014                 :           3619 :                 break;
                               2015                 :                : 
                               2016                 :                :             /* Advance shared memory write request position */
 3492 andres@anarazel.de       2017         [ +  + ]:         528568 :             SpinLockAcquire(&XLogCtl->info_lck);
                               2018         [ +  + ]:         528568 :             if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
                               2019                 :         441311 :                 XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
                               2020                 :         528568 :             SpinLockRelease(&XLogCtl->info_lck);
                               2021                 :                : 
                               2022                 :                :             /*
                               2023                 :                :              * Acquire an up-to-date LogwrtResult value and see if we still
                               2024                 :                :              * need to write it or if someone else already did.
                               2025                 :                :              */
    9 alvherre@alvh.no-ip.     2026                 :GNC      528568 :             RefreshXLogWriteResult(LogwrtResult);
 3933 heikki.linnakangas@i     2027         [ +  + ]:CBC      528568 :             if (LogwrtResult.Write < OldPageRqstPtr)
                               2028                 :                :             {
                               2029                 :                :                 /*
                               2030                 :                :                  * Must acquire write lock. Release WALBufMappingLock first,
                               2031                 :                :                  * to make sure that all insertions that we need to wait for
                               2032                 :                :                  * can finish (up to this same position). Otherwise we risk
                               2033                 :                :                  * deadlock.
                               2034                 :                :                  */
                               2035                 :         525751 :                 LWLockRelease(WALBufMappingLock);
                               2036                 :                : 
                               2037                 :         525751 :                 WaitXLogInsertionsToFinish(OldPageRqstPtr);
                               2038                 :                : 
                               2039                 :         525751 :                 LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                               2040                 :                : 
   11 alvherre@alvh.no-ip.     2041                 :GNC      525751 :                 RefreshXLogWriteResult(LogwrtResult);
 3933 heikki.linnakangas@i     2042         [ +  + ]:CBC      525751 :                 if (LogwrtResult.Write >= OldPageRqstPtr)
                               2043                 :                :                 {
                               2044                 :                :                     /* OK, someone wrote it already */
                               2045                 :           1282 :                     LWLockRelease(WALWriteLock);
                               2046                 :                :                 }
                               2047                 :                :                 else
                               2048                 :                :                 {
                               2049                 :                :                     /* Have to write it ourselves */
                               2050                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
                               2051                 :         524469 :                     WriteRqst.Write = OldPageRqstPtr;
                               2052                 :         524469 :                     WriteRqst.Flush = 0;
  891 rhaas@postgresql.org     2053                 :         524469 :                     XLogWrite(WriteRqst, tli, false);
 3933 heikki.linnakangas@i     2054                 :         524469 :                     LWLockRelease(WALWriteLock);
  739 andres@anarazel.de       2055                 :         524469 :                     PendingWalStats.wal_buffers_full++;
                               2056                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
                               2057                 :                :                 }
                               2058                 :                :                 /* Re-acquire WALBufMappingLock and retry */
 3933 heikki.linnakangas@i     2059                 :         525751 :                 LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2060                 :         525751 :                 continue;
                               2061                 :                :             }
                               2062                 :                :         }
                               2063                 :                : 
                               2064                 :                :         /*
                               2065                 :                :          * Now the next buffer slot is free and we can set it up to be the
                               2066                 :                :          * next output page.
                               2067                 :                :          */
 3924                          2068                 :         928019 :         NewPageBeginPtr = XLogCtl->InitializedUpTo;
 3933                          2069                 :         928019 :         NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
                               2070                 :                : 
                               2071         [ -  + ]:         928019 :         Assert(XLogRecPtrToBufIdx(NewPageBeginPtr) == nextidx);
                               2072                 :                : 
                               2073                 :         928019 :         NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
                               2074                 :                : 
                               2075                 :                :         /*
                               2076                 :                :          * Mark the xlblock with InvalidXLogRecPtr and issue a write barrier
                               2077                 :                :          * before initializing. Otherwise, the old page may be partially
                               2078                 :                :          * zeroed but look valid.
                               2079                 :                :          */
  117 jdavis@postgresql.or     2080                 :GNC      928019 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
                               2081                 :         928019 :         pg_write_barrier();
                               2082                 :                : 
                               2083                 :                :         /*
                               2084                 :                :          * Be sure to re-zero the buffer so that bytes beyond what we've
                               2085                 :                :          * written will look like zeroes and not valid XLOG records...
                               2086                 :                :          */
 3933 heikki.linnakangas@i     2087   [ +  -  +  -  :CBC      928019 :         MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
                                     +  -  -  +  -  
                                                 - ]
                               2088                 :                : 
                               2089                 :                :         /*
                               2090                 :                :          * Fill the new page's header
                               2091                 :                :          */
 3249 bruce@momjian.us         2092                 :         928019 :         NewPage->xlp_magic = XLOG_PAGE_MAGIC;
                               2093                 :                : 
                               2094                 :                :         /* NewPage->xlp_info = 0; */ /* done by memset */
  891 rhaas@postgresql.org     2095                 :         928019 :         NewPage->xlp_tli = tli;
 3249 bruce@momjian.us         2096                 :         928019 :         NewPage->xlp_pageaddr = NewPageBeginPtr;
                               2097                 :                : 
                               2098                 :                :         /* NewPage->xlp_rem_len = 0; */  /* done by memset */
                               2099                 :                : 
                               2100                 :                :         /*
                               2101                 :                :          * If online backup is not in progress, mark the header to indicate
                               2102                 :                :          * that WAL records beginning in this page have removable backup
                               2103                 :                :          * blocks.  This allows the WAL archiver to know whether it is safe to
                               2104                 :                :          * compress archived WAL data by transforming full-block records into
                               2105                 :                :          * the non-full-block format.  It is sufficient to record this at the
                               2106                 :                :          * page level because we force a page switch (in fact a segment
                               2107                 :                :          * switch) when starting a backup, so the flag will be off before any
                               2108                 :                :          * records can be written during the backup.  At the end of a backup,
                               2109                 :                :          * the last page will be marked as all unsafe when perhaps only part
                               2110                 :                :          * is unsafe, but at worst the archiver would miss the opportunity to
                               2111                 :                :          * compress a few records.
                               2112                 :                :          */
  543 alvherre@alvh.no-ip.     2113         [ +  + ]:         928019 :         if (Insert->runningBackups == 0)
 3249 bruce@momjian.us         2114                 :         789859 :             NewPage->xlp_info |= XLP_BKP_REMOVABLE;
                               2115                 :                : 
                               2116                 :                :         /*
                               2117                 :                :          * If first page of an XLOG segment file, make it a long header.
                               2118                 :                :          */
 2399 andres@anarazel.de       2119         [ +  + ]:         928019 :         if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
                               2120                 :                :         {
 3933 heikki.linnakangas@i     2121                 :            794 :             XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
                               2122                 :                : 
                               2123                 :            794 :             NewLongPage->xlp_sysid = ControlFile->system_identifier;
 2399 andres@anarazel.de       2124                 :            794 :             NewLongPage->xlp_seg_size = wal_segment_size;
 3933 heikki.linnakangas@i     2125                 :            794 :             NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
 3249 bruce@momjian.us         2126                 :            794 :             NewPage->xlp_info |= XLP_LONG_HEADER;
                               2127                 :                :         }
                               2128                 :                : 
                               2129                 :                :         /*
                               2130                 :                :          * Make sure the initialization of the page becomes visible to others
                               2131                 :                :          * before the xlblocks update. GetXLogBuffer() reads xlblocks without
                               2132                 :                :          * holding a lock.
                               2133                 :                :          */
 3933 heikki.linnakangas@i     2134                 :         928019 :         pg_write_barrier();
                               2135                 :                : 
  117 jdavis@postgresql.or     2136                 :GNC      928019 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
 3924 heikki.linnakangas@i     2137                 :CBC      928019 :         XLogCtl->InitializedUpTo = NewPageEndPtr;
                               2138                 :                : 
 3933                          2139                 :         928019 :         npages++;
                               2140                 :                :     }
                               2141                 :         687587 :     LWLockRelease(WALBufMappingLock);
                               2142                 :                : 
                               2143                 :                : #ifdef WAL_DEBUG
                               2144                 :                :     if (XLOG_DEBUG && npages > 0)
                               2145                 :                :     {
                               2146                 :                :         elog(DEBUG1, "initialized %d pages, up to %X/%X",
                               2147                 :                :              npages, LSN_FORMAT_ARGS(NewPageEndPtr));
                               2148                 :                :     }
                               2149                 :                : #endif
 8966 vadim4o@yahoo.com        2150                 :         687587 : }
                               2151                 :                : 
                               2152                 :                : /*
                               2153                 :                :  * Calculate CheckPointSegments based on max_wal_size_mb and
                               2154                 :                :  * checkpoint_completion_target.
                               2155                 :                :  */
                               2156                 :                : static void
 3338 heikki.linnakangas@i     2157                 :           6957 : CalculateCheckpointSegments(void)
                               2158                 :                : {
                               2159                 :                :     double      target;
                               2160                 :                : 
                               2161                 :                :     /*-------
                               2162                 :                :      * Calculate the distance at which to trigger a checkpoint, to avoid
                               2163                 :                :      * exceeding max_wal_size_mb. This is based on two assumptions:
                               2164                 :                :      *
                               2165                 :                :      * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
                               2166                 :                :      *    WAL for two checkpoint cycles to allow us to recover from the
                               2167                 :                :      *    secondary checkpoint if the first checkpoint failed, though we
                               2168                 :                :      *    only did this on the primary anyway, not on standby. Keeping just
                               2169                 :                :      *    one checkpoint simplifies processing and reduces disk space in
                               2170                 :                :      *    many smaller databases.)
                               2171                 :                :      * b) during checkpoint, we consume checkpoint_completion_target *
                               2172                 :                :      *    number of segments consumed between checkpoints.
                               2173                 :                :      *-------
                               2174                 :                :      */
 2399 andres@anarazel.de       2175                 :           6957 :     target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
 2350 simon@2ndQuadrant.co     2176                 :           6957 :         (1.0 + CheckPointCompletionTarget);
                               2177                 :                : 
                               2178                 :                :     /* round down */
 3338 heikki.linnakangas@i     2179                 :           6957 :     CheckPointSegments = (int) target;
                               2180                 :                : 
                               2181         [ +  + ]:           6957 :     if (CheckPointSegments < 1)
                               2182                 :              9 :         CheckPointSegments = 1;
                               2183                 :           6957 : }
                               2184                 :                : 
                               2185                 :                : void
                               2186                 :           5203 : assign_max_wal_size(int newval, void *extra)
                               2187                 :                : {
 2567 simon@2ndQuadrant.co     2188                 :           5203 :     max_wal_size_mb = newval;
 3338 heikki.linnakangas@i     2189                 :           5203 :     CalculateCheckpointSegments();
                               2190                 :           5203 : }
                               2191                 :                : 
                               2192                 :                : void
                               2193                 :            928 : assign_checkpoint_completion_target(double newval, void *extra)
                               2194                 :                : {
                               2195                 :            928 :     CheckPointCompletionTarget = newval;
                               2196                 :            928 :     CalculateCheckpointSegments();
                               2197                 :            928 : }
                               2198                 :                : 
                               2199                 :                : bool
  230 peter@eisentraut.org     2200                 :GNC        1794 : check_wal_segment_size(int *newval, void **extra, GucSource source)
                               2201                 :                : {
                               2202   [ +  -  +  -  :           1794 :     if (!IsValidWalSegSize(*newval))
                                        +  -  -  + ]
                               2203                 :                :     {
  230 peter@eisentraut.org     2204                 :UNC           0 :         GUC_check_errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
                               2205                 :              0 :         return false;
                               2206                 :                :     }
                               2207                 :                : 
  230 peter@eisentraut.org     2208                 :GNC        1794 :     return true;
                               2209                 :                : }
                               2210                 :                : 
                               2211                 :                : /*
                               2212                 :                :  * GUC check_hook for max_slot_wal_keep_size
                               2213                 :                :  *
                               2214                 :                :  * During binary upgrade, max_slot_wal_keep_size must be set to -1; any other
                               2215                 :                :  * value is rejected. See start_postmaster() in pg_upgrade for more details.
                               2216                 :                :  */
                               2217                 :                : bool
  156 akapila@postgresql.o     2218                 :           1032 : check_max_slot_wal_keep_size(int *newval, void **extra, GucSource source)
                               2219                 :                : {
                               2220   [ +  +  -  + ]:           1032 :     if (IsBinaryUpgrade && *newval != -1)
                               2221                 :                :     {
  156 akapila@postgresql.o     2222                 :UNC           0 :         GUC_check_errdetail("\"%s\" must be set to -1 during binary upgrade mode.",
                               2223                 :                :                             "max_slot_wal_keep_size");
                               2224                 :              0 :         return false;
                               2225                 :                :     }
                               2226                 :                : 
  156 akapila@postgresql.o     2227                 :GNC        1032 :     return true;
                               2228                 :                : }
                               2229                 :                : 
                               2230                 :                : /*
                               2231                 :                :  * At a checkpoint, how many WAL segments to recycle as preallocated future
                               2232                 :                :  * XLOG segments? Returns the highest segment that should be preallocated.
                               2233                 :                :  */
                               2234                 :                : static XLogSegNo
 1579 michael@paquier.xyz      2235                 :CBC        1148 : XLOGfileslop(XLogRecPtr lastredoptr)
                               2236                 :                : {
                               2237                 :                :     XLogSegNo   minSegNo;
                               2238                 :                :     XLogSegNo   maxSegNo;
                               2239                 :                :     double      distance;
                               2240                 :                :     XLogSegNo   recycleSegNo;
                               2241                 :                : 
                               2242                 :                :     /*
                               2243                 :                :      * Calculate the segment numbers that min_wal_size_mb and max_wal_size_mb
                               2244                 :                :      * correspond to. Always recycle enough segments to meet the minimum, and
                               2245                 :                :      * remove enough segments to stay below the maximum.
                               2246                 :                :      */
                               2247                 :           1148 :     minSegNo = lastredoptr / wal_segment_size +
 2399 andres@anarazel.de       2248                 :           1148 :         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
 1579 michael@paquier.xyz      2249                 :           1148 :     maxSegNo = lastredoptr / wal_segment_size +
 2399 andres@anarazel.de       2250                 :           1148 :         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
                               2251                 :                : 
                               2252                 :                :     /*
                               2253                 :                :      * Between those limits, recycle enough segments to get us through to the
                               2254                 :                :      * estimated end of next checkpoint.
                               2255                 :                :      *
                               2256                 :                :      * To estimate where the next checkpoint will finish, assume that the
                               2257                 :                :      * system runs steadily consuming CheckPointDistanceEstimate bytes between
                               2258                 :                :      * every checkpoint.
                               2259                 :                :      */
 2350 simon@2ndQuadrant.co     2260                 :           1148 :     distance = (1.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
                               2261                 :                :     /* add 10% for good measure. */
 3338 heikki.linnakangas@i     2262                 :           1148 :     distance *= 1.10;
                               2263                 :                : 
 1579 michael@paquier.xyz      2264                 :           1148 :     recycleSegNo = (XLogSegNo) ceil(((double) lastredoptr + distance) /
                               2265                 :                :                                     wal_segment_size);
                               2266                 :                : 
 3338 heikki.linnakangas@i     2267         [ +  + ]:           1148 :     if (recycleSegNo < minSegNo)
                               2268                 :            952 :         recycleSegNo = minSegNo;
                               2269         [ +  + ]:           1148 :     if (recycleSegNo > maxSegNo)
                               2270                 :             68 :         recycleSegNo = maxSegNo;
                               2271                 :                : 
                               2272                 :           1148 :     return recycleSegNo;
                               2273                 :                : }
                               2274                 :                : 
                               2275                 :                : /*
                               2276                 :                :  * Check whether we've consumed enough xlog space that a checkpoint is needed.
                               2277                 :                :  *
                               2278                 :                :  * new_segno indicates a log file that has just been filled up (or read
                               2279                 :                :  * during recovery). We measure the distance from RedoRecPtr to new_segno
                               2280                 :                :  * and see if that exceeds CheckPointSegments.
                               2281                 :                :  *
                               2282                 :                :  * Note: it is the caller's responsibility to ensure RedoRecPtr is up to date.
                               2283                 :                :  */
                               2284                 :                : bool
 4312                          2285                 :           1038 : XLogCheckpointNeeded(XLogSegNo new_segno)
                               2286                 :                : {
                               2287                 :                :     XLogSegNo   old_segno;
                               2288                 :                : 
 2399 andres@anarazel.de       2289                 :           1038 :     XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);
                               2290                 :                : 
 4312 heikki.linnakangas@i     2291         [ +  + ]:           1038 :     if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
 6029 tgl@sss.pgh.pa.us        2292                 :            172 :         return true;
                               2293                 :            866 :     return false;
                               2294                 :                : }
                               2295                 :                : 
                               2296                 :                : /*
                               2297                 :                :  * Write and/or fsync the log at least as far as WriteRqst indicates.
                               2298                 :                :  *
                               2299                 :                :  * If flexible == true, we don't have to write as far as WriteRqst, but
                               2300                 :                :  * may stop at any convenient boundary (such as a cache or logfile boundary).
                               2301                 :                :  * This option allows us to avoid uselessly issuing multiple writes when a
                               2302                 :                :  * single one would do.
                               2303                 :                :  *
                               2304                 :                :  * Must be called with WALWriteLock held. WaitXLogInsertionsToFinish(WriteRqst)
                               2305                 :                :  * must be called before grabbing the lock, to make sure the data is ready to
                               2306                 :                :  * write.
                               2307                 :                :  */
                               2308                 :                : static void
  891 rhaas@postgresql.org     2309                 :         638833 : XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                               2310                 :                : {
                               2311                 :                :     bool        ispartialpage;
                               2312                 :                :     bool        last_iteration;
                               2313                 :                :     bool        finishing_seg;
                               2314                 :                :     int         curridx;
                               2315                 :                :     int         npages;
                               2316                 :                :     int         startidx;
                               2317                 :                :     uint32      startoffset;
                               2318                 :                : 
                               2319                 :                :     /* We should always be inside a critical section here */
 6939 tgl@sss.pgh.pa.us        2320         [ -  + ]:         638833 :     Assert(CritSectionCount > 0);
                               2321                 :                : 
                               2322                 :                :     /*
                               2323                 :                :      * Update local LogwrtResult (caller probably did this already, but...)
                               2324                 :                :      */
   11 alvherre@alvh.no-ip.     2325                 :GNC      638833 :     RefreshXLogWriteResult(LogwrtResult);
                               2326                 :                : 
                               2327                 :                :     /*
                               2328                 :                :      * Since successive pages in the xlog cache are consecutively allocated,
                               2329                 :                :      * we can usually gather multiple pages together and issue just one
                               2330                 :                :      * write() call.  npages is the number of pages we have determined can be
                               2331                 :                :      * written together; startidx is the cache block index of the first one,
                               2332                 :                :      * and startoffset is the file offset at which it should go. The latter
                               2333                 :                :      * two variables are only valid when npages > 0, but we must initialize
                               2334                 :                :      * all of them to keep the compiler quiet.
                               2335                 :                :      */
 6810 tgl@sss.pgh.pa.us        2336                 :CBC      638833 :     npages = 0;
                               2337                 :         638833 :     startidx = 0;
                               2338                 :         638833 :     startoffset = 0;
                               2339                 :                : 
                               2340                 :                :     /*
                               2341                 :                :      * Within the loop, curridx is the cache block index of the page to
                               2342                 :                :      * consider writing.  Begin at the buffer containing the next unwritten
                               2343                 :                :      * page, or last partially written page.
                               2344                 :                :      */
 3924 heikki.linnakangas@i     2345                 :         638833 :     curridx = XLogRecPtrToBufIdx(LogwrtResult.Write);
                               2346                 :                : 
 4125 alvherre@alvh.no-ip.     2347         [ +  + ]:        1483647 :     while (LogwrtResult.Write < WriteRqst.Write)
                               2348                 :                :     {
                               2349                 :                :         /*
                               2350                 :                :          * Make sure we're not ahead of the insert process.  This could happen
                               2351                 :                :          * if we're passed a bogus WriteRqst.Write that is past the end of the
                               2352                 :                :          * last page that's been initialized by AdvanceXLInsertBuffer.
                               2353                 :                :          */
  117 jdavis@postgresql.or     2354                 :GNC      958716 :         XLogRecPtr  EndPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[curridx]);
                               2355                 :                : 
 3933 heikki.linnakangas@i     2356         [ -  + ]:CBC      958716 :         if (LogwrtResult.Write >= EndPtr)
 7573 tgl@sss.pgh.pa.us        2357         [ #  # ]:UBC           0 :             elog(PANIC, "xlog write request %X/%X is past end of log %X/%X",
                               2358                 :                :                  LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2359                 :                :                  LSN_FORMAT_ARGS(EndPtr));
                               2360                 :                : 
                               2361                 :                :         /* Advance LogwrtResult.Write to end of current buffer page */
 3933 heikki.linnakangas@i     2362                 :CBC      958716 :         LogwrtResult.Write = EndPtr;
 4125 alvherre@alvh.no-ip.     2363                 :         958716 :         ispartialpage = WriteRqst.Write < LogwrtResult.Write;
                               2364                 :                : 
 2399 andres@anarazel.de       2365         [ +  + ]:         958716 :         if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2366                 :                :                              wal_segment_size))
                               2367                 :                :         {
                               2368                 :                :             /*
                               2369                 :                :              * Switch to new logfile segment.  We cannot have any pending
                               2370                 :                :              * pages here (since we dump what we have at segment end).
                               2371                 :                :              */
 6810 tgl@sss.pgh.pa.us        2372         [ -  + ]:           8192 :             Assert(npages == 0);
 8433                          2373         [ +  + ]:           8192 :             if (openLogFile >= 0)
 6513 bruce@momjian.us         2374                 :           1885 :                 XLogFileClose();
 2399 andres@anarazel.de       2375                 :           8192 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2376                 :                :                             wal_segment_size);
  891 rhaas@postgresql.org     2377                 :           8192 :             openLogTLI = tli;
                               2378                 :                : 
                               2379                 :                :             /* create/use new log file */
                               2380                 :           8192 :             openLogFile = XLogFileInit(openLogSegNo, tli);
 1511 tgl@sss.pgh.pa.us        2381                 :           8192 :             ReserveExternalFD();
                               2382                 :                :         }
                               2383                 :                : 
                               2384                 :                :         /* Make sure we have the current logfile open */
 8433                          2385         [ -  + ]:         958716 :         if (openLogFile < 0)
                               2386                 :                :         {
 2399 andres@anarazel.de       2387                 :UBC           0 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2388                 :                :                             wal_segment_size);
  891 rhaas@postgresql.org     2389                 :              0 :             openLogTLI = tli;
                               2390                 :              0 :             openLogFile = XLogFileOpen(openLogSegNo, tli);
 1511 tgl@sss.pgh.pa.us        2391                 :              0 :             ReserveExternalFD();
                               2392                 :                :         }
                               2393                 :                : 
                               2394                 :                :         /* Add current page to the set of pending pages-to-dump */
 6810 tgl@sss.pgh.pa.us        2395         [ +  + ]:CBC      958716 :         if (npages == 0)
                               2396                 :                :         {
                               2397                 :                :             /* first of group */
                               2398                 :         643833 :             startidx = curridx;
 2399 andres@anarazel.de       2399                 :         643833 :             startoffset = XLogSegmentOffset(LogwrtResult.Write - XLOG_BLCKSZ,
                               2400                 :                :                                             wal_segment_size);
                               2401                 :                :         }
 6810 tgl@sss.pgh.pa.us        2402                 :         958716 :         npages++;
                               2403                 :                : 
                               2404                 :                :         /*
                               2405                 :                :          * Dump the set if this will be the last loop iteration, or if we are
                               2406                 :                :          * at the last page of the cache area (since the next page won't be
                               2407                 :                :          * contiguous in memory), or if we are at the end of the logfile
                               2408                 :                :          * segment.
                               2409                 :                :          */
 4125 alvherre@alvh.no-ip.     2410                 :         958716 :         last_iteration = WriteRqst.Write <= LogwrtResult.Write;
                               2411                 :                : 
 6810 tgl@sss.pgh.pa.us        2412         [ +  + ]:        1806439 :         finishing_seg = !ispartialpage &&
 2399 andres@anarazel.de       2413         [ +  + ]:         847723 :             (startoffset + npages * XLOG_BLCKSZ) >= wal_segment_size;
                               2414                 :                : 
 6461 tgl@sss.pgh.pa.us        2415         [ +  + ]:         958716 :         if (last_iteration ||
 6810                          2416   [ +  +  -  + ]:         320367 :             curridx == XLogCtl->XLogCacheBlck ||
                               2417                 :                :             finishing_seg)
                               2418                 :                :         {
                               2419                 :                :             char       *from;
                               2420                 :                :             Size        nbytes;
                               2421                 :                :             Size        nleft;
                               2422                 :                :             ssize_t     written;
                               2423                 :                :             instr_time  start;
                               2424                 :                : 
                               2425                 :                :             /* OK to write the page(s) */
 6586                          2426                 :         643833 :             from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
                               2427                 :         643833 :             nbytes = npages * (Size) XLOG_BLCKSZ;
 3940 heikki.linnakangas@i     2428                 :         643833 :             nleft = nbytes;
                               2429                 :                :             do
                               2430                 :                :             {
                               2431                 :         643833 :                 errno = 0;
                               2432                 :                : 
                               2433                 :                :                 /* Measure I/O timing to write WAL data */
 1132 fujii@postgresql.org     2434         [ -  + ]:         643833 :                 if (track_wal_io_timing)
 1132 fujii@postgresql.org     2435                 :UBC           0 :                     INSTR_TIME_SET_CURRENT(start);
                               2436                 :                :                 else
  450 andres@anarazel.de       2437                 :CBC      643833 :                     INSTR_TIME_SET_ZERO(start);
                               2438                 :                : 
 2584 rhaas@postgresql.org     2439                 :         643833 :                 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
  563 tmunro@postgresql.or     2440                 :         643833 :                 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 2584 rhaas@postgresql.org     2441                 :         643833 :                 pgstat_report_wait_end();
                               2442                 :                : 
                               2443                 :                :                 /*
                               2444                 :                :                  * Increment the I/O timing and the number of times WAL data
                               2445                 :                :                  * were written out to disk.
                               2446                 :                :                  */
 1132 fujii@postgresql.org     2447         [ -  + ]:         643833 :                 if (track_wal_io_timing)
                               2448                 :                :                 {
                               2449                 :                :                     instr_time  end;
                               2450                 :                : 
  212 dgustafsson@postgres     2451                 :UNC           0 :                     INSTR_TIME_SET_CURRENT(end);
                               2452                 :              0 :                     INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
                               2453                 :                :                 }
                               2454                 :                : 
  739 andres@anarazel.de       2455                 :CBC      643833 :                 PendingWalStats.wal_write++;
                               2456                 :                : 
 3940 heikki.linnakangas@i     2457         [ -  + ]:         643833 :                 if (written <= 0)
                               2458                 :                :                 {
                               2459                 :                :                     char        xlogfname[MAXFNAMELEN];
                               2460                 :                :                     int         save_errno;
                               2461                 :                : 
 3940 heikki.linnakangas@i     2462         [ #  # ]:UBC           0 :                     if (errno == EINTR)
                               2463                 :              0 :                         continue;
                               2464                 :                : 
 1594 michael@paquier.xyz      2465                 :              0 :                     save_errno = errno;
  891 rhaas@postgresql.org     2466                 :              0 :                     XLogFileName(xlogfname, tli, openLogSegNo,
                               2467                 :                :                                  wal_segment_size);
 1594 michael@paquier.xyz      2468                 :              0 :                     errno = save_errno;
 3940 heikki.linnakangas@i     2469         [ #  # ]:              0 :                     ereport(PANIC,
                               2470                 :                :                             (errcode_for_file_access(),
                               2471                 :                :                              errmsg("could not write to log file \"%s\" at offset %u, length %zu: %m",
                               2472                 :                :                                     xlogfname, startoffset, nleft)));
                               2473                 :                :                 }
 3940 heikki.linnakangas@i     2474                 :CBC      643833 :                 nleft -= written;
                               2475                 :         643833 :                 from += written;
 1985 tmunro@postgresql.or     2476                 :         643833 :                 startoffset += written;
 3940 heikki.linnakangas@i     2477         [ -  + ]:         643833 :             } while (nleft > 0);
                               2478                 :                : 
 6810 tgl@sss.pgh.pa.us        2479                 :         643833 :             npages = 0;
                               2480                 :                : 
                               2481                 :                :             /*
                               2482                 :                :              * If we just wrote the whole last page of a logfile segment,
                               2483                 :                :              * fsync the segment immediately.  This avoids having to go back
                               2484                 :                :              * and re-open prior segments when an fsync request comes along
                               2485                 :                :              * later. Doing it here ensures that one and only one backend will
                               2486                 :                :              * perform this fsync.
                               2487                 :                :              *
                               2488                 :                :              * This is also the right place to notify the Archiver that the
                               2489                 :                :              * segment is ready to copy to archival storage, and to update the
                               2490                 :                :              * timer for archive_timeout, and to signal for a checkpoint if
                               2491                 :                :              * too many logfile segments have been used since the last
                               2492                 :                :              * checkpoint.
                               2493                 :                :              */
 3933 heikki.linnakangas@i     2494         [ +  + ]:         643833 :             if (finishing_seg)
                               2495                 :                :             {
  891 rhaas@postgresql.org     2496                 :            781 :                 issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2497                 :                : 
                               2498                 :                :                 /* signal that we need to wakeup walsenders later */
 4304                          2499                 :            781 :                 WalSndWakeupRequest();
                               2500                 :                : 
 2489 tgl@sss.pgh.pa.us        2501                 :            781 :                 LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */
                               2502                 :                : 
 6810                          2503   [ +  +  -  +  :            781 :                 if (XLogArchivingActive())
                                              +  + ]
  891 rhaas@postgresql.org     2504                 :             64 :                     XLogArchiveNotifySeg(openLogSegNo, tli);
                               2505                 :                : 
 3924 heikki.linnakangas@i     2506                 :            781 :                 XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 2670 andres@anarazel.de       2507                 :            781 :                 XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;
                               2508                 :                : 
                               2509                 :                :                 /*
                               2510                 :                :                  * Request a checkpoint if we've consumed too much xlog since
                               2511                 :                :                  * the last one.  For speed, we first check using the local
                               2512                 :                :                  * copy of RedoRecPtr, which might be out of date; if it looks
                               2513                 :                :                  * like a checkpoint is needed, forcibly update RedoRecPtr and
                               2514                 :                :                  * recheck.
                               2515                 :                :                  */
 4312 heikki.linnakangas@i     2516   [ +  +  +  + ]:            781 :                 if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
                               2517                 :                :                 {
 6029 tgl@sss.pgh.pa.us        2518                 :             71 :                     (void) GetRedoRecPtr();
 4312 heikki.linnakangas@i     2519         [ +  + ]:             71 :                     if (XLogCheckpointNeeded(openLogSegNo))
 6133 tgl@sss.pgh.pa.us        2520                 :             36 :                         RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
                               2521                 :                :                 }
                               2522                 :                :             }
                               2523                 :                :         }
                               2524                 :                : 
 8433                          2525         [ +  + ]:         958716 :         if (ispartialpage)
                               2526                 :                :         {
                               2527                 :                :             /* Only asked to write a partial page */
                               2528                 :         110993 :             LogwrtResult.Write = WriteRqst.Write;
                               2529                 :         110993 :             break;
                               2530                 :                :         }
 6810                          2531         [ +  + ]:         847723 :         curridx = NextBufIdx(curridx);
                               2532                 :                : 
                               2533                 :                :         /* If flexible, break out of loop as soon as we wrote something */
                               2534   [ +  +  +  + ]:         847723 :         if (flexible && npages == 0)
                               2535                 :           2909 :             break;
                               2536                 :                :     }
                               2537                 :                : 
                               2538         [ -  + ]:         638833 :     Assert(npages == 0);
                               2539                 :                : 
                               2540                 :                :     /*
                               2541                 :                :      * If asked to flush, do so
                               2542                 :                :      */
 4125 alvherre@alvh.no-ip.     2543         [ +  + ]:         638833 :     if (LogwrtResult.Flush < WriteRqst.Flush &&
                               2544         [ +  + ]:         114059 :         LogwrtResult.Flush < LogwrtResult.Write)
                               2545                 :                :     {
                               2546                 :                :         /*
                               2547                 :                :          * Could get here without iterating above loop, in which case we might
                               2548                 :                :          * have no open file or the wrong one.  However, we do not need to
                               2549                 :                :          * fsync more than one file.
                               2550                 :                :          */
  184 nathan@postgresql.or     2551         [ +  - ]:GNC      113987 :         if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
                               2552         [ +  - ]:         113987 :             wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
                               2553                 :                :         {
 8430 tgl@sss.pgh.pa.us        2554         [ +  + ]:CBC      113987 :             if (openLogFile >= 0 &&
 2399 andres@anarazel.de       2555         [ -  + ]:         113974 :                 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2556                 :                :                                  wal_segment_size))
 6513 bruce@momjian.us         2557                 :UBC           0 :                 XLogFileClose();
 8430 tgl@sss.pgh.pa.us        2558         [ +  + ]:CBC      113987 :             if (openLogFile < 0)
                               2559                 :                :             {
 2399 andres@anarazel.de       2560                 :             13 :                 XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2561                 :                :                                 wal_segment_size);
  891 rhaas@postgresql.org     2562                 :             13 :                 openLogTLI = tli;
                               2563                 :             13 :                 openLogFile = XLogFileOpen(openLogSegNo, tli);
 1511 tgl@sss.pgh.pa.us        2564                 :             13 :                 ReserveExternalFD();
                               2565                 :                :             }
                               2566                 :                : 
  891 rhaas@postgresql.org     2567                 :         113987 :             issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2568                 :                :         }
                               2569                 :                : 
                               2570                 :                :         /* signal that we need to wakeup walsenders later */
 4304                          2571                 :         113987 :         WalSndWakeupRequest();
                               2572                 :                : 
 8433 tgl@sss.pgh.pa.us        2573                 :         113987 :         LogwrtResult.Flush = LogwrtResult.Write;
                               2574                 :                :     }
                               2575                 :                : 
                               2576                 :                :     /*
                               2577                 :                :      * Update shared-memory status
                               2578                 :                :      *
                               2579                 :                :      * We make sure that the shared 'request' values do not fall behind the
                               2580                 :                :      * 'result' values.  This is not absolutely essential, but it saves some
                               2581                 :                :      * code in a couple of places.
                               2582                 :                :      */
    9 alvherre@alvh.no-ip.     2583         [ +  + ]:GNC      638833 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2584         [ +  + ]:         638833 :     if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
                               2585                 :         105859 :         XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
                               2586         [ +  + ]:         638833 :     if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
                               2587                 :         114446 :         XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
                               2588                 :         638833 :     SpinLockRelease(&XLogCtl->info_lck);
                               2589                 :                : 
                               2590                 :                :     /*
                               2591                 :                :      * We write Write first, issue a write barrier, then write Flush.  Readers
                               2592                 :                :      * must do the opposite (with a matching read barrier in between), so that
                               2593                 :                :      * they always see a Flush value that trails behind the Write value seen.
                               2594                 :                :      */
                               2595                 :         638833 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
                               2596                 :         638833 :     pg_write_barrier();
                               2597                 :         638833 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
                               2598                 :                : 
                               2599                 :                : #ifdef USE_ASSERT_CHECKING
                               2600                 :                :     {
                               2601                 :                :         XLogRecPtr  Flush;
                               2602                 :                :         XLogRecPtr  Write;
                               2603                 :                :         XLogRecPtr  Insert;
                               2604                 :                : 
                               2605                 :         638833 :         Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult);
                               2606                 :         638833 :         pg_read_barrier();
                               2607                 :         638833 :         Write = pg_atomic_read_u64(&XLogCtl->logWriteResult);
    7                          2608                 :         638833 :         pg_read_barrier();
                               2609                 :         638833 :         Insert = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               2610                 :                : 
                               2611                 :                :         /* WAL written to disk is always at or ahead of WAL flushed */
    9                          2612         [ -  + ]:         638833 :         Assert(Write >= Flush);
                               2613                 :                : 
                               2614                 :                :         /* WAL inserted into buffers is always at or ahead of WAL written */
    7                          2615         [ -  + ]:         638833 :         Assert(Insert >= Write);
                               2616                 :                :     }
                               2617                 :                : #endif
 8433 tgl@sss.pgh.pa.us        2618                 :CBC      638833 : }
                               2619                 :                : 
                               2620                 :                : /*
                               2621                 :                :  * Record the LSN for an asynchronous transaction commit/abort
                               2622                 :                :  * and nudge the WALWriter if there is work for it to do.
                               2623                 :                :  * (This should not be called for synchronous commits.)
                               2624                 :                :  */
                               2625                 :                : void
 5008 simon@2ndQuadrant.co     2626                 :          40519 : XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)
                               2627                 :                : {
 4536                          2628                 :          40519 :     XLogRecPtr  WriteRqstPtr = asyncXactLSN;
                               2629                 :                :     bool        sleeping;
  139 heikki.linnakangas@i     2630                 :GNC       40519 :     bool        wakeup = false;
                               2631                 :                :     XLogRecPtr  prevAsyncXactLSN;
                               2632                 :                : 
 3492 andres@anarazel.de       2633         [ +  + ]:CBC       40519 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2634                 :          40519 :     sleeping = XLogCtl->WalWriterSleeping;
  139 heikki.linnakangas@i     2635                 :GNC       40519 :     prevAsyncXactLSN = XLogCtl->asyncXactLSN;
 3492 andres@anarazel.de       2636         [ +  + ]:CBC       40519 :     if (XLogCtl->asyncXactLSN < asyncXactLSN)
                               2637                 :          40136 :         XLogCtl->asyncXactLSN = asyncXactLSN;
                               2638                 :          40519 :     SpinLockRelease(&XLogCtl->info_lck);
                               2639                 :                : 
                               2640                 :                :     /*
                               2641                 :                :      * If somebody else already called this function with a more aggressive
                               2642                 :                :      * LSN, they will have done what we needed (and perhaps more).
                               2643                 :                :      */
  139 heikki.linnakangas@i     2644         [ +  + ]:GNC       40519 :     if (asyncXactLSN <= prevAsyncXactLSN)
                               2645                 :            383 :         return;
                               2646                 :                : 
                               2647                 :                :     /*
                               2648                 :                :      * If the WALWriter is sleeping, kick it to make it come out of low-power
                               2649                 :                :      * mode, so that this async commit will reach disk within the expected
                               2650                 :                :      * amount of time.  Otherwise, determine whether it has enough WAL
                               2651                 :                :      * available to flush, the same way that XLogBackgroundFlush() does.
                               2652                 :                :      */
                               2653         [ +  + ]:          40136 :     if (sleeping)
                               2654                 :             24 :         wakeup = true;
                               2655                 :                :     else
                               2656                 :                :     {
                               2657                 :                :         int         flushblocks;
                               2658                 :                : 
    9 alvherre@alvh.no-ip.     2659                 :          40112 :         RefreshXLogWriteResult(LogwrtResult);
                               2660                 :                : 
  139 heikki.linnakangas@i     2661                 :          40112 :         flushblocks =
                               2662                 :          40112 :             WriteRqstPtr / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               2663                 :                : 
                               2664   [ +  -  +  + ]:          40112 :         if (WalWriterFlushAfter == 0 || flushblocks >= WalWriterFlushAfter)
                               2665                 :             88 :             wakeup = true;
                               2666                 :                :     }
                               2667                 :                : 
                               2668   [ +  +  +  + ]:          40136 :     if (wakeup && ProcGlobal->walwriterLatch)
 4359 tgl@sss.pgh.pa.us        2669                 :CBC          27 :         SetLatch(ProcGlobal->walwriterLatch);
                               2670                 :                : }
                               2671                 :                : 
                               2672                 :                : /*
                               2673                 :                :  * Record the LSN up to which we can remove WAL because it's not required by
                               2674                 :                :  * any replication slot.
                               2675                 :                :  */
                               2676                 :                : void
 3726 rhaas@postgresql.org     2677                 :          22907 : XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn)
                               2678                 :                : {
 3492 andres@anarazel.de       2679         [ +  + ]:          22907 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2680                 :          22907 :     XLogCtl->replicationSlotMinLSN = lsn;
                               2681                 :          22907 :     SpinLockRelease(&XLogCtl->info_lck);
 3726 rhaas@postgresql.org     2682                 :          22907 : }
                               2683                 :                : 
                               2684                 :                : 
                               2685                 :                : /*
                               2686                 :                :  * Return the oldest LSN we must retain to satisfy the needs of some
                               2687                 :                :  * replication slot.
                               2688                 :                :  */
                               2689                 :                : static XLogRecPtr
                               2690                 :           1566 : XLogGetReplicationSlotMinimumLSN(void)
                               2691                 :                : {
                               2692                 :                :     XLogRecPtr  retval;
                               2693                 :                : 
 3492 andres@anarazel.de       2694         [ +  + ]:           1566 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2695                 :           1566 :     retval = XLogCtl->replicationSlotMinLSN;
                               2696                 :           1566 :     SpinLockRelease(&XLogCtl->info_lck);
                               2697                 :                : 
 3726 rhaas@postgresql.org     2698                 :           1566 :     return retval;
                               2699                 :                : }
                               2700                 :                : 
                               2701                 :                : /*
                               2702                 :                :  * Advance minRecoveryPoint in control file.
                               2703                 :                :  *
                               2704                 :                :  * If we crash during recovery, we must reach this point again before the
                               2705                 :                :  * database is consistent.
                               2706                 :                :  *
                               2707                 :                :  * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint
                               2708                 :                :  * is only updated if it's not already greater than or equal to 'lsn'.
                               2709                 :                :  */
                               2710                 :                : static void
 5534 heikki.linnakangas@i     2711                 :         120964 : UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
                               2712                 :                : {
                               2713                 :                :     /* Quick check using our local copy of the variable */
  788                          2714   [ +  +  +  +  :         120964 :     if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
                                              +  + ]
 5534                          2715                 :         113152 :         return;
                               2716                 :                : 
                               2717                 :                :     /*
                               2718                 :                :      * An invalid minRecoveryPoint means that we need to recover all the WAL,
                               2719                 :                :      * i.e., we're doing crash recovery.  We never modify the control file's
                               2720                 :                :      * value in that case, so we can short-circuit future checks here too. The
                               2721                 :                :      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
                               2722                 :                :      * updated until crash recovery finishes.  We only do this for the startup
                               2723                 :                :      * process as it should not update its own reference of minRecoveryPoint
                               2724                 :                :      * until it has finished crash recovery to make sure that all WAL
                               2725                 :                :      * available is replayed in this case.  This also saves the startup
                               2726                 :                :      * process from taking extra locks on the control file.
                               2727                 :                :      */
  788                          2728   [ +  +  +  + ]:           7812 :     if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
                               2729                 :                :     {
 2110 michael@paquier.xyz      2730                 :             29 :         updateMinRecoveryPoint = false;
                               2731                 :             29 :         return;
                               2732                 :                :     }
                               2733                 :                : 
 5534 heikki.linnakangas@i     2734                 :           7783 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               2735                 :                : 
                               2736                 :                :     /* update local copy */
  788                          2737                 :           7783 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               2738                 :           7783 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               2739                 :                : 
                               2740         [ +  + ]:           7783 :     if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint))
 2053 michael@paquier.xyz      2741                 :              3 :         updateMinRecoveryPoint = false;
  788 heikki.linnakangas@i     2742   [ +  +  +  + ]:           7780 :     else if (force || LocalMinRecoveryPoint < lsn)
                               2743                 :                :     {
                               2744                 :                :         XLogRecPtr  newMinRecoveryPoint;
                               2745                 :                :         TimeLineID  newMinRecoveryPointTLI;
                               2746                 :                : 
                               2747                 :                :         /*
                               2748                 :                :          * To avoid having to update the control file too often, we update it
                               2749                 :                :          * all the way to the last record being replayed, even though 'lsn'
                               2750                 :                :          * would suffice for correctness.  This also allows the 'force' case
                               2751                 :                :          * to not need a valid 'lsn' value.
                               2752                 :                :          *
                               2753                 :                :          * Another important reason for doing it this way is that the passed
                               2754                 :                :          * 'lsn' value could be bogus, i.e., past the end of available WAL, if
                               2755                 :                :          * the caller got it from a corrupted heap page.  Accepting such a
                               2756                 :                :          * value as the min recovery point would prevent us from coming up at
                               2757                 :                :          * all.  Instead, we just log a warning and continue with recovery.
                               2758                 :                :          * (See also the comments about corrupt LSNs in XLogFlush.)
                               2759                 :                :          */
                               2760                 :           7129 :         newMinRecoveryPoint = GetCurrentReplayRecPtr(&newMinRecoveryPointTLI);
 4125 alvherre@alvh.no-ip.     2761   [ +  +  -  + ]:           7129 :         if (!force && newMinRecoveryPoint < lsn)
 5406 tgl@sss.pgh.pa.us        2762         [ #  # ]:UBC           0 :             elog(WARNING,
                               2763                 :                :                  "xlog min recovery request %X/%X is past current point %X/%X",
                               2764                 :                :                  LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(newMinRecoveryPoint));
                               2765                 :                : 
                               2766                 :                :         /* update control file */
 4125 alvherre@alvh.no-ip.     2767         [ +  + ]:CBC        7129 :         if (ControlFile->minRecoveryPoint < newMinRecoveryPoint)
                               2768                 :                :         {
 5534 heikki.linnakangas@i     2769                 :           7106 :             ControlFile->minRecoveryPoint = newMinRecoveryPoint;
 4149                          2770                 :           7106 :             ControlFile->minRecoveryPointTLI = newMinRecoveryPointTLI;
 5534                          2771                 :           7106 :             UpdateControlFile();
  788                          2772                 :           7106 :             LocalMinRecoveryPoint = newMinRecoveryPoint;
                               2773                 :           7106 :             LocalMinRecoveryPointTLI = newMinRecoveryPointTLI;
                               2774                 :                : 
 5534                          2775         [ +  + ]:           7106 :             ereport(DEBUG2,
                               2776                 :                :                     (errmsg_internal("updated min recovery point to %X/%X on timeline %u",
                               2777                 :                :                                      LSN_FORMAT_ARGS(newMinRecoveryPoint),
                               2778                 :                :                                      newMinRecoveryPointTLI)));
                               2779                 :                :         }
                               2780                 :                :     }
                               2781                 :           7783 :     LWLockRelease(ControlFileLock);
                               2782                 :                : }
                               2783                 :                : 
                               2784                 :                : /*
                               2785                 :                :  * Ensure that all XLOG data through the given position is flushed to disk.
                               2786                 :                :  *
                               2787                 :                :  * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
                               2788                 :                :  * already held, and we try to avoid acquiring it if possible.
                               2789                 :                :  */
                               2790                 :                : void
 8433 tgl@sss.pgh.pa.us        2791                 :         640332 : XLogFlush(XLogRecPtr record)
                               2792                 :                : {
                               2793                 :                :     XLogRecPtr  WriteRqstPtr;
                               2794                 :                :     XLogwrtRqst WriteRqst;
  886 rhaas@postgresql.org     2795                 :         640332 :     TimeLineID  insertTLI = XLogCtl->InsertTimeLineID;
                               2796                 :                : 
                               2797                 :                :     /*
                               2798                 :                :      * During REDO, we are reading not writing WAL.  Therefore, instead of
                               2799                 :                :      * trying to flush the WAL, we should update minRecoveryPoint instead. We
                               2800                 :                :      * test XLogInsertAllowed(), not InRecovery, because we need checkpointer
                               2801                 :                :      * to act this way too, and because when it tries to write the
                               2802                 :                :      * end-of-recovery checkpoint, it should indeed flush.
                               2803                 :                :      */
 5406 tgl@sss.pgh.pa.us        2804         [ +  + ]:         640332 :     if (!XLogInsertAllowed())
                               2805                 :                :     {
 5534 heikki.linnakangas@i     2806                 :         120869 :         UpdateMinRecoveryPoint(record, false);
 8433 tgl@sss.pgh.pa.us        2807                 :         516218 :         return;
                               2808                 :                :     }
                               2809                 :                : 
                               2810                 :                :     /* Quick exit if already known flushed */
 4125 alvherre@alvh.no-ip.     2811         [ +  + ]:         519463 :     if (record <= LogwrtResult.Flush)
 8433 tgl@sss.pgh.pa.us        2812                 :         395349 :         return;
                               2813                 :                : 
                               2814                 :                : #ifdef WAL_DEBUG
                               2815                 :                :     if (XLOG_DEBUG)
                               2816                 :                :         elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
                               2817                 :                :              LSN_FORMAT_ARGS(record),
                               2818                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2819                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2820                 :                : #endif
                               2821                 :                : 
                               2822                 :         124114 :     START_CRIT_SECTION();
                               2823                 :                : 
                               2824                 :                :     /*
                               2825                 :                :      * Since fsync is usually a horribly expensive operation, we try to
                               2826                 :                :      * piggyback as much data as we can on each fsync: if we see any more data
                               2827                 :                :      * entered into the xlog buffer, we'll write and fsync that too, so that
                               2828                 :                :      * the final value of LogwrtResult.Flush is as large as possible. This
                               2829                 :                :      * gives us some chance of avoiding another fsync immediately after.
                               2830                 :                :      */
                               2831                 :                : 
                               2832                 :                :     /* initialize to given target; may increase below */
                               2833                 :         124114 :     WriteRqstPtr = record;
                               2834                 :                : 
                               2835                 :                :     /*
                               2836                 :                :      * Now wait until we get the write lock, or someone else does the flush
                               2837                 :                :      * for us.
                               2838                 :                :      */
                               2839                 :                :     for (;;)
 8143                          2840                 :           1003 :     {
                               2841                 :                :         XLogRecPtr  insertpos;
                               2842                 :                : 
                               2843                 :                :         /* done already? */
    9 alvherre@alvh.no-ip.     2844                 :GNC      125117 :         RefreshXLogWriteResult(LogwrtResult);
 4125 alvherre@alvh.no-ip.     2845         [ +  + ]:CBC      125117 :         if (record <= LogwrtResult.Flush)
 4458 heikki.linnakangas@i     2846                 :          13144 :             break;
                               2847                 :                : 
                               2848                 :                :         /*
                               2849                 :                :          * Before actually performing the write, wait for all in-flight
                               2850                 :                :          * insertions to the pages we're about to write to finish.
                               2851                 :                :          */
    9 alvherre@alvh.no-ip.     2852         [ +  + ]:GNC      111973 :         SpinLockAcquire(&XLogCtl->info_lck);
                               2853         [ +  + ]:         111973 :         if (WriteRqstPtr < XLogCtl->LogwrtRqst.Write)
                               2854                 :           3713 :             WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
                               2855                 :         111973 :         SpinLockRelease(&XLogCtl->info_lck);
 3933 heikki.linnakangas@i     2856                 :CBC      111973 :         insertpos = WaitXLogInsertionsToFinish(WriteRqstPtr);
                               2857                 :                : 
                               2858                 :                :         /*
                               2859                 :                :          * Try to get the write lock. If we can't get it immediately, wait
                               2860                 :                :          * until it's released, and recheck if we still need to do the flush
                               2861                 :                :          * or if the backend that held the lock did it for us already. This
                               2862                 :                :          * helps to maintain a good rate of group committing when the system
                               2863                 :                :          * is bottlenecked by the speed of fsyncing.
                               2864                 :                :          */
 4449                          2865         [ +  + ]:         111973 :         if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
                               2866                 :                :         {
                               2867                 :                :             /*
                               2868                 :                :              * The lock is now free, but we didn't acquire it yet. Before we
                               2869                 :                :              * do, loop back to check if someone else flushed the record for
                               2870                 :                :              * us already.
                               2871                 :                :              */
 4458                          2872                 :           1003 :             continue;
                               2873                 :                :         }
                               2874                 :                : 
                               2875                 :                :         /* Got the lock; recheck whether request is satisfied */
   11 alvherre@alvh.no-ip.     2876                 :GNC      110970 :         RefreshXLogWriteResult(LogwrtResult);
 4125 alvherre@alvh.no-ip.     2877         [ +  + ]:CBC      110970 :         if (record <= LogwrtResult.Flush)
                               2878                 :                :         {
 4304 rhaas@postgresql.org     2879                 :            206 :             LWLockRelease(WALWriteLock);
                               2880                 :            206 :             break;
                               2881                 :                :         }
                               2882                 :                : 
                               2883                 :                :         /*
                               2884                 :                :          * Sleep before flush! By adding a delay here, we may give further
                               2885                 :                :          * backends the opportunity to join the backlog of group commit
                               2886                 :                :          * followers; this can significantly improve transaction throughput,
                               2887                 :                :          * at the risk of increasing transaction latency.
                               2888                 :                :          *
                               2889                 :                :          * We do not sleep if enableFsync is not turned on, nor if there are
                               2890                 :                :          * fewer than CommitSiblings other backends with active transactions.
                               2891                 :                :          */
                               2892   [ -  +  -  -  :         110764 :         if (CommitDelay > 0 && enableFsync &&
                                              -  - ]
 4304 rhaas@postgresql.org     2893                 :UBC           0 :             MinimumActiveBackends(CommitSiblings))
                               2894                 :                :         {
                               2895                 :              0 :             pg_usleep(CommitDelay);
                               2896                 :                : 
                               2897                 :                :             /*
                               2898                 :                :              * Re-check how far we can now flush the WAL. It's generally not
                               2899                 :                :              * safe to call WaitXLogInsertionsToFinish while holding
                               2900                 :                :              * WALWriteLock, because an in-progress insertion might need to
                               2901                 :                :              * also grab WALWriteLock to make progress. But we know that all
                               2902                 :                :              * the insertions up to insertpos have already finished, because
                               2903                 :                :              * that's what the earlier WaitXLogInsertionsToFinish() returned.
                               2904                 :                :              * We're only calling it again to allow insertpos to be moved
                               2905                 :                :              * further forward, not to actually wait for anyone.
                               2906                 :                :              */
 3933 heikki.linnakangas@i     2907                 :              0 :             insertpos = WaitXLogInsertionsToFinish(insertpos);
                               2908                 :                :         }
                               2909                 :                : 
                               2910                 :                :         /* try to write/flush later additions to XLOG as well */
 3933 heikki.linnakangas@i     2911                 :CBC      110764 :         WriteRqst.Write = insertpos;
                               2912                 :         110764 :         WriteRqst.Flush = insertpos;
                               2913                 :                : 
  891 rhaas@postgresql.org     2914                 :         110764 :         XLogWrite(WriteRqst, insertTLI, false);
                               2915                 :                : 
 8233 tgl@sss.pgh.pa.us        2916                 :         110764 :         LWLockRelease(WALWriteLock);
                               2917                 :                :         /* done */
 4458 heikki.linnakangas@i     2918                 :         110764 :         break;
                               2919                 :                :     }
                               2920                 :                : 
 8433 tgl@sss.pgh.pa.us        2921         [ -  + ]:         124114 :     END_CRIT_SECTION();
                               2922                 :                : 
                               2923                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  372 andres@anarazel.de       2924                 :         124114 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               2925                 :                : 
                               2926                 :                :     /*
                               2927                 :                :      * If we still haven't flushed to the request point then we have a
                               2928                 :                :      * problem; most likely, the requested flush point is past end of XLOG.
                               2929                 :                :      * This has been seen to occur when a disk page has a corrupted LSN.
                               2930                 :                :      *
                               2931                 :                :      * Formerly we treated this as a PANIC condition, but that hurts the
                               2932                 :                :      * system's robustness rather than helping it: we do not want to take down
                               2933                 :                :      * the whole system due to corruption on one data page.  In particular, if
                               2934                 :                :      * the bad page is encountered again during recovery then we would be
                               2935                 :                :      * unable to restart the database at all!  (This scenario actually
                               2936                 :                :      * happened in the field several times with 7.1 releases.)  As of 8.4, bad
                               2937                 :                :      * LSNs encountered during recovery are UpdateMinRecoveryPoint's problem;
                               2938                 :                :      * the only time we can reach here during recovery is while flushing the
                               2939                 :                :      * end-of-recovery checkpoint record, and we don't expect that to have a
                               2940                 :                :      * bad LSN.
                               2941                 :                :      *
                               2942                 :                :      * Note that for calls from xact.c, the ERROR will be promoted to PANIC
                               2943                 :                :      * since xact.c calls this routine inside a critical section.  However,
                               2944                 :                :      * calls from bufmgr.c are not within critical sections and so we will not
                               2945                 :                :      * force a restart for a bad LSN on a data page.
                               2946                 :                :      */
 4125 alvherre@alvh.no-ip.     2947         [ -  + ]:         124114 :     if (LogwrtResult.Flush < record)
 5406 tgl@sss.pgh.pa.us        2948         [ #  # ]:UBC           0 :         elog(ERROR,
                               2949                 :                :              "xlog flush request %X/%X is not satisfied --- flushed only to %X/%X",
                               2950                 :                :              LSN_FORMAT_ARGS(record),
                               2951                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2952                 :                : }
                               2953                 :                : 
                               2954                 :                : /*
                               2955                 :                :  * Write & flush xlog, but without specifying exactly where to.
                               2956                 :                :  *
                               2957                 :                :  * We normally write only completed blocks; but if there is nothing to do on
                               2958                 :                :  * that basis, we check for unwritten async commits in the current incomplete
                               2959                 :                :  * block, and write through the latest one of those.  Thus, if async commits
                               2960                 :                :  * are not being used, we will write complete blocks only.
                               2961                 :                :  *
                               2962                 :                :  * If, based on the above, there's anything to write we do so immediately. But
                               2963                 :                :  * to avoid calling fsync, fdatasync et al. at a rate that'd impact
                               2964                 :                :  * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there are
                               2965                 :                :  * more than wal_writer_flush_after unflushed blocks.
                               2966                 :                :  *
                               2967                 :                :  * We can guarantee that async commits reach disk after at most three
                               2968                 :                :  * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
                               2969                 :                :  * to write "flexibly", meaning it can stop at the end of the buffer ring;
                               2970                 :                :  * this makes a difference only with very high load or long wal_writer_delay,
                               2971                 :                :  * but imposes one extra cycle for the worst case for async commits.)
                               2972                 :                :  *
                               2973                 :                :  * This routine is invoked periodically by the background walwriter process.
                               2974                 :                :  *
                               2975                 :                :  * Returns true if there was any work to do, even if we skipped flushing due
                               2976                 :                :  * to wal_writer_delay/wal_writer_flush_after.
                               2977                 :                :  */
                               2978                 :                : bool
 6109 tgl@sss.pgh.pa.us        2979                 :CBC       23201 : XLogBackgroundFlush(void)
                               2980                 :                : {
                               2981                 :                :     XLogwrtRqst WriteRqst;
                               2982                 :          23201 :     bool        flexible = true;
                               2983                 :                :     static TimestampTz lastflush;
                               2984                 :                :     TimestampTz now;
                               2985                 :                :     int         flushblocks;
                               2986                 :                :     TimeLineID  insertTLI;
                               2987                 :                : 
                               2988                 :                :     /* XLOG doesn't need flushing during recovery */
 5534 heikki.linnakangas@i     2989         [ +  + ]:          23201 :     if (RecoveryInProgress())
 4359 tgl@sss.pgh.pa.us        2990                 :              8 :         return false;
                               2991                 :                : 
                               2992                 :                :     /*
                               2993                 :                :      * Since we're not in recovery, InsertTimeLineID is set and can't change,
                               2994                 :                :      * so we can read it without a lock.
                               2995                 :                :      */
  886 rhaas@postgresql.org     2996                 :          23193 :     insertTLI = XLogCtl->InsertTimeLineID;
                               2997                 :                : 
                               2998                 :                :     /* read updated LogwrtRqst */
 3492 andres@anarazel.de       2999         [ -  + ]:          23193 :     SpinLockAcquire(&XLogCtl->info_lck);
 2981                          3000                 :          23193 :     WriteRqst = XLogCtl->LogwrtRqst;
 3492                          3001                 :          23193 :     SpinLockRelease(&XLogCtl->info_lck);
                               3002                 :                : 
                               3003                 :                :     /* back off to last completed page boundary */
 2981                          3004                 :          23193 :     WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
                               3005                 :                : 
                               3006                 :                :     /* if we have already flushed that far, consider async commit records */
    9 alvherre@alvh.no-ip.     3007                 :GNC       23193 :     RefreshXLogWriteResult(LogwrtResult);
 2981 andres@anarazel.de       3008         [ +  + ]:CBC       23193 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3009                 :                :     {
 3492                          3010         [ +  + ]:          20259 :         SpinLockAcquire(&XLogCtl->info_lck);
 2981                          3011                 :          20259 :         WriteRqst.Write = XLogCtl->asyncXactLSN;
 3492                          3012                 :          20259 :         SpinLockRelease(&XLogCtl->info_lck);
 6109 tgl@sss.pgh.pa.us        3013                 :          20259 :         flexible = false;       /* ensure it all gets written */
                               3014                 :                :     }
                               3015                 :                : 
                               3016                 :                :     /*
                               3017                 :                :      * If already known flushed, we're done. Just need to check if we are
                               3018                 :                :      * holding an open file handle to a logfile that's no longer in use,
                               3019                 :                :      * preventing the file from being deleted.
                               3020                 :                :      */
 2981 andres@anarazel.de       3021         [ +  + ]:          23193 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3022                 :                :     {
 5031 bruce@momjian.us         3023         [ +  + ]:          19574 :         if (openLogFile >= 0)
                               3024                 :                :         {
 2399 andres@anarazel.de       3025         [ +  + ]:           5871 :             if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               3026                 :                :                                  wal_segment_size))
                               3027                 :                :             {
 5058 magnus@hagander.net      3028                 :            144 :                 XLogFileClose();
                               3029                 :                :             }
                               3030                 :                :         }
 4359 tgl@sss.pgh.pa.us        3031                 :          19574 :         return false;
                               3032                 :                :     }
                               3033                 :                : 
                               3034                 :                :     /*
                               3035                 :                :      * Determine how far to flush WAL, based on the wal_writer_delay and
                               3036                 :                :      * wal_writer_flush_after GUCs.
                               3037                 :                :      *
                               3038                 :                :      * Note that XLogSetAsyncXactLSN() performs a similar calculation based on
                               3039                 :                :      * wal_writer_flush_after, to decide when to wake us up.  Make sure the
                               3040                 :                :      * logic is the same in both places if you change this.
                               3041                 :                :      */
 2981 andres@anarazel.de       3042                 :           3619 :     now = GetCurrentTimestamp();
  139 heikki.linnakangas@i     3043                 :GNC        3619 :     flushblocks =
 2981 andres@anarazel.de       3044                 :CBC        3619 :         WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               3045                 :                : 
                               3046   [ +  -  +  + ]:           3619 :     if (WalWriterFlushAfter == 0 || lastflush == 0)
                               3047                 :                :     {
                               3048                 :                :         /* first call, or block based limits disabled */
                               3049                 :            298 :         WriteRqst.Flush = WriteRqst.Write;
                               3050                 :            298 :         lastflush = now;
                               3051                 :                :     }
                               3052         [ +  + ]:           3321 :     else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
                               3053                 :                :     {
                               3054                 :                :         /*
                               3055                 :                :          * Flush the writes at least every WalWriterDelay ms. This is
                               3056                 :                :          * important to bound the amount of time it takes for an asynchronous
                               3057                 :                :          * commit to hit disk.
                               3058                 :                :          */
                               3059                 :           3312 :         WriteRqst.Flush = WriteRqst.Write;
                               3060                 :           3312 :         lastflush = now;
                               3061                 :                :     }
  139 heikki.linnakangas@i     3062         [ +  + ]:GNC           9 :     else if (flushblocks >= WalWriterFlushAfter)
                               3063                 :                :     {
                               3064                 :                :         /* exceeded wal_writer_flush_after blocks, flush */
 2981 andres@anarazel.de       3065                 :GBC           3 :         WriteRqst.Flush = WriteRqst.Write;
                               3066                 :              3 :         lastflush = now;
                               3067                 :                :     }
                               3068                 :                :     else
                               3069                 :                :     {
                               3070                 :                :         /* no flushing, this time round */
 2981 andres@anarazel.de       3071                 :CBC           6 :         WriteRqst.Flush = 0;
                               3072                 :                :     }
                               3073                 :                : 
                               3074                 :                : #ifdef WAL_DEBUG
                               3075                 :                :     if (XLOG_DEBUG)
                               3076                 :                :         elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, current is write %X/%X; flush %X/%X",
                               3077                 :                :              LSN_FORMAT_ARGS(WriteRqst.Write),
                               3078                 :                :              LSN_FORMAT_ARGS(WriteRqst.Flush),
                               3079                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               3080                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               3081                 :                : #endif
                               3082                 :                : 
 6109 tgl@sss.pgh.pa.us        3083                 :           3619 :     START_CRIT_SECTION();
                               3084                 :                : 
                               3085                 :                :     /* now wait for any in-progress insertions to finish and get write lock */
 2981 andres@anarazel.de       3086                 :           3619 :     WaitXLogInsertionsToFinish(WriteRqst.Write);
 6109 tgl@sss.pgh.pa.us        3087                 :           3619 :     LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
   11 alvherre@alvh.no-ip.     3088                 :GNC        3619 :     RefreshXLogWriteResult(LogwrtResult);
 2981 andres@anarazel.de       3089         [ +  + ]:CBC        3619 :     if (WriteRqst.Write > LogwrtResult.Write ||
                               3090         [ +  + ]:             31 :         WriteRqst.Flush > LogwrtResult.Flush)
                               3091                 :                :     {
  891 rhaas@postgresql.org     3092                 :           3600 :         XLogWrite(WriteRqst, insertTLI, flexible);
                               3093                 :                :     }
 6109 tgl@sss.pgh.pa.us        3094                 :           3619 :     LWLockRelease(WALWriteLock);
                               3095                 :                : 
                               3096         [ -  + ]:           3619 :     END_CRIT_SECTION();
                               3097                 :                : 
                               3098                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  372 andres@anarazel.de       3099                 :           3619 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               3100                 :                : 
                               3101                 :                :     /*
                               3102                 :                :      * Great, done. To take some work off the critical path, try to initialize
                               3103                 :                :      * as many of the no-longer-needed WAL buffers for future use as we can.
                               3104                 :                :      */
  891 rhaas@postgresql.org     3105                 :           3619 :     AdvanceXLInsertBuffer(InvalidXLogRecPtr, insertTLI, true);
                               3106                 :                : 
                               3107                 :                :     /*
                               3108                 :                :      * If we determined that we need to write data, but somebody else
                               3109                 :                :      * wrote/flushed already, it should be considered as being active, to
                               3110                 :                :      * avoid hibernating too early.
                               3111                 :                :      */
 2981 andres@anarazel.de       3112                 :           3619 :     return true;
                               3113                 :                : }
                               3114                 :                : 
                               3115                 :                : /*
                               3116                 :                :  * Test whether XLOG data has been flushed up to (at least) the given position.
                               3117                 :                :  *
                               3118                 :                :  * Returns true if a flush is still needed.  (It may be that someone else
                               3119                 :                :  * is already in process of flushing that far, however.)
                               3120                 :                :  */
                               3121                 :                : bool
 6164 tgl@sss.pgh.pa.us        3122                 :        8107764 : XLogNeedsFlush(XLogRecPtr record)
                               3123                 :                : {
                               3124                 :                :     /*
                               3125                 :                :      * During recovery, we don't flush WAL but update minRecoveryPoint
                               3126                 :                :      * instead. So "needs flush" is taken to mean whether minRecoveryPoint
                               3127                 :                :      * would need to be updated.
                               3128                 :                :      */
 5534 heikki.linnakangas@i     3129         [ +  + ]:        8107764 :     if (RecoveryInProgress())
                               3130                 :                :     {
                               3131                 :                :         /*
                               3132                 :                :          * An invalid minRecoveryPoint means that we need to recover all the
                               3133                 :                :          * WAL, i.e., we're doing crash recovery.  We never modify the control
                               3134                 :                :          * file's value in that case, so we can short-circuit future checks
                               3135                 :                :          * here too.  This triggers a quick exit path for the startup process,
                               3136                 :                :          * which cannot update its local copy of minRecoveryPoint as long as
                               3137                 :                :          * it has not replayed all WAL available when doing crash recovery.
                               3138                 :                :          */
  788                          3139   [ +  +  -  + ]:         605919 :         if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
 2110 michael@paquier.xyz      3140                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3141                 :                : 
                               3142                 :                :         /* Quick exit if already known to be updated or cannot be updated */
  788 heikki.linnakangas@i     3143   [ +  +  -  + ]:CBC      605919 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 5230 simon@2ndQuadrant.co     3144                 :         596974 :             return false;
                               3145                 :                : 
                               3146                 :                :         /*
                               3147                 :                :          * Update local copy of minRecoveryPoint. But if the lock is busy,
                               3148                 :                :          * just return a conservative guess.
                               3149                 :                :          */
                               3150         [ -  + ]:           8945 :         if (!LWLockConditionalAcquire(ControlFileLock, LW_SHARED))
 5230 simon@2ndQuadrant.co     3151                 :UBC           0 :             return true;
  788 heikki.linnakangas@i     3152                 :CBC        8945 :         LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               3153                 :           8945 :         LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 5230 simon@2ndQuadrant.co     3154                 :           8945 :         LWLockRelease(ControlFileLock);
                               3155                 :                : 
                               3156                 :                :         /*
                               3157                 :                :          * Check minRecoveryPoint for any other process than the startup
                               3158                 :                :          * process doing crash recovery, which should not update the control
                               3159                 :                :          * file value if crash recovery is still running.
                               3160                 :                :          */
  788 heikki.linnakangas@i     3161         [ -  + ]:           8945 :         if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint))
 2053 michael@paquier.xyz      3162                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3163                 :                : 
                               3164                 :                :         /* check again */
  788 heikki.linnakangas@i     3165   [ +  +  -  + ]:CBC        8945 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 2053 michael@paquier.xyz      3166                 :             66 :             return false;
                               3167                 :                :         else
                               3168                 :           8879 :             return true;
                               3169                 :                :     }
                               3170                 :                : 
                               3171                 :                :     /* Quick exit if already known flushed */
 4125 alvherre@alvh.no-ip.     3172         [ +  + ]:        7501845 :     if (record <= LogwrtResult.Flush)
 6164 tgl@sss.pgh.pa.us        3173                 :        7327403 :         return false;
                               3174                 :                : 
                               3175                 :                :     /* read LogwrtResult and update local state */
   11 alvherre@alvh.no-ip.     3176                 :GNC      174442 :     RefreshXLogWriteResult(LogwrtResult);
                               3177                 :                : 
                               3178                 :                :     /* check again */
 4125 alvherre@alvh.no-ip.     3179         [ +  + ]:CBC      174442 :     if (record <= LogwrtResult.Flush)
 6164 tgl@sss.pgh.pa.us        3180                 :           3053 :         return false;
                               3181                 :                : 
                               3182                 :         171389 :     return true;
                               3183                 :                : }
                               3184                 :                : 
                               3185                 :                : /*
                               3186                 :                :  * Try to make a given XLOG file segment exist.
                               3187                 :                :  *
                               3188                 :                :  * logsegno: identify segment.
                               3189                 :                :  *
                               3190                 :                :  * *added: on return, true if this call raised the number of extant segments.
                               3191                 :                :  *
                               3192                 :                :  * path: on return, this char[MAXPGPATH] has the path to the logsegno file.
                               3193                 :                :  *
                               3194                 :                :  * Returns -1 or FD of opened file.  A -1 here is not an error; a caller
                               3195                 :                :  * wanting an open segment should attempt to open "path", which usually will
                               3196                 :                :  * succeed.  (This is weird, but it's efficient for the callers.)
                               3197                 :                :  */
                               3198                 :                : static int
  891 rhaas@postgresql.org     3199                 :           8471 : XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
                               3200                 :                :                      bool *added, char *path)
                               3201                 :                : {
                               3202                 :                :     char        tmppath[MAXPGPATH];
                               3203                 :                :     XLogSegNo   installed_segno;
                               3204                 :                :     XLogSegNo   max_segno;
                               3205                 :                :     int         fd;
                               3206                 :                :     int         save_errno;
  372 tmunro@postgresql.or     3207                 :           8471 :     int         open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
                               3208                 :                : 
  891 rhaas@postgresql.org     3209         [ -  + ]:           8471 :     Assert(logtli != 0);
                               3210                 :                : 
                               3211                 :           8471 :     XLogFilePath(path, logtli, logsegno, wal_segment_size);
                               3212                 :                : 
                               3213                 :                :     /*
                               3214                 :                :      * Try to use existent file (checkpoint maker may have created it already)
                               3215                 :                :      */
 1021 noah@leadboat.com        3216                 :           8471 :     *added = false;
  408 tmunro@postgresql.or     3217                 :           8471 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  184 nathan@postgresql.or     3218                 :GNC        8471 :                        get_sync_bit(wal_sync_method));
 1021 noah@leadboat.com        3219         [ +  + ]:CBC        8471 :     if (fd < 0)
                               3220                 :                :     {
                               3221         [ -  + ]:            512 :         if (errno != ENOENT)
 1021 noah@leadboat.com        3222         [ #  # ]:UBC           0 :             ereport(ERROR,
                               3223                 :                :                     (errcode_for_file_access(),
                               3224                 :                :                      errmsg("could not open file \"%s\": %m", path)));
                               3225                 :                :     }
                               3226                 :                :     else
 1021 noah@leadboat.com        3227                 :CBC        7959 :         return fd;
                               3228                 :                : 
                               3229                 :                :     /*
                               3230                 :                :      * Initialize an empty (all zeroes) segment.  NOTE: it is possible that
                               3231                 :                :      * another process is doing the same thing.  If so, we will end up
                               3232                 :                :      * pre-creating an extra log segment.  That seems OK, and better than
                               3233                 :                :      * holding the lock throughout this lengthy process.
                               3234                 :                :      */
 6133 tgl@sss.pgh.pa.us        3235         [ +  + ]:            512 :     elog(DEBUG2, "creating and filling new WAL file");
                               3236                 :                : 
 6859                          3237                 :            512 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3238                 :                : 
 8429                          3239                 :            512 :     unlink(tmppath);
                               3240                 :                : 
  372 tmunro@postgresql.or     3241         [ -  + ]:            512 :     if (io_direct_flags & IO_DIRECT_WAL_INIT)
  372 tmunro@postgresql.or     3242                 :UBC           0 :         open_flags |= PG_O_DIRECT;
                               3243                 :                : 
                               3244                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
  372 tmunro@postgresql.or     3245                 :CBC         512 :     fd = BasicOpenFile(tmppath, open_flags);
 8966 vadim4o@yahoo.com        3246         [ -  + ]:            512 :     if (fd < 0)
 6939 tgl@sss.pgh.pa.us        3247         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3248                 :                :                 (errcode_for_file_access(),
                               3249                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3250                 :                : 
 1839 tmunro@postgresql.or     3251                 :CBC         512 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
                               3252                 :            512 :     save_errno = 0;
                               3253         [ +  - ]:            512 :     if (wal_init_zero)
                               3254                 :                :     {
                               3255                 :                :         ssize_t     rc;
                               3256                 :                : 
                               3257                 :                :         /*
                               3258                 :                :          * Zero-fill the file.  With this setting, we do this the hard way to
                               3259                 :                :          * ensure that all the file space has really been allocated.  On
                               3260                 :                :          * platforms that allow "holes" in files, just seeking to the end
                               3261                 :                :          * doesn't allocate intermediate space.  This way, we know that we
                               3262                 :                :          * have all the space and (after the fsync below) that all the
                               3263                 :                :          * indirect blocks are down on disk.  Therefore, fdatasync(2) or
                               3264                 :                :          * O_DSYNC will be sufficient to sync future writes to the log file.
                               3265                 :                :          */
  405 michael@paquier.xyz      3266                 :            512 :         rc = pg_pwrite_zeros(fd, wal_segment_size, 0);
                               3267                 :                : 
  523                          3268         [ -  + ]:            512 :         if (rc < 0)
  523 michael@paquier.xyz      3269                 :UBC           0 :             save_errno = errno;
                               3270                 :                :     }
                               3271                 :                :     else
                               3272                 :                :     {
                               3273                 :                :         /*
                               3274                 :                :          * Otherwise, seeking to the end and writing a solitary byte is
                               3275                 :                :          * enough.
                               3276                 :                :          */
 3875 jdavis@postgresql.or     3277                 :              0 :         errno = 0;
  523 michael@paquier.xyz      3278         [ #  # ]:              0 :         if (pg_pwrite(fd, "\0", 1, wal_segment_size - 1) != 1)
                               3279                 :                :         {
                               3280                 :                :             /* if write didn't set errno, assume no disk space */
 1839 tmunro@postgresql.or     3281         [ #  # ]:              0 :             save_errno = errno ? errno : ENOSPC;
                               3282                 :                :         }
                               3283                 :                :     }
 1839 tmunro@postgresql.or     3284                 :CBC         512 :     pgstat_report_wait_end();
                               3285                 :                : 
                               3286         [ -  + ]:            512 :     if (save_errno)
                               3287                 :                :     {
                               3288                 :                :         /*
                               3289                 :                :          * If we fail to make the file, delete it to release disk space
                               3290                 :                :          */
 1839 tmunro@postgresql.or     3291                 :UBC           0 :         unlink(tmppath);
                               3292                 :                : 
                               3293                 :              0 :         close(fd);
                               3294                 :                : 
                               3295                 :              0 :         errno = save_errno;
                               3296                 :                : 
                               3297         [ #  # ]:              0 :         ereport(ERROR,
                               3298                 :                :                 (errcode_for_file_access(),
                               3299                 :                :                  errmsg("could not write to file \"%s\": %m", tmppath)));
                               3300                 :                :     }
                               3301                 :                : 
 2584 rhaas@postgresql.org     3302                 :CBC         512 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 8528 tgl@sss.pgh.pa.us        3303         [ -  + ]:            512 :     if (pg_fsync(fd) != 0)
                               3304                 :                :     {
  597 drowley@postgresql.o     3305                 :UBC           0 :         save_errno = errno;
 4156 heikki.linnakangas@i     3306                 :              0 :         close(fd);
 2120 michael@paquier.xyz      3307                 :              0 :         errno = save_errno;
 6939 tgl@sss.pgh.pa.us        3308         [ #  # ]:              0 :         ereport(ERROR,
                               3309                 :                :                 (errcode_for_file_access(),
                               3310                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
                               3311                 :                :     }
 2584 rhaas@postgresql.org     3312                 :CBC         512 :     pgstat_report_wait_end();
                               3313                 :                : 
 1744 peter@eisentraut.org     3314         [ -  + ]:            512 :     if (close(fd) != 0)
 6939 tgl@sss.pgh.pa.us        3315         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3316                 :                :                 (errcode_for_file_access(),
                               3317                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3318                 :                : 
                               3319                 :                :     /*
                               3320                 :                :      * Now move the segment into place with its final name.  Cope with
                               3321                 :                :      * possibility that someone else has created the file while we were
                               3322                 :                :      * filling ours: if so, use ours to pre-create a future log segment.
                               3323                 :                :      */
 4312 heikki.linnakangas@i     3324                 :CBC         512 :     installed_segno = logsegno;
                               3325                 :                : 
                               3326                 :                :     /*
                               3327                 :                :      * XXX: What should we use as max_segno? We used to use XLOGfileslop when
                               3328                 :                :      * that was a constant, but that was always a bit dubious: normally, at a
                               3329                 :                :      * checkpoint, XLOGfileslop was the offset from the checkpoint record, but
                               3330                 :                :      * here, it was the offset from the insert location. We can't do the
                               3331                 :                :      * normal XLOGfileslop calculation here because we don't have access to
                               3332                 :                :      * the prior checkpoint's redo location. So somewhat arbitrarily, just use
                               3333                 :                :      * CheckPointSegments.
                               3334                 :                :      */
 3338                          3335                 :            512 :     max_segno = logsegno + CheckPointSegments;
  891 rhaas@postgresql.org     3336         [ +  - ]:            512 :     if (InstallXLogFileSegment(&installed_segno, tmppath, true, max_segno,
                               3337                 :                :                                logtli))
                               3338                 :                :     {
 1021 noah@leadboat.com        3339                 :            512 :         *added = true;
                               3340         [ +  + ]:            512 :         elog(DEBUG2, "done creating and filling new WAL file");
                               3341                 :                :     }
                               3342                 :                :     else
                               3343                 :                :     {
                               3344                 :                :         /*
                               3345                 :                :          * No need for any more future segments, or InstallXLogFileSegment()
                               3346                 :                :          * failed to rename the file into place. If the rename failed, a
                               3347                 :                :          * caller opening the file may fail.
                               3348                 :                :          */
 8305 tgl@sss.pgh.pa.us        3349                 :UBC           0 :         unlink(tmppath);
 1021 noah@leadboat.com        3350         [ #  # ]:              0 :         elog(DEBUG2, "abandoned new WAL file");
                               3351                 :                :     }
                               3352                 :                : 
 1021 noah@leadboat.com        3353                 :CBC         512 :     return -1;
                               3354                 :                : }
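
The two preallocation strategies used above (zero-filling the whole file versus writing a single byte at the last offset) can be sketched in isolation. This is a minimal illustration using plain POSIX calls, not the PostgreSQL implementation: `prealloc_file`, its 8192-byte block size, and its errno-style return value are assumptions made for the example, and it does not use `pg_pwrite_zeros` or the wait-event reporting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): extend fd to exactly
 * "size" bytes.  With zero_fill = true every byte is written, so the
 * filesystem has to allocate all blocks; with zero_fill = false only the
 * final byte is written, which may leave a hole on filesystems that
 * support sparse files.
 *
 * Returns 0 on success, or an errno value on failure.
 */
static int
prealloc_file(int fd, off_t size, bool zero_fill)
{
    assert(size > 0);

    if (zero_fill)
    {
        char    block[8192];    /* assumed block size, for illustration */
        off_t   done = 0;

        memset(block, 0, sizeof(block));
        while (done < size)
        {
            size_t  want = sizeof(block);
            ssize_t rc;

            if ((off_t) want > size - done)
                want = (size_t) (size - done);

            errno = 0;
            rc = pwrite(fd, block, want, done);
            if (rc < 0)
            {
                if (errno == EINTR)
                    continue;   /* interrupted before writing: retry */
                return errno;
            }
            if (rc == 0)
                return ENOSPC;  /* no progress: assume no disk space */
            done += rc;
        }
    }
    else
    {
        errno = 0;
        if (pwrite(fd, "\0", 1, size - 1) != 1)
        {
            /* if the write didn't set errno, assume no disk space */
            return errno ? errno : ENOSPC;
        }
    }
    return 0;
}
```

Either mode leaves the file at the requested length; only the zero-fill mode guarantees the space was really allocated, which is the property the comment in the listing relies on before using fdatasync(2) or O_DSYNC for later writes.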
                               3355                 :                : 
                               3356                 :                : /*
                               3357                 :                :  * Create a new XLOG file segment, or open a pre-existing one.
                               3358                 :                :  *
                               3359                 :                :  * logsegno: identify segment to be created/opened.
                               3360                 :                :  *
                               3361                 :                :  * Returns FD of opened file.
                               3362                 :                :  *
                               3363                 :                :  * Note: errors here are ERROR not PANIC because we might or might not be
                               3364                 :                :  * inside a critical section (eg, during checkpoint there is no reason to
                               3365                 :                :  * take down the system on failure).  They will promote to PANIC if we are
                               3366                 :                :  * in a critical section.
                               3367                 :                :  */
                               3368                 :                : int
  891 rhaas@postgresql.org     3369                 :           8410 : XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
                               3370                 :                : {
                               3371                 :                :     bool        ignore_added;
                               3372                 :                :     char        path[MAXPGPATH];
                               3373                 :                :     int         fd;
                               3374                 :                : 
                               3375         [ -  + ]:           8410 :     Assert(logtli != 0);
                               3376                 :                : 
                               3377                 :           8410 :     fd = XLogFileInitInternal(logsegno, logtli, &ignore_added, path);
 1021 noah@leadboat.com        3378         [ +  + ]:           8410 :     if (fd >= 0)
                               3379                 :           7912 :         return fd;
                               3380                 :                : 
                               3381                 :                :     /* Now open original target segment (might not be file I just made) */
  408 tmunro@postgresql.or     3382                 :            498 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  184 nathan@postgresql.or     3383                 :GNC         498 :                        get_sync_bit(wal_sync_method));
 8305 tgl@sss.pgh.pa.us        3384         [ -  + ]:CBC         498 :     if (fd < 0)
 6939 tgl@sss.pgh.pa.us        3385         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3386                 :                :                 (errcode_for_file_access(),
                               3387                 :                :                  errmsg("could not open file \"%s\": %m", path)));
 6668 neilc@samurai.com        3388                 :CBC         498 :     return fd;
                               3389                 :                : }
                               3390                 :                : 
                               3391                 :                : /*
                               3392                 :                :  * Create a new XLOG file segment by copying a pre-existing one.
                               3393                 :                :  *
                               3394                 :                :  * destsegno: identify segment to be created.
                               3395                 :                :  *
                               3396                 :                :  * srcTLI, srcsegno: identify segment to be copied (could be from
                               3397                 :                :  *      a different timeline)
                               3398                 :                :  *
                               3399                 :                :  * upto: how much of the source file to copy (the rest is filled with
                               3400                 :                :  *      zeros)
                               3401                 :                :  *
                               3402                 :                :  * Currently this is only used during recovery, and so there are no locking
                               3403                 :                :  * considerations.  But we should be just as tense as XLogFileInit to avoid
                               3404                 :                :  * emplacing a bogus file.
                               3405                 :                :  */
                               3406                 :                : static void
  891 rhaas@postgresql.org     3407                 :             36 : XLogFileCopy(TimeLineID destTLI, XLogSegNo destsegno,
                               3408                 :                :              TimeLineID srcTLI, XLogSegNo srcsegno,
                               3409                 :                :              int upto)
                               3410                 :                : {
                               3411                 :                :     char        path[MAXPGPATH];
                               3412                 :                :     char        tmppath[MAXPGPATH];
                               3413                 :                :     PGAlignedXLogBlock buffer;
                               3414                 :                :     int         srcfd;
                               3415                 :                :     int         fd;
                               3416                 :                :     int         nbytes;
                               3417                 :                : 
                               3418                 :                :     /*
                               3419                 :                :      * Open the source file
                               3420                 :                :      */
 2399 andres@anarazel.de       3421                 :             36 :     XLogFilePath(path, srcTLI, srcsegno, wal_segment_size);
 2395 peter_e@gmx.net          3422                 :             36 :     srcfd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 7207 tgl@sss.pgh.pa.us        3423         [ -  + ]:             36 :     if (srcfd < 0)
 6939 tgl@sss.pgh.pa.us        3424         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3425                 :                :                 (errcode_for_file_access(),
                               3426                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3427                 :                : 
                               3428                 :                :     /*
                               3429                 :                :      * Copy into a temp file name.
                               3430                 :                :      */
 6859 tgl@sss.pgh.pa.us        3431                 :CBC          36 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3432                 :                : 
 7207                          3433                 :             36 :     unlink(tmppath);
                               3434                 :                : 
                               3435                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
 2395 peter_e@gmx.net          3436                 :             36 :     fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 7207 tgl@sss.pgh.pa.us        3437         [ -  + ]:             36 :     if (fd < 0)
 6939 tgl@sss.pgh.pa.us        3438         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3439                 :                :                 (errcode_for_file_access(),
                               3440                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3441                 :                : 
                               3442                 :                :     /*
                               3443                 :                :      * Do the data copying.
                               3444                 :                :      */
 2399 andres@anarazel.de       3445         [ +  + ]:CBC       73764 :     for (nbytes = 0; nbytes < wal_segment_size; nbytes += sizeof(buffer))
                               3446                 :                :     {
                               3447                 :                :         int         nread;
                               3448                 :                : 
 3405 heikki.linnakangas@i     3449                 :          73728 :         nread = upto - nbytes;
                               3450                 :                : 
                               3451                 :                :         /*
                               3452                 :                :          * The part that is not read from the source file is filled with
                               3453                 :                :          * zeros.
                               3454                 :                :          */
                               3455         [ +  + ]:          73728 :         if (nread < sizeof(buffer))
 2052 tgl@sss.pgh.pa.us        3456                 :             36 :             memset(buffer.data, 0, sizeof(buffer));
                               3457                 :                : 
 3405 heikki.linnakangas@i     3458         [ +  + ]:          73728 :         if (nread > 0)
                               3459                 :                :         {
                               3460                 :                :             int         r;
                               3461                 :                : 
                               3462         [ +  + ]:           2640 :             if (nread > sizeof(buffer))
                               3463                 :           2604 :                 nread = sizeof(buffer);
 2584 rhaas@postgresql.org     3464                 :           2640 :             pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_READ);
 2052 tgl@sss.pgh.pa.us        3465                 :           2640 :             r = read(srcfd, buffer.data, nread);
 2097 michael@paquier.xyz      3466         [ -  + ]:           2640 :             if (r != nread)
                               3467                 :                :             {
 2097 michael@paquier.xyz      3468         [ #  # ]:UBC           0 :                 if (r < 0)
 3405 heikki.linnakangas@i     3469         [ #  # ]:              0 :                     ereport(ERROR,
                               3470                 :                :                             (errcode_for_file_access(),
                               3471                 :                :                              errmsg("could not read file \"%s\": %m",
                               3472                 :                :                                     path)));
                               3473                 :                :                 else
                               3474         [ #  # ]:              0 :                     ereport(ERROR,
                               3475                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                               3476                 :                :                              errmsg("could not read file \"%s\": read %d of %zu",
                               3477                 :                :                                     path, r, (Size) nread)));
                               3478                 :                :             }
 2584 rhaas@postgresql.org     3479                 :CBC        2640 :             pgstat_report_wait_end();
                               3480                 :                :         }
 7207 tgl@sss.pgh.pa.us        3481                 :          73728 :         errno = 0;
 2584 rhaas@postgresql.org     3482                 :          73728 :         pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_WRITE);
 2052 tgl@sss.pgh.pa.us        3483         [ -  + ]:          73728 :         if ((int) write(fd, buffer.data, sizeof(buffer)) != (int) sizeof(buffer))
                               3484                 :                :         {
 7207 tgl@sss.pgh.pa.us        3485                 :UBC           0 :             int         save_errno = errno;
                               3486                 :                : 
                               3487                 :                :             /*
                               3488                 :                :              * If we fail to make the file, delete it to release disk space
                               3489                 :                :              */
                               3490                 :              0 :             unlink(tmppath);
                               3491                 :                :             /* if write didn't set errno, assume problem is no disk space */
                               3492         [ #  # ]:              0 :             errno = save_errno ? save_errno : ENOSPC;
                               3493                 :                : 
 6939                          3494         [ #  # ]:              0 :             ereport(ERROR,
                               3495                 :                :                     (errcode_for_file_access(),
                               3496                 :                :                      errmsg("could not write to file \"%s\": %m", tmppath)));
                               3497                 :                :         }
 2584 rhaas@postgresql.org     3498                 :CBC       73728 :         pgstat_report_wait_end();
                               3499                 :                :     }
                               3500                 :                : 
                               3501                 :             36 :     pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_SYNC);
 7207 tgl@sss.pgh.pa.us        3502         [ -  + ]:             36 :     if (pg_fsync(fd) != 0)
 1973 tmunro@postgresql.or     3503         [ #  # ]:UBC           0 :         ereport(data_sync_elevel(ERROR),
                               3504                 :                :                 (errcode_for_file_access(),
                               3505                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
 2584 rhaas@postgresql.org     3506                 :CBC          36 :     pgstat_report_wait_end();
                               3507                 :                : 
 1744 peter@eisentraut.org     3508         [ -  + ]:             36 :     if (CloseTransientFile(fd) != 0)
 6939 tgl@sss.pgh.pa.us        3509         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3510                 :                :                 (errcode_for_file_access(),
                               3511                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3512                 :                : 
 1744 peter@eisentraut.org     3513         [ -  + ]:CBC          36 :     if (CloseTransientFile(srcfd) != 0)
 1863 michael@paquier.xyz      3514         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3515                 :                :                 (errcode_for_file_access(),
                               3516                 :                :                  errmsg("could not close file \"%s\": %m", path)));
                               3517                 :                : 
                               3518                 :                :     /*
                               3519                 :                :      * Now move the segment into place with its final name.
                               3520                 :                :      */
  891 rhaas@postgresql.org     3521         [ -  + ]:CBC          36 :     if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, destTLI))
 3210 fujii@postgresql.org     3522         [ #  # ]:UBC           0 :         elog(ERROR, "InstallXLogFileSegment should not have failed");
 7207 tgl@sss.pgh.pa.us        3523                 :CBC          36 : }
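
The copy loop above handles three kinds of block: blocks read entirely from the source, one block that straddles `upto`, and blocks that are all zeros. A standalone sketch of that block-by-block logic follows. It is an illustration under stated assumptions, not the PostgreSQL code: `copy_with_zero_pad`, `COPY_BLOCK`, and the errno-style return value are hypothetical, and the offsets are kept unsigned (`size_t`) as a design choice of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BLOCK 8192     /* assumed block size, for illustration */

/*
 * Hypothetical helper (not a PostgreSQL function): write exactly "total"
 * bytes to dstfd.  The first "upto" bytes are read from srcfd; the rest
 * are zeros.
 *
 * Returns 0 on success, or an errno value on failure (EIO on short read).
 */
static int
copy_with_zero_pad(int srcfd, int dstfd, size_t total, size_t upto)
{
    char    buf[COPY_BLOCK];
    size_t  off;

    assert(upto <= total);

    for (off = 0; off < total; off += sizeof(buf))
    {
        /* bytes to emit for this block */
        size_t  out = total - off < sizeof(buf) ? total - off : sizeof(buf);
        /* bytes of this block that come from the source file */
        size_t  want = 0;

        if (off < upto)
            want = upto - off < out ? upto - off : out;

        /* zero the buffer whenever the source will not fill the block */
        if (want < out)
            memset(buf, 0, sizeof(buf));

        if (want > 0)
        {
            ssize_t r = read(srcfd, buf, want);

            if (r < 0)
                return errno;
            if ((size_t) r != want)
                return EIO;     /* short read: source is too small */
        }

        errno = 0;
        if (write(dstfd, buf, out) != (ssize_t) out)
        {
            /* if write didn't set errno, assume problem is no disk space */
            return errno ? errno : ENOSPC;
        }
    }
    return 0;
}
```

A caller would still need to fsync the destination and rename it into place afterwards, as the listing does with `pg_fsync` and `InstallXLogFileSegment`; the sketch covers only the data-copying step.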
                               3524                 :                : 
                               3525                 :                : /*
                               3526                 :                :  * Install a new XLOG segment file as a current or future log segment.
                               3527                 :                :  *
                               3528                 :                :  * This is used both to install a newly-created segment (which has a temp
                               3529                 :                :  * filename while it's being created) and to recycle an old segment.
                               3530                 :                :  *
                               3531                 :                :  * *segno: identify segment to install as (or first possible target).
                               3532                 :                :  * When find_free is true, this is modified on return to indicate the
                               3533                 :                :  * actual installation location or last segment searched.
                               3534                 :                :  *
                               3535                 :                :  * tmppath: initial name of file to install.  It will be renamed into place.
                               3536                 :                :  *
                               3537                 :                :  * find_free: if true, install the new segment at the first empty segno
                               3538                 :                :  * number at or after the passed numbers.  If false, install the new segment
                               3539                 :                :  * exactly where specified, deleting any existing segment file there.
                               3540                 :                :  *
                               3541                 :                :  * max_segno: maximum segment number to install the new file as.  Fail if no
                               3542                 :                :  * free slot is found between *segno and max_segno. (Ignored when find_free
                               3543                 :                :  * is false.)
                               3544                 :                :  *
                               3545                 :                :  * tli: The timeline on which the new segment should be installed.
                               3546                 :                :  *
                               3547                 :                :  * Returns true if the file was installed successfully.  false indicates that
                               3548                 :                :  * max_segno limit was exceeded, the startup process has disabled this
                               3549                 :                :  * function for now, or an error occurred while renaming the file into place.
                               3550                 :                :  */
                               3551                 :                : static bool
 4312 heikki.linnakangas@i     3552                 :           1266 : InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                               3553                 :                :                        bool find_free, XLogSegNo max_segno, TimeLineID tli)
                               3554                 :                : {
                               3555                 :                :     char        path[MAXPGPATH];
                               3556                 :                :     struct stat stat_buf;
                               3557                 :                : 
  891 rhaas@postgresql.org     3558         [ -  + ]:           1266 :     Assert(tli != 0);
                               3559                 :                : 
                               3560                 :           1266 :     XLogFilePath(path, tli, *segno, wal_segment_size);
                               3561                 :                : 
 1021 noah@leadboat.com        3562                 :           1266 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               3563         [ -  + ]:           1266 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3564                 :                :     {
 1021 noah@leadboat.com        3565                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3566                 :              0 :         return false;
                               3567                 :                :     }
                               3568                 :                : 
 8305 tgl@sss.pgh.pa.us        3569         [ +  + ]:CBC        1266 :     if (!find_free)
                               3570                 :                :     {
                               3571                 :                :         /* Force installation: get rid of any pre-existing segment file */
 2575 teodor@sigaev.ru         3572                 :             36 :         durable_unlink(path, DEBUG1);
                               3573                 :                :     }
                               3574                 :                :     else
                               3575                 :                :     {
                               3576                 :                :         /* Find a free slot to put it in */
 7750 tgl@sss.pgh.pa.us        3577         [ +  + ]:           2003 :         while (stat(path, &stat_buf) == 0)
                               3578                 :                :         {
 3338 heikki.linnakangas@i     3579         [ +  + ]:            782 :             if ((*segno) >= max_segno)
                               3580                 :                :             {
                               3581                 :                :                 /* Failed to find a free slot within specified range */
 1021 noah@leadboat.com        3582                 :              9 :                 LWLockRelease(ControlFileLock);
 8305 tgl@sss.pgh.pa.us        3583                 :              9 :                 return false;
                               3584                 :                :             }
 4312 heikki.linnakangas@i     3585                 :            773 :             (*segno)++;
  891 rhaas@postgresql.org     3586                 :            773 :             XLogFilePath(path, tli, *segno, wal_segment_size);
                               3587                 :                :         }
                               3588                 :                :     }
                               3589                 :                : 
  649 michael@paquier.xyz      3590   [ +  -  -  + ]:           1257 :     Assert(access(path, F_OK) != 0 && errno == ENOENT);
                               3591         [ -  + ]:           1257 :     if (durable_rename(tmppath, path, LOG) != 0)
                               3592                 :                :     {
 1021 noah@leadboat.com        3593                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3594                 :                :         /* durable_rename already emitted log message */
 5327 heikki.linnakangas@i     3595                 :              0 :         return false;
                               3596                 :                :     }
                               3597                 :                : 
 1021 noah@leadboat.com        3598                 :CBC        1257 :     LWLockRelease(ControlFileLock);
                               3599                 :                : 
 8305 tgl@sss.pgh.pa.us        3600                 :           1257 :     return true;
                               3601                 :                : }
                               3602                 :                : 
                               3603                 :                : /*
                               3604                 :                :  * Open a pre-existing logfile segment for writing.
                               3605                 :                :  */
                               3606                 :                : int
  891 rhaas@postgresql.org     3607                 :             13 : XLogFileOpen(XLogSegNo segno, TimeLineID tli)
                               3608                 :                : {
                               3609                 :                :     char        path[MAXPGPATH];
                               3610                 :                :     int         fd;
                               3611                 :                : 
                               3612                 :             13 :     XLogFilePath(path, tli, segno, wal_segment_size);
                               3613                 :                : 
  408 tmunro@postgresql.or     3614                 :             13 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  184 nathan@postgresql.or     3615                 :GNC          13 :                        get_sync_bit(wal_sync_method));
 8966 vadim4o@yahoo.com        3616         [ -  + ]:CBC          13 :     if (fd < 0)
 7573 tgl@sss.pgh.pa.us        3617         [ #  # ]:UBC           0 :         ereport(PANIC,
                               3618                 :                :                 (errcode_for_file_access(),
                               3619                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3620                 :                : 
 7207 tgl@sss.pgh.pa.us        3621                 :CBC          13 :     return fd;
                               3622                 :                : }
                               3623                 :                : 
                               3624                 :                : /*
                               3625                 :                :  * Close the current logfile segment for writing.
                               3626                 :                :  */
                               3627                 :                : static void
 6513 bruce@momjian.us         3628                 :           2029 : XLogFileClose(void)
                               3629                 :                : {
                               3630         [ -  + ]:           2029 :     Assert(openLogFile >= 0);
                               3631                 :                : 
                               3632                 :                :     /*
                               3633                 :                :      * WAL segment files will not be re-read in normal operation, so we advise
                               3634                 :                :      * the OS to release any cached pages.  But do not do so if WAL archiving
                               3635                 :                :      * or streaming is active, because the archiver and walsender processes
                               3636                 :                :      * could use the cache to read the WAL segment.
                               3637                 :                :      */
                               3638                 :                : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
  372 tmunro@postgresql.or     3639   [ +  +  +  - ]:           2029 :     if (!XLogIsNeeded() && (io_direct_flags & IO_DIRECT_WAL) == 0)
 5572 tgl@sss.pgh.pa.us        3640                 :           1419 :         (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
                               3641                 :                : #endif
                               3642                 :                : 
 1744 peter@eisentraut.org     3643         [ -  + ]:           2029 :     if (close(openLogFile) != 0)
                               3644                 :                :     {
                               3645                 :                :         char        xlogfname[MAXFNAMELEN];
 1594 michael@paquier.xyz      3646                 :UBC           0 :         int         save_errno = errno;
                               3647                 :                : 
  891 rhaas@postgresql.org     3648                 :              0 :         XLogFileName(xlogfname, openLogTLI, openLogSegNo, wal_segment_size);
 1594 michael@paquier.xyz      3649                 :              0 :         errno = save_errno;
 6513 bruce@momjian.us         3650         [ #  # ]:              0 :         ereport(PANIC,
                               3651                 :                :                 (errcode_for_file_access(),
                               3652                 :                :                  errmsg("could not close file \"%s\": %m", xlogfname)));
                               3653                 :                :     }
                               3654                 :                : 
 6513 bruce@momjian.us         3655                 :CBC        2029 :     openLogFile = -1;
 1511 tgl@sss.pgh.pa.us        3656                 :           2029 :     ReleaseExternalFD();
 6513 bruce@momjian.us         3657                 :           2029 : }
                               3658                 :                : 
                               3659                 :                : /*
                               3660                 :                :  * Preallocate log files beyond the specified log endpoint.
                               3661                 :                :  *
                               3662                 :                :  * XXX this is currently extremely conservative, since it forces only one
                               3663                 :                :  * future log segment to exist, and even that only if we are 75% done with
                               3664                 :                :  * the current one.  This is only appropriate for very low-WAL-volume systems.
                               3665                 :                :  * High-volume systems will be OK once they've built up a sufficient set of
                               3666                 :                :  * recycled log segments, but the startup transient is likely to include
                               3667                 :                :  * a lot of segment creations by foreground processes, which is not so good.
                               3668                 :                :  *
                               3669                 :                :  * XLogFileInitInternal() can ereport(ERROR).  All known causes indicate big
                               3670                 :                :  * trouble; for example, a full filesystem is one cause.  The checkpoint WAL
                               3671                 :                :  * and/or ControlFile updates already completed.  If a RequestCheckpoint()
                               3672                 :                :  * initiated the present checkpoint and an ERROR ends this function, the
                               3673                 :                :  * command that called RequestCheckpoint() fails.  That's not ideal, but it's
                               3674                 :                :  * not worth contorting more functions to use caller-specified elevel values.
                               3675                 :                :  * (With or without RequestCheckpoint(), an ERROR forestalls some inessential
                               3676                 :                :  * reporting and resource reclamation.)
                               3677                 :                :  */
                               3678                 :                : static void
  891 rhaas@postgresql.org     3679                 :           1302 : PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli)
                               3680                 :                : {
                               3681                 :                :     XLogSegNo   _logSegNo;
                               3682                 :                :     int         lf;
                               3683                 :                :     bool        added;
                               3684                 :                :     char        path[MAXPGPATH];
                               3685                 :                :     uint64      offset;
                               3686                 :                : 
 1021 noah@leadboat.com        3687         [ +  + ]:           1302 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3688                 :              9 :         return;                 /* unlocked check says no */
                               3689                 :                : 
 2399 andres@anarazel.de       3690                 :           1293 :     XLByteToPrevSeg(endptr, _logSegNo, wal_segment_size);
                               3691                 :           1293 :     offset = XLogSegmentOffset(endptr - 1, wal_segment_size);
                               3692         [ +  + ]:           1293 :     if (offset >= (uint32) (0.75 * wal_segment_size))
                               3693                 :                :     {
 4312 heikki.linnakangas@i     3694                 :             61 :         _logSegNo++;
  891 rhaas@postgresql.org     3695                 :             61 :         lf = XLogFileInitInternal(_logSegNo, tli, &added, path);
 1021 noah@leadboat.com        3696         [ +  + ]:             61 :         if (lf >= 0)
                               3697                 :             47 :             close(lf);
                               3698         [ +  + ]:             61 :         if (added)
 6133 tgl@sss.pgh.pa.us        3699                 :             14 :             CheckpointStats.ckpt_segs_added++;
                               3700                 :                :     }
                               3701                 :                : }
                               3702                 :                : 
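The 75% rule described in the comment above PreallocXlogFiles() can be illustrated in isolation. The following is a minimal sketch, not the server code: it assumes a fixed 16 MiB segment size (SKETCH_WAL_SEGMENT_SIZE is a hypothetical stand-in for wal_segment_size), and the sketch_* names are invented for illustration. It only reproduces the offset arithmetic that decides whether the next segment would be created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the configured WAL segment size (assumed 16 MiB). */
#define SKETCH_WAL_SEGMENT_SIZE (16u * 1024u * 1024u)

/*
 * Offset, within its segment, of the byte just before endptr.  Using
 * endptr - 1 means an endptr sitting exactly on a segment boundary is
 * attributed to the segment that was just filled, not the next one.
 */
static uint64_t
sketch_prev_byte_offset(uint64_t endptr)
{
    assert(endptr > 0);         /* sketch assumption: a valid, nonzero pointer */
    return (endptr - 1) % SKETCH_WAL_SEGMENT_SIZE;
}

/* True once the current segment is at least 75% consumed. */
static bool
sketch_should_preallocate(uint64_t endptr)
{
    uint32_t    threshold = (uint32_t) (0.75 * SKETCH_WAL_SEGMENT_SIZE);

    return sketch_prev_byte_offset(endptr) >= threshold;
}
```

Under these assumptions, sketch_should_preallocate() first returns true when the byte at offset 12582912 of a segment has been written, and it still returns true for an endptr that lies exactly on the following segment boundary.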
                               3703                 :                : /*
                               3704                 :                :  * Throws an error if the given log segment has already been removed or
                               3705                 :                :  * recycled. The caller should only pass a segment that it knows to have
                               3706                 :                :  * existed while the server has been running, as this function always
                               3707                 :                :  * succeeds if no WAL segments have been removed since startup.
                               3708                 :                :  * 'tli' is only used in the error message.
                               3709                 :                :  *
                               3710                 :                :  * Note: this function guarantees to keep errno unchanged on return.
                               3711                 :                :  * This supports callers that use this to possibly deliver a better
                               3712                 :                :  * error message about a missing file, while still being able to throw
                               3713                 :                :  * a normal file-access error afterwards, if this does return.
                               3714                 :                :  */
                               3715                 :                : void
 4119 heikki.linnakangas@i     3716                 :          58031 : CheckXLogRemoved(XLogSegNo segno, TimeLineID tli)
                               3717                 :                : {
 2323 tgl@sss.pgh.pa.us        3718                 :          58031 :     int         save_errno = errno;
                               3719                 :                :     XLogSegNo   lastRemovedSegNo;
                               3720                 :                : 
 3492 andres@anarazel.de       3721         [ +  + ]:          58031 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3722                 :          58031 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3723                 :          58031 :     SpinLockRelease(&XLogCtl->info_lck);
                               3724                 :                : 
 4119 heikki.linnakangas@i     3725         [ -  + ]:          58031 :     if (segno <= lastRemovedSegNo)
                               3726                 :                :     {
                               3727                 :                :         char        filename[MAXFNAMELEN];
                               3728                 :                : 
 2399 andres@anarazel.de       3729                 :UBC           0 :         XLogFileName(filename, tli, segno, wal_segment_size);
 2323 tgl@sss.pgh.pa.us        3730                 :              0 :         errno = save_errno;
 4119 heikki.linnakangas@i     3731         [ #  # ]:              0 :         ereport(ERROR,
                               3732                 :                :                 (errcode_for_file_access(),
                               3733                 :                :                  errmsg("requested WAL segment %s has already been removed",
                               3734                 :                :                         filename)));
                               3735                 :                :     }
 2323 tgl@sss.pgh.pa.us        3736                 :CBC       58031 :     errno = save_errno;
 5116 heikki.linnakangas@i     3737                 :          58031 : }
                               3738                 :                : 
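The errno guarantee documented for CheckXLogRemoved() follows a common save-and-restore pattern. The sketch below shows that pattern on its own, under loudly stated assumptions: sketch_last_removed is a hypothetical plain variable standing in for the spinlock-protected shared value, and the function returns a flag instead of raising an error through ereport(). It is meant only to show why a caller can still report its original file-access error afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogCtl->lastRemovedSegNo (no locking here). */
static uint64_t sketch_last_removed = 0;

/*
 * Report whether segno has already been removed, leaving errno exactly as
 * the caller had it.  The real function raises an error instead of
 * returning true; that part is deliberately not modelled.
 */
static bool
sketch_segment_removed(uint64_t segno)
{
    int         save_errno = errno;     /* remember the caller's errno */
    bool        removed;

    assert(segno > 0);          /* sketch assumption: segment numbers start at 1 */

    errno = 0;                  /* simulate intermediate work clobbering errno */
    removed = (segno <= sketch_last_removed);

    errno = save_errno;         /* restore before returning */
    return removed;
}
```

A caller that has just seen open() fail with ENOENT could call sketch_segment_removed() to pick a more specific message and, if it returns false, still rely on errno being ENOENT for the generic one.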
                               3739                 :                : /*
                               3740                 :                :  * Return the last WAL segment removed, or 0 if no segment has been removed
                               3741                 :                :  * since startup.
                               3742                 :                :  *
                               3743                 :                :  * NB: the result can become out of date arbitrarily fast, so the caller has
                               3744                 :                :  * to deal with that.
                               3745                 :                :  */
                               3746                 :                : XLogSegNo
 3695 rhaas@postgresql.org     3747                 :            949 : XLogGetLastRemovedSegno(void)
                               3748                 :                : {
                               3749                 :                :     XLogSegNo   lastRemovedSegNo;
                               3750                 :                : 
 3492 andres@anarazel.de       3751         [ -  + ]:            949 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3752                 :            949 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3753                 :            949 :     SpinLockRelease(&XLogCtl->info_lck);
                               3754                 :                : 
 3695 rhaas@postgresql.org     3755                 :            949 :     return lastRemovedSegNo;
                               3756                 :                : }
                               3757                 :                : 
                               3758                 :                : /*
                               3759                 :                :  * Return the oldest WAL segment on the given TLI that still exists in
                               3760                 :                :  * XLOGDIR, or 0 if none.
                               3761                 :                :  */
                               3762                 :                : XLogSegNo
  116 rhaas@postgresql.org     3763                 :GNC          36 : XLogGetOldestSegno(TimeLineID tli)
                               3764                 :                : {
                               3765                 :                :     DIR        *xldir;
                               3766                 :                :     struct dirent *xlde;
                               3767                 :             36 :     XLogSegNo   oldest_segno = 0;
                               3768                 :                : 
                               3769                 :             36 :     xldir = AllocateDir(XLOGDIR);
                               3770         [ +  + ]:            269 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3771                 :                :     {
                               3772                 :                :         TimeLineID  file_tli;
                               3773                 :                :         XLogSegNo   file_segno;
                               3774                 :                : 
                               3775                 :                :         /* Ignore files that are not XLOG segments. */
                               3776         [ +  + ]:            233 :         if (!IsXLogFileName(xlde->d_name))
                               3777                 :            159 :             continue;
                               3778                 :                : 
                               3779                 :                :         /* Parse filename to get TLI and segno. */
                               3780                 :             77 :         XLogFromFileName(xlde->d_name, &file_tli, &file_segno,
                               3781                 :                :                          wal_segment_size);
                               3782                 :                : 
                               3783                 :                :         /* Ignore anything that's not from the TLI of interest. */
                               3784         [ +  + ]:             77 :         if (tli != file_tli)
                               3785                 :              3 :             continue;
                               3786                 :                : 
                               3787                 :                :         /* If it's the oldest so far, update oldest_segno. */
                               3788   [ +  +  -  + ]:             74 :         if (oldest_segno == 0 || file_segno < oldest_segno)
                               3789                 :             36 :             oldest_segno = file_segno;
                               3790                 :                :     }
                               3791                 :                : 
                               3792                 :             36 :     FreeDir(xldir);
                               3793                 :             36 :     return oldest_segno;
                               3794                 :                : }
                               3795                 :                : 
                               3796                 :                : /*
                               3797                 :                :  * Update the last removed segno pointer in shared memory, to reflect that the
                               3798                 :                :  * given XLOG file has been removed.
                               3799                 :                :  */
                               3800                 :                : static void
 5116 heikki.linnakangas@i     3801                 :CBC         749 : UpdateLastRemovedPtr(char *filename)
                               3802                 :                : {
                               3803                 :                :     uint32      tli;
                               3804                 :                :     XLogSegNo   segno;
                               3805                 :                : 
 2399 andres@anarazel.de       3806                 :            749 :     XLogFromFileName(filename, &tli, &segno, wal_segment_size);
                               3807                 :                : 
 3492                          3808         [ -  + ]:            749 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3809         [ +  + ]:            749 :     if (segno > XLogCtl->lastRemovedSegNo)
                               3810                 :            677 :         XLogCtl->lastRemovedSegNo = segno;
                               3811                 :            749 :     SpinLockRelease(&XLogCtl->info_lck);
 5116 heikki.linnakangas@i     3812                 :            749 : }
                               3813                 :                : 
                               3814                 :                : /*
                               3815                 :                :  * Remove all temporary log files in pg_wal
                               3816                 :                :  *
                               3817                 :                :  * This is called at the beginning of recovery after a previous crash,
                               3818                 :                :  * at a point where no other processes write fresh WAL data.
                               3819                 :                :  */
                               3820                 :                : static void
 2102 michael@paquier.xyz      3821                 :            201 : RemoveTempXlogFiles(void)
                               3822                 :                : {
                               3823                 :                :     DIR        *xldir;
                               3824                 :                :     struct dirent *xlde;
                               3825                 :                : 
                               3826         [ +  + ]:            201 :     elog(DEBUG2, "removing all temporary WAL segments");
                               3827                 :                : 
                               3828                 :            201 :     xldir = AllocateDir(XLOGDIR);
                               3829         [ +  + ]:           1298 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3830                 :                :     {
                               3831                 :                :         char        path[MAXPGPATH];
                               3832                 :                : 
                               3833         [ +  - ]:           1097 :         if (strncmp(xlde->d_name, "xlogtemp.", 9) != 0)
                               3834                 :           1097 :             continue;
                               3835                 :                : 
 2102 michael@paquier.xyz      3836                 :UBC           0 :         snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
                               3837                 :              0 :         unlink(path);
                               3838         [ #  # ]:              0 :         elog(DEBUG2, "removed temporary WAL segment \"%s\"", path);
                               3839                 :                :     }
 2102 michael@paquier.xyz      3840                 :CBC         201 :     FreeDir(xldir);
                               3841                 :            201 : }
                               3842                 :                : 
                               3843                 :                : /*
                               3844                 :                :  * Recycle or remove all log files older than or equal to the passed segno.
                               3845                 :                :  *
                               3846                 :                :  * endptr is the current (or recent) end of xlog, and lastredoptr is the
                               3847                 :                :  * redo pointer of the last checkpoint. These are used to determine
                               3848                 :                :  * whether we want to recycle rather than delete no-longer-wanted log files.
                               3849                 :                :  *
                               3850                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled
                               3851                 :                :  * segments should be reused for this timeline.
                               3852                 :                :  */
                               3853                 :                : static void
  891 rhaas@postgresql.org     3854                 :           1148 : RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
                               3855                 :                :                    TimeLineID insertTLI)
                               3856                 :                : {
                               3857                 :                :     DIR        *xldir;
                               3858                 :                :     struct dirent *xlde;
                               3859                 :                :     char        lastoff[MAXFNAMELEN];
                               3860                 :                :     XLogSegNo   endlogSegNo;
                               3861                 :                :     XLogSegNo   recycleSegNo;
                               3862                 :                : 
                               3863                 :                :     /* Initialize info about where to try to recycle to */
 1185 michael@paquier.xyz      3864                 :           1148 :     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
                               3865                 :           1148 :     recycleSegNo = XLOGfileslop(lastredoptr);
                               3866                 :                : 
                               3867                 :                :     /*
                               3868                 :                :      * Construct a filename of the last segment to be kept. The timeline ID
                               3869                 :                :      * doesn't matter because the comparison ignores it. (During recovery,
                               3870                 :                :      * InsertTimeLineID isn't set, so we can't use that.)
                               3871                 :                :      */
 2399 andres@anarazel.de       3872                 :           1148 :     XLogFileName(lastoff, 0, segno, wal_segment_size);
                               3873                 :                : 
 4976 simon@2ndQuadrant.co     3874         [ +  + ]:           1148 :     elog(DEBUG2, "attempting to remove WAL segments older than log file %s",
                               3875                 :                :          lastoff);
                               3876                 :                : 
 2323 tgl@sss.pgh.pa.us        3877                 :           1148 :     xldir = AllocateDir(XLOGDIR);
                               3878                 :                : 
 6859                          3879         [ +  + ]:           9703 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3880                 :                :     {
                               3881                 :                :         /* Ignore files that are not XLOG segments */
 3264 heikki.linnakangas@i     3882         [ +  + ]:           8555 :         if (!IsXLogFileName(xlde->d_name) &&
                               3883         [ +  + ]:           4724 :             !IsPartialXLogFileName(xlde->d_name))
 3289                          3884                 :           4720 :             continue;
                               3885                 :                : 
                               3886                 :                :         /*
                               3887                 :                :          * We ignore the timeline part of the XLOG segment identifiers in
                               3888                 :                :          * deciding whether a segment is still needed.  This ensures that we
                               3889                 :                :          * won't prematurely remove a segment from a parent timeline. We could
                               3890                 :                :          * probably be a little more proactive about removing segments of
                               3891                 :                :          * non-parent timelines, but that would be a whole lot more
                               3892                 :                :          * complicated.
                               3893                 :                :          *
                               3894                 :                :          * We use the alphanumeric sorting property of the filenames to decide
                               3895                 :                :          * which ones are earlier than the lastoff segment.
                               3896                 :                :          */
                               3897         [ +  + ]:           3835 :         if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
                               3898                 :                :         {
 4076                          3899         [ +  + ]:            765 :             if (XLogArchiveCheckDone(xlde->d_name))
                               3900                 :                :             {
                               3901                 :                :                 /* Update the last removed location in shared memory first */
 5116                          3902                 :            749 :                 UpdateLastRemovedPtr(xlde->d_name);
                               3903                 :                : 
  590 michael@paquier.xyz      3904                 :            749 :                 RemoveXlogFile(xlde, recycleSegNo, &endlogSegNo, insertTLI);
                               3905                 :                :             }
                               3906                 :                :         }
                               3907                 :                :     }
                               3908                 :                : 
 3289 heikki.linnakangas@i     3909                 :           1148 :     FreeDir(xldir);
                               3910                 :           1148 : }
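
The filename test in the loop above can be isolated into a small sketch. It assumes WAL segment names whose first 8 characters are the hexadecimal timeline ID, which is what the `+ 8` offset in the listing skips. The helper name `segment_is_old` and the macro `TLI_PREFIX_LEN` are hypothetical and do not appear in xlog.c.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed width of the timeline ID prefix in a WAL segment file name. */
#define TLI_PREFIX_LEN 8

/*
 * Hypothetical helper: returns true if 'name' sorts at or before 'lastoff'
 * once the timeline prefix of both names is skipped.  Both arguments must
 * be at least TLI_PREFIX_LEN characters long.
 */
static bool
segment_is_old(const char *name, const char *lastoff)
{
    return strcmp(name + TLI_PREFIX_LEN, lastoff + TLI_PREFIX_LEN) <= 0;
}
```

Because the timeline prefix is skipped, a segment on a different timeline is judged only by its log and segment number, which matches the comment's stated aim of not removing a parent-timeline segment prematurely.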
                               3911                 :                : 
                               3912                 :                : /*
                               3913                 :                :  * Recycle or remove WAL files that are not part of the given timeline's
                               3914                 :                :  * history.
                               3915                 :                :  *
                               3916                 :                :  * This is called during recovery, whenever we switch to follow a new
                               3917                 :                :  * timeline, and at the end of recovery when we create a new timeline. We
                               3918                 :                :  * wouldn't otherwise care about extra WAL files lying in pg_wal, but they
                               3919                 :                :  * might be leftover pre-allocated or recycled WAL segments on the old timeline
                               3920                 :                :  * that we haven't used yet, and contain garbage. If we just leave them in
                               3921                 :                :  * pg_wal, they will eventually be archived, and we can't let that happen.
                               3922                 :                :  * Files that belong to our timeline history are valid, because we have
                               3923                 :                :  * successfully replayed them, but from others we can't be sure.
                               3924                 :                :  *
                               3925                 :                :  * 'switchpoint' is the current point in WAL where we switch to the new
                               3926                 :                :  * timeline, and 'newTLI' is the new timeline we switch to.
                               3927                 :                :  */
                               3928                 :                : void
                               3929                 :             72 : RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
                               3930                 :                : {
                               3931                 :                :     DIR        *xldir;
                               3932                 :                :     struct dirent *xlde;
                               3933                 :                :     char        switchseg[MAXFNAMELEN];
                               3934                 :                :     XLogSegNo   endLogSegNo;
                               3935                 :                :     XLogSegNo   switchLogSegNo;
                               3936                 :                :     XLogSegNo   recycleSegNo;
                               3937                 :                : 
                               3938                 :                :     /*
                               3939                 :                :      * Initialize info about where to begin the work.  This will recycle,
                               3940                 :                :      * somewhat arbitrarily, 10 future segments.
                               3941                 :                :      */
 1185 michael@paquier.xyz      3942                 :             72 :     XLByteToPrevSeg(switchpoint, switchLogSegNo, wal_segment_size);
                               3943                 :             72 :     XLByteToSeg(switchpoint, endLogSegNo, wal_segment_size);
                               3944                 :             72 :     recycleSegNo = endLogSegNo + 10;
                               3945                 :                : 
                               3946                 :                :     /*
                               3947                 :                :      * Construct a filename of the last segment to be kept.
                               3948                 :                :      */
                               3949                 :             72 :     XLogFileName(switchseg, newTLI, switchLogSegNo, wal_segment_size);
                               3950                 :                : 
 3289 heikki.linnakangas@i     3951         [ +  + ]:             72 :     elog(DEBUG2, "attempting to remove WAL segments newer than log file %s",
                               3952                 :                :          switchseg);
                               3953                 :                : 
 2323 tgl@sss.pgh.pa.us        3954                 :             72 :     xldir = AllocateDir(XLOGDIR);
                               3955                 :                : 
 3289 heikki.linnakangas@i     3956         [ +  + ]:            680 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3957                 :                :     {
                               3958                 :                :         /* Ignore files that are not XLOG segments */
 3264                          3959         [ +  + ]:            608 :         if (!IsXLogFileName(xlde->d_name))
 3289                          3960                 :            381 :             continue;
                               3961                 :                : 
                               3962                 :                :         /*
                               3963                 :                :          * Remove files that are on a timeline older than the new one we're
                               3964                 :                :          * switching to, but with a segment number >= the first segment on the
                               3965                 :                :          * new timeline.
                               3966                 :                :          */
                               3967         [ +  + ]:            227 :         if (strncmp(xlde->d_name, switchseg, 8) < 0 &&
                               3968         [ +  + ]:            147 :             strcmp(xlde->d_name + 8, switchseg + 8) > 0)
                               3969                 :                :         {
                               3970                 :                :             /*
                               3971                 :                :              * If the file has already been marked as .ready, however, don't
                               3972                 :                :              * remove it yet. It should be OK to remove it - files that are
                               3973                 :                :              * not part of our timeline history are not required for recovery
                               3974                 :                :              * - but it seems safer to let them be archived and removed later.
                               3975                 :                :              */
                               3976         [ +  - ]:             18 :             if (!XLogArchiveIsReady(xlde->d_name))
  590 michael@paquier.xyz      3977                 :             18 :                 RemoveXlogFile(xlde, recycleSegNo, &endLogSegNo, newTLI);
                               3978                 :                :         }
                               3979                 :                :     }
                               3980                 :                : 
 3289 heikki.linnakangas@i     3981                 :             72 :     FreeDir(xldir);
                               3982                 :             72 : }
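
The two-part comparison used by RemoveNonParentXlogFiles can be sketched on its own. This is an illustration only: the helper name `segment_is_non_parent` is hypothetical, and it assumes 24-character segment names with an 8-character timeline prefix, as the listing's offsets imply.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if 'name' is on an older timeline than
 * 'switchseg' (first 8 characters compare lower) and its segment part sorts
 * strictly after the segment part of 'switchseg'.
 */
static bool
segment_is_non_parent(const char *name, const char *switchseg)
{
    return strncmp(name, switchseg, 8) < 0 &&
           strcmp(name + 8, switchseg + 8) > 0;
}
```

Since `switchseg` names the last segment to be kept, the strict `>` on the segment part leaves that segment alone and selects only later ones.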
                               3983                 :                : 
                               3984                 :                : /*
                               3985                 :                :  * Recycle or remove a log file that's no longer needed.
                               3986                 :                :  *
                               3987                 :                :  * segment_de is the dirent structure of the segment to recycle or remove.
                               3988                 :                :  * recycleSegNo is the segment number to recycle up to.  endlogSegNo is
                               3989                 :                :  * the segment number of the current (or recent) end of WAL.
                               3990                 :                :  *
                               3991                 :                :  * endlogSegNo is incremented if the segment is recycled, so that it is not
                               3992                 :                :  * checked again by future callers of this function.
                               3993                 :                :  *
                               3994                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled segments
                               3995                 :                :  * should be used for this timeline.
                               3996                 :                :  */
                               3997                 :                : static void
  590 michael@paquier.xyz      3998                 :            767 : RemoveXlogFile(const struct dirent *segment_de,
                               3999                 :                :                XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                               4000                 :                :                TimeLineID insertTLI)
                               4001                 :                : {
                               4002                 :                :     char        path[MAXPGPATH];
                               4003                 :                : #ifdef WIN32
                               4004                 :                :     char        newpath[MAXPGPATH];
                               4005                 :                : #endif
                               4006                 :            767 :     const char *segname = segment_de->d_name;
                               4007                 :                : 
 3289 heikki.linnakangas@i     4008                 :            767 :     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
                               4009                 :                : 
                               4010                 :                :     /*
                               4011                 :                :      * Before deleting the file, see if it can be recycled as a future log
                               4012                 :                :      * segment. Only recycle normal files, because we don't want to recycle
                               4013                 :                :      * symbolic links pointing to a separate archive directory.
                               4014                 :                :      */
 1839 tmunro@postgresql.or     4015         [ +  - ]:            767 :     if (wal_recycle &&
 1185 michael@paquier.xyz      4016         [ +  + ]:            767 :         *endlogSegNo <= recycleSegNo &&
 1021 noah@leadboat.com        4017   [ +  +  +  - ]:           1441 :         XLogCtl->InstallXLogFileSegmentActive && /* callee rechecks this */
  590 michael@paquier.xyz      4018         [ +  + ]:           1436 :         get_dirent_type(path, segment_de, false, DEBUG2) == PGFILETYPE_REG &&
 1185                          4019                 :            718 :         InstallXLogFileSegment(endlogSegNo, path,
                               4020                 :                :                                true, recycleSegNo, insertTLI))
                               4021                 :                :     {
 3289 heikki.linnakangas@i     4022         [ +  + ]:            709 :         ereport(DEBUG2,
                               4023                 :                :                 (errmsg_internal("recycled write-ahead log file \"%s\"",
                               4024                 :                :                                  segname)));
                               4025                 :            709 :         CheckpointStats.ckpt_segs_recycled++;
                               4026                 :                :         /* Needn't recheck that slot on future iterations */
 1185 michael@paquier.xyz      4027                 :            709 :         (*endlogSegNo)++;
                               4028                 :                :     }
                               4029                 :                :     else
                               4030                 :                :     {
                               4031                 :                :         /* No need for any more future segments, or recycling failed ... */
                               4032                 :                :         int         rc;
                               4033                 :                : 
 3289 heikki.linnakangas@i     4034         [ -  + ]:             58 :         ereport(DEBUG2,
                               4035                 :                :                 (errmsg_internal("removing write-ahead log file \"%s\"",
                               4036                 :                :                                  segname)));
                               4037                 :                : 
                               4038                 :                : #ifdef WIN32
                               4039                 :                : 
                               4040                 :                :         /*
                               4041                 :                :          * On Windows, if another process (e.g. another backend) holds the file
                               4042                 :                :          * open in FILE_SHARE_DELETE mode, unlink will succeed, but the file
                               4043                 :                :          * will still show up in directory listing until the last handle is
                               4044                 :                :          * closed. To avoid mistaking the lingering deleted file for a live
                               4045                 :                :          * WAL file that needs to be archived, rename it before deleting it.
                               4046                 :                :          *
                               4047                 :                :          * If another process holds the file open without the FILE_SHARE_DELETE
                               4048                 :                :          * flag, rename will fail. We'll try again at the next checkpoint.
                               4049                 :                :          */
                               4050                 :                :         snprintf(newpath, MAXPGPATH, "%s.deleted", path);
                               4051                 :                :         if (rename(path, newpath) != 0)
                               4052                 :                :         {
                               4053                 :                :             ereport(LOG,
                               4054                 :                :                     (errcode_for_file_access(),
                               4055                 :                :                      errmsg("could not rename file \"%s\": %m",
                               4056                 :                :                             path)));
                               4057                 :                :             return;
                               4058                 :                :         }
                               4059                 :                :         rc = durable_unlink(newpath, LOG);
                               4060                 :                : #else
 2575 teodor@sigaev.ru         4061                 :             58 :         rc = durable_unlink(path, LOG);
                               4062                 :                : #endif
 3289 heikki.linnakangas@i     4063         [ -  + ]:             58 :         if (rc != 0)
                               4064                 :                :         {
                               4065                 :                :             /* Message already logged by durable_unlink() */
 3289 heikki.linnakangas@i     4066                 :UBC           0 :             return;
                               4067                 :                :         }
 3289 heikki.linnakangas@i     4068                 :CBC          58 :         CheckpointStats.ckpt_segs_removed++;
                               4069                 :                :     }
                               4070                 :                : 
                               4071                 :            767 :     XLogArchiveCleanup(segname);
                               4072                 :                : }
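
The recycle-or-remove choice made in RemoveXlogFile can be reduced to a pure decision function. The sketch below is deliberately simplified: it leaves out the `XLogCtl->InstallXLogFileSegmentActive` check and the possibility that `InstallXLogFileSegment()` itself fails, both of which also send the real code down the removal path. The names `SegNo`, `SegAction` and `choose_segment_action` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SegNo;         /* stand-in for XLogSegNo (assumption) */

typedef enum
{
    SEG_RECYCLE,                /* try to reuse the file as a future segment */
    SEG_REMOVE                  /* unlink the file */
} SegAction;

/*
 * Hypothetical helper: recycling is attempted only if it is enabled, more
 * future segments are still wanted, and the file is a regular file (not,
 * for example, a symbolic link).  Otherwise the file is removed.
 */
static SegAction
choose_segment_action(bool wal_recycle, SegNo endlogSegNo,
                      SegNo recycleSegNo, bool is_regular_file)
{
    if (wal_recycle && endlogSegNo <= recycleSegNo && is_regular_file)
        return SEG_RECYCLE;
    return SEG_REMOVE;
}
```

In the listing, each successful recycle increments `*endlogSegNo`, so repeated calls eventually push it past `recycleSegNo` and later files are removed instead.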
                               4073                 :                : 
                               4074                 :                : /*
                               4075                 :                :  * Verify whether pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               4076                 :                :  * If the latter two do not exist, recreate them.
                               4077                 :                :  *
                               4078                 :                :  * It is not the goal of this function to verify the contents of these
                               4079                 :                :  * directories, but to help in cases where someone has performed a cluster
                               4080                 :                :  * copy for PITR purposes but omitted pg_wal from the copy.
                               4081                 :                :  *
                               4082                 :                :  * We could also recreate pg_wal if it doesn't exist, but a deliberate
                               4083                 :                :  * policy decision was made not to.  It is fairly common for pg_wal to be
                               4084                 :                :  * a symlink, and if that was the DBA's intent then automatically making a
                               4085                 :                :  * plain directory would result in degraded performance with no notice.
                               4086                 :                :  */
                               4087                 :                : static void
 5635 tgl@sss.pgh.pa.us        4088                 :            823 : ValidateXLOGDirectoryStructure(void)
                               4089                 :                : {
                               4090                 :                :     char        path[MAXPGPATH];
                               4091                 :                :     struct stat stat_buf;
                               4092                 :                : 
                               4093                 :                :     /* Check for pg_wal; if it doesn't exist, error out */
                               4094         [ +  - ]:            823 :     if (stat(XLOGDIR, &stat_buf) != 0 ||
                               4095         [ -  + ]:            823 :         !S_ISDIR(stat_buf.st_mode))
 5421 bruce@momjian.us         4096         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4097                 :                :                 (errcode_for_file_access(),
                               4098                 :                :                  errmsg("required WAL directory \"%s\" does not exist",
                               4099                 :                :                         XLOGDIR)));
                               4100                 :                : 
                               4101                 :                :     /* Check for archive_status */
 5635 tgl@sss.pgh.pa.us        4102                 :CBC         823 :     snprintf(path, MAXPGPATH, XLOGDIR "/archive_status");
                               4103         [ +  + ]:            823 :     if (stat(path, &stat_buf) == 0)
                               4104                 :                :     {
                               4105                 :                :         /* Check for weird cases where it exists but isn't a directory */
 5635 tgl@sss.pgh.pa.us        4106         [ -  + ]:GNC         821 :         if (!S_ISDIR(stat_buf.st_mode))
 5421 bruce@momjian.us         4107         [ #  # ]:UNC           0 :             ereport(FATAL,
                               4108                 :                :                     (errcode_for_file_access(),
                               4109                 :                :                      errmsg("required WAL directory \"%s\" does not exist",
                               4110                 :                :                             path)));
                               4111                 :                :     }
                               4112                 :                :     else
                               4113                 :                :     {
 5635 tgl@sss.pgh.pa.us        4114         [ +  - ]:GNC           2 :         ereport(LOG,
                               4115                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
 2199 sfrost@snowman.net       4116         [ -  + ]:              2 :         if (MakePGDirectory(path) < 0)
 5421 bruce@momjian.us         4117         [ #  # ]:UNC           0 :             ereport(FATAL,
                               4118                 :                :                     (errcode_for_file_access(),
                               4119                 :                :                      errmsg("could not create missing directory \"%s\": %m",
                               4120                 :                :                             path)));
                               4121                 :                :     }
                               4122                 :                : 
                               4123                 :                :     /* Check for summaries */
  116 rhaas@postgresql.org     4124                 :GNC         823 :     snprintf(path, MAXPGPATH, XLOGDIR "/summaries");
                               4125         [ +  + ]:            823 :     if (stat(path, &stat_buf) == 0)
                               4126                 :                :     {
                               4127                 :                :         /* Check for weird cases where it exists but isn't a directory */
  116 rhaas@postgresql.org     4128         [ -  + ]:CBC         821 :         if (!S_ISDIR(stat_buf.st_mode))
  116 rhaas@postgresql.org     4129         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4130                 :                :                     (errmsg("required WAL directory \"%s\" does not exist",
                               4131                 :                :                             path)));
                               4132                 :                :     }
                               4133                 :                :     else
                               4134                 :                :     {
  116 rhaas@postgresql.org     4135         [ +  - ]:GBC           2 :         ereport(LOG,
                               4136                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
                               4137         [ -  + ]:              2 :         if (MakePGDirectory(path) < 0)
  116 rhaas@postgresql.org     4138         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4139                 :                :                     (errmsg("could not create missing directory \"%s\": %m",
                               4140                 :                :                             path)));
                               4141                 :                :     }
 5635 tgl@sss.pgh.pa.us        4142                 :CBC         823 : }
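
The check-then-create pattern applied to archive_status and summaries can be sketched with plain POSIX calls. This is not the PostgreSQL implementation: it uses `mkdir()` with a fixed mode of 0700 where the listing calls `MakePGDirectory()`, and it returns a status code rather than reporting through `ereport()`. The name `ensure_directory` is hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper.  Returns:
 *   0  if 'path' already exists and is a directory,
 *   1  if 'path' was missing and has been created,
 *  -1  if 'path' exists but is not a directory, or creation failed.
 *
 * As in the listing, any stat() failure is treated as "missing".
 */
static int
ensure_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return S_ISDIR(st.st_mode) ? 0 : -1;

    /* 0700 is an assumed mode for this sketch */
    if (mkdir(path, 0700) < 0)
        return -1;

    return 1;
}
```

A caller could map the return value of -1 to a fatal error and 1 to a log message, mirroring the FATAL and LOG reports in the function above.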
                               4143                 :                : 
                               4144                 :                : /*
                               4145                 :                :  * Remove previous backup history files.  This also retries creation of
                               4146                 :                :  * .ready files for any backup history files for which XLogArchiveNotify
                               4147                 :                :  * failed earlier.
                               4148                 :                :  */
                               4149                 :                : static void
 6506                          4150                 :            138 : CleanupBackupHistory(void)
                               4151                 :                : {
                               4152                 :                :     DIR        *xldir;
                               4153                 :                :     struct dirent *xlde;
                               4154                 :                :     char        path[MAXPGPATH + sizeof(XLOGDIR)];
                               4155                 :                : 
 6859                          4156                 :            138 :     xldir = AllocateDir(XLOGDIR);
                               4157                 :                : 
                               4158         [ +  + ]:           1383 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               4159                 :                :     {
 3264 heikki.linnakangas@i     4160         [ +  + ]:           1107 :         if (IsBackupHistoryFileName(xlde->d_name))
                               4161                 :                :         {
 5697 tgl@sss.pgh.pa.us        4162         [ +  + ]:            144 :             if (XLogArchiveCheckDone(xlde->d_name))
                               4163                 :                :             {
 2529 peter_e@gmx.net          4164         [ +  + ]:            120 :                 elog(DEBUG2, "removing WAL backup history file \"%s\"",
                               4165                 :                :                      xlde->d_name);
 2560                          4166                 :            120 :                 snprintf(path, sizeof(path), XLOGDIR "/%s", xlde->d_name);
 6878 bruce@momjian.us         4167                 :            120 :                 unlink(path);
                               4168                 :            120 :                 XLogArchiveCleanup(xlde->d_name);
                               4169                 :                :             }
                               4170                 :                :         }
                               4171                 :                :     }
                               4172                 :                : 
                               4173                 :            138 :     FreeDir(xldir);
                               4174                 :            138 : }
                               4175                 :                : 
                               4176                 :                : /*
                               4177                 :                :  * I/O routines for pg_control
                               4178                 :                :  *
                               4179                 :                :  * *ControlFile is a buffer in shared memory that holds an image of the
                               4180                 :                :  * contents of pg_control.  WriteControlFile() initializes pg_control
                               4181                 :                :  * given a preloaded buffer, ReadControlFile() loads the buffer from
                               4182                 :                :  * the pg_control file (during postmaster or standalone-backend startup),
                               4183                 :                :  * and UpdateControlFile() rewrites pg_control after we modify xlog state.
                               4184                 :                :  * InitControlFile() fills the buffer with initial values.
                               4185                 :                :  *
                               4186                 :                :  * For simplicity, WriteControlFile() initializes the fields of pg_control
                               4187                 :                :  * that are related to checking backend/database compatibility, and
                               4188                 :                :  * ReadControlFile() verifies they are correct.  We could split out the
                               4189                 :                :  * I/O and compatibility-check functions, but there seems no need currently.
                               4190                 :                :  */
                               4191                 :                : 
                               4192                 :                : static void
  788 heikki.linnakangas@i     4193                 :             39 : InitControlFile(uint64 sysidentifier)
                               4194                 :                : {
                               4195                 :                :     char        mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
                               4196                 :                : 
                               4197                 :                :     /*
                               4198                 :                :      * Generate a random nonce. This is used for authentication requests that
                               4199                 :                :      * will fail because the user does not exist. The nonce is used to create
                               4200                 :                :      * a genuine-looking password challenge for the non-existent user, in lieu
                               4201                 :                :      * of an actual stored password.
                               4202                 :                :      */
                               4203         [ -  + ]:             39 :     if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
  788 heikki.linnakangas@i     4204         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4205                 :                :                 (errcode(ERRCODE_INTERNAL_ERROR),
                               4206                 :                :                  errmsg("could not generate secret authorization token")));
                               4207                 :                : 
  788 heikki.linnakangas@i     4208                 :CBC          39 :     memset(ControlFile, 0, sizeof(ControlFileData));
                               4209                 :                :     /* Initialize pg_control status fields */
                               4210                 :             39 :     ControlFile->system_identifier = sysidentifier;
                               4211                 :             39 :     memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
                               4212                 :             39 :     ControlFile->state = DB_SHUTDOWNED;
                               4213                 :             39 :     ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
                               4214                 :                : 
                               4215                 :                :     /* Set important parameter values for use when replaying WAL */
 1518 peter@eisentraut.org     4216                 :             39 :     ControlFile->MaxConnections = MaxConnections;
                               4217                 :             39 :     ControlFile->max_worker_processes = max_worker_processes;
                               4218                 :             39 :     ControlFile->max_wal_senders = max_wal_senders;
                               4219                 :             39 :     ControlFile->max_prepared_xacts = max_prepared_xacts;
                               4220                 :             39 :     ControlFile->max_locks_per_xact = max_locks_per_xact;
                               4221                 :             39 :     ControlFile->wal_level = wal_level;
                               4222                 :             39 :     ControlFile->wal_log_hints = wal_log_hints;
                               4223                 :             39 :     ControlFile->track_commit_timestamp = track_commit_timestamp;
                               4224                 :             39 :     ControlFile->data_checksum_version = bootstrap_data_checksum_version;
                               4225                 :             39 : }
                               4226                 :                : 
                               4227                 :                : static void
 8541 tgl@sss.pgh.pa.us        4228                 :             39 : WriteControlFile(void)
                               4229                 :                : {
                               4230                 :                :     int         fd;
                               4231                 :                :     char        buffer[PG_CONTROL_FILE_SIZE];   /* need not be aligned */
                               4232                 :                : 
                               4233                 :                :     /*
                               4234                 :                :      * Initialize version and compatibility-check fields
                               4235                 :                :      */
 8433                          4236                 :             39 :     ControlFile->pg_control_version = PG_CONTROL_VERSION;
                               4237                 :             39 :     ControlFile->catalog_version_no = CATALOG_VERSION_NO;
                               4238                 :                : 
 6768                          4239                 :             39 :     ControlFile->maxAlign = MAXIMUM_ALIGNOF;
                               4240                 :             39 :     ControlFile->floatFormat = FLOATFORMAT_VALUE;
                               4241                 :                : 
 8541                          4242                 :             39 :     ControlFile->blcksz = BLCKSZ;
                               4243                 :             39 :     ControlFile->relseg_size = RELSEG_SIZE;
 6586                          4244                 :             39 :     ControlFile->xlog_blcksz = XLOG_BLCKSZ;
 2399 andres@anarazel.de       4245                 :             39 :     ControlFile->xlog_seg_size = wal_segment_size;
                               4246                 :                : 
 8029 lockhart@fourpalms.o     4247                 :             39 :     ControlFile->nameDataLen = NAMEDATALEN;
 6956 tgl@sss.pgh.pa.us        4248                 :             39 :     ControlFile->indexMaxKeys = INDEX_MAX_KEYS;
                               4249                 :                : 
 6221                          4250                 :             39 :     ControlFile->toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
 3601                          4251                 :             39 :     ControlFile->loblksize = LOBLKSIZE;
                               4252                 :                : 
 5837                          4253                 :             39 :     ControlFile->float8ByVal = FLOAT8PASSBYVAL;
                               4254                 :                : 
                               4255                 :                :     /* Contents are protected with a CRC */
 3449 heikki.linnakangas@i     4256                 :             39 :     INIT_CRC32C(ControlFile->crc);
                               4257                 :             39 :     COMP_CRC32C(ControlFile->crc,
                               4258                 :                :                 (char *) ControlFile,
                               4259                 :                :                 offsetof(ControlFileData, crc));
                               4260                 :             39 :     FIN_CRC32C(ControlFile->crc);
                               4261                 :                : 
                               4262                 :                :     /*
                               4263                 :                :      * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
                               4264                 :                :      * the excess over sizeof(ControlFileData).  This reduces the odds of
                               4265                 :                :      * premature-EOF errors when reading pg_control.  We'll still fail when we
                               4266                 :                :      * check the contents of the file, but hopefully with a more specific
                               4267                 :                :      * error than "couldn't read pg_control".
                               4268                 :                :      */
 2461 tgl@sss.pgh.pa.us        4269                 :             39 :     memset(buffer, 0, PG_CONTROL_FILE_SIZE);
 8541                          4270                 :             39 :     memcpy(buffer, ControlFile, sizeof(ControlFileData));
                               4271                 :                : 
 6859                          4272                 :             39 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4273                 :                :                        O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 8541                          4274         [ -  + ]:             39 :     if (fd < 0)
 7573 tgl@sss.pgh.pa.us        4275         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4276                 :                :                 (errcode_for_file_access(),
                               4277                 :                :                  errmsg("could not create file \"%s\": %m",
                               4278                 :                :                         XLOG_CONTROL_FILE)));
                               4279                 :                : 
 8348 tgl@sss.pgh.pa.us        4280                 :CBC          39 :     errno = 0;
 2584 rhaas@postgresql.org     4281                 :             39 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE);
 2461 tgl@sss.pgh.pa.us        4282         [ -  + ]:             39 :     if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
                               4283                 :                :     {
                               4284                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8348 tgl@sss.pgh.pa.us        4285         [ #  # ]:UBC           0 :         if (errno == 0)
                               4286                 :              0 :             errno = ENOSPC;
 7573                          4287         [ #  # ]:              0 :         ereport(PANIC,
                               4288                 :                :                 (errcode_for_file_access(),
                               4289                 :                :                  errmsg("could not write to file \"%s\": %m",
                               4290                 :                :                         XLOG_CONTROL_FILE)));
                               4291                 :                :     }
 2584 rhaas@postgresql.org     4292                 :CBC          39 :     pgstat_report_wait_end();
                               4293                 :                : 
                               4294                 :             39 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC);
 8528 tgl@sss.pgh.pa.us        4295         [ -  + ]:             39 :     if (pg_fsync(fd) != 0)
 7573 tgl@sss.pgh.pa.us        4296         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4297                 :                :                 (errcode_for_file_access(),
                               4298                 :                :                  errmsg("could not fsync file \"%s\": %m",
                               4299                 :                :                         XLOG_CONTROL_FILE)));
 2584 rhaas@postgresql.org     4300                 :CBC          39 :     pgstat_report_wait_end();
                               4301                 :                : 
 1744 peter@eisentraut.org     4302         [ -  + ]:             39 :     if (close(fd) != 0)
 7384 tgl@sss.pgh.pa.us        4303         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4304                 :                :                 (errcode_for_file_access(),
                               4305                 :                :                  errmsg("could not close file \"%s\": %m",
                               4306                 :                :                         XLOG_CONTROL_FILE)));
 8541 tgl@sss.pgh.pa.us        4307                 :CBC          39 : }
                               4308                 :                : 
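A minimal sketch of the write-side steps in WriteControlFile() above: compute a CRC over every byte that precedes the crc field, then copy the struct into a zero-filled, fixed-size buffer. DemoControl, DEMO_FILE_SIZE, demo_crc32c(), demo_seal() and demo_verify() are hypothetical names used only for illustration. The bitwise CRC-32C loop is a stand-in for PostgreSQL's INIT_CRC32C/COMP_CRC32C/FIN_CRC32C macros, not their implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FILE_SIZE 64		/* hypothetical on-disk size, larger than the struct */

/* Hypothetical stand-in for ControlFileData; crc must be the last field. */
typedef struct DemoControl
{
	uint32_t	version;
	uint32_t	block_size;
	uint32_t	crc;
} DemoControl;

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t
demo_crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t	crc = 0xFFFFFFFFu;

	while (len--)
	{
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Fill in the CRC, then build the zero-padded image that would be written. */
static void
demo_seal(DemoControl *ctl, unsigned char image[DEMO_FILE_SIZE])
{
	assert(sizeof(DemoControl) <= DEMO_FILE_SIZE);

	/* The checksum covers everything before the crc field, not the field itself. */
	ctl->crc = demo_crc32c(ctl, offsetof(DemoControl, crc));

	memset(image, 0, DEMO_FILE_SIZE);
	memcpy(image, ctl, sizeof(DemoControl));
}

/* Return 1 if the stored CRC matches the recomputed one, else 0. */
static int
demo_verify(const unsigned char image[DEMO_FILE_SIZE])
{
	DemoControl ctl;

	/* Only the leading sizeof(DemoControl) bytes are interpreted. */
	memcpy(&ctl, image, sizeof(DemoControl));
	return demo_crc32c(&ctl, offsetof(DemoControl, crc)) == ctl.crc;
}
```

Because the checksum covers only offsetof(DemoControl, crc) bytes, the zero padding is not protected by it. A reader can therefore verify the image from the leading sizeof(DemoControl) bytes alone.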
                               4309                 :                : static void
                               4310                 :            826 : ReadControlFile(void)
                               4311                 :                : {
                               4312                 :                :     pg_crc32c   crc;
                               4313                 :                :     int         fd;
                               4314                 :                :     static char wal_segsz_str[20];
                               4315                 :                :     int         r;
                               4316                 :                : 
                               4317                 :                :     /*
                               4318                 :                :      * Read data...
                               4319                 :                :      */
 6859                          4320                 :            826 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4321                 :                :                        O_RDWR | PG_BINARY);
 8541                          4322         [ -  + ]:            826 :     if (fd < 0)
 7573 tgl@sss.pgh.pa.us        4323         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4324                 :                :                 (errcode_for_file_access(),
                               4325                 :                :                  errmsg("could not open file \"%s\": %m",
                               4326                 :                :                         XLOG_CONTROL_FILE)));
                               4327                 :                : 
 2584 rhaas@postgresql.org     4328                 :CBC         826 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
 2158 magnus@hagander.net      4329                 :            826 :     r = read(fd, ControlFile, sizeof(ControlFileData));
                               4330         [ -  + ]:            826 :     if (r != sizeof(ControlFileData))
                               4331                 :                :     {
 2158 magnus@hagander.net      4332         [ #  # ]:UBC           0 :         if (r < 0)
                               4333         [ #  # ]:              0 :             ereport(PANIC,
                               4334                 :                :                     (errcode_for_file_access(),
                               4335                 :                :                      errmsg("could not read file \"%s\": %m",
                               4336                 :                :                             XLOG_CONTROL_FILE)));
                               4337                 :                :         else
                               4338         [ #  # ]:              0 :             ereport(PANIC,
                               4339                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               4340                 :                :                      errmsg("could not read file \"%s\": read %d of %zu",
                               4341                 :                :                             XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
                               4342                 :                :     }
 2584 rhaas@postgresql.org     4343                 :CBC         826 :     pgstat_report_wait_end();
                               4344                 :                : 
 8541 tgl@sss.pgh.pa.us        4345                 :            826 :     close(fd);
                               4346                 :                : 
                               4347                 :                :     /*
                               4348                 :                :      * Check for expected pg_control format version.  If this is wrong, the
                               4349                 :                :      * CRC check will likely fail because we'll be checking the wrong number
                               4350                 :                :      * of bytes.  Complaining about wrong version will probably be more
                               4351                 :                :      * enlightening than complaining about wrong CRC.
                               4352                 :                :      */
                               4353                 :                : 
 5928 peter_e@gmx.net          4354   [ -  +  -  -  :            826 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0)
                                              -  - ]
 5928 peter_e@gmx.net          4355         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4356                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4357                 :                :                  errmsg("database files are incompatible with server"),
                               4358                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x),"
                               4359                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).",
                               4360                 :                :                            ControlFile->pg_control_version, ControlFile->pg_control_version,
                               4361                 :                :                            PG_CONTROL_VERSION, PG_CONTROL_VERSION),
                               4362                 :                :                  errhint("This could be a problem of mismatched byte ordering.  It looks like you need to initdb.")));
                               4363                 :                : 
 8433 tgl@sss.pgh.pa.us        4364         [ -  + ]:CBC         826 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 7573 tgl@sss.pgh.pa.us        4365         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4366                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4367                 :                :                  errmsg("database files are incompatible with server"),
                               4368                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d,"
                               4369                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d.",
                               4370                 :                :                            ControlFile->pg_control_version, PG_CONTROL_VERSION),
                               4371                 :                :                  errhint("It looks like you need to initdb.")));
                               4372                 :                : 
                               4373                 :                :     /* Now check the CRC. */
 3449 heikki.linnakangas@i     4374                 :CBC         826 :     INIT_CRC32C(crc);
                               4375                 :            826 :     COMP_CRC32C(crc,
                               4376                 :                :                 (char *) ControlFile,
                               4377                 :                :                 offsetof(ControlFileData, crc));
                               4378                 :            826 :     FIN_CRC32C(crc);
                               4379                 :                : 
                               4380         [ -  + ]:            826 :     if (!EQ_CRC32C(crc, ControlFile->crc))
 7573 tgl@sss.pgh.pa.us        4381         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4382                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4383                 :                :                  errmsg("incorrect checksum in control file")));
                               4384                 :                : 
                               4385                 :                :     /*
                               4386                 :                :      * Do compatibility checking immediately.  If the database isn't
                               4387                 :                :      * compatible with the backend executable, we want to abort before we can
                               4388                 :                :      * possibly do any damage.
                               4389                 :                :      */
 8433 tgl@sss.pgh.pa.us        4390         [ -  + ]:CBC         826 :     if (ControlFile->catalog_version_no != CATALOG_VERSION_NO)
 7573 tgl@sss.pgh.pa.us        4391         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4392                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4393                 :                :                  errmsg("database files are incompatible with server"),
                               4394                 :                :                  errdetail("The database cluster was initialized with CATALOG_VERSION_NO %d,"
                               4395                 :                :                            " but the server was compiled with CATALOG_VERSION_NO %d.",
                               4396                 :                :                            ControlFile->catalog_version_no, CATALOG_VERSION_NO),
                               4397                 :                :                  errhint("It looks like you need to initdb.")));
 6768 tgl@sss.pgh.pa.us        4398         [ -  + ]:CBC         826 :     if (ControlFile->maxAlign != MAXIMUM_ALIGNOF)
 6768 tgl@sss.pgh.pa.us        4399         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4400                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4401                 :                :                  errmsg("database files are incompatible with server"),
                               4402                 :                :                  errdetail("The database cluster was initialized with MAXALIGN %d,"
                               4403                 :                :                            " but the server was compiled with MAXALIGN %d.",
                               4404                 :                :                            ControlFile->maxAlign, MAXIMUM_ALIGNOF),
                               4405                 :                :                  errhint("It looks like you need to initdb.")));
 6768 tgl@sss.pgh.pa.us        4406         [ -  + ]:CBC         826 :     if (ControlFile->floatFormat != FLOATFORMAT_VALUE)
 6768 tgl@sss.pgh.pa.us        4407         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4408                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4409                 :                :                  errmsg("database files are incompatible with server"),
                               4410                 :                :                  errdetail("The database cluster appears to use a different floating-point number format than the server executable."),
                               4411                 :                :                  errhint("It looks like you need to initdb.")));
 8541 tgl@sss.pgh.pa.us        4412         [ -  + ]:CBC         826 :     if (ControlFile->blcksz != BLCKSZ)
 7573 tgl@sss.pgh.pa.us        4413         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4414                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4415                 :                :                  errmsg("database files are incompatible with server"),
                               4416                 :                :                  errdetail("The database cluster was initialized with BLCKSZ %d,"
                               4417                 :                :                            " but the server was compiled with BLCKSZ %d.",
                               4418                 :                :                            ControlFile->blcksz, BLCKSZ),
                               4419                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 8541 tgl@sss.pgh.pa.us        4420         [ -  + ]:CBC         826 :     if (ControlFile->relseg_size != RELSEG_SIZE)
 7573 tgl@sss.pgh.pa.us        4421         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4422                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4423                 :                :                  errmsg("database files are incompatible with server"),
                               4424                 :                :                  errdetail("The database cluster was initialized with RELSEG_SIZE %d,"
                               4425                 :                :                            " but the server was compiled with RELSEG_SIZE %d.",
                               4426                 :                :                            ControlFile->relseg_size, RELSEG_SIZE),
                               4427                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 6586 tgl@sss.pgh.pa.us        4428         [ -  + ]:CBC         826 :     if (ControlFile->xlog_blcksz != XLOG_BLCKSZ)
 6586 tgl@sss.pgh.pa.us        4429         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4430                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4431                 :                :                  errmsg("database files are incompatible with server"),
                               4432                 :                :                  errdetail("The database cluster was initialized with XLOG_BLCKSZ %d,"
                               4433                 :                :                            " but the server was compiled with XLOG_BLCKSZ %d.",
                               4434                 :                :                            ControlFile->xlog_blcksz, XLOG_BLCKSZ),
                               4435                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 8029 lockhart@fourpalms.o     4436         [ -  + ]:CBC         826 :     if (ControlFile->nameDataLen != NAMEDATALEN)
 7573 tgl@sss.pgh.pa.us        4437         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4438                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4439                 :                :                  errmsg("database files are incompatible with server"),
                               4440                 :                :                  errdetail("The database cluster was initialized with NAMEDATALEN %d,"
                               4441                 :                :                            " but the server was compiled with NAMEDATALEN %d.",
                               4442                 :                :                            ControlFile->nameDataLen, NAMEDATALEN),
                               4443                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 6956 tgl@sss.pgh.pa.us        4444         [ -  + ]:CBC         826 :     if (ControlFile->indexMaxKeys != INDEX_MAX_KEYS)
 7573 tgl@sss.pgh.pa.us        4445         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4446                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4447                 :                :                  errmsg("database files are incompatible with server"),
                               4448                 :                :                  errdetail("The database cluster was initialized with INDEX_MAX_KEYS %d,"
                               4449                 :                :                            " but the server was compiled with INDEX_MAX_KEYS %d.",
                               4450                 :                :                            ControlFile->indexMaxKeys, INDEX_MAX_KEYS),
                               4451                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 6221 tgl@sss.pgh.pa.us        4452         [ -  + ]:CBC         826 :     if (ControlFile->toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE)
 6221 tgl@sss.pgh.pa.us        4453         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4454                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4455                 :                :                  errmsg("database files are incompatible with server"),
                               4456                 :                :                  errdetail("The database cluster was initialized with TOAST_MAX_CHUNK_SIZE %d,"
                               4457                 :                :                            " but the server was compiled with TOAST_MAX_CHUNK_SIZE %d.",
                               4458                 :                :                            ControlFile->toast_max_chunk_size, (int) TOAST_MAX_CHUNK_SIZE),
                               4459                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 3601 tgl@sss.pgh.pa.us        4460         [ -  + ]:CBC         826 :     if (ControlFile->loblksize != LOBLKSIZE)
 3601 tgl@sss.pgh.pa.us        4461         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4462                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4463                 :                :                  errmsg("database files are incompatible with server"),
                               4464                 :                :                  errdetail("The database cluster was initialized with LOBLKSIZE %d,"
                               4465                 :                :                            " but the server was compiled with LOBLKSIZE %d.",
                               4466                 :                :                            ControlFile->loblksize, (int) LOBLKSIZE),
                               4467                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4468                 :                : 
                               4469                 :                : #ifdef USE_FLOAT8_BYVAL
 5837 tgl@sss.pgh.pa.us        4470         [ -  + ]:CBC         826 :     if (ControlFile->float8ByVal != true)
 5837 tgl@sss.pgh.pa.us        4471         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4472                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4473                 :                :                  errmsg("database files are incompatible with server"),
                               4474                 :                :                  errdetail("The database cluster was initialized without USE_FLOAT8_BYVAL"
                               4475                 :                :                            " but the server was compiled with USE_FLOAT8_BYVAL."),
                               4476                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4477                 :                : #else
                               4478                 :                :     if (ControlFile->float8ByVal != false)
                               4479                 :                :         ereport(FATAL,
                               4480                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4481                 :                :                  errmsg("database files are incompatible with server"),
                               4482                 :                :                  errdetail("The database cluster was initialized with USE_FLOAT8_BYVAL"
                               4483                 :                :                            " but the server was compiled without USE_FLOAT8_BYVAL."),
                               4484                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4485                 :                : #endif
                               4486                 :                : 
 2399 andres@anarazel.de       4487                 :CBC         826 :     wal_segment_size = ControlFile->xlog_seg_size;
                               4488                 :                : 
                               4489   [ +  -  +  -  :            826 :     if (!IsValidWalSegSize(wal_segment_size))
                                        +  -  -  + ]
 2399 andres@anarazel.de       4490         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4491                 :                :                         errmsg_plural("invalid WAL segment size in control file (%d byte)",
                               4492                 :                :                                       "invalid WAL segment size in control file (%d bytes)",
                               4493                 :                :                                       wal_segment_size,
                               4494                 :                :                                       wal_segment_size),
                               4495                 :                :                         errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.")));
                               4496                 :                : 
 2399 andres@anarazel.de       4497                 :CBC         826 :     snprintf(wal_segsz_str, sizeof(wal_segsz_str), "%d", wal_segment_size);
                               4498                 :            826 :     SetConfigOption("wal_segment_size", wal_segsz_str, PGC_INTERNAL,
                               4499                 :                :                     PGC_S_DYNAMIC_DEFAULT);
                               4500                 :                : 
                               4501                 :                :     /* check and update variables dependent on wal_segment_size */
                               4502         [ -  + ]:            826 :     if (ConvertToXSegs(min_wal_size_mb, wal_segment_size) < 2)
 2399 andres@anarazel.de       4503         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4504                 :                :                         errmsg("min_wal_size must be at least twice wal_segment_size")));
                               4505                 :                : 
 2399 andres@anarazel.de       4506         [ -  + ]:CBC         826 :     if (ConvertToXSegs(max_wal_size_mb, wal_segment_size) < 2)
 2399 andres@anarazel.de       4507         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4508                 :                :                         errmsg("max_wal_size must be at least twice wal_segment_size")));
                               4509                 :                : 
 2399 andres@anarazel.de       4510                 :CBC         826 :     UsableBytesInSegment =
                               4511                 :            826 :         (wal_segment_size / XLOG_BLCKSZ * UsableBytesInPage) -
                               4512                 :                :         (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
                               4513                 :                : 
                               4514                 :            826 :     CalculateCheckpointSegments();
                               4515                 :                : 
                               4516                 :                :     /* Make the initdb settings visible as GUC variables, too */
 2197 magnus@hagander.net      4517         [ +  + ]:            826 :     SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
                               4518                 :                :                     PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
 8541 tgl@sss.pgh.pa.us        4519                 :            826 : }
                               4520                 :                : 
                               4521                 :                : /*
                               4522                 :                :  * Utility wrapper to update the control file.  Note that the control
                               4523                 :                :  * file gets flushed.
                               4524                 :                :  */
                               4525                 :                : static void
                               4526                 :          10366 : UpdateControlFile(void)
                               4527                 :                : {
 1840 peter@eisentraut.org     4528                 :          10366 :     update_controlfile(DataDir, ControlFile, true);
 8957 vadim4o@yahoo.com        4529                 :          10366 : }
                               4530                 :                : 
                               4531                 :                : /*
                               4532                 :                :  * Returns the unique system identifier from control file.
                               4533                 :                :  */
                               4534                 :                : uint64
 5203 heikki.linnakangas@i     4535                 :           1285 : GetSystemIdentifier(void)
                               4536                 :                : {
                               4537         [ -  + ]:           1285 :     Assert(ControlFile != NULL);
                               4538                 :           1285 :     return ControlFile->system_identifier;
                               4539                 :                : }
                               4540                 :                : 
                               4541                 :                : /*
                               4542                 :                :  * Returns the random nonce from control file.
                               4543                 :                :  */
                               4544                 :                : char *
 2595                          4545                 :              1 : GetMockAuthenticationNonce(void)
                               4546                 :                : {
                               4547         [ -  + ]:              1 :     Assert(ControlFile != NULL);
                               4548                 :              1 :     return ControlFile->mock_authentication_nonce;
                               4549                 :                : }
                               4550                 :                : 
                               4551                 :                : /*
                               4552                 :                :  * Are checksums enabled for data pages?
                               4553                 :                :  */
                               4554                 :                : bool
 2197 magnus@hagander.net      4555                 :       10465402 : DataChecksumsEnabled(void)
                               4556                 :                : {
 4041 simon@2ndQuadrant.co     4557         [ -  + ]:       10465402 :     Assert(ControlFile != NULL);
 4002                          4558                 :       10465402 :     return (ControlFile->data_checksum_version > 0);
                               4559                 :                : }
                               4560                 :                : 
                               4561                 :                : /*
                               4562                 :                :  * Returns a fake LSN for unlogged relations.
                               4563                 :                :  *
                               4564                 :                :  * Each call generates an LSN that is greater than any previous value
                               4565                 :                :  * returned. The current counter value is saved and restored across clean
                               4566                 :                :  * shutdowns, but like unlogged relations, does not survive a crash. This can
                               4567                 :                :  * be used in lieu of real LSN values returned by XLogInsert, if you need an
                               4568                 :                :  * LSN-like increasing sequence of numbers without writing any WAL.
                               4569                 :                :  */
                               4570                 :                : XLogRecPtr
 4080 heikki.linnakangas@i     4571                 :             33 : GetFakeLSNForUnloggedRel(void)
                               4572                 :                : {
   45 nathan@postgresql.or     4573                 :GNC          33 :     return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
                               4574                 :                : }
                               4575                 :                : 
                               4576                 :                : /*
                               4577                 :                :  * Auto-tune the number of XLOG buffers.
                               4578                 :                :  *
                               4579                 :                :  * The preferred setting for wal_buffers is about 3% of shared_buffers, with
                               4580                 :                :  * a maximum of one XLOG segment (there is little reason to think that more
                               4581                 :                :  * is helpful, at least so long as we force an fsync when switching log files)
                               4582                 :                :  * and a minimum of 8 blocks (which was the default value prior to PostgreSQL
                               4583                 :                :  * 9.1, when auto-tuning was added).
                               4584                 :                :  *
                               4585                 :                :  * This should not be called until NBuffers has received its final value.
                               4586                 :                :  */
                               4587                 :                : static int
 4756 tgl@sss.pgh.pa.us        4588                 :CBC         896 : XLOGChooseNumBuffers(void)
                               4589                 :                : {
                               4590                 :                :     int         xbuffers;
                               4591                 :                : 
                               4592                 :            896 :     xbuffers = NBuffers / 32;
 2399 andres@anarazel.de       4593         [ +  + ]:            896 :     if (xbuffers > (wal_segment_size / XLOG_BLCKSZ))
                               4594                 :             21 :         xbuffers = (wal_segment_size / XLOG_BLCKSZ);
 4756 tgl@sss.pgh.pa.us        4595         [ +  + ]:            896 :     if (xbuffers < 8)
                               4596                 :            356 :         xbuffers = 8;
                               4597                 :            896 :     return xbuffers;
                               4598                 :                : }
                               4599                 :                : 
                               4600                 :                : /*
                               4601                 :                :  * GUC check_hook for wal_buffers
                               4602                 :                :  */
                               4603                 :                : bool
                               4604                 :           1824 : check_wal_buffers(int *newval, void **extra, GucSource source)
                               4605                 :                : {
                               4606                 :                :     /*
                               4607                 :                :      * -1 indicates a request for auto-tune.
                               4608                 :                :      */
                               4609         [ +  + ]:           1824 :     if (*newval == -1)
                               4610                 :                :     {
                               4611                 :                :         /*
                               4612                 :                :          * If we haven't yet changed the boot_val default of -1, just let it
                               4613                 :                :          * be.  We'll fix it when XLOGShmemSize is called.
                               4614                 :                :          */
                               4615         [ +  - ]:            928 :         if (XLOGbuffers == -1)
                               4616                 :            928 :             return true;
                               4617                 :                : 
                               4618                 :                :         /* Otherwise, substitute the auto-tune value */
 4756 tgl@sss.pgh.pa.us        4619                 :UBC           0 :         *newval = XLOGChooseNumBuffers();
                               4620                 :                :     }
                               4621                 :                : 
                               4622                 :                :     /*
                               4623                 :                :      * We clamp manually-set values to at least 4 blocks.  Prior to PostgreSQL
                               4624                 :                :      * 9.1, a minimum of 4 was enforced by guc.c, but since that is no longer
                               4625                 :                :      * the case, we just silently treat such values as a request for the
                               4626                 :                :      * minimum.  (We could throw an error instead, but that doesn't seem very
                               4627                 :                :      * helpful.)
                               4628                 :                :      */
 4756 tgl@sss.pgh.pa.us        4629         [ -  + ]:CBC         896 :     if (*newval < 4)
 4756 tgl@sss.pgh.pa.us        4630                 :UBC           0 :         *newval = 4;
                               4631                 :                : 
 4756 tgl@sss.pgh.pa.us        4632                 :CBC         896 :     return true;
                               4633                 :                : }
                               4634                 :                : 
                               4635                 :                : /*
                               4636                 :                :  * GUC check_hook for wal_consistency_checking
                               4637                 :                :  */
                               4638                 :                : bool
  579                          4639                 :            930 : check_wal_consistency_checking(char **newval, void **extra, GucSource source)
                               4640                 :                : {
                               4641                 :                :     char       *rawstring;
                               4642                 :                :     List       *elemlist;
                               4643                 :                :     ListCell   *l;
                               4644                 :                :     bool        newwalconsistency[RM_MAX_ID + 1];
                               4645                 :                : 
                               4646                 :                :     /* Initialize the array */
                               4647   [ +  -  +  -  :          30690 :     MemSet(newwalconsistency, 0, (RM_MAX_ID + 1) * sizeof(bool));
                                     +  -  +  -  +  
                                                 + ]
                               4648                 :                : 
                               4649                 :                :     /* Need a modifiable copy of string */
                               4650                 :            930 :     rawstring = pstrdup(*newval);
                               4651                 :                : 
                               4652                 :                :     /* Parse string into list of identifiers */
                               4653         [ -  + ]:            930 :     if (!SplitIdentifierString(rawstring, ',', &elemlist))
                               4654                 :                :     {
                               4655                 :                :         /* syntax error in list */
  579 tgl@sss.pgh.pa.us        4656                 :UBC           0 :         GUC_check_errdetail("List syntax is invalid.");
                               4657                 :              0 :         pfree(rawstring);
                               4658                 :              0 :         list_free(elemlist);
                               4659                 :              0 :         return false;
                               4660                 :                :     }
                               4661                 :                : 
  579 tgl@sss.pgh.pa.us        4662   [ +  +  +  +  :CBC         932 :     foreach(l, elemlist)
                                              +  + ]
                               4663                 :                :     {
                               4664                 :              2 :         char       *tok = (char *) lfirst(l);
                               4665                 :                :         int         rmid;
                               4666                 :                : 
                               4667                 :                :         /* Check for 'all'. */
                               4668         [ -  + ]:              2 :         if (pg_strcasecmp(tok, "all") == 0)
                               4669                 :                :         {
  579 tgl@sss.pgh.pa.us        4670         [ #  # ]:UBC           0 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4671   [ #  #  #  # ]:              0 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL)
                               4672                 :              0 :                     newwalconsistency[rmid] = true;
                               4673                 :                :         }
                               4674                 :                :         else
                               4675                 :                :         {
                               4676                 :                :             /* Check if the token matches any known resource manager. */
  579 tgl@sss.pgh.pa.us        4677                 :CBC           2 :             bool        found = false;
                               4678                 :                : 
                               4679         [ +  - ]:             36 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4680                 :                :             {
                               4681   [ +  -  +  +  :             54 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL &&
                                              +  + ]
                               4682                 :             18 :                     pg_strcasecmp(tok, GetRmgr(rmid).rm_name) == 0)
                               4683                 :                :                 {
                               4684                 :              2 :                     newwalconsistency[rmid] = true;
                               4685                 :              2 :                     found = true;
                               4686                 :              2 :                     break;
                               4687                 :                :                 }
                               4688                 :                :             }
                               4689         [ -  + ]:              2 :             if (!found)
                               4690                 :                :             {
                               4691                 :                :                 /*
                               4692                 :                :                  * During startup, it might be a not-yet-loaded custom
                               4693                 :                :                  * resource manager.  Defer checking until
                               4694                 :                :                  * InitializeWalConsistencyChecking().
                               4695                 :                :                  */
  579 tgl@sss.pgh.pa.us        4696         [ #  # ]:UBC           0 :                 if (!process_shared_preload_libraries_done)
                               4697                 :                :                 {
                               4698                 :              0 :                     check_wal_consistency_checking_deferred = true;
                               4699                 :                :                 }
                               4700                 :                :                 else
                               4701                 :                :                 {
                               4702                 :              0 :                     GUC_check_errdetail("Unrecognized key word: \"%s\".", tok);
                               4703                 :              0 :                     pfree(rawstring);
                               4704                 :              0 :                     list_free(elemlist);
                               4705                 :              0 :                     return false;
                               4706                 :                :                 }
                               4707                 :                :             }
                               4708                 :                :         }
                               4709                 :                :     }
                               4710                 :                : 
  579 tgl@sss.pgh.pa.us        4711                 :CBC         930 :     pfree(rawstring);
                               4712                 :            930 :     list_free(elemlist);
                               4713                 :                : 
                               4714                 :                :     /* assign new value */
                               4715                 :            930 :     *extra = guc_malloc(ERROR, (RM_MAX_ID + 1) * sizeof(bool));
                               4716                 :            930 :     memcpy(*extra, newwalconsistency, (RM_MAX_ID + 1) * sizeof(bool));
                               4717                 :            930 :     return true;
                               4718                 :                : }
                               4719                 :                : 
                               4720                 :                : /*
                               4721                 :                :  * GUC assign_hook for wal_consistency_checking
                               4722                 :                :  */
                               4723                 :                : void
                               4724                 :            930 : assign_wal_consistency_checking(const char *newval, void *extra)
                               4725                 :                : {
                               4726                 :                :     /*
                               4727                 :                :      * If some checks were deferred, it's possible that the checks will fail
                               4728                 :                :      * later during InitializeWalConsistencyChecking(). But in that case, the
                               4729                 :                :      * postmaster will exit anyway, so it's safe to proceed with the
                               4730                 :                :      * assignment.
                               4731                 :                :      *
                               4732                 :                :      * Any built-in resource managers specified are assigned immediately,
                               4733                 :                :      * which affects WAL created before shared_preload_libraries are
                               4734                 :                :      * processed. Any custom resource managers specified won't be assigned
                               4735                 :                :      * until after shared_preload_libraries are processed, but that's OK
                               4736                 :                :      * because WAL for a custom resource manager can't be written before the
                               4737                 :                :      * module is loaded anyway.
                               4738                 :                :      */
                               4739                 :            930 :     wal_consistency_checking = extra;
                               4740                 :            930 : }
                               4741                 :                : 
                               4742                 :                : /*
                               4743                 :                :  * InitializeWalConsistencyChecking: run after loading custom resource managers
                               4744                 :                :  *
                               4745                 :                :  * If any unknown resource managers were specified in the
                               4746                 :                :  * wal_consistency_checking GUC, processing was deferred.  Now that
                               4747                 :                :  * shared_preload_libraries have been loaded, process wal_consistency_checking
                               4748                 :                :  * again.
                               4749                 :                :  */
                               4750                 :                : void
                               4751                 :            779 : InitializeWalConsistencyChecking(void)
                               4752                 :                : {
                               4753         [ -  + ]:            779 :     Assert(process_shared_preload_libraries_done);
                               4754                 :                : 
                               4755         [ -  + ]:            779 :     if (check_wal_consistency_checking_deferred)
                               4756                 :                :     {
                               4757                 :                :         struct config_generic *guc;
                               4758                 :                : 
  579 tgl@sss.pgh.pa.us        4759                 :UBC           0 :         guc = find_option("wal_consistency_checking", false, false, ERROR);
                               4760                 :                : 
                               4761                 :              0 :         check_wal_consistency_checking_deferred = false;
                               4762                 :                : 
                               4763                 :              0 :         set_config_option_ext("wal_consistency_checking",
                               4764                 :                :                               wal_consistency_checking_string,
                               4765                 :                :                               guc->scontext, guc->source, guc->srole,
                               4766                 :                :                               GUC_ACTION_SET, true, ERROR, false);
                               4767                 :                : 
                               4768                 :                :         /* checking should not be deferred again */
                               4769         [ #  # ]:              0 :         Assert(!check_wal_consistency_checking_deferred);
                               4770                 :                :     }
  579 tgl@sss.pgh.pa.us        4771                 :CBC         779 : }
                               4772                 :                : 
                               4773                 :                : /*
                               4774                 :                :  * GUC show_hook for archive_command
                               4775                 :                :  */
                               4776                 :                : const char *
                               4777                 :           1892 : show_archive_command(void)
                               4778                 :                : {
                               4779   [ -  +  -  -  :           1892 :     if (XLogArchivingActive())
                                              -  + ]
  579 tgl@sss.pgh.pa.us        4780                 :UBC           0 :         return XLogArchiveCommand;
                               4781                 :                :     else
  579 tgl@sss.pgh.pa.us        4782                 :CBC        1892 :         return "(disabled)";
                               4783                 :                : }
                               4784                 :                : 
                               4785                 :                : /*
                               4786                 :                :  * GUC show_hook for in_hot_standby
                               4787                 :                :  */
                               4788                 :                : const char *
                               4789                 :          13355 : show_in_hot_standby(void)
                               4790                 :                : {
                               4791                 :                :     /*
                               4792                 :                :      * We display the actual state based on shared memory, so that this GUC
                               4793                 :                :      * reports up-to-date state if examined intra-query.  The underlying
                               4794                 :                :      * variable (in_hot_standby_guc) changes only when we transmit a new value
                               4795                 :                :      * to the client.
                               4796                 :                :      */
                               4797         [ +  + ]:          13355 :     return RecoveryInProgress() ? "on" : "off";
                               4798                 :                : }
                               4799                 :                : 
                               4800                 :                : /*
                               4801                 :                :  * Read the control file, set respective GUCs.
                               4802                 :                :  *
                               4803                 :                :  * This is to be called during startup, including a crash recovery cycle,
                               4804                 :                :  * unless in bootstrap mode, where no control file yet exists.  As there's no
                               4805                 :                :  * usable shared memory yet (its sizing can depend on the contents of the
                               4806                 :                :  * control file!), first store the contents in local memory. XLOGShmemInit()
                               4807                 :                :  * will then copy it to shared memory later.
                               4808                 :                :  *
                               4809                 :                :  * reset just controls whether previous contents are to be expected (in the
                               4810                 :                :  * reset case, there's a dangling pointer into old shared memory), or not.
                               4811                 :                :  */
                               4812                 :                : void
 2401 andres@anarazel.de       4813                 :            787 : LocalProcessControlFile(bool reset)
                               4814                 :                : {
                               4815   [ +  +  -  + ]:            787 :     Assert(reset || ControlFile == NULL);
 2405                          4816                 :            787 :     ControlFile = palloc(sizeof(ControlFileData));
                               4817                 :            787 :     ReadControlFile();
                               4818                 :            787 : }
                               4819                 :                : 
                               4820                 :                : /*
                               4821                 :                :  * Get the wal_level from the control file. For a standby, this value should be
                               4822                 :                :  * considered as its active wal_level, because it may be different from what
                               4823                 :                :  * was originally configured on the standby.
                               4824                 :                :  */
                               4825                 :                : WalLevel
  372                          4826                 :             68 : GetActiveWalLevelOnStandby(void)
                               4827                 :                : {
                               4828                 :             68 :     return ControlFile->wal_level;
                               4829                 :                : }
                               4830                 :                : 
                               4831                 :                : /*
                               4832                 :                :  * Initialization of shared memory for XLOG
                               4833                 :                :  */
                               4834                 :                : Size
 8545 peter_e@gmx.net          4835                 :           2577 : XLOGShmemSize(void)
                               4836                 :                : {
                               4837                 :                :     Size        size;
                               4838                 :                : 
                               4839                 :                :     /*
                               4840                 :                :      * If the value of wal_buffers is -1, use the preferred auto-tune value.
                               4841                 :                :      * This isn't an amazingly clean place to do this, but we must wait till
                               4842                 :                :      * NBuffers has received its final value, and must do it before using the
                               4843                 :                :      * value of XLOGbuffers to do anything important.
                               4844                 :                :      *
                               4845                 :                :      * We prefer to report this value's source as PGC_S_DYNAMIC_DEFAULT.
                               4846                 :                :      * However, if the DBA explicitly set wal_buffers = -1 in the config file,
                               4847                 :                :      * then PGC_S_DYNAMIC_DEFAULT will fail to override that and we must force
                               4848                 :                :      * the matter with PGC_S_OVERRIDE.
                               4849                 :                :      */
 4756 tgl@sss.pgh.pa.us        4850         [ +  + ]:           2577 :     if (XLOGbuffers == -1)
                               4851                 :                :     {
                               4852                 :                :         char        buf[32];
                               4853                 :                : 
                               4854                 :            896 :         snprintf(buf, sizeof(buf), "%d", XLOGChooseNumBuffers());
  676                          4855                 :            896 :         SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4856                 :                :                         PGC_S_DYNAMIC_DEFAULT);
                               4857         [ -  + ]:            896 :         if (XLOGbuffers == -1)  /* failed to apply it? */
  676 tgl@sss.pgh.pa.us        4858                 :UBC           0 :             SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4859                 :                :                             PGC_S_OVERRIDE);
                               4860                 :                :     }
 4831 tgl@sss.pgh.pa.us        4861         [ -  + ]:CBC        2577 :     Assert(XLOGbuffers > 0);
                               4862                 :                : 
                               4863                 :                :     /* XLogCtl */
 6812                          4864                 :           2577 :     size = sizeof(XLogCtlData);
                               4865                 :                : 
                               4866                 :                :     /* WAL insertion locks, plus alignment */
 3483 heikki.linnakangas@i     4867                 :           2577 :     size = add_size(size, mul_size(sizeof(WALInsertLockPadded), NUM_XLOGINSERT_LOCKS + 1));
                               4868                 :                :     /* xlblocks array */
  117 jdavis@postgresql.or     4869                 :GNC        2577 :     size = add_size(size, mul_size(sizeof(pg_atomic_uint64), XLOGbuffers));
                               4870                 :                :     /* extra alignment padding for XLOG I/O buffers */
  372 tmunro@postgresql.or     4871                 :CBC        2577 :     size = add_size(size, Max(XLOG_BLCKSZ, PG_IO_ALIGN_SIZE));
                               4872                 :                :     /* and the buffers themselves */
 6586 tgl@sss.pgh.pa.us        4873                 :           2577 :     size = add_size(size, mul_size(XLOG_BLCKSZ, XLOGbuffers));
                               4874                 :                : 
                               4875                 :                :     /*
                               4876                 :                :      * Note: we don't count ControlFileData, it comes out of the "slop factor"
                               4877                 :                :      * added by CreateSharedMemoryAndSemaphores.  This lets us use this
                               4878                 :                :      * routine again below to compute the actual allocation size.
                               4879                 :                :      */
                               4880                 :                : 
 6812                          4881                 :           2577 :     return size;
                               4882                 :                : }
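/*
 * Illustrative sketch, not part of xlog.c: the size computation above can be
 * mimicked with overflow-checked helpers.  Every demo_* name and DEMO_*
 * constant is hypothetical and chosen only for this example; the real
 * add_size()/mul_size() report failure through the server's error machinery
 * rather than exit(), and the real struct sizes differ.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real constants and struct sizes. */
#define DEMO_BLCKSZ       8192  /* stands in for XLOG_BLCKSZ */
#define DEMO_IO_ALIGN     4096  /* stands in for PG_IO_ALIGN_SIZE */
#define DEMO_NUM_LOCKS    8     /* stands in for NUM_XLOGINSERT_LOCKS */
#define DEMO_LOCK_PADDED  128   /* stands in for sizeof(WALInsertLockPadded) */
#define DEMO_CTL_SIZE     1024  /* stands in for sizeof(XLogCtlData) */

/* Add two sizes, aborting instead of silently wrapping around. */
static size_t
demo_add_size(size_t a, size_t b)
{
    if (b > SIZE_MAX - a)
    {
        fprintf(stderr, "requested size overflows size_t\n");
        exit(EXIT_FAILURE);
    }
    return a + b;
}

/* Multiply two sizes, aborting instead of silently wrapping around. */
static size_t
demo_mul_size(size_t a, size_t b)
{
    if (a != 0 && b > SIZE_MAX / a)
    {
        fprintf(stderr, "requested size overflows size_t\n");
        exit(EXIT_FAILURE);
    }
    return a * b;
}

/* Accumulate the same five components as the listing, in the same order. */
static size_t
demo_shmem_size(size_t nbuffers)
{
    size_t      size;

    assert(nbuffers > 0);

    /* control struct */
    size = DEMO_CTL_SIZE;
    /* insertion locks, plus one extra slot to absorb alignment */
    size = demo_add_size(size, demo_mul_size(DEMO_LOCK_PADDED, DEMO_NUM_LOCKS + 1));
    /* one 64-bit entry per buffer (the xlblocks array) */
    size = demo_add_size(size, demo_mul_size(sizeof(uint64_t), nbuffers));
    /* extra alignment padding for the I/O buffers */
    size = demo_add_size(size, DEMO_BLCKSZ > DEMO_IO_ALIGN ? DEMO_BLCKSZ : DEMO_IO_ALIGN);
    /* and the buffers themselves */
    size = demo_add_size(size, demo_mul_size(DEMO_BLCKSZ, nbuffers));

    return size;
}
```

/*
 * With these made-up numbers, demo_shmem_size(1) is
 * 1024 + 128 * 9 + 8 + 8192 + 8192 bytes.  Checking each step keeps an
 * oversized buffer count from wrapping around into a too-small request.
 */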
                               4883                 :                : 
                               4884                 :                : void
 8957 vadim4o@yahoo.com        4885                 :            898 : XLOGShmemInit(void)
                               4886                 :                : {
                               4887                 :                :     bool        foundCFile,
                               4888                 :                :                 foundXLog;
                               4889                 :                :     char       *allocptr;
                               4890                 :                :     int         i;
                               4891                 :                :     ControlFileData *localControlFile;
                               4892                 :                : 
                               4893                 :                : #ifdef WAL_DEBUG
                               4894                 :                : 
                               4895                 :                :     /*
                               4896                 :                :      * Create a memory context for WAL debugging that's exempt from the normal
                               4897                 :                :      * "no pallocs in critical section" rule. Yes, that can lead to a PANIC if
                               4898                 :                :      * an allocation fails, but wal_debug is not for production use anyway.
                               4899                 :                :      */
                               4900                 :                :     if (walDebugCxt == NULL)
                               4901                 :                :     {
                               4902                 :                :         walDebugCxt = AllocSetContextCreate(TopMemoryContext,
                               4903                 :                :                                             "WAL Debug",
                               4904                 :                :                                             ALLOCSET_DEFAULT_SIZES);
                               4905                 :                :         MemoryContextAllowInCriticalSection(walDebugCxt, true);
                               4906                 :                :     }
                               4907                 :                : #endif
                               4908                 :                : 
                               4909                 :                : 
 2401 andres@anarazel.de       4910                 :            898 :     XLogCtl = (XLogCtlData *)
                               4911                 :            898 :         ShmemInitStruct("XLOG Ctl", XLOGShmemSize(), &foundXLog);
                               4912                 :                : 
 2405                          4913                 :            898 :     localControlFile = ControlFile;
 8541 tgl@sss.pgh.pa.us        4914                 :            898 :     ControlFile = (ControlFileData *)
 7421 bruce@momjian.us         4915                 :            898 :         ShmemInitStruct("Control File", sizeof(ControlFileData), &foundCFile);
                               4916                 :                : 
 6810 tgl@sss.pgh.pa.us        4917   [ +  -  -  + ]:            898 :     if (foundCFile || foundXLog)
                               4918                 :                :     {
                               4919                 :                :         /* both should be present or neither */
 6810 tgl@sss.pgh.pa.us        4920   [ #  #  #  # ]:UBC           0 :         Assert(foundCFile && foundXLog);
                               4921                 :                : 
                               4922                 :                :         /* Initialize local copy of WALInsertLocks */
 3552 rhaas@postgresql.org     4923                 :              0 :         WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
                               4924                 :                : 
 2401 andres@anarazel.de       4925         [ #  # ]:              0 :         if (localControlFile)
                               4926                 :              0 :             pfree(localControlFile);
 7421 bruce@momjian.us         4927                 :              0 :         return;
                               4928                 :                :     }
 8433 tgl@sss.pgh.pa.us        4929                 :CBC         898 :     memset(XLogCtl, 0, sizeof(XLogCtlData));
                               4930                 :                : 
                               4931                 :                :     /*
                               4932                 :                :      * Already have read control file locally, unless in bootstrap mode. Move
                               4933                 :                :      * contents into shared memory.
                               4934                 :                :      */
 2401 andres@anarazel.de       4935         [ +  + ]:            898 :     if (localControlFile)
                               4936                 :                :     {
                               4937                 :            781 :         memcpy(ControlFile, localControlFile, sizeof(ControlFileData));
                               4938                 :            781 :         pfree(localControlFile);
                               4939                 :                :     }
                               4940                 :                : 
                               4941                 :                :     /*
                               4942                 :                :      * Since XLogCtlData contains XLogRecPtr fields, its sizeof should be a
                               4943                 :                :      * multiple of the alignment for same, so no extra alignment padding is
                               4944                 :                :      * needed here.
                               4945                 :                :      */
 3933 heikki.linnakangas@i     4946                 :            898 :     allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
  117 jdavis@postgresql.or     4947                 :GNC         898 :     XLogCtl->xlblocks = (pg_atomic_uint64 *) allocptr;
                               4948                 :            898 :     allocptr += sizeof(pg_atomic_uint64) * XLOGbuffers;
                               4949                 :                : 
                               4950         [ +  + ]:         252915 :     for (i = 0; i < XLOGbuffers; i++)
                               4951                 :                :     {
                               4952                 :         252017 :         pg_atomic_init_u64(&XLogCtl->xlblocks[i], InvalidXLogRecPtr);
                               4953                 :                :     }
                               4954                 :                : 
                               4955                 :                :     /* WAL insertion locks. Ensure they're aligned to the full padded size */
 3677 heikki.linnakangas@i     4956                 :CBC         898 :     allocptr += sizeof(WALInsertLockPadded) -
 2489 tgl@sss.pgh.pa.us        4957                 :            898 :         ((uintptr_t) allocptr) % sizeof(WALInsertLockPadded);
 3677 heikki.linnakangas@i     4958                 :            898 :     WALInsertLocks = XLogCtl->Insert.WALInsertLocks =
                               4959                 :                :         (WALInsertLockPadded *) allocptr;
 3483                          4960                 :            898 :     allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
                               4961                 :                : 
                               4962         [ +  + ]:           8082 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               4963                 :                :     {
 3043 rhaas@postgresql.org     4964                 :           7184 :         LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
  264 michael@paquier.xyz      4965                 :GNC        7184 :         pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
 2670 andres@anarazel.de       4966                 :CBC        7184 :         WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
                               4967                 :                :     }
                               4968                 :                : 
                               4969                 :                :     /*
                               4970                 :                :      * Align the start of the page buffers to a full xlog block size boundary.
                               4971                 :                :      * This simplifies some calculations in XLOG insertion. It is also
                               4972                 :                :      * required for O_DIRECT.
                               4973                 :                :      */
 3933 heikki.linnakangas@i     4974                 :            898 :     allocptr = (char *) TYPEALIGN(XLOG_BLCKSZ, allocptr);
 6812 tgl@sss.pgh.pa.us        4975                 :            898 :     XLogCtl->pages = allocptr;
 6586                          4976                 :            898 :     memset(XLogCtl->pages, 0, (Size) XLOG_BLCKSZ * XLOGbuffers);
                               4977                 :                : 
                               4978                 :                :     /*
                               4979                 :                :      * Do basic initialization of XLogCtl shared data. (StartupXLOG will fill
                               4980                 :                :      * in additional info.)
                               4981                 :                :      */
 8433                          4982                 :            898 :     XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 1451 michael@paquier.xyz      4983                 :            898 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 1021 noah@leadboat.com        4984                 :            898 :     XLogCtl->InstallXLogFileSegmentActive = false;
 4359 tgl@sss.pgh.pa.us        4985                 :            898 :     XLogCtl->WalWriterSleeping = false;
                               4986                 :                : 
 3933 heikki.linnakangas@i     4987                 :            898 :     SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 8233 tgl@sss.pgh.pa.us        4988                 :            898 :     SpinLockInit(&XLogCtl->info_lck);
    7 alvherre@alvh.no-ip.     4989                 :GNC         898 :     pg_atomic_init_u64(&XLogCtl->logInsertResult, InvalidXLogRecPtr);
    9                          4990                 :            898 :     pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr);
                               4991                 :            898 :     pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr);
   45 nathan@postgresql.or     4992                 :            898 :     pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
                               4993                 :                : }
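/*
 * Illustrative sketch, not part of xlog.c: XLOGShmemInit() uses two different
 * pointer adjustments while carving up the shared block.  The demo_* helpers
 * below are hypothetical names that restate the arithmetic on plain integers.
 * The first follows the round-up idea behind TYPEALIGN and assumes a
 * power-of-two alignment; the second restates the "size - (ptr % size)" step
 * used for the insertion locks.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of align.  An address that is already
 * aligned is returned unchanged.  align must be a power of two.
 */
static uintptr_t
demo_align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/*
 * Advance addr to the next multiple of size using size - (addr % size).
 * Unlike demo_align_up(), this always moves forward: an address that is
 * already a multiple of size still advances by a full size.
 */
static uintptr_t
demo_advance_to_multiple(uintptr_t addr, uintptr_t size)
{
    assert(size != 0);
    return addr + (size - addr % size);
}
```

/*
 * Because the second form can consume up to one full element, the size
 * computation has to reserve one slot more than the number of locks, which
 * matches the NUM_XLOGINSERT_LOCKS + 1 term in XLOGShmemSize().
 */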
                               4994                 :                : 
                               4995                 :                : /*
                               4996                 :                :  * This function must be called ONCE on system install.  It creates pg_control
                               4997                 :                :  * and the initial XLOG segment.
                               4998                 :                :  */
                               4999                 :                : void
 8433 tgl@sss.pgh.pa.us        5000                 :CBC          39 : BootStrapXLOG(void)
                               5001                 :                : {
                               5002                 :                :     CheckPoint  checkPoint;
                               5003                 :                :     char       *buffer;
                               5004                 :                :     XLogPageHeader page;
                               5005                 :                :     XLogLongPageHeader longpage;
                               5006                 :                :     XLogRecord *record;
                               5007                 :                :     char       *recptr;
                               5008                 :                :     uint64      sysidentifier;
                               5009                 :                :     struct timeval tv;
                               5010                 :                :     pg_crc32c   crc;
                               5011                 :                : 
                               5012                 :                :     /* allow ordinary WAL segment creation, like StartupXLOG() would */
  606 michael@paquier.xyz      5013                 :             39 :     SetInstallXLogFileSegmentActive();
                               5014                 :                : 
                               5015                 :                :     /*
                               5016                 :                :      * Select a hopefully-unique system identifier code for this installation.
                               5017                 :                :      * We use the result of gettimeofday(), including the fractional seconds
                               5018                 :                :      * field, as being about as unique as we can easily get.  (Don't use
                               5019                 :                :      * random(), since it hasn't been seeded and there's no portable way
                               5020                 :                :      * to seed it other than the system clock value...)  The upper half of the
                               5021                 :                :      * uint64 value is just the tv_sec part, while the lower half contains the
                               5022                 :                :      * tv_usec part (which must fit in 20 bits), plus 12 bits from our current
                               5023                 :                :      * PID for a little extra uniqueness.  A person knowing this encoding can
                               5024                 :                :      * determine the initialization time of the installation, which could
                               5025                 :                :      * perhaps be useful sometimes.
                               5026                 :                :      */
 7368 tgl@sss.pgh.pa.us        5027                 :             39 :     gettimeofday(&tv, NULL);
                               5028                 :             39 :     sysidentifier = ((uint64) tv.tv_sec) << 32;
 3641                          5029                 :             39 :     sysidentifier |= ((uint64) tv.tv_usec) << 12;
                               5030                 :             39 :     sysidentifier |= getpid() & 0xFFF;
                               5031                 :                : 
                               5032                 :                :     /* page buffer must be aligned suitably for O_DIRECT */
 3933 heikki.linnakangas@i     5033                 :             39 :     buffer = (char *) palloc(XLOG_BLCKSZ + XLOG_BLCKSZ);
                               5034                 :             39 :     page = (XLogPageHeader) TYPEALIGN(XLOG_BLCKSZ, buffer);
 6586 tgl@sss.pgh.pa.us        5035                 :             39 :     memset(page, 0, XLOG_BLCKSZ);
                               5036                 :                : 
                               5037                 :                :     /*
                               5038                 :                :      * Set up information for the initial checkpoint record
                               5039                 :                :      *
                               5040                 :                :      * The initial checkpoint record is written to the beginning of the WAL
                               5041                 :                :      * segment with logid=0 logseg=1. The very first WAL segment, 0/0, is not
                               5042                 :                :      * used, so that we can use 0/0 to mean "before any valid WAL segment".
                               5043                 :                :      */
 2399 andres@anarazel.de       5044                 :             39 :     checkPoint.redo = wal_segment_size + SizeOfXLogLongPHD;
  891 rhaas@postgresql.org     5045                 :             39 :     checkPoint.ThisTimeLineID = BootstrapTimeLineID;
                               5046                 :             39 :     checkPoint.PrevTimeLineID = BootstrapTimeLineID;
 4463 simon@2ndQuadrant.co     5047                 :             39 :     checkPoint.fullPageWrites = fullPageWrites;
                               5048                 :                :     checkPoint.nextXid =
 1844 tmunro@postgresql.or     5049                 :             39 :         FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
 1004 tgl@sss.pgh.pa.us        5050                 :             39 :     checkPoint.nextOid = FirstGenbkiObjectId;
 6926                          5051                 :             39 :     checkPoint.nextMulti = FirstMultiXactId;
 6885                          5052                 :             39 :     checkPoint.nextMultiOffset = 0;
 5340                          5053                 :             39 :     checkPoint.oldestXid = FirstNormalTransactionId;
  724                          5054                 :             39 :     checkPoint.oldestXidDB = Template1DbOid;
 4099 alvherre@alvh.no-ip.     5055                 :             39 :     checkPoint.oldestMulti = FirstMultiXactId;
  724 tgl@sss.pgh.pa.us        5056                 :             39 :     checkPoint.oldestMultiDB = Template1DbOid;
 3030 mail@joeconway.com       5057                 :             39 :     checkPoint.oldestCommitTsXid = InvalidTransactionId;
                               5058                 :             39 :     checkPoint.newestCommitTsXid = InvalidTransactionId;
 5901 tgl@sss.pgh.pa.us        5059                 :             39 :     checkPoint.time = (pg_time_t) time(NULL);
 5230 simon@2ndQuadrant.co     5060                 :             39 :     checkPoint.oldestActiveXid = InvalidTransactionId;
                               5061                 :                : 
  128 heikki.linnakangas@i     5062                 :GNC          39 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5063                 :             39 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5064                 :             39 :     TransamVariables->oidCount = 0;
 6885 tgl@sss.pgh.pa.us        5065                 :CBC          39 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 2579 rhaas@postgresql.org     5066                 :             39 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5170 tgl@sss.pgh.pa.us        5067                 :             39 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
 2588                          5068                 :             39 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB, true);
 3420 alvherre@alvh.no-ip.     5069                 :             39 :     SetCommitTsLimit(InvalidTransactionId, InvalidTransactionId);
                               5070                 :                : 
                               5071                 :                :     /* Set up the XLOG page header */
 8957 vadim4o@yahoo.com        5072                 :             39 :     page->xlp_magic = XLOG_PAGE_MAGIC;
 7207 tgl@sss.pgh.pa.us        5073                 :             39 :     page->xlp_info = XLP_LONG_HEADER;
  891 rhaas@postgresql.org     5074                 :             39 :     page->xlp_tli = BootstrapTimeLineID;
 2399 andres@anarazel.de       5075                 :             39 :     page->xlp_pageaddr = wal_segment_size;
 7207 tgl@sss.pgh.pa.us        5076                 :             39 :     longpage = (XLogLongPageHeader) page;
                               5077                 :             39 :     longpage->xlp_sysid = sysidentifier;
 2399 andres@anarazel.de       5078                 :             39 :     longpage->xlp_seg_size = wal_segment_size;
 6584 tgl@sss.pgh.pa.us        5079                 :             39 :     longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
                               5080                 :                : 
                               5081                 :                :     /* Insert the initial checkpoint record */
 3433 heikki.linnakangas@i     5082                 :             39 :     recptr = ((char *) page + SizeOfXLogLongPHD);
                               5083                 :             39 :     record = (XLogRecord *) recptr;
 4312                          5084                 :             39 :     record->xl_prev = 0;
 8957 vadim4o@yahoo.com        5085                 :             39 :     record->xl_xid = InvalidTransactionId;
 3433 heikki.linnakangas@i     5086                 :             39 :     record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(checkPoint);
 8433 tgl@sss.pgh.pa.us        5087                 :             39 :     record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
 8957 vadim4o@yahoo.com        5088                 :             39 :     record->xl_rmid = RM_XLOG_ID;
 3433 heikki.linnakangas@i     5089                 :             39 :     recptr += SizeOfXLogRecord;
                               5090                 :                :     /* fill the XLogRecordDataHeaderShort struct */
 2574 tgl@sss.pgh.pa.us        5091                 :             39 :     *(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
 3433 heikki.linnakangas@i     5092                 :             39 :     *(recptr++) = sizeof(checkPoint);
                               5093                 :             39 :     memcpy(recptr, &checkPoint, sizeof(checkPoint));
                               5094                 :             39 :     recptr += sizeof(checkPoint);
                               5095         [ -  + ]:             39 :     Assert(recptr - (char *) record == record->xl_tot_len);
                               5096                 :                : 
 3449                          5097                 :             39 :     INIT_CRC32C(crc);
 3433                          5098                 :             39 :     COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 3449                          5099                 :             39 :     COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
                               5100                 :             39 :     FIN_CRC32C(crc);
 8508 vadim4o@yahoo.com        5101                 :             39 :     record->xl_crc = crc;
                               5102                 :                : 
                               5103                 :                :     /* Create first XLOG segment file */
  891 rhaas@postgresql.org     5104                 :             39 :     openLogTLI = BootstrapTimeLineID;
                               5105                 :             39 :     openLogFile = XLogFileInit(1, BootstrapTimeLineID);
                               5106                 :                : 
                               5107                 :                :     /*
                               5108                 :                :      * We needn't bother with Reserve/ReleaseExternalFD here, since we'll
                               5109                 :                :      * close the file again in a moment.
                               5110                 :                :      */
                               5111                 :                : 
                               5112                 :                :     /* Write the first page with the initial record */
 8348 tgl@sss.pgh.pa.us        5113                 :             39 :     errno = 0;
 2584 rhaas@postgresql.org     5114                 :             39 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_WRITE);
 6586 tgl@sss.pgh.pa.us        5115         [ -  + ]:             39 :     if (write(openLogFile, page, XLOG_BLCKSZ) != XLOG_BLCKSZ)
                               5116                 :                :     {
                               5117                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8348 tgl@sss.pgh.pa.us        5118         [ #  # ]:UBC           0 :         if (errno == 0)
                               5119                 :              0 :             errno = ENOSPC;
 7573                          5120         [ #  # ]:              0 :         ereport(PANIC,
                               5121                 :                :                 (errcode_for_file_access(),
                               5122                 :                :                  errmsg("could not write bootstrap write-ahead log file: %m")));
                               5123                 :                :     }
 2584 rhaas@postgresql.org     5124                 :CBC          39 :     pgstat_report_wait_end();
                               5125                 :                : 
                               5126                 :             39 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_SYNC);
 8433 tgl@sss.pgh.pa.us        5127         [ -  + ]:             39 :     if (pg_fsync(openLogFile) != 0)
 7573 tgl@sss.pgh.pa.us        5128         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5129                 :                :                 (errcode_for_file_access(),
                               5130                 :                :                  errmsg("could not fsync bootstrap write-ahead log file: %m")));
 2584 rhaas@postgresql.org     5131                 :CBC          39 :     pgstat_report_wait_end();
                               5132                 :                : 
 1744 peter@eisentraut.org     5133         [ -  + ]:             39 :     if (close(openLogFile) != 0)
 7384 tgl@sss.pgh.pa.us        5134         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5135                 :                :                 (errcode_for_file_access(),
                               5136                 :                :                  errmsg("could not close bootstrap write-ahead log file: %m")));
                               5137                 :                : 
 8433 tgl@sss.pgh.pa.us        5138                 :CBC          39 :     openLogFile = -1;
                               5139                 :                : 
                               5140                 :                :     /* Now create pg_control */
 1518 peter@eisentraut.org     5141                 :             39 :     InitControlFile(sysidentifier);
 8433 tgl@sss.pgh.pa.us        5142                 :             39 :     ControlFile->time = checkPoint.time;
 8957 vadim4o@yahoo.com        5143                 :             39 :     ControlFile->checkPoint = checkPoint.redo;
 8433 tgl@sss.pgh.pa.us        5144                 :             39 :     ControlFile->checkPointCopy = checkPoint;
                               5145                 :                : 
                               5146                 :                :     /* some additional ControlFile fields are set in WriteControlFile() */
 8541                          5147                 :             39 :     WriteControlFile();
                               5148                 :                : 
                               5149                 :                :     /* Bootstrap the commit log, too */
 8268                          5150                 :             39 :     BootStrapCLOG();
 3420 alvherre@alvh.no-ip.     5151                 :             39 :     BootStrapCommitTs();
 7227 tgl@sss.pgh.pa.us        5152                 :             39 :     BootStrapSUBTRANS();
 6926                          5153                 :             39 :     BootStrapMultiXact();
                               5154                 :                : 
 6812                          5155                 :             39 :     pfree(buffer);
                               5156                 :                : 
                               5157                 :                :     /*
                               5158                 :                :      * Force control file to be read - in contrast to normal processing we'd
                               5159                 :                :      * otherwise never run the checks and GUC related initializations therein.
                               5160                 :                :      */
 2405 andres@anarazel.de       5161                 :             39 :     ReadControlFile();
 8957 vadim4o@yahoo.com        5162                 :             39 : }
                               5163                 :                : 
                               5164                 :                : static char *
 6098 tgl@sss.pgh.pa.us        5165                 :            745 : str_time(pg_time_t tnow)
                               5166                 :                : {
                               5167                 :                :     static char buf[128];
                               5168                 :                : 
                               5169                 :            745 :     pg_strftime(buf, sizeof(buf),
                               5170                 :                :                 "%Y-%m-%d %H:%M:%S %Z",
                               5171                 :            745 :                 pg_localtime(&tnow, log_timezone));
                               5172                 :                : 
 8545 peter_e@gmx.net          5173                 :            745 :     return buf;
                               5174                 :                : }
                               5175                 :                : 
                               5176                 :                : /*
                               5177                 :                :  * Initialize the first WAL segment on new timeline.
                               5178                 :                :  */
                               5179                 :                : static void
  788 heikki.linnakangas@i     5180                 :             46 : XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog, TimeLineID newTLI)
                               5181                 :                : {
                               5182                 :                :     char        xlogfname[MAXFNAMELEN];
                               5183                 :                :     XLogSegNo   endLogSegNo;
                               5184                 :                :     XLogSegNo   startLogSegNo;
                               5185                 :                : 
                               5186                 :                :     /* we always switch to a new timeline after archive recovery */
  891 rhaas@postgresql.org     5187         [ -  + ]:             46 :     Assert(endTLI != newTLI);
                               5188                 :                : 
                               5189                 :                :     /*
                               5190                 :                :      * Update min recovery point one last time.
                               5191                 :                :      */
 5407 heikki.linnakangas@i     5192                 :             46 :     UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
                               5193                 :                : 
                               5194                 :                :     /*
                               5195                 :                :      * Calculate the last segment on the old timeline, and the first segment
                               5196                 :                :      * on the new timeline. If the switch happens in the middle of a segment,
                               5197                 :                :      * they are the same, but if the switch happens exactly at a segment
                               5198                 :                :      * boundary, startLogSegNo will be endLogSegNo + 1.
                               5199                 :                :      */
 2399 andres@anarazel.de       5200                 :             46 :     XLByteToPrevSeg(endOfLog, endLogSegNo, wal_segment_size);
                               5201                 :             46 :     XLByteToSeg(endOfLog, startLogSegNo, wal_segment_size);
                               5202                 :                : 
                               5203                 :                :     /*
                               5204                 :                :      * Initialize the starting WAL segment for the new timeline. If the switch
                               5205                 :                :      * happens in the middle of a segment, copy data from the last WAL segment
                               5206                 :                :      * of the old timeline up to the switch point, to the starting WAL segment
                               5207                 :                :      * on the new timeline.
                               5208                 :                :      */
 3405 heikki.linnakangas@i     5209         [ +  + ]:             46 :     if (endLogSegNo == startLogSegNo)
                               5210                 :                :     {
                               5211                 :                :         /*
                               5212                 :                :          * Make a copy of the file on the new timeline.
                               5213                 :                :          *
                               5214                 :                :          * Writing WAL isn't allowed yet, so there are no locking
                               5215                 :                :          * considerations. But we should be just as careful as XLogFileInit to
                               5216                 :                :          * avoid installing a bogus file.
                               5217                 :                :          */
  891 rhaas@postgresql.org     5218                 :             36 :         XLogFileCopy(newTLI, endLogSegNo, endTLI, endLogSegNo,
 2399 andres@anarazel.de       5219                 :             36 :                      XLogSegmentOffset(endOfLog, wal_segment_size));
                               5220                 :                :     }
                               5221                 :                :     else
                               5222                 :                :     {
                               5223                 :                :         /*
                               5224                 :                :          * The switch happened at a segment boundary, so just create the next
                               5225                 :                :          * segment on the new timeline.
                               5226                 :                :          */
                               5227                 :                :         int         fd;
                               5228                 :                : 
  891 rhaas@postgresql.org     5229                 :             10 :         fd = XLogFileInit(startLogSegNo, newTLI);
                               5230                 :                : 
 1744 peter@eisentraut.org     5231         [ -  + ]:             10 :         if (close(fd) != 0)
                               5232                 :                :         {
 1594 michael@paquier.xyz      5233                 :UBC           0 :             int         save_errno = errno;
                               5234                 :                : 
  891 rhaas@postgresql.org     5235                 :              0 :             XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 1594 michael@paquier.xyz      5236                 :              0 :             errno = save_errno;
 3402 heikki.linnakangas@i     5237         [ #  # ]:              0 :             ereport(ERROR,
                               5238                 :                :                     (errcode_for_file_access(),
                               5239                 :                :                      errmsg("could not close file \"%s\": %m", xlogfname)));
                               5240                 :                :         }
                               5241                 :                :     }
                               5242                 :                : 
                               5243                 :                :     /*
                               5244                 :                :      * Let's just make real sure there are no .ready or .done flags posted
                               5245                 :                :      * for the new segment.
                               5246                 :                :      */
  891 rhaas@postgresql.org     5247                 :CBC          46 :     XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 3461 fujii@postgresql.org     5248                 :             46 :     XLogArchiveCleanup(xlogfname);
 7209 tgl@sss.pgh.pa.us        5249                 :             46 : }
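
The segment arithmetic described in the comments of XLogInitNewTimeline can be illustrated with a small standalone sketch. The helper names below (`seg_of`, `prev_seg_of`, `seg_offset`) are hypothetical stand-ins, not the PostgreSQL macros themselves; they assume a WAL position is a plain 64-bit byte offset and that the segment size is a power of two.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of xlog.c. */

/* Number of the segment that contains the byte at position pos. */
static inline uint64_t
seg_of(uint64_t pos, uint64_t seg_size)
{
    return pos / seg_size;
}

/*
 * Number of the segment that contains the last byte before pos.
 * Assumes pos > 0. At an exact segment boundary this is one less
 * than seg_of(pos, seg_size); anywhere else the two are equal.
 */
static inline uint64_t
prev_seg_of(uint64_t pos, uint64_t seg_size)
{
    return (pos - 1) / seg_size;
}

/* Byte offset of pos within its segment; seg_size must be a power of two. */
static inline uint64_t
seg_offset(uint64_t pos, uint64_t seg_size)
{
    return pos & (seg_size - 1);
}
```

With a 16 MB segment size, an end-of-log position 4096 bytes into segment 3 yields the same number from `seg_of` and `prev_seg_of`, which corresponds to the copy branch above. A position exactly at the start of segment 3 yields 3 and 2 respectively, with an offset of zero, which corresponds to the branch that creates a fresh segment.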
                               5250                 :                : 
                               5251                 :                : /*
                               5252                 :                :  * Perform cleanup actions at the conclusion of archive recovery.
                               5253                 :                :  */
                               5254                 :                : static void
  891 rhaas@postgresql.org     5255                 :             46 : CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog,
                               5256                 :                :                             TimeLineID newTLI)
                               5257                 :                : {
                               5258                 :                :     /*
                               5259                 :                :      * Execute the recovery_end_command, if any.
                               5260                 :                :      */
  914                          5261   [ +  -  +  + ]:             46 :     if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
  433 michael@paquier.xyz      5262                 :              2 :         ExecuteRecoveryCommand(recoveryEndCommand,
                               5263                 :                :                                "recovery_end_command",
                               5264                 :                :                                true,
                               5265                 :                :                                WAIT_EVENT_RECOVERY_END_COMMAND);
                               5266                 :                : 
                               5267                 :                :     /*
                               5268                 :                :      * We switched to a new timeline. Clean up segments on the old timeline.
                               5269                 :                :      *
                               5270                 :                :      * If there are any higher-numbered segments on the old timeline, remove
                               5271                 :                :      * them. They might contain valid WAL, but they might also be
                               5272                 :                :      * pre-allocated files containing garbage. In any case, they are not part
                               5273                 :                :      * of the new timeline's history so we don't need them.
                               5274                 :                :      */
  891 rhaas@postgresql.org     5275                 :             46 :     RemoveNonParentXlogFiles(EndOfLog, newTLI);
                               5276                 :                : 
                               5277                 :                :     /*
                               5278                 :                :      * If the switch happened in the middle of a segment, what to do with the
                               5279                 :                :      * last, partial segment on the old timeline? If we don't archive it, and
                               5280                 :                :      * the server that created the WAL never archives it either (e.g. because
                               5281                 :                :      * it was hit by a meteor), it will never make it to the archive. That's
                               5282                 :                :      * OK from our point of view, because the new segment that we created with
                               5283                 :                :      * the new TLI contains all the WAL from the old timeline up to the switch
                               5284                 :                :      * point. But if you later try to do PITR to the "missing" WAL on the old
                               5285                 :                :      * timeline, recovery won't find it in the archive. It's physically
                               5286                 :                :      * present in the new file with new TLI, but recovery won't look there
                               5287                 :                :      * when it's recovering to the older timeline. On the other hand, if we
                               5288                 :                :      * archive the partial segment, and the original server on that timeline
                               5289                 :                :      * is still running and archives the completed version of the same segment
                               5290                 :                :      * later, it will fail. (We used to do that in 9.4 and below, and it
                               5291                 :                :      * caused such problems).
                               5292                 :                :      *
                               5293                 :                :      * As a compromise, we rename the last segment with the .partial suffix,
                               5294                 :                :      * and archive it. Archive recovery will never try to read .partial
                               5295                 :                :      * segments, so they will normally go unused. But in the odd PITR case,
                               5296                 :                :      * the administrator can copy them manually to the pg_wal directory
                               5297                 :                :      * (removing the suffix). They can be useful in debugging, too.
                               5298                 :                :      *
                               5299                 :                :      * If a .done or .ready file already exists for the old timeline, however,
                               5300                 :                :      * we had already determined that the segment is complete, so we can let
                               5301                 :                :      * it be archived normally. (In particular, if it was restored from the
                               5302                 :                :      * archive to begin with, it's expected to have a .done file).
                               5303                 :                :      */
  914                          5304   [ +  +  +  + ]:             82 :     if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
                               5305   [ +  +  -  + ]:             36 :         XLogArchivingActive())
                               5306                 :                :     {
                               5307                 :                :         char        origfname[MAXFNAMELEN];
                               5308                 :                :         XLogSegNo   endLogSegNo;
                               5309                 :                : 
                               5310                 :              7 :         XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
                               5311                 :              7 :         XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5312                 :                : 
                               5313         [ +  + ]:              7 :         if (!XLogArchiveIsReadyOrDone(origfname))
                               5314                 :                :         {
                               5315                 :                :             char        origpath[MAXPGPATH];
                               5316                 :                :             char        partialfname[MAXFNAMELEN];
                               5317                 :                :             char        partialpath[MAXPGPATH];
                               5318                 :                : 
                               5319                 :              4 :             XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5320                 :              4 :             snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
                               5321                 :              4 :             snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
                               5322                 :                : 
                               5323                 :                :             /*
                               5324                 :                :              * Make sure there's no .done or .ready file for the .partial
                               5325                 :                :              * file.
                               5326                 :                :              */
                               5327                 :              4 :             XLogArchiveCleanup(partialfname);
                               5328                 :                : 
                               5329                 :              4 :             durable_rename(origpath, partialpath, ERROR);
                               5330                 :              4 :             XLogArchiveNotify(partialfname);
                               5331                 :                :         }
                               5332                 :                :     }
                               5333                 :             46 : }
                               5334                 :                : 
                               5335                 :                : /*
                               5336                 :                :  * Check to see if required parameters are set high enough on this server
                               5337                 :                :  * for various aspects of recovery operation.
                               5338                 :                :  *
                               5339                 :                :  * Note that all the parameters which this function tests need to be
                               5340                 :                :  * listed in the Administrator's Overview section in high-availability.sgml.
                               5341                 :                :  * If you change them, don't forget to update the list.
                               5342                 :                :  */
                               5343                 :                : static void
  788 heikki.linnakangas@i     5344                 :            273 : CheckRequiredParameterValues(void)
                               5345                 :                : {
                               5346                 :                :     /*
                               5347                 :                :      * For archive recovery, the WAL must be generated with at least 'replica'
                               5348                 :                :      * wal_level.
                               5349                 :                :      */
                               5350   [ +  +  +  + ]:            273 :     if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
                               5351                 :                :     {
                               5352         [ +  - ]:              2 :         ereport(FATAL,
                               5353                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5354                 :                :                  errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
                               5355                 :                :                  errdetail("This happens if you temporarily set wal_level=minimal on the server."),
                               5356                 :                :                  errhint("Use a backup taken after setting wal_level to higher than minimal.")));
                               5357                 :                :     }
                               5358                 :                : 
                               5359                 :                :     /*
                               5360                 :                :      * For Hot Standby, the WAL must be generated with 'replica' mode, and we
                               5361                 :                :      * must have at least as many backend slots as the primary.
                               5362                 :                :      */
 3693                          5363   [ +  +  +  + ]:            271 :     if (ArchiveRecoveryRequested && EnableHotStandby)
                               5364                 :                :     {
                               5365                 :                :         /* We ignore autovacuum_max_workers when we make this test. */
 5100                          5366                 :            153 :         RecoveryRequiresIntParameter("max_connections",
                               5367                 :                :                                      MaxConnections,
 5099 tgl@sss.pgh.pa.us        5368                 :            153 :                                      ControlFile->MaxConnections);
 3937 rhaas@postgresql.org     5369                 :            153 :         RecoveryRequiresIntParameter("max_worker_processes",
                               5370                 :                :                                      max_worker_processes,
                               5371                 :            153 :                                      ControlFile->max_worker_processes);
 1888 michael@paquier.xyz      5372                 :            153 :         RecoveryRequiresIntParameter("max_wal_senders",
                               5373                 :                :                                      max_wal_senders,
                               5374                 :            153 :                                      ControlFile->max_wal_senders);
 4239 tgl@sss.pgh.pa.us        5375                 :            153 :         RecoveryRequiresIntParameter("max_prepared_transactions",
                               5376                 :                :                                      max_prepared_xacts,
 5099                          5377                 :            153 :                                      ControlFile->max_prepared_xacts);
 4239                          5378                 :            153 :         RecoveryRequiresIntParameter("max_locks_per_transaction",
                               5379                 :                :                                      max_locks_per_xact,
 5099                          5380                 :            153 :                                      ControlFile->max_locks_per_xact);
                               5381                 :                :     }
 5230 simon@2ndQuadrant.co     5382                 :            271 : }
                               5383                 :                : 
                               5384                 :                : /*
                               5385                 :                :  * This must be called ONCE during postmaster or standalone-backend startup
                               5386                 :                :  */
                               5387                 :                : void
 8433 tgl@sss.pgh.pa.us        5388                 :            823 : StartupXLOG(void)
                               5389                 :                : {
                               5390                 :                :     XLogCtlInsert *Insert;
                               5391                 :                :     CheckPoint  checkPoint;
                               5392                 :                :     bool        wasShutdown;
                               5393                 :                :     bool        didCrash;
                               5394                 :                :     bool        haveTblspcMap;
                               5395                 :                :     bool        haveBackupLabel;
                               5396                 :                :     XLogRecPtr  EndOfLog;
                               5397                 :                :     TimeLineID  EndOfLogTLI;
                               5398                 :                :     TimeLineID  newTLI;
                               5399                 :                :     bool        performedWalRecovery;
                               5400                 :                :     EndOfWalRecoveryInfo *endOfRecoveryInfo;
                               5401                 :                :     XLogRecPtr  abortedRecPtr;
                               5402                 :                :     XLogRecPtr  missingContrecPtr;
                               5403                 :                :     TransactionId oldestActiveXID;
 1355 fujii@postgresql.org     5404                 :            823 :     bool        promoted = false;
                               5405                 :                : 
                               5406                 :                :     /*
                               5407                 :                :      * We should have an aux process resource owner to use, and we should not
                               5408                 :                :      * be in a transaction that's installed some other resowner.
                               5409                 :                :      */
 2097 tgl@sss.pgh.pa.us        5410         [ -  + ]:            823 :     Assert(AuxProcessResourceOwner != NULL);
                               5411   [ +  -  -  + ]:            823 :     Assert(CurrentResourceOwner == NULL ||
                               5412                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               5413                 :            823 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               5414                 :                : 
                               5415                 :                :     /*
                               5416                 :                :      * Check that contents look valid.
                               5417                 :                :      */
 1619 peter@eisentraut.org     5418         [ -  + ]:            823 :     if (!XRecOffIsValid(ControlFile->checkPoint))
 7573 tgl@sss.pgh.pa.us        5419         [ #  # ]:UBC           0 :         ereport(FATAL,
                               5420                 :                :                 (errcode(ERRCODE_DATA_CORRUPTED),
                               5421                 :                :                  errmsg("control file contains invalid checkpoint location")));
                               5422                 :                : 
 1619 peter@eisentraut.org     5423   [ +  +  -  -  :CBC         823 :     switch (ControlFile->state)
                                           +  +  - ]
                               5424                 :                :     {
                               5425                 :            591 :         case DB_SHUTDOWNED:
                               5426                 :                : 
                               5427                 :                :             /*
                               5428                 :                :              * This is the expected case, so don't be chatty in standalone
                               5429                 :                :              * mode
                               5430                 :                :              */
                               5431   [ +  +  +  + ]:            591 :             ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               5432                 :                :                     (errmsg("database system was shut down at %s",
                               5433                 :                :                             str_time(ControlFile->time))));
                               5434                 :            591 :             break;
                               5435                 :                : 
                               5436                 :             31 :         case DB_SHUTDOWNED_IN_RECOVERY:
                               5437         [ +  - ]:             31 :             ereport(LOG,
                               5438                 :                :                     (errmsg("database system was shut down in recovery at %s",
                               5439                 :                :                             str_time(ControlFile->time))));
                               5440                 :             31 :             break;
                               5441                 :                : 
 1619 peter@eisentraut.org     5442                 :UBC           0 :         case DB_SHUTDOWNING:
                               5443         [ #  # ]:              0 :             ereport(LOG,
                               5444                 :                :                     (errmsg("database system shutdown was interrupted; last known up at %s",
                               5445                 :                :                             str_time(ControlFile->time))));
                               5446                 :              0 :             break;
                               5447                 :                : 
                               5448                 :              0 :         case DB_IN_CRASH_RECOVERY:
                               5449         [ #  # ]:              0 :             ereport(LOG,
                               5450                 :                :                     (errmsg("database system was interrupted while in recovery at %s",
                               5451                 :                :                             str_time(ControlFile->time)),
                               5452                 :                :                      errhint("This probably means that some data is corrupted and"
                               5453                 :                :                              " you will have to use the last backup for recovery.")));
                               5454                 :              0 :             break;
                               5455                 :                : 
 1619 peter@eisentraut.org     5456                 :CBC          22 :         case DB_IN_ARCHIVE_RECOVERY:
                               5457         [ +  - ]:             22 :             ereport(LOG,
                               5458                 :                :                     (errmsg("database system was interrupted while in recovery at log time %s",
                               5459                 :                :                             str_time(ControlFile->checkPointCopy.time)),
                               5460                 :                :                      errhint("If this has occurred more than once some data might be corrupted"
                               5461                 :                :                              " and you might need to choose an earlier recovery target.")));
                               5462                 :             22 :             break;
                               5463                 :                : 
                               5464                 :            179 :         case DB_IN_PRODUCTION:
                               5465         [ +  - ]:            179 :             ereport(LOG,
                               5466                 :                :                     (errmsg("database system was interrupted; last known up at %s",
                               5467                 :                :                             str_time(ControlFile->time))));
                               5468                 :            179 :             break;
                               5469                 :                : 
 1619 peter@eisentraut.org     5470                 :UBC           0 :         default:
                               5471         [ #  # ]:              0 :             ereport(FATAL,
                               5472                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               5473                 :                :                      errmsg("control file contains invalid database cluster state")));
                               5474                 :                :     }
                               5475                 :                : 
                               5476                 :                :     /* This is just to allow attaching to startup process with a debugger */
                               5477                 :                : #ifdef XLOG_REPLAY_DELAY
                               5478                 :                :     if (ControlFile->state != DB_SHUTDOWNED)
                               5479                 :                :         pg_usleep(60000000L);
                               5480                 :                : #endif
                               5481                 :                : 
                               5482                 :                :     /*
                               5483                 :                :      * Verify that pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               5484                 :                :      * In cases where someone has performed a copy for PITR, these directories
                               5485                 :                :      * may have been excluded and need to be re-created.
                               5486                 :                :      */
 5635 tgl@sss.pgh.pa.us        5487                 :CBC         823 :     ValidateXLOGDirectoryStructure();
                               5488                 :                : 
                               5489                 :                :     /* Set up timeout handler needed to report startup progress. */
  902 rhaas@postgresql.org     5490         [ +  + ]:            823 :     if (!IsBootstrapProcessingMode())
                               5491                 :            784 :         RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
                               5492                 :                :                         startup_progress_timeout_handler);
                               5493                 :                : 
                               5494                 :                :     /*----------
                               5495                 :                :      * If we previously crashed, perform a couple of actions:
                               5496                 :                :      *
                               5497                 :                :      * - The pg_wal directory may still include some temporary WAL segments
                               5498                 :                :      *   used when creating a new segment, so clean them up to avoid
                               5499                 :                :      *   bloating this path.  This is done first, as there is no point in
                               5500                 :                :      *   syncing this temporary data.
                               5501                 :                :      *
                               5502                 :                :      * - There might be data which we had written, intending to fsync it, but
                               5503                 :                :      *   which we had not actually fsync'd yet.  Therefore, a power failure in
                               5504                 :                :      *   the near future might cause earlier unflushed writes to be lost, even
                               5505                 :                :      *   though more recent data written to disk from here on would be
                               5506                 :                :      *   persisted.  To avoid that, fsync the entire data directory.
                               5507                 :                :      */
  788 heikki.linnakangas@i     5508         [ +  + ]:            823 :     if (ControlFile->state != DB_SHUTDOWNED &&
                               5509         [ +  + ]:            232 :         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
                               5510                 :                :     {
                               5511                 :            201 :         RemoveTempXlogFiles();
                               5512                 :            201 :         SyncDataDirectory();
  739 andres@anarazel.de       5513                 :            201 :         didCrash = true;
                               5514                 :                :     }
                               5515                 :                :     else
                               5516                 :            622 :         didCrash = false;
                               5517                 :                : 
                               5518                 :                :     /*
                               5519                 :                :      * Prepare for WAL recovery if needed.
                               5520                 :                :      *
                               5521                 :                :      * InitWalRecovery analyzes the control file and the backup label file, if
                               5522                 :                :      * any.  It updates the in-memory ControlFile buffer according to the
                               5523                 :                :      * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
                               5524                 :                :      * It also applies the tablespace map file, if any.
                               5525                 :                :      */
  788 heikki.linnakangas@i     5526                 :            823 :     InitWalRecovery(ControlFile, &wasShutdown,
                               5527                 :                :                     &haveBackupLabel, &haveTblspcMap);
                               5528                 :            823 :     checkPoint = ControlFile->checkPointCopy;
                               5529                 :                : 
                               5530                 :                :     /* initialize shared memory variables from the checkpoint record */
  128 heikki.linnakangas@i     5531                 :GNC         823 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5532                 :            823 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5533                 :            823 :     TransamVariables->oidCount = 0;
 6885 tgl@sss.pgh.pa.us        5534                 :CBC         823 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 2579 rhaas@postgresql.org     5535                 :            823 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5170 tgl@sss.pgh.pa.us        5536                 :            823 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
 2588                          5537                 :            823 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB, true);
 3030 mail@joeconway.com       5538                 :            823 :     SetCommitTsLimit(checkPoint.oldestCommitTsXid,
                               5539                 :                :                      checkPoint.newestCommitTsXid);
 1342 andres@anarazel.de       5540                 :            823 :     XLogCtl->ckptFullXid = checkPoint.nextXid;
                               5541                 :                : 
                               5542                 :                :     /*
                               5543                 :                :      * Clear out any old relcache cache files.  This is *necessary* if we do
                               5544                 :                :      * any WAL replay, since that would probably result in the cache files
                               5545                 :                :      * being out of sync with database reality.  In theory we could leave them
                               5546                 :                :      * in place if the database had been cleanly shut down, but it seems
                               5547                 :                :      * safest to just remove them always and let them be rebuilt during the
                               5548                 :                :      * first backend startup.  These files need to be removed from all
                               5549                 :                :      * directories including pg_tblspc; however, the symlinks are created only
                               5550                 :                :      * after reading the tablespace_map file in case of archive recovery from
                               5551                 :                :      * a backup, so we need to clear old relcache files here, after creating
                               5552                 :                :      * the symlinks.
                               5553                 :                :      */
  788 heikki.linnakangas@i     5554                 :            823 :     RelationCacheInitFileRemove();
                               5555                 :                : 
                               5556                 :                :     /*
                               5557                 :                :      * Initialize replication slots, before there's a chance to remove
                               5558                 :                :      * required resources.
                               5559                 :                :      */
 3594 andres@anarazel.de       5560                 :            823 :     StartupReplicationSlots();
                               5561                 :                : 
                               5562                 :                :     /*
                               5563                 :                :      * Start up logical state.  This needs to be set up now so we have proper
                               5564                 :                :      * data during crash recovery.
                               5565                 :                :      */
 3695 rhaas@postgresql.org     5566                 :            823 :     StartupReorderBuffer();
                               5567                 :                : 
                               5568                 :                :     /*
                               5569                 :                :      * Startup CLOG. This must be done after TransamVariables->nextXid has
                               5570                 :                :      * been initialized and before we accept connections or begin WAL replay.
                               5571                 :                :      */
 1173                          5572                 :            823 :     StartupCLOG();
                               5573                 :                : 
                               5574                 :                :     /*
                               5575                 :                :      * Startup MultiXact. We need to do this early to be able to replay
                               5576                 :                :      * truncations.
                               5577                 :                :      */
 3789 alvherre@alvh.no-ip.     5578                 :            823 :     StartupMultiXact();
                               5579                 :                : 
                               5580                 :                :     /*
                               5581                 :                :      * Ditto for commit timestamps.  Activate the facility if the setting is
                               5582                 :                :      * enabled in the control file, as there should be no tracking of commit
                               5583                 :                :      * timestamps done when the setting was disabled.  This facility can be
                               5584                 :                :      * started or stopped when replaying a XLOG_PARAMETER_CHANGE record.
                               5585                 :                :      */
 2027 michael@paquier.xyz      5586         [ +  + ]:            823 :     if (ControlFile->track_commit_timestamp)
 3047 alvherre@alvh.no-ip.     5587                 :              9 :         StartupCommitTs();
                               5588                 :                : 
                               5589                 :                :     /*
                               5590                 :                :      * Recover knowledge about replay progress of known replication partners.
                               5591                 :                :      */
 3273 andres@anarazel.de       5592                 :            823 :     StartupReplicationOrigin();
                               5593                 :                : 
                               5594                 :                :     /*
                               5595                 :                :      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
                               5596                 :                :      * control file. On recovery, all unlogged relations are blown away, so
                               5597                 :                :      * the unlogged LSN counter can be reset too.
                               5598                 :                :      */
 4080 heikki.linnakangas@i     5599         [ +  + ]:            823 :     if (ControlFile->state == DB_SHUTDOWNED)
   45 nathan@postgresql.or     5600                 :GNC         581 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5601                 :            581 :                                        ControlFile->unloggedLSN);
                               5602                 :                :     else
                               5603                 :            242 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5604                 :                :                                        FirstNormalUnloggedLSN);
                               5605                 :                : 
                               5606                 :                :     /*
                               5607                 :                :      * Copy any missing timeline history files between 'now' and the recovery
                               5608                 :                :      * target timeline from archive to pg_wal. While we don't need those files
                               5609                 :                :      * ourselves - the history file of the recovery target timeline covers all
                               5610                 :                :      * the previous timelines in the history too - a cascading standby server
                               5611                 :                :      * might be interested in them. Or, if you archive the WAL from this
                               5612                 :                :      * server to a different archive than the primary, it'd be good for all
                               5613                 :                :      * the history files to get archived there after failover, so that you can
                               5614                 :                :      * use one of the old timelines as a PITR target. Timeline history files
                               5615                 :                :      * are small, so it's better to copy them unnecessarily than not copy them
                               5616                 :                :      * and regret it later.
                               5617                 :                :      */
  788 heikki.linnakangas@i     5618                 :CBC         823 :     restoreTimeLineHistoryFiles(checkPoint.ThisTimeLineID, recoveryTargetTLI);
                               5619                 :                : 
                               5620                 :                :     /*
                               5621                 :                :      * Before running in recovery, scan pg_twophase and fill in its status to
                               5622                 :                :      * be able to work on entries generated by redo.  Doing a scan before
                               5623                 :                :      * taking any recovery action has the merit of discarding any 2PC files
                               5624                 :                :      * that are newer than the first record to replay, which avoids conflicts
                               5625                 :                :      * at replay.  It also avoids any subsequent scans of the on-disk
                               5626                 :                :      * two-phase data during recovery.
                               5627                 :                :      */
 2567 simon@2ndQuadrant.co     5628                 :            823 :     restoreTwoPhaseData();
                               5629                 :                : 
                               5630                 :                :     /*
                               5631                 :                :      * When starting with crash recovery, reset pgstat data - it might not be
                               5632                 :                :      * valid. Otherwise restore pgstat data. It's safe to do this here,
                               5633                 :                :      * because postmaster will not yet have started any other processes.
                               5634                 :                :      *
                               5635                 :                :      * NB: Restoring replication slot stats relies on slot state to have
                               5636                 :                :      * already been restored from disk.
                               5637                 :                :      *
                               5638                 :                :      * TODO: With a bit of extra work we could just start with a pgstat file
                               5639                 :                :      * associated with the checkpoint redo location we're starting from.
                               5640                 :                :      */
  739 andres@anarazel.de       5641         [ +  + ]:            823 :     if (didCrash)
                               5642                 :            201 :         pgstat_discard_stats();
                               5643                 :                :     else
                               5644                 :            622 :         pgstat_restore_stats();
                               5645                 :                : 
 4463 simon@2ndQuadrant.co     5646                 :            823 :     lastFullPageWrites = checkPoint.fullPageWrites;
                               5647                 :                : 
 3933 heikki.linnakangas@i     5648                 :            823 :     RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
 3447                          5649                 :            823 :     doPageWrites = lastFullPageWrites;
                               5650                 :                : 
                               5651                 :                :     /* REDO */
 7193 tgl@sss.pgh.pa.us        5652         [ +  + ]:            823 :     if (InRecovery)
                               5653                 :                :     {
                               5654                 :                :         /* Initialize state for RecoveryInProgress() */
  788 heikki.linnakangas@i     5655         [ -  + ]:            242 :         SpinLockAcquire(&XLogCtl->info_lck);
                               5656         [ +  + ]:            242 :         if (InArchiveRecovery)
                               5657                 :            141 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               5658                 :                :         else
                               5659                 :            101 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
                               5660                 :            242 :         SpinLockRelease(&XLogCtl->info_lck);
                               5661                 :                : 
                               5662                 :                :         /*
                               5663                 :                :          * Update pg_control to show that we are recovering and to show the
                               5664                 :                :          * selected checkpoint as the place we are starting from. We also mark
                               5665                 :                :          * pg_control with any minimum recovery stop point obtained from a
                               5666                 :                :          * backup history file.
                               5667                 :                :          *
                               5668                 :                :          * No need to hold ControlFileLock yet, we aren't up far enough.
                               5669                 :                :          */
                               5670                 :            242 :         UpdateControlFile();
                               5671                 :                : 
                               5672                 :                :         /*
                               5673                 :                :          * If there was a backup label file, it's done its job and the info
                               5674                 :                :          * has now been propagated into pg_control.  We must get rid of the
                               5675                 :                :          * label file so that if we crash during recovery, we'll pick up at
                               5676                 :                :          * the latest recovery restartpoint instead of going all the way back
                               5677                 :                :          * to the backup start point.  It seems prudent though to just rename
                               5678                 :                :          * the file out of the way rather than delete it completely.
                               5679                 :                :          */
                               5680         [ +  + ]:            242 :         if (haveBackupLabel)
                               5681                 :                :         {
                               5682                 :             95 :             unlink(BACKUP_LABEL_OLD);
                               5683                 :             95 :             durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, FATAL);
                               5684                 :                :         }
                               5685                 :                : 
                               5686                 :                :         /*
                               5687                 :                :          * If there was a tablespace_map file, it's done its job and the
                               5688                 :                :          * symlinks have been created.  We must get rid of the map file so
                               5689                 :                :          * that if we crash during recovery, we don't create symlinks again.
                               5690                 :                :          * It seems prudent though to just rename the file out of the way
                               5691                 :                :          * rather than delete it completely.
                               5692                 :                :          */
                               5693         [ +  + ]:            242 :         if (haveTblspcMap)
                               5694                 :                :         {
                               5695                 :              1 :             unlink(TABLESPACE_MAP_OLD);
                               5696                 :              1 :             durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, FATAL);
                               5697                 :                :         }
                               5698                 :                : 
                               5699                 :                :         /*
                               5700                 :                :          * Initialize our local copy of minRecoveryPoint.  When doing crash
                               5701                 :                :          * recovery we want to replay up to the end of WAL.  Particularly, in
                               5702                 :                :          * the case of a promoted standby, the minRecoveryPoint value in the
                               5703                 :                :          * control file is only updated after the first checkpoint.  However,
                               5704                 :                :          * if the instance crashes before the first post-recovery checkpoint
                               5705                 :                :          * is completed then recovery will use a stale location causing the
                               5706                 :                :          * startup process to think that there are still invalid page
                               5707                 :                :          * references when checking for data consistency.
                               5708                 :                :          */
 2110 michael@paquier.xyz      5709         [ +  + ]:            242 :         if (InArchiveRecovery)
                               5710                 :                :         {
  788 heikki.linnakangas@i     5711                 :            141 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               5712                 :            141 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               5713                 :                :         }
                               5714                 :                :         else
                               5715                 :                :         {
                               5716                 :            101 :             LocalMinRecoveryPoint = InvalidXLogRecPtr;
                               5717                 :            101 :             LocalMinRecoveryPointTLI = 0;
                               5718                 :                :         }
                               5719                 :                : 
                               5720                 :                :         /* Check that the GUCs used to generate the WAL allow recovery */
 5100                          5721                 :            242 :         CheckRequiredParameterValues();
                               5722                 :                : 
                               5723                 :                :         /*
                               5724                 :                :          * We're in recovery, so unlogged relations may be trashed and must be
                               5725                 :                :          * reset.  This should be done BEFORE allowing Hot Standby
                               5726                 :                :          * connections, so that read-only backends don't try to read whatever
                               5727                 :                :          * garbage is left over from before.
                               5728                 :                :          */
 4855 rhaas@postgresql.org     5729                 :            242 :         ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
                               5730                 :                : 
                               5731                 :                :         /*
                               5732                 :                :          * Likewise, delete any saved transaction snapshot files that got left
                               5733                 :                :          * behind by crashed backends.
                               5734                 :                :          */
 4558 tgl@sss.pgh.pa.us        5735                 :            242 :         DeleteAllExportedSnapshotFiles();
                               5736                 :                : 
                               5737                 :                :         /*
                               5738                 :                :          * Initialize for Hot Standby, if enabled. We won't let backends in
                               5739                 :                :          * yet, not until we've reached the min recovery point specified in
                               5740                 :                :          * the control file and we've established a recovery snapshot from a
                               5741                 :                :          * running-xacts WAL record.
                               5742                 :                :          */
 4069 heikki.linnakangas@i     5743   [ +  +  +  + ]:            242 :         if (ArchiveRecoveryRequested && EnableHotStandby)
                               5744                 :                :         {
                               5745                 :                :             TransactionId *xids;
                               5746                 :                :             int         nxids;
                               5747                 :                : 
 5175                          5748         [ +  + ]:            138 :             ereport(DEBUG1,
                               5749                 :                :                     (errmsg_internal("initializing for hot standby")));
                               5750                 :                : 
 5230 simon@2ndQuadrant.co     5751                 :            138 :             InitRecoveryTransactionEnvironment();
                               5752                 :                : 
                               5753         [ +  + ]:            138 :             if (wasShutdown)
                               5754                 :             25 :                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               5755                 :                :             else
                               5756                 :            113 :                 oldestActiveXID = checkPoint.oldestActiveXid;
                               5757         [ -  + ]:            138 :             Assert(TransactionIdIsValid(oldestActiveXID));
                               5758                 :                : 
                               5759                 :                :             /* Tell procarray about the range of xids it has to deal with */
  128 heikki.linnakangas@i     5760                 :GNC         138 :             ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid));
                               5761                 :                : 
                               5762                 :                :             /*
                               5763                 :                :              * Startup subtrans only.  CLOG, MultiXact and commit timestamp
                               5764                 :                :              * have already been started up and other SLRUs are not maintained
                               5765                 :                :              * during recovery and need not be started yet.
                               5766                 :                :              */
 5230 simon@2ndQuadrant.co     5767                 :CBC         138 :             StartupSUBTRANS(oldestActiveXID);
                               5768                 :                : 
                               5769                 :                :             /*
                               5770                 :                :              * If we're beginning at a shutdown checkpoint, we know that
                               5771                 :                :              * nothing was running on the primary at this point. So fake-up an
                               5772                 :                :              * empty running-xacts record and use that here and now. Recover
                               5773                 :                :              * additional standby state for prepared transactions.
                               5774                 :                :              */
 5115 heikki.linnakangas@i     5775         [ +  + ]:            138 :             if (wasShutdown)
                               5776                 :                :             {
                               5777                 :                :                 RunningTransactionsData running;
                               5778                 :                :                 TransactionId latestCompletedXid;
                               5779                 :                : 
                               5780                 :                :                 /*
                               5781                 :                :                  * Construct a RunningTransactions snapshot representing a
                               5782                 :                :                  * shut down server, with only prepared transactions still
                               5783                 :                :                  * alive. We're never overflowed at this point because all
                               5784                 :                :                  * subxids are listed with their parent prepared transactions.
                               5785                 :                :                  */
                               5786                 :             25 :                 running.xcnt = nxids;
 4151 simon@2ndQuadrant.co     5787                 :             25 :                 running.subxcnt = 0;
 5115 heikki.linnakangas@i     5788                 :             25 :                 running.subxid_overflow = false;
 1342 andres@anarazel.de       5789                 :             25 :                 running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5115 heikki.linnakangas@i     5790                 :             25 :                 running.oldestRunningXid = oldestActiveXID;
 1342 andres@anarazel.de       5791                 :             25 :                 latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5085 simon@2ndQuadrant.co     5792         [ -  + ]:             25 :                 TransactionIdRetreat(latestCompletedXid);
 5084                          5793         [ -  + ]:             25 :                 Assert(TransactionIdIsNormal(latestCompletedXid));
 5085                          5794                 :             25 :                 running.latestCompletedXid = latestCompletedXid;
 5115 heikki.linnakangas@i     5795                 :             25 :                 running.xids = xids;
                               5796                 :                : 
                               5797                 :             25 :                 ProcArrayApplyRecoveryInfo(&running);
                               5798                 :                : 
 2544 simon@2ndQuadrant.co     5799                 :             25 :                 StandbyRecoverPreparedTransactions();
                               5800                 :                :             }
                               5801                 :                :         }
                               5802                 :                : 
                               5803                 :                :         /*
                               5804                 :                :          * We're all set for replaying the WAL now. Do it.
                               5805                 :                :          */
  788 heikki.linnakangas@i     5806                 :            242 :         PerformWalRecovery();
                               5807                 :            148 :         performedWalRecovery = true;
                               5808                 :                :     }
                               5809                 :                :     else
  784                          5810                 :            581 :         performedWalRecovery = false;
                               5811                 :                : 
                               5812                 :                :     /*
                               5813                 :                :      * Finish WAL recovery.
                               5814                 :                :      */
  788                          5815                 :            729 :     endOfRecoveryInfo = FinishWalRecovery();
                               5816                 :            729 :     EndOfLog = endOfRecoveryInfo->endOfLog;
                               5817                 :            729 :     EndOfLogTLI = endOfRecoveryInfo->endOfLogTLI;
                               5818                 :            729 :     abortedRecPtr = endOfRecoveryInfo->abortedRecPtr;
                               5819                 :            729 :     missingContrecPtr = endOfRecoveryInfo->missingContrecPtr;
                               5820                 :                : 
                               5821                 :                :     /*
                               5822                 :                :      * Reset ps status display, so that no information related to recovery
                               5823                 :                :      * shows up.
                               5824                 :                :      */
  570 michael@paquier.xyz      5825                 :            729 :     set_ps_display("");
                               5826                 :                : 
                               5827                 :                :     /*
                               5828                 :                :      * When recovering from a backup (we are in recovery, and archive recovery
                               5829                 :                :      * was requested), complain if we did not roll forward far enough to reach
                               5830                 :                :      * the point where the database is consistent.  For regular online
                               5831                 :                :      * backup-from-primary, that means reaching the end-of-backup WAL record
                               5832                 :                :      * (at which point we reset backupStartPoint to be Invalid), for
                               5833                 :                :      * backup-from-replica (which can't inject records into the WAL stream),
                               5834                 :                :      * that point is when we reach the minRecoveryPoint in pg_control (which
                               5835                 :                :      * we purposefully copy last when backing up from a replica).  For
                               5836                 :                :      * pg_rewind (which creates a backup_label with a method of "pg_rewind")
                               5837                 :                :      * or snapshot-style backups (which don't), backupEndRequired will be set
                               5838                 :                :      * to false.
                               5839                 :                :      *
                               5840                 :                :      * Note: it is indeed okay to look at the local variable
                               5841                 :                :      * LocalMinRecoveryPoint here, even though ControlFile->minRecoveryPoint
                               5842                 :                :      * might be further ahead --- ControlFile->minRecoveryPoint cannot have
                               5843                 :                :      * been advanced beyond the WAL we processed.
                               5844                 :                :      */
 4764 heikki.linnakangas@i     5845         [ +  + ]:            729 :     if (InRecovery &&
  788                          5846         [ +  - ]:            148 :         (EndOfLog < LocalMinRecoveryPoint ||
 5214                          5847         [ -  + ]:            148 :          !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
                               5848                 :                :     {
                               5849                 :                :         /*
                               5850                 :                :          * Ran off end of WAL before reaching end-of-backup WAL record, or
                               5851                 :                :          * minRecoveryPoint. That's a bad sign, indicating that you tried to
                               5852                 :                :          * recover from an online backup but never called pg_backup_stop(), or
                               5853                 :                :          * you didn't archive all the WAL needed.
                               5854                 :                :          */
 4069 heikki.linnakangas@i     5855   [ #  #  #  # ]:UBC           0 :         if (ArchiveRecoveryRequested || ControlFile->backupEndRequired)
                               5856                 :                :         {
  739 sfrost@snowman.net       5857   [ #  #  #  # ]:              0 :             if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) || ControlFile->backupEndRequired)
 4631 heikki.linnakangas@i     5858         [ #  # ]:              0 :                 ereport(FATAL,
                               5859                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5860                 :                :                          errmsg("WAL ends before end of online backup"),
                               5861                 :                :                          errhint("All WAL generated while online backup was taken must be available at recovery.")));
                               5862                 :                :             else
 4750                          5863         [ #  # ]:              0 :                 ereport(FATAL,
                               5864                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5865                 :                :                          errmsg("WAL ends before consistent recovery point")));
                               5866                 :                :         }
                               5867                 :                :     }
                               5868                 :                : 
                               5869                 :                :     /*
                               5870                 :                :      * Reset unlogged relations to the contents of their INIT fork. This is
                               5871                 :                :      * done AFTER recovery is complete so as to include any unlogged relations
                               5872                 :                :      * created during recovery, but BEFORE recovery is marked as having
                               5873                 :                :      * completed successfully. Otherwise we'd not retry if any of the post
                               5874                 :                :      * end-of-recovery steps fail.
                               5875                 :                :      */
  788 heikki.linnakangas@i     5876         [ +  + ]:CBC         729 :     if (InRecovery)
                               5877                 :            148 :         ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
                               5878                 :                : 
                               5879                 :                :     /*
                               5880                 :                :      * Pre-scan prepared transactions to find out the range of XIDs present.
                               5881                 :                :      * This information is not quite needed yet, but it is positioned here so
                               5882                 :                :      * that potential problems are detected before any on-disk change is done.
                               5883                 :                :      */
 2106 michael@paquier.xyz      5884                 :            729 :     oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
                               5885                 :                : 
                               5886                 :                :     /*
                               5887                 :                :      * Allow ordinary WAL segment creation before possibly switching to a new
                               5888                 :                :      * timeline, which creates a new segment, and after the last ReadRecord().
                               5889                 :                :      */
  606                          5890                 :            729 :     SetInstallXLogFileSegmentActive();
                               5891                 :                : 
                               5892                 :                :     /*
                               5893                 :                :      * Consider whether we need to assign a new timeline ID.
                               5894                 :                :      *
                               5895                 :                :      * If we did archive recovery, we always assign a new ID.  This handles a
                               5896                 :                :      * couple of issues.  If we stopped short of the end of WAL during
                               5897                 :                :      * recovery, then we are clearly generating a new timeline and must assign
                               5898                 :                :      * it a unique new ID.  Even if we ran to the end, modifying the current
                               5899                 :                :      * last segment is problematic because it may result in trying to
                               5900                 :                :      * overwrite an already-archived copy of that segment, and we encourage
                               5901                 :                :      * DBAs to make their archive_commands reject that.  We can dodge the
                               5902                 :                :      * problem by making the new active segment have a new timeline ID.
                               5903                 :                :      *
                               5904                 :                :      * In a normal crash recovery, we can just extend the timeline we were in.
                               5905                 :                :      */
  788 heikki.linnakangas@i     5906                 :            729 :     newTLI = endOfRecoveryInfo->lastRecTLI;
 4069                          5907         [ +  + ]:            729 :     if (ArchiveRecoveryRequested)
                               5908                 :                :     {
  886 rhaas@postgresql.org     5909                 :             46 :         newTLI = findNewestTimeLine(recoveryTargetTLI) + 1;
 7207 tgl@sss.pgh.pa.us        5910         [ +  - ]:             46 :         ereport(LOG,
                               5911                 :                :                 (errmsg("selected new timeline ID: %u", newTLI)));
                               5912                 :                : 
                               5913                 :                :         /*
                               5914                 :                :          * Make a writable copy of the last WAL segment.  (Note that we also
                               5915                 :                :          * have a copy of the last block of the old WAL in
                               5916                 :                :          * endOfRecoveryInfo->lastPage; we will use that below.)
                               5917                 :                :          */
  788 heikki.linnakangas@i     5918                 :             46 :         XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);
                               5919                 :                : 
                               5920                 :                :         /*
                               5921                 :                :          * Remove the signal files, so that we don't
                               5922                 :                :          * accidentally re-enter archive recovery mode in a subsequent crash.
                               5923                 :                :          */
                               5924         [ +  + ]:             46 :         if (endOfRecoveryInfo->standby_signal_file_found)
                               5925                 :             43 :             durable_unlink(STANDBY_SIGNAL_FILE, FATAL);
                               5926                 :                : 
                               5927         [ +  + ]:             46 :         if (endOfRecoveryInfo->recovery_signal_file_found)
                               5928                 :              3 :             durable_unlink(RECOVERY_SIGNAL_FILE, FATAL);
                               5929                 :                : 
                               5930                 :                :         /*
                               5931                 :                :          * Write the timeline history file, and have it archived. After this
                               5932                 :                :          * point (or rather, as soon as the file is archived), the timeline
                               5933                 :                :          * will appear as "taken" in the WAL archive and to any standby
                               5934                 :                :          * servers.  If we crash before actually switching to the new
                               5935                 :                :          * timeline, standby servers will nevertheless think that we switched
                               5936                 :                :          * to the new timeline, and will try to connect to the new timeline.
                               5937                 :                :          * To minimize the window for that, try to do as little as possible
                               5938                 :                :          * between here and writing the end-of-recovery record.
                               5939                 :                :          */
  886 rhaas@postgresql.org     5940                 :             46 :         writeTimeLineHistory(newTLI, recoveryTargetTLI,
                               5941                 :                :                              EndOfLog, endOfRecoveryInfo->recoveryStopReason);
                               5942                 :                : 
  788 heikki.linnakangas@i     5943         [ +  - ]:             46 :         ereport(LOG,
                               5944                 :                :                 (errmsg("archive recovery complete")));
                               5945                 :                :     }
                               5946                 :                : 
                               5947                 :                :     /* Save the selected TimeLineID in shared memory, too */
  886 rhaas@postgresql.org     5948                 :            729 :     XLogCtl->InsertTimeLineID = newTLI;
  788 heikki.linnakangas@i     5949                 :            729 :     XLogCtl->PrevTimeLineID = endOfRecoveryInfo->lastRecTLI;
                               5950                 :                : 
                               5951                 :                :     /*
                               5952                 :                :      * Actually, if WAL ended in an incomplete record, skip the parts that
                               5953                 :                :      * made it through and start writing after the portion that persisted.
                               5954                 :                :      * (It's critical to first write an OVERWRITE_CONTRECORD message, which
                               5955                 :                :      * we'll do as soon as we're open for writing new WAL.)
                               5956                 :                :      */
  928 alvherre@alvh.no-ip.     5957         [ +  + ]:            729 :     if (!XLogRecPtrIsInvalid(missingContrecPtr))
                               5958                 :                :     {
                               5959                 :                :         /*
                               5960                 :                :          * We should only have a missingContrecPtr if we're not switching to a
                               5961                 :                :          * new timeline. When a timeline switch occurs, WAL is copied from the
                               5962                 :                :          * old timeline to the new only up to the end of the last complete
                               5963                 :                :          * record, so there can't be an incomplete WAL record that we need to
                               5964                 :                :          * disregard.
                               5965                 :                :          */
  594 rhaas@postgresql.org     5966         [ -  + ]:             10 :         Assert(newTLI == endOfRecoveryInfo->lastRecTLI);
  928 alvherre@alvh.no-ip.     5967         [ -  + ]:             10 :         Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
                               5968                 :             10 :         EndOfLog = missingContrecPtr;
                               5969                 :                :     }
                               5970                 :                : 
                               5971                 :                :     /*
                               5972                 :                :      * Prepare to write WAL starting at EndOfLog location, and init xlog
                               5973                 :                :      * buffer cache using the block containing the last record from the
                               5974                 :                :      * previous incarnation.
                               5975                 :                :      */
 8569 vadim4o@yahoo.com        5976                 :            729 :     Insert = &XLogCtl->Insert;
  788 heikki.linnakangas@i     5977                 :            729 :     Insert->PrevBytePos = XLogRecPtrToBytePos(endOfRecoveryInfo->lastRec);
 3924                          5978                 :            729 :     Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);
                               5979                 :                : 
                               5980                 :                :     /*
                               5981                 :                :      * Tricky point here: lastPage contains the *last* block that the LastRec
                               5982                 :                :      * record spans, not the one it starts in.  The last block is indeed the
                               5983                 :                :      * one we want to use.
                               5984                 :                :      */
                               5985         [ +  + ]:            729 :     if (EndOfLog % XLOG_BLCKSZ != 0)
                               5986                 :                :     {
                               5987                 :                :         char       *page;
                               5988                 :                :         int         len;
                               5989                 :                :         int         firstIdx;
                               5990                 :                : 
                               5991                 :            703 :         firstIdx = XLogRecPtrToBufIdx(EndOfLog);
  788                          5992                 :            703 :         len = EndOfLog - endOfRecoveryInfo->lastPageBeginPtr;
                               5993         [ -  + ]:            703 :         Assert(len < XLOG_BLCKSZ);
                               5994                 :                : 
                               5995                 :                :         /* Copy the valid part of the last block, and zero the rest */
 3924                          5996                 :            703 :         page = &XLogCtl->pages[firstIdx * XLOG_BLCKSZ];
  788                          5997                 :            703 :         memcpy(page, endOfRecoveryInfo->lastPage, len);
 3924                          5998                 :            703 :         memset(page + len, 0, XLOG_BLCKSZ - len);
                               5999                 :                : 
  117 jdavis@postgresql.or     6000                 :GNC         703 :         pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
  788 heikki.linnakangas@i     6001                 :CBC         703 :         XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
                               6002                 :                :     }
                               6003                 :                :     else
                               6004                 :                :     {
                               6005                 :                :         /*
                               6006                 :                :          * There is no partial block to copy. Just set InitializedUpTo, and
                               6007                 :                :          * let the first attempt to insert a log record initialize the next
                               6008                 :                :          * buffer.
                               6009                 :                :          */
 3924                          6010                 :             26 :         XLogCtl->InitializedUpTo = EndOfLog;
                               6011                 :                :     }
                               6012                 :                : 
                               6013                 :                :     /*
                               6014                 :                :      * Update local and shared status.  This is OK to do without any locks
                               6015                 :                :      * because no other process can be reading or writing WAL yet.
                               6016                 :                :      */
                               6017                 :            729 :     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
    7 alvherre@alvh.no-ip.     6018                 :GNC         729 :     pg_atomic_write_u64(&XLogCtl->logInsertResult, EndOfLog);
    9                          6019                 :            729 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog);
                               6020                 :            729 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog);
 3924 heikki.linnakangas@i     6021                 :CBC         729 :     XLogCtl->LogwrtRqst.Write = EndOfLog;
                               6022                 :            729 :     XLogCtl->LogwrtRqst.Flush = EndOfLog;
                               6023                 :                : 
                               6024                 :                :     /*
                               6025                 :                :      * Preallocate additional log files, if wanted.
                               6026                 :                :      */
  886 rhaas@postgresql.org     6027                 :            729 :     PreallocXlogFiles(EndOfLog, newTLI);
                               6028                 :                : 
                               6029                 :                :     /*
                               6030                 :                :      * Okay, we're officially UP.
                               6031                 :                :      */
 8569 vadim4o@yahoo.com        6032                 :            729 :     InRecovery = false;
                               6033                 :                : 
                               6034                 :                :     /* start the archive_timeout timer and LSN running */
 3924 heikki.linnakangas@i     6035                 :            729 :     XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 2670 andres@anarazel.de       6036                 :            729 :     XLogCtl->lastSegSwitchLSN = EndOfLog;
                               6037                 :                : 
                               6038                 :                :     /* also initialize latestCompletedXid, to nextXid - 1 */
 4451 tgl@sss.pgh.pa.us        6039                 :            729 :     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  128 heikki.linnakangas@i     6040                 :GNC         729 :     TransamVariables->latestCompletedXid = TransamVariables->nextXid;
                               6041                 :            729 :     FullTransactionIdRetreat(&TransamVariables->latestCompletedXid);
 4451 tgl@sss.pgh.pa.us        6042                 :CBC         729 :     LWLockRelease(ProcArrayLock);
                               6043                 :                : 
                               6044                 :                :     /*
                               6045                 :                :      * Start up subtrans, if not already done for hot standby.  (commit
                               6046                 :                :      * timestamps are started below, if necessary.)
                               6047                 :                :      */
 5230 simon@2ndQuadrant.co     6048         [ +  + ]:            729 :     if (standbyState == STANDBY_DISABLED)
                               6049                 :            683 :         StartupSUBTRANS(oldestActiveXID);
                               6050                 :                : 
                               6051                 :                :     /*
                               6052                 :                :      * Perform end of recovery actions for any SLRUs that need it.
                               6053                 :                :      */
 4547                          6054                 :            729 :     TrimCLOG();
 3789 alvherre@alvh.no-ip.     6055                 :            729 :     TrimMultiXact();
                               6056                 :                : 
                               6057                 :                :     /*
                               6058                 :                :      * Reload shared-memory state for prepared transactions.  This needs to
                               6059                 :                :      * happen before renaming the last partial segment of the old timeline as
                               6060                 :                :      * it may be possible that we have to recover some transactions from it.
                               6061                 :                :      */
 6876 tgl@sss.pgh.pa.us        6062                 :            729 :     RecoverPreparedTransactions();
                               6063                 :                : 
                               6064                 :                :     /* Shut down xlogreader */
  788 heikki.linnakangas@i     6065                 :            729 :     ShutdownWalRecovery();
                               6066                 :                : 
                               6067                 :                :     /* Enable WAL writes for this backend only. */
  913 rhaas@postgresql.org     6068                 :            729 :     LocalSetXLogInsertAllowed();
                               6069                 :                : 
                               6070                 :                :     /* If necessary, write overwrite-contrecord before doing anything else */
                               6071         [ +  + ]:            729 :     if (!XLogRecPtrIsInvalid(abortedRecPtr))
                               6072                 :                :     {
                               6073         [ -  + ]:             10 :         Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
  788 heikki.linnakangas@i     6074                 :             10 :         CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr, newTLI);
                               6075                 :                :     }
                               6076                 :                : 
                               6077                 :                :     /*
                               6078                 :                :      * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
                               6079                 :                :      * record before resource manager writes cleanup WAL records or checkpoint
                               6080                 :                :      * record is written.
                               6081                 :                :      */
  913 rhaas@postgresql.org     6082                 :            729 :     Insert->fullPageWrites = lastFullPageWrites;
                               6083                 :            729 :     UpdateFullPageWrites();
                               6084                 :                : 
                               6085                 :                :     /*
                               6086                 :                :      * Emit checkpoint or end-of-recovery record in XLOG, if required.
                               6087                 :                :      */
  788 heikki.linnakangas@i     6088         [ +  + ]:            729 :     if (performedWalRecovery)
  913 rhaas@postgresql.org     6089                 :            148 :         promoted = PerformRecoveryXLogAction();
                               6090                 :                : 
                               6091                 :                :     /*
                               6092                 :                :      * If any of the critical GUCs have changed, log them before we allow
                               6093                 :                :      * backends to write WAL.
                               6094                 :                :      */
 5100 heikki.linnakangas@i     6095                 :            729 :     XLogReportParameters();
                               6096                 :                : 
                               6097                 :                :     /* If this is archive recovery, perform post-recovery cleanup actions. */
  902 rhaas@postgresql.org     6098         [ +  + ]:            729 :     if (ArchiveRecoveryRequested)
  886                          6099                 :             46 :         CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog, newTLI);
                               6100                 :                : 
                               6101                 :                :     /*
                               6102                 :                :      * Local WAL inserts enabled, so it's time to finish initialization of
                               6103                 :                :      * commit timestamp.
                               6104                 :                :      */
 3420 alvherre@alvh.no-ip.     6105                 :            729 :     CompleteCommitTsInitialization();
                               6106                 :                : 
                               6107                 :                :     /*
                               6108                 :                :      * All done with end-of-recovery actions.
                               6109                 :                :      *
                               6110                 :                :      * Now allow backends to write WAL and update the control file status in
                               6111                 :                :      * consequence.  SharedRecoveryState, that controls if backends can write
                               6112                 :                :      * WAL, is updated while holding ControlFileLock to prevent other backends
                               6113                 :                :      * from looking at an inconsistent state of the control file in shared memory.
                               6114                 :                :      * There is still a small window during which backends can write WAL and
                               6115                 :                :      * the control file is still referring to a system not in DB_IN_PRODUCTION
                               6116                 :                :      * state while looking at the on-disk control file.
                               6117                 :                :      *
                               6118                 :                :      * Also, we use info_lck to update SharedRecoveryState to ensure that
                               6119                 :                :      * there are no race conditions concerning visibility of other recent
                               6120                 :                :      * updates to shared memory.
                               6121                 :                :      */
 2819 peter_e@gmx.net          6122                 :            729 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6123                 :            729 :     ControlFile->state = DB_IN_PRODUCTION;
                               6124                 :                : 
 3492 andres@anarazel.de       6125         [ -  + ]:            729 :     SpinLockAcquire(&XLogCtl->info_lck);
 1451 michael@paquier.xyz      6126                 :            729 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
 3492 andres@anarazel.de       6127                 :            729 :     SpinLockRelease(&XLogCtl->info_lck);
                               6128                 :                : 
 2819 peter_e@gmx.net          6129                 :            729 :     UpdateControlFile();
                               6130                 :            729 :     LWLockRelease(ControlFileLock);
                               6131                 :                : 
                               6132                 :                :     /*
                               6133                 :                :      * Shutdown the recovery environment.  This must occur after
                               6134                 :                :      * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
                               6135                 :                :      * and after switching SharedRecoveryState to RECOVERY_STATE_DONE so that
                               6136                 :                :      * any session building a snapshot will not rely on KnownAssignedXids as
                               6137                 :                :      * RecoveryInProgress() would return false at this stage.  This is
                               6138                 :                :      * particularly critical for prepared 2PC transactions, that would still
                               6139                 :                :      * need to be included in snapshots once recovery has ended.
                               6140                 :                :      */
  923 michael@paquier.xyz      6141         [ +  + ]:            729 :     if (standbyState != STANDBY_DISABLED)
                               6142                 :             46 :         ShutdownRecoveryTransactionEnvironment();
                               6143                 :                : 
                               6144                 :                :     /*
                               6145                 :                :      * If there were cascading standby servers connected to us, nudge any wal
                               6146                 :                :      * sender processes to notice that we've been promoted.
                               6147                 :                :      */
  372 andres@anarazel.de       6148                 :            729 :     WalSndWakeup(true, true);
                               6149                 :                : 
                               6150                 :                :     /*
                               6151                 :                :      * If this was a promotion, request an (online) checkpoint now. This isn't
                               6152                 :                :      * required for consistency, but the last restartpoint might be far back,
                               6153                 :                :      * and in case of a crash, recovering from it might take longer than is
                               6154                 :                :      * appropriate now that we're not in standby mode anymore.
                               6155                 :                :      */
 1355 fujii@postgresql.org     6156         [ +  + ]:            729 :     if (promoted)
 3981 simon@2ndQuadrant.co     6157                 :             39 :         RequestCheckpoint(CHECKPOINT_FORCE);
 5534 heikki.linnakangas@i     6158                 :            729 : }
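
The partial-block handling in the function above (copy the valid prefix of the last WAL page into the buffer cache, zero the remainder, and report how far the buffers are initialized) can be illustrated in isolation. The following is a minimal standalone sketch, not the PostgreSQL implementation: `SKETCH_BLCKSZ` and `sketch_seed_last_page` are hypothetical names, plain `uint64_t` stands in for `XLogRecPtr`, and the block size of 8192 is an assumed value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192		/* stand-in for XLOG_BLCKSZ; assumed value */

/*
 * Hypothetical helper: seed one buffer page from the last page read during
 * recovery.  "page_begin" is the LSN at which that page starts and
 * "end_of_log" is the LSN where new WAL will be written.
 *
 * Returns the LSN up to which the buffer cache is considered initialized.
 */
static uint64_t
sketch_seed_last_page(char *page, const char *last_page,
					  uint64_t page_begin, uint64_t end_of_log)
{
	size_t		len;

	/* End of log falls on a page boundary: there is no partial block. */
	if (end_of_log % SKETCH_BLCKSZ != 0)
	{
		len = (size_t) (end_of_log - page_begin);
		assert(len < SKETCH_BLCKSZ);

		/* Copy the valid part of the last block, and zero the rest. */
		memcpy(page, last_page, len);
		memset(page + len, 0, SKETCH_BLCKSZ - len);

		return page_begin + SKETCH_BLCKSZ;
	}

	return end_of_log;
}
```

In the sketch, a non-aligned end of log yields an initialized-up-to value at the end of the seeded page, while an aligned end of log leaves the buffer untouched and returns the end of log itself, mirroring the two branches in the listing.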
                               6159                 :                : 
                               6160                 :                : /*
                               6161                 :                :  * Callback from PerformWalRecovery(), called when we switch from crash
                               6162                 :                :  * recovery to archive recovery mode.  Updates the control file accordingly.
                               6163                 :                :  */
                               6164                 :                : void
  788                          6165                 :              4 : SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI)
                               6166                 :                : {
                               6167                 :                :     /* initialize minRecoveryPoint to this record */
                               6168                 :              4 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6169                 :              4 :     ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
                               6170         [ +  - ]:              4 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6171                 :                :     {
                               6172                 :              4 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6173                 :              4 :         ControlFile->minRecoveryPointTLI = replayTLI;
                               6174                 :                :     }
                               6175                 :                :     /* update local copy */
                               6176                 :              4 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               6177                 :              4 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               6178                 :                : 
                               6179                 :                :     /*
                               6180                 :                :      * The startup process can update its local copy of minRecoveryPoint from
                               6181                 :                :      * this point.
                               6182                 :                :      */
                               6183                 :              4 :     updateMinRecoveryPoint = true;
                               6184                 :                : 
                               6185                 :              4 :     UpdateControlFile();
                               6186                 :                : 
                               6187                 :                :     /*
                               6188                 :                :      * We update SharedRecoveryState while holding the lock on ControlFileLock
                               6189                 :                :      * so both states are consistent in shared memory.
                               6190                 :                :      */
                               6191         [ -  + ]:              4 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6192                 :              4 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               6193                 :              4 :     SpinLockRelease(&XLogCtl->info_lck);
                               6194                 :                : 
                               6195                 :              4 :     LWLockRelease(ControlFileLock);
                               6196                 :              4 : }
                               6197                 :                : 
                               6198                 :                : /*
                               6199                 :                :  * Callback from PerformWalRecovery(), called when we reach the end of backup.
                               6200                 :                :  * Updates the control file accordingly.
                               6201                 :                :  */
                               6202                 :                : void
                               6203                 :             95 : ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli)
                               6204                 :                : {
                               6205                 :                :     /*
                               6206                 :                :      * We have reached the end of base backup, as indicated by pg_control. The
                               6207                 :                :      * data on disk is now consistent (unless minRecoveryPoint is further
                               6208                 :                :      * ahead, which can happen if we crashed during previous recovery).  Reset
                               6209                 :                :      * backupStartPoint and backupEndPoint, and update minRecoveryPoint to
                               6210                 :                :      * make sure we don't allow starting up at an earlier point even if
                               6211                 :                :      * recovery is stopped and restarted soon after this.
                               6212                 :                :      */
                               6213                 :             95 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6214                 :                : 
                               6215         [ +  + ]:             95 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6216                 :                :     {
                               6217                 :             78 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6218                 :             78 :         ControlFile->minRecoveryPointTLI = tli;
                               6219                 :                :     }
                               6220                 :                : 
                               6221                 :             95 :     ControlFile->backupStartPoint = InvalidXLogRecPtr;
                               6222                 :             95 :     ControlFile->backupEndPoint = InvalidXLogRecPtr;
                               6223                 :             95 :     ControlFile->backupEndRequired = false;
                               6224                 :             95 :     UpdateControlFile();
                               6225                 :                : 
                               6226                 :             95 :     LWLockRelease(ControlFileLock);
 5115                          6227                 :             95 : }
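
Both callbacks above advance minRecoveryPoint only forward and then clear or update related control-file fields. A minimal standalone sketch of that "advance, never retreat" rule follows. It is an illustration under stated assumptions, not the real code: `SketchControl` and `sketch_reached_end_of_backup` are hypothetical, 0 stands in for `InvalidXLogRecPtr`, and locking and the on-disk write (`ControlFileLock`, `UpdateControlFile()`) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the relevant control-file fields. */
typedef struct SketchControl
{
	uint64_t	min_recovery_point;
	uint32_t	min_recovery_point_tli;
	uint64_t	backup_start_point;
	uint64_t	backup_end_point;
	bool		backup_end_required;
} SketchControl;

/*
 * Mirror of the end-of-backup bookkeeping: move the minimum recovery point
 * forward if the replayed record is ahead of it, then reset the backup
 * markers (0 plays the role of an invalid LSN in this sketch).
 */
static void
sketch_reached_end_of_backup(SketchControl *ctl, uint64_t end_rec_ptr,
							 uint32_t tli)
{
	assert(ctl != NULL);

	/* Never move minRecoveryPoint backwards. */
	if (ctl->min_recovery_point < end_rec_ptr)
	{
		ctl->min_recovery_point = end_rec_ptr;
		ctl->min_recovery_point_tli = tli;
	}

	ctl->backup_start_point = 0;
	ctl->backup_end_point = 0;
	ctl->backup_end_required = false;
}
```

The conditional matters when a previous recovery attempt crashed after pushing minRecoveryPoint past the end of backup: in that case the stored value and its timeline are kept, and only the backup markers are reset.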
                               6228                 :                : 
                               6229                 :                : /*
                               6230                 :                :  * Perform whatever XLOG actions are necessary at end of REDO.
                               6231                 :                :  *
                               6232                 :                :  * The goal here is to make sure that we'll be able to recover properly if
                               6233                 :                :  * we crash again. If we choose to write a checkpoint, we'll write a shutdown
                               6234                 :                :  * checkpoint rather than an on-line one. This is not particularly critical,
                               6235                 :                :  * but since we may be assigning a new TLI, using a shutdown checkpoint allows
                               6236                 :                :  * us to have the rule that TLI only changes in shutdown checkpoints, which
                               6237                 :                :  * allows some extra error checking in xlog_redo.
                               6238                 :                :  */
                               6239                 :                : static bool
  914 rhaas@postgresql.org     6240                 :            148 : PerformRecoveryXLogAction(void)
                               6241                 :                : {
                               6242                 :            148 :     bool        promoted = false;
                               6243                 :                : 
                               6244                 :                :     /*
                               6245                 :                :      * Perform a checkpoint to flush all our recovery activity to disk.
                               6246                 :                :      *
                               6247                 :                :      * Note that we write a shutdown checkpoint rather than an on-line one.
                               6248                 :                :      * This is not particularly critical, but since we may be assigning a new
                               6249                 :                :      * TLI, using a shutdown checkpoint allows us to have the rule that TLI
                               6250                 :                :      * only changes in shutdown checkpoints, which allows some extra error
                               6251                 :                :      * checking in xlog_redo.
                               6252                 :                :      *
                               6253                 :                :      * In promotion, only create a lightweight end-of-recovery record instead
                               6254                 :                :      * of a full checkpoint. A checkpoint is requested later, after we're
                               6255                 :                :      * fully out of recovery mode and already accepting queries.
                               6256                 :                :      */
                               6257   [ +  +  +  -  :            194 :     if (ArchiveRecoveryRequested && IsUnderPostmaster &&
                                              +  + ]
  788 heikki.linnakangas@i     6258                 :             46 :         PromoteIsTriggered())
                               6259                 :                :     {
  914 rhaas@postgresql.org     6260                 :             39 :         promoted = true;
                               6261                 :                : 
                               6262                 :                :         /*
                               6263                 :                :          * Insert a special WAL record to mark the end of recovery, since we
                               6264                 :                :          * aren't doing a checkpoint. That means that the checkpointer process
                               6265                 :                :          * may well be in the middle of a time-smoothed restartpoint and
                               6266                 :                :          * could continue to be for minutes after this.  That sounds strange,
                               6267                 :                :          * but the effect is roughly the same and it would be stranger to try
                               6268                 :                :          * to come out of the restartpoint and then checkpoint. We request a
                               6269                 :                :          * checkpoint later anyway, just for safety.
                               6270                 :                :          */
                               6271                 :             39 :         CreateEndOfRecoveryRecord();
                               6272                 :                :     }
                               6273                 :                :     else
                               6274                 :                :     {
                               6275                 :            109 :         RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                               6276                 :                :                           CHECKPOINT_IMMEDIATE |
                               6277                 :                :                           CHECKPOINT_WAIT);
                               6278                 :                :     }
                               6279                 :                : 
                               6280                 :            148 :     return promoted;
                               6281                 :                : }
                               6282                 :                : 
                               6283                 :                : /*
                               6284                 :                :  * Is the system still in recovery?
                               6285                 :                :  *
                               6286                 :                :  * Unlike testing InRecovery, this works in any process that's connected to
                               6287                 :                :  * shared memory.
                               6288                 :                :  */
                               6289                 :                : bool
 5534 heikki.linnakangas@i     6290                 :       80635129 : RecoveryInProgress(void)
                               6291                 :                : {
                               6292                 :                :     /*
                               6293                 :                :      * We check shared state each time only until we leave recovery mode. We
                               6294                 :                :      * can't re-enter recovery, so there's no need to keep checking after the
                               6295                 :                :      * shared variable has once been seen false.
                               6296                 :                :      */
                               6297         [ +  + ]:       80635129 :     if (!LocalRecoveryInProgress)
                               6298                 :       78518930 :         return false;
                               6299                 :                :     else
                               6300                 :                :     {
                               6301                 :                :         /*
                               6302                 :                :          * Use a volatile pointer to make sure we make a fresh read of the
                               6303                 :                :          * shared variable.
                               6304                 :                :          */
                               6305                 :        2116199 :         volatile XLogCtlData *xlogctl = XLogCtl;
                               6306                 :                : 
 1451 michael@paquier.xyz      6307                 :        2116199 :         LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE);
                               6308                 :                : 
                               6309                 :                :         /*
                               6310                 :                :          * Note: We don't need a memory barrier when we're still in recovery.
                               6311                 :                :          * We might exit recovery immediately after return, so the caller
                               6312                 :                :          * can't rely on 'true' meaning that we're still in recovery anyway.
                               6313                 :                :          */
                               6314                 :                : 
 5534 heikki.linnakangas@i     6315                 :        2116199 :         return LocalRecoveryInProgress;
                               6316                 :                :     }
                               6317                 :                : }
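
The caching behaviour of RecoveryInProgress() above can be illustrated with a small standalone sketch. It is not PostgreSQL code: every `demo_` name is a hypothetical stand-in, the shared struct is an ordinary global rather than shared memory, and the read counter exists only to make the fast path observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the shared recovery state; not PostgreSQL types. */
typedef enum
{
    DEMO_STATE_CRASH,
    DEMO_STATE_ARCHIVE,
    DEMO_STATE_DONE
} DemoRecoveryState;

typedef struct
{
    DemoRecoveryState shared_state;
} DemoSharedCtl;

static DemoSharedCtl demo_shared = {DEMO_STATE_CRASH};
static bool demo_local_in_recovery = true;  /* process-local cached answer */
static unsigned long demo_shared_reads = 0; /* counts reads of shared state */

static bool
demo_recovery_in_progress(void)
{
    /* Fast path: once "false" has been seen, never look at shared state again. */
    if (!demo_local_in_recovery)
        return false;
    else
    {
        /* Read through a volatile pointer to force a fresh load. */
        volatile DemoSharedCtl *ctl = &demo_shared;

        demo_shared_reads++;
        demo_local_in_recovery = (ctl->shared_state != DEMO_STATE_DONE);
        return demo_local_in_recovery;
    }
}

/* Usage: the shared state is consulted only until recovery is seen to end. */
static void
demo_latch_usage(void)
{
    assert(demo_recovery_in_progress());    /* still recovering, 1 shared read */
    demo_shared.shared_state = DEMO_STATE_DONE;
    assert(!demo_recovery_in_progress());   /* notices the change, 2nd read */
    assert(!demo_recovery_in_progress());   /* cached, no further shared read */
    assert(demo_shared_reads == 2);
}
```

The one-way nature of the flag is what makes the cache safe: a stale `true` only costs one more shared read, while `false` can never become stale.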
                               6318                 :                : 
                               6319                 :                : /*
                               6320                 :                :  * Returns current recovery state from shared memory.
                               6321                 :                :  *
                               6322                 :                :  * The returned state is kept consistent with the contents of the control
                               6323                 :                :  * file.  See details about the possible values of RecoveryState in xlog.h.
                               6324                 :                :  */
                               6325                 :                : RecoveryState
 1451 michael@paquier.xyz      6326                 :             88 : GetRecoveryState(void)
                               6327                 :                : {
                               6328                 :                :     RecoveryState retval;
                               6329                 :                : 
                               6330         [ -  + ]:             88 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6331                 :             88 :     retval = XLogCtl->SharedRecoveryState;
                               6332                 :             88 :     SpinLockRelease(&XLogCtl->info_lck);
                               6333                 :                : 
                               6334                 :             88 :     return retval;
                               6335                 :                : }
                               6336                 :                : 
                               6337                 :                : /*
                               6338                 :                :  * Is this process allowed to insert new WAL records?
                               6339                 :                :  *
                               6340                 :                :  * Ordinarily this is essentially equivalent to !RecoveryInProgress().
                               6341                 :                :  * But we also have provisions for forcing the result "true" or "false"
                               6342                 :                :  * within specific processes regardless of the global state.
                               6343                 :                :  */
                               6344                 :                : bool
 5406 tgl@sss.pgh.pa.us        6345                 :       28339614 : XLogInsertAllowed(void)
                               6346                 :                : {
                               6347                 :                :     /*
                               6348                 :                :      * If value is "unconditionally true" or "unconditionally false", just
                               6349                 :                :      * return it.  This provides the normal fast path once recovery is known
                               6350                 :                :      * done.
                               6351                 :                :      */
                               6352         [ +  + ]:       28339614 :     if (LocalXLogInsertAllowed >= 0)
                               6353                 :       28211705 :         return (bool) LocalXLogInsertAllowed;
                               6354                 :                : 
                               6355                 :                :     /*
                               6356                 :                :      * Else, must check to see if we're still in recovery.
                               6357                 :                :      */
                               6358         [ +  + ]:         127909 :     if (RecoveryInProgress())
                               6359                 :         120869 :         return false;
                               6360                 :                : 
                               6361                 :                :     /*
                               6362                 :                :      * On exit from recovery, reset to "unconditionally true", since there is
                               6363                 :                :      * no need to keep checking.
                               6364                 :                :      */
                               6365                 :           7040 :     LocalXLogInsertAllowed = 1;
                               6366                 :           7040 :     return true;
                               6367                 :                : }
                               6368                 :                : 
                               6369                 :                : /*
                               6370                 :                :  * Make XLogInsertAllowed() return true in the current process only.
                               6371                 :                :  *
                               6372                 :                :  * Note: it is allowed to switch LocalXLogInsertAllowed back to -1 later,
                               6373                 :                :  * and even call LocalSetXLogInsertAllowed() again after that.
                               6374                 :                :  *
                               6375                 :                :  * Returns the previous value of LocalXLogInsertAllowed.
                               6376                 :                :  */
                               6377                 :                : static int
                               6378                 :            838 : LocalSetXLogInsertAllowed(void)
                               6379                 :                : {
  788 heikki.linnakangas@i     6380                 :            838 :     int         oldXLogAllowed = LocalXLogInsertAllowed;
                               6381                 :                : 
 5406 tgl@sss.pgh.pa.us        6382                 :            838 :     LocalXLogInsertAllowed = 1;
                               6383                 :                : 
  902 rhaas@postgresql.org     6384                 :            838 :     return oldXLogAllowed;
                               6385                 :                : }
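
The tri-state logic shared by XLogInsertAllowed() and LocalSetXLogInsertAllowed() can be sketched in isolation as follows. This is a simplified model under stated assumptions: the `demo_` names are hypothetical, the meaning of the value 0 ("unconditionally false") is inferred from the comments above, and a plain boolean replaces the call to RecoveryInProgress().

```c
#include <assert.h>
#include <stdbool.h>

/* -1 = consult recovery state, 0 = unconditionally false, 1 = unconditionally true */
static int  demo_insert_allowed = -1;
static bool demo_in_recovery = true;    /* stand-in for RecoveryInProgress() */

static bool
demo_xlog_insert_allowed(void)
{
    /* Forced values win and give the fast path. */
    if (demo_insert_allowed >= 0)
        return (bool) demo_insert_allowed;

    if (demo_in_recovery)
        return false;

    /* Recovery is over: latch to "unconditionally true". */
    demo_insert_allowed = 1;
    return true;
}

/* Force "true" in this process and hand back the previous value. */
static int
demo_local_set_insert_allowed(void)
{
    int         old = demo_insert_allowed;

    demo_insert_allowed = 1;
    return old;
}

/* Usage: temporarily allow inserts during recovery, then restore. */
static void
demo_tristate_usage(void)
{
    int         saved;

    assert(!demo_xlog_insert_allowed());        /* in recovery, not forced */
    saved = demo_local_set_insert_allowed();
    assert(saved == -1);
    assert(demo_xlog_insert_allowed());         /* forced true */
    demo_insert_allowed = saved;                /* switch back to -1 */
    assert(!demo_xlog_insert_allowed());
    demo_in_recovery = false;
    assert(demo_xlog_insert_allowed());         /* latches to 1 */
    assert(demo_insert_allowed == 1);
}
```

Returning the previous value lets a caller restore exactly the state it found, rather than assuming it was -1.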
                               6386                 :                : 
                               6387                 :                : /*
                               6388                 :                :  * Return the current Redo pointer from shared memory.
                               6389                 :                :  *
                               6390                 :                :  * As a side-effect, the local RedoRecPtr copy is updated.
                               6391                 :                :  */
                               6392                 :                : XLogRecPtr
 8508 vadim4o@yahoo.com        6393                 :         274638 : GetRedoRecPtr(void)
                               6394                 :                : {
                               6395                 :                :     XLogRecPtr  ptr;
                               6396                 :                : 
                               6397                 :                :     /*
                               6398                 :                :      * The possibly not up-to-date copy in XLogCtl is enough. Even if we
                               6399                 :                :      * grabbed a WAL insertion lock to read the authoritative value in
                               6400                 :                :      * Insert->RedoRecPtr, someone might update it just after we've released
                               6401                 :                :      * the lock.
                               6402                 :                :      */
 3492 andres@anarazel.de       6403         [ +  + ]:         274638 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6404                 :         274638 :     ptr = XLogCtl->RedoRecPtr;
                               6405                 :         274638 :     SpinLockRelease(&XLogCtl->info_lck);
                               6406                 :                : 
 3933 heikki.linnakangas@i     6407         [ +  + ]:         274638 :     if (RedoRecPtr < ptr)
                               6408                 :            856 :         RedoRecPtr = ptr;
                               6409                 :                : 
 8066 tgl@sss.pgh.pa.us        6410                 :         274638 :     return RedoRecPtr;
                               6411                 :                : }
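
GetRedoRecPtr() above only ever moves the local copy forward. A minimal sketch of that rule, with hypothetical `demo_` names and the spinlock omitted because the sketch is single-threaded, might look like this. It assumes the local copy can already be ahead of the shared one because it was refreshed elsewhere from a newer value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t DemoLsn;               /* hypothetical stand-in for XLogRecPtr */

static DemoLsn demo_shared_redo = 0;    /* stand-in for XLogCtl->RedoRecPtr */
static DemoLsn demo_local_redo = 0;     /* stand-in for the local RedoRecPtr */

static DemoLsn
demo_get_redo(void)
{
    /* The real function reads the shared value while holding a spinlock. */
    DemoLsn     ptr = demo_shared_redo;

    /* Advance the local copy, but never move it backwards. */
    if (demo_local_redo < ptr)
        demo_local_redo = ptr;

    return demo_local_redo;
}

/* Usage: a lagging shared value does not roll the local copy back. */
static void
demo_redo_usage(void)
{
    demo_shared_redo = 100;
    assert(demo_get_redo() == 100);

    demo_local_redo = 250;              /* assumed: refreshed from a newer source */
    assert(demo_get_redo() == 250);     /* shared copy (100) is older, ignored */

    demo_shared_redo = 300;
    assert(demo_get_redo() == 300);
}
```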
                               6412                 :                : 
                               6413                 :                : /*
                               6414                 :                :  * Return information needed to decide whether a modified block needs a
                               6415                 :                :  * full-page image to be included in the WAL record.
                               6416                 :                :  *
                               6417                 :                :  * The returned values are cached copies from backend-private memory, and
                               6418                 :                :  * possibly out-of-date or, indeed, uninitialized, in which case they will
                               6419                 :                :  * be InvalidXLogRecPtr and false, respectively.  XLogInsertRecord will
                               6420                 :                :  * re-check them against up-to-date values, while holding the WAL insert lock.
                               6421                 :                :  */
                               6422                 :                : void
 3447 heikki.linnakangas@i     6423                 :       13739113 : GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p)
                               6424                 :                : {
                               6425                 :       13739113 :     *RedoRecPtr_p = RedoRecPtr;
                               6426                 :       13739113 :     *doPageWrites_p = doPageWrites;
                               6427                 :       13739113 : }
                               6428                 :                : 
                               6429                 :                : /*
                               6430                 :                :  * GetInsertRecPtr -- Returns the current insert position.
                               6431                 :                :  *
                               6432                 :                :  * NOTE: The value *actually* returned is the position of the last full
                               6433                 :                :  * xlog page. It lags behind the real insert position by at most 1 page.
                               6434                 :                :  * For that, we don't need to scan through WAL insertion locks, and an
                               6435                 :                :  * approximation is enough for the current usage of this function.
                               6436                 :                :  */
                               6437                 :                : XLogRecPtr
 6135 tgl@sss.pgh.pa.us        6438                 :           2838 : GetInsertRecPtr(void)
                               6439                 :                : {
                               6440                 :                :     XLogRecPtr  recptr;
                               6441                 :                : 
 3492 andres@anarazel.de       6442         [ -  + ]:           2838 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6443                 :           2838 :     recptr = XLogCtl->LogwrtRqst.Write;
                               6444                 :           2838 :     SpinLockRelease(&XLogCtl->info_lck);
                               6445                 :                : 
 6135 tgl@sss.pgh.pa.us        6446                 :           2838 :     return recptr;
                               6447                 :                : }
                               6448                 :                : 
                               6449                 :                : /*
                               6450                 :                :  * GetFlushRecPtr -- Returns the current flush position, i.e., the last WAL
                               6451                 :                :  * position known to be fsync'd to disk. This should only be used on a
                               6452                 :                :  * system that is known not to be in recovery.
                               6453                 :                :  */
                               6454                 :                : XLogRecPtr
  891 rhaas@postgresql.org     6455                 :         193034 : GetFlushRecPtr(TimeLineID *insertTLI)
                               6456                 :                : {
  886                          6457         [ -  + ]:         193034 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6458                 :                : 
   11 alvherre@alvh.no-ip.     6459                 :GNC      193034 :     RefreshXLogWriteResult(LogwrtResult);
                               6460                 :                : 
                               6461                 :                :     /*
                               6462                 :                :      * If we're writing and flushing WAL, the timeline can't be changing, so
                               6463                 :                :      * no lock is required.
                               6464                 :                :      */
  891 rhaas@postgresql.org     6465         [ +  + ]:CBC      193034 :     if (insertTLI)
  886                          6466                 :          26055 :         *insertTLI = XLogCtl->InsertTimeLineID;
                               6467                 :                : 
 3015 simon@2ndQuadrant.co     6468                 :         193034 :     return LogwrtResult.Flush;
                               6469                 :                : }
                               6470                 :                : 
                               6471                 :                : /*
                               6472                 :                :  * GetWALInsertionTimeLine -- Returns the current timeline of a system that
                               6473                 :                :  * is not in recovery.
                               6474                 :                :  */
                               6475                 :                : TimeLineID
  891 rhaas@postgresql.org     6476                 :          55419 : GetWALInsertionTimeLine(void)
                               6477                 :                : {
                               6478         [ -  + ]:          55419 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6479                 :                : 
                               6480                 :                :     /* Since the value can't be changing, no lock is required. */
  886                          6481                 :          55419 :     return XLogCtl->InsertTimeLineID;
                               6482                 :                : }
                               6483                 :                : 
                               6484                 :                : /*
                               6485                 :                :  * GetLastImportantRecPtr -- Returns the LSN of the last important record
                               6486                 :                :  * inserted. All records not explicitly marked as unimportant are considered
                               6487                 :                :  * important.
                               6488                 :                :  *
                               6489                 :                :  * The LSN is determined by computing the maximum of
                               6490                 :                :  * WALInsertLocks[i].lastImportantAt.
                               6491                 :                :  */
                               6492                 :                : XLogRecPtr
 2670 andres@anarazel.de       6493                 :           1201 : GetLastImportantRecPtr(void)
                               6494                 :                : {
                               6495                 :           1201 :     XLogRecPtr  res = InvalidXLogRecPtr;
                               6496                 :                :     int         i;
                               6497                 :                : 
                               6498         [ +  + ]:          10809 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               6499                 :                :     {
                               6500                 :                :         XLogRecPtr  last_important;
                               6501                 :                : 
                               6502                 :                :         /*
                               6503                 :                :          * Need to take a lock to prevent torn reads of the LSN, which are
                               6504                 :                :          * possible on some of the supported platforms. WAL insert locks only
                               6505                 :                :          * support exclusive mode, so we have to use that.
                               6506                 :                :          */
                               6507                 :           9608 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               6508                 :           9608 :         last_important = WALInsertLocks[i].l.lastImportantAt;
                               6509                 :           9608 :         LWLockRelease(&WALInsertLocks[i].l.lock);
                               6510                 :                : 
                               6511         [ +  + ]:           9608 :         if (res < last_important)
                               6512                 :           1815 :             res = last_important;
                               6513                 :                :     }
                               6514                 :                : 
                               6515                 :           1201 :     return res;
                               6516                 :                : }
                               6517                 :                : 
                               6518                 :                : /*
                               6519                 :                :  * Get the time and LSN of the last xlog segment switch
                               6520                 :                :  */
                               6521                 :                : pg_time_t
 2670 andres@anarazel.de       6522                 :UBC           0 : GetLastSegSwitchData(XLogRecPtr *lastSwitchLSN)
                               6523                 :                : {
                               6524                 :                :     pg_time_t   result;
                               6525                 :                : 
                               6526                 :                :     /* Need WALWriteLock, but shared lock is sufficient */
 6450 tgl@sss.pgh.pa.us        6527                 :              0 :     LWLockAcquire(WALWriteLock, LW_SHARED);
 3924 heikki.linnakangas@i     6528                 :              0 :     result = XLogCtl->lastSegSwitchTime;
 2670 andres@anarazel.de       6529                 :              0 :     *lastSwitchLSN = XLogCtl->lastSegSwitchLSN;
 6450 tgl@sss.pgh.pa.us        6530                 :              0 :     LWLockRelease(WALWriteLock);
                               6531                 :                : 
                               6532                 :              0 :     return result;
                               6533                 :                : }
                               6534                 :                : 
                               6535                 :                : /*
                               6536                 :                :  * This must be called ONCE during postmaster or standalone-backend shutdown
                               6537                 :                :  */
                               6538                 :                : void
 7429 peter_e@gmx.net          6539                 :CBC         513 : ShutdownXLOG(int code, Datum arg)
                               6540                 :                : {
                               6541                 :                :     /*
                               6542                 :                :      * We should have an aux process resource owner to use, and we should not
                               6543                 :                :      * be in a transaction that's installed some other resowner.
                               6544                 :                :      */
 2097 tgl@sss.pgh.pa.us        6545         [ -  + ]:            513 :     Assert(AuxProcessResourceOwner != NULL);
                               6546   [ +  +  -  + ]:            513 :     Assert(CurrentResourceOwner == NULL ||
                               6547                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               6548                 :            513 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               6549                 :                : 
                               6550                 :                :     /* Don't be chatty in standalone mode */
 3958                          6551   [ +  +  +  + ]:            513 :     ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               6552                 :                :             (errmsg("shutting down")));
                               6553                 :                : 
                               6554                 :                :     /*
                               6555                 :                :      * Signal walsenders to move to stopping state.
                               6556                 :                :      */
 2505 andres@anarazel.de       6557                 :            513 :     WalSndInitStopping();
                               6558                 :                : 
                               6559                 :                :     /*
                               6560                 :                :      * Wait for WAL senders to be in stopping state.  This prevents commands
                               6561                 :                :      * from writing new WAL.
                               6562                 :                :      */
                               6563                 :            513 :     WalSndWaitStopping();
                               6564                 :                : 
 5534 heikki.linnakangas@i     6565         [ +  + ]:            513 :     if (RecoveryInProgress())
                               6566                 :             47 :         CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
                               6567                 :                :     else
                               6568                 :                :     {
                               6569                 :                :         /*
                               6570                 :                :          * If archiving is enabled, rotate the last XLOG file so that all the
                               6571                 :                :          * remaining records are archived (postmaster wakes up the archiver
                               6572                 :                :          * process one more time at the end of shutdown). The checkpoint
                               6573                 :                :          * record will go to the next XLOG file and won't be archived (yet).
                               6574                 :                :          */
  801 rhaas@postgresql.org     6575   [ +  +  -  +  :            466 :         if (XLogArchivingActive())
                                              +  + ]
 2670 andres@anarazel.de       6576                 :             11 :             RequestXLogSwitch(false);
                               6577                 :                : 
 5534 heikki.linnakangas@i     6578                 :            466 :         CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
                               6579                 :                :     }
 8957 vadim4o@yahoo.com        6580                 :            513 : }
                               6581                 :                : 
                               6582                 :                : /*
                               6583                 :                :  * Log start of a checkpoint.
                               6584                 :                :  */
                               6585                 :                : static void
 5534 heikki.linnakangas@i     6586                 :            927 : LogCheckpointStart(int flags, bool restartpoint)
                               6587                 :                : {
 1227 peter@eisentraut.org     6588         [ +  + ]:            927 :     if (restartpoint)
                               6589   [ +  -  -  +  :             48 :         ereport(LOG,
                                     +  +  +  +  +  
                                     +  +  +  +  +  
                                        -  +  +  + ]
                               6590                 :                :         /* translator: the placeholders show checkpoint options */
                               6591                 :                :                 (errmsg("restartpoint starting:%s%s%s%s%s%s%s%s",
                               6592                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6593                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6594                 :                :                         (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
                               6595                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6596                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6597                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6598                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6599                 :                :                         (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "")));
                               6600                 :                :     else
                               6601   [ +  -  +  +  :            879 :         ereport(LOG,
                                     -  +  +  +  +  
                                     +  +  +  +  +  
                                        +  +  +  + ]
                               6602                 :                :         /* translator: the placeholders show checkpoint options */
                               6603                 :                :                 (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",
                               6604                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6605                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6606                 :                :                         (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
                               6607                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6608                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6609                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6610                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6611                 :                :                         (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : "")));
 6133 tgl@sss.pgh.pa.us        6612                 :            927 : }
                               6613                 :                : 
                               6614                 :                : /*
                               6615                 :                :  * Log end of a checkpoint.
                               6616                 :                :  */
                               6617                 :                : static void
 5534 heikki.linnakangas@i     6618                 :           1148 : LogCheckpointEnd(bool restartpoint)
                               6619                 :                : {
                               6620                 :                :     long        write_msecs,
                               6621                 :                :                 sync_msecs,
                               6622                 :                :                 total_msecs,
                               6623                 :                :                 longest_msecs,
                               6624                 :                :                 average_msecs;
                               6625                 :                :     uint64      average_sync_time;
                               6626                 :                : 
 6133 tgl@sss.pgh.pa.us        6627                 :           1148 :     CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
                               6628                 :                : 
 1251                          6629                 :           1148 :     write_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_write_t,
                               6630                 :                :                                                   CheckpointStats.ckpt_sync_t);
                               6631                 :                : 
                               6632                 :           1148 :     sync_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_sync_t,
                               6633                 :                :                                                  CheckpointStats.ckpt_sync_end_t);
                               6634                 :                : 
                               6635                 :                :     /* Accumulate checkpoint timing summary data, in milliseconds. */
  167 michael@paquier.xyz      6636                 :GNC        1148 :     PendingCheckpointerStats.write_time += write_msecs;
                               6637                 :           1148 :     PendingCheckpointerStats.sync_time += sync_msecs;
                               6638                 :                : 
                               6639                 :                :     /*
                               6640                 :                :      * All of the published timing statistics are accounted for.  Only
                               6641                 :                :      * continue if a log message is to be written.
                               6642                 :                :      */
 4392 rhaas@postgresql.org     6643         [ +  + ]:CBC        1148 :     if (!log_checkpoints)
                               6644                 :            226 :         return;
                               6645                 :                : 
 1251 tgl@sss.pgh.pa.us        6646                 :            922 :     total_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_start_t,
                               6647                 :                :                                                   CheckpointStats.ckpt_end_t);
                               6648                 :                : 
                               6649                 :                :     /*
                               6650                 :                :      * Timing values returned from CheckpointStats are in microseconds.
                               6651                 :                :      * Convert to milliseconds for consistent printing.
                               6652                 :                :      */
                               6653                 :            922 :     longest_msecs = (long) ((CheckpointStats.ckpt_longest_sync + 999) / 1000);
                               6654                 :                : 
 4870 rhaas@postgresql.org     6655                 :            922 :     average_sync_time = 0;
 4753 bruce@momjian.us         6656         [ -  + ]:            922 :     if (CheckpointStats.ckpt_sync_rels > 0)
 4870 rhaas@postgresql.org     6657                 :UBC           0 :         average_sync_time = CheckpointStats.ckpt_agg_sync_time /
                               6658                 :              0 :             CheckpointStats.ckpt_sync_rels;
 1251 tgl@sss.pgh.pa.us        6659                 :CBC         922 :     average_msecs = (long) ((average_sync_time + 999) / 1000);
                               6660                 :                : 
                               6661                 :                :     /*
                               6662                 :                :      * ControlFileLock is not required to see ControlFile->checkPoint and
                               6663                 :                :      * ->checkPointCopy here as we are the only updater of those variables at
                               6664                 :                :      * this moment.
                               6665                 :                :      */
 1227 peter@eisentraut.org     6666         [ +  + ]:            922 :     if (restartpoint)
                               6667         [ +  - ]:             48 :         ereport(LOG,
                               6668                 :                :                 (errmsg("restartpoint complete: wrote %d buffers (%.1f%%); "
                               6669                 :                :                         "%d WAL file(s) added, %d removed, %d recycled; "
                               6670                 :                :                         "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
                               6671                 :                :                         "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
                               6672                 :                :                         "distance=%d kB, estimate=%d kB; "
                               6673                 :                :                         "lsn=%X/%X, redo lsn=%X/%X",
                               6674                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6675                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6676                 :                :                         CheckpointStats.ckpt_segs_added,
                               6677                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6678                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6679                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6680                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6681                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6682                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6683                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6684                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6685                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6686                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6687                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6688                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6689                 :                :     else
                               6690         [ +  - ]:            874 :         ereport(LOG,
                               6691                 :                :                 (errmsg("checkpoint complete: wrote %d buffers (%.1f%%); "
                               6692                 :                :                         "%d WAL file(s) added, %d removed, %d recycled; "
                               6693                 :                :                         "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; "
                               6694                 :                :                         "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; "
                               6695                 :                :                         "distance=%d kB, estimate=%d kB; "
                               6696                 :                :                         "lsn=%X/%X, redo lsn=%X/%X",
                               6697                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6698                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6699                 :                :                         CheckpointStats.ckpt_segs_added,
                               6700                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6701                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6702                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6703                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6704                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6705                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6706                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6707                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6708                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6709                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6710                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6711                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6712                 :                : }
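
The timing arithmetic in LogCheckpointEnd() can be isolated as a minimal sketch. It assumes nothing beyond the C standard library; the helper names usecs_to_msecs_ceil and format_msecs are hypothetical and do not exist in xlog.c. The first rounds a microsecond count up to whole milliseconds, as is done for ckpt_longest_sync, and the second prints a millisecond count in the "%ld.%03d s" form used by the log message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Round a microsecond count up to whole milliseconds, mirroring the
 * "(value + 999) / 1000" idiom in the listing. Hypothetical helper name.
 */
long
usecs_to_msecs_ceil(uint64_t usecs)
{
    return (long) ((usecs + 999) / 1000);
}

/*
 * Format a millisecond count as seconds with three decimals, e.g. "1.234 s".
 * Returns the value reported by snprintf. Hypothetical helper name.
 */
int
format_msecs(char *buf, size_t buflen, long msecs)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%ld.%03d s",
                    msecs / 1000, (int) (msecs % 1000));
}
```

Rounding up rather than truncating means that any non-zero sync time shows up as at least 0.001 s in the log line.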
                               6713                 :                : 
                               6714                 :                : /*
                               6715                 :                :  * Update the estimate of distance between checkpoints.
                               6716                 :                :  *
                               6717                 :                :  * The estimate is used to calculate the number of WAL segments to keep
                               6718                 :                :  * preallocated, see XLOGfileslop().
                               6719                 :                :  */
                               6720                 :                : static void
 3338 heikki.linnakangas@i     6721                 :           1148 : UpdateCheckPointDistanceEstimate(uint64 nbytes)
                               6722                 :                : {
                               6723                 :                :     /*
                               6724                 :                :      * To estimate the number of segments consumed between checkpoints, keep a
                               6725                 :                :      * moving average of the amount of WAL generated in previous checkpoint
                               6726                 :                :      * cycles. However, if the load is bursty, with quiet periods and busy
                               6727                 :                :      * periods, we want to cater for the peak load. So instead of a plain
                               6728                 :                :      * moving average, let the average decline slowly if the previous cycle
                               6729                 :                :      * used less WAL than estimated, but bump it up immediately if it used
                               6730                 :                :      * more.
                               6731                 :                :      *
                               6732                 :                :      * When checkpoints are triggered by max_wal_size, this should converge to
                               6733                 :                :      * CheckpointSegments * wal_segment_size.
                               6734                 :                :      *
                               6735                 :                :      * Note: This doesn't pay any attention to what caused the checkpoint.
                               6736                 :                :      * Checkpoints triggered manually with CHECKPOINT command, or by e.g.
                               6737                 :                :      * starting a base backup, are counted the same as those created
                               6738                 :                :      * automatically. The slow-decline will largely mask them out, if they are
                               6739                 :                :      * not frequent. If they are frequent, it seems reasonable to count them
                               6740                 :                :      * in as any others; if you issue a manual checkpoint every 5 minutes and
                               6741                 :                :      * never let a timed checkpoint happen, it makes sense to base the
                               6742                 :                :      * preallocation on that 5 minute interval rather than whatever
                               6743                 :                :      * checkpoint_timeout is set to.
                               6744                 :                :      */
                               6745                 :           1148 :     PrevCheckPointDistance = nbytes;
                               6746         [ +  + ]:           1148 :     if (CheckPointDistanceEstimate < nbytes)
                               6747                 :            765 :         CheckPointDistanceEstimate = nbytes;
                               6748                 :                :     else
                               6749                 :            383 :         CheckPointDistanceEstimate =
                               6750                 :            383 :             (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
 6133 tgl@sss.pgh.pa.us        6751                 :           1148 : }
                               6752                 :                : 
                               6753                 :                : /*
                               6754                 :                :  * Update the ps display for a process running a checkpoint.  Note that
                               6755                 :                :  * this routine should not do any allocations so that it can be called
                               6756                 :                :  * from a critical section.
                               6757                 :                :  */
                               6758                 :                : static void
 1217 michael@paquier.xyz      6759                 :           2301 : update_checkpoint_display(int flags, bool restartpoint, bool reset)
                               6760                 :                : {
                               6761                 :                :     /*
                               6762                 :                :      * The status is reported only for end-of-recovery and shutdown
                               6763                 :                :      * checkpoints or shutdown restartpoints.  Updating the ps display is
                               6764                 :                :      * useful in those situations as it may not be possible to rely on
                               6765                 :                :      * pg_stat_activity to see the status of the checkpointer or the startup
                               6766                 :                :      * process.
                               6767                 :                :      */
                               6768         [ +  + ]:           2301 :     if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0)
                               6769                 :           1111 :         return;
                               6770                 :                : 
                               6771         [ +  + ]:           1190 :     if (reset)
                               6772                 :            595 :         set_ps_display("");
                               6773                 :                :     else
                               6774                 :                :     {
                               6775                 :                :         char        activitymsg[128];
                               6776                 :                : 
                               6777         [ +  + ]:           1785 :         snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
                               6778         [ +  + ]:            595 :                  (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
                               6779         [ +  + ]:            595 :                  (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
                               6780                 :                :                  restartpoint ? "restartpoint" : "checkpoint");
                               6781                 :            595 :         set_ps_display(activitymsg);
                               6782                 :                :     }
                               6783                 :                : }
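
The message construction in update_checkpoint_display() can be sketched on its own, using a fixed stack buffer so that no allocation takes place. The flag values SKETCH_IS_SHUTDOWN and SKETCH_END_OF_RECOVERY and the function build_activity_msg are placeholders invented for this sketch; they stand in for the CHECKPOINT_* bits and the set_ps_display() call, neither of which is reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholder bit values for this sketch only. */
#define SKETCH_IS_SHUTDOWN      0x0001
#define SKETCH_END_OF_RECOVERY  0x0002

/*
 * Write the activity message into a caller-supplied buffer.
 * Returns 0 and an empty string when neither flag is set (nothing to
 * report), otherwise the value reported by snprintf.
 */
int
build_activity_msg(char *buf, size_t buflen, int flags, bool restartpoint)
{
    assert(buf != NULL && buflen > 0);

    if ((flags & (SKETCH_END_OF_RECOVERY | SKETCH_IS_SHUTDOWN)) == 0)
    {
        buf[0] = '\0';
        return 0;
    }

    return snprintf(buf, buflen, "performing %s%s%s",
                    (flags & SKETCH_END_OF_RECOVERY) ? "end-of-recovery " : "",
                    (flags & SKETCH_IS_SHUTDOWN) ? "shutdown " : "",
                    restartpoint ? "restartpoint" : "checkpoint");
}
```

Because the buffer lives on the caller's stack and snprintf truncates rather than grows, the same pattern stays usable where memory allocation is off limits.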
                               6784                 :                : 
                               6785                 :                : 
                               6786                 :                : /*
                               6787                 :                :  * Perform a checkpoint --- either during shutdown, or on-the-fly
                               6788                 :                :  *
                               6789                 :                :  * flags is a bitwise OR of the following:
                               6790                 :                :  *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
                               6791                 :                :  *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
                               6792                 :                :  *  CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
                               6793                 :                :  *      ignoring checkpoint_completion_target parameter.
                               6794                 :                :  *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
                               6795                 :                :  *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
                               6796                 :                :  *      CHECKPOINT_END_OF_RECOVERY).
                               6797                 :                :  *  CHECKPOINT_FLUSH_ALL: also flush buffers of unlogged tables.
                               6798                 :                :  *
                               6799                 :                :  * Note: flags contains other bits, of interest here only for logging purposes.
                               6800                 :                :  * In particular note that this routine is synchronous and does not pay
                               6801                 :                :  * attention to CHECKPOINT_WAIT.
                               6802                 :                :  *
                               6803                 :                :  * If !shutdown then we are writing an online checkpoint. An XLOG_CHECKPOINT_REDO
                               6804                 :                :  * record is inserted into WAL at the logical location of the checkpoint, before
                               6805                 :                :  * flushing anything to disk. After the checkpoint eventually completes, it is
                               6806                 :                :  * from this point that WAL replay will begin in the case of a recovery from
                               6807                 :                :  * this checkpoint. Once everything is written to disk, an
                               6808                 :                :  * XLOG_CHECKPOINT_ONLINE record is written to complete the checkpoint, and
                               6809                 :                :  * points back to the earlier XLOG_CHECKPOINT_REDO record. This mechanism allows
                               6810                 :                :  * other write-ahead log records to be written while the checkpoint is in
                               6811                 :                :  * progress, but we must be very careful about order of operations. This function
                               6812                 :                :  * may take many minutes to execute on a busy system.
                               6813                 :                :  *
                               6814                 :                :  * On the other hand, when shutdown is true, concurrent insertion into the
                               6815                 :                :  * write-ahead log is impossible, so there is no need for two separate records.
                               6816                 :                :  * In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
                               6817                 :                :  * both the record marking the completion of the checkpoint and the location
                               6818                 :                :  * from which WAL replay would begin if needed.
                               6819                 :                :  */
                               6820                 :                : void
 6135 tgl@sss.pgh.pa.us        6821                 :           1105 : CreateCheckPoint(int flags)
                               6822                 :                : {
                               6823                 :                :     bool        shutdown;
                               6824                 :                :     CheckPoint  checkPoint;
                               6825                 :                :     XLogRecPtr  recptr;
                               6826                 :                :     XLogSegNo   _logSegNo;
 8768 bruce@momjian.us         6827                 :           1105 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               6828                 :                :     uint32      freespace;
                               6829                 :                :     XLogRecPtr  PriorRedoPtr;
                               6830                 :                :     XLogRecPtr  last_important_lsn;
                               6831                 :                :     VirtualTransactionId *vxids;
                               6832                 :                :     int         nvxids;
  902 rhaas@postgresql.org     6833                 :           1105 :     int         oldXLogAllowed = 0;
                               6834                 :                : 
                               6835                 :                :     /*
                               6836                 :                :      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
                               6837                 :                :      * issued at a different time.
                               6838                 :                :      */
 5406 tgl@sss.pgh.pa.us        6839         [ +  + ]:           1105 :     if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
 5407 heikki.linnakangas@i     6840                 :            575 :         shutdown = true;
                               6841                 :                :     else
                               6842                 :            530 :         shutdown = false;
                               6843                 :                : 
                               6844                 :                :     /* sanity check */
 5406 tgl@sss.pgh.pa.us        6845   [ +  +  -  + ]:           1105 :     if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
 5406 tgl@sss.pgh.pa.us        6846         [ #  # ]:UBC           0 :         elog(ERROR, "can't create a checkpoint during recovery");
                               6847                 :                : 
                               6848                 :                :     /*
                               6849                 :                :      * Prepare to accumulate statistics.
                               6850                 :                :      *
                               6851                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               6852                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               6853                 :                :      * log_checkpoints is currently off.
                               6854                 :                :      */
 6133 tgl@sss.pgh.pa.us        6855   [ +  -  +  -  :CBC       12155 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               6856                 :           1105 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               6857                 :                : 
                               6858                 :                :     /*
                               6859                 :                :      * Let smgr prepare for checkpoint; this has to happen outside the
                               6860                 :                :      * critical section and before we determine the REDO pointer.  Note that
                               6861                 :                :      * smgr must not do anything that'd have to be undone if we decide no
                               6862                 :                :      * checkpoint is needed.
                               6863                 :                :      */
  760 tmunro@postgresql.or     6864                 :           1105 :     SyncPreCheckpoint();
                               6865                 :                : 
                               6866                 :                :     /*
                               6867                 :                :      * Use a critical section to force system panic if we have trouble.
                               6868                 :                :      */
 8233 tgl@sss.pgh.pa.us        6869                 :           1105 :     START_CRIT_SECTION();
                               6870                 :                : 
 8966 vadim4o@yahoo.com        6871         [ +  + ]:           1105 :     if (shutdown)
                               6872                 :                :     {
 5534 heikki.linnakangas@i     6873                 :            575 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 8966 vadim4o@yahoo.com        6874                 :            575 :         ControlFile->state = DB_SHUTDOWNING;
                               6875                 :            575 :         UpdateControlFile();
 5534 heikki.linnakangas@i     6876                 :            575 :         LWLockRelease(ControlFileLock);
                               6877                 :                :     }
                               6878                 :                : 
                               6879                 :                :     /* Begin filling in the checkpoint WAL record */
 7633 tgl@sss.pgh.pa.us        6880   [ +  -  +  -  :          13260 :     MemSet(&checkPoint, 0, sizeof(checkPoint));
                                     +  -  +  -  +  
                                                 + ]
 5901                          6881                 :           1105 :     checkPoint.time = (pg_time_t) time(NULL);
                               6882                 :                : 
                               6883                 :                :     /*
                               6884                 :                :      * For Hot Standby, derive the oldestActiveXid before we fix the redo
                               6885                 :                :      * pointer. This allows us to begin accumulating changes to assemble our
                               6886                 :                :      * starting snapshot of locks and transactions.
                               6887                 :                :      */
 4547 simon@2ndQuadrant.co     6888   [ +  +  +  + ]:           1105 :     if (!shutdown && XLogStandbyInfoActive())
                               6889                 :            490 :         checkPoint.oldestActiveXid = GetOldestActiveTransactionId();
                               6890                 :                :     else
                               6891                 :            615 :         checkPoint.oldestActiveXid = InvalidTransactionId;
                               6892                 :                : 
                               6893                 :                :     /*
                               6894                 :                :      * Get location of last important record before acquiring insert locks (as
                               6895                 :                :      * GetLastImportantRecPtr() also locks WAL locks).
                               6896                 :                :      */
 2670 andres@anarazel.de       6897                 :           1105 :     last_important_lsn = GetLastImportantRecPtr();
                               6898                 :                : 
                               6899                 :                :     /*
                               6900                 :                :      * If this isn't a shutdown or forced checkpoint, and if there has been no
                               6901                 :                :      * WAL activity requiring a checkpoint, skip it.  The idea here is to
                               6902                 :                :      * avoid inserting duplicate checkpoints when the system is idle.
                               6903                 :                :      */
 5407 heikki.linnakangas@i     6904         [ +  + ]:           1105 :     if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
                               6905                 :                :                   CHECKPOINT_FORCE)) == 0)
                               6906                 :                :     {
 2670 andres@anarazel.de       6907         [ -  + ]:             25 :         if (last_important_lsn == ControlFile->checkPoint)
                               6908                 :                :         {
 8433 tgl@sss.pgh.pa.us        6909         [ #  # ]:LBC         (5) :             END_CRIT_SECTION();
 2670 andres@anarazel.de       6910         [ #  # ]:            (5) :             ereport(DEBUG1,
                               6911                 :                :                     (errmsg_internal("checkpoint skipped because system is idle")));
 8433 tgl@sss.pgh.pa.us        6912                 :            (5) :             return;
                               6913                 :                :         }
                               6914                 :                :     }
                               6915                 :                : 
                               6916                 :                :     /*
                               6917                 :                :      * An end-of-recovery checkpoint is created before anyone is allowed to
                               6918                 :                :      * write WAL. To allow us to write the checkpoint record, temporarily
                               6919                 :                :      * enable XLogInsertAllowed.
                               6920                 :                :      */
 5344 heikki.linnakangas@i     6921         [ +  + ]:CBC        1105 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
  902 rhaas@postgresql.org     6922                 :            109 :         oldXLogAllowed = LocalSetXLogInsertAllowed();
                               6923                 :                : 
  886                          6924                 :           1105 :     checkPoint.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4080 heikki.linnakangas@i     6925         [ +  + ]:           1105 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
                               6926                 :            109 :         checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID;
                               6927                 :                :     else
  891 rhaas@postgresql.org     6928                 :            996 :         checkPoint.PrevTimeLineID = checkPoint.ThisTimeLineID;
                               6929                 :                : 
                               6930                 :                :     /*
                               6931                 :                :      * We must block concurrent insertions while examining insert state.
                               6932                 :                :      */
  178 rhaas@postgresql.org     6933                 :GNC        1105 :     WALInsertLockAcquireExclusive();
                               6934                 :                : 
  178 rhaas@postgresql.org     6935                 :CBC        1105 :     checkPoint.fullPageWrites = Insert->fullPageWrites;
                               6936                 :                : 
  178 rhaas@postgresql.org     6937         [ +  + ]:GNC        1105 :     if (shutdown)
                               6938                 :                :     {
                               6939                 :            575 :         XLogRecPtr  curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
                               6940                 :                : 
                               6941                 :                :         /*
                               6942                 :                :          * Compute new REDO record ptr = location of next XLOG record.
                               6943                 :                :          *
                               6944                 :                :          * Since this is a shutdown checkpoint, there can't be any concurrent
                               6945                 :                :          * WAL insertion.
                               6946                 :                :          */
                               6947         [ +  - ]:            575 :         freespace = INSERT_FREESPACE(curInsert);
                               6948         [ -  + ]:            575 :         if (freespace == 0)
                               6949                 :                :         {
  178 rhaas@postgresql.org     6950         [ #  # ]:UNC           0 :             if (XLogSegmentOffset(curInsert, wal_segment_size) == 0)
                               6951                 :              0 :                 curInsert += SizeOfXLogLongPHD;
                               6952                 :                :             else
                               6953                 :              0 :                 curInsert += SizeOfXLogShortPHD;
                               6954                 :                :         }
  178 rhaas@postgresql.org     6955                 :GNC         575 :         checkPoint.redo = curInsert;
                               6956                 :                : 
                               6957                 :                :         /*
                               6958                 :                :          * Here we update the shared RedoRecPtr for future XLogInsert calls;
                               6959                 :                :          * this must be done while holding all the insertion locks.
                               6960                 :                :          *
                               6961                 :                :          * Note: if we fail to complete the checkpoint, RedoRecPtr will be
                               6962                 :                :          * left pointing past where it really needs to point.  This is okay;
                               6963                 :                :          * the only consequence is that XLogInsert might back up whole buffers
                               6964                 :                :          * that it didn't really need to.  We can't postpone advancing
                               6965                 :                :          * RedoRecPtr because XLogInserts that happen while we are dumping
                               6966                 :                :          * buffers must assume that their buffer changes are not included in
                               6967                 :                :          * the checkpoint.
                               6968                 :                :          */
                               6969                 :            575 :         RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
                               6970                 :                :     }
                               6971                 :                : 
                               6972                 :                :     /*
                               6973                 :                :      * Now we can release the WAL insertion locks, allowing other xacts to
                               6974                 :                :      * proceed while we are flushing disk buffers.
                               6975                 :                :      */
 3677 heikki.linnakangas@i     6976                 :CBC        1105 :     WALInsertLockRelease();
                               6977                 :                : 
                               6978                 :                :     /*
                               6979                 :                :      * If this is an online checkpoint, we have not yet determined the redo
                               6980                 :                :      * point. We do so now by inserting the special XLOG_CHECKPOINT_REDO
                               6981                 :                :      * record; the LSN at which it starts becomes the new redo pointer. We
                               6982                 :                :      * don't do this for a shutdown checkpoint, because in that case no WAL
                               6983                 :                :      * can be written between the redo point and the insertion of the
                               6984                 :                :      * checkpoint record itself, so the checkpoint record itself serves to
                               6985                 :                :      * mark the redo point.
                               6986                 :                :      */
  178 rhaas@postgresql.org     6987         [ +  + ]:GNC        1105 :     if (!shutdown)
                               6988                 :                :     {
                               6989                 :            530 :         int         dummy = 0;
                               6990                 :                : 
                               6991                 :                :         /* Record must have payload to avoid assertion failure. */
                               6992                 :            530 :         XLogBeginInsert();
                               6993                 :            530 :         XLogRegisterData((char *) &dummy, sizeof(dummy));
                               6994                 :            530 :         (void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
                               6995                 :                : 
                               6996                 :                :         /*
                               6997                 :                :          * XLogInsertRecord will have updated XLogCtl->Insert.RedoRecPtr in
                               6998                 :                :          * shared memory and RedoRecPtr in backend-local memory, but we need
                               6999                 :                :          * to copy that into the record that will be inserted when the
                               7000                 :                :          * checkpoint is complete.
                               7001                 :                :          */
                               7002                 :            530 :         checkPoint.redo = RedoRecPtr;
                               7003                 :                :     }
                               7004                 :                : 
                               7005                 :                :     /* Update the info_lck-protected copy of RedoRecPtr as well */
 3492 andres@anarazel.de       7006         [ -  + ]:CBC        1105 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7007                 :           1105 :     XLogCtl->RedoRecPtr = checkPoint.redo;
                               7008                 :           1105 :     SpinLockRelease(&XLogCtl->info_lck);
                               7009                 :                : 
                               7010                 :                :     /*
                               7011                 :                :      * If enabled, log checkpoint start.  We postpone this until now so as not
                               7012                 :                :      * to log anything if we decided to skip the checkpoint.
                               7013                 :                :      */
 6133 tgl@sss.pgh.pa.us        7014         [ +  + ]:           1105 :     if (log_checkpoints)
 5534 heikki.linnakangas@i     7015                 :            879 :         LogCheckpointStart(flags, false);
                               7016                 :                : 
                               7017                 :                :     /* Update the process title */
 1217 michael@paquier.xyz      7018                 :           1105 :     update_checkpoint_display(flags, false, false);
                               7019                 :                : 
                               7020                 :                :     TRACE_POSTGRESQL_CHECKPOINT_START(flags);
                               7021                 :                : 
                               7022                 :                :     /*
                               7023                 :                :      * Get the other info we need for the checkpoint record.
                               7024                 :                :      *
                               7025                 :                :      * We don't need to save oldestClogXid in the checkpoint; it only matters
                               7026                 :                :      * for the short period in which clog is being truncated, and if we crash
                               7027                 :                :      * during that we'll redo the clog truncation and fix up oldestClogXid
                               7028                 :                :      * there.
                               7029                 :                :      */
 3663 heikki.linnakangas@i     7030                 :           1105 :     LWLockAcquire(XidGenLock, LW_SHARED);
  128 heikki.linnakangas@i     7031                 :GNC        1105 :     checkPoint.nextXid = TransamVariables->nextXid;
                               7032                 :           1105 :     checkPoint.oldestXid = TransamVariables->oldestXid;
                               7033                 :           1105 :     checkPoint.oldestXidDB = TransamVariables->oldestXidDB;
 3663 heikki.linnakangas@i     7034                 :CBC        1105 :     LWLockRelease(XidGenLock);
                               7035                 :                : 
 3420 alvherre@alvh.no-ip.     7036                 :           1105 :     LWLockAcquire(CommitTsLock, LW_SHARED);
  128 heikki.linnakangas@i     7037                 :GNC        1105 :     checkPoint.oldestCommitTsXid = TransamVariables->oldestCommitTsXid;
                               7038                 :           1105 :     checkPoint.newestCommitTsXid = TransamVariables->newestCommitTsXid;
 3420 alvherre@alvh.no-ip.     7039                 :CBC        1105 :     LWLockRelease(CommitTsLock);
                               7040                 :                : 
 3663 heikki.linnakangas@i     7041                 :           1105 :     LWLockAcquire(OidGenLock, LW_SHARED);
  128 heikki.linnakangas@i     7042                 :GNC        1105 :     checkPoint.nextOid = TransamVariables->nextOid;
 3663 heikki.linnakangas@i     7043         [ +  + ]:CBC        1105 :     if (!shutdown)
  128 heikki.linnakangas@i     7044                 :GNC         530 :         checkPoint.nextOid += TransamVariables->oidCount;
 3663 heikki.linnakangas@i     7045                 :CBC        1105 :     LWLockRelease(OidGenLock);
                               7046                 :                : 
                               7047                 :           1105 :     MultiXactGetCheckptMulti(shutdown,
                               7048                 :                :                              &checkPoint.nextMulti,
                               7049                 :                :                              &checkPoint.nextMultiOffset,
                               7050                 :                :                              &checkPoint.oldestMulti,
                               7051                 :                :                              &checkPoint.oldestMultiDB);
                               7052                 :                : 
                               7053                 :                :     /*
                               7054                 :                :      * Having constructed the checkpoint record, ensure all shmem disk buffers
                               7055                 :                :      * and commit-log buffers are flushed to disk.
                               7056                 :                :      *
                               7057                 :                :      * This I/O could fail for various reasons.  If so, we will fail to
                               7058                 :                :      * complete the checkpoint, but there is no reason to force a system
                               7059                 :                :      * panic. Accordingly, exit critical section while doing it.
                               7060                 :                :      */
                               7061         [ -  + ]:           1105 :     END_CRIT_SECTION();
                               7062                 :                : 
                               7063                 :                :     /*
                               7064                 :                :      * In some cases there are groups of actions that must all occur on one
                               7065                 :                :      * side or the other of a checkpoint record. Before flushing the
                               7066                 :                :      * checkpoint record we must explicitly wait for any backend currently
                               7067                 :                :      * performing those groups of actions.
                               7068                 :                :      *
                               7069                 :                :      * One example is end of transaction, so we must wait for any transactions
                               7070                 :                :      * that are currently in commit critical sections.  If an xact inserted
                               7071                 :                :      * its commit record into XLOG just before the REDO point, then a crash
                               7072                 :                :      * restart from the REDO point would not replay that record, which means
                               7073                 :                :      * that our flushing had better include the xact's update of pg_xact.  So
                               7074                 :                :      * we wait until it is out of its commit critical section before proceeding.
                               7075                 :                :      * See notes in RecordTransactionCommit().
                               7076                 :                :      *
                               7077                 :                :      * Because we've already released the insertion locks, this test is a bit
                               7078                 :                :      * fuzzy: it is possible that we will wait for xacts we didn't really need
                               7079                 :                :      * to wait for.  But the delay should be short and it seems better to make
                               7080                 :                :      * checkpoint take a bit longer than to hold off insertions longer than
                               7081                 :                :      * necessary. (In fact, the whole reason we have this issue is that xact.c
                               7082                 :                :      * does commit record XLOG insertion and clog update as two separate steps
                               7083                 :                :      * protected by different locks, but again that seems best on grounds of
                               7084                 :                :      * minimizing lock contention.)
                               7085                 :                :      *
                               7086                 :                :      * A transaction that has not yet set delayChkptFlags when we look cannot
                               7087                 :                :      * be at risk, since it has not inserted its commit record yet; and one
                               7088                 :                :      * that's already cleared it is not at risk either, since it's done fixing
                               7089                 :                :      * clog and we will correctly flush the update below.  So we cannot miss
                               7090                 :                :      * any xacts we need to wait for.
                               7091                 :                :      */
  752 rhaas@postgresql.org     7092                 :           1105 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 4150 simon@2ndQuadrant.co     7093         [ +  + ]:           1105 :     if (nvxids > 0)
                               7094                 :                :     {
                               7095                 :                :         do
                               7096                 :                :         {
  184 tmunro@postgresql.or     7097                 :GNC           1 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 5995 bruce@momjian.us         7098                 :CBC           1 :             pg_usleep(10000L);  /* wait for 10 msec */
  184 tmunro@postgresql.or     7099                 :GNC           1 :             pgstat_report_wait_end();
  752 rhaas@postgresql.org     7100         [ -  + ]:CBC           1 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7101                 :                :                                               DELAY_CHKPT_START));
                               7102                 :                :     }
 4150 simon@2ndQuadrant.co     7103                 :           1105 :     pfree(vxids);
                               7104                 :                : 
 6135 tgl@sss.pgh.pa.us        7105                 :           1105 :     CheckPointGuts(checkPoint.redo, flags);
                               7106                 :                : 
  752 rhaas@postgresql.org     7107                 :           1100 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
                               7108         [ -  + ]:           1100 :     if (nvxids > 0)
                               7109                 :                :     {
                               7110                 :                :         do
                               7111                 :                :         {
  184 tmunro@postgresql.or     7112                 :UNC           0 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
  752 rhaas@postgresql.org     7113                 :UBC           0 :             pg_usleep(10000L);  /* wait for 10 msec */
  184 tmunro@postgresql.or     7114                 :UNC           0 :             pgstat_report_wait_end();
  752 rhaas@postgresql.org     7115         [ #  # ]:UBC           0 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7116                 :                :                                               DELAY_CHKPT_COMPLETE));
                               7117                 :                :     }
  752 rhaas@postgresql.org     7118                 :CBC        1100 :     pfree(vxids);
                               7119                 :                : 
                               7120                 :                :     /*
                               7121                 :                :      * Take a snapshot of running transactions and write this to WAL. This
                               7122                 :                :      * allows us to reconstruct the state of running transactions during
                               7123                 :                :      * archive recovery, if required. Skip if this info is disabled.
                               7124                 :                :      *
                               7125                 :                :      * If we are shutting down, or the Startup process is completing crash
                               7126                 :                :      * recovery, we don't need to write running xact data.
                               7127                 :                :      */
 5230 simon@2ndQuadrant.co     7128   [ +  +  +  + ]:           1100 :     if (!shutdown && XLogStandbyInfoActive())
 4151 tgl@sss.pgh.pa.us        7129                 :            485 :         LogStandbySnapshot();
                               7130                 :                : 
 7645                          7131                 :           1100 :     START_CRIT_SECTION();
                               7132                 :                : 
                               7133                 :                :     /*
                               7134                 :                :      * Now insert the checkpoint record into XLOG.
                               7135                 :                :      */
 3433 heikki.linnakangas@i     7136                 :           1100 :     XLogBeginInsert();
                               7137                 :           1100 :     XLogRegisterData((char *) (&checkPoint), sizeof(checkPoint));
 8433 tgl@sss.pgh.pa.us        7138         [ +  + ]:           1100 :     recptr = XLogInsert(RM_XLOG_ID,
                               7139                 :                :                         shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                               7140                 :                :                         XLOG_CHECKPOINT_ONLINE);
                               7141                 :                : 
                               7142                 :           1100 :     XLogFlush(recptr);
                               7143                 :                : 
                               7144                 :                :     /*
                               7145                 :                :      * We mustn't write any new WAL after a shutdown checkpoint, or it will be
                               7146                 :                :      * overwritten at next startup.  No-one should even try; this just allows
                               7147                 :                :      * sanity-checking.  In the case of an end-of-recovery checkpoint, we want
                               7148                 :                :      * to just temporarily disable writing until the system has exited
                               7149                 :                :      * recovery.
                               7150                 :                :      */
 5406                          7151         [ +  + ]:           1100 :     if (shutdown)
                               7152                 :                :     {
                               7153         [ +  + ]:            575 :         if (flags & CHECKPOINT_END_OF_RECOVERY)
  902 rhaas@postgresql.org     7154                 :            109 :             LocalXLogInsertAllowed = oldXLogAllowed;
                               7155                 :                :         else
 5161 bruce@momjian.us         7156                 :            466 :             LocalXLogInsertAllowed = 0; /* never again write WAL */
                               7157                 :                :     }
                               7158                 :                : 
                               7159                 :                :     /*
                               7160                 :                :      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
                               7161                 :                :      * = end of actual checkpoint record.
                               7162                 :                :      */
 4125 alvherre@alvh.no-ip.     7163   [ +  +  -  + ]:           1100 :     if (shutdown && checkPoint.redo != ProcLastRecPtr)
 7573 tgl@sss.pgh.pa.us        7164         [ #  # ]:UBC           0 :         ereport(PANIC,
                               7165                 :                :                 (errmsg("concurrent write-ahead log activity while database system is shutting down")));
                               7166                 :                : 
                               7167                 :                :     /*
                               7168                 :                :      * Remember the prior checkpoint's redo ptr for
                               7169                 :                :      * UpdateCheckPointDistanceEstimate()
                               7170                 :                :      */
 3338 heikki.linnakangas@i     7171                 :CBC        1100 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7172                 :                : 
                               7173                 :                :     /*
                               7174                 :                :      * Update the control file.
                               7175                 :                :      */
 8233 tgl@sss.pgh.pa.us        7176                 :           1100 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 8966 vadim4o@yahoo.com        7177         [ +  + ]:           1100 :     if (shutdown)
                               7178                 :            575 :         ControlFile->state = DB_SHUTDOWNED;
 8433 tgl@sss.pgh.pa.us        7179                 :           1100 :     ControlFile->checkPoint = ProcLastRecPtr;
                               7180                 :           1100 :     ControlFile->checkPointCopy = checkPoint;
                               7181                 :                :     /* crash recovery should always recover to the end of WAL */
 4126 alvherre@alvh.no-ip.     7182                 :           1100 :     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 4149 heikki.linnakangas@i     7183                 :           1100 :     ControlFile->minRecoveryPointTLI = 0;
                               7184                 :                : 
                               7185                 :                :     /*
                               7186                 :                :      * Persist unloggedLSN value. It's reset on crash recovery, so this goes
                               7187                 :                :      * unused on non-shutdown checkpoints, but seems useful to store it always
                               7188                 :                :      * for debugging purposes.
                               7189                 :                :      */
   45 nathan@postgresql.or     7190                 :GNC        1100 :     ControlFile->unloggedLSN = pg_atomic_read_membarrier_u64(&XLogCtl->unloggedLSN);
                               7191                 :                : 
 8966 vadim4o@yahoo.com        7192                 :CBC        1100 :     UpdateControlFile();
 8233 tgl@sss.pgh.pa.us        7193                 :           1100 :     LWLockRelease(ControlFileLock);
                               7194                 :                : 
                               7195                 :                :     /* Update shared-memory copy of checkpoint XID/epoch */
 3492 andres@anarazel.de       7196         [ -  + ]:           1100 :     SpinLockAcquire(&XLogCtl->info_lck);
 1342                          7197                 :           1100 :     XLogCtl->ckptFullXid = checkPoint.nextXid;
 3492                          7198                 :           1100 :     SpinLockRelease(&XLogCtl->info_lck);
                               7199                 :                : 
                               7200                 :                :     /*
                               7201                 :                :      * We are now done with critical updates; no need for system panic if we
                               7202                 :                :      * have trouble while fooling with old log segments.
                               7203                 :                :      */
 7645 tgl@sss.pgh.pa.us        7204         [ -  + ]:           1100 :     END_CRIT_SECTION();
                               7205                 :                : 
                               7206                 :                :     /*
                               7207                 :                :      * WAL summaries end when the next XLOG_CHECKPOINT_REDO or
                               7208                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record is reached. This is the first point
                               7209                 :                :      * where (a) we're not inside of a critical section and (b) we can be
                               7210                 :                :      * certain that the relevant record has been flushed to disk, which must
                               7211                 :                :      * happen before it can be summarized.
                               7212                 :                :      *
                               7213                 :                :      * If this is a shutdown checkpoint, then this happens reasonably
                               7214                 :                :      * promptly: we've only just inserted and flushed the
                               7215                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record. If this is not a shutdown checkpoint,
                               7216                 :                :      * then this might not be very prompt at all: the XLOG_CHECKPOINT_REDO
                               7217                 :                :      * record was written before we began flushing data to disk, and that
                               7218                 :                :      * could be many minutes ago at this point. However, we don't XLogFlush()
                               7219                 :                :      * after inserting that record, so we're not guaranteed that it's on disk
                               7220                 :                :      * until after the above call that flushes the XLOG_CHECKPOINT_ONLINE
                               7221                 :                :      * record.
                               7222                 :                :      */
  116 rhaas@postgresql.org     7223                 :GNC        1100 :     SetWalSummarizerLatch();
                               7224                 :                : 
                               7225                 :                :     /*
                               7226                 :                :      * Let smgr do post-checkpoint cleanup (eg, deleting old files).
                               7227                 :                :      */
 1837 tmunro@postgresql.or     7228                 :CBC        1100 :     SyncPostCheckpoint();
                               7229                 :                : 
                               7230                 :                :     /*
                               7231                 :                :      * Update the average distance between checkpoints if the prior checkpoint
                               7232                 :                :      * exists.
                               7233                 :                :      */
 3338 heikki.linnakangas@i     7234         [ +  - ]:           1100 :     if (PriorRedoPtr != InvalidXLogRecPtr)
                               7235                 :           1100 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7236                 :                : 
                               7237                 :                :     /*
                               7238                 :                :      * Delete old log files that are no longer needed for the last checkpoint,
                               7239                 :                :      * to prevent the disk holding the xlog from growing full.
                               7240                 :                :      */
 2091 michael@paquier.xyz      7241                 :           1100 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7242                 :           1100 :     KeepLogSeg(recptr, &_logSegNo);
  373 andres@anarazel.de       7243         [ +  + ]:           1100 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED,
                               7244                 :                :                                            _logSegNo, InvalidOid,
                               7245                 :                :                                            InvalidTransactionId))
                               7246                 :                :     {
                               7247                 :                :         /*
                               7248                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7249                 :                :          * horizon, starting again from RedoRecPtr.
                               7250                 :                :          */
 1003 alvherre@alvh.no-ip.     7251                 :              2 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7252                 :              2 :         KeepLogSeg(recptr, &_logSegNo);
                               7253                 :                :     }
 2091 michael@paquier.xyz      7254                 :           1100 :     _logSegNo--;
  891 rhaas@postgresql.org     7255                 :           1100 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                               7256                 :                :                        checkPoint.ThisTimeLineID);
                               7257                 :                : 
                               7258                 :                :     /*
                               7259                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7260                 :                :      * segments, since that may supply some of the needed files.)
                               7261                 :                :      */
 8433 tgl@sss.pgh.pa.us        7262         [ +  + ]:           1100 :     if (!shutdown)
  891 rhaas@postgresql.org     7263                 :            525 :         PreallocXlogFiles(recptr, checkPoint.ThisTimeLineID);
                               7264                 :                : 
                               7265                 :                :     /*
                               7266                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7267                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7268                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7269                 :                :      * in subtrans.c).  During recovery, though, we mustn't do this because
                               7270                 :                :      * StartupSUBTRANS hasn't been called yet.
                               7271                 :                :      */
 5406 tgl@sss.pgh.pa.us        7272         [ +  + ]:           1100 :     if (!RecoveryInProgress())
 1341 andres@anarazel.de       7273                 :            991 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7274                 :                : 
                               7275                 :                :     /* Real work is done; log and update stats. */
 4392 rhaas@postgresql.org     7276                 :           1100 :     LogCheckpointEnd(false);
                               7277                 :                : 
                               7278                 :                :     /* Reset the process title */
 1217 michael@paquier.xyz      7279                 :           1100 :     update_checkpoint_display(flags, false, true);
                               7280                 :                : 
                               7281                 :                :     TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
                               7282                 :                :                                      NBuffers,
                               7283                 :                :                                      CheckpointStats.ckpt_segs_added,
                               7284                 :                :                                      CheckpointStats.ckpt_segs_removed,
                               7285                 :                :                                      CheckpointStats.ckpt_segs_recycled);
                               7286                 :                : }
                               7287                 :                : 
                               7288                 :                : /*
                               7289                 :                :  * Mark the end of recovery in WAL, though without running a full checkpoint.
                               7290                 :                :  * We can expect that a restartpoint is likely to be in progress as we
                               7291                 :                :  * do this, though we are unwilling to wait for it to complete.
                               7292                 :                :  *
                               7293                 :                :  * CreateRestartPoint() allows for the case where recovery may end before
                               7294                 :                :  * the restartpoint completes, so there is no concern about concurrent behaviour.
                               7295                 :                :  */
                               7296                 :                : static void
 4093 simon@2ndQuadrant.co     7297                 :             39 : CreateEndOfRecoveryRecord(void)
                               7298                 :                : {
                               7299                 :                :     xl_end_of_recovery xlrec;
                               7300                 :                :     XLogRecPtr  recptr;
                               7301                 :                : 
                               7302                 :                :     /* sanity check */
                               7303         [ -  + ]:             39 :     if (!RecoveryInProgress())
 4093 simon@2ndQuadrant.co     7304         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used to end recovery");
                               7305                 :                : 
 3404 heikki.linnakangas@i     7306                 :CBC          39 :     xlrec.end_time = GetCurrentTimestamp();
                               7307                 :                : 
 3677                          7308                 :             39 :     WALInsertLockAcquireExclusive();
  886 rhaas@postgresql.org     7309                 :             39 :     xlrec.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4080 heikki.linnakangas@i     7310                 :             39 :     xlrec.PrevTimeLineID = XLogCtl->PrevTimeLineID;
 3677                          7311                 :             39 :     WALInsertLockRelease();
                               7312                 :                : 
 4093 simon@2ndQuadrant.co     7313                 :             39 :     START_CRIT_SECTION();
                               7314                 :                : 
 3433 heikki.linnakangas@i     7315                 :             39 :     XLogBeginInsert();
                               7316                 :             39 :     XLogRegisterData((char *) &xlrec, sizeof(xl_end_of_recovery));
                               7317                 :             39 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_END_OF_RECOVERY);
                               7318                 :                : 
 4091 simon@2ndQuadrant.co     7319                 :             39 :     XLogFlush(recptr);
                               7320                 :                : 
                               7321                 :                :     /*
                               7322                 :                :      * Update the control file so that crash recovery can follow the timeline
                               7323                 :                :      * changes to this point.
                               7324                 :                :      */
                               7325                 :             39 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7326                 :             39 :     ControlFile->minRecoveryPoint = recptr;
  891 rhaas@postgresql.org     7327                 :             39 :     ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
 4091 simon@2ndQuadrant.co     7328                 :             39 :     UpdateControlFile();
                               7329                 :             39 :     LWLockRelease(ControlFileLock);
                               7330                 :                : 
 4093                          7331         [ -  + ]:             39 :     END_CRIT_SECTION();
                               7332                 :             39 : }
                               7333                 :                : 
                               7334                 :                : /*
                               7335                 :                :  * Write an OVERWRITE_CONTRECORD message.
                               7336                 :                :  *
                               7337                 :                :  * When WAL replay expects a continuation record at the start of a page but
                               7338                 :                :  * finds none, recovery ends and WAL writing resumes at that point.
                               7339                 :                :  * But it's wrong to resume writing new WAL back at the start of the record
                               7340                 :                :  * that was broken, because downstream consumers of that WAL (physical
                               7341                 :                :  * replicas) are not prepared to "rewind".  So the first action after
                               7342                 :                :  * finishing replay of all valid WAL must be to write a record of this type
                               7343                 :                :  * at the point where the contrecord was missing; to support xlogreader
                               7344                 :                :  * detecting the special case, XLP_FIRST_IS_OVERWRITE_CONTRECORD is also added
                               7345                 :                :  * to the page header where the record occurs.  xlogreader has an ad-hoc
                               7346                 :                :  * mechanism to report metadata about the broken record, which is what we
                               7347                 :                :  * use here.
                               7348                 :                :  *
                               7349                 :                :  * At replay time, XLP_FIRST_IS_OVERWRITE_CONTRECORD instructs xlogreader to
                               7350                 :                :  * skip the record it was reading, and pass back the LSN of the skipped
                               7351                 :                :  * record, so that its caller can verify (on "replay" of that record) that the
                               7352                 :                :  * XLOG_OVERWRITE_CONTRECORD matches what was effectively overwritten.
                               7353                 :                :  *
                               7354                 :                :  * 'aborted_lsn' is the beginning position of the record that was incomplete.
                               7355                 :                :  * It is included in the WAL record.  'pagePtr' and 'newTLI' point to the
                               7356                 :                :  * beginning of the XLOG page where the record is to be inserted.  They must
                               7357                 :                :  * match the current WAL insert position; they're passed here just so that we
                               7358                 :                :  * can verify that.
                               7359                 :                :  */
                               7360                 :                : static XLogRecPtr
  788 heikki.linnakangas@i     7361                 :             10 : CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn, XLogRecPtr pagePtr,
                               7362                 :                :                                 TimeLineID newTLI)
                               7363                 :                : {
                               7364                 :                :     xl_overwrite_contrecord xlrec;
                               7365                 :                :     XLogRecPtr  recptr;
                               7366                 :                :     XLogPageHeader pagehdr;
                               7367                 :                :     XLogRecPtr  startPos;
                               7368                 :                : 
                               7369                 :                :     /* sanity checks */
  928 alvherre@alvh.no-ip.     7370         [ -  + ]:             10 :     if (!RecoveryInProgress())
  928 alvherre@alvh.no-ip.     7371         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used at end of recovery");
  788 heikki.linnakangas@i     7372         [ -  + ]:CBC          10 :     if (pagePtr % XLOG_BLCKSZ != 0)
  788 heikki.linnakangas@i     7373         [ #  # ]:UBC           0 :         elog(ERROR, "invalid position for missing continuation record %X/%X",
                               7374                 :                :              LSN_FORMAT_ARGS(pagePtr));
                               7375                 :                : 
                               7376                 :                :     /* The current WAL insert position should be right after the page header */
  788 heikki.linnakangas@i     7377                 :CBC          10 :     startPos = pagePtr;
                               7378         [ +  + ]:             10 :     if (XLogSegmentOffset(startPos, wal_segment_size) == 0)
                               7379                 :              1 :         startPos += SizeOfXLogLongPHD;
                               7380                 :                :     else
                               7381                 :              9 :         startPos += SizeOfXLogShortPHD;
                               7382                 :             10 :     recptr = GetXLogInsertRecPtr();
                               7383         [ -  + ]:             10 :     if (recptr != startPos)
  788 heikki.linnakangas@i     7384         [ #  # ]:UBC           0 :         elog(ERROR, "invalid WAL insert position %X/%X for OVERWRITE_CONTRECORD",
                               7385                 :                :              LSN_FORMAT_ARGS(recptr));
                               7386                 :                : 
  928 alvherre@alvh.no-ip.     7387                 :CBC          10 :     START_CRIT_SECTION();
                               7388                 :                : 
                               7389                 :                :     /*
                               7390                 :                :      * Initialize the XLOG page header (by GetXLogBuffer), and set the
                               7391                 :                :      * XLP_FIRST_IS_OVERWRITE_CONTRECORD flag.
                               7392                 :                :      *
                               7393                 :                :      * No other backend is allowed to write WAL yet, so acquiring the WAL
                               7394                 :                :      * insertion lock is just pro forma.
                               7395                 :                :      */
  788 heikki.linnakangas@i     7396                 :             10 :     WALInsertLockAcquire();
                               7397                 :             10 :     pagehdr = (XLogPageHeader) GetXLogBuffer(pagePtr, newTLI);
                               7398                 :             10 :     pagehdr->xlp_info |= XLP_FIRST_IS_OVERWRITE_CONTRECORD;
                               7399                 :             10 :     WALInsertLockRelease();
                               7400                 :                : 
                               7401                 :                :     /*
                               7402                 :                :      * Insert the XLOG_OVERWRITE_CONTRECORD record as the first record on the
                               7403                 :                :      * page.  We know it becomes the first record, because no other backend is
                               7404                 :                :      * allowed to write WAL yet.
                               7405                 :                :      */
  928 alvherre@alvh.no-ip.     7406                 :             10 :     XLogBeginInsert();
  788 heikki.linnakangas@i     7407                 :             10 :     xlrec.overwritten_lsn = aborted_lsn;
                               7408                 :             10 :     xlrec.overwrite_time = GetCurrentTimestamp();
  928 alvherre@alvh.no-ip.     7409                 :             10 :     XLogRegisterData((char *) &xlrec, sizeof(xl_overwrite_contrecord));
                               7410                 :             10 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_OVERWRITE_CONTRECORD);
                               7411                 :                : 
                               7412                 :                :     /* check that the record was inserted to the right place */
  788 heikki.linnakangas@i     7413         [ -  + ]:             10 :     if (ProcLastRecPtr != startPos)
  788 heikki.linnakangas@i     7414         [ #  # ]:UBC           0 :         elog(ERROR, "OVERWRITE_CONTRECORD was inserted to unexpected position %X/%X",
                               7415                 :                :              LSN_FORMAT_ARGS(ProcLastRecPtr));
                               7416                 :                : 
  928 alvherre@alvh.no-ip.     7417                 :CBC          10 :     XLogFlush(recptr);
                               7418                 :                : 
                               7419         [ -  + ]:             10 :     END_CRIT_SECTION();
                               7420                 :                : 
                               7421                 :             10 :     return recptr;
                               7422                 :                : }
                               7423                 :                : 
                               7424                 :                : /*
                               7425                 :                :  * Flush all data in shared memory to disk, and fsync
                               7426                 :                :  *
                               7427                 :                :  * This is the common code shared between regular checkpoints and
                               7428                 :                :  * recovery restartpoints.
                               7429                 :                :  */
                               7430                 :                : static void
 6135 tgl@sss.pgh.pa.us        7431                 :           1153 : CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
                               7432                 :                : {
 5180                          7433                 :           1153 :     CheckPointRelationMap();
  213 akapila@postgresql.o     7434                 :GNC        1153 :     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 3695 rhaas@postgresql.org     7435                 :CBC        1153 :     CheckPointSnapBuild();
                               7436                 :           1153 :     CheckPointLogicalRewriteHeap();
 3273 andres@anarazel.de       7437                 :           1153 :     CheckPointReplicationOrigin();
                               7438                 :                : 
                               7439                 :                :     /* Write out all dirty data in SLRUs and the main buffer pool */
                               7440                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
 1297 tmunro@postgresql.or     7441                 :           1153 :     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
                               7442                 :           1153 :     CheckPointCLOG();
                               7443                 :           1153 :     CheckPointCommitTs();
                               7444                 :           1153 :     CheckPointSUBTRANS();
                               7445                 :           1153 :     CheckPointMultiXact();
                               7446                 :           1153 :     CheckPointPredicate();
                               7447                 :           1153 :     CheckPointBuffers(flags);
                               7448                 :                : 
                               7449                 :                :     /* Perform all queued up fsyncs */
                               7450                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
                               7451                 :           1148 :     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
                               7452                 :           1148 :     ProcessSyncRequests();
                               7453                 :           1148 :     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
                               7454                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
                               7455                 :                : 
                               7456                 :                :     /* We deliberately delay 2PC checkpointing as long as possible */
 6460 tgl@sss.pgh.pa.us        7457                 :           1148 :     CheckPointTwoPhase(checkPointRedo);
                               7458                 :           1148 : }
                               7459                 :                : 
                               7460                 :                : /*
                               7461                 :                :  * Save a checkpoint for recovery restart if appropriate
                               7462                 :                :  *
                               7463                 :                :  * This function is called each time a checkpoint record is read from XLOG.
                               7464                 :                :  * It must determine whether the checkpoint represents a safe restartpoint or
                               7465                 :                :  * not.  If so, the checkpoint record is stashed in shared memory so that
                               7466                 :                :  * CreateRestartPoint can consult it.  (Note that the latter function is
                               7467                 :                :  * executed by the checkpointer, while this one will be executed by the
                               7468                 :                :  * startup process.)
                               7469                 :                :  */
                               7470                 :                : static void
  872 rhaas@postgresql.org     7471                 :            326 : RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
                               7472                 :                : {
                               7473                 :                :     /*
                               7474                 :                :      * Refrain from creating a restartpoint if we have seen any
                               7475                 :                :      * references to non-existent pages. Restarting recovery from the
                               7476                 :                :      * restartpoint would not see the references, so we would lose the
                               7477                 :                :      * cross-check that the pages belonged to a relation that was dropped
                               7478                 :                :      * later.
                               7479                 :                :      */
 4517 heikki.linnakangas@i     7480         [ -  + ]:            326 :     if (XLogHaveInvalidPages())
                               7481                 :                :     {
  125 michael@paquier.xyz      7482         [ #  # ]:UNC           0 :         elog(DEBUG2,
                               7483                 :                :              "could not record restart point at %X/%X because there "
                               7484                 :                :              "are unresolved references to invalid pages",
                               7485                 :                :              LSN_FORMAT_ARGS(checkPoint->redo));
 4517 heikki.linnakangas@i     7486                 :UBC           0 :         return;
                               7487                 :                :     }
                               7488                 :                : 
                               7489                 :                :     /*
                               7490                 :                :      * Copy the checkpoint record to shared memory, so that checkpointer can
                               7491                 :                :      * work out the next time it wants to perform a restartpoint.
                               7492                 :                :      */
 3492 andres@anarazel.de       7493         [ -  + ]:CBC         326 :     SpinLockAcquire(&XLogCtl->info_lck);
  872 rhaas@postgresql.org     7494                 :            326 :     XLogCtl->lastCheckPointRecPtr = record->ReadRecPtr;
                               7495                 :            326 :     XLogCtl->lastCheckPointEndPtr = record->EndRecPtr;
 3492 andres@anarazel.de       7496                 :            326 :     XLogCtl->lastCheckPoint = *checkPoint;
                               7497                 :            326 :     SpinLockRelease(&XLogCtl->info_lck);
                               7498                 :                : }
                               7499                 :                : 
                               7500                 :                : /*
                               7501                 :                :  * Establish a restartpoint if possible.
                               7502                 :                :  *
                               7503                 :                :  * This is similar to CreateCheckPoint, but is used during WAL recovery
                               7504                 :                :  * to establish a point from which recovery can roll forward without
                               7505                 :                :  * replaying the entire recovery log.
                               7506                 :                :  *
                               7507                 :                :  * Returns true if a new restartpoint was established. We can only establish
                               7508                 :                :  * a restartpoint if we have replayed a safe checkpoint record since the last
                               7509                 :                :  * restartpoint.
                               7510                 :                :  */
                               7511                 :                : bool
 5534 heikki.linnakangas@i     7512                 :             97 : CreateRestartPoint(int flags)
                               7513                 :                : {
                               7514                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                               7515                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                               7516                 :                :     CheckPoint  lastCheckPoint;
                               7517                 :                :     XLogRecPtr  PriorRedoPtr;
                               7518                 :                :     XLogRecPtr  receivePtr;
                               7519                 :                :     XLogRecPtr  replayPtr;
                               7520                 :                :     TimeLineID  replayTLI;
                               7521                 :                :     XLogRecPtr  endptr;
                               7522                 :                :     XLogSegNo   _logSegNo;
                               7523                 :                :     TimestampTz xtime;
                               7524                 :                : 
                               7525                 :                :     /* Concurrent checkpoint/restartpoint cannot happen */
  706 michael@paquier.xyz      7526   [ +  -  -  + ]:             97 :     Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
                               7527                 :                : 
                               7528                 :                :     /* Get a local copy of the last safe checkpoint record. */
 3492 andres@anarazel.de       7529         [ -  + ]:             97 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7530                 :             97 :     lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
 2726 rhaas@postgresql.org     7531                 :             97 :     lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
 3492 andres@anarazel.de       7532                 :             97 :     lastCheckPoint = XLogCtl->lastCheckPoint;
                               7533                 :             97 :     SpinLockRelease(&XLogCtl->info_lck);
                               7534                 :                : 
                               7535                 :                :     /*
                               7536                 :                :      * Check that we're still in recovery mode. It's ok if we exit recovery
                               7537                 :                :      * mode after this check; the restart point is valid anyway.
                               7538                 :                :      */
 5534 heikki.linnakangas@i     7539         [ -  + ]:             97 :     if (!RecoveryInProgress())
                               7540                 :                :     {
 5534 heikki.linnakangas@i     7541         [ #  # ]:UBC           0 :         ereport(DEBUG2,
                               7542                 :                :                 (errmsg_internal("skipping restartpoint, recovery has already ended")));
                               7543                 :              0 :         return false;
                               7544                 :                :     }
                               7545                 :                : 
                               7546                 :                :     /*
                               7547                 :                :      * If the last checkpoint record we've replayed is already our last
                               7548                 :                :      * restartpoint, we can't perform a new restart point. We still update
                               7549                 :                :      * minRecoveryPoint in that case, so that if this is a shutdown restart
                               7550                 :                :      * point, we won't start up earlier than before. That's not strictly
                               7551                 :                :      * necessary, but when hot standby is enabled, it would be rather weird if
                               7552                 :                :      * the database opened up for read-only connections at a point-in-time
                               7553                 :                :      * before the last shutdown. Such time travel is still possible in case of
                               7554                 :                :      * immediate shutdown, though.
                               7555                 :                :      *
                               7556                 :                :      * We don't explicitly advance minRecoveryPoint when we do create a
                               7557                 :                :      * restartpoint. It's assumed that flushing the buffers will do that as a
                               7558                 :                :      * side-effect.
                               7559                 :                :      */
 5534 heikki.linnakangas@i     7560         [ +  + ]:CBC          97 :     if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 4125 alvherre@alvh.no-ip.     7561         [ +  + ]:             88 :         lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
                               7562                 :                :     {
 5534 heikki.linnakangas@i     7563         [ -  + ]:             49 :         ereport(DEBUG2,
                               7564                 :                :                 (errmsg_internal("skipping restartpoint, already performed at %X/%X",
                               7565                 :                :                                  LSN_FORMAT_ARGS(lastCheckPoint.redo))));
                               7566                 :                : 
                               7567                 :             49 :         UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 5064 rhaas@postgresql.org     7568         [ +  + ]:             49 :         if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7569                 :                :         {
                               7570                 :             27 :             LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7571                 :             27 :             ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7572                 :             27 :             UpdateControlFile();
                               7573                 :             27 :             LWLockRelease(ControlFileLock);
                               7574                 :                :         }
 5534 heikki.linnakangas@i     7575                 :             49 :         return false;
                               7576                 :                :     }
                               7577                 :                : 
                               7578                 :                :     /*
                               7579                 :                :      * Update the shared RedoRecPtr so that the startup process can calculate
                               7580                 :                :      * the number of segments replayed since last restartpoint, and request a
                               7581                 :                :      * restartpoint if it exceeds CheckPointSegments.
                               7582                 :                :      *
                               7583                 :                :      * Like in CreateCheckPoint(), hold off insertions to update it, although
                               7584                 :                :      * during recovery this is just pro forma, because no WAL insertions are
                               7585                 :                :      * happening.
                               7586                 :                :      */
 3677                          7587                 :             48 :     WALInsertLockAcquireExclusive();
 3338                          7588                 :             48 :     RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
 3677                          7589                 :             48 :     WALInsertLockRelease();
                               7590                 :                : 
                               7591                 :                :     /* Also update the info_lck-protected copy */
 3492 andres@anarazel.de       7592         [ -  + ]:             48 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7593                 :             48 :     XLogCtl->RedoRecPtr = lastCheckPoint.redo;
                               7594                 :             48 :     SpinLockRelease(&XLogCtl->info_lck);
                               7595                 :                : 
                               7596                 :                :     /*
                               7597                 :                :      * Prepare to accumulate statistics.
                               7598                 :                :      *
                               7599                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               7600                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               7601                 :                :      * log_checkpoints is currently off.
                               7602                 :                :      */
 4820 rhaas@postgresql.org     7603   [ +  -  +  -  :            528 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               7604                 :             48 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               7605                 :                : 
                               7606         [ +  - ]:             48 :     if (log_checkpoints)
 5534 heikki.linnakangas@i     7607                 :             48 :         LogCheckpointStart(flags, true);
                               7608                 :                : 
                               7609                 :                :     /* Update the process title */
 1217 michael@paquier.xyz      7610                 :             48 :     update_checkpoint_display(flags, true, false);
                               7611                 :                : 
 5534 heikki.linnakangas@i     7612                 :             48 :     CheckPointGuts(lastCheckPoint.redo, flags);
                               7613                 :                : 
                               7614                 :                :     /*
                               7615                 :                :      * This location needs to be after CheckPointGuts() to ensure that some
                               7616                 :                :      * work has already happened during this checkpoint.
                               7617                 :                :      */
   41 michael@paquier.xyz      7618                 :GNC          48 :     INJECTION_POINT("create-restart-point");
                               7619                 :                : 
                               7620                 :                :     /*
                               7621                 :                :      * Remember the prior checkpoint's redo ptr for
                               7622                 :                :      * UpdateCheckPointDistanceEstimate()
                               7623                 :                :      */
 3338 heikki.linnakangas@i     7624                 :CBC          48 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7625                 :                : 
                               7626                 :                :     /*
                               7627                 :                :      * Update pg_control, using current time.  Check that it still shows an
                               7628                 :                :      * older checkpoint, else do nothing; this is a quick hack to make sure
                               7629                 :                :      * nothing really bad happens if somehow we get here after the
                               7630                 :                :      * end-of-recovery checkpoint.
                               7631                 :                :      */
 5534                          7632                 :             48 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
  706 michael@paquier.xyz      7633         [ +  - ]:             48 :     if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
                               7634                 :                :     {
                               7635                 :                :         /*
                               7636                 :                :          * Update the checkpoint information.  We do this even if the cluster
                               7637                 :                :          * does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
                               7638                 :                :          * segments recycled below.
                               7639                 :                :          */
 5406 tgl@sss.pgh.pa.us        7640                 :             48 :         ControlFile->checkPoint = lastCheckPointRecPtr;
                               7641                 :             48 :         ControlFile->checkPointCopy = lastCheckPoint;
                               7642                 :                : 
                               7643                 :                :         /*
                               7644                 :                :          * Ensure minRecoveryPoint is past the checkpoint record and update it
                               7645                 :                :          * if the control file still shows DB_IN_ARCHIVE_RECOVERY.  Normally,
                               7646                 :                :          * this will have happened already while writing out dirty buffers,
                               7647                 :                :          * but not necessarily - e.g. because no buffers were dirtied.  We do
                               7648                 :                :          * this because a backup performed in recovery uses minRecoveryPoint
                               7649                 :                :          * to determine which WAL files must be included in the backup, and
                               7650                 :                :          * the file (or files) containing the checkpoint record must be
                               7651                 :                :          * included, at a minimum.  Note that for an ordinary restart of
                               7652                 :                :          * recovery there's no value in having the minimum recovery point any
                               7653                 :                :          * earlier than this anyway, because redo will begin just after the
                               7654                 :                :          * checkpoint record.
                               7655                 :                :          */
  706 michael@paquier.xyz      7656         [ +  + ]:             48 :         if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
                               7657                 :                :         {
                               7658         [ +  + ]:             47 :             if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
                               7659                 :                :             {
                               7660                 :             20 :                 ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
                               7661                 :             20 :                 ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
                               7662                 :                : 
                               7663                 :                :                 /* update local copy */
                               7664                 :             20 :                 LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               7665                 :             20 :                 LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               7666                 :                :             }
                               7667         [ +  + ]:             47 :             if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7668                 :             20 :                 ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7669                 :                :         }
 5406 tgl@sss.pgh.pa.us        7670                 :             48 :         UpdateControlFile();
                               7671                 :                :     }
 5534 heikki.linnakangas@i     7672                 :             48 :     LWLockRelease(ControlFileLock);
                               7673                 :                : 
                               7674                 :                :     /*
                               7675                 :                :      * Update the average distance between checkpoints/restartpoints if the
                               7676                 :                :      * prior checkpoint exists.
                               7677                 :                :      */
 3338                          7678         [ +  - ]:             48 :     if (PriorRedoPtr != InvalidXLogRecPtr)
                               7679                 :             48 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7680                 :                : 
                               7681                 :                :     /*
                               7682                 :                :      * Delete old log files that are no longer needed for the last
                               7683                 :                :      * restartpoint, to prevent the disk holding the xlog from filling up.
                               7684                 :                :      */
 2091 michael@paquier.xyz      7685                 :             48 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7686                 :                : 
                               7687                 :                :     /*
                               7688                 :                :      * Retreat _logSegNo using the current end of xlog replayed or received,
                               7689                 :                :      * whichever is later.
                               7690                 :                :      */
 1467 tmunro@postgresql.or     7691                 :             48 :     receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 2091 michael@paquier.xyz      7692                 :             48 :     replayPtr = GetXLogReplayRecPtr(&replayTLI);
                               7693                 :             48 :     endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
                               7694                 :             48 :     KeepLogSeg(endptr, &_logSegNo);
  373 andres@anarazel.de       7695         [ +  + ]:             48 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED,
                               7696                 :                :                                            _logSegNo, InvalidOid,
                               7697                 :                :                                            InvalidTransactionId))
                               7698                 :                :     {
                               7699                 :                :         /*
                               7700                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7701                 :                :          * horizon, starting again from RedoRecPtr.
                               7702                 :                :          */
 1003 alvherre@alvh.no-ip.     7703                 :GBC           1 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7704                 :              1 :         KeepLogSeg(endptr, &_logSegNo);
                               7705                 :                :     }
 2091 michael@paquier.xyz      7706                 :CBC          48 :     _logSegNo--;
                               7707                 :                : 
                               7708                 :                :     /*
                               7709                 :                :      * Try to recycle segments on a useful timeline. If we've been promoted
                               7710                 :                :      * since the beginning of this restartpoint, use the new timeline chosen
                               7711                 :                :      * at end of recovery.  If we're still in recovery, use the timeline we're
                               7712                 :                :      * currently replaying.
                               7713                 :                :      *
                               7714                 :                :      * There is no guarantee that the WAL segments will be useful on the
                               7715                 :                :      * current timeline; if recovery proceeds to a new timeline right after
                               7716                 :                :      * this, the pre-allocated WAL segments on this timeline will not be used,
                               7717                 :                :      * and will go to waste until recycled on the next restartpoint. We'll live
                               7718                 :                :      * with that.
                               7719                 :                :      */
  891 rhaas@postgresql.org     7720         [ +  + ]:             48 :     if (!RecoveryInProgress())
  886 rhaas@postgresql.org     7721                 :GBC           1 :         replayTLI = XLogCtl->InsertTimeLineID;
                               7722                 :                : 
  891 rhaas@postgresql.org     7723                 :CBC          48 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr, replayTLI);
                               7724                 :                : 
                               7725                 :                :     /*
                               7726                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7727                 :                :      * segments, since that may supply some of the needed files.)
                               7728                 :                :      */
                               7729                 :             48 :     PreallocXlogFiles(endptr, replayTLI);
                               7730                 :                : 
                               7731                 :                :     /*
                               7732                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7733                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7734                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7735                 :                :      * in subtrans.c).  When hot standby is disabled, though, we mustn't do
                               7736                 :                :      * this because StartupSUBTRANS hasn't been called yet.
                               7737                 :                :      */
 4976 simon@2ndQuadrant.co     7738         [ +  - ]:             48 :     if (EnableHotStandby)
 1341 andres@anarazel.de       7739                 :             48 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7740                 :                : 
                               7741                 :                :     /* Real work is done; log and update stats. */
 4392 rhaas@postgresql.org     7742                 :             48 :     LogCheckpointEnd(true);
                               7743                 :                : 
                               7744                 :                :     /* Reset the process title */
 1217 michael@paquier.xyz      7745                 :             48 :     update_checkpoint_display(flags, true, true);
                               7746                 :                : 
 5034 tgl@sss.pgh.pa.us        7747                 :             48 :     xtime = GetLatestXTime();
 5534 heikki.linnakangas@i     7748   [ +  -  +  -  :             48 :     ereport((log_checkpoints ? LOG : DEBUG2),
                                              +  + ]
                               7749                 :                :             (errmsg("recovery restart point at %X/%X",
                               7750                 :                :                     LSN_FORMAT_ARGS(lastCheckPoint.redo)),
                               7751                 :                :              xtime ? errdetail("Last completed transaction was at log time %s.",
                               7752                 :                :                                timestamptz_to_str(xtime)) : 0));
                               7753                 :                : 
                               7754                 :                :     /*
                               7755                 :                :      * Finally, execute archive_cleanup_command, if any.
                               7756                 :                :      */
 1967 peter_e@gmx.net          7757   [ +  -  +  + ]:             48 :     if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
  433 michael@paquier.xyz      7758                 :              1 :         ExecuteRecoveryCommand(archiveCleanupCommand,
                               7759                 :                :                                "archive_cleanup_command",
                               7760                 :                :                                false,
                               7761                 :                :                                WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND);
                               7762                 :                : 
 5534 heikki.linnakangas@i     7763                 :             48 :     return true;
                               7764                 :                : }
                               7765                 :                : 
                               7766                 :                : /*
                               7767                 :                :  * Report availability of WAL for the given target LSN
                               7768                 :                :  *      (typically a slot's restart_lsn)
                               7769                 :                :  *
                               7770                 :                :  * Returns one of the following enum values:
                               7771                 :                :  *
                               7772                 :                :  * * WALAVAIL_RESERVED means targetLSN is available and it is in the range of
                               7773                 :                :  *   max_wal_size.
                               7774                 :                :  *
                               7775                 :                :  * * WALAVAIL_EXTENDED means it is still available by preserving extra
                               7776                 :                :  *   segments beyond max_wal_size. If max_slot_wal_keep_size is smaller
                               7777                 :                :  *   than max_wal_size, this state is not returned.
                               7778                 :                :  *
                               7779                 :                :  * * WALAVAIL_UNRESERVED means it is no longer reserved and the next
                               7780                 :                :  *   checkpoint will remove it. The walsender using this slot can still
                               7781                 :                :  *   return it to WALAVAIL_RESERVED or WALAVAIL_EXTENDED.
                               7782                 :                :  *
                               7783                 :                :  * * WALAVAIL_REMOVED means it has been removed. A replication stream on
                               7784                 :                :  *   a slot with this LSN cannot continue.  (Any associated walsender
                               7785                 :                :  *   processes should have been terminated already.)
                               7786                 :                :  *
                               7787                 :                :  * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
                               7788                 :                :  */
                               7789                 :                : WALAvailability
 1468 alvherre@alvh.no-ip.     7790                 :            429 : GetWALAvailability(XLogRecPtr targetLSN)
                               7791                 :                : {
                               7792                 :                :     XLogRecPtr  currpos;        /* current write LSN */
                               7793                 :                :     XLogSegNo   currSeg;        /* segid of currpos */
                               7794                 :                :     XLogSegNo   targetSeg;      /* segid of targetLSN */
                               7795                 :                :     XLogSegNo   oldestSeg;      /* actual oldest segid */
                               7796                 :                :     XLogSegNo   oldestSegMaxWalSize;    /* oldest segid kept by max_wal_size */
                               7797                 :                :     XLogSegNo   oldestSlotSeg;  /* oldest segid kept by slot */
                               7798                 :                :     uint64      keepSegs;
                               7799                 :                : 
                               7800                 :                :     /*
                               7801                 :                :      * The slot does not reserve WAL: it is deactivated or was never active.
                               7802                 :                :      */
                               7803         [ +  + ]:            429 :     if (XLogRecPtrIsInvalid(targetLSN))
                               7804                 :             14 :         return WALAVAIL_INVALID_LSN;
                               7805                 :                : 
                               7806                 :                :     /*
                               7807                 :                :      * Calculate the oldest segment currently reserved by all slots,
                               7808                 :                :      * considering wal_keep_size and max_slot_wal_keep_size.  Initialize
                               7809                 :                :      * oldestSlotSeg to the current segment.
                               7810                 :                :      */
 1371                          7811                 :            415 :     currpos = GetXLogWriteRecPtr();
                               7812                 :            415 :     XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
 1468                          7813                 :            415 :     KeepLogSeg(currpos, &oldestSlotSeg);
                               7814                 :                : 
                               7815                 :                :     /*
                               7816                 :                :      * Find the oldest extant segment file. We get 1 until a checkpoint removes
                               7817                 :                :      * the first WAL segment file since startup, which can make the status
                               7818                 :                :      * wrong under certain abnormal conditions, but that does no actual harm.
                               7819                 :                :      */
                               7820                 :            415 :     oldestSeg = XLogGetLastRemovedSegno() + 1;
                               7821                 :                : 
                               7822                 :                :     /* calculate oldest segment by max_wal_size */
                               7823                 :            415 :     XLByteToSeg(currpos, currSeg, wal_segment_size);
 1390                          7824                 :            415 :     keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size) + 1;
                               7825                 :                : 
 1468                          7826         [ +  + ]:            415 :     if (currSeg > keepSegs)
                               7827                 :              8 :         oldestSegMaxWalSize = currSeg - keepSegs;
                               7828                 :                :     else
                               7829                 :            407 :         oldestSegMaxWalSize = 1;
                               7830                 :                : 
                               7831                 :                :     /* the segment we care about */
 1371                          7832                 :            415 :     XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
                               7833                 :                : 
                               7834                 :                :     /*
                               7835                 :                :      * No point in returning reserved or extended status values if the
                               7836                 :                :      * targetSeg is known to be lost.
                               7837                 :                :      */
 1390                          7838         [ +  + ]:            415 :     if (targetSeg >= oldestSlotSeg)
                               7839                 :                :     {
                               7840                 :                :         /* show "reserved" when targetSeg is within max_wal_size */
                               7841         [ +  + ]:            414 :         if (targetSeg >= oldestSegMaxWalSize)
 1468                          7842                 :            412 :             return WALAVAIL_RESERVED;
                               7843                 :                : 
                               7844                 :                :         /* being retained by slots exceeding max_wal_size */
 1390                          7845                 :              2 :         return WALAVAIL_EXTENDED;
                               7846                 :                :     }
                               7847                 :                : 
                               7848                 :                :     /* WAL segments are no longer retained but haven't been removed yet */
                               7849         [ +  - ]:              1 :     if (targetSeg >= oldestSeg)
                               7850                 :              1 :         return WALAVAIL_UNRESERVED;
                               7851                 :                : 
                               7852                 :                :     /* Definitely lost */
 1468 alvherre@alvh.no-ip.     7853                 :UBC           0 :     return WALAVAIL_REMOVED;
                               7854                 :                : }
                               7855                 :                : 
                               7856                 :                : 
                               7857                 :                : /*
                               7858                 :                :  * Retreat *logSegNo to the last segment that we need to retain because of
                               7859                 :                :  * either wal_keep_size or replication slots.
                               7860                 :                :  *
                               7861                 :                :  * This is calculated by subtracting wal_keep_size from the given xlog
                               7862                 :                :  * location, recptr, and by making sure that the result is below the
                               7863                 :                :  * requirement of replication slots.  For the latter criterion we do consider
                               7864                 :                :  * the effects of max_slot_wal_keep_size: reserve at most that much space back
                               7865                 :                :  * from recptr.
                               7866                 :                :  *
                               7867                 :                :  * Note about replication slots: if this function calculates a value
                               7868                 :                :  * that's further ahead than what slots need reserved, then affected
                               7869                 :                :  * slots need to be invalidated and this function invoked again.
                               7870                 :                :  * XXX it might be a good idea to rewrite this function so that
                               7871                 :                :  * invalidation is optionally done here, instead.
                               7872                 :                :  */
                               7873                 :                : static void
 4312 heikki.linnakangas@i     7874                 :CBC        1566 : KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
                               7875                 :                : {
                               7876                 :                :     XLogSegNo   currSegNo;
                               7877                 :                :     XLogSegNo   segno;
                               7878                 :                :     XLogRecPtr  keep;
                               7879                 :                : 
 1468 alvherre@alvh.no-ip.     7880                 :           1566 :     XLByteToSeg(recptr, currSegNo, wal_segment_size);
                               7881                 :           1566 :     segno = currSegNo;
                               7882                 :                : 
                               7883                 :                :     /*
                               7884                 :                :      * Calculate how many segments are kept by slots first, adjusting for
                               7885                 :                :      * max_slot_wal_keep_size.
                               7886                 :                :      */
                               7887                 :           1566 :     keep = XLogGetReplicationSlotMinimumLSN();
  353 nathan@postgresql.or     7888   [ +  +  +  + ]:           1566 :     if (keep != InvalidXLogRecPtr && keep < recptr)
                               7889                 :                :     {
 1468 alvherre@alvh.no-ip.     7890                 :            400 :         XLByteToSeg(keep, segno, wal_segment_size);
                               7891                 :                : 
                               7892                 :                :         /* Cap by max_slot_wal_keep_size ... */
                               7893         [ +  + ]:            400 :         if (max_slot_wal_keep_size_mb >= 0)
                               7894                 :                :         {
                               7895                 :                :             uint64      slot_keep_segs;
                               7896                 :                : 
                               7897                 :             22 :             slot_keep_segs =
                               7898                 :             22 :                 ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
                               7899                 :                : 
                               7900         [ +  + ]:             22 :             if (currSegNo - segno > slot_keep_segs)
                               7901                 :              4 :                 segno = currSegNo - slot_keep_segs;
                               7902                 :                :         }
                               7903                 :                :     }
                               7904                 :                : 
                               7905                 :                :     /*
                               7906                 :                :      * If WAL summarization is in use, don't remove WAL that has yet to be
                               7907                 :                :      * summarized.
                               7908                 :                :      */
  116 rhaas@postgresql.org     7909                 :GNC        1566 :     keep = GetOldestUnsummarizedLSN(NULL, NULL, false);
                               7910         [ +  + ]:           1566 :     if (keep != InvalidXLogRecPtr)
                               7911                 :                :     {
                               7912                 :                :         XLogSegNo   unsummarized_segno;
                               7913                 :                : 
                               7914                 :             26 :         XLByteToSeg(keep, unsummarized_segno, wal_segment_size);
                               7915         [ +  + ]:             26 :         if (unsummarized_segno < segno)
                               7916                 :             18 :             segno = unsummarized_segno;
                               7917                 :                :     }
                               7918                 :                : 
                               7919                 :                :     /* but, keep at least wal_keep_size if that's set */
 1364 fujii@postgresql.org     7920         [ +  + ]:CBC        1566 :     if (wal_keep_size_mb > 0)
                               7921                 :                :     {
                               7922                 :                :         uint64      keep_segs;
                               7923                 :                : 
                               7924                 :             85 :         keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
                               7925         [ +  - ]:             85 :         if (currSegNo - segno < keep_segs)
                               7926                 :                :         {
                               7927                 :                :             /* avoid underflow, don't go below 1 */
                               7928         [ +  + ]:             85 :             if (currSegNo <= keep_segs)
                               7929                 :             81 :                 segno = 1;
                               7930                 :                :             else
                               7931                 :              4 :                 segno = currSegNo - keep_segs;
                               7932                 :                :         }
                               7933                 :                :     }
                               7934                 :                : 
                               7935                 :                :     /* don't delete WAL segments newer than the calculated segment */
 1371 alvherre@alvh.no-ip.     7936         [ +  + ]:           1566 :     if (segno < *logSegNo)
 4312 heikki.linnakangas@i     7937                 :            228 :         *logSegNo = segno;
 4653 simon@2ndQuadrant.co     7938                 :           1566 : }
                               7939                 :                : 
                               7940                 :                : /*
                               7941                 :                :  * Write a NEXTOID log record
                               7942                 :                :  */
                               7943                 :                : void
 8563 vadim4o@yahoo.com        7944                 :            505 : XLogPutNextOid(Oid nextOid)
                               7945                 :                : {
 3433 heikki.linnakangas@i     7946                 :            505 :     XLogBeginInsert();
                               7947                 :            505 :     XLogRegisterData((char *) (&nextOid), sizeof(Oid));
                               7948                 :            505 :     (void) XLogInsert(RM_XLOG_ID, XLOG_NEXTOID);
                               7949                 :                : 
                               7950                 :                :     /*
                               7951                 :                :      * We need not flush the NEXTOID record immediately, because any of the
                               7952                 :                :      * just-allocated OIDs could only reach disk as part of a tuple insert or
                               7953                 :                :      * update that would have its own XLOG record that must follow the NEXTOID
                               7954                 :                :      * record.  Therefore, the standard buffer LSN interlock applied to those
                               7955                 :                :      * records will ensure no such OID reaches disk before the NEXTOID record
                               7956                 :                :      * does.
                               7957                 :                :      *
                               7958                 :                :      * Note, however, that the above statement only covers state "within" the
                               7959                 :                :      * database.  When we use a generated OID as a file or directory name, we
                               7960                 :                :      * are in a sense violating the basic WAL rule, because that filesystem
                               7961                 :                :      * change may reach disk before the NEXTOID WAL record does.  The impact
                               7962                 :                :      * of this is that if a database crash occurs immediately afterward, we
                               7963                 :                :      * might after restart re-generate the same OID and find that it conflicts
                               7964                 :                :      * with the leftover file or directory.  But since for safety's sake we
                               7965                 :                :      * always loop until finding a nonconflicting filename, this poses no real
                               7966                 :                :      * problem in practice. See pgsql-hackers discussion 27-Sep-2006.
                               7967                 :                :      */
 6926 tgl@sss.pgh.pa.us        7968                 :            505 : }
                               7969                 :                : 
                               7970                 :                : /*
                               7971                 :                :  * Write an XLOG SWITCH record.
                               7972                 :                :  *
                               7973                 :                :  * Here we just blindly issue an XLogInsert request for the record.
                               7974                 :                :  * All the magic happens inside XLogInsert.
                               7975                 :                :  *
                               7976                 :                :  * The return value is either the end+1 address of the switch record,
                               7977                 :                :  * or the end+1 address of the prior segment if we did not need to
                               7978                 :                :  * write a switch record because we are already at segment start.
                               7979                 :                :  */
                               7980                 :                : XLogRecPtr
 2670 andres@anarazel.de       7981                 :            350 : RequestXLogSwitch(bool mark_unimportant)
                               7982                 :                : {
                               7983                 :                :     XLogRecPtr  RecPtr;
                               7984                 :                : 
                               7985                 :                :     /* XLOG SWITCH has no data */
 3433 heikki.linnakangas@i     7986                 :            350 :     XLogBeginInsert();
                               7987                 :                : 
 2670 andres@anarazel.de       7988         [ -  + ]:            350 :     if (mark_unimportant)
 2670 andres@anarazel.de       7989                 :UBC           0 :         XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
 3433 heikki.linnakangas@i     7990                 :CBC         350 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
                               7991                 :                : 
 6461 tgl@sss.pgh.pa.us        7992                 :            350 :     return RecPtr;
                               7993                 :                : }
                               7994                 :                : 
                               7995                 :                : /*
                               7996                 :                :  * Write a RESTORE POINT record
                               7997                 :                :  */
                               7998                 :                : XLogRecPtr
 4814 simon@2ndQuadrant.co     7999                 :              3 : XLogRestorePoint(const char *rpName)
                               8000                 :                : {
                               8001                 :                :     XLogRecPtr  RecPtr;
                               8002                 :                :     xl_restore_point xlrec;
                               8003                 :                : 
                               8004                 :              3 :     xlrec.rp_time = GetCurrentTimestamp();
 3709 tgl@sss.pgh.pa.us        8005                 :              3 :     strlcpy(xlrec.rp_name, rpName, MAXFNAMELEN);
                               8006                 :                : 
 3433 heikki.linnakangas@i     8007                 :              3 :     XLogBeginInsert();
                               8008                 :              3 :     XLogRegisterData((char *) &xlrec, sizeof(xl_restore_point));
                               8009                 :                : 
                               8010                 :              3 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT);
                               8011                 :                : 
 4798 rhaas@postgresql.org     8012         [ +  - ]:              3 :     ereport(LOG,
                               8013                 :                :             (errmsg("restore point \"%s\" created at %X/%X",
                               8014                 :                :                     rpName, LSN_FORMAT_ARGS(RecPtr))));
                               8015                 :                : 
 4814 simon@2ndQuadrant.co     8016                 :              3 :     return RecPtr;
                               8017                 :                : }
                               8018                 :                : 
                               8019                 :                : /*
                               8020                 :                :  * Check if any of the GUC parameters that are critical for hot standby
                               8021                 :                :  * have changed, and update the value in pg_control file if necessary.
                               8022                 :                :  */
                               8023                 :                : static void
 5100 heikki.linnakangas@i     8024                 :            729 : XLogReportParameters(void)
                               8025                 :                : {
                               8026         [ +  + ]:            729 :     if (wal_level != ControlFile->wal_level ||
 3756 rhaas@postgresql.org     8027         [ +  + ]:            534 :         wal_log_hints != ControlFile->wal_log_hints ||
 5100 heikki.linnakangas@i     8028         [ +  + ]:            462 :         MaxConnections != ControlFile->MaxConnections ||
 3937 rhaas@postgresql.org     8029         [ +  + ]:            461 :         max_worker_processes != ControlFile->max_worker_processes ||
 1888 michael@paquier.xyz      8030         [ +  + ]:            460 :         max_wal_senders != ControlFile->max_wal_senders ||
 5100 heikki.linnakangas@i     8031         [ +  + ]:            449 :         max_prepared_xacts != ControlFile->max_prepared_xacts ||
 3420 alvherre@alvh.no-ip.     8032         [ +  - ]:            366 :         max_locks_per_xact != ControlFile->max_locks_per_xact ||
                               8033         [ +  + ]:            366 :         track_commit_timestamp != ControlFile->track_commit_timestamp)
                               8034                 :                :     {
                               8035                 :                :         /*
                               8036                 :                :          * The change in number of backend slots doesn't need to be WAL-logged
                               8037                 :                :          * if archiving is not enabled, as you can't start archive recovery
                               8038                 :                :          * with wal_level=minimal anyway. We don't really care about the
                               8039                 :                :          * values in pg_control either if wal_level=minimal, but it seems better
                               8040                 :                :          * to keep them up-to-date to avoid confusion.
                               8041                 :                :          */
 5100 heikki.linnakangas@i     8042   [ +  +  +  + ]:            370 :         if (wal_level != ControlFile->wal_level || XLogIsNeeded())
                               8043                 :                :         {
                               8044                 :                :             xl_parameter_change xlrec;
                               8045                 :                :             XLogRecPtr  recptr;
                               8046                 :                : 
                               8047                 :            362 :             xlrec.MaxConnections = MaxConnections;
 3937 rhaas@postgresql.org     8048                 :            362 :             xlrec.max_worker_processes = max_worker_processes;
 1888 michael@paquier.xyz      8049                 :            362 :             xlrec.max_wal_senders = max_wal_senders;
 5100 heikki.linnakangas@i     8050                 :            362 :             xlrec.max_prepared_xacts = max_prepared_xacts;
                               8051                 :            362 :             xlrec.max_locks_per_xact = max_locks_per_xact;
                               8052                 :            362 :             xlrec.wal_level = wal_level;
 3756 rhaas@postgresql.org     8053                 :            362 :             xlrec.wal_log_hints = wal_log_hints;
 3420 alvherre@alvh.no-ip.     8054                 :            362 :             xlrec.track_commit_timestamp = track_commit_timestamp;
                               8055                 :                : 
 3433 heikki.linnakangas@i     8056                 :            362 :             XLogBeginInsert();
                               8057                 :            362 :             XLogRegisterData((char *) &xlrec, sizeof(xlrec));
                               8058                 :                : 
                               8059                 :            362 :             recptr = XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE);
 3672 fujii@postgresql.org     8060                 :            362 :             XLogFlush(recptr);
                               8061                 :                :         }
                               8062                 :                : 
 1406 tmunro@postgresql.or     8063                 :            370 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               8064                 :                : 
 5100 heikki.linnakangas@i     8065                 :            370 :         ControlFile->MaxConnections = MaxConnections;
 3937 rhaas@postgresql.org     8066                 :            370 :         ControlFile->max_worker_processes = max_worker_processes;
 1888 michael@paquier.xyz      8067                 :            370 :         ControlFile->max_wal_senders = max_wal_senders;
 5100 heikki.linnakangas@i     8068                 :            370 :         ControlFile->max_prepared_xacts = max_prepared_xacts;
                               8069                 :            370 :         ControlFile->max_locks_per_xact = max_locks_per_xact;
                               8070                 :            370 :         ControlFile->wal_level = wal_level;
 3756 rhaas@postgresql.org     8071                 :            370 :         ControlFile->wal_log_hints = wal_log_hints;
 3420 alvherre@alvh.no-ip.     8072                 :            370 :         ControlFile->track_commit_timestamp = track_commit_timestamp;
 5100 heikki.linnakangas@i     8073                 :            370 :         UpdateControlFile();
                               8074                 :                : 
 1406 tmunro@postgresql.or     8075                 :            370 :         LWLockRelease(ControlFileLock);
                               8076                 :                :     }
 5198 heikki.linnakangas@i     8077                 :            729 : }
                               8078                 :                : 
                               8079                 :                : /*
                               8080                 :                :  * Update full_page_writes in shared memory, and write an
                               8081                 :                :  * XLOG_FPW_CHANGE record if necessary.
                               8082                 :                :  *
                               8083                 :                :  * Note: this function assumes there is no other process running
                               8084                 :                :  * concurrently that could update it.
                               8085                 :                :  */
                               8086                 :                : void
 4463 simon@2ndQuadrant.co     8087                 :           1592 : UpdateFullPageWrites(void)
                               8088                 :                : {
                               8089                 :           1592 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               8090                 :                :     bool        recoveryInProgress;
                               8091                 :                : 
                               8092                 :                :     /*
                               8093                 :                :      * Do nothing if full_page_writes has not been changed.
                               8094                 :                :      *
                               8095                 :                :      * It's safe to check the shared full_page_writes without the lock,
                               8096                 :                :      * because we assume that there is no concurrently running process which
                               8097                 :                :      * can update it.
                               8098                 :                :      */
                               8099         [ +  + ]:           1592 :     if (fullPageWrites == Insert->fullPageWrites)
                               8100                 :            865 :         return;
                               8101                 :                : 
                               8102                 :                :     /*
                               8103                 :                :      * Perform this outside critical section so that the WAL insert
                               8104                 :                :      * initialization done by RecoveryInProgress() doesn't trigger an
                               8105                 :                :      * assertion failure.
                               8106                 :                :      */
 2025 akapila@postgresql.o     8107                 :            727 :     recoveryInProgress = RecoveryInProgress();
                               8108                 :                : 
 4422 heikki.linnakangas@i     8109                 :            727 :     START_CRIT_SECTION();
                               8110                 :                : 
                               8111                 :                :     /*
                               8112                 :                :      * It's always safe to take full page images, even when not strictly
                               8113                 :                :      * required, but not the other way round. So if we're setting full_page_writes
                               8114                 :                :      * to true, first set it true and then write the WAL record. If we're
                               8115                 :                :      * setting it to false, first write the WAL record and then set the global
                               8116                 :                :      * flag.
                               8117                 :                :      */
                               8118         [ +  + ]:            727 :     if (fullPageWrites)
                               8119                 :                :     {
 3677                          8120                 :            723 :         WALInsertLockAcquireExclusive();
 4422                          8121                 :            723 :         Insert->fullPageWrites = true;
 3677                          8122                 :            723 :         WALInsertLockRelease();
                               8123                 :                :     }
                               8124                 :                : 
                               8125                 :                :     /*
                               8126                 :                :      * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
                               8127                 :                :      * full_page_writes during archive recovery, if required.
                               8128                 :                :      */
 2025 akapila@postgresql.o     8129   [ +  +  -  + ]:            727 :     if (XLogStandbyInfoActive() && !recoveryInProgress)
                               8130                 :                :     {
 3433 heikki.linnakangas@i     8131                 :UBC           0 :         XLogBeginInsert();
                               8132                 :              0 :         XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));
                               8133                 :                : 
                               8134                 :              0 :         XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
                               8135                 :                :     }
                               8136                 :                : 
 4422 heikki.linnakangas@i     8137         [ +  + ]:CBC         727 :     if (!fullPageWrites)
                               8138                 :                :     {
 3677                          8139                 :              4 :         WALInsertLockAcquireExclusive();
 4422                          8140                 :              4 :         Insert->fullPageWrites = false;
 3677                          8141                 :              4 :         WALInsertLockRelease();
                               8142                 :                :     }
 4422                          8143         [ -  + ]:            727 :     END_CRIT_SECTION();
                               8144                 :                : }
                               8145                 :                : 
                               8146                 :                : /*
                               8147                 :                :  * XLOG resource manager's routines
                               8148                 :                :  *
                               8149                 :                :  * Definitions of info values are in include/catalog/pg_control.h, though
                               8150                 :                :  * not all record types are related to control file updates.
                               8151                 :                :  *
                               8152                 :                :  * NOTE: Some XLOG record types that are directly related to WAL recovery
                               8153                 :                :  * are handled in xlogrecovery_redo().
                               8154                 :                :  */
                               8155                 :                : void
 3433                          8156                 :          42783 : xlog_redo(XLogReaderState *record)
                               8157                 :                : {
                               8158                 :          42783 :     uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
                               8159                 :          42783 :     XLogRecPtr  lsn = record->EndRecPtr;
                               8160                 :                : 
                               8161                 :                :     /*
                               8162                 :                :      * In XLOG rmgr, backup blocks are only used by XLOG_FPI and
                               8163                 :                :      * XLOG_FPI_FOR_HINT records.
                               8164                 :                :      */
 3429                          8165   [ +  +  +  +  :          42783 :     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
                                              -  + ]
                               8166                 :                :            !XLogRecHasAnyBlockRefs(record));
                               8167                 :                : 
 8428 tgl@sss.pgh.pa.us        8168         [ +  + ]:          42783 :     if (info == XLOG_NEXTOID)
                               8169                 :                :     {
                               8170                 :                :         Oid         nextOid;
                               8171                 :                : 
                               8172                 :                :         /*
                               8173                 :                :          * We used to try to take the maximum of TransamVariables->nextOid and
                               8174                 :                :          * the recorded nextOid, but that fails if the OID counter wraps
                               8175                 :                :          * around.  Since no OID allocation should be happening during replay
                               8176                 :                :          * anyway, better to just believe the record exactly.  We still take
                               8177                 :                :          * OidGenLock while setting the variable, just in case.
                               8178                 :                :          */
 8563 vadim4o@yahoo.com        8179                 :             91 :         memcpy(&nextOid, XLogRecGetData(record), sizeof(Oid));
 4451 tgl@sss.pgh.pa.us        8180                 :             91 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  128 heikki.linnakangas@i     8181                 :GNC          91 :         TransamVariables->nextOid = nextOid;
                               8182                 :             91 :         TransamVariables->oidCount = 0;
 4451 tgl@sss.pgh.pa.us        8183                 :CBC          91 :         LWLockRelease(OidGenLock);
                               8184                 :                :     }
 8433                          8185         [ +  + ]:          42692 :     else if (info == XLOG_CHECKPOINT_SHUTDOWN)
                               8186                 :                :     {
                               8187                 :                :         CheckPoint  checkPoint;
                               8188                 :                :         TimeLineID  replayTLI;
                               8189                 :                : 
                               8190                 :             46 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8191                 :                :         /* In a SHUTDOWN checkpoint, believe the counters exactly */
 4451                          8192                 :             46 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  128 heikki.linnakangas@i     8193                 :GNC          46 :         TransamVariables->nextXid = checkPoint.nextXid;
 4451 tgl@sss.pgh.pa.us        8194                 :CBC          46 :         LWLockRelease(XidGenLock);
                               8195                 :             46 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  128 heikki.linnakangas@i     8196                 :GNC          46 :         TransamVariables->nextOid = checkPoint.nextOid;
                               8197                 :             46 :         TransamVariables->oidCount = 0;
 4451 tgl@sss.pgh.pa.us        8198                 :CBC          46 :         LWLockRelease(OidGenLock);
 6885                          8199                 :             46 :         MultiXactSetNextMXact(checkPoint.nextMulti,
                               8200                 :                :                               checkPoint.nextMultiOffset);
                               8201                 :                : 
 3123 andres@anarazel.de       8202                 :             46 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8203                 :                :                                checkPoint.oldestMultiDB);
                               8204                 :                : 
                               8205                 :                :         /*
                               8206                 :                :          * No need to set oldestClogXid here as well; it'll be set when we
                               8207                 :                :          * redo an xl_clog_truncate if it changed since initialization.
                               8208                 :                :          */
 5170 tgl@sss.pgh.pa.us        8209                 :             46 :         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
                               8210                 :                : 
                               8211                 :                :         /*
                               8212                 :                :          * If we see a shutdown checkpoint while waiting for an end-of-backup
                               8213                 :                :          * record, the backup was canceled and the end-of-backup record will
                               8214                 :                :          * never arrive.
                               8215                 :                :          */
 4069 heikki.linnakangas@i     8216         [ +  + ]:             46 :         if (ArchiveRecoveryRequested &&
 4463 simon@2ndQuadrant.co     8217         [ +  + ]:             45 :             !XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
                               8218         [ -  + ]:              1 :             XLogRecPtrIsInvalid(ControlFile->backupEndPoint))
 4451 tgl@sss.pgh.pa.us        8219         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8220                 :                :                     (errmsg("online backup was canceled, recovery cannot continue")));
                               8221                 :                : 
                               8222                 :                :         /*
                               8223                 :                :          * If we see a shutdown checkpoint, we know that nothing was running
                               8224                 :                :          * on the primary at this point. So fake-up an empty running-xacts
                               8225                 :                :          * record and use that here and now. Recover additional standby state
                               8226                 :                :          * for prepared transactions.
                               8227                 :                :          */
 5230 simon@2ndQuadrant.co     8228         [ +  + ]:CBC          46 :         if (standbyState >= STANDBY_INITIALIZED)
                               8229                 :                :         {
                               8230                 :                :             TransactionId *xids;
                               8231                 :                :             int         nxids;
                               8232                 :                :             TransactionId oldestActiveXID;
                               8233                 :                :             TransactionId latestCompletedXid;
                               8234                 :                :             RunningTransactionsData running;
                               8235                 :                : 
 5115 heikki.linnakangas@i     8236                 :             43 :             oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               8237                 :                : 
                               8238                 :                :             /*
                               8239                 :                :              * Construct a RunningTransactions snapshot representing a shut
                               8240                 :                :              * down server, with only prepared transactions still alive. We're
                               8241                 :                :              * never overflowed at this point because all subxids are listed
                               8242                 :                :              * with their parent prepared transactions.
                               8243                 :                :              */
                               8244                 :             43 :             running.xcnt = nxids;
 4151 simon@2ndQuadrant.co     8245                 :             43 :             running.subxcnt = 0;
 5115 heikki.linnakangas@i     8246                 :             43 :             running.subxid_overflow = false;
 1342 andres@anarazel.de       8247                 :             43 :             running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5115 heikki.linnakangas@i     8248                 :             43 :             running.oldestRunningXid = oldestActiveXID;
 1342 andres@anarazel.de       8249                 :             43 :             latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5085 simon@2ndQuadrant.co     8250         [ -  + ]:             43 :             TransactionIdRetreat(latestCompletedXid);
 5084                          8251         [ -  + ]:             43 :             Assert(TransactionIdIsNormal(latestCompletedXid));
 5085                          8252                 :             43 :             running.latestCompletedXid = latestCompletedXid;
 5115 heikki.linnakangas@i     8253                 :             43 :             running.xids = xids;
                               8254                 :                : 
                               8255                 :             43 :             ProcArrayApplyRecoveryInfo(&running);
                               8256                 :                : 
 2544 simon@2ndQuadrant.co     8257                 :             43 :             StandbyRecoverPreparedTransactions();
                               8258                 :                :         }
                               8259                 :                : 
                               8260                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 1406 tmunro@postgresql.or     8261                 :             46 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1342 andres@anarazel.de       8262                 :             46 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 1406 tmunro@postgresql.or     8263                 :             46 :         LWLockRelease(ControlFileLock);
                               8264                 :                : 
                               8265                 :                :         /* Update shared-memory copy of checkpoint XID/epoch */
 3492 andres@anarazel.de       8266         [ -  + ]:             46 :         SpinLockAcquire(&XLogCtl->info_lck);
 1342                          8267                 :             46 :         XLogCtl->ckptFullXid = checkPoint.nextXid;
 3492                          8268                 :             46 :         SpinLockRelease(&XLogCtl->info_lck);
                               8269                 :                : 
                               8270                 :                :         /*
                               8271                 :                :          * We should've already switched to the new TLI before replaying this
                               8272                 :                :          * record.
                               8273                 :                :          */
  788 heikki.linnakangas@i     8274                 :             46 :         (void) GetCurrentReplayRecPtr(&replayTLI);
  891 rhaas@postgresql.org     8275         [ -  + ]:             46 :         if (checkPoint.ThisTimeLineID != replayTLI)
 4146 heikki.linnakangas@i     8276         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8277                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
                               8278                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8279                 :                : 
  872 rhaas@postgresql.org     8280                 :CBC          46 :         RecoveryRestartPoint(&checkPoint, record);
                               8281                 :                :     }
 8433 tgl@sss.pgh.pa.us        8282         [ +  + ]:          42646 :     else if (info == XLOG_CHECKPOINT_ONLINE)
                               8283                 :                :     {
                               8284                 :                :         CheckPoint  checkPoint;
                               8285                 :                :         TimeLineID  replayTLI;
                               8286                 :                : 
                               8287                 :            280 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8288                 :                :         /* In an ONLINE checkpoint, treat the XID counter as a minimum */
 4451                          8289                 :            280 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  128 heikki.linnakangas@i     8290         [ -  + ]:GNC         280 :         if (FullTransactionIdPrecedes(TransamVariables->nextXid,
                               8291                 :                :                                       checkPoint.nextXid))
  128 heikki.linnakangas@i     8292                 :UNC           0 :             TransamVariables->nextXid = checkPoint.nextXid;
 4451 tgl@sss.pgh.pa.us        8293                 :CBC         280 :         LWLockRelease(XidGenLock);
                               8294                 :                : 
                               8295                 :                :         /*
                               8296                 :                :          * We ignore the nextOid counter in an ONLINE checkpoint, preferring
                               8297                 :                :          * to track OID assignment through XLOG_NEXTOID records.  The nextOid
                               8298                 :                :          * counter is from the start of the checkpoint and might well be stale
                               8299                 :                :          * compared to later XLOG_NEXTOID records.  We could try to take the
                               8300                 :                :          * maximum of the nextOid counter and our latest value, but since
                               8301                 :                :          * there's no particular guarantee about the speed with which the OID
                               8302                 :                :          * counter wraps around, that's a risky thing to do.  In any case,
                               8303                 :                :          * users of the nextOid counter are required to avoid assignment of
                               8304                 :                :          * duplicates, so that a somewhat out-of-date value should be safe.
                               8305                 :                :          */
                               8306                 :                : 
                               8307                 :                :         /* Handle multixact */
 6885                          8308                 :            280 :         MultiXactAdvanceNextMXact(checkPoint.nextMulti,
                               8309                 :                :                                   checkPoint.nextMultiOffset);
                               8310                 :                : 
                               8311                 :                :         /*
                               8312                 :                :          * NB: This may perform multixact truncation when replaying WAL
                               8313                 :                :          * generated by an older primary.
                               8314                 :                :          */
 3123 andres@anarazel.de       8315                 :            280 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8316                 :                :                                checkPoint.oldestMultiDB);
  128 heikki.linnakangas@i     8317         [ -  + ]:GNC         280 :         if (TransactionIdPrecedes(TransamVariables->oldestXid,
                               8318                 :                :                                   checkPoint.oldestXid))
 5170 tgl@sss.pgh.pa.us        8319                 :UBC           0 :             SetTransactionIdLimit(checkPoint.oldestXid,
                               8320                 :                :                                   checkPoint.oldestXidDB);
                               8321                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 1406 tmunro@postgresql.or     8322                 :CBC         280 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1342 andres@anarazel.de       8323                 :            280 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 1406 tmunro@postgresql.or     8324                 :            280 :         LWLockRelease(ControlFileLock);
                               8325                 :                : 
                               8326                 :                :         /* Update shared-memory copy of checkpoint XID/epoch */
 3492 andres@anarazel.de       8327         [ -  + ]:            280 :         SpinLockAcquire(&XLogCtl->info_lck);
 1342                          8328                 :            280 :         XLogCtl->ckptFullXid = checkPoint.nextXid;
 3492                          8329                 :            280 :         SpinLockRelease(&XLogCtl->info_lck);
                               8330                 :                : 
                               8331                 :                :         /* TLI should not change in an on-line checkpoint */
  788 heikki.linnakangas@i     8332                 :            280 :         (void) GetCurrentReplayRecPtr(&replayTLI);
  891 rhaas@postgresql.org     8333         [ -  + ]:            280 :         if (checkPoint.ThisTimeLineID != replayTLI)
 7368 tgl@sss.pgh.pa.us        8334         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8335                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in online checkpoint record",
                               8336                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8337                 :                : 
  872 rhaas@postgresql.org     8338                 :CBC         280 :         RecoveryRestartPoint(&checkPoint, record);
                               8339                 :                :     }
  928 alvherre@alvh.no-ip.     8340         [ +  + ]:          42366 :     else if (info == XLOG_OVERWRITE_CONTRECORD)
                               8341                 :                :     {
                               8342                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8343                 :                :     }
 4093 simon@2ndQuadrant.co     8344         [ +  + ]:          42365 :     else if (info == XLOG_END_OF_RECOVERY)
                               8345                 :                :     {
                               8346                 :                :         xl_end_of_recovery xlrec;
                               8347                 :                :         TimeLineID  replayTLI;
                               8348                 :                : 
                               8349                 :             25 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_end_of_recovery));
                               8350                 :                : 
                               8351                 :                :         /*
                               8352                 :                :          * For Hot Standby, we could treat this like a Shutdown Checkpoint,
                               8353                 :                :          * but this case is rarer and harder to test, so the benefit doesn't
                               8354                 :                :          * outweigh the potential extra cost of maintenance.
                               8355                 :                :          */
                               8356                 :                : 
                               8357                 :                :         /*
                               8358                 :                :          * We should've already switched to the new TLI before replaying this
                               8359                 :                :          * record.
                               8360                 :                :          */
  788 heikki.linnakangas@i     8361                 :             25 :         (void) GetCurrentReplayRecPtr(&replayTLI);
  891 rhaas@postgresql.org     8362         [ -  + ]:             25 :         if (xlrec.ThisTimeLineID != replayTLI)
 4093 simon@2ndQuadrant.co     8363         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8364                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
                               8365                 :                :                             xlrec.ThisTimeLineID, replayTLI)));
                               8366                 :                :     }
 6174 tgl@sss.pgh.pa.us        8367         [ +  - ]:CBC       42340 :     else if (info == XLOG_NOOP)
                               8368                 :                :     {
                               8369                 :                :         /* nothing to do here */
                               8370                 :                :     }
 6461                          8371         [ +  + ]:          42340 :     else if (info == XLOG_SWITCH)
                               8372                 :                :     {
                               8373                 :                :         /* nothing to do here */
                               8374                 :                :     }
 4814 simon@2ndQuadrant.co     8375         [ +  + ]:          42190 :     else if (info == XLOG_RESTORE_POINT)
                               8376                 :                :     {
                               8377                 :                :         /* nothing to do here, handled in xlogrecovery.c */
                               8378                 :                :     }
 3429 heikki.linnakangas@i     8379   [ +  +  +  + ]:          42183 :     else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
                               8380                 :                :     {
                               8381                 :                :         /*
                               8382                 :                :          * XLOG_FPI records contain nothing else but one or more block
                               8383                 :                :          * references. Every block reference must include a full-page image
                               8384                 :                :          * even if full_page_writes was disabled when the record was generated
                               8385                 :                :          * - otherwise there would be no point in this record.
                               8386                 :                :          *
                               8387                 :                :          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
                               8388                 :                :          * WAL-logged because of a hint bit update. They are only generated
                               8389                 :                :          * when checksums and/or wal_log_hints are enabled. They may include
                               8390                 :                :          * no full-page images if full_page_writes was disabled when they were
                               8391                 :                :          * generated. In this case there is nothing to do here.
                               8392                 :                :          *
                               8393                 :                :          * No recovery conflicts are generated by these generic records - if a
                               8394                 :                :          * resource manager needs to generate conflicts, it has to define a
                               8395                 :                :          * separate WAL record type and redo routine.
                               8396                 :                :          */
  758 tmunro@postgresql.or     8397         [ +  + ]:          88313 :         for (uint8 block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
                               8398                 :                :         {
                               8399                 :                :             Buffer      buffer;
                               8400                 :                : 
  998 fujii@postgresql.org     8401         [ +  + ]:          46556 :             if (!XLogRecHasBlockImage(record, block_id))
                               8402                 :                :             {
                               8403         [ -  + ]:             68 :                 if (info == XLOG_FPI)
  998 fujii@postgresql.org     8404         [ #  # ]:UBC           0 :                     elog(ERROR, "XLOG_FPI record did not contain a full-page image");
  998 fujii@postgresql.org     8405                 :CBC          68 :                 continue;
                               8406                 :                :             }
                               8407                 :                : 
 1838 heikki.linnakangas@i     8408         [ -  + ]:          46488 :             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
 1838 heikki.linnakangas@i     8409         [ #  # ]:UBC           0 :                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
 1838 heikki.linnakangas@i     8410                 :CBC       46488 :             UnlockReleaseBuffer(buffer);
                               8411                 :                :         }
                               8412                 :                :     }
 5214                          8413         [ +  + ]:            426 :     else if (info == XLOG_BACKUP_END)
                               8414                 :                :     {
                               8415                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8416                 :                :     }
 5100                          8417         [ +  + ]:            315 :     else if (info == XLOG_PARAMETER_CHANGE)
                               8418                 :                :     {
                               8419                 :                :         xl_parameter_change xlrec;
                               8420                 :                : 
                               8421                 :                :         /* Update our copy of the parameters in pg_control */
                               8422                 :             31 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
                               8423                 :                : 
                               8424                 :                :         /*
                               8425                 :                :          * Invalidate logical slots if we are in hot standby and the primary
                               8426                 :                :          * does not have a WAL level sufficient for logical decoding. No need
                               8427                 :                :          * to search for potentially conflicting logical slots if standby is
                               8428                 :                :          * running with wal_level lower than logical, because in that case, we
                               8429                 :                :          * would have either disallowed creation of logical slots or
                               8430                 :                :          * invalidated existing ones.
                               8431                 :                :          */
  373 andres@anarazel.de       8432   [ +  -  +  + ]:             31 :         if (InRecovery && InHotStandby &&
                               8433         [ +  + ]:             15 :             xlrec.wal_level < WAL_LEVEL_LOGICAL &&
                               8434         [ +  + ]:              7 :             wal_level >= WAL_LEVEL_LOGICAL)
                               8435                 :              4 :             InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                               8436                 :                :                                                0, InvalidOid,
                               8437                 :                :                                                InvalidTransactionId);
                               8438                 :                : 
 5095 heikki.linnakangas@i     8439                 :             31 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 5100                          8440                 :             31 :         ControlFile->MaxConnections = xlrec.MaxConnections;
 3937 rhaas@postgresql.org     8441                 :             31 :         ControlFile->max_worker_processes = xlrec.max_worker_processes;
 1888 michael@paquier.xyz      8442                 :             31 :         ControlFile->max_wal_senders = xlrec.max_wal_senders;
 5100 heikki.linnakangas@i     8443                 :             31 :         ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
                               8444                 :             31 :         ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
                               8445                 :             31 :         ControlFile->wal_level = xlrec.wal_level;
 3377                          8446                 :             31 :         ControlFile->wal_log_hints = xlrec.wal_log_hints;
                               8447                 :                : 
                               8448                 :                :         /*
                               8449                 :                :          * Update minRecoveryPoint to ensure that if recovery is aborted, we
                               8450                 :                :          * recover back up to this point before allowing hot standby again.
                               8451                 :                :          * This is important if the max_* settings are decreased, to ensure
                               8452                 :                :          * you don't run queries against the WAL preceding the change. The
                               8453                 :                :          * local copies cannot be updated as long as crash recovery is
                               8454                 :                :          * happening and we expect all the WAL to be replayed.
                               8455                 :                :          */
 2110 michael@paquier.xyz      8456         [ +  + ]:             31 :         if (InArchiveRecovery)
                               8457                 :                :         {
  788 heikki.linnakangas@i     8458                 :             16 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               8459                 :             16 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               8460                 :                :         }
                               8461   [ +  +  +  + ]:             31 :         if (LocalMinRecoveryPoint != InvalidXLogRecPtr && LocalMinRecoveryPoint < lsn)
                               8462                 :                :         {
                               8463                 :                :             TimeLineID  replayTLI;
                               8464                 :                : 
                               8465                 :              8 :             (void) GetCurrentReplayRecPtr(&replayTLI);
 5095                          8466                 :              8 :             ControlFile->minRecoveryPoint = lsn;
  891 rhaas@postgresql.org     8467                 :              8 :             ControlFile->minRecoveryPointTLI = replayTLI;
                               8468                 :                :         }
                               8469                 :                : 
 3118 alvherre@alvh.no-ip.     8470                 :             31 :         CommitTsParameterChange(xlrec.track_commit_timestamp,
                               8471                 :             31 :                                 ControlFile->track_commit_timestamp);
                               8472                 :             31 :         ControlFile->track_commit_timestamp = xlrec.track_commit_timestamp;
                               8473                 :                : 
 5100 heikki.linnakangas@i     8474                 :             31 :         UpdateControlFile();
 5095                          8475                 :             31 :         LWLockRelease(ControlFileLock);
                               8476                 :                : 
                               8477                 :                :         /* Check whether any parameter change causes a problem for recovery */
 5100                          8478                 :             31 :         CheckRequiredParameterValues();
                               8479                 :                :     }
 4463 simon@2ndQuadrant.co     8480         [ -  + ]:GBC         284 :     else if (info == XLOG_FPW_CHANGE)
                               8481                 :                :     {
                               8482                 :                :         bool        fpw;
                               8483                 :                : 
 4463 simon@2ndQuadrant.co     8484                 :UBC           0 :         memcpy(&fpw, XLogRecGetData(record), sizeof(bool));
                               8485                 :                : 
                               8486                 :                :         /*
                               8487                 :                :          * Update the LSN of the last replayed XLOG_FPW_CHANGE record so that
                               8488                 :                :          * do_pg_backup_start() and do_pg_backup_stop() can check whether
                               8489                 :                :          * full_page_writes has been disabled during online backup.
                               8490                 :                :          */
                               8491         [ #  # ]:              0 :         if (!fpw)
                               8492                 :                :         {
 3492 andres@anarazel.de       8493         [ #  # ]:              0 :             SpinLockAcquire(&XLogCtl->info_lck);
  872 rhaas@postgresql.org     8494         [ #  # ]:              0 :             if (XLogCtl->lastFpwDisableRecPtr < record->ReadRecPtr)
                               8495                 :              0 :                 XLogCtl->lastFpwDisableRecPtr = record->ReadRecPtr;
 3492 andres@anarazel.de       8496                 :              0 :             SpinLockRelease(&XLogCtl->info_lck);
                               8497                 :                :         }
                               8498                 :                : 
                               8499                 :                :         /* Keep track of full_page_writes */
 4463 simon@2ndQuadrant.co     8500                 :              0 :         lastFullPageWrites = fpw;
                               8501                 :                :     }
                               8502                 :                :     else if (info == XLOG_CHECKPOINT_REDO)
                               8503                 :                :     {
                               8504                 :                :         /* nothing to do here, just for informational purposes */
                               8505                 :                :     }
 8576 vadim4o@yahoo.com        8506                 :CBC       42781 : }
                               8507                 :                : 
                               8508                 :                : /*
                               8509                 :                :  * Return the extra open flags used for opening a file, depending on the
                               8510                 :                :  * value of the GUCs wal_sync_method, fsync and io_direct.
                               8511                 :                :  */
                               8512                 :                : static int
 5814 magnus@hagander.net      8513                 :           8982 : get_sync_bit(int method)
                               8514                 :                : {
 5161 bruce@momjian.us         8515                 :           8982 :     int         o_direct_flag = 0;
                               8516                 :                : 
                               8517                 :                :     /*
                               8518                 :                :      * Use O_DIRECT if requested, except in walreceiver process.  The WAL
                               8519                 :                :      * written by walreceiver is normally read by the startup process soon
                               8520                 :                :      * after it's written.  Also, walreceiver performs unaligned writes, which
                               8521                 :                :      * don't work with O_DIRECT, so it is required for correctness too.
                               8522                 :                :      */
  372 tmunro@postgresql.or     8523   [ +  +  +  - ]:           8982 :     if ((io_direct_flags & IO_DIRECT_WAL) && !AmWalReceiverProcess())
 5168 heikki.linnakangas@i     8524                 :              8 :         o_direct_flag = PG_O_DIRECT;
                               8525                 :                : 
                               8526                 :                :     /* If fsync is disabled, never open in sync mode */
  372 tmunro@postgresql.or     8527         [ +  - ]:           8982 :     if (!enableFsync)
                               8528                 :           8982 :         return o_direct_flag;
                               8529                 :                : 
 5814 magnus@hagander.net      8530   [ #  #  #  # ]:UBC           0 :     switch (method)
                               8531                 :                :     {
                               8532                 :                :             /*
                               8533                 :                :              * enum values for all sync options are defined even if they are
                               8534                 :                :              * not supported on the current platform.  But if not, they are
                               8535                 :                :              * not included in the enum option array, and therefore will never
                               8536                 :                :              * be seen here.
                               8537                 :                :              */
  184 nathan@postgresql.or     8538                 :UNC           0 :         case WAL_SYNC_METHOD_FSYNC:
                               8539                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8540                 :                :         case WAL_SYNC_METHOD_FDATASYNC:
  372 tmunro@postgresql.or     8541                 :UBC           0 :             return o_direct_flag;
                               8542                 :                : #ifdef O_SYNC
  184 nathan@postgresql.or     8543                 :UNC           0 :         case WAL_SYNC_METHOD_OPEN:
  632 tmunro@postgresql.or     8544                 :UBC           0 :             return O_SYNC | o_direct_flag;
                               8545                 :                : #endif
                               8546                 :                : #ifdef O_DSYNC
  184 nathan@postgresql.or     8547                 :UNC           0 :         case WAL_SYNC_METHOD_OPEN_DSYNC:
  632 tmunro@postgresql.or     8548                 :UBC           0 :             return O_DSYNC | o_direct_flag;
                               8549                 :                : #endif
 5816 magnus@hagander.net      8550                 :              0 :         default:
                               8551                 :                :             /* can't happen (unless we are out of sync with option array) */
 5812 tgl@sss.pgh.pa.us        8552         [ #  # ]:              0 :             elog(ERROR, "unrecognized wal_sync_method: %d", method);
                               8553                 :                :             return 0;           /* silence warning */
                               8554                 :                :     }
                               8555                 :                : }
                               8556                 :                : 
                               8557                 :                : /*
                               8558                 :                :  * GUC support
                               8559                 :                :  */
                               8560                 :                : void
  184 nathan@postgresql.or     8561                 :GNC         928 : assign_wal_sync_method(int new_wal_sync_method, void *extra)
                               8562                 :                : {
                               8563         [ -  + ]:            928 :     if (wal_sync_method != new_wal_sync_method)
                               8564                 :                :     {
                               8565                 :                :         /*
                               8566                 :                :          * To ensure that no blocks escape unsynced, force an fsync on the
                               8567                 :                :          * currently open log segment (if any).  Also, if the open flag is
                               8568                 :                :          * changing, close the log file so it will be reopened (with new flag
                               8569                 :                :          * bit) at next use.
                               8570                 :                :          */
 8430 tgl@sss.pgh.pa.us        8571         [ #  # ]:UBC           0 :         if (openLogFile >= 0)
                               8572                 :                :         {
 2584 rhaas@postgresql.org     8573                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 8430 tgl@sss.pgh.pa.us        8574         [ #  # ]:              0 :             if (pg_fsync(openLogFile) != 0)
                               8575                 :                :             {
                               8576                 :                :                 char        xlogfname[MAXFNAMELEN];
                               8577                 :                :                 int         save_errno;
                               8578                 :                : 
 1594 michael@paquier.xyz      8579                 :              0 :                 save_errno = errno;
  891 rhaas@postgresql.org     8580                 :              0 :                 XLogFileName(xlogfname, openLogTLI, openLogSegNo,
                               8581                 :                :                              wal_segment_size);
 1594 michael@paquier.xyz      8582                 :              0 :                 errno = save_errno;
 7573 tgl@sss.pgh.pa.us        8583         [ #  # ]:              0 :                 ereport(PANIC,
                               8584                 :                :                         (errcode_for_file_access(),
                               8585                 :                :                          errmsg("could not fsync file \"%s\": %m", xlogfname)));
                               8586                 :                :             }
                               8587                 :                : 
 2584 rhaas@postgresql.org     8588                 :              0 :             pgstat_report_wait_end();
  184 nathan@postgresql.or     8589         [ #  # ]:UNC           0 :             if (get_sync_bit(wal_sync_method) != get_sync_bit(new_wal_sync_method))
 6513 bruce@momjian.us         8590                 :UBC           0 :                 XLogFileClose();
                               8591                 :                :         }
                               8592                 :                :     }
 8430 tgl@sss.pgh.pa.us        8593                 :CBC         928 : }
                               8594                 :                : 
                               8595                 :                : 
                               8596                 :                : /*
                               8597                 :                :  * Issue appropriate kind of fsync (if any) for an XLOG output file.
                               8598                 :                :  *
                               8599                 :                :  * 'fd' is a file descriptor for the XLOG file to be fsync'd.
                               8600                 :                :  * 'segno' is for error reporting purposes.
                               8601                 :                :  */
                               8602                 :                : void
  891 rhaas@postgresql.org     8603                 :         136682 : issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
                               8604                 :                : {
 1594 michael@paquier.xyz      8605                 :         136682 :     char       *msg = NULL;
                               8606                 :                :     instr_time  start;
                               8607                 :                : 
  891 rhaas@postgresql.org     8608         [ -  + ]:         136682 :     Assert(tli != 0);
                               8609                 :                : 
                               8610                 :                :     /*
                               8611                 :                :      * Quick exit if fsync is disabled or write() has already synced the WAL
                               8612                 :                :      * file.
                               8613                 :                :      */
 1132 fujii@postgresql.org     8614         [ -  + ]:         136682 :     if (!enableFsync ||
  184 nathan@postgresql.or     8615         [ #  # ]:UNC           0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN ||
                               8616         [ #  # ]:              0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
 1132 fujii@postgresql.org     8617                 :CBC      136682 :         return;
                               8618                 :                : 
                               8619                 :                :     /* Measure I/O timing to sync the WAL file */
 1132 fujii@postgresql.org     8620         [ #  # ]:UBC           0 :     if (track_wal_io_timing)
                               8621                 :              0 :         INSTR_TIME_SET_CURRENT(start);
                               8622                 :                :     else
  450 andres@anarazel.de       8623                 :              0 :         INSTR_TIME_SET_ZERO(start);
                               8624                 :                : 
 2113 michael@paquier.xyz      8625                 :              0 :     pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
  184 nathan@postgresql.or     8626   [ #  #  #  # ]:UNC           0 :     switch (wal_sync_method)
                               8627                 :                :     {
                               8628                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
 5203 heikki.linnakangas@i     8629         [ #  # ]:UBC           0 :             if (pg_fsync_no_writethrough(fd) != 0)
 1594 michael@paquier.xyz      8630                 :              0 :                 msg = _("could not fsync file \"%s\": %m");
 8430 tgl@sss.pgh.pa.us        8631                 :              0 :             break;
                               8632                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                               8633                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8634                 :                :             if (pg_fsync_writethrough(fd) != 0)
                               8635                 :                :                 msg = _("could not fsync write-through file \"%s\": %m");
                               8636                 :                :             break;
                               8637                 :                : #endif
  184 nathan@postgresql.or     8638                 :UNC           0 :         case WAL_SYNC_METHOD_FDATASYNC:
 5203 heikki.linnakangas@i     8639         [ #  # ]:UBC           0 :             if (pg_fdatasync(fd) != 0)
 1594 michael@paquier.xyz      8640                 :              0 :                 msg = _("could not fdatasync file \"%s\": %m");
 8430 tgl@sss.pgh.pa.us        8641                 :              0 :             break;
  184 nathan@postgresql.or     8642                 :UNC           0 :         case WAL_SYNC_METHOD_OPEN:
                               8643                 :                :         case WAL_SYNC_METHOD_OPEN_DSYNC:
                               8644                 :                :             /* not reachable */
 1132 fujii@postgresql.org     8645                 :UBC           0 :             Assert(false);
                               8646                 :                :             break;
 8430 tgl@sss.pgh.pa.us        8647                 :              0 :         default:
   11 dgustafsson@postgres     8648         [ #  # ]:UNC           0 :             ereport(PANIC,
                               8649                 :                :                     errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8650                 :                :                     errmsg_internal("unrecognized wal_sync_method: %d", wal_sync_method));
                               8651                 :                :             break;
                               8652                 :                :     }
                               8653                 :                : 
                               8654                 :                :     /* PANIC if failed to fsync */
 1594 michael@paquier.xyz      8655         [ #  # ]:UBC           0 :     if (msg)
                               8656                 :                :     {
                               8657                 :                :         char        xlogfname[MAXFNAMELEN];
                               8658                 :              0 :         int         save_errno = errno;
                               8659                 :                : 
  891 rhaas@postgresql.org     8660                 :              0 :         XLogFileName(xlogfname, tli, segno, wal_segment_size);
 1594 michael@paquier.xyz      8661                 :              0 :         errno = save_errno;
                               8662         [ #  # ]:              0 :         ereport(PANIC,
                               8663                 :                :                 (errcode_for_file_access(),
                               8664                 :                :                  errmsg(msg, xlogfname)));
                               8665                 :                :     }
                               8666                 :                : 
                               8667                 :              0 :     pgstat_report_wait_end();
                               8668                 :                : 
                               8669                 :                :     /*
                               8670                 :                :      * Increment the I/O timing and the number of times WAL files were synced.
                               8671                 :                :      */
 1132 fujii@postgresql.org     8672         [ #  # ]:              0 :     if (track_wal_io_timing)
                               8673                 :                :     {
                               8674                 :                :         instr_time  end;
                               8675                 :                : 
  212 dgustafsson@postgres     8676                 :UNC           0 :         INSTR_TIME_SET_CURRENT(end);
                               8677                 :              0 :         INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
                               8678                 :                :     }
                               8679                 :                : 
  739 andres@anarazel.de       8680                 :UBC           0 :     PendingWalStats.wal_sync++;
                               8681                 :                : }
                               8682                 :                : 
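The dispatch in issue_xlog_fsync() above can be illustrated outside the server with a minimal, standalone sketch. Everything named demo_* or DEMO_* below is hypothetical and is not part of PostgreSQL; plain POSIX fsync() and fdatasync() stand in for pg_fsync_no_writethrough() and pg_fdatasync(), and the function returns a message template instead of raising PANIC. A Linux/POSIX environment is assumed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for the wal_sync_method values (not PostgreSQL's). */
typedef enum DemoSyncMethod
{
    DEMO_SYNC_FSYNC,
    DEMO_SYNC_FDATASYNC,
    DEMO_SYNC_OPEN,
    DEMO_SYNC_OPEN_DSYNC
} DemoSyncMethod;

/*
 * Issue the sync call selected by 'method' on 'fd'.
 *
 * Returns NULL when nothing had to be done or the sync succeeded.  On
 * failure, returns a message template and leaves errno as set by the
 * failed call, so the caller can decide how to report it.
 */
const char *
demo_issue_sync(int fd, DemoSyncMethod method, bool enable_fsync)
{
    /*
     * Quick exit: syncing is disabled, or the file was opened with a flag
     * that makes every write() synchronous already.
     */
    if (!enable_fsync ||
        method == DEMO_SYNC_OPEN ||
        method == DEMO_SYNC_OPEN_DSYNC)
        return NULL;

    switch (method)
    {
        case DEMO_SYNC_FSYNC:
            if (fsync(fd) != 0)
                return "could not fsync file \"%s\": %m";
            break;
        case DEMO_SYNC_FDATASYNC:
            if (fdatasync(fd) != 0)
                return "could not fdatasync file \"%s\": %m";
            break;
        default:
            /* The open-style methods were filtered out by the quick exit. */
            break;
    }

    return NULL;
}
```

A caller would pass an open descriptor and treat a non-NULL result as fatal, for example by formatting the template with the file name. Returning the template rather than reporting inside each case mirrors how the function above defers its ereport() until the file name has been built.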
                               8683                 :                : /*
                               8684                 :                :  * do_pg_backup_start is the workhorse of the user-visible pg_backup_start()
                               8685                 :                :  * function. It creates the necessary starting checkpoint and constructs the
                               8686                 :                :  * backup state and tablespace map.
                               8687                 :                :  *
                               8688                 :                :  * Input parameters are "state" (the backup state), "fast" (if true, we do
                               8689                 :                :  * the checkpoint in immediate mode to make it faster), and "tablespaces"
                               8690                 :                :  * (if non-NULL, indicates a list of tablespaceinfo structs describing the
                               8691                 :                :  * cluster's tablespaces).
                               8692                 :                :  *
                               8693                 :                :  * The tablespace map contents are appended to passed-in parameter
                               8694                 :                :  * tablespace_map and the caller is responsible for including it in the backup
                               8695                 :                :  * archive as 'tablespace_map'. The tablespace_map file is required mainly for
                               8696                 :                :  * tar format on Windows, as native Windows utilities are not able to create
                               8697                 :                :  * symlinks while extracting files from tar. However, for consistency and
                               8698                 :                :  * platform-independence, we do it the same way everywhere.
                               8699                 :                :  *
                               8700                 :                :  * It fills in "state" with the information required for the backup, such
                               8701                 :                :  * as the minimum WAL location that must be present to restore from this
                               8702                 :                :  * backup (startpoint) and the corresponding timeline ID (starttli).
                               8703                 :                :  *
                               8704                 :                :  * Every successfully started backup must be stopped by calling
                               8705                 :                :  * do_pg_backup_stop() or do_pg_abort_backup(). There can be many
                               8706                 :                :  * backups active at the same time.
                               8707                 :                :  *
                               8708                 :                :  * It is the responsibility of the caller of this function to verify the
                               8709                 :                :  * permissions of the calling user!
                               8710                 :                :  */
                               8711                 :                : void
  566 michael@paquier.xyz      8712                 :CBC         153 : do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
                               8713                 :                :                    BackupState *state, StringInfo tblspcmapfile)
                               8714                 :                : {
                               8715                 :                :     bool        backup_started_in_recovery;
                               8716                 :                : 
                               8717         [ -  + ]:            153 :     Assert(state != NULL);
 4463 simon@2ndQuadrant.co     8718                 :            153 :     backup_started_in_recovery = RecoveryInProgress();
                               8719                 :                : 
                               8720                 :                :     /*
                               8721                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               8722                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               8723                 :                :      */
                               8724   [ +  +  -  + ]:            153 :     if (!backup_started_in_recovery && !XLogIsNeeded())
 6045 tgl@sss.pgh.pa.us        8725         [ #  # ]:UBC           0 :         ereport(ERROR,
                               8726                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               8727                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               8728                 :                :                  errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
                               8729                 :                : 
 4822 heikki.linnakangas@i     8730         [ +  + ]:CBC         153 :     if (strlen(backupidstr) > MAXPGPATH)
                               8731         [ +  - ]:              1 :         ereport(ERROR,
                               8732                 :                :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8733                 :                :                  errmsg("backup label too long (max %d bytes)",
                               8734                 :                :                         MAXPGPATH)));
                               8735                 :                : 
  566 michael@paquier.xyz      8736                 :            152 :     memcpy(state->name, backupidstr, strlen(backupidstr));
                               8737                 :                : 
                               8738                 :                :     /*
                               8739                 :                :      * Mark backup active in shared memory.  We must do full-page WAL writes
                               8740                 :                :      * during an on-line backup even if not doing so at other times, because
                               8741                 :                :      * it's quite possible for the backup dump to obtain a "torn" (partially
                               8742                 :                :      * written) copy of a database page if it reads the page concurrently with
                               8743                 :                :      * our write to the same page.  This can be fixed as long as the first
                               8744                 :                :      * write to the page in the WAL sequence is a full-page write. Hence, we
                               8745                 :                :      * increment runningBackups then force a CHECKPOINT, to ensure there are
                               8746                 :                :      * no dirty pages in shared memory that might get dumped while the backup
                               8747                 :                :      * is in progress without having a corresponding WAL record.  (Once the
                               8748                 :                :      * backup is complete, we need not force full-page writes anymore, since
                               8749                 :                :      * we expect that any pages not modified during the backup interval must
                               8750                 :                :      * have been correctly captured by the backup.)
                               8751                 :                :      *
                               8752                 :                :      * Note that forcing full-page writes has no effect during an online
                               8753                 :                :      * backup from the standby.
                               8754                 :                :      *
                               8755                 :                :      * We must hold all the insertion locks to change the value of
                               8756                 :                :      * runningBackups, to ensure adequate interlocking against
                               8757                 :                :      * XLogInsertRecord().
                               8758                 :                :      */
 3677 heikki.linnakangas@i     8759                 :            152 :     WALInsertLockAcquireExclusive();
  739 sfrost@snowman.net       8760                 :            152 :     XLogCtl->Insert.runningBackups++;
 3677 heikki.linnakangas@i     8761                 :            152 :     WALInsertLockRelease();
                               8762                 :                : 
                               8763                 :                :     /*
                               8764                 :                :      * Ensure we decrement runningBackups if we fail below. NB -- for this to
                               8765                 :                :      * work correctly, it is critical that sessionBackupState is only updated
                               8766                 :                :      * after this block is over.
                               8767                 :                :      */
  543 alvherre@alvh.no-ip.     8768         [ +  - ]:            152 :     PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               8769                 :                :     {
 4753 bruce@momjian.us         8770                 :            152 :         bool        gotUniqueStartpoint = false;
                               8771                 :                :         DIR        *tblspcdir;
                               8772                 :                :         struct dirent *de;
                               8773                 :                :         tablespaceinfo *ti;
                               8774                 :                :         int         datadirpathlen;
                               8775                 :                : 
                               8776                 :                :         /*
                               8777                 :                :          * Force an XLOG file switch before the checkpoint, to ensure that the
                               8778                 :                :          * WAL segment the checkpoint is written to doesn't contain pages with
                               8779                 :                :          * old timeline IDs.  That would otherwise happen if you called
                               8780                 :                :          * pg_backup_start() right after restoring from a PITR archive: the
                               8781                 :                :          * first WAL segment containing the startup checkpoint has pages in
                               8782                 :                :          * the beginning with the old timeline ID.  That can cause trouble at
                               8783                 :                :          * recovery: we won't have a history file covering the old timeline if
                               8784                 :                :          * pg_wal directory was not included in the base backup and the WAL
                               8785                 :                :          * archive was cleared too before starting the backup.
                               8786                 :                :          *
                               8787                 :                :          * This also ensures that we have emitted a WAL page header that has
                               8788                 :                :          * XLP_BKP_REMOVABLE off before we emit the checkpoint record.
                               8789                 :                :          * Therefore, if a WAL archiver (such as pglesslog) is trying to
                               8790                 :                :          * compress out removable backup blocks, it won't remove any that
                               8791                 :                :          * occur after this point.
                               8792                 :                :          *
                               8793                 :                :          * During recovery, we skip forcing XLOG file switch, which means that
                               8794                 :                :          * the backup taken during recovery is not available for the special
                               8795                 :                :          * recovery case described above.
                               8796                 :                :          */
 4463 simon@2ndQuadrant.co     8797         [ +  + ]:            152 :         if (!backup_started_in_recovery)
 2670 andres@anarazel.de       8798                 :            145 :             RequestXLogSwitch(false);
                               8799                 :                : 
                               8800                 :                :         do
                               8801                 :                :         {
                               8802                 :                :             bool        checkpointfpw;
                               8803                 :                : 
                               8804                 :                :             /*
                               8805                 :                :              * Force a CHECKPOINT.  Aside from being necessary to prevent torn
                               8806                 :                :              * page problems, this guarantees that two successive backup runs
                               8807                 :                :              * will have different checkpoint positions and hence different
                               8808                 :                :              * history file names, even if nothing happened in between.
                               8809                 :                :              *
                               8810                 :                :              * During recovery, establish a restartpoint if possible. We use
                               8811                 :                :              * the last restartpoint as the backup starting checkpoint. This
                               8812                 :                :              * means that two successive backup runs can have the same
                               8813                 :                :              * checkpoint positions.
                               8814                 :                :              *
                               8815                 :                :              * Since the fact that we are executing do_pg_backup_start()
                               8816                 :                :              * during recovery means that checkpointer is running, we can use
                               8817                 :                :              * RequestCheckpoint() to establish a restartpoint.
                               8818                 :                :              *
                               8819                 :                :              * We use CHECKPOINT_IMMEDIATE only if requested by user (via
                               8820                 :                :              * passing fast = true).  Otherwise this can take a while.
                               8821                 :                :              */
 4773 heikki.linnakangas@i     8822         [ +  + ]:            152 :             RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                               8823                 :                :                               (fast ? CHECKPOINT_IMMEDIATE : 0));
                               8824                 :                : 
                               8825                 :                :             /*
                               8826                 :                :              * Now we need to fetch the checkpoint record location, and also
                               8827                 :                :              * its REDO pointer.  The oldest point in WAL that would be needed
                               8828                 :                :              * to restore starting from the checkpoint is precisely the REDO
                               8829                 :                :              * pointer.
                               8830                 :                :              */
                               8831                 :            152 :             LWLockAcquire(ControlFileLock, LW_SHARED);
  566 michael@paquier.xyz      8832                 :            152 :             state->checkpointloc = ControlFile->checkPoint;
                               8833                 :            152 :             state->startpoint = ControlFile->checkPointCopy.redo;
                               8834                 :            152 :             state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
 4463 simon@2ndQuadrant.co     8835                 :            152 :             checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;
 4773 heikki.linnakangas@i     8836                 :            152 :             LWLockRelease(ControlFileLock);
                               8837                 :                : 
 4463 simon@2ndQuadrant.co     8838         [ +  + ]:            152 :             if (backup_started_in_recovery)
                               8839                 :                :             {
                               8840                 :                :                 XLogRecPtr  recptr;
                               8841                 :                : 
                               8842                 :                :                 /*
                               8843                 :                :                  * Check to see if all WAL replayed during online backup
                               8844                 :                :                  * (i.e., since last restartpoint used as backup starting
                               8845                 :                :                  * checkpoint) contain full-page writes.
                               8846                 :                :                  */
 3492 andres@anarazel.de       8847         [ -  + ]:              7 :                 SpinLockAcquire(&XLogCtl->info_lck);
                               8848                 :              7 :                 recptr = XLogCtl->lastFpwDisableRecPtr;
                               8849                 :              7 :                 SpinLockRelease(&XLogCtl->info_lck);
                               8850                 :                : 
  566 michael@paquier.xyz      8851   [ +  -  -  + ]:              7 :                 if (!checkpointfpw || state->startpoint <= recptr)
 4463 simon@2ndQuadrant.co     8852         [ #  # ]:UBC           0 :                     ereport(ERROR,
                               8853                 :                :                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               8854                 :                :                              errmsg("WAL generated with full_page_writes=off was replayed "
                               8855                 :                :                                     "since last restartpoint"),
                               8856                 :                :                              errhint("This means that the backup being taken on the standby "
                               8857                 :                :                                      "is corrupt and should not be used. "
                               8858                 :                :                                      "Enable full_page_writes and run CHECKPOINT on the primary, "
                               8859                 :                :                                      "and then try an online backup again.")));
                               8860                 :                : 
                               8861                 :                :                 /*
                               8862                 :                :                  * During recovery, since we don't use the end-of-backup WAL
                               8863                 :                :                  * record and don't write the backup history file, the
                               8864                 :                :                  * starting WAL location doesn't need to be unique. This means
                               8865                 :                :                  * that two base backups started at the same time might use
                               8866                 :                :                  * the same checkpoint as starting locations.
                               8867                 :                :                  */
 4463 simon@2ndQuadrant.co     8868                 :CBC           7 :                 gotUniqueStartpoint = true;
                               8869                 :                :             }
                               8870                 :                : 
                               8871                 :                :             /*
                               8872                 :                :              * If two base backups are started at the same time (in WAL sender
                               8873                 :                :              * processes), we need to make sure that they use different
                               8874                 :                :              * checkpoints as starting locations, because we use the starting
                               8875                 :                :              * WAL location as a unique identifier for the base backup in the
                               8876                 :                :              * end-of-backup WAL record and when we write the backup history
                               8877                 :                :              * file. Perhaps it would be better to generate a separate unique ID
                               8878                 :                :              * for each backup instead of forcing another checkpoint, but
                               8879                 :                :              * taking a checkpoint right after another is not that expensive
                               8880                 :                :              * either because only a few buffers have been dirtied yet.
                               8881                 :                :              */
 3677 heikki.linnakangas@i     8882                 :            152 :             WALInsertLockAcquireExclusive();
  566 michael@paquier.xyz      8883         [ +  - ]:            152 :             if (XLogCtl->Insert.lastBackupStart < state->startpoint)
                               8884                 :                :             {
                               8885                 :            152 :                 XLogCtl->Insert.lastBackupStart = state->startpoint;
 4773 heikki.linnakangas@i     8886                 :            152 :                 gotUniqueStartpoint = true;
                               8887                 :                :             }
 3677                          8888                 :            152 :             WALInsertLockRelease();
 4753 bruce@momjian.us         8889         [ -  + ]:            152 :         } while (!gotUniqueStartpoint);
                               8890                 :                : 
                               8891                 :                :         /*
                               8892                 :                :          * Construct tablespace_map file.
                               8893                 :                :          */
 3260 andrew@dunslane.net      8894                 :            152 :         datadirpathlen = strlen(DataDir);
                               8895                 :                : 
                               8896                 :                :         /* Collect information about all tablespaces */
 2323 tgl@sss.pgh.pa.us        8897                 :            152 :         tblspcdir = AllocateDir("pg_tblspc");
 3260 andrew@dunslane.net      8898         [ +  + ]:            487 :         while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL)
                               8899                 :                :         {
                               8900                 :                :             char        fullpath[MAXPGPATH + 10];
                               8901                 :                :             char        linkpath[MAXPGPATH];
                               8902                 :            335 :             char       *relpath = NULL;
                               8903                 :                :             char       *s;
                               8904                 :                :             PGFileType  de_type;
                               8905                 :                :             char       *badp;
                               8906                 :                :             Oid         tsoid;
                               8907                 :                : 
                               8908                 :                :             /*
                               8909                 :                :              * Try to parse the directory name as an unsigned integer.
                               8910                 :                :              *
                               8911                 :                :              * Tablespace directory names should be positive integers that can be
                               8912                 :                :              * represented in 32 bits, with no leading zeroes or trailing
                               8913                 :                :              * garbage. If we come across a name that doesn't meet those
                               8914                 :                :              * criteria, skip it.
                               8915                 :                :              */
  174 rhaas@postgresql.org     8916   [ +  +  -  + ]:GNC         335 :             if (de->d_name[0] < '1' || de->d_name[0] > '9')
                               8917                 :            304 :                 continue;
                               8918                 :             31 :             errno = 0;
                               8919                 :             31 :             tsoid = strtoul(de->d_name, &badp, 10);
                               8920   [ +  -  +  -  :             31 :             if (*badp != '\0' || errno == EINVAL || errno == ERANGE)
                                              -  + ]
 3260 andrew@dunslane.net      8921                 :LBC       (258) :                 continue;
                               8922                 :                : 
 3260 andrew@dunslane.net      8923                 :CBC          31 :             snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
                               8924                 :                : 
  362 rhaas@postgresql.org     8925                 :             31 :             de_type = get_dirent_type(fullpath, de, false, ERROR);
                               8926                 :                : 
                               8927         [ +  + ]:             31 :             if (de_type == PGFILETYPE_LNK)
                               8928                 :                :             {
                               8929                 :                :                 StringInfoData escapedpath;
                               8930                 :                :                 int         rllen;
                               8931                 :                : 
                               8932                 :             17 :                 rllen = readlink(fullpath, linkpath, sizeof(linkpath));
                               8933         [ -  + ]:             17 :                 if (rllen < 0)
                               8934                 :                :                 {
  362 rhaas@postgresql.org     8935         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               8936                 :                :                             (errmsg("could not read symbolic link \"%s\": %m",
                               8937                 :                :                                     fullpath)));
                               8938                 :              0 :                     continue;
                               8939                 :                :                 }
  362 rhaas@postgresql.org     8940         [ -  + ]:CBC          17 :                 else if (rllen >= sizeof(linkpath))
                               8941                 :                :                 {
  362 rhaas@postgresql.org     8942         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               8943                 :                :                             (errmsg("symbolic link \"%s\" target is too long",
                               8944                 :                :                                     fullpath)));
                               8945                 :              0 :                     continue;
                               8946                 :                :                 }
  362 rhaas@postgresql.org     8947                 :CBC          17 :                 linkpath[rllen] = '\0';
                               8948                 :                : 
                               8949                 :                :                 /*
                               8950                 :                :                  * Relpath holds the relative path of the tablespace directory
                               8951                 :                :                  * when it's located within PGDATA, or NULL if it's located
                               8952                 :                :                  * elsewhere.
                               8953                 :                :                  */
                               8954         [ +  + ]:             17 :                 if (rllen > datadirpathlen &&
                               8955         [ -  + ]:              1 :                     strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
  331 tgl@sss.pgh.pa.us        8956         [ #  # ]:UBC           0 :                     IS_DIR_SEP(linkpath[datadirpathlen]))
  362 rhaas@postgresql.org     8957                 :              0 :                     relpath = pstrdup(linkpath + datadirpathlen + 1);
                               8958                 :                : 
                               8959                 :                :                 /*
                               8960                 :                :                  * Add a backslash-escaped version of the link path to the
                               8961                 :                :                  * tablespace map file.
                               8962                 :                :                  */
  362 rhaas@postgresql.org     8963                 :CBC          17 :                 initStringInfo(&escapedpath);
                               8964         [ +  + ]:            460 :                 for (s = linkpath; *s; s++)
                               8965                 :                :                 {
                               8966   [ +  -  +  -  :            443 :                     if (*s == '\n' || *s == '\r' || *s == '\\')
                                              -  + ]
  362 rhaas@postgresql.org     8967                 :UBC           0 :                         appendStringInfoChar(&escapedpath, '\\');
  362 rhaas@postgresql.org     8968                 :CBC         443 :                     appendStringInfoChar(&escapedpath, *s);
                               8969                 :                :                 }
                               8970                 :             17 :                 appendStringInfo(tblspcmapfile, "%s %s\n",
                               8971                 :             17 :                                  de->d_name, escapedpath.data);
                               8972                 :             17 :                 pfree(escapedpath.data);
                               8973                 :                :             }
                               8974         [ +  - ]:             14 :             else if (de_type == PGFILETYPE_DIR)
                               8975                 :                :             {
                               8976                 :                :                 /*
                               8977                 :                :                  * It's possible to use allow_in_place_tablespaces to create
                               8978                 :                :                  * directories directly under pg_tblspc, for testing purposes
                               8979                 :                :                  * only.
                               8980                 :                :                  *
                               8981                 :                :                  * In this case, we store a relative path rather than an
                               8982                 :                :                  * absolute path into the tablespaceinfo.
                               8983                 :                :                  */
                               8984                 :             14 :                 snprintf(linkpath, sizeof(linkpath), "pg_tblspc/%s",
                               8985                 :             14 :                          de->d_name);
                               8986                 :             14 :                 relpath = pstrdup(linkpath);
                               8987                 :                :             }
                               8988                 :                :             else
                               8989                 :                :             {
                               8990                 :                :                 /* Skip any other file type that appears here. */
  362 rhaas@postgresql.org     8991                 :UBC           0 :                 continue;
                               8992                 :                :             }
                               8993                 :                : 
 3260 andrew@dunslane.net      8994                 :CBC          31 :             ti = palloc(sizeof(tablespaceinfo));
  174 rhaas@postgresql.org     8995                 :GNC          31 :             ti->oid = tsoid;
 1124 tgl@sss.pgh.pa.us        8996                 :CBC          31 :             ti->path = pstrdup(linkpath);
  362 rhaas@postgresql.org     8997                 :             31 :             ti->rpath = relpath;
 1397                          8998                 :             31 :             ti->size = -1;
                               8999                 :                : 
 3249 bruce@momjian.us         9000         [ +  - ]:             31 :             if (tablespaces)
                               9001                 :             31 :                 *tablespaces = lappend(*tablespaces, ti);
                               9002                 :                :         }
 2323 tgl@sss.pgh.pa.us        9003                 :            152 :         FreeDir(tblspcdir);
                               9004                 :                : 
  566 michael@paquier.xyz      9005                 :            152 :         state->starttime = (pg_time_t) time(NULL);
                               9006                 :                :     }
  543 alvherre@alvh.no-ip.     9007         [ -  + ]:            152 :     PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, DatumGetBool(true));
                               9008                 :                : 
  566 michael@paquier.xyz      9009                 :            152 :     state->started_in_recovery = backup_started_in_recovery;
                               9010                 :                : 
                               9011                 :                :     /*
                               9012                 :                :      * Mark that the start phase has correctly finished for the backup.
                               9013                 :                :      */
  739 sfrost@snowman.net       9014                 :            152 :     sessionBackupState = SESSION_BACKUP_RUNNING;
 7194 tgl@sss.pgh.pa.us        9015                 :            152 : }
                               9016                 :                : 
                               9017                 :                : /*
                               9018                 :                :  * Utility routine to fetch the session-level status of a running backup.
                               9019                 :                :  */
                               9020                 :                : SessionBackupState
 2578 teodor@sigaev.ru         9021                 :            172 : get_backup_status(void)
                               9022                 :                : {
                               9023                 :            172 :     return sessionBackupState;
                               9024                 :                : }
                               9025                 :                : 
                               9026                 :                : /*
                               9027                 :                :  * do_pg_backup_stop
                               9028                 :                :  *
                               9029                 :                :  * Utility function called at the end of an online backup.  It creates a history
                               9030                 :                :  * file (if required), resets sessionBackupState and so on.  It can optionally
                               9031                 :                :  * wait for WAL segments to be archived.
                               9032                 :                :  *
                               9033                 :                :  * "state" is filled with the information necessary to restore from this
                               9034                 :                :  * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
                               9035                 :                :  *
                               9036                 :                :  * It is the responsibility of the caller of this function to verify the
                               9037                 :                :  * permissions of the calling user!
                               9038                 :                :  */
                               9039                 :                : void
  566 michael@paquier.xyz      9040                 :            145 : do_pg_backup_stop(BackupState *state, bool waitforarchive)
                               9041                 :                : {
                               9042                 :            145 :     bool        backup_stopped_in_recovery = false;
                               9043                 :                :     char        histfilepath[MAXPGPATH];
                               9044                 :                :     char        lastxlogfilename[MAXFNAMELEN];
                               9045                 :                :     char        histfilename[MAXFNAMELEN];
                               9046                 :                :     XLogSegNo   _logSegNo;
                               9047                 :                :     FILE       *fp;
                               9048                 :                :     int         seconds_before_warning;
 5853 bruce@momjian.us         9049                 :            145 :     int         waits = 0;
 5110 simon@2ndQuadrant.co     9050                 :            145 :     bool        reported_waiting = false;
                               9051                 :                : 
  566 michael@paquier.xyz      9052         [ -  + ]:            145 :     Assert(state != NULL);
                               9053                 :                : 
                               9054                 :            145 :     backup_stopped_in_recovery = RecoveryInProgress();
                               9055                 :                : 
                               9056                 :                :     /*
                               9057                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               9058                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               9059                 :                :      */
                               9060   [ +  +  -  + ]:            145 :     if (!backup_stopped_in_recovery && !XLogIsNeeded())
 5697 tgl@sss.pgh.pa.us        9061         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9062                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9063                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               9064                 :                :                  errhint("wal_level must be set to \"replica\" or \"logical\" at server start.")));
                               9065                 :                : 
                               9066                 :                :     /*
                               9067                 :                :      * OK to update backup counter and session-level lock.
                               9068                 :                :      *
                               9069                 :                :      * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them,
                               9070                 :                :      * otherwise they can be updated inconsistently, which might cause
                               9071                 :                :      * do_pg_abort_backup() to fail.
                               9072                 :                :      */
 2644 fujii@postgresql.org     9073                 :CBC         145 :     WALInsertLockAcquireExclusive();
                               9074                 :                : 
                               9075                 :                :     /*
                               9076                 :                :      * It is expected that each do_pg_backup_start() call is matched by
                               9077                 :                :      * exactly one do_pg_backup_stop() call.
                               9078                 :                :      */
  739 sfrost@snowman.net       9079         [ -  + ]:            145 :     Assert(XLogCtl->Insert.runningBackups > 0);
                               9080                 :            145 :     XLogCtl->Insert.runningBackups--;
                               9081                 :                : 
                               9082                 :                :     /*
                               9083                 :                :      * Clean up session-level lock.
                               9084                 :                :      *
                               9085                 :                :      * You might think that WALInsertLockRelease() can be called before
                               9086                 :                :      * cleaning up session-level lock because session-level lock doesn't need
                               9087                 :                :      * to be protected with WAL insertion lock. But since
                               9088                 :                :      * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
                               9089                 :                :      * cleaned up before it.
                               9090                 :                :      */
 2578 teodor@sigaev.ru         9091                 :            145 :     sessionBackupState = SESSION_BACKUP_NONE;
                               9092                 :                : 
 2308 fujii@postgresql.org     9093                 :            145 :     WALInsertLockRelease();
                               9094                 :                : 
                               9095                 :                :     /*
                               9096                 :                :      * If we are taking an online backup from the standby, we confirm that the
                               9097                 :                :      * standby has not been promoted during the backup.
                               9098                 :                :      */
  566 michael@paquier.xyz      9099   [ +  +  -  + ]:            145 :     if (state->started_in_recovery && !backup_stopped_in_recovery)
 4463 simon@2ndQuadrant.co     9100         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9101                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9102                 :                :                  errmsg("the standby was promoted during online backup"),
                               9103                 :                :                  errhint("This means that the backup being taken is corrupt "
                               9104                 :                :                          "and should not be used. "
                               9105                 :                :                          "Try taking another online backup.")));
                               9106                 :                : 
                               9107                 :                :     /*
                               9108                 :                :      * During recovery, we don't write an end-of-backup record. We assume that
                               9109                 :                :      * pg_control was backed up last and its minimum recovery point can be
                               9110                 :                :      * available as the backup end location. Since we don't have an
                               9111                 :                :      * end-of-backup record, we use the pg_control value to check whether
                               9112                 :                :      * we've reached the end of backup when starting recovery from this
                               9113                 :                :      * backup. However, we have no way of checking whether pg_control was
                               9114                 :                :      * actually backed up last.
                               9115                 :                :      *
                               9116                 :                :      * We don't force a switch to a new WAL file, but it is still possible to
                               9117                 :                :      * wait for all the required files to be archived if waitforarchive is
                               9118                 :                :      * true. This is okay if we use the backup to start a standby and fetch
                               9119                 :                :      * the missing WAL using streaming replication. But in the case of an
                               9120                 :                :      * archive recovery, a user should set waitforarchive to true and wait for
                               9121                 :                :      * them to be archived to ensure that all the required files are
                               9122                 :                :      * available.
                               9123                 :                :      *
                               9124                 :                :      * We return the current minimum recovery point as the backup end
                               9125                 :                :      * location. Note that it can be greater than the exact backup end
                               9126                 :                :      * location if the minimum recovery point is updated after the backup of
                               9127                 :                :      * pg_control. This is harmless for current uses.
                               9128                 :                :      *
                               9129                 :                :      * XXX currently a backup history file is for informational and debug
                               9130                 :                :      * purposes only. It's not essential for an online backup. Furthermore,
                               9131                 :                :      * even if it's created, it will not be archived during recovery because
                               9132                 :                :      * an archiver is not invoked. So it doesn't seem worthwhile to write a
                               9133                 :                :      * backup history file during recovery.
                               9134                 :                :      */
  566 michael@paquier.xyz      9135         [ +  + ]:CBC         145 :     if (backup_stopped_in_recovery)
                               9136                 :                :     {
                               9137                 :                :         XLogRecPtr  recptr;
                               9138                 :                : 
                               9139                 :                :         /*
                               9140                 :                :          * Check to see if all WAL replayed during online backup contains
                               9141                 :                :          * full-page writes.
                               9142                 :                :          */
 3492 andres@anarazel.de       9143         [ -  + ]:              7 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9144                 :              7 :         recptr = XLogCtl->lastFpwDisableRecPtr;
                               9145                 :              7 :         SpinLockRelease(&XLogCtl->info_lck);
                               9146                 :                : 
  566 michael@paquier.xyz      9147         [ -  + ]:              7 :         if (state->startpoint <= recptr)
 4463 simon@2ndQuadrant.co     9148         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9149                 :                :                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9150                 :                :                      errmsg("WAL generated with full_page_writes=off was replayed "
                               9151                 :                :                             "during online backup"),
                               9152                 :                :                      errhint("This means that the backup being taken on the standby "
                               9153                 :                :                              "is corrupt and should not be used. "
                               9154                 :                :                              "Enable full_page_writes and run CHECKPOINT on the primary, "
                               9155                 :                :                              "and then try an online backup again.")));
                               9156                 :                : 
                               9157                 :                : 
 4463 simon@2ndQuadrant.co     9158                 :CBC           7 :         LWLockAcquire(ControlFileLock, LW_SHARED);
  566 michael@paquier.xyz      9159                 :              7 :         state->stoppoint = ControlFile->minRecoveryPoint;
                               9160                 :              7 :         state->stoptli = ControlFile->minRecoveryPointTLI;
 4463 simon@2ndQuadrant.co     9161                 :              7 :         LWLockRelease(ControlFileLock);
                               9162                 :                :     }
                               9163                 :                :     else
                               9164                 :                :     {
                               9165                 :                :         char       *history_file;
                               9166                 :                : 
                               9167                 :                :         /*
                               9168                 :                :          * Write the backup-end xlog record
                               9169                 :                :          */
 2444 rhaas@postgresql.org     9170                 :            138 :         XLogBeginInsert();
  566 michael@paquier.xyz      9171                 :            138 :         XLogRegisterData((char *) (&state->startpoint),
                               9172                 :                :                          sizeof(state->startpoint));
                               9173                 :            138 :         state->stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);
                               9174                 :                : 
                               9175                 :                :         /*
                               9176                 :                :          * Given that we're not in recovery, InsertTimeLineID is set and can't
                               9177                 :                :          * change, so we can read it without a lock.
                               9178                 :                :          */
                               9179                 :            138 :         state->stoptli = XLogCtl->InsertTimeLineID;
                               9180                 :                : 
                               9181                 :                :         /*
                               9182                 :                :          * Force a switch to a new xlog segment file, so that the backup is
                               9183                 :                :          * valid as soon as the archiver moves out the current segment file.
                               9184                 :                :          */
 2444 rhaas@postgresql.org     9185                 :            138 :         RequestXLogSwitch(false);
                               9186                 :                : 
  566 michael@paquier.xyz      9187                 :            138 :         state->stoptime = (pg_time_t) time(NULL);
                               9188                 :                : 
                               9189                 :                :         /*
                               9190                 :                :          * Write the backup history file
                               9191                 :                :          */
                               9192                 :            138 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9193                 :            138 :         BackupHistoryFilePath(histfilepath, state->stoptli, _logSegNo,
                               9194                 :                :                               state->startpoint, wal_segment_size);
 2444 rhaas@postgresql.org     9195                 :            138 :         fp = AllocateFile(histfilepath, "w");
                               9196         [ -  + ]:            138 :         if (!fp)
 2444 rhaas@postgresql.org     9197         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9198                 :                :                     (errcode_for_file_access(),
                               9199                 :                :                      errmsg("could not create file \"%s\": %m",
                               9200                 :                :                             histfilepath)));
                               9201                 :                : 
                               9202                 :                :         /* Build and save the contents of the backup history file */
  566 michael@paquier.xyz      9203                 :CBC         138 :         history_file = build_backup_content(state, true);
  565                          9204                 :            138 :         fprintf(fp, "%s", history_file);
  566                          9205                 :            138 :         pfree(history_file);
                               9206                 :                : 
 2444 rhaas@postgresql.org     9207   [ +  -  +  -  :            138 :         if (fflush(fp) || ferror(fp) || FreeFile(fp))
                                              -  + ]
 2444 rhaas@postgresql.org     9208         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9209                 :                :                     (errcode_for_file_access(),
                               9210                 :                :                      errmsg("could not write file \"%s\": %m",
                               9211                 :                :                             histfilepath)));
                               9212                 :                : 
                               9213                 :                :         /*
                               9214                 :                :          * Clean out any no-longer-needed history files.  As a side effect,
                               9215                 :                :          * this will post a .ready file for the newly created history file,
                               9216                 :                :          * notifying the archiver that the history file may be archived
                               9217                 :                :          * immediately.
                               9218                 :                :          */
 2444 rhaas@postgresql.org     9219                 :CBC         138 :         CleanupBackupHistory();
                               9220                 :                :     }
                               9221                 :                : 
                               9222                 :                :     /*
                               9223                 :                :      * If archiving is enabled, wait for all the required WAL files to be
                               9224                 :                :      * archived before returning. If archiving isn't enabled, the required WAL
                               9225                 :                :      * needs to be transported via streaming replication (hopefully with
                               9226                 :                :      * wal_keep_size set high enough), or some more exotic mechanism like
                               9227                 :                :      * polling and copying files from pg_wal with a script. We have no knowledge
                               9228                 :                :      * of those mechanisms, so it's up to the user to ensure that he gets all
                               9229                 :                :      * the required WAL.
                               9230                 :                :      *
                               9231                 :                :      * We wait until both the last WAL file filled during backup and the
                               9232                 :                :      * history file have been archived, and assume that the alphabetic sorting
                               9233                 :                :      * property of the WAL files ensures any earlier WAL files are safely
                               9234                 :                :      * archived as well.
                               9235                 :                :      *
                               9236                 :                :      * We wait forever, since archive_command is supposed to work and we
                               9237                 :                :      * assume the admin wanted his backup to work completely. If you don't
                               9238                 :                :      * wish to wait, then either waitforarchive should be passed in as false,
                               9239                 :                :      * or you can set statement_timeout.  Also, some notices are issued to
                               9240                 :                :      * clue in anyone who might be doing this interactively.
                               9241                 :                :      */
                               9242                 :                : 
                               9243         [ +  + ]:            145 :     if (waitforarchive &&
  566 michael@paquier.xyz      9244   [ +  +  +  +  :              8 :         ((!backup_stopped_in_recovery && XLogArchivingActive()) ||
                                     -  +  +  +  +  
                                                 + ]
                               9245   [ +  -  -  +  :              1 :          (backup_stopped_in_recovery && XLogArchivingAlways())))
                                              -  + ]
                               9246                 :                :     {
                               9247                 :              2 :         XLByteToPrevSeg(state->stoppoint, _logSegNo, wal_segment_size);
                               9248                 :              2 :         XLogFileName(lastxlogfilename, state->stoptli, _logSegNo,
                               9249                 :                :                      wal_segment_size);
                               9250                 :                : 
                               9251                 :              2 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9252                 :              2 :         BackupHistoryFileName(histfilename, state->stoptli, _logSegNo,
                               9253                 :                :                               state->startpoint, wal_segment_size);
                               9254                 :                : 
 5031 bruce@momjian.us         9255                 :              2 :         seconds_before_warning = 60;
                               9256                 :              2 :         waits = 0;
                               9257                 :                : 
                               9258   [ +  +  -  + ]:              6 :         while (XLogArchiveIsBusy(lastxlogfilename) ||
                               9259                 :              2 :                XLogArchiveIsBusy(histfilename))
                               9260                 :                :         {
                               9261         [ -  + ]:              2 :             CHECK_FOR_INTERRUPTS();
                               9262                 :                : 
                               9263   [ +  -  -  + ]:              2 :             if (!reported_waiting && waits > 5)
                               9264                 :                :             {
 5031 bruce@momjian.us         9265         [ #  # ]:UBC           0 :                 ereport(NOTICE,
                               9266                 :                :                         (errmsg("base backup done, waiting for required WAL segments to be archived")));
                               9267                 :              0 :                 reported_waiting = true;
                               9268                 :                :             }
                               9269                 :                : 
 1013 michael@paquier.xyz      9270                 :CBC           2 :             (void) WaitLatch(MyLatch,
                               9271                 :                :                              WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                               9272                 :                :                              1000L,
                               9273                 :                :                              WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
                               9274                 :              2 :             ResetLatch(MyLatch);
                               9275                 :                : 
 5031 bruce@momjian.us         9276         [ -  + ]:              2 :             if (++waits >= seconds_before_warning)
                               9277                 :                :             {
 5031 bruce@momjian.us         9278                 :UBC           0 :                 seconds_before_warning *= 2;    /* This wraps in >10 years... */
                               9279         [ #  # ]:              0 :                 ereport(WARNING,
                               9280                 :                :                         (errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)",
                               9281                 :                :                                 waits),
                               9282                 :                :                          errhint("Check that your archive_command is executing properly.  "
                               9283                 :                :                                  "You can safely cancel this backup, "
                               9284                 :                :                                  "but the database backup will not be usable without all the WAL segments.")));
                               9285                 :                :             }
                               9286                 :                :         }
                               9287                 :                : 
 5031 bruce@momjian.us         9288         [ +  + ]:CBC           2 :         ereport(NOTICE,
                               9289                 :                :                 (errmsg("all required WAL segments have been archived")));
                               9290                 :                :     }
 4813 magnus@hagander.net      9291         [ +  + ]:            143 :     else if (waitforarchive)
 5099 tgl@sss.pgh.pa.us        9292         [ +  - ]:              6 :         ereport(NOTICE,
                               9293                 :                :                 (errmsg("WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup")));
 4844 magnus@hagander.net      9294                 :            145 : }
                               9295                 :                : 
                               9296                 :                : 
                               9297                 :                : /*
                               9298                 :                :  * do_pg_abort_backup: abort a running backup
                               9299                 :                :  *
                               9300                 :                :  * This does just the most basic steps of do_pg_backup_stop(), by taking the
                               9301                 :                :  * system out of backup mode, thus making it much safer to call from
                               9302                 :                :  * an error handler.
                               9303                 :                :  *
                               9304                 :                :  * A true 'arg' indicates that it's being called during backup setup; so
                               9305                 :                :  * sessionBackupState has not been modified yet, but runningBackups has
                               9306                 :                :  * already been incremented.  When it's false, then it's invoked as a
                               9307                 :                :  * before_shmem_exit handler, and therefore we must not change state
                               9308                 :                :  * unless sessionBackupState indicates that a backup is actually running.
                               9309                 :                :  *
                               9310                 :                :  * NB: This gets used as a PG_ENSURE_ERROR_CLEANUP callback and
                               9311                 :                :  * before_shmem_exit handler, hence the odd-looking signature.
                               9312                 :                :  */
                               9313                 :                : void
 1578 rhaas@postgresql.org     9314                 :              9 : do_pg_abort_backup(int code, Datum arg)
                               9315                 :                : {
  543 alvherre@alvh.no-ip.     9316                 :              9 :     bool        during_backup_start = DatumGetBool(arg);
                               9317                 :                : 
                               9318                 :                :     /* If called during backup start, there shouldn't be one already running */
  538                          9319   [ -  +  -  - ]:              9 :     Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE);
                               9320                 :                : 
  543                          9321   [ +  -  +  + ]:              9 :     if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
                               9322                 :                :     {
                               9323                 :              7 :         WALInsertLockAcquireExclusive();
                               9324         [ -  + ]:              7 :         Assert(XLogCtl->Insert.runningBackups > 0);
                               9325                 :              7 :         XLogCtl->Insert.runningBackups--;
                               9326                 :                : 
                               9327                 :              7 :         sessionBackupState = SESSION_BACKUP_NONE;
                               9328                 :              7 :         WALInsertLockRelease();
                               9329                 :                : 
                               9330         [ +  - ]:              7 :         if (!during_backup_start)
                               9331         [ +  - ]:              7 :             ereport(WARNING,
                               9332                 :                :                     errmsg("aborting backup due to backend exiting before pg_backup_stop was called"));
                               9333                 :                :     }
 1578 rhaas@postgresql.org     9334                 :              9 : }
                               9335                 :                : 
                               9336                 :                : /*
                               9337                 :                :  * Register a handler that will warn about unterminated backups at end of
                               9338                 :                :  * session, unless this has already been done.
                               9339                 :                :  */
                               9340                 :                : void
                               9341                 :              4 : register_persistent_abort_backup_handler(void)
                               9342                 :                : {
                               9343                 :                :     static bool already_done = false;
                               9344                 :                : 
                               9345         [ +  + ]:              4 :     if (already_done)
                               9346                 :              1 :         return;
  543 alvherre@alvh.no-ip.     9347                 :              3 :     before_shmem_exit(do_pg_abort_backup, BoolGetDatum(false));
 1578 rhaas@postgresql.org     9348                 :              3 :     already_done = true;
                               9349                 :                : }
                               9350                 :                : 
                               9351                 :                : /*
                               9352                 :                :  * Get latest WAL insert pointer
                               9353                 :                :  */
                               9354                 :                : XLogRecPtr
 4477 heikki.linnakangas@i     9355                 :           2296 : GetXLogInsertRecPtr(void)
                               9356                 :                : {
 3492 andres@anarazel.de       9357                 :           2296 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               9358                 :                :     uint64      current_bytepos;
                               9359                 :                : 
 3933 heikki.linnakangas@i     9360         [ -  + ]:           2296 :     SpinLockAcquire(&Insert->insertpos_lck);
                               9361                 :           2296 :     current_bytepos = Insert->CurrBytePos;
                               9362                 :           2296 :     SpinLockRelease(&Insert->insertpos_lck);
                               9363                 :                : 
                               9364                 :           2296 :     return XLogBytePosToRecPtr(current_bytepos);
                               9365                 :                : }
                               9366                 :                : 
                               9367                 :                : /*
                               9368                 :                :  * Get latest WAL write pointer
                               9369                 :                :  */
                               9370                 :                : XLogRecPtr
  788                          9371                 :           1302 : GetXLogWriteRecPtr(void)
                               9372                 :                : {
   11 alvherre@alvh.no-ip.     9373                 :GNC        1302 :     RefreshXLogWriteResult(LogwrtResult);
                               9374                 :                : 
  788 heikki.linnakangas@i     9375                 :CBC        1302 :     return LogwrtResult.Write;
                               9376                 :                : }
                               9377                 :                : 
                               9378                 :                : /*
                               9379                 :                :  * Returns the redo pointer of the last checkpoint or restartpoint. This is
                               9380                 :                :  * the oldest point in WAL that we still need, if we have to restart recovery.
                               9381                 :                :  */
                               9382                 :                : void
                               9383                 :            127 : GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
                               9384                 :                : {
                               9385                 :            127 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9386                 :            127 :     *oldrecptr = ControlFile->checkPointCopy.redo;
                               9387                 :            127 :     *oldtli = ControlFile->checkPointCopy.ThisTimeLineID;
                               9388                 :            127 :     LWLockRelease(ControlFileLock);
 6596 tgl@sss.pgh.pa.us        9389                 :            127 : }
                               9390                 :                : 
                               9391                 :                : /* Thin wrapper around ShutdownWalRcv(). */
                               9392                 :                : void
 1021 noah@leadboat.com        9393                 :           1063 : XLogShutdownWalRcv(void)
                               9394                 :                : {
                               9395                 :           1063 :     ShutdownWalRcv();
                               9396                 :                : 
                               9397                 :           1063 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9398                 :           1063 :     XLogCtl->InstallXLogFileSegmentActive = false;
                               9399                 :           1063 :     LWLockRelease(ControlFileLock);
                               9400                 :           1063 : }
                               9401                 :                : 
                               9402                 :                : /* Enable WAL file recycling and preallocation. */
                               9403                 :                : void
  788 heikki.linnakangas@i     9404                 :           1089 : SetInstallXLogFileSegmentActive(void)
                               9405                 :                : {
                               9406                 :           1089 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9407                 :           1089 :     XLogCtl->InstallXLogFileSegmentActive = true;
                               9408                 :           1089 :     LWLockRelease(ControlFileLock);
 3140 fujii@postgresql.org     9409                 :           1089 : }
                               9410                 :                : 
                               9411                 :                : bool
  788 heikki.linnakangas@i     9412                 :             51 : IsInstallXLogFileSegmentActive(void)
                               9413                 :                : {
                               9414                 :                :     bool        result;
                               9415                 :                : 
                               9416                 :             51 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9417                 :             51 :     result = XLogCtl->InstallXLogFileSegmentActive;
                               9418                 :             51 :     LWLockRelease(ControlFileLock);
                               9419                 :                : 
                               9420                 :             51 :     return result;
                               9421                 :                : }
                               9422                 :                : 
                               9423                 :                : /*
                               9424                 :                :  * Update the WalWriterSleeping flag.
                               9425                 :                :  */
                               9426                 :                : void
 4359 tgl@sss.pgh.pa.us        9427                 :            692 : SetWalWriterSleeping(bool sleeping)
                               9428                 :                : {
 3492 andres@anarazel.de       9429         [ -  + ]:            692 :     SpinLockAcquire(&XLogCtl->info_lck);
                               9430                 :            692 :     XLogCtl->WalWriterSleeping = sleeping;
                               9431                 :            692 :     SpinLockRelease(&XLogCtl->info_lck);
 4359 tgl@sss.pgh.pa.us        9432                 :            692 : }
        

Generated by: LCOV version 2.1-beta2-3-g6141622