Age Owner Branch data TLA Line data Source code
1 : : /*
2 : : * brin_bloom.c
3 : : * Implementation of Bloom opclass for BRIN
4 : : *
5 : : * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
6 : : * Portions Copyright (c) 1994, Regents of the University of California
7 : : *
8 : : *
9 : : * A BRIN opclass summarizing page range into a bloom filter.
10 : : *
11 : : * Bloom filters allow efficient testing whether a given page range contains
12 : : * a particular value. Therefore, if we summarize each page range into a small
13 : : * bloom filter, we can easily (and cheaply) test whether it contains values
14 : : * we get later.
15 : : *
16 : : * The index only supports equality operators, similarly to hash indexes.
17 : : * Bloom indexes are however much smaller, and support only bitmap scans.
18 : : *
19 : : * Note: Don't confuse this with bloom indexes, implemented in a contrib
20 : : * module. That extension implements an entirely new AM, building a bloom
21 : : * filter on multiple columns in a single row. This opclass works with an
22 : : * existing AM (BRIN) and builds bloom filter on a column.
23 : : *
24 : : *
25 : : * values vs. hashes
26 : : * -----------------
27 : : *
28 : : * The original column values are not used directly, but are first hashed
29 : : * using the regular type-specific hash function, producing a uint32 hash.
30 : : * And this hash value is then added to the summary - i.e. it's hashed
31 : : * again and added to the bloom filter.
32 : : *
33 : : * This allows the code to treat all data types (byval/byref/...) the same
34 : : * way, with only minimal space requirements, because we're working with
35 : : * hashes and not the original values. Everything is uint32.
36 : : *
37 : : * Of course, this assumes the built-in hash function is reasonably good,
38 : : * without too many collisions etc. But that does seem to be the case, at
39 : : * least based on past experience. After all, the same hash functions are
40 : : * used for hash indexes, hash partitioning and so on.
41 : : *
42 : : *
43 : : * hashing scheme
44 : : * --------------
45 : : *
46 : : * Bloom filters require a number of independent hash functions. There are
47 : : * different schemes how to construct them - for example we might use
48 : : * hash_uint32_extended with random seeds, but that seems fairly expensive.
49 : : * We use a scheme requiring only two functions described in this paper:
50 : : *
51 : : * Less Hashing, Same Performance:Building a Better Bloom Filter
52 : : * Adam Kirsch, Michael Mitzenmacher, Harvard School of Engineering and
53 : : * Applied Sciences, Cambridge, Massachusetts [DOI 10.1002/rsa.20208]
54 : : *
55 : : * The two hash functions h1 and h2 are calculated using hard-coded seeds,
56 : : * and then combined using (h1 + i * h2) to generate the hash functions.
57 : : *
58 : : *
59 : : * sizing the bloom filter
60 : : * -----------------------
61 : : *
62 : : * Size of a bloom filter depends on the number of distinct values we will
63 : : * store in it, and the desired false positive rate. The higher the number
64 : : * of distinct values and/or the lower the false positive rate, the larger
65 : : * the bloom filter. On the other hand, we want to keep the index as small
66 : : * as possible - that's one of the basic advantages of BRIN indexes.
67 : : *
68 : : * Although the number of distinct elements (in a page range) depends on
69 : : * the data, we can consider it fixed. This simplifies the trade-off to
70 : : * just false positive rate vs. size.
71 : : *
72 : : * At the page range level, false positive rate is a probability the bloom
73 : : * filter matches a random value. For the whole index (with sufficiently
74 : : * many page ranges) it represents the fraction of the index ranges (and
75 : : * thus fraction of the table to be scanned) matching the random value.
76 : : *
77 : : * Furthermore, the size of the bloom filter is subject to implementation
78 : : * limits - it has to fit onto a single index page (8kB by default). As
79 : : * the bitmap is inherently random (when "full" about half the bits is set
80 : : * to 1, randomly), compression can't help very much.
81 : : *
82 : : * To reduce the size of a filter (to fit to a page), we have to either
83 : : * accept higher false positive rate (undesirable), or reduce the number
84 : : * of distinct items to be stored in the filter. We can't alter the input
85 : : * data, of course, but we may make the BRIN page ranges smaller - instead
86 : : * of the default 128 pages (1MB) we may build index with 16-page ranges,
87 : : * or something like that. This should reduce the number of distinct values
88 : : * in the page range, making the filter smaller (with fixed false positive
89 : : * rate). Even for random data sets this should help, as the number of rows
90 : : * per heap page is limited (to ~290 with very narrow tables, likely ~20
91 : : * in practice).
92 : : *
93 : : * Of course, good sizing decisions depend on having the necessary data,
94 : : * i.e. number of distinct values in a page range (of a given size) and
95 : : * table size (to estimate cost change due to change in false positive
96 : : * rate due to having larger index vs. scanning larger indexes). We may
97 : : * not have that data - for example when building an index on empty table
98 : : * it's not really possible. And for some data we only have estimates for
99 : : * the whole table and we can only estimate per-range values (ndistinct).
100 : : *
101 : : * Another challenge is that while the bloom filter is per-column, it's
102 : : * the whole index tuple that has to fit into a page. And for multi-column
103 : : * indexes that may include pieces we have no control over (not necessarily
104 : : * bloom filters, the other columns may use other BRIN opclasses). So it's
105 : : * not entirely clear how to distribute the space between those columns.
106 : : *
107 : : * The current logic, implemented in brin_bloom_get_ndistinct, attempts to
108 : : * make some basic sizing decisions, based on the size of BRIN ranges, and
109 : : * the maximum number of rows per range.
110 : : *
111 : : *
112 : : * IDENTIFICATION
113 : : * src/backend/access/brin/brin_bloom.c
114 : : */
115 : : #include "postgres.h"
116 : :
117 : : #include <math.h>
118 : :
119 : : #include "access/brin.h"
120 : : #include "access/brin_internal.h"
121 : : #include "access/brin_page.h"
122 : : #include "access/brin_tuple.h"
123 : : #include "access/genam.h"
124 : : #include "access/htup_details.h"
125 : : #include "access/reloptions.h"
126 : : #include "catalog/pg_am.h"
127 : : #include "catalog/pg_amop.h"
128 : : #include "catalog/pg_type.h"
129 : : #include "common/hashfn.h"
130 : : #include "utils/fmgrprotos.h"
131 : : #include "utils/rel.h"
132 : :
133 : : #define BloomEqualStrategyNumber 1
134 : :
135 : : /*
136 : : * Additional SQL level support functions. We only have one, which is
137 : : * used to calculate hash of the input value.
138 : : *
139 : : * Procedure numbers must not use values reserved for BRIN itself; see
140 : : * brin_internal.h.
141 : : */
142 : : #define BLOOM_MAX_PROCNUMS 1 /* maximum support procs we need */
143 : : #define PROCNUM_HASH 11 /* required */
144 : :
145 : : /*
146 : : * Subtract this from procnum to obtain index in BloomOpaque arrays
147 : : * (Must be equal to minimum of private procnums).
148 : : */
149 : : #define PROCNUM_BASE 11
150 : :
151 : : /*
152 : : * Storage type for BRIN's reloptions.
153 : : */
154 : : typedef struct BloomOptions
155 : : {
156 : : int32 vl_len_; /* varlena header (do not touch directly!) */
157 : : double nDistinctPerRange; /* number of distinct values per range */
158 : : double falsePositiveRate; /* false positive for bloom filter */
159 : : } BloomOptions;
160 : :
161 : : /*
162 : : * The current min value (16) is somewhat arbitrary, but it's based
163 : : * on the fact that the filter header is ~20B alone, which is about
164 : : * the same as the filter bitmap for 16 distinct items with 1% false
165 : : * positive rate. So by allowing lower values we'd not gain much. In
166 : : * any case, the min should not be larger than MaxHeapTuplesPerPage
167 : : * (~290), which is the theoretical maximum for single-page ranges.
168 : : */
169 : : #define BLOOM_MIN_NDISTINCT_PER_RANGE 16
170 : :
171 : : /*
172 : : * Used to determine number of distinct items, based on the number of rows
173 : : * in a page range. The 10% is somewhat similar to what estimate_num_groups
174 : : * does, so we use the same factor here.
175 : : */
176 : : #define BLOOM_DEFAULT_NDISTINCT_PER_RANGE -0.1 /* 10% of values */
177 : :
178 : : /*
179 : : * Allowed range and default value for the false positive range. The exact
180 : : * values are somewhat arbitrary, but were chosen considering the various
181 : : * parameters (size of filter vs. page size, etc.).
182 : : *
183 : : * The lower the false-positive rate, the more accurate the filter is, but
184 : : * it also gets larger - at some point this eliminates the main advantage
185 : : * of BRIN indexes, which is the tiny size. At 0.01% the index is about
186 : : * 10% of the table (assuming 290 distinct values per 8kB page).
187 : : *
188 : : * On the other hand, as the false-positive rate increases, larger part of
189 : : * the table has to be scanned due to mismatches - at 25% we're probably
190 : : * close to sequential scan being cheaper.
191 : : */
192 : : #define BLOOM_MIN_FALSE_POSITIVE_RATE 0.0001 /* 0.01% fp rate */
193 : : #define BLOOM_MAX_FALSE_POSITIVE_RATE 0.25 /* 25% fp rate */
194 : : #define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01 /* 1% fp rate */
195 : :
196 : : #define BloomGetNDistinctPerRange(opts) \
197 : : ((opts) && (((BloomOptions *) (opts))->nDistinctPerRange != 0) ? \
198 : : (((BloomOptions *) (opts))->nDistinctPerRange) : \
199 : : BLOOM_DEFAULT_NDISTINCT_PER_RANGE)
200 : :
201 : : #define BloomGetFalsePositiveRate(opts) \
202 : : ((opts) && (((BloomOptions *) (opts))->falsePositiveRate != 0.0) ? \
203 : : (((BloomOptions *) (opts))->falsePositiveRate) : \
204 : : BLOOM_DEFAULT_FALSE_POSITIVE_RATE)
205 : :
206 : : /*
207 : : * And estimate of the largest bloom we can fit onto a page. This is not
208 : : * a perfect guarantee, for a couple of reasons. For example, the row may
209 : : * be larger because the index has multiple columns.
210 : : */
211 : : #define BloomMaxFilterSize \
212 : : MAXALIGN_DOWN(BLCKSZ - \
213 : : (MAXALIGN(SizeOfPageHeaderData + \
214 : : sizeof(ItemIdData)) + \
215 : : MAXALIGN(sizeof(BrinSpecialSpace)) + \
216 : : SizeOfBrinTuple))
217 : :
218 : : /*
219 : : * Seeds used to calculate two hash functions h1 and h2, which are then used
220 : : * to generate k hashes using the (h1 + i * h2) scheme.
221 : : */
222 : : #define BLOOM_SEED_1 0x71d924af
223 : : #define BLOOM_SEED_2 0xba48b314
224 : :
225 : : /*
226 : : * Bloom Filter
227 : : *
228 : : * Represents a bloom filter, built on hashes of the indexed values. That is,
229 : : * we compute a uint32 hash of the value, and then store this hash into the
230 : : * bloom filter (and compute additional hashes on it).
231 : : *
232 : : * XXX We could implement "sparse" bloom filters, keeping only the bytes that
233 : : * are not entirely 0. But while indexes don't support TOAST, the varlena can
234 : : * still be compressed. So this seems unnecessary, because the compression
235 : : * should do the same job.
236 : : *
237 : : * XXX We can also watch the number of bits set in the bloom filter, and then
238 : : * stop using it (and not store the bitmap, to save space) when the false
239 : : * positive rate gets too high. But even if the false positive rate exceeds the
240 : : * desired value, it still can eliminate some page ranges.
241 : : */
242 : : typedef struct BloomFilter
243 : : {
244 : : /* varlena header (do not touch directly!) */
245 : : int32 vl_len_;
246 : :
247 : : /* space for various flags (unused for now) */
248 : : uint16 flags;
249 : :
250 : : /* fields for the HASHED phase */
251 : : uint8 nhashes; /* number of hash functions */
252 : : uint32 nbits; /* number of bits in the bitmap (size) */
253 : : uint32 nbits_set; /* number of bits set to 1 */
254 : :
255 : : /* data of the bloom filter */
256 : : char data[FLEXIBLE_ARRAY_MEMBER];
257 : : } BloomFilter;
258 : :
259 : : /*
260 : : * bloom_filter_size
261 : : * Calculate Bloom filter parameters (nbits, nbytes, nhashes).
262 : : *
263 : : * Given expected number of distinct values and desired false positive rate,
264 : : * calculates the optimal parameters of the Bloom filter.
265 : : *
266 : : * The resulting parameters are returned through nbytesp (number of bytes),
267 : : * nbitsp (number of bits) and nhashesp (number of hash functions). If a
268 : : * pointer is NULL, the parameter is not returned.
269 : : */
270 : : static void
287 tomas.vondra@postgre 271 :GNC 4003 : bloom_filter_size(int ndistinct, double false_positive_rate,
272 : : int *nbytesp, int *nbitsp, int *nhashesp)
273 : : {
274 : : double k;
275 : : int nbits,
276 : : nbytes;
277 : :
278 : : /* sizing bloom filter: -(n * ln(p)) / (ln(2))^2 */
279 : 4003 : nbits = ceil(-(ndistinct * log(false_positive_rate)) / pow(log(2.0), 2));
280 : :
281 : : /* round m to whole bytes */
282 : 4003 : nbytes = ((nbits + 7) / 8);
283 : 4003 : nbits = nbytes * 8;
284 : :
285 : : /*
286 : : * round(log(2.0) * m / ndistinct), but assume round() may not be
287 : : * available on Windows
288 : : */
289 : 4003 : k = log(2.0) * nbits / ndistinct;
290 [ + - ]: 4003 : k = (k - floor(k) >= 0.5) ? ceil(k) : floor(k);
291 : :
292 [ + - ]: 4003 : if (nbytesp)
293 : 4003 : *nbytesp = nbytes;
294 : :
295 [ + - ]: 4003 : if (nbitsp)
296 : 4003 : *nbitsp = nbits;
297 : :
298 [ + - ]: 4003 : if (nhashesp)
299 : 4003 : *nhashesp = (int) k;
300 : 4003 : }
301 : :
302 : : /*
303 : : * bloom_init
304 : : * Initialize the Bloom Filter, allocate all the memory.
305 : : *
306 : : * The filter is initialized with optimal size for ndistinct expected values
307 : : * and the requested false positive rate. The filter is stored as varlena.
308 : : */
309 : : static BloomFilter *
1115 tomas.vondra@postgre 310 :CBC 4003 : bloom_init(int ndistinct, double false_positive_rate)
311 : : {
312 : : Size len;
313 : : BloomFilter *filter;
314 : :
315 : : int nbits; /* size of filter / number of bits */
316 : : int nbytes; /* size of filter / number of bytes */
317 : : int nhashes; /* number of hash functions */
318 : :
319 [ - + ]: 4003 : Assert(ndistinct > 0);
158 michael@paquier.xyz 320 [ + - - + ]: 4003 : Assert(false_positive_rate > 0 && false_positive_rate < 1);
321 : :
322 : : /* calculate bloom filter size / parameters */
287 tomas.vondra@postgre 323 :GNC 4003 : bloom_filter_size(ndistinct, false_positive_rate,
324 : : &nbytes, &nbits, &nhashes);
325 : :
326 : : /*
327 : : * Reject filters that are obviously too large to store on a page.
328 : : *
329 : : * Initially the bloom filter is just zeroes and so very compressible, but
330 : : * as we add values it gets more and more random, and so less and less
331 : : * compressible. So initially everything fits on the page, but we might
332 : : * get surprising failures later - we want to prevent that, so we reject
333 : : * bloom filter that are obviously too large.
334 : : *
335 : : * XXX It's not uncommon to oversize the bloom filter a bit, to defend
336 : : * against unexpected data anomalies (parts of table with more distinct
337 : : * values per range etc.). But we still need to make sure even the
338 : : * oversized filter fits on page, if such need arises.
339 : : *
340 : : * XXX This check is not perfect, because the index may have multiple
341 : : * filters that are small individually, but too large when combined.
342 : : */
1115 tomas.vondra@postgre 343 [ - + ]:CBC 4003 : if (nbytes > BloomMaxFilterSize)
1115 tomas.vondra@postgre 344 [ # # ]:UBC 0 : elog(ERROR, "the bloom filter is too large (%d > %zu)", nbytes,
345 : : BloomMaxFilterSize);
346 : :
347 : : /*
348 : : * We allocate the whole filter. Most of it is going to be 0 bits, so the
349 : : * varlena is easy to compress.
350 : : */
1115 tomas.vondra@postgre 351 :CBC 4003 : len = offsetof(BloomFilter, data) + nbytes;
352 : :
353 : 4003 : filter = (BloomFilter *) palloc0(len);
354 : :
355 : 4003 : filter->flags = 0;
287 tomas.vondra@postgre 356 :GNC 4003 : filter->nhashes = nhashes;
1115 tomas.vondra@postgre 357 :CBC 4003 : filter->nbits = nbits;
358 : :
359 : 4003 : SET_VARSIZE(filter, len);
360 : :
361 : 4003 : return filter;
362 : : }
363 : :
364 : :
365 : : /*
366 : : * bloom_add_value
367 : : * Add value to the bloom filter.
368 : : */
369 : : static BloomFilter *
1068 tgl@sss.pgh.pa.us 370 : 34104 : bloom_add_value(BloomFilter *filter, uint32 value, bool *updated)
371 : : {
372 : : int i;
373 : : uint64 h1,
374 : : h2;
375 : :
376 : : /* compute the hashes, used for the bloom filter */
1115 tomas.vondra@postgre 377 : 34104 : h1 = hash_bytes_uint32_extended(value, BLOOM_SEED_1) % filter->nbits;
378 : 34104 : h2 = hash_bytes_uint32_extended(value, BLOOM_SEED_2) % filter->nbits;
379 : :
380 : : /* compute the requested number of hashes */
381 [ + + ]: 272832 : for (i = 0; i < filter->nhashes; i++)
382 : : {
383 : : /* h1 + h2 + f(i) */
384 : 238728 : uint32 h = (h1 + i * h2) % filter->nbits;
385 : 238728 : uint32 byte = (h / 8);
386 : 238728 : uint32 bit = (h % 8);
387 : :
388 : : /* if the bit is not set, set it and remember we did that */
389 [ + + ]: 238728 : if (!(filter->data[byte] & (0x01 << bit)))
390 : : {
391 : 109275 : filter->data[byte] |= (0x01 << bit);
392 : 109275 : filter->nbits_set++;
393 [ + - ]: 109275 : if (updated)
394 : 109275 : *updated = true;
395 : : }
396 : : }
397 : :
398 : 34104 : return filter;
399 : : }
400 : :
401 : :
402 : : /*
403 : : * bloom_contains_value
404 : : * Check if the bloom filter contains a particular value.
405 : : */
406 : : static bool
1068 tgl@sss.pgh.pa.us 407 : 4104 : bloom_contains_value(BloomFilter *filter, uint32 value)
408 : : {
409 : : int i;
410 : : uint64 h1,
411 : : h2;
412 : :
413 : : /* calculate the two hashes */
1115 tomas.vondra@postgre 414 : 4104 : h1 = hash_bytes_uint32_extended(value, BLOOM_SEED_1) % filter->nbits;
415 : 4104 : h2 = hash_bytes_uint32_extended(value, BLOOM_SEED_2) % filter->nbits;
416 : :
417 : : /* compute the requested number of hashes */
418 [ + + ]: 5217 : for (i = 0; i < filter->nhashes; i++)
419 : : {
420 : : /* h1 + h2 + f(i) */
421 : 5082 : uint32 h = (h1 + i * h2) % filter->nbits;
422 : 5082 : uint32 byte = (h / 8);
423 : 5082 : uint32 bit = (h % 8);
424 : :
425 : : /* if the bit is not set, the value is not there */
426 [ + + ]: 5082 : if (!(filter->data[byte] & (0x01 << bit)))
427 : 3969 : return false;
428 : : }
429 : :
430 : : /* all hashes found in bloom filter */
431 : 135 : return true;
432 : : }
433 : :
434 : : typedef struct BloomOpaque
435 : : {
436 : : /*
437 : : * XXX At this point we only need a single proc (to compute the hash), but
438 : : * let's keep the array just like inclusion and minmax opclasses, for
439 : : * consistency. We may need additional procs in the future.
440 : : */
441 : : FmgrInfo extra_procinfos[BLOOM_MAX_PROCNUMS];
442 : : bool extra_proc_missing[BLOOM_MAX_PROCNUMS];
443 : : } BloomOpaque;
444 : :
445 : : static FmgrInfo *bloom_get_procinfo(BrinDesc *bdesc, uint16 attno,
446 : : uint16 procnum);
447 : :
448 : :
449 : : Datum
450 : 2452 : brin_bloom_opcinfo(PG_FUNCTION_ARGS)
451 : : {
452 : : BrinOpcInfo *result;
453 : :
454 : : /*
455 : : * opaque->strategy_procinfos is initialized lazily; here it is set to
456 : : * all-uninitialized by palloc0 which sets fn_oid to InvalidOid.
457 : : *
458 : : * bloom indexes only store the filter as a single BYTEA column
459 : : */
460 : :
461 : 2452 : result = palloc0(MAXALIGN(SizeofBrinOpcInfo(1)) +
462 : : sizeof(BloomOpaque));
463 : 2452 : result->oi_nstored = 1;
464 : 2452 : result->oi_regular_nulls = true;
465 : 2452 : result->oi_opaque = (BloomOpaque *)
466 : 2452 : MAXALIGN((char *) result + SizeofBrinOpcInfo(1));
467 : 2452 : result->oi_typcache[0] = lookup_type_cache(PG_BRIN_BLOOM_SUMMARYOID, 0);
468 : :
469 : 2452 : PG_RETURN_POINTER(result);
470 : : }
471 : :
472 : : /*
473 : : * brin_bloom_get_ndistinct
474 : : * Determine the ndistinct value used to size bloom filter.
475 : : *
476 : : * Adjust the ndistinct value based on the pagesPerRange value. First,
477 : : * if it's negative, it's assumed to be relative to maximum number of
478 : : * tuples in the range (assuming each page gets MaxHeapTuplesPerPage
479 : : * tuples, which is likely a significant over-estimate). We also clamp
480 : : * the value, not to over-size the bloom filter unnecessarily.
481 : : *
482 : : * XXX We can only do this when the pagesPerRange value was supplied.
483 : : * If it wasn't, it has to be a read-only access to the index, in which
484 : : * case we don't really care. But perhaps we should fall-back to the
485 : : * default pagesPerRange value?
486 : : *
487 : : * XXX We might also fetch info about ndistinct estimate for the column,
488 : : * and compute the expected number of distinct values in a range. But
489 : : * that may be tricky due to data being sorted in various ways, so it
490 : : * seems better to rely on the upper estimate.
491 : : *
492 : : * XXX We might also calculate a better estimate of rows per BRIN range,
493 : : * instead of using MaxHeapTuplesPerPage (which probably produces values
494 : : * much higher than reality).
495 : : */
496 : : static int
497 : 4003 : brin_bloom_get_ndistinct(BrinDesc *bdesc, BloomOptions *opts)
498 : : {
499 : : double ndistinct;
500 : : double maxtuples;
501 : : BlockNumber pagesPerRange;
502 : :
503 [ + - - + : 4003 : pagesPerRange = BrinGetPagesPerRange(bdesc->bd_index);
+ - ]
504 [ + - + - ]: 4003 : ndistinct = BloomGetNDistinctPerRange(opts);
505 : :
506 [ - + ]: 4003 : Assert(BlockNumberIsValid(pagesPerRange));
507 : :
508 : 4003 : maxtuples = MaxHeapTuplesPerPage * pagesPerRange;
509 : :
510 : : /*
511 : : * Similarly to n_distinct, negative values are relative - in this case to
512 : : * maximum number of tuples in the page range (maxtuples).
513 : : */
514 [ + - ]: 4003 : if (ndistinct < 0)
515 : 4003 : ndistinct = (-ndistinct) * maxtuples;
516 : :
517 : : /*
518 : : * Positive values are to be used directly, but we still apply a couple of
519 : : * safeties to avoid using unreasonably small bloom filters.
520 : : */
521 [ + - ]: 4003 : ndistinct = Max(ndistinct, BLOOM_MIN_NDISTINCT_PER_RANGE);
522 : :
523 : : /*
524 : : * And don't use more than the maximum possible number of tuples, in the
525 : : * range, which would be entirely wasteful.
526 : : */
527 [ + - ]: 4003 : ndistinct = Min(ndistinct, maxtuples);
528 : :
529 : 4003 : return (int) ndistinct;
530 : : }
531 : :
532 : : /*
533 : : * Examine the given index tuple (which contains partial status of a certain
534 : : * page range) by comparing it to the given value that comes from another heap
535 : : * tuple. If the new value is outside the bloom filter specified by the
536 : : * existing tuple values, update the index tuple and return true. Otherwise,
537 : : * return false and do not modify in this case.
538 : : */
539 : : Datum
540 : 34104 : brin_bloom_add_value(PG_FUNCTION_ARGS)
541 : : {
542 : 34104 : BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
543 : 34104 : BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
544 : 34104 : Datum newval = PG_GETARG_DATUM(2);
545 : 34104 : bool isnull PG_USED_FOR_ASSERTS_ONLY = PG_GETARG_DATUM(3);
546 : 34104 : BloomOptions *opts = (BloomOptions *) PG_GET_OPCLASS_OPTIONS();
547 : 34104 : Oid colloid = PG_GET_COLLATION();
548 : : FmgrInfo *hashFn;
549 : : uint32 hashValue;
550 : 34104 : bool updated = false;
551 : : AttrNumber attno;
552 : : BloomFilter *filter;
553 : :
554 [ - + ]: 34104 : Assert(!isnull);
555 : :
556 : 34104 : attno = column->bv_attno;
557 : :
558 : : /*
559 : : * If this is the first non-null value, we need to initialize the bloom
560 : : * filter. Otherwise just extract the existing bloom filter from
561 : : * BrinValues.
562 : : */
563 [ + + ]: 34104 : if (column->bv_allnulls)
564 : : {
565 [ + - ]: 8006 : filter = bloom_init(brin_bloom_get_ndistinct(bdesc, opts),
566 [ + - ]: 4003 : BloomGetFalsePositiveRate(opts));
567 : 4003 : column->bv_values[0] = PointerGetDatum(filter);
568 : 4003 : column->bv_allnulls = false;
569 : 4003 : updated = true;
570 : : }
571 : : else
572 : 30101 : filter = (BloomFilter *) PG_DETOAST_DATUM(column->bv_values[0]);
573 : :
574 : : /*
575 : : * Compute the hash of the new value, using the supplied hash function,
576 : : * and then add the hash value to the bloom filter.
577 : : */
578 : 34104 : hashFn = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
579 : :
580 : 34104 : hashValue = DatumGetUInt32(FunctionCall1Coll(hashFn, colloid, newval));
581 : :
582 : 34104 : filter = bloom_add_value(filter, hashValue, &updated);
583 : :
584 : 34104 : column->bv_values[0] = PointerGetDatum(filter);
585 : :
586 : 34104 : PG_RETURN_BOOL(updated);
587 : : }
588 : :
589 : : /*
590 : : * Given an index tuple corresponding to a certain page range and a scan key,
591 : : * return whether the scan key is consistent with the index tuple's bloom
592 : : * filter. Return true if so, false otherwise.
593 : : */
594 : : Datum
595 : 4104 : brin_bloom_consistent(PG_FUNCTION_ARGS)
596 : : {
597 : 4104 : BrinDesc *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
598 : 4104 : BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
599 : 4104 : ScanKey *keys = (ScanKey *) PG_GETARG_POINTER(2);
600 : 4104 : int nkeys = PG_GETARG_INT32(3);
601 : 4104 : Oid colloid = PG_GET_COLLATION();
602 : : AttrNumber attno;
603 : : Datum value;
604 : : bool matches;
605 : : FmgrInfo *finfo;
606 : : uint32 hashValue;
607 : : BloomFilter *filter;
608 : : int keyno;
609 : :
610 : 4104 : filter = (BloomFilter *) PG_DETOAST_DATUM(column->bv_values[0]);
611 : :
612 [ - + ]: 4104 : Assert(filter);
613 : :
614 : : /*
615 : : * Assume all scan keys match. We'll be searching for a scan key
616 : : * eliminating the page range (we can stop on the first such key).
617 : : */
618 : 4104 : matches = true;
619 : :
620 [ + + ]: 4239 : for (keyno = 0; keyno < nkeys; keyno++)
621 : : {
622 : 4104 : ScanKey key = keys[keyno];
623 : :
624 : : /* NULL keys are handled and filtered-out in bringetbitmap */
625 [ - + ]: 4104 : Assert(!(key->sk_flags & SK_ISNULL));
626 : :
627 : 4104 : attno = key->sk_attno;
628 : 4104 : value = key->sk_argument;
629 : :
630 [ + - ]: 4104 : switch (key->sk_strategy)
631 : : {
632 : 4104 : case BloomEqualStrategyNumber:
633 : :
634 : : /*
635 : : * We want to return the current page range if the bloom
636 : : * filter seems to contain the value.
637 : : */
638 : 4104 : finfo = bloom_get_procinfo(bdesc, attno, PROCNUM_HASH);
639 : :
640 : 4104 : hashValue = DatumGetUInt32(FunctionCall1Coll(finfo, colloid, value));
641 : 4104 : matches &= bloom_contains_value(filter, hashValue);
642 : :
643 : 4104 : break;
1115 tomas.vondra@postgre 644 :UBC 0 : default:
645 : : /* shouldn't happen */
646 [ # # ]: 0 : elog(ERROR, "invalid strategy number %d", key->sk_strategy);
647 : : matches = false;
648 : : break;
649 : : }
650 : :
1115 tomas.vondra@postgre 651 [ + + ]:CBC 4104 : if (!matches)
652 : 3969 : break;
653 : : }
654 : :
287 tomas.vondra@postgre 655 :GNC 4104 : PG_RETURN_BOOL(matches);
656 : : }
657 : :
658 : : /*
659 : : * Given two BrinValues, update the first of them as a union of the summary
660 : : * values contained in both. The second one is untouched.
661 : : *
662 : : * XXX We assume the bloom filters have the same parameters for now. In the
663 : : * future we should have 'can union' function, to decide if we can combine
664 : : * two particular bloom filters.
665 : : */
666 : : Datum
1115 tomas.vondra@postgre 667 :GBC 7 : brin_bloom_union(PG_FUNCTION_ARGS)
668 : : {
669 : : int i;
670 : : int nbytes;
671 : 7 : BrinValues *col_a = (BrinValues *) PG_GETARG_POINTER(1);
672 : 7 : BrinValues *col_b = (BrinValues *) PG_GETARG_POINTER(2);
673 : : BloomFilter *filter_a;
674 : : BloomFilter *filter_b;
675 : :
676 [ - + ]: 7 : Assert(col_a->bv_attno == col_b->bv_attno);
677 [ + - - + ]: 7 : Assert(!col_a->bv_allnulls && !col_b->bv_allnulls);
678 : :
679 : 7 : filter_a = (BloomFilter *) PG_DETOAST_DATUM(col_a->bv_values[0]);
680 : 7 : filter_b = (BloomFilter *) PG_DETOAST_DATUM(col_b->bv_values[0]);
681 : :
682 : : /* make sure the filters use the same parameters */
683 [ + - - + ]: 7 : Assert(filter_a && filter_b);
684 [ - + ]: 7 : Assert(filter_a->nbits == filter_b->nbits);
685 [ - + ]: 7 : Assert(filter_a->nhashes == filter_b->nhashes);
686 [ + - - + ]: 7 : Assert((filter_a->nbits > 0) && (filter_a->nbits % 8 == 0));
687 : :
688 : 7 : nbytes = (filter_a->nbits) / 8;
689 : :
690 : : /* simply OR the bitmaps */
691 [ + + ]: 1715 : for (i = 0; i < nbytes; i++)
692 : 1708 : filter_a->data[i] |= filter_b->data[i];
693 : :
694 : : /* update the number of bits set in the filter */
0 695 : 7 : filter_a->nbits_set = pg_popcount((const char *) filter_a->data, nbytes);
696 : :
1115 697 : 7 : PG_RETURN_VOID();
698 : : }
699 : :
700 : : /*
701 : : * Cache and return inclusion opclass support procedure
702 : : *
703 : : * Return the procedure corresponding to the given function support number
704 : : * or null if it does not exist.
705 : : */
706 : : static FmgrInfo *
1115 tomas.vondra@postgre 707 :CBC 38208 : bloom_get_procinfo(BrinDesc *bdesc, uint16 attno, uint16 procnum)
708 : : {
709 : : BloomOpaque *opaque;
710 : 38208 : uint16 basenum = procnum - PROCNUM_BASE;
711 : :
712 : : /*
713 : : * We cache these in the opaque struct, to avoid repetitive syscache
714 : : * lookups.
715 : : */
716 : 38208 : opaque = (BloomOpaque *) bdesc->bd_info[attno - 1]->oi_opaque;
717 : :
718 : : /*
719 : : * If we already searched for this proc and didn't find it, don't bother
720 : : * searching again.
721 : : */
722 [ - + ]: 38208 : if (opaque->extra_proc_missing[basenum])
1115 tomas.vondra@postgre 723 :UBC 0 : return NULL;
724 : :
1115 tomas.vondra@postgre 725 [ + + ]:CBC 38208 : if (opaque->extra_procinfos[basenum].fn_oid == InvalidOid)
726 : : {
727 [ + - ]: 432 : if (RegProcedureIsValid(index_getprocid(bdesc->bd_index, attno,
728 : : procnum)))
729 : : {
730 : 432 : fmgr_info_copy(&opaque->extra_procinfos[basenum],
731 : : index_getprocinfo(bdesc->bd_index, attno, procnum),
732 : : bdesc->bd_context);
733 : : }
734 : : else
735 : : {
1115 tomas.vondra@postgre 736 :UBC 0 : opaque->extra_proc_missing[basenum] = true;
737 : 0 : return NULL;
738 : : }
739 : : }
740 : :
1115 tomas.vondra@postgre 741 :CBC 38208 : return &opaque->extra_procinfos[basenum];
742 : : }
743 : :
744 : : Datum
745 : 421 : brin_bloom_options(PG_FUNCTION_ARGS)
746 : : {
747 : 421 : local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);
748 : :
749 : 421 : init_local_reloptions(relopts, sizeof(BloomOptions));
750 : :
751 : 421 : add_local_real_reloption(relopts, "n_distinct_per_range",
752 : : "number of distinct items expected in a BRIN page range",
753 : : BLOOM_DEFAULT_NDISTINCT_PER_RANGE,
754 : : -1.0, INT_MAX, offsetof(BloomOptions, nDistinctPerRange));
755 : :
756 : 421 : add_local_real_reloption(relopts, "false_positive_rate",
757 : : "desired false-positive rate for the bloom filters",
758 : : BLOOM_DEFAULT_FALSE_POSITIVE_RATE,
759 : : BLOOM_MIN_FALSE_POSITIVE_RATE,
760 : : BLOOM_MAX_FALSE_POSITIVE_RATE,
761 : : offsetof(BloomOptions, falsePositiveRate));
762 : :
763 : 421 : PG_RETURN_VOID();
764 : : }
765 : :
766 : : /*
767 : : * brin_bloom_summary_in
768 : : * - input routine for type brin_bloom_summary.
769 : : *
770 : : * brin_bloom_summary is only used internally to represent summaries
771 : : * in BRIN bloom indexes, so it has no operations of its own, and we
772 : : * disallow input too.
773 : : */
774 : : Datum
1115 tomas.vondra@postgre 775 :UBC 0 : brin_bloom_summary_in(PG_FUNCTION_ARGS)
776 : : {
777 : : /*
778 : : * brin_bloom_summary stores the data in binary form and parsing text
779 : : * input is not needed, so disallow this.
780 : : */
781 [ # # ]: 0 : ereport(ERROR,
782 : : (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
783 : : errmsg("cannot accept a value of type %s", "pg_brin_bloom_summary")));
784 : :
785 : : PG_RETURN_VOID(); /* keep compiler quiet */
786 : : }
787 : :
788 : :
789 : : /*
790 : : * brin_bloom_summary_out
791 : : * - output routine for type brin_bloom_summary.
792 : : *
793 : : * BRIN bloom summaries are serialized into a bytea value, but we want
794 : : * to output something nicer humans can understand.
795 : : */
796 : : Datum
1115 tomas.vondra@postgre 797 :GBC 120 : brin_bloom_summary_out(PG_FUNCTION_ARGS)
798 : : {
799 : : BloomFilter *filter;
800 : : StringInfoData str;
801 : :
802 : : /* detoast the data to get value with a full 4B header */
0 803 : 120 : filter = (BloomFilter *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
804 : :
1115 805 : 120 : initStringInfo(&str);
806 : 120 : appendStringInfoChar(&str, '{');
807 : :
808 : 120 : appendStringInfo(&str, "mode: hashed nhashes: %u nbits: %u nbits_set: %u",
809 : 120 : filter->nhashes, filter->nbits, filter->nbits_set);
810 : :
811 : 120 : appendStringInfoChar(&str, '}');
812 : :
813 : 120 : PG_RETURN_CSTRING(str.data);
814 : : }
815 : :
816 : : /*
817 : : * brin_bloom_summary_recv
818 : : * - binary input routine for type brin_bloom_summary.
819 : : */
820 : : Datum
1115 tomas.vondra@postgre 821 :UBC 0 : brin_bloom_summary_recv(PG_FUNCTION_ARGS)
822 : : {
823 [ # # ]: 0 : ereport(ERROR,
824 : : (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
825 : : errmsg("cannot accept a value of type %s", "pg_brin_bloom_summary")));
826 : :
827 : : PG_RETURN_VOID(); /* keep compiler quiet */
828 : : }
829 : :
830 : : /*
831 : : * brin_bloom_summary_send
832 : : * - binary output routine for type brin_bloom_summary.
833 : : *
834 : : * BRIN bloom summaries are serialized in a bytea value (although the
835 : : * type is named differently), so let's just send that.
836 : : */
837 : : Datum
838 : 0 : brin_bloom_summary_send(PG_FUNCTION_ARGS)
839 : : {
840 : 0 : return byteasend(fcinfo);
841 : : }
|