This note focuses on OLTP setting, not OLAP. In OLAP setting, we can just spam more CPUs to do parallel scan/aggregate/etc. In OLTP, we can also do that, but also bottlenecked by contention for isolation, number of queries, latency target, etc.
Also assuming that the queries cant use other highly selective fields (e.g. unique key, date, etc). If it can, then the low cardinality field can be checked on filter
step rather than access
step.
WHERE status is NULL
). Partial/Filtered index for example supported in PostgreSQL, VoltDB, Oracle, SQL Server, CockroachDB, YugabyteVALUES IN ('A', 'B', 'C', 'D', ...)
. Basically the same as partitioning by list. Physically partitioned, so queries can directly filter by this partitioning. Supported by both PostgreSQL, MySQL, CockroachDB, YugaByte, Oracle, SQL Server, VoltDBTypical in OTA industries, as most search result has many boolean check (e.g. HAS_SMOKE_AREA
, HAS_AC
, HAS_WIFI
, etc). Need to use bitmap index, and use partitioning, so do not need to use bitmask over every single record.
Tarantool and Lucene (and basically their users, Elasticsearch and Solr) has support for this.
Pilosa is a full blown distributed store, specialized in bitmap indexing.
Can also use bitmap index libraries, such as Roaring Bitmap or those using SIMD instructions, such as kelindar/bitmap. For few patterns to be aware of, see here