21.7. Kudu Index Configuration¶
GeoMesa exposes a variety of configuration options that can be used to customize and optimize a given installation. This section contains Kudu-specific options; general options can be found under Index Configuration.
21.7.1. Default Index Creation¶
By default GeoMesa will only create a single Z3 (or XZ3) index for Kudu. Due to predicate push-downs, partition pruning and other advanced optimizations, queries on non-indexed fields are still quite fast. To create additional indices, specify them explicitly. See Customizing Index Creation for details.
Note that because they do not have an incrementing time field in the primary key, other indices may eventually run out of space. Initial partitioning is especially important in this case (see next).
21.7.2. Table Partitioning¶
Partitioning is import in Kudu, and GeoMesa supports both Kudu hash and range partitions. See Configuring Z-Index Shards and Configuring Attribute Index Shards for details on configuring shards, which will be translated to Kudu hash partitions. See Configuring Index Splits for details on configuring table splits, which will be translated to Kudu range partitions. By default, the Z3/XZ3 index will create a new range partition for every time period (week by default).
See the Kudu documentation for more details on Kudu partitioning.
21.7.3. Column Encoding¶
Kudu has multiple ways to encode data. Each column (attribute) in a schema is encoded separately. The best encoding will depend on the data being written; for example low-cardinality string columns work will with dictionary encoding. If nothing is specified, then the Kudu default encoding will be used based on the attribute type.
Valid encodings depend on the attribute type:
Attribute Type | Encodings | Default |
---|---|---|
Integer, Long, Date | plain, bit shuffle, run length | bit shuffle |
Float, Double | plain, bit shuffle | bit shuffle |
Boolean | plain, run length | run length |
String, Bytes, UUID, List, Map | plain, prefix, dictionary | dictionary |
Point | plain, bit shuffle | bit shuffle |
Non-point geometry | plain, prefix, dictionary | dictionary |
The encoding must be specified by its enumeration:
Encoding | Keyword |
---|---|
plain | PLAIN_ENCODING |
prefix | PREFIX_ENCODING |
run length | RLE |
dictionary | DICT_ENCODING |
bitshuffle | BIT_SHUFFLE |
Encodings are set in the attribute user data. See Setting Attribute Options for more information on how to configure attributes.
import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes;
SimpleFeatureTypes.createType("example", "name:String:encoding=DICT_ENCODING");
import org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes
SimpleFeatureTypes.createType("example", "name:String:encoding=DICT_ENCODING")
geomesa = {
sfts = {
example = {
attributes = [
{ name = "name", type = "String", encoding = "DICT_ENCODING" }
]
}
}
}
See the Kudu documentation for more information.
21.7.4. Column Compression¶
Kudu also allows compression on a per-column basis. Compression may be one of NO_COMPRESSION
, SNAPPY
,
LZ4
, or ZLIB
. If not specified, GeoMesa will default to LZ4
.
Note that columns that are bit-shuffle encoded are compressed as part of the bit-shuffle algorithm, so it is not recommended to compress them further. GeoMesa will ignore attempts to do so.
Compressions are set in the attribute user data. See Setting Attribute Options for more information on how to configure attributes.
import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes;
SimpleFeatureTypes.createType("example", "name:String:compression=SNAPPY");
import org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes
SimpleFeatureTypes.createType("example", "name:String:compression=SNAPPY")
geomesa = {
sfts = {
example = {
attributes = [
{ name = "name", type = "String", compression = "SNAPPY" }
]
}
}
}
See the Kudu documentation for more information.