15.2. Partition Schemes¶
Partition Schemes can be defined using a JSON/typesafe configuration or by a common name. By default these schemes all utilize “leaf storage” with a one up sequence number. This allows for appending data when it arrives out of order.
Several commonly used schemes are available and can be referenced by name rather than config:
- minute -
yyyy/MM/dd/HH/mm
- hourly -
yyyy/MM/dd/HH
- daily -
yyyy/MM/dd
- weekly -
yyyy/ww
- monthly -
yyyy/MM
- julian-minute -
yyyy/DDD/HH/mm
- julian-hourly -
yyyy/DDD/HH
- julian-daily -
yyyy/DDD
Common geometry schemes are:
- z2-2bit
- z2-4bit
Common combined schemes are created by using a comma. For example:
- hourly,z2-2bit
- julian-daily,z2-2bit
15.2.1. Leaf vs Bucket Storage¶
As previously mentioned, the default storage mode is leaf storage. Leaf storage means that the last component of the
partition scheme is the prefix of the parquet file name. A sequence number is then appended to the filename before the
file extension. For example, a partition scheme of yyyy/MM/dd
would produce the storage path
2016/01/01_0000.parquet
for the first file created for the partition 2016/01/01
. The next file (created by a
new ingest process) for the same partition would result in a file named 2016/01/01_0001.parquet
and then
2016/01/01_0002.parquet
and so on.
By contrast, bucket storage uses a directory for the final component of the scheme. For our partition example above
the storage paths would be 2016/01/01/0001.parquet
, 2016/01/01/0002.parquet
, and 2016/01/01/0003.parquet
.
Leaf storage results in less directory overhead for filesystems such as S3.
15.2.2. JSON Configuration¶
You can manually configure the datetime and z2 geometry schemes using typesafe and json configuration.
15.2.2.1. Common Options¶
- scheme - The names of the partition schemes separated by commas
- leaf-storage - Defines if the partition scheme are files or directories. Default:
true
- true - The right most component of the scheme will be a file with a sequence number
- false - The rightmost component of the scheme is a directory with sequence files
15.2.2.2. DateTime Configuration¶
The datetime scheme is indicated by the key name = "datetime"
and uses the following options:
- datetime-format - A Java DateTime format string separated by forward slashes which will be used to build a
directory structure. Examples are:
- yyyy/DDD/HH - year, Julian Date, hour layout
- yyyy/MM/dd - year, month, day layout
- yyyy/MM/dd/HH - year, month, day, hour layout
- step-unit - A
java.time.temporal.ChronoUnit
defining how to increment the leaf of the partition scheme - step - The step amount to increment the leaf of the partition scheme. Default:
1
- dtg-attribute - The datetime attribute from the SimpleFeature to use for partitioning data. Defaults to the default dtg attribute of the SimpleFeatureType
{
"scheme" : "datetime,z2",
"options" : {
"datetime-format" : "yyyy/MM/dd/HH",
"step-unit" : "HOURS",
"step" : "1",
"dtg-attribute" : "dtg",
"leaf-storage" : true
}
}
15.2.2.3. Z2 Configuration¶
The z2 scheme is indicate by the key name = "z2"
and uses the following options:
- geom-attribute - The geometry attribute from the SimpleFeature to use for partitioning data. Defaults to the default dtg value of the SimpleFeatureType
- z2-resolution - The number of bits to use for z indexing.
{
"scheme" : "z2",
"options" : {
"geom-attribute" : "geom",
"z2-resolution" : 2,
"leaf-storage" : false
}
}
15.2.2.4. Combined DateTime and Z2 Configuration¶
The datetime-z2 scheme is indicated by the key name = "datetime,z2"
and uses the combined options
from the datetime and z2 schemes.
{
"scheme" : "datetime,z2",
"options" : {
"datetime-format" : "yyyy/MM/dd/HH",
"step-unit" : "HOURS",
"step" : 1,
"dtg-attribute" : "dtg",
"geom-attribute" : "geom",
"z2-resolution" : 2,
"leaf-storage" : true
}
}