19.9. FileSystem Metadata
The FileSystem data store (FSDS) stores metadata about partitions and data files so that it does not have to repeatedly interrogate the filesystem. When a data file is added or removed, an associated metadata entry is created to track the operation. See Configuring Metadata Persistence for information on how to configure the metadata.
19.9.1. File System Persistence
By default, metadata information is stored as a change log in the `metadata` folder under the root path for the FSDS. This is the simplest solution, as it does not require any additional infrastructure. However, the initial time required to read the metadata may be a limitation when dealing with a large number of partitions.
If the number of metadata files grows too large, they may be reduced by using the compact or manage-metadata command-line functions, and/or manually moved into sub-folders.
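To illustrate why a change log becomes slower to read as it grows, and what compaction buys, here is a minimal sketch in Python. It is not the FSDS implementation or its on-disk format: the `Change` record, the `replay` and `compact` functions, and the partition and file names are all hypothetical. The sketch only shows the general idea that current state is derived by replaying every entry, and that compaction rewrites the log to one entry per live file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical change-log entry; not the FSDS on-disk format."""
    action: str      # "add" or "remove"
    partition: str
    file: str


def replay(log):
    """Replay the log in order and return the live files per partition."""
    live = {}
    for change in log:
        files = live.setdefault(change.partition, set())
        if change.action == "add":
            files.add(change.file)
        elif change.action == "remove":
            files.discard(change.file)
        else:
            raise ValueError(f"unknown action: {change.action}")
    # Drop partitions whose files were all removed
    return {p: f for p, f in live.items() if f}


def compact(log):
    """Rewrite the log as a single 'add' entry per live file."""
    state = replay(log)
    return [Change("add", p, f) for p in sorted(state) for f in sorted(state[p])]


log = [
    Change("add", "2024/01/01", "a.parquet"),
    Change("add", "2024/01/01", "b.parquet"),
    Change("remove", "2024/01/01", "a.parquet"),
    Change("add", "2024/01/02", "c.parquet"),
]
compacted = compact(log)
```

In this toy example the four-entry log compacts to two entries while describing the same set of live files, so a later reader has less to replay.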
The file-based metadata may be specified by using the name `file`, and supports the following configuration options:
| Key | Description |
|---|---|
|  | The format for rendering the metadata files, either |
19.9.2. Relational Database Persistence
Alternatively, metadata may be stored in a relational database through JDBC. A relational database may be specified by using the name `jdbc`, and supports the following configuration options (required options are marked with `*`):
| Key | Description |
|---|---|
|  | The JDBC connection URL, e.g. |
|  | The fully-qualified name of a JDBC driver class, e.g. |
|  | The database user used to create connections |
|  | The password for the database user |
|  | The minimum number of connections to keep idle in the database connection pool |
|  | The maximum number of connections to keep idle in the database connection pool |
|  | The maximum size of the database connection pool |
|  | Enable fairness when retrieving from the database connection pool ( |
|  | Test connections when retrieving them from the database connection pool ( |
|  | Test connections when initially creating them ( |
|  | Test idle connections in the database connection pool ( |
Currently, Postgres and H2 are officially supported. Other databases may work, but have not been tested.
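The three pool-sizing options described above (minimum idle, maximum idle, and maximum pool size) are easiest to reason about when they are ordered consistently. The following Python sketch is a plausible sanity check to run before deploying a configuration. It is an assumption on our part, not something the FSDS is documented to enforce, and the function and parameter names are hypothetical rather than actual configuration keys.

```python
def check_pool_settings(min_idle: int, max_idle: int, max_size: int) -> list[str]:
    """Return a list of problems found in hypothetical pool settings.

    Assumes the conventional ordering min_idle <= max_idle <= max_size;
    this ordering is an assumption, not a documented FSDS requirement.
    """
    problems = []
    if max_size < 1:
        problems.append("max size must be at least 1")
    if min_idle < 0:
        problems.append("min idle must not be negative")
    if min_idle > max_idle:
        problems.append("min idle exceeds max idle")
    if max_idle > max_size:
        problems.append("max idle exceeds max size")
    return problems


ok = check_pool_settings(min_idle=2, max_idle=8, max_size=16)
bad = check_pool_settings(min_idle=10, max_idle=8, max_size=4)
```

Here `ok` is an empty list, while `bad` reports both ordering violations.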