14.7. FileSystem Command-Line Tools¶
The GeoMesa FileSystem distribution includes a set of command-line tools for feature management, ingest, export and debugging.
To install the tools, see Setting up the FileSystem Command Line Tools.
Once installed, the tools should be available through the command geomesa-fs
:
$ geomesa-fs
INFO Usage: geomesa-fs [command] [command options]
Commands:
...
Commands that are common to multiple back ends are described in Command-Line Tools. The commands here are FileSystem-specific.
14.7.1. General Arguments¶
Most commands require you to specify the file system used. This generally includes the root path that the files
are under. Specify the path with --path
(or -p
).
14.7.2. Commands¶
14.7.2.1. compact
¶
Compact one or more filesystem partitions.
Argument | Description |
---|---|
-f, --feature-name * |
The name of the schema |
--partitions |
Partitions to compact (omit to compact all partitions) |
--mode |
One of local or distributed (to use map/reduce) |
--temp-path |
Path to a temp directory used for working files |
The --temp-path
argument may be useful when working with s3
data, as s3
is slow to write to.
14.7.2.2. get-files
¶
Displays the files for one or more filesystem partitions.
Argument | Description |
---|---|
-f, --feature-name * |
The name of the schema |
--partitions |
Partitions to compact (omit to list all partitions) |
14.7.2.3. get-partitions
¶
Displays the partitions for a given filesystem store.
Argument | Description |
---|---|
-f, --feature-name * |
The name of the schema |
14.7.2.4. ingest
¶
For an overview of ingestion options, see ingest.
This command ingests files into a GeoMesa FS Datastore. Note that a “datastore” is simply a path in the filesystem. All data and metadata will be stored in the filesystem under the hierarchy of the root path.
Argument | Description |
---|---|
-e, --encoding * |
The encoding used for the underlying files. Implementations are provided for parquet and orc . |
--partition-scheme * |
Common partition scheme name (e.g. daily, z2) or path to a file containing a scheme config |
--num-reducers |
Number of reducers to use (required for distributed ingest) |
--leaf-storage |
Use leaf storage |
--temp-path |
Path to a temp directory used for working files |
--storage-opt |
Additional storage options, as key=value |
The --partition-scheme
argument should be the well-known name of a provided partition scheme, or the name
of a file containing a partition scheme. See Partition Schemes for more information.
The --num-reducers
should generally be set to half the number of partitions.
The --temp-path
argument may be useful when working with s3
data, as s3
is slow to write to.
14.7.2.4.1. Example¶
Lets say that we have all our data for 2016 stored in an S3 bucket:
$ geomesa-fs ingest \
-p 's3a://mybucket/datastores/test' \
-e parquet \
--partition-scheme daily,z2-2bits \
-s s3a://mybucket/schemas/my-config.conf \
-C s3a://mybucket/schemas/my-config.conf \
--temp-dir hdfs://namenode:port/tmp/gm/1 \
--num-reducers 20 \
's3a://mybucket/data/2016/*'
After ingest we expect to see a file structure with metadata and parquet files in S3 for our type named “myfeature”:
aws s3 ls --recursive s3://mybucket/datastores/test
datastores/test/myfeature/schema.sft
datastores/test/myfeature/metadata
datastores/test/myfeature/2016/01/01/0/0000.parquet
datastores/test/myfeature/2016/01/01/2/0000.parquet
datastores/test/myfeature/2016/01/01/3/0000.parquet
datastores/test/myfeature/2016/01/02/0/0000.parquet
datastores/test/myfeature/2016/01/02/1/0000.parquet
datastores/test/myfeature/2016/01/02/3/0000.parquet
Two metadata files (schema.sft
and metadata
) store information about the schema, partition scheme, and list of
files that have been created. Note that the list of created files allows the datastore to quickly compute available
files to avoid possibly expensive directly listings against the filesystem. You may need to run update-metadata
if you decide to insert new files.
Notice that the bucket “directory structure” includes year, month, day and then a 0,1,2,3 representing a quadrant of the Z2 Space Filling Curve with 2bit resolution (i.e. 0 = lower left, 1 = lower right, 2 = upper left, 3 = upper right). Note that in our example January 1st and 2nd both do not have all four quadrants represented. This means that the input dataset for that day didn’t have any data in that region of the world. If additional data were ingested, the directory and a corresponding file would be created.
14.7.2.5. update-metadata
¶
Recompute the list of partitions stored within the metadata file in a filesystem datastore. This metadata file is used at query time in lieu of performing repeated directory listings.