8.6. Command Line Examples
This chapter provides hands-on examples of some common tasks in GeoMesa, including the management of registered feature types in a data store, ingest of data, and export of data in a variety of formats.
8.6.1. Feature Type Management
8.6.1.1. Creating a feature type
To begin, let’s create a new feature type in GeoMesa with the create-schema command. The create-schema command takes three required flags and one optional flag:
Required
-c or --catalog: the name of the catalog table
-f or --feature-name: the name of the feature
-s or --spec: the SimpleFeatureType specification
Optional
--dtg: the default date attribute of the SimpleFeatureType
Run the command:
$ geomesa-accumulo create-schema -u <username> -p <password> \
-c cmd_tutorial \
-f feature \
-s fid:String:index=true,dtg:Date,geom:Point:srid=4326 \
--dtg dtg
This will create a new feature type, named “feature”, on the GeoMesa catalog table “cmd_tutorial”. The catalog table stores metadata information about each feature, and it will be used to prefix each table name in Accumulo.
If the above command was successful, you should see output similar to the following:
Creating 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326'. Just a few moments...
Feature 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326' successfully created.
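To make the structure of the --spec string easier to see, the following minimal sketch splits a simple spec into one line per attribute (name, type, and any options). It assumes a POSIX shell with tr and awk, and it only handles simple specs like the one above; the function name describe_spec is illustrative and is not part of GeoMesa.

```shell
# Split a simple SimpleFeatureType spec into "name<TAB>type<TAB>options" lines.
# Attributes are separated by commas; the parts of each attribute by colons.
describe_spec() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F: '{
    opts = ""
    # Everything after the type is treated as an option, e.g. index=true
    for (i = 3; i <= NF; i++) opts = opts (i > 3 ? ":" : "") $i
    printf "%s\t%s\t%s\n", $1, $2, (opts == "" ? "-" : opts)
  }'
}

describe_spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326'
```

Each output line corresponds to one attribute of the feature type, with "-" shown where an attribute has no options.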
Now that you’ve seen how to create feature types, create another feature type on catalog table “cmd_tutorial” using your own first name for the --feature-name and the above schema for the --spec.
8.6.1.2. Listing known feature types
You should now have two feature types on catalog table “cmd_tutorial”. To verify, we’ll use the get-type-names command. The get-type-names command takes one flag:
-c or --catalog: the name of the catalog table
Run the following command:
$ geomesa-accumulo get-type-names -u <username> -p <password> -c cmd_tutorial
The output text should be something like:
Listing features on 'cmd_tutorial'. Just a few moments...
2 features exist on 'cmd_tutorial'. They are:
feature
gdelt
8.6.1.3. Finding the attributes of a feature type
To find out more about the attributes of a feature type, we’ll use the describe-schema command. This command takes two flags:
-c or --catalog: the name of the catalog table
-f or --feature-name: the name of the feature type
Let’s find out more about the attributes on our first feature type. Run the command:
$ geomesa-accumulo describe-schema -u <username> -p <password> -c cmd_tutorial -f feature
The output should look like:
Describing attributes of feature 'cmd_tutorial_feature'. Just a few moments...
fid: String (Indexed)
dtg: Date (Time-index)
geom: Point (Geo-index)
8.6.1.4. Deleting a feature type
Continuing on, let’s delete the first feature type we created with the remove-schema command. The remove-schema command takes two flags:
-c or --catalog: the name of the catalog table
-f or --feature-name: the name of the feature to delete
Run the following command:
$ geomesa-accumulo remove-schema -u <username> -p <password> -c cmd_tutorial -f feature
NOTE: Running this command will take a bit longer than the previous two, as it will delete three tables in Accumulo, as well as remove the metadata rows in the catalog table associated with the feature.
The output should resemble the following:
Remove schema feature from catalog cmd_tutorial? (yes/no): yes
Starting
State change: CONNECTED
Removed feature
8.6.2. Ingesting Data
GeoMesa Tools is a set of command line tools that provide feature management functions, query planning and explanation, ingest, and export. In this tutorial, we’ll cover how to ingest and export features using GeoMesa Tools.
8.6.2.1. Getting Data
For this tutorial we will be using the GDELT data set, available here: http://data.gdeltproject.org/events/index.html. Download any daily data file, for example 20160119.export.CSV.zip, and unzip the file on your computer.
Note
The unpacked files have *.CSV extensions but the data within them are actually tab separated.
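One quick way to confirm the delimiter before ingesting is to count the tab-separated fields on each line. The following is a minimal sketch, assuming a POSIX shell with awk and sort; the function name tsv_field_counts and the tiny sample file are illustrative stand-ins, not part of GeoMesa or GDELT.

```shell
# Print the distinct numbers of tab-separated fields found per line.
# A single value in the output suggests a consistently tab-delimited file.
tsv_field_counts() {
  awk -F'\t' '{ print NF }' "$1" | sort -u
}

# Tiny stand-in file (made-up rows); substitute the unzipped GDELT file here.
sample="$(mktemp)"
printf '1\t20160119\tUSA\n2\t20160119\tFRA\n' > "$sample"
tsv_field_counts "$sample"
rm -f "$sample"
```

A file that is comma separated, with no tabs at all, would report a single field per line instead.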
8.6.2.2. Ingesting Features
The ingest command currently supports three formats: CSV, TSV, and SHP. The ingest command has the following required flags:
-u or --user: the Accumulo user
-c or --catalog: the name of the GeoMesa catalog table
-f or --feature-name: the name of the feature to ingest
One (not both) of the following flags must also be specified:
-p or --password: the Accumulo password
--keytab: path to a Kerberos keytab file
If -p (or --password) and --keytab are both omitted, then password authentication is assumed and the user is prompted for a password.
If $ACCUMULO_HOME does not contain the configuration of the Accumulo instance you wish to connect to, you also must specify the connection parameters for Accumulo:
-i or --instance: the Accumulo instance
-z or --zookeepers: a comma-separated list of Zookeeper hosts
The optional -C switch lets you specify a converter, defined in a JSON-based instruction file, that describes how to convert the data as GeoMesa reads it. The converter library handles many of the data transformations necessary to fit a raw data set into a simple feature type suitable for use in GeoMesa applications. Conversions can take advantage of functions such as concatenate() and stringToInteger(), as well as regular expressions. For more information see Setting up an Ingest Converter below.
The last argument, required for all ingest commands, is the path to the file to ingest. If ingesting CSV/TSV data this can be an HDFS path, specified by prefixing it with hdfs://.
8.6.2.3. Setting up an Ingest Converter
To use the -C switch, create (or edit) the file $GEOMESA_ACCUMULO_HOME/conf/application.conf, which serves as the converter configuration file, to add the gdelt SimpleFeatureType and a converter gdelt_tsv for reading the data from tab-separated value files:
geomesa {
  sfts {
    gdelt = {
      fields = [
        { name = globalEventId, type = String, index = false }
        { name = eventCode, type = String }
        { name = actor1, type = String }
        { name = actor2, type = String }
        { name = dtg, type = Date, index = true }
        { name = geom, type = Point, srid = 4326 }
      ]
    }
  }
  converters {
    gdelt_tsv = {
      type = delimited-text
      format = TDF
      id-field = "$1" // global event id
      fields = [
        { name = globalEventId, transform = "$1" }
        { name = eventCode, transform = "$27" }
        { name = actor1, transform = "$7" }
        { name = actor2, transform = "$17" }
        { name = dtg, transform = "date('yyyyMMdd', $2)" }
        { name = geom, transform = "point(stringToDouble($41, 0.0), $40::double)" }
      ]
    }
  }
}
The config file needs to have a SimpleFeatureType defined along with a converter that specifies instructions on how to turn the raw data file into that simple feature type. See GeoMesa Convert for more details on converters, including a full list of the transformation functions available (Transformation Function Overview).
This example uses the date() function to tell the parser what format the date column is in. The stringToDouble() function and the ::double cast give two different methods for type casting. The stringTo<dataType>() methods take the value to be cast as well as a prespecified default that will be returned if there is an exception, whereas the ::double cast will fail (and drop the record) if the casting fails.
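Before ingesting, it can help to look at the raw columns that the gdelt_tsv converter references. The sketch below is one way to do that with awk, whose 1-based $N field numbering lines up with the $N references in the converter. The function names and the generated sample row are hypothetical, and the numeric check is only a rough approximation of what the ::double cast accepts (it ignores forms such as scientific notation).

```shell
# Print the raw columns read by the gdelt_tsv converter, in the order of its
# fields list: $1, $27, $7, $17, $2, then the two point() arguments $41 and $40.
preview_gdelt_columns() {
  awk -F'\t' -v OFS='\t' '{ print $1, $27, $7, $17, $2, $41, $40 }' "$1"
}

# Count rows whose column 40 is not a plain decimal number. These are the rows
# one would expect the $40::double cast to reject (approximation only).
count_non_numeric_col40() {
  awk -F'\t' '$40 !~ /^-?[0-9]+(\.[0-9]+)?$/ { n++ } END { print n + 0 }' "$1"
}

# Made-up 41-column row in which every field holds its own column number.
sample="$(mktemp)"
awk 'BEGIN { for (i = 1; i <= 41; i++) printf "%s%s", i, (i < 41 ? "\t" : "\n") }' > "$sample"
preview_gdelt_columns "$sample"
count_non_numeric_col40 "$sample"
rm -f "$sample"
```

A non-zero count from the second function hints at how many records might be dropped on ingest, whereas bad values in column 41 would fall back to the 0.0 default given to stringToDouble().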
To confirm that GeoMesa can properly parse your edited $GEOMESA_ACCUMULO_HOME/conf/application.conf file, use geomesa-accumulo env:
$ geomesa-accumulo env -s gdelt --format spec
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools
Simple Feature Types:
gdelt = globalEventId:String,eventCode:String,actor1:String,actor2:String,dtg:Date:index=join,*geom:Point:srid=4326;geomesa.index.dtg='dtg',geomesa.table.sharing='false'
$ geomesa-accumulo env -c gdelt_tsv
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools
Simple Feature Type Converters:
converter-name=gdelt_tsv
fields=[
{
name=globalEventId
transform="$1"
},
{
name=eventCode
transform="$27"
},
{
name=actor1
transform="$7"
},
{
name=actor2
transform="$17"
},
{
name=dtg
transform="date('yyyyMMdd', $2)"
},
{
name=geom
transform="point(stringToDouble($41, 0.0), $40::double)"
}
]
format=TDF
# global event id
id-field="$1"
type=delimited-text
8.6.2.4. Downloading sample data
GeoMesa is packaged with a script for easily downloading publicly available data sets, along with a set of corresponding config files.
The currently available data sets are GDELT, GeoLife, OSM-GPX, T-Drive, GeoNames, NYCTaxi, GTD, and Twitter. The first five of these sets are easily downloadable via the provided script.
To download these sets, run the download script found in geomesa-tools/bin and provide the name of the data set desired. This can be one of gdelt, geolife, osm-gpx, tdrive, or geonames:
Example Usage:
$ ./download-data.sh geolife
Depending on the desired data, you may be prompted for further information to specify desired dates or locations.
The resulting data will then be downloaded to $GEOMESA_ACCUMULO_HOME/data.
Configuration files for these data sets are found under $GEOMESA_ACCUMULO_HOME/conf/sfts. Modifications to them can be seen by running geomesa-accumulo env and will be reflected in the next ingest run.
8.6.2.5. Running an Ingest
Now that we have everything ready, we will combine the various parameters into the following complete ingest command:
$ geomesa-accumulo ingest \
-u <username> -p <password> -i <instance> -z <zookeepers> \
-c gdelt -s gdelt -C gdelt_tsv --threads 1 \
/path/to/<gdelt-data-file>.csv
<username> and <password> are the credentials associated with the Accumulo instance. <instance> and <zookeepers> are the connection parameters for Accumulo, needed if these are not specified in the configuration files in $ACCUMULO_HOME.
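As a sketch of how these pieces might be combined in a script, the helper below only prints an ingest command rather than running it. The function name print_ingest_cmd and its argument order are hypothetical; the flags are the ones used above. It leaves out -p so that, as described earlier, the user is prompted for the password, and it leaves out -i and -z on the assumption that $ACCUMULO_HOME already holds the connection configuration.

```shell
# Print (do not run) an ingest command for a local input file.
# Arguments: $1 = Accumulo user, $2 = catalog table, $3 = feature type,
#            $4 = converter name, $5 = path to a local input file
print_ingest_cmd() {
  # Only local files are checked here; an hdfs:// path would need a different test.
  if [ ! -f "$5" ]; then
    echo "input file not found: $5" >&2
    return 1
  fi
  printf 'geomesa-accumulo ingest -u %s -c %s -s %s -C %s --threads 1 %s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Example with a temporary stand-in for an unzipped GDELT file.
data="$(mktemp)"
print_ingest_cmd myuser gdelt gdelt gdelt_tsv "$data"
rm -f "$data"
```

Printing the command first makes it easy to review the catalog, feature type, and converter names before any data is written.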
8.6.3. Exporting Features
Let’s export your newly ingested features in several file formats. Currently, the export command supports exports to CSV, TSV, Shapefile, GeoJSON, and GML. We’ll do one of each format in this next section.
The export command has three required flags:
-c or --catalog: the name of the catalog table
-f or --feature-name: the name of the feature to export
-F or --format: the output format (csv, tsv, shp, geojson, or gml)
Additionally, you can specify more details about the kind of export you would like to perform with optional flags for export:
-a or --attributes: the attributes of the feature to return
-m or --max-features: the maximum number of features to return in an export
-q or --query: a CQL query to perform on the features, to return only the subset of features matching the query
We’ll use the --max-features flag to ensure our dataset is small and quick to export. First, we’ll export to CSV with the following command:
$ geomesa-accumulo export -u <username> -p <password> -c gdelt -f gdelt -F csv -m 50
# or specifying Accumulo configuration explicitly:
$ geomesa-accumulo export \
-u <username> -p <password> -i <instance> -z <zookeepers> \
-c gdelt -f gdelt -F csv -m 50
This command will output the relevant rows to the console. Inspect the rows now, or pipe the output into a file for later review.
Now, run the above command four additional times, changing the --format flag to tsv, shp, geojson, and gml. The shp format also requires the -o option to specify the name of an output file.
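The four remaining exports can be generated with a small loop. The sketch below only prints the commands so they can be reviewed first; it uses the flags and format names from the lists above, keeps <username> and <password> as placeholders, and uses a made-up output file name (gdelt_export.shp) for the shp case. The function name print_export_cmds is illustrative.

```shell
# Print one export command per remaining format.
# Only the shp format gets the additional -o option.
print_export_cmds() {
  for fmt in tsv shp geojson gml; do
    cmd="geomesa-accumulo export -u <username> -p <password> -c gdelt -f gdelt -F $fmt -m 50"
    if [ "$fmt" = "shp" ]; then
      # Hypothetical output file name; choose any path you like.
      cmd="$cmd -o gdelt_export.shp"
    fi
    printf '%s\n' "$cmd"
  done
}

print_export_cmds
```

After replacing the placeholders, each printed line can be run as is, with the output of the non-shp formats redirected to a file if desired.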