15.10. Command Line Examples

This chapter provides hands-on examples of some common tasks in GeoMesa, including the management of registered feature types in a data store, ingest of data, and export of data in a variety of formats.

15.10.1. Feature Type Management

15.10.1.1. Creating a feature type

To begin, let’s start by creating a new feature type in GeoMesa with the create command. The create command takes three required and one optional flag:

Required

  • -c or --catalog: the name of the catalog table

  • -f or --feature-name: the name of the feature

  • -s or --spec: the SimpleFeatureType specification

Optional

  • --dtg: the default date attribute of the SimpleFeatureType

Run the command:

$ geomesa-accumulo create-schema -u <username> -p <password> \
-c cmd_tutorial \
-f feature \
-s fid:String:index=true,dtg:Date,geom:Point:srid=4326 \
-dtg dtg

This will create a new feature type, named “feature”, on the GeoMesa catalog table “cmd_tutorial”. The catalog table stores metadata information about each feature, and it will be used to prefix each table name in Accumulo.

If the above command was successful, you should see output similar to the following:

Creating 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326'. Just a few moments...
Feature 'cmd_tutorial_feature' with spec 'fid:String:index=true,dtg:Date,geom:Point:srid=4326' successfully created.

Now that you’ve seen how to create feature types, create another feature type on catalog table “cmd_tutorial” using your own first name for the --feature-name and the above schema for the --spec.

15.10.1.2. Listing known feature types

You should have two feature types on catalog table “cmd_tutorial”. To verify, we’ll use the list command. The list command takes one flag:

  • -c or --catalog: the name of the catalog table

Run the following command:

$ geomesa-accumulo get-type-names -u <username> -p <password> -c cmd_tutorial

The output text should be something like:

Listing features on 'cmd_tutorial'. Just a few moments...
2 features exist on 'cmd_tutorial'. They are:
feature
gdelt

15.10.1.3. Finding the attributes of a feature type

To find out more about the attributes of a feature type, we’ll use the describe command. This command takes two flags:

  • -c or --catalog: the name of the catalog table

  • -f or --feature-name: the name of the feature type

Let’s find out more about the attributes on our first feature type. Run the command

$ geomesa-accumulo describe-schema -u <username> -p <password> -c cmd_tutorial -f feature

The output should look like:

Describing attributes of feature 'cmd_tutorial_feature'. Just a few moments...
fid: String (Indexed)
dtg: Date (Time-index)
geom: Point (Geo-index)

15.10.1.4. Deleting a feature type

Continuing on, let’s delete the first feature type we created with the remove-schema command. The remove-schema command takes two flags:

  • -c or --catalog: the name of the catalog table

  • -f or --feature-name: the name of the feature to delete

Run the following command:

$ geomesa-accumulo remove-schema -u <username> -p <password> -c cmd_tutorial -f feature

NOTE: Running this command will take a bit longer than the previous two, as it will delete three tables in Accumulo, as well as remove the metadata rows in the catalog table associated with the feature.

The output should resemble the following:

Remove schema feature from catalog cmd_tutorial? (yes/no): yes
Starting
State change: CONNECTED
Removed feature

15.10.2. Ingesting Data

GeoMesa Tools is a set of command line tools to add feature management functions, query planning and explanation, ingest, and export abilities from the command line. In this tutorial, we’ll cover how to ingest and export features using GeoMesa Tools.

15.10.2.1. Getting Data

For this tutorial we will be using the GDELT data set, available here: http://data.gdeltproject.org/events/index.html. Download any daily data file, for example:

20160119.export.CSV.zip

and unzip the file on your computer.

Note

The unpacked files have *.CSV extensions but the data within them are actually tab separated.

15.10.2.2. Ingesting Features

The ingest command currently supports three formats: CSV, TSV, and SHP.

The ingest command has the following required flags:

  • -u or --user: the Accumulo user

  • -c or --catalog: the name of the GeoMesa catalog table

  • -f or --feature-name: the name of the feature to ingest

One (not both) of the following flags must also be specified:

  • -p or --password: the Accumulo password

  • --keytab: path to a Kerberos keytab file

If -p (or --password) and --keytab are both omitted, then password authentication is assumed and the user is prompted for a password.

If $ACCUMULO_HOME does not contain the configuration of the Accumulo instance you wish to connect to, you also must specify the connection parameters for Accumulo:

  • -i or --instance: the Accumulo instance

  • -z or --zookeepers: a comma-separated list of Zookeeper hosts

The optional -C switch lets you specify a converter defined in a JSON-based instruction file about how to convert the data as GeoMesa reads it. The converter library handles many of the data transformations necessary to fit a raw data set into a simple feature type suitable for use in GeoMesa applications. Conversions can take advantage of a variety of features such as concatenate() and stringToInteger() functions as well as the use of regular expressions. For more information see Setting up an Ingest Converter below.

The last argument that is required for all ingest commands is the path to the file to ingest. If ingesting CSV/TSV data this can be an HDFS path, specified by prefixing it with hdfs://.

15.10.2.3. Setting up an Ingest Converter

To use the -C switch, create (or edit) the file $GEOMESA_ACCUMULO_HOME/conf/application.conf, which serves as the converter configuration file, to add the gdelt SimpleFeatureType and a converter gdelt_csv for reading the data from tab-separated value files:

geomesa {
  sfts {
    gdelt = {
      fields = [
        { name = globalEventId, type = String, index = false}
        { name = eventCode, type = String }
        { name = actor1, type = String }
        { name = actor2, type = String }
        { name = dtg, type = Date, index = true }
        { name = geom, type = Point, srid = 4326 }
      ]
    }
  }
  converters {
    gdelt_tsv = {
      type = delimited-text
      format = TDF
      id-field = "$1" // global event id
      fields = [
        { name = globalEventId, transform = "$1" }
        { name = eventCode,     transform = "$27" }
        { name = actor1,        transform = "$7" }
        { name = actor2,        transform = "$17" }
        { name = dtg,           transform = "date('yyyyMMdd', $2)" }
        { name = geom,          transform = "point(stringToDouble($41, 0.0), $40::double)" }
      ]
    }
  }
}

The config file needs to have a SimpleFeatureType defined along with a converter that specifies instructions on how to turn the raw data file into that simple feature type. See GeoMesa Converters for a more details on converters, including a full list of the transformation functions available (Transformation Function Overview).

This example uses the date() function to tell the parser what date column is in. The stringToDouble() and ::double functions give two different methods for type casting. The stringTo<dataType>() methods take in the value to be cast as well as a prespecified default that will be returned if there is an exception, whereas the ::double function will fail (and drop the record) if the casting fails.

To confirm that GeoMesa can properly parse your edited $GEOMESA_ACCUMULO_HOME/conf/application.conf file, use geomesa-accumulo env:

$ geomesa-accumulo env -s gdelt --format spec
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools
Simple Feature Types:
gdelt = globalEventId:String,eventCode:String,actor1:String,actor2:String,dtg:Date:index=join,*geom:Point:srid=4326;geomesa.index.dtg='dtg'

$ geomesa-accumulo env -c gdelt_tsv
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa/tools

Simple Feature Type Converters:
converter-name=gdelt_tsv
fields=[
    {
        name=globalEventId
        transform="$1"
    },
    {
        name=eventCode
        transform="$27"
    },
    {
        name=actor1
        transform="$7"
    },
    {
        name=actor2
        transform="$17"
    },
    {
        name=dtg
        transform="date('yyyyMMdd', $2)"
    },
    {
        name=geom
        transform="point(stringToDouble($41, 0.0), $40::double)"
    }
]
format=TDF
# global event id
id-field="$1"
type=delimited-text

15.10.2.4. Downloading sample data

Packaged with geomesa script for easily downloading publicly available data sets and a set of corresponding config files.

The currently available data sets are GDELT, GeoLife, OSM-GPX, T-Drive, GeoNames, NYCTaxi, GTD, and Twitter. The first five of these sets are easily downloadable via a provided script.

To download these sets, run the download script found in geomesa-tools/bin and provide the name of the data set desired. This can be one of gdelt, geolife, osm-gpx, tdrive, or geonames:

Example Usage:

$ ./download-data.sh geolife

Depending on the desired data, you may be prompted further information to specify desired dates or locations.

The resulting data will then be downloaded to $GEOMESA_ACCUMULO_HOME/data.

Configuration files for these data sets are found under $GEOMESA_ACCUMULO_HOME/conf/sfts. Modifications to them can seen by running geomesa-accumulo env and will be reflected in the next run ingest.

15.10.2.5. Running an Ingest

Now that we have everything ready, we will now combine the various parameters into the following complete ingest command:

$ geomesa-accumulo ingest \
   -u <username> -p <password> -i <instance> -z <zookeepers> \
   -c gdelt -s gdelt -C gdelt_tsv --threads 1 \
   /path/to/<gdelt-data-file>.csv

<username> and <password> are the credentials associated with the Accumulo instance. <instance> and <zookeepers> are the connection parameters for Accumulo, if this is not specified in the configuration files in $ACCUMULO_HOME.

15.10.3. Exporting Features

Let’s export your newly ingested features in a couple of file formats. Currently, the export command supports exports to CSV, TSV, Shapefile, GeoJSON, and GML. We’ll do one of each format in this next section.

The export command has 3 required flags:

  • -c or --catalog: the name of the catalog table

  • -f or --feature-name: the name of the feature to export

  • -F or --format: the output format (csv, tsv, shp, geojson, or gml)

Additionally, you can specify more details about the kind of export you would like to perform with optional flags for export:

  • -a or --attributes: the attributes of the feature to return

  • -m or --max-features: the maximum number of features to return in an export

  • -q or --query: a CQL query to perform on the features, to return only subset of features matching the query

We’ll use the --max-features flag to ensure our dataset is small and quick to export. First, we’ll export to CSV with the following command:

$ geomesa-accumulo export -u <username> -p <password> -c gdelt_Ukraine -fn gdelt -fmt csv -max 50
# or specifying Accumulo configuration explicitly:
$ geomesa-accumulo export \
    -u <username> -p <password> -i <instance> -z <zookeepers> \
    -c gdelt -f gdelt -f csv -m 50

This command will output the relevant rows to the console. Inspect the rows now, or pipe the output into a file for later review.

Now, run the above command four additional times, changing the --format flag to tsv, shp, json, and gml. The shp format also requires the -o option to specify the name of an output file.