8.4. Using Converters with the Command-Line Tools
The GeoMesa binary distributions ship with prepackaged feature type and converter definitions for common data types including Twitter, GeoNames, T-drive, and several more. These converters can be used with the GeoMesa command-line tools out of the box. See Prepackaged Converter Definitions. In addition, common file formats such as GeoJSON, delimited text, or self-describing Avro can often be ingested without a converter. See ingest for details.
Users can add their own SimpleFeatureType (SFT) and converter definitions by providing a reference.conf file embedded in a JAR within the lib directory, or by adding the definitions to the application.conf file in the conf directory of the tools distribution.
Note
The example below is specific to the GeoMesa Accumulo distribution, but the general principle is the same for each distribution. Only the home variable and command-line tool name will differ depending on GeoMesa distribution.
Given the following sample CSV file example.csv:
ID,Name,Age,LastSeen,Friends,Lon,Lat
23623,Harry,20,2015-05-06,"Will, Mark, Suzan",-100.236523,23
26236,Hermione,25,2015-06-07,"Edward, Bill, Harry",40.232,-53.2356
3233,Severus,30,2015-10-23,"Tom, Riddle, Voldemort",3,-62.23
A “renegades” SFT and “renegades-csv” converter may be specified in the GeoMesa Tools configuration file ($GEOMESA_ACCUMULO_HOME/conf/application.conf) as shown below. By default, SFTs are loaded from the configuration path geomesa.sfts and converters are loaded from the path geomesa.converters. Each SFT and converter definition is keyed by a name, and that name is what the SFT and converter loaders use to reference it.
$GEOMESA_ACCUMULO_HOME/conf/application.conf:
geomesa = {
  sfts = {
    # other SFTs
    # ...
    "renegades" = {
      attributes = [
        { name = "fid",      type = "Integer",      index = false }
        { name = "name",     type = "String",       index = true  }
        { name = "age",      type = "Integer",      index = false }
        { name = "lastseen", type = "Date",         index = true  }
        { name = "friends",  type = "List[String]", index = true  }
        { name = "geom",     type = "Point",        index = true, srid = 4326, default = true }
      ]
    }
  }
  converters = {
    # other converters
    # ...
    "renegades-csv" = {
      type   = "delimited-text",
      format = "CSV",
      options {
        skip-lines = 1
      },
      id-field = "toString($fid)",
      fields = [
        { name = "fid",      transform = "$1::int" }
        { name = "name",     transform = "$2::string" }
        { name = "age",      transform = "$3::int" }
        { name = "lastseen", transform = "date('yyyy-MM-dd', $4)" }
        { name = "friends",  transform = "parseList('string', $5)" }
        { name = "lon",      transform = "$6::double" }
        { name = "lat",      transform = "$7::double" }
        { name = "geom",     transform = "point($lon, $lat)" }
      ]
    }
  }
}
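To make the field transforms concrete, the following Python sketch mimics, outside of GeoMesa, what the renegades-csv converter above computes for each data row of example.csv. It is an illustration under stated assumptions, not GeoMesa code: the function name convert_row and the dictionary layout are hypothetical, parseList is assumed to split on commas and trim surrounding whitespace, and the point geometry is shown as WKT text. The header line is left out of the sample data because skip-lines = 1 drops it before any transform runs.

```python
import csv
import io
from datetime import datetime

# Data rows of example.csv; the header is omitted because the converter
# skips it (skip-lines = 1).
DATA_ROWS = """\
23623,Harry,20,2015-05-06,"Will, Mark, Suzan",-100.236523,23
26236,Hermione,25,2015-06-07,"Edward, Bill, Harry",40.232,-53.2356
3233,Severus,30,2015-10-23,"Tom, Riddle, Voldemort",3,-62.23
"""


def convert_row(row):
    """Hypothetical helper: apply the converter's field transforms to one row."""
    # Converter columns are 1-based ($1..$7); Python lists are 0-based.
    fid = int(row[0])                                        # $1::int
    name = row[1]                                            # $2::string
    age = int(row[2])                                        # $3::int
    lastseen = datetime.strptime(row[3], "%Y-%m-%d").date()  # date('yyyy-MM-dd', $4)
    # Assumption: parseList splits on commas and trims whitespace.
    friends = [f.strip() for f in row[4].split(",")]         # parseList('string', $5)
    lon = float(row[5])                                      # $6::double
    lat = float(row[6])                                      # $7::double
    return {
        "id": str(fid),                    # id-field = "toString($fid)"
        "fid": fid,
        "name": name,
        "age": age,
        "lastseen": lastseen,
        "friends": friends,
        # lon and lat are intermediate fields only; the SFT stores geom.
        "geom": f"POINT ({lon} {lat})",    # point($lon, $lat), shown as WKT
    }


# csv.reader keeps the quoted Friends column together as a single field.
features = [convert_row(row) for row in csv.reader(io.StringIO(DATA_ROWS))]

for feature in features:
    print(feature)
```

Note that lon and lat appear in the converter's fields list but not in the SFT's attributes: they exist only so that the geom transform can reference them by name.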
Use geomesa-accumulo env to confirm that geomesa-accumulo ingest can properly read the updated file.
$ geomesa-accumulo env
Once the converter and SFT are registered, they can be used to ingest the example.csv file:
$ geomesa-accumulo ingest -u <user> -p <pass> -i <instance> -z <zookeepers> -s renegades -C renegades-csv example.csv
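When ingest runs are scripted, the same command line can be assembled programmatically. The Python sketch below is one possible way to do that and assumes nothing beyond the flags shown in the command above; the helper name build_ingest_command and all connection values are hypothetical placeholders. The actual invocation is left commented out because it requires the GeoMesa tools on the PATH and a reachable Accumulo instance.

```python
import shlex
import subprocess  # only needed if the commented-out call below is enabled


def build_ingest_command(user, password, instance, zookeepers,
                         sft, converter, files):
    """Hypothetical helper: build the argument list for geomesa-accumulo ingest."""
    return [
        "geomesa-accumulo", "ingest",
        "-u", user,
        "-p", password,
        "-i", instance,
        "-z", zookeepers,
        "-s", sft,          # SFT name as keyed under geomesa.sfts
        "-C", converter,    # converter name as keyed under geomesa.converters
        *files,
    ]


# Placeholder connection values; substitute real ones for your cluster.
cmd = build_ingest_command(
    user="myuser",
    password="mypass",
    instance="myinstance",
    zookeepers="zoo1,zoo2,zoo3",
    sft="renegades",
    converter="renegades-csv",
    files=["example.csv"],
)

# Print the command in shell-quoted form for review.
print(shlex.join(cmd))

# Uncomment to run the ingest for real:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems when passwords or file names contain spaces or special characters.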