20. GeoMesa Processes¶
The following analytic processes are available and optimized for GeoMesa data stores. They are found in the geomesa-process module:
- ArrowConversionProcess - encodes simple features in the Apache Arrow format
- BinConversionProcess - encodes simple features in a minimized 16-byte format
- DensityProcess - computes a density heatmap for a CQL query
- DateOffsetProcess - modifies the specified date field in a feature collection by an input time period.
- HashAttributeProcess/HashAttributeColorProcess - computes an additional ‘hash’ attribute which is useful for styling.
- JoinProcess - merges features from two different schemas using a common attribute field
- KNearestNeighborProcess - performs a KNN search
- Point2PointProcess - aggregates a collection of points into a collection of line segments
- ProximitySearchProcess - searches near a set of input features
- QueryProcess - performs a GeoMesa query, useful as input for nested requests
- RouteSearchProcess - matches features traveling along a given route
- SamplingProcess - uses statistical sampling to reduce the features returned by a query
- StatsProcess - returns various stats for a CQL query
- TrackLabelProcess - selects the last feature in a track based on a common attribute, useful for styling
- TubeSelectProcess - performs a correlated search across time and space
- UniqueProcess - identifies unique values for an attribute
Where possible, the calculations are pushed out to a distributed system for faster performance. Currently this has been implemented in the Accumulo data store and partially in the HBase data store. Other back-ends can still be used, but the calculations are then performed locally.
20.1. Installation¶
While the processes can be used independently, the common use case is to run them in GeoServer. Deploying them in GeoServer requires:
- a GeoMesa datastore plugin
- the GeoServer WPS extension
- the geomesa-process-wps_2.11-<version>.jar deployed in ${GEOSERVER_HOME}/WEB-INF/lib
Note
Some processes also require custom output formats, available separately in the GPL-licensed GeoMesa GeoServer WFS module.
The GeoMesa datastore plugin and GeoMesa process jars are both available in the binary distribution in the gs-plugins directory.
Documentation about the GeoServer WPS Extension (including download instructions) is available here.
To verify the install, start GeoServer. The log should contain a line like:
INFO [geoserver.wps] - Found 15 bindable processes in GeoMesa Process Factory
In the GeoServer web UI, click ‘Demos’ and then ‘WPS request builder’. From the request builder, under ‘Choose Process’, click on any of the ‘geomesa:’ options to build up example requests and in some cases see results.
20.2. Processors¶
20.2.1. ArrowConversionProcess¶
The ArrowConversionProcess converts an input feature collection to Apache Arrow format.
Parameters | Description |
---|---|
features | Input feature collection to encode. |
includeFids | Include feature IDs in arrow file. |
dictionaryFields | Attributes to dictionary encode. |
useCachedDictionaries | Use cached top-k stats (if available), or run a dynamic stats query to build dictionaries. |
sortField | Attribute to sort by. |
sortReverse | Reverse the default sort order. |
batchSize | Number of features to include in each record batch. |
doublePass | Build dictionaries first, then query results in a separate scan. |
20.2.2. BinConversionProcess¶
The BinConversionProcess converts an input feature collection to BIN format.
Parameters | Description |
---|---|
features | Input feature collection to query. |
track | Track field to use for BIN records. |
geom | Geometry field to use for BIN records. |
dtg | Date field to use for BIN records. |
label | Label field to use for BIN records. |
axisOrder | Axis order of the encoded coordinates (latlon or lonlat). |
20.2.3. DensityProcess¶
The DensityProcess computes a density map over a set of features stored in GeoMesa. A raster image is returned.
Parameters | Description |
---|---|
data | Input Simple Feature Collection to run the density process over. |
radiusPixels | Radius of the density kernel in pixels. Controls the “fuzziness” of the density map. |
weightAttr | Name of the attribute to use for data point weights. |
outputBBOX | Bounding box and CRS of the output raster. |
outputWidth | Width of the output raster in pixels. |
outputHeight | Height of the output raster in pixels. |
20.2.4. DateOffsetProcess¶
The DateOffsetProcess modifies the specified date field in a feature collection by an input time period.
Parameters | Description |
---|---|
data | Input features. |
dateField | The date attribute to modify. |
timeOffset | Time offset (e.g. P1D). |
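As a rough illustration of what the three parameters mean, the following Python sketch shifts a date field by an ISO 8601 style period such as P1D. It is not GeoMesa's implementation: the function names are hypothetical, features are modeled as plain dictionaries, and only day, hour, minute and second components are parsed.

```python
import re
from datetime import datetime, timedelta

# Subset of ISO 8601 durations: days, hours, minutes, seconds only.
# Calendar units (years, months, weeks) are left out of this sketch.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

def parse_offset(text):
    """Convert a duration such as 'P1D' or 'PT12H' into a timedelta."""
    match = _DURATION.match(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)

def offset_dates(features, date_field, time_offset):
    """Return copies of the features with date_field shifted by time_offset."""
    delta = parse_offset(time_offset)
    return [{**f, date_field: f[date_field] + delta} for f in features]

features = [{"id": "a", "Date": datetime(2016, 5, 2, 18, 0, 44)}]
shifted = offset_dates(features, "Date", "P1D")
print(shifted[0]["Date"])  # 2016-05-03 18:00:44
```

The input features are left unmodified; the shifted copies are what a downstream process or style would consume.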
20.2.5. HashAttributeProcess¶
The HashAttributeProcess adds an attribute to each SimpleFeature containing the hash of the configured attribute modulo the configured divisor.
Parameters | Description |
---|---|
data | Input Simple Feature Collection to run the hash process over. |
attribute | The attribute to hash on. |
modulo | The divisor. |
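The idea of hashing an attribute modulo a divisor can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` is a stand-in chosen because it is stable across runs, and the hash function GeoMesa actually uses may differ, so the values will not match those returned by the process. Features are modeled as plain dictionaries and `hash_attribute` is a hypothetical name.

```python
import zlib

def hash_attribute(features, attribute, modulo):
    """Add a 'hash' property: a stable hash of the attribute value, modulo `modulo`."""
    out = []
    for feature in features:
        value = str(feature[attribute]).encode("utf-8")
        # zlib.crc32 is a stand-in; GeoMesa's actual hash function may differ.
        out.append({**feature, "hash": zlib.crc32(value) % modulo})
    return out

features = [{"CabId": 150002}, {"CabId": 150003}, {"CabId": 150002}]
hashed = hash_attribute(features, "CabId", 256)
```

Because equal attribute values always produce the same hash, and the result is bounded by the divisor, a style can map each hash value to one of a fixed number of colors.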
20.2.5.1. Hash example (XML)¶
HashAttributeProcess_wps.xml is a GeoServer WPS call to the GeoMesa HashAttributeProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@HashAttributeProcess_wps.xml localhost:8080/geoserver/wps
The query should generate results that look like this:
{
"id" : "d0971735-f8fe-47ed-a7cd-2e12280e8ac1",
"geometry" : {
"coordinates" : [
151.1554,
18.2014
],
"type" : "Point"
},
"type" : "Feature",
"properties" : {
"Vitesse" : 614,
"Heading" : 244,
"Date" : "2016-05-02T18:00:44.030+0000",
"hash" : 237,
"CabId" : 150002
}
}
20.2.6. HashAttributeColorProcess¶
The HashAttributeColorProcess adds an attribute to each SimpleFeature containing the hash of the configured attribute modulo the configured divisor, and emits a color.
Parameters | Description |
---|---|
data | Input Simple Feature Collection to run the hash process over. |
attribute | The attribute to hash on. |
modulo | The divisor. |
20.2.7. JoinProcess¶
The JoinProcess queries a feature type based on attributes from a second feature type.
Parameters | Description |
---|---|
primary | Primary feature collection being queried. |
secondary | Secondary feature collection to be joined. |
joinAttribute | Attribute field to join on. |
joinFilter | Additional filter to apply to joined features. |
attributes | Attributes to return. Attribute names should be qualified with the schema name, e.g. foo.bar. |
20.2.8. KNearestNeighborProcess¶
The KNearestNeighborProcess performs a K Nearest Neighbor search on a GeoMesa feature collection using another feature collection as input. It returns k neighbors for each point in the input data set. If a point is the nearest neighbor of multiple points of the input data set, it is returned only once.
Parameters | Description |
---|---|
inputFeatures | Input feature collection that defines the KNN search. |
dataFeatures | The data set to query for matching features. |
numDesired | K : number of nearest neighbors to return. |
estimatedDistance | Estimate of the search distance in meters for K neighbors. Used to set the granularity of the search. |
maxSearchDistance | Maximum search distance in meters. Used to prevent runaway queries of the entire table. |
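The semantics described above (k neighbors per input point, shared neighbors returned once, and a hard distance cap) can be sketched with a brute-force search. This is not how GeoMesa executes the query; it scans every data point, ignores estimatedDistance (which only tunes search granularity), and uses hypothetical function names with points given as (lon, lat) tuples.

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def k_nearest(input_points, data_points, num_desired, max_search_distance):
    """Return the union of the k nearest data points for every input point."""
    found = {}
    for query in input_points:
        ranked = sorted(
            (haversine_m(query, p), idx) for idx, p in enumerate(data_points)
        )
        for dist, idx in ranked[:num_desired]:
            if dist <= max_search_distance:
                # Keying by index means a neighbor shared by several
                # input points appears only once in the result.
                found[idx] = data_points[idx]
    return [found[idx] for idx in sorted(found)]

inputs = [(0.0, 0.0), (0.001, 0.0)]
data = [(0.0005, 0.0), (1.0, 1.0), (0.002, 0.0)]
nearest = k_nearest(inputs, data, num_desired=1, max_search_distance=1000.0)
```

Here both input points share the same nearest neighbor, so `nearest` contains a single point.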
20.2.8.1. K-Nearest-Neighbor example (XML)¶
KNNProcess_wps.xml is a GeoServer WPS call to the GeoMesa KNearestNeighborProcess. It is chained here with a Query process (see Chaining Processes) so that points sharing the same Id are not matched by the request. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@KNNProcess_wps.xml localhost:8080/geoserver/wps
20.2.9. Point2PointProcess¶
The Point2PointProcess aggregates a collection of points into a collection of line segments.
Parameters | Description |
---|---|
data | Input feature collection. |
groupingField | Field on which to group. |
sortField | Field on which to sort (must be Date type). |
minimumNumberOfPoints | Minimum number of points. |
breakOnDay | Break connections on day marks. |
filterSingularPoints | Filter out segments that fall on the same point. |
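A simplified Python sketch of the aggregation is shown below: points are grouped, sorted by date, and each consecutive pair becomes a segment. It is an approximation for illustration, with hypothetical names and dictionary-based features. The breakOnDay option is not modeled, and the `_start`/`_end` property naming simply mirrors the sample output in this section.

```python
from collections import defaultdict
from datetime import datetime

def point_to_point(points, grouping_field, sort_field,
                   minimum_points=2, filter_singular_points=True):
    """Connect consecutive points of each group into two-point segments."""
    groups = defaultdict(list)
    for point in points:
        groups[point[grouping_field]].append(point)

    segments = []
    for key, members in groups.items():
        if len(members) < minimum_points:
            continue  # too few points to form a track
        members.sort(key=lambda p: p[sort_field])
        for start, end in zip(members, members[1:]):
            if filter_singular_points and start["geometry"] == end["geometry"]:
                continue  # segment would start and end on the same point
            segments.append({
                grouping_field: key,
                "coordinates": [start["geometry"], end["geometry"]],
                f"{sort_field}_start": start[sort_field],
                f"{sort_field}_end": end[sort_field],
            })
    return segments

points = [
    {"CabId": 1, "Date": datetime(2018, 2, 5, 14, 54), "geometry": (-13.4041, 37.8068)},
    {"CabId": 1, "Date": datetime(2018, 2, 5, 14, 53), "geometry": (-13.4041, 37.8067)},
    {"CabId": 2, "Date": datetime(2018, 2, 5, 14, 53), "geometry": (0.0, 0.0)},
]
segments = point_to_point(points, "CabId", "Date")
```

The single point for CabId 2 is dropped because it falls below the minimum number of points.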
20.2.9.1. Point2Point example (XML)¶
Point2PointProcess_wps.xml is a GeoServer WPS call to the GeoMesa Point2PointProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@Point2PointProcess_wps.xml localhost:8080/geoserver/wps
The query should generate results that look like this:
{
"id" : "367152240-4",
"geometry" : {
"coordinates" : [
[
-13.4041,
37.8067
],
[
-13.4041,
37.8068
]
],
"type" : "LineString"
},
"type" : "Feature",
"properties" : {
"Date_end" : "2018-02-05T14:54:36.598+0000",
"CabId" : 367152240,
"Date_start" : "2018-02-05T14:53:58.078+0000"
}
}
20.2.10. ProximitySearchProcess¶
The ProximitySearchProcess performs a proximity search on a GeoMesa feature collection using another feature collection as input.
Parameters | Description |
---|---|
inputFeatures | Input feature collection that defines the proximity search. |
dataFeatures | The data set to query for matching features. |
bufferDistance | Buffer size in meters. |
20.2.10.1. Proximity search example (XML)¶
ProximitySearchProcess_wps.xml is a GeoServer WPS call to the GeoMesa ProximitySearchProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@ProximitySearchProcess_wps.xml localhost:8080/geoserver/wps
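The curl calls in this section can also be issued from a script. The following Python sketch builds the same POST request using only the standard library. It assumes the endpoint is reachable over plain HTTP at the host and port used in the curl examples and that the default admin:geoserver credentials apply; `build_wps_request` and `run_wps` are hypothetical helper names.

```python
import base64
import urllib.request

def build_wps_request(xml_bytes, url="http://localhost:8080/geoserver/wps",
                      user="admin", password="geoserver"):
    """Build (but do not send) the POST request that the curl call issues."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_bytes,
        method="POST",
        headers={"Content-Type": "text/xml", "Authorization": f"Basic {token}"},
    )

def run_wps(path):
    """Send the request file at `path` and return the raw response body."""
    with open(path, "rb") as handle:
        request = build_wps_request(handle.read())
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `run_wps("ProximitySearchProcess_wps.xml")` would post that file and return the response bytes, provided a GeoServer instance is running.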
20.2.11. RouteSearchProcess¶
The RouteSearchProcess finds features around a route that are heading along the route and not just crossing over it.
Parameters | Description |
---|---|
features | Input feature collection to query. |
routes | Routes to search along. Features must have a geometry of LineString. |
bufferSize | Buffer size (in meters) to search around the route. |
headingThreshold | Threshold for comparing headings, in degrees. |
routeGeomField | Attribute that will be examined for routes to match. Must be a LineString. |
geomField | Attribute that will be examined for route matching. |
bidirectional | Consider the direction of the route or just the path of the route. |
headingField | Attribute that will be examined for heading in the input features. If not provided, input features geometries must be LineStrings. |
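The interplay between headingThreshold and bidirectional can be illustrated with a small heading comparison. This is a sketch under assumptions, not GeoMesa's code: headings are taken to be compass degrees, and `bidirectional=True` is assumed to mean that a feature travelling in either direction along the route is accepted. The function names are hypothetical.

```python
def heading_difference(a, b):
    """Smallest absolute angle in degrees between two compass headings."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def heading_matches(feature_heading, route_heading, threshold, bidirectional=False):
    """Check whether a feature is heading along a route segment."""
    if heading_difference(feature_heading, route_heading) <= threshold:
        return True
    if bidirectional:
        # Assumption: also accept travel in the opposite direction.
        reverse = (route_heading + 180.0) % 360.0
        return heading_difference(feature_heading, reverse) <= threshold
    return False
```

Comparing the wrapped difference rather than the raw subtraction matters near north: headings of 350 and 10 degrees are only 20 degrees apart, so they match with a threshold of 30.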
20.2.11.1. Route search example (XML)¶
RouteSearchProcess_wps.xml is a GeoServer WPS call to the GeoMesa RouteSearchProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@RouteSearchProcess_wps.xml localhost:8080/geoserver/wps
20.2.12. SamplingProcess¶
The SamplingProcess uses statistical sampling to reduce the features returned by a query.
Parameters | Description |
---|---|
data | Input features. |
samplePercent | Percent of features to return, between 0 and 1. |
threadBy | Attribute field to link associated features for sampling. |
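To show how samplePercent and threadBy interact, the sketch below keeps every n-th feature, counted separately for each threadBy value so that every track stays represented. Every-n-th selection is a stand-in chosen for determinism; the sampling strategy GeoMesa applies is not described here and may differ. The `sample` function and dictionary-based features are hypothetical.

```python
from collections import defaultdict

def sample(features, sample_percent, thread_by=None):
    """Keep roughly sample_percent of the features, per thread_by value."""
    if not 0 < sample_percent <= 1:
        raise ValueError("sample_percent must be in (0, 1]")
    nth = max(1, round(1 / sample_percent))
    seen = defaultdict(int)  # features encountered so far, per group
    kept = []
    for feature in features:
        key = feature[thread_by] if thread_by else None
        if seen[key] % nth == 0:
            kept.append(feature)
        seen[key] += 1
    return kept

features = [{"track": "a", "n": i} for i in range(10)]
features += [{"track": "b", "n": i} for i in range(5)]
by_track = sample(features, 0.2, thread_by="track")
```

With a sample percent of 0.2, track "a" contributes two of its ten features and track "b" contributes one of its five.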
20.2.12.1. Sampling example (XML)¶
SamplingProcess_wps.xml is a GeoServer WPS call to the GeoMesa SamplingProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@SamplingProcess_wps.xml localhost:8080/geoserver/wps
20.2.13. StatsProcess¶
The StatsProcess runs statistics on a given feature set.
Parameters | Description |
---|---|
features | The feature set on which to query. Can be a raw text input, reference to a remote URL, a subquery or a vector layer. |
statString | Stat string indicating which stats to instantiate. See Stat Strings for details. |
encode | Return the values encoded as json. Must be true or false ; empty values will not work. |
properties | The properties / transforms to apply before gathering stats. |
20.2.13.1. Stat Strings¶
Stat strings are a GeoMesa domain specific language (DSL) that allows the specification of stats for the iterators to collect. The available stat functions are listed below:
Note
Items marked with * are attribute names, either from your SimpleFeatureType (sft) or produced by a transformation or projection.
Note
A TimePeriod is defined as one of the following strings: “day”, “week”, “month”, “year”.
Syntax | Parameters | Description |
---|---|---|
Count | ||
Count() | | Counts the number of features. |
MinMax | ||
MinMax(attribute) | | Finds the min and max values of the given attribute. |
GroupBy | ||
GroupBy(attribute,stat) | | Groups stats by the given attribute and then runs the given stat on each group. Any stat can be provided. |
Descriptive Stats | ||
DescriptiveStats(attribute) | | Runs single-pass stats on the given attribute, calculating stats describing the attribute such as: count; min; max; mean; and population and sample versions of variance, standard deviation, kurtosis, excess kurtosis, covariance, and correlation. |
Enumeration | ||
Enumeration(attribute) | | Enumerates the values of the given attribute and the number of occurrences of each. |
TopK | ||
TopK(attribute) | | Finds the most frequently occurring values of the given attribute. |
Histogram | ||
Histogram(attribute,numBins,lower,upper) | | Provides a histogram of the given attribute, binning the results into an array with numBins bins and with lower and upper as its bounds. |
Frequency | ||
Frequency(attribute,dtg,period,precision) | | Estimates frequency counts at scale. |
Z3Histogram | ||
Z3Histogram(geom,dtg,period,length) | | Provides a histogram similar to Histogram but treats the geometry and date attributes as a single value. |
Z3Frequency | ||
Z3Frequency(geom,dtg,period,precision) | | Provides a frequency estimate similar to Frequency but treats the geometry and date attributes as a single value. |
Iterator Stack | ||
IteratorStackCount() | | Keeps track of the number of times Accumulo sets up an iterator stack as a result of a query. |
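For readers unfamiliar with these stats, the following Python sketch reproduces the meaning of Count, MinMax, Enumeration and TopK over an in-memory list of features. It only illustrates what each stat reports. It does not use the stat string DSL or GeoMesa's distributed implementation, and the function names and dictionary-based features are hypothetical.

```python
from collections import Counter

def count(features):
    """Count(): number of features."""
    return len(features)

def min_max(features, attribute):
    """MinMax(attribute): smallest and largest value of the attribute."""
    values = [f[attribute] for f in features]
    return min(values), max(values)

def enumeration(features, attribute):
    """Enumeration(attribute): each value with its number of occurrences."""
    return Counter(f[attribute] for f in features)

def top_k(features, attribute, k):
    """TopK(attribute): the k most frequent values, most frequent first."""
    return enumeration(features, attribute).most_common(k)

features = [
    {"Who": "Addams", "What": 29},
    {"Who": "Bierce", "What": 931},
    {"Who": "Bierce", "What": 508},
    {"Who": "Clemens", "What": 991},
    {"Who": "Clemens", "What": 77},
    {"Who": "Clemens", "What": 640},
]
```

For this sample data, `min_max(features, "What")` gives `(29, 991)` and `top_k(features, "Who", 2)` gives `[("Clemens", 3), ("Bierce", 2)]`.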
20.2.14. TrackLabelProcess¶
The TrackLabelProcess returns a single feature that is the head of a track of related simple features.
Parameters | Description |
---|---|
data | Input features. |
track | Track attribute to use for grouping features. |
dtg | Date attribute to use for ordering tracks. |
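Selecting the head of each track amounts to keeping, for every value of the track attribute, the feature with the latest date. A minimal Python sketch follows, with a hypothetical function name and dictionary-based features. How GeoMesa breaks ties between equal dates is not covered here; this sketch keeps the first such feature it sees.

```python
from datetime import datetime

def track_labels(features, track, dtg):
    """Return the most recent feature of every track."""
    latest = {}
    for feature in features:
        key = feature[track]
        if key not in latest or feature[dtg] > latest[key][dtg]:
            latest[key] = feature
    return list(latest.values())

features = [
    {"CabId": 1, "Date": datetime(2018, 2, 5, 14, 53)},
    {"CabId": 1, "Date": datetime(2018, 2, 5, 14, 54)},
    {"CabId": 2, "Date": datetime(2018, 2, 5, 10, 0)},
]
heads = track_labels(features, "CabId", "Date")
```

The result contains one feature per track, which is what makes it convenient for placing a single label on each track in a map style.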
20.2.14.1. TrackLabel example (XML)¶
TrackLabelProcess_wps.xml is a GeoServer WPS call to the GeoMesa TrackLabelProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@TrackLabelProcess_wps.xml localhost:8080/geoserver/wps
20.2.15. TubeSelectProcess¶
The TubeSelectProcess performs a tube select on a GeoMesa feature collection based on another feature collection. For more information on TubeSelectProcess and how to use it, see this tutorial.
Parameters | Description |
---|---|
tubeFeatures | Input feature collection (must have geometry and datetime). |
featureCollection | The data set to query for matching features. |
filter | The filter to apply to the featureCollection. |
maxSpeed | Max speed of the object in m/s for nofill & line gapfill methods. |
maxTime | Time in seconds for nofill & line gapfill methods. |
bufferSize | Buffer size in meters to use instead of maxSpeed/maxTime calculation. |
maxBins | Number of bins to use for breaking up query into individual queries. |
gapFill | Method of filling gap (nofill, line). |
20.2.15.1. TubeSelect example (XML)¶
TubeSelectProcess_wps.xml is a GeoServer WPS call to the GeoMesa TubeSelectProcess. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@TubeSelectProcess_wps.xml localhost:8080/geoserver/wps
20.2.16. QueryProcess¶
The QueryProcess takes an (E)CQL query/filter for a given feature set as a text object and returns the result as a JSON object.
Parameters | Description |
---|---|
features |
|
filter |
<wps:ComplexData mimeType="text/plain; subtype=cql">
<![CDATA[some-query-text]]>
</wps:ComplexData>
|
output |
<wps:ResponseForm>
<wps:RawDataOutput mimeType="application/json">
<ows:Identifier>result</ows:Identifier>
</wps:RawDataOutput>
</wps:ResponseForm>
For interactive WPS request builder check the Generate box and choose “application/json” |
properties | The properties / transforms to apply to the returned features. |
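The filter and output fragments above fit into a complete WPS 1.0.0 Execute document. The Python sketch below assembles one as a string. Several details are assumptions rather than facts from this page: the process identifier `geomesa:Query`, the use of a reference to GeoServer's internal WFS (`http://geoserver/wfs`) for the `features` input, and the layer name passed in by the caller. Comparing the result against a request generated by the WPS request builder is advisable.

```python
from xml.sax.saxutils import escape

EXECUTE_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<wps:Execute version="1.0.0" service="WPS"
    xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <ows:Identifier>{process}</ows:Identifier>
  <wps:DataInputs>
    <wps:Input>
      <ows:Identifier>features</ows:Identifier>
      <wps:Reference mimeType="text/xml" xlink:href="http://geoserver/wfs" method="POST">
        <wps:Body>
          <wfs:GetFeature service="WFS" version="1.0.0" outputFormat="GML2">
            <wfs:Query typeName="{layer}"/>
          </wfs:GetFeature>
        </wps:Body>
      </wps:Reference>
    </wps:Input>
    <wps:Input>
      <ows:Identifier>filter</ows:Identifier>
      <wps:Data>
        <wps:ComplexData mimeType="text/plain; subtype=cql"><![CDATA[{cql}]]></wps:ComplexData>
      </wps:Data>
    </wps:Input>
  </wps:DataInputs>
  <wps:ResponseForm>
    <wps:RawDataOutput mimeType="application/json">
      <ows:Identifier>result</ows:Identifier>
    </wps:RawDataOutput>
  </wps:ResponseForm>
</wps:Execute>
"""

def build_query_request(layer, cql, process="geomesa:Query"):
    """Fill the Execute template; the process identifier is an assumption."""
    if "]]>" in cql:
        raise ValueError("CQL text must not contain the CDATA terminator ']]>'")
    return EXECUTE_TEMPLATE.format(
        process=escape(process),
        layer=escape(layer, {'"': "&quot;"}),
        cql=cql,  # placed inside CDATA, so no XML escaping is applied
    )

# "example:AccumuloQuickStart" is a hypothetical workspace-qualified layer name.
body = build_query_request("example:AccumuloQuickStart", "Who = 'Bierce'")
```

The resulting string can be saved to a file and posted in the same way as the other request files in this section.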
20.2.16.1. Query example (XML)¶
QueryProcess_wps.xml is a GeoServer WPS call to the GeoMesa QueryProcess that performs the same query shown in the Accumulo-quickstart. It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@QueryProcess_wps.xml localhost:8080/geoserver/wps
The query should generate results that look like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-76.513,
-37.4941
]
},
"properties": {
"Who": "Bierce",
"What": 931,
"When": "2014-07-04T22:25:38.000+0000"
},
"id": "Observation.931"
}
]
}
20.2.17. UniqueProcess¶
The UniqueProcess class is optimized for GeoMesa to find unique attribute values for a feature collection, which are returned as a JSON object.
Parameters | Description |
---|---|
features |
|
attribute |
|
filter |
<wps:ComplexData mimeType="text/plain; subtype=cql">
<![CDATA[some-query-text]]>
</wps:ComplexData>
|
histogram |
|
sort |
|
sortByCount |
|
output |
<wps:ResponseForm>
<wps:RawDataOutput mimeType="application/json">
<ows:Identifier>result</ows:Identifier>
</wps:RawDataOutput>
</wps:ResponseForm>
For interactive WPS request builder check the Generate box and choose “application/json” |
20.2.17.1. Unique example (XML)¶
UniqueProcess_wps.xml is a GeoServer WPS call to the GeoMesa UniqueProcess that reports the unique names in the ‘Who’ field of the Accumulo quickstart data for a restricted bounding box (-77.5, -37.5, -76.5, -36.5). It can be run with the following curl call:
curl -v -u admin:geoserver -H "Content-Type: text/xml" -d@UniqueProcess_wps.xml localhost:8080/geoserver/wps
The query should generate results that look like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"value": "Addams",
"count": 37
},
"id": "fid--21d4eb0_15b68e0e8ca_-7fd6"
},
{
"type": "Feature",
"properties": {
"value": "Bierce",
"count": 43
},
"id": "fid--21d4eb0_15b68e0e8ca_-7fd5"
},
{
"type": "Feature",
"properties": {
"value": "Clemens",
"count": 48
},
"id": "fid--21d4eb0_15b68e0e8ca_-7fd4"
}
]
}
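A response shaped like the one above is easy to reduce to a plain mapping of values to counts. The following Python sketch assumes only the structure visible in the sample output (a FeatureCollection whose features carry `value` and `count` properties); `unique_counts` is a hypothetical helper name.

```python
import json

def unique_counts(response_text):
    """Map each unique value in a UniqueProcess response to its count."""
    collection = json.loads(response_text)
    return {
        feature["properties"]["value"]: feature["properties"]["count"]
        for feature in collection["features"]
    }

response_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"value": "Addams", "count": 37}},
        {"type": "Feature", "properties": {"value": "Bierce", "count": 43}},
        {"type": "Feature", "properties": {"value": "Clemens", "count": 48}},
    ],
})
counts = unique_counts(response_text)
```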
20.2.18. Chaining Processes¶
WPS processes can be chained, using the result of one process as the input for another. For example, a bounding box in a GeoMesa QueryProcess can be used to restrict the data sent to StatsProcess.
GeoMesa_WPS_chain_example.xml gets all points from the AccumuloQuickStart table that are within a specified bounding box (-77.5, -37.5, -76.5, -36.5), and calculates descriptive statistics on the ‘What’ attribute of the results.
The query should generate results that look like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
0,
0
]
},
"properties": {
"stats": "{\"count\":128,\"minimum\":[29.0],\"maximum\":[991.0],\"mean\":[508.5781249999999],\"population_variance\":[85116.25952148438],\"population_standard_deviation\":[291.74691004616375],\"population_skewness\":[-0.11170819256679464],\"population_kurtosis\":[1.7823482287566166],\"population_excess_kurtosis\":[-1.2176517712433834],\"sample_variance\":[85786.46628937007],\"sample_standard_deviation\":[292.893267743337],\"sample_skewness\":[-0.11303718280959842],\"sample_kurtosis\":[1.8519712064424219],\"sample_excess_kurtosis\":[-1.1480287935575781],\"population_covariance\":[85116.25952148438],\"population_correlation\":[1.0],\"sample_covariance\":[85786.46628937007],\"sample_correlation\":[1.0]}"
},
"id": "stat"
}
]
}
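Note that in the sample output the `stats` property is itself a JSON document serialized as a string, so a client has to decode it a second time. A short Python sketch of that step follows, assuming the response has the structure shown above; `decode_stats` is a hypothetical helper name and the sample response is abbreviated.

```python
import json

def decode_stats(response_text):
    """Decode the nested JSON held in each feature's 'stats' property."""
    collection = json.loads(response_text)
    return [
        json.loads(feature["properties"]["stats"])
        for feature in collection["features"]
    ]

# Abbreviated version of the sample response; the inner document is a string.
response_text = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [0, 0]},
        "properties": {
            "stats": json.dumps({"count": 128, "minimum": [29.0], "maximum": [991.0]})
        },
        "id": "stat",
    }],
})
stats = decode_stats(response_text)
```

After decoding, `stats[0]["count"]` holds the feature count, and the per-attribute values such as `minimum` and `maximum` are lists.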