11.9. Deploying GeoMesa Spark with Zeppelin
Apache Zeppelin is a web-based notebook server for interactive data analytics, which includes built-in support for Spark.
Note
The instructions below have been tested with Zeppelin version 0.7.0.
11.9.1. Installing Zeppelin
Follow the Zeppelin installation instructions, as well as the instructions for configuring the Zeppelin Spark interpreter.
11.9.2. Configuring Zeppelin with GeoMesa
The GeoMesa Accumulo Spark runtime JAR may be found either in the dist/spark
directory of the GeoMesa Accumulo
binary distribution, or (after building) in the geomesa-accumulo/geomesa-accumulo-spark-runtime-accumulo21/target
directory of the GeoMesa source distribution.
Note
See Spatial RDD Providers for details on choosing the correct GeoMesa Spark runtime JAR.
In the Zeppelin web UI, click on the downward-pointing triangle next to the username in the upper-right hand corner and select “Interpreter”:

- Scroll to the bottom of the page, where the “Spark” interpreter configuration appears.
- Click on the “edit” button next to the interpreter name (on the right-hand side of the UI).
- In the “Dependencies” section, add the GeoMesa JAR, either as the full local path to the geomesa-accumulo-spark-runtime-accumulo21_${VERSION}.jar described above, or as the Maven groupId:artifactId:version coordinates (org.locationtech.geomesa:geomesa-accumulo-spark-runtime-accumulo21_2.12:$VERSION).
- Click “Save”. When prompted by the pop-up, click to restart the Spark interpreter; it is not necessary to restart Zeppelin.
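Once the interpreter has restarted, the runtime JAR is on the Spark interpreter's classpath and GeoMesa data can be queried from a Zeppelin note. The following is a minimal sketch of such a paragraph; the Accumulo connection values and the feature type name "gdelt" are placeholders that must be replaced with the details of your own deployment, and the exact parameter keys may vary between GeoMesa versions:

```scala
// %spark paragraph in a Zeppelin note; `spark` is provided by the interpreter
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Placeholder connection parameters for an Accumulo-backed GeoMesa data store
val dsParams = Map(
  "accumulo.instance.name" -> "myInstance",
  "accumulo.zookeepers"    -> "zoo1,zoo2,zoo3",
  "accumulo.user"          -> "user",
  "accumulo.password"      -> "password",
  "accumulo.catalog"       -> "myCatalog"
)

// Load a feature type as a DataFrame through the GeoMesa Spark SQL data source
val df = spark.read
  .format("geomesa")
  .options(dsParams)
  .option("geomesa.feature", "gdelt") // placeholder feature type name
  .load()

// Register the DataFrame for SQL queries in subsequent paragraphs
df.createOrReplaceTempView("gdelt")
spark.sql("SELECT count(*) FROM gdelt").show()
```

This sketch requires a running Accumulo cluster and so cannot be executed stand-alone; it is intended only to show the shape of a Zeppelin paragraph once the dependency is configured.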
11.9.3. Data Plotting
Zeppelin provides built-in tools for visualizing quantitative data, which can be invoked by prepending
“%table\n” to tab-separated output (see the Zeppelin table display system). For example, the following method
may be used to print a DataFrame
via this display system:
import org.apache.spark.sql.DataFrame

// Print a DataFrame through Zeppelin's %table display system
def printDF(df: DataFrame): Unit = {
  val rows = df.collect()
  println("%table")                                 // switch Zeppelin into table-rendering mode
  println(df.columns.mkString("\t"))                // tab-separated header row
  rows.foreach(row => println(row.mkString("\t")))  // tab-separated data rows
}
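Because printDF collects every row to the driver, it is best applied to a limited result set. A hypothetical usage in a later paragraph (the view name "gdelt" and the column names are placeholders, assuming a temporary view was registered earlier in the note):

```scala
// Limit the result before collecting to avoid overwhelming the driver or the browser
printDF(spark.sql("SELECT globalEventId, dtg, geom FROM gdelt LIMIT 10"))
```

This fragment runs inside the same Zeppelin note as the printDF definition above and is not executable on its own.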