11.9. Deploying GeoMesa Spark with Zeppelin

Apache Zeppelin is a web-based notebook server for interactive data analytics, which includes built-in support for Spark.

Note

The instructions below have been tested with Zeppelin version 0.7.0.

11.9.1. Installing Zeppelin

Follow the Zeppelin installation instructions, as well as the instructions for configuring the Zeppelin Spark interpreter.

11.9.2. Configuring Zeppelin with GeoMesa

The GeoMesa Accumulo Spark runtime JAR may be found either in the dist/spark directory of the GeoMesa Accumulo binary distribution, or (after building) in the geomesa-accumulo/geomesa-accumulo-spark-runtime-accumulo21/target directory of the GeoMesa source distribution.

Note

See Spatial RDD Providers for details on choosing the correct GeoMesa Spark runtime JAR.

  1. In the Zeppelin web UI, click on the downward-pointing triangle next to the username in the upper right-hand corner and select “Interpreter”.

  2. Scroll to the bottom where the “Spark” interpreter configuration appears.

  3. Click on the “edit” button next to the interpreter name (on the right-hand side of the UI).

  4. In the “Dependencies” section, add the GeoMesa JAR, either as
    1. the full local path to the geomesa-accumulo-spark-runtime-accumulo21_2.12-${VERSION}.jar described above, or

    2. the Maven groupId:artifactId:version coordinates (org.locationtech.geomesa:geomesa-accumulo-spark-runtime-accumulo21_2.12:$VERSION)

  5. Click “Save”. When prompted by the pop-up, click to restart the Spark interpreter.

It is not necessary to restart Zeppelin.
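Once the interpreter restarts, GeoMesa classes are available in Spark paragraphs. As a rough sketch only, a notebook paragraph might load a previously-ingested feature type as a DataFrame as shown below. The connection values (instance, zookeepers, user, password, catalog) and the feature type name "gdelt" are placeholders, and the exact data store parameter keys vary with the GeoMesa version, so consult the Accumulo Data Store documentation for your release:

```scala
%spark
// Placeholder connection parameters for an Accumulo-backed GeoMesa store;
// substitute values for your cluster (exact key names vary by GeoMesa version)
val dsParams = Map(
  "accumulo.instance.name" -> "myInstance",
  "accumulo.zookeepers"    -> "zoo1,zoo2,zoo3",
  "accumulo.user"          -> "user",
  "accumulo.password"      -> "password",
  "accumulo.catalog"       -> "myCatalog"
)

// Load an existing feature type ("gdelt" is an illustrative name)
val df = spark.read
  .format("geomesa")
  .options(dsParams)
  .option("geomesa.feature", "gdelt")
  .load()

// Register the DataFrame so subsequent %sql paragraphs can query it
df.createOrReplaceTempView("gdelt")
```

This fragment runs inside a Zeppelin `%spark` paragraph against a live Accumulo cluster; it is not standalone code.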

11.9.3. Data Plotting

Zeppelin provides built-in tools for visualizing quantitative data, which can be invoked by prepending “%table\n” to tab-separated output (see the Zeppelin table display system). For example, the following method may be used to print a DataFrame via this display system:

import org.apache.spark.sql.DataFrame

// Render a DataFrame via Zeppelin's table display system: the "%table"
// directive, then a tab-separated header row, then tab-separated data rows
def printDF(df: DataFrame): Unit = {
  val dfc = df.collect()
  println("%table")
  println(df.columns.mkString("\t"))
  dfc.foreach(r => println(r.mkString("\t")))
}
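The layout such a method emits can be inspected without a Spark session. The following self-contained sketch (the object name and sample data are purely illustrative) builds the same "%table" payload from plain Scala collections:

```scala
// Illustrative stand-in for printDF: assembles Zeppelin's "%table" payload
// from plain collections, so the format can be seen without Spark
object TablePreview {
  // "%table" directive, tab-separated header, then tab-separated rows,
  // joined with newlines
  def format(columns: Seq[String], rows: Seq[Seq[Any]]): String =
    ("%table" +: columns.mkString("\t") +: rows.map(_.mkString("\t")))
      .mkString("\n")

  def main(args: Array[String]): Unit =
    println(format(Seq("name", "count"), Seq(Seq("Addams", 1), Seq("Bierce", 2))))
}
```

Printing such a string in a Zeppelin paragraph causes the notebook to render the rows as an interactive table rather than as plain text.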