9.16.9. Marine Cadastre AIS¶
Automated Identification System is a global system to track ships, with each vessel beaconing its position to other (nearby) vessels. While various commercial data providers exist, this converter is intended for data collected by the U.S. Coast Guard around the USA and openly published by Marine Cadastre.
Prior to 2015, the data is disseminated as zipped ESRI File Geodatabases containing a number of tables. Only the
broadcast
table is considered here, which consists of vessel locations over time. Note that “Ship name and call sign
fields have been removed, and the MMSI (Maritime Mobile Service Identity) field has been encrypted for the 2010 through
2014 data at the request of the U.S. Coast Guard.”
For 2015 and later, the data is disseminated as zipped Comma Separated Values (CSV) files which have been enriched with additional vessel data (e.g., vessel name) and include unencrypted MMSI values.
9.16.9.1. Getting the Data¶
Machine-friendly links for bulk data download can be found here. Replace the four digit year with that desired (2009-2017), data is then split into month and UTM Zone. Once data is downloaded, it must be unzipped e.g.
find . -name "*.zip" -exec unzip {} \;
9.16.9.2. Converting the pre-2015 Data¶
GeoMesa does not currently support ESRI File Geodatabases (FileGDB), so an external tool is required to convert the data
into a suitable format. ogr2ogr
from GDAL can convert from FileGDB into Comma Separated
Value (CSV) format e.g.
find . -name "*.gdb" -exec sh -c "ogr2ogr -f CSV /dev/stdout {} Broadcast -lco GEOMETRY=AS_XY | tail -n +2 > {}.csv" \;
Note that the tail
command removes the header row from the output file, making ingest more amenable to processing
using HDFS or similar. Also note that the resulting CSV files may be very large – 264 GB for the 2009 & 2019 data.
9.16.9.3. Sample Ingest Command¶
Due to the two different formats, two different simple feature types (SFTs) and converters are required, both of which
are defined in the same file. marinecadastre-ais
defines the SFT and converter for data converted to CSV from
FileGDB format (prior to 2015), while marinecadastre-ais-csv
defines the SFT and converter for data natively in the
newer CSV format (2015 and later).
Check that the desired simple feature type and converter are available on the GeoMesa tools classpath.
This is the default case. Note that you will need to use the command specific to your back-end e.g.
geomesa-accumulo
.
geomesa env | grep 'marinecadastre-ais'
If they are not, merge the contents of reference.conf
with $GEOMESA_HOME/conf/application.conf
, or ensure that
reference.conf
is in $GEOMESA_HOME/conf/sfts/marinecadastre-ais
.
To ingest using the GeoMesa command line interface:
$ geomesa ingest -u username -c catalogName -s marinecadastre-ais -C marinecadastre-ais -t 8 /path/to/data/*.csv
You should replace marinecadastre-ais
with marinecadastre-ais-csv
if using 2015 or later data.
This example uses 8 threads, which for all of the 2009 & 2010 data (approx 3.5B records) took 15h on a 5 node cluster.