8.6. Delimited Text Converter

The delimited text converter handles plain delimited text files such as CSV or TSV. To use the delimited text converter, specify type = "delimited-text" in your converter definition.

8.6.1. Configuration

The format of the delimited files must be defined using the format element. GeoMesa uses Apache Commons CSV for parsing. The available formats are instances of org.apache.commons.csv.CSVFormat:

  • DEFAULT or CSV: CSVFormat.DEFAULT
  • TDF or TSV: CSVFormat.TDF
  • QUOTED: CSVFormat.DEFAULT.withQuoteMode(QuoteMode.ALL)
  • QUOTE_ESCAPE: CSVFormat.DEFAULT.withEscape('"')
  • QUOTED_WITH_QUOTE_ESCAPE: CSVFormat.DEFAULT.withEscape('"').withQuoteMode(QuoteMode.ALL)
  • EXCEL: CSVFormat.EXCEL
  • MYSQL: CSVFormat.MYSQL
  • RFC4180: CSVFormat.RFC4180

In addition, GeoMesa supports custom quote, escape and delimiter characters, which can be used to modify the base format. These can be specified through options.quote, options.escape and options.delimiter. Quotes and escapes can be disabled by setting the option to an empty string.

If the input files have header lines, they can be skipped by setting options.skip-lines to the number of lines to skip, e.g. options.skip-lines = 1.
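
As a rough sketch of how these options combine, the following converter reads pipe-delimited files that have a single header line and no quoting or escaping. The converter name, the field name and the choice of a pipe delimiter are assumptions made for illustration only:

geomesa.converters.pipe-example = {
  type     = "delimited-text",
  format   = "CSV",
  options  = {
    skip-lines = 1    // skip one header line at the start of each file
    delimiter  = "|"  // split columns on pipes instead of commas
    quote      = ""   // an empty string disables quote handling
    escape     = ""   // an empty string disables escape handling
  },
  id-field = "md5(stringToBytes($0))",
  fields = [
    { name = "phrase", transform = "$1" }
  ]
}

Because the options modify the base format, the same block could be applied on top of any of the formats listed above.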

8.6.2. Transform Functions

The transform element supports referencing each field in the record by its column number using $: $0 refers to the whole line, $1 to the first column, $2 to the second, and so on. Each column is initially a string, so further transforms may be necessary to create the correct type. See Transformation Function Overview for more details.
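
As a small illustration, assume an input line such as the following (the data and the field names are hypothetical):

42,3.5,hello

A fields list along these lines would keep the third column as a string and cast the first two to numeric types, using the same :: cast syntax that appears in the example usage:

fields = [
  { name = "count", transform = "$1::int" },    // "42"  -> 42
  { name = "score", transform = "$2::double" }, // "3.5" -> 3.5
  { name = "label", transform = "$3" }          // left as the string "hello"
]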

8.6.3. Example Usage

Suppose you have a SimpleFeatureType with the following schema:

phrase:String,dtg:Date,*geom:Point:srid=4326

And you have the following comma-separated data:

first,hello,2015-01-01T00:00:00.000Z,45.0,45.0
second,world,2015-01-01T00:00:00.000Z,45.0,45.0

We want to concatenate the first two fields together to form the phrase, parse the third field as a date, and use the last two fields as coordinates for a Point geometry. The following configuration defines an appropriate converter for taking this CSV data and transforming it into our SimpleFeatureType:

geomesa.converters.example = {
  type     = "delimited-text",
  format   = "CSV",
  id-field = "md5(stringToBytes($0))",
  fields = [
    { name = "phrase", transform = "concatenate($1, $2)" },
    { name = "dtg",    transform = "dateHourMinuteSecondMillis($3)" },
    { name = "lat",    transform = "$4::double" },
    { name = "lon",    transform = "$5::double" },
    { name = "geom",   transform = "point($lon, $lat)" }
  ]
  user-data = {
    // note: keys will be treated as strings and should not be quoted
    my.user.key = "$phrase"
  }
}

The id of the SimpleFeature is formed from an MD5 hash of the entire record ($0 is the original data). The simple feature attributes are created from the fields list with appropriate transforms (note the use of intermediate fields ‘lat’ and ‘lon’). If desired, user data for the feature can be set by referencing fields. This can be used for setting Accumulo visibility constraints, among other things (see Accumulo Visibilities).
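
As a hedged sketch of the visibility use case, the example converter could read a visibility label from an additional sixth column and attach it as user data. The extra column, the intermediate field name vis and the key geomesa.feature.visibility are assumptions here; check Accumulo Visibilities for the exact key that GeoMesa expects:

  fields = [
    // existing fields from the example above, plus an intermediate field:
    { name = "vis", transform = "$6" }   // hypothetical column, e.g. admin&user
  ]
  user-data = {
    // assumed key name; the value references the intermediate field
    geomesa.feature.visibility = "$vis"
  }

Like lat and lon, vis is an intermediate field: it is not part of the SimpleFeatureType schema and is used only to populate the user data.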