I installed and configured GeoMesa using Docker containers. The versions I used for the various applications are:

  • GeoMesa 4.0.1
  • Hadoop 2.10.2
  • Apache Accumulo 2.0.1
  • Apache ZooKeeper 3.7.1

To ingest a file I use the following command:

geomesa-accumulo ingest --force -i accumulo -z zookeeper -u username -p myPassword -c myCatalog -f myFeature  /path/to/shapefile

I have been dealing with this error for days while trying to ingest a file.

2023-07-26 06:50:24,914 ERROR [org.locationtech.geomesa.tools.ingest.LocalConverterIngest] Fatal error running local ingest worker on /data/hdfs/file_geomesa/CD_Toscana/Gasdotto-45840-CD-Maggio-centroidi.shp
java.io.IOException: Error occurred trying to reproject data
        at org.geotools.data.store.ContentFeatureSource.getReader(ContentFeatureSource.java:723)
        at org.locationtech.geomesa.convert.shp.ShapefileConverter.parse(ShapefileConverter.scala:74)
        at org.locationtech.geomesa.convert2.AbstractConverter.process(AbstractConverter.scala:151)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$4(LocalConverterIngest.scala:172)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$4$adapted(LocalConverterIngest.scala:168)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
        at org.locationtech.geomesa.utils.collection.CloseableIterator$CloseableSingleIterator.foreach(CloseableIterator.scala:85)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$3(LocalConverterIngest.scala:168)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$3$adapted(LocalConverterIngest.scala:167)
        at org.locationtech.geomesa.utils.io.package$WithClose$.apply(package.scala:64)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$2(LocalConverterIngest.scala:167)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$2$adapted(LocalConverterIngest.scala:166)
        at org.locationtech.geomesa.utils.io.CloseablePool$CommonsPoolPool.borrow(CloseablePool.scala:68)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.run(LocalConverterIngest.scala:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Nothing to be reprojected! (check before using wrapper)
        at org.geotools.data.crs.ReprojectFeatureReader.<init>(ReprojectFeatureReader.java:152)
        at org.geotools.data.crs.ReprojectFeatureReader.<init>(ReprojectFeatureReader.java:117)
        at org.geotools.data.store.ContentFeatureSource.getReader(ContentFeatureSource.java:719)
        ... 19 more

Initially I thought the error was due to the large number of features, because on an older installation of GeoMesa (version 3.1.1), ingesting the same file failed with an error that seemed quite explicitly related to the number of columns (java.lang.ArrayIndexOutOfBoundsException). Suspecting that the error on GeoMesa 4.0.1 had a different cause, I ran numerous tests, both changing the environment configuration and modifying the file with other tools.

I eventually found another file, with a very small number of features, that triggers the same error on ingest, confirming that the error is not due to the large number of features. The real surprise was that the older version of GeoMesa ingests this file correctly. The file is handled correctly up to GeoMesa 3.5.2, and fails with the above error starting from version 4.0.0. This leads me to the following question: is it possible that the new versions of GeoMesa introduced a bug that was not present in the older versions, or is it more likely that the problem is related to some configuration that the new versions require?

Finally I found a way to get past the above error: it was enough to delete the file with the .prj extension. Now the file with few columns is also ingested correctly in GeoMesa 4.0.1, while the one with many columns causes the same error found in GeoMesa 3.1.1. I am not authorized to share the original shapefile, but I was able to create one myself with many columns that triggers the same error. I can share it if you need it to reproduce the problem; from what I could see, the error should occur with any file having more than 600 columns. The corresponding stack trace is as follows.

at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter.writeFeature(GeoMesaFeatureWriter.scala:56)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter.writeFeature$(GeoMesaFeatureWriter.scala:46)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$TableFeatureWriter.writeFeature(GeoMesaFeatureWriter.scala:151)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$GeoMesaAppendFeatureWriter.write(GeoMesaFeatureWriter.scala:239)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$GeoMesaAppendFeatureWriter.write$(GeoMesaFeatureWriter.scala:235)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$$anon$3.write(GeoMesaFeatureWriter.scala:111)
        at org.locationtech.geomesa.utils.geotools.FeatureUtils$.write(FeatureUtils.scala:147)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$8(LocalConverterIngest.scala:181)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$8$adapted(LocalConverterIngest.scala:179)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
        at org.locationtech.geomesa.utils.collection.CloseableIterator$FlatMapCloseableIterator.foreach(CloseableIterator.scala:132)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$7(LocalConverterIngest.scala:179)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$7$adapted(LocalConverterIngest.scala:173)
        at org.locationtech.geomesa.utils.io.CloseablePool$CommonsPoolPool.borrow(CloseablePool.scala:68)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$6(LocalConverterIngest.scala:173)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$6$adapted(LocalConverterIngest.scala:172)
        at org.locationtech.geomesa.utils.io.package$WithClose$.apply(package.scala:64)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$4(LocalConverterIngest.scala:172)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$4$adapted(LocalConverterIngest.scala:168)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
        at org.locationtech.geomesa.utils.collection.CloseableIterator$CloseableSingleIterator.foreach(CloseableIterator.scala:85)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$3(LocalConverterIngest.scala:168)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$3$adapted(LocalConverterIngest.scala:167)
        at org.locationtech.geomesa.utils.io.package$WithClose$.apply(package.scala:64)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$2(LocalConverterIngest.scala:167)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.$anonfun$run$2$adapted(LocalConverterIngest.scala:166)
        at org.locationtech.geomesa.utils.io.CloseablePool$CommonsPoolPool.borrow(CloseablePool.scala:68)
        at org.locationtech.geomesa.tools.ingest.LocalConverterIngest$LocalIngestWorker.run(LocalConverterIngest.scala:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2136
        at com.esotericsoftware.kryo.io.Output.writeByte(Output.java:226)
        at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.locationtech.geomesa.features.serialization.WkbSerialization.serializeWkb(WkbSerialization.scala:44)
        at org.locationtech.geomesa.features.serialization.WkbSerialization.serializeWkb$(WkbSerialization.scala:42)
        at org.locationtech.geomesa.features.kryo.serialization.KryoGeometrySerialization$.serializeWkb(KryoGeometrySerialization.scala:14)
        at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization$KryoGeometryWkbWriter$.apply(KryoFeatureSerialization.scala:229)
        at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization.writeFeature(KryoFeatureSerialization.scala:71)
        at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization.serialize(KryoFeatureSerialization.scala:43)
        at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization.serialize$(KryoFeatureSerialization.scala:41)
        at org.locationtech.geomesa.features.kryo.KryoFeatureSerializer$MutableActiveSerializer.serialize(KryoFeatureSerializer.scala:75)
        at org.locationtech.geomesa.index.api.WritableFeature$FeatureLevelWritableFeature.$anonfun$values$2(WritableFeature.scala:153)
        at org.locationtech.geomesa.index.api.package$KeyValue.value$lzycompute(package.scala:183)
        at org.locationtech.geomesa.index.api.package$KeyValue.value(package.scala:183)
        at org.locationtech.geomesa.accumulo.data.AccumuloIndexAdapter$AccumuloIndexWriter.$anonfun$write$1(AccumuloIndexAdapter.scala:397)
        at org.locationtech.geomesa.accumulo.data.AccumuloIndexAdapter$AccumuloIndexWriter.$anonfun$write$1$adapted(AccumuloIndexAdapter.scala:396)
        at scala.collection.immutable.Vector.foreach(Vector.scala:1895)
        at org.locationtech.geomesa.accumulo.data.AccumuloIndexAdapter$AccumuloIndexWriter.write(AccumuloIndexAdapter.scala:396)
        at org.locationtech.geomesa.index.api.IndexAdapter$BaseIndexWriter.write(IndexAdapter.scala:153)
        at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter.writeFeature(GeoMesaFeatureWriter.scala:50)
        ... 34 more
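
The .prj workaround described above can be made reversible by renaming the file instead of deleting it. This is only a minimal Python sketch: the function name set_prj_aside and the ".bak" suffix are hypothetical choices for illustration and are not part of GeoMesa. Without the .prj the reader has no CRS information, so this presumably only makes sense when the coordinates are already in the CRS you want stored.

```python
from pathlib import Path
from typing import Optional


def set_prj_aside(shapefile: str, suffix: str = ".bak") -> Optional[Path]:
    """Rename the .prj sidecar of `shapefile` so shapefile readers no longer see it.

    Returns the path of the renamed file, or None if no .prj sidecar was found.
    The name and the default suffix are illustrative, not a GeoMesa convention.
    """
    shp = Path(shapefile)
    # Look for a sibling with the same base name and a .prj extension,
    # matching the extension case-insensitively (.prj, .PRJ, ...).
    for candidate in shp.parent.iterdir():
        if candidate.stem == shp.stem and candidate.suffix.lower() == ".prj":
            backup = candidate.with_name(candidate.name + suffix)
            candidate.rename(backup)
            return backup
    return None
```

Calling set_prj_aside("/path/to/shapefile.shp") before the ingest command, and renaming the file back afterwards, keeps the original projection definition available.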
Christopher
Luigi
  • Is the shapefile shareable? This seems like a bug, and if you can share the shapefile that causes the issue so we can reproduce it, you can open a ticket at https://geomesa.atlassian.net/jira/software/c/projects/GEOMESA/issues/?filter=allissues – Emilio Lahr-Vivaz Jul 26 '23 at 13:33
  • The ArrayIndexOutOfBounds error is different, could you open a new post with that one? And sharing the file would definitely be helpful. – Emilio Lahr-Vivaz Jul 26 '23 at 15:43
  • Okay, I will open a new post with the new stacktrace. Thank you very much for your interest. – Luigi Jul 26 '23 at 15:51
  • The new post is at https://stackoverflow.com/questions/76773268/geomesa-caused-by-java-lang-arrayindexoutofboundsexception and I have added a link to download the file that generates the error. I was unable to open a ticket on JIRA as I do not have permissions. – Luigi Jul 26 '23 at 16:27
  • @Luigi please don't use the accumulo tag on geomesa questions. The question is not an Accumulo one, and only marginally related to Accumulo. Geomesa is an application that uses Accumulo, but the Accumulo developers have nothing to do with it. As a StackOverflow moderator and Accumulo dev who follows the accumulo topic here for actual Accumulo issues, I will continue to remove these marginally related tags. I'm not sure why geomesa users keep doing this. Do you know if geomesa itself is recommending this practice? – Christopher Jul 26 '23 at 17:10
  • @Christopher I apologize for the misunderstanding. As an inexperienced user, I thought that the error I encounter might also have to do with Accumulo. – Luigi Jul 27 '23 at 06:03
  • @Luigi No worries. It's not a huge deal... I've just seen a lot of the geomesa questions tagged with accumulo, and was curious if that was being recommended somewhere or something. – Christopher Jul 28 '23 at 19:30

1 Answer

From a quick look at the code it seems that the reprojection code thinks you don't have any geometry columns or they are already in the correct projection and so concludes there is nothing for it to do.

An ArrayIndexOutOfBounds would not be related to the number of features, as GeoTools never reads the whole file into memory. It's more likely to be a mismatch between the expected and observed number of attributes, but without the actual error log it's hard to say for sure.
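
One way to look at the attribute-count side of this is to count the columns directly in the shapefile's .dbf file and compare that with what the target schema reports. The following is a rough Python sketch, assuming the standard dBASE layout (a 32-byte file header, 32-byte field descriptors, and a 0x0D terminator byte); dbf_field_count is a hypothetical helper name, and this is a diagnostic only, not a fix.

```python
import struct


def dbf_field_count(dbf_path: str) -> int:
    """Count attribute columns by walking the field descriptors in a .dbf header.

    Assumes the standard dBASE layout used by shapefiles.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(32)
        if len(header) < 32:
            raise ValueError("file too short to be a dBASE table")
        # Bytes 8-9 hold the total header length in bytes (little-endian uint16).
        (header_len,) = struct.unpack_from("<H", header, 8)
        count = 0
        # Field descriptors are 32 bytes each; the list ends at a 0x0D byte.
        while f.tell() < header_len:
            descriptor = f.read(32)
            if not descriptor or descriptor[0] == 0x0D or len(descriptor) < 32:
                break
            count += 1
        return count
```

Running dbf_field_count on the .dbf that sits next to the .shp gives the number of attribute columns in the file (the geometry is stored separately in the .shp), which can then be compared with the number of attributes in the schema being written to.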

Ian Turton
  • Ian, do you know if setting coordinateSystemReproject here is the problem? https://github.com/locationtech/geomesa/blob/f9ed222fd23afd24d23682f55512697bfc5ef151/geomesa-convert/geomesa-convert-shp/src/main/scala/org/locationtech/geomesa/convert/shp/ShapefileConverter.scala#L69 – Emilio Lahr-Vivaz Jul 26 '23 at 13:22
  • It might be - does your shapefile have a projection set? – Ian Turton Jul 26 '23 at 13:26
  • Not sure what shapefile is being used here. GeoMesa 3.5 (which worked I guess) uses GeoTools 23, and the check was a bit [different](https://github.com/geotools/geotools/blob/3a74d70c88a384047c14ebcb77420d0bd0ba0fc2/modules/library/main/src/main/java/org/geotools/data/crs/ReprojectFeatureReader.java#L116). I'd guess GeoMesa needs to add a check for CRS before setting the reproject. – Emilio Lahr-Vivaz Jul 26 '23 at 13:31
  • Yes, if there is a geometry column but `!CRS.equalsIgnoreMetadata(original, target)` is false, then you probably shouldn't create a reprojectingDataStore – Ian Turton Jul 26 '23 at 13:42
  • Finally I was able to find a way to overcome the above stacktrace error. It was enough to delete the file with the .prj extension. Now I get the java.lang.ArrayIndexOutOfBoundsException error again. I posted the complete stacktrace. – Luigi Jul 26 '23 at 15:23
  • I opened a ticket [here](https://geomesa.atlassian.net/browse/GEOMESA-3288) to track the original issue – Emilio Lahr-Vivaz Jul 26 '23 at 15:47