
I'm trying to store a geo_shape (like the following) in Elasticsearch via Pig, using org.elasticsearch.hadoop.pig.EsStorage (2.2.0):

{
    "location" : {
        "type" : "circle",
        "coordinates" : [-45.0, 45.0],
        "radius" : "100m"
    }
}

or :

{
    "location" : {
        "type" : "polygon",
        "orientation" : "clockwise",
        "coordinates" : [
            [ [-177.0, 10.0], [176.0, 15.0], [172.0, 0.0], [176.0, -15.0], [-177.0, -10.0], [-177.0, 10.0] ],
            [ [178.2, 8.2], [-178.8, 8.2], [-180.8, -8.8], [178.2, 8.8] ]
        ]
    }
}

We tried the following:

REGISTER ./elasticsearch-hadoop-2.2.0.jar;

loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:bag{(float,float)},radius:chararray);

elasticData = foreach loadedRecords GENERATE (type ,{(45.0f,46.0f)},radius) AS geoArea:tuple(type:chararray,coordinates:bag{(float,float)},radius:chararray);

DESCRIBE elasticData ;

DUMP elasticData;

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=localhost','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

and received an error: while parsing the coordinates, ES encountered a non-numeric value and failed. (The type was parsed to CIRCLE.)

We also tried the following, which was problematic as well:

REGISTER ./elasticsearch-hadoop-2.2.0.jar;

loadedRecords = LOAD 'inputFile.csv' USING PigStorage('|') AS (type:chararray,coordinates:chararray,radius:chararray);

--elasticData = foreach loadedRecords GENERATE (type ,{(45.0f,46.0f)} ,radius) AS geo:tuple(type:chararray,coordinates:bag{(float,float)},radius:chararray);
elasticData = foreach loadedRecords GENERATE TOMAP('type','circle','coordinates','[40.0f,46.0f]','radius','150m') AS geo:map[chararray];
DESCRIBE elasticData ;

DUMP elasticData;

STORE elasticData INTO 'myindex/mytype' USING org.elasticsearch.hadoop.pig.EsStorage('es.http.retries=10','es.nodes=host','es.index.auto.create=true','es.mapping.pig.tuple.use.field.names=false');

This time we received:

Caused by: com.fasterxml.jackson.core.JsonParseException: Current token (END_OBJECT) not numeric, can not use numeric value accessors
 at [Source: org.elasticsearch.common.io.stream.InputStreamStreamInput@20063f76; line: 1, column: 83]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
    at com.fasterxml.jackson.core.base.ParserBase._parseNumericValue(ParserBase.java:799)
    at com.fasterxml.jackson.core.base.ParserBase.getDoubleValue(ParserBase.java:713)
    at org.elasticsearch.common.xcontent.json.JsonXContentParser.doDoubleValue(JsonXContentParser.java:180)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.doubleValue(AbstractXContentParser.java:184)
    at org.elasticsearch.common.xcontent.support.AbstractXContentParser.doubleValue(AbstractXContentParser.java:174)
    at org.elasticsearch.common.geo.builders.ShapeBuilder.parseCoordinates(ShapeBuilder.java:248)
    at org.elasticsearch.common.geo.builders.ShapeBuilder.access$100(ShapeBuilder.java:46)
    at org.elasticsearch.common.geo.builders.ShapeBuilder$GeoShapeType.parse(ShapeBuilder.java:744)
    at org.elasticsearch.common.geo.builders.ShapeBuilder.parse(ShapeBuilder.java:291)

Has anyone stored a geo_shape in ES using Pig who can help us?

Thanks!

roh
  • So you got the error while loading records from inputFile.csv? Can you post a sample of the content of inputFile.csv? – Vikas Madhusudana May 27 '16 at 03:23
  • Looks like the loading itself is failing. This might be because you have two types of geoshape, one with orientation and one without. – Vikas Madhusudana May 27 '16 at 09:49
  • Hi, I added some information to my original post. Regarding your question: the CSV looks like: circle,**coordinates**,100m. It has just one line. In the code I've attached, I'm ignoring the coordinates input and trying to insert hard-coded coordinates. I think the load was OK. In my second try (with TOMAP) I ignored the load and tried to store a hard-coded shape. Can you explain a little about the orientation? Thanks! – roh May 27 '16 at 09:53
  • Just try loading the records (loadedRecords) and dumping them. I feel this is because you don't have uniform data (same number of columns) in the CSV. – Vikas Madhusudana May 27 '16 at 09:55
  • The LOAD command will not run any MapReduce job; it is just a placeholder for the loaded data. You have to put a DUMP command right after it: just load the CSV and dump the relation. – Vikas Madhusudana May 27 '16 at 09:56
  • But I can see that the data from the CSV is loaded: while debugging the 'ShapeBuilder' class of ES, I got the right values from the load. It fails in the 'parseCoordinates' method. I believe I have a problem with how I construct the tuple/bag/string of coordinates that I need to pass to ES. – roh May 27 '16 at 09:58
  • @VikasMadhusudana - do you have any idea regarding my last comment? Thanks :) – roh May 27 '16 at 19:34
  • Can you try to put a row into ES using curl? – Vikas Madhusudana May 28 '16 at 01:07
  • I did, and it succeeded. The problem occurs only through Pig; I can't find the right way to do it. – roh May 28 '16 at 13:53
  • So you mean the same row that is dumped from Pig can be loaded into ES? Why don't you write a Python UDF that loads the rows into ES using the requests library? – Vikas Madhusudana May 28 '16 at 13:55
  • I'll explain again: the row dumped from Pig wasn't stored properly in ES because there is an error. The problem is not Pig itself, but the row that Pig generates. BTW, I'm using Java, and I'd prefer to do this within Pig this time (unless I have no other choice). Do you happen to have, or could you write, a short sample of Pig code that would help me? (Just the line that generates the data, 'elasticData' in my code.) Thanks a lot. – roh May 28 '16 at 14:13
  • Can you provide a sample of a row that works and a row from Pig? – Vikas Madhusudana May 28 '16 at 14:18
  • Row from Pig, dump result with TOMAP: ([type#circle,coordinates#[40.0f,46.0f],radius#150m]). For that dump result, ES received the following line as input (I found this while debugging): [{"Geo":{"radius":"150m","type":"circle","coordinates":"[40.0f,46.0f]"}}]} Line that works: { "location" : { "type" : "circle", "coordinates" : [40.0, 46.0], "radius" : "150m" } } – roh May 28 '16 at 16:07
  • My main problem is constructing the right structure for the coordinates element. (It could be an array of points, or an array of arrays of points.) Thanks @VikasMadhusudana – roh May 28 '16 at 16:09
  • I guess the parser is looking for a number and is not getting one. Can you pass [40.0,46.0] instead of "[40.0, 46.0f]"? Also try creating a relation from the working line in a file, uploading it to ES using EsStorage, and seeing whether that works. – Vikas Madhusudana May 28 '16 at 16:18

1 Answer


Can you show the mapping for this index? Some time ago I had similar issues with coordinates in Pig. What I did was:

  1. In the ES mapping, I defined location as a geo_point:

    "location": {   
          "type": "geo_point"   
    }
    
  2. In Pig, I generated location as TOTUPLE(longitude,latitude).

Hope it helps.
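
The two steps above could look roughly like the following in Pig. This is only a sketch under assumptions, not a tested script: the input file points.csv and its columns (name, longitude, latitude) are hypothetical, the index is assumed to already have the geo_point mapping from step 1, and it relies on es.mapping.pig.tuple.use.field.names=false causing the tuple to be written as a JSON array. Note that this produces a geo_point, not the geo_shape circle from the question.

    REGISTER ./elasticsearch-hadoop-2.2.0.jar;

    -- Hypothetical input: one point per line, e.g.  home|-45.0|45.0
    points = LOAD 'points.csv' USING PigStorage('|')
             AS (name:chararray, longitude:double, latitude:double);

    -- Build location as a tuple of numbers, longitude first, so that it can be
    -- serialized as [lon, lat] rather than as a quoted string.
    docs = FOREACH points GENERATE name, TOTUPLE(longitude, latitude) AS location;

    DUMP docs;

    -- Assumes the mapping for myindex/mytype already declares location as geo_point.
    STORE docs INTO 'myindex/mytype'
        USING org.elasticsearch.hadoop.pig.EsStorage(
            'es.nodes=localhost',
            'es.mapping.pig.tuple.use.field.names=false');

The point of the tuple is that the coordinates reach ES as numbers. In the TOMAP attempt from the question, the coordinates were passed as the chararray '[40.0f,46.0f]', which ES received as a quoted string.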

rrydziu