
I have a dataset that contains latitude and longitude values written like 20.55E and 30.11N. I want to remove these direction letters and add a - sign where required. So basically, I'll map each value based on a condition.
Currently, I have a Schema and I'm trying to work out the TransformProcess.

My Schema is like this:

new Schema.Builder()
                .addColumnTime("dt", DateTimeZone.UTC)
                .addColumnsDouble("AverageTemperature" , "AverageTemperatureUncertainty")
                .addColumnsInteger("City" , "Country")
                .addColumnsFloat("Latitude" , "Longitude")
                .build();  

And I'm stuck with my TransformProcess like this:

new TransformProcess.Builder(schema)
                .filter(new FilterInvalidValues("AverageTemperature" , "AverageTemperatureUncertainty"))
                .stringToTimeTransform("dt","yyyy-MM-dd", DateTimeZone.UTC)
                . // map currentLatitude -> remove direction string and put sign  

I am trying to follow this code from a tutorial and after the TransformProcess, I'll do the Spark stuff and save the data.

My question is:
How can I perform the mapping?
From the API docs of TransformProcess, I cannot find anything that would help me solve my problem.
I am using the DataVec library in Deeplearning4j.

Shankha057
  • So, you want to replace latitude and longitude with `-`? – Benjamin Urquhart May 21 '19 at 22:45
  • @BenjaminUrquhart If I have S and W in the latitude and longitude respectively, then I want to replace them with a `-`. – Shankha057 May 21 '19 at 22:49
  • So 30W becomes -30? Or am I not understanding – Benjamin Urquhart May 21 '19 at 22:51
  • @BenjaminUrquhart Yep, that's correct. And for 50S it becomes -50. – Shankha057 May 21 '19 at 22:52
  • Could you create a function with a regex or indexOf that returns a 1 or -1 value based on the letter found in the string, and then just use parseFloat on the string and multiply it by the value returned from the function? – Dani May 21 '19 at 23:13
  • @dprogramz I cannot access the value directly at this stage. The library does this underneath. To be honest, the actual data hasn't been read at this point, this is simply some sort of configuration definition that will be passed to some kind of `Spark` class which will do the actual reading and do the stuff. – Shankha057 May 21 '19 at 23:15
  • ahh, I see. I don't have a lot of Java experience, so this might be totally wrong. But, the hacky way I might try would be to create an intermediary schema for incoming data, convert it to what you need on the save, and then have a separate schema for the saved data that you actually use for things. – Dani May 21 '19 at 23:22
  • @dprogramz If I have to visualize this, then I would need a map function in either the `Schema` or the `TransformProcess` (which is usually where the transformations for the schema are specified) that does the mapping. Either way, a map is required, even if I have to hack it via an intermediate schema (I'm thinking of this like a "temp" variable). – Shankha057 May 21 '19 at 23:29
  • hopefully someone more seasoned than me can come up with a more elegant way, but the temp variable method would probably be my solution as well. – Dani May 21 '19 at 23:31
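
The sign-mapping idea from the comments (30W becomes -30, 50S becomes -50) can be sketched in plain Java as below. This is only the per-value conversion, not a DataVec API: the class name `CoordinateSign` and the method `toSignedDegrees` are made up for illustration, and it assumes every value ends in exactly one of N, S, E or W. Plugging it into a `TransformProcess` would still need some per-column transform, which is not shown here, and the columns would presumably have to be read as strings first, since a value like 20.55E is not a parseable float.

```java
import java.util.Locale;

// Hypothetical helper, not part of DataVec: converts a coordinate string
// with a trailing direction letter into a signed number of degrees.
class CoordinateSign {

    // Assumes the input ends in exactly one of N, S, E, W (case-insensitive).
    static double toSignedDegrees(String raw) {
        String s = raw.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty coordinate");
        }
        char direction = s.charAt(s.length() - 1);
        double sign;
        switch (direction) {
            case 'N':
            case 'E':
                sign = 1.0;   // north and east stay positive
                break;
            case 'S':
            case 'W':
                sign = -1.0;  // south and west become negative
                break;
            default:
                throw new IllegalArgumentException("missing direction suffix: " + raw);
        }
        // Parse everything before the direction letter and apply the sign.
        return sign * Double.parseDouble(s.substring(0, s.length() - 1));
    }

    public static void main(String[] args) {
        System.out.println(toSignedDegrees("20.55E"));
        System.out.println(toSignedDegrees("30W"));
        System.out.println(toSignedDegrees("50S"));
    }
}
```

Multiplying by a sign rather than prepending a "-" character keeps the result numeric, so the column could be treated as a double afterwards.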

0 Answers