1

I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However, the mobile applications for that device send the data as strings rather than numerics, which causes the DSX recipe that calculates a Z score to choke. The data comes from a Cloudant DB used as a historian for Watson IoT Platform, so I can't simply reformat it there. Is there a simple way to convert the data inside a DSX notebook?

Philipp Langer
Skilganon

2 Answers

1

Just access the row object and convert it:

cloudantdata.rdd.map(lambda row: float(row.temperature)).take(10)

EDIT 30.1.17:

To directly address your question:

df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature").rdd.map(lambda row: (row.timestamp, float(row.temperature)))

That way you get an RDD of tuples, which IMHO is more usable than an RDD of Row objects anyway.
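If some readings could be empty or non-numeric, a bare float() call would fail the whole job. A minimal sketch of a more tolerant conversion follows; the helper names to_float_or_none and parse_row are hypothetical, and the Spark lines are left as comments because they assume cloudantdata is the DataFrame from the question. The namedtuple is only a local stand-in for a Spark Row.

```python
from collections import namedtuple


def to_float_or_none(value):
    """Convert a string reading to float; return None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def parse_row(row):
    """Turn a row with string fields into a (timestamp, temperature) tuple."""
    return (row.timestamp, to_float_or_none(row.temperature))


# In the notebook (assumption: `cloudantdata` is the DataFrame from the question):
# readings = (cloudantdata
#             .selectExpr("timestamp as timestamp",
#                         "data.d.objectTemp as temperature")
#             .rdd.map(parse_row)
#             .filter(lambda pair: pair[1] is not None))

# Local check using a namedtuple as a stand-in for a Spark Row
Row = namedtuple("Row", ["timestamp", "temperature"])
print(parse_row(Row("2017-01-30T12:00:00Z", "21.5")))
# prints ('2017-01-30T12:00:00Z', 21.5)
```

Filtering out the None values before the Z score step keeps unparseable readings from skewing the mean and standard deviation.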

Romeo Kienzler
  • Sorry, I'm not that familiar with this language. The existing line looks like this: df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature"). How can I convert that to use the code you've posted above for the second column? – Skilganon Jan 30 '17 at 13:45
  • Pandas is not recommended since it doesn't scale; Spark DataFrames and RDDs do. – Romeo Kienzler Jan 30 '17 at 14:11
0

I'm not familiar with DSX, but you can use Node-RED to parse the information from the devices and then store it in a Cloudant DB in numeric format.

idan
  • Thanks. I'm aware of that, but I'm looking to use the built-in historian capability. I don't want to pre-process the data in Node-RED and put it into a different database, so the only option is to convert it in DSX. – Skilganon Jan 30 '17 at 12:57