I'm working on converting the existing recipe for Data Science Experience (DSX) to use data from a connected Sensor Tag device. However, the mobile applications for that device send the data as strings rather than numerics, which causes the DSX recipe that calculates a Z-score to choke. The data comes from a Cloudant DB used as a historian for Watson IoT Platform, so I can't simply reformat it there. Is there a simple way to convert the data inside a DSX notebook?
2 Answers
1
Just access the row object and convert it:
cloudantdata.rdd.map(lambda row : float(row.temperature)).take(10)
EDIT 30.1.17:
To directly address your question:
df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature").rdd.map(lambda row : (row.timestamp,float(row.temperature)))
That way you get an RDD of tuples, which IMHO is more usable than an RDD of Row objects anyway.
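One caveat with the bare float(...) call above: it raises on an empty or malformed string, which would fail the whole Spark job. Here is a minimal sketch of a tolerant converter plus the Z-score step, in plain Python so it can be tried outside Spark. The names to_float and z_scores and the sample readings are made up for illustration; they are not part of the DSX recipe or of any library.

```python
import math
from statistics import mean, pstdev


def to_float(value):
    """Convert a reading sent as a string to float; return None if unparseable."""
    try:
        result = float(value)
    except (TypeError, ValueError):
        return None
    # float() accepts "nan" and "inf"; treat those as missing readings too.
    if math.isnan(result) or math.isinf(result):
        return None
    return result


def z_scores(rows):
    """Take (timestamp, temperature_string) pairs and return
    (timestamp, temperature, z_score) for every parseable reading."""
    parsed = [(ts, to_float(temp)) for ts, temp in rows]
    clean = [(ts, v) for ts, v in parsed if v is not None]
    if not clean:
        return []
    values = [v for _, v in clean]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All readings identical: every Z-score is zero.
        return [(ts, v, 0.0) for ts, v in clean]
    return [(ts, v, (v - mu) / sigma) for ts, v in clean]


# Hypothetical sample: three valid string readings and one empty one.
sample = [("t1", "21.0"), ("t2", "23.0"), ("t3", ""), ("t4", "25.0")]
result = z_scores(sample)
```

In the notebook, the same helper could presumably replace float in the lambda, e.g. lambda row : (row.timestamp, to_float(row.temperature)), followed by a filter that drops the None values before the Z-score is computed.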

Romeo Kienzler
-
Sorry, I'm not that familiar with this language. The existing line looks like this: df = cloudantdata.selectExpr("timestamp as timestamp", "data.d.objectTemp as temperature"). How can I convert that to use the code you've posted above for the second column? – Skilganon Jan 30 '17 at 13:45
-
Pandas is not recommended since it doesn't scale; Spark DataFrames and RDDs do. – Romeo Kienzler Jan 30 '17 at 14:11
0
I'm not familiar with DSX, but you can use Node-RED to parse the data from the devices and then store it in a Cloudant DB in numeric format.

idan
-
Thanks. I'm aware of that, but I want to use the built-in historian capability. I don't want to pre-process the data in Node-RED and put it into a different database, so the only option is to convert it in DSX. – Skilganon Jan 30 '17 at 12:57