Does anyone know whether the Cerner Bunsen library (https://github.com/cerner/bunsen) can load FHIR R4 Bundles and persist the data to Spark SQL databases? If anyone can offer any guidance, or point me to relevant documentation, that would be great. At the moment I'm just trying to load a sample Bundle from https://simplifier.net/ukcore. The ultimate objective is to persist incoming Bundles to a Hive database so the data can be queried from Apache Spark clusters.
The sample code I'm using to load a single-entry Bundle is:
// "spark" is an existing SparkSession created with Hive support enabled
Bundles bundles = Bundles.forR4();
URL fileUrl = R4Test.class.getClassLoader()
        .getResource("ukcore/UKCore-AllergyIntolerance-Amoxicillin-Example.json");
JavaRDD<Bundles.BundleContainer> bundlesRdd =
        bundles.loadFromDirectory(spark, fileUrl.toExternalForm(), 200);
List<Bundles.BundleContainer> collected = bundlesRdd.collect();
bundles.saveAsDatabase(spark, bundlesRdd, "r4database", "AllergyIntolerance");
When I call bundlesRdd.collect(), I get the following warnings:
INFO WholeTextFileRDD: Input split: Paths:/path/to/ukcore/UKCore-AllergyIntolerance-Amoxicillin-Example.json:0+2017
WARN LenientErrorHandler: Unknown element 'meta' found while parsing
WARN LenientErrorHandler: Unknown element 'clinicalStatus' found while parsing
WARN LenientErrorHandler: Unknown element 'verificationStatus' found while parsing
WARN LenientErrorHandler: Unknown element 'type' found while parsing
WARN LenientErrorHandler: Unknown element 'category' found while parsing
WARN LenientErrorHandler: Unknown element 'code' found while parsing
WARN LenientErrorHandler: Unknown element 'patient' found while parsing
WARN LenientErrorHandler: Unknown element 'encounter' found while parsing
WARN LenientErrorHandler: Unknown element 'recordedDate' found while parsing
WARN LenientErrorHandler: Unknown element 'recorder' found while parsing
WARN LenientErrorHandler: Unknown element 'asserter' found while parsing
WARN LenientErrorHandler: Unknown element 'reaction' found while parsing
And when I then call saveAsDatabase(), it fails with:
java.lang.IllegalArgumentException: Unsupported FHIR version: R4
at com.cerner.bunsen.definitions.StructureDefinitions.create(StructureDefinitions.java:120)
at com.cerner.bunsen.spark.SparkRowConverter.forResource(SparkRowConverter.java:75)
at com.cerner.bunsen.spark.SparkRowConverter.forResource(SparkRowConverter.java:54)
at com.cerner.bunsen.spark.Bundles.extractEntry(Bundles.java:211)
at com.cerner.bunsen.spark.Bundles.saveAsDatabase(Bundles.java:290)
I'm currently running with the following dependencies:
<dependencies>
    <dependency>
        <groupId>com.cerner.bunsen</groupId>
        <artifactId>bunsen-r4</artifactId>
        <version>0.4.5</version>
    </dependency>
    <dependency>
        <groupId>com.cerner.bunsen</groupId>
        <artifactId>bunsen-core</artifactId>
        <version>0.5.7</version>
    </dependency>
    <dependency>
        <groupId>com.cerner.bunsen</groupId>
        <artifactId>bunsen-spark</artifactId>
        <version>0.5.7</version>
    </dependency>
    <!--
    Pinned to resolve java.lang.IllegalAccessError:
    "tried to access method com.google.common.base.Stopwatch.<init>()V from class
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat"
    -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- Spark dependencies -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.4.5</version>
    </dependency>
</dependencies>
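One thing I noticed while writing this up: bunsen-r4 is at 0.4.5 while bunsen-core and bunsen-spark are at 0.5.7, and the stack trace fails inside StructureDefinitions.create with a version check. I don't know whether a bunsen-r4 artifact exists at 0.5.7 (that's an assumption on my part), but if it does, aligning all three Bunsen modules on one version seems worth trying:

```xml
<!-- Sketch, assuming a bunsen-r4 artifact exists at the same version as
     bunsen-core and bunsen-spark. Mixed Bunsen module versions may be why
     StructureDefinitions.create reports "Unsupported FHIR version: R4". -->
<dependency>
    <groupId>com.cerner.bunsen</groupId>
    <artifactId>bunsen-r4</artifactId>
    <version>0.5.7</version>
</dependency>
```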
Many thanks,
Dave