
I'm using Hive to select millions of records over JDBC, and I now want to work with the ResultSet to write my OWL file. How can I apply this scenario using MapReduce?

Note: if that isn't possible, I also have the same database in MySQL; can I work with the ResultSet via MapReduce against MySQL instead? A rough sketch of what I'm considering follows my current code below.

private static String driverName = "org.apache.hive.jdbc.HiveDriver";

public static void main(String[] args) throws OWLOntologyCreationException, OWLOntologyStorageException, SQLException, ClassNotFoundException, FileNotFoundException {
  // Register driver and create driver instance
  //        Class.forName("com.mysql.jdbc.Driver");
  Class.forName(driverName);
  // get connection
  //        Connection connection = DriverManager.getConnection("jdbc:mysql://127.0.0.1:3306/foodProducts", "root", "root");
  Connection connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
  //        String SQL = "SELECT p.NDB_Number, p.Long_Name ,n.Output_Value, n.Output_UOM, n.Nutrient_Name, \n"
  //                + "p.Ingredients, ss.Household_Serving, ss.Household_Serving_Size_UOM, ss.Serving_Size, ss.Serving_Size_UOM\n"
  //                + "FROM foodProducts.Products as p left join foodProducts.Nutrients as n on n.NDB_No = p.NDB_Number\n"
  //                + "left join foodProducts.ServingSize as ss on ss.NDB_No = p.NDB_Number";
  String SQL = "SELECT  p.NDB_Number, p.Long_Name ,n.Output_Value,n.Output_UOM, n.Nutrient_Name ,p.Ingredients, ss.Household_Serving, " +
   "ss.Household_Serving_Size_UOM, ss.Serving_Size, ss.Serving_Size_UOM FROM products as p left join nutrients as n on n.NDB_No = p.NDB_Number " +
   "left join servingsize as ss on ss.NDB_No = p.NDB_Number ";

  connection.setAutoCommit(false);
  PreparedStatement stmt = connection.prepareStatement(SQL, java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
  //        stmt.setFetchSize(Integer.MIN_VALUE);
  stmt.setFetchSize(10000);
  stmt.execute();
  ResultSet res = stmt.getResultSet();

  OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
  // Load the local copy
  File owlFile = new File("src/main/resources/thesisOWL.owl");
  OWLOntology foodOwl = manager.loadOntologyFromOntologyDocument(owlFile);
  File owlDataFile = new File("src/main/resources/thesisOWLData.owl");

  OWLDataFactory dataFactory = manager.getOWLDataFactory();
  String base = "http://www.semanticweb.org/omar/ontologies/2019/10/untitled-ontology-69#";
  PrefixManager pm = new DefaultPrefixManager(base);

  while (res.next()) {

   String foodItemName = res.getString(2);
   String outputValue = res.getString(3);
   ...
   ....
   ....
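Would something like Hadoop's DBInputFormat over the MySQL copy be the right direction? Below is a rough, untested sketch of what I have in mind; the class names, column subset, predicate IRIs, and output path are only illustrative, not my actual schema:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProductsToTriples {

  // One row of the query; DBInputFormat hands one instance per row to the mapper.
  public static class ProductRecord implements Writable, DBWritable {
    String ndbNumber = "", longName = "", ingredients = "";

    public void readFields(ResultSet rs) throws SQLException {
      ndbNumber = rs.getString(1);
      longName = rs.getString(2) == null ? "" : rs.getString(2);
      ingredients = rs.getString(3) == null ? "" : rs.getString(3);
    }
    public void write(PreparedStatement ps) throws SQLException { } // input-only job
    public void readFields(DataInput in) throws IOException {
      ndbNumber = in.readUTF(); longName = in.readUTF(); ingredients = in.readUTF();
    }
    public void write(DataOutput out) throws IOException {
      out.writeUTF(ndbNumber); out.writeUTF(longName); out.writeUTF(ingredients);
    }
  }

  // Map-only job: every row is turned into N-Triples lines, written in parallel.
  public static class TripleMapper extends Mapper<LongWritable, ProductRecord, NullWritable, Text> {
    static final String BASE = "http://www.semanticweb.org/omar/ontologies/2019/10/untitled-ontology-69#";

    @Override
    protected void map(LongWritable key, ProductRecord r, Context ctx)
        throws IOException, InterruptedException {
      String subject = "<" + BASE + "food_" + r.ndbNumber + ">";
      ctx.write(NullWritable.get(), new Text(subject + " <" + BASE + "longName> \""
          + r.longName.replace("\"", "\\\"") + "\" ."));
      ctx.write(NullWritable.get(), new Text(subject + " <" + BASE + "ingredients> \""
          + r.ingredients.replace("\"", "\\\"") + "\" ."));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://127.0.0.1:3306/foodProducts", "root", "root");
    Job job = Job.getInstance(conf, "products-to-ntriples");
    job.setJarByClass(ProductsToTriples.class);
    job.setMapperClass(TripleMapper.class);
    job.setNumReduceTasks(0); // map-only
    job.setInputFormatClass(DBInputFormat.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    // The count query lets Hadoop split the rows across map tasks.
    DBInputFormat.setInput(job, ProductRecord.class,
        "SELECT NDB_Number, Long_Name, Ingredients FROM Products ORDER BY NDB_Number",
        "SELECT COUNT(*) FROM Products");
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The output here would be plain N-Triples rather than an in-memory OWL ontology, so nothing has to be held in memory while iterating; is that a reasonable way to do it?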
  • Where is the problem now? You're already iterating over the ResultSet, so why can't you add the data to the OWL ontology? – UninformedUser Nov 16 '19 at 07:58
  • The problem is that it takes too much time to iterate, maybe more than 6 hours, and it never finished! – Sarah S. Abulwafa Nov 16 '19 at 11:15
  • OK, but what do you expect now? At some point the data has to be generated, and your large data is in a Hive database. If Hive doesn't provide an export function, I don't see how you could avoid this. It's also not clear how many results we're talking about. Your code shows a limit of 10000; that can't take 6 hours, and without a limit it will indeed take longer. It could also be that your local machine is simply too slow. – UninformedUser Nov 17 '19 at 10:47
  • Of course, you could also try to run multiple queries in parallel and create OWL out of them. For simple strings you could also dump directly from Hive to disk; I mean, you could just create N-Triples, i.e. each line is a triple, and Hive can export CSV or TSV. That's pretty simple (see the sketch below). – UninformedUser Nov 17 '19 at 10:48
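A minimal sketch of what that last comment describes, assuming the join result has already been exported from Hive as a tab-separated file with the same column order as the query above (the file names and property IRIs are only placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class TsvToNTriples {
  public static void main(String[] args) throws Exception {
    String base = "http://www.semanticweb.org/omar/ontologies/2019/10/untitled-ontology-69#";
    try (BufferedReader in = new BufferedReader(new FileReader("hive_export.tsv"));
         PrintWriter out = new PrintWriter("thesisOWLData.nt")) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] f = line.split("\t", -1); // NDB_Number, Long_Name, Output_Value, ...
        String subject = "<" + base + "food_" + f[0] + ">";
        out.println(subject + " <" + base + "longName> \"" + f[1].replace("\"", "\\\"") + "\" .");
        out.println(subject + " <" + base + "outputValue> \"" + f[2].replace("\"", "\\\"") + "\" .");
      }
    }
  }
}

Each output line is already a complete triple, so the file is written incrementally and never has to be built up in memory the way the OWL API ontology is.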

0 Answers