0

Problem Statement: I am trying to export a huge database using java. I have implemented it using queryForStream() of JDBCTemplate as it provides streaming feature. For normal entities, its working perfectly. For, let's say ~3GB of data, its consuming around 600MB of memory, which is fine. The problem comes when relationships are introduced in the entity. There I am unable to get the data of entities which are in relationship with other entity. Example: I have an entity E1 which has 1:many relationship with entity E2. When I hit my export API, it exports whatever is in the table of entity E1, which makes sense, as my sql query asks it to do it like that, while the columns which has relationships are coming blank as it does not exist in the table.

Any idea on how to proceed?

Edit: I cannot use mysqldump to export the data, as I am doing some processing before exporting it.

Edit: Main logic to hit DB: try (Stream<?> dataStream = template.queryForStream(new StreamingStatementCreator(sql), getEntityRowMapper(clazz, converter, context))) { //processing logic goes here }

StreamingStatementCreator.java

public class StreamingStatementCreator implements PreparedStatementCreator {
  private final String sql;

  public StreamingStatementCreator(String sql) {
    this.sql = sql;
  }

  @Override
  public PreparedStatement createPreparedStatement(Connection connection) throws SQLException {
    final PreparedStatement statement = connection.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    statement.setFetchSize(Integer.MIN_VALUE);
    return statement;
  }
}

getEntityRowMapper(clazz, converter, context)):

  private EntityRowMapper<?> getEntityRowMapper(Class<?> domainType, JdbcConverter converter, JdbcMappingContext context) {
    return new EntityRowMapper<>(getRequiredPersistentEntity(domainType, context), converter);
  }

  private <S> RelationalPersistentEntity<S> getRequiredPersistentEntity(Class<S> domainType, JdbcMappingContext context) {
    return (RelationalPersistentEntity<S>) context.getRequiredPersistentEntity(domainType);
  }

Edit:

Table-1
country

CREATE TABLE `country` (
  `code` int(11) DEFAULT NULL,
  `updated_by` varchar(100) DEFAULT NULL,
  `created_by` varchar(100) DEFAULT NULL,
  `name` varchar(50) DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `updated_on` datetime(3) DEFAULT NULL,
  `created_on` datetime(3) DEFAULT NULL,
  `version` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| code       | int(11)      | YES  |     | NULL    |                |
| updated_by | varchar(100) | YES  |     | NULL    |                |
| created_by | varchar(100) | YES  |     | NULL    |                |
| name       | varchar(50)  | YES  |     | NULL    |                |
| id         | int(11)      | NO   | PRI | NULL    | auto_increment |
| updated_on | datetime(3)  | YES  |     | NULL    |                |
| created_on | datetime(3)  | YES  |     | NULL    |                |
| version    | bigint(20)   | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+

Table-2
capital

CREATE TABLE `capital` (
  `name` varchar(50) DEFAULT NULL,
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `population` int(11) DEFAULT NULL,
  `country_capital_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `sys_country_capital_id` (`country_capital_id`),
  CONSTRAINT `fk__capital__country_country_capital_id_id` FOREIGN KEY (`country_capital_id`) REFERENCES `country` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 |
+--------------------+-------------+------+-----+---------+----------------+
| Field              | Type        | Null | Key | Default | Extra          |
+--------------------+-------------+------+-----+---------+----------------+
| name               | varchar(50) | YES  |     | NULL    |                |
| id                 | int(11)     | NO   | PRI | NULL    | auto_increment |
| population         | int(11)     | YES  |     | NULL    |                |
| country_capital_id | int(11)     | YES  | UNI | NULL    |                |
+--------------------+-------------+------+-----+---------+----------------+

Table-1 is in ONE_TO_ONE relationship with Table-2

Expected result is a csv file with the expected value would be:

|id  |code  |name   |capital                                            |createdBy   |createdOn     |updatedBy  |updatedOn      |version
|1   |91    |India  |{"id":1,"name":"New Delhi","population":1173902}   |System      |1.69E+09      |System     |1.69E+09       |0
|2   |91    |India  |{"id":2,"name":"New Delhi","population":1173902}   |System      |1.69E+09      |System     |1.69E+09       |0
|3   |91    |India  |{"id":3,"name":"New Delhi","population":1173902}   |System      |1.69E+09      |System     |1.69E+09       |0
|4   |91    |India  |{"id":4,"name":"New Delhi","population":1173902}   |System      |1.69E+09      |System     |1.69E+09       |0
Emperor
  • 11
  • 4
  • *I am trying to export a huge database using java.* Why? Why are you not using standard tools like mysqldump? – g00se Jun 16 '23 at 10:04
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jun 16 '23 at 10:16
  • @g00se its the requirement, as client doesn't have access to mysql – Emperor Jun 16 '23 at 10:35
  • @Mar-Z our whole system is built on JDBC, so switching to JPA would be my last resolve. Mapping to E1 is memory consuming as in that case, I would be holding data for the parent entity until I get the data of child entity. – Emperor Jun 16 '23 at 10:38
  • *@g00se its the requirement, as client doesn't have access to mysql*. So, you write an interface for them that uses these tools. Using an ORM layer to export real data doesn't make any sense – g00se Jun 16 '23 at 10:45
  • @g00se, ok then how would we manage the relationships there? It will simply give me the data that's present in the table. – Emperor Jun 16 '23 at 10:51
  • @Emperor via joins. I would not use ORM either for dumping GBs of data. I would use simple queries to get the data – Shadow Jun 16 '23 at 10:57
  • @Shadow, I probably missed to mention it in the question. But my MySQL server and tomcat server are not at the same place. For using mysql dump, I would need to fire a mysql client and hit the queries and then kill that mysql client. For large data it is good. But again for small data, it will take time. Ex: for KBs of data, it might take 3-4 mins – Emperor Jun 16 '23 at 11:03
  • @Shadow, writing direct queries for joining tables whose details will be changing on the fly would be a little tricky as well. – Emperor Jun 16 '23 at 11:08
  • @Mar-Z is it the data mapping you are talking about or structural mapping. If its structural mapping, then that's already done. Don't know how to do the data mapping between entities without holding it in the memory. – Emperor Jun 16 '23 at 11:29
  • The relationships *are* the data – g00se Jun 16 '23 at 11:30
  • Ok, so here is an example: I have 2 entities, E1 and E2. E1 has a column 'id' as primary key which is mapped to 'E1_id' column of E2. The relationship is 1:many (E1(1) -> E2(Many)). This information I have in hand. Now while exporting the data, I have, lets say 'id' of E1 as 5 against which, I have 2 values in E2. What I want is, if my row of E1 with 'id' 5 is exported, I want the 2 rows of E2 as well along with it. – Emperor Jun 16 '23 at 11:37
  • @Mar-Z yeah thats correct. Assume this thing changes on the fly and I don't know which column is mapped to which column. Then it would be difficult to generate the sql query. Right? – Emperor Jun 16 '23 at 11:50
  • If you don't know the structure of your database model beforehand then it will be not possible to fetch any relationship. IMHO – Mar-Z Jun 16 '23 at 12:54
  • @Mar-Z No what I am saying is, I have that structure, but generifying that query would still be a tricky task. – Emperor Jun 16 '23 at 13:23
  • Please provide the actual code you are using, not some vague prose description. – Jens Schauder Jun 19 '23 at 05:33
  • @Jens Schauder, though I had described the method I had used, I have updated the question with the actual code that I am using. Do let me know your thoughts. – Emperor Jun 19 '23 at 06:54
  • What are "columns which has relationships"? How does you SQL statement look like? How does your data look like? What is the exact result you get? – Jens Schauder Jun 23 '23 at 07:13
  • @Jens Schauder, the term "columns which has relationships" refers to tables which are related to each other via column. Currently I don't have a generic SQL statement, hence the question. Although, I am using "Select * from " to get all the data from the table. My data contains tables of MySQL which are in relationship. The result which I get when the given query is run is the details of the table given in the query. What I want is, the details of all the tables which are related to the parent table. – Emperor Jun 26 '23 at 06:33
  • Again: We don't need vague (and honestly contradictionary) prose. We need the SQL you are executing. The actual result and the expected result. – Jens Schauder Jun 26 '23 at 09:37
  • Actually, I can't generify the SQL statement, one of the reasons why I asked the question. And based on the sql, my actual result would vary. However, let me update the question with expected result. – Emperor Jun 26 '23 at 09:56
  • @Jens Schauder, any idea on how to approach? – Emperor Jul 04 '23 at 06:51
  • I just reread the question and all the comment, and I still don't understand what the problem / question is. The desired content of the last example can be trivially construct using close to trivial SQL. But in the comments you talk about "generifying" SQL statement which isn't explained anywhere. It is tagged with [spring-data-jdbc] but none of the concepts of Spring Data JDBC is to be seen, it's all Spring JDBC ... Sorry, I'm lost. – Jens Schauder Jul 04 '23 at 08:21
  • I added spring data jdbc to know if there is a way to do the job with spring data jdbc. My question is about getting the data from database, do some processing and write it into a csv file. Normally it works, but with relationship, it doesn't. Posted this question to know whether thats possible. :) – Emperor Jul 05 '23 at 07:58

0 Answers0