
I have a Spring-cloud-dataflow server deployed on Pivotal Cloud Foundry. The server runs a pipeline of three spring-batch tasks, encapsulated inside a composed-task.

When I launch the execution of this composed-task, the composed-task-runner starts the first batch job execution. This first batch is connected to two different datasources: a shared metadata datasource for the Spring metadata schemas (SCDF, SCT & SB) and a business datasource for my business data. Both databases are MySQL. The execution of this first task works fine; however, when the composed-task-runner attempts to retrieve the task execution status from the task repository (metadata datasource), it throws the following exception and stops the whole pipeline:

org.springframework.dao.DeadlockLoserDataAccessException: 
PreparedStatementCallback; 
SQL [SELECT TASK_EXECUTION_ID, START_TIME, END_TIME, TASK_NAME, EXIT_CODE, EXIT_MESSAGE, ERROR_MESSAGE, LAST_UPDATED, EXTERNAL_EXECUTION_ID, PARENT_EXECUTION_ID from TASK_EXECUTION where TASK_EXECUTION_ID = ?]; 
(conn:56675) Deadlock found when trying to get lock; 
try restarting transaction 
Query is: SELECT TASK_EXECUTION_ID, START_TIME, END_TIME, TASK_NAME, EXIT_CODE, EXIT_MESSAGE, ERROR_MESSAGE, LAST_UPDATED, EXTERNAL_EXECUTION_ID, PARENT_EXECUTION_ID from TASK_EXECUTION where TASK_EXECUTION_ID = ?, parameters [2]; 
nested exception is 
java.sql.SQLTransactionRollbackException: (conn:56675) Deadlock found when 
trying to get lock; try restarting transaction 
Query is: SELECT TASK_EXECUTION_ID, START_TIME, END_TIME, TASK_NAME,EXIT_CODE, EXIT_MESSAGE, ERROR_MESSAGE, LAST_UPDATED, EXTERNAL_EXECUTION_ID, PARENT_EXECUTION_ID from TASK_EXECUTION where TASK_EXECUTION_ID = ?, parameters [2] 
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:263) 
at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) 
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:649) 
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:684) 
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:716) 
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:726) 
at org.springframework.jdbc.core.JdbcTemplate.queryForObject(JdbcTemplate.java:800) 
at org.springframework.cloud.task.repository.dao.JdbcTaskExecutionDao.getTaskExecution(JdbcTaskExecutionDao.java:262) 
at org.springframework.cloud.task.repository.support.SimpleTaskExplorer.getTaskExecution(SimpleTaskExplorer.java:52) 
at org.springframework.cloud.task.app.composedtaskrunner.TaskLauncherTasklet.waitForTaskToComplete(TaskLauncherTasklet.java:146)
at org.springframework.cloud.task.app.composedtaskrunner.TaskLauncherTasklet.execute(TaskLauncherTasklet.java:123)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406) 
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330) 
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133) 
at org.springframework.batch.core.s.
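
For context, the call that fails is the CTR's polling loop: TaskLauncherTasklet.waitForTaskToComplete sleeps for interval-time-between-checks, then re-reads the TASK_EXECUTION row through the TaskExplorer until the child task records an end time. Roughly (a paraphrase of the observed behaviour, not the actual Spring source):

// Paraphrase of the CTR wait loop -- not the actual Spring source.
// The getTaskExecution() call below issues the SELECT from the stack trace.
private TaskExecution waitForTaskToComplete(TaskExplorer taskExplorer,
        long executionId, long intervalTimeBetweenChecks) throws InterruptedException {
    TaskExecution execution;
    do {
        Thread.sleep(intervalTimeBetweenChecks);                // configurable polling interval
        execution = taskExplorer.getTaskExecution(executionId); // the query that hits the deadlock
    } while (execution == null || execution.getEndTime() == null);
    return execution;
}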

The multi-datasource configuration of my spring-cloud-task / spring-batch application is the following:

BatchJobConfiguration class:

@Profile("!test")
@Configuration
@EnableBatchProcessing
public class BatchJobConfiguration {

@Autowired
private JobBuilderFactory jobBuilderFactory; 

[...]

@Bean
public Step step01() {
    return stepChargementFoliosBuilder().buildStepChargement(); 
}

@Bean
public Step step02() {
    return stepChargementPretsBuilder().buildStepChargement(); 
}

@Bean
public Step step03() {      
    return stepChargementGarantiesBuilder().buildStepChargement(); 
}

@Bean
public Job job() {
    return jobBuilderFactory.get("Spring Batch Job: chargement_donnees_SEM")
        .incrementer(new JobParametersIncrementer() {

            @Override
            public JobParameters getNext(JobParameters parameters) {
                return new JobParametersBuilder().addLong("time", System.currentTimeMillis()).toJobParameters();
            }
        })
        .flow(step01())
        .on("COMPLETED").to(step02())
        .on("COMPLETED").to(step03())
        .end()
        .build(); 
}

@Primary
@Bean
public BatchConfigurer batchConfigurer(@Qualifier(JPAConfiguration.METADATA_DATASOURCE) DataSource datasource) {
    return new DefaultBatchConfigurer(datasource);
}
}
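
The stepChargement*Builder() classes are elided above. For illustration only, here is a hypothetical sketch of what one of them could produce, assuming a chunk-oriented step over a made-up business entity Folio, with the business JPA transaction manager applied explicitly so business writes do not run against the @Primary metadata datasource:

// Hypothetical sketch -- the real step builders are not shown in the post.
@Bean
public Step stepChargementFolios(StepBuilderFactory stepBuilderFactory,
        ItemReader<Folio> folioReader,
        ItemWriter<Folio> folioWriter,
        @Qualifier(JPAConfiguration.BUSINESS_TRANSACTION_MANAGER) PlatformTransactionManager businessTransactionManager) {
    return stepBuilderFactory.get("chargementFolios")
        .<Folio, Folio>chunk(100)                        // commit interval of 100 items
        .reader(folioReader)
        .writer(folioWriter)
        .transactionManager(businessTransactionManager)  // keep business writes off the metadata datasource
        .build();
}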

Task configuration class:

@Profile("!test")
@Configuration
@EnableTask
public class TaskConfiguration {

@Bean
public TaskRepositoryInitializer taskRepositoryInitializer(@Qualifier(JPAConfiguration.METADATA_DATASOURCE) DataSource datasource) {
    TaskRepositoryInitializer initializer = new TaskRepositoryInitializer(); 
    initializer.setDataSource(datasource);

    return initializer; 
}

@Bean
public TaskConfigurer taskConfigurer(@Qualifier(JPAConfiguration.METADATA_DATASOURCE) DataSource datasource) {
    return new DefaultTaskConfigurer(datasource); 
}
}

Finally, here is the JPAConfiguration class:

@Profile("!test")
@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
    basePackages = "com.desjardins.parcourshabitation.chargerprets.repository",
    entityManagerFactoryRef = JPAConfiguration.BUSINESS_ENTITYMANAGER,
    transactionManagerRef = JPAConfiguration.BUSINESS_TRANSACTION_MANAGER
)
public class JPAConfiguration {

public static final String METADATA_DATASOURCE = "metadataDatasource";
public static final String BUSINESS_DATASOURCE = "businessDatasource"; 
public static final String BUSINESS_ENTITYMANAGER = "businessEntityManager"; 
public static final String BUSINESS_TRANSACTION_MANAGER = "businessTransactionManager"; 

@Primary
@Bean(name=METADATA_DATASOURCE)
public DataSource scdfDatasource() {
    return new DatasourceBuilder("scdf-mysql").buildDatasource(); 
}

@Bean(name=BUSINESS_DATASOURCE)
public DataSource pretsDatasource() {
    return new DatasourceBuilder("sem-mysql").buildDatasource(); 
}

@Bean(name=BUSINESS_ENTITYMANAGER)
public LocalContainerEntityManagerFactoryBean businessEntityManager(EntityManagerFactoryBuilder builder, @Qualifier(BUSINESS_DATASOURCE) DataSource dataSource) {

    return builder
        .dataSource(dataSource)
        .packages("com.desjardins.parcourshabitation.chargerprets.domaine")
        .build(); 
}

@Bean(name = BUSINESS_TRANSACTION_MANAGER)
public PlatformTransactionManager businessTransactionManager(@Qualifier(BUSINESS_ENTITYMANAGER) EntityManagerFactory entityManagerFactory) {
    return new JpaTransactionManager(entityManagerFactory);
}
}
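
DatasourceBuilder is a custom class not shown in the post. Assuming it resolves a bound PCF service instance by name, a minimal sketch using Spring Cloud Connectors (the real implementation may differ) would be:

import javax.sql.DataSource;

import org.springframework.cloud.Cloud;
import org.springframework.cloud.CloudFactory;

// Minimal sketch, assuming the builder looks up a bound CF service by name
// (e.g. "scdf-mysql" or "sem-mysql") through Spring Cloud Connectors.
public class DatasourceBuilder {

    private final String serviceName;

    public DatasourceBuilder(String serviceName) {
        this.serviceName = serviceName;
    }

    public DataSource buildDatasource() {
        Cloud cloud = new CloudFactory().getCloud();
        // Resolve the named service binding to a DataSource connector
        return cloud.getServiceConnector(serviceName, DataSource.class, null);
    }
}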

Versions used:

  • Composed-task-runner: 1.0.0.RELEASE
  • Spring-cloud-task: 1.2.2.RELEASE

I have tried launching the composed-task with different values for the interval-time-between-checks property, but this has not been conclusive.
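
For reference, since the CTR is a plain Spring Boot application, I passed the property as a launch argument, e.g. `task launch my-pipeline --arguments "--interval-time-between-checks=30000"` (`my-pipeline` being a placeholder for the composed-task name).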

I have uploaded a GitHub repository with a minimalistic version of the code, with instructions on how to reproduce in the readme file: https://github.com/JLauzonG/deadlock-bug-stackoverflow

Any clues on how to solve this?

  • Thanks for the detailed write-up! Is it possible to share a simplified sample in a GH repo that reproduces the problem? That will help us to retry it on our side more easily. – Sabby Anandan Nov 08 '17 at 16:45
  • I have created a minimalistic version of our task and it's now uploaded here: https://github.com/JLauzonG/deadlock-bug-stackoverflow. The instructions to reproduce this bug are in the readme file... – Jeremy L-G Nov 08 '17 at 22:03
  • Hi, @jeremy-l-g. Thanks for taking the time to share the sample. This is not the solution, but I wanted to share my findings. 1) There's no option that I'm aware of to supply 2 datasources to a `composed-task-runner` (CTR). It is not designed to handle >1 datasource. The java-buildpack's autoreconfiguration _will_ fail, and it will skip connecting to either of the DBs. Instead, it will use the H2 database by default. Because the composed-task and SCDF will be on different DBs, the child-task execution will fail, unable to find the task-id, which is persisted in SCDF's TaskRepository. – Sabby Anandan Nov 10 '17 at 03:16
  • 2) To somehow run the sample (to reproduce the problem), I changed the code to make both the datasources and SCDF configured to connect to the same DB. With that, I was able to run the composed-task (e.g., `task create lockzz --definition "d1: dlock && d2: dlock"`) from SCDF on PWS. I didn't see the deadlock error - I re-launched the composed-task a few times, and it did its job and shut down the containers at the end as expected. – Sabby Anandan Nov 10 '17 at 03:17
  • 3) The ideal approach to applying multiple datasources would be when we standardize the way SCDF accepts deployment properties for `composed-task-runner`. We currently have spring-cloud/spring-cloud-dataflow#1717, which addresses the model in which we would propagate the explicit binding to child-tasks. – Sabby Anandan Nov 10 '17 at 03:18
  • With all the above said, I'm still curious how to have had the CTR run with 2 different datasources on PCF. Maybe I misinterpreted something. – Sabby Anandan Nov 10 '17 at 03:20
  • Maybe my understanding of the flow of a composed task is wrong. 1) My SCDF server is bound to a datasource, say `foo`. I have set the environment variable `SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES` to the value `foo`, so that every task deployed on the server will inherit this datasource binding. When I launch the composed task, the CTR is the first task to be deployed on PCF. It is automatically bound to datasource `foo`. – Jeremy L-G Nov 10 '17 at 14:29
  • 2) The CTR proceeds to deploy the first child-task of the pipeline, which is my first spring-batch job. This spring-batch job is bound to two datasources: `foo`, as bound by SCDF, and a second business datasource. With the help of its bound `foo` datasource, it updates the table `TASK_EXECUTION`. The CTR, which is simply doing a `Thread.sleep(interval)`, eventually wakes up and queries the TASK_EXECUTION table to know if the child-task has completed. If the CTR finds that the child-task has indeed completed, it launches the next task in the pipeline, and so on... – Jeremy L-G Nov 10 '17 at 14:31
  • 3) At the end of the execution, SCDF uses the `foo` datasource to display the result of the whole pipeline execution on the dashboard, based on the individual updates made by each child-task. The problem is, when the CTR, with the help of `foo`, queries the TASK_EXECUTION table, the first child-task currently running seems to hold a lock on that table... – Jeremy L-G Nov 10 '17 at 14:33
  • 4) So to sum it up, SCDF is bound to `foo`, the CTR is bound to `foo`, and the child-tasks are bound both to `foo` and another business datasource. Since the deadlock seems to occur in code contained inside the Spring packages, the only issue can be some misconfiguration / misunderstanding on my part. Maybe I understood wrong, and every child-task can be bound to only one datasource. In that case, I will consider not using the CTR, and orchestrate my tasks in such a way that after their respective batch execution, they will call SCDF's API to launch the next task in the pipeline. – Jeremy L-G Nov 10 '17 at 14:40
  • #1 is correct. I was following #2 as well; however, your comment is missing: "SCDF + CTR" together launch the child-tasks embedded in the composed-task graph (e.g., `"d1: dlock && d2: dlock"`). To be clear, CTR makes REST calls to SCDF to launch the child-tasks, and also, CTR keeps track of their execution and orchestrates the graph in the DSL order (e.g., `d1` runs first and then `d2`). When SCDF launches each task, they also need to be bound with 2 datasources. That won't happen unless we plug 2 datasources into the `SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES` property. Does this help? – Sabby Anandan Nov 10 '17 at 14:46
  • In your case, SCDF + CTR are bound to `foo` - this is correct, and it can only have 1 datasource by design. I have a feeling you're manually pushing child-tasks that are bound to 2 datasources and then expecting CTR to reuse the existing droplets - are you? If yes, then that's not how composed-task is supposed to be used. The child-tasks are _also_ created and launched by SCDF (via CTR). – Sabby Anandan Nov 10 '17 at 14:51
  • In every child task's manifest, I have set the second database instance to be bound when deployed. However, when SCDF deploys these tasks, the manifest-defined services are disregarded. I have to manually bind a second database to each child task once SCDF has initially deployed them on PCF. If I bind two DB instances to the server environment variable, CTR will inherit both and, effectively, it will fail, which is, I believe, not an option. I'm puzzled, can child-tasks have access to several databases in a composed task context? It seems like a pretty standard use-case. – Jeremy L-G Nov 10 '17 at 15:26
  • Also, if I bind a second database to a child task, why would there be a deadlock on the metadata one? Both are unrelated. – Jeremy L-G Nov 10 '17 at 15:27
  • (I will edit the answer - mistakenly added - with my final solution). – Jeremy L-G Nov 10 '17 at 15:29
  • _I'm puzzled, can child-tasks have access to several databases in a composed task context?_ => It is a very valid use-case! And we will support it as soon as [spring-cloud/spring-cloud-dataflow#1717](https://github.com/spring-cloud/spring-cloud-dataflow/pull/1717) is merged. This allows you to pass any number of service bindings to a particular child-task (e.g., `deployer.d1.cloudfoundry.services=mysql-a,mysql-b,deployer.d2.cloudfoundry.services=mysql-c,mysql-d`), where `d1` and `d2` are child-tasks. As long as the child-task knows how to handle 2 or more datasources, it will work. – Sabby Anandan Nov 10 '17 at 15:43
  • _Also, if I bind a second database to a child task, why would there be a deadlock on the metadata one?_ => I was hoping I'd see this in action - at least I cannot reproduce it. Maybe once #1717 is merged, we can revisit it. Also, I have created a gitter room for SCT -> https://gitter.im/spring-cloud/spring-cloud-task - if you want to discuss this more in real-time, please ping there. – Sabby Anandan Nov 10 '17 at 15:47

1 Answer


In every child task's manifest, I have set the second database instance to be bound when deployed. However, when SCDF deploys these tasks, the manifest-defined services are disregarded. I have to manually bind a second database to each child task once SCDF has initially deployed them on PCF. If I bind two DB instances to the server environment variable, CTR will inherit both and, effectively, it will fail, which is, I believe, not an option.
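
For anyone hitting the same wall: the manual workaround amounts to binding the business database service to each child-task app after SCDF first deploys it, e.g. `cf bind-service <child-task-app> sem-mysql` (service name as in JPAConfiguration above), and then re-launching the task.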