
We are creating a Java-backed web script in Alfresco. At a certain interval, this web script should download a large number of files from a remote system, process them, and version them in Alfresco.

This web script will be triggered from a Jenkins box, so we plan to poll its status until the whole process has completed. This will happen regularly, say daily or weekly.

How can I make the web script send intermediate responses to the Jenkins job at regular intervals while it carries on processing? Once all processing has completed, the same web script call should send a completed status to the Jenkins box.

How can I achieve this?

Note: I cannot use cron. I can only use Jenkins, because the input to the web script will be sent from Jenkins (which receives it from a different product).

sabtharishi
  • I'd suggest you look at the Replication and Transfer services in Alfresco - those have long-running jobs and webscripts which allow you to check the status. That's probably your best model to learn from built into Alfresco – Gagravarr Jan 15 '16 at 09:28
  • @Gagravarr, thanks for your input. I will look into them. – sabtharishi Jan 15 '16 at 17:00

2 Answers


Below I describe how to implement a batch process in Alfresco. Before going into details, I would also suggest integrating this process with Activiti workflows (or jBPM if you prefer).

As described later, the process will send events to notify listeners on the progress of the job. The listener of these events can call Jenkins directly.

Instead of calling Jenkins directly, the listener can update a workflow. In this case the logic to call Jenkins is implemented in a workflow task, which makes it easier to separate the logic of the batch process from the logic of the notifier. The workflow can also be used to store information about the progress of the job, which any interested party can then poll.

Long running process:

I do not know which version of Alfresco you are using, so I will describe a solution for version 4.1. Alfresco supports long-running batch processes mainly through the classes and interfaces in the package org.alfresco.repo.batch:

BatchProcessWorkProvider

BatchProcessor

BatchProcessor.BatchProcessWorker

BatchMonitor

BatchMonitorEvent

You will need to provide implementations of two interfaces: BatchProcessWorkProvider and BatchProcessor.BatchProcessWorker.

Both interfaces are attached below. The first one returns the work loads and the second defines what a worker is.

BatchProcessWorkProvider:

/**
 * An interface that provides work loads to the {@link BatchProcessor}.
 * 
 * @author Derek Hulley
 * @since 3.4
 */
public interface BatchProcessWorkProvider<T>
{
    /**
     * Get an estimate of the total number of objects that will be provided by this instance.
     * Instances can provide accurate answers on each call, but only if the answer can be
     * provided quickly and efficiently; usually it is enough to cache the result after
     * providing an initial estimate.
     * 
     * @return                  a total work size estimate
     */
    int getTotalEstimatedWorkSize();
    
    /**
     * Get the next lot of work for the batch processor.  Implementations should return
     * the largest number of entries possible; the {@link BatchProcessor} will keep calling
     * this method until it has enough work for the individual worker threads to process
     * or until the work load is empty.
     * 
     * @return                  the next set of work object to process or an empty collection
     *                          if there is no more work remaining.
     */
    Collection<T> getNextWork();
}

BatchProcessWorker:

/**
 * An interface for workers to be invoked by the {@link BatchProcessor}.
 */
public interface BatchProcessWorker<T>
{
    /**
     * Gets an identifier for the given entry (for monitoring / logging purposes).
     * 
     * @param entry
     *            the entry
     * @return the identifier
     */
    public String getIdentifier(T entry);

    /**
     * Callback to allow thread initialization before the work entries are
     * {@link #process(Object) processed}.  Typically, this will include authenticating
     * as a valid user and disabling or enabling any system flags that might affect the
     * entry processing.
     */
    public void beforeProcess() throws Throwable;

    /**
     * Processes the given entry.
     * 
     * @param entry
     *            the entry
     * @throws Throwable
     *             on any error
     */
    public void process(T entry) throws Throwable;

    /**
     * Callback to allow thread cleanup after the work entries have been
     * {@link #process(Object) processed}.
     * Typically, this will involve cleanup of authentication and resetting any
     * system flags previously set.
     * <p/>
     * This call is made regardless of the outcome of the entry processing.
     */
    public void afterProcess() throws Throwable;
}

In practice BatchProcessWorkProvider returns a collection of "work to do" (the "T" class). The "work to do" is a class that you need to provide. In your case this class can provide the information to extract a subset of the files from the remote system. The method process will use this information to actually do the job. Just as an example, in your case, we can call T, ImportFiles.

Your BatchProcessWorkProvider should partition the list of files into a collection of ImportFiles of a reasonable size.
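The partitioning described above could look roughly like the sketch below. It is only an illustration: the interface is a local stand-in that mirrors the BatchProcessWorkProvider quoted earlier (so the code compiles without Alfresco on the classpath), and ImportFiles, ImportFilesWorkProvider, filesPerChunk and chunksPerCall are hypothetical names, not part of Alfresco.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Local stand-in mirroring org.alfresco.repo.batch.BatchProcessWorkProvider as quoted above.
// In a real module you would implement the Alfresco interface instead.
interface BatchProcessWorkProvider<T> {
    int getTotalEstimatedWorkSize();
    Collection<T> getNextWork();
}

// Hypothetical unit of work: a slice of remote file paths to import together.
final class ImportFiles {
    private final List<String> remotePaths;

    ImportFiles(List<String> remotePaths) {
        this.remotePaths = Collections.unmodifiableList(new ArrayList<String>(remotePaths));
    }

    List<String> getRemotePaths() {
        return remotePaths;
    }
}

// Hands out the remote file list as ImportFiles chunks, a few chunks per call.
final class ImportFilesWorkProvider implements BatchProcessWorkProvider<ImportFiles> {
    private final List<String> allRemotePaths;
    private final int filesPerChunk;   // assumed tuning knob: files per ImportFiles entry
    private final int chunksPerCall;   // assumed tuning knob: entries returned per getNextWork()
    private int nextIndex = 0;

    ImportFilesWorkProvider(List<String> allRemotePaths, int filesPerChunk, int chunksPerCall) {
        this.allRemotePaths = new ArrayList<String>(allRemotePaths);
        this.filesPerChunk = filesPerChunk;
        this.chunksPerCall = chunksPerCall;
    }

    @Override
    public int getTotalEstimatedWorkSize() {
        // Number of ImportFiles entries, rounded up.
        return (allRemotePaths.size() + filesPerChunk - 1) / filesPerChunk;
    }

    @Override
    public synchronized Collection<ImportFiles> getNextWork() {
        List<ImportFiles> work = new ArrayList<ImportFiles>();
        while (work.size() < chunksPerCall && nextIndex < allRemotePaths.size()) {
            int end = Math.min(nextIndex + filesPerChunk, allRemotePaths.size());
            work.add(new ImportFiles(allRemotePaths.subList(nextIndex, end)));
            nextIndex = end;
        }
        // An empty collection signals that no work remains.
        return work;
    }
}
```

What counts as a "reasonable size" depends on how expensive one file is to download and version, so both knobs are left as constructor arguments here.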

The "most important" method in BatchProcessWorker is

public void process(ImportFiles filesToImport) throws Throwable;

This is the method that you have to implement. For the other methods there is an adapter, BatchProcessor.BatchProcessWorkerAdapter, that provides a default implementation.

The process method receives an ImportFiles as its parameter and can use it to find the files on the remote servers and import them.
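As a rough illustration of such a worker, the sketch below extends an adapter and implements only process. The interface and adapter are local stand-ins for BatchProcessor.BatchProcessWorker and BatchProcessor.BatchProcessWorkerAdapter, and ImportFiles, RemoteFileImporter and ImportFilesWorker are hypothetical names; the actual download and versioning logic is hidden behind RemoteFileImporter and is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Local stand-in mirroring BatchProcessor.BatchProcessWorker as quoted above.
interface BatchProcessWorker<T> {
    String getIdentifier(T entry);
    void beforeProcess() throws Throwable;
    void process(T entry) throws Throwable;
    void afterProcess() throws Throwable;
}

// Local stand-in for the adapter: default callbacks, so subclasses only write process().
abstract class BatchProcessWorkerAdapter<T> implements BatchProcessWorker<T> {
    public String getIdentifier(T entry) { return String.valueOf(entry); }
    public void beforeProcess() throws Throwable { }
    public void afterProcess() throws Throwable { }
}

// Hypothetical unit of work (repeated here so the sketch stands alone).
final class ImportFiles {
    private final List<String> remotePaths;

    ImportFiles(List<String> remotePaths) {
        this.remotePaths = Collections.unmodifiableList(new ArrayList<String>(remotePaths));
    }

    List<String> getRemotePaths() { return remotePaths; }
}

// Hypothetical seam for "download one remote file and version it in Alfresco".
interface RemoteFileImporter {
    void importFile(String remotePath) throws Exception;
}

final class ImportFilesWorker extends BatchProcessWorkerAdapter<ImportFiles> {
    private final RemoteFileImporter importer;
    // Counts import calls; if a transaction is retried, the same file may be counted again.
    private final AtomicInteger importedCount = new AtomicInteger();

    ImportFilesWorker(RemoteFileImporter importer) {
        this.importer = importer;
    }

    @Override
    public void process(ImportFiles filesToImport) throws Throwable {
        for (String remotePath : filesToImport.getRemotePaths()) {
            importer.importFile(remotePath);
            importedCount.incrementAndGet();
        }
    }

    int getImportedCount() {
        return importedCount.get();
    }
}
```

Keeping the remote access behind a small interface is a design choice of this sketch; it lets the worker be exercised without a remote system or a repository.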

Finally, you need to instantiate a BatchProcessor:

try {
    final RetryingTransactionHelper retryingTransactionHelper = transactionService.getRetryingTransactionHelper();
    BatchProcessor<ImportFiles> batchProcessor = new BatchProcessor<ImportFiles>(processName,
            retryingTransactionHelper, workProvider, threads, batchSize,
            applicationEventPublisher, logger, loggingInterval);
    batchProcessor.process(worker, true);
} 
catch (LockAcquisitionException e) {
    /* Manage exception */
} 

Where

processName: a description of the long running process

workProvider: an instance of the BatchProcessWorkProvider

threads: the number of worker threads (in parallel)

batchSize: the number of entries to process in the same transaction

logger: the logger to use for reporting the progress

loggingInterval: the number of entries to process before reporting progress

retryingTransactionHelper: the helper class that retries the transaction if it fails because of a concurrent update (optimistic locking) or a deadlock condition.

applicationEventPublisher: this is an instance of the Spring ApplicationEventPublisher that is usually (and also for Alfresco) the Spring ApplicationContext.

To send events to Jenkins you can use the applicationEventPublisher. The following link describes how to use it. It is a standard functionality of Spring.

Spring events

An event can be, for example sent by the method

process(ImportFiles filesToImport)

described above.

Marco Altieri

I will not argue your choice of a webscript to implement your logic, although I am not 100 % OK with it.

As for your question, you can store the overall progress status of your job/logic execution in a singleton and have another web script (or the same one called with different parameters) return that value for you.
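A minimal sketch of such a status holder, in plain Java, might look like this. The names ImportJobStatus, JobState, tryStart, recordProcessed, finish and toJson are all hypothetical. In Alfresco you would more likely register the holder as a Spring bean than use a static instance.

```java
// Possible job states as seen by the polling client.
enum JobState { IDLE, RUNNING, COMPLETED, FAILED }

// Process-wide holder for the progress of the import job.
final class ImportJobStatus {
    private static final ImportJobStatus INSTANCE = new ImportJobStatus();

    private JobState state = JobState.IDLE;
    private int processed;
    private int total;

    private ImportJobStatus() { }

    static ImportJobStatus getInstance() {
        return INSTANCE;
    }

    // Returns false if a run is already in progress, so a second trigger can be rejected.
    synchronized boolean tryStart(int totalFiles) {
        if (state == JobState.RUNNING) {
            return false;
        }
        state = JobState.RUNNING;
        processed = 0;
        total = totalFiles;
        return true;
    }

    // Called by the workers as files are imported.
    synchronized void recordProcessed(int count) {
        processed += count;
    }

    // Called once when the batch ends, successfully or not.
    synchronized void finish(boolean success) {
        state = success ? JobState.COMPLETED : JobState.FAILED;
    }

    // Body a status web script could return to the polling Jenkins job.
    synchronized String toJson() {
        return "{\"state\":\"" + state + "\",\"processed\":" + processed
                + ",\"total\":" + total + "}";
    }
}
```

With this shape, the triggering call would invoke tryStart, hand the work to a background thread and return immediately, while the Jenkins job polls the status call until the state is COMPLETED or FAILED. Note that the state lives in one JVM only, so in a clustered setup each node would report its own value.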

Younes Regaieg