
Hi everyone. I am currently maintaining an old system built on Struts2 and Hibernate. One function formats data as an Excel file and downloads it according to the query parameters. About 3000 records are added per day, and usually we only need to check today's data, which comes down to about 3000 records in the Excel file.

It worked fine. Then I got a new requirement: for every record, I need two pieces of position information. That position information doesn't exist in our database, so I have to call two external interfaces to get it. I simply did this in a for loop: once I get the data back from the database, I loop through the list and, for every record, request the two pieces of position information it needs. Once I have everything, I format it as Excel and respond to the front end.

The problem is that the server responds too slowly. With about 200-300 records it works fine, but with 3000 records I get a 504 timeout error.

I don't think 3000 records is a lot, but sending a request and parsing the response for every single record is time consuming. I think I'm doing this the wrong way, but I have little experience with this situation. Can I get some suggestions? Thanks in advance.

Edit: I recorded the time taken to download 200 records, with and without the external requests; the pseudo code is below. I think the external requests are the main cause. (Screenshots of the timings, with and without the external requests, were attached here.)

    public byte[] methodName() {
        // 13 column titles
        String[] title = { "title1", "title2", "title3", ... , "title13" };
        // result is the list of records returned by the db select
        String[][] data = new String[result.size() * 20][title.length];

        int row = 0, col = 0;
        SomeEntity someEntity = null;

        System.out.println("with external requests");
        System.out.println("before loop-->" + System.currentTimeMillis() + " - " + new Date());

        for (Object object : result) {
            someEntity = (SomeEntity) object;
            col = 0;
            data[row][col++] = someEntity.getTitle1();
            data[row][col++] = someEntity.getTitle2();
            // ... add the other columns ...

            // the two similar external requests, once per record
            data[row][col++] = getXXXLocation(someEntity.getLongitude(), someEntity.getLatitude());
            data[row][col++] = getXXXLocation(someEntity.getMctId(), someEntity.getTerId());
            row++;
        }
        System.out.println("after loop-->" + System.currentTimeMillis() + " - " + new Date());

        // then generate the ExcelModel
        ExcelModel excelModel = new ExcelModel("filename");
        excelModel.addColumnsTitle(title);
        System.out.println("before generate excel-->" + System.currentTimeMillis() + " - " + new Date());
        byte[] aws = excelModel.generateExcelFile(data);
        System.out.println("after generate excel-->" + System.currentTimeMillis() + " - " + new Date());

        return aws;
    }
uncle liu
  • "server responds too slow" Need a little more detail here. Do you have the correct indexes, etc? This question is too broad. – OldProgrammer Oct 20 '18 at 14:56
  • If there is no way to make that fast enough (for example by sending a single batch request to the external service), then you should make the process asynchronous: send a first request to trigger the download, respond immediately with a process identifier, and start the Excel generation in a background thread. Then, from the client, poll a second resource with the process identifier until the process has ended and the data is available for download. – JB Nizet Oct 20 '18 at 15:00 (A sketch of this pattern follows these comments.)
  • @OldProgrammer Sorry I didn't make it clear. It's just a simple select, no indexes, only 3000 records per query, which I don't think is a large amount. The problem is that for each record the db returns, I need to send two requests to an external service; only after I collect all the information do I respond to the client, and by then the client already shows a 504 error. In summary: I use a plain select statement to load all the records into a list, then loop through the list and, for every entry, send two requests to the external service to get the additional information; after the loop finishes, I send the response back. – uncle liu Oct 21 '18 at 01:36
  • @OldProgrammer I hope it is clearer this time. Thanks. – uncle liu Oct 21 '18 at 01:36
  • @JBNizet Thanks for your reply, but I didn't fully understand; would you please clarify? By a single batch request, did you mean sending one request to the external service to retrieve all the information, instead of looping through the records and sending a request per record? And if I use the asynchronous method, what should I respond to the client immediately? One third of the records? And then send another request with the process identifier to get the remaining records? It's just a plain HTTP request to download the Excel file, not ajax; I don't even know when the response has finished. – uncle liu Oct 21 '18 at 02:15
  • @JBNizet Would you please explain it further? Thanks. – uncle liu Oct 21 '18 at 02:15
  • The problem is not the database, it is the **6000 interface calls** you make. So you need to break down the problem. Is it network lag? Is the remote server unable to cope? Is the query (or whatever) to get the position data suboptimal? Perhaps a bulk interface which takes a payload of 3000 pairs and returns a payload of all positions would solve the problem. Perhaps you can get away with increasing the timeout limit. But there are too many unknowns for us to offer concrete advice. – APC Oct 21 '18 at 06:48
  • @APC Thanks for the reply; you reminded me of something I missed. The remote server works fine. The query to get the position data is simply a GET request with longitude and latitude as params, and for every record I make two requests to get the two positions. The interfaces don't offer a payload of 3000 pairs; they are just two different APIs, and the only way to use them is to send a single longitude/latitude pair each time; a batch of pairs is not allowed. As for network lag, I don't think so, since downloading 200 records works fine. – uncle liu Oct 21 '18 at 12:22
  • @APC I'm using Tomcat; is increasing the timeout the only way? – uncle liu Oct 21 '18 at 12:22
  • No. Potentially there are all sorts of things you could do. But first you need to know **what** is eating up the time; then you will know what you need to fix. That requires a deep dive into your infrastructure, which is beyond the scope of a question here. – APC Oct 21 '18 at 12:42
  • @APC Thanks for your reply. I have divided the process in two: one part requests the locations, the other generates the Excel file. Because of my reputation I cannot post screenshots in my post; you can click the link in the main post to view them. I also posted the pseudo code here. I tested a download of **200** records: with the external requests it took nearly 40s to finish; without them it took about 1s. In both cases the Excel generation took about 3s. So I think the main problem is the external requests. Hoping for your suggestion. – uncle liu Oct 22 '18 at 06:14
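
JB Nizet's suggestion boils down to a submit-and-poll pattern. Here is a minimal sketch, assuming Java 8 and Struts2-style action methods; the names (`ExportAction`, `startExport`, `pollExport`, `JOBS`, `generateExcel`) are invented for illustration, not a definitive implementation:

    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ExportAction {
        // Sketch only: JOBS grows without bound here; a real implementation
        // would evict finished jobs and limit concurrent exports.
        private static final ExecutorService EXECUTOR = Executors.newFixedThreadPool(2);
        private static final ConcurrentMap<String, byte[]> JOBS = new ConcurrentHashMap<>();

        private String jobId;       // set from / returned to the client
        private byte[] excelBytes;  // handed to the download result when ready

        // First request: kick off generation in the background and respond at
        // once with the job identifier instead of waiting for the file.
        public String startExport() {
            jobId = UUID.randomUUID().toString();
            final String id = jobId;
            EXECUTOR.submit(() -> JOBS.put(id, generateExcel()));
            return "started";
        }

        // Second request: the client polls with the identifier every few
        // seconds until the file is ready, then follows the "download" result.
        public String pollExport() {
            excelBytes = JOBS.get(jobId);
            return excelBytes == null ? "pending" : "download";
        }

        // stands in for the slow methodName() from the question
        private byte[] generateExcel() { return new byte[0]; }
    }

The client receives the jobId from the first request and keeps calling the poll action with it until the answer changes from "pending" to "download".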

1 Answer


This is not a real answer, but it is too long for a comment.

> I thought the main problem is the external requests.

Obviously it's the external calls. Almost all the time spent is TTFB (time to first byte): waiting for the remote server to start answering. But you still don't know where that time actually goes.

There are basically three costs. The first is the cost of executing your code, formatting the requests and parsing the responses. As this appears to be negligible, we can move right on. The second is the fixed cost of executing a call: sending stuff across the network. This is a toll you pay 6000 times. The third is the cost of retrieving the data: this is also a toll you pay 6000 times, but depending on the data source the cost may not be fixed (e.g. database caching can offset repeated queries).

If most of the time is spent on network calls then (unless you can persuade your network admin there's a problem with the line) you can't do much except reduce the number of calls. That is, extend the remote interface with services which accept and return bulk payloads.

If most of the time is spent in data retrieval you need to look at the database queries (or whatever) and try to tune them.

Clearly I'm making these suggestions without knowing anything about your organisation or your responsibilities/authority. If the external service is truly external you may not be able to get improvements.

Alternative solutions?

  • Concurrent processing: have multiple threads calling the remote server (a sketch follows this list). You will need to rewrite your code, but at least that's under your control.
  • Increasing the timeout: just give your program more time to complete. However, presuming the cost is linear, it will take ten minutes to process 3000 pairs of calls. Your users may not want their screen to hang for that long. So...
  • Asynchronous submission: the user fires off a request, moves on to the next task, and some time later the finished Excel file is popped into their inbox.
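
For the first option, here is a minimal sketch (invented here for illustration, not part of the answer) of fanning the lookups out over a thread pool. It assumes Java 8, that `getXXXLocation` is thread-safe, and that the remote server tolerates 20 parallel callers; `SomeEntity` and its getters come from the pseudo code in the question:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Fetch both locations for every record concurrently instead of serially.
    String[][] fetchLocations(List<SomeEntity> records)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(20); // tune to the remote server
        try {
            List<Callable<String[]>> tasks = new ArrayList<>();
            for (SomeEntity e : records) {
                tasks.add(() -> new String[] {
                    getXXXLocation(e.getLongitude(), e.getLatitude()),
                    getXXXLocation(e.getMctId(), e.getTerId())
                });
            }
            // invokeAll blocks until all tasks finish and preserves input
            // order, so locations[i] belongs to records.get(i)
            List<Future<String[]>> futures = pool.invokeAll(tasks);
            String[][] locations = new String[records.size()][];
            for (int i = 0; i < futures.size(); i++) {
                locations[i] = futures.get(i).get();
            }
            return locations;
        } finally {
            pool.shutdown();
        }
    }

With 20 threads, the 40s measured for 200 serial records would drop to roughly 2s, assuming the remote server keeps up; tune the pool size against what the service can actually handle.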
APC
  • Thanks for your reply, that's really a great answer; it really inspired me, and I think there is one more point that could make it better. I start the download process only after I have obtained all the data the client needs, then return it to the browser, so the transfer itself finishes immediately but only after a very long wait. However, when I download a file from a website in a browser, for example downloading nodejs.exe on Windows, the download starts immediately and the file is transferred gradually until it finishes. – uncle liu Oct 22 '18 at 08:24
  • So I'm wondering if I could download this way, even though in my case the file doesn't already exist and I have to generate it on the fly. But if I respond to the client quickly, the 504 error no longer occurs, and then I can transfer the file gradually. Is this possible? And what is the technical name of this method? I don't even know how to search for it. Hoping for your reply. Thanks. – uncle liu Oct 22 '18 at 08:29
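
What these last two comments describe is a streamed (chunked) HTTP response: write bytes to the client as they are produced instead of buffering the whole file, so the gateway never sees an idle connection. A minimal sketch follows, assuming Java 8, that the servlet response is reachable from the action, and that a CSV file (which Excel opens) is acceptable, since the binary Excel formats are hard to emit row by row; `title`, `result`, `SomeEntity` and `getXXXLocation` come from the pseudo code above, the rest is hypothetical:

    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServletResponse;

    void streamCsv(HttpServletResponse response) throws IOException {
        response.setContentType("text/csv; charset=UTF-8");
        response.setHeader("Content-Disposition", "attachment; filename=\"export.csv\"");
        PrintWriter out = response.getWriter();
        out.println(String.join(",", title)); // the 13 column titles
        for (Object object : result) {        // the records from the db select
            SomeEntity e = (SomeEntity) object;
            // no CSV escaping here; real values containing commas need quoting
            out.print(e.getTitle1());
            out.print(',');
            out.print(getXXXLocation(e.getLongitude(), e.getLatitude()));
            out.print(',');
            out.println(getXXXLocation(e.getMctId(), e.getTerId()));
            out.flush(); // push bytes to the client now, so the gateway sees progress
        }
    }

The total time does not change; the 504 goes away only because the connection is never idle while the lookups run. The terms to search for are "HTTP streaming" and "chunked transfer encoding".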