
I have a requirement to access, through a web browser, a directory on a Linux server that holds around 200,000 files.

I am using the 'Alias' directive in Apache to do this. Please see below:

Alias /barcodes/ "/m01/apps/codes/barcodes/"

<Directory "/m01/apps/codes/barcodes/">
    Options +Indexes
    IndexOptions +TrackModified
    AllowOverride None
    Order allow,deny
    Allow from all
    # Using IndexOrderDefault to list the files in descending order (by date/timestamp)
    IndexOrderDefault Descending Date
</Directory>

Issue - The browser takes a very long time to display the file listing, and the page becomes too slow to use.

I'd appreciate it if someone could help with this.

Thanks..

mwgeek
  • Having 2.5 million (or is it 250k? Doesn't really matter) files in a single directory is far from ideal, and it will take a lot of time to transfer the list of files in a generated index. This is expected behaviour. – Sven Apr 06 '17 at 15:50
  • What is the high-level problem you are trying to solve? Why do folks need access to these files? Do they need access to the files, or only the listings? Do they need all the metadata from `ls -l`? Do the files change often, or are they added/removed often? Depending on the answers to these questions, there could be a variety of solutions, but your current one is inefficient. – Not Now Apr 06 '17 at 18:32
  • The problem I am trying to solve is displaying the files of the barcodes directory through a web browser. The folks need to access the files through the browser so they can read them. Yes, they do need the metadata, as it's displayed when we list the files in Linux. The directory gets refreshed through a cron job once every month and holds the last 30 days of data. So, if the cron job runs on the 5th of a month, it will remove the data from the previous date, the 4th of that month, so that the directory keeps 30 days' worth of data. – mwgeek Apr 10 '17 at 13:49

1 Answer


As @Sven points out, this isn't achievable with your current setup.

Right now, there are a number of possible bottlenecks, but a very likely one is your disk itself.

For each request to your index page:

  • apache needs to query the disk for the contents of the folder
  • apache then needs to query the disk for the details of each entry
  • which also causes the OS to do things like looking up the UID/GID, in order to provide textual rather than numeric values
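
If you want a rough sense of how expensive that per-request work is, a quick (untested) check is to time the same kind of metadata scan from the shell, using the path from your config:

# Roughly what mod_autoindex has to do on every request: enumerate every
# entry and stat it for size, date and ownership.
time ls -l /m01/apps/codes/barcodes/ > /dev/null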

Depending on your actual needs, one possible solution could be to create your own index file programmatically, and avoid scanning the whole directory on each request.

Something like this ugly and untested 'script':

#!/bin/bash

# Build a static index.html once, so Apache serves one pre-generated file
# instead of scanning the whole directory on every request.
DATE="$( date -I )"

echo "<html><head><title>File listing at $DATE</title></head>" > index.html
echo "<body><ul>" >> index.html
# Note: this assumes file names without spaces.
for x in `ls -1 FOLDER`; do
    echo "<li><a href=\"/path/to/$x\">$x</a></li>" >> index.html
done

echo "</ul></body></html>" >> index.html

The user can then select a file from the list (a single file read by apache from the disk), and access that content as needed.

Obviously, you'd probably want something a little nicer than that, but the general idea of reducing the reads per request as much as possible is hopefully clear.
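
One refinement worth mentioning (this is an assumption about how you would run it, not something from the question): regenerate the index from cron and write it to a temporary file first, then rename it, so a half-written index.html is never served. An untested sketch, with the path and URL prefix taken from your config and everything else a placeholder:

#!/bin/bash
# Hypothetical regeneration wrapper: rebuild the listing, then publish it
# atomically so Apache never serves a truncated index.html.
set -e

SRC="/m01/apps/codes/barcodes"
TMP="$( mktemp "$SRC/.index.XXXXXX" )"

{
    echo "<html><head><title>File listing at $( date -I )</title></head><body><ul>"
    for f in "$SRC"/*; do
        name="$( basename "$f" )"
        # Don't list the index page itself.
        [ "$name" = "index.html" ] && continue
        echo "<li><a href=\"/barcodes/$name\">$name</a></li>"
    done
    echo "</ul></body></html>"
} > "$TMP"

# mktemp creates the file with 0600 permissions, so open it up before publishing.
chmod 644 "$TMP"

# A rename within the same filesystem is atomic, so clients see either the
# old or the new listing, never a partial one.
mv "$TMP" "$SRC/index.html"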

iwaseatenbyagrue
  • Thanks @iwaseatenbyagrue. I am able to list the directory through the browser. However, it doesn't print the timestamps of the files. I'd appreciate it if you could advise further on this... – mwgeek Apr 20 '17 at 12:04
  • @mwgeek - you could just use `ls -l` instead of `ls -1`, and then 'parse' the resulting string (e.g. use cut or sed on it). The details really do depend on what precisely your needs and purpose are, so you would need to build up whatever makes sense in your context. – iwaseatenbyagrue Apr 20 '17 at 16:05
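
A rough sketch of what that suggestion might look like (untested; it assumes GNU ls, so that --time-style=long-iso gives a fixed column layout, and file names without spaces - FOLDER and /path/to/ are the same placeholders as in the earlier script):

#!/bin/bash
# Variant of the earlier script that also shows each file's modification time.
DATE="$( date -I )"

{
    echo "<html><head><title>File listing at $DATE</title></head><body><table>"
    # With --time-style=long-iso, ls -l prints: perms links owner group size date time name,
    # so the date, time and name are fields 6, 7 and 8. NR > 1 skips the "total N" line.
    ls -l --time-style=long-iso FOLDER | awk 'NR > 1 {
        printf "<tr><td><a href=\"/path/to/%s\">%s</a></td><td>%s %s</td></tr>\n", $8, $8, $6, $7
    }'
    echo "</table></body></html>"
} > index.html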