
Alfresco provides a CIFS connector so it can act just like a normal file server in your intranet.

Compared with a "normal" (Windows/Samba-based) file server, certain operations can really hurt the system, e.g. listing a folder with a few thousand files using Windows Explorer. I'm not quite sure, but I think permission checking is the primary reason in this case. Anyway, now assume you have a big filesystem hierarchy exposed and many users accessing it over CIFS, stressing the system and effectively "knocking it down".

What is the suggested approach to scale or improve performance?

Andreas Steffan

3 Answers


In my experience, Windows Explorer is part of the CIFS performance issue. I don't have exact numbers, but I remember working on an instance with roughly 500GB of data, mostly small images and a few text files in a poorly balanced folder tree, where listing a folder with a thousand children took around a minute to display in Explorer. The same operation took around 3s in the Chrome browser.

We never had time to investigate the issue thoroughly, but we saw an impressive amount of traffic generated by Explorer because it prefetches information about the subfolders of the currently open folder.
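To make the effect of prefetching more concrete, here is a deliberately crude counting model. It is an assumption for illustration only, not a description of Explorer's actual SMB request sequence: the function name `metadata_requests` and the idea of "one lookup per entry" are hypothetical simplifications.

```python
def metadata_requests(children: int, subfolders: int,
                      avg_grandchildren: int, prefetch: bool) -> int:
    """Toy estimate of metadata lookups needed to display one folder.

    Hypothetical model: one lookup per direct child, plus, when the client
    prefetches, one lookup per entry inside every visible subfolder.
    """
    requests = children
    if prefetch:
        # Prefetching touches the contents of each subfolder as well.
        requests += subfolders * avg_grandchildren
    return requests


# Made-up example: 1000 children, 100 of them subfolders with ~50 entries each.
print(metadata_requests(1000, 100, 50, prefetch=False))  # 1000
print(metadata_requests(1000, 100, 50, prefetch=True))   # 6000
```

Even in this simplified form, the server-side work grows with the size of the subfolders the user never opened, which is consistent with the extra traffic observed for Explorer.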

skuro
  • That pretty much backs up what I have observed. Clients using Windows Explorer hurt the system pretty badly. It was much less of an issue when I was trying the Samba client. Nevertheless, I think it's by far the most important thing to support Windows Explorer usage in the corporate scenario. I guess you can't tell a customer to use another client. :) – Andreas Steffan Feb 11 '12 at 10:18
  • Memories are coming back ... Some clients have hurt the system more than others. But IIRC, just browsing a space with the OOTB JSF web client hurt just about as badly as browsing with Windows Explorer does. I wonder what is so fundamentally different about ordinary file servers that they handle this situation so much better. – Andreas Steffan Feb 11 '12 at 10:35
  • I remember enabling the most verbose logging on the app and watching the logs while accessing the file share to understand what was going on with different clients (I also jumped directly to Wireshark). There I saw prefetching as the biggest difference between clients. Indeed, better overall performance regardless of the client is an important and missing feature. – skuro Feb 11 '12 at 14:28
  • It seems the folks at Westernacher addressed the performance issue by replacing CIFS with a proprietary REST-based protocol: http://www.westernacher.com/en/business/ecm/alfresco-enterprise-integration-platform/windows-explorer-integration . I don't think this is a good idea, though. – Andreas Steffan Feb 11 '12 at 18:43
  • If it's viable for the customer to install a shell extension on each and every end-user workstation, I guess it's a viable option. Is NFS as slow as CIFS? If it's actually faster, that's probably another option. – skuro Feb 12 '12 at 17:06

I've been revisiting the issue a little, and I guess the best answer I can give for now is: tweak the cache(s).

I used a space with 5k children and default cache values, and benchmarked executing "ls -alrt" on the CIFS mount, running Alfresco 4.0.d.
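If you want to repeat a measurement like this, a small timing script can stand in for `ls -alrt`: it stats every entry (including dotfiles, like `-a`) and sorts oldest-first by modification time (like `-rt`). This is only a sketch; the script name `list_timer.py` and the mount path in the comment are hypothetical, and it measures wall-clock time on the client, not database load.

```python
import os
import sys
import time
from typing import List, Tuple


def timed_listing(path: str) -> Tuple[int, float]:
    """Stat every entry in `path` and sort by mtime, roughly like `ls -alrt`.

    Returns (number of entries, elapsed seconds).
    """
    start = time.perf_counter()
    rows: List[Tuple[float, str]] = []
    with os.scandir(path) as entries:
        for entry in entries:
            # On POSIX this is one lstat() per entry, which the CIFS
            # client may have to forward to the server.
            st = entry.stat(follow_symlinks=False)
            rows.append((st.st_mtime, entry.name))
    rows.sort()  # -rt: oldest modification time first
    return len(rows), time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python list_timer.py /mnt/alfresco/bigspace
    for run in (1, 2):
        count, secs = timed_listing(sys.argv[1])
        print(f"run {run}: {count} entries in {secs:.1f}s")
```

Running it twice in a row mirrors the first/second execution comparison, so the effect of warm caches shows up directly in the two timings.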

The first execution took roughly two minutes, bombarding the (lightning-fast) MySQL database with approximately 200k queries.

The second execution took "only" around 40 seconds, but the amount of queries did not change significantly.
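A back-of-the-envelope reading of these figures, using only the approximate numbers given (about 200k queries in both runs, 5k children, roughly 120 s and 40 s):

```latex
\[
\frac{200\,000\ \text{queries}}{5\,000\ \text{children}} \approx 40\ \text{queries per child}
\]
\[
\text{run 1: } \frac{200\,000\ \text{queries}}{120\ \text{s}} \approx 1\,700\ \text{queries/s},
\qquad
\text{run 2: } \frac{200\,000\ \text{queries}}{40\ \text{s}} \approx 5\,000\ \text{queries/s}
\]
```

Since the query count per child stays around 40 while throughput roughly triples, the speed-up in the second run appears to come from cheaper queries rather than fewer of them; this is an inference from the rounded numbers, not a measured breakdown.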

After increasing the CIFS fileinfo cache, I got the second run down to 30 seconds, but I still see 160k DB queries firing. I'm fairly sure the lion's share of these has to do with permissions/ACLs, and it should be possible to improve the situation a lot.
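Why a bigger file-info cache helps only partially can be illustrated with a toy time-to-live cache. This is not Alfresco's cache implementation; `FileInfoCache` and its parameters are hypothetical names. It shows that repeated lookups of the same path within the TTL skip the backend, so only the lookups the cache actually covers go away. In the measurement above, most of the 160k queries kept firing, which suggests they are issued somewhere this kind of cache does not sit.

```python
import time
from typing import Callable, Dict, Tuple


class FileInfoCache:
    """Toy time-to-live cache for per-path file metadata (hypothetical)."""

    def __init__(self, backend: Callable[[str], dict], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend      # expensive lookup, e.g. DB-backed
        self._ttl = ttl              # seconds an entry stays valid
        self._clock = clock          # injectable for testing
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, path: str) -> dict:
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh entry: no backend call
        info = self._backend(path)   # miss or expired entry
        self._entries[path] = (now, info)
        return info
```

Listing the same 5k-entry folder twice within the TTL would cost 5k backend calls instead of 10k; a larger cache (or longer TTL) keeps more entries fresh, but cannot remove queries issued outside it.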

PS: Windows Explorer definitely behaves a little unexpectedly, but I cannot confirm that it makes a significant difference to the user experience.

PPS: https://issues.alfresco.com/jira/browse/ALFCOM-2951

PPPS: I'll look into this further when I find the time - should be this year. ;)

Update: The massive number of queries is not a permission issue.

Andreas Steffan

Permission checks are definitely part of the problem. I can't link to anything specific, but from browsing the Alfresco forums and the net over the last few years, I've learned that permissions can hurt performance.

I've read (and experienced) in several scenarios that Alfresco spaces with large numbers of children (1000+) can be painfully slow. One part you noticed yourself: it takes a while to get through 100-200k queries. But hook something up to Alfresco to watch what it's doing, and you'll see that massive amounts of time go into serialization/deserialization (e.g. web scripts for Share) and also node traversal (hence the thousands of queries, and averages of 400-500 qps even when nobody is logged on). So you're on the right track with your cache optimizations.
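To check a figure like the 400-500 qps above on your own installation, one option is to sample MySQL's cumulative `Questions` status counter (returned by `SHOW GLOBAL STATUS LIKE 'Questions'`) twice and divide the difference by the interval. In this sketch, `read_counter` is a hypothetical callable you would wire to your MySQL driver; the driver code is left out so the example stays self-contained. Note that the counter covers every client of the server, not just Alfresco.

```python
import time
from typing import Callable


def queries_per_second(before: int, after: int, interval_s: float) -> float:
    """Average statement rate between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if after < before:
        raise ValueError("counter went backwards (server restart?)")
    return (after - before) / interval_s


def sample_rate(read_counter: Callable[[], int], interval_s: float = 10.0,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Read the counter, wait `interval_s`, read again, return the rate."""
    before = read_counter()
    sleep(interval_s)
    after = read_counter()
    return queries_per_second(before, after, interval_s)
```

Sampling while nobody is logged on gives a baseline; sampling again during a large folder listing shows how much of the load the listing itself adds.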

Do you have dedicated hardware for your installation? I had big performance issues, but I moved the MySQL server to a separate box (server-grade hardware: 4 cores, 8GB RAM, SSD for the MySQL server and SAS for the Tomcat server, etc.) and gained a lot. So get on with begging for new hardware too :)

I think you're on the right path here.

Zlatko