Most real-world data is not uniformly random; it is usually already nearly sorted. A sorting algorithm that always takes O(n log n) time, even when its input is nearly sorted, will not perform as well on real-world data as one that takes advantage of the existing order.
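As a rough illustration (a minimal Python sketch, not any particular library's implementation), insertion sort is a classic adaptive algorithm: its cost is roughly proportional to n plus the number of out-of-place pairs, so a nearly sorted input costs close to O(n) while a random one costs O(n^2):

```python
def insertion_sort(items):
    """Adaptive sort: cost grows with n plus the number of inversions,
    so nearly sorted input finishes in close to linear time."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements right; on nearly sorted data this inner
        # loop rarely runs more than a few steps per element.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a
```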
Take, for example, sorting a log file by the datetime of each entry. Because entries are written as events happen, most of them are already close to their correct position, with only a few out of place due to concurrent writes. A log file can be extremely large, on the order of gigabytes or more, so a sorting algorithm that does not take advantage of the nearly sorted state of the file is far less efficient than it could be.
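A sketch of that case (the log format and field layout here are invented for illustration): sorting log lines by their timestamp with Python's built-in `sorted` already benefits from this, because CPython's sort is Timsort, which detects existing sorted runs and only does extra work where entries are out of place:

```python
from datetime import datetime

def sort_log_lines(lines):
    """Sort log lines by their leading ISO-8601 timestamp.

    Python's built-in sort (Timsort) is adaptive: if most lines are
    already in order, it finds the long sorted runs and finishes in
    close to linear time instead of a full O(n log n) sort.
    """
    def timestamp(line):
        # Assumes each line starts with e.g. "2024-05-01T12:00:00 ..."
        return datetime.fromisoformat(line.split(" ", 1)[0])
    return sorted(lines, key=timestamp)

# Example: entries are mostly in order, with one late arrival.
log = [
    "2024-05-01T12:00:00 service started",
    "2024-05-01T12:00:02 request handled",
    "2024-05-01T12:00:01 config reloaded",   # slightly out of place
    "2024-05-01T12:00:03 request handled",
]
print(sort_log_lines(log))
```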
Log files raise another case: in a distributed system, multiple machines produce log entries concurrently. Each individual log file is sorted (or nearly sorted), but you want to merge them into a single linear log containing every event from every system. You can simply concatenate the logs, and if the sorting algorithm recognizes that the data consists of long spans of already sorted entries, it can perform a merge that is close to O(n) for a small number of files rather than a full O(n log n) sort.
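A sketch of the merge case, assuming each input file is already sorted by a timestamp prefix (the file names below are hypothetical): Python's `heapq.merge` does a streaming k-way merge in O(n log k), which for a handful of files is effectively linear and never re-sorts the combined data:

```python
import heapq

def merge_sorted_logs(*log_paths):
    """Lazily merge several individually sorted log files into one
    chronologically ordered stream.

    heapq.merge performs a k-way merge: each of the n total entries
    passes through a heap of size k, costing O(n log k) rather than
    the O(n log n) of re-sorting the concatenated logs.
    """
    streams = [open(path, "r") for path in log_paths]
    try:
        # Assumes each line starts with a sortable timestamp, so plain
        # string comparison gives chronological order.
        for line in heapq.merge(*streams):
            yield line
    finally:
        for f in streams:
            f.close()

# Usage (hypothetical file names):
# for entry in merge_sorted_logs("server-1.log", "server-2.log"):
#     print(entry, end="")
```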