Microsoft Excel Shared Strings Table
Shared strings table is and Open XML standard, as defined by ISO standard - ISO/IEC 29500-1:2016(E)
Official definition of Shared strings (cited from ISO document)
Shared String Table
String values may be stored directly inside spreadsheet cell elements; however, storing the same value inside multiple cell elements can result in very large worksheet Parts, possibly resulting in performance degradation. The Shared String Table is an indexed list of string values, shared across the workbook, which allows implementations to store values only once.
ISO standard on Shared Strings can be downloaded from
https://standards.iso.org/ittf/PubliclyAvailableStandards/c071691_ISO_IEC_29500-1_2016.zip
Answers to the questions on this topic
Question 1: Are shared strings used within the Excel application itself, or only when saving the data?
Answer:
Shared strings are used by Excel only at the time of saving the
document, I.E., only for the purpose of storing the spreadsheet as a
file on storage.
However, when the file is opened for display, the cells are populated
with actual string values pulled from the shared strings table.
-
Question 2: What would be an example algorithm to sort on the field then? Any language is fine (c, c#, c++, python).
Answer:
For an application like Excel, I guess that a special proprietary variation of
Quick sort is the most likely algorithm to be used for sorting on string values.
Excel has a limit of 1,048,576 rows. For this size, Quick sort is
definitely a winner. Quick sort can produce very efficient result for
data set of this magnitude.
Here is the link to the implementation of Quick Sort in C++ for
sorting strings:
http://www.cplusplus.com/forum/beginner/101599/