Standards bodies such as the IEEE, through its POSIX working groups, evolve standards in iterations that add functionality or correct problems in earlier versions. The process is driven by the people who need software for a particular problem domain and by the vendors whose products serve them. Implementations of a standard typically vary between vendors to some degree: as with any software, different teams make different choices depending on the target environment and their own skills and knowledge. As the standard matures, however, a kind of Darwinian selection takes place in which best practices are agreed on and the various implementations begin to converge.
The first versions of a POSIX pthreads library appeared in the 1990s and targeted UNIX-style operating systems; see, for instance, POSIX.4: Programming for the Real World and PThreads Primer: A Guide to Multithreaded Programming. The ideas and concepts behind the library came from earlier attempts to provide coroutine or thread functionality at a finer granularity than the operating system process, in order to reduce the overhead of creating, managing, and destroying processes. There were two major approaches to threading: user level, with little kernel support, and kernel level, which depends on the operating system for thread management. The two differ somewhat in capability, for example in whether pre-emptive thread switching is available.
In addition, makers of tools such as debuggers needed to support multi-threaded environments, which means being able to see thread state and to identify specific threads.
There are several reasons for using an opaque type within the API for a library. The primary reason is to allow the developers of the library the flexibility to modify the type without causing problems for users of the library. There are several ways of creating opaque types in C.
One way is to require users of the API to use a pointer to some memory area that is managed by the API library. You can see examples of this approach in the Standard C Library with the file access functions such as fopen(), which returns a pointer to a FILE type.
While this accomplishes the goal of creating an opaque type, it requires the API library to manage the memory allocation. Because the user holds only a pointer, you can run into the usual problems of memory that is allocated and never released, or of using a pointer whose memory has already been released. It also means that specialized applications on specialized hardware may be difficult to port, for instance to a specialized sensor with bare-bones support that does not include a memory allocator. This kind of hidden overhead also matters for applications with limited resources, where you want to be able to predict or model the resources an application uses.
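As a minimal sketch of this pointer-based approach: the header exposes only an incomplete struct type, and the library allocates and frees the object. The Counter type and the counter_* functions are hypothetical names invented for illustration; they are not part of any real library.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would expose (hypothetical names) --- */
typedef struct Counter Counter;          /* incomplete type: layout is hidden */
Counter *counter_create(int start);      /* the library allocates */
int      counter_next(Counter *c);
void     counter_destroy(Counter *c);    /* the library releases */

/* --- what the library's source file would contain --- */
struct Counter {
    int value;
};

Counter *counter_create(int start)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;                            /* NULL if allocation failed */
}

int counter_next(Counter *c)
{
    assert(c != NULL);
    return c->value++;                   /* returns the value before the increment */
}

void counter_destroy(Counter *c)
{
    free(c);                             /* free(NULL) is a no-op */
}
```

The caller can only hold a Counter pointer, so the struct can change freely, but every object costs a heap allocation and the caller must remember to call counter_destroy().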
A second way is to give users of the API a struct that is the same size as (or at least as large as) the actual struct used by the API but that uses a char buffer to allocate the memory. This approach hides the details of the memory layout, since all the user of the API sees is a single char buffer or array, yet it also allocates the correct amount of memory for the API to use. The API then has its own struct that lays out how the memory is actually used, and the API does a pointer conversion internally to change the struct used to access the memory.
This second approach provides a couple of nice benefits. First of all, the memory used by the API is now managed by the user of the API and not by the library itself. The user of the API can decide whether to use stack allocation, global static allocation, or some other memory allocation such as malloc(). The user can also decide to wrap the memory allocation in some kind of resource tracking, such as reference counting or some other management done on their side (though this could be done with pointer opaque types as well). This approach also gives the user of the API a better idea of memory consumption and lets them model memory consumption for specialized applications on specialized hardware.
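To illustrate the point about the caller choosing the storage, here is a small sketch in which the same opaque object lives on the stack, in static storage, and on the heap. Slot, SLOT_SIZE, slot_init(), slot_tag(), and use_all_three() are hypothetical names made up for this example, and the "library" here only touches the buffer byte by byte to keep the sketch simple.

```c
#include <stdlib.h>
#include <string.h>

#define SLOT_SIZE 64

/* Opaque to the caller: just a block of bytes of the right size. */
typedef struct {
    char bytes[SLOT_SIZE];
} Slot;

/* Hypothetical library calls that work on caller-supplied storage. */
void slot_init(Slot *s, int tag)
{
    memset(s->bytes, 0, sizeof s->bytes);
    s->bytes[0] = (char)tag;
}

int slot_tag(const Slot *s)
{
    return (int)s->bytes[0];
}

static Slot g_slot;                      /* static storage, no allocator needed */

int use_all_three(void)
{
    Slot  local;                         /* automatic (stack) storage */
    Slot *heap = malloc(sizeof *heap);   /* dynamic storage, caller's choice */
    int   sum;

    if (heap == NULL) {
        return -1;
    }
    slot_init(&local, 1);
    slot_init(&g_slot, 2);
    slot_init(heap, 3);
    sum = slot_tag(&local) + slot_tag(&g_slot) + slot_tag(heap);
    free(heap);                          /* the caller owns the lifetime */
    return sum;
}
```

The library never allocates anything, so on a target without malloc() the caller simply uses the stack or static variants.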
The API designer could also expose some data to the user of the API that might be handy, such as status information. The goal is to let the user read what are tantamount to read-only members of the struct directly, rather than going through the overhead of a helper function, in the interests of efficiency. The members are not declared const (to encourage the C compiler to reference the actual member rather than cache a value read at some earlier point on the assumption that it will not change), and the API may update them during operations to provide information to the user while not depending on their values for its own use.
However, any such data fields risk introducing backwards compatibility problems, and later changes to them can introduce memory layout problems. A C compiler may insert padding between the members of a struct, either to allow efficient machine instructions when loading and storing those members or because the CPU architecture requires certain instructions to operate on addresses at a particular boundary.
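A sketch of how a visible status field and compiler padding interact is shown below. Conn, RealConn, and their members are hypothetical, and the compile-time checks use C11's _Static_assert, which is an assumption about the compiler being used. The amount of padding is deliberately computed rather than stated, since it depends on the compiler and target.

```c
#include <stddef.h>

#define CONN_PRIVATE_SIZE 48

/* Public view: one readable status field followed by opaque bytes. */
typedef struct {
    int  lastStatus;                     /* caller may read this directly */
    char reserved[CONN_PRIVATE_SIZE];    /* layout hidden from the caller */
} Conn;

/* The library's private view of the same memory. */
typedef struct {
    int    lastStatus;                   /* must stay at the same offset */
    char   isOpen;                       /* padding usually follows this */
    double timeout;
} RealConn;

/* Catch layout drift at compile time instead of at run time. */
_Static_assert(offsetof(Conn, lastStatus) == offsetof(RealConn, lastStatus),
               "status field moved between public and private views");
_Static_assert(sizeof(RealConn) <= sizeof(Conn),
               "private struct no longer fits in the public one");

/* Bytes of padding the compiler added to RealConn on this target. */
size_t real_conn_padding(void)
{
    size_t members = sizeof(int) + sizeof(char) + sizeof(double);
    return sizeof(RealConn) - members;
}
```

If a later version of the library inserts a member before lastStatus, or grows RealConn past the public size, the build fails rather than silently breaking existing callers.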
Specifically for the pthreads library, there is also the influence of UNIX-style C programming of the 1980s and 1990s, which tended toward open, visible data structures. Header files let programmers read the struct definitions and defined constants along with their comments, since much of the available documentation was the source itself.
A brief example of an opaque struct would be as follows. There is the include file, thing.h, which contains the opaque type and which is included by anyone using the API. Then there is a library whose source file, thing.c, contains the actual struct used.
thing.h may look like
#define MY_THING_SIZE 256
typedef struct {
    char array[MY_THING_SIZE];
} MyThing;
int DoMyThing (MyThing *pMyThing, int stuff);
Then in the implementation file, thing.c, you might have source like the following
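#include "thing.h"
typedef struct {
    int thingyone;
    int thingytwo;
    char aszName[32];
} RealMyThing;
int DoMyThing (MyThing *pMyThing, int stuff)
{
    // assumes sizeof(RealMyThing) <= MY_THING_SIZE and that the caller's
    // buffer is suitably aligned for RealMyThing
    RealMyThing *pReal = (RealMyThing *)pMyThing;
    // do stuff with the real memory layout of MyThing
    return 0;
}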
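One thing the example above leaves to trust is that RealMyThing actually fits in the buffer and that the buffer is suitably aligned for the cast. The following is a sketch of a more defensive variation, not the approach used above: it adds a C11 _Static_assert on the size and copies the private struct in and out with memcpy() instead of casting the pointer. InitMyThing() and MyThingTotal() are hypothetical helpers added for illustration, as is the arithmetic inside DoMyThing().

```c
#include <assert.h>
#include <string.h>

/* Public view, as in thing.h above. */
#define MY_THING_SIZE 256
typedef struct {
    char array[MY_THING_SIZE];
} MyThing;

/* Private view, as in thing.c above. */
typedef struct {
    int  thingyone;
    int  thingytwo;
    char aszName[32];
} RealMyThing;

/* C11: refuse to compile if the private struct outgrows the public buffer. */
_Static_assert(sizeof(RealMyThing) <= MY_THING_SIZE,
               "RealMyThing no longer fits in MyThing");

/* Hypothetical helper: put the object into a known, all-zero state. */
void InitMyThing(MyThing *pMyThing)
{
    assert(pMyThing != NULL);
    memset(pMyThing->array, 0, sizeof pMyThing->array);
}

int DoMyThing(MyThing *pMyThing, int stuff)
{
    RealMyThing real;

    assert(pMyThing != NULL);
    /* Copy out, modify, copy back: no cast, so no alignment assumptions. */
    memcpy(&real, pMyThing->array, sizeof real);
    real.thingyone += stuff;             /* illustrative work only */
    real.thingytwo += 1;
    memcpy(pMyThing->array, &real, sizeof real);
    return 0;
}

/* Hypothetical accessor so the caller never touches the layout. */
int MyThingTotal(const MyThing *pMyThing)
{
    RealMyThing real;

    assert(pMyThing != NULL);
    memcpy(&real, pMyThing->array, sizeof real);
    return real.thingyone;
}
```

The copies cost a little time compared with the pointer conversion, so this is a trade-off rather than a strict improvement; the size check on its own can be added to the cast-based version as well.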
Concerning "before main" threads
When an application using the C run time is started, the loader uses the C run time's entry point as the starting place for the application. The C run time then performs the initialization and environment setup that it needs and invokes the designated entry point for the actual application. Historically this designated entry point is the function main(); however, what the C run time uses can vary between operating systems and development environments. For instance, for a Windows GUI application the designated entry point is WinMain() (see WinMain entry point) rather than main().
It is up to the C run time to determine the conditions under which the designated entry point for the application is called. Whether there are "pre-main" threads running will depend on the C run time and the target environment.
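One way to see code running before the designated entry point is the constructor function attribute, which is a GCC and Clang extension and not standard C; other toolchains have different mechanisms or none. The sketch below assumes such a compiler, and the names g_ranBeforeMain, early_setup(), and ran_before_main() are hypothetical.

```c
/* Set by the hook below, before the designated entry point is called. */
static int g_ranBeforeMain = 0;

/* GCC/Clang extension: the run time's start-up code calls this function
   before it invokes main(). Not portable to every compiler. */
__attribute__((constructor))
static void early_setup(void)
{
    g_ranBeforeMain = 1;
}

/* Returns 1 if the hook above has already run. */
int ran_before_main(void)
{
    return g_ranBeforeMain;
}
```

A hook like this, in the application or in a library it links against, is one place where a thread could be created before main() is entered, which is why the answer depends on the run time and target environment.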
With a Windows application that uses ActiveX controls with their own message pump, there could well be "pre-main" threads. I work with a large Windows application that uses several controls providing various kinds of device interfaces, and when I look in the debugger I can see a number of threads that my application's source does not create with an explicit thread creation call. These threads are started by the run time as the ActiveX controls are loaded and started.