You have asked a very broad and complicated question. Actually, you have asked several broad, complicated questions.
The software that has final governance over the operation of any hardware is called the hardware's "driver". Naturally, for graphics hardware, this is called the "graphics driver." Like all drivers, the graphics driver is effectively an installable part of the OS; the OS is what allows the graphics driver to do its job and talk to the hardware. The two work hand in hand.
There are effectively two kinds of D3D or OpenGL (heretofore known as "the API") calls: those that talk to the driver and those that do not. Every call that actually draws something needs to (eventually) talk to the driver, but calls that set up later drawing calls may just store data locally.
When you make a drawing call, the API does some checks to make sure that you as the user have made a valid rendering call. If so, the API has some options as to what to do. It turns out that talking directly to the driver takes a long time, regardless of how many commands you give it when you start talking. Therefore, what often happens is that the API stores your rendering call and returns immediately. Then, possibly in another thread, it may look to see how many rendering calls have been stored. If there are "enough", then it will forward them to the driver. This is called "marshalling".
The driver's job is to take these calls that have been forwarded and convert them into stuff that the GPU will do.
On that same line, where is the graphics pipeline defined? in the gpu hardware, the driver, or the software library?
That's actually a pretty tricky question these days, and becoming trickier every hardware generation.
In the old days, the construction of the graphics pipeline was rigidly controlled by the GPU hardware. These days, this is less true, though there is some hardware control. On modern hardware (capable of OpenGL 3.0 or Direct3D10 or better), it would be theoretically possible, if you had direct access to the graphics driver, to design an API that used a somewhat altered version of the graphics pipeline. So the APIs dictate much of what the graphics pipeline looks like.
Each stage in the rendering pipeline takes certain values from the precious stage(s) as input and generates some number of values as output. A stage is "programmable" if the mechanism for generating the outputs from the inputs involves executing a user-supplied program, called a "shader". So there is no such thing as a programmable pipeline (yet); just programmable stages of a fixed pipeline.