The scanline algorithm is what people did in software back in the 1990s, before modern GPUs. GPU designers figured out rather quickly that the algorithms you use for software rendering are vastly different from the ones you would build into a VLSI design with billions of transistors. Algorithms optimized for hardware implementation tend to look fairly alien to anyone coming from a software background anyway.
Another thing I'd like to clear up is that OpenGL doesn't say anything about "how" you render, only "what" you render. Implementations are free to do it however they please. We can find out the "what" by reading the OpenGL standard, but the "how" is buried in secrets kept by the GPU vendors.
Finally, before we start, the articles you linked are unrelated. They are about how ultrasonic scans work.
What do we know about scan conversion?
Scan conversion takes a number of primitives as input. For our purposes, let's assume they're all triangles (which is increasingly true these days).
Every triangle must be clipped against the clipping planes. Each plane can add at most one side to the polygon, so in the worst case a triangle clipped against all six frustum planes can end up with several additional sides. This has to happen before the perspective divide.
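In code, the per-plane step looks roughly like a Sutherland-Hodgman clip. This is only a sketch: it handles a single plane (the "left" plane, x >= -w, in homogeneous clip space), and real hardware runs the analogous step for each frustum plane.

```c
#include <assert.h>
#include <stddef.h>

/* One vertex in homogeneous clip space (before the perspective divide). */
typedef struct { float x, y, z, w; } Vec4;

/* Signed distance to the "left" frustum plane x >= -w, i.e. d = x + w.
 * Inside when d >= 0. The other five planes use analogous expressions. */
static float dist_left(Vec4 v) { return v.x + v.w; }

/* Sutherland-Hodgman: clip a convex polygon against one plane.
 * Returns the new vertex count. Each plane adds at most one vertex,
 * which is why clipping can grow a triangle extra sides. */
static size_t clip_plane(const Vec4 *in, size_t n, Vec4 *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        Vec4 a = in[i], b = in[(i + 1) % n];
        float da = dist_left(a), db = dist_left(b);
        if (da >= 0)
            out[m++] = a;                  /* keep vertices inside */
        if ((da >= 0) != (db >= 0)) {      /* edge crosses the plane */
            float t = da / (da - db);      /* interpolation parameter */
            Vec4 p = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
            out[m++] = p;
        }
    }
    return m;
}
```

Clipping a triangle with one vertex outside this plane yields a quadrilateral: one extra side, exactly as described above.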
Every primitive must go through perspective projection. This process takes each vertex with homogeneous coordinates (X, Y, Z, W) and converts it to (X/W, Y/W, Z/W).
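The divide itself is trivial; the sketch below just makes the step concrete (clipping has already guaranteed W > 0 for surviving vertices):

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float x, y, z; } Vec3;

/* The perspective divide: homogeneous clip coordinates to
 * normalized device coordinates. Runs after clipping, which
 * guarantees w > 0 for every surviving vertex. */
static Vec3 perspective_divide(Vec4 v)
{
    Vec3 ndc = { v.x / v.w, v.y / v.w, v.z / v.w };
    return ndc;
}
```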
The framebuffer is usually organized hierarchically into tiles, not linearly the way you would do it in software. Furthermore, the processing might be done at more than one hierarchical level. The reason we use a linear layout in software is that a hierarchical layout takes extra cycles to compute memory addresses. VLSI implementations do not suffer from this problem: they can simply wire up the coordinate bits however they like to form an address.
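To illustrate, here is what a tiled address computation looks like in software. The 8x8 tile size and row-major tile order are assumptions for the sketch; the point is that every operation below is a shift, mask, or small multiply on coordinate bits, which in hardware reduces to wiring.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of pixel (x, y) in a framebuffer organized as 8x8 tiles,
 * tiles stored row-major. The tile index and the offset within the
 * tile are just bit fields of the coordinates. Assumes the width is
 * a multiple of the tile size. */
static uint32_t tiled_offset(uint32_t x, uint32_t y, uint32_t width_in_tiles)
{
    uint32_t tile_x = x >> 3, tile_y = y >> 3;  /* which 8x8 tile */
    uint32_t in_x = x & 7, in_y = y & 7;        /* position inside the tile */
    uint32_t tile = tile_y * width_in_tiles + tile_x;
    return tile * 64 + in_y * 8 + in_x;         /* 64 pixels per tile */
}
```

Compare with the linear layout's `y * width + x`: the tiled version costs a few extra operations per pixel in software, but in hardware both are essentially free.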
So you can see that in software, tiles are "complicated and slow" but in hardware they're "easy and fast".
Some notes looking at the R5xx manual:
The R5xx series is positively ancient (2005) but the documentation is available online (search for "R5xx_Acceleration_v1.5.pdf"). It mentions two scan converters, so the pipeline looks something like this:
primitive output -> coarse scan converter -> quad scan converter -> fragment shader
The coarse scan converter appears to operate on larger tiles of configurable size (8x8 to 32x32), and has multiple selectable modes, an "intercept based" and a "bounding box based" mode.
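The manual doesn't spell out the internals, but the "bounding box based" mode plausibly amounts to something like this sketch: take the screen-space bounding box of the triangle's vertices and enumerate every tile it touches (the tile size parameter mirrors the configurable 8x8 to 32x32 range).

```c
#include <assert.h>

/* Inclusive range of tile coordinates touched by a bounding box. */
typedef struct { int x0, y0, x1, y1; } TileRange;

static int imin3(int a, int b, int c) { return a < b ? (a < c ? a : c) : (b < c ? b : c); }
static int imax3(int a, int b, int c) { return a > b ? (a > c ? a : c) : (b > c ? b : c); }

/* Coarse scan conversion, "bounding box" mode: find the tiles that a
 * triangle's screen-space bounding box overlaps. Conservative: some
 * of these tiles may not actually intersect the triangle. */
static TileRange coarse_tiles(const int x[3], const int y[3], int tile)
{
    TileRange r;
    r.x0 = imin3(x[0], x[1], x[2]) / tile;
    r.y0 = imin3(y[0], y[1], y[2]) / tile;
    r.x1 = imax3(x[0], x[1], x[2]) / tile;
    r.y1 = imax3(y[0], y[1], y[2]) / tile;
    return r;
}
```

An "intercept based" mode would presumably walk the triangle's edges instead, touching fewer tiles for long thin triangles at the cost of more logic.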
The quad scan converter then takes the output of the coarse scan converter and outputs individual quads, which are groups of four samples. The depth values for each quad may be represented as four discrete values or as a plane equation. The plane equation allows the entire quad to be discarded quickly if the corresponding quad in the depth buffer is also specified as a plane equation. This is called "early Z" and it is a common optimization.
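A sketch of why plane equations make this cheap: depth over a quad is linear, z(x, y) = a*x + b*y + c, so its extremes lie at the quad's corners, and a conservative reject only needs four evaluations per plane. The "less than" depth test below is an assumption for the sketch.

```c
#include <assert.h>

/* A depth plane z(x, y) = a*x + b*y + c. */
typedef struct { float a, b, c; } ZPlane;

static float z_at(ZPlane p, float x, float y) { return p.a * x + p.b * y + p.c; }

/* Conservative early-Z reject for a "less than" depth test over the
 * 2x2 quad whose top-left sample is (qx, qy): if the incoming quad's
 * nearest depth is still behind the stored quad's farthest depth, no
 * sample can pass, so the whole quad is discarded before shading. */
static int quad_early_z_reject(ZPlane incoming, ZPlane stored, float qx, float qy)
{
    float in_min = z_at(incoming, qx, qy);
    float st_max = z_at(stored, qx, qy);
    for (int dy = 0; dy <= 1; dy++)
        for (int dx = 0; dx <= 1; dx++) {
            float zi = z_at(incoming, qx + dx, qy + dy);
            float zs = z_at(stored, qx + dx, qy + dy);
            if (zi < in_min) in_min = zi;
            if (zs > st_max) st_max = zs;
        }
    return in_min >= st_max;   /* 1 = safe to discard the whole quad */
}
```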
The fragment shader then works on one quad at a time. (Quads exist so that texture level-of-detail can be computed by differencing neighboring samples.) The quad might contain samples outside the triangle, which will then get discarded.
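The standard way to decide which samples are inside is an edge-function test, sketched here with integer pixel coordinates: a sample is inside the triangle iff it is on the same side of all three edges.

```c
#include <assert.h>

/* Signed area test: which side of edge (a -> b) the point p lies on. */
static int edge(int ax, int ay, int bx, int by, int px, int py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/* Coverage test: a sample is inside the triangle iff all three edge
 * functions agree in sign (handles both winding orders). Samples in
 * a quad that fail this test get discarded after shading. */
static int inside(const int x[3], const int y[3], int px, int py)
{
    int e0 = edge(x[0], y[0], x[1], y[1], px, py);
    int e1 = edge(x[1], y[1], x[2], y[2], px, py);
    int e2 = edge(x[2], y[2], x[0], y[0], px, py);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}
```

Real hardware also has tie-breaking rules so that a sample exactly on a shared edge is owned by exactly one triangle; this sketch omits them.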
It's worth mentioning again that this is an old graphics card. Modern graphics cards are more complicated. For example, the R5xx doesn't even let you sample textures from vertex shaders.
If you want an even more radically different picture, look up the PowerVR GPU implementations which use something called "tile-based deferred rendering". These modern and powerful GPUs are optimized for low cost and low power consumption, and they challenge a lot of your assumptions about how renderers work.