I've done this using a vertex and fragment shader.
I submit one point vertex for each pixel in the input texture and draw them all as points.
I create an offscreen 1D render texture whose width is the number of buckets I want. You could use a single bucket, but then every fragment blends into the same pixel and those writes get serialized, which slows things down. I've found 32 buckets usually works well for me.
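Set the blend mode to additive (e.g. glBlendFunc(GL_ONE, GL_ONE)) so each fragment adds its value to its bucket. Use a floating-point format such as GL_R32F for the render texture; with an 8-bit normalized format the sums clamp at 1.0.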
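To show why the render texture format matters, here is a small CPU model of additive blending for one bucket. It is a simplified sketch, assuming glBlendFunc(GL_ONE, GL_ONE) with the default GL_FUNC_ADD equation; the function names are hypothetical, and the 8-bit case only models clamping, not quantization.

```python
def blend_additive(dst, src, normalized_8bit=False):
    """Model of GL_ONE/GL_ONE additive blending for a single channel."""
    result = src * 1.0 + dst * 1.0
    if normalized_8bit:
        # Fixed-point normalized targets clamp the blended result to [0, 1].
        result = min(max(result, 0.0), 1.0)
    return result


def accumulate(values, normalized_8bit=False):
    """Blend a sequence of fragment values into one bucket cleared to 0."""
    dst = 0.0
    for v in values:
        dst = blend_additive(dst, v, normalized_8bit)
    return dst


print(accumulate([1.0] * 5))                        # 5.0 (float target)
print(accumulate([1.0] * 5, normalized_8bit=True))  # 1.0 (8-bit target)
```

Five "valid" fragments add up to 5 in a float target but saturate at 1 in a normalized one, which would make the final count wrong.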
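In the vertex shader I take gl_VertexID mod the number of buckets and map that bucket index into the -1 to 1 clip-space range as the point's x position, aiming at pixel centers so each point lands in exactly one bucket.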
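The bucket-to-clip-space mapping can be checked on the CPU. This is a sketch, assuming a viewport exactly num_buckets pixels wide and 1-pixel points; the helper names are hypothetical. In GLSL the equivalent would be roughly `x = (2.0 * float(gl_VertexID % numBuckets) + 1.0) / float(numBuckets) - 1.0`.

```python
def bucket_for_vertex(vertex_id, num_buckets):
    # Same as gl_VertexID % numBuckets in the vertex shader.
    return vertex_id % num_buckets


def bucket_to_ndc_x(bucket, num_buckets):
    # Clip-space x of the center of pixel `bucket` in a num_buckets-wide viewport.
    return (2 * bucket + 1) / num_buckets - 1.0


def ndc_x_to_pixel(x_ndc, viewport_width):
    # Viewport transform: NDC [-1, 1] -> window x in [0, width), then pick the pixel.
    return int((x_ndc + 1.0) * 0.5 * viewport_width)


for b in range(4):
    print(b, bucket_to_ndc_x(b, 4), ndc_x_to_pixel(bucket_to_ndc_x(b, 4), 4))
```

Offsetting by half a pixel (the `+ 1` term) keeps each point in the middle of its bucket instead of on a pixel edge, where rasterization rules could push it into a neighbor.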
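I also use gl_VertexID to pick which texel to sample (one vertex per pixel) and set the output color (gl_FrontColor in legacy GLSL, or your own varying) to white for a valid pixel and black otherwise, i.e. 1 or 0. The fragment shader just writes that color out.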
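Turning a vertex ID into a texel and a 0/1 value can be sketched like this. It assumes a row-major layout where vertex `i` maps to `(i % width, i // width)`, as you would pass to `texelFetch`; the "valid pixel" test here (red channel above a threshold) is a hypothetical stand-in for whatever criterion you actually need.

```python
def texel_for_vertex(vertex_id, width):
    # Like ivec2(gl_VertexID % width, gl_VertexID / width) passed to texelFetch.
    return vertex_id % width, vertex_id // width


def classify(texel, threshold=0.5):
    # Hypothetical validity test: red channel above a threshold -> 1.0, else 0.0.
    red = texel[0]
    return 1.0 if red > threshold else 0.0


print(texel_for_vertex(5, 4))          # (1, 1)
print(classify((0.9, 0.0, 0.0, 1.0)))  # 1.0
```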
Read the 1D texture back (e.g. with glReadPixels) and sum the bucket values on the CPU. The total is the number of valid pixels.
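The whole point-scatter pass can be modeled on the CPU to sanity-check the bucket logic before writing the shaders. This is a sketch, not GPU code: `count_pixels` and the `is_valid` predicate are hypothetical, the image is a list of rows, and the `+=` stands in for additive blending into a float target.

```python
def count_pixels(image, is_valid, num_buckets=32):
    """CPU model of the point-scatter counting pass plus readback."""
    height = len(image)
    width = len(image[0]) if height else 0
    buckets = [0.0] * num_buckets  # 1D render target, cleared to 0
    for vertex_id in range(width * height):  # one point per pixel
        x, y = vertex_id % width, vertex_id // width
        color = 1.0 if is_valid(image[y][x]) else 0.0
        buckets[vertex_id % num_buckets] += color  # additive blend
    return buckets, sum(buckets)  # readback, then sum on the CPU


image = [[0, 1, 1],
         [1, 0, 1]]
buckets, total = count_pixels(image, lambda p: p == 1, num_buckets=4)
print(buckets, total)  # [0.0, 2.0, 1.0, 1.0] 4.0
```

With more buckets, fewer points blend into the same pixel, which is the contention trade-off described above.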
A second way I've done it in the past is a reduction. First, render to an offscreen texture the same size as the input, classifying each pixel as 1 or 0. Then render to a texture half the width and height, where each output pixel samples the four corresponding texels of the previous level, adds them, and writes the sum. Keep doing this until you get to a 1x1 texture and read that value back.
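The reduction approach can be modeled on the CPU as well. This sketch assumes that for odd sizes the next level is rounded up and out-of-range texels count as 0; the answer does not say how non-power-of-two inputs are handled, so that part is an assumption. The function names are hypothetical.

```python
def classify_image(image, is_valid):
    # First pass: same-size texture holding 1.0 for valid pixels, else 0.0.
    return [[1.0 if is_valid(p) else 0.0 for p in row] for row in image]


def reduce_2x2(level):
    # One pass: each output texel sums up to four texels of the previous level.
    h, w = len(level), len(level[0])
    nh, nw = (h + 1) // 2, (w + 1) // 2  # round up for odd sizes (assumption)
    out = [[0.0] * nw for _ in range(nh)]
    for y in range(nh):
        for x in range(nw):
            total = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    sy, sx = 2 * y + dy, 2 * x + dx
                    if sy < h and sx < w:  # texels past the edge count as 0
                        total += level[sy][sx]
            out[y][x] = total
    return out


def count_by_reduction(image, is_valid):
    level = classify_image(image, is_valid)
    while len(level) > 1 or len(level[0]) > 1:  # stop at 1x1
        level = reduce_2x2(level)
    return level[0][0]


image = [[1, 0, 1, 1, 0],
         [0, 0, 1, 0, 1],
         [1, 1, 1, 0, 0]]
print(count_by_reduction(image, lambda p: p == 1))  # 8.0
```

Each pass cuts the pixel count by about 4x, so a W x H input needs roughly log2(max(W, H)) passes before the single-texel readback.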