Indeed, general-purpose GPUs (such as those in Apple devices from the A8 onward) are not only capable of solving such data-parallel processing problems but are designed for them.
Apple exposes data-parallel processing on its platforms through Metal, and with some simple code you can solve problems like yours on the GPU. This can also be done with other frameworks, but I am including some code for the Metal + Swift case as a proof of concept.
The following runs as a Swift command-line tool on macOS Sierra and was built using Xcode 9 (yup, I know it's beta). You can get the full project from my GitHub repo.
In main.swift:
import Foundation
import Metal
import CoreGraphics
import AppKit
guard FileManager.default.fileExists(atPath: "./testImage.png") else {
    print("./testImage.png does not exist")
    exit(1)
}
let url = URL(fileURLWithPath: "./testImage.png")
let imageData = try Data(contentsOf: url)
guard let image = NSImage(data: imageData),
      let imageRef = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
    print("Failed to load image data")
    exit(1)
}
let bytesPerPixel = 4
let bytesPerRow = bytesPerPixel * imageRef.width
var rawData = [UInt8](repeating: 0, count: Int(bytesPerRow * imageRef.height))
// R, G, B, A byte order in memory, to match the .rgba8Unorm texture created below
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue).union(.byteOrder32Big)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &rawData,
                        width: imageRef.width,
                        height: imageRef.height,
                        bitsPerComponent: 8,
                        bytesPerRow: bytesPerRow,
                        space: colorSpace,
                        bitmapInfo: bitmapInfo.rawValue)
let fullRect = CGRect(x: 0, y: 0, width: CGFloat(imageRef.width), height: CGFloat(imageRef.height))
context?.draw(imageRef, in: fullRect, byTiling: false)
// Get access to the system's default GPU
guard let device = MTLCreateSystemDefaultDevice() else {
    print("Metal is not supported on this device")
    exit(1)
}
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm,
    width: imageRef.width,
    height: imageRef.height,
    mipmapped: false) // only mip level 0 is ever written or read
guard let texture = device.makeTexture(descriptor: textureDescriptor) else {
    print("Failed to create texture")
    exit(1)
}
let region = MTLRegionMake2D(0, 0, imageRef.width, imageRef.height)
texture.replace(region: region, mipmapLevel: 0, withBytes: &rawData, bytesPerRow: bytesPerRow)
// Queue to handle an ordered list of command buffers, and
// buffer for storing encoded commands that are sent to the GPU
guard let commandQueue = device.makeCommandQueue(),
      let commandBuffer = commandQueue.makeCommandBuffer() else {
    print("Failed to create command queue or command buffer")
    exit(1)
}
// Access to Metal functions that are stored in the Shaders.metal file, e.g. countBlack()
guard let defaultLibrary = device.makeDefaultLibrary() else {
    print("Failed to create default metal shader library")
    exit(1)
}
// Encoder for GPU commands
guard let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder() else {
    print("Failed to create compute command encoder")
    exit(1)
}
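// hardcoded to 16x16 for now (recommendation: read about threadExecutionWidth)
let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
// Note: integer division truncates, so when a dimension is not a multiple of 16
// the pixels past the last full 16x16 tile are not visited
let numThreadgroups = MTLSizeMake(texture.width / threadsPerGroup.width,
                                  texture.height / threadsPerGroup.height,
                                  1)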
// Set up a compute pipeline with the countBlack function and add it to the encoder
guard let countBlackProgram = defaultLibrary.makeFunction(name: "countBlack") else {
    print("Failed to find the countBlack function in the shader library")
    exit(1)
}
let computePipelineState = try device.makeComputePipelineState(function: countBlackProgram)
computeCommandEncoder.setComputePipelineState(computePipelineState)
// set the input texture for the countBlack() function, i.e. inArray
// index: 0 here corresponds to texture(0) in the countBlack() function
computeCommandEncoder.setTexture(texture, index: 0)
// create the output buffer for the countBlack() function, i.e. counter
// index: 0 here corresponds to buffer(0) in the countBlack() function
guard let counterBuffer = device.makeBuffer(length: MemoryLayout<UInt32>.size,
                                            options: .storageModeShared) else {
    print("Failed to create counter buffer")
    exit(1)
}
computeCommandEncoder.setBuffer(counterBuffer, offset: 0, index: 0)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// a. Get GPU data
// counterBuffer.contents() returns an UnsafeMutableRawPointer, roughly equivalent to void* in C
let data = NSData(bytesNoCopy: counterBuffer.contents(),
                  length: MemoryLayout<UInt32>.size,
                  freeWhenDone: false)
// b. prepare Swift array large enough to receive data from GPU
var finalResultArray = [UInt32](repeating: 0, count: 1)
// c. get data from GPU into Swift array (the counter is a 4-byte UInt32)
data.getBytes(&finalResultArray, length: MemoryLayout<UInt32>.size)
print("Found \(finalResultArray[0]) non-white pixels")
// d. YOU'RE ALL SET!
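A note on the hardcoded 16x16 dispatch: because the threadgroup count is computed with truncating division, an image whose width or height is not a multiple of 16 leaves a strip of pixels unvisited. A minimal sketch of the usual remedy, rounding the count up, follows. The helper name `threadgroupCount` is my own and not part of the project, and rounding up only gives a correct total if the kernel ignores threads whose position falls outside the texture.

```swift
// Hypothetical helper (not part of the original project): number of
// threadgroups needed to cover `pixels` along one dimension.
func threadgroupCount(pixels: Int, threadsPerGroup: Int) -> Int {
    precondition(pixels >= 0 && threadsPerGroup > 0)
    // Ceiling division: round up so a partial group covers the remainder
    return (pixels + threadsPerGroup - 1) / threadsPerGroup
}

// Example with a 1000x750 image and 16x16 threadgroups
let groupsWide = threadgroupCount(pixels: 1000, threadsPerGroup: 16)
let groupsHigh = threadgroupCount(pixels: 750, threadsPerGroup: 16)
print("\(groupsWide) x \(groupsHigh) threadgroups") // prints "63 x 47 threadgroups"
```

The two values would then take the place of the truncated divisions passed to MTLSizeMake.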
Also, in Shaders.metal:
#include <metal_stdlib>
using namespace metal;
kernel void
countBlack(texture2d<float, access::read> inArray [[texture(0)]],
           volatile device uint *counter [[buffer(0)]],
           uint2 gid [[thread_position_in_grid]]) {
    // Skip threads that fall outside the texture
    if (gid.x >= inArray.get_width() || gid.y >= inArray.get_height()) {
        return;
    }
    // Atomic, as threads from every threadgroup increment the same counter
    device atomic_uint *atomicBuffer = (device atomic_uint *)counter;
    float3 inColor = inArray.read(gid).rgb;
    // Count every pixel that is not pure white
    if (inColor.r != 1.0 || inColor.g != 1.0 || inColor.b != 1.0) {
        atomic_fetch_add_explicit(atomicBuffer, 1, memory_order_relaxed);
    }
}
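To sanity-check the kernel's result, the same test can be run on the CPU over the raw bytes. The sketch below is a reference implementation added for illustration only. The function name `countNonWhite` is hypothetical, and it assumes a tightly packed buffer with 4 bytes per pixel in R, G, B, A order, which may not match how your bitmap context lays out its bytes.

```swift
// Hypothetical CPU reference, for cross-checking the GPU count.
// Assumes 4 bytes per pixel in R, G, B, A order with no row padding.
func countNonWhite(rgba: [UInt8]) -> Int {
    precondition(rgba.count % 4 == 0, "expected 4 bytes per pixel")
    var count = 0
    // Step through the buffer one pixel (4 bytes) at a time
    for i in stride(from: 0, to: rgba.count, by: 4) {
        // Alpha (rgba[i + 3]) is ignored, as in the kernel
        let isWhite = rgba[i] == 255 && rgba[i + 1] == 255 && rgba[i + 2] == 255
        if !isWhite {
            count += 1
        }
    }
    return count
}

// Three pixels: white, black, light grey
let sample: [UInt8] = [255, 255, 255, 255,
                       0, 0, 0, 255,
                       200, 200, 200, 255]
print(countNonWhite(rgba: sample)) // prints 2
```

An 8-bit value of 255 maps to exactly 1.0 in an .rgba8Unorm texture, so comparing bytes against 255 here mirrors the kernel's comparison against 1.0.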
I used the question to learn a bit about Metal and data-parallel computing, so most of the code is boilerplate adapted from online articles. Please take the time to visit the sources listed below for more examples. Also, the code is pretty much hardcoded for this particular problem, but you shouldn't have much trouble adapting it.
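As one example of such an adaptation, the NSData round trip used to read the counter can be replaced by loading the value straight from the raw pointer. In this sketch a manually allocated 4-byte block stands in for `counterBuffer.contents()` so that it runs without a GPU; that stand-in and the value 1234 are assumptions for illustration. With the real buffer, the `load(as:)` line should be the only part needed.

```swift
// Stand-in for counterBuffer.contents(): a 4-byte block aligned for UInt32.
// (Assumption for illustration; the real pointer comes from the MTLBuffer.)
let contents = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<UInt32>.size,
    alignment: MemoryLayout<UInt32>.alignment)

// Pretend the GPU wrote a count of 1234 into the buffer
contents.storeBytes(of: 1234, as: UInt32.self)

// Read the counter back as a UInt32, with no intermediate NSData or array
let nonWhiteCount = contents.load(as: UInt32.self)
print("Found \(nonWhiteCount) non-white pixels")

// Only needed for the stand-in allocation; an MTLBuffer manages its own memory
contents.deallocate()
```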
Sources:
http://flexmonkey.blogspot.com.ar/2016/05/histogram-equalisation-with-metal.html
http://metalbyexample.com/introduction-to-compute/
http://memkite.com/blog/2014/12/15/data-parallel-programming-with-metal-and-swift-for-iphoneipad-gpu/