What should happen with the final exclusive scan value in a stream compaction algorithm?
Here is an example that picks out all the 'A' characters.
Sequence A:
Input: A B B A A B B A
Selection: 1 0 0 1 1 0 0 1
Scan: 0 1 1 1 2 3 3 3
0 - A
1 - A
2 - A
3 - A
Sequence B (same except the last value):
Input: A B B A A B B B
Selection: 1 0 0 1 1 0 0 0
Scan: 0 1 1 1 2 3 3 3
0 - A
1 - A
2 - A
3 - B
Clearly, the second example gives the wrong final result when a naive loop walks the scan values and writes each input element to the address given by its scan value: the trailing B ends up at index 3.
What am I missing here?
Update:
As I understand the scan algorithm, I would do the equivalent of the following:
    // Naive scatter: every element is written, whatever its selection flag is
    for (int i = 0; i < scan.length(); i++)
    {
        result[scan[i]] = input[i];
    }
In parallel this would involve a scatter instruction.
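For comparison, here is a minimal sequential sketch of one common way to do the scatter step. It assumes that the write is guarded by the selection flag and that the output length is the last scan value plus the last selection flag. Neither assumption is confirmed anywhere above, and the names `exclusive_scan_flags` and `compact` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Exclusive prefix sum: scan[i] = flags[0] + ... + flags[i-1], with scan[0] = 0.
std::vector<int> exclusive_scan_flags(const std::vector<int>& flags) {
    std::vector<int> scan(flags.size());
    int running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        scan[i] = running;      // sum of all flags before position i
        running += flags[i];
    }
    return scan;
}

// Hypothetical helper: keeps input[i] only where flags[i] is 1.
std::string compact(const std::string& input, const std::vector<int>& flags) {
    assert(input.size() == flags.size());
    if (input.empty()) return std::string();

    const std::vector<int> scan = exclusive_scan_flags(flags);

    // Assumption: number of selected elements = last scan value + last flag.
    const std::size_t count =
        static_cast<std::size_t>(scan.back() + flags.back());

    std::string result(count, '\0');
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Guarded scatter: unselected elements are never written.
        if (flags[i]) {
            result[scan[i]] = input[i];
        }
    }
    return result;
}
```

Under these assumptions, Sequence A yields four characters (3 + 1) and Sequence B yields three (3 + 0), so the trailing B is never written. In a parallel version, each thread would presumably perform the same guarded write for its own index.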