Questions tagged [prefix-sum]

63 questions
0
votes
1 answer

CUDA sum to the right

I am trying to implement a sum reduction in CUDA, but I want the reduction to go to the right rather than to the left. I wrote the code below, but I am not sure why it is not working: __global__ void reduce_kernel( float *input, float…
noobie
  • 3
  • 1
0
votes
2 answers

Intuition behind calculating the prefix and suffix sums

I am solving a LeetCode question: Minimum Number of Operations to Move All Balls to Each Box. You have n boxes. You are given a binary string boxes of length n, where boxes[i] is '0' if the ith box is empty, and '1' if it contains one ball. In one…
P.K.
  • 379
  • 1
  • 4
  • 16
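One way to build intuition for the prefix and suffix passes is a two-sweep sketch. It assumes that one operation moves a single ball to an adjacent box (the excerpt is cut off before it says so), and the function name `min_operations` is made up for illustration. The left-to-right sweep accumulates the cost of bringing every ball on the left to box i, and the right-to-left sweep does the same for the balls on the right.

```python
def min_operations(boxes: str) -> list[int]:
    """Total moves needed to gather all balls into each box (hypothetical helper)."""
    n = len(boxes)
    answer = [0] * n

    # Prefix pass: `cost` is the number of moves needed to bring every ball
    # strictly left of i into box i.
    balls = 0
    cost = 0
    for i in range(n):
        answer[i] += cost
        balls += boxes[i] == "1"   # box i now counts as "left" for box i + 1
        cost += balls              # each ball seen so far travels one more step

    # Suffix pass: same idea, sweeping from the right.
    balls = 0
    cost = 0
    for i in range(n - 1, -1, -1):
        answer[i] += cost
        balls += boxes[i] == "1"
        cost += balls

    return answer
```

Each sweep is O(n), so the whole computation is linear instead of the O(n²) of checking every pair of boxes.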
0
votes
1 answer

Heap overflow in LeetCode 1314

Question: https://leetcode.com/problems/matrix-block-sum/ I am trying to solve it using a 2D prefix sum, where sum[i][j] is the sum of all the elements above and to the left of the element, including the element itself and its row and column. Code: class Solution { public: …
Sarthak Saxena
  • 139
  • 4
  • 10
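A minimal sketch of the 2D prefix sum approach follows. It is not the asker's code (which is truncated above); the name `matrix_block_sum` and the padded layout are assumptions chosen for illustration. Padding the prefix table with an extra row and column of zeros means no lookup ever uses index -1, and clamping the block corners keeps every lookup inside the table.

```python
def matrix_block_sum(mat: list[list[int]], k: int) -> list[list[int]]:
    """Sum of each (2k+1) x (2k+1) block, clipped to the matrix (hypothetical helper)."""
    rows, cols = len(mat), len(mat[0])

    # prefix[i][j] holds the sum of mat[0..i-1][0..j-1]; row 0 and column 0 are zeros.
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (
                mat[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )

    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Clamp the block to the matrix; r2 and c2 are exclusive bounds.
            r1, c1 = max(0, i - k), max(0, j - k)
            r2, c2 = min(rows, i + k + 1), min(cols, j + k + 1)
            result[i][j] = (
                prefix[r2][c2] - prefix[r1][c2] - prefix[r2][c1] + prefix[r1][c1]
            )
    return result
```

In a C++ version, the same clamping is what keeps the indices within the allocated vectors.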
0
votes
2 answers

Cumulative sum to find subarrays whose sum equals a given value

I'm trying to understand the logic behind the following code. However, I'm unclear about 2 parts of it, partly because the math supporting the logic is not totally clear to me at this moment. CONFUSION 1: I don't understand why would we put 0…
Umer Farooq
  • 7,356
  • 7
  • 42
  • 67
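The code being asked about is cut off above, so the following is only a sketch of one common form of the cumulative-sum technique, which may or may not match it. The name `count_subarrays_with_sum` is hypothetical. The idea: a subarray ending at the current position sums to `target` exactly when some earlier prefix sum equals `running - target`. Seeding the table with a prefix sum of 0 accounts for subarrays that start at index 0.

```python
def count_subarrays_with_sum(nums: list[int], target: int) -> int:
    """Count contiguous subarrays whose elements sum to target (hypothetical helper)."""
    # seen[s] = how many prefixes so far have sum s.
    # The empty prefix has sum 0, so it is recorded once up front.
    seen = {0: 1}
    running = 0
    count = 0
    for x in nums:
        running += x
        # Every earlier prefix with sum (running - target) closes a valid subarray here.
        count += seen.get(running - target, 0)
        seen[running] = seen.get(running, 0) + 1
    return count
```

Without the `{0: 1}` seed, an input such as `[3]` with target 3 would report zero matches, because the subarray that starts at the first element has no earlier non-empty prefix to pair with.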
0
votes
2 answers

Time complexity of a prefix sum algorithm

Given the following pseudocode, I'm wondering if my thought process is correct when trying to determine the time complexity: for i = 0 to n-1: add the numbers A[0] through A[i]; store the result in B[i]. The algorithm will loop n times, and…
bullbo
  • 131
  • 10
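To make the cost concrete, here is a small sketch that counts additions. It assumes the pseudocode is read literally, so that each sum from A[0] through A[i] is recomputed from scratch; the function names are made up for illustration. Under that reading, step i performs i + 1 additions, for a total of n(n + 1)/2, which is Θ(n²). Carrying a running total brings this down to n additions.

```python
def prefix_sums_naive(a: list[int]) -> tuple[list[int], int]:
    """Literal reading of the pseudocode: re-add A[0..i] for every i."""
    b = []
    additions = 0
    for i in range(len(a)):
        total = 0
        for j in range(i + 1):      # i + 1 additions at step i
            total += a[j]
            additions += 1
        b.append(total)
    return b, additions


def prefix_sums_running(a: list[int]) -> tuple[list[int], int]:
    """Same output, but each B[i] reuses B[i-1], so only one addition per element."""
    b = []
    additions = 0
    total = 0
    for x in a:
        total += x
        additions += 1
        b.append(total)
    return b, additions
```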
0
votes
1 answer

How does a thread perform a binary search on the prefix-sum array?

In the context of parallel programming and GPUs, we have an array called the prefix-sum array. In dynamic mapping, each thread performs a binary search on the prefix-sum array to find its corresponding work item. My question is: how does a thread…
Saeed Rahmani
  • 650
  • 1
  • 8
  • 29
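A sequential sketch of the lookup, under stated assumptions: each work item i owns `counts[i]` units of work, the prefix-sum array is the inclusive scan of those counts, and a thread with global id t needs the item whose range contains t. The names `find_work_item`, `counts`, and `thread_id` are hypothetical. On a GPU each thread would run the same search independently with its own id.

```python
from bisect import bisect_right
from itertools import accumulate


def find_work_item(inclusive_prefix: list[int], thread_id: int) -> tuple[int, int]:
    """Return (work item index, offset inside that item) for one thread."""
    # The owning item is the first one whose inclusive prefix sum exceeds thread_id.
    # bisect_right skips entries equal to thread_id, so items with zero work are skipped.
    item = bisect_right(inclusive_prefix, thread_id)
    start = inclusive_prefix[item - 1] if item > 0 else 0
    return item, thread_id - start


counts = [3, 0, 2, 4]                      # units of work per item (made-up data)
inclusive = list(accumulate(counts))       # inclusive scan of the counts
owners = [find_work_item(inclusive, t)[0] for t in range(inclusive[-1])]
```

Each search takes O(log n) steps in the number of work items, and because the prefix-sum array is read-only during the search, threads do not need to synchronize with one another.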
0
votes
2 answers

OpenCL scan code

I'm looking for a fast implementation of scan (prefix sum) in OpenCL. The best thing that I found is in the Nvidia SDK, but it is old (2010). Does anyone know of any other implementation of scan in OpenCL?
Shewartz
  • 5
  • 4
0
votes
0 answers

Number Sequences

I've got a homework assignment about number sequences. We are given an array of n elements from 1 to MAX. We can choose any number as the start. We can multiply the start by 2 or divide it by 2, but if the start is already an odd number we can't divide it. The task is we…
Zzzz
  • 1
  • 3
0
votes
1 answer

Is prefix scan CUDA sample code in gpugems3 correct?

I've written a piece of code to call the kernel from the book GPU Gems 3, Chapter 39: Parallel Prefix Sum (Scan) with CUDA. However, the results that I get are a bunch of negative numbers instead of a prefix scan. Is my kernel call wrong or is there…
dibid
  • 75
  • 9
0
votes
1 answer

Stream compaction (or array packing) with prefix scan using OpenMP

I am using OpenMP to parallelize my code. I have an original array: A=[3,5,2,5,7,9,-4,6,7,-3,1,7,6,8,-1,2] and a marks array: M=[1,0,1,0,0,0,1,0,0,1,1,0,0,0,1,1]. Using array M, I can compact my original array in this packed…
Pierpym
  • 13
  • 4
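A sequential sketch of scan-based compaction, assuming that a mark of 1 means "keep this element" (the excerpt is cut off before it shows the packed result). The function name `compact` is hypothetical. The exclusive prefix sum of the marks gives each kept element its destination index, so the final scatter writes to distinct slots.

```python
from itertools import accumulate


def compact(values: list[int], marks: list[int]) -> list[int]:
    """Pack the elements of values whose mark is 1, preserving order."""
    inclusive = list(accumulate(marks))
    # Exclusive scan: number of kept elements strictly before position i.
    exclusive = [0] + inclusive[:-1]
    packed = [0] * (inclusive[-1] if inclusive else 0)

    # Scatter step: every kept element has a unique destination, so the
    # iterations do not depend on each other.
    for i, keep in enumerate(marks):
        if keep:
            packed[exclusive[i]] = values[i]
    return packed


A = [3, 5, 2, 5, 7, 9, -4, 6, 7, -3, 1, 7, 6, 8, -1, 2]
M = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
packed = compact(A, M)
```

Because the scatter iterations are independent, that loop is the part that maps most directly onto a parallel loop; the scan itself needs a parallel scan algorithm or per-thread partial sums to be parallelized.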
0
votes
1 answer

Python convert list to tree representation format

My friend and I are working on a simple Python project. We are implementing the parallel prefix sum algorithm in our own way. We are creating and processing a binary tree with a really weird format. We would like to convert this format to…
user3761291
  • 73
  • 1
  • 7
0
votes
0 answers

Prefix sum/scan using global memory in CUDA/OpenCL

I am looking for a global memory implementation of the prefix sum/scan algorithm using CUDA or OpenCL. All the implementations I have found use local memory. Can anyone help me with the algorithm and how I should proceed?
Luniam
  • 463
  • 7
  • 21
0
votes
1 answer

Prefix Sum with global memory and an error with local memory

I have a Mali GPU which does not support local memory at all. Every time I run code that uses local memory, I get errors from the device. So, I want to port my code to a version that only uses global memory. I was wondering if it is…
Luniam
  • 463
  • 7
  • 21
0
votes
1 answer

PRAM if-then-else CREW/EREW

In my book of parallel algorithms there is the following pseudo-code for the PRAM model: procedure PrefixSumPRAM( A, n ): BEGIN b := new Array(2*n-1); b[1] := SumPRAM(A, n); //this will load A with the computation tree and return the sum …
SpectralWave
  • 971
  • 9
  • 18
-1
votes
1 answer

LeetCode 719. Find K-th Smallest Pair Distance

The question statement is as follows: Given an integer array, return the kth smallest distance among all the pairs. The distance of a pair (A, B) is defined as the absolute difference between A and B. The solution which LeetCode accepts is the…
user11910577