Questions tagged [coalescing]
49 questions
1
vote
1 answer
Uncoalesced float2 CUDA kernel
I am having trouble optimizing the grid and block sizes of the example below. When I do profiling, it appears that the memory write operation in the kernel code is not coalesced.
I found some solutions on the internet, but they suggested that I change…

Dunya Degirmenci
- 365
- 4
- 20
0
votes
1 answer
How to convert multiple sets of columns to a single column in pandas?
I want to convert the columns (Azi_0 to Azi_47, Dist_0 to Dist_47) in a dataframe (df) into two columns (Azimuth, Distance), as in new_df.
Azi = [f"Azi_{i}" for i in range(48)]  # range(48) covers Azi_0 through Azi_47
dist = [f"Dist_{i}" for i in range(48)]  # range(48) covers Dist_0 through Dist_47
Sample dataframe, df:
expected…

Mageo
- 84
- 2
- 9
0
votes
1 answer
Value of first and last non-missing time point by condition of other variables. Conditional coalesce (dplyr)
a/b/c are different variables; t1 is time point 1, t2 is time point 2, t3 is time point 3.
The purpose is to create two new columns: one with the first and one with the last non-missing value for each row of a_t1 to a_t3. On the condition that it…

Droc
- 257
- 1
- 2
- 8
0
votes
2 answers
Nil coalescing inside dictionary index using Swift
I'm trying to assign a value to a dictionary key with an optional value from the text fields textFieldOne.text! and textFieldTwo.text!, but Xcode throws a build error.
let variable = [
"keyOne": textFieldOne.text ?? "",
"keyTwo": textFieldTwo.text…

DoubtAN
- 25
- 3
0
votes
1 answer
CUDA: memory transaction size for compute capability 1.2 or later
all,
From "NVIDIA CUDA Programming Guide 2.0" Section 5.1.2.1:
"Coalescing on Devices with Compute Capability 1.2 and Higher"
"Find the memory segment that contains the address requested by the lowest numbered active thread. Segment size is 32…

derek
- 9,358
- 11
- 53
- 94
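The first step quoted above, locating the segment that holds the address requested by the lowest-numbered active thread, can be sketched on the host in plain Python. This is a simplified model, not the hardware's full protocol: the excerpt is cut off before the remaining steps, and the function names, the 16-thread group, and the 64-byte segment size below are illustrative assumptions rather than values taken from the guide.

```python
def segment_base(address, segment_size):
    """Return the start address of the aligned segment containing `address`."""
    return address - (address % segment_size)


def threads_served(addresses, active, segment_size):
    """Find the segment of the lowest-numbered active thread.

    Returns (segment base, ids of active threads whose address falls in it).
    Assumes at least one thread is active.
    """
    lowest = min(t for t, on in enumerate(active) if on)
    base = segment_base(addresses[lowest], segment_size)
    served = [
        t for t, on in enumerate(active)
        if on and segment_base(addresses[t], segment_size) == base
    ]
    return base, served


# Hypothetical group of 16 threads, each reading one 4-byte word.
all_active = [True] * 16
aligned = [4 * t for t in range(16)]        # starts on a segment boundary
shifted = [32 + 4 * t for t in range(16)]   # starts 32 bytes into a segment

base_aligned, served_aligned = threads_served(aligned, all_active, 64)
base_shifted, served_shifted = threads_served(shifted, all_active, 64)
```

In this model the aligned pattern is covered by a single segment, while the shifted pattern leaves half of the threads outside the first segment, so they would need further segments.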
0
votes
1 answer
Coalescing two memory chunks in C++?
I'm trying to make my own memory allocator in C++ for educational purposes, and I have code like this:
class IntObj
{
public:
    IntObj(): var_int(6) {}
    void setVar(int var)
    {
        var_int = var;
    }
    int getVar()
    {
        return var_int;
    }
    virtual…

Andrew Raiden
- 1
- 4
0
votes
2 answers
Request coalescing in Redis
This might be a basic question, but Google didn't return a satisfactory result.
If I use Redis as a cache and send many identical requests to Redis, would it coalesce them into one request and forward only one request to the server in case of a cache miss?

animageofmine
- 121
- 9
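Whatever the cache layer does, identical requests can also be coalesced on the client side before they reach the backend. The sketch below shows one way to do that (often called single-flight) using Python's asyncio. SingleFlight, slow_backend, and the key "user:1" are hypothetical names for illustration, and this is not a description of how Redis itself behaves.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent loads of the same key into one backend call."""

    def __init__(self, loader):
        self._loader = loader    # hypothetical async function: key -> value
        self._in_flight = {}     # key -> task shared by every waiter

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the load; later callers reuse it.
            task = asyncio.ensure_future(self._loader(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later misses reload.
            task.add_done_callback(lambda _t: self._in_flight.pop(key, None))
        return await task


async def demo():
    calls = []

    async def slow_backend(key):
        calls.append(key)            # record every real backend hit
        await asyncio.sleep(0.01)    # simulate a slow load after a cache miss
        return f"value-for-{key}"

    flight = SingleFlight(slow_backend)
    results = await asyncio.gather(*(flight.get("user:1") for _ in range(100)))
    return results, calls


results, calls = asyncio.run(demo())
```

Here 100 concurrent lookups of the same key share a single call to the backend; the trade-off is that every waiter also shares that call's failure if it raises.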
0
votes
0 answers
Xeon Phi: Impossible to achieve perfect memory coalescing and fully utilize SIMD units?
I have a GPU/CUDA code that processes a cube (3D image, a spectral cube to be precise). Think of the cube as a series of images/slices, or alternatively, a bunch of spectra with different spatial locations (on a square). Each pixel of an image has…

AstrOne
- 3,569
- 7
- 32
- 54
0
votes
1 answer
Objective-C: how to batch multiple read operations
I am performing multiple read operations on the same resource stored on disk.
Sometimes the read operation itself takes longer than the time between requests for that same resource. In those cases it would make sense to batch the read operations…

Avba
- 14,822
- 20
- 92
- 192
0
votes
2 answers
Clojure: can I define an implicit conversion possibility?
I have a protocol called IExample and I define a record type A that implements it:
(defprotocol IExample
  (foo [this] "do something")
  (bar [this] "do something else"))
(defrecord A [field1 field2]
  IExample
  (foo [this]
    (+ field1 field2))
  …

romeovs
- 5,785
- 9
- 43
- 74
0
votes
1 answer
CUDA: Why is accessing the same device array not coalesced?
I am posting pared-down code for review. I believe it should compile and execute without any problems, but since I excluded all the irrelevant parts, I might have made a mistake.
struct Users {
    double A[96];
    double B[32];
    double…

Psypher
- 396
- 1
- 3
- 13
0
votes
1 answer
If each piece of data takes 128 bits or more, is there any advantage to grouping them in memory?
I've read in the CUDA Programming Guide that the global memory in a CUDA device is accessed in transactions of 32, 64 or 128 bit. Knowing that, is there any advantage to, say, having a set of float4 (128 bit) close together in memory? As I…

subb
- 1,578
- 1
- 15
- 27
0
votes
1 answer
Where is the global memory replay overhead coming from?
Running the code below to write 1 GB in global memory in the NVIDIA Visual Profiler, I get:
- 100% storage efficiency
- 69.4% (128.6 GB/s) DRAM utilization
- 18.3% total replay overhead
- 18.3% global memory replay overhead.
The memory writes are…

coder
- 11
- 2
0
votes
1 answer
CUDA: When can someone achieve coalesced memory access?
I have trouble understanding this concept. I've researched a lot online and the only thing I understood is that threads need to access consecutive data.
So if we have an array of 10000 integers, if thread i accesses the i-th number of the array, then…

ksm001
- 3,772
- 10
- 36
- 57
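The idea that threads should access consecutive data can be made concrete with a small host-side model in Python. It counts how many aligned memory segments a group of threads touches when thread i reads element i * stride. The 32-thread group, 4-byte integers, and 128-byte segments are assumptions chosen for illustration; actual transaction sizes depend on the device's compute capability.

```python
def segments_touched(num_threads, stride, elem_size=4, segment_size=128):
    """Count distinct aligned segments touched when thread t reads element t * stride."""
    addresses = [t * stride * elem_size for t in range(num_threads)]
    return len({addr // segment_size for addr in addresses})


# Assumed group of 32 threads reading 4-byte integers from an aligned array.
consecutive = segments_touched(32, stride=1)    # thread i reads element i
every_other = segments_touched(32, stride=2)    # thread i reads element 2 * i
scattered = segments_touched(32, stride=32)     # thread i reads element 32 * i
```

Under these assumptions the consecutive pattern fits in one segment, a stride of 2 needs two, and a stride of 32 puts every thread in its own segment, which is the worst case for this model.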
0
votes
2 answers
CUDA - selective memory store
In my kernel, if a condition is met, I update an item of the output buffer
if (condition(input[i])) //?
    output[i] = 1;
otherwise the output may stay the same, having a value of 0.
The density of updates is quite unpredictable, depending on the…

phoad
- 1,801
- 2
- 20
- 31