Questions tagged [distributed]

Multiple computers working together, using a network to communicate

A distributed system consists of multiple autonomous computers that communicate through a network. The computers interact with each other in order to achieve a common goal. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.

2221 questions
0
votes
1 answer

DistributedDataParallel single-machine multi-card implementation with batch

I want to run my PyTorch model training code on multiple GPUs on a single server. The specific scenario is as follows: training runs for 2000 epochs, each epoch has 1000 training data episodes, and there are three GPUs. The…
chihiro
  • 5
  • 4
0
votes
0 answers

Can CnosDB ensure atomicity of batch writes in a distributed environment?

For example: INSERT m0(TIME, f0) VALUES(2079939785551584142, NULL), (1243152233754651379, 12321); The first line is illegal, the second line is legal. Can it ensure that either all writes succeed or all fail during writing? Either all succeed or all…
0
votes
1 answer

Any open source implementations of WS-DM working with JMX?

WS-DM is a web services equivalent of JMX. I am looking for an open source implementation...
McGovernTheory
  • 6,556
  • 4
  • 41
  • 75
0
votes
1 answer

Distributed Hash Table question about periodic republishing

I'm working on a custom DHT (Distributed Hash Table) network and I was wondering: if a node's data (key-value pair) is republished upon its departure, is periodic refreshing every x hours necessary?
0
votes
1 answer

RedisLockRegistry with Webflux and kotlin

I am trying to get the LockRegistry working with Mono and WebFlux. During a test, I notice that the lock fails several times. For starters, are there any concrete examples that show how to use the RedisLockRegistry in a WebFlux app?
user3241602
  • 91
  • 1
  • 5
0
votes
1 answer

C++ client: Ensuring Message Consumption Consistency in Aeron with Multiple Consumers on the same channel

I have a scenario where I'm using the Aeron messaging library with multiple consumers consuming from the same channel. I want to ensure that each message is consumed by only one consumer to avoid duplication and guarantee consistency across consumers.…
dopller
  • 291
  • 3
  • 13
0
votes
0 answers

RuntimeError: CUDA error: invalid device ordinal when implementing Distributed Data Parallel in PyTorch tutorial

I am trying to reproduce the tutorial Distributed Data Parallel in PyTorch https://www.youtube.com/playlist?list=PL_lsbAsL_o2CSuhUhJIiW0IkdT5C2wGWj and am getting the error RuntimeError: CUDA error: invalid device ordinal. The other stackoverflow…
jmuth
  • 71
  • 4
0
votes
1 answer

How does Raft know whether a previous-term log entry is committed or not?

While studying Raft, I ran into a problem. A Raft cluster has 5 servers; call them a, b, c, d, e, with a as the leader. Everything is working normally. Then a handles a client request and creates a log entry. Scenario 1: b & c replicate the log entry, d & e don't. Then a &…
tlb
  • 3
  • 3
0
votes
0 answers

Multi Node Training: How to use multiple GPUs on multiple machines in pytorch?

I am working with two machines, each of which has two GPUs, so I have 4 GPUs overall. I am following the official PyTorch example to train on the ImageNet dataset. When I start the training…
Khawar Islam
  • 2,556
  • 2
  • 34
  • 56
0
votes
0 answers

Execute code on only the first clustered client in a Hazelcast client-server architecture

I have a Hazelcast client-server architecture. The Java application containing the Hazelcast client runs in the cloud and is clustered, i.e. multiple instances of it are running. How can I execute a clean-up task from only the first Hazelcast client? I know how to…
0
votes
1 answer

Massive flaw in Raft algorithm

So the Raft dissertation and paper say this is how to handle append entries: Receiver implementation: Reply false if term < currentTerm (§5.1). Reply false if log doesn’t contain an entry at prevLogIndex whose term matches prevLogTerm (§5.3). If an…
Ryan Glenn
  • 1,325
  • 4
  • 17
  • 30
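
The two receiver checks quoted in the Raft question above can be sketched in Python as follows. This is a minimal illustration, not the asker's code or a complete AppendEntries handler: the `LogEntry` class and the `passes_append_entries_checks` function are hypothetical names, log indices are assumed to start at 1, and `prev_log_index == 0` is assumed to mean "no previous entry". The rules that follow the truncated "If an…" are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEntry:
    """Hypothetical log entry: the term it was created in, plus a command."""
    term: int
    command: str


def passes_append_entries_checks(
    current_term: int,
    log: List[LogEntry],
    term: int,
    prev_log_index: int,
    prev_log_term: int,
) -> bool:
    """Return False if either of the two quoted receiver rules rejects the request."""
    # Rule 1 (§5.1): reply false if the sender's term is older than ours.
    if term < current_term:
        return False
    # Assumption: index 0 means there is no previous entry to match against.
    if prev_log_index == 0:
        return True
    # Rule 2 (§5.3): reply false if there is no entry at prev_log_index ...
    if prev_log_index < 0 or prev_log_index > len(log):
        return False
    # ... or if the entry there was created in a different term.
    return log[prev_log_index - 1].term == prev_log_term
```

A follower whose log ends at index 3 (term 2) would accept a request with `prev_log_index=3, prev_log_term=2` and reject one with `prev_log_index=4`, since it holds no entry at that index.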
0
votes
0 answers

Celonis connector in Kafka Connect distributed mode

I am trying to connect to Celonis EMS using EmsSinkConnector with Kafka Connect in distributed mode. I ran Kafka in distributed mode using the command below. nohup bin/connect-distributed.sh config/connect-distributed.properties >>…
SRIRAM RAMACHANDRAN
  • 297
  • 3
  • 8
  • 23
0
votes
2 answers

How to execute a Windows executable remotely from a Java server?

I want to execute a program on a Windows machine from a Java program that is running on a Java server on another Windows machine, and return something from that executable. While researching different ways to do this, I don't seem to find any…
Jaizen
  • 121
  • 1
  • 11
0
votes
1 answer

Unable to use the Golang otel http client to propagate B3 headers to downstream service

I'm trying to use the otel packages to do tracing header (B3) propagation. Unfortunately, I'm unable to get this to work. To explain, I have created a project on GitHub which illustrates my problem:…
Rogier Lommers
  • 2,263
  • 3
  • 24
  • 38
0
votes
1 answer

JMeter slave is unable to use the CSV when the number of slaves increases

The load is 1000 threads from each instance. When I try with 1 master and 3 slaves, JMeter correctly uses the CSV for the slave instances; however, when the number of slaves increases, JMeter skips the thread groups which require…
speedy
  • 1
  • 1