How can I create a system for distributed calculations?

Question

I am a student at the Faculty of Cybernetics and I want to write a project in Java. I want to create a system for distributed computing.

It will contain the following components:
1. The user's main program (different for each concrete situation)
2. The user's task program, which solves only one small task (also different for each case)
3. My program that interacts with the user's main program to find out which tasks need to be solved
4. My program that interacts with the user's task program to pass it input data and collect output data
5. Apache Tomcat with my servlets and a database, which together will make it possible to:

register the main program and calculation nodes in the system
save tasks from the main program in the DB, along with the task results sent back by the nodes
view statistics (how many tasks have been solved, how many nodes are in the system, and so on)

Please give me some ideas about designing this system. I would also like to know how my Java program can interact with the user's program on the local machine (I mean exchanging data).

P.S. Thank you, and sorry for my English. Please remember that I want to write my own system, so I can't use existing solutions.

Existing applications will give you ideas about such systems, and the discussions around them will tell you what works. So start by learning from others: the people who really know this have written or used existing systems. — mmmmmm, Jan 06 '11 at 18:39
I can use existing applications for ideas, but I want to create my own from scratch :) — Timmy, Jan 06 '11 at 18:46

score 2 · Answer 1 · answered Jan 06 '11 at 18:43

Read up on Linda and JavaSpaces. Then read up on Apache Hadoop and MapReduce. That should give you some ideas for the ways these things can be tackled.
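To make the Linda/JavaSpaces idea a little more concrete, here is a minimal in-process sketch of the pattern: a shared "space" that a main program writes tasks into and that workers take tasks from, writing results back. It does not use the JavaSpaces API; `TaskSpaceSketch`, `Task`, `Result` and `sumOfSquares` are hypothetical names made up for illustration, the "space" is just two `BlockingQueue`s, and the workers are threads standing in for remote nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TaskSpaceSketch {
    // Hypothetical entry types; a real tuple space matches entries against templates.
    static final class Task {
        final int id;
        final int input;
        Task(int id, int input) { this.id = id; this.input = input; }
    }

    static final class Result {
        final int taskId;
        final int output;
        Result(int taskId, int output) { this.taskId = taskId; this.output = output; }
    }

    // Special entry that tells a worker to stop (an assumption of this sketch).
    private static final Task STOP = new Task(-1, 0);

    private final BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
    private final BlockingQueue<Result> results = new LinkedBlockingQueue<>();

    // Worker loop: take a task, solve it (here: square the input), write the result.
    private void workerLoop() {
        try {
            while (true) {
                Task task = tasks.take();
                if (task == STOP) {
                    return;
                }
                results.put(new Result(task.id, task.input * task.input));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // "Main program" side: write tasks, collect one result per task, stop the workers.
    static int sumOfSquares(int taskCount, int workerCount) {
        TaskSpaceSketch space = new TaskSpaceSketch();
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            Thread t = new Thread(space::workerLoop);
            t.start();
            workers.add(t);
        }
        try {
            for (int i = 1; i <= taskCount; i++) {
                space.tasks.put(new Task(i, i));
            }
            int sum = 0;
            for (int i = 0; i < taskCount; i++) {
                sum += space.results.take().output;
            }
            for (int w = 0; w < workerCount; w++) {
                space.tasks.put(STOP);
            }
            for (Thread t : workers) {
                t.join();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10, 3)); // prints 385
    }
}
```

The point of the pattern is that workers pull work when they are free, so the main program never needs to know how many nodes exist or how fast each one is. In a real system the two queues would live behind a network service rather than in one JVM.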

regularfry

+1 JavaSpace is great for passing around messages for job requests and responses. As long as you're not sending large amounts of data back and forth, it works well. – Erick Robertson Jan 06 '11 at 18:47
Thank you. I'm planning to work with big data (for example, processing large images), but the system should also be able to solve tasks such as finding a Hamiltonian path in a graph, brute-forcing MD5 hashes, and so on (so it is not always big data). – Timmy Jan 06 '11 at 19:52

score 0 · Answer 2 · answered Jan 06 '11 at 18:35

Have a look at the Java Remote Method Invocation Tutorial to understand the nuts and bolts of distributed programming.

http://download.oracle.com/javase/tutorial/rmi/index.html

Will

I would not recommend RMI to start. It has its own specific use, but it tends to be complicated to understand and a bear to get up and running. Everything has to be set up just so, and it's not suited for distributed processing without additional libraries or software. – Erick Robertson Jan 06 '11 at 18:57
Totally agree :) Just watched the Hadoop lectures here: http://vimeo.com/3584610 – Will Jan 07 '11 at 00:21

score 0 · Answer 3 · answered Jan 06 '11 at 18:38

For learning concepts, I'd recommend studying how Hadoop works. You'll learn a ton!

David Weiser

score 0 · Answer 4 · answered Jan 06 '11 at 18:41

The speed of your networked system will depend primarily on how autonomous each node is (i.e., how much it relies on receiving new data) and on how evenly the processes are distributed. It's my belief that your solution will resemble the multiprocessing model by necessity.

motoku

As I see it now, each node will be very autonomous and will communicate only with the central node, to get a new task and to send back the results of solved tasks. As for process distribution, I want to allow computers from the local network as well as from the internet to connect to my system (so if you need to solve a task in real time, you use the local network). – Timmy Jan 06 '11 at 19:43
The process distribution could be done through sockets. Broadcast packets would alleviate the cost of having too many streams open at once, but you'd need a custom handshake confirming every packet, either in bulk or one at a time. Alternatively, your server or process manager could accept only a certain number of stream connections at once and rely on the nodes to retry their connections. As for distributing the processes, I would send each node its work in a still tangible form, i.e. as you would pass the arguments to an object extending Thread just before calling run(). – motoku Jan 07 '11 at 05:33
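One way to read "work in a still tangible form" is a small serializable object whose constructor arguments are the task's inputs. The sketch below assumes that reading and uses standard Java serialization. `SerializableTaskSketch`, `RangeSumTask`, `toBytes`, `runOnNode` and `roundTrip` are hypothetical names, and a byte array stands in for the socket stream so the example runs without a network. Note that deserializing objects from untrusted peers is a known security risk, so nodes connecting from the internet would need a safer wire format or strict class filtering.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;

public class SerializableTaskSketch {
    // Hypothetical unit of work: the constructor arguments are the task's inputs.
    static final class RangeSumTask implements Callable<Long>, Serializable {
        private static final long serialVersionUID = 1L;
        private final long from;
        private final long to;

        RangeSumTask(long from, long to) {
            this.from = from;
            this.to = to;
        }

        // Sums the integers from..to inclusive; stands in for a real computation.
        @Override
        public Long call() {
            long sum = 0;
            for (long i = from; i <= to; i++) {
                sum += i;
            }
            return sum;
        }
    }

    // Coordinator side: turn the task into bytes that could be written to a socket.
    static byte[] toBytes(RangeSumTask task) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(task);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Node side: rebuild the task from the received bytes and run it.
    static long runOnNode(byte[] received) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(received))) {
            RangeSumTask task = (RangeSumTask) in.readObject();
            return task.call();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("node does not have the task class", e);
        }
    }

    // Convenience wrapper for the demo: serialize, "send", deserialize, execute.
    static long roundTrip(long from, long to) {
        return runOnNode(toBytes(new RangeSumTask(from, to)));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1, 100)); // prints 5050
    }
}
```

The `ClassNotFoundException` branch points at a real design question for this kind of system: the node must already have the task class on its classpath, or the system needs some way to ship the user's task program to the node first.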
