
I started from the premise that the smallest addressable unit of memory on the venerable x86 PC is a byte. However, this algorithm (a cellular automaton) really manipulates single bits. It needs some flexibility and relies on fairly sophisticated mechanisms, but I am not sure whether representing cells with Java's integers slows it down.

I think my question is: would switching to assembly open the door to more robust techniques? Would bitwise operations, for example, help a lot?

The algorithm in question comes from a repository by another ingenious programmer: https://bitbucket.org/BWerness/voxel-automata-terrain/src/master/ThreeState3dBitbucket.pde
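
To be concrete about what I mean by "bitwise operations", here is a minimal sketch of my own (a 1-D, two-state rule, not the three-state voxel automaton from that repository): 64 cells packed into a single `long` and updated with shifts and XOR instead of one integer per cell.

```java
// Minimal sketch (not the voxel automaton from the repository): a 1-D,
// two-state automaton (Rule 90) with 64 cells packed into one long,
// updated entirely with bitwise operations instead of one int per cell.
public class BitPackedRule90 {
    public static void main(String[] args) {
        long row = 1L << 32;              // one live cell near the middle
        for (int step = 0; step < 32; step++) {
            System.out.println(render(row));
            // Rule 90: each new cell is the XOR of its two neighbours.
            // Shifting the whole word left/right applies the rule to all
            // 64 cells at once (edges are treated as dead cells here).
            row = (row << 1) ^ (row >>> 1);
        }
    }

    private static String render(long row) {
        StringBuilder sb = new StringBuilder(64);
        for (int i = 63; i >= 0; i--) {
            sb.append(((row >>> i) & 1L) == 1L ? '#' : '.');
        }
        return sb.toString();
    }
}
```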

  • Assuming that your code works, you might want to try asking on [codereview.se]. And also looking at existing [questions about cellular automata and the Game of Life](https://codereview.stackexchange.com/questions/tagged/cellular-automata+or+game-of-life) there. – Ilmari Karonen Oct 03 '20 at 17:54
  • 2
    (…but do read their [tips for asking good questions](https://codereview.stackexchange.com/help/how-to-ask) first.) – Ilmari Karonen Oct 03 '20 at 18:09
  • Note that dealing with smaller types is not always faster if the code then uses inefficient instructions such as conditional jumps, or many more instructions. Using branchless instructions is a good start (see the sketch after these comments). Then you can leverage parallel programming such as multithreading and SIMD instructions to speed your code up. No need to use assembly: all of this can be done in native languages like C or C++. – Jérôme Richard Oct 03 '20 at 19:52
  • Thank you @IlmariKaronen I will take a look (at all of them). – Daniel Krajnik Oct 04 '20 at 02:35
  • Thank you @JérômeRichard, insights like that are what I was really looking for. I am also wondering what the "limits of optimization" are - are today's compilers doing a good enough job, or can we still (realistically) move to lower levels (to the point of building a custom "Cellular Automata ASIC")? Although this probably should be another question. – Daniel Krajnik Oct 04 '20 at 02:40
  • @DanielKrajnik: Your computer already has a massively parallel processor that is quite well suited for things like cellular automata. It's called the GPU. Fortunately, we nowadays also have pretty decent programming tools for [general purpose computing on the GPU](https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units). – Ilmari Karonen Oct 04 '20 at 12:09
  • @IlmariKaronen true, true, CUDA/OpenCL are great, and with so many resources it would be unreasonable not to assume NVIDIA/AMD's edge in this race. I remember ASICs' success in crypto mining, though - if one decides on a specific algorithm, could the learning curve of switching to a different architecture nonetheless be worthwhile? Even though the best FPGAs on the market still can't compete with RTXs in terms of floating point operations... given something as elementary as operations on only 1s and 0s, could there then be some significant performance improvement possible.... – Daniel Krajnik Oct 04 '20 at 12:44
  • Sure, a dedicated ASIC could probably be faster than a general purpose GPU, if you were willing to spend the time and money to design and build one. Off the top of my head, I'd expect the difference between ASICs and GPUs to be somewhat smaller for cellular automata than for crypto mining, since GPUs are already quite well optimized for texture processing and image filtering, which are rather similar tasks to CA simulation. But dedicated single-task hardware will always beat general-purpose devices, if the only thing that matters is raw performance on that single task. – Ilmari Karonen Oct 04 '20 at 13:09
  • … Anyway, building dedicated hardware to speed up cellular automata is not a new idea; one of the oldest examples is probably Margolus and Toffoli's CAM ("Cellular Automata Machine") series of accelerators from the 1980s. [Here's one review article](https://www.researchgate.net/publication/320077544_Cellular_Automata_Hardware_Implementations-an_Overview) that I found by googling "cellular automata hardware". – Ilmari Karonen Oct 04 '20 at 13:15
  • Very interesting, thanks again. That's a good point (if I understood it correctly): GPUs expose some functions for more granular and efficient control of "bitwise operations"... Need to look into something like that - I really want to speed up this algorithm. – Daniel Krajnik Oct 04 '20 at 16:06
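
As a rough illustration of the "branchless" suggestion in the comments, here is a hedged sketch in plain Java. It uses Conway's Game of Life as a stand-in rule (not the three-state voxel automaton from the linked repository), and the helper names `isZero` and `nextCell` are purely illustrative.

```java
// Sketch of the "branchless" idea from the comments, in plain Java.
// Conway's Game of Life is used as a stand-in rule; helper names are
// illustrative, not taken from any library or from the linked repository.
public class BranchlessLife {

    // 1 if x == 0, else 0, computed without a comparison branch
    // (valid for the small non-negative values used here).
    static int isZero(int x) {
        return 1 - ((x | -x) >>> 31);
    }

    // Next state of one cell from its current state (0 or 1) and its
    // live-neighbour count, written as pure arithmetic/bitwise expressions
    // so the JIT is free to emit conditional moves or vectorize a loop.
    static int nextCell(int cell, int liveNeighbours) {
        int eq2 = isZero(liveNeighbours ^ 2);
        int eq3 = isZero(liveNeighbours ^ 3);
        return eq3 | (cell & eq2);   // alive iff 3 neighbours, or alive with 2
    }

    public static void main(String[] args) {
        // Quick check against the branching formulation of the same rule.
        for (int cell = 0; cell <= 1; cell++) {
            for (int n = 0; n <= 8; n++) {
                int branching = (n == 3 || (cell == 1 && n == 2)) ? 1 : 0;
                System.out.printf("cell=%d neighbours=%d -> %d (expected %d)%n",
                        cell, n, nextCell(cell, n), branching);
            }
        }
    }
}
```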

0 Answers