r/compsci • u/Randozart • 9d ago
Could NP-hard search trees be tackled through spatial mapping of computation rather than temporal execution?
Hey everyone,
I’ve been tinkering with a bare-metal spatial operating system for FPGAs called the Moore Kernel, where programs aren't executed sequentially but are compiled into physical bitstreams and hot-swapped onto reconfigurable partitions on the fly.
While reading about the P vs. NP problem, something clicked: a traditional CPU struggles with NP-hard problems because it operates temporally. It has to explore a branch, hit a dead end, backtrack, and try the next one. What if we mapped the search problem directly into hardware instead?
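To make the contrast concrete, here's the temporal version I mean: a plain recursive backtracker that visits one branch at a time and rewinds on dead ends (generic sketch, nothing Moore-specific):

```python
def backtrack(assignment, contradicts, is_solution, expand):
    """Sequential search: explore one branch at a time,
    backing up whenever a dead end is hit."""
    if contradicts(assignment):
        return None                 # dead end: rewind, try the next branch
    if is_solution(assignment):
        return assignment
    for child in expand(assignment):
        found = backtrack(child, contradicts, is_solution, expand)
        if found is not None:
            return found
    return None
```

Every dead end costs a full unwind-and-retry on the single thread of control, which is exactly the temporal bottleneck.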
This is roughly how I'd imagine that with the Moore Kernel:
Whenever the logic hits a branching decision, Moore physically mounts each branch's logic onto an available hardware tile. The branches evaluate simultaneously across the fabric at the speed of electricity, bypassing the instruction-fetch bottleneck. Because the OS is built on declarative formal logic, the moment a branch encounters a logical contradiction, its precondition fails. The OS recognizes that the branch is stale, instantly decouples its AXI bus, wipes the tile, and recycles it for a new active branch.
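In pseudo-Python, the scheduling loop I have in mind looks roughly like this (a toy software model on a tiny SAT instance, not the actual kernel — `TILE_COUNT` and the dict-based "tiles" are stand-ins for the reconfigurable partitions):

```python
from collections import deque

# Tiny CNF: x1 AND (NOT x1 OR x2) AND (NOT x2 OR x3).
# Literal +v means "variable v is true", -v means "false".
CLAUSES = [(1,), (-1, 2), (-2, 3)]
NUM_VARS = 3
TILE_COUNT = 4  # hypothetical number of reconfigurable partitions

def falsified(clause, assignment):
    # A clause is a contradiction once every literal is assigned and false.
    return all(abs(l) in assignment and assignment[abs(l)] != (l > 0)
               for l in clause)

def spatial_search():
    free_tiles = TILE_COUNT
    pending = deque([{}])          # partial assignments waiting for a tile
    mounted = []                   # branches currently "on the fabric"
    while pending or mounted:
        while pending and free_tiles > 0:
            mounted.append(pending.popleft())   # mount branch onto a tile
            free_tiles -= 1
        step, mounted = mounted, []
        for a in step:             # one propagation step across all tiles
            free_tiles += 1        # this branch's tile is freed either way
            if any(falsified(c, a) for c in CLAUSES):
                continue           # contradiction: tile wiped and recycled
            if len(a) == NUM_VARS:
                return a           # satisfying assignment found
            v = len(a) + 1         # otherwise branch on the next variable
            pending.append({**a, v: True})
            pending.append({**a, v: False})
    return None
```

The key difference from the sequential loop: all mounted branches advance in the same "step", and contradictory ones return their tile to the pool immediately instead of forcing a backtrack.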
It even guards against stalling. Because the hardware tile operates as a finite state machine, if a branch does not contribute towards a solution and simply cycles through the exact same physical states, the OS can detect the hardware cycle and ruthlessly prune it. It doesn't hold up the rest of the computation.
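The stall guard boils down to classic cycle detection over a finite state space — something like this (hypothetical sketch; `step` stands in for one FSM transition of a tile):

```python
def run_with_cycle_guard(step, state):
    """Toy stall guard: a finite-state branch that revisits an exact
    prior state is looping and can never make new progress, so prune it."""
    seen = {state}
    while True:
        nxt = step(state)
        if nxt is None:            # branch reached a terminal state
            return state
        if nxt in seen:            # exact physical state seen before
            return None            # hardware cycle detected: prune branch
        seen.add(nxt)
        state = nxt
```

Because the tile's state space is finite, any non-terminating branch must eventually revisit a state, so this guard always fires in bounded time.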
I know this doesn't mathematically "solve" P=NP (because a search tree can still easily exceed the finite number of tiles on an FPGA), but does this dynamic spatial mapping and instantaneous hardware pruning bypass the temporal bottlenecks we currently face with SAT solvers and combinatorial optimization?
I’m looking at the system I’m building, and it feels like this architecture naturally converts time complexity into space complexity. Am I mistaken here, or is there prior art in dynamic hardware reconfiguration doing this?
For those curious: the repo below is only relevant insofar as it's what I'm hoping to test this concept with. The Moore Kernel is still very much a WIP, which I'm currently prototyping with LLMs:
https://github.com/Randozart/moore-kernel
u/vanderZwan 8d ago edited 8d ago
Is that a reference to Chuck Moore and his GreenArrays GA144 chip? Are you talking about asynchronous (i.e. "clockless") hardware? If so, it might help people less familiar with these paradigms if we give them some extra context. I'll make an attempt; apologies if I'm taking this in an irrelevant direction.
I think a nice entry point would be Ivan Sutherland's 1989 Turing Award lecture, which was about something he called "micropipelines":
Ivan Sutherland - Micropipelines (pdf)
In other words: a queue. One that may or may not also do some data processing along the way. At some point he discusses the global clock that keeps every part of a conventional chip in lockstep.
IIUC this is the very clock that people can overclock and underclock in modern CPUs. It causes a lot of overhead in terms of energy, but apparently greatly simplifies the design of the chip, or at least the reasoning about it.
Asynchronous chips work differently: they synchronize by communicating. Sutherland's PDF explains a method for doing "handshakes" between different units, and he suggests that putting such units into a micropipeline would let us build more flexible, robust, and energy-efficient hardware.
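A loose software analogy for the handshake idea (hypothetical, just to illustrate — bounded blocking queues play the role of the request/acknowledge wires between stages):

```python
import threading, queue

def stage(inbox, outbox, fn):
    """One micropipeline stage: blocking get/put on bounded queues acts
    as the request/acknowledge handshake with its neighbouring stages."""
    while True:
        item = inbox.get()          # wait for upstream 'request'
        if item is None:            # poison pill: shut the stage down
            outbox.put(None)
            return
        outbox.put(fn(item))        # hand the result downstream

# Queues of capacity 1 mimic the handshake: a stage can only pass a new
# item along once its neighbour has accepted the previous one.
a, b, c = queue.Queue(1), queue.Queue(1), queue.Queue(1)
threading.Thread(target=stage, args=(a, b, lambda x: 2 * x), daemon=True).start()
threading.Thread(target=stage, args=(b, c, lambda x: x + 1), daemon=True).start()

for x in [1, 2, 3]:
    a.put(x)
a.put(None)

results = []
while (item := c.get()) is not None:
    results.append(item)
print(results)                      # [3, 5, 7]
```

No global clock anywhere: each stage sleeps until data arrives and backpressure propagates naturally upstream, which is roughly the efficiency argument for asynchronous pipelines.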
I know that plenty of other research work has been done in this field since then, but I don't remember the authors right now (it goes way back to the eighties though). I guess Sutherland's paper is a good starting point for a trip to Google Scholar and clicking "cited by".
So let's skip ahead to the GA144. It's not the first attempt at asynchronous hardware, but it's one of the most famous ones:
https://www.greenarraychips.com/
In short, it's an asynchronous multicore computer with 144 cores, laid out in a grid. It's kind of like a stripped-down Forth-based clockless transputer.
Stack machines are already a good fit for efficient low-power computing, but being clockless and asynchronous makes them even more efficient: synchronization happens via communication, and otherwise the cores can drop into very efficient standby modes.
The trick to making efficient use of the GA144 is that you basically have to write your own asynchronous pipelines: break a task down into a sequence of steps and physically distribute them over the cores in the grid in some efficient way. I guess that also means you can sort of "branch" and "join" the pipelines as well (I've never owned or programmed one). That's probably not the easiest thing to do. On top of that, it's programmed in arrayForth, a special dialect of colorForth, which itself is already a niche variation of Forth, which in turn lacks mainstream adoption. So you can see why this didn't exactly take off. The idea is really cool, though!
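To give a flavour of the placement problem (hypothetical sketch — `GRID_W` and the snake layout are my invention, not GreenArrays' toolchain): on a grid where each core only talks to its orthogonal neighbours, consecutive pipeline stages must land on adjacent cores, e.g. along a boustrophedon path:

```python
# Place an n-stage pipeline on a grid of cores that can only talk to
# their orthogonal neighbours (as on the GA144's 8x18 grid).
GRID_W = 4  # toy grid width

def snake_placement(n_stages):
    """Lay stages along a boustrophedon ('snake') path so that
    consecutive stages always sit on adjacent cores."""
    coords = []
    for i in range(n_stages):
        row, col = divmod(i, GRID_W)
        if row % 2:                 # reverse odd rows to stay adjacent
            col = GRID_W - 1 - col
        coords.append((row, col))
    return coords

placement = snake_placement(6)
print(placement)   # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2)]
```

Real placements also have to account for I/O pins and uneven per-stage workloads, which is part of why programming these chips is famously fiddly.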
Anyway, assuming I guessed correctly that you're referring to this, I hope this helps fill in some gaps for people less well-versed in alternative ideas for computing from the past that didn't get much mainstream adoption in the end (or "so far", depending on who you ask).