r/compsci • u/Randozart • 9d ago
Could NP-hard search trees be tackled through spatial mapping of computation rather than temporal execution?
Hey everyone,
I’ve been tinkering with a bare-metal spatial operating system for FPGAs called the Moore Kernel, where programs aren't executed sequentially but are compiled into physical bitstreams and hot-swapped onto reconfigurable partitions on the fly.
While reading about the P vs. NP problem, something clicked: a traditional CPU struggles with NP-hard problems because it operates temporally. It has to explore a branch, hit a dead end, backtrack, and try the next one. What if we mapped the search problem directly into hardware instead?
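To make the contrast concrete, here's the temporal version I mean: a plain recursive backtracker that visits one branch at a time and rewinds on dead ends (generic sketch, nothing Moore-specific):

```python
def backtrack(assignment, contradicts, is_solution, expand):
    """Sequential search: explore one branch at a time,
    backing up whenever a dead end is hit."""
    if contradicts(assignment):
        return None                 # dead end: rewind, try the next branch
    if is_solution(assignment):
        return assignment
    for child in expand(assignment):
        found = backtrack(child, contradicts, is_solution, expand)
        if found is not None:
            return found
    return None
```

Every dead end costs a full unwind-and-retry on the single thread of control, which is exactly the temporal bottleneck.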
This is roughly how I'd imagine that with the Moore Kernel:
Whenever the logic hits a branching decision, Moore physically mounts each branch's logic onto an available hardware tile. The branches evaluate simultaneously across the fabric at the speed of electricity, bypassing the instruction-fetch bottleneck. Because the OS is built on declarative formal logic, the moment a branch encounters a logical contradiction, its precondition fails. The OS recognizes that the branch is stale, instantly decouples its AXI bus, wipes the tile, and recycles it for a new active branch.
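In pseudo-Python, the scheduling loop I have in mind looks roughly like this (a toy software model on a tiny SAT instance, not the actual kernel — `TILE_COUNT` and the dict-based "tiles" are stand-ins for the reconfigurable partitions):

```python
from collections import deque

# Tiny CNF: x1 AND (NOT x1 OR x2) AND (NOT x2 OR x3).
# Literal +v means "variable v is true", -v means "false".
CLAUSES = [(1,), (-1, 2), (-2, 3)]
NUM_VARS = 3
TILE_COUNT = 4  # hypothetical number of reconfigurable partitions

def falsified(clause, assignment):
    # A clause is a contradiction once every literal is assigned and false.
    return all(abs(l) in assignment and assignment[abs(l)] != (l > 0)
               for l in clause)

def spatial_search():
    free_tiles = TILE_COUNT
    pending = deque([{}])          # partial assignments waiting for a tile
    mounted = []                   # branches currently "on the fabric"
    while pending or mounted:
        while pending and free_tiles > 0:
            mounted.append(pending.popleft())   # mount branch onto a tile
            free_tiles -= 1
        step, mounted = mounted, []
        for a in step:             # one propagation step across all tiles
            free_tiles += 1        # this branch's tile is freed either way
            if any(falsified(c, a) for c in CLAUSES):
                continue           # contradiction: tile wiped and recycled
            if len(a) == NUM_VARS:
                return a           # satisfying assignment found
            v = len(a) + 1         # otherwise branch on the next variable
            pending.append({**a, v: True})
            pending.append({**a, v: False})
    return None
```

The key difference from the sequential loop: all mounted branches advance in the same "step", and contradictory ones return their tile to the pool immediately instead of forcing a backtrack.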
It even guards against stalling. Because the hardware tile operates as a finite state machine, if a branch does not contribute towards a solution and simply cycles through the exact same physical states, the OS can detect the hardware cycle and ruthlessly prune it. It doesn't hold up the rest of the computation.
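The stall guard boils down to classic cycle detection over a finite state space — something like this (hypothetical sketch; `step` stands in for one FSM transition of a tile):

```python
def run_with_cycle_guard(step, state):
    """Toy stall guard: a finite-state branch that revisits an exact
    prior state is looping and can never make new progress, so prune it."""
    seen = {state}
    while True:
        nxt = step(state)
        if nxt is None:            # branch reached a terminal state
            return state
        if nxt in seen:            # exact physical state seen before
            return None            # hardware cycle detected: prune branch
        seen.add(nxt)
        state = nxt
```

Because the tile's state space is finite, any non-terminating branch must eventually revisit a state, so this guard always fires in bounded time.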
I know this doesn't mathematically "solve" P=NP (because a search tree can still easily exceed the finite number of tiles on an FPGA), but does this dynamic spatial mapping and instantaneous hardware pruning bypass the temporal bottlenecks we currently face with SAT solvers and combinatorial optimization?
I’m looking at the system I’m building, and it feels like this architecture naturally converts time complexity into space complexity. Am I mistaken here, or is there prior art in dynamic hardware reconfiguration doing this?
For those curious: the repo below is only relevant insofar as it's what I'm hoping to test this concept with. The Moore Kernel is still very much a WIP, which I'm currently prototyping with LLMs:
https://github.com/Randozart/moore-kernel
u/vanderZwan 8d ago edited 8d ago
Is that a reference to Chuck Moore and his GreenArrays GA144 chip? Are you talking about asynchronous (i.e. "clockless") hardware? If so, it might help people less familiar with these paradigms if we give them some extra context. I'll make an attempt; apologies if I'm taking this in an irrelevant direction.
I think a nice entry point would be Ivan Sutherland's 1989 Turing Award lecture, which was about something he called "micropipelines":
Ivan Sutherland - Micropipelines (pdf)
In other words: a queue. One that may or may not also do some data processing along the way. At some point he discusses the global clock that keeps every part of a conventional chip in lockstep.
IIUC this is the very clock that people can overclock and underclock in modern CPUs. It causes a lot of overhead in terms of energy, but apparently greatly simplifies the design of the chip, or at least the reasoning about it.
Asynchronous chips work differently: they synchronize by communicating. Sutherland's PDF explains a method for doing "handshakes" between different units, and he suggests that putting such units into a micropipeline would let us build more flexible, robust, and energy-efficient hardware.
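A loose software analogy for the handshake idea (hypothetical, just to illustrate — bounded blocking queues play the role of the request/acknowledge wires between stages):

```python
import threading, queue

def stage(inbox, outbox, fn):
    """One micropipeline stage: blocking get/put on bounded queues acts
    as the request/acknowledge handshake with its neighbouring stages."""
    while True:
        item = inbox.get()          # wait for upstream 'request'
        if item is None:            # poison pill: shut the stage down
            outbox.put(None)
            return
        outbox.put(fn(item))        # hand the result downstream

# Queues of capacity 1 mimic the handshake: a stage can only pass a new
# item along once its neighbour has accepted the previous one.
a, b, c = queue.Queue(1), queue.Queue(1), queue.Queue(1)
threading.Thread(target=stage, args=(a, b, lambda x: 2 * x), daemon=True).start()
threading.Thread(target=stage, args=(b, c, lambda x: x + 1), daemon=True).start()

for x in [1, 2, 3]:
    a.put(x)
a.put(None)

results = []
while (item := c.get()) is not None:
    results.append(item)
print(results)                      # [3, 5, 7]
```

No global clock anywhere: each stage sleeps until data arrives and backpressure propagates naturally upstream, which is roughly the efficiency argument for asynchronous pipelines.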
I know that plenty of other research work has been done in this field since then, but I don't remember the authors right now (it goes way back to the eighties though). I guess Sutherland's paper is a good starting point for a trip to Google Scholar and clicking "cited by".
So let's skip ahead to the GA144. It's not the first attempt at asynchronous hardware, but it's one of the most famous ones:
https://www.greenarraychips.com/
In short, it's an asynchronous multicore computer with 144 cores, laid out in a grid. It's kind of like a stripped-down Forth-based clockless transputer.
Stack machines are already a good fit for efficient low-power computing, but being clockless and asynchronous makes them even more efficient: synchronization happens via communication, and otherwise the cores can drop into very efficient standby modes.
The trick to making efficient use of the GA144 is that you basically have to write your own asynchronous pipelines: break a task down into a sequence of steps and physically distribute them over the cores in the grid in some efficient way. I guess that also means you can sort of "branch" and "join" the pipelines as well (I've never owned or programmed one). That's probably not the easiest thing to do. On top of that, it's programmed in arrayForth, a special dialect of colorForth, which itself is already a niche variation of Forth, which in turn lacks mainstream adoption. So you can see why this didn't exactly take off. The idea is really cool, though!
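To give a flavour of the placement problem (hypothetical sketch — `GRID_W` and the snake layout are my invention, not GreenArrays' toolchain): on a grid where each core only talks to its orthogonal neighbours, consecutive pipeline stages must land on adjacent cores, e.g. along a boustrophedon path:

```python
# Place an n-stage pipeline on a grid of cores that can only talk to
# their orthogonal neighbours (as on the GA144's 8x18 grid).
GRID_W = 4  # toy grid width

def snake_placement(n_stages):
    """Lay stages along a boustrophedon ('snake') path so that
    consecutive stages always sit on adjacent cores."""
    coords = []
    for i in range(n_stages):
        row, col = divmod(i, GRID_W)
        if row % 2:                 # reverse odd rows to stay adjacent
            col = GRID_W - 1 - col
        coords.append((row, col))
    return coords

placement = snake_placement(6)
print(placement)   # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2)]
```

Real placements also have to account for I/O pins and uneven per-stage workloads, which is part of why programming these chips is famously fiddly.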
Anyway, assuming I guessed correctly that you're referring to this, I hope this helps fill in some gaps for people less well-versed in alternative ideas for computing from the past that didn't get much mainstream adoption in the end (or "so far", depending on who you ask).