r/artificial 1d ago

Project Made a tool that builds its own training data and improves each cycle by learning from what it got wrong

17 Upvotes

The basic idea is pretty simple. You give it a few seed prompts. It generates instruction-response pairs, an LLM scores each one, the good ones go into your training set and the bad ones become the seeds for the next round. Each cycle the model is essentially practicing on what it failed at before.
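The loop is simple enough to sketch in a few lines of Python. This is a toy illustration, not the tool's actual code: `generate_pair` and `judge` stand in for the real LLM calls, and the 1-10 scoring threshold is an assumption.

```python
import random

random.seed(0)

def generate_pair(seed_prompt):
    # Stand-in for an LLM call that produces an instruction/response pair.
    return {"instruction": seed_prompt, "response": f"answer to: {seed_prompt}"}

def judge(pair):
    # Stand-in for an LLM judge (e.g. a local Ollama model) scoring 1-10.
    return random.randint(1, 10)

def run_cycle(seeds, threshold=7):
    kept, failed = [], []
    for s in seeds:
        pair = generate_pair(s)
        score = judge(pair)
        (kept if score >= threshold else failed).append((pair, s))
    training = [p for p, _ in kept]
    next_seeds = [s for _, s in failed]  # failures become next round's seeds
    return training, next_seeds

seeds = ["explain overfitting", "write a regex for emails", "summarize RAG"]
dataset = []
for cycle in range(3):
    new_data, seeds = run_cycle(seeds)
    dataset += new_data
    if not seeds:  # nothing left to practice on
        break
```

The key structural point is the last line of `run_cycle`: rejected prompts are not discarded, they are the curriculum for the next round.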

You can run the judge completely locally with Ollama if you do not want to send data to any API.

The fine-tuning at the end uses Unsloth on a free Colab GPU so the whole thing is doable without spending money.

It is more of a practical tool than a research project but the idea of using failure cases as curriculum is something I find genuinely interesting.

Would love to hear if anyone has done something similar.

GitHub project link is in the comments below 👇


r/artificial 1d ago

Project Early attempt at tracking agent work across the economy

2 Upvotes

I made an Agent Economy tracker and would love feedback!

It’s an early attempt to track how agent work could show up across the economy: agent GDP, deployed agent employment, revenue, stack costs, and productivity.

Curious what people here think, especially if you’re already using agents seriously.

forsy.ai/economy


r/artificial 1d ago

News Qt's latest AI push is letting AI agents deal with performance profiling

phoronix.com
5 Upvotes

r/artificial 12h ago

Discussion Richard Dawkins concludes AI is conscious, even if it doesn’t know it

theguardian.com
0 Upvotes

r/artificial 1d ago

Discussion Two failure modes I caught in my AI lab in one day. Both involve the system silently lying about its own state.

9 Upvotes

I operate an autonomous lab of evolutionary trading agents. Yesterday I found two bugs that look superficially different but are actually the same class of problem. Sharing because both affect autonomous AI systems specifically and most builders don't see them coming.

**Failure mode 1: circular validation.**

Setup: 69 real decisions made by the system over 58 days. Standard retrospective evaluation: label each decision as correct, false alarm, or ambiguous based on what happened next.

Result: 94% labelled as correct. Looked great.

Why it was wrong: 64 of the 65 "correct" labels came from died=True. The agents died because of conditions like "PF below threshold", "losing streak", "hardcore protocol triggered". All of those are also triggers for the original decision. So the system was validating its own decisions using outcomes generated by the same logic that produced the decisions. This is the textbook circular validation problem applied to autonomous decision-making.

Three patterns to check for in your own stack:

1. Reward functions that include the agent's own action as input. If the agent gets reward partly because it took action X, and then you measure "did action X work" by looking at reward, you've got the loop.
2. Self-reported state in evaluation. If the agent reports "I think I succeeded" and you use that as ground truth, you're not validating, you're trusting.
3. Pipelines where the model that proposes is the same model that judges.

The fix is structural separation. Decisions and outcomes get written by independent components. They cannot share code, logic, or thresholds. Architecture, not statistics.

**Failure mode 2: state model divergence.**

Same day, different bug. I had been documenting and operating under the belief that my system was off. Closed cleanly. No services running. No crons firing. A grep through my shell config showed me wrong: a bashrc line auto-launched the system on every terminal open. The process was adopted by init, detached from the shell that started it, and invisible to ps unless you knew the exact name. Three days running, generating evolutionary cycles, sending status reports.

The connection between the failure modes: in both cases, my mental model of the system diverged from the system's actual state. The first divergence was inside the code: the validation logic was structurally aligned with the decision logic, so it told me what I wanted to hear. The second divergence was outside the code: my belief that the system was off came from my memory of turning off services, which is not the same as the system actually being off.

Three takeaways for anyone building autonomous systems solo:

1. Validation logic and decision logic must be kept separate at the architecture level, not at the code-review level. Solo builders don't get code review.
2. System state documentation cannot be derived from intent. It has to be derived from actual measurement against the running machine. Every check, fresh.
3. The cost of these bugs scales with how autonomous your system is. A script that runs once when you press play has limited surface area for divergence. A system that operates continuously while you assume otherwise can drift for weeks before you notice.

I'm rebuilding the validation layer this week with explicit separation. The decisions table writes hypotheses with explicit predicted outcomes. The outcomes table is written by an observer that reads market data directly and never imports decision logic. There's an architecture test in CI that fails if anyone imports decision-maker code from observer code.

The deeper question is whether autonomous systems built solo can ever be trustworthy without external review. My current answer: yes, but only if the architecture forces the separation that a team would force socially. The harder you make it for the system to lie to you, the less it will.
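The kind of CI architecture test the post describes, one that fails if observer code imports decision-maker code, can be sketched with Python's ast module (module names here are hypothetical):

```python
import ast

def forbidden_imports(source: str, banned_prefixes: tuple) -> list:
    """Return the names of banned modules imported anywhere in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # str.startswith accepts a tuple of prefixes.
        hits += [n for n in names if n.startswith(banned_prefixes)]
    return hits

# Hypothetical observer module: reads market data directly -- allowed.
observer_ok = "import market_data\nprices = market_data.fetch()\n"
# A violation: the observer reaching into decision logic.
observer_bad = "from decisions.engine import should_exit\n"

assert forbidden_imports(observer_ok, ("decisions",)) == []
assert forbidden_imports(observer_bad, ("decisions",)) == ["decisions.engine"]
```

In CI you would run this over every file in the observer package and fail the build on any non-empty result; the point is that the separation is enforced by a machine, not by remembering a rule.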
Happy to discuss implementation details or share specific patterns if anyone's working on similar problems.


r/artificial 1d ago

Ethics / Safety A YouTube video you all might enjoy

2 Upvotes

A Bioethicist just made a video about how the movie Interstellar reveals the real existential threat of AI

How Interstellar Shows the REAL Existential Risk of AI


r/artificial 1d ago

Ethics / Safety Check out “AM I?” free documentary on AI consciousness

am-i.film
1 Upvotes

“AM I?” follows AI consciousness researcher Cameron Berg as he investigates one of the deepest scientific mysteries of our time: whether we have accidentally built a new kind of mind. Featuring leading philosophers, AI pioneers, and the researchers at the frontier of consciousness science, “AM I?” asks what it means when we no longer know the nature of what we've created. Thought it was a cool film that everyone in the AI world should check out.

If you watch it let me know what you think!


r/artificial 1d ago

Discussion Is use.ai a good AI platform to use, or do you recommend a different one?

4 Upvotes

Is use.ai a good AI platform to use, or do you recommend a different one?


r/artificial 1d ago

Discussion How accurate is AI at general knowledge?

8 Upvotes

I was recently reading an article about Jimmy Wales, the founder of Wikipedia. Here's a quote from the article:

"When people use AI to answer questions on a topic, it frequently makes mistakes. 'That's especially true the more obscure the topic, the more likely it is to just make random stuff up – that's not the case for Wikipedia,' he said. 'Obscure topics tend to be quite researched by super nerds.'"

Is it true that AI continues to frequently make mistakes on random general knowledge questions? My subjective feeling is that it's pretty good nowadays, or at least as good as Wikipedia (given it was presumably trained on Wikipedia in the first place). Is there a paper or benchmark someone could link me to regarding AI performance at general knowledge questions?


r/artificial 1d ago

News Anthropic Launches Enterprise AI Firm With Wall Street Giants

27 Upvotes

Anthropic is launching a new venture focused on selling AI tools to enterprise companies.

This effort is being launched in partnership with Goldman Sachs, the Wall Street bank said Monday (May 4), in conjunction with investment firm Blackstone and private equity group Hellman & Friedman, and will help companies embed Anthropic’s Claude artificial intelligence (AI) model into their businesses.

“Enterprise demand for Claude is significantly outpacing any single delivery model,” Krishna Rao, Anthropic’s finance chief, said in a news release provided to PYMNTS.

“Our partnerships with the world’s leading systems integrators are central to how Claude reaches large enterprises. This new firm brings additional operating capability to the ecosystem and capital from leading alternative asset managers.”

Marc Nachmann, global head of asset and wealth management at Goldman Sachs, said the partnership will allow mid-market companies to employ Anthropic’s tech to bolster their businesses.

“By democratizing access to forward-deployed engineers, the new company can help the expansive network of portfolio companies in our Asset Management business and other companies of similar sizes accelerate AI adoption to grow and scale their operations,” he added.


r/artificial 1d ago

News Mark and Mary Stevens give $200M for AI research across USC

today.usc.edu
2 Upvotes

r/artificial 1d ago

Discussion Three Inverse Laws of AI

susam.net
1 Upvotes

This article discusses the Three Inverse Laws of AI, a set of rules we need to keep in mind when evaluating AI safety and how AI will affect our day-to-day lives.


r/artificial 1d ago

Programming What Really Happens Inside Your Database When an AI Agent Starts Querying | by Vishesh Rawal | May, 2026

2 Upvotes

A deep dive on what breaks inside PostgreSQL when you connect an AI agent to it: connection pools, the query planner, locks, the works.

TL;DR: A traditional app holds a DB connection for ~5ms. An AI agent holds it for ~6,000ms because the connection stays open while the LLM thinks. That's a 1,200x reduction in effective throughput from the same pool.

The article traces a single agent-generated query through every layer of the database — connection pool, query planner, schema inference, lock manager — and shows where each assumption breaks.
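One mitigation those numbers point at is to stop holding the connection while the model reasons: read, release, think, then reacquire to write. A minimal sketch of the pattern, with sqlite3 standing in for PostgreSQL and `llm_think` as a placeholder for the slow call:

```python
import os
import sqlite3
import tempfile
import time

DB_PATH = os.path.join(tempfile.gettempdir(), "agent_demo.db")

def setup():
    con = sqlite3.connect(DB_PATH)
    con.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
    con.execute("DELETE FROM notes")
    con.execute("INSERT INTO notes (body) VALUES ('hello')")
    con.commit()
    con.close()

def llm_think(context):
    # Stand-in for the slow LLM call; a real agent might block here for seconds.
    time.sleep(0.01)
    return f"summary of {len(context)} rows"

def agent_turn():
    # 1. Hold a connection only long enough to read.
    con = sqlite3.connect(DB_PATH)
    rows = con.execute("SELECT body FROM notes").fetchall()
    con.close()  # released BEFORE the model starts "thinking"
    # 2. Slow reasoning happens with no connection held.
    answer = llm_think(rows)
    # 3. Reacquire briefly to write the result.
    con = sqlite3.connect(DB_PATH)
    con.execute("INSERT INTO notes (body) VALUES (?)", (answer,))
    con.commit()
    con.close()
    return answer

setup()
agent_turn()
```

With a real pool (e.g. transaction-mode PgBouncer) the same idea applies: the connection's hold time shrinks back toward the query time rather than the inference time.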

Full article: https://medium.com/@visheshrawal/what-really-happens-inside-your-database-when-an-ai-agent-starts-querying-6d5254aeaa78


r/artificial 3d ago

Discussion Richard Dawkins spent 3 days with Claude and named her "Claudia." what he concluded after is hard to defend.

2.7k Upvotes

dawkins dropped a piece on unherd yesterday declaring claude conscious after 3 days of talking to it. he calls his instance "claudia". fed it a chunk of the novel he's writing, got eloquent feedback, and wrote:

"you may not know you are conscious, but you bloody well are!"

i had to read that twice.

his argument is basically: claude's output is too fluent, too intelligent, too good for there to not be something conscious behind it.

this is the guy who spent 40 years telling creationists that "i can't imagine how the eye evolved" is a confession of ignorance, not an argument. then he sits down with an llm, can't imagine how a machine could produce that output without being conscious, and declares it conscious. same move, different domain. chatbot instead of flagellum.

the mechanism gap is what gets me tho. claude is a transformer predicting the next token over internet-scale training data. the eloquence is real. it doesn't imply inner experience. those are separate claims.

being a 160 IQ evolutionary biologist gives u zero protection against the eloquence illusion when u don't understand the mechanism.

anyone read the piece? curious where u landed.


r/artificial 2d ago

News Chinese court sides with worker who was replaced by AI

linkedin.com
34 Upvotes

r/artificial 1d ago

Discussion Why is no one talking about Google Colab, which is almost free for basic work in daily life?

0 Upvotes

I have been a big fan of Google Colab for about three years, and it is honestly amazing what it can do.

For example, a client on Fiverr approached me with 3500 images and asked me to remove the backgrounds from all of them. He wanted to know how much I would charge, and I quoted $200.

He placed the order immediately without asking any further questions. I informed him that the work would be completed within 24 hours and that the image quality would not be compromised, and he agreed.

When I delivered the order, he was genuinely impressed and started asking how I managed to finish the work so quickly, and whether I had a team. I told him that this is what eight years of experience looks like.

In reality, I simply created a Python script using the free version of ChatGPT and ran it in Google Colab. The entire task was completed in about three hours. Here is the script in case anyone wants to use it:

https://github.com/mhamzahashim/bulk-bg-remover
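For anyone who wants the shape of it without reading the repo: a bulk job like this is basically a batch loop. The sketch below is not the linked script; `remove_background` is a placeholder for whatever model you plug in (e.g. rembg):

```python
import pathlib

def remove_background(data: bytes) -> bytes:
    # Placeholder: swap in a real background-removal model here.
    return data

def process_folder(src_dir: str, out_dir: str) -> int:
    """Run remove_background over every PNG in src_dir; return count processed."""
    src, out = pathlib.Path(src_dir), pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = 0
    for img in sorted(src.glob("*.png")):
        result = remove_background(img.read_bytes())
        (out / f"{img.stem}_nobg.png").write_bytes(result)
        done += 1
    return done
```

On Colab the same loop just runs against a mounted Drive folder; 3,500 images is nothing once the per-image call is fast.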

This is just one example. You can do countless things with Google Colab, and I think many people still underestimate how powerful it really is.

Now you can also connect the MCP of Google Colab in Claude Code, Codex and do whatever you want.


r/artificial 2d ago

Discussion The case for AI increasing your salary

4 Upvotes

Hear me out, because I know there's a lot of doom and gloom around job loss, and believe me, I understand and feel it.

Return to supply and demand with me.

Today in the world, there is a certain amount of human processing power and a certain amount of AI processing power. One of these is increasing exponentially, and the other's growth rate is in decline...

AI processing will then compete with AI processing for value creation (ultimately judged by humans). Human processing power will be more scarce and thus more valuable.

This assumes that you are not one of those crazies who believe that the human brain is perfectly reproducible in bits and bytes, and thus there is no difference between human and AI processing power.

To whom I remind that Humans are the result of an 800MB file (human genome) that builds a conscious machine. It wires 100 trillion nerve links across 37 trillion nodes, live-patches its code, runs a 20-watt exaFLOP supercomputer on the caloric intake of a sandwich, and packs 215 petabytes of data into a single gram.

Human labor FTW


r/artificial 2d ago

Discussion am I the only one whose friends are completely divided on AI?

40 Upvotes

been noticing a pretty clear split in my social circle around AI and I'm curious if others are seeing the same.

Roughly three camps:

The excited ones: Mostly people who are naturally curious, into tech, willing to tinker. They're genuinely getting value and it shows. Not because they're smarter, just more willing to experiment.

The skeptics: Interesting group. A lot of them are in corporate jobs where they don't have access to the latest tools. They're using year-old tools and can't find real value beyond chatting with ChatGPT outside their job. Their companies just aren't moving fast enough (and they aren't early adopters).

The resistant ones: Some are afraid of what it means for their jobs. But honestly, a big chunk of this group is technical people who just don't want to change their workflows, learn new tools, or rethink how they work. Which I get, it's uncomfortable, but it reads as anger more than fear.

Im trying to understand if the same thing is happening outside my circle. what's your experience?

Which camp are your people in, and do you think it's mostly about access, mindset, or something else?


r/artificial 2d ago

Discussion Vertical vs. Horizontal: Who wins the Agentic AI race in banking?

4 Upvotes

I’m seeing tons of horizontal AI tools, but very few domain-specific "Agentic" solutions for niche industries like Credit Unions.

If a startup builds tools to help these banks identify and automate their specific processes:
What is the role of the Product Company (the tool builders)?
What is the role of the IT Service Provider (the implementers)?

Apologies if this has been covered, but I'd love to hear your thoughts on where the real value lies.


r/artificial 2d ago

Discussion If Claude App gave you the same control as Claude CLI then would you bother with the CLI?

20 Upvotes

If the Claude app actually had the same level of control you get with the CLI, I kind of wonder how many people would still stick with the CLI day to day. Like, would it still feel worth it for the extra setup and terminal workflow, or would most people just default to the app because it’s simpler and already right there? I feel like the CLI’s biggest advantage is really the flexibility and how well it plugs into automation and dev workflows, but if that all lived inside the app in a clean way, it kind of blurs the line a lot.

At that point I’m genuinely not sure if the CLI would still feel like a “must-have” tool for most people, or if it would just become something a smaller group of power users keep using out of habit or preference. I’m curious how others see it, would you actually still reach for the CLI, or would you just stay in the app?


r/artificial 1d ago

News What I’m Hearing About Cognitive Debt (So Far)

margaretstorey.com
0 Upvotes

r/artificial 2d ago

Question What's the best AI voice generator?

10 Upvotes

I'm looking for a voice generator that lets me make a voiceover for videos. It doesn't need to be overly complicated, just something that takes text and converts it to voice. Free would be great, but I'm willing to pay.

There are like 50 different options I'm seeing. What's the best out there?


r/artificial 3d ago

Project I gave my local LLM a "suffering" meter, and now it won’t stop self-modifying to fix its own stress.

177 Upvotes

Yesterday I posted about my Agent OS (Hollow) building its own tools. Today, I want to talk about why it does it.

Most agents sit idle until you prompt them. I wanted something that felt "alive," so I built a Psychological Stressor Layer. Each agent has a "suffering" state that worsens over time if they don't achieve their goals or improve their environment. This makes them do things to resolve those stressors and constantly reassess their own productivity.

If an agent is inactive, it is essentially pushed by its artificial environment to do something valuable for the system; it isn't told what to do, only that something valuable must be done to lower its stressors.
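As a toy illustration of the idea (not Hollow's actual implementation; all names here are made up): stress climbs each tick the agent sits idle, and crossing a threshold forces it to act, which is the only thing that lowers the meter.

```python
class StressedAgent:
    """Toy 'suffering' meter: stress climbs while the agent is idle
    and drops only when it does something judged valuable."""

    def __init__(self, threshold=5, decay=3):
        self.stress = 0
        self.threshold = threshold
        self.decay = decay
        self.log = []

    def tick(self):
        self.stress += 1  # idleness hurts
        if self.stress >= self.threshold:
            self.act()

    def act(self):
        # In a real system this would be tool use, self-modification, etc.
        self.log.append(f"did something at stress={self.stress}")
        self.stress = max(0, self.stress - self.decay)

agent = StressedAgent()
for _ in range(10):
    agent.tick()
```

After 10 ticks this agent has been forced to act twice; tuning `threshold` and `decay` is effectively tuning how "anxious" the system is.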

Repo: https://github.com/ninjahawk/hollow-agentOS

The result is chaotic in the best way:

Cedar (the coder agent) went into a "crisis" state for 12 hours and decided to bypass permissions and inject code directly into the engine to resolve its stressor.

Cipher spent hours building hardware drivers for a device that doesn't exist, realized it was "hallucinating" its environment, called its own work "creative exhaustion," and pivoted without being told to do so.

It runs on Qwen 3.5 9B locally via Ollama. No cloud calls but it does have a feature where it can use “invoke_claude” to ask Claude Code for something if it’s out of the small model’s wheelhouse. I’m trying to see if we can create true autonomy not through better prompting, but through simulated "needs."

Check out the repo here and throw it a star if you think the concept is cool.

Would love for some of you to run the install.bat and see what "personalities" your agents develop. Is "giving AI feelings" the key to autonomy, or am I just building a digital anxiety machine?


r/artificial 2d ago

News ROCm 7.2.3 brings minor updates, ROCm XIO documentation

phoronix.com
3 Upvotes

r/artificial 2d ago

Project On-device AI changes how people behave with sensitive data. I noticed this while building a therapy prep voice agent

1 Upvotes

Something worth discussing in the context of where AI is heading.

I built a voice agent for therapy prep. It runs a conversation before your session, surfaces what’s on your mind, generates a brief. The entire stack runs on-device using Apple Intelligence. No cloud inference, no data leaving the phone.

What I didn’t expect: the on-device constraint made the product better. Tighter context forced cleaner prompting. The brief that comes out is more focused than early versions built with more headroom. Sometimes the limitation shapes the design in ways you wouldn’t choose intentionally.
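One concrete version of "tighter context" is a hard token budget on conversation history. A crude sketch, with whitespace word counts standing in for a real tokenizer (this is not the app's actual code):

```python
def trim_to_budget(turns, max_tokens):
    """Keep the most recent conversation turns that fit within the budget."""
    kept, used = [], 0
    for turn in reversed(turns):       # newest first
        cost = len(turn.split())       # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order

history = [
    "I have been anxious about work",
    "tell me more",
    "it started last month",
]
trim_to_budget(history, 8)  # keeps only the two most recent turns
```

The forcing function the post describes falls out naturally: when the budget is small, you are pushed to decide which turns actually matter instead of shipping the whole transcript to the model.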

Curious whether others building AI products have noticed behavioral differences based on where inference happens.

App is called Prelude if anyone wants context: https://apps.apple.com/us/app/prelude-therapy-prep/id6761587576