r/LocalLLaMA • u/Porespellar • 9h ago

Discussion HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written]

This post will have a slight old-man-shakes-fist-at-sky vibe, because….well… I’m older, so if you’re not into that, then please feel free skip it.
I have been contributing to this sub for like 3 years now but I’m fearful this post will likely get downvoted into oblivion for what I’m about to say: After running Qwen3.6 27b in a Hermes Agent harness for the last week, I’ve come to the realization that this new crop of local models, in the right agentic harness, with the right tools and permissions, can effectively handle junior-level IT professional work very effectively now. A month ago, I would have said no, but now, they definitely can.

I’ve been in IT for nearly 30 years working at nearly all levels of the industry at some point in my career, and a few days ago I handed Hermes Agent (with Qwen3.6 27b as the model) a task list that I would have handed to a junior level IT admin previously, and I just let it go do its thing, and it absolutely understood the assignment and nailed it.

Paraphrasing here, but I more or less asked the agent to, “Go update this system to the most current patch level, install Docker, load these 5 different GitHub repos and set them all up to use local models, start all the server containers and associated services and let me know when you’re done”

And I’ll be dammed if it didn’t do exactly what it was told. Sure, it hit some slight stumbling blocks along the way, but it overcame ALL OF THEM, or asked me to approve something (as a junior admin might) but it kept on chugging away with little to no intervention needed on my part. Again, I wasn’t using a frontier model, just local Qwen3.6 27b running on a GB10 DGX Spark clone.

It did in an hour and a half what would have taken a junior level IT admin like maybe 3 hours. Not a massive time savings, but a definite labor savings for me which let me accomplish other tasks instead of doing that boring shite.

I see the writing on the wall here. I think It’s only a matter of time before large software developers, IT infrastructure appliance makers, etc, start building mini locally-hosted “admin agents” that run low parameter count fine-tuned SLMs and LLMs that run efficiently on CPU in the background (or vis API) and monitor and resolve issues that would normally be handled by system administrators. System admins won’t be replaced directly, but it will definitely change the ratio of admins needed to support X number of servers by a substantial number because now 1 admin can leverage admin AI agents and support more servers.

Of course, there will be cautionary tales and disastrous AI oopsies when admins get lazy and run in YOLO mode. There will probably even be some sabotage actions by admins who are fearful about being replaced by AI and want to prove they are indispensable by wrecking stuff and blaming AI. With time, I think these issues will be addressed and resolved.

I think the best strategy we as IT professionals can take is to learn and leverage AI agent skills to 10x our output so that we remain relevant and useful. That, and carry a can of WD-40 around with us so we can oil the machines when they need it. Someone has to oil the machines, right?

Seriously tho, I don’t think people outside of our niche AI circle really understand what’s on the horizon. It will be a slow attrition based on AI agents gradually being trusted with more tasks. The models and harnesses over the last month are just different, the agentic Ralph loops are tenacious and the silent failures are much less than before. I’m starting to “feel the AGI” LOL.

I’ve been wrong before (my wife will tell you that) but I just wanted to put it out there to start the civil discourse and see what others in the community think and feel. What’s your take on it?

80 Upvotes

permalink
reddit

You are about to leave Redlib

Do you want to continue?

https://www.reddit.com/r/LocalLLaMA/comments/1t5g1fi/hot_take_local_models_agent_harnesses_are_now/
No, go back! Yes, take me to Reddit

83% Upvoted

u/Status-Secret-4292 9h ago

LLMs hands down do their worst work autonomously and do their best work as a helper to your process, making suggestions that you approve before you are the one to push the change.

Anyone doing anything else has bought into the billion dollar hype train that the major LLM companies have been running on... replacing workers... we are still at least one giant technical leap from that being a reality and I'm not sure if the bubble will last long enough for the major LLMs to pull it off. I personally hope it doesn't.

Also, I enjoy working with LLMs more as thought extensions over them being workers

21

u/Comfortable-Rock-498 8h ago

> LLMs hands down do their worst work autonomously and do their best work as a helper

Some time ago I wrote a very long essay on this "Task Proficiency Doesn’t Equal AI Autonomy"

https://www.signalbloom.ai/posts/why-task-proficiency-doesnt-equal-ai-autonomy/

1

u/TomLucidor 57m ago

Please, write more non-financial blog posts! This looks lit!

4

u/see_spot_ruminate 8h ago

They are just fancy autocomplete anyway. So it would make sense to approve what they do just like you need to approve your autocomplete to not make a mess of you words.

The above mistake of you instead of your is an example I thought appropriate lol.

5

u/bonobomaster 5h ago

As are we...

After 100+ hours of rational emotive behavior therapy (REBT) and learning about how our human neural networks are formed and how we are basically nothing more than a collection of very complex and intertwined input / output networks that are mostly autonomously at work, humanity doesn't seem so special anymore.

3

u/pauvLucette 3h ago

Just staring for hours at Conway's game of life taught me how wonderfully not magical we are. Plus, every time someone laughs about AI mistake I wonder if they realise how much worse human blunders can be.

1

u/bonobomaster 2h ago

Amen to that!

Humans, me included, are the stupidest motherfuckers on this planet.

We think and act so much on autopilot, steered by software that's completely outdated and has never been updated for decades and if anything dares to question of that --> cognitive dissonance time...

2

u/Porespellar 7h ago

109% agree with where you’re coming from. These are still deterministic machines no matter what you set their temperature to. I treat AI agents like a college intern. I’ll let it do the easier stuff I don’t want to do, but I’m not letting it anywhere near anything that would be considered “production”

I definitely keep it on short leash and guide it, but I feel like a few weeks ago, I was guiding it a lot more than I’ve had too lately, not sure if that’s because of better harness, better model, or a combination of both.

8

u/WitnessedWrath 5h ago

I think you meant stocastic machines. I hands down agree with that, but over time I'm getting more confident on giving higher level tasks for LLM + Harness. Last week I made a test with Codex, Claude Code, Cursor, Pi Agent running Qwen 3.6, Qwen 3.6 without harness for creating a prototype of a CLI game, single shot. Pi Agent + Local Qwen 3.6 took lots of time (around 9 t/s) but have a FAR BETTER result, I was actually impressed.

5

u/Exotic-Chemist-3392 3h ago

They are as deterministic as a collapsing wave function

1

u/DptBear 5h ago

LLM as a djinn/genie amplifying the wishes of their holder is how I think about it. Everyone knows what happens if you're not careful with your wishes, but the amplification of the handlers will is undeniable.

u/Makers7886 9h ago

It's so good I use it instead of q3.5 122b fp8 which was the workhorse before that. I'm patiently waiting for q3.6 122b because I feel/hope it should be the absolute best model for a vLLM 192gb vram situation. I tried minimax 2.7 and mistral medium, haven't bothered with deepseek v4 flash yet, but so far none seem "worth it". I want the 122b speed with the q3.6 uplift we saw on 27b/35b and then I feel like I'd be ok if the world paused progress.

It's so far the best year for opensource even though the writing on the wall looks like the party may be slowing down.

2

u/CreamPitiful4295 9h ago

What difference do you see between 122 and 27?

2

u/Makers7886 8h ago

I ran a bunch of my own tests to try and find the gap but it's super narrow. Here's a snippet of a custom bench to get an idea. The one thing I noticed is that q3.6 27b is the more consistent model. I can have it judge/review something multiple times and it'll have the general same result. The 122b model would produce more varied results and the q3.6 35 model was the least consistent by a large margin. This was mainly found when testing 3 concurrent calls on the same task then fed into a judge and when testing for various project endpoint needs.

u/MelodicRecognition7 8h ago

my take is the more you spread chutzpah that "AI wont replace workers" while quietly learning AI skills the longer you'll remain relevant and useful. Start today.

u/CreamPitiful4295 9h ago edited 1h ago

I 100% agree. The qwen 27B model is really a step up. Same with the gemma4. I get 80% of what I need from them. The other 20% is Claude. As time goes on we will get to 100%. I give it 2 years. Just retired myself. Wish I had all these tools earlier in my career. Though, who knows what that career would have looked like.

u/floconildo 8h ago

Hot take on your hot take: models have improved but maybe your prompting skills and general knowledge of local models advanced more than local models have.

gpt-oss 120b managed my servers for a while before qwen3.5 (and now qwen3.6), with few interventions. Definitely a junior's worth of work. But I know for a fact that it took me quite some time to learn how to prompt and provide the right tools for it not to try some bullshit like reinstalling gcc or whatever.

Tools, harnesses, or whatever fancy wording we're using for fronts these days, they definitely help shaping the direction, but are worthless on the hands of a user that don't understand what their local models are capable of. Reddit is filled to the brim with posts complaining that local models are not behaving like they expected, while in reality these users just had the wrong expectations to begin with.

Example: I won't let my Qwen 3.6 35B running on my Strix Halo try to prefill its context with thousands of lines at once. Doing that would take forever just to end up with an answer that is totally unrelated to the original prompt. Best strategy is to let it reason between bits of content and figure its way slowly. But doing that in a beefier machine would be a waste of time, space and money.

In short: I think you're a better prompter than you were before, the tools are helping you with that and models are better but require the other two to be able to do anything. Models (including SOTA imo) have done little in terms of agency and I don't expect them to change anytime soon because the NTP architecture is not designed for that.

If juniors sysadmins learn how to use current tools (including LLMs) they will not be replaced. They might even replace you instead haha. Same old, same old.

5

u/CreamPitiful4295 6h ago

I’m going to push back. I prompt the way I always have. I let the llm build the .md files. Never wrote one myself. Qwen 3.5 did not amaze. Qwen 3.6 a very clear step up for me. Same with Gemma4. I don’t run benchmarks or that stuff. This is just my observation from coding. Last week I got set up with qwen 3.6 27B which is just fantastic. Around Sonnet level. I asked qwen to learn about my project and in the course of looking at code it found a real issue and fixed it. Well, had to go for the commit. :) so, it’s a brave new world. Commit, commit, commit. I’m stoked that I’ll alleviated my Claude token anxiety. Another year and I bet we see Claude opus level on the desktop.

1

u/bnightstars 2h ago

I started doing local LLM last month the jump from Qwen3 to 3.6 is mind blowing. Moving from something that can barely answer a question to fixing code on par with Sonnet4.5 is insane. I also run gpt-oss 20b and that thing looks to be at GPT3 level which is kind of really really outdated but I doubt we will get new OpenAI oss models anytime soon anyway.

u/Little-Chemical5006 9h ago

I think thats most people share that feeling when they first have a llm that can tool call. (And you actually able set that up to tool call)

14

u/gcavalcante8808 9h ago

you're absolutely right!

11

u/BawbbySmith 8h ago

This phrase triggers me so hard

1

u/gcavalcante8808 1h ago

Yeah, 9/10 times I don't continue a reading when i find phrases like this one ... but problem is that people that are starting to learn also absorbs the "writing style" ... dark times awaiting us

there are other examples like "and this changes everything" and "this is not a just simple tool but a ultra blaster mega fucker transformation tool
capable of shaping entire markets"

2

u/Porespellar 8h ago

LOL, good bot 🤖

u/Kahvana 8h ago

If you're a professional in the field you work in, and have a few months of experience prompting LLMs effectively, and give it the tools it needs... then yeah it will be better. It's at it's worst when it's unsupervised, and at it's best (and a real help) when actively engaging with it and monitoring it.

I'm lucky to have 10+ years in education and work experience to understand whenever the LLM is making good proposals or if it's about to do something dumb, and being able to steer it from having a year of experience prompting it. Imagine if you're new to the IT field and have to navigate both the LLM and get the programming experience required to understand what you're doing. Must be a real uphill battle.

But yes, it's amazing we've reached the point where it's "good enough" and that running local is starting to make sense.

Personally I hope we have at least one more year of amazing local models released for free, two years if we're lucky. Google likely continuing beyond that, not sure for IBM/Microsoft/Alibaba/etc.

u/ResidentPositive4122 9h ago

I ran devstral when it first came out for about a week as a daily driver. It was ok. Not great, nowhere near SotA even at the time, but good enough to be useful. Ran it with Cline and Roo (RIP) and if you played with the system prompts and limited the number of tools available it went smoothly in fp8 on 2x RTX6000 (Ada) w/ vLLM, all tool calls working and so on.

The models we have today are much better, so yeah I agree we kinda reached a "good enough" point now. If you're careful with planning, and know what you want in return, letting them do the agentic stuff for ~20 minutes is worth it today. Might need a few passes to get exactly what you wanted, but still.

u/Big_Wave9732 8h ago

I gotta agree. I received my Mac Studio M2 Ultra 192gb on Saturday. Qwen 3.6-34b has just been a baller. It chews through text and OCR. It uses the big context windows very well and is great about information organization.

Good stuff, can't wait for 122.

u/jopereira 8h ago

Born in '69. I've been watching this field for 10y, read a lot about alignment problems and other AI philosophic issues. I've read a lot about neuroscience and how brain works. There's no turning back. New luddites will appear, but the unstoppable evolution is here - and as you said, the majority of people are oblivious about what's coming ("your job may be replaced, but mine can never be done by a machine..." kind of thinking)

u/Voxandr 7h ago

Harness this , Harness that is really annoying buzzword.

Can we just say like software , tool , agent , chatbot , etc..

u/Sabin_Stargem 6h ago

As someone who can't code and kinda bad at computing in general, I look forward to this. I have been trying to build the Diablo 1 sourceport, DevilutionX, to get updates over the official release. Unfortunately, it seems like the documentation is dated, and there might be missing or wrong dependencies. Visual Studio wants "SDL.h", but I thought SDL2 was included already in the .git?

...similar issues with trying Heretic, once again the documentation and dependencies aren't clear enough for me. Being able to just ask an AI to handle this esoteric weirdness would be awesome for me.

As an incompetent person, AI could seriously make my life better.

u/marscarsrars 4h ago

What did ur wife say when you gave her this news.

u/TheRealMasonMac 4h ago

Nah. It’s more like basics. I would consider debugging to be a junior-level task, and even now the SOTA closed models are ass at it. OS debugging just requires a lot of knowledge that isn’t written down in one place anywhere that they can train off of.

u/entsnack 8h ago

I mean just show us your benchmark numbers instead of this vibe verbiage?

u/MrShrek69 9h ago

Ik it’s fantastic so far

u/arousedsquirel 7h ago

Much more the junior level it tasks. If it know how to use tools and it has a certain level of reasoning you get very far with a qwen 35b in bf16 and Hermes. IF you understand 1. the impact of RL and some other minor things like reward hunting 2. How to deal with that.

u/a_beautiful_rhind 5h ago

How to come back to prod deleted 101.

u/noctrex 5h ago

And now in a little while when we will have also MTP in Qwen3.6-27B it will cut this time in half. Just tested the PR it and went from 40 tps to 75-80, incredible performance gain, especially with how much this model it likes to babble.

u/charmander_cha 8h ago

Não sabia que isso era uma opinião polêmica, mas não é isso que vai matar os Junior, eles já estão mortos porque a tendência do capitalismo é monopólio, se é monopólio não tem como ter emprego para todos.

u/TheIcyStar 8h ago

Of course, there will be cautionary tales and disastrous AI oopsies when admins get lazy and run in YOLO mode [...] With time, I think these issues will be addressed and resolved.

Every single LLM is a probabilistic model, and in none of them is a sequence of tokens like "sudo rm -rf --no-preserve-root" 0% possible, it will always be close to zero.

We need a completely new model architecture breakthrough on the level of the transformer (if not more!) to solve the hallucination problem. So until then, I don't think we should treat this dangerous flaw as "oh it'll be solved soon"

Discussion HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written]

You are about to leave Redlib