Messing around with LLMs

Sept. 6, 2025, 8:50 a.m.

A large part of what I’ve been doing for fun lately has involved playing with large language models. Specifically, I’ve been trying to build applications on top of them that do potentially useful things. “Useful” has an elastic definition.

My boss has expressed the thought that whenever we have a problem at work that needs solving, we should try AI before hiring more people. The main problem I have at work is that business decisions are made too slowly, by people who are frequently traveling and who then talk too much whenever you actually do manage to get them in a room. Hence, we don’t have high-quality discussions that result in decisive action. I thought it would be perfect to solve this by creating a multi-agent team that could augment or replace the entire C-suite. My thinking was twofold:

  1. If the problem is that the communication loop is too slow, agents could talk to each other much faster and therefore accelerate decision making. They’d probably come up with slop, but it would still be overall faster to have a bunch of slop to talk about than to have nothing and fumble about in the dark.
  2. People talk a great deal about AI replacing coders, but my thought is that will never completely happen. Writing code is very detailed work, and even very good LLMs are still bad at details. A human being with expertise needs to check it. You know what kind of work there is where details matter less? Executive leadership. The details are quite literally beneath them.

Work has ChatGPT Enterprise, but IT didn’t give me any API keys. I told them this side project wasn’t that important (I didn’t tell them what it was), so I’m sure it fell straight to the bottom of their priority list. I’m actually perfectly OK with that because there are other things on their priority list that I’d rather have them do. But, I still wanted to try this crazy idea and play with LLMs programmatically.

I ended up installing Ollama on my own machine, along with a bunch of open models. I have an NVIDIA RTX 2070 with 8 GB of VRAM and 32 GB of system RAM, so I was surprised at how well this worked for many of the models I tried.
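For anyone curious what “playing with LLMs programmatically” looks like here: Ollama exposes a local HTTP API, so you can talk to a model with nothing but the standard library. This is a minimal sketch; the model name is just an example, and it assumes Ollama is running on its default port.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks for a single JSON response instead of a
    stream of partial chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Example only -- any model you've pulled with `ollama pull` works here.
    print(generate("llama3.1:8b", "In one sentence, what is a language model?"))
```

The same endpoint works for any model you’ve pulled locally, which makes it easy to swap models in and out while experimenting.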

I tried to realize my “multi-agent executive” idea by wiring a bunch of models together with AutoGen and applying the RAPID framework from Bain & Company to them (I saw an ad for it on LinkedIn and thought it was wacky enough to try). The output was genuinely awful: it read like Soviet bureaucrats role-playing American business executives. It was enough to discourage me from the idea completely, at least for now.
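The basic shape of the experiment, stripped of AutoGen specifics, was: assign each agent one of RAPID’s five roles (Recommend, Agree, Perform, Input, Decide) and let them take turns adding to a shared transcript. A toy sketch of that loop, with stub agents standing in for the actual LLM calls:

```python
from dataclasses import dataclass

# RAPID assigns one of five roles to each participant in a decision.
RAPID_ROLES = ["Recommend", "Input", "Agree", "Decide", "Perform"]

@dataclass
class Agent:
    name: str
    role: str

    def respond(self, transcript: list) -> str:
        # Stand-in for a real LLM call (e.g., to a local Ollama model);
        # a real agent would condition its reply on the transcript so far.
        return f"{self.name} ({self.role}): position after {len(transcript)} messages"

def run_decision_loop(agents: list, question: str, rounds: int = 1) -> list:
    """Round-robin the agents over a shared transcript for a number of rounds."""
    transcript = [f"Question: {question}"]
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent.respond(transcript))
    return transcript

agents = [Agent(f"exec{i}", role) for i, role in enumerate(RAPID_ROLES)]
log = run_decision_loop(agents, "Should we enter the widget market?")
```

In the real version each `respond` was an LLM completion, which is where the Soviet-bureaucrat prose came from: the loop structure was fine, the content was not.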

I’ve also been playing a lot with RAG approaches. I’ll document those in another blog post, because I’ll probably put one of them up here as a side project to play with.