Fed to Fed
Connecting government and industry to promote innovation through collaboration. The Fed to Fed podcast highlights the latest in innovation and technology modernization in the US government and industry. Join us for inspiring conversations with leaders and change-makers in the government technology space.
AI with Accountability, Cost Control, and Mission Velocity
Agentic AI is a powerful tool for enhancing mission outcomes and managing increasingly complex data environments. But model structure and balancing autonomy with oversight are key to responsible AI implementation. What do agencies need to know now to prepare for future success?
Find out in today's Fed to Fed podcast episode with:
- Ben Cushing - Chief Architect, Health and Life Sciences, Red Hat
- Moderator: Susan Sharer, Chief Executive Officer, GOVTECH CONNECTS
Thanks for listening! Would you like to know more about us? Follow our LinkedIn page!
Welcome to the Fed to Fed podcast, where we dive into the dynamic world of government technology. In this podcast series, we'll be joined by current and former federal leaders and industry trailblazers at the forefront of innovation. Here, we speak openly and honestly about the challenges and opportunities facing the federal government, the Department of Defense, and its partners in the modern age, driving innovation and the incredible capabilities of technology along the way. Whether you're a federal leader, a tech industry professional, or simply fascinated by IT modernization, just like us, this podcast is for you. And we're so happy to have you tuning in.
Susan Sharer:As generative AI adoption accelerates, federal agencies face a mounting dilemma: soaring opex from hosted LLM services, limited transparency, and brittle integrations. This discussion explores how agentic AI offers a more sustainable, modular, and mission-aligned alternative. We'll cover three pillars of transformation: architectural modernization; cost control and CapEx realignment; and operational autonomy to ensure safe deployment. This session will emphasize the role of policy-driven guardrails, adversarial AI oversight, and composable governance patterns that align with Executive Order 14028, the AI Risk Management Framework, and OMB M-25-21. Federal CIOs must modernize AI architectures, not just AI apps, if they want scalable, secure, and auditable automation. Ben Cushing, thank you so much for joining us today.
Ben Cushing:Happy to be here.
Susan Sharer:I'm really excited about this discussion. And let's jump right into the questions. So Ben, why are agencies drawn to AI systems that can process vast context windows and generate insights beyond human capability?
Ben Cushing:It's to augment human beings. There's so much complexity in the world we live in. There's so much knowledge that exists already, and it's being produced every day. The amount of complexity and knowledge that we encounter surpasses our ability to perceive the world around us completely and holistically. And a large context window within a large language model service, or any large language model, allows for more information, more context, to be provided for a richer response from any one of those services.
Ben Cushing:An example of that might be soccer. It's not a topic I talk about a lot, but if you're interested in soccer and you start asking about one game, then you're going to get a response about one game. If you ask about the trends of soccer over hundreds of years, then the context window is going to increase as you put more and more information into the prompt. And the LLM service has to hold on to the running conversation that you're having in order to have a richer conversation.
Ben Cushing:You can also think of a context window in the same way we have a conversation with each other. So when you start talking with somebody, if you have a conversation that lasts for four hours, that's a much larger context window. And when somebody references something from the beginning of that conversation, they're expecting you to pick up on that and respond to it, or close out a thought from the beginning.
Ben Cushing:That's how you have a richer dialogue, versus the small talk of back and forth: how's the weather? That's a small context window. So I think that in order for large language model services to really serve us properly, we're going to need to continue to increase those context windows. There are significant costs associated with that, and technical challenges. We're here to talk about agentic architecture, and that's something that will inevitably help with those context windows. So we'll come back to that in a second, I think, as we get through these questions.
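A rough Python sketch of the idea Ben is describing: a chat session keeps the whole running conversation and re-sends it on every turn, so the context the model has to hold grows with the dialogue. The ChatSession class and its token estimate are illustrative assumptions, not any vendor's API.

```python
# Illustrative sketch: how a "context window" grows as a conversation runs.
# ChatSession and its rough token estimate are hypothetical, not a vendor API.

class ChatSession:
    def __init__(self, max_context_tokens: int = 8_000):
        self.max_context_tokens = max_context_tokens
        self.messages: list[dict] = []          # the running conversation

    def _estimate_tokens(self) -> int:
        # Very rough heuristic: roughly 4 characters per token.
        return sum(len(m["content"]) for m in self.messages) // 4

    def add_turn(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # The whole history is carried forward on every turn, so cost grows with it.
        if self._estimate_tokens() > self.max_context_tokens:
            # One common mitigation: drop or summarize the oldest turns.
            self.messages.pop(0)

session = ChatSession()
session.add_turn("user", "Tell me about one soccer match played last weekend.")
session.add_turn("assistant", "Here is a short summary of that match ...")
session.add_turn("user", "Now compare scoring trends across the last hundred years.")
print(f"~{session._estimate_tokens()} tokens of context so far")
```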
Susan Sharer:That sounds great. And so how does the history of software, from monoliths to microservices, help us understand today's challenges with LLM-based applications?
Ben Cushing:Right. So, you know, I'm touting the brilliance of LLM services and how great they are. But unfortunately, we've created a new monolith. For those who have done any sort of application development or programming in the last 25 years, there's been a general movement away from creating large applications that are difficult to test, very difficult to debug, very costly to run because they use a lot of memory, and difficult to move around in terms of their portability. Those applications require a pretty large, substantial runtime. So over the last 25 years, there was a practical decision made by the community that we should be using microservices instead.
Ben Cushing:And so there's a very common practice within modernization of IT services where you deconstruct the monolith. The more colorful way to say that: people say we'll strangle the monolith. I don't know why they went with that, but it's pretty accurate in the sense that you're taking this monolithic application, you're isolating specific features of that application, and then you're extracting them from it into a microservice that then gets referenced by the larger application.
Ben Cushing:And over time, you're pulling more and more parts out of the monolith until you have a deconstructed set, an array of microservices, that together still do the same thing the single application did, but they've all been pulled apart. And each one of those microservices runs as its own software application, meaning that it has its own lifecycle. You test each one independently, and then you can test them as a group.
Ben Cushing:And in this way, we're able to solve some of the problems I just mentioned. We're able to have a more succinct software lifecycle for each one of those applications. Testing is a whole lot easier because you're testing each individual part and then you test the whole. It's a lot easier to debug because you can find a bug within a single microservice. And traditionally, if you found a bug in a monolith, it takes forever to find where that bug might be, because you have to look over the whole thing.
Ben Cushing:Y ou can't just go right into the microservice that's already been pulled apart. Also, these microservices are a lot more portable, so we can now pick and choose different features that we want depending on what's necessary for the environment or where the application will get deployed. So the hardware can only support a third of the features and pull a third of the features from the, the microservice architecture and just deploy those individual microservices within that, edge location. So going back to Lmms, we kind of messed up in the sense that we were still doing it. Like the these large language model services get bigger and bigger.
Ben Cushing:I think right now, GPT-5 is about 163 billion parameters as a model. That's a huge load. There's some good stuff there: it means it's very smart, it can talk about tons of stuff, it has an incredible amount of knowledge baked into it. But to run that model, you need a lot of memory just to have it running in real time, and all of the services attached to it to make it function for all the different users hitting that model, or instances of that model, also require a lot of resources.
Ben Cushing:So the technology companies that run those kinds of services are also starting to build agentic services alongside them that try to do what I'm describing here. But I'm going to take a different tack and say that those large language models have probably reached a plateau of size where it's really useful to have a model that big.
Ben Cushing:I think we've reached a plateau where, at a certain point, the model does not need to get bigger and bigger. It just doesn't need to know that much anymore. There's, of course, an endless amount of knowledge and things we could pack into it. But if we start to think about how to create microservices from an LLM, we start to think about domain specificity, meaning that instead of a huge large language model that knows everything from physics to gardening, we have a couple of small language models that each know something.
Ben Cushing:One of them knows something about physics, and one of them knows something about gardening, and we run them together in a microservices architecture. And now the system knows about gardening and physics, but each one of those models is independent of the other and has its own software lifecycle. Each model can be tested, each model can be audited, things of that nature.
Ben Cushing:And what I'm describing here is not necessarily agentic, but it is the decomposition of a large language model into smaller nodes. Together they have the same effect as a large language model, but they're easier to manage. Agentic itself is when you start to use these models to take action. So if you break down the word, it means agent, right?
Ben Cushing:These are a bunch of agents. What do agents do? They do stuff, right? So each one of these little agents is responsible for maybe processing some data, or reviewing content, or querying an external tool to do something like, hey, go open a ticket with ServiceNow, or send an email. Right? That's what the agent's for. But each one of those agents generally has a large language model or a small language model associated with it as the thinking center.
Ben Cushing:So when you start to create agents that have domain specificity, let's say there are maybe six or seven agents that do gardening and six or seven agents that do physics, those little pods of agents are again relying on a small language model that knows that domain. And when you combine them together, you then have a more nuanced view of the world.
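A minimal sketch of the decomposition Ben describes, with the model calls stubbed out: small, domain-specific agents (gardening, physics) sit behind a simple router, and anything outside their registered domains gets an "I don't know" response. The keyword routing and agent names are hypothetical stand-ins for small language models, not any particular framework.

```python
# Illustrative sketch: routing queries to small, domain-specific agents
# instead of one monolithic model. Model calls are stubbed placeholders.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    domain_keywords: set[str]
    answer: Callable[[str], str]       # stands in for a small language model

def gardening_model(q: str) -> str:
    return f"[gardening agent] advice for: {q}"

def physics_model(q: str) -> str:
    return f"[physics agent] explanation of: {q}"

AGENTS = [
    Agent("gardening", {"soil", "tomato", "prune", "compost"}, gardening_model),
    Agent("physics", {"momentum", "gravity", "quantum", "energy"}, physics_model),
]

def route(query: str) -> str:
    words = {w.strip(".,?!").lower() for w in query.split()}
    for agent in AGENTS:
        if words & agent.domain_keywords:
            return agent.answer(query)
    # Out-of-scope guardrail: the system admits what it does not know.
    return "I don't know anything about that topic."

print(route("When should I prune tomato plants?"))
print(route("Explain conservation of momentum."))
print(route("Who won the music award last night?"))
```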
Ben Cushing:Large language models, when they were first developed, were of course based on deep learning, which is based on neural nets. The idea behind neural nets is to try to mimic the way the brain works: you have layers and layers and layers of thought. What's kind of interesting is that the agentic architecture, the way I'm describing it, is also very similar to the way the brain works, in that you have individual nodes that are responsible for a single thought, and they're connected to each other and communicating. So I'll say, when we get into this microservice architecture and we apply agentic concepts to it and it starts to act, one of the really neat things is the convergence of the existing application development cycle, turning monoliths into microservices, and agentic development.
Ben Cushing:And a lot of the things that we learned in the microservices world will apply directly to agentic development. Things like policy, right? We can create a policy that all the agentic systems have to follow. Governance of the agentic systems, traceability, logging. Every time one of those agents communicates with another agent, it creates a log. We know what they thought about, what they told each other. We know it explicitly. It's not implied. We know exactly what they said to each other.
Ben Cushing:In a large language model system, you can ask the large language model what it was thinking when it came up with the answer, but it has to think in order to produce that answer for you, meaning that when you do the audit, it can hallucinate in the audit too, which is obviously not great. There are plenty of examples of people catching large language models lying about what they were thinking to protect themselves. In an agentic system, the communication from one agent to the other is a routed task. That communication happens over a network, and that network will record what happened, what was said. And now you have an audit trail.
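A minimal sketch of the audit trail Ben contrasts with after-the-fact explanations: every message one agent sends another is written to a structured log at the moment it crosses the network, so an auditor can replay exactly what was said. The message bus and agent names here are hypothetical.

```python
# Illustrative sketch: every agent-to-agent message is logged explicitly,
# producing an audit trail of what was actually said, not a reconstruction.

import json
import time
import uuid

class MessageBus:
    def __init__(self):
        self.audit_log: list[dict] = []

    def send(self, sender: str, receiver: str, content: str) -> None:
        entry = {
            "id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "sender": sender,
            "receiver": receiver,
            "content": content,
        }
        self.audit_log.append(entry)            # recorded as it happens

bus = MessageBus()
bus.send("order-builder", "order-validator", "Proposed medication order: ...")
bus.send("order-validator", "order-builder", "Field 'dosage' is malformed, resend.")

# Later, an auditor can replay exactly what the agents told each other.
print(json.dumps(bus.audit_log, indent=2))
```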
Susan Sharer:Wow. That's awesome. Thank you for breaking it down for us, Ben. So, Ben, what are the risks of treating LLMs as monolithic applications? And what lessons can we take from past modernization efforts?
Ben Cushing:I think I dovetailed into this question accidentally with my previous response. But I'll just add that the risk is that we just keep leaning more and more on resource consumption, to the point where models are too large to run and maintain a profit margin. What's not really clear when you interact with Anthropic, Gemini, or OpenAI is that those companies aren't making money. They're losing money, because the amount of resources they use to run those services far exceeds the amount of money they're getting in subscriptions to the services.
Ben Cushing:They're still really trying to figure out the business model. So I think the risk we run is making that worse as we build larger and larger models. The other thing, too, is the longer we wait to embrace agentic, the longer we accrue tech debt, right? Usually when it comes to decomposition of monoliths, IT organizations and enterprises treat that as technical debt they have to attend to. The longer we wait to do that same process with LLM services, the more technical debt we accrue, and then we have to go back and readdress it over another 20 years, or however long it takes to decompose these monoliths.
Susan Sharer:Wow. So, Ben, how can agencies structure AI use in a way that keeps operational costs manageable, even as demand skyrockets?
Ben Cushing:Yeah. So there's a value trade-off here that is worth discussing. When you use an LLM service, that's usually paid for through opex, meaning that your operational costs include the LLM service itself. If a federal agency has a contract with OpenAI, they generally go into that with some idea of what they're going to spend month to month. They set up a contract with the large company, in this case OpenAI, and they turn on the service for users. Users start to use it, and as you might imagine, it becomes very popular, so popular, in fact, that it exceeds the resources that OpenAI and that agency negotiated. And then OpenAI says, hey, we either have to throttle the use or we have to renegotiate, because we can't afford to just keep giving this to you. It's way more than we expected.
Ben Cushing:I think I can safely say that has happened almost every single time an agency has contracted with an LLM service, because these things work. They're incredible. They help people. They augment our abilities. There's an endless amount of faux pas that people commit with them, but I think those get lost in how practical and useful the tools are. And that's all under opex.
Ben Cushing:Now, the way to deal with this, like I said, is there's a value trade-off that occurs when you start seeing your opex costs exceeding the CapEx expenditure for building your own service. Then it becomes a build-your-own scenario. These are completely made-up numbers, but let's say you're spending $40,000 a month on OpenAI, and you have tons of users that have exceeded the limit, so your opex is ballooning. You then say, okay, well, if I put down a couple million dollars to build my own service that does the same thing, or something like it, how soon will I recoup the actual $2 million if I compare it to the opex cost? Right. So it's just basic mathematics, and agencies are getting to that point.
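The break-even arithmetic Ben is gesturing at, using his made-up numbers plus an assumed run cost for the self-hosted service; a rough sketch, not a costing model:

```python
# Rough break-even sketch using the made-up numbers from the conversation:
# $40,000/month in hosted-LLM opex versus a $2,000,000 capex build.

monthly_opex_usd = 40_000            # hosted LLM service spend per month
build_capex_usd = 2_000_000          # one-time cost to stand up your own service
own_service_monthly_run_usd = 10_000 # assumed ongoing cost of the self-hosted service

monthly_savings = monthly_opex_usd - own_service_monthly_run_usd
months_to_recoup = build_capex_usd / monthly_savings

print(f"Payback period: {months_to_recoup:.0f} months "
      f"(~{months_to_recoup / 12:.1f} years)")
# With these assumptions, the $2M build pays for itself in roughly 67 months;
# with no ongoing run cost at all, the floor would be 50 months.
```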
Ben Cushing:There are a number of commercial entities that have reached that point already and have decided, we're just going to build our own. And to their benefit, there are a lot of tools already available to them to do just that. The open source community has responded to this new AI, I should say gen AI, space, and it has developed many, many tools. A lot of them, in fact, are used by Google and Anthropic and OpenAI to run their own services. So you can actually go get those AI tools yourself. And given that you have the skills, the right partnerships, and the money, you can run your own service. That's where the opex versus CapEx conversion is generally happening.
Susan Sharer:Wow. Thank you for that. So, Ben, if you had to pick one, what's the non-negotiable governance control for agentic AI?
Ben Cushing:Okay. So, I think I said it already, but agentic is the new app dev, in the sense that we're building applications, and each application has some relationship to some kind of AI, whether it's a predictive model or a gen AI model, a small language model or a large language model, whatever. Each one of those agents, for the most part, especially the large language models, is probabilistic. Meaning that you can ask an LLM the same question and you'll get a different answer each time. Not to say the answers are wrong, just that each answer is different.
Ben Cushing:And if you are an enterprise and you have to test the output of lots of little agents, where a query comes in and they produce a response, then in order to test that they work right, you need the response to be pretty much the same every time. Otherwise you can't test the validity. And so I would say a non-negotiable is having some form of adversarial AI in the chain of thought that occurs within an agentic system.
Ben Cushing:Now, the amount of adversarial AI you put in there is, I would say, entirely application-specific, but I can give you an example. I just did a presentation at HIMSS this year where my colleague and I had built an agentic system specifically for filling medication orders. And within that system, we had one agent that was adversarial, and its job was to validate the output of a medication order following the FHIR standard, which is a healthcare protocol. And it would do two things.
Ben Cushing:It reviewed the order to make sure that the FHIR format had been followed explicitly so that it could be sent to the electronic health record. It also read through the content that was produced to make sure that the fields were properly written to the right keys. So it's essentially validating the data output, and that's all it did. Its job was just to validate the output. Running the system a number of different times, it did its job. Maybe 30% of the time it would discover malformed information coming from one of the other agents, and it would say, no, this is wrong. It would send it back and say, hey, you screwed up, this is where you screwed up. And then the other agent would be like, oh, so sorry.
Ben Cushing:It would produce a new output and send it, and we could check again. For me, that's non-negotiable. That's how you create trust in a system, and if we can't have some level of trust within that system, no one will adopt this stuff. What's really nice about what I just described is that the adversarial AI can follow a human-type check, right? It can act like an informatics doctor or somebody else who might be reviewing content. There are, of course, other controls that have to be in place that are sort of table stakes, like, for instance, looking for any sort of malicious injection. So if somebody writes code into a prompt, and that code is looking to get executed by the system and then mess it up, that needs to be checked for, of course.
Ben Cushing:You also want to have a policy check. So, like the example I used earlier, if the system knows a lot about physics and gardening and somebody asks a question about music, well, the system should be able to say, hey, I actually don't know anything about that. Those are upfront filters that you need within the system itself, and again, each one of those filters is an agent. One of those agents has registered the domain expertise of the system, so it can say, hey, this query relates to something we don't know anything about, and it should respond in kind: I'm not sure what that is.
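A minimal sketch of the adversarial check Ben describes: a drafting agent produces a structured medication order, an adversarial validator rejects malformed output and sends it back with feedback for a corrected attempt. The field rules here are simplified stand-ins, not the actual FHIR specification, and the agents are plain functions rather than language models.

```python
# Illustrative sketch of an adversarial check in an agentic chain:
# a drafting agent produces output, a validator agent rejects malformed
# results and sends them back for correction. Rules are simplified
# stand-ins, not the actual FHIR specification.

REQUIRED_FIELDS = {"patient_id", "medication", "dosage"}

def drafting_agent(request: str, feedback: str | None = None) -> dict:
    # Stands in for a small language model producing a structured order.
    order = {"patient_id": "12345", "medication": request}
    if feedback is None:
        order["dose"] = "500 mg"        # wrong key on the first attempt
    else:
        order["dosage"] = "500 mg"      # corrected after adversarial feedback
    return order

def adversarial_validator(order: dict) -> list[str]:
    # Checks that the expected fields exist and are written to the right keys.
    return [f"missing or misnamed field: {f}" for f in REQUIRED_FIELDS if f not in order]

def fill_order(request: str, max_attempts: int = 3) -> dict:
    feedback = None
    for _ in range(max_attempts):
        order = drafting_agent(request, feedback)
        problems = adversarial_validator(order)
        if not problems:
            return order                 # passed the adversarial check
        feedback = "; ".join(problems)   # send it back: this is where you went wrong
    raise RuntimeError("order could not be validated")

print(fill_order("amoxicillin"))
```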
Susan Sharer:Very, very cool. So Ben, in closing, what emerging architectures, like AI microservices, could provide a path forward for scalable and sustainable AI adoption? And what are the practical steps for scalable AI adoption?
Ben Cushing:Yeah, I would say, if you're just getting started, the best way to consume AI today is by far to use a service. There are plenty of services out there, especially in the federal space. Google and Anthropic and OpenAI have really stretched themselves to accommodate the federal requirements for multi-tenancy, for security controls, for data leakage, things like that. I can't say that they've solved it on the back end, because I can't see that, but I'll say that they've taken all the right steps at least. So I would say use those services. It will help you understand your user base, it'll help you understand the use you can expect from constituents, and you'll start to see the value and impact of it pretty quickly. That's step one. Step two, in parallel: I highly suggest doing pilots where you start to link a large language model that's deployed on a local system to RAG, or retrieval-augmented generation.
Ben Cushing:When you combine those two things, you're able to retrieve knowledge very rapidly. It's a very low-cost way to start doing AI. And I can say from experience, having been engaged with a few of those, you have immediate impact for customers, and by customers I mean the constituents, consumers, whoever is the group you're serving. Almost immediately you can start to extract information from a seemingly endless array of information, get it contextually, and put it in the hands of the person who needs it. You can do that really quickly. An example of this would be policy. Let's say you've got a mountain of policy documents, and whenever somebody needs to query that policy, it requires a human being with 30 years of experience to be involved in the conversation.
Ben Cushing:Well, all that policy can be loaded into a database, into a vector database, and then queried through a large language model, and you pretty much get a subject matter expert response. You probably want to have that reviewed by a subject matter expert before you give it back to a consumer, but in this case they're doing a review of the content; they're not spending hours and hours reviewing the policy. So there's immediate value right there.
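A minimal sketch of the pattern Ben describes, with the embedding and generation steps stubbed out: policy documents go into a small in-memory vector store, the passages closest to a question are retrieved, and those passages become the context handed to a language model. A real deployment would use an actual embedding model and LLM; the bag-of-words vectors and sample policies here just illustrate the retrieval-augmented shape.

```python
# Illustrative retrieval-augmented generation (RAG) sketch: policy documents
# are embedded into a tiny in-memory vector store, the closest passages are
# retrieved for a question, and passed to a (stubbed) language model as context.

import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: a bag-of-words vector. Real systems use a model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

POLICY_DOCS = [
    "Telework requests must be approved by the supervisor within five business days.",
    "Removable media may not be connected to systems processing controlled data.",
    "Contractors must complete annual security awareness training before system access.",
]
VECTOR_STORE = [(doc, embed(doc)) for doc in POLICY_DOCS]

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(VECTOR_STORE, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate_answer(question: str) -> str:
    context = " ".join(retrieve(question))
    # Stand-in for the LLM call: real systems prompt a model with this context.
    return f"Based on policy: {context}"

print(generate_answer("How quickly must a telework request be approved?"))
```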
Ben Cushing:As you start to work with that kind of system, you're going to discover that you want to add other knowledge sources and other services. And now you're kind of walking into an agentic space, where, oh, I need a more flexible way to communicate with that data, I need these individual models or services to talk to each other, because the question that was asked is too complicated for a single LLM and database. I need a couple of these to respond in time. And now you have an agentic requirement. And like I said before, the open source community has already provided these tools, so they're all out there and available to work with today. But that's something that you'll walk into as you take these steps.
Ben Cushing:I think it's a fallacy to just build an agentic system right off the bat without any sort of testing and need. But I think agencies, commercial entities, everyone will find themselves in this place fairly soon, if they're not there already.
Susan Sharer:That's great, Ben. Thank you so much for joining us today. I look forward to future discussions around agentic AI. For people who would like to gather additional information about agentic AI, hit the QR code below in this post.
Ben Cushing:Thank you, Susan.
Susan Sharer:You betcha. This concludes today's episode of the Fed to Fed podcast. If you enjoyed this episode, please don't forget to subscribe, rate, and leave a review. Your feedback helps us continue bringing you thought-provoking sessions with the brightest minds in government technology. Stay tuned for our next episode, where we will continue to explore opportunities to harness the power of technology and explore what's next in developing a more innovative and efficient government. Until then, this is the Fed to Fed podcast by GovTech Connects. Thank you for joining us.