Azure scale sets to manage agents

Has anyone here used Azure scale sets to create (and manage) agents?

Yes, many times. It mostly works really well, but occasionally the scale set gets stuck and needs a restart.

Yes, we build the agent images from the images in the <https://github.com/actions/runner-images|runner images repo> and deploy them in scale sets. Works like a charm; we've hardly had any issues with the setup :slightly_smiling_face:

I’ve generally used Packer or Azure Image Builder to publish to a Shared Image Gallery, then pointed the VMSS at the latest version of the image. This way the running image stays up to date, and you can schedule image generation on your own cadence. It works really well and is generally very low maintenance.
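For context on "pick up the latest version": when a scale set references a gallery image definition without pinning a version, Azure resolves it to the highest Major.Minor.Patch version that isn't flagged to be excluded from latest. The Python below is only a mental model of that rule, not Azure's code; `ImageVersion`, `version_key` and `resolve_latest` are hypothetical names. Its main point is that versions compare numerically, not as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageVersion:
    # Gallery image versions are named Major.Minor.Patch, e.g. "1.2.10".
    name: str
    # Mirrors the gallery's "exclude from latest" flag (hypothetical model).
    exclude_from_latest: bool = False


def version_key(name: str) -> tuple[int, int, int]:
    # Compare numerically so "1.10.0" sorts after "1.9.0";
    # a plain string comparison would get this wrong.
    major, minor, patch = (int(part) for part in name.split("."))
    return (major, minor, patch)


def resolve_latest(versions: list[ImageVersion]) -> str:
    """Return the highest version that is not flagged exclude_from_latest."""
    candidates = [v for v in versions if not v.exclude_from_latest]
    if not candidates:
        raise ValueError("no image version is eligible for 'latest'")
    return max(candidates, key=lambda v: version_key(v.name)).name


if __name__ == "__main__":
    published = [
        ImageVersion("1.9.0"),
        ImageVersion("1.10.0"),
        # e.g. a freshly built image you still want to validate before rollout
        ImageVersion("1.11.0", exclude_from_latest=True),
    ]
    print(resolve_latest(published))  # 1.10.0
```

Flagging a new build as excluded until it has been smoke-tested is one way to keep a scheduled image pipeline from rolling a broken image straight into the agent pool.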

We wrote some scripts to take the official image and store it somewhere, then used it as the base for a few customized images on top of it.

My coworker is trying to build some ADO agents using scale sets but is running into issues. He’s using Terraform and, I believe, the same image that was linked above.
His problem is that once the domain-join extension runs, the VM no longer runs the agent software. MS Support :nauseated_face: claims this isn’t possible and will submit the scenario as a feature request; in the meantime, the workaround is to install the agent manually…
He’s hit some other issues too, but I think those need to be revisited once this part works. I haven’t seen his source code yet; I’ve poked him to upload it to a repo. I thought I’d ask here how others do this, since it doesn’t seem like an unusual scenario.

Domain-joining a scale set agent has all kinds of issues: it needs a reboot, it pollutes AD, etc. What makes them think they need to?

I can see the use of persistent, fully locked-down VMs that potentially reboot and reset their drives after each job, but not for an ephemeral VM. All that effort to apply policy and lock down the machine, only to throw it away…

The AD team refused to let ephemeral agents join the domain, and even stated that the agents were more secure without it.

Yeah. Giving the VM a domain identity automatically grants it all kinds of permissions. Keeping it a true guest has a lot of advantages, although it makes things like PowerShell remoting harder. For those cases, domain-join a VM once, snapshot it, and set it to discard its differencing disk after every reboot. A VM like that can be domain joined, because it keeps a single, persistent identity.

Good question. I’ll ask him. I’m not sure whether any of the software, or the agent itself, needs to run as a domain service account. I know these agents are being built to run tests, which might need to connect to internal DBs.

Just confirmed that the intent is for these agents to access internal databases.

And SQL authentication, or simply allowing the IP address range to connect, isn’t an option?

Or setting a username & password in the Azure Pipelines variables?

Or running the SQL servers in a Docker service container on the agent?

The effect would be largely the same if the agent always runs as a user who can access the DB.

I’ll have to ask some questions on that. I’m guessing these tests are expected to use Integrated Security in their connection strings to the databases to replicate Production as closely as possible.
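To make the two options in this exchange concrete, here is a minimal Python sketch, assuming SqlClient-style connection-string keywords and the hypothetical variable names `DB_SERVER`, `DB_NAME`, `DB_USER` and `DB_PASSWORD`. Azure Pipelines exposes non-secret pipeline variables to scripts as environment variables (upper-cased, with `.` replaced by `_`); secret variables must be mapped into the step's `env` explicitly. The helper `build_connection_string` (also hypothetical) uses SQL authentication when credentials are present and otherwise falls back to Integrated Security, which only works if the agent runs as an account the database trusts.

```python
import os
from collections.abc import Mapping


def build_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Build a SqlClient-style connection string for a test run.

    Hypothetical helper: variable names and keyword style are assumptions.
    """
    parts = {
        "Server": env["DB_SERVER"],
        "Database": env["DB_NAME"],
    }
    user, password = env.get("DB_USER"), env.get("DB_PASSWORD")
    if user and password:
        # SQL authentication: credentials come from (secret) pipeline variables,
        # so the agent VM needs no domain identity.
        parts["User ID"] = user
        parts["Password"] = password
    else:
        # Integrated Security (SSPI): authenticates as the Windows account the
        # agent runs under, which is what pushes toward domain-joined agents.
        parts["Integrated Security"] = "SSPI"
    # Sketch only: values containing ';' are not quoted or escaped here.
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical server and database names.
    print(build_connection_string({"DB_SERVER": "sql01.internal", "DB_NAME": "TestDb"}))
    # Server=sql01.internal;Database=TestDb;Integrated Security=SSPI;
```

Keeping the choice in one place means the tests themselves don't change between an ephemeral, non-joined agent using SQL credentials and a domain-joined agent using SSPI.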

Yeah, that was my thinking too, but they could allow a guest/anonymous account to connect and still use SSPI.

An ephemeral build agent won’t in any way reflect the prod environment anyway :wink:.

I don’t think I’ve ever used a guest/anonymous account in conjunction with SSPI.