Chef - Nodes deregistering automatically

We have some nodes created via EKS (Kubernetes), and they register with the Chef server using the validator key method. Whenever an instance is created, it registers with Chef automatically. Two days ago we verified that the nodes were available in the Chef web UI. Today when I checked, the nodes had been deleted automatically. How does this happen? Can deregistration be triggered from the node side?

Are you using Chef Automate?

https://docs.chef.io/automate/client_runs/#managing-node-data

Would the chef-client cookbook do this? Each node applies the chef-client cookbook at some interval. One more thing: does chef_guid have anything to do with it?

Where are the nodes “disappearing” from?

What is the “chef web ui”?

From the Chef server. The web UI is nothing special, it is the Chef Manage console.

Are you saying your nodes are EKS pods?

Not pods; we are dealing with the worker nodes. We are not monitoring pods. These workers are part of auto scaling. We have included the files (client.rb, validator.pem and firstboot.json) in the AMI, and the bootstrap script (AWS bootstrap) executes when new nodes are launched. When I increase the autoscaling desired count from 5 to 6, one worker node launches and registers with Chef (so we don't have any issue with the AMI or the bootstrapping). But after a couple of hours it disappears.

I see. OK, the EKS worker EC2 instances. In your run list, do you have the client configured to run regularly? Usually every 30-60 minutes. Automate has a data retention feature that marks hosts as missing after a set period and then removes them. The other thing to watch for is that EKS managed nodes may be swapped out occasionally, and if you don’t have a system to create a unique name for the node when it joins the chef-server, it may be rejected because a node object with that name already exists. The validator key doesn’t have permission to replace existing nodes.

Here’s our bootstrap script that we use to generate hostnames with a unique number at the end, based on the instance IP address. We use a Terraform template resource to supply a couple of the variables’ values. This gives the initial config, and then in our run list we have our base cookbook that applies the official config and sets the client up to run from a systemd timer every 30 minutes.
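The script itself is not reproduced in this thread. As a rough sketch of the naming idea only, assuming bash, a hypothetical prefix such as eks-worker, and the instance IP and ID read from EC2 instance metadata (the helper name make_node_name is made up for illustration), something along these lines could produce a name that stays unique when an instance is replaced:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the poster's actual script.
# Builds a node name from a prefix, the private IP and the instance ID,
# so a replacement instance never reuses an existing node object's name.
make_node_name() {
  local prefix="$1" ip="$2" instance_id="$3"
  # Replace the dots in the IP with dashes: 10.140.1.23 -> 10-140-1-23
  printf '%s-%s-%s\n' "$prefix" "${ip//./-}" "$instance_id"
}

# Possible use in a bootstrap script (assumes IMDSv1 is reachable;
# IMDSv2 additionally requires a session token):
#   ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
#   id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
#   echo "node_name '$(make_node_name eks-worker "$ip" "$id")'" >> /etc/chef/client.rb
```

Including the instance ID is an assumption made here to guarantee uniqueness; the script described above uses the IP address plus a unique number instead.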

That makes sense. I have configured chef-client to run every 30 minutes. I am using chef-client version 15.3.8. Does it include Chef Automate to do the cleanup? Also, as you mentioned, if it is running every 30 minutes, how are existing nodes disappearing? Say, for example, my EKS cluster has 5 desired nodes (these 5 nodes are always available until they are deleted). If autoscaling adds 2 more nodes, the total is 7. After some time, if a scale-down happens, it will delete those 2, and Chef Automate has to clean up those 2. In my scenario it is deleting all worker nodes. Would it fix the issue if I stop using the chef-client cookbook?

If your client registers with chef-server but fails to converge during its first client run at bootstrap, you’ll see the node, but it will go missing. If the run never reached the chef-client recipe in your run list that configures it to run every 30 minutes, then it won’t run again unless manually triggered. It also won’t have a run list assigned in chef-server, as that gets updated when chef-client reports back the results after a successful chef-client run. My guess is you may be hitting a compile error somewhere.
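One way to make a failed first run visible is to wrap the converge command in the bootstrap script and report its exit status. This is only a sketch: the wrapper name first_converge is hypothetical, and the example command assumes firstboot.json lives in /etc/chef.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the given converge command and reports
# clearly whether it succeeded, preserving the command's exit status.
first_converge() {
  local rc
  if "$@"; then
    echo "first converge succeeded"
  else
    rc=$?
    echo "first converge FAILED (exit ${rc})" >&2
    return "$rc"
  fi
}

# Possible use in a bootstrap script (path is an assumption):
#   first_converge chef-client -j /etc/chef/firstboot.json
```

If the failure message shows up in the bootstrap log, the node never finished its first run, which would match the behaviour described above.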

In my script above I added exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1 to log the script’s output and help diagnose bootstrap errors.

I noticed one thing. As you suggested before, the node name is causing the problem. If the node registers as ip-10-140-…, it gets deregistered automatically.