Upgrade Error “MachineConfigPool worker is not ready” in OpenShift

If you manage an OpenShift cluster and hit the error “MachineConfigPool worker is not ready” during an upgrade, the worker pool has not finished rolling out its rendered MachineConfig: one or more worker nodes are still updating or have failed to apply it. The usual cause is a failure in the Machine Config Daemon (MCD), which is responsible for applying configurations to the nodes.

This article covers how to diagnose and fix the issue so that your cluster returns to a healthy state.

1. Diagnosing the Issue

Before applying any fix, the first step is to identify the root cause of the error. OpenShift provides logs and built-in tools that help in troubleshooting.

1.1 Check the Status of the MachineConfigPool

To understand the state of the MachineConfigPool, run:

oc get machineconfigpool worker -o wide

This command returns output similar to the following (the machine-count and age columns are omitted here):

NAME     CONFIG                                  UPDATED   UPDATING   DEGRADED 
worker   rendered-worker-123456789abcdef         False     True       True

Here's what these values indicate:

UPDATED = False → The configuration has not been successfully applied yet.
UPDATING = True → The cluster is still trying to apply the changes.
DEGRADED = True → Something went wrong during the configuration application.
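
The three flags can also be read mechanically, which helps when checking several pools or scripting a health check. The sketch below is one possible way to do it. Both pool_state and check_pool are hypothetical helper names, not part of oc, and check_pool assumes the default column order shown above.

```shell
# Hypothetical helper: map the three status flags to one word.
# Degraded wins over updating, because a degraded pool will not finish on its own.
pool_state() {
  updated="$1"; updating="$2"; degraded="$3"
  if [ "$degraded" = "True" ]; then
    echo "degraded"
  elif [ "$updating" = "True" ]; then
    echo "updating"
  elif [ "$updated" = "True" ]; then
    echo "updated"
  else
    echo "unknown"
  fi
}

# Read the flags for one pool (default: worker) and classify them.
# Assumes the default column order: NAME CONFIG UPDATED UPDATING DEGRADED ...
check_pool() {
  pool="${1:-worker}"
  oc get machineconfigpool "$pool" --no-headers |
    while read -r _name _config updated updating degraded _rest; do
      pool_state "$updated" "$updating" "$degraded"
    done
}

# Usage: check_pool worker
```

For the sample output above (False, True, True), check_pool would print "degraded".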

If the pool is degraded, the Machine Config Daemon (MCD) has most likely failed to apply the configuration on at least one of its nodes.
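
To see which node is holding the pool back, one option is to read the annotations that the MCD maintains on each node (state, currentConfig and desiredConfig under the machineconfiguration.openshift.io prefix). The following is a minimal sketch. The function names are hypothetical, and the filter simply treats any node that is not in state "Done", or whose current and desired configs differ, as unfinished.

```shell
# Print "node<TAB>state<TAB>currentConfig<TAB>desiredConfig" for every worker node,
# using the annotations the Machine Config Daemon maintains on each node.
worker_mc_state() {
  oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/state}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\t"}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}{end}'
}

# Hypothetical filter: keep nodes that are not "Done" or whose current and
# desired configs differ.
unfinished_nodes() {
  awk -F '\t' '$2 != "Done" || $3 != $4 { print $1 ": state=" $2 }'
}

# Usage: worker_mc_state | unfinished_nodes
```

Any node this prints is a good candidate for the log inspection in the next step.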

1.2 Check the MachineConfigDaemon Logs

The MachineConfigDaemon (MCD) runs as a DaemonSet, with one pod per node, and each pod applies the configuration to the node it runs on. To identify potential errors, check the logs of the pods on the affected nodes.

First, list the MachineConfigDaemon pods:

oc get pods -n openshift-machine-config-operator

This will return a list similar to:

NAME                                 READY   STATUS    RESTARTS   AGE 
machine-config-daemon-abcdef         1/1     Running   0          4h 
machine-config-daemon-ghijkl         1/1     Running   0          4h

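Pick the pod corresponding to the problematic node (adding -o wide to the command above shows a NODE column) and check its logs. The -c flag selects the daemon's own container in case the pod runs more than one:

oc logs -n openshift-machine-config-operator machine-config-daemon-abcdef -c machine-config-daemon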

If the MCD hit a problem, the logs will show it, for example a failure during reboot, an authentication issue, or a failure to apply the new configuration.
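
On a cluster with many nodes, finding the right pod and scanning its log by eye gets tedious. The sketch below shows one way to automate both steps. It assumes the MCD pods carry the label k8s-app=machine-config-daemon (the DaemonSet default, as far as I know) and that the daemon container is named machine-config-daemon. The function names and the grep pattern are illustrative, and the pattern is only a starting point.

```shell
NS=openshift-machine-config-operator

# Name of the MCD pod scheduled on a given node.
# The k8s-app=machine-config-daemon label is assumed from the default DaemonSet.
mcd_pod_for_node() {
  oc get pods -n "$NS" -l k8s-app=machine-config-daemon \
    --field-selector "spec.nodeName=$1" -o name
}

# Hypothetical filter: keep log lines that look like failures.
# The pattern is a starting point, not an exhaustive list of MCD messages.
mcd_errors() {
  grep -iE 'error|fail|degraded' || true
}

# Print suspicious log lines from the MCD pod on one node.
mcd_errors_for_node() {
  pod="$(mcd_pod_for_node "$1")"
  oc logs -n "$NS" "$pod" -c machine-config-daemon | mcd_errors
}

# Usage: mcd_errors_for_node <node-name>
```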

2. Fixing the Issue

If the logs indicate that the MachineConfigDaemon is stuck or failed to apply the update correctly, you can force it to retry the process.

2.1 Forcing Configuration Reapplication

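Run the following commands on the affected node, for example over SSH as the core user or from an oc debug node/<node-name> session after running chroot /host: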

sudo rm /etc/machine-config-daemon/currentconfig 
sudo touch /run/machine-config-daemon-force

The first command removes the daemon's on-disk record of the current configuration, allowing it to attempt the process again.
The second command creates a marker file that tells the daemon to force the configuration through instead of stopping at its validation of the node's current state.
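
If you have no direct login on the node, the same two commands can be run through a debug pod. This is a minimal sketch, assuming your account is allowed to run oc debug against nodes. force_mcd_reapply is a hypothetical helper name, and rm -f is used instead of rm so that a missing file does not abort the second command.

```shell
# Run the two commands from this section on a node through a debug pod.
# chroot /host switches into the node's own filesystem; the debug shell is
# already root, so sudo is not needed.
force_mcd_reapply() {
  node="$1"
  oc debug "node/$node" -- chroot /host sh -c \
    'rm -f /etc/machine-config-daemon/currentconfig && touch /run/machine-config-daemon-force'
}

# Usage: force_mcd_reapply <node-name>
```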

After that, restart the node:

sudo reboot

2.2 Verifying the Fix

Once the node has rebooted and its MCD pod is running again, check the MachineConfigPool status again:

oc get machineconfigpool worker

If everything is working correctly, the expected output will be:

NAME     CONFIG                                  UPDATED   UPDATING   DEGRADED 
worker   rendered-worker-abcdef123456            True      False      False

At this point, the OpenShift update should continue normally.
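
Instead of re-running the status command by hand, you can block until the pool reports the Updated condition. The sketch below wraps oc wait for that purpose. wait_for_pool is a hypothetical helper name, and the 30-minute default timeout is an arbitrary assumption that you should adjust to the size of your pool.

```shell
# Block until the pool reports the Updated condition, or give up after a timeout.
wait_for_pool() {
  pool="${1:-worker}"
  timeout="${2:-30m}"
  oc wait "machineconfigpool/$pool" --for=condition=Updated --timeout="$timeout"
}

# Usage: wait_for_pool worker 45m && oc get clusterversion
```

Chaining oc get clusterversion after a successful wait gives a quick view of whether the overall upgrade is progressing again.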

💡 About Me

I’m Gabriel Carmo, a technology enthusiast (especially Open Source). I have experience in Cloud, Kubernetes, OpenShift, Zabbix, Dynatrace, and much more! Always exploring new technologies and sharing knowledge. 🚀

📬 Let’s Connect?
📧 Contact: contato@gabrielandre.com.br