Let's Discuss: Healthy Containers and Devices for Embedded Linux products

What is a healthy container?

What makes a device healthy when it consists only of a BSP and containers?

What should you do if you don’t have a healthy container?

What happens if you have an unhealthy device?

Let's discuss!

@anibal how about we start like this:

  1. All READY containers that have reached their state goal are considered healthy by default.
  2. A container can mark itself unhealthy by posting through the REST API.
  3. … or a health_probe would have to fail X times in a row, as in k8s (X to be configurable in run.json).
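
To make the three rules concrete, here is a minimal sketch of how they could combine. All names are hypothetical: `ContainerHealth`, `failure_threshold` (standing in for the "X" from run.json), and the method names are illustrations only, not an existing API. It also assumes that a container which is not READY is simply not counted as healthy.

```python
from dataclasses import dataclass


@dataclass
class ContainerHealth:
    """Hypothetical per-container health tracker (illustration only)."""

    failure_threshold: int = 3          # the "X" that would come from run.json
    ready: bool = False                 # container reached READY / its state goal
    self_reported_unhealthy: bool = False
    consecutive_failures: int = 0

    def report_unhealthy(self) -> None:
        # Rule 2: the container marks itself unhealthy (e.g. via a REST call).
        self.self_reported_unhealthy = True

    def record_probe(self, success: bool) -> None:
        # Rule 3: count failures in a row; any success resets the counter.
        if success:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def is_healthy(self) -> bool:
        # Assumption: a container that is not READY is not counted as healthy.
        if not self.ready:
            return False
        if self.self_reported_unhealthy:
            return False
        # Rule 1: READY containers are healthy unless the probe threshold is hit.
        return self.consecutive_failures < self.failure_threshold
```

With `failure_threshold=2`, a single failed probe followed by a success keeps the container healthy; only two failures in a row flip it to unhealthy.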

Not sure yet how best to react to unhealthy containers, but a very naive way to think about this could be:

  1. for containers that need a system restart on update → we restart the system
  2. for containers that need a container restart on update → we restart the container

If, in scenario 2, the container no longer reaches the READY state, we restart the system.
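
A rough sketch of that naive reaction, including the escalation for scenario 2. The enum and function names are made up for illustration, and the `container_restart_failed` flag is an assumption about how "does not reach READY anymore" would be signalled.

```python
from enum import Enum


class RestartPolicy(Enum):
    """Hypothetical: what the container needs on update."""
    SYSTEM = "system"
    CONTAINER = "container"


class Action(Enum):
    RESTART_SYSTEM = "restart_system"
    RESTART_CONTAINER = "restart_container"


def react_to_unhealthy(policy: RestartPolicy,
                       container_restart_failed: bool = False) -> Action:
    """Pick a reaction for an unhealthy container based on its restart policy."""
    if policy is RestartPolicy.SYSTEM:
        # Scenario 1: the container needs a system restart.
        return Action.RESTART_SYSTEM
    if container_restart_failed:
        # Scenario 2 escalation: the container did not reach READY again.
        return Action.RESTART_SYSTEM
    # Scenario 2: restart just the container.
    return Action.RESTART_CONTAINER
```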

I agree with restarting the system or the container, according to its policy, when a container is unhealthy. We could also roll back if the device is in the TESTING stage, no matter the policy. How about that?

Yes, I agree. During TESTING we want to apply extra scrutiny that an update is good and works, so +1 on rolling back if any container goes unhealthy during this period … at least initially.
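
The TESTING override could be layered on top of whatever the policy decides, roughly like this. The function name, the string values, and any stage name other than TESTING are placeholders for illustration.

```python
def choose_action(device_stage: str, policy_action: str) -> str:
    """Decide what to do when a container goes unhealthy.

    device_stage:  current stage of the device; only "TESTING" is special here.
    policy_action: what the container's own policy would do, e.g.
                   "restart_system" or "restart_container" (placeholder values).
    """
    if device_stage == "TESTING":
        # During TESTING, any unhealthy container triggers a rollback,
        # no matter what the container's policy says.
        return "rollback"
    return policy_action
```

So `choose_action("TESTING", "restart_container")` gives `"rollback"`, while outside TESTING the policy's own action is returned unchanged.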