I write web apps that show system health information, for support purposes. Whenever I talk to my boss about the general direction of what I'm writing he says, "I want one page that shows me everything."

This is an enormous company, with tens of millions of customers, and an infrastructure so big that there are literally millions of potential points of failure.

I hear this from management sorts all the time: one page that shows me EVERYTHING. To me, that means he wants a red or green indicator that he can quickly check on his iPhone while he's skiing.

I'm afraid that managing this kind of infrastructure is a bit more complicated than that. If it were that simple, you wouldn't have anyone to manage.

    How about Icinga? There is a view that displays only the servers that have at least one failing test, and a view that displays the status of the servers in a circle. The only pain is configuring it. After that, it's easy to maintain.
