Linux
44d

I am running a small, but growing, Ceph cluster at work, since it is fun and our storage demand grows every day.

Today, it was time to bring another node online and add another 12TB to the cluster.
Installation of the OS went fine, network settings fine, drives look fine.

Now, time to add it to the cluster... BAM!
Every Dell machine in the cluster: dead.
The two HP machines are online and running, but the Dell machines just died.

WAT!?

Comments
  • Dell UEFI is a helluva drug.
  • Sounds like the HP machines are murderers
  • @electrineer

    But the HP machines had already been part of the cluster for months.
  • Ever found out what this was?
  • I smell vendor-specific hardware differences... :)
  • Good luck dealing with that :\
  • HP: *murders Dell devices*

    HP: "See? We are superior, you should ditch Dell and buy our stuff!"
  • @xalys

    Not yet
  • Theory number one: power spikes
    Theory number two: a bug in the network cards
    Theory number three: the Dell servers had an orgy, but the HP servers were not invited, so they killed the Dell servers
  • @Linux Curious: what series were those Dells? Also, did the servers themselves physically die, or did the cluster nodes on those servers die? Coincidentally, a couple of weeks ago I was reading up on Ceph, BeeGFS, and GlusterFS, and some people were saying Ceph tends to act weirdly under sudden load spikes.
  • @theKarlisK

    All I got was null bytes in every log, so I am not sure yet. I have not been able to try triggering it again.

    I had not heard that Ceph can act that way, actually. That is worrying...
  • Also, the machines are Dell 630s and HP ProLiant DL360 G8s.
  • @Linux It could have been an issue limited to an older version or to certain situations. Also, try as I might, I couldn't find the source of my claims.

    But from what I can recall, the issue did not "kill servers"; it only put them under unexpectedly increased hardware load, and it only showed up under sudden spikes. I doubt this is connected.

    Side note: I asked about the model just in case there is something I need to look out for.
  • @theKarlisK

    It is unlikely, since the logs are just null bytes, and I doubt it is because of stress or load.
  • @SortOfTested

    I just ran into that....
  • @Linux
    This is the future. 😿
  • @SortOfTested

    Fuck my ass

    I am quitting.
  • @Linux Wait... am I reading that correctly? Is that basically the "BIOS" crashing? XD
  • @FinlayDaG33k

    Yes, the diagnostic tools are crashing.
  • Crazy that @SortOfTested called it.