15
Condor
51d

Let's talk a bit about CA-based SSH and TOFU (trust on first use), because this is really why I hate the guts of how SSH works by default (TOFU) and why I'm amazed that so few people even know about certificate-based SSH.

So for a while now I've been ogling CA-based SSH to solve the issues with key distribution and replacement. Because SSH does 2-way verification, this is relevant to both the host key (which changes on e.g. reinstallation) and user keys (ever replaced one? Yeah that's the problem).

So in my own network I've signed all my devices' host keys a few days ago (user keys will come later). And it works great! Except... Because I wanted to "do it right straight away" I signed only the ED25519 keys on each host, because IMO that's what all the keys should be using. My user keys use it, and among others the host keys use it too. But not by default, which brings me back to this error message.
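For illustration, a minimal sketch of what signing one ED25519 host key involves, assuming OpenSSH's ssh-keygen. The CA key path, certificate identity, hostnames and the 52-week validity are made-up placeholders, and the script only assembles and prints the command rather than running it:

```python
import shlex


def host_cert_command(ca_key, host_pubkey, identity, hostnames, validity="+52w"):
    """Build the ssh-keygen argv that signs a host public key with a CA key."""
    return [
        "ssh-keygen",
        "-s", ca_key,               # CA private key used for signing
        "-I", identity,             # certificate identity (shows up in logs)
        "-h",                       # issue a host certificate, not a user one
        "-n", ",".join(hostnames),  # principals: names the host is reached by
        "-V", validity,             # validity interval, here 52 weeks from now
        host_pubkey,                # public key to sign
    ]


# Hypothetical paths and names, for illustration only.
cmd = host_cert_command(
    ca_key="ca/host_ca",
    host_pubkey="/etc/ssh/ssh_host_ed25519_key.pub",
    identity="nas.lan host key",
    hostnames=["nas.lan", "nas"],
)
print(shlex.join(cmd))
```

Running the printed command should leave a ssh_host_ed25519_key-cert.pub next to the public key, which sshd is then pointed at with a HostCertificate line in sshd_config.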

If you look closely you'd find that the host key did not actually change. That host hasn't been replaced. What differs is the key this client pinned initially (i.e. TOFU at work) versus the key it's being presented now. The key it's comparing against is ECDSA, which is one of the host key types you'd find in /etc/ssh. But RSA is the default for user keys, so God knows why ECDSA was the one being served... Anyway, the SSH servers apparently prefer signed keys, so what is being served now is an ED25519 key. And TOFU breaks and generates this atrocity of a warning.
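The failure mode can be sketched with a toy model of trust on first use. This is a deliberately simplified illustration, not OpenSSH's actual host key logic; the hostname and key blobs are placeholders:

```python
def tofu_check(known_hosts, host, key_type, key_blob):
    """Toy trust-on-first-use check: pin the first key seen, compare afterwards."""
    pinned = known_hosts.get(host)
    if pinned is None:
        known_hosts[host] = (key_type, key_blob)  # first use: trust blindly
        return "pinned"
    if pinned == (key_type, key_blob):
        return "ok"
    return "mismatch"  # this is where the scary warning comes from


known = {}
first = tofu_check(known, "nas.lan", "ecdsa-sha2-nistp256", "AAAA...ecdsa")
again = tofu_check(known, "nas.lan", "ecdsa-sha2-nistp256", "AAAA...ecdsa")
# Same host, nothing reinstalled, but a different key is served now.
later = tofu_check(known, "nas.lan", "ssh-ed25519", "AAAA...ed25519")
print(first, again, later)  # pinned ok mismatch
```

The client has no way to tell "the server started offering another one of its own keys" apart from "someone is impersonating the server", because all it ever stored was the one key it happened to see first.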

This is peak TOFU at its worst really, and with the CA now replacing it I can't help but think that this is TOFU's last scream into the void, a climax of how terrible it is. Use CAs, everyone, it's so much better than this default dumpster fire doing its thing.

PS: yes I know how to solve it. Remove .ssh/known_hosts and put the CA as a known host there instead. This is just to illustrate a point.
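As a small sketch of that fix, the known_hosts entry that trusts a CA is a single line starting with the @cert-authority marker. The host pattern and the CA public key below are placeholders, and the helper name is made up for this example:

```python
def cert_authority_line(host_pattern, ca_pubkey):
    """Format a known_hosts entry that trusts a CA for matching hostnames."""
    key_type, key_blob = ca_pubkey.split()[:2]  # drop the trailing comment
    return f"@cert-authority {host_pattern} {key_type} {key_blob}"


# Placeholder CA public key; a real one is the contents of the CA's .pub file.
ca_pub = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...placeholder host-ca@example"
line = cert_authority_line("*.lan", ca_pub)
print(line)
```

With a line like that in place, any host matching the pattern that presents a certificate signed by this CA is accepted without a per-host entry.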

Also if you're interested in learning about CA-based SSH, check out https://ibug.io/blog/2019/... and https://dmuth.org/ssh-at-scale-cas-... - these really helped me out when I started deploying the CA-based authentication model.

Comments
  • 1
    I've always felt like this was weird. Very few applications handle CAs properly in my experience.
  • 0
    Yes. No.

    Muh.

    I don't see an alternative to TOFU.

    I understand your trouble - although I would simply disable all server host keys except ECDSA if I were going through the hassle.

    If there are clients too old to connect, they're most likely a security hazard anyway.

    Trust on first use has no alternative afaik. Am I wrong?
  • 1
    @IntrusionCM a CA is that alternative to TOFU. With the CA, your clients all trust it and its signatures if the server can present one. Conversely with user keys, they get assigned something called principals, which are usually the authorized users on the server. When the server trusts the CA, it will also trust signed keys and their principals.

    When the keys are rotated, they can be re-signed by the CA and there will be no further trust issues in the network.

    Outdated SSH implementations are indeed an issue, there are even some that only do RSA type keys (so ed25519 isn't even supported either). On Termux or Linux/WSL/Windows desktop clients it shouldn't be an issue. Embedded implementations in applications (such as FX, otherwise a great file explorer but its SSH implementation is severely limited) however can cause issues.
  • 1
    Yeah, I just roll my eyes at anyone self signing with no actual authority and talking about how secure they are. The dumbs run the world.

    Fedora and RH actually just announced that weak RSA is no longer allowed and in future versions won't be supported at all.
  • 1
    Out of curiosity:

    How do you revoke certificates?
    How do you automate certificate enrolment?
    Why not Kerberos?
  • 1
    @Condor hm, I thought that even with a CA you needed to connect to the machine via SSH to deploy the initial config:

    With push from server to client it would work without TOFU on the client side, but the server would have the host info stored.

    With pull by the client from the server it would be on the client side.

    But without IaC deploying is a real PITA.
  • 1
    And yeah. I meant the 25519 thingie.

    Not ecdsa... I always confuse those things.

    The name ed25519 just doesn't stick in my brain.

    *sad face*
  • 1
    @sbiewald

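    It seems like revocation is possible but clunky: OpenSSH has key revocation lists (RevokedKeys in sshd_config, @revoked lines in known_hosts), but nothing distributes them for you. There is an incentive to make key certificates short-lived instead... This could be a day or a week, or in my case I initially signed my keys for a year. Either way it should ideally be as short as reasonably possible.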

    There is a tooling gap there. I guess it kinda depends on where the CA's signing key is stored. Here I made an Alpine container that's a dedicated server just for that. Other people store their CA key locally and sign their keys that way. There are pros and cons to each. I'm writing some tooling to make the signing network-wide and somewhat automated (signing key is password-protected so it can't be fully automated without serious security implications).

    I'm not very familiar with Kerberos myself but as far as I'm aware that's (partly) a ticketing server? The CA with short-lived keys could more or less act as a ticketing server. E.g. Facebook uses an entrance node that requests a signature (https://0x0.st/N1C2).
  • 1
    @Condor Depends.

    I once did a kind of jolly adventure time shit...

    There was an LDAP server.

    And an Artifactory server connected to the LDAP server.

    And Artifactory can (if you let the LDAP users be created in Artifactory) create encrypted passwords and API tokens.

    What I did in a nutshell was use a Python script that did a REST call against Artifactory with the encrypted pass to fetch a single file stored in Artifactory, which contained a randomly generated passphrase for an SSH signing key.

    The Artifactory was in complete lockdown. TLS 1.2 only, no anonymous access, and the reverse proxy was whitelist only. You couldn't simply call URLs - IP and path ACLs.

    The artifactory idea came up since I never like connecting LDAP directly to something.

    The SSH key was used to deploy to all machines (Push from server to clients).

    I hope it's understandable...
  • 0
    @IntrusionCM never heard of Artifactory so far, but glancing at their website and if I understand the workflow correctly, it would be pretty much entirely automated? That's a really interesting way to do it and seems secure. Will certainly take this with me in my network design too, thanks!
  • 1
    One thing I forgot.

    The deployment took place via SSH - but was triggered via a web interface on the deployment machine.

    So.
    You logon via Web Interface, Auth against Artifactory, Artifactory against LDAP.

    You select what you want to deploy.

    Same thing happening, fetching credentials from Artifactory after auth, doing deploy.

    This way you have a clear and evident auth log that cannot be manipulated.

    It exists on 3 machines. SSH Deployment, Artifactory, LDAP.

    The SSH Deployment server is useless without Artifactory as the key cannot be decrypted.

    Artifactory is useless without LDAP and the reverse proxy on the Artifactory hosts and the Artifactory permissions lock down all access.

    The SSH Deployment server has a firewall and heavy restrictions on auth, too.

    It's a Russian matryoshka doll
  • 2
    @Condor Artifactory is a universal cache system.

    Depending on license - it acts as a universal proxy cache and private registry.

    Apt, Maven, Ivy, Docker... Anything in the Pro plus licenses.
  • 2
    @IntrusionCM
    I love Artifactory, though I think their SKUs need work. At a former client we had started them on cloud due to a sales pitch they'd received, and then hit the wall on storage and traffic. The quote for enterprise+ came to $12,000/month. We moved it on prem, then eventually sunset it in favor of GitLab.
  • 2
    @SortOfTested I dislike that it's quite ... Hard.

    There's a lot of documentation.

    But you'll have to really grind your gears to make it work well.

    A lot of gems are hidden behind the most simplistic and dumb UI. Don't get me wrong - I like Artifactory, but it's really expensive if you don't make use of the features _you don't see_ in the UI.

    That Artifactory setup had a whole lot more functionality, as a lot of scripts and an - additional - "maintenance" web UI wrapping the scripts were added to it.

    Self hosted... I don't like cloud unless it's unavoidable. And usually it's less expensive.

    If you don't do a whole lot of different registry types, Artifactory is a wrong choice.

    Eg Harbor for Docker, Verdaccio for NPM and so on.
  • 2
    @IntrusionCM
    Yep. They make most of their sales under the auspices of being "the only ball of X you'll ever need." Works like a bat out of hell when it's done right, but you're going to hate life if you have an amateur managing it.
  • 2
    @Condor About Kerberos:

    All participants share (long term) symmetric keys ("passwords") with a central instance.
    One authenticates against the central instance, which issues personal short-term keys (+ metadata = "ticket") for sessions, decryptable by the participants with their respective long-term keys.
    All tickets are short-lived, and the long-term keys can be changed whenever one wishes (the previous one is usually kept to not break existing tickets).

    It more or less solves the same problem, with some notable differences (and by default without any public key operations).
  • 2
    @SortOfTested Amateurs and package management ... Oh boy.

    That's a disastrous nightmare.

    I always joke that I'm a lunatic as I mostly try to hide services like artifactory behind an own reverse proxy and mask it completely.

    Eg you don't even use the artifactory URLs, but instead pre defined URLs that follow a hierarchy based on subdomain and path

    (eg maven.packages.com/jcenter/release, maven.packages.com/jcenter/snapshots, maven.packages.com/myteam/release and so on...)

    Hell of setup, but you don't end up with a hornets nest in your pants when something changes.

    And you can way better put everything under complete lock down.

    Another thing where I think I'm entirely insane...

    TLS 1.3 is a real blessing and active in most setups I own. ;)

    And reverse proxies are really good at ACLing and locking shit down so no one can eg access UI on repository URLs.
  • 1
    Only thing I don't like about that is the custom dns need. It certainly works though. Very gateway.

    Everyone loves to scrimp on the admins nowadays, so the people running the servers here tend to be those who can barely speak English and are maybe a few years out of school. It's a disasterpiece and a miracle that anyone is still in business.
  • 1
    @SortOfTested DNS yes.

    But nowadays without DNS... Not really possible.

    Another thing that got completely abused and does things nowadays which it was never designed for :(