speculative exevution

Ranter

lorentz

15165

Comments

3

Root

77451

4y

Intel: “Fuck it, let’s just guess!”

That pretty much describes their business model over the past fifteen years.
3

Fast-Nop

36817

4y

@iiii It also would be a lot slower, as evidenced by the performance impact of the mitigations.

@Root Don't forget anti-competitive bribery as core part of that shit company.
2

Fast-Nop

36817

4y

@iiii CPUs without support for speculative execution are slow from the beginning because the mitigations basically bog down modern CPUs to these crap levels.
1

lorentz

15165

4y

@Fast-Nop I'm no expert but may be other possibilities for improvement that were dismissed because they didn't combine well with speculative execution.
2

Fast-Nop

36817

4y

@lbfalvy @iiii The underlying structural problem is the same for all architectures: RAM is slow, and cache is small. Either you accept a slow CPU as result of waiting on RAM instead of calculating, or you start pulling up tricks.
0

lorentz

15165

4y

@Fast-Nop You could also introduce a new feature instead of tricks and give devs the tools to own the device, eg. by allowing running programs to invalidate cached memory regions.
0

Fast-Nop

36817

4y

@iiii It's fast enough because of these tricks. Otherwise, you'd look at something like an netbook Atom because there would be no point in cranking it up just so that it waits more on RAM.
1

Fast-Nop

36817

4y

@iiii Yeah, and my point is that the single-thread performance was also awful, so rigging several of these together would still suck.

With equal total computing power, you want that to be split across as few cores as possible. It took AMD years and billions of losses to finally understand that, and once they got it, big profits started rolling in. The market, i.e. the customers, decided clearly.

More cache doesn't solve the problem that cache misses are so much more expensive than cache hits that you don't need many misses to ruin performance. In addition to the slow RAM access, you also have a stalled pipeline to account for, adding even more delay.
1

Fast-Nop

36817

4y

@iiii Yeah, shifting the work to a super smart compiler was the idea behind Intel's Itanic. Turned out that... nope.

That multi-core thing won't work because the results of the work core determine which data and instructions the fetch core should even grab, and at that point, you're back to square one.

Given that speculative execution is pervasive not only with AMD and Intel, but also with ARM, we can conclude that is what the market wanted. The main category of chip that doesn't have anything like that and is still in use, that's microcontrollers.
1

RememberMe

13712

4y

Aggressive speculation is one of the core ideas behind why general purpose code can run so fast today. Even if you build other architectures, as long as a lot of dynamic control flow and memory data flow exist, there's not much else you can do, it'll still have to speculate. Basically, what @Fast-Nop said.

To add to the another core feeding the current core idea @iiii that's valuable actually, it's exactly what prefetch units do. Already exists in basically every processor, been there for a long time. Processors prefetch a LOT, so much that L2 sometimes becomes a glorified prefetch buffer. They aren't going to wait around for your code to tell them what to do. Compilers can also help by inserting prefetch hints or instructions.

@lbfalvy there are already processors that give programs manual access to the cache (they usually call it "scratchpad" memory then) but it's usually a terrible idea because cache is a microarchitectural detail - you do not want to expose it at the architectural level (i.e. the ISA, i.e. up for programmer control). For one thing it would make code dependent on microarchitecture, which is horrible for portability (ref. the MIPS branch delay slot headache). It's fine for specialized applications, but breaking the ISA is not good for general purpose stuff because it introduces coupling between hardware and software development.

Even if you have software be the abstraction instead - eg. you have something like Java, and hardware is free to implement the corresponding JVM how ever it likes - unless everyone agrees to such a universal VM standard, it's not going to happen, which is why it's traditionally enforced by hardware instead. And it's basically the same thing in the end, with the only difference being who writes the documentation (you can think of x86, ARM, RISC-V, SPARC etc. as being different hardware VMs on top of microarchitecture).
0

hjk101

5546

4y

@iiii we seem to move more and more towards RISC. I think there are quite a few issues with older tech too (we advanced our attacks and hardware protections a lot) but yeah the CISC CPU's these days are vastly complex and with it increase the attack surface.
0

hjk101

5546

4y

@iiii with the advancement of RISC I think that is not necessarily true or at least the complexity shifts to the application (OS is counted as application in this case).
The issues discovered are in theory easier to fix as they are never in hardware but more fragmented so potentially longer existing. Still I think that RISC is less risky in this regard.
2

Fast-Nop

36817

4y

@hjk101 ARM was also hit by Meltdown and Spectre, exactly because they do the same tricks. See here: https://wiki.netbsd.org/security/...

Also internally, x86 CPUs have long in fact been RISC CPUs from an architecture POV. The CISC part is only the ISA and the decoder, the latter taking a negligible part of the die space.
2

RememberMe

13712

4y

@hjk101 in fact it's RISC that enables many of those speculative execution tricks, because it's easier to track, schedule, and rollback simple instructions. That's a big reason why these processors went RISC internally and why RISC is successful.

@Fast-Nop minor quibble though: they're RISC from a microarchitecture* PoV, not architecture.

"Architecture" is what you show to the programmer - the ISA basically. It's logical. It's high level.

"Microarchitecture" and organization are how it's actually implemented. It surprises many people that books on "computer organization" (like the classic P&H) are actually much lower level than books on "computer architecture" (like the classic H&P).

Sorry for nitpicking, just had to :p
1

Fast-Nop

36817

4y

@RememberMe I meant architecture from a chip POV. The last actual x86, i.e. with actual x86 architecture, was the Pentium in the 1990s.

The ISA is an architecture, but not of the CPU itself - instead, of the instruction set. Basically the instruction set design.
2

RememberMe

13712

4y

@Fast-Nop I get what you meant, but "Architecture" has a very specific technical meaning in comp. arch., and it's what I said above (well, sort of. Simplification yes, but reasonably accurate).

For example in a processor, we refer to "architectural state" as the state visible to the programmer/state according to the manual; and "microarchitectural state" as the actual internal state of the processor - rename register files, reorder buffer, issue queues/instruction window/reservation stations, load-store queues, prefetch buffers, etc. Every processor *has* to maintain architectural state according to the manual, but is free to use whatever microarchitecture it wants to do so. Generally the reorder buffer and architectural register file(s) take care of converting from microarch to arch state.

So for eg. when you see a register value update in a debugger, that's architectural state, but the update may actually have happened many cycles ago in the microarchitectural state of the execution engine, the processor just didn't tell you that it happened yet because architectural specifications (eg. that it must appear sequential) didn't let it. Similarly x86 is an arch specification, the uop based RISC implementation is a microarch detail.
2

Fast-Nop

36817

4y

@iiii No. Right now, their performance sucks, and they are only actually used as microcontrollers where they compete with ARM Cortex-M0, the low end of Cortex-M. That's why ARM went into panic mode and relaxed their licensing in exactly that controller class.

If they were to compete with ARM Cortex-A, such as in smartphones and tablets, also RISC-V would need such silicon hacks.

The fastest RISC-V right now, SiFive P550, needs four cores to match a single Cortex-A75 core (from 2017!), and that's already with out of order execution for the P550.

The main advantage of RISC-V is a different one: the licensing is free. You could get that with proprietary designs, too, but RISC-V also has support from compilers (GCC, Clang), and OS (Linux).
2

hjk101

5546

4y

@Fast-Nop @RememberMe I'm out of my dept here obviously. I had no idea they did that level of branch prediction. Thought that that was more up to compiler to optimise and look ahead but it makes sense to do it in hardware where everything is so much faster (cashes in chip, instructions loaded). Thanks for the correction!

I know that microcode plays a large role in x86 CPU's. Is that what you mean @Fast-Nop? Microcode providing the cisc instructions while the microcode itself consist of risc instructions that are incorporated hardware itself?
2

Fast-Nop

36817

4y

@hjk101 Yeah, the internal RISC micro-ops implement the external CISC instructions so that x86 are basically RISC CPUs that masquerade as CISC CPUs to the outside world in order to maintain binary backwards compatibility.
2

Fast-Nop

36817

4y

Actually, that begs the question whether x86 are the drag queens of the CPU world. 🤔 🤣
1

RememberMe

13712

4y

@Fast-Nop @iiii that's just because SiFive doesn't have the microarchitecture to back it up. Something that can (and will) change over time and with more investment, because that's the hard part of processor design. ISA basically doesn't matter much for performance any more, it's more about which ecosystem you're part of.

The real innovation of riscv has been the common architecture, so presently the coolest use cases aren't processors but in accelerator control. So many (including FPGA based accelerators) use riscv for control code because it has a common infrastructure behind it. It's already making a big difference there. It's so easy to just drop in a riscv core into your design and suddenly your design is fully programmable, with very little extra effort. That was really hard to do earlier.

Related Rants

Add Comment

rant

timing attacks

meltdown

spectre