Comments
-
ARM can keep up with x86, at least in laptops, performance-wise and at lower power consumption; see the discussion in your last rant on this topic.
The kicker, however, is that Apple is using its own ARM implementation, which is well ahead of the competition. Dell & Co don't have that chip design competence, so they can't just mimic Apple's move to ARM.
Also, the ISA is mostly irrelevant these days, with the exception of embedded, where ARM had to launch the mixed 16/32-bit instruction set Thumb-2 in the Cortex-M series to address code density vs. performance. -
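A rough, hypothetical illustration of that code density point (the function and file name below are invented; the flags are standard GCC options): the same C can be built as classic 32-bit ARM code and as Thumb-2 and compared by object size. Cortex-M cores only execute Thumb-2, so the comparison needs an A-profile core that supports both encodings.

/* Hypothetical size comparison with the GNU ARM toolchain, assuming
 * arm-none-eabi-gcc is installed:
 *
 *   arm-none-eabi-gcc -Os -mcpu=cortex-a8 -marm   -c saturate.c -o a32.o
 *   arm-none-eabi-gcc -Os -mcpu=cortex-a8 -mthumb -c saturate.c -o t32.o
 *   arm-none-eabi-size a32.o t32.o
 *
 * Thumb-2 mixes 16- and 32-bit encodings, so the -mthumb object is usually
 * noticeably smaller at comparable performance, which is exactly the density
 * trade-off mentioned above.
 */
int saturate_add(int a, int b, int limit)
{
    int sum = a + b;
    if (sum > limit)
        sum = limit;
    return sum;
}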
ARM is a RISC architecture.
You aren't supposed to have complex one-opcode operations. -
@Parzi No, it isn't, because internally, x86 CPUs are also RISC machines. They decompose the external, complex ops into micro-ops and execute those.
Also, Apple's ARM performance figures don't look like any sort of suffering once you take power consumption into account. -
@Fast-Nop I mean, from what I've seen, something like a fastmult routine wouldn't be faster than the hardware mult opcode, so I'm assuming the same holds for stuff like AES or a bunch of the extra opcode sets, but considering there is no hardware real mode anymore...?
-
@Parzi The implementation of x86 micro-ops depends on the CPU manufacturer and even on the CPU series. It's a compromise between performance and the expected relevance of each instruction.
Also, ARM does have multiplication instructions. What it doesn't have are instructions that manipulate RAM contents directly, because it's a load/store architecture. On the upside, ARM has more registers, so there's less need to spill registers onto the stack. -
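To make the load/store point concrete, here is a rough sketch (not exact compiler output) of what the two ISAs do with the same one-liner:

/* Incrementing a value in memory.
 *
 * x86-64 can express it as a single read-modify-write instruction, roughly:
 *     add dword ptr [rdi], 1
 * which the core still splits into load / add / store micro-ops internally.
 *
 * AArch64 is load/store, so the same thing is roughly three instructions:
 *     ldr w1, [x0]
 *     add w1, w1, #1
 *     str w1, [x0]
 * In exchange, the 31 general-purpose registers mean fewer spills elsewhere.
 */
void increment(int *counter)
{
    *counter += 1;
}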
@Fast-Nop It was an example, not a real case against ARM; I know ARM has a mult instruction. Still, the lack of direct memory manipulation is going to eat even more cycles.
I'm more knowledgeable about limited-cycle CPUs than about it's-okay-to-burn-a-million-cycles CPUs, though, so maybe it's not a massive impact. -
@Parzi In reality, it doesn't. First, these complex instructions are also several micro-ops on x86. Second, less register spill compensates.
Third, optimising compilers take that into account via variable scope analysis so that unnecessary fetch/stores are avoided. -
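A small sketch of what that compiler behaviour looks like in practice (illustrative only; the function is not from the thread):

/* With optimisation enabled, a compiler keeps `sum` and `i` in registers for
 * the whole loop instead of re-loading and re-storing them every iteration,
 * so the inner loop is one load per element on either ISA; the lack of
 * memory-operand instructions barely shows up.
 */
long sum_array(const int *data, long n)
{
    long sum = 0;              /* lives in a register, never spilled */
    for (long i = 0; i < n; i++)
        sum += data[i];        /* one load per element, nothing else */
    return sum;
}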
@Parzi Again, x86 also schedules RISC-like micro-ops, so it really isn't an issue. When a fancy op with a memory access thrown in executes on a modern out-of-order processor, it has to wait in a sort of reservation area for the memory op to complete anyway, so it doesn't really make a difference, because that's what a RISC load/store does too.
Compare real benchmarks of the A12/A13 vs x86 CPUs, keeping power draw in mind, and you'll see that the ISA basically doesn't make a difference these days, and a simpler one like ARM has the advantage of a much simpler and more regular decoder. -
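A toy model of that reservation-area behaviour (purely illustrative, nothing like real hardware; the micro-op names are made up):

/* A decoded micro-op sits in the reservation station until its source operand
 * is ready, regardless of whether it came from a CISC instruction with a
 * memory operand or from a separate RISC load plus an ALU op.
 */
#include <stdbool.h>
#include <stdio.h>

struct uop {
    const char *name;
    int  src;     /* index of the producing uop this one waits for, -1 if none */
    bool done;
};

int main(void)
{
    struct uop rs[] = {
        { "load  [mem] -> t0", -1, false },  /* memory access */
        { "add   t0, 1 -> t1",  0, false },  /* waits for the load's result */
        { "store t1 -> [mem]",  1, false },  /* waits for the add's result */
    };
    int n = sizeof rs / sizeof rs[0];

    /* Issue loop: pick any uop whose dependency has already completed. */
    for (int completed = 0; completed < n; ) {
        for (int i = 0; i < n; i++) {
            if (!rs[i].done && (rs[i].src < 0 || rs[rs[i].src].done)) {
                printf("issue: %s\n", rs[i].name);
                rs[i].done = true;
                completed++;
            }
        }
    }
    return 0;
}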
@Fast-Nop @RememberMe By "micro-ops", do you mean a T/M-cycle relationship, or do you mean it just aliases 8086 ops to one opcode? I don't have a timing chart in front of me, as no one times that shit anymore.
-
@Parzi Internal operations within the CPU; that's what the microcode does: decomposing a complex external opcode instruction into several simple internal ones.
Also, apart from Dell & Co not having Apple's ARM chip design know-how, another issue is that one of Windows' major selling points is endless backwards compatibility for software.
That's why Windows has always been multi-platform but has never succeeded on anything other than x86. The users don't want to ditch all their software. -
@Parzi x86 instructions are implemented as sequences of one or more micro-ops instead of being executed in hardware directly. Think of the x86 ISA as a higher-level language and the micro-ops as the true, hidden ISA, which is basically similar to ARM/RISC-V/MIPS/etc.; the hardware does the translation for you, and hardware cycles execute these micro-ops.
Timing is really hard to nail down these days because processor cores are superscalar, speculative, and out-of-order, with register renaming and lots of other tricks, so it really isn't that simple. -
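One quick way to see that, assuming an x86-64 machine with GCC or Clang (the TSC counts reference cycles, so treat the numbers as illustrative only): the two loops below execute the same instruction mix, yet the cache-hostile one costs far more cycles, which is why fixed per-instruction timing charts stopped being a thing.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc(), GCC/Clang on x86-64 */

#define N (1 << 20)
static int data[N];

int main(void)
{
    long sum = 0;

    uint64_t t0 = __rdtsc();
    for (int i = 0; i < N; i++)                       /* sequential, prefetcher-friendly */
        sum += data[i];

    uint64_t t1 = __rdtsc();
    for (int i = 0; i < N; i++)                       /* same instruction count, but a   */
        sum += data[((unsigned)i * 4099u) & (N - 1)]; /* stride that defeats the caches  */

    uint64_t t2 = __rdtsc();
    printf("sequential: %llu, strided: %llu reference cycles (sum=%ld)\n",
           (unsigned long long)(t1 - t0), (unsigned long long)(t2 - t1), sum);
    return 0;
}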
@RememberMe @Fast-Nop T/M-cycle, got it. (M-cycles are the higher-level, user-presented timings, while T-cycles are the ones the CPU actually performs underneath. Pretty much all CPUs do this, and it's a massive issue in emulation even on systems like the Game Boy and NES, since the T-cycles determine the order in which conflicts are handled and the like.)
-
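For anyone following along, a minimal sketch of why that matters for emulation (the function names are made up; on the Game Boy one M-cycle is four T-cycles, and INC (HL) takes three M-cycles):

#define T_PER_M 4

static void ppu_tick(void)   { /* advance the pixel pipeline by one T-cycle */ }
static void timer_tick(void) { /* advance DIV/TIMA by one T-cycle */ }

/* Charge one M-cycle of CPU work while ticking the rest of the machine, so
 * mid-instruction interactions (DMA, interrupts, sprite fetches) land in the
 * right order instead of being lumped at instruction boundaries. */
static void m_cycle(void)
{
    for (int t = 0; t < T_PER_M; t++) {
        ppu_tick();
        timer_tick();
    }
}

/* INC (HL): opcode fetch, read byte at HL, write it back = 3 M-cycles. */
static void op_inc_hl(void)
{
    m_cycle();   /* fetch */
    m_cycle();   /* read from (HL) */
    m_cycle();   /* write back to (HL) */
}

int main(void)
{
    op_inc_hl();
    return 0;
}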
@RememberMe What's your guess on the performance of Apple's upcoming Rosetta 2?
When Intel tried x86 smartphones with Atom, they had libhoudini for emulating ARM code, and I measured it at 50% of native x86 code on the same device.
I never found out how that lib actually worked, whether as an interpreter or via some sort of JIT compilation. Do you know more? -
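libhoudini's internals don't seem to be publicly documented, so here is only a conceptual sketch of the two approaches being asked about; the guest "ISA" is a made-up two-op bytecode, not real ARM or x86:

#include <stdio.h>

enum { OP_ADD, OP_PRINT, OP_HALT };

static const int guest_code[] = { OP_ADD, 2, OP_ADD, 3, OP_PRINT, OP_HALT };

/* Pure interpretation: the decode-and-dispatch cost is paid again on every
 * executed guest instruction. */
static void interpret(void)
{
    int acc = 0;
    for (int pc = 0; ; ) {
        switch (guest_code[pc]) {
        case OP_ADD:   acc += guest_code[pc + 1]; pc += 2; break;
        case OP_PRINT: printf("acc = %d\n", acc); pc += 1; break;
        case OP_HALT:  return;
        }
    }
}

/* A binary translator (JIT or install-time) would instead emit host code for
 * the whole block once, cache it, and jump straight into the cache on later
 * runs, which is why translated code usually lands much closer to native
 * speed than an interpreter does. */
int main(void)
{
    interpret();
    return 0;
}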
@RememberMe Very true; better or worse can come down to what you are trying to do and how well the tooling fits your needs.
What Apple is doing now is putting a decade of experience with ARM in phones to use for their laptops and desktops.
But there is immense inertia behind x86 that will take at least a few appliance generations to overcome, and Intel and Microsoft have proven that they can adapt, even if it takes some time.
Just like all the “PC killers” that have so far failed to kill the PC.
They only spurred development, to the benefit of customers :) -
@Fast-Nop I'm very curious about that too. I would guess it's an incrementally optimizing JIT compiler (like modern JS engines), but without actually seeing it, no clue. I don't know if licenses would prevent you from compiling already-compiled x86 stuff ahead of time, but a JIT that runs for long enough should get to the same place anyway. Pretty excited to check out how they do it versus, say, how Microsoft does it.
A wild idea would be if Apple did the Intel trick and decoded x86 into ARM instructions in hardware, but then that's just doing what Intel does anyway and subject to the same pitfalls (also ISA differences). Also licensing (that said, a company called Transmeta made a dynamic translator that converted x86 into their own VLIW instruction set; it never could compete in performance, though). -
@RememberMe Wikipedia and several news articles claim that Rosetta 2 works at installation time. So it would convert an Intel binary into an ARM binary, which would run much faster.
I don't think Apple would integrate that into silicon. That would make the chip design much more complex, and it's only meant as a temporary crutch anyway.
On the other hand, Intel can and will claim intellectual property rights to the ISA, so this whole thing will run into lengthy court proceedings. Intel has no interest in supporting any of this, after all.
When MS and Qualcomm went for an x86 emulator in 2017, Intel immediately threatened to sue. The result was that x86-32 was emulated because those patents had long expired, but not x86-64. However, Apple has already moved the Mac to 64 bit, so that wouldn't be an option. -
@electrineer Not exclusively. Intel has also introduced a lot of instructions that are used in real-world binaries. AMD and Intel have a cross-licensing pact, but that doesn't extend to Apple.
And yeah, AMD has as little incentive as Intel to get people away from x86. -
Doesn't matter. Just the potential concept of intel getting fucked is satisfying enough.
https://wccftech.com/windows-pcs-wi...
This is stupid. Apple didn't kill x86 with PPC, and they won't kill it with ARM either. (ARM is cheaper, but it's not really more powerful yet. You can't do a bunch of one-opcode stuff on ARM yet; it needs to mature further. ARM would also need massive retooling for cards and such, and backwards compatibility is gonna be hell.)