mips

Ranter

12bitfloat

10426

Comments

5

RememberMe

13709

5y

Why do you need one when you can do rdest = add(rsrc, rzero)? Take advantage of the hardwired zero register and the three-operand instruction format.

Instructions cost space in the encoding and make the decode stage slower. Adding redundant instructions is pointless. Most RISC code is meant to be generated by compilers or assemblers anyway, not handcoded. And when you do want it, most assemblers give you a pseudo move instruction that gets converted to an add.
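A rough sketch of that pseudo-instruction expansion, for illustration only. The R-type field layout and the funct code for `add` (0x20) are standard MIPS32; the helper names `encode_rtype` and `expand_move` are made up here, and real assemblers often emit `addu` or `or` against $zero rather than `add`.

```python
# Minimal sketch: expanding the `move` pseudo-instruction on MIPS32.
ZERO = 0          # $zero is register 0 and always reads as 0
FUNCT_ADD = 0x20  # funct field for `add` in the R-type format

def encode_rtype(rs, rt, rd, shamt, funct):
    """Pack R-type fields: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    for reg in (rs, rt, rd):
        if not 0 <= reg < 32:
            raise ValueError("register number out of range")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def expand_move(rd, rs):
    """Hypothetical assembler helper: `move rd, rs` becomes `add rd, rs, $zero`."""
    return encode_rtype(rs, ZERO, rd, 0, FUNCT_ADD)

T0, T1 = 8, 9  # conventional register numbers for $t0 and $t1
print(hex(expand_move(T0, T1)))  # encoding of: add $t0, $t1, $zero
```

No opcode or funct slot is spent on the move; it is just one particular operand pattern of an instruction that already exists.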
3

12bitfloat

10426

5y

@RememberMe I think a RISC architecture has the space to encode a simple mov instruction (one of the most used instructions maybe ever!)

Now if you wanted to optimize it to any degree you'd have to special case `add ${?} $zero ${?}` just like x86 special cases `xor ${0} ${0}`...

Does that REALLY help?..
4

Fast-Nop

36813

5y

@RememberMe ARM has both load and mov. Load is for indirect loading, using registers as addresses. Mov is for direct loading from either another register or a constant, and can even bitwise negate the value while loading.
4

RememberMe

13709

5y

@12bitfloat nope, RISC ISAs, especially academic ones like MIPS (and RISC-V), are very carefully designed and it's always hard to squeeze all the stuff you want into the encoding space you're given. You only add a new instruction when it helps the hardware decide faster what to do.

Adds with zeros can be optimized in the hardware and I imagine they are in commercial implementations. In an in-order processor they just get muxed into the next pipeline stage, which avoids the energy cost of activating the ALU for a useless addition (it doesn't make it *faster* because you have to wait for the entire clock cycle anyway).

I imagine in an out of order processor this would also go through an optimized pathway where you don't have to wait for ALU substages if the ALU is also pipelined, so the issue queue can throw another instruction into the integer unit in the very next cycle (or not use the integer unit at all, decreasing pressure on the integer units).

So what you're saying comes down to extending the primary decoder with an extra condition, or the arithmetic decoder with an extra condition. I would say this makes basically no difference, so why would you waste instruction space for nothing?
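To make "an extra condition in the decoder" concrete, here is a software sketch of the check, not a description of any real core. The funct codes for `add`, `addu` and `or` are standard MIPS32; the function name `is_register_move` is invented for illustration, and actual hardware would do this with a few gates rather than code.

```python
# Sketch: spotting "add/addu/or with $zero" as a plain register move.
MOVE_LIKE_FUNCTS = {0x20, 0x21, 0x25}  # add, addu, or

def is_register_move(word):
    """Return True if a 32-bit MIPS instruction word just copies one register."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    shamt = (word >> 6) & 0x1F
    funct = word & 0x3F
    if opcode != 0 or shamt != 0 or funct not in MOVE_LIKE_FUNCTS:
        return False
    # One source being $zero (register 0) means the result equals the other source.
    return rs == 0 or rt == 0

print(is_register_move(0x01204020))  # add $t0, $t1, $zero
print(is_register_move(0x012A4020))  # add $t0, $t1, $t2
```

The whole test is a comparison on fields the decoder extracts anyway, which is why it costs next to nothing compared to a dedicated opcode.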

@Fast-Nop I'm no expert on ARM so I could be wrong, but that seems to be for loading immediate constants. The one big downside of

add rdest, rzero, const

is that the constant can't be big because encoding the registers takes bits. So you have a mov with only two operands which allows bigger constants. MIPS and RISC-V solve this using a LUI instruction which is orthogonal to add (unlike mov). Keil docs say add takes a 12-bit imm while mov takes a 16 bit imm. ARM is also a legacy-ridden mess while MIPS and RISC-V were designed to be "pure" RISC by academia.

ISA differences don't really matter much to execution, most implementations look the same. So even if MIPS and RISC-V don't have a mov, they will have circuitry that does pretty much what an x86 processor would do.
3

Fast-Nop

36813

5y

@RememberMe Yeah it's mostly for immediates, but also for moving from register to register.

Though the 12 bit thing in the other instructions has been done really cleverly - it's not actually a 12 bit constant, but allows for several different bit patterns such as shifted 8 bit constants, or patterns with a certain symmetry.
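A small sketch of the idea, under a stated assumption: it models the classic 32-bit ARM (A32) rule, where the 12 bits hold an 8-bit value plus a 4-bit rotation that is applied in steps of two. Thumb-2 uses a different scheme that also covers the replicated-byte ("symmetric") patterns, which is not modelled here. The function names are made up for illustration.

```python
# Sketch: can a 32-bit constant be written as an A32 "rotated 8-bit" immediate?
def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def a32_encodable(value):
    """True if value equals some 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot`; if what remains fits in 8 bits, it encodes.
        if rotl32(value, rot) <= 0xFF:
            return True
    return False

for const in (0xFF, 0xFF000000, 0x3FC, 0xF000000F, 0x101, 0x1FE):
    print(hex(const), a32_encodable(const))
```

Typical bitmasks like 0xFF000000 fit, while something like 0x101 (set bits spread over 9 positions) or 0x1FE (would need an odd rotation) does not.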

I guess you could drop the mov if you cancel some advanced indirect addressing modes. I mean, some are certainly useful, like immediate offsets (for structs) or incrementing the address register after loading (for pointer arithmetic in loops).

But using another register as offset or as increment amount seems a bit overdone, especially with even optional shifting included. I think that's a bit too CISCy, especially because the compiler usually should be able to use the simpler modes above with optimisation. E.g. implementing a loop over array[i] with pointer arithmetic, which would also use only one register instead of two.
3

Kimmax

10695

5y

Don't know wtf you're talking about, but yeah.. That's the way
3

RememberMe

13709

5y

@Fast-Nop yeah, ARM immediate handling is certainly clever, and it wouldn't cost much. I just wonder how much difference it actually makes versus just sticking in an extra instruction. From profiling data we compiled, most immediate constants tend to be fairly small (so you can compute fancy ones at compile time) so I suspect it's not actually that big a performance improvement, especially in an out of order processor where the out of order-ness hides a lot of delays that you might think exist.

I don't think RISC-V or MIPS use another reg for offset or increment; both have add-immediate instructions. For longer jumps, RISC-V's LUI instruction lets you write a 20-bit shifted immediate (imm20 << 12) to a register (for absolute 32-bit addresses) and AUIPC does the same but also adds the current PC value for relative jumps. iirc.
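For what it's worth, here is a sketch of how a 32-bit constant is usually split for a LUI + ADDI pair on RV32. The one subtlety is that ADDI sign-extends its 12-bit immediate, so the upper part gets bumped by one when bit 11 of the constant is set. The function names `split_hi_lo` and `rebuild` are hypothetical, and this only models the arithmetic, not the instruction encodings.

```python
# Sketch: splitting a 32-bit constant for `lui rd, hi20` + `addi rd, rd, lo12` (RV32).
def split_hi_lo(value):
    """Return (hi20, lo12) such that (hi20 << 12) + lo12 == value modulo 2**32."""
    value &= 0xFFFFFFFF
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF  # round up when bit 11 is set
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # addi treats the low 12 bits as a signed number
    return hi20, lo12

def rebuild(hi20, lo12):
    """Model the result of the two instructions on a 32-bit register."""
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF

for const in (0x12345678, 0x12345FFF, 0xFFFFFFFF):
    hi, lo = split_hi_lo(const)
    print(hex(const), hex(hi), lo, hex(rebuild(hi, lo)))
```

So any 32-bit value costs two instructions at most, and anything that fits in a signed 12-bit immediate costs one.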
3

Fast-Nop

36813

5y

@RememberMe You need larger constants quite often for bitmasks of all kinds, that's why 0-4095 wouldn't do. However, these are mostly either shifted eight bit constants or have a certain symmetry.

Sticking in extra instructions is an issue with regard to code size, performance, and memory bandwidth. Doesn't matter with rarely used stuff, but the important things need to be there.

I remember why Cortex-M with its Thumb-2 even has that strange mix of 32 and 16 bit instructions. Originally, everything was 32 bit, but that blew up the code size. Then they came up with a 16 bit instruction set (Thumb), which solved this problem - but now too many instructions had to be split up into two so that the performance sucked.

Then they introduced Thumb-2 and got the best of both, but abandoned strict RISCiness.
3

Fast-Nop

36813

5y

@RememberMe Now, if RISC-V really goes for 32 bit only as well as a seriously reduced instruction set, that will be the opposite of Thumb-2, like a "worst of" blend.

Keep in mind that embedded flash is only good for 30 MHz and needs one wait state per additional 30 MHz, so memory bandwidth is an issue. That's why e.g. ST is using a flash cache (ART), except that this would need to be scaled up for a pure 32 bit ISA, and even further for a reduced ISA.
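As back-of-the-envelope arithmetic, the rule of thumb above (zero wait states up to 30 MHz, one more per additional 30 MHz) can be sketched like this. The 30 MHz figure is just the number quoted here; real parts have datasheet tables that also depend on supply voltage, and the function name is made up.

```python
# Sketch: flash wait states under the "one per additional 30 MHz" rule of thumb.
def flash_wait_states(cpu_mhz, flash_mhz=30):
    """Wait states needed so each flash access gets at least one flash-speed period."""
    if cpu_mhz <= 0 or flash_mhz <= 0:
        raise ValueError("frequencies must be positive")
    # Ceiling division gives the CPU cycles per flash access; the first one is free.
    return (cpu_mhz + flash_mhz - 1) // flash_mhz - 1

for mhz in (30, 60, 90, 168):
    print(mhz, "MHz ->", flash_wait_states(mhz), "wait states")
```

Without a cache or prefetcher, every instruction fetch from flash pays that stall, which is why bigger code hurts twice: more flash and more fetches.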

That still wouldn't address the need for more flash in the first place, but maybe RISC-V's bet here is that the cost disadvantage of needing more flash will be offset against savings in ARM licences.

Especially because ARM can't make the licences arbitrarily cheap while you get more flash per dollar with every advance in production processes.
2

RememberMe

13709

5y

@Fast-Nop I was talking about the number of immediate-loads in the dynamic instruction count (the instructions actually running, so loops would be repeated tripcount number of times in the dynamic count). Except for a few very specialized programs, that number is usually fairly small. So expensive large immediates aren't that big of a problem, and I suspect with memory ops being the actual bottleneck in high performance it's even less of a problem. Sure ARM's way is better overall but eh. Not a huge loss I think. Certainly not enough to make RISC-V's (very experienced) designers worry about it too much.
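To illustrate the static versus dynamic distinction, here is a toy calculation. The "program" and all its numbers are invented purely to show the bookkeeping and are not profiling data; `li` stands for a load-immediate, and the function name `share` is made up.

```python
# Sketch: share of an instruction in the static vs dynamic instruction count.
# Each entry is (instructions in a basic block, number of times the block runs).
# All values are invented for illustration only.
blocks = [
    (["li", "li", "li", "add"], 1),               # setup code, runs once
    (["lw", "add", "sw", "addi", "bne"], 1000),   # loop body, trip count 1000
]

def share(blocks, mnemonic, dynamic):
    """Fraction of instructions that are `mnemonic`, weighted by execution count if dynamic."""
    hits = total = 0
    for instrs, count in blocks:
        weight = count if dynamic else 1
        hits += weight * instrs.count(mnemonic)
        total += weight * len(instrs)
    return hits / total

print("static share of li: ", share(blocks, "li", dynamic=False))
print("dynamic share of li:", share(blocks, "li", dynamic=True))
```

In this made-up case a third of the instructions in the binary are immediate loads, but they almost vanish once the loop's trip count is factored in.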

There's nothing limited about RISC-V - what I'm talking about is just part of RV32I, the base set. Most actual implementations expand that to the extension sets which have all the functionality you could ever want - including a compressed 16-bit format for common instructions (RVC) which is a lot like Thumb and a smaller system spec for embedded processors called RV32E. Since there's so much demand for RISC-V from so many areas (everywhere from high performance computing with accelerators to tiny embedded systems) it has to cover everything. Note that RVC is still a draft and still under development. There *are* problems with it, but this isn't one of them. It's definitely designed for real-world use (and research too). Because it's open source, if you really wanted a move instruction or a CISC like polynomial evaluator (no idea why but suppose), you can just do that and patch it into the also open source compilers and boom, you're done.
3

Fast-Nop

36813

5y

@RememberMe I'm pretty eager to finally see some RISC-V stuff that can compete with Cortex-M4. RN, it's more on M0 level (which is why ARM has loosened its M0 licencing), but I expect that to change.

However, the "anyone can add any instructions" thing is a troublesome aspect because I fear that this might lead to balkanisation across manufacturers. That's irrelevant for deeply embedded stuff like controllers in SSDs (e.g. Samsung is at it), but it would also hinder success outside such enclosures.

Of course, coming up with some RISC-V that can challenge current x86 or Apple's M1 would be quite a game changer. If (if!) such a RISC-V were to embed PCI and whatnot, as opposed to manufacturer specific ARM SoCs incompatible with anything else, that could fend off the threat to open computing that ARM is going to pose.
3

Fast-Nop

36813

5y

And yeah, I know that ARM is hailed as putting an end to the ugly x86 ISA, but on the device level, ARM is about manufacturer specific SoCs. This is not progress. It's going back to where we were in the home computer era before x86 won the race.

The number one reason why it won was its open device architecture, back then with the ISA bus, which evolved via VESA local bus (my 486 had that) over PCI (my Pentium-1 had that) to PCIe. IBM tried to close it with their Micro Channel PS/2, but of course that failed.

E.g. in Apple M1 devices, you can't even plug in a regular SSD at market price because the flash is also included in the SoC - and 256GB cost more than 1TB in the free market because the SoC's closed architecture shuts out market competition.

I hope that RISC-V will come in open device architectures that counter this pretty grim ARM threat.
3

RememberMe

13709

5y

@Fast-Nop true, but that's just the nature of open source. Custom extensions are very costly because you lose out on the shared ecosystem development benefits, but if your application really needs them it's there. I don't think it would be a massive problem and I think compilers are modular enough to adapt.

Yeah, I've seen some prototype high performance RISC-V stuff but that's usually for the control path of large accelerators and FPGA overlays. Developing a powerful processor needs a lot of resources so I'm looking at the lesser known but still large players like Marvell (or AMD tbh) or supercomputer vendors to kick it off.

Thing is ISA doesn't really matter to most users and ARM also gives you a lot of other stuff with the license, so it's going to take a while, if it even happens. There are other innovations happening in the chipmaking industry like chiplet, packaging, and interposer tech that could work *very* well with an open source ISA. We might see custom hardware with configurable premade hardware blocks that you as a company can order and it'd cost very little because they just fab the blocks beforehand as chiplets and string them together with modern generic interconnects and NoCs (and hopefully flexible modular software to go along with all that). Cheap, easily designed custom hardware with very little NRE cost to the user. I hope that happens. (Intel already does this with EMIB, but it's not quite there yet).

MIPS doesn't have a mov instruction... What the fuck

rant

wtf