ray at arachelian.com
Sat Oct 21 15:00:43 CDT 2006
Chuck Guzis wrote:
> On 21 Oct 2006 at 13:24, Ray Arachelian wrote:
>> On context changes you always have to save all the registers to memory,
>> so you'd lose on any interrupt.
> That's the bonehead way of doing things. For the way it's done
> right, take a look at the S/360 calling sequence--the called routine
> uses the STM instruction to save just the registers that will be used
> by it.
Note that by context change, I mean switching from one process to
another, or from one thread to another, not whilst making a function
call! Obviously, no one who is writing efficient assembly would save
ALL of the registers around a function call. On an interrupt, you need
to switch from userland code to supervisor code to handle it.
You can defer processing by having a small ISR that does very little
work and schedules something else to run, but at some point
you need to switch from one process to another, and doing so requires
saving all of the registers. With a stack architecture, you only need to
flush the stack cache and save three or four registers (e.g., PC, SP, SR), and they
can be saved into the process table itself, making context switches cheaper.
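A toy model of the context-switch cost argument above, offered only as a sketch: it assumes a hypothetical stack cache whose entries carry a dirty bit, and it counts memory words written per switch. The names `StackCache`, `stack_machine_switch`, and `register_machine_switch_cost` are made up for illustration and do not describe any real machine.

```python
from dataclasses import dataclass, field


def register_machine_switch_cost(num_registers: int) -> int:
    # Hypothetical register machine: every architectural register belongs
    # to the outgoing process, so every one of them is written to memory.
    return num_registers


@dataclass
class StackCache:
    # Hypothetical stack cache: each entry is (address, value, dirty).
    entries: list = field(default_factory=list)

    def flush(self, memory: dict) -> int:
        """Write only the dirty entries back; return the words written."""
        written = 0
        for addr, value, dirty in self.entries:
            if dirty:
                memory[addr] = value
                written += 1
        self.entries.clear()
        return written


def stack_machine_switch(cache: StackCache, memory: dict, process_table: dict,
                         pid: int, pc: int, sp: int, sr: int) -> int:
    """Flush the stack cache, then store PC, SP and SR in the process table."""
    written = cache.flush(memory)
    process_table[pid] = (pc, sp, sr)  # three words of saved state
    return written + 3
```

For example, a cache holding two dirty and two clean entries costs five words (two write-backs plus PC, SP, SR), whereas a machine with a 256-register file would write 256. The flush cost still depends on how much of the stack cache is dirty, so this is a best-case illustration rather than a guarantee.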
> I've worked with 3-address machines with 256 registers of 64 bits.
> If you're working with even a moderately-sized subroutine under those
> circumstances, the entire set of local variables will usually fit in
> the register file. A smart compiler or programmer can even segment
> the variable set out into dynamic and static variables, so the
> smallest part of the register set is saved upon exit.
That depends on your code and your optimizer. If you have a lot of
function calls that the optimizer can't inline, it won't be as efficient
as a stack machine. If you have a big blob of code in a single function
that does a lot of floating-point or even integer math, then a stack
machine will lose. Also, does your 3-address machine have a way of
indexing into registers such that r0 is one register, but if you make a
subroutine/function call, r0 is another register? If not you need to
save that data somewhere before making the call, and you wind up doing a
lot of register shuffling to do it.
Either that, or your compiler has to inline a lot of the commonly used
functions and assign them different registers from the file (i.e., it
has to treat function calls as if they were calls to nested functions and inline them).
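One hardware answer to the "r0 is a different register after a call" question is register windows, as on the Berkeley RISC and SPARC designs. The sketch below is a deliberately simplified, hypothetical model: windows do not overlap (real SPARC windows share registers between caller and callee for argument passing), and the spill-to-memory trap on window overflow is left out. The class and method names are invented for illustration.

```python
class WindowedRegisterFile:
    """Hypothetical register-window model: a call advances the window base,
    so the callee's logical r0 is a different physical register from the
    caller's r0. No spill/fill: nesting deeper than physical // window
    calls wraps around and clobbers the oldest window, which real designs
    avoid by trapping and spilling that window to memory."""

    def __init__(self, physical: int = 128, window: int = 16):
        self.phys = [0] * physical
        self.window = window
        self.base = 0  # physical index of the current window's r0

    def _index(self, r: int) -> int:
        if not 0 <= r < self.window:
            raise IndexError(f"r{r} is outside the {self.window}-register window")
        return (self.base + r) % len(self.phys)

    def read(self, r: int) -> int:
        return self.phys[self._index(r)]

    def write(self, r: int, value: int) -> None:
        self.phys[self._index(r)] = value

    def call(self) -> None:
        # Move to a fresh window; the caller's registers stay untouched.
        self.base = (self.base + self.window) % len(self.phys)

    def ret(self) -> None:
        # Return to the caller's window; its values are still in place.
        self.base = (self.base - self.window) % len(self.phys)
```

For instance, writing 42 to r0, calling, writing 7 to the callee's r0, and returning leaves the caller's r0 at 42 without any explicit save or restore.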
> The point is, that no matter how one touts the benefits of a cache as
> being able to substitute for a fast register file, it's a false
> claim. A cache can never have the information about the nature of a
> program's behavior that the programmer or compiler can--it's too far
> removed from the actual program context and must rely on history for
> what belongs in the cache and what doesn't.
The stack cache doesn't need any of that information, though. That's
what's nice about register-windowed and stack machines, and it also
saves the compiler a lot of headaches in register scheduling and
code optimization. Why would it be any faster if it did? I can see how
temporary values that are discarded would wastefully get written back to
main memory, but beyond that, it's no slower than a large register file.
You could even build the stack machine such that when the SP indicates
that a function has returned, any dirty cache entries beyond the SP are thrown away,
thus eliminating a lot of those wasted writes. (You'd have to ensure
that the return value is not thrown away by placing it somewhere safe
first, but that's not hard to do.)
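A rough sketch of that dead-frame trick, assuming an upward-growing stack (so live data sits below SP) and a write-back stack cache modeled as a dictionary. `WriteBackStackCache`, `store`, `on_return`, and `evict_all` are hypothetical names chosen for this illustration.

```python
class WriteBackStackCache:
    """Hypothetical write-back stack cache for an upward-growing stack:
    live data lives at addresses below SP."""

    def __init__(self):
        self.lines = {}      # address -> (value, dirty)
        self.writebacks = 0  # words actually written to main memory

    def store(self, addr, value):
        self.lines[addr] = (value, True)

    def on_return(self, new_sp):
        # Everything at or beyond the new SP belongs to the dead frame:
        # drop it without writing it back, even if it is dirty.
        for addr in [a for a in self.lines if a >= new_sp]:
            del self.lines[addr]

    def evict_all(self, memory):
        # Normal write-back path: only dirty, still-live lines reach memory.
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                memory[addr] = value
                self.writebacks += 1
        self.lines.clear()
```

For instance, if a callee dirties four slots above a four-slot caller frame and the return value is first copied into the caller's frame (the "somewhere safe"), calling `on_return` with the caller's SP leaves only the caller's four words to write back instead of eight.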
BTW: there's nothing inside a stack machine that prevents it from also
being a 3-address machine. :-) The only requirement of a stack machine is
that it treat the stack as a register file and optimize the hell out of
the way the stack cache accesses memory.
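To make the 3-address point concrete, here is a hypothetical interpreter in which each operand is an offset from the top of the stack, so stack slots play the role of registers. The instruction format and the names (`run`, `slot`, and the `push`/`pop`/`add`/`sub`/`mul` opcodes) are invented for this sketch and are not taken from any real instruction set.

```python
import operator

# Hypothetical 3-address opcodes: dst = src1 OP src2, all stack-relative.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def slot(stack, offset):
    # Offset 0 is the top of the stack, 1 the slot beneath it, and so on.
    return len(stack) - 1 - offset


def run(program, stack):
    for instr in program:
        if instr[0] == "push":
            stack.append(instr[1])  # allocate a new slot with a value
        elif instr[0] == "pop":
            stack.pop()             # release the top slot
        else:
            op, dst, a, b = instr
            stack[slot(stack, dst)] = OPS[op](stack[slot(stack, a)],
                                              stack[slot(stack, b)])
    return stack
```

For example, starting from `[2, 3, 4]` (x, y, z), the program `push 0; add 0,3,2; mul 0,0,1` leaves `(2 + 3) * 4 = 20` on top of the stack without any operand shuffling, because the operands are addressed in place rather than popped.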