Access to IOPAGE Registers using PDP-11 Operating Systems
Jerome H. Fine
jhfinedp3k at compsys.to
Sun Nov 25 21:06:28 CST 2007
>Johnny Billquist wrote:
>> Jerome Fine replies:
>>
>> I am replying to Johnny's response, but I had also read the other
>> replies as well.
>> Thank you all for your help.
>>
>> The first point is that using a PEEK/POKE SYSTEM (EMT? - RT-11 has
>> such a call)
>> is so high in overhead that it becomes almost useless. In fact, the
>> key point about the
>> use of the EMEM.DLL under RT-11 is the efficiency. While it is
>> possible to access
>> normal "emulated PDP-11" memory (using E11 on a 750 MHz Pentium III)
>> in about
>> 0.3 micro-seconds, it takes about 1.2 micro-seconds to reference an
>> IOPAGE address
>> in some sort of way - including the PSW or the EMEM.DLL values or
>> about 4 times
>> as long. Since this is a huge improvement over using a PEEK/POKE, it
>> is even worth
>> giving up 8192 bytes of address space to a dedicated APR (of the
>> IOPAGE) for that
>> purpose.
>
> True. From an efficiency point of view, using system calls to
> read/write memory is very inefficient.
Jerome Fine replies:
Which means that using a system call is useful only during
initialization. For example,
RT-11 allows the user to .Peek/.Poke the PSW which I agree would be VERY
unreasonable
under RSX-11 and RSTS/E which depend on the KERNEL code maintaining complete
control.
However, under RT-11, the VBGEXE program sets the PREVIOUS DATA space to
user mode for reasons which I do not understand. Fortunately, it is
possible to use the
.Poke system call to set the PREVIOUS DATA space back to KERNEL which is
what the RT-11 setting is prior to using VBGEXE. When the PREVIOUS DATA
space is KERNEL, it is possible to use the 2 instruction example given
below. So
a system call to .Poke during initialization is entirely acceptable.
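
The value handed to that one-time .Poke is simple bit arithmetic. What follows is a minimal sketch in Python rather than MACRO-11, so that it can be checked on any machine. It assumes the standard PDP-11 PSW layout (address 177776 octal, current mode in bits 15-14, previous mode in bits 13-12, 00 = kernel, 11 = user). The function name is hypothetical and the .Poke request itself is not shown.

```python
PSW_ADDRESS = 0o177776            # PSW location at the top of the I/O page
PREVIOUS_MODE_SHIFT = 12          # previous mode occupies bits 13-12
PREVIOUS_MODE_MASK = 0o3 << PREVIOUS_MODE_SHIFT
KERNEL, SUPERVISOR, USER = 0o0, 0o1, 0o3


def with_previous_mode(psw, mode):
    """Return psw with only the previous-mode field replaced by mode."""
    cleared = psw & ~PREVIOUS_MODE_MASK & 0o177777
    return cleared | (mode << PREVIOUS_MODE_SHIFT)


# Example: a PSW with both mode fields set to user, previous mode reset to kernel
old_psw = 0o170000
new_psw = with_previous_mode(old_psw, KERNEL)
print(oct(old_psw), "->", oct(new_psw))
```

Only the previous-mode field changes; the current mode, priority and condition codes are left as they were.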
>> On the other hand, with RT-11, it is possible and easy to set the
>> PREVIOUS DATA
>> space in the PSW to KERNEL even when VBGEXE is used - more to the point,
>> it is actually unnecessary since that is the default for a so-called
>> privileged job (which
>> all programs are by default). This allows the instruction:
>> Mov @#BaseReg,R0 ;Get the current value from PC memory
>> to be replaced by:
>> MFPD @#BaseReg ;Get the current value
>> Mov (SP)+,R0 ; from PC memory
>> with almost the same time for execution. It also avoids losing that
>> 8192 bytes for APR7
>> being available just for the IOPAGE registers.
>
> That's not possible with OSes that maintain any kind of protection
> between processes, along with virtual memory.
> The PSW as such is not possible to manipulate. If you could, you could
> also change your mode to kernel even though it's currently something
> else.
> Actually, you must be in kernel mode in order to modify the PSW with
> any other instructions than SEx and CLx.
See my previous response with respect to RT-11 and my agreement that
under RSX-11 and RSTS/E, the PSW must NEVER be modified by a
user program or a .Poke requested by a user program.
>> Obviously, a SYSTEM request avoids all of the problems at a heavy
>> cost in overhead
>> estimated at between 50 and 500 times the above two examples.
>>
>> That was sort of what I was thinking about when I asked if there was
>> an "fast method
>> (only a few instructions)" to access an IOPAGE register.
>
> Well, in RSX, you have a rather high overhead to set up the mapping
> to the I/O page, unless it's already mapped in when the task starts.
> But from there on, there is no overhead at all. It's located somewhere
> in your 16-bit address space. (Note that you really don't have to map
> the I/O page at APR7 in RSX. You can get it mapped anywhere if you use
> the CRAW$/MAP$ or TKB options.)
That is helpful information. There might be times when APR7 should not
be used.
> However, with normal privileged programs, the I/O page is always
> present at APR7 even if you don't do anything.
Also VERY helpful. However, if so, then would there be any reason why
APR7 could not be mapped to user memory in the normal manner so that
the user program has a full 65536 bytes of address space, but then the
PREVIOUS DATA space is mapped to KERNEL providing the user
with complete access to the IOPAGE registers via that 2 instruction
example that I gave at the beginning? I can't see that there would be
any greater loss of security since being able to change the IOPAGE
registers either directly or indirectly is just as damaging!
Please comment?
>>>> RSX had a bit more flexibility (opportunity) in this regard. I
>>>> believe you can set up a CRAW$ (create address window) directive in
>>>> either Macro or Fortran to achieve the desired result.
>>>
>>> Yes with reservation. CRAW$ (create address window) is a part of
>>> doing dynamic remapping of your address space.
>>> However, CRAW$ always required a named memory partition. You cannot
>>> create an address window to an arbitrary memory address.
>>> Also, the memory partitions have protections and ownership associated
>>> with them.
>>>
>>> On most systems, CRAW$ cannot get you access to the I/O page, simply
>>> because normally you don't have an address space and a partition
>>> associated with the I/O page.
>>>
>>> But if such a partition is created, then CRAW$, in combination with
>>> MAP$
>>> would allow you to access the I/O page.
>>>
>>> The same thing can also be achieved even without CRAW$/MAP$, since you
>>> can specify mapping that your task should have already at task build
>>> time, with the COMMON and RESCOM options to TKB.
>>
>> This seems to be the answer if it is allowed. Obviously it does
>> require giving up
>> that 8192 bytes to have APR7 mapped to user space.
>
> Correct.
But unnecessary if privileged jobs already have access to the IOPAGE.
Please comment?
>> There is also another option with E11 that I will make use of when I
>> have finished
>> with the HD(X).SYS device driver for RT-11. It turns out that if the
>> memory is
>> being accessed sequentially, the average time to reference a single
>> 16 bit value
>> in the file under:
>> MOUNT HD: FOOBAR.DSK
>> is actually less than the time to get/store a single value under
>> EMEM.DLL when as
>> few as 8 blocks (2048 words at a time) are being referenced.
>> Consequently, setting
>> up a small 4096 byte buffer and the associated code to handle the
>> calls to the HD:
>> device driver (all standard calls to .ReadF and .WritF in RT-11) is
>> actually more
>> efficient since after the values are in the buffer inside the
>> program, the values can
>> be referenced and modified at "emulated PDP-11" memory speeds.
>
> You mean that using a device driver, and a device that can access the
> "normal" memory instead is better. Well, I'm not surprised. What this
> essentially turns into, is that you're emulating DMA.
Actually, under E11, it is almost identical in principle to the VM:
device driver
which accesses "emulated normal PDP-11 extended memory". The E11 command:
MOUNT HD: RAM:/SIZE:number-of-blocks
makes HD: into a Virtual Memory device which directly uses PC memory.
However, the average transfer rate per word for even a few blocks (or a few
thousand words) from/into emulated user memory is a small fraction of a
normal
memory access time.
In addition, if an operating system caches the blocks in a file, the same
speed is achieved.
>> Of course, the above solution for sequential references does not work
>> when the
>> references are random or when references are at regular but very
>> large intervals
>> (thousands and even millions of successive values). For this latter
>> situation, it
>> may be possible to modify EMEM.DLL so that a single reference to the
>> IOPAGE
>> register modifies all of the specified values (over a range of up to
>> many billions of
>> values).
>
> Can't comment much, since I don't know exactly what you're trying to do.
> But speedwise, if you really want something to act like fast disk,
> writing something that behaves like proper DMA is the best.
> You give the device a memory address, a length, and a destination
> address on the device, and let it process the data as fast as it can,
> without involving the PDP-11 after that point.
It is just a bit more complicated since the memory address can be anywhere
in the 4 MB of emulated PDP-11 memory. So a 22-bit address is required -
which can be determined during initialization. The even better aspect
is that
the code (only about a dozen instructions which set up the 6 IOPAGE
registers)
can be in user space which avoids the overhead of a system call.
And if you don't think that amounts to much, my benchmarks show that
transfers of just 8 blocks (2048 words or 4096 bytes) take
about half the time of a normal system call. Fewer blocks are even more
efficient vs system calls.
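
As a rough illustration of how a 22-bit address can be worked out once during initialization, here is a Python sketch of the standard PDP-11 memory management calculation: the top 3 bits of a 16-bit virtual address select an APR, and the PAR value (a base in 64-byte units) is added to the low 13 bits. The function names are hypothetical, and how the result is loaded into the 6 IOPAGE registers is not shown since that depends on the device.

```python
OFFSET_MASK = 0o17777        # low 13 bits: offset within one 8192-byte page
ADDRESS_MASK = 0o17777777    # 22-bit physical address space (4 MB)


def apr_number(virtual_address):
    """Return which of the 8 APRs maps this 16-bit virtual address."""
    return (virtual_address >> 13) & 0o7


def physical_address(par, virtual_address):
    """Combine a PAR value (base in 64-byte units) with a virtual address."""
    page_base = par << 6
    return (page_base + (virtual_address & OFFSET_MASK)) & ADDRESS_MASK


# Example: APR7 mapped to the I/O page, virtual address of the PSW
print(apr_number(0o177776), oct(physical_address(0o177600, 0o177776)))
```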
>> Of course, the result would no longer really be a PDP-11 except for
>> the controlling
>> code which would still be 99% of the required code since the EMEM.DLL
>> changes
>> are really quite trivial, yet consume 99% of the time to execute. In
>> case anyone
>> does not appreciate what I refer to, it is back to my other addiction
>> - sieving for
>> prime numbers. I realize that I should probably switch to native
>> Pentium code,
>> but it seems more of a challenge and much more fun to run as if a
>> PDP-11 is being
>> used with a few GB of memory somewhere out there that can be easily
>> fiddled with
>> as if there is a very fast additional CPU similar to those that used
>> to be available for
>> special math applications - anyone remember SKYMNK for FFTs?
>
> Hmm, are you just creating a sieve for primes? Ok, then you need large
> memory somehow.
> Several ways of doing that. For your specific needs, a simple device
> in the I/O-page with a command register, an address register and a
> data register would probably be just about the best.
For a demonstration program to sieve up to 10**12, I can use normal
PDP-11 memory for the work area of around 30 KB. The 2 arrays
which will be used sequentially will require 78,498 elements each that
are 32 bits or 4 bytes each - a total of less than a MB, but since they
are used sequentially, can be easily read / written in groups of 2048 words
or 8 blocks each.
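
The sequential access pattern can be sketched in Python as follows. This is only an illustration of the buffering idea, not the RT-11 code: a file-like object stands in for the HD: device, each `read` stands in for one request of 8 blocks, and the 32-bit elements are assumed to be stored little-endian. All names are hypothetical.

```python
import io
import struct

BLOCK_BYTES = 512                               # one RT-11 block: 256 words
BLOCKS_PER_READ = 8
BUFFER_BYTES = BLOCK_BYTES * BLOCKS_PER_READ    # 4096 bytes = 2048 words


def sequential_values(device):
    """Yield 32-bit values, refilling the buffer 8 blocks at a time."""
    while True:
        buffer = device.read(BUFFER_BYTES)      # one device request per refill
        usable = len(buffer) - len(buffer) % 4  # ignore a trailing partial element
        if usable == 0:
            return
        for (value,) in struct.iter_unpack("<L", buffer[:usable]):
            yield value


# Example: 3000 elements are delivered with three buffer refills
image = io.BytesIO(struct.pack("<3000L", *range(3000)))
values = list(sequential_values(image))
print(len(values), values[:3], values[-1])
```

Once a refill has completed, every element in the buffer is reached at ordinary memory speed, which is the point of working in groups of 8 blocks.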
For those not familiar with sieving for primes, a very large memory is used!
The problem is that sieving requires the storage of large memory used both
sequentially and what seems like randomly. One array is used to store
the primes being used. A second array is used to store the next location
to be used in the work area for that prime. The work area is normally as
large as possible and is accessed at intervals equal to the current prime
being processed.
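
A minimal Python sketch of that arrangement, as an illustration rather than the actual program: `primes` is the first array, `next_hit` is the second array, and `work` is the work area, here about 30,000 flags. One assumption to note is that `next_hit` holds the absolute next multiple of each prime rather than an offset into the work area.

```python
import math


def small_primes(limit):
    """Plain sieve of Eratosthenes for the sieving primes up to limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if flags[n]:
            flags[n * n::n] = bytes(len(range(n * n, limit + 1, n)))
    return [n for n, flag in enumerate(flags) if flag]


def count_primes(limit, work_size=30_000):
    """Count primes <= limit, sieving one work area of work_size flags at a time."""
    primes = small_primes(math.isqrt(limit))    # first array: the sieving primes
    next_hit = [p * p for p in primes]          # second array: next multiple of each prime
    count = 0
    for low in range(2, limit + 1, work_size):
        high = min(low + work_size, limit + 1)  # work area covers low .. high-1
        work = bytearray([1]) * (high - low)
        for i, p in enumerate(primes):
            m = next_hit[i]
            while m < high:                     # step through at intervals of p
                work[m - low] = 0
                m += p
            next_hit[i] = m                     # remembered for the next work area
        count += sum(work)
    return count


print(count_primes(10**6))
```

Both arrays are walked from start to end for every work area, while the work area itself is touched at strides of each prime. The count for 10**6 should agree with the 78,498 quoted above.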
These days, a sieve program up to a billion (10**9) is considered trivial.
Most individuals who are serious consider any range under a trillion
(10**12) to be in the nature of a toy. However, since sieving up to a
trillion, where pi(10**12) = 37,607,912,018, requires more than 16 bits
per element for the second array, just the storage of the second array
of at least 78,498 elements is over 1/4 MB. And since pi(10**9) = 50,847,534
is the number of elements in the second array required to sieve up to
10**18, where pi(10**18) = 24,739,954,287,740,860 and for which 30 bits
per element is just sufficient, the second array then requires over
200 megabytes.
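
The arithmetic behind those sizes can be checked with a short Python sketch. It assumes that each element of the second array must hold a value as large as the square root of the sieve limit and is stored in 4 bytes; the function name is hypothetical.

```python
def second_array_cost(root, prime_count, bytes_per_element=4):
    """Return (bits needed for a value up to root, total bytes for the array)."""
    return root.bit_length(), prime_count * bytes_per_element


for label, root, prime_count in (
    ("10**12", 10**6, 78_498),        # pi(10**6) sieving primes
    ("10**18", 10**9, 50_847_534),    # pi(10**9) sieving primes
):
    bits, total = second_array_cost(root, prime_count)
    print(f"sieve to {label}: {bits} bits per element, {total:,} bytes")
```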
Of course, these memory sizes are no longer even very large for a
current Pentium III
system (I have a Pentium III with 768 MB of memory) and with a Pentium
4, they
are only a small aspect of the problem. However, for the PDP-11, they are
obviously impossible. Thus my interest in using E11 and the features
that I have
described.
Note that pi(10**22) is considered to be known and pi(10**23) is likely
known, having been found this year. pi(10**24) has still not been
published, but will likely be known in the next year or two when faster
algorithms are found or faster CPUs are used. Sieve programs were not
used to find these values.
Sincerely yours,
Jerome Fine
--
If you attempted to send a reply and the original e-mail
address has been discontinued due to a high volume of junk
e-mail, then the semi-permanent e-mail address can be
obtained by replacing the four characters preceding the
'at' with the four digits of the current year.