Johnny Billquist wrote:
   Jerome Fine replies:
 I am replying to Johnny's response, but I have also read the other
 replies. Thank you all for your help.
 The first point is that using a PEEK/POKE system call (an EMT request;
 RT-11 has such a call) has so much overhead that it becomes almost
 useless. In fact, the key point about using EMEM.DLL under RT-11 is
 efficiency. While normal "emulated PDP-11" memory can be accessed
 (using E11 on a 750 MHz Pentium III) in about 0.3 microseconds, it
 takes about 1.2 microseconds, or about 4 times as long, to reference
 any IOPAGE address, including the PSW or the EMEM.DLL registers. Since
 this is still a huge improvement over using a PEEK/POKE, it is even
 worth giving up 8192 bytes of address space to an APR dedicated to the
 IOPAGE for that purpose.
 True. Using system calls to read/write memory is very inefficient.
 
Jerome Fine replies:
This means that a system call is useful only during initialization.
For example, RT-11 allows the user to .Peek/.Poke the PSW, which I
agree would be VERY unreasonable under RSX-11 and RSTS/E, since they
depend on the KERNEL code maintaining complete control. However, under
RT-11, the VBGEXE program sets the PREVIOUS DATA space to user mode for
reasons which I do not understand. Fortunately, the .Poke system call
can set the PREVIOUS DATA space back to KERNEL, which is the RT-11
setting before VBGEXE is used. When the PREVIOUS DATA space is KERNEL,
the 2-instruction example given below can be used. So a .Poke system
call during initialization is entirely acceptable.
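To make the PSW fields concrete, here is a minimal Python sketch (an illustration, not RT-11 code) that decodes the standard PDP-11 PSW mode fields, bits 15-14 for the current mode and bits 13-12 for the previous mode, and computes the value a .Poke would need to write to make the previous mode KERNEL. The function names are hypothetical.

```python
# Standard PDP-11 PSW layout: bits 15-14 = current mode,
# bits 13-12 = previous mode; 00 = kernel, 01 = supervisor, 11 = user.
# The helper names below are hypothetical, for illustration only.
MODE_NAMES = {0b00: "KERNEL", 0b01: "SUPERVISOR", 0b11: "USER"}
PREVIOUS_MODE_MASK = 0o030000  # bits 13-12


def current_mode(psw: int) -> str:
    """Name of the current mode encoded in a 16-bit PSW value."""
    return MODE_NAMES.get((psw >> 14) & 0b11, "ILLEGAL")


def previous_mode(psw: int) -> str:
    """Name of the previous mode encoded in a 16-bit PSW value."""
    return MODE_NAMES.get((psw >> 12) & 0b11, "ILLEGAL")


def with_previous_kernel(psw: int) -> int:
    """PSW with the previous-mode field cleared to 00 (KERNEL);
    current mode, priority and condition codes are kept."""
    return psw & ~PREVIOUS_MODE_MASK & 0xFFFF


if __name__ == "__main__":
    psw = 0o030340  # current KERNEL, previous USER, priority 7
    print(current_mode(psw), previous_mode(psw))
    print(oct(with_previous_kernel(psw)))
```

Clearing only bits 13-12 leaves everything else in the PSW untouched, which is why a read-modify-write is preferable to poking a constant.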
 On the other hand, with RT-11 it is possible and easy to set the
 PREVIOUS DATA space in the PSW to KERNEL even when VBGEXE is used.
 More to the point, it is actually unnecessary, since that is the
 default for a so-called privileged job (which all programs are by
 default). This allows the instruction:
     Mov    @#BaseReg,R0        ;Get the current value from PC memory
 to be replaced by:
     MFPD   @#BaseReg           ;Get the current value
     Mov    (SP)+,R0            ;  from PC memory
 with almost the same execution time. It also avoids giving up the
 8192 bytes of APR7 just to reach the IOPAGE registers.
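For readers who have not used MFPD, the following toy Python model (not an emulator: address spaces are plain dictionaries and the MMU is ignored) shows why the two-instruction sequence works. MFPD fetches a word from the previous mode's data space and pushes it on the current stack, and Mov (SP)+,R0 pops it. The address used for BaseReg is a hypothetical I/O-page address, since the real value is not given here.

```python
# Toy model of the MFPD / Mov (SP)+,R0 pair. Address spaces are plain
# dicts keyed by mode; real hardware goes through the MMU and APRs.
KERNEL, USER = 0b00, 0b11
BASE_REG = 0o177400  # hypothetical I/O-page address standing in for BaseReg


class ToyCPU:
    def __init__(self, psw: int):
        self.psw = psw
        self.dspace = {KERNEL: {}, USER: {}}  # data space per mode
        self.stack = []                       # current-mode stack

    def previous_mode(self) -> int:
        return (self.psw >> 12) & 0b11

    def mfpd(self, addr: int) -> None:
        """MFPD @#addr: push the word at addr in the previous mode's D space."""
        word = self.dspace[self.previous_mode()].get(addr, 0)
        self.stack.append(word & 0xFFFF)

    def pop(self) -> int:
        """Mov (SP)+,Rn: pop the word on top of the stack."""
        return self.stack.pop()


if __name__ == "__main__":
    cpu = ToyCPU(psw=0o140000)             # current USER, previous KERNEL
    cpu.dspace[KERNEL][BASE_REG] = 0o1234  # value visible only to kernel
    cpu.mfpd(BASE_REG)
    print(oct(cpu.pop()))
```

If the previous mode were USER instead, the same MFPD would read the user's own data space, which is why the previous-mode field matters.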
 That's not possible with OSes that maintain any kind of protection
 between processes, along with virtual memory. The PSW as such cannot
 be manipulated. If you could manipulate it, you could also change your
 mode to kernel even though it's currently something else. Actually,
 you must be in kernel mode in order to modify the PSW with any
 instructions other than SEx and CLx.
 
See my previous response with respect to RT-11 and my agreement that
under RSX-11 and RSTS/E, the PSW must NEVER be modified by a user
program or by a .Poke requested by a user program.
 Obviously, a system request avoids all of these problems, but at a
 heavy cost in overhead, estimated at between 50 and 500 times that of
 the above two examples.
 That was sort of what I was thinking about when I asked if there was
 a "fast method (only a few instructions)" to access an IOPAGE
 register.
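The figures quoted so far can be put side by side with a few lines of arithmetic. This Python sketch simply multiplies the thread's own estimates (0.3 and 1.2 microseconds per access on E11 with a 750 MHz Pentium III, and 50 to 500 times the IOPAGE cost for a system call); the one-million-reference workload is an arbitrary example, not a measurement.

```python
# Per-access estimates quoted in this thread (E11, 750 MHz Pentium III).
DIRECT_US = 0.3               # ordinary emulated PDP-11 memory reference
IOPAGE_US = 1.2               # IOPAGE reference (Mov @# or MFPD/Mov)
SYSCALL_FACTORS = (50, 500)   # system call relative to an IOPAGE reference


def seconds_for(n_refs: int, per_ref_us: float) -> float:
    """Total time in seconds for n_refs accesses at per_ref_us each."""
    return n_refs * per_ref_us / 1e6


def syscall_range_us() -> tuple:
    """Low/high estimate of one PEEK/POKE-style system call, in microseconds."""
    low, high = SYSCALL_FACTORS
    return IOPAGE_US * low, IOPAGE_US * high


if __name__ == "__main__":
    n = 1_000_000  # arbitrary example workload
    print("IOPAGE / direct ratio:", round(IOPAGE_US / DIRECT_US, 2))
    print("direct:", seconds_for(n, DIRECT_US), "s")
    print("IOPAGE:", seconds_for(n, IOPAGE_US), "s")
    lo, hi = syscall_range_us()
    print("system call:", seconds_for(n, lo), "to", seconds_for(n, hi), "s")
```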
 Well, in RSX you have a rather high overhead to set up the mapping to
 the I/O page, unless it's already mapped in when the task starts. But
 from there on, there is no overhead at all. It's located somewhere in
 your 16-bit address space. (Note that you really don't have to map the
 I/O page at APR7 in RSX. You can get it mapped anywhere if you use the
 CRAW$/MAP$ directives or the TKB options.)
 
That is helpful information. There might be times when APR7 should not
be used.
 However, with normal privileged programs, the I/O page is always
 present at APR7 even if you don't do anything.
Also VERY helpful. However, if so, is there any reason why APR7 could
not be mapped to user memory in the normal manner, so that the user
program has a full 65536 bytes of address space, while the PREVIOUS
DATA space is set to KERNEL, giving the user complete access to the
IOPAGE registers via the 2-instruction example that I gave at the
beginning? I can't see that there would be any greater loss of
security, since being able to change the IOPAGE registers either
directly or indirectly is just as damaging!
Please comment.
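The trade-off in this question follows from how the 16-bit address space is divided: eight 8192-byte pages, one per APR, with APR7 covering 160000-177777 (octal), where the I/O page normally sits. A small Python sketch of that arithmetic (the helper names are hypothetical):

```python
APR_SIZE = 8192                   # bytes mapped by one APR (one page)
NUM_APRS = 8                      # APR0..APR7
ADDR_SPACE = NUM_APRS * APR_SIZE  # 65536-byte virtual address space


def apr_for(vaddr: int) -> int:
    """Index (0-7) of the APR that maps a 16-bit virtual address."""
    if not 0 <= vaddr < ADDR_SPACE:
        raise ValueError("not a 16-bit virtual address")
    return vaddr >> 13            # top 3 bits select the page


def apr_range(n: int) -> tuple:
    """First and last virtual address mapped by APR n."""
    if not 0 <= n < NUM_APRS:
        raise ValueError("APR index must be 0-7")
    start = n * APR_SIZE
    return start, start + APR_SIZE - 1


if __name__ == "__main__":
    lo, hi = apr_range(7)
    print(oct(lo), oct(hi))       # the page normally holding the I/O page
    print(apr_for(0o177776))      # PSW address -> APR7
    print(ADDR_SPACE - APR_SIZE)  # bytes left if APR7 maps the I/O page
```

Mapping APR7 to user memory recovers those top 8192 bytes, while MFPD through a KERNEL previous mode still reaches the kernel's own APR7 mapping of the I/O page.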
 RSX had a bit more flexibility (opportunity) in this regard. I
 believe you can set up a CRAW$ (create address window) directive in
 either Macro or Fortran to achieve the desired result.
 Yes, with reservations. CRAW$ (create address window) is part of
 doing dynamic remapping of your address space.
 However, CRAW$ always requires a named memory partition. You cannot
 create an address window to an arbitrary memory address.
 Also, the memory partitions have protections and ownership associated
 with them.
 On most systems, CRAW$ cannot get you access to the I/O page, simply
 because normally you don't have an address space and a partition
 associated with the I/O page.
 But if such a partition is created, then CRAW$, in combination with
 MAP$, would allow you to access the I/O page.
 The same thing can also be achieved even without CRAW$/MAP$, since you
 can specify the mapping your task should already have at task build
 time, with the COMMON and RESCOM options to TKB.
 
 This seems to be the answer if it is allowed. Obviously, it does
 require giving up the 8192 bytes of having APR7 mapped to user space.
 
 Correct. 
 
But that is unnecessary if privileged jobs already have access to the
IOPAGE. Please comment.
 There is also another option with E11 that I will make use of when I
 have finished with the HD(X).SYS device driver for RT-11. It turns out
 that if the memory is being accessed sequentially, the average time to
 reference a single 16-bit value in the file under:
     MOUNT  HD:  FOOBAR.DSK
 is actually less than the time to get/store a single value under
 EMEM.DLL when as few as 8 blocks (2048 words at a time) are being
 referenced. Consequently, setting up a small 4096-byte buffer and the
 associated code to handle the calls to the HD: device driver (all
 standard calls to .ReadF and .WritF in RT-11) is actually more
 efficient, since once the values are in the buffer inside the program,
 they can be referenced and modified at "emulated PDP-11" memory speeds.
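The buffering scheme described above can be sketched in Python as follows. This is an illustration of the idea, not the author's RT-11 code: the read_blocks callback is a hypothetical stand-in for a .ReadF transfer from HD:, and the buffer holds 8 blocks of 256 words (4096 bytes), refilled only when a reference falls outside it.

```python
BLOCK_WORDS = 256    # one RT-11 block = 512 bytes = 256 words
BUFFER_BLOCKS = 8    # 8 blocks = 2048 words = 4096 bytes


class BufferedWordReader:
    """Serve word reads from an in-program buffer, refilling it
    BUFFER_BLOCKS blocks at a time via a caller-supplied function."""

    def __init__(self, read_blocks):
        # read_blocks(first_block, count) -> list of 16-bit words;
        # a hypothetical stand-in for a .ReadF transfer from HD:.
        self.read_blocks = read_blocks
        self.buf = []
        self.buf_start = 0    # word index held in buf[0]
        self.transfers = 0    # number of device transfers issued

    def word(self, index: int) -> int:
        if not self.buf_start <= index < self.buf_start + len(self.buf):
            first_block = index // BLOCK_WORDS
            first_block -= first_block % BUFFER_BLOCKS  # align to buffer size
            self.buf = self.read_blocks(first_block, BUFFER_BLOCKS)
            self.buf_start = first_block * BLOCK_WORDS
            self.transfers += 1
        return self.buf[index - self.buf_start]


if __name__ == "__main__":
    disk = [i & 0xFFFF for i in range(100_000)]  # fake FOOBAR.DSK contents

    def fake_read(first_block, count):
        lo = first_block * BLOCK_WORDS
        return disk[lo:lo + count * BLOCK_WORDS]

    reader = BufferedWordReader(fake_read)
    checksum = sum(reader.word(i) for i in range(20_480))
    print(checksum, reader.transfers)  # 10 transfers: 20,480 / 2048 words
```

Sequential access pays for one device transfer per 2048 words; every other reference is an ordinary memory access, which is where the speed-up comes from.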
 You mean that using a device driver, and a device that can access the
 "normal" memory instead, is better. Well, I'm not surprised. What this
 essentially turns into is that you're emulating DMA.
 
Actually, under E11, it is almost identical in principle to the VM:
device driver, which accesses "emulated normal PDP-11 extended memory".
The E11 command:
    MOUNT  HD:  RAM:/SIZE:number-of-blocks
makes HD: into a Virtual Memory device which directly uses PC memory.
And the average transfer time per word for even a few blocks (or a few
thousand words) from/into emulated user memory is a small fraction of
a normal memory access time. In addition, if an operating system
caches the blocks of a file, the same speed is achieved.
 Of course, the above solution for sequential references does not work
 when the references are random, or when they are at regular but very
 large intervals (thousands or even millions of successive values). For
 this latter situation, it may be possible to modify EMEM.DLL so that a
 single reference to an IOPAGE register modifies all of the specified
 values (over a range of up to many billions of values).
 Can't comment much, since I don't know exactly what you're trying to
 do. But speedwise, if you really want something to act like a fast
 disk, writing something that behaves like proper DMA is the best.
 You give the device a memory address, a length, and a destination
 address on the device, and let it process the data as fast as it can,
 without involving the PDP-11 after that point.
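Johnny's description of a DMA-style interface can be modelled in a few lines of Python. This is a hypothetical toy, not E11's or any real controller's interface: memory is a list of words indexed by word address, and the register names are invented for the illustration. The point is that the CPU only loads three values and starts the transfer; the device moves the whole block itself.

```python
class ToyDMADevice:
    """Toy DMA device: the CPU loads a memory address, a word count and
    a device address, then starts a transfer that the device performs
    on its own. All addresses are word indices in this model."""

    def __init__(self, memory: list, device_words: int):
        self.memory = memory              # emulated PDP-11 memory (words)
        self.store = [0] * device_words   # the device's own storage
        self.mem_addr = 0
        self.dev_addr = 0
        self.count = 0

    def _check(self) -> None:
        if self.count < 0:
            raise ValueError("negative word count")
        if self.mem_addr < 0 or self.mem_addr + self.count > len(self.memory):
            raise ValueError("memory range out of bounds")
        if self.dev_addr < 0 or self.dev_addr + self.count > len(self.store):
            raise ValueError("device range out of bounds")

    def write_to_device(self) -> None:
        """Copy count words from memory to the device."""
        self._check()
        m, d, n = self.mem_addr, self.dev_addr, self.count
        self.store[d:d + n] = self.memory[m:m + n]

    def read_from_device(self) -> None:
        """Copy count words from the device back into memory."""
        self._check()
        m, d, n = self.mem_addr, self.dev_addr, self.count
        self.memory[m:m + n] = self.store[d:d + n]


if __name__ == "__main__":
    mem = list(range(16))
    dev = ToyDMADevice(mem, 64)
    dev.mem_addr, dev.dev_addr, dev.count = 4, 10, 5
    dev.write_to_device()
    print(dev.store[10:15])
```

The bounds check stands in for the error status a real controller would report; without it, Python slice assignment would silently grow the lists.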
 
It is just a bit more complicated, since the memory address can be
anywhere in the 4 MB of emulated PDP-11 memory, so a 22-bit address is
required, which can be determined during initialization. The even
better aspect is that the code (only about a dozen instructions, which
set up the 6 IOPAGE registers) can be in user space, which avoids the
overhead of a system call. And if you don't think that amounts to
much, my benchmarks show that transfers of just 8 blocks (2048 words
or 4096 bytes) take about half the time of a normal system call.
Transfers of fewer blocks gain even more relative to system calls.
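How a 22-bit address and a transfer description might be spread over six 16-bit IOPAGE registers is sketched below in Python. The register layout and command bits are assumptions made up for the illustration (the actual register assignments are not given here): low 16 address bits, high 6 address bits, a 32-bit device block number split over two registers, a word count, and a command register.

```python
# Hypothetical layout of the six IOPAGE registers (an assumption):
#   0: memory address bits 15-0     1: memory address bits 21-16
#   2: device block, low word       3: device block, high word
#   4: word count                   5: command (GO + direction)
CMD_GO = 0o1      # hypothetical "start transfer" bit
CMD_WRITE = 0o2   # hypothetical direction bit: memory -> device


def dma_register_values(mem_addr: int, dev_block: int, words: int,
                        write: bool) -> list:
    """Return the six 16-bit values to store, in register order."""
    if not 0 <= mem_addr < 1 << 22:
        raise ValueError("memory address must fit in 22 bits (4 MB)")
    if mem_addr & 1:
        raise ValueError("memory address must be word aligned")
    if not 0 <= dev_block < 1 << 32:
        raise ValueError("device block number must fit in 32 bits")
    if not 0 < words <= 0xFFFF:
        raise ValueError("word count must be 1-65535")
    command = CMD_GO | (CMD_WRITE if write else 0)
    return [
        mem_addr & 0xFFFF,           # low 16 address bits
        (mem_addr >> 16) & 0o77,     # high 6 address bits
        dev_block & 0xFFFF,
        (dev_block >> 16) & 0xFFFF,
        words,
        command,
    ]


if __name__ == "__main__":
    # 8 blocks (2048 words) ending at the top of 4 MB, to device block 5
    regs = dma_register_values(0o17770000, 5, 2048, write=True)
    print([oct(r) for r in regs])
```

Computing these values once during initialization leaves only a handful of MOVs per transfer, in line with the dozen or so instructions mentioned above.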
 Of course, the result would no longer really be a PDP-11, except for
 the controlling code, which would still be 99% of the required code,
 since the EMEM.DLL changes are really quite trivial yet consume 99% of
 the execution time. In case anyone does not appreciate what I am
 referring to, it is back to my other addiction: sieving for prime
 numbers. I realize that I should probably switch to native Pentium
 code, but it seems more of a challenge and much more fun to run as if
 a PDP-11 were being used with a few GB of memory somewhere out there
 that can be easily fiddled with, as if there were a very fast
 additional CPU, similar to those that used to be available for special
 math applications. Anyone remember SKYMNK for FFTs?
 Hmm, are you just creating a sieve for primes? OK, then you need a
 large memory somehow. There are several ways of doing that. For your
 specific needs, a simple device in the I/O page with a command
 register, an address register and a data register would probably be
 just about the best.
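Johnny's suggestion of a command register, an address register and a data register can be sketched as follows. Everything here is hypothetical (command codes, class name, word-indexed backing store). In particular, the address register is assumed to be wide enough for a large memory, whereas on a real PDP-11 it would have to span more than one 16-bit register.

```python
from array import array

CMD_READ = 1    # hypothetical: copy mem[address] into the data register
CMD_WRITE = 2   # hypothetical: copy the data register into mem[address]


class CommandAddressDataDevice:
    """Three-register window onto a large host-side word memory."""

    def __init__(self, nwords: int):
        self.mem = array("H", [0]) * nwords   # 16-bit words, zeroed
        self.address = 0                      # word index (assumed wide)
        self.data = 0                         # 16-bit data register

    def write_command(self, cmd: int) -> None:
        if not 0 <= self.address < len(self.mem):
            raise IndexError("address outside device memory")
        if cmd == CMD_READ:
            self.data = self.mem[self.address]
        elif cmd == CMD_WRITE:
            self.mem[self.address] = self.data & 0xFFFF
        else:
            raise ValueError("unknown command")


if __name__ == "__main__":
    dev = CommandAddressDataDevice(1_000_000)
    dev.address, dev.data = 999_999, 0o1234
    dev.write_command(CMD_WRITE)
    dev.data = 0
    dev.write_command(CMD_READ)
    print(oct(dev.data))
```

Each access costs three register references, so this suits random access; the DMA-style interface discussed earlier remains better for long sequential runs.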
 
For a demonstration program to sieve up to 10**12, I can use normal
PDP-11 memory for the work area of around 30 KB. The 2 arrays will
each require 78,498 elements of 32 bits (4 bytes), a total of less than
a MB, but since they are used sequentially, they can easily be read and
written in groups of 2048 words, or 8 blocks, at a time.
For those not familiar with sieving for primes: a very large memory is
used! The problem is that sieving requires a large amount of storage,
accessed both sequentially and in what seems like a random pattern. One
array stores the primes being used. A second array stores, for each
prime, the next location to be used in the work area. The work area is
normally as large as possible and is accessed at intervals equal to the
current prime being processed.
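The two-array scheme just described is essentially a segmented sieve of Eratosthenes. Here is a minimal Python sketch of that technique, not the author's PDP-11 program: the 30,000-entry window stands in for the ~30 KB work area (one byte per number here, with no bit packing or odd-only tricks), `primes` plays the role of the first array and `offsets` the second.

```python
import math


def simple_sieve(limit: int) -> list:
    """All primes <= limit (plain sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = bytes(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


def count_primes_segmented(n: int, window: int = 30_000) -> int:
    """Count primes <= n, sieving a fixed-size work area one window at a time."""
    if n < 2:
        return 0
    primes = simple_sieve(math.isqrt(n))   # first array: the sieving primes
    # Second array: for each prime, the next position to mark, relative to
    # the start of the current window (numbers start at 2, so p*p - 2).
    offsets = [p * p - 2 for p in primes]
    count = 0
    low = 2
    while low <= n:
        size = min(window, n - low + 1)
        work = bytearray([1]) * size        # work area: 1 = not yet crossed off
        for i, p in enumerate(primes):
            pos = offsets[i]
            while pos < size:               # stride through the work area by p
                work[pos] = 0
                pos += p
            offsets[i] = pos - size         # carry over into the next window
        count += work.count(1)
        low += size
    return count


if __name__ == "__main__":
    print(count_primes_segmented(10**6))
```

Once a prime's first multiple has been reached, its stored offset stays below the prime itself, which is why the width of the second array's elements is set by the largest sieving prime.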
These days, a sieve program up to a billion (10**9) is considered
trivial. Most serious individuals consider any range under a trillion
(10**12) to be in the nature of a toy. However, sieving up to a
trillion, where pi(10**12) = 37,607,912,018, uses the primes up to
10**6 (the square root of 10**12). The larger of these exceed 16 bits,
so the second array needs more than 16 bits per element, and just the
storage of the second array of at least 78,498 elements is over 1/4 MB.
And since pi(10**9) = 50,847,534 is the number of elements in the
second array required to sieve up to 10**18, where pi(10**18) =
24,739,954,287,740,860 and 30 bits per element is just sufficient, the
second array then requires over 200 megabytes.
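The storage figures above can be checked with a little arithmetic. The Python sketch below assumes, as the text implies, that each next-location entry is an offset smaller than its prime (so 20 bits suffice for primes below 10**6 and 30 bits for primes below 10**9), and compares 32-bit entries with tightly packed 30-bit entries. The function names are illustrative.

```python
# Prime counts quoted above.
PI_10_6 = 78_498        # pi(10**6): sieving primes needed for 10**12
PI_10_9 = 50_847_534    # pi(10**9): sieving primes needed for 10**18


def array_bytes(elements: int, bits_per_element: int) -> int:
    """Bytes needed to store `elements` values of the given width."""
    return (elements * bits_per_element + 7) // 8


def offset_bits(sieve_limit_root: int) -> int:
    """Bits needed for a next-location offset: each offset is smaller
    than its prime, and every sieving prime is below sieve_limit_root."""
    return (sieve_limit_root - 1).bit_length()


if __name__ == "__main__":
    print(offset_bits(10**6), offset_bits(10**9))        # 20 30
    print(array_bytes(PI_10_6, 32), 2 * array_bytes(PI_10_6, 32))
    print(array_bytes(PI_10_9, 32), array_bytes(PI_10_9, 30))
```

With 32-bit entries the 10**12 case needs 313,992 bytes per array (over 1/4 MB, and under 1 MB for both), and the 10**18 case needs 203,390,136 bytes; packing to 30 bits would trim the latter to about 190.7 MB.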
Of course, these memory sizes are no longer even very large for a
current Pentium III system (I have a Pentium III with 768 MB of
memory), and with a Pentium 4 they are only a small aspect of the
problem. However, for the PDP-11 they are obviously impossible. Thus
my interest in using E11 and the features that I have described.
Note that pi(10**22) is considered to be known, and pi(10**23) is
likely known as well, having been found this year. pi(10**24) has
still not been published, but will likely be known in the next year or
two when faster algorithms are found or faster CPUs are used. Sieve
programs were not used to find these values.
Sincerely yours,
Jerome Fine