RT-11 MULTIPROCESSOR USER'S GUIDE     (Version 7-Nov-85 NKGAZG)


                              MP-11

                    *************************
                    * MULTIPROCESSOR RT-11  *
                    *                       *
                    *     USER'S GUIDE      *
                    *************************


               1. Introduction
               2. Multiprocessor Components
               3. Multiprocessor Utilities Overview
               4. Device allocation
               5. Handlers and Pseudo-Handlers
               6. Getting started
               7. Bootstrapping
               8. Error messages and trouble shooting


1. INTRODUCTION.
----------------

This manual is a short guide for using the RT-11 MULTIPROCESSOR package
as developed at the University Hospital Groningen. The first version
(V3) is described in full detail in:

   "A MODULAR DATA COMMUNICATION PACKAGE PROVIDING A MULTIUSER
    ENVIRONMENT AND PARALLEL PROCESSING",
   Proceedings DECUS EUROPE, Coventry U.K., Sept. 1982.

The package provides a datacommunication facility between RT-11 (V4 and
higher) systems (LSI or PDP-11) and facilitates transparent use of
remote devices such as disks, lineprinters etc.

A 2nd public version (V5), requiring RT-11 V5.0 or higher, is described
in:

   "MULTIPROCESSING AND HIGH SPEED DATACOMMUNICATION UNDER RT-11"
   Proceedings DECUS U.S.A., New Orleans, May 1985.

An RT-11 DISK DATA CACHE is very useful in a multiprocessor
environment. Details and application are described in:

   "DISK USAGE ANALYSIS and DISK DATA CACHING under RT-11"
   Proceedings DECUS EUROPE, Zuerich, August/Sept. 1983.

and

   "THE DISK DATA CACHE UNDER RT-11"
   Proceedings DECUS U.S.A., New Orleans, May 1985.

NOTE:
-----
In the remainder of this manual DC is used as the abbreviation of
datacommunication. The manual will always describe the latest version
of MP-11.
RSP stands for the Radial Serial Protocol, which is used by DEC for
communicating with the TU58 device. The protocol used for MP-11 is a
derived version of RSP.


2. MULTIPROCESSOR COMPONENTS
-----------------------------

The following hardware is currently supported:

   1. Parallel words: DRV-11, DR-11 C, DR-11 K
   2. Serial words:   WB-11, WBV-11
   3. DMA blocks:     Qnector (Westvries Systems b.v., NL)

Suppose we have two systems coupled by DR(V)-11 hardware:

   /----------------\                            /-----------------\
   !  RT-11    /----!                            !  RT-11          !
   !           !JOB !--\    DR(V)-11 link     /--!                 !
   !  DCJOB    !Hnd !  !======================!  !   DR handler    !
   !           \----!--/         Data         \--!                 !
   !"service job"   !     <---------------->     !  "DC handler"   !
   !     site       !          Commands          !      site       !
   \----------------/     <----------------      \-----------------/

Note that the DR-handler site issues the commands for I/O transfers.
Then the following files should reside on the system device of each
site:

   DCJOB site
   ==========
   DCJOB       : job (there may be one other version, e.g. DCJOB.SPD
                 for Special Dir. support)
   DJ.SYS      : job handler, drives the DC hardware at the job site
   DJBOOT.SYS  : boot program with password access
   JBINFO.DAT  : job data file and mailbox
   JOBDEF.SAV  : defines the list of remote available devices, their
                 default access status and type, number of jobs
                 running simultaneously
   MAIL.SAV    : communicates with mailbox and job data message
                 facility
   JSHOW.SAV   : disk/device overview, nr. I/O's, error count,
                 read/write access

   DC Handler site
   ===============
   DR.SYS      : Data communication handler (behaves like a disk)
   Pseudo-handlers, e.g. LP.SYS, HL.SYS for HELLO
   HELLO.SAV   : Shows remote available devices, device
                 characteristics, access status, message facility.
                 Checks remote/local date
   JBDATE.SAV  : Fetches remote date&time
   WATCH.REL   : Scans at regular time intervals the remote mailbox
                 for changes in contents

In general the job site and the DC handler site have separate system
disks. In case of a remote system disk, both disk units will be
physically located at the job site. However, it is also possible that
both systems use the same system disk by using a disk CACHE at the DC
handler site.
By using the JSHOW utility you have a way to see the device list of
physical devices, device sizes & characteristics and the read/write
access.
The read/write access to the devices may be changed. The access status
is valid for the user at the "DC handler site", not for the user at the
job site! However, it is very important that only ONE site has WRITE
access to a certain disk unit. If both sites write more or less
simultaneously to the same disk unit, directory corruption may occur!
At the job site a disk unit can be protected against writes by the
local user by assigning the disk unit to the job that needs write
access to it, e.g.:

   .LOAD DL2:=DCJOB0

or, in case of logical disks, by using the SET command, e.g.

   .SET LDx NOWRITE          (x=0-7)


3. MULTIPROCESSOR UTILITIES OVERVIEW
-------------------------------------

JSHOW    Shows DC jobs running (Max. 7), nr. I/O's received and error
         report, devices available for remote access and their
         Read/Write access status. The R/W status can be changed.

JBHOLD   FORTRAN callable functions for blocking DCJOB activity by a
         BG program. Useful when e.g. the BG program performs high
         speed A/D acquisition.

         IERR=JBHOLD   ; Initializes job holding
                       ; --> Should be called only ONCE before calling
                       ;     the following routines:
         IERR=JBSPND   ; Suspends job running in the Foreground (F)
         IERR=JBRSUM   ; Resumes job running in the Foreground (F)
                       ; --> For each call JBSPND a matching JBRSUM
                       ;     should be executed!

         (NOTE: sept 85 DEC there is still a serious bug in the
         ABORT I/O code of the resident monitor!)

MAIL     Puts messages for DC jobs and general news in the mailbox.

         Attention:
         ----------
         Message size is restricted to 480 characters.

HELLO    To be run at the DC handler site. Displays the remote
         available devices, their characteristics and the R/W access.
         The device names are the physical names, so you can "see
         through" logical assignments made at the remote
         configuration!
         Reads news and messages in the mailbox, sends a message to
         the service jobs, where it is put in the mailbox and printed
         on the console terminal.
         The pseudo device handler HL should be installed and have the
         appropriate settings (RSPvector, RSP unit no.).
         Compares remote and local DATE&TIME and reports differences
         (TIME difference should be more than about 20 min.)

WATCH    May be running at the DC handler site. Scans every 10 sec.
         the status of the DC job and the contents of the mailbox for
         changes. Uses HL: for I/O (see HELLO)

JOBDEF   Defines, by interactive query, the remote available devices
         and their default access status. Should be run whenever the
         device list or default access status is changed or when a
         bootstrap program is added for a memory-only system.

JBDATE   Fetches the remote DATE&TIME and sets them locally.
         Very useful in a startup command file if a remote DATE&TIME
         is normally present at startup.

General purpose programs:

DEVICE   Prints device characteristics in the local system.

FREE     Prints sizes, no. of files & free blocks of all random access
         devices in the local system.


4. DEVICE ALLOCATION
---------------------

When a job starts running it attempts to open an I/O channel to all
devices specified in the device list. This means that, when a handler
is not loaded, the I/O channel will not be opened and the device will
not be available to the remote site(s).

IMPORTANT: handlers to which an I/O channel has been opened may not be
unloaded as long as the DC job(s) run!

Renaming of devices in the job's device list can be done using the
logical assignment procedure.

NOTE: logical assignments only have an effect on the device allocation
when they are made BEFORE startup of the job(s)!

The JSHOW utility also examines whether the devices in the list can be
allocated within the running system. During this examination it also
presents the identifier and characteristics (such as special function
support, variable size etc.) of the device. From these data it can be
seen whether a logical assignment is in effect (and thus was done
before starting up the jobs).
In addition to the job's device allocation scheme, JSHOW also examines,
for random access devices (disks) only, whether they really exist by
issuing "dummy" read requests to these devices.
Device units that have no drive, or whose drive is not on cycle (think
of removable disks), are marked: an additional question mark ('?') is
presented in the R & W access overview. At the remote site this mark is
presented by HELLO.

NOTE: the additional random access device examination is only done
when JSHOW is run! So JSHOW should be run each time a drive is turned
off, and when a drive that was off is turned on.


5. HANDLERS and PSEUDO-HANDLERS
--------------------------------

Using the datacommunication handlers and pseudo-handlers is simple, as
they are used in the same way as handlers for local devices!
All handlers know the SET XX SHOW command, which shows you the values
of the conditionals (octal!) in the handler and other useful
information.
The SET commands in the pseudo handlers are:

   .SET XX RSPVEC=nnn
   .SET XX UNITS=n
   .SET XX UNIT0=n

UNITS   is the number of device units the handler supports. E.g. if
        UNITS is set to 2, the handler only supports requests for
        units 0 and 1. Requests to other device units will return
        immediately with a hard error.

UNIT0   is the RSP unit start number of the handler. So handler unit
        number 0 corresponds with the UNIT0 setting, handler unit 1
        corresponds with the setting UNIT0+1, etc. In this way you can
        change access to a remote device by changing this number.
        E.g. when the following remote devices (list as displayed by
        JSHOW and HELLO) are available:

           Central device         DCJOB1
           -----------------------------
            0 RK0: System/fixed   R   -
            1 RK1: (Fixed RK05)   R   W
            2 RK2: (Removable)    R ? -
            3 RK3:                N
            4 RK4:                N
            5 LD1:                R ? -
            6 LD2:                N
            7 DM:  (RK07)         R   -
            8 JBINFO (Mailbox)    R   W
            9 HELLO               N
           10 SP: (LP: SPOOLER)   -   W
           11 SP1:(LP: wide   )   -   W
           12 SP2:(LP: quality)   -   W
           13 SP3:( Plotter   )   -   W
           14 MT:  (Magtape)      N
           15 JOB - handler       R   W
           -----------------------------
           (R = Read, W = Write access)

        Then a pseudo LP handler should be set:

           .SET LP UNITS=4
           .SET LP UNIT0=10

        in order to access the printing/plotting devices no. 10 - 13!
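The UNITS/UNIT0 rule above can be illustrated with a small sketch. It is
written in Python purely as a model of the arithmetic; it is not part of
the MP-11 package, and the function name rsp_unit and the use of an
exception to stand in for the handler's hard error are assumptions made
for this illustration.

```python
def rsp_unit(handler_unit: int, units: int, unit0: int) -> int:
    """Return the RSP unit addressed by a pseudo-handler unit number.

    Hypothetical model of the rule in the text: handler unit n maps to
    RSP unit UNIT0 + n, and only units 0 .. UNITS-1 are accepted.
    """
    if not 0 <= handler_unit < units:
        # Stands in for the "hard error" the handler returns immediately.
        raise ValueError(f"unit {handler_unit} not supported (UNITS={units})")
    return unit0 + handler_unit


if __name__ == "__main__":
    # Settings from the example: .SET LP UNITS=4 and .SET LP UNIT0=10
    for n in range(4):
        print(f"LP{n}: -> RSP unit {rsp_unit(n, units=4, unit0=10)}")
```

With the settings from the example, handler units LP0: to LP3: map to the
RSP units 10 to 13 of the device list, and a request for LP4: is rejected.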
RSPVEC  is the vector of the DC handler which drives the DC hardware
        using the RSP protocol and to which the pseudo handler
        belongs. E.g. when we have a system communicating with two
        separate remote systems, one, an LSI-11/23, accessed by a DC
        handler QN: having vector 170, and one, a PDP-11/34, by DR:
        having vector 310, then by setting:

           .SET HL RSPVEC=170    ! HL: is pseudo handler used by HELLO

        the LSI-11/23 is accessed by the HELLO program, and after:

           .SET HL RSPVEC=310

        the PDP-11/34 is accessed!
        With RT-11 V5.2 the following useful commands could be defined
        in this case:

           HELL23 :== SET HL RSPVEC=170\HELLO\\
           HELL34 :== SET HL RSPVEC=310\HELLO\\

The DC handlers driving the hardware, the so called DC/RSP handlers,
recognize the same SET commands except for UNIT0 and RSPVEC. The UNIT0
value is set to 0 by default but may be changed by editing the handler
source. Hardware I/O page addresses and vectors are also defined in the
sources.

All DC handlers are defined as random access with VARIABLE SIZE. This
assures that the size of a (disk) volume is always correct, even when
volumes with different sizes are exchanged during the run of DC jobs!
Use the FREE utility to inspect the sizes of mounted volumes.

When the DC-handlers are generated within a system with "device
time-out" support, they also recognize the option:

   .SET XX TIME=t

This specifies, in decimal, the time-out value t in units of 0.1 sec.
(50 Hz.).

Note that you can only use a pseudo handler when the DC-handler which
drives the hardware is loaded! Otherwise a hard error will be returned
on each call to the handler.

Attention:
Pseudo handlers CANNOT be used for bootstrapping! (The bootstrap cannot
load the appropriate DC handler!) Therefore the RSP units (device
units) to boot from are restricted to the range 0 - 7.


6. GETTING STARTED
-------------------

The software is simple to use and control. If you want to run only one
DC job you should use the Foreground/Background monitor.
For running 2-7 DC-jobs, or when you want to run e.g. QUEUE or other
foreground/system jobs simultaneously with one DC-job, you need the FB
monitor with system job support.
It is recommended to have a monitor with the so called "device
time-out" support. Otherwise time-out will not be supported.

Before a DC job is started make sure that its handler is installed.
Then type the SET XX SHOW command, where XX is the name of the job
handler. Now you can verify several settings and hardware addresses.

All DC jobs are numbered with a number between 0 and 7. The number is
stored in the job handler (device identifier = jobnumber + 300). The
first job to run is set to number 0 with the command SET XX JOBNUM=0,
the second to 1 with SET YY JOBNUM=1, etc. These numbers have to be set
correctly, as they are used by the utilities in order to find the
correct job settings in the job info file SY:JBINFO.DAT. Once these
"job-numbers" have been set, they should not be changed!

Now load the job handler and assign it to the logical name JOB:

   .LOAD DJ
   .ASS DJ JOB

The DC job can be started:

   .FRUN/BUFFER:nnnn DCJOB          (or SRUN/BUFFER:nnnn)

The number nnnn is the extra buffer space to be allocated to the job.
When /BUFFER:nnnn is omitted only a fixed buffer space of 256 words
(1 disk block) is used.

When a second DC job has to be started the procedure is repeated.
Assume that the second job uses a job handler with name DI. Then:

   .LOA DI
   .ASS DI JOB
   .SRUN/BUFFER:nnnn/NAME:DCJOB1 DCJOB

Note that although the same DC job program is used, it must be given
another logical name!

Now run the utility JSHOW to check the devices allocated to the job(s)
and the R/W protection. As a good rule JSHOW should always be run after
startup of the job(s), as this utility updates the device lists in
JBINFO.DAT (devices and/or device characteristics may have changed,
e.g. due to new logical assignments).


7. BOOTSTRAPPING
-----------------

When you have a running RT-11 system, you can bootstrap a DC-handler
with the command BOOT DC:. Beforehand you should make the disk bootable
with the command COPY/BOOT DC:RT11FB DC: or, at the site where the
DC-job runs, COPY/BOOT:DC DSK:RT11FB DSK:, where DSK: is the handler
name for the disk.

If you do not have a running RT-11 system (e.g. a "memory only"
system) with a DC-link, then check that the remote disk is bootable for
the DC handler and that the remote job has BOOT support (normally
default).

Booting:

   1. Activate the BOOT PROM at its correct start address, if you have
      no automatic boot on power on. If you do not yet have a suitable
      PROM use a toggle-in boot (see System Manual).

   2. In case of a boot program with password access enter the correct
      password for your processor and thereafter the RSP unit number
      of the bootable remote disk (should preferably be 0).


8. ERROR MESSAGES AND TROUBLE SHOOTING
---------------------------------------

8.1 ERROR MESSAGES:

When a DC job prints "No JBINFO.DAT" or "No JOB:" at startup, this
means respectively that there is no file SY:JBINFO.DAT (check with
DIR SY:.DAT) or that there is no handler with the logical assignment
JOB:.
When a job is running, the messages printed have the following general
format:

   ab-c-DCJOBx

where x  is the number of the job which printed the message.

      c  is the condition under which the error occurred:

         C = while receiving a command packet
         R = while receiving a data packet
         S = while sending a data packet
         E = while sending an end packet
         F = while processing a special function request
         I = no error but a message: e.g. ST-I-DCJOBx, job start
             message

      ab is the detailed error/message specification:

         NU = No USR available (special direc. devices only)
         CK = Checksum error
         HD = Protocol error
         IN = Initialize job command received
         BT = Bootstrap request received
         BF = Bootstrap request failed
         NB = Not enough Buffer space for executing a Special
              Function request received.
              You can avoid this error by increasing the job's buffer
              space by restarting with FRUN/BUFFER:yyyy DCJOBx
         SP = Special directory device (e.g. Magtape) request received
              while the job has no code for processing it.
              (use another DCJOB program which has the code)
         ST = Start of DC job

Normally an error is not fatal, as the DC-job returns a hard error to
the requesting site. However, in some more serious cases, e.g. protocol
errors, the job tries to recover by generating an internal reset. As a
result synchronisation with the DC-handler may be lost. In this case
the next request of the handler will probably result in a hard error.
Therefore the request should be repeated.
It should be noted that protocol errors occur only very seldom.

8.2 TROUBLE SHOOTING:

   1. Is the remote job running? Use the SHOW JOBS monitor command and
      the program JSHOW as a check.
      Does the remote job have the required features (optional are
      BOOT and SPDIR support)?
      All devices that are not special directory devices (such as
      MAGTAPE) and to which access is needed during the runtime of the
      job should be loaded before the job is started (FRUN or SRUN
      jobname).

   2. Are you aware of the appropriate device names and RSP unit
      numbers? Use HELLO or JSHOW for an overview. Is the access
      status correct?

   3. Check the RSP units which a device XX supports by executing
      SET XX SHOW. The printout also tells you whether XX is a
      pseudo-handler, as only pseudo handlers print: RSPvec=yyy.
      You can only do successful I/O transfers if the DC/RSP-handler
      to which the pseudo handler belongs is LOADED. Otherwise the
      pseudo handler reports hard I/O errors.

   4. Are JBINFO.DAT (job data file & mailbox) and xxBOOT.SYS (for
      booting with password) present on the system disk of the job
      site? Here xx stands for the name of the job handler driving the
      DC hardware (e.g. DJ: --> SY:DJBOOT.SYS).
      Is the JBINFO file initialized? This is done when the program
      JOBDEF has been run at least once.
      JOBDEF should also be run when a new boot program (xxBOOT.SYS)
      is placed on the system disk.
      Does the HL pseudo handler unit (UNIT0) communicating with the
      remote mailbox equal the message unit number defined by the
      program JOBDEF and displayed by JSHOW?

   5. For EACH I/O access to a special directory device, it is
      required that SET USR NOSWAP is in effect at the job site.

   6. AGAIN: Pseudo-handlers can only operate when the DC-handler to
      which they belong is LOADED. Otherwise they return a normal hard
      error.

   7. Run tests to check the hardware data link.

*********************************