DELUA technical manual, VAX diagnostic

Ian S. King isking at uw.edu
Wed Nov 19 19:50:23 CST 2014


I have to say this about the LCM VAX: it successfully ran CMUIP for a
rather long time.  The pattern of failures, becoming more and more
frequent, looks more like a hardware fault than a software one.  The
machine has never carried a large load - when I watched it regularly,
there were rarely more than a handful of users at any one time.
The machine has been power-cycled in response to errors, which would of
course reinitialize all transient data structures - and I do not believe
that CMUIP uses persistent caches (i.e. cached to disk).

Yes, LCM could just load UCX and perhaps whistle a happy tune.  It might be
an interesting experiment to do so and observe behavior - the changes in
the startup script could easily be commented in/out.  I'd certainly like to
see them continue to use CMUIP for historical reasons.  Multinet would also
be interesting, if policies have changed at LCM to provide for licensing
costs of that software.  When I was restoring the machine back in 2008-9,
Process Software offered a somewhat amusing 'discount' for an educational
institution.  -- Ian

On Wed, Nov 19, 2014 at 8:03 AM, Peter Coghlan <cctech at beyondthepale.ie>
wrote:

>
>> One thing, though.
>> I don't think that the error code from the $QIO in the OPCOM log is a
>> VMS exit code. But I might be wrong on that.
>> But that could do with some more examining.
>>
>>
> There is a poorly phrased entry in the CMU/IP FAQ which could give the
> impression that CMU/IP uses its own error codes that are entirely
> different from VMS status codes.  What I think it is really trying to
> say is that, like many VMS applications, CMU/IP defines _additional_
> status codes that VMS does not already have suitable messages defined
> for, and the text messages associated with these are not available
> unless the appropriate CMU/IP provided message files are loaded.
>
> Low numbered error codes such as 1C (and another favourite - 0C which is
> %SYSTEM-F-ACCVIO, access violation) come from system services and runtime
> library functions that are part of VMS and the message texts are made
> available automatically by VMS.  It is not the case that CMU/IP reporting
> an error code of 1C means something different to some part of VMS
> reporting it.  They both mean process quota exceeded.
>
> Directly underneath that entry in the FAQ, I found the following:
>
>
>> 3.1.2 >>>> IPACP CRASH DUE TO QUOTA EXCEEDED
>>  [20-MAR-1995]
>>
>> For systems with a high IP load, IPACP may occasionally crash with a quota
>> exceeded. This does not refer to disk quota, but to one of the process
>> quota limits. Usually, the quota in question is BYTLM.
>>
>> The default BYTLM provided for IPACP (65536) is sufficient for only about
>> 20 connections. IPACP takes about 32000 for itself and each connection
>> takes about 1872 bytes. This requirement is NOT currently documented.
>>
>> To increase the BYTLM for the IPACP, modify the IP_STARTUP.COM procedure
>> and change the value of the /BUFFER_LIMIT qualifier on the RUN command
>> that starts the IPACP process. Then shut down and restart IPACP.
>>
>> At the current time, there also appears to be a memory leak in IPACP
>> which has the effect of gradually reducing the available BYTLM over time.
>> When this gets close to zero, IPACP will hang (as it retries) and then
>> crash soon afterwards. It is therefore desirable to give IPACP more BYTLM
>> than the typical load might suggest. If this sort of crash is
>> experienced, increase the BYTLM by 50% and restart it.
>>                                                   <A.Harper at kcl.ac.uk>
>>
>>
> Looks like my pagefile quota guess was wrong and the culprit is BYTLM.
> However, I suspect the underlying cause of this problem has never been
> fully addressed and increasing the quota will not help, or worse, will
> help for about a week before the problem returns even more frequently.
>
> I cannot overemphasise how much relief will be experienced on the
> replacement of CMU/IP by something that works properly or even by
> something that doesn't mess up as badly.  Problems that you didn't even
> know you had will go away, even ones which seemed unrelated to
> networking.  On sunny days, the sun will seem brighter and the sky
> bluer :-)
>
> In my previous posting, I forgot to mention that you can also try:
>
> $ MCR NCP SHOW KNOWN LINE COUNTERS
>
> if running DECnet.  This will give DECnet's view on any network media
> problems including those relating to other protocols going through the
> same network adapter.  It probably won't have much to say about hardware
> failures in the network adapter though.  Remember that on a half duplex
> ethernet, collisions are normal and expected but late collisions indicate
> a problem.
>
> Regards,
> Peter Coghlan.
>
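
For what it's worth, the BYTLM figures in the quoted FAQ entry can be
checked with a little arithmetic.  The sketch below is Python rather than
anything VMS-specific; the function and constant names are made up for
illustration, the three numbers are the FAQ's own approximate values, and
it ignores the leak the FAQ mentions, so treat the results as rough upper
bounds rather than a sizing rule.

```python
# Figures quoted from CMU/IP FAQ entry 3.1.2; all of them are approximate.
DEFAULT_BYTLM = 65536    # default BYTLM given to IPACP
IPACP_BASE = 32000       # bytes IPACP takes for itself ("about 32000")
PER_CONNECTION = 1872    # bytes per connection ("about 1872")


def max_connections(bytlm):
    """Whole connections that fit in bytlm after IPACP's own share."""
    return max(0, (bytlm - IPACP_BASE) // PER_CONNECTION)


def bytlm_for(connections):
    """BYTLM needed for a given connection count, with no safety margin."""
    return IPACP_BASE + connections * PER_CONNECTION


print(max_connections(DEFAULT_BYTLM))           # 17
print(max_connections(DEFAULT_BYTLM * 3 // 2))  # 35, after a 50% increase
print(bytlm_for(100))                           # 219200
```

By these numbers the default quota covers 17 connections, which is in line
with the FAQ's "only about 20", and the suggested 50% increase roughly
doubles that to 35.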



-- 
Ian S. King, MSIS, MSCS
Ph.D. Candidate
The Information School
University of Washington

An optimist sees a glass half full. A pessimist sees it half empty. An
engineer sees it twice as large as it needs to be.

