


Bug Reports

Author Thread: recurring system shutdowns
Richard.Hill
recurring system shutdowns
Posted: Tuesday, October 25, 2005 11:53 PM (EST)

The system shutdown problem is getting worse.

Am now getting a shutdown 1-2 times a day, whereas

it might have happened only every few months before.

We follow the advice provided earlier, using

{quad}IT  audit...

and WSTOFILE... to rebuild the ws

at intervals.

The individual shutdowns are not that serious

and we are not losing much work, but it is very  annoying.

 

It seems that we have hit some sort of barrier and/or

unwittingly drifted into a new programming practice.

Would appreciate guidance.

 

Our ws's are about 50MB with 6000+ fns.

We drive windows fairly hard. Opening and closing many small

windows at speed.

Our practice is to write fairly small fns, consequently the size

of {quad}SI can get big.

The failures occur on different computers (all with winXP)

However, none of this is new, so what has changed in the last few

weeks?

Richard Hill


Comments:

Author Thread:
davin.church
recurring system shutdowns
Posted: Wednesday, October 26, 2005 7:22 PM (EST)
Can you describe the "shutdowns" a little bit? Maybe some of us can come up with something to try. Also, when you're using []IT 'AuditRefcountsC', is it returning a matrix of all zeros, or is it giving back positive numbers in places?

     

Richard.Hill
recurring system shutdowns
Posted: Wednesday, October 26, 2005 9:36 PM (EST)

Davin Church asked for details of the shutdowns, will try to do this,

The only way I know to capture the shutdown msg is to use printscreen

and paint, and get an image. This is fiddly and increases the

annoyance level. Is there a better way?

re {quad}IT 'AuditRefcountsC' I get different results each time.

Following DC's earlier advice, I repeat until all zero before

saving the ws. There is no pattern to the display that I can recall.

Sometimes the numbers appear in the first row, sometimes the

second.

Thank you for your interest in the problem.

     

davin.church
recurring system shutdowns
Posted: Thursday, October 27, 2005 1:43 AM (EST)

Ok, Richard...

 

First of all, simply knowing that you're getting a message at all is a help.  (You didn't specify anything about what was going on.)  You might try to just tell us what the title of the message box says plus the first line of text from inside the box.  If there's very much information in there, it's probably too much detail for us anyway.  So just getting the basics will help get things started in the right direction.

 

As for your AuditRefcountsC problem, that I can probably help with.  That problem most often recurs when you copy something into the workspace from another "infected" workspace.  So, you should begin by clearing out your production workspace (using AuditRefcountsC until it's all zeros) and then )SAVE it that way.  Then go and do the same to every other workspace on your system, and on any other system that you move or )COPY workspaces to and from.  This should clean up your entire environment so that the problem will be more difficult to recur.

 

If it's still showing up after all this, and you're not copying things in from other workspaces (that aren't also proven "clean"), then you've probably found a new way to produce the orphan-damage.  At that point, you'll need to start work on a simplified replication method for the APL folks to track down for you.

     

Andrew.Brown
recurring system shutdowns
Posted: Thursday, October 27, 2005 5:39 AM (EST)

Just as a matter of interest, I always get a "System Failure" when I run []IT 'AuditRefcountsC'.

 

     

Support
recurring system shutdowns
Posted: Thursday, October 27, 2005 8:57 AM (EST)

Richard:

 

Davin's comments and suggestions are spot on.  [Thanks Davin]  If you find a reproducible case, please send it to support@apl2000.com for investigation.  To help isolate the point of failure in your application, you might consider interspersing the ⎕IT 'AuditRefcountsS' command throughout your application.

APL2000 Support

     

Support
recurring system shutdowns
Posted: Thursday, October 27, 2005 9:20 AM (EST)
Andrew:

If you're getting a "System Failure" error report when trying ⎕IT 'AuditRefcountsC', then your workspace is damaged in a manner that isn't repairable with the ⎕IT command.  In this instance, the solution is to get the objects (fns and vars) from the damaged workspace into a new workspace.  You can accomplish this using the WStoFILE and FILEtoWS utility functions from the CONVERT.W3 workspace or the ]OUT and ]IN user commands.  After creating the new workspace, you should run the ⎕IT 'AuditRefcountsS' command to ensure that the workspace is free of any anomalies.

APL2000 Support

     

Paul.Ravitz
recurring system shutdowns
Posted: Thursday, October 27, 2005 10:08 AM (EST)
Question for APL2000 support: Is it always safe to run #IT 'AuditRefcountsS' in a workspace prior to )SAVEing it? I ask because I have a tool set up which I use instead of typing )SAVE, which checks a number of things (eg. #SI) before actually saving the workspace, and I'd like to add a call to #IT 'AuditRefcountsS'. I do NOT plan on automatically calling #IT 'AuditRefcountsC'. Thanks.

     

Support
recurring system shutdowns
Posted: Thursday, October 27, 2005 11:12 AM (EST)
Rav:

Yes, but make certain to not save the workspace when the ⎕IT command detects anomalies in the workspace.

APL2000 Support
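
A pre-save check of the kind Rav describes could be as small as the sketch below. It assumes, as Davin describes earlier in the thread, that a clean audit returns a matrix of all zeros. SafeToSave is a hypothetical name, not an APL2000-supplied function.

```apl
    ∇ ok←SafeToSave
      ⍝ Hypothetical pre-save check: ok is 1 only when every audit count is zero.
      ⍝ Assumes the audit result is a numeric matrix, as described in this thread.
      ok←^/^/0=⎕IT 'AuditRefcountsS'
    ∇
```

A save tool would then go on to )SAVE only when SafeToSave returns 1, in line with Support's advice.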

     

William.Rutiser
recurring system shutdowns
Posted: Friday, October 28, 2005 1:34 PM (EST)
The AuditRefcounts? operations "frisk" the Ws beforehand. The frisker examines most of the memory management data in the workspace. It looks for things like incorrect lengths, invalid pointers, etc. AuditRefcounts? itself recomputes and verifies reference counts.

The frisker reports the first error it finds by putting up a "System Failure" message box. These problems are sufficiently catastrophic that the interpreter can't continue safely.

Reference count problems are much less severe. They can live on, and multiply (as Davin discovered), for years. They are unpleasantly hard to track down.

It is somewhat possible that the auditor will trip over something that the frisker didn't detect. In which case, the interpreter crashes.

So if there is nothing wrong except reference count problems, AuditRefcountsS is safe. If there is anything else wrong, you will lose the active WS.

Note that many of the things that the frisker can find have a pretty good chance of inducing a failure while saving the WS. Such failures would happen before anything is written to the disk.

The belt and suspenders {aka braces} would be to first do a save with a temporary name, run the auditor, and then resave with the real name.

-- wru

     

Richard.Hill
recurring system shutdowns
Posted: Monday, October 31, 2005 9:35 PM (EST)

System Failure

...wsfrisker.cpp

Line 171

unrecognised descriptor

...

after several runs of AuditRefcountsC

 

soon after a rebuild of the ws with WSTOFILE

after couple of )saves and )loads

 no )copy

a file was open

     

Paul.Ravitz
recurring system shutdowns
Posted: Tuesday, November 01, 2005 6:12 AM (EST)
Thank you for that detailed explanation. p

     

Support
recurring system shutdowns
Posted: Wednesday, November 02, 2005 12:22 PM (EST)
Richard,

If you have a reliably reproducible example, send it to us and we will investigate it.

APL2000 Support

     

Richard.Hill
recurring system shutdowns
Posted: Sunday, November 06, 2005 10:13 PM (EST)
In a completely different area, following a rebuild with WSTOFILE and 2 successive )saves and )loads: 2 flags from AuditRefCount, both cleared before this system failure

...WsManager.cpp line: 379

There was a file open, 2 timers running, and the SI had about 100 entries.

     

Richard.Hill
recurring system shutdowns
Posted: Monday, November 14, 2005 3:08 AM (EST)
Support asked for a repeatable case. There doesn't seem to be any pattern. Most of the time we get an immediate stop and an MS Windows msg... "please tell us about this problem".

We got another one today, again just after a refcount problem, then a WSTOFILE-FILETOWS clean-up, a couple of saves, and then refcount... then cleared refcount, then lost the ws and got "Please tell us..."

I haven't been sending these in to MS, but decided to start doing it today. One sent off a few minutes ago. Is it a waste of time sending stuff to MS?

Richard Hill

     

brent.hildebrand
recurring system shutdowns
Posted: Monday, November 14, 2005 11:39 AM (EST)
100 entries in []SI? Why so many? Are you making onTimer calls, and then executing a []WGIVE in the callback handler, and thus backing up the completion of the handlers? That would not be a good thing.

     

Richard.Hill
recurring system shutdowns
Posted: Tuesday, November 15, 2005 9:31 PM (EST)
Brent, thank you for your interest in the problem. Our large QuadSI is due to writing very many small functions which invoke other small functions... (We were told that this is a good idea, some years ago, and it's not new; the application has had this architecture for a long time.)

We have one onTimer call with Wgive in the callback. It is deep in the system and has been working quite well for years. The recurring shutdowns only started happening a few weeks ago.

I fear that we may have reached some sort of 'reasonableness' limit, as we have slowly grown the complexity of the app. However, there don't seem to be any guidelines for this. There are mentions of APL+ apps that must be much more heavyweight than our baby.

Is there any recommendation of where you should stop in growing things like no. of files, no. of forms, no. of functions...?

The other possibility, of course, is that a few weeks ago MS modded something in WinXP, since we have applied all the automatic updates. For this testing I only have access to winXP or winXP64 machines. Tried on 4 different computers, same behaviour on all.

My next step, when pressures reduce, is to write an onTimer tool that runs AuditRefCount independently at frequent intervals and stops the system somehow when a non-zero appears. Has anyone got something like this available?

Richard

     

davin.church
recurring system shutdowns
Posted: Wednesday, November 16, 2005 10:42 AM (EST)

Richard, here's a simple version of a refcount-checker that I use:

 

    ∇ CheckWs
[1]   ⎕error (~^/^/0=⎕it 'AuditRefcountsS')/'WS damaged!'
    ∇

 

You're also welcome to download my timer utility from <http://apldn.apl2000.com/downloads/APLWI+Downloads/APLWIUF/2347.aspx> to handle the timer events conveniently.  For example:

 

'CheckWs' ∆Timer ¯30

 

runs a check every 30 seconds.
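
For readers unfamiliar with the idiom in CheckWs, the session sketch below walks through how the test evaluates. The matrices M and Z are made-up stand-ins for an audit result (the real shape and contents come from the interpreter); only the all-zeros-means-clean convention is taken from this thread.

```apl
      M←2 3⍴0 0 0 0 2 0      ⍝ hypothetical audit result with one nonzero count
      ~^/^/0=M               ⍝ 1 means at least one count is nonzero
1
      (~^/^/0=M)/'WS damaged!'
WS damaged!
      Z←2 3⍴0                ⍝ hypothetical all-zero ("clean") result
      ~^/^/0=Z
0
      ⍴(~^/^/0=Z)/'WS damaged!'   ⍝ compress by 0 leaves empty text
0
```

Reading right to left: 0=M marks the zero entries, the two ^/ reductions collapse rows and then the whole matrix into a single "all zero" flag, ~ inverts it, and the flag then keeps or drops the message text. CheckWs relies on the error being signalled only when that text is non-empty.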

     

William.Rutiser
recurring system shutdowns
Posted: Thursday, November 17, 2005 10:25 AM (EST)
Richard, Exactly what build of APL are you running-- What does []sysver report? What does "system shutdown" refer to? the entire Windows system, the APL interpreter, or perhaps something else? If the interpreter, how is the shutdown manifested? Do you get a dialog box from APL or from Windows? What does it say? What non APL things do you interface with? ActiveX objects, []WCALLed things, []NA called things, []call stuff? Are you using []NI? []WI? -- Bill

     

Richard.Hill
recurring system shutdowns
Posted: Friday, November 18, 2005 2:31 AM (EST)
Davin, thanks for your help. Your suggestions are already downloaded and working in our test system.

Bill, you asked for more info...

⎕sysver

5.2.08 Dec 10 2004 12:41:08 Win/32

What does "system shutdown" refer to? the entire Windows system...yes..., the APL interpreter...yes... How is the shutdown manifested? Do you get a dialog box from APL...yes... or from Windows?...yes... What does it say?...I quoted the interpreter messages earlier in the thread. The Windows messages are the "please send us error report" type, as mentioned earlier in the thread.

A typical scenario is... pause during testing, and try to save the work; our save has auditrefcount built in. We get the entries and try to clear as recommended. If they clear, I usually quickly save a function file with the modded functions and then try to save the ws. The save may work, but usually we get the crash at this time. About 10 percent of the time it is a total machine lockup that only power off will clear. Most often we get the Microsoft "...we are sorry..." message, and occasionally we get the message from the interpreter. Less often, we get the failure while a test is running.

What non APL things do you interface with? ActiveX objects...yes... (Excel, but Excel is usually not active when the failures occur.)

[]WCALLed things?...yes..., []NA called things?...yes..., []call stuff?...yes... (but only the APL+ supplied functions)

Are you using []NI?...no...

[]WI?...yes... (a lot)

Thanks again for your interest,

Richard

     


