Hi,
I use webservices to run a server which hundreds of client processes talk to at a high rate. They try to send a small amount of apl data to the server, receive a small amount of apl data back, and they try to wait as little as possible.
I've experienced several problems with this arrangement:
1) the server eventually 'locks up'. That is, it won't accept any more incoming requests.
2) the server generates thousands of temporary files which it does not delete.
3) sometimes the function the client is calling on the server executes, but the right argument is erroneously empty.
4) the rate of successful requests seems low.
I've tried to measure exactly what was happening.
The processes were sending over 300,000 requests a day (/1440 = 208.3 a minute, /1440x60 = 3.5 a second) to the server.
The processes send data using the WebTransfer's XSendObject method. When I quad DR wrapl the data being sent it is on average 408 characters long. The data returned from the server when wrapl'd is about 108 characters long. I use a 2 second timeout for the XSendObject method. According to quadMF using the CPU clock, the server side function on average takes 4 ms.
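A rough sanity check on these numbers, sketched in Python rather than APL. It assumes requests are spread evenly over the day (they almost certainly are not) and that the 4 ms CPU figure is the whole cost of a request, which ignores network and file handling. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

def per_minute(requests_per_day):
    """Average requests per minute, assuming an even spread over the day."""
    return requests_per_day / MINUTES_PER_DAY

def per_second(requests_per_day):
    """Average requests per second, assuming an even spread over the day."""
    return requests_per_day / SECONDS_PER_DAY

def utilization(requests_per_day, service_seconds):
    """Fraction of each second the server function is busy (rate x service time)."""
    return per_second(requests_per_day) * service_seconds

rate = per_second(300_000)
busy = utilization(300_000, 0.004)   # 4 ms per call, as measured on the server
print(f"{rate:.2f} requests/second")  # 3.47 requests/second
print(f"{busy:.1%} busy")             # 1.4% busy
```

On those assumptions the server-side function itself is busy well under 2% of the time, which suggests the failures come from connection handling or bursts rather than from the APL code running out of CPU.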
I've also measured the percentage of successful client to server communications, which varied from 60-80%. Successful client to server to client communication (client actually received data back) was <50%.
There appear to be three associated problems with this much traffic. First, the server will eventually lock up, i.e. all incoming requests begin to fail. The only way I know to fix that is to restart the APLWebServices application entirely. One apparently associated symptom is that a netstat command shows hundreds (all?) of the ports on the server machine in a CLOSE_WAIT state. (I can send an example of this output if you like.) My kluge workaround is to periodically check the netstat output and reload the server if that output is too big.
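The netstat check described above could be sketched roughly as follows, in Python rather than APL. CLOSE_WAIT means the remote side has closed the connection and the local application has not yet closed its own socket, so a growing count points at sockets the server never releases. The threshold of 200 and the sample lines are invented for illustration; the real netstat output on the server may be laid out differently.

```python
import subprocess

def count_close_wait(netstat_output):
    """Count lines of netstat output whose state column is CLOSE_WAIT."""
    return sum(1 for line in netstat_output.splitlines()
               if "CLOSE_WAIT" in line.split())

def server_looks_stuck(netstat_output, threshold=200):
    """True when the CLOSE_WAIT count reaches the (assumed) threshold."""
    return count_close_wait(netstat_output) >= threshold

def read_netstat():
    """Run 'netstat -an' and return its text output."""
    result = subprocess.run(["netstat", "-an"],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Made-up sample in the general shape of Windows netstat output.
sample = """\
  TCP    10.0.0.5:80    10.0.0.21:1045    CLOSE_WAIT
  TCP    10.0.0.5:80    10.0.0.22:1046    ESTABLISHED
  TCP    10.0.0.5:80    10.0.0.23:1047    CLOSE_WAIT
"""
print(count_close_wait(sample))
```

A scheduled job would call `server_looks_stuck(read_netstat())` and trigger the reload when it returns true.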
The second associated issue is the generation of thousands of temporary files, either empty or containing HTTP request headers. My workaround here is to delete all the temporary files once there are more than 1000. Before Jairo alerted me to this issue, the server had created over 60000 files in the temporary folder and refused to accept new requests because (I'm guessing) it ran out of unique 4 char hex temp file names to give (4 hex characters allow only 65,536 distinct names). In general the server seems to create many files in a short period of time; I generally see lots of them created in a 5-15 minute period on different days. Since these seem to correspond to requests, maybe they are created during periods when the server is halted (error in my apl code), and then when the server is reloaded it loses track of them? I've tried to recreate this on purpose and haven't been able to so far.
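The temp file workaround amounts to something like the sketch below, again in Python rather than APL. It assumes the folder holds nothing but the server's temporary files, and it skips any file that cannot be deleted (for example one the server still has open). The function name and the folder argument are hypothetical.

```python
from pathlib import Path

def purge_temp_files(folder, limit=1000):
    """Delete every file in folder once there are more than limit of them.

    Returns the number of files actually removed.
    """
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    if len(files) <= limit:
        return 0                      # below the limit: leave everything alone
    removed = 0
    for path in files:
        try:
            path.unlink()
            removed += 1
        except OSError:
            pass                      # probably still in use; try again next run
    return removed
```

Deleting only files older than a few minutes would be a gentler variant, since it avoids removing a file that belongs to a request still in progress.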
The third issue is the empty right arg on the server side function. My workaround is to check for it and essentially throw away those requests.
I recently (less than a week ago) made some changes to the client processes to reduce the number of communications with the server by about 50%. After these changes the processes are now sending less than 150,000 requests a day (/1440 = 104.2 a minute, /1440x60 = 1.7 a second). The percentage of successful client-server communications is now 99%, and client-server-client is around 95%. I'm still using a 2 second timeout in general, and some of the time a 30 second timeout.
The server hasn't been consistently up long enough for me to say for certain whether the locking problem has gone away, but it hasn't happened since these changes. It is still creating lots of temporary files on occasion.
Things have thus improved quite a bit, but it bodes ill for the future because the number of client processes will almost certainly increase over time, possibly even doubling. So this is at best a temporary solution.
Is this level of performance to be expected? Are 300,000 requests to the server a day too many? Or is this expected simply because my timeout is low?
(One idea I am considering is trying to use the asynchronous XASendObject instead, then a long timeout presumably doesn't hold up the client process.)
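The general idea, independent of WebTransfer, looks roughly like this in Python. `send_request` is a hypothetical stand-in for a blocking send and is not the XSendObject or XASendObject API; the point is only that the caller hands the request off, carries on working, and collects the reply later, so a 30 second timeout costs the client nothing unless the reply really is late.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(payload, timeout):
    """Hypothetical blocking send; the sleep stands in for a network round trip."""
    time.sleep(0.05)
    return "reply to " + payload

def send_async(executor, payload, timeout=30.0):
    """Start the send on a worker thread and return a future immediately."""
    return executor.submit(send_request, payload, timeout)

executor = ThreadPoolExecutor(max_workers=4)
future = send_async(executor, "hello")

work_done = sum(range(1000))          # the client keeps working meanwhile

reply = future.result(timeout=30.0)   # only blocks if the reply is not in yet
executor.shutdown()
print(reply)
```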
Any help or suggestions are appreciated, and thanks for reading what is undoubtedly a long, boring post.
APLWebServices Desktop version 2.3.7 Beta
WebTransfer object version 1.0.1720
server machine is Windows Server 2003, clients are some version of NT from NT4.0 to Windows 2000