Problem Resolution

Author Thread: Performance issues with Web Services
Owen.Shelksohn
Performance issues with Web Services
Posted: Thursday, July 01, 2004 7:45 PM (EST)

Hi,
 I use web services to run a server that hundreds of client processes talk to at a high rate. Each client sends a small amount of APL data to the server, receives a small amount of APL data back, and tries to wait as little as possible.

 

I've experienced several problems with this arrangement:
1) the server eventually 'locks up'. That is, it won't accept any more incoming requests.
2) the server generates thousands of temporary files which it does not delete.
3) sometimes the function the client is calling on the server executes, but the right argument is erroneously empty.
4) rate of successful requests seems low

 

I've tried to measure exactly what was happening.
The processes were sending over 300,000 requests a day (/1440 = 208.3 a minute, /1440x60 = 3.5 a second) to the server.
The processes send data using the WebTransfer's XSendObject method. When I quad DR wrapl the data being sent it is on average 408 characters long. The data returned from the server when wrapl'd is about 108 characters long. I use a 2 second timeout for the XSendObject method. According to quadMF using the CPU clock, the server side function on average takes 4 ms.
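The rate figures above can be re-derived with a few lines of Python (a minimal sketch; the function name is just for illustration, and only the 300,000 a day total comes from my measurements):

```python
MINUTES_PER_DAY = 24 * 60          # 1,440
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

def request_rates(requests_per_day):
    """Return (requests per minute, requests per second) for a daily total."""
    per_minute = requests_per_day / MINUTES_PER_DAY
    per_second = requests_per_day / SECONDS_PER_DAY
    return per_minute, per_second

per_minute, per_second = request_rates(300_000)
print(f"{per_minute:.1f} per minute, {per_second:.1f} per second")
# prints "208.3 per minute, 3.5 per second"
```

These are daily averages, so the peak rate during a busy period could be noticeably higher.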

I've also measured the percentage of successful client to server communications, which varied from 60-80%. Successful client to server to client communication (client actually received data back) was <50%.

 

There appear to be three associated problems with this much traffic. First, the server will eventually lock up, i.e. all incoming requests begin to fail. The only way I know to fix that is to restart the APLWebservices application entirely. One apparently associated symptom is that a netstat command shows hundreds (all?) of the ports on the server machine in a CLOSE_WAIT state. (I can send an example of this output if you like.) My kluge workaround is to periodically measure the netstat output and reload the server if that output is too big.
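A netstat check of this kind could look roughly like the following Python sketch. It is not the actual monitoring code: it assumes Windows-style `netstat -an` output with the state in the last column, and the sample text, the function names, and the threshold of 200 are all made up for illustration.

```python
def count_state(netstat_output, state="CLOSE_WAIT"):
    """Count TCP rows in `netstat -an` style text whose last column equals `state`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: TCP  local_address  foreign_address  STATE
        if len(fields) >= 4 and fields[0].upper() == "TCP" and fields[-1] == state:
            count += 1
    return count

def should_restart(netstat_output, threshold=200):
    """True when the number of CLOSE_WAIT rows exceeds the (hypothetical) threshold."""
    return count_state(netstat_output) > threshold

# Invented sample output, only to show the parsing.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:80            10.0.0.21:1388         CLOSE_WAIT
  TCP    10.0.0.5:80            10.0.0.22:1412         ESTABLISHED
  TCP    10.0.0.5:80            10.0.0.23:1420         CLOSE_WAIT
  UDP    0.0.0.0:445            *:*
"""

print(count_state(SAMPLE))  # prints 2
```

Counting one specific state rather than the overall size of the output makes the trigger less sensitive to unrelated connections on the machine.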

 

The second associated issue is the generation of thousands of temporary files, either empty or containing HTTP request headers. My workaround here is to delete all the temporary files once there are more than 1000. Before Jairo alerted me to this issue, the server had created over 60000 files in the temporary folder and refused to accept new requests since (I'm guessing) it ran out of unique 4 char hex temp file names to give. In general the server seems to create many files in a short period of time; I usually see lots of them created in a 5-15 minute period on different days. Since these seem to correspond to requests, maybe they are created during periods when the server is halted (error in my apl code), and then when the server is reloaded it loses track of them? I've tried to recreate this on purpose and haven't been able to so far.
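For reference, four hex characters allow 16^4 = 65,536 distinct names, which is consistent with the guess that the server ran out of names somewhere past 60000 files. The cleanup workaround can be sketched in Python as below; this is an illustration only, the function name and folder argument are hypothetical, and the only figure taken from the post is the 1000-file limit.

```python
import os

def purge_if_over(folder, max_files=1000):
    """Delete every regular file in `folder` once the count exceeds `max_files`.

    Returns the number of files actually removed.
    """
    with os.scandir(folder) as entries:
        paths = [entry.path for entry in entries if entry.is_file()]
    if len(paths) <= max_files:
        return 0
    removed = 0
    for path in paths:
        try:
            os.remove(path)
            removed += 1
        except OSError:
            # The file may still be held open by the server; skip it.
            pass
    return removed
```

Calling `purge_if_over(temp_folder)` on a timer reproduces the "delete everything once there are more than 1000" behaviour; files that cannot be deleted are simply left for the next pass.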

 

The third issue is the empty right arg on the server side function. My workaround is to check for it and essentially throw away those requests.

 

I recently (less than a week ago) made some changes to the client processes to reduce the number of communications with the server by about 50%. After these changes the processes are now sending less than 150,000 requests a day (/1440 = 104.2 a minute, /1440x60 = 1.7 a second). The percentage of successful client-server communications is now 99%, and client-server-client is around 95%. I'm still using a 2 second timeout in general, and some of the time a 30 second timeout.

 

The server hasn't been consistently up long enough for me to say for certain whether the locking problem has gone away, but it hasn't happened since these changes. It is still creating lots of temporary files on occasion.

 

Things have thus improved quite a bit, but it bodes ill for the future because the number of client processes will almost certainly increase over time, possibly even doubling. So this is at best a temporary solution.

 

Is this level of performance to be expected? Are 300,000 requests a day to the server too many? Or is this expected simply because my timeout is low?

 

(One idea I am considering is trying to use the asynchronous XASendObject instead, then a long timeout presumably doesn't hold up the client process.)
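The general idea of an asynchronous send can be illustrated in Python rather than APL. This sketch does not model the WebTransfer object or XASendObject at all: `slow_send` is a hypothetical stand-in for a request that takes a while to answer, and `send_async` is an invented helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_send(payload, delay=0.2):
    """Hypothetical stand-in for a request that takes `delay` seconds to answer."""
    time.sleep(delay)
    return payload.upper()

def send_async(executor, payload, on_reply):
    """Submit the request and return at once; `on_reply` runs when the answer arrives."""
    future = executor.submit(slow_send, payload)
    future.add_done_callback(lambda f: on_reply(f.result()))
    return future

replies = []
with ThreadPoolExecutor(max_workers=4) as pool:
    future = send_async(pool, "ping", replies.append)
    # The caller is free to do other work here instead of blocking on the reply.
    future.result(timeout=30)  # a long timeout no longer stalls every request

print(replies)  # prints ['PING']
```

With this shape, the timeout only bounds how long a reply is considered valid, not how long the client process is held up.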

 

Any help or suggestions are appreciated, and thanks for reading what is undoubtedly a long, boring post.

 

APLWebServices Desktop version 2.3.7 Beta
WebTransfer object version 1.0.1720
server machine is Windows Server 2003, clients are some version of NT from NT4.0 to Windows 2000


Comments:

Author Thread:
j.merrill
Performance issues with Web Services
Posted: Sunday, July 04, 2004 10:59 PM (EST)
Google for TIME_WAIT (possibly include Windows as well) and you'll get info about a registry setting to reduce the time delay after a disconnect from the default 4 minutes (!!!) to a smaller value. Holler if that doesn't solve your problem.

     

Jairo.Lopez
Performance issues with Web Services
Posted: Tuesday, July 06, 2004 1:14 PM (EST)

A new Release Candidate of the Desktop version of the server has been posted to APLDN. You can find it at the following link:
http://apldn.apl2000.com/downloads/APLWS+Downloads/APLWSRC/756.aspx
This new version implements a more aggressive clean up of open sockets and temporary files.
My first suggestion is to use ‘ASendObject’ instead of ‘SendObject’. I am assuming that the timeout of only 2 seconds is there to avoid the client being locked while waiting for the server to process the request. Using the asynchronous flavor of ‘SendObject’ will accomplish the same result with a cleaner and more extensible mechanism.
My second suggestion is to be careful not to create your own Denial of Service (DoS) attack. A machine has a limit of 64000 ports, and when an incoming connection is accepted, the connected socket gets its own port. If the client then closes the socket on its end, it takes some time for the server to receive the event and release the port. In your test, making 3.5 requests per second with a client-server-client success rate of less than 50% means that every second (on average) the system leaves 1.75 open ports waiting to be closed (which takes up to 4 minutes). At that rate it is just a matter of time before the machine runs out of ports and the server cannot accept any more incoming requests.
Can you try the new server and let me know of any performance improvements?
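The port estimate in the second suggestion can be worked through as a back-of-the-envelope Python sketch. The inputs (3.5 requests per second, roughly 50% failures, a 4 minute release delay, a 64000 port limit) are the figures quoted in this thread; the function names and the "never released" scenario are assumptions added for illustration.

```python
def lingering_connections(requests_per_second, failure_fraction, linger_seconds):
    """Steady-state count of abandoned connections if each is released after `linger_seconds`."""
    return requests_per_second * failure_fraction * linger_seconds

def seconds_until_exhausted(port_limit, requests_per_second, failure_fraction):
    """Time to use up `port_limit` if abandoned connections are never released."""
    return port_limit / (requests_per_second * failure_fraction)

leak_rate = 3.5 * 0.5
print(leak_rate)                                  # prints 1.75
print(lingering_connections(3.5, 0.5, 240))       # prints 420.0
hours = seconds_until_exhausted(64000, 3.5, 0.5) / 3600
print(round(hours, 1))                            # prints 10.2
```

Under these assumptions, connections that are released within 4 minutes level off at a few hundred, while connections that are never released would reach the quoted limit in roughly ten hours.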

     

j.merrill
Performance issues with Web Services
Posted: Tuesday, July 06, 2004 4:47 PM (EST)

What do you mean by "more aggressive clean up of open sockets"?  (Clearly cleaning up temp files more aggressively was warranted.)  The server should not "clean up open sockets" by violating the TCP/IP standards in regards to the length of time the system stays in the TIME_WAIT state, given that the owner of the machine can lower this value if needed (it always should be lowered on a potentially busy server, IMO).

 

If it had not been the case before (and I haven't looked in a while), the caller should be able to specify the equivalent of HTTP's Keep-Alive spec to allow re-use of the connection (if the caller knows that another request will follow).  However, that should not be the default, and the server should not be leaving the socket open unless the request includes such a keep-alive spec.

 

I don't know what you can mean by "aggressive clean up of open sockets" that doesn't make me nervous.

     

Jairo.Lopez
Performance issues with Web Services
Posted: Wednesday, July 07, 2004 11:06 AM (EST)

The system will close an open socket if both conditions are true:

  • More than 500 open sockets
  • No activity in the socket for 30 seconds.

This mechanism bypasses the settings of the server properties "Connection Timeout" and "HTTP Keep-Alives enabled", and the Windows setting for TIME_WAIT.

 

This is designed to prevent a DoS attack.
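The two-condition rule can be expressed as a small Python sketch. This is not the server's code: the 500-socket and 30-second figures come from the rule as stated, while the function name, the dictionary of timestamps, and the use of a monotonic clock are assumptions for illustration.

```python
import time

MAX_OPEN_SOCKETS = 500      # rule applies only above this many open sockets
IDLE_LIMIT_SECONDS = 30     # a socket with no activity for this long may be closed

def sockets_to_close(last_activity, now=None):
    """Return the ids of sockets eligible for closing.

    `last_activity` maps a socket id to the timestamp of its last activity.
    Nothing is closed unless more than MAX_OPEN_SOCKETS sockets are open.
    """
    if now is None:
        now = time.monotonic()
    if len(last_activity) <= MAX_OPEN_SOCKETS:
        return []
    return [sock_id for sock_id, seen in last_activity.items()
            if now - seen >= IDLE_LIMIT_SECONDS]
```

Because both conditions must hold, a lightly loaded server never closes idle sockets this way, and a busy one only drops the connections that have gone quiet.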

 

As this is really a features issue, we should probably start a new thread under the "Features" section for a followup discussion.

     


