We have an application that was tested and released last month. Now the client complains that the application gets stuck two or three times a week, and they have to reboot the server every time that happens. They usually expect a lot of users during the daytime, and it works fine then, but at night it gets stuck and they have to reboot the server in the morning. Has anyone had a similar experience? Where do you think the problem might be? My guess is a memory leak. What do you think?
A memory leak is one possibility. Get a BoundsChecker product (or similar; I think Purify is one) and leaks are easily found.
Other issues will be how many services the servers are running and whether these can be batched into logical synchronous blocks. This reduces possible conflicts between services. Also examine all application, system, and transaction logs and look for common occurrences around the time the server locks up.
One project I worked on was a 24-hour system (defence) in which we had bugs we could not reproduce in an 8-hour shift. What we did with these was to group them logically and compare the groups against the system design to see if a particular area of the architecture needed addressing.
Tough problem to diagnose. Question: does it ever hang during the day? If not, then a memory leak looks less likely.
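One rough way to test the memory-leak theory is to record the application's memory use at intervals after a restart and see whether it climbs steadily. The sketch below is only an illustration: the readings are made up, the function name `growth_per_hour` is hypothetical, and how you collect the numbers (performance monitor, `ps`, or similar) depends on your server.

```python
def growth_per_hour(samples):
    """Least-squares slope of memory use (MB) against time (hours)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

# Hypothetical (hours since restart, memory in MB) readings, for illustration only
leaking = [(0, 200), (1, 230), (2, 260), (3, 290)]
steady = [(0, 200), (1, 205), (2, 198), (3, 201)]

print(growth_per_hour(leaking))  # steady climb: large positive slope
print(growth_per_hour(steady))   # flat: slope close to zero
```

A slope that stays clearly positive over many hours points towards a leak; a flat line suggests looking elsewhere first.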
A few things to look at, in addition to the memory leak:
Housekeeping jobs. Are there any jobs running overnight that might have an adverse effect? Backups can cause a problem, particularly if the backup is touching a file that the application is also touching. Batch jobs and file locking - are there any jobs that might take a file lock (or even a record lock) and never release it, leaving your application waiting?
Anti-virus. Any anti-virus runs causing the problem?
Debug code. Any debug code still in the application that has been used for testing that might cause it to just stop?
Time. Any code that does calculations on the time (particularly subtraction) that might cause a time interval to go negative after midnight?
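To illustrate the time point, here is a minimal sketch of how an interval can go negative across midnight. It assumes the code subtracts times of day rather than full timestamps; the function names are made up for the example and are not from the application in question.

```python
from datetime import datetime, time, timedelta

def seconds_of_day(t: time) -> int:
    """Seconds since midnight for a time of day."""
    return t.hour * 3600 + t.minute * 60 + t.second

def naive_elapsed_seconds(start: time, end: time) -> int:
    """Buggy pattern: subtracts times of day and ignores the date."""
    return seconds_of_day(end) - seconds_of_day(start)

def elapsed_seconds(start: datetime, end: datetime) -> float:
    """Safer pattern: subtract full timestamps, which include the date."""
    return (end - start).total_seconds()

start = datetime(2024, 1, 1, 23, 59, 30)
end = start + timedelta(seconds=60)  # 00:00:30 on the next day

naive = naive_elapsed_seconds(start.time(), end.time())
correct = elapsed_seconds(start, end)
print(naive)    # -86340
print(correct)  # 60.0
```

If a timeout or wait loop is driven by the naive value, a negative interval can leave it waiting far longer than intended, which would fit a hang that only shows up at night.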
As Robert says, look at the logs - if they don't have sufficient info, increase logging levels.
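For the log check, one approach is to pull out everything logged in the few minutes before each hang and compare those windows across incidents. This is a sketch only: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and the sample lines and the name `entries_before` are invented for illustration.

```python
from datetime import datetime, timedelta

def entries_before(log_lines, hang_time, window=timedelta(minutes=10)):
    """Return log lines stamped within `window` before `hang_time`."""
    selected = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if hang_time - window <= stamp <= hang_time:
            selected.append(line)
    return selected

# Hypothetical log lines, for illustration only
sample_log = [
    "2024-01-01 01:45:00 INFO session cleanup finished",
    "2024-01-01 01:55:12 INFO nightly backup started",
    "2024-01-01 01:58:40 WARN waiting for file lock",
    "2024-01-01 09:00:00 INFO server restarted",
]
hang = datetime(2024, 1, 1, 2, 0, 0)
suspects = entries_before(sample_log, hang)
for line in suspects:
    print(line)
```

Running the same filter for each reported hang and looking for entries that appear every time should narrow down which job or service to examine.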
Something else you might want to consider is how the application has been implemented by the client versus how it was tested by your team. Differences between the environments have the potential to open up a number of issues.