% Disk Time, Avg. Disk sec/Transfer - to tell me how frequently memory paging is happening.
I also have the OpenSTA script recording the response time and reporting any errors it gets.
The only thing I'm missing is database stats. Getting these is tricky because of the environment I'm working in, so I'm planning to infer database performance from web server performance (e.g. if the web server is not busy but response time is long, then the DB is a likely source).
If any of these indicate a problem then I'll investigate further, but this is my starting point.
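A rough sketch of that inference, in Python, might look like the following. The function name and both thresholds (70% CPU, 3 seconds) are made-up assumptions for illustration, not values from any tool, and would need tuning against a baseline run.

```python
def likely_bottleneck(cpu_pct, avg_response_s,
                      cpu_busy_pct=70.0, slow_response_s=3.0):
    """Classify one sampling interval from web server CPU and response time.

    Both thresholds are illustrative assumptions; tune them per application.
    """
    slow = avg_response_s > slow_response_s
    busy = cpu_pct > cpu_busy_pct
    if slow and not busy:
        # Web server is idle but users are waiting: suspect a downstream tier.
        return "database (or another downstream tier)"
    if slow and busy:
        # Web server itself is saturated.
        return "web server"
    return "none"


if __name__ == "__main__":
    # Hypothetical samples: (web server % Processor Time, avg response time in s)
    for cpu, resp in [(20.0, 8.0), (95.0, 8.0), (20.0, 0.5)]:
        print(cpu, resp, "->", likely_bottleneck(cpu, resp))
```

This only points at where to look next. A slow response with an idle web server could also come from the network or another back-end service, so it is a starting hint rather than proof that the DB is at fault.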
Does anyone else know of anything good I should be monitoring to start with?
Monitor and record more counters than you think you'll need. If you try to be too selective, you'll miss something that could be the key to finding a problem. You'll then either miss the problem or have to rerun the test.
I'm curious what most of you use for your collection interval when collecting server statistics. I usually determine this by my goal and the length of the test. For example, if I'm running a long test (10+ hours) and I have not detected anything to focus in on in previous tests, then I will usually set the interval to 30 or 60 seconds. If I'm trying to delve into some area where we have a concern, I'll usually reduce that to 5 to 15 seconds, again depending on the length of the test run.
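To put numbers on that trade-off, here is a small Python sketch that counts how many samples each counter produces for a given test length and interval. The function name is hypothetical and the only inputs are the durations and intervals mentioned above.

```python
def samples_per_counter(test_hours, interval_s):
    """Number of samples one counter produces over the whole test."""
    return int(test_hours * 3600 // interval_s)


if __name__ == "__main__":
    # A 10-hour run at the intervals discussed above.
    for interval in (60, 30, 15, 5):
        print(f"{interval:>2} s interval: "
              f"{samples_per_counter(10, interval)} samples per counter")
```

Multiply the result by the number of counters being logged to get a feel for how large the log will be. Going from a 60 second interval to a 5 second one means twelve times as much data per counter, which is why the short intervals tend to be kept for focused runs.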