Load Testing For Performance Tuning
This is a topic I have been devoting a lot of time to lately.
We recently ran an exercise after learning that the site's performance degrades when there are more than 50 concurrent users. All we could do was list the variables that affect performance, using the guidelines specific to the web server (WebLogic, for example).
We would then change one parameter value, keep the remaining variables constant, and run a load test to see whether things improved.
We spent days and nights on this, but in the end we could not find optimal values for the variables that would improve performance.
I come from a statistics background, so I thought of approaching the same experiment with the techniques known as Taguchi Methods.
Could you please let me know:
1. How do we ensure that we have listed all the variables?
2. How do we find the nearest-to-best values for the variables?
3. How should we go about load testing?
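To make the contrast with changing one variable at a time concrete, here is a minimal sketch of a Taguchi-style L4 orthogonal array. It covers three two-level factors in four load test runs instead of the eight a full factorial would need. The factor names and level values are hypothetical placeholders chosen for illustration, not recommended settings.

```python
from itertools import combinations

# Standard L4 orthogonal array: 4 runs, 3 factors, 2 levels each (0 = low, 1 = high).
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Hypothetical factors and levels, used only to show the mapping.
FACTORS = {
    "executeThreadCount": (15, 25),
    "percentSocketReaders": (33, 50),
    "acceptBacklog": (50, 100),
}


def build_runs(array, factors):
    """Translate each row of the array into a dict of concrete settings."""
    names = list(factors)
    return [
        {name: factors[name][level] for name, level in zip(names, row)}
        for row in array
    ]


def is_orthogonal(array):
    """Every pair of columns must contain each level combination equally often."""
    columns = list(zip(*array))
    for a, b in combinations(columns, 2):
        pairs = list(zip(a, b))
        counts = {pairs.count(p) for p in set(pairs)}
        if len(set(pairs)) != 4 or len(counts) != 1:
            return False
    return True


runs = build_runs(L4, FACTORS)
for number, settings in enumerate(runs, start=1):
    print(number, settings)
```

Each run would be one load test; comparing the average result at each level of a factor then suggests which level to prefer for that factor.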
Re: Load Testing For Performance Tuning
Govind, here are some tips straight out of the WebLogic Server Tuning Guide. I added the link in case you want to see the whole document. Check your other post under LoadRunner for the settings that will really affect your load testing.
WebLogic Server Tuning
The primary means to tune and configure WebLogic is through making changes to the weblogic.properties file.
The executeThreadCount value in the weblogic.properties file equals the number of simultaneous operations that the WebLogic Server can perform. As work enters a WebLogic Server, it is placed on an execute queue while it waits to be performed. It is then assigned to a thread that carries it out.
The default value is 15. For most applications, you should leave this value unchanged. Increasing it without understanding how your application uses threads will bring marginal benefit at best. If you are in doubt about this parameter, leave it at the default.
Adding more threads does not necessarily mean that you will be able to process more work. Even if you add more threads, you are still limited by the power of your processor. As such, you can degrade performance by increasing this value unnecessarily. Since threads are resources that consume memory, a very high execute thread count uses more memory and causes more context switching. This will degrade your performance, as the following explanation illustrates:
Setting the executeThreadCount too high will cause too much context switching. The executeThreadCount value is more CPU related than WebLogic related, so the general rules of thumb regarding threads and CPUs apply. Assume that:
n = executeThreadCount (number of threads) and k = number of CPUs
The following scenarios are possible:
1. If (n < k), the CPUs are underutilized, so increase the thread count.
2. If (n == k), that is theoretically ideal, but in practice the CPUs are still underutilized because some threads are blocked at any given moment, so add more threads.
3. If (n > k) by a "moderate amount of threads", that is practically ideal: it results in a moderate amount of context switching and a high CPU utilization rate. Tune the "moderate amount of threads" and compare performance results.
4. If (n > k) by "many threads", performance can degrade significantly because of too much context switching, so reduce the number of threads.
For example, if you have 4 processors, then 4 threads can concurrently be running. So, you want the execute threads to be 4 + (the number of blocked threads).
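The rule of thumb above can be written down as a small calculation. This is only a sketch: the function names are made up here, and the cutoff that separates a "moderate" surplus of threads from too many (`moderate_factor`) is an assumption to be tuned by load testing, not a value from the tuning guide.

```python
def starting_thread_count(cpus, blocked_threads):
    """Starting point from the guide: one running thread per CPU plus the blocked ones."""
    if cpus < 1 or blocked_threads < 0:
        raise ValueError("cpus must be >= 1 and blocked_threads >= 0")
    return cpus + blocked_threads


def classify(n, k, moderate_factor=4):
    """Map n threads on k CPUs to one of the four scenarios.

    moderate_factor is a hypothetical cutoff: up to moderate_factor * k threads
    counts as "moderate". Replace it with whatever your own tests support.
    """
    if n < k:
        return "under-utilized: increase threads"
    if n == k:
        return "theoretically ideal: add threads if CPUs are idle"
    if n <= moderate_factor * k:
        return "moderate surplus: tune and compare"
    return "too many threads: reduce"


print(starting_thread_count(cpus=4, blocked_threads=11))  # 15
print(classify(15, 4))
```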
The right value depends heavily on the application. How long the application blocks its threads, for instance, can invalidate the formula above. The value of executeThreadCount depends very much on the type of work the application does. For example, if your client application is thin and does a lot of its work through remote invocation, it will spend more time connected than a client application that does a lot of client-side processing.
If your application makes database calls that take a long time to return, you will need more execute threads than an application whose calls are short and turn over very rapidly. For the latter, you can use a small number of execute threads and improve performance.
It is also important to note that when the native performance packs are not being used, some of the execute threads will be used to read from the sockets (see weblogic.system.percentSocketReaders).
If your executeThreadCount is too low, you will see the following symptoms under maximum load on your server:
· The CPU is idle even though there is work that could be done.
· You cannot reach 100% CPU utilization, and
· All threads are blocked and runnable when you do an execution snapshot.
If your executeThreadCount is too high, you will see the following symptoms when running the WebLogic Server under maximum load:
· An execution snapshot shows that there is a lot of context switching going on in your JVM.
· Your performance increases as you decrease the number of threads.
You should use the performance packs for your platform. The performance packs use platform-optimized native I/O. Benchmarks show performance improvements of up to a factor of 3 for most workloads. For a list of currently available performance packs, please see: Installing from a zip archive (UNIX, Windows NT).
Amount of Resources Dedicated to Listening
NOTE: Only applicable if performance packs are NOT being used. If possible, use the Performance Pack for your platform. This parameter sets the maximum percentage of execute threads that are set to read messages from a socket. Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. Again, an optimal value for this property is very application specific. It is essential to have a good balance between the number of execute threads that are devoted to reading messages from a socket and those that perform the actual execution of tasks in the server.
The default is 33, and the valid range is 1-99.
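As a rough illustration of the balance described above, the following sketch splits an execute thread pool into socket readers and worker threads for a given percentage. The function name is made up, and the rounding (integer truncation, with at least one reader) is an assumption; the server's actual rounding may differ.

```python
def socket_reader_threads(execute_threads, percent_socket_readers=33):
    """Upper bound on execute threads used as socket readers, plus the remainder.

    Assumes the percentage is applied with integer truncation and that at
    least one reader is kept.
    """
    if not 1 <= percent_socket_readers <= 99:
        raise ValueError("percentSocketReaders must be in the range 1-99")
    readers = max(1, execute_threads * percent_socket_readers // 100)
    workers = execute_threads - readers
    return readers, workers


print(socket_reader_threads(15))      # (4, 11)
print(socket_reader_threads(15, 50))  # (7, 8)
```

With the defaults of 15 threads and 33 percent, at most about 4 threads read from sockets, which leaves roughly 11 to execute tasks.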
Connection Backlog Buffering
During operations, if many connections are dropped or refused at the client, and there are no other error messages at the server, the problem could be with the TCP backlog parameter. This parameter specifies how many TCP connections can be buffered in a wait queue. This queue is populated with requests for connections that the TCP stack has received but the application has not yet accepted. It is a fixed-size queue whose length is set by this parameter.
If you are getting "connection refused" messages when you access the WebLogic Server, raise this number from the default by 25%. Continue increasing the value of weblogic.system.acceptBacklog by 25% until the messages cease to appear.
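The 25% stepping can be planned ahead of a test session. The sketch below lists the successive values to try; the helper name is made up, and the starting value of 50 is only an example, so substitute the default of your own installation.

```python
def backlog_steps(start, steps):
    """Return successive acceptBacklog values, each 25% above the previous one.

    Values are rounded up so that every step is a real increase.
    """
    values = []
    current = start
    for _ in range(steps):
        current = (current * 5 + 3) // 4  # integer form of ceil(current * 1.25)
        values.append(current)
    return values


# 50 is only an example starting value; use your server's actual default.
print(backlog_steps(50, 4))  # [63, 79, 99, 124]
```

Rerun the load test after each step and stop at the first value where the "connection refused" messages no longer appear.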
EJB Pool Size
When EJBs are created, the bean instance is created and given an identity. When the client removes a bean, the bean instance will be placed in the free pool. A subsequent bean creation can avoid an object allocation by reusing the previous instance that is in the free pool. This property improves performance if there are frequent creation/deletions of EJBs.
Do not change this value unless you frequently create beans, do a quick operation, and then throw them away. Then, enlarge your free pool by 25-50% and see if performance improves. If object creation represents a small fraction of your workload, then increasing this parameter does not get you much. For applications where EJBs are very database intensive, do not change the value of this parameter.
Caution: do not go overboard. Tuning this parameter too high uses extra memory. Tuning it too low will cause unnecessary object creation. If you are in doubt about changing this parameter, leave it unchanged.
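To see why pool size only matters for create-and-discard workloads, here is a toy model of a free pool. It is a sketch of the general pooling idea, not WebLogic's implementation; the class and function names are invented for the example.

```python
class FreePool:
    """Toy free pool: removed instances are kept for reuse up to max_size."""

    def __init__(self, factory, max_size):
        self._factory = factory
        self._max_size = max_size
        self._free = []
        self.allocations = 0  # how many times a new object had to be built

    def create(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._factory()

    def remove(self, instance):
        # Beyond max_size the instance is simply dropped for garbage collection.
        if len(self._free) < self._max_size:
            self._free.append(instance)


def churn(pool, batch, rounds):
    """Create `batch` beans, use them briefly, remove them; repeat."""
    for _ in range(rounds):
        beans = [pool.create() for _ in range(batch)]
        for bean in beans:
            pool.remove(bean)
    return pool.allocations


print(churn(FreePool(dict, max_size=2), batch=5, rounds=10))  # 32
print(churn(FreePool(dict, max_size=5), batch=5, rounds=10))  # 5
```

Once the pool is as large as the number of beans in flight, growing it further saves no allocations and only holds on to memory.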
EJB Caching Size
The WebLogic Server allows you to configure the number of active beans (with an identity) which are present in the EJB cache. This cache is the in-memory space where beans exist.
When a bean is brought into the cache, ejbActivate() is called; when it is removed, ejbPassivate() is called. This is roughly equivalent to virtual memory, where a page is kept either in memory or on disk. Setting the cache size too high will use up memory unnecessarily.
In general, the main idea behind setting an optimal value for maxBeansInCache is to avoid excessive passivation (the transfer of an EJB instance from memory to secondary storage) and activation (the transfer of an EJB instance from secondary storage to memory). As mentioned earlier, the EJB container performs passivation when it invokes the ejbPassivate method and when the EJB session object is needed again, it is recalled with the ejbActivate method. When the ejbPassivate() call is made, the EJB object is serialized using the Java serialization API or other similar methods and stored in secondary memory (disk). The ejbActivate() method causes just the opposite.
The container automatically manages this working set of session objects in the EJB cache without the client or server's direct intervention. There are specific callback methods in each EJB which describe how to passivate (move out of the cache to secondary storage) or activate (bring back into the cache) these objects. Excessive activation and passivation will nullify the performance benefits of caching the working set of session objects in the EJB cache, especially when the application has to handle a large number of session objects.
Activation and passivation of EJBs is analogous to virtual memory on a computer. You want to minimize the number of times that your beans are activated and passivated. The cache size can help minimize this activity. To determine whether your cache size should be bigger, take a number of execution snapshots. Look at these snapshots and see if there is a lot of passivation and activation going on. If so, increase the size of your cache and see if performance improves. Otherwise, leave this value alone.
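The effect of an undersized cache can be shown with a toy model that counts activations and passivations. This is a sketch under assumptions: it evicts the least recently used bean, which may not match the container's actual policy, and the class and function names are invented.

```python
from collections import OrderedDict


class BeanCache:
    """Toy LRU cache that counts activations and passivations."""

    def __init__(self, max_beans):
        self.max_beans = max_beans
        self._cache = OrderedDict()  # bean id -> state, least recently used first
        self._passivated = {}        # stands in for secondary storage
        self.activations = 0
        self.passivations = 0

    def get(self, bean_id):
        if bean_id in self._cache:
            self._cache.move_to_end(bean_id)
            return self._cache[bean_id]
        if bean_id in self._passivated:
            state = self._passivated.pop(bean_id)
            self.activations += 1
        else:
            state = {"id": bean_id}  # first use: a new bean, not an activation
        self._cache[bean_id] = state
        if len(self._cache) > self.max_beans:
            victim_id, victim = self._cache.popitem(last=False)
            self._passivated[victim_id] = victim
            self.passivations += 1
        return state


def replay(max_beans, accesses):
    """Run an access pattern and return (activations, passivations)."""
    cache = BeanCache(max_beans)
    for bean_id in accesses:
        cache.get(bean_id)
    return cache.activations, cache.passivations


workload = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(replay(2, workload))  # (6, 7)
print(replay(3, workload))  # (0, 0)
```

A cache one slot smaller than the working set thrashes on every access, while a cache that fits the working set never passivates at all.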
Setting Database as Shared
(dbIsShared) You want to set this value to "false" if possible. Better performance can be achieved if dbIsShared is set to "false" because read-only operations do not require WebLogic to re-read from the database.
You cannot set this value to "false" if you are running in a cluster or if another process will modify or use the same database.
Database Connection Pools
It is safe to have the number of database connections equal the executeThreadCount. Additionally, it might be worthwhile to have the executeThreadCount be one or two more than the number of database connections. This way the remaining threads can do work while the others are blocked waiting for the database.
Also, since connections in the pool are allocated on a one per transaction basis, it is possible that long-lived transactions could block the pool for a long time, meaning that you will need to increase your pool size. This is independent of the execute thread count but must be taken into consideration. When the native performance packs are not being used, you should also take into consideration the number of threads dedicated to reading from the sockets.
Tuning WebLogic Clients
This section covers the settings and configuration that should be used for tuning clients to the WebLogic Server.
If you are using WebLogic RMI clients and there are more than 2 WebLogic servers in a cluster, you may encounter significant performance degradation (very long round trip times for stateless session beans, for instance). The fix is to change some properties on the client side, as explained below.
Ensure that there are at least as many socket reader threads as there are connections to the server, while leaving some extra threads for processing other tasks. This is accomplished by starting the client with the command line argument "-Dweblogic.system.percentSocketReaders" set to a sufficiently high percentage (say 50) and by ensuring that there is a sufficient number of execute threads for other processing on the client. Twice as many execute threads as there are servers in the cluster should work fine if the above percentage is at 50. The number of execute threads on the client is likewise set with a command line argument; the tuning guide linked at the end of this post gives the exact argument and an example for testing with 3 or 4 servers in a cluster.
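The arithmetic behind that rule of thumb is easy to check. The sketch below assumes one connection per server in the cluster and truncating integer arithmetic for the percentage; the function name is made up for the example.

```python
def client_settings(servers, percent_socket_readers=50):
    """Apply the rule of thumb: twice as many execute threads as servers."""
    execute_threads = 2 * servers
    socket_readers = execute_threads * percent_socket_readers // 100
    return {
        "execute_threads": execute_threads,
        "socket_readers": socket_readers,
        "other_threads": execute_threads - socket_readers,
        # The stated goal is at least one reader per server connection.
        "enough_readers": socket_readers >= servers,
    }


for servers in (3, 4):
    print(servers, client_settings(servers))
```

At 50 percent, a 3-server cluster gives 6 execute threads with 3 readers, and a 4-server cluster gives 8 with 4. Under the same assumptions, a 33 percent setting would leave a 3-server client with a single reader.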
Tuning an Application Running in the WebLogic Server
The WebLogic Server only performs as well as the applications running inside of it. It is important to determine the bottlenecks that impede performance.
Profiling will reveal the hotspots in the application that result in either high CPU utilization or high contention for shared resources. Some common profilers are:
1. OptimizeIt (http://www.optimizeit.com) A good performance debugging tool for Solaris and NT.
2. JProbe (http://www.klg.com) Has a family of products that provide the capability to detect performance bottlenecks, perform code coverage and other metrics.
In addition, Java 2 has some excellent profiling tools. For more information please see http://java.sun.com/products/jdk/1.2/docs
These profilers can show you where you are spending the majority of time during your application's execution.
Use sessions sparingly. Sessions should only be used for state which cannot realistically be kept on the client or if URL rewriting support is required. Use of sessions involves a scalability trade off. Simple bits of state, such as a user's name, should be kept in cookies directly. If desired one can write a wrapper class to do the getting and setting of these cookies in order to make life easier for other servlet developers working on the same project. The fewer accesses made to a session object the better; each is a costly operation. Keep frequently used values in local variables. Put aggregate objects rather than multiple single objects into the session where possible.
(for Type 4 MS SQL)
Because of the way the type-4 MS SQL driver is written, it may be much faster to create and execute an SQL statement without parameters, or with parameter values converted to their string counterparts and added as appropriate to the string, than to do a long series of setXXX() calls, followed by execute().
Decide how many simultaneous connections you expect (there is never any reason to have more HTTP threads than HTTP connections), how many simultaneous requests you are likely to have, and how long the HTTP threads are likely to block while servicing the requests. Set the parameters for the smallest number of threads (but no fewer than the number of CPUs) that guarantees all threads are never blocked while requests are pending. Assuming a constant number of connections and a constant request rate from those connections, this means that the more your servlets block, the more HTTP threads you will need.
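One way to turn this reasoning into a first estimate is Little's law (threads in use = request rate × time per request). That law is brought in here as an aid and is not part of the tuning guide; the function name and the sample numbers are illustrative only.

```python
def http_threads_needed(requests_per_second, ms_per_request, cpus, connections):
    """Estimate HTTP threads as rate * time per request, within the stated bounds.

    ms_per_request includes the time a servlet spends blocked, so more
    blocking means more threads. The result is kept at or above the CPU
    count and at or below the number of connections.
    """
    busy = (requests_per_second * ms_per_request + 999) // 1000  # integer ceiling
    upper = max(connections, cpus)
    return min(max(busy, cpus), upper)


# Same request rate, increasingly long blocking per request.
print(http_threads_needed(40, 50, cpus=4, connections=100))    # 4
print(http_threads_needed(40, 500, cpus=4, connections=100))   # 20
print(http_threads_needed(40, 5000, cpus=4, connections=100))  # 100
```

Treat the result as a starting value for a load test rather than a final setting.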
Set Isolation Level
Data accessibility is controlled through the transaction-isolation level mechanism, which determines the degree to which multiple interleaved transactions are prevented from interfering with each other in a multi-user database system. Transaction isolation is achieved through use of locking protocols that guide the reading and writing of transaction data. This transaction data is written to the disk in a process called "serialization." Lower isolation levels give you better database concurrency at the cost of less transaction isolation.
You should optimize your application so that it does as little work as possible when handling session persistence. In the case of the WebLogic Server, several options are available for Session Persistence including: in-memory replication and JDBC-based persistence.
In-memory replication is up to 10 times faster than JDBC-based persistence for session state. You should use in-memory replication if possible.
If you are using JDBC-based persistence, you should optimize your code so that session state is persisted at as coarse a granularity as possible. This means that you want to reduce the number of "puts" that you make during your HTTP session. With JDBC-based persistence, every session "put" results in a database write. You want to minimize how often information is persisted during a given session. Look at your "puts" and see if you can combine them into one large put instead.
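The cost difference between many small puts and one combined put can be sketched with a stand-in session store that counts writes. The class below is a toy, not the servlet session API, and the attribute names are invented for the example.

```python
class CountingSessionStore:
    """Toy stand-in for JDBC-backed session persistence: every put is one write."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def put(self, key, value):
        self.data[key] = value
        self.writes += 1


def save_individually(store, cart):
    # One put, and therefore one database write, per attribute.
    for key, value in cart.items():
        store.put(key, value)


def save_aggregate(store, cart):
    # Combine the attributes into a single object and persist it once.
    store.put("cart", dict(cart))


cart = {"item_count": 3, "currency": "USD", "total_cents": 4599, "coupon": None}

fine_grained = CountingSessionStore()
save_individually(fine_grained, cart)

coarse_grained = CountingSessionStore()
save_aggregate(coarse_grained, cart)

print(fine_grained.writes, coarse_grained.writes)  # 4 1
```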
If you followed the steps described here, you should now have a well-tuned WebLogic Server and application. You should remember, however, that performance tuning is an iterative science and requires time and effort. Experimentation with your given application and configuration is the best way to develop the best performing system possible on the WebLogic Server.
Additional Information can be found at: http://www.weblogic.com/docs51/admindocs/tuning.html
-- Mike --