Document ID: 0010
Topic: Java, JVM, HotSpot
Created: 2007-06-10
Last Updated: 2007-10-28
Author: Stefan Parvu
References:
OS: Solaris, Linux
JVM Tuning Procedure
This procedure explains how to run a test cycle for a J2EE application
running on top of the Java Virtual Machine. The main goal is to tune the
JVM layer and ensure that your application can run for a given amount of
time without performance problems.
For this procedure Sun's implementation has been selected: Java HotSpot VM.
The procedure consists of several test cycles, typically 3-4, each
containing different test cases designed to stress the application under
various payloads. The workload is injected using JMeter or LoadRunner.
It is better to have dedicated load injector machines for this procedure.
Below you can see a series of steps, called cycles, which should be
run against your application. The procedure requires a lot of patience
and dedication in order to observe and collect all the data from your
Java application. At the end of this procedure you will have a set of
settings for your application which should be reviewed with the entire
development team and applied to your PROD environment.
Note:
This procedure was created after spending long nights keeping various
J2EE applications up and running: it is a simple algorithm which can
be applied to collect numbers and tune the JVM layer over different payload
cycles. As a system admin I was wasting long hours trying to discover
why certain Java services would alert in the middle of the night or stop
working at 3AM! After long investigations I found out that the majority of
J2EE applications are simply deployed to a PROD environment in a hurry and
without proper attention to this layer: the JVM.
You should adapt this to your own application and choose the most suitable
steps.
Table of Contents
Cycle 1: initial analysis using the default JVM options
Cycle 2: different options for JVM: GC, Heap size, Permanent generation
Cycle 3: quality procedure to ensure Cycle 2
Cycle 4: finalise Cycle 3 findings and make a plan in case of a failure
The initial cycle, Cycle 1: (C1) will start with the current default
settings for the JVM: Heap size, Permanent generation and the default
garbage collector. It is however important to have a big enough Heap size
at this stage in order to avoid immediate OOM (OutOfMemoryError) errors.
Keep all your managed servers up and running, simulating the PROD
environment as closely as possible. You should conduct this exercise in
the QA env, where the number of hardware/software components is similar
to the PROD env. Also make sure you have not loaded any 3rd party agents
or profiling tools which might slow down your application.
Make sure all software/hardware components are functioning as expected!
Teams: Java Development, Support
Monitoring points:
System utilisation: CPU, Mem, Disk, Net, TCP: ESTABLISHED, CLOSE_WAIT,
TIME_WAIT
JVM utilisation: GC, Old, Young, Permanent generations
JVM options: -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
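The GC log written by the options above is what the steps below ask you to
examine for minor and major collections and for time spent in GC. A minimal
sketch of such a summary follows. It assumes the classic HotSpot
-XX:+PrintGCDetails layout, where minor collections start with "[GC", major
ones with "[Full GC", and the total pause is the last ", N secs]" on the line.
The exact layout varies with JVM version and collector (CMS phase lines, for
example, would be counted as minor here), so treat the class name GcLogSummary
and the patterns as assumptions to adapt to your own log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts minor/major collections and sums their pauses. */
class GcLogSummary {
    // Matches a pause such as ", 0.0123450 secs]"; assumed log layout, see lead-in.
    private static final Pattern PAUSE = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    int minorCount;
    int majorCount;
    double minorSeconds;
    double majorSeconds;

    static GcLogSummary parse(List<String> lines) {
        GcLogSummary s = new GcLogSummary();
        for (String line : lines) {
            boolean major = line.contains("[Full GC");
            boolean minor = !major && line.contains("[GC");
            if (!major && !minor) {
                continue; // not a collection line
            }
            // The last pause on the line is the total for the whole collection.
            Matcher m = PAUSE.matcher(line);
            double pause = -1;
            while (m.find()) {
                pause = Double.parseDouble(m.group(1));
            }
            if (pause < 0) {
                continue; // collection line without a recognisable pause
            }
            if (major) {
                s.majorCount++;
                s.majorSeconds += pause;
            } else {
                s.minorCount++;
                s.minorSeconds += pause;
            }
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java GcLogSummary gc.log");
            return;
        }
        GcLogSummary s = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("minor: %d collections, %.3f s%n", s.minorCount, s.minorSeconds);
        System.out.printf("major: %d collections, %.3f s%n", s.majorCount, s.majorSeconds);
    }
}
```

Run it against the same gc.log after each test case and copy the two lines
into your tuning notes, so the runs can be compared side by side.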
- C1.1: Observe the JVM dynamics using visualGC
Start with a 1-2GB Heap size (-Xms, -Xmx) and 128MB for the permanent
generation (-XX:MaxPermSize). Using visualGC, observe the behavior of your
application for an average number of VUs (virtual users): ~25-100 over
1-2hrs. Examine the GC log closely and monitor the number of minor and
major collections.
Record the time spent in GC and the response time of the application.
If you start to notice long times spent in GC, decrease the number of VUs
to 10-50 and repeat the test. If you can't sustain even a minimal number
of 5-10 VUs for 1-2hrs, send the application back to functional testing
and escalate to your Java project development team.
- C1.2: Minimal payload
Ensure the application works ok with the minimal payload found in C1.1.
Run the application a bit longer than 2hrs, for 2-4hrs, and examine the
JVM data closely: GC, the number of minor collections and the time spent.
If your application can't sustain this minimal payload for more than 2hrs,
return to functional testing and core development, opening bugs with the
development team to find out why the application handles such a minimal
payload so badly.
- C1.3: Long run test
If everything goes fine in C1.2, plan a longer test of 10-12hrs with
this payload. Analyse the results together with the Java development and
support teams and write down the results found for this payload.
If you find problems in this phase, return to C1.2 and decrease the payload.
Adjust the payload in small steps to find out for how long your
application works fine.
- C1.4: Stress test
Plan a stress test at this point to analyse how long your application
survives for different numbers of virtual users. Keep the load running for
a fixed period of time: 2-6hrs. For instance: start with the minimal
payload and slowly ramp up, adding 50 VUs every 30 minutes.
Hold a max load level, say 500 VUs, and observe the System and JVM data.
Decrease the load if necessary, or increase it if the application seems to
handle that load ok. Try to find the max number of VUs that your
application can handle. Repeat the test to ensure this is correct.
- Final Exit Criteria
Based on C1.2-4 you should be able to approximate the max number of VUs one
JVM can sustain and decide whether the initial VM settings for Heap and
Permanent generation are ok.
It is highly recommended to hold a common review meeting after Cycle 1
and discuss the results with your Java Development Team and Support Unit.
Finally, review the results and apply the changes to your JVM to be ready
for the next cycle.
It is very important to keep a tuning document where you record all
these test cases and write down the findings.
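The ramp-up described in C1.4 (start at the minimal payload, add 50 VUs every
30 minutes, hold at a max level) can be written out as a schedule before it is
entered into JMeter or LoadRunner. The sketch below only does the arithmetic.
The class name RampSchedule and the starting value of 50 VUs are illustrative
assumptions, not values taken from a real test.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes the VU level for each step of a stress test. */
class RampSchedule {
    /** One step: from startMinute onwards the injectors run this many VUs. */
    static final class Step {
        final int startMinute;
        final int vus;

        Step(int startMinute, int vus) {
            this.startMinute = startMinute;
            this.vus = vus;
        }
    }

    static List<Step> plan(int minVus, int stepVus, int stepMinutes,
                           int maxVus, int totalMinutes) {
        if (minVus <= 0 || stepVus <= 0 || stepMinutes <= 0 || minVus > maxVus) {
            throw new IllegalArgumentException("invalid ramp parameters");
        }
        List<Step> steps = new ArrayList<Step>();
        int vus = minVus;
        for (int minute = 0; minute < totalMinutes; minute += stepMinutes) {
            steps.add(new Step(minute, vus));
            if (vus == maxVus) {
                break; // plateau reached: hold this level until totalMinutes
            }
            vus = Math.min(maxVus, vus + stepVus);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Assumed example: 50 VUs minimal payload, +50 VUs every 30 minutes,
        // 500 VUs max, 6 hours in total.
        for (Step s : plan(50, 50, 30, 500, 360)) {
            System.out.printf("minute %3d: %d VUs%n", s.startMinute, s.vus);
        }
    }
}
```

With these example numbers the max level is reached after 270 minutes, which
leaves 90 minutes at full load inside a 6hr window. If the plateau is too
short to observe the JVM, lengthen the test or shorten the step interval.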
The next cycle, Cycle 2: (C2) will focus on experimenting with different
settings for the garbage collector and memory generations. During this cycle
we will use the findings from C1, keeping only one JVM up and running.
We will try to discover the maximum number of VUs one JVM can sustain
for an acceptable number of hours. We will closely monitor the
operating system metrics and the throughput of the application.
If your J2EE application is part of a cluster with more than one JVM running,
make sure you shut down all JVMs except one: the goal is to have a single JVM
up and running for this cycle.
Teams: Java Development, Support
Monitoring points:
System utilisation: CPU, Mem, Disk, Net, TCP: ESTABLISHED
JVM utilisation: GC, Old, Young, Permanent generations
JVM options: -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
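The JVM utilisation points above (GC activity and the Old, Young and Permanent
generations) can also be sampled from inside the JVM through the standard
java.lang.management API. The sketch below is one way to do that. The class
name JvmSampler and the 5 second interval are assumptions, and the code only
sees the JVM it runs in, so it would have to be called from within the
application (or adapted to a remote JMX connection) to be useful in a test.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints collector counters and memory pool usage. */
class JvmSampler {
    static String sample() {
        StringBuilder sb = new StringBuilder();
        // One bean per collector; counts and times are cumulative since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("gc %s: count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        // One bean per memory pool; names depend on the JVM version and collector.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long maxKb = usage.getMax() < 0 ? -1 : usage.getMax() / 1024; // -1 = undefined
            sb.append(String.format("pool %s: usedKB=%d maxKB=%d%n",
                    pool.getName(), usage.getUsed() / 1024, maxKb));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            System.out.print(sample());
            if (i < samples - 1) {
                Thread.sleep(5000); // assumed sampling interval
            }
        }
    }
}
```

Because the counters are cumulative, the difference between two samples gives
the number of collections and the GC time for that interval, which is handy
when comparing the Parallel and CMS runs.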
- C2.1: The Heap and GC tuning
Make sure your application has enough memory dedicated, as found in C1.
Adjust the garbage collector, experimenting with the Parallel
(-XX:+UseParallelGC) and Concurrent (-XX:+UseConcMarkSweepGC) collectors.
Observe the application response time for different options: for instance,
dedicate more memory to the Heap, use the CMS collector and note the number
of minor and major collections. Choose the Heap size and the garbage
collector at this point and prepare to run a longer test.
- C2.2: New settings, 10hrs
Use the max payload found in C1 and run a test for 10hrs using the new
collector and memory settings from C2.1. If you haven't changed anything in
C2.1, simply check that one JVM can easily sustain this payload. Observe
the JVM dynamics using visualGC and jstat. Record all times and ensure the
service does not degrade with the new VM settings.
Repeat the test as many times as you need to be sure the new JVM
settings are ok. Make final adjustments to your Heap, Permanent generation
and garbage collector and freeze the options. Repeat the test with the final
settings.
- C2.3: Upper limit
Increase the payload to 500-1000 VUs and run it for 2-4hrs with the
new settings. If problems appear, decrease the load to a safe value where
you no longer find errors. Write down the max value for VUs.
- C2.4: Endurance
Ensure the JVM can sustain the C2.3 payload for 10-12hrs. If you run into
trouble, return to C2.1.
- C2.5: Final Exit Criteria
At this point you should have a set of new settings for your JVM and
min and max values for the number of VUs. Write down the average number of
VUs for one JVM as avg(VU)/JVM.
Cycle 3: (C3) re-analyses C2 by running an endurance test for 24, 48 or 72hrs
with the new JVM settings in QA and with all managed servers up and running.
At this point you should ensure that the new options are suitable for the
PROD env.
Teams: Java Development, Support
Monitoring points:
System utilisation: CPU, Mem, Disk, Net, TCP: ESTABLISHED
JVM utilisation: GC, Old, Young, Permanent generations
JVM options: -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
- C3.1: Endurance test
Plan with the LoadRunner/JMeter team several test cases for QA where the
application uses the new settings found in C2. Use the C2 findings as the
payload.
- C3.2: Last test
Run a final test for 72hrs with all managed servers up and running and no
extra debug level enabled. Record the response time of the application and
watch for major collections and how long they take.
- C3.5: Final Exit Criteria
If no trouble is found in C3.1 or C3.2, push the new settings to the PROD
env. If you have experienced trouble in C3.1 or C3.2, review the results and
return to Cycle 2.
Cycle 4: (C4) is a special test phase where you should consider adding
special measures in case your application crashes, does not respond or is
very slow. This step should be handled as part of the support and
troubleshooting for your application.
Teams: Support
- C4.1: JVM core dump
Prepare a simple recovery plan in case your application core dumps.
At this point you should consider keeping the JDK up to date with your
vendor and periodically checking the Release Notes for JDK fixes.
Examine the core dump using dbx or mdb on Solaris.
- C4.2: Application not responsive, very slow
One of the most common symptoms you may experience: your application
starts to answer very slowly, or looks like it is doing nothing. Check the
CPU consumption and get two or three thread dumps using the SIGQUIT signal
(kill -QUIT <pid>). Use prstat -mL to get the per-LWP usage distribution.
Rehearse this in the QA env by developing a series of scripts which can be
reused. Consider using here: jstack, pstack, dtrace.
- C4.3: Dynamic Tracing
You should not wait until you have a real problem. It is therefore good
to develop a series of D scripts, based on DTrace, in the QA env to find out
how your application really works. Try to understand the numbers you get
and run the scripts periodically, communicating the findings to your
Java development team and Support Unit.
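The thread dumps in C4.2 are normally taken from outside the JVM with SIGQUIT
or jstack. As a complement, an application can produce a similar dump itself
through Thread.getAllStackTraces(), for example from an admin page. The sketch
below shows the idea. The class name ThreadDumpSketch, the three dumps and the
5 second spacing are assumptions, and an in-process dump will not help if the
JVM is fully hung, so keep the external tools in your recovery plan.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper: formats an in-process thread dump and counts states. */
class ThreadDumpSketch {
    /** Counts live threads per state, for example RUNNABLE or WAITING. */
    static Map<Thread.State, Integer> countStates(Map<Thread, StackTraceElement[]> dump) {
        Map<Thread.State, Integer> counts =
                new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (Thread t : dump.keySet()) {
            Thread.State state = t.getState();
            Integer old = counts.get(state);
            counts.put(state, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Renders every thread with its state and stack frames, one frame per line. */
    static String format(Map<Thread, StackTraceElement[]> dump) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few dumps some seconds apart; threads that show the same
        // stack in every dump are the first candidates to investigate.
        for (int i = 0; i < 3; i++) {
            Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
            System.out.println("--- dump " + (i + 1) + " " + countStates(dump));
            System.out.print(format(dump));
            if (i < 2) {
                Thread.sleep(5000); // assumed spacing between dumps
            }
        }
    }
}
```

Comparing the per-state counts across the dumps gives a quick first signal:
a growing number of BLOCKED threads points at lock contention, while many
RUNNABLE threads with high CPU usage points at a busy loop or heavy GC.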
This document is Copyright (c) 2007 Stefan Parvu
Document License:
PDL