JMeter and Load Testing Best Practices

As the saying goes, there are no best practices, only good practices in context. Following are some guidelines I followed when load testing with JMeter. Some of them are not limited to JMeter and can be applied to load testing in general. So let’s begin -

Number of Test Runs - The worst thing you can do with a load test is run it only once. Since your test environment depends on many factors, it is wise to run a load test more than once to verify that the results are consistent. If results differ by more than 5~10% between runs, that is a sign of an inconsistent system. Figure out the cause of the problem first and fix it.
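To make the 5~10% rule concrete, here is a minimal sketch of a helper that compares one metric from two runs. It assumes you have already pulled a single number per run (for example mean response time or throughput) out of your results; the function name `check_consistency` and the default 10% threshold are my own choices, not part of JMeter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare one metric (e.g. mean response time in ms or
# throughput in req/s) between two runs of the same load test.
# Prints the discrepancy relative to the first run (which must be non-zero)
# and returns 1 when it exceeds the threshold percentage (default 10).
check_consistency() {
  local run1="$1" run2="$2" threshold="${3:-10}"
  awk -v a="$run1" -v b="$run2" -v t="$threshold" 'BEGIN {
    a += 0; b += 0; t += 0                # force numeric comparison
    diff = (a > b) ? a - b : b - a
    pct = diff / a * 100
    printf "%.1f%% ", pct
    if (pct > t) { print "INCONSISTENT"; exit 1 }
    print "OK"
  }'
}

# check_consistency 250 270      -> prints "8.0% OK"
# check_consistency 250 300 10   -> prints "20.0% INCONSISTENT" and returns 1
```

Returning a non-zero status lets a wrapper script stop before you start drawing conclusions from inconsistent runs.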

State of system - If you run load tests back to back, beware that each run leaves the system in a state where memory, database and other resources are still in use. Your goal should be to run each test against a clean system state.

Identify client limitations - You will often find that the client becomes a bottleneck when it runs a high number of threads. Moreover, the quicker the application responds, the more work JMeter has to do to process results. Various sources recommend not using more than 300 threads on one machine.

While load testing one API, I found that more than 70 threads on one JMeter client resulted in very high 95th and 99th percentile response times, but when I distributed the load across multiple agents, each with 70 threads, response times were within acceptable limits.

Hence, once you start seeing higher response times, unexpected throughput, etc., also consider whether you are hitting a limit on the client side. One way to find out is to split one large load on a single test agent into smaller loads on multiple test agents and see how the results change. Once you know the maximum number of threads one test machine can handle, you know how many machines you need to generate the required load.
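The last step is simple arithmetic. A minimal sketch, assuming you have measured the per-agent maximum yourself; the function name and the 1000-thread target in the example are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of test agents needed to generate a target
# thread count, given the largest thread count one agent handled before its
# own limits started to skew the results.
agents_needed() {
  local total_threads="$1" max_per_agent="$2"
  # Integer ceiling division: (a + b - 1) / b rounds up, so a partial
  # agent's worth of load still gets a whole machine.
  echo $(( (total_threads + max_per_agent - 1) / max_per_agent ))
}

# agents_needed 1000 70   -> 15 (14 agents would only cover 980 threads)
```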

ulimit - Each user has a limit on the number of open files. The limit is applied to each process the user runs: if the limit is 1024 and the user has three processes running, each process can open up to 1024 files (up to 3072 in total).

To find out the soft limit -

ulimit -Sn

1024

To find out the hard limit -

ulimit -Hn

2048

ulimit -n shows the soft limit. The soft limit is the limit actually enforced when opening files; the hard limit is the ceiling up to which the soft limit can be raised.

To increase the soft limit to 1080 -

ulimit -Sn 1080

To change the hard limit (raising it requires root) -

ulimit -Hn 4000

ulimit -n 4000 (without -S or -H) sets both the soft limit and the hard limit to the same value.

Once the hard limit is set, a non-root user cannot raise it above that value again in that shell.

If you set the soft limit above the hard limit, you get an error -

ulimit -Sn 5000

bash: ulimit: open files: cannot modify limit: Invalid argument

These limits apply only to the current shell and the processes started from it; once you start a new session or reboot, they are reset.
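Because a session-level change is inherited by the processes you start from that shell, one option is to raise the soft limit to whatever hard limit you have right before launching JMeter. A minimal sketch, assuming bash; the function name and the test plan file name in the usage comment are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: raise the soft open-files limit of the current shell to its hard
# limit, so that JMeter started from this shell inherits the higher limit.
# Must be run in the current shell (not a subshell) to have any effect.
raise_nofile_soft_limit() {
  local hard
  hard="$(ulimit -Hn)"
  # Some systems report the hard limit as "unlimited", which may be refused
  # as an open-files soft limit; leave the soft limit alone in that case.
  if [ "$hard" = "unlimited" ]; then
    echo "hard limit is unlimited; soft limit left at $(ulimit -Sn)"
    return 0
  fi
  ulimit -Sn "$hard" && echo "soft limit raised to $(ulimit -Sn)"
}

# raise_nofile_soft_limit && jmeter -n -t test-plan.jmx
```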

To raise the limit permanently on Ubuntu, edit the following config file and reboot -

sudo nano /etc/security/limits.conf

Add lines like these -

<username> soft nofile 4000

<username> hard nofile 5000

You can use * in the limits.conf file instead of a user name to specify all users. This wildcard does not apply to root, which needs its own entries -

* soft nofile 20000

* hard nofile 30000

This is how it looks on my system -

# End of file

* soft nofile 32768

* hard nofile 32768

root soft nofile 32768

root hard nofile 32768

Don’t forget to reboot ;-)

Finding the number of open files on Ubuntu -

Find the process ID with grep -

ps aux | grep jmeter

Let’s assume the process ID is 12345.

Now you can list its open files using the lsof command -

lsof -p 12345

and count them by counting the lines lsof prints (the first line is a header) -

lsof -p 12345 | wc -l
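On Linux you can also get the count without lsof by reading /proc, and compare it with the limit the running process actually has (handy for checking that a limits.conf change took effect after the reboot). A minimal sketch, assuming Linux and permission to read the target process's /proc entries; `fd_usage` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch (Linux only): report how many file descriptors a process has open
# and its open-files soft limit, read from /proc instead of lsof.
# /proc/<pid>/fd holds one entry per open descriptor, while lsof output also
# lists memory-mapped files, the working directory, etc.
fd_usage() {
  local pid="$1" open limit
  open="$(ls "/proc/$pid/fd" | wc -l)"
  # In /proc/<pid>/limits the row reads "Max open files <soft> <hard> files",
  # so the soft limit is the 4th whitespace-separated field.
  limit="$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")"
  echo "pid $pid: $open open files, soft limit $limit"
}

# The JMeter JVM's command line contains ApacheJMeter.jar:
# for pid in $(pgrep -f ApacheJMeter.jar); do fd_usage "$pid"; done
```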

Assertions - As in manual testing, it is important to verify under load that a web page or API response is correct. A plain 200 response code does not guarantee that the page or API response is what it is supposed to be. You can check this by adding assertions to the sampler response, for example Response Assertion, Duration Assertion, etc.

The results of assertions can be seen in the Assertion Results listener.

Even if you don’t add the Assertion Results listener, a failed assertion is always reported in the test results.

The Response Assertion and the Duration Assertion are more or less safe to use, whereas the Compare Assertion and other XML-based ones like the XPath Assertion take up the most CPU and memory.
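Since a failed assertion marks the sample as failed, you can also count failures straight from the results file. A minimal sketch, assuming a CSV JTL written with the user.properties settings listed in this post (header row, `;` delimiter, `success` column); `count_failures` and `results.jtl` are placeholder names:

```shell
#!/usr/bin/env bash
# Sketch: count failed samples (failed assertions included) in a CSV JTL.
# Assumes a header row, ';' as delimiter and a "success" column.
count_failures() {
  awk -F';' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "success") col = i; next }
    col && $col == "false" { failed++ }
    END { print failed + 0 }
  ' "$1"
}

# count_failures results.jtl
```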

Wait Period (aka sleep time) - If you have worked with UI test automation tools, you know how much static waits are hated. In load testing, however, a wait period between transactions is recommended. This is the pause between subsequent sample requests. It is needed to emulate real user behavior, since a real user does not fire one request after another without a break but pauses for “some” duration before continuing with the next request. You can use the Constant Timer in JMeter to achieve this. Like other JMeter elements, a timer can be added at the test plan level (it then applies to all HTTP requests) or to specific samplers, to give each sampler its own Constant Timer. Another JMeter timer, the Uniform Random Timer, can be used to generate random pauses.

Retrieving embedded resources - A web page is made of many components: CSS files, images, JS files, etc. You can instruct JMeter to download these resources during the load test. In the HTTP Sampler, check “Retrieve All Embedded Resources” to make JMeter download JavaScript, CSS and images just as a real browser would. Also enable the thread/connection pool to simulate the browser's parallel fetching (use between 2 and 4 threads).

In addition, for every thread simulating a user, JMeter creates a separate thread pool of the given size, with thread names like pool-n-thread-m. The main page is downloaded by the user’s thread "Thread Group 1-k", while the embedded resources are downloaded by its associated thread pool. When setting the concurrent pool size, keep the number of simulated users in mind, because a separate thread pool is created for each of them. With many users, too many threads may be created and start affecting response times adversely due to bandwidth contention on the JMeter side. If many users are to be simulated, it is recommended to distribute JMeter testing across multiple machines.
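The thread count grows quickly. A rough upper bound, assuming every user's pool reaches its full size; the function name and the 300-user, 4-thread figures are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: rough upper bound on the threads one JMeter machine runs when
# embedded resources are fetched in parallel: one thread per virtual user
# plus that user's own download pool.
max_jmeter_threads() {
  local users="$1" pool_size="$2"
  echo $(( users * (1 + pool_size) ))
}

# max_jmeter_threads 300 4   -> 1500
```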

When retrieving embedded resources, make sure you exclude external domains from the download using the “URLs must match” field. You don’t want to load test external URLs you have no control over.

For example, add the following regex to the field named “Embedded URLs must match” to exclude external domains -

^((?!<domain #1>|<domain #2>|<domain #3>|<domain #4>|<domain #5>).)*$

E.g.

^((?!google|facebook|pinterest|twimg|doubleclick).)*$

Set “Retrieve All Embedded Resources” in “HTTP Request Defaults” if resources are to be retrieved for all HTTP samplers.

To download all resources from the website https://www-de.test.appdoamin.net/, use the following URL pattern - .*test\.appdoamin\.net.*
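Before pasting a pattern into the sampler, you can check what it keeps against a few sample URLs. A minimal sketch, assuming GNU grep built with PCRE support (`-P`), which the negative lookahead needs; JMeter's own regex engine may differ in corner cases, so treat this as a quick sanity check only. The sample URLs are made up:

```shell
#!/usr/bin/env bash
# Sketch: preview which URLs an "Embedded URLs must match" pattern keeps.
# Requires grep -P (PCRE) because the exclusion pattern uses (?!...).
exclude_pattern='^((?!google|facebook|pinterest|twimg|doubleclick).)*$'
include_pattern='.*test\.appdoamin\.net.*'

# Print only the input lines (one URL per line) that match pattern $1.
kept_urls() {
  grep -P -- "$1"
}

sample_urls='https://www-de.test.appdoamin.net/css/site.css
https://www.google-analytics.com/analytics.js
https://connect.facebook.net/en_US/sdk.js'

# Both commands keep only the appdoamin.net stylesheet:
# printf '%s\n' "$sample_urls" | kept_urls "$exclude_pattern"
# printf '%s\n' "$sample_urls" | kept_urls "$include_pattern"
```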

Furthermore, in massive load tests JMeter saves a lot of result data you don't need.

So, add the following to user.properties:

jmeter.save.saveservice.output_format=csv

jmeter.save.saveservice.data_type=false

jmeter.save.saveservice.label=true

jmeter.save.saveservice.response_code=true

jmeter.save.saveservice.response_data.on_error=false

jmeter.save.saveservice.response_message=false

jmeter.save.saveservice.successful=true

jmeter.save.saveservice.thread_name=true

jmeter.save.saveservice.time=true

jmeter.save.saveservice.subresults=false

jmeter.save.saveservice.assertions=false

jmeter.save.saveservice.latency=true

jmeter.save.saveservice.bytes=true

jmeter.save.saveservice.hostname=true

jmeter.save.saveservice.thread_counts=true

jmeter.save.saveservice.sample_count=true


jmeter.save.saveservice.assertion_results_failure_message=false

jmeter.save.saveservice.timestamp_format=HH:mm:ss

jmeter.save.saveservice.default_delimiter=;

jmeter.save.saveservice.print_field_names=true

You can also save the variables and parameters used during the test in the CSV file. For example, if your test plan uses the variables counter and accessToken, you can add them to the CSV file by passing the following on the command line -

-Jsample_variables=counter,accessToken

Using the Regular Expression Extractor - Use the Regular Expression Extractor to extract data, BUT never ever check Body (unescaped); choose among:

Body

Headers

URL

Response Code

Response Message

Use efficient regular expressions and extract as little data as possible

Thread group name for distributed testing - For distributed testing, name the thread group like this -

${__machineName()}_My Threadgroup name

This makes the thread group name unique to each machine

Use the HTTP Cache Manager to simulate the browser cache

Use the HTTP Cookie Manager to simulate browser cookies

By default, JMeter does not save the thread counts in JTL files. If you plan to work with JMeter JTL files, enable it by uncommenting the following line in JMETER-INSTALL-DIR/bin/jmeter.properties and setting it to true:

#jmeter.save.saveservice.thread_counts=true
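If you prefer not to edit the file by hand, a sed one-liner can do the uncommenting. A minimal sketch; the function name is hypothetical, JMETER_HOME in the usage comment stands for your JMeter install directory, and a .bak copy of the original file is kept:

```shell
#!/usr/bin/env bash
# Sketch: uncomment the thread_counts line in jmeter.properties and set it to
# true, keeping a .bak copy of the original file.
enable_thread_counts() {
  local props="$1"
  sed -i.bak -E \
    's/^#[[:space:]]*(jmeter\.save\.saveservice\.thread_counts)=.*/\1=true/' \
    "$props"
}

# enable_thread_counts "$JMETER_HOME/bin/jmeter.properties"
```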

To simulate a browser, add the user agent string in the Header Manager. You can copy it from your browser.

As long as a sampler is within the Header Manager's scope, it does not matter where you place the Header Manager; the headers in the request will be the same.

Sharing variables between threads and thread groups -

Variables are local to a thread; a variable set in one thread cannot be read in another. This is by design. For variables that can be determined before a test starts, see Parameterising Tests in the JMeter best practices guide (listed in the references below). If the value is not known until the test starts, there are various options:

Store the variable as a property - properties are global to the JMeter instance, so they can be used in different thread groups, unlike variables, which are local to a thread.

When load testing a web application, also log in manually as one user and navigate to the screen under test; you may find application bugs that way.

Verify client-side performance using Firebug, the Chrome console, Google PageSpeed Insights online, or its Chrome extension - https://chrome.google.com/webstore/detail/page-speed-insights-with/lanlbpjbalfkflkhegagflkgcfklnbnh/reviews

Application logs -

During manual testing, analysing application logs helps uncover application defects that would otherwise be missed. This is equally true of load testing. You may find connection timeout errors, application operation failures, etc., which add more value to the load test report. So don’t forget to scan the application logs when carrying out a load test :-)
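A quick first pass over the logs can be a simple grep. A minimal sketch; the log path and the keywords are assumptions, so adapt them to what your application actually writes:

```shell
#!/usr/bin/env bash
# Sketch: count suspicious lines in an application log after a test run.
# Keywords are assumptions. grep -c prints 0 (and exits with status 1)
# when nothing matches.
scan_log() {
  grep -ciE 'error|exception|timed? ?out' "$1"
}

# scan_log /var/log/myapp/app.log   (hypothetical path)
```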

Check for stickiness on AWS load balancer -

Stickiness should be disabled, otherwise your traffic may end up on one instance.

Not really a best practice, but remember that you cannot add a port number in the Server Name or IP field of the sampler. This is easy to forget if you use a variable for the URL and include the port number in it. Specify the port number only in the Port Number field. You can also specify it in the HTTP Request Defaults element.
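If your environment configuration keeps host and port together in one value, split them before handing them to JMeter. A minimal sketch using bash parameter expansion; TARGET, target_host and target_port are hypothetical names, and the usage comment passes the parts as JMeter properties that the sampler can read back with the __P function:

```shell
#!/usr/bin/env bash
# Sketch: split a hypothetical "host:port" value so the host can go into
# Server Name or IP and the port into Port Number.
TARGET="app.example.com:8080"
target_host="${TARGET%%:*}"   # everything before the first ':'
target_port="${TARGET##*:}"   # everything after the last ':'
# Note: if TARGET has no ':', both variables end up equal to TARGET,
# so check for a port before relying on target_port.

# Pass them as properties and use ${__P(target_host)} and ${__P(target_port)}
# in the sampler fields:
# jmeter -n -t test-plan.jmx -Jtarget_host="$target_host" -Jtarget_port="$target_port"
```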

Which HTTP Request Implementation to use?

There are some limitations on using the Java implementation, as described in the HTTP Request section of the JMeter user manual.

Hence, use the HttpClient4 implementation, which uses Apache HttpComponents HttpClient 4.x.

Besides the snazzy iftop, top, etc. commands (a topic for another post), you can also check network, read, write and other operations in the AWS console.

Note that if the EC2 instance has EBS volumes mounted, you will see the read and write operations on the corresponding EBS volume.

In a similar manner, you can also monitor 4XX and 5XX errors in the load balancer monitoring. This comes in handy when you encounter 504 or other errors on the client end but don’t see any error in the application log; then you know the load balancer is generating the errors.

Are you filling the surge queue? According to AWS:

Surge Queue Length - The total number of requests that are pending routing. The load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request. The maximum size of the queue is 1,024. Additional requests are rejected when the queue is full. For more information, see SpilloverCount.

Reporting criteria: There is a nonzero value.

Statistics: The most useful statistic is max, because it represents the peak of queued requests. The average statistic can be useful in combination with min and max to determine the range of queued requests. Note that sum is not useful.

Example: Suppose that your load balancer has us-west-2a and us-west-2b enabled, and that instances in us-west-2a are experiencing high latency and are slow to respond to requests. As a result, the surge queue for the load balancer nodes in us-west-2a fills, with clients likely experiencing increased response times. If this continues, the load balancer will likely have spillovers (see the SpilloverCount metric). If us-west-2b continues to respond normally, the max for the load balancer will be the same as the max for us-west-2a.

You can observe the surge queue length in CloudWatch metrics.

You can filter by per-LB metrics and by per-LB, per-AZ metrics.

If you encounter high average latency on the ELB, it is time to troubleshoot the ELB latency issue.

Why does throughput on an AWS instance drop suddenly?

You may have seen such behavior when load testing applications deployed on AWS instances. Due to I/O credits and burst performance, AWS EBS shows higher throughput for a certain period and then drops to baseline performance. If application throughput drops drastically while other performance criteria show no degradation, it is time to find out what IOPS you need and what the AWS infrastructure you have supports. Here is one case study from a load test I conducted for one of my projects -

Test Environment -

2 c4.8xlarge instances
2 EBS volumes, 120 GB each
10 JMeter m4.4xlarge test clients (I repeated the tests with m4.10xlarge test instances, which have 10 Gigabit networking, but the results were the same)

According to the EBS-optimized EC2 instances documentation, c4.8xlarge is a 10 Gigabit network instance, so I assume that the maximum EBS bandwidth is limited to 500 MB/s.

And according to the EBS volume types documentation, I was limited to a maximum throughput of 160 MiB/s.

Throughput during test -

Read Request - 12000/sec
Write Request - 300/sec

Taking Avg Read and Write size into consideration -

Read bandwidth - 12000 ops/sec * 30 KiB/op (average EBS read size from the AWS console) = 360,000 KiB/sec, about 352 MiB/sec
Write bandwidth - 300 ops/sec * 60 KiB/op (average EBS write size from the AWS console) = 18,000 KiB/sec, about 18 MiB/sec

That is 378,000 KiB/sec in total, about 369 MiB/sec.

I suppose I was able to exceed 160 MiB/s owing to I/O credits and burst performance.
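The bandwidth arithmetic above can be sketched as a small helper that converts everything to MiB/s, so it can be compared directly with the 160 MiB/s volume cap. A minimal sketch; the function name is hypothetical and the inputs in the usage comment are the figures from this case study:

```shell
#!/usr/bin/env bash
# Sketch: EBS bandwidth implied by operation rates and average operation
# sizes (KiB/op, as shown in the AWS console), compared with a throughput
# cap in MiB/s.
ebs_bandwidth() {
  local read_ops="$1" read_kib="$2" write_ops="$3" write_kib="$4" cap_mib="$5"
  awk -v r="$read_ops" -v rs="$read_kib" -v w="$write_ops" \
      -v ws="$write_kib" -v cap="$cap_mib" 'BEGIN {
    total_mib = (r * rs + w * ws) / 1024   # KiB/s -> MiB/s
    printf "%.1f MiB/s (cap %d MiB/s, %.1fx)\n", total_mib, cap, total_mib / cap
  }'
}

# 12000 reads/s at 30 KiB, 300 writes/s at 60 KiB, 160 MiB/s cap:
# ebs_bandwidth 12000 30 300 60 160   -> 369.1 MiB/s (cap 160 MiB/s, 2.3x)
```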

But the credit balance runs out at some point during the test and performance comes down to the baseline. This is when throughput drops from 12000 req/sec to 6000 req/sec. I repeated the test on different days and at different times, but the results were the same. Throughput drops dramatically (in fact, as low as 4000/sec) and stays there for about 30 minutes of the test run.

Other performance metrics, i.e. CPU and load average, were considerably low for the entire duration of the load test.

Given this, I don't see any reason other than a network limitation for the drop in application throughput.

You can also scrutinize the network limitation by analyzing the sent and received bytes in the ELB access logs. You can enable access logs by following this document -

http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
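Once access logs are enabled, totalling the byte counts is a one-liner with awk. A minimal sketch, assuming the Classic Load Balancer log format, in which received_bytes and sent_bytes are the 10th and 11th space-separated fields; other load balancer types use a different field layout, and `elb_bytes` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: total received and sent bytes across Classic Load Balancer access
# log files, where received_bytes and sent_bytes are fields 10 and 11.
elb_bytes() {
  awk '{ rx += $10; tx += $11 }
       END { printf "received %.0f bytes, sent %.0f bytes\n", rx, tx }' "$@"
}

# elb_bytes access-logs/*.log
```

Dividing these totals by the test duration gives average bandwidth, which you can set against the instance and volume limits discussed above.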

Are your connection pools large enough to support the number of threads you run the test with?

Did you get rid of DNS caching, as described here - https://blazemeter.com/blog/dns-cache-manager-right-way-test-load-balanced-apps

If you have many components, did you test them individually to isolate slow or erroneous components?

References -

http://jmeter.apache.org/usermanual/best-practices.html

https://flenniken.net/blog/65/

https://guide.blazemeter.com/hc/en-us/articles/207421405-JMeter-Best-Practices

NewAutomationWorld
