Chartered Industrial Psychologists

Reliability of Methods

Reliability is the consistency or precision with which the test or assessment method measures what it claims to measure.

An ability test or personality assessment needs to reliably measure each factor it sets out to measure, for the given population (e.g., customer service applicants, males, females).

Any assessment method or test needs to be a consistent measure: if the test were used repeatedly on the same candidate, it would produce similar results.

A test that is unreliable cannot be valid

Assessments or tests with low reliability are of little practical use. Imagine you gave a test to a candidate one day and their result was at the 99th percentile (better than 99 out of 100 people taking the test), and the next time you gave the same test their result was at the 18th percentile (better than only 18 out of 100 people taking the test). This test would be unreliable because the results are not replicable. In reality, you could not trust this test.

Reliability is a very important concept and works in tandem with validity. A guiding principle in psychology is that a test can be reliable but not valid for a particular purpose. However, a test cannot be valid if it is unreliable.

Most psychological test publishers provide a reliability coefficient that estimates the reliability of the test. It is usually based on test-retest methodology or the split-half technique (a form of internal consistency reliability).

The reliability coefficient r is used to set up an error band around individual scores, which shows how far an observed score could vary through measurement error alone. An r of zero means the test is completely unreliable. An r of 1 means the test is completely reliable: every time you gave the test, the result would be exactly the same.
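As an illustration of how such an error band can be computed, the sketch below uses the standard error of measurement, SEM = SD x sqrt(1 - r), which is the usual textbook formula rather than anything specific to a particular test. The function names, the scale (standard deviation of 15), the observed score of 110 and the reliability of .84 are all hypothetical values chosen for the example.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Return SEM = SD * sqrt(1 - r) for a test with the given reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Return the (low, high) band around an observed score.

    z = 1.96 gives an approximate 95% band, assuming measurement
    error is normally distributed (an assumption of this sketch).
    """
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical example: scale SD of 15, observed score of 110, r = .84
low, high = score_band(observed=110, sd=15, reliability=0.84)
print(f"Band: {low:.1f} to {high:.1f}")
```

The band shrinks to nothing as r approaches 1 and widens to the full spread of the scale as r approaches zero, which matches the two extremes described above.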

Test Retest Reliability

Test retest reliability is assessed by administering the same test to a sample group of people twice and correlating the two sets of scores. The limitation of this method is the practice effect: anything the subject remembers or has learnt from the first testing session may affect how well they answer the test the second time around.

To get around this issue, many test developers design an alternative form of the test and administer it in the second session. For reliability to be assessed, however, this second form has to measure the same thing in exactly the same way as the first.
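A test retest coefficient is, in essence, the correlation between the scores from the two sessions. The sketch below computes a Pearson correlation by hand so that it needs no external libraries. The function name pearson_r and the two score lists are hypothetical and exist only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same six people on two occasions
first_session = [12, 15, 9, 20, 17, 11]
second_session = [13, 14, 10, 19, 18, 12]

retest_r = pearson_r(first_session, second_session)
print(f"Test retest r = {retest_r:.2f}")
```

The same calculation applies when the second session uses an alternative form of the test: the second list simply holds the scores from that form.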

Split Half Reliability

In this method, the test is split into two halves (say, all odd-numbered items and all even-numbered items), and subjects' scores on one half are correlated with their scores on the other half.

If both halves correlate highly with each other (the correlation coefficient r is greater than .70 or .80), the test is considered reliable.
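The odd/even split described above can be sketched as follows. The response data are made up (six people, eight items scored 1 for correct and 0 for incorrect), and the function names are hypothetical. The final step applies the Spearman-Brown correction, 2r / (1 + r), which is not mentioned in the text above but is commonly used because each half is only half as long as the full test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_half_scores(responses):
    """Return (odd_totals, even_totals), one pair of totals per person.

    Items are numbered from 1, so odd-numbered items sit at
    list positions 0, 2, 4, ... and even-numbered items at 1, 3, 5, ...
    """
    odd_totals = [sum(person[0::2]) for person in responses]
    even_totals = [sum(person[1::2]) for person in responses]
    return odd_totals, even_totals


def spearman_brown(half_r):
    """Estimate full-length reliability from a split-half correlation."""
    return 2 * half_r / (1 + half_r)


# Hypothetical data: one row per person, one column per item (1 = correct)
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

odd_totals, even_totals = split_half_scores(responses)
half_r = pearson_r(odd_totals, even_totals)
full_r = spearman_brown(half_r)
print(f"Split-half r = {half_r:.2f}, corrected = {full_r:.2f}")
```

A sample of six people is far too small for a real reliability study. It is used here only to keep the arithmetic easy to follow.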

Comparing scores from two halves of the test

Test reliability is also represented by a correlation coefficient (r). As with validity coefficients, the closer the correlation coefficient is to 1 the better the reliability of a method.

While many personality tests are considered to have acceptable levels of reliability if they have reliability coefficients greater than r = .70, ability tests should have reliability coefficients greater than r = .80.