Averaging a collection of values by computing the mean is an operation learned in elementary school. Sum the values. Divide by their quantity. Many learn the process by rote, with little attention given to how the result might be used and when it might or might not be appropriate.

# The Setup

Consider the problem of computing the average miles per gallon (mpg) across a fleet of automobiles. We have some number of vehicles. We know the miles per gallon each is capable of. We presume each vehicle will be driven about the same number of miles. We want to improve fuel economy across the fleet. We decide to compute a mean miles per gallon to give a single number by which to track our improvement.

Example Data:Download and execute harmonicExamples.sql to create the example tables and data used in this article.

Here's some data from a purposely tiny fleet to make the example easy to follow:

SELECT * FROM fleet;ID DESCRIPTION MPG --- -------------------- ---- 1 2013 Focus SE manual 36 2 2013 Expedition auto 20

The naive approach to computing the mean is to apply the AVG function:

SELECT AVG(mpg) FROM fleet;AVG(MPG) ---------- 28

We get a result, but is it correct? Is it meaningful?

The term *miles per gallon *refers to the distance a vehicle can travel using a single gallon of fuel. Put one gallon into the Focus, and we can drive 36 miles. Put one gallon into the Expedition, and we can go for 20 miles. Average those distances, and we get 28 miles.

We know we can average the distances traveled – 36 miles and 20 miles – by summing and dividing by two. Does the same approach hold true in computing average miles per gallon? Let's work through the math presuming a 200,000 mile lifetime per vehicle:

Step 1: Compute gallons used by each vehicleVehicle #1: 200000 miles / 36 mpg = 5555.56 gallons Vehicle #2: 200000 miles / 20 mpg = 10,000 gallonsStep 2: Compute total gallons used, and total miles driven10,000 gallons + 5555.56 gallons = 15,555.56 total gallons 200,000 miles + 200,000 miles = 400,000 total milesStep 3: Compute the mean miles per gallon overall400,000 miles / 15,555.56 gallons = 25.7 miles per gallon

Our result of 25.7 mpg is 2.3 miles per gallon away from the arithmetic mean of 28 mpg. Clearly the arithmetic mean is the wrong mean to use in our scenario.

The problem is that when driving a vehicle, it's not to see how far we can go on one gallon of fuel. We drive to get places. Whichever vehicle we take, the distance is the same to where we wish to go. We will refill the tank if needed to reach our destination. Our rates in the fleet table tell us miles per gallon, but we rather care more about gallons per mile. The naive approach fails us precisely because it sneakily rests on the implicit assumption that all we care about is driving whatever distance a single gallon of fuel will take us.

# The Solution

Salvation lies in the *harmonic mean. *The mean that failed us in the preceding section is the *arithmetic mean *– the easy mean we learn as children. Yet there are many other types of means that apply in various circumstances, and the harmonic mean is often the one to look to when averaging rates such as miles per gallon. Compute it by taking the *reciprocal of the mean of the reciprocals*.

Begin by converting each miles per gallon value into its reciprocal:

36 mpg and 20 mpg become 1/36 gpm and 1/20 gpm

Then take the mean of the reciprocals:

(1/36 gpm + 1/20 gpm) / 2

Now take "all of the above", and compute the reciprocal of that:

1 ----------------------------- (1/36 gpm + 1/20 gpm) / 2

From here, it's just a matter of working out the arithmetic:

1 ----------------------------- (1/36 gpm + 1/20 gpm) / 2 1 = ----------------------------- (0.0278 gpm + 0.05 gpm) / 2 1 = -------------- 0.0778 gpm / 2 1 = ---------- 0.0389 gpm = 25.7 gpm

The harmonic mean delivers the result we're after. It's the correct mean to use when the following conditions are true:

We want the mean of a collection of rates such as miles / gallon.

The quantity represented by the numerator – e.g., mileage – is held constant.

The quantity represented by the denominator – e.g., gallons – is allowed to vary.

Our scenario meets all three criteria. Our rates are the miles per gallon values. We held miles – the numerator – constant at 200,000. It is gallons – the denominator – that varies across vehicles. All three criteria are met. Harmonic mean is appropriate.

Tip:Harmonic mean is appropriate to database professionals! Check out Neil Gunther's application of harmonic mean to input/output and processor performance, and also to load-balancing, in his blog entry What the Harmonic Mean Means.

# The Explanation

The magic lies in forcing a common denominator by taking the reciprocal of each value. Changing 36 miles per gallon into 1/36 gallons per mile has the effect of making 1 mile the denominator. Having a common denominator makes it acceptable to add the numerators, in this case treating 1/36 and 1/20 as numerators.

Note:It may seem odd to consider the fraction 1/36 as a numerator in an encompassing fraction, but such is the case. 1/36 gallons per mile is 1/36 gallons / 1 mile, with 1 mile being the denominator.

Figure 1's first example shows the math when each vehicle is driven just one mile. The implied common denominator is 1 mile. The mean gallons per mile is computed first, and then the reciprocal is taken to produce a miles per gallon figure that is the harmonic mean.

The number of miles driven doesn't affect the result so long as the miles are the same for each vehicle. Whether each vehicle is driven one mile, two miles, or 200,000 miles, the mean miles per gallon remains the same.

Note:If you were looking at historical data and had records of how many miles each vehicle had been driven, then a weighted arithmetic mean would be more appropriate. You would weight the number of miles a vehicle was driven by the gallons per mile that vehicle was known to achieve. See the section on "Weighted Means" in my article AVG: What Does it Mean? for help on computing such a mean.

Figure 2 begins a closer look at the math. Each vehicle is driven 200,000 miles. Multiplying gallons per mile by 200,000 gives the total gallons used per vehicle. Dividing by the total miles driven gives a mean gallons per mile.

Because the miles driven is the same across the board, it simply cancels to the value 1. This cancelling is shown in Figure 3, and is why all the examples in Figure 1 lead to the same result.

Figure 4 follows the rest of the arithmetic, which is now simply the standard process for computing the harmonic mean: 1) Take the mean of the reciprocals, and 2) Take the reciprocal of that mean. Step 1 gives the mean gallons per mile. Step 2 inverts that value into the mean miles per gallon.

The mean gallons per mile works out to 14 / 360. Invert that to get 360 / 14, or 25.7 miles per gallon. You can use that value in planning for the future. If you expect to put 1000 miles per month on each vehicle, you know that you will need 2000 / 25.7 ≈ 78 gallons of fuel. (Always round up when purchasing fuel).

# The SQL

Computing the harmonic mean from SQL is not difficult. You can embed the arithmetic easily enough as a column expression in a SELECT statement. For example:

SELECT 1 / AVG(1/mpg) AS "Harmonic MPG" FROM fleet;Harmonic MPG ------------ 25.7142857

It's also possible to write an aggregate function in PL/SQL. By doing that, you make your queries more self-documenting. For example, you could choose to write a function named harmonic that you applyas follows:

SELECT harmonic(mpg) FROM fleet;HARMONIC(MPG) ------------- 25.7142857

You'll find my code implementing the harmonic function in the file harmonicAVG.sql. The code is less intimidating than it looks, and creates a PL/SQL object type implementing an interface defined in the *Oracle Database Data Cartridge Developer's Guide*. (See chapters 11 and 22 in the 11*g *R2 edition). Parallel execution and the analytic syntax in the OVER clause are supported.

# The Burden

Weighted means are useful when you have different quantities of the objects you are examining. For example, the fleet_counted table contains a quantity column indicating how many of each type of vehicle are in the fleet:

SELECT * FROM fleet_counted;ID DESCRIPTION QUANTITY MPG --- -------------------- ---------- ---- 1 2013 Focus SE manual 2 36 2 2013 Expedition auto 1 20

To compute a proper harmonic mean from this data requires that you weight each miles per gallon value by the number of vehicles owned of the type in question. This makes the arithmetic somewhat more complex, as shown in Figure 5.

Reciprocals always have the numerator 1, so in practice you can skip writing 2 * 1 / 36; you can go directly to 2 / 36. Multiply each reciprocal by the associated quantity to apply the appropriate weight. Sum the weights to form the denominator. Three vehicles are in the example data. Hence the value 3 as the sum of the weights.

Following is the logic from Figure 5 encompassed within a query:

SELECT 1 / (SUM(quantity/mpg) / SUM(quantity)) AS "Weighted Harmonic MPGs" FROM fleet_counted;Weighted Harmonic MPGs ---------------------- 28.4210526

Writing a custom aggregate is *very* helpful now. You reduce the chance of an arithmetic error, and you erase doubt over the purpose of the expression in the SELECT list, which otherwise some future programmer might look at and wonder about. The file harmonicW.sql provides a type and function to be used as follows:

SELECT WeightedHarmonic( whpair(quantity, mpg) ) AS "Weighted Harmonic MPGs" FROM fleet_counted;Weighted Harmonic MPGs ---------------------- 28.4210526

Type whpair encapsulates the two values – weight and value to be meaned – into a single object. That's done because aggregate functions are only permitted to accept *one* value, but that one value can be an object with multiple attributes. Essentially, my passing an object to WeightedHarmonic neatly sidesteps Oracle Database's inability to allow for aggregate functions taking more than one parameter.

Tip:Type whpair is useful with any aggregate function that you write to compute weighted means of any type. For example, you might someday write a function to compute a weighted arithmetic mean, or a weighted geometric mean. A forward-thinking programmer might choose to name the type as wpair, dropping the letter h, so as to not link the type by name specifically to the harmonic mean.

# The Summation

Harmonic mean is not *always* the correct solution when dealing with rates. Arithmetic mean is actually appropriate in the vehicle scenario when the goal is to consider the average number of miles traveled if all vehicles are fueled with just one gallon of gasoline. It is the scenario in which all vehicles travel the* same distance *that requires the harmonic mean. Remember, the criteria are:

You want the mean of a collection of rates such as miles / gallon.

The quantity represented by the numerator – e.g., mileage – is held constant.

The quantity represented by the denominator – e.g., gallons – is allowed to vary.

Also consider what you're trying to accomplish versus what the harmonic mean, or any mean for that matter, provides. For example, computing the harmonic mean of muzzle velocities across a collection of muskets probably isn't meaningful if it is the muskets themselves that are under discussion. After all, each musket shot is independent of the others. There's no point in computing a mean of any type without first having a good application.

Corporate Average Fuel Economy (CAFE) standards are an application affecting just about everyone who is a reader in the United States. CAFE standards regulate the average miles per gallon that automobile manufacturers must attain across all vehicles sold in various classes such as passenger cars and light trucks, and the harmonic mean is the foundation for CAFE calculations.

Note:The popularity of sport-utility vehicles (SUVs) and minivans is an indirect result from CAFE standards. See Washington Post columnist Warren Brown's articles Real Wheels (Friday, April 13, 2007) and The Station Wagon Stealthily Returns (Sunday, August 29, 2004). Next time you can't safely turn or pull out into traffic because your view is blocked by a large SUV, just sit back and pause for a moment of harmonious reflection on the harmonic mean. (And really, the harmonic mean is very, very sorry about the inconvenience).

Database administrators and others concerned over the performance of software systems should know that the harmonic mean has great application to the work they do. Variable-speed processors like those mentioned in Neil Gunther's article referred to earlier provide just one example of when the harmonic mean can make the difference in managing system performance.

Anytime you're averaging rates, from input/output rates to miles per gallon rates to load-balancing across variable-speed processors, pause and think about whether harmonic mean applies to the problem you're working to solve, or to the question you're wanting to answer. It might just be the mean that you need.

Acknowledgments:Stéphane Faroult, Dr. Myron Hlynka, and Dr. Jon Westfall were kind enough to read and comment on a pre-publication draft of this article to help ensure correctness and accuracy.