Weighted Average Limitations

This originally appeared at “http://www.arlingsoft.com”, but is no longer available on the Web. Also see: United States Patent 6151565.

The Weighted Average and its Limitations in Decision Support
What is a weighted average, and why is it used in decision support? This is a key question to answer if we are to rely upon the weighted average for assisting us in making a decision, particularly as most decision systems are based on this fundamental value, calculated in one way or another. Perhaps the best starting point is to understand the concept of the average. An average is something we have all calculated at one time in our school lives, and probably many times since. We know, for instance, that the average of the three numbers (3, 4, 5) is 4 almost instinctively. It is the middle number, but we can check the result by adding 3+4+5=12 and dividing by the number of values – in this case three. The result is, of course, 12/3=4. However, taking the average of 3, 4 and 5 actually implies we have given the same weight to each of the three numbers – we value them equally in reaching the average. We have in fact performed the following operation:

(3+4+5)/3 = [(1/3)*(3)] + [(1/3)* (4)] + [(1/3)*(5)] = 4

This is a special case of the weighted average in which all the weights are the same. If we suppose that the number 3 is twice as important as the other two numbers, we can represent this by changing the weight applied to it – in fact by making the ratios of the weights 2:1:1. Hence our “weighted” average becomes:

(1/2)*(3) + (1/4)*(4) + (1/4)*(5) = 3.75

Other sets of values can also produce the same weighted average:

(1/2)*(4) + (1/4)*(3) + (1/4)*(4) = 3.75


(1/2)*(2) + (1/4)*(5) + (1/4)*(6) = 3.75
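The collapse of different score sets onto a single weighted average is easy to verify with a short calculation. The following sketch (Python, using the three score sets above) is illustrative only:

```python
def weighted_average(scores, weights):
    """Weighted average: sum of score*weight, with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(s * w for s, w in zip(scores, weights))

weights = [1/2, 1/4, 1/4]  # the 2:1:1 weight ratio from the text

# Three different score sets, one and the same weighted average
for scores in ([3, 4, 5], [4, 3, 4], [2, 5, 6]):
    print(scores, "->", weighted_average(scores, weights))  # each prints 3.75
```

The weighted average alone cannot distinguish between these three very different score patterns, which is precisely the problem discussed next.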

If the weighted average is used for decision purposes, then the preceding examples are obviously problematic, as we have no way of distinguishing which is the ‘best’ among the sets of values. Clearly something is wrong. When faced with these circumstances, many decision makers revisit the data and re-examine how the decision was reached. What Arlington has done is, in essence, to perform this revisitation through a numerical measure. However, it goes deeper than that. The “pattern” of thinking of an evaluator is reflected in the weights, which are often referred to as the utility function. In terms of Which & Why, the utility function is not a function but a utility pattern. This fundamental difference in view implies that we do not require a function to link disparate criteria, or to compare the closeness of one function to another through statistical means (in effect, comparing two series of numbers). Methods such as linear regression assume that some respectable degree of normality – a truly random distribution – exists in both scores and weights, and often in their combination. This assumption is at best invalid, because scores and weights are deliberately biased by human evaluators, and comparing them to a random distribution is no more valid than comparing them to any other distribution. Not only that, but the combination of even normally distributed scores and weights does not result in a normally distributed combination. The implications are clear: traditional statistical analysis does not work – and should not be expected to work – for decision systems.

The weighted average is used in many sciences, from econometrics to biology, medical analysis to particle physics – and, of course, in decision sciences. Let us be clear, however, that the question asked by the decision process is not necessarily the same as the weighted average “answer” in the quantitative sciences. In the latter, data are often measured and weights determined from physical processes having quantifiable parameters. In decision processes, these values are at best imprecise, and heuristically arrived at, with parameters which are likely to have little or no relationship to each other – they are not united through any physical process. The quantifiable sciences ask what the end value of a particular measurement is, given a set of quantifiable nuisance and shape parameters. In decision processes we are asking which alternative is closest to the pattern of requirements of the evaluator. These questions are different. Lotfi Zadeh of the University of California at Berkeley, the father of modern fuzzy logic applications to artificial intelligence, declared that no single unique value can represent human thought processes, and that normal statistics does not apply. If we assume that there is a particular pattern to human thinking for the purpose of making a decision, then the thinking behind a decision must reflect the pattern of thought of the evaluator. The weighted average alone cannot reflect a pattern of thinking, and any method that does not deal with the question of matching the pattern of thought with the evaluation pattern of an alternative is missing a fundamental point.

The Three Houses Problem

Here is a further simple example. Let us assume that one wishes to purchase a house. The decision rests on a weight distribution of 40% for the house, 30% for the neighborhood, 20% for the property, and 10% for the garage and driveway. Three houses at equal selling prices were assessed on a 0 to 10 point scale for the four factors, and the following results were presented:

Factor:          Weight      House A     House B      House C
House              40           6           0            8
Neighborhood       30           6          10            6
Property           20           6          10            4
Garage+Driveway    10           6          10            2
Weighted Average:               6           6            6
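The three-way tie can be checked directly; a short sketch using the weights and scores from the table:

```python
weights = {"House": 0.4, "Neighborhood": 0.3, "Property": 0.2, "Garage+Driveway": 0.1}

houses = {
    "A": {"House": 6, "Neighborhood": 6, "Property": 6, "Garage+Driveway": 6},
    "B": {"House": 0, "Neighborhood": 10, "Property": 10, "Garage+Driveway": 10},
    "C": {"House": 8, "Neighborhood": 6, "Property": 4, "Garage+Driveway": 2},
}

averages = {name: sum(weights[f] * s[f] for f in weights)
            for name, s in houses.items()}
print(averages)  # all three houses tie at 6 (up to floating-point rounding)
```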

The figure on the right illustrates the build-up-pattern (the Which & Why Overall Chart) as to how the values cumulatively add up to the weighted average. The order of the houses has been changed to show the area plots more clearly.

Obviously, if the prices are the same, House B would not be chosen, since it appears the house itself does not even exist. Yet if we blindly accepted the weighted average, we could wind up with a plot without a house! Of course, one can set minimum requirements, which would eliminate House B.

However, we are left with trying to choose between Houses A and C. There is no guidance here except to “revisit” the data and review what our priorities are in choosing a house. However, we have already, in theory, expressed these priorities in the weight distribution. Consequently we may consider comparing the utility pattern of weights with the resulting score pattern. We do this by looking at how each weighted average is composed, and comparing the composition pattern to the pattern of weights, factor by factor. A method common in pattern analysis that has been utilized for this measure is a linear measure of difference between two patterns, referred to in terms of a “loss” or “cost” from a benchmark pattern. Using this method, and a twist added by Arlington’s research, a value called the matching index, related to the pattern loss, is calculated. The following table gives the results for the matching index:

Factor:          Weight      House A     House B      House C
House              40           6           0            8
Neighborhood       30           6          10            6
Property           20           6          10            4
Garage+Driveway    10           6          10            2
Weighted Average:             6.00        6.00         6.00
Matching Index                1.00        0.56         0.85
Adjusted Weighted Average:    6.00        3.33         5.11
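Arlington’s exact matching-index formula (the patented “twist”) is not given in this article, but an illustrative index built from a simple L1 pattern loss reproduces the qualitative behaviour. In this sketch, the composition of each weighted average is normalized and compared to the weight pattern; the resulting indices (1.00, 0.60, about 0.87) are close to, though not identical with, the table’s 1.00, 0.56 and 0.85:

```python
def matching_index(scores, weights):
    """Illustrative pattern-matching index (NOT Arlington's patented formula).

    Compares the normalized composition of the weighted average (the
    weight*score contributions) against the weight pattern itself, using
    an L1 'pattern loss' rescaled into [0, 1]."""
    contributions = [w * s for w, s in zip(weights, scores)]
    total = sum(contributions)
    pattern = [c / total for c in contributions]              # composition pattern
    loss = sum(abs(p - w) for p, w in zip(pattern, weights))  # L1 distance, at most 2
    return 1.0 - loss / 2.0

weights = [0.4, 0.3, 0.2, 0.1]
for name, scores in [("A", [6, 6, 6, 6]), ("B", [0, 10, 10, 10]), ("C", [8, 6, 4, 2])]:
    mi = matching_index(scores, weights)
    wa = sum(w * s for w, s in zip(weights, scores))
    print(f"House {name}: MI={mi:.2f}, adjusted WA={wa * mi:.2f}")
```

House A matches the weight pattern exactly (index 1), while House B, whose entire score comes from the lower-weighted factors, is penalized most – the same ordering the article’s matching index produces.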

The combined result is given in the adjusted weighted average, and it is patently obvious that House B does not conform to our initial desires, even without minimum conditions. This is further illustrated in the factor-by-factor analysis chart given in the figure on the right. The House A line plot covers the benchmark pattern perfectly.

We could ask at what point House B becomes attractive. In other words, at what point are we willing to trade House A for House B in order to live in the obviously better neighborhood? There are four ways to deal with this problem:

  1. Price equivalency – what is House B really worth when compared to the recommended house?
  2. At what point does House B become interesting: i.e. suppose there is another house in the same area – at what point does
    it become competitive with the better houses in the not-so-good neighborhood (amongst other factors)?
  3. What trade-offs are we ready to make? Would we feel comfortable changing the weight distribution to meet our goal? By how much
    do we feel comfortable in changing the weights?
  4. A combination of the above three.

At every point in this discussion, it is evident that we are making trade-offs. The matching index makes us aware of the incompatibilities of the house evaluations with our own sense of the relative importance of the factors, and how much ‘trading’ needs to be done to get to a particular option. In terms of price equivalency, for instance, the value of House B to us is (3.33/6) of House A. This works out to a 44% reduction in price if the houses were equally priced. Alternatively, we may find a house in the same area that scores a four or more; in this case, the trade-off of a lesser house against a better neighborhood and so forth is evident, as the adjusted weighted average of House B will exceed 6. As for the weight distribution, it turns out one would have to reduce the emphasis on the house by some 20 percentage points for House B to be selected – in other words, its weight must be reduced from 40% to 21% in our decision. Are we willing to make that kind of reduction? Is there a mixture of all three of these that we can live with? These are scenarios that need to be considered in making our decision.
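The price-equivalency arithmetic is a one-liner; a sketch using the adjusted weighted averages from the table above:

```python
# Illustrative price-equivalency calculation from the adjusted weighted averages.
adjusted = {"A": 6.00, "B": 3.33}

ratio = adjusted["B"] / adjusted["A"]   # House B's value relative to House A
reduction = 1.0 - ratio                 # price cut that would equalize the two
print(f"House B is worth {ratio:.1%} of House A, "
      f"i.e. roughly a {reduction:.1%} price reduction.")
```

The result, about 44.5%, matches the article’s roughly 44% figure.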

Complex Models and the Weighted Average

In more complex models, the deficiencies of the weighted average may not be so obvious. This could be particularly true where some alternatives are deficient where others are strong, and the weighted averages are close. The trade-offs become important, but can be completely masked by wholesale reliance on the weighted average. As mentioned earlier, the matching index is an automated measure of the degree of difference from the evaluator’s utility pattern. Wise decision makers know they have to question where a weighted average came from before finalizing any decision. Yet in many organizations the weighted average is relied upon to indicate the solution in complex situations – to give, in other words, the “best numerical guess.” This may be a poor assumption. Of course, this means we rely upon the utility pattern first set by the evaluator. Again, in complex models one requires a significantly larger feedback mechanism than just slide bars and tables of numbers to get a good ‘feel’ for the utility pattern. Objective assessment and feedback are essential, and without them the process can become unreliable. We know also that it is difficult to pinpoint scores – often there is a spread in values, which leads to uncertainty in the final results. We know also that a decision maker must look from several or many perspectives – in other words, at the various scenarios. In the process of decision making, the weighted average’s significance can change in unexpected ways, leading to false results, as each scenario is considered. For not only can the scores change, so can the weights – hence the evaluation of House B above could lose all significance as the weight of the house is reduced with respect to the other parameters. The trading of weights may lead to more exaggeration, and a reduced minimum score for House B.
The following table indicates the amount of change in the house factor weight (exchanged with the next highest weighted factor – the neighborhood) needed to switch ranks between House A and House B. Reductions are required when using the matching-index-adjusted weighted average, since House A leads. With the plain weighted average, we must increase the house importance in our decision in order for House A to lead House B in ranking.

Score House B   Reduction in house weight to put     Increase in house weight to put
                B above A (adjusted wtd. average)    B below A (weighted average only)
      0                    -19%                                  +0%
      1                    -16%                                  +5%
      2                    -13%                                  +10%
      3                     -8%                                  +17.5%
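The weighted-average-only column can be reproduced with a little algebra. Shifting a weight d from the neighborhood factor to the house factor gives House B the plain weighted average (0.4+d)*s + (0.3−d)*10 + 0.2*10 + 0.1*10, and requiring this to fall below House A’s 6.00 yields d > 0.4s/(10−s). A sketch of that derivation follows; the other column depends on the patented matching index and is not reproduced here:

```python
def delta_to_drop_B_below_A(score_b_house):
    """Minimum extra house weight (taken from the neighborhood factor) so that
    House B's plain weighted average falls below House A's 6.00.

    WA_B = (0.4+d)*s + (0.3-d)*10 + 0.2*10 + 0.1*10 < 6  =>  d > 0.4*s / (10-s)
    """
    s = score_b_house
    return 0.4 * s / (10 - s)

for s in (0, 1, 2, 3):
    print(f"House score {s}: increase house weight by {delta_to_drop_B_below_A(s):+.1%}")
```

For scores 0 and 2 this gives exactly +0% and +10%; for scores 1 and 3 it gives +4.4% and +17.1%, close to the table’s rounded +5% and +17.5%.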

What the above figures are telling us is that large changes in weights are required to cause a rank reversal when the matching index is used. If we took the weighted average only, then smaller changes in weight are required to reverse the ranks of House A and House B. The matching index method appears, from this simple example, to be more stable. It is difficult to extend this to more complex problems, and more study is obviously required. The robustness of a decision, however, is important, and again the addition of the matching index appears to improve this aspect.

Originally written by Dr. Edward Robins, Arlington Software. No copyright infringement intended.

Related reading

A Brief History of Decision-Making

This is a background paper that someone wrote for a decision making product, Which & Why. The technology or algorithm is now called ‘ebestmatch’ and is used in the product “Ergo”. See United States Patent 6151565, “Decision support system, method and article of manufacture”. It is copied as a blog post here since the paper does not seem to be authoritatively hosted anywhere.

White papers from original Ergo product:

  1. paper1
  2. paper2

Subjects: Decision Support System, Decision Making Software, Decision Theory, MCDM, Conjoint Analysis

A Brief History of Decision-Making

Historically, decision techniques have focused on outcome prediction, not decision process or technique. The reason for this focus is evident: in the absence of predictive information, decision making is simply the process of guessing the future consequences of impending choice. Modern decision making is the result of very small incremental gains in the understanding of decision making processes and human thought, and the application of technology tools to support the process.

Decision making is not an exclusively human problem. Anyone who has watched a dog agonizing over whether or not to snatch an unguarded steak will understand that choices and consequences are being weighed inside that canine mind. However, the fact is that lower life-forms face fewer decisions. Their choices are limited, less complex, and usually “hardwired” into instinctual patterns and rituals.

A little further up the evolutionary ladder, human decision makers, with the mixed blessing of some capacity to think and choose, have been looking for support for their decisions since the beginning of recorded history.

In earlier times, societies consulted their elders for alternatives and experimental data about the probability of success for decision choices in similar situations. Then, at some point, this advisory function shifted to soothsayers, astrologers, and religious figures — the management consultants of the day.

Alexander the Great regularly consulted oracles and fortune tellers on the eve of great battles. Always a creative general, Alexander was not looking for innovative battle strategies from his advisers. What he needed was advice about the potential outcome of the untried strategies he already had. This information could only be provided by those who claimed to have a “window” on the future: fortune tellers, high priests, and the like.


Oracles eventually tired of being at the whim of leaders like Alexander. They set themselves up in temples and hermitages, thus creating the first consulting houses. One of the most famous was at Delphi in ancient Greece, high on Mount Parnassus overlooking the Aegean. Advice seekers like Alexander were mainly looking for a glimpse into the future. They were impatient to see the consequences of a choice, rather than a way of analyzing current data or alternatives. According to historical reports, most of the Delphi Oracle’s advice was sufficiently cryptic and vague to stand up to scrutiny, regardless of the outcome of the decision. With the rates charged and its great location, the Oracle guaranteed itself an exclusive market position with its clientele. This approach is still used by many business consultants.

The early Romans also had their oracles, but leaned heavily on interpretation of “hard” data. Their specialty was the explanation of natural phenomena such as where and how lightning would strike. The other data interpretation was done by haruspices, an organized guild dedicated to the inspection and analysis of animal entrails. The Romans also turned prediction into a “fast-fortune” business. The first coin-operated machine is said to have been a Roman oracle in which fortunes, written on parchment, were dispensed when a coin was deposited into a slot. There is no mention of whether the advice-seeker’s weight was included.


The Chinese, while equally fascinated by divination, were also searching for ways to integrate prophecy with a more systematic process of decision making. The result was the I Ching, first developed around 3000 BC. The I Ching integrated Chinese world views about the primeval forces of yin and yang; cycles of the calendar; and the interaction of the elements of water, earth, and fire.

The actual divination process involves asking the I Ching for the prognosis of a given decision. The answer consists of a generic evaluation of the situation as well as the potential risks and opportunities.

As a decision making tool, the I Ching has about the same real value as an Ouija board. However, as a decision making process, it does offer valuable lessons: proceed slowly, consider the alternatives, identify risks, and build contingency plans before choosing a course of action. This focus on careful research, data collection, and data analysis before making the decision is entirely consistent with modern decision making practices.

In summary, much of ancient decision making was haphazard, largely focused on guessing or sensing the outcome of a given choice, rather than generating creative choices and then systematically evaluating them. While elders were consulted for knowledge and predictions, fortune-tellers often recommended courses of action which changed the course of history — though not always as predicted.


Throughout the Middle Ages, the Roman Catholic establishment discouraged the practice of prophecy as well as research into many scientific areas. The official reason was that since all decisions would ultimately be affected by God’s will, human decision making is trivial and/or irrelevant.

In the second half of the 16th century, England was home to two of the most brilliant contributors to the study of decision making: Francis Bacon and William Shakespeare. Bacon’s contribution was his attempt to develop the scientific method. Shakespeare’s efforts include many tragedies on the consequences of decisions, including Othello, King Lear, and Romeo and Juliet. The most profound was Hamlet, which reflects on the agony and terrible consequences of psychological indecision.

A century and a half later, Benjamin Franklin turned his analytic mind to decision making. He is credited with developing the “balance sheet” approach, which gives a simple, workable way of structuring information for evaluation. Franklin recommended making a two-column list of the pros and cons of each alternative and then calculating a “middle line” value. His evaluation technique may seem naive by present-day standards, but his information documentation process is hard to fault. It lists not only what is known about a given choice, but also points out the information gaps that must be filled before a decision can be made. In more recent times, the authorities Wheeler and Janis have developed an updated version of Franklin’s balance sheet method as part of their own decision making model.


Modern technology and psychology have attempted to tame the great decision dilemma in the 20th century. Application of Bacon’s scientific method in the area of psychology has led to major revelations about how people make decisions, pointing out typical flaws in our interpretation of the data that influences our choices, and quantitative techniques to give value to what we feel.

With the advent of computing power, along with the development of more sophisticated statistical analysis techniques, an opportunity arose to overcome the decision maker’s prime obstacle: too much disparate data to handle at one time. Approaches therefore began to focus on the process of data collection and analysis to support, and even to replace, human decision making.

Going one step further, H.A. Simon (working with Allen Newell) set out to build the General Problem Solver, an algorithm capable of solving problems, including those of a decision making nature. While his laboratory was the computer room, his field studies took him from corporate boardrooms to clinical group therapy sessions.

For many, the decade of the fifties was the golden age of decision making as well as rock and roll. Social and cognitive psychologists were establishing baseline data on how individuals made decisions and solved problems. Scientists set to work studying how executives and management teams worked, and began building theories and models based on that data.

One of the hot-beds of group behavioral research was a US federally-funded project that went by the innocuous title of National Training Laboratories. It became a Mecca for social scientists to experiment and theorize about small group dynamics, and a number of major discoveries, as well as academic reputations, were made during its operation.


One of the outcomes of the N.T.L. work was the development of group activities to stimulate social interaction and thinking. The best known of these is “brainstorming.” First created to overcome the natural reluctance of people to participate openly and honestly in groups, brainstorming is a technique in which a group facilitator asks participants to offer a stream of alternative solutions for a given problem or issue. The rules of the process are that all participants must make a contribution, which the facilitator records verbatim. Other participants must then encourage and build on these suggestions without resorting to negativity. The objective is that at the end of the brainstorming session, a lot of creative data will have been recorded, as well as some of the subjective needs and concerns of the group.

When the technique was published in the influential journal “Developing Human Resources,” a number of facilitators were stymied about what to do with the accumulated data. Brainstorming continues to be a tool to generate and collect a large pool of potentially useful data, but it must still be edited, classified, and evaluated in an objective manner, something that the group itself is not necessarily qualified to handle.


Based on work done at the N.T.L. labs, Charles Kepner and Benjamin Tregoe developed a practical methodology for problem solving and its cousin, decision making. Using an analysis model that would have made sense to Francis Bacon, Kepner and Tregoe designed a business-friendly process to isolate problems, generate alternative solutions, and evaluate the best solution. And suddenly, “eureka!” The first complete problem-solving and decision-making system.

In essence, their process consists of a three-phase problem solving process, of which decision making is but one phase. One way to describe the process might be as a series of steps, which include:

  1. Define the problem.
  2. Formulate a complete decision objective.
  3. Generate criteria.
  4. Generate alternatives.
  5. Rate how well each of the criteria are met for each alternative.
  6. Compare the scores for the alternatives.
  7. Choose the alternative with the best score.
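Steps 5 through 7 amount to the weighted-scoring evaluation discussed in the first half of this page. A minimal sketch, with invented criteria, weights, and alternatives:

```python
# Illustrative data only: criteria weights (step 3) and ratings (steps 4-5).
criteria_weights = {"cost": 0.5, "quality": 0.3, "delivery": 0.2}
alternatives = {
    "vendor_x": {"cost": 7, "quality": 9, "delivery": 6},
    "vendor_y": {"cost": 8, "quality": 6, "delivery": 9},
}

# Steps 6-7: compare the weighted scores and choose the best alternative.
scores = {name: sum(criteria_weights[c] * rating for c, rating in ratings.items())
          for name, ratings in alternatives.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

As the article goes on to note, a plain weighted score like this leaves out the “soft” factors entirely, which is precisely the limitation the matching-index discussion above addresses.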

However, the Kepner-Tregoe process has not become the universal business methodology for decision making, for a number of reasons. Reports indicate that the finely-constructed case studies that respond so well to the KT process in the classroom are not necessarily an accurate reflection of real world problems or the dynamics of people who make decisions.

The KT process was infinitely more complete and sophisticated than any previous attempt, but still did not address the issue that decision making must allow for the so-called “soft” factors. The subjective or affective domain plays an active role in establishing criteria for even the most mechanical of decisions. Whatever system is used must therefore allow all types of criteria to figure in the evaluation process. Furthermore, since all criteria are not of equal importance, the ideal process must provide a method for comparing and weighting the criteria in a way that reflects both their subjective and objective values.

As Carl Jung stated in his 1923 text “Psychological Types,” “we should not pretend to understand the world only by intellect; we apprehend it just as much by feeling. Therefore, the judgment of the intellect is at best, only half of the truth, and must, if it is to be honest, also come to an understanding of its own inadequacy.”


Beginning in the sixties, and working at the fringes of statistical methods, a number of social scientists were attempting to build mathematical models of subjective reality. For example, Likert used a scaling technique for measuring what words meant to different people as a means of evaluating public opinion and measuring intercultural perceptions and prejudice.

A decade later, Thomas Saaty developed the Analytic Hierarchy Process to measure the subjective “distance” between criteria. Saaty used a pairwise comparison method in which each factor of the criteria is rated against every other factor to establish ranked values. While this approach had been tried before, the earlier mathematics seemed inappropriate and didn’t fit the problem. Furthermore, there was no way of evaluating the subjective consistency with which these alternatives were being compared.

As it happened, this consistency factor has emerged as one of the key issues for both decision researchers and decision makers alike. Users of the AHP have found that if they went through the pairwise comparison and found that it “didn’t feel right” there was inevitably a correspondingly weak consistency value.
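A sketch of the AHP mechanics may help: the priority weights are the principal eigenvector of the pairwise comparison matrix, and Saaty’s consistency ratio is the numerical counterpart of that “didn’t feel right” sensation. The 3×3 matrix below is invented for illustration; the random-index values are Saaty’s published figures:

```python
# Illustrative AHP sketch: derive priority weights from a pairwise comparison
# matrix by power iteration, then check Saaty's consistency ratio.
A = [[1.0, 3.0, 5.0],   # how criterion 1 compares to criteria 1, 2, 3
     [1/3, 1.0, 3.0],
     [1/5, 1/3, 1.0]]
n = len(A)

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

# Power iteration converges to the principal eigenvector (the weights).
w = [1.0 / n] * n
for _ in range(100):
    w = matvec(A, w)
    total = sum(w)
    w = [x / total for x in w]

# Principal eigenvalue estimate and Saaty's consistency ratio.
Aw = matvec(A, w)
lam_max = sum(Aw[i] / w[i] for i in range(n)) / n
CI = (lam_max - n) / (n - 1)            # consistency index
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]     # Saaty's random indices
CR = CI / RI

print("weights:", [round(x, 3) for x in w])
print("consistency ratio:", round(CR, 3))  # below 0.1 is conventionally acceptable
```

For this matrix the consistency ratio comes out well under 0.1; deliberately contradictory judgments (e.g. A beats B, B beats C, C beats A) would push it far above that threshold.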

Saaty’s method impressed a wide range of decision makers, including the US State Department, which used it to test alternative foreign policy scenarios for real and potential events in world affairs. In a much different application, co-authors have used it to resolve disagreements about how characters would react in film script plot situations.

Which & Why

After five years of development, we are proud to say that Which & Why Decision Valuation Software can legitimately be regarded as an effective culmination of the entire history of decision making so far.

From Bacon to Franklin, from Kepner-Tregoe to Saaty, Which & Why pulls from the entire pool of collected research and adds the genius of modern computing power to offer what might well be the simplest, quickest, and most precise methodology ever for decision making.

Brainstorming theory, for example, is used effectively in the first phase of the Which & Why process, model-building. Once the model is developed, the criteria factors must be given an importance ranking. In this phase, Which & Why uses the required matrix algebra to make the pairwise comparison methodology easy for anyone. In the third phase, the options under consideration are evaluated against the ranked criteria factors. Then finally, in the fourth phase of decision making, the results are analyzed by taking a leaf from the pages of Carl Jung. An innovative scoring method balances out the subjective and objective aspects of the evaluation to offer a combined Which & Why score.

But Which & Why is more than just a theoretical decision making tool. It has been designed to take into account the ever-present reality of cost as a crucial ingredient of the mix. When this price dynamic is added to the equation, Which & Why goes beyond decision making to value analysis and expenditure justification.

To provide this complete recommendation, Which & Why automatically computes the combined Which & Why score for each option against the fully established costs for that option. It then compares these value analyses to announce the option that provides the best value under current circumstances. If those circumstances should change (for example, if a new option comes to light, or a price is re-negotiated), just plug in the numbers and let Which & Why re-evaluate the recommendation instantaneously.

Finally, as an incentive to make full use of its capabilities, Which & Why has been designed to be as user-friendly and intuitive as possible, from its colorful graphics to its pull-down menus and mouse function.

We think the oracle at Delphi would have been proud.

Prepared by the research department of Arlington Software Corporation


Originally appeared at “http://www.arlingsoft.com/history.htm”, but it is no longer available on the Web.
