Disclaimer: The text below does not claim to be in any way scientific. It is rather the result of a layman's attempt to come to terms with the question of what the essence of a musical scale is and, more specifically, what the essence is of a scale that permits the creation and performance of harmonic and finally tonal music. Trying to find a model that explains harmony in the face of the mysterious ways of the human brain, the author decided to stick with the obvious, as long as the obvious is sufficient to support an explanation. It's like staying with Bohr's model of the atom rather than attempting to wrestle with quarks and superstring theory if all you want is to understand basic chemistry. If any justification for that simplistic approach should be required: it has paved the way to the discovery of what is now called the Bohlen-Pierce scale.
Contents:
Abstract
1. Exponential sequence of tone pitches ("Principle of equidistance")
2. Gestalt compatibility between intervals ("Principle of consonance")
3. Sensory dissonance
4. Consequence
Abstract
Views of what the author considered essential properties of a harmonic scale can be found by combing through the early papers [1,2,3] on what was later dubbed the Bohlen-Pierce scale. In condensed form, these views can be reduced to the necessity to conform with two independent aesthetic principles, both based on the physiology of the middle and inner ear: the "principle of equidistance" (exponential sequence of tone pitches) and the "principle of consonance" (gestalt compatibility between intervals).
A third criterion, rather than a principle, is taking sensory dissonance into account. The numerically simplest scales abiding by these axioms are the Western 12-tone scale, which in its just version is a reasonable approximation to
f_{n}/f_{0} = 2^{n/12}
and the Bohlen-Pierce scale (BP), whose just form is a fairly close approximation to
f_{n}/f_{0} = 3^{n/13}
1. Exponential sequence of tone pitches ("Principle of equidistance")
This very much simplified schematic (Fig. 1) of a rolled-out cochlea and the basilar membrane contained therein nevertheless gives a useful visual impression of the resonant traveling wave that is generated by a certain tone. High frequencies cause resonances near the entrance of the cochlea, where the basilar membrane is narrow and under high tension, while low-frequency resonances are located near the wide, floppy end of the membrane. In humans, the basilar membrane is typically 35 mm (about 1.4 inches) long.
Stimulated by these frequency-dependent resonances are the neurons of the organ of Corti, which runs along the center line of the entire basilar membrane. Donald D. Greenwood [4,5,6], building on research conducted by Georg von Békésy, found experimentally that within a critical bandwidth the ear cannot distinguish between two frequencies that are very close to each other. Then, integrating across the critical bandwidth distribution along the organ of Corti, he determined the following equation for the organ's frequency sensitivity:
f = 165.4 (10^{2.1 x/L} - 0.88) Hz
Hz (Hertz) is the expression for one period per second, L is the length of the human basilar membrane, and x is the distance of the sensitive point from the wide (floppy) end of the membrane. Thus, by solving the equation for x, we can calculate that middle A (440 Hz) excites a point about 9.2 mm from the end of the membrane, its octave (880 Hz) does this at 13.2 mm, the next octave (1760 Hz) at 17.7 mm, and the following octave (3520 Hz) at 22.4 mm. We see that the stimulation points of the low octaves sit slightly closer together than those of the high octaves. If we ignore this trick of nature to pack slightly more frequency range into the low end, we can use Greenwood's equation to express the distance x_{n} - x_{0} of the stimulation points of two tones f_{n} and f_{0}, allocated on steps 0 and n of a scale:
x_{n} - x_{0} = (L/2.1) log_{10}(f_{n}/f_{0})
Comparing this empirically found and totally physiology-based equation with the frequency relation of the same tones expressed in Western 12-tone equal temperament,
f_{n}/f_{0} = 2^{n/12}
we find that the two expressions become identical if we substitute basilar membrane distances by tone steps (or vice versa):
2.1 (x_{n} - x_{0})/L = (n/12) log_{10} 2
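The positions quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the commonly quoted human parameters of the Greenwood function (165.4 Hz, exponent 2.1, offset 0.88) and L = 35 mm, and the function names are invented for this illustration.

```python
import math

# Assumed human parameters of the Greenwood function:
# f = 165.4 * (10**(2.1 * x / L) - 0.88) Hz, with L = 35 mm.
A_HZ = 165.4
ALPHA = 2.1
OFFSET = 0.88
L_MM = 35.0


def greenwood_frequency(x_mm: float) -> float:
    """Frequency (Hz) that excites the point x_mm from the wide end."""
    return A_HZ * (10 ** (ALPHA * x_mm / L_MM) - OFFSET)


def greenwood_position(f_hz: float) -> float:
    """Distance (mm) from the wide end that is excited by f_hz."""
    return (L_MM / ALPHA) * math.log10(f_hz / A_HZ + OFFSET)


def approx_distance(f_n: float, f_0: float) -> float:
    """x_n - x_0 in mm when the 0.88 offset is ignored."""
    return (L_MM / ALPHA) * math.log10(f_n / f_0)


if __name__ == "__main__":
    for f in (440.0, 880.0, 1760.0, 3520.0):
        print(f"{f:7.1f} Hz -> {greenwood_position(f):5.2f} mm")
    # Spacing of one equal-tempered semitone in the simplified model.
    semitone = approx_distance(2 ** (1 / 12), 1.0)
    print(f"one semitone ~ {semitone:.3f} mm")
```

In the simplified model every semitone occupies the same length of membrane, which is the substitution of distances by tone steps described above.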
From these observations it is not far to the conclusion that our sense of hearing judges equal distances between excitation points of the organ of Corti as practically equal steps of pitch, while actually their frequency relation is exponential. Thus in the attempt to find musical scales consisting of what appears to us as approximately equal steps between tones, we in reality create scales with an exponential sequence of pitches. Most known musical scales share this phenomenon, so that it seems justified to formulate a "principle of equidistance" (exponential sequence of tone pitches):
Based on the exponential frequency sensitivity characteristic of the organ of Corti, a musical scale is perceived as aesthetically pleasing if its tone steps follow or at least closely approximate the equation
f_{n}/f_{0} = K^{n/N}
with f_{n} meaning the pitch of step n of the scale, f_{0} the fundamental tone, K the frame interval and N the total number of steps.
In scales for monodic music (one voice only at any time), K and N can assume quite a range of values. However, this condition is necessary but not sufficient for the design of scales for polyphonic music, which at the same time have to abide by the following principle, too.
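As a minimal sketch of the principle of equidistance, the following Python lines generate the pitches f_{n} = f_{0} K^{n/N} for a given frame interval K and step count N. The function names are hypothetical, and the values K = 3, N = 13 are used here as the values usually given for the equal-tempered Bohlen-Pierce scale.

```python
from typing import List


def scale_pitches(f0: float, K: float, N: int) -> List[float]:
    """Pitches f_n = f0 * K**(n/N) for n = 0..N (frame interval included)."""
    return [f0 * K ** (n / N) for n in range(N + 1)]


def step_ratios(pitches: List[float]) -> List[float]:
    """Frequency ratio between each pair of neighbouring steps."""
    return [high / low for low, high in zip(pitches, pitches[1:])]


if __name__ == "__main__":
    western = scale_pitches(440.0, 2.0, 12)  # frame interval: octave
    bp = scale_pitches(440.0, 3.0, 13)       # frame interval 3:1 (assumed)
    print("12-tone:", [round(f, 1) for f in western])
    print("BP:     ", [round(f, 1) for f in bp])
```

Every step multiplies the frequency by the same factor K^{1/N}, so the steps are equal in ratio, not in Hertz.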
2. Gestalt compatibility between intervals ("Principle of consonance")
Any nonlinearity in the path of a composite acoustical or electrical signal changes the amplitudes of the signal's frequency components, be they fundamental frequencies or harmonics, and it generates intermodulation products, called combination tones in acoustical signals. In the human auditory system nonlinearities abound; otherwise the system would not be able to cover the enormous dynamic range of our hearing and at the same time prevent damage to its elements. The elements that are easiest to understand as acting in a nonlinear way are the system of tympanic membrane and ossicles in the middle ear (Fig. 3) and the basilar membrane in the inner ear (see Fig. 1 above).
The artist's rendition in Fig. 3 makes the inevitability of nonlinearity in the middle ear system almost painfully obvious.
The most commonly experienced combination tones, originated by two tones of the frequencies f_{1} and f_{2}, appear at the frequencies f_{2} - f_{1} ("difference tone") and 2f_{1} - f_{2}. The latter is sometimes described as being even louder than the difference tone. In any case, the existence of these tones shows that our auditory system "suffers" from both quadratic (f_{2} - f_{1}) and cubic nonlinearity (2f_{1} - f_{2}). Combination tones of comparable amplitudes should theoretically also appear at f_{2} + f_{1} and at 2f_{2} - f_{1}. However, since these two frequencies are higher than the original tones, they have a tendency to become obscured by the original tones and their harmonics.
We usually do not perceive any combination tones consciously, but they are present, as even a person not trained in analyzing acoustical signals can find out in two simple experiments. (It's a little like dealing with a problem of astronomy: If you can't see a planet, just observe the irregularities in the paths of the others.) Take a multi-tone generator (software versions are inexpensive), set the first oscillator to a sine tone at a fixed frequency f_{1} of 200 Hz, and let the second oscillator produce a sine signal f_{2} that is slowly swept between 370 Hz and 430 Hz. The passing of the difference tone f_{2} - f_{1} (quadratic nonlinearity) past f_{1} reveals itself by a clearly audible sequence of beating, smooth signal, beating. Then fixing frequency f_{2} at 400 Hz and adding a third oscillator with a sine tone of the frequency f_{3}, slowly swept between 570 and 630 Hz, causes the same sequence again, at a lower loudness this time, but unambiguously discernible. It betrays the passing of the cubic nonlinear product f_{1} - f_{2} + f_{3} past f_{2}. Since the sine tones possess no overtones, the observed beats can only be interpreted in this way. Simple experiments like these prove that combination tones not only exist, but that their amplitude is large enough to cause beats with the original tones.
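The arithmetic behind the two experiments can be sketched in Python. The code only computes the expected beat rates, it does not synthesize sound, and the function names are invented for this illustration.

```python
def beat_rate_experiment_one(f1: float, f2: float) -> float:
    """Beat rate (Hz) between the difference tone f2 - f1 and f1."""
    return abs((f2 - f1) - f1)


def beat_rate_experiment_two(f1: float, f2: float, f3: float) -> float:
    """Beat rate (Hz) between the cubic product f1 - f2 + f3 and f2."""
    return abs((f1 - f2 + f3) - f2)


if __name__ == "__main__":
    # Experiment 1: f1 fixed at 200 Hz, f2 swept from 370 to 430 Hz.
    for f2 in range(370, 431, 10):
        rate = beat_rate_experiment_one(200.0, float(f2))
        print(f"f2 = {f2} Hz: beats at {rate:.0f} Hz")
    # Experiment 2: f1 = 200 Hz, f2 = 400 Hz, f3 swept from 570 to 630 Hz.
    for f3 in range(570, 631, 10):
        rate = beat_rate_experiment_two(200.0, 400.0, float(f3))
        print(f"f3 = {f3} Hz: beats at {rate:.0f} Hz")
```

In both sweeps the beat rate falls to zero in the middle of the range (f_{2} = 400 Hz and f_{3} = 600 Hz), which corresponds to the smooth signal between the two beating phases.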
In the view of this author, the intervals which the combination tones form with the original tones and among each other are detected by the auditory nervous system. If the two original tones happen to form exactly or at least approximately an interval which can be described as the ratio of products of small primes, e.g. 2:3, 3:4, 3:5 etc., the combination tones contribute to the gestalt impression of the interval. The following diagram (Fig. 4) shows an example.
Depicted are the two tones of a just fourth 3:4 (at frequencies f_{1} = 3 and f_{2} = 4 of a normalized frequency scale) after passing through an invented transfer system with a relatively modest quadratic as well as cubic nonlinear amplitude characteristic, which reduces their original amplitude from 60 dB to about 59 dB. For greater clarity, the original two tones are sine tones, i.e. they possess no overtones, and the transfer system is predominantly linear with just one quadratic and one cubic element added.
We observe that this transfer creates several previously not present tones at different frequencies. The "tone" at frequency 0 is naturally inaudible; it represents a slight change in air pressure for the duration of the interval. At frequency 1 we notice the difference tone f_{2} - f_{1} of the original interval. Its amplitude in this example is about 32 dB lower than that of the interval tones and certainly not easy to detect. At frequency 2, however, we find, as a result of the nonlinear characteristic's cubic element, 2f_{1} - f_{2} with an amplitude only 15 dB below that of the interval tones, and that should be fairly audible. At frequency 5 we meet 2f_{2} - f_{1} at a similar amplitude, but difficult to detect for reasons already explained. Finally, at frequency 6 there appears 2f_{1} as a previously not existing harmonic.
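How such a spectrum comes about can be sketched with a few lines of standard-library Python. The transfer characteristic y = x + a2 x^2 + a3 x^3 and the coefficients a2 = 0.05 and a3 = -0.05 are assumptions made for this illustration. They are not the values behind Fig. 4, so the levels printed here will differ from those quoted above; only the positions of the new tones are comparable.

```python
import math
from typing import List


def spectrum(f1: int, f2: int, a2: float, a3: float,
             n_samples: int = 64, max_freq: int = 13) -> List[float]:
    """Amplitudes at the integer frequencies 0..max_freq.

    Input signal: cos(2*pi*f1*t) + cos(2*pi*f2*t), sampled over one period
    of the normalized frequency 1, then passed through the (assumed)
    nonlinearity y = x + a2*x**2 + a3*x**3.
    """
    ts = [k / n_samples for k in range(n_samples)]
    ys = []
    for t in ts:
        x = math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)
        ys.append(x + a2 * x ** 2 + a3 * x ** 3)
    amps = []
    for m in range(max_freq + 1):
        # Plain discrete Fourier sum at frequency m.
        re = sum(y * math.cos(2 * math.pi * m * t) for y, t in zip(ys, ts))
        im = sum(y * math.sin(2 * math.pi * m * t) for y, t in zip(ys, ts))
        scale = 1.0 if m == 0 else 2.0
        amps.append(scale * math.hypot(re, im) / n_samples)
    return amps


if __name__ == "__main__":
    amps = spectrum(3, 4, a2=0.05, a3=-0.05)
    reference = amps[3]  # level of the interval tones after the transfer
    for m, amp in enumerate(amps):
        if amp > 1e-12:
            level = 20 * math.log10(amp / reference)
            print(f"frequency {m:2d}: {level:6.1f} dB relative to f1")
```

With these assumed coefficients the quadratic element produces the tones at frequencies 0, 1, 6, 7 and 8, and the cubic element those at 2, 5, 9, 10, 11 and 12.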
We recognize the face of a person not because it displays obvious elements like mouth, nose, eyes and ears. There are much subtler features that escape our consciousness but add gestalt attributes to a face, and that our brain uses for identification. In a similar way we do not identify intervals because of their timbre. We recognize them independently of the instrument that produces them; we even recognize them when they consist of mere sine tones. But there are the combination tones that provide gestalt elements to the interval, and the author holds that the brain uses this gestalt enhancement in a twofold manner:
First, the gestalt of the interval has changed from two single tones into a kind of harmonic series which is based on frequency 1. The author believes that the human brain is able to detect this feature and that an interval appears the more consonant, the more this harmonic series nears completion. (Not surprisingly, the most complete harmonic series among the intervals forming the Western 12-tone scale is generated by the octave, followed in this order by the fifth, the fourth, the sixth, the major third, the minor third, and so on, and that is exactly how we rate their consonance.)
Second, and for the development of musical scales perhaps even more important, this series of tones contains several other intervals, as for instance in the above example three octaves (1:2, 2:4, 3:6), two fifths (2:3, 4:6), a sixth (3:5), a major third (4:5) and a minor third (5:6). Consciously, we cannot filter these intervals out of the sound cluster that we hear, but subconsciously our brain recognizes them and therefore is prepared for them if in the context of a musical composition they appear on their own as centerpieces of similar clusters. The brain considers these intervals as compatible (or related), and it is therefore important for a musical scale to contain them as building blocks. (It is, by the way, not decisive whether we start with the fourth, as shown here, or with the fifth, the major third or the minor third; the resulting harmonic series contain predominantly the intervals which form the Western 12-tone scale. That changes, however, if we start with the sixth (3:5); in that case the harmonic series, now mainly consisting of odd harmonics, produces predominantly the intervals of the Bohlen-Pierce scale.) Intervals which are not compatible with these harmonic series are judged by the brain as out of place.
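The count of intervals in the example can be reproduced by a short enumeration. The sketch below lists every pair of tones at the frequencies 1 to 6 that lies no further apart than an octave and groups the pairs by their reduced ratio; the function name is invented for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def intervals_in_series(highest: int) -> Counter:
    """Count intervals no wider than an octave among harmonics 1..highest."""
    counts: Counter = Counter()
    for low, high in combinations(range(1, highest + 1), 2):
        ratio = Fraction(high, low)  # reduced automatically, e.g. 4/2 -> 2
        if ratio <= 2:
            counts[ratio] += 1
    return counts


if __name__ == "__main__":
    for ratio, count in sorted(intervals_in_series(6).items(), reverse=True):
        # Printed in the low:high notation used in the text.
        print(f"{ratio.denominator}:{ratio.numerator} occurs {count} time(s)")
```

Besides the intervals named above, the enumeration also returns the fourth 3:4 itself, the interval the example started from.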
This altogether results in a "principle of consonance" (gestalt compatibility between intervals) for scales suitable for polyphonic, harmonic music:
The nonlinearity of the ear, mainly located in middle ear and basilar membrane, changes the gestalt impression of intervals and chords by causing combination tones and altering harmonics. These thus modified composite sounds reduce the choice of tones, which can form a usable scale for harmonic polyphonic music, to the members of a small range of intervals with compatible gestalt features. 
Under realistic conditions, the original interval tones would not be free of overtones. If these partials are harmonics, the basic effect doesn't change, because the harmonics simply reinforce the harmonic series that has been caused by nonlinearity. 2f_{1} - f_{2}, for example, is now supported by the second harmonic of f_{1}, too, as the difference tone between this second harmonic and f_{2}. Likewise, all harmonics of both tones interact, strengthening the harmonic series, provided that the original interval can be described as the ratio of products of small primes.
But this example also demonstrates where the "principle of consonance" is no longer applicable. For if the partials are not harmonic, as in the case of gamelan instruments for example, they and the combination tones caused by them destroy the harmonic series generated by the two original frequencies. An entirely different aesthetic approach is required to cope with this situation, culminating in non-harmonic scales like pelog and slendro.
3. Sensory dissonance
The two principles described above are not the whole story. Depending on the timbre of the available voices, sensory dissonance sets limits to the choice of intervals which can be used in a scale for polyphonic music, as described by William A. Sethares [7], based predominantly on work of Hermann von Helmholtz [8] and of R. Plomp and W. J. M. Levelt [9]. Partials or original tones and combination tones that lie less than a critical bandwidth apart can cause unpleasant beating sensations or roughness. Sensory dissonance calculations show which intervals are relatively free of these sensations, and thus can provide good guidance for choosing the members of a scale fit for a specific set of instruments. This is the third independent criterion regarding material for harmonic scales.
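A sensory dissonance calculation of this kind can be sketched as follows. The curve shape and its constants are an assumption here: they follow a parameterization of the Plomp-Levelt curve often attributed to Sethares and are quoted from memory, not taken from [7]. The timbre (six harmonic partials, each 0.88 times as loud as the one before) is likewise invented for the example.

```python
import math

# Assumed constants of the parameterized Plomp-Levelt curve.
D_STAR = 0.24        # point of maximum roughness
S1, S2 = 0.021, 19.0  # scale the curve with the critical bandwidth
B1, B2 = 3.5, 5.75    # rise and decay rates of the curve


def pair_dissonance(f_a: float, f_b: float,
                    amp_a: float = 1.0, amp_b: float = 1.0) -> float:
    """Roughness contributed by two sine partials."""
    f_low, f_high = sorted((f_a, f_b))
    s = D_STAR / (S1 * f_low + S2)
    diff = f_high - f_low
    return amp_a * amp_b * (math.exp(-B1 * s * diff) - math.exp(-B2 * s * diff))


def interval_dissonance(f0: float, ratio: float,
                        n_partials: int = 6, rolloff: float = 0.88) -> float:
    """Summed roughness of two harmonic tones at f0 and ratio * f0."""
    partials = []
    for base in (f0, f0 * ratio):
        for k in range(1, n_partials + 1):
            partials.append((base * k, rolloff ** (k - 1)))
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            (f_a, amp_a), (f_b, amp_b) = partials[i], partials[j]
            total += pair_dissonance(f_a, f_b, amp_a, amp_b)
    return total


if __name__ == "__main__":
    for name, ratio in (("minor second 15:16", 16 / 15),
                        ("fourth 3:4", 4 / 3),
                        ("fifth 2:3", 3 / 2),
                        ("octave 1:2", 2.0)):
        print(f"{name:20s} {interval_dissonance(261.6, ratio):.3f}")
```

The result depends on the timbre that is fed in, which is why such calculations give guidance for a specific set of instruments only.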
However, both consonance and dissonance have a significance of their own; they are not each other's opposites. Just as missing beauty does not necessarily mean ugliness, fading consonance does not simply morph into dissonance. Vice versa, the absence of sensory dissonance does not unavoidably confer the impression of consonance. A consonant interval can still maintain much of its character even when it is affected by a degree of beating sensations and even roughness, and just dodging sensory dissonance does not yet create a useful musical scale.
4. Consequence
"Complex models are rarely useful, except for those writing their dissertations."
Vladimir Igorevich Arnold
Thus a simplistic model of the process that generates scales, which can be used to create harmonic and tonal music, may be sketched like this:







[Diagram; only the following labels survive: basilar membrane; consonance & compatibility of intervals and chords; ... of pitches f_{n}/f_{0} ~ K^{n/N}; (if limitations posed by sensory dissonance are respected)]
Scales for this purpose can be developed by a variety of means but will fail to be attractive if the result violates either of the two principles described above. Yet, simultaneously abiding by both of them limits the number of those scales to a small pool, even when allowing for some tolerance. The numerically simplest of these scales are the Western 12-tone scale, which in its just version is a reasonable approximation to
f_{n}/f_{0} = 2^{n/12}
and the Bohlen-Pierce scale (BP), whose just form is a fairly close approximation to
f_{n}/f_{0} = 3^{n/13}
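How close the just intervals come to the equal steps can be estimated in cents (1200 cents per octave). The sketch below assumes K = 2, N = 12 for the Western scale and K = 3, N = 13 for BP; the selection of just ratios, in particular the odd-number ratios used for BP, is an assumption made for this illustration and not a complete list of either scale.

```python
import math
from fractions import Fraction
from typing import Tuple


def cents(ratio) -> float:
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(float(ratio))


def nearest_step(ratio, K: float, N: int) -> Tuple[int, float]:
    """Nearest step n of K**(n/N) and the deviation of the ratio in cents."""
    step_size = cents(K) / N
    n = round(cents(ratio) / step_size)
    return n, cents(ratio) - n * step_size


# Just intervals named in the text (Western) and assumed BP ratios.
WESTERN_JUST = [Fraction(6, 5), Fraction(5, 4), Fraction(4, 3),
                Fraction(3, 2), Fraction(5, 3), Fraction(2, 1)]
BP_JUST = [Fraction(9, 7), Fraction(7, 5), Fraction(5, 3),
           Fraction(9, 5), Fraction(7, 3), Fraction(3, 1)]

if __name__ == "__main__":
    for label, ratios, K, N in (("12-tone", WESTERN_JUST, 2.0, 12),
                                ("BP", BP_JUST, 3.0, 13)):
        for r in ratios:
            n, dev = nearest_step(r, K, N)
            print(f"{label:8s} {r.denominator}:{r.numerator} "
                  f"-> step {n:2d}, {dev:+6.2f} cents")
```

For the ratios chosen here, the largest deviation stays below 16 cents in the 12-tone case and below 7 cents in the BP case.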
[1] Bohlen, Heinz: Manuscript, untitled, undated, pencil, 24 pages (in German). Hamburg, early 1972. Original archived at Huygens-Fokker Foundation (Stichting Huygens-Fokker), Amsterdam.
The paper describes the derivation of the 13-step scale (later BP) in both a just and equal-tempered form, in conformance with two basic, independent principles: consonance with combination tones, and approximate equidistance of scale steps. The presentation comprises a 13-step chromatic and four 9-step diatonic versions.
[2] Bohlen, Heinz: Die Bildungsgesetze des 12-stufigen Tonsystems und ihre Anwendung auf einen Sonderfall. Manuscript, ink, 50 pages, Hamburg, July 1972. Original archived at Huygens-Fokker Foundation (Stichting Huygens-Fokker), Amsterdam.
Mainly an expanded and refined version of [1], containing also first considerations of realization essentials (tone names, notation, plans for an electronic organ).
[3] Bohlen, Heinz: Versuch über den Aufbau eines tonalen Systems auf der Basis einer 13-stufigen Skala. Manuscript, first version, typed, 7 pages, Hamburg, Dec. 1972. Manuscript, second version, typed, 9 pages, Hamburg, July 1974. Originals archived at Huygens-Fokker Foundation (Stichting Huygens-Fokker), Amsterdam.
These are mainly abstracts of [2], intended to inform a selected readership about the 13-step scale.
[4] Greenwood, D.D.: Auditory Masking and the Critical Band. Journal of the Acoustical Society of America, vol. 33, pp. 484–502, 1961a.
[5] Greenwood, D.D.: Critical Bandwidth and the Frequency Coordinates of the Basilar Membrane. Journal of the Acoustical Society of America, vol. 33, pp. 1344–1356, 1961b.
[6] Greenwood, D.D.: A cochlear frequency-position function for several species - 29 years later. Journal of the Acoustical Society of America, vol. 87, pp. 2592–2605, 1990.
[7] Sethares, William A.: Tuning, Timbre, Spectrum, Scale. Springer-Verlag London Limited, 1998, pp. 165–188.
[8] Helmholtz, Hermann von: On the Sensations of Tone (Die Lehre von den Tonempfindungen). Dover Publications, New York 1954 (1885), pp. 152–233.
[9] Plomp, R. and W. J. M. Levelt: Tonal Consonance and Critical Bandwidth. Journal of the Acoustical Society of America, vol. 38, pp. 548–560, 1965.
([4,5,6] retrieved from "http://en.wikipedia.org/wiki/Greenwood_function")