Open Letter to Tom Wolfe

Author and journalist Tom Wolfe died on May 14 of pneumonia, at the age of 88.  He was a wonderful writer, of fiction as well as non-fiction, and a penetrating popper of rhetorical bubbles. I corresponded with him a few times about various neuroscience topics.

In his 2016 book on language, The Kingdom of Speech, Mr. Wolfe wrote about two eminent scientific figures, one from the nineteenth century and one from the twentieth. His take on Noam Chomsky, that the great linguist is somewhat arrogant and immune to empirical evidence, is very plausible. But his view of Charles Darwin, as class-privileged and ambitious for personal fame, does not at all fit with my knowledge of him.  I wrote to Mr. Wolfe to make my argument, but he did not respond.  Here is the letter:

Dear Tom Wolfe:

You are much too tough on poor old Charlie Darwin who, from everything I have read by him and about him, was a very decent man.

In 1992 or so I made notes for a review of a tendentious and inflated book on Darwin by Desmond and Moore, a book much admired by the propagandist Stephen Jay Gould (I could give you much chapter and verse on Gould’s mendacious treatment of the Bell Curve book and the IQ/heritability controversy in general, for example).  D & M interpreted much of Darwin’s science in social/political terms.  Like you, they think he cheated Wallace.  D & M also favor the reader with many magical intrusions into Darwin’s private thoughts.  I never wrote the review, but most of my notes could do as well for your class-conscious attack on Darwin.

I have read much Darwin and never saw any evidence of snobbery.  (And I can claim first-hand knowledge of British snobbery, having a left-school-at-14 cockney father and an Anglo-Indian mother and being a grammar-school boy myself! And Darwin married into “trade” – his cousin Emma Wedgwood.)  Yes, Darwin was hooked into the establishment, but it was an intellectual establishment, not one based on wealth or class.  Darwin and Wallace met and corresponded amiably.  As far as I can tell, they got along just fine.  Wallace was deferential, but Darwin was the older man and better established.

D & M’s main thesis, like yours, is that Darwin cheated Wallace.  But that is not correct because they, and you, make an implicit assumption that is completely wrong. The wrong assumption is that being first to publish an idea is, and should be, the only basis for assigning scientific credit.  Not true.  The weight of evidence behind a theory – which takes time to collect – is just as important as the theory itself.  Darwin hesitated to publish for some 20 years because he was building his case.  Unlike many modern scientists he did not look for the LPU – “least publishable unit” – as a way to puff up his CV.  He did the right thing by holding back from publication until he had an overwhelming case.  He should not be punished for acting responsibly. And he did think of natural selection first!

That is why Lyell and his other friends wanted him to share credit – not because they were of the same social class.  They knew he had been working for years to find evidence in support of his theory.  Or contrary to it: Darwin was very good about considering contradictory evidence – just read the Origin.

What is more, Wallace agreed he had been treated fairly.  He never held anything against Darwin, calling one of his books Darwinism, as you point out.  So what right have we, knowing less and living in a different time, to blame Darwin if Wallace did not? (And do you really want to appear to parrot Desmond and Moore?)

Finally, natural selection and language: I agree with you and others that the evolution of language, and human intelligence generally, is still a problem.  But I think Darwin was also well aware of the difficulties.  Unlike Noam C, he was a cautious and thoughtful scientist.  Darwin did make a mistake, though.  He believed that variation – the raw material on which selection must act – is always, or almost always, random and small in extent (he did know about large variants called “sports”, though: he just thought them too rare to have much evolutionary effect). He was wrong on both counts: variation is sometimes large and not random.  He also believed in some Lamarckian effects, inheritance of acquired characters, for which he has been much criticized.  But of course recent work on epigenetics shows he was to some extent right about that.

Incidentally, Darwin also well knew about what he called “correlated variation”: the fact that selection for one characteristic often brings other, irrelevant ones along with it – tameness and floppy ears (dogs, Russian foxes), large beak and large feet (pigeons), large hands and large…(Donald Trump), and so on.  Sickle-cell anemia is the classic example: if you have one sickle gene you have some resistance to malaria; if you have two, you are sick.

I think you and others are correct in doubting that the evolution of language and human intelligence depends much on natural or even sexual selection.  It seems obvious to me that it depends much, much more on the very neglected topic of variation: what are the kinds of changes in cognitive repertoire offered up from generation to generation by genetic and epigenetic variation?  More generally, is variation small from one generation to the next (as Darwin implies) or is it sometimes large?  Is it directional?  Does it tend to move in a preferred direction (recurrent mutations are one case where there is clearly a built-in trend)?  And so forth.

With that sole correction – that humans’ apparent leap in language and cognitive development depends much more on the (largely unknown) properties of genotypic and phenotypic variation than on natural selection – human beings and their evolution may be safely reunited with the rest of the animal kingdom.  Darwin was wrong about variation, but not wrong about natural selection.  His problem is that natural selection may indeed be almost irrelevant to the evolution of whatever it is that makes people smarter than chimps.

And finally, are language and culture simply a manifestation of human cognitive abilities in general – nothing special to see here, move on?  That simply re-labels the problem.  Neither a chimp nor even a border collie can spontaneously construct tools or sentences in the way that a human child can.  What does the kid have that the ape does not?  That is still a problem, whether you call it the evolution of language, the evolution of intelligence, or the evolution of culture.


John Staddon

On Responsibility and Punishment

Published as: Staddon, J. (1995) On responsibility and punishment.  The Atlantic Monthly, Feb., 88-94.

The litany of social dysfunction is now familiar.  Rates of violent crime are higher than they have ever been: Americans kill and maim one another at per-capita rates an order of magnitude higher than those of other industrialized nations.  The rate of marriage has been generally declining and the rate of illegitimacy hits new highs each year.  Tens of thousands of children have no fathers and no family member or close acquaintance who has a regular job.  This pattern is now repeating into a second and third generation.  Illiteracy is becoming a problem and schools have so lost authority that the accepted response to armed pupils is to install metal detectors.  Senator Moynihan, in a celebrated article, recently pointed out how we cope with social disintegration by redefining deviancy, so that crimes become “normal” behavior.

How did we arrive at this condition?  There’s no short answer, but I have come increasingly to believe that my own profession — psychology — bears a large part of the blame.  The story began many years ago, when psychology defined itself as a science. By thus anointing itself, psychology gained great prestige.  People accepted with little demur prescriptions that would earlier have been condemned on moral grounds.  Don’t spank your child.  Don’t attempt to deter sexual exploration by young people — deterrence is probably bad and will certainly fail.  Punishment is ineffective and should be replaced by positive reinforcement.  Self-esteem is good, social stigma bad.  It is not clear that this advice was all wrong.  What is clear, and what I will show in this article, is that it was not based on science.

Some questions about behavior can be answered — either now or in the future — through the methods of science.  How does visual perception work?  What are the effects of different reward schedules?  How accurate is memory for words and faces?  What lighting conditions are best for different kinds of task?  Which people are likely to succeed in which professions?  Other questions, including apparently simple ones such as the value of some teaching techniques or the legitimacy of corporal punishment, cannot be answered.  They cannot be answered by science because they have consequences that go beyond the individual or far into the future.  Corporal punishment and teaching methods affect not just the child but, eventually, the nature of society.  Society cannot be the subject of experiments, and even if it could, effects of social changes usually take decades or even centuries to play out.  Hence we cannot expect to get hard scientific answers to many social questions.

Obviously, we need to separate those questions that belong in the domain of science from those that do not; to separate questions which can be answered definitively from those which cannot.  Unfortunately, psychology as a profession tends to assume that all questions about human action fall within its domain and all can eventually be answered with the authority of science — and this imperialism has gone largely unquestioned.

Psychologists and behavioral psychiatrists seem like a diverse crew.  At one end we have “touchy-feelies” who say things like “any of us who were raised in the traditional patriarchal system have trouble relating because we’ve been ‘mystified’ to some degree by an upbringing that compels obedience and rules by fear, a raising that can be survived only by a denial of the authentic self” (John Bradshaw).  At the other we have the behaviorists, who say things like “In the scientific view. . . a person’s behavior is determined by a genetic endowment traceable to the evolutionary history of the species and by the environmental circumstances to which as an individual he has been exposed” (B. F. Skinner).

Skinner and Bradshaw seem to agree on little.  Skinner had no time for “authentic selves” or “feelings”; Bradshaw undoubtedly feels little kinship with Skinnerian “rat psychology.”  It may come as a surprise, therefore, to learn that psychological pundits from Bradshaw to Skinner agree on several important things.  Almost all have a perspective that is entirely individual.  All reject what John Bradshaw calls “fear,” what Fred Skinner called “aversive control,” and what the rest of us call punishment.  Nearly all psychologists believe that behavior is completely determined by heredity and environment.  A substantial majority agree with Skinner that determinism rules out the concept of personal responsibility.  This opposition between determinism and responsibility is now widely accepted, not just by behaviorists but by every category of mental-health professional, by journalists, by much of the public — and by many in the legal profession.

Behaviorism is the most self-consciously “scientific” of the many strands that make up psychology.  Although behaviorism has recently been somewhat overshadowed by other movements such as cognitive psychology, its influence during most of the short history of psychology has been overwhelming.  Consequently, when behaviorists have produced “hard” evidence in favor of beliefs already shared by other psychologists, the combined effect has always been decisive.  I will describe just such a confluence in this article.

About moral positions, argument is possible.  But about scientific “facts” there can be no argument.  Skinner, and the behaviorist movement of which he was the head, delegitimized both individual responsibility and punishment.  Responsibility was dismissed by philosophical argument.  Punishment was ruled out not by moral opposition but by supposedly scientific laboratory fact.  Less “scientific” psychologists and psychiatrists have also agreed that punishment is bad, but the reasons for their consensus are more complex and to do with the social function of psychotherapy.  Nevertheless, for the majority of psychologists and psychiatrists, the “facts” established by the behaviorists have always constituted an unanswerable argument — especially if they support preexisting beliefs.  This “scientific” consensus has had a devastating effect on the moral basis of American society.

I will argue just two things in this article: first, that there is no opposition between behavioral determinism and the notion of individual responsibility.  And second, that the supposedly scientific basis for blanket opposition to punishment as a legitimate social instrument – in the family, school and workplace, and the judicial system – is nonexistent.  My focus is Skinnerian behaviorism, because it is the area of psychology that has been most concerned with large social issues.  But the key ideas have been carried forward by a much larger number of psychologists and psychiatrists who have never thought of themselves as behaviorists.

B. F. Skinner’s 1971 best-seller Beyond Freedom and Dignity contains his most concerted, and successful, attack on traditional methods of social control.  Most psychotherapists, behaviorist and nonbehaviorist alike, have come to agree with the substance of Skinner’s message: that punishment is bad and that the idea of individual responsibility is a myth.  Skinner’s argument is simply wrong.  It will be a task for future sociologists to understand why such a bad argument received such ready assent.

Skinner contrasts the “prescientific” view that “a person’s behavior is at least to some extent his own achievement” with the “scientific” view that behavior is completely determined by heredity and environment.  The conventional view, says Skinner, is that “[A] person is free.  He is autonomous in the sense that his behavior is uncaused.  He can therefore be held responsible for what he does and justly punished if he offends.  That view, together with its associated practices, must be re-examined when a scientific analysis reveals unsuspected controlling relations between behavior and environment.”  What’s wrong with these apparently reasonable claims?


Is man free?  Well, as the professor used to say, it depends on what you mean by “freedom.”  The bottom line is that you’re free if you feel free.  Skinner’s definition is simpler: to him, freedom is simply the absence of punishment (“aversive contingencies”).  But we are all “punished” by gravity if we don’t obey its rules.  The punishment can sometimes be quite severe, as beginning cyclists and skaters can attest.  Yet we don’t feel unfree when we learn to skate or cycle.  Punishment doesn’t always abolish freedom — and freedom is not just absence of punishment.

Skinner has another definition of freedom: absence of causation (“autonomous man”).  This is an odd notion indeed.  How can one ever prove absence of causation?  In science, a conjecture like this is called “proving the null hypothesis,” and everyone accepts its impossibility.  We might, however, test the converse claim: that people feel unfree when their behavior is determined, that is to say, when it can be predicted.  For example, suppose a rich and generous aunt offers her young niece a choice between a small sum of money and a large sum.  In the absence of any contrary factors, the niece will doubtless pick the larger over the smaller (classical economics rests on the assumption that this will always be the free choice).  Can we predict the niece’s behavior?  Certainly.  Is her behavior determined?  Yes, by all the usual criteria.  Is she unfree?  She certainly doesn’t feel unfree.  People generally feel free when they follow their preferences, no matter how predictable those preferences may be.  Behavior can be predicted in other contexts as well.  Mathematicians predictably follow the laws of arithmetic, architects the laws of geometry and baseball players the laws of physics.  The behavior of all is determined; yet all feel free.  Ergo, predictability — determinism — doesn’t equal absence of freedom, as Skinner proposes.

So, even if we could predict all human behavior with the precision of these examples, this wonderful new science would have no bearing at all on the idea of freedom.


There’s another strand in Skinner’s assault on traditional practices, his attack on punishment.  He rejects punishment not because it’s morally wrong, but because it doesn’t work.  (W. H. Auden had no such doubts about punishment when he remarked “Give me a no-nonsense, down-to-earth behaviorist, a few drugs, and simple electrical appliances, and in six months I will have him reciting the Athanasian creed in public.”)  Since everyone knows that some punishments work, sometimes, you’ll naturally be curious to know how Skinner defended this position.  His argument boils down to three points: punishment is ineffective because when you stop punishing, the punished behavior returns; punishment provokes “counterattack”; positive reinforcement is better.  Let’s look at each of these.

Punishment is ineffective.  Well, no, it isn’t.  Common sense aside, laboratory studies with pigeons and rats (the data base for Skinner’s argument) show that punishment (usually a brief electric shock) works very well to suppress behavior, so long as it is of the right magnitude and follows promptly on the behavior that is to be suppressed.  If the rat gets a moderate shock when he presses the bar, he stops pressing more or less at once.  If the shock is too great, the rat stops doing anything; if it’s too weak, he may still press the bar once in a while; if it’s just right, he quits pressing, but otherwise behaves normally.  Does the punished behavior return when the punishment is withdrawn?  It depends on the training procedure.  A rat well-trained on an avoidance procedure called shock postponement, in which he gets no shock so long as he presses the lever every now and then, may keep pressing indefinitely even after the shock generator is disconnected.  In this case, punishment has very persistent effects indeed.

Punishment provokes counterattack.  Sure; if a food-producing lever also produces shock, the rat will try to get the food without getting the shock.  A famous picture in introductory psychology texts is called “breakfast in bed.”  It shows a rat in a shock-food experiment that learned to press the lever while lying on its back, insulated by its fur from the metal floor grid.  Skinner was right that rats, and people, try to beat a punishment schedule.

Positive reinforcement is more effective.  Not true.  The effects of positive reinforcement also dissipate when the reinforcement is withdrawn, and there is no positive-reinforcement procedure that produces such persistent behavior as a shock-postponement schedule.  Positive reinforcement also provokes “counterattack.”  Every student who cheats, every gambler who rigs the odds, every robber and thief, shows the “counterattack” provoked by positive reinforcement schedules.

There are other arguments on both sides, but the net conclusion must be that the scientific evidence is pretty neutral in deciding between reward and punishment.  They both have their advantages and disadvantages: punishment is better for suppressing behavior, positive reinforcement better for generating behavior; avoidance (punishment) schedules tend to produce more persistent behavior than reward schedules, and so on.  If we wish to favor reward over punishment, we must make a moral, not a scientific, case.


All this might be academic, but for its impact on legal thinking.  The opposition between determinism and responsibility, and the doubts cast on punishment, do seem to raise issues of justice.  If “the Devil (or at least my environment) made me do it,” surely the rigors of just punishment (of dubious effectiveness in any case, according to psychologists) should be spared?  In the era of Lorena Bobbitt, the Reginald Denny attackers and the Menendez brothers, this argument evidently strikes a receptive chord in the hearts of American juries.

Too bad, because the argument is false.  I’ve already argued that behavior can be both determined (in the sense of predictable) and free.  I’ll argue now that the legal concept of personal responsibility is founded on this kind of predictability.  Personal responsibility demands that behavior be predictable, not the opposite, as Skinner contended.

What is the purpose of judicial punishment?  Legal scholars normally identify two purposes, retribution and deterrence.  Retribution is a moral concept, which need not concern us here.  But deterrence is a practical matter.  Arguments about deterrence are clouded by ideology and the impossibility of deciding the issue by the methods of science.  Nevertheless, there is an approach to deterrence that is straightforward and acceptable to most people which much simplifies a jury’s task.  The idea is that the purpose of legal punishment is to minimize the total amount of suffering in society, the suffering caused by crime as well as the suffering caused by punishment.  The concept is simple: if thievery is punished by amputation, the level of thievery will be low, but the suffering of thieves will be very high, higher perhaps than warranted by the reduction in theft.  On the other hand, if murderers go free, the level of murder will be high and the ease of the killers will not be balanced by the suffering of the rest.  We may argue about how to measure suffering and how to assess the effect of a given level of legal punishment for a given crime, but the principle, which I call the social view of punishment, seems reasonable enough.  It is consistent with the fundamental principle that government exists for the welfare of society as a whole, not for the good of any particular individual.  Once they understand the argument, most people seem to agree that the social view of punishment is acceptable, although not, perhaps, the whole story.  What people do not seem to realize is that this perfectly reasonable view is not opposed to determinism: it requires determinism.
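The trade-off at the heart of this social view – that both too little and too much punishment increase total suffering – can be sketched as a toy calculation.  Every number below (the deterrence curve, the suffering units, the assumption that an offender’s suffering grows steeply with severity) is invented purely for illustration; the sketch only shows how “minimize total suffering” picks an intermediate level of punishment rather than zero or the maximum.

```python
# Toy model of the "social view" of punishment: choose the severity that
# minimizes total suffering, i.e. suffering caused by crime plus
# suffering caused by punishing offenders.  All parameter values are
# invented for illustration only.

def total_suffering(severity, base_crime_rate=100.0, deterrence=0.5,
                    suffering_per_crime=10.0):
    # Deterrence: the crime rate falls as punishment gets more severe.
    crime_rate = base_crime_rate / (1.0 + deterrence * severity)
    crime_suffering = crime_rate * suffering_per_crime
    # Assumption: an offender's suffering grows steeply (here, as the
    # square of severity), so draconian punishment is costly too.
    punishment_suffering = crime_rate * severity ** 2
    return crime_suffering + punishment_suffering

# Neither zero punishment nor maximal punishment minimizes total suffering.
best = min(range(51), key=total_suffering)
print(best, total_suffering(best))  # → 2 700.0
```

The shape of the result matches the essay’s examples: amputation for thievery (high severity) and letting murderers go free (zero severity) both lose to an intermediate penalty once the suffering of the punished is counted alongside the suffering of victims.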

From an objective point of view — the only legitimate point of view for science — “holding a man responsible” for his actions means nothing more than making him subject to punishment if he breaks the law.  The social view of punishment assumes that people are sensitive to reward and punishment, that behavior is predictably subject to causal influences.  If criminal behavior is predictably deterred by punishment, the justly punished criminal is less likely to disobey the law again, and serves as an example to other potential lawbreakers.  This is the only objective justification for punishment.  But if behavior were unpredictable and unaffected by “reinforcement contingencies” — if it were uncaused, in Skinner’s caricature of “freedom” — there would be absolutely no point to punishment or any other form of behavioral control, because it would have no predictable effect.  In short, legal responsibility requires behavioral determinism, not the reverse.

It is interesting to reflect that the objective case for personal responsibility rests entirely on the beneficial collective effects (on the sum total of human suffering) of just punishment.  It does not rest on philosophical notions of individual autonomy, or personal intent, or anything else at the level of the individual — other than normal susceptibility to reward and punishment.  The idea that the law is somehow concerned with the mental state of the accused, rather than with the consequences of judicial action, has taken root because Skinner, like most other psychologists, focused so exclusively on the individual.

If a person’s “behavior is at least to some extent his own achievement” then, says Skinner, he can be blamed for failure and praised for success.  Since personal responsibility is a myth (he concludes), praise and blame are irrelevant.  But if personal responsibility is defined as I have defined it, praise and blame need not – should not – be abandoned.  In the social view, the use of praise and blame has nothing to do with the ontology of personal responsibility, the epistemology of intention or whatnot.  It has everything to do with reward and punishment (in other contexts, Skinner admits as much, at least with respect to praise).  We praise good behavior because we wish to see more of it; we blame the criminal because we wish for less crime.  Praise and blame are perhaps the strongest incentives available to society.  By giving them up, Skinner gave up our best tools for social order.

It is extraordinary that Skinner seems to have missed the connection between determinism and the sanctions imposed by the legal system.  He spent his life studying how the behavior of animals is determined by the conditions of reward and punishment.  He and his students discovered dozens of subtle and previously unsuspected regularities in the actions of reward and punishment.  Yet he failed to see that the system of rewards and punishments imposed by society works in much the same way as his reinforcement schedules.

Remarkably, law and science seem to agree on the social view of punishment.  Only when punishment is likely to be completely ineffective as a deterrent does the law limit its use.  If the criminal is insane, or if injury was the unintended result of actions whose harmful outcome was unforeseeable, no guilt is attached to the perpetrator and no punishment is given — presumably because punishment can play no role in preventing the recurrence of such acts.  There is surprising congruence between the legal concept of responsibility and the function of punishment as a deterrent.   “Guilt” is established not so much by the act, as by the potential of punishment to deter the act.


These arguments greatly simplify a jury’s task.  Jurors have no need to puzzle through philosophical questions about “intent” or knowledge of right and wrong.  Nor do they need to ask whether criminal behavior was determined by the defendant’s past history.  (The scientific answer will almost always be, “yes,” because almost all behavior is determined.)  History is not the point.  The point is: Did the defendant know that his actions would have an illegal outcome?  And, if the accused had known, in advance of the act, that sure punishment would follow, would he still have acted as he did?  If the criminal would have been deterred by the prospect of punishment then, says the social view, he should be punished.  Did the Menendez brothers know that their actions would result in the death of their parents?  Presumably, yes.  If they had known that these acts would result in severe punishment (life in prison, death), would they have acted nevertheless?  Probably not.  Verdict: guilty.  On the other hand, if the jury has reason to believe that the defendants’ past history was so horrific that they would have murdered even in the face of certain punishment, then some other verdict (which might still involve removing these damaged men from society) would be appropriate.


The social view of punishment is as far as psychology can go towards prescribing social policy.  Given a certain set of values, psychology may help us decide what system of rewards and punishments will be helpful in promoting them.  But the social view of reward and punishment does not by itself prescribe social policy.  Our value system, our morality, plays a legitimate role in measuring “suffering,” in evaluating known outcomes and in judging the rightness or wrongness of particular rewards and punishments.  We’re less moved by the plight of the disappointed thief who breaks open an empty safe, than by the suffering of a mugging victim, for example.  Psychology can tell us a little (only a little, since we don’t do such experiments on human beings) about the individual effects of corporal punishment vs. the effects of a jail term; it cannot tell us whether corporal punishment is cruel or not.  Social science can tell us that more people will be killed by guns if guns are freely available than if they are not.  It cannot tell us whether the freedom to bear arms is an inalienable right.  Psychology can tell us something about the extent of homosexuality in different cultures; it cannot tell us whether homosexuality is good, bad or a matter of indifference.  Psychology can also tell us that social opprobrium — Hester Prynne’s “A”, blame, or the big red “D” some have proposed for drunk drivers — is often an effective deterrent.  It cannot tell us whether such punishments are “right” or not.  Scientific psychology, like all science, is amoral: it tells us what is, or what might be — not what should be.  Psychologists who offer more, promoters of “authentic selves” or punishment-free societies, are peddling not science but faith.


Was Darwin Wrong?

Or have critics – and some fans – missed the point?

Christopher Booker is a contrarian English journalist who writes extensively on science-related issues.  He has produced possibly the best available critical review of the anthropogenic global warming hypothesis. He has cast justifiable doubt on the alleged ill effects of low-level pollutants like airborne asbestos and second-hand tobacco smoke.

Booker has also lobbed a few hand-grenades at Darwin’s theory of evolution.  He identifies a real problem, but his criticism misses a point which is also missed even by some Darwin fans.

Is anti-Darwin ‘politically incorrect’?

In a 2010 article, Booker reacted to a seminar of Darwin skeptics, many very distinguished in their own fields.  These folk had faced hostility from the scientific establishment which seemed to Booker excessive, or at least unfair.  Their discussion provided all the ingredients for a conspiracy novel:

[T]hey had come up against a wall of hostility from the scientific establishment. Even to raise such questions was just not permissible. One had been fired as editor of a major scientific journal because he dared publish a paper sceptical of Darwin’s theory. Another, the leading expert on his subject, had only come lately to his dissenting view and had not yet worked out how to admit this to his fellow academics for fear that he too might lose his post.

The problem was raised at an earlier conference:

[A] number of expert scientists came together in America to share their conviction that, in light of the astonishing intricacies of construction revealed by molecular biology, Darwin’s gradualism could not possibly account for them. So organizationally complex, for instance, are the structures of DNA and cell reproduction that they could not conceivably have evolved just through minute, random variations. Some other unknown factor must have been responsible for the appearance of these ‘irreducibly complex’ micromechanisms, to which they gave the name ‘intelligent design’. [my emphasis]

I am a big fan of Darwin. I also have respect for Booker’s skepticism.  The contradiction can be resolved if we look more carefully at what we know now – and at what Darwin actually said.

The logic of evolution

There are three parts to the theory of evolution:

  1. The fact of evolution itself. The fact that the human species shares common ancestors with the great apes.  The fact that there is a phylogenetic “tree of life” which connects all species, beginning with one or a few ancestors who successively subdivided or became extinct in favor of a growing variety of descendants.  Small divergences became large ones as one species gave rise to two and so on.
  2. Variation: the fact that individual organisms vary – have different phenotypes, different physical bodies and behaviors – and that some of these individual differences are caused by different genotypes, so are passed on to descendants.
  3. Selection: the fact that individual variants in a population will also vary in the number of viable offspring to which they give rise. If number of offspring is correlated with some heritable characteristic – if particular genes are carried by a fitter phenotype – then the next generation may differ phenotypically from the preceding one.
    Notice that in order for selection to work, at every stage the new variant must be more successful than the old.

An example: Rosemary and Peter Grant looked at birds on the Galapagos Islands.  They studied populations of finches, and noticed surprisingly rapid increases in beak size from year to year. The cause was a change in weather, which for a few years shifted the available food from easy- to hard-to-crack nuts.  Birds with larger beaks were more successful in getting food and in leaving descendants.  Natural selection operated amazingly quickly, leading to larger average beak size within just a few years.  Bernard Kettlewell observed a similar change, over a slightly longer term, in the color of the peppered moth in England.  As tree bark changed from light to dark to light again as industrial pollution waxed and waned over the years, so did the color of the moths. There are several other “natural experiments” that make this same point.

None of the serious critics of Darwinian evolution seems to question evolution itself, the fact that organisms are all related and that the living world has developed over many millions of years.  The idea of evolution preceded Darwin. His contribution was to suggest a mechanism, a process – natural selection – by which evolution comes about.  It is the supposed inadequacy of this process that exercises Booker and other critics.

Looked at from one point of view, Darwin’s theory is almost a tautology, like a theorem in mathematics:

  1. Organisms vary (have different phenotypes).
  2. Some of this variation is heritable, passed from one generation to the next (due to different genotypes).
  3. Some heritable variations (phenotypes) are fitter (produce more offspring) than others because they are better adapted to their environment.
  4. Ergo, each generation will be better adapted than the preceding one. Organisms will evolve.

Expressed in this way, Darwin’s idea seems self-evidently true.  But the simplicity is only apparent.

The direction of evolution

Darwinian evolution depends on not one but two forces: selection, the gradual improvement from generation to generation as better-adapted phenotypes are selected; and variation: the set of heritable characteristics that are offered up for selection in each generation.  This joint process can be progressive or stabilizing, depending on the pattern of variation.  Selection/variation does not necessarily produce progressive change.  This should have been obvious, for a reason I describe in a moment.

The usual assumption is that among the heritable variants in each generation will be some that fare better than average.  If these are selected, then the average must improve: the species will change – adapt better – from one generation to the next.

But what if variation only offers up individuals that fare worse than the modal individual?  These will all be selected against and there will be no shift in the average; adaptation will remain as before.  This is called stabilizing selection and is perhaps the usual pattern.  Stabilizing selection is why many species in the geological record have remained unchanged for many hundreds of thousands, even millions, of years.  Indeed, a forerunner of Darwin, the Scottish ‘father of geology’ James Hutton (1726-1797), came up with the idea of natural selection as an explanation for the constancy of species.  The difference – progress or stasis – depends not just on selection but on the range and type of variation.
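The difference between the two regimes is easy to see in a toy simulation (my own sketch, not anything from the evolutionary literature): when variation offers some better-than-parent variants, the population mean climbs generation by generation; when it offers only worse ones, selection simply holds the line.

```python
import random

def evolve(generations, pop_size, variant_shift, seed=0):
    """Toy selection/variation model (illustrative only).

    Each generation every individual leaves one heritable variant offset
    by variant_shift plus noise; selection keeps the fitter half of
    parents plus offspring. Returns the mean trait value per generation.
    """
    rng = random.Random(seed)
    pop = [0.0] * pop_size
    means = []
    for _ in range(generations):
        # Variation: what is "offered up for selection" each generation.
        offspring = [x + variant_shift + rng.gauss(0, 0.1) for x in pop]
        # Selection: higher trait value counts as fitter; keep the best half.
        pool = sorted(pop + offspring, reverse=True)
        pop = pool[:pop_size]
        means.append(sum(pop) / pop_size)
    return means

# Some variants fare better than their parents: the mean climbs (progress).
progressive = evolve(50, 200, variant_shift=0.0)
# All variants fare worse than their parents: selection weeds them out and
# the mean stays put (stabilizing selection).
stabilizing = evolve(50, 200, variant_shift=-0.5)
```

The same filter – keep the fitter half – produces either progress or stasis; only the pattern of variation differs.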

The structure of variation

Darwin’s process has two parts: variation is just as important as selection.  Indeed, without variation there is nothing to select.  But like many others, Richard Dawkins, a Darwinian fundamentalist, puts all the weight on selection: “natural selection is the force that drives evolution on,” says Dawkins in one of his many TV shows.  Variation represents “random mistakes” and the effect of selection is like “modelling clay.”  Like Christopher Booker, he seems to believe that natural selection operates on small, random variations.

Critics of evolution simply find it hard to believe that the complexity of the living world can all be explained by selection from small, random variations.  Darwin was very well aware of the problem: “If it could be demonstrated that any complex organ existed which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down.” [Origin]  But he was being either naïve or disingenuous here.  He should surely have known that outside the realm of logic, proving a negative, proving that you can’t do something, is next to impossible.  Poverty of imagination is not disproof!

Darwin was concerned about the evolution of the vertebrate eye: focusing lens, sensitive retina and so on.  How could the bits of an eye evolve and be useful before the whole perfect structure has evolved?  He justified his argument by pointing to the wide variety of primitive eyes in a range of species that lack many of the elements of the fully-formed vertebrate eye but are nevertheless better than the structures that preceded them.

There is general agreement that the focusing eye could have evolved in just the way that Darwin proposed.  But there is some skepticism about many other extravagances of evolution: all that useless patterning and behavior associated with sexual reproduction in bower birds and birds of paradise, the unnecessary ornamentation of the male peacock and many other examples of apparently maladaptive behavior associated with reproduction, even human super-intelligence – we seem to be much smarter than we needed to be as hunter-gatherers.  The theory of sexual selection was developed to deal with cases like these, but it must be admitted that many details are still missing.

The fundamental error in Booker’s criticism of Darwin, as well as Dawkins’ celebration of him, is the claim that evolution always occurred “just through [selection of] minute, random variations.”  Selection, natural or otherwise, is just a filter.  It creates nothing.  Variation proposes; selection just disposes.  All the creation is supplied by the processes of variation.  If variation is not totally random or always small in extent, if it is creating complex structures, not just tiny variations in existing structures, then it is doing the work, not selection.

Non-random variation

In Darwin’s day, nothing was known about genetics.  He saw no easy pattern in variation, but was impressed by the power of selection, which was demonstrated in the artificial selection of animals and crops.  It was therefore reasonable and parsimonious for him to assume as little structure in variation as possible.  But he also discussed many cases where variation is neither small nor random.  So-called “sporting” plants are examples of quite large changes from one generation to the next, “that is, of plants which have suddenly produced a single bud with a new and sometimes widely different character from that of the other buds on the same plant.”  What Darwin called correlated variation is an example of linked, hence non-random, characteristics.  He quotes another distinguished naturalist writing that “Breeders believe that long limbs are almost always accompanied by an elongated head” and “Colour and constitutional peculiarities go together, of which many remarkable cases could be given among animals and plants.”  Darwin’s observation about correlated variation has been strikingly confirmed by a long-term Russian experiment with silver foxes selectively bred for their friendliness to humans.  After several generations, the now-friendly animals began to show many of the features of domestic dogs, like floppy ears and wagging tails.

“Monster” fetuses and infants with characters much different from normal have been known for centuries.  Most are mutants and they show large effects.  But again, they are not random.  It is well known that some inherited deformities, like extra fingers and limbs or two heads, are relatively common, but others – a partial finger or half a head – are rare to non-existent.

Most monsters die before or soon after birth.  But once in a very long while such a non-random variant may turn out to succeed better than the normal organism, perhaps lighting the fuse to a huge jump in evolution like the Cambrian explosion.  Stephen Jay Gould publicized George Gaylord Simpson’s “tempo and mode in evolution” as punctuated equilibrium, to describe the sometimes sudden shift from stasis to change in the history of species evolution.  Sometimes these jumps may result from a change in selection pressures.  But some may be triggered by an occasional large monster-like change in phenotype with no change in the selection environment.

The kinds of phenotypic (observed form) variation that can occur depend on the way the genetic instructions in the fertilized egg are translated into the growing organism.  Genetic errors (mutations) may be random, but the phenotypes to which they give rise are most certainly not.  It is the phenotypes that are selected not the genes themselves.  So selection operates on a pool of (phenotypic) variation that is not always “small and random”.

Even mutations themselves do not in fact occur at random.  Recurrent mutations occur more frequently than others, so would resist any attempt to select them out.  There are sometimes links between mutations so that mutation A is more likely to be accompanied by mutation B (“hitchhiking”) and so on.

Is there structure to variation?

An underlying mystery remains: just how is the information in the genes translated during development into the adult organism?  How might one or two modest mutations sometimes result in large structured changes in the phenotype?  Is there any directionality to such changes?  Is there a pattern?  Some recent studies of the evolution of African lake fish suggest that there may be a pre-determined pattern. Genetically different cichlid fish in different lakes have evolved to look almost identical.  “In other words, the ‘tape’ of cichlid evolution has been run twice. And both times, the outcome has been much the same.”  There is room, in other words, for the hypothesis that natural selection is not the sole “driving force” in evolution.  Some of the process, at least, may be pre-determined.

The laws of development (ontogenesis), if laws there be, still elude discovery. But the origin of species (phylogenesis) surely depends as much on them as on selection.  Perhaps these largely unknown laws are what Darwin’s critics mean by ‘intelligent design’?  But if so, the term is deeply unfortunate because it implies that evolution is guided by intention, by an inscrutable agent, not by impersonal laws.  As a hypothesis it is untestable.  Darwin’s critics are right to see a problem with “small, random variation” Darwinism.  But they are wrong to insert an intelligent agent as a solution and still claim they are doing science. Appealing to intelligent design just begs the question of how development actually works. It is not science, but faith.

Darwin’s theory is not wrong. As he knew, but many of his fans do not, it is incomplete.  Instead of paying attention to the gaps, and seeking to fill them, these enthusiasts have provided a straw man for opponents to attack.  Emboldened by its imperfections, the critics have proposed as an alternative ‘intelligent design’: an untestable non-solution that blocks further advance.  Darwin was closer to the truth than his critics – and closer than some simple-minded supporters.


John Staddon is James B. Duke Professor of Psychology and Professor of Biology, Emeritus, at Duke University. Recent books are (2016) Adaptive Behavior and Learning (2nd edition) Cambridge University Press and Scientific Method: How science works, fails to work or pretends to work. (2017) Routledge.


A study in perception: Feelings cause…feelings

Statistical correlations and thousands of subjects are not enough

The #MeToo movement has taken off and so have the bad effects attributed to anything from mildly disagreeable or misperceived ‘microaggressions’ to physical assault.  Naturally, there is a desire among socially concerned scientists to study the issue. Unfortunately, it is tough to study the effects of a bad social environment. You can’t do experiments – vary the environment and look at the effect – and feelings are not the same thing as verifiable data. But the pressure to demonstrate scientifically what many ‘know’ to be true is irresistible. The result is a plethora of supposedly scientific studies, using methods that pretend to prove what they in fact cannot. Here is a recent example.

“Recent social movements such as the Women’s March, #MeToo, [etc.] draw attention to the broad spectrum [of] gender-related violence that is pervasive in the United States and around the world”, the authors claim in a May 5 op-ed in the Raleigh News and Observer. The title of their study is: “Discrimination, Harassment, and Gendered Health Inequalities: Do Perceptions of Workplace Mistreatment Contribute to the Gender Gap in Self-reported Health?”  It captures in one place some of the worst errors that have crept into social science in recent decades: correlations treated as causes, and subjective judgment treated as objective data.  This study even manages to combine the two: subjective judgments are treated as causes of…subjective judgments.

The article, in the Journal of Health and Social Behavior, is based on reports from 5579 respondents collected in three surveys in 2006, 2010 and 2014. The report applies a battery of statistical tests (whose assumptions are never discussed) to people’s answers to questions about how they feel about mental and physical health, gender, age and racial discrimination, sexual and other harassment.  The large number of subjects just about guarantees that some ‘statistically significant’ correlations will be found.
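It is easy to see why. The following toy simulation (mine, not the authors’; the sample size matches the study, but the data are pure noise) correlates pairs of completely unrelated variables and counts how many come out ‘significant’ at the 0.05 level:

```python
import math
import random

def corr_pvalue(x, y):
    """Pearson correlation and an approximate two-sided p-value for the
    null hypothesis rho = 0, via the Fisher z transformation (fine for
    large n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)
n, tests = 5579, 200   # 5579 respondents; 200 truly null 'questions'
false_hits = sum(
    corr_pvalue([rng.gauss(0, 1) for _ in range(n)],
                [rng.gauss(0, 1) for _ in range(n)])[1] < 0.05
    for _ in range(tests)
)
# Roughly 5% of the purely random pairings come out 'statistically significant'.
```

With 200 truly null comparisons, about ten will pass the significance test; a report that runs enough tests on enough respondents is all but guaranteed some publishable correlations.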

The study looks at two sets of subjective variables – self-reports – and associates them in a way that will look like cause-effect to most readers.  But the link between these two sets is not causal – no experiment was done or could be done – but merely a statistical correlation.

Did the authors check to see if self-reports (by “economically active respondents” healthy enough to answer a survey) are reliable predictors of actual, physical health? No, they did not. Their claim that self-reports give an accurate picture of health is inconsistent even with data they do report: “In general, studies show that men report better self-rated health than women…[self-report] is nonetheless an important dimension of individuals’ well-being and is strongly correlated with more ‘objective’ indicators of health, including mortality.” Er, really, given that women live longer than men but (according to the authors) report more ill-health? And why the ‘scare’ quotes around ‘objective’?

The authors’ long, statistics-stuffed report is full of statements like “Taken together, these studies suggest that perceptions of gender discrimination, sexual harassment, and other forms [of] workplace mistreatment adversely affect multiple dimensions of women’s health. [my emphasis]”  So now perceptions (of gender discrimination) affect [i.e., cause] not mere perceptions but “multiple dimensions” of women’s health.  Unfortunately, these “multiple dimensions” include no actual, objective measures of health.  In other words, this study has found nothing – because finding a causal relation between one ‘perception’ and another is essentially impossible, and because a health study should be about reality, not perceived reality.

The main problem with this and countless similar studies is that although they usually avoid saying so directly, the authors treat a correlation between A and B as the same as A causes B.  Many, perhaps most, readers of the report will conclude that women’s bad experiences are a cause of their bad mental and physical health.  That may well be true, but not because of this study. We have absolutely no reason to believe either that people’s self-reports are accurate reflections of reality or, more importantly, that a correlation is guaranteed to be a cause. Even if these self-reports are accurate, it is impossible to conclude that one causes the other: either that feeling harassed causes sickness, or that feeling sick makes you feel harassed.

Studies like this are nothing but “noise” tuned to prevailing opinion. They overwhelm the reader with impressive-sounding statistics which are never discussed. They mislead and muddle.

The periodical The Week has a column called “Health Scare of the Week”; that is where items like this belong, not on the editorial pages – or in a scientific journal.

Is this why so many NHST studies fail to replicate?

Most ‘significant’ results occur on the first try

Leif Nelson has a fascinating blog post on the NHST method, statistical significance, and the chance of a false positive.  The question can be posed in the following way: Suppose 100 labs begin the same bad study, i.e., a study involving variables that in fact have no effect. Once a lab gets a “hit”, it stops trying. If the chosen significance level is p (commonly p = 0.05), then approximately 5 of the 100 labs will, by chance, get a “hit”, a significant result, on the first try.  If the remaining 95 labs attempt to replicate, again about 5% of them – four or five labs – will “hit”, and so on.  So the number of ‘hits’ is a declining (exponential) function of the number of trials – even though the chance of a hit is constant, trial-by-trial.

The reason for the trial-by-trial decline, of course, is that every lab has an opportunity for a hit on trial 1, but only a fraction 1 – p = 0.95 of labs gets a chance at a second trial, and so on.  The hit probability per opportunity remains constant at p.  The average number of trials per hit is 1/p = 20 in this case. But the modal number is just one, because the number of opportunities is maximal on the first trial.

On the other hand, the more trials are carried out, the more likely it is that there will be a ‘hit’ – this even though the maximum number (but not probability) of hits is on the first trial.  To see this, imagine running the hundred experiments for, say, 10 repeats each. The probability of non-significance on any one trial is 1 – p = 0.95. The trials are independent, so the probability of failure – no ‘hit’ anywhere in trials 1 through N – is (1 – p)^N. The probability of success, a ‘hit’ somewhere from trial 1 to trial N, is the complement of that:

P(‘hit’|N) = 1 – (1 – p)^N,

which is an increasing, not a decreasing, function of N. In other words, even though most false positives occur on the first trial (because opportunities are then at a maximum), it is also true that the more trials are run, the more likely it is that one of them will be a false positive.
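Both facts – the modal first-trial hit and the growing cumulative chance of a false positive – can be checked with a quick simulation (my own sketch of the scenario above, with many labs so the proportions are stable):

```python
import random

def first_hit_histogram(p=0.05, labs=100_000, max_trials=10, seed=42):
    """Each 'lab' runs a null study repeatedly, stopping at its first
    false-positive 'hit' (probability p per trial). Returns hist where
    hist[t] counts labs whose first hit came on trial t (1-based)."""
    rng = random.Random(seed)
    hist = [0] * (max_trials + 1)
    for _ in range(labs):
        for t in range(1, max_trials + 1):
            if rng.random() < p:
                hist[t] += 1
                break
    return hist

hist = first_hit_histogram()
# hist[1] is the largest count: the modal first hit is on trial 1, and the
# counts decline geometrically.  Meanwhile the total number of labs with at
# least one hit in 10 trials matches 1 - (1-p)^N: about 40% of 100,000 labs.
total = sum(hist[1:])
```

The declining histogram and the rising cumulative total are two views of the same geometric process.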

But Leif Nelson is undoubtedly correct that it is those 5% that turned up ‘heads’ on the very first try that are so persuasive, both to the researcher who gets the result and the reviewer who judges it.




Response to Vicky: Is racism everywhere, really?

This is a response to a thoughtful comment from Vicky to my blog critical of the supposed ubiquity of racism.  This response turned out to be too long for a comment; hence this new blog. (It also made Psychology Today uncomfortable).

Apropos race differences in IQ and SAT: They do exist, both in the US and in comparisons between white Europeans and Africans.  What they mean is much less clear.  Since IQ and SAT predict college performance, we can expect that blacks will on average do worse in college than whites and Asians, and they do.  Consequently, the pernicious “disparate impact” need not (although it may) reflect racial discrimination.

If a phenomenon has more than one possible cause, you cannot just favor one – as British TV person Cathy Newman did repeatedly in her notorious interview with Canadian psychologist Jordan Peterson.  She kept pulling out “gender discrimination” as the cause for wage disparities and Peterson kept having to repeat his list of possible causes – of which discrimination was only one.  Since there are at least two possible causes for average black-white differences in college performance, it is simply wrong to blame one – racism – exclusively.

I believe you agree, since you refer to “hundreds of variables that could each play a role in explaining why someone of very low SES might fail academically.”  Even Herrnstein and Murray say as much in their much-maligned The Bell Curve.  Nevertheless, the late Stephen Jay Gould falsely accused them of just this crime, writing that “Herrnstein and Murray violate fairness by converting a complex case that can yield only agnosticism into a biased brief for permanent and heritable difference.”  Herrnstein died in 1994, just as the book was published. But the accusation dogs Murray to this day, despite the fact that what they actually said was: “It seems highly likely to us that both genes and environment have something to do with racial differences.  What might the mix be?  We are resolutely agnostic on that issue; as far as we can determine, the evidence does not yet justify an estimate. (my emphases)” Gould’s baleful influence lives on as their critics continue to misrepresent Herrnstein and Murray’s position.

The genetic component might well be less than they suspected. African immigrants to the US presumably have a smaller admixture of “white” genes than African Americans, descended from slaves – and their masters.  If “white” genes make you smarter than “black” genes, American-born blacks should do better than immigrants. Yet immigrants seem to do better socioeconomically than American-born blacks. There are many possible reasons for this, of course. But it serves to remind us that statistical differences between groups need not reflect genetic effects.

A more worrying issue is the assumption that racism is everywhere.  At one time, a religious nation accepted as axiomatic that “we are all sinners!”  The idea of sin has fallen out of favor in a secular age, but racism has taken its place.   We are all racist, whether we know it or not.  Vicky writes: “we are all implicitly biased against people of color”.

Are we, really? There is a problem with the concept of implicit bias.  It appears to be a “scientifically proven” version of sin.  The problem is: it isn’t scientifically proven at all.  The clever ‘scientific test’ for implicit bias – especially racial bias – has not been, and perhaps cannot be, scientifically validated.  The test is the ‘scientific’ equivalent of telling entrails or reading tea leaves.  (The problem is that you can validate a test for an unconscious process only by showing that it predicts some actual behavior. In other words, to validate implicit bias, you must show that it predicts explicit, overt bias. If there is in fact explicit bias, the test is validated – but then you don’t need it, since you have the actual overt bias. Otherwise, no matter what the test says, you can conclude nothing.)

We have had a black president for two terms; there are more than a hundred black members of congress and many more state and local black elected officials.  Many beloved icons of sports and entertainment are black. The rate of interracial marriage continues to increase.  The racial situation in the US is infinitely better than it was 40 or 50 years ago.  It is time to stop imagining, or at least exaggerating, racial bias when little exists. Let’s pay some attention to more critical problems, like the development of character and citizenship in the young, the roles of men and women, the place of marriage in a civilized society, and a dozen others more important than a tiny racial divide which agitation about an imaginary implicit bias serves only to widen.

CONSCIOUSNESS — and the Color-Phi phenomenon

When you watch a movie, your retina is stimulated 24 times each second with 24 static images.  An object that takes up adjacent positions in each successive image is perceived as moving smoothly.  The effect can be demonstrated experimentally with a single bright spot that is successively presented at one place and then at an adjacent place (see Figure 15.1).   If the delay between the two presentations is short, the spot appears to move, rather than disappear and then reappear.  This is termed the phi phenomenon.  There is a related effect in which the two spots are different colors.  What is seen is a single moving spot which changes color at about the midpoint of its travel.

This is a puzzle for some cognitivists.  A philosopher and a psychologist conjecture as follows:

[Philosopher Nelson] Goodman wondered: “How are we able…to fill in the spot at the intervening place-times along a path running from the first to the second flash before that flash occurs?” …. Unless there is precognition, the illusory content cannot be created until after some identification of the second spot occurs in the brain.  But if this identification of the second spot is already “in conscious experience” would it not be too late to interpose the illusory color-switching-while-moving scene between the conscious experience of spot 1 and the conscious experience of spot 2?…[other experimenters] proposed that the intervening motion is produced retrospectively, built only after the second flash occurs, and “projected backwards in time”….But what does it mean that this experienced motion is “projected backwards in time”?[1]

Presented in this way, the color-phi effect certainly seems baffling, at least to philosopher Goodman.  Dennett and Kinsbourne describe, rather picturesquely, two standard cognitive ways of dealing with this effect.  One, which they term “Orwellian,” is that we experience things in one way, but then revise our memories, much as Minitruth in Orwell’s 1984 revised history.  The color-phi effect thus becomes a post-hoc reinterpretation: two spots are experienced, but a smoothly moving, color-changing spot is reported.  Dennett and Kinsbourne term the other standard approach “Stalinesque,” by analogy with Stalin’s show trials, in which false evidence is created but reported accurately.  In this view, what is reported is what was actually experienced, though what was experienced was not what (objectively) happened.

Dennett and Kinsbourne dismiss both these accounts in favor of what they term a “multiple-drafts” model: “Our Multiple Drafts model agrees with Goodman that retrospectively the brain creates the content (the judgment) that there was intervening motion, and that this content is then available to govern activity and leave its mark on memory.  But our model claims that the brain does not bother “constructing” any representations that go to the trouble of “filling in” the blanks” [2].  In the multiple-drafts model consciousness becomes a distributed construct, like “The British Empire” (their analogy), which is not uniquely located in time or space.

Theoretical behaviorism has a much simpler way of looking at the color-phi effect.  First, note that like all other psychological phenomena, the effect involves three conceptually separate domains:

Domain 1: The first is the domain of felt experience, the phenomenological domain.  There is a certain quality (philosophers call this a quale) associated with the color-phi experience.  This is subjective and science has nothing to say about it.  From a scientific point of view, I cannot say whether “green” looks the same to you as to me; I can only say whether or not you make the same judgments about colored objects as I do. This point used to be a commonplace in philosophy, but apparently it needs to be reiterated from time to time: “That different people classify external stimuli in the ‘same’ way does not mean that individual sense qualities are the same for different people (which would be a meaningless statement), but that the systems of sense qualities of different people have a common structure (are homeomorphic systems of relations)” wrote Friedrich Hayek.[3] The same idea was on the table at the dawn of behaviorism: “Suppose, for example, that I introspect concerning my consciousness of colors.  All you can ever really learn from such introspection is whether or not I shall behave towards those colors in the same ways that you do.  You can never learn what those colors really ‘feel’ like to me.” – this from that most cognitive of behaviorists, Edward Tolman[4].

What this means is that if you and I are standing in the same place, we see the same chair to the left of the same table, we judge these two greens to be the same and the red to be different from the green, and so forth.  What we cannot say is that my green is the same as yours.  What we can say (unless one of us is color-blind) is that my green bears the same relation to my yellow as your green does to your yellow.

I can also know if you say the same things about color-phi-type stimuli as I do.  Note that this is a behavioristic position, but it is not the version of behaviorism dismissed by Dennett and Kinsbourne, when they say “One could, then, ‘make the problems disappear’ by simply refusing to take introspective reports seriously.”[5]  As we will see shortly, the question is not whether phenomenological reports should be ignored – of course they should not – but how they should be interpreted.

Domain 2: The second domain is physiological, the real-time functioning of the brain.  The color-phi experiment says nothing about the brain, but another experiment, which I will discuss in a moment, does include physiological data.

Domain 3: The third domain is the domain of behavioral data, “intersubjectively verifi­able” reports and judgments by experimental subjects.  The reports of people in response to appropriate stimuli are the basis for everything objective we can know about color-phi.

Much of the muddle in the various cognitive accounts arises from confusion among these three domains.  For example, an eminent neuroscientist writes: “The qualia question is, how does the flux of ions in little bits of jelly – the neurons – give rise to the redness of red, the flavor of Marmite or paneer tikka masala or wine?”[6]  Phrased in this way we don’t know and can’t know.  But phrased a little differently, the question can yield a scientific answer: What brain state or states corresponds to the response “it tastes like Marmite”?  As Hayek and many others have pointed out (mostly in vain), the phenomenology question – which always boils down to “how does red look to you?” – is not answerable.  All we can know is whether red, green, blue, etc. enter into the same relations with one another with the same results for you as for me – Hayek’s ‘homeomorphic relations.’

Color phi provides yet another example of the same confusion.  Dennett and Kinsbourne write “Conscious experiences are real events occurring in the real time and space of the brain, and hence they are clockable and locatable within the appropriate limits of precision for real phenomena of their type.”[7]  Well, no, not really.  What can be clocked and located are reports of conscious experiences and measurements of physiological events.  Conscious experiences are Domain 1, which has neither time nor space, but only ineffable qualia.  The only evidence we have for these qualia (at least, for someone else’s) is Domain 3.  And we can try to correlate Domain 3 data with Domain 2 data and infer something about the brain correlates of reported experiences.  But that’s all.  Dennett and Kinsbourne’s confident claim just confuses the issue.

All becomes much clearer once we look more closely at Domain 3: What did the subjects see?  What did they say about it, and when did they say it?  The real-time events in the color-phi experiment are illustrated in Figure 15.2, which is a version of the general framework of Figure 13.1 tailored to this experiment.  Time goes from top to bottom in discrete steps.  At time 0 the red spot is lit and goes out; there is a delay; then the green spot is lit; there is another delay and then the subject reports what he has seen, namely a continuously moving red spot that changes to green halfway through its travel: “RRRRGGGG.”  Stimulus and Response are both Domain 3.  The properties of the states are as yet undefined.  Defining them requires a theory for the effect, which I’ll get to in a moment.

Confusion centers on the subject’s response “RRRRGGGG”.  What does this response mean?  This seems to be the heart of the puzzle, but the unknowable quale here is scientifically irrelevant.  We do not, cannot, know what the subject “sees.”  That doesn’t mean the subject’s response is meaningless.  What it can tell us is something about other, “control” experiments that might give the same quale.  Figure 15.3 shows one such control experiment.  In this experiment, a single spot really is moving and changing color at the midpoint: RRRRGGGG, and the subject’s report is, appropriately, “RRRRGGGG.”  The similarity between the responses to the really moving stimulus and to the color-phi stimulus is what the statement “the color-phi stimulus looks like a continuously moving spot that changes color” means.  The point is that we (i.e., an external observer) cannot judge the subject’s quale, but we can judge if his response is the same or different on two occasions.  And as for the subject, he can also judge whether one thing looks like another or not.  These same-different judgments are all that is required for a scientific account.

A Theory of Color-Phi

The comparison between these two experiments suggests an answerable scientific problem, namely: “What kinds of process give the same output to the two different histories illustrated in the two figures?”  More generally, what characterizes the class of histories that give the response “RRRRGGGG”?  The answer will be some kind of model.  What might the process be?  It will be one in which temporally adjacent events tend to inhibit one another, so that initial and terminal events are more salient than events in the middle of a series.  Thus, the input sequence RRRRGGGG might be registered[8] with its middle elements much attenuated – a sort of serial-position effect, i.e., stimuli in the middle of a series have less effect than the stimuli on the ends (see Chapter 2).  In the limit, when the stimuli are presented rapidly enough, stimuli in the middle may have a negligible effect, so that the input RRRRGGGG yields the registered sequence R….G, which is indistinguishable from the color-phi sequence.  It would then make perfect sense that the subject makes the same response to the complete sequence and the color-phi sequence.
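Though the argument here is verbal, the inhibition idea is simple enough to sketch as a toy model.  Nothing in the sketch below comes from the text except the idea itself: the function name, the inhibition and threshold parameters, and their values are invented for illustration.

```python
# Toy model of temporal inhibition: each event's registered salience is
# reduced by its occupied temporal neighbors, so middle items in a rapid
# series fall below threshold and drop out of the registered sequence.
# All numbers here are illustrative, not fitted to data.

def registered(sequence, inhibition=0.6, threshold=0.5):
    """Return the 'registered' version of a stimulus sequence.

    sequence: a string such as "RRRRGGGG"; '.' marks an empty interval.
    Each stimulus starts with salience 1.0 and loses inhibition * 0.5
    for each occupied neighboring interval; items whose salience falls
    below threshold are registered as '.', i.e., they drop out.
    """
    out = []
    for i, s in enumerate(sequence):
        if s == '.':
            out.append('.')
            continue
        salience = 1.0
        for j in (i - 1, i + 1):
            if 0 <= j < len(sequence) and sequence[j] != '.':
                salience -= inhibition * 0.5
        out.append(s if salience >= threshold else '.')
    return ''.join(out)

print(registered("RRRRGGGG"))  # middle items suppressed: "R......G"
print(registered("R......G"))  # isolated items survive:   "R......G"
```

With these made-up numbers, the complete sequence and the sparse sequence register identically, which is exactly the functional equivalence the theory requires.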

The same response, yes, but just what response will it be?  Let’s accept the existence of a perceptual process that gives the same output to two different input sequences: RRRRGGGG and R……G.  The question is, why is the response “RRRRGGGG” rather than “R……G”?  Why do people report the abbreviated sequence as appearing like the complete sequence?  Why not (per contra) report RRRRGGGG as R……G?  Why privilege one of the two possible interpretations over the other?  It is here that evolution and personal history come into play[9].  Just as in the Ames Room (Chapter 1), the visual system takes the processed visual input (in this case R……G) and infers, unconsciously, the most likely state of the world that it signifies.  Since alternating on-and-off spots are rare in our evolutionary history, the inference is that a single moving spot is changing color.

Thus, by responding “RRRRGGGG,” rather than “R……G” we may simply be playing the evolutionary odds.  Given that these two sequences produce the same internal state, the most likely state of the world is RRRRGGGG  – the moving, color-changing spot – rather than the other.  So RRRRGGGG is what we report—and perceive (the subject isn’t lying)[10].

This approach to the color-phi effect is as suitable for non-human as for human animals.  As far as I know, no one has attempted a suitable experiment with pigeons, say, but it could easily be done.  A pioneering experiment very similar in form was done many years ago by Donald Blough when he measured pigeons’ visual threshold, something that also raises ‘consciousness’-type questions.  After all, only the pigeon knows when he ceases to ‘see’ a slowly dimming stimulus.  Blough’s solution was a technique invented by his colleague, sensory physiologist Georg von Békésy, to measure human auditory thresholds[11].  Blough describes his version of the method in this way: “The pigeon’s basic task is to peck key A when the stimulus patch is visible and to peck key B when the patch is dark. The stimulus patch, brightly lighted during early periods of training is gradually reduced in brightness until it falls beneath the pigeon’s absolute threshold.”[12]  As the patch dims and becomes invisible, so the pigeon’s choice shifts from key A to key B.  Blough’s experiment tracked the pigeon’s dark-adaptation curve – the change in the bird’s threshold over time in the dark – which turned out to be very similar in form to curves obtained from people.
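The logic of the tracking method is easy to show in a small simulation.  This is a sketch only: the hidden threshold, step size, noise level and trial count below are invented for illustration, not Blough’s actual values.

```python
import random

# Sketch of a Bekesy-style tracking staircase as in Blough's procedure.
# The simulated bird 'sees' the patch whenever its brightness exceeds a
# hidden threshold (plus a little sensory noise).  Seeing it (peck key A)
# steps brightness down; not seeing it (peck key B) steps brightness up,
# so brightness comes to oscillate around the hidden threshold.

def track(true_threshold=0.3, start=1.0, step=0.02, trials=500,
          noise=0.01, seed=0):
    rng = random.Random(seed)
    brightness = start
    history = []
    for _ in range(trials):
        sees_it = brightness + rng.gauss(0, noise) > true_threshold
        brightness += -step if sees_it else step
        history.append(brightness)
    return history

h = track()
estimate = sum(h[-100:]) / 100   # average over the late, oscillating portion
print(round(estimate, 2))        # settles close to the hidden 0.3 threshold
```

The experimenter never asks the bird what it sees; the oscillating brightness record itself is the threshold estimate, which is the whole point of the method.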

Exactly the same procedure could be used to see when a pigeon shifts from seeing on-and-off lights to a continuously moving-and-color-changing light.  The pigeon is confronted with two choice keys, on the left (A) and the right (B).  In between is a digital display that can show either a continuously moving[13] dot that changes color from red to green in mid-travel (continuous), or two dots, a red one on the left and a green one on the right, that alternate (alternating; see Figures 15.1 and 15.3).  The animal would first be trained to peck key A when alternating is presented (with an alternation rate slow enough to make the two dots easily visible as separate events); and to peck key B when the continuously moving light is presented.  The rate of the continuous display would need to match the alternation rate of the alternation display.  As the experiment progresses, the alternation rate is slowly increased just as, in Blough’s experiment, stimulus brightness was slowly decreased.  I very much expect that the animal will at some point change its preference from key A, indicating that it sees the two dots as separate stimuli, to key B, indicating that they look like the continuous stimulus.

The point is that consciousness can perfectly well be studied using methods that require no verbal report – merely a response signaling that sequences are perceived as similar to one thing or the other.  The attempt to interpret phenomena like color-phi in terms of ‘consciousness’ usually leads to a muddle.  It’s a scientific hindrance rather than a help.

This story of the color-phi problem parallels exactly the history of research on another perceptual phenomenon: color vision.  An early discovery was that people sometimes see “red” (for example) when no spectrally red light is present – just as people sometimes see movement when nothing is actually moving (in movies, for example).  Later research expanded on this theme through the study of after-effects, color contrast and Land effects[14], eventually showing a wide range of disparities between the color seen and the wavelengths present.  The solution to the problem was the discovery of processing mechanisms that define the necessary and sufficient physical-stimulus conditions for a person to report “green,” “red” or any other color.  “Consciousness” forms no part of this account either.

My analysis of the color-phi effect sheds some light on a pseudo-issue in cognitive psychology and artificial intelligence: the so-called binding problem.  A philosopher describes it this way:

I see the yellow tennis ball.  I see your face and hear what you say.  I see and smell the bouquet of roses.  The binding problem arises by paying attention to how these coherent perceptions arise.  There are specialized sets of neurons that detect different aspects of objects in the visual field.  The color and motion of the ball are detected by different sets of neurons in different areas of the visual cortex…Binding seeing and hearing, or seeing and smelling, is even more complex…The problem is how all this individually processed information can give rise to a unified percept.[15]

What does “unified perception” amount to?  We report a unified percept “cat.”  When confronted with a cat we can say “cat,” can identify different aspects of the cat, can compare this cat to others like it, and so on.  The cognitive assumption is that this requires some sort of unity in the brain: “The answer would be simple if there were a place where all the outputs of all the processors involved delivered their computations at the same time, a faculty of consciousness, as it were.  But…there is no such place.”

There is no such place…Yes, that is correct.  But why on earth should there be?  From a behavioristic point of view, ‘binding’ is a pseudo-problem.  We report continuous movement in the color-phi effect, but nothing moves in the brain.  All we have is a functional equivalence between the brain state produced by a moving dot and the brain state produced by two flashing dots.  The same is surely true for the cat percept.  There is a state (probably a large set of states) that the subject reports as “cat.”   This state can be invoked by the sight of a cat, a sketch of a cat, the sound of a cat, and so on.  We have no idea about the mechanism by which this comes about – perceiving a cat is more complex than perceiving movement of a dot – but there is no difficulty in principle in understanding what is happening.

Why does there seem to be a problem?  Because of a conflation of Domain 1 with Domain 2.  The percept “cat” is real and unified in Domain 1, but that has no bearing on Domain 2, the underlying physiology.  Recall Dennett and Kinsbourne’s erroneous claim that “Conscious experiences are … clockable and locatable…”  No, they’re not.  Reports, or electrochemical brain events, are “clockable…, etc.” but qualia are not.  The “cat” percept is just one of a very large number of brain states.  It is the one evoked by the sight of a cat, the word “cat,” the sight of a dead mouse on the doorstep, etc.  But the phenomenology of that state has no relevance to its physical nature[16] – any more than there needs to be any simple relation between the contents of a book and its Dewey decimal number.  The point is that the brain is (among other things) a classifier.  It classifies the color-phi stimulus and a moving-color-changing stimulus in the same way – they have the same Dewey decimal number.  That’s what it means to say that we “see” two physically different things as the same.  That’s all it means.

The only objective “unity” that corresponds to the phenomenal unity of the percept is that a variety of very different physical stimuli can all yield the percept “cat.”  People show a sophisticated kind of stimulus generalization.  There are simple mechanical equivalents for this.  Associative memories can “store” a number of patterns and recreate them from partial inputs.  Physicists describe their function in this way:

Associative memory is the “fruit fly” or “Bohr atom” of this field.  It illustrates in about the simplest possible manner the way that collective computation can work.  The basic problem is this: Store a set of p patterns in such a way that when presented with a new pattern ζ, the network responds by producing whichever one of the stored patterns most closely resembles ζ.[17]

Given parts of one of the patterns as input, the network responds with the complete pattern (Plate 15.1).  So, given a picture of a cat, a poem about a cat, cat’s whiskers, or a meeow, the result is the percept “cat.”  But neither cat nor any other percept exists in recognizable form in the network (Domain 2).  Nothing is “bound.”  Nothing needs to be.

Don’t be misled by the fact that in this kind of network, the output looks like the input.  An even simpler network will just activate a particular node when all or part of the target stimulus is presented.  The basic idea is the same.  The network has N stable states; when a stimulus is presented to it, it will go to the state whose prototype, the stored stimulus, is most similar to the presented stimulus.
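A minimal version of such a network is easy to build.  The sketch below is a standard Hopfield-style memory with Hebbian (outer-product) weights – not the specific network of Plate 15.1 – and the two eight-unit ±1 “patterns” standing in for cat and dog are, of course, made up.

```python
import numpy as np

def store(patterns):
    """Hebbian (outer-product) weight matrix for a list of +/-1 vectors."""
    n = patterns[0].size
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)            # no self-connections
    return W / n

def recall(W, probe, sweeps=5):
    """Asynchronous, unit-by-unit updates until the state settles."""
    s = probe.copy()
    for _ in range(sweeps):
        for i in range(s.size):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Two made-up, orthogonal 8-unit patterns standing in for "cat" and "dog".
cat = np.array([ 1,  1,  1,  1, -1, -1, -1, -1])
dog = np.array([ 1, -1,  1, -1,  1, -1,  1, -1])
W = store([cat, dog])

probe = cat.copy()
probe[:3] *= -1                       # corrupt part of the "cat" pattern
print(np.array_equal(recall(W, probe), cat))   # True: the memory completes it
```

Note that neither “cat” nor “dog” sits anywhere in W in recognizable form; each exists only as a stable state of the whole network – which is the sense in which nothing needs to be “bound.”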


[1] D. Dennett & M. Kinsbourne (1992) Time and the observer: The where and when of consciousness in the brain. Behavioral and Brain Sciences, 15, 183-247.  P. 186

[2] Op. cit., p. 194

[3] Hayek, F. A., (1979) The counterrevolution of science: studies in the abuse of reason. Indianapolis: Liberty Press (reprint of the 1952 edition), p. 37.  Friedrich Hayek (1899-1992) won the Economics Nobel in 1974 but also made important contributions to sensory psychology.

[4] E. C. Tolman (1922) A new formula for behaviorism. Psychological Review, 29, 44-53. P. 44.

[5] Op. cit., p. 187

[6] V. S. Ramachandran  A Brief Tour of Human Consciousness, PI Press, New York, 2004, p. 96.

[7] Op. cit., p. 235

[8] What exactly do you mean by “registered” a critic might reasonably ask?   “Register” just refers to the properties of the internal state.  Another way to pose the problem is to say that we need a model such that the inputs RRRRGGGG and R….G yield the same internal state.

[9] Roger Shepard has discussed the role of evolution in the perceptual process:  Shepard, R. N. (1987) Evolution of a mesh between principles of the mind and regularities of the world.   In J. Dupré (Ed.) The latest on the best: essays on evolution and optimality (Pp. 251-275). Cambridge, MA: Bradford/MIT Press.

[10] But how can you be actually seeing what isn’t there (as opposed to simply reporting what is)?  Answer: that’s what you always do.  Sometimes what you see corresponds more or less closely to physical reality, sometimes it’s an approximation, sometimes it’s wholly imaginary (see, for example, Oliver Sacks’ Hallucinations, Borzoi, 2012).

[11] Hungarian émigré Békésy won the Nobel Prize in Physiology or Medicine in 1961.

[12] Blough, D. S. (1955) Method of tracing dark adaptation in the pigeon. Science, 121, 703-704.

[13] Obviously with a digital display this ‘continuous’ movement will be a succession of separate images.  But if the refresh rate is high enough and the spatial resolution fine enough the movement will appear continuous to any man or animal.


[15] Flanagan, O. (1992) Consciousness reconsidered. Cambridge, MA: MIT/Bradford.

[16] Britain’s Princess Anne, an avid horsewoman, fell at a jump a few years ago and suffered a concussion.  She reported seeing herself from above lying on the ground.  If brain states are “clockable and locatable,” just how high above the ground was Princess Anne?

[17] Hertz, J. A., Krogh, A, & Palmer, R. G. (1989) Neural computation. Reading, MA: Addison-Wesley.  P. 11.

Adaptive Behavior and Learning


Where operant conditioning went wrong

Operant conditioning is B. F. Skinner’s name for instrumental learning, for learning by consequences.  Not a new idea, of course.  Humanity has always known how to teach children and animals by means of reward and punishment.  What gave Skinner’s label the edge was his invention of a brilliant method of studying this kind of learning in individual organisms.  The Skinner box and the cumulative recorder were an unbeatable duo.

Three things have prevented the study of operant conditioning from developing as it might have: a limitation of the method, an over-valuing of order, and a distrust of theory.

The method.  The cumulative record was a fantastic breakthrough in one respect: it allowed the behavior of a single animal to be studied in real time.  Until Skinner, the data of animal psychology consisted largely of group averages – how many animals in group X or Y turned left vs. right in a maze, for example.  And not only were individual animals lost in the group, so were the actual times – how long did the rat in the maze take to decide, how fast did it run?  What did it explore before deciding?

But the Skinner-box setup is also limited – to a single response and to changes in its rate of occurrence.  Operant conditioning involves selection from a repertoire of activities: the trial bit of trial-and-error.  The Skinner-box method encourages the study of just one or two already-learned responses.  Of the repertoire, that set of possible responses emitted for “other reasons” – of all those possible modes of behavior lurking below threshold but available to be selected – of those covert responses, so essential to instrumental learning, there is no mention.

Too much order? The second problem is an unexamined respect for what might be called “order at any price”.  Fred Skinner frequently quoted Pavlov: “control your conditions and you will see order.”   But he never said just why “order” in and of itself is desirable.

The easiest way to get order, to reduce variation, is, of course, to take an average.  Skinnerian experiments involve single animals, so the method discourages averaging across animals.  But why not average all those pecks?  Averaging responses was further encouraged by Skinner’s emphasis on probability of response as the proper dependent variable for psychology.  So the most widely used datum in operant psychology is response rate, the number of responses that occur over a time period of minutes or hours.

Another way to reduce variability is negative feedback.  A thermostatically controlled HVAC system reduces the variation in house temperature.  Any kind of negative feedback will reduce variation in the controlled variable.  Operant conditioning, almost by definition, involves feedback.  The more the organism responds, the more reward it gets – subject to the constraints of whatever reinforcement schedule is in effect.  This is positive feedback.  But the most-studied operant choice procedure – the concurrent variable-interval schedule – also involves negative feedback.  When the choice is between two variable-interval schedules, the more time is spent on one choice, the higher the payoff probability for switching to the other.  So no matter the difference in payoff rates for the choices, the organism will never just fixate on one.
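The negative feedback is just the arithmetic of a reward that, once set up, waits to be collected.  A short sketch of that arithmetic (the per-second setup probability standing in for a VI 30-s schedule is illustrative):

```python
# On a variable-interval schedule, a reward 'sets up' with some probability
# each second and is then held until collected.  So the chance that a switch
# to the neglected key pays off grows with time spent away -- the negative
# feedback that prevents exclusive fixation on the richer key.

def p_switch_pays(setup_prob, seconds_away):
    """Probability the neglected schedule has a reward waiting."""
    return 1 - (1 - setup_prob) ** seconds_away

lean = 1 / 30                  # roughly a VI 30-s schedule (illustrative)
for t in (1, 5, 30, 120):
    # e.g., after 30 s away the switch pays off with probability ~0.64
    print(t, round(p_switch_pays(lean, t), 2))
```

However lean the neglected schedule, its switch-payoff probability keeps climbing toward 1 the longer it is ignored, which is why exclusive preference never pays on concurrent VI VI.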

As technology advanced, these two things converged: the desire for order, enabled by averaging and negative feedback, and Skinner’s idea that response probability is an appropriate – the appropriate – dependent variable.  Variable-interval schedules, either singly or in two-choice situations, became a kind of measuring device.  Response rate on VI is steady – no waits, pauses or sudden spikes.  It seemed to offer a simple and direct way to measure response probability.  From response rate as response probability to the theoretical idea of rate as somehow equivalent to response strength was but a short step.

Theory.  Response strength is a theoretical construct.  It goes well beyond response rate or indeed any other directly measurable quantity.  Unfortunately, most people think they know what they mean by “strength”.  The Skinnerian tradition made it difficult to see that more is needed.

A landmark 1961 study by George Reynolds illustrates the problem (although George never saw it in this way).   Here is a simplified version:  Imagine two experimental conditions and two identical pigeons.  Each condition runs for several daily sessions.  In Condition A, pigeon A pecks a red key for food reward delivered on a VI 30-s schedule.  In Condition B, pigeon B pecks a green key for food reward delivered on a VI 15-s schedule.  Because both food rates are relatively high, after lengthy exposure to the procedure, the pigeons will be pecking at a high rate in both cases: response rates – hence ‘strengths’ – will be roughly the same.  Now change the procedure for both pigeons.  Instead of a single schedule, two schedules alternate, for a minute or so each, across a one-hour experimental session.  The added, second schedule is the same for both pigeons: VI 15 s, signaled by a yellow key (alternating two signaled schedules in this way is called a multiple schedule).  Thus, pigeon A is on a mult VI 30 VI 15 (red and yellow stimuli) and pigeon B on a mult VI 15 VI 15 (green and yellow stimuli).  In summary, the two experimental conditions are (stimulus colors above):

Experiment A:  VI 30 (Red), mult VI 30 (Red) VI 15 (Yellow)

Experiment B:   VI 15 (Green), mult VI 15 (Green) VI 15 (Yellow)

Now look at the second condition for each pigeon.  Unsurprisingly, B’s response rate in green will not change.  All that has changed for him is the key color – from green all the time to green and yellow alternating, both with the same payoff.  But A’s response rate in red, the VI 30 stimulus, will be much depressed, and response rate in yellow for A will be considerably higher than B’s yellow response rate, even though the VI 15-s schedule is the same in both.  The effect on responding in the yellow stimulus by pigeon A, an increase in response rate when a given schedule is alternated with a leaner one, is called positive behavioral contrast, and the rate decrease in the leaner schedule for pigeon A is negative contrast.

The obvious conclusion is that response rate alone is inadequate as a description of the ‘strength’ of an operant response.  The steady rate maintained by VI schedules is misleading.  It looks like a simple measure of strength.  Because of Skinner’s emphasis on order, because the averaged-response and feedback-rich variable-interval schedule seemed to provide it, and because it was easy to equate response probability with response rate, the idea took root.  Yet even in the 1950s, it was well known that response rate can itself be manipulated – by so-called differential-reinforcement-of-low-rate (DRL) schedules, for example.

Conclusion: response rate does not equal response strength; hence our emphasis on rate may be a mistake.  If the strength idea is to survive the demise of rate as its best measure, something more is needed: a theory about the factors that control an operant response.  But because Skinner had successfully proclaimed that theories of learning are not necessary, real theory was not forthcoming for many years.