Experimental evidence

“In this heyday of scientism, all sorts of experiments… are performed to back up every conceivable view of education, and people simply cite the ones that confirm their prior beliefs and ignore the rest. Hence they are asking other people to abandon their opinions in deference to a type of ‘evidence’ which they themselves would (quite rightly) not pay a moment’s attention to if it had gone the other way.”
– David Deutsch

From the archives: Posted on 19th June 1997

Ian Cruikshank wrote:

“How about this for some evidence?”

[About how rewards and incentives impair creativity.]

I downloaded it. I read it carefully. Unsurprisingly, it contains no evidence whatsoever that bears on the issue of how creativity is affected by “rewards and incentives”, or anything else.

“Research done at MIT confirms that the use of rewards and incentives deadens creativity, and that intrinsic motivation as such works much better in letting people develop new and innovative solutions.”

It does not “confirm” anything of the sort. The experiments described are (like almost all experiments purporting to establish anything significant in educational theory) fundamentally and irreparably flawed because there is no way of identifying or measuring the quantities (such as creativity and motivation) about which they draw conclusions, no way of making the experiments “double-blind”, and more generally no way of avoiding arbitrarily large systematic errors.

Any one of those flaws alone would make the results worthless as scientific evidence. However, that over-states the value of these particular “studies”, or at least, the cited report of them, for it is so untroubled about elementary sources of bias and error, and hence so blatantly tendentious, as to be ludicrous, and indeed I did laugh out loud when reading it. Normally I would not waste time criticising such tacky nonsense, but I will do so here, because the issue of experimentation comes up so often on this List [the Taking Children Seriously email-based discussion forum], and I have a feeling that many subscribers don’t quite understand it yet—so perhaps a specific example will help. However, I must give a warning at the outset: Taking these “studies” seriously in the way I am about to do will seem to attribute some substance to them, while in fact they have none. I am choosing only a few criticisms for the purposes of illustration; this is not intended as a point-by-point rebuttal.

I should also say, to avoid any misunderstanding, that I believe that the conclusions of these “studies” are largely true.

One of the “studies”, by Teresa Amabile of Brandeis University, purported to show that creative work tends to be done better when it is done for intrinsic reasons rather than for external rewards. OK, this proposition is obvious, but how do you test it experimentally?

Well—this is what she did: She took a sample of 72 students and asked them each to write a poem. Then she divided them randomly into three groups of 24, and applied intrinsic motivations to one group (how on earth did she do that? I’ll come to that below), extrinsic motivations to the second, and did nothing special to the third (control) group. Then she asked them each to write a second poem. The poems were then rated for quality and creativity by a panel of 12 “independent” poets. The results were that (1) the extrinsically motivated group’s second poems were rated worse and less creative than the intrinsically motivated group’s second poems, and (2) the extrinsically motivated group’s second poems were worse and less creative than their first poems.

Does it follow from this result that extrinsic motivation harmed the students’ creativity? Certainly not. To see why, consider this question first: why did the students write their first poem? All 72 of them did so, but why? No extrinsic or intrinsic motivations were supposedly applied by the experimenters. So the students must already have been motivated to some degree. Let’s call this their “base level” of motivation. This base level itself consists of two components, intrinsic motivations and extrinsic ones, in some combination. An example of an intrinsic motivation would be a love of poetry-writing so great that one is willing to take any excuse to write a poem at the drop of a hat. An example of an extrinsic motivation would be that one has agreed to participate in a psychology experiment, and is unwilling to ruin it by dropping out halfway through, no matter how boring it gets.

Since it is not possible to measure the psychological pressures that a student might already be subject to—for instance whether his parents’ approval of him as a person does or does not depend on whether they perceive him as being a poet—it is not possible to determine experimentally which of the students have predominantly intrinsic base-level motivations and which have predominantly extrinsic ones.

In particular, many of them might be extrinsically motivated.

Now it could be that applying any total amount of extrinsic motivation always increases creativity compared with not applying it, but that the increase in creativity depends in the following way on the extrinsic motivation applied: for small amounts of extrinsic motivation, creativity increases rapidly with extrinsic motivation. At some optimal level of extrinsic motivation, the creativity enhancement is greatest, and then it falls off slightly, but remains large and positive no matter how much extrinsic motivation is applied.

In that case, if many of the students have high base levels of extrinsic motivation, adding more extrinsic motivation would make their creativity lower than the optimum, though still higher than any level that purely intrinsic motivation could provide. The experiment would in fact be detecting a smaller but still positive effect on creativity at higher levels of extrinsic motivation, but Teresa Amabile would falsely interpret this as a negative effect.

Therefore the results of the experiment are perfectly compatible with extrinsic motivation being superior, in every case and in all possible amounts, to purely intrinsic motivation.
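
A minimal numerical sketch of this scenario follows. The curve, the constants, and the names `enhancement`, `creativity`, `base_level` and `added` are invented purely for illustration and are not taken from the study. The sketch only shows that a drop between the first and second poem is compatible with extrinsic motivation raising creativity at every level.

```python
# Hypothetical illustration only: the curve and all numbers are invented.

INTRINSIC_ONLY = 5.0  # assumed creativity with zero extrinsic motivation


def enhancement(extrinsic: float) -> float:
    """Assumed creativity gain from a given total extrinsic motivation.

    Rises rapidly up to an optimum at extrinsic == 1, then falls off
    slightly, approaching 8 from above and never reaching zero.
    """
    if extrinsic <= 1.0:
        return 10.0 * extrinsic
    return 8.0 + 2.0 / extrinsic


def creativity(extrinsic: float) -> float:
    """Total creativity: intrinsic-only level plus the assumed gain."""
    return INTRINSIC_ONLY + enhancement(extrinsic)


base_level = 1.0  # unmeasurable extrinsic motivation already present
added = 2.0       # extra extrinsic motivation applied by the experimenter

first_poem = creativity(base_level)
second_poem = creativity(base_level + added)

print(f"first poem:     {first_poem:.2f}")
print(f"second poem:    {second_poem:.2f}")
print(f"intrinsic only: {INTRINSIC_ONLY:.2f}")
```

Under these assumed numbers the second poem scores lower than the first, yet both score well above the intrinsic-only level. An observer who cannot see `base_level` would see only the drop.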

In summary: this experiment cannot resolve the issue of whether extrinsic motivation reduces or increases creativity, because it is impossible to measure the ratio of extrinsic to intrinsic base-level motivation. There is no way of allowing for our ignorance of that ratio, so this experiment (like every other experiment purporting to resolve the same issue) is fatally flawed. So we should really stop here. But let’s press on anyway.

Let’s pretend that all the students were intrinsically motivated to begin with. (This is a most unrealistic assumption, given the overwhelming degree of educational coercion that is normal in our society. Moreover, few genuine poets spontaneously write more than a few poems per year, so it is almost beyond belief that the base-level motivation that caused all these students to write two poems on the spot, on request, really was intrinsic. But let’s pretend that it was.)

Now can we interpret the results of the experiment as evidence that extrinsic motivation impairs creativity? Not at all. The results are still perfectly consistent with extrinsic motivation uniformly increasing creativity. For instance, it could be that after being extrinsically motivated, some of the students dash off a quick, mediocre poem just to be sure of completing one as they have agreed to, but then apply their increased creativity to increased daydreaming about an open-ended project they were doing anyway, but which they can’t be sure of completing within the short period of the experiment. Again, there is an insuperable problem of principle here: there is no way of measuring what proportion of a student’s creativity is being devoted to the prescribed task, or how that proportion depends on changes in his overall creativity. So again, the discussion should stop here.

But let me turn instead to the issue of controls, which in genuinely scientific experiments would involve a “blind trial” procedure. This is needed to protect against biases of the following type: an experimental subject’s behaviour can be strongly affected by his own expectations of what the outcome will be. These expectations can be influenced by the experimenters: if they are expecting him to perform well, or badly, at the given task, and these expectations are conveyed to him, then he may well conform to those expectations. So in this experiment, one problem would be that the experimenters are expecting the extrinsically motivated students to write worse, less creative poetry. How are they to prevent this expectation from getting through to the students and discouraging the extrinsically motivated ones? How are they going to apply extrinsic motivations in such a way that the student cannot detect and absorb their deep contempt for such a procedure? The only way of excluding that possibility is to have the experiment conducted by people who believe that extrinsic motivation increases creativity. But of course, the conduct of the experiment is only one way in which the expectation could be conveyed to the students. A far more likely way is that the students already have that expectation. I know I do. But in a truly blind trial, the subjects themselves must not know which of the two groups they are in, and in this experiment, it is inherently impossible to keep the students ignorant of what motivations they are being offered. Hence it is impossible to distinguish between having detected the purported effect, and merely having detected that the students expect the purported effect.

Furthermore, the students have a vested interest in the experiment providing “evidence” against extrinsic motivations. They are students, after all. Extrinsic motivations are the bane of their lives. If they have the slightest genuine interest left in their subjects, they yearn to follow those interests and not be continually manipulated into performing like circus animals. What precautions could ensure that these students gain no inkling of what the experiment is about, or what the socio-political implications of its possible outcomes are? I can’t think of any that would be effective. Certainly, if I were asked to create two works under the conditions of this experiment (and assuming that, for some reason that I currently can’t imagine, I agreed to participate), I would make damned sure that the “intrinsically” motivated work was markedly superior. (This would, by the way, be an extrinsic motivation—and who could detect its presence?) And if the students are too intimidated by academic authority to do this consciously, how could one measure what proportion of them did it unconsciously? One could not. This factor alone irretrievably biases the experiment.

Then there is the issue of measurement. How do you measure the quality of a piece of poetry and the creativity that went into it? Teresa Amabile asked the opinions of twelve “independent poets”. Let’s set aside the issue of whether their opinion provides a good measure of the quality or creativity in a poem. Suppose it does. Now, if the independent poets examine a poem and find that the creativity displayed in it is lower than that of another poem previously written by the same student, can we infer that the student’s poetic creativity has gone down? Of course not. It might have gone up. For example, the student might previously have been striving for something less. Now, with his additional, reward-induced creativity, he might be striving for something sublime—and hence failing, though a week after the experiment ends he will finally achieve it. Not every work of a creative person is of high quality, for creativity is invariably accompanied by risk, and the greater the creativity the greater the risk. Quite generally, we cannot measure how the creativity displayed in the poem is related to the creativity of the poet, and indeed there could easily be a negative correlation under the given conditions.

OK. I’ve only given a sample of the irreparable methodological flaws in this experiment—and that’s only one of several experiments in the reference. I won’t bother with the rest. But I will mention the thing that made me laugh out loud.

How do you think they applied “intrinsic” and “extrinsic” motivations to their two groups of students? Here is the method:

“Some students then were given a list of extrinsic (external) reasons for writing, such as impressing teachers, making money and getting into graduate school, and were asked to think about their own writing with respect to these reasons. Others were given a list of intrinsic reasons: the enjoyment of playing with words, satisfaction from self-expression, and so forth. A third group was not given any list. All were then asked to do more writing.”

In other words, the students were not actually given different motivations, but were merely asked to imagine them! This is rather like trying to measure the effects of post-traumatic stress disorder by asking people to imagine that they have been in an accident. Except that it is worse than that because, as I explained above, there was almost certainly a high degree of extrinsic motivation already present in addition to whatever slight additional intrinsic motivation might be caused by dwelling on one’s base-level intrinsic motivations.

And anyway, how can we measure whether dwelling on one’s base-level intrinsic/extrinsic motivations does indeed produce an additional amount of the same type of motivation, as this “study” assumes? Couldn’t it just as easily go the other way: when you dwell on how much you love writing poetry, then the anguish of the situation you are in as a creative writing student, of having to write poems to order, becomes more acute, and you feel the extrinsic spur more keenly than usual?

And so on.

All right, I’ll stop now.

* * *

While this “research” provides no evidence bearing on its conclusion, it does, as usual, contain evidence of the prior philosophical beliefs of its authors (which are, creditably, anti-behaviourist). The same is true of Ian Cruikshank’s posting: for instance, his pejorative use of the term “incentive”, and the socialist jargon “alienation” and “exploitation”, indicate a left-wing political stance and a commitment to the associated false economic theories.

One final comment. Appealing to “evidence” of this sort is not only invalid, it is usually intellectually dishonest as well. In this heyday of scientism, all sorts of experiments of this type are performed to back up every conceivable view of education, and people simply cite the ones that confirm their prior beliefs and ignore the rest. Hence they are asking other people to abandon their opinions in deference to a type of “evidence” which they themselves would (quite rightly) not pay a moment’s attention to if it had gone the other way.

Posted on 20th June 1997 1044:

I had written:

“I downloaded it. I read it carefully. Unsurprisingly, it contains no evidence whatsoever that bears on the issue of how creativity is affected by ‘rewards and incentives’, or anything else.”

Ian Cruikshank replied:

“You use the term ‘evidence’ strangely.”

I do not.

“There is no evidence for anything!”

False. There is, for instance, scientific evidence that smoking is harmful.

“Where is your double-blind proof that coercion is harmful? Nowhere.”

True.

“No proof would satisfy anyone”

No proof would, because these issues are not amenable to proof, any more than they are amenable to scientific evidence. However, there are other forms of rational argument that can take us closer to the truth. Philosophical reasoning is the general term for them, and we are now engaging in it.

“who didn’t already have a memetic tendency towards accepting your tilted propaganda.”

The idea that one only accepts arguments that one’s culture primes one to accept is known as the Myth of the Framework. It is false, and for a comprehensive refutation of it, and exploration of some of its dangers, I recommend Karl Popper’s book of the same name.

I had also said:

“I should also say, to avoid any misunderstanding, that I believe that the conclusions of these ‘studies’ are largely true.”

Ian replied:

“Right here, you prove the point just expressed above. Who cares about evidence, to paraphrase you.”

I agree. Very few people actually care about purported evidence of this type. Nor should they. However, very many people (such as you) cite it where it supports their beliefs. As I have said, this is intellectually dishonest, and I invite you to reconsider your practice of doing so.

I had said:

“One of the ‘studies’, by Teresa Amabile of Brandeis University, purported to show that creative work tends to be done better when it is done for intrinsic reasons rather than for external rewards. OK, this proposition is obvious, but how do you test it experimentally?”

Ian Cruikshank asked:

“How do you mean, ‘obvious’? Obviously true?”

Yes.

“It isn’t obvious. It’s a question you can prove best by asking yourself such things as, ‘Would I rather get up each day to be someone’s part-time slave for money, so I can then speak the language of the other peasants around the world, and get my own needs met? Or, would I rather do what I want to do, for no other reason than because I want to do it, with no one else having to give me sanction, nor me needing to ingratiate myself to anyone else?’ I.e., introspective evidence. The majority of this world’s people hate their jobs, and would not perform them were they not COERCED.”

OK. What you call “proof” and “evidence” here are more usually called “philosophical argument”. However, the argument you give is not the best one, for at least two reasons. First, it requires one already to believe that employees are in fact “part-time slaves for money”, which is false. Second, it is not available to the minority who love their jobs.

I had said:

“One final comment. Appealing to ‘evidence’ of this sort is not only invalid, it is usually intellectually dishonest as well. In this heyday of scientism, all sorts of experiments of this type are performed to back up every conceivable view of education, and people simply cite the ones that confirm their prior beliefs and ignore the rest. Hence they are asking other people to abandon their opinions in deference to a type of ‘evidence’ which they themselves would (quite rightly) not pay a moment’s attention to if it had gone the other way.”

Ian replied:

“‘Invalid’ is a term from aristotelian logic. As such, it means nothing to me.”

Yes.

“Long ago, it did. And long ago, it would have been appropriate. Meanwhile, Aristotle died thousands of years ago (who cares how many), and his conclusions mean absolutely nothing to me here-now.”

Therefore you have not abandoned the notion of ‘invalid’. You have simply changed the criterion of validity from the usual one (namely, that the conclusions necessarily follow from the premises) to a new one that you prefer (namely that the original proposer of the criterion should still be alive). Well, one problem with your proposed criterion is that once you adopt it, your conclusions will tend not to follow from your premises.

“Intellectual honesty, a.k.a. ‘integrity’ is another of those terms from aristotle’s ‘heyday’. What does it mean exactly? That one shouldn’t espouse an idea unless you believe in it?”

Among other things, yes.

“I have trouble living with such ‘integrity’.”

Yes. That’s what I was pointing out.

“My mind races in all directions around a variety of ideas, and I express them in a stream-of-consciousness. I advise you all, do not think someone believes something just because they espouse it.”

Good advice. You are criticising me for following it.

“People change constantly and they say one thing in one context and another in a different context.”

Not all people.

“Where is this elusive ‘integrity’ which people seek?”

Here.

“I’m not in possession of it.”

So you said.

See also:

David Deutsch, 1997, ‘Experimental evidence’, https://takingchildrenseriously.com/experimental-evidence