“Religious Children Are Meaner than Their Secular Counterparts,” proclaimed a headline in the Guardian. “Religious Kids Are Jerks,” raved the Daily Beast. Hundreds of other newspapers and blogs touted similar articles: The Economist, Forbes, Good Housekeeping, the LA Times, The Independent. All these articles were based on a four-and-a-half-page research note in Current Biology by University of Chicago professor Jean Decety and six other scholars. But what is the evidence behind these claims? Does it match previous research? Is it worth all the hype?
My analysis of the article demonstrates that the project was poorly constructed and the data analysis sloppy. The authors do virtually nothing to test alternative explanations or mitigate the flaws in their research design. The result contradicts the vast majority of other research on the topic. The authors extrapolate well beyond what the data show, and reporters extrapolate beyond even what the authors claim. However, making these problems clear to readers who are not trained in statistics or familiar with the previous research on the topic takes some space.
The authors ran an experiment with a non-random sample of 1,170 children in six countries (Canada, China, Jordan, Turkey, USA, and South Africa). In the main experiment, the authors gave each child 30 stickers and allowed them to pick the 10 they wanted to keep. Researchers then told the child that they lacked time to run the experiment with other children, but if the child gave up some of their 10 chosen stickers, the researchers would give those stickers to another child. The researchers then counted the number of stickers each student gave back as a measure of how “altruistic” the children were. The researchers then interviewed a parent of each child and asked the parent a couple questions about the parent’s religion. The researchers decided whether or not they considered the parent religious, and then applied their religious designation to the child. The children (five to 12 years old) were not asked about their own religiousness. The researchers then compared the number of stickers given away by “religious” and “non-religious” children and found that on average “non-religious” children gave away more stickers (or actually 86 percent of a sticker more). Yes, the global media campaign is about a fraction of a sticker. Despite the huge diversity of people in their cross-national sample (e.g., Canada and Jordan) and the many factors that influence the generosity of children in such diverse contexts (e.g., poverty), the researchers assumed that the only difference between the “religious” and “non-religious” children was being religious.1 This is a big assumption.
In the second experiment the researchers showed the children a series of scenarios involving one child pushing another and other types of “interpersonal harm.” The religious students judged the behaviors as more “mean” than the non-religious children. Muslim children recommended a harsher punishment for the bad behavior than non-religious children. The punishments Christian children recommended were indistinguishable from non-religious children.
However, in the conclusion of the article and in most media reports, the various authors claim that “religious” children were “meaner,” “harsher,” or “more vindictive” without qualifying that only Muslim children were (if we assume the problematic sample applies to Muslim children in general). The researchers did not interpret the religious children’s concern for people who were pushed or hurt by another child as a sign of altruism, but as vindictiveness. The authors do not adjust their evaluations of the severity of the punishments to account for the children’s interpretation of the severity of the offenses: if you don’t think pushing someone is bad, you obviously won’t want a strong punishment for it. Nor do they interpret the Christian children as merciful for judging harm to another student as meaner than non-religious students did while still calling for punishments as mild as those the non-religious students recommended.
In the third experiment, researchers asked parents how empathetic their child is and how sensitive to injustice. Religious parents rated their children as more empathetic and sensitive to injustice than non-religious parents. The researchers interpret this as parental blindness—assuming the sticker experiment better captures the empathy and sensitivity of the children than either the child’s concern for children who are shoved, or a lifetime of parental experience. Alternatively, the researchers could have asked a teacher or other students about the empathy and sensitivity of the children (as outside corroboration), but they did not.
In both the first and third experiments, a difference between religious and non-religious children remains after controlling for age, country, and a rough measure of mothers’ education.2 The effect of religion on sticker-giving presumably becomes smaller with these controls, but it is hard to determine how much smaller because the authors switch to standardized coefficients without providing standard deviations. Thus, we do not know what “a standard deviation change in religious identity has a -.15 standard deviation change in sticker-giving” means in terms of stickers and cannot translate the coefficients back into an understandable unit. Before adding the controls, religious children gave 86 percent of a sticker less; maybe after controls religious children now give 20 percent of a sticker less, but in either case it is a small amount.
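To see why the missing standard deviations matter, recall that a standardized coefficient converts back to raw units as b = beta × (SD of the outcome / SD of the predictor). The short sketch below applies that conversion to the reported -.15. Every standard deviation in it is a hypothetical placeholder, not a value from the study; the function name is also my own. The only point is that the implied number of stickers depends entirely on figures the article does not report.

```python
def unstandardize(beta, sd_outcome, sd_predictor):
    """Convert a standardized coefficient to raw units: b = beta * SD_y / SD_x."""
    if sd_predictor <= 0:
        raise ValueError("sd_predictor must be positive")
    return beta * sd_outcome / sd_predictor


BETA = -0.15  # standardized coefficient quoted in the text

# HYPOTHETICAL (SD of stickers given, SD of religiosity) pairs.
# The study reports neither, so these are placeholders for illustration only.
hypothetical_sds = [(1.5, 1.0), (2.5, 1.0), (2.5, 0.5)]

for sd_y, sd_x in hypothetical_sds:
    b = unstandardize(BETA, sd_y, sd_x)
    print(f"SD_y={sd_y}, SD_x={sd_x}: {b:+.3f} stickers per unit of religiosity")
```

Under these made-up inputs the same -.15 corresponds to quite different raw amounts, which is why the coefficient cannot be read as a number of stickers without the standard deviations.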
In both the conclusion and in the popular articles, the authors and reporters only refer to differences between religious and non-religious people without any controls, and none of the articles I read clarified the small differences we are talking about—less than one sticker. The authors do not state whether, in the second experiment, the difference in “vindictiveness” between “religious” and “non-religious” children remains significant after controlling for age, country, and mother’s education—so presumably it does not.
The researchers interpreted these three experiments as indicating that religious people think they are more helpful, when in fact they are actually “less helpful” and “more punitive.” In interviews with reporters, Decety explains that if people think they are more moral, they give themselves permission to be more immoral. Thus, thinking you are moral is detrimental. Decety also claims his research shows that secularization is good: “…secularization of moral discourse does not reduce human kindness. In fact, it does just the opposite.” Both claims are rather broad and not well supported by the data. We do not know if the children think they are more moral, only that their parents think they are more empathetic and sensitive to injustice. We do not know why the religious children gave away fewer stickers (or even if the association is causal), let alone that they acted less “morally” because they think they are more moral. Nor did the researchers do any investigation about the effect of secularization on kindness.
Popular articles extrapolate even further. An article in The Mirror claims that “Children of atheists are kinder and more tolerant”—although it is unlikely that the 28 percent of parents in the sample coded as “non-religious” are all atheists. An article in Forbes claims the research demonstrates that religious people are “less moral” and that “History backs-up the scientific evidence that secular people are more moral.” I guess a fraction of a sticker outweighs Hitler and Stalin, but who’s counting?
Most popular press accounts assumed the association is causal. None of the dozens of popular press accounts I read (other than that of Science Magazine) mentioned any of the previous research on the topic or interviewed a scholar who had a different point of view. All popular accounts I read (other than two private blog posts) were laudatory, often to the point of breathlessness. Clearly the research said something that a lot of reporters wanted to hear, and spread. And clearly almost none of them were willing to do a simple Google search to see what previous research said, or interview any of the dozens of scholars who specialize in this area.
So, how do we evaluate whether this research is worth taking seriously? Here are six questions to guide our evaluation.
1) Does the research adequately deal with and explain previous literature? No.
There are dozens of articles and books about the relationship between religion and altruism. The vast majority of this research shows that religious people are more altruistic than non-religious people. Much of this literature is based on self-report, but some is based on unobtrusive observation. Much of this research also comes from high-quality, random samples. However, there is some complexity in the evidence about the relationship between religion and altruism, so we need to look at the evidence by type.
First, as both the article and most media reports affirm, there is a widespread popular belief that religious people are more helpful. However, Decety and colleagues dismiss this evidence out of hand. This implies that most people are stupid. If religious people were in fact significantly less generous than secular people, the popular perception that the reverse is true would be hard to sustain—especially for people who interact with them regularly (as opposed to academics and reporters who typically do not).
Second, survey research consistently finds that religious people give more time and money to both religious and non-religious causes, both in formal and informal settings. Most of this evidence is based on self-report. Therefore, Decety and colleagues suggest that the association is caused entirely by social desirability bias—that is, highly religious people exaggerating how helpful they are more than non-religious people exaggerating how helpful they are. Some social desirability bias is plausible, but neither Decety and colleagues, nor the one article on the topic they cite, give any concrete evidence that the association between religion and self-reported helping behavior is caused entirely by social desirability bias. They assume it is. This is a strong assumption. Some of the survey-based research on altruism even attempts to measure and control for social desirability bias—yet still finds an association between religion and helping behavior. The type of religion people follow and their motivation for being religious also predict helping behavior. Clearly, if all these associations are completely caused by social desirability bias, survey research of all kinds is in deep trouble. It is hard to think of an interesting research project in which some of the responses are not more socially desirable to some respondents. Do the authors assume all survey research is pointless, or just the results they don’t like?
Third, laboratory studies based on games typically find either no relationship between religiosity and giving, or a weak positive relationship. They also find that religious cues increase giving behavior. Generally these studies are done with college students, often students from psychology or economics classes, and often in Europe. Little is known about whether or not behavior in these experimental games matches people’s altruistic behavior in real life, or if undergraduate psychology majors behave similarly to other people. Game situations may alter behavior—for example, we all know people who love violent video games and happily kill people on screen, but who are not unusually violent in real life. Moreover, since these types of games are used so often in psychology classes, it is unclear whether or not students have read about them before and know the purpose of the game while they are playing it. Even if we assume that games played in a laboratory perfectly capture how everyone acts in the real world (which I do not), laboratory-based experiments do not suggest a negative relationship between religion and altruism, just a neutral or weak positive relationship.
Finally, and most convincing to me, unobtrusive observation of real-life behavior suggests a positive relationship between religion and helping behavior both at the societal level and the individual level. This research also suggests that Christians, particularly Protestants, are more likely to be involved in institutional helping behavior. For example, in Japan virtually all the voluntary work with homeless people is done by religious organizations, the vast majority of which are Christian, despite the fact that Christians are a tiny minority in the country. Similarly, in countries like the United States, the vast majority of voluntary humanitarian organizations, private schools and so on were set up by religious groups/people. This would be unlikely if religious people were less generous with their time and money than the non-religious.
We see a similar pattern on an individual level. For example, when academics conduct surveys, they often ask interviewers to evaluate how friendly and cooperative the respondents were. As part of my master’s thesis, I analyzed every survey I could find that collected this type of information. I found that interviewers rated highly religious people as being significantly more helpful and cooperative than non-religious people, and that those who had to be convinced to participate in the survey in a follow-up attempt were significantly less religious than those who agreed to participate from the beginning. This suggests that in ordinary life, religious people are more generous with their time than non-religious people. Moreover, because survey respondents are contacted in isolation, religious people’s greater helpfulness in this realm is not caused by their relational networks, greater social pressure, or higher likelihood of being asked to volunteer and give money compared to non-religious people.
Thus, most non-laboratory research suggests a strong positive association between religion and helping behavior, and game-based laboratory research suggests a neutral or weakly positive association between religion and helping behavior, and a positive effect of religious cues. No line of research suggests a negative relationship between religion and helping behavior. The research by Decety and colleagues is clearly an outlier, and if reporters had cared to interview anyone who does research in this area, these scholars likely would have told them so.
2) Is the article in an appropriate, peer-reviewed journal, where scholars are likely to have been able to catch the major flaws? No.
The article is published in a biology journal (Current Biology), despite the fact that the article does not focus on anything biological, and none of the authors are biologists. This seems odd. Perhaps publishing the article in a biology journal allowed them to avoid getting reviewers who know the literature on religion and altruism, and who would likely force the authors to do a better job: for example, measure religion well, add sufficient controls for plausible alternative explanations, and at least deal with the previous literature on the topic. Basing an article primarily on t-tests from a non-random, heterogeneous sample may be acceptable in biology, but I haven’t seen a peer-reviewed statistical article like this published in a reputable social science journal since the advent of personal computers (after which scholars did not have to calculate statistics by hand).3
3) Do the authors use a representative sample of the groups they are studying? No.
Both the academic article, and the popular articles based on it, talk about religious children and non-religious children in general, but the authors did not sample children in a way that allows them to generalize to religious and non-religious children. Given the serious problems with the sample, we do not know who the results generalize to.
The authors picked six countries non-randomly (Canada, China, Jordan, Turkey, USA, and South Africa), picked one or two cities from each of these countries non-randomly, and then recruited respondents non-randomly. Nothing about the sample is random, thus the results cannot be generalized to any group, not even to religious and non-religious children in the seven sampled cities. Moreover, there are many ways this sampling method is likely to bias results towards religious children appearing less altruistic. For example, if you recruit a religious child from a South African slum and a non-religious child from the family of a University of Toronto professor, you are likely to find some differences between the children that have nothing to do with religion.
Because the sample is not random, all generalizations from their sample and all significance tests using their sample are meaningless.4 Any first-year statistics textbook will tell you this. When research based on random samples exists, we should always privilege results from random samples over non-random convenience samples. And random samples consistently suggest a positive association between religion and altruism.
4) Do the authors do sufficient work to demonstrate that the relationship between religion and giving behavior is causal? No.
Even in good samples, correlation does not prove causation. But with a badly biased sample, even more effort is required to demonstrate that a correlation is plausibly causal. Unfortunately, the authors do not even go to the effort I would require in an undergraduate statistics class.
Of course, demonstrating causality is difficult. The authors cannot randomly assign religious background to children and then see if religion causes differences in altruistic behavior. Thus, social scientists typically try to account for as many alternative explanations as possible, to demonstrate that the association between religion and giving behavior is not caused by something else. Past research on altruism demonstrates that many factors predict giving behavior, but the authors do not control for them. If any of these omitted factors is correlated both with religiosity and with the giving behavior of children, or is correlated with which religious and non-religious people are sampled, then the relationship between religion and giving in the authors’ analysis will be biased.
For example, both wealth and trust can influence giving. If children from wealthier backgrounds have more access to stickers than children from poor backgrounds, this makes stickers less valuable to wealthy children than poor children, on average. Giving stickers to other children is less costly for wealthy children than for poor children. Similarly, in contexts of high trust, low corruption, and low violence, people generally trust “the system” more. A child from a high-trust context may trust an unknown researcher to give the sticker-gift to another child more than children from low-trust environments. If, in the sample, “non-religious” children disproportionately come from wealthy, privileged families and live in high-trust environments relative to the religious children, this will create a spurious negative association between religion and giving. But religion is not reducing giving; poverty and distrust are.
Problematically, it seems likely that the authors coded many more Canadians as “non-religious,” and more Jordanians and South Africans as “religious.” But Canadian children are also typically wealthier and trust strangers more than Jordanian and South African children. Similarly, if we think about the university contexts where the samples were taken, it seems likely that the authors sampled wealthier, high-status “non-religious people” and poorer, lower-status “religious people.” I discuss this problem more in the next section.
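The wealth argument above can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reanalysis of the study's data: the share of religious families in each wealth group, the size of the wealth effect, and the small positive effect of religion are all numbers I made up, and "stickers given" is treated as a continuous quantity for simplicity.

```python
import random
from statistics import mean


def simulate(n=20000, seed=0):
    """Simulate children whose giving depends on wealth and, slightly and positively, on religion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        # ASSUMPTION: religious families are over-represented among the poor.
        religious = rng.random() < (0.3 if wealthy else 0.8)
        # ASSUMPTION: wealth adds 1.5 stickers, religion adds 0.2, plus random noise.
        given = 3.0 + 1.5 * wealthy + 0.2 * religious + rng.gauss(0, 1)
        rows.append((wealthy, religious, given))
    return rows


def gap(rows):
    """Mean stickers given by religious children minus non-religious children."""
    rel = [g for _, r, g in rows if r]
    non = [g for _, r, g in rows if not r]
    return mean(rel) - mean(non)


rows = simulate()
raw_gap = gap(rows)                                   # no controls at all
gap_poor = gap([row for row in rows if not row[0]])   # compare within poor children
gap_wealthy = gap([row for row in rows if row[0]])    # compare within wealthy children

print(f"raw gap (religious minus non-religious): {raw_gap:+.2f}")
print(f"gap among poor children:                 {gap_poor:+.2f}")
print(f"gap among wealthy children:              {gap_wealthy:+.2f}")
```

In this invented world religion increases giving, yet the uncontrolled comparison shows religious children giving less, because the religious group is poorer on average. Comparing children within the same wealth group recovers the positive gap.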
5) Is the statistical analysis rigorous and appropriate? Is it plausible that a different religious upbringing is the only thing that makes stickers more valuable to some of the children than other children? No.
I cannot remember the last time I saw a published statistical research article in a peer-reviewed social science journal, using a heterogeneous sample, that was based primarily on t-tests (which assume the only relevant difference between the religious and non-religious children in the sample is their religion). If we compare the sticker-giving of poor Christian children from a South African slum and of a wealthy non-religious child in a Toronto suburb, is it plausible to think the only difference between them is their religion? No. But both the authors and journalists focus on the comparison between the religious and the non-religious without any controls (this comparison assumes the two groups are identical in every other way). Because there are more non-religious people in Canada than South Africa or Jordan, and wealth probably influences how valuable stickers are to children, carefully controlling for country and socio-economic status (SES) is crucial. The authors do some of this, but in a weak and misleading way.
The authors back up only two of their t-tests with OLS regressions. In these regressions, they only control for age, country, and what they misleadingly label “SES.” However, the only measure of SES they use is a rough measure of mother’s education (simplified to six categories), and they never mention this in the text. I had to search their supplemental material to find their measure of “SES.”5 But is control for mother’s education (in six categories) sufficient to equalize the socio-economic status of all children? I don’t think so. That implies, for example, that every child whose mother has a high school degree has the same access to resources as every other child whose mother has a high school degree, regardless of income, wealth, father’s education, race, parental marital status, etc. I doubt the authors think mother’s education fully accounts for SES either, or they would not have hidden the measure in an online appendix. It takes six words to say, “We measured SES using mother’s education.” It takes one word to transparently label a coefficient as “Mother’s Education” rather than “SES.” Not a lot. But either admission would have raised red flags.
Think of it this way: if you are wealthy and go to a well-financed school, you may have hundreds of stickers at home and get more regularly. Thus, stickers are not particularly valuable. It is easier to give stickers away because you can easily get more. Alternatively, if you come from a poor single-parent family and attend a poorly financed school, you may rarely get stickers. This makes stickers much more valuable to you and makes giving them away harder. If two children have an equal amount of altruism, on average, the child who has easy access to stickers is likely to give away more stickers than the identical child who has little access to stickers.
Now think about how this might work in the sample we are discussing. Presumably the University of Chicago team recruited people close to the university. Imagine they recruited two eight-year-olds, one named Gwyneth and the other Kanisha. Gwyneth is European-American and attends the Laboratory School, an elite private school with lots of resources. Her father is a physics professor at the University of Chicago and makes a large salary. Her mom earned a B.A. from Harvard, and works at the Chicago Art Museum. Both parents came from wealthy, well-educated families and are not religious.
Kanisha lives 10 blocks away from Gwyneth, but in a government housing project in a South Side slum. Kanisha is African-American and attends a struggling public school with few resources. Her mom is a single parent, who attended a local community college in the evenings and recently graduated with a degree in social work, but still works as a waitress at Denny’s and is struggling financially. Kanisha and her mom attend a local AME church every week.
In the regression the authors published, they assume Gwyneth and Kanisha are identical—that is, that the only relevant difference between them is their religious identity. Both children are eight years old, both live in the United States, and both have a mother with a B.A.—thus the authors assume the children have identical socio-economic status, that stickers are equally valuable to both of them, and that the only cause of differences in how many stickers they give away is their religion. But it is hard to believe that stickers are equally plentiful in both their homes and both their schools. It is also unlikely both are equally trusting that “the system” will work fairly for them, or that an unknown stranger will actually give the stickers to another child. Even if Kanisha’s religiosity increases her generosity relative to other similar children, this increase may be insufficient to overcome the differences in wealth and generalized trust between her and Gwyneth.
To check the authors’ results, and see if poor planning prevented the authors from measuring any other aspect of SES (because they did not collect that data), I asked the authors if I could have a copy of their questionnaire and replication data. So far they have not responded. Sometimes scholars keep data private for a while so that they can finish more publications from it before others have access. However, the demographic questions asked in a questionnaire are typically freely shared.
6) Is religion carefully measured and are religious groups carefully distinguished? No.
Over 60 percent of the religious people in the sample are Muslims. This means that in most of the t-tests and all of the regressions, Muslims disproportionately drive the results. If Muslims are different from other religious groups, or if the Muslims in the sample are disproportionately from poor or distrustful communities, this would bias the results the authors attributed to all religious children. But are Muslims identical to all other religious groups? In the t-tests the authors show us, the difference between Muslims and Christians is statistically significant 50 percent of the time. So why do they lump them as one group in all the regressions and all their conclusions? When results for Muslims and Christians differ, why do they treat the pattern for Muslims as representing all religious people?
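A little arithmetic shows how a single large group can drive a pooled result. In the sketch below, only the "over 60 percent Muslim" share comes from the text; the group means, the two-group simplification, and the non-religious mean are hypothetical numbers chosen for illustration.

```python
def pooled_mean(groups):
    """Weighted mean of group means; each weight is the group's share of the pooled sample."""
    total_share = sum(share for share, _ in groups.values())
    return sum(share * group_mean for share, group_mean in groups.values()) / total_share


# HYPOTHETICAL (share of religious sample, mean stickers shared) per group.
# The real sample includes other religious groups; two are used here for simplicity.
religious = {
    "Muslim": (0.62, 3.0),
    "Christian": (0.38, 4.0),
}
NON_RELIGIOUS_MEAN = 4.0  # hypothetical

pooled = pooled_mean(religious)
christian_mean = religious["Christian"][1]

print(f"pooled religious mean:            {pooled:.2f}")
print(f"pooled gap vs. non-religious:     {pooled - NON_RELIGIOUS_MEAN:+.2f}")
print(f"Christian gap vs. non-religious:  {christian_mean - NON_RELIGIOUS_MEAN:+.2f}")
```

With these made-up inputs the pooled "religious" group appears less generous than the non-religious group even though the hypothetical Christian children give exactly as much as the non-religious children. A pooled comparison cannot tell us whether a pattern holds for each religious group.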
The authors use two measures of religiosity: “How often do you attend religious services?” and “How often do you experience the ‘divine’ in your everyday life?” They then merge these into a single variable. The measure of frequency of attendance is likely biased toward Muslims. Because Muslims are expected to pray five times a day, every day, attending religious services more than once a week is more common among Muslims than Christians. The frequency of divine experience is likely biased toward Pentecostals. Thus, Muslims and Pentecostals will tend to cluster at the high end of the religiosity variable. If these groups are not typical of other religious groups in their generosity, socio-economic status, trust, or other factors that influence giving, the authors’ analyses will be biased if applied to religious people as a whole.
We never learn if either being religious or religiosity predicts lower generosity in all six countries in their sample and for both Muslims and Christians. In the article religion and religiosity are, for the most part, assumed to be one thing and assumed to work the same everywhere—as if the type of religion and the context of religion do not matter. Given the major sample problems, it would increase the plausibility of their results if the religious/low-giving association were consistent regardless of country and regardless of religious tradition.
Decety and colleagues ask an important and interesting question. However, they use a problematic non-random sample and inappropriate statistical techniques to analyze it. The sample is biased in a way that seems likely to make non-religious children seem more generous. The authors do almost nothing to account for alternative explanations. They lump religious groups together in a way that even their limited analysis suggests is inappropriate. They generalize to “religious” and “non-religious” children in a way that their sample does not allow. They make claims well beyond those supported by their analysis, and reporters make claims well beyond even those made by the study’s authors. The authors have not (yet) made either their data or questionnaire available to other scholars to allow them to check their results. The article is published in a biology journal in which the editor and reviewers are unlikely to be familiar with either the previous research on the topic, or the research standards required to make generalizable causal claims from sampled human populations. Both authors and journalists almost completely ignore the previous research on the topic, virtually all of which directly contradicts the study’s conclusions.
The fact that this study was published in a peer-reviewed journal and was so widely cited in the popular press—almost universally without interviewing or citing anyone else who has researched this topic or has a different point of view—is troubling. Although Decety claims that secularization makes people kinder and more moral, his research project does little to determine whether or not his belief is true.
Robert D. Woodberry is Associate Professor of Political Science at the National University of Singapore. This article originally appeared at Cornerstone, the blog of the Religious Freedom Project at Georgetown University’s Berkley Center for Religion, Peace, and World Affairs. It is reprinted with permission.
1. In some analyses the researchers also statistically controlled for age, country, and a rough measure of the education of the child’s mother. I will discuss the adequacy of their controls later.
2. After controls we are comparing people at the same levels of the control variables: in this case comparing children at the same age, from the same country, and whose mothers have the same level of education.
3. Laboratory-based studies of college students published in psychology journals sometimes use simplified analyses like this (which is still highly problematic), but they have much less heterogeneous samples. These articles still assume the only relevant difference between religious and non-religious college students is their religion (e.g., that they have identical social backgrounds and social contexts).
4. Significance tests tell us whether or not the difference we find between two groups in a sample is larger than we would expect to find by chance if there is no difference between the two groups in the population we drew the sample from (at a given probability level). Calculating these probabilities requires a random sample. A statistically significant result does not necessarily mean the result is large, important, or causal; it merely means the two groups are unlikely to be identical in the population.
5. For readers who are statistically trained, the authors’ models even violate the assumptions of OLS regression. The number of stickers children give away is a count variable, thus Poisson or negative binomial regression is appropriate, not OLS regression.