  • Do your choices matter? The challenges of omitted variables, selection effects, and more in social science.
  • Assumptions researchers hold about choices and consequences play an important role in their conclusions.

We co-authored a report put out by the National Marriage Project called “Before ‘I Do,’” in which we look at how common experiences before marriage are associated with marital quality. Along with the media attention the report received have come reasonable questions from other researchers. Here, we focus on issues related to omitted variables, selection effects, and how “included” variables can obscure findings. Wherever else researchers may differ in their viewpoints, most would agree that assumptions and practices about such issues affect interpretations and conclusions.

In a couple of pieces we wrote (here, and in a longer, more detailed discussion here), we noted our belief that social scientists sometimes emphasize selection effects to such an extent that there is little room left to conclude that personal behavior has a causal role in outcomes. Since an economist suggested that we were unclear about, or perhaps misapplied, some terms related to the concepts of selection effects and omitted variables, we’ll first describe explicitly what we mean by these terms in this piece before getting to our main points.

Selection bias broadly refers to situations where a sample has characteristics that impair your ability to draw the correct conclusion from an analysis. For instance, if you want to analyze the determinants of later first marriages, but the sample you have only includes people aged 21 and over, the sample is selected for those who do not marry young, and that could bias your conclusions.

Selection effects are variables that explain some of the association between an experience (e.g., cohabiting with more than one partner before marriage) and an outcome (e.g., marital quality), suggesting that the association between the experience and outcome is partly due to prior differences between those who do and do not have the experience. Such effects usually reduce the degree to which researchers will conclude that an experience may play a causal role in the outcome.

Omitted variable bias occurs when an analysis does not include one or more variables that are causally related to an outcome variable and also related to another variable in the analysis, which leads one to over- or under-estimate the association between that other variable and the outcome.
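To make the omitted variable problem concrete, here is a minimal simulation sketch. It is not drawn from the “Before ‘I Do’” data. The variable names, effect sizes, and sample size are all hypothetical assumptions chosen for illustration: a background factor `z` drives both an experience `x` and an outcome `y`, while `x` itself has no causal effect on `y`.

```python
import random
import statistics

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """What is left of y after removing its linear association with x."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000               # hypothetical sample size

# Hypothetical data-generating process (an assumption, not real data):
z = [rng.gauss(0, 1) for _ in range(n)]       # background factor
x = [zi + rng.gauss(0, 1) for zi in z]        # experience, partly driven by z
y = [2.0 * zi + rng.gauss(0, 1) for zi in z]  # outcome, caused by z only

# Model that omits z: x appears to predict y (slope near 1.0).
naive = slope(x, y)

# Model that includes z: partial z out of both x and y, then regress
# residual on residual (the Frisch-Waugh-Lovell result). Slope near 0.
adjusted = slope(residuals(z, x), residuals(z, y))

print(f"slope for x, z omitted:  {naive:.2f}")
print(f"slope for x, z included: {adjusted:.2f}")
```

In this constructed case the association between `x` and `y` is entirely due to the omitted factor. Real data rarely permit such a clean verdict, because the researcher does not know the true data-generating process.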

In our experience, discussions about selection effects are tilted toward the issue of who is at risk and why, and discussions about omitted variables are tilted toward whether or not an accurate picture of what causes an outcome is represented in an analysis. Lastly, when we use the term consequential here, we mean that an experience or personal behavior plays a causal role.

Omitted Variables

One of the findings we presented in the “Before ‘I Do’” report was that having a greater number of wedding attendees was associated with higher ratings of marital quality. While many of our findings replicated prior and similar findings, this finding, as far as we know, is new to the field. The association provides a clear example of the omitted variable problem—a problem that is widespread in social science. In many cases, researchers do not have access to some variables they may suspect to be important; and in more cases than social scientists might like to admit, they don’t even know what they don’t know about other variables that matter.

On to weddings. We cannot prove what drives the association we found. In fact, we believe it likely is explained by a mix of factors such as the following (this is not an exhaustive list):

  1. Wealth of parents: While we controlled for individuals’ income and education, we have no variables about parental wealth or contributions to wedding budgets.
  2. Social capital: Friends and family of a couple.
  3. Preferences for how many people one wants to have attend their wedding.
  4. The power that public rituals have, in general, for fostering follow-through.

All of these factors, and more, reflect sound interpretations that could be investigated in future research. Numbers 2 and 4 have causal implications related to personal choices. First, choosing to have a ceremony with more friends and family could increase support around a marriage going forward. Second, findings in social psychology support the idea that public commitments in front of many witnesses can strengthen one’s resolve to follow through.

This example makes clear how great the challenges are in inferring what drives a finding. We discussed numerous alternative explanations for this finding, such as parental wealth, in our report, but we also discussed how choosing to have more witnesses for a wedding, if one were able, might have a consequential impact. For example, we took the opportunity to highlight the importance of social connection beyond weddings, noting that “maintaining important friendships and family connections, making new friends together, and getting involved in the community may enhance a couple’s relationship in multiple ways,” just as recommended by Paul Amato and colleagues in their book Alone Together (2007).

There are important differences in disciplines such as psychology, sociology, and economics on the subjects of consequential behavior, personal choices, and impacts. Scientists from all disciplines suggest interpretations and possible takeaways based on the assumptions specific to their fields. While we believe each discipline makes valid arguments, criticisms often result from the interdisciplinary differences that start in the assumptions.

While most social scientists believe that people make consequential choices, not all scientists of human behavior believe this.[i] Too often, in our view, scientists sound as if they do not believe a given person can alter his or her outcomes. Sometimes, this seems to us to be motivated by a desire to accurately, and compassionately, address the potent role contexts play in human behavior, especially when that context involves substantial disadvantages. But we believe it is also both accurate and compassionate to empower people to make choices, where possible, that can help them reach their goals. In this regard, we believe reality is both-and, not either-or.

Which Door into the Room? The Problem of Included Variables

As psychologists, we operate in a “choices have consequences” paradigm; thus, we are often looking out for information that can help individuals to reach their own goals. Furthermore, we believe that the problem of “included” variables can be as great a threat to understanding how experiences impact outcomes as the problem of omitted variables. We’ll use the metaphor of a room and doorways to illustrate. Consider each of the following experiences that we describe as being associated with marital quality in the “Before ‘I Do’” report (this is not the entire list):

A. Having had more sexual partners before marriage
B. Having had more cohabiting partners before marriage
C. Having a child with a prior partner before marriage (for women)
D. Having your marriage start with a hook-up (self-defined)
E. Having cohabited before mutually clarifying a specific intention to marry
F. Having a child together before marrying (for college graduates)
G. If cohabited beforehand: sliding into it more than talking about it and making a decision
H. Considering oneself more committed than one’s partner before marriage
I. Having more witnesses at the wedding

Everything on this list (except for the last item) has historical, empirical precedent for being associated with relationship and/or life outcomes. Further, for each item on the list, there is substantial logic and theory to suggest how the experience can impact outcomes.

In the report, we present analyses of these predictors of marital quality while controlling for education, income, race, and religiousness. We could control for a greater number of variables if we chose to do so, but we used what we think is a respectable list of core control variables. Researchers can always choose to enter more variables in their models, if they have them. And this is often tempting to do because scientists of all stripes tend to believe that ever better measurement will result in better knowledge. But including more variables can often compromise what one concludes as well.

Standard practice in fields such as sociology and economics is to present large regression models where many variables are entered in analyses. We have deep concerns about this general practice. Consider again the list of variables above, particularly A through G. Think about how interrelated these experiences are. A riskier pathway before marriage may include any or all of these, in various sequences, but all can be considered doorways into the same room—if you will, in the present context, an antechamber to marriage.

One might enter this room by starting their relationship by hooking up. One might enter this room by starting to cohabit without talking together or making a decision about it. One might enter this room by having an unplanned baby, whether with a prior partner or with the person one marries. Further, a similar set of selection effects is associated with the odds of walking through any of these doors. We do believe that passing through this room can be consequential for outcomes.

The figure below shows an example of a multivariate regression model with several of the variables discussed in the “Before ‘I Do’” report entered in one model (with the same controls we used and with variables related to children before marriage—from prior partners or with the spouse—combined into one variable to simplify the model). Children before marriage, the number of sexual partners before marriage, and the number of people attending the wedding all remain statistically significant when included in one model. Now, we know some researchers will ask, “what happens if you include all the doors and many more potential controls for selection in one model?” This is a common question, but it presents a serious dilemma in this context because of how potentially consequential behaviors, such as A through G on the list above, are interrelated.

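[Figure: multivariate regression model of marital quality with several of the “Before ‘I Do’” variables entered together]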

As social scientists understand, when you use regression models that include a number of variables that are conceptually (and mathematically) associated with each other, some of those variables may no longer be significantly associated with the dependent variable. And while it can seem a less serious matter to add more control variables, that also has consequences because of impacts on statistical power and how some control variables may be proxies for predictors.

An important part of what researchers do is make decisions about what variables to keep or exclude. This may sound easy, but it isn’t, and conclusions can be affected. Overly inclusive regression models can lead researchers to imply to consumers of research that behaviors that may be consequential do not even matter: “just avoid that door, you’ll be fine.”

As an example, in our data set, having more sexual partners before marriage is moderately associated with whether or not one cohabited with anyone in addition to the eventual spouse. Entered separately, these variables are both statistically significant predictors of marital quality (net of the controls), but neither is significant when both are entered together. Does that mean neither is associated with marital quality? Of course not. Furthermore, while one could elect to make one variable that represents the entire room, and no single door, that is also unsatisfying because we believe that different people are likely to go through different doors.
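The arithmetic behind this pattern can be sketched with the standard formulas for standardized regression coefficients. The correlations and sample size below are hypothetical values chosen for illustration, not figures from our data set, and the function names are likewise our own for this sketch.

```python
import math

def t_simple(r_xy, n):
    """t-statistic for one predictor entered alone (simple regression)."""
    return r_xy * math.sqrt(n - 2) / math.sqrt(1 - r_xy ** 2)

def t_joint(r1y, r2y, r12, n):
    """t-statistics for two standardized predictors entered together."""
    denom = 1 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom   # standardized coefficient for x1
    b2 = (r2y - r12 * r1y) / denom   # standardized coefficient for x2
    r_squared = b1 * r1y + b2 * r2y  # variance explained by both together
    # Standard error grows as the predictors overlap more (denom shrinks).
    se = math.sqrt((1 - r_squared) / ((n - 3) * denom))
    return b1 / se, b2 / se

# Hypothetical inputs (assumptions for illustration only):
n = 100      # sample size
r1y = 0.25   # correlation of predictor 1 with the outcome
r2y = 0.25   # correlation of predictor 2 with the outcome
r12 = 0.50   # moderate correlation between the two predictors
CRITICAL_T = 1.98  # rough two-tailed .05 cutoff at about 100 cases

alone = t_simple(r1y, n)
together_1, together_2 = t_joint(r1y, r2y, r12, n)

print(f"t when entered alone:    {alone:.2f}")
print(f"t when entered together: {together_1:.2f}, {together_2:.2f}")
print("significant alone:", alone > CRITICAL_T)
print("significant together:", together_1 > CRITICAL_T or together_2 > CRITICAL_T)
```

Under these assumed values, each predictor clears the conventional threshold on its own, and neither does once the other is in the model, even though nothing about either predictor's relationship with the outcome has changed.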

As we’ve made clear in our report and in the media, wherever we could, nothing we describe dooms a person to a negative outcome. We do, however, encourage people to be aware of, and more deliberate about, the door that is nearest.

Degrees of Freedom

We understand how and why some social scientists might argue that some experiences such as those we analyzed in the “Before ‘I Do’” report do not play a causal role in marital outcomes. And, to be very clear, we don’t believe that such experiences impact everyone in the same way or degree, or even in the same direction. For many people, there is no measurable negative impact of going through a particular door, and for some there may even be a positive impact. Beyond the nuances, we do not believe that social scientists can actually measure whether, at the fork in the road, a person has a choice between higher-risk path A and lower-risk path B. This is no small matter, and it sits at the intersection of philosophy and science. Assumptions researchers hold about choices and consequences play an important role in their conclusions. True for us, true for all.

We do believe that romantic relationship behaviors involve some degree of personal choice and causality. To borrow a common statistical phrase, one’s will is “free to vary,” even if in small ways net of other influences. In the main, we agree with our colleague Marline Pearson, who often says to the women she works with (including many with substantial disadvantages in life), “Your love life is not neutral.”

Scott Stanley, Ph.D., is a Senior Fellow at the Institute for Family Studies and a Research Professor and Co-Director of the Center for Marital and Family Studies at the University of Denver. Galena Rhoades, Ph.D., is a Research Associate Professor in the Psychology Department at the University of Denver and a psychologist in private practice.


i. For example, see this Wall Street Journal article by Christopher Chabris, reviewing the book Incognito by neuroscientist David Eagleman (2011). Think of this as merely one specific example at the tip of an iceberg.