Rorschach Data: How Biases Affect Reforms in Higher Education
Alexandros M. Goudas (Working Paper No. 10) October 2018
Imagine you find a study about an intervention involving two-year college remedial education. For exactly half of the students who participate in the intervention, it works very well. In fact, after six years the students in that cohort graduate at a statistically significantly higher rate than students who never required any remediation, i.e., college-level students. For the other half, however, the intervention is associated with lower-than-average outcomes: these students stop out more often than average and, one assumes as a result of the intervention, graduate at much lower rates.
What would you make of this intervention? Would you implement it? Half of the students who participate in the reform have better results and the other half worse. Compounding the problem is that these are correlations, and it is not clear whether the intervention is actually causing the effects. Like most research in higher education, it is not a randomized controlled trial. Nevertheless, you proceed by assuming that the intervention is causing the results. Do you focus on the half of the results that shows positive outcomes and thus promote the intervention? Or do you recommend avoiding the reform because of the negative results? Might your decision have anything to do with your views on remediation?
Many people, including researchers, look at such correlative data and come to the conclusions they wish to believe are correct, even when the data indicate mixed results. When numbers can be interpreted with broad latitude due to complexity, an apt term for this phenomenon is Rorschach data. Unsurprisingly, most postsecondary research, studies, and data are noisy and complicated. They often resemble a Rorschach test: what you get out of them is based on what you bring to them.
Interpreting Rorschach data selectively is a form of confirmation bias, which occurs when one finds evidence to support a preexisting belief. Worse yet, the very questions many researchers choose to ask, often shaped by their beliefs, skew what data they collect, pare down, and ultimately analyze. Thus the methods used for analysis are often flawed (Aschwanden & King, 2015), and many journals exacerbate this by publishing mostly positive findings (Joober et al., 2012), a phenomenon referred to as publication bias. The fields of medicine and psychology (Ioannidis, 2005; Nosek, 2015) have a reproducibility problem due to these and many other difficulties inherent in the extremely hard work of designing proper studies and analyzing data without bias.
What often results from this flawed process is a false sense of certainty. This is where harm may occur. To show how the entire process of data selection and interpretation can result in flawed conclusions and harmful reforms based on those conclusions, this paper delves into a comparison of two studies involving similar remedial student tracking data, both of which can be interpreted differently based on what views researchers bring to the data.
Referral, Enrollment, and Completion in Developmental Education (Bailey, Jeong, & Cho, 2010)
In 2010, the field of remediation and developmental education was changed drastically with a seminal paper the Community College Research Center (CCRC) published called “Referral, Enrollment, and Completion in Developmental Education Sequences in Community Colleges,” authored by Thomas R. Bailey, Dong Wook Jeong, and Sung-Woo Cho. This paper has become foundational in the remedial reform landscape (over 1000 citations in Google Scholar), and it provided an outline for all of the current reforms sweeping the nation: acceleration, corequisites, multiple measures, guided pathways, and the elimination of prerequisite remediation.
The dataset Bailey et al. (2010) used came from a group of Achieving the Dream (ATD) institutions that “includes over 250,000 students from 57 colleges in seven states” (p. 256). These students started their first year of college in the fall of 2003 or 2004 and were tracked by the CCRC for three years. There are approximately 1200 two-year colleges in the nation, and so the researchers admit that their “sample is not representative of all community college students”; to account for this, they compared their results to a group of students from “an analysis using the National Education Longitudinal Study of 1988” (p. 256). The CCRC found that the two datasets matched according to their criteria, even though their own comparison (Table 6, p. 262) shows large demographic differences between the ATD and NELS samples. Nonetheless, they concluded that their datasets matched, that their analysis was correct, and that their results could be applied to all community colleges in the nation.
Importantly, most of the widespread current reforms are founded on the numbers and recommendations in this study: acceleration, corequisites, skipping remediation, and so on. All are mentioned in the conclusion, with the exception of placement test reform, which the authors hint at in other sections. This emphasis suggests that the authors approached the data with a particular frame of mind and focused on the negative instead of the positive. For example, after noting that remedial completers had high pass rates in college-level courses, the authors downplayed the significance of this finding: “The high pass rate is encouraging, but developmental education completers are already a selected group of students who have successfully navigated their often complicated sequences” (p. 260).
Remedial Coursetaking (Chen, 2016)
Several years later, a study by the National Center for Education Statistics (NCES) (Chen, 2016) explored a very similar group of students who attended two-year colleges starting in 2003–2004, perhaps even sampling some of the same students as the CCRC’s dataset. The NCES dataset was more limited, at approximately 17,000 students, but it was still a statistically representative sample. However, these students were tracked for six years, three years more than the CCRC’s ATD sample, and the findings of the study are vastly different from the CCRC’s. For instance, Chen noted several areas in which students who finished all their remedial coursework excelled: “Remedial completers did as well as or even better than those who did not take any remedial courses in such areas as earning college-level English credits (table 4), transferring to a 4-year institution (table 6), and persisting through college (figure 5)” (p. vii). No such comparison to nonremedial students occurred in the CCRC study.
In terms of completing remedial courses, the CCRC’s report on ATD data (Bailey et al., 2010) found that “between 33 and 46 percent of students, depending on the subject area, referred to developmental education actually complete their entire developmental sequence” (p. 256). However, the NCES report found that “about half of remedial coursetakers beginning at public 2-year institutions (49 percent) completed all the remedial courses they attempted” (p. v). The difference between 33% and 49% is large and requires investigation: when researchers conclude that remediation is a barrier because a number suggests only one-third of students complete it, they might recommend action to reduce remediation; but if half of all remedial students complete their remediation, that number may suggest a different course of action.
There are three possible reasons for the discrepancy in remedial course completion rates between the two studies. First, the CCRC (Bailey et al., 2010) chose a dataset of ATD colleges, which enroll more underprepared students than average, and this might account for the lower percentage of remedial completers. Second, the fact that the CCRC tracked these two-year college students for only three years may have affected their results as well. To demonstrate the effect that longer-term tracking has on data in higher education, a recent National Student Clearinghouse Research Center report shows that if two-year college student completion is tracked for eight years instead of six, the graduation rate rises from 38% to 44% (Shapiro et al., 2017, p. 45). Indeed, most students in all types of college attend part-time (p. 11), and therefore positive results take more time to appear in the data. Finally, the CCRC deliberately included in their dataset all students who were placed into remedial courses, even those who never enrolled in the classes, and thus their findings reflect a far lower completion number. The problem is that one cannot rate an intervention’s efficacy if students do not participate in it. Nonparticipation, whether due to work, family, or money, is counted against remedial courses in this case.
The most important difference between the two studies is that the CCRC report (Bailey et al., 2010) did not track student completion metrics, meaning certificate or degree completion, and this is where researcher choice affects analytical outcomes. The NCES (Chen, 2016) found that of the 49% of remedial students who completed all their remedial courses, 43% graduated with a certificate or degree after six years, whereas nonremedial students, i.e., those who never needed to take remedial courses, graduated at a rate of 39% (p. 35). That is, the half of all remedial students who finished their remedial courses graduated at a rate 4 percentage points higher than that of college-level students who never tested into remediation. This suggests remediation played a part in helping students complete college. At the very least, it did not harm students’ ability to graduate.
Complexity in the Data Lends Itself to Bias
The hypothetical anecdote presented at the beginning of this article reflects the NCES study (Chen, 2016). Remediation (i.e., stand-alone prerequisite remedial coursework) is the intervention itself, and the data suggest that those who finish their sequences perform better than nonremedial students in the pinnacle of college achievement, completion. The goal of remediation is to get underprepared students to the starting line so that they may do as well as nonremedial students in the end. If remedial students graduate at the same rates as nonremedial students, that should be considered a success. However, this study demonstrates that in fact, for half of the remedial students in the sample, remediation is correlated with graduation rates higher than those of nonremedial students.
The main problem is that the CCRC authors (Bailey et al., 2010) looked at similarly complex, mixed data and concluded that the problem was remediation itself. They then recommended actions to subvert, eliminate, or accelerate it. This might be due to preexisting bias on the part of the authors; strong language in the paper suggests as much. For example, they used the term “confusing” six times to refer to developmental education and remediation. They also characterized the developmental sequence as being in “confusion and disarray” (p. 268). In the working-paper version of the CCRC report (Bailey et al., 2009), they even stated that remediation “must appear confusing, intimidating, and boring to many students entering community colleges” (p. 28), a statement that was removed from the published version (Bailey et al., 2010).
Bailey et al. (2010) also labeled remediation “ineffective,” the first of numerous such references in CCRC working papers and journal articles on developmental education, and they used this term as a basis for a now popular reform: “Given the confusion and ineffectiveness of the developmental system, one possible objective would be to reduce the length of time before a student can start college courses—to accelerate the remediation process” (p. 268). This language suggests a preexisting bias against remediation. That bias has contributed to the elimination of helpful remedial courses from many state systems, as well as the severe restriction or reduction of stand-alone remedial courses. Some CCRC researchers are in fact proud of this and consider reforms that simply reduce remedial courses a success, thereby implying it was the goal all along (Scott-Clayton, 2018). It is important to keep in mind that only one reform implemented since the completion agenda of 2009 has resulted in higher graduation rates: the Accelerated Study in Associate Programs (ASAP) from the City University of New York (“Significant Increases,” 2016). This finding has been successfully replicated in several community colleges in Ohio (Miller et al., 2020).
The CCRC researchers could just as easily have interpreted the data positively. They could have pointed to the promising results of the ATD remedial completers and recommended that all students be supported and encouraged to enroll in and complete their remedial courses. They could have recommended support models to help students finish their coursework. More recently, they could have promoted the NCES paper (Chen, 2016) and proclaimed remedial completers exemplars because they graduate at higher rates than nonremedial students. That would be the opposite but equal reaction to mixed results, based instead on a focus on the positive data. At the very least, they could recommend that institutions encourage students to complete their remedial coursework (along with implementing strong systems of support), but this appears in none of their current recommendations.
Instead, CCRC researchers continue to focus on the negative side of complex Rorschach data in developmental education. A recent report (Ganga, Mazzariello, & Edgecombe, 2018) designed to introduce developmental education and remediation to policymakers cited the NCES report (Chen, 2016) five times. However, each citation focused on the negative aspects of remediation, and the authors also confused causation with correlation by implying that remediation causes these negative outcomes. Not once do they cite the finding that remedial completers, who make up half of the remedial sample, graduate at a higher rate than nonremedial students. The CCRC researchers thus continued in the same vein as Bailey et al. (2010), recommending all of the same reforms: reducing placement tests, increasing acceleration, promoting corequisites, and implementing guided pathways, their comprehensive and holistic recommendation based on the book by Bailey et al. (2015).
Another example of how complex data with mixed results are interpreted according to preexisting beliefs can be seen in the research on corequisites. The CCRC disseminated a foundational working paper (Cho et al., 2012) on the Accelerated Learning Program (ALP) from the Community College of Baltimore County, the first corequisite model studied with any rigor. It is a very complex article with many different types of results, positive and negative. For instance, the model is correlated with higher pass rates in college composition I and II, but also with lower certificate attainment rates. It is likewise correlated with higher persistence rates, yet nonremedial students who took ALP college-level courses alongside ALP students had worse outcomes later. The program also cost roughly double what traditional remediation does. The bias in the interpretation of these results is that the CCRC and others mostly highlight the study’s positive findings; in fact, interest groups and media rarely mention the negative results associated with corequisites when they present on or cite this study.
Causation and correlation are extremely difficult to distinguish. When people, even top researchers, see positive correlations from an intervention that they also approve of philosophically, they are much more likely to present the data as causative. If they disagree with any findings, however, they are highly likely to downplay them, as when Bailey et al. (2010) dismissed the positive findings on ATD students who completed their sequences, who constituted nearly half of remedial reading students.
Other examples of downplaying positive results can be seen in two recent reports on developmental education outcomes. First, in defense of Florida’s decision to make all remediation optional, Scott-Clayton (2018) characterized a 7-percentage-point drop in overall pass rates as “modest,” yet a 7-percentage-point rise in the overall number of students passing the courses, a result Scott-Clayton views favorably, is characterized as a “marked” increase (para. 15). In a different recent report on current remedial reforms, Rutschow and Mayer (2018) argued that math pathways showed positive results, stating that students in math pathways passed a math course at a rate of 49% (p. 3), while downplaying the traditional model’s pass rate of “only 37%” in the same paragraph. The bias can be seen in the rounding: in Table 1, the actual numbers are 48.5%, which became 49%, and 37.5%, which became 37% (p. 5). Disregarded in the discussion is the fact that they are comparing one semester (49%) to two semesters (38%, rounded correctly), which could be considered an apples-to-oranges comparison.
It is disappointing that top researchers would ignore important data in complex research. It is particularly concerning that a prominent research organization such as the CCRC would cite an NCES report (Chen, 2016) five times and not mention that the data clearly show completing remedial coursework is correlated with increased graduation rates. If their argument is that this is correlation and not causation, then the same argument applies to the original CCRC research findings and recommendations. If researchers are going to conflate causation and correlation, it should not be in a one-sided conclusion based on a biased interpretation of complex Rorschach data.
Aschwanden, C., & King, R. (2015, August 9). Science isn’t broken. Five Thirty Eight. http://fivethirtyeight.com/features/science-isnt-broken/
Bailey, T. R., Jaggars, S. S., & Jenkins, D. (2015). Redesigning America’s community colleges: A clearer path to student success. Harvard University Press.
Bailey, T. R., Jeong, D. W., & Cho, S. W. (2009). Referral, enrollment and completion in developmental education sequences in community colleges (CCRC Working Paper No. 15). Community College Research Center, Teachers College, Columbia University. http://ccrc.tc.columbia.edu/media/k2/attachments/referral-enrollment-completion-developmental_V2.pdf
Bailey, T. R., Jeong, D. W., & Cho, S. W. (2010). Referral, enrollment and completion in developmental education sequences in community colleges. Economics of Education Review, 29(2), 255–270. https://doi.org/10.1016/j.econedurev.2009.09.002
Chen, X. (2016). Remedial coursetaking at U.S. public 2- and 4-year institutions: Scope, experiences, and outcomes (NCES 2016-405). U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. https://nces.ed.gov/pubs2016/2016405.pdf
Cho, S. W., Kopko, E., Jenkins, D., & Jaggars, S. S. (2012). New evidence of success for community college remedial English students: Tracking the outcomes of students in the Accelerated Learning Program (ALP) (CCRC Working Paper No. 53). Community College Research Center, Teachers College, Columbia University. http://ccrc.tc.columbia.edu/media/k2/attachments/ccbc-alp-student-outcomes-follow-up.pdf
Ganga, E., Mazzariello, A., & Edgecombe, N. (2018). Developmental education: An introduction for policymakers. Education Commission of the States, Center for the Analysis of Postsecondary Research. https://www.ecs.org/wp-content/uploads/Developmental-Education_An-Introduction-for-Policymakers.pdf
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/
Joober, R., Schmitz, N., Annable, L., & Boksa, P. (2012). Publication bias: What are the challenges and can they be overcome? Journal of Psychiatry & Neuroscience, 37(3), 149–152. https://dx.doi.org/10.1503%2Fjpn.120065
Miller, C., Headlam, C., Manno, M., & Cullinan, D. (2020). Increasing community college graduation rates with a proven model: Three-year results from the Accelerated Study in Associate Programs (ASAP) Ohio demonstration. MDRC. https://www.mdrc.org/sites/default/files/ASAP_OH_3yr_Impact_Report_1.pdf
Nosek, B. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. http://science.sciencemag.org/content/349/6251/aac4716
Rutschow, E. Z., & Mayer, A. K. (2018). Making it through: Early findings from a national survey of developmental education practices (CAPR Research Brief). Center for the Analysis of Postsecondary Research. https://www.mdrc.org/sites/default/files/DCMP-InterimFindings.pdf
Scott-Clayton, J. (2018, March 29). Evidence-based reforms in college remediation are gaining steam – and so far living up to the hype (Evidence Speaks Series). Brookings Institution. https://www.brookings.edu/research/evidence-based-reforms-in-college-remediation-are-gaining-steam-and-so-far-living-up-to-the-hype/
Shapiro, D., Dundar, A., Huie, F., Wakhungu, P. K., Yuan, X., Nathan, A., & Bhimdiwali, A. (2017). Completing college: A national view of student completion rates – Fall 2011 cohort (Signature Report No. 14). National Student Clearinghouse Research Center. https://nscresearchcenter.org/wp-content/uploads/SignatureReport14_Final.pdf
Significant increases in associate degree graduation rates: CUNY Accelerated Study in Associate Programs (ASAP). (2016). City University of New York, Office of Institutional Research and Assessment. http://www1.cuny.edu/sites/asap/wp-content/uploads/sites/8/2016/06/ASAP_Program_Overview_Web.pdf