ESSA Evidence Requirements: What “Works”?

03/10/2021
Categories: Data Culture, Leadership

One of the more common requests we get from educators is for proof that our programs “work.” In the context of personalized learning and education programs in general, how something “works” can mean many things depending on who is asking and what is motivating their question. For Curriculum Associates, it means our programs lead to improved student outcomes. “Work” also means that our programs meet the Every Student Succeeds Act (ESSA) evidence requirements.

ESSA evidence requirements fall into four levels: Level 1 (“Strong evidence of effectiveness”), Level 2 (“Moderate evidence of effectiveness”), Level 3 (“Promising evidence of effectiveness”), and Level 4 (“Demonstrates a rationale”). So what’s the difference between the top two levels, and why doesn’t Curriculum Associates have any studies that meet Level 1 evidence standards?

What Do ESSA Levels Mean?

“Level 1” is defined as having “one or more well-designed and well-implemented randomized controlled trial (RCT) studies.” Level 2 is defined as being “. . . supported by one or more well-designed and well-implemented quasi-experimental studies.” The main difference between Level 1 and Level 2 is how the treatment and control groups are determined. In an RCT, a lottery-type system is used, and students are put into the treatment or control group at random. The goal is to create two groups that are identical to each other on a wide range of important characteristics, except for whether they experience the program. Theoretically, random assignment helps ensure the two groups (control and intervention) are equivalent at the start of the research project. This means that if a difference in outcomes between the two groups is found, the only plausible explanation is the program itself and not some unmeasured characteristic of the groups.
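
To make the lottery idea concrete, here is a minimal sketch in Python (the roster and function name are hypothetical, not drawn from any ESSA requirement): every student has the same chance of landing in either group, so, on average, the groups are balanced on every characteristic, measured or unmeasured.

```python
import random

def randomly_assign(roster, seed=42):
    """Split a roster into treatment and control groups by lottery."""
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = list(roster)        # copy so the original roster is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]  # (treatment, control)

# 200 hypothetical third graders
roster = [f"student_{i}" for i in range(200)]
treatment, control = randomly_assign(roster)
print(len(treatment), len(control))  # -> 100 100
```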

In a quasi-experiment, the goal is the same: to create two groups that are as equivalent as possible before the intervention starts. The difference is that statistical techniques, rather than random assignment, are used to make the groups similar before the intervention. For example, say you wanted to test the impact of a new reading intervention program on third grade students. In an RCT, one group of third grade students would be randomly assigned to receive the new program while the remaining students would not be exposed to it. In a quasi-experimental study, the researchers would instead use baseline data about the students and statistical techniques, such as matching students on prior achievement, to construct two equivalent groups.
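
As a minimal sketch of one such technique, the snippet below pairs each program student with the non-program student whose baseline score is closest (greedy nearest-neighbor matching on a single measure; all names and scores are hypothetical). Real quasi-experimental studies typically match on many characteristics at once, for example via propensity scores.

```python
def match_comparison_group(program_students, other_students, prior_scores):
    """Pair each program student with the closest unused non-program student.

    Greedy nearest-neighbor matching on a single baseline measure: the
    resulting pairs form two groups that look similar *before* the
    intervention begins.
    """
    available = set(other_students)
    matches = {}
    for student in program_students:
        best = min(available,
                   key=lambda c: abs(prior_scores[c] - prior_scores[student]))
        matches[student] = best
        available.remove(best)  # each comparison student is matched only once
    return matches

# Hypothetical baseline reading scores
prior_scores = {"Ana": 410, "Ben": 452, "Cal": 433,
                "Dee": 415, "Eli": 449, "Fay": 430}
pairs = match_comparison_group(["Ana", "Ben", "Cal"],
                               ["Dee", "Eli", "Fay"], prior_scores)
print(pairs)  # -> {'Ana': 'Dee', 'Ben': 'Eli', 'Cal': 'Fay'}
```

The matched comparison group then plays the role the control group would play in an RCT.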

There is little argument that RCTs are considered the “gold standard” for education research when the goal is to determine whether a program caused an improvement in student outcomes. So why doesn’t Curriculum Associates use them? Why do we focus on rigorous research alternatives that meet Level 2 or even Level 3 ESSA evidence requirements? In short, we haven’t used an RCT yet because results from well-designed quasi-experimental studies tend to be very close to results from RCTs. In addition, we are cognizant of the research reality that, for many schools, participating in an RCT can be difficult for reasons of burden, rigidity, cost, and timeliness.

Burden

RCTs can require schools to do things they would not normally do, creating a large burden on study participants. Some of these burdens can include:

  • Randomly assigning students to teachers. RCTs can require the makeup of a classroom to be random, which means the RCT dictates how students are assigned to teachers—as opposed to the normal process of building classrooms.
  • Not being able to try something new. There is no guarantee that a participating school or district will get assigned to a “treatment” condition, meaning that they may not be able to try any new programs or instructional practices. They are stuck until the study is completed.

Districts are often not equipped to carry out a pure randomization, and when compromises are made on how randomization is done, the strongest argument for using an RCT can quickly become moot. In other instances, RCT designs are simply not feasible, or they create additional difficulties that make schools reluctant to agree to them.

i‑Ready and Ready are backed by timely research that meets the criteria for “evidence-based” as defined by ESSA.


Rigidity

RCTs undertaken to meet ESSA Level 1 requirements are designed to answer one question really well: Is Program A better than Program B when it comes to improving student achievement? This is a very narrow question, and though its answer is certainly valuable, the information it provides isn’t always the most important. For many educators, knowing how and why something is effective is just as important as knowing if it is effective. Having information about the how and the why helps educators really understand a program and utilize it more effectively, and RCTs aren’t able to answer these questions significantly better than other research designs.

With RCTs, being assigned to a treatment condition may limit what a school or district can do while the study is underway because of the requirements of the intervention being used. In our experience, many educators would rather have more flexibility.

Cost

A March 2019 study published in the journal Educational Researcher found that RCTs carried out by education research organizations in the United Kingdom and the United States often cost more than £500,000 (approximately $680,000) to implement. While participating schools are not usually burdened with the costs of these studies, the organizations carrying them out are limited by the funds available to them, which usually come from competitive grants awarded by the federal government or nonprofit foundations.

To control costs, RCTs usually must be limited in scope along many important dimensions (e.g., student demographics, the number of schools and/or students involved), which can then limit how applicable the results are to students who weren’t represented in the study. For example, when RCTs are conducted with a small number of students in a limited number of grades in a small number of schools, the results may not generalize to other students, grades, or school types.

Timeliness

The time from the beginning of an RCT to the end can be extremely long (often three years or more for data collection alone), which makes RCTs hard to use with programs that are constantly evolving. For example, in just the past year we’ve made more than 100 updates to our i-Ready programs, many of which were implemented to help educators meet the challenges of the moment (e.g., distance learning, unfinished learning). The further a study is from the current version of a program, the more likely it is that important changes have happened that weren’t part of the study.

The current version (revised in 2019) of the What Works Clearinghouse Educator’s Practice Guide on “Foundational Skills to Support Reading for Understanding in Kindergarten through 3rd Grade” includes only studies that were published in 2014 or earlier; indeed, the majority were published before 2010. Educational needs can change overnight (the rapid shift to at-home learning is a perfect example), and the tools designed to meet those evolving needs must change just as quickly if they are to remain effective.

Final Note

This post isn’t meant to be a deep dive into the pros and cons of different research designs, nor is it a blanket dismissal of RCTs in education research; if the situation is right, we will pursue an RCT. Rather, it’s an explanation of ESSA evidence requirements and Curriculum Associates’ approach to efficacy research. We believe we can evaluate the impact of our programs using Level 2 designs, which allows us to answer a diverse set of research questions in a timely and rigorous manner.

When it comes to choosing education programs, one study, even an RCT, can’t tell a program’s complete efficacy story. The more research the better, and, given the burden, cost, and other demands of an RCT described above, replications of them are unusual. In contrast, a quasi-experimental approach makes it easier to replicate findings and build a more comprehensive evidence base.

Ultimately, we believe educators are looking for ample evidence from many different studies that programs such as i-Ready will lead to improved learning outcomes for many different students with many different needs.
