Michelle Exstrom 12/1/2013


New ways of evaluating teachers recognize the complexity of their job, and may help ensure that only good teachers are teaching.

By Michelle Exstrom

Research over the last 20 years confirms what most of us already know intuitively: Good teachers make a world of difference. A student who has exceptional teachers benefits significantly.

Research also shows something that may surprise most of us: A student who has a series of ineffective teachers is unlikely to catch up academically, ever. Although factors outside of school, such as how safe the child feels in the community or how much stress there is within the family, continue to have the biggest influence on a student’s achievement, inside a school the most significant factor is the teacher. The challenge for lawmakers is ensuring that teachers, who have such a huge effect on kids, are the best they can be.

That requires coming up with a valid, fair and reliable way to assess effective teaching. This has been a challenge—but we may be getting closer.

A Little History

Since 2009, when two studies caught the attention of lawmakers, teacher evaluations have been based more and more on students’ achievement. The first study, an international comparison of student achievement in math, science and reading called the Program for International Student Assessment (PISA), showed American 15-year-olds trailing their peers in many developed nations. The other, The New Teacher Project’s The Widget Effect, found that more than 99 percent of the teachers studied were rated “satisfactory” or better.

Policymakers questioned how so many teachers could be receiving good evaluations yet be producing students who were falling behind the rest of the world academically. It became clear to them that there were problems in how teachers were being evaluated.

Formal evaluations were infrequent and yielded little information teachers could use to improve their classroom practices. In 2009, the New Teacher Project and some education experts started pushing hard for reform, arguing that if teachers are the key to improving student achievement, then evaluation systems must more accurately measure how teachers are contributing to students’ growth.

As state lawmakers were beginning to reassess their evaluation policies, the federal government provided incentives to encourage reform. The U.S. Department of Education offered Race to the Top grants to states looking (among other things) to reform teacher evaluations to include some measurement of their students’ learning. States seeking waivers from some of the requirements of the federal Elementary and Secondary Education Act (ESEA, also known as No Child Left Behind) also were required to consider student achievement in their teacher evaluation formulas.

The federal incentives pushed state and district policy onto the fast track, leaving many, including New Mexico Representative Mimi Stewart (D), wondering if changes were moving too fast. “Improving education shouldn’t be a race. Changes should be made based on the evidence of what works best,” she says. “We don’t want to make these drastic changes only to find out in a year or so that they don’t work and, even worse, that they harm teachers.”

Nevertheless, many states did jump in. And by the end of most legislative sessions this year, most states had adopted new requirements that students’ academic growth be a significant part of a teacher’s evaluation, along with other measures, such as peer or principal observations of their actual teaching and student surveys. Many states have also tied tenure and licensing to the new assessments. And, in some states, teachers whose subjects are not included in standardized tests are allowed to be evaluated by alternative measures, including locally developed performance standards.

First to Jump In

Tennessee was one of the first states to reform the way it evaluates teachers. Lawmakers responded quickly to the federal initiative by passing legislation in 2010 they called the First to the Top Act. The legislature wanted to position the state well for the federal competition, and in July 2011, Tennessee became one of the first states to start using a teacher evaluation system based in part on student achievement. The state’s strategy worked. Tennessee was one of the first two states to win the competition.

“Our previous teacher evaluation system was pretty archaic because tenured teachers weren’t evaluated nearly enough, only twice in 10 years,” says Tennessee Representative John Forgety (R), a former educator and current vice chairman of the House Education Committee. “This new system is an important step in the right direction because it gives us a better idea of teacher performance. But we need to continue to make tweaks along the way, which we did this year, to make sure we’re on the right track,” he says.

The Tennessee act requires student achievement to account for 50 percent of teachers’ evaluations—35 percent from student scores on the Tennessee Value-Added Assessment System and 15 percent from additional measures of student achievement as determined by the state Board of Education. The other 50 percent is based on teacher observations, personal conferences and review of previous evaluations and work. 
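To make the arithmetic concrete, here is a minimal sketch of how those three weights might combine into a single rating. Only the 35, 15 and 50 percent weights come from the act as described above. The 1-to-5 scoring scale, the component names and the composite_score function are assumptions made for illustration, not Tennessee's actual scoring rules.

```python
from typing import Mapping

# Weights as described for the Tennessee act; the key names are hypothetical.
WEIGHTS = {
    "value_added": 0.35,        # Tennessee Value-Added Assessment System scores
    "other_achievement": 0.15,  # additional measures set by the state board
    "qualitative": 0.50,        # observations, conferences, review of prior work
}


def composite_score(components: Mapping[str, float]) -> float:
    """Return the weighted sum of component scores.

    Assumes every component has already been placed on a common scale
    (a 1-to-5 scale is assumed here purely for illustration).
    """
    if set(components) != set(WEIGHTS):
        raise ValueError("components must match the weight categories exactly")
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Hypothetical teacher: average growth, good on other measures, strong observations.
example = composite_score(
    {"value_added": 3.0, "other_achievement": 4.0, "qualitative": 5.0}
)
print(round(example, 2))
```

In this made-up case the strong observation score lifts the overall rating to roughly 4.15 even though the growth score is only 3, which shows how much the half of the evaluation that is not tied to test results can matter.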

Similarly, in Colorado, Denver Public Schools began piloting a new evaluation system three years ago in response to state legislation passed in 2010. The law requires annual evaluations of teachers and principals based at least 50 percent on the academic growth of their students, along with peer observations and student surveys.

Sara Collyar, a former teacher, assistant principal and math “coach” for struggling teachers, is now in her second year as a peer observer in Denver. She’s convinced the new assessments are “very accurate at identifying what good teaching looks and sounds like.”

When observing, she notes whether the teacher has planned the lesson well, encourages student input and assigns rigorous tasks. “I want to see that the teacher clearly knows what she wants students to learn and how they will show her they understand. But I also look to see if students understand how what they are learning relates to the real world, a bigger picture of how it fits into context.

“I want to see students talking to and questioning each other, explaining their thinking. I want to see the teacher supporting those discussions with appropriate questions to get students unstuck or to take their learning further. If the teacher is lecturing, I want to see students with time to process, talk and question.”

Collyar then scores the teachers within a set framework that identifies the teachers in one of four ways: “not meeting,” “approaching,” “effective” or “distinguished.”

Research Catches Up

Researchers have been playing catch-up with all the new state policies being passed, and shedding light on what makes a sound evaluation system. The most important factor in developing a system is recognizing the complexity of teaching so that it will be trusted by educators, principals, parents and the public. For the past three years, the Measures of Effective Teaching (MET) project—a partnership among academics, teachers and education organizations that is funded by the Bill and Melinda Gates Foundation—has been investigating reliable ways to develop and identify effective teaching.

This January, the MET research team confirmed what many state legislators assumed was settled long ago: the effectiveness of a teacher can, indeed, be measured. Through well-designed student perception surveys and careful evaluations by highly trained observers, researchers can reliably identify groups of teachers who are better at helping students learn. Student surveys, for example, may ask whether they feel the teacher has control of the classroom, whether they feel safe and cared for, whether the teacher makes sure they understand material before moving on.

Provided the instruments are carefully designed, the MET research team recommends that current and previous years’ student test scores count for at least a third, but no more than half, of a teacher’s total evaluation, with classroom observations and student surveys making up the rest of the assessment. MET researchers also verified that adding a second trained observer, such as a principal or an outside observer, increases reliability significantly more than having the same person score several lessons by the same teacher.

Not Everyone’s on Board

Disagreement and debate continue over the MET group’s conclusions and over whether evaluations that rely heavily on test scores give enough weight to the numerous intangible attributes a teacher brings to the classroom. The issue is of even more concern when tenure is tied to the evaluation.

Representative Stewart is concerned that the evaluation process fails to take into account poverty rates and the number of second-language learners, which in New Mexico are both high. These students “tend to need a bit longer in school before they perform at grade level,” she says. She fears the new teacher evaluation system “appears poised to undermine the great work our teachers do in their classrooms.”

Indeed, teachers’ unions voice concerns that basing evaluations on student performance is risky. This may be especially true in the 46 states and four territories adopting the new, more rigorous content standards involved with the Common Core initiative. Its standards are designed “to be robust and relevant to the real world, reflecting the knowledge and skills that our young people need for success in college and careers,” according to the program’s mission statement. During the transition period, some students might score lower on these more challenging tests. How should a teacher then be evaluated? And what about the many educators who teach subjects for which there are no standardized tests given—such as art, music, PE or technology?

In April, Randi Weingarten, president of the American Federation of Teachers, called for a moratorium on full implementation of high-stakes evaluations or “assessment-driven sanctions tied to Common Core State Standards” until “solid implementation plans are embedded in schools” and evaluation systems are tested for a year or more. She argues that teachers should not be held accountable for student achievement until states have had enough time to establish the rigorous standards and new assessments.

In light of these concerns, Secretary of Education Arne Duncan announced in early June that the Department of Education would move back the timeline for the states receiving ESEA waivers. States would have until the 2016-2017 school year to overcome challenges.

Supporters of the new evaluations contend significant advances have addressed many of these early concerns. Evaluations in many states are based on several measures of student performance, not just statewide standardized tests. And in most states, districts can use approved alternatives to evaluate educators who teach subjects without standardized tests.

The SAS Institute, which develops software systems for states to use for these kinds of teacher evaluations, urges states and districts to look at three years or more of student test scores when evaluating a teacher. “Although it may be difficult to measure the progress students are making,” says Nadja Young, an education specialist in State and Local Government Practice at the SAS Institute, “sophisticated models now exist that are being used in the states that successfully measure students’ academic growth and the teacher’s role in it.” 

Keys to Success

Strong outreach and effective communication are essential in conveying the value and validity of the new evaluations to teachers and the public. It is important to include teachers and principals in developing the new evaluation system, to have an ongoing dialog among the state and local school districts, and to invite the public’s input.

“Getting teachers on board is a major challenge for states,” says Ellen Behrstock Sherratt of the American Institutes for Research. “Teachers who haven’t been authentically consulted from the beginning aren’t going to want to hop on a bandwagon. But it’s never too late to bring teachers into the conversation through focus groups or task force membership.”

Sherratt and her colleagues direct a program called “Everyone at the Table,” which provides materials and assistance to states and districts to involve teachers in the conversation. She points to Washington state as a good example of how to do it right. Officials there conducted focus groups at regional educator forums, supported local discussions with the public, and regularly convened a workgroup to talk about new policies with teachers and get their views on what was working well and what needed improvement.

Collyar says Denver, too, has invited collaboration among teachers, administrators and the union throughout the process. Feedback has been “encouraged, welcomed and acted upon each year,” she says.

Ample time and funding to establish an evaluation program also are critical to success. States that did not receive Race to the Top funds have struggled to ensure that their state and local education agencies have the staff and funding needed to successfully adopt the new systems.

This new area of work is complex and evolving. States and districts might want to consider starting with a pilot project—proceeding first with locally developed and initiated programs, then making any necessary adjustments before going full scale.

For states already moving forward with this effort, consistent review and analysis are important to determine if programs are leading to better teaching and, ultimately, to improved student learning.

After all, that’s why most teachers enter the profession—to make a world of difference in their students’ lives. Maybe that’s why Collyar has found most teachers welcome her observations and critiques. “They appreciate an honest evaluation, and most are eager to improve.” That should earn an “A” for effort, at least.

Across the Nation

Most states have made some kind of change to their teacher evaluation policies in recent years. Widespread changes, according to the Center for Public Education, include the following reforms.

  • Forty-seven states require or recommend that teachers be included in designing new evaluation systems.
  • Seventeen states give districts some flexibility in developing evaluation systems.
  • Twenty-one states allow districts to develop their own systems with few restrictions.
  • Forty-six states require or recommend that teacher evaluations include some measure of how the teacher influences students’ achievement.
  • No state evaluates teachers on student test scores alone.
  • Every state requires that classroom observations be part of the evaluation process.
  • Thirty-three states require or recommend at least annual observations.
  • Forty-one states require or recommend that evaluations consist of several measures of performance. 
  • Thirty-one states target professional development opportunities based on evaluations. 
  • Thirty-two states allow districts to dismiss teachers after several years of poor evaluations.

Source: The Center for Public Education, Trends in Teacher Evaluation Report, October 2013

Michelle Exstrom directs the educator effectiveness, education finance, Common Core and assessments projects at NCSL.
