Sunday, January 13, 2013

Multiple Issues About Multiple-Choice Items

It's amazing how a late-night email to Diane Ravitch grew into a charge for me. As I wrote before, my friend Christine asked me why I was upset about the use of the NWEA MAP, especially when it would likely replace the MEAP, Michigan's statewide assessment. I wrote the following statement to her:

The NWEA MAP is a computer adaptive, standardized test that uses selected-response items that were written from national standards. The test items are aligned post hoc to state standards and the test results are used to measure student growth in language, reading, and mathematics.

I underlined the concerns I had about the claims attached to tests like the NWEA MAP and the Michigan Educational Assessment Program (MEAP). In hindsight, I omitted many other issues, such as teacher evaluation, cut scores, data, and proficiency. I will return to all of these in time. Today's topic for consideration is the selected-response item, also known as the multiple-choice item.

Assessment experts, psychometricians, and others who look for item difficulty, item discrimination, and test validity and reliability will sing the advantages of assessments composed of multiple-choice items. Their evidence for these claims is based on statistical analyses in which p-values, Pearson product-moment correlations, and reliability coefficients dominate the discussion. To the assessment experts, these measures indicate how solid the assessment is. While these are important considerations for those involved in the multi-billion-dollar test industry, this evidence is nearly meaningless to most educators.
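For readers curious about what those statistics look like in practice, here is a minimal sketch in Python. It assumes the classical test theory usage, in which an item's "p-value" is simply the proportion of students who answered it correctly, and it uses the Pearson correlation between an item and the rest of the test as a rough discrimination index. The function names and the tiny score table are made up for illustration; they do not come from the NWEA MAP, the MEAP, or any real test.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    responses: one row per student, each row a list of 0/1 item scores.
    difficulty: proportion of students answering the item correctly.
    discrimination: correlation between the item score and the score
    on the remaining items (a rough, illustrative index).
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        difficulty = sum(item) / len(item)
        rest = [total - score for total, score in zip(totals, item)]
        stats.append((difficulty, pearson(item, rest)))
    return stats


# Hypothetical data: five students, three items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for number, (difficulty, discrimination) in enumerate(item_statistics(scores), start=1):
    print(f"Item {number}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

Notice what the numbers describe: the items, not the children. Nothing in this output says why any one student chose the option he or she did.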

Advocates of multiple-choice assessments claim other advantages. For example, because large populations of students take the tests, test developers can spread the content over different forms of the test, a process called "adequate sampling." To the classroom teacher, this helps to explain why students can have different versions of the same test during a mass administration of standardized paper-and-pencil tests. Furthermore, multiple-choice items are more objective than open-ended or performance items; there is little bias in scoring them, and they lend themselves easily to electronic scoring. The greatest advantage of multiple-choice tests is that they are a faster and cheaper way to gauge student achievement than performance or writing assessments. Great educators probably never factor faster and cheaper into anything they do.

Every stakeholder involved in the education of our children has a need for assessment, but one assessment cannot serve every purpose. A large-scale multiple-choice assessment can sweep across large populations of students and give a quick pulse of progress. This information is good for policy makers, superintendents, and others needing a bird's-eye view. It cannot, however, provide the insight needed at the classroom level, the place where detailed information about individual students becomes the basis for changes in instruction. Changes in instruction bring about improved teaching and increased learning, the heart of what we want to see in our schools.

The results from multiple-choice tests rarely, if ever, show what students are thinking when they choose an option. Because we are unable to see the reasoning behind the choice, we are unable to see where students went wrong. Whether the options run A-D or A-E, the possibility remains that students will select the correct answer by chance. Students are also forced to choose among the options given, which prevents them from offering a different response. Again, this fails to provide useful information about individual students.
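The arithmetic behind guessing is easy to sketch. The short Python example below assumes a student guesses blindly and independently on every item, so each guess succeeds with probability 1 divided by the number of options. The 40-item test length and the function names are hypothetical, chosen only for illustration.

```python
from math import comb


def expected_guess_score(n_items, n_options):
    """Average number of items a blind guesser gets right."""
    return n_items / n_options


def prob_at_least(k, n_items, n_options):
    """Probability that blind guessing yields at least k correct answers,
    using the binomial distribution with success chance 1 / n_options."""
    p = 1 / n_options
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


# Hypothetical 40-item test, with four options (A-D) and five options (A-E).
for n_options in (4, 5):
    average = expected_guess_score(40, n_options)
    print(f"{n_options} options: a blind guesser averages {average:.0f} of 40 correct")
    print(f"  chance of 15 or more correct: {prob_at_least(15, 40, n_options):.3f}")
```

On this assumption, adding a fifth option lowers the odds of a lucky score but never removes them, and a correct answer still looks the same whether it was reasoned or guessed.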

Furthermore, our standards are, for the most part, written in such vague and broad language that multiple-choice items cannot get at the essence of what is expected of our students. The multiple-choice item is limited in how deep it can go. Factoids and trivia lend themselves easily; however, if we want our students to demonstrate deep and complex thinking, it will take more than checking an option.

A Google search for the origin of the multiple-choice test will likely give Frederick J. Kelly the credit for bringing this item type to education, and he did so only a century or so ago. Educators with some extra time on their hands might be interested in how rapidly its popularity has grown since then. It took little time for college admissions tests to forgo the once all-essay format and morph into assessments like the ACT and SAT of today.

Sure, there are additional limitations to using multiple-choice items, but there is a greater issue here.

We look to Finland as a model for excellence in education, given its near-the-top status on international tests year after year. We have studied its model. The Finnish students rarely take multiple-choice assessments. I repeat, the Finnish students rarely take multiple-choice assessments. Their achievement is determined, instead, by demonstrating what they know. Despite not having to take semi-yearly and yearly multiple-choice tests in grades 3-10, the Finnish students outperform most of the world on a primarily multiple-choice test. We have research here in the States that shows this same phenomenon. When students have demonstrated what they know and are able to do through authentic and performance assessment, they perform better on standardized, multiple-choice tests.

If we glean no useful information about individual students from tests like the NWEA MAP and we know that students can excel on international standardized tests when they are engaged in school- and classroom-based authentic assessments, why do these multiple-choice assessments exist at all?

Maya Angelou says that we do what we know to do and that when we know better, we do better. Not in this case.

Next up? Standardized testing.