It's time for a new NAPLAN
In a new report commissioned by the NSW Teachers Federation on behalf of the teaching profession, internationally renowned expert Dr Les Perelman has found that the NAPLAN writing test is "by far the most absurd and the least valid of any test that I have seen". In this excerpt from his report, Dr Perelman measures how the NAPLAN writing task stacks up against comparable tests around the world.
Achievement tests have become an almost universal feature of primary and secondary education in industrialised countries. Such assessments, however, need to be periodically reassessed to examine whether they are measuring the relevant abilities and whether the results of the assessment are being used appropriately.
Most importantly, the assessments must themselves be assessed to ensure they are supporting the targeted educational objectives. Contemporary concepts of validity are considered as simultaneous arguments involving the interpretation of construct validity, content validity, and external validity, along with arguments involving fairness and appropriateness of use.
As points of comparison, the examination of six different writing tests from the United States, Australia, Canada and the United Kingdom produced observations directly relevant to an evaluation of the NAPLAN essay:
- The majority of tests, and all the tests specifically for primary and secondary schools, are developed, administered, and refined within the context of publicly available framework and specification documents. These documents articulate, often in great detail, the specific educational constructs being assessed and exactly how they will be measured. Not only are they an essential tool for any assessment design, but their publication is also vital for the transparency and accountability necessary for any testing organisation.
- In some cases, these documents are produced with collective input from stakeholders and academic specialists in the specific disciplines. The Smarter Balanced Assessment Consortium and the National Assessment of Educational Progress (NAEP) writing assessments made use of large panels of teachers, administrators, parents, elected officials, and academic experts.
- Several of the tests unreservedly mix reading and writing. The Smarter Balanced Assessment Consortium reading test incorporates short-answer writing (constructed response). The texts in the reading exercise form part of the prompt for the long essay, and the short written answers to the reading questions serve as prewriting exercises. Integrating writing and reading in assessments makes sense. Children acquire language through exposure to speech. Eventually, reception leads to production. Although writing is a technology that is only approximately 6000 years old, it is an analogue to speech, albeit not a perfect one. Indeed, students will have extreme difficulty writing in a genre if they have not read pieces in that same genre.
- Writing tasks are designed and employed for specific classes or years. With the exception of NAPLAN, I know of no other large-scale writing assessment that attempts to employ a single prompt for different age groups.
- Similarly, most tests tailor their marking rubrics for different classes or years. For example, the scoring rubrics for Grades 4 and 7 in British Columbia’s Foundation Skills Assessment (FSA), displayed in Appendix D (see online report), vary significantly, consistently expecting a higher level of performance from the higher grade.
- Informative writing, in addition to narrative and persuasive writing, is a common genre in school writing assessments. Much of the writing students will do in school and then in higher education and in the workforce will be informative writing.
- Several of the assessments explicitly define an audience and, often, a genre as part of the writing task. One prompt from the National Assessment of Educational Progress (NAEP) assessments asks students to write a letter to the school principal on a specific issue. A Smarter Balanced Assessment Consortium informative writing task for Grade 6 students asks the student to write an informative article on sleep and naps (the topics of the reading questions) for the school newspaper that will be read by parents, teachers, and other students.
- All of the other assessments that employ multi-trait scoring use the same or similar scales for all traits. Moreover, they all employ significantly fewer trait categories. The Smarter Balanced Assessment Consortium employs three scales: two are 1-4, and the Conventions scale is 0-2. British Columbia’s Foundation Skills Assessment uses five scales, all 1-4. The Scholastic Aptitude Test (SAT) has three 1-4 scales that are not summed, and UK tests such as A and AS Levels have multiple traits, usually four to six, that are always scored on scales that are multiples of 1-5 levels.
- Most of the assessments, and all of the assessments that focused on the primary and secondary years/grades, allowed students access to dictionaries and, in some cases, grammar checkers or thesauri. Some of the assessments are now on networked computers or tablets that include standard word processing applications with spell-checkers or dictionaries and other tools for writing.
Comparison of other Anglophone governmental and non-government organisation essay tests, along with an analysis of the NAPLAN essay, demonstrates that the NAPLAN essay is defective in its design and execution:
- There is a complete lack of transparency in the development of the NAPLAN essay and grading criteria. There is no publicly available document that presents the rationale for the 10 specific criteria used in marking the NAPLAN essay and the assignment of their relative weights. This lack of transparency is also evident in the failure of the Australian Curriculum Assessment and Reporting Authority (ACARA) to include other stakeholders, such as teachers, local administrators, parents, professional writers, and others in the formulation, design, and evaluation of the essay and its marking criteria.
- Informative writing is not assessed although explicitly included in the writing objectives of the Australian Curriculum. Informative writing is probably the most common and most important genre in both academic and professional writing. Because that which is tested is that which is taught, not testing informative writing devalues it in the overall curriculum.
- Ten marking criteria with different scales are too many and too confusing, causing high-level attributes such as ideas, argumentation, audience, and development to blend into each other even though they are marked separately. Given the number of markers and time allotted for marking approximately one million scripts, a very rough estimation would be that, on average, a marker would mark 10 scripts per hour, or one every six minutes (360 seconds). If we estimate that, on average, a marker takes one-and-a-half minutes (90 seconds) to read a script, that leaves 270 seconds for the marker to make 10 decisions, or 27 seconds per mark on four different scales. It is inconceivable that markers will consistently and accurately make 10 independent decisions in such a short time.
- The weighting of 10 scales appears to be arbitrary. The 10 traits are marked on four different scales, 0-3 to 0-6, and then totalled to compute a composite score. Curiously, the category Ideas is given a maximum of 5 marks while Spelling is given a maximum of 6.
- There is too much emphasis on spelling, punctuation, paragraphing and grammar at the expense of higher order writing issues. While mastery of these skills is important, the essential function of writing is the communication of information and ideas.
- The calculation of the spelling mark, in particular, may be unique in Anglophone testing. It is as concerned with the presence and correct spelling of limited sets of words defined as Difficult and Challenging as it is with the absence of misspelled words. Markers are given a Spelling reference list categorising approximately 1000 words as Simple, Common, Difficult, and Challenging. The scale for the spelling criterion is 0-6. A script containing no conventional spelling scores a 0, with correct spelling of most simple words and some common words yielding a mark of 2. To attain a mark of 6, a student must: spell all words correctly; and include at least 10 Difficult words and some Challenging words or at least 15 Difficult words.
- The NAPLAN grading scheme emphasises and virtually requires the five-paragraph essay form. Although the five-paragraph essay is a useful form for emerging writers, it is extremely restrictive and formulaic. Most arguments do not have three and only three supporting assertions. More mature writers such as those in Year 7 and Year 9 should be encouraged to break out of this form. The only real advantage of requiring the five-paragraph essay form for large-scale testing appears to be that it helps to ensure rapid marking.
- Although “audience” is a criterion for marking, no audience is defined in the writing prompt. There is a significant difference between a generic reader and a specific audience, a distinction that the current NAPLAN essay ignores but that is essential for effective writing.
- Specificity in marking rubrics on issues of length and conventions not only skews the test towards low-level skills, it also makes the test developmentally inappropriate for lower years or stages. Several of the marking criteria specify at least one full page as “sustained writing” or “sustained use” necessary for higher marks. It is unrealistic to expect most Year 3 students to produce a full page of prose in 40 minutes.
- The supplementary material provided to markers on argument, text and sentence structure, and other issues is trivial at best and incorrect at worst. It should be redone entirely as part of the redesign of the NAPLAN essay. Markers should be surveyed to discover what information would be most useful to them.
- The 40 minutes students have to plan, write, revise and edit precludes any significant planning (prewriting) or revision, two crucial stages of the writing process.
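The marking-time estimate given in the list above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the rough figures quoted there (10 scripts per hour, 90 seconds of reading per script, 10 marking criteria). The function and parameter names are illustrative assumptions and do not come from the report.

```python
def seconds_per_criterion(scripts_per_hour, reading_seconds, criteria):
    """Return the average time, in seconds, left for each marking decision.

    All inputs are the rough estimates quoted in the text, not measured data.
    """
    # Total time available for one script, in seconds.
    seconds_per_script = 3600 / scripts_per_hour
    # Time remaining once the script has been read.
    decision_seconds = seconds_per_script - reading_seconds
    # Spread the remaining time evenly over the marking criteria.
    return decision_seconds / criteria


# 10 scripts per hour, 90 seconds of reading, 10 criteria.
print(seconds_per_criterion(10, 90, 10))  # prints 27.0
```

Changing any one input shows how sensitive the estimate is: a slower reading time or a faster marking rate shrinks the time per decision further.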
In summary, the NAPLAN essay fails to be a valid measure of any serious formulation of writing ability, especially within the context of its current uses. Indeed, NAPLAN’s focus on low-level mechanical skills, trivialisation of thought, and its overall disjunction from authentic constructs of writing may be partially responsible for declining scores in international tests.
There should be an impartial review of NAPLAN, commencing with special attention being paid to the writing essay, leading to a fundamental redesign of the essay and the reconsideration of its uses. Such a review should also consider the way in which NAPLAN is administered and marked, and its current disconnection from a rich curriculum and from the specific and diverse teaching programs that children experience in classrooms.
Such a review should be an inclusive process encompassing all elements of the educational and academic communities with the key focus areas identifying the particular needs of students, how they have progressed in their class with the programs they are experiencing and how systems, jurisdictions and the nation can further support their intellectual growth and futures.
A principal emphasis in this review should be to promote alignment of the curriculum, classroom pedagogy, and all forms of assessment; that is, to test to the teaching. If students consider classroom exercises and outside assessments to be indistinguishable, and both reflect the curriculum, then assessments reinforce teaching and learning rather than possibly subverting them.