Paul Walker, Murray State University
(Published: February 28, 2013)
There is only intuitive knowledge. Deduction and discursive argument, incorrectly called examples of knowing, are only instruments which lead to intuition . . . Intuition is the presence of consciousness to the thing.
- Jean-Paul Sartre, Being and Nothingness 240
One week, I happened to read both Paul Lynch’s explication of composition’s “apocalyptic turn” in College English and Lewis Lapham’s Harper’s essay on the importance of historical understanding. Were I not already thinking of writing assessment as a continuous response to an historicized disciplinary past, a threatened present, and an alternatively dire and improved future, these essays would have gone straight into the well of interesting reads. However, Lapham’s evocation of an Arab proverb—that we have less reason to fear what might happen tomorrow than to beware what happened yesterday—combined with Lynch’s echoing of Bruno Latour’s theories regarding composition’s purpose in an era of future anxiety—together provide a prescient frame for an evolving claim: in terms of writing assessment, the apocalypse has arrived, and its arrival, confirming the proverb, has been facilitated by the actions of those who have reason to care about it the most: theorists and practitioners of composition.
What a thing to say! I know. Hyperbolic, of course, since the use of apocalypse requires it never to arrive, always just ahead of the present, giving us time to fear it and change something so it is ever delayed. The present situation of writing assessment is nevertheless dire; even while more and more of us are engaging in its scholarship, its practice has grown increasingly constrictive because of adherence to a certain methodology, which, while useful and beneficial in several ways, has produced too much trust in an ontological fallacy – that an object exists independent of the subject and situation that makes it an object, which then allows it to be reducible and quantifiable.1 This methodology is calibrated, multiple-reader holistic scoring of student papers. I recognize that numerous contextual and situational variations of this methodology have been developed by teachers and scholars who have a genuine and sincere interest in fair and accurate assessment of student work. Fairness and accuracy are presumed more likely because of the methodology’s cultural and social science-based trustworthiness: readers create a rubric (or use an existing one), calibrate with each other in norming sessions, and check each other by a system of agreement of at least two readers within an acceptable quantified range (usually 1 point on a 4- or 6-point scale). Although researchers acknowledge the importance of contextual adaptations, there exists a conviction that calibrated holistic scoring, as a methodology, produces valid and reliable results. Even as portfolios have rightly overtaken single-document assessments, calibrated holistic scoring’s role remains essentially the same (e.g., in White’s “Phase 2” and Haswell’s “Tier 2” variations of portfolio assessments).
The general methodology’s prevalence and assumed legitimacy cause many of us to take for granted its derivative method’s position as the most reliable and valid form of direct writing assessment of artifacts, refraining from challenging its continued use. In so doing we also view the singular method as comprehensive, for even suggested variations do not seriously question the necessity of direct writing assessment. Our acceptance of direct writing assessment, along with our legitimization and continued use of calibrated holistic scoring have proven extremely influential, and harmful, to our K-12 colleagues, for whom our attempts to wrangle a valid and critical writing assessment methodology have arguably backfired because we have neglected the expertise inherent in teaching.
Teachers as Expert Assessors of Learning
We care about our fellow teachers because scholars of writing and rhetoric are first teachers of writing and rhetoric. And implicit in teaching are indirect and direct attempts to identify whether what one purports to teach is learned by the student. Teaching without assessment of learning cannot be labeled teaching, but rather lecturing, preaching, or merely speaking. The present situation, however, is a result of the separation of assessment from teaching, and in particular, the distinction between writing assessment and writing instruction. The separation has occurred in a few ways, most recently by the reaffirmation of writing assessment as its own field (Yancey, O’Neill). This categorization creates a taxonomy for burgeoning scholarly effort, but it also threatens the whole act of teaching. Similar attempts to separate assessment from teaching have been built on the claim that technology, whether computers or rubrics, can minimize both the effort of teachers and the missteps that teachers supposedly make—removing the responsibility of assessment from teachers for their own sake as well as the students’. The attention to assessment has produced useful and interesting scholarship, but its popularity as a specialization has overshadowed implicit aspects of teaching in order to meet the objectives of a normative culture that conflates fairness with standardization and confuses learning with the regurgitation of trivia. Our involvement in this development is our dangerous past, our unwitting akrasia. By first adopting (grudgingly) psychometrics in writing assessment, and then continuing to emphasize validity and reliability even as our understanding of the complexity of writing increased, we have acted against our own best judgment.
Whether a continuation of writing assessment’s third wave (Yancey), or an actual fourth wave, theories in the last decade insist on contextual applications of assessment models, dismissing general rubrics, and moving towards more directed self-assessment by students (Broad, Gallagher, Huot, Lynne, Royer and Giles, Wilson, among others), perhaps culminating in an ecological model of assessment proposed by Elizabeth Wardle and Kevin Roozen. Our discipline accepts that an assessment plan, if adopted, must emerge from the context for which it is being used to be fair, accurate, and appropriate. Unfortunately, many writing assessments are imposed, rather than developed, and we share in that blame. Our research and efforts to produce appropriate and fair writing evaluation stemming from our classroom practices have been very visible to policymakers (Broad 2). We are implicated in current trends because we have publicly affirmed “best practices” in assessment that, even when insisting that context matters, are method-heavy, meaning they are easily misused to contradict that guideline. For institutions under pressure to “show results” immediately, the quickest and most inexpensive move is to ignore context and adopt a “proven” method.
Kathleen Blake Yancey’s historicizing of the three waves of writing assessment illustrates the tug-of-war of theory, practice, and outside pressures that permitted “a kind of theorizing that is congruent with a composition studies that is art and practice” (1201). When writing assessment was first imposed as a necessity, we resisted, then adopted, the terms of psychometricians, continuing even as reliability and validity came under question as our understanding of writing’s complexity and contextuality increased. But even as we sympathize with our K-12 colleagues who are currently feeling the pressure of a national curriculum and on-demand writing tests, we ignore how our actions over the past half-century, especially in the last two decades, have inadvertently made the K-12 situation possible. The development and linkage of the Common Core State Standards and test-performance teacher accountability would not be possible without “proven” methods for assessment. And having such assessments has led to widespread claims that hiding in every school are “bad teachers,” whom only standardized assessments can find. We legitimized rubric-based holistic scoring, and although we cringe at the ways that these methods are carried out by machines or by governments and for-profit education companies, we have not retracted our support of the general method and its several related practices and variations, still asserting that one assessor/teacher is not enough. In that vein, we have failed to recognize that while methods are the result of teacher expertise in practice, the calibrated holistic method itself is a visible artifact that brings the false and scientistic assurance that it represents a real Thing, in Bruno Latour’s terms (Lynch; see also Petruzzi “Articulating” and “Convalescence”). Meanwhile, our field pushes for a “unified field of assessment” (O’Neill) that rhetorically ignores how conflict generates ideas, while agreement can subsume innovation.2
The past and current reliance on one general methodology coincides with the degradation of trust in teachers. In a society in which standardization is offered as the primary solution to perceived unfairness, the expertise of individual teachers is systematically undermined. In K-12 public education, teachers are not trusted to develop a curriculum, but deemed capable enough to teach a prescriptive, “expert-consulted” curriculum intended to prepare students for college or careers. Nor are teachers trusted to handle the periodic assessments of that aim, for they are now relegated to the role of managed workers, occasionally “given” freedom to innovate as long as results remain adequate. The existence of high-stakes assessments confirms the lack of trust in the professional expertise of teachers, but it is the increasing removal of any assessment from a classroom teacher’s control that continues to undermine the teacher, who is encouraged, either directly or subtly, to relinquish innovative or successful portions of her teaching to meet the requirements of outside assessment. In Kentucky, where I work, after more than a decade of conducting portfolio-based writing assessments in K-12 schools, the legislature switched to on-demand writing tests as the only required writing assessment. Most teachers supported the elimination of portfolio assessment because they were burned out helping students complete their portfolios; in many cases, they felt the portfolios represented teachers’ writing more than students’ writing because of the accountability pressures. But many of those same teachers now say that they must spend too much time teaching students how to do well on on-demand tests, even while knowing that on-demand writing is useless except for testing. In other words, no matter what method of assessment is used, teachers are encouraged by administrators and forced by accountability measures to teach to the test.
Holistic scoring originated as a method for avoiding test-focused measurement of writing. Yet although it liberates the rater from strict analytical judgments of writing, it still adopts the reliability and validity emphases of empiricism, the use of multiple readers, and the goal of a “true score.” These things diminish the subjectivity of both the writer and rater while discounting the teacher’s deliberate and intuitive practices. A lone teacher’s assessment is valid and reliable because both the teacher and student are active parts of the complicated contextual factors in the classroom. The teacher has developed and communicated objectives and outcomes, and has written the assignment prompts. The grade on the student paper indicates a contextual assessment of a contextual assignment. To dismiss teachers’ evaluations supports a distrust of their expertise in favor of supposed fairness, but as William Smith noted:
It should not shock anyone that teachers—who know the student, the human, as well as they know the student’s writing—do not give the same “grades” as raters who don’t know the student. (“Importance” 316)
Classroom teachers’ perceptions of student papers are influenced by the numerous swirling factors involved in “knowing” students and themselves—the very things that make writing assessment subjective. Peter Elbow notes that teachers already “take account of the complex context for writing: who the writer is, what the writer’s audience and goals are, who we are as readers and how we read, and how we might differ in our reading from other readers the writer might be addressing” (192). The denial of these nuanced impressions, which are enhanced by contextual experience and knowledge, is the objectification of the process. Under the guise of objectivity, those impressions are supposedly removed from the assessment process, which, if even possible, disconnects the act of writing from the evaluation of it. Any writing teacher will value certain elements, and as Broad’s dynamic criteria mapping and Mike Markel’s criteria development show, those elements will appeal to individual teachers differently. Holistic scoring allows for subjective reactions within a negotiated collective range, but teachers and raters differ because the teacher is validating assessment by improvement in students’ work, while the rater is validating the assessment by its abstract use and replicability.
When Yancey anticipated future waves of writing assessment theory, she mentioned two possibilities relevant here: a focus on “individual assessment as interpretive act” and a focus on “the kinds of expertise and ways that they can construct and be represented in writing assessment” (1201). The reiteration of teaching and assessment as inseparable combines Yancey’s two possibilities, beginning the reestablishment of teachers as expert professionals. The direct evaluation of our students’ artifacts, whether portfolios, timed essays, or formal writing assignments, represents the central labor and driving theoretical frame of our discipline. Through this laborious task we increase our ability to intuitively recognize quality writing—to know it when we see it. We typically utilize that growing ability to engage in reflective teaching and teacher research to improve our curriculum, methods, and responses to students.
In this way, we trust ourselves as growing or established experts but understand that expertise is not a culmination but continuous increased recognition of our own capability and incapability. We sense this because we encounter anew each semester individual students who are unique and complex learners, exhibiting both familiar and unfamiliar problems. Despite this understanding, our cultural and institutional circumstances fetishize method over context, complexity, and expertise, so that calibrated readers and general rubrics are chosen deliberately to ignore learner complexity and require us to validate our judgments of student work. When we ourselves seek validation for improvement in our practice through reflection, mentoring, or conversations with colleagues, we improve, but when a process or system presumes that such validation is necessary for accuracy, the sense of distrust may cause us to doubt our individual judgments, not because we are unsure of our own ability, but because we are faced with someone else questioning our expertise. If we become used to looking elsewhere for validation (as Walker Percy discusses in "Loss of the Creature"), we underestimate the value of our individual expertise gained from our own credentials and hours of labor, never resisting the negation of difference. Studies of expertise show that a combination of knowledge, imitation, and many hours of deliberate practice produce expert performance (see Ericsson “Influence”; Chi, Glaser, and Farr; Ericsson, Krampe, and Tesch-Römer; Feltovich, Prietula, and Ericsson). By not adequately acknowledging our own deliberate practice in our classrooms, independent of but informed by a disciplinary discourse community, we distrust our own intuitive ability to judge writing quality within our domain, thereby limiting its necessary complexity.
Akrasia, in my use of the term here, should not be interpreted as presuming that composition scholars purposely acted against their judgment to make possible the misuses of holistic scoring. I surmise that the individuals and groups who defended writing assessment by legitimizing holistic scoring had no way of knowing how it would be coopted and perverted as part of large-scale, high-stakes assessment of student writing performance and teacher accountability. However, since only a few scholars have been able to offer radical alternatives to the “best practices” of direct writing assessments in the past two decades, reflection reveals we are entrenched in the very institutional habits we otherwise disdain. We can tell testing companies that machines can only count (Perelman), but our understanding of our own past should spur us to also tell them that, more importantly, neither machines nor calibrated raters are able to interpret the rich context of the teachers’ classrooms, from which papers emerge and in which they are situated and where they must remain to be a real and meaningful object. We must do more than decry the use of the method we enabled; we need to offer a non-radical alternative for reversing the tide: to legitimize our intuitive and deliberate expertise as teachers and emphasize the unique and essential roles teachers play in facilitating and measuring student learning.
Relying on Individual Intuitive Expertise
Most expertise scholarship addresses doers of visible things: chess players, athletes, musicians, etc., rather than scholars whose expertise is defined by what they know. Writing evaluation expertise bridges knowledge and practice, and scoring guides are commonly suggested for speed and consistency, but they also draw strong opposition. Robert Connors and Andrea Lunsford, for example, explain that rubrics can overshadow helpful comments because the rubric-based vocabulary hinders honest communication to students, constraining individualized, contextually appropriate explanations. This recognition raises a central point for this essay: rubrics do not appear out of thin air, but are thoughtfully developed by writing teachers. However, methods derived from our deliberative expertise, once created, have the potential to diminish the application of our intuitive expertise that develops from classroom teaching and thoughtful exercises such as creating a scoring guide. Hubert and Stuart Dreyfus posit, “no system based upon heuristics will consistently do as well as experienced experts, even if those experts were the informants who provide the heuristic rules” (109).
In fact, Dreyfus and Dreyfus view individual intuition as essential for experts to perform beyond standardized measures such as rubrics. Heuristic-based competence in completing tasks, they suggest, can be met by novices or machines, but experienced humans can become proficient and expert at tasks because deliberate experience teaches experts that competent problem solving based on abstract and context-free rules is slow and detached from the actual nuanced situation. Novices learn from those rules, but an expert is slowed down by rule-based constraints. According to Dreyfus and Dreyfus, proficiency and expertise “are characterized by a rapid, fluid, involved kind of behavior” (27) exhibited through an immediate perception of a situation’s similarities with past experience—in other words, intuition. In the practice of calibrated, multiple-reader scoring, the collective expertise of calibrated scoring groups may produce reflective and engaging discussions about writing, culminating in an expertise-driven and abstract guide for judging quality. On the other hand, the findings of Dreyfus and Dreyfus suggest that a group of experts in a decontextualized scoring situation subvert their expertise to adhere to task competence. The expertise that provides the “rapid, fluid, involved” action is constrained by the necessity of conforming to colleagues. In sum, calibrated scoring deliberately diminishes individual intuitive expertise even while developing it.
Trust in teachers’ individual expertise forms a defense against standardized, decontextualized, and computer-scored writing assessments by emphasizing professional knowledge and contextual understanding. Perhaps it is naïve to say to policymakers, “Trust us.” But we, as a discipline, should not ignore the more dangerous naiveté of pseudo-accurate quantitative writing assessments. We are not so resolute that we cannot retract our approval of decontextualized calibrated scoring. While our training and experience are products of collective learning and our involvement in a community, they eventually make us, individually, connoisseurs of student writing: not technicians consistently following the community’s rules as we once did, but experts who intuitively gauge the quality of student writing through knowledge and experience within a specific domain (Eisner). The most common and accepted assessment method is at odds with our connoisseurship and our intuition that Dreyfus and Dreyfus identify as hallmarks of individual expertise. Their work identifies in experts a “deliberative rationality,” which “does not seek to analyze the situation into context-free elements but seeks to test and improve whole intuitions” (36). Inferring context-free rules from expert performance is useful for novices and advanced beginners and ultimately useful for attaining competence. But in order to advance from competence to proficiency and then to expertise, one stops applying abstract rules to situations. Instead, those situations, as mentioned above, repeated again and again produce an intuitive sense. This deliberative rationality, not limited by the rationality of rule making, is how experts connect concrete situations and their responses to the similarities or differences exhibited.
Thus, a trust in our expert judgment encourages reading student work more as a reader (see Sommers; Wilson), allowing for the “complex context” of the student to blend with the complex experience of the teacher, resulting in evaluations that connect as many elements as possible.
As mentioned, the educational and cultural importance of assessment for accountability has lately increased, and the “validity of global estimations” (Williamson and Huot 19) from limited data has strengthened. Holistic scoring methods and the increased use of portfolio evaluations have resisted more “technologically efficient” and analytical methods of writing assessment, allowing the local expertise and context of writing programs to play a part in evaluating student writing. Yet too often it seems that assumed objectivity and inter-rater reliability inherent in the method are the primary measures of success and validity of assessments, rather than the identification of tangible ways to improve what was assessed. The perception that qualitative methods produce results too “messy” for administrators, policymakers, and accrediting bodies ignores the concept of assessment as a natural part of teaching—to help teachers identify ideas or concerns they did not anticipate in their planning. I fear that the valuable collaboration and reflection that occurs in calibration sessions are being subverted by the ostensible need for quantitative, “value-added” results, results which are then used to justify further offenses against our subjective expertise3 and our insistence that writing is a complex phenomenon.
Because of that fear, I emphasize the role of expertise and intuition in writing assessment. In doing so, I am aware of and understand the varying levels of experience and credentialed expertise among writing instructors, and I share the concerns of those who are nervous about impressionistic intuition because of the range of knowledge and experience in our discipline. Nevertheless, my exploration of expertise scholarship coinciding with trends in K-16 has elevated the need, in my mind, for a theoretical defense of disciplinary expertise in our field. Individual expertise has typically been a target of profit-seeking enterprises that value efficiency, and the dismissal of teacher-classroom expertise in favor of accountability for mandated tests undermines teacher professionalism because of the presumptions alluded to earlier: common standards are necessary, common tests and curricula help students and teachers, and cultures of constant evaluation improve teaching and learning. As these problematic presumptions move upward to universities in the form of CAAP and CLA exams, or triangulated assessments of Quality Enhancement Plans (QEPs), the issues of contingent instructors and the perceived burden of teaching composition will worsen. Increased standardized assessments should impel us to act immediately to prevent our assessment practices from being co-opted by policymakers unfamiliar with the history and contextual enhancement of those practices. However, heeding Edward White’s aphoristic warning to do our own assessment or someone else will do it for us should not result in assessment methods that resemble assessments designed by non-writing-instructors.
Hence, my call for intuitive expertise as a defense and alternative philosophy of assessment. In Blink, Malcolm Gladwell asserts that experts make “good” decisions based on the briefest look, taste, or smell of an object or situation. Gladwell supports his premise with examples relating to art, music, and taste testing, but his review of scholarship suggests that the idea would apply to the assessment or evaluation of writing. We make quick decisions in managing classroom discussions and situations, and we can look at a student paper’s first few paragraphs and intuitively know how to respond in a constructive way. The question of trusting that “knowledge” in high-stakes (invented or real) situations has guided every reaction to assessment demands. Smith and Bob Broad both imply intuitive expertise by showing how “course-taught expertise” and “teacher’s special knowledge” overcome and undermine attempts to make assessments objective in the social scientific sense. Rather than arguing endlessly whether or not such intuitive knowledge should be trusted, I think the more useful discussion is how our intuitive knowledge can help us overcome our past – avoiding akrasia by halting any further devaluing of teacher expertise. I would rather see a few eccentric decisions resulting from trusting teachers than see our discipline compromised for two-decimal-point accuracy. While there are those who believe that we can keep “the traditional emphasis on responsible assessment (reliability, validity, sound practice)” even while insisting “that what happens in the classrooms matter” (Condon “Reinventing” 167), I think making classrooms matter raises hard questions about how responsible assessments that rely on a calibrated, holistic method affect what actually happens in the classroom.
The Dilution of Expertise
Charles Bazerman states that he distrusts the term expertise because it “tempts us to think of a person’s skills or accomplishments in a domain as a single coherent package, perhaps even closely congruent to the expertise packages of others accomplished in the same domain” (131). Viewing expertise as a culmination, or in Bazerman’s terms, a package, misrepresents how it is gained and exhibited. Michael Carter suggests that expertise needs a broader definition, seen as a continuum, because it includes a mix of general and local knowledge (274). I agree that expertise is variable and dynamic, but favor a definition that acknowledges general knowledge but still depends more on the domain-specific knowledge gained through deliberate practice rather than learned heuristics. In fact, the idea that general knowledge makes up a large portion of expertise is what makes Bazerman nervous: general knowledge makes it bureaucratically easier to justify offering what Broad terms “fast-food style” (2) assessments to all students, dismissing the necessity of focused experience and deep knowledge within a domain. The domain-specific skills and knowledge a person attains are what make that person an expert. Expertise and expert performance must be domain-limited, or they cannot be meaningful terms. Yet “packaging” is a common extrapolation in expertise studies because of a cultural eagerness to identify commonalities that could produce proficiency faster, which explains why Michelene Chi sees domain-limitedness as a problem in expertise training. However, Feltovich, Prietula, and Ericsson, reviewing major studies in expertise, note that “the discovery of the complex structure that executes expert performance and mediates its continued improvement” has a clear, negative implication: “[expertise research] has pretty much dispelled the hope that expert performance can easily be captured and that the decade-long training to become an expert can be dramatically reduced” (61).
This immunity to “packaging” is what makes expertise worthwhile in a system built on, and reliant upon, a community of specialized experts.
Still, the continued attempt to package expertise, or apply it outside limited domains, has diluted its meaning, affecting its actual use. Cheryl Geisler explains how expertise has been professionalized because of its “guaranteed” achievement through schooling. Because society trusts the schooling process and the credentials it bestows, any professional is assumed to be an expert and “the source of their professional competence is in the academy” (74). However, the credential alone, which represents adequate laboratory performance of skills and the assumed acquisition of knowledge, is not professional expertise, because only the focused application of that knowledge over and over again can produce expertise. For Feltovich, Prietula, and Ericsson, the “negative” findings regarding expertise’s transferability and generalization led them to propose thoughtful deliberate practice activities to show students how to gain expertise after schooling (61). Contrary to this individualized proposal for the combination of deliberate practice and knowledge, Bereiter and Scardamalia look toward “an expert society,” in which expertise does not belong to an individual, but to a “knowledge-building community,” such as a classroom or other “functioning social unit.” This idea of a knowledge-building community replacing individual expertise is at the core of some writing assessment methods, with the presumption that multiple-reader reliability and fairness ensure validity for both in-context and out-of-context assessments of student performance. The greatest value of rater-calibration sessions is the collective discourse and social construction of knowledge, but the assumption that a procedure to generalize such built-knowledge can supplant the knowledge and experience of individual experts within a specific domain is contrary to scholarship on the acquisition of expertise and expert performance.
Nonetheless, our society is skeptical of expert intuition and accepting of quantifiable methods for measuring quality—and certain academic fields are probably the most ardent about such measures. Hence, everything, including the number of years it takes to become an expert, has been quantified, even though scholars of expertise define experts as, among other indicators, domain-specific and context-based. The accepted figure for attaining expertise is 10 years or 10,000 hours,4 which is probably realistic in the sense that if someone has done a specific task—or deliberately practiced something—for that period it would be difficult not to be an expert at it. The number intimidates the novice and validates the expert who has specialized in something for years. The reductive quantification of a long and complex process seems to actualize expertise, giving us the idea that there may be shortcuts, now that its entirety is “known.” This has happened in the instruction of writing. Anne Beaufort believes that we can “contextualize writing instruction more fully” and “teach for [expertise] transfer” if “we articulate these knowledge domains and apply them to shaping curriculum” (17). However, such articulation and application of “mental schema” eventually decontextualizes expertise to a broad curricular approach, in contrast to Dreyfus and Dreyfus’s highly contextual definition of expertise. Recognizing commonalities of heuristics in our thinking assumes that the teaching of such heuristics replicates in students what we have discovered ourselves through a complex and possibly arduous process. This contradicts Geisler’s package deal that generalized school curricula guarantee expertise by credential. To fulfill the guarantee, a first-year composition course, for example, must enable not competency, but mastery of writing skills.
Before I began this exploration, I would not have found anything disconcerting in the following excerpt from a colleague’s first-year composition assignment prompt, but now it seems misleading:
The research paper should reflect your mastery of the skills of analysis, synthesis, argumentation, and audience-appropriateness. Until now you have been doing so much reading about your topic, you’re now an expert. As such, your research paper should be modeled after the journal articles we’ve all been reading.
Of course, Andrea Lunsford and John Ruszkiewicz provide similar encouraging words to students in their popular textbook Everything’s an Argument: “Almost all of us know enough about something to merit the label expert” (183). College freshmen, they claim, are likely to be experts on high school, but I think that oversimplifies the numerous specific tasks that make up what “high school” means. Sommers and Laura Saltz’s identification of the contradiction of students writing as experts despite being novices illustrates the problem of expertise as a diluted term; while mastery and expertise are commonly used to describe students’ progress as their writing improves, the more appropriate terms are competence and proficiency. The Common Core State Standards for K-12 also misuse mastery where competence and proficiency are more appropriate. In the context of any English course, to presume that a semester or two will produce mastery or expertise on one topic or in all types of academic writing domains indeed fits with the “literacy guarantee,” but it is simply false. Even speaking of students as potential expert writers in the short term seems disingenuous. Carter writes:
Every day in our composition classes we answer the questions, what does it mean to be an expert writer and how do writers become experts. The goals of our classes indicate what we think expertise in writing is, and the way we teach indicates how we think writers achieve expertise. (265)
Talking about first-year composition students becoming expert writers weakens the meaning of an actual expert writer, though that does not mean students cannot already be proficient, or even, in some way, farther along the expertise continuum than expected. There may be rare cases of exceptional ability, but for most of us, significantly more time is needed in a specific discipline or practice to achieve expertise. Imitating an expert, on the other hand, as Sommers and Saltz describe, is an old and useful developmental strategy for achieving competency. Still, the extensive distance between novice and expert must be baldly evident to most students, and their struggles to cross that distance might stem from not recognizing that competency is adequate and proficiency the appropriate goal for their undergraduate college years.
When mere competency is not adequate for a task or career, as in the case of university instructors, we must also be careful of the presumption that the holder of a Ph.D. is automatically an expert. No matter where we are in our career, most of us would acknowledge that in the first years out of graduate school we are still engaged in deliberate practice. If a new assistant professor doesn’t have enough experience to be an expert, the graduate assistant has far less. In many of the published assessment studies, graduate students and young faculty are the evaluators, so calibration seems appropriate if for no other reason than to enable proficiency in the raters. But the usefulness of calibrated holistic scoring for training inexperienced evaluators, or for norming evaluators from a different context (as in AP exam reader calibration), should not be justification for its applicability to all direct writing assessment, or even for the necessity of direct writing assessment itself. Petruzzi, who provides a deeper hermeneutical analysis of the underlying assumptions and practices of assessment, shows how the general method of calibrated scoring contradicts ontological hermeneutics by extricating supposed objects from the subjects and context that create the object as such. Thus, because objectivity is an illusion created by the methodology rather than the context, calibrated scoring cannot be assumed to be a more appropriate or accurate method than individual expert judgment. Within such an ontological frame, individual, subjective evaluation by teacher-experts who are involved in the creation and implementation of a writing assignment is at least as fair and accurate as any assessment requiring evaluation of student papers removed from the classroom.
After all, if a group of composition instructors have been teaching multiple sections for a significant amount of time, a direct writing assessment should not be necessary to determine the average ability of their first-year students—we can simply ask the teachers.5
Such a solution is oddly radical. Granted, our culture is empirical, and not all teachers have sufficient experience to merit expert status. However, whether or not we have taught writing in the classroom and evaluated student papers long enough to be called experts, all English instructors have ever-increasing experience in aspects of teaching. With each paper evaluated or class taught or workshop attended, our competence increases. That increased competence, which develops into proficiency and at some point professional expertise, includes an intuitive connoisseurship—a sense that arises out of general knowledge, local knowledge, and focused experience gained within a specific context (see also Carter 269). As Bazerman insists: “Unlike such socially simplified games as chess, which create a single universal playing space that in effect excludes all concerns, history, and social dynamics other than those that specifically occur over that board, rhetoric is always implicated in social complexity” (136). We should dismiss any talk of unifying the field of writing assessment because, unlike chess, every institution—as well as every program and classroom—has what could be called different rules and different pieces that deny unification of methods or theories. Unification of effort, and sometimes theory, helped legitimize the discipline of composition and rhetoric, but its value can also be detrimental, which is why akrasia both applies and does not. A unified effort to establish best practices was judged an appropriate response to earlier threats to writing instruction, but we do not know what a non-unified approach would have produced for the present time.
Yet our field has known all along that assessment is a natural part of teaching,6 and therefore should be as varied as the personalities of teachers. But as realists and pragmatists, we reactively and proactively created a seemingly sound and responsible general method based on accepted empirical practice. Even as this method continues to be questioned by more composition scholars, out-of-class assessments will likely become more entrenched. To slow that entrenchment, our emphasis, now radical, needs to be that no method of assessment represents a better practice than another, and we must continue insisting that any process and method be developed and administered by teachers in the only context where the results can really matter: the classroom. The general method’s inferred effectiveness and efficiency allows administrators to pressure faculty to adopt established “best practices” with limited consideration of local expertise and context. This restricts trust in individual teachers, impeding both their creativity in teaching/assessing and their ability to persuade policymakers to accept the qualitative, contextual results that emerge from natural, reflective, teacher-facilitated classroom assessment.
Toward Connoisseurship of Student Writing
Given the work in expertise scholarship and the for-profit trends in educational assessment, I believe that to not embrace our individual intuitive expertise in teaching/assessing is to allow writing assessment to be packaged and delivered in ways beyond our professional control. We know that writing is a complex activity and resists procedurization. When designing and conducting assessments, we must consider the effects they may have on students, teachers, administrators, and the public, and such forethought should produce innovative and unique methods. If a direct writing assessment that purports to assess written communication in order to enhance its instruction uses a holistically scored rubric and concludes that first-year students are writing at a level of 3 on a 6-point scale, we are left wondering how to improve instruction. Useful assessments should involve many people, but they should not be tedious for participants. Assessments should involve activities that would be a normal part of teaching, and in all cases should never mechanize the processes of teaching and learning.
Yet high-stakes testing of writing aims to reduce writing to a mechanical procedure whose performance can be measured by a score alone. Emphasizing our connoisseurship of writing, based on our expertise and intuition in teaching and evaluating writing, is one way to resist the encroachment of reductive definitions of writing and writing ability. Furthermore, such emphasis limits the possibilities for computerized writing assessments. Prioritization of the calibrated, holistic scoring method can imply that it doesn’t matter where the writing came from or how it was handled through the “machine,” purposefully casting the subjective connection between the writer and the paper as problematic. But that subjectivity, the connection between our labor and our visible work, is what retains our humanistic defense against complete automation.7 One of the tendencies in “improving” human ability is to quantify or procedurize the attainment of a skill—which for writers is a somewhat loosely defined writing process. Those who write a lot, however, know that the act itself overtakes the process, making it difficult to break down one’s own process into anything more procedural than Donald Murray’s squiggly circle. Messy individual processes do not work for a textbook, or for a measurable curriculum, and so we work with others to scaffold what we have ourselves gained from experience and knowledge, suspending, for the sake of order, our discomfort with defined steps and procedures that supposedly simplify the learning process. Even though our own experience and knowledge contradict the organized curriculum, we are anxious about stepping beyond the academic safety of acceptable practices, including assessment. But as Richard Haswell explains, only after many years of adopting the “sacred cows” of statistical reliability and efficiency that we borrowed from psychometrics do we recognize that perhaps we are not actually after reliable or efficient assessments (“Automatons” 78).
However, to trust intuitive connoisseurship in individuals can conflict with our trust in science, which often claims no individual bias.8 The art world provides an interesting example: David Grann describes how a discovered painting, “La Bella Principessa,” purported to be by Leonardo da Vinci, pitted the authentication method of fingerprint analysis against the traditional art connoisseur’s opinion.9 He writes:
The desire to transform the authentication process through science—to supplant a subjective eye with objective tools—was not new. During the late nineteenth century, the Italian art critic Giovanni Morelli, dismissing many traditional connoisseurs as “charlatans,” proposed a new “scientific” method based on “indisputable and practical facts.” (56-7)
“The desire to ‘scientificize’ connoisseurship,” Grann continues, “was . . . as much about the desire to democratize it, to wrest it out of the hands of art experts” (57), because the general public was “suspicious” of art connoisseurs.10 The history of assessment in education is full of methods claiming to eliminate the incongruities and idiosyncrasies of disciplinary experts (see Patricia Ericsson and Huot, Shermis and Bernstein, Williamson and Huot). Anne Herrington and Charles Moran note that machine-scoring of essays sends the message that “human readers are unreliable, quirky, expensive, and finally irrelevant” (497). Similarly, Grann relates the story of a Kansas couple told by an art dealer that their prized portrait could not be a Leonardo. In the couple’s resulting lawsuit, they “argued that connoisseurs offered only ‘air-spun abstractions and nebulous mumbo-jumbos’” (57). That idea, if not the same terms, is evident in the lack of institutional trust in the inherently subjective—even quirky—writing evaluation by individual teachers, couched in calls for more efficient and objective means of judging student writing. Even the judge in the Kansas couple’s case, Grann notes, warned jurors to be wary of experts who relied on means “too introspective and subjective” (57).
The palpable distrust of non-measurable individual human judgments likewise exists in educational reform efforts to increase accountability, efficiency, and standardization. Our field has resisted by insisting on human scorers, but as we have seen with the K-12 Common Core State Standards, even this practice is tenuous when non-teachers (especially corporations) are involved in assessment policy. For K-12 teachers, disciplinary expertise is secondary to what administrators and state legislatures insist on doing for the sake of testing. And there are plenty of examples of similar pressures at the university level. Therefore, I think White was only partially correct when he called the increased reliance on responsible holistic scoring a “triumph of the human” (“Holistic” 88). While holistic scoring legitimized writing assessment and, according to White, helped disciplinary research embrace multiple demonstrations of writing and thinking ability, the triumph is incomplete because writing assessment was not protected from neo-empirical scientism (see Bullough; Denzin and Lincoln; Lynne; Petruzzi). Because of its inability to prevent the current situation, holistic scoring is an unfortunate instance of akrasia in our field. Whether our efforts now will produce additional problems later should be a major consideration. The Framework for Success in Postsecondary Writing is an example of a thoughtful theoretical approach to other trends in writing instruction, but we must push to prevent further impositions on teacher autonomy and professionalism. As long as we accept the necessity of out-of-classroom assessment and offer up our legitimized method, assessment will become less and less meaningful to our classrooms. Instead, we must embrace the complexity of connoisseurship and the sometimes messy world of intuitive expertise to avoid a complete and unnecessary disjunction between teaching and assessment of student learning.
Embracing the Complexity of Connoisseurship
Haswell claims, “the more complex tasks are the greater the temptation to simplify them” (“Complexities”). Evaluating student writing has been one area to which many simplifying methods have been applied. The goal of most of these, however, has been to reduce rather than to embrace the complexity of writing instruction. Perhaps it is natural to look for a simple answer, and every assessment method claims a form of simplicity, whether efficiency, consistency, fairness, or speed. All assessment methods compromise something to get other things right. That is the problem with “the field” of writing assessment as something separate from the teaching of writing. My exploration of expertise and intuition is not intended to replace the thoughtful feedback and reflection on practices that we believe necessary for student improvement. Such feedback hones our ability to identify quality writing, which increases our ability to explain why. According to Gladwell, “our unconscious reactions come from out of a locked room, and we can’t look inside that room. But with experience we become expert at using our own behavior and our training to interpret—and decode—what lies behind our snap judgments and first impressions” (183). As shown in expertise scholarship, our knowledge and training give us the vocabulary to explain our initial impressions, which recursively become more intuitive because of the many, many times that we have decoded our own impressions relating to a specific task. The paradox is that even when we recognize that our snap judgments can be trusted as much as the processes we have been trained to trust, we still place more trust in the processes that led to our intuitive expertise.
To fully embrace connoisseurship and perhaps prevent future realizations of misjudgment, we must raise hard questions now about the conformity of experts to a scoring tool that partly suppresses expert intuition. At my institution, a holistic scoring team was organized to judge writing quality across disciplines for our QEP. During a norming session, one of the student papers produced a wide range of scores, and in the follow-up discussion the group discovered that raters who hadn’t taught the course, and therefore hadn’t read one of the texts the student explicated, had given higher scores than those who had recently taught the course. The student’s interpretation of the text was at odds with the teacher-raters’ interpretation, and this lowered their estimation of the student’s writing. For those who hadn’t taught the course, the interpretation was irrelevant to the communication of the idea, which was what the scoring guide valued. Acknowledging that expert opinion and normed opinion differ because of our individual experience means that we may need to challenge some of the conventional wisdom about the need for “objective” assessment, along with its accepted methods and its requirement of empirical visibility.
If replication remains a permanent requirement for acceptable assessment, I am not sure there is a bridge between it and intuition. But confidence in our expertise, and the accompanying recognition of all teachers as true professionals, can elevate the importance and value of subjectivity as a diversifying and human-centered factor in education and, by extension, educational assessment. As connoisseurs, we can recognize more confidently that we “know” quality writing within the specific and complex contexts where we teach and research. Our knowledge of writing increases because of our passion for helping students improve their own writing, and like connoisseurs of art, music, or wine, to know quality means that we also protect quality. However, rather than acting as snobs in our ivory tower, waiting for quality papers to rise from below, we protect quality by facilitating the best conditions for students to communicate and craft ideas. Further, we protect quality by insisting on the best conditions for all instructors to balance the work of the discipline with the labor that it requires, including the facilitated conversations about writing that communal assessments now provide. We must address the interplay between context and intuitive expertise to prevent the “best” judgments of our past from enabling further distrust of subjective, contextualized evaluation—the “nebulous mumbo-jumbo” that comes from deep immersion in our domain.
1 Anthony Petruzzi ascribes the reductive inclination to scientism and then modernism, which “extends the purview of natural science to all of existence; then, all phenomena, human and non-human, are treated as if they can be reduced to objective information, to certain and strong representations of reality” (“Convalescence” 6).
2 The New Dorp High School profile in the October 2012 Atlantic provides an example of the difficulty of approaching education reform politely and trying to “unify” everyone. Both the National Writing Project and the National Council of Teachers of English responded by celebrating the fact that their members were quoted in the article rather than critiquing how the article undermined both organizations’ positions on testing’s influence on writing instruction. The author of the article, Peg Tyre, echoed New Dorp’s administration in measuring the success of the “new” curriculum by the rise in test scores, never addressing by what means the tests measure writing ability. On-demand tests, which are generally evaluated by calibrated holistic scoring or computers, encourage formulaic writing, which the supposed “writing revolution” at New Dorp believes makes students better writers. In fact, emphasizing informative and persuasive essays early instead of creative and reflective pieces, and drilling students on transition words, makes students better at passing standardized tests. Many students in my college courses can write clearly; I wish I had more students who could write something interesting.
3 To sidestep legitimate questions about authentic subjectivity (e.g. Foucault), I am using subjectivity here as intuitive biases and tendencies emerging from multiple sources, including study, practice, and culture.
4 In another of his books, Outliers, Gladwell popularizes the idea that 10,000 hours of practice are required to become a world-class expert in anything. Some, like Feltovich, Prietula, and Ericsson, equate that number with 10 years devoted to a career or practice, and this is the accepted quantification in, for example, Ericsson’s edited volume The Cambridge Handbook of Expertise and Expert Performance. The 10,000-hour figure seems to originate in William Chase and Herbert Simon’s 1973 study of chess masters, in which they note that “the organization of the [chess] Master’s elaborate repertoire of information takes thousands of hours to build up, and the same is true of any skilled task (e.g. football, music)” (279).