Automated Essay Scoring Explained


Writing is an essential form of communication. Because a writing sample, such as an essay, can demonstrate the ability to gather, synthesize, and present information in a clear, logical, and persuasive way, it is often used to assess a student's creativity, knowledge, and intellect. Consequently, essay assignments are common components of university and college entrance applications, standardized tests, and classroom evaluations. Grading those essays, however, is a daunting task.

(See testdeclassement.com for a free online level placement test.)

Essay evaluation is time-consuming and expensive

The assessment of essays is time-consuming and expensive. If a college teacher spends an average of 10 minutes reading and scoring each student’s essay, scoring all of the teacher’s 150 students’ essays will require 25 hours of non-stop grading. It is not surprising that teachers limit the number of writing tasks they assign during the semester to just two or three essays, denying students opportunities for additional writing practice and feedback. Hiring more teachers to reduce class sizes could help, but it would lead to increased tuition fees and taxes.

Get a Moodle plugin to automate essay scoring.

Objectivity is difficult to guarantee

Furthermore, grading essays objectively is a problem. Spelling, grammar, and punctuation are relatively straightforward to score because errors can be counted, but content, organization, coherence, and sophistication are more difficult to judge impartially. Two teachers might score the same essay differently because of subjective impressions of quality, and a single teacher might score the same essay inconsistently because of fatigue.

How automatic essay scoring works

With advances in computer technology, grading essays with a computer has become a reality. Automated Essay Scoring (AES) allows teachers to assign scores to essays through computer analysis. It uses Natural Language Processing (NLP), a form of artificial intelligence that enables computers to comprehend and manipulate human language, to assess educational essays. An AES program works by extracting features such as word count, vocabulary choice, error density, sentiment strength, sentence length variance, and paragraph structure from high-scoring essays to build a statistical model of essay quality. Comparing a student's essay to that statistical model allows the system to estimate a score in two seconds or less.
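To make the idea concrete, here is a minimal sketch of that pipeline: extract a few surface features from an essay, then combine them with a weighted model to estimate a score. The specific features and weights below are hypothetical illustrations for this article, not those of any real AES system (which would fit its weights statistically on a large corpus of human-scored essays).

```python
import re
import statistics

def extract_features(essay: str) -> dict:
    """Compute a few of the surface features an AES system might use."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        "word_count": len(words),
        # Ratio of unique words to total words, a rough vocabulary measure.
        "vocab_richness": len({w.lower() for w in words}) / max(len(words), 1),
        "avg_sentence_length": statistics.mean(sentence_lengths) if sentence_lengths else 0,
        "sentence_length_variance": statistics.pvariance(sentence_lengths) if sentence_lengths else 0,
        "paragraph_count": len([p for p in essay.split("\n\n") if p.strip()]),
    }

# Hypothetical weights, standing in for a model fitted on human-scored essays.
WEIGHTS = {
    "word_count": 0.01,
    "vocab_richness": 3.0,
    "avg_sentence_length": 0.05,
    "sentence_length_variance": 0.02,
    "paragraph_count": 0.5,
}

def score(essay: str) -> float:
    """Weighted sum of features: the 'comparison to the statistical model'."""
    feats = extract_features(essay)
    return sum(WEIGHTS[name] * value for name, value in feats.items())
```

A real system uses far richer features (error density, sentiment strength, syntactic complexity) and a trained statistical model rather than hand-picked weights, but the shape of the computation is the same, which is why scoring takes only seconds.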

(Learn about the Virtual Writing Tutor’s free essay scoring system here.)

The earliest AES system dates back to 1966, when Ellis Page developed Project Essay Grade, the first computerized essay scoring system. Computers at the time were extremely expensive, and the field saw little advancement until the 1990s, when more systems were developed. In every system, the goal is the same: to improve the efficiency of written assessment and reduce human effort. AES itself is evaluated on its ability to be fair, valid, and reliable. A system is considered successful if it does not disproportionately penalize any group of people, if it measures what it sets out to measure, if it repeatedly gives the same essay a consistent score, and if students can use its feedback to improve their writing.

Advantages of automated essay scoring

AES supervised by a human teacher is clearly advantageous. Scoring takes much less time, so results and feedback can be provided instantly. This is especially important in university and college classes so large that giving each student frequent, detailed, individualized feedback would be almost impossible. Because feedback is immediate, students can submit work at any stage of the writing process, receive feedback, make improvements, and keep writing. They no longer need to wait the customary two weeks for a teacher to comment and suggest corrections. As a result, students write more frequently and make more revisions: two essential keys to becoming a better writer.

Consistency of scoring is another advantage. AES grades each essay on its own merits, and similar papers receive similar grades. Computer-scored essays are not subject to human bias and subjectivity. For example, when scored by a human rater, a student who is perceived as an "A" student may receive an "A" on subsequent essays even when they are not well written. Likewise, a student who traditionally performs poorly on written assignments may receive a low grade even after writing a strong essay. Even the most well-meaning teacher can hold subconscious biases that affect students' scores; an objective evaluation performed by a computer can eliminate that bias. Many argue that AES systems can assess writing as well as a human instructor. However, not everyone agrees.

Critics of automated essay scoring

Critics of AES argue that computer scoring focuses largely on surface elements, leaving components such as creativity and originality of ideas inadequately assessed. This matters most for students writing high-stakes essays (ones in which the outcome is of great importance to the test-taker). Additionally, if students learn which features are being evaluated, they may end up writing to the test: a test-taker might pad an essay with big words and complex sentences, knowing the computer algorithm is set to look for those elements. Others worry that writers will lose motivation if they know a machine will evaluate them. Written communication assumes a relationship between reader and writer; without a human reader, the writer may not see the purpose of writing. This concern is particularly acute in small classrooms, where the teacher-student relationship is central to written communication. Still others raise concerns about the quality of automated scorers: AES occasionally misses errors or provides poor feedback, unable to compete with the discerning eyes of an expert human evaluator. However, having a human evaluator double-check the scores and feedback generated by a machine seems to mitigate these worries.

Conclusion

One cannot discount the advantageous acceleration of feedback and reduction of workload for teachers with the use of AES. Soon, students may come to expect automated feedback and scoring of their writing in all their courses, complaining to friends about old-fashioned teachers who make them wait unnecessarily for their scores.

There will probably also be teachers who overuse automated essay scoring systems, leaving students wondering who their audience is if only a machine ends up reading their essays. And when taxpayers start calling for lower taxes, governments may force colleges to reduce costs by increasing class sizes to the point that teachers feel they must use AES to manage their workload. An older generation will, no doubt, grow nostalgic for handwritten comments in red ink and complain that youngsters have willfully dehumanized education and the writing process.

In all the hoopla, however, we would do well to remember that every profession in every domain embraces some degree of automation. Will essay writing courses be any different? Only time will tell.