Benjamin McAvoy-Bickford | ChatGPT is not going to save you from your essays

Barely a day goes by without some breathless thinkpiece about ChatGPT’s cataclysmic effects sashaying its way into the news I read. It’s going to kill coding! It’s going to kill the Common App personal statement! It’s going to, um, kill search engines and replace them with NFTs!

College-level essays seem relatively high on the list of ChatGPT’s targets for assassination, given the growing concern about its potential for plagiarism. Yet, as a college student, I have found these worries unfounded. Sure, I could have plagiarized my assignments before an AI came along to help me do so, but I never did because it was a) unnecessary, b) immoral and c) unlikely to escape notice. Nothing about ChatGPT really changes that fundamental logic.

However, with so much ink being devoted to the dangers that ChatGPT poses to our educational system, I decided it was time to put it to the test. For one hour, I would see if it could write an assignment that I had already turned in: a LING 0051 research paper about the Phrygian language asking how other languages are related to it.

This seemed like it should be a comparatively easy assignment for ChatGPT. It didn’t require knowledge about current events or a particularly demanding format, and sources hadn’t been hard to find when I looked. The only two constraints were that it had to be roughly seven pages and that it had to have accurate citations, including ones in the body of the text. Still, I knew from playing around with ChatGPT a bit beforehand that I shouldn’t expect much. Before I started, I hypothesized that it would start out poorly and improve marginally over time, but struggle especially with word count and length.

When the clock struck eight, I fired up ChatGPT and found predictably unimpressive results. I briefly summarized the content and specifications of each paragraph of my actual paper and asked it to write the essay, giving it instructions about the word count and citations. What I got back was far too short, had messy citations, and was mostly just plain wrong. The one bright spot was that it described the history of Phrygian studies in more detail than my actual paper, albeit with major inaccuracies.

After some more prompting, the citations improved, but only barely. I’ve learned a few different styles of in-text citation, but I’ve never heard of just scattering citations around willy-nilly, usually disconnected from any factual claim. It did little better elsewhere; asking it for more details usually just made it write about more linguistics concepts, but few of them were specific enough to count as details.

Then I decided to try writing the essay paragraph by paragraph, to see whether that would help ChatGPT clear the low bar it was currently slipping under. How hard could the relatively fact-light introduction be? Quite hard, as it turns out. When asked for an interesting anecdote, it led in with “an interesting anecdote about the studies of Phrygian is that” and proceeded to write something neither interesting nor accurate. Telling it to write in the style of creative nonfiction lightened the leaden prose, but didn’t make it sound human or fix the perpetually borked citations. Finally, it couldn’t craft an argument on its own, so I ended up simply handing it a thesis statement.

I gave it one last chance, asking it to write a paragraph about the connection between Phrygian and Burushaski, another language with a suspected connection to Phrygian. Not only did it make up nonsensical similarities between the two, it admitted its errors and then went right back to making up similar nonsense. As a last-ditch attempt to salvage any writing, I tried feeding it chunks of a text on Phrygian and Burushaski. Unfortunately, they were too large, and ChatGPT timed out shortly before my hour was up.

In the end, ChatGPT turned in a pretty pitiful performance. Fewer than half of the statements in the final version of the essay were fully accurate. Not only were the citations misplaced, they were usually made up or irrelevant. And it certainly didn’t seem quicker than writing the paper myself, since I had to fact-check everything anyway. Humanity: 1. ChatGPT: 0.

I didn’t think this assignment was particularly hard or atypical of what any other university would expect, but I decided to let ChatGPT try something easier anyway. Maybe a 2022 AP Literature exam question would be just what the AI needed to demonstrate its value.

A few minutes into prompting ChatGPT, I realized that it was having no more luck with this prompt; it insisted on contrasting a character in Hermann Hesse’s The Glass Bead Game with himself. I decided to look up how to write a good ChatGPT prompt, in the hope of getting something better. More importantly, I swapped in Pride and Prejudice for the novel I had been using, wondering if it had been analyzed so much that the AI would be able to write about it semi-coherently.

Finally, I got an essay that bore some resemblance to the image of ChatGPT as a plagiaristic jackhammer aimed at the American educational system. The essay was still repeatedly wrong, and it certainly wasn’t stylishly written. Most high schoolers I know could have done better, had they read the book. But the grammar was accurate, and it answered every part of the prompt in a mostly coherent manner.

I’m inclined to think that this is close to the ceiling of ChatGPT’s current capabilities. The most-discussed real cases of ChatGPT plagiarism had weaknesses similar to those in the essays it gave me. Others have gotten mildly better essays, but even those accounts show signs of wishful thinking about technological progress, and possibly of selection bias toward unrealistically easy prompts.

So ChatGPT is absolutely awful at writing the essays that college students are expected to write, but it is still indubitably impressive. Humans have made a machine that can master the rules of language, a rich and flexible domain of cognition. Although it’s frankly atrocious at what I wanted it to do, it still has useful applications: imitating the style of the King James Bible, for example, and generating very predictable kinds of text such as rubrics.

The current best uses of ChatGPT are fascinating but not earth-shaking. Beyond that kind of formulaic writing, it’s hit-or-miss at wordplay, and it hasn’t impressed me with longer texts. Its repeated inaccuracies can be harmful, and, like many AI systems, it is prone to bias. What ChatGPT might do is pave the way for future intelligences that can write essays, although they may still have similar problems (or, perhaps, take over the world).

Given all this, it doesn’t seem like the fear about ChatGPT-driven plagiarism makes much sense. It’s easy to sidestep ChatGPT by asking a question about anything more obscure than the central themes of Pride and Prejudice. If its formulaic and inhuman style doesn’t tip off a grader, a plagiarism checker for it has already been built.

Furthermore, nothing about ChatGPT changes the core logic of why plagiarism doesn’t pay for students. To know enough to make the AI factually accurate, students will have to do just as much work as they would otherwise, especially if they then edit its output to meet Penn’s standards. It might currently offer a slight edge in evading plagiarism checkers, but I’d expect much better ways of flagging AI help to come online soon. In the end, all the people eagerly awaiting the death of the traditional essay are going to have to wait a while longer.

BENJAMIN McAVOY-BICKFORD is a College first year from Chapel Hill, NC. His e-mail is bmcavoyb@sas.upenn.edu.
