User-led algorithmic auditing with IndieLabel
The first public-facing deployment of IndieLabel, a web app that enables everyday users to perform algorithmic audits, is now live!
Prompt transformation and the Gemini debacle
A term I recently learned, “prompt transformation”, describes how prompts are manipulated under the hood to provide more diverse results.
A call for better access to AI systems for independent testing
An open letter calling on AI companies to make policy changes that protect good-faith independent testing of their products.
Wired falls for bogus claims of AI writing detector startups
Wired is pushing the false idea that AI writing detectors actually work.
New developments with AI writing detectors (they still don’t work!)
New evidence keeps surfacing that AI writing detectors don’t work reliably.
Latest overblown AI claim: GPT-4 achieves a perfect score on questions representing the entire MIT Math, EE, and CS curriculum
The paper, released on arXiv two days ago, is getting a lot of attention. Some issues with it were immediately apparent. Then three MIT students dug up the study’s test set and discovered things were much worse than they initially appeared.
Hype alert: new AI writing detector claims 99% accuracy
Multiple media outlets are reporting on a recent study published by University of Kansas researchers. Per these reports, the KU researchers have achieved near-perfect accuracy at detecting AI use in “scientific writing” or “academic writing.” But let’s take a look at what this study actually did.
Should we have a government-funded “public option” for AI?
My reaction to the article “How Artificial Intelligence Can Aid Democracy” by Bruce Schneier, Henry Farrell, and Nathan E. Sanders, published by Slate on April 21, 2023.
Guardrails on large language models, part 4: content moderation
The final post in a four-part series on the guardrails on large language models.
Guardrails on large language models, part 3: prompt design
The third in a four-part series of posts about the guardrails on large language models.
Guardrails on large language models, part 2: model fine tuning
Prompted by Bing/Sydney’s zany behavior, this series of posts gives a non-technical introduction to the guardrails on large language models (LLMs).
Guardrails on large language models, part 1: dataset preparation
With the recent spate of news about Bing/Sydney going haywire, I’ve noticed some misconceptions about the guardrails on large language models (LLMs). To help dispel some of them, in this series of posts I’ll give a non-technical introduction to each of the four major points of control for LLMs.
What’s the difference between the new Bing and ChatGPT?
Wondering how the new AI-powered Bing is different from ChatGPT?
I made this graphic to show their key differences.
Koko, ChatGPT, and the outrage over corporate experimentation
Mental health service Koko sparked outrage by announcing it experimented with ChatGPT for mental health support, apparently without informing users. (It turned out users were informed all along, and the CEO’s Twitter thread was just really confusing.)
Here, I dig into the outrage and argue that much of it was focused on the wrong issue: the ethics of corporate experiments.
Why I don’t think ChatGPT is going to replace search anytime soon
There’s been a lot of breathless coverage of ChatGPT in the past week. One comment I keep seeing on social media is that it’s going to replace Google and other search engines. I don’t think that’s going to happen anytime soon, and here’s why.
Do stock image creators know they’re training AI to compete with them?
Recent announcements have revealed that stock image collections are being used to train generative AI. Compared to using web-scraped data, this is less legally risky and potentially more fair to the creators of the images. But I have some nagging concerns about it.
Class-action lawsuit filed over copyright and privacy issues stemming from GitHub Copilot
Last week I posted about the copyright and privacy risks associated with large language models. One of the examples I discussed was GitHub Copilot, the code-writing assistant based on OpenAI’s Codex model. A key problem with Copilot relates to code licensing. Today, the issue headed to court.
Large language models can steal work and spill secrets. Here’s why we should care.
Large language models are trained on massive datasets of web-scraped data. They memorize some of it, and can regurgitate it verbatim – including personal data and copyrighted material. Is that a problem?