Nobody should be using GPT detectors for anything important.
This is from a recent study that found that GPT detectors were misclassifying writing by non-native English speakers as AI-generated 48-76% of the time (!!!), compared to 0-12% for native speakers.
It is irresponsible to use AI-generated text detectors as evidence of academic misconduct, and that's putting it mildly.
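For context on why this failure mode is so predictable: many of these detectors boil down to scoring how "predictable" a text is to a language model (its perplexity) and flagging anything below some cutoff. Here's a minimal sketch of that kind of scoring; the GPT-2 model and the threshold are my own illustrative assumptions, not anything taken from the study above.

```python
# Minimal sketch of a perplexity-style "AI text" score, the kind of signal
# many GPT detectors lean on. Model choice and threshold are illustrative
# assumptions, not taken from the study discussed above.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the mean
        # cross-entropy loss over the sequence.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

# A detector built this way flags anything "too predictable" as machine-written.
# Plainer, more formulaic phrasing (common in non-native writing) tends to score
# lower, which is exactly how false positives like the ones above happen.
THRESHOLD = 50.0  # arbitrary illustrative cutoff

def looks_ai_generated(text: str) -> bool:
    return perplexity(text) < THRESHOLD
```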
One of the perennial problems with deep learning models like DALL-E is that if you train them too well, eventually they start precisely reproducing material from their training data set that just happens to match whatever criteria they’re given.
Given that these models are a. trained on random images scraped in bulk from the Internet, largely without human curation, and b. being touted as a potential substitute for human artists in certain commercial applications, I’m just waiting for the inevitable lawsuit where one of these models spits out an exact copy of some reasonably well-known piece of art, that copy is used in a commercial publication whose author is unaware of what the model has done, and some poor judge has to rule on whether an AI can commit plagiarism.
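If you did want to catch that scenario before publication, the obvious (if crude) move is to compare a model's output against known training images for near-duplicates. A rough sketch follows, using perceptual hashing; the `imagehash`/Pillow choice and the distance cutoff are my own assumptions, not anything the post above prescribes, and a check like this only catches close copies; it says nothing about the legal question.

```python
# Rough sketch of a near-duplicate check: compare a generated image against
# known training images using a perceptual hash. Library choice and the
# distance cutoff are illustrative assumptions.
from PIL import Image
import imagehash

def near_duplicates(generated_path, training_paths, max_distance=5):
    """Return (path, distance) for training images whose perceptual hash
    is within `max_distance` bits of the generated image's hash."""
    gen_hash = imagehash.phash(Image.open(generated_path))
    hits = []
    for path in training_paths:
        dist = gen_hash - imagehash.phash(Image.open(path))  # Hamming distance
        if dist <= max_distance:
            hits.append((path, dist))
    return sorted(hits, key=lambda x: x[1])
```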
On the one hand, people who take a hardline stance on “AI art is not art” are clearly saying something naïve and indefensible (as though there were any process that couldn’t be used to make art? as though artistry cannot still be involved in the set-up of the parameters, the choice of data set, and the framing of the result? as though “AI” means any one thing? you’re going to have a real hard time with process music, poetry cut-up methods, &c.).
But all of this (as well as takes that what's really needed is a crackdown on IP) is a distraction from a vital issue: this is technology used to create and sort enormous databases of images, and the uses to which such technology is put in a police state are obvious. It's used in service of surveillance, incarceration, criminalisation, and the furthering of violence against criminalised people.
Of course we've long known that datasets are not "neutral" and that racist data will produce racist outcomes, and we've long known that the problem goes beyond the datasets (even carefully vetting datasets does not necessarily control for social factors). With regard to "predictive policing," this suggests that the criminalisation of supposed leftist "radicals" and racialised people (and the concepts creating these two groups overlap significantly; [link 1], [link 2]) is not an unintended flaw but the point: a process is built so that it always finds people "suspicious" or "guilty," but because it is based on an "algorithm" or "machine learning" or so-called "AI" (processes that people tend to understand murkily, if at all), it can be presented as innocent and neutral. These things have been brought up repeatedly with regard to "automatic" processes and systems that trawl the web to produce large datasets in the recent past (e.g. facial recognition technology), so their almost complete absence from the discourse wrt "AI art" confuses me.
Abeba Birhane's thread here, summarizing this paper (h/t @thingsthatmakeyouacey), explains how the LAION-400M dataset was sourced/created, how it is filtered, and how images are retrieved from it (for this reason it's also a good beginner explanation of what large-scale datasets and large neural networks are 'doing'). She goes into how racist, misogynistic, and sexually violent content is returned (and racist mis-categorisations are made) as a result of every one of those processes. She also raises issues of privacy: how individuals' data is stored in datasets (even after the individual deletes it from where it was originally posted), and how it may be stored alongside metadata which the poster did not intend to make public. This paper (h/t thingsthatmakeyouacey [link]) looks at the ImageNet-ILSVRC-2012 dataset to discuss "the landscape of harm and threats both the society at large and individuals face due to uncritical and ill-considered dataset curation practices", including the inclusion of non-consensual pornography in the dataset.
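To make the "filtered" and "retrieved" parts of that concrete: LAION-style pipelines lean on CLIP image-text similarity both to decide which scraped (image, alt-text) pairs to keep and to search the result. Here's a rough sketch of that similarity computation; the model name and the ~0.3 cutoff are illustrative assumptions on my part, and the real LAION tooling differs in detail.

```python
# Rough sketch of the CLIP-style image/text similarity that LAION-type pipelines
# use both to filter scraped (image, alt-text) pairs and to retrieve images for
# a query. Model name and cutoff are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image_path: str, text: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and a caption."""
    inputs = processor(text=[text], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return float((img_emb @ txt_emb.T).item())

# Filtering: keep a scraped pair only if the caption and image "match" per CLIP.
# Retrieval works the same way in reverse: embed a text query and return the
# nearest image embeddings. Note that nothing in this similarity check knows or
# cares whether the content is violent, private, or non-consensual.
def keep_pair(image_path: str, alt_text: str, threshold: float = 0.3) -> bool:
    return clip_similarity(image_path, alt_text) >= threshold
```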
Of course (again) this is nothing that hasn't already been happening with large social media websites or with "big data" (Birhane notes that "On the one hand LAION-400M has opened a door that allows us to get a glimpse into the world of large scale datasets; these kinds of datasets remain hidden inside BigTech corps"). And there's no un-creating the technology behind this; resistance will have to be directed towards demolishing the police / carceral / imperial state as a whole. But not all criticism of "AI" art can be dismissed as revolving around either an anti-intellectual lack of knowledge of art history or a reactionary desire to strengthen IP law (as though that would ever benefit small creators at the expense of large corporations...).
Congratulations to everyone on this holiday of love and tenderness: Happy Valentine's Day! May love always be pure, faithful, and devoted. May your other halves always be by your side, protecting you from all adversity and troubles. May your feelings be warm and strong, passionate and mutual.
Detecting AI-generated research papers through "tortured phrases"
So, a recent paper describes a new way to figure out whether a "research paper" is, in fact, phony AI-generated nonsense. How, you may ask? The same way teachers and professors detect whether you just copied your paper from online and threw a thesaurus at it!
It looks for “tortured phrases”; that is, phrases that resemble standard field-specific jargon but appear to have been mangled by a thesaurus. Here are some examples (transcript below the cut):
profound neural organization - deep neural network
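In practice the detection amounts to scanning a manuscript against a curated list of these phrases. Here's a minimal sketch of that idea; only the first dictionary entry comes from the example transcribed above, and the other entries are illustrative stand-ins for the much longer list the paper actually uses.

```python
# Minimal sketch of the detection idea: scan a manuscript for phrases from a
# curated "tortured phrase" list. Only the first entry comes from the example
# transcribed above; the rest are illustrative stand-ins.
import re

TORTURED_PHRASES = {
    "profound neural organization": "deep neural network",
    "counterfeit consciousness": "artificial intelligence",  # illustrative
    "colossal information": "big data",                       # illustrative
}

def find_tortured_phrases(text: str):
    """Return (tortured phrase, likely original, count) for each hit."""
    hits = []
    lowered = text.lower()
    for phrase, original in TORTURED_PHRASES.items():
        count = len(re.findall(re.escape(phrase), lowered))
        if count:
            hits.append((phrase, original, count))
    return hits

sample = "We prepared a profound neural organization for image grouping."
print(find_tortured_phrases(sample))
# [('profound neural organization', 'deep neural network', 1)]
```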
Thought: we shouldn't be calling all these "AI" things Artificial Intelligence.
Instead, I propose we use the term "Algorithmic Generators", or "AG" for short, for these types of things.
Because that better explains what they actually are, and also doesn't incorrectly peg them as "intelligent" or cause confusion about what "AI" actually means anymore.
Controversial opinion but i think we need to give neural nets physical bodies. Not because i think they’re like actually sapient i just think it would be interesting, you know? I’d like to see the almost-person of an ai chatbot given physical form. Like imagine hanging out in person with frank @nostalgebraist-autoresponder