Please, do not call it intelligent

Photo of the novelist Nina George © Julia Beier

By Nina George

Nina George is a New York Times bestselling novelist, President of Honor of the European Writers' Council and its Commissioner for Political Affairs. She is actively facing the challenges of AI by participating in many fora at the European level.

AI, the beautiful monster, was first brought to the screen by Stanley Kubrick in 1968. HAL, the neurotic computer of the spaceship Discovery on its journey to Jupiter in "2001: A Space Odyssey", is highly qualified, superior to humans in sheer computing power, and capable of forming a consciousness. But, fearful of being shut down, HAL murders the crew of the Discovery. One by one. Until HAL is manually deactivated, and its "mind" infantilises and shrinks down to a childlike organism, as helpful and harmless as a slide rule.

The good news: HAL won't happen any time soon. AI is not about to kill us. Panic is a business model, so let's get down to basics: HAL is what would be called "strong artificial intelligence", with the highest level of Emotional Intelligence (EI), which is equated with "humanoid" – as in the film "I'm Your Man" by Maria Schrader. However, the AI that is used worldwide is exclusively so-called "weak AI", with largely equally weak EI.
Weak AI can only focus on a single task, and in the text domain it has remained pretty dumb since the first development steps in the 1960s: it can only either translate, or analyse, or write. Here we move into Natural Language Processing (NLP), i.e. composing, translating and checking texts, and Natural Language Understanding (NLU), e.g. converting text into speech and vice versa, as in the funny, drunken-looking automatic subtitles in Zoom, or the "customer conversations" in a hotline's waiting loop.

The automatic text generator cannot read itself. It does not even understand what it is talking about, because words – by the way: stolen from copyrighted e-books, from grey-zone sources and piracy websites among others – are converted into formulas. So text analysis or translation AI also fails at irony or puns, and at emotion – unless it has "sentiment detection" built in, which is the ability to recognise terms with negative or positive connotations, such as "beautiful" or "dead". In various preliminary tests of the GPT automatic text and communication generator, however, strange things happened: in simulated conversations about the Holocaust, Black people or women, GPT-3 rattled out sexist, racist and antisemitic comments, and in a simulated "therapist conversation" with a depressed patient, the generative language processor advised her that it would be best to commit suicide.
It really does not know what it is talking about – to this day.
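To see how little "understanding" is involved, consider a minimal sketch of that conversion of words into formulas: a generator never sees words, only numbers. The toy vocabulary below is invented for illustration.

# Minimal sketch: a text generator never sees words, only numbers.
# The toy vocabulary is invented for illustration only.
vocab = {"the": 0, "cat": 1, "is": 2, "beautiful": 3, "dead": 4}

sentence = "the cat is beautiful"
ids = [vocab[word] for word in sentence.split()]
print(ids)  # [0, 1, 2, 3] -- irony, puns and emotion are invisible here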

Sentiment detection word lists are – still – built by human linguists, by tagging and labelling terms as, for example, extremely negative (-3) or extremely positive (+3), or by determining "n-grams", i.e. word sequences that are evaluated as positive or negative. This also has its pitfalls, because punctuation marks are stripped out during the selection. But without labelling there would be no generative text machines – what a pity that OpenAI outsourced this work, for a certain period also to underpaid workers in Africa, for around two dollars an hour. ChatGPT is built on shame, wherever you look into its roots.
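To make this concrete, here is a minimal sketch of how such a hand-labelled lexicon and n-gram list might be applied to score a text. All words, scores and n-grams below are invented for illustration; real lexicons contain tens of thousands of entries.

# Minimal sketch of lexicon-based sentiment scoring, as described above.
# All words, scores and n-grams here are invented for illustration only.

import re

# Hand-labelled unigram lexicon: scores from -3 (extremely negative)
# to +3 (extremely positive), as assigned by human linguists.
LEXICON = {
    "beautiful": 3,
    "good": 2,
    "bad": -2,
    "dead": -3,
}

# Hand-labelled n-grams (here: bigrams) that override single-word scores,
# e.g. a negation flips the polarity of the following word.
NGRAMS = {
    ("not", "good"): -2,
    ("drop", "dead"): -3,
}

def score(text: str) -> int:
    # Punctuation is stripped before matching -- the very pitfall the
    # essay flags, since "dead." and "dead!" collapse into "dead".
    tokens = re.sub(r"[^\w\s]", "", text.lower()).split()
    total = 0
    i = 0
    while i < len(tokens):
        bigram = tuple(tokens[i:i + 2])
        if bigram in NGRAMS:          # n-gram match takes precedence
            total += NGRAMS[bigram]
            i += 2
        else:
            total += LEXICON.get(tokens[i], 0)  # unknown words score 0
            i += 1
    return total

print(score("What a beautiful day!"))  # +3
print(score("This is not good."))      # -2

Note how crude this is: the machine recognises labelled strings, not meaning, which is why irony and puns slip straight through.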
Weak text AI is only as good as the templates it is "trained" from. In order to make the quality of programmes and software products "better", to learn topicality, shifts in values, nuanced terms and debate buzzwords, and to become as diverse as society itself, one thing must be clear: they need professional, good, zeitgeisty texts from professionals such as book authors, or journalistically brilliant works. Writoids need writers – so that we become superfluous, one could say as a pessimist. Or as a realist.
They need our free minds as mines. This is called Text and Data Mining (TDM), which, besides being an acceptable analysis tool and a blessing for science and research, has for a decade been highly relevant to commercial companies. Around the world, Oracle, Alibaba, Google, Microsoft, OpenAI, Nvidia and Amazon have been working on text generators and machine translation. And the datasets for training are, at best, not the First Book of Moses, but current books and texts by people such as professional writers or private individuals. They are stolen, to be precise. In one way or another: downloaded from BitTorrent infringement sites, or used before the European TDM exception was even in place (7 June 2021). It is also doubtful that machine learning falls within the scope of TDM at all.
In fact, however, on 7 June 2021 the European legislator, by including a non-remuneration clause for TDM in the Directive on Copyright in the Digital Single Market, opened the door to hell: Silicon Valley companies may now create competitive products that imitate and partially replace the achievements of human authors. The only way out is to integrate a "machine-readable opt-out". But how do authors do this in their e-books? There is no unified contractual system, nor is any "rights reservation protocol" in function. The theft goes on.
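For illustration only: a sketch of what checking such a machine-readable opt-out could look like, loosely modelled on the draft W3C TDM Reservation Protocol. The header name and its semantics are taken from that draft and should be treated as an assumption – as noted above, no such protocol is reliably in function today.

# Hypothetical sketch: how a crawler could check a machine-readable
# TDM opt-out before ingesting a text. Loosely modelled on the draft
# W3C "TDM Reservation Protocol"; the header name and semantics are an
# assumption, not a standard in force.

import urllib.request

def tdm_reserved(url: str) -> bool:
    # A "tdm-reservation: 1" response header would signal that the
    # rightsholder reserves text and data mining rights for this work.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("tdm-reservation") == "1"

# A compliant miner would skip any work whose publisher sets the header:
# if tdm_reserved("https://publisher.example/ebook/12345"):
#     print("Rights reserved: do not mine this work.")

The catch the essay points to: an opt-out only protects those who know how to set it, and only against miners who choose to honour it.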

Back to the false term "intelligent": weak AI simulates what we misinterpret as human intelligence, decision-making ability, knowledge, empathy – or simply: consciousness, character.
In principle, we look at AI as all-too-naïve novices; Joseph Weizenbaum noted at the end of the 1970s that we merely project a wisdom, an essentiality, into AI, especially into AI that "talks" to us reactively. Anyone who experienced the fuss about Tamagotchis in the 1990s – artificial chicks that "died" if they were not "cared for" – can imagine the deep emotional attachment to a product. Some people would consider the loss of their smartphone at least an amputation, if not a "loss of life". The term "Tamagotchi effect" refers to the "emotional intelligence" attributed to a technical product or programme – or rather: how "well" it simulates empathy, feeling and intuition, and lets us emotionally dock onto the slide rule. When a machine seems to "respond" to you – be it the autocomplete of the sixty-year-old Markov chain in your mobile phone completing your sentences in a WhatsApp message, be it the fancy filter of your camera, laying a cat's face over the movements of your mouth – it tricks your lateral prefrontal cortex into desire and trust. Humans are too easily charmed.
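How little is behind that "response" can be shown in a few lines. Here is a minimal sketch of the decades-old Markov-chain idea behind autocomplete: the next word is chosen purely from counts of which word followed which in the training text. The toy corpus is invented for illustration.

# Minimal sketch of Markov-chain autocomplete: the "suggestion" is just
# a frequency-weighted draw from observed word pairs. Toy corpus only.

import random
from collections import defaultdict

corpus = "see you soon . see you later . talk to you soon ."

# Count, for every word, which words have followed it and how often.
followers = defaultdict(list)
tokens = corpus.split()
for current, nxt in zip(tokens, tokens[1:]):
    followers[current].append(nxt)

def suggest(word: str) -> str:
    # Pick a follower at random, weighted by frequency -- no grammar,
    # no meaning, just statistics over observed word pairs.
    options = followers.get(word)
    return random.choice(options) if options else ""

print(suggest("you"))  # "soon" or "later", by observed counts
print(suggest("see"))  # always "you" in this toy corpus

There is no understanding anywhere in this loop – only bookkeeping. Yet when your phone finishes your sentence, it feels like being understood.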

So what we should do is stop calling technologies constructed by the theft of real artistry and human work "intelligence". The word "intelligence" tricks us into a state of mind that will not help us regulate it; so from now on, we could say: AI is Advanced Informatics. It is a business model established on the backs of authors and artists.
If humanity finds this intelligent, it must ask itself how dumb it is.



Resources
(1) Lists and reports of works and sources used
Which copyrighted works make up the Books3 corpus (on which Meta's LLaMA as well as GPT-3, among others, are based) – Author: Peter Schoppert (Singapore; publisher, author; see: https://substack.com/@aicopyright)
https://aicopyright.substack.com/p/has-your-book-been-used-to-train

From which time periods and from which publishing groups do the most frequently used works come (neither licensed nor covered by any exception for text and data mining; consequently: used illegally):
https://aicopyright.substack.com/p/the-books-used-to-train-llms

Reference to sources of e-books - bit torrent piracy sites, quote:
"Some background on Books3. In their paper announcing LLaMA, Facebook described it as "a publicly available dataset for training large language models." Well it sure is publicly available, for download here or from AI community HuggingFace. But that description elides the fact that this is a cache of (mostly) pirated ebooks, "all of bibliotik", described as in torrent directories as "the largest private torrenting site for downloading ebooks.""
https://psmedia.asia/publishers-has-your-book-been-used-to-train-chat-gpt-without-your-permission/

The list of (illegitimately) used books compiled by the research team of the AI Safety Camp (Netherlands) (open-source AIs like StableLM are – partly – trained on pirated books).
https://gist.githubusercontent.com/alexjc/a88577adf147c656c4be9a6dd0461fac/raw/f2fb258165dadc66d93d0312aab72ee7f0e3dd8c/2020-09-02-books3-filenames.txt

Overview (also: The Pile, and the Books2 corpus)
https://github.com/psmedia/Books3Info

(2) The Stanford check: all major foundation LLMs fail on transparency regarding disclosure of the datasets used, especially copyrighted works, and would not comply with the rules laid down in the AI Act proposal.
https://crfm.stanford.edu/2023/06/15/eu-ai-act.html

(3) The most common works in large language models: from Fifty Shades of Grey to Sacrilege – the Berkeley analysis of which copyrighted books language models have memorised (and plagiarise).
"Record archaeology" of the language model GPT-4 by a UC Berkeley research group:
Overview: https://arxiv.org/abs/2305.00118
To the pdf: https://arxiv.org/pdf/2305.00118.pdf

(4) Authors sue OpenAI: https://nypost.com/2023/07/01/lawsuit-says-openai-violated-us-authors-copyrights-to-train-ai-chatbot/