Pirated Books Powering Gen AI

Joshua the Writer · Jul 22, 2024

I was browsing my chrome mobile home page and I came across this article, which I find very concerning. This essentially means that big ai is copying from writers' work that is supposed to be published. I see the value in pirating books, don't get me wrong. Art and culture shouldn't be accessible to only those that can afford it, but that only matters when it is for non commercial (aka personal use) reasons. I am also concerned because of how this increases the amount of anti-creativity in today's society. Culture isn't something you can type into a command prompt to get it generated by an unfeeling (and technically unthinkinking) machine. It has to be created by humans, for humans.

This is especially detrimental to authors because their books are being pirated for commercial use, so they are not giving any permission to have AI trained on their work, and they don't get any payment off of this, either. Any AI generated story from now on will be an abomination of plagiarism and copyright violations. Thankfully, there are technologies being developed to counteract this kind of thing. It's called Glaze/Nightshade. As it's naming suggest, it re-encodes the files so that they function normally, except when they're uploaded to an AI gen repo, and then it poisons any AI repo it is attached to in order to break it. From what I am skimmed through, Glaze's main goal is to make the file itself unusable, while Nightshade will have the potential to break or kill entire services if the encoded image is uploaded (as its name suggests). Although Nightshade is still in development. Both of these are also only for images so that artists and photographers can protect their work, but hopefully, it will support common file formats that writers use to publish their work, or an equivalent that does that for writers.

I am mainly concerned because I am a privacy conscious internet user, as well as a writer. Creative such as myself are the most affected by these issues, so please listen to us.

Mr. Stevens · Jul 22, 2024

I agree about the concerns with AI, and also like art being accessible, but I don't see how we can justify pirating. That kind of attitude seems to lead to the abuses we see with AI.

Princess Viola · Jul 22, 2024

This generative 'AI' nonsense really needs to be regulated like yesterday.

Outdated · Jul 22, 2024

Joshua the Writer said:
I see the value in pirating books, don't get me wrong. Art and culture shouldn't be accessible to only those that can afford it, but that only matters when it is for non commercial (aka personal use) reasons.

Thank you for writing this. I've spent a lot of time in remote communities amongst some of the poorest people on the planet, for many of these people things we often take for granted are outside of their reach.

A common practice that still happens today is that when someone buys a book, after they've read it they'll take it down to the local pub and put it on the public bookshelf so that everyone else can read it too. No one sees that as piracy, yet if we do the same with a digital copy it's treated differently by some people.

Princess Viola · Jul 22, 2024

I'm personally not bothered by digital piracy of books if someone has no other way to access it (maybe they're a kid and they can't afford it/their parents won't buy it for them, it hasn't been published in their locale for one reason or another, none of their local libraries have a copy, hell I'd even go so far and say if you've already bought it physically and want to pirate it so you can have an Ebook version is OK in my book - you've already bought a copy) but doing it when you can purchase the book and/or it is easily available to you, I'm not a big fan of.

Low book sales can affect an author both financially and in terms of getting future book deals.

It's one thing if we're talking about something written by a famous author that's guaranteed to sell tons of copies, who really cares if you pirate that, y'know? Stephen King, for example, is not going to be at risk of losing his publishing deal due to piracy, but an author who's still trying to make a name for themselves (or just needs to make money) that's a bit different.

Jumpinbare · Jul 22, 2024

Princess Viola said:
Low book sales can affect an author both financially and in terms of getting future book deals.

Or they may quit writing and go into something with more steady income.

Cryptid · Jul 22, 2024

Waitaminnit . . . "Not give permission for AI to be trained on their work"?

The permission is both inherent and implied in the publication of their work.

Otherwise, real intelligence would need written permission from the authors to learn from their textbooks.

Don't want AI to learn from your work? Don't publish. Simple as that.

Boogs · Jul 23, 2024

All the Large Language Model AI's, the one's popular among most ordinary internet users, are totally reliant on the largest mega copyright theft that's ever been (knowingly) achieved. They won't work without masses of data, and much of that is copyright. The big problem is that the nature of a neural network is that once it's been trained on the data, that data is stored internally in a form that can't be pulled out in the same state as when it went in. This makes proving copyright theft in the courts immensely difficult. The big tech companies know this, hence they act with impunity. Just being allowed to do that is potentially very toxic regards future anti-social behaviour.

Worse, the products of these AI's are tending to be shown to be very samey, they tend to average out all their input. If human creatives are squeezed out of the profession, we'll see our culture gradually turning into a averaged mush with no originality or worth.

To say the solution is simply not to publish is only looking at it in hindsight of knowing AI's will exist and this will be what they consume to do so. No-one who published pre AI could have known that would happen.
In the future it would only discourage others from publishing original works for the benefit of other humans, which also suppresses the who area of creativity. People create to let others enjoy it, sometimes they make a living from that, sometimes they do it for the love of it. I'd hate to see that crushed by giant tech firms looking to make ever more obscene profits out of the rest of us because they have the power to do so.

By the way, it may be worthy of note that firms like Microsoft and Google have seen a 50% rise in their carbon pollution levels from introducing AI already. And these babies already used a LOT of power. So much for green credentials when it comes to profiteering.

Joshua the Writer · Jul 23, 2024

Cryptid said:
Waitaminnit . . . "Not give permission for AI to be trained on their work"?

The permission is both inherent and implied in the publication of their work.

Otherwise, real intelligence would need written permission from the authors to learn from their textbooks.

Don't want AI to learn from your work? Don't publish. Simple as that.

No. Permission to be uploaded is not inherent in the publication of their work. As they're being published in order to be read by humans, not by machines.

"Don't want AI go learn from your work? Don't publish," is the equivalent of telling somebody they shouldn't own something if they don't want it to get stolen.

Joshua the Writer · Jul 23, 2024

Boogs said:
All the Large Language Model AI's, the one's popular among most ordinary internet users, are totally reliant on the largest mega copyright theft that's ever been (knowingly) achieved. They won't work without masses of data, and much of that is copyright. The big problem is that the nature of a neural network is that once it's been trained on the data, that data is stored internally in a form that can't be pulled out in the same state as when it went in. This makes proving copyright theft in the courts immensely difficult. The big tech companies know this, hence they act with impunity. Just being allowed to do that is potentially very toxic regards future anti-social behaviour.

Worse, the products of these AI's are tending to be shown to be very samey, they tend to average out all their input. If human creatives are squeezed out of the profession, we'll see our culture gradually turning into a averaged mush with no originality or worth.

To say the solution is simply not to publish is only looking at it in hindsight of knowing AI's will exist and this will be what they consume to do so. No-one who published pre AI could have known that would happen.
In the future it would only discourage others from publishing original works for the benefit of other humans, which also suppresses the who area of creativity. People create to let others enjoy it, sometimes they make a living from that, sometimes they do it for the love of it. I'd hate to see that crushed by giant tech firms looking to make ever more obscene profits out of the rest of us because they have the power to do so.

By the way, it may be worthy of note that firms like Microsoft and Google have seen a 50% rise in their carbon pollution levels from introducing AI already. And these babies already used a LOT of power. So much for green credentials when it comes to profiteering.

Well said. AI doesn't know how to be creative. It doesn't know what goes into it. It's just a bunch of ones and zeros with no love, no passion, and no humanity. Creativity and art is a human trait. Done by humans. Hell, the definition of art is "the expression or application of human creative skill and imagination." AI just recycles stuff that's put into it, so I wouldn't count that as creative skill, and an algorithm isn't imagination. And it's not human, either. Culture, art, and writing are human only skills.

Don't be lazy. Pick up the pencil.

WhitewaterWoman · Jul 23, 2024

Outdated said:
Thank you for writing this. I've spent a lot of time in remote communities amongst some of the poorest people on the planet, for many of these people things we often take for granted are outside of their reach.

A common practice that still happens today is that when someone buys a book, after they've read it they'll take it down to the local pub and put it on the public bookshelf so that everyone else can read it too. No one sees that as piracy, yet if we do the same with a digital copy it's treated differently by some people.

I’ve been in situations where people exchange books more or less informally. I agree it is particularly useful in areas poorly served by libraries.

But think of how many people will read that pre-owned book. 100? 1000?

How many times will similar material be “read” and mined and dispersed electronically? Millions?

It’s the scale that is different.

Joshua the Writer · Jul 23, 2024

Mr. Stevens said:
I agree about the concerns with AI, and also like art being accessible, but I don't see how we can justify pirating. That kind of attitude seems to lead to the abuses we see with AI.

As I said. Culture shouldn't be accessible by only those that can afford it, or otherwise unable to access it. I often watch anime that hasn't been translated to English yet through free fansubs. I also use emulators for older console games because I'm not paying hundreds for a still working retro system and the cartridges and disks.

Cryptid · Jul 23, 2024

Joshua the Writer said:
Cryptid said:

Waitaminnit . . . "Not give permission for AI to be trained on their work"?

The permission is both inherent and implied in the publication of their work.

Otherwise, real intelligence would need written permission from the authors to learn from their textbooks.

Don't want AI to learn from your work? Don't publish. Simple as that.

Click to expand...

No. Permission to be uploaded is not inherent in the publication of their work. As they're being published in order to be read by humans, not by machines.

"Don't want AI go learn from your work? Don't publish," is the equivalent of telling somebody they shouldn't own something if they don't want it to get stolen.

Where on the title page of any publication does it say, "To be read only by humans"? The writer's intent does not a prohibition make.

What about OCR devices that upload text to convert the printed word to an artificial voice? Does doing so make visually disabled people criminals?

What about those of us who upload PDFs to our iPads so that we can have ready access to reference materials, or to have something to read on a plane?

Saying "Thou Shalt Not Allow a Machine to Upload Thy Writings" is a bit naïve, don't you think?

I think what is really happening here is that artists/writers are fearful of competitors who can learn their craft in just a few hours, and produce artwork that is just as good as -- or even better than -- artwork produced by humans.

And do it for less cost, too. Why pay out a $10,000 advance to a human writer for the next installment of a serial story (and wait weeks for it, too), when with just a phone call and a promise, an editor can have an entire finished script on their desk the next day, and owe nothing more than a steak dinner and a bottle of champagne for the effort?

As for "Don't publish if you don't want AI to learn from your work", it is not analogous to "Don't own something if you don't want it stolen". It is analogous to "Don't leave your treasures where they can be stolen."

Joshua the Writer · Jul 23, 2024

Cryptid said:
Where on the title page of any publication does it say, "To be read only by humans"? The writer's intent does not a prohibition make.

What about OCR devices that upload text to convert the printed word to an artificial voice? Does doing so make visually disabled people criminals?

What about those of us who upload PDFs to our iPads so that we can have ready access to reference materials, or to have something to read on a plane?

Saying "Thou Shalt Not Allow a Machine to Upload Thy Writings" is a bit naïve, don't you think?

I think what is really happening here is that artists/writers are fearful of competitors who can learn their craft in just a few hours, and produce artwork that is just as good as -- or even better than -- artwork produced by humans.

And do it for less cost, too. Why pay out a $10,000 advance to a human writer for the next installment of a serial story (and wait weeks for it, too), when with just a phone call and a promise, an editor can have an entire finished script on their desk the next day, and owe nothing more than a steak dinner and a bottle of champagne for the effort?

As for "Don't publish if you don't want AI to learn from your work", it is not analogous to "Don't own something if you don't want it stolen". It is analogous to "Don't leave your treasures where they can be stolen."

Because it'd just make generic slop and reduce the amount of creativity in the world, despite the fact that we live in the world with unlimited and easy to access methods to share any ideas. Even your misinformed idea that AI is somehow good. I'm not fearful of a competitor, because I already know that once I get to writing, the stuff I come up with is consistently better than something an AI could produce. Keep in mind, AI essentailly can only replicate what it already knows, so there is a lack of creativity, because creativity is a human thing. Surely, monkeys and corvids can be creative, since they're also highly intelligent species, but creativity is mostly a human thing.

Paying the writers is something those mega corps SHOULD be doing. They have the budget to do so, but those greedy, snotty, CEOs decide that they want more money for themselves instead. Not to mention because. And you're eating their propaganda right up.

The only thing that can come with normalized commercial use of AI generated images and documents (I can't call it art or writing in good faith) is an even greater decrease in media literacy and creativity.

Publishing works is not leaving treasures where they can be stolen.

Cryptid · Jul 24, 2024

Do you mean "Generic Slop" as in every sword-and-sorcery novel since Lord of The Rings? How about the seemingly endless variations on the Disney/Marvel franchise? Maybe the daytime soap operas? How about professional sports? The History channel? Every exhibition at MOMA since its opening?

You may have a point though, if your point is that humans can generate enough generic slop all on their own without any assistance -- or competition -- from Artificial Intelligence.

Because, you see, AI systems can only learn to produce their own art from studying human art . . . giving the phrase "Garbage In, Garbage Out" a whole new context.

Boogs · Jul 24, 2024

Cryptid said:
Where on the title page of any publication does it say, "To be read only by humans"? The writer's intent does not a prohibition make.

The act of reading a book by a human is a very different act to feeding it into a machine that requires someone's original work to be able to produce profit making produce for the AI owner.

The AI owner is making direct profit from the original works created by others without payment, agreement or permission. Just because they use a fancy new technology to do so makes no difference to my mind.

What you suggest would have a terrible impact of human creativity and our access to it.
The big problem is copyright laws are wholly inadequate for purpose these days.

If someone creates a picture, spends years working on it, and someone else see's it, photographs it, and then proceeds to make a fortune out of that photograph without permission or even the knowledge of the artist, are you suggesting that would be fair? If not, what's the difference?

Bottom line, to say that anyone's original work should be allowed to be stolen unless they hide it away from the world, is to gift even greater power to the most powerful and anti-social companies in the world. Is that the future you want?

Cryptid said:
Do you mean "Generic Slop" as in every sword-and-sorcery novel since Lord of The Rings? How about the seemingly endless variations on the Disney/Marvel franchise? Maybe the daytime soap operas? How about professional sports? The History channel? Every exhibition at MOMA since its opening?

I think this refers to the output of generative AI, not all the examples you mention which are human produced. The quality of them is not the issue. AI does not, in fact cannot produce original content, while a human can. Generic Slop is the taking of existing ideas to mush together to make something apparently new but lacking any originality.

I understand how compelling the tools are, but that doesn't reflect the implications of how the technologies are being used and what it's being aimed at. It's been given an very alluring shine, but that hides a dirty underside that isn't so visible. If the continued homogenisation of culture is deemed desirable then generative AI may be a great thing, but if diversity is valued then these types of AI are anathema.

Try this line of thought - computers move more and more to 'human' style of communication with generative AI making huge advances in that area.
But what sort of human communication is that? Well, it'll be the average because that's the most effectively profitable - grabs the most consumers, it'll work best for NT's.
But what of us ND's? We are classed autistic mostly based on social and personal communication issues, so will homogenous AI gradually squeeze us out of the net as we've been squeezed out of offline society?
Computer says 'No'?

Outdated · Jul 24, 2024

Cryptid said:
Do you mean "Generic Slop" as in every sword-and-sorcery novel since Lord of The Rings?

There's a lot of generic slop out there, that's what always happens when people try to follow a formula, ie, act like an AI. There's also a lot of truly fantastic stories from very imaginative people. Lord of the Rings doesn't even rank in the top 50, I thought it was wonderful when I was 11 years old, fair enough, it is a children's book and not a great read for adults.

Joshua the Writer said:
And you're eating their propaganda right up.

That's the mentality of the greater populace, let someone else do their thinking for them. They saw it on TV, it must be the whole truth.

AI can never replace a good story teller. Most humans can't replace a good story teller. I write exceptionally well but I'm no good at trying to entertain people. AI doesn't even write well. One of my favourite authors wrote a little bit about himself and his comments always stuck with me.

Raymond Feist:

"Magician was all this, and more. In late 1977 I decided to try my hand at writing, part-time, while I was an employee of the University of California, San Diego. It is now some fifteen years later, and I have been a full-time writer for the last fourteen years, successful in this craft beyond my wildest dreams. Magician, the first novel in what became known as. The Riftwar Saga, was a book that quickly took on a life of its own. I hesitate to admit this publicly, but the truth is that part of the success of the book was my ignorance of what makes a commercially successful novel. My willingness to plunge blindly forward into a tale spanning two dissimilar worlds, covering twelve years in the lives of several major and dozens of minor characters, breaking numerous rules of plotting along the way, seemed to find kindred souls among readers the world over. After a decade in print, my best judgment is that the appeal of the book is based upon its being what was known once as a “ripping yarn.” I had little ambition beyond spinning a good story, one that satisfied my sense of wonder, adventure, and whimsy. It turned out that several million readers—many of whom read translations in languages I can’t even begin to comprehend—found it one that satisfied their tastes for such a yarn as well."

Cryptid · Jul 24, 2024

Sure, there is a lot of original work out there that deserves all the accolades and awards it receives. I am not disputing that. I am saying that what follows from that is mostly derivative of the original concept, with every iteration falling below the standards of the previous iteration until what is produced is a mere parody of itself, with no "artistic" value whatsoever.

So what do you get when AI is fed this denatured pabulum and told to regurgitate something meaningful? Nothing that surpasses what it has been fed - Garbage In, Garbage Out, again.

(And yes, that does inspire me to suspect that DisneyCorp is using an AI to generate movie scripts.)

Now, unless there is some protectionist legislation to prevent an AI from engaging in the same derivations that Hollywood scriptwriters consider their only money-making practice, then an AI deriving its output from learning the creative portfolios of one or more flesh-and-blood humans is doing nothing wrong.

I still think that the uproar over this whole issue is similar to the Luddite movement against stocking frames in the late 16th and early 17th centuries. A relatively cheap method of weaving socks by one person could put 20 people out of work. What they wanted was protectionism -- the legislation against mechanization, automation, and progress. Without it, their labors were devalued, and a living wage could not be earned unless they invested in their own stocking frames and increased the quality and quantity of their own output.

But it's always so much easier to attack new technology than it is to adapt to it, isn't it?

Outdated · Jul 24, 2024

Cryptid said:
I still think that the uproar over this whole issue is similar to the Luddite movement against stocking frames in the late 16th and early 17th centuries. A relatively cheap method of weaving socks by one person could put 20 people out of work. What they wanted was protectionism -- the legislation against mechanization, automation, and progress. Without it, their labors were devalued, and a living wage could not be earned unless they invested in their own stocking frames and increased the quality and quantity of their own output.

That's a great analogy for all things technical but it falls over when it comes to anything considered Art. The difference is that of between a good home cooked meal and something from the frozen section in the supermarket.

Hollywood and the US record companies have played that franchise formula game for a very long time, the amount of pap they've pumped out over the years is unbelievable. Elvis movies, etc. Because they keep trying to regurgitate the same formula when they do accidentally make a hit movie they're unable to repeat it.

Boogs · Jul 24, 2024

Cryptid said:
But it's always so much easier to attack new technology than it is to adapt to it, isn't it?

Technology is neutral, it's who uses it and how that matters.

Cryptid said:
I am saying that what follows from that is mostly derivative of the original concept, with every iteration falling below the standards of the previous iteration until what is produced is a mere parody of itself, with no "artistic" value whatsoever.

Poor quality human output is part and parcel of creativity, and what matters more is who is judging that quality.
How does one judge and grade artistic value?
But the big difference is AI's are not creative, have zero understanding. To take the case of language output, each word the AI adds to the sentence it's producing at the time, is decided by the probability of what word would come next in the relevant examples it's been trained with. It knows nothing, zero, zilch, nada about what it's 'writing'. To the AI it could be meaningless strings of random letters, it would have no more or less understanding of what they meant.
All of that meaning has been strip mined from human output without the AI having the smallest clue that the meaning has been added/included.

Cryptid said:
an AI deriving its output from learning the creative portfolios of one or more flesh-and-blood humans is doing nothing wrong.

The AI most certainly is NOT learning the creative portfolios.
If a script writer took a big bunch of other writers scripts, and wrote a new script that stole every idea and description from those other writers works, adding nothing original to that new script, would you say that writer had done nothing wrong? Because generative AI's do that. They don't read all that work and come up with new ideas based on that inspiration, only humans can do that.

Pirated Books Powering Gen AI

Very Nerdy Guy, Any Pronouns

Well-Known Member

Dumbass Asexual

High Function ASD2

Dumbass Asexual

Aspie Nudist and Absent-minded Professor camp dude

Well-Known Member

Of cabbages and Kings.

Very Nerdy Guy, Any Pronouns

Very Nerdy Guy, Any Pronouns

Well-Known Member

Very Nerdy Guy, Any Pronouns

Well-Known Member

Very Nerdy Guy, Any Pronouns

Well-Known Member

Of cabbages and Kings.

High Function ASD2

Well-Known Member

High Function ASD2

Of cabbages and Kings.