AI ART: Blessing or Curse? - A broad Analysis of its Impacts and Capabilities
There is a lot of hype and excitement, and also a lot of fear and anger around this new technology, AI image generators. Some think it will revolutionize society, and others think that it will create a dark and chaotic future. Where lies the truth? Can we even tell where all of this is going? In this video I will try to give you a broad overview, looking at this technology from all sorts of angles.
It is my aim to inform you to the best of my abilities. And especially to inform artists, because I know that a lot of you feel incredibly worried. And I have a lot of worries and concerns too. But this video is not just for artists, but for everyone. Even if you have nothing to do with this stuff, a lot of these things are good to know and think about.
To give you a list of the topics that I will go over in this video: First we’ll talk about the basic principles behind this technology. How it works. This is going to be essential, especially for the next point, the legal and ethical issues. Of which there are a lot. Then how it might transform the industries and markets where artists work, and whether or not it will get rid of us all.
And it is not just the art production that is affected, but also the consumption side of it. How people will perceive artworks in the future. Also we’re going to take a closer look at the promises behind those technologies, and if they are actually realistic. What are the limitations of deep learning, generative AI? After that I am going to talk about an aspect that often gets ignored, but it still plays an important role: the human aspect. Psychology. How all of this makes us feel, which also influences the markets after all.
Then some advice for artists for how to deal with this uncertain situation, and how to compete with such an advanced tool. And during the outro I will talk about my own worries and give credit to all of the people that provided the information that I used for this video. By the way, in the description you can find a document with all the various numbered citations.
So… that’s quite a lot. But there are so many important factors here. I encourage you to not skip too much of this video, because a lot of these aspects are interconnected with each other. And I want to tell all of you artists right from the beginning that hope is not lost.
It’s not all doom and gloom, and a lot of us will still be able to do what we love doing without being forced to resort to AI tools. Especially in the later parts of the video I will talk about the reasons why I think that. Alright, this introduction is already long enough.
Let us get to the first point, which is how the technology actually works. Deep Learning AI is something that feels like a black box for most people. You give it some inputs, it somehow learns from it and gives you all of those amazing results. But it is not magic.
And the basic principle is actually not that complicated. So, you always start with a data set. In this case it is tons and tons of images that have certain tags, titles and descriptions associated with them.
This data set is curated so that it has as few false text-image pairs as possible. So for example if you have an image of an apple, but the text says banana, then it is useless and should be removed or corrected. Then this data is fed into the “deep learning” machine.
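To make the curation step a bit more concrete, here is a minimal sketch in Python. It is only an illustration: the `content` field is a hypothetical stand-in for whatever a reviewer or an image classifier says the picture shows, and real curation pipelines are far more involved than a simple word match.

```python
# Toy data set: each entry pairs an image with its caption.
# "content" is a hypothetical stand-in for what a reviewer or an
# image classifier says the picture actually shows.
dataset = [
    {"content": "apple", "caption": "A red apple on a table"},
    {"content": "apple", "caption": "Fresh banana"},  # mismatched pair
    {"content": "banana", "caption": "Banana on a plate"},
]


def curate(pairs):
    """Keep only the pairs whose caption mentions what the image shows."""
    return [p for p in pairs if p["content"] in p["caption"].lower()]


clean = curate(dataset)
print(len(clean), "of", len(dataset), "pairs kept")
```

The apple picture labeled as a banana is dropped, and the other two pairs stay in the set.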
Based on mathematical processes it deconstructs those images into a long list of attributes and patterns, and connects them with the associated text. So it learns that an apple has a certain shape, certain colors, a certain texture and so on. And it does that with many pictures of apples and learns the similarities between them.
All of those learned connections will become a part of an artificial neural network. This acquired knowledge is stored here, together with everything else it learned, and connections between those pieces of knowledge are formed too. So, not only is that information associated with the word “apple”, but also with words like “fruit”, “green”, “red”, “tree”, “pie”, and so on.
Even though those other words might have come from different image-text pairs from the data set. Something important to note here is that within the neural network none of the pictures from the data set are saved. You won’t be able to find any images, but only the knowledge it gained by analyzing all of that data. Alright, now the next step. We want to generate an image based on what this machine has learned, by typing in some words. So, you start with an image that just has noise in it.
Nothing else. Then you transform it with mathematical processes that can reverse a state of noise and chaos to something that has order and structure, a recognizable image. And we gave a text-prompt to guide it where we want it to move towards.
Let’s say an apple again. So the image is changed slightly and noise is removed that way. It still is mostly unrecognizable, but we might see the vague outline of an apple. So you keep on doing this over and over again until you stop it. Usually after a set amount of loops.
At the end you get your output, which is an image of an apple. This image is unique. This particular arrangement of pixels did not exist before. A common misconception is that this AI takes pictures, crops out parts of them and then bashes them together into its own output. But as I described just now, this is not how it works.
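The loop described above (start from noise, remove a little of it, repeat for a set number of steps) can be sketched as a toy in Python. This is not how a real image generator is implemented: a real system uses a trained neural network to predict what to remove at each step, guided by the text prompt. Here the hypothetical `target` list simply plays the role of "what the model believes an apple looks like", and `toy_denoise`, `steps` and `strength` are made-up names for illustration.

```python
import random


def toy_denoise(target, steps=50, strength=0.1, seed=0):
    """Toy stand-in for the denoising loop.

    Starts from pure noise and nudges every pixel a little toward the
    prediction on each step. In a real system the prediction comes from
    a trained neural network guided by the prompt; here we cheat and
    use the target directly (assumption for illustration only).
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # image of pure noise
    for _ in range(steps):
        predicted = target  # hypothetical "model output"
        image = [px + strength * (p - px) for px, p in zip(image, predicted)]
    return image


target = [0.0, 0.5, 1.0, 0.5]  # tiny 4-pixel "apple"
result = toy_denoise(target)
print([round(px, 2) for px in result])
```

After only a few steps the values are still mostly noise, and after the full number of loops they sit very close to the target. Nothing is copied or pasted from stored pictures at any point.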
Developers constantly work on implementing this technology in new areas and adding more features to tweak the results. Now of course there are far more details to all of this. If you want to learn more, then there are plenty of places to seek out.
So, the legal and ethical issues. This is going to be the biggest part of this video. Because there is a ton to unpack.
And all of this is significant. I know, you might fear that this legal stuff is going to be super boring and complicated. But I won’t describe everything in this cryptic legal jargon, and things are actually not that complicated.
If someone like me, a person who normally has nothing to do with this stuff can understand this at a fundamental level, then so can you. And I made sure to include as many citations that I read as possible, to minimize the chance that I get things wrong. Can’t make any guarantees, but I did a lot of research. So..., why is it important? Well, technology is supposed to benefit humanity.
It is supposed to make our lives easier, safer, more enjoyable, and all of that. It is not supposed to benefit a small group of people, while others have to suffer because of it. Laws and ethics aim to ensure that unfair practices are avoided and punished. They certainly are not perfect. Too often things that are unethical are not necessarily illegal. That’s why we constantly need to work on those laws.
Artificial intelligences, including AI image generators, have to follow certain rules, so that we can make sure that they will be used in positive and constructive ways. And in this case, we have lots of problems going on that need to be addressed. Let us start with the very first part of this technological process, which is the data set. Some of the most commonly used data sets are by LAION, an organization that collects images and embedded text from all over the internet, curates them and packages them into downloadable sets. To be more precise, they contain links to images, and you can easily retrieve those images by using their img2dataset tool.
Which is not all that different from having those images directly in the data set. You just have one small hurdle in between. The one that is used a lot is LAION-5B, which consists of over 5 billion data points.
We got all sorts of stuff in it, including problematic images. Tons and tons of copyrighted materials. Owned by all sorts of organizations, reaching from huge companies to small studios, and even individual artists. When you read the question in their FAQ about whether or not they are respecting copyright laws, they are just trying to dodge the question by saying that it’s just links, and not the actual images.
And this can be easily checked by using a website that lets you search through that data set. You can type in copyrighted characters like Mickey Mouse or Pikachu and get that exact character. Sometimes even as official art releases.
And in some cases you can even type in artists’ names that posted their works on pages like Artstation and get plenty of results. Or use the reverse image search. I can even find my own artwork in this. And that’s not all. It is also possible to find medical records, non-consensual porn and images with extreme, disturbing violence.
Now, LAION is a non-profit organization, and as such it is allowed to collect data for research purposes. However, the companies using those datasets, like OpenAI, are using it FOR profit. Now obviously this is very questionable, and so they even invented their own legal structure and call themselves a “capped-profit” organization.
I am not a legal expert, at all. But are you really able to just make up your own rules, just so that you allow yourself to use a data set by a non-profit organization to profit from it? Others, like Midjourney, do not even disclose what data sets have been used. But the founder even admitted that they do not own the copyright of every single image from their data sets. And then the question comes up of who owns the images that have been output by those machines. The companies claim that they themselves own them, since it is their technology that made them. And they want to use them to further improve their products.
But then that brings up the comparison with photography. It is not the manufacturer of the camera who owns the copyright, but the photographer. They are the ones who chose the angle, the settings, the lighting, timing, and so on.
But what about those AI generated images? If you just typed in some words, did you really do enough to actually own it? Those are still very unclear and widely debated questions. But those companies do allow the users to commercialize the generated images. Is that even allowed? Now, there is the argument that they are transformed enough so that they fall under “Fair Use”, or whatever the equivalent might be outside of the United States. Which by the way is very different from country to country.
It is true that every generated image is unique. Unique as in the particular arrangement of the pixels. But that is not enough to keep you safe.
If you are just “close” to a copyrighted work, it already counts. If you are drawing something that looks very close to Mickey Mouse and sell it on a T-shirt, then Disney has the right to come after you. And it has been shown many times that some of the generated images do get very close to some images that were in the data set. Also, replicating copyrighted character designs like Pikachu, Super Mario, Mickey Mouse, Darth Vader and so on is very simple.
In those cases the copyright holders have the right to take down those images. No matter if those copyright holders are huge companies or small studios and individuals. Often you hear the comparison that humans do the same thing. Artists look at references all the time and make their works based on that. Just transformed enough so that no copyright laws are broken.
There is a key difference here though: intent. An artist, if they are aware of copyright laws, would have to INTENTIONALLY infringe them. There is no way that they accidentally copied the work of another artist, or a licensed character design they knew of. Therefore, it is very easy for an artist to avoid such infringements. The same cannot be said about AI image generators.
The technology itself doesn’t have any mechanism that prevents it from doing that. It just makes images based on what it learned from the data sets. It can very well happen that you get an image that breaks some kind of copyright law, even though you never intended it and you might not even be aware when you look at the result. Which can get you into trouble if you commercialize that image in some sort of way.
Also, if an artist uses references, it is not just about the references. The artist’s personal skills and preferences, their experiences and memories, the story they want to tell and so many other things influence the art making process. An AI just has those references from the data set, and that’s it. So, it is NOT the same thing as an artist using references. That should be very much clear now.
By the way, a while ago I made a video about using references and when it is OK to do so. You can check it out if you’re interested. There is another factor that comes into play, which is hard to quantify. The fact that artists improve those products without consent and without compensation.
One of the common techniques for improving the outputs you get is to write something like “in the style of so-and-so”. You see this often enough with old artists like Monet, Picasso, Van Gogh, Klimt, and so on. And also with the names of artists who are very much still alive. Their artworks are also part of the data sets and their styles can be and are imitated in the outputs.
Now, you might come with the counter-argument that similar things like that already happen. For example artists uploading their works on platforms like Instagram, ArtStation, deviantArt and the like. Those companies also profit from it, and most of the time they are not monetarily compensating the artists. However, there are two key differences here.
The first one is consent. The artist has to make the conscious decision to agree to the Terms of Service and upload their artworks on those platforms. By the way, those platforms do NOT own the copyright to those artworks. The majority of times the Terms of Service you agreed to only allows them to display them publicly within the platform. If you upload something, and you owned whatever you uploaded, then you still own it. Secondly, the artists technically do get something in return: they can use the various features of the platform for reaching a large audience and communicating with them.
For free even. A lot of artists can make a living from commissions and such simply because of that fact. In the case of AI image generators, the artists gain nothing from it, and they never were asked if they consent. That’s a huge difference.
There is another point that suggests that this usage of copyrighted materials does not fall under Fair Use. There are four factors that judges need to consider, of which one of them is the effect on the “potential market”. So, we already found out that this technology can be quite good at imitating artists’ styles. A style cannot be copyrighted. Only a specific work, or character design, logo, or the like. However, this particular instance could harm the income of that specific artist.
For example they are widely known for their unique and aesthetic looking style, and therefore they get a good amount of commissions, because people like that style and want drawings that have the same look to it. Now they can use this AI tool instead to get what they want for much cheaper and quicker. Well, most likely not exactly what they would have wanted. This technology is not perfect. But it might be good enough. And that person might have otherwise commissioned that artist, but now they don’t see a reason to do that anymore.
Of course not all people who do that would have been potential clients. But there might and probably will be a significant impact on that artist’s number of commissions and income. Those instances are of course hard to prove and put into numbers. But it’s just a question of the scale, not whether or not it happens. It most definitely does and will happen.
And then there is another important thing that is harmed. The identity of that artist. For certain popular artists there is this flood coming in of all of that AI generated “art” that has been made by using their names. It is getting harder and harder to distinguish which artworks are the originals, and which ones are the fakes made by AI.
And that is terrifying for those artists. Which is a completely normal reaction. You put so many years into developing your skills, putting so much effort and heart into your creations, and then a machine comes in and imitates everything you do at amazing speed. It is scary because this AI imitates a part of those artists. Which then raises the question: if you think that this is still fine, then at what point will it not be anymore? When AI imitates your voice? The way you chat? The way you look in photos? Where exactly is the line? Identity theft is clearly a crime. However, in this particular case the definitions are very blurry.
There is another way how this AI can be misused, which is by pretending to be a digital artist in the conventional sense, accepting commissions, like for example on Fiverr, and then actually making them with AI tools. In some cases it is incredibly hard, if not even impossible to tell. Reliable tools to detect AI generated images do not exist yet, as I am saying this.
However, perhaps there could be one in the future. It would also have to be trained in large scales with deep-learning methods, which is costly after all. I have no idea if it is actually possible though, also considering that AI image generators are also constantly evolving.
Until then, committing fraud like this would be very easy for people, which also would damage the art commission market for non-AI artists by reducing the public’s trust. The discussions about AI “art” also often bring up the comparisons with other technological advances that replaced artists. The Printing Press, Photography, Digital Art, 3D Technology, and so on. And yes, they did in fact replace the majority of specific jobs. But in none of these cases were those new technologies built upon infringing the copyrights and identity of other creative people. That is a core difference here.
So far we have talked about the legal and ethical issues mostly concerning artists. However, there are problems with this technology that are way more far reaching. Deepfakes, Disinformation, Harassment, … there are many ways this technology could be misused if there are no regulations and mechanisms to contain it.
Spreading disinformation and forged images is already easy enough and successfully fools a significant number of people. Which causes damage in all sorts of ways. AI image generators give people an enormously easy tool for producing images that portray certain people in a way that damages their reputation. For example you type into the prompt line the name of a political figure, and then “showing their nazi tattoo on their bare chest”. If the machine has learned all of those things and has no filters implemented, then it could be easily produced. If you think that nobody would fall for something so obvious, then I seriously have to ask you if you ever have been on social media.
It has been shown that about 59% of the time people share links on Twitter they never even clicked on them. Now do you think that if people see a picture that is convincing enough and would fit their own personal bias, that they all would go through the effort to verify that picture? I am highly pessimistic about that. An idea would be to implement filters. For example by scanning the outputs and blurring out problematic content.
Doesn’t work all the time though, and sometimes they give you false alarms. Or you use a filter in the prompt line, so that if you type in certain words and names, you get unrelated or no output at all. Some of the current products out there already use text filters like that, but filters can be fooled.
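Here is a minimal sketch of what such a prompt filter could look like, and why it is easy to fool. The blocklist and the function name `is_allowed` are made up for illustration. Actual products do not publish their filters, so this is only an assumption about the general idea, not a description of any real system.

```python
import re

# Hypothetical blocklist; real products keep theirs private.
BLOCKED_TERMS = {"nazi"}


def is_allowed(prompt):
    """Reject a prompt if any of its words is on the blocklist."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return not any(word in BLOCKED_TERMS for word in words)


print(is_allowed("a politician showing their nazi tattoo"))  # blocked
print(is_allowed("a politician showing their n4zi tattoo"))  # slips through
```

Swapping a single letter for a digit is enough to get past this naive word match, which is the kind of weakness meant by "filters can be fooled".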
Blacklisting websites, like ArtStation, won’t work too well either, since duplicates of those images often exist elsewhere. However, if you start with a data set that only has images from the Public Domain and images you own the licenses for, while also avoiding image sets that could produce problematic results, then there would be far fewer issues with this technology. The key point is that the technology itself is not a morally wrong thing. It is how we train and use it.
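Starting from a data set that only contains public domain images and images you hold a license for can be sketched like this. The record fields, the license labels and the example URLs are all hypothetical. The point is only that everything without a known, permitted license is left out by default.

```python
# Hypothetical license labels that we are clearly allowed to train on.
ALLOWED_LICENSES = {"public-domain", "licensed-by-us"}


def opt_in_only(records):
    """Keep only records whose license is explicitly on the allow list.

    Records with an unknown or missing license are dropped by default.
    """
    return [r for r in records if r.get("license") in ALLOWED_LICENSES]


records = [
    {"url": "https://example.com/a.png", "license": "public-domain"},
    {"url": "https://example.com/b.png", "license": "all-rights-reserved"},
    {"url": "https://example.com/c.png"},  # no license information at all
    {"url": "https://example.com/d.png", "license": "licensed-by-us"},
]

usable = opt_in_only(records)
print([r["url"] for r in usable])
```

Only the first and the last record survive. The image without any license information is treated the same as a protected one, which is the safe default.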
Now, could you come in and sort of “clean up” those neural networks and remove all of that data that is based on illegal and unethical images? So, could you “opt-out”? The answer is: not really. Again, it is not the original images that are in that neural network, but instead the attributes and patterns it learned. Going in and pinpointing which connections are based on one specific picture from the data set is close to impossible. For the most part you can say that those machines cannot forget. Therefore an opt-out system would not work. You can opt-out of a data set, but not out of an already existing AI system that learned from the data set.
Also, even if a good way for opting out would exist, not everyone would know about it. You would have to implement a wide reaching notification system that copyrighted materials have been used for machine learning and present the owners the option to opt-out. Which is not realistic. Especially since very often when you get images from the internet there is no license or author information next to it. Creative Commons licenses are in a gray area.
They usually require attribution, so you need to give credit to the original author. How exactly would you do that in the case of AI generated images? Should there be an enormous list of names on their website? Or are you going to trace everything back every time you make a new image and see which Creative Commons works have been used and spit out the author names of those? That sounds as difficult as the machine unlearning problem, if not even more difficult. So, everything has to be opt-IN by default! Companies are only allowed to use data sets that they clearly know where the images come from and are allowed to use them. And yes, that would mean that a lot of existing AI image generator systems would have to be scrapped.
They would have to start from scratch, since as I said before, those machines cannot unlearn. And the new systems would perform worse and less versatile, since they are based on much smaller data sets. Or those data sets would be far more expensive, if they are going to buy the licenses for millions or even billions of pictures. That would be very damaging for those companies, and could even ruin them. Which means jobs would be lost.
And machines would have to be re-trained, which requires a lot of GPU power. Also not an optimal solution, but as far as I can see, it is the best solution we have. Also, those people won’t have too much of a problem finding new jobs.
AI engineers and programmers are very high in demand. It’s not like with coal miners for example. Then there is of course the fact that laws aren’t the same in every single country. If you enforce such an opt-in system in the EU for example, but outside of it companies are still allowed to do whatever they want, then companies inside of the EU simply could not compete. The products of those foreign companies could be restricted or even banned inside the EU, so that puts some pressure on them, but would that be enough? Well, I don’t really know how this works, or could work.
I am not an international law expert. And some situations are so complex that not even those experts exactly know what would be the case. And then there is another huge problem that makes the enforcement of certain rules very difficult. Stable Diffusion, one of those text-to-image models, has gone Open-Source. This means anybody can download the source code and run it on their own computer.
People like you and me can have our own AI image generator, without even having to be connected to the internet. And we can also change the code however we want. You are seeing the problems here? Even if the companies are forced to comply with the laws, those individuals can easily dodge them. You cannot control what every single person does on their personal computer.
So for example, someone can just come in and train a machine to copy a specific artist’s style, like it was done with the renowned artist Kim Jung Gi, who has passed away just recently. This has created a huge outrage in the art community and beyond.
And even if there are bans and filters out there to prevent deepfakes being made, people and shady organizations could easily bypass those filters and generate whatever they want. And catching them is very difficult, especially if they live in countries whose laws are much more lenient, as long as they don’t target the governments of those countries. Certain governments could even use it for their own purposes and just deny any involvement. Like it is already happening.
Spreading fake news that leads to more chaos and division, so that your enemy is busy with internal problems and you can politically profit from it. Sounds familiar? Not going to name any specific governments and politicians though. I don’t want to drift away too much from the actual topic of this video. By the way, this is actually not new.
Deepfake open-source software already exists, and is actively used for creating disinformation and even deepfake porn that targets specific individuals. Most often women, and not just celebrities. And this software has existed for several years at this point. Open Source often is a good thing.
It helps research and accessibility. But in this particular case there are some severe problems. I also would like to point out this double standard: Stability AI, which is the company that developed Stable Diffusion, among other products, is also working on “Dance Diffusion”, an open-source AI audio generation tool. So not only can you make visual media with AI, but also music.
However, in this case they DID make sure to avoid any kind of copyrighted materials. They admitted that their models tend toward memorization and overfitting, which can lead to unwanted copyright violations if copyrighted samples have been used. And that was the reason why they didn’t include them in the data set from the very beginning. So… why exactly could they not have done the same thing with their visual counterpart? They must have known about those problems before they started developing Stable Diffusion. Those kinds of technologies aren’t actually new. People have been working on this for quite some years already.
Well, the most plausible explanation to me: they fear the music industry more than the industry that produces visual media. It shows how people treat images and videos and how little they care about who made them. Especially on the internet. There is still a large amount of people who think that if an image is on the world wide web, then it is fair game. If you uploaded something, then you don’t own it anymore.
Which is legally and morally not correct. Now, I also don’t want to see an internet where anything copyrighted is extremely controlled. Everything you upload is scanned for potential infringements. You are not allowed to play games in videos and streams anymore.
No memes depicting anything remotely copyrighted. No fan art. Nothing. That would suck. Sure, it would not be the end of the internet.
But so many things that we love would be gone. Just so that the large companies have absolute control over their IPs. You have to look at things case by case.
Sometimes copyright violations are harmful, sometimes they are harmless, and sometimes they can even be beneficial for the copyright holder by basically providing free advertisement for them. As far as I can see it, AI image generators and their current business models are pretty much only on the spectrum between harmless to very harmful. So, as you can see there is a large number of legal and ethical problems with this technology, that even goes far beyond the art world. Unfortunately, laws are always lagging behind new inventions. Especially nowadays when new, far reaching innovations pop up so incredibly fast. It takes time to go through new lawsuits and develop a foundation of cases to reference from.
And you can very much bet that those companies are going to do everything they can to fight against regulations that could significantly harm their revenue. But things definitely have to change, and fast. This current legal wild west situation cannot remain forever, especially since the capabilities of those technologies keep on advancing further and further. One of the biggest fears of artists is that all the jobs and commissions will be gone for them.
That everything will be done by AIs. Well, let’s say a studio wants to use one of the many AI tools that are based on those problematic data sets right now and produce an animated movie with it. It would actually be a very risky and potentially expensive thing to do. Because you see, they don’t just produce something and release it once it is done. Before they are allowed to release anything, a legal team has to come in, look at every single detail, ask how it was made and what tools and references were used, and make sure that any kind of copyright violation is avoided. Why? Because if it turns out that their work, after they released it, actually contains something that they do not own and the original copyright holder comes after them, they will be hit by a take-down order.
All the digital releases, prints and what not have to be removed, compensation must be paid, and the copyrighted materials have to be replaced. Or you scrap everything altogether, because it is not worth it anymore to edit and re-release it. That is a scenario that those studios absolutely want to avoid. Now if you use an AI tool where you cannot be sure that it won’t violate some copyright laws, even if unintentionally, then that would be a risk.
Could be a large risk, or a rather small one. But the risk exists, and is very hard to control. So I can tell you that the industry is not going to replace artists on a wide scale anytime soon. In certain areas perhaps, like concept artists, colorists, in-between animators,… tasks where there is no risk of creating any copyright violations for the final releases.
But other areas, like character designers and background artists, don’t have that much to worry about. At least not yet. That’s the thing… I can only base this on assumptions that make sense to me, but I could be wrong and things will turn out completely differently. As a matter of fact, nobody can clearly tell how things will develop.
One thing is for sure though. These products are not just there for being used as tools by artists. Sure, they can be used in various ways to make things more efficient and give you some new ideas.
However, they would primarily replace artists. Make it so fast, cheap and easy to use that you can replace an entire team with just one person. That is the vision. There are already certain individuals and studios that are making comics and animations with those tools. And well, to those I wish good luck.
I’d love to see how they are making sure that their finished, commercialized works are completely free of any copyright violations. There is another factor outside of the legal ones that would discourage studios from using AI tools: They have a negative reputation. And if using a certain tool gives your studio a bad reputation, therefore lowering your sales, you would want to stay away from it.
Another area: art competitions. In the Colorado State Fair’s annual art competition in 2022 a piece of AI generated “art” won the prize for emerging digital artists. The creator won 300 dollars.
Goes without saying that people were not happy, since he put significantly less effort into his work than everybody else. It’s like going to a carpentry competition, where everybody painstakingly makes furniture with normal tools out of raw wood, and then some dude gets something from IKEA, adds a few details, gives it some new colors, and wins with it. How exactly is this fair? That’s not what those competitions are all about. There has to be a clear distinction between conventional art and AI “art”. Even calling it “art” feels inappropriate. But I’m not going to get into that discussion, because defining art is highly subjective.
Then we would get into semantics and philosophical blah blah... Now let’s assume we are going to have a text-to-image tool that produces high quality, aesthetic results without utilizing any questionable data sets. Studios would not have to worry that any copyright violations might accidentally occur, and all privacy laws are respected. Well, I do have to admit, it would probably become a dominant tool in the industry. Companies can save a lot of money and time using it. Not all artist positions would be affected equally, but it would definitely become much harder to find a job as a conventional, human artist.
And the argument that everyone should just switch to AI doesn’t hold either, because the number of people you would need to produce something would go down significantly. There would be much less demand for artists, even if they use AI tools. And this is already in the process of happening. You can simply go to Fiverr and look for commissions like DnD characters or album covers, and see that people offer their AI tool assisted services much cheaper and quicker than services offered by actual artists.
I wish I could tell you that this would have no effect on the job and commission markets. But we do have to stay realistic, whether we like it or not. And it is not just artists. For example, stock image companies like Shutterstock and Getty.
They are actually banning AI generated images off their platforms. Well, or at least they are trying to some degree. But fact is that if you could make any image you want without having to buy a license, then their business as it is today would go down. At least in the still image area. This all sounds very dark and gloomy for us artists, but as I stated at the beginning of the video, it is actually not that hopeless.
Very far from that. Again, I said all of this under the assumption that we would get a super advanced text-to-image generator that was built upon a clean data set. Who knows if we ever get that. And there are still a lot of factors we haven’t looked at yet.
So far we almost only looked at the production side and the finished works. But what about the consumers? The people who look at the artworks, animations and so on. How do they deal with this shift towards AI? Well, people would lose trust. And they already are. I can use myself as an example. I used to randomly browse art and check out what smaller and newer artists have made.
Every now and then I found some stolen art, which was always frustrating, but it wasn’t THAT frequent of a problem. Nowadays I have no interest in doing this anymore. Lots of the pictures that get uploaded were made with AI tools. And at first glance, if you only look at the thumbnails you cannot really tell. Sometimes they are properly tagged and described as AI art, so there is at least that. But nobody forces people to add those disclaimers.
It is incredibly easy to upload AI images and then claim that you painted it. I have lost all trust. If I do not specifically know the person who posted the image I am seeing, and I know what their skill level and style looks like, then I have no way of knowing for sure if they used AI tools or not. I am not interested in seeing an AI generated image.
There is nothing inspiring, impressive or moving about this to me. Sure, from a technical standpoint it is amazing what is possible nowadays. But that’s about it. It is not just me who thinks that way. People in general will become more distrusting and indifferent. The novelty of this technology will fade over time, as it always happens with any new technology.
For example, the first 3D animated, full length feature film Toy Story, which was released in 1995, was a huge deal back then. And nowadays you have hyper-realistic 3D animations all over the place, and even those are already not that special anymore. The exact same will happen with AI generated images and videos. Especially because it will create an enormous flood of content. So many high quality pictures that anybody can generate within seconds and post everywhere. And small studios making all sorts of animations, movies and games with it.
People will grow numb to it. The companies behind those technologies use a lot of grandiose words to describe their products and plans for the future. They make a lot of claims and promises, but will they actually be able to keep them? I want to give you a general piece of advice. Take what CEOs, investors and other people that are financially involved have to say about their new, hyped up product with a BIG grain of salt.
You will most likely get a strongly biased and distorted image of its actual potential. They are trying to sell you something after all. Especially nowadays, and ESPECIALLY in the Deep Learning AI world, over-hype is a widespread phenomenon.
Nobody talks about the start-ups that fail. The promises that haven’t been kept. It’s always about the next exciting thing and how it COULD change our lives, without actually studying the demand and limitations of it. And the hype around text-to-image generators is huge right now. Which attracts more customers and more investors.
Companies like OpenAI are exploding in funding. There is one core detail that barely anybody talks about. The question of how far this technology physically can go. There are a couple of factors that limit deep learning AI systems. Firstly, the size of the data set. Perhaps almost 6 billion pictures is actually not all that much, considering the huge task it is supposed to do: “Create a unique image however you imagine it”.
Processing power is limited too. They need to use a huge amount of GPU power to train those models, and run the image generation processes. And then the capabilities of the mathematical models themselves, which might be actually the biggest limitation. Deep Learning AI models are not capable of doing everything imaginable.
To give you an example, there was a contest for beating an old and highly challenging retro game, called NetHack. Various AI systems competed against each other to find out which one would score the best. Deep learning machines were expected to be dominant in this competition, considering that they were so in many other games. However, their performance was actually lacking quite a lot. Symbolic AI, which is driven by pure logic instead of relying on training a neural network, was by far the best system.
To better describe the difference, let’s take the scenario where the character is low on health. In symbolic AI you give the instruction that it should automatically heal itself if possible. Whereas a deep learning AI would have to go through lots of trial and error, and learn from its experiences that it has a better success rate if it heals itself. Now what was the reason why the deep learning systems failed against the symbolic AI? The biggest factor is that the game is randomized every time you start it anew. And you have to start a new game every time you die.
And it’s a pretty difficult game, even though the basic principles are not that difficult to understand. There is just a ton of randomness involved. Deep Learning AIs can play predictable games with ease, like chess, Go, and the like.
However, this one was a completely different story. In the case of images, you kind of have a mix of those situations. You have very easy and predictable cases. The user wants just some image of an apple on a table. And they will get exactly that. But as complexity increases, and the user has more specific demands, the generator will start to struggle.
It can still output something that might look kind of like what the user wanted, but if the user requires perfection, then this is not good enough. As a result the user has to keep on re-rolling and making adjustments. Which requires time, and GPU power too. And in the end drawing over the result and editing still might be required.
In some cases it could have been easier to actually draw or model what you wanted yourself. If you have the skills, that is. So, the randomness that makes the machine struggle comes from the prompts. Sometimes it nicely fits what it learned from the training and is easily predictable, like a chess game. And sometimes it barely has anything to do with its training data, and it has problems generating the desired results, like in the randomized game NetHack.
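To make the low-health example from the NetHack comparison a bit more concrete, here is a minimal Python sketch. It is not how any of the actual competition bots worked. The function names, the 30% health threshold and the survival chances are all invented for illustration, and the "learning" part is a simple counting loop standing in for a neural network. It only shows the contrast: the symbolic rule is written down once, while the learner has to try things many times to find out that healing pays off.

```python
import random


def symbolic_policy(hp, max_hp, has_potion):
    """Hand-written rule: heal when health is low and healing is possible."""
    if hp < 0.3 * max_hp and has_potion:  # 30% threshold is an assumption
        return "heal"
    return "fight"


def simulate_low_health_turn(action, rng):
    """Toy environment: 1 if the character survives the turn, else 0.

    The survival chances are made up for this sketch.
    """
    survival_chance = {"heal": 0.9, "fight": 0.3}[action]
    return 1 if rng.random() < survival_chance else 0


def learn_by_trial_and_error(trials, seed=0):
    """Try random actions and record how often each one leads to survival."""
    rng = random.Random(seed)
    wins = {"heal": 0, "fight": 0}
    tries = {"heal": 0, "fight": 0}
    for _ in range(trials):
        action = rng.choice(["heal", "fight"])
        tries[action] += 1
        wins[action] += simulate_low_health_turn(action, rng)
    # Estimated survival rate per action (guard against division by zero).
    return {a: wins[a] / max(tries[a], 1) for a in wins}


# The symbolic system knows what to do immediately.
rule_action = symbolic_policy(hp=2, max_hp=20, has_potion=True)

# The learner needs many simulated turns before it prefers healing.
estimates = learn_by_trial_and_error(trials=2000)
learned_action = max(estimates, key=estimates.get)

print("rule says:", rule_action)
print("learned estimates:", estimates, "->", learned_action)
```

In this toy setup both approaches end up healing, but only because the situation stays the same across all 2000 trials. The more the situations differ from run to run, the less each single experience tells the learner.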
There are also a bunch of examples of this in already existing AI technologies. Like speech recognition systems. They made big advances over the past decades, for sure. But the improvements over the last few years are not that big, despite the large amount of funding.
You still have to speak very clearly. Accents and dialects are still causing lots of errors. Loud background noises make it harder. Context is still not always understood, and so on. It is overall pretty good, but it just cannot compete with a human listener that is familiar with the language, accent and subject matter of the spoken words. That is also why I still have to manually add subtitles to my videos, even though automatic closed captions exist.
Or do you perhaps remember IBM’s deep learning machine “Watson”, which, it was claimed, would revolutionize the medical world by basically making most doctors obsolete? It scanned through tons of medical data and images like CT scans, and was supposed to make diagnoses based on those. Well, a couple of years have passed, and in January 2022 it was sold in parts.
Not a single doctor was replaced. That doesn’t mean that this technology is useless. But rather it can only be used as an assistant by doctors for making more precise diagnoses.
By themselves those deep learning systems are not good enough. And the list goes on. Language translators, self-driving systems for cars, chat bots, and so on. They all cannot quite reach perfection. And sometimes “good enough” is not good enough.
AI image generators made huge leaps within a few years and even the last months. There is no denying that. But that could very well be because of the recently exploding media attention, investments and the fact that Stable Diffusion went open-source, which sped up the development significantly. Thanks to all of that extra momentum they were able to upscale and kind of catch up to where they technically could have been years ago. If you look at the typical learning curves of AI deep learning systems, you usually start with a large jump in accuracy, but over time the curve will flatten more and more towards a certain accuracy that is lower than 100%.
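The shape of such a learning curve can be sketched with a few lines of Python. The formula and all numbers here (a ceiling of 92%, the scale and the rate) are invented for illustration and do not come from any real model. The sketch only shows the pattern: every tenfold increase in training data buys a smaller gain, and the curve never reaches the ceiling.

```python
def toy_accuracy(num_examples, ceiling=0.92, scale=0.6, rate=0.35):
    """Invented saturating curve: approaches `ceiling` but stays below it."""
    return ceiling - scale * num_examples ** (-rate)


sizes = [10, 100, 1_000, 10_000, 100_000]
accuracies = [toy_accuracy(n) for n in sizes]

# Gain in accuracy for each tenfold increase in training data.
gains = [later - earlier for earlier, later in zip(accuracies, accuracies[1:])]

for n, acc in zip(sizes, accuracies):
    print(f"{n:>7} examples -> accuracy {acc:.3f}")
print("gains per 10x more data:", [round(g, 3) for g in gains])
```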
And you have that in problems where there is a clear true or false distinction. For example for determining if a tumor is malignant or benign. And then you test if those true and false results actually match the reality. But in the case of generating images, there is no true or false.
And you don’t even want that kind of distinction, because you don’t want to straight up replicate images from the data set. This is a more subjective problem. We can easily say that a strange blob does not count as a successfully generated image of an apple. A photo-realistic picture however would be a success.
But at what point in between would you count it as not a failure anymore? This makes things vastly more complicated, especially with more complex prompts than just an apple. As I quoted Stability AI themselves, those kinds of models tend to overfit. Which means that those models will follow errors and noise in the training data too closely.
To better explain what overfitting means, let’s look at a more relatable example: You trained your dog to raise its paw when you hold out your hand. Overfitting would mean that the dog only does this if you hold out your right hand. The dog gets confused when you do this with your left hand, or if another person does this. So your training model failed at achieving more generalized results.
It only works perfectly for this very specific scenario. Those weird fragments and the blurred out look of those AI generated images are because of that. It is incredibly difficult to get rid of that in such a complex and flexible model. If not even impossible.
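The same idea can be shown with a tiny, hand-made data set in Python. This is only a toy sketch and has nothing to do with how an image generator is built. The data, the two deliberately wrong training labels and both "models" are made up for illustration. The memorizing model copies the label of the closest training example, so it also copies the mistakes. The simple model only learns one cut-off value.

```python
# True rule behind the data: label is 1 when x >= 5, else 0.
# Two training labels (x=2 and x=7) are flipped on purpose to act as noise.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

# Clean test points that were never seen during training.
test = [(0.4, 0), (1.6, 0), (2.2, 0), (3.4, 0), (4.4, 0),
        (5.6, 1), (6.8, 1), (7.2, 1), (8.4, 1), (9.4, 1)]


def memorizer(x):
    """Overfitting model: copy the label of the nearest training example."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def training_errors(threshold):
    """Count training mistakes of the rule 'predict 1 if x >= threshold'."""
    return sum((1 if x >= threshold else 0) != y for x, y in train)


# Simple model: pick the single cut-off with the fewest training mistakes.
best_threshold = min(range(0, 11), key=training_errors)


def simple_model(x):
    return 1 if x >= best_threshold else 0


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


print("memorizer   train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple rule train/test:", accuracy(simple_model, train), accuracy(simple_model, test))
```

The memorizer is perfect on the data it was trained on and noticeably worse on new data, while the simple rule accepts a couple of training mistakes and generalizes better. That gap between training and new data is what overfitting looks like.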
Something else is happening on social media. In addition to making all of those lofty promises of revolutionizing humanity, they are also attacking opponents of that technology. Especially artists. “Democratizing art” is what you often hear as a selling point. Which implies that art was not democratized before.
Not available to the public, but only in the hands of a few elites. Us artists. First of all, ANYBODY can draw. Anybody can communicate visually by creating images with paper and a pencil. Or a stick and the dirt under your feet.
Even small children are capable of that. And anybody can learn how to get better at creating art. Especially nowadays there are plenty of free tutorials and tools out there. Like my own tutorial videos here, and so many others all over YouTube and the rest of the internet.
It has never been easier to learn art, as a matter of fact. This is just about producing high quality images without having to put any effort into it. That’s it. What exactly are you trying to democratize here? This is not comparable at all to things like the right to education, the right to vote, the right to free speech or any of that, even remotely. But that statement does one thing: demonize artists.
The artists are the gatekeepers. The tyrants, the anti-democratic elites that want to keep their treasures for themselves. I hope… I really hope that I don’t even have to explain why that is an utterly ridiculous statement.
The reason why so many artists are against this technology has NOTHING to do with that. Rather, the concerns we have are the legal and ethical problems with this technology. But you know what else this shows? The desperation of those people who go against artists like that. If you get cornered and cannot argue against those ethical accusations, which were clearly proven already in many cases, then you can either admit it and change your ways, or you come up with ways to paint yourself as the victim.
A common tactic. Now this is getting rather heated, and I want to say that even though certain individuals high up there actually do claim those things, I am not trying to generalize. The vast majority of people that work at those companies are just normal people that try to earn some money to feed their families.
They are passionate about technology, they enjoy working with AI, and are people who genuinely want to do good things. And I want to encourage you all to always remind yourself of that. Do not generalize. Ever.
Not just in this particular situation. Also, there is an argument in favor of this “democratization” that I actually support. The fact that it gives certain disabled people a tool for creating art. For example people with severe carpal tunnel syndrome, Parkinson’s Disease, and the like. They could use speech recognition systems for writing prompts and making images with that.
That is fantastic for those who are really passionate about creating art, but couldn’t do so before. On the other hand though, and I have to be that guy… Those people would still be at a disadvantage. To create something outstanding you still need to make adjustments via selecting things and drawing over parts. Or modeling something at first and then running the image generator over it.
They would still struggle competing on the cruel, capitalist markets. And I think that it would be better to let those millions and even billions of dollars that go into those AI image generator technologies right now flow into medical care, so that people with those disabilities can afford getting the surgeries, treatments, prosthetics and medication they need. Then not only would they be able to hold a pen normally, but also live a normal life otherwise. And isn’t that what most of us humans want to do? Those text-to-image generators will create more accessibility, but I don’t see it being revolutionary. “Democratization” is still not a suitable word for this.
Ok, now let us assume that we have a text-to-image system that was built upon an ethical data set. There are ways to easily avoid and counter things like fraud, deepfakes and so on. And they can actually keep their promises and you are able to create whatever you want. So, how I imagine it, you would have tons and tons of small studios popping up, making comics, animations, games and all sorts of other things at very low costs and with amazing looking results.
That would be pretty cool, for sure! So many opportunities would open up for people. If I for example could easily make animations of my lil mink, and not have to put a crap-ton of hours into it, then that would be neat. But the thing is… it wouldn’t be special anymore. Not only would I get much less gratification from it, since it cost me so much less effort, but everybody else could do this too. There is nothing outstanding about this.
There would be such an enormous flood of content, created by all sorts of people. You would have to compete with all of that and somehow stick out. And, like it is nowadays, the big studios that can afford to have larger and more impressive productions and far more money for marketing are the ones that receive the most attention. So you can make that comic you have been dreaming about, which is cool. It really is.
But are you going to be able to make a living off it? Or even any considerable amount of money? Not very likely. Sure, there are other things than money. But these AI tools will get bland over time. Your first comic will be exciting to make.
You dreamed of creating it for so many years. The second one might still be fun. But then it becomes less and less special.
And you know that everybody else is making those things. Another idea I heard was communicating with others by sending AI created images. Which indeed does sound fun. Like a feature next to the emojis and GIFs we currently are using in various chats. Great for creating memes on the fly. Or describing something with an image, which might be more efficient than just using words.
But would you say that all of this is revolutionary? Benefiting humanity in significant ways? It’s a fun and exciting gimmick. It enables studios to save a bunch of money. And in certain areas it has convenient uses. But that’s about it.
I know that this is all speculation. I cannot look into the future and see all the possibilities it might have. But I can say with relative certainty that it won’t solve hunger around the world, stop wars, fight corruption, eliminate xenophobia, solve the climate crisis, cure cancer, or anything like that. There are some AI systems that can actually help solve those problems. But this ain’t one of them.
So let’s not fall for that overselling. Oh, by the way. I often heard that all of this is one of the steps towards a utopian idea of a world where nobody has to work anymore since everything is done by robots and AI.
All humans can do whatever they want and enjoy life. I’m not even going to bother with this, since this is so far ahead into the future and so unsure if we’ll even get to that point. I’d say it’s still just science fiction. It’s a great vision, for sure. But also, consider that the transition to such a system would be extremely challenging.
To get to 100% unemployment, you first need to get to 20%, 30%, 50%, and so on. Let’s actually focus on solving the current problems. Of which there are plenty. This is just like dreaming of living on Mars, while we can’t even take care of the planet we are currently living on. This aspect is often left out of this discussion.
But it is also very important, and why I am confident enough to say that artists will NEVER be completely replaced. The human aspect is about how we feel, how psychology works. You often hear that you should leave this emotional stuff out. "Facts before feelings!" But humans have feelings, and those feelings guide how we perceive and consume things. That IS a fact. So, even though there were many technological advances that largely replaced a lot of artists, most of the time they were never completely replaced.
Portrait artists still get commissions, even though pretty much everyone has access to cameras nowadays. People still buy handmade crafts, even though often there are mass-produced, cheap alternatives out there, which might even have a comparable quality. Why? Because HUMANS made them. Which creates a more special, emotional connection to that object. I can give you a personal example.
Most of the stuff I own is mass-produced. My furniture is from IKEA, my electronics were assembled all over the world, my tools were manufactured by machines… I have some kind of connection to them. A practical one, since those things are useful to me. A monetary connection. Those things cost money, which is very limited in my case.
And I might have some memories attached to them. But I don’t have an emotional connection to any of this stuff. However, this little guy here. This plush is different. Well, first of all it is cute and soft. But more importantly, it has been made by a good friend of mine.
I watched how it was made during her crafting live-streams. Chat helped me choose the name. Absolotl, the axolotl.
I made some sketches of that little guy too. And whenever I look at him, he reminds me of my friend. A person, with a certain personality, history and all of that. If I lost this little plush, I would feel devastated.
Would it be a reasonable reaction? No. After all, it is just a plush in a purely physical sense. I have no practical use for it. Other than decoration, but that is also more of an emotional aspect you could say. However, that doesn’t change how I feel. So, it is important to me.
And the same thing goes for art. If you just get something that an AI machine has spit out, then sure, it might be impressive. It is pretty amazing what technology can do. But that’s about it.
And that novelty will fade over time. But if you got some art from a human, perhaps one that you specifically commissioned... now that’s a whole different story. Maybe you even know that artist personally and wanted to support them. Even though it might not be the most amazing artwork, and AI tools could make something better looking, it would have a story and an emotional value attached to it, which makes it feel so much more special.
And that value would not decrease over time. As a matter of fact, it might even increase. 30 years later this little guy will still remind me of my friend and all the good memories I associate with her. Sure, not everybody thinks that way.
You cannot project your feelings onto others. Some people value the practical aspect of things much more and don’t really care about the emotional, the human aspect of them. Which I am not going to judge. But there is a considerable amount of people who do care about such things, and there always will be. Simply because we ARE humans.
That’s how we tick. We seek out connections with other humans. We are group animals. Evolutionarily, it makes sense.
There is another psychological aspect here. And we can already see those effects. AI controlled systems aren’t always good for our well-being. Take the algorithms behind social media for example.
They are designed to maximize the amount of time you spend on their platform. That’s how they compete with other social media companies. But the psychological well-being of the consumer is not considered in this equation at all. Even when those companies, like Facebook, actually know of those negative effects. It’s just about short term profits.
However, in the long run this won’t work. There is a growing number of people, especially young people, that are permanently leaving social media. And most of the time they report that they started feeling much less anxious and that their self-esteem has improved. Another trend is the return to “dumbphones”, while the sales of smartphones go down. Smartphones have lots of great features and can be used in so many different ways.
But often those features aren’t essential. You don’t need to always have Google with you. You can look things up later. You don’t always need GPS, or a camera, or video games. None of these things are essential. People got by just fine without those things years ago.
The novelty of smartphones is declining, and more and more people realize that they don’t really need them. I personally also barely use mine. The most useful feature for me would probably be the authenticator apps. And even then there are alternatives for two-factor authentication most of the time.
That flood of AI generated content will only accelerate that movement away from social media and technology. It would be just exhausting. I also heard this idea that AI could automatically create content for you, based on your personal interests that the machine has learned about you.
Sounds impressive. And also discomforting. Even more constant attention seeking, coming from all sides. The attention span is shrinking away. Just consume, consume, consume,… and your privacy doesn’t matter.
I would hate that, and steer away from those products. Of course, that’s not how everybody would feel and act. Some people might actually love it.
There is also a good number of people that greatly enjoy using social media and that got fantastic opportunities from it. That’s the thing. It’s always a mix.
It’s never just black and white. Some people love AI generated content, and some despise it. And you got everything in between. That’s why you shouldn’t feel like AI is going to take over the creative markets completely. It might dominate it. Perhaps.
I cannot really tell. But it won’t be 100%, and probably not even close to that amount. Also, I am going to bet that if we get to that point where the majority of content is AI made, there will be a new sales pitch attracting people: “human-made”.
A movie made completely without AI. You know, how it usually is made nowadays. Or conventional artworks, instead of the AI generated images. Those human aspects will be highlighted much more.
Until just recently it didn’t make sense to have something like that in your marketing. It’s like putting a 100% vegan sticker on a bottle of mineral water. Like yeah, duh.
But if you put a sticker like that on a sausage product, and there is some kind of system that makes sure that you can trust it, then it bears so much more weight. And we would have a similar situation in the creative entertainment world. Even though it would take a lot more effort and costs to produce so