Sir Potata: I think I'm at a loss for words after realizing how much time has passed since my last post. It seems I have reached a new level of procrastination (Kuyta: procrastination? more like a waste of life) with this performance of mine. A lot has happened that I would rather not get into, as it would expose how stupid I actually am, as if it wasn't obvious already. In the last post, we covered the two AI winters and the boom in between. In this one, we pick up where we left off and come all the way to today. There is a lot to talk about, but this time we are not going far back. The first stop is the 1990s, where the last winter ended.
Well, actually, it didn't end with another boom, at least not instantly. But there was no "excess" public hype around the new developments either, which gave researchers some room to breathe. As someone who witnessed a small portion of the time between the 90s and the current boom, I can say that this time, the people who marketed the models were more careful with their choice of words. Rightfully so, because every time something labeled as AI failed to meet expectations, things went south. When the general public heard the word AI, they imagined it overthrowing humanity like in some garbage sci-fi film or novel. And yet, as you will see, there was actually a lot going on.
But AI was woven into our lives more than ever. Probably the most used AI is the search engine, but nobody labels it a "relevant site finder AI" or anything like that; it's just an "engine." Other instances are the social media "algorithms" that serve content (e.g. YT, Insta), or speech recognition, present in consumer computers since Apple started shipping theirs with it in 1993. Then 2011 came with Siri, the first virtual assistant that screamed artificial intelligence, yet it was referred to as an "intelligent assistant" at the introduction event, never once mentioning the artificial part. This is basically what I meant in the paragraph above. Now when somebody so much as begins "artif-" (Kuyta: Artificial foods? Sir Potata: Fuck off, you aren't Bill Gates, but also not incorrect :) ) all attention is drawn immediately. Good to see how far we have come.
Let's get to how we picked up after the last winter. One of the factors behind the earlier AI winters was the lack of computational power, which must have felt extreme in the 60s. Thanks to the short wait, Moore's Law came to the rescue. The law observes that the number of transistors on a chip doubles roughly every two years, which in practice meant steadily faster computers with more memory, an extreme amount of improvement if you are patient enough. (Kuyta: Some fun trivia: "Doubling a penny every day for 30 days"). It was finally time to make the first dreams of AI come true. Another factor was the data available. The Internet solved this problem very well, maybe too well for some people; we will get to that when we move on to our discussions about art. Suddenly, huge datasets to feed AI were born, and sampling plus processing became really easy compared to before. This is what the term Big Data refers to. Another term of the same significance is Deep Learning. Even though the road to Deep Learning began before the 2000s, the term only caught on after its successful results were seen around that time. Kuyta has already talked about these two in his post "How Does AI Work?" Keep in mind that these two terms are behind every development this post will mention.
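Kuyta's penny trivia actually checks out, and it's a nice feel for what repeated doubling does. A quick sketch:

```python
# Kuyta's penny trivia: double one penny every day for 30 days.
pennies = 1  # day 1: $0.01
for day in range(2, 31):  # days 2 through 30
    pennies *= 2
print(f"${pennies / 100:,.2f}")  # prints $5,368,709.12
```

Twenty-nine doublings turn one cent into over five million dollars, which is the kind of compounding that eventually closed the hardware gap.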
I already mentioned Deep Blue when talking about Turing's speculations about the field. The same year, researchers Sepp Hochreiter and Jürgen Schmidhuber developed the LSTM (Long Short-Term Memory), a type of recurrent neural network (RNN). To give a brief explanation of what it does: RNNs can store information about an element and update it with each new input, making them suitable for analyzing data where order matters. This was a big step for handwriting recognition, speech recognition, and natural language processing.
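To make "store information and update it with new input" concrete, here is a minimal sketch of a plain (vanilla) RNN cell in NumPy. The weights and sizes are made up for illustration; an LSTM adds gates on top of this idea to decide what to keep and what to forget:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vanilla RNN cell: the hidden state h carries information forward
# through the sequence and is updated at every time step.
d_in, d_hid = 4, 8
W_x = rng.normal(0, 0.1, (d_hid, d_in))   # input-to-hidden weights
W_h = rng.normal(0, 0.1, (d_hid, d_hid))  # hidden-to-hidden weights

def rnn_step(h, x):
    # The new state mixes the previous state with the new input,
    # which is why order matters to an RNN.
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(d_hid)
sequence = rng.normal(size=(5, d_in))  # 5 time steps of 4-dim input
for x in sequence:
    h = rnn_step(h, x)
print(h.shape)  # (8,)
```

The trouble with this plain version is that information fades over long sequences; the LSTM's gating is precisely what Hochreiter and Schmidhuber introduced to fix that.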
Continuing with language processing, one important paper, "A Neural Probabilistic Language Model," was published in 2003 by four researchers: Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. Thanks to the new approaches it took, this paper is the basis of the capabilities computers have today when interacting with language. In the paper, the "curse of dimensionality" refers to the problem that with so much discrete data, it becomes impossible to compare data and fit sensible parameters for a dataset as big as a language. To overcome this, the researchers used the probabilistic relations between words that frequently appear next to each other in a sentence: a move from "conditional probabilities" to "distributed representations," as they concluded. I want to take a quick break and share what I understood here. When I was experimenting with ELIZA, it was obvious that the responses were triggered by specific words in the input, a conditional response. But ChatGPT (You didn't see this coming, did you? Kuyta: I had a feeling though) recognizes the patterns in speech and finds the most probable words and sequence to reply with, simply guessing what comes next. (Link to the article).
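Here is a tiny sketch of the old "conditional probabilities" side of that move: count which word follows which in a corpus, then guess the most frequent successor. The corpus is made up, of course, and real models work on vastly more context:

```python
from collections import Counter, defaultdict

# The count-based approach: estimate P(next word | previous word)
# by tallying word pairs seen in a toy corpus.
corpus = "the cat sat on the mat and the cat slept".split()
nxt = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nxt[a][b] += 1

def guess_next(word):
    # Most probable next word given only the previous word.
    return nxt[word].most_common(1)[0][0]

print(guess_next("the"))  # prints "cat" ("cat" follows "the" twice, "mat" once)
```

Bengio et al.'s contribution was to replace these raw counts with learned word embeddings, so that similar words share statistical strength instead of each word combination needing its own count, which is how the curse of dimensionality gets tamed.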
I probably can’t go without mentioning IBM’s Watson winning Jeopardy! against two champions but I have talked about NLP and speech recognition so much already that I want to move on. For the interested, here is a video of how it went: Watson and the Jeopardy! Challenge.
Now moving on: something that will spark a good amount of love in some of you happened in 2009. Three researchers, Rajat Raina, Anand Madhavan and Andrew Y. Ng (Kuyta: Andrew Ng is actually the GOAT of machine learning tutorials) released a paper titled "Large-scale Deep Unsupervised Learning using Graphics Processors." It's self-explanatory, really. The problem: unsupervised learning takes a darn long time. The solution: take advantage of a computer component specifically designed to process data in parallel, fast, instead of the CPU. Nowadays it is standard practice to use GPUs for AI. Combined with cryptomining, this surely resulted in some fun moments recently for gamers who just wanted to run their latest extremely demanding video game on a brand new GPU (Kuyta: funny how the prices skyrocketed Sir Potata: You use a goddamn Macbook dude, why do you even care).
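The paper itself used CUDA on actual GPUs, but the core idea, applying one operation to lots of data at once instead of looping element by element, can be shown in miniature with NumPy vectorization:

```python
import time
import numpy as np

# Data parallelism in miniature: one bulk operation over a million
# numbers versus a Python loop that touches them one at a time.
# GPUs push this same idea much further in hardware.
x = np.random.default_rng(0).normal(size=1_000_000)

t0 = time.perf_counter()
looped = [v * 2.0 for v in x]   # one element at a time
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
vectorized = x * 2.0            # whole array in one shot
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s")
```

Both paths compute identical results; only the degree of parallelism differs, and that difference is exactly what made training big networks practical.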
Let's talk about image generation now. One type of network that made AI image generation famous is the GAN (Generative Adversarial Network). The term was first coined by Ian Goodfellow in the paper titled "Generative Adversarial Nets," published in 2014. Within these networks there is G, the generative model, and D, the discriminative model. Here is the interesting part: while G is generating, D is trying to tell its fakes apart from real data, so the network is essentially trying to fool itself, creating the "adversarial" dynamic and, eventually, products almost indistinguishable from reality. However, it is important to keep in mind that this type of network has some setbacks. Training such a network is not easy, for several reasons. Firstly, the network is prone to memorizing the dataset and producing results almost identical to the training data. Secondly, even if it produces something new, there can be a lack of diversity in the output. Lastly, training requires tons of computational resources and time. The first two actually pose more of a problem for artists than for users, a topic I will definitely talk more about in the upcoming posts. Here is the link to the article.
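The G-versus-D game is easier to see in code than in prose. Below is a minimal sketch of a GAN training loop in PyTorch on fake one-dimensional "data"; the network sizes, learning rates, and target distribution are all made-up toy values, nothing from the original paper:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Minimal GAN sketch: G maps noise to fake samples, D scores how
# "real" a sample looks, and the two are trained against each other.
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real" data: N(3, 0.5)
    noise = torch.randn(64, 1)
    fake = G(noise)

    # D tries to label real samples 1 and fakes 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # G tries to make D label its fakes 1, i.e. to fool D.
    loss_g = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(256, 1)).mean().item())  # mean of generated samples
```

The `detach()` call is the small but crucial detail: when D is being trained, G's weights must not receive gradients, and vice versa, otherwise the two players would stop being adversaries.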
Then, pretty recently, in 2015 a paper called "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" was published by four researchers: Sohl-Dickstein, Weiss, Maheswaranathan and Ganguli. Seriously, the title sounds like a random word salad, at least to me. For now, I'll only say that this is the article that discovered the technique used in the best image generation programs today, and it is some real black magic. (Wait for my Image Generation with AI post for more details.) Obligatory article link.
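A tiny peek behind the black magic, so the title sounds less like word salad: the "thermodynamics" part is a forward process that gradually drowns a signal in Gaussian noise, and generation learns to run that process backwards, denoising step by step. Here is only the forward half, on a made-up 1-D "image," with an illustrative noise schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward diffusion: over T steps, less and less of the original
# signal survives and more Gaussian noise takes its place.
T = 200
betas = np.linspace(1e-4, 0.05, T)   # toy noise schedule
alphas = np.cumprod(1.0 - betas)     # fraction of signal surviving at step t

x0 = np.sin(np.linspace(0, 2 * np.pi, 100))  # the clean "image": a 1-D wave

def noisy_at(t):
    # Closed-form sample of the forward process at step t.
    return np.sqrt(alphas[t]) * x0 + np.sqrt(1 - alphas[t]) * rng.normal(size=x0.shape)

early, late = noisy_at(5), noisy_at(T - 1)
print(abs(np.corrcoef(x0, early)[0, 1]), abs(np.corrcoef(x0, late)[0, 1]))
```

Early steps still correlate strongly with the original; by the last step it is essentially pure noise. A diffusion model's whole job is learning to undo one of these noise steps at a time, starting from noise and ending at an image.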
And finally we have come to the part Kuyta specifically requested me to talk about: LLMs. He is absolutely right, though; these past years of AI history can't be concluded without mentioning LLMs and transformers. Kuyta has already written a post about them, but historically speaking, the story of LLMs runs in parallel with that of transformers. So, what are transformers? In 2017, Google researchers released a paper titled "Attention Is All You Need" and introduced transformers to the world. Transformers are deep learning architectures built on self-attention and multi-head attention mechanisms, which is what the title is referring to. Self-attention measures the importance of every other word in the sentence relative to the one at hand, and multi-head attention combines the outputs of several self-attention heads to capture different relationships in the data. A transformer consists of an encoder, a decoder, and a final softmax layer that turns scores into word probabilities. LLMs are mainly transformer-type neural networks. Machine translation tools are also based on transformers and still aren't perfect, which explains why the researchers back in the 1960s and 70s were so disappointed: they were dealing with a problem from 40-50 years in the future. Link to the article.
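Since self-attention is the heart of the whole thing, here is a single attention head sketched in NumPy. The sizes and random weights are illustrative only; real models learn the weight matrices and run many such heads in parallel (that's the "multi-head" part):

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-head self-attention: every word scores its relevance to every
# other word, and each word's output is a relevance-weighted mix of all.
n_words, d = 5, 8                      # 5 tokens, 8-dim representations
X = rng.normal(size=(n_words, d))      # stand-in for word embeddings
W_q, W_k, W_v = (rng.normal(0, 0.3, (d, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d)          # word-to-word relevance scores

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

weights = softmax(scores)              # each row sums to 1
output = weights @ V                   # weighted mix of value vectors
print(weights.shape, output.shape)     # (5, 5) (5, 8)
```

Row i of `weights` is exactly "how much word i attends to each other word," and multi-head attention just concatenates the outputs of several independent copies of this computation.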
For those who made it all the way here, congrats on your patience :D Believe it or not, this concludes the History of AI series. Well, this series doesn't exactly align with the main aim of our blog, but I wanted to delve in anyway. I think there is a relation between the current problems with AI and how we got here. From now on, we will discuss the current state of the field rather than its past and look into the impact AI has made on our lives, mainly on art and creation.
This series was very challenging to write, to be honest, because the amount of technical detail and abstract concepts that lie behind just one discovery was astounding. But good news: the technical part is mostly over. We can finally move on to the main part of the discussion. And not to leave without mentioning, we have some fun content planned ahead, so make sure you follow along. Until then, I'm out.