In our last post on Generative AI, we briefly discussed the general concept of generative models and how they are used across domains such as text, images, speech, video, and code. In this post, let's take a deep dive into the technology behind these state-of-the-art generative models.
But before we start explaining these models, let's see why it is important to understand this technology and how impactful it can be in shaping the future of Pakistan's IT industry.
According to Gartner’s predictions:
• Generative AI will produce 10% of all data by 2025. Currently, it’s less than 1%.
• 50% of drug discovery and development initiatives will use generative AI by 2025.
• By 2027, 30% of manufacturers will use generative AI to improve product development effectiveness.
• By 2025, 20% of all test data for consumer-facing use cases will be synthetically generated.
• By 2025, 30% of outbound marketing messages from large organizations will be synthetically generated.
• By 2025, 90% of the material in quarterly reports will be synthetically generated.
According to the 2023 Gartner Emerging Technologies and Trends Impact Radar, Generative AI will play a major role in revolutionizing production within a span of just 3 to 6 years.
Over the past three years, venture capital firms have invested over $1.7 billion in generative AI solutions, with the most funding going toward AI-enabled drug discovery and AI software coding.
The rapid growth of generative AI presents significant opportunities for the IT industry in Pakistan. Embracing this technology can pave the way for innovation, and early adoption and skill development can position Pakistan's IT industry at the forefront of these advancements.
Let's now see how these models work and what they can achieve.
How Do Generative Models Work?
Generative models use existing content, such as text, audio, video, images, and even code, to create new possible content. A well-trained Generative AI model can produce completely original artifacts that look just like the real thing. It achieves this by learning the underlying patterns in the input data and using them to generate new content. To understand how generative models work, let's first understand the distinction between discriminative and generative models.
Discriminative Models
Discriminative models are used to classify existing data points (e.g., images of cats or birds into their respective categories). They are useful for making decisions and classifying data. These models can, for example, help in predicting the likelihood of diseases such as cancer based on an individual's vitals. Radiology images can be automatically classified for the presence of tumors. These models can also be used to go through the history of banking transactions and detect possible fraud by classifying anomalies in transactions.
In short, discriminative algorithms try to find the decision boundary based on features present in the input data and predict a label or a class to which a certain data example belongs. These models essentially compress information about the differences between classes, without trying to understand what an object or feature really is.
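As a concrete illustration, here is a minimal sketch of a discriminative classifier using scikit-learn. The data, feature meanings, and class labels are made up purely for demonstration; the point is that the model learns a decision boundary and the conditional probability P(y | x), nothing more.

```python
# A minimal sketch of a discriminative model: logistic regression learns
# P(label | features) and a decision boundary between two classes.
# The data below is randomly generated purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two synthetic classes, e.g. "healthy" (0) vs "at risk" (1), each described by two vitals
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)

# The model predicts a label (and P(y | x)) for a new data point;
# it says nothing about how to generate a new, realistic data point.
print(clf.predict([[2.5, 2.8]]))        # predicted class
print(clf.predict_proba([[2.5, 2.8]]))  # P(y | x)
```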
Generative Models
Generative models try to understand the structure of the dataset and generate similar examples (e.g., creating a realistic image or generating fluent text). Instead of predicting a label given some features, they try to predict features given a certain label. While discriminative algorithms care about the relation between x (inputs) and y (outputs), generative models ask how to obtain x (input) given a certain y (output). Mathematically, these models capture the joint probability of x and y occurring together and are not concerned with the decision boundary.
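To contrast this with the discriminative sketch above, here is a minimal sketch of a simple generative classifier, Gaussian Naive Bayes. It models P(x | y) for each class plus the class priors P(y), so it can both classify via Bayes' rule and sample new feature vectors for a chosen label. Again, the data is made up for illustration only.

```python
# A minimal sketch of a simple generative model: Gaussian Naive Bayes
# estimates P(x | y) per class (as per-feature Gaussians) plus the class
# priors P(y), so it captures how x and y occur together.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# The same kind of made-up two-class data as in the discriminative sketch
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

gnb = GaussianNB().fit(X, y)

# Classification still works, via Bayes' rule: P(y | x) is proportional to P(x | y) * P(y)
print(gnb.predict([[2.5, 2.8]]))

# Because P(x | y) is modelled explicitly, we can also sample a plausible
# new feature vector for a chosen label, here y = 1.
# (gnb.var_ is called gnb.sigma_ in older scikit-learn releases.)
new_x = rng.normal(loc=gnb.theta_[1], scale=np.sqrt(gnb.var_[1]))
print(new_x)  # synthetic features that "look like" class 1
```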
Generally, the two most widely used types of generative AI models are:
• Generative Adversarial Networks or GANs — technologies that can create visual and multimedia artifacts from both imagery and textual input data.
• Transformer-based models — technologies such as Generative Pre-trained Transformer (GPT) language models that can use information gathered on the Internet to create textual content, from website articles to press releases to whitepapers (see the short sketch after this list).
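As a quick taste of the second family, the sketch below uses the Hugging Face `transformers` library to generate text with a small pre-trained GPT-2 model. The model choice and prompt are arbitrary examples; how these models work internally is the topic of the next post.

```python
# A minimal, illustrative sketch of text generation with a pre-trained
# transformer language model via the Hugging Face `transformers` library.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Generative AI will change the software industry because"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)

print(outputs[0]["generated_text"])
```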
Generative Adversarial Networks (GANs)
GANs pit two neural networks, a generator and a discriminator, against each other, hence the "adversarial" part. The contest between the two networks takes the form of a zero-sum game, where one agent's gain is the other agent's loss.
GANs have two sub-models:
• Generator — a neural network that creates fake samples from a random input (noise) vector.
• Discriminator — a neural network that classifies a given sample from the generator as a real-world sample or a fake one. The discriminator returns a number between 0 and 1; the closer the result is to 0, the more likely the sample is fake, and vice versa.
Both the generator and the discriminator are often implemented as CNNs (Convolutional Neural Networks), especially when working with images. The discriminator's feedback on the generator's output is fed back into the generator, and the process repeats, with the generator's outputs gradually becoming more realistic.
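To make this interplay concrete, here is a heavily simplified sketch of a GAN training loop in PyTorch. The toy 2-D data, tiny fully connected networks, and hyperparameters are arbitrary choices for illustration; real GANs for images typically use convolutional generators and discriminators.

```python
# A heavily simplified GAN training sketch in PyTorch: the generator maps
# random noise to fake samples, the discriminator scores samples as real (1)
# or fake (0), and the two networks are trained against each other.
import torch
import torch.nn as nn

noise_dim, data_dim = 8, 2

generator = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(),
                              nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    # "Real" data: here just a toy 2-D Gaussian standing in for real images
    real = torch.randn(64, data_dim) * 0.5 + 2.0
    fake = generator(torch.randn(64, noise_dim))

    # 1) Train the discriminator to output 1 for real and 0 for fake samples
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator to fool the discriminator (push its score towards 1)
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```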
GANs can be used in many scenarios, such as generating realistic images and faces, translating images from one domain to another, and producing synthetic training data.
In the next post of this series, we will see how language generation models (based on the Transformer architecture) work. Additionally, we will explore which of the latest trends in language and image generation could be crucial areas of focus for IT professionals and software developers.
About the writer: Dr. Usman Zia is an Assistant Professor at the School of Interdisciplinary Engineering and Sciences, National University of Sciences and Technology (NUST), Pakistan. His research interests are Natural Language Processing and Machine Learning. He has authored numerous publications on language generation and machine learning. As an AI enthusiast, he is actively involved in a number of projects related to generative AI and NLP.