A brief overview of large language models

What are large language models? 

A large language model (LLM) is a form of artificial intelligence designed to understand, generate, and interact with human language. LLMs are usually trained on large sets of data, which allows them to answer questions across a wide range of topics, from science and technology to art and culture, making them invaluable tools for information retrieval, content creation, and natural language understanding. LLMs are built using machine learning, and more specifically transformer models, a type of neural network (a computer system loosely modeled on the brain's network of neurons) that is designed to recognize patterns and learn from data. The transformer architecture allows the LLM to make decisions or predictions based on user input.

History 

The history of large language models (LLMs) can be traced back to the late 19th century. In 1883, the French philologist Michel Bréal explored the structure of languages and the relationships between words within them. Bréal's work laid a foundation that would later influence programming languages like Java and Python, as well as efforts to improve how computers understand human language, leading to simpler communication between people and machines.

The development of language models continued after World War II, when there was increased interest in systems based on natural language processing (NLP), a branch of AI that allows computers to understand, interpret, and create human language. This interest was driven by the need for efficient language translation to aid international communication and trade. In the 1960s, MIT scientist Joseph Weizenbaum created ELIZA, widely considered the first program to use NLP. The program could give predetermined responses based on keywords in the user's input.

Over time, better-trained language models were developed that could handle more complex data, especially during the 1990s as computers became more powerful. These advances significantly improved models' ability to understand and generate human-like text. More recently, they have enabled OpenAI to create ChatGPT, a tool that has revolutionized artificial intelligence by offering sophisticated conversational capabilities and a wide range of applications.

How LLMs work 

An important factor in how large language models (LLMs) work is the way they represent words. Traditional forms of machine learning use numerical tables to represent words, but a major limitation of this approach is that it cannot recognize relationships between words. The tables are also very long, as long as the number of words in the English language, and these representations do not capture any meaning shared between words (Amazon Web Services). For example, the words cat and kitten would be treated as being as different as cat and airplane. To solve this problem, words can instead be represented as multi-dimensional vectors (special lists of numbers) so that words with similar contextual meanings or relationships sit close to each other in the vector space. Using these word embeddings allows a model to capture the context of words with similar meanings and other relationships between words.
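To make this concrete, here is a minimal sketch in Python of how similarity between word embeddings can be measured. The four-dimensional vectors for cat, kitten, and airplane are made up purely for illustration; real embeddings typically have hundreds of dimensions and are learned from data rather than written by hand.

```python
import numpy as np

def cosine_similarity(a, b):
    # Values near 1.0 mean the vectors point in the same direction (similar words);
    # values near 0 mean the words are unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 4-dimensional embeddings, chosen only to illustrate the idea.
cat      = np.array([0.80, 0.10, 0.60, 0.20])
kitten   = np.array([0.75, 0.15, 0.65, 0.25])
airplane = np.array([0.10, 0.90, 0.05, 0.70])

print(cosine_similarity(cat, kitten))    # high: similar meanings sit close together
print(cosine_similarity(cat, airplane))  # low: unrelated words sit far apart
```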

Word embeddings are typically trained using neural network models on large amounts of text data. The training process adjusts the embeddings so that the model can predict words from their context, which ensures that words appearing in similar contexts end up with similar embeddings. Training also nudges word vectors closer together or farther apart based on how the words are used together in the text (Cloudflare). By the end of this process, words that appear in similar contexts have similar embeddings, and over time the model becomes able to capture both simple and complex relationships.
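As a rough sketch of this training process, the snippet below uses the gensim library (our choice for illustration; the sources above do not prescribe a specific toolkit) to train tiny Word2Vec embeddings on a toy corpus. With realistic amounts of text, words that appear in similar contexts, like cat and kitten, end up with nearby vectors.

```python
# Requires: pip install gensim
from gensim.models import Word2Vec

# A tiny toy corpus; real embeddings are trained on billions of words.
sentences = [
    ["the", "cat", "chased", "the", "mouse"],
    ["the", "kitten", "chased", "the", "mouse"],
    ["the", "airplane", "landed", "at", "the", "airport"],
]

# Each word gets a 50-dimensional vector, adjusted over 50 passes through the text.
model = Word2Vec(sentences=sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Words used in similar contexts end up with similar vectors.
print(model.wv.most_similar("cat", topn=2))
```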

Examples of LLMs

ChatGPT

ChatGPT and similar GPT models use self-attention mechanisms (a functionality that helps a model focus on relevant parts of the input) to improve understanding and response generation. These mechanisms allow the model to assess the importance of each word in context. The models are first trained on a large set of text data in an unsupervised manner to learn the patterns, contexts, and nuances of language; this phase is known as pre-training. The model learns to predict the next word in a sentence by considering the words that come before it. It can then undergo fine-tuning on more specific tasks or datasets to improve its performance on those tasks. In interaction mode, when you provide a prompt, the model generates a response based on its training, attempting to continue the text in a coherent and contextually appropriate manner.
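The sketch below illustrates the core idea behind self-attention: each word's vector is projected into query, key, and value vectors, every word scores every other word for relevance, and each word's output is a weighted mix of the value vectors. This is a simplified, single-head version written in NumPy for illustration only; production GPT models use many attention heads, learned weights, and additional layers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the input embeddings into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Each token scores every other token; higher scores mean more relevant.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # attention weights sum to 1 per token
    return weights @ V                  # weighted mix of value vectors

# Toy example: 3 tokens with 4-dimensional embeddings and random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (3, 4): one new vector per token
```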

Claude AI 

Claude is an AI chatbot developed by Anthropic, an artificial intelligence startup co-founded by former members of OpenAI (the company that developed ChatGPT). Claude is powered by a large language model, and its design aims to keep its answers harmless and honest (Proudfoot). One of the main differences between Claude and some other AI chatbots is that Claude is capable of reading files and images. It is also built with security in mind and supports compliance with standards such as HIPAA. In addition, Claude follows a set of ethical principles intended to ensure safety and reliability, which makes it a strong option for companies or users with data-sensitive tasks. Another interesting aspect of Claude is that it will admit when it is wrong or doesn't know something, which many other AI models struggle to do.

BERT

BERT, which is short for Bidirectional Encoder Representations from Transformers, is a model generally used for natural language processing. According to IBM, natural language processing involves using machine learning models to recognize, understand, and work with text and speech. One example of where BERT-style models are used is in chatbots and virtual assistants. A model like BERT, or BERT itself, will read the text you put in and comprehend it: it can take into account whether your tone is angry or sad, whether you need help, or whether you are just striking up a conversation. It can then summarize or condense your text if needed. Nearly any AI system that involves understanding or reading human language relies on natural language processing models like BERT, including assistants such as Alexa, Google Assistant, and Siri.
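As a small illustration of this kind of text understanding, the snippet below uses Hugging Face's transformers library (an assumption on our part; the post's sources don't name a specific toolkit) to run a BERT-family sentiment model on two messages, labeling each as positive or negative with a confidence score.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Loads a default BERT-family model fine-tuned for sentiment analysis.
classifier = pipeline("sentiment-analysis")

print(classifier("I am so frustrated, nothing is working today!"))
print(classifier("Thanks so much, that solved my problem."))
# Each result includes a label (e.g. POSITIVE or NEGATIVE) and a confidence score.
```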

Conclusion 

Five years ago, most people wouldn't have known what artificial intelligence (AI) was if you had asked them. Today, large language models have taken over the world, as seen through the rise of technologies like ChatGPT, Alexa, and Google Assistant. Technology has grown significantly over the past few years, and large language models now touch almost every aspect of our lives, from what we read online to every Google search we make. In the future, we can only expect LLMs to grow even further and become more advanced, so it's essential to understand how they work now to better prepare ourselves for the future.

Furthermore, large language models offer insight into how computers can interact with humans. This interaction gives them more human-like qualities and creates conversations we would never have thought possible. You can talk with ChatGPT for hours on end, asking it questions and never getting bored. Similar models can also draft entire books for you or check whether your resume is up to par. There are so many options for what large language models can offer the world, and we've only seen a tiny fraction of what they can do.

One thing we encourage you to do after reading this article is to think about how you can use large language models like ChatGPT to enhance your everyday life. Maybe this means generating grocery lists, learning a language, or automating a repetitive task you do every day at work. There are nearly limitless possibilities for what LLMs can do for you, and by using them to their fullest, you'll be ahead of the curve. By embracing these models, we can move toward a future where technology further enhances our lives and fosters a society we would never have imagined.

For a list of the references used for this post, please check here: https://bit.ly/3U6gSpK

This blog post was written by Sahil Chugani (Senior STEM Data Fellow), Reagan Bourne (Senior STEM Data Fellow), and Carlos Bravo (STEM Data Fellow) from FSU Libraries.
