

Showing posts from 2023

Whose Model is Better?

You and your friend are training a neural network for classification. Both of you are using identical training data. The data has four classes: 40% cat images, 10% dog images, and 25% each of horse and sheep images. Since the project deadline is nearing, both of you decide to run only a few epochs and get to report writing. At the same time, the two of you have a friendly $10 wager going to the winner of the better model. At the end of training, you find out that your model, Net1, is making 30% recognition errors and the resulting distribution of labels assigned to the training data is 25% for each of the four classes. As luck would have it, your friend's model, Net2, is also yielding a 30% error rate, but the labels it assigns to the training set are distributed differently: 40% cats, 10% dogs, 10% horses, and 40% sheep. Since the error rate of both models is identical, your friend declares a tie. You, on the other hand, insist that your model Net1 is slightly better
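The excerpt ends before the reasoning, but one way to put numbers on the comparison is to measure how far each model's assigned-label distribution drifts from the true class proportions, for example with KL divergence. This is only an illustrative sketch of that comparison, not necessarily the post's own argument:

```python
import numpy as np

# Class order: cat, dog, horse, sheep
true_dist = np.array([0.40, 0.10, 0.25, 0.25])   # true label proportions
net1_dist = np.array([0.25, 0.25, 0.25, 0.25])   # labels assigned by Net1
net2_dist = np.array([0.40, 0.10, 0.10, 0.40])   # labels assigned by Net2

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q has no zero entries."""
    return float(np.sum(p * np.log(p / q)))

print("KL(true || Net1):", round(kl_divergence(true_dist, net1_dist), 4))
print("KL(true || Net2):", round(kl_divergence(true_dist, net2_dist), 4))
```

The model whose assigned-label distribution has the smaller divergence from the true proportions tracks the class mix more faithfully, even though both models make the same 30% of errors.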

Mapping Nodes to Vectors: An Intro to Node Embedding

In an earlier post, I had stated that the recent advances in Natural Language Processing (NLP) technology can be, to a large extent, attributed to the use of very high-dimensional vectors for language representation. These high-dimensional vector representations, 768 dimensions being common, are called embeddings and are aimed at capturing semantic meaning and relationships between linguistic items. Given that graphs are everywhere, it is not surprising to see the ideas of word and sentence embeddings being extended to graphs in the form of node embeddings. What are Node Embeddings? Node embeddings are encodings of the properties and relationships of nodes in a low-dimensional vector space. This enables nodes with similar properties or connectivity patterns to have similar vector representations. Using node embeddings can improve performance on various graph analytics tasks such as node classification, link prediction, and clustering. Methods for Node Embeddings There are seve
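The excerpt is cut off before the methods are listed, but a minimal DeepWalk-style sketch, random walks over the graph fed to a skip-gram Word2Vec model, gives a feel for how node embeddings can be produced. The toy graph, walk length, and embedding size below are illustrative choices, not taken from the post:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

# Toy graph; in practice this would be your own graph.
G = nx.karate_club_graph()

def random_walk(graph, start, length=10):
    """Generate one random walk as a list of node-id strings."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Treat random walks over nodes like "sentences" made of node "words".
walks = [random_walk(G, node) for node in G.nodes() for _ in range(20)]

# Skip-gram Word2Vec over the walks yields one embedding vector per node.
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1, epochs=5)

# Nodes with similar connectivity patterns end up close in the embedding space.
print(model.wv.most_similar("0", topn=5))
```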

Google's Bard Can Code and Compute for You

Large language models (LLMs) continue to fascinate us with their capabilities to answer our questions, generate presentations and essays for us, and perform many other assorted tasks. These models are also good at generating code for user-specified tasks. However, almost all of them do not run the code for us; they simply give us the code that we can copy and execute. Recently, Google has given its large language model, Bard, computational capabilities as well. Bard thus not only provides the code but also executes it while answering the user's questions. I wanted to check this feature of Bard. Below is what happened when I asked Bard a question that involved some computation. Bard not only generated the code for the entropy calculation and ran it, but also went on to explain entropy and its answer. Google characterizes this computing by Bard in response to user questions as a "writing code on the fly" method. The company says, "So far, we've seen this method improve the accuracy of
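The excerpt does not reproduce the exact question or Bard's generated code, but an entropy calculation of the kind described takes only a few lines of Python; the version below is my own illustration, not Bard's output:

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits for a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Example: a four-outcome distribution.
print(shannon_entropy([0.4, 0.1, 0.25, 0.25]))  # about 1.86 bits
```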

Exploring Canonical Correlation Analysis (CCA): Uncovering Hidden Relationships

Canonical Correlation Analysis (CCA) is a statistical technique that enables us to uncover hidden associations between two sets of variables. Whether it's in the fields of psychology, economics, genetics, marketing or machine learning, CCA proves to be a powerful tool for gaining valuable insights. In this blog post, we will try to understand CCA. But first let's take a look at two sets of observations, X and Y, shown below. These two sets of observations are made on the same set of objects, with each column representing a different variable. Let's calculate pairwise correlations between the column vectors of X and Y. The resulting correlation values should give us some insight into the relationship between the two sets of measurements. These values are shown below, where the entry at (i,j) represents the correlation between the i-th column of X and the j-th column of Y. The correlation values show moderate to almost no correlation between the columns of the two datasets except for a relatively higher
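The tables and figures the excerpt refers to are not part of this listing, but a minimal sketch of the same workflow, pairwise column correlations followed by CCA, on synthetic data illustrates the idea. The data shapes, the shared latent factor, and the single canonical component below are my own assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic example: 100 objects, 4 variables in X and 3 in Y,
# with a shared latent factor linking the two sets.
latent = rng.normal(size=(100, 1))
X = np.hstack([latent + 0.5 * rng.normal(size=(100, 1)) for _ in range(4)])
Y = np.hstack([latent + 0.5 * rng.normal(size=(100, 1)) for _ in range(3)])

# Pairwise correlations between the columns of X and the columns of Y.
pairwise = np.corrcoef(X, Y, rowvar=False)[:4, 4:]
print("Pairwise column correlations:\n", np.round(pairwise, 2))

# CCA finds linear combinations of X's and Y's columns that are maximally correlated.
cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(X, Y)
print("First canonical correlation:",
      round(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1], 2))
```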

Embeddings Beyond Words: Intro to Sentence Embeddings

It wouldn't be an exaggeration to say that the recent advances in Natural Language Processing (NLP) technology can be, to a large extent, attributed to the use of very high-dimensional vectors for language representation. These high-dimensional vector representations, 768 dimensions being common, are called embeddings and are aimed at capturing semantic meaning and relationships between linguistic items. Although the idea of using vector representations for words has been around for many years, interest in word embedding took a quantum jump with Tomáš Mikolov's Word2vec algorithm in 2013. Since then, many methods for generating word embeddings, for example GloVe and BERT, have been developed. Before moving further, let's see briefly how word embedding methods work. Word Embedding: How is it Performed? I am going to explain how word embedding is done using the Word2vec method. This method uses a linear encoder-decoder network with a single hidden layer. The input layer o
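The excerpt stops mid-explanation of Word2vec, but since the post's topic is sentence embeddings, a short sketch using the sentence-transformers library shows what sentence embeddings look like in practice. The library, model name, and example sentences are my own choices, not necessarily those used in the post:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Pre-trained sentence encoder; the model choice here is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat.",
    "A kitten is resting on a rug.",
    "Stock markets fell sharply today.",
]
embeddings = model.encode(sentences)          # shape: (3, 384)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically similar sentences get similar vectors.
print(cosine(embeddings[0], embeddings[1]))   # relatively high
print(cosine(embeddings[0], embeddings[2]))   # relatively low
```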

Claude 2: A New Member of the Growing Family of Large Language Models

AI has advanced rapidly in recent years, with large language models (LLMs) like ChatGPT creating enormous excitement. These models can generate remarkably human-like text, albeit with certain limitations. In this post, we'll look at a new member of the family of large language models, Anthropic's Claude 2, and highlight some of its features. Claude 2 Overview Claude 2 was released in July 2023. Claude 2 supports a context window of roughly 100,000 tokens during conversations. This allows it to reference tens of thousands of words of prior dialogue or supplied documents in order to strengthen contextual awareness and continuity. This context capacity far exceeds ChatGPT's window of a few thousand tokens, enabling Claude 2 to sustain longer, more intricate dialogues while retaining appropriate context. In addition to conversational context, Claude 2 can take in multiple

Difference Between Semi-Supervised Learning and Self-Supervised Learning

There are many styles of training machine learning models, ranging from the familiar supervised and unsupervised learning to active learning, semi-supervised learning, and self-supervised learning. In this post, I will explain the difference between the semi-supervised and self-supervised styles of learning. To get started, let us first recap supervised learning, the most popular machine learning methodology for building predictive models. Supervised learning uses annotated or labeled data to train predictive models. A label attached to a data vector is nothing but the response that the predictive model should generate for that data vector as input during model training. For example, we will label pictures of cats and dogs with the labels cat and dog to train a Cat versus Dog classifier. We assume a large enough training data set with labels is available when building a classifier. When there are no labels attached to the training data, then the learning style is known as uns
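The excerpt is truncated before it reaches the semi-supervised style, but a small sketch conveys its core idea: training on a mix of labeled and unlabeled examples. The synthetic dataset and the choice of scikit-learn's self-training wrapper below are mine, not from the post:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Toy dataset: 300 examples, but pretend only 30 of them are labeled.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
unlabeled = rng.choice(len(y), size=270, replace=False)
y_partial[unlabeled] = -1   # scikit-learn marks unlabeled points with -1

# Semi-supervised learning: the base classifier is first trained on the labeled
# subset, then iteratively labels the unlabeled points it is confident about.
model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X, y_partial)

print("Accuracy on all points:", round(model.score(X, y), 3))
```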

Retrieval Augmented Generation: What is it and Why do we need it?

What is Retrieval Augmented Generation? Generative AI is currently garnering lots of attention. While the responses provided by large language models (LLMs) are satisfactory in most situations, sometimes we want better-focused responses when employing LLMs in specific domains. Retrieval-augmented generation (RAG) offers one such way to improve the output of generative AI systems. RAG enhances the capabilities of LLMs by providing them with additional knowledge context through information retrieval. Thus, RAG aims to combine the strengths of both retrieval-based methods, which focus on selecting relevant information, and generation-based methods, which produce coherent and fluent text. RAG works in the following way: Retrieval: The process starts with retrieving relevant documents, passages, or pieces of information from a pre-defined corpus or database. These retrieved sources contain content that is related to the topic or context for which you want to gen
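The excerpt cuts off partway through the workflow, but a minimal sketch of the retrieval step and prompt assembly shows the overall shape of RAG. The toy corpus, the embedding model, and the prompt format are my own illustrative choices, and the final generation call to an LLM is left as a comment rather than a real API call:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# A tiny stand-in corpus; in a real system this would be your document store.
corpus = [
    "RAG retrieves relevant passages before generating an answer.",
    "Canonical Correlation Analysis relates two sets of variables.",
    "LLaMA 2 is a family of large language models released by Meta.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative model choice
corpus_emb = encoder.encode(corpus)

def retrieve(query, k=2):
    """Return the k corpus passages most similar to the query (cosine similarity)."""
    q = encoder.encode([query])[0]
    sims = corpus_emb @ q / (np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

query = "What does retrieval-augmented generation do?"
context = "\n".join(retrieve(query))

# Augmentation: the retrieved context is prepended to the user's question;
# this prompt would then be passed to an LLM of your choice for generation.
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)
```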

LLaMA 2 and its Symbolic Regression Explanation

On July 17, a new family of AI models, LLaMA 2, was announced by Meta. LLaMA 2 is trained on a mix of publicly available data. According to Meta, LLaMA 2 performs significantly better than the previous generation of LLaMA models. Two flavors of the model were released: LLaMA 2 and LLaMA 2-Chat, a model fine-tuned for two-way conversations. Each flavor further has three versions, with parameters ranging from 7 billion to 70 billion. Meta is also freely releasing the code and data behind the model for researchers to build upon and improve the technology. There are several ways to access LLaMA 2 for development work; you can download it from HuggingFace or access it via Microsoft Azure or Amazon SageMaker. For those interested in interacting with the LLaMA 2-Chat version, you can do so by visiting llama2.ai, a chatbot model demo hosted by the venture capital firm Andreessen Horowitz. This is the route I took to interact with LLaMA 2-Chat. Since I was reading an excellent paper on
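For the HuggingFace route mentioned above, a minimal sketch of loading the 7-billion-parameter chat model with the transformers library looks roughly like this. It assumes you have been granted access to the gated meta-llama weights and have sufficient GPU memory; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated model: requires accepting Meta's license on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain symbolic regression in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling parameters here are just reasonable defaults, not the post's settings.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```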