

Whose Model is Better?

You and your friend are training a neural network for classification, both using identical training data. The data has four classes: 40% of the examples are cat images, 10% are dog images, and horse and sheep images make up 25% each. Since the project deadline is nearing, you both decide to run only a few epochs and get on with writing the report. The two of you also have a friendly $10 wager on who builds the better model. At the end of training, you find that your model, Net1, makes 30% recognition errors and assigns labels to the training data in equal proportions, 25% for each of the four classes. As luck would have it, your friend's model, Net2, also yields a 30% error rate, but its assigned labels on the training set are distributed differently: 40% cats, 10% dogs, 10% horses, and 40% sheep. Since both models have the same error rate, your friend declares a tie. You, on the other hand, insist that your model Net1 is slightly better
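
The teaser above does not say which measure settles the wager. One way to look at it, assuming we compare each model's assigned-label distribution against the true class distribution using KL divergence (my choice for illustration, not necessarily the one used in the full post), is the following minimal sketch. The function name `kl_divergence` and the variable names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    # Terms with p_i == 0 contribute nothing to the sum, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Class order: cat, dog, horse, sheep (proportions taken from the text above).
true_dist = [0.40, 0.10, 0.25, 0.25]   # true class proportions in the training data
net1_dist = [0.25, 0.25, 0.25, 0.25]   # labels assigned by Net1
net2_dist = [0.40, 0.10, 0.10, 0.40]   # labels assigned by Net2

# Assumption: KL divergence is the comparison measure; the excerpt does not name one.
kl_net1 = kl_divergence(true_dist, net1_dist)
kl_net2 = kl_divergence(true_dist, net2_dist)

print(f"KL(true || Net1) = {kl_net1:.4f}")  # about 0.0964
print(f"KL(true || Net2) = {kl_net2:.4f}")  # about 0.1116
```

Under this particular measure, Net1's label distribution sits slightly closer to the true class proportions than Net2's. Keep in mind that a label distribution alone says nothing about which individual examples were classified correctly, so this is only one lens on the comparison.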
Recent posts

Mapping Nodes to Vectors: An Intro to Node Embedding

In an earlier post, I stated that the recent advances in Natural Language Processing (NLP) technology can be, to a large extent, attributed to the use of very high-dimensional vectors for language representation. These high-dimensional vector representations (768 dimensions is common) are called embeddings and are aimed at capturing semantic meaning and relationships between linguistic items. Given that graphs are everywhere, it is not surprising to see the ideas of word and sentence embeddings being extended to graphs in the form of node embeddings. What are Node Embeddings? Node embeddings are encodings of the properties and relationships of nodes in a low-dimensional vector space. This enables nodes with similar properties or connectivity patterns to have similar vector representations. Using node embeddings can improve performance on various graph analytics tasks such as node classification, link prediction, and clustering. Methods for Node Embeddings There are seve

Google's Bard Can Code and Compute for You

Large language models (LLMs) continue to fascinate us with their ability to answer our questions, generate presentations and essays for us, and handle many other assorted tasks. These models are also good at generating code for user-specified tasks. However, almost none of them run the code for us; they simply give us code that we can copy and execute. Recently, Google gave its large language model, Bard, computational capabilities as well. Bard thus not only provides the code but also executes it while answering the user's questions. I wanted to check this feature of Bard. Below is what happened when I asked Bard a question that involved some computation. Bard not only generated and ran the code for the entropy calculation but also went on to explain entropy and its answer. Google characterizes Bard's computing in response to user questions as a "writing code on the fly" method. The company says, "So far, we've seen this method improve the accuracy of

Exploring Canonical Correlation Analysis (CCA): Uncovering Hidden Relationships

Canonical Correlation Analysis (CCA) is a statistical technique that enables us to uncover hidden associations between two sets of variables. Whether in psychology, economics, genetics, marketing, or machine learning, CCA proves to be a powerful tool for gaining valuable insights. In this blog post, we will try to understand CCA. But first let’s take a look at two sets of observations, X and Y, shown below. Both sets of observations are made on the same set of objects, and each column represents a different variable. Let’s calculate the pairwise correlation between the column vectors of X and Y. The resulting correlation values should give us some insight into the relationship between the two sets of measurements. These values are shown below, where the entry at (i,j) represents the correlation between the i-th column of X and the j-th column of Y. The correlation values show moderate to almost no correlation between the columns of the two datasets except a relatively higher

Embeddings Beyond Words: Intro to Sentence Embeddings

It wouldn't be an exaggeration to say that the recent advances in Natural Language Processing (NLP) technology can be, to a large extent, attributed to the use of very high-dimensional vectors for language representation. These high-dimensional vector representations (768 dimensions is common) are called embeddings and are aimed at capturing semantic meaning and relationships between linguistic items. Although the idea of using vector representations for words has been around for many years, interest in word embeddings took a quantum jump with Tomáš Mikolov’s Word2vec algorithm in 2013. Since then, many methods for generating word embeddings, for example GloVe and BERT, have been developed. Before moving on, let's briefly see how word embedding methods work. Word Embedding: How is it Performed? I am going to explain how word embedding is done using the Word2vec method. This method uses a linear encoder-decoder network with a single hidden layer. The input layer o

Claude 2: A New Member of the Growing Family of Large Language Models

AI has advanced rapidly in recent years, with large language models (LLMs) like ChatGPT creating enormous excitement. These models can generate remarkably human-like text, albeit with certain limitations. In this post, we'll look at a new member of the family of large language models, Anthropic's Claude 2, and highlight some of its features. Claude 2 Overview Claude 2 was released in July 2023. Claude 2 utilizes a context window of approximately 4,000 tokens during conversations. This allows it to actively reference the last 1,000-2,000 words spoken in order to strengthen contextual awareness and continuity. The context window is dynamically managed, expanding or contracting slightly based on factors like conversation complexity. This context capacity exceeds ChatGPT's approximately 1,000 token window, enabling Claude 2 to sustain longer, more intricate dialogues while retaining appropriate context. In addition to conversational context, Claude 2 can take in multiple

Difference Between Semi-Supervised Learning and Self-Supervised Learning

There are many styles of training machine learning models, ranging from the familiar supervised and unsupervised learning to active learning, semi-supervised learning, and self-supervised learning. In this post, I will explain the difference between the semi-supervised and self-supervised styles of learning. To get started, let us first recap what supervised learning is, since it is the most popular machine learning methodology for building predictive models. Supervised learning uses annotated or labeled data to train predictive models. A label attached to a data vector is nothing but the response that the predictive model should generate for that data vector as input during model training. For example, we will label pictures of cats and dogs with the labels cat and dog to train a Cat versus Dog classifier. We assume that a large enough labeled training data set is available when building a classifier. When there are no labels attached to the training data, then the learning style is known as uns