Skip to main content


Showing posts with the label Word Embedding

Embeddings Beyond Words: Intro to Sentence Embeddings

It wouldn't be an exaggeration to say that the recent advances in Natural Language Processing (NLP) technology can be, to a large extent, attributed to the use of very high-dimensional vectors for language representation. These high-dimensional, 764 dimensions is common, vector representations are called embeddings and are aimed at capturing semantic meaning and relationships between linguistic items. Although the idea of using vector representation for words has been around for many years, the interest in word embedding took a quantum jump with Tomáš Mikolov’s Word2vec algorithm in 2013. Since then, many methods for generating word embeddings, for example GloVe and BERT , have been developed. Before moving on further, let's see briefly how word embedding methods work. Word Embedding: How is it Performed? I am going to explain how word embedding is done using the Word2vec method. This method uses a linear encoder-decoder network with a single hidden layer. The input layer o

Retrieval Augmented Generation: What is it and Why do we need it?

What is Retrieval Augmented Generation? Generative AI is currently garnering lots of attention. While the responses provided by the large language models (LLMs) are satisfactory in most situations, sometimes we want to get better focused responses when employing LLMs in specific domains. Retrieval-augmented generation (RAG) offers one such way to improve the output of generative AI systems. RAG enhances the LLMs capabilities by providing them with additional knowledge context through information retrieval. Thus, RAG aims to combine the strengths of both retrieval-based methods, which focus on selecting relevant information, and generation-based methods, which produce coherent and fluent text.  RAG works in the following way: Retrieval : The process starts with retrieving relevant documents, passages, or pieces of information from a pre-defined corpus or database. These retrieved sources contain content that is related to the topic or context for which you want to gen