Representing Unstructured data into meaningful embeddings

Sep 11, 2021

Now let us first understand some basic terminology

‣Unstructured data - It is data that is not in a tabular format or that lacks schema example data of images and text.

‣Embeddings - It is a representation of any unstructured data into vector space, they are usually multidimensional.

With these terms, the idea should b pretty clear

Now let us pick text data

As we can not feed a sentence of strings to any machine learning model. Text data can be converted to a higher vector space of embeddings. Each word with context is represented into numbers or in a multidimensional array. This multidimensional representation or embeddings computation in the text can be computed by.

‣Bert

‣Glove

‣Keras embeddings layers

My thread on to clear these basic of embeddings computation and usage for text

Tushar Raj Verma @TrajVoid

Comparison measures in data science are a key role to differentiate between two data points. Before jumping over to the methods commonly used to form a similarity matrix between two data points, let me just give an overview of What I mean by saying Comparison Measures. 👇

Let's talk about images

When we read images they are just pixel representation and we get a multi-dimension array should I call it embeddings?

Maybe or maybe not

I would rather call after passing it to a pre-trained model and compute meaningful embeddings.

Meaningful embeddings mean that it should attain a property or set of values that are unique.

Computing embeddings for images by

‣Pretrained models like inception, resent (cutting layers for better dimensions)

‣Using Keras embedding layer

My thread on to clear these basics of embeddings computation and usage for images

Tushar Raj Verma @TrajVoid

We always talk about Regression & Classification problems, How they are different what all types of outputs we have etc. But if I know my problem belongs to image classification Here is a list of pre-trained models and some of my recommendations🧵 https://t.co/wrm527kPmp

Now the advantage of using these embeddings

‣They can be used to train a model for classifying

‣They can be used to compute cosine similarities between two data points.

‣They can be used to perform unsupervised learning that is we can perform clustering on embeddings of multiple data points and can convert them into groups of similar embeddings

Feel free to jump over to the explorations of embeddings!

Thanks for reading!

Unfreeze.ml

Discussion about this post