INTRODUCTION TO SELF-SUPERVISED LEARNING IN NLP

Self-supervised learning (SSL) is a prominent part of deep learning. It is a legitimate method for training many modern models because it can learn from unlabeled data, making it easier to leverage large volumes of raw data. But how is it done? When neural networks are provided with data, they try to connect the patterns within the data and extract relevant features. These features are then used to make decisions, such as classifying objects (image classification), predicting a number (regression), generating captions (caption generation), and more.

In this blog, we will discuss techniques by which state-of-the-art deep learning models can be trained with less time and fewer resources. We will also look into the details of self-supervised learning, its types, and the applications in which these models are used.

TABLE OF CONTENTS

* 1. Transfer learning
* 2. Self-supervised learning
* 2.1. Supervised learning
* 2.2. Unsupervised learning
* 2.3. Semi-supervised learning
* 2.4. Reinforcement learning
* 3. Difference between supervised and unsupervised learning
* 4. Why was self-supervised learning introduced?
* 5. Self-supervised learning applications in NLP
* 5.1. Next sentence prediction
* 5.2. Auto-regressive language modeling
* 6. Wrapping up

TRANSFER LEARNING

Recently, the IT industry has seen an increase in the use of deep learning methods, thanks to the availability of large amounts of data and sufficient compute power. This has enabled the training of heavier deep learning models on much larger datasets, particularly for computer vision and natural language processing tasks. The resulting models, which achieved state-of-the-art results on standard benchmarks, were released as pre-trained models, mainly intended to be fine-tuned on custom datasets. This technique forms the basis of transfer learning.

Transfer learning means taking the knowledge a model gained while learning one task and applying it, with small modifications, to a similar task. You can also use pre-trained models like ResNet50, EfficientNet, and more. These models are trained on millions of images (like the ImageNet dataset) and then fine-tuned on your data.

Let's consider an example: training an image classification model for cats vs. dogs (i.e., the PETs dataset). You can take the ResNet50 model pre-trained on the ImageNet dataset with its 1,000 classes and fine-tune it on the PETs dataset, which contains different images of cats and dogs. Since you are using a pre-trained model, you are not starting training from scratch, which means the model already has some knowledge of what a cat or a dog looks like. You can therefore fine-tune it on your dataset and achieve good results much faster than by training a model from scratch, as the sketch below shows.
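As a rough illustration, here is a minimal sketch of this fine-tuning setup using PyTorch's torchvision; the two-class head and the frozen backbone are illustrative choices, not the only way to do it:

    import torch.nn as nn
    from torchvision import models

    # Load ResNet50 with weights pre-trained on ImageNet (1,000 classes).
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

    # Freeze the pre-trained backbone so only the new head is trained.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the 1,000-class head with a 2-class one (cat vs. dog).
    model.fc = nn.Linear(model.fc.in_features, 2)

    # model.fc can now be trained on the PETs images with a standard
    # training loop; the backbone already encodes generic image features.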
But you can't use transfer learning when no pre-trained models are available. In that case, you can use the self-supervised learning technique to train models and produce accurate results in less time and with minimal resources.

SELF-SUPERVISED LEARNING

Self-supervised learning is a technique for training models in which the output labels are a part of the input data, so no separate output labels are required. It is also known as predictive learning or pretext learning. In this method, an unsupervised problem is turned into a supervised one through the auto-generation of labels.

A classic example is language models. A language model is a word-sequence prediction model trained to predict the next word based on the preceding text. This is a self-supervised learning task because you are not defining separate output labels. Instead, you provide the texts as inputs and outputs in a way that helps the model learn the fundamentals and style of the language used in the dataset.

Self-supervised learning is combined with transfer learning to create more advanced NLP models. When you don't have a pre-trained model for your dataset, you can create one using self-supervised learning: train a language model on the text corpus available in your train and test datasets. You do this by providing text of a specific length as the independent variable, and the same text with the next word appended as the output label, as illustrated in the sketch at the end of this section. When the model sees a lot of text presented this way, it can find patterns, learn the basic style of the text, and thereby learn to predict the next word in a sentence.

Usually, a language model only generates text and is not useful for any downstream task until it is fine-tuned. The language model acts as a pre-trained model for transfer learning, i.e., it can be fine-tuned on the same or a different dataset for downstream tasks like text classification, sentiment analysis, and more. One of the best resources for finding language models is HuggingFace, which hosts many models trained on corpora of text in different styles and languages.
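To make the label-generation idea concrete, here is a minimal Python sketch of how next-word training pairs fall out of raw text with no manual labeling; the window length is an arbitrary illustrative choice:

    # Raw, unlabeled text is the only input.
    text = "self supervised learning creates labels from the data itself"
    tokens = text.split()

    window = 4  # context length; an arbitrary choice for illustration
    pairs = [(tokens[i:i + window], tokens[i + window])
             for i in range(len(tokens) - window)]

    for context, target in pairs:
        print(context, "->", target)
    # ['self', 'supervised', 'learning', 'creates'] -> labels
    # ['supervised', 'learning', 'creates', 'labels'] -> from
    # ...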
Machine learning algorithms are broadly classified as:

1. Supervised
2. Semi-supervised
3. Unsupervised
4. Reinforcement learning

Let's see what each of these means in brief:

SUPERVISED LEARNING

Supervised learning is a popular technique in which you train your neural networks on labeled data for a specific task. The model receives inputs and the corresponding labels to learn from; image classification and regression analysis are typical examples. Think of a classroom where a student learns a concept from a teacher who explains it through several worked examples.

UNSUPERVISED LEARNING

Unsupervised learning is a technique that finds implicit patterns in data without training on explicitly labeled data. Contrary to supervised learning, it doesn't need a feedback loop or annotations for training. The model receives inputs, finds patterns in them, and uses those patterns to produce its outputs. Clustering and principal component analysis both fall under this category.

SEMI-SUPERVISED LEARNING

Semi-supervised learning is a combination of supervised and unsupervised learning. This technique comes in handy when you only have a small set of labeled data points: you train on the labeled data and then use the resulting model to assign pseudo-labels to the rest of the dataset. For instance, a student may learn how to approach certain problems from their teacher, but then has to figure out how to solve related problems by themselves.

REINFORCEMENT LEARNING

Reinforcement learning is a method for training AI agents to learn behavior in an environment with the help of a reward-based feedback policy. The model learns from actions and rewards: the agent takes actions in the environment so as to maximize the rewards it receives. Examples include path planning, chess engines, and a child trying to win a stage of a game.

DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED LEARNING

Supervised and unsupervised learning have distinct objectives and provide distinct solutions depending on your requirements. Self-supervised learning and unsupervised learning are complementary in that neither requires labeled datasets; unsupervised learning can even be viewed as a superset of self-supervised learning, since neither relies on an external feedback loop.

                         Supervised learning                          Unsupervised learning
    Input data           Labelled                                     Unlabelled
    Feedback mechanism   Available                                    Not available
    Data classification  Based on the training dataset                Assigns properties of the given data to classify it
    Types                Regression and classification                Clustering and association
    Usage                For prediction                               For analysis
    Algorithms           Decision trees, SVM, logistic regression     Hierarchical clustering, k-means clustering, Apriori
    Classes              Known number of classes                      Unknown number of classes

A self-supervised model, in contrast, generates many supervisory signals of its own, which act as feedback during training. Unsupervised learning focuses more on the model than on the data, while supervised learning works the other way around. Unsupervised learning excels at dimensionality reduction and clustering, whereas supervised learning is the go-to technique for classification and regression tasks. Another major difference is that supervised learning revolves around labeled data, while unsupervised learning deals mainly with unlabeled data. The contrast shows up directly in code, as the sketch below illustrates.
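A minimal scikit-learn sketch (the toy data is invented for illustration): the supervised estimator needs both inputs X and labels y, while the unsupervised one works from X alone:

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.1], [5.2, 4.9]]  # toy features
    y = [0, 0, 1, 1]  # labels exist only for the supervised case

    clf = LogisticRegression().fit(X, y)         # supervised: needs X and y
    km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: X only

    print(clf.predict([[0.2, 0.8]]))  # predicts a known class label
    print(km.labels_)                 # discovers cluster assignments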
WHY WAS SELF-SUPERVISED LEARNING INTRODUCED?

Self-supervised learning was designed to address the following common issues:

* High cost: Acquiring labeled data requires human annotators, and higher-quality labels come at a higher price. The labeling task can also be tedious and time-consuming.
* Generic AI: The self-supervised learning framework is closely related to how human cognition builds on raw experience, bringing machines a step closer to more general artificial intelligence.
* Lengthy lifecycle: When developing machine learning models, data preparation is a very lengthy process. You need to filter, annotate, review, clean, and restructure the data according to your training framework.

Self-supervised learning applications first came into existence because of these concerns. The approach not only overcomes them but also provides additional benefits, like flexibility and data integrity, at a lower cost.

SELF-SUPERVISED LEARNING APPLICATIONS IN NLP

SSL has enabled huge strides in the NLP (natural language processing) field. Self-supervised learning is now used widely, from document processing to sentence completion, text suggestions, and more.

The learning abilities of self-supervised models evolved dramatically after the release of the Word2Vec research paper, which took the natural language processing domain to the next level. The idea behind such word embedding approaches is that a model can predict a word from the surrounding pattern of words. Thanks to the improvements that came from Word2Vec, you can now obtain useful word representations from learned embeddings, which are used for scenarios like word prediction, sentence completion, and so on. BERT (Bidirectional Encoder Representations from Transformers) is one of the most eminent SSL methods used in natural language processing.

Now, let us discuss some of the vital applications of self-supervised learning models:

NEXT SENTENCE PREDICTION

For next sentence prediction (NSP), you pick two consecutive sentences from a document and one unrelated sentence from the same or a different document, so you have sentence 1, sentence 2, and sentence 3. You then ask the model about the relative position of one sentence to another, and the model outputs either IsNextSentence or IsNotNextSentence. You can query the same self-supervised model with all the combinations. Consider the following sentences:

1. The mission to the moon is finally starting after so many years.
2. You can watch TV when you reach home.
3. You can go home after you come from school.

If you ask a person to order these sentences so they make logical sense, they would most likely place sentence 3 and sentence 2 one after the other, leaving sentence 1 on its own. The reason for training on this task is to teach the model long-range contextual dependencies between sentences.

BERT was introduced in a paper by Google's artificial intelligence researchers and is proficient in various NLP tasks like natural language inference, question answering, and many more. For such tasks, BERT provides a way of capturing the relationships between sentences that is not possible with plain language modeling methods. Here is how this self-supervised model works:

1. To let BERT handle a variety of downstream tasks, the input representation can unambiguously represent either a single sentence or a pair of sentences joined in a single sequence. A "sequence" here refers to the input token sequence given to BERT.
2. The first token of each sequence is a special classification token ([CLS]). The final hidden state of this token is used as the aggregate sequence representation for classification tasks.

Sentences are differentiated in two ways. First, they are separated with a special token ([SEP]). Second, a learned segment embedding is added to every token, indicating whether it belongs to sentence 1 or sentence 2. Denote the input embedding as E, the final hidden vector of the special [CLS] token as C, and the final hidden vector of the i-th input token as Ti. The vector C is then used for the NSP task.
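As a quick illustration, here is a minimal sketch of NSP scoring with a pre-trained BERT via the Hugging Face transformers library; the example sentences come from the scenario above:

    import torch
    from transformers import BertForNextSentencePrediction, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

    sent_a = "You can go home after you come from school."
    sent_b = "You can watch TV when you reach home."

    # The tokenizer inserts [CLS]/[SEP] and the segment embeddings for us.
    inputs = tokenizer(sent_a, sent_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Index 0 scores IsNextSentence, index 1 scores IsNotNextSentence.
    probs = torch.softmax(logits, dim=-1)
    print(f"IsNextSentence probability: {probs[0, 0]:.3f}")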
AUTO-REGRESSIVE LANGUAGE MODELING

While auto-encoding models like BERT are well suited to tasks like sentence classification, self-supervised learning also applies in the text generation domain. Auto-regressive models like GPT (Generative Pre-trained Transformer) rely on the classic language modeling task: forecasting the upcoming word after reading all the preceding ones. Models like these correspond to the decoder part of the transformer, with an attention mask applied so that each position can only see the text before it, never after it.

Now, let us understand how these models work by looking at the GPT training framework. The training approach has two phases:

Unsupervised pre-training

The first stage learns a powerful language model on a huge amount of text. Given an unsupervised corpus of tokens U = {u1, . . . , un}, a standard language modeling objective is used to maximize the following likelihood:

    L1(U) = Σi log P(ui | ui−k, . . . , ui−1; Θ)

In the above equation, k is the size of the context window, and P is the conditional probability modeled by a neural network with parameters Θ. These parameters are trained with stochastic gradient descent.

The model used for this self-supervised stage is a multi-layer transformer decoder. It applies multi-headed self-attention over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over the target tokens:

    h0 = U·We + Wp
    hl = transformer_block(hl−1) for l = 1, . . . , n
    P(u) = softmax(hn·WeT)

In the above equations, U = (u−k, . . . , u−1) is the context vector of tokens, We is the token embedding matrix, Wp is the position embedding matrix, and n is the total number of layers. This masked self-attention, in which every token can attend to the context built up from h0, is what brings the self-supervised approach into the picture.

Supervised fine-tuning

In this step, assume a labeled dataset C, where every instance consists of a sequence of input tokens x1, . . . , xm along with a label y. You pass the inputs through the pre-trained model to obtain the final transformer block's activation at the last token, hlm, which is fed into an added linear output layer with parameters Wy to predict y:

    P(y | x1, . . . , xm) = softmax(hlm·Wy)

This gives us the following objective to maximize:

    L2(C) = Σ(x,y) log P(y | x1, . . . , xm)

Adding language modeling as an auxiliary objective during fine-tuning helps by improving the generalization of the supervised model and speeding up convergence. Specifically, you optimize the following objective with a weight λ:

    L3(C) = L2(C) + λ·L1(C)

On the whole, the only extra parameters you need when fine-tuning are Wy and the embeddings for delimiter tokens. To fine-tune on various tasks with the same transformer architecture and training objective, you must convert all structured inputs into sequences of tokens that the pre-trained model can process, after which a linear + softmax layer is applied. Different tasks, like textual entailment, need different kinds of input transformations. Various iterations have since improved on the original GPT model, and understanding this framework also helps you adapt it to your own requirements.
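A minimal PyTorch sketch of the combined objective L3 = L2 + λ·L1 follows; the tiny GRU encoder stands in for the pre-trained transformer decoder purely for illustration, and all sizes and names are made-up assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 100, 32
    embed = nn.Embedding(vocab_size, d_model)
    encoder = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for the decoder stack
    lm_head = nn.Linear(d_model, vocab_size)  # next-token prediction head (L1)
    clf_head = nn.Linear(d_model, 2)          # the new task parameters Wy (L2)

    tokens = torch.randint(0, vocab_size, (1, 16))  # one labeled example x1..xm
    label = torch.tensor([1])                       # its task label y
    lam = 0.5                                       # weight λ on the auxiliary LM loss

    h, _ = encoder(embed(tokens))                   # final hidden states
    # Auxiliary language modeling loss: predict token i+1 from position i.
    l1 = F.cross_entropy(lm_head(h[:, :-1]).transpose(1, 2), tokens[:, 1:])
    # Supervised task loss: classify from the last token's hidden state.
    l2 = F.cross_entropy(clf_head(h[:, -1]), label)
    loss = l2 + lam * l1                            # L3(C) = L2(C) + λ·L1(C)
    loss.backward()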
WRAPPING UP

When you use techniques like transfer learning and self-supervised learning, you can train deep learning models even when there aren't enough labeled resources. They provide a way to train deep learning models that are not task-specific but can support multiple tasks through fine-tuning. Self-supervised learning has greatly helped the development of AI systems that can learn with less help. With GPT-3 and BERT, you can see how readily SSL is used in natural language processing. These networks aim to learn good representations from unlabeled data, which greatly reduces the dependency on the huge amounts of labeled data that supervised learning requires.

At present, self-supervised learning is still a growing technology, and developers are being hired in large numbers to make the process smoother and more efficient.
FREQUENTLY ASKED QUESTIONS

Does NLP use supervised learning?
Machine learning for natural language processing and text analysis relies on statistical methods to identify speech, sentiment, entities, and other aspects of text. When these aspects are expressed as a model trained on labeled examples and then applied to other texts, that is supervised machine learning.

What is a self-supervised learning example?
One well-known example of self-supervised learning is speech recognition. For instance, wav2vec, developed by Facebook, runs on self-supervised learning. It performs speech recognition with the help of two deep convolutional neural networks built on top of one another.

Are language models self-supervised?
Language models are trained with self-supervised tasks over huge amounts of unlabeled text. For example, in the masked language modeling task, a fraction of the tokens in the original text are masked at random, and the language model attempts to predict them.

What is the use of self-supervised learning?
Self-supervised learning is a representation learning approach in which a supervised task is created out of unlabeled data. It is mainly used to reduce data labeling costs and to leverage pools of unlabeled data.

What is self-supervised vs unsupervised learning?
Self-supervised and unsupervised learning models are similar. The major difference is that self-supervised learning aims to tackle tasks that are traditionally done by supervised learning models.

What are the two types of supervised learning models?
Supervised learning models are of two types: regression and classification. Classification separates the data into discrete classes, whereas regression fits the data into a continuous space.