Digging Deeper into the Core Building Blocks Behind AI: NLP and ML


February 4, 2025

There are multiple elements - or building blocks - that go into the successful creation of AI knowledge bases, most of which cross into multiple branches of artificial intelligence (AI) as well as other fields. Two such building blocks are natural language processing and machine learning, both of which we touched on in last week's article. In today's article, we'll dig deeper into what exactly these areas are and what makes them particularly necessary in modern implementations of AI.

What is Natural Language Processing?

To start, natural language processing (NLP) is an intersection of AI and linguistics (computational linguistics) that focuses on granting machines the ability to understand, interpret, and ultimately generate natural language, or human language. The raw language data NLP models are trained on for this purpose typically takes the form of audio recordings or text. There are several broad techniques within NLP that help machines gain an understanding of natural language. Some of these key techniques, categorized by the aspect of language processing they address, are outlined below:

  • Tokenization: the segmentation of text into smaller units, or tokens. Depending on the intent of the task, tokens can be made up of words or sentences. Tokenization helps NLP models better understand the structure and meaning of a language. Some examples of use cases of this NLP technique include text classification, speech recognition, and text summarization.
  • Stemming and Lemmatization: the reduction of words to their base or root form. Stemming does this by heuristically stripping prefixes and suffixes, while lemmatization uses vocabulary and grammatical context to map a word to its dictionary form, or lemma. Breaking individual words down to their root form (ex: 'studying' or 'studied' to 'study') helps ensure different forms of a word are treated uniformly. Some examples of use cases of this NLP technique include improving search engine capabilities and reducing text dimensionality.
  • Part of Speech (POS) Tagging: the assignment of parts of speech to individual words in a sentence; essential for understanding the grammatical structure of a sentence. Some examples of use cases of this NLP technique include analysis of sentence structure, sentiment analysis, and machine translation.
  • Named Entity Recognition (NER): the identification and classification of entities in text into predefined categories. These categories typically include broad groups such as names, locations, organizations, and dates. NER is critical to ensuring valuable information is extracted from unstructured text data. Some examples of use cases for this NLP technique include content categorization, search optimization, and information extraction for analysis.
Applications of NLP can be found in many of the language capabilities of the systems we use today, such as digital assistants, AI agents, and smarter search.
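To make the first two techniques above concrete, here is a minimal, pure-Python sketch of word-level tokenization and a deliberately naive suffix-stripping stemmer. This is a toy illustration only; production systems rely on libraries such as NLTK or spaCy, and the suffix rules below are assumptions chosen just to reproduce the 'studying'/'studied' → 'study' example.

```python
import re

def tokenize(text):
    # Word-level tokenization: lowercase the text and keep runs of
    # letters (and apostrophes) as tokens, discarding punctuation.
    return re.findall(r"[A-Za-z']+", text.lower())

def naive_stem(token):
    # Toy stemmer: strip a few common English suffixes so that
    # different forms of a word reduce to the same root.
    if token.endswith("ied") and len(token) > 4:
        return token[:-3] + "y"   # studied -> study
    if token.endswith("ying") and len(token) > 5:
        return token[:-4] + "y"   # studying -> study
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("She studied while her friends were studying.")
stems = [naive_stem(t) for t in tokens]
print(tokens)
print(stems)  # 'studied' and 'studying' both reduce to 'study'
```

Note that real stemmers (e.g. the Porter stemmer) use much larger rule sets, and lemmatizers additionally consult a vocabulary, which this sketch does not attempt.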

What is Machine Learning?

Machine learning (ML) is a subset of AI that focuses on teaching machines to learn and make predictions from data without being explicitly programmed for each task. The processes, or algorithms, that spring from this discipline are best thought of as rules machines follow to enable this learning.

The data used for this training comes in many forms depending on the intended purpose of the algorithm – ranging from images of facial expressions to Facebook messages to recorded speech – but is broadly categorized into inputs and outputs. The machine learning algorithms themselves are typically split into three categories, based on the kind of data used for training:

  • Supervised learning: algorithms that learn from labeled datasets, meaning data where each 'input' is paired with a mapped 'output' label, or targeted response. Because the data is mapped to a targeted response, this is the closest thing to guided, or supervised, learning. The purpose of this type of learning is to teach the model by comparing its predictions against the intended outputs provided by the dataset, or actual answers. Example tasks this type of model might handle include spam email filtering and weather prediction.
  • Unsupervised learning: algorithms that learn from unlabeled data, meaning data without a known category or any mapped output as in supervised learning. The purpose of this type of learning is to teach the model, without human guidance, to identify patterns or organization in the given data. Example tasks this type of model might handle include distinguishing different facial expressions, or grouping consumers by purchasing behavior.
  • Reinforcement learning: algorithms that learn sequentially through experience, or 'trial and error', within an environment. This category of machine learning focuses on making the best possible decisions, adjusting behavior in response to rewards (positive reinforcement) or penalties (negative reinforcement), with the ultimate goal of maximizing cumulative reward. Example tasks this type of model might handle include playing checkers, or even text summarization.
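The supervised case above can be sketched in a few lines. The snippet below is a toy 1-nearest-neighbor classifier: the training data, the feature values, and the "spam"/"ham" labels are all made up for illustration, but the structure shows the input/output pairing that defines supervised learning.

```python
import math

# Labeled training data: each 'input' (a pair of numeric features)
# is mapped to an 'output' label - the hallmark of supervised learning.
training = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((6.0, 6.5), "ham"),
    ((5.8, 7.0), "ham"),
]

def predict(point):
    # 1-nearest-neighbor: label a new input with the label of the
    # closest training example, by Euclidean distance.
    nearest = min(training, key=lambda ex: math.dist(ex[0], point))
    return nearest[1]

print(predict((1.1, 0.9)))  # falls near the "spam" examples
print(predict((6.2, 6.8)))  # falls near the "ham" examples
```

A real spam filter would learn from thousands of labeled messages with far richer features, but the comparison of a new input against known input/output pairs is the same idea.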
Applications of ML include - though are not limited to - image and speech recognition, and advertisement recommendation systems.
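The unsupervised case can be illustrated just as compactly. Here is a minimal one-dimensional k-means clustering sketch that groups customers by spending without ever being told which group each customer belongs to; the spend figures are invented for the example.

```python
def kmeans_1d(values, k=2, iters=10):
    # Minimal k-means on 1-D data: repeatedly assign each value to its
    # nearest centroid, then move each centroid to the mean of its
    # assigned values. No labels are involved at any point.
    centroids = [min(values), max(values)]  # simple initialization for k=2
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer: two natural groups emerge
# purely from the structure of the data.
spend = [12, 15, 14, 11, 210, 225, 198, 240]
centroids, clusters = kmeans_1d(spend)
print(sorted(clusters[0]), sorted(clusters[1]))
```

The algorithm discovers the low-spend and high-spend segments on its own - exactly the pattern-finding without human guidance that distinguishes unsupervised from supervised learning.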

Over the course of this article, natural language processing and machine learning have been shown to be two broad but distinct subfields within the ever-expanding area of AI. Combined, these two subareas can work together to power many of the implementations of AI we see today, including - though not limited to - the context-understanding abilities displayed by AI knowledge bases.



Ready to see how Adrentech can automate your business?