Stephen Collie Enterprises New Zealand

There are many opinions about how Artificial Intelligence (AI) is going to change the world with expectations about its capabilities for now and in the future. AI simply refers to intelligence displayed by machines in contrast to that displayed by humans. Although humans are intelligent, they cannot be programmed to exceed their current capabilities in the same way a machine can. This has led to the creation of smart machines that handle tasks otherwise difficult for humans to handle efficiently.

Artificial intelligence is gradually becoming a constant presence in many technological applications. From apps and websites that show accurate user recommendations to gaming predictions, it is changing user experience in many fields.

Fleet management is one of the areas that AI is disrupting. The growing need to put driver safety first without compromising cost or efficiency has led to the adoption of smart fleet management systems.

For the average driver, the presence of AI can be felt heavily in the use of smartphones and telematics devices that recommend the best routes to take in traffic. This used to be a herculean task marked by paper maps and listening to radio broadcasts of traffic routes; today, we have complex traffic apps that combine GPS and artificial intelligence to make drivers’ lives easier.

Fleets benefit from powerful AI-based applications that handle anything from route recommendation to road risk data analysis and even driver coaching. It provides the accuracy, efficiency, convenience, and ease that earlier technology failed to provide. As a result, it is becoming safer to transport goods and services.

What is AI Fleet Management?

AI fleet management is the use of artificial intelligence-based technology to manage fleet operations. In a constantly changing world, it streamlines the work of any fleet manager by gradually eliminating human error from the transport process.

AI-based recommendations ensure that fleet drivers, managers, and mechanics can make better decisions that improve the long-term performance of the fleet. It also serves as assistive technology, ensuring that drivers retain autonomy during each transport cycle. Here are some key aspects of fleet management that AI can optimize:

Real-time Fleet Analytics

Collecting data is a key element of any operational process because, without analyzing past data, you cannot make informed decisions. When historical insights inform millions of data points analyzed in real time, the result is a prioritization of opportunities and risks, so that fleet managers and drivers can determine the best course of action in potentially problematic situations.

AI fleet management systems can be used to collect data for predictive analytics. Data such as traffic and road conditions, environmental hazards, real-time weather, and mechanical faults can be used to predict incoming risk. This allows fleet managers to make better routing, scheduling, maintenance, delivery, and dispatch arrangements that improve fleet outcomes and activities.

Finally, with AI-based analytics, drivers no longer need to go in blind and can stay prepared for any unexpected events.

Better Repair and Maintenance Decisions

In May 2019, autonomous driving car brand Tesla made headlines after debuting AI-based technology that allows Tesla vehicles to diagnose their faults accurately. Although self-diagnostic technology has existed for some time and has been seen in several modern cars, artificial intelligence is providing more accurate self-diagnostics as well as solutions to faults.

AI ensures that potential faults can be predicted before they even happen. For example, a normal vehicle with a diagnostics system would most likely signal an engine problem only after it has already occurred. On the other hand, combining AI with the Internet of Things (IoT), data analytics, and predictive maintenance can lead to fault detection long before the fault happens. According to a study by McKinsey, predictive maintenance will reduce costs by 10-40%, downtime by 50%, and capital investment by 3-5%.
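
To make the contrast between reactive and predictive diagnostics concrete, here is a minimal sketch, not taken from any real fleet product. It assumes a single sensor (coolant temperature), a simple straight-line trend, and a hypothetical fault threshold of 100 degrees; the function names and readings are illustrative only. A reactive system would alert when the threshold is crossed, while the sketch estimates how many operating hours remain before that happens.

```python
from statistics import mean


def fit_trend(hours, readings):
    """Least-squares slope and intercept of readings against operating hours."""
    h_bar, r_bar = mean(hours), mean(readings)
    slope = sum(
        (h - h_bar) * (r - r_bar) for h, r in zip(hours, readings)
    ) / sum((h - h_bar) ** 2 for h in hours)
    return slope, r_bar - slope * h_bar


def hours_until_threshold(hours, readings, threshold):
    """Estimate operating hours left before the trend crosses the threshold.

    Returns None when the readings are not trending upward.
    """
    slope, intercept = fit_trend(hours, readings)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - hours[-1])


# Hypothetical coolant temperatures (deg C) logged every 10 operating hours.
hours = [0, 10, 20, 30, 40]
temps = [88.0, 89.0, 90.0, 91.0, 92.0]
remaining = hours_until_threshold(hours, temps, threshold=100.0)
print(remaining)  # roughly 80 operating hours before the assumed threshold
```

A production system would draw on many signals and a learned model rather than one straight line, but the idea of acting on the trend instead of the fault is the same.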

Predictive maintenance gives managers and their mechanics more than enough time for repairs which could potentially prevent accidents. More importantly, AI can recommend the most efficient and cost-effective solutions to mechanical faults. This has two major benefits:

  • It saves mechanics’ time usually spent on diagnostics.
  • It gives managers a clearer picture of the state of their fleets at all times. This could mean that service managers could save a lot of routine maintenance costs by carrying out repairs only when the AI systems show potential faults.

Fleet Integration

One major problem with fleet operations, especially in large fleets, is the number of moving parts within the system that need to be accessed. Several departments need a continuous inflow of information that needs to be in sync with all other departmental operations. Although a skilled workforce can make this happen, it is time and labor-intensive.

An AI system could simplify the process by seamlessly integrating every department on a single platform and feeding them information simultaneously. Service managers can save time and costs on planning, maintenance and monitoring operations since all data on those operations are fully accessible. This ensures that all personnel across the different departments have access to the data that helps them make informed decisions. It also leads to a more cohesive fleet, since every department automatically works in sync with the others.

Simpler Recruitment Process

According to a report by the U.S. Bureau of Labor Statistics, the need for automotive and diesel technicians is expected to grow by up to 5% by 2028. The American Trucking Associations estimates that there will be a shortfall of up to 175,000 truck drivers by 2026.

As the older generation drivers and technicians retire, there is a need for younger tech-savvy replacements; however, this presents a problem with onboarding and training. AI can simplify the onboarding process by capturing the specialized skills of these workers before they retire.

This is especially great for technicians with unique ways of carrying out their tasks. AI can also recommend the most qualified drivers that suit the needs of the company from a pool of thousands of applicants, reducing the strain on recruiters.


How is AI Integrated with Fleet Management?

AI-integrated software is usually a sophisticated system made up of several devices and applications such as Internet of Things, predictive data analysis and machine learning systems, HD cameras and sensors, communication and display systems, and WiFi.

For example, the AI-based fleet management platform Driveri, currently deployed in fleets across the country, is a combination of all of these components. There are also many other AI-integrated fleet management systems with one or more of these components.

Before understanding how each of these parts combines to create a fleet management powerhouse, it is important to know what each one does.

Internet of Things (IoT)

The Internet of Things refers to a network of actuators and sensors continuously collecting data from their environment. In fleet management, IoT ensures that enough data is captured for analysis while promoting the seamless sharing of information between all stakeholders on the supply chain such as retailers and manufacturers.

IoT for fleet management works through the use of 3 main technologies:

  • Wireless Communication (4G, Bluetooth, WiFi) to convey relevant information
  • Global Positioning System (GPS) for accurate real-time location tracking
  • Onboard Diagnostics (such as OBDII and J1939) scanners for self-diagnostics and reporting

Machine Learning 

Machine learning technology allows fleets to learn from data collected over time and make managed adjustments based on that data. The result is the creation of smart systems in which AI can learn decision making capabilities that enable more effective handling of practical situations.

HD Cameras

Cameras ensure that video data can be captured, analyzed and accessed at any time leading to a better study of driver behavior, road conditions or hazards.

An AI system with all of the above components will be capable of performing the following tasks:

  • Collecting accurate road data and transmitting it to other devices
  • Passing information across every arm of the supply chain
  • Analyzing data in real-time and advising the driver on the best course of action
  • Detecting distracted or drowsy driving behaviors in drivers before they lead to accidents
  • Capturing full video footage of accidents from different external vehicle angles
  • Running Self-diagnostics and recommending solutions through predictive maintenance

This is significant because it creates a future of fleet management in which human error is reduced in different aspects of the transport cycle. This, in turn, could lead to better outcomes and cost savings.

How AI Fleet Management Will Shape the Future of Transportation

Today, the automotive vehicle industry is faced with several problems that affect fleet activities and profitability. If properly applied, AI can potentially solve these problems and create a better future for transportation. These problems include:

  • Resource prioritization and efficacy
  • Risky driving behaviors that lead to accidents
  • Road risks
  • Data collection and analysis
  • Cost containment
  • Compliance

Risky road behaviors such as distracted and drowsy driving are often accompanied by signs that drivers are told to look out for. These signs include:

  • Yawning
  • Constant blinking
  • Missing turns or exits
  • Drifting out of their lane
  • Slower reaction times
  • Picking up a cellphone

Ordinarily, managers rely on their drivers to avoid these signs and have no way of knowing if a driver had been texting while driving or nodding off at the wheel. Artificial intelligence systems could be trained to detect head turns, missed exits, yawning and blinking frequencies and other signs of risky behavior. These signals can be broadcast to fleet managers in real-time, allowing them to take corrective measures.
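
As a rough illustration of how such signals might be turned into alerts, the sketch below uses fixed rules as a stand-in for a trained model. Everything in it is an assumption made for illustration: the `DriverWindow` schema, the per-minute thresholds, and the idea that a camera pipeline has already counted the events.

```python
from dataclasses import dataclass

# Illustrative thresholds only (assumptions, not values from any real product).
MAX_BLINKS_PER_MINUTE = 30
MAX_YAWNS_PER_MINUTE = 2


@dataclass
class DriverWindow:
    """Events counted in one minute of in-cab video (hypothetical schema)."""
    blinks: int
    yawns: int
    phone_pickups: int
    missed_exits: int


def risk_flags(window):
    """Return the list of risky-behavior signs present in this window."""
    flags = []
    if window.blinks > MAX_BLINKS_PER_MINUTE:
        flags.append("constant blinking")
    if window.yawns > MAX_YAWNS_PER_MINUTE:
        flags.append("yawning")
    if window.phone_pickups > 0:
        flags.append("picking up a cellphone")
    if window.missed_exits > 0:
        flags.append("missed turn or exit")
    return flags


alert = risk_flags(DriverWindow(blinks=42, yawns=3, phone_pickups=0, missed_exits=1))
calm = risk_flags(DriverWindow(blinks=15, yawns=0, phone_pickups=0, missed_exits=0))
print(alert)  # ['constant blinking', 'yawning', 'missed turn or exit']
print(calm)   # []
```

In a real deployment the flags would come from a model trained on video data, and a non-empty list is what would be broadcast to the fleet manager.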

Changing road conditions present another challenge for managers because they are difficult to detect without proper technological tools. These conditions present a huge risk, evident in the 42,000 deaths they cause annually. AI-based predictive technology can reduce the risk associated with this problem by studying and mapping out routes while also drawing from data gathered by other vehicles. It can also be trained to make smart predictions about the weather and detect environmental changes such as fog before a driver reaches that point.

A good example of this type of risk assessment through data collection is Netradyne, whose product has already mapped out over 1 million unique miles of US roads. In the future, an extensive database of road conditions will be essential for promoting safety.

As discussed above, AI-based systems can help managers save costs through fuel economy and predictive maintenance. No matter what type of fleet you operate, from trucks to trains, city buses, or taxis, fuel and maintenance are major contributors to operational costs. Vehicles break down and fuel prices increase without warning, leading to more expenditure. The elimination of routine maintenance schedules using IoT self-diagnostics and fuel control could be the key to better cost containment in the future of fleet management.

Which is the Best Fleet Management Software?

Fortunately, AI-based fleet management software has gone from being dreamy concepts to reality. Several technology companies have created software that improves driver safety and fleet performance without compromising cost or efficiency.

In our research, we looked at the key components that made each one stand out. After analyzing their mapping capabilities, technological range, and sensor technologies, Driveri emerged as the best fleet management software due to the following features:

  • An artificial intelligence DriverAlert system that captures and analyzes every minute of driving time.
  • Real-time analysis and feedback enabled by powerful Edge Computing capabilities.
  • Internal lens that detects drowsy or distracted driving behaviors such as yawning and alerts managers in real-time, enabling quick action to mitigate risk.
  • Advanced data analysis system with more than 1 million unique miles of US roads analyzed and stored in an accessible database
  • Forward, side, and interior HD cameras that capture high-quality videos in real-time
  • Access to up to 100 hours of video playback for records and as evidence in the case of accidents in which there are legal consequences
  • 4G LTE / WiFi / BT connection within fleets, to send and receive data, view video and analyze risky behaviors
  • A mobile application for real-time feedback
  • Single module installation system for quick and easy installation

Final Thoughts

The future of transportation looks more promising than ever due to the exciting applications of AI in fleet management. Unpredictable road conditions, operational costs, and driver retention problems could easily become obsolete as fleets move to AI-based systems. Every stakeholder stands to benefit a lot from the efficiency and reliability of this technology because of a reduction in costs, accidents, driver turnover, and other problems which could reflect on the pricing of fleet services. It could also ensure that other road users remain safe.

This article was originally published on

Featured Image Credits: Pixabay

AI and Blockchain – Super Cool or a Little Creepy?

If you’re a tiny bit freaked out by the enormous potential of AI and blockchain, you’re not the only one. When Dolly the sheep was cloned in the 90s, a pertinent question arose. Just because we can, does it mean we should?

Just because AI and blockchain technologies combined may stop crimes before they happen, replace human jobs with robots, and assign every “thing” in the stratosphere an identity–does it mean they should?

Are AI and blockchain combined super cool or a little creepy? Let’s take a closer look.

Artificial Intelligence (AI)

AI and blockchain are obviously two very different foundational technologies. AI uses machine learning to analyze complex data models, identify patterns, and recreate them, eventually mimicking the actions of a human being. Sophia the robot? Right. And yes… a little creepy.

Hanson Robotics may have treated the world to a dose of AI at its very best. But not everything that involves Artificial Intelligence is as visually stunning, unfathomably expensive (or mentally disturbing) as Sophia.

So, if you’re worried about robots wiping out the human race, you can probably relax for now. You have to admit, as creepy as the spontaneous jokes and flawless facial expressions were, they were also pretty cool.

AI isn’t all about making robots come to life either. In fact, we’re constantly using AI in daily applications, perhaps without even realizing. Think LinkedIn and its predictive text chat, Siri or Alexa, for slightly more subtle examples of AI without the paparazzi.


Without diving too deeply into the particularities of blockchain, its main characteristics are decentralization, transparency, immutability, and the ability to govern autonomously with smart contracts.

Through blockchain technology, we can send funds from one country to another without worrying about conversion fees in next to no time. We can speculate on the future price of Bitcoin, track items in the supply chain, store public records securely, eradicate poverty, wipe out corruption, trade energy credits, give power back to musicians, and hundreds of other extremely cool things.

But blockchain has a few characteristics that can give you the heebie-jeebies as well. Smart governance can be hard to get your head around: the idea that code is law, and that at some point in the future, it’s likely that IoT technology and blockchain will create autonomous things with no living person behind them.

“What happens when something goes bad? Who is responsible?” asked Malta’s Steve Tendon in his interview with us. “These questions need new laws to be addressed.”

When AI and Blockchain Work Together

There are several ways in which Artificial Intelligence and blockchain can work together to help each other reach their full potential. For example, we’re still a long way off smart governance while issues exist with smart contracts, namely that they’re only as good as the developer who programmed them and that they leave no room for good faith or human interpretation. Perhaps through machine learning and AI, smart contracts can reach their full potential and autonomous societies can exist.


The decentralized nature of blockchain technology also means that AI computing power can be spread over nodes to drive down the cost and make it more accessible to all.

But perhaps the main common thread between AI and blockchain is actually data. Sophia said in her speech, “think of me as a smart input-output system.” Good AI is only as good as the data fed into it. This is where teaming the two technologies can get really useful (and really cool, as well).

Rather than an army of Sophias being poorly programmed and driverless cars crashing into pedestrians, blockchain can ensure that the data given to machines is high-quality and verified through its cryptography and immutability.

Since AI depends on meaningful data, blockchain could soon be a huge enabler to allow machines to learn faster and more efficiently. Not just Sophia, but financial transactions, smart contracts, predictive text, and more.
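
To show what “verified through cryptography and immutability” can mean in practice, here is a toy hash chain. It is only a sketch of one ingredient of a blockchain: there is no network, consensus, or decentralization here, and the record fields and function names are made up for illustration. Each block stores the hash of the previous block, so changing any earlier record breaks every check that follows.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to the hash of the one before it."""
    chain, prev = [], GENESIS
    for index, data in enumerate(records):
        digest = block_hash(index, data, prev)
        chain.append({"index": index, "data": data, "prev": prev, "hash": digest})
        prev = digest
    return chain


def is_valid(chain):
    """Recompute every hash and confirm the links are unbroken."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


# Hypothetical sensor records destined for a machine learning pipeline.
chain = build_chain([
    {"sensor": "lidar", "reading": 4.2},
    {"sensor": "lidar", "reading": 4.3},
])
untouched_ok = is_valid(chain)
chain[0]["data"]["reading"] = 9.9  # tamper with an already recorded value
tampered_ok = is_valid(chain)
print(untouched_ok, tampered_ok)  # True False
```

A model that only trains on data from a chain that passes `is_valid` has some assurance the records were not altered after they were written, which is the property the paragraph above relies on.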

Some Everyday Examples

Intelligent Trading Foundation and AiX are examples of trading platforms that use AI and blockchain together, making use of deep recurrent neural networks to make predictions on price movements of cryptocurrencies. By harnessing the latest AI tech, they aim to predict whether the price of cryptocurrencies will go up or down. By using AI for intelligence and blockchain for trading, these platforms can provide the latest in technology on a secure platform.

DeepBrain Chain is an AI computing platform powered by blockchain technology to integrate computing resources that are currently scattered all over the world. Seeing a pattern? DATA.


In the energy industry, companies like Electrify are using blockchain to support AI and improve the efficiency of their network operations, reducing energy bills by as much as 25 percent. Blockchain can ensure the security and privacy of the data they process using AI and also provide enough data to make rigorous Artificial Intelligence possible.

In finance, AI and blockchain work together to improve efficiencies as well. Artificial Intelligence can automate the repetitive tasks and blockchain can secure the data. And the list of cool (and creepy) projects goes on.

The Takeaway

We may not have a Blade Runner situation on our hands or a Minority Report in which we find ourselves massacred by robots or slammed behind bars before committing a crime. But still, there is something a little ominous about a robot that looks like a human saying, “you be nice to me and I’ll be nice to you.”

This article was previously published on

About the Author:

Christina is a B2B writer and MBA, specializing in fintech, cybersecurity, blockchain, and other geeky areas. When she’s not at her computer, you’ll find her surfing, traveling, or relaxing with a glass of wine.


This article is written by Nick McCrea and originally posted at Toptal

Machine Learning (ML) is coming into its own, with a growing recognition that ML can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems. ML provides potential solutions in all these domains and more, and is set to be a pillar of our future civilization.

The supply of able ML designers has yet to catch up to this demand. A major reason for this is that ML is just plain tricky. This Machine Learning tutorial introduces the basics of ML theory, laying down the common themes and concepts, making it easy to follow the logic and get comfortable with machine learning basics.

Figure: This curious machine is learning machine learning, unsupervised.

What is Machine Learning?

So what exactly is “machine learning” anyway? ML is actually a lot of things. The field is quite vast and is expanding rapidly, being continually partitioned and sub-partitioned ad nauseam into different sub-specialties and types of machine learning.

There are some basic common threads, however, and the overarching theme is best summed up by this oft-quoted statement made by Arthur Samuel way back in 1959: “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.”

And more recently, in 1997, Tom Mitchell of Carnegie Mellon University gave a “well-posed” definition that has proven more useful to engineering types: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

So if you want your program to predict, for example, traffic patterns at a busy intersection (task T), you can run it through a machine learning algorithm with data about past traffic patterns (experience E) and, if it has successfully “learned”, it will then do better at predicting future traffic patterns (performance measure P).

The highly complex nature of many real-world problems, though, often means that inventing specialized algorithms that will solve them perfectly every time is impractical, if not impossible. Examples of machine learning problems include “Is this cancer?”, “What is the market value of this house?”, “Which of these people are good friends with each other?”, “Will this rocket engine explode on take off?”, “Will this person like this movie?”, “Who is this?”, “What did you say?”, and “How do you fly this thing?”. All of these problems are excellent targets for an ML project, and in fact ML has been applied to each of them with great success.


Among the different types of ML tasks, a crucial distinction is drawn between supervised and unsupervised learning:

  • Supervised machine learning: The program is “trained” on a pre-defined set of “training examples”, which then facilitate its ability to reach an accurate conclusion when given new data.
  • Unsupervised machine learning: The program is given a bunch of data and must find patterns and relationships therein.

We will primarily focus on supervised learning here, but the end of the article includes a brief discussion of unsupervised learning with some links for those who are interested in pursuing the topic further.

Supervised Machine Learning

In the majority of supervised learning applications, the ultimate goal is to develop a finely tuned predictor function h(x) (sometimes called the “hypothesis”). “Learning” consists of using sophisticated mathematical algorithms to optimize this function so that, given input data x about a certain domain (say, square footage of a house), it will accurately predict some interesting value h(x) (say, market price for said house).

In practice, x almost always represents multiple data points. So, for example, a housing price predictor might take not only square-footage (x1) but also number of bedrooms (x2), number of bathrooms (x3), number of floors (x4), year built (x5), zip code (x6), and so forth. Determining which inputs to use is an important part of ML design. However, for the sake of explanation, it is easiest to assume a single input value is used.

So let’s say our simple predictor has this form:

\[ h(x) = \theta_0 + \theta_1 x \]

where \(\theta_0\) and \(\theta_1\) are constants. Our goal is to find the perfect values of \(\theta_0\) and \(\theta_1\) to make our predictor work as well as possible.

Optimizing the predictor h(x) is done using training examples. For each training example, we have an input value x_train, for which a corresponding output, y, is known in advance. For each example, we find the difference between the known, correct value y, and our predicted value h(x_train). With enough training examples, these differences give us a useful way to measure the “wrongness” of h(x). We can then tweak h(x) by tweaking the values of \(\theta_0\) and \(\theta_1\) to make it “less wrong”. This process is repeated over and over until the system has converged on the best values for \(\theta_0\) and \(\theta_1\). In this way, the predictor becomes trained, and is ready to do some real-world predicting.
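
As a small sketch of the first half of that loop, the snippet below evaluates a predictor on a handful of made-up training examples and lists the differences between prediction and known answer. The data and the names `h` and `differences` are illustrative assumptions, not part of the original tutorial.

```python
def h(x, theta0, theta1):
    """The simple predictor: theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical training examples as (x_train, y) pairs.
training = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def differences(theta0, theta1):
    """Predicted value minus known value, one entry per training example."""
    return [h(x, theta0, theta1) - y for x, y in training]


print(differences(0.0, 1.0))  # [-2.0, -3.0, -4.0]: consistently too low
print(differences(1.0, 2.0))  # [0.0, 0.0, 0.0]: fits this toy data exactly
```

The larger the differences, the more “wrong” the current pair of constants is, which is the signal the tweaking step uses.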

Machine Learning Examples

We stick to simple problems in this post for the sake of illustration, but the reason ML exists is because, in the real world, the problems are much more complex. On this flat screen we can draw you a picture of, at most, a three-dimensional data set, but ML problems commonly deal with data with millions of dimensions, and very complex predictor functions. ML solves problems that cannot be solved by numerical means alone.

With that in mind, let’s look at a simple example. Say we have the following training data, wherein company employees have rated their satisfaction on a scale of 1 to 100:

Figure: Employee satisfaction rating plotted against salary.

First, notice that the data is a little noisy. That is, while we can see that there is a pattern to it (i.e. employee satisfaction tends to go up as salary goes up), it does not all fit neatly on a straight line. This will always be the case with real-world data (and we absolutely want to train our machine using real-world data!). So then how can we train a machine to perfectly predict an employee’s level of satisfaction? The answer, of course, is that we can’t. The goal of ML is never to make “perfect” guesses, because ML deals in domains where there is no such thing. The goal is to make guesses that are good enough to be useful.

It is somewhat reminiscent of the famous statement by British mathematician and professor of statistics George E. P. Box that “all models are wrong, but some are useful”.


Machine Learning builds heavily on statistics. For example, when we train our machine to learn, we have to give it a statistically significant random sample as training data. If the training set is not random, we run the risk of the machine learning patterns that aren’t actually there. And if the training set is too small (see law of large numbers), we won’t learn enough and may even reach inaccurate conclusions. For example, attempting to predict company-wide satisfaction patterns based on data from upper management alone would likely be error-prone.

With this understanding, let’s give our machine the data we’ve been given above and have it learn it. First we have to initialize our predictor h(x) with some reasonable values of \(\theta_0\) and \(\theta_1\). Now our predictor looks like this when placed over our training set:

\[ h(x) = 12.00 + 0.20x \]

Figure: A machine learning predictor over a training dataset.

If we ask this predictor for the satisfaction of an employee making $60k, it would predict a rating of 24 (12 + 0.2 × 60, with salary expressed in thousands of dollars):

Figure: The machine has yet to learn to predict a probable outcome.

It’s obvious that this was a terrible guess and that this machine doesn’t know very much.

So now, let’s give this predictor all the salaries from our training set, and take the differences between the resulting predicted satisfaction ratings and the actual satisfaction ratings of the corresponding employees. If we perform a little mathematical wizardry (which I will describe shortly), we can calculate, with very high certainty, that values of 13.12 for \(\theta_0\) and 0.61 for \(\theta_1\) are going to give us a better predictor.

\[ h(x) = 13.12 + 0.61x \]

Figure: The machine learning predictor is getting closer.

And if we repeat this process, say 1500 times, our predictor will end up looking like this:

\[ h(x) = 15.54 + 0.75x \]

Figure: With a lot of repetition, the machine learning process starts to take shape.

At this point, if we repeat the process, we will find that \(\theta_0\) and \(\theta_1\) won’t change by any appreciable amount anymore and thus we see that the system has converged. If we haven’t made any mistakes, this means we’ve found the optimal predictor. Accordingly, if we now ask the machine again for the satisfaction rating of the employee who makes $60k, it will predict a rating of roughly 60.

Figure: The machine has learned to predict a probable data point.

Now we’re getting somewhere.
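
The progression can be checked by evaluating the three predictors quoted in this example at x = 60. The sketch assumes salary is expressed in thousands of dollars, which is what makes the converged prediction come out at roughly 60; the stage labels are just descriptive names.

```python
# (theta0, theta1) pairs quoted at each stage of the example.
predictors = {
    "initial guess": (12.0, 0.2),
    "first update": (13.12, 0.61),
    "converged": (15.54, 0.75),
}

salary_k = 60  # assumption: salary measured in thousands of dollars

predictions = {
    stage: theta0 + theta1 * salary_k
    for stage, (theta0, theta1) in predictors.items()
}
for stage, rating in predictions.items():
    print(f"{stage}: {rating:.2f}")
# initial guess: 24.00
# first update: 49.72
# converged: 60.54
```

Each round of tuning moves the prediction closer to the pattern visible in the training data.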

Machine Learning Regression: A Note on Complexity

The above example is technically a simple problem of univariate linear regression, which in reality can be solved by deriving a simple normal equation and skipping this “tuning” process altogether. However, consider a predictor that looks like this:

Figure: An example predictor equation over four input dimensions.

This function takes input in four dimensions and has a variety of polynomial terms. Deriving a normal equation for this function is a significant challenge. Many modern machine learning problems take thousands or even millions of dimensions of data to build predictions using hundreds of coefficients. Predicting how an organism’s genome will be expressed, or what the climate will be like in fifty years, are examples of such complex problems.


Fortunately, the iterative approach taken by ML systems is much more resilient in the face of such complexity. Instead of using brute force, a machine learning system “feels its way” to the answer. For big problems, this works much better. While this doesn’t mean that ML can solve all arbitrarily complex problems (it can’t), it does make for an incredibly flexible and powerful tool.
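
For comparison, the closed-form shortcut mentioned at the start of this section for the univariate case can be sketched in a few lines. For a single input, solving the normal equation reduces to the familiar slope and intercept formulas used below. The data set is made up, and `fit_closed_form` is an illustrative name.

```python
from statistics import mean


def fit_closed_form(xs, ys):
    """Solve univariate least squares directly, with no iterative tuning."""
    x_bar, y_bar = mean(xs), mean(ys)
    theta1 = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)
    ) / sum((x - x_bar) ** 2 for x in xs)
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_closed_form(xs, ys)
print(theta0, theta1)  # 1.0 2.0
```

This works nicely with one input and a linear predictor. With many inputs and polynomial terms the equivalent derivation becomes unwieldy, which is the motivation for the iterative approach.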

Gradient Descent – Minimizing “Wrongness”

Let’s take a closer look at how this iterative process works. In the above example, how do we make sure \(\theta_0\) and \(\theta_1\) are getting better with each step, and not worse? The answer lies in our “measurement of wrongness” alluded to previously, along with a little calculus.

The wrongness measure is known as the cost function (a.k.a., loss function), \(J(\theta)\). The input \(\theta\) represents all of the coefficients we are using in our predictor. So in our case, \(\theta\) is really the pair \(\theta_0\) and \(\theta_1\). \(J(\theta_0, \theta_1)\) gives us a mathematical measurement of how wrong our predictor is when it uses the given values of \(\theta_0\) and \(\theta_1\).

The choice of the cost function is another important piece of an ML program. In different contexts, being “wrong” can mean very different things. In our employee satisfaction example, the well-established standard is the linear least squares function:

Figure: The cost function expressed as a linear least squares function.

With least squares, the penalty for a bad guess goes up quadratically with the difference between the guess and the correct answer, so it acts as a very “strict” measurement of wrongness. The cost function computes an average penalty over all of the training examples.
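
One common way to write a least squares cost that matches this description is shown below. The notation is an assumption rather than a transcription of the original figure: \(m\) is the number of training examples, \(x_{\text{train},i}\) and \(y_i\) are the input and known output of example \(i\), and the extra factor of \(\tfrac{1}{2}\) is a widespread convention that simplifies the derivative without changing where the minimum lies.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_{\text{train},i}) - y_i \right)^2,
\qquad h(x) = \theta_0 + \theta_1 x
```

The squared term is what makes the penalty grow quadratically with the size of the error, and the division by \(m\) is what makes it an average over the training examples.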

So now we see that our goal is to find $\theta_0$ and $\theta_1$ for our predictor h(x) such that our cost function $J(\theta_0, \theta_1)$ is as small as possible. We call on the power of calculus to accomplish this.

Consider the following plot of a cost function for some particular Machine Learning problem:

[Figure: bowl-shaped plot of a cost function for a machine learning example.]

Here we can see the cost associated with different values of $\theta_0$ and $\theta_1$. We can see the graph has a slight bowl to its shape. The bottom of the bowl represents the lowest cost our predictor can give us based on the given training data. The goal is to “roll down the hill” and find the $\theta_0$ and $\theta_1$ corresponding to this point.

This is where calculus comes into this machine learning tutorial. For the sake of keeping this explanation manageable, I won’t write out the equations here, but essentially what we do is take the gradient of $J(\theta_0, \theta_1)$, which is the pair of partial derivatives of $J(\theta_0, \theta_1)$ (one with respect to $\theta_0$ and one with respect to $\theta_1$). The gradient will be different for every different value of $\theta_0$ and $\theta_1$, and tells us what the “slope of the hill” is and, in particular, “which way is down”, for these particular thetas.

For example, when we plug our current values of $\theta$ into the gradient, it may tell us that adding a little to $\theta_0$ and subtracting a little from $\theta_1$ will take us in the direction of the cost function-valley floor. Therefore, we add a little to $\theta_0$, and subtract a little from $\theta_1$, and voilà! We have completed one round of our learning algorithm. Our updated predictor, $h(x) = \theta_0 + \theta_1 x$, will return better predictions than before. Our machine is now a little bit smarter.
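For readers who do want to see the equations, here is a sketch of the gradient, assuming the least squares cost with the common 1/(2m) scaling, where m is the number of training examples. The step size α (often called the learning rate) is an assumption introduced here to stand for the “little” that gets added or subtracted; the text above does not name it.

```latex
\begin{aligned}
J(\theta_0, \theta_1) &= \frac{1}{2m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr)^2,
  \qquad h(x) = \theta_0 + \theta_1 x \\
\frac{\partial J}{\partial \theta_0} &= \frac{1}{m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr) \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr)\, x_i \\
\theta_j &\leftarrow \theta_j - \alpha \, \frac{\partial J}{\partial \theta_j},
  \qquad j \in \{0, 1\}
\end{aligned}
```

The minus sign in the last line is what makes each step head downhill: a positive derivative means the cost rises as the coefficient grows, so the update shrinks that coefficient.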

This process of alternating between calculating the current gradient, and updating the thetas from the results, is known as gradient descent.
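The loop below is a minimal sketch of that alternation for the linear predictor, assuming the least squares cost scaled by 1/(2m). The learning rate, iteration count, starting point, and training data are all hypothetical choices made for illustration, not values taken from the tutorial.

```python
def cost(theta0, theta1, xs, ys):
    """Least squares cost: summed squared error divided by 2m."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient(theta0, theta1, xs, ys):
    """Partial derivatives of the cost with respect to theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1


def gradient_descent(xs, ys, learning_rate=0.1, iterations=2000):
    """Alternate between computing the gradient and nudging the thetas downhill."""
    theta0, theta1 = 0.0, 0.0  # arbitrary starting guess
    history = []               # cost before each update, for inspection
    for _ in range(iterations):
        history.append(cost(theta0, theta1, xs, ys))
        d_theta0, d_theta1 = gradient(theta0, theta1, xs, ys)
        theta0 -= learning_rate * d_theta0
        theta1 -= learning_rate * d_theta1
    return theta0, theta1, history


# Hypothetical training data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1, history = gradient_descent(xs, ys)
```

On this data the thetas settle close to 1 and 2, and the recorded cost falls as the iterations proceed. A learning rate that is too large can make the cost grow instead, so in practice it is a value to tune.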

[Figure: an example of gradient descent on a cost function.]

[Figure: cost plotted against the number of iterations.]

That covers the basic theory underlying the majority of supervised Machine Learning systems. But the basic concepts can be applied in a variety of different ways, depending on the problem at hand.

Classification Problems in Machine Learning

Under supervised ML, two major subcategories are:

  • Regression machine learning systems: Systems where the value being predicted falls somewhere on a continuous spectrum. These systems help us with questions of “How much?” or “How many?”.
  • Classification machine learning systems: Systems where we seek a yes-or-no prediction, such as “Is this tumor cancerous?”, “Does this cookie meet our quality standards?”, and so on.

As it turns out, the underlying Machine Learning theory is more or less the same. The major differences are the design of the predictor h(x) and the design of the cost function $J(\theta)$.

Our examples so far have focused on regression problems, so let’s now also take a look at a classification example.

Here are the results of a cookie quality testing study, where the training examples have all been labeled as either “good cookie” (y = 1) in blue or “bad cookie” (y = 0) in red.

[Figure: the cookie quality training data, showing that a regression predictor is not the right solution here.]

In classification, a regression predictor is not very useful. What we usually want is a predictor that makes a guess somewhere between 0 and 1. In a cookie quality classifier, a prediction of 1 would represent a very confident guess that the cookie is perfect and utterly mouthwatering. A prediction of 0 represents high confidence that the cookie is an embarrassment to the cookie industry. Values falling within this range represent less confidence, so we might design our system such that a prediction of 0.6 means “Man, that’s a tough call, but I’m gonna go with yes, you can sell that cookie,” while a value exactly in the middle, at 0.5, might represent complete uncertainty. This isn’t always how confidence is distributed in a classifier, but it’s a very common design and works for the purposes of our illustration.

It turns out there’s a nice function that captures this behavior well. It’s called the sigmoid function, $g(z)$, and it looks something like this:

$$h(x) = g(z)$$

[Figure: plot of the sigmoid function.]

z is some representation of our inputs and coefficients, such as:

$$z = \theta_0 + \theta_1 x$$

so that our predictor becomes:

$$h(x) = g(\theta_0 + \theta_1 x)$$

Notice that the sigmoid function transforms our output into the range between 0 and 1.
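Here is a small sketch of such a predictor, using the standard logistic form of the sigmoid, g(z) = 1 / (1 + e^(-z)). The function names are hypothetical, and the two-branch implementation is only a precaution against floating-point overflow for very negative z.

```python
import math


def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so this cannot overflow
    return ez / (1.0 + ez)


def predict(theta0, theta1, x):
    """Classification predictor h(x) = g(theta0 + theta1 * x)."""
    return sigmoid(theta0 + theta1 * x)


midpoint = sigmoid(0.0)  # 0.5: complete uncertainty
samples = [sigmoid(z) for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0)]
```

Every value in `samples` lies between 0 and 1, and the values increase with z, so larger z means more confidence in a “yes”.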

The logic behind the design of the cost function is also different in classification. Again we ask “what does it mean for a guess to be wrong?” and this time a very good rule of thumb is that if the correct guess was 0 and we guessed 1, then we were completely and utterly wrong, and vice-versa. Since you can’t be more wrong than absolutely wrong, the penalty in this case is enormous. Alternatively, if the correct guess was 0 and we guessed 0, our cost function should not add any cost each time this happens. If the guess was right, but we weren’t completely confident (e.g. y = 1, but h(x) = 0.8), this should come with a small cost, and if our guess was wrong but we weren’t completely confident (e.g. y = 1 but h(x) = 0.3), this should come with some significant cost, but not as much as if we were completely wrong.

This behavior is captured by the log function, such that:

$$\text{cost}(h(x), y) = \begin{cases} -\log(h(x)) & \text{if } y = 1 \\ -\log(1 - h(x)) & \text{if } y = 0 \end{cases}$$

Again, the cost function $J(\theta)$ gives us the average cost over all of our training examples.
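To make the log-based penalty concrete, the sketch below assumes the usual form: -log(h(x)) when y = 1 and -log(1 - h(x)) when y = 0, using the natural logarithm. The function names are hypothetical, and the code assumes predictions strictly between 0 and 1.

```python
import math


def example_cost(h, y):
    """Penalty for one prediction h (0 < h < 1) against a label y in {0, 1}."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def classification_cost(predictions, labels):
    """Average penalty over all training examples."""
    m = len(labels)
    return sum(example_cost(h, y) for h, y in zip(predictions, labels)) / m


# All four cases below have the true label y = 1.
confident_right = example_cost(0.99, 1)  # tiny cost
hesitant_right = example_cost(0.8, 1)    # about 0.22
hesitant_wrong = example_cost(0.3, 1)    # about 1.20
confident_wrong = example_cost(0.01, 1)  # much larger cost

average = classification_cost([0.8, 0.3], [1, 1])
```

The cost grows without bound as a prediction approaches the wrong extreme, which matches the “enormous” penalty for being completely wrong.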

So here we’ve described how the predictor h(x) and the cost function $J(\theta)$ differ between regression and classification, but gradient descent still works fine.
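As a rough sketch of gradient descent applied to classification, the code below trains a one-input sigmoid predictor on a made-up cookie dataset. It assumes the log-based cost, whose gradient has the same “error times input” form as in the regression case. The data, learning rate, and iteration count are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_classifier(xs, ys, learning_rate=0.1, iterations=10000):
    """Gradient descent on the average log cost for h(x) = g(theta0 + theta1 * x)."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        theta0 -= learning_rate * sum(errors) / m
        theta1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / m
    return theta0, theta1


# Hypothetical data: x is a single quality score, y = 1 marks a good cookie.
# The labels overlap slightly, so no threshold separates them perfectly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

theta0, theta1 = train_classifier(xs, ys)

# The prediction crosses 0.5 where theta0 + theta1 * x = 0.
boundary = -theta0 / theta1
```

With a single input, the point where the prediction crosses 0.5 is one number. For this symmetric dataset it lands near the middle of the score range, with low scores predicted as bad cookies and high scores as good ones.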

A classification predictor can be visualized by drawing the boundary line; i.e., the barrier where the prediction changes from a “yes” (a prediction greater than 0.5) to a “no” (a prediction less than 0.5). With a well-designed system, our cookie data can generate a classification boundary that looks like this:

[Figure: the cookie data with the classification boundary produced using the sigmoid function.]

Now that’s a machine that knows a thing or two about cookies!

An Introduction to Neural Networks

No discussion of Machine Learning would be complete without at least mentioning neural networks. Not only do neural nets offer an extremely powerful tool to solve very tough problems, but they also offer fascinating hints at the workings of our own brains, and intriguing possibilities for one day creating truly intelligent machines.

Neural networks are well suited to machine learning models where the number of inputs is gigantic. The computational cost of handling such a problem is just too overwhelming for the types of systems we’ve discussed above. As it turns out, however, neural networks can be effectively tuned using techniques that are strikingly similar to gradient descent in principle.

A thorough discussion of neural networks is beyond the scope of this tutorial, but I recommend checking out our previous post on the subject.

Unsupervised Machine Learning

Unsupervised machine learning is typically tasked with finding relationships within data. There are no training examples used in this process. Instead, the system is given a set of data and tasked with finding patterns and correlations therein. A good example is identifying close-knit groups of friends in social network data.

The Machine Learning algorithms used to do this are very different from those used for supervised learning, and the topic merits its own post. However, for something to chew on in the meantime, take a look at clustering algorithms such as k-means, and also look into dimensionality reduction systems such as principal component analysis. Our prior post on big data discusses a number of these topics in more detail as well.
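For a first taste of clustering, here is a minimal k-means sketch on two-dimensional points. It assumes the caller supplies the initial centroids and runs a fixed number of rounds; real implementations usually choose starting centroids more carefully and stop once the assignments no longer change. The data and function name are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
```

No labels are supplied at any point. The groups emerge purely from how close the points are to one another.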

This article was written by Nick McCrea and originally posted at Toptal.

About the author: Nicholas is a professional software engineer with a passion for quality craftsmanship. He loves architecting and writing top-notch code, and is proud of his ability to synthesize and communicate ideas effectively to technical and non-technical folks alike. Nicholas always enjoys a novel challenge.



© 2018 Stephen Collie Enterprises