Face Mask Detector

In this article, we are going to find out how to detect faces in real time using OpenCV. After detecting a face in the webcam stream, we save the frames containing the face. Later, we pass these frames (images) to our mask detector classifier to find out whether the person is wearing a mask.

What is Face Detection?

The goal of face detection is to determine if there are any faces in the image or video. If multiple faces are present, each face is enclosed by a bounding box, and thus we know the location of the faces.

Human faces are difficult to model because many variables can change, for example facial expression, orientation, lighting conditions, and partial occlusions such as sunglasses, scarves, or masks. The result of the detection gives the face location parameters, which may be required in various forms: for instance, a rectangle covering the central part of the face, eye centers, or landmarks including the eyes, nose and mouth corners, eyebrows, and nostrils.


Face mask detection has seen significant progress in the domains of image processing and computer vision since the rise of the COVID-19 pandemic. Many face detection models have been created using several algorithms and techniques. The proposed approach in this paper uses deep learning, TensorFlow, Keras, and OpenCV to detect face masks. This model can be used for safety purposes since it is very resource-efficient to deploy. The SSDMNV2 approach uses a Single Shot Multibox Detector as the face detector and the MobileNetV2 architecture as the framework for the classifier, which is very lightweight and can even be used on embedded devices (such as the NVIDIA Jetson Nano or Raspberry Pi) to perform real-time mask detection. The technique deployed in this paper gives an accuracy score of 0.9264 and an F1 score of 0.93. The dataset provided in this paper, which was collected from various sources, can be used by other researchers for more advanced models such as those for face recognition, facial landmarks, and facial part detection.
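For reference, accuracy and F1 are standard metrics computed from a classifier's confusion counts. The sketch below only shows the arithmetic; the function names and the counts in the usage note are illustrative assumptions, not results from this project.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical counts of 90 true positives, 85 true negatives, 15 false positives, and 10 false negatives, `accuracy(90, 85, 15, 10)` evaluates (90 + 85) / 200 and `f1_score(90, 15, 10)` evaluates 180 / 205.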


The trend of wearing face masks in public is rising around the world due to the COVID-19 coronavirus epidemic. Before COVID-19, people wore masks to protect their health from air pollution, while others, self-conscious about their looks, wore them to hide their faces and emotions from the public. Scientists have shown that wearing face masks impedes COVID-19 transmission. COVID-19 (known as coronavirus) is the latest epidemic virus to hit human health in the last century. In 2020, the rapid spread of COVID-19 forced the World Health Organization to declare it a global pandemic. More than five million people were infected by COVID-19 in less than six months across 188 countries. The virus spreads through close contact and in crowded areas.

The coronavirus epidemic has given rise to an extraordinary degree of worldwide scientific cooperation. Artificial intelligence (AI) based on machine learning and deep learning can help fight COVID-19 in many ways. Machine learning allows researchers and clinicians to evaluate vast quantities of data to forecast the distribution of COVID-19, to serve as an early warning mechanism for potential pandemics, and to classify vulnerable populations. The provision of healthcare needs funding for emerging technologies such as artificial intelligence, IoT, big data, and machine learning to tackle and predict new diseases. The power of AI is being exploited to address the COVID-19 pandemic, in order to better understand infection rates and to trace and quickly detect infections.

In many countries, people are required by law to wear face masks in public. These rules and laws were developed in response to the exponential growth in cases and deaths in many areas. However, monitoring large groups of people is becoming more difficult. The monitoring process involves detecting anyone who is not wearing a face mask. Here we introduce a face mask detection model based on computer vision and deep learning. The proposed model can be integrated with surveillance cameras to impede COVID-19 transmission by detecting people who are not wearing face masks. The model integrates deep learning and classical machine learning techniques with OpenCV, TensorFlow, and Keras. We used deep transfer learning for feature extraction and combined it with three classical machine learning algorithms. We compared them to find the most suitable algorithm, the one that achieved the highest accuracy and consumed the least time in training and detection.

1.1 Problem Statement:

As we all know, there is an ongoing pandemic of coronavirus disease 2019 (COVID-19) that is accelerating day by day. Self-protection by wearing a mask is the only way out, which is why the utility of masks is now widely accepted. Frontline corona warriors are trying their best to ensure that everyone wears a mask in public places, but they cannot reach every nook and cranny on their own. In view of this situation, our team decided to build a face mask detector. The task at hand is to check whether a person is wearing a mask in a live stream from a webcam. In addition, face masks can also be detected in still images.

1.2 Objective:

To identify whether a person in an image or video stream is wearing a face mask, with the help of computer vision and deep learning algorithms using OpenCV, Keras, and TensorFlow. The mask detector built in this project could potentially be used to help ensure your safety and the safety of others.

1.3 Scope:

Face mask detection is an AI-based technology that analyzes a video stream to detect and recognize a face mask worn by an individual person or a crowd of people. Our DeepSight software outputs a confidence value for each detection. Every individual is either classified as ‘wearing a mask’ or flagged as ‘not wearing a mask’.
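The step from a confidence value to one of the two labels can be sketched as a simple threshold. This is a minimal illustration: the function name, the 0.5 default threshold, and the assumption that the classifier outputs a single probability of "mask" are all hypothetical and not taken from the software described above.

```python
from typing import Tuple


def classify_detection(mask_probability: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map a mask probability to a label and the confidence in that label."""
    if not 0.0 <= mask_probability <= 1.0:
        raise ValueError("mask_probability must lie in [0, 1]")
    if mask_probability >= threshold:
        return "wearing a mask", mask_probability
    # Confidence in the negative label is the complementary probability.
    return "not wearing a mask", 1.0 - mask_probability
```

Raising the threshold makes the system flag more people as not wearing a mask, which trades missed violations for more false alarms.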


In the recent past, researchers and analysts mainly focused on gray-scale face images (Ojala, Pietikainen, & Maenpaa, 2002). Some approaches were built entirely on pattern identification models that possessed prior information about the face model, while others used AdaBoost (Kim, Park, Woo, Jeong, & Min, 2011), which was an excellent classifier for training purposes. Then came the Viola-Jones detector, which provided a breakthrough in face detection technology and made real-time face detection possible. It faced various problems, such as sensitivity to the orientation and brightness of the face, which made faces hard to detect; in particular, it failed to work in dull and dim light. Thus, researchers started searching for an alternative model that could easily detect faces as well as masks on the face.
In the past, many datasets for face detection were developed, and they informed the design of face mask detection models. Earlier datasets consisted of images captured in supervised surroundings, while recent datasets such as WiderFace (Yang, Luo, Loy, & Tang, 2016), IJB-A (Klare et al., 2015a), MALF (Yang, Yan, Lei, & Li, 2015), and CelebA (Klare et al., 2015b) are constructed from online images. Unlike the earlier datasets, these provide annotations for the faces present. Large datasets are needed to build better training and testing data and to make real-world applications simpler. This calls for deep learning algorithms that can learn faces and masks directly from the data provided by the user.
Face mask detection models have many variations, which can be divided into several categories. In boosting-based classification, boosted cascades with simple Haar features were adopted in the Viola-Jones face detector (Jones, Viola, & Jones, 2001), which was discussed above in this section. A multiview face mask detector was then built, motivated by the Viola-Jones detector model. In addition, a face mask detector model was built using decision tree algorithms. Face mask detectors in this category were very effective in detecting face masks.
In Deformable Part Model-based classification, the structure and orientations of several different faces are modelled using DPM. In 2006, Ramanan proposed a random forest tree model for face mask detection, which accurately estimates face structures and facial poses. Zhang, Zhang, Li, and Qiao (2016) built a DPM-based face mask detector using around 30,000 faces divided into with-mask and without-mask categories. This work achieved an accuracy of 97.14%. Further face mask detector models were made by Chen, Ren, Wei, Cao, and Sun (2014). Typically, DPM-based face mask detection models can achieve high precision, but they may suffer from a very high computational cost due to the use of DPM.
In Convolutional Neural Network-based classification, face detector models learn directly from the user’s data and then apply several deep learning algorithms to it (Ren, He, Girshick, & Sun, 2015). Li, Lin, Shen, Brandt, and Hua (2015) came up with Cascade CNN. Yang, Yan et al. (2015) came up with the idea of aggregating facial features in the face detection model. In further research, Ojala et al. (2002) upgraded the AlexNet architecture for fine-tuning on the image dataset. For unconstrained conditions, Zhu et al. (2017) proposed a Contextual Multi-Scale Region-based Convolutional Neural Network (CMS-RCNN), which had a significant impact on face detection models. To minimize the error on the substitute layers of the CNN and to deal with the partial occlusions encountered in mask detection models, Opitz et al. (2016) prepared a grid loss layer. As technology advanced, further CNN-based 3D models started appearing; one was proposed by Li et al. (2015) as a learning structure for face mask detection models. Several other works were done in the areas of pose recognition, gender estimation, localization of landmarks, etc.
The face mask detection model named SSDMNV2 has been developed using deep neural network modules from OpenCV and TensorFlow, and contains a Single Shot Multibox Detector object detection model. A typical classification architecture, ResNet-10, is used as the backbone architecture for this detector, and a fine-tuned MobileNetV2 classifier is used for image classification. MobileNetV2 is an improvement over the MobileNetV1 classifier, which consisted of a 3 × 3 convolutional layer as the initial layer followed by 13 repetitions of its building block. In contrast, the MobileNetV2 architecture is made up of 17 building blocks with 3 × 3 convolutions in a row, followed by a 1 × 1 convolution, an average pooling layer, and a classification layer. The residual connection is a new addition in the MobileNetV2 classifier.

2.1 Proposed Methodology:

To predict whether a person is wearing a mask correctly, the initial stage is to train the model using a proper dataset. Details about the dataset are discussed in Section 5.4.1. After training the classifier, an accurate face detection model is required to detect faces, so that the SSDMNV2 model can classify whether the person is wearing a mask. The task in this paper is to raise the accuracy of mask detection without being too resource-heavy. For this task, the DNN module from OpenCV was used, which contains a ‘Single Shot Multibox Detector’ (SSD) (Liu et al., 2016) object detection model with ResNet-10 (Anisimov & Khanova, 2017) as its backbone architecture. This approach helps in detecting faces in real time, even on embedded devices like the Raspberry Pi. The subsequent classifier uses a pre-trained MobileNetV2 model (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018) to predict whether the person is wearing a mask.
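An SSD face detector of this kind typically reports each detection as a confidence plus box corners normalized to the range 0 to 1. The sketch below shows one way such rows could be turned into pixel boxes for cropping faces. The helper name, the 0.5 threshold, and the row layout `[image_id, class_id, confidence, x1, y1, x2, y2]` are assumptions for illustration; the layout matches what OpenCV's DNN module commonly returns for SSD models, but it should be checked against the model actually used.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def boxes_from_detections(
    rows: Sequence[Sequence[float]],
    frame_width: int,
    frame_height: int,
    min_confidence: float = 0.5,
) -> List[Box]:
    """Keep confident detections and scale their boxes to pixel coordinates."""
    boxes: List[Box] = []
    for row in rows:
        confidence = row[2]
        if confidence < min_confidence:
            continue  # weak detection, likely not a face
        # Scale normalized corners, then clamp them to the frame borders.
        x1 = max(0, min(int(row[3] * frame_width), frame_width - 1))
        y1 = max(0, min(int(row[4] * frame_height), frame_height - 1))
        x2 = max(0, min(int(row[5] * frame_width), frame_width - 1))
        y2 = max(0, min(int(row[6] * frame_height), frame_height - 1))
        if x2 > x1 and y2 > y1:  # skip degenerate boxes
            boxes.append((x1, y1, x2, y2))
    return boxes
```

With OpenCV, the rows would usually come from the network output after a forward pass on a blob built from the frame; each returned box can then be used to crop the face region that is handed to the mask classifier.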

2.2 Summary:

This project presents a deep learning based framework for feature-based face mask detection, built by training a deep convolutional neural network. A strategy has been proposed for face mask detection using a dataset that includes images of people with and without masks. The algorithm uses a pre-trained MobileNetV2 model to predict whether the person is wearing a mask.


3.1 Machine Learning:

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.

Machine learning approaches are traditionally divided into three broad categories, depending on the nature of the “signal” or “feedback” available to the learning system:

● Supervised learning: The computer is presented with example inputs and their desired outputs, given by a “teacher”, and the goal is to learn a general rule that maps inputs to outputs.
● Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
● Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle or playing a game against an opponent). As it navigates its problem space, the program is provided feedback that’s analogous to rewards, which it tries to maximize.
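As a small illustration of supervised learning, the category used for the mask classifier in this project, the sketch below implements a one-nearest-neighbour rule in plain Python: labelled examples play the role of the "teacher", and a new input receives the label of its closest example. The two-dimensional feature vectors and the function name are made up for illustration; a real system would use features extracted from images.

```python
import math
from typing import List, Sequence


def nearest_neighbour_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
) -> str:
    """Return the label of the training example closest to the query."""
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("training inputs and labels must be non-empty and aligned")
    best_index = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best_index]


# Hypothetical labelled examples (inputs with their desired outputs).
examples: List[Sequence[float]] = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 0.8)]
labels = ["without_mask", "without_mask", "with_mask", "with_mask"]
```

Calling `nearest_neighbour_predict(examples, labels, (0.8, 0.9))` picks the label of the closest stored example.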

3.2 Artificial Intelligence:

Artificial intelligence is a branch of computer science that works on developing simulations of human intelligence in machines or computer systems. Ultimately, the goal of artificial intelligence is to replicate human intelligence so that machines can carry out tasks that require general human intelligence. AI works by taking in large amounts of data and teaching the machine to recognize patterns. This then enables the machine to perform tasks such as learning, decision making, problem solving, and reasoning.

There are three different types of artificial intelligence: narrow artificial intelligence, general artificial intelligence, and artificial super intelligence.

Narrow artificial intelligence is the artificial intelligence that we have currently achieved. It operates under a set of conditions and limitations that allow it to appear as if it has human intelligence in some tasks. Some examples of narrow AI are voice and facial recognition, Apple’s Siri, or self-driving cars.

General artificial intelligence is a computer system that is able to accurately replicate the intelligence of humans. The system is able to learn through performing tasks and apply the knowledge the system acquires to similar or more difficult tasks, much as a human would do. General AI has not yet been achieved.

Artificial super intelligence is the idea that AI machines will eventually surpass human intelligence and ability. When this is achieved, machines will be better and faster at everything that humans are able to do.

COVID-19, also known as coronavirus, has been spreading rapidly across the world for months now. The only way to keep more people from becoming ill is to stop the spread of the virus. This is done by wearing masks, social distancing (staying 6 feet away from one another), and staying home if one is experiencing symptoms. It has been said that if everyone were to follow this safety protocol, the virus would be cleared out in approximately six weeks. However, people are having a hard time adhering to these precautions, which keeps the virus around even longer.

One solution that is currently being developed and tested is using artificial intelligence to identify whether people are wearing a mask. It is difficult to keep track of so many people and make sure that each one is following proper safety precautions. However, an AI detection system would help to detect who is complying and who is not.

The program works by identifying faces and placing them into different groups. The system has to be able to recognize faces that are wearing masks, faces that are not wearing masks, and faces that have masks, but are not wearing them correctly. Not wearing them correctly includes having the mask under the person’s nose or at their chin.

The system detects faces that are not wearing a mask or are not wearing a mask correctly by first detecting the faces of the people in the frame of the video. The system then uses biometrics to map the facial features that are in view. It narrows in on each face to see if it can detect a mouth, nose, cheeks, or chin on the face. When a mask is worn the right way, none of those features should be exposed. If any of these parts of the face are visible, this alerts the system that a mask is either not being worn or is not being worn correctly.

3.3 Computer Vision:

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and the extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

The scientific discipline of computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multidimensional data from a 3D scanner or medical scanning device. The technological discipline of computer vision seeks to apply its theories and models to the construction of computer vision systems.

Computer vision is concerned with the automatic extraction, analysis, and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding.

3.4 Deep Learning:

Deep learning methods aim at learning feature hierarchies in which features at higher levels of the hierarchy are formed by composing lower-level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features.

The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning. Deep learning excels on problem domains where the inputs (and even outputs) are analog; that is, they are not a few quantities in a tabular format but instead are images of pixel data, documents of text data, or files of audio data. Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.
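The idea of multiple processing layers, each building on the output of the one before, can be made concrete with a tiny forward pass written in plain Python. This is only a sketch: the layer sizes, weights, and function names are invented for illustration, and a real network would learn its weights from data.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed the input through every layer; each layer consumes the previous layer's features."""
    activation = list(inputs)
    for weights, biases in layers:
        activation = relu(dense(activation, weights, biases))
    return activation


# Hypothetical two-layer network: 2 inputs -> 2 hidden features -> 1 output.
example_layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]
```

In `forward([2.0, 1.0], example_layers)`, the first layer produces two intermediate features and the second layer combines them into a single higher-level value.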


4.1 Software Requirements:
Operating System : macOS, Windows.
Technology : Deep Learning, Computer Vision, Machine Learning, Artificial Intelligence.
Programming Language : Python.
Coding Platform : Xcode.

4.1.1 Python:

Python is an interpreted, high-level, general-purpose programming language. Created by “Guido van Rossum” and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace. It provides constructs that enable clear programming on both small and large scales.

4.1.2 TensorFlow:

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. TensorFlow is Google Brain’s second-generation system. Version 1.0.0 was released on February 11, 2017. While the reference implementation runs on single devices, TensorFlow can run on multiple CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing on graphics processing units). TensorFlow is available on 64-bit Linux, macOS, Windows, and mobile computing platforms including Android and iOS. Its flexible architecture allows for the easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. The name TensorFlow derives from the operations that such neural networks perform on multidimensional data arrays, which are referred to as tensors. During the Google I/O Conference in June 2016, Jeff Dean stated that 1,500 repositories on GitHub mentioned TensorFlow, of which only 5 were from Google. Unlike other numerical libraries intended for use in deep learning, such as Theano, TensorFlow was designed for use both in research and development and in production systems, not least RankBrain in Google Search and the fun DeepDream project. It can run on single-CPU systems and GPUs as well as on mobile devices and large-scale distributed systems of hundreds of machines.

4.1.3 Keras:

Keras is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent and simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear and actionable error messages. It also has extensive documentation and developer guides. Keras contains numerous implementations of commonly used neural network building blocks such as layers, objectives, activation functions, and optimizers, along with a host of tools that make working with image and text data easier and simplify the code needed for deep neural networks. The code is hosted on GitHub, and community support forums include the GitHub issues page and a Slack channel. Keras is a minimalist Python library for deep learning that can run on top of Theano or TensorFlow. It was developed to make implementing deep learning models as fast and easy as possible for research and development. It runs on Python 2.7 or 3.5 and can seamlessly execute on GPUs and CPUs given the underlying frameworks. It is released under the permissive MIT license.

4.1.5 NumPy:

A library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

4.1.6 OpenCV:

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.

The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. OpenCV has more than 47 thousand people in its user community and an estimated number of downloads exceeding 18 million. The library is used extensively by companies, research groups, and governmental bodies. Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, and Toyota that employ the library, there are many startups such as Applied Minds, VideoSurf, and Zeitera that make extensive use of OpenCV. OpenCV’s deployed uses span the range from stitching street view images together, detecting intrusions in surveillance video in Israel, monitoring mine equipment in China, helping robots navigate and pick up objects at Willow Garage, detecting swimming pool drowning accidents in Europe, running interactive art in Spain and New York, checking runways for debris in Turkey, and inspecting labels on products in factories around the world, on to rapid face detection in Japan.

It has C++, Python, Java, and MATLAB interfaces and supports Windows, Linux, Android, and Mac OS. OpenCV leans mostly towards real-time vision applications and takes advantage of MMX and SSE instructions when available. Full-featured CUDA and OpenCL interfaces are being actively developed. There are over 500 algorithms and about 10 times as many functions that compose or support those algorithms. OpenCV is written natively in C++ and has a template interface that works seamlessly with STL.

4.1.7 Matplotlib:

A plotting library for the Python programming language and its numerical mathematics extension NumPy.

4.1.8 Argparse:

It is a parser for command-line options, arguments, and sub-commands. The argparse module makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse will figure out how to parse those out of sys.argv.
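A minimal sketch of how argparse might be used for a detector script is shown below. The option names `--image` and `--confidence` and their defaults are hypothetical choices for illustration; they are not taken from this project's code. An explicit argument list is passed to `parse_args` so that the example does not depend on how the script is launched.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the (hypothetical) command-line options of a mask detector script."""
    parser = argparse.ArgumentParser(description="Face mask detector (illustrative options)")
    parser.add_argument("--image", help="path to an input image; omit to use the webcam")
    parser.add_argument(
        "--confidence",
        type=float,
        default=0.5,
        help="minimum face-detection confidence to keep a detection",
    )
    return parser


# Parsing an explicit list instead of sys.argv keeps the example self-contained.
args = build_parser().parse_args(["--image", "example.jpg", "--confidence", "0.6"])
```

In a real script, calling `parse_args()` with no arguments reads the options from sys.argv instead.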

4.1.9 SciPy:

SciPy is an open-source Python library used for solving mathematical, scientific, engineering, and technical problems. It allows users to manipulate and visualize data using a wide range of high-level Python commands. SciPy is built on the NumPy extension. SciPy is pronounced “Sigh Pi”.

4.1.10 Scikit-Learn:

Scikit-learn is a free machine learning library for Python. It features various algorithms like support vector machines, random forests, and k-neighbours, and it also supports Python numerical and scientific libraries like NumPy and SciPy.

4.1.11 Pillow:

The Python Imaging Library adds image processing capabilities to your Python interpreter. This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities. The core image library is designed for fast access to data stored in a few basic pixel formats. It should provide a solid foundation for a general image processing tool.

4.1.12 Streamlit:

Streamlit is an open source app framework specifically designed for ML engineers working with Python. It allows you to create a stunning looking application with only a few lines of code.

4.1.13 Xcode:

Xcode is an Integrated Development Environment, which means it pulls all the tools needed to produce an application (particularly a text editor, a compiler, and a build system) into one software package rather than leaving them as a set of individual tools connected by scripts. Xcode is Apple’s official IDE for Mac and iOS developers; it was originally known as Project Builder in the NeXT days and was renamed Xcode somewhere around Mac OS X 10.3 or 10.4. By version 4, Apple had folded in the companion Interface Builder program so that there was only one app bundle. The design of the program hasn’t changed a whole lot since then, although the tools are updated regularly.

4.2 Hardware Requirements:

Processor : Latest version or any updated Processor.
RAM : Minimum 4 GB.
Hard Disk : Minimum 10 GB.
System Type : 64-bit OS, x64-based processor


5.1 Flow Chart

Figure 5.1: Flowchart of Facemask Detector

5.2 Use Case Diagram

A Use Case Diagram is a representation of a user’s interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. Use cases share different kinds of relationships. Defining the relationship between two use cases is the decision of the software analysts of the use case diagram. A relationship between two use cases basically models the dependency between them. Reusing an existing use case through different types of relationships reduces the overall effort required in developing a system.

Figure 5.2: Use Case Diagram of Facemask Detector

5.3 Sequence Diagram

A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. A sequence diagram shows, as parallel vertical lines (lifelines), the different processes or objects that live simultaneously and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.

5.4 Building the face mask detector:

5.4.1 Dataset

The dataset that has been used for face mask detection consists of “with mask” and “without mask” images. We will use the dataset to build a COVID-19 face mask detector with computer vision and deep learning using Python, OpenCV, and TensorFlow/Keras. Our dataset consists of 3848 images:

● with_mask: 1917 images
● without_mask: 1931 images
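Class counts like these can be checked with a short script. The sketch below assumes, purely for illustration, that the dataset is stored as one sub-folder per class (for example `with_mask` and `without_mask`) inside a dataset directory and that images use common file extensions; the actual layout of this project's dataset is not specified here.

```python
from pathlib import Path
from typing import Dict, Union

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def count_images_per_class(dataset_dir: Union[str, Path]) -> Dict[str, int]:
    """Count image files in each class sub-folder of the dataset directory."""
    counts: Dict[str, int] = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for item in class_dir.iterdir()
            if item.is_file() and item.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

A roughly balanced result, as in the counts listed above, means that neither class dominates training.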

5.4.2 Pre-processing

The dataset from Masked Face Recognition and Application contained a lot of noise, and many of its images were repeated. Since a good dataset dictates the accuracy of the model trained on it, the data taken from the above-specified datasets were processed, and all repetitions were removed manually. The data cleaning was also done manually to remove the corrupt images found in the dataset. Finding these corrupt images was a vital but tricky task; we divided the work among ourselves and cleaned both the with-mask and the without-mask image sets. Cleaning a dataset, that is, identifying and correcting its errors, removes adverse effects from any predictive model trained on it.

This part explains how the data is pre-processed before training. First, we define a function, sorted alphanumerically, to sort the list of files in lexicographical order. A pre-processing function is then defined, which takes the dataset folder as input, loads all the files from that folder, and resizes the images to the input size expected by the MobileNetV2 model. The list is sorted using sorted alphanumerically, and the images are converted into tensors. The list is then converted to a NumPy array for faster calculation. After this, data augmentation is applied to increase the accuracy of the trained model.
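The sorting helper itself is not shown in this article, so here is a minimal sketch of what it might look like. We assume it is a number-aware ("natural") sort, which the name suggests; the function name `sorted_alphanumerically` and the file names are illustrative only.

```python
import re


def sorted_alphanumerically(names):
    """Return names sorted so that embedded numbers compare numerically."""
    def key(name):
        # Split "img10.png" into ["img", "10", ".png"]; digit runs become ints.
        return [int(part) if part.isdigit() else part.lower()
                for part in re.split(r"(\d+)", name)]
    return sorted(names, key=key)


# Hypothetical file names, used only to show the ordering.
files = ["img10.png", "img2.png", "img1.png"]
print(sorted_alphanumerically(files))  # ['img1.png', 'img2.png', 'img10.png']
```

A plain `sorted(files)` would place `img10.png` before `img2.png`, which is why a number-aware key is useful when file names carry an index.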

5.4.3 Data augmentation

Training the MobileNetV2 model effectively requires a large quantity of data, but an adequate amount was not available for the proposed model. Data augmentation is used to solve this issue. In this technique, operations such as rotation, zooming, shifting, shearing, and flipping are used to generate numerous versions of the same picture. In the proposed model, a function for image data generation is created for image augmentation, which returns test and train batches of data.
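To make the idea concrete, here is a small NumPy-only sketch of two of these operations, flipping and shifting. It is a stand-in for illustration and not the image generator used in the project; the `augment` function, its `max_shift` parameter, and the toy image are all assumptions.

```python
import numpy as np


def augment(image, rng, max_shift=4):
    """Return a randomly flipped and shifted copy of an H x W x C image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; a real pipeline would pad or fill instead.
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out


rng = np.random.default_rng(seed=0)
image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)  # toy 8x8 RGB image
variants = [augment(image, rng) for _ in range(4)]
print([v.shape for v in variants])
```

Each call produces a slightly different version of the same picture, so the classifier sees more variety than the raw dataset contains.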

5.4.4 Classification of images using MobileNetV2

MobileNetV2 is a deep neural network that has been deployed here for the classification problem. Pre-trained ImageNet weights were loaded from TensorFlow. The base layers are then frozen to avoid impairing the features already learned. New trainable layers are added and trained on the collected dataset so that the model can learn the features that distinguish a face wearing a mask from a face not wearing one. Then the model is fine-tuned and the weights are saved. Using pre-trained models avoids unnecessary computational cost and takes advantage of weights that are already trained, without losing the features already learned.

Figure 5.4.4: MobileNet V2


Now that we’ve reviewed our face mask dataset, let’s see how we can use Keras and TensorFlow to train a classifier to automatically detect whether a person is wearing a mask or not.

To accomplish this task, we’ll be fine-tuning the MobileNet V2 architecture, a highly efficient architecture that can be applied to embedded devices with limited computational capacity.

Deploying our face mask detector to embedded devices could reduce the cost of manufacturing such face mask detection systems, which is why we chose this architecture.

6.1 Training the face mask detector:

We use tensorflow.keras for:

  • Data augmentation
  • Loading the MobileNetV2 classifier (we will fine-tune this model with pre-trained ImageNet weights)
  • Building a new fully-connected (FC) head
  • Pre-processing
  • Loading image data

In addition:

  • We’ll use scikit-learn (sklearn) for binarizing class labels, segmenting our dataset, and printing a classification report.
  • We’ll use the matplotlib library for plotting training curves.
  • The imutils paths implementation will help us find and list the images in our dataset.

6.1.1 Argparser:

The 1st argument is the dataset used to train the model.
The 2nd argument is the path to the output training history plot, which will be generated using matplotlib.
The 3rd argument is the path to the resulting serialized face mask classification model.
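The three arguments above can be sketched with Python's standard argparse module. The flag names and default values below are assumptions chosen for illustration; the article only states what each argument is for.

```python
import argparse


def build_parser():
    """Build a parser for the three training-script arguments."""
    parser = argparse.ArgumentParser(description="Train a face mask classifier")
    # Flag names and defaults are hypothetical.
    parser.add_argument("-d", "--dataset", required=True,
                        help="path to the input dataset")
    parser.add_argument("-p", "--plot", default="plot.png",
                        help="path to the output training history plot")
    parser.add_argument("-m", "--model", default="mask_detector.model",
                        help="path to the output serialized model")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(["--dataset", "dataset"])
print(args.dataset, args.plot, args.model)
```

In a real script you would call `parse_args()` with no list so the values come from the command line.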

After loading the dataset, we pre-process the data by appending the images to the data NumPy array and their labels to the labels NumPy array.
Then we convert the labels into “one-hot encoded” data.
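As a rough illustration of what one-hot encoding does to the labels, here is a NumPy-only sketch. The project itself relies on scikit-learn for binarizing labels; the `one_hot` helper below is a hypothetical substitute written to show the resulting array.

```python
import numpy as np


def one_hot(labels):
    """Map string labels to one-hot rows; returns (encoded array, class names)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, name in enumerate(labels):
        encoded[row, index[name]] = 1.0
    return encoded, classes


labels = ["with_mask", "without_mask", "with_mask"]
encoded, classes = one_hot(labels)
print(classes)
print(encoded)
```

Each row contains a single 1 in the column of its class, which is the form a two-output softmax classifier expects as its target.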

During training, we’ll be applying on-the-fly mutations to our images in an effort to improve generalization. This is known as “data augmentation”, where the random rotation, zoom, shear, shift, and flip parameters are established. We’ll use the “aug” object at training time.

6.1.2 Building Model:

Load MobileNet with pre-trained ImageNet weights, leaving off the head of the network. Construct a new FC head, and append it to the base in place of the old head.

Freeze the base layers of the network. The weights of these base layers will not be updated during the process of back propagation, whereas the head layer weights will be tuned.

  • The process used above is called “fine-tuning”.
  • We compile the model with the Adam optimizer and binary cross-entropy loss.
  • Then we make predictions on the test set, grabbing the highest-probability class label indices, and print a classification report in the terminal for inspection.
  • After that, we serialize our face mask classification model to disk.
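The step of grabbing the highest-probability class label indices can be sketched with NumPy. The probabilities and true labels below are made-up numbers for illustration, not results from our model.

```python
import numpy as np

classes = ["with_mask", "without_mask"]

# Made-up model outputs: one row per test image, one column per class.
probabilities = np.array([
    [0.92, 0.08],
    [0.15, 0.85],
    [0.60, 0.40],
])
true_indices = np.array([0, 1, 1])  # made-up ground truth

# For each image, pick the index of the class with the highest probability.
pred_indices = np.argmax(probabilities, axis=1)
accuracy = float(np.mean(pred_indices == true_indices))

for row, idx in zip(probabilities, pred_indices):
    print(classes[idx], row[idx])
print("accuracy:", accuracy)
```

The predicted indices are what a classification report compares against the true labels to compute precision, recall, and F1 per class.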

6.2 Implementing our COVID-19 face mask detector for images with OpenCV:

Now that our face mask detector is trained, let’s see how we can:
1. Load an input image from disk
2. Detect faces in the image
3. Apply our face mask detector to classify the face as either with_mask or without_mask.

Here we take the input image that has to be checked, then load the face detector model and the model that we trained for detecting the mask.

Now that we have our deep learning models, our next step is to load and pre-process the input image.

Here, we loop over our detections and extract the confidence to measure it against the confidence threshold.

Next, we’ll run the face ROI through our MaskNet model:

  1. Extract the face ROI via NumPy slicing.
  2. Pre-process the ROI the same way we did during training.
  3. Perform mask detection to predict with_mask or without_mask.

After that, we determine the class label based on the probabilities returned by the mask detector model.
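The ROI extraction and the label decision can be sketched as follows. The helper names, the box coordinates, and the probabilities are assumptions for illustration; the blank image stands in for a real photo.

```python
import numpy as np


def extract_face_roi(image, box):
    """Crop the face region; box is (start_x, start_y, end_x, end_y) in pixels."""
    start_x, start_y, end_x, end_y = box
    # Rows are indexed by y and columns by x, so y comes first in the slice.
    return image[start_y:end_y, start_x:end_x]


def choose_label(mask_prob, without_mask_prob):
    """Pick the label with the higher probability and report that probability."""
    label = "Mask" if mask_prob > without_mask_prob else "No Mask"
    return label, max(mask_prob, without_mask_prob)


image = np.zeros((100, 120, 3), dtype=np.uint8)   # placeholder 100x120 image
roi = extract_face_roi(image, (30, 20, 90, 80))   # hypothetical bounding box
label, confidence = choose_label(0.97, 0.03)      # made-up probabilities
print(roi.shape, label, confidence)
```

Note the axis order in the slice: swapping x and y is a common source of wrongly cropped faces.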

Once all detections have been processed, we display the output image.

6.3 Implementing our COVID-19 face mask detector in real-time video streams with OpenCV:

The algorithm for this script is the same, but it is pieced together in such a way to allow for processing every frame of your webcam stream.

Thus, the only difference when it comes to imports is that we need a VideoStream class and time. Both of these will help us to work with the stream. We’ll also take advantage of imutils for its aspect-aware resizing method.

Our face detection/mask prediction logic for this script is in the detect_and_predict_mask function. This function detects faces and then applies our face mask classifier to each face ROI.

Our detect_and_predict_mask function accepts three parameters:

  • frame: A frame from our stream
  • faceNet: The model used to detect where in the image faces are
  • maskNet: Our COVID-19 face mask classifier model.

We construct a blob, detect faces, and initialize lists, two of which the function is set to return. These lists include our faces (i.e., ROIs), locs (the face locations), and preds (the list of mask/no mask predictions).

We filter out weak detections and extract bounding boxes while ensuring bounding box coordinates do not fall outside the bounds of the image.
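Here is a minimal sketch of this filtering step. It assumes the detector output has shape (1, 1, N, 7), with the confidence at index 2 and normalized box corners at indices 3 to 6, which is the layout commonly produced by OpenCV's SSD-based detectors. The function name, the 0.5 threshold, and the sample detections are assumptions.

```python
import numpy as np


def filter_detections(detections, width, height, min_confidence=0.5):
    """Keep confident detections and clamp their boxes to the image bounds."""
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < min_confidence:
            continue  # weak detection, skip it
        # Scale normalized corners to pixel coordinates.
        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
        start_x, start_y, end_x, end_y = box.astype(int)
        # Make sure the box does not fall outside the frame.
        start_x, start_y = max(0, start_x), max(0, start_y)
        end_x, end_y = min(width - 1, end_x), min(height - 1, end_y)
        boxes.append((int(start_x), int(start_y), int(end_x), int(end_y)))
    return boxes


# Made-up detections: [_, class_id, confidence, x1, y1, x2, y2]
detections = np.array([[[
    [0, 1, 0.98, 0.25, 0.25, 0.50, 0.75],
    [0, 1, 0.30, 0.25, 0.25, 0.50, 0.50],     # below the threshold
    [0, 1, 0.75, -0.125, 0.25, 1.125, 0.75],  # spills outside the frame
]]])
boxes = filter_detections(detections, width=200, height=100)
print(boxes)  # [(50, 25, 100, 75), (0, 25, 199, 75)]
```

Clamping matters because slicing a frame with a negative start index would silently crop the wrong region.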

Next, after extracting and pre-processing the face ROIs, we append the face ROIs and bounding boxes to their respective lists.

Now we run our faces through our mask predictor.

The function then returns our face bounding box locations and the corresponding mask/no mask predictions to the caller.

Here we have initialized our:

  • Face detector
  • COVID-19 face mask detector
  • Webcam video stream

Now we loop over the frames in the stream. For each frame, the script detects faces and predicts whether people are wearing their masks or not.


7.1 COVID-19 face mask detection in images with OpenCV:

Figure 7.1: Our computer vision and deep learning method using Python, OpenCV, and TensorFlow/Keras has made it possible to detect mask and no mask automatically.

7.2 Detecting COVID-19 face masks with OpenCV in real-time:

Figure 7.2.1: Using Python, OpenCV, and TensorFlow/Keras, our system has correctly detected “No Mask” for my face.
Figure 7.2.2: Using Python, OpenCV, and TensorFlow/Keras, our system has correctly detected “Mask” for my face.

For live stream video check this link: https://youtu.be/cs8ssjZ6CBE


As technology advances with emerging trends, we now have novel face mask detectors that can contribute to public healthcare. The architecture uses MobileNet as the backbone, and it can be used in both high and low computation scenarios. In order to extract more robust features, we use transfer learning to adopt weights from a similar task, face detection, which is trained on a very large dataset. We used OpenCV, TensorFlow, Keras, and a CNN to detect whether people were wearing face masks or not. The models were tested with images and real-time video streams. Optimizing the model is a continuous process, and we are building a highly accurate solution by tuning MobileNet V2. This specific model could be used as a use case for edge analytics. Furthermore, the proposed method achieves state-of-the-art results on a public face mask dataset. With face mask detection, we can check whether a person is wearing a face mask before allowing entry, which would be of great help to society. In the future, we can add heat sensing, social distancing checks, and alerts via messages and an application about whether a person is wearing a mask or not.




Programmer || Undergraduate CSE student || Free lance Editor || Developer@home ||

Virinchi Sai
