Gravio Blog
May 8, 2023

Tutorial: How to Create Your Computer Vision Software Sensors with TensorFlow and Gravio

A step-by-step guide to creating your own computer vision and AI engine using TensorFlow, Google and Gravio

What is a Software Sensor?

A software sensor combines a data-gathering device, such as a microphone or a camera, with an AI layer that detects pre-trained patterns in the data the device produces.

For example, a software sensor could be a camera that is attached to an AI layer that detects certain events or objects (for example people) within the camera feed. Once detected, a sensing event will be triggered.

One of the benefits of Gravio is that you can create your own software sensors easily and cost-effectively using the Google Cloud Platform.

In this tutorial, we will learn how to do this.

Requirements

  • A Google Cloud Platform account with access to AutoML Vision
  • A Gravio account and installation; you can find out how to get started with Gravio here
  • A stationary network/ONVIF camera that captures the area where you want to detect an object or classify a state. This camera needs to be on the same network as the Gravio installation.

Object Detection vs Image Classification

Object detection is the task of identifying and localizing objects within an image or video. The goal of object detection is not only to recognize the presence of an object in an image but also to determine its location within the image. Object detection algorithms typically output a bounding box around each detected object in an image.

Classification, on the other hand, is the task of assigning a label or category to an entire image or a region within an image. The goal of classification is to determine what category an object belongs to, based on the features or characteristics of the object. For example, an image classification algorithm might be trained to recognize different breeds of dogs, and it would output a label indicating the breed of dog in the image.
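
To make the difference concrete, here is a minimal Python sketch of what the two kinds of result might look like. The class names, field names and example labels are hypothetical and chosen for illustration only; they are not the output format of Gravio or Google.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """One detected object: a label, a confidence score and a bounding box."""
    label: str
    score: float
    # Assumed box layout: (x_min, y_min, x_max, y_max), normalised to 0..1.
    box: Tuple[float, float, float, float]


@dataclass
class Classification:
    """One label for the image as a whole, with a confidence score."""
    label: str
    score: float


# Object detection: zero or more results per image, each with a location.
detections = [
    Detection("mug", 0.91, (0.10, 0.20, 0.35, 0.55)),
    Detection("plate", 0.84, (0.40, 0.30, 0.90, 0.80)),
]

# Image classification: a single verdict about the whole image.
classification = Classification("sink_full", 0.88)

print(len(detections), "objects located; image classified as", classification.label)
```

In the dirty-dishes example used later in this tutorial, detection answers "which objects are in the sink, and where?", while classification would answer "what state is the sink in?".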

The High-Level Process

The full process is split into the following steps:

  1. Define what you want to capture (for example: detecting dirty dishes in a sink)
  2. Create training data (example: images of various states of the sink, full, empty, half empty, etc.)
  3. Tag or classify the training data
  4. Train the model based on the training data
  5. Deploy the model to the edge device for processing

Note that in this case we use Gravio, which can detect objects or classify images from a video feed at a rate of at most one image per second. Gravio can also ingest images for computer vision processing that are saved to a predefined folder.
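
As a rough illustration of what "at most one image per second" means for a faster video feed, the following Python sketch keeps only frames that are at least one second apart. This is an assumption about how such sampling could work in general, not a description of Gravio's internals; the function name and parameters are made up for this example.

```python
def sample_timestamps(frame_times, min_interval=1.0):
    """Keep only frames at least `min_interval` seconds after the last kept frame."""
    kept = []
    last = None
    for t in frame_times:
        if last is None or t - last >= min_interval:
            kept.append(t)
            last = t
    return kept


# A 4 fps feed over two seconds produces a frame every 0.25 s.
frames = [i * 0.25 for i in range(9)]  # 0.0, 0.25, ..., 2.0
print(sample_timestamps(frames))  # [0.0, 1.0, 2.0]
```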

Step 1: Defining What You Want to Capture

The business problem we are solving in this tutorial is a sink that fills up with dirty dishes that nobody washes. We want our system to detect whether there are dirty dishes in the sink and, if there are, to send a message to a Slack group chat saying that someone has to do the dishes. For this, we need to detect plates, cutlery and mugs in the sink. We leave out any other objects for the sake of simplicity.

Step 2: Creating the Training Data

We install the camera above the sink and start collecting images at a set time interval. This gives us images in different lighting conditions and with different objects in the sink. The more varied the images are, the better. In Gravio, we can tick the box for saving images at set intervals in the device settings screen:

This will save the images in the “mediadata” folder of your Gravio installation. On Mac, that’s under `/Library/Application Support/HubKit/mediadata`, on Windows, that’s under `C:\ProgramData\HubKit\mediadata\` and on Linux, that’s under `/var/opt/hubkit/mediadata`. More details about the Gravio file handling can be found under https://doc.gravio.com/manuals/gravio4/1/en/topic/file-path
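
If you want to check from a script how many images have been collected so far, a small Python helper along these lines could work. The three folder paths are the ones listed above; the function names and the list of file extensions are assumptions made for this sketch.

```python
import platform
from pathlib import Path

# Default mediadata locations, as listed in this tutorial.
MEDIADATA_DIRS = {
    "Darwin": "/Library/Application Support/HubKit/mediadata",
    "Windows": r"C:\ProgramData\HubKit\mediadata",
    "Linux": "/var/opt/hubkit/mediadata",
}


def mediadata_dir(system=None):
    """Return the mediadata folder for the given (or current) operating system."""
    system = system or platform.system()
    if system not in MEDIADATA_DIRS:
        raise ValueError(f"no known mediadata folder for {system!r}")
    return MEDIADATA_DIRS[system]


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the image files in `folder`, sorted by name.

    The extensions are an assumption; adjust them to what your camera saves.
    """
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in extensions)
```

On the machine running HubKit, `len(list_images(mediadata_dir()))` would then give the number of captured images.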

A folder full of images could look something like this:

Step 3: Classifying and Labeling the Images

The next step is arguably the most tedious part of the entire process, because people have to tag or classify the images manually. Many tools are available for this; you can find them by searching for “image annotation tools” on Google. In this tutorial, we create the models with Google Cloud Vision, which is part of the Google Cloud Platform. Training models is very compute-intensive, so Google charges for the computing power, but you often get some credit to start with and try it out.

To get started, head over to https://cloud.google.com/vision/ and click on the “Get started for free” button.

After completing the registration process, search for “AutoML” and click on “Vision” here:

This will open the following screen:

Gravio supports both “Object Detection” and “Image Classification”. For this tutorial we use Object Detection, as it is slightly more complex. If you prefer Image Classification, you can choose that instead. Click “Get Started” on Object Detection. The first time you do this, you may have to enable the “AutoML API”:

Then create a new dataset:

Give it a sensible name and pick “Object Detection” on the right:

After a few days of collecting training images, you should hopefully have enough variants to start the labeling process. Start by uploading them into a Google Cloud Storage bucket. In our case, we created a subfolder and added them there:

Once uploaded, you can click on the “Images” tab and start creating the labels for the items you would like to detect:

Google also provides an online tool for assigning labels to items in the images. As mentioned, this is a very tedious and time-consuming task, so you may want to consider outsourcing it to a company that specializes in image tagging.

But if you would like to do it yourself, this is what it could look like:

You have to do this for a few dozen, if not hundreds, of images until Google has enough examples to split them into three sets of images (training, validation and test):

The “Validation” and “Test” columns will fill up automatically after a while, as long as you have uploaded and tagged enough images. You may have to do this in multiple batches.
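
Google performs this split for you, but the idea of dividing labeled images into training, validation and test sets can be sketched in a few lines of Python. The 80/10/10 ratio used here is an assumption for illustration, as are the function name and the file names; check Google's documentation for the split it actually applies to your dataset.

```python
import random


def split_dataset(items, train=0.8, validation=0.1, seed=0):
    """Shuffle `items` and split them into training, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


# Hypothetical file names for 200 labeled sink images.
images = [f"sink_{i:03d}.jpg" for i in range(200)]
train_set, validation_set, test_set = split_dataset(images)
print(len(train_set), len(validation_set), len(test_set))  # 160 20 20
```

The model learns from the training set only; the validation and test sets are held back to check how well it handles images it has not seen.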

That’s it!

Step 4: Training Your Model

Once your images are uploaded and labeled, and all three sets contain enough of them, you can start the training process. This is where you will incur costs, but the initial free Google credits should be enough for you to give it a first try. Just click on “Start Training”.

It may take multiple hours for the training to finish. Once it has finished, it’s time to download the models Google has created.

Note: If you would like to test the model first, you can deploy it on Google temporarily and test it by uploading a test image. Google will then try to detect objects, and you will see whether it worked:

Step 5: Deploying the Model to the Edge Device for Processing

If your model works well, you can download your model to your computer for deployment on your Gravio instance. You will need the “TensorFlow Lite” or “TensorFlow” files for Gravio. 

In order to do that you will need to install “gsutil” on your computer. Please follow the instructions for your operating system on the website from Google: https://cloud.google.com/storage/docs/gsutil_install

Once that’s installed and authenticated, you can use the following command to download the models that Google has created for you:

gsutil cp -r gs://<name-of-your-bucket> ./<destination_folder>

It will look something like this:

You can now upload those files to your Gravio instance using Gravio Studio. Connect to your HubKit instance and navigate to Settings > Image Inference Model.

Once there, you have two buttons:

  • Import & Deploy, which deploys a previously exported model, including its configuration
  • Create Model, which starts setting up a new model from your newly created TensorFlow files

Click on “Create Model” as we are starting from TensorFlow files. You will see this screen:



  • Select whether you are detecting objects or using image classification.
  • Pick TensorFlow or TensorFlow Lite, depending on which type of model you created.
  • Give your model a name. This name will appear as your new sensor. Upload the files that Google created for you.
  • The Method can be either “Count” or “Group By”. “Count” provides the total number of detected objects, whereas “Group By” provides the number of detected objects per label (e.g. mug, plate, cutlery).
  • Under OutputFormat you can determine whether the result should be provided as a JSON document or as a raw string.
  • “Include Detection Values” only takes effect if JSON is selected and means the output will contain the values that have been detected.
  • The confidence level sets the threshold a detection has to reach to be included in the result.
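
The difference between the two methods, and the effect of a confidence threshold, can be sketched in Python as follows. This is an interpretation of the settings above, not Gravio's implementation: the function name, the `(label, score)` input format and the reading of the confidence level as a minimum score are all assumptions.

```python
from collections import Counter


def summarise(detections, method="Count", min_confidence=0.5):
    """Summarise (label, score) pairs, ignoring those below `min_confidence`."""
    labels = [label for label, score in detections if score >= min_confidence]
    if method == "Count":
        return len(labels)            # total number of detected objects
    if method == "Group By":
        return dict(Counter(labels))  # number of detected objects per label
    raise ValueError(f"unknown method: {method}")


# Hypothetical detections from one image of the sink.
detections = [("mug", 0.92), ("mug", 0.81), ("plate", 0.77), ("cutlery", 0.30)]

print(summarise(detections, "Count"))     # 3
print(summarise(detections, "Group By"))  # {'mug': 2, 'plate': 1}
```

In this example the cutlery detection scores 0.30, below the assumed threshold of 0.5, so it is left out of both results.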

Once you have created your new Model Package, it is ready to use as a sensor and, once deployed, the Data Viewer will start showing the results:

If you have chosen JSON as the output format, you can now start accessing those values in the JSON document using JSONPath. A corresponding trigger could look like this:

Trigger any previously created action from that trigger if tv.Data.detections is greater than 0, i.e. if an object has been detected in the sink.
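
Outside of Gravio, the same condition could be expressed in Python roughly like this. The payload shape is hypothetical: only the field name `detections` is taken from the trigger expression above, and the real JSON produced by your model package may look different.

```python
import json


def dishes_detected(payload: str) -> bool:
    """Return True if the JSON payload reports at least one detection."""
    data = json.loads(payload)
    # Assumed layout: a top-level "detections" field holding the object count.
    return data.get("detections", 0) > 0


print(dishes_detected('{"detections": 2}'))  # True
print(dishes_detected('{"detections": 0}'))  # False
```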

To learn how to create actions, you can refer to our documentation or YouTube Channel, for example, using this video: https://www.youtube.com/watch?v=Q2040arfeac 

That’s it! Enjoy building with Gravio. Feel free to join our Slack community at https://link.gravio.com/slack for help. We’re excited to see what you build with Gravio!
