Haptic Vest Computer Vision Project

Herein lies the almighty Haptic Vest project.

Authors

Philip Pfeffer & Raul Dagir

Classes

Stanford University: CS229, CS230

Papers

Stanford CS229: https://github.com/PhilipPfeffer/haptic_vest/blob/main/CS229_Final_Paper.pdf

Stanford CS230: https://github.com/PhilipPfeffer/haptic_vest/blob/main/CS230_Project_Final_Report.pdf

Description

Imagine being visually impaired but still able to feel the movement of surrounding crowds and cars on your body. Imagine being a peacekeeper but resting assured that unexpected movements behind you will be transduced into your shirt. Imagine being a twenty-year-old playing Pokémon Go and receiving a haptic vibration pointing you towards your next catch. This can be achieved by wearing a ‘haptic vest’ that uses a camera to capture images, a computer vision model to process them, and coin-sized vibration motors that vibrate the shirt to communicate the output.

Video Description of MobileNetV1+V2 Methods

Featuring Raul Dagir (GitHub: rgdagir), who helped me with this portion of the project. Link: https://youtu.be/GM2dVZ0eXgQ

Project Methods

Retraining MobileNets (Stanford CS230 - Deep Learning)

This section/paper describes the design of the computer vision model for such a garment. The input to our algorithm is a (128, 128) greyscale image for MobileNetV1 and a (96, 96) greyscale image for MobileNetV2. We then use a retrained, publicly available convolutional neural network (CNN) architecture called MobileNet to output a predicted classification of the image. The model classifies the image into one of the classes [‘person’, ‘car’, ‘neither’].
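As a rough illustration of the input and output shapes described above, here is a minimal pure-Python sketch. The helper names (`to_greyscale`, `resize_nearest`, `decode`), the luma weights, and the nearest-neighbour resize are assumptions made for illustration; the actual preprocessing pipeline is described in the linked paper and may differ.

```python
# Sketch only: helper names and preprocessing choices are assumptions,
# not the project's actual pipeline.
CLASSES = ["person", "car", "neither"]
INPUT_SIZE = {"MobileNetV1": (128, 128), "MobileNetV2": (96, 96)}


def to_greyscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of luma values.

    Uses the common ITU-R BT.601 weights (an assumption).
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2D list to (height, width)."""
    height, width = size
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width]
             for x in range(width)]
            for y in range(height)]


def decode(scores):
    """Map a list of three class scores to the highest-scoring label."""
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best]


# A dummy 240x320 white camera frame, prepared for MobileNetV2.
frame = [[(255, 255, 255)] * 320 for _ in range(240)]
model_input = resize_nearest(to_greyscale(frame), INPUT_SIZE["MobileNetV2"])
```

Here `decode` would be applied to whatever three scores the retrained MobileNet emits for `model_input`.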

Creating a new optimiser, Pow2Opt (Stanford CS229 - Machine Learning)

Computer vision models tend to be large and require many matrix multiplications to perform inference on a new image. This poses challenges on consumer edge devices, such as the Arduino Nano BLE Sense and Raspberry Pi 4, where memory and compute are scarce.

Proposed Methods

Pow2Opt on CNN

Apply the Pow2Opt optimiser to a CNN, not just a fully-connected network. I already tried this with a pre-trained MobileNetV2, but it didn’t work. However, it should be possible to get this working.

Pros: This would decrease the number of parameters AND probably increase accuracy compared to the fully-connected network.

Cons: This makes the device-side code more difficult. You have to pack and then unpack the representation. Why not use easier quantisation?
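To make the pack/unpack cost concrete, here is a hedged sketch. This README does not define how Pow2Opt represents weights, so the sketch assumes, from the name alone, that each weight is restricted to a signed power of two. It shows plain post-training rounding, not the optimiser itself, and the exponent range and bit layout are hypothetical.

```python
import math

# Hypothetical exponent range: 8 values, so the exponent fits in 3 bits.
MIN_EXP, MAX_EXP = -7, 0


def quantise(weight):
    """Round a weight to a signed power of two; returns (sign, exponent).

    Rounding is done in the log2 domain, and the exponent is clamped
    to [MIN_EXP, MAX_EXP]. A zero weight gets sign 0.
    """
    if weight == 0.0:
        return 0, MIN_EXP
    exponent = round(math.log2(abs(weight)))
    exponent = max(MIN_EXP, min(MAX_EXP, exponent))
    return (1 if weight > 0 else -1), exponent


def pack(sign, exponent):
    """Pack into a 5-bit code: 2 bits of sign code, 3 bits of exponent offset."""
    sign_code = {0: 0, 1: 1, -1: 2}[sign]
    return (sign_code << 3) | (exponent - MIN_EXP)


def unpack(code):
    """Recover the float value from a packed code (the device-side step)."""
    sign = {0: 0, 1: 1, 2: -1}[code >> 3]
    exponent = (code & 0b111) + MIN_EXP
    return sign * 2.0 ** exponent


weights = [0.3, -0.6, 0.0, 5.0]
restored = [unpack(pack(*quantise(w))) for w in weights]
```

Under these assumptions every weight needs 5 bits instead of 32, but the device has to run `unpack` (or an equivalent shift) for each weight, which is the extra complexity the cons above refer to.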

Two models

Use two binary classification models: the first classifies [‘person’, ‘neither’] and the second classifies [‘car’, ‘neither’]. Each model's output counts as a vote, and the system decides what is present from the two votes. This can also serve for multilabel classification.

Pros: can run these models concurrently on different Arduinos AND binary classification should be simpler than three-class.

Cons: have to develop two models AND have to buy several Arduinos for each vest.

This can also be expanded to use a third model that classifies [‘person’, ‘car’].
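One possible way to combine the two binary votes is sketched below. The function names, the 0.5 threshold, and the tie-breaking rule are assumptions for illustration; how votes should actually be resolved is left open above.

```python
def combine_votes(person_score, car_score, threshold=0.5):
    """Multilabel view: return every class whose model votes 'present'.

    Scores are assumed to be probabilities in [0, 1] from the
    ['person', 'neither'] and ['car', 'neither'] models respectively.
    """
    labels = []
    if person_score >= threshold:
        labels.append("person")
    if car_score >= threshold:
        labels.append("car")
    return labels or ["neither"]


def single_label(person_score, car_score, threshold=0.5):
    """Single-label view: if both models fire, keep the more confident one."""
    labels = combine_votes(person_score, car_score, threshold)
    if len(labels) == 1:
        return labels[0]
    return "person" if person_score >= car_score else "car"
```

In this sketch a conflict is resolved by comparing raw scores; the optional third [‘person’, ‘car’] model could take over that tie-breaking step instead.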

Components

The components of the project are:

Useful links