What is this?

This is what I got when I combined webcam-based gesture recognition with Hakim El Hattab's reveal.js.
It took me a while to write and fine-tune the detection algorithms, and even now they are only about 80% accurate. You get the gist of it though:
A flick of the hand in mid-air changes the slide.

A two-handed flick up or down activates the slide overview.
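
The gesture-to-slide mapping described above could be wired up roughly like this. It is only a sketch: the gesture names (`flick-left`, `flick-right`, `two-hand-flick`) and the direction each flick moves the deck are assumptions made for illustration, not the page's actual code. The `deck` object is expected to expose `left()`, `right()`, and `toggleOverview()`, which is how current reveal.js names these methods; older releases may use different names.

```javascript
// Hypothetical sketch: translate a recognized gesture into a slide action.
// Gesture names and the flick-to-direction mapping are assumptions.
function dispatchGesture(gesture, deck) {
  switch (gesture) {
    case 'flick-left':
      deck.left();            // move to the slide on the left
      return true;
    case 'flick-right':
      deck.right();           // move to the slide on the right
      return true;
    case 'two-hand-flick':
      deck.toggleOverview();  // show or hide the slide overview
      return true;
    default:
      return false;           // unrecognized gesture: do nothing
  }
}
```

In a page that loads reveal.js, the call would look like `dispatchGesture('flick-left', Reveal)`. Passing the deck in as a parameter keeps the function easy to try out with a stand-in object.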

Tips and troubleshooting are in the slides below. (You can still use the keyboard or reveal.js's built-in controls.)


(Troubleshooting is in the slide below this one)
  • The detection algorithm can determine the amount and position of motion. It will only change the slide if your amount of motion exceeds a threshold.
  • Try to keep the rest of your body still.
  • Your hand should preferably be two to three feet from the camera.
  • Gestures should be performed with open palms facing the camera.
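
To make the "amount and position of motion" idea concrete, here is a minimal frame-differencing sketch. It is a common textbook technique, not necessarily what this page does internally. The function names, the per-pixel threshold, and the amount threshold are all assumptions.

```javascript
// Hypothetical sketch: compare two grayscale frames of the same size.
// `prev` and `curr` hold one brightness value (0-255) per pixel, row by row.
function measureMotion(prev, curr, width, pixelThreshold) {
  let count = 0; // pixels whose brightness changed noticeably
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < curr.length; i++) {
    if (Math.abs(curr[i] - prev[i]) > pixelThreshold) {
      count++;
      sumX += i % width;             // column of this pixel
      sumY += Math.floor(i / width); // row of this pixel
    }
  }
  return {
    amount: count / curr.length,   // fraction of the frame that moved
    x: count ? sumX / count : null, // centroid of the moving pixels
    y: count ? sumY / count : null,
  };
}

// Only treat the motion as a gesture if enough of the frame changed.
function isGesture(motion, amountThreshold) {
  return motion.amount > amountThreshold;
}
```

This also shows why keeping the rest of your body still helps: any other movement adds changed pixels and drags the centroid away from your hand.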


  • You need a webcam. You're sunk if you don't have one.
  • If you're not viewing this in Google Chrome, please do.
  • If you're using Chrome, you should get a message asking to access your webcam. Hit Allow.
  • If you aren't getting that message, refresh.
  • If refreshing doesn't work, head over to chrome://flags and enable all flags that mention WebRTC. Restart Chrome.
  • If that doesn't work, I don't know what will. You can at least watch a video of this at

    What is REVEAL.JS?

    A framework for easily creating
    beautiful presentations using HTML
    -Hakim El Hattab, the author

    (go down for details)


    It's similar to Prezi or PowerPoint, except it's in HTML5.

    HTML5 is the reason behind Reveal's speed, looks, and


    Beat that, Prezi! (which uses Flash)


    In addition to going left and right, 
    you can also go 


    There's also a slide overview if you hit Esc.

    What is WebRTC?

    WebRTC is a developing web standard for getting data from your webcam and your microphone.

    Once you have a video feed, you can display, alter, and extract information from it using JavaScript.
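
As a rough illustration of "extracting information" from the feed, the sketch below converts a frame's RGBA pixels to brightness values. The helper names are hypothetical. The browser-only functions assume the promise-based `navigator.mediaDevices.getUserMedia` API found in current browsers; older Chrome builds exposed a prefixed variant instead.

```javascript
// Convert RGBA pixel data (the layout used by canvas ImageData.data)
// into one brightness value per pixel, using the usual luma weights.
function toGrayscale(rgba) {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b; // alpha is ignored
  }
  return gray;
}

// Browser only: ask for the webcam and play it in a <video> element.
async function startCamera(video) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  await video.play();
}

// Browser only: copy the current video frame onto a canvas and read it back.
function grabFrame(video, canvas) {
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  return toGrayscale(image.data);
}
```

Calling `grabFrame` on a timer gives a stream of grayscale frames that can be compared with each other to look for motion.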

    Why Chrome? Look below to find out.

    Chrome meets JavaScript

    Chrome runs JavaScript using its wonderfully fast
    V8 JavaScript Engine.

    It's remarkably fast. I ran my Tic-tac-toe algorithm on multiple browsers and this is how V8 stacks up:

    1. Chrome: 199 ms
    2. Firefox: 241 ms
    3. Opera: 439 ms
    4. IE: >16000 ms

    My Thought Process

    You can get fabulous presentations and cool 3D transitions with Reveal.js...
    You can get user input through the webcam using WebRTC...

    Armed with the two and some spare time, I wrote some signal processing JS (it took a while to get the algorithms just right) and an overlay interface.

    I believe this will be one aspect of the future of computing.

    What about Kinect?

    Kinect uses a 3D sensor, which costs around $100.

    This uses a webcam, which is built into most newer computers. The cheaper standalone ones come at $20 a pop.

    However, webcams have limitations, as described below.


    Webcams cannot detect depth.

    Since all you get from a webcam is a flat picture, you have far less data to work with than a 3D sensor gives you.

    The lack of data forces developers to think outside the box and write algorithms to infer what is actually going on without the depth data.
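
One example of inferring what is going on without depth data: track where the motion is from frame to frame and read a flick off how far it travels. The sketch below assumes a `track` of horizontal motion positions (in pixels, one per frame) and a hypothetical `minTravel` cutoff. It illustrates the idea and is not the algorithm used here.

```javascript
// Hypothetical sketch: classify a horizontal flick from a list of
// motion positions, one x-coordinate per frame, oldest first.
function classifyFlick(track, minTravel) {
  if (track.length < 2) return 'none';
  // Net horizontal distance covered over the whole track.
  const travel = track[track.length - 1] - track[0];
  if (travel > minTravel) return 'right';
  if (travel < -minTravel) return 'left';
  return 'none'; // moved too little to count as a flick
}
```

Note that "left" and "right" here are in image coordinates. A webcam picture is often mirrored relative to the user, so the labels may need swapping in practice.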

    The WOW factor

    You can control the computer without touching it!

    The computer can distinguish between different gestures!

    Who needs touchscreens when you have this?

    More Info

    Fork me on GitHub!

    My site is at