Ombromanie: Storytelling with Machine-Learning Infused Hand Shadows - Jen Looper
Sorry if my background's just a little messy. Just do the entire screen, because I'm going to be tabbing in and out. Okay, here we go. So there is my messy, messy screen. We'll just go like this, and then we'll open up the presentation. Is that okay? Hopefully not too bad. Okay. It's not that fabulous. There we go. Okay. Great! So, I'm really excited to be here!
Thank you so much for inviting me! Merci beaucoup! Yes, my name is Jen, and I'm a cloud developer advocate lead at Microsoft. I'm on the academic team, which is a wonderful place to be. We have a fantastic team of cloud advocates, and we're all about students and educators and teachers and anyone who's interested in learning about the cloud, about the web, about all the amazing things that you can do, from Microsoft Student Ambassadors all the way up to grad students and faculty in big institutions. So, it's a real honor to be here. Bonjour Lille! I've actually never been to Lille, but someday, someday we can travel again. So let's talk about Ombromanie. So you say... We're going to talk about making pictures with
your hands, and it's a fascinating and ancient art. So I'm going to invite you to please extinguish all lights and get ready to take a magical journey with me as we learn how to make stories with our hands and how to cast those shadows on the web. So we're going to be making virtual hand shadows. I figured since this is all about web stories, let's actually make some web stories and push the envelope a little bit. So I'd like you to prepare for some magic. Here in Boston it's nine in the morning, but I think in France it's the afternoon, so maybe grab some tea, turn out the lights, and let's think about how we can make stories with our hands. So I've been traveling for a long time, especially in Asia.
My husband's Chinese, he's from Beijing, but we've also been to various places in China. I want to show you a little video from Sichuan. Sichuan is fabulous, of course, because of the food and big giant pandas rolling around in the snow and some really beautiful landscapes. There's also a tradition, all throughout China, of these lovely little tea houses, and they kind of act like variety shows: people are sitting around drinking tea and having little snacks, and entertainer after entertainer comes out and does a little, almost vaudeville, show. It's a very old tradition. There are stand-up comics, and it's interesting because I don't understand much Chinese, but I could understand the jokes. There's also a tradition of one person standing in front and one person standing behind, back to back, where the person in the back is telling a story and the person in front is acting it out improvisationally. Lots of really interesting traditions of
this kind of vaudeville show. The audience is basically not paying attention most of the time, they're chatting, they're making a lot of noise, but here's a marvelous example of the ancient Chinese folk art of hand shadows. So I'd like to show you just a clip of how this looks. This just gives you a little taste of this kind of really interesting and magical folk art. It really draws you in, and it makes you believe. And he's doing a couple of interesting things. I think he's behind the screen. I think they're pointing the light, even if it's natural light, through that little screen, and he's casting a shadow that way. That's a little bit different technique. Normally you have a candle in between.
I love TikTok. I learn all kinds of things on TikTok, and one of the trends is "cottagecore". It's all about going back to basics: living with candles, being one with nature, pressing flowers. And I don't know about the skateboard, but a lot of you know this kind of back-to-basics pioneer living, as we would say in the US. So indulge your cottagecore side. This dress is 400, by the way. It's a retreat from what we're facing now,
back to nostalgia. So I would love to invite you to just grab your sunbonnet and let's go. This is very Little House on the Prairie. I grew up in the 70s, and this was the fashion. Well, not the sunbonnet, but the big prairie dresses were a big deal. So let's go back to nostalgia. I took a look at when people started writing about this fashion for shadowgraphy, or ombromanie; the English word would be shadowgraphy. There are several books written in 1859, 1860, again in 1910, 1920, 1921, and then again in 1970. I don't know what I think of this slide, but I started looking at the dates of when hand shadows were fashionable in the West. And it seems like in times of strife, or when people are really looking to escape,
hand shadows become fashionable. So here we've got the Crimean War in the 1850s, in the 1860s we've got the Civil War over here, in 1910 you're getting ready for the Spanish flu and World War One. We've got the 1920s, complete escapism between two wars, and then 1970, Vietnam. For us that was a moment; I really think in the 70s people were definitely trying to escape. Part of the prairie style for us was that 1976 was our bicentennial, so there was a lot of back-to-colonial living. I remember this very, very well, even though I was only six, but it was a very fashionable thing. So I just want to gently
think about when shadowgraphy is fashionable and maybe suggest that it's time for a comeback, maybe it's time to make this thing fashionable again. Because we're in the middle of a pandemic, and I don't think it's ending anytime soon, even though the vaccine is here. It's a moment where people are looking to escape: witness TikTok. TikTok is the perfect escapist app. I think hand shadows are a terrific way to escape a little bit. So how do you cast hand shadows in real time, in real life? You take a box light, you see how this gentleman has a box light with a candle inside of it, and they've made the aperture small and circular, and then the person stands in between the light and a blank wall, casting shadows. And there are some exercises you can do to warm up when you want to start casting hand shadows. This is actually really hard, but you can do these exercises to try to warm up your fingers. So I would encourage you, before we do our demo, warm up your fingers.
It's not easy. Oh yes! These two fingers have to touch, and then you can try to make these shapes with your hands. So while I'm speaking, warm up your hands; you're only going to need one hand, unfortunately, for this demo, but that's okay. A little more history: it was brought to France in 1776, an interesting date for the US, probably from China, and it was popularized by a magician from Angoulême named Félicien Trewey, born in 1848; he died in 1920. So he was really there at that moment when it became a big fashion. He was instrumental, and he wrote a book about it as well. So this is kind of the ultimate ephemeral, low-tech, high-art folk activity, and what was interesting to me was to experiment a little bit with light and the best way to cast shadows. And I found, you know, don't bother like when you're kids and you're doing this with a flashlight on the wall, because your mom thought you were sleeping but you weren't, you're casting shadows and scaring your little brother or little sister. I was the little sister, and my brothers would, you know,
yeah... Just try this at home with just a little candle on a blank wall in a pitch-black room. It works really well. So this evening, just light a candle and try to cast a shadow. Really the nicest experience is just with a candle, and it makes for a very atmospheric activity.
A scary dinosaur or a giant goose. So what I suggest is that we try to breathe new life into this art form with technology. We are technologists, we are web people. What about the idea that you could save a shadow? What about sharing a shadow? Let's give it a try: let's build a website and use some machine learning, because why not, to cast shadows.
So if you start googling, or "googling with Bing" as we say at Microsoft, hand shadows, or articulated hands, or hand mimicry, or articulated hand tracking, there is a lot out there on the web, because this is kind of a big deal, this is kind of an important activity. In gaming it seems to be quite important, so people can follow their hands when wearing something like a HoloLens, or in VR or AR, following your hands and manipulating objects. Outside of the gaming world, you can also do flight simulation, or in the health field use your hands to figure out exactly what instruments you should pick up and what you should do with this and that, to interact with a virtual world. So, this is old, 2014, but Microsoft was doing fully articulated hand tracking back in the day, which is kind of interesting, and really there are a lot of hand tracking libraries and workshops and lists. There is a whole topic on GitHub on hand tracking.
There is one of those awesome lists, Awesome Hand Pose Estimation, with a whole bunch of stuff. There's a challenge on figuring out what to do with hands and how to manipulate them. There's really a lot on the web, so you kind of have to sift through all of these opportunities to tackle this concept of hand shadows. And just to elaborate a little bit on the uses of hand tracking: yes, for gaming, we talked about that, and for simulations and training, but also for hands-free remote interactions, like this example. He's able to scroll virtually on the web just by moving his hand, so they're watching the motion of the hand go up and just scroll up and scroll up and scroll up. And assistive technologies: can you do things with your hands
that would allow you to interact with things virtually? Of course, there are important and amazing TikTok effects. There's this trend where you show your hand and there's a filter that shows a screenshot or an image of your choice, and then you can swipe it away. It's a good way to present content. There's... I'm not going to show this, but it's kind of cute. Our good friend Donald Trump used to have a habit of gesturing like this when he spoke; he has a very strange way of gesturing as he speaks. So people made a pretty hilarious demo called accordion hands, you know, like he's playing an accordion. Hey, it's nothing if we can't laugh, right? Anyway, we're done with that.
So that's good. Interestingly, hands are really complicated things. There are 21 points on a hand that you need to be tracking. It's kind of interesting, because when you think about it, a thumb has points two, three, and four, plus the base, which is point one, and the wrist is zero, and you can track all of those. You can also detect a palm. I think that's what TikTok is probably doing: they're just checking whether there's a palm and adding an image to the palm. But each finger has a lot of articulation points, a lot of points where you can manipulate. So there's a...
I'm going to share these slides, and they're on the repo, which is open source, but you can take a look at some of these research papers about how to figure out the points of a hand and what to track and how to manipulate it. So, to use hand tracking on the web there are two main libraries, and I'm going to hop over to show you what they are. The first one is TensorFlow.js. The TensorFlow models make use of MediaPipe, which is kind of the gold standard in hand tracking; they've done a very fine job. They also do full body articulation, the entire body, which is split off in TensorFlow: in TensorFlow, they use PoseNet for just
the 17 body points, and then they do a separate model for hands. MediaPipe has done both, so they take hands plus the body poses, and you can have something pretty sophisticated being tracked. There's a library called Fingerpose, which is specifically for sign language. I'm not sure, but I bet it's only American Sign Language. I'm not very good at fingerspelling; I used to know how to do this. Spell my name: J-E-N-N-Y. Y. Y. I can't remember. But anyway, that is something to explore, but there's a new library
called "handsfree.js", so I'm just going to pull up these websites. So, this one is TensorFlow, and this is the article on their blog: "Face and hand tracking in the browser with MediaPipe and TensorFlow.js". So they have facial recognition, that's FaceMesh, they have hand pose, and then they have PoseNet for the body. Funnily enough, their demo is down. So if you click over to hand pose, you're going to get an error, but we won't bother. Yeah, that's the error. So, Google, let's fix that. MediaPipe is a project that comes from Google.
They are doing very sophisticated work, like I said, with this hand pose, and you can do two hands at a time, which is really, really nice. They also have a special model just for palm detection. They have their hand landmark model, which is this bit, and then they have a lot of nice sample code that you can look up. And then there's a brand new library here, taking MediaPipe and translating it for the web into an npm package: it's "handsfree.js". Very, very promising. I talked to the guy who's running this online, and he's really doing a good job, and we talked about the differences between TensorFlow and MediaPipe for hand tracking. He's trying to find a common ground
between the two and make it easier for us to use. Take a look at this TensorFlow hand pose; that has a different sort of approach. He's not using the Canvas API; he was recommending using SVGs to do things with the hands on the web. Lots of good, really promising code. This is pretty new; I actually found this after I wrote the talk, so I'm excited. It's good to see these things being pushed forward. So, there's a technical challenge here, which is
that palms are harder to detect than faces, because faces have a lot of points that you can check: you can check eyes, nose, lips, ears, but palms are just kind of a blank slate. I mean, we're not doing palm reading, which is another fun thing to try; you can do that with apps. I should do a demo. But anyway, it's technically more challenging to just detect a palm, interestingly, and hands have more data points than bodies, like I said: 21 for one hand versus 17 for a body. The more data points that you need to track, the more weight the model is going to have, so it becomes harder to use it in the browser. Another thing is that hands come in all different sizes, so they have to make sure to catch all of the little points. I don't have a child available, but I would love to track a child's hand; you can do that in my demo and send me your results. My kids are all grown up. And the data set they used to create these hand pose models is gathered from real-world images and also synthetic rendered images, so it's kind of interesting how they worked through the differences between reality and synthetically rendered images to create a performant model.
So, MediaPipe, like I mentioned, is used by TensorFlow.js, and it's using multiple machine learning models. It's using a model called BlazePalm for the palm detection; it returns a hand bounding box, as in: there's a palm, and here's a hand. The hand landmark model then operates on a cropped image region and returns 3D hand key points. And it's efficient because it's using these two models: it's looking for the palm, and if there's no palm it just stops, but if there is a palm, then it continues on to detect a hand. Using two models reduces the need for data augmentation,
for cropping and rotating and scaling, and it can dedicate its capacity to prediction accuracy. The pipeline is implemented as a MediaPipe graph. So there's a lot of complicated technical stuff going on to create the model and then to render it, and it's really useful and interesting to take a look at some of the papers that have been written on this. Body posing, by the way, is done in a completely different way than
hand posing. Who knew? Fascinating. So here's the trick: to make the best hand shadows, you really need two hands. Do you remember? To make a bunny properly, you want to have a little bit here, maybe a little tail. MediaPipe allows for that, which is nice, and TensorFlow.js does not; it only has one hand, to keep the model size down. But unfortunately, MediaPipe does not allow you to style the shadow hand. It expects you to render the hand the way it wants you to render it, and it doesn't give you a lot of creative freedom to get away from the canvas, or to really leverage the Canvas API, or do more interesting things with it. It kind of locks you
down a little bit. So, unless we want to reverse engineer MediaPipe, which I'm not eager to do, I would just please ask TensorFlow.js to give us two hands. Please. Please. If you're listening, I'm a GDE. Let's go. Because we need to tell stories with our hands, and it's important. Okay, so what we're going to do is try TensorFlow.js. We're going to hope for
multi-hand support, and we're just going to do hand shadows with one hand. Because, like I said, MediaPipe doesn't allow the styling of the hand skeletons, and its purpose is really more mechanical than artistic, whereas TensorFlow lets you do whatever you want with the Canvas API. Although, it is a good suggestion to try maybe something different, like SVGs.
So, let's go. What we're going to build is a virtual shadowgraphy show. We're going to open up the webcam to capture our hand poses, and we're going to cast those poses, styled like shadows, onto a canvas. And we're going to allow the user to record these shadows as a story. So we have three tasks. Let's do this. There are a couple of design and architecture decisions that you're going to need to make. You'll need a base architecture. You'll need to manage that webcam nicely, implement your hand pose, get the model to give you the key points, and then draw what it looks like. You'll need to then cast your shadows onto yet another
canvas, and you'll need to deploy this whole web app so that your friends can use it too. And I've done all of these things for you, so I hope you're grateful. Anyway, let's talk about the design and the architecture. I have some code snippets here; we'll just walk through these. I am a Vue developer, so I tend to do everything in Vue.js. Well, recently I've been using Vite, which
is amazing and fast, it's amazingly fast. If you take a look at the code base, you'll see that one of my components is really large. I think it would benefit from an upgrade to Vue 3 with the Composition API; I haven't gotten around to it, but I welcome pull requests. The first piece of any application I always go to is the package.json, where you can see your dependencies. So, here we have TensorFlow, and there's a bunch
of installations. TensorFlow.js has broken up their package so that you need to install the specific model that you need, which in our case is hand pose, and then you need TensorFlow.js itself so that you can use that model on the web. You then need a backend to be installed as well; they've broken everything up into backends, and I'm going to use the WebGL backend. And then you need the core TensorFlow.js package. I used Bulma for styling, but use whatever you like, and then there's a couple of Vue packages.
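As a rough sketch, the dependencies end up looking something like this — the scoped TensorFlow.js package names are the real ones, but the version numbers and the exact Vue packages are illustrative, not copied from the repo:

```json
{
  "dependencies": {
    "@tensorflow-models/handpose": "0.0.6",
    "@tensorflow/tfjs-backend-webgl": "^3.3.0",
    "@tensorflow/tfjs-converter": "^3.3.0",
    "@tensorflow/tfjs-core": "^3.3.0",
    "bulma": "^0.9.2",
    "vue": "^2.6.12",
    "vue-router": "^3.5.1"
  }
}
```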
At that point you need to think about how you're going to set up the views. This is a pretty simple two-page app: you welcome your users, they click, and then you start the show. It's only two routes.
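Those two routes can be sketched as plain route records — the path and component names here are my guesses, not necessarily what the repo uses:

```javascript
// Two-page app: a welcome view and the show itself.
// The components are stubbed inline here; in the real app they
// would be single-file Vue components.
const Home = { template: '<div>Welcome! Click to start the show.</div>' };
const Show = { template: '<div>The shadow show goes here.</div>' };

const routes = [
  { path: '/', name: 'home', component: Home },    // welcome your users
  { path: '/show', name: 'show', component: Show } // start the show
];
```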
The second route leads to this big component called ShowView. The first piece of the template is this video tag, which is where your webcam feed is going to be cast, and you can see that there's an output canvas that lies on top of the video.
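A minimal sketch of that layout — the ids, classes, and exact structure here are illustrative, not copied from the repo:

```html
<!-- ShowView template sketch: webcam feed with the keypoint overlay
     stacked on top, and the shadow canvas rendered beside them -->
<div class="show">
  <div class="video-wrapper"> <!-- position: relative -->
    <video id="video" autoplay playsinline></video>
    <!-- absolutely positioned over the video; the skeleton is drawn here -->
    <canvas id="output"></canvas>
  </div>
  <!-- the key points are re-rendered here as the styled shadow -->
  <canvas id="shadow"></canvas>
</div>
```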
You can see how this looks. So I'm going to show myself and my hand, and that output is going to be where I draw the skeleton. Then here I have the shadow canvas, which is next to it, and I'm going to take my key points and re-render them onto the shadow canvas and play a little bit with the Canvas API, limited as it is. We do the best we can. So,
you're going to see in the code base a lot of asynchronicity, because these are relatively heavy models, and the reason TensorFlow has not released double hand poses, I'm sure, is because these models are just pretty big, and I think they're trying to be a little bit respectful of the browser's needs. Remember, it's 21 key points per hand, so two hands would be 42, as opposed to a whole body, which is just 17. That's a lot of key points. So I always mount my processes asynchronously: load up the model, wait for that to load, and then set your message to say "okay, model is loaded". Then, I'm going to start
up the webcam. So, first the model, then the webcam, and then I'm going to start looking for a hand. When you're setting up your camera, just make sure to handle any unavailable cameras, and always work asynchronously. Here we're grabbing our video as a stream and starting to capture the keyframes, and for each keyframe, at 60fps, we're going to be drawing and redrawing and redrawing your skeleton hand. Which sounds scary, but it's not scary; it's cool. Then you need to design your hands and design your shadows, and this is interesting, because you would think that out of the box TensorFlow.js would say "okay, here's a hand, I'm just going to draw you some key points automatically". Fortunately for us
that's not the case, and the reason it's fortunate is that we can then really play with the way we draw our skeleton. So, here I have the first skeleton being drawn, right on top of my video. I draw a clear rectangle so that I can see the video through this overlying output box. I set the stroke style to red, and the fill style to red for the lines that I'm drawing. And then I flip it around so that the hand matches the video, and then I do the same thing for the shadow box next to it, where I'm going to be casting black shadows on a white background. So,
that is, you know, the amount that you can play with in the Canvas API. The Canvas API has something called shadowBlur, and I set that to 20, and you can play with that. It would be kind of cool if I added a slider in the web application to tinker around with the shadow blur and the shadow offset. If you add the offset, you're going to have the actual hand here and then the shadow offset from it, so that you're not going to see the hand: the hand is going to be in white, the shadow is going to be in black, and the background is white, so the hand kind of hides away and all you see is the shadow. That's how we're going to simulate casting shadows, because, luckily, the Canvas API has a shadow capability. So, for each frame
from the webcam, you're going to draw your key points, and here's where we're starting to make predictions according to the model. It's estimating the hands and figuring out where the landmarks are, and then drawing the key points. So, it's going to take the key points for each animation frame and draw them, both onto the output canvas over the video and also onto the shadow canvas. And you have to clear each time. Now this is interesting. I'm also lucky because TensorFlow.js gives us a lot of flexibility. You are going to create your hand by setting the indexes,
the indices, and I tweaked this because, if you remember, the original hand is drawn with a line all the way to the wrist. Line, line, line, and it ends up looking like a garden rake when you cast the shadow. It wasn't very pretty, so I wanted to redraw it so that the palm is more like a square or a polygon, and then I would just draw the fingers. So I changed the palm to not come to a point, but actually to go around.
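That palm-as-polygon tweak can be sketched like this. The 21-point numbering follows the handpose model (0 is the wrist, then four points per finger), but the exact choice of palm polygon points and the mock-friendly shape of the function are my own illustration, not the repo's code:

```javascript
// Landmark index groups, following handpose's 21-point numbering:
// 0 = wrist, 1-4 = thumb, 5-8 = index, 9-12 = middle, 13-16 = ring, 17-20 = pinky.
const fingers = [
  [1, 2, 3, 4],     // thumb
  [5, 6, 7, 8],     // index
  [9, 10, 11, 12],  // middle
  [13, 14, 15, 16], // ring
  [17, 18, 19, 20], // pinky
];
// Instead of running every finger line down to the wrist (the "garden
// rake" look), draw the palm as a closed polygon through the wrist and
// the finger bases. The exact base points chosen here are illustrative.
const palm = [0, 1, 5, 9, 13, 17];

function drawHand(ctx, landmarks) {
  // one stroked path per finger
  for (const finger of fingers) {
    ctx.beginPath();
    ctx.moveTo(landmarks[finger[0]][0], landmarks[finger[0]][1]);
    for (let i = 1; i < finger.length; i++) {
      ctx.lineTo(landmarks[finger[i]][0], landmarks[finger[i]][1]);
    }
    ctx.stroke();
  }
  // palm as one filled polygon
  ctx.beginPath();
  ctx.moveTo(landmarks[palm[0]][0], landmarks[palm[0]][1]);
  for (let i = 1; i < palm.length; i++) {
    ctx.lineTo(landmarks[palm[i]][0], landmarks[palm[i]][1]);
  }
  ctx.closePath();
  ctx.fill();
}
```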
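Stepping back, the mount sequence described earlier — load the model first, then start the webcam — might look like the sketch below. `handpose.load()` and `getUserMedia` are the real APIs; the function name and the injected parameters are my own, so the ordering is explicit and testable rather than relying on browser globals:

```javascript
// Startup sketch: model first, then webcam, with a status message
// callback. `handposeLib` stands for the imported @tensorflow-models/handpose
// module, and `mediaDevices` for navigator.mediaDevices.
async function initHandTracking(handposeLib, mediaDevices, videoEl, setMessage) {
  setMessage('Loading model...');
  const model = await handposeLib.load(); // downloads the hand pose model weights
  setMessage('Model is loaded');

  let stream;
  try {
    // front-facing camera, no audio needed for shadows;
    // handle unavailable cameras gracefully
    stream = await mediaDevices.getUserMedia({ video: { facingMode: 'user' }, audio: false });
  } catch (err) {
    setMessage('No camera available');
    return null;
  }
  videoEl.srcObject = stream;

  // wait until the video metadata is ready so width/height are known
  await new Promise((resolve) => { videoEl.onloadedmetadata = resolve; });
  return model;
}
```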
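And the per-frame loop plus the white-on-white shadow styling can be sketched as below. `estimateHands` is the real handpose method and `shadowBlur` of 20 is the value from the talk; the offset values are illustrative, and the frame scheduler is injected so the sketch doesn't depend on the browser's `requestAnimationFrame`:

```javascript
// Style a 2D context so a white hand drawn on a white background hides
// itself and only its offset black shadow remains visible.
function configureShadowContext(ctx) {
  ctx.shadowColor = 'black'; // the cast shadow itself
  ctx.shadowBlur = 20;       // soften the edges, like candlelight
  ctx.shadowOffsetX = 150;   // push the shadow away from the (invisible) hand
  ctx.shadowOffsetY = 150;
  ctx.strokeStyle = 'white'; // white-on-white: the hand "hides away"
  ctx.fillStyle = 'white';
  return ctx;
}

// One animation frame: ask the model for hands, hand the landmarks to a
// draw callback, then schedule the next frame (requestAnimationFrame in
// the browser; injected here so the flow is easy to follow).
async function detectFrame(model, video, draw, schedule) {
  const predictions = await model.estimateHands(video);
  if (predictions.length > 0) {
    draw(predictions[0].landmarks); // 21 [x, y, z] key points
  }
  schedule(() => detectFrame(model, video, draw, schedule));
}
```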
So it's nice that you have the flexibility to draw to the canvas with the points of your choosing. And then you're going to have this palm area drawn out as well. Okay. So, then we have to figure out a way to tell our story. We've got our video loaded up, we're getting our keyframes queued up, and we're showing our skeleton hand and our shadow hand here; now we have to figure out how to tell the story somehow. Well,
I went ahead and created a speech-to-text implementation using Azure (why not?), using Cognitive Services speech-to-text, and this was kind of an interesting and fun experiment. I've never dealt with speech-to-text this way before. I know that Leonie did a talk; I believe the API she was using is text-to-speech, and it looks for groupings of words to check for. This, of course, is using machine learning, so it's really,
you know, real text detection. It's a full-powered cognitive service that you can use. You can set the language, and I regret that I did not have time to make a switch to toggle between English and French. Maybe I'll add that after this talk, so that you can tell your stories in English or French; right now it's set to English. And, yeah, you can just go ahead and create this cognitive
service and allow your story to be captured and drawn to the screen along with your shadow. So, you're going to connect to your speech service, which is a cognitive service, and you need to store your key somewhere, the key that I just set up in that cognitive service. So I took the key and I put it into the Azure Static Web App, which is where this whole show is hosted, and it's very helpful that you can just create a little API for yourself to grab the key, so that you don't expose your key anywhere in your code base; you save it in your configuration in your Azure Static Web App. So I just use axios to get the key to the cognitive service that I set up, and then I'm configuring the audio and starting the recognizer once my subscription has started up. And the recognizer has a couple of methods. One of them is "recognizing": this
will give you what it thinks it's hearing as it goes along, as it's listening to your speech. But I chose "recognized", because... let me say that again. The recognizing method just takes the words that it hears, but recognized takes the words and assembles them into a sentence that makes sense according to what it's hearing. So it really is a smart, machine-learning-based cognitive service. It works pretty well,
actually. So, you just start your continuous recognition asynchronously, and it listens according to the speech service you set up. That's pretty cool. So then, once we've got our shadows cast to the screen and we've got our speech being picked up so that we can write the story to the screen, we need to find a way to somehow save and share the videos. I tinkered around with this a little bit. I originally had a little download method, but it didn't work great in Firefox, so I ripped it out, and I just have the video being shown under the story, so you can at least replay it and show your whole story as it's written to a user. Then you could probably export that video and paste the story into an email and send it to someone else, if the story's short enough, or maybe just take screenshots and post it as a tweet. So, if you feel inclined to do this and if it works for you, use the hashtag show me a story and create a shadow story for me. I'm really curious to
see what you create. And you can fork this repo and do whatever you like. The last thing I needed to do was a deployment, since it's going to live somewhere. Let's just go ahead and post it up to Azure Static Web Apps. Azure Static Web Apps is going GA pretty soon, probably for Build,
which is in May, and I'm pretty excited to use this. I'm starting to post all of my code into Static Web Apps; it's a really nice service for Vue, for VuePress, for static sites in general, and there's a very nice integration with Visual Studio Code that you can use if you have the Azure extension installed. You can just create your static web apps and pop them up. It's still in preview, but pretty soon it's going to go GA. So this is a really, really
helpful way to manage your Azure Functions and your static web apps and push everything to production right from within Visual Studio Code. That works really well, so I encourage you to try it. So, now the moment you've been waiting for. If you're looking to become better at hand shadows, because I know you all are, the Ballard Institute and Museum of Puppetry has a nice YouTube channel with good tutorials on how to make hand shadows. So, if you would like, you can go to aka.ms/ombromanie; the code base is at aka.ms/ombromanie-code. Use the hashtag show me a
story and go ahead and create your own ombromanie. I'm going to just reduce the size of this and open up this link. Let's see. The trick with giving a talk like this, which is already using my webcam, is to use a different browser and try to cue this thing up. Let's see, this is my wi-fi, okay. Here is the app, and you can see it's an Azure Static Web App. This is the first page, so we're just going to enter, and here comes the model (that was pretty fast, actually, that's good), and here's the video. So, please do not freeze, okay? I'm going to hope that everyone can see this. I'm going to tell you a story. So, let's make sure that we can cast our hands. Yes, we can
cast our hands. So, you can see the shadow, and you can see how I redrew the palm here, because normally it would be all gathered here. Oh, that was interesting. Anyway, good fun. Yes, it's TensorFlow with just one hand, but that's okay. You can see how the shadow is cast onto the shadow canvas, and right there you can see the white hand, which is hidden against the white background, but it's covering up my hand because I'm over here against the edge of the screen. I'm going to go back over here. So, show me a story. I'm going to tell you a story. We're going to go like this. You can probably hear my husband teaching
across the wall, but I'll try to tell you a story. Okay, so I'm going to start recording, and let's see if we can capture a story. So watch for the speech up at the top. Once upon a time, there was an egg. It was a beautiful egg. Its mother named it George. One day, the egg started wiggling.
And jiggling. And, "up!", out popped a little creature. And it looked around. It looked around for its mommy. Mom... Mommy... Mommy! But its mom was nowhere to be found. What kind of creature was it? Was it a goose? Was it a duck? No. It was a baby dinosaur.
So, it cracked out of its leathery egg and found its feet and lifted its head and went to find its mommy. The end. So here is the story that I just told, and here is how I can play my recording.
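Replaying works because the shadow canvas was captured as video while the story played. The standard browser way to do that is `canvas.captureStream` plus `MediaRecorder` — whether the repo does exactly this I won't claim, but the general shape is a couple of lines, sketched here with the recorder constructor injected so nothing depends on a real browser:

```javascript
// Record the shadow canvas into replayable video chunks using the
// standard captureStream + MediaRecorder browser APIs. The constructor
// is passed in, and the mimeType is an assumption, to keep this a sketch.
function recordCanvas(canvas, MediaRecorderCtor, onDone) {
  const stream = canvas.captureStream(60); // 60fps, matching the draw loop
  const recorder = new MediaRecorderCtor(stream, { mimeType: 'video/webm' });
  const chunks = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => onDone(chunks); // hand the chunks back for replay

  recorder.start();
  return recorder; // call recorder.stop() when the story ends
}
```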
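And the story text at the top came from the continuous recognition described earlier. With the `microsoft-cognitiveservices-speech-sdk` package, the wiring looks roughly like this — `SpeechConfig.fromSubscription`, `AudioConfig.fromDefaultMicrophoneInput`, the `recognized` callback, and `startContinuousRecognitionAsync` are real SDK members, but the function shape and the injected `sdk` handle are my own sketch, and the key would come from your own little Static Web App API rather than the code base:

```javascript
// Wire up continuous speech recognition. `sdk` stands for the imported
// microsoft-cognitiveservices-speech-sdk module; `onText` receives each
// assembled sentence from the "recognized" event (not the raw fragments
// that "recognizing" streams out).
function startStoryRecognition(sdk, key, region, onText) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
  speechConfig.speechRecognitionLanguage = 'en-US'; // could be switched to 'fr-FR'

  const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  recognizer.recognized = (_sender, event) => {
    if (event.result && event.result.text) {
      onText(event.result.text); // append this sentence to the story
    }
  };

  recognizer.startContinuousRecognitionAsync();
  return recognizer;
}
```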
So you can see: "once upon a time there was an egg it was a beautiful egg its mother named it george and one day the egg started wiggling and jiggling and out popped a little creature". Let's see if it captured everything. Yep! Here comes the little creature. It's wiggling and jiggling. And here's the little creature, there's the little creature. There it is. Mom! So you can have some fun with telling stories on the web and saving from canvas to
video just by using a couple of lines of code, and I would love to get your ideas on the best way to share this. I love the idea of creating greeting cards: a grandparent in this pandemic could create a little hand story and share it with a grandchild. I think that would be really, really fun. So I'm just going to leave you with a little dinosaur who walks away to find its mommy. I hope it found its mommy. I think it did, but maybe you can tell that story on your own. So, that is ombromanie, and I'm very, very grateful that you gave me the opportunity to create this talk especially for this web stories conference. I'm just going to wave "bye", but I'll hang out for questions.