A.I. in Multimedia Classical Music: Presentation, Live Demonstration, & Discussion
[Nikki Pet] All right, welcome everyone, thank you so, so much for coming to this interesting presentation that we're going to be giving today. We've got a little bit of everything: a little bit of explanation, a little bit of performance, and then, very excitingly, I'd like to invite you all to join us for a discussion at the very end to talk about what you've seen, and we'd love to hear your ideas as well. I'm so fortunate today to be joined by, from Indiana University's Informatics Department, Professor Chris Raphael. [Applause] And my sister, Kaitlin, who's a PhD student studying under Professor Raphael, also at Indiana. In addition to their technological skills and expertise, both are also very accomplished oboists, which is very exciting, and I'm also so happy to have Professor David Shifrin from our own Yale School of Music here, and later on we will also have a special appearance from Professor Frank Morelli, which has also been very exciting.
So, as just a broad overview of what you're going to hear today, we're going to be talking about multimedia performance: what it's been traditionally, what we are doing with it, the changes and innovations we're making in this sphere, as well as ideas for how multimedia performance can continue to grow in the future. That is our ending discussion, which I hope you guys will be able to take part in. But to look at what we're doing now, we need to look at what happened in the past and what has been done in the 20th and 21st centuries. So, the traditional idea of a multimedia performance is this "fixed media" idea. A very famous example is musique concrète, where (forgetting sort of the technical things that make it musique concrète) basically you have a fixed audio track that plays, oftentimes with a live acoustic ensemble, and there needs to be some sort of way of coordinating the acoustic players with the tape music: either there's a conductor, or the musicians will just know the tape well enough to be able to fit in. But the tape doesn't move and the musicians move
around it. Similarly for visual media, silent film is a very famous early example, where you would have live soundtrack performers, sometimes just a single pianist or organist, or even a full orchestra. The way they would stay in time with the film (which again doesn't move or adapt to the performer) is that those musicians would have cue sheets, so they would know, "Oh, there's this event happening in the music that needs to coordinate with a certain scene in the film." A more modern example, of course, would be this "movies at the symphony" idea. If the New York Philharmonic presents Harry Potter and the Sorcerer's Stone, they will have a click track in the musicians' ears, with the conductor as well, so they don't run away from the film.
A slightly more modern example and a little bit more flexible is the use of Bluetooth cues in multimedia music. So basically what happens is you can have several different tracks of multimedia, usually audio-visual, that you align with your performance, and to move between those you can activate (usually a Bluetooth) pedal that will change between these different sections. So you have some control over the timing of this, but ultimately what happens between each cue will stay the same. So the idea that I'm getting
at with all these examples is that the media is immobile and the musician must accommodate around it. Now, I don't know about the rest of the musicians, the performers in the audience, but I personally don't like playing with something that is fixed. I think we love to have freedom when we're performing to make spontaneous decisions, and you know, we'll do the metronome work in the practice room, but I think a lot of the fun of performance, especially classical performance, is really being able to push the piece and see how far it can go.
This is going to lead to inconsistency between performances, but then if you have to conform to a fixed partner that has this expectation, that can feel very limiting. One of the ways that we're trying to work around this feeling of limitation with a fixed track is by incorporating a live score follower. In this case we're using Cadenza, a.k.a. the Informatics Philharmonic, and I would like to invite Professor Chris Raphael up to talk about the genesis and give us a little background about this program. [Chris Raphael] First, before I say anything, I want to add special thanks to Kaitlin, who put this whole connection between the four of us together, did a major part of the technical heavy lifting, and was also part of the creative vision behind this. So thanks very much to her, and also thanks
to Nikki, who I'm meeting in person for the first time today, for the beautiful artwork and for reaching out to Joan Tower and making that connection, and also for being part of the creative vision, and also thanks to David Shifrin. It may seem like we sort of come from left field here--I'm in the school of informatics (computer science). We are from a long way off, so I appreciate the openness to our way of doing things, which is a little bit different. When we have a performance that fails, well, we think of it more as an experiment rather than a performance, and we try to get the next one to be a little bit better--it might be. So I appreciate the
openness to our way of doing things, which is a little bit different. We try very much to fit within the framework of musicians--I think it can't all be one side coming to the other, there has to be a little give and take on both parts. So a little bit about the history of this. I grew up playing in the orchestra. That was my first love, and you know, I saw many iterations of the same experience of having a concerto soloist.
Maybe it's a violin, or a singer, or a piano, and the first-order approximation of the way things are supposed to work with the soloist is that the orchestra follows the soloist. It may be a little bit more complicated than that, but it's, you know, a good way to think about it for starters. And so, not just the player's instrument but the whole orchestra becomes sort of this extension of the thinking and the interpretation of the live player. It's a lot of power to put in somebody's hands, usually for good, maybe not always. And I think that from a psychological point of view, the end result (I'm thinking about it from the player's perspective) is a really wonderful experience. It's aesthetically meaningful, it is instructive like one wouldn't believe, it's fun, it's just good from a lot of different angles.
But the sad part of it is the demographics of the situation: in a fair world, say there are 50 people in the orchestra and you're one of them, maybe one out of 50 times that would be you, but of course it doesn't work that way. Maybe you play the contrabassoon and you're thinking there are no contrabassoon concertos, but it's not true, there are, and you still don't get to do it your fair share of the time. Or maybe you are the third desk violinist, or something like that, quite deserving and talented in your own right, but according to the hierarchy of the ensemble, that's probably not going to be you. And when all is said and done, this is an experience that most people get infrequently, not nearly as much as they probably deserve. And I'm also a scientist, a computer scientist, and it makes me think it's time to bring this aspect to bear on the problem: what if we had a computer that knew the score of the piece that we're trying to play? And what if it could follow you as you play and track your performance through the score? And maybe, even better, what if it could just sort of predict how the future was going to evolve? Which is, of course, something that you need to do if you're going to generate an accompaniment or another part that follows. Maybe it could even learn through rehearsals like people learn.
This was the idea. Of course there were a couple of decades after that initial conception of what might happen to get to the point where we are now, but we have built these tools, and we're at a point now--maybe the original conception was about concerto performances and raving critics and loud applause, and things like that, and I think it's moved considerably further away from that. I don't really have too much time to go into that, but I would say we're at a point where we're trying to rework, re-imagine (I know that word is so overused), to rethink the different ways that we might be able to bring this technology to bear in musical contexts. So I don't think I'll say too much more
about it than that now, because that's what the whole evening is about, and I'm so glad that you're here to witness it. The one thing: I'll just reiterate my comment on experiments. We've made a ton of progress, but the musical bar is super high.
It's hard to understand from a musician's point of view how hard it is to do such simple things-- I think of this as maybe like the fortepiano: it's the first attempt, there will be future generations-- what we are doing is our first generation of experiments. We've made a lot of progress and we probably didn't get it all completely right--well, we're still duking it out, so there's more to come. That's all. [Applause] [Kaitlin Pet] All right, thank you for that description and your introduction. Now I'm going to go over just an overview of our talk and an overview of those novel applications. So these applications are going to focus on two major aspects of Cadenza/InfoPhil that Chris touched upon earlier.
One is this prediction, or automatically anticipating someone's musical trajectory as they're playing live during a concert or during a rehearsal, and two is the ability to time-stretch music, that is, to play back music faster or slower so it can follow a live soloist, or to use it for other applications. All right, so specifically the three areas that we researched are these. First, live synchronized animation: this is similar to what Nikki was talking about in the silent movie sphere or the "movies at the symphony" idea, but in this case, instead of the orchestra being very constrained to follow a video, we're going to flip that on its head, so the video is going to be following Nikki as she freely changes her interpretation over time, and Nikki's going to talk a bit more about the artistic merit of such a setup. Second, I'm going to talk about a project we did with Yale alumni, faculty, and students on virtual ensemble assembly. These are remote recordings created in a way that doesn't make people use a click track. A click track can be very constricting--as Nikki also discussed earlier, you can't pick your own tempo, you can't do rubato, and generally there are a lot of decisions that you may want to make without the click track, but you're being constricted by having that framing. So we were looking at--okay, how do we do this but allow people to have that freedom? Assembly then becomes a little bit more of a challenge, so we're going to talk about the strategies we use in order to make that a reality.
Last we're going to do a performance of chamber music, but with one part that is pre-recorded by Nikki in the past. So this is similar to the concerto application that Chris was describing, but now instead of a concerto accompanying it's just one single bass clarinet part, recorded by Nikki, and we're going to talk a little bit at the end, or I hope you'll discuss with us some of the ramifications of that setup, so having a trio instead of an accompanist setup. [Nikki Pet] So starting with the live synchronized animation, the example that we are looking at today is Joan Tower's "Wings."
Some of you may have seen a very early prototype last May at my degree recital, and now this is in its full form. I think one of the big questions is: why add animation at all? Can't music just stand on its own auditory merit? For me this is a big question. As many of you may know, I've been spending almost the past two and a half years creating many animated performances, mostly on YouTube, and why would I do this instead of just regular performance? Well, one reason is: it's fun! I like it. But there is a deeper artistic reason for me, and that is really the problem of classical music's relatability to general audiences. For me this is both an imagined problem and a personal one.
Even though I've been studying classical music for about 20 years now, if I encounter a piece by a composer I don't know, using a musical idiom that I'm not familiar with, it can be very hard to connect on that first listen, no matter how amazing the performance is, just because I don't have the tools and the skills to understand what's going on. And if I feel that way, how must a general public with absolutely no classical music training feel about even the most average classical Brahms performance, for example? So the way I approach bridging this gap is through image and movement. I think image and movement are really much more immediately understandable and accessible to most audiences, much more so than classical music, and by reinforcing what's happening in the music through visual cues, I believe you can more effectively tell the story of the piece, especially to these novice audiences. Now this is all well and good when it comes to YouTube videos, where you can just precisely do what you want in the video editor, but becomes a problem in live performances. One of the key components to making these animated performances work is a very careful synchronization between what you see and what you hear. For example, if you have a very loud, fast
passage, you would want to complement that with something that looks loud and fast--so you could use bright flashing colors and very quick movement. But if that very loud-looking animation plays during a section of music that's very still and very delicate, it's going to be confusing, it's not going to work, it's going to be distracting, and it will detract from the overall quality of the performance, which is absolutely not what we would like to convey. So as I said before, this type of synchronization is very easy in post-production, but you definitely can't achieve that with just a cue sheet, and even with a click track it is still difficult to get a precise lineup, and this is especially the case the smaller your movements get. So for example in Wings, you are going to see sections of
flashing dots and each dot will align with a pretty quick sextuplet, and within these sextuplets I will have a little bit of rubato. Some of them speed up, some will slow down as I see fit. As a quick aside about these flashing dots that you're going to see, they do come up quite a few times and they're actually another program created by Kaitlin. So each dot represents a note, and each note is connected by a line, so this line actually represents the interval between those two notes that the dot represents, and the thicker the line the more frequently that interval comes up, the thinner the line, it only comes up a few times. So back to "why animation"-- having something that can follow you and line these up without the performer having to worry while also contending with a very difficult piece of music is really essential to pulling off a performance like this in a live setting. And then lastly I just want to touch on why Wings in particular works well for a medium of performance such as this.
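The dot-and-line picture described above can be read as a small graph: notes are nodes, and a line's thickness grows with how often two notes follow one another. The sketch below is only an illustration of that idea, not Kaitlin's actual program; the function names, the use of MIDI note numbers, and the width range are all assumptions made for the example.

```python
from collections import Counter

def interval_graph(pitches):
    """Count how often each pair of distinct consecutive notes occurs.

    `pitches` is a list of note numbers (MIDI numbers are assumed here).
    Each key is an unordered pair of notes, i.e. one line in the picture.
    """
    edges = Counter()
    for a, b in zip(pitches, pitches[1:]):
        if a != b:  # a repeated note has no interval to draw
            edges[tuple(sorted((a, b)))] += 1
    return edges

def line_widths(edges, min_width=1.0, max_width=6.0):
    """Map each pair's count to a line width; the most frequent pair is thickest."""
    if not edges:
        return {}
    most = max(edges.values())
    span = max_width - min_width
    return {pair: min_width + span * count / most for pair, count in edges.items()}

# Hypothetical five-note fragment, not taken from "Wings".
fragment = [60, 62, 60, 62, 67]
edges = interval_graph(fragment)
widths = line_widths(edges)
```

In this made-up fragment the pair (60, 62) occurs three times and (62, 67) once, so the first line is drawn thicker than the second.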
One is that, as a solo clarinet work, performances can be highly variable when it comes to timing. As other clarinetists in the room, and potentially saxophonists, may know, this piece is an absolute stamina killer. You want to have the ability to be flexible with how much time you take on certain passages, depending on your condition that day. So having that flexibility, not being tied down to a fixed piece of media, is really important.
Another thing about this piece is that it's very well known within the clarinet community, and saxophone as well, but much less so in the broader classical music world, and I would say not at all to the general audience. But this is an amazing piece-- there's this quasi-programmatic title that is then elaborated on in very vivid gestural motions that, in my mind, paint a very clear picture. By bringing that out in a visual animation, I think it's a way for me to share what I see and what I experience with Wings with a much broader audience. So with that said, I hope I've convinced you, and now I'll let Kaitlin explain how all of this works. [Kaitlin Pet] All right! So the first aspect I want to talk about in the technology that allows this very tight video synchronization is the ability to anticipate a musician's trajectory in real time. So this concept of anticipation is a familiar one to
most musicians. When you're in orchestra, you need to anticipate when the conductor is going to place their downbeat in order to stay with them and stay with the rest of the group. In an accompaniment scenario, either as a pianist or when you're just playing an accompanimental role in chamber music, you need to anticipate the soloist's next note position in order to stay in time. And there is that old adage: "If you don't anticipate, you're going to be late." So this is a concept that musicians are familiar with, but how do we translate that to language that a computer will understand? We feel anticipation, but what does that really mean in very concrete terms? So what we did was define anticipation time as the amount of time from now until the next note should be played. This may seem a little bit foreign at first, but we do it instinctually. For example, if you have, I don't know, a measure of rest until you have to come in, you need to decide, "When do I pick up my instrument, when do I prepare to play?" So we're doing all that unconsciously, but inherent in that is a
guess as to when that note is going to happen. So okay, in order to better understand how the computer makes that guess, I'm going to ask Nikki to do a little bit of a demo for us, and we're all going to participate in the demo. So here on the screen you see a bit of a score. It's a scale, and what I want is for us to clap when Nikki reaches the top of the scale. I could show it, or we could just do it together.
Amazing, y'all did great! Unsurprising for this crowd ;) So think about what was going on in your mind when you decided to make that clap. I'm guessing Nikki's going at a pretty slow tempo, and based on that tempo you were able to extrapolate, "Now's when I should clap." All right Nikki, why don't we do it again; this time I won't help you. I'm guessing initially you were expecting Nikki to continue with the same tempo, pretty similar to the first time, which would mean you would clap a lot later, but then Nikki speeds up, which means, "Oh no, I've got to change that internal conception of when to clap" to a time that's earlier, and that's why maybe y'all were a little bit late, but you weren't that late. So this type of changing your mind over time is what a program like Cadenza/InfoPhil does in order to track a musician's trajectory and predict those future notes: as it hears more notes being played, that prediction of "how long do I wait" is being refined based on what the soloist is doing.
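The clapping exercise can be written down as a very small calculation. The sketch below is not how Cadenza/InfoPhil works internally (the talk does not describe its model); it is a plain tempo extrapolation from the last few heard notes, assuming the score gives each note a position in beats. The function names, the `recent` window, and the example times are all hypothetical.

```python
def predict_onset(score_beats, heard_times, target_beat, recent=3):
    """Guess the clock time at which `target_beat` will be played.

    score_beats: score position (in beats) of each note, in order.
    heard_times: clock times (seconds) of the notes heard so far.
    The tempo is estimated from the last `recent` inter-note gaps only,
    so the guess shifts when the player speeds up or slows down.
    """
    n = len(heard_times)
    if n < 2:
        raise ValueError("need at least two heard notes to estimate a tempo")
    k = min(recent, n - 1)
    beats = score_beats[n - 1 - k:n]
    times = heard_times[n - 1 - k:n]
    seconds_per_beat = (times[-1] - times[0]) / (beats[-1] - beats[0])
    return times[-1] + (target_beat - beats[-1]) * seconds_per_beat

def anticipation_time(predicted_onset, now):
    """Time from now until the next note should be played (never negative)."""
    return max(0.0, predicted_onset - now)

# An eight-note scale, one note per beat; the clap belongs on the top note.
scale = [0, 1, 2, 3, 4, 5, 6, 7]
steady_guess = predict_onset(scale, [0.0, 1.0, 2.0, 3.0], target_beat=7)
# Two more notes arrive faster than before, so the guess moves earlier.
revised_guess = predict_onset(scale, [0.0, 1.0, 2.0, 3.0, 3.5, 4.0], target_beat=7)
```

With the steady opening the guess lands at 7 seconds; once the two quicker notes are heard, the revised guess is earlier, which mirrors the audience adjusting its clap in the second run.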
So Chris mentioned another aspect of this, which is training. So I'm going to do another exercise to demonstrate this. Nikki is going to do this again, but I'm going to tell you that Nikki is going to play the same way that she played previously, so she's going to use the same interpretation that she just did. [Music] All right, y'all got it this time. Just like you guys were able to internalize that Nikki's going to start slow, then speed up, now InfoPhil, after hearing it twice, is going to be able to remember the previous interpretation, and then use that to inform its prediction. So as you
guys can see, you guys were a lot more accurate the second time, and having that foreknowledge has the potential to increase the accuracy of the predictions. All right. So great, anticipation all well and good, but what does that have to do with video synchronization? Before getting really in-depth, I want to talk a little bit about what animation is at its very core. You can see on the screen a series of moving
images. They look just like kind of a slideshow that's going at a pretty constant rate, but if you speed these images up a little bit, it becomes smooth. Our eye perceives the movement as something that is continuous. The nice thing about animation, as just a series of frames played at varying rates, is that the rate is the deciding factor for how fast we perceive the motion to happen. For example, what we see now is about 15 frames per second, but we could increase that frame rate to something like 55 frames per second. Now the movement is a lot quicker.
We have the ability to anticipate when the musician will place their next note and the ability to control how fast animation is going to be played, so those two things can be put together in order to create synchronous real-time video. Here's essentially how it works: if a player speeds up, the software is going to anticipate the next note to happen sooner just like you guys were able to anticipate that based on Nikki speeding up. Because of that, the animation frame rate is going to increase, the movement is going to be faster. Conversely if Nikki slows down, the software is going to anticipate the next note to happen later, and the frame rate is going to slow down.
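As a rough sketch of how those two pieces could be combined: if the animation has a known frame that should be on screen at the next note, the frame rate is just the number of frames left divided by the anticipation time. This is an illustrative assumption about the mechanism, not the actual implementation; the function name and the clamping limits are made up for the example.

```python
def frame_rate(current_frame, frame_at_next_note, anticipation_s,
               min_fps=1.0, max_fps=120.0):
    """Frames per second needed to reach `frame_at_next_note` on time.

    anticipation_s is the predicted time (seconds) until the next note.
    The result is clamped so a bad prediction cannot freeze the animation
    or make it jump arbitrarily fast.
    """
    remaining = frame_at_next_note - current_frame
    if remaining <= 0:
        return min_fps   # animation is already at or past the note: crawl
    if anticipation_s <= 0:
        return max_fps   # the note is due now: catch up as fast as allowed
    return min(max_fps, max(min_fps, remaining / anticipation_s))

# 30 frames to go: a slower player (2 s away) versus a faster one (1 s away).
slow_fps = frame_rate(0, 30, 2.0)
fast_fps = frame_rate(0, 30, 1.0)
```

Here the slower prediction gives 15 frames per second and the faster one 30, matching the description above: an earlier anticipated note means a higher frame rate.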
At this point Nikki is going to perform Wings for you guys. While she is setting up, I can take a few questions, if you guys have any questions or comments at this point. [Audience Question] Is this considered a type of machine learning? [Kaitlin Pet] Yeah, it would be considered a type of machine learning, especially the training aspect, where you're learning from previous patterns in order to improve the current setup. Chris can expand on that--oh sure, I will explain what machine learning is. Machine learning is when you use a computer program to learn patterns from data. In this case (I think Chris is better with the definition of machine learning), especially in the training part, it's taking data from your first take and using it to inform the predictions for the second take. Any other questions? Thank you for your question, by the way.
[David Shifrin] Have you had much interaction with the film industry? [Kaitlin Pet] We have not. Those are definitely some people I want to talk to, because I think this type of technology would be really cool for them. I know directors sometimes have very clear ideas of when exactly they want shots to happen-- we've encountered similar sentiments when talking to more fixed-media composers, where they say, "Okay, I designed this piece to have these different times, then why is it stretching?" So filmmakers may have a similar outlook, but I'm still definitely very interested in talking to them and seeing their take on this type of technology. Thanks for your question. [Music] [Applause] [Kaitlin Pet] Thank you for that fabulous performance! This is the first time I've seen Nikki perform this in front of an audience and she sounded amazing, and also her animation looked just so amazing. [Nikki Pet] Praise me more! I love validation. [Kaitlin Pet] I guess before we continue I have one more technical note as well. You guys can
notice that the actions usually were aligned exactly with Nikki, but sometimes it was a bit early and sometimes it was a bit late. That's because it is a prediction--the computer is making a guess of where Nikki is going to play next, and it's usually right, but it's not going to be right 100% of the time. [David Shifrin] Does the computer have the score? [Kaitlin Pet] It does, yes, so it's using the score in order to make this prediction. In the example we saw, you were able to know where to clap because you saw the score, and there was that label saying "clap," so Cadenza is working the same way, where it knows the score and it's using that in order to do this tracking, and in order to figure out when it is best to place those notes, very similar to us listening to a concert and following along in a score. [David Shifrin] I wasn't sure if it was reacting very, very quickly or actually following a map.
[Kaitlin Pet] Yeah, that's a really interesting point actually, so without this type of predictive technology we can't actually get something like video control because you need to interpolate into the future to know, okay, how fast do I move these frames in order to meet her at the next note. So for this type of technology you actually need that prediction. There's other types of score-following based quote-unquote technology that is more reactive, so those are two different classes of technology. [David Shifrin] So if Nikki were to play this again now... [Laughter] [Nikki Pet] I think I would pass out...
[David Shifrin] Would the video have been trained to have a faster reaction? [Kaitlin Pet] So you actually decide explicitly after each run whether you want to train or not. So in this case Nikki clicked no to training, so it wouldn't have trained, but if she had clicked yes then it would have learned her pattern. For something like Wings that may not... [David Shifrin] That puts the performer into a box, too.
In order to play with the trained computer you have to train yourself. [Nikki Pet] I will say for Wings, there is some training in it. I did most of my training over the summer to really get it locked in, but the problem is sometimes you train it too much, and if I'm really crapping out at the end and I need a lot of time, I don't want it to rush me.
So yeah, for a piece like this, actually it's better to keep the training a bit more minimal. [Audience Question] You'd never do this, but like what if you played a measure twice in a row? [Nikki Pet] Sometimes it'll catch it, if it's a small enough mistake, and sometimes it'll crash and burn horrifically. It sort of depends on the scale of how bad the mistake was, and how far off it is thrown, because if you keep going sometimes it'll catch it. It's like a real person: if you're in a rehearsal and you can recognize where another person is, you can get back on, but if you just have no idea what's happening, you're just going to sit back and wait for it to grind to a halt. [Audience Question] I like what you said about anticipation before--so, I don't know if you've experienced this, but if you play it on a different machine or different system, does it know its own latency, or do you have to sort of factor that in, and how do you fine-tune that? [Kaitlin Pet] So in this setup specifically, if you notice, oh I guess you can't see it, because everything's happening on this computer, but we have both the animation control and the score-following happening on the same machine. So what that means is, I think I measured the latency at one point, but it's not more than 10 milliseconds, so that's something that we don't need to worry about whatsoever. If we did run this on different machines, then we would need to do some tests to measure: okay, we sent the signal now, how much time until the other machine receives that signal?
All right, good questions, we'll move on... So next I'm going to talk about a project that we did with Yale faculty, students, and now alumni, which is virtual ensemble assembly. So as I said before, this project is about creating remote performances where players don't need to use a click track, and as I touched upon earlier as well, there are a lot of reasons why we don't like click tracks.
It's constricting, you can't do exactly what you want, so we were saying, okay, without a click track can we still assemble these pieces? And the answer, as we're showing you, is yes, but the way we do it is basically stretching these parts so that they fit each other. But there is some complexity to that, so before I go more into our project with Yale, I want to quickly touch upon two previous assemblies that we did, and what makes this project with Yale a little bit different. We previously worked with a clarinet trio at the Jacobs School and an octet at the Universität der Künste in Berlin, and in those two instances the players either knew each other or were very familiar with the piece. Especially for the Jacobs group: they were rehearsing the Brahms Clarinet Trio before the pandemic, so they knew how each other played, they knew how the interpretation should go, they knew their vision as a group, essentially. What we're doing for Yale is very different, so in this case everyone played the piece without any preconceived notion of how it should go, so everyone picked their own tempos, everyone picked their own interpretations, someone might do some rubato somewhere, other people might do it elsewhere. We didn't align note lengths, or anything like that. So specifically what we did: we reached out to David Shifrin, who helped put us in touch with a lot of amazing people at Yale, and we ended up picking the Mozart Serenade in E-flat, Adagio movement, and our group is listed here: Mr. Shifrin and Nikki on clarinet, Mr. Purvis and Olivia Martinez on horn, Mr. Morelli and Eleni Katz on bassoon, and Chris and I on oboe. So we never played
together before, I've only met a lot of you guys in person today. So we didn't know each other, we had no preconception of the piece, and when I got the recordings in, everyone had very, very different interpretations. Some people took the Adagio pretty fast, some people took it a lot slower, some people wanted a very smooth articulation and long note lengths, other people were more bouncy. So to give you guys an idea of how this sounds without any processing, I'm going to play you guys a take of just taking all the recordings that I got raw, stacking them on top of each other as if there were a click track, quote unquote, and seeing what we got: [Music] I think I've got enough of that. So initially it sounds like it might be promising, a little out of tune, but this might work, and then very quickly everyone pulls apart. So something that we're not going to explicitly talk about in this talk is tuning correction, and that's actually relatively straightforward. We just assume everyone should be at A440 and adjust accordingly, but what I'm going to focus on here is the timing correction aspect. What you heard was kind of a mess in terms of synchronization. Nobody was together, so how do we change that?
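The talk only says that everyone is assumed to belong at A440 and adjusted accordingly. A minimal sketch of what that arithmetic might look like follows, under the assumption that each recording's own reference A has already been measured somehow (the talk does not say how); the function names and the 442 Hz example are hypothetical.

```python
import math

def cents_off(measured_a_hz, reference_hz=440.0):
    """How far a recording's A is from the reference, in cents (100 per semitone)."""
    return 1200.0 * math.log2(measured_a_hz / reference_hz)

def correction_ratio(measured_a_hz, reference_hz=440.0):
    """Factor by which every frequency in the recording would be scaled."""
    return reference_hz / measured_a_hz

# Hypothetical player whose A sits at 442 Hz.
offset = cents_off(442.0)        # a few cents sharp
ratio = correction_ratio(442.0)  # slightly below 1, i.e. shift down
```

A pitch-shifting tool would then apply the shift of `-offset` cents to that player's track before the timing work described next.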
The answer is audio stretching. Earlier I showed you how video can be stretched to make us perceive animation go faster versus slower, and with audio you could do something very similar. It's a little bit more complicated because if you don't do this in a careful way the pitch is also going to change. We want the pitch to stay steady, but this is doable, just take my word for it, it's doable.
So specifically how we processed each part was like this: we would first take the score of each instrument, and in this case the clarinet one part first two measures... So we first have the clarinet score, we have the recording time stamp, and what we want to do first is find the time stamp associated with every note in the clarinet 1 part. So in this example let's say the first note starts at time stamp 1 second, the second note starts 2.5 seconds after the start of the recording, and so on and
so forth. We'll go through and label everything, and then we repeat that process with every other instrument. So we've got a clarinet 2 recording, an imaginary clarinet recording: it starts near the beginning, the second note starts about 0.5 seconds afterward, and then so on and so forth, we do this labeling. So if we just stack these two hypothetical parts on top of each other, it's going to sound a lot like what you guys just heard. It looks like here none of the notes are lining up-- especially egregious is the fact that clarinet 2 is supposed to start after clarinet 1, and in this version it's starting quite a bit before. So one way you can approach alignment (I'm not saying this is a good way, but one way you could approach alignment) is making everything line up with clarinet 1. Here I'm highlighting the clarinet 1 note onset times, and all we've got to do is move each of the clarinet 2 notes to a good spot and stretch their lengths to conform with the part from clarinet 1. As you may have guessed from the arbitrary nature of this, there's an infinite number of ways to put together this assembly in such a way that clarinet 2 and clarinet 1 are in sync. We could stretch the clarinet 1
part to fit clarinet 2, or we could do something really ridiculous, like this. So if we look at the alignment, okay, looks like they're still lining up but take a look at how the clarinet 2 part was stretched. These are supposed to be straight eighth notes--all these little pieces are supposed to be the same size but you've got a huge range: some of them are more than a second, some of them are more like an eighth of a second, so if you listened to this it would sound just like every note is a different length, wobbling around like crazy. So now the question becomes: how do we create a good virtual assembly, where a good virtual assembly is something that's tasteful, something that the musicians feel reflects themselves? What we did for this Mozart Serenade experiment was find the final stretch configuration that compromises among three goals: so first of all, because it's Mozart we want the tempo to not wiggle around too much. We want it to be fairly
steady. Second of all, obviously the parts need to be in sync, and third of all we want the original parts' tempo patterns to be preserved, so if someone does some kind of rubato, maybe takes a little ritard somewhere, we want some of that to reflect itself in this final recording. A quick note on these goals is that they don't all naturally go together. I noted that everyone recorded their part at a different tempo, so what tempo are we going to pick? Naturally just by picking one tempo we have to change everybody else in order to conform to that. In addition, with the parts being together and retaining the original tempo patterns, if you care only about retaining people's original intent, it's not going to be in sync, it'll sound very similar to what you guys heard in the beginning. So we're going to have to stretch
everyone a little bit in order to get that synchrony, and just a reminder from before, before we listen to this I wanted to remind you that we didn't discuss any of this at an interpretational level before sending in our recordings, so what you're hearing isn't something rehearsed in the traditional way where people decide on an interpretation together. Before we go on, I'm remembering there's two points that I forgot to mention about the video. So first of all, you guys see that there is video as well as audio, so the speed of people's movement in the video reflects how much we had to stretch from the original. I don't know if you remember, there's one part where I'm in uber slow motion, and that's because I didn't wait long enough for my rest. So you had to really stretch out that section. Another note is the mixing is not automatically done. I'm sure you guys doing pandemic-related virtual performances have run into the problem of how do we mix this after the fact--so initially we were researching mixing but we decided for this version to actually do it by hand, so those are all hand done, and not machine learning or anything like that.
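[Illustrative sketch] The "line everything up with clarinet 1" approach described above can be written down in a few lines. This is only a minimal sketch, not the code used for the Mozart assembly: the function name `stretch_factors` and the onset times are made up for illustration, and it assumes both parts have already been labeled with an onset time at the same score positions.

```python
def stretch_factors(leader_onsets, follower_onsets):
    """Return one stretch ratio per note so the follower lands on the leader's onsets.

    Both lists hold onset times in seconds at the same score positions.
    A ratio above 1 lengthens the follower's note; below 1 shortens it.
    """
    if len(leader_onsets) != len(follower_onsets):
        raise ValueError("both parts need an onset for every shared score position")
    factors = []
    for i in range(len(leader_onsets) - 1):
        target = leader_onsets[i + 1] - leader_onsets[i]      # gap we want
        actual = follower_onsets[i + 1] - follower_onsets[i]  # gap as recorded
        factors.append(target / actual)
    return factors


# Hypothetical onset times in seconds, not measurements from the recordings.
clarinet_1 = [1.0, 2.5, 3.0]
clarinet_2 = [0.25, 0.75, 1.75]
print(stretch_factors(clarinet_1, clarinet_2))
```

With these made-up numbers the first clarinet 2 note is tripled in length and the second is halved, which is exactly the kind of uneven stretching the "ridiculous" example warns about when nothing constrains the ratios.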
So now we want to have some comments from people who participated in this project. [Nikki Pet] First we're going to invite virtual Mr. Morelli to share his thoughts. Finally he reappears! [Frank Morelli] Nikki and I just listened through the finished product again, and I'm looking forward--she said she will have for us at some point the actual takes from where we started, which I think would really make it even a more amazing accomplishment that she was able to produce. The coronavirus, the fact that we all had to go remote, created opportunities, challenges. For me, personally, I had never learned that much about remote recording, or even putting things together in my own basic way. I then
started trying to learn to do things in GarageBand, the most basic of apps to use for this type of purpose. And so when the idea was presented to do this project that we're talking about today, I found that kind of astounding, knowing how much trouble I was going through in my own limited experience in lining things up, using remote recording with students, and then putting them together. So watching the process, being part of this process was really enlightening. The finished product, I think, speaks for itself. That's not where we started, and in a way intentionally we weren't supposed to start there, so I can't even imagine all the possible practical applications for this, but I really look forward to seeing what the future holds with this type of technology. I guess a downside, if you want to also
look at how that technology works, and how it affects us all, would be: often when in my experience doing any sort of remote recordings and then putting them together, you might have, perhaps, the bass instrument, a bottom instrument who has first recorded his or her line, and then one builds upon that, and in that way you get some of the satisfaction of playing with someone else, so I guess I do miss the in-the-moment collaboration that would come from having more than one person at a time. Now I guess in the future there's always the question of latency and all that, but the idea of being able to do some of this, maybe synchronously, but at a distance knowing then that the parts that were difficult to get together, or knowing okay don't be too uptight, let's play together, we can fix things. I've had myself limited experience that way, in, if you want to call it "commercial recording," popular recording.
You know, we spend our lives in doing acoustical recordings--you find a great room, you get a great engineer, everybody plays at the same time in that space. I've had limited experience sitting in my own little cubicle, and four or five other people in their own little cubicles, and then later fixing everything up, and it made a giant difference to have that opportunity, and of course then we were laying it down together instead of individually. I mean, it's hard for me to even imagine where this could go if one is thinking of a way of doing it so that there is some sort of rallying flag, you know, one thing already laid down or something. I mean, in the case of commercial music you have a thing called "The Rhythm Section," that's why they call it the rhythm section, right, and so everyone is sort of playing off of that, as opposed to our "classical" approach of physical, we're obviously listening to our own kind of Rhythm Section, but it's not as regulated as say piano, bass and drums, in the case of, say, the popular music or jazz type music.
[Nikki Pet] Great, thank you virtual Mr. Morelli for your comments. Now I'd like to invite Mr. Shifrin, do you want to share your thoughts and experiences? [David Shifrin] I don't think any of us quite knew what to make of this project before we did it. I said, "Well, what tempo are we going to play it at, what are we going to tune to?" "No, don't worry about it." Those were the instructions, just play. Of course we came to it with very different experience levels. I know Frank and Bill Purvis and I have played this piece, you know, many, many, many, many, many, many times, with one another and with lots of other people, and have very specific ideas of how it goes, and I think I hear some of that in this performance.
But ultimately I think it's really important who does the mixing and editing and stretching, so Kaitlin's stamp is all over this interpretation, and I think she did a marvelous job of bringing order to chaos, really incredible. That being said, I think, as Frank put right at the outset, the interest for this kind of thing, in this technology, was vastly heightened by the epidemic, and I think some of you in this room were in a-- Stephanie was in a group that we coached to play the Nielsen Woodwind Quintet, but it was very, very different: we had click tracks, and tuning, and we would put together sections, and then the last movement of Nielsen's woodwind quintet, for the uninitiated, is a theme and variations, so we had very finite blocks of music that we could do and we only had to fix one thing at a time, and if one part wasn't quite working, that person could re-record. We didn't have that for this, you had to work with what you had, which was even more extraordinary.
I would love to hear the first mix again, a little more of that, as Professor Raphael mentioned earlier that--oh no it was a, no it was you in your email response to hearing this online, when you sent out both to those of us who had played, that it was especially amusing to hear the end of it where people finished, and that's it. And I think that would be interesting for the audience, for everybody to hear, just how chaotic it became, and it was like the last half minute of a marathon with people crossing the finish line. [David Shifrin] It's like walking down the practice room hallway with all the doors open, not really recognizable as the same piece, or maybe walking into a rehearsal where everybody's just warming up on different things, and that you were able to make that conform to the score in the tempo was remarkable. It seems like you chose note lengths that had the most common denominators.
You know, you didn't choose my note lengths in other words... but that's fine, you had to make it work, and I was interested to hear it with the second clarinet part played in a way that's not exactly how I did it--that's just remarkable. So just talking about the process, it's just extraordinary that you could do that. One of the things that just occurred to me about this project, and about the ways that Yale School of Music dealt with isolation of wind players that I thought would have been a remarkable test case for this technology, was that in the basement of Sprague Hall across the street, our amazing recording tech folks put together a series of, essentially nine rooms, eight practice rooms and a studio down in Jack Vees's domain, and allowed wind players, who were not allowed to congregate during the height of the pandemic, to play chamber music together and to have orchestra sectionals with as many as eight players and a conductor in the studio. And I started thinking that that would be a fantastic way to record chamber music and just fix the broken parts, but I was dismayed to learn the wired-together rooms--and the beauty of the wired-together rooms was that we could record, not just record but rehearse with no latency because they were wired with such minimal latency that it was not noticeable to performers, whereas there's no program that I'm aware of where you can do that on the internet yet.
But it occurred to me that having a setup with multiple rooms, much the way pop and country music has been recorded for decades and decades where people are in isolation chambers with headphones on, but performing together, or in cells, like the Rhythm Section could record something, and then somebody else can record at another time, sort of the way Frank was describing, and this was something I never thought would fly, and to be perfectly honest and candid, I'd far prefer to get the eight of us in the room and actually have a rehearsal and then record it, but it does open the door to wondering about what the most practical uses might be. I mentioned film scoring earlier
because I've done a lot of that earlier in my life, and the two ways to synchronize music with the picture: a streamer, a line that goes from left to right and a conductor has to adjust the tempo and make sure that the upbeat lines up with the streamer so the downbeat is on the next scene so that the cue will line up, or just mathematically divides a click track so that the score, which doesn't allow-- it allows for beginning and end, but it doesn't allow for a lot of flexibility and nuance within the scene, unless the composer rewrites the music and makes a ritard here, and speeds up another place, but it occurred to me that if the music were just recorded at approximately the length of the scene, that this technology would make--oh, it would save the studios a lot of money because they could synchronize the music with a lot fewer takes. Don't tell the Musicians Union that I suggested this. When Frank Morelli and I first virtually met Chris Raphael, we recorded for the Cadenza app, the last movement of the third of Beethoven's Duos for bassoon and clarinet, which is also a theme and variations--which we chose for that reason, because we could do it in sections and we were going back and forth with metronome markings, and playing with metronome on headphones so that we both played the same tempo, and this bar will take some time so let's put an extra beat in here and there, and a lot of people were doing a lot of that kind of thing, but I think it was Chris who said, well don't worry about that so much because you knew, I guess, that you could apply some of this magic and make it look like Frank and I-- but we actually did rehearse a lot virtually. I'm going to end my talk by asking you. You talked about there's more in the future, and I think I'd really like to hear what the future of this application and other applications to your technology might be, so thank you.
[Applause] [David Shifrin] Maybe you have questions for me, as a live participant in that experiment.
[Kaitlin Pet] Yes please, everybody, questions open to anybody's experience... [Audience Question] I once watched an interview with Pavarotti, and he said he doesn't treat seeing the Opera as a job, he treats it as enjoyment. So a collaboration is definitely a big part of that, like we have this, we came to the same, similar conclusion of when to land on the downbeat of this measure, and right after that we feel "Oh my God, it's like I just met my soulmate." Or something like that. So I think that collaboration is a big part, a big source of the motivation of doing chamber music, but with this, your technology, I feel like this thing is kind of missing, so, is there any response to this question? [David Shifrin] A dazzling achievement, though. But the most dazzling technology, I hope, does not ever replace musicians coming together in the same place and making music together. I'll tell one other story from my experiences--many, many years ago in the 1970s I participated as a clarinetist in the Cleveland Orchestra's recording of Michael Tilson Thomas conducting Carmina Burana. And we performed it in Cleveland with two completely different soloists than were ultimately on the Columbia Records release of that piece.
And after that performance, the soprano and the tenor were dismissed and we went into an auditorium where all the seats were taken out, and we took up the whole auditorium for orchestra and chorus, just spread out, although we couldn't really hear each other but we watched MTT, and they put the tuba player in a box and there was someone in the back with headphones, and no soloists. It must have been 1973, 74-5, before you were doing this kind of work, but we just recorded all passages that had the solo voices, with no voices, and Judy Blegen, the soprano, and I forgot who the tenor was, dubbed it in in New York, and I said something to a guy named Andy Kazdin, who was the producer, and said, "Is this representative of what we do?" and he said, "No, this is a different art form." And I went, "Ok," and it won a Grammy. So it isn't just to your point that it's not the same, but it's pretty impressive. It's different. [Nikki Pet] Yeah, I definitely agree with that. For me, personally as a performer, as well, I think we're held to a different standard in recordings as opposed to live.
If I can be self-critical for a second: in the Wings performance that you just heard, I'm pretty happy with it as I'm playing it for you guys. I would feel very hesitant to put that online or to release that, because of certain imperfections that I know I would just listen to over and over again and think people who are listening to this recording are just narrowing in on that. And so the nice thing with a solo piece, right, you can just snip parts in and out and you put those in and it's fine, you've got a perfect take. With the ensemble, I actually tried to do this with my Messiaen Quartet for the End of Time chamber group last semester when we recorded. It's a lot more difficult especially when you're recording together--it's actually in this very room--we recorded together, there are issues with the spatialization, and all of these things where we were able to do some edits, and sort of get that real perfect recording take, but it was a lot more difficult, and I think if you want-- it's not satisfying in the moment, absolutely not, I agree with you, but to be able to get that level of perfection-- [David Shifrin] To that point that I was lamenting the disappearance of the wired-together studios, because you could play the Messiaen Quartet all together and see one another on the giant video monitors and adjust the balance and if you had a really good take but the violin was out of tune or clarinet squeaked or whatever, you can just redo that part and make it fit, but you can still have had the experience of simultaneously recreating. [Nikki Pet] Exactly, and I think the nice thing about this technology is that those edits, up until now, have to be done manually and if you can optimize this type of program, it's just the click of the button, you put in your parameters and everything just fits.
[David Shifrin] When Kaitlin was explaining the process, I would start trying to count up the hours that it took you to do that, did you keep a log? It was quite labor intensive, I think. [Kaitlin Pet] So what I actually did, I think I didn't do a great job explaining this earlier... So these three parameters here were literally the only three values I had to tweak. So we built an algorithm, basically that would say, okay I'm going to find the best combination, and there's three knobs that I can tweak: one is--how steady is the tempo; one is--how together are the parts; and one is--how much are the original patterns preserved. So all I had to do is find, okay, what are the best three numbers that represent how much I care about the three of them. So it was work to build the machine that made this but the actual assembly process wasn't that tricky. [David Shifrin] Would this machine work for other pieces? [Kaitlin Pet] It would, yeah.
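[Illustrative sketch] The "three knobs" idea can be pictured as a weighted cost that scores any candidate assembly. The talk does not give the actual formulas, so everything below is an assumption made for illustration: the function names, the squared-error penalties, and the toy onset times are hypothetical, and a real system would also need a search procedure to find the lowest-cost assembly.

```python
def iois(onsets):
    """Inter-onset intervals: the time between consecutive note starts."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def assembly_cost(output, original, w_steady, w_sync, w_preserve):
    """Score a candidate assembly; lower is better.

    output and original map a part name to onset times (seconds) at the
    same shared score positions. The three weights are the three knobs.
    """
    steady = 0.0
    preserve = 0.0
    for part, out_onsets in output.items():
        out_iois = iois(out_onsets)
        orig_iois = iois(original[part])
        # Knob 1: penalize tempo that changes from one interval to the next.
        steady += sum((b - a) ** 2 for a, b in zip(out_iois, out_iois[1:]))
        # Knob 3: penalize drifting away from what the player recorded.
        preserve += sum((o - g) ** 2 for o, g in zip(out_iois, orig_iois))
    # Knob 2: penalize parts that disagree about when a score position happens.
    sync = 0.0
    n_positions = len(next(iter(output.values())))
    for i in range(n_positions):
        times = [onsets[i] for onsets in output.values()]
        mean = sum(times) / len(times)
        sync += sum((t - mean) ** 2 for t in times)
    return w_steady * steady + w_sync * sync + w_preserve * preserve


# Toy data: two parts, three shared score positions.
original = {"cl1": [0.0, 1.0, 2.0], "cl2": [0.0, 0.5, 1.5]}
untouched = original
follow_cl1 = {"cl1": [0.0, 1.0, 2.0], "cl2": [0.0, 1.0, 2.0]}
print(assembly_cost(untouched, original, 1, 1, 1))
print(assembly_cost(follow_cl1, original, 1, 1, 1))
```

Turning one weight up changes which candidate wins: with equal weights the synchronized version scores lower here, but a large enough `w_preserve` makes leaving the recordings untouched look better, which is the tension between the goals that the talk describes.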
It would be very easy to take another machine and then, ah sorry, take another piece and then assemble it in the same way. For this one it is more optimized toward Classical style music, because it does have that steady tempo aspect, so for some other types of music maybe that's not something you care about so much, so actually in the Brahms example that we described earlier what we did was more like this example here, so in the Brahms Clarinet Trio, basically we picked a leading part and then at all points in the score we would say, everyone else follows this leader, so we didn't control any of the steady tempo, we didn't control any of that, we just assumed, okay, these people know what they're doing-- we'll pick a leader and then we'll make everyone else quote-unquote "Follow the Leader" and that ended up sounding pretty good, partially because they knew the piece, they played together before, and partially because Romantic music can handle a little bit of tempo wiggle, whereas when we tried the same strategy with the Mozart, it just got faster and slower and faster and slower, and it sounded really bad. [David Shifrin] It is a different art form. [Audience Question] I'm just wondering, when you say stretching, I'm curious is that only increasing the duration? Because then, if somebody plays a note too long, how do you make that shorter? Do you have to go for the lowest common, or the longest common length denominator? It didn't sound like you were just taking the slowest tempo. [Kaitlin Pet] What we did was actually very similar-- let me go back to the video example, so specifically what we did was we took audio and cut it into a bunch of little overlapping chunks so then it functions very similar to frames in an animation, so that way we could say okay we want this to be longer, we'll repeat some frames to make it longer--we want it to be shorter, we'll skip a frame every so often to make it shorter, so that gives a lot more flexibility.
[Chris Raphael] So what happens on a note by note basis, each note could be stretched by a different amount, and in general would be. Or shrunk.
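[Illustrative sketch] The repeat-or-skip idea from the animation analogy can be shown by choosing which source chunk feeds each output slot. This is a minimal sketch under assumptions: `stretched_frame_order` is a hypothetical helper, and it only picks chunk indices. A working audio stretcher would also window the overlapping chunks and add them back together so the joins are smooth and the pitch stays put.

```python
def stretched_frame_order(n_frames, factor):
    """Pick which source frame fills each output slot.

    factor > 1 repeats frames (longer note); factor < 1 skips frames
    (shorter note); factor == 1 keeps every frame once.
    """
    if n_frames <= 0 or factor <= 0:
        raise ValueError("need at least one frame and a positive stretch factor")
    n_out = max(1, round(n_frames * factor))
    # Walk through the source at 1/factor frames per output slot.
    return [min(n_frames - 1, int(j / factor)) for j in range(n_out)]


print(stretched_frame_order(4, 2.0))  # each frame used twice
print(stretched_frame_order(4, 0.5))  # every other frame skipped
```

Because each note gets its own factor, one note can be lengthened while its neighbor is shortened, which is how a note that was held too long can be brought back in line.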
[Audience Question] I just have one more comment: in terms of this sort of technology as a separate art form than live music making, I think that has really interesting applications, because then you can play with the sounds that you have, that you've had extemporized from live performance and--Nikki and I are doing a group project in one of our classes that will involve some of that playing around with pre-recorded work and then I was playing live on top of that. [Nikki Pet] Marty, that's a fantastic segue to our final example! As Marty so helpfully led us into, our last example is combining live performance with pre-recorded audio. In this case we are doing the three-part Ricercar from J.S. Bach's
Musical Offering. The reason we did this-- this is a test subject, or a lab rat, if you will, for the Yale Clarinet Celebration, this Sunday at 3 P.M. in Sprague if you guys are available. So the piece that we would be using this for would be the clarinet arrangement premiere for Joan Tower's Fanfare for the Uncommon Woman No. 5, which she very graciously asked us to premiere with a live animation very similar to Wings. Now, while very exciting, there is a little bit of trepidation going into this project because, unlike the vanilla Cadenza/InfoPhil, which only handles one voice functionally at a time, this would now be following a quartet--so four different voices. While generally moving in unison, we were a little unsure that everything would be able to be detected. So we wanted to test it out, and we decided to use
this fugue, the three-part ricercar, and have the top two fugal voices set for live clarinet and oboe, and then a digital bass clarinet. So this bass clarinet was made through a recording that I did over the summer, input into Cadenza, and then that recording follows us afterwards, and we have very affectionately nicknamed this hybrid of me and the computer: Nikkitron. So I'll let Kaitlin explain the inner workings of Nikkitron. [Kaitlin Pet] All right, so I'm not gonna go into way too much depth here but Nikkitron uses those anticipation and time-stretching concepts I explained earlier in order to stay in sync with us in real time. So in terms of anticipation Nikkitron is anticipating when real Nikki and I are going to place our next notes in real time, so it's doing something very similar to that clapping exercise where it's guessing at every point, okay, where are they going to play next, where are they going to play next. So using that knowledge, it's then able to time-stretch individual notes in order to match us at our next note onset. So instead of stretching in a non-real-time way like we saw in Mozart, where everyone recorded beforehand, we stretched it individually, and I could do it at my leisure, this is happening in real time, so as it's guessing where we're going to play next, it's going to stretch Nikki's part in order to stay in line with us.
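[Illustrative sketch] The anticipate-then-stretch loop can be illustrated with a deliberately simple predictor. This is not how Cadenza works internally, which the talk does not detail. The sketch assumes tempo is estimated from only the two most recent detected notes, and the names `predict_next_onset` and `note_stretch` are hypothetical.

```python
def predict_next_onset(onset_times, score_beats):
    """Guess when the live players will start their next note.

    onset_times: detected onsets so far, in seconds.
    score_beats: beat position of every note in the score, including the
    next one that has not been played yet.
    """
    k = len(onset_times)
    if k < 2 or len(score_beats) <= k:
        raise ValueError("need two detected onsets and a known next note")
    # Current tempo, estimated from the last two detected notes.
    sec_per_beat = (onset_times[-1] - onset_times[-2]) / (score_beats[k - 1] - score_beats[k - 2])
    beats_to_next = score_beats[k] - score_beats[k - 1]
    return onset_times[-1] + sec_per_beat * beats_to_next


def note_stretch(now, predicted_onset, recorded_duration):
    """Stretch factor so the recorded note ends exactly at the predicted onset."""
    return (predicted_onset - now) / recorded_duration


# Hypothetical numbers: two onsets heard so far, one beat apart.
next_onset = predict_next_onset([0.0, 0.5], [0, 1, 2])
print(next_onset)
print(note_stretch(0.5, next_onset, 0.25))
```

Each newly detected onset replaces the guess, so the stretch factor for the pre-recorded part is recomputed note by note as the live players speed up or slow down.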
All right, so we're going to start preparing and play this, and as we play we want you to keep something in mind: how is this similar or different from a quote-unquote "real trio" where there are three live players? Here me and Nikki have agency, we're driving the piece forward, but bass clarinet Nikkitron is following us. It doesn't really have a mind of its own, quote-unquote. [Nikki Pet] Confusingly, she was named Mickey like Mickey Mouse. So in the house it was always, who are you asking for? Nikki or the cat? [David Shifrin] Is Nikkitron tuned to a fixed pitch? [Kaitlin Pet] No, Nikkitron is tuned to whatever Nikki was playing when she recorded the bass clarinet part. [Nikki Pet] We'll do some real chamber music and tune to Nikkitron. [David Shifrin] So you're going to respond to Nikkitron, too? [Nikki Pet] Oh yeah, Nikkitron's a full member.
role, like, is there a struggle for the technology to sense and react to your breathing? Because if the goal is ultimately for it to, you know, be like another musician or a human being, then as someone who's a chamber musician, you know, reacting to those extra-musical impulses is something that's important. So I guess it's a three-pronged question: what exactly does it react to, is there scope for it to interact with extra-musical cues like breaths, and then thirdly, is there scope also for it to be applied to, you know, visual cues? Oh, I will answer that, great question. So right now it's a hundred percent based on note onset detection, so that's the only thing it's listening for. You-- oh, note onset detection. Oh, sorry, uh, Mike from the score, so it sees for