VoiceInteraction's NAB Amplify 2021


Welcome to VoiceInteraction's virtual NAB Show of April 2021. While we still miss the usual face-to-face meetings of a live show, we believe it is far more important to guarantee everyone's health until the moment we can all get together again. April has always represented a milestone for us.

This is one of the most important development cycles for VoiceInteraction, as it culminates with brand new releases for all of our products, just in time for our scheduled presence here at the NAB Show. Despite the changes brought by last year, VoiceInteraction maintains its attendance as usual: here to share with you all the significant news and product innovations for 2021. Welcome, my name is João Neto and I am the CEO and co-founder of VoiceInteraction.

VoiceInteraction is a product company dedicated to speech processing technologies. Founded in 2008, it focuses on the research and development of automatic speech recognition engines, text-to-speech, dialogue systems and natural language processing. The idea is to extract knowledge from speech. Our practice includes Artificial Intelligence based on machine learning algorithms and deep neural networks.

We have a strong technical background, complete with a Research & Development department staffed by several PhDs and an innovative team of software engineers. VoiceInteraction offers horizontal technology solutions for different markets. For the broadcast industry, there is a strong component in live closed captioning with Audimus.Media; for pre-recorded programs, we have Audimus.Server; and finally the Media Monitoring System - Broadcast Edition, for compliance recording and analysis, complemented by trending news discovery and market analytics. The broadcast market has always been our main focus; nonetheless, VoiceInteraction is currently working in other markets with different challenges. Administrative modernization through automation of transcription, and speech analytics for contact centers, are some examples of advancements in other areas.

With over 500 clients worldwide, across the United States, Latin America, Europe and Asia, VoiceInteraction remains committed to delivering the best products for our clients. Since 2008, VoiceInteraction has been invested in the search for practical solutions for the broadcast industry. Our headquarters are located within walking distance of the University, keeping an open door to emerging talent from the engineering field while strengthening our relationship with academia.

This decision contributes to a steady inflow of qualified people, continuously empowering our workforce while guaranteeing the renewal of products to come. VoiceInteraction's core technology derives from a researcher's mentality, which is the driving force behind several of our products. For that reason, we have been working on AI and machine learning algorithms since the beginning.

In this area, there are two types of organizations today: companies that outsource technology, mainly from cloud providers, and companies that own and develop the technology themselves. We strive to be the latter. At VoiceInteraction, we create and develop products based on our own proprietary speech processing technologies and machine learning algorithms. There are always higher costs associated with this approach.

It requires not only a strong background with a qualified R&D department, but also thousands of hours of fully transcribed and annotated data. Only then is it possible to produce the level of results that we deliver to our clients on a daily basis. With the global market in mind, VoiceInteraction invested in various languages to open new doors.

We started in 2013 by bringing our technology to the Brazilian market. In the search for new opportunities within the broadcast industry, we advanced our offer by creating tailored solutions for specific accents. After thousands of hours of closed captioning and offline transcriptions, our ASR showed great improvements and our result delivery rose to a new standard.

In the Brazilian market alone, our operation grew to more than 250 deployed Audimus.Media systems, providing a steady ROI to the same broadcasters that first invested in our technology. With the exposure from international fairs such as the NAB Show, we felt it was time to promote our offer to other markets.

Since coming to the United States, we have improved our automatic speech recognition even further. After more than 120 new licensed customers, VoiceInteraction's product line-up has made a tangible impact on the broadcast market. The 6th generation of products keeps honoring our commitment to high accuracy levels and fast turnaround. By April of 2021, we are releasing new and improved versions of our products. Audimus.Media is VoiceInteraction's A.I.-driven automatic closed captioning platform.

This software produces automatic closed captions with high accuracy and low latency. The system is augmented with a new translation functionality running on premises, allowing the delivery of two closed caption feeds simultaneously. Learn more about how our A.I.-driven platform is always on: accurate and reliable. Audimus.Media is automatic closed captioning production software, designed to receive any signal, process it and send back highly accurate captions within a small delay window.

The general workflow is simple. The master input signal that goes to the closed caption inserter is also routed to the Audimus.Media virtual or physical server. There are different input configuration scenarios but, traditionally, a Blackmagic DeckLink board, or a similar device, receives the signal. The audio is then extracted and sent to the speech recognition engine to be processed. Within a small window of 3 to 4 seconds, a complete caption line is produced and sent to the encoder to be merged with the master signal as it advances down the production pipeline. Now let's take a look at this in a little more detail: when the audio is decoded by the DeckLink board, it is sent to an acoustic speech processor.
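
For engineers who think in code, the loop below is a minimal sketch of that workflow: capture audio frames, hand a window of audio to the recognizer, and push the resulting line to the encoder. Every name here is an illustrative stand-in, not VoiceInteraction's implementation, and a real deployment emits captions per recognized line rather than on a fixed timer.

```python
import time
from collections import deque

CAPTION_WINDOW = 4.0  # seconds; the 3-4 second delay window described above

def read_audio_frame() -> bytes:
    """Stand-in for pulling one 100 ms PCM frame from the capture
    device (e.g. a DeckLink board). Returns silence here."""
    time.sleep(0.1)
    return b"\x00" * 3200  # 16 kHz, 16-bit mono, 100 ms

def recognize(audio: bytes) -> str:
    """Stand-in for the speech recognition engine."""
    return "EXAMPLE CAPTION LINE"

def send_to_encoder(line: str) -> None:
    """Stand-in for handing a finished line to the caption encoder."""
    print(f"-> encoder: {line}")

buffer: deque[bytes] = deque()
window_start = time.monotonic()
for _ in range(100):  # bounded loop for the sketch; a real system runs 24/7
    buffer.append(read_audio_frame())
    if time.monotonic() - window_start >= CAPTION_WINDOW:
        send_to_encoder(recognize(b"".join(buffer)))
        buffer.clear()
        window_start = time.monotonic()
```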

Here, the first step is 'speech-non-speech' detection, which identifies and segments the zones that will be captioned from the background zones with no speech. For every speech segment, the engine identifies who is talking: speaker identification. Moreover, it analyzes whether it is the same speaker as in the previous segment or a new one. If it is indeed a different speaker, a new speaker turn is created.
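
As an illustration only, the sketch below segments audio with a naive energy gate; the actual engine relies on trained classifiers, so treat the threshold and frame size as assumptions. It expects a mono signal normalized to [-1, 1].

```python
import numpy as np

FRAME = 0.03          # 30 ms analysis frames
THRESHOLD_DB = -35.0  # illustrative energy gate, relative to full scale

def speech_segments(signal: np.ndarray, rate: int) -> list[tuple[float, float]]:
    """Toy energy-based speech-non-speech segmentation. Returns
    (start, end) times in seconds for the zones that would go on
    to be captioned."""
    hop = int(rate * FRAME)
    frames = signal[: len(signal) // hop * hop].reshape(-1, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    voiced = rms > 10 ** (THRESHOLD_DB / 20)
    segments, start = [], None
    for i, is_speech in enumerate(voiced):
        if is_speech and start is None:
            start = i * FRAME
        elif not is_speech and start is not None:
            segments.append((start, i * FRAME))
            start = None
    if start is not None:
        segments.append((start, len(voiced) * FRAME))
    return segments
```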

For every created turn, there is an unsupervised classification of speech segments based on speaker voice characteristics: speaker clustering. In the background zones, events like music, applause or laughter can be detected and added as metadata to the captions. After the audio pre-processing stage, it is time to transform speech to text. Using Deep Neural Network based training, VoiceInteraction created acoustic models robust enough to respond to most day-to-day scenarios of a TV program. They are derived from thousands of hours of gathered data.
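
A minimal sketch of that clustering idea, assuming each turn has already been condensed into a fixed-size speaker embedding; the greedy centroid update and the 0.75 similarity threshold are illustrative choices, not the product's parameters.

```python
import numpy as np

def cluster_speakers(embeddings: list[np.ndarray], threshold: float = 0.75) -> list[int]:
    """Greedy unsupervised clustering of per-turn speaker embeddings:
    a turn joins the closest existing cluster if the cosine similarity
    clears the threshold, otherwise it opens a new cluster (a speaker
    not heard before). Returns one cluster label per turn."""
    centroids: list[np.ndarray] = []
    counts: list[int] = []
    labels: list[int] = []
    for emb in embeddings:
        emb = emb / np.linalg.norm(emb)
        sims = [float(c @ emb) for c in centroids]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            centroids[k] = centroids[k] * counts[k] + emb
            centroids[k] /= np.linalg.norm(centroids[k])
            counts[k] += 1
        else:
            centroids.append(emb)
            counts.append(1)
            k = len(centroids) - 1
        labels.append(k)
    return labels

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)
print(cluster_speakers([a, a + 0.01, b, a]))  # e.g. [0, 0, 1, 0]
```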

The acoustic data is combined with statistical models of the most relevant words that make up the vocabulary. These sets of 200,000 words are derived from web crawling, new words suggested by clients and regular manual transcription of real data. A single language has millions of words and, as previously stated, these cannot all be in the model at the same time: the accumulated delay of processing millions of entries would be too great for a normal CPU or GPU. So the ingenious simplification comes by way of finding the best universe of words for any given time period, one that offers the best chance of a result close to 100% correct. With that in mind, a new language model is created every day; the motto is to adapt as rapidly as the contents do. Using both models and an enhanced WFST search algorithm, it is possible to create almost undelayed captions while maintaining efficient usage of memory overall. Denormalization, automatic punctuation and relevant tags are added before sending the caption to the closed captioning encoder.
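
The sketch below illustrates only the vocabulary selection step under those assumptions: a frequency cut over fresh text, with client-supplied terms always kept. The real pipeline goes much further and retrains a full language model from this material every day.

```python
from collections import Counter
from typing import Iterable

VOCAB_SIZE = 200_000  # the working vocabulary size quoted above

def build_daily_vocabulary(crawled_text: Iterable[str],
                           client_terms: set[str],
                           transcripts: Iterable[str]) -> set[str]:
    """Pick the day's 'best universe of words': the most frequent
    tokens from fresh crawls and recent manual transcriptions, plus
    client-supplied terms (names, brands, local places) kept always."""
    counts: Counter[str] = Counter()
    for line in crawled_text:
        counts.update(line.lower().split())
    for line in transcripts:
        counts.update(line.lower().split())
    keep = VOCAB_SIZE - len(client_terms)
    vocab = {word for word, _ in counts.most_common(keep)}
    return vocab | {term.lower() for term in client_terms}

vocab = build_daily_vocabulary(
    crawled_text=["city council approves new stadium funding"],
    client_terms={"Audimus", "VoiceInteraction"},
    transcripts=["the council vote passed late last night"])
print(len(vocab))
```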

Audimus.Media has a web interface where you may set up, control and export produced closed captions. After this architecture presentation, a short demo of the interface will get you acquainted with some of the most relevant features. Based on a RESTful API, the platform is ready for on-top development and extension. One of the most relevant out-of-the-box integrations is the mute/unmute commands.

Upon configuration, these can be integrated into pre-existing workflows or wired directly to a GPIO pin at master control. VoiceInteraction natively integrates with iNews, ENPS and MOS systems to automate the gathering of vocabulary information. Finally, VoiceInteraction also natively supports the most common encoders and communication protocols, simplifying the overall process and integration.
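
As a hypothetical example of what such an integration could look like (the endpoint path, port and host here are assumptions, not the documented Audimus.Media API), an automation script might toggle captioning around a commercial break like this:

```python
import requests

# Hypothetical endpoint: the real Audimus.Media API paths and
# authentication are product-specific; see the product documentation.
BASE_URL = "http://captioning-server:8080/api"

def set_captioning(muted: bool) -> None:
    """Suspend or resume caption output, e.g. around a commercial
    break cued from master control."""
    action = "mute" if muted else "unmute"
    response = requests.post(f"{BASE_URL}/captions/{action}", timeout=5)
    response.raise_for_status()

set_captioning(True)   # going to break
set_captioning(False)  # back on air
```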

By April of 2021, the broadcast market will witness the release of our new and improved Audimus.Media 6.7. Developed with the end-viewer experience in mind, this generation of Audimus.Media will prove its potential once again by providing our customers with speech processing technologies that produce higher accuracy results, this time with a set focus on live multilingual broadcasting. This means that live translation, accompanied by automatic punctuation, capitalization and denormalization, is now a broadcasting reality.

It is enabled for on-top integrations with standard Newsroom Computer Systems like ENPS and MOS, provides various degrees of specific vocabulary control and fits any local TV station. As a result of thousands of hours of gathered data, the acoustic models may now be combined with statistical models for a complete selection of the most relevant words that compose any vocabulary. These sets of 200,000 words are derived from web crawling, new words suggested by clients and manual transcription sourced from our customers' ongoing programming. It is the ideal solution for unanticipated situations, such as breaking news scenarios, that still demand continuous live closed captioning.

Audimus.Media will be able to assist your TV station with full media coverage while providing overall accessibility for your audience. More than 120 TV and cable news stations throughout North America alone have agreed that Audimus.Media is the most cost-effective solution on the market for automatic closed captioning production during live broadcasts. Audimus.Server is our offline transcription platform for pre-recorded programs.

This software will receive any video or audio file and return a full text transcription in a quarter of the original file's duration. Once finished, the results of the automatic transcription are stored and indexed in a database. These files can be edited and exported to a large group of output formats. Learn more about how our A.I.-driven platform returns full transcriptions in a quarter of the original file's length. Audimus.Server is an offline transcription platform custom designed to assist every broadcaster's post-production workflow. Process and store every media asset.
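
A hypothetical client-side sketch of that flow; the endpoints and response fields are assumptions for illustration, not the documented Audimus.Server API:

```python
import time
import requests

# Hypothetical endpoints and fields, assumed for illustration only.
BASE_URL = "http://transcription-server:8080/api"

def transcribe(path: str) -> str:
    """Upload a media file, then poll until the transcript is ready.
    Per the figure above, a one-hour file should finish in roughly
    fifteen minutes: a quarter of its duration."""
    with open(path, "rb") as media:
        job = requests.post(f"{BASE_URL}/jobs",
                            files={"media": media}, timeout=60).json()
    while True:
        status = requests.get(f"{BASE_URL}/jobs/{job['id']}", timeout=10).json()
        if status["state"] == "done":
            return status["transcript"]
        time.sleep(30)

print(transcribe("evening-news-2021-04-12.mp4"))
```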

All entries are complete with full descriptive information and retrievable metadata. These actions allow for better integration within your broadcaster's workflow, while enhancing productivity and widening your data reach. Audimus.Server creates acoustic and linguistic metadata from audio or video contents by combining the results of automatic language and speaker identification with the transcriptions produced by our speech recognizer; these are later enriched with Natural Language Processing annotations.

This system also supports the most commonly used subtitle and closed captioning formats, enabling swift and accurate transcriptions that propel translation workflows. Save the content's descriptive metadata as a pre-formatted text document or an XML representation file, or simply export it to a format suitable for your operation. These exports may also be loaded into non-linear video editors, text formatting applications or subtitle correction and translation tools.
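
As one concrete example of such an export, the sketch below renders timed transcript segments as SubRip (SRT), one of the widely used subtitle formats; the segment structure is an assumption for illustration.

```python
def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments, with times in seconds,
    as a SubRip (SRT) document."""
    def clock(t: float) -> str:
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int(t * 1000) % 1000:03d}"
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{clock(start)} --> {clock(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 3.2, "Good evening."),
              (3.4, 7.9, "Our top story tonight...")]))
```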

Ingest the produced metadata into a media asset management system. This feature allows for an exhaustive indexation of the entire media archive which, consequently, enables the retrieval of relevant media files by way of textual searches over any of the spoken contents. Avoid any manual constraint when searching for keyword-based content descriptions.
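
Conceptually, this is an inverted index over the spoken words; a toy version looks like the sketch below, while the real system indexes into a database with far richer metadata.

```python
from collections import defaultdict

index: dict[str, set[str]] = defaultdict(set)

def ingest(asset_id: str, transcript: str) -> None:
    """Index every spoken word of an asset for textual retrieval."""
    for word in transcript.lower().split():
        index[word].add(asset_id)

def search(query: str) -> set[str]:
    """Return the assets whose speech contains every query word."""
    words = query.lower().split()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits) if hits else set()

ingest("news-0412", "the mayor announced a new transit plan")
print(search("transit plan"))  # {'news-0412'}
```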

Generate full-text transcripts with high accuracy from any media asset, and expect this production to be concluded in less time than the asset's playback duration. These capabilities are intended to eliminate the time and human resources needed to create error-proof content transcriptions or translations. With Audimus.Server, the long and arduous task of producing offline transcriptions simply becomes a revision job.

By April of 2021, the market will witness the release of the new and improved Audimus.Server 6.6. Developed with the furtherance of workflow integrations in mind, this 6.6 generation keeps moving forward by providing our customers with faster turnaround rates and even higher accuracy levels, with a set focus on the new and improved in-app subtitle text editor that allows our users to export fully revised subtitles for different workflows.

The script alignment algorithm has been revisited as well, with input flexibility improvements and new output alignment features. The Media Monitoring System - Broadcast Edition is an A.I.-driven platform that extends the bounds of typical compliance software.

It is a long-term, 24/7 HD capture and recording solution, specifically engineered for live content visualization and control. The system is configured to work around the clock, monitoring not only what is happening on your network but also what other players are doing. You are able to record, monitor, flag, analyze and react to everything that is happening in your area. Learn more about how our A.I.-driven platform extends the bounds of compliance software. The Media Monitoring System - Broadcast Edition is a proprietary platform designed to surpass the results of typical compliance software. This system provides our clients with an elevated recording solution for live TV capture and multiple markets' radio feeds. Simply opt for an on-site or distributed production.

It is an out-of-the-box setup, able to receive several video and audio input signals from multiple sources. Expect seamless integration with your current workflow. Configured to work around the clock, this system will monitor not only your channel or network's current production but also what other market players may be doing. You will then be able to record, monitor, flag, cross-analyze and react to everything that is happening in your area. Ensure compliance and logging, provided by our dedicated Alarm Center.

A closer look into the daily pitfalls under monitoring reveals a need for the following: server status, disk storage, TS monitoring, LKFS infractions, signal QoS and closed captioning coverage. With real-time notifications in-app, via e-mail or instant messaging, your engineering team will always be informed and in control. Guarantee that your broadcast QoS is equipped with state-of-the-art analysis and real-time fault detection. While running MMS-BE, expect continuous and proactive analysis of any deviation from regulatory protocol, with the possibility of confirming any broadcast error directly in-app. Our A.I.-based engine is set on machine learning algorithms that are effectively responsible for mapping the full spectrum of the defined broadcast content.
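
As an illustration of one such check, the sketch below flags LKFS infractions against the ATSC A/85 target of -24 LKFS and sends an e-mail notification; the tolerance band, addresses and SMTP transport are assumptions, not the product's configuration.

```python
import smtplib
from email.message import EmailMessage

TARGET_LKFS = -24.0   # ATSC A/85 loudness target for US broadcast
TOLERANCE_DB = 2.0    # illustrative alert band; set per station policy

def check_loudness(channel: str, measured_lkfs: float) -> None:
    """E-mail the engineering team when a channel drifts out of the
    loudness band; in-app and instant-messaging alerts would hang
    off the same check."""
    if abs(measured_lkfs - TARGET_LKFS) <= TOLERANCE_DB:
        return
    msg = EmailMessage()
    msg["Subject"] = f"LKFS infraction on {channel}: {measured_lkfs:.1f} LKFS"
    msg["From"] = "mms-alerts@example.org"
    msg["To"] = "engineering@example.org"
    msg.set_content(f"{channel} measured {measured_lkfs:.1f} LKFS "
                    f"(target {TARGET_LKFS} +/- {TOLERANCE_DB} dB).")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

check_loudness("WXYZ-DT", -19.3)  # out of band: triggers an alert
```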

Widen your searchable metadata network in order to find effective information immediately. Also, minimize workflow headaches and complex integrations by exporting media assets, compliance reports, closed captioning or loudness-related information at a moment's notice. Custom-designed, multi-browser, responsive web dashboards were built on top of a powerful API. Employees also interact with customized views focused on maximizing their performance and response to day-to-day tasks. Comprehensive reports may later be downloaded, so that the information can be absorbed in full.

This past month of March witnessed the release of the new and improved MMS-BE 6.6, complete with the following features: an integrated Multiviewer for multiple market locations, all gathered on the same page; a seamless, continuous 24/7 Video Timeline with faster VOD navigation for media clipping and exporting purposes; and a customizable Alert Center that provides a cohesive view of the relevant situations to be on the lookout for.

Also included are the introduction of Monitoring Profiles for closed captions, LKFS and image/video quality; TS parameter monitoring with improved TS Analytics; LKFS Monitoring for every channel within the incoming signal; multi-language closed captioning monitoring and extraction; and, finally, a RESTful API for third-party workflow integration. It is a powerful tool for a wide variety of departments inside any TV station.
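
A hypothetical example of that third-party integration; the endpoint and fields are assumptions for illustration, not the documented MMS-BE API:

```python
import requests

# Hypothetical endpoint and fields, assumed for illustration only.
BASE_URL = "http://mms-server:8080/api"

def open_alerts(severity: str = "critical") -> list[dict]:
    """Pull unresolved alerts so a third-party ticketing or NOC
    dashboard can mirror the Alert Center."""
    response = requests.get(f"{BASE_URL}/alerts",
                            params={"state": "open", "severity": severity},
                            timeout=10)
    response.raise_for_status()
    return response.json()

for alert in open_alerts():
    print(alert["channel"], alert["type"], alert["raised_at"])
```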

Expect synchronized navigation within the captured channels' feeds, providing a comparative view of all concurrent channels. This feature makes it possible to observe not only trends, but also steep increases or major crashes in viewership. It is a powerful, cost-effective solution, now available for desktop, iOS and Android. VoiceInteraction believes that true synergy can only be achieved when efforts combine: Engineering, Production, News, Programming, Sales, Traffic and Management, all departments speaking the same language.

Getting everybody on board: in current times, regulatory institutions have provided the broadcast industry with a specific set of mandatory guidelines concerning overall accessibility for audiences in general. These guidelines include the production of live and offline closed captions as a complementary element to all forms of produced and televised content. VoiceInteraction continues working hard towards the full automation of all of these processes. Based on large amounts of data collected and curated over the years, we keep training our Deep Neural Networks using different machine learning algorithms and feed-forward and recurrent neural network architectures. These A.I.-based processes are important to the diversification of models, especially the baseline acoustic models and language models that support the ASR engine. There are several other processes included in this operation, such as: audio analysis, with speech-non-speech detection; background classification and enhancement; acoustic event detection; spoken language identification; emotion analysis; and speaker analysis with speaker turns, speaker clustering, and speaker gender and identification. The underlying technology is complemented by other particular features, such as natural language processing, punctuation, capitalization and NER text analysis.

By customer demand, we can also announce model development for automatic translation, or dubbing, during live broadcast scenarios. With very low latency results, this feature is simply not available anywhere else as a comprehensive option on the current broadcast market. For all of this to work, the required corpus runs from one billion up to three billion words per language model. This requires constant engine updates in order to improve result efficiency while simultaneously shortening caption production times. Conscious of the demands caused by this past year, not only professionally but at a family and personal level, VoiceInteraction keeps the path set from its origin: growth within a collaborative environment. VoiceInteraction has been producing state-of-the-art technology for over 12 years now.

This means constant technological revolutions. As we go through 2021, we will continue to strive for excellence in all of our services. Our achievements are directly affected by our collaborative spirit, and we are able to keep our mission statement true: customer satisfaction. Recognizing the amount of trust placed in VoiceInteraction's practices throughout the years, we retain a strong belief in a promising future for the company and our partners.

We will be, today and tomorrow, by your side, assisting with whatever may arise, whenever it may occur, to walk the path of success together. Finally, it is our biggest expectation that the rest of 2021 may represent the year of personal connections and of reopening our doors to the public: the year of coming back, experiencing live events and face-to-face meetings.
