On Demand Video 304 | OD304

Hello everyone, and thank you for joining. I hope you're having a great experience attending Microsoft Ignite from the comfort of your home. My name is Asrar, and I'm a Senior Business Development Manager on the Microsoft for Startups team. Today I'm joined by our esteemed guest, Girish Pancha, CEO and founder of StreamSets, a very strategic partner of our Microsoft for Startups program. In the next 30 minutes you'll learn how StreamSets has helped customers speed up their adoption of the Azure cloud with its DataOps platform for smart data pipelines.

At this time I want to remind everyone of our code of conduct: at Microsoft we seek to create a respectful, friendly, fun, and inclusive experience for all of our participants, and we encourage everyone to assist us in creating a welcoming and safe environment. Thank you.

In terms of the agenda for this presentation, we'll provide a quick overview of the Microsoft for Startups program and then dive deep into the challenges StreamSets is solving and the difference it's making for enterprise customers.

As you all know, Microsoft's mission is to empower every person and every organization on the planet to achieve more, and that includes startups. How can we help great startups anywhere in the world to empower their businesses? That's the question we asked of startups just like yours, and we set out to deliver Microsoft for Startups, a program designed from the ground up to reinvent our role in helping startups grow, treating startups as true partners across all Microsoft platforms, products, and business motions. Microsoft for Startups is an exclusive program dedicated to helping qualified, enterprise-ready B2B startups rapidly scale their companies. We do this by providing access to trusted technology, including Azure, GitHub, Office 365,
and much more, combined with access to customers through our Microsoft sales and marketing engines, which provides a streamlined path for startups to connect their innovative solutions to the world's leading enterprises. The benefits of the program are focused on two pillars: technical enablement, and business and sales acceleration. The first pillar is really about access to technology, which includes access to the Azure cloud; powerful developer tools, including Visual Studio and GitHub Enterprise; and the Microsoft Power Platform and collaboration tools like Office 365 Premium. We also offer enterprise-style technical support and architectural design sessions, along with one-on-one consultations with our product group and engineering teams. The second pillar is centered on business and sales acceleration: connecting innovative startups with Fortune 50, 100, 500, and 1000 customers, with a streamlined path to partnership through a startup engagement manager who is dedicated to helping startups navigate their partnership with Microsoft. We also help startups get their solution listed in our commercial marketplace, so it's available to customers all around the world, and, most importantly, we connect them with Microsoft sellers, who are compensated to sell your startup's solution into their enterprise customers and retire their quota. Finally, we showcase startups at events like today's Microsoft Ignite, presenting our startups to the entire world and to our audiences at both Microsoft first-party events and third-party industry and startup events. Attendees can always learn more by visiting startups.microsoft.com. With that, I'd like to hand it over to Girish Pancha, CEO and founder at StreamSets.
Thank you. Over to you, Girish.

Thanks, Asrar. We're very excited to showcase ourselves and to be part of the Microsoft for Startups program. I'm going to give you a brief presentation on migrating and continuously ingesting both on-premises and cloud data into Microsoft Azure data stores and platforms. Since our founding in 2014, our sole focus at StreamSets has been building a cloud-native platform that helps data engineers build enterprise-grade data pipelines the DataOps way, in support of next-generation data analytics. Our point of view at StreamSets is simple: for too long, the focus in data engineering has been on ease of use and developer productivity. While that is of course a minimum market requirement, this singular focus has meant that all tooling to date has only enabled data consumers to create ad hoc, point-to-point data pipelines and to grab-and-go data. Such approaches miss a second key market requirement: operationalization.

Operationalization requires being able to collaborate with different personas, reuse artifacts created by peers, support pipelines in production, and, as requirements change, evolve quickly and with confidence. In today's uncertain world the business is clamoring for more data than ever, and data engineering is at the heart of delivering that data from wherever it originates to wherever it's needed, which nowadays, of course, is in cloud and hybrid platforms.

But data engineers face many pressures and challenges. First, the project backlog is big and growing, because business demands are evolving beyond traditional reporting to modern analytics, data science, and AI/ML. Second, upstream data changes are accelerating: both small and large changes to data structure and semantics are unending, almost always outside your control, and very difficult to triage. We call this phenomenon data drift. The effect of data drift is to create a huge burden on engineering, and also on operations, just to keep the lights on and respond at the speed of business. And third, as data platforms evolve, for example from Hadoop into the cloud, data engineers are on point for huge re-platforming projects while still juggling their daily responsibilities.

Now, there are various options out there for data engineers, ranging from traditional ETL tools, to simple ELT data loaders, to hand coding in a variety of programming languages. All of these approaches make life hard for the data engineer. Why is that? Existing tools are either too hard, requiring a lot of specialized skills, or too simplistic, making the first data load easy but making it painful to operate data pipelines on an ongoing basis. All existing approaches lead to brittle mappings, or pipelines that require significant rework every time anything changes in the source or destination.
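The data drift described above can be made concrete with a small Python sketch (the column names are hypothetical, purely for illustration): a positional parser breaks the moment an upstream system adds a column, while name-based field access survives the change.

```python
import csv
import io

# Upstream data before and after "drift": a new column appears mid-stream.
before = "id,amount\n1,9.99\n"
after_drift = "id,currency,amount\n2,USD,19.99\n"

def brittle_amount(row_text):
    """Positional parsing: assumes 'amount' is always the second field."""
    return float(row_text.split(",")[1])

def drift_tolerant_amounts(csv_text):
    """Name-based parsing: fields are looked up by header, so a column
    added upstream does not break this stage of the pipeline."""
    return [float(r["amount"]) for r in csv.DictReader(io.StringIO(csv_text))]

print(drift_tolerant_amounts(before))       # [9.99]
print(drift_tolerant_amounts(after_drift))  # [19.99] -- survives the new column
# brittle_amount("2,USD,19.99") raises ValueError: it now grabs "USD"
```

This is only a toy illustration of the brittleness the talk describes, not how StreamSets handles drift internally.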
So engineers end up spending 80% of their time keeping the lights on, leaving very little time for new, value-added work. And when those re-platforming projects come up, each ends up being a full rewrite: it becomes a huge project, and that slows down adoption of new platforms and technologies. At StreamSets we had lived through these problems before, and we envisioned a new world: a world of smart data pipelines. As you engineer pipelines to Azure, for example, smart data pipelines let you go fast as you get the data to the business, but they also allow you to be confident that the pipelines you're building will hold up for ongoing operations. In our worldview, there are three ways we make pipelines smart. First, we provide full-lifecycle tooling that is powerful enough to solve any data engineering need, through extensible APIs and an SDK

and through the ability to automate much of your work, while at the same time keeping the tooling visual and abstracting away the complexities of pipeline development, providing a single on-ramp for drag-and-drop ETL developers to do full-fledged data engineering work without needing to code. Second, smart data pipelines are resilient to all the data drift that's happening all the time in the world around us. It could be a schema change, a database version upgrade, or a file format change; smart pipelines are designed to handle such changes with minimal to no updates required, so they run non-stop even as the data drift is non-stop. And finally, smart data pipelines are portable. Lots of vendors checkbox support for on-premises and multiple cloud systems, but they require multiple tools, or rewrites, to support the variety of different clouds. With pipeline portability you can take pipelines from on-premises to Microsoft Azure without rewriting them: simply update the destination, and you're done.

So what is the technical secret sauce behind these smart data pipelines? Is there anything real behind these claims? There is: the secret sauce is the architecture of these smart data pipelines, an architecture that provides the abstraction to separate the "what" of the data from the "how" of the data. The "what" is the business meaning and the logic of the data and its pipelines, and it's what the business actually cares about. The "how" is the technical implementation detail: what database or messaging bus the data is in, what the schema and data structure are, what the format is, and so on. These are things the business doesn't care about, yet 95% of changes in the data are driven by the "how." If we look at a typical ETL mapping, or a typical data pipeline, every single step in the data flow embeds details of the "how": the structure, the schema, the semantics, and so on.
If anything changes, you have to update every single step in the pipeline.
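The "what versus how" separation can be pictured with a minimal Python sketch (all names are hypothetical, and this is not the StreamSets API): each stage is a self-contained callable, the "how" is confined to the source stage, and swapping that source leaves the downstream business logic untouched.

```python
def run_pipeline(source, *stages):
    """Pull records from `source`, then pass them through each stage.
    Stages only express the 'what' (business logic); the 'how'
    (where the data physically lives) is confined to the source."""
    records = source()
    for stage in stages:
        records = stage(records)
    return records

# The "how": two interchangeable sources, e.g. a JSON file vs. a database.
def json_file_source():
    return [{"user": "ada", "spend": 120}, {"user": "bob", "spend": 80}]

def database_source():  # hypothetical replacement source
    return [{"user": "cal", "spend": 200}]

# The "what": business logic, unaware of where records came from.
def big_spenders(records):
    return [r for r in records if r["spend"] >= 100]

# Swapping the source requires no change to downstream stages.
print(run_pipeline(json_file_source, big_spenders))  # ada's record only
print(run_pipeline(database_source, big_spenders))   # cal's record only
```

The design point is the decoupling itself: because `big_spenders` never mentions files or databases, a change in the source system touches exactly one stage.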

Smart data pipelines, however, abstract away as much of the "how" as possible. Each stage is decoupled from the others and focuses only on the "what." So if there's a change to a source system, say a column moves, or a database replaces a JSON file, all you have to do is update the source stage; the rest of the pipeline is unaffected. With dumb pipelines, data engineers spend 80 to 90 percent of their time doing maintenance and upkeep work on changes that are trivial. With smart data pipelines, that time is freed up to deliver new value to the business.

Here's a typical view of how StreamSets smart data pipelines fit into an Azure environment. StreamSets helps you ingest data from source systems such as databases, files, APIs, or Kafka, on a streaming, change-data-capture, or batch basis, into raw landing zones including Azure Storage, Azure Synapse, Azure Event Hubs, and so on. StreamSets also migrates data from your legacy data lakes and big data platforms, such as Hadoop, into the Azure storage layers. From there, ETL and data processing pipelines help transform, filter, and structure the data, either into a curated data lake such as Synapse or HDInsight, or into a report-ready, conformed data warehouse such as Synapse again or SQL Server. These ETL pipelines execute natively on Spark, also running on Azure, so you can take advantage of Spark's processing power and scalability. And StreamSets provides a single Control Hub to manage and monitor all of the data flows across all of your pipelines, both on-premises and in the cloud, in real time, in a single console.

A final key difference for StreamSets is that we've always focused on DataOps: automating and monitoring the entire data engineering lifecycle and ecosystem. Many ETL tools will let you design, test, and deploy a pipeline.
However, there's much more to data engineering than that. Data engineers need to be able to monitor the health of their deployed pipelines and troubleshoot exceptions in real time. They need to enable more casual users, including point-and-click ETL developers and data scientists, to design and deploy their own data pipelines, giving them self-service access to data. And platform operators need to be able to easily manage the data pipeline infrastructure while also accelerating the shift to the latest and greatest data platforms, such as upgrading to SQL Server 2019, standardizing on ADLS, and, more recently, adopting Synapse. They need the ability to monitor data pipeline health across all pipelines, no matter who built them and where they're executing. Only StreamSets supports the end-to-end data engineering and operations of these smart data pipelines.

We have thousands of businesses using our software, from boutique systems integrators, to mid-market enterprises, to the largest enterprises in the world, across every vertical and on every continent. But we're particularly proud of the many Global 2000 companies that have made significant investments in both StreamSets and Microsoft. I'm going to highlight just a couple. One is Royal Dutch Shell, number nine on the Global 2000. They use StreamSets for their Shell.ai division, which is tasked with bringing machine learning and AI to all lines of business at Shell. These data scientists need access to all types of data: from traditional downstream customer data, to upstream and midstream data such as data that originates in wells, refineries, and IoT, and also specialized lines of business such as energy trading. At Shell they were able to onboard hundreds of data scientists so they could self-serve data for their AI and ML needs, and by doing this on Azure they were able to dramatically lower their capital commitment
to SAP HANA, their on-premises data analytics platform. A second example I'd like to highlight is that of Humana, which continues to climb the Fortune 100, with revenues of $65 billion, having grown more than 30 percent over the past two years. Humana has implemented a common core data fabric, with StreamSets and other technologies, in the cloud and on-premises, to support their business mission of making their members 20% healthier by sharing data at any and every touch point with their members. The common core data fabric uses the FHIR standard and ensures all compliance requirements are met. With this they're able to provide information, provide context, and provide the smarts

for built-in, built-for-purpose applications. You can see video keynotes of these examples, and many more, from our DataOps Summit at streamsets.com/customers.

All right, enough talking from me for now; a picture is worth a thousand words, so let's take a quick look at the tooling in this pre-recorded video. Here's what a data engineer can do with StreamSets. Use Data Collector for building and executing smart data pipelines for batch, streaming, or change data capture. Use Transformer to design ETL or complex data processing that runs on Spark; Transformer comes pre-built with multiple processors but gives data engineers the flexibility to extend its capabilities with their favorite Spark-based tools. And with multiple data flows running simultaneously across your environments, Control Hub gives you management and end-to-end monitoring of all your data operations.

Let's take a look at how you can migrate external data into Azure using StreamSets Transformer. The pipeline here is configured to migrate delimited files into Azure Storage. While it's migrating, it also optimizes file creation in Azure Storage by converting the data to the Parquet format, creating large files, and auto-partitioning the files based on business logic. As you migrate data you get real-time monitoring, allowing you to track your migration progress.

In the next example I'll show you how to build an end-to-end data processing solution in StreamSets using native Azure services for real-time ingest and data processing. Here the first pipeline is set up to ingest tweets in near real time. As the tweets are ingested, I archive the raw data in Azure Storage, and at the same time, with simple configurations, I cleanse and normalize the events before processing them in Synapse. I'm also sending the cleansed data to Azure Event Hubs for further processing.
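The auto-partitioning step in the migration demo can be sketched in plain Python: group delimited rows by a partition key, the way a pipeline might lay out output directories such as `region=emea/` before writing columnar files. This is a stdlib illustration under a hypothetical `region` column, not the Transformer implementation, which runs on Spark and writes Parquet.

```python
import csv
import io
from collections import defaultdict

def partition_rows(csv_text, key):
    """Group delimited rows by a partition key, mimicking the
    key=value directory layout common in data-lake storage."""
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        partitions[f"{key}={row[key]}"].append(row)
    return dict(partitions)

data = "order_id,region,total\n1,emea,10\n2,apac,20\n3,emea,30\n"
parts = partition_rows(data, "region")
print(sorted(parts))              # ['region=apac', 'region=emea']
print(len(parts["region=emea"]))  # 2
```

In a real Spark job the same idea would be a single `partitionBy` on the writer; the sketch just shows what "partitioning by business logic" means for the file layout.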
In the next pipeline, I perform sentiment analysis on each tweet coming through Event Hubs. If you already have a piece of code for doing this, you can easily plug it into a pipeline and run it. I'm leveraging the Text Analytics API from Azure Cognitive Services to directly analyze each tweet. The output scores of the analysis can be persisted in any storage of your choice; a score closer to one indicates a positive sentiment, and a score closer to zero suggests a negative sentiment. I also set up a simple pipeline that allows me to analyze tweets over a period of time, for example to identify influencers and understand their sentiment as time progresses. A pipeline performing analytics like this can be run at scale on a Spark cluster in your Azure environment.

Now, given that I have multiple data flows, each executing in its own silo, I created a topology in Control Hub to oversee the operations of all of these data flows and visualize the data operations in real time. As you can see, it gives me a single pane of glass for all my data pipelines running across different environments and different execution engines, while showing me how much data is being processed at each stage.

You saw how easy it is to ingest data and apply machine learning in near real time without writing code. You also saw how Transformer lets you execute scalable ETL and data processing logic on top of Spark. And lastly, you saw how Control Hub is a single place for designing, deploying, monitoring, and managing all your data pipelines, data processing jobs, and execution engines. Now that you've seen the tool in pre-recorded action, you can check it out live: the Data Collector and Transformer products are available to try out on the Azure Marketplace in a variety of configurations.
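The score interpretation in the sentiment demo above (closer to 1 is positive, closer to 0 is negative) can be sketched as a small helper. The 0.4–0.6 "neutral" band is an assumption of this sketch, not something defined by the Text Analytics API:

```python
def label_sentiment(score, neutral_band=(0.4, 0.6)):
    """Map a 0..1 sentiment score to a label. Scores near 1 are
    positive, near 0 negative; the neutral band is an assumption."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    low, high = neutral_band
    if score >= high:
        return "positive"
    if score <= low:
        return "negative"
    return "neutral"

print(label_sentiment(0.93))  # positive
print(label_sentiment(0.12))  # negative
print(label_sentiment(0.50))  # neutral
```

In a pipeline like the one in the demo, a helper of this shape would run downstream of the API call, turning raw scores into labels before persisting them.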
Just fire it up and give it a try. You can also visit the StreamSets website to learn a lot more about our support for DataOps and modern data analytics, and you can get access to a free trial of Control Hub. So go fast, be confident, and good luck engineering smart data pipelines. Over to you, sir.

Thank you so much, Girish; super helpful, and a great demo as well. I actually had a quick question. The StreamSets solution, in some regards, is very complementary to Microsoft's first-party offerings, and given our partnership of working together, can you touch a bit more on how StreamSets helps enterprise customers, how you're actually using the Azure Marketplace, and how it has been beneficial for your business?

Absolutely. Well, first, answering the marketplace question: we're very excited about the Azure Marketplace, especially in the COVID era, when it's difficult for us to expand our reach in a physical manner. We're able to reach more prospects and more customers globally, and get traffic that we otherwise wouldn't get. In fact, the Azure Marketplace

has been the most popular cloud marketplace, lifetime to date, for StreamSets products. What we find is that these users end up doing a lot of their proof of concept, or POC, on their own, get some development going, and when they're ready to commit they're able to transact quickly and seamlessly, with both enterprise subscriptions and commit-to-consume subscriptions, through the marketplace, for both our private and standard offers. So we love the marketplace.

With respect to other Microsoft first-party services, we generally end up coexisting with them. As an example, Microsoft's Azure Data Factory is a standard, along with StreamSets, at both Shell and Humana, the two examples I gave you. Where we find our sweet spot is when customers have much more of a hybrid problem: where they still have some data and compute sitting on-premises, where they have a very fragmented data platform landscape, and where they care a lot about operationalization and DataOps. Often the largest enterprises need that service-level agreement to their end users, and we're able to bring that to the table.

Super helpful; thank you for driving clarity on that and on the differentiation. We're super excited that the Azure Marketplace has been very helpful, and it's great to know you're already transacting and reaping the benefits. So again, thank you everyone. In closing, I also want to share some resources and calls to action. If you want to learn more about the Microsoft for Startups program, go to startups.microsoft.com. For the StreamSets solution on the Azure Marketplace, you can go to aka.ms/streamsets. Follow us on our social handles as well, and finally, please continue your learning journey with Microsoft at Microsoft Learn. Thank you everyone for your time today, and enjoy the rest of Microsoft Ignite. Thank you.

2020-10-05 00:38
