Artificial Intelligence & Robotics Tech News For October 2022
Robot Dog Learns Soccer Skills

Researchers from the Hybrid Robotics Lab at the University of California, Berkeley have trained a robot dog to be a soccer goalkeeper with a better shot-blocking rate than Premier League human players. The robot dog can squat, sidestep, jump, and dive to block the soccer ball and then return to its starting position. With a blocking rate of 87.5 percent, it greatly exceeds the average of professional human soccer goalies, which is just 69 percent. The robot dog was trained via reinforcement learning, an area of machine learning in which an agent learns by trial and error using the feedback received from its actions. The researchers have detailed the process in a preliminary paper titled "Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning."

Teaching a quadruped robot to be a soccer goalie involves many complex problems, including combining highly dynamic locomotion with quick and accurate non-prehensile object manipulation to control the soccer ball. The Mini Cheetah, the nickname given to the robot by the Biomimetic Robotics Laboratory at MIT, had to block the ball with dynamic locomotion movements in under a second. Weighing only 20 pounds, the tiny, agile robot dog can run, squat, do backflips, and even strafe sideways like a crab. If it is knocked down, it can swiftly return to its original position with a quick kung-fu-like movement of its elbows. According to MIT, the Mini Cheetah is the first robot dog that can do a backflip, and it can trot across uneven terrain at nearly twice the speed of a normal person's walk.

The soccer goal that the robot defended measured five feet by three feet and was situated approximately 14 feet from the ball. The Mini Cheetah does not have a camera of its own, so the ball's location is determined through an external camera and YOLO, a popular and lightweight computer vision algorithm that uses a convolutional neural network to perform real-time object detection. The dog was subjected to a variety of kicks and throws from humans, as well as shots from a second quadruped robot developed by Unitree Robotics. While the experts concentrated exclusively on goalkeeping, the framework could extend to other situations, such as multi-skill ball kicking. The researchers say that making robots that can compete with human soccer players is one of their long-term goals; there is already a separate yearly event called RoboCup, a robot-versus-robot soccer competition that has been running since 1997.
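Since the Mini Cheetah has no onboard camera, perception runs off-board. A minimal sketch of that step (not the Berkeley team's code, and assuming a pretrained YOLOv5 model fetched via torch.hub) could look like this:

```python
# Sketch of the off-board perception loop: an external camera frame goes through
# an off-the-shelf YOLO model, and the ball estimate would then be fed to the
# RL-trained goalkeeper policy along with the robot's own state.
import torch

# Pretrained YOLOv5 from torch.hub; in the 80-class COCO labeling, class 32 is "sports ball".
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def ball_position(frame):
    """Return (x_center, y_center) of the most confident sports-ball detection, or None."""
    results = model(frame)                 # frame: image array or file path
    detections = results.xyxy[0]           # rows of [x1, y1, x2, y2, confidence, class]
    balls = detections[detections[:, 5] == 32]
    if len(balls) == 0:
        return None
    x1, y1, x2, y2, conf, cls = balls[balls[:, 4].argmax()]
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# The policy (trained with reinforcement learning) maps this estimate, plus
# proprioception, to one of the blocking skills: squat, sidestep, jump, or dive.
```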
Breakthrough Humanoid Robotics Technology

Clone Robotics has gone to extraordinary lengths to ensure that humanoid robots will have some of the most realistic hands available on the market, with hydrostatic muscles that move under transparent skin with an uncanny resemblance to a real human appendage. Clone Robotics claims to have created the first biomimetic hand, model number V15, which is capable of grasping all kinds of objects regardless of shape, from tennis balls to suitcases to weights and more.

The robotic hand's thumb and fingers are designed to be extremely realistic in terms of both appearance and operation, and the internal muscles have been created by the team based on the concept of a McKibben muscle. McKibben muscles are essentially a mesh of tubes with balloons inside: when a balloon's radius increases, usually driven by a pneumatic or hydraulic pump external to the muscle, the mesh is forced to contract longitudinally. Clone Robotics prefers not to use large external pumps; instead, their aim is to design muscles that can be stimulated electronically to contract with a certain degree of control. With this design goal in mind, the team devised a way of filling the balloons with acetaldehyde and using electric currents to activate the muscles. When current is applied, the acetaldehyde in the balloons quickly begins to boil, raising the internal pressure: acetaldehyde boils at just 68 degrees Fahrenheit, and its vapor pressure climbs roughly 6.6 times higher at 158 degrees Fahrenheit.

Clone Robotics then constructed the skeleton with a set of human-like bones using hinged joints, giving the robotic hand mobility that matches that of a real human hand. The result is an arm with approximately 27 degrees of freedom, similar to a human hand with integrated hand and forearm rotation. Each movement of the robotic arm is controlled by a complicated network of tendons and muscles extending along the forearm and hand. In the prototype currently in development, Clone Robotics is using a basic hydraulic setup to activate the muscles, with pressure distributed by a 500-watt pump running at 145 PSI through a set of 36 electro-hydraulic valves, each with its own pressure gauge. Built-in magnetic sensors relay information back to the onboard artificial intelligence, which then adjusts the joints' angles and velocities.

The company plans to start selling its robotic hands by the end of this year, but the exact cost has not yet been revealed. Clone Robotics' next product will be a full robotic torso with a spine, including 124 muscles located in the hands, neck, shoulders, chest, and upper back, designed for integration into the company's locomotion platform, which will carry the unit's battery pack.
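The pressure jump between those two temperatures can be sanity-checked with the Clausius-Clapeyron relation. This is a back-of-the-envelope sketch, not Clone Robotics' numbers; the heat of vaporization is an assumed literature value for acetaldehyde.

```python
# Rough estimate of how acetaldehyde's vapor pressure grows between 68 F and 158 F.
import math

R = 8.314          # J/(mol*K), gas constant
dH_vap = 25_800    # J/mol, approximate heat of vaporization of acetaldehyde (assumed)
T1 = (68 - 32) * 5 / 9 + 273.15    # 68 F  -> ~293 K, near the boiling point (P ~ 1 atm)
T2 = (158 - 32) * 5 / 9 + 273.15   # 158 F -> ~343 K

ratio = math.exp(-dH_vap / R * (1 / T2 - 1 / T1))
print(f"vapor pressure grows roughly {ratio:.1f}x")   # ~4.7x with these inputs
```

With these assumed inputs the estimate lands around 4.7x, the same order as the 6.6x figure quoted above; the exact value depends on the vapor-pressure data used.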
New Google AI Turns Text Into High-Resolution Videos

Google Brain presents Imagen Video, a text-conditional video generation tool based on a cascade of video diffusion models. Recent advances in AI have enabled incredible progress in generative modeling, particularly with text-to-image models, and Imagen Video advances text-to-video AI systems with another step forward in the evolution of generative modeling capabilities. Video generation models can have a positive impact on society by helping to increase and enhance human creativity, but they can also be misused to create harmful, hateful, explicit, or inappropriate content. These concerns have been mitigated by multiple measures; for example, in internal trials, input text prompt filtering and output video content filtering were employed to prevent the creation of such content. Although internal testing suggests that most explicit and violent content can be filtered out, social biases and stereotypes remain difficult to filter, meaning many ethical and safety issues lie ahead.

Google's new text-to-video model works like this: it first takes an input text prompt and encodes it into textual embeddings with a T5 text encoder, then runs it through cascaded diffusion models. The base video diffusion model generates a 16-frame video at a 24-by-40-pixel resolution and 3 frames per second, which is followed by multiple temporal super-resolution (TSR) and spatial super-resolution (SSR) models. These models upsample the data to produce a final 128-frame video at a 1280-by-768-pixel resolution and 24 frames per second, resulting in a 5.3-second high-definition video.

Imagen Video creates high-definition videos by using a base video generator model and a series of interleaved spatial and temporal super-resolution video AI models. The system is a scaled-up high-definition text-to-video model that includes design decisions such as fully convolutional spatial and temporal super-resolution models at specific resolutions and the v-parameterization of diffusion models. Google Brain transferred previous findings on diffusion-based image generation to the video generation setting, and for fast, high-quality sampling they use progressive distillation on their video models with classifier-free guidance. Imagen Video is capable of producing videos with high fidelity and has a high degree of controllability and world knowledge; it can generate diverse videos and animations in different artistic styles and can even understand 3D objects.

Imagen Video also employs the video U-Net architecture for spatial fidelity and temporal dynamics: temporal self-attention is used in the base video diffusion model, while temporal convolutions are used in the temporal and spatial super-resolution models, allowing Imagen Video to model long-term temporal dynamics. Imagen Video and its frozen T5-XXL text encoder were trained on partly problematic data, and the Google Brain team will not release the Imagen Video source code or the model until the aforementioned ethical and safety concerns have been addressed.
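The cascade itself is easy to picture as a sequence of shape transformations. This illustrative sketch tracks only how a clip's shape grows from the base model to the final output; the endpoint shapes are from the article, while the per-stage factors and their ordering are assumptions chosen to be consistent with those endpoints.

```python
# How a base clip's shape grows through temporal (TSR) and spatial (SSR)
# super-resolution stages in an Imagen-Video-style cascade.
from dataclasses import dataclass

@dataclass
class Clip:
    frames: int
    height: int
    width: int
    fps: int

def tsr(c: Clip, f: int) -> Clip:
    # Temporal super-resolution: more frames and a higher frame rate, same pixels.
    return Clip(c.frames * f, c.height, c.width, c.fps * f)

def ssr(c: Clip, f: int) -> Clip:
    # Spatial super-resolution: same frames, more pixels.
    return Clip(c.frames, c.height * f, c.width * f, c.fps)

clip = Clip(frames=16, height=24, width=40, fps=3)  # base video diffusion model
clip = tsr(clip, 2)   # 32 frames @ 6 fps
clip = ssr(clip, 2)   # 48 x 80
clip = tsr(clip, 2)   # 64 frames @ 12 fps
clip = ssr(clip, 4)   # 192 x 320
clip = tsr(clip, 2)   # 128 frames @ 24 fps
clip = ssr(clip, 4)   # 768 x 1280
print(clip)           # Clip(frames=128, height=768, width=1280, fps=24)
```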
New DALL-E-Powered Robotics

This is the first attempt to develop web-scale diffusion artificial intelligence models for robotics. DALL-E-Bot allows a robot arm to rearrange objects within a scene by inferring a description of the objects, then creating an image that represents a natural, human-like arrangement, and finally physically placing the objects in accordance with that target image. This is significant because it achieves this zero-shot, using DALL-E without any additional data collection or training. It is a promising direction for web-scale robot learning algorithms, because the results from human studies show that it is possible to align future developments of these models with robotics applications. The researchers have also proposed a list of recommendations to the text-to-image community.

One of the most significant recent advances in machine learning has been web-scale image diffusion models such as DALL-E 2 from OpenAI. By training over hundreds of millions of image-caption pairs from the web, these models learn a language-conditioned distribution over natural images, from which novel images can be generated given a text prompt. Large language models, which are also trained on web-scale data, were recently applied to robotics to enable language-conditioned policies to generalize to novel language commands. Given these successes, the researchers wanted to see whether web-scale text-to-image diffusion models such as DALL-E could be exploited for real-world robotics. Since these models can generate realistic images of scenes, they must also understand how to arrange objects in a natural way; for example, images generated from a kitchen prompt are likely to show plates neatly placed on a table or a shelf. This has a clear application in robotics: predicting goal states for object rearrangement tasks, a canonical challenge in the field. Manually aligning goal states with human values is brittle and cumbersome, so web-scale learning offers a solution by implicitly modeling natural distributions of objects in a scalable, unsupervised manner.

The research proposes DALL-E-Bot as the first method to explore web-scale image diffusion models for robotics. The framework enables DALL-E to predict a goal state for object rearrangement given an image of an initial, disorganized scene. The pipeline converts the initial image into a text caption, which is then passed to DALL-E to generate a new image, from which the system obtains goal poses for each object. Notably, the publicly available DALL-E was used as-is, without any further data collection or training. This is important because it allows for zero-shot autonomous rearrangement, going beyond prior work, which often requires collecting examples of desirable arrangements and training a model specifically for those scenes.
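The pipeline described above can be summarized in a few lines. This is a sketch of the flow, with every helper below a hypothetical stand-in rather than the authors' actual code:

```python
# Caption the cluttered scene, ask DALL-E for a "natural" image of the same
# objects, then move each object to the pose it occupies in that image.
def rearrange(initial_image, segment, caption, dalle_generate, match_poses, robot):
    objects = segment(initial_image)               # per-object masks of the messy scene
    prompt = caption(objects)                      # e.g. "a fork, a knife and a plate on a table"
    goal_image = dalle_generate(prompt)            # web-scale prior over tidy arrangements
    goal_poses = match_poses(objects, goal_image)  # align each object to its pose in the goal
    for obj, pose in goal_poses.items():
        robot.pick(obj)
        robot.place(pose)                          # zero-shot: no extra data collection or training
```

The key design choice is that the generative model supplies the goal state, so the "what does tidy look like" knowledge comes for free from web-scale training rather than from task-specific demonstrations.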
Elon Musk Reveals The Tesla Optimus AI Humanoid Robot

Showing off Tesla's humanoid AI robot for the first time, Elon Musk stated that the robot will make a significant contribution toward progress in artificial intelligence, autonomous driving, and artificial general intelligence, also known as AGI. Musk said the Tesla Optimus robot's current iteration can move all of its fingers independently, with thumbs that move with two degrees of freedom, allowing it to carry out a myriad of real-world tasks. The robot is designed to be produced at high volume with a price point of less than twenty thousand dollars. Tesla's humanoid robot features 28 structural actuators, with 11 degrees of freedom in its hands, supports Wi-Fi and LTE, and runs at 52 volts.

The team intends to reduce the robot's cost and increase its efficiency over time to enable production at scale by reducing part count and power consumption. To achieve this, they will minimize the wire count in the arms and locate the compute and power-distribution hardware at the physical center of the robot. In the middle of the robot's torso will be the battery pack, which will hold 2.3 kilowatt-hours, enough to complete roughly eight hours of work on a single charge. The battery pack has a single printed circuit board within it to take care of sensing, charge management, and power distribution. Tesla plans to use its supply chain and its Autopilot software and hardware for the Optimus robot. This will allow the robot to process vision data, make split-second decisions using multi-sensory inputs, and carry out communications, with audio support and hardware-level security to protect both the robot and the humans in its vicinity.

The robot employs a form of Tesla's Autopilot, using its neural networks for computer vision to achieve volumetric depth rendering of objects, as well as to estimate its local environment, for instance to navigate toward its charging station. The robot's locomotion planning is carried out in three stages: first calculating its desired path, then planning its footsteps along that path, and finally calculating the reference trajectory using state estimation and sensors. The robot tracks its center-of-mass trajectory so that it can walk steadily despite dynamic changes in terrain or balancing scenarios.

To manipulate objects in the real world naturally, the team first generated a library of natural motion references using motion capture and then adapted those motions to current real-world situations, such as picking up an object. To generalize those motions across real-world variations such as object location, they ran the reference trajectories through a trajectory optimization program to solve for where the hands should move and how the robot should balance while adapting to a given situation. The Tesla robot's hands are ergonomically designed to grasp objects like a human hand would, in order to function easily and efficiently inside a factory environment. The hands have sensory feedback and adaptability to new objects, with six actuators, 11 degrees of freedom, and non-backdrivable fingers, with the power to carry a 20-pound bag.

The robot's actuators come in six unique designs across 28 separate joints, including three rotary and three linear actuator designs with multiple sensors, all with extremely impressive output force. In fact, the actuators are so powerful that one can lift a half-ton grand piano, a requirement since human muscles are capable of similar force. The structural foundation of the robot is extremely strong, built to easily weather falls without breaking actuators or requiring expensive maintenance, so it can continue working as normal. The team has run the model through extensive simulations using Tesla's crash software to recreate stresses in all the components and design them to be as resilient as possible. They even modeled the robot's joints on human joints, so that they can generate maximum torque from a bent position while minimizing force, in order to conserve energy.

Overall, Tesla is off to an impressive start with the Optimus robot, and the company says it will begin optimizing the robot through real-world tests in its factories, automating an increasing number of operating processes over the coming months and years. So far, Tesla's Optimus robot features a level of artificial intelligence integration that isn't currently present at other leading robotics companies such as Boston Dynamics. Furthermore, Elon Musk mentioned that there is a level of democracy involved in the robot's overall direction, as Tesla shareholders can vote with their shares to alter the trajectory of the robot and its design or implementation. The Tesla robot is expected to provide economic output as much as two orders of magnitude greater than that of an individual human, but there is still a great deal of work ahead, as the model shown is only the first iteration. Tesla's ultimate goal is to make so many Optimus robots that the global economy becomes quasi-infinite, leading to a future of abundance and abolishing poverty. Because these humanoid robots can perform work tasks in the place of humans, they could quickly start saving humans from having to perform dangerous or back-breaking tasks.
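One of the quoted specs is easy to sanity-check: a 2.3 kWh pack lasting a full working day implies a fairly modest average power draw.

```python
# Average power implied by the battery figures quoted above.
pack_wh = 2.3 * 1000     # 2.3 kWh battery pack
workday_h = 8            # "roughly eight hours of work on a single charge"
print(f"average draw: {pack_wh / workday_h:.0f} W")   # ~288 W, a few light bulbs' worth
```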
Machine Learning Driven Exoskeleton

Researchers from the Stanford Biomechatronics Laboratory have created machine-learning-powered exoskeleton boots that allow wearers to move faster and more easily while using less energy. The robotic boots have a motor connected to the calf muscles that gives the wearer a boost with every step. Unlike other exoskeletons, this push is personalized, thanks to a machine-learning-based model that was trained through years of work using emulators on a treadmill. The device provides twice the energy savings of previous exoskeletons, which in the real world translates to significant energy savings and walking-speed improvements. The main goal is to help people with mobility issues, especially older people, move around the world in the way they prefer. With this breakthrough, the researchers believe the technology can be commercialized and miniaturized within the next few years.

The first time a user puts on the exoskeleton can take a bit of adjustment, but after about 15 minutes of walking it becomes quite natural; the exoskeleton's movement is like having an extra spring in your stride and provides a noticeable increase in comfort. The biggest obstacle to successful exoskeletons in the past was the need for individualization: most exoskeletons were designed using a combination of intuition and biomimicry, but people's sizes and movements are too complicated and diverse for that approach. The research team instead used a large laboratory exoskeleton emulator setup, which quickly determined the best ways to help individuals and uncovered blueprints for effective portable devices to be used in the field. When students and volunteers were connected to the emulators, the researchers recorded their energy expenditure and motion data to determine how walking with the exoskeleton affected the user's energy output. The resulting data showed the different advantages of the various types of assistance provided through the emulator, and it was fed into a machine learning model to individually optimize the exoskeleton for each user.

The exoskeleton speeds up the pace of walking by applying torque at the ankle, replacing some of the function of the calf muscles. When users take a step, right before their toes are about to leave the ground, the device assists them in pushing their foot off. The artificial intelligence component then tunes the exoskeleton to provide a slightly different assistance pattern depending on the user's size and speed; by measuring the user's movement, the machine learning model determines the best way to assist them with each step. The researchers found that the exoskeleton allowed users to walk nine percent faster while using 17 percent less energy per mile traveled.
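The per-user tuning described above is a human-in-the-loop optimization problem: treat the wearer's energy use as a black box and search over assistance parameters. This is a minimal sketch in that spirit, not the Stanford lab's method; the parameter names and the simple hill-climbing search are assumptions.

```python
# Keep whatever parameter change lowers the wearer's measured metabolic cost.
import random

def optimize_assistance(measure_metabolic_cost, n_rounds=30, step=0.1):
    # Normalized ankle-torque profile parameters (hypothetical).
    params = {"peak_torque": 0.5, "peak_time": 0.5, "rise_time": 0.3}
    best_cost = measure_metabolic_cost(params)   # from respirometry / motion data
    for _ in range(n_rounds):
        candidate = {k: min(1.0, max(0.0, v + random.uniform(-step, step)))
                     for k, v in params.items()}
        cost = measure_metabolic_cost(candidate)  # wearer walks; energy use is recorded
        if cost < best_cost:                      # keep changes that save the wearer energy
            params, best_cost = candidate, cost
    return params
```

In practice the lab's emulator data lets a learned model stand in for the slow physical measurement, which is what makes per-user optimization fast enough to be practical.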
New Google AI Generates 3D Models From Text

DreamFusion is an evolution of the Dream Fields generative 3D AI system that Google revealed in late 2021. For Dream Fields, Google combined the image-analysis model CLIP from OpenAI with Neural Radiance Fields (NeRF), allowing a neural network to store 3D models. Dream Fields uses NeRF's ability to generate 3D views and combines it with CLIP's ability to assess the content of images. Upon receiving the input text, an untrained NeRF model renders views from random viewpoints, which are then evaluated by CLIP. The feedback serves as a correction signal to the NeRF model, and the process is repeated until there is a 3D model that matches the description.

DreamFusion refines this process: instead of CLIP, it uses Google's pre-trained 2D image diffusion model, Imagen, to perform text-to-3D synthesis, replacing CLIP with a new diffusion-based loss, which Google AI states could enable many new applications of pre-trained diffusion models. The 3D generation doesn't require 3D data for training, which wouldn't be available at the required scale anyway; DreamFusion instead learns to represent 3D objects using 2D images from Imagen generated from different perspectives. This is done using view-dependent prompts like "front view" and "rear view," and the entire process is automated. DreamFusion produces reliable 3D objects with more detail, quality, and depth than Dream Fields, and it also allows users to combine multiple 3D models into one scene, as well as to generate surface normals from a text input. The Google AI research team writes that their approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pre-trained image diffusion models as priors.

Users can export the NeRF models they generate into meshes with the marching cubes algorithm, allowing them to be easily integrated into popular 3D rendering or modeling software. Google's DreamFusion model could prove to be an extremely useful tool for users in the metaverse: it would let non-technical users who want to create structures or objects to populate their simulated worlds simply type a description of what they envision and create it in a flash.
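The mesh-export step mentioned above can be sketched with a real library routine: sample the trained NeRF's density field on a regular grid, then run marching cubes over the grid. Here `nerf_density` is a hypothetical stand-in for the trained model, and the resolution and threshold are arbitrary choices.

```python
# Extract a triangle mesh from a NeRF-style density field via marching cubes.
import numpy as np
from skimage import measure

def nerf_to_mesh(nerf_density, resolution=128, threshold=10.0):
    xs = np.linspace(-1, 1, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    density = nerf_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    # Marching cubes extracts the iso-surface where density crosses the threshold.
    verts, faces, normals, _ = measure.marching_cubes(density, level=threshold)
    return verts, faces, normals   # ready for export to standard 3D software
```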
Breakthrough Tesla AI Supercomputer

Elon Musk's Tesla will provide the equivalent computing power of what currently requires 4,000 Nvidia GPUs across 72 racks with just four of its Tesla-made Dojo supercomputer cabinets. Each cabinet has two assemblies, or trays, inside, with each tray holding six tiles of 354 densely packed cores per tile and delivering 54 petaflops of compute with 640 gigabytes of high-bandwidth memory. Every Dojo accelerator will have a total of 1.1 exaflops of machine learning compute, 1.3 terabytes of SRAM, and 13 terabytes of DRAM. Dojo promises to dramatically speed up the rate at which models can be trained and improved, and Tesla has made similar performance promises for other kinds of work that involve creating AI and machine learning models for autonomous vehicles. Tesla will deploy Dojo in clusters referred to as ExaPODs, which consist of 10 cabinets each, with each ExaPOD able to achieve 1.1 exaflops of machine learning compute. The combined total of the data center will be 8.8 exaflops of processing power, devoted primarily to training artificial intelligence algorithms for Tesla's self-driving vehicles and its Optimus humanoid robot.
For reference, 8.8 exaflops is about eight times faster than the world's fastest supercomputer, Frontier. The way Dojo operates also differs from GPU- and CPU-based AI supercomputers: Dojo is composed of tiles, which take an entirely different approach than regular GPUs and CPUs. Modern GPUs are equipped with many thousands of cores each; the newly released Nvidia GeForce RTX 4090 comes with over 16,000 cores, while the GPUs in Tesla's previous Nvidia-based supercomputer each have over 6,900 cores, for a total of over 40 million GPU cores across the entire machine. GPUs are gaining popularity for training artificial intelligence and machine learning applications like those used in developing Tesla's self-driving systems. CPUs currently contain up to 64 cores, with each node offering a maximum capacity of two CPUs and 128 cores; a CPU-based AI supercomputer combines a number of these nodes into one system, like the previously mentioned Frontier supercomputer, which uses 9,400 nodes with over 600,000 CPU cores.

Dojo is unique because, instead of combining many smaller chips as in traditional approaches to supercomputing, Tesla's D1 tile is one huge chip with 354 cores specially designed for artificial intelligence and machine learning tasks. Six of these tiles are placed in a tray together with other supporting compute hardware, and two trays go into a single cabinet, giving each cabinet a total of 4,248 cores and a 10-cabinet ExaPOD a total of 42,480 cores. Because the Dojo supercomputer is specially designed to process AI and machine learning tasks, it is several times more efficient than GPU- and CPU-based supercomputers with the same data center footprint. Tesla plans to deploy its first Dojo ExaPOD in 2023 at its Palo Alto data center.
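The cabinet and ExaPOD core counts quoted above follow directly from the per-tile figures:

```python
# Dojo core counts, as reported: tile -> tray -> cabinet -> ExaPOD.
cores_per_tile = 354
tiles_per_tray = 6
trays_per_cabinet = 2
cabinets_per_exapod = 10

cores_per_cabinet = cores_per_tile * tiles_per_tray * trays_per_cabinet
print(cores_per_cabinet)                        # 4248
print(cores_per_cabinet * cabinets_per_exapod)  # 42480
```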
Record-Breaking Humanoid Underwater Drone Robotics With Haptic Feedback

Haptic feedback allows users to feel with their hands: OceanOne has swum through sunken ships, planes, and a submarine, and plunged to a depth of nearly one kilometer, with unique features that enable its users to feel as though they are physically interacting with the underwater world through sensory feedback to their hands. The underwater humanoid robot, designed by Stanford experts together with Meka Robotics, physically shields people from dangerous and inaccessible underwater environments while linking their abilities, knowledge, experience, and intuition to the task at hand. OceanOne's top half is a humanoid robot with two robotic arms and two cameras in the face that let operators see in 3D; its lower half has eight multi-directional thrusters that enable precise underwater maneuvering. The robot's haptic feedback system relies on touch and stereo vision to create incredibly real sensations for the operator, making them feel as if they were at the bottom of the ocean; through OceanOne's robotic eyes, the human operators can see the environment in high definition.

OceanOne serves two goals: to explore areas that no one has explored before, and to demonstrate that human touch, vision, and interaction can reach locations that were previously far removed from us. While OceanOne had numerous memorable adventures and accomplishments during two long voyages across the Mediterranean, the most notable achievement of its crew was the demonstration of operational autonomy at more than one thousand meters down, the first time ever that an underwater robot provided haptic-feedback-based interaction at this depth.

The robot's journey to the one-kilometer mark was an extended effort that started with countless hours of design, experimentation, and assembly in the lab, dozens of trips to the Stanford pool for debugging, and a myriad of lessons learned before facing the unpredictability of the deep sea. The first version of this underwater humanoid was built to reach depths of only 200 meters, so to allow the robot to go deeper, the scientists adapted the body to incorporate a special foam made of glass microspheres. These microspheres provide buoyancy and are strong enough to withstand the pressure at one-kilometer depths, which is 100 times greater than at sea level. The robotic arms contain an oil-filled spring mechanism that compresses the oil to match the outside pressure, preventing collapse or crushing of the electronic components. The researchers also revised several tiny components across OceanOne to limit the amount of compressed air within individual parts while keeping the robot as small as possible. OceanOne gained additional improvements that increased the flexibility of its head and arm motions, as well as two different types of hands.

The OceanOne project is not just a showcase for the latest innovations in haptic feedback technology, underwater drone robotics, and human-robot interaction; it also opens new possibilities for underwater engineering and marine science, including inspecting and repairing boats and submerged infrastructure such as bridges, piers, and pipelines. Other expeditions are planned for several locations across the globe, such as lost cities in deep waters, coral reefs, and archaeologically valuable wrecks that lie at depths beyond the reach of human divers. These kinds of humanoid underwater robots can go deep underwater to find and recover materials, build infrastructure, and conduct emergency recovery or disaster-prevention operations, and in the future they could be adapted to operate deep inside mines, on mountaintops, or even in space.
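The "100 times sea-level pressure" figure for a one-kilometer dive follows from the hydrostatic formula P = rho * g * h, using a typical seawater density:

```python
# Hydrostatic pressure at OceanOne's record depth.
rho, g, depth = 1025, 9.81, 1000        # seawater density (kg/m^3), gravity, meters
pressure_pa = rho * g * depth
print(pressure_pa / 101_325)             # ~99 atmospheres on top of surface pressure
```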
Breakthrough Google AI Text-Based Image Editing Model

High-fidelity image manipulation using text input is a long-standing problem in computer graphics research. Using a text command to describe an edit, like "a man wearing a suit" or "pixel art," is significantly easier than carrying out the changes manually in image editing software. Text-to-image models like DALL-E, Imagen, and Stable Diffusion are proficient at creating images from scratch, but for image editing tasks these models usually need the user to specify masks, and they often struggle with edits that depend on the masked portion of the image. Google AI has developed UniTune, a method to edit images from a textual description of the desired result while preserving high fidelity to the entirety of the input image, including the unedited portions. Fidelity is preserved both in visual details, such as shapes, colors, and textures, and in semantic details, including actions, poses, and objects. UniTune can edit arbitrary images in complex, cross-domain scenes and can perform both localized and global edits. It is unique in its ability to make stylistic changes on an image-wide basis without sacrificing semantic details, and it can place complex local edits in their logical locations.

UniTune uses large-scale text-to-image diffusion models to execute expressive image editing, and Google AI researchers found that, with the right parameters, fine-tuning large diffusion models on a single image-prompt pair does not result in catastrophic forgetting. Furthermore, the semantic and visual knowledge that the model learned during training remains usable across a very wide variety of edit operations. By simply using classifier-free guidance, the fidelity-expressiveness balance can be tuned by controlling the number of training steps and the learning rate, or the amount of classifier-free guidance and SDEdit. Fine-tuning of diffusion models is a powerful technique relevant to many use cases, like image-to-image translation and subject-driven image generation; by minimizing overfitting at training time, the technique learns the essence of a subject without learning transient, image-specific attributes such as pose, camera angle, or background. The Google AI researchers say that UniTune is the first method to use fine-tuning of a large diffusion model for image editing tasks.
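The classifier-free guidance knob mentioned above has a standard form that is worth seeing concretely. This sketch shows the usual guidance formula, which is what trades fidelity to the input image against expressiveness of the edit; `model` is a hypothetical fine-tuned diffusion denoiser, not Google's code.

```python
# Classifier-free guidance: blend unconditional and text-conditioned predictions.
def guided_noise_prediction(model, x_t, t, text_embedding, guidance_scale=7.5):
    eps_uncond = model(x_t, t, condition=None)            # unconditional prediction
    eps_cond = model(x_t, t, condition=text_embedding)    # conditioned on the edit prompt
    # A larger guidance_scale pushes the sample harder toward the text prompt,
    # at the cost of drifting further from the original image.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```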
New Deep Learning Technology Breakthrough Uses Light Waves

MIT researchers have discovered a new approach that uses optics to speed up machine learning computations on smart speakers and other low-power devices. The technique shifts the memory-intensive steps of running a machine learning model onto components encoded in light waves. The light waves are sent through a fiber-optic network to a receiver, where an ordinary optical device quickly executes calculations using the components of the model carried by the light. This method can deliver more than a hundredfold improvement in energy efficiency over other approaches while also strengthening security, since the user's personal data does not travel to a centralized server. It could allow autonomous vehicles to make decisions in real time using only a tiny fraction of the energy that current energy-intensive systems require, enable uninterrupted, latency-free conversation with a smart home device, support live video processing over wireless networks, and even allow high-speed image classification on a spacecraft thousands of miles from Earth.

The neural network design the researchers created is named Netcast, and it involves storing the weights in a server connected to an innovative device called a smart transceiver. The smart transceiver, a small chip that transmits and receives data, uses silicon photonics to retrieve trillions of weights from memory every second, reading the weights from electrical signals and imprinting them onto light waves. Because the weight information is encoded as ones and zeros, the transceiver converts it by switching a laser on for a one and off for a zero. It then transmits groups of light waves periodically through the fiber-optic network, to reduce latency and avoid repeated requests to the server. Once the light arrives at the client, a broadband modulator uses it to perform ultra-fast analog calculations, encoding the input information the client device receives, such as sensor data, against the machine learning weights. Finally, each wavelength is sent to a light-detecting device that measures the outcome of the calculations.

The researchers devised a way to use this modulator to perform trillions of multiplications per second, greatly increasing the processing speed on the device while using an extremely small amount of energy. They evaluated the design using weights transmitted across an 86-kilometer fiber link connecting their lab with MIT Lincoln Laboratory. Netcast allowed them to perform computer vision tasks with extremely high precision, achieving an accuracy of 98.7 percent in image classification and 98.8 percent in digit recognition, at rates of up to 96 kilobytes per second. As a next step, the researchers are looking to improve the smart transceiver chip to maximize performance, and they intend to shrink the receiver, currently the size of a shoebox, to about one-third of that size, so it can be incorporated into an electronic device as small as a cell phone.
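The analog step at the heart of this scheme is just a multiply-accumulate performed in optics. This toy digital simulation is purely illustrative (none of it is the MIT design): weights arrive as light intensities, the client's modulator scales each wavelength by its local input, and the detector integrates the result.

```python
# Toy model of an optical multiply-accumulate: detector current ~ sum(w * x).
import numpy as np

weights = np.array([0.8, 0.1, 0.5, 0.9])    # encoded onto light by the smart transceiver
inputs = np.array([0.2, 0.7, 0.4, 0.6])     # sensor data applied by the client's modulator
detector_reading = np.sum(weights * inputs)  # photodetector sums the modulated light
print(detector_reading)                      # one neuron's pre-activation, computed "in light"
```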
New General Robotic Arm Manipulation

Prompt-based learning is recognized as an efficient method in natural language processing, in which one general-purpose language model can be instructed to execute any task specified through input prompts. In robotics, however, task specification takes a variety of forms, such as one-shot demonstrations for imitation, following instructions given in language, and reaching visual goals, and these are usually viewed as different jobs requiring specialized machine learning models. AI researchers from NVIDIA, Stanford, Caltech, Tsinghua, and UT Austin have shown that a full range of robot manipulation tasks can be expressed with multimodal prompts that interleave visual and textual tokens. The researchers designed a transformer-based generalist robot agent, named VIMA, that processes these prompt sequences and outputs motor actions autoregressively. To train and evaluate the model, they developed a new simulation benchmark with thousands of procedurally generated tabletop tasks with multimodal prompts, over 600,000 expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization. VIMA scales well in both model capacity and data size: it beats previous methods in the most difficult zero-shot generalization settings by as much as 2.9 times when using identical training data, and with 10 times less training data, VIMA still outperforms the next best alternative by 2.7 times.

Quantum Computer Breakthrough Tunes Qubits For A Programmable Solid-State Superconducting Processor

Scientists have managed to show for the first time that large numbers of quantum bits, or qubits, can be tuned to interact with one another while maintaining coherence for an unprecedentedly long time in a programmable solid-state superconducting processor. The breakthrough was achieved by researchers at Arizona State University and Zhejiang University in China, together with two researchers in the UK. In a recently published paper, the researchers presented the first demonstration of quantum many-body scarring states (QMBS) as a powerful method of maintaining coherence between interacting qubits. These exotic quantum states offer the possibility of creating vast multipartite entanglement for a range of quantum computing tasks, attaining high processing speed with low power consumption. The paper, titled "Many-body Hilbert space scarring on a superconducting processor," was published in the journal Nature Physics. One of the researchers commented that QMBS states possess an intrinsic and generic capability for multipartite entanglement, making them extremely appealing for applications such as quantum sensing and metrology. The main focus of the study is understanding how to delay thermalization in order to preserve coherence, which has long been regarded as a crucial research goal for quantum computing.

In Vitro Neurons Learn And Communicate When Embodied In A Simulated Game World

In vitro neurons can learn and display the ability to communicate when they are embodied in a simulated game world, and integrating digital systems with neurons could enable performance that is not possible with silicon alone. Researchers have presented DishBrain, a system that harnesses the inherent computational capabilities of neurons within a structured environment. In vitro neural networks of rodent or human origin are integrated with in silico computing via a high-density multielectrode array. Through electrophysiological stimulation and recording, the culture is embedded in a simulated game world that mimics an arcade version of Pong. Applying implications from the theory of active inference via the free energy principle, the researchers observed apparent learning within five minutes of real-time gameplay that was not evident under control conditions. Further experiments showed the importance of closed-loop structured feedback in triggering the learning process. This ability of cultures to self-organize their activity in a purpose-driven manner when confronted with a lack of information about the consequences of their actions is what the researchers call synthetic biological intelligence.

Future applications could offer further insight into the cellular correlates of intelligence. Using the computing power of living neurons to produce synthetic biological intelligence, once confined to the world of science fiction, could now be within reach. The advantages of biological computation have been extensively researched with the aim of creating biomimetic devices capable of neuromorphic computing. Instantiating synthetic biological intelligence could bring about a paradigm shift toward silico-biological computing platforms that exceed the performance of classical silicon hardware. Theoretically, synthetic biological intelligence could emerge before artificial general intelligence, also known as AGI, due to the inherent efficacy and evolutionary advantages of biological systems.
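The closed-loop structure is the crux of the DishBrain result, and it can be sketched in a few lines. This is a heavily simplified illustration, not the actual protocol, and every callable below is a hypothetical stand-in for the electrode-array interface. The free-energy framing is that hits earn predictable stimulation while misses earn unpredictable noise, so activity that avoids "surprise" ends up playing better Pong.

```python
# One round of a closed-loop, Pong-style embodiment for a neural culture.
import random

def play_round(stimulate, record_activity, decode_paddle_move, game):
    stimulate(("ball_position", game.ball_x, game.ball_y))   # sensory electrodes encode the ball
    hit = game.step(decode_paddle_move(record_activity()))   # motor-region activity moves the paddle
    if hit:
        stimulate(("predictable_pulse",))                    # structured, consistent feedback
    else:
        stimulate(("noise_burst", random.random()))          # unpredictable "surprise" feedback
```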
Breakthrough DeepMind AI Discovers New Matrix Algorithms

In a paper published in Nature, the Google DeepMind team has introduced AlphaTensor, the first artificial intelligence system to discover novel, efficient, and provably correct algorithms for fundamental tasks like matrix multiplication. It addresses a 50-year-old question in mathematics: how to multiply two matrices as fast as possible. The paper represents a crucial step in DeepMind's mission to advance science by unlocking the most fundamental problems with artificial intelligence. AlphaTensor builds on AlphaZero, an agent that displayed remarkable performance in board games like Go and chess, and the paper shows the evolution of AlphaZero's journey from playing games to tackling unsolved mathematical problems for the first time.

DeepMind explored how modern artificial intelligence techniques can automatically discover new matrix multiplication algorithms. This is math that is used to process images on smartphones, recognize speech commands, generate graphics for computer games, run simulations to predict weather, and much more; it also helps compress data and videos so they can be shared on the internet more easily. AlphaTensor discovered algorithms that are more efficient than the current state of the art for many matrix sizes; these AI-designed algorithms outperform human-designed ones, which represents a significant leap forward in the field of algorithmic discovery.

The DeepMind research team first converted the problem of finding efficient matrix multiplication algorithms into a single-player game. The board is a three-dimensional tensor, an array of numbers, that captures how far the current algorithm is from correct. Through a set number of moves, which correspond to the algorithm's instructions, the player must modify the tensor and zero out its entries. If the player succeeds, the resulting matrix multiplication algorithm is provably correct for any pair of matrices, and its efficiency is measured by the number of steps taken to zero out the tensor. This game is extremely challenging: the number of possible algorithms to consider is greater than the number of atoms in the universe, even for small cases of matrix multiplication. The number of possible moves at each step, more than 10 to the 33rd power depending on the setting, is some 30 orders of magnitude larger than in Go, which itself challenged AI for decades. To play this game well, one must find the tiny needles in a vast haystack of options. To surmount the challenges of this domain, which differ significantly from those of traditional games, DeepMind developed multiple crucial components: a novel neural network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe for exploiting symmetries of the problem. The team started AlphaTensor without any knowledge of existing matrix multiplication algorithms and used reinforcement learning to train the agent to play the game. AlphaTensor learns and improves with time, eventually rediscovering historically fast matrix multiplication algorithms such as Strassen's, and ultimately surpassing human intuition to uncover algorithms faster than any previously known.
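Strassen's 1969 trick, which AlphaTensor rediscovers on its way to new algorithms, is worth seeing concretely: two 2x2 matrices are multiplied with 7 multiplications instead of the naive 8, and applying the trick recursively to matrix blocks drops matrix multiplication below cubic time.

```python
# Strassen's algorithm for 2x2 matrices: 7 multiplications instead of 8.
def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In AlphaTensor's game formulation, each of the seven products above corresponds to one move that zeros out part of the multiplication tensor, so the move count is exactly the algorithm's multiplication count.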
After enough training, AlphaTensor also uncovers a wide range of algorithms with state-of-the-art complexity, thousands of matrix multiplication algorithms for every size, showing that the space of matrix multiplication algorithms is richer than was previously believed. These algorithms have many mathematical and practical properties, which DeepMind took advantage of by modifying AlphaTensor so that it finds algorithms that run fast on specific hardware, such as an Nvidia V100 GPU or a Google Tensor Processing Unit. These algorithms multiply large matrices 10 to 20 percent faster than the commonly used algorithms on the same hardware, demonstrating AlphaTensor's flexibility in optimizing arbitrary objectives.

Google DeepMind's results can also guide future research in complexity theory, the field that aims to identify the most efficient algorithms for solving computational problems. By exploring the space of possible algorithms more efficiently than previous approaches, AlphaTensor helps deepen our understanding of the richness of matrix multiplication algorithms, which could lead to new insights toward determining the asymptotic complexity of matrix multiplication, one of the most fundamental open problems in computer science. Because matrix multiplication is a core component of many computational tasks, spanning computer graphics, digital communications, neural network training, and scientific computing, algorithms discovered by AlphaTensor could make computations in these fields significantly faster and more efficient. AlphaTensor's ability to take any objective into account could also lead to new applications in designing algorithms that optimize metrics such as energy use and numerical stability, helping prevent small rounding errors from snowballing as an algorithm works.

Breakthrough Make-A-Video Text-To-Video AI System From Meta

Meta, formerly known as Facebook, has unveiled its new artificial intelligence system, Make-A-Video, which allows people to transform text prompts into short, high-quality video clips. Make-A-Video is a continuation of Meta's recent advances in generative AI research and has the potential to create new opportunities for creators and artists. The new model builds on recent advances in text-to-image generation technology to enable text-to-video generation: the system uses images with descriptions to learn what the world looks like and how it is described, and unlabeled video to learn how the world moves. With this, Make-A-Video lets users bring their imagination to life by creating unique, whimsical videos from just a text prompt, full of vibrant colors, characters, landscapes, and other details. It can even create videos from existing images, or create similar videos from existing videos that are given to it.

Meta says it is committed to developing responsible artificial intelligence and ensuring safe use of this state-of-the-art video generation technology, so to reduce harmful, misleading, and biased content, the research adheres to the following steps. First, the neural network analyzes millions of data points to gain insight into the world, and Meta applies filters to minimize the possibility of generating harmful content. Second, because Make-A-Video can create videos that look real, Meta adds a watermark to all of its videos, intended to let viewers know that a video was created with AI and is not a recorded video.
Third, while Meta hopes to make the technology widely accessible to the public soon, until then it will continue to thoroughly analyze and test the model to ensure that each step of the release is safe. Finally, Make-A-Video uses publicly accessible datasets, giving the research an additional level of transparency. Meta says it is sharing its findings openly in a research paper and plans to release a demonstration experience as part of an ongoing commitment to open science.

Generative AI research encourages creativity by giving people the tools to create new content quickly and easily. Make-A-Video is a follow-up to Meta's earlier announcement of Make-A-Scene, a multimodal generative AI method that gives users more control over the AI-generated content they create. With Make-A-Scene, Meta demonstrated how users can create photorealistic illustrations and storybook-quality art using words, lines, or even freeform sketches. Meta stated that it is important to think carefully about how new generative AI systems are built, and that it will continue to use its responsible AI framework to refine and evolve this emerging technology.
High-Resolution Wearable Electro-Tactile Rendering Device To Simulate The Sense Of Touch In The Metaverse

A team of researchers led by City University of Hong Kong has created a wearable tactile rendering system that can simulate touch at a distance with high spatial resolution and rapid response. The team demonstrated its potential with a Braille display and by adding the feeling of touch to the metaverse, for functions like gaming and shopping in virtual reality, as well as possibly helping deep-sea divers, astronauts, and other workers who must wear extremely thick gloves. Although there has been great progress in developing sensors that digitally capture tactile features with high resolution and sensitivity, we still lack a system that can effectively virtualize the sense of touch by recording and playing back tactile sensations over space and time. Working in collaboration with Tencent's Robotics X Laboratory in China, the team created a unique electro-tactile rendering technique that provides diverse tactile sensations; the results were published in the scientific journal Science Advances.

The methods used to recreate tactile sensations fall into two categories: mechanical stimulation and electrical stimulation. Mechanical devices apply a localized force or vibration to the surface of the skin; while they produce continuous and stable tactile sensations, they can be heavy, which limits their resolution when used in a wearable device. Electro-tactile stimulators instead trigger touch sensations with electrodes that pass an electric current through the skin; they are lightweight and flexible while offering better resolution and quicker response. However, most of them depend on high-voltage direct-current pulses, up to several hundred volts, to penetrate the outermost layer of the skin and stimulate the nerves and receptors, which previously could be dangerous and lacked high resolution. The most recent electro-tactile actuator designed by the team is thin, flexible, and safe: it fits easily into a finger cot, and the device provides a host of different tactile sensations, including pressure, vibration, and roughness, all at very high resolution. Instead of DC pulses, the team devised a high-frequency alternating stimulation technique that reduces the operating voltage to 30 volts, ensuring the tactile display is both comfortable and safe. They also proposed a new super-resolution method that renders tactile sensations in the areas between electrodes, improving the spatial resolution of the stimulators by more than 300 percent and providing the user with a remarkably lifelike sense of touch. The new system can elicit tactile stimuli with a very high spatial resolution of 76 dots per square centimeter, similar to the density of the related receptors in human skin, with a rapid response rate of 4 kilohertz.

The team performed a variety of tests to illustrate the different uses of this wearable electro-tactile rendering technology. For instance, they proposed an innovative Braille method that is simpler for people with visual impairments: the new method splits numerals and letters of the alphabet into distinct strokes and sequences, exactly the way the characters are written. Wearing the electro-tactile rendering device on the fingertip, the wearer can recognize the displayed letter by sensing the direction and sequence of the strokes. This would be particularly useful for people who lose their eyesight later in life, allowing them to continue reading and writing with the same alphabetic system without having to learn the entire Braille dot system.

The second use case is in virtual and augmented reality games and applications, bringing the sensation of touch to the virtual world. The electrodes are both flexible and stretchable, meaning they can cover larger areas such as the palm of a hand. The researchers showed that users could feel the texture of clothing in a virtual shop, and could even feel an itchy sensation on their fingers when licked by a VR cat; while stroking the virtual cat's fur, users feel a change in roughness as their strokes shift in direction and speed. The system is also beneficial for transmitting fine tactile information through thick gloves. The team integrated the light, thin electrodes into a safety glove, where a sensor array detects the pressure distribution on the outside of the glove and relays the data to the user in real time by stimulating their skin. In tests, a participant was able to swiftly and precisely locate tiny steel washers, just one millimeter in radius and 0.44 millimeters thick, using the tactile feedback the glove provided through its sensors and stimulation. The system's capabilities are evident: it can provide high-quality tactile feedback that is currently inaccessible to firefighters, astronauts, and deep-sea divers who must wear heavy gloves and protective suits. This technology could benefit a broad spectrum of applications, such as information transmission, surgical training, teleoperation, and multimedia entertainment.
Artificial Intelligence Turns A 100,000-Equation Quantum Physics Problem Into Just Four Equations Without Losing Accuracy

This work was published in the September issue of Physical Review Letters, and it stands to revolutionize the way scientists study quantum systems with many interacting electrons. The approach can also be adapted to other problems, such as helping to design materials with desirable properties like superconductivity, or utility for clean power generation. The researchers from the Flatiron Institute's Center for Computational Quantum Physics started from a large set of coupled differential equations and used their machine learning model to shrink it to a tiny fraction of its original size.

The problem solved concerns how electrons behave as they move on a grid-like lattice, with interactions occurring when two electrons occupy the same lattice site. This configuration, known as the Hubbard model, allows scientists to model several important materials and to study how electron behavior gives rise to desirable phases of matter, such as superconductivity, in which electrons flow freely through a material. The model can also be used to test new methods before they are applied to more complex quantum systems. Because interacting electrons become quantum mechanically entangled, the Hubbard model requires cutting-edge computational methods and serious computing power even for a small number of electrons, and each additional electron makes the computational challenge exponentially harder.

One way of studying a quantum system is with what is called a renormalization group, a mathematical apparatus physicists use to study how the behavior of a system, such as the Hubbard model, changes when properties like temperature are altered or when the system is examined at different scales. A renormalization group that keeps track of all possible couplings between electrons without sacrificing any can contain tens of thousands, hundreds of thousands, or even millions of equations that must be solved, each representing an interaction between a pair of electrons. The researchers wondered whether they could use a machine learning tool known as a neural network to manage the renormalization group. The neural network works like a cross between a frantic switchboard operator and survival-of-the-fittest evolution: the program first creates connections within the full-size renormalization group, then tweaks those connections until it finds the smallest set of equations that yields the same solution as the original jumbo-size renormalization group. Even with only four equations, the program captured the physics of the Hubbard model.

All in all, the machine learning method discovered hidden patterns and amazed the researchers with results that exceeded their expectations. The program took a lot of computational power to train, running for several weeks, but it can now be adapted to solve other problems. The researchers are currently investigating what the machine learning model actually learns about the system, as this could yield additional insights that would otherwise be difficult for physicists to uncover themselves. The biggest remaining question is whether the new approach works on more complex quantum systems, such as materials in which electrons interact over long distances. The researchers also see exciting opportunities to use the technique in other fields that deal with renormalization groups, including neuroscience and cosmology.
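The idea of distilling a huge system of equations into a handful of parameters can be illustrated with a toy. This sketch is not the Flatiron group's method: the "full" system below is an invented linear stand-in that secretly depends on only four hidden modes, so the compression is visible, and the fit is done from input/output pairs alone.

```python
# Distill a 100-equation toy system into a 4-parameter surrogate.
import numpy as np

rng = np.random.default_rng(0)
modes = np.linalg.qr(rng.normal(size=(100, 4)))[0]    # 4 hidden modes of the toy system
core = np.diag([2.0, 1.5, 1.2, 1.1])                  # true interactions among those modes
solution_operator = modes @ core @ modes.T + 0.01 * np.eye(100)

def full_solution(x):
    """Stand-in for solving all 100 coupled equations exactly."""
    return solution_operator @ x

X = rng.normal(size=(100, 1000))                      # probe inputs
Y = full_solution(X)                                  # expensive reference solutions

# Recover 4 dominant modes from the data, then fit one coefficient per mode.
U = np.linalg.svd(Y, full_matrices=False)[0][:, :4]
theta = np.array([(U[:, k] @ Y) @ (U[:, k] @ X) / ((U[:, k] @ X) @ (U[:, k] @ X))
                  for k in range(4)])
Y_hat = U @ (theta[:, None] * (U.T @ X))              # the 4-equation surrogate
print(np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))  # ~3% error: four numbers suffice
```

The real problem is nonlinear and vastly harder, which is why a neural network is needed to find the reduced set of equations, but the spirit is the same: fit a small model until it reproduces the big one's answers.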
Soft Robotics Learn To Grip Using The Right Amount Of Force

MIT researchers from the Computer Science and Artificial Intelligence Laboratory, together with the Toyota Research Institute, have created a robotic system that lets users grasp tools and apply the right amount of force for a task, such as squeegeeing up liquid or writing a word with a pen. The robotic system is called SEED, short for Series Elastic End Effectors.