Active stereo vision technology - Pinpoint accuracy for machine vision


Hello everyone, welcome to the IDS Vision Channel. My name is Boris Duchè, I'm a product manager here at IDS, and I will hold today's presentation to dive into the Ensenso technology. Please keep in mind that this presentation is made for advanced users, for engineers who want to get an in-depth understanding of the technology in order to optimize the settings of their camera for their application. Before we start talking about 3D, we need to have a quick talk about 2D images to understand the difference.

When we talk about a 2D image, we have to understand that each pixel is encoded with three values in the case of a color camera, or with a single value in the case of a monochrome camera. For a color image, these values are typically encoded in three times 8 bits. Each value represents one color channel, red, green and blue, and your display, your monitor, will adjust the LEDs at the pixel level to show the appropriate color. This is how a 2D color camera works.

When we talk about 3D, we work with depth maps. So how is the 3D data encoded within a depth map? Typically, for each pixel you will have three values to encode the pixel. These values are the X, Y and Z values in millimetres. So for each pixel you have depth, but also X and Y information. We like to display such an image in false colors; it makes things easier to understand when we just look at the image.

In this example you can see on my screen that when a pixel is dark blue, it means that the pixel is far away from the camera, and when a pixel is green, it means the pixel is close to the camera. This is only for visualization purposes. When you write your own algorithm you will of course not work with the false color image, you will work with the real depth map, where each pixel is encoded with X, Y and Z values. The depth map is thus a 2D representation of 3D data.
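To make this encoding concrete, here is a minimal sketch, assuming a NumPy array of shape (H, W, 3) that holds X, Y, Z in millimetres for every pixel, and a simple false-color rendering keyed to the Z channel. The array layout and data are illustrative, not an Ensenso file format:

```python
import numpy as np
import matplotlib.pyplot as plt

# A depth map: one (X, Y, Z) triple in millimetres for every pixel.
# We fabricate a tilted plane just to have something to look at.
H, W = 480, 640
xs, ys = np.meshgrid(np.arange(W), np.arange(H))
depth_map = np.zeros((H, W, 3), dtype=np.float32)
depth_map[..., 0] = (xs - W / 2) * 0.5          # X in mm
depth_map[..., 1] = (ys - H / 2) * 0.5          # Y in mm
depth_map[..., 2] = 800.0 + xs * 0.3            # Z in mm, increasing to the right

# False-color view: near pixels get one end of a colormap, far pixels the other.
z = depth_map[..., 2]
plt.imshow(z, cmap="viridis")                   # any perceptual colormap works
plt.colorbar(label="Z [mm]")
plt.title("Depth map rendered in false colors")
plt.show()
```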

There is another way to visualize 3D data, and that is to use a point cloud. Let me show you what it looks like. I have this software called CloudCompare that I use for visualization purposes, and here I have a 3D point cloud. This is what you are more used to when you work with 3D; it's a real 3D file format. But keep in mind that sometimes we like to work with 3D data represented as a point cloud, and sometimes we like to work with 3D data represented as a depth map. Usually, an algorithm that operates on the depth map will run faster.
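As a rough illustration of how the two representations relate, the following sketch flattens an organized depth map into an unorganized point cloud. It assumes the same (H, W, 3) layout as the previous snippet, not a specific Ensenso format:

```python
import numpy as np

def depth_map_to_point_cloud(depth_map: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) depth map of XYZ-in-mm values into an (N, 3) point list.

    Pixels without a valid measurement are assumed to be marked with NaN and
    are dropped, which is why the point cloud loses the neat row/column
    structure that makes depth-map algorithms fast.
    """
    points = depth_map.reshape(-1, 3)
    valid = ~np.isnan(points).any(axis=1)
    return points[valid]

# Example: a 2x2 depth map with one missing pixel.
dm = np.array([[[0.0, 0.0, 800.0], [1.0, 0.0, 805.0]],
               [[0.0, 1.0, np.nan], [1.0, 1.0, 795.0]]], dtype=np.float32)
print(depth_map_to_point_cloud(dm))   # -> three valid 3D points
```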

Ensenso cameras work according to the principle of active stereo vision. I will first explain what stereo vision is, and then we will try to understand what active stereo vision is. Stereo vision is the way your eyes work. What happens when you look at a point? The point is imaged in both of your eyes to make a 2D image, and based on both of these images your brain computes some 3D information, some depth information. This is why we say that if you have only one eye, you lose depth perception.

If you close one eye and try, like this, to touch your fingertips together, it's more difficult with only one eye than with two eyes. So this is the principle of stereo vision. Good news: it's not so different with cameras, they operate the same way. So how does it work? Once again, this is basic stereo vision. You look at a point, and the point is imaged in both of the sensors that make up the stereo vision system. There is no brain inside a camera, but fortunately the camera is connected to a computer.

You have the driver of the camera on the computer, and the driver will compute the distance from the point to the sensor. We will go more in depth a little later about how this process works. The distance between the two monocular sensors that make up the stereo vision camera is called the baseline. It has a huge impact on the quality of your data: usually, when the baseline is larger, the accuracy of your point cloud is better.

The angle between the two monocular cameras that make up your stereo vision system is called the vergence angle. It is of course important to have an overlap between the two fields of view.
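To see why a larger baseline helps, here is a small sketch based on the standard rectified-stereo relation Z = f·B/d (a textbook formula, not the exact computation inside the Ensenso driver): for the same matching error in pixels, the resulting depth error shrinks as the baseline B grows.

```python
# Standard rectified-stereo relation: Z = f * B / d
# f: focal length in pixels, B: baseline in mm, d: disparity in pixels.
def depth_from_disparity(f_px: float, baseline_mm: float, disparity_px: float) -> float:
    return f_px * baseline_mm / disparity_px

f_px = 1400.0          # illustrative focal length
true_z = 1000.0        # object 1 m away
for baseline in (50.0, 100.0, 200.0):
    d = f_px * baseline / true_z              # exact disparity for this geometry
    z_err = depth_from_disparity(f_px, baseline, d - 0.25) - true_z
    print(f"baseline {baseline:5.0f} mm -> depth error for 0.25 px mismatch: {z_err:5.1f} mm")
# The same quarter-pixel matching error causes a smaller depth error with a larger baseline.
```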

Let's take a look at an example of how stereo vision works when applied to a camera. You look at the same scene from two different points of view. The driver of the camera takes both the left and the right image and tries to find, for each pixel of the left image, the corresponding pixel in the right image. If this algorithm is successful, then the computation of the depth map, of the point cloud, is successful. In the example that you see on my slide, you have a lot of homogeneous areas. You have a pixel that is white, surrounded by pixels that are also white, and when you look in the other image to find the matching pixel you have far too many possible candidates, so many that the matching algorithm will not work. This is a limitation of stereo vision cameras: they don't like homogeneous areas.

Now look at a pixel with local contrast: the pixel is grey, the pixel to its right is white and the pixel below it is dark grey, maybe. When you look for the corresponding pixel in the other image, it is pretty easy to find, so the matching algorithm will work and you will be able to compute the depth map. Passive stereo vision systems are therefore limited when it comes to homogeneous areas.
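The following sketch is a deliberately simplified, hypothetical block-matching routine (not the Ensenso matching algorithm) that scores candidate positions along one image row with a sum of absolute differences. On a flat, homogeneous row every candidate scores the same, while on a textured row a single candidate wins:

```python
import numpy as np

def match_along_row(left_row: np.ndarray, right_row: np.ndarray, x: int, half: int = 2):
    """Find the best match for left_row[x-half : x+half+1] along right_row (SAD score)."""
    patch = left_row[x - half:x + half + 1]
    scores = [np.abs(right_row[c - half:c + half + 1] - patch).sum()
              for c in range(half, len(right_row) - half)]
    scores = np.array(scores)
    best = int(np.argmin(scores)) + half
    ambiguous = int((scores == scores.min()).sum()) > 1   # several equally good candidates?
    return best, ambiguous

flat     = np.full(32, 128.0)                                            # homogeneous surface
textured = np.random.default_rng(0).integers(0, 255, 32).astype(float)  # projected texture

print(match_along_row(flat, flat, 16))          # ambiguous -> True, match unreliable
print(match_along_row(textured, textured, 16))  # ambiguous -> False, unique match at 16
```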

This is why Ensenso cameras are not passive stereo vision systems but active stereo vision systems: in between the two monocular cameras we have added an additional component, a pattern projector, which just adds some texture, some information to the image. Let's look at exactly the same example. I again see the same scene from the left and the right point of view, but now my pattern projector has added some texture to the scene.

This makes it much easier for my matching algorithm to distinguish one pixel from another, because each pixel looks very different. If I look in the right image I have a pixel that is black, the pixel above it is white and the pixel to its right is light grey. Can I find the corresponding pixel in the left image? Yes I can, because each pixel is very different from the others, and you will understand a little later, when we go more in depth, that you don't need to look for the corresponding pixel in the whole image; we actually just search along a single line. The computed depth map is much denser, with far fewer holes, yet you can still see that some information is missing. Why is that? Because from the right and the left point of view you don't see exactly the same scene.

You have some shadow areas that are not located in the same place for the left and the right point of view. The computed depth maps that you can see on my slide I have made myself; they are not perfectly accurate, but the idea is to show you that wherever you can see shadows in the right image, you don't have the same information in the left image. Same thing for the left image: I have shadows here, shadows to the left of my cup, and I don't have the same information in the right image, so my matching algorithm will fail.

This is a limitation of active stereo vision cameras. Fortunately, with Ensenso you can work with multiple cameras together and get rid of the shadows.

Most of the camera families that we have in the Ensenso product line operate according to the same principle, even though they have different form factors. For each of these cameras you have two monochrome sensors on the sides and a projector unit in the middle. Please remember that today's presentation is for engineers; I want to give you an in-depth understanding of how those Ensenso cameras work. We will now focus on the matching algorithm and the idea behind it.

I made a diagram to explain this to you. Here it is: on this diagram you have two sensors, sensor S1 and sensor S2. Of course they correspond to the two sensors of the two cameras that make up the stereo vision camera. You have two sensors and two optical centers.

The optical center is a point within the lens where the light beams cross, where they converge before reaching the sensor plane. What the matching algorithm does is go through each pixel of image number one and try to find the corresponding pixel in image number two. If the matching algorithm is successful, then the computation of the distance from X to P1, so from the point to the sensor, is pretty easy, provided your system is calibrated, meaning that you know the baseline, the distance between the two sensors, and you know the vergence angle. These values are determined in the factory calibration.

So how does this matching algorithm work? We are looking at the point X. The point X is imaged in sensor plane one at a pixel P1, and you look for the corresponding pixel in sensor plane two. The very nice thing is that you don't have to look at all the pixels of sensor two; you just have to look along a single line. This concept is called epipolar geometry. It makes things much easier for the matching algorithm, of course, and our matching algorithms use this principle.

So whenever we go through the image of sensor number one, we just search along a single line in sensor two, and that of course makes the matching algorithm much faster at computing the depth information, that is, the distance from X to P1.
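Here is a minimal sketch of what the epipolar constraint buys you, assuming already rectified images so that corresponding pixels share the same row (toy data, not the Ensenso matcher): instead of scanning the whole right image for every left pixel, only one row is scanned.

```python
import numpy as np

def disparity_for_row(left_row: np.ndarray, right_row: np.ndarray,
                      half: int = 2, max_disp: int = 16) -> np.ndarray:
    """For each pixel of a rectified left row, search the SAME row of the right
    image (epipolar constraint) and return the disparity of the best SAD match."""
    w = len(left_row)
    disp = np.zeros(w, dtype=np.int32)
    for x in range(half, w - half):
        patch = left_row[x - half:x + half + 1]
        best_d, best_cost = 0, np.inf
        for d in range(0, min(max_disp, x - half) + 1):   # candidates on one line only
            cost = np.abs(right_row[x - d - half:x - d + half + 1] - patch).sum()
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp

# Toy example: the right row is the left row shifted by 5 pixels.
rng = np.random.default_rng(1)
left = rng.integers(0, 255, 64).astype(float)
right = np.roll(left, -5)
print(disparity_for_row(left, right)[8:-8])   # mostly 5, the true shift
```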

The system needs to be calibrated, and it is very, very important that the calibration is accurate, so that you know the baseline and you know the vergence angle. But imagine that something moves a little bit between production and the delivery of the camera to your site; I don't know, there is a problem, there is a shock to the camera and the vergence angle moves slightly. Then the values that we use to compute the depth are not accurate anymore. This is why the chassis of Ensenso cameras is made of aluminium. We are very experienced in making these cameras, and the calibration is not lost.

This is a very important point that we will discuss in the next chapter of this video. Now that we have a better understanding of how the matching algorithm works, I would like to talk a little bit about image rectification. This is an intermediate step that is performed by the driver of the Ensenso cameras, before the matching algorithm, on the two images that come directly out of the two sensors that make up the Ensenso stereo camera.

You have the left and the right images, and these images are not aligned. What does that mean? If I draw a line from one sensor to the other, you can see they are not perfectly aligned. But because my Ensenso cameras are calibrated in the factory, I can apply a trick, an algorithmic trick, so that both images align perfectly. Let me show you what it looks like: the images on this slide are before the rectification. Now I will perform the rectification to align my images perfectly. Here is how it looks, and now each line of pixels of sensor one is perfectly aligned with the same line of pixels on sensor two.

This is of course very important for the matching algorithm, because it means that for every pixel of a line in sensor one, I only have to look in the same line in sensor two. It makes things much easier, of course. What I want to show you here is that the chassis of Ensenso cameras is made of aluminium and the calibration is extremely robust. These images I took with a quite old camera, and yet, after the rectification of the images, the pixels are still perfectly aligned.

This is what makes Ensenso cameras very successful: they are very reliable in the production line, and the calibration actually does not move even after years of usage. So once again, rectification is an intermediate result. You get the two images out of the stereo camera, you perform the rectification of the images so that both images align perfectly, and then the matching algorithm can go through each line of sensor number one and find the corresponding pixel on the same line of sensor number two.
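The rectification itself is done internally by the Ensenso driver, but as a rough sketch of the same idea, here is how one could rectify a calibrated stereo pair with OpenCV. All camera matrices and extrinsics below are placeholder values, not Ensenso calibration data:

```python
import cv2
import numpy as np

# Placeholder calibration: identical pinhole cameras, 100 mm baseline along X.
K = np.array([[1400.0, 0.0, 640.0],
              [0.0, 1400.0, 480.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                         # assume no lens distortion for the sketch
R = np.eye(3)                              # relative rotation between the cameras
T = np.array([[-100.0], [0.0], [0.0]])     # translation in mm
size = (1280, 960)

# Compute rectification transforms so that epipolar lines become image rows.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)

# With real images, remap them so corresponding pixels end up on the same row:
# left_rect  = cv2.remap(left_raw,  map1x, map1y, cv2.INTER_LINEAR)
# right_rect = cv2.remap(right_raw, map2x, map2y, cv2.INTER_LINEAR)
print("Rectification maps computed:", map1x.shape)
```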

I would now like to give you some explanation about a technology that our customers like to use very often; it is called the FlexView technology. So far we have seen how stereo vision works with Ensenso cameras: you take a pair of images, the images are sent to the computer and the driver runs a matching algorithm to compute the depth information. With the FlexView technology we do not take a single pair of images; we acquire multiple pairs of images and use all of them to compute the depth information. Of course, it only works with a scene that is static, not moving, but it gives much better results and our customers like to use it very often. Let's explain how it works.

So when you activate FlexView while working with Ensenso cameras, the pattern that is projected by the projector will move, and as it moves the Ensenso will acquire multiple pairs of images for the 3D reconstruction. You will see it better in a few seconds when we start to play with the software. Here you can see the scene with the pattern projected onto it; the pattern on the scene is moving because we use the FlexView technology.

We will acquire two, four, six image pairs, and based on these multiple image pairs we will compute the depth map. Here you can see on the left the depth map that you get when FlexView is disabled, and on the right the one you get when FlexView is enabled. It gives you much better results with a much lower noise level. Of course, it only works if the scene is static.
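Purely as a simplified illustration of why several image pairs of a static scene help (this is not the actual FlexView reconstruction, which shifts the projected pattern between pairs), the sketch below fuses several noisy depth measurements per pixel and shows the noise dropping:

```python
import numpy as np

rng = np.random.default_rng(42)
true_depth = np.full((120, 160), 800.0)          # static scene, 800 mm everywhere

def measure(noise_mm: float = 2.0) -> np.ndarray:
    """One simulated depth map from a single image pair, with matching noise."""
    return true_depth + rng.normal(0.0, noise_mm, true_depth.shape)

single_pair = measure()
multi_pair = np.mean([measure() for _ in range(8)], axis=0)   # fuse 8 pairs

print("std, single pair :", np.std(single_pair - true_depth))  # ~2.0 mm
print("std, 8 pairs     :", np.std(multi_pair - true_depth))   # ~0.7 mm (2 / sqrt(8))
```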

Okay, so now we have seen how the camera works, how the matching works and how the images are rectified. Now I would like to take a look at the software, so that you can optimize the settings of the camera to match your application.

If you are watching this video, I guess you are already familiar with the software that we provide with the Ensenso cameras. Let's open the Ensenso software. In my case I have two virtual cameras available; I open this one and it should work. What you can see first in the software are the rectified images. Remember those; let me go full screen, maybe.

So you have the left and right rectified images, on which you can see the pattern projected by the projector overlaid. You can see the depth map; remember, in the depth map each pixel is encoded with X, Y and Z values in millimetres, and it is displayed in false colors. And you can see the rendered 3D view, in this case with the monochrome information overlaid on the scene.

What we would like to do right now is to get a deeper understanding of the parameters available in this software, so that you can adjust them for your use case. Let's open the parameters view. By default you will have the easy-to-use view of the parameters, which means five presets, depending on whether you prefer a point cloud with very high data quality but a rather slow frame rate, or you choose to go with 'dynamic scenes, fast', in which case the acquisition rate is much higher but you will lose some accuracy. So 'dynamic scenes, fast' versus 'static scenes, quality': the choice has a huge impact on data quality, and it also has a huge impact on frame time.

What is the difference between 'dynamic scenes, fast' and 'static scenes, quality'? Actually, there are two very important differences. One is the matching algorithm. We discussed the matching algorithm in detail in today's presentation; when you work with Ensenso cameras you can select between a few matching algorithms. Some of these algorithms are extremely fast, some of them are extremely precise.

It really depends on the trade-off that you want to achieve. Let's take a deeper look. If you open the advanced options here, under the stereo matching section you will find the matching method, that is, the name of the matching algorithm that is used.

When I choose to work with 'static scenes, quality', the preferred matching algorithm is the sequence correlation. It is the one that gives some of the best results in terms of quality, but it is quite slow. When you go to 'dynamic scenes, fast', it will use another kind of matching algorithm, the patch match. They all have advantages and disadvantages. Patch match is much, much faster, but the 3D region of interest that you are working in has no impact on the speed of the matching.

So usually, when you pre-select 'static scenes, quality', it will go with the sequence correlation, but it will also apply the FlexView technology. You can see whether FlexView is applied or not in the FlexView settings. FlexView is enabled when you work with 'static scenes, quality'. If you go to 'dynamic scenes, fast', FlexView is off, because you want to acquire only one pair of images to compute the 3D point cloud. So once again, if I go back to 'static scenes, quality', FlexView is now enabled with 16 pairs of images for this configuration. We can take a look if you want. Let's close this. If you go to the monocular image view, you have this slider here that lets you go through all the acquired images.

Remember, the projected pattern is moving, and you can go through all the images to understand what is happening. Let's continue our journey towards a better understanding of Ensenso cameras and the Ensenso software. We already discussed the matching algorithm and the FlexView technology, which allow you to achieve high data quality or high frame rates. When you open the advanced options here, you might also be interested in the post-processing parameters.

You can simply look at the depth map on the left, play a little with the post-processing parameters, and you will get quite obvious visual feedback, a visual understanding of what each parameter is about. Let me, for example, set the median filter radius to the minimum, and now I increase this parameter to the maximum, and you can see the impact of the parameter at the edges of the depth map.
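Just to illustrate what a median filter does to a depth map (a generic filter applied with SciPy here, not the Ensenso post-processing implementation), the sketch below shows a single outlier being suppressed while the surrounding depth values stay untouched:

```python
import numpy as np
from scipy.ndimage import median_filter

# A small depth map (Z in mm) with one spike caused by a bad match.
z = np.full((5, 5), 800.0)
z[2, 2] = 1200.0                        # outlier pixel

filtered = median_filter(z, size=3)     # 3x3 median, roughly "radius 1"

print(z[2, 2], "->", filtered[2, 2])    # 1200.0 -> 800.0: the spike is removed
```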

If you want an in-depth explanation, an in-depth understanding of what each parameter is about, you just have to hover your mouse over the parameter and click on the link, and you have access to the documentation with all the explanations that you need. I would say that most of the time you don't need that much detail, you don't need to play with the parameters so much. Usually, without even opening the advanced options, you just choose one of the predefined sets of parameters and you should be satisfied with the results that you get. It's only when you see that a part is missing, and you try to understand why this part is missing, that you need to go into the parameters. In this very specific example I'm a little bit disappointed, because I don't see my object here with the default preset. I try to understand why, and it is not because of the post-processing algorithms, it is because of the predefined 3D region of interest.

I'm quite sure that the problem goes away if I increase the depth range of my camera here. The maximum distance and the depth range are parameters that define, let's say, a 3D area of interest: the maximum distance will filter out any point that is farther away than this maximum distance, and the depth range gives you the extent of the 3D region of interest. So you try to make the 3D region of interest as close as possible to your object or to your bin, if you are working on a bin picking application, so that you can filter out any noise or any objects that you are not interested in.
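Conceptually, such a 3D region of interest is just a clipping box applied to the measured points. Here is a minimal sketch, assuming the depth-map layout from the earlier snippets; the parameter names and the exact way the range is anchored are illustrative, not the actual Ensenso settings:

```python
import numpy as np

def clip_to_depth_range(depth_map: np.ndarray, max_distance_mm: float,
                        depth_range_mm: float) -> np.ndarray:
    """Keep only points whose Z lies inside [max_distance - depth_range, max_distance];
    everything else is marked invalid (NaN), like points outside a bin."""
    z = depth_map[..., 2]
    keep = (z <= max_distance_mm) & (z >= max_distance_mm - depth_range_mm)
    clipped = depth_map.copy()
    clipped[~keep] = np.nan
    return clipped

# Example: bin bottom at ~900 mm; objects between 700 mm and 900 mm are kept.
dm = np.zeros((2, 2, 3))
dm[..., 2] = [[650.0, 750.0], [880.0, 1200.0]]
print(clip_to_depth_range(dm, max_distance_mm=900.0, depth_range_mm=200.0)[..., 2])
```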

Another topic that I would like to address in today's video is the frame time, the frame rate. A lot of customers come to us and ask 'how fast can you go with these cameras?', and it's not that easy to answer, because it depends a lot on your application and on the parameter set that you use. I think you have understood by now that the Ensenso acquires a pair of images and sends this pair of images to the PC, and on the PC side the matching algorithm is performed by the driver. In my case the matching algorithm, performed by my computer, takes 1.3 seconds, because I'm currently working on a laptop; it's not a very powerful configuration, and the matching algorithm takes a lot of time. Remember, this is an Ensenso with 5 megapixel images for both the left and the right sensor.

The computation of the disparity map can be sped up by changing the matching algorithm; we have seen this already. As a reminder, I'm here in 'static scenes, quality'; if I go to 'dynamic scenes, fast' I decrease the computation time for the matching algorithm and therefore increase my frame rate. But the configuration of my PC also has a huge impact. Currently I'm working on my laptop, I think using the graphics card of my laptop, but a more powerful graphics card will have a huge impact, and you can expect to reduce this computation time and increase the frame rate of your camera. Also, like with any other camera, one of the limiting factors is the exposure time. By default the exposure time is set to automatic. In the case of the Ensenso C57, the projector unit is extremely powerful, which means you can work with quite short exposure times.

So how do you adjust the exposure time? You can see your left and right images here. By default the exposure time is set automatically, but you can disable that and play with the exposure time yourself.

You try to get left and right images with enough contrast for the matching algorithm to operate. If your exposure time is too long, it will have an impact on the maximum frame rate that you can achieve. Usually you will leave this on automatic, but depending on the application you may want to set it to a fixed value.
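As a back-of-the-envelope sketch of how these factors add up (all numbers below are hypothetical, not measured Ensenso figures), the achievable rate is roughly limited by the acquisition of all image pairs plus transfer plus matching:

```python
# Rough frame-time budget for one 3D capture (illustrative numbers only).
exposure_s = 0.005          # 5 ms exposure per image pair
image_pairs = 8             # e.g. FlexView with several pairs; 1 for "dynamic scenes, fast"
transfer_s = 0.010          # getting the images to the PC
matching_s = 0.300          # matching time, strongly dependent on algorithm and GPU

frame_time = exposure_s * image_pairs + transfer_s + matching_s
print(f"frame time ~{frame_time*1000:.0f} ms -> ~{1.0/frame_time:.1f} 3D frames per second")
# Cutting the matching time (faster algorithm, better GPU) or the number of
# pairs has far more effect here than shaving the exposure time.
```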

If you want some help understanding the maximum frame rate you can achieve with an Ensenso camera, which varies from one family to another, you can access the documentation; it explains how to distinguish between the acquisition time and the computation time needed by the matching algorithm to find the corresponding pixels in the images. Last but not least, I would like to have a quick talk about calibration. You understood from the first part of this presentation that it is extremely important to have a very accurate calibration. Calibration is done in the factory.

The chassis of Ensenso cameras is made of aluminium, and Ensenso cameras have been on the market for quite a few years now, so the calibration is extremely stable. Still, you may want to check the calibration of your camera; this is a good habit to have. You should have one of the calibration plates that are sold by our sales team, and when you install the camera for the first time it is good practice to show the calibration plate to the camera, to go to the workspace calibration tab, and to use the function to recalibrate the camera after mounting.

This is an extremely simple process: you just have to click once. It will take an image and make a very slight recalibration of the camera, to make sure that you get the best achievable result. So this is a good habit to adopt, and it concerns the intrinsic calibration of the camera. Another kind of calibration you may be interested in is about changing the coordinate system of your point cloud. By default, when you buy an Ensenso camera and work with it for the first time, it looks like this. Give me a second.

By default, when you buy an Ensenso camera, the coordinate system is here. You can see all three axes, or at least you can see two. Yes, you can see the three axes; by default the origin is on the left camera. Sometimes you would like to change the coordinate system that you work with. Typically, if you build a bin picking application, you would like to have the origin of the coordinate system in the middle of the bin, for example. In that case you go to the workspace calibration tab.

Here you hit the button to set the origin to the calibration plate, and you just have to show the calibration plate to your camera. It will be displayed and overlaid live in the camera images; I cannot show it right now because I'm using a virtual camera. You just hit this button and it will redefine the origin of your coordinate system. This is usually one of the first things that you do when you start working with an Ensenso camera.
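Under the hood, redefining the origin simply means expressing every measured point in a new coordinate frame via a rigid transform. Here is a minimal sketch of that idea; the rotation and translation below are made-up values, not what the software computes from the plate:

```python
import numpy as np

def to_workspace_frame(points_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Re-express (N, 3) points given in the camera frame in a workspace frame,
    where R, t describe the camera pose relative to that workspace origin."""
    return points_cam @ R.T + t

# Made-up pose: camera looking straight down, mounted 1 m above the bin origin.
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])      # flip Y and Z so the workspace Z points up
t = np.array([0.0, 0.0, 1000.0])      # camera is 1000 mm above the new origin

point_in_camera_frame = np.array([[50.0, -20.0, 950.0]])    # mm
print(to_workspace_frame(point_in_camera_frame, R, t))      # -> [[50., 20., 50.]]
```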

Here we are at the end of today's presentation. I hope that you liked it and that you learned something. Keep in mind that on your first contact with the Ensenso cameras you should stick to the five presets, which are very easy to use; it's almost a plug-and-play camera. As you get more comfortable with the tool, you can show the advanced options and start to optimize the settings. Please remember that you have access to support at IDS: a support team in Europe, a support team in the US and a support team in Asia. So how do you reach our support team if you face any problem? You just go to the IDS website, which should be in your favorites already, hit the support button and open a technical support ticket. You will have to fill in a form that gives some information about the project you are working on, and this ticket will automatically be forwarded to the support team closest to your area.

If you feel uncomfortable optimizing the settings of your camera yourself, don't hesitate to ask the support team to organize a TeamViewer session. For an experienced Ensenso user it usually takes five minutes at most to optimize the settings, and those five minutes can save you a lot of time. Once you get comfortable with the Ensenso tools, it will also take you only a few minutes to set up an Ensenso camera.

Thank you very much for attending this IDS Vision Channel video, and we hope to see you soon for another video. Bye bye!
