This AI Agent I Built Can SEE And SPEAK | n8n Tutorial

This AI agent can see, speak, and think for itself, all without writing a single line of code. So let's see the demo. Let me write a message in Telegram: "Hey, how are you?" And we get a response: "I am doing well, thanks. How about you?" It doesn't only work with text messages, it also works with voice messages. So let me show you that: "Hey, what are you doing? And how is your day going?" It not only takes the voice into consideration but also replies in voice.

As we can see, we got a reply in voice. Let's hear what it says: "I'm here to assist you and I'm ready to help. How's your day going?" So it understands text and it understands voice. It also understands an image that we give it, along with whatever we write in the caption. Let me show you the demo for that. We are passing this photo and asking what is written in it, so it has to understand what is in the image and also answer the question we asked about it. Let's see what it replies.

As we can see, it correctly says that the text in the image reads "LangChain", and that there are also several icons representing different formats and platforms, including PDF, HTML, Markdown, Google Drive, Microsoft Word, and a YouTube icon. So this AI agent understands text, the voice we are sending, the image you pass, and the question you ask about that image.

If you want to see how we build this smart AI agent integrated with Telegram, stick till the end, because you are going to learn how the different flows work and how the routing works. If you like this kind of content, please subscribe to my channel, because I make a lot of videos in this domain, and don't forget to send it to your friends. So let's get into the video.

This is the workflow we are going to make today. Don't worry, I'll show you each and every node step by step. First we need a Telegram trigger. We'll search for Telegram, then pick the trigger "On message". We need a credential to connect; I already have mine, and it is very simple to set up. Let's do the test step. It is listening, so we can write a message: "Hey, there." And you can see that we got the message.

Then let's add the brain for all of this: the AI Agent. The agent needs to know what message the user has sent, so we'll set the prompt to "Define below" and pass in the text of this "hey" message. Let's test this. It says that the agent must be connected to a chat model, which is the LLM. So we need to connect a chat model. I'll use OpenAI, and I think 4o mini is fine for me. Instead of 4o mini, you can also use a 4.1 model, which should be cheaper and much smarter. And we'll need a Simple Memory to keep the conversation that we have already had.

For the session ID, we can define it from the chat ID, which is the best session key for this. From the Telegram trigger we take the chat ID to identify the session, and let's set the context window to 10, so it will remember the last 10 exchanges we had. And that's it, I think. Let's test it again. As you can see, we are getting the response: "Hello, how are you?"

We can customize this further. Inside the system message, after "You are a helpful assistant", we can tell it the name of the person it is talking with. We add a line saying that the username is this, taking the name from the Telegram trigger. Now if we run the test step, it responds: "Hey Lakshit, how is it going? What can I help you with?" Sounds amazing, right? Let's go to the next step: sending this message back to Telegram.
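The memory settings described above (one session per chat ID, a context window of 10) can be pictured with a small sketch. This is not n8n's implementation; the `WindowedMemory` class, the `Turn` shape, and the placeholder chat ID are all hypothetical and only illustrate how a per-chat sliding window behaves.

```typescript
// Hypothetical stand-in for a windowed chat memory; not how n8n implements it.
interface Turn {
  user: string;
  assistant: string;
}

class WindowedMemory {
  private readonly windowSize: number;
  // Prototype-free object so any session key is safe to use.
  private sessions: { [sessionId: string]: Turn[] } = Object.create(null);

  constructor(windowSize: number) {
    // Assumption: a window smaller than 1 makes no sense, so clamp it.
    this.windowSize = Math.max(1, windowSize);
  }

  // Store one exchange and drop the oldest turns beyond the window.
  remember(sessionId: string, turn: Turn): void {
    const turns = (this.sessions[sessionId] || []).concat([turn]);
    this.sessions[sessionId] = turns.slice(-this.windowSize);
  }

  // Return a copy of the turns kept for a session, oldest first.
  recall(sessionId: string): Turn[] {
    return (this.sessions[sessionId] || []).slice();
  }
}

// Usage: the chat ID (a placeholder here) serves as the session key.
const memory = new WindowedMemory(10);
const chatId = String(123456789);
for (let i = 1; i <= 12; i++) {
  memory.remember(chatId, { user: "message " + i, assistant: "reply " + i });
}
const keptTurns = memory.recall(chatId);
```

Keying the window by chat ID means two different Telegram chats never see each other's history, which is why the chat ID is a natural session key here.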

From the Telegram node we'll pick Send a Text Message and connect the same account. The text is the output from the AI agent, and the chat ID comes from the Telegram trigger; everything else stays the same. Let's do the test step. If we go to Telegram, we can see we have got this message. If you don't want the attribution line at the end, go into the node in the workflow, Add Field, Append n8n Attribution, and turn it off. Test the step, and we get the same message without the automatic n8n attribution. So normal text is working now.

Now let's integrate the voice part. We'll take the same input from the Telegram trigger, so let's start the workflow and send a voice message: "Hey, how are you? What's your name?" As you can see, the trigger got the message, but it failed in the agent, because the agent is only set up for text, not for voice.

So we need a switch here that understands whether we are getting a text message or a voice message. In the voice message you can see that there is a voice object. We need to route accordingly, so we'll add a Switch node and define the routing rules. If there is a voice object, that is, Object, exists, then we'll rename the output to "voice". Test the step, and we can see it went to this branch. We'll add one more rule. We don't see the text field here, so let's go to Telegram and write a text: "Hey there." Now, as you can see, whenever a text comes in we get this text field. So we'll add another routing rule that checks whether there is a text field: String, exists, and we'll rename that output to "text". Test the step. Now we have two routes, one for text and one for voice. We'll connect the text route to the agent, because that was already working properly.

Now let's add the routing for the voice message. For that we need a voice message, so let's go to Telegram and say, "Hey, this is the test message." Now that we have sent the message, we go to the workflow and test it. You can see that we got the voice message and there is one item here. To read this item, go to the Telegram actions and pick Get a File. We connect it to the voice output and add the credential. For the file ID, we can see that the voice object from Telegram has a file ID, so we add that here, and in the test step we can see we have got the file. Now we need to transcribe this file from voice to text. For that we'll use OpenAI: search for OpenAI and choose Transcribe a Recording. Connect your account. Then: audio, transcribe a recording, data.
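The Switch rules set up earlier (a voice object exists, or a text string exists) amount to a small classification function. The sketch below is only an illustration of that logic outside n8n; `routeMessage`, the `Route` type, and the sample file ID are hypothetical, while `text`, `voice`, `file_id`, and `chat.id` follow the fields visible in the Telegram message.

```typescript
// The route names mirror the renamed Switch outputs; "unhandled" is an added assumption.
type Route = "voice" | "text" | "unhandled";

interface TelegramMessage {
  chat: { id: number };
  text?: string;
  voice?: { file_id: string };
}

// Same order as the rules in the walkthrough: voice first, then text.
function routeMessage(message: TelegramMessage): Route {
  if (message.voice !== undefined) {
    return "voice";
  }
  if (typeof message.text === "string") {
    return "text";
  }
  return "unhandled";
}

// Usage with hypothetical sample messages.
const voiceRoute = routeMessage({ chat: { id: 1 }, voice: { file_id: "hypothetical-file-id" } });
const textRoute = routeMessage({ chat: { id: 1 }, text: "Hey there" });
const otherRoute = routeMessage({ chat: { id: 1 } });
```

A message that matches neither rule falls through to "unhandled", which is the situation a photo message would hit until a rule for it exists.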

Keep everything the same here. Let's do the test step, and you can see we got the message: "Hey, this is a test message." So it is working fine. Now let's connect this to the AI agent. But if you notice, we are facing a problem here. The agent is expecting json.message.text, but from the transcription we are getting only a text field. Whenever we get a message from Telegram, it comes in the form of message.text. So we need to convert both into one plain text field. For this we'll add an Edit Fields node; let's rename it to "Set node". Now let's connect this part. We need a text message, so let's go to Telegram and send one, then come back and test the workflow. Now you can see we got a text message, and we need to convert it into plain text. We'll take the value from here and map it into text. Now if you see, we are getting the text. Inside the agent, we will just look for json.text and reply to that.

Now we're getting an error in the Simple Memory, because we changed the message there as well. The chat ID is no longer available from json.message, so we have to take it from the Telegram trigger. Let's go to the Telegram part, where we can see the message, remove the old expression, and add this one. Now we're good to go. Let's test the step. In the Simple Memory we can also see that the chat ID doesn't exist yet, so we need to run it again. We run this step, we get the message, and now we add the chat ID: remove the old value, drag and drop the new one, and we get the same message. Let's test it. As you can see, we are getting an amazing message here, so everything is working fine.

Let's test the voice part as well: "Hey, how are you? What are you doing?" As you can see, everything runs properly and we got the message: "Hey Lakshit, I am here and ready to help you out with anything you need." So we are getting the reply in text format, but we want it in voice format, since that is what we sent. For that we'll add an If node. If we received text, reply directly in text. If we received voice, reply in voice. Inside the If node we add a condition: if a voice object exists (Object, exists), then it's true. Let's test this step. We sent a voice message, so it is true. If it is false, we send the message directly to Telegram, because it means it was a normal text message. If it is true, we need to convert the reply into a voice message.

For that we'll add an OpenAI node and pick Generate Audio. Everything remains the same here. For the text input, we take the text from the AI agent and add it here. You can choose your own voice; there are several different voices here, so pick whichever you like. I'll keep the default. Let's do the test step. As you can see, we got the data file, so it's working properly.

Now we need to send this data file to Telegram. Let's add a Telegram node and select Send an Audio File, then turn on Binary File. We need the chat ID. Not from the Telegram trigger this time: go to the Switch node, because that is where the chat ID comes through, and take the chat ID from there. We keep Binary File on because we have generated a binary file, and we just change the credential to the account we are sending from. Let's test the step, and you can see the message has gone out. Inside Telegram we can see we have got the audio file. Let's listen to it.
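Two small decisions from this part of the workflow can be sketched in code: merging typed text and a transcript into one text field (the job of the Edit Fields node), and choosing the reply format (the job of the If node). The function names `toAgentText` and `replyMode` and the sample values are hypothetical; this is a sketch of the logic, not what n8n runs internally.

```typescript
interface IncomingMessage {
  text?: string;
  voice?: { file_id: string };
}

type ReplyMode = "audio" | "text";

// Give the agent a single text value regardless of how the user wrote in.
// For voice messages the transcript is used; otherwise the typed text.
function toAgentText(message: IncomingMessage, transcript?: string): string {
  if (message.voice !== undefined) {
    return transcript || "";
  }
  return message.text || "";
}

// Mirror of the If node condition: a voice object exists, so answer in audio.
function replyMode(message: IncomingMessage): ReplyMode {
  return message.voice !== undefined ? "audio" : "text";
}

// Usage with hypothetical sample messages.
const typedMessage: IncomingMessage = { text: "Hey there" };
const spokenMessage: IncomingMessage = { voice: { file_id: "hypothetical-file-id" } };
const typedInput = toAgentText(typedMessage);
const spokenInput = toAgentText(spokenMessage, "Hey, how are you?");
const typedReply = replyMode(typedMessage);
const spokenReply = replyMode(spokenMessage);
```

Normalizing the input first means the agent and everything after it only ever read one field, whichever route the message took.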

The audio reply says: "Hey Lakshit, I'm here and ready to help you out with anything you need. Since I'm an AI, I don't really do things unless you ask. So right now I'm just waiting to chat with you." So now we are done with the text part and the audio part. The other thing remaining is the image part. I hope you like the video so far, so please press the like button, subscribe to the channel, and comment your feedback and suggestions. It will help me a lot.

So let's go for the image part. I think you've got it by now: we need another switch rule that identifies whether a message is voice, text, or image. For that, let's send an image first. You can see we are sending this image; let's not write anything with it for now. Let's send it. As you can see, the image reached the Switch and it failed, because the message is neither voice nor text. So we need to add another routing rule that identifies whether it is a photo.

You might notice that there are multiple photos: photo 0, 1, 2, and 3. But we only sent one photo, so why are there several? By default, Telegram sends the same photo in several different resolutions. Photo 0, photo 1, photo 2, and photo 3 are all the same photo in different resolutions, and you can use whichever you want. You can see photo 0 is the smallest, with a width of 1951. Photo 1 is bigger than that, photo 2 is bigger again, photo 3 is bigger still, and photo 4 is the highest quality. Depending on the resolution you send, there will be multiple resolutions there. If my file had a lower resolution, photo 4 might not exist. So we want photo 2, which will almost always be there.

We could drag and drop photo 2 here and check whether it exists, but we won't use photo 2 for the rule. Instead we'll use the photo array and check whether the array exists. So we go to Array, search for "exists", and if it exists, we name the output "image". Let's do the test step, and we can see the message lands on the image output. So now we have a third rule for the image part.

The first thing is to get the image. For that we'll go to the Telegram node, and we can see we have not renamed the earlier one. That is a very bad practice. I did it here too, but you should never do that, because naming not only helps you, it also helps anyone else identify what a node does. So let's right-click and rename it. What does it do? It gets the file, so "Get file". Okay, we were on the image part. For the image we need to get the image, so let's add a Telegram node and search for Get a File. We'll keep everything the same and just add the file ID. For the file ID, we'll drag and drop the photo 2 file ID here.
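Because the number of entries in the photo array depends on the resolution of the uploaded image, a fixed index such as photo 2 can be missing for small images. A minimal sketch of a more defensive choice is shown below, assuming the array is ordered from smallest to largest as described above. The `pickPhoto` helper and the sample sizes are hypothetical; `file_id`, `width`, and `height` are the fields each photo entry carries.

```typescript
interface PhotoSize {
  file_id: string;
  width: number;
  height: number;
}

// Prefer the requested index, but fall back to the largest size that exists.
function pickPhoto(photos: PhotoSize[], preferredIndex: number): PhotoSize | undefined {
  if (photos.length === 0) {
    return undefined;
  }
  const clamped = Math.max(0, Math.min(preferredIndex, photos.length - 1));
  return photos[clamped];
}

// Usage with hypothetical sample data (not values from the video).
const fourSizes: PhotoSize[] = [
  { file_id: "size-0", width: 90, height: 60 },
  { file_id: "size-1", width: 320, height: 213 },
  { file_id: "size-2", width: 800, height: 533 },
  { file_id: "size-3", width: 1280, height: 853 },
];
const twoSizes: PhotoSize[] = fourSizes.slice(0, 2);

const fromFour = pickPhoto(fourSizes, 2);
const fromTwo = pickPhoto(twoSizes, 2);
const fromNone = pickPhoto([], 2);
```

With four sizes the helper returns index 2 as in the walkthrough; with only two it falls back to the last entry instead of failing on a missing file ID.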

Let's do the test step. Now you can see it says "wrong file ID or the file is temporarily unavailable". Let's see why. It is giving this error because the wrong account is selected, so we need to switch it to Agent Vision. Let's test again, and as you can see, we got the proper file. Now that we have the file, we need to identify what is inside it.

First let's rearrange things. There is a handy feature here: we can click this brush tool, which tidies up the layout for you. Then let's work out what is inside the image. For that we'll add an OpenAI node. Inside it we search the image actions, Generate an Image and Analyze Image, and we'll choose Analyze Image. The prompt asks what is in the image. We'll customize it later; for now, let's keep it as "What's in the image". Next is the URL. We don't have a URL, we have a binary file, so for the input type we choose Binary File. Let's keep everything else the same and do the test step, which should give an error. And we got an error, but not the one I meant: first we need to select the model. Let's select GPT-4o and run the test step again. It gives an error, and this time it says that the file has an invalid MIME type. I don't know how it is pronounced, but it is spelled M-I-M-E. Why is it giving this error? Because OpenAI wants the MIME type to be something like image/jpeg, but Telegram is giving me an application/octet-stream file.
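One way to repair a MIME mismatch like this is to derive the type from the file extension. The sketch below shows that idea in isolation. It is not the snippet used in the video, which the transcript does not show; the `BinaryMeta` shape, the `fixMimeType` helper, the extension table, and the sample file name are all assumptions for illustration.

```typescript
// Hypothetical description of a file: just its name and declared MIME type.
interface BinaryMeta {
  fileName: string;
  mimeType: string;
}

// Small, assumed lookup table; extend it for other image formats as needed.
const MIME_BY_EXTENSION: { [extension: string]: string } = {
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
  gif: "image/gif",
  webp: "image/webp",
};

// Replace the declared MIME type when the extension is one we recognize.
function fixMimeType(file: BinaryMeta): BinaryMeta {
  const dot = file.fileName.lastIndexOf(".");
  const extension = dot >= 0 ? file.fileName.slice(dot + 1).toLowerCase() : "";
  const known = Object.prototype.hasOwnProperty.call(MIME_BY_EXTENSION, extension);
  return {
    fileName: file.fileName,
    mimeType: known ? MIME_BY_EXTENSION[extension] : file.mimeType,
  };
}

// Usage with a hypothetical file name.
const repaired = fixMimeType({ fileName: "file_42.JPG", mimeType: "application/octet-stream" });
const untouched = fixMimeType({ fileName: "notes.bin", mimeType: "application/octet-stream" });
```

Unknown extensions are left unchanged on purpose, so the function never guesses a type it cannot justify.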

So we need to convert this MIME type to a proper format that OpenAI accepts. For that we'll add a Code node. Search for Code and paste in this code, which I found online. You don't need to understand it in detail; it is a simple snippet that takes the file's extension and sets the matching MIME type. Let's do the test step. As you can see, earlier it was application/octet-stream, and now it is converted to image/jpeg, which OpenAI accepts. So it is working properly.

Now let's go to the OpenAI node and rename it. Yes, I'm forgetting again, but don't you forget. I also try to do it; we learn together, that's my motto. Let's rename this one, and this one as well. Now let's analyze the image and see what it gives. As you can see, it returns a proper message.

Now we need to send this message to Telegram, so we'll connect this node to the send node. But you will see a discrepancy: in the Telegram node it is not working properly. It is expecting json.output, but we are sending a different field; the analysis comes back as content. So we need to convert it into output. We'll add an Edit Fields node; let's rename it "Set field". We take the content from OpenAI and map it to output. Test the step, and now we get a proper output, and inside the Telegram node you can see it is working properly.

The only remaining problem is the chat ID. We can get the chat ID from the Switch node, so let's remove the old value and add the chat ID from there. I think we need to run everything again so that it runs properly. So let's test the workflow, go to Telegram, attach the image, and send it. Inside the workflow you can see it moving from node to node, and in Telegram Web we got the proper message.

Now, how do we ask a question about the image? Let's send the image again and write a message with it this time. Then we run the Telegram trigger node. Inside the Telegram trigger output we can see that each image comes with a caption field at the end. So we are getting a caption, and we need to ask OpenAI the specific question that is in the caption. We'll go to the Analyze Image node, and instead of hard-coding "What's in the image", we switch the field to an expression. Let's make the editor bigger, then go to the Telegram part and drag and drop the caption. Since the caption is optional, it won't always be there. So we add a double pipe (||) and a default message, which could be "What is in the image?" If the caption is missing, the default will be used, so this will always work.
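The double-pipe fallback works because a missing caption is a falsy value, so the expression falls through to the default. A minimal standalone sketch of the same idea follows; the `imageQuestion` helper and the extra whitespace trimming are additions for illustration and are not part of the expression built in the video.

```typescript
const DEFAULT_IMAGE_QUESTION = "What is in the image?";

// Use the caption as the question when present, otherwise the default.
// Trimming (an added assumption) also treats a whitespace-only caption as missing.
function imageQuestion(caption?: string): string {
  const trimmed = (caption || "").trim();
  return trimmed || DEFAULT_IMAGE_QUESTION;
}

// Usage with hypothetical captions.
const withCaption = imageQuestion("Which color is in the background?");
const withoutCaption = imageQuestion(undefined);
const blankCaption = imageQuestion("   ");
```

Note that `||` also replaces an empty string, which is the desired behavior here: an empty caption should not be sent to the model as the question.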

Okay. Let's go through it again from the Switch node. We get the file, convert the MIME type, and analyze the image. Let's do the test step. As you can see, we're getting a proper message here. Let's run the Edit Fields step, then send the message to Telegram. As you can see, we are getting the proper message. So now everything is working properly: the agent works with voice messages, with text messages, and with image files plus a custom caption, or whatever question you want to ask.

So let's run everything from scratch and see how it works end to end. Let's put everything into production mode: turn on the Active toggle, and we'll see everything working in production instead of coming back again and again and clicking Test Workflow.

Let's start with normal text. We have written this message, and it replies, "Hey Lakshit". That's my name. "I'm ready to chat. Your name is Lakshit. Thanks for asking. How can I assist you?" Now let's talk to it by voice: "Who told you that my name is Lakshit? Is there something fishy going on over here?" So we have sent this message.

Let's wait for the reply. As you can see, we got a reply. Let's hear it: "No worries, Lakshit. You mentioned your name earlier in our conversation. Specifically, you shared it in the information you see." See how smart it is. It can tell that I shared my name earlier in the conversation, or rather that it is getting my name from Telegram. Now let's share the image and ask which color is in the background. Let's send it. We got a sweet and simple message: "The background color is dark blue." How smart it is.

This is only the tip of the iceberg of what you can do with Telegram integrated with an n8n workflow. And it's not just Telegram: it could be integrated with WhatsApp, with Slack, with Gmail. You think of it, and it can be done. This is just my contribution, showing you the possibilities in n8n, and I'll be giving you many more ideas if you stick with my channel. So let's have a long-term relationship, because I'm going to make many different kinds of videos, some that neither you nor I can imagine yet: whatever new technology comes, I'll make a video on it. So stick with my channel, send it to friends who are learning this kind of automation, subscribe, like, and comment your feedback and suggestions. We'll see you in the next video. Till then, take care.

[Music]

2025-05-03 07:46
