CIS30C Lab 13: Find Geolocation, PDF metadata, PDF image, Website technology, Firefox data in Python
Hi Michelle, how are you? Sorry, it's late; I was just answering some questions from my colleagues. Okay, good, I'm glad you're doing well. I think you picked up your medal from the Inland Empire Mayor's Cup, they told me, right? Okay, let me get the transcript ready and then we'll go through the lab live.

I don't think so; I think it's probably plated, if anything. I think Abraham, or someone, was probably joking, but I don't think it's gold. You can do the gold test, though: jewelry stores will swipe it on a stone, or they have chemical tests, and they can tell you whether it's gold or not. I don't think it is. When I buy medals for cyber camp they're heavy too and they look gold, but they're not.

Also, I just posted a job for the City of Yucaipa. There's a part-time position there, mostly IT, so I put the link in case you're interested; you can check that out.

Okay, let's take a look at the lab. Today we're going to cover all the concepts we touched on in the chapter. For geolocation, I wanted to do it a little differently than the book: I wanted to use a module that gives you the actual physical address of a place and its coordinates. The book uses IP addresses, which is great; you can use that to look up systems and things like that, and it uses GeoIP2. You saw plenty of examples of that during the assignment, so I want to use this opportunity to show you how to do actual geolocation, using some of the same things Google and other companies use.

Let's share the screen. Geocoding, to start, is how we get the location information for a place and the coordinates of that location. You can also do reverse geocoding, where you use the coordinates to get the address, so it works both ways, and the script is fairly short using geopy. You can access the geopy project information and the documentation through the link I put in. It's used for web services, web applications, and other things, in some cases even mapping.

First we need to bring in the package, the module, with pip install geopy. It uses what's called the Nominatim API, which draws on OpenStreetMap to give you the coordinates of the actual address. In the code you'll see that we import that class, use it to instantiate our object, and then look up the address. Keep in mind that you shouldn't put a comma between the street and the city, because it wasn't able to parse the address with a separator. So when you type out the address you don't need the ZIP code; you simply put the street number, the street, and the city, without any commas.

Let me restart my virtual machine; I installed the new add-ons for it and now it's not cooperating. Let's break this down a little bit. After you install geopy, you write a short script that uses the geocoder. We bring in the Nominatim class and instantiate an object called geolocator. For location, we pass our address as an argument to the geocode method, which pulls the latitude and the longitude. When you print, it gives you the latitude and longitude plus additional information, like the county the address belongs to, and so on.
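A minimal sketch of that number three script, assuming geopy is installed and Nominatim's public OpenStreetMap server is reachable (it expects a descriptive `user_agent` and rate-limits heavy use). The `user_agent` value, the sample address, and the hypothetical `normalize_address` helper, which applies the no-commas advice above, are illustrative placeholders rather than part of the lab handout.

```python
"""Forward geocoding: street address in, coordinates out (geopy + Nominatim)."""


def normalize_address(address: str) -> str:
    """Drop commas and collapse extra spaces, following the lab's no-separator advice."""
    return " ".join(address.replace(",", " ").split())


def geocode_address(address: str, user_agent: str = "cis30c_lab13"):
    """Return a geopy Location for the address, or None if nothing matched."""
    # Imported inside the function so normalize_address works without geopy installed.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    return geolocator.geocode(normalize_address(address), timeout=10)


if __name__ == "__main__":
    # Placeholder address; substitute the address from the lab instructions.
    location = geocode_address("350 Fifth Avenue New York")
    if location is None:
        print("No match found")
    else:
        print((location.latitude, location.longitude))  # tuple of two floats
        print(location.address)  # display name, often including county and ZIP
        print(location.raw)  # the raw OSM record as a dict
```

Run it with python3 and the file name; if the address can't be matched, `geocode()` returns None instead of raising, which is why the script checks for it.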
So in just a few lines we can get some information about a physical address. Let's see... my virtual machine was rebuilt and it's not booting; hold on one second, let's try this again. While that's going: once you've written the script, you run it by invoking python3 with the file name, and you'll see the coordinates in the output. Okay, now it's starting; earlier it didn't even want to boot.

Any questions regarding geopy? You got an error? What kind of error? A connection timeout? Okay, let me try it online and I'll let you know in a second. Maybe something's off with their server; when I tested this yesterday it was working fine and I was able to get output. I still have the script on here, so let's run it with python3. Yes, I'm able to get output. Let me open my file in nano to see if there's an error on the document's side. You should see something like this: I used the school address, so it shows the street, the county, California, some ZIP code information, and then the coordinates. I also printed location.address.

Yes, I'm using the script from number three, that's correct; I just added this line, and I can comment it out so you can see it's exactly the same as number three. So I think you should check your connection or your network configuration. I'm getting the coordinates here, and number six uses the same setup, so I was able to run number six as well.

Here's my script for number three: from geopy.geocoders import Nominatim; geolocator is a Nominatim object with the user agent set to my app's name; then location is geolocator.geocode with the address, the same exact thing. Make sure your address doesn't have any commas or other symbols. Then I print the latitude and longitude, then location.raw, and you can also print location.address.
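Number six runs the same lookup in reverse. A sketch under the same assumptions (geopy installed, Nominatim reachable): `reverse()` accepts a `"latitude, longitude"` string or a tuple and returns the nearest OSM address. The coordinates and the hypothetical `format_query` helper are placeholders; paste in the pair your forward script printed.

```python
"""Reverse geocoding: coordinates in, nearest OSM address out (geopy + Nominatim)."""


def format_query(latitude: float, longitude: float) -> str:
    """Build the "lat, lon" string that Nominatim.reverse() accepts, after a range check."""
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        raise ValueError("latitude must be in [-90, 90] and longitude in [-180, 180]")
    return f"{latitude}, {longitude}"


def reverse_geocode(latitude: float, longitude: float, user_agent: str = "cis30c_lab13"):
    """Return the Location nearest to the coordinates, or None if there is none."""
    from geopy.geocoders import Nominatim  # deferred so format_query needs no geopy

    geolocator = Nominatim(user_agent=user_agent)
    return geolocator.reverse(format_query(latitude, longitude), timeout=10)


if __name__ == "__main__":
    # Placeholder coordinates; use the latitude/longitude from the forward lookup.
    location = reverse_geocode(40.7484, -73.9857)
    if location is None:
        print("No address found")
    else:
        print(location.address)  # often the street, not the exact house number
        print((location.latitude, location.longitude))
        print(location.raw.get("place_id"))
```

As the lecture notes, the address that comes back is the nearest mapped feature, so it may be a nearby street rather than the original house number.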
If I add print(location.address) like this and clear the screen so you can see, then invoke python and rerun the file, it gives me the additional address.

Yes, in the first print line this one has double parentheses and the other one doesn't. Let me edit that and see what happens if I take the extra parentheses off. It still produces the same thing, so you don't have to use the double parentheses there; I think the single ones work just fine. Oh, the single ones did not work for you? I just modified mine to use single parentheses, so let me show you. If you're using nano, I spaced out the print so it color-codes, and I also added location.address. That's fine, you can keep the double parentheses. I don't know why the single ones didn't work for you; it could just be the Python version. Both forms are valid in Python 3: print((location.latitude, location.longitude)) passes one tuple, so the output appears in tuple form, while print(location.latitude, location.longitude) passes two values and prints them separated by a space. Let me clear it and run it; we get the same coordinates both ways.

The output gives you the county information. The ZIP code might not be exact; I think it's just going to be in that area. It also gives you additional details, because here you're actually using the OSM (OpenStreetMap) database to pull the address coordinates. This is how we can incorporate an API into a Python script and use it to look up the exact coordinates of a physical location or place. So take a screenshot, put down the coordinates for the address, and add some information about what's displayed, like the county and some of the characteristics of that physical place.

For number six we simply reverse it: we use the coordinates to see whether we get the same address back, or at least the street, if not the exact physical address. This part uses the reverse method of the same module. Again you instantiate the object using the Nominatim class; the difference from the last script is that you call the reverse method when you define your location. Using reverse, we find the address from the coordinates, print the actual address, and then print the latitude and longitude again, so you just edit what your location object outputs.

That's the code, but you can do a lot more with this module. If you look at the geopy documentation online, you can use it with adapters, you can compute distances, which is good for mapping in an application, and you can see how it can be integrated with cloud and other web services. There are a lot of different elements you can use; I just started with the geocoder example so you can see the basics.

For the reverse geocoder, once we have the script we save it and run it. It tells you the place ID, which should be close to the last one, and it shows a nearby street rather than the exact address. So even if you're using the exact coordinates, the result is going to be close enough, in proximity of the actual physical address. That's how you use the reverse geocoder. After you run the script, take a screenshot. For us it gave the general address, and the printed address doesn't include the house number, just the street; same thing with the other one, you get the street. Depending on the region, some coordinates work really well for finding the exact address, and in other cases, even using the exact coordinates from the last script, we still get something nearby, not the exact address. Any questions?

All right, next we're going to talk about metadata; I think we saw a little of this in the assignment. With PyPDF2 we can obtain data related to a PDF: author information, title, subject, number of pages, and we can even have the script read part of the actual file. You can use any existing PDF that has many pages; what I used is a free cybersecurity book, so click this link, and you'll want to use the browser in your virtual machine to do that. When you download it, it goes to the Downloads folder, and of course you can launch Firefox from the terminal to visit the site. Once it's downloaded, if we check out the file we see that it has images and many pages, 317 like you said, and we can access a specific page using a script.

Back in my terminal, I want to make sure PyPDF2 is installed, so let me clear this and run pip3 install PyPDF2. I had already installed it, but it may take a few minutes if you haven't. Once it's installed I write the script. We import PdfFileReader from PyPDF2, which lets us access the file, determine the number of pages, and pull some text. It's a class, so we instantiate an object called reader, using PdfFileReader to access the PDF file. Make sure the file name is typed exactly as the file is named; if you rename the file, make sure it matches this part. Then we define the number of pages as reader.numPages, which gives the total number of pages in the document. Another object, page, reads the first page; the index is zero. Usually the first page has some general information about the book, or art, or other design elements. Then we define an object called text and use it to extract the text. We print the number of pages, the first page, and the text. I didn't give you a name for that script, so let me see: I labeled it pdfreader.py.
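A sketch of pdfreader.py together with the image check from the next part, assuming a recent PyPDF2. In PyPDF2 3.x the `PdfFileReader`, `numPages`, `getPage()`, and `extractText()` names were removed (they raise a deprecation error), so this uses `PdfReader`, `len(reader.pages)`, `reader.pages[0]`, and `extract_text()`, and reads title, author, and subject from `reader.metadata`. The image part walks each page's `/XObject` resources for `/Subtype /Image` and, like the lecture's script, falls back to a `.png` name when the filter doesn't imply an extension. The helper names are hypothetical, and images inside form XObjects or inherited page resources are not counted.

```python
"""PDF metadata, first-page text, and an image check with PyPDF2 (2.x/3.x API)."""
import sys

# Map common PDF image filters to file extensions; anything else falls back to .png.
FILTER_EXTENSIONS = {"/DCTDecode": ".jpg", "/JPXDecode": ".jp2", "/CCITTFaxDecode": ".tiff"}


def image_extension(pdf_filter) -> str:
    """Pick an extension for an image XObject from its /Filter entry."""
    if hasattr(pdf_filter, "get_object"):
        pdf_filter = pdf_filter.get_object()  # resolve indirect references
    if isinstance(pdf_filter, list):  # a filter chain; the last one gives the final format
        pdf_filter = pdf_filter[-1] if pdf_filter else None
    return FILTER_EXTENSIONS.get(pdf_filter, ".png")


def clean_text(raw: str) -> str:
    """Collapse the line breaks and spacing that extract_text() leaves behind."""
    return " ".join(raw.split())


def list_images(reader, max_pages=None):
    """Yield (page_number, suggested_file_name) for each image XObject found."""
    total = len(reader.pages)
    limit = total if max_pages is None else min(max_pages, total)
    for page_number in range(limit):
        page = reader.pages[page_number]
        if "/Resources" not in page:
            continue
        resources = page["/Resources"]
        if "/XObject" not in resources:
            continue
        xobjects = resources["/XObject"]
        for name in xobjects:
            obj = xobjects[name]
            if obj.get("/Subtype") == "/Image":
                ext = image_extension(obj.get("/Filter"))
                yield page_number + 1, f"page{page_number + 1}_{name.lstrip('/')}{ext}"


def main(argv) -> int:
    if len(argv) != 2:  # same argument check as the lecture's script
        print(f"Usage: python3 {argv[0]} <file.pdf>")
        return 1
    from PyPDF2 import PdfReader  # deferred so the helpers import without PyPDF2

    reader = PdfReader(argv[1])
    info = reader.metadata  # None when the PDF has no document-info dictionary
    if info is not None:
        print("Title:  ", info.title)
        print("Author: ", info.author)
        print("Subject:", info.subject)
    print("Total pages:", len(reader.pages))
    print("First page text:", clean_text(reader.pages[0].extract_text() or ""))

    images = list(list_images(reader))
    if images:
        for page_number, file_name in images:
            print(f"Image on page {page_number}: {file_name}")
    else:
        print("No images found")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage: `python3 pdf_sketch.py book.pdf` (both names are placeholders). Pass `max_pages` to `list_images` to limit how much of a large PDF is scanned.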
I'm going to open that in nano to make sure. Once we have that, we save it and invoke python to run it. It prints the total number of pages, 317. The first-page output has some numerical values that represent the design elements that go into that page. To extract the text we print the text object, and it gives me the title text, but compared to what's actually on page one the words come out in a different order: the output shows "Security" first and then "Cyber for Beginners". I think when they designed this book, that word was in a separate text box, added at a separate time, which is probably why it shows up first instead of the other way around. So this is one way to use PyPDF2 to get text data and other details about a PDF. After you run the script and get the result, take a screenshot and answer the question: what text information is displayed? Any questions?

Make sure you have these two done. The image part is a little longer, and I'm using code provided in the documentation; your textbook has something similar. To pull images we use the same module, but images are more complex: you have to access each image and then treat it as a certain type of file. We import sys, so we use the sys module as well, and from PyPDF2 we look at different attributes of the image. Inside a function we have a reader object, and, similar to the last exercise, the PDF file reader reads the PDF. We can choose how many pages to read. In the next part we set up how each image is treated as an object: we use its extension to access the object, and if there's no extension, the file is named after the object with .png. If there's no image, it prints "No images found." So this one doesn't go into much detail; it just checks whether the PDF has at least one image. In main, if the number of command-line arguments isn't two, it prints the usage message and exits; for the PDF path we simply use the argv part of the sys module.

Let's open the PDF image script in nano; that's the script. After saving it, I invoke python to run it. The output tells me there are images in the PDF; if there weren't, it would say no images were found. That's true: if you look at the document, it has images and screenshots. Any questions? For this script, save it, run it, take a screenshot of the result, and explain how the module is used in the program. As you run it, you'll see that it tells us the PDF has images.

All right, let me clear this. The next part goes over website technology, and the book mentions two tools, BuiltWith and Wappalyzer; I'm missing an "a" in Wappalyzer in the lab, so I'll fix that and re-upload it. Both tools have websites and Python integration through an API, and we can use them to detect what was used to build or host a particular website. You might see server information, OS information, and other details related to the technology used to make the website.

First, let's test it in a regular browser. Go to the BuiltWith website and check wordpress.com; I just type https://www. followed by the domain, and the results depend on the tool you use and the content of the site. Note that it gives you a warning: wordpress.com is on its misleading-profile site list, because they use multiple domains to support the site. Then it breaks down the various things they use, such as analytics: you have Facebook domain sites, Quantcast, Bing, and so on. There are also widgets for mobile platforms and other things. This is a good way to check out what a website is made of.

Which proxies don't work with it? If you're using The Onion Router (Tor) or some kind of proxy or VPN, it won't be able to send back the header information. I just used a regular browser, or you can do it in the virtual machine. I think with that kind of encrypted traffic it won't let you obtain some of the information. Let's try it in Firefox: I'm getting the same thing here too. If you scroll down you'll see more details about what's used. There are quite a few analytics tools; you just have to list some of them, that's fine. A lot of websites gather data, so analytics is right at the top. Answer the questions once you have the details. Are you still not able to connect? Maybe turn off the VPN for the time being. Yes, you can use Windows, that's fine.

Using BuiltWith with Python, there isn't much we have to do, but keep in mind this is the simplest level; we're not digging deeper into the other built-in methods this module offers. Back in my terminal, let's look at the script. It's three lines: we import builtwith, define an object that parses the URL for the domain, and print it out. Very simple. After writing those three lines, we invoke python and run the script. It tells me the web server is Nginx, there's some Google Font and e-commerce information, the CMS is WordPress, the programming language is PHP, and for blogs they use PHP and WordPress. Once you have the output you can answer the question: I have information on the languages, the CMS, and the web server. It won't be as detailed as using the actual service on the website; if we compare, you get more details via the website. And of course the information is the same, because it's the same target, the same URL we use in both 17 and 20.
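A sketch of the three-line builtwith script with a small formatting step, assuming `pip3 install builtwith` and network access. `builtwith.parse()` returns a dict mapping category names (such as `web-servers` or `cms`) to lists of detected technologies; the hypothetical `format_report` helper only sorts that dict into readable lines.

```python
"""Detect the technologies behind a site with the builtwith package."""


def format_report(techs: dict) -> list:
    """Turn builtwith's {category: [technologies]} dict into sorted report lines."""
    return [f"{category}: {', '.join(sorted(names))}" for category, names in sorted(techs.items())]


def detect(url: str) -> dict:
    import builtwith  # deferred so format_report works without the package installed

    # parse() downloads the page and compares the response against builtwith's signature list.
    return builtwith.parse(url)


if __name__ == "__main__":
    # Same target as the web version of the exercise.
    for line in format_report(detect("https://wordpress.com")):
        print(line)
```

Sorting keeps the output stable between runs, which makes it easier to line up against the BuiltWith website results.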
For the next part we use the second tool, Wappalyzer, and you can use Windows for this. We'll use PortSwigger, so take that link and plug it in here. Wappalyzer sometimes requires you to register to get 50 free searches, so if it prompts you for sign-up information, it's because they don't just give it away for free; they want to track usage. I ended up signing up for my account, as you can see, and it gives me this. The technology stack for this site is Windows running IIS, and it also gives you the version. As far as pen testing goes, this is where we'd start, and we can also use our script to find out more about what's really running on the back end of a site, for hosting and functionality, and on the front end, for the development of the content you see. Because our target has a service you can pay for, you see some payment information, plus cloud web services and a certificate manager, so they use AWS for a few things.

Based on the result for the URL on the web service, answer the questions: what OS is used, and what type of server. You have the OS here, the web server is IIS, of course, because it's Windows, and for certificates it uses AWS Certificate Manager for SSL/TLS. If you subscribe to Wappalyzer, they give you more details about the organization and a few more features. That's if you pay for the service, and it's fairly affordable, but we don't need it for our lab; the free version is enough. Again, if it's not giving you anything, it's because you have to sign up.

Then we test this with the Python script to make sure it gives us relevant, correct information, like what we saw on the website. First you have to install it: python-Wappalyzer is the package, so we run pip3 install python-Wappalyzer, and then we write the script for number 24.
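A sketch of the number 24 script, assuming `pip3 install python-Wappalyzer` (the package exposes `Wappalyzer` and `WebPage`). `analyze_with_categories()` returns a dict keyed by technology name, each entry holding a `categories` list; the hypothetical `group_by_category` helper flips that so the OS and web-server entries are easy to spot. `Wappalyzer.latest()` loads the package's bundled fingerprint file unless called with `update=True`, and the warnings seen in the lecture are typically about fingerprint regexes the package fails to compile, so the sketch silences UserWarnings.

```python
"""Fingerprint a site's stack with python-Wappalyzer."""
import warnings


def group_by_category(results: dict) -> dict:
    """Invert {technology: {"categories": [...]}} into {category: [technologies]}."""
    grouped = {}
    for tech, details in results.items():
        for category in details.get("categories", []):
            grouped.setdefault(category, []).append(tech)
    return {category: sorted(techs) for category, techs in sorted(grouped.items())}


def analyze(url: str) -> dict:
    # The lecture's run printed warnings but still completed, so hide UserWarnings.
    warnings.filterwarnings("ignore", category=UserWarning)
    from Wappalyzer import Wappalyzer, WebPage  # deferred import

    webpage = WebPage.new_from_url(url)
    wappalyzer = Wappalyzer.latest()  # bundled fingerprints; update=True fetches newer ones
    return wappalyzer.analyze_with_categories(webpage)


if __name__ == "__main__":
    # PortSwigger was the lab target; the bare domain tends to work better than a page path.
    for category, techs in group_by_category(analyze("https://portswigger.net")).items():
        print(f"{category}: {', '.join(techs)}")
```

A technology listed under several categories appears once per category, which mirrors how the Wappalyzer website groups its results.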
I have that already pre-written, so let's open it up. We're using the API: you import Wappalyzer and WebPage from the package. Using the WebPage class we instantiate an object called webpage, and as the parameter we pass the target URL, in this case PortSwigger. Then we define a second object, a Wappalyzer object, using the latest() method. To analyze the web page, we print it with the categories for that page. Once the script is saved, I invoke python to run it, and the result is similar to what you saw in the other tool. In this one it did print some warnings, but it still went through. It shows me the type of web server, the operating system is Windows Server, and then everything is broken down into categories. It's not as detailed as the web version from the prior exercise, but it gives you the general information. If you want to check what OS a domain is running, there are multiple tools in security that can do that, and this is one of them. Not only that, we can use it to see what kind of tools they use to support the website or add elements to it. Compare this to BuiltWith: BuiltWith gives you a little more on things like the programming language and other details. Every tool is slightly different, but they all give you the general picture.

That was 24. Once you get the output for 24, take a screen capture, check whether it gives you the right OS, which is Windows Server, and answer that question; then check whether it displays the server type, which is IIS 10.0. If you compare BuiltWith with Wappalyzer, the interface and functionality differ, but the goal of each is the same; like I said, the output is a little different.

Is yours slow too? Mine was having the same issue, and I ended up signing up for a free account, which made it a lot faster. Also try using the URL without the extension, like this: copy the URL without the page extension. Sometimes that works better, because in general it looks at the index page of the website to give you a summary of the technology used there. Any questions? Once we complete that step for Wappalyzer, we answer the questions and provide a screen capture.

There's a small section in the chapter about browser metadata, where we do some forensics on the browser with a tool called firefed, which is for the Firefox browser. Since we're using Debian Linux, Firefox is commonly pre-installed, as it is on Ubuntu, but we need to install firefed. It's a command-line tool that inspects Firefox profiles, including history, cookies, and stored passwords. What it does is retrieve the files the browser saves about its activity, so you can identify how the browser was used. I just want to let you know that when I ran firefed with the other commands, it produced some tracebacks, so the module itself has a bug; it's a little buggy. That's why I don't want to extend this part further, but I want you to know there's a tool you can use for this. It extracts passwords, preferences, add-ons, history, favorites, stuff like that.

We run pip3 install firefed. I installed it already; you can also run it with the upgrade option, pip3 install --upgrade firefed, so let's try that too. It's already installed for me, but yours might take a minute or so. Then, just like any command-line tool, we start with help, which gives you the list of commands firefed accepts. Try some of the commands: it tells me the version is fine. If you're not sure how to use it, go to the documentation; they show examples, for instance if I want to see bookmarks. Let's try it. I get the same thing, so I think the module as written still has issues. I also cloned it and navigated to the directory, and I ran into the same thing. firefed cookies: see, this works. It's handling all these exceptions, coming from your site-packages or from the clone itself, so stability-wise this tool isn't fully stable, but I want you to know it's there. We can do the same thing to test history, using the summary. I think there's definitely something wrong with this package, or we might be missing something from the install. Any questions?

All you need to do for this part is access the help, see how you would use some of the commands, and answer the question. If I want a summary of browser cookies, I use firefed cookies --s for the summary, and we do the same thing for history with firefed history; you can see it's handling the exception again. I think some of the components this application needs have an issue, so you'd note that it's not working here with the cookies and history commands. But if I do firefed cookies -h, you can go a little deeper and see what options that command has, so answer the question for 27 that way, and I can also do firefed history -h. They say you can also run this in a shell on the Windows side, but since we're using Linux for all these labs, I want to use Linux straight up. I did clone it too, to see whether it works: it's here, and if I navigate into the firefed folder and list it again, it has a subfolder, so we can find out what's missing. It's not a fully fleshed-out tool yet, it's not perfect, but the book mentions that it's a good tool for Firefox.

So that's our lab for today. If you have all the questions answered, the screenshots, and the scripts run, you're set; it covers all the components we talked about in the chapter. Next week we'll do encryption, and maybe I'll combine the lab and assignment together, and then we'll do some final review. If not, we'll finish the chapter next week, since that will be our last chapter, and in week 15 I can do a final review in one session and give you the rest of the week off to prepare for the final exam and work on your project. Any questions?

For the final project, do you just make a simple HTTP server? Yes, that's fine. If you want to test a login, like brute-forcing it, you can create an account, but you don't need to. Just run web services on the local machine and test them there; you don't need any crazy networking. I think we did a lab where we enabled an HTTP service using Python, so you can reference that lab and then implement monitoring and security tools on that server, meaning you can write the scripts for that. Any questions? Okay, let me stop recording.
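For the final project, the earlier lab's Python HTTP service is a natural base. A hedged starting point using only the standard library: it serves the current directory on localhost and counts responses per client IP, printing an alert once a client passes a threshold, a crude brute-force indicator. The port, the threshold of 20, and the alert format are arbitrary assumptions, not course requirements.

```python
"""Localhost HTTP server that flags clients making unusually many requests."""
from collections import Counter
from http.server import HTTPServer, SimpleHTTPRequestHandler

REQUEST_THRESHOLD = 20  # hypothetical per-client limit before an alert is printed
hits = Counter()


def record_hit(client_ip: str) -> bool:
    """Count one request from client_ip; return True once it exceeds the threshold."""
    hits[client_ip] += 1
    return hits[client_ip] > REQUEST_THRESHOLD


class MonitoringHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory and counts responses per client."""

    def log_request(self, code="-", size="-"):
        super().log_request(code, size)  # keep the normal access-log line on stderr
        client_ip = self.client_address[0]
        if record_hit(client_ip):
            print(f"[ALERT] {client_ip} has made {hits[client_ip]} requests")


if __name__ == "__main__":
    # Bind to 127.0.0.1 so the lab server is reachable only from this machine.
    server = HTTPServer(("127.0.0.1", 8000), MonitoringHandler)
    print("Serving on http://127.0.0.1:8000 (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Hooking `log_request` rather than `log_message` means each response is counted once, including error responses sent through `send_error`.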