Successful communications intelligence (COMINT) data analysis in 5 easy steps
Hi everyone. Thanks for joining us today for this on-demand webinar about the five components required for successful communications intelligence analysis. So on today's presentation, I'm delighted to say that we're joined by some real experts in the world of law enforcement data analysis. And before we dive in, it's my pleasure to introduce them. So firstly, we have Giuseppe Francavilla. Giuseppe has a distinguished career and background in the world of law enforcement data analysis technologies.
He began his career in the Italian financial police, before moving to the commercial sector with roles at IBM and BAE Systems, and several happy years also at Cambridge Intelligence. He's currently a consultant for Sistemi & Automazione, applying his experience to their extensive range of data analysis products. We're also joined today by Valerio Fumi. Valerio was a criminal lawyer and an intelligence analyst, and he works as a training consultant for Sistemi, helping their customers to extract intelligence from call data records and social media data.
And he also has experience as a trainee judge working on a number of cases covering white collar crime, drug trafficking and the mob. So not quite as exciting, but my name's David Burnett, I'm the head of sales at Cambridge Intelligence. We are a trusted partner of Sistemi & Automazione. And I have over 20 years of experience working in the cyber sector. Let's quickly go over what we'll cover in this session. So I'll begin by discussing the policing data challenge: some of the misconceptions of how that's done, and then a little bit more on the reality and how modern investigative work is actually carried out.
I'll then pass to Giuseppe, who's going to discuss some of the history of Sistemi & Automazione, the organizations that they work with, the communications intelligence challenge and why it's an increasingly important aspect of law enforcement. He'll then pass to Valerio, who's going to demonstrate some of the concepts we'll introduce with a demo of TETRAS, which is Sistemi's widely used platform for law enforcement COMINT analysis and visualization. Then we'll come back to me.
I'm going to give a little bit more information on Cambridge Intelligence and close today's session off. This session is being recorded, so we won't actually have a live Q&A as we normally do. However, we are of course keen to get people's feedback and questions. If there's anything you'd like to ask me, Giuseppe or Valerio, then just submit your questions using the YouTube channel comment section below and we'll make sure to get back to you. Just to dive into this, I think when most people think of police investigations, they think of something similar to this. I think we've all seen an image like this on TV shows and in films: the investigation wall, which is, you know, a sprawling map of paper and pins joined with bits of string by police officers investigating a single incident.
The reality is nothing like that. As you can see here, the reality is much more high-tech.
So whether it's call data records, open source intelligence like online social media profiles, incident and investigations databases, financial records or body cam data (you can see them all here), police investigations involve massive amounts of data. So police investigators and analysts need to be comfortable with complex data analysis. They need to know how to turn data into the intelligence that they need to either make an arrest or secure a conviction. So this leads us to the COMINT analysis challenge.
COMINT, of course, being communications intelligence. COMINT is an increasingly important part of law enforcement. It allows investigators to connect suspects to incidents, locations and other individuals, but it's not a simple task.
It can be quite challenging. The data involved is big. It's varied, and it's extremely complex.
Printouts can include thousands of data points, from phone calls and cell tower pings to online banking data and internet browsing history. It's a huge technical challenge, before we even get to the analyst experience. So for the analyst, we like to say they're trying to connect people to what I suppose you could call digital shadows. The investigator will often only have an IP address or a subscriber account number to work with.
In reality the human behind those accounts could be anyone. But if you can combine the subscriber information with a handset location or someone's social media history then you can start to build a narrative. It's like putting a jigsaw together. They need to be able to combine all the pieces to understand the full picture.
So there's also the challenge of ever-growing complexity. Crimes are increasingly planned and committed online, which then leaves digital footprints that need to be joined up and interpreted. So piecing together those records is a huge challenge for investigators and analysts. Then, finally, there are the resources. Policing is often actually slowed down by legacy technologies.
So there are all these unintuitive desktop tools that don't work the way modern policing needs them to, combined with limited resources, both people and funds. It's clear why COMINT analysis can be such a challenge. And part of the solution is to give analysts the best possible tools for the task, which is why we've invited the team from Sistemi to join us today. So on that note, I'm going to pass you across to Giuseppe. I'll just stop sharing my screen, Giuseppe, so you can share yours.
Hi David, thanks for your introduction, and also for this opportunity for me to catch up with Cambridge Intelligence. I'm really happy about that.
So, Sistemi & Automazione, or Sistemi, is quite interesting. They are one of the major suppliers of investigative solutions to government, financial and utilities organizations in a wide range of sectors in Italy and in Europe. They are really proud of being the very first to introduce link analysis visualization software into the Italian community 25 years ago. So the graph is really important for them in their architecture, because it performs the function of fusing data coming from different heterogeneous data sets. It could be unstructured data, it could be relational graph databases, social media logs, and this fusion is made possible through the adoption of semantics all across their application.
So, the graph is important, not only because of the fusion, but also because it becomes the main point of interaction with the whole application. So it's not just seen as an output device. It's really the right place for the analyst to use all the functionalities available within the application without having to leave the graph itself.
And this is made possible through a vast range of drop-down menus and other controls within the user interface. We have been talking about the graph because it's the most prominent component. However, we should not ignore that there are other components that contribute to the success of the application itself. So, if we want to use a metaphor, we can think of an orchestra, which is really the combination of a variety of instruments. Each one has got its own characteristics.
Each one plays its role, but they all contribute to the single performance, to the fusion itself, which is the symphony at the end of the day. In the same way, the analyst must be able to leverage different components inside the application for each task he has to perform. Based on our experience, we can say that we have identified five components which are, in a way, mandatory if you want to have a well-performing analytical solution. These components are, first of all, the ability to acquire data, to structure this data and to store this data in an efficient way. The second component is a set of analytics. The third component is what we've been looking at until now, which is the visualization itself, which can be expressed at a connection level, but also through different dimensions, like the time dimension and the space dimension. The fourth component is vital, because every time you achieve some results, you need to be able to share these results with other people and to export some of them into a format that can be used by other applications.
And finally, like in the symphony we mentioned before, all this must be synchronized. It must be aligned, so the information, the data and the applications can run in parallel and ensure the same truth, if you want, in every single task. As an example of the application of these five components, these five principles, we want to quickly show you one of the applications developed by Sistemi, which is called TETRAS. TETRAS is an acronym for telephone traffic system. The application started exactly 20 years ago, when telephones were really simple devices, which is why the name is telephone traffic.
But in reality, personal communication has evolved, and the application has evolved in the same way. So phones now include a variety of different ways to interact, to exchange information. It could be SMS, it could be IP connections, it could be bank connections or anything like that. So TETRAS is able to analyze all these different ways in which communication gets expressed. The application has reached its third version, which is currently deployed at 800 sites all over Italy. And it's basically the de facto solution for COMINT analysis, used by over 2,000 analysts.
Just a brief introduction. It's a client-server architecture. This is important because for instance, when there are tasks that could take longer or require more resources, they are automatically assigned to the server.
In this way the client, the analyst, can get on with other tasks without having to wait for the completion of that input or whatever analysis is being performed in the background. The application reflects the way investigations are carried out, so it's divided into cases and sub-cases. It's a desktop application. However, there is also the possibility of web access, which has been designed specifically with management in mind, because they may want the flexibility to do quick querying in complete freedom. And like David was pointing out before, the real purpose of the application is to be able to combine these, I liked the phrase, digital shadows, these details. It could be email accounts, it could be subscriptions, it could be just the metadata of a call or an IP address connection. And all these contributions really define the profile of a real person, a real subject, a real organization.
So let's look at these five components in detail. So the first one is the data acquisition. So TETRAS is very very powerful from that point of view, because in most cases, this acquisition is completely automatic.
So if you have a printout or a download of CDRs or other forms of communications, and you submit that to the system, it will be imported automatically. We have more than 250 formats, and others can be developed for different data providers by our specialists. The input is powerful also because there are a lot of tasks connected to it. When you import a list of calls, all the numbers included in that list will be normalized. That means they will all be set in a standard format, which is the international prefix, the national prefix, and the number itself. A variety of inputs can be handled, not just CDRs, but also IP network logs and social media logs.
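As a rough illustration of the normalization step just described, here is a minimal Python sketch that splits a raw number into international prefix, national prefix and subscriber number. It is not how TETRAS does it: the function name `normalize`, the tiny prefix tables and the default country are all assumptions made for the example, and a real system would rely on a complete numbering-plan database.

```python
import re
from typing import NamedTuple

# Illustrative subsets only (assumed for this sketch, not a full numbering plan).
COUNTRY_CODES = ("39", "44", "36", "86", "90")
NATIONAL_PREFIXES = {"39": ("347", "320", "06", "02")}


class NormalizedNumber(NamedTuple):
    international_prefix: str
    national_prefix: str  # empty string when the prefix is not in the table
    number: str


def normalize(raw: str, default_country: str = "39") -> NormalizedNumber:
    """Return the number in a standard (international, national, number) form."""
    international = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, brackets, '+'
    if not international and digits.startswith("00"):
        international = True
        digits = digits[2:]

    if international:
        for code in COUNTRY_CODES:
            if digits.startswith(code):
                country, rest = code, digits[len(code):]
                break
        else:
            raise ValueError(f"unknown country code in {raw!r}")
    else:
        # No international prefix given: assume the default country.
        country, rest = default_country, digits

    for prefix in NATIONAL_PREFIXES.get(country, ()):
        if rest.startswith(prefix):
            return NormalizedNumber(country, prefix, rest[len(prefix):])
    return NormalizedNumber(country, "", rest)
```

With this, `+39 347 123 4567`, `0039-347-1234567` and `347 1234567` all end up as the same record, which is what makes later matching across printouts possible.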
You can input a list of cell towers, or lists of people if you want. And in this latest version we have also added the ability to ingest extractions from devices, whether smartphones, tablets or PCs and laptops, done through the most common standards available on the market. The second component, very, very important, is analytics.
We've got a wide range of analytics, but at this point I will hand over the floor to my colleague Valerio, who will go into detail about that. Thank you, Valerio.
So thank you. Thank you, David and Giuseppe, for the floor and for the kind introduction.
I'll start from the pre-analysis tool, which is rightly considered one of the most powerful analysis tools of TETRAS 3. First of all, the software automatically starts a kind of pre-analysis as soon as the import is over. It provides the analyst with a complete overview of the imported data, no matter if it comes from cell towers, a smartphone or any other kind of device. And you will be able to display, according to their occurrences, how prominent targets and elements of interest are in your case, in the files you have just imported. If you decide to drill into the details, you can display your files in the stats tab and immediately figure out how much traffic they have generated and with whom, choosing among several types of charts, like pie charts, histograms, bar charts and so on.
Moving forward to the real analysis tools, we have, first of all, the smart analysis, which is a summary of the needs most often raised by law enforcement and police agencies, which we have collected through our 20 years' experience in this field.
Starting from this, we can perform, as you can see, the simpler kind of query, like the most contacted phone of a target, in an extremely automatic and fast way. But we can also use this tool to perform much more advanced queries, for example to find out whether people have been in the same place at the same time, like in this instance, which I have just performed for you automatically. Connecting to what David and Giuseppe just said, one of the most ambitious achievements of our software was connecting every kind of entity in the same directory: web accounts, emails and phone numbers, but also information coming from social networks, social media, or manual and physical observation by police or law enforcement. And this is what we have achieved through the behavioral analysis in the identity directory. From here, we can finally put the focus not only on devices and calls, but on people and their relationships, by figuring out and displaying their social patterns and their movements, according to the people they have been connected to the most, and the cell towers and the paths they have followed in their movements on foreign and national territories.
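To make the "same place at the same time" query mentioned above a little more concrete, here is a small Python sketch. It assumes each record is a `(number, cell_id, timestamp)` tuple and treats two phones as co-located when they used the same cell tower within a time window. The function name `co_located`, the record layout and the 15-minute default are assumptions for illustration, not a description of the smart analysis in TETRAS.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def co_located(records, window=timedelta(minutes=15)):
    """Return {(cell_id, number_a, number_b)} for phones seen on the same
    cell tower within `window` of each other."""
    by_cell = defaultdict(list)
    for number, cell_id, ts in records:
        by_cell[cell_id].append((ts, number))

    hits = set()
    for cell_id, events in by_cell.items():
        # Compare every pair of events on this tower (fine for small inputs).
        for (t1, n1), (t2, n2) in combinations(sorted(events), 2):
            if n1 != n2 and abs(t2 - t1) <= window:
                hits.add((cell_id, *sorted((n1, n2))))
    return hits
```

The pairwise comparison is quadratic per tower, so for real printouts with tens of thousands of rows a sliding window over the time-sorted events would be the more sensible choice.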
To meet this need to track down people's movements, and not only device movements, we have created a new kind of map visualization that we can start from any target we decide to focus on. I'm going to show you an instance of this by sending just phone traffic records to the map in this case, but we can do that with every kind of information that has some form of geo-reference. I'm going to play this path. So, from here we can display every movement of our targets on the map simply by giving the order to the software, and we can also change the speed according to our needs. From here, you can confirm or challenge the witness's or the criminal's statement by figuring out whether the subject was exactly there or somewhere else. And finally there is a powerful feature that makes giving feedback easier, which is the report function. It allows the analyst to automate, in a standard template, every kind of information.
It can be customized and populated with information not only from phone data and traffic records, and it can be customized either before or after the report is run. And you can distribute it automatically just after the report is run. That was just a short overview of the most important functions of the software. I thank you for your attention. And I'll give the floor back to Giuseppe.
Thank you, Valerio.
I'm going to quickly show you now how everything works together. Okay. So the first thing I'd like to highlight is the fact that we have two viewers available for the analysts. The first one is the IBM i2 Analyst Notebook, which has traditionally been the way to visualize this data. And in the last few years, Sistemi has also been creating an additional viewer through KeyLines. So the analyst can simply choose the one he prefers.
And then once he's made the selection, that viewer will be the standard viewer. Let's look at a simple way to analyze. Let's do a search. We've got an investigation here. I can easily switch investigations by clicking on it.
Now let's look at the people involved in my investigation. I can see the list of people here and I've got some intelligence recently acquired on one of the subjects on which I have very little at the moment. And it's this guy here, so that's Huslam. So we select this person and then we'll basically send this person to the viewer.
There you are, and from the viewer, like I was saying, we can retrieve all the information and perform other operations regarding this person. First thing is I'm going to visualize the type of data I've got in my case, in my investigation. I can see the person is associated with this lady. He has also been associated with these two SIM cards. This is his address, his residence, and finally he is also connected to this phone, to this mobile phone. I can go back to the database and ask for the subscriber of this phone. I can see that this is the person who, according to the printout I've got from the telecom provider, is the subscriber.
I can also ask for other subscribers that could have the same name or a similar name, and it says that I've got four people with a similar name. For these people, I can now query their phones. And here they are. And then for each one of these phones, I can also ask for the printouts, the type of information that has been ingested into the system. So each one of these printouts is available to be viewed in exactly the same way Valerio was showing you before, but I'm doing it in this case directly from the graph. You can see here it's quite a large number of calls included in this printout, 86,000. So I'm going back now.
I'm not interested in this printout for the moment, so I'm going back to this information, and because these are the same people, what I can do is merge them. I can combine them using the combo feature. That's familiar to the people using the Analyst Notebook, okay. And here I've got my guy. So let's look at the intelligence I have about this person, about Sadak.
To progress that, we are going back to the query function. This is a particularly powerful query function because you can query on a huge number of parameters. So in this case, the intelligence I have is about a certain set of calls, which were made by this person between 8 and 11 in the evening.
The calls were fairly short. So I can say I'm only interested in the calls between 10 seconds and 2 minutes. And the other thing is I don't want to see all his calls because, according to the intelligence I have, they were only voice over internet protocol (VoIP) calls. So I do my query and here I get the list of calls. I can see them, but I can also go straight to my viewer and send all these calls to the viewer. So let's see what else I can find out.
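The query just described combines three parameters: time of day, call duration and call type. A minimal Python sketch of the same filter might look like the following. The `Call` record, its field names and the `filter_calls` function are hypothetical and stand in for whatever schema the real query function works on.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Call:
    caller: str
    callee: str
    start: datetime
    duration_s: int
    kind: str  # assumed labels, e.g. "voice", "sms", "voip"


def filter_calls(calls, target, hour_from=20, hour_to=23,
                 min_s=10, max_s=120, kind="voip"):
    """Calls made by `target` that start between hour_from (inclusive) and
    hour_to (exclusive), last min_s..max_s seconds, and are of type `kind`."""
    return [
        c for c in calls
        if c.caller == target
        and hour_from <= c.start.hour < hour_to
        and min_s <= c.duration_s <= max_s
        and c.kind == kind
    ]
```

The defaults mirror the example in the demo: calls starting from 8 pm up to, but not including, 11 pm, lasting between 10 seconds and 2 minutes, VoIP only.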
There is this lady. It could be quite interesting. So let's see what I've got about her. She's connected to Sadak, as we've seen before. She's also connected to other people like Kiril here.
It sounds interesting. It looks almost like something out of the latest James Bond film, but there you are. Kiril is also connected to the same phone. And I can see that both Julia and Kiril have connections with these two phones. Now, if you look at the frame here, it's got the same color, yellow. That means that both phones are part of a list.
Lists are very, very useful because they enable analysts to save a subset of this information according to certain criteria. So these two phones probably belong to the same list. So I can see what is included in this list. There are all these phone numbers, which were called by these users. And there you are.
Here I've got this list of calls. This is interesting for me. This phone has a black frame, and this color is a convention to express the fact that this phone has been involved in more than one case.
So I can visualize the fact that this phone has been involved in three cases, here, in four cases actually. And I can also see whether this one is included in some lists, and it is included in the target list. This is very, very relevant because after each import, I've got the possibility to identify some of the numbers as my targets, specific targets, for the investigation. This could be the most active number or other numbers that I have decided to identify. So here I've got the list of targets, again. This was the number we started from, but I can see there is a connection with this other number and with these other numbers here.
Let me see if this number has also got some kind of lists, which it has, okay. So the list, in this case, is foreign numbers that have been in touch with it. And here, I've expanded this and I can see that these numbers are actually one Chinese number and the other one is Hungarian. Now the fact that I've got some of these foreign numbers here, I can see some Turkish number as well, makes me think: I've been working on foreign numbers in the same investigations before, and I've saved my graph into the database. So what I can do is I can go back to the database and retrieve my graphs and add this graph to my existing graph.
So I can see that there is a match: this expansion mostly refers to the same number. And then there is another set of numbers, here, which are mostly provided by this telecom provider, 'WIND'.
I'm also thinking that I've got two additional pieces of intelligence. These graphs, however, have been produced by another team that has been using Analyst Notebook. So, I'm going now to open these two charts with the Analyst Notebook and look at them. So the first chart refers to two companies that have been identified during the investigation. Okay. This is the first company here. They are shell companies.
So, basically, shell companies are just made up. In reality, they are a way to cover the dodgy activities being performed by these people. One is based in the UK. I can see some Russian names. And, again, here is another company. This one is based in Tunisia, and these are the people.
Okay. So, I'm going now to try and reuse this information in my graph. And this is very easy to do, because what you need to do is simply select the data that you're interested in, and then...
let me see. So we select all the data or part of the data and we just drag and drop this data into my graph. There is a second graph that I'm also interested in merging in my investigation at the moment.
Let me recall the second graph, which is this one, this is purely an import of telecom calls. Again, I can do the same thing. I can select all this data and then drag and drop this data into my KeyLines graph. Okay. And I can close now the Analyst Notebook.
I don't need it anymore. And this is my resulting graph. So on this side, I've got the data that has been charted from my database.
But this is the data related to the two companies. And this is the data related to the printout that has been imported through the Analyst Notebook. Now I would like to point out something that's quite interesting. You see, when you import calls, like in this case, generally you get this type of output where the icons, you know, are pretty much the same because at the end of the day, they're all telephone numbers.
However, look at the way TETRAS works on the data, it's much smarter than that because it not only identifies them as numbers, but also identifies the single telecom provider. And in case of a foreign number, it also identifies where this number is based. It puts the flag on the icon to identify the nationality of that number. And beside this data, I've got lots of additional information that I can consider.
So for instance, in these connections here, I've got 10 calls. So if I go back, I can see these calls individually. And I can do this both for the numbers that I've imported from the Analyst Notebook and for the numbers that I had in my database here. I can see all these calls again.
But I can do more than that, because I've been dealing with all these disparate types of data. What I'm trying to do now is to find some elements in common. And the first one is to identify whether I've got the same phone number more than once in this graph. And if I ask for this, I do get that. So you can see here, these two numbers have been merged. One was from my database.
The other one was from the Analyst Notebook charts that have been imported. Okay. I can do other types of matching if you want. So, for instance, I can say "let me identify all the numbers that belong to the same telecom operator". Okay. And here I am.
So all these ones are provided by WIND, and each one of these numbers still keeps its own identity. So, although they have been combined, I'm still able to perform my activities, to expand this information, and so on. Another matching that I can do is, for instance, by country.
I may want to identify all the phone calls that belong to a certain country and certain nationality, and so on. If I'm happy with it I can save my graph onto the database. We can call it, I don't know, David's investigation. And, so, I'm able to reuse this graph or to send this graph to my colleagues or to merge it with other information.
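The matching steps shown in the demo (same phone number, same telecom operator, same country) can be sketched in a few lines of Python. The node dictionaries, their keys and the function names `merge_same_number` and `group_by` are assumptions made for illustration. They are not the KeyLines or TETRAS API. The point is that exact duplicates collapse into one node that remembers its sources, while operator or country groupings keep every member's identity.

```python
from collections import defaultdict


def merge_same_number(nodes):
    """Collapse nodes that carry the same phone number into one entry,
    remembering which sources (database, imported chart...) it came from."""
    merged = {}
    for node in nodes:
        entry = merged.setdefault(
            node["number"],
            {
                "number": node["number"],
                "operator": node["operator"],
                "country": node["country"],
                "sources": set(),
            },
        )
        entry["sources"].add(node["source"])
    return list(merged.values())


def group_by(nodes, key):
    """Group nodes by an attribute such as "operator" or "country".
    Members stay separate items inside each group, so they can still
    be expanded or queried individually."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node[key]].append(node)
    return dict(groups)
```

Running `merge_same_number` first and then `group_by(..., "operator")` or `group_by(..., "country")` on the result mirrors the order used in the demo.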
Okay, thanks Giuseppe. And thanks Valerio. That was great.
Really interesting. Just a reminder again for everyone that's watching, if you have any questions, please do submit them using the YouTube comment section, and we'll definitely get to those and answer those for you. So, before we wrap up, I'd like to give a little more information on Cambridge Intelligence. So, at Cambridge Intelligence, we build data visualization tools that help to make the world a safer place. From law enforcement to cyber security and fraud detection, we work with organizations around the globe. Every day, thousands of analysts are relying on technology to join the dots in data and uncover hidden threats.
And they do this with our data visualization products for connected data. So using our toolkits, it's quick and easy to build game-changing data visualizations and just deploy them anywhere, to anyone. And as I said, we do this with our toolkits.
And we hope that you, as viewers, found it informative. Thank you for taking the time to watch the presentation.