Cybersecurity: Finding One Needle in Many Haystacks

By: Mark Cummings, Ph.D., William Yeack, CSE

How is it that, after billions of dollars of investment, the bad guys can still get in? The good news is that today’s cybersecurity technology blocks 99 percent of attacks. The bad news is that the volume of attacks is so massive that the remaining one percent that gets through can create a financial and operating nightmare.

For example, in the first half of 2022, there were 236 million reported ransomware attacks in which the bad guys got in. There were likely many more that have not become public. This means that adequate protection relies on quickly finding and neutralizing the one percent. Unfortunately, the indications of an attack are a tiny needle in a vast collection of haystacks of data. The larger the collection of haystacks, the longer it takes to find these needles. But making the collection smaller dramatically reduces the chances of finding them at all, because the needle may be in the data that was left out.
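The arithmetic behind "one percent of a massive volume" can be made concrete with a minimal sketch. The 99 percent block rate comes from the text above; the attempt volumes and the function name are hypothetical, chosen only to show how the number of successful attacks scales with volume.

```python
def expected_breakthroughs(attempts: int, block_rate: float) -> float:
    """Expected number of attacks that get past the defenses."""
    if not 0.0 <= block_rate <= 1.0:
        raise ValueError("block_rate must be between 0 and 1")
    return attempts * (1.0 - block_rate)


# Hypothetical attack volumes, not measured data.
for attempts in (10_000, 1_000_000, 100_000_000):
    through = round(expected_breakthroughs(attempts, block_rate=0.99))
    print(f"{attempts:>11,} attempts -> about {through:,} get through")
```

Even at a constant block rate, every tenfold increase in attempts means ten times as many needles to find.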

There are two paths emerging to solve this problem:

  1. New central site hardware based on emerging massively powerful chips
  2. Non-centralized security orchestration architectures

The optimal solution is likely to be a combination of these two paths. This combination is especially important because any increase in hardware horsepower is likely to be used by attackers as well. Both emerging solutions are briefly described below.

The haystacks

The power of our interconnected world has created amazing advantages. As a result, mid- to large-size organizations have networks of thousands of phones and PCs, hundreds to thousands of servers, tens of private clouds, tens of public clouds, millions of apps and applets, billions of accesses to the apps, and millions of emails, web accesses, app accesses, and so on per day. The metadata describing all of this activity can be very large. Buried in this metadata (many haystacks from all of these many sources) is one piece of data that shows abnormal behavior (the needle). Typical tools in use today gather all of the metadata possible, store it in one place, and then seek to find that needle. The more data gathered, the more likely it is to include the needle. But the more data there is, the longer it can take to find those needles that are the symptoms of a breach.
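The gather-everything-then-search approach described above can be illustrated with a minimal sketch. It is not how any particular commercial tool works: a simple statistical outlier test (median absolute deviation) stands in for a trained model, and the source names, the `bytes_out` field, and the threshold are all hypothetical.

```python
from statistics import median


def find_needles(haystacks, field="bytes_out", threshold=5.0):
    """Merge every source into one pile, then flag records far from the norm."""
    # Step 1: centralize. Every record from every source goes into one list.
    records = [rec for source in haystacks.values() for rec in source]
    values = [rec[field] for rec in records]

    # Step 2: describe "normal" with a robust baseline (median and MAD).
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0

    # Step 3: scan every record. The cost grows with the size of the pile.
    return [rec for rec in records if abs(rec[field] - med) / mad > threshold]


# Hypothetical metadata: two sources of ordinary traffic plus one abnormal record.
needle = {"host": "pc42", "bytes_out": 900_000}
haystacks = {
    "email": [{"host": f"pc{i}", "bytes_out": 1000 + (i % 7) * 10} for i in range(500)],
    "cloud": [{"host": f"vm{i}", "bytes_out": 1000 + (i % 7) * 10} for i in range(300)],
}
haystacks["cloud"].append(needle)

print(find_needles(haystacks))
```

The sketch shows the trade-off in the text: dropping a source makes the scan shorter, but if the needle was in that source it can no longer be found.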

How large can the haystacks get? Industry observers have told me that users of one industry-leading tool have generated so much cybersecurity metadata that they have had to create a second data center the size of their primary operational data center to store it. Thus, there is also additional overhead in collecting, transporting, securing, and storing this information.

With current AI technology, this search process is done through pattern recognition. Inference engines running on trained models look for the patterns that they have been trained to find. There are problems with both the time to train and the time to find. With general-purpose clouds, it can take hours, days, or weeks to find the needle. And that assumes the attacks still follow the pattern the system has been trained to detect, end-user or app behavior has not changed, and the overall system has not changed.

This is the challenge: industry insiders are telling me that, in some cases, the window for catching a successful attack before it takes over your whole network is now less than 20 minutes. This is part of the reason we have seen so many successful ransomware attacks. A senior tech executive at Google, responsible for a Google AI accelerator chip, told me that with their industry-leading chip, it took two weeks to train a system. That is two weeks after all the training data and all the other resources, including programmers, were available. Unfortunately, sophisticated attackers change their attacks frequently. These changes happen as often as every hour, and some come within minutes.

Because of current processing times for both training and inference, AI systems are not able to respond as quickly as we would like. In addition, identifying the symptoms of a breach is only part of the job. Sometimes these symptoms are caused not by a breach but by something else, and are thus false positives. These AI systems leave it to

