Archive for the ‘AI and machine learning’ Category

Improving AI-based defenses to disrupt human-operated ransomware

June 21st, 2022

Microsoft’s deep understanding of human-operated ransomware attacks, which are powered by a thriving cybercrime gig economy, continuously informs the solutions we deliver to protect customers. Our expert monitoring of threat actors, investigations into real-world ransomware attacks, and the intelligence we gather from the trillions of signals that the Microsoft cloud processes every day provide a unique insight into these threats. For example, we track human-operated ransomware attacks not only as distinct ransomware payloads, but more importantly, as a series of malicious activities that culminate in the deployment of ransomware. Detecting and stopping ransomware attacks as early as possible is critical for limiting the impact of these attacks on target organizations, including business interruption and extortion.

To disrupt human-operated ransomware attacks as early as possible, we enhanced the AI-based protections in Microsoft Defender for Endpoint with a range of specialized machine learning techniques that find and swiftly incriminate – that is, determine malicious intent with high confidence – files, processes, or behaviors observed during active attacks.

The early incrimination of entities – files, user accounts, and devices – represents a sophisticated mitigation approach that requires examining both the attack context and related events, whether on the targeted device or elsewhere in the organization. Defender for Endpoint combines three tiers of AI-informed inputs, each of which generates a risk score, to determine whether an entity is associated with an active ransomware attack:

  • A time-series and statistical analysis of alerts to look for anomalies at the organization level
  • Graph-based aggregation of suspicious events across devices within the organization to identify malicious activity across a set of devices
  • Device-level monitoring to identify suspicious activity with high confidence
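The three tiers above can be thought of as independent risk scorers whose outputs are fused into a single incrimination decision. The sketch below is purely illustrative: the weights, threshold, and score names are hypothetical assumptions, not Defender for Endpoint’s actual parameters.

```python
# Hypothetical fusion of three tiers of risk scores into one
# incrimination decision. Weights and threshold are illustrative only.
from dataclasses import dataclass

@dataclass
class TierScores:
    org_anomaly: float   # organization-level alert-spike score, 0..1
    graph: float         # cross-device graph aggregation score, 0..1
    device: float        # single-device behavioral score, 0..1

def incriminate(scores: TierScores,
                weights=(0.3, 0.3, 0.4),
                threshold=0.7) -> bool:
    """Return True when the weighted evidence is high enough to block."""
    combined = (weights[0] * scores.org_anomaly
                + weights[1] * scores.graph
                + weights[2] * scores.device)
    return combined >= threshold

print(incriminate(TierScores(org_anomaly=0.9, graph=0.8, device=0.6)))  # True
```

A weighted sum is only one possible fusion rule; the point is that no single tier has to be conclusive on its own for the combined evidence to cross the blocking threshold.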

Aggregating intelligence from these sources enables Defender for Endpoint to draw connections between different entities across devices within the same network. This correlation facilitates the detection of threats that might otherwise go unnoticed. When there’s enough confidence that a sophisticated attack is taking place on a single device, the related processes and files are immediately blocked and remediated to disrupt the attack.

Disrupting attacks in their early stages is critical for all sophisticated attacks, but especially for human-operated ransomware, where human threat actors seek to gain privileged access to an organization’s network, move laterally, and deploy the ransomware payload on as many devices as possible. For example, with its enhanced AI-driven detection capabilities, Defender for Endpoint detected and incriminated one ransomware attack early in its encryption stage, when the attackers had encrypted files on fewer than four percent (4%) of the organization’s devices, demonstrating an improved ability to disrupt an attack and protect the remaining devices in the organization. This instance illustrates the importance of rapidly incriminating suspicious entities and promptly disrupting a human-operated ransomware attack.

Figure 1: Chart showing Microsoft Defender for Endpoint incriminating a ransomware attack when attackers had encrypted files on 3.9% of the organization’s devices

As this incident shows, the swift incrimination of suspicious files and processes mitigates the impact of ransomware attacks within an organization. After incriminating an entity, Microsoft Defender for Endpoint stops the attack via feedback-loop blocking, which uses Microsoft Defender Antivirus to block the threat on endpoints in the organization. Defender for Endpoint then uses the threat intelligence gathered during the ransomware attack to protect other organizations.

Figure 2: Overview of incrimination using cloud-based machine learning classifiers and blocking by Microsoft Defender Antivirus

In this blog, we discuss in detail how Microsoft Defender for Endpoint uses multiple innovative, AI-based protections to examine alerts at the organization level, events across devices, and suspicious activity on specific devices to create a unique aggregation of signals that can identify a human-operated ransomware attack.

Detecting anomalies in alerts at the organization level

A human-operated ransomware attack generates significant noise in the system. During the ransomware deployment phase, solutions like Defender for Endpoint raise many alerts upon detecting multiple malicious artifacts and behaviors on many devices, resulting in an alert spike. Figure 3 shows an attack that occurred across a single organization.

Figure 3: Graph showing a spike in alerts during the ransomware phase of an attack

Defender for Endpoint identifies an organization-level attack by using time-series analysis to monitor the aggregation of alerts and statistical analysis to detect any significant increase in alert volume. In the event of an alert spike, Defender for Endpoint analyzes the related alerts and uses a specialized machine learning model to distinguish between true ransomware attacks and spurious spikes of alerts.
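A minimal way to picture the statistical side of this monitoring is a rolling z-score over hourly alert counts: flag any hour whose count sits far above the trailing mean. The window size and threshold below are hypothetical, not Defender for Endpoint’s actual parameters.

```python
# Illustrative alert-spike detector: flag hours whose alert count is far
# above the trailing mean (rolling z-score). Window and threshold are
# hypothetical example values.
import statistics

def find_spikes(hourly_alert_counts, window=24, z_threshold=3.0):
    spikes = []
    for i in range(window, len(hourly_alert_counts)):
        history = hourly_alert_counts[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1.0  # avoid divide-by-zero
        z = (hourly_alert_counts[i] - mean) / stdev
        if z > z_threshold:
            spikes.append(i)
    return spikes

counts = [5, 6, 4, 5, 7, 6, 5, 4, 6, 5, 5, 6,
          4, 5, 6, 5, 7, 5, 4, 6, 5, 6, 5, 4, 60]  # spike in the last hour
print(find_spikes(counts))  # [24]
```

In practice a detected spike would only be the trigger: as the text describes, the related alerts are then handed to a specialized classifier to separate true ransomware activity from benign bursts.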

If the alerts involve activity characteristic of a ransomware attack, Defender for Endpoint searches for suspicious entities to incriminate based on attack relevance and spread across the organization. Figure 4 shows organization-level detection.

Figure 4: Overview of organization-level anomaly detection: monitoring alerts, detecting anomalies in alert counts, analyzing each alert, and incriminating suspicious entities on individual devices

Graph-based monitoring of connections between devices

Organization-level monitoring can pose challenges when attacks don’t produce enough noise at the organization level. In addition to monitoring anomalous alert counts, Defender for Endpoint therefore adopts a graph-based approach that takes a more focused view of several connected devices to produce high-confidence detections, including an overall risk score. For this level of monitoring, Defender for Endpoint examines remote activity on a device to generate a connected graph. This activity can originate from popular admin tools such as PsExec, WMI, or WinRM when another device in the organization connects to a device using admin credentials. Such a remote connection can also indicate that an attacker previously stole those credentials.

As administrators often use such connectivity tools for legitimate purposes, Defender for Endpoint differentiates suspicious activity from the noise by searching specifically for suspicious processes executed during the connection timeframe.

Figure 5: Diagram of a typical attack pattern from initial attack vector (credential theft using tools such as PsExec and WMI) to network scanning and lateral movement

Figure 5 shows a typical attack pattern wherein a compromised device A is the initial attack vector, and the attacker uses remote desktop protocol (RDP) or a remote shell to take over the device and start scanning the network. If possible, the attacker then moves laterally to device B. At this point, a remote process such as wmic.exe runs on the command line of the source device, and wmiprvse.exe on the target spawns a new process to perform remote activities.

Graph-based detection builds these entities in memory into a virtual graph of connected components and calculates a total risk score, wherein each node represents a device with suspicious activities. These activities might produce only low-fidelity signals, such as scores from certain machine learning models or other suspicious signals on the device. The edges of the graph represent suspicious network connections. Defender for Endpoint then analyzes this graph to produce a final risk score. Figure 6 shows an example of graph-based aggregation and risk score generation.
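As a rough sketch of this idea, the snippet below groups devices into connected components along suspicious remote connections and aggregates their individual low-fidelity scores with a noisy-OR rule, so several weakly suspicious, connected devices yield a higher combined risk than any one device alone. The scores, edges, and aggregation rule are all hypothetical assumptions for illustration.

```python
# Sketch of graph-based aggregation: devices are nodes with low-fidelity
# suspicion scores, suspicious remote connections are edges. The noisy-OR
# aggregation rule here is a hypothetical choice, not the actual model.
import math
from collections import defaultdict

def connected_components(edges, nodes):
    """Group devices into connected components via suspicious connections."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        components.append(comp)
    return components

def component_risk(device_scores, edges):
    """Noisy-OR aggregation: connected weak signals raise combined risk."""
    return {
        frozenset(c): 1 - math.prod(1 - device_scores[d] for d in c)
        for c in connected_components(edges, device_scores)
    }

scores = {"A": 0.5, "B": 0.4, "C": 0.3, "D": 0.1}  # low-fidelity device scores
edges = [("A", "B"), ("B", "C")]                   # suspicious remote connections
risks = component_risk(scores, edges)
print(round(risks[frozenset({"A", "B", "C"})], 2))  # 0.79
```

Note how no device in the A-B-C component would cross a 0.7 threshold on its own, yet the component as a whole does, while the isolated device D keeps its low score.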

Figure 6: Diagram showing the aggregation of signals to produce a risk score for multiple devices

Identifying suspicious activity with high confidence on a single device

The final detection category is identifying suspicious activity on a single device. Sometimes, suspicious signals from only one device represent enough evidence to identify a ransomware attack, such as when an attack uses evasion techniques like spreading activity over time and across processes unrelated to the attack chain. Such an attack can fly under the radar if defenses fail to recognize these processes as related: when the signals for each individual process chain are not strong enough, no alerts are generated.

Figure 7 depicts a simplified version of evasion activity using the Startup folder and autostart extension points. After taking over a device, an attacker opens cmd.exe and writes a file to the Startup folder to carry out malicious activities. When the device restarts, the file in the Startup folder executes additional commands under the parent process explorer.exe, which is unrelated to the original cmd.exe that wrote the file. This behavior splits the activity into two separate process chains occurring at different times, which could prevent security solutions from correlating these commands. As a result, when neither individual process chain produces enough noise, no alert might appear.

Figure 7: Evasion activity split into two separate process chains occurring at different times

The enhanced AI-based detections in Defender for Endpoint can help connect seemingly unrelated activity by assessing logs for processes that resemble DLL hijacking, autostart entries in the registry, file creation in the Startup folder, and similar suspicious changes. The incrimination logic then maps the initiation of the first process to the files and tasks that follow it.
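The kind of correlation described above can be illustrated with a toy event log: a file-write event in the Startup folder is linked to the process later launched from that file, even though the new process’s parent is explorer.exe. The event schema, field names, and paths below are hypothetical, invented for this sketch.

```python
# Illustrative correlation of two process chains split by a reboot: a file
# written to the Startup folder by cmd.exe is linked to the process that
# later runs from it under explorer.exe. The event schema is hypothetical.
STARTUP = r"C:\Users\victim\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup"

events = [
    {"type": "file_create", "process": "cmd.exe",
     "path": STARTUP + r"\payload.bat", "time": 100},
    {"type": "reboot", "time": 200},
    {"type": "process_start", "image": STARTUP + r"\payload.bat",
     "parent": "explorer.exe", "time": 300},
]

def link_chains(events):
    """Map Startup-folder file writes to processes later launched from them."""
    writes = {e["path"]: e for e in events
              if e["type"] == "file_create" and "\\Startup\\" in e["path"]}
    return [(writes[e["image"]]["process"], e["image"])
            for e in events
            if e["type"] == "process_start" and e.get("image") in writes]

# Even though explorer.exe is the parent at launch time, the write event
# ties the new process chain back to the original cmd.exe.
print(link_chains(events)[0][0])  # cmd.exe
```

The parent-child relationship alone would never connect the two chains; it is the earlier file-creation event that supplies the missing link.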

Human-operated ransomware protection using AI

Attackers behind human-operated campaigns make decisions depending on what they discover in environments they compromise. The human aspect of these attacks results in varied attack patterns that evolve based on unique opportunities that attackers find for privilege escalation and lateral movement. AI and machine learning present innovative methods for surfacing sophisticated attacks known for using advanced tools and techniques to stay persistent and evasive.

In this blog, we discussed enhancements to the cloud-based, AI-driven protections in Microsoft Defender for Endpoint that are specifically designed to help disrupt human-operated ransomware attacks. These enhanced protections use AI to analyze threat data from multiple levels of advanced monitoring and correlate malicious activities to incriminate entities and stop attacks in their tracks. Today, these AI protections trigger in the early stages of the ransomware phase, as the attack starts to encrypt data on devices. We’re now working to expand these protections to trigger even earlier in the attack chain, before ransomware deployment, and to expand their scope to incriminate and isolate compromised user accounts and devices to further limit the damage of attacks.

This innovative approach to detection adds to existing protections that Microsoft 365 Defender delivers against ransomware. This evolving attack disruption capability exemplifies Microsoft’s commitment to harness the power of AI to explore novel ways of detecting threats and improve organizations’ defenses against an increasingly complex threat landscape.

Learn how Microsoft helps you defend against ransomware.

Learn how machine learning and AI drives innovation at Microsoft security research.

Arie Agranonik, Charles-Edouard Bettan, Sriram Iyer
Microsoft 365 Defender Research Team

The post Improving AI-based defenses to disrupt human-operated ransomware appeared first on Microsoft Security Blog.

Best practices for AI security risk management

December 9th, 2021

Today, we are releasing an AI security risk assessment framework as a step toward empowering organizations to reliably audit, track, and improve the security of their AI systems. In addition, we are providing new updates to Counterfit, our open-source tool that simplifies assessing the security posture of AI systems.

There is a marked interest in securing AI systems from adversaries. Counterfit has been heavily downloaded and explored by organizations of all sizes—from startups to governments and large-scale enterprises—looking to proactively secure their AI systems. From a different vantage point, the Machine Learning Evasion Competition we organized to help security professionals exercise their muscles in defending and attacking AI systems in a realistic setting saw record participation, doubling the number of participants and submitted techniques compared to the previous year.

This interest demonstrates the growth mindset and opportunity in securing AI systems. But how do we harness interest into action that can raise the security posture of AI systems? When the rubber hits the road, how can a security engineer think about mitigating the risk of an AI system being compromised?

AI security risk assessment framework

The deficit is clear: according to the Gartner® Market Guide for AI Trust, Risk and Security Management published in September 2021, “AI poses new trust, risk and security management requirements that conventional controls do not address.”1 To address this gap, we did not want to invent a new process. We acknowledge that security professionals are already overwhelmed. Moreover, we believe that even though attacks on AI systems pose a new security risk, current software security practices are relevant and can be adapted to manage this novel risk. To that end, we fashioned our AI security risk assessment in the spirit of current security risk assessment frameworks.

We believe that to comprehensively assess the security risk of an AI system, we need to look at the entire lifecycle of system development and deployment. Overreliance on securing machine learning models through academic adversarial machine learning oversimplifies the problem in practice. This means that to truly secure an AI model, we need to account for securing the entire supply chain and management of AI systems.

Through our own operations experience in building and red teaming models at Microsoft, we recognize that securing AI systems is a team sport. AI researchers design model architectures. Machine learning engineers build data ingestion, model training, and deployment pipelines. Security architects establish appropriate security policies. Security analysts respond to threats. To that end, we envisioned a framework that would involve participation from each of these stakeholders.

“Designing and developing secure AI is a cornerstone of AI product development at Boston Consulting Group (BCG). As the societal need to secure our AI systems becomes increasingly apparent, assets like Microsoft’s AI security risk management framework can be foundational contributions. We already implement best practices found in this framework in the AI systems we develop for our clients and are excited that Microsoft has developed and open sourced this framework for the benefit of the entire industry.”—Jack Molloy, Senior Security Engineer, BCG

As a result of our Microsoft-wide collaboration, our framework features the following characteristics:

  1. Provides a comprehensive perspective on AI system security. We looked at each element of the AI system lifecycle in a production setting: from data collection and data processing to model deployment. We also accounted for AI supply chains, as well as the controls and policies with respect to backup, recovery, and contingency planning related to AI systems.
  2. Outlines machine learning threats and recommendations to abate them. To directly help engineers and security professionals, we enumerated the threat statement at each step of the AI system building process. Next, we provided a set of best practices that overlay and reinforce existing software security practices in the context of securing AI systems.
  3. Enables organizations to conduct risk assessments. The framework provides the ability to gather information about the current state of security of AI systems in an organization, perform gap analysis, and track the progress of the security posture.
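The gap-analysis step in the third point can be pictured very simply: compare an organization’s current maturity per lifecycle control against a target level and surface the shortfalls. The control names and maturity levels below are hypothetical, chosen only to illustrate the mechanic.

```python
# Hypothetical sketch of the gap-analysis step of an AI security risk
# assessment: compare current maturity per control against a target level.
# Control names and levels are illustrative, not the framework's actual ones.
TARGET = 3  # desired maturity level for every control

current_state = {
    "data_collection_provenance": 3,
    "training_pipeline_hardening": 1,
    "model_deployment_signing": 2,
    "incident_response_for_ml": 0,
}

def gap_analysis(state, target=TARGET):
    """Return controls below target, largest gap first."""
    gaps = {control: target - level
            for control, level in state.items() if level < target}
    return dict(sorted(gaps.items(), key=lambda kv: -kv[1]))

print(gap_analysis(current_state))
# {'incident_response_for_ml': 3, 'training_pipeline_hardening': 2,
#  'model_deployment_signing': 1}
```

Running the same assessment periodically and diffing the gap lists gives the progress-tracking the framework describes.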

Updates to Counterfit

To help security professionals get a broader view of the security posture of their AI systems, we have also significantly expanded Counterfit. The first release of Counterfit wrapped two popular frameworks—Adversarial Robustness Toolbox (ART) and TextAttack—to provide evasion attacks against models operating on tabular, image, and textual inputs. With the new release, Counterfit now features the following:

  • An extensible architecture that simplifies integration of new attack frameworks.
  • Attacks that require either access to the internals of the machine learning model (white-box) or only query access to it (black-box).
  • Threat paradigms that include evasion, model inversion, model inference, and model extraction.
  • Common corruption attacks through AugLy, in addition to the algorithmic attacks provided.
  • Attacks are supported for models that accept tabular data, images, text, HTML, or Windows executable files as input.
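To make the black-box (query-access-only) threat model above concrete, here is a generic sketch of what such an evasion attack does: the attacker never sees the model internals, only queries it and perturbs a tabular input until the predicted label flips. This is not Counterfit’s API; the model, feature names, and perturbation strategy are all hypothetical.

```python
# Generic sketch of a black-box evasion attack of the kind Counterfit
# automates: perturb input features, using only query access to the model,
# until the prediction flips. Model and features are hypothetical.
import random

def toy_model(features):
    # Stand-in for a remote, query-only fraud classifier.
    return "fraud" if features["amount"] + 10 * features["velocity"] > 100 else "ok"

def black_box_evasion(model, sample, target="ok", max_queries=1000, step=1.0):
    random.seed(0)  # deterministic for the demo
    x = dict(sample)
    for _ in range(max_queries):
        if model(x) == target:
            return x  # found an evasive variant
        key = random.choice(list(x))
        x[key] -= step  # perturb one feature toward the target class
    return None  # query budget exhausted

evasive = black_box_evasion(toy_model, {"amount": 120.0, "velocity": 2.0})
print(toy_model(evasive))  # ok
```

Real attack frameworks replace the random single-feature perturbation with far more query-efficient search strategies, but the attacker’s interface is the same: inputs in, labels out.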

Learn More

These efforts are part of broader investment at Microsoft to empower engineers to securely develop and deploy AI systems. We recommend using it alongside the following resources:

  • For security analysts to orient to threats against AI systems, Microsoft, in collaboration with MITRE, released an ATT&CK style Adversarial Threat Matrix complete with case studies of attacks on production machine learning systems, which has evolved into MITRE ATLAS.
  • For security incident responders, we released our own bug bar to systematically triage attacks on machine learning systems.
  • For developers, we released threat modeling guidance specifically for machine learning systems.
  • For engineers and policymakers, Microsoft, in collaboration with Berkman Klein Center at Harvard University, released a taxonomy documenting various machine learning failure modes.
  • For security professionals, Microsoft open sourced Counterfit to help with assessing the posture of AI systems.
  • For the broader security community, Microsoft hosted the annual Machine Learning Evasion Competition.
  • For Azure machine learning customers, we provided guidance on enterprise security and governance.

This is a living framework. If you have questions or feedback, please contact us.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.


1 Gartner, Market Guide for AI Trust, Risk and Security Management, Avivah Litan, et al., 1 September 2021. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.


New Secured-core servers are now available from the Microsoft ecosystem to help secure your infrastructure

December 7th, 2021

In the current pandemic-driven remote work environments, security has become increasingly important. Earlier this year, Colonial Pipeline, one of the leading suppliers of fuel on the East Coast of the United States, was hit by a ransomware attack.1 This caused a massive disruption of the fuel supply chain and a surge in gasoline prices. In another, unrelated incident, Chinese start-up Socialarks suffered a massive data breach,2 which exposed the personally identifiable information (PII) of over 214 million users of some of the most popular worldwide social networks. Data breaches like these are extremely expensive, with the average cost of a breach estimated at USD4.2 million in 2021.3 There has also been a surge in the number of ransomware attacks, with an attack expected every 11 seconds and the total cost of damages estimated at about USD20 billion in 2021.4

As we discussed at Microsoft Inspire earlier this year, threats against infrastructure can come from a variety of sources—attackers exploiting web shells, brute force login attacks, software vulnerabilities, and credential theft—to achieve goals like deploying ransomware. With cyberattacks continuing to rise, the need for secure computing has never been more important. Customers care about the protection of their data and workloads, and platform security can be an important tool in a comprehensive defense-in-depth strategy. Applying our learnings from the Secured-core PC initiative, Microsoft is collaborating with partners to expand Secured-core to Windows Server, Microsoft Azure Stack HCI, and Azure-certified IoT devices.

REvil ransomware use case

Let’s dive into the typical kill chain of a human-operated ransomware campaign undertaken by REvil (or Sodinokibi), which recently impacted thousands of businesses worldwide, including the attack on Kaseya.5 The attackers used a variety of techniques, such as compromised Remote Desktop Protocol (RDP) credentials and vulnerabilities in the operating system and applications, to gain an initial foothold in target organizations. Documents from the United States Department of Justice’s investigation6 delve into how REvil carried out the ransomware attack on Kaseya using the following attack pattern:

Figure 1. Kill chain of REvil ransomware.

The ransomware operators gain administrative privileges on the compromised devices, steal passwords from memory using credential dumping tools such as Mimikatz, and use Cobalt Strike and Metasploit to move laterally and establish persistence on the victim’s network. After obtaining the necessary privileges and access across the infrastructure, the ransomware activates, encrypting all the files and leaving a ransom note indicating the amount the victim must pay to decrypt their files.

Ransomware attacks like these result in an enormous loss of time and money for enterprises. Continuing to raise the security bar for critical infrastructure against attackers, while making it easier for organizations to meet that higher bar, is an important priority for both customers and Microsoft. Successfully protecting systems requires a holistic approach that builds security from the chip to the cloud, across hardware, firmware, and the operating system.

Secured-core servers leverage your infrastructure to help protect you from security threats

Secured-core servers take a defense-in-depth approach to basic system security. Secured-core servers are built around three distinct security pillars:

  1. Protect the server infrastructure with a hardware-based root of trust.
  2. Defend sensitive workloads against firmware-level attacks.
  3. Prevent access and the execution of unverified code on the systems.

Built in partnership with leading original equipment manufacturers (OEMs) and silicon vendors, Secured-core servers use an industry-standard, hardware-based root of trust coupled with security capabilities built into today’s modern central processing units (CPUs). Secured-core servers use Trusted Platform Module (TPM) 2.0 and Secure Boot to ensure that only trusted components load in the boot path.

“To help our customers remain secure and accelerate their business outcomes, Hewlett Packard Enterprise (HPE) is excited to release the new Gen 10 Plus (v2) products for Azure Stack HCI 21H2 and Windows Server 2022 which can be delivered with the HPE GreenLake edge-to-cloud platform,” said Keith White, Senior Vice President and General Manager, GreenLake Cloud Services Commercial Business. “These offer unprecedented host protection by combining HPE’s security technologies with Secured-core server functionalities for a secure, hybrid implementation.”

Additional details will be made available soon as part of the Azure Stack HCI: Secured-core Server Solution Brief. Configuration details can be found in the section “Configuring and validating Secured-core” of the Implementing Microsoft Windows Server 2022 Using HPE Proliant Servers, Storage, and Networking Options white paper.

Secured-core servers use hardware-rooted security in the modern CPU with Dynamic Root of Trust Measurement (DRTM) to launch the system into a trusted state, mitigating attacks from advanced malware that attempts to tamper with the system.

Enabled with Hypervisor-Protected Code Integrity (HVCI), a Secured-core server only starts executables signed by known and approved authorities. This ensures that code running within the trusted computing base runs with integrity and is not subject to exploits or attacks. The hypervisor sets and enforces permissions to prevent malware from attempting to modify the memory and executing.

In the REvil ransomware example described earlier, Secured-core servers would have made it much harder for the attackers to effectively deploy and activate their payload. HVCI is enabled with a code integrity policy that blocks kernel-tampering drivers, such as the one Mimikatz uses. Additionally, since virtualization-based security (VBS) is enabled out of the box, IT administrators can easily enable features such as Credential Guard, which safeguards credentials in an isolated environment that is invisible to attackers. By preventing credential theft (stage two of the kill chain, represented in Figure 1), Secured-core servers can make it extremely hard for attackers to move laterally in the network, thereby stopping the attack.

Look for Secured-core server solutions in the HCI and Windows Server catalogs

You can now find a breadth of servers certified for Secured-core server AQ in the Azure Stack HCI catalog. Enhancements made to the catalog allow you to easily identify Azure Stack HCI solutions that support Secured-core server functionality with the new Secured-core server badge.

Figure 2. Azure Stack HCI catalog showing Secured-core server solutions from HPE.

Secured-core servers support all the protections offered in the trusted enterprise virtualization use case, plus additional features to protect hosts from firmware-level attacks. In addition to the Azure Stack HCI catalog, the Windows Server Catalog lists dozens of hardware platforms from our various ecosystem partners that meet the Secured-core server AQ. Learn more about how the Secured-core servers provide exceptional host security in our blog post.

Manage your Secured-core server easily with the Microsoft Windows Admin Center

Windows Admin Center is your user interface (UI) for managing the status and configuration of your Secured-core server. Windows Admin Center is a locally deployed, browser-based application for managing Windows servers, clusters, hyper-converged infrastructure, as well as Windows clients, and is ready to use in production.

New functionality in Windows Admin Center makes it extremely easy for customers to configure Secured-core features for Windows Server and Azure Stack HCI systems. The new Windows Admin Center security functionality, now included with the product, enables advanced security at the click of a button from a web browser anywhere in the world. For Windows Server and validated Azure Stack HCI solutions, customers can look for Secured-core certified systems to simplify acquiring secure hardware platforms.

Figure 3. Windows Admin Center Secured-core server cluster management.

The Windows Admin Center UI allows you to easily configure the six features that encompass Secured-core server: Hypervisor Enforced Code Integrity, Boot Direct Memory Access (DMA) Protection, System Guard, Secure Boot, Virtualization-based security, and Trusted Platform Module 2.0. Download the latest version of Windows Admin Center today.

Begin your Secured-core journey

Secured-core servers, which are now available in the Azure Stack HCI and Windows Server catalogs, come fully equipped with industry-leading security mitigations built into the hardware, firmware, and the operating system to help thwart some of the most advanced attack vectors. Coupled with Windows Admin Center, managing and monitoring the security state of your mission-critical infrastructure has never been easier.

To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.


1. US fuel pipeline hackers ‘didn’t mean to create problems,’ Mary-Ann Russon, BBC News. 10 May 2021.
2. 200 million Facebook, Instagram, and LinkedIn users’ scraped data exposed, Security Magazine. 12 January 2021.
3. How much does a data breach cost? Cost of a Data Breach Report 2021, IBM.
4. Global Ransomware Damage Costs Predicted To Reach $20 Billion (USD) By 2021, Steve Morgan, Cybercrime Magazine. 21 October 2019.
5. Ukrainian Arrested and Charged with Ransomware Attack on Kaseya, The United States Department of Justice. 8 November 2021.
6. United States of America v. Yevgeniy Igorevich Polyanin, United States District Court for the Northern District of Texas, Dallas Division. 24 August 2021.


CISO Spotlight: How diversity of data (and people) defeats today’s cyber threats

October 20th, 2020

This year, we have seen five significant security paradigm shifts in our industry. This includes the acknowledgment that the greater the diversity of our data sets, the better the AI and machine learning outcomes. This diversity gives us an advantage over our cyber adversaries and improves our threat intelligence. It allows us to respond swiftly and effectively, addressing one of the most difficult challenges for any security team. For Microsoft, our threat protection is built on an unparalleled cloud ecosystem that powers scalability, pattern recognition, and signal processing to detect threats at speed, while correlating these signals accurately to understand how the threat entered your environment, what it affected, and how it currently impacts your organization. The AI capabilities built into Microsoft Security solutions are trained on 8 trillion daily threat signals from a wide variety of products, services, and feeds from around the globe. Because the data is diverse, AI and machine learning algorithms can detect threats in milliseconds.

All security teams need insights based on diverse data sets to gain real-time protection for the breadth of their digital estates. Greater diversity fuels better AI and machine learning outcomes, improving threat intelligence and enabling faster, more accurate responses. In the same way, a diverse and inclusive cybersecurity team also drives innovation and defuses groupthink.

Jason Zander, Executive Vice President, Microsoft Azure, knows firsthand the advantages organizations experience when they embrace cloud-based protections built on diverse data sets. Below, he shares how these protections offer real-time coverage for the breadth of a digital estate:

How does diverse data make us safer?

The secret ingredient lies in the cloud itself. The sheer processing power of so many data points allows us to track more than 8 trillion daily signals from a diverse collection of products, services, and the billions of endpoints that touch the Microsoft cloud every month. Microsoft analyzes hundreds of billions of identity authentications and emails looking for fraud, phishing attacks, and other threats. Why am I mentioning all these numbers? It’s to demonstrate how our security operations take petabytes’ worth of data to assess the worldwide threat, then act quickly. We use that data in a loop—get the signals in, analyze them, and create even better defenses. At the same time, we do forensics to see where we can raise the bar.

Microsoft also monitors the dark web and scans 6 trillion IoT messages every day, and we leverage that data as part of our security posture. AI, machine learning, and automation all empower your team by reducing the noise of constant alerts, so your people can focus on meeting the truly challenging threats.

Staying ahead of the latest threats

As the pandemic swept the globe, we were able to identify new COVID-19 themed threats—often in a fraction of a second—before they breached customers’ networks. Microsoft cyber defenders determined that adversaries added new pandemic-themed lures to existing and familiar malware. Cybercriminals are always changing their tactics to take advantage of recent events. Insights based on diverse data sets empower robust real-time protection as our adversaries’ tactics shift.

Microsoft also has the Cyber Defense Operations Center (CDOC) running 24/7. We employ over 3,500 full-time security employees and spend about $1 billion in operational expenses (OPEX) every year. In this case, OPEX includes all the people, equipment, algorithms, development, and everything else needed to secure the digital estate. Monitoring those 8 trillion signals is a core part of that system protecting our end users.

Tried and proven technology

If you’re part of the Microsoft ecosystem—Windows, Teams, Microsoft 365, or even Xbox Live—then you’re already benefitting from this technology. Azure Sentinel is built on the same cybersecurity technology we use in-house. As a cloud-native security information and event management (SIEM) solution, Azure Sentinel uses scalable machine learning algorithms to provide a bird’s-eye view across your entire enterprise, alleviating the stress that comes from sophisticated attacks, frequent alerts, and long resolution time frames. Our research has shown that customers who use Azure Sentinel achieved a 90 percent reduction in alert fatigue.

Just as it does for us, Azure Sentinel can work continuously for your enterprise to:

  • Collect data across all users, devices, applications, and infrastructure—both on-premises and in multiple clouds.
  • Detect previously undetected threats (while minimizing false positives) using analytics and threat intelligence.
  • Investigate threats and hunt down suspicious activities at scale using powerful AI that draws upon years of cybersecurity work at Microsoft.
  • Respond to incidents rapidly with built-in orchestration and automation of common tasks.

Diversity equals better protection

As Jason explained, Microsoft is employing AI, machine learning, and quantum computing to shape our responses to cyber threats. We know we must incorporate a holistic approach that includes people at its core because technology alone will not be enough. If we don’t, cybercriminals will exploit group preconceptions and biases. According to research, gender-diverse teams make better business decisions 73 percent of the time. Additionally, teams that are diverse in age and geographic location make better decisions 87 percent of the time. Just as diverse data makes for better cybersecurity, the same holds true for the people in your organization, allowing fresh ideas to flourish. Investing in diverse teams isn’t just the right thing to do: it helps future-proof against bias while protecting your organization and customers.

Watch for upcoming posts on how your organization can benefit from integrated, seamless security, and be sure to follow @Ann Johnson and @Jason Zander on Twitter for cybersecurity insights.

To learn more about Microsoft Security solutions visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post CISO Spotlight: How diversity of data (and people) defeats today’s cyber threats appeared first on Microsoft Security.

Microsoft Digital Defense Report 2020: Cyber Threat Sophistication on the Rise

September 29th, 2020 No comments

Today, Microsoft is releasing a new annual report, called the Digital Defense Report, covering cybersecurity trends from the past year. This report makes it clear that threat actors have rapidly increased in sophistication over the past year, using techniques that make them harder to spot and that threaten even the savviest targets. For example, nation-state actors are engaging in new reconnaissance techniques that increase their chances of compromising high-value targets, criminal groups targeting businesses have moved their infrastructure to the cloud to hide among legitimate services, and attackers have developed new ways to scour the internet for systems vulnerable to ransomware.

In addition to attacks becoming more sophisticated, threat actors are showing clear preferences for certain techniques, with notable shifts towards credential harvesting and ransomware, as well as an increasing focus on Internet of Things (IoT) devices. Among the most significant statistics on these trends:

  • In 2019, we blocked over 13 billion malicious and suspicious emails, more than 1 billion of which contained URLs set up for the explicit purpose of launching a phishing credential attack.
  • Ransomware is the most common reason behind our incident response engagements from October 2019 through July 2020.
  • The most common attack techniques used by nation-state actors in the past year are reconnaissance, credential harvesting, malware, and Virtual Private Network (VPN) exploits.
  • IoT threats are constantly expanding and evolving. The first half of 2020 saw an approximate 35% increase in total attack volume compared to the second half of 2019.

Given the leap in attack sophistication in the past year, it is more important than ever that we take steps to establish new rules of the road for cyberspace; that all organizations, whether government agencies or businesses, invest in people and technology to help stop attacks; and that people focus on the basics, including regular application of security updates, comprehensive backup policies, and, especially, enabling multi-factor authentication (MFA). Our data shows that enabling MFA alone would have prevented the vast majority of successful attacks.

To read the full blog and download the Digital Defense Report, visit the Microsoft On the Issues blog.

To learn more about Microsoft Security solutions visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Microsoft Digital Defense Report 2020: Cyber Threat Sophistication on the Rise appeared first on Microsoft Security.

Microsoft Security: How to cultivate a diverse cybersecurity team

August 31st, 2020 No comments

Boost creative problem solving with a diverse cybersecurity team

In cybersecurity, whether we are talking about cryptocurrency mining, supply chain attacks, attacks against IoT, or COVID-19-related phishing lures, we know that gaining the advantage over our adversaries requires greater diversity of data to improve our threat intelligence. If we are to future-proof tech against bias, however, our teams must be as diverse as the problems we are trying to solve.

Unfortunately, our cybersecurity teams don’t reflect this reality. A 2019 report by (ISC)2 found that less than 25 percent of cybersecurity professionals are women. People of color and women aren’t paid as well as white men and are underrepresented in management. Time and again, studies have found that gender-diverse teams make better business decisions 73 percent of the time. What’s more, teams that are also diverse in age and geographic location make better decisions 87 percent of the time. With a talent shortfall estimated between 1.5 million and 3.5 million, we must recruit, train, and retain cyber talent from a wide variety of backgrounds in order to maintain our advantage.

Diversity fuels innovation

You can see the evidence that diversity drives innovation when you look at artificial intelligence (AI) and machine learning. The AI capabilities built into Microsoft Security solutions are trained on 8 trillion daily threat signals from a wide variety of products, services, and feeds from around the globe (see Figure 1). Because the data is diverse, AI and machine learning algorithms can detect threats in milliseconds.

A graph showing Microsoft Intelligent Security.

Figure 1: Trillions of signals from around the globe allow Microsoft Security solutions to rapidly detect and respond to threats.

Just last year, the World Economic Forum compiled several studies that provide further evidence that diversity sparks innovation. Cities with large immigrant populations tend to have higher economic performance. Businesses with more diverse management teams have higher revenues. A C-suite with more women is likely to be more profitable. When people with different backgrounds and experiences collaborate, unique ideas can flourish. What’s more, if you want to build technology solutions that are inclusive of everyone, diverse teams help avoid bias and develop features that meet the needs of more people.

So how do you increase the diversity of your team? Expand the pipeline. Invest in your team. And create an inclusive culture.

Expand the pipeline

To recruit the very best people from all backgrounds, start by prioritizing unique perspectives. Machine learning, artificial intelligence, and quantum computing hold promise for addressing cyber threats; however, technology is not enough. Some problems can only be solved by people. You need teams that can anticipate what’s next and respond quickly in high-stress situations.

If everybody on the team has similar skills and backgrounds, you risk group think and a lack of creativity. It’s why diverse teams make better decisions than individuals 87 percent of the time (all-male teams only make better decisions than individuals 58 percent of the time).

To attract the diverse talent you need, expand your criteria. Look beyond the typical degrees, experience level, and certifications that you typically recruit for. Leverage training programs that help people acquire the technical skills you need. For example, BlackHoodie is a reverse engineering program for women. Consider people without college degrees, veterans, and people looking to switch careers. Work with colleges and other groups that represent disadvantaged communities, such as historically black colleges and universities.

Invest in your team

Cybersecurity teams around the globe are understaffed, while the amount of work continues to grow. Security operation center (SOC) analysts suffer from alert fatigue because they must monitor thousands of alerts—many of them false positives. Stress levels are high, and individuals work long hours. These work conditions can lead to burnout, which makes people less effective.

Reduce routine tasks with AI, machine learning, and automation. AI, machine learning, and automation can empower your team by reducing the noise, so people can focus on challenging threats that are, frankly, more fun. Azure Sentinel is a cloud-native SIEM that uses state-of-the-art, scalable machine learning algorithms to correlate millions of low-fidelity anomalies into a few high-fidelity security incidents for analysts. Our research has shown that customers who use Azure Sentinel achieved a 90 percent reduction in alert fatigue.

Figure 2: Azure Sentinel makes it easy to collect security data across your entire hybrid organization from devices, to users, to apps, to servers on any cloud.

Provide growth opportunities and training. The threat landscape changes rapidly requiring security professionals to continuously upgrade their skills. Human beings also need new challenges to stay engaged. Provide opportunities for everyone to use creative problem-solving skills. Encourage individuals to learn from each other, such as through an apprenticeship program. Offer regular training for people at all levels of your organization. The Microsoft SOC focuses its training programs on three key areas:

  • Technical tools/capabilities.
  • Our organization (mission and assets being protected).
  • Attackers (motivations, tools, techniques, habits, etc.).

Take care of employees’ mental health. Stress is driving too many people to leave cybersecurity. In fact, stress has motivated 66 percent of IT professionals to look for a new job. Fifty-one percent would be willing to take a pay cut for less stress. Late nights and high-pressure incident response take a toll on employees. In these circumstances, it’s important to respect time off. People should be able to enjoy their days off without worrying about work. A collaborative culture that is forgiving of mistakes can also reduce the pressure. Ask your team how they are doing and really listen when they tell you. Their answers may trigger a great idea for alleviating stress.

Create an inclusive culture

People go where they are invited, but they stay where they are welcome. As you bring new people into your security organization, foster an environment where everybody feels accepted. All ideas should be listened to and considered. People who express ideas that challenge old methods can spark breakthroughs and creativity. Here are a few ideas for making sure everyone feels included:

  • Solicit input from everybody, so you don’t just hear from those that are comfortable speaking up.
  • Provide mentorship and sponsorship programs for women and other underrepresented groups to help prepare them for advancement.
  • Expand your definition of diversity to include neurodiversity, nonbinary and LGBTQ identities, religious affiliation, and education level in addition to race and gender.
  • Make a conscious effort to evaluate performance, not communication or presentation style.
  • Hold leadership and vendors accountable for diversity metrics.

As we look past the COVID-19 pandemic, we can expect that cybersecurity challenges will continue to evolve. AI, machine learning, and quantum computing will shape our response, but technology will not be enough. We need creative people to build our products, design our security programs, and respond to threats. We need teams as diverse as the problems we face.

To learn more about Microsoft Security solutions visit our website.  Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Microsoft Security: How to cultivate a diverse cybersecurity team appeared first on Microsoft Security.

Seeing the big picture: Deep learning-based fusion of behavior signals for threat detection

July 23rd, 2020 No comments

The application of deep learning and other machine learning methods to threat detection on endpoints, email and docs, apps, and identities drives a significant piece of the coordinated defense delivered by Microsoft Threat Protection. Within each domain as well as across domains, machine learning plays a critical role in analyzing and correlating massive amounts of data to detect increasingly evasive threats and build a complete picture of attacks.

On endpoints, Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP) detects malware and malicious activities using various types of signals that span endpoint and network behaviors. Signals are aggregated and processed by heuristics and machine learning models in the cloud. In many cases, the detection of a particular type of behavior, such as registry modification or a PowerShell command, by a single heuristic or machine learning model is sufficient to create an alert.

Detecting more sophisticated threats and malicious behaviors considers a broader view and is significantly enhanced by fusion of signals occurring at different times. For example, an isolated event of file creation is generally not a very good indication of malicious activity, but when augmented with an observation that a scheduled task is created with the same dropped file, and combined with other signals, the file creation event becomes a significant indicator of malicious activity. To build a layer for these kinds of abstractions, Microsoft researchers instrumented new types of signals that aggregate individual signals and create behavior-based detections that can expose more advanced malicious behavior.

In this blog, we describe an application of deep learning, a category of machine learning algorithms, to the fusion of various behavior detections into a decision-making model. Since its deployment, this deep learning model has contributed to the detection of many sophisticated attacks and malware campaigns. As an example, the model uncovered a new variant of the Bondat worm that attempts to turn affected machines into zombies for a botnet. Bondat is known for using its network of zombie machines to hack websites or even perform cryptocurrency mining. This new version spreads using USB devices and then, once on a machine, achieves fileless persistence. We share more technical details about this attack in later sections, but first we describe the detection technology that caught it.

Powerful, high-precision classification model for wide-ranging data

Identifying and detecting malicious activities within massive amounts of data processed by Microsoft Defender ATP require smart automation methods and AI. Machine learning classifiers digest large volumes of historical data and apply automatically extracted insights to score each new data point as malicious or benign. Machine learning-based models may look at, for example, registry activity and produce a probability score, which indicates the probability of the registry write being associated with malicious activity. To tie everything together, behaviors are structured into virtual process trees, and all signals associated with each process tree are aggregated and used for detecting malicious activity.

With virtual process trees and signals of different types associated to these trees, there are still large amounts of data and noisy signals to sift through. Since each signal occurs in the context of a process tree, it’s necessary to fuse these signals in the chronological order of execution within the process tree. Data ordered this way requires a powerful model to classify malicious vs. benign trees.

Our solution comprises several deep learning building blocks such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). The neural network can take behavior signals that occur chronologically in the process tree and treat each batch of signals as a sequence of events. These sequences can be collected and classified by the neural network with high precision and detection coverage.

Behavior-based and machine learning-based signals

Microsoft Defender ATP researchers instrument a wide range of behavior-based signals. For example, a signal can be for creating an entry in the following registry key:


A folder and executable file name added to this location automatically runs after the machine starts. This generates persistence on the machine and hence can be considered an indicator of compromise (IoC). Nevertheless, this IoC alone is generally not enough to generate a detection because legitimate programs also use this mechanism.

Another example of behavior-based signal is service start activity. A program that starts a service through the command line using legitimate tools like net.exe is not considered a suspicious activity. However, starting a service created earlier by the same process tree to obtain persistence is an IoC.

On the other hand, machine learning-based models look at and produce signals on different pivots of a possible attack vector. For example, a machine learning model trained on historical data to discern between benign and malicious command lines will produce a score for each processed command line.

Consider the following command line:

 cmd /c taskkill /f /im someprocess.exe

This line implies that taskkill.exe is invoked by cmd.exe to terminate a process with a particular name. While the command itself is not necessarily malicious, the machine learning model may be able to recognize suspicious patterns in the name of the process being terminated, and provide a maliciousness probability, which is aggregated with other signals in the process tree. The result is a sequence of events during a certain period of time for each virtual process tree.
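The blog does not describe how the command-line model featurizes its input, so as a purely illustrative sketch (the function, dimensionality, and trigram choice are all invented here, not Defender's actual feature pipeline), a toy featurizer might hash overlapping character trigrams of a command line into a fixed-size count vector that a trained classifier could score:

```python
# Toy character-trigram featurizer for command lines. Illustrative only;
# the real command-line model and its features are not described in the blog.
def trigram_features(command_line: str, dim: int = 64) -> list:
    """Hash overlapping character trigrams into a fixed-size count vector."""
    vector = [0] * dim
    text = command_line.lower()
    for i in range(len(text) - 2):
        vector[hash(text[i:i + 3]) % dim] += 1
    return vector

features = trigram_features("cmd /c taskkill /f /im someprocess.exe")
```

A trained model would map such a vector to a probability score, which is then aggregated with the other signals in the process tree as described above.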

The next step is to use a machine learning model to classify this sequence of events.

Data modeling

The sequences of events described in the previous sections can be represented in several different ways to then be fed into machine learning models.

The first and simple way is to construct a “dictionary” of all possible events, and to assign a unique identifier (index) to each event in the dictionary. This way, a sequence of events is represented by a vector, where each slot constitutes the number of occurrences (or other related measure) for an event type in the sequence.

For example, if all possible events in the system are X,Y, and Z, a sequence of events “X,Z,X,X” is represented by the vector [3, 0, 1], implying that it contains three events of type X, no events of type Y, and a single event of type Z. This representation scheme, widely known as “bag-of-words”,  is suitable for traditional machine learning models and has been used for a long time by machine learning practitioners. A limitation of the bag-of-words representation is that any information about the order of events in the sequence is lost.
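The bag-of-words encoding from the example above can be sketched in a few lines (the event dictionary and sequence are the toy values from the text, not real Defender signal types):

```python
# Toy bag-of-words encoding of an event sequence; order is discarded.
# The event dictionary is the X/Y/Z example from the text, not real signals.
EVENT_INDEX = {"X": 0, "Y": 1, "Z": 2}

def bag_of_words(events, index=EVENT_INDEX):
    """Count occurrences of each known event type in the sequence."""
    vector = [0] * len(index)
    for event in events:
        vector[index[event]] += 1
    return vector

print(bag_of_words(["X", "Z", "X", "X"]))  # [3, 0, 1]
```

Note that ["X", "X", "X", "Z"] produces the same vector, which is exactly the loss of ordering information described above.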

The second representation scheme is chronological. Figure 1 shows a typical process tree: Process A raises an event X at time t1, Process B raises an event Z at time t2, D raises X at time t3, and E raises X at time t4. Now the entire sequence “X,Z,X,X”  (or [1,3,1,1] replacing events by their dictionary indices) is given to the machine learning model.

Diagram showing process tree

Figure 1. Sample process tree

In threat detection, the order of occurrence of different events is important information for the accurate detection of malicious activity. Therefore, it’s desirable to employ a representation scheme that preserves the order of events, as well as machine learning models that are capable of consuming such ordered data. This capability can be found in the deep learning models described in the next section.


Deep learning has shown great promise in sequential tasks in natural language processing like sentiment analysis and speech recognition. Microsoft Defender ATP uses deep learning for detecting various attacker techniques, including malicious PowerShell.

For the classification of signal sequences, we use a Deep Neural Network that combines two types of building blocks (layers): Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory Recurrent Neural Networks (BiLSTM-RNN).

CNNs are used in many tasks relating to spatial inputs such as images, audio, and natural language. A key property of CNNs is the ability to compress a wide-field view of the input into high-level features.  When using CNNs in image classification, high-level features mean parts of or entire objects that the network can recognize. In our use case, we want to model long sequences of signals within the process tree to create high-level and localized features for the next layer of the network. These features could represent sequences of signals that appear together within the data, for example, create and run a file, or save a file and create a registry entry to run the file the next time the machine starts. Features created by the CNN layers are easier to digest for the ensuing LSTM layer because of this compression and featurization.

LSTM deep learning layers are well known for their results in sentence classification, translation, speech recognition, sentiment analysis, and other sequence modeling tasks. Bidirectional LSTMs combine two layers of LSTMs that process the sequence in opposite directions.

The combination of the two types of neural networks stacked one on top of the other has been shown to be very effective and can classify long sequences of hundreds of items or more. The final model is a combination of several layers: one embedding layer, two CNNs, and a single BiLSTM. The input to this model is a sequence of hundreds of integers representing the signals associated with a single process tree during a unit of time. Figure 2 shows the architecture of our model.

Diagram showing layers of the CNN BiLSTM model

Figure 2. CNN-BiLSTM model

Since the number of possible signals in the system is very high, input sequences are passed through an embedding layer that compresses high-dimensional inputs into low-dimensional vectors that can be processed by the network. In addition, similar signals get a similar vector in lower dimensional space, which helps with the final classification.

Initial layers of the network create increasingly high-level features, and the final layer performs sequence classification. The output of the final layer is a score between 0 and 1 that indicates the probability of the sequence of signals being malicious. This score is used in combination with other models to predict if the process tree is malicious.
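The blog does not specify how the sequence score is fused with the other models' outputs, so the following is only a hypothetical sketch: the model names, weights, and decision threshold are all invented for illustration, and the real system may combine scores very differently.

```python
# Hypothetical fusion of per-model probability scores for one process tree.
# Model names, weights, and the threshold are invented for illustration.
def combine_scores(scores, weights, threshold=0.8):
    """Weighted average of [0, 1] scores; flags the tree above the threshold."""
    total = sum(weights.values())
    combined = sum(scores[name] * weights[name] for name in scores) / total
    return combined, combined >= threshold

scores = {"sequence_model": 0.95, "command_line_model": 0.70, "heuristics": 0.85}
weights = {"sequence_model": 0.5, "command_line_model": 0.2, "heuristics": 0.3}
combined, is_malicious = combine_scores(scores, weights)
# combined = 0.87, so this hypothetical tree would be flagged as malicious
```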

Catching real-world threats

Microsoft Defender ATP’s endpoint detection and response capabilities use this Deep CNN-BiLSTM model to catch and raise alerts on real-world threats. As mentioned, one notable attack that this model uncovered is a new variant of the Bondat worm, which was seen propagating in several organizations through USB devices.

Diagram showing the Bondat attack chain

Figure 3. Bondat malware attack chain

Even with an arguably inefficient propagation method, the malware could persist in an organization as users continue to use infected USB devices. For example, the malware was observed in hundreds of machines in one organization. Although we detected the attack during the infection period, it continued spreading until all malicious USB drives were collected. Figure 4 shows the infection timeline.

Column chart showing daily encounters of the Bondat malware in one organization

Figure 4. Timeline of encounters within a single organization within a period of 5 months showing reinfection through USB devices

The attack drops a JavaScript payload, which it runs directly in memory using wscript.exe. The JavaScript payload uses a randomly generated filename as a way to evade detections. However, Antimalware Scan Interface (AMSI) exposes malicious script behaviors.

To spread via USB devices, the malware leverages WMI to query the machine’s disks by calling “SELECT * FROM Win32_DiskDrive”. When it finds a match for “/usb” (see Figure 5), it copies the JavaScript payload to the USB device and creates a batch file on the USB device’s root folder. This batch file contains the execution command for the payload. As part of its social engineering technique to trick users into running the malware from the removable device, it creates a LNK file on the USB device pointing to the batch file.

Screenshot of malware code showing infection technique

Figure 5. Infection technique

The malware terminates processes related to antivirus software or debugging tools. For Microsoft Defender ATP customers, tamper protection prevents the malware from doing this. Notably, after terminating a process, the malware pops up a window that imitates a Windows error message to make it appear like the process crashed (See figure 6).

Screenshot of malware code showing evasion technique

Figure 6. Evasion technique

The malware communicates with a remote command-and-control (C2) server by implementing a web client (MSXML). Each request is encrypted with RC4 using a randomly generated key, which is sent within the “PHPSESSID” cookie value to allow attackers to decrypt the payload within the POST body.

Every request sends information about the machine and its state following the output of the previously executed command. The response is saved to disk and then parsed to extract commands within an HTML comment tag. The first five characters from the payload are used as the key to decrypt the data, and the commands are executed using the eval() method. Figures 7 and 8 show the C2 communication and HTML comment eval technique.
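For analysts replaying captured traffic from a campaign like this, the RC4 routine itself is standard and easy to reimplement. The sketch below is generic RC4 for decrypting captured payloads, not code from the malware, and the session key value is illustrative:

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4 stream cipher; the same routine encrypts and decrypts."""
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# A key recovered from the PHPSESSID cookie would decrypt the POST body:
session_key = b"example-session-key"  # illustrative value
ciphertext = rc4(session_key, b"machine state report")
assert rc4(session_key, ciphertext) == b"machine state report"
```

Because RC4 is symmetric, applying the same function with the recovered cookie key to the captured request body yields the plaintext the malware sent.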

Once the command is parsed and evaluated by the JavaScript engine, any code can be executed on an affected machine, for example, download other payloads, steal sensitive info, and exfiltrate stolen data. For this Bondat campaign, the malware runs coin mining or coordinated distributed denial of service (DDoS) attacks.

Figure 7. C2 communication

Figure 8. Eval technique (parsing commands from html comment)

The malware’s activities triggered several signals throughout the attack chain. The deep learning model inspected these signals and the sequence with which they occurred, and determined that the process tree was malicious, raising an alert:

  1. Persistence – The malware copies itself into the Startup folder and drops a .lnk file pointing to the malware copy that opens when the computer starts.
  2. Renaming a known operating system tool – The malware renames wscript.exe into a random filename.
  3. Dropping a file with the same filename as legitimate tools – The malware impersonates legitimate system tools by dropping a file with a similar name to a known tool.
  4. Suspicious command line – The malware tries to delete itself from its previous location using a command line executed by a process spawned by wscript.exe.
  5. Suspicious script content – Obfuscated JavaScript payload used to hide the attacker’s intentions.
  6. Suspicious network communication – The malware connects to the domain legitville[.]com.


Modeling a process tree, given different signals that happen at different times, is a complex task. It requires powerful models that can remember long sequences and still be able to generalize well enough to churn out high-quality detections. The Deep CNN-BiLSTM model we discussed in this blog is a powerful technology that helps Microsoft Defender ATP achieve this task. Today, this deep learning-based solution contributes to Microsoft Defender ATP’s capability to detect evolving threats like Bondat.

Microsoft Defender ATP raises alerts for these deep learning-driven detections, enabling security operations teams to respond to attacks using Microsoft Defender ATP’s other capabilities, like threat and vulnerability management, attack surface reduction, next-generation protection, automated investigation and response, and Microsoft Threat Experts. Notably, these alerts inform behavioral blocking and containment capabilities, which add another layer of protection by blocking threats if they somehow manage to start running on machines.

The impact of deep learning-based protections on endpoints accrues to the broader Microsoft Threat Protection (MTP), which combines endpoint signals with threat data from email and docs, identities, and apps to provide cross-domain visibility. MTP harnesses the power of Microsoft 365 security products to deliver unparalleled coordinated defense that detects, blocks, remediates, and prevents attacks across an organization’s Microsoft 365 environment. Through machine learning and AI technologies like the deep-learning model we discussed in this blog, MTP automatically analyzes cross-domain data to build a complete picture of each attack, eliminating the need for security operations centers (SOC) to manually build and track the end-to-end attack chain and relevant details. MTP correlates and consolidates attack evidence into incidents, so SOCs can save time and focus on critical tasks like expanding investigations and proactive threat hunting.


Arie Agranonik, Shay Kels, Guy Arazi

Microsoft Defender ATP Research Team


Talk to us

Questions, concerns, or insights on this story? Join discussions at the Microsoft Threat Protection and Microsoft Defender ATP tech communities.

Read all Microsoft security intelligence blog posts.

Follow us on Twitter @MsftSecIntel.

The post Seeing the big picture: Deep learning-based fusion of behavior signals for threat detection appeared first on Microsoft Security.

Misconfigured Kubeflow workloads are a security risk

June 10th, 2020 No comments

Azure Security Center (ASC) monitors and defends thousands of Kubernetes clusters running on top of AKS. ASC regularly searches for and researches new attack vectors against Kubernetes workloads. We recently published a blog post about a large-scale campaign against Kubernetes clusters that abused exposed Kubernetes dashboards to deploy cryptocurrency miners.

In this blog, we’ll reveal a new campaign that ASC recently observed targeting Kubeflow, a machine learning toolkit for Kubernetes. We observed that this attack affected tens of Kubernetes clusters.

Kubeflow is an open-source project that started as a way to run TensorFlow jobs on Kubernetes and has since grown into a popular framework for running machine learning tasks on Kubernetes. Nodes that are used for ML tasks are often relatively powerful and in some cases include GPUs. This makes Kubernetes clusters that are used for ML tasks a perfect target for crypto mining campaigns, which was the aim of this attack.

During April, we observed deployment of a suspect image from a public repository on many different clusters. The image is ddsfdfsaadfs/dfsdf:99. By inspecting the image’s layers, we can see that this image runs an XMRIG miner:


This repository contains several more images, which differ in the mining configuration. We saw some deployments of those images too.

Looking at the various clusters that the above image ran on showed that most of them run Kubeflow. This implies that the access vector in this attack is the machine-learning framework.

The question is how can Kubeflow be used as an access vector for such an attack?

The Kubeflow framework consists of many different services, including frameworks for training models, Katib, Jupyter notebook servers, and more.

Kubeflow is a containerized service: the various tasks run as containers in the cluster. Therefore, if attackers somehow get access to Kubeflow, they have multiple ways to run their malicious image in the cluster.

The framework is divided into different namespaces, which are a collection of Kubeflow services. Those namespaces are translated into Kubernetes namespaces in which the resources are deployed.

In first access to Kubeflow, the user is prompted to create a namespace:


In the picture above, we created a new namespace with the default name anonymous. This namespace is broadly seen in the attack and was one of the indicators to the access vector in this campaign.

Kubeflow creates multiple CRDs in the cluster which expose some functionality over the API server:


In addition, Kubeflow exposes its UI functionality via a dashboard that is deployed in the cluster:


The dashboard is exposed by Istio ingress gateway, which is by default accessible only internally. Therefore, users should use port-forward to access the dashboard (which tunnels the traffic via the Kubernetes API server).

In some cases, users modify the setting of the Istio Service to Load-Balancer, which exposes the Service (istio-ingressgateway in the namespace istio-system) to the Internet. We believe that some users do this for convenience: without it, accessing the dashboard requires tunneling through the Kubernetes API server and isn’t direct. By exposing the Service to the Internet, users can access the dashboard directly. However, this operation enables insecure access to the Kubeflow dashboard, which allows anyone to perform operations in Kubeflow, including deploying new containers in the cluster.

If attackers have access to the dashboard, they have multiple methods to deploy a backdoor container in the cluster. We will demonstrate two options:

  1. Kubeflow enables users to create a Jupyter notebook server. Kubeflow allows users to choose the image for the notebook server, including an option to specify a custom image:

Image of a Jupyter notebook server custom image deployment option.

This image doesn’t necessarily have to be a legitimate notebook image, so attackers can run their own image using this feature.

  2. Another method that attackers can use is to deploy a malicious container from a real Jupyter notebook: attackers can use a new or existing notebook for running their Python code. The code runs from the notebook server, which is itself a container with a mounted service account. This service account (in the default configuration) has permissions to deploy containers in its namespace. Therefore, attackers can use it to deploy their backdoor container in the cluster. Here’s an example of deploying a container from the notebook using its service account:


The Kubernetes threat matrix that we recently published contains techniques that can be used by attackers to attack the Kubernetes cluster. A representation of this campaign in the matrix would look like:


The attacker used an exposed dashboard (Kubeflow dashboard in this case) for gaining initial access to the cluster. The execution and persistence in the cluster were performed by a container that was deployed in the cluster. The attacker managed to move laterally and deploy the container using the mounted service account. Finally, the attacker impacted the cluster by running a cryptocurrency miner.

How to check if your cluster is impacted?

  1. Verify that the malicious container is not deployed in the cluster. The following command can help you to check it:

kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" | grep -i ddsfdfsaadfs

  2. In case Kubeflow is deployed in the cluster, make sure that its dashboard isn’t exposed to the internet: check the type of the Istio ingress service with the following command and make sure that it is not a load balancer with a public IP:

kubectl get service istio-ingressgateway -n istio-system
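If you add `-o json` to the command above, the output can also be checked programmatically. Here is a minimal sketch, assuming the standard Kubernetes Service schema (the function name is ours):

```python
def is_publicly_exposed(service: dict) -> bool:
    """Return True if a parsed Kubernetes Service is a LoadBalancer with external ingress."""
    if service.get("spec", {}).get("type") != "LoadBalancer":
        return False
    # A LoadBalancer Service reports its external endpoints under
    # status.loadBalancer.ingress once an address has been provisioned.
    ingress = (service.get("status", {})
                      .get("loadBalancer", {})
                      .get("ingress", []))
    return any("ip" in entry or "hostname" in entry for entry in ingress)
```

A Service of type ClusterIP, or a LoadBalancer that has not yet been assigned an external address, would not be flagged by this check.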


Azure Security Center has detected multiple campaigns against Kubernetes clusters in the past that have a similar access vector: an exposed service to the internet. However, this is the first time that we have identified an attack that targets Kubeflow environments specifically.

When deploying a service like Kubeflow within a cluster it is crucial to be aware of security aspects such as:

  1. Enforce authentication and access control for the application.
  2. Monitor the public-facing endpoints of the cluster. Make sure that sensitive interfaces are not exposed to the internet in an insecure manner. You can restrict public load balancers in the cluster by using Azure Policy, which now has integration with Gatekeeper.
  3. Regularly monitor the runtime environment. This includes monitoring the running containers, their images, and the processes that they run.
  4. Allow deployments of only trusted images and scan your images for vulnerabilities. The allowed images in the cluster can be restricted by using Azure Policy.

To learn more about AKS Support in Azure Security Center, please see this documentation.

Start a trial of Azure Security Center Standard to get advanced threat protection capabilities.

The post Misconfigured Kubeflow workloads are a security risk appeared first on Microsoft Security.

The science behind Microsoft Threat Protection: Attack modeling for finding and stopping evasive ransomware

June 10th, 2020 No comments

The linchpin of successful cyberattacks, exemplified by nation state-level attacks and human-operated ransomware, is their ability to find the path of least resistance and progressively move across a compromised network. Determining the full scope and impact of these attacks is one of the most critical, but often most challenging, parts of security operations.

To provide security teams with the visibility and solutions to fight cyberattacks, Microsoft Threat Protection (MTP) correlates threat signals across multiple domains and point solutions, including endpoints, identities, data, and applications. This comprehensive visibility allows MTP to coordinate prevention, detection, and response across your Microsoft 365 data.

One of the many ways that MTP delivers on this promise is by providing high-quality consolidation of attack evidence through the concept of incidents. Incidents combine related alerts and attack behaviors within an enterprise. An example of an incident is the consolidation of all behaviors indicating ransomware is present on multiple machines, and connecting lateral movement behavior with initial access via brute force. Another example can be found in the latest MITRE ATT&CK evaluation, where Microsoft Threat Protection automatically correlated 80 distinct alerts into two incidents that mirrored the two attack simulations.

The incident view helps empower defenders to quickly understand and respond to the end-to-end scope of real-world attacks. In this blog we will share details about a data-driven approach for identifying and augmenting incidents with behavioral evidence of lateral movement detected through statistical modeling. This novel approach, an intersection of data science and security expertise, is validated and leveraged by our own Microsoft Threat Experts in identifying and understanding the scope of attacks.

Identifying lateral movement

Attackers move laterally to escalate privileges or to steal information from specific machines in a compromised network. Lateral movement typically involves adversaries attempting to co-opt legitimate management and business operation capabilities, including applications such as Server Message Block (SMB), Windows Management Instrumentation (WMI), Windows Remote Management (WinRM), and Remote Desktop Protocol (RDP). Attackers target these technologies that have legitimate uses in maintaining functionality of a network because they provide ample opportunities to blend in with large volumes of expected telemetry and provide paths to their objectives. More recently, we have observed attackers performing lateral movement, and then using the aforementioned WMI or SMB to deploy ransomware or data-wiping malware to multiple target machines in the network.

A recent attack from the PARINACOTA group, known for human-operated attacks that deploy the Wadhrama ransomware, is notable for its use of multiple methods for lateral movement. After gaining initial access to an internet-facing server via RDP brute force, the attackers searched for additional vulnerable machines in the network by scanning on ports 3389 (RDP), 445 (SMB), and 22 (SSH).

The adversaries downloaded and used Hydra to brute force targets via SMB and SSH. In addition, they used credentials that they stole through credential dumping using Mimikatz to sign into multiple other server machines via Remote Desktop. On all additional machines they were able to access, the attackers performed mainly the same activities, dumping credentials and searching for valuable information.

Notably, the attackers were particularly interested in a server that did not have Remote Desktop enabled. They used WMI in conjunction with PsExec to allow remote desktop connections on the server and then used netsh to disable blocking on port 3389 in the firewall. This allowed the attackers to connect to the server via RDP.

They eventually used this server to deploy ransomware to a huge portion of the organization’s server machine infrastructure. The attack, an example of a human-operated ransomware campaign, crippled much of the organization’s functionality, demonstrating that detecting and mitigating lateral movement is critical.

PARINACOTA ransomware attack chain

Figure 1. PARINACOTA attack with multiple lateral movement methods

A probabilistic approach for inferring lateral movement

Automatically correlating alerts and evidence of lateral movement into distinct incidents requires understanding the full scope of an attack and establishing the links of an attacker’s activities that show movement across a network. Distinguishing malicious attacker activities among the noise of legitimate logons in complex networks can be challenging and time-consuming. Failing to get an aggregated view of all related alerts, assets, investigations, and evidence may limit the action that defenders take to mitigate and fully resolve an attack.

Microsoft Threat Protection uses its unique cross-domain visibility and built-in automation to detect lateral movement. The data-driven approach involves understanding and statistically quantifying behaviors that are observed to be part of one attack chain, for example, credential theft followed by remote connections to other devices and further unexpected or malicious activity.

Dynamic probability models, which are capable of self-learning over time using new information, quantify the likelihood of observing lateral movement given relevant signals. These signals can include the frequency of network connections between endpoints over certain ports, suspicious dropped files, and types of processes that are executed on endpoints. Multiple behavioral models encode different facets of an attack chain by correlating specific behaviors associated with attacks. These models, in combination with anomaly detection, drive the discovery of both known and unknown attacks.
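As a simplified illustration of such a probability model, the sketch below combines per-signal likelihood ratios in a naive Bayes fashion. The signal names, ratios, and prior are illustrative assumptions; the production models are dynamic and self-learning, not fixed tables.

```python
import math

# Illustrative likelihood ratios: P(signal | attack) / P(signal | benign).
LIKELIHOOD_RATIOS = {
    "unusual_port_connection": 4.0,
    "suspicious_dropped_file": 9.0,
    "credential_theft_process": 20.0,
}

def attack_log_odds(observed_signals, prior_log_odds=-6.0):
    """Update the prior log-odds of an active attack with each observed signal."""
    log_odds = prior_log_odds
    for signal in observed_signals:
        # Unknown signals carry a ratio of 1.0 and leave the score unchanged.
        log_odds += math.log(LIKELIHOOD_RATIOS.get(signal, 1.0))
    return log_odds

def attack_probability(observed_signals):
    """Convert the accumulated log-odds back into a probability."""
    odds = math.exp(attack_log_odds(observed_signals))
    return odds / (1.0 + odds)
```

A single weak signal barely moves the score, but the combination of several, in the right order and time window, pushes the probability past an actionable threshold, which mirrors how correlated behaviors drive discovery.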

Evidence of lateral movement can be modeled using a graph-based approach, which involves constructing appropriate nodes and edges in the right timeline. Figure 2 depicts a graphical representation of how an attacker might laterally move through a network. The objective of graphing an attack is to discover related subgraphs with high enough confidence to surface for immediate further investigation. Building behavioral models that can accurately compute probabilities of attacks is key to ensuring that confidence is correctly measured and all related events are combined.


Figure 2. Visualization of network with an attacker moving laterally (combining incidents 1, 2, 4, 5)

Figure 3 outlines the steps involved for modeling lateral movement and encoding behaviors that are later referenced for augmenting incidents. Through advanced hunting, examples of lateral movement are surfaced, and real attack behaviors are analyzed. Signals are then formed by aggregating telemetry, and behavioral models are defined and computed.

Diagram showing steps for specifying statistical models for detecting lateral movement

Figure 3. Specifying statistical models to detect lateral movement encoding behaviors

Behavioral models are carefully designed by statisticians and threat experts working together to combine best practices from probabilistic reasoning and security, and to precisely reflect the attacker landscape.

With behavioral models specified, the process for incident augmentation proceeds by applying fuzzy mapping to the respective behaviors, followed by estimating the likelihood of an attack. For example, if there is sufficient confidence that the likelihood of an attack is higher when the lateral movement behaviors are included, then the events are linked. Figure 4 shows the flow of this logic. We have demonstrated that combining this modeling with a feedback loop based on expert knowledge and real-world examples accurately discovers attack chains.

Diagram showing steps of algorithm for augmenting incidents using graph inference

Figure 4. Flow of incident augmentation algorithm based on graph inference

Chaining together the flow of this logic in a graph exposes attacks as they traverse a network. Figure 5 shows, for instance, how alerts can be leveraged as nodes and DCOM traffic (TCP port 135) as edges to identify lateral movement across machines. The alerts on these machines can then be fused together into a single incident. Visualizing these edges and nodes in a graph shows how a single compromised machine could allow an attacker to move laterally to three machines, one of which was then used for even further lateral movement.

Diagram showing relevant alerts as an attack move laterally from one machine to other machines

Figure 5. Correlating attacks as they pivot through machines
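The fusion logic in Figure 5 can be sketched as a connected-components computation over a machine graph, where alerted machines are nodes and observed connections (e.g., DCOM traffic on TCP port 135) are edges. The data structures below are simplified stand-ins for the real telemetry:

```python
from collections import defaultdict

def fuse_alerts_into_incidents(alerted_machines, connections):
    """Group alerted machines into incidents via connected components.

    connections: iterable of (src, dst) machine pairs, e.g. DCOM traffic on TCP 135.
    """
    graph = defaultdict(set)
    for src, dst in connections:
        graph[src].add(dst)
        graph[dst].add(src)

    alerted = set(alerted_machines)
    incidents, seen = [], set()
    for machine in alerted_machines:
        if machine in seen:
            continue
        # Iterative depth-first traversal of the connection graph.
        component, stack = set(), [machine]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node])
        seen |= component
        # Only alerted machines appear in the incident itself.
        incidents.append(sorted(component & alerted))
    return incidents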

Augmenting incidents with lateral movement intel

The PARINACOTA attack we described earlier is a human-operated ransomware campaign that involved compromising six newly onboarded servers. Microsoft Threat Protection automatically correlated the following events into an incident that showed the end-to-end attack chain:

  • A behavioral model identified RDP inbound brute force attempts that started a few days before the ransomware was deployed, as depicted in Figure 6.
  • When the initial compromise was detected, the brute force attempts were automatically identified as the cause of the breach.
  • Following the breach, attackers dropped multiple suspicious files on the compromised server and proceeded to move laterally to multiple other servers and deploy the ransomware payload. This attack chain raised 16 distinct alerts that Microsoft Threat Protection, applying the probabilistic reasoning method, correlated into the same incident indicating the spread of ransomware, as illustrated in Figure 7.

Graph showing increased daily inbound RDP traffic

Figure 6. Indicator of brute force attack based on time series count of daily inbound public IP
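The indicator in Figure 6 amounts to flagging an anomalous jump in a daily count time series. Here is a minimal sketch with an assumed z-score threshold; the production time-series model is considerably richer:

```python
import statistics

def is_brute_force_spike(daily_counts, threshold=3.0):
    """Flag the latest day if its count is an outlier versus the history.

    daily_counts: daily counts of inbound connections from public IPs,
    oldest first, with the day under test last.
    """
    *history, today = daily_counts
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against zero variance
    return (today - mean) / stdev > threshold
```

A stable baseline of roughly a dozen daily inbound connections followed by a day with many times that count yields a large z-score and raises the indicator, while ordinary day-to-day fluctuation does not.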

Diagram showing ransomware being deployed after an attacker has moved laterally

Figure 7. Representation of post breach and ransomware spreading from initial compromised server

Another area where constructing graphs is particularly useful is when attacks originate from unknown devices. These unknown devices can be misconfigured machines, rogue devices, or even IoT devices within a network. Even when there’s no robust telemetry from devices, they can still be used as linking points for correlating activity across multiple monitored devices.

In one example, as demonstrated in Figure 8, we saw lateral movement from an unmonitored device via SMB to a monitored device. That device then established a connection back to a command-and-control (C2) server, set up persistence, and collected a variety of information from the device. Later, the same unmonitored device established an SMB connection to a second monitored device. This time, the only action the attacker took was collecting information from the device.

The two devices shared a common set of events that were correlated into the same incident:

  • Sign-in from an unknown device via SMB
  • Collecting device information

Diagram showing suspicious traffic from unknown devices

Figure 8: Correlating attacks from unknown devices


Lateral movement is one of the most challenging areas of attack detection because it can be a very subtle signal amidst the normal hum of a large environment. In this blog we described a data-driven approach for identifying lateral movement in enterprise networks, with the goal of driving incident-level discovery of attacks, delivering on the Microsoft Threat Protection (MTP) promise to provide coordinated defense against attacks. This approach works by:

  • Consolidating signals from Microsoft Threat Protection’s unparalleled visibility into endpoints, identities, data, and applications.
  • Forming automated, compound questions of the data to identify evidence of an attack across the data ecosystem.
  • Building subgraphs of lateral movement across devices by modeling attack behavior probabilistically.

This approach combines industry-leading optics, expertise, and data science, resulting in automated discovery of some of the most critical threats in customer environments today. Through Microsoft Threat Protection, organizations can uncover lateral movement in their networks and gain understanding of end-to-end attack chains. Microsoft Threat Protection empowers defenders to automatically stop and resolve attacks, so security operations teams can focus their precious time and resources to more critical tasks, including performing mitigation actions that can remove the ability of attackers to move laterally in the first place, as outlined in some of our recent investigations here and here.



Justin Carroll, Cole Sodja, Mike Flowers, Joshua Neil, Jonathan Bar Or, Dustin Duran

Microsoft Threat Protection Team


The post The science behind Microsoft Threat Protection: Attack modeling for finding and stopping evasive ransomware appeared first on Microsoft Security.

Microsoft researchers work with Intel Labs to explore new deep learning approaches for malware classification

May 8th, 2020 No comments

The opportunities for innovative approaches to threat detection through deep learning, a category of algorithms within the larger framework of machine learning, are vast. Microsoft Threat Protection today uses multiple deep learning-based classifiers that detect advanced threats, for example, evasive malicious PowerShell.

In continued exploration of novel detection techniques, researchers from Microsoft Threat Protection Intelligence Team and Intel Labs are collaborating to study new applications of deep learning for malware classification, specifically:

  • Applying deep transfer learning techniques from computer vision to static malware classification
  • Optimizing deep learning techniques in terms of model size and leveraging platform hardware capabilities to improve the execution of deep learning-based malware detection approaches

For the first part of the collaboration, the researchers built on Intel’s prior work on deep transfer learning for static malware classification and used a real-world dataset from Microsoft to ascertain the practical value of approaching the malware classification problem as a computer vision task. The basis for this study is the observation that if malware binaries are plotted as grayscale images, the textural and structural patterns can be used to effectively classify binaries as either benign or malicious, as well as cluster malicious binaries into respective threat families.

The researchers used an approach that they called static malware-as-image network analysis (STAMINA). Using the dataset from Microsoft, the study showed that the STAMINA approach achieves high accuracy in detecting malware with low false positives.

The results and further technical details of the research are listed in the paper STAMINA: Scalable deep learning approach for malware classification and set the stage for further collaborative exploration.

The role of static analysis in deep learning-based malware classification

While static analysis is typically associated with traditional detection methods, it remains an important building block for AI-driven detection of malware. It is especially useful for pre-execution detection engines: static analysis disassembles code without having to run applications or monitor runtime behavior.

Static analysis produces metadata about a file. Machine learning classifiers on the client and in the cloud then analyze the metadata and determine whether a file is malicious. Through static analysis, most threats are caught before they can even run.

For more complex threats, dynamic analysis and behavior analysis build on static analysis to provide more features and build more comprehensive detection. Finding ways to perform static analysis at scale and with high effectiveness benefits overall malware detection methodologies.

To this end, the research borrowed knowledge from the computer vision domain to build an enhanced static malware detection framework that leverages deep transfer learning to train directly on portable executable (PE) binaries represented as images.

Analyzing malware represented as images

To establish the practicality of the STAMINA approach, which posits that malware can be classified at scale by performing static analysis on malware codes represented as images, the study covered three main steps: image conversion, transfer learning, and evaluation.

Diagram showing the steps for the STAMINA approach: pre-processing, transfer learning, and evaluation

First, the researchers prepared the binaries by converting them into two-dimensional images. This step involved pixel conversion, reshaping, and resizing. The binaries were converted into a one-dimensional pixel stream by assigning each byte a value between 0 and 255, corresponding to pixel intensity. Each pixel stream was then transformed into a two-dimensional image by using the file size to determine the width and height of the image.
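A minimal sketch of that conversion step follows. The width heuristic keyed to file size is our assumption for illustration; the study's exact size-to-width mapping is not reproduced here.

```python
def binary_to_image(data: bytes):
    """Convert a binary's bytes into a 2D grid of pixel intensities (0-255)."""
    # Width heuristic keyed to file size; the exact buckets are an assumption.
    size = len(data)
    if size < 10_000:
        width = 32
    elif size < 100_000:
        width = 256
    else:
        width = 1024
    pixels = list(data)  # each byte is already an intensity in 0-255
    # Pad the final row with zeros so the image is rectangular.
    if len(pixels) % width:
        pixels += [0] * (width - len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

The resulting two-dimensional grid can then be resized and fed to a standard image classifier, which is what makes transfer learning from computer vision models applicable.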

The second step was to use transfer learning, a technique for overcoming the isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones. Transfer learning has enjoyed tremendous success within several different computer vision applications. It accelerates training time by bypassing the need to search for optimized hyperparameters and different architectures—all this while maintaining high classification performance. For this study, the researchers used Inception-v1 as the base model.

The study was performed on a dataset of 2.2 million PE file hashes provided by Microsoft. This dataset was temporally split into 60:20:20 segments for training, validation, and test sets, respectively.
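A temporal split, unlike a random one, keeps the model from training on files that appeared after its test data. A minimal sketch of such a 60:20:20 split, assuming each sample carries a first-seen timestamp (the field name is ours):

```python
def temporal_split(samples, key=lambda s: s["first_seen"]):
    """Split chronologically into 60/20/20 train/validation/test sets."""
    ordered = sorted(samples, key=key)
    n = len(ordered)
    train_end, val_end = int(n * 0.6), int(n * 0.8)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Because every training sample predates every test sample, the evaluation better reflects how the classifier would perform against malware it has genuinely never seen.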

Diagram showing a DNN with pre-trained weights on natural images, and the last portion fine-tuned with new data

Finally, the performance of the system was measured and reported on the holdout test set. The metrics captured include recall at specific false positive range, along with accuracy, F1 score, and area under the receiver operating curve (ROC).
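Recall at a specific false positive rate can be computed by choosing the score threshold that caps the false positive rate and measuring recall there. Here is a minimal sketch; the function and its threshold logic are our simplification of the standard metric:

```python
def recall_at_fpr(scores, labels, max_fpr=0.001):
    """Recall achievable while keeping the false positive rate <= max_fpr.

    scores: model scores (higher = more likely malicious); labels: 1 = malicious.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    allowed_fps = int(len(negatives) * max_fpr)
    # Threshold sits at the first negative we are NOT allowed to misclassify.
    threshold = negatives[allowed_fps] if allowed_fps < len(negatives) else float("-inf")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    detected = sum(1 for s in positives if s > threshold)
    return detected / len(positives)
```

Reporting recall at a fixed, very low false positive rate matters in malware detection because even a small fraction of benign files flagged as malicious is disruptive at the scale of millions of machines.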


The joint research showed that applying STAMINA to real-world hold-out test data set achieved a recall of 87.05% at 0.1% false positive rate, and 99.66% recall and 99.07% accuracy at 2.58% false positive rate overall. The results certainly encourage the use of deep transfer learning for the purpose of malware classification. It helps accelerate training by bypassing the search for optimal hyperparameters and architecture searches, saving time and compute resources in the process.

The study also highlights the pros and cons of sample-based methods like STAMINA and metadata-based classification methods. For example, STAMINA can go in-depth into samples and extract additional signals that might not be captured in the metadata. However, for larger applications, STAMINA becomes less effective due to the limitations of converting billions of pixels into JPEG images and then resizing them. In such cases, metadata-based methods show their advantages.

Conclusion and future work

The use of deep learning methods for detecting threats drives a lot of innovation across Microsoft. The collaboration with Intel Labs researchers is just one of the ways in which Microsoft researchers and data scientists continue to explore novel ways to improve security overall.

This joint research is a good starting ground for more collaborative work. For example, the researchers plan to collaborate further on platform acceleration optimizations that can allow deep learning models to be deployed on client machines with minimal performance impact. Stay tuned.


Jugal Parikh, Marc Marino

Microsoft Threat Protection Intelligence Team


The post Microsoft researchers work with Intel Labs to explore new deep learning approaches for malware classification appeared first on Microsoft Security.

Secure the software development lifecycle with machine learning

April 16th, 2020 No comments

Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem, data science and security teams came together to explore how machine learning could help. We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.

At Microsoft, 47,000 developers generate nearly 30,000 bugs a month. These items get stored across over 100 Azure DevOps and GitHub repositories. To better label and prioritize bugs at that scale, we couldn’t just apply more people to the problem. However, large volumes of semi-curated data are perfect for machine learning. Since 2001, Microsoft has collected 13 million work items and bugs. We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99 percent of the time and accurately identifies the critical, high-priority security bugs 97 percent of the time. This is an overview of how we did it.

Qualifying data for supervised learning

Our goal was to build a machine learning system that classifies bugs as security/non-security and critical/non-critical with a level of accuracy that is as close as possible to that of a security expert. To accomplish this, we needed a high-volume of good data. In supervised learning, machine learning models learn how to classify data from pre-labeled data. We planned to feed our model lots of bugs that are labeled security and others that aren’t labeled security. Once the model was trained, it would be able to use what it learned to label data that was not pre-classified. To confirm that we had the right data to effectively train the model, we answered four questions:

  • Is there enough data? Not only do we need a high volume of data, we also need data that is general enough and not fitted to a small number of examples.
  • How good is the data? If the data is noisy it means that we can’t trust that every pair of data and label is teaching the model the truth. However, data from the wild is likely to be imperfect. We looked for systemic problems rather than trying to get it perfect.
  • Are there data usage restrictions? Are there reasons, such as privacy regulations, that we can’t use the data?
  • Can data be generated in a lab? If we can generate data in a lab or some other simulated environment, we can overcome other issues with the data.

Our evaluation gave us confidence that we had enough good data to design the process and build the model.

Data science + security subject matter expertise

Our classification system needs to perform like a security expert, which means the subject matter expert is as important to the process as the data scientist. To meet our goal, security experts approved training data before we fed it to the machine learning model. We used statistical sampling to provide the security experts a manageable amount of data to review. Once the model was working, we brought the security experts back in to evaluate the model in production.

With a process defined, we could design the model. To classify bugs accurately, we used a two-step machine learning model operation. First the model learned how to classify security and non-security bugs. In the second step the model applied severity labels—critical, important, low-impact—to the security bugs.
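The two-step operation can be sketched in a few lines of Python. Microsoft has not published its actual models, so the keyword lists below are invented stand-ins for the trained classifiers (and the low-impact severity label is omitted for brevity); only the two-step orchestration mirrors the description above.

```python
# Stand-in stage models: in production each stage is a trained classifier;
# these keyword lists are invented to illustrate the two-step orchestration.
SECURITY_TERMS = {"xss", "overflow", "injection", "csrf", "spoofing"}
CRITICAL_TERMS = {"overflow", "injection"}

def is_security_bug(title):
    """Step 1: classify the bug as security vs. non-security."""
    return bool(set(title.lower().split()) & SECURITY_TERMS)

def severity(title):
    """Step 2: apply a severity label, only to bugs step 1 flagged."""
    return "critical" if set(title.lower().split()) & CRITICAL_TERMS else "important"

def classify(title):
    if not is_security_bug(title):
        return "non-security"
    return "security/" + severity(title)
```

Running the two stages in sequence means the severity model never sees non-security bugs, which keeps each stage's training problem simpler.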

Our approach in action

Building an accurate model is an iterative process that requires strong collaboration between subject matter experts and data scientists:

Data collection: The project starts with data science. We identify all the data types and sources and evaluate their quality.

Data curation and approval: Once the data scientist has identified viable data, the security expert reviews the data and confirms the labels are correct.

Modeling and evaluation: Data scientists select a data modeling technique, train the model, and evaluate model performance.

Evaluation of model in production: Security experts evaluate the model in production by monitoring the average number of bugs and manually reviewing a random sampling of bugs.

The process didn’t end once we had a model that worked. To make sure our bug modeling system keeps pace with the ever-evolving products at Microsoft, we conduct automated re-training. The data is still approved by a security expert before the model is retrained, and we continuously monitor the number of bugs generated in production.
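The production monitoring described above can be approximated with a simple drift check: compare the share of bugs the model currently labels as security against a historical baseline. The function name, rates, and tolerance below are illustrative, not Microsoft's actual monitoring logic.

```python
def needs_expert_review(baseline_rate, recent_labels, tolerance=0.25):
    """Flag the model for security-expert review when the share of bugs
    labeled 'security' in production drifts from the historical baseline.
    The tolerance and label names are illustrative."""
    if not recent_labels:
        return False
    rate = recent_labels.count("security") / len(recent_labels)
    return abs(rate - baseline_rate) > tolerance * baseline_rate
```

A check like this catches both a model that suddenly over-flags (wasting expert time) and one that under-flags (missing vulnerabilities), either of which would trigger the expert-approved retraining step.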

More to come

By applying machine learning to our data, we accurately classify which work items are security bugs 99 percent of the time. The model is also 97 percent accurate at labeling critical and non-critical security bugs. This level of accuracy gives us confidence that we are catching more security vulnerabilities before they are exploited.

In the coming months, we will open source our methodology on GitHub.

In the meantime, you can read a published academic paper, Identifying security bug reports based solely on report titles and noisy data, for more details. Or download a short paper that was featured at Grace Hopper Celebration 2019.

Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity. To learn more about our Security solutions visit our website.

The post Secure the software development lifecycle with machine learning appeared first on Microsoft Security.

Welcoming more women into cybersecurity: the power of mentorships

March 19th, 2020 No comments

From the way our industry tackles cyber threats, to the language we have developed to describe these attacks, I’ve long been a proponent of challenging traditional schools of thought—traditional cyber-norms—and encouraging our industry to get outside its comfort zones. It’s important to expand our thinking in how we address the evolving threat landscape. That’s why I’m not a big fan of stereotypes; looking at someone and saying they “fit the mold.” Looking at my CV, one would think I wanted to study law, or politics, not become a cybersecurity professional. These conscious and unconscious biases shackle our progress. The scale of our industry challenges is too great, and if we don’t push boundaries, we miss out on the insights that differences in race, gender, ethnicity, sexuality, neurology, ability, and degrees can bring.

As we seek to diversify the talent pool, a key focus needs to be on nurturing female talent. Microsoft has hired many women in security, and we will always focus on keeping a diverse workforce. That’s why as we celebrate Women in Cybersecurity Month and International Women’s Day, the security blog will feature a few women cybersecurity leaders who have been implementing some of their great ideas for how to increase the number of women in this critical field. I’ll kick it off the series with some thoughts on how we can build strong mentoring relationships and networks that encourage women to pursue careers in cybersecurity.

There are many women at Microsoft who lead our security efforts. I’m incredibly proud to be among these women, like Joy Chik, Corporate Vice President of Identity, who is pushing the boundaries on how the tech industry is thinking about going passwordless, and Valecia Maclin, General Manager of Security Engineering, who is challenging us to think outside the box when it comes to our security solutions. On my own team, I think of the many accomplishments of Ping Look, who co-founded Black Hat and now leads our Detection and Response Team (DART), Sian John, MBE, who was recently recognized as one of the top 50 influencers in cybersecurity in the U.K., and Diana Kelley, Microsoft CTO, who tirelessly travels the globe to share how we are empowering our customers through cybersecurity—just to name a few. It’s important we continue to highlight women like these, including our female cybersecurity professionals at Microsoft who made the Top 100 Cybersecurity list in 2019. The inspiration from their accomplishments goes far beyond our Microsoft campus. These women represent the many Microsoft women in our talented security team. This month, you’ll also hear from some of them in subsequent blog posts on how to keep the diverse talent you already have employed. And to conclude the month, Theresa Payton, CEO at Fortalice Solutions, LLC., and the host of our CISO Spotlight series, will share tips from her successful experience recruiting talented women into IT and cybersecurity.

Our cyber teams must be as diverse as the problems we are trying to solve

You’ve heard me say this many times, and I truly believe this: As an industry, we’ve already acknowledged the power of diversity—in artificial intelligence (AI). We have clear evidence that a variety of data across multiple sources and platforms enhances and improves AI and machine learning models. Why wouldn’t we apply that same advantage to our teams? This is one of several reasons why we need to take diversity and inclusion seriously:

  • Diverse teams make better and faster decisions 87 percent of the time compared with all-male teams, yet the actual number of women in our field fluctuates between 10 and 20 percent. What ideas have we missed by not including more women?
  • With an estimated shortfall of 3.5 million security professionals by 2021, the current tech talent pipeline needs to expand—urgently.
  • Cyber criminals will continue to exploit the unconscious bias inherent in the industry by understanding and circumventing the homogeneity of our methods. If we are to win the cyber wars through the element of surprise, we need to make our strategy less predictable.

Mentoring networks must start early

Mentorship can be a powerful tool for increasing the number of women in cybersecurity. People select careers that they can imagine themselves doing. This process starts young. Recently a colleague’s pre-teen daughter signed up for an after-school robotics class. When she showed up at the class, only two other girls were in the room. Girls are opting out of STEM before they can (legally) opt into a PG-13 movie. But we can change this. By exposing girls to technology earlier, we can reduce the intimidation factor and get them excited. One group that is doing this is the Security Advisor Alliance. Get involved in organizations like this to reach girls and other underrepresented groups before they decide cybersecurity is not for them.

Building a strong network

Mentoring young people is important, but to solve the diversity challenges, we also need to bring in people who started on a different career path or who don’t have STEM degrees. You simply won’t find the talent you need through the anemic pipeline of college-polished STEM graduates. I recently spoke with Mari Galloway, a senior security architect in the gaming industry and CEO of the Women’s Society of Cyberjutsu (WSC), about this very topic on my podcast. She agreed on the importance of finding a mentor and being a mentee.

Those seeking to get into cybersecurity need a network that provides the encouragement and constructive feedback that will help them grow. I have mentored several non-technical women who have gone on to have successful roles in cybersecurity. These relationships have been very rewarding for me and my mentees, which is why I advocate that everybody should become a mentor and a mentee.

If you haven’t broken into cybersecurity yet, or if you are in the field and want to grow your career, here are a few tips:

  • Close the skills gap through training and certificate programs offered by organizations like Sans Institute and ISC2. I am especially excited about Girls Go Cyberstart, a program for young people that Microsoft is working on with Sans Institute.
  • Build up your advocate bench with the following types of mentors:
    • Career advocate: Someone who helps you with your career inside your company or the one you want to enter.
    • Coach: Someone outside your organization who brings a different perspective to troubleshooting day-to-day problems.
    • Senior advisor: Someone inside or outside your organization who looks out for the next step in your career.
  • Use social media to engage in online forums, find local events, and reach experts. Several of my mentees use LinkedIn to start the conversation.
  • When you introduce yourself to someone online, be clear that you are interested in their cumulative experience, not just their job status.

For those already in cybersecurity, be open to those from the outside seeking guidance, especially if they don’t align with traditional expectations of who a cybersecurity professional is.

Mentorship relationships that yield results

A mentorship is only going to be effective if the mentee gets valuable feedback and direction from the relationship. This requires courageous conversations. It’s easy to celebrate a mentee’s visible wins. However, those moments are the result of unseen trench work that consists of course correcting and holding each other accountable to agreed upon actions. Be prepared to give and receive constructive, actionable feedback.

Creating inclusive cultures

More women and diverse talent should be hired in security not only because it is the right thing to do, but because gaining the advantage in fighting cybercrime depends on it. Mentorship is one strategy to include girls before they opt out of tech, and to recruit people from non-STEM backgrounds.

What’s next

Watch for Diana Kelley’s blog about how to create a culture that keeps women in the field.

Learn more about Girls Go Cyberstart.

Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity. Or reach out to me on LinkedIn or Twitter.

The post Welcoming more women into cybersecurity: the power of mentorships appeared first on Microsoft Security.

New Microsoft Security innovations and partnerships

February 20th, 2020 No comments

Today on the Official Microsoft Blog, Ann Johnson, Corporate Vice President of the Cybersecurity Solutions Group, shared how Microsoft is helping turn the tide in cybersecurity by putting artificial intelligence (AI) in the hands of defenders. She announced the general availability of Microsoft Threat Protection, new platforms supported by Microsoft Defender Advanced Threat Protection (ATP), new capabilities in Azure Sentinel, and the general availability of Insider Risk Management in Microsoft 365.

Today, we’re also announcing:

  • An expanded public preview of FIDO2 security key support in Azure Active Directory (AD) to encompass hybrid environments. Workers can now sign in to work-owned Windows 10 devices with their Azure AD accounts using a FIDO2 security key instead of a password and automatically get single sign-on (SSO) to both on-premises and cloud resources.
  • New integration between Microsoft Cloud App Security and Microsoft Defender ATP that enables endpoint-based control of unsanctioned cloud applications. Administrators can now control the unauthorized use of cloud apps with protection built right into the endpoint.
  • Azure Security Center for IoT now supports a broader range of devices, including Azure RTOS, Linux (specifically Ubuntu and Debian), and Windows 10 IoT Core. SecOps professionals can now reason over signals in an experience that combines IT and OT into a single view.
  • Two new features of Office 365 Advanced Threat Protection (ATP), campaign views and compromise detection and response, are now generally available. Campaign views give security teams a complete view of email attack campaigns and make it easier to address vulnerable users and configuration issues. Compromise detection and response speeds the detection of compromised users and is critical to ensuring that attacks are blocked early and the impact of a breach is minimized.
  • In partnership with Terranova, we will offer customized user learning paths in Office 365 ATP later this year. User education needs to be part of every organization’s security strategy, and we are investing in raising the efficacy of security awareness training.

These innovations are just a part of our commitment to built-in and cross-platform security that embraces AI and is deeply integrated together.

This integration also spans a broad ecosystem of security vendors to help solve for our customers’ security and compliance needs. We now have more than 100 members in the Microsoft Intelligent Security Association, including new members such as ServiceNow, Thales, and Trend Micro, and new IoT security solution providers like Attivo Networks, CyberMDX, CyberX, and Firedome to alleviate the integration challenges enterprises face.

To recognize outstanding efforts across the security ecosystem, on February 23, 2020—the night before the RSA Conference begins—we’ll host our inaugural security partner awards event, Microsoft Security 20/20, to celebrate our partners.

Good people, supported by AI and automation, have the advantage in the ongoing cybersecurity battle. That’s why we continue to innovate with new security and compliance solutions to help our customers in this challenge.

The post New Microsoft Security innovations and partnerships appeared first on Microsoft Security.

Free import of AWS CloudTrail logs through June 2020 and other exciting Azure Sentinel updates

February 20th, 2020 No comments

SecOps teams are increasingly challenged to protect assets across distributed environments, analyze the growing volume of security data, and prioritize response to real threats.

As a cloud-native SIEM solution (security information and event management), Azure Sentinel uses artificial intelligence (AI) and automation to help address these challenges. Azure Sentinel empowers SecOps teams to be more efficient and effective at responding to threats in the cloud, on-premises, and beyond.

Azure Sentinel

Intelligent security analytics for your entire enterprise.

Learn more

Our innovation continues, and we have some exciting news to share for the RSA 2020 conference, including the ability to import AWS CloudTrail data for free through June 2020, opportunities to win up to $1,000 for community contributions, and many other product updates.

Enable unified response across multiple clouds—now with free import of AWS CloudTrail data through June 2020

More than 60 percent of enterprises have a hybrid cloud strategy—a combination of private and multi-cloud deployments. We’re committed to helping SecOps teams defend the entire stack, not just Microsoft workloads. That’s why Azure Sentinel includes built-in connectors to bring together data from Microsoft solutions with data from other cloud platforms and security solutions.

You can already ingest data from Azure activity logs, Office 365 audit logs, and alerts from Microsoft 365 security solutions at no additional cost. To further help our customers secure their entire multi-cloud estate, today we’re announcing the ability to import your AWS CloudTrail logs into Azure Sentinel at no additional cost from February 24, 2020 until June 30, 2020.

New and existing customers of Azure Sentinel can take advantage of this offer by using the built-in connector for AWS CloudTrail logs. Data retention charges beyond the 90-day period and other related charges still apply during this time, per Azure Sentinel terms. Learn more about Azure Sentinel pricing.

Image of AWS CloudTrail logs.

Once connected to your AWS CloudTrail logs, you can visualize and get relevant insights using built-in workbooks. You can even customize these dashboards and combine insights from other sources to meet your needs:

Image of AWS network activities.

Detections and hunting queries developed by Microsoft Security experts will make it easier to identify and respond to potential threats in your AWS environment:

Image showing credential abuse in AWS CloudTrail.

Gain visibility into threats targeting IoT

With the exponential growth in connected devices creating an uptick in attacks targeting IoT, it is critical for enterprise SecOps teams to include IoT data in their scope. A new Azure Security Center for IoT connector makes it easy for customers to onboard data from Azure IoT Hub-managed deployments into Azure Sentinel. Customers can now monitor alerts across all IoT Hub deployments along with other related alerts in Azure Sentinel, inspect and triage IoT incidents, and run investigations to track an attacker’s lateral movement within their enterprise.

With this announcement, Azure Sentinel is the first SIEM with native IoT support, allowing SecOps teams and analysts to identify threats in these complex converged environments.

In addition, Upstream Security, a cloud-based automotive cybersecurity detection and response company, is launching integration with Azure Sentinel. This will enable customers to send threats detected by Upstream Security’s C4 platform to Azure Sentinel for further investigation.

Collect data from additional data sources

We’re continually adding new data connectors from leading security solutions and partners. Each of these data connectors has sample queries and dashboards to help you start working with the data immediately in Azure Sentinel:

  • Forcepoint—Three new connectors enable customers to bring in data from Forcepoint NextGen Firewall logs (NGFW), Cloud Access Security Broker (CASB) logs and events, and Data Loss Prevention (DLP) incident data in Azure Sentinel.
  • Zimperium—Customers can use the Zimperium Mobile Threat Defense (MTP) connector to get Zimperium threat logs in Azure Sentinel.
  • Squadra technologies—Customers can get their Squadra secRMM (security removable media manager) event data for the USB removable devices in Azure Sentinel.

Bring SIGMA detections to Azure Sentinel

The SOC Prime Threat Detection Marketplace—which includes 950+ rules mapped to MITRE ATT&CK to address over 180 attacker techniques—now supports Azure Sentinel analytics rules. The SOC Prime marketplace provides unprecedented access to the latest threat detection content from the SIGMA community, SOC Prime team, and its Threat Bounty Program members. New detection rules are continuously created and updated by security researchers and published daily to the SOC Prime marketplace, helping companies detect the latest threats and vulnerability exploitation attempts and enabling TTP-based threat hunting. Once the rules are published, the Azure Sentinel integration lets you deploy them from within TDM to your Azure Sentinel instance with just one click.

Use ReversingLabs threat intelligence to inform threat response

ReversingLabs brings two new integrations to Azure Sentinel, enabling customers to leverage rich ReversingLabs threat intelligence for hunting and investigation in Azure Sentinel. The first integration features an Azure Sentinel Notebooks sample that connects to the ReversingLabs API to enable hunting scenarios that include ReversingLabs threat intelligence data. In addition, a new ReversingLabs TitaniumCloud connector for Azure Logic Apps and sample playbook enable security incident responders to automatically identify key information about file-based threats to rapidly triage incoming alerts.

Detect threats with greater confidence using new machine learning models

Azure Sentinel uses AI-based Fusion technology to stitch together huge volumes of low and medium fidelity alerts across different sources and then elevates the combined incidents to a high priority alert that security professionals can investigate. Learn how Azure Sentinel evaluated nearly 50 million suspicious signals for Microsoft in a single month to create just 23 high confidence incidents for our SecOps team to investigate.

In addition to the existing machine learning detections that look for multi-stage attacks, we are introducing several new scenarios in public preview using Microsoft Defender Advanced Threat Protection (ATP) and Palo Alto logs. These new detections will help SecOps teams to identify attacks that may otherwise be missed and reduce the mean time to remediate threats.

Manage incidents across multiple tenants and workspaces

Managed security service providers and large enterprises often need a central place to manage security incidents across multiple workspaces and tenants. Integration of Azure Sentinel with Azure Lighthouse now lets you view and investigate incidents from different tenants and workspaces in a central pane. This will also help enterprises who need to keep separate workspaces in different regions to meet regulatory requirements while managing incidents in a central place.

Join the Azure Sentinel private preview in Azure Government

Azure Sentinel is now available in private preview in Azure Government, starting with US Gov Virginia region. To join the preview please contact us at

Azure Sentinel is currently going through the FedRAMP-High certification process, and Microsoft anticipates achieving compliance by the summer of 2020.

Get rewarded up to $1,000 for your contributions to the Azure Sentinel community

Cybersecurity is a community-driven effort with defenders helping each other to scale against sophisticated, rapidly evolving threats. Azure Sentinel has a thriving community of threat hunters that share hunting, detection and investigation queries, automated workflows, visualizations, and much more in the Azure Sentinel GitHub repository.

We’re announcing a special program for our threat hunter community, featuring:

Review the Recognition and Rewards documentation and see our newly redesigned GitHub experience.

Try Azure Sentinel and visit us at the RSA Conference 2020

Since the general availability of Azure Sentinel last September, there are many examples of how Azure Sentinel helps customers like ASOS, Avanade, University of Phoenix, SWC Technology Partners, and RapidDeploy improve their security across diverse environments while reducing costs.

It’s easy to get started. You can access the new features in Azure Sentinel today. If you are not using Azure Sentinel, we welcome you to start a trial.

Our team will be showcasing Azure Sentinel at the RSA Conference next week. Take a look at all the featured sessions, theater sessions and other activities planned across Microsoft Security technologies. We hope to meet you all there.

Also, bookmark the Security blog to keep up with our expert coverage on security matters and follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Free import of AWS CloudTrail logs through June 2020 and other exciting Azure Sentinel updates appeared first on Microsoft Security.

Azure Sentinel uncovers the real threats hidden in billions of low fidelity signals

February 20th, 2020 No comments

Cybercrime is as much a people problem as it is a technology problem. To respond effectively, the defender community must harness machine learning to complement the strengths of people. This is the philosophy that undergirds Azure Sentinel. Azure Sentinel is a cloud-native SIEM that exploits machine learning techniques to empower security analysts, data scientists, and engineers to focus on the threats that matter. You may have heard of similar solutions from other vendors, but the Fusion technology that powers Azure Sentinel sets this SIEM apart for three reasons:

  1. Fusion finds threats that fly under the radar, by combining low fidelity, “yellow” anomalous activities into high fidelity “red” incidents.
  2. Fusion does this by using machine learning to combine disparate data—network, identity, SaaS, endpoint—from both Microsoft and Partner data sources.
  3. Fusion incorporates graph-based machine learning and a probabilistic kill chain to reduce alert fatigue by 90 percent.

Azure Sentinel

Intelligent security analytics for your entire enterprise.

Learn more

You can get a sense of how powerful Fusion is by looking at data from December 2019. During that month, billions of events flowed into Azure Sentinel from thousands of Azure Sentinel customers. Nearly 50 billion anomalous alerts were identified and graphed. After Fusion applied the probabilistic kill chain, the graph was reduced to 110 subgraphs. A second level of machine learning reduced it further to just 25 actionable incidents. This is how Azure Sentinel reduces alert fatigue by 90 percent.

Infographic showing alerts to high-fidelity incidents.

New Fusion scenarios—Microsoft Defender ATP + Palo Alto firewalls

There are currently 35 multi-stage attack scenarios generally available through Fusion machine learning technology in Azure Sentinel. Today, Microsoft has introduced several additional scenarios—in public preview—using Microsoft Defender Advanced Threat Protection (ATP) and Palo Alto logs. This way, you can leverage the power of Sentinel and Microsoft Threat Protection as complementary technologies for the best customer protection.

  • Detect otherwise missed attacks—By stitching together disparate datasets using Bayesian methods, Fusion helps to detect attacks that could have been missed.
  • Reduce mean time to remediate—Microsoft Threat Protection provides a best-in-class investigation experience when addressing alerts from Microsoft products. For non-Microsoft datasets, you can leverage hunting and investigation tools in Azure Sentinel.

Here are a few examples:

An endpoint connects to the TOR network, followed by suspicious activity on the internal network—Microsoft Defender ATP detects that a user inside the network made a request to a TOR anonymization service. On its own, this incident would be low fidelity: suspicious, but not rising to the level of a high-level threat. Palo Alto firewalls register anomalous activity from the same IP address, but it isn’t risky enough to block. Separately, neither of these alerts gets elevated, but together they indicate a multi-stage attack. Fusion makes the connection and promotes it to a high-fidelity incident.

Infographic of the Palo Alto firewall detecting threats.

A PowerShell program on an endpoint connects to a suspicious IP address, followed by suspicious activity on the internal network—Microsoft Defender ATP generates an alert when a PowerShell program makes a suspicious network connection. If Palo Alto allows traffic from that IP address back into the network, Fusion ties the two incidents together to create a high-fidelity incident.

An endpoint connects to a suspicious IP, followed by anomalous activity on the internal network—If Microsoft Defender ATP detects an outbound connection to an IP with a history of unauthorized access, and Palo Alto firewalls allow an inbound request from that same IP address, the pair is elevated by Fusion.
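All three scenarios share one correlation pattern: two low-fidelity alerts from different products that involve the same IP address within a short time window. A toy Python version of that join is below; the alert field names, payloads, and one-hour window are invented for illustration, not Fusion's actual schema.

```python
from datetime import datetime, timedelta

def fuse(endpoint_alerts, firewall_alerts, window=timedelta(hours=1)):
    """Promote pairs of low-fidelity alerts from two sources to a single
    high-fidelity incident when they share an IP address and occur within
    a common time window (a toy version of Fusion's correlation)."""
    incidents = []
    for e in endpoint_alerts:
        for f in firewall_alerts:
            if e["ip"] == f["ip"] and abs(e["time"] - f["time"]) <= window:
                incidents.append({"ip": e["ip"],
                                  "alerts": [e["name"], f["name"]]})
    return incidents
```

Fusion performs this kind of stitching probabilistically across billions of events; the nested loop here is only to make the pairing logic visible.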

How Fusion works

  1. Construct graph

The process starts by collecting data from several data sources, such as Microsoft products, Microsoft security partner products, and other cloud providers. Each of those security products outputs anomalous activity, which together can number in the billions or trillions. Fusion gathers all the low and medium level alerts detected in a 30-day window and creates a graph. The graph is hyperconnected and consists of billions of vertices and edges. Each entity is represented by a vertex (or node). For example, a vertex could be a user, an IP address, a virtual machine (VM), or any other entity within the network. The edges (or links) represent all the activities. If a user accesses company resources with a mobile device, both the device and the user are represented as vertices connected by an edge.

Image of an AAD Detect graph.
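The vertex-and-edge construction described above can be sketched in plain Python. A real deployment operates on billions of edges in a distributed graph store; the entity names and event tuples here are invented for illustration.

```python
from collections import defaultdict

def build_graph(events):
    """Build an entity graph: each vertex is an entity (user, IP address,
    device), and each edge is an anomalous activity linking two entities."""
    adjacency = defaultdict(set)
    for source, activity, target in events:
        adjacency[source].add((activity, target))
        adjacency[target].add((activity, source))
    return adjacency

# A user signs in from a device, which then connects to an external IP:
events = [
    ("user:alice", "sign-in", "device:laptop-42"),
    ("device:laptop-42", "connect", "ip:203.0.113.7"),
]
graph = build_graph(events)
```

Storing edges in both directions lets later stages walk outward from any entity, which is what makes cross-device patterns visible.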

Once the graph is built, there are still billions of alerts—far too many for any security operations team to make sense of. However, within those connected alerts there may be a pattern that indicates something more serious. The human brain simply isn’t equipped to pick that pattern out quickly. This is where machine learning can make a real difference.

  2. Apply probabilistic kill chain

Fusion applies a probabilistic kill chain, which acts as a regularizer on the graph. The statistical analysis is based on how real people—Microsoft security experts, vendors, and customers—triage alerts. For example, defenders prioritize kill chains that are time-bound. If a kill chain is executed within a day, it will take precedence over one that is enacted over a few days. An even higher-priority kill chain is one in which all steps have been completed. This intelligence is encoded into the Fusion machine learning statistical model. Once the probabilistic kill chain is applied, Fusion outputs a smaller number of subgraphs, reducing the number of threats from billions to hundreds.
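The prioritization just described, favoring complete kill chains executed in short time windows, can be approximated with a simple scoring prior. The stage names and weighting below are illustrative only; Fusion's actual statistical model is not public.

```python
KILL_CHAIN = ["initial-access", "lateral-movement", "exfiltration"]

def kill_chain_score(observed_stages, hours_elapsed):
    """Score a subgraph by kill-chain completeness (fraction of stages
    it covers), weighted by urgency: executions within a day take
    precedence over ones spread across several days."""
    covered = sum(1 for stage in KILL_CHAIN if stage in observed_stages)
    completeness = covered / len(KILL_CHAIN)
    urgency = 1.0 if hours_elapsed <= 24 else 24.0 / hours_elapsed
    return completeness * urgency
```

A subgraph covering every stage within hours scores the maximum, while a single stage spread over days scores near zero, which is the ordering the post attributes to human triage.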

  3. Score the attack

To reduce the noise further, Fusion uses machine learning to apply a final round of scoring. If labeled data exists, Fusion uses random forests. Labeled data for attacks is generated by the extensive Azure red team that executes these scenarios. If labeled data doesn’t exist, Fusion uses spectral clustering.

Some of the criteria used to elevate threats include the number of high-impact activities in the graph and whether the subgraph connects to another subgraph.
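Those two criteria can be sketched as a toy scoring pass. The weights and threshold below are invented; as noted above, Fusion's real final stage uses random forests or spectral clustering rather than hand-set weights.

```python
def incident_score(subgraph):
    """Combine the two criteria named in the post: count of high-impact
    activities, plus a bonus if the subgraph links to another subgraph.
    The weights are invented for illustration."""
    score = 2.0 * subgraph["high_impact_count"]
    if subgraph["connects_to_other"]:
        score += 3.0
    return score

def actionable_incidents(subgraphs, threshold=4.0):
    """Keep only the subgraphs that score above the elevation threshold."""
    return [s for s in subgraphs if incident_score(s) >= threshold]
```

Applied to the hundreds of subgraphs left by the kill-chain pass, a filter of this shape is what reduces the output to the tens of incidents described next.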

The output of this machine learning process is tens of threats. These are extremely high-priority alerts that require immediate action. Without Fusion, these alerts would likely remain hidden from view, since they can only be seen after two or more low-level threats are stitched together to shine a light on stealth activities. AI-generated alerts can now be handed off to people who will determine how to respond.

The great promise of AI in cybersecurity is its ability to enable your cybersecurity people to stay one step ahead of the humans on the other side. AI-backed Fusion is just one example of the innovative potential of partnering technology and people to take on the threats of today and tomorrow.

Learn more

Read more about Azure Sentinel and dig into all the Azure Sentinel detection scenarios.

Also, bookmark the Security blog to keep up with our expert coverage on security matters. Follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post Azure Sentinel uncovers the real threats hidden in billions of low fidelity signals appeared first on Microsoft Security.

NERC CIP compliance in Azure

February 12th, 2020 No comments

I did my first North American Electric Reliability Corporation—Critical Infrastructure Protection (NERC CIP) compliance project in 2009, when NERC CIP was at version 3. It was the first mandatory cybersecurity standard that the utility I was working for had to meet. As it does today, the Bulk Electric System (BES) had the responsibility to keep North America powered, productive, and safe with near 100 percent uptime. Critical infrastructure for us is not email and payroll systems; it’s drinking water and hospitals. Leading the way to the cloud was not top of mind. The NERC CIP standards were written for on-premises systems.

NERC CIP compliance was a reason many participants in the BES would not deploy workloads to the cloud. NERC CIP version 6 is now in force, and NERC has recognized the change in the technology landscape, including the security and operational benefits that well-architected use of the cloud has to offer.

Microsoft has made substantial investments in enabling our BES customers to comply with NERC CIP in Azure. Microsoft engaged with NERC to unblock NERC CIP workloads from being deployed in Azure and Azure Government.

All U.S. Azure regions are now approved for the FedRAMP High impact level. We use this to establish our compliance with NERC and the Regional Reliability Councils.

In June 2019, NERC Electric Reliability Organization (ERO) conducted an audit of Azure in Redmond, Washington. NERC, NERC regional auditor organizations, and the NERC CIPC (Critical Infrastructure Protection Committee) were represented.

We prepared a NERC CIP compliance guide for Azure, and a Cloud Implementation Guide for NERC Audits, which includes pre-filled Reliability Standard Audit Worksheet (RSAW) responses. This will help our customers save time and resources in responding to audits.

NERC’s BES Cyber Asset 15-minute rule is important to deploying appropriate NERC CIP workloads to Azure. This rule sets out requirements for BES Cyber Assets that perform real-time functions for monitoring or controlling the BES under the current set of CIP standards and the NERC Glossary of Terms. BES Cyber Assets, under the 15-minute rule, are those that would affect the reliable operation of the BES within 15 minutes of being impaired.

Under the current rules, BES Cyber Assets—like Supervisory Control and Data Acquisition (SCADA) systems and Energy Management Systems (EMS)—are not good candidates for a move to the cloud for this reason.

Importantly, the NERC CIP standards also recognize that the needs of Bulk Electric System Cyber System Information (BCSI) are different from BES Cyber Assets. BCSI is information that could be used to gain unauthorized access or pose a security threat to the Bulk Electric Cyber System. BCSI is not subject to the 15-minute rule.

Many of the workloads that will benefit most from the operational, security, and cost savings benefits of the cloud are BCSI.

Machine learning, multiple data replicas across fault domains, active failover, quick deployment, and pay-for-use benefits are now available for BCSI NERC CIP workloads when they’re moved to or born in Azure.

Examples include:

  • Transmission asset status, management, planning, and predictive maintenance.
  • Transmission network planning, demand forecasting, and contingency analysis.
  • Common Information Model (CIM) modeling and geospatial asset location information.
  • Operational equipment data and SCADA Historical Information System.
  • Streaming of operational phasor data to the cloud for storage and analytics.
  • Artificial intelligence (AI) and Advanced Analytics for forecasting, maintenance, and outage management.
  • Internet of Things (IoT) scenarios for transmission line monitoring and maintenance.
  • NERC CIP audit evidence, reports, and records.

We can apply information retention and protection to confidential documents containing BCSI-sensitive information. Azure machine learning helps us improve the smart grid and perform predictive maintenance on plant equipment. We can experiment, fail fast, and stand up infrastructure in hours, not months. The powerful tools and agile technologies that other industries rely on are now available for many NERC CIP workloads.

There are currently over 100 U.S. power and utility companies that use Azure. NERC CIP regulated companies can enjoy the benefits of the cloud in Azure.

In my next post, I’ll discuss the use of Azure public cloud and Azure Government for NERC CIP compliance.

Thanks to Larry Cochrane and Stevan Vidich for their excellent work on Microsoft’s NERC CIP compliance viewpoint and architecture. Some of their documents are linked above.

Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post NERC CIP compliance in Azure appeared first on Microsoft Security.

Afternoon Cyber Tea—From threat intelligence to chatbots: A look at AI in cybersecurity

February 10th, 2020 No comments

I’ve often said our teams should be as diverse as the problems we are trying to solve. Hiring a diverse security team isn’t just the right thing to do, it’s also good business. This is a topic I’m very passionate about, so I was delighted to interview Jane Frankland for the second podcast of Afternoon Cyber Tea, From threat intelligence to chatbots.

Jane founded and ran a cybersecurity company that conducted penetration testing. She also authored the book Insecurity: Why a Failure to Attract and Retain Women in Cybersecurity Is Making Us All Less Safe, and she provides consulting for the cybersecurity community.

Jane and I talked about how important it is for defenders to think like an attacker and the security challenges facing chatbots and other artificial intelligence (AI) technologies. One critical concern that we need to address is the replication of cultural bias in our AI. We both agreed that staffing AI teams with a diverse group of people can help. Jane is a powerful advocate for making cybersecurity and technology spaces more inclusive of women, and she talked through a few research-backed approaches that organizations can take to attract more women to their organizations. It was a great conversation, and I hope you’ll listen to this episode of Afternoon Cyber Tea with Ann Johnson on Apple Podcasts or Podcast One.

Join me at RSA Conference 2020

If you will be in San Francisco in February for the RSA Conference, I will be delivering a keynote, “Why your people are still your best cyber defense,” on February 26, 2020 at 4:05 PM. Over the years, I’ve learned that the companies that are most successful at recovering from a cyberattack tend to have two things in common: the right technology and good people. AI and machine learning will be vital tools in the fight for cybersecurity, but so will the human spirit. Join me at this keynote to hear how to create a culture where people are your best defense.

What’s next

In this important cyber series, I talk with cybersecurity influencers about trends shaping the threat landscape and explore the risk and promise of systems powered by AI, Internet of Things (IoT), and other emerging tech.

You can listen to Afternoon Cyber Tea with Ann Johnson on:

  • Apple Podcasts—You can also download the episode by clicking the Episode Website link.
  • Podcast One—Includes option to subscribe, so you’re notified as soon as new episodes are available.
  • CISO Spotlight page—Listen alongside our CISO Spotlight episodes, where customers and security experts discuss similar topics such as Zero Trust, compliance, going passwordless, and more.

In the meantime, bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us at @MSFTSecurity for the latest news and updates on cybersecurity. Or reach out to me on LinkedIn or Twitter if you have guest or topic suggestions.

The post Afternoon Cyber Tea—From threat intelligence to chatbots: A look at AI in cybersecurity appeared first on Microsoft Security.

RSA Conference 2020—Empower your defenders with artificial intelligence and automation

February 4th, 2020 No comments

The RSA Conference 2020 kicks off in less than three weeks, and the Microsoft Security team can’t wait. This is one of our most important annual events because it provides an invaluable opportunity for us to connect with customers, partners, and other security thought leaders. New ideas are explored. Conventional thinking is challenged. For as important as technology is to cybersecurity, it’s the people doing this work, day in and day out, that truly inspire us.

The role of people in security will be a big theme in Microsoft’s presence at RSA Conference 2020. Our job as technologists is to build intelligent solutions that unleash defenders to do what they do best: creative problem solving. Artificial intelligence (AI) and automation are vital for strong cybersecurity and risk management, not because technology alone can defeat cyberattacks, but because these tools enable people to defend against emergent threats. In Microsoft’s two keynotes, 11 earned sessions, and 42 theater events, Microsoft security experts will share thoughts on how you can empower the heart of your security organization—your people—with AI, machine learning, and automation.

Here are a few highlights to help you plan your time.

Why your people are still your best cyber defense

Keynote speaker: Ann Johnson, Corporate Vice President, Microsoft Cybersecurity Solutions Group

When: Wednesday, February 26

Time: 4:05 PM – 4:25 PM

Access to AI and machine learning, which is powered by the cloud, will mean the difference between struggle and success for modern organizations that defend against cybercriminals. But technology is not enough. The organizations that quickly recover from a cyberattack have another thing in common: an agile team that can problem-solve under stress. It works because attackers haven’t prepared for the resilience of the human spirit. So how do you build a culture where people are your best defense? Ann will share some best practices at her keynote.

Collaborating to improve open source security: how the ecosystem is stepping up

Keynote speaker: Mark Russinovich, Chief Technology Officer, Azure

When: Friday, February 28

Time: 9:50 AM

The software supply chain is increasingly under attack. Bad actors attempt to insert malware at all points in the complex network of open source packages, spanning languages, operating systems, runtimes, and tools that make up modern software. But the story isn’t all bad news. Industry and the open-source community have come together to mitigate these threats and improve the security of open source software. This collaboration has produced new ideas for building trust in the supply chain for consumers and producers of software, large and small. Mark’s talk is a great opportunity for you to learn more about the future of supply chain security.

Zero Trust: the buzz, the myths and the facts

Earned session speaker: Bret Arsenault, CVP and Chief Information Security Officer (CISO), Microsoft

When: Thursday, February 27

Time: 9:10 AM – 10:10 AM

Session code: STR-R02

“Zero Trust” is the biggest buzzword in security since blockchain, but what does it mean? Is there a consistent approach or definition? In this session, Bret will discuss what Zero Trust is (and what it isn’t) based on his real-world experience defining a Zero Trust strategy at Microsoft. And, as you’ve come to expect, he’ll give it to you straight. Carve out time for this event to get practical advice for applying Zero Trust to your own organization.

The Microsoft booth theater sessions

The Microsoft booth will be abuzz with activity. If you’re interested in learning about our platform investments, come to one—or several—of our 42 theater sessions. These presentations will dive into our solutions across Zero Trust, Identity & Access Management, Threat Protection, Information Protection & Compliance, and Cloud Security. Learn how our integrated solutions, with AI and machine learning built in, enable defenders to safeguard data, devices, apps, and people.

Or take on hacker “157” in our virtual reality escape room for a fun way to see how our solutions work together.

Read about more of our featured sessions.

Microsoft Security 20/20 partner awards event

Microsoft will host a private awards ceremony to recognize partners in 16 award categories that span security integration partners, system integrators, and managed security service providers. These partners have developed and delivered exceptional Microsoft-based solutions and services during the past year. It will be an honor to celebrate their vision at this event.

Visit the Microsoft RSA Conference 2020 website to register and learn more about our featured speakers and sessions, so you can make the most of your time.

Also, bookmark the Security blog to keep up with our expert coverage on security matters and follow us at @MSFTSecurity for the latest news and updates on cybersecurity.

The post RSA Conference 2020—Empower your defenders with artificial intelligence and automation appeared first on Microsoft Security.

Data science for cybersecurity: A probabilistic time series model for detecting RDP inbound brute force attacks

December 18th, 2019 No comments

Computers with Windows Remote Desktop Protocol (RDP) exposed to the internet are an attractive target for adversaries because they present a simple and effective way to gain access to a network. Brute forcing RDP, a secure network communications protocol that provides remote access over port 3389, does not require a high level of expertise or the use of exploits; attackers can use many off-the-shelf tools to scan the internet for potential victims and similar tools to conduct the brute force attack itself.

Attackers target RDP servers that use weak passwords and are without multi-factor authentication, virtual private networks (VPNs), and other security protections. Through RDP brute force, threat actor groups can gain access to target machines and conduct many follow-on activities like ransomware and coin mining operations.

In a brute force attack, adversaries attempt to sign in to an account through trial and error. Many failed sign-ins occurring over very short time intervals, typically minutes or even seconds, are usually associated with these attacks. A brute force attack might also involve adversaries attempting to access one or more accounts using valid usernames obtained from credential theft, or using common usernames like “administrator”; the same holds for the passwords tried. In detecting RDP brute force attacks, we focus on the source IP address and username, as password data is not available.

In the Windows operating system, whenever an attempted sign-in fails for a local machine, Event Tracing for Windows (ETW) registers Event ID 4625 with the associated username. Meanwhile, source IP addresses connected to RDP can be accessed; this information is very useful in assessing if a machine is under brute force attack. Using this information in combination with Event ID 4624 for non-server Windows machines can shed light on which sign-in sessions were successfully created and can further help in detecting if a local machine has been compromised.
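As a rough illustration of how Event ID 4625 and Event ID 4624 can be combined, the sketch below flags a machine as possibly compromised when a successful sign-in follows a burst of failed sign-ins from the same source IP. The event-record fields are assumptions for this sketch, not the actual Windows log schema:

```python
# Illustrative sketch only: event-record field names are assumptions.
# Flags (user, source_ip) pairs where a successful sign-in (Event ID 4624)
# follows a burst of failed sign-ins (Event ID 4625) from the same source IP.

from datetime import datetime, timedelta

def suspicious_success(events, min_failures=10, window=timedelta(minutes=30)):
    """Return (username, source_ip) pairs where a success follows many failures.

    `events` is assumed to be sorted by time.
    """
    hits = []
    for i, ev in enumerate(events):
        if ev["id"] != 4624:
            continue
        recent_failures = [
            e for e in events[:i]
            if e["id"] == 4625
            and e["source_ip"] == ev["source_ip"]
            and ev["time"] - e["time"] <= window
        ]
        if len(recent_failures) >= min_failures:
            hits.append((ev["user"], ev["source_ip"]))
    return hits
```

A rule this simple is noisy on its own, which is exactly the problem the rest of the post addresses.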

In this blog we’ll present a study and a detection logic that uses these signals. This data science-driven approach to detecting RDP brute force attacks has proven valuable in detecting human adversary activity through Microsoft Threat Experts, the managed threat hunting service in Microsoft Defender Advanced Threat Protection. This work is an example of how the close collaboration between data scientists and threat hunters results in protection for customers against real-world threats.

Insights into brute force attacks

Observing a sudden, relatively large count of Event ID 4625 associated with RDP network connections might be rare, but it does not necessarily imply that a machine is under attack. For example, a script that performs the following actions would look suspicious in a time series of failed sign-in counts but is most likely not malicious:

  • uses an expired password
  • retries sign-in attempts every N minutes with different usernames
  • connects over a public IP address within a range owned by the enterprise

In contrast, behavior that includes the following is indicative of an attack:

  • extreme counts of failed sign-ins from many unknown usernames
  • never previously successfully authenticated
  • from multiple RDP connections
  • from new source IP addresses

Understanding the context of failed sign-ins and inbound connections is key to discriminating between true positive (TP) and false positive (FP) brute force attacks, especially if the goal is to automatically raise only high-precision alerts to the appropriate recipients, as we do in Microsoft Defender ATP.

We analyzed several months’ worth of data to mine insights into the types of RDP brute force attacks occurring across Microsoft Defender ATP customers. Out of about 45,000 machines that had both RDP public IP connections and at least 1 network failed sign-in, we discovered that, on average, several hundred machines per day had high probability of undergoing one or more RDP brute force attack attempts. Of the subpopulation of machines with detected brute force attacks, the attacks lasted 2-3 days on average, with about 90% of cases lasting for 1 week or less, and less than 5% lasting for 2 weeks or more.

Figure 1: Empirical distribution in number of days per machine where we observed 1 or more brute force attacks

As discussed in numerous other studies [1], large counts of failed sign-ins are often associated with brute force attacks. Looking at the count of daily failed sign-ins, 90% of cases exceeded 10 attempts, with a median larger than 60. In addition, these unusual daily counts had a high positive correlation with extreme counts in shorter time windows (see Figure 2). In fact, the extreme daily counts of failed sign-ins typically occurred within a window of under 2 hours, with about 40% occurring in under 30 minutes.

Figure 2: Count of daily and maximum hourly network failed sign-ins for a local machine under brute force attack

While a detection logic based on thresholding the count of failed sign-ins during a daily or finer-grained time window can detect many brute force attacks, it will likely produce too many false positives. Worse, relying on this alone will yield false negatives, missing successful enterprise compromises: our analysis revealed several instances where brute force attacks generated fewer than 5-10 failed attempts at a daily granularity but often persisted for many days, thereby avoiding extreme counts at any point in time. For such a brute force attack, thresholding the cumulative number of failed sign-ins across time can be more useful, as depicted in Figure 3.
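The contrast between daily and cumulative thresholding can be sketched as follows. The attack profile and threshold values here are illustrative, not measured from the data in the post:

```python
# Sketch of the point above: a hypothetical low-and-slow attack producing
# 6 failed sign-ins per day never trips a daily threshold but does trip a
# cumulative one. Threshold values are illustrative.

def daily_alert(daily_counts, threshold=20):
    """Alert if any single day's count is extreme."""
    return any(c >= threshold for c in daily_counts)

def cumulative_alert(daily_counts, threshold=40):
    """Alert once the running total of failed sign-ins becomes extreme."""
    total = 0
    for c in daily_counts:
        total += c
        if total >= threshold:
            return True
    return False

slow_attack = [6] * 10   # 6 failed sign-ins per day for 10 days
```

Here `daily_alert(slow_attack)` stays quiet while `cumulative_alert(slow_attack)` fires partway through the campaign.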

Figure 3: Daily and cumulative failed network sign-in

Looking at counts of network failed sign-ins provides a useful but incomplete picture of RDP brute force attacks. This can be further augmented with additional information on the failed sign-in, such as the failure reason, time of day, and day of week, as well as the username itself. An especially strong signal is the source IP of the inbound RDP connection. Knowing whether the external IP has a reputation for abuse, as can be looked up on IP reputation sites, can directly confirm that an IP is part of an active brute force attack.

Unfortunately, not all IP addresses have a history of abuse; in addition, it can be expensive to retrieve information about many external IP addresses on demand. Maintaining a list of suspicious IPs is an option, but relying on this can result in false negatives as, inevitably, new IPs continually occur, particularly with the adoption of cloud computing and ease of spinning up virtual machines. A generic signal that can augment failed sign-in and user information is counting distinct RDP connections from external IP addresses. Again, extreme values occurring at a given time or cumulated over time can be an indicator of attack.

Figure 4 shows histograms (i.e., counts put into discrete bins) of the daily count of RDP public connections per machine for an example enterprise with known brute force attacks. It’s evident that normal machines have a lower probability of larger counts compared to machines under attack.

Figure 4: Histograms of daily count of RDP inbound across machines for an example enterprise

Given that some enterprises have machines under brute force attack daily, the priority may be to focus on machines that have been compromised, defined by a first successful sign-in following failed attempts from suspicious source IP addresses or unusual usernames. In Windows logs, Event ID 4624 can be leveraged to measure successful sign-in events for local machine in combination with failed sign-ins (Event ID 4625).

Out of the hundreds of machines with RDP brute force attacks detected in our analysis, we found that about .08% were compromised. Furthermore, across all enterprises analyzed over several months, on average about 1 machine was detected with high probability of being compromised as a result of an RDP brute force attack every 3-4 days. Figure 5 shows a bubble chart of the average abuse score of external IPs associated with RDP brute force attacks that successfully compromised machines. The size of the bubbles is determined by the count of distinct machines across the enterprises analyzed having a network connection from each IP. While there is diversity in the origin of the source IPs, the Netherlands, Russia, and the United Kingdom have a larger concentration of inbound RDP connections from high-abuse IPs.

Figure 5: Bubble chart of IP abuse score versus counts of machine with inbound RDP

A key takeaway from our analysis is that successful brute force attempts are not uncommon; therefore, it’s critical to monitor at least the suspicious connections and unusual failed sign-ins that result in authenticated sign-in events. In the following sections we describe a methodology to do this. This methodology was leveraged by Microsoft Threat Experts to augment threat hunting and resulted in new targeted attack notifications.

Combining many relevant signals

As discussed earlier (with the example of scripts connecting via RDP using outdated passwords yielding failed sign-ins), simply relying on thresholding failed attempts per machine for detecting brute force attacks can be noisy and may result in many false positives. A better strategy is to utilize many contextually relevant signals, such as:

  • the timing, type, and count of failed sign-in
  • username history
  • type and frequency of network connections
  • first-time username from a new source machine with a successful sign-in

This can be even further extended to include indicators of attack associated with brute force, such as port scanning.

Combining multiple signals along the attack chain has been proposed and shown promising results [2]. We considered the following signals in detecting RDP inbound brute force attacks per machine:

  • hour of day and day of week of failed sign-in and RDP connections
  • timing of successful sign-in following failed attempts
  • Event ID 4625 login type (filtered to network and remote interactive)
  • Event ID 4625 failure reason (filtered to %%2308, %%2312, %%2313)
  • cumulative count of distinct usernames that failed to sign in
  • count (and cumulative count) of failed sign-ins
  • count (and cumulative count) of RDP inbound external IP
  • count of other machines having RDP inbound connections from one or more of the same IPs
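A hypothetical assembly of per-machine signals like those listed above into a feature dictionary might look like the sketch below. The event-record field names are assumptions for illustration; login types 3 and 10 correspond to network and remote interactive sign-ins:

```python
# Hypothetical per-machine signal extraction; field names are assumptions.
# Filters to network/remote-interactive logons and the failure reasons
# named above, then counts point and cumulative quantities.

from collections import Counter

LOGIN_TYPES = {3, 10}                       # network, remote interactive
FAILURE_REASONS = {"%%2308", "%%2312", "%%2313"}

def extract_signals(failed_events, rdp_source_ips, history):
    """Build one machine's signal vector from today's events plus history."""
    relevant = [e for e in failed_events
                if e["login_type"] in LOGIN_TYPES
                and e["reason"] in FAILURE_REASONS]
    return {
        "fail_count": len(relevant),
        "cum_fail_count": history["cum_fails"] + len(relevant),
        "distinct_failed_users": len({e["user"] for e in relevant}),
        "fail_hours": Counter(e["hour"] for e in relevant),
        "rdp_ip_count": len(set(rdp_source_ips)),
        "cum_rdp_ip_count": history["cum_ips"] + len(set(rdp_source_ips)),
    }
```

Each entry in the returned dictionary corresponds to one of the per-signal time series that the anomaly models in the next section score.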

Unsupervised probabilistic time series anomaly detection

For many cybersecurity problems, including detecting brute force attacks, previously labeled data is not usually available. Thus, training a supervised learning model is not feasible. This is where unsupervised learning is helpful, enabling one to discover and quantify unknown behaviors when examples are too sparse. Given that several of the signals we consider for modeling RDP brute force attacks are inherently dependent on values observed over time (for example, daily counts of failed sign-ins and counts of inbound connections), time series models are particularly beneficial. Specifically, time series anomaly detection naturally provides a logical framework to quantify uncertainty in modeling temporal changes in data and produce probabilities that then can be ranked and thresholded to control a desirable false positive rate.

Time series anomaly detection captures the temporal dynamics of signals and accurately quantifies the probability of observing values at any point in time under normal operating conditions. More formally, if we introduce the notation Y(t) to denote the signals taking on values at time t, then we build a model to compute reliable estimates of the probability of Y(t) exceeding observed values given all known and relevant information, represented by P[y(t)], sometimes called an anomaly score. Given a false positive tolerance rate r (e.g., .1%, or 1 out of 1,000, per unit time), for each time t, values y*(t) satisfying P[y*(t)] < r would be detected as anomalous. Assuming the right signals reflecting the relevant behaviors of the type of attacks are chosen, the idea is simple: the lowest anomaly scores occurring at each time will likely be associated with the highest likelihood of real threats.
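A minimal illustration of the thresholding rule P[y*(t)] < r: score an observed count by its empirical right-tail p-value against the machine's own history, and flag it when the score falls below the tolerance rate. The history and tolerance values below are made up for the sketch:

```python
# Minimal illustration of the thresholding rule P[y*(t)] < r.
# The anomaly score of an observed count is its empirical right-tail
# p-value against the machine's own history (values here are made up).

def right_tail_p(history, value):
    """Estimate P(Y >= value) empirically, with add-one smoothing."""
    at_least = sum(1 for v in history if v >= value)
    return (at_least + 1) / (len(history) + 1)

def is_anomalous(history, value, r=0.001):
    return right_tail_p(history, value) < r

history = [0, 0, 1, 0, 2, 0, 0, 3, 1, 0] * 100   # 1,000 typical daily counts
```

With this history, an extreme count like 50 is flagged while a mildly unusual count like 2 is not; the actual models described next replace the raw empirical estimate with a fitted probabilistic one.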

For example, looking back at Figure 2, the time series of daily count of failed sign-ins occurring on the brute force attack day 8/4/2019 had extreme values that would be associated with an empirical probability of about .03% out of all machine and days with at least 1 failed network sign-in for the enterprise.

As discussed earlier, applying anomaly detection to 1 or a few signals to detect real attacks can yield too many false positives. To mitigate this, we combined anomaly scores across eight signals we selected to model RDP brute force attack patterns. The details of our solution are included in the Appendix, but in summary, our methodology involves:

  • updating statistical discrete time series models sequentially for each signal, capturing time of day, day of week, and both point and cumulative effects
  • combining anomaly scores using an approach that yields accurate probability estimates, and
  • ranking the top N anomalies per day to control a desired number of false positives

Our approach to time series anomaly detection is computationally efficient and automatically learns how to update probabilities and adapt to changes in the data.

As we describe in the next section, this approach has yielded successful attack detection at high precision.

Protecting customers from real-world RDP brute force attacks through Microsoft Threat Experts

The proposed time series anomaly detection model was deployed and utilized by Microsoft Threat Experts to detect RDP brute force attacks during threat hunting activities. A list that ranks machines across enterprises with the lowest anomaly scores (indicating the likelihood of observing a value at least as large under expected conditions in all signals considered) is updated and reviewed every day. See Table 1 for an example.

Table 1: Sample ranking of detected RDP inbound brute force attacks

For each machine with detection of a probable brute force attack, each instance is assigned TP, FP, or unknown. Each TP is then assigned priority based on the severity of the attack. For high-priority TP, a targeted attack notification is sent to the associated organization with details about the active brute force attack and recommendations for mitigating the threat; otherwise the machine is closely monitored until more information is available.

We also added an extra capability to our anomaly detection: automatically sending targeted attack notifications about RDP brute force attacks, in many cases before the attack succeeds or before the actor is able to conduct further malicious activities. Looking at the most recent sample of about two weeks of graded detections, the average precision per day (i.e., true positive rate) is approximately 93.7% at a conservative false positive rate of 1%.

In conclusion, based on our careful selection of signals found to be highly associated with RDP brute force attacks, we demonstrated that proper application of time series anomaly detection can be very accurate in identifying real threats. We have filed a patent application for this probabilistic time series model for detecting RDP inbound brute force attacks. In addition, we are working on integrating this capability into Microsoft Defender ATP’s endpoint and detection response capabilities so that the detection logic can raise alerts on RDP brute force attacks in real-time.

Monitoring suspicious activity in failed sign-in and network connections should be taken seriously—a real-time anomaly detection capable of self-updating with the changing dynamics in a network can indeed provide a sustainable solution. While Microsoft Defender ATP already has many anomaly detection capabilities integrated into its EDR capabilities, we will continue to enhance these detections to cover more security scenarios. Through data science, we will continue to combine robust statistical and machine learning approaches with threat expertise and intelligence to deliver industry-leading protection to our customers.



Cole Sodja, Justin Carroll, Joshua Neil
Microsoft Defender ATP Research Team



Appendix 1: Models formulation

We utilize hierarchical zero-adjusted negative binomial dynamic models to capture the characteristics of the highly discrete count time series. Specifically, as shown in Figure 2, it’s expected that most of the time there won’t be failed sign-ins for valid credentials on a local machine; hence, there are excess zeros that would not be explained by standard probability distributions such as the negative binomial. In addition, the variance of non-zero counts is often much larger than the mean, where for example, valid scripts connecting via RDP can generate counts in the 20s or more over several minutes because of an outdated password. Moreover, given a combination of multiple users or scripts connecting to shared machines at the same time, this can generate more extreme counts at higher quantiles resulting in heavier tails, as seen in Figure 6.

Figure 6: Daily count of network failed sign-in for a machine with no brute force attack

Parametric discrete location/scale distributions do not generate well-calibrated p-values for rare time series, as seen in Figure 6, and thus, if used to detect anomalies, can result in too many FPs when looking across many machines at high time frequencies. To overcome this challenge with the sparse time series of counts of failed sign-ins and RDP inbound public connections, we specify a mixture model; based on our analysis, a zero-inflated two-component negative binomial distribution was adequate.

Our formulation is based on thresholding values that determine when to transition to a distribution with larger location and/or scale as given in Equation 1. Hierarchical priors are given from empirical estimates of the sample moments across machines using about 1 month of data.

Equation 1: Zero-adjusted negative binomial threshold model

Negative binomial distribution (NB):
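The equation images from the original post are not reproduced in this archive. As a hedged reconstruction, a zero-adjusted negative binomial threshold model consistent with the surrounding description (excess zeros with probability π₀, and an NB component whose location/scale switches when recent counts cross a threshold τ) could be written as:

```latex
% Hedged reconstruction, not the original equation.
P(Y_t = 0) = \pi_0 + (1 - \pi_0)\,\mathrm{NB}(0 \mid \mu_{s_t}, \alpha_{s_t}),
\qquad
P(Y_t = y) = (1 - \pi_0)\,\mathrm{NB}(y \mid \mu_{s_t}, \alpha_{s_t}), \quad y > 0,

\mathrm{NB}(y \mid \mu, \alpha) =
  \frac{\Gamma(y + \alpha)}{\Gamma(\alpha)\, y!}
  \left(\frac{\alpha}{\alpha + \mu}\right)^{\alpha}
  \left(\frac{\mu}{\alpha + \mu}\right)^{y},
\qquad
s_t = \begin{cases} 1, & y_{t-1} \le \tau \\ 2, & y_{t-1} > \tau \end{cases}
```

The exact parameterization and threshold rule in the original Equation 1 may differ; this is only a standard form matching the text.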

To our knowledge, this formulation does not yield a conjugate prior, and so directly computing probabilities from the posterior predicted density is not feasible. Instead, anomaly scores are generated based on drawing samples from all distributions and then computing the empirical right-tail p-value.
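The sampling step just described can be sketched as follows, with made-up parameter values: draw from a zero-adjusted negative binomial via the standard gamma-Poisson representation, then score an observed count by its empirical right-tail p-value:

```python
# Sketch of the sampling step described above; parameter values are made up.
# NB(r, p) is drawn via its standard gamma-Poisson representation.

import math
import random

def draw_zanb(pi_zero, r, p):
    """One draw: an excess zero with probability pi_zero, else NB(r, p)."""
    if random.random() < pi_zero:
        return 0
    # NB(r, p): Poisson with a Gamma-distributed rate.
    lam = random.gammavariate(r, (1 - p) / p)
    # Poisson draw by CDF inversion.
    k, term, u = 0, math.exp(-lam), random.random()
    cdf = term
    while u > cdf:
        k += 1
        term *= lam / k
        cdf += term
    return k

def anomaly_score(observed, n_samples=5000, pi_zero=0.7, r=2.0, p=0.5):
    """Empirical right-tail p-value of `observed` under the sampled model."""
    samples = [draw_zanb(pi_zero, r, p) for _ in range(n_samples)]
    return sum(1 for s in samples if s >= observed) / n_samples
```

With these illustrative parameters, an extreme count receives a near-zero score while a zero count receives a score of 1, mirroring the right-tail p-values the model produces.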

Parameters are updated using exponential smoothing. To avoid outliers skewing the estimates, such as on machines under brute force or other attacks, trimming is applied when sampling from the distribution, at a specified false positive rate that was set to 0.1% for our study. Algorithm 1 outlines the logic.

The smoothing parameters were learned via maximum likelihood estimation and then fixed during each new sequential update. To induce further uncertainty, bootstrapping across machines is used to produce a histogram of smoothing weights, and samples are drawn in accordance with their frequency. We found that the weights concentrated away from 0, varying between 0.06% and 8% for over 90% of machines, which leads to slow changes in the parameters. An extension using adaptive forgetting factors will be considered in future work to automatically learn how to correct the smoothing in real time.

Algorithm 1: Updating model parameters in real time
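A single trimmed smoothing step of the kind the algorithm describes can be sketched as below. The function and parameter names are illustrative, not the authors' notation; `cutoff` stands in for the model's high quantile implied by the 0.1% trimming rate.

```python
def smoothed_update(current, observed, weight, cutoff):
    """One exponential-smoothing step for a parameter estimate.
    Observations above `cutoff` (e.g. the model's 99.9th percentile,
    matching the 0.1% rate in the post) are capped so that a machine
    under brute force attack does not skew the estimate."""
    trimmed = min(observed, cutoff)
    return (1 - weight) * current + weight * trimmed

# With a small weight (the post reports most weights between 0.06% and
# 8%), estimates drift slowly; an extreme burst of 500 failed sign-ins
# is capped at the cutoff before it enters the update.
calm = smoothed_update(2.0, 3, 0.05, 10)      # gentle drift toward 3
attack = smoothed_update(2.0, 500, 0.05, 10)  # capped at 10, not 500
```

The cap keeps attack-period observations from inflating the baseline, so the model still flags the attack as anomalous on subsequent days.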

Appendix 2: Fisher Combination

For a given device, a score is computed for each available signal, defined as a p-value, where lower values are associated with a higher likelihood of an anomaly. The p-values are then combined to yield a joint score across all signals using the Fisher p-value combination method as follows:
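Fisher's combination statistic is standard: for $k$ independent p-values $p_1, \dots, p_k$, the quantity

```latex
X^2 = -2 \sum_{i=1}^{k} \ln p_i
```

follows a chi-squared distribution with $2k$ degrees of freedom under the null hypothesis, and the joint score is the right-tail probability $P(\chi^2_{2k} \ge X^2)$.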

Applying Fisher’s test to anomaly scores produces a scalable solution that yields interpretable probabilities, which can therefore be controlled to achieve a desired false positive rate. The method has previously been applied in a cybersecurity context [3].
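The combination step described above is straightforward to implement. A minimal sketch, using only the standard library and exploiting the closed-form chi-squared survival function for even degrees of freedom:

```python
import math

def fisher_combine(p_values):
    """Joint p-value via Fisher's method: X = -2 * sum(ln p_i) is
    chi-squared with 2k degrees of freedom under the null. For even
    degrees of freedom the survival function has the closed form
    sf(x, 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

# A single p-value is returned unchanged; two moderately small
# per-signal scores combine into a smaller joint score.
single = fisher_combine([0.05])
joint = fisher_combine([0.05, 0.05])
```

Because the output is itself a well-calibrated probability, a single alerting threshold on the joint score controls the overall false positive rate across signals.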



[1] Najafabadi et al., “Machine Learning for Detecting Brute Force Attacks at the Network Level,” 2014 IEEE 14th International Conference on Bioinformatics and Bioengineering.
[2] Sexton et al., “Attack chain detection,” Statistical Analysis and Data Mining, 2015.
[3] Heard, “Combining Weak Statistical Evidence in Cyber Security,” Intelligent Data Analysis XIV, 2015.

The post Data science for cybersecurity: A probabilistic time series model for detecting RDP inbound brute force attacks appeared first on Microsoft Security.

Finding a common language to describe AI security threats

December 13th, 2019 No comments

As artificial intelligence (AI) and machine learning systems become increasingly important in our lives, it’s critical that when they fail we understand how and why. Many research papers have been dedicated to this topic, but inconsistent vocabulary has limited their usefulness. In collaboration with Harvard University’s Berkman Klein Center, Microsoft published a series of materials that define a common vocabulary for describing intentional and unintentional failures.

Read Solving the challenge of securing AI and machine learning systems to learn more about Microsoft’s AI taxonomy papers.

The post Finding a common language to describe AI security threats appeared first on Microsoft Security.
