
Entra ID Protection in Practice: Risk Policies, Leaked Credentials, and Surviving a Mass Account Lockout

The April 2025 MACE Credential Revocation incident locked out users across hundreds of organizations — triggered by a Microsoft false-positive bug, not an actual breach. This guide covers how Entra ID Protection risk policies work, how to deploy them without causing a self-inflicted outage, and how to bulk-remediate hundreds of flagged accounts with PowerShell.

TL;DR: Entra ID Protection can block your users automatically based on risk signals — including false positives at massive scale, as the MACE Credential Revocation incident in April 2025 demonstrated. This article covers how risk policies actually work, how to deploy them safely without locking out your organization, and how to bulk-remediate hundreds of flagged accounts via PowerShell when things go sideways.

The Problem

Picture a Saturday morning: you wake up to 200 Entra ID Protection alerts — half your organization's accounts now show High Risk status with "leaked credentials" as the detection reason. Your helpdesk phone is ringing off the hook. Users can't sign in; they're hitting a password reset prompt but SSPR was never enabled. You have 30 minutes to stabilize the situation.

This is exactly what happened to administrators across hundreds of organizations the weekend of April 19–20, 2025. Microsoft rolled out a new feature called MACE Credential Revocation to detect compromised accounts — and due to an internal logging error, triggered a mass false-positive event. Risk-based Conditional Access policies did exactly what they were configured to do: block accounts flagged as High Risk. The result was that roughly 30–40% of users in affected tenants lost access, without any actual compromise occurring.

Entra ID Protection is one of the most effective identity security tools in the Microsoft stack. But misconfigured risk policies don't just slow attackers down — they can paralyze your organization faster than any attacker could. The difference between a protection system and a self-inflicted outage comes down to a handful of configuration decisions most admins get wrong the first time.

How Entra ID Protection Works

Entra ID Protection continuously analyzes signals from sign-ins and user behavior, then assigns risk at two distinct levels: user risk (probability the identity itself is compromised) and sign-in risk (probability that a specific sign-in wasn't performed by the legitimate user). Both use three tiers: Low, Medium, and High.

Detections fall into two timing categories:

  • Real-time — evaluated during the sign-in flow, can block immediately (e.g., Anonymous IP address, Unfamiliar sign-in properties)
  • Offline — evaluated after the fact using aggregated signals (e.g., Atypical travel, Leaked credentials)
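To see how the two timing categories show up in your own tenant, you can tally recent detections by type and timing. This is a minimal sketch, assuming the Microsoft Graph PowerShell SDK is installed and a session with IdentityRiskEvent.Read.All is already connected; `Get-RiskDetectionSummary` is a hypothetical helper name, not an SDK cmdlet.

```powershell
# Sketch: count risk detections per type and timing category.
# Get-RiskDetectionSummary is a hypothetical helper, not part of the Graph SDK.
function Get-RiskDetectionSummary {
    param(
        # Objects shaped like Get-MgRiskDetection output
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Detection
    )
    $Detection |
        Group-Object -Property RiskEventType, DetectionTimingType |
        ForEach-Object {
            [pscustomobject]@{
                RiskEventType = $_.Group[0].RiskEventType
                Timing        = $_.Group[0].DetectionTimingType
                Count         = $_.Count
            }
        } |
        Sort-Object -Property Count -Descending
}

# Usage against a live tenant (requires a connected Graph session):
# Get-RiskDetectionSummary -Detection (Get-MgRiskDetection -All) | Format-Table -AutoSize
```

A tenant dominated by offline detections is a hint that Report-only data will lag behind what enforcement would actually do.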

Key detections every M365 admin should understand:

| Detection | Timing | Risk Level | License Required |
|---|---|---|---|
| Leaked credentials | Offline | Always High | Free / P1 |
| Anonymous IP address | Real-time | Variable | P2 |
| Atypical travel | Offline | Variable | P2 |
| Password spray | Real-time / Offline | Variable | P2 |
| Anomalous Token | Real-time / Offline | Variable | P2 |
| Unfamiliar sign-in properties | Real-time | Variable | P2 |

Critical licensing nuance: Leaked credentials detection fires even without an Entra ID P2 license. You'll see the alert regardless. But risk-based Conditional Access policies that automatically act on those alerts do require P2. Without P2, you see the fire alarm — you just can't trigger the sprinklers automatically.
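A quick way to check which side of that line a tenant sits on is to look for the P2 service plan among its subscribed SKUs. This is a rough sketch: `Test-EntraP2Available` is a hypothetical helper, and it only shows that the plan exists somewhere in the tenant, not that every targeted user is licensed.

```powershell
# Sketch: does any enabled SKU in the tenant carry the Entra ID P2 service plan?
# Test-EntraP2Available is a hypothetical helper, not an SDK cmdlet.
function Test-EntraP2Available {
    param(
        # Objects shaped like Get-MgSubscribedSku output
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $SubscribedSku
    )
    foreach ($sku in $SubscribedSku) {
        if ($sku.CapabilityStatus -ne 'Enabled') { continue }
        foreach ($plan in $sku.ServicePlans) {
            if ($plan.ServicePlanName -eq 'AAD_PREMIUM_P2' -and
                $plan.ProvisioningStatus -eq 'Success') {
                return $true
            }
        }
    }
    return $false
}

# Usage against a live tenant:
# Connect-MgGraph -Scopes 'Organization.Read.All'
# Test-EntraP2Available -SubscribedSku (Get-MgSubscribedSku -All)
```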

Microsoft's leaked credentials pipeline monitors dark web forums, breach dump repositories, paste sites, law enforcement seizure data, and other sources through the Microsoft Threat Intelligence Center (MSTIC) and Digital Crimes Unit (DCU). When a credential pair matching a valid account in your tenant is found, the service validates the actual password hash before emitting a detection. This detection always fires at High — there is no configuration to lower its threshold. It's the only detection that's hardcoded to maximum severity.

The MACE Incident: A Real-World Case Study

On Friday, April 18, 2025, Microsoft's internal systems accidentally logged a subset of short-lived user refresh tokens instead of just their metadata. To protect customers, Microsoft invalidated those tokens — which inadvertently triggered mass "leaked credentials" alerts in Entra ID Protection between 4:00 AM and 9:00 AM UTC on Sunday, April 20.

The culprit was a new service principal named MACE Credential Revocation (Application ID: 7d636ec3-f39c-44f5-8b73-fa28a0e0c5bc), provisioned automatically into tenants via "Microsoft Azure AD Internal – Jit Provisioning" just before the incident. Admins on Reddit's r/sysadmin started reporting accounts flagged as High Risk despite unique passwords never used elsewhere — including passwordless accounts that have no password to leak.
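If you want to confirm whether that service principal was provisioned in your tenant, a lookup by application ID is enough. The sketch below assumes a connected Graph session with Application.Read.All; the app ID is the one quoted above, and `Find-MaceServicePrincipal` is a hypothetical wrapper name.

```powershell
# Sketch: look up the MACE Credential Revocation service principal by app ID.
$maceAppId = '7d636ec3-f39c-44f5-8b73-fa28a0e0c5bc'   # app ID quoted in this article
$filter    = "appId eq '$maceAppId'"

# Find-MaceServicePrincipal is a hypothetical wrapper, not an SDK cmdlet.
function Find-MaceServicePrincipal {
    # Requires an existing Connect-MgGraph session with Application.Read.All
    Get-MgServicePrincipal -Filter $filter |
        Select-Object DisplayName, AppId, Id
}

# Usage against a live tenant:
# Find-MaceServicePrincipal
```

An empty result simply means the principal is not present in that tenant.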

The scope was significant. Enterprise and SMB tenants alike were affected, across licensing levels from Microsoft 365 Business Basic through E5. Tenants with pre-configured Conditional Access risk policies suffered user lockouts. Tenants without those policies saw alerts in their reports but no disruption.

For organizations without risk policies: users saw the flag in reports but signed in normally.

For organizations with user risk policies set to require remediation or block: users couldn't sign in at all. Without SSPR configured, each blocked user required a manual admin password reset. If your helpdesk didn't have a bulk remediation script ready, you were doing this one by one — on a weekend, during what should have been off-hours.

Microsoft's official remediation: use Confirm User Safe in the Risky Users report. This marks the detection as a false positive and immediately clears the block.

The MACE incident is a preview of what happens any time risk policies run without a tested remediation procedure. The next trigger could be a real compromise, a Microsoft deployment error, or an authorized penetration test that wasn't excluded from policy scope. Have the procedure documented and tested before you need it.

Configuring Risk Policies Step by Step

Microsoft now recommends configuring risk policies through Conditional Access, not the legacy ID Protection portal. This is more than a preference: legacy risk policies in ID Protection will be retired on October 1, 2026. If you're still using the old interface, start your migration planning now.

Prerequisites — skip these and policies become a liability

Before enabling any user risk policy:

1. Configure SSPR (Self-Service Password Reset)

Without SSPR, a user flagged as High Risk hits the remediation prompt, clicks the password reset link, and gets "Self-service password reset is not enabled." They're completely blocked with no self-service path. At 50 simultaneous lockouts, admin intervention for each user is an unacceptable bottleneck.

Check status: Entra admin center → Protection → Password reset → Properties

2. Enable Password Writeback for hybrid environments

If you sync from on-premises AD via Entra Connect, password writeback must be enabled on the Entra Connect server. Without it, cloud-initiated password resets don't propagate to on-premises AD. The user resets their cloud password, the hash doesn't change in AD, and risk stays elevated, so the user gets re-prompted immediately after logging in.

3. Verify MFA registration for all users before going live

If a user has no registered MFA method, they can't complete the MFA challenge required during risk remediation. They're blocked with no path forward. Use the Authentication Methods Activity report to identify users with no MFA registered, and address gaps before switching any policy from Report-only to On.
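The SSPR and MFA prerequisites above can be checked in one pass from the user registration details report. This is a minimal sketch, assuming the Microsoft.Graph.Reports module and a session with AuditLog.Read.All; `Get-RemediationGap` is a hypothetical helper name.

```powershell
# Sketch: list users who could not self-remediate because MFA or SSPR
# registration is missing. Get-RemediationGap is a hypothetical helper.
function Get-RemediationGap {
    param(
        # Objects shaped like Get-MgReportAuthenticationMethodUserRegistrationDetail output
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $RegistrationDetail
    )
    $RegistrationDetail |
        Where-Object { -not $_.IsMfaRegistered -or -not $_.IsSsprRegistered } |
        Select-Object UserPrincipalName, IsMfaRegistered, IsSsprRegistered, IsSsprEnabled
}

# Usage against a live tenant:
# Connect-MgGraph -Scopes 'AuditLog.Read.All'
# $details = Get-MgReportAuthenticationMethodUserRegistrationDetail -All
# Get-RemediationGap -RegistrationDetail $details | Export-Csv .\remediation-gaps.csv -NoTypeInformation
```

Every row in the output is a user who would need a manual admin reset if a user risk policy fired today.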

User Risk Policy (Microsoft recommendation: High)

Entra admin center → Conditional Access → New policy
- Assignments → Users: All users
  - Exclude: Emergency access accounts, service accounts
- Target resources: All resources
- Conditions → User risk → Configure: Yes
  - User risk level: High
- Access controls → Grant: Require risk remediation
  (auto-selects: Require authentication strength + Sign-in frequency: Every time)
- Enable policy: Report-only  ← always start here
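The same policy can be scripted so it is reproducible across tenants. The sketch below is an approximation, not a one-to-one export of the portal steps: it swaps in the classic `mfa` plus `passwordChange` grant controls because the Graph representation of "Require risk remediation" is not covered here. The exclusion group ID is a placeholder you must replace, and `New-UserRiskPolicyBody` is a hypothetical helper name.

```powershell
# Sketch: build a report-only user risk policy body for Microsoft Graph.
# Assumption: classic mfa + passwordChange controls stand in for the portal's
# "Require risk remediation" option. New-UserRiskPolicyBody is hypothetical.
function New-UserRiskPolicyBody {
    param(
        # Object ID of the group holding emergency access and service accounts
        [Parameter(Mandatory)] [string] $ExcludeGroupId
    )
    @{
        displayName = 'User risk High: require MFA and password change (report-only)'
        state       = 'enabledForReportingButNotEnabled'   # Report-only
        conditions  = @{
            clientAppTypes = @('all')
            users          = @{
                includeUsers  = @('All')
                excludeGroups = @($ExcludeGroupId)
            }
            applications   = @{ includeApplications = @('All') }
            userRiskLevels = @('high')
        }
        grantControls = @{
            operator        = 'AND'   # passwordChange must be paired with mfa
            builtInControls = @('mfa', 'passwordChange')
        }
        sessionControls = @{
            signInFrequency = @{
                isEnabled          = $true
                frequencyInterval  = 'everyTime'
                authenticationType = 'primaryAndSecondaryAuthentication'
            }
        }
    }
}

# Usage against a live tenant (placeholder group ID, replace before running):
# Connect-MgGraph -Scopes 'Policy.ReadWrite.ConditionalAccess','Policy.Read.All'
# $body = New-UserRiskPolicyBody -ExcludeGroupId '00000000-0000-0000-0000-000000000000'
# New-MgIdentityConditionalAccessPolicy -BodyParameter $body
```

Keeping `state` at `enabledForReportingButNotEnabled` in the script means a copy-paste mistake cannot enforce the policy by accident.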

Sign-in Risk Policy (Microsoft recommendation: Medium and above)

Entra admin center → Conditional Access → New policy
- Assignments → Users: All users
  - Exclude: Emergency access accounts
- Target resources: All resources
- Conditions → Sign-in risk → Configure: Yes
  - Sign-in risk level: High, Medium
- Access controls → Grant: Require authentication strength → Multifactor authentication
- Session → Sign-in frequency: Every time
- Enable policy: Report-only  ← always start here

Never combine user risk and sign-in risk conditions in the same Conditional Access policy. Microsoft explicitly warns against this. Conditions within a single policy are ANDed, so a combined policy applies only when both risk levels are met at once: a High Risk user with a clean-looking sign-in slips straight past it. Create separate policies for each risk type, which also makes troubleshooting significantly easier.
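Existing tenants sometimes already contain a policy that mixes the two conditions. A small audit can surface them. This is a sketch assuming a connected session with Policy.Read.All; `Find-MixedRiskPolicy` is a hypothetical helper name.

```powershell
# Sketch: flag Conditional Access policies that set both user risk and
# sign-in risk conditions. Find-MixedRiskPolicy is a hypothetical helper.
function Find-MixedRiskPolicy {
    param(
        # Objects shaped like Get-MgIdentityConditionalAccessPolicy output
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Policy
    )
    foreach ($p in $Policy) {
        # Filter out nulls so an unset condition counts as zero levels
        $userLevels   = @($p.Conditions.UserRiskLevels   | Where-Object { $_ })
        $signInLevels = @($p.Conditions.SignInRiskLevels | Where-Object { $_ })
        if ($userLevels.Count -gt 0 -and $signInLevels.Count -gt 0) {
            [pscustomobject]@{
                DisplayName = $p.DisplayName
                State       = $p.State
            }
        }
    }
}

# Usage against a live tenant:
# Connect-MgGraph -Scopes 'Policy.Read.All'
# Find-MixedRiskPolicy -Policy (Get-MgIdentityConditionalAccessPolicy -All)
```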

Leave policies in Report-only for at least 7 days, and ideally 10–14 days for the reasons covered under Pitfall 2. Review Conditional Access Insights during this window. Flag any service accounts, automation scripts, or third-party integrations that show up as "would have been blocked." Add them to exclusions before going live.

Bulk Remediation: Unblocking Hundreds of Users via PowerShell

When an incident hits, you don't have time to navigate the Entra portal for each individual user. Here's the complete PowerShell workflow.

Connect with required permissions

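# Requires the Microsoft Graph PowerShell SDK (Install-Module Microsoft.Graph)
# and an account holding the Security Administrator role
Import-Module Microsoft.Graph.Identity.SignIns
Connect-MgGraph -Scopes "IdentityRiskEvent.Read.All","IdentityRiskyUser.ReadWrite.All"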

List all high-risk users

# -All pages through every result instead of stopping at the first page
$riskyUsers = Get-MgRiskyUser -All -Filter "RiskLevel eq 'high'" |
    Select-Object UserDisplayName, RiskDetail, RiskLastUpdatedDateTime, Id

$riskyUsers | Format-Table UserDisplayName, RiskDetail, RiskLastUpdatedDateTime -AutoSize

Bulk Confirm User Safe (false positive remediation)

Use this when you've confirmed detections are false positives — like the MACE incident. Confirm Safe immediately removes the block, signals to Microsoft's ML model that these detections were erroneous, and puts accounts into learning mode to rebuild behavioral baselines.

# Get IDs of all currently high-risk users.
# @() keeps a single result as an array so the JSON body stays a list.
$userIds = @(Get-MgRiskyUser -All -Filter "RiskLevel eq 'high'" |
    Select-Object -ExpandProperty Id)

if ($userIds.Count -eq 0) {
    Write-Host "No high-risk users found."
} else {
    # Confirm all as safe (false positives)
    $body = @{
        userIds = $userIds
    } | ConvertTo-Json

    Invoke-MgGraphRequest -Method POST `
        -Uri "https://graph.microsoft.com/v1.0/identityProtection/riskyUsers/confirmSafe" `
        -Body $body `
        -ContentType "application/json"

    Write-Host "Confirmed $($userIds.Count) users as safe."
}
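For a large lockout it may be safer to send the IDs in chunks rather than one oversized request. The sketch below assumes a per-request cap; the batch size of 60 is an assumption to verify against the current Graph documentation, and `Split-IntoBatch` is a hypothetical helper.

```powershell
# Sketch: split a list of user IDs into fixed-size batches.
# Split-IntoBatch is a hypothetical helper; the default size is an assumption.
function Split-IntoBatch {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [string[]] $Item,
        [ValidateRange(1, 1000)] [int] $Size = 60
    )
    for ($i = 0; $i -lt $Item.Count; $i += $Size) {
        $end = [Math]::Min($i + $Size, $Item.Count) - 1
        # The leading comma emits each batch as one array instead of unrolling it
        , @($Item[$i..$end])
    }
}

# Usage with the $userIds collected by the script above:
# foreach ($batch in (Split-IntoBatch -Item $userIds -Size 60)) {
#     $body = @{ userIds = @($batch) } | ConvertTo-Json
#     Invoke-MgGraphRequest -Method POST `
#         -Uri "https://graph.microsoft.com/v1.0/identityProtection/riskyUsers/confirmSafe" `
#         -Body $body -ContentType "application/json"
# }
```

Batching also gives you a natural checkpoint: if one request fails, you know which group of users still needs attention.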

Bulk Dismiss Risk (stale or benign detections)

Use Dismiss for old risk entries, or where the risk was real but non-malicious (e.g., authorized penetration testing). Note: Dismiss does NOT signal to Microsoft that the detection was a false positive — similar future events will still generate detections.

# Dismiss high-risk flags older than 90 days (timestamps compared in UTC)
$cutoff = (Get-Date).ToUniversalTime().AddDays(-90)
$staleRiskyUsers = @(Get-MgRiskyUser -All -Filter "RiskLevel eq 'high'" |
    Where-Object { $_.RiskLastUpdatedDateTime -lt $cutoff })

if ($staleRiskyUsers.Count -gt 0) {
    Invoke-MgDismissRiskyUser -UserIds $staleRiskyUsers.Id
    Write-Host "Dismissed $($staleRiskyUsers.Count) stale risk entries."
} else {
    Write-Host "No stale high-risk users found."
}

Filter by specific detection type

# Show only leaked credentials detections, sorted by date
Get-MgRiskDetection -Filter "RiskEventType eq 'leakedCredentials'" | 
    Select-Object UserDisplayName, DetectedDateTime, RiskLevel |
    Sort-Object DetectedDateTime -Descending |
    Format-Table -AutoSize
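During a real incident you probably don't want to confirm every high-risk user as safe, only those whose detections match the incident pattern. The sketch below narrows detections to one type inside a UTC time window and returns the distinct user IDs. `Select-IncidentUserId` is a hypothetical helper, and the window in the usage comment is the one Microsoft reported for MACE.

```powershell
# Sketch: pick out users whose detections fall inside an incident window.
# Select-IncidentUserId is a hypothetical helper, not an SDK cmdlet.
function Select-IncidentUserId {
    param(
        # Objects shaped like Get-MgRiskDetection output
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Detection,
        [Parameter(Mandatory)] [datetime] $WindowStart,
        [Parameter(Mandatory)] [datetime] $WindowEnd,
        [string] $RiskEventType = 'leakedCredentials'
    )
    $startUtc = $WindowStart.ToUniversalTime()
    $endUtc   = $WindowEnd.ToUniversalTime()
    $Detection |
        Where-Object {
            $_.RiskEventType -eq $RiskEventType -and
            $_.DetectedDateTime -and
            $_.DetectedDateTime.ToUniversalTime() -ge $startUtc -and
            $_.DetectedDateTime.ToUniversalTime() -le $endUtc
        } |
        ForEach-Object { $_.UserId } |
        Sort-Object -Unique
}

# Usage against a live tenant:
# $utc   = [DateTimeKind]::Utc
# $start = [datetime]::new(2025, 4, 20, 4, 0, 0, $utc)
# $end   = [datetime]::new(2025, 4, 20, 9, 0, 0, $utc)
# $all   = Get-MgRiskDetection -All -Filter "RiskEventType eq 'leakedCredentials'"
# $userIds = @(Select-IncidentUserId -Detection $all -WindowStart $start -WindowEnd $end)
```

Users flagged outside the window stay in the risky users list, where they still deserve an individual look.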

Common Pitfalls and Edge Cases

Pitfall 1: Passwordless users locked out with no remediation path

Users authenticating via FIDO2 security keys or Windows Hello for Business have no password to reset. If your user risk policy's remediation path is "secure password change," passwordless users hit a block they cannot clear themselves, because the remediation step is literally impossible for them, and they stay locked out until an admin intervenes. You need a separate Conditional Access policy targeting passwordless users with Require authentication strength: Passwordless MFA and a distinct user group assignment. Don't mix passwordless and password-based users in the same user risk policy; their remediation paths are fundamentally different.

Pitfall 2: Report-only data timing gives a false sense of safety

Conditional Access Insights can lag up to one hour. More critically, offline risk detections are evaluated after a sign-in completes. A user signs in at 9:00 AM with Risk Level = None, then gets flagged as High Risk at 9:30 AM due to an offline Atypical travel detection. Your Report-only logs will show "would have blocked" for their next sign-in, not the one already completed. This means a 3-day Report-only window dramatically underrepresents the actual impact of your policy. Treat 7 days as the floor and aim for 10–14 days, spanning different days of the week and including any known work-from-home patterns, shift changes, and scheduled overnight tasks.

Pitfall 3: Sync accounts and service identities

Entra Connect Sync Account, Power Automate connections, third-party SaaS integrations: these make sign-ins that can trigger Anomalous Token or Unfamiliar sign-in properties detections. If you haven't explicitly excluded these from risk policies, you risk breaking background automation. Every service account must be in your exclusion group. Don't rely on "it hasn't been a problem yet"; these detections fire inconsistently and often at the worst possible moment. Where possible, move automation off user-based service accounts and onto Managed Identities, which user-targeted risk policies do not evaluate.

Pitfall 4: Hybrid environments without Password Hash Sync

Leaked credentials detection works by comparing password hashes. For accounts synchronized from on-premises AD via Entra Connect, Password Hash Synchronization (PHS) must be enabled — without it, Microsoft can't validate hashes and leaked credential detections simply don't fire for those accounts. This sounds like a safety measure until you realize you have a blind spot covering your entire AD-synced user population. If your organization chose Pass-through Authentication (PTA) or ADFS specifically to avoid sending hashes to the cloud, you've also disabled leaked credential detection for your on-prem accounts. Document this tradeoff explicitly.
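You can check whether Password Hash Sync (and the password writeback prerequisite from earlier) is switched on without logging in to the Entra Connect server. This is a sketch under assumptions: it relies on `Get-MgDirectoryOnPremiseSynchronization` exposing `Features.PasswordSyncEnabled` and `Features.PasswordWritebackEnabled`, which you should confirm against your installed SDK version. `Get-SyncFeatureStatus` is a hypothetical helper.

```powershell
# Sketch: report PHS and password writeback status from the tenant's
# directory synchronization settings.
# Assumption: the Features property names below match your SDK version.
# Get-SyncFeatureStatus is a hypothetical helper, not an SDK cmdlet.
function Get-SyncFeatureStatus {
    param(
        # Object shaped like Get-MgDirectoryOnPremiseSynchronization output
        [Parameter(Mandatory)] [object] $SyncConfig
    )
    [pscustomobject]@{
        PasswordHashSync  = [bool]$SyncConfig.Features.PasswordSyncEnabled
        PasswordWriteback = [bool]$SyncConfig.Features.PasswordWritebackEnabled
    }
}

# Usage against a live tenant:
# Connect-MgGraph -Scopes 'OnPremDirectorySynchronization.Read.All'
# Get-MgDirectoryOnPremiseSynchronization |
#     ForEach-Object { Get-SyncFeatureStatus -SyncConfig $_ }
```

If PasswordHashSync comes back false, treat leaked credentials coverage for synced accounts as absent and record that in your risk register.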

Pitfall 5: Confirm Safe vs Dismiss — the difference matters long-term

Both actions immediately clear the risk level, but their long-term effects differ. Confirm Safe tells Microsoft's ML model this detection pattern was a false positive — it reduces similar false positives in your tenant and contributes to global detection accuracy improvement. Dismiss is for benign true positives (real anomalous behavior that wasn't malicious). Using Dismiss when you should use Confirm Safe means Microsoft's model doesn't learn from your feedback. During incidents like MACE where detections are confirmed false positives at scale, always use Confirm Safe — not Dismiss.

How We Handle This at Evolit

At Evolit, we manage dozens of Microsoft 365 tenants, and incidents like the April 2025 MACE event forced us to formalize our mass-remediation runbook. When simultaneous lockouts hit 30+ users across a tenant, we use Nexma to triage incoming helpdesk volume. Instead of each "I can't log in" becoming a separate email thread or Teams DM, all tickets land in a unified queue with automatic grouping by detection type and time window. The on-call engineer sees "38 users locked out, all leakedCredentials, all flagged between 4:00–9:00 AM UTC" at a glance, cross-references with Entra ID Protection, confirms the MACE pattern, and runs the bulk Confirm Safe script. Nexma logs the admin action with timestamp and automatically notifies affected users when access is restored. Without that structure — managing 38 parallel conversations while running PowerShell remediation — the margin for human error is significant, and inevitably some users stay blocked for hours longer than necessary. More at nexma.app.

Summary

  • Leaked credentials always fires as High Risk — no configurable threshold; Microsoft incidents like MACE can mass false-positive your entire tenant overnight
  • Configure SSPR and MFA registration before enabling risk policies — without them, blocked users have no self-service remediation path and require manual admin intervention for every account
  • Start in Report-only for minimum 7 days — offline detections and data lag mean shorter windows underrepresent actual policy impact
  • Never combine user risk and sign-in risk in a single Conditional Access policy — create separate policies; Microsoft explicitly recommends this and the enforcement behavior is significantly more predictable
  • Confirm Safe for false positives, Dismiss for benign true positives — the distinction matters for Microsoft's detection model training
  • Have your bulk Confirm Safe script tested before an incident — 30 minutes with a pre-validated PowerShell runbook beats 4 hours of portal clicking for a 200-user lockout
  • Legacy ID Protection risk policies retire October 1, 2026 — migrate to Conditional Access-based policies now, not the week before the deadline forces your hand