The Bug
Here is the scenario. You have a Business Rule that fires when an incident is resolved. It reads the assignment_group reference field, calls getRefRecord() to navigate to the group record, reads the group manager's email, and sends a notification. In development it works perfectly. In production, intermittently, the notification goes to the wrong person. Sometimes nobody gets it at all.
No exceptions. No errors in the system log. No Business Rule skip entries. The rule appears to have executed correctly. It just operated on the wrong data.
This is the async context trap. It is one of the most time-consuming bugs to diagnose in ServiceNow because it leaves absolutely nothing in the logs, it is inconsistent, and the code looks completely correct — because in synchronous execution, it is correct. The problem is context, not code.
What Async Actually Means in ServiceNow
Most developers know that async Business Rules run after the transaction completes, in a separate thread. What is less well understood is what that actually means for the data you have access to.
When a Business Rule runs synchronously — the default for Before and After rules — it executes within the same database transaction as the record save. The current object is a live GlideRecord with an active database connection. Reference fields resolve against the current transaction state.
When a Business Rule runs async, ServiceNow does the following:
1. The original transaction commits to the database normally.
2. A job is queued for the async scheduler, containing a serialised snapshot of current at the time the transaction committed.
3. A worker thread picks up the job, possibly seconds later, possibly minutes later under load.
4. That worker thread deserialises the snapshot back into a current object and executes the Business Rule script.
Step 4 is where the problem lives. The current object in the async thread is built from a snapshot. The GlideElement objects for reference fields carry cached metadata that was accurate at queue time. By the time the worker runs, the database may have moved on.
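The queue-then-execute sequence can be illustrated with a toy model. This is a sketch in plain JavaScript, not ServiceNow internals: groupTable, enqueue and runWorker are hypothetical names standing in for the database, the async queue and the worker thread. It only demonstrates the shape of the problem, where the sys_id stays valid while anything cached alongside it can go stale.

```javascript
// Toy model only: plain objects stand in for the database and the async
// queue. None of these names are ServiceNow APIs.
var groupTable = {
    g1: { name: 'Network Ops', manager: 'alice' }
};

// Queue time: capture the sys_id plus whatever was cached about the reference.
function enqueue(incident) {
    var group = groupTable[incident.assignment_group];
    // JSON round-trip mimics serialising the snapshot into the queue.
    return JSON.parse(JSON.stringify({
        assignment_group: incident.assignment_group, // sys_id: stays valid
        cachedGroup: group                           // cached copy: can go stale
    }));
}

// Execution time: compare the cached copy with a fresh lookup by sys_id.
function runWorker(snapshot) {
    var fresh = groupTable[snapshot.assignment_group] || null;
    return {
        cachedManager: snapshot.cachedGroup.manager,
        freshManager: fresh ? fresh.manager : null
    };
}

var job = enqueue({ number: 'INC0001', assignment_group: 'g1' });
groupTable.g1.manager = 'bob'; // the group changes while the job waits
var result = runWorker(job);
// result.cachedManager is 'alice'; result.freshManager is 'bob'
```

The only value in the snapshot that still leads to the truth is the sys_id, which is what the fix later in this article relies on.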
When You Go Async Without Knowing It
The obvious case is a Business Rule explicitly set to When: async. But there are less obvious paths into async context:
Execution time threshold promotion. After major platform upgrades, ServiceNow can reclassify After rules as async if they exceed a new execution time threshold. This happens silently — no notification, no change to the Business Rule's When field, no indication that your rule is now running in a different context. I hit this after a Washington-to-Xanadu upgrade. Three After rules stopped behaving correctly with no error in the log. Nothing in the code had changed. The platform had quietly reclassified them.
Import Sets and Transform Maps. Business Rules triggered by record creation during an import run in async context by default, even when the rule's When field says After.
Scheduled jobs. Any Business Rule invoked programmatically from a scheduled script runs without the synchronous transaction context the rule may be written to expect.
Why getRefRecord() Lies
When you call current.some_reference_field.getRefRecord(), you are asking ServiceNow to follow the reference and return a GlideRecord for the referenced row. In sync context this is a live database read — it returns whatever is in the database right now.
In async context, the GlideElement object for the reference field was deserialised from the snapshot. It has a value (the sys_id) and a display value (the display value of the referenced record at queue time). When getRefRecord() executes in the worker thread, it uses this GlideElement's internal state rather than performing a guaranteed fresh query.
What you end up with depends on the state of that cache:
- If the referenced record has not changed: you probably get the right data. This is why the bug is intermittent — it only manifests when there is meaningful lag between queue time and execution.
- If the referenced record changed between queue time and execution: you get the pre-change version. You are working on a record that no longer exists in that state.
- If the GlideElement internal state is inconsistent (which happens when display value and value are out of sync): you get an empty or null record and your code proceeds with no data.
The sys_id itself — the value — is reliably stored in the snapshot. Only the cached metadata and the reference resolution mechanism are affected. This is the key insight the fix relies on.
Spotting It in the Wild
Because the bug is load-dependent, you need to confirm it deliberately rather than chase it intermittently. Add a temporary diagnostic to your Business Rule:
// Temporary diagnostic: confirms the async staleness issue
var refField = current.assignment_group;
var sysId = refField.toString(); // reliable from snapshot

// What getRefRecord() returns (guard against an invalid record)
var viaRef = refField.getRefRecord();
var nameViaRef = viaRef.isValidRecord() ? viaRef.name.toString() : '(invalid record)';

// What a fresh query returns (guard against a deleted group)
var fresh = new GlideRecord('sys_user_group');
var nameFresh = fresh.get(sysId) ? fresh.name.toString() : '(not found)';

gs.log(
    'DIAG | sysId: ' + sysId +
    ' | getRefRecord: ' + nameViaRef +
    ' | fresh: ' + nameFresh +
    ' | match: ' + (nameViaRef === nameFresh)
);
Run this under load — import a batch of records, trigger a high-volume update. If you start seeing match: false in the log you have confirmed it. The sys_id will be consistent; the value returned by getRefRecord() will occasionally be wrong.
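Scanning the log by eye gets tedious after a large batch. The following is a minimal sketch of a background-script helper that counts the diagnostic lines instead. It assumes the messages land in the syslog table with the text in the message field and the 'DIAG | sysId:' prefix used above; countDiagLines is a hypothetical name, not a platform function.

```javascript
// Sketch: count diagnostic lines written by the temporary DIAG logging.
// Assumes the lines are stored in syslog.message with the prefix shown above.
function countDiagLines(onlyMismatches) {
    var agg = new GlideAggregate('syslog');
    agg.addQuery('message', 'STARTSWITH', 'DIAG | sysId:');
    if (onlyMismatches) {
        // Keep only the lines where getRefRecord() disagreed with the fresh read.
        agg.addQuery('message', 'CONTAINS', 'match: false');
    }
    agg.addAggregate('COUNT');
    agg.query();
    return agg.next() ? parseInt(agg.getAggregate('COUNT'), 10) : 0;
}

// Example use in a background script:
//   gs.log(countDiagLines(true) + ' mismatches out of ' + countDiagLines(false));
```

On a busy instance it is probably worth adding a sys_created_on filter so the query only covers the test window.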
The Fix
Extract the sys_id using toString() and run an explicit GlideRecord.get() in the worker thread's context. That is it.
// BEFORE: do not use this in any Business Rule that may run async
var group = current.assignment_group.getRefRecord();
var manager = group.manager.toString();

// AFTER: toString() on a reference field reliably returns the sys_id.
// Drive a fresh GlideRecord query from that sys_id.
var groupSysId = current.assignment_group.toString();
if (!groupSysId) {
    gs.log('No assignment group on record ' + current.number);
    return;
}
var group = new GlideRecord('sys_user_group');
if (!group.get(groupSysId)) {
    gs.log('Group not found: ' + groupSysId);
    return;
}
// Fresh read from the database at execution time.
var groupName = group.name.toString();
var managerSysId = group.manager.toString();
The null checks are not optional. In an async context it is entirely possible for a record that existed at queue time to have been deleted or reassigned by execution time. Handle that gracefully or you will have a different silent failure.
The Same Problem Kills Dot-Walking
Dot-walking — reading fields across reference boundaries inline like current.assignment_group.manager.email.toString() — hits exactly the same issue. Each dot is a reference resolution, and each one is subject to cache staleness in async context.
In sync context dot-walking is fine and often convenient. In async context, break every chain into explicit queries:
// Replaces: current.assignment_group.manager.email.toString()
// Extract into a Script Include to keep the Business Rule readable
function getGroupManagerEmail(groupSysId) {
    if (!groupSysId) return null;
    var group = new GlideRecord('sys_user_group');
    if (!group.get(groupSysId)) return null;
    var managerSysId = group.manager.toString();
    if (!managerSysId) return null;
    var manager = new GlideRecord('sys_user');
    if (!manager.get(managerSysId)) return null;
    return manager.email.toString() || null;
}

var email = getGroupManagerEmail(
    current.assignment_group.toString()
);
Verbose. But debuggable at every step, null-safe at every boundary, and correct regardless of whether the rule runs sync or async.
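If you end up writing several of these functions, the pattern generalises. Here is one possible sketch of a helper that follows a chain of reference fields with one fresh GlideRecord.get() per hop. walkFresh and the shape of its hops argument are hypothetical, chosen for illustration; the caller names each table explicitly so the helper makes no assumptions about how references resolve.

```javascript
// Hypothetical helper: follows a chain of reference fields using one fresh
// GlideRecord.get() per hop instead of dot-walking.
// Each hop names the table to read and the field to take from that row.
function walkFresh(startSysId, hops) {
    var value = startSysId;
    for (var i = 0; i < hops.length; i++) {
        if (!value) return null;              // empty reference: stop here
        var gr = new GlideRecord(hops[i].table);
        if (!gr.get(value)) return null;      // row deleted or never existed
        value = gr.getValue(hops[i].field);   // sys_id for the next hop, or the final value
    }
    return value || null;
}

// Equivalent of current.assignment_group.manager.email in a Business Rule:
//   var email = walkFresh(current.assignment_group.toString(), [
//       { table: 'sys_user_group', field: 'manager' },
//       { table: 'sys_user', field: 'email' }
//   ]);
```

The trade-off is that a generic walker hides which hop failed. If you need that detail in the log, the explicit function above is easier to instrument.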
The Rule Worth Memorising
In async context, treat current as a bag of sys_ids and primitive values only. Never call getRefRecord(). Never dot-walk. Never call getDisplayValue() on reference fields. Use the sys_id to drive an explicit fresh query for anything you need.
The more practical version: write all your Business Rules as if they might run async, even when they are set to After. A future upgrade, an import set trigger, or an admin changing the When field can move you into async without warning. The explicit re-query pattern costs a few extra lines and works correctly in both contexts.
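To find existing rules worth reviewing, a background script can list the candidates. This is a rough sketch, assuming Business Rules live in sys_script with their code in the script field and the target table in collection; findGetRefRecordRules is a hypothetical name. It only catches getRefRecord() by text match, so dot-walk chains still need a manual read.

```javascript
// Sketch of a background-script helper: lists active Business Rules whose
// script mentions getRefRecord, so each one can be reviewed.
function findGetRefRecordRules() {
    var hits = [];
    var rule = new GlideRecord('sys_script');
    rule.addActiveQuery();
    rule.addQuery('script', 'CONTAINS', 'getRefRecord');
    rule.query();
    while (rule.next()) {
        hits.push({
            name: rule.getValue('name'),
            table: rule.getValue('collection'),
            when: rule.getValue('when')
        });
    }
    return hits;
}

// Example use in a background script:
//   gs.log(JSON.stringify(findGetRefRecordRules()));
```

Rules that show up with when set to after are the ones to look at first, since those are the ones written with sync assumptions.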
If you do use getRefRecord() or a dot-walk chain in a Business Rule, add a comment explaining why it is safe: sync only, never triggered from an async path. If you cannot write that comment confidently, replace it with an explicit query. Future-you, debugging a production incident at 5pm, will be grateful.
I found this one the hard way. Production incident notification system, Business Rule three months old and never touched, started sending notifications to the wrong team after an upgrade. No code change. No error in the log. Just wrong data, on live incidents, in production. Took most of an afternoon to diagnose. Hopefully this saves you that afternoon.