Async PICOs and Custom Beacon Wakeups in Cobalt Strike

03 June 2026

Introduction
Shortcomings of Cobalt Strike's Native Asynchronous BOFs
The journey to the Async PICOs implementation
Starting an Asynchronous PICO
Tracking Running Tasks
BeaconPrintf from an Asynchronous PICO
Putting it together
Using Async PICOs
Demo: Monitor TGT
Conclusion

It is 3:47 AM. You are not at your keyboard. Your Cobalt Strike beacon is sleeping for another four hours. At 3:48 AM, a Tier Zero service account logs in, runs its scheduled task, and logs out. You will never know it happened. The TGT was there for sixty seconds. You were both asleep. The obvious next thought is: there has to be a way to keep a BOF running in the background and wake the beacon when something interesting happens.

There is, sort of. Outflank sketched the concept in mid-2025 with their post on asynchronous BOFs (Async BOFs). Conquest and some other C2s already ship native support for background tasks with custom wakeups. Even Cobalt Strike's latest release added something called "Asynchronous BOFs." The idea is in the air.

The problem is that none of these give you exactly what you need for this scenario. Outflank's design targets their own implant. Conquest is a different framework entirely. And Cobalt Strike's native async BOFs, while real, do not run in the same process as the beacon and do not support custom wakeups. You still cannot drop a TGT monitor into a CS beacon and let the beacon sleep only until an event happens. So we built it ourselves.

If you’re only interested in running it, go to the repository at https://github.com/nccgroup/async-pico-hub or skip to Using Async PICOs. If you’d like to read about the design decisions, see it in action, and understand why it was done the way it was, keep reading.

But before we get to our solution, we need to address one more dead end.

Shortcomings of Cobalt Strike's Native Asynchronous BOFs

Cobalt Strike added a feature called “Asynchronous BOFs” in a recent release, so it is worth explaining why it does not close the gap described above.

The native implementation runs asynchronous BOFs in a separate process rather than inside the Beacon process itself. This design introduces several limitations. From an OPSEC perspective, cross-process execution is not trivial to implement cleanly without increasing the risk of detection. It also breaks the model we needed, since the BOFs are no longer executing in the same memory space as Beacon.

These externally executed BOFs cannot wake Beacon on demand, and there is no built-in mechanism to coordinate or force a wake-up event from within the BOF itself. As a result, workflows that depend on reacting immediately to events, such as executing a standard BOF in response, are not supported.

The output model introduces additional constraints. Native async BOF output only reaches the operator when Beacon wakes up, which means background tasks cannot deliver data to the operator independently of Beacon’s sleep cycle. For workflows that rely on near real-time feedback or event-driven behavior, this delay becomes a practical limitation.

Task management is also constrained. The state of running asynchronous tasks is only visible to the operator who launched them. Since task tracking is implemented in Sleep, which does not provide shared global variables, there is no shared view of task state across operators. In collaborative operations, this makes it difficult to inspect or manage long-running activity consistently.

Finally, native async BOFs cannot be stopped gracefully. There is no mechanism to signal a BOF to begin shutdown, which means there is no reliable way to ensure state is cleaned up or resources are released in a controlled manner.

In short, while Cobalt Strike’s native asynchronous BOFs are a useful addition, they operate outside the Beacon process and have practical constraints around output delivery, shared task visibility, and lifecycle management. That combination is why the feature was not sufficient for our use case and why a custom implementation was necessary.

To avoid confusion with Cobalt Strike’s native asynchronous BOFs, we will refer to the implementation presented in this post as Async PICOs. The name comes from Crystal Palace, where a PICO (Position-independent Code Object) refers to a position-independent COFF object that can be embedded and executed outside the standard BOF model. In practice, PICOs are similar to Cobalt Strike BOFs, but without a dependency on the Cobalt Strike API. Since our implementation relies on Crystal Palace to transform BOFs into position-independent code, the distinction is useful both technically and conceptually.

The journey to the Async PICOs implementation

If we reduce Outflank’s asynchronous BOF model to its bare essentials, the design revolves around three minimum requirements:

An Async BOF must be able to start and stop cleanly
It must execute independently in another thread or thread pool inside the beacon’s process
It must be able to send output back to the operator immediately and wake the beacon when necessary

We ultimately arrived at an asynchronous execution layer for Cobalt Strike that runs Async PICOs in Beacon’s process, lets operators manage their state, and safely wakes Beacon to print output. The sections that follow walk through the three problems in the order we encountered them during development, and how each shaped the final implementation: how to start code that can execute from an arbitrary memory location, how to track running tasks without losing shared state, and how to safely print output from a background thread without destabilizing Beacon.

Starting an Asynchronous PICO

The first question we faced was simple: how do you run a BOF in a background thread inside the beacon process?

After exploring available solutions and finding no workable option, we decided to build our own BOF to act as an Asynchronous PICO loader. The design required two things: allocating memory inside the Beacon process and starting execution on a separate thread.

Since the code would execute from freshly allocated memory rather than from a fixed location, we could not simply run a normal BOF object directly. The code had to be position independent: we needed shellcode that could execute from an arbitrary address.

The next question became: what is the easiest and most practical way to convert a BOF into shellcode?

After some research, we found that Raphael Mudge’s Crystal Palace already solved this problem. Rather than building a custom loader or linker pipeline ourselves, we could use Crystal Palace, which takes a compiled object (COFF) file and transforms it into several formats, including PIC. This became a key building block for the implementation because it allowed us to preserve a normal BOF development workflow while producing code that could execute from an arbitrary memory location.

At a high level, Crystal Palace is a Java-based linker that lifts a compiled COFF file into an intermediate representation, transforms it, and can lower it back into another format like shellcode. Along the way, it can move functions, remove unused code, merge object files, resolve DFR stubs, and perform other useful transformations. It also enables BOFs to use global variables, a capability that is normally unavailable in native BOFs. The mechanism used to locate or allocate storage for those globals is configurable, which makes it flexible enough to adapt to different execution models and memory layouts. In practice, the workflow is simple: compile the “BOF” normally, then pass the object file through Crystal Palace to generate PIC.

However, we usually prefer to write our BOFs in C++, which introduced another challenge. Crystal Palace’s COFF parser makes assumptions that do not always hold for objects generated by MSVC. For example, MSVC sometimes emits duplicate section symbols such as .text$mn, while Crystal Palace assumes symbols are unique. MSVC also injects references to runtime helpers such as __chkstk for large stack frames, and to symbols such as ??3@YAXPEAX@Z (the mangled name of operator delete(void*)) when C++ features like delete are used. Crystal Palace cannot resolve these references automatically.

To solve this, we built CPP-BOF-FIXER. It sits between the compiler and Crystal Palace, parsing the raw COFF output, collapsing similar sections, binding MSVC runtime symbols to local implementations, and flattening the file into a format that both Crystal Palace and Cobalt Strike’s limited COFF parser can process.
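To make "binding MSVC runtime symbols to local implementations" concrete, here is a minimal sketch. It is not CPP-BOF-FIXER's code. It defines replacement global operator new/delete inside the object, so the references the compiler emits for new and delete (such as ??3@YAXPEAX@Z) resolve to code shipped with the object instead of to the CRT. In a real BOF these would likely forward to a Win32 or Beacon allocator. Here they forward to malloc/free so the sketch runs anywhere. The allocation counter g_live_allocs is hypothetical and exists only to make the binding observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter so the sketch can show which allocator handled a call.
static long g_live_allocs = 0;

// Replacement global allocation function. Because it is defined inside the
// object, `new` expressions bind here instead of to the CRT's operator new.
void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);  // malloc(0) may return nullptr
    if (!p) std::abort();                    // BOFs usually build without exceptions
    ++g_live_allocs;
    return p;
}

// MSVC mangles this signature as ??3@YAXPEAX@Z.
void operator delete(void* p) noexcept {
    if (!p) return;
    --g_live_allocs;
    assert(g_live_allocs >= 0 && "more frees than allocations");
    std::free(p);
}

// Compilers may emit the C++14 sized form instead; route it to the same place.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
```

The default array forms (new[] and delete[]) forward to these functions, so one pair of definitions covers most C++ code that relies on RAII and destructors.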

This removes many of the restrictions typically associated with writing BOFs in C++. Features such as templates, classes, RAII, COM interfaces, constructors, and destructors become usable again. The workflow becomes: compile normally, pass the object file through CPP-BOF-FIXER, and then send the result to Crystal Palace for shellcode generation.

The same limitations affect Cobalt Strike’s standard BOF and sleepmask loaders, so CPP-BOF-FIXER also improves compatibility for conventional BOFs and sleepmasks. One tool ultimately solved two related problems.

With executable shellcode available, the remaining problem was how to run it asynchronously inside the Beacon process. The simplest and most practical solution was CreateThread, which allowed us to execute the PICO on a dedicated background thread while also passing context directly to the entry function.

This gave us a clean communication model for Async PICOs. At startup, we pass a structure containing a stop handle, execution arguments, and a pointer to an output structure where the Async PICO can write results. The output structure was initially a practical workaround. At the time, we had not yet found a reliable way to call BeaconPrintf, so storing output locally gave us a temporary bridge.
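The communication model can be sketched portably as follows. This is an illustration under stated assumptions, not AsyncPICOMgr's code. std::thread stands in for CreateThread, and a small condition-variable class stands in for the Win32 stop event. StartContext, TaskOutput, StopSignal, and task_entry are hypothetical names. The task performs a unit of work, checks the stop signal, and returns after cleaning up once it is asked to stop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Portable stand-in for a Win32 manual-reset event used as the stop handle.
class StopSignal {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lk(m_);
            set_ = true;
        }
        cv_.notify_all();
    }
    // Waits up to `timeout`; returns true if the signal is set.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        return cv_.wait_for(lk, timeout, [this] { return set_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// Hypothetical output structure: the early workaround for not calling BeaconPrintf.
struct TaskOutput {
    std::mutex lock;
    std::vector<std::string> lines;
};

// Hypothetical start context handed to the thread entry point.
struct StartContext {
    StopSignal* stop;               // stop handle
    std::vector<std::string> args;  // execution arguments
    TaskOutput* output;             // where results are written
    std::atomic<bool>* cleaned_up;  // lets the sketch observe a clean exit
};

// Background task: do a unit of work, then check the stop signal. When asked
// to stop, release resources and return so the thread exits cleanly.
void task_entry(StartContext* ctx) {
    const std::string label = ctx->args.empty() ? std::string("tick") : ctx->args[0];
    int n = 0;
    while (!ctx->stop->wait_for(std::chrono::milliseconds(10))) {
        std::lock_guard<std::mutex> lk(ctx->output->lock);
        ctx->output->lines.push_back(label + " " + std::to_string(++n));
    }
    // Cleanup (free buffers, close handles) would happen here.
    ctx->cleaned_up->store(true);
}
```

Usage: build a StartContext, run `std::thread worker(task_entry, &ctx);`, and later call `stop.set(); worker.join();`. The context must outlive the thread. A later, normal BOF would read the collected lines from TaskOutput under its lock.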

We called this BOF AsyncPICOMgr and exposed it through the command:

picos start [PATH_TO_PICO] [arguments]

Tracking Running Tasks

At this point, we could run asynchronous PICOs on separate threads, but we still lacked visibility and control. There was no reliable way to see which PICOs were active, stop them, or retrieve their output. Running background tasks without a way to manage them quickly became impractical, so we needed a registry to track the state associated with each running task.

Our first idea was to implement this in Sleep. The plan was straightforward: track calls to picos start ..., store task information in variables, and have AsyncPICOMgr return identifiers and handles on task creation that could later be used to stop, inspect, or interact with a task. On paper, this gave us a simple management layer without requiring changes to AsyncPICOMgr.

In practice, the approach broke down quickly. Sleep has no native mechanism for sharing state across operators, which meant only the operator who launched a task would be able to see it. Restarting or closing the Cobalt Strike client would also discard all task information, making the registry fragile and short-lived.

We explored building a synchronization layer in Sleep to work around these limitations. The idea was to create a notification system that propagated task updates to all operators and reconstructed state whenever a client reconnected. One advantage of this design was that operators could still inspect running tasks even while the beacon slept, since the state would live outside the beacon process.

The downside was complexity. Synchronization logic became difficult to reason about, debugging Sleep proved frustrating, and the amount of infrastructure required grew quickly for what was ultimately a bookkeeping problem. Rather than continuing down that path, we simplified the design and moved responsibility for task management into AsyncPICOMgr itself.

AsyncPICOMgr became responsible for creating, storing, and managing the registry of running tasks. This came with a tradeoff: operators can only list running tasks while the beacon is awake. However, the implementation became significantly simpler and more reliable.

The registry itself is stored in Beacon’s internal key-value store using BeaconAddValue and retrieved through BeaconGetValue. Because the data lives inside Beacon, other BOFs can access it directly and interact with asynchronous tasks.
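This post does not describe the registry layout, so the following is a hypothetical sketch of how such a registry could be organized. StubAddValue and StubGetValue mirror the shape of BeaconAddValue(key, ptr) and BeaconGetValue(key), but they are backed by a std::map so the sketch runs outside Beacon. TaskEntry, TaskRegistry, and the key name are assumptions. The important point is that the registry is heap-allocated and only its pointer is stored, so it outlives the BOF invocation that created it. STL containers are used here for readability. A real BOF would more likely use fixed arrays.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-ins for Beacon's key-value store, mirroring the shape of
// BeaconAddValue(key, ptr) / BeaconGetValue(key) so the sketch runs anywhere.
static std::map<std::string, void*> g_fake_store;
bool StubAddValue(const char* key, void* ptr) {
    g_fake_store[key] = ptr;
    return true;
}
void* StubGetValue(const char* key) {
    auto it = g_fake_store.find(key);
    return it == g_fake_store.end() ? nullptr : it->second;
}

// Hypothetical registry layout: one entry per running task.
struct TaskEntry {
    unsigned id;
    std::string name;
    void* stop_handle;  // signalled by `picos stop`
    void* output;       // the task's output structure
};

struct TaskRegistry {
    unsigned next_id = 1;
    std::vector<TaskEntry> tasks;
};

constexpr const char* kRegistryKey = "async_pico_registry";  // hypothetical key

TaskRegistry* find_registry() {
    return static_cast<TaskRegistry*>(StubGetValue(kRegistryKey));
}

// Called by the manager on `picos start`: create the registry on first use.
// It is heap-allocated so it outlives the BOF invocation that created it.
unsigned register_task(const std::string& name, void* stop, void* output) {
    TaskRegistry* reg = find_registry();
    if (!reg) {
        reg = new TaskRegistry();
        StubAddValue(kRegistryKey, reg);
    }
    unsigned id = reg->next_id++;
    reg->tasks.push_back(TaskEntry{id, name, stop, output});
    return id;
}

// What a separate, normal BOF could do: look a running task up by name.
TaskEntry* find_task(const std::string& name) {
    TaskRegistry* reg = find_registry();
    if (!reg) return nullptr;
    for (TaskEntry& t : reg->tasks)
        if (t.name == name) return &t;
    return nullptr;
}

// Called on `picos stop` once the task has exited.
bool unregister_task(unsigned id) {
    TaskRegistry* reg = find_registry();
    if (!reg) return false;
    for (auto it = reg->tasks.begin(); it != reg->tasks.end(); ++it) {
        if (it->id == id) {
            reg->tasks.erase(it);
            return true;
        }
    }
    return false;
}
```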

Normal BOFs can read Task State through the Beacon Registry

For example, a KeyloggerBOF can inspect the registry, determine whether a keylogger task is already running, retrieve its output structure, and print collected results. This turns the registry into a shared coordination layer for asynchronous PICOs.

BeaconPrintf from an Asynchronous PICO

At this stage, we could run asynchronous PICOs and manage their state, but one major limitation remained: output. An Async PICO could write data into its output structure, but operators would only see it after running another normal BOF to retrieve it. In practice, this meant asynchronous code had no direct way to send output when it became available.

The obvious solution seemed to be BeaconPrintf, but this quickly exposed one of the biggest limitations of working with CS Beacon internals. We do not have access to the implementation of BeaconPrintf, which makes it difficult to understand how safely it can be called from background threads. Outflank avoids this problem because they know the internal implementation and can link and copy it into their asynchronous BOFs. Without access to the same internals, we needed another approach.

Our first idea was straightforward: find some way to wake the beacon up and let each asynchronous thread call BeaconPrintf independently while Beacon is awake. However, without source code or implementation details, there is no guarantee that BeaconPrintf is thread-safe. If multiple background threads invoked it at the same time, they could corrupt Beacon state or cause crashes.

What can happen if BeaconPrintf uses shared beacon internal state? A race condition.

This forced us to rethink the design. Instead of allowing asynchronous PICOs to call the CS Beacon’s BeaconPrintf function directly, we needed a way to coordinate access to it so that Beacon itself remained the only component interacting with its internal routines.

The sleepmask turned out to be the most natural place to handle this orchestration. We modified it so that whenever the beacon or a BOF attempts to sleep, execution waits on an event-driven mechanism instead. At the center of this system is a producer-consumer queue shared between the asynchronous BOFs and the beacon.

The asynchronous PICOs act as producers. Rather than calling BeaconPrintf, they write output events into a thread-safe queue that lives in memory outside the beacon’s encrypted region. The sleepmask acts as the consumer. Whenever the beacon wakes, whether on its normal interval or because an Async PICO triggered a wake event, the sleepmask locks the queue, drains pending messages, and sends them upstream.
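A minimal portable sketch of this producer-consumer queue follows. It assumes std::mutex and std::condition_variable as stand-ins for the critical section and the wake event. On Windows, one natural mapping would be a CRITICAL_SECTION plus an event waited on with WaitForSingleObject, using the sleep interval as the timeout. EventQueue and sleep_and_drain are hypothetical names, and this is not the sleepmask's code. Producers only lock, append, and signal. The consumer replaces a plain sleep with a timed wait that ends early when an event arrives, then drains everything under the lock and hands the batch back to be printed from Beacon's own thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Shared between producers (async tasks) and the consumer (sleepmask side).
// In the design described above, this lives outside Beacon's encrypted region.
class EventQueue {
public:
    // Producer side: safe to call from any task thread; never touches Beacon.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lk(m_);
            pending_.push_back(std::move(msg));
        }
        wake_.notify_one();  // ask the sleeping consumer to wake early
    }

    // Consumer side: replaces a plain sleep. Returns when the interval ends or
    // an event is pending, after draining everything queued so far.
    std::vector<std::string> sleep_and_drain(std::chrono::milliseconds interval) {
        std::unique_lock<std::mutex> lk(m_);
        wake_.wait_for(lk, interval, [this] { return !pending_.empty(); });
        std::vector<std::string> out(pending_.begin(), pending_.end());
        pending_.clear();
        return out;  // printed by the caller, outside the lock, on Beacon's thread
    }

private:
    std::mutex m_;                  // plays the role of the critical section
    std::condition_variable wake_;  // plays the role of the wake event
    std::deque<std::string> pending_;
};

// Usage: a producer fires after 20 ms while the consumer "sleeps" for 10 s.
// The consumer wakes early and receives the event.
std::vector<std::string> demo_early_wake() {
    EventQueue q;
    std::thread producer([&q] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        q.push("event from async task");
    });
    std::vector<std::string> batch = q.sleep_and_drain(std::chrono::seconds(10));
    producer.join();
    return batch;
}
```

Because the wait uses a predicate, a push that lands before the consumer starts waiting is not lost, and spurious wake-ups simply re-check the queue.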

Normal CS Beacon vs CS Beacon with Async PICOs

This design solves several problems at once. Async PICOs never touch beacon internals while the beacon is sleeping, the beacon remains encrypted, and operators receive output as soon as it becomes available. More importantly, it removes the thread-safety concern around BeaconPrintf entirely. The asynchronous PICOs never call it directly. They only write events to the queue, while Beacon, running on its own thread, remains the only component responsible for invoking BeaconPrintf.

As a result, the queue becomes the only shared surface between asynchronous execution and Beacon internals. Access to it is protected by a critical section stored in unencrypted memory, eliminating concurrent access to Beacon routines. When a new event arrives, the beacon wakes, prints the output to the operator console, and returns to sleep.

With these pieces in place, the design of asynchronous PICOs for Cobalt Strike was complete. We could now build long-running monitoring BOFs such as keyloggers, cliploggers, and a TGT monitor, which we will cover later.

Putting it together

At a high level, our implementation performs three tasks in sequence. First, it loads a position-independent PICO into a background thread so execution can continue independently from Beacon. Second, it maintains a registry of running tasks so operators can inspect, manage, or stop them. Finally, it provides a safe path for asynchronous output, allowing background threads to send data back to the operator without calling Beacon internals directly.

The diagram below shows how these pieces fit together. On the left is the beacon thread. When an operator starts a task, Beacon runs the AsyncPICOMgr BOF, which allocates memory, copies the PIC code, and spawns a new thread for execution. Once the task is launched, Beacon returns to sleep.

On the right, the asynchronous PICO runs independently. In this example, a Monitor TGT task waits for an admin logon, retrieves the ticket, and writes the result into a queue stored in unencrypted heap memory. Writing to the queue signals a wake event. When Beacon wakes, the sleepmask drains pending messages and Beacon relays the output to the operator through BeaconPrintf, all from Beacon’s own thread.

When the task is no longer needed, the asynchronous thread receives a stop signal, performs cleanup, and exits cleanly.

A few implementation details are worth calling out for context. The public implementation deliberately prioritizes simplicity and readability over stealth. For example, asynchronous PICOs are launched using CreateThread, which is straightforward to reason about but may be detectable because the resulting thread begins execution from unbacked memory. More advanced execution strategies and OPSEC considerations exist, though we chose not to include them in the public implementation.

The same tradeoff applies to global variable storage. As discussed earlier, Crystal Palace makes it possible for BOFs to use globals, but the mechanism for locating or allocating storage is configurable. The public implementation uses the default simple shared storage model that does not isolate global state per thread. In practice, this means only one asynchronous PICO can be run at a time. Internally, we use a more advanced approach that provides per-thread storage, allowing multiple asynchronous PICOs to execute simultaneously while maintaining independent state.
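The difference between the two storage models can be illustrated with an analogy. This is not how Crystal Palace implements global storage. The sketch models each policy as a resolver function that returns the block holding a PICO's "globals". The shared policy always returns the same block. The per-thread policy uses C++ thread_local. PicoGlobals, resolve_shared, resolve_per_thread, and write_then_read are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical block holding a PICO's "global variables".
struct PicoGlobals {
    int counter = 0;
};

// Policy 1: every task resolves to the same block (shared storage).
PicoGlobals* resolve_shared() {
    static PicoGlobals shared;
    return &shared;
}

// Policy 2: each thread resolves to its own block (per-thread storage).
PicoGlobals* resolve_per_thread() {
    thread_local PicoGlobals mine;
    return &mine;
}

// Simulated task body: write a global, let other tasks run, then read it back.
// Safe to run concurrently only with resolve_per_thread. With resolve_shared,
// two concurrent tasks would race on the same counter.
int write_then_read(PicoGlobals* (*resolve)(), int value) {
    resolve()->counter = value;
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    return resolve()->counter;
}
```

With the shared resolver, a second task's write silently overwrites the first task's state, which is why the simple model limits you to one Async PICO at a time.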

Using Async PICOs

The operator interface intentionally stays small. Managing an Async PICO requires only three commands:

picos start [path to pico] [arguments] — load and start a PICO
picos — list running PICOs
picos stop [id or name] — signal a clean shutdown

In practice, usage is straightforward. An operator starts a PICO, allows it to run in the background, and receives output whenever Beacon wakes in response to queued events. The asynchronous task continues running independently, even while the beacon sleeps.

Before running Async PICOs, two setup steps are required. First, load the picos.cna Aggressor script. Second, either use the provided example sleepmask or adapt an existing sleepmask to include the queue and wake-up orchestration described earlier. Full installation details, source code, and example configurations are available in the project repository (https://github.com/nccgroup/async-pico-hub).

The example Async PICO included with the project can be launched as follows:

picos start /home/user/picos/ExamplePICO.obj Title Body

This starts a simple background task that displays a message box, continuing to run even while the beacon is asleep.

This message window is opening from a new thread

At any point, the picos command can be used to list active tasks and verify that the example PICO is still running.

When the message box is eventually closed, the PICO writes an event into the queue, Beacon wakes immediately, and the result is relayed back to the operator.

Beacon wakes up from sleep immediately

Demo: Monitor TGT

To demonstrate a more realistic use case, we included a TGT Monitoring PICO in the public Async PICO hub. This implementation is based on Jakob Friedl’s Conquest version of the same asynchronous BOF and highlights the event-driven model described throughout this post.

The PICO runs quietly in the background and waits for a relevant event. Whenever a logon occurs, it retrieves the TGT, writes the result to the queue, and wakes Beacon so the operator immediately receives the output.

The video below shows the Monitor TGT PICO running in practice, including Beacon wake-up and operator notification.

Conclusion

Cobalt Strike’s native asynchronous BOFs solve a different problem than the one we set out to address. For our use case, we needed long-running, event-driven BOFs that could execute inside the Beacon process, wake Beacon when necessary, and safely send output back to the operator without relying on unsupported assumptions about Beacon internals.

Building that required solving three practical problems: starting code that can execute from arbitrary memory locations, tracking asynchronous tasks without losing shared state, and relaying output from background threads without risking concurrent access to Beacon routines. Solving the first problem required going from a BOF to position-independent code, which is where Crystal Palace and CPP-BOF-FIXER became essential. The final design emerged from these constraints: position-independent BOFs running on background threads, a shared registry stored in Beacon’s key-value store, and a producer-consumer queue that safely coordinates output.

The result is a foundation for event-driven monitoring inside Cobalt Strike. With these pieces in place, Cobalt Strike can now support monitoring workflows such as custom keyloggers, cliploggers, TGT monitoring, and other long-lived tasks, all driven by an event-based execution model.

Finally, this framework should not be viewed as something to run unchanged. While it works as presented, it is intended as a base to build your asynchronous tradecraft from. In particular, your CS sleepmask will need to be adjusted, and the OPSEC considerations discussed earlier should be taken into account before deploying it in production.