Show and tell: agent smith


So we are seeing more and more cases of malicious code injected into shared models, skills, tool, agents, and projects. I had a case where a shared model had a number of malicious code blocks that the users had not noticed due to the size of the codebase. At the time of this writing, we are seeing proliferation of malicious skills and prompts with a variety of payloads. When run in an IDE with AI tools, much of these produce execution and C2 activity that is detectable if something was watching closely enough. The challenge is that these events will be in a haystack of benign process and network events from tasks the AI tools were given permission for by the user. One of the purposes of the ODR (open DR) project is to hunt for this class of threat activity using anomaly detection techniques and a learning informed detection pipeline that is continuously updated.

One reason to run these hunts at the local level, in addition to conventional SOC and SIEM operations, is that the definition of what normal looks like, for a particular IDE, is largely in the head of the developer. Another reason is that much of the context is on the endpoint where the activity took place, and all of that state cannot generally be logged to a SIEM due to data volume and cost. At the same time, devs cannot monitor every action taken by their tools, and they are not trained in what to look for. This feels like a job for an autonomous agent pack.

Smith is an autonomous agent pack that will process most alerts and anomaly detections generated by ODR (open DR, also in this GitHub org.) ODR mainly looks for strange things happening in your AI dev tools at present but Smith will process most alerts generated by ODR. There is a show and tell video here: https://youtu.be/lsh3JRne9sg and the project lives in a repo here: https://github.com/opendr-io/agentic-park/tree/main/smith

Smith processes raw event and alert data from the ODR (open DR) project in order to investigate alerts and anomaly detections. It outputs an analysis of each alert. It lets you know when it thinks it has a high confidence detection and engages you in a collaborative analysis conversation in order to work out whether the detected activity is benign and expected or unexpected and potentially malicious.

Some sample alert data is provided so that it can do something out of the box. Sample event data is not provided due to its size, and the need to sanitize it, but can be generated by running openDR. It has a filter layer to try to stop prompt injections from reaching the agents and one filter test case alert is included in the sample data; you will see it get “intercepted” by the filter. That is an interesting area of research and we would like to hear from both offensive and defensive researchers as we add more filtering techniques and more detections.

The Causality project

Some time ago, one of my stakeholders said, “We may never get to zero cves. How can we identify the ones that matter the most?” Annual CVE volume has since quadrupled over the past decade. Recent research continues to explore the challenges associated with vulnerability management, and the limitations of existing prioritization methodologies, such as severity, which are not always good predictors of exploitation and risk. [1] [2] EPSS, while more sophisticated and predictive, is an ongoing topic of discussion as to whether it predicts exploitability or exploitation. [3] Of the roughly forty thousand CVEs issued last year, less than a percent were added to watchlists for observed exploitation activity and we lack a methodology for targeting this subset. Having spent a good deal of time with red teams, I believe exploit selection and usage resembles tool or equipment selection in other adversarial pursuits. I would liken it to athletes choosing equipment, lawyers choosing precedents and arguments, or warfighters choosing weapons and tactics. Factors such as theaters of operations, playing fields, opponents, past experience, and bias for successful tactics used in the past, are more influential to selection than mathematical scores and metrics used by existing prioritization methodologies. 

Last year, I experimented with applying a number of machine learning models to the problem of CVE prediction and arrived at one that yielded the best results which we named CAUSALITY. This model has, at the time of this writing, produced sixty provable correct predictions. A provable prediction means that a CVE was rated “hot” or “warm” – meaning it has potential to see heavy exploitation and be watchlisted – before it was added to a watch list. The prediction lead times range between days and months. The predictions are published in a Github repo (https://github.com/opendr-io/causality) where anyone can audit them to verify we are making predictions forward in time by comparing the time deltas. The correct predictions made to date are summarized in the readme for the repo where the raw data is published. I am not publishing output there constantly, only enough to prove prognostication, as extraordinary claims require extraordinary evidence.

On the questions of sensitivity, specificity, precision and recall; I am open to suggestion. Is a prediction a false positive if it does not come true in a month? In three months? a year? The interval for the published predictions ranges from a few days to as long as 137 days. Meanwhile, the watchlists continue to upgrade CVEs from prior years, even some from the prior decade, as they are selected for weaponization by threat actors. The way I think about this is more like having an advantage in an adversarial process. If this were hockey, instead of cybersecurity, and a model could predict that most successful shots on goal would come from a subset of 8-11% of the total shots, that would increase our odds of winning the game. Prioritizing a subset of CVEs according to their potential yields a larger risk reduction at a lower cost relative to existing processes.  When exploitation cycle avoidance can be realized, where the prediction lead time is sufficient, the ROI is much higher.

CVEs have interesting differences from other data domains. CVE classification differs from malware classification in that there are no benign CVEs apart perhaps from those that have been rejected or withdrawn. They are on a gradient of risk potential, and some never amount to much of anything, but their presence cannot be considered benign. Rather, the objective is to try to identify the smallest set that yields the greatest risk reduction, and to deal with those quickly enough to avoid exploitation.

[1] https://arxiv.org/abs/2302.14172: Enhancing Vulnerability Prioritization: Data-Driven Exploit Predictions with Community-Driven Insights

[2] https://arxiv.org/pdf/2508.13644v1: Conflicting Scores, Confusing Signals: An Empirical Study of Vulnerability Scoring Systems

[3] https://www.linkedin.com/posts/resilientcyber_vulnerability-scoring-frameworks-activity-7363978158439600128-oS3t?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAZIaEBGLaE7H8r2VCTwQayr6Vq_PFIqYY,