Posted in Blog, detection engineering, Resources

Simple SIEM Terraform

More SIEM vendors are offering Infrastructure as Code (IaC) deployment for ingesting logs, but the guides are often written in a way that makes them difficult to use within an existing IaC environment. This gets especially “fun” when you have multiple logging systems in place, which is not that uncommon. Following the stock suggestions can also lead to more resource sprawl than necessary because multiple roles may be created when a single role would suffice. Having done this a couple of times, I wanted to make a few notes on how to deploy AWS resources in a way that is flexible and solution agnostic – basically, a setup that grants access to the S3 bucket(s) of your choosing for multiple solutions. The goal was a setup that makes it easy to track which resources are associated with centralized logging and that works with one or more solutions. The added benefit is that it’s easier to avoid vendor lock-in. There’s nothing wrong with the vendor docs – they are just written to be deployed in isolation. While you could hand those docs off to DevOps for deployment, that can be a fairly big blocker and result in a fair amount of back and forth to get a clean understanding of the requirements because the pattern is likely a bit different from the typical pipeline.

I’ll be using Panther and Scanner as examples because those are two that I’m familiar with. The two have slightly different approaches to how they deal with the trust relationship. My hope is that showing the two will help detection engineers understand how logging architecture can be set up via code. Plus both have documentation that makes this easy to follow for someone less familiar with IaC. I would implement these as modules so you can deploy to multiple AWS accounts and keep things uniform.

These should serve as a starting point to help avoid sprawl and avoid having to manually deploy resources (which requires permissions in AWS accounts that really should be locked down). You could use the repo as a module, but that may not work well in your environment. Have a conversation with your DevOps team about how to best fit this in with your existing code.

The key components are:

  • SIEM general resources
    • SNS module
    • S3 module
  • Panther resources
    • IAM
    • Bucket access
    • SQS subscription
  • Scanner resources
    • IAM
    • Index bucket
    • Bucket access
    • SQS subscription

You will use the SIEM resources as building blocks to deploy the SNS topic and provide access to the S3 buckets to the IAM resources created from the Panther and Scanner sections.
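As an illustration of what those building blocks might look like – this is a sketch, not from any vendor doc, and every name and ARN is a placeholder to adjust for your environment:

```hcl
# Hypothetical SNS topic that new-log-object notifications flow through.
resource "aws_sns_topic" "siem_log_notifications" {
  name = "siem-log-notifications" # placeholder name
}

# Allow S3 to publish object-created events from the log bucket.
resource "aws_sns_topic_policy" "siem_log_notifications" {
  arn = aws_sns_topic.siem_log_notifications.arn
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "s3.amazonaws.com" }
      Action    = "SNS:Publish"
      Resource  = aws_sns_topic.siem_log_notifications.arn
      Condition = { ArnLike = { "aws:SourceArn" = "<log_bucket_arn>" } }
    }]
  })
}

# Notify the topic when new log objects land in the bucket.
resource "aws_s3_bucket_notification" "siem_logs" {
  bucket = "<log_bucket_name>" # placeholder
  topic {
    topic_arn = aws_sns_topic.siem_log_notifications.arn
    events    = ["s3:ObjectCreated:*"]
  }
}
```

Each solution then gets its own IAM role and subscription against these shared pieces rather than its own copy of everything.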

You could use Terragrunt stacks or data blocks to avoid having to hard code buckets, keys, roles, etc. That’s not necessary, but it can be really helpful if something changes or you end up deploying in multiple environments. Which option works will largely depend on your overall IaC setup. Data blocks may be easiest and avoid having to learn a new tool. Carefully consider the repo structure used for IaC to determine how you can reduce manual toil for resource deployment both now and in the future.
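For example, a data block can look up an existing bucket so its ARN never gets hard coded; the bucket name here is purely illustrative:

```hcl
# Look up an existing log bucket rather than hard coding its ARN.
data "aws_s3_bucket" "cloudtrail_logs" {
  bucket = "<existing-log-bucket-name>" # placeholder
}

# Elsewhere, reference data.aws_s3_bucket.cloudtrail_logs.arn in policies.
```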

For Panther, you will need to set up the SQS queue to allow your AWS account by adding an initial resource. You could do this through the UI and using one of the provided CloudFormation stacks, but that kind of defeats the whole purpose of this approach – you don’t want anything done manually. Plus there’s a decent chance that you won’t have the required permissions to deploy that way in all of your environments. There are two ways to work around this. If you are in a more strictly controlled environment and cannot run a manual apply (whether through CI/CD or CLI), you will need to split the code into multiple PRs. First create the IAM resources and bucket access and go through the CI/CD process for those, then create a log source in Panther using that role info to register the account. Once that log source has been successfully created, you can then add the code for the SQS subscription in a new PR.

If you are responsible for deploying the resources yourself and have the option to only go through the CI/CD process once, that can be a little “easier”. It just depends on where your organization is and what your role/responsibilities are. It can be the better choice to reduce the burden on your teammates by only sending through a single PR when you know there will be multiple steps. You would create all the code and deploy your infrastructure, expecting the error on your SQS resource. Then manually add a log source using the account in Panther as above, deploy again, and complete the CI/CD process.
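A rough sketch of the Panther pieces, assuming Panther assumes a cross-account role for bucket access and subscribes an SQS queue to your notification topic – the account ID, queue ARN, and names are placeholders you would pull from the Panther docs and console, and the subscription is the resource that errors until the log source exists:

```hcl
# Role Panther assumes to read the log bucket.
resource "aws_iam_role" "panther_log_reader" {
  name = "panther-log-reader" # placeholder
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::<panther_account_id>:root" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Read access to the shared log bucket(s).
resource "aws_iam_role_policy" "panther_bucket_access" {
  name = "panther-bucket-access"
  role = aws_iam_role.panther_log_reader.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"]
      Resource = ["<log_bucket_arn>", "<log_bucket_arn>/*"]
    }]
  })
}

# This is the piece that fails until the log source is registered in Panther.
resource "aws_sns_topic_subscription" "panther" {
  topic_arn = "<sns_topic_arn>"           # the SIEM notification topic
  protocol  = "sqs"
  endpoint  = "<panther_sqs_queue_arn>"   # provided during Panther log source setup
}
```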

For Scanner, you need to create the source to get the external ID and then deploy with that info. Depending on timing, you may have to go through adding the account again and update the external ID if setup times out – hopefully with this guide you won’t need to. If you are within 24 hours and keep the browser tab with the account setup open, it should be fine; beyond that, you may need to reset the information. Also consider how you want to protect the Scanner index bucket in your environment.
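A sketch of the Scanner role, assuming the trust relationship is gated on the external ID from the source setup screen – the account ID and names are placeholders:

```hcl
# External ID displayed when you create the source in Scanner.
variable "scanner_external_id" {
  description = "External ID from the Scanner source setup"
  type        = string
}

# Role Scanner assumes, with the external ID condition on the trust policy.
resource "aws_iam_role" "scanner_log_reader" {
  name = "scanner-log-reader" # placeholder
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::<scanner_account_id>:root" }
      Action    = "sts:AssumeRole"
      Condition = { StringEquals = { "sts:ExternalId" = var.scanner_external_id } }
    }]
  })
}
```

If the external ID has to be reset, only the variable value changes, which keeps the churn small.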

Please note, if you don’t have things set up to deploy to AWS via Terraform, whether via CLI or CI/CD, this code will not work. And if you do, there will likely be some tweaks needed to fit your environment and naming conventions. But it should get you most of the way there. Hashicorp has a walkthrough on automating Terraform with GitHub Actions that would help you get set up if you need to start from nothing.

This approach could easily be extended to grant access to other vendors needing S3 access. As more teams try to extract more value from logs, it’s not uncommon to need access from multiple tools. Applying IaC to manage this access makes it much easier to track what access a tool has and remove that access if the time comes. An added benefit is it’s equally easy to grant a new tool access to buckets.

Posted in Blog, Cons

HouSecCon 2025 – From Dumpster Fire To Detection

Another great year at HouSecCon (next year to be CYBR.SEC.CON) in the books. Over 3000 attendees this year, which is pretty amazing. I love speaking at this con, and it’s always fun to see what else is on the schedule. Caught a great talk on developing a detection framework for account takeover that was similar to what I’ve done in the past – really good to see how others have approached the problem and what pieces they have included, especially with different systems, because it gets me thinking about how I can tweak it to work with mine.

My topic this year was “From Dumpster Fire To Detection” – basically how to manage the hype and get from all the excitement to usable detections. I’ll add a link once the video is posted, but I also wanted to summarize my thoughts and give a bit of the hands-on walkthrough that I did live. Slides are below.

Optional: Lab setup

If you want to use this as a lab and don’t have a SIEM available as part of your day job, pick your favorite open source detection repo. I would use either panther-analysis or one of the scanner-inc detection repos because both allow you to do mock testing. Scanner does require API info, but I believe there’s still a playground/demo environment you can use for that. If not, you could use Sigma and one of the ways of testing those (sigma-cli, sigmac, etc.). I’m partial to panther-analysis because I like the customization available, like being able to dynamically set severity based on context, and it’s what I’m most familiar with. Whichever option you use, go through the repo setup and ensure you have everything functional before you get to the hands-on part.

The Dumpster Fire

It seems like every day something hits the news cycle claiming to be the next thing that will “pwn all the things”. This often comes in the form of teasers about upcoming disclosures, including overblown impact estimates. I understand this part to an extent. As an industry, we don’t exactly have the best track record of patching quickly. But the hype has gotten a bit out of control at times with wild speculation and people combing through git commits trying to guess what the CVE will look like based on commits since the last release. And sometimes these get enough traction that you will get impact inquiries before information is even released. On one hand, I understand wanting to know if your vendors have been impacted. On the other, there are contracts in place addressing disclosure. And until details are released, there’s not a lot that can even be said. Then once the drop happens, there’s a rush to identify IOCs, get blogs or videos published with analysis (sometimes with revisions coming quickly after), and more impact inquiries.

As a security engineer, there’s a lot you can do to manage this. A big one is managing expectations for security inquiries, which means having a good relationship with whoever is fielding them. Some things you can’t know before information is released. And once info is released, it takes time to investigate. If you might be impacted, take the time to check your environment – is it in use, how critical is it, how hard would it be to patch? I like to use these as an opportunity to tabletop things a bit and what-if my response. It’s also a great time to take hardening actions, check your logging and monitoring (and see if you have any detections you want to add), and think about some worst case impact scenarios (an opportunity to do a mini threat model). Incident response takes practice, and while you won’t want to document everything each hype cycle, some events are worth running through your documentation process. I like to have a shortened version of my IR template just for industry security events that I can use to look at my environment for similar issues and review the actions taken by the impacted organization. The compromise of Snowflake customers was a good one for this – it was spun like Snowflake was breached, but it was an issue with customer creds. Snowflake handled it well, and a big takeaway was the importance of emphasizing/pushing MFA on customers and regularly reinforcing baseline security settings. This is also a good time to make sure you know who your communication lines go through, including customer comms and legal.

The Aftermath

Once the news is official, we get to do what we do – investigate, remediate, protect, detect. And of course, don’t panic. When investigating, I like to go to the original research as much as possible, but it’s important to know yourself and know when you need to go to the summaries or breakdowns. Or when it’s time to phone a friend who’s an expert. Especially when there are multiple layers of configuration involved, multiple sets of eyes are important because you can’t be an expert on everything. Don’t be afraid to reach out to your developers or IT team to check on environment specifics. Identifying the real impact, both overall and in your environment, is critical. It’s better to err on the side of caution, but you don’t want to cause undue panic.

Once you’ve done the initial investigation and have a solid assessment, it’s time to remediate. Can you patch? Even better, can you remove the thing entirely? What additional remediation is necessary will depend on impact, but having your IR processes in line will help a lot. And this is part of why I like taking some time during the hype cycle to do preliminary work if there’s potential impact. We likely all have a backlog of runbooks that is longer than we have time to prioritize. These industry events can be opportunities to at least get a minimal runbook developed and prepped with queries, notes, etc.

Moving on to protect, is there hardening that needs to be done? Updates to configuration since initial install? Are there defense in depth steps that can be added? Sometimes the best/only time you can do this is when something has eyes on it because everyone thought it was going to be a dumpster fire. Might as well take advantage of it.

Now on to the detections. Keep in mind that we would prefer to use higher levels on the Pyramid of Pain (hash < IP < domain < network/host artifacts < tools < TTPs) to create more robust detections, but sometimes you need to make the basic IOC detections in events like this. Consider what makes the most sense for the circumstance. Find existing detections that can be modified if possible. A new detection may make the most sense, but you likely can find something to at least serve as the base. And determine if there are any exceptions needed. The answer to “are there any exceptions needed?” is pretty much always “yes.” Please, don’t put detections into prod with no exceptions unless you are really, really sure about that. Especially if you aren’t the one triaging them. Be kind to your analysts. And make the titles verbose, with dynamic details. Small things like this go a long way toward fighting burnout and alert fatigue.

The Walkthrough

The article I discussed live was Tales from the cloud trenches: The Attacker doth persist too much, methinks from Datadog. Great writeup on long-term AWS access key leakage and the specific IOCs involved. Plus specific detection ideas. The process I use is to dump the link into the Readwise Reader app and make highlights there – then I can export those in Markdown for easy sharing. I usually send them to myself to share with my team and develop detections from. Then I go to the repo for my SIEM if I don’t know off the top of my head what detection I want to modify and go from there. One of the things I like about detection as code is the familiarity you get with your detections. It’s a little different way of living with them, and at least for me, I have more of the detections internalized than I do from just working in a UI. Reviewing others’ detections before they can be merged also means I’m seeing the logic of those detections more than I would just triaging alerts.

With these, getting the key IOCs for an immediate environment check is the first step. You may need to pause to investigate depending on team size, or hand those off while you continue researching. Understand both your team size and expertise when making these decisions. With some application and code vulnerabilities, I prioritize information gathering to get to my subject matter experts as fast as possible so they can stay in the code. I get the info, pull IOCs, and summarize key points – including questions about parts I’m not clear on or things that I need them to specifically weigh in on. Other times, I’ve been the person having to let others do the initial research while I’m searching. Clear communication is vital, as is being able to set aside your ego. You may have to do everything. When that’s the case, work on developing your virtual network where you can ask non-specific questions. And consider how (work-approved) AI can be helpful – NotebookLM in particular can be great to review and summarize a lot of links. You’ll still have to check things you get from AI, but it’s a tool that you can’t (and shouldn’t) ignore.

Datadog does a great job listing IOCs, suggesting detection opportunities, and linking to their stock detections that relate to the post – that’s pretty much their standard. If you aren’t sure of the schema, the linked detections should give you enough to translate it into whatever your SIEM uses. And since naming conventions tend to be somewhat consistent across vendors, looking at how Datadog names things can give you a starting point for other vendors.

One of the things that came out in the discussion was how good a signal something like an impossible travel alert is. For some environments, really strong. For others, really not, because of cloud footprint, VPN usage, etc. IP geolocation data can be fickle, so having multiple sources to check is important. As is just knowing your environment. Scale does become a factor, but there are general things you can do regardless of size, like figuring out what normal looks like.

The Extras

I had a chance to join the CYBRSECMedia podcast to chat about detection engineering with Michael and Sam (HouSecCon founders). Lots of fun – even if I did keep disappearing.

And here are my slides. I’ll add the full talk video once it’s up. Once again I avoided walking off the stage, so that’s a win in my book!

Posted in Blog, detection engineering, Logging

Snowflake “Breach” and Logging

It’s been a bit, and the Snowflake “breach” has largely faded, but it’s a good one to review if you either use Snowflake or work at a SaaS company. While there was a lot of chatter, what eventually came out was that Snowflake was not breached, but multiple customers were. This was likely related to poor security practices in some fashion (lack of MFA, password reuse, etc.) rather than something Snowflake did. Snowflake responded by pushing customers to use MFA, though the option offered is Duo. Nothing wrong with Duo, but it’s pretty surprising that other options aren’t available. Single sign-on is also available and is probably a preferable option. If you haven’t, review their best practices and make sure everything is set up correctly.

There are a couple important takeaways from this series of breaches. For any SaaS company, the big one is that you need to handle breaches being attributed to your security when in reality it’s on the customer. There are security features that might be a good thing to make mandatory like MFA, password length, etc. if what you provide is likely to handle sensitive data. The second is to have a plan for dealing with the publicity fallout and misattribution. I think Snowflake did a good job managing the public fallout. I didn’t see any blaming or bad-mouthing of customers, and it looks like steps were taken to more strongly encourage customers to enable MFA.

For customers, it’s a reminder to adhere to good security practices and implement the recommended best practices. Beyond that, it’s a reminder to think about the monitoring and detection strategy being applied. This is where things get interesting. Snowflake tends to get a ton of interesting data since that’s kind of its entire purpose. But what I’ve found odd is there’s not a lot of information about ingesting Snowflake logs into the SIEM of your choosing. Nor is there a lot of info on effective alerting. There were some really good resources about the IOCs related to the string of breaches, but there hasn’t been much on the day-to-day stuff. So I went digging, and what I found for logging was interesting.

More SIEMs are providing ways to ingest Snowflake logs, but there’s not a lot of information about just getting the logs. Given the volume of Snowflake logs, I’m hesitant to pull the data into a SIEM. Getting the data into S3 or blob storage lets you analyze it as needed or use other observability options. For some additional ingest ideas, Panther has a beta ingest option set up a bit differently. Datadog and Splunk also have options. Scanner has a set of detections that would work for this implementation. It has been good to see more information coming out about Snowflake logging since this hit the news.

When I started looking at how to effectively log Snowflake years ago, I was surprised how little was there about getting Snowflake logs to various monitoring options. There’s definitely more info out now than then, but I thought I’d share the notes I have. This may not be the best solution for your situation, but it worked well with minimal overhead. With less time crunch, I might have done this differently, but it works for a basic solution that will get the logs to long-term storage.

Logging Overview

The good thing is that Snowflake does have pretty solid logging and a year of retention (Snowflake docs).

You could go with tasks or alerts if you are using something without a stock integration; I opted for tasks for a more complete picture. The Account Usage database has a ton of information, so you’ll want to be selective about what gets pulled in. There is also a cost to all of this, but it may be worth it to have the visibility. Not all tables are available at all levels, which is not unexpected, and you need the Enterprise tier to have ACCESS_HISTORY available. The surprising part is that the latency on the logs varies from minutes up to 12 hours. Most are in the 2-3 hour range. That’s a really long time to be in the dark about activities in your tenant. Login history and changes to roles/users all have a latency of up to 2 hours. I’m still trying to wrap my head around that latency. I suspect most customers end up just accepting that the logs and alerts will have to live in Snowflake if they aren’t using a solution with an easy setup option.

The approach I settled on was using copy into to dump the logs using an S3 integration. This had minimal overhead and meant I did not need to create a custom user. Plus having the logs in S3 gives you a lot of options. It’s important to note that you cannot create tasks or alerts in the SNOWFLAKE database, so these will need to be created in a writeable database.
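The storage integration behind that copy into might be created like this – a minimal sketch where the integration name, role ARN, and bucket path are all placeholders for your environment:

```sql
-- Placeholder names and ARNs throughout; adjust to your environment.
create storage integration siem_export_s3
  type = external_stage
  storage_provider = 'S3'
  enabled = true
  storage_aws_role_arn = 'arn:aws:iam::<account_id>:role/<snowflake_export_role>'
  storage_allowed_locations = ('s3://<log_bucket>/snowflake/');
```

The IAM role it references is another good candidate for the Terraform modules discussed earlier.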

A basic copy into query would look like:

copy into '<s3_location>'
from (select * from snowflake.account_usage.<table> order by created_on desc limit 10)
STORAGE_INTEGRATION = <name>
FILE_FORMAT = (TYPE = CSV, FIELD_OPTIONALLY_ENCLOSED_BY = '"')
HEADER = TRUE

The timestamps were not in a great format, and some tables don’t even have an event timestamp. I found using concat worked well to get the format in a way that makes S3 partitioning nice. You may want to convert all timestamp fields.

partition by concat(current_date, 'T', split_part(convert_timezone('America/Los_Angeles', 'UTC', current_timestamp), ' ', 2), 'Z')
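The same convert_timezone/split_part pattern can be applied per column; the field name here is illustrative:

```sql
-- Rebuild a timestamp column as an ISO 8601 style UTC string.
-- created_on is illustrative; repeat for whichever timestamp fields you export.
concat(
  split_part(convert_timezone('America/Los_Angeles', 'UTC', created_on), ' ', 1),
  'T',
  split_part(convert_timezone('America/Los_Angeles', 'UTC', created_on), ' ', 2),
  'Z'
) as created_on_utc
```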

I settled on recurring tasks for ease of export. There are other ways that will work, but for quick implementation with minimal overhead this worked well. Starting with LOGIN_HISTORY is a good idea to get things worked out and test formatting. I found that you can also pull the LOGIN_HISTORY logs fairly often, but others (GRANTS_TO_USERS and GRANTS_TO_ROLES) showed more latency – probably give them the full 2h Snowflake talks about. You should also consider adding an INGEST_TIME field to capture when the log was actually pulled in.

convert_timezone('America/Los_Angeles', 'UTC', CURRENT_TIMESTAMP) as INGEST_TIME

I also used a time_delta field to see how long it was between ingest and creation on tables with multiple timestamps.

array_min([timestampdiff(minute, created_on, ingest_time), timestampdiff(minute, deleted_on, ingest_time)]) as time_delta

I’m a fan of pretty much everything as code, so using the Snowflake Terraform provider works well to implement this approach. Some (very basic) starter code where you’ll need to fill out the parts you need:

resource "snowflake_warehouse" "read_write_warehouse" {
  name                = "<name>"
  comment             = "<comment>"
  warehouse_size      = "<size>" # e.g. XSMALL
  auto_resume         = true
  auto_suspend        = 60 # seconds; example value
  initially_suspended = false
}

resource "snowflake_task" "<name>" {
  count         = 1 # Use 0/1 if you only want to deploy in some environments
  comment       = "<comment>"
  database      = "<database>"
  schema        = "<schema>"
  name          = "<name>"
  schedule      = "USING CRON 0/10 * * * * UTC"
  enabled       = true
  sql_statement = <<EOF
copy into '<s3_location>'
from (
  select
    '<table>' as snowflake_event,
    convert_timezone('America/Los_Angeles', 'UTC', current_timestamp) as ingest_time,
    -- concat expressions to convert other timestamps as <new_fields>
    array_min([timestampdiff(minute, created_on, ingest_time), timestampdiff(minute, deleted_on, ingest_time)]) as time_delta, -- Compare all relevant times to get most recent
    <any_other_fields>
  from snowflake.account_usage.<table>
  where (120 <= time_delta and time_delta <= 130)
  order by time_delta asc
)
partition by concat(current_date, 'T', split_part(convert_timezone('America/Los_Angeles', 'UTC', current_timestamp), ' ', 2), 'Z') -- Adjust as needed for destination
STORAGE_INTEGRATION = "<name>"
FILE_FORMAT = (TYPE = CSV, FIELD_OPTIONALLY_ENCLOSED_BY = '"')
HEADER = TRUE
EOF
}

Wrap Up

Hopefully this gives you an idea of how you can easily set up Snowflake ingest without having a system with a ready to go ingest. Digging through the linked documentation should give you an idea of what information is most important to you and give you ideas for detections. Carefully consider the cost implications of both the compute and ingest when deciding what and how to log. You may not need to export the logs to meet your needs, and you may be able to use your SSO provider to monitor access (initial and on-going). Think creatively because if you are tight on ingest space, the logging and alerting available within Snowflake may make export unnecessary.

Posted in Blog, detection engineering, Resources, Review

Book Review: Practical Threat Detection Engineering

TL;DR – If you are in detection engineering or want to be, buy the book. I could immediately put the content into action – totally worth the investment. Packt link

Overview

I’m pretty sure that I bought this about the day it came out or on pre-order. Partly because there isn’t a ton in the space. Partly because I’m familiar with Megan Roddie’s work with SANS and through BrakeSec and she does good work. Partly because I have impulse buying issues and NEW BOOK! There’s a lot covered, and I think the book is appropriate across the experience spectrum. Fundamentals are covered plus detailed information about developing a program and writing detections. There is also a full lab build out – I haven’t done that yet because I got the book more for the program development stuff and the labs are generally things I’ve done. If you’ve never built a detection lab before, there’s enough detail that it’s easy to follow. I appreciate the care given to the building of the lab environment and going back to lab exercises throughout the book. I may end up going back to build the labs to support some other stuff I’m working on.

Strengths

I think the biggest strength of the book is that it’s the most comprehensive book I’ve seen on detection engineering. I was really happy with the content. As a former academic, I always want to have sources to back up my approaches. The authors did a great job covering everything from the importance of having a program to measuring effectiveness to writing better detections. There’s beneficial information regardless of how mature your detection engineering program is. The coverage of data sources and figuring out what data you need is really helpful for prioritizing ingestion and making sure you have the info you need. I really liked the parts about validating detections and combining detections (like flagging when two detections that should fire together don’t). I think reading through the detection development chapters is a great way to help you increase the effectiveness and durability of detections. And like most (all?) Packt books, code is available on GitHub.

Weaknesses

I would like to see the section on careers expanded, but to be fair, what’s there is good and it’s a new enough field that I don’t know how much you can add. There are a couple typos – fewer than a lot of infosec books I’ve read. That’s pretty much it.

Conclusion

Buy the book. If you want to get into detection engineering, this will get you on the right path. If you are in detection engineering, this can be great backup for the things you want to implement and/or help you figure out the next steps. Probably the best thing is that this has so much info together in a logical, easy to follow format. Can you probably find most of this info online for free? Yes. Is it worth the time? IMO, no. There’s more info available than you can ever consume. Sometimes the best thing you can do is put out some cash to have things organized for you. I know that’s not always an option. One of the authors, Gary Katz, blogs on Medium and has several posts that are based on the book content. There are also great resources like Detection Engineering Weekly if you really can’t afford the book right now.

Posted in Blog, detection engineering

Thoughts on Practical Threat Modeling

Ask 5 different people what threat modeling is, and you’ll likely get 5 similar, but different, answers. GRC, dev, SOC analyst, IR, red team – there’s overlap but it’s difficult to get to a common model. As someone with more of a detection engineering focus, I have goals for threat modeling that may not line up with any of these. I wouldn’t necessarily say a common model is required, but the biggest issue I find with the different approaches is that you are left with vagueness and a sense of dread. Sure, you feel better because “threat modeling”, but there isn’t a clear path forward.

So, how do you make threat modeling more than an exercise in futility?


Where do we begin with threat modeling? The simplest method may just be asking “what if”? I love the what if game. I trend pessimistic, so I can come up with a rather terrifying set of what ifs. It’s a starting point, and I suppose better than nothing. But it’s unlikely to lead to much action beyond maybe we should do more with this. But if FUD gets the ball rolling, it can be useful.

So you play “what if” and convince the powers that be that threat modeling would be helpful to give your FUD a little more structure. Now what? I’ve used most of the typical threat modeling approaches – STRIDE, PASTA, OCTAVE, etc. This blog covers a lot of options in a short space to give you an idea of what’s out there. All of these have their strengths, but I’ve found them kind of lacking for various reasons. Digging around, I think Katie Nickels nailed the frustrations with a lot of these in her blog for Red Canary. I wanted something practical, efficient, actionable, and directly linkable to threats. I also wanted something you could zoom in and out with as needed so you aren’t switching methods all the time.


First up, what are you threat modeling? This will drive later questions like who needs to be there and what deep dive approach should be taken. An important question to ask is whether you are approaching the thing at the right level. Do you want to look at an entire system (say a SaaS product you use), a specific type of thing (like an AWS Lambda), or maybe even the org? Each will need a slightly different approach, but there are commonalities you can use to keep the threat modeling program consistent and focused(ish).

Along with the what, you need to determine the business objectives and integrations in place. I’m big on asking “what is the point of this?” If something is worth threat modeling, you should be able to articulate specific needs associated with whatever it is. If you can’t, maybe this should be a discussion about deprecating rather than protecting. The integrations should inform this. Can the thing meet the business need with the existing integrations? How risky are those integrations? Does this innocuous-seeming thing provide an easy path to the crown jewels? I like to include a system/data/integration model of some sort too. How detailed you get will be driven by what you are doing. If your org uses data classification, adding those to the model would be helpful. This helps you find integrations you might have missed and can help identify movement paths you might not think of otherwise. The threat modeling lead may be able to complete part of this, but your SME should be contributing. Use existing documentation (and link to it) wherever you can to avoid redundant work.

With the lead and SME introduced, the next question is who needs to be in the (virtual) room? A minimum of 3 is good to ensure coverage – lead, SME, attacker. I think someone in the detection and response space is a good person to lead the meeting. This person preps the threat model and guides the meeting. If your org has a threat modeling team, they would lead, but I find few orgs have a dedicated team. You need at least one person (the SME) who is very familiar with what you are addressing. If possible, include an attack focused person, especially if your lead isn’t strong on the offensive side. I want to keep the group to 5 or fewer; more than that and you run into issues with scheduling and chasing rabbits. The documentation produced can (and should) be reviewed by others to spot potential gaps.

When you get into the threat model itself, start with threat actors. Most of the time these will be general and fairly consistent. The general ones you should be able to identify from an org-level threat model (or from looking at info for your sector). Consider both internal and external actors. I don’t spend much time here unless there’s a reason the threat actor might be unique or different.

Since I’m generally acting as the lead, from threat actors I go into the biggest concerns (AKA what keeps me up at night about this thing) and then a deep dive using the appropriate threat modeling approach. I tend to use MITRE ATT&CK to guide the deep dive because of its expansive coverage. It forces you to think about more things – many may be N/A or of low concern, but at least you’ll give them a passing thought. I might go STRIDE or PASTA for a development threat model or OCTAVE at the org level, but most of the time ATT&CK helps me derive focused action items. Once the deep dive is done, I use that information to expand on the main concerns and identify the key areas to address. Discussion of these will make up the bulk of the threat modeling meeting. As you work through the concerns, you should be collecting action items related to hardening and detections. There may be things that fall into other categories as well. Link the detections to whatever framework you use (like ATT&CK) and put them into your detection pipeline. The hardening items will end up with the system (or whatever) owner.
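The deep dive can be as simple as a table keyed by ATT&CK tactic, with action items collected per row. Here is a minimal sketch in Python – the tactic names are real ATT&CK tactics, but the sample entries, prefixes, and field names are hypothetical, not part of any standard:

```python
# Hypothetical sketch: a deep-dive "matrix" keyed by ATT&CK tactic.
# Each row records how relevant the tactic is to the thing being modeled,
# notes from the discussion, and any action items it generated.
from dataclasses import dataclass, field

@dataclass
class TacticReview:
    tactic: str                  # ATT&CK tactic name
    relevance: str               # "high", "low", or "n/a"
    notes: str = ""
    action_items: list = field(default_factory=list)

deep_dive = [
    TacticReview("Initial Access", "high",
                 "Admin console exposed to the internet",
                 ["Harden: enforce SSO + MFA",
                  "Detect: logins from new ASNs"]),
    TacticReview("Credential Access", "high",
                 "API keys stored in CI variables",
                 ["Harden: rotate keys, move to a secrets manager"]),
    TacticReview("Lateral Movement", "n/a",
                 "No inbound integrations from this system"),
]

# Pull out the detection work so it can enter the detection pipeline;
# hardening items would be routed to the system owner the same way.
detections = [item for row in deep_dive for item in row.action_items
              if item.startswith("Detect:")]
print(detections)  # ['Detect: logins from new ASNs']
```

Even N/A tactics get a row, which is the point: you gave them a passing thought and recorded why they didn’t apply.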

Initial work done, time for the meeting. Generally 50 minutes is sufficient, but you may find a follow-up meeting is necessary. Ideally, people review the prepped threat model prior to the meeting (possibly with their teams) and bring feedback. I like setting aside 10 minutes at the beginning for review because even when people have done the pre-work, the review time helps everyone shift gears and remember what you’re doing. During the discussion, the lead should be jotting down action items for hardening and detections. I’ll either use a desktop-size whiteboard or have the threat model document open to the action item section in a second window (others in the meeting may not love the switching during screen share – a separate screen is better). After the meeting, the lead will need to enter the action items into whatever tracking system you use and help drive them to completion.

Now comes the hard part of following through, but you should have a tangible to do list rather than just doom.

News anchors saying tonight at 11, doom - from Futurama

There are some disadvantages – it’s a heavier workload for the person prepping the documentation, the deep dive portion can vary, and things can be missed (a risk with any approach). But the advantages of asynchronous development and review paired with a shorter, more focused meeting make it easier to actually get threat modeling done, and done in a way that produces something beyond a pretty report. I’m finding total prep time is generally an hour-ish for things I’m familiar with and a couple hours for things I’m not. That’s acceptable, since I’m probably going to need to write detections for whatever it is anyway. Total time investment for prep, the meeting, and action item creation can be as little as 4 hours (1 hour prep, plus 1 hour × 3 attendees for the meeting). That’s really not a lot of time to get valuable information. And it keeps the commitment down enough that people don’t groan when you recommend (or insist on) doing a threat model – well, at least not to your face.

I’ve also found the prep work makes for a more fruitful meeting. That synchronous time is high cost, so I want it to also be high value. This is an evolving approach, but I’m liking it. You can argue it goes beyond threat modeling. Maybe that’s a good thing? I consider threat modeling without actionable outcomes to be…unsatisfactory.

Documentation outline:

  • People
    • Lead
    • SME
    • Attacker
  • Thing being modeled
  • Business objectives
  • Diagram
  • Threat actors – general internal/external, anything specific?
  • Top concerns/threats
  • Deep dive – matrix of whatever specific approach works for you
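The outline above can be captured as a reusable template so every threat model document is structured the same way regardless of who preps it. A minimal sketch – the field names are my own shorthand for the outline sections, not from any standard:

```python
# Hypothetical threat model document template mirroring the outline above.
import copy

THREAT_MODEL_TEMPLATE = {
    "people": {"lead": "", "sme": "", "attacker": ""},
    "thing_being_modeled": "",
    "business_objectives": [],
    "diagram": "",          # link to existing docs where possible
    "threat_actors": {"internal": [], "external": [], "specific": []},
    "top_concerns": [],
    "deep_dive": [],        # one row per tactic/category in your chosen framework
}

def new_threat_model(name: str, lead: str, sme: str, attacker: str) -> dict:
    """Start a fresh threat model document from the template."""
    doc = copy.deepcopy(THREAT_MODEL_TEMPLATE)
    doc["thing_being_modeled"] = name
    doc["people"] = {"lead": lead, "sme": sme, "attacker": attacker}
    return doc

tm = new_threat_model("payments-lambda", "alice", "bob", "carol")
print(tm["people"]["attacker"])  # carol
```

Whether this lives as code, a wiki template, or a ticket form doesn’t matter much; the consistency is what makes models comparable and reviewable over time.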

Pre-meeting:

  • Lead preps threat model
  • SME and team review
  • Attacker and team review

Meeting agenda: 50 min

  • Threat model review 10 min
  • Threat model discussion 30 min
  • Wrap up and action items 10 min

Post meeting:

  • Hardening efforts by SME and team
  • Detections enter detection lifecycle
  • If available, engage in purple teaming to validate detections
  • Revisit threat model as needed – when an impacting change occurs and/or on a scheduled basis