Pen Testing: Past, Present, and Future

An Inside Look Through Decades Of Security

29 August 2022

By Joel Scambray

A little over twenty years ago, I co-authored a book called “Hacking Exposed: Network Security Secrets and Solutions.” It gained some popularity at the time for its hands-on description of the dark art of “pen testing” at the practitioner level, even though many had written about similar topics for many years before (Dan Farmer and Chris Klaus being two prominent examples).


The Past

Back then, pen testing typically referred to a goal-oriented technical security assessment: something like a capture-the-flag exercise, but with the flag being a file on a protected server somewhere on a computer network.

A team of experienced security hackers spent time trying to break into systems around the flag, using whatever tools and techniques they could cobble together from information found on the young internet and within the minds of a select few uber-hackers who specialized in breaking particular technologies like Microsoft Windows, Novell NetWare, and Cisco network devices. They would then pivot from one compromised system to the next until they had penetrated the defenses of the system housing the flag, achieving the objective and ending the exercise.

Over time, much less attention was paid to the “flag,” given that most computer networks were replete with valuable assets. So “pen testing” practically came to mean “experienced security hackers spending time trying to break into as many systems as they can over a specified period of time, and reporting the holes they exploited and the valuable things they exposed.”

It was more art than science, and many so-called gray-hat consulting teams have come and gone from the late nineties to this very day, charging attractive fees to perform pen tests on targets ranging from corporate networks to public web applications.

What is penetration testing?

The old-school definition was something like “experienced security hackers spending time trying to break into as many systems as they can over a specified period of time, and reporting the holes they exploited and the valuable things they exposed.”

The industry was content to define very little structure around this activity. The main distinctions between types of pen testing arose organically, for practical reasons:

- The target of the assessment needed to be defined at a high level because it was rare to find testers with both broad and deep technical skillsets. So, pen tests came to be categorized as focused on networks, applications, physical facilities, humans (aka. social engineering), devices, and other types of targets. 

- Manual versus automated approaches were often separated, mainly due to the (often unfounded) notion that tools simply could not provide the same level of intuition that human testers used to circumvent technical obstacles and find novel vulnerabilities. 

- Black box versus white box (aka. uninformed versus informed pen testing) defined whether the attack team would be provided access to non-public information about the target, typically application source code, configuration information, and/or various levels of user or administrative account credentials. While everyone asserted that, in theory, one should assume a knowledgeable adversary, most pen tests were conducted black box, probably because it seemed to provide a more “real-world” approximation of the probability of compromise.

- Duration – how long would the pen test team get to focus on a set of targets? An assumed spectrum of value was associated with longer-duration tests and was meant to approximate the dedication of adversaries, from unfunded “script kiddies” to well-funded nation-state-backed attackers. 

Looking back, even though there wasn’t much definition around pen testing, I fondly recall those days as the “roaring” era of cybersecurity assessment. This more hands-on approach shook up the staid world of system configuration reviews that usually got tucked into financial audits conducted by big accounting firms. And it found real vulnerabilities, probably overlooked until then, in applications and infrastructure used by millions of people.

The Present

Fast forward to 2022, and it’s with mixed emotions that I report: things have not changed much, at least with regard to terminology and fundamental concepts. That’s good insofar as pen testing keeps driving improvements in vulnerability identification; it’s less good in that the practice remains more art than science, so significant confusion and misunderstanding persist around how to maximize the value of such an exercise.

Today it’s fair to say that “pen testing” remains a poorly defined term thrown around to mean various things, but still mainly “experienced security hackers spending time trying to break into a bunch of stuff.”

Types of penetration testing: 

  • Network penetration testing 

  • Web application penetration testing

  • Mobile application penetration testing

  • Native/compiled application penetration testing

  • Physical facilities penetration testing

  • Social engineering assessment 

While this still serves a valuable purpose, there continue to be misconceptions about: 

  • What constitutes “experienced” pen testers – while certifications such as CREST and OSCP have started to put some criteria around this, the quality one receives when purchasing a “pen test” still varies greatly. (NCC Group is a CREST Member Company and has several OSCP-certified consultants.)
  • Breadth vs. depth – the original association with goal-oriented capture-the-flag testing may have dissipated for some organizations long ago, but no common understanding replaced it. While the use of more precise terms like “vulnerability assessment” (meaning a more breadth-oriented survey of an attack surface) has increased, even professionals still often disagree over what constitutes a “pen test.” Interestingly, goal-oriented pen testing has experienced something of a resurgence recently under new terminology like “red teaming.” NCC Group offers a similar service called “Full Spectrum Attack Simulation,” or FSAS.
  • Automation benefits and drawbacks – the use of tools or automation in pen testing is still disdained by the “1337” (“leet,” or elite, in hackerspeak), but there is growing appreciation that automation has its place in security testing. Automated tools can be employed to enhance and amplify manual testing and, in some cases, may provide precision equivalent to human-driven analysis. Certainly, no serious professional would recommend all-manual testing for large-scale assessment, and very few organizations have a real need or budget for it.
  • The value of white box testing – probably the biggest misconception we see is the continued assumption that black box / uninformed attacker testing is superior to white box / informed attacker testing. The flip side of pen testing’s popularity is that it has unfortunately displaced important supporting approaches like threat modeling and source code review. These approaches help channel security testing effort toward a better “definition of done,” as opposed to repeating the same exercise and expecting a different result (aka. the definition of insanity). This situation has led to some derogatory terms for pen testing, including “whack-a-mole,” “badness-ometer,” and “hamster wheel of pain.”
  • How to measure “quality” outcomes – so you got zero findings on your last two-week pen test. Were you happy or sad? Especially for targets that have been tested several times over the years, the disappearance of high-risk security vulnerabilities is to be expected, maybe even cheered. But that leaves only duration of testing as the core measure of quality. And indeed, pen test duration remains the “coin of the realm,” with many prominent organizations like Microsoft continuing to define durations like “12 days of manual [pen] testing” as a benchmark.
  • Compliance versus exploitation – many equate pen testing with a fixed outcome or “definition of done,” while at the same time expecting extensive, customized research into verifying “0-day” vulnerabilities in their technologies. These are incongruent expectations. Many call the former type of assessment a “compliance-oriented” engagement, but I like the term “deterministic,” because it takes expected inputs and provides fixed outputs. This is opposed to “non-deterministic” testing, which is closer to the classic definition of pen testing: “weapons free” use of techniques, tools, and time to try to find previously unknown vulnerabilities. If instead you want some kind of certification that you’re “compliant” with the OWASP Top 10 or the like, take the time to define your “deterministic” inputs and outcomes more clearly so that you can more consistently agree on “done.” (A minimal sketch of one such deterministic check follows this list.)
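
To make “deterministic” concrete, here is a minimal sketch of a single well-defined, pass/fail test case: checking a web application for a fixed list of standard HTTP security headers. The target URL and the exact header checklist are hypothetical placeholders that would be agreed upon during scoping; this illustrates the engagement style, not any particular firm’s methodology.

```python
# Minimal sketch of a deterministic test case: fixed input (a target
# URL) in, fixed pass/fail output out. Standard library only.
from urllib.request import urlopen

# Hypothetical checklist; the real list would be agreed during scoping.
REQUIRED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
]

def check_security_headers(url: str) -> dict:
    """Return a pass/fail result for each required response header."""
    with urlopen(url) as response:
        present = {name.lower() for name in response.headers.keys()}
    return {h: h.lower() in present for h in REQUIRED_HEADERS}

if __name__ == "__main__":
    # Hypothetical in-scope target; only test systems you are authorized to test.
    for header, ok in check_security_headers("https://example.com").items():
        print(f"{header}: {'PASS' if ok else 'FAIL'}")
```

The same check run tomorrow against the same target yields the same verdict, which is exactly what makes it a workable basis for a compliance-style “definition of done.”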

These drawbacks are even more relevant to organizations that have conducted numerous pen tests. They’ve passed beyond the initial value proposition of “hackers spending time trying to break into a bunch of stuff” and really need a better set of criteria to gauge success and build on it over time.

The Future

Looking forward, how are experienced firms rethinking pen testing? Our more advanced customers are trying to put more structure around security testing. As you might have guessed, they are confronting, and trying to solve, some of the drawbacks discussed under “The Present.” Here are some of the things we are seeing:

  • Clearer definitions of activities and time – as the industry continues to struggle with defining what experienced pen testers do, security assessors are setting more explicit, granular expectations around specific activities, time commitments, and deliverables. In parallel, customers are investing more effort in quantifying target composition and specifying which granular test cases are of interest within a given engagement. Evolving frameworks like IT Asset Management (ITAM) and Software Bill of Materials (SBOM) may help organizations break out the components of a given target/attack surface, enabling higher-quality pen test engagements. (A minimal sketch of such a scope definition appears after this list.)
  • Realizing the value of different approaches, especially white box – as noted under “The Present,” many organizations that have done pen testing over the last several years are reaching diminishing returns. This is where we see great value in adding different assessment methodologies like white box pen testing, threat modeling, code review, fuzz testing, and red teaming. While these typically require more effort and engagement from internal teams, the payoff is significant: contextual understanding, novel attack vector brainstorming, and avoidance of well-trod ground.
  • Using automation to scale coverage – anyone hiring in the cybersecurity market over the last several years knows that there are not enough skilled pen testers to cover all the technology attack surface we keep adding to every day. Without an automation strategy, you are by definition leaving attack surface untested. Let’s set the historical record straight: automation does not inherently provide a lesser-quality assessment. It’s an immensely powerful tool when deployed fit-for-purpose. Especially for large-scale network and application attack surfaces, regular scanning is effectively table stakes for diligence nowadays.
  • Clarifying deterministic versus non-deterministic testing – non-deterministic “classic” pen testing is scenario-specific, requires effort to understand features and risks unique to the target, and involves additional time for proof-of-concept exploit development. Compliance-oriented, or deterministic, engagements involve well-defined test cases with “pass/fail” outcomes that may provide the basis for some sort of certification. These engagement models have different prerequisites, assessment methodologies, and reporting requirements. If you don’t understand the differences and are not clear on what outcome you want, you will likely pay for something that is either under- or over-scoped for your purposes.
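
Here is that minimal sketch of a granular scope definition, expressed as a simple data structure. Every field name and example value below is a hypothetical illustration; a real engagement would negotiate these during planning and contracting.

```python
# Sketch of an explicit engagement scope: targets, activities, time
# commitment, and deliverables captured in one structured record.
# All names and values below are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class EngagementScope:
    targets: list            # in-scope systems/applications
    activities: list         # agreed test activities/test cases
    test_days: int           # explicit time commitment
    deliverables: list = field(
        default_factory=lambda: ["findings report", "debrief meeting"]
    )

scope = EngagementScope(
    targets=["customer-portal", "payments-api"],
    activities=["authenticated web app testing", "API abuse cases"],
    test_days=12,
)
print(scope)
```

Even this much structure forces the explicit conversation about breadth, depth, and duration that the points above call for.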


Penetration testing vs. vulnerability scanning is not an “either/or” choice – you should be doing both! There are many excellent free network penetration testing tools, like Nmap and Metasploit, that can help you get started.
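
For example, here is a minimal sketch of driving Nmap from Python to run a basic service and version scan. It assumes the nmap binary is installed and on your PATH; scanme.nmap.org is a host the Nmap project explicitly permits test scans against. Only ever scan systems you are authorized to test.

```python
# Sketch: run a basic Nmap service/version scan and capture its output.
# Assumes the nmap binary is installed and the target is authorized.
import subprocess

def nmap_service_scan(target: str) -> str:
    """Run 'nmap -sV' (service/version detection) against one host."""
    result = subprocess.run(
        ["nmap", "-sV", target],
        capture_output=True,
        text=True,
        check=True,   # raise if nmap exits non-zero
    )
    return result.stdout

if __name__ == "__main__":
    # The Nmap project allows test scans against scanme.nmap.org.
    print(nmap_service_scan("scanme.nmap.org"))
```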

We hope you’ve enjoyed this tour of pen testing past, present, and future. From humble beginnings, pen testing has become one of the most recognized practices in the cybersecurity universe, based on the fundamental risk-identification value it provides. Extracting the maximum value from pen testing requires effort on both the assessor and customer side, mainly around more carefully defining the target attack surface, desired activities, time commitment, and expected outcomes. Reach out to NCC Group if you’d like to learn more about applying these concepts to get maximum pen testing value.

Penetration testing steps

  1. Planning – defining scope (both in terms of targets and activities), timeframes, budget, etc. 

  2. Contracting with an independent firm (if applicable) – writing up specific activities, time, and deliverable expectations; additional terms like scheduling; legal/procurement review; signatures, etc.

  3. Engagement – kick-off meetings, review rules of engagement, share contact info, escalation procedures, etc.

  4. Testing

    1. Reconnaissance

    2. Exploitation (with approval as necessary)

    3. Pivot/escalate (if scoped)

    4. Clean-up (tools and other materials left on systems)

    5. Write-up of individual findings, activities, outcomes, and recommendations

  5. Reporting – collecting individual finding write-ups into a single report, adding summaries and other supporting sections, QA review of the report, etc. (a minimal sketch of structuring findings for reporting appears after this list)

  6. Presentation of results – aka. a read-out or debrief meeting to review and discuss the final report

  7. Remediation of findings by the responsible organization (aka. the target of the pen test); this can take significant time if there are many identified vulnerabilities that require fixing, and/or some of them require significant effort to fix (e.g., design-level changes)

  8. Re-testing (optional) to validate that remediation was effective

  9. Post-engagement planning – next assessment, areas that were missed/under-analyzed, scope alterations, etc.
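
To illustrate steps 4.5 and 5, here is a minimal sketch of capturing individual findings in a uniform structure and rolling them up into a single report summary. The field names, severity scale, and example findings are all hypothetical conventions, not a prescribed reporting format.

```python
# Sketch: uniform finding write-ups (step 4.5) rolled up into a
# single report summary (step 5). All conventions are hypothetical.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    severity: str          # e.g. "critical", "high", "medium", "low"
    affected: str          # affected system or component
    recommendation: str

def summarize(findings: list) -> str:
    """Build a simple plain-text report summary from finding records."""
    counts = Counter(f.severity for f in findings)
    lines = [", ".join(f"{n} {sev}" for sev, n in counts.most_common())]
    for f in findings:
        lines.append(f"- [{f.severity.upper()}] {f.title} ({f.affected}): "
                     f"{f.recommendation}")
    return "\n".join(lines)

report = summarize([
    Finding("SQL injection in login form", "high", "customer-portal",
            "use parameterized queries"),
    Finding("Missing HSTS header", "low", "payments-api",
            "enable Strict-Transport-Security"),
])
print(report)
```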

About the Author

Joel Scambray


Senior Vice President, Data & Application Security Services

Joel Scambray has helped Fortune 500-class organizations address information security challenges for over twenty years as a consultant, author, speaker, executive, and entrepreneur. He is widely recognized as co-author of the Hacking Exposed book series, and has worked/consulted for companies including Microsoft, Google, Amazon, and Ernst & Young. He has helped start & build security companies valued collectively in the hundreds of millions of dollars. Joel is currently Senior Vice President of Data & Application Security services at NCC Group.