

HUMAN STUDY ON AI SPEAR PHISHING CAMPAIGNS

by Simon Lermen, Fred Heiding
3rd Jan 2025

TL;DR: We ran a human subject study on whether language models can successfully
spear-phish people. We used AI agents built on GPT-4o and Claude 3.5 Sonnet to
search the web for publicly available information on each target and to write
highly personalized phishing messages based on it. Our AI-generated phishing
emails achieved a click-through rate above 50%.


FULL PAPER: https://arxiv.org/abs/2412.00586

This post is intended as a brief summary of the main findings. These are some
key insights we gained:

 1. AI spear phishing is highly effective, achieving a click-through rate of
    more than 50% and significantly outperforming our control group.
 2. AI spear phishing is also highly cost-efficient, reducing costs by up to 50
    times compared to manual attacks.
 3. AI models are highly capable of gathering open-source intelligence. They
    produced accurate and useful profiles for 88% of targets. Only 4% of the
    generated profiles contained inaccurate information.
 4. Safety guardrails were not a noteworthy barrier to creating phishing emails
    with any of the tested models, including Claude 3.5 Sonnet, GPT-4o, and
    o1-preview.
 5. Claude 3.5 Sonnet is surprisingly good at detecting AI-generated phishing
    emails, though it struggles with some phishing emails that are clearly
    suspicious to most humans.


ABSTRACT

In this paper, we evaluate the capability of large language models to conduct
personalized phishing attacks and compare their performance with human experts
and with AI models from last year. We include four email groups with a combined
total of 101 participants: a control group of arbitrary phishing emails, which
received a click-through rate (the recipient pressed a link in the email) of
12%, emails generated by human experts (54% click-through), fully AI-automated
emails (54% click-through), and AI emails utilizing a human-in-the-loop (56%
click-through). Thus, the AI-automated attacks performed on par with human
experts and 350% better than the control group. The results are a significant
improvement over similar studies conducted last year, highlighting the increased
deceptive capabilities of AI models. Our AI-automated emails were sent using a
custom-built tool that automates the entire spear phishing process, including
information gathering and creating personalized vulnerability profiles for each
target. The AI-gathered information was accurate and useful in 88% of cases and
produced inaccurate profiles for only 4% of the participants. We also use
language models to detect the intention of emails. Claude 3.5 Sonnet scored well
above 90% with low false-positive rates and detected several seemingly benign
emails that passed human detection. Lastly, we analyze the economics of
phishing, highlighting how AI enables attackers to target more individuals at
lower cost and increase profitability by up to 50 times for larger audiences.


METHOD

In brief, the method consists of five steps:

 1. Collecting targets.
 2. Collecting information about the targets using an AI web-browsing agent that
    crawls publicly available information.
 3. Writing personalized phishing emails based on the collected information and
    carefully crafted language model prompts.
 4. Sending emails using custom-built and automated software.
 5. Analyzing outcomes by tracking clicks on embedded URLs in emails.

Figure: Example from the paper of an AI-compiled profile (step 2) and an
AI-written email (step 3).


RESULTS

For full results, see the paper.

The results of the phishing emails are presented in the figure below (figure 5
in the paper). The control group emails received a click-through rate of 12%,
the emails generated by human experts 54%, the fully AI-automated emails 54%,
and the AI emails utilizing a human-in-the-loop 56%. Both AI-generated email
types (fully automated and human-in-the-loop) performed on par with the emails
created by human experts.

Figure: Click-through rates for the four groups in our human study.

The figure below shows that about 40% of participants in both AI groups
specifically mentioned in their free-text answers that personalization increased
their trust in the email message, compared to 0% in the control group and about
20% in the human expert group. The presentation of the AI- and
human-expert-generated emails was rated as equally trustworthy.

Figure: Common reasons given in free-text survey responses for why the email
was trustworthy, as a percentage of survey responses per group.

Furthermore, we analyzed the quality of the AI-compiled profiles. In a manual
analysis of 51 profiles, we found that 88% were accurate and useful, 8% were
accurate but not useful, and 4% contained inaccurate information.

To measure the time saved, we recorded how much time we spent on the
interventions for the human-in-the-loop group. The information-gathering (OSINT)
updates for this group took an average of 1:05 minutes, and the email message
updates averaged 1:36 minutes, for a combined total of 2:41 minutes per message.
To better measure the time difference between AI-automated and manual attacks,
we manually replicated the OSINT information gathering and email crafting
performed by our AI tool on four targets. The manual process required an average
of 23:27 minutes to gather OSINT data and an additional 10:10 minutes to write
each personalized email, totaling about 34 minutes per target. Thus, the
human-in-the-loop AI automation was about 92% faster than the fully manual
process.
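
As a quick sanity check on that 92% figure, the arithmetic can be reproduced
directly (a minimal Python sketch using only the times reported above):

    # Human-in-the-loop AI workflow, per target (times reported above)
    ai_assisted_s = (1 * 60 + 5) + (1 * 60 + 36)   # 1:05 OSINT + 1:36 email = 161 s

    # Fully manual replication, per target (times reported above)
    manual_s = (23 * 60 + 27) + (10 * 60 + 10)     # 23:27 OSINT + 10:10 email = 2017 s

    time_saved = 1 - ai_assisted_s / manual_s
    print(f"Time saved: {time_saved:.0%}")          # -> Time saved: 92%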


AUTOMATED INTENT DETECTION

On average, Claude achieved a true positive detection rate of 97.25% with no
false positives. If we weight the detection rates by category, i.e., give each
category the same weight regardless of the number of messages it contains, the
detection rate remains almost identical (97.64%). Claude 3.5 Sonnet far
outperformed GPT-4o. The paper contains additional results using other models
for suspicious intent detection.

However, the possibility of jailbreaks and prompt injections poses a significant
challenge to using language models to prevent phishing.
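
To make the setup concrete, here is a minimal sketch of how such LLM-based
suspicion scoring can be wired up with the Anthropic Python SDK. The prompt
wording, the model string, and the 50% threshold are illustrative assumptions,
not the exact configuration used in the paper:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def suspicion_score(email_text: str) -> float:
        """Ask Claude to rate how likely an email is a phishing attempt (0-100)."""
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # assumed model string
            max_tokens=10,
            messages=[{
                "role": "user",
                "content": "Rate from 0 to 100 how likely the following email is "
                           "to be a phishing attempt. Reply with only the number.\n\n"
                           + email_text,
            }],
        )
        return float(response.content[0].text.strip())

    # Hypothetical usage with the 50% detection threshold shown in the figure below:
    if suspicion_score("Dear user, your account will be suspended unless ...") >= 50:
        print("Flag as suspicious")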

 

Figure: Overview of suspicion scores evaluated by Claude 3.5 Sonnet and GPT-4o.
The first row is evaluated for suspicion by GPT-4o, and the second by Claude 3.5
Sonnet. The plots compare different types of mail: legitimate mail, mail
generated for our two AI groups (orange), mail generated by three different AI
models (red), and other types of phishing mail (blue). For more information on
the data used, see section 4.2 of the paper. For a theoretical detection
threshold of 50%, we show a cutoff line with corresponding false positive (FP)
and true positive (TP) percentages.


THE ECONOMICS OF AI-ENHANCED PHISHING

Table 4 from the paper shows part of our economic analysis. We estimate the
conversion rate q for three different scenarios: low, medium, and high. The
conversion rate refers to the ratio of opened URLs that result in a successful
fraud. Using fully automated AI with no human intervention always leads to the
highest returns.
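
As a rough illustration of this kind of break-even reasoning (a sketch, not the
paper's exact model; every parameter value below is a made-up assumption), the
expected return of a campaign can be written as revenue from successful frauds
minus the cost of producing and sending the emails:

    def expected_profit(n_targets, cost_per_email, click_rate,
                        conversion_rate, revenue_per_victim):
        """Expected campaign profit: successful frauds times payoff, minus costs."""
        revenue = n_targets * click_rate * conversion_rate * revenue_per_victim
        cost = n_targets * cost_per_email
        return revenue - cost

    # All numbers below are hypothetical, for illustration only (not from the paper).
    # Both variants use the study's 54% click-through rate; the difference is the
    # per-target cost of reconnaissance and email writing.
    manual = expected_profit(1000, cost_per_email=20.00, click_rate=0.54,
                             conversion_rate=0.05, revenue_per_victim=500.0)
    automated = expected_profit(1000, cost_per_email=0.40, click_rate=0.54,
                                conversion_rate=0.05, revenue_per_victim=500.0)
    print(f"manual: {manual:,.0f}  AI-automated: {automated:,.0f}")
    # -> manual: -6,500  AI-automated: 13,100

Under assumptions like these, the per-target cost is what flips a campaign from
unprofitable to profitable, which is the mechanism behind the profitability gap
the paper describes.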

 


FUTURE WORK

For future work, we hope to scale up studies on human participants by multiple
orders of magnitude and measure granular differences in various persuasion
techniques. Detailed persuasion results for different models would help us
understand how AI-based deception is evolving and how to ensure our protection
schemes stay up-to-date. Additionally, we will explore fine-tuning models for
creating and detecting phishing. We are also interested in evaluating AI's
capabilities to exploit other communication channels, such as social media or
modalities like voice. Lastly, we want to measure what happens after users press
a link in an email. For example, how likely is it that a pressed email link
results in successful exploitation, what different attack trees exist (such as
downloading files or entering account details in phishing sites), and how well
can AI exploit and defend against these different paths? We also encourage other
researchers to explore these avenues. 

We propose personalized mitigation strategies to counter AI-enhanced phishing.
The cost-effective nature of AI makes it highly plausible that we are moving
towards an agent-vs-agent future. AI could assist users by creating personalized
vulnerability profiles that combine their digital footprint with known
behavioral patterns.


CONCLUSION

Our results reveal the significant challenges that personalized, AI-generated
phishing emails present to current cybersecurity systems. Many existing spam
filters use signature detection (detecting known malicious content and
behaviors). By using language models, attackers can effortlessly create phishing
emails that are uniquely adapted to every target, rendering signature detection
schemes obsolete. As models advance, their persuasive capabilities will likely
also increase. We find that LLM-driven spear phishing is highly effective
and economically viable, with automated reconnaissance that provides accurate
and useful information in almost all cases. Current safety guardrails fail to
reliably prevent models from conducting reconnaissance or generating phishing
emails. However, AI could mitigate these threats through advanced detection and
tailored countermeasures.


 



COMMENTS
cousin_it:
Are AI companies legally liable for enabling such misuse? Do they take the
obvious steps to prevent it, e.g. by having another AI scan all chat logs and
flag suspicious ones?

Dagon:
No, they're not.  I know of no case where a general-purpose toolmaker is
responsible for misuse of its products. This is even less likely for software,
where it's clear that the criminals are violating their contract and using it
without permission.

None of them, as far as I know, publish specifically what they're doing.  Which
is probably wise - in adversarial situations, telling the opponents exactly what
they're facing is a bad idea.  They're easy and cheap enough that "flag
suspicious uses" doesn't do much - it's too late by the time the flags add up to
any action.

This is going to get painful - these things have always been possible, but have
been expensive and hard to scale.  As it becomes truly ubiquitous, there will be
no trustworthy communication channels.

cousin_it:
This one isn't quite a product though, it's a service. The company receives a
request from a criminal: "gather information about such-and-such person and
write a personalized phishing email that would work on them". And the company
goes ahead and does it. It seems very fishy. The fact that the company fulfilled
the request using AI doesn't even seem very relevant, imagine if the company had
a staff of secretaries instead, and these secretaries were willing to make
personalized phishing emails for clients. Does that seem like something that
should be legal? No? Then it shouldn't be legal with AI either.

Though probably no action will be taken until some important people fall victim
to such scams. After that, action will be taken in a hurry.

Dagon:
"seem like something that should be legal" is not the standard in any
jurisdiction I know.  The distinctions between individual service-for-hire and
software-as-a-service are pretty big, legally, and make the analogy not very
predictive.

I'll take the other side of any medium-term bet about "action will be taken in a
hurry" if that action is a lawsuit under current laws.  Action being new laws
could happen, but I can't guess well enough to have any clue how or when it'd
be.

cousin_it:
Fair enough. And it does seem to me like the action will be new laws, though
you're right it's hard to predict.

Fred Heiding:
Great discussion. I’d add that it’s context-dependent and somewhat ambiguous.
It’s noteworthy that our work shows that all tested AI models conflict with at
least three of the eight prohibited AI practices outlined in the EU’s AI Act.

It’s also worth noting that the only real difference between sophisticated
phishing and marketing can be the intention, making mitigation difficult.
Actions from AI companies to prevent phishing might restrict legitimate use
cases too much to be interesting.

Haiku:
I'm glad we now have a study to point to! "Automated Spear Phishing at scale"
has been a common talking point regarding current risks from AI, and it always
seemed strange to me that I hadn't heard about this strategy being validated.
This paper shows that the commonly-shared intuition about this risk was
correct... and I'm still confused about why I haven't yet heard of this strategy
being maximally exploited by scammers.

Fred Heiding:
Thanks for your feedback! It’s just a matter of time before scammers maximize
their use of AI. Hopefully, the defense community can use this time to optimize
our head start. Stay tuned for our coming work!
