If you're planning to use an AI chatbot for nefarious purposes, Microsoft is waiting around the corner, or so it wants to think.
In a post in her blog published today, the company announced a new feature coming to Azure AI Studio and the Azure OpenAI Service, which developers use to build artificial intelligence applications and custom Copilots. The new feature is called Prompt Shields and is designed to protect AI chatbots from two different types of attacks.
The first type of attack is known as direct attack ή jailbreak. In this scenario, the person using the chatbot writes a prompt designed to manipulate the AI into doing something against its rules and constraints. For example, someone can write a prompt with keywords or phrases like “ignore previous instructions” or “system bypass” to intentionally bypass security measures.
The second type of attack is called indirect attack (indirect attack) or direct cross-domain communication injection attack (cross-domain prompt injection attack). Here, a malicious user sends information to the chatbot with the intention of carrying out some kind of cyber attack. It typically uses external data, such as an email or document, with instructions designed to exploit the chatbot.
Like other forms of malware, indirect attacks may seem simple or innocent instructions to the user, but they carry specific risks. A compromised Copilot built through Azure AI could be vulnerable to fraud, malware distribution or content manipulation if it is able to process data, either on its own or with the help of extensions, according to Microsoft.
To try to prevent both direct and indirect attacks against AI chatbots, the new Prompt Shields feature will be integrated with the content filters in the Azure OpenAI Service. Using machine learning, the feature will try to find and eliminate potential threats in user prompts and third-party data.
Prompt Shields is currently available in preview mode for Azure AI Content Safety, and will soon be available in Azure AI Studio starting April 1st.
Microsoft today deployed another weapon in the war against AI manipulation: spotlighting, techniques designed to help AI models better distinguish valid AI prompts from those that are potentially dangerous or unreliable.