{"id":89368,"date":"2024-08-10T08:06:46","date_gmt":"2024-08-10T12:06:46","guid":{"rendered":"https:\/\/coinscreed.com\/staging\/?p=89368"},"modified":"2024-08-10T08:06:53","modified_gmt":"2024-08-10T12:06:53","slug":"anthropic-offers-15k-for-next-gen-ai-jailbreaks","status":"publish","type":"post","link":"https:\/\/coinscreed.com\/staging\/anthropic-offers-15k-for-next-gen-ai-jailbreaks\/","title":{"rendered":"Anthropic Offers $15K Bounty for Next-Gen AI Jailbreaks"},"content":{"rendered":"\n<p>This program is part of Anthropic's &#8220;red teaming&#8221; efforts, where engineers intentionally try to manipulate or disrupt their AI models.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"764\" height=\"401\" src=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2024\/08\/images-1.png\" alt=\"Anthropic Offers $15K Bounty for Next-Gen AI Jailbreaks\" class=\"wp-image-89369\" srcset=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2024\/08\/images-1.png 764w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2024\/08\/images-1-300x157.png 300w\" sizes=\"(max-width: 764px) 100vw, 764px\" \/><figcaption class=\"wp-element-caption\">Anthropic Offers $15K Bounty for Next-Gen AI Jailbreaks<\/figcaption><\/figure>\n\n\n\n<p>On August 8, Anthropic, a company that specializes in artificial intelligence, announced the introduction of an expanded bug bounty program. The program will provide rewards of up to $15,000 to participants who are able to &#8220;jailbreak&#8221; the company's unannounced &#8220;next generation&#8221; intelligent system. <\/p>\n\n\n\n<p>The generative artificial intelligence system known as Claude-3, which is the main AI model of Anthropic, is comparable to <a href=\"https:\/\/coinscreed.com\/staging\/jpmorgan-openai-chatgpt-in-ai-llm-suite.html\" data-type=\"post\" data-id=\"89293\">OpenAI's ChatGPT <\/a>and Google's Gemini. 
The company engages in a practice known as &#8220;red teaming&#8221; as part of its efforts to ensure that Claude and its other models can operate safely.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Anthropic's Red Teaming<\/h2>\n\n\n\n<p>At its most basic, red teaming is the practice of deliberately trying to break a system in order to expose its weaknesses. In Claude's case, the goal is to identify every avenue of prompting, manipulation, or disruption that could produce undesirable outputs.<\/p>\n\n\n\n<p>During red-teaming attempts, engineers may reword or reframe a query to trick the AI into producing information it has been designed to withhold.<\/p>\n\n\n\n<p>For instance, an AI model trained on data scraped from the internet is likely to contain personally identifiable information about a large number of individuals. <\/p>\n\n\n\n<p>As part of its safety strategy, Anthropic has implemented guardrails to prevent Claude and its other models from releasing such information. 
<\/p>\n\n\n\n<p>Anticipating every possible unwanted output becomes exponentially more difficult as AI models grow more powerful and more capable of replicating human speech.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Anthropic's Bug Bounty<\/h2>\n\n\n\n<p><a href=\"https:\/\/coinscreed.com\/staging\/anthropic-launches-ai-chatbot-claude-3.html\" data-type=\"post\" data-id=\"72859\">Anthropic<\/a> has built a number of innovative safety interventions into its models, one of which is the &#8220;Constitutional AI&#8221; paradigm, but fresh perspectives on this long-standing problem are always welcome.<\/p>\n\n\n\n<p>According to a post on the company's blog, the latest project builds on its existing bug bounty programs and will focus on universal jailbreak attacks. <\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cThese are exploits that could allow consistent bypassing of AI safety guardrails across a wide range of areas. By targeting universal jailbreaks, we aim to address some of the most significant vulnerabilities in critical, high-risk domains such as CBRN (chemical, biological, radiological, and nuclear) and cybersecurity.\u201d<\/p>\n<\/blockquote>\n\n\n\n<p>The company encourages experienced AI researchers and those who &#8220;have demonstrated expertise in identifying jailbreaks in language models&#8221; to submit their applications by August 16th. 
<\/p>\n\n\n\n<p>The company is admitting only a limited number of participants; although it plans to &#8220;expand this initiative more broadly in the future,&#8221; it won't select every applicant.<\/p>\n\n\n\n<p>Those selected for red-teaming will have early access to a &#8220;next generation&#8221; artificial intelligence model.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This program is part of Anthropic&#8217;s &#8220;red teaming&#8221; efforts, where engineers intentionally try to manipulate or disrupt their AI models. On August 8, Anthropic, a company that specializes in artificial intelligence, announced the introduction of an expanded bug bounty program. The program will provide rewards of up to $15,000 to participants who are able to [&hellip;]<\/p>\n","protected":false},"author":62,"featured_media":89369,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[21],"tags":[3996,16537,21235,21234,870],"class_list":["post-89368","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","tag-ai","tag-anthropic","tag-bug-bounty-program-2","tag-red-teaming","tag-technology"],"jetpack_featured_media_url":"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2024\/08\/images-1.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts\/89368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/users\/62"}],"replies":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/
wp-json\/wp\/v2\/comments?post=89368"}],"version-history":[{"count":0,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts\/89368\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/media\/89369"}],"wp:attachment":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/media?parent=89368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/categories?post=89368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/tags?post=89368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}