{"id":56148,"date":"2023-08-08T05:46:47","date_gmt":"2023-08-08T09:46:47","guid":{"rendered":"https:\/\/coinscreed.com\/staging\/?p=56148"},"modified":"2023-08-08T05:46:49","modified_gmt":"2023-08-08T09:46:49","slug":"openai-debuts-web-crawler-gptbot-alongside-gpt-5-plans","status":"publish","type":"post","link":"https:\/\/coinscreed.com\/staging\/openai-debuts-web-crawler-gptbot-alongside-gpt-5-plans\/","title":{"rendered":"OpenAI Debuts Web Crawler &#8216;GPTBot&#8217; Alongside GPT-5 Plans"},"content":{"rendered":"\n<p>Website owners can block the new crawler from <a href=\"https:\/\/coinscreed.com\/staging\/chatgpt-founder-sam-altman-to-launch-worldcoin-crypto.html\" target=\"_blank\" rel=\"noreferrer noopener\">ChatGPT<\/a> maker OpenAI by adding a &#8220;disallow&#8221; rule to their site's robots.txt file.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-1024x576.webp\" alt=\"\" class=\"wp-image-56155\" srcset=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-1024x576.webp 1024w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-300x169.webp 300w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-768x432.webp 768w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-750x422.webp 750w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2-1140x641.webp 1140w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2.webp 1248w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">OpenAI Debuts Web Crawler &#8216;GPTBot&#8217; Alongside GPT-5 Plans<\/figcaption><\/figure>\n\n\n\n<p>According to OpenAI, its new web crawling tool, &#8220;GPTBot,&#8221; could help improve future ChatGPT models. 
\u201cWeb pages crawled with the GPTBot user agent may potentially be used to improve future models,&#8221; OpenAI wrote in a new blog post, adding that it could increase accuracy and expand the capabilities of future iterations.<\/p>\n\n\n\n<p>A <a href=\"https:\/\/www.cloudflare.com\/learning\/bots\/what-is-a-web-crawler\/#:~:text=A%20web%20crawler%2C%20or%20spider,appear%20in%20search%20engine%20results.\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">web crawler,<span class=\"wpil-link-icon\" title=\"Link goes to external site.\" style=\"margin: 0 0 0 5px;\"><svg width=\"24\" height=\"24\" style=\"height:16px; width:16px; fill:#000000; stroke:#000000; display:inline-block;\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:svg=\"http:\/\/www.w3.org\/2000\/svg\"><g id=\"wpil-svg-outbound-7-icon-path\" fill=\"none\" clip-path=\"url(#clip0_31_188)\">\r\n                            <path d=\"M9.16724 14.8891L20.1672 3.88908\" stroke-linecap=\"round\"\/>\r\n                            <path d=\"M13.4497 3.53554L20.5208 3.53554L20.5208 10.6066\" stroke-linecap=\"round\" stroke-linejoin=\"round\"\/>\r\n                            <path d=\"M17.5 13.5L17.5 16.26C17.5 17.4179 17.5 17.9968 17.2675 18.4359C17.0799 18.7902 16.7902 19.0799 16.4359 19.2675C15.9968 19.5 15.4179 19.5 14.26 19.5L7.74 19.5C6.58213 19.5 6.0032 19.5 5.56414 19.2675C5.20983 19.0799 4.92007 18.7902 4.73247 18.4359C4.5 17.9968 4.5 17.4179 4.5 16.26L4.5 9.74C4.5 8.58213 4.5 8.0032 4.73247 7.56414C4.92007 7.20983 5.20982 6.92007 5.56414 6.73247C6.0032 6.5 6.58213 6.5 7.74 6.5L11 6.5\" stroke-linecap=\"round\"\/>\r\n                        <\/g>\r\n                        <defs>\r\n                            <clipPath id=\"clip0_31_188\">\r\n                                <rect fill=\"white\" height=\"24\" width=\"24\"\/>\r\n                            <\/clipPath>\r\n                        <\/defs><\/svg><\/span><\/a> also known as a web spider, is a form 
of bot that indexes the content of websites. Search engines such as Google and Bing use crawlers to discover pages so that they can appear in search results.<\/p>\n\n\n\n<p>OpenAI stated that the web crawler will collect publicly accessible data from the internet but will exclude sources that sit behind paywalls, are known to collect personally identifiable information, or contain text that violates its policies.<\/p>\n\n\n\n<blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">Breaking \ud83d\udea8<br><br>OpenAI just launched GPTBot, a web crawler designed to automatically scrape data from the entire internet.<br><br>This data will be used to train future AI models like GPT-4 and GPT-5!<br><br>GPTBot ensures that sources violating privacy and those behind paywalls are excluded. <a href=\"https:\/\/t.co\/oR3kY4buaU\" target=\"_blank\">pic.twitter.com\/oR3kY4buaU<span class=\"wpil-link-icon\" title=\"Link goes to external site.\" style=\"margin: 0 0 0 5px;\"><svg width=\"24\" height=\"24\" style=\"height:16px; width:16px; fill:#000000; stroke:#000000; display:inline-block;\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:svg=\"http:\/\/www.w3.org\/2000\/svg\"><use href=\"#wpil-svg-outbound-7-icon-path\"><\/use><\/svg><\/span><\/a><\/p>&mdash; Shubham Saboo (@Saboo_Shubham_) <a href=\"https:\/\/twitter.com\/Saboo_Shubham_\/status\/1688678363060121600?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">August 7, 2023<span class=\"wpil-link-icon\" title=\"Link goes to external site.\" style=\"margin: 0 0 0 5px;\"><svg width=\"24\" height=\"24\" style=\"height:16px; width:16px; fill:#000000; stroke:#000000; display:inline-block;\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:svg=\"http:\/\/www.w3.org\/2000\/svg\"><use href=\"#wpil-svg-outbound-7-icon-path\"><\/use><\/svg><\/span><\/a><\/blockquote> \n\n\n\n<p>Website owners can block the crawler by adding a &#8220;disallow&#8221; 
rule to the site's robots.txt file. Three weeks before the release of the new crawler, the company filed a trademark application for &#8220;GPT-5,&#8221; the successor to the current GPT-4 model.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"1017\" height=\"307\" src=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-1.webp\" alt=\"\" class=\"wp-image-56154\" srcset=\"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-1.webp 1017w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-1-300x91.webp 300w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-1-768x232.webp 768w, https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-1-750x226.webp 750w\" sizes=\"(max-width: 1017px) 100vw, 1017px\" \/><figcaption class=\"wp-element-caption\">Instructions for website owners to \u201cdisallow\u201d GPTBot. Source:\u00a0OpenAI<\/figcaption><\/figure>\n\n\n\n<p>The application, submitted to the United States Patent and Trademark Office on July 18, covers the term &#8220;GPT-5&#8221; for software involving AI-based human speech and text, audio-to-text conversion, and voice and speech recognition.<\/p>\n\n\n\n<p>However, observers should not hold their breath for the next version of ChatGPT. Sam Altman, the founder and CEO of OpenAI, stated in June that the company is &#8220;nowhere close&#8221; to beginning GPT-5 training, citing the need for multiple safety audits before starting.<\/p>\n\n\n\n<p>Concerns about OpenAI's data-gathering methods have recently been raised, particularly regarding copyright and consent. 
Japan's privacy commission issued a warning to OpenAI in June against collecting sensitive data without permission, while Italy temporarily banned ChatGPT in April over alleged violations of European Union privacy laws.<\/p>\n\n\n\n<p>In June, 16 plaintiffs filed a class action lawsuit against OpenAI, alleging that the AI company accessed private information from ChatGPT user interactions. If the allegations prove accurate, OpenAI and Microsoft, which is also named as a defendant, would be in violation of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Computer_Fraud_and_Abuse_Act\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">the Computer Fraud and Abuse Act<span class=\"wpil-link-icon\" title=\"Link goes to external site.\" style=\"margin: 0 0 0 5px;\"><svg width=\"24\" height=\"24\" style=\"height:16px; width:16px; fill:#000000; stroke:#000000; display:inline-block;\" viewBox=\"0 0 24 24\" version=\"1.1\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" xmlns:svg=\"http:\/\/www.w3.org\/2000\/svg\"><use href=\"#wpil-svg-outbound-7-icon-path\"><\/use><\/svg><\/span><\/a>, a statute with precedent in web-scraping cases.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Website owners can block the new crawler from ChatGPT maker OpenAI by adding a &#8220;disallow&#8221; rule to their site&#8217;s robots.txt file. According to OpenAI, its new web crawling tool, &#8220;GPTBot,&#8221; could help improve future ChatGPT models. 
\u201cWeb pages crawled with the GPTBot user agent may potentially be used to improve future models,&#8221; [&hellip;]<\/p>\n","protected":false},"author":53,"featured_media":56155,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[21],"tags":[6803,15616,14990,15617,15425],"class_list":["post-56148","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news","tag-ai-2","tag-chatbots-2","tag-chatgpt-2","tag-gpt5","tag-openai-2"],"jetpack_featured_media_url":"https:\/\/coinscreed.com\/staging\/wp-content\/uploads\/2023\/08\/con-2.webp","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts\/56148","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/users\/53"}],"replies":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/comments?post=56148"}],"version-history":[{"count":0,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/posts\/56148\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/media\/56155"}],"wp:attachment":[{"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/media?parent=56148"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/categories?post=56148"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/coinscreed.com\/staging\/wp-json\/wp\/v2\/tags?post=56148"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}