WordPress plugin: Track ChatGPT Hits

Due to high demand, I decided to make a user-friendly way to track hits from known OpenAI bots. This WordPress plugin logs URL requests made by the ChatGPT / OpenAI bots, including direct user actions, by tracking requests made with specific user agents.

You can download the plugin at https://www.notprovided.eu/area51/track-chatgpt-v_1_0.zip. Either upload the track-chatgpt folder to your plugin directory (wp-content/plugins) via SFTP, or upload the ZIP through the WordPress plugin interface.

There are currently two known user agents (GPTBot and ChatGPT-User) and a small set of IP addresses that can be used to check whether a request really comes from OpenAI's ChatGPT bots. The plugin shows whether each request came from a verified source (a valid request) or not.
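The plugin itself is PHP and its code isn't reproduced here, but the check described above can be sketched in a few lines: a hit counts as an OpenAI request if its user agent contains one of the bot names, and it is valid only if its IP address falls inside one of OpenAI's ranges. This is a minimal JavaScript sketch, IPv4 only; the ranges in OPENAI_RANGES are placeholders (RFC 5737 documentation networks), not OpenAI's real addresses, and the function names are hypothetical.

```javascript
// Bot names that appear somewhere inside the User-Agent header
const BOT_TOKENS = ['GPTBot', 'ChatGPT-User'];

// PLACEHOLDER ranges (RFC 5737 documentation networks), NOT OpenAI's real IPs.
// Replace them with the list OpenAI publishes.
const OPENAI_RANGES = ['192.0.2.0/24', '198.51.100.0/24'];

// Convert a dotted IPv4 string to an unsigned integer, or null if malformed
function ipv4ToInt(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True if `ip` lies inside the IPv4 CIDR block `cidr` (e.g. "192.0.2.0/24").
// Integer division instead of bitwise masks avoids JavaScript's signed 32-bit ops.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const ipValue = ipv4ToInt(ip);
  const baseValue = ipv4ToInt(base);
  if (ipValue === null || baseValue === null) return false;
  const blockSize = 2 ** (32 - Number(bitsText));
  return Math.floor(ipValue / blockSize) === Math.floor(baseValue / blockSize);
}

// Returns null for non-OpenAI requests, otherwise { bot, valid }
function classifyHit(userAgent, ip) {
  const bot = BOT_TOKENS.find(token => (userAgent || '').includes(token));
  if (!bot) return null; // regular visitor: not tracked
  const valid = OPENAI_RANGES.some(range => inCidr(ip, range));
  return { bot, valid };
}
```

For example, `classifyHit('Mozilla/5.0 (compatible; GPTBot/1.0)', '192.0.2.15')` reports a valid GPTBot hit, while the same user agent from any address outside the listed ranges is reported as invalid.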

The plugin shows a graph of the hits over the past 28 days, and lets you download the full dataset at once.
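How the plugin builds its graph isn't documented, but the core step of any such chart is bucketing hits into one count per day. A minimal sketch, assuming each stored hit is a hypothetical `{ timestamp, valid }` record and that days are counted in UTC:

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Count valid and invalid hits per UTC day for the last `days` days (today included).
// `hits` is an array of hypothetical { timestamp, valid } records.
function dailyHitCounts(hits, now = new Date(), days = 28) {
  // Midnight UTC at the start of today, then step back to the first day of the window
  const todayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const firstDayStart = todayStart - (days - 1) * MS_PER_DAY;

  // One empty bucket per day, labelled YYYY-MM-DD
  const buckets = [];
  for (let i = 0; i < days; i++) {
    const date = new Date(firstDayStart + i * MS_PER_DAY).toISOString().slice(0, 10);
    buckets.push({ date, valid: 0, invalid: 0 });
  }

  for (const hit of hits) {
    const index = Math.floor((new Date(hit.timestamp).getTime() - firstDayStart) / MS_PER_DAY);
    // Skip hits outside the window (this also skips unparseable timestamps, where index is NaN)
    if (!(index >= 0 && index < days)) continue;
    buckets[index][hit.valid ? 'valid' : 'invalid'] += 1;
  }
  return buckets;
}
```

Splitting valid and invalid counts per bucket keeps spoofed hits visible on the chart without mixing them into the verified numbers.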

A REST API endpoint is also enabled: connect to /wp-json/chatgpt-tracker/v1/download-data/ to automate exports to an external database and include the data in your monitoring dashboards. It's a simple dump of the full dataset. I may extend this feature in the future if there is enough interest.
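An automated export can be as small as one request to that endpoint. A minimal sketch for Node.js 18+ (which has a global `fetch`), assuming WordPress is installed at the domain root, the endpoint is reachable without authentication, and it returns JSON; the response format isn't documented here, so adjust the parsing to what your site actually returns. The function name is hypothetical.

```javascript
// Download the full dataset from the plugin's REST endpoint.
// ASSUMPTIONS: WordPress runs at the domain root, no authentication is needed,
// and the endpoint responds with JSON.
async function downloadChatGptHits(siteUrl, fetchImpl = fetch) {
  // Resolve the route against the site root so trailing slashes don't matter
  const endpoint = new URL('/wp-json/chatgpt-tracker/v1/download-data/', siteUrl);
  const response = await fetchImpl(endpoint.toString(), {
    headers: { Accept: 'application/json' },
  });
  if (!response.ok) {
    throw new Error(`Export failed: HTTP ${response.status}`);
  }
  return response.json();
}
```

Usage: `downloadChatGptHits('https://example.com').then(rows => console.log(rows))`, with your own domain in place of example.com. The `fetchImpl` parameter only exists so the function can be tested or pointed at another HTTP client.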

Any feature requests? PM me on LinkedIn or Twitter or leave a comment at the bottom.

For updates (for example, when the user agents or IP addresses change), follow me on LinkedIn or Twitter. I'm trying to get the plugin into the official WordPress repository as soon as possible, which would also enable auto-updates.

Tracking on the Edge

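When a CDN or cache sits between the visitors and your server, cached pages are served without the request ever reaching WordPress, so the plugin never sees those hits. In that case you can run a worker at the edge to catch the requests before the cache answers them. A quick example for Cloudflare Workers: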

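```javascript
// Cloudflare Worker script that detects requests whose User-Agent contains "GPTBot" or "ChatGPT-User"

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request));
});

// The real User-Agent strings contain more than the bot name (for example a version
// number), so match on substrings instead of comparing the whole header
const BOT_TOKENS = ['GPTBot', 'ChatGPT-User'];

async function handleRequest(request) {
  // Extract the User-Agent from the request headers (empty string if missing)
  const userAgent = request.headers.get('User-Agent') || '';

  // Check whether the User-Agent mentions one of the known bots
  const bot = BOT_TOKENS.find(token => userAgent.includes(token));

  if (bot) {
    // Handle the tracking logic here: log the request, count it, send it to analytics, etc.
    // CF-Connecting-IP holds the client's IP, needed to verify the hit against OpenAI's IPs
    const clientIp = request.headers.get('CF-Connecting-IP');
    console.log(`Detected ${bot} request for ${request.url} from ${clientIp}:`, new Date().toISOString());

    // Insert logic for tracking, alerting, or handling the bot request.
    // You could send the hit to Google Analytics, for example.
  }

  // Proceed with the request (served from the cache or forwarded to your server)
  return fetch(request);
}
```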
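The worker's comment mentions sending hits to Google Analytics. One way to do that is GA4's Measurement Protocol, which accepts events as a JSON POST to https://www.google-analytics.com/mp/collect. A minimal sketch, assuming a GA4 property: `MEASUREMENT_ID` and `API_SECRET` are placeholders for your own values, and the event name `chatgpt_bot_hit`, its params, and the fixed per-bot `client_id` are hypothetical choices, not something the plugin does.

```javascript
// PLACEHOLDERS: use your own GA4 measurement ID and Measurement Protocol API secret
const MEASUREMENT_ID = 'G-XXXXXXXXXX';
const API_SECRET = 'your-api-secret';

// Build the Measurement Protocol body for one bot hit.
// The event name and params are hypothetical; pick names that suit your reports.
function buildGa4Payload(bot, pageUrl) {
  return {
    // GA4 requires a client_id; bots carry no GA cookie, so use a fixed ID per bot
    client_id: `openai.${bot}`,
    events: [{
      name: 'chatgpt_bot_hit',
      params: { bot, page_location: pageUrl },
    }],
  };
}

// Send the hit to GA4 (fetchImpl is injectable for testing)
async function sendToGa4(bot, pageUrl, fetchImpl = fetch) {
  const endpoint = 'https://www.google-analytics.com/mp/collect' +
    `?measurement_id=${encodeURIComponent(MEASUREMENT_ID)}` +
    `&api_secret=${encodeURIComponent(API_SECRET)}`;
  return fetchImpl(endpoint, {
    method: 'POST',
    body: JSON.stringify(buildGa4Payload(bot, pageUrl)),
  });
}
```

Call it from the tracking branch of the worker with the detected bot name and `request.url`, wrapped in `event.waitUntil(...)` so the analytics request doesn't delay the response (this means passing `event` into `handleRequest`). Note that the Measurement Protocol returns a 2xx status even for malformed events, so validate new payloads against GA4's debug endpoint first.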

Frequently Asked Questions

How can I test if the tracker works?

The easiest way to test the plugin is to open ChatGPT (GPT-4) and ask it to summarize one of the most recent URLs on your website. Make sure it actually requests the URL: Bing may already have crawled and stored the content, in which case that copy is used instead of the live page. Check that the interface shows ChatGPT is browsing.

You can also set your browser's user agent to a string containing GPTBot or ChatGPT. Those hits will be recorded as invalid, since your IP address doesn't match OpenAI's.
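The same test can be scripted instead of changing browser settings. A minimal sketch for Node.js 18+, whose built-in `fetch` lets you set the User-Agent header (browser page scripts generally can't); the test user agent string and the function name are made up for illustration and only need to contain a bot name the plugin looks for. If a CDN or page cache serves the URL, the request may never reach WordPress.

```javascript
// Hypothetical test User-Agent: it only needs to contain "GPTBot"
const TEST_USER_AGENT = 'Mozilla/5.0 (compatible; GPTBot/1.0; manual-test)';

// Request a page with the spoofed User-Agent and return the HTTP status.
// The plugin should log this hit as invalid, because the IP isn't OpenAI's.
async function sendSpoofedHit(pageUrl, fetchImpl = fetch) {
  const response = await fetchImpl(pageUrl, {
    headers: { 'User-Agent': TEST_USER_AGENT },
  });
  return response.status;
}
```

Usage: `sendSpoofedHit('https://example.com/some-post/').then(console.log)` with one of your own URLs, then check the plugin's overview for a new invalid hit.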

Does it have any privacy related impact?

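No. The plugin only records requests whose user agent matches one of the known OpenAI bots, storing their user agent and IP address; regular visitors are not tracked. Note that hits with a spoofed bot user agent are stored too (marked as invalid), so those entries can contain a person's IP address.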

What is the difference between crawling and browsing behaviour?

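More information about the behaviour of the different bots can be found in ChatGPT-User browsing and GPTBot crawling. In short, GPTBot crawls websites on its own, while ChatGPT-User visits a page on behalf of a ChatGPT user, for example when someone asks ChatGPT to summarize a URL.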

Can I use this data in external reporting?

Yes. The plugin exposes a REST API endpoint, /wp-json/chatgpt-tracker/v1/download-data/, that returns the full dataset, so you can pull it into your external reports automatically.

Known issues / Feature requests

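  1. Feature: referral traffic. Do people click through from chat.openai.com to the URLs on your website that ChatGPT mentions?
  2. Issue: when your site is behind a CDN like Cloudflare, the plugin records the CDN's IP addresses instead of the requester's, so those hits cannot be verified against OpenAI's IP addresses.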
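A common fix for the CDN issue is to read the original client IP from the header the CDN adds (Cloudflare sends it as CF-Connecting-IP), but only when the connection itself comes from the CDN; otherwise anyone could forge the header. The plugin is PHP, where these values would come from `$_SERVER['REMOTE_ADDR']` and `$_SERVER['HTTP_CF_CONNECTING_IP']`; this JavaScript sketch only shows the logic. `CDN_RANGES` holds a placeholder (RFC 5737 documentation network), not Cloudflare's real ranges, and the function names are hypothetical.

```javascript
// PLACEHOLDER: replace with the CDN's published IPv4 ranges (Cloudflare lists them publicly)
const CDN_RANGES = ['203.0.113.0/24'];

// Convert a dotted IPv4 string to an unsigned integer, or null if malformed
function ipv4ToInt(ip) {
  const parts = String(ip).split('.');
  if (parts.length !== 4 || !parts.every(p => /^\d{1,3}$/.test(p) && Number(p) <= 255)) return null;
  return parts.reduce((value, part) => value * 256 + Number(part), 0);
}

// True if `ip` lies inside the IPv4 CIDR block `cidr`
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const ipValue = ipv4ToInt(ip);
  const baseValue = ipv4ToInt(base);
  if (ipValue === null || baseValue === null) return false;
  const blockSize = 2 ** (32 - Number(bitsText));
  return Math.floor(ipValue / blockSize) === Math.floor(baseValue / blockSize);
}

// remoteAddr: the IP of the direct connection; headers: lowercase header names
function resolveClientIp(remoteAddr, headers) {
  const fromCdn = CDN_RANGES.some(range => inCidr(remoteAddr, range));
  const forwarded = headers['cf-connecting-ip'];
  // Trust the forwarded header only for connections that really come from the CDN
  if (fromCdn && forwarded) return forwarded.trim();
  return remoteAddr;
}
```

The resolved IP is the one that should then be checked against OpenAI's IP addresses.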

Changelog

= 1.0 =

  • Added download functionality
  • Added a simple graph plotting the last 28 days of hits.

= 0.5 =

  • Basic functionality

3 Comments

  1. Suggestion: share a timestamp or some other reference point so the hits can be matched with web analytics tools. I have my doubts about a timestamp, though, because the click may come in while the source only arrives a little later, after the scraping.

