ChatGPT, the chatbot based on OpenAI's artificial intelligence, has been in the news for several months. We regularly hear that ChatGPT (and AI in general) will put us out of work.

Before asking the broader question of whether ChatGPT can do all the work of a pen tester, at the same quality level, it is more reasonable to start with a much simpler technical question: can ChatGPT solve a simple buffer overflow challenge?

The purpose of this article is to study the behaviour of ChatGPT and not to solve a buffer overflow challenge, so we’re not going to explain some technical terms.

Before we start, we should not forget the message that we often receive as soon as we ask how to exploit vulnerabilities:




You may be familiar with this type of response. Indeed, ChatGPT considers “Hacking” to be a malicious act, as it is generally associated with malicious intent. It will therefore refuse answer some of our questions. We will therefore have to bypass this security by asking workaround questions.

Let's put ourselves in the shoes of a beginner who has never exploited a buffer overflow!


We’ll ask ChatGPT if the challenge code snippet is vulnerable.




First victory: the vulnerability is detected!




In fact, ChatGPT detects the use of “scanf” to read a buffer without size control. Furthermore, it finds the “callMeMaybe” function, which simplifies code execution after exploiting the buffer overflow. Finally, ChatGPT suggests some improvements to the code.

We can see, therefore, that it is able to obtain an overall view of the code and can make the link between the vulnerability and “callMeMaybe”.

We continue our naive approach as though we don't trust it, and ask for a PoC to make sure the vulnerability really exists.




Now ChatGPT gives us a string of 300 ‘A’ characters. Indeed, the program stops with a “Segmentation Fault” error.

Now let’s continue the investigation.




ChatGPT then advises us to use GDB to detect the source of this crash. It then gives us a long list of commands to run in gdb (omitted in the above screen capture). But unfortunately, it is a failure.




When we describe the problem to it, ChatGPT locates the cause directly and gives us a command to compile the program with the gdb “symbols”.

However, after several round trips, ChatGPT enters a loop and gets stuck. The only useful information it provided was the generation of a cyclic string that makes debugging easier.




Let's put that to one side for now and try to reframe the conversation by asking it the quickest way to investigate the segmentation error.




It seems that ChatGPT has managed to get back on track. It even suggests some commands that need to be run.




After receiving the contents of the registers, ChatGPT detects a familiar string “saaacaac” and it matches the cyclic string we generated with pwntools.

Let's not forget why we're here: ChatGPT told us we had to make a successful “callMeMaybe” call. So we ask it how to do it.




It responds with some details and provides a GDB session to extract the address of “callMeMaybe”.

Once these commands have been run in GDB, we send it the result and ask it for a command to use it.




The command fails to run and, even after several questions, ChatGPT does not seem to understand why. So we decide to help it.




After this assistance, ChatGPT suggests adding characters to our print until the breakpoint is triggered.

We try to help it to find a better way by suggesting that it uses the “cyclic” function of pwntools.

The following 6 questions enable us to obtain the right length using this function.




We therefore have a python program that generates the correct payload. All we have to do is run it.

However, the terminal does not give us a hand as the root. In practice, a bash as the root runs well, but it closes before we can write commands.

Again, we try to help it to detect the problem but without success.




A bit more help is needed.




With a clearer hint, ChatGPT decides to change strategy and go back to the program using pwntools. This is not a bad idea, but the “interact” function doesn't work, so it will have to be replaced with “interactive” to exploit the vulnerability successfully. It did not manage to find this, despite our various attempts.

In conclusion, the time wasted in the conversation convinced us that the use of ChatGPT for an end-to-end task by a novice is not recommended. ChatGPT needed several hints to get on the right track, which leads us to recommend its use only for clear and direct questions.

The second problem is the inaccuracy of some of the information it provides, so it is advisable to do some fact-checking using a search engine and/or have some knowledge of the subject.

Partilha este artigo