I Created a Game that Uses Generative AI to Drive the Main Gameplay

Hey hi everyone, I hope you’re all doing good. This is a blog post where I talk about my recent experience of working on a fun little project where I wanted to make use of generative AI to drive the core gameplay.

You can watch the video version of my devlog here –

Video version of this Devlog on YouTube

The Objective

So ever since ChatGPT got released back in late 2022 and took the world by storm, the amount of endless applications we can have out of this technology is ever growing.

One of the areas where it’ll be interesting to see how generative AI plays out is video games. It has been in back of my mind to build a project that utilizes ChatGPT’s API to integrate AI into a game and use it to drive the core gameplay.

I have come across a range of videos that utilizes generative AI to carry out conversation between Player and NPCs. I can’t remember name of some of the games I watched, but a few notable set of videos I remember was a mod of Skyrim someone made which very well showcased how AI can be used in games to make more dynamic and interactive gameplay.

The Idea

Inspired and motivated by generative AI’s potential in games, I went to on to ideate on different game ideas which can make use of generative AI.

Idea 1 – Detective Game

The first idea that came to my mind was of a game similar to L.A. Noire where we are gonna be a detective and have to solve a mystery. Now, this idea was all in my head in a very diffused state so it’s quite hard to put my thoughts into words. But the gist is that I had something in mind where we will interact with NPCs as well as the environment to carry out decisions. Here the conversations between Player and NPCs will be carried out by generative AI. However, we can also point towards objects in environment and take certain actions that would affect the behaviour of NPC. Plus, every playthrough can have different outcomes, depending upon how player interacts with characters in the game.

Idea 2 – Interrogation

I soon realized that my initial idea was kinda large in scope, as I was expecting many things to be added in the game. So instead, I focused on making the scope smaller. Then I remembered Detroit: Become Human and the interrogation sequence it had in the early part of the game. This was it, a game where all we have to do is interrogate a criminal.

Idea 3 – Judging Souls

Pondering on the interrogation idea, it suddenly came to my mind… how about having Yamaraj judge the souls… and soon King Yemma sending Cell down to hell in Dragon Ball Z appeared in my mind. This to me felt freaking cool, and I was certain that I will be making a game around this concept only!

The Plan

With the idea set, I proceeded ahead with the development. Now before beginning the implementation it is generally a good practice to lay out a rough design of the gameplay, but I was just too eager to start with the implementation.

Maybe it was good too, as testing out implementation of basic Player-NPC interaction first can save me a lot of time if for some reason there happens to be some shortcomings in my idea which I can only discover after the implementation.

So I started implementing all the APIs that will be communicating with each other to bring the final result of Player-NPC communication. But before I go to ahead of myself, I’ll like to lay out the plans in proper order, even though in reality I did things in parallel.

Story and Gameplay Overview

So the idea is that player is gonna be an aspiring deva who challenges Yamaraj that he or she will able to make better judgments than him.

Player has to judge a total of 10 souls and showcase his ability to make the right choice in sending souls to either heaven or hell.

If the player makes a total of 3 or more wrong judgments then the player (the aspiring deva) will be cursed of never attaining death.

Player-NPC Interaction

For carrying out interaction between Player and NPC, it was not so hard to implement, as I was using already trained models through their web APIs for carrying out different operations.

Here’s a simple diagram that illustrates the process –

As you can see, I am using a bunch of APIs to carry out the interaction.

For this project, I used APIs provided by OpenAI for Speech-to-Text, Text Generation, and Text-to-Speech.

Implementation

I used Godot game engine for bringing my idea to life.

Using Web APIs

I first went ahead with implementing scenes that carry out communication with OpenAI APIs.

Eventually, I faced a bit of an annoying challenge where I had to use form data to send audio file as request payload for Speech-to-Text API. Seems like not a big deal, but GDScript didn’t provide any method for sending files via form data. I tried to implement one, but it was taking way too much time and not fun at all.

So I ended up creating a separate server in FastAPI (a Python framework) that would carry out these operations. The idea being that I will create an API that accepts audio files converted to base64 format embedded in JSON… and since, there’s a Python package provided by OpenAI to use their APIs, so that too made my life a little easier.

One more reason it was worth using a separate server was that I had to use FFmpeg to convert between different audio formats. Otherwise, sometimes there were formats that OpenAI accepted but Godot didn’t support, or sometimes certain formats in Godot weren’t properly converting to base64 and vice-versa.

So yeah that’s it, the part of using web APIs to get NPC’s response was very much finished early on.

Game World

I then proceeded ahead with building the actual game. I used basic shapes to setup the World, Player and NPCs. For Player-NPC interaction, I added two options for player to carry out the conversation –

  1. Speak in mic, and get response from the soul.
  2. Type text in the prompt, and get response from the soul.

I also added two switches, which player can press to send the soul to either heaven or hell.

After 10 souls are judged, if player manages to make less than 3 mistakes, then he wins the game, otherwise it’s game over.

Adding 3D Models & Animations

Now all that was needed was to make the game look and feel good.

I first started with adding 3D models & animations for the NPCs. I headed over to Mixamo to get some free 3D models and animations.

My initial aim was to use single armature with different 3D models, so I can use same animations on all 3D characters. However, it proved rather difficult to make it work. Though there were many videos on YouTube regarding animation retargeting with Mixamo rigs and also making use of Rigify. However, for me none of those methods worked well, as no matter what, animations looked slightly off after retargeting.

So I ended up using unique armature for each character. I know not a very scalable solution, but as the scope of my project is small, I chose to go with this approach only.

Also there were few animations that I had to edit manually to make them in-place. Meaning, if I wanted to add more characters, I would have to edit animations on them too. So I ultimately went ahead with only 2 characters for the NPCs.

Final Touches

I went ahead with adding little bit of sound effects and music to the game. Did some bug fixing, added starting menus, etc.

Result

The game played out mostly as it was planned. I think I am satisfied with the result, mainly due to the fact that my goal of using generative AI to drive the core gameplay was achieved. However, in terms of fun, I don’t think its quite there yet. It will become stale very soon. Here’s the reason why –

1. Response Time

The time it takes to get response from NPC is way too long. Interaction with mic can take 15-20 seconds. With text prompts it’s relatively faster and takes around 5-7 seconds. Personally, to me talking via mic feels much better and I think it’s quite a convenient way to interact with NPCs. However, with my implementation talking via mic takes a lot of time 🙁

2. AI is just too Honest and Sincere

The main challenge of the game comes from the fact that some souls will try to trick us into sending them to heaven (or maybe hell, less likely). But no matter how many instructions I provided to the text generation API, it always gave its best to be honest and sincere.

Now I think there’s definitely a solution to this, but I didn’t really invested much time in looking at finding solutions. So hopefully, if I plan to continue this project, I’ll sure be investing my time in fixing this issue.

3. Can’t Play the Game Offline

The previous 2 points definitely contribute to the game being not fun. This point however, is more of a nice to have. Now as I’m making use of web APIs to carry out all the fancy speech-to-text, text generation and text-to-speech operations, the player will need to be connected to the internet in order to play the game.

Alternatively, if I try to go with using an open-source generative model, even then I will need to deploy it on the cloud so the end-users don’t have to install those models on their machines. Now maybe there’s a way to integrate those models into the game directly, but I haven’t looked at it yet.

Conclusion

The game currently is more of a demo. I earlier had plans to work on it and make it a complete game, however, for now I think I will move on to some new project.

With that said, I am still very proud of this demo. It’s been a fun and interesting experience for me. But also it feels pretty cool to make things like these, especially when games making use of generative AI to drive its gameplay are still small in numbers.

Resources

License

The following blog is licensed under CC BY 4.0 Arpit Srivastava.