Asking ChatGPT to Play a Synthesizer
I’ve been developing against OpenAI’s chat completion API — the RESTful version of ChatGPT — for a little while and up until now, I’ve avoided learning about the “function calling” feature. Honestly I didn’t quite get it and I didn’t see it as useful to my use cases, but with the announcement of the new Assistants API (in open beta at the time of writing) and improvements made to the function calling feature, I resolved to make a proof-of-concept. I decided to make an Assistant that could play a synthesizer upon request.
In this post, I will briefly describe the process of working with the Assistants API and how I used it. This will not be highly technical and I will not attempt to reproduce the documentation from OpenAI, which is already great. If you’re as uncertain about this feature as I was before embarking on this journey, I hope this simple real-world example will clear things up. Then, I hope you build something cool.
This post assumes a familiarity with ChatGPT (as a user or as a developer), some high-level concepts such as “chat prompts” and “models”, a teensy bit of JavaScript, and the word “grok”.
Grokking Assistant “Function Calling”?
In brief, an “assistant” in OpenAI is like a customized ChatGPT that many users can create threads on and use on their own (again, check out the official docs for an in-depth overview). By “customized ChatGPT”, I mean the following:
- An assistant is created with a custom initial prompt to set the context of the assistant.
- Users of the assistant get their own “threads”, which can be further customized (for example, a user might have a “basic” thread, and others might have “premium” threads).
- A model can be selected (e.g., gpt-4-1106-preview).
- Assistants can retrieve knowledge outside the model.
- Assistants can be given reference material (e.g., files).
- Assistants can interpret and run code.
And the topic of the day: assistants can call functions. The basic idea behind the function calling feature is that you describe your functions to the assistant when creating or editing it, and then the assistant can call those functions, or rather, it can ask you to call them.
For example, let’s imagine that you have a travel assistant. You ask it for suggestions on activities to do for the day in the city you’re in and it comes up with something (let’s assume the model knows something about the city you’re in already). You might ask, “What are some weather appropriate activities that I can do in Tokyo today?”.
ChatGPT has no idea what the weather is in Tokyo today, but your application might have an integration with a weather API. You can describe a function in your application to your assistant, and your assistant will tell you when to call it. The workflow at a glance is something like:
- You create a travel assistant.
- You describe your weather-checking function to your assistant.
- Using the travel assistant, you ask “What are some weather appropriate activities that I can do in Tokyo today?”
- The assistant knows it needs to check the weather and it knows you have a function to do so, so it gives you a response that cues you to check the weather.
- Your code sees that the assistant would like to call a function with the parameter (“Tokyo”), and so you do.
- You return the weather report to your assistant.
- Your assistant waits for the weather report and then responds to the original question once it has the report.
If the assistant sees that it’s raining in Tokyo, it might offer indoor activities. Or, it might just tell you to pack an umbrella.
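To make step 2 concrete, describing the weather-checking function is really just handing the assistant a JSON Schema when you create or update it. A hypothetical definition (the name “get_weather” and its single parameter are made up for this example, not anything from my project) might look like:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Returns the current weather report for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city to check the weather for, e.g. Tokyo."
        }
      },
      "required": ["city"]
    }
  }
}
```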
Getting It To Work
As mentioned above, this product is currently in beta and a little rough around the edges. There are several ways you can play around with it:
- Use “Assistant” Playgrounds: https://platform.openai.com/playground (super easy)
- Use the OpenAI node package: https://github.com/openai/openai-node (hard)
- Use the REST API: https://platform.openai.com/docs/api-reference/assistants (harder)
To do everything programmatically, there are a lot of functions you’ll need to call to create and edit your assistant, create a thread, use the thread, poll for this, poll for that, and so on. You will want to follow along with the examples in the documentation to get this running, believe me. Thankfully, the docs provide them in several formats (e.g., cURL, Node). For those reasons and more, I recommend you start with the playgrounds.
Playgrounds provide a web-based way to mess around with creating and editing assistants. Behind the scenes they are interacting with the same REST API that you’ll eventually work with in the wild, and there is a super-nifty “logs” feature of playgrounds that shows every API call’s request and response as you interact with the assistant. In this way, you can see exactly how to build the thing on your own.
You can save yourself a lot of coding in your proof of concept by using the playground to create assistants. You can quickly tweak prompts and your function call inputs. Whenever a playground needs to “call” a function, it will halt the chat and wait for your input. Returning to our travel assistant example above, it would halt and ask you for the weather in Tokyo right there in the chat. You give it the weather report and the chat resumes.
I did most of the work for this project in playgrounds, but eventually I had to get at the API with good-old JavaScript fetch.
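I won’t reproduce my exact code, but the shape of the run loop with fetch is roughly the sketch below. It assumes the beta Assistants REST endpoints and the “OpenAI-Beta: assistants=v1” header, and “runThread” is just a name I’m using here; double-check the current docs before borrowing any of it.

```js
// A rough sketch of driving a run with plain fetch against the beta Assistants endpoints.
const headers = {
  "Content-Type": "application/json",
  Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  "OpenAI-Beta": "assistants=v1",
};
const base = "https://api.openai.com/v1";

async function runThread(threadId, assistantId) {
  // Start a run on an existing thread
  let run = await fetch(`${base}/threads/${threadId}/runs`, {
    method: "POST",
    headers,
    body: JSON.stringify({ assistant_id: assistantId }),
  }).then((res) => res.json());

  // Poll until the run finishes or asks us to call a function
  while (run.status === "queued" || run.status === "in_progress") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    run = await fetch(`${base}/threads/${threadId}/runs/${run.id}`, { headers })
      .then((res) => res.json());
  }

  // "requires_action" means the assistant wants us to call our function(s)
  if (run.status === "requires_action") {
    const calls = run.required_action.submit_tool_outputs.tool_calls;
    const tool_outputs = calls.map((call) => ({
      tool_call_id: call.id,
      // Run your own function here and stringify whatever it returns
      output: JSON.stringify({ ok: true }),
    }));
    await fetch(`${base}/threads/${threadId}/runs/${run.id}/submit_tool_outputs`, {
      method: "POST",
      headers,
      body: JSON.stringify({ tool_outputs }),
    });
  }
  return run;
}
```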
Asking for a Song and Getting a Song
By now, you can probably guess how I made an assistant play a synthesizer:
- I created a JavaScript synthesizer that can play notes.
- I created an assistant with the context of being a keyboard player.
- I told the assistant how to use my “play notes” function.
- I asked the assistant to play something (it searches its model and produces a song).
Creating a Synthesizer
Admittedly, I did very little work on this front. Allow me to introduce you to one of my favorite things that Mozilla ever did: A tutorial for creating a simple synthesizer using vanilla JavaScript. I’ve played around with this thing many times and I was again glad it exists when I needed to find some JavaScript code that could play notes on a keyboard. It didn’t take much to bend their sample into something that had a “play a note” function that my assistant could know about.
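The single-note piece boils down to a Web Audio oscillator. Here’s a minimal sketch of that kind of helper (the “noteToFrequency” conversion is my own shorthand, not code lifted from MDN):

```js
// A minimal "play one note" helper in the spirit of MDN's synth tutorial.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();

// Convert a note name like "A4" or "C#3" into a frequency in Hz
// (A4 = 440 Hz, twelve-tone equal temperament).
function noteToFrequency(note) {
  const names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
  const match = note.match(/^([A-G]#?)(\d)$/);
  if (!match) return null;
  const midi = names.indexOf(match[1]) + 12 * (Number(match[2]) + 1);
  return 440 * Math.pow(2, (midi - 69) / 12); // 69 is the MIDI number for A4
}

// Play a single note for `duration` milliseconds using a plain oscillator.
function playNote(note, duration) {
  const freq = noteToFrequency(note);
  if (!freq) return;
  const osc = audioCtx.createOscillator();
  osc.type = "sine";
  osc.frequency.value = freq;
  osc.connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + duration / 1000);
}
```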
Creating an Assistant
There’s nothing magical about creating the assistant, though it might seem so at first. Here’s the prompt I used to set the mood for the chat:
You control a synth keyboard. You play songs by calling the “play_notes” function with notes ranging from A0 to C8 and duration in ms. Notes with only a duration can be used for pauses.
And here is the schema for my function call:
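It amounts to a JSON Schema describing an array of notes; the property names below are a plausible reconstruction from the prompt above rather than a verbatim copy:

```json
{
  "name": "play_notes",
  "description": "Plays a sequence of notes on the synth keyboard.",
  "parameters": {
    "type": "object",
    "properties": {
      "notes": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "note": {
              "type": "string",
              "description": "A note name from A0 to C8, e.g. C4. Omit for a pause."
            },
            "duration": {
              "type": "integer",
              "description": "How long to hold the note (or pause), in milliseconds."
            }
          },
          "required": ["duration"]
        }
      }
    },
    "required": ["notes"]
  }
}
```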
Calling on Behalf of the Assistant
Given the description of my function above, the assistant will surmise how and when to ask me to use it. Now when I ask it to, for example, “Play ‘Mary Had a Little Lamb’”, it will think for a second and eventually produce a response containing something like:
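With the beta API, that arrives as a run in the “requires_action” state carrying a tool call whose arguments come back as a JSON string. The notes below are illustrative rather than the assistant’s verbatim output:

```json
{
  "id": "call_abc123",
  "type": "function",
  "function": {
    "name": "play_notes",
    "arguments": "{\"notes\":[{\"note\":\"E4\",\"duration\":400},{\"note\":\"D4\",\"duration\":400},{\"note\":\"C4\",\"duration\":400},{\"note\":\"D4\",\"duration\":400},{\"note\":\"E4\",\"duration\":400},{\"note\":\"E4\",\"duration\":400},{\"note\":\"E4\",\"duration\":800}]}"
  }
}
```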
Which our good friend JSON.parse will turn into a useful object for my function:
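Something like this, assuming “toolCall” is the tool call pulled off the run’s required_action:

```js
// The arguments arrive as a string, so parse them into a plain object
const { notes } = JSON.parse(toolCall.function.arguments);
// notes => [{ note: "E4", duration: 400 }, { note: "D4", duration: 400 }, ...]
```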
Once I’ve got the notes, I’ve got my couple functions built on top of my heroes’ code at Mozilla:
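They aren’t much more than this sketch (“playNotes” and “sleep” are names I’m using here for illustration, and “playNote” is the single-note helper from earlier):

```js
// Wait for a given number of milliseconds
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Play each note in order; entries without a note name are treated as pauses
async function playNotes(notes) {
  for (const { note, duration } of notes) {
    if (note) playNote(note, duration);
    await sleep(duration);
  }
}

// For example, with the parsed tool call arguments from above:
// playNotes(notes);
```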
And Bob’s your uncle, we’ve got ourselves a tune.
Conclusion
I whipped most of this project up on a Saturday and didn’t spend a terrible amount of time on it, but I had a lot of fun doing it. Admittedly, I’m fairly familiar with the OpenAI API already and I hardly wrote any of the synth code, but even so, I think this goes to show how much can be done with so little, once you’re aware of what’s out there.
All the API calls needed to get your assistant up and running, then to get a user thread going, then to check for function calls and respond to them, add up to a lot of orchestration to keep track of. Some of it feels a little awkward at first, but if you’ve got an idea, hang in there and build something cool.