ChatGPT has been the driving force in 2023’s AI boom. It has become somewhat synonymous with generative AI models, as one of the most powerful large language models (LLMs). Its accessibility to the public through its free model has given hundreds of millions of people the opportunity to use sophisticated AI assistance in their daily lives.
The ChatGPT API has sparked a new era in human-machine interaction, making it easier to build a powerful language-based search for almost anything – including music. Many companies have since set up their music search using text prompts.
Despite ChatGPT’s convenience and user-friendliness, there are risks to using it for your music search. First, ChatGPT’s free access to valuable music data is concerning. Second, there are concerns around OpenAI’s dependency on Microsoft. Third, ChatGPT’s progression toward a monopoly will make it difficult for other genAI companies to set the tone in the market, and will give OpenAI a uniquely strong negotiating position. Finally, there are ethical concerns about the training data ChatGPT was built on.
Using ChatGPT in Your Music Search
Before exploring these risks, it is useful to see how and why people are using ChatGPT in their music search. There are a couple of ways in which it is being used.
- In the first scenario, a user tells ChatGPT to translate text prompts into any given tagging taxonomy. For instance: “I need a song that sounds like Pirates of the Caribbean and it needs to feature a trumpet.” ChatGPT does a reasonably good job of translating this into keywords such as orchestral, adventurous, swashbuckling, dramatic, and trumpet. Based on these keywords, the user can search for suitable content. It’s a fairly straightforward music search, but it only works for very limited use cases.
- The second scenario requires a pre-trained AI system that extracts comprehensive information from audio files, in the same way that ChatGPT extracts information from text prompts. Based on this, users can build a more seamless prompt search, comparing the text embedding to the audio embeddings in their music catalog. It’s a bit harder to build but translates into better results.
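The two scenarios above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular product works: the catalog, track titles, tags, and three-dimensional embeddings are all made up, and the function names are hypothetical. In practice, the keywords would come from the language model, and the query and audio embeddings would come from models that map text and audio into the same vector space.

```python
import math

# Hypothetical catalog: titles, tags, and toy audio embeddings are invented
# for illustration. Real embeddings would have hundreds of dimensions.
CATALOG = {
    "High Seas": {
        "tags": {"orchestral", "adventurous", "dramatic", "trumpet"},
        "embedding": [0.9, 0.1, 0.2],
    },
    "Desert Road": {
        "tags": {"sparse", "blues", "slide guitar"},
        "embedding": [0.1, 0.9, 0.3],
    },
    "Night Drive": {
        "tags": {"electronic", "dark", "synth"},
        "embedding": [0.2, 0.2, 0.9],
    },
}


def search_by_tags(keywords, catalog):
    """Scenario 1: rank tracks by how many keywords match their tags."""
    scored = [(len(keywords & track["tags"]), title)
              for title, track in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop tracks that share no tag with the query at all.
    return [title for score, title in scored if score > 0]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_by_embedding(query_embedding, catalog, top_k=2):
    """Scenario 2: rank tracks by similarity between text and audio embeddings."""
    ranked = sorted(
        catalog,
        key=lambda title: cosine_similarity(query_embedding,
                                            catalog[title]["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Keywords as a language model might return them for the pirate prompt.
    keywords = {"orchestral", "adventurous", "swashbuckling", "dramatic", "trumpet"}
    print(search_by_tags(keywords, CATALOG))

    # Assumed embedding of the same prompt, produced by a text encoder.
    print(search_by_embedding([0.8, 0.2, 0.1], CATALOG))
```

The tag search only finds tracks whose tags literally overlap with the keywords, which is why the first approach is limited. The embedding search can still rank every track by closeness, even when no tag matches the wording of the prompt.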
The Data You Feed ChatGPT
Whenever someone uses ChatGPT for music search, they are essentially giving ChatGPT free access to valuable data, teaching it how humans describe music, what we find particularly important in music, how we perceive it, and how it makes us feel. Every music search prompt is a glimpse into our musical minds.
OpenAI, ChatGPT’s parent company, has already released generative AI models that can generate music. These systems are operated by text prompts, but unlike ChatGPT, they generate music instead of text.
It is generally believed that generative AI systems for music, such as Stable Audio, Google’s MusicLM, or Meta’s MusicGen, lack sophistication compared to human music creations, not only in terms of sound quality but also in how well the music fits the prompt. This is due to the training data that is available. These systems need full-text music descriptions and the corresponding audio files. The more complex and detailed the description, the better. But it is usually very expensive and time-consuming to create or acquire this data, which is why genAI in music still lacks the quality of LLMs.
However, conventionally tagged music is widely available, and there is enough of it to train good genAI models. The data gathered from a prompt-based music search can help OpenAI make much better connections between full-text prompts and tags, and thus generate music that matches the text prompts.
For instance, suppose a user describes a film scene in a search prompt: “Give me a song that fits a scene where someone walks along Route 66 with their thumb out trying to hitchhike.” If the results are underwhelming, the user would add specifics, such as “sparse, blues, slide guitar”. In that moment, ChatGPT has made the connection between the film scene and the music tags.
Political Power-Play and OpenAI’s Dependency on Microsoft
Generating text in the volumes that ChatGPT does is computationally heavy and expensive. According to reports, its daily operating cost is close to three-quarters of a million US dollars.
Through its cloud service Azure, OpenAI’s biggest investor, Microsoft, is financing 99% of the cost. It’s not unreasonable to assume that Microsoft will eventually want to see a return on its investment, and that could mean a price increase for users of ChatGPT.
To make matters worse, the recent turmoil at OpenAI, which saw CEO Sam Altman dismissed only to be reinstated days later, raises questions about the company’s internal cohesion and strategy. While things appear friendly on the outside, it is not far-fetched to imagine that this has widened the divide between OpenAI and Microsoft, which could have repercussions for ordinary users of ChatGPT.
Progressing Monopolization
ChatGPT is undoubtedly the leader of the pack in the textual genAI game. Sure, competitors and tech journalists are keen to convince us that models such as LLaMA, Grok, and Gemini will lead to ChatGPT’s demise, but I’m not convinced. They may be formidable models, but it is unlikely that they can generate similar public attention and user numbers in the same amount of time.
This carries a substantial risk. There is a finite amount of training data on the internet. Most of the models above are trained largely on the same information, so they generate comparable answers to prompts. This is particularly true for music-related searches, where it is hard to tell the different LLMs apart.
To achieve differentiation, companies need to acquire data sources that will enable unique answers from their AI model. The only way this is possible is if they are generating training data proprietary to their company.
One of the most scalable ways is to harvest information from user interactions, which is why the elevated use of ChatGPT is a cause for concern. It stands far ahead of the competition purely based on the amount of accessible data it holds.
The music industry’s negotiating power will be very different if one genAI company, rather than several, sets the tone for the entire market.
Ethical Concerns
Finally, there are significant concerns around the ethical use of training data. Many music companies were up in arms about genAI models because, they claimed, the models were trained on unethically sourced datasets. Universal even urged Apple and Spotify to block genAI companies from accessing training data through their APIs.
The recent lawsuit from The New York Times against OpenAI does not give the impression that the training data was sourced with the consent of the copyright owners. In dubio pro reo, but what kind of message does it send if you are a music company that condemns using copyrighted music for training while using AI that was likely trained on unlicensed copyrighted text?
Conclusion
Text prompts will likely become the standard way of interacting with machines, and the music industry should not close its mind to this. However, there are risks to consider when adopting this new trend. Using systems like ChatGPT can be a quick and easy way to let users search for music, but there are clear downsides to giving up valuable data for free while paving the way for a monopoly in this important area at the same time.
Apart from ChatGPT, there are search systems that allow for natural language search specifically for music. There are also alternative ways to access the same models: Azure OpenAI, for instance, promises not to use any user data for retraining, yet its economic feasibility remains a question mark.