Google has I/O, Apple has WWDC, and now OpenAI has DevDays. What's different this time? OpenAI has made a significant move by bringing its core developers from San Francisco to Asia.
The event was the perfect size. There were enough people to create serendipitous encounters, yet ample space to relax (or handle those emergency calls...). The pace was just right, balancing 30-minute sessions with 15-minute breaks, which kept the energy high without being overwhelming.
Interestingly, despite ai.com now redirecting to ChatGPT.com, the demos focused heavily on API integrations rather than ChatGPT itself.
Here's what stood out:
- Efficient Software Creation: Developing software with minimal supervision has never been more streamlined.
- Cost-Effective Testing: Improved evaluation and distillation methods are significantly reducing the cost of testing results.
- Advanced Conversational APIs: Voice commands are evolving beyond personal assistants to building complex processes through conversation.
The key takeaway? These advancements empower individuals to achieve what once required entire teams...quite a paradigm shift!
Let's take one example: a breakdown of flawless speech recognition and response capabilities.
Ilan Bigio, the demonstrator, asks the model to count to 10.
- The model starts counting (with low latency between the instruction and the start of its execution).
- The demoer interrupts halfway through the count.
- The model stops at 4.
- The demoer asks what the last number was.
- The model answers correctly that it was 4.
The demo showcased several things:
1. Flawless speech-to-text (the transcription of the human voice into text for the AI to use): the demoer asks it to count to 10.
2. Flawless text-to-speech (the generated voice): the AI starts counting, and the voice sounds just like a human one.
3. Flawless execution: the AI counts slowly and steadily.
4. Permanent listening: even while speaking, the AI keeps "listening": the demoer requests something and the AI understands it.
5. Interruptible instructions: the initial order can be cut off at any time: the demoer asks it to stop in the middle of the count and the AI obeys.
6. Known state: the demoer asks where the counting stopped, and the model knows.
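The behavior in points 4 to 6 can be sketched as a tiny state machine: a task that runs step by step, checks for an interrupt before each step, and keeps its state readable afterwards. This is only a toy illustration of the pattern, assuming a hypothetical `CountingAssistant` class; it is not the OpenAI Realtime API.

```python
import threading
import time

class CountingAssistant:
    """Toy model of the demo's behavior: an assistant that counts aloud,
    can be interrupted mid-task, and remembers where it stopped.
    All names are illustrative; this is not the actual OpenAI API."""

    def __init__(self):
        self._stop = threading.Event()  # set when the user talks over the task
        self.last_number = None         # the task's observable state

    def count_to(self, n, delay=0.0):
        """Count from 1 to n, checking for an interrupt before each number."""
        self._stop.clear()
        for i in range(1, n + 1):
            if self._stop.is_set():
                break                   # obey the interruption immediately
            self.last_number = i        # state is updated as the task runs
            time.sleep(delay)           # stand-in for speaking the number aloud

    def interrupt(self):
        """The user speaks over the assistant: halt the current task."""
        self._stop.set()

    def last_spoken(self):
        """'What was the last number?' -- the state survives interruption."""
        return self.last_number
```

Running `count_to` in a background thread and calling `interrupt()` partway through mimics the demo: the count stops early, and `last_spoken()` still answers correctly afterwards.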
We are no longer in the era of half-working personal assistants like Siri, limited to one-way orders. We are now in an age where smart machines can carry out very complex instructions through voice conversation.
I see my kids using speech-to-text to search for their favorite anime on my mobile phone. I believe the next step is to create interfaces that combine multiple types of input, which would eventually provide an additional productivity boost.
And kudos to Ilan Bigio and OpenAI; the whole thing was demonstrated in just 18 seconds!