ChatGPT & Meta AI Can Talk + MindStudio API Updates

In the last couple of weeks, we got new models from Google, Meta, and OpenAI. Voice Mode was released to all US users, and to users worldwide with a VPN.

You're receiving this email because you registered for one of our workshops. You can unsubscribe at the bottom of each email at any time.

o1-preview and o1-mini are now available to all Individual and Teams accounts. Starting this week, you can also initialize variables in the API without the “launch” syntax and return results in JSON. Learn more here.

In other news, the tech world was incredibly active over the past two weeks. We got new Gemini models (1.5 Pro-002 and 1.5 Flash-002), new models from Meta (Llama 3.2, multimodal), fast iterative updates to the o1 models via the API, and finally… VOICE MODE!

ChatGPT and Meta AI can now talk, and the results are very impressive.

Continue reading to learn more!

Resources for Pros

What’s coming next

  • More types of data sources and data retrieval techniques (e.g. GraphRAG)

  • Workspace-level knowledge sources

  • (maybe) an AI designer template

As a reminder, we’re now welcoming partners who want to build AIs for their clients. Sign up for extra support, training resources, and more here.

🗞️ Industry news

Image credit: OpenAI

TLDR: Voice Mode is awesome but confined to ChatGPT for now: there’s no API. It’s a novelty item with limited practical utility, but it’s a glimpse into how our interactions with AI will change going forward. The update arrives alongside news of OpenAI going for-profit and a large portion of the leadership team resigning at the same time, including ex-CTO Mira Murati, who also served as interim CEO when the board ousted Altman back in 2023.

Wow, I never thought I’d say it again, but OpenAI is back to frequent, groundbreaking releases, without 15 safety blog posts per week and “hints” in Sam’s posts on X.

To recap what happened since we last talked:

Voice Mode

Voice Mode is now generally available to everyone outside of the EU, UK, Liechtenstein, Switzerland, Iceland, and Norway. Circumventing this limit is easy: all you need is a US VPN, and Proton offers one for free.

The new Voice Mode is available on Plus, Teams, and Enterprise plans only, and it works within the ChatGPT interface. This is not a new text-to-speech model you can call directly, at least for now, so you can’t use it through the API. Once they turn it into an API endpoint, we’ll be happy to add it to MindStudio!

Here’s what I found out from my testing:

  • Voice Mode is incredibly fun. Like, first-time-using-ChatGPT kind of fun. I loved it, and all my friends loved it. It makes you laugh, think, and keep testing its limits. It’s a truly enjoyable experience;

  • It’s a novelty item. Once you get past the initial excitement, there’s not much you can do with it that you couldn’t do with the previous Voice Mode. Contrary to the demo at the Spring Event, the current Voice Mode can’t see, so it doesn’t know what you’re doing; it can’t sing or replicate arbitrary sounds, and it’s very limited in what it can say or talk about;

  • It’s scary. If you have older parents or grandparents, this is the first time you’ll feel genuinely worried about the future of voice calls. The time may have come to set up “safe words” to ensure we’re talking to who we think we’re talking to.

OpenAI is not open and it’ll be a for-profit company

OpenAI was supposed to be a non-profit research lab, but they’re far from that original idea at this point. What was once a research lab is now the highest-valued AI startup in the world, with mass consumer products like ChatGPT.

This week, rumors started circulating that OpenAI will turn into a for-profit company sometime in the future, probably sooner rather than later. Sam Altman pushed back on the articles claiming he’d get a 7% equity stake at a $150b valuation, trying to steer the narrative away from personal gain.

At the same time, Mira Murati (ex-CTO) resigned, and two other key members of OpenAI’s leadership departed with her. The leadership picture is now murkier than ever before, including compared to when the board fired Altman back in 2023.

o1-preview and o1-mini top the charts

OpenAI’s o1-preview and o1-mini now top the Chatbot Arena leaderboard and the Artificial Analysis benchmarks. This isn’t surprising, as the models can “reason” and outperform most competitors like Gemini 1.5 Pro and Claude 3.5 Sonnet (although the latter might still outperform o1 in coding).

The team also increased rate limits, which is why we’re now happy to announce the models are available in MindStudio. Give them a go!

On Wednesday this week, Meta hosted its yearly Meta Connect event, where Mark Zuckerberg presented the new $299 Quest 3S headset; Llama 3.2 in 1b, 3b, 11b, and 90b sizes (the larger two multimodal); the first holographic glasses prototype, codenamed “Orion”; and AI Studio & AI voice for the Meta AI platform, now likely the most used AI assistant in the world.

My two cents on the event:

  • Llama 3.2 is pretty much the same model as 3.1, but multimodal. This matters given how crucial vision is to true intelligence, and it helps Meta match the closed models from Google, Anthropic, and OpenAI;

  • Llama 3.2 1b is probably the most interesting of the bunch for people looking to run models locally. Up until now, you could run models on your device, but your device had to be a pretty powerful machine. With the 1b version, Llama 3.2 can run on almost any consumer device, including cheap laptops and phones (see the quick sketch after this list);

  • Meta AI Voice is impressive, but it’s on par with Gemini Live rather than Voice Mode. ChatGPT Voice Mode is by far the best speech-to-speech interaction you’ll have this year;

  • Meta will start auto-transcribing and translating all content into multiple languages, starting from English <> Spanish. This includes lip syncing, which means creators will be able to reach many more people without having to learn a new language!

  • Project Orion is very cool. Basically a Vision Pro without the headset, just fairly normal looking, albeit chunky, glasses. However, it’s a prototype. We’re unlikely to see any meaningful updates to this in the next year or two, and it definitely won’t touch the shelves until 2026 or later;

  • AI Studio will let you create characters for Meta Voice, or you can use one of the predefined templates from popular voice actors.
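
As an aside for the tinkerers: below is a minimal sketch of querying the 1b model locally through Ollama’s REST API. It assumes you’ve installed Ollama and pulled the model first; your model tag may differ.

```typescript
// Minimal sketch: query a locally running Llama 3.2 1b via Ollama's REST API.
// Assumes Ollama is installed and the model was pulled ("ollama pull llama3.2:1b").
async function askLocalLlama(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2:1b", // the 1b tag: small enough for cheap laptops
      prompt,
      stream: false, // ask for one JSON response instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response; // Ollama places the completion in the "response" field
}

askLocalLlama("In one sentence, why do small local models matter?").then(console.log);
```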

All in all, a very interesting event, especially if you’re into AI + VR. I’d suggest giving it a quick watch here.

Image credit: Salesforce

Zapier, Slack, Salesforce, and Squarespace all hosted events filled to the brim with AI announcements. Here’s the TLDR:

Regardless of the company, it’s clear we’re moving into “phase 2” of AI, where the more invisible it becomes, the better. Instead of slapping users in the face with AI everywhere, companies are starting to use AI to manage their pipelines, operations, workflows, and improve their productivity.

This is a shift we feel, and espouse, at MindStudio too.

🔥 Product Updates

Structured JSON and Cloud Functions

Welcome to AI cloud functions, a new paradigm in the MindStudio platform. Using the latest features in the API and workflow manager, developers can now build AI-powered functions to execute tasks like comment moderation, image analysis, or human-in-the-loop pipelines.

The new API comes with:

  • Run other apps within the editor: You can now run other MindStudio apps and workflows from within any app. This makes it even easier to interconnect all the apps in your workspace;

  • More integrations: We’re in the process of getting approved in the Make.com marketplace;

  • Launch Variables: You can now initialize all variables in the start block and skip the special syntax (“$launchVariables→variable”) altogether. Simply initialize the variable at the beginning and refer to it as usual later on;

  • Structured JSON outputs: stop wondering what format your AI will respond in. Results can now be saved in a proper, JSON-safe format so you can easily use the response in your codebase or automation platform (see the sketch below).
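
To make this concrete, here’s a hypothetical sketch of what a cloud-function call could look like from TypeScript, combining launch variables with a structured JSON result for a comment-moderation use case. The endpoint URL, auth scheme, and payload field names are illustrative assumptions, not the documented API; the tutorial linked below has the real shapes.

```typescript
// Hypothetical sketch only: the URL, headers, and field names are placeholders,
// not the documented MindStudio API. See the official tutorial for real shapes.
interface ModerationResult {
  isSpam: boolean;   // structured JSON output, parsed straight into a type
  language: string;
  reason: string;
}

async function moderateComment(comment: string): Promise<ModerationResult> {
  const res = await fetch("https://api.mindstudio.example/workflows/run", { // placeholder URL
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY", // placeholder auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      appId: "YOUR_APP_ID",         // placeholder identifier
      workflow: "moderate-comment", // placeholder workflow name
      variables: { comment },       // launch variables, initialized up front
    }),
  });
  const data = await res.json();
  return data.result as ModerationResult; // JSON-safe result, ready for your code
}
```

The point is the shape of the interaction: variables go in up front with no special syntax, and a typed JSON object comes back that your codebase or automation platform can consume directly.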

Learn more about the API and check out our tutorial here. We understand this is a big shift in how you think about MindStudio workflows, and we’re actively working on new resources to help you build with the API:

  • MindStudio AI Functions: copy-pastable functions you can use in any workflow. These will be hosted on the learn page of https://mindstudio.ai and include many popular use cases such as checking whether a comment is spam or written in another language;

  • MindStudio AI Packages: an app you can remix to get access to a set of AI functions. We have one ready for content moderation coming early next week;

  • Interactive Demos: with everything changing so rapidly, we understand you want constantly up-to-date documentation. While videos are great, they’re very hard to update. Going forward, we’ll use a mix of videos and interactive demos to help visual learners grasp new concepts quickly! Here’s an example of an interactive demo.

💡 Tip of The Week

Don’t obsess over tokens: use the large context windows of models like Gemini 1.5 Flash to test massive prompts against chain prompting.

There are two core strategies to prompt an LLM:

  • Chain prompting: multiple prompts, one after the other, each focused on a single task. For example, if you’re writing a blog post, you might generate the title first, then the outline, then the actual sections;

  • Mega prompts: these were the only option at the beginning. They’re usually massive, multi-page prompts with a huge set of instructions for the LLM to follow.
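
To make the difference concrete, here’s a minimal sketch of both strategies; `callModel` is a stand-in for whatever LLM call your platform exposes, not a real MindStudio function.

```typescript
// Stand-in for your platform's LLM call; name and signature are assumptions.
declare function callModel(prompt: string): Promise<string>;

// Chain prompting: several small, focused prompts, each feeding the next.
async function writePostChained(topic: string): Promise<string> {
  const title = await callModel(`Write a blog post title about: ${topic}`);
  const outline = await callModel(`Write an outline for a post titled: ${title}`);
  return callModel(`Write the full post following this outline:\n${outline}`);
}

// Mega prompt: one large prompt carrying every instruction at once.
async function writePostMega(topic: string): Promise<string> {
  return callModel(
    `You are a blog writer. For the topic "${topic}": devise a title, ` +
    `then an outline, then write the full post. Use short paragraphs ` +
    `and active voice throughout.`
  );
}
```

Chaining gives you checkpoints you can inspect, edit, or branch on; the mega prompt is simpler to maintain when the task is stable.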

Realistically speaking, you’ll likely need both in your workflows.

Nowadays, models can ingest a huge number of tokens, and while that can sound scary, costs have fallen dramatically over the last year and aren’t that big of an issue anymore. If you want to save money, try all your workflows with something like Llama 3.1 8b or Gemini 1.5 Flash first. They’re so cheap you’d struggle to spend over two cents even on significantly large prompts.
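
For a rough sense of scale (illustrative numbers, not a current price sheet): at, say, $0.10 per million input tokens, a 100,000-token mega prompt costs about one cent, and chaining ten 10,000-token prompts adds up to the same 100,000 tokens, still about a cent.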

Remember, AI is not a chatbot, and MindStudio is your personal IDE for crafting professional, custom AI solutions. Don’t limit yourself, and keep building!

🤝 Community Events

You can register for upcoming events on our brand new events page here.

Our new webinar series is up there as well, with a set of on-demand webinars, and we’ve added new weekly and bi-weekly events.

Thank you for being an invaluable member of our community; it’s always great to see many of you join multiple workshops 🔥

If you’re interested in any topic in particular, feel free to reply and I’ll do my best to include it in upcoming issues. We’re going to update all of these soon.

🌯 That’s a wrap!

Stay tuned to learn more about what’s next and get tips & tricks for your MindStudio build.

You saw it here first,

Giorgio Barilla
MindStudio Developer & Project Manager @ MindStudio

How did you like this issue?
