- AI Architect by MindStudio
OpenAI o1 (Strawberry) is here + Ideas to use MindStudio's API v2
After a long hiatus, OpenAI is back with a new model. o1 and o1-mini can "think" for minutes at a time and reach PhD-level performance in STEM subjects.
You're receiving this email because you registered for one of our workshops. You can unsubscribe at the bottom of each email at any time.
This week, we released v2 of our API. With v2, your API key is tied to your workspace rather than to individual apps. With one call, you can interact with all the apps and workflows in your workspace!
On Thursday, OpenAI ended their long hiatus by releasing OpenAI o1 and o1-mini, dropping the GPT name to signal a completely new paradigm. Sam Altman is looking to raise $7b at a $150b valuation, ChatGPT reached 11m paid subscribers, and Replit & Cursor added o1 and o1-mini to their AI coding assistants.
While this newsletter will be somewhat of an OpenAI exclusive, it's worth noting that Mistral also released a new model through a magnet link on X (likely indicating it's open source) and Hume introduced EVI 2, a very impressive empathetic voice-to-voice AI model.
Continue reading to learn more!
Resources for Pros: what's coming next includes more types of data sources and data retrieval techniques (e.g. GraphRAG), workspace-level knowledge sources, and (maybe) an AI designer template.
As a reminder, we’re now welcoming partners that want to build AIs for their clients. Sign up for extra support, training resources, and more here.
🗞️ Industry news
Copyright OpenAI
TLDR: o1-preview and o1-mini are the new models released on Thursday. Both are available to ChatGPT Plus subscribers. The new family of models "thinks" before answering and reached PhD-level test scores in STEM subjects. Responses take longer, and these models shouldn't be used for generic chatting. The new chain-of-thought paradigm is likely to fire up competition, and we might soon see Gemini 2 or Claude 3.5 Opus compete on the same approach.
On Thursday morning, a few AI enthusiasts received an email from The Information hinting OpenAI would release the “Strawberry” model by the end of the week. Given OpenAI doesn’t typically release anything on a Friday, that meant we’d see it during the day, likely at 10am PT.
That's exactly what happened. Sam Altman broke the silence alongside his team of engineers, and everyone's feed got inundated with o1 and o1-mini news.
o1 and o1-mini are a new kind of AI model, and that’s why they don’t use the “GPT” name. They’re large language models, but they “think” or “reason” before replying. The assumption here is that the longer they think, the better the end response will be.
o1-mini and o1-preview are available right now in ChatGPT Plus and through the API, with severe rate limiting. This is a breath of fresh air after so many promised products we're still waiting on months later.
It's important to note o1 and o1-mini are truly new models, the likes of which we probably haven't seen since GPT-4 in March 2023 (but I'll wait for third-party evals before confirming). GPT-4o, GPT-4o mini, and the variations of GPT-4o were marginal improvements in reasoning that didn't really bring OpenAI any closer to AGI. o1 and o1-mini aren't AGI either, but you WILL see a significant difference, and OpenAI's internal data shows these models are much better than GPT-4o at STEM subjects.
Artificial Analysis reports GPQA results for o1-preview, leapfrogging Claude 3.5 Sonnet
Of course, I tested the model a couple of minutes after release like any nerd who's been following OpenAI for years, and here are my takeaways for now. Keep in mind these are very much first impressions, and I'll share more in the next newsletter and once the model gets into MindStudio:
This is a true, massive update. o1 and o1-mini appear to be leagues above other models at solving complex math equations, reasoning about controversial topics, and planning out entire reports. OpenAI's internal research concluded these models match up with near-PhDs in STEM subjects. Chain of thought is not new, and people have been mixing and matching models to reach these results for a long time, but o1 seems to be on a different level;
These models are terrible for chat interfaces. Many AI beginners will be disappointed to learn these models aren’t meant to be used in chat interfaces. They require minutes to reply to a single query, their output is huge (up to 65,536 tokens for o1-mini), and they’re severely limited. ChatGPT Plus users get 30 chats per WEEK on o1-preview and 50 on o1-mini;
o1-preview is weird. OpenAI's own chart suggests o1-preview performs worse than o1-mini on some benchmarks while costing much, much more. It doesn't make much sense, and in my limited testing I did find o1-mini performs very well at reasoning. I believe the main difference is the output engine: o1-preview seems to match GPT-4o quality, while o1-mini is less creative;
o1-mini better than o1-preview
o1-preview is 4x more expensive than GPT-4o, and o1-mini is 20x more expensive than GPT-4o mini. o1-mini might come to ChatGPT Free, but o1 likely won't. Currently, only Claude 3 Opus is more expensive than o1-preview;
These models require a very large amount of compute to "think" (chain of thought), and some on X are wondering how far this can go if they reason for minutes, days, or potentially weeks. If the answer keeps getting better the longer the model thinks, then energy consumption could become seriously problematic, much more than it currently is;
OpenAI’s o1-mini “thinking” and showing the steps
OpenAI clarified they're targeting STEM subjects with these models, and that generic chats are still better served by the old GPT-4o and GPT-4o mini, which are also more cost effective. Imagine using these to help doctors diagnose diseases, project the financial outcomes of investments, invest in the stock market, or manage a city's power plants. This is not a model for random chats about your day;
o1 in ChatGPT doesn't get live access to the web, document upload, or custom GPT support. The API is also limited: no fine-tuning, no function calling, and no JSON mode.
The o1 family will change the strategy of all the other players as well. Just like GPT-4 kicked off a race to match its performance, o1 will fire up Meta, Google, and Anthropic. I'm now expecting Gemini 2 and Claude 3.5 Opus to arrive much sooner, and Meta might want to step up investment in Llama 4.
o1 might be Sam's gateway to new funding. He's been talking about a new raise, estimated at around $7b at a $150b valuation. The release boosted ChatGPT subscriptions (now at 11m paid users) and usage: all good metrics to have weeks before a huge round.
Overall, you should keep your expectations in check: this model isn't necessarily more creative than GPT-4o, and it actually scored worse than the old family in some preference tests. What the new family excels at might not matter for your workflow, but it matters for the advancement of AI as a whole and for the new chain-of-thought paradigm.
Hume released EVI 2, their new foundational voice-to-voice model. It's essentially voice mode with a focus on empathy.
The new model is much better than EVI 1, which frequently lagged and didn’t let you interrupt it. EVI 2:
Lets you interrupt. Fast inference and a better UI make it easier to have long, multifaceted discussions with EVI 2;
New voices. The updated voices and tones are a refreshing improvement. EVI 1 felt clunky; EVI 2 sounds very natural across nearly all emotions (excited, scared, happy, sad, etc.);
The company released EVI 2 Small, and will release the full-scale model soon;
EVI 2 is incapable of cloning voices without modifications to its code. This is by design.
All in all, this is a great release from the Hume team, which deserves kudos for one of the few genuinely usable voice-to-voice experiences. And there's no waitlist.
In true Mistral fashion, the company released its new model via a direct magnet link on X. This also suggests the new model is likely open source, unlike the most recent Mistral Large 2.
We know very little about this new model. Some users suggest the name might be “Pixtral 12b”, potentially replacing Nemo and competing with GPT-4o mini.
The arena for lightweight models is crowded, full of strong options like GPT-4o mini, Gemini 1.5 Flash, and Llama 3.1 8b & 70b. Mistral might have a hard time spreading the word, and the o1 announcement greatly overshadowed this release.
I'm looking forward to testing the model, and we'll probably add it to MindStudio in an upcoming model release.
🔥 Product Updates
Overview panel of API v2
This week, we released v2 of our MindStudio API.
Here’s what’s new:
Workspace-Level API Management: Manage your API settings directly from your workspace, with access to high-level statistics on successful and failed requests;
API Keys Management: View, create, and delete API keys easily. Track when each key was created and last used;
API Logs: Access detailed logs of API requests, including request and response data, HTTP status codes, and source information;
Debugger Improvements: API requests are now visible from the debugger and tagged as “API” for easy identification during development;
Help & Documentation: Access sample requests and documentation directly from the API settings in your workspace.
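To give a rough idea of what the workspace-level model means in practice, here's a minimal sketch of building a request. The endpoint URL and the payload fields (`appId`, `workflow`, `variables`) are illustrative assumptions, not MindStudio's documented API; copy the real sample requests from the API settings in your workspace.

```javascript
// Minimal sketch of a workspace-level call (Node 18+, global fetch).
// One workspace-scoped key can target any app or workflow, so the target
// lives in the payload instead of in the key. Everything MindStudio-specific
// here (endpoint, field names) is a guess -- check the official docs.

const API_ENDPOINT = "https://YOUR_API_ENDPOINT"; // copy from workspace settings

function buildWorkflowRequest(apiKey, appId, workflow, variables) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // one key for the whole workspace
    },
    body: JSON.stringify({ appId, workflow, variables }),
  };
}

// Usage (not executed here):
// const res = await fetch(API_ENDPOINT,
//   buildWorkflowRequest("sk-...", "app_123", "FactCheck", { text: "..." }));
```

The nice part of the workspace-level design is that switching the call to a different app or workflow only changes the payload, never the credentials.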
If you're currently using the API, you'll need to update to v2 before October 11, 2024. All Zaps need to be updated to MindStudio 1.0.3 as well. Please do this as soon as possible to avoid losing your connection.
For the visual learners out there, Luis recorded a great video overview of API v2 to showcase the improvements and the vision for the future. You can find it here.
💡 Tip of The Week
8 MindStudio flows working in Google Docs
The new API makes it much easier to integrate MindStudio apps and workflows into the apps you already work in. For example, I integrated 8 MindStudio workflows into a Google Doc (image above) using Google Apps Script. To go deeper into a few of these:
Generate an image: the script runs, calls a MindStudio flow using Ideogram v2 to generate an image, and returns the image to Google Docs. The Apps Script also takes care of rendering the image;
Fact check: for this one, MindStudio's flow uses Perplexity to search the web for the selected text and fact-check it;
Fact checking the selected text
Write a section: this one renders a modern Tailwind-based form to capture the topic, tone of voice, and desired length for a new section of text and writes it out with Gemini 1.5 Pro;
The form appears in a popup
All flows generate the correct asset in Google Docs (e.g. Write a Section generates HTML to render titles and formatting, Generate Table follows the required format to create a real table in Google Docs, and so on).
Building Google Apps Script functions that interface with APIs is quite easy, and I did it all with Claude 3.5 Sonnet in MindStudio. It took about an hour, but I think the result is pretty great for a non-coder.
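As a hedged sketch of what that Apps Script glue can look like: `UrlFetchApp`, `DocumentApp`, and `PropertiesService` are real Apps Script services, but the endpoint, payload, and response shape below are illustrative assumptions, not MindStudio's actual API.

```javascript
// Pure helper: pull the generated text out of a hypothetical API response.
// Kept separate from the Apps Script services so it can be tested anywhere.
function extractResult(responseText) {
  const data = JSON.parse(responseText);
  return data.result || ""; // "result" is an assumed field name
}

// Apps Script entry point (only runs inside a Google Doc):
function writeSection(topic) {
  // Store the key in script properties, never in the code itself.
  const apiKey = PropertiesService.getScriptProperties()
    .getProperty("MINDSTUDIO_KEY");
  const response = UrlFetchApp.fetch("https://YOUR_API_ENDPOINT", {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: "Bearer " + apiKey },
    payload: JSON.stringify({ workflow: "WriteSection", variables: { topic } }),
  });
  const text = extractResult(response.getContentText());
  // Append the generated section to the active document.
  DocumentApp.getActiveDocument().getBody().appendParagraph(text);
}
```

Hooking a function like `writeSection` up to a custom menu (via `onOpen` and `Ui.createMenu`) is what gives you the one-click experience from the screenshot above.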
Google apps aren't the only place you can call the API from. Since the release of v2, I've also made MindStudio flows run in Coda documents, Toddle.dev apps, Bubble.io no-code apps, and more.
While it requires more technical knowledge than an embed, the API gives you an unparalleled level of freedom to design the experiences you want, where you need them most, rather than using a separate tool that might not fit your flows.
Are you interested in a series of tutorials on API v2? This could be a 5-6 video playlist diving deeper into how interfacing with the API works across different business software.
🤝 Community Events
You can register for upcoming events on our brand new events page here.
Our new webinar series is up there as well, with the following on-demand webinars:
Plus, we have new weekly and bi-weekly events:
Thank you for being an invaluable member of our community; it's always great to see so many of you join multiple workshops 🔥
If you're interested in any topic in particular, feel free to reply and I'll do my best to include it in upcoming issues. We're going to update all of these soon.
🌯 That’s a wrap!
Stay tuned to learn more about what’s next and get tips & tricks for your MindStudio build.
You saw it here first,
Giorgio Barilla
MindStudio Developer & Project Manager @ MindStudio
How did you like this issue?