AI Breakthroughs: GPT-4o, new Gemini 1.5 Flash, 14 New Models in MindStudio, Multimodal Assistants
This week, we got two new models from Google and OpenAI, including the new state-of-the-art GPT-4o, plus plenty of news from Google, including AI Overviews in Search.
You're receiving this email because you registered for one of our workshops. You can unsubscribe at the bottom of each email at any time.
MindStudio is on (another) roll. This week, we launched 14 new models in the platform, including the new state-of-the-art GPT-4o, the highly requested Gemini 1.5 Pro, and Perplexity’s Sonar.
The AI industry as a whole got a sneak peek of the future with OpenAI and Google both announcing monumental changes to their products. OpenAI released the most natural voice assistant ever, Google shipped AI Overviews in the US and expanded Gemini’s context to 2 million tokens, and Microsoft entered a deal with France for a €4 billion investment.
Today’s newsletter will be longer than usual - but what a ride!
What’s coming next:
- Audio models (in and out) like Eleven Labs and GPT-4o voice
- Live search with citations from Perplexity’s API
- More RAG options to cover a larger number of use cases
- More workshops to take you from zero to hero in MindStudio (100% free)
As a reminder, we’re now welcoming partners that want to build AIs for their clients. Sign up for extra support, training resources, and more here.
🗞️ Industry news
OpenAI Releases GPT-4o, a New SOTA and Truly Multimodal Model
On Monday, OpenAI hosted a surprise spring update to announce the new ChatGPT app and GPT-4o. GPT-4o is truly multimodal from a single inference point, rather than calling different models to “mimic” multimodality.
Core differentiators:
- GPT-4o is now the only model ever to go past 1300 Elo on the LMSYS leaderboard, scoring 1310 in coding. Overall, the model scored 1289 points, +31 over Gemini 1.5 Pro and +33 over Claude 3 Opus. It’s the new state-of-the-art model;
- GPT-4o’s voice modality is shockingly human. The team showcased it in ChatGPT: the AI could practically flirt with the presenters, raise or lower its tone of voice on request, and respond with latency very close to human level. You won’t have to wait seconds before getting a response anymore. See the demo here;
- GPT-4o is 2x cheaper and 50% faster than GPT-4 Turbo, putting it close to GPT-3.5 Turbo in speed;
- GPT-4o will be available for free in ChatGPT with rate limits. This might be OpenAI’s attempt to gather training data from users rather than face public outcry whenever a new training data source is revealed.
MindStudio added GPT-4o to the model list just hours after release. You can start playing with it today.
Google announces Gemini 1.5 Flash and expands the max context window to 2m tokens
On Tuesday, Google hosted Google I/O 2024 in Mountain View and stuffed AI into every core product they own. In just a few hours, the team announced:
- AI Overviews in Search are now rolling out to everyone in the US for generic queries like “how to plan a trip”;
- Gemini 1.5 Flash, a new small model that outperforms others in its class. The model will have the same context size as its bigger brother but costs nearly 10x less;
- AI in Workspace products such as Google Drive, Google Slides, Google Docs, etc. powered by Gemini 1.5. They seem to be setting aside Gemini 1.0 Ultra for the time being;
- A new, massive 2m-token context window rolling out very soon to all users;
- New image and video models to generate and understand multimodal content;
- Project Astra, an alternative to the new ChatGPT assistant presented on Monday. No idea on when this will go live, though;
- Context caching and other token optimization techniques. We’ll look into ways to add these to MindStudio when they become generally available.
Plus a plethora of other AI updates in Android 15, Pixel 8a, Google Lens, Google Search, Labs, and more. Google went ALL-IN this year, and Apple will show its response in June.
Microsoft's Largest Investment in France to Boost AI and Innovation
At the Choose France summit, Microsoft announced a historic €4 billion investment to accelerate AI adoption and innovation in France.
This investment, the largest in Microsoft's 41-year history in the country, aims to enhance cloud and AI infrastructure, train 1 million individuals, and support 2,500 AI startups by 2027.
The initiative aligns with France's National Strategy for AI, positioning the country as a leader in AI development and usage. Microsoft will expand its datacenter footprint and build a new campus in Mulhouse, boosting France's competitive edge and data security.
Notably, this week also saw two leaders at OpenAI leave the company.
Ilya Sutskever, ex-Chief Scientist, officially resigned and wrote a long post on X wishing OpenAI the best.
Jan Leike, ex-co-lead of Superalignment and Ilya’s colleague, also resigned. He didn’t spare many words: his post literally reads “I resigned” - that’s it.
🔥 Product Updates
This week, MindStudio released 14 new models to the platform. You can now start building with:
Gemini 1.5 Pro: the highly requested Google model with a 1m context size (soon, 2m!). With this model, you can analyze and summarize huge documents and entire books.
GPT-4o: the new SOTA model from OpenAI became available in the platform a few hours after launch. Enjoy GPT-4 Turbo capabilities at half the cost;
Llama 3: the Llama 3 family includes 8b, the smallest, and 70b, the mid-size model. A 400b parameters Llama 3 is in training. The new models are very cost-efficient given their reasoning capabilities and can serve as great allies in your workflows;
Perplexity API: Perplexity offers 4 models, 2 “chat” (offline) and 2 “online” models, all based on Llama 3. The online models trigger a live search through the popular AI search engine and feed the results into the prompt. This means your workflow can now fetch and organize real-time data from search rather than stopping at the model’s cutoff date (see the sketch after this list);
Reka models: Reka is a new startup whose vision models nearly match GPT-4 performance. We’re excited to support them in MindStudio;
Mistral: we’ve now added Mixtral 8×22b, one of the leading open-source models, with great capabilities and open terms and conditions;
Command: Cohere is one of the leading model providers building multilingual capabilities. Their Command R and Command R+ sometimes feel more human than other models when prompting in Italian, French, German, and other European and Asian languages.
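If you’re curious what the “online” models do under the hood, here’s a minimal sketch of calling Perplexity’s API directly, outside of MindStudio. It assumes Perplexity’s OpenAI-compatible chat completions endpoint and uses an illustrative model name - check Perplexity’s docs for the current model list. Inside MindStudio you simply pick the model; no code required.

# Minimal sketch (not MindStudio code): querying a Perplexity "online" model directly.
# Assumptions: an OpenAI-compatible endpoint at https://api.perplexity.ai and an
# illustrative model name; replace both with whatever Perplexity currently documents.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",       # placeholder key
    base_url="https://api.perplexity.ai",    # Perplexity's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="llama-3-sonar-large-32k-online",  # illustrative "online" (live search) model
    messages=[
        {"role": "system", "content": "Answer concisely and cite your sources."},
        {"role": "user", "content": "What did Google announce at I/O 2024?"},
    ],
)

print(response.choices[0].message.content)

The same request against a “chat” model skips the live search and answers from the model’s training data alone.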
Together with in-prompt conditional logic, this batch of models builds on MindStudio’s model agnosticism and lets you choose the best model for the job. Remember, you can always test different models against each other with the Profiler.
Our next focus areas are:
Audio: this is now our #1 priority. We want to deliver audio input and output, giving you even more flexibility in how users can interact with your workflows;
Image blocks: in the future, you won’t need your own API key for models like DALL-E 3 or Stability AI;
Vision: make use of Gemini and GPT-4o’s multimodality to give users an upload block, similar to other apps;
Dark Mode: another popular feature request is coming very soon for all users.
Let us know what you think of these updates and how we can make MindStudio even more relevant for your builds.
💡 Tip of The Week
We’ll release a full video about the new conditional logic in prompts, but if you’re here, it’s because you like to get into the nitty gritty of all of this.
So let’s do just that 😉
Conditional logic in prompts lets you collect optional inputs and then branch your prompts based on whether they were provided.
You might have the inputs:
name (mandatory) => variable “name”
LinkedIn URL (mandatory) => variable “linkedin”
CV file upload (optional) => variable “cv”
In the following send message block, you can craft two prompts. One will assume the user uploaded a CV, the other will act as if they didn’t.
For example:
{{#if cv}}
Human provided the following info:
Name: {{name}}
LinkedIn Profile (extract of their public URL): {{linkedin}}
CV: {{cv}}
Help them craft a better description for their LinkedIn profile.
{{else}}
Human provided the following info:
Name: {{name}}
LinkedIn Profile (extract of their public URL): {{linkedin}}
Help them craft a CV to land their dream job.
{{/if}}
As you can see, the if/then condition picks the optional field and branches the prompt based on whether the user uploaded a CV. If they did, the AI will help them craft a better description. If they didn’t, it will help them craft the CV first.
The conditional logic needs three components to work:
{{#if variable}}
{{else}}
{{/if}}
{{#if variable}} opens the condition, {{/if}} closes it, and {{else}} determines what happens when the condition isn’t met.
The condition will always check if "variable X" exists, with X being the CV in the example above. Currently, you can only verify if any value is present in the variable, not whether it contains a specific value.
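If your workflow has more than one optional input, you can repeat the pattern with a separate block per variable. Here’s a minimal sketch, assuming independent blocks can be stacked in the same prompt; the “portfolio” variable is purely hypothetical:
Human provided the following info:
Name: {{name}}
{{#if cv}}
CV: {{cv}}
{{else}}
(The human did not upload a CV.)
{{/if}}
{{#if portfolio}}
Portfolio (public URL): {{portfolio}}
{{else}}
(The human did not share a portfolio.)
{{/if}}
Tailor your advice to whatever information is available.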
🤝 Community Events
If you want to hang out with our team, we usually host a Discord event every Friday @ 3PM Eastern. Join our Discord channel to keep up to date with the hangouts - our entire team is hyperactive there.
I’ve also been working on a set of new workshops. We will publish the first batch next week and go from there.
There’s a kicker for these new workshops. We took the learnings from the previous ones to deliver a better, more streamlined experience optimized to make you learn more in less time. I’m confident you will enjoy the new format.
The first two editions will focus on data sources and one live build. We’ve uploaded the recordings of most of the previous workshops to YouTube - make sure to check them out here if you haven’t already.
Thank you for being an invaluable member of our community - it’s always great to see so many of you join multiple workshops 🔥
🌯 That’s a wrap!
Stay tuned to learn more about what’s next and get tips & tricks for your MindStudio build.
You saw it here first,
Giorgio Barilla
MindStudio Developer & Project Manager @ MindStudio
How did you like this issue?