AI: Weekly Summary. RTZ #583

Michael Parekh AI: Reset to Zero

6 months ago

OpenAI leads in AI Reasoning going into 2025: OpenAI, Google and others are ending the year neck and neck on AI Reasoning product innovations, but it looks like OpenAI is ahead by a nose with o3. The company presented this latest reasoning model on the 12th day of its AI ‘Shipmas’, and the initial reaction has been positive. Early performance against the toughest benchmarks has been promising to say the least, with more progress to come. The new model performs far better than o1, released only 3 months ago. The company skipped the o2 name due to a naming conflict with a UK telco. The actual product won’t be available to most users until early 2025, though; for now it goes into safety evaluations with trusted partners. Google of course announced its Gemini Experimental Reasoning product, which is also promising. Other LLM AI companies like Anthropic, xAI, and others globally are also in the race. More here.

OpenAI focused on robots again: OpenAI apparently is again interested in making humanoid robots, leaning into the general AI industry ramp on foundation-LLM-based robots. If true, OpenAI goes up against Elon Musk’s Optimus robot efforts, led by Tesla and xAI, not to mention Google, Nvidia and others, plus a host of AI robot startups in the US and China in particular. It’s an area of long-term promise, but a LOT of near-term work, especially in the training and inference data needed to scale humanoid robot capabilities both for general-purpose use and for various vertical industry activities. At the very least, OpenAI’s efforts would give it a front-row seat to the development expertise in this early area of applying AI vigorously in the physical world. More here.

Microsoft/OpenAI new deal negotiations: New reports have Microsoft and OpenAI going into next year renegotiating the terms of their iconic partnership, given the expected OpenAI transition from a non-profit to a for-profit (PBC) structure. Much apparently revolves around OpenAI’s definitions of, and actual path to, AGI, with recent reports suggesting that a trigger may be OpenAI reaching $100 billion in profits, presumably GAAP net profits. If so, only a handful of companies like Aramco and Apple would qualify today, which may indicate a lengthier timeline, unless of course these terms are renegotiated. Both companies are actively focused on making their current partnership work, especially in building AI data centers, despite the alignment questions for the two companies going forward. More here.

Industry leaning into AI ‘risk-on’ into 2025: We end 2024 with a record amount of big tech investment in AI data centers, on a path to potentially see a trillion dollars or more invested over this decade. And that includes the Power investments ahead for this ramp of course. Firms like Goldman Sachs are positing that ‘for Big Tech, AI Capex could be the new M&A’. And of course there is the ongoing debate around the returns on these investments relative to the current upfront expenditures. Expect this debate to get louder next year. But for now, most of the stakeholders are in a ‘risk-on’ mode going into 2025. More here.

AI Evals Evolving: As the year ends with relentless LLM AI reasoning and agentic development and competition amongst the major players, the need for ever tougher evaluation tests for these models is accelerating. There is a growing need to evaluate and benchmark these models, to better compare their capabilities against one another, both from an academic research perspective and of course for the commercial decisions of companies and developers putting these models to work. Another priority is having capable methodologies to audit the safety of these models, and the ability to keep them within operational guidelines as AI scales. Lastly, as end users combine multiple models large and small, the ‘interoperability’ of these models is also an increasing evaluation issue. And that calls for its own new set of technologies to accomplish that difficult, opaque task. Expect the industry to expend significant resources on this front as LLM AIs scale into next year and beyond. More here.