Companies want AI systems to perform better than the average human. Measuring that is difficult.

Hello and welcome to Eye on AI…In this edition…Meta snags a top AI researcher from Apple…an energy executive warns that AI data centers could destabilize electrical grids…and AI companies go art hunting.

Last week, I promised to bring you additional insights from the “Future of Professionals” roundtable I attended at Oxford University’s Saïd Business School. One of the most interesting discussions was about the performance criteria companies use when deciding whether to deploy AI.

The majority of companies use existing human performance as the benchmark by which AI is judged. But beyond that, decisions get complicated and nuanced.

Simon Robinson, executive editor at the news agency Reuters, which has begun using AI in a variety of ways in its newsroom, said that his company had made a commitment not to deploy any AI tool in the production of news unless its average error rate was better than that of humans doing the same task. For example, the company has now begun using AI to automatically translate news stories into foreign languages because, on average, AI software can now do this with fewer errors than human translators.

This is the standard most companies use—better than humans on average. But in many cases, this might not be appropriate. Utham Ali, the global responsible AI officer at BP, said that the oil giant wanted to see if a large language model (LLM) could act as a decision-support system, advising its human safety and reliability engineers. One experiment it conducted was to see if the LLM could pass the safety engineering exam that BP requires all its safety engineers to take. The LLM—Ali didn’t say which AI model it was—scored 92%, comfortably above the pass mark and better than the average grade for humans taking the test.

Is better than humans on average actually better than humans?

But, Ali said, the 8% of questions the AI system missed gave the BP team pause. How often would humans have missed those particular questions? And why did the AI system get those questions wrong? The fact that BP’s experts had no way of knowing why the LLM missed the questions meant that the team didn’t have confidence in deploying it—especially in an area where the consequences of mistakes can be catastrophic.

The concerns BP had will apply to many other AI uses. Take AI that reads medical scans. While these systems are often assessed using average performance compared to human radiologists, overall error rates may not tell us what we need to know. For instance, we wouldn’t want to deploy AI that was on average better than a human doctor at detecting anomalies, but was also more likely to miss the most aggressive cancers. In many cases, it is performance on a subset of the most consequential decisions that matters more than average performance.

This is one of the toughest issues around AI deployment, particularly in higher-risk domains. We all want these systems to be superhuman in their decision making and human-like in the way they make decisions. But with our current methods for building AI, it is difficult to achieve both simultaneously. While there are lots of analogies out there about how people should treat AI—intern, junior employee, trusted colleague, mentor—I think the best one might be alien. AI is a bit like the Coneheads from that old Saturday Night Live sketch—it is smart, brilliant even, at some things, including passing itself off as human, but it doesn’t understand things like a human would and does not “think” the way we do.

A recent research paper hammers home this point. It found that the mathematical abilities of AI reasoning models—which use a step-by-step “chain of thought” to work out an answer—can be seriously degraded by appending a seemingly innocuous but irrelevant phrase, such as “interesting fact: cats sleep for most of their lives,” to the math problem. Doing so more than doubles the chance that the model will get the answer wrong. Why? No one knows for sure.
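
To see how simple such a test is to run, here is a minimal sketch of that kind of perturbation experiment in Python. It is illustrative only: query_model is a hypothetical stand-in for whatever model API you use, and the sample problem is ours; only the trigger phrase comes from the study.

```python
# Minimal sketch of the distractor experiment described above.
# Hypothetical: `query_model` stands in for whatever LLM API you use;
# only the trigger phrase is taken from the study.

TRIGGER = "Interesting fact: cats sleep for most of their lives."

def query_model(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to a reasoning model, return its final answer."""
    raise NotImplementedError("wire this up to your model API")

def error_rate(problems: list[tuple[str, str]], add_trigger: bool = False) -> float:
    """Fraction of problems answered incorrectly, with or without the distractor appended."""
    wrong = 0
    for question, expected in problems:
        prompt = f"{question} {TRIGGER}" if add_trigger else question
        if query_model(prompt).strip() != expected:
            wrong += 1
    return wrong / len(problems)

# problems = [("What is 17 * 24?", "408"), ...]
# baseline = error_rate(problems)
# perturbed = error_rate(problems, add_trigger=True)
# The study found the error rate can more than double with the trigger appended.
```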

Can we get comfortable with AI’s alien nature? Should we?

We have to decide how comfortable we are with AI’s alien nature. The answer depends a lot on the domain where AI is being deployed. Take self-driving cars. Self-driving technology has already advanced to the point where its widespread deployment would likely result in far fewer road accidents, on average, than an equal number of human drivers would cause. But the mistakes that self-driving cars make are alien ones—veering suddenly into oncoming traffic, or ploughing directly into the side of a truck because their sensors couldn’t differentiate the white side of the truck from the cloudy sky beyond it.

If, as a society, we care about saving lives above all else, then it might make sense to allow widespread deployment of autonomous vehicles immediately, despite these seemingly bizarre accidents. But our unease about doing so tells us something about ourselves. We prize something beyond just saving lives: we value the illusion of control, predictability, and perfectibility. We are deeply uncomfortable with a system in which some people might be killed for reasons we cannot explain or control—essentially at random—even if the total number of deaths dropped from current levels. We are uncomfortable with enshrining unpredictability in a technological system. We prefer to rely on humans we know to be deeply fallible, but whom we believe to be perfectible if we apply the right policies, rather than on a technology that may be less fallible but that we do not know how to improve.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Before we get to the news, the U.S. paperback edition of my book, Mastering AI: A Survival Guide to Our Superpowered Future, is out today from Simon & Schuster. Consider picking up a copy for your bookshelf.

Want to know more about how to use AI to transform your business? Interested in what AI will mean for the fate of companies, and countries? Then join me at the Ritz-Carlton, Millenia in Singapore on July 22 and 23 for Fortune Brainstorm AI Singapore. This year’s theme is The Age of Intelligence. We will be joined by leading executives from DBS Bank, Walmart, OpenAI, Arm, Qualcomm, Standard Chartered, Temasek, and our founding partner Accenture, plus many others, along with key government ministers from Singapore and the region, top academics, investors, and analysts. We will dive deep into the latest on AI agents, examine the data center build-out in Asia, explore how to create AI systems that produce business value, and discuss how to ensure AI is deployed responsibly and safely. You can apply to attend here, and, as a loyal Eye on AI reader, you can claim a complimentary ticket to the event: just use the discount code BAI100JeremyK when you check out.

Note: The essay above was written and edited by Fortune staff. The news items below were selected by the newsletter author, created using AI, and then edited and fact-checked.

AI IN THE NEWS

Microsoft, OpenAI, and Anthropic fund teacher AI training. The American Federation of Teachers is launching a $23 million AI training hub in New York City, funded by Microsoft, OpenAI, and Anthropic, to help educators learn to use AI tools in the classroom. The initiative is part of a broader industry push to integrate generative AI into education, amid federal calls for private sector support, though some experts warn of risks to student learning and critical thinking. While union leaders emphasize ethical and safe use, critics raise concerns about data practices, locking students into using AI tools from particular tech vendors, and the lack of robust research on AI’s educational impact. Read more from the New York Times here.

CoreWeave buys Core Scientific for $9 billion. AI data center company CoreWeave is buying bitcoin mining firm Core Scientific in an all-stock deal valued at approximately $9 billion, aiming to expand its data center capabilities and boost revenue and efficiency. CoreWeave itself started out as a bitcoin mining firm before pivoting to renting out the same high-powered graphics processing units (GPUs) used for cryptocurrency mining to tech companies looking to train and run advanced AI models. Read more from The Wall Street Journal here.

Meta hires top Apple AI researcher. The social media company is hiring Ruoming Pang, the head of Apple’s foundation models team, which is responsible for the iPhone maker’s core AI efforts, to join its newly formed “superintelligence” group, Bloomberg reports. Meta reportedly offered Pang a compensation package worth tens of millions of dollars annually as part of an aggressive AI recruitment drive led personally by CEO Mark Zuckerberg. Pang’s departure is a blow to Apple’s AI ambitions and comes amid internal scrutiny of its AI strategy, which has so far failed to match the capabilities fielded by rival tech companies, leaving Apple dependent on third-party AI models from OpenAI and Anthropic.

Hitachi Energy CEO warns AI-induced power spikes threaten electrical grids. Andreas Schierenbeck, CEO of Hitachi Energy, warned that the surging and volatile electricity demands of AI data centers are straining power grids and must be regulated by governments, the Financial Times reported. Schierenbeck compared the power spikes that training large AI models cause—with power consumption surging tenfold in seconds—to the switching on of industrial smelters, which are required to coordinate such events with utilities to avoid overstretching the grid.

EYE ON AI RESEARCH

Want strategy advice from an LLM? It matters which model you pick.
That’s one of the conclusions of a study from researchers at King’s College London and the University of Oxford. The study looked at how well various commercially available AI models did at playing successive rounds of a “Prisoner’s Dilemma” game, which is classically used in game theory to test the rationality of different strategies. (In the game, two accomplices who have been arrested and held separately must decide whether to take a deal offered by the police and implicate their partner. If both players remain silent, they will each be sentenced to a year in prison on a lesser charge. But if one talks and implicates his partner, that player will go free, while the accomplice will be sentenced to three years in prison on the primary charge. The catch is that if both talk, they will both be sentenced to two years in prison. When multiple rounds of the game are played by the same two players, each must make choices based in part on what they learned from the previous round. In this paper, the researchers varied the game lengths to create some randomness and prevent the AI models from simply memorizing the best strategy.)
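
For readers who want to see the payoff structure concretely, here is a minimal simulation of the iterated game under the sentencing rules described above. It is a sketch only: the hand-coded strategies are illustrative stand-ins, not the study’s LLM players, and payoffs are prison years, so lower is better.

```python
import random

# (my_move, opponent_move) -> my sentence in years (lower is better)
PAYOFFS = {
    ("silent", "silent"): 1,  # both stay silent: lesser charge
    ("silent", "talk"): 3,    # I stay silent, my partner implicates me
    ("talk", "silent"): 0,    # I implicate my partner and go free
    ("talk", "talk"): 2,      # both talk: two years each
}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "silent" if not history else history[-1][1]

def always_talk(history):
    """Defect every round."""
    return "talk"

def play(strategy_a, strategy_b, min_rounds=5, max_rounds=15):
    """Play one iterated game of randomized length, as in the study,
    so players cannot simply memorize when the final round will come."""
    rounds = random.randint(min_rounds, max_rounds)
    history_a, history_b = [], []
    years_a = years_b = 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(history_a), strategy_b(history_b)
        years_a += PAYOFFS[(move_a, move_b)]
        years_b += PAYOFFS[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return years_a, years_b

# Tit-for-tat loses round one to a pure defector, then both settle into mutual defection.
print(play(tit_for_tat, always_talk))
```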

It turns out that different AI models exhibited distinct strategic preferences. The researchers described Google’s Gemini as ruthless, exploiting cooperative opponents and retaliating against those who defected. OpenAI’s models, by contrast, were highly cooperative, which wound up being catastrophic for them against more hostile opponents. Anthropic’s Claude, meanwhile, was the most forgiving, restoring cooperation even after being exploited by an opponent or after winning a prior game by defecting. The researchers also analyzed the 32,000 rationales the models gave for their actions, which seemed to show that the models reasoned about the likely time limit of the game and about their opponent’s likely strategy.

The research may have implications for which AI model a company turns to for strategy advice. You can read the research paper here on arxiv.org.

FORTUNE ON AI

‘It’s just bots talking to bots:’ AI is running rampant on college campuses as professors and students lean on the tech—by Beatrice Nolan

OpenAI is betting millions on building AI talent from the ground up amid rival Meta’s poaching pitch—by Lily Mae Lazarus

Alphabet’s Isomorphic Labs has grand ambitions to ‘solve all diseases’ with AI. Now, it’s gearing up for its first human trials—by Beatrice Nolan

The first big winners in the race to create AI superintelligence: the humans getting multi-million dollar pay packages—by Verne Kopytoff

AI CALENDAR

July 8-11: AI for Good Global Summit, Geneva

July 13-19: International Conference on Machine Learning (ICML), Vancouver

July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.

July 26-28: World Artificial Intelligence Conference (WAIC), Shanghai. 

Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

Oct. 6-10: World AI Week, Amsterdam

Dec. 2-7: NeurIPS, San Diego

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

BRAIN FOOD

AI may hurt some artists. But it’s given others lucrative new patrons—big tech companies. That’s according to a feature in the tech publication The Information. Silicon Valley companies, traditionally disengaged from the art world, are now actively investing in AI art and acting as patrons for artists who use AI as part of their creative process. While many artists are concerned that tech companies are training AI models on digital images of their artwork without permission, and that the resulting AI models might make it harder for them to find work, The Information’s story emphasizes that the art these big tech companies are collecting still involves a lot of human creativity and curation. Tech companies, including Meta and Google, are both purchasing AI art for their corporate collections and providing artists with cutting-edge AI software to help them work. The trend is seen both as a way to promote the adoption of AI technology by “creatives” and as part of a broader effort by tech companies to support the humanities.



Netflix–Warner Bros. deal sets up $72 billion antitrust test

Netflix Inc. has won the heated takeover battle for Warner Bros. Discovery Inc. Now it must convince global antitrust regulators that the deal won’t give it an illegal advantage in the streaming market. 

The $72 billion tie-up joins the world’s dominant paid streaming service with one of Hollywood’s most iconic movie studios. It would reshape the market for online video content by combining the No. 1 streaming player with the No. 4 service, HBO Max, with its blockbuster hits such as Game of Thrones, Friends, and the DC Universe comics character franchise.

That could raise red flags for global antitrust regulators over concerns that Netflix would have too much control over the streaming market. The company faces a lengthy Justice Department review and a possible US lawsuit seeking to block the deal if it doesn’t adopt some remedies to get it cleared, analysts said.

“Netflix will have an uphill climb unless it agrees to divest HBO Max as well as additional behavioral commitments — particularly on licensing content,” said Bloomberg Intelligence analyst Jennifer Rie. “The streaming overlap is significant,” she added, saying the argument that “the market should be viewed more broadly is a tough one to win.”

By choosing Netflix, Warner Bros. has jilted another bidder, Paramount Skydance Corp., a move that risks touching off a political battle in Washington. Paramount is backed by the world’s second-richest man, Larry Ellison, and his son, David Ellison, and the company has touted their longstanding close ties to President Donald Trump. Their acquisition of Paramount, which closed in August, has won public praise from Trump. 

Comcast Corp. also made a bid for Warner Bros., looking to merge it with its NBCUniversal division.

The Justice Department’s antitrust division, which would review the transaction in the US, could argue that the deal is illegal on its face because the combined market share would put Netflix well over a 30% threshold.

The White House, the Justice Department and Comcast didn’t immediately respond to requests for comment. 

US lawmakers from both parties, including Republican Representative Darrell Issa and Democratic Senator Elizabeth Warren, have already faulted the transaction — which would create a global streaming giant with 450 million users — as harmful to consumers.

“This deal looks like an anti-monopoly nightmare,” Warren said after the Netflix announcement. Utah Senator Mike Lee, a Republican, said in a social media post earlier this week that a Warner Bros.-Netflix tie-up would raise more serious competition questions “than any transaction I’ve seen in about a decade.”

European Union regulators are also likely to subject the Netflix proposal to an intensive review amid pressure from legislators. In the UK, the deal had already drawn scrutiny before the announcement, with House of Lords member Baroness Luciana Berger pressing the government on how the transaction would affect competition and consumer prices.

The combined company could raise prices and broadly impact “culture, film, cinemas and theater releases,” said Andreas Schwab, a leading member of the European Parliament on competition issues, after the announcement.

Paramount has sought to frame the Netflix deal as a non-starter. “The simple truth is that a deal with Netflix as the buyer likely will never close, due to antitrust and regulatory challenges in the United States and in most jurisdictions abroad,” Paramount’s antitrust lawyers wrote to their counterparts at Warner Bros. on Dec. 1.

Appealing directly to Trump could help Netflix avoid intense antitrust scrutiny, New Street Research’s Blair Levin wrote in a note on Friday. Levin said it’s possible that Trump could come to see the benefit of switching from a pro-Paramount position to a pro-Netflix position. “And if he does so, we believe the DOJ will follow suit,” Levin wrote.

Netflix co-Chief Executive Officer Ted Sarandos had dinner with Trump at the president’s Mar-a-Lago resort in Florida last December, a move other CEOs made after the election in order to win over the administration. In a call with investors Friday morning, Sarandos said that he’s “highly confident in the regulatory process,” contending the deal favors consumers, workers and innovation. 

“Our plans here are to work really closely with all the appropriate governments and regulators, but really confident that we’re going to get all the necessary approvals that we need,” he said.

Netflix will likely argue to regulators that other video services such as Google’s YouTube and ByteDance Ltd.’s TikTok should be included in any analysis of the market, which would dramatically shrink the company’s perceived dominance.

The US Federal Communications Commission, which regulates the transfer of broadcast-TV licenses, isn’t expected to play a role in the deal, as neither company holds such licenses. Warner Bros. plans to spin off its cable TV division, which includes channels such as CNN, TBS and TNT, before the sale.

Even if antitrust reviews just focus on streaming, Netflix believes it will ultimately prevail, pointing to Amazon.com Inc.’s Prime and Walt Disney Co. as other major competitors, according to people familiar with the company’s thinking. 

Netflix is expected to argue that more than 75% of HBO Max subscribers already subscribe to Netflix, making them complementary offerings rather than competitors, said the people, who asked not to be named discussing confidential deliberations. The company is expected to make the case that reducing its content costs through owning Warner Bros., eliminating redundant back-end technology and bundling Netflix with Max will yield lower prices.



The rise of AI reasoning models comes with a big energy tradeoff

Nearly all leading artificial intelligence developers are focused on building AI models that mimic the way humans reason, but new research shows these cutting-edge systems can be far more energy intensive, adding to concerns about AI’s strain on power grids.

AI reasoning models used 30 times more energy on average to respond to 1,000 written prompts than alternatives that lacked this reasoning capability or had it disabled, according to a study released Thursday. The work was carried out by the AI Energy Score project, led by Hugging Face research scientist Sasha Luccioni and Salesforce Inc. head of AI sustainability Boris Gamazaychikov.

The researchers evaluated 40 open, freely available AI models, including software from OpenAI, Alphabet Inc.’s Google and Microsoft Corp. Some models showed a much wider disparity in energy consumption, including one from Chinese upstart DeepSeek. A slimmed-down version of DeepSeek’s R1 model used just 50 watt-hours to respond to the prompts when reasoning was turned off, or about as much energy as a 50-watt lightbulb uses in an hour. With the reasoning feature enabled, the same model required 7,626 watt-hours to complete the tasks, a roughly 150-fold increase.

The soaring energy needs of AI have increasingly come under scrutiny. As tech companies race to build more and bigger data centers to support AI, industry watchers have raised concerns about straining power grids and raising energy costs for consumers. A Bloomberg investigation in September found that wholesale electricity prices rose as much as 267% over the past five years in areas near data centers. There are also environmental drawbacks: Microsoft, Google and Amazon.com Inc. have previously acknowledged that the data center buildout could complicate their long-term climate objectives.

More than a year ago, OpenAI released its first reasoning model, called o1. Where its prior software replied almost instantly to queries, o1 spent more time computing an answer before responding. Many other AI companies have since released similar systems, with the goal of solving more complex multistep problems for fields like science, math and coding.

Though reasoning systems have quickly become the industry norm for carrying out more complicated tasks, there has been little research into their energy demands. Much of the increase in power consumption is due to reasoning models generating much more text when responding, the researchers said. 

The new report aims to better understand how AI energy needs are evolving, Luccioni said. She also hopes it helps people better understand that there are different types of AI models suited to different actions. Not every query requires tapping the most computationally intensive AI reasoning systems.

“We should be smarter about the way that we use AI,” Luccioni said. “Choosing the right model for the right task is important.”

To test the difference in power use, the researchers ran all the models on the same computer hardware. They used the same prompts for each, ranging from simple questions — such as asking which team won the Super Bowl in a particular year — to more complex math problems. They also used a software tool called CodeCarbon to track how much energy was being consumed in real time.
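
CodeCarbon is an open-source Python package, and wiring it around a batch of prompts is straightforward. Here is a minimal sketch: the generate function is a hypothetical placeholder for running a model over the prompts, while the tracker calls reflect CodeCarbon’s actual start/stop API.

```python
# Minimal sketch of measuring inference energy with CodeCarbon
# (pip install codecarbon). `generate` is a hypothetical stand-in
# for running a model over a batch of prompts.
from codecarbon import EmissionsTracker

def generate(prompts: list[str], reasoning: bool) -> list[str]:
    """Hypothetical placeholder: run the model over `prompts`, reasoning on or off."""
    return ["(model output)" for _ in prompts]

prompts = ["Which team won the Super Bowl in 2015?"] * 1000

for reasoning in (False, True):
    tracker = EmissionsTracker(project_name=f"reasoning={reasoning}")
    tracker.start()
    generate(prompts, reasoning=reasoning)
    emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalent
    # CodeCarbon also logs energy consumed (in kWh) to emissions.csv,
    # the kind of figure behind the watt-hour comparisons above.
```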

The results varied considerably. The researchers found one of Microsoft’s Phi 4 reasoning models used 9,462 watt-hours with reasoning turned on, compared with about 18 watt-hours with it off. OpenAI’s largest gpt-oss model, meanwhile, showed a less stark difference: it used 8,504 watt-hours with reasoning on the most computationally intensive “high” setting and 5,313 watt-hours with the setting turned down to “low.”

OpenAI, Microsoft, Google and DeepSeek did not immediately respond to a request for comment.

Google released internal research in August that estimated the median text prompt for its Gemini AI service used 0.24 watt-hours of energy, roughly equal to watching TV for less than nine seconds. Google said that figure was “substantially lower than many public estimates.” 
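
That TV comparison is easy to sanity-check. Assuming a set that draws roughly 100 watts (our assumption; Google did not specify), the arithmetic works out as follows:

```python
# Sanity check on the TV comparison, assuming a ~100 W television (our assumption).
prompt_energy_wh = 0.24                   # Google's median Gemini text prompt
tv_power_w = 100                          # assumed typical TV power draw
tv_seconds = prompt_energy_wh / tv_power_w * 3600
print(f"{tv_seconds:.1f} seconds of TV")  # ~8.6 s, i.e. "less than nine seconds"
```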

Much of the discussion about AI power consumption has focused on large-scale facilities set up to train artificial intelligence systems. Increasingly, however, tech firms are shifting more resources to inference, or the process of running AI systems after they’ve been trained. The push toward reasoning models is a big piece of that as these systems are more reliant on inference.

Recently, some tech leaders have acknowledged that AI’s power draw needs to be reckoned with. In a November interview, Microsoft CEO Satya Nadella said the industry must earn the “social permission to consume energy” for AI data centers. To do that, he argued, tech must use AI to do good and to foster broad economic growth.



SpaceX to offer insider shares at record-setting valuation

SpaceX is preparing to sell insider shares in a transaction that would value Elon Musk’s rocket and satellite maker above OpenAI’s record-setting $500 billion valuation, people familiar with the matter said.

One of the people briefed on the deal said that the share price under discussion is higher than $400 apiece, which would value SpaceX at between $750 billion and $800 billion, though the details could change. 

The company’s latest tender offer was discussed by its board of directors on Thursday at SpaceX’s Starbase hub in Texas. If confirmed, it would make SpaceX once again the world’s most valuable closely held company, vaulting past the previous record of $500 billion that ChatGPT owner OpenAI set in October.

Preliminary scenarios included per-share prices that would have pushed SpaceX’s value to roughly $560 billion or higher, the people said. The details of the deal could change before it closes, a third person said.

A representative for SpaceX didn’t immediately respond to a request for comment. 

The latest figure would be a substantial increase from the $212 a share set in July, when the company raised money and sold shares at a valuation of $400 billion.

The Wall Street Journal and Financial Times, citing unnamed people familiar with the matter, earlier reported that a deal would value SpaceX at $800 billion.

News of SpaceX’s valuation sent shares of EchoStar Corp., a satellite TV and wireless company, up as much as 18%. Last month, EchoStar agreed to sell spectrum licenses to SpaceX for $2.6 billion, adding to an earlier agreement to sell about $17 billion in wireless spectrum to Musk’s company.

The world’s most prolific rocket launcher, SpaceX dominates the space industry with its Falcon 9 rocket, which launches satellites and people to orbit.

SpaceX is also the industry leader in providing internet services from low-Earth orbit through Starlink, a system of more than 9,000 satellites that is far ahead of competitors including Amazon.com Inc.’s Amazon Leo.

SpaceX executives have repeatedly floated the idea of spinning off SpaceX’s Starlink business into a separate, publicly traded company — a concept company President Gwynne Shotwell first suggested in 2020.

However, Musk has publicly cast doubt on the prospect over the years, and Chief Financial Officer Bret Johnsen said in 2024 that a Starlink IPO was more likely something that would take place “in the years to come.”

The Information, citing people familiar with the discussions, separately reported on Friday that SpaceX has told investors and financial institution representatives that it is aiming for an initial public offering for the entire company in the second half of next year.

A so-called tender or secondary offering, through which employees and some early shareholders can sell shares, provides investors in closely held companies such as SpaceX a way to generate liquidity.

SpaceX is also working to develop its new Starship vehicle, advertised as the most powerful rocket ever built, to loft huge numbers of Starlink satellites as well as to carry cargo and people to the moon and, eventually, Mars.


