Companies want AI systems to perform better than the average human. Measuring that is difficult.

Hello and welcome to Eye on AI…In this edition…Meta snags a top AI researcher from Apple…an energy executive warns that AI data centers could destabilize electrical grids…and AI companies go art hunting.

Last week, I promised to bring you additional insights from the “Future of Professionals” roundtable I attended at Oxford University’s Saïd Business School. One of the most interesting discussions was about the performance criteria companies use when deciding whether to deploy AI.

The majority of companies use existing human performance as the benchmark by which AI is judged. But beyond that, decisions get complicated and nuanced.

Simon Robinson, executive editor at the news agency Reuters, which has begun using AI in a variety of ways in its newsroom, said the company had committed not to deploy any AI tool in the production of news unless its average error rate was lower than that of humans doing the same task. So, for example, Reuters has begun using AI to automatically translate news stories into other languages, because on average the software can now do this with fewer errors than human translators.

This is the standard most companies use—better than humans on average. But in many cases, this might not be appropriate. Utham Ali, the global responsible AI officer at BP, said the oil giant wanted to see if a large language model (LLM) could act as a decision-support system, advising its human safety and reliability engineers. One experiment was to see if an LLM could pass the safety engineering exam that BP requires all its safety engineers to take. The LLM—Ali didn’t say which model—did well, scoring 92%, well above the pass mark and better than the average grade for humans taking the test.

Is better than humans on average actually better than humans?

But, Ali said, the 8% of questions the AI system missed gave the BP team pause. How often would humans have missed those particular questions? And why did the AI system get those questions wrong? The fact that BP’s experts had no way of knowing why the LLM missed the questions meant that the team didn’t have confidence in deploying it—especially in an area where the consequences of mistakes can be catastrophic.

The concerns BP had will apply to many other AI uses. Take AI that reads medical scans. While these systems are often assessed using average performance compared to human radiologists, overall error rates may not tell us what we need to know. For instance, we wouldn’t want to deploy AI that was on average better than a human doctor at detecting anomalies, but was also more likely to miss the most aggressive cancers. In many cases, it is performance on a subset of the most consequential decisions that matters more than average performance.
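To make this concrete, here is a toy Python sketch with invented numbers: two hypothetical scan-reading models have identical average error rates, yet one misses most of the aggressive cancers. That gap is exactly what an average hides.

```python
# Toy illustration with made-up data: each tuple records whether model A and
# model B read a scan correctly, and whether the case was an aggressive cancer.
cases = [
    # (model_a_correct, model_b_correct, aggressive)
    (True,  True,  False),
    (True,  False, False),
    (False, True,  True),
    (True,  True,  False),
    (False, True,  True),
    (True,  False, False),
    (True,  True,  False),
    (True,  True,  True),
]

def error_rate(outcomes):
    """Fraction of cases read incorrectly."""
    return 1 - sum(outcomes) / len(outcomes)

a_overall = error_rate([a for a, _, _ in cases])
b_overall = error_rate([b for _, b, _ in cases])
a_aggressive = error_rate([a for a, _, agg in cases if agg])
b_aggressive = error_rate([b for _, b, agg in cases if agg])

print(f"overall error rate:    A={a_overall:.0%}, B={b_overall:.0%}")
print(f"aggressive-case error: A={a_aggressive:.0%}, B={b_aggressive:.0%}")
# Both models are wrong on 25% of scans overall, but A misses two of the
# three aggressive cancers (67%) while B catches them all (0%).
```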

This is one of the toughest issues around AI deployment, particularly in higher-risk domains. We all want these systems to be superhuman in decision making and human-like in the way they make decisions. But with our current methods for building AI, it is difficult to achieve both simultaneously. While there are lots of analogies out there for how people should treat AI—intern, junior employee, trusted colleague, mentor—I think the best one might be alien. AI is a bit like the Coneheads from that old Saturday Night Live sketch—it is smart, brilliant even, at some things, including passing itself off as human, but it doesn’t understand things the way a human would and does not “think” the way we do.

A recent research paper hammers home this point. It found that the mathematical abilities of AI reasoning models—which use a step-by-step “chain of thought” to work out an answer—can be seriously degraded by appending a seemingly innocuous, irrelevant phrase, such as “interesting fact: cats sleep for most of their lives,” to the math problem. Doing so more than doubles the chance that the model will get the answer wrong. Why? No one knows for sure.
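If you want to poke at this yourself, the test is easy to sketch. Below is a minimal, hypothetical harness: `ask_model` stands in for a call to whichever LLM you use, and the problems and trigger phrase are illustrative rather than the paper’s exact benchmark.

```python
# Hypothetical harness for the distractor experiment described above.
# `ask_model` is any callable that sends a prompt to your LLM of choice
# and returns its reply as a string; it is a stand-in, not a real API.
DISTRACTOR = "Interesting fact: cats sleep for most of their lives."

PROBLEMS = [
    ("If 3 pencils cost 45 cents, how much do 7 pencils cost, in cents?", "105"),
    ("What is the remainder when 2**10 is divided by 7?", "2"),
]

def accuracy(ask_model, with_distractor: bool) -> float:
    """Score the model on PROBLEMS, optionally appending the irrelevant phrase."""
    correct = 0
    for question, answer in PROBLEMS:
        prompt = f"{question} {DISTRACTOR}" if with_distractor else question
        if answer in ask_model(prompt):
            correct += 1
    return correct / len(PROBLEMS)

# Usage: compare accuracy(ask_model, False) against accuracy(ask_model, True).
# The paper reports that the appended phrase alone more than doubled the
# chance of a wrong answer on math problems.
```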

Can we get comfortable with AI’s alien nature? Should we?

We have to decide how comfortable we are with AI’s alien nature. The answer depends a lot on the domain where AI is being deployed. Take self-driving cars. Self-driving technology has already advanced to the point where its widespread deployment would likely result in far fewer road accidents, on average, than having an equal number of human drivers on the road. But the mistakes that self-driving cars make are alien ones—veering suddenly into oncoming traffic, or plowing directly into the side of a truck because the car’s sensors couldn’t differentiate the truck’s white side from the cloudy sky beyond it.

If, as a society, we care about saving lives above all else, then it might make sense to allow widespread deployment of autonomous vehicles immediately, despite these seemingly bizarre accidents. But our unease about doing so tells us something about ourselves. We prize something beyond just saving lives: we value the illusion of control, predictability, and perfectibility. We are deeply uncomfortable with a system in which some people might be killed for reasons we cannot explain or control—essentially randomly—even if the total number of deaths dropped from current levels. We are uncomfortable with enshrining unpredictability in a technological system. We prefer to rely on humans we know to be deeply fallible, but whom we believe to be perfectible if we apply the right policies, over a technology that may be less fallible but that we do not know how to improve.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Before we get to the news, the U.S. paperback edition of my book, Mastering AI: A Survival Guide to Our Superpowered Future, is out today from Simon & Schuster. Consider picking up a copy for your bookshelf.

Also, want to know more about how to use AI to transform your business? Interested in what AI will mean for the fate of companies, and countries? Then join me at the Ritz-Carlton, Millenia in Singapore on July 22 and 23 for Fortune Brainstorm AI Singapore. This year’s theme is The Age of Intelligence. We will be joined by leading executives from DBS Bank, Walmart, OpenAI, Arm, Qualcomm, Standard Chartered, Temasek, and our founding partner Accenture, plus many others, along with key government ministers from Singapore and the region, top academics, investors and analysts. We will dive deep into the latest on AI agents, examine the data center build-out in Asia, explore how to create AI systems that produce business value, and discuss how to ensure AI is deployed responsibly and safely. You can apply to attend here, and as a loyal Eye on AI reader, you can claim a complimentary ticket to the event: just use the discount code BAI100JeremyK when you check out.

Note: The essay above was written and edited by Fortune staff. The news items below were selected by the newsletter author, created using AI, and then edited and fact-checked.

AI IN THE NEWS

Microsoft, OpenAI, and Anthropic fund teacher AI training. The American Federation of Teachers is launching a $23 million AI training hub in New York City, funded by Microsoft, OpenAI, and Anthropic, to help educators learn to use AI tools in the classroom. The initiative is part of a broader industry push to integrate generative AI into education, amid federal calls for private sector support, though some experts warn of risks to student learning and critical thinking. While union leaders emphasize ethical and safe use, critics raise concerns about data practices, locking students into using AI tools from particular tech vendors, and the lack of robust research on AI’s educational impact. Read more from the New York Times here.

CoreWeave buys Core Scientific for $9 billion. AI data center company CoreWeave is buying bitcoin mining firm Core Scientific in an all-stock deal valued at approximately $9 billion, aiming to expand its data center capabilities and boost revenue and efficiency. CoreWeave itself started out as a bitcoin mining firm before pivoting to renting out the same high-powered graphics processing units (GPUs) used for cryptocurrency mining to tech companies looking to train and run advanced AI models. Read more from The Wall Street Journal here.

Meta hires top Apple AI researcher. The social media company is hiring Ruoming Pang, the head of Apple’s foundation models team, which is responsible for the company’s core AI efforts, to join its newly formed “superintelligence” group, Bloomberg reports. Meta reportedly offered Pang a compensation package worth tens of millions of dollars annually as part of an aggressive AI recruitment drive led personally by CEO Mark Zuckerberg. Pang’s departure is a blow to Apple’s AI ambitions and comes amid internal scrutiny of its AI strategy, which has so far failed to match the capabilities fielded by rival tech companies, leaving Apple dependent on third-party AI models from OpenAI and Anthropic.

Hitachi Energy CEO warns AI-induced power spikes threaten electrical grids. Andreas Schierenbeck, CEO of Hitachi Energy, warned that the surging and volatile electricity demands of AI data centers are straining power grids and must be regulated by governments, the Financial Times reported. Schierenbeck compared the power spikes that training large AI models cause—with power consumption surging tenfold in seconds—to the switching on of industrial smelters, which are required to coordinate such events with utilities to avoid overstretching the grid.

EYE ON AI RESEARCH

Want strategy advice from an LLM? It matters which model you pick.
That’s one of the conclusions of a study from researchers at King’s College London and the University of Oxford. The study looked at how well various commercially available AI models did at playing successive rounds of a “Prisoner’s Dilemma” game, which is classically used in game theory to test the rationality of different strategies. (In the game, two accomplices, arrested and held separately, must each decide whether to take a deal offered by the police and implicate their partner. If both stay silent, each is sentenced to a year in prison on a lesser charge. If one talks and implicates his partner, the talker goes free, while the silent accomplice is sentenced to three years on the primary charge. The catch: if both talk, both are sentenced to two years. When multiple rounds are played between the same two players, each must choose based in part on what they learned from previous rounds. In this paper, the researchers varied the game lengths to introduce some randomness and prevent the AI models from simply memorizing the best strategy.)
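For readers who want the mechanics, below is a minimal Python sketch of the iterated game with the sentencing payoffs described above. The two hand-coded strategies and the randomized game length are illustrative assumptions, not the paper’s exact protocol, which played LLMs against opponents rather than running code like this.

```python
import random

# Years in prison for (my_move, their_move); "C" = stay silent, "D" = talk.
# Lower is better. The payoffs follow the setup described above.
SENTENCE = {
    ("C", "C"): (1, 1),  # both stay silent: a year each on the lesser charge
    ("C", "D"): (3, 0),  # I stay silent, partner talks: three years for me
    ("D", "C"): (0, 3),  # I talk, partner stays silent: I go free
    ("D", "D"): (2, 2),  # both talk: two years each
}

def tit_for_tat(my_moves, their_moves):
    """Cooperate first, then mirror the opponent's previous move."""
    return their_moves[-1] if their_moves else "C"

def always_defect(my_moves, their_moves):
    return "D"

def play_match(strategy_a, strategy_b, mean_rounds=20):
    """Play an iterated game of randomized length, so neither player can
    simply count down to a known final round (the researchers likewise
    varied game lengths)."""
    rounds = max(1, int(random.expovariate(1 / mean_rounds)))
    moves_a, moves_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_a, moves_b)
        b = strategy_b(moves_b, moves_a)
        years_a, years_b = SENTENCE[(a, b)]
        total_a += years_a
        total_b += years_b
        moves_a.append(a)
        moves_b.append(b)
    return total_a / rounds, total_b / rounds  # average years per round

random.seed(0)
# Defector gains one round's advantage, then both grind out two years per round.
print(play_match(tit_for_tat, always_defect))
```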

It turns out that different AI models exhibit distinct strategic preferences. The researchers described Google’s Gemini as ruthless, exploiting cooperative opponents and retaliating against accomplices who defected. OpenAI’s models, by contrast, were highly cooperative, which proved catastrophic against more hostile opponents. Anthropic’s Claude, meanwhile, was the most forgiving, restoring cooperation even after being exploited by an opponent or after winning a prior game by defecting. The researchers also analyzed the 32,000 rationales the models gave for their actions; these suggested the models were reasoning about the likely time limit of the game and their opponent’s likely strategy.

The research may have implications for which AI model a company might want to turn to for advice. You can read the research paper here on arxiv.org.

FORTUNE ON AI

‘It’s just bots talking to bots:’ AI is running rampant on college campuses as professors and students lean on the tech—by Beatrice Nolan

OpenAI is betting millions on building AI talent from the ground up amid rival Meta’s poaching pitch—by Lily Mae Lazarus

Alphabet’s Isomorphic Labs has grand ambitions to ‘solve all diseases’ with AI. Now, it’s gearing up for its first human trials—by Beatrice Nolan

The first big winners in the race to create AI superintelligence: the humans getting multi-million dollar pay packages—by Verne Kopytoff

AI CALENDAR

July 8-11: AI for Good Global Summit, Geneva

July 13-19: International Conference on Machine Learning (ICML), Vancouver

July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.

July 26-28: World Artificial Intelligence Conference (WAIC), Shanghai. 

Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

Oct. 6-10: World AI Week, Amsterdam

Dec. 2-7: NeurIPS, San Diego

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

BRAIN FOOD

AI may hurt some artists. But it’s given others lucrative new patrons—big tech companies. That’s according to a feature in tech publication The Information. Silicon Valley companies, traditionally disengaged from the art world, are now actively investing in AI art and acting as patrons for artists who use AI as part of their creative process. While many artists worry that tech companies are training AI models on digital images of their artwork without permission, and that the resulting models might make it harder for them to find work, the Information story emphasizes that the art these big tech companies are collecting still involves a great deal of human creativity and curation. Tech companies including Meta and Google are both purchasing AI art for their corporate collections and providing artists with cutting-edge AI software to help them work. The trend is seen both as a way to promote the adoption of AI technology by “creatives” and as part of a broader effort by tech companies to support the humanities.





U.S. consumers are so strained they put more than $1B on BNPL during Black Friday and Cyber Monday


Financially strained and cautious customers leaned heavily on buy now, pay later (BNPL) services over the holiday weekend.

Cyber Monday alone generated $1.03 billion in online BNPL sales (a 4.2% increase year over year), with most transactions happening on mobile devices, per Adobe Analytics. Overall, consumers spent $14.25 billion online on Cyber Monday. To put that into perspective, BNPL accounted for more than 7.2% of total online sales that day.

As for Black Friday, eMarketer reported $747.5 million in online sales using BNPL services, with platforms like PayPal seeing a 23% uptick in BNPL transactions.

Likewise, digital financial services company Zip reported 1.6 million transactions across 280,000 of its locations over the Black Friday–Cyber Monday weekend. Millennials accounted for the largest share (51%) of BNPL purchases, followed by Gen Z, Gen X, and baby boomers, per Zip.

The Adobe data showed that people using BNPL were most likely to spend on categories such as electronics, apparel, toys, and furniture, which is consistent with previous years. This also tracks with Zip’s finding that shoppers using its services spent primarily on tech, electronics, and fashion.

And while some may be surprised that shoppers are taking on more debt via BNPL (in this economy?!), analysts had already projected a strong shopping weekend. A Deloitte survey forecast that consumers would spend about $650 each over the Black Friday–Cyber Monday stretch—a 15% jump from 2023.

“US retailers leaned heavily on discounts this holiday season to drive online demand,” Vivek Pandya, lead analyst at Adobe Digital Insights, said in a statement. “Competitive and persistent deals throughout Cyber Week pushed consumers to shop earlier, creating an environment where Black Friday now challenges the dominance of Cyber Monday.”

This report was originally published by Retail Brew.





AI labs like Meta, DeepSeek, and xAI earned the worst grades possible on an existential safety index


A recent report card from an AI safety watchdog isn’t one that tech companies will want to stick on the fridge.

The Future of Life Institute’s latest AI safety index found that major AI labs fell short on most measures of AI responsibility, with few letter grades rising above a C. The org graded eight companies across categories like safety frameworks, risk assessment, and current harms.

Perhaps most glaring was the “existential safety” line, where companies scored Ds and Fs across the board. While many of these companies are explicitly chasing superintelligence, they lack a plan for safely managing it, according to Max Tegmark, MIT professor and president of the Future of Life Institute.

“Reviewers found this kind of jarring,” Tegmark told us.

The reviewers in question were a panel of AI academics and governance experts who examined publicly available material as well as survey responses submitted by five of the eight companies.

Anthropic, OpenAI, and Google DeepMind took the top three spots with overall grades of C+ or C. Then came, in order, Elon Musk’s xAI, Z.ai, Meta, DeepSeek, and Alibaba, all of which got Ds or a D-.

Tegmark blames a lack of regulation, which has allowed the cutthroat competition of the AI race to trump safety precautions. California recently passed the first law requiring frontier AI companies to disclose safety information around catastrophic risks, and New York is within spitting distance of doing the same. Hopes for federal legislation are dim, however.

“Companies have an incentive, even if they have the best intentions, to always rush out new products before the competitor does, as opposed to necessarily putting in a lot of time to make it safe,” Tegmark said.

In lieu of government-mandated standards, Tegmark said the industry has begun to take the group’s regularly released safety indexes more seriously; four of the five American companies now respond to its survey (Meta is the only holdout). And companies have made some improvements over time, Tegmark said, pointing to Google’s transparency around its whistleblower policy as an example.

But real-life harms reported around issues like teen suicides that chatbots allegedly encouraged, inappropriate interactions with minors, and major cyberattacks have also raised the stakes of the discussion, he said.

“[They] have really made a lot of people realize that this isn’t the future we’re talking about—it’s now,” Tegmark said.

The Future of Life Institute recently enlisted public figures as diverse as Prince Harry and Meghan Markle, former Trump aide Steve Bannon, Apple co-founder Steve Wozniak, and rapper Will.i.am to sign a statement opposing work that could lead to superintelligence.

Tegmark said he would like to see something like “an FDA for AI where companies first have to convince experts that their models are safe before they can sell them.”

“The AI industry is quite unique in that it’s the only industry in the US making powerful technology that’s less regulated than sandwiches—basically not regulated at all,” Tegmark said. “If someone says, ‘I want to open a new sandwich shop near Times Square,’ before you can sell the first sandwich, you need a health inspector to check your kitchen and make sure it’s not full of rats…If you instead say, ‘Oh no, I’m not going to sell any sandwiches. I’m just going to release superintelligence.’ OK! No need for any inspectors, no need to get any approvals for anything.”

“So the solution to this is very obvious,” Tegmark added. “You just stop this corporate welfare of giving AI companies exemptions that no other companies get.”

This report was originally published by Tech Brew.





Hollywood writers say Warner takeover ‘must be blocked’


Hollywood writers, producers, directors and theater owners voiced skepticism over Netflix Inc.’s proposed $82.7 billion takeover of Warner Bros. Discovery Inc.’s studio and streaming businesses, saying it threatens to undermine their interests.

The Writers Guild of America, which announced in October it would oppose any sale of Warner Bros., reiterated that view on Friday, saying the purchase by Netflix “must be blocked.”

“The world’s largest streaming company swallowing one of its biggest competitors is what antitrust laws were designed to prevent,” the guild said in an emailed statement. “The outcome would eliminate jobs, push down wages, worsen conditions for all entertainment workers, raise prices for consumers, and reduce the volume and diversity of content for all viewers.”

The worries raised by the movie and TV industry’s biggest trade groups come against the backdrop of falling movie and TV production, slack ticket sales and steep job cuts in Hollywood. Another legacy studio, Paramount, was sold earlier this year.

Warner Bros. accounts for about a fourth of North American ticket sales — roughly $2 billion — and is being acquired by a company that has long shunned theatrical releases for its feature films. As part of the deal, Netflix co-CEO Ted Sarandos has promised that Warner Bros. will continue to release movies in theaters.

“The proposed acquisition of Warner Bros. by Netflix poses an unprecedented threat to the global exhibition business,” Michael O’Leary, chief executive officer of the theatrical trade group Cinema United, said in an emailed statement Friday. “The negative impact of this acquisition will impact theaters from the biggest circuits to one-screen independents.”

The buyout of Warner Bros. by Netflix “would be a disaster,” James Cameron, the director of some of the highest-grossing films in Hollywood history, including Titanic and Avatar, said in late November on The Town, an industry-focused podcast. “Sorry Ted, but jeez. Sarandos has gone on record saying theatrical films are dead.”

On a conference call with investors Friday, Sarandos said that his company’s resistance to releasing films in cinemas was mostly tied to “the long exclusive windows, which we don’t really think are that consumer friendly.”

The company said Friday it would “maintain Warner Bros.’ current operations and build on its strengths, including theatrical releases for films.”

On the call, Sarandos reiterated that view, saying that, “right now, you should count on everything that is planned on going to the theater through Warner Bros. will continue to go to the theaters through Warner Bros.” 

Competition from online outfits like YouTube and Netflix has forced a reckoning in Hollywood, opening the door for takeovers like the Warner Bros. deal announced Friday. Media giants including Comcast Corp., parent of NBCUniversal, are unloading cable-TV networks like MS Now and USA, and steering resources into streaming. 

In an emailed note to Warner Bros. employees on Friday, Chief Executive Officer David Zaslav said the board’s decision to sell the company “reflects the realities of an industry undergoing generational change in how stories are financed, produced, distributed, and discovered.”

The Producers Guild of America said Friday its members are “rightfully concerned about Netflix’s intended acquisition of one of our industry’s most storied and meaningful studios,” while a spokesperson for the Directors Guild of America raised concerns about future pay at Warner Bros.

“We will be meeting with Netflix to outline our concerns and better understand their vision for the future of the company,” the Directors Guild said.

In September, the DGA elected director Christopher Nolan as its president. Nolan has previously criticized Netflix’s model of releasing films exclusively online, or simultaneously in a small number of cinemas, and has said he won’t make movies for the company.

The Screen Actors Guild said Friday that the transaction “raises many serious questions about its impact on the future of the entertainment industry, and especially the human creative talent whose livelihoods and careers depend on it.”

Oscar winner Jane Fonda spoke out on Thursday before the deal was announced. 

“Consolidation at this scale would be catastrophic for an industry built on free expression, for the creative workers who power it, and for consumers who depend on a free, independent media ecosystem to understand the world,” the star of the Netflix series Grace and Frankie wrote on the Ankler industry news website.

Netflix and Warner Bros. obviously don’t see it that way. In his statement to employees, Zaslav said “the proposed combination of Warner Bros. and Netflix reflects complementary strengths, more choice and value for consumers, a stronger entertainment industry, increased opportunity for creative talent, and long-term value creation for shareholders.”



