Business

The world’s best AI models operate in English. Other languages—even major ones like Cantonese—risk falling further behind

Published

on



How do you translate “dim sum”? Many English speakers would find the question strange, knowing the term refers to the large array of small dishes that accompanies a Cantonese-style brunch—and so doesn’t need translation. 

But words like “dim sum” are a challenge for developers like Jacky Chan, who launched a Cantonese large language model last year through his startup Votee. It might be obvious to a human translator which words are loanwords and which need direct translation. Yet it’s less intuitive for machines.

“It’s not natural enough,” Chan says. “When you see it, you know it’s not something a human writes.”

Translation troubles are part of a growing list of issues that arise when today’s AI models, strongest in English and other major languages, try to work in an array of smaller tongues still spoken by tens of millions of people.

When AI “models encounter a word they don’t know or that doesn’t exist in another culture, they will simply make up a translation,” explains Aliya Bhatia, a senior policy analyst at the Center for Democracy & Technology, where she researches issues related to multilingual AI. “As a result, many machine-created datasets could feature mistranslations, words that no native speaker actually uses in a specific language.”

LLMs need data, and lots of it. Text from books, articles and websites is broken down into smaller word sequences to form a model’s training dataset. From this, LLMs learn how to predict the next word in a sequence, eventually generating text.  
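To make next-word prediction concrete, here is a minimal sketch in Python. It is a deliberately simplified stand-in: it only counts which word follows which, whereas real LLMs use neural networks trained on sub-word tokens. The function names and the three-sentence corpus are hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair every word with the word that comes right after it.
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequently seen next word, or None if the word is unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "we eat dim sum for brunch",
    "we eat dim sum on sunday",
    "we eat noodles for dinner",
]
model = train_bigram_model(corpus)
print(predict_next(model, "dim"))  # sum
print(predict_next(model, "eat"))  # dim
print(predict_next(model, "yum"))  # None
```

The last call shows the data problem in miniature: a word that never appears in the training text gives the model nothing to go on.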

AI can now generate text remarkably well—at least, it can in English. In other languages, performance lags significantly. Roughly half of all web content is in English, meaning there’s no shortage of digital resources for LLMs to learn from. Many other languages do not enjoy this same abundance. 

Low-resource languages

So-called low-resource languages are those with limited online data. Endangered languages, no longer being passed down to younger generations, clearly fall into this category. But widely spoken languages like Cantonese, Vietnamese and Bahasa Indonesia are also considered low-resource.

One reason could be limited internet access, which would prevent the creation of digital content. Another could be government regulation, which might limit what’s available online. Indonesia, for example, can remove online content without offering a way to appeal decisions. The resulting self-censorship may mean that available data in some regional languages does not represent authentic local culture.

This resource gap leads to a performance gap: Non-English LLMs are more likely to produce gibberish or inaccurate answers. LLMs also struggle with languages that don’t use Latin script, the set of letters used in English, as well as those with tonal features that are hard to represent in writing or code.  

Currently, the best-performing models work in English and, to a lesser extent, Mandarin Chinese. That reflects where the world’s biggest tech companies are based. But outside of San Francisco and Hangzhou, a legion of developers, large and small, are trying to make AI work for everyone. 

South Korean internet firm Naver has built an LLM, HyperCLOVA X, which it claims is trained on 6,500 times more Korean data than GPT-4. Naver is also working in markets like Saudi Arabia and Thailand in a bid to expand its business creating “sovereign AI,” or AI tailored to a specific country’s needs. “We focus on what companies and governments that want to use AI would want, and what needs Big Tech can’t fulfill,” CEO Choi Soo-Yeon told Fortune last year.  

In Indonesia, telecom operator Indosat and tech company GoTo are collaborating to launch a 70-billion-parameter LLM that operates in Bahasa Indonesia as well as five other local languages, including Javanese, Balinese, and Bataknese.

One hurdle is scale. The most powerful LLMs are massive, built from billions of adjustable variables, known as parameters, whose values are learned from the word sequences in the training data. OpenAI’s GPT-4 is estimated to have around 1.8 trillion parameters. DeepSeek’s R1 has 671 billion.

Non-English LLMs struggle to achieve this kind of scale. The Southeast Asian Languages in One Model (SEA-LION) project has trained two models from scratch: one with 3 billion parameters and one with 7 billion, much smaller than leading English and Chinese models.

Chan, from Votee, faces these struggles when dealing with Cantonese, spoken by 85 million people across southern China and Hong Kong. Cantonese uses different grammar for formal writing compared to informal writing and speech. Available digital data is scarce and often low-quality. 

Training on digitized Cantonese texts is like “learning from a library with many books, but they have lots of typos, they are poorly translated, or they’re just plain wrong,” says Chan.

Without a comprehensive dataset, an LLM can’t produce complete results. Data for low-resource languages often skews towards formal texts, such as legal documents, religious texts, or Wikipedia entries, since these are more likely to be digitized. This bias can distort an LLM’s tone, vocabulary and style, and limit its knowledge.

LLMs have no inherent sense of what is true, and so false or incomplete information will be reproduced as fact. A model trained solely on Vietnamese pop music might struggle to accurately answer questions on historical events, particularly those not related to Vietnam.  

Translating English content

Turning English content into the target language is one way to supplement the otherwise-limited training data. As Chan explains, “we synthesize the data using AI so that we can have more data to do the training.” 

But machine translation carries risk. It can miss linguistic nuance or cultural context. A Georgia Tech study of cultural bias in Arabic LLMs found that AI models trained on Arabic datasets still exhibited Western bias, such as referencing alcoholic beverages in Islamic religious contexts. It turned out that much of the pre-training data for these models came from web-crawled Arabic content that was machine-translated from English, allowing cultural values to sneak through.  

In the long term, AI-generated content might end up polluting low-resource language datasets. Chan likens it to “a photocopy of a photocopy,” with each iteration degrading the quality. In 2024, Nature warned of “model collapse,” where AI-generated text could contaminate the training data for future LLMs, leading to worse performance.

The threat is even greater for low-resource languages. With less genuine content out there, AI-generated content could quickly end up making up a larger share of what’s online in a given language.  
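The arithmetic behind this concern can be sketched in a few lines of Python. The document counts below are hypothetical, chosen only to show how the same volume of AI-generated text weighs far more heavily on a small language than on a large one. They are not measurements of any real language.

```python
def synthetic_share(genuine_docs, synthetic_docs):
    """Fraction of all documents in a language that are AI-generated."""
    total = genuine_docs + synthetic_docs
    if total == 0:
        return 0.0
    return synthetic_docs / total


# Hypothetical counts: the same 50,000 AI-generated documents are added
# to a large body of genuine text and to a small one.
high_resource = synthetic_share(genuine_docs=1_000_000, synthetic_docs=50_000)
low_resource = synthetic_share(genuine_docs=20_000, synthetic_docs=50_000)

print(f"{high_resource:.1%}")  # 4.8%
print(f"{low_resource:.1%}")   # 71.4%
```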

Large businesses are starting to realize the opportunities in building a non-English AI. But while these companies are key players in their respective tech sectors, they’re still much smaller than giants like Alibaba, OpenAI, and Microsoft.  

Bhatia says more organizations—both for-profit and not-for-profit—need to invest in multilingual AI if this new technology is to be truly global.  

“If LLMs are going to be used to equip people with access to economic opportunities, educational resources, and more, they should work in the languages people use,” she says. 



Mark Zuckerberg says the ‘most important thing’ he built at Harvard was a prank website

For Mark Zuckerberg, the most significant creation from his two years at Harvard University wasn’t the precursor to a global social network, but a prank website that nearly got him expelled.

The Meta CEO said in a 2017 commencement address at his alma mater that the controversial site, Facemash, was “the most important thing I built in my time here” for one simple reason: it led him to his wife, Priscilla Chan.

“Without Facemash I wouldn’t have met Priscilla, and she’s the most important person in my life,” Zuckerberg said during the speech.

In 2003, Zuckerberg, then a sophomore, created Facemash by hacking into Harvard’s online student directories and using the photos to create a site where users could rank students’ attractiveness. The site went viral, but it was quickly shut down by the university. Zuckerberg was called before Harvard’s Administrative Board, facing accusations of breaching security, violating copyrights, and infringing on individual privacy.

“Everyone thought I was going to get kicked out,” Zuckerberg recalled in his speech. “My parents came to help me pack. My friends threw me a going-away party.”

It was at this party, thrown by friends who believed his expulsion was imminent, that he met Chan, another Harvard undergraduate. “We met in line for the bathroom in the Pfoho Belltower, and in what must be one of the all time romantic lines, I said: ‘I’m going to get kicked out in three days, so we need to go on a date quickly,’” Zuckerberg said.

Chan, who described her now-husband to The New Yorker as “this nerdy guy who was just a little bit out there,” went on the date with him. Zuckerberg did not get expelled from Harvard after all, but he did famously drop out the following year to focus on building Facebook.

While the 2010 film The Social Network portrayed Facemash as a critical stepping stone to the creation of Facebook, Zuckerberg himself has downplayed its technical or conceptual importance.

“And, you know, that movie made it seem like Facemash was so important to creating Facebook. It wasn’t,” he said during his commencement speech. But he did confirm that the series of events it set in motion—the administrative hearing, the “going-away” party, the line for the bathroom—ultimately connected him with the mother of his three children.

Chan, for her part, graduated from Harvard in 2007, taught science, and then attended medical school at the University of California, San Francisco, becoming a pediatrician.

She and Zuckerberg got married in 2012, and in 2015, they co-founded the Chan Zuckerberg Initiative, a philanthropic organization focused on leveraging technology to address major world challenges in health, education, and science. Chan serves as co-CEO of the initiative, which has pledged to give away 99% of the couple’s shares in Meta Platforms to fund its work.

For this story, Fortune journalists used generative AI as a research tool. An editor verified the accuracy of the information before publishing. 



Senate Dems’ plan to fix Obamacare premiums adds nearly $300 billion to deficit, CRFB says

The Committee for a Responsible Federal Budget (CRFB) is a nonpartisan watchdog that regularly estimates how much the U.S. Congress is adding to the $38 trillion national debt.

With enhanced Affordable Care Act (ACA) subsidies due to expire within days, some Senate Democrats are scrambling to protect millions of Americans from getting the unpleasant holiday gift of spiking health insurance premiums. The CRFB says there’s just one problem with the plan: It’s not funded.

“With the national debt as large as the economy and interest payments costing $1 trillion annually, it is absurd to suggest adding hundreds of billions more to the debt,” CRFB President Maya MacGuineas wrote in a statement on Friday afternoon.

The proposal, backed by members of the Senate Democratic caucus, would fully extend the enhanced ACA subsidies for three years, from 2026 through 2028, with no additional income limits on who can qualify. Those subsidies, originally boosted during the pandemic and later renewed, were designed to lower premiums and prevent coverage losses for middle‑ and lower‑income households purchasing insurance on the ACA exchanges.

CRFB estimated that even this three‑year extension alone would add roughly $300 billion to federal deficits over the next decade, largely because the federal government would continue to shoulder a larger share of premium costs while enrollment and subsidy amounts remain elevated. If Congress ultimately moves to make the enhanced subsidies permanent—as many advocates have urged—the total cost could swell to nearly $550 billion in additional borrowing over the next decade.

Reversing recent guardrails

MacGuineas called the Senate bill “far worse than even a debt-financed extension” as it would roll back several “program integrity” measures that were enacted as part of a 2025 reconciliation law and were intended to tighten oversight of ACA subsidies. On top of that, it would be funded by borrowing even more. “This is a bad idea made worse,” MacGuineas added.

The watchdog group’s central critique is that the new Senate plan does not attempt to offset its costs through spending cuts or new revenue and, in its view, goes beyond a simple extension by expanding the underlying subsidy structure.

The legislation would permanently repeal restrictions that eliminated subsidies for certain groups enrolling during special enrollment periods and would scrap rules requiring full repayment of excess advance subsidies and stricter verification of eligibility and tax reconciliation. The bill would also nullify portions of a 2025 federal regulation that loosened limits on the actuarial value of exchange plans and altered how subsidies are calculated, effectively reshaping how generous plans can be and how federal support is determined. CRFB warned these reversals would increase costs further while weakening safeguards designed to reduce misuse and error in the subsidy system.

MacGuineas said that any subsidy extension should be paired with broader reforms to curb health spending and reduce overall borrowing. In her view, lawmakers are missing a chance to redesign ACA support in a way that lowers premiums while also improving the long‑term budget outlook.

The debate over ACA subsidies recently contributed to a government funding standoff, and CRFB argued that the new Senate bill reflects a political compromise that prioritizes short‑term relief over long‑term fiscal responsibility.

“After a pointless government shutdown over this issue, it is beyond disappointing that this is the preferred solution to such an important issue,” MacGuineas wrote.

The off-year elections cast the government shutdown and cost-of-living arguments in a different light. Democrats made stunning gains and almost flipped a deep-red district in Tennessee as politicians from the far left and center coalesced around “affordability.”

Senate Minority Leader Chuck Schumer is reportedly smelling blood in the water and doubling down on the theme heading into the pivotal midterm elections of 2026. President Donald Trump is scheduled to visit Pennsylvania soon to discuss pocketbook anxieties. But he is repeating predecessor Joe Biden’s habit of dismissing inflation, despite widespread evidence to the contrary.

“We fixed inflation, and we fixed almost everything,” Trump said in a Tuesday cabinet meeting, in which he also dismissed affordability as a “hoax” pushed by Democrats.

Lawmakers on both sides of the aisle now face a politically fraught choice: allow premiums to jump sharply—including in swing states like Pennsylvania where ACA enrollees face double‑digit increases—or pass an expensive subsidy extension that would, as CRFB calculates, explode the deficit without addressing underlying health care costs.



Netflix–Warner Bros. deal sets up $72 billion antitrust test

Netflix Inc. has won the heated takeover battle for Warner Bros. Discovery Inc. Now it must convince global antitrust regulators that the deal won’t give it an illegal advantage in the streaming market. 

The $72 billion tie-up joins the world’s dominant paid streaming service with one of Hollywood’s most iconic movie studios. It would reshape the market for online video content by combining the No. 1 streaming player with the No. 4 service HBO Max and its blockbuster hits such as Game of Thrones, Friends, and the DC Universe franchise of comics characters.

That could raise red flags for global antitrust regulators over concerns that Netflix would have too much control over the streaming market. The company faces a lengthy Justice Department review and a possible US lawsuit seeking to block the deal if it doesn’t adopt some remedies to get it cleared, analysts said.

“Netflix will have an uphill climb unless it agrees to divest HBO Max as well as additional behavioral commitments — particularly on licensing content,” said Bloomberg Intelligence analyst Jennifer Rie. “The streaming overlap is significant,” she added, saying the argument that “the market should be viewed more broadly is a tough one to win.”

By choosing Netflix, Warner Bros. has jilted another bidder, Paramount Skydance Corp., a move that risks touching off a political battle in Washington. Paramount is backed by the world’s second-richest man, Larry Ellison, and his son, David Ellison, and the company has touted their longstanding close ties to President Donald Trump. Their acquisition of Paramount, which closed in August, has won public praise from Trump. 

Comcast Corp. also made a bid for Warner Bros., looking to merge it with its NBCUniversal division.

The Justice Department’s antitrust division, which would review the transaction in the US, could argue that the deal is illegal on its face because the combined market share would put Netflix well over a 30% threshold.

The White House, the Justice Department and Comcast didn’t immediately respond to requests for comment. 

US lawmakers from both parties, including Republican Representative Darrell Issa and Democratic Senator Elizabeth Warren, have already faulted the transaction, which would create a global streaming giant with 450 million users, as harmful to consumers.

“This deal looks like an anti-monopoly nightmare,” Warren said after the Netflix announcement. Utah Senator Mike Lee, a Republican, said in a social media post earlier this week that a Warner Bros.-Netflix tie-up would raise more serious competition questions “than any transaction I’ve seen in about a decade.”

European Union regulators are also likely to subject the Netflix proposal to an intensive review amid pressure from legislators. In the UK, the deal drew scrutiny even before the announcement, with House of Lords member Baroness Luciana Berger pressing the government on how the transaction would impact competition and consumer prices.

The combined company could raise prices and broadly impact “culture, film, cinemas and theater releases,” said Andreas Schwab, a leading member of the European Parliament on competition issues, after the announcement.

Paramount has sought to frame the Netflix deal as a non-starter. “The simple truth is that a deal with Netflix as the buyer likely will never close, due to antitrust and regulatory challenges in the United States and in most jurisdictions abroad,” Paramount’s antitrust lawyers wrote to their counterparts at Warner Bros. on Dec. 1.

Appealing directly to Trump could help Netflix avoid intense antitrust scrutiny, New Street Research’s Blair Levin wrote in a note on Friday. Levin said it’s possible that Trump could come to see the benefit of switching from a pro-Paramount position to a pro-Netflix position. “And if he does so, we believe the DOJ will follow suit,” Levin wrote.

Netflix co-Chief Executive Officer Ted Sarandos had dinner with Trump at the president’s Mar-a-Lago resort in Florida last December, a move other CEOs made after the election in order to win over the administration. In a call with investors Friday morning, Sarandos said that he’s “highly confident in the regulatory process,” contending the deal favors consumers, workers and innovation. 

“Our plans here are to work really closely with all the appropriate governments and regulators, but really confident that we’re going to get all the necessary approvals that we need,” he said.

Netflix will likely argue to regulators that other video services such as Google’s YouTube and ByteDance Ltd.’s TikTok should be included in any analysis of the market, which would dramatically shrink the company’s perceived dominance.

The US Federal Communications Commission, which regulates the transfer of broadcast-TV licenses, isn’t expected to play a role in the deal, as neither company holds such licenses. Warner Bros. plans to spin off its cable TV division, which includes channels such as CNN, TBS and TNT, before the sale.

Even if antitrust reviews just focus on streaming, Netflix believes it will ultimately prevail, pointing to Amazon.com Inc.’s Prime and Walt Disney Co. as other major competitors, according to people familiar with the company’s thinking. 

Netflix is expected to argue that more than 75% of HBO Max subscribers already subscribe to Netflix, making them complementary offerings rather than competitors, said the people, who asked not to be named discussing confidential deliberations. The company is expected to make the case that reducing its content costs through owning Warner Bros., eliminating redundant back-end technology and bundling Netflix with Max will yield lower prices.



Copyright © Miami Select.