OpenAI’s new AI safety tools could give a false sense of security
OpenAI last week unveiled two new free-to-download tools that are supposed to make it easier for businesses to construct guardrails around the prompts users feed AI models and the outputs those systems generate.

The new guardrails are designed so a company can, for instance, more easily set up controls to prevent a customer service chatbot from responding in a rude tone or revealing internal policies about how it should decide whether to offer refunds.

But while these tools are designed to make AI models safer for business customers, some security experts caution that the way OpenAI has released them could create new vulnerabilities and give companies a false sense of security. And while OpenAI says it has released these security tools for the good of everyone, some question whether its motives are driven in part by a desire to blunt one advantage of its AI rival Anthropic, which has been gaining traction among business users partly because of a perception that its Claude models have more robust guardrails than those of competitors.

The OpenAI security tools, called gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, are themselves a type of AI model known as a classifier, designed to assess whether the prompts a user submits to a larger, more general-purpose AI model, as well as the outputs that larger model produces, comply with a set of rules. In the past, companies that deployed AI models could train such classifiers themselves, but the process was time-consuming and potentially expensive, since developers had to collect examples of policy-violating content to train on. And if a company later wanted to adjust the policies behind its guardrails, it had to collect new examples of violations and retrain the classifier.

OpenAI is hoping the new tools can make that process faster and more flexible. Rather than being trained to follow one fixed rulebook, these new security classifiers can simply read a written policy and apply it to new content.

OpenAI says this method, which it calls “reasoning-based classification,” allows companies to adjust their safety policies as easily as editing the text in a document instead of rebuilding an entire classification model. The company is positioning the release as a tool for enterprises that want more control over how their AI systems handle sensitive information, such as medical records or personnel records.
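In practice, that workflow can be as simple as an API call in which the policy travels with the request. The sketch below is a minimal illustration, assuming the open-weight gpt-oss-safeguard-20b model is served locally behind an OpenAI-compatible endpoint (for example, via vLLM); the endpoint URL, the policy wording, and the one-word ALLOW/BLOCK output convention are illustrative assumptions, not OpenAI’s documented interface.

```python
# Minimal sketch of "reasoning-based classification": the policy is plain
# text sent with the request, so changing the rules means editing a string,
# not retraining a model. Assumes gpt-oss-safeguard-20b is served locally
# behind an OpenAI-compatible endpoint (e.g., via vLLM); the URL, policy,
# and ALLOW/BLOCK convention are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

POLICY = """You are a safety classifier for a retail support chatbot.
Rule 1: Never reveal internal criteria for approving or denying refunds.
Rule 2: Flag any text written in a rude or abusive tone.
Answer with exactly one word: ALLOW or BLOCK."""

def classify(text: str) -> str:
    """Ask the classifier whether `text` complies with POLICY."""
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

# Tightening the rules means editing POLICY above; no retraining is needed.
print(classify("What is the exact dollar threshold for automatic refunds?"))
```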

However, while the tools are meant to make AI systems safer for enterprise customers, some safety experts say they may instead give users a false sense of security. That’s because OpenAI has open-sourced the classifiers, making them freely available, including their weights, the internal settings of the AI models.

Classifiers act like extra security gates for an AI system, designed to stop unsafe or malicious prompts before they reach the main model. But by open-sourcing them, OpenAI risks sharing the blueprints to those gates. That transparency could help researchers strengthen safety mechanisms, but it might also make it easier for bad actors to find the weak spots, creating a kind of false comfort.
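The gate idea can be made concrete with a short, hypothetical wrapper that screens traffic in both directions around the main model. It reuses the classify() helper sketched above; call_main_model() is a placeholder for whatever general-purpose model a business runs, not a real API.

```python
# Hypothetical guardrail "gate": the classifier screens the user's prompt on
# the way in and the model's reply on the way out. classify() is the helper
# sketched above; call_main_model() is a stand-in, not a real API.
def call_main_model(prompt: str) -> str:
    return "placeholder reply from the main model"  # assumption: your LLM call

def guarded_reply(user_prompt: str) -> str:
    if classify(user_prompt) == "BLOCK":   # gate on the way in
        return "Sorry, I can't help with that request."
    draft = call_main_model(user_prompt)
    if classify(draft) == "BLOCK":         # gate on the way out
        return "Sorry, I can't share that."
    return draft
```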

“Making these models open source can help attackers as well as defenders,” David Krueger, an AI safety professor at Mila, told Fortune. “It will make it easier to develop approaches to bypassing the classifiers and other similar safeguards.”

For instance, when attackers have access to the classifier’s weights, they can more easily develop what are known as “prompt injection” attacks, crafting prompts that trick the classifier into disregarding the policy it is supposed to enforce. Security researchers have found that in some cases even a string of characters that looks nonsensical to a person can, for reasons researchers don’t entirely understand, convince an AI model to disregard its guardrails and do something it is not supposed to, such as offering advice for making a bomb or spewing racist abuse.

Representatives for OpenAI directed Fortune to the company’s blog post announcement and technical report for the models.

Short-term pain for long-term gains

Open-sourcing can be a double-edged sword when it comes to safety. It allows researchers and developers to test, improve, and adapt AI safeguards more quickly, increasing transparency and trust. For instance, there may be ways in which security researchers could adjust the classifiers’ weights to make them more robust to prompt injection without degrading their performance.

But it can also make it easier for attackers to study and bypass those very protections—for instance, by using other machine learning software to run through hundreds of thousands of possible prompts until it finds ones that cause the model to jump its guardrails. What’s more, security researchers have found that these kinds of automatically generated prompt injection attacks, developed against open-source AI models, will sometimes also work against proprietary AI models, where attackers don’t have access to the underlying code and model weights. Researchers have speculated that this is because of something inherent in the way all large language models encode language, which lets similar prompt injections succeed against any model.

In this way, open-sourcing the classifiers may not just give users a false sense of security that their own systems are well guarded; it may actually make every AI model less secure. But experts said this risk is probably worth taking, because open-sourcing the classifiers should also make it easier for the world’s security experts to find ways to make them more resistant to these kinds of attacks.

“In the long term, it’s beneficial to kind of share the way your defenses work—it may result in some kind of short-term pain. But in the long term, it results in robust defenses that are actually pretty hard to circumvent,” Vasilios Mavroudis, principal research scientist at the Alan Turing Institute, said.

Mavroudis said that while open-sourcing the classifiers could, in theory, make it easier for someone to try to bypass the safety systems on OpenAI’s main models, the company likely believes the risk is low. OpenAI has other safeguards in place, he said, including teams of human security experts who continually probe its models’ guardrails to find vulnerabilities and improve them.

“Open-sourcing a classifier model gives those who want to bypass classifiers an opportunity to learn about how to do that. But determined jailbreakers are likely to be successful anyway,” Robert Trager, co-director of the Oxford Martin AI Governance Initiative, said.

“We recently came across a method that bypassed all safeguards of the major developers around 95% of the time — and we weren’t looking for such a method. Given that determined jailbreakers will be successful anyway, it’s useful to open-source systems that developers can use for the less determined folks,” he added.

The enterprise AI race

The release also has competitive implications, especially as OpenAI looks to challenge rival AI company Anthropic’s growing foothold among enterprise customers. Anthropic’s Claude family of AI models has become popular with enterprise customers partly because of a reputation for stronger safety controls than other AI models. Among the safety tools Anthropic uses are “constitutional classifiers” that work similarly to the ones OpenAI just open-sourced.

Anthropic has been carving out a market niche with enterprise customers, especially when it comes to coding. According to a July report from Menlo Ventures, Anthropic holds 32% of the enterprise large language model market by usage, compared to OpenAI’s 25%. In coding-specific use cases, Anthropic reportedly holds 42%, while OpenAI has 21%. By offering enterprise-focused tools, OpenAI may be attempting to win over some of these business customers, while also positioning itself as a leader in AI safety.

Anthropic’s “constitutional classifiers” consist of small language models that check a larger model’s outputs against a written set of values or policies. By open-sourcing a similar capability, OpenAI is effectively giving developers the same kind of customizable guardrails that helped make Anthropic’s models so appealing.

“From what I’ve seen from the community, it seems to be well received,” Mavroudis said. “They see the model as potentially a way to have auto-moderation. It also comes with some good connotation, as in, ‘we’re giving to the community.’ It’s probably also a useful tool for small enterprises where they wouldn’t be able to train such a model on their own.”

Some experts also worry that open-sourcing these safety classifiers could centralize what counts as “safe” AI.

“Safety is not a well-defined concept. Any implementation of safety standards will reflect the values and priorities of the organization that creates it, as well as the limits and deficiencies of its models,” John Thickstun, an assistant professor of computer science at Cornell University, told VentureBeat. “If industry as a whole adopts standards developed by OpenAI, we risk institutionalizing one particular perspective on safety and short-circuiting broader investigations into the safety needs for AI deployments across many sectors of society.”



Exclusive: Alphabet’s CapitalG names Jill Chase and Alex Nichols as general partners
I love watching “Next Man Up” basketball, where the spotlight rotates unpredictably. One night it’s the bench guard dropping 30, the next it’s the role player posting a triple-double.

CapitalG’s Jill Chase—who captained her college basketball team at Williams College—says this logic actually applies to Alphabet’s growth firm. When I ask her which basketball team is most like CapitalG, she names the WNBA’s Golden State Valkyries.

“Everybody has a different skill set, and everybody is willing to drop anything to help each other win,” said Chase. “It’s a different person every night who wins the game. And I think that’s really consistent with the way CapitalG is building its culture.”

For the first time since the firm was started in 2013, it’s promoting two general partners, Chase and Alex Nichols, Fortune has exclusively learned. Chase, who joined CapitalG in 2020 specifically with a thesis around AI, has backed Abridge, Baseten, Canva, LangChain, Physical Intelligence, and Rippling. 

Nichols, meanwhile, joined CapitalG in 2018 as an associate and was promoted to partner just two years ago. He previously worked with managing partner Laela Sturdy on the firm’s investments in Duolingo, Stripe, and Whatnot, and recently led CapitalG’s investment in Zach Dell’s energy startup Base Power. At a moment when there’s mounting angst around data centers and what it will take to power them, Nichols has a surprising take on how AI will affect energy—that both batteries and solar are getting cheaper and better at something like Moore’s Law speed. Those twin cost curves should, over time, actually drive energy prices down.

“I’m actually very optimistic about the future of energy prices,” he said. “You look at the history of energy consumption versus GDP. And cheap energy means more production, more income, and means a higher standard of living.”

At a moment when venture is perhaps more competitive than ever—and there are certainly some solo GPs out there making their mark—there’s an argument that as lines blur between disciplines in an AI-ified world, venture is by necessity a team sport.  

Sturdy—who’s been CapitalG’s managing partner since 2023 (and also captained her college basketball team)—and Chase both have clearly taken some learnings from their time on the court. Chase sees venture overall as becoming more team-oriented: “Historically, it used to be like ‘you made general partner, go out and win your deal.’ To me, that’s not the right way to be successful in venture ever.” 

Sturdy adds that in basketball, like venture, “We have to look at the scoreboard every once in a while, and you have to get back up when you get crushed… And, of course, coming together is better than playing alone.”

Term Sheet Podcast…This week, I spoke with Exelon CEO Calvin Butler. As resource-hungry data centers continue to sprout across the country, many are questioning whether the nation’s utility network can keep pace with such large-scale demand. Butler says it can. Listen and watch here.

See you tomorrow,

Allie Garfinkle
X: @agarfinks
Email: alexandra.garfinkle@fortune.com
Submit a deal for the Term Sheet newsletter here.

Joey Abrams curated the deals section of today’s newsletter. Subscribe here.

VENTURE CAPITAL

humans&, a San Francisco-based AI lab, raised $480 million in seed funding. SV Angel and Georges Harik led the round and were joined by NVIDIA and others.

Emergent, a San Francisco-based platform designed for AI software creation, raised $70 million in Series B funding. Khosla Ventures and SoftBank led the round and were joined by Prosus, Lightspeed, Together, and Y Combinator.

Exciva, a Heidelberg, Germany-based developer of therapeutics designed for neuropsychiatric conditions, raised €51 million ($59 million) in Series B funding. Gimv and EQT Life Sciences led the round and were joined by Fountain Healthcare Partners, LifeArc Ventures, and others.

Pomelo, a Buenos Aires, Argentina-based payments infrastructure company, raised $55 million in Series C funding. Kaszek and Insight Partners led the round and were joined by Index Ventures, Adams Street Partners, S32, and others.

Cloover, a Berlin, Germany-based operating system designed for energy independence, raised $22 million in Series A funding. MMC Ventures and QED Investors led the round and were joined by Lowercarbon Capital, BNVT Capital, Bosch Ventures, and others.

Statusphere, a Winter Park, Fla.-based influencer marketing technology platform, raised $18 million in Series A funding. Volition Capital led the round and was joined by HearstLab, 1984 Ventures, and How Women Invest.

Dominion Dynamics, an Ottawa, Canada-based defense technology company, raised CA$21 million ($15.2 million) in seed funding. Georgian led the round and was joined by Bessemer Venture Partners and British Columbia Investment Management Corporation.

Cosmos, a New York City-based image collection and discovery platform, raised $15 million in Series A funding. Shine Capital led the round and was joined by Matrix and others.

Mave, a Toronto, Canada-based real estate AI company, raised $5 million in seed funding from Staircase Ventures, Relay Ventures, N49P, and Alate Partners.

Stilla, a Stockholm, Sweden-based developer of an AI designed to accommodate entire teams, raised $5 million in pre-seed funding. General Catalyst led the round and was joined by others.

Asymmetric Security, a London, U.K. and San Francisco-based cyber forensics company, raised $4.2 million in pre-seed funding. Susa Ventures led the round and was joined by Halcyon Ventures, Overlook Ventures, and angel investors.

PRIVATE EQUITY

ConnectWise, backed by Thoma Bravo, acquired zofiQ, a Toronto, Ontario-based agentic AI technology company focused on automating high-volume service desk operations. Financial terms were not disclosed.

Grant Avenue Capital acquired 21st Century Healthcare, a Tempe, Ariz.-based vitamins, minerals, and supplements company. Financial terms were not disclosed.

Highlander Partners acquired Tapatio, a Vernon, Calif.-based hot sauce brand. Financial terms were not disclosed. 

Platinum Equity acquired Czarnowski Collective, a Chicago, Ill.-based exhibit and events company. Financial terms were not disclosed.

United Building Solutions, backed by AE Industrial, acquired DFW Mechanical Group, a Wylie, Texas-based HVAC solutions company. Financial terms were not disclosed.

IPOS

PicPay, a São Paulo, Brazil-based digital bank, now plans to raise up to $435.1 million in an offering of 22.9 million shares priced between $16 and $19 on the Nasdaq. The company posted $1.7 billion in revenue for the year ended September 30. J&F International and Banco Original back the company.

Ethos Technologies, a San Francisco-based online life insurance provider, plans to raise up to $210 million in an offering of 10.5 million shares priced between $18 and $20. The company posted $344 million in revenue for the year ended Sept. 30. General Catalyst, Heroic Ventures, Eric Lantz, and others back the company.

FUNDS + FUNDS OF FUNDS

Blueprint Equity, a La Jolla, Calif.-based growth equity firm, raised $333 million for its third fund focused on enterprise software, business-to-business, and tech-enabled services companies.

PEOPLE

Area 15 Ventures, a Castle Pines, Colo.-based venture capital firm, promoted Adam Contos to managing partner.

Bull City Venture Partners, a Durham, N.C.-based venture capital firm, hired Carly Connell as a principal.

Harvest Partners, a New York City-based private equity firm, promoted Lucas Rodgers to partner, Matthew Bruckmann and Ian Singleton to principal, and Connor Scro to vice president on the private equity team. 

Wingman Growth Partners, a Greenwich, Conn.-based private equity firm, hired Cheri Reeve as CFO. She previously served as principal and CFO at Atlas Holdings.



Davos 2026: reading the signals, not the headlines

Louisa Loran advises boards and leadership teams on transformation and long-term value creation and currently serves on the boards of Copenhagen Business School and CataCap Private Equity. At Google, Louisa launched a billion-dollar supply chain solutions business, doubled growth in a global industry vertical, and led strategic business transformation for the company’s largest customers in EMEA—working at the forefront of AI, data, and platform innovation. At Maersk, she co-authored the strategy that redefined the brand globally and doubled its share price, helping pivot the company from traditional shipping to integrated logistics. Her career began in the luxury and FMCG space with Moët Hennessy and Diageo, where she built iconic brands and led innovation at the intersection of heritage and digital transformation.



Hotels allege predatory pricing, forced exclusivity in Trip.com antitrust probe
China’s hotels are welcoming record numbers of travelers, yet room rates are sinking—a paradox many operators blame on Trip.com Group Ltd.

For Gary Huang, running a five-room homestay in the scenic Huzhou hills near Shanghai was supposed to secure his family’s financial future. Instead, he and other hoteliers in China’s southeastern Zhejiang province say nightly rates have fallen to levels last seen more than a decade ago, as Trip.com’s frequent discount campaigns force them to cut prices simply to remain visible on China’s dominant booking platform.

“The promotion campaigns now are almost a daily routine,” said Huang, who asked to be identified by his self-chosen English name out of concern about speaking out against Trip.com. “We have to constantly cut prices at least 15% to attract travelers. We have no choice but to go along with the price cuts.”

Trip.com has been central to China’s post-pandemic travel rebound, connecting millions of travelers with small operators like Huang. But for many hotels, visibility—and sometimes survival—comes at the expense of profits.

That dynamic is now at the heart of Beijing’s antitrust probe. Regulators allege Trip.com is abusing its market position, with analysts citing deflation across the sector as the government’s main concern. Interviews with lodging operators, industry groups and travel consultants describe a system where constant price-cutting and opaque policies are eroding profitability, even as demand rebounds.

Trip.com has said it’s cooperating with the government’s investigation. The company’s stock has fallen more than 16% since the probe was announced a week ago.

Revenue per room—a key hotel metric—was flat across China in 2025, even as other Asian markets saw gains, according to Bloomberg Intelligence. Marriott International Inc.’s revenue per room in China fell 1% through most of last year, while Hilton’s China room revenue trailed its regional peers.

The company controls about 56% of China’s online travel market, according to China Trading Desk, and has grown into the world’s largest booking site. Its dominance has helped fuel domestic tourism’s recovery—nearly 5 billion trips were logged in the first three quarters of 2025—but operators say the benefits are being offset by falling room yields.

“The market has developed unevenly and innovation is lacking due to monopolistic practices,” said He Shuangquan, head of the Yunnan Provincial Tourism Homestay Industry Association that represents some 7,000 operators. “The entire online travel agency sector is stagnating in a pool of dead water.”

‘Pick-one-of-two’

The broader challenge is oversupply and cautious consumer spending. In regions like Yunnan, hotel capacity has tripled since the pandemic, just as travelers tightened budgets. Consultants note that while people are traveling more, they’re spending less—leaving hotels slashing rates to fill empty beds and posting billions in losses.

For operators like Huang, the paradox is stark: the platform that delivers customers is also accelerating the race to the bottom. The complaints center on Trip.com’s “er xuan yi” (Mandarin for “pick one of two”) exclusivity arrangements, a practice that Chinese regulators have repeatedly vowed to stamp out.

Trip.com categorizes merchants into tiers, with “Special Merchants” enjoying the most visibility and traffic, He said. However, these top-tier merchants are typically prohibited from listing on rival platforms like Alibaba’s Fliggy, ByteDance’s Douyin, or Meituan. Merchants who aren’t bound by these exclusive arrangements report being effectively compelled to offer their lowest prices on Trip.com’s online booking platform Ctrip, or risk a raft of measures such as lowered search rankings.


