
Enes Hosgor · 8 min read

Policy makers regulating artificial intelligence (created by DALL-E)

AI technology is being rapidly developed and is already being used in sectors such as finance, healthcare, and energy. For these and other critical sectors, there is immense public interest in validating that AI is trustworthy and accurate. The software industry developing these tools is accustomed to getting products to market quickly, and is unlikely to self-regulate at a socially acceptable level. After all, this is the industry that gave us the mantra “move fast and break things.”

There is little debate at this point that there is real value in using AI to help expedite the development of cutting-edge solutions that solve real problems across virtually every industry. But in the rush to build and deliver those products, we cannot lose sight of the potential pitfalls of AI that can arise when underlying models are not subject to a rigorous validation process. The Biden Administration recognizes the importance and urgency of this issue too, which is why it recently issued an Executive Order that “establishes new standards for AI safety and security, protects Americans’ privacy, advances equity and civil rights, stands up for consumers and workers, promotes innovation and competition, advances American leadership around the world, and more.”

As part of that Order, the U.S. Department of Health and Human Services (HHS) is “directed to establish an AI safety program to work in partnership with Patient Safety Organizations” in order to ensure AI solutions are safe and equitable before they hit the market. The goal of the Order is clear, but there is not enough detail about what the structure of the AI safety program should look like, nor is there clarity on exactly what should be measured. The Office of the National Coordinator for Health Information Technology (ONC) recently announced new rules affecting AI vendors and clinicians who use HHS-certified decision support software, effective by the end of 2024. These rules mirror the FDA's Good Machine Learning Practices and require both vendors and providers to adopt an AI lifecycle perspective on how models are trained, developed, validated, and monitored in a transparent and rigorous manner.

Structure

While the safety program could be established independently by regulatory agencies on a sector-by-sector basis (the FDA, the Federal Energy Regulatory Commission, the U.S. Securities and Exchange Commission, etc.), the commonalities across AI systems present a strong argument for a centralized AI Validation Agency (AIVA) that operates a central AI Validation Platform (AIVP). The problem with a single, centralized agency, however, is that no government body has the expertise, tools, or human resources required to manage AI validation at scale. To keep pace with the velocity at which AI is being developed, we need a hub-and-spoke framework that enables the AIVA to coordinate with relevant government agencies as well as regional and sector-specific AI Validation Platform Entities (AIVPE) that report to the AIVA. This hub-and-spoke system is an appropriate design because it allows for 1) regional office specialization based on location (e.g., financial AI in New York City), and 2) independent, redundant validation from separate offices to increase confidence in the conclusions.

The focus of the central agency is AI validation, but it cannot maintain expertise in every relevant application domain, nor can any single organization handle the sheer volume of new models that need testing. For example, in the case of medical AI, the spoke office would test and assure the quality of AI tools in a manner compliant with current healthcare regulatory and quality standards. The spoke office would also create a repository to report and track clinical errors associated with the use of the technology in order to fine-tune future models. This repository (and other domain-specific ones) should be available to the public for transparency.

The efficacy of the hub-and-spoke model will also depend on robust public-private data partnerships that leverage the best available domain expertise. In healthcare, the best way to test and validate new AI models is for hospitals and patient safety organizations to donate real-world medical data for testing; HHS's Agency for Healthcare Research and Quality (AHRQ) can facilitate this.

A hub-and-spoke framework for model validation (created by DALL-E)

Measurement

AI validation has to be done by experts equipped with appropriate testing tools, ones that do not require coding skills, to inspect and audit AI trustworthiness. Depending on the design and purpose of the AI tool, this can be checked either a) by using test data that allows the tester to give the tool known inputs and verify that its outputs match those in the test dataset, or b) by having a subject matter expert (for example, a radiologist for medical imaging AI) assess the quality of the outputs across a variety of inputs. The AIVA's focus on validating outcomes is of utmost importance; regulating internal model specifics would likely have unintended consequences as the technology matures.
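To make approach a) concrete, here is a minimal, hypothetical sketch of how a spoke office might score a tool against a labeled test set using standard Python libraries. The file names, column names, and decision threshold below are illustrative assumptions, not part of any existing AIVA tooling.

```python
# Minimal sketch of approach a): score the tool's outputs against a labeled test set.
# "validation_set.csv", "model_outputs.csv", the column names, and the 0.5 threshold
# are hypothetical placeholders for whatever the validating office actually uses.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score

test_set = pd.read_csv("validation_set.csv")   # known inputs with reference labels
outputs = pd.read_csv("model_outputs.csv")     # outputs produced by the AI tool under test

y_true = test_set["reference_label"]
y_pred = (outputs["model_score"] >= 0.5).astype(int)  # binarize scores at an assumed threshold

print({
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
})
```

In practice the validating office would wrap checks like this behind a no-code interface, as argued above, but the underlying measurement reduces to comparisons of this kind.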

The AIVA would act as a centralized entity that maintains the AIVP and concentrates expertise in AI testing, conducting three general activities:

● Quantitative validation: Using test datasets with known outputs to assess AI performance using statistical measures (accuracy, precision, recall, etc.). This includes detection of discriminatory behaviors or other socially problematic outputs (see the sketch after this list).

● Qualitative validation: Connecting with subject matter experts to rate the quality of the AI outputs (for example, doctors when assessing medical AI, or FERC experts when assessing electricity dispatch AI). Experts will also advise on subject-specific risks or other concerns that should be assessed for each AI application.

● Communication/dissemination: Similar to the way the FDA maintains detailed information about drug trial design and results, or the EPA offers detailed emissions data, an important function of the AIVA is to maintain the AIVP as a public record of methods and results. Once privacy and security concerns have been addressed, the AIVA should strive for transparency in operations and results.
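As referenced in the first item above, here is a minimal sketch of one such disparity check. It assumes the test set carries a subgroup tag alongside reference labels and model predictions; the file name, column names, and the flagging threshold are illustrative, not prescribed by any standard.

```python
# Minimal sketch of a disparity check: compare accuracy across subgroups of the test set.
# "validation_set_with_outputs.csv", its column names, and the 5-point gap used for
# flagging are hypothetical placeholders chosen for illustration.
import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.read_csv("validation_set_with_outputs.csv")  # reference labels, model predictions, subgroup tags

per_group = df.groupby("subgroup").apply(
    lambda g: accuracy_score(g["reference_label"], g["model_prediction"])
)
print(per_group)

# Flag any subgroup whose accuracy trails the best-performing subgroup by more than 5 points.
gap = per_group.max() - per_group
print(gap[gap > 0.05])
```

A real validation protocol would cover more metrics and subgroup definitions than this, but the point is that such checks become mechanical once labeled, representative test data is available.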

The validation process also needs regulatory oversight. The National Institute of Standards and Technology (NIST) is leading the pack so far with its AI Risk Management Framework, though there is no universally accepted standard yet. There are roughly 200 standards being discussed globally, but we need to keep moving toward a single set of accepted benchmarks for all applications.

Cost

Depending on the design of the relevant regulatory structures, the AIVA's costs could be shared with the regulated entities, similar to the cost-sharing approaches used by the FDA and the credit rating agencies. Around half of FDA funding comes from “user fees” paid by the industrial entities seeking fast and effective regulation of drugs or medical devices. In credit ratings, the big three (S&P Global, Moody's Corporation, Fitch Ratings) are financial services firms that offer government-sanctioned credit ratings in the US, with their rating services funded through payments by the issuers of the rated credit instruments. In both cases, the value of efficient and effective ratings or approvals supports the cost-sharing model.

Conclusion

There is simply too much collective benefit across multiple industries at risk of going unrealized if there is no concerted effort to validate AI models and ensure their efficacy and equity in the market. AI has the power to improve human outcomes, save lives, and make things so much more efficient that our government and society should treat it as seriously as we treat education, energy, or commerce. The best way to ensure that AI is safe, fair, and equitable is to have a central agency, supported by a hub-and-spoke network, that serves as a critical watchdog layer.

About the author

Dr. Enes Hosgor is an engineer by training and an AI entrepreneur by trade, driven to unlock scientific and technological breakthroughs, who has spent more than 10 years building AI products and companies in high-compliance environments. After selling his first ML company, based on his Ph.D. work at Carnegie Mellon University, he joined a digital surgery company named caresyntax to found and lead its ML division. His penchant for healthcare comes from his family of physicians, including his late father, his sister, and his wife. Formerly a Fulbright Scholar at the University of Texas at Austin, he has published scientific work in Medical Image Analysis, the International Journal of Computer Assisted Radiology and Surgery, Nature Scientific Reports, and the British Journal of Surgery, among other peer-reviewed outlets.


Gesund.ai · 3 min read

Gesund.ai is proud to announce that we’ve received a clean SOC 2 Type II attestation report. This rigorous, independent assessment of our internal security controls validates our dedication and adherence to the highest standards for security, confidentiality, and availability.

This is an important milestone but is in no way an end to our commitment to our customers and the security of their data. Gesund.ai views security as the foundation upon which our products are built and upon which trust with our customers is earned and maintained.

Gesund.ai uses Drata’s automated platform to continuously monitor its internal security controls against the highest possible standards. With Drata, Gesund.ai has real-time visibility across the organization to ensure the end-to-end security and compliance posture of our systems.

"We are thrilled to achieve our SOC 2 - Type II compliance. A shortage of well-established standards, best practices and compliance protocols has long hurt responsible innovation in medical AI that is safe, effective and equitable for all stakeholders. Gesund is embracing compliance in all possible ways to pave the path to clinical-grade AI creation and adoption. Stay tuned for more quality measures from Gesund as the benchmark for medical AI trustworthiness." attested Dr. Enes Hosgor, CEO, Gesund.ai.

Conducted by MJD Advisors, a nationally recognized CPA firm registered with the Public Company Accounting Oversight Board, this attestation report affirms that Gesund.ai’s information security practices, policies, procedures, and operations meet the rigorous SOC 2 Trust Services Criteria for security, confidentiality, and availability.

Developed by the AICPA, SOC 2 is an extensive auditing procedure that ensures that a company is handling customer data securely and in a manner that protects the organization as well as the privacy of its customers. SOC 2 is designed for service providers storing customer data in the cloud.

As more enterprises look to process sensitive and confidential business data with cloud-based services like Gesund.ai, it’s critical that they do so in a way that ensures their data will remain safe. Our customers carry this responsibility on their shoulders every single day, and it’s important that the vendors they select to process their data in the cloud approach that responsibility in the same way. 

We welcome all customers and prospects who are interested in discussing our commitment to security and reviewing our SOC compliance reports to contact us.

About Gesund.ai

Gesund is the world’s first compliant AI factory on a mission to help bring clinical-grade AI solutions to market. Backed by marquee investors including Merck, McKesson, Northpond and 500, Gesund orchestrates the entire AI/ML lifecycle for all stakeholders by bringing models, data and experts together in a no-code environment.

About Drata

Drata is the world's most advanced security and compliance automation platform with the mission to help businesses earn and keep the trust of their users, customers, partners, and prospects. With Drata, thousands of companies streamline over 10 compliance frameworks—such as SOC 2, ISO 27001, GDPR, and more—through continuous, automated control monitoring and evidence collection, resulting in a strong security posture, lower costs, and less time spent preparing for annual audits. The company is backed by ICONIQ Growth, Alkeon Capital, Salesforce Ventures, GGV Capital, Cowboy Ventures, Leaders Fund, Okta Ventures, SVCI, SV Angel, and many key industry leaders. For more information, visit drata.com.