Rob Carey, president of Cloudera Government Solutions, said federal agencies must develop a proper data management strategy to protect sensitive data used to train artificial intelligence models.

“Moving forward, investment in trusted data will be vital for the progress of AI in the public sector,” Carey, a previous Wash100 awardee, wrote in a commentary published Monday in Nextgov/FCW.

He noted that functional AI systems depend on clean, secure data, which agencies can achieve by using open data lakehouses. These platforms support data-driven operations and data literacy by using governance to improve trust in data.

Carey stated that open data lakehouses serve as centralized repositories for storing and distributing data, giving agencies the flexibility to expand analytics and AI while ensuring data quality and streamlining data security.

“Data lakehouses also provide end-to-end management and control capabilities throughout the data lifecycle,” he added.

The Cloudera executive described how open data initiatives and data lakehouses could help the government manage its data assets securely and stressed that language models and vector databases need trusted data to support agency missions.

“Amid rising security threats to AI models, it’s important for federal agencies to maintain data integrity, privacy, and compliance with regulatory requirements,” Carey added.

