The artificial intelligence tools and applications taking over the government and commercial technology landscapes today are some of the fastest growing in history. As AI continues its impressive trajectory and captures the interest of public and private sector leaders alike, new opportunities and challenges are emerging.
To get a better grasp of the current AI market, Executive Mosaic spoke with Hitachi Vantara Federal’s Chief Data Scientist Dr. Pragyansmita Nayak. As a subject matter expert on data, AI and machine learning, Nayak shared her thoughts on how AI can change the way we do data analytics and the growing applications, concepts and technologies we should be keeping an eye on in the coming years.
Read below for Nayak’s full Executive Spotlight interview.
Tell me about the current state of the artificial intelligence market. Where are you seeing new opportunities in AI, and where do you think the market is heading?
The AI market is in a vibrant stage right now, and there’s a lot of potential. We are just getting started in having the right kind of applications using artificial intelligence.
Artificial intelligence and machine learning are often conflated, but they’re different concepts. Still, if we look at how both AI and ML are being perceived, everybody is looking at them with a lot of interest and searching for good ways to use them.
In the defense space, there’s a lot of interest in leveraging predictive analytics capabilities for predictive maintenance of vehicles and vessels, be it within the Army, the Navy or the Air Force, and in space as well. We are just getting started on that journey, with a lot of potential and a lot of unanswered questions. So we are just scratching the surface; we are figuring out different ways of using it at the moment.
Data is often coming from multiple sources that organizations need to collect, analyze and understand in order to use it. What are some of the key challenges and opportunities you’re seeing emerge as organizations harness data and use it to drive decisions?
Data is the foundation of good AI/ML solutions. The ‘garbage in/garbage out’ philosophy literally applies here. If I don’t have good data, my solution is going to be equally bad: it’s likely to be biased, and it’s likely to be incomplete, because ML algorithms pick up on the patterns and trends in the data. So it is very important that all types of data are being used as part of the solution, and that the data has been properly vetted by the data stewards or the chief data officers in the organization to make sure that it is the type of data your organization should be using for solution development.
To use data for decision intelligence or action analytics, you need good-quality data, so organizations are moving from a data-rich environment to a data-driven environment. Dealing with these different types of data is the key challenge we’re facing.
There are various ways in which data differs. For example, you can have structured relational or tabular data, which most of us are very familiar with, or unstructured data, such as social media messages, log files, audio, video and any files we generate in our respective file shares. And when I say file shares, that in turn has its own categories of file, block or object storage, and then this data could be accessible via a database or a collection of APIs.
Data can be categorized at yet another level, based on its location. Is it on premises? Is it in the cloud? Is it spread across multiple clouds? Do you have a hybrid setup? All these categorizations describe different ways in which data is available. In building a solution, we want to tap into these data sources, which come with their own inherent complexities. Being able to navigate around them and look at all this data as one holistic source is important. There’s a lot of opportunity in that: in working across the different mechanisms by which data can be accessed, processed and analyzed, and in coming up with interesting solutions for those decision intelligence or action analytics applications.
What kind of tools and technologies can organizations use to make their data more accessible and understandable?
Some of the tools and technologies that organizations can use to make data more accessible and understandable rest on the data management principle of the trifecta of people, process and technology: making them all work together in a cohesive manner so that the work is reusable and repeatable, improvement is continuous, and continuous feedback during development enriches the application even further.
I would say the second thing is having metadata associated with the data, meaning the characteristics of the data as distinct from the data itself. Where did it originate? What was the process by which it was created? For example, if the data is being measured by sensors or certain equipment, you need to keep track of those sensors and measurements. When was it measured? What manipulation has that dataset undergone by the time I’m looking at it?
What’s important is not just the machine learning solution being developed, but also figuring out over time the shift, or ‘drift,’ that has happened in the data patterns themselves, and also in the model.
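As a rough illustration of how drift in data patterns might be measured, the sketch below compares a baseline sample against a newer sample using the Population Stability Index (PSI). PSI is one common technique chosen here for illustration, not one named in the interview; the function name, the equal-width binning and the 0.25 alert threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between two samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical sensor readings: the newer batch is shifted upward.
baseline_readings = [i / 100 for i in range(100)]
shifted_readings = [v + 0.5 for v in baseline_readings]

psi_same = population_stability_index(baseline_readings, baseline_readings)
psi_shifted = population_stability_index(baseline_readings, shifted_readings)
drift_detected = psi_shifted > 0.25  # assumed rule-of-thumb threshold
```

A check like this would typically run on a schedule against each input feature, with a separate comparison of model outputs to catch drift in the model itself.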
Certain tools which can help in that process are metadata-driven, and a data catalog would go a long way in terms of keeping track of those characteristics and giving you a ‘shopping portal’ kind of access to all the data assets of your organization.
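To make the metadata and catalog idea concrete, here is a minimal in-memory sketch. The `DatasetRecord` and `DataCatalog` classes, their field names and the sample values are hypothetical, chosen only to mirror the questions raised above (origin, creation process, measurement time, manipulations over time); a real data catalog product would offer far more.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    """Metadata kept alongside a dataset, separate from the data itself."""
    name: str
    origin: str            # where the data originated (e.g. a sensor ID)
    created_by: str        # the process that produced it
    measured_at: datetime  # when it was measured
    tags: set[str] = field(default_factory=set)
    lineage: list[str] = field(default_factory=list)  # manipulations over time

    def record_step(self, description: str) -> None:
        """Append one manipulation to the dataset's history."""
        self.lineage.append(description)


class DataCatalog:
    """A searchable index of dataset records, keyed by dataset name."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        self._records[record.name] = record

    def search(self, tag: str) -> list[DatasetRecord]:
        """Return every registered dataset carrying the given tag."""
        return [r for r in self._records.values() if tag in r.tags]


# Hypothetical example: an engine-vibration dataset from a vehicle sensor.
catalog = DataCatalog()
vibration = DatasetRecord(
    name="engine_vibration_2023",
    origin="sensor-A17",
    created_by="nightly telemetry export",
    measured_at=datetime(2023, 3, 1, tzinfo=timezone.utc),
    tags={"maintenance", "sensor"},
)
vibration.record_step("removed readings flagged as sensor faults")
vibration.record_step("resampled to one reading per second")
catalog.register(vibration)

maintenance_datasets = catalog.search("maintenance")
```

Searching by tag is what gives the ‘shopping portal’ feel: an analyst browses by topic and sees each dataset's origin and history before deciding to use it.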
Which emerging technologies do you anticipate will have the greatest impact on the federal landscape in the next few years?
I would start with data: where the data is located and having immediate access to it. As I mentioned earlier, metadata management, meaning keeping track of the metadata properties of the different types of structured and unstructured data being used, needs to be an ongoing effort. Sometimes it’s not given the importance it is due, but I think it’s growing in relevance and gaining prominence as people understand more about it.
Another thing I expect to grow in relevance is object storage, so that when you are keeping track of unstructured data — primarily files, audio, video files, your social media posts, anything of that nature — then you’re tracking both the data as well as the metadata with the object and using that as part of your data analytics applications.
The other concept I’m expecting to grow in relevance is natural language processing combined with deep learning, as seen in the OpenAI ChatGPT application that everybody is talking about today. That is definitely going to take off. When you are talking about machine learning, deep learning and natural language processing, there are algorithms underlying all of them. So I expect the explainable AI aspect to grow equally in importance as we go along.
At the moment, most of these algorithms are black box in nature: you don’t have insight into what data was used for training or what the parameters of the models are. For example, if there’s a neural network doing deep learning, what are the parameters? How is it determining the next word that it recommends in your sentences? Having more clarity on that is important to avoid the whole issue of bias, and more so in the federal landscape, where fairness and ethics are a lot more important.
One last item I would also identify as important for the federal landscape is machine learning-based automation with human-in-the-loop. That is, you don’t entirely automate and hand it off to a machine learning algorithm, but have certain points in that automation where it needs that human feedback for validation, for confirmation that it is still remaining fair in the way it is going about its operations. So having that kind of oversight with human-in-the-loop is equally important, particularly in the federal landscape.
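One way such a checkpoint could look in practice is sketched below: the automated prediction is accepted only when the model is confident, and otherwise the case is routed to a human reviewer. The `Decision` type, the `decide` function, the reviewer callback and the 0.9 threshold are all assumptions made for illustration; the interview does not prescribe a specific mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    label: str
    confidence: float
    reviewed_by_human: bool


def decide(
    prediction: tuple[str, float],
    ask_human: Callable[[str, float], str],
    threshold: float = 0.9,  # assumed cutoff; would be tuned per application
) -> Decision:
    """Accept confident predictions automatically; send the rest to a person."""
    label, confidence = prediction
    if confidence >= threshold:
        return Decision(label, confidence, reviewed_by_human=False)
    # Below the threshold, the human's answer overrides the model's label.
    return Decision(ask_human(label, confidence), confidence, reviewed_by_human=True)


def cautious_reviewer(label: str, confidence: float) -> str:
    """Stand-in for a real review step; always escalates for this example."""
    return "needs inspection"


automated = decide(("no maintenance required", 0.97), cautious_reviewer)
escalated = decide(("no maintenance required", 0.62), cautious_reviewer)
```

Recording `reviewed_by_human` on every decision also leaves an audit trail, which supports the oversight and fairness checks described above.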