CIO Jamie Smith's engineering team is building out a skills inference model using generative AI technology. But he's also concerned that if chatbots are allowed to replace IT workers, originality will die.

The emergence of artificial intelligence (AI) has opened the door to endless opportunities across hundreds of industries, but privacy continues to be a huge concern. The use of data to inform AI tools can unintentionally reveal sensitive and personal information.
Chatbots built atop large language models (LLMs) such as GPT-4 hold tremendous promise to reduce the amount of time knowledge workers spend summarizing meeting transcripts and online chats, creating presentations and campaigns, performing data analysis and even compiling code. But the technology is far from fully vetted.
As AI tools continue to grow and gain acceptance, and not just within consumer-facing applications such as Microsoft's Bing and Google's Bard chatbot-powered search engines, there's growing concern over data privacy and originality.
Once LLMs become more standardized, and more companies use the same algorithms, will originality of ideas become watered down?

University of Phoenix CIO Jamie Smith
Jamie Smith, chief information officer at the University of Phoenix, has a passion for creating high-performance digital teams. He started his career as a founder of an early internet consulting firm, and he has looked to apply technology to business problems since.
Smith is currently using an LLM to build out a skills inference engine based on generative AI. But, as generative AI becomes more pervasive, Smith's also concerned about the privacy of ingested data and how the use of the same AI model by a plethora of organizations could affect originality that only comes from human beings.
The following are excerpts of Smith's interview with Computerworld:
What keeps you up at night? "I'm having a hard time seeing how all of this [generative AI] will augment versus replace all our engineers. Right now, our engineers are amazing problem-solving machines; forget about coding. We've enabled them to think about student problems first and coding problems second.
"So, my hope is [generative AI] will be like bionics for engineers that will allow them more time to focus on student issues and less time thinking about how to get their code compiled. The second, less optimistic view is that engineers will become less involved in the process, and in turn we'll get something that's faster but doesn't have a soul to it. I'm afraid that if everyone is using the same models, where is the innovation going to come from? Where's that part of a great idea if you've shifted that over to computers?
"So, that's the yin and the yang of where I see this heading. And as a consumer myself, the ethical considerations really start to amplify as we rely more on black-box models whose inner workings we really don't understand."
How could AI tools unintentionally reveal sensitive data and private information? "Generative AI works by ingesting large data sets and then building inferences or assumptions from those data sets.
"There was this famous story where Target started sending things to a guy's teenage daughter, who was pregnant at the time, before he knew. She was in high school. So, he came into Target really angry. The model knew before the father did that his daughter was pregnant.
"That's one example of inference, or a revealing of data. The other simple issue is: how secure is the data that's ingested? What are the opportunities for it to go out in an unsanitized way that unintentionally unveils things like health information? ...Personal health information, if not scrubbed properly, can get out there unintentionally. I think there are more subtle ones, and those concern me a little bit more.
"Where the University of Phoenix is located is where Waymo has had its cars. Consider the number of sensors on those cars and all that data going back to Google. They can do things like read license plates and suggest, 'I see that your car is parked at the house from 5 p.m. to 7 p.m. That's a good time to reach you.' With all these billions of sensors out there, all connected back [to AI clouds], there are some nuanced ways that data we might not consider uber-private, but that is revealing, could get out there."
Prompt engineering is a nascent skill growing in popularity. As generative AI grows and ingests industry- or even corporate-specific data for tailoring LLMs, do you see a growing threat to data privacy? "First, do I expect prompt engineering as a skill to grow? Yes. There's no question about that. The way I look at it, engineering is about coding, and training these AI models with prompt engineering is almost like parenting. You're trying to encourage an outcome by continuing to refine how you ask it questions and really helping the model understand what a good outcome is. So, it's similar, but a different enough skill set.... It'll be interesting to see how many engineers can cross that chasm to get to prompt engineering.
"On the privacy front, we're invested in a company that does corporate skills inference. It takes a bit of what you're doing in your systems of work, be it your learning management system, email, who you work for and what you work with, and infers skills and proficiency levels for what you may need.
"Because of this, we've had to implement that in a single-tenant model. So, we've stood up a new tenant for each company with a base model and their training data, and we hold their training data for the least amount of time needed to train the model, then cleanse it and send it back to them. I wouldn't call that a best practice; it's a challenging thing to scale. But you're getting into situations where some of the controls don't yet exist for privacy, so you have to do stuff like that.
"The other thing I've seen companies start to do is introduce noise into the data to sanitize it in such a way that you can't get down to individual predictions. But there's always a balance between how much noise you introduce and how much that degrades the model's predictions.
"Right now, we're trying to figure out our best bad choice to ensure privacy in these models, because anonymizing isn't perfect. Especially as we get into images, video, and voice, things that are much more complex than pure data and words, details can slip through the cracks."
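The noise-injection approach Smith describes is, in spirit, differential privacy: add calibrated random noise so aggregate answers stay useful while individual records can't be pinned down. Here is a minimal sketch of the classic Laplace mechanism applied to a simple count query; the function names and parameters are illustrative, not drawn from EmPath or the University of Phoenix:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    # A counting query changes by at most 1 when one person's record is
    # added or removed (sensitivity 1), so Laplace(1/epsilon) noise gives
    # epsilon-differential privacy for this single query.
    return true_count + laplace_noise(1.0 / epsilon)
```

The balance Smith mentions is the epsilon parameter: a smaller epsilon means more noise and stronger privacy, but less accurate answers from the model.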
Every large language model has a different set of APIs to access it for prompt engineering. At some point, do you believe things will standardize? "There are a lot of companies that were built on top of GPT-3, basically making the API easier to deal with and the prompts more consistent. I think Jasper was one of several start-ups to do that. So clearly there's a need for it. As these models evolve beyond language and into images and sound, there will have to be standardization.
"Right now, it's like a dark art; prompt engineering is closer to sorcery than engineering at this point. There are emerging best practices, but this is a problem anyway in having a lot of [unique] machine learning models out there. For example, we have an SMS-text machine learning model for nurturing our prospects, but we also have a chatbot for nurturing prospects. We've had to train both of those models separately.
"So [there needs to be] not only the prompting but more consistency in training, and in how you can train around intent consistently. There are going to have to be standards. Otherwise, it's just going to be too messy.
"It's like having a bunch of children right now. You have to teach each of them the same lesson but at different times, and sometimes they don't behave all that well.
"That's the other piece of it. That's what scares me, too. I don't know that it's an existential threat yet, you know, like the end-of-the-world, apocalypse, Skynet-is-here thing. But it is going to really reshape our economy and knowledge work. It's changing things faster than we can adapt to it."
Is this your first foray into the use of large language models? "It's my first foray into large language models that haven't been trained off of our data. So, what are the benefits of it if you have a million alumni and petabytes and petabytes of digital exhaust over the years?
"And so, we have an amazing nudge model that helps with student progression; if they're having trouble in a particular course, it will suggest specific nudges. Those are all large language models, but they were all trained off of UoP data. So, these are our first forays into LLMs where the training has already been done and we're counting on others' data. That's where it gets a little less comfortable."
What skills inference model are you using? "Our skills inference model is proprietary, and it was developed by a company called EmPath, which we're investors in. Along with EmPath, there are a couple of other companies out there, like Eightfold.ai, that are doing very similar skills inference models."
How does skills inference work? "Some of it comes out of your HR system and certifications you can achieve. The challenge we've found is that no one wants to keep a manual skills profile up to date. So, we're trying to open it up to the systems you're always using: if you're emailing back and forth, doing code check-ins in the case of engineers, or, based on your title, job assessments, whatever digital exhaust we can get that doesn't require someone going out and updating a profile. Then you train the model, have people go out and validate it to ensure its assessment of them is accurate, and then you use that and continue to iterate."
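EmPath's actual model is proprietary, but the core idea Smith describes, mining "digital exhaust" for evidence of skills, can be illustrated with a deliberately naive keyword-counting sketch. The taxonomy, function, and sample text below are invented for illustration only:

```python
from collections import Counter

# Hypothetical mapping from keywords to skills in a skills taxonomy.
SKILL_KEYWORDS = {
    "python": "Python",
    "sql": "SQL",
    "kubernetes": "Kubernetes",
}

def infer_skills(documents):
    """Score skills by how often their keywords appear in a worker's documents."""
    counts = Counter()
    for doc in documents:
        for token in doc.lower().split():
            token = token.strip(".,;:()")  # drop trailing punctuation
            if token in SKILL_KEYWORDS:
                counts[SKILL_KEYWORDS[token]] += 1
    return counts.most_common()

# Invented "digital exhaust": snippets from emails and code-review notes.
exhaust = [
    "Reviewed the Python service and tuned the SQL queries.",
    "Pair-programmed on the Python ETL job.",
]
# infer_skills(exhaust) -> [('Python', 2), ('SQL', 1)]
```

A production system would of course use a trained model over far richer signals, and, as Smith notes, would still need humans to validate the inferred profiles.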
So, this is a large language model like GPT-4? "It is. What ChatGPT and GPT-4 are going to be good at is the natural language processing part of that: inferring a skills taxonomy based on things you've done, and being able to then train on that. GPT-4 has mostly scraped [all the input it needs]. One of the hard things for us is choosing. Do I pick an IBM skills taxonomy? Do I pick an MC1 taxonomy? The benefit of large language models like GPT-4 is that they've scraped all of them, and they can provide information in any way you want it. That's been really helpful."
So, is this a recruitment tool, or a tool for upskilling and retraining an existing workforce? "This is less for recruitment, because there are lots of those on applicant tracking platforms. We're using it for internal skills development for companies. And we're also using it for team building: if you have to put together a team across a large organization, it finds all the people with the right skills profile. It's a platform designed to target learning and to help elevate skills, or to reskill and upskill your existing employees.
"The interesting thing is that while AI is helping, it's also disrupting those same employees and requiring them to be reskilled. It's causing the disruption and helping solve the problem."
Are you using this skills inference tech internally or for clients? "We are wrapping it into a bigger platform now. So, we're still in a dark phase with a couple of alpha implementations. We actually implemented it ourselves. So, it's like eating your own filet mignon.
"We have 3,500 employees and went through an implementation ourselves to ensure it worked. Again, I think this is going to be one of those industries where the more data you can feed it, the better it works. The hardest thing I've found is that data sets are kind of imperfect; it's only as good as the data you're feeding it until we can wire more of that in there and get that digital exhaust. It's still a lot better than starting from scratch. We also do a lot of assessment. We have a tool called Flo that analyzes code check-ins and check-outs and suggests learning. It's one of the tool suites we look at for employee reskilling.
"In this case, there's probably less private data in there on an individual basis. But again, because the company's view of this is so proprietary in terms of the information fed in [from HR and other systems], we've had to turn this into kind of a walled garden."
How long has the project been in development? "We probably started it six to eight months ago, and we expect it to go live in the next quarter, for the first alpha customer at least. Again, we're learning our way through it, so little pieces of it are live today. The other thing is, there are a lot of choices for curriculum out there besides the University of Phoenix. So the first thing we had to do is map every single course we have, decide what skills come out of those courses, and have validation for each of those skills. That's been a big part of the process, and it doesn't even involve technology, frankly. It's nuts-and-bolts alignment. You don't want one course to spit out 15 skills. It's got to be the skills you really learn from any given course.
"This is part of our overall rethinking of ourselves. The degree is important, but your outcome is really about getting that next job in the shortest amount of time possible. So, this overall platform is going to help do that within a company. I think a lot of times, if you're missing a skill, the first inclination is to go out and hire somebody versus reskill an employee you already have, who already understands the company culture and has a history with the organization. So, we're trying to make this the easy button.
"This will be something we're working on for our business-to-business customers, so we'll be implementing it for them. We have over 500 business-to-business customer relationships now, but that's really more of a tuition-benefit kind of thing, where your employer pays a portion of the tuition.
"This is about how to deepen our relationship with those companies and help them solve this problem. So, we've gone out and interviewed CHROs and other executives, trying to make what we do more applicable to what they need.
"Hey, as a CIO myself, I have that problem. The war for talent is real, and we can't buy enough talent in the current arms race for wages. So, we have to upskill and reskill as much as possible internally as well."