Lucas Mearian
Senior Reporter

Q&A: Univ. of Phoenix CIO says chatbots could threaten innovation

feature
Mar 22, 2023 | 13 mins

CIO Jamie Smith's engineering team is building out a skills inference model using generative AI technology. But he's also concerned that if chatbots are allowed to replace IT workers, originality will die.


The emergence of artificial intelligence (AI) has opened the door to endless opportunities across hundreds of industries, but privacy continues to be a huge concern. The use of data to inform AI tools can unintentionally reveal sensitive and personal information.

Chatbots built atop large language models (LLMs) such as GPT-4 hold tremendous promise to reduce the amount of time knowledge workers spend summarizing meeting transcripts and online chats, creating presentations and campaigns, performing data analysis, and even compiling code. But the technology is far from fully vetted.

As AI tools continue to grow and gain acceptance, not just within consumer-facing applications such as Microsoft's Bing and Google's Bard chatbot-powered search engines, there's a growing concern over data privacy and originality.

Once LLMs become more standardized and more companies use the same algorithms, will the originality of ideas become watered down?


University of Phoenix CIO Jamie Smith

Jamie Smith, chief information officer at the University of Phoenix, has a passion for creating high-performance digital teams. He started his career as a founder of an early internet consulting firm and has looked to apply technology to business problems ever since.

Smith is currently using an LLM to build out a skills inference engine based on generative AI. But as generative AI becomes more pervasive, Smith is also concerned about the privacy of ingested data and how the use of the same AI model by a plethora of organizations could affect the originality that only comes from human beings.

The following are excerpts from Smith's interview with Computerworld:

What keeps you up at night? "I'm having a hard time seeing how all of this [generative AI] will augment versus replace all our engineers. Right now, our engineers are amazing problem-solving machines; forget about coding. We've enabled them to think about student problems first and coding problems second.

"So, my hope is, [generative AI] will be like bionics for engineers that will allow them more time to focus on student issues and less time thinking about how to get their code compiled. The second, and less optimistic, view is that engineers will become less involved in the process, and in turn we'll get something that's faster but doesn't have a soul to it. I'm afraid that if everyone is using the same models, where is the innovation going to come from? Where's that part of a great idea if you've shifted that over to computers?

"So, that's the yin and the yang of where I see this heading. And as a consumer myself, the ethical considerations really start to amplify as we rely more on black-box models whose inner workings we don't really understand."

How could AI tools unintentionally reveal sensitive data and private information? "Generative AI works by ingesting large data sets and then building inferences or assumptions from those data sets.

"There was this famous story where Target started sending baby-related offers to a guy's teenage daughter, who was pregnant at the time, and it was before he knew. She was in high school. So, he came into Target really angry. The model knew before the father did that his daughter was pregnant.

"That's one example of inference, or a revealing of data. The other, simpler issue is: how secure is the data that's ingested? What are the opportunities for it to go out in an unsanitized way that unintentionally unveils things like health information? ... Personal health information, if not scrubbed properly, can get out there unintentionally. I think there are more subtle ones, and those concern me a little bit more.

"Where the University of Phoenix is located is also where Waymo has had its cars. If you consider the number of sensors on those cars, all that data is going back to Google. They can read license plates, so they can suggest things like, 'Hey, I see that your car is parked at the house from 5 p.m. to 7 p.m. That's a good time to reach you.' With all these billions of sensors out there, all connected back [to AI clouds], there are some nuanced ways that data we might not consider uber-private, but that is revealing, could get out there."

Prompt engineering is a nascent skill growing in popularity. As generative AI grows and ingests industry- or even corporate-specific data for tailoring LLMs, do you see a growing threat to data privacy? "First, do I expect prompt engineering as a skill to grow? Yes. There's no question about that. The way I look at it, engineering is about coding, and training these AI models with prompt engineering is almost like parenting. You're trying to encourage an outcome by continuing to refine how you ask it questions and really helping the model understand what a good outcome is. So, it's similar, but a different enough skill set. ... It'll be interesting to see how many engineers can cross that chasm to get to prompt engineering.
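Smith's "parenting" analogy is easy to see in practice: the same request, progressively refined, steers a model toward a well-defined outcome. Below is a minimal sketch of that refinement loop in Python; the llm_complete stub and the prompt wording are illustrative assumptions, not anything described in the interview.

```python
# Illustrative only: `llm_complete` stands in for whatever completion API
# you use (OpenAI, Anthropic, a self-hosted model); it is not a real call.
def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM completion call."""
    raise NotImplementedError("wire this up to your model provider")

# Iteration 1: vague -- the model has to guess what a good outcome is.
v1 = "Summarize this meeting transcript."

# Iteration 2: audience and format narrow the output space.
v2 = (
    "Summarize this meeting transcript for an engineering manager. "
    "Use at most five bullet points."
)

# Iteration 3: an explicit definition of success -- the "parenting" step,
# telling the model what a good outcome looks like before it answers.
v3 = (
    "Summarize this meeting transcript for an engineering manager. "
    "Use at most five bullet points, each naming an owner and a deadline. "
    "If no deadline was agreed on, write 'no deadline set' rather than guessing."
)
```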

"On the privacy front, we're invested in a company that does corporate skills inference. It takes a bit of what you're doing in your systems of work, be it your learning management system, email, who you work for and what you work with, and infers skills and skill levels around proficiencies for what you may need.

"Because of this, we've had to implement that in a single-tenant model. So, we've stood up a new tenant for each company with a base model and then their training data, and we hold their training data for the least amount of time needed to train the model and then cleanse it and send it back to them. I wouldn't call that a best practice. That's a challenging thing to do at scale, but you're getting into situations where some of the controls don't yet exist for privacy, so you have to do stuff like that.
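That single-tenant lifecycle (provision an isolated tenant, train from a shared base model, hold customer data only for the training run, then purge) can be sketched in a few lines. This is a hedged reading of what Smith describes, with hypothetical names throughout; it is not EmPath's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Tenant:
    """One isolated tenant per customer, all starting from the same base model."""
    company: str
    model: str = "base-skills-model"          # hypothetical base-model identifier
    training_data: list = field(default_factory=list)

def fine_tune(model: str, data: list) -> None:
    """Placeholder for the actual fine-tuning job."""
    pass

def provision_tenant(company: str) -> Tenant:
    """Stand up a fresh tenant rather than pooling customers in one model."""
    return Tenant(company=company)

def train_and_purge(tenant: Tenant, customer_data: list) -> None:
    """Hold the customer's data only as long as the training run requires."""
    tenant.training_data = customer_data
    try:
        fine_tune(tenant.model, tenant.training_data)
    finally:
        tenant.training_data = []             # cleanse and return the data afterward
```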

"The other thing I've seen companies start to do is introduce noise into the data to sanitize it in such a way that you can't get down to individual predictions. But there's always a balance between how much noise you introduce and how much that degrades the model's predictions.
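The noise-versus-accuracy balance Smith describes is the core idea behind differential privacy. Below is a minimal sketch of one standard approach, the Laplace mechanism applied to a simple count query; this is a textbook illustration, not the technique any particular vendor uses.

```python
import numpy as np

def private_count(values: list[bool], epsilon: float) -> float:
    """Count of True values plus Laplace noise scaled to 1/epsilon.

    Smaller epsilon means more noise: stronger privacy, weaker accuracy.
    """
    true_count = sum(values)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical data: 40 of 100 employees hold a given skill.
records = [True] * 40 + [False] * 60

print(private_count(records, epsilon=1.0))    # near 40: useful, some privacy
print(private_count(records, epsilon=0.01))   # heavy noise: private, barely useful
```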

"Right now, we're trying to figure out our best bad choice to ensure privacy in these models, because anonymizing isn't perfect. Especially as we're getting into images, videos, and voice, things that are much more complex than just pure data and words, those things can slip through the cracks."

Every large language model has a different set of APIs to access it for prompt engineering. At some point, do you believe things will standardize? "There are a lot of companies that were built on top of GPT-3. So, they were basically making the API easier to deal with and the prompts more consistent. I think Jasper was one of several start-ups to do that. So clearly there's a need for it. As they evolve beyond large language models and into images and sound, there will have to be standardization.
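In the absence of a standard, the usual interim fix is a thin adapter layer: application code targets one interface, and per-provider adapters hide each vendor's API. A sketch of that pattern follows; the class names here are illustrative stubs, not real SDK signatures.

```python
from typing import Protocol

class Completer(Protocol):
    """The one interface application code depends on."""
    def complete(self, prompt: str) -> str: ...

class OpenAIAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call the OpenAI SDK here")

class LocalModelAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call a self-hosted model here")

def summarize(backend: Completer, text: str) -> str:
    """Swap providers without touching the prompt logic."""
    return backend.complete(f"Summarize in three sentences:\n{text}")
```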

"Right now, it's like a dark art; prompt engineering is closer to sorcery than engineering at this point. There are emerging best practices, but this is a problem anyway in having a lot of [unique] machine learning models out there. For example, we have an SMS-text machine learning model for nurturing our prospects, but we also have a chatbot for nurturing prospects. We've had to train both those models separately.

"So [there needs to be] not only the prompting but more consistency in training and how you can train around intent consistently. There are going to have to be standards. Otherwise, it's just going to be too messy.
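One way to read that wish for consistency: define intents and their example utterances once, and have every channel-specific model train from the same source of truth. The sketch below is a hypothetical illustration; the interview does not describe the university's setup in this detail.

```python
# Shared intent definitions: a single source of truth for every channel.
SHARED_INTENTS = {
    "request_info": ["send me the program details", "tell me more about tuition"],
    "schedule_call": ["can someone call me", "set up a time to talk"],
}

def training_rows() -> list[tuple[str, str]]:
    """Flatten the shared intents into (utterance, intent) training rows.

    If the SMS model and the chatbot both train from these rows, an
    intent means the same thing on every channel.
    """
    return [
        (text, intent)
        for intent, phrases in SHARED_INTENTS.items()
        for text in phrases
    ]

sms_training_data = training_rows()    # fed to the SMS nurture model
chat_training_data = training_rows()   # fed to the chatbot, same definitions
```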

"It's like having a bunch of children right now. You have to teach each of them the same lesson but at different times, and sometimes they don't behave all that well.

"That's the other piece of it. That's what scares me, too. I don't know that it's an existential threat yet, you know, like it's the end-of-the-world, apocalypse, Skynet-is-here thing. But it is going to really reshape our economy and knowledge work. It's changing things faster than we can adapt to it."

Is this your first foray into the use of large language models? "It's my first foray into large language models that haven't been trained off of our data. So, what are the benefits of it if you have a million alumni and petabytes and petabytes of digital exhaust over the years?

"And so, we have an amazing nudge model that helps with student progression: if they're having trouble in a particular course, it will suggest specific nudges. Those are all large language models, but that was all trained off of UoP data. So, these are our first forays into LLMs where the training has already been done and we're counting on others' data. That's where it gets a little less comfortable."

What skills inference model are you using? "Our skills inference model is proprietary, and it was developed by a company called EmPath, which we're investors in. Along with EmPath, there are a couple of other companies out there, like Eightfold.ai, that are doing skills inference models that are very similar."

How does skills inference work? "Some of it comes out of your HR system and the certifications you've achieved. The challenge we've found is that no one wants to go out there and keep a manual skills profile up to date. So we're trying to tap into the systems you're always using: for engineers, that's emailing back and forth and doing code check-ins, or, based on your title, job assessments; whatever digital exhaust we can get that doesn't require someone going out to enter it. And then you train the model, and then you have people go out and validate the model to ensure its assessment of them is accurate. Then you use that and continue to iterate."
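Pieced together, the loop Smith outlines is: aggregate digital-exhaust signals per person, score skills from them, have the employee validate, and retrain. Here is a hedged sketch of that loop; the signal names and weights are invented for illustration, since EmPath's model is proprietary.

```python
from collections import defaultdict

# Hypothetical weights for how strongly each signal implies a skill.
SIGNAL_WEIGHTS = {"code_checkin": 3.0, "email_thread": 1.0, "certification": 5.0}

def infer_skills(events: list[dict]) -> dict[str, float]:
    """Score skills from events like {'skill': 'python', 'signal': 'code_checkin'}."""
    scores: defaultdict[str, float] = defaultdict(float)
    for event in events:
        scores[event["skill"]] += SIGNAL_WEIGHTS.get(event["signal"], 0.5)
    return dict(scores)

def validate(inferred: dict[str, float], confirmed: set[str]) -> dict[str, float]:
    """The human-in-the-loop step: keep what the employee confirms, then iterate."""
    return {skill: score for skill, score in inferred.items() if skill in confirmed}

events = [
    {"skill": "python", "signal": "code_checkin"},
    {"skill": "python", "signal": "email_thread"},
    {"skill": "sql", "signal": "certification"},
]
print(validate(infer_skills(events), confirmed={"python", "sql"}))
# {'python': 4.0, 'sql': 5.0}
```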

So, this is a large language model like GPT-4? "It is. What ChatGPT and GPT-4 are going to be good at doing is the natural language processing part of that: inferring a skills taxonomy based on things you've done and being able to then train that. GPT-4 has mostly scraped [all the input it needs]. One of the hard things for us is choosing. Do I pick an IBM skills taxonomy? Do I pick an MC1 taxonomy? The benefit of large language models like GPT-4 is that they've scraped all of them, and it can provide information in any way you want it. That's been really helpful."
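The taxonomy flexibility Smith values comes down to asking the model for output in whichever framework you name. A hedged sketch of that kind of prompt follows; llm_complete is the same hypothetical stub as in the earlier sketch, and the taxonomy names are only examples.

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for a real LLM completion call, as in the earlier sketch."""
    raise NotImplementedError("wire this up to your model provider")

def map_to_taxonomy(accomplishments: str, taxonomy: str) -> str:
    """Ask the model to express inferred skills in a named taxonomy."""
    prompt = (
        f"Given these work accomplishments:\n{accomplishments}\n\n"
        f"List the skills they demonstrate using the {taxonomy} skills "
        "taxonomy. Return one skill per line with a 1-5 proficiency rating."
    )
    return llm_complete(prompt)

# The same evidence can be re-expressed in a different framework on demand:
# map_to_taxonomy(notes, "IBM")  versus  map_to_taxonomy(notes, "O*NET")
```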

So, is this a recruitment tool, or a tool for upskilling and retraining an existing workforce? "This is less for recruitment because there are lots of those on applicant tracking platforms. We're using it for internal skills development for companies. And we're also using it for team building. So, if you have to put together a team across a large organization, it's finding all the people with the right skills profile. It's a platform designed to target learning and to help elevate skills, or to reskill and upskill your existing employees.

"The interesting thing is that while AI is helping, it's also disrupting those same employees and requiring them to be reskilled. It's causing the disruption and helping solve the problem."

Are you using this skills inference tech internally or for clients? "We are wrapping it into a bigger platform now. So, we're still in a dark phase with a couple of alpha implementations. We actually implemented it ourselves. So, it's like eating your own filet mignon.

"We have 3,500 employees and went through an implementation ourselves to ensure it worked. Again, I think this is going to be one of those industries where the more data you can feed it, the better it works. The hardest thing I found with this is that data sets are kind of imperfect; it's only as good as the data you're feeding it until we can wire more of that noise in there and get that digital exhaust. It's still a lot better than starting from scratch. We also do a lot of assessment. We have a tool called Flo, which analyzes code check-ins and check-outs and suggests learning. It's one of the tool suites we look at for employee reskilling.

"In this case, there's probably less private data in there on an individual basis, but again, because the company's view of this is so proprietary in terms of feeding information in [from HR and other systems], we've had to turn this into kind of a walled garden."

How long has the project been in development? "We probably started it six to eight months ago, and we expect it to go live in the next quarter, for the first alpha customer, at least. Again, we're learning our way through it, so little pieces of it are live today. The other thing is there are a lot of choices for curriculum out there besides the University of Phoenix. So, the first thing we had to do is map every single course we had, decide what skills come out of those courses, and have validation for each of those skills. So that's been a big part of the process that doesn't even involve technology, frankly. It's nuts-and-bolts alignment. You don't want to have one course spit out 15 skills. It's got to be the skills you really learn from any given course.
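That course-to-skills mapping is mostly careful data work rather than technology. A toy illustration of the shape of the mapping, with a guard against any one course claiming too many skills, follows; the course names and skills are invented.

```python
# Hypothetical course-to-skills map; every listed skill must be validated.
COURSE_SKILLS = {
    "DAT-305 Data Structures": ["algorithms", "python", "complexity analysis"],
    "MGT-420 Organizational Behavior": ["team leadership", "conflict resolution"],
}

# "You don't want to have one course spit out 15 skills."
MAX_SKILLS_PER_COURSE = 5

for course, skills in COURSE_SKILLS.items():
    assert len(skills) <= MAX_SKILLS_PER_COURSE, f"{course} claims too many skills"
```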

"This is part of our overall rethinking of ourselves. The degree is important, but your outcomes are really about getting that next job in the shortest amount of time possible. So, this overall platform is going to help do that within a company. I think a lot of times if you're missing a skill, the first inclination is to go out and hire somebody versus reskill an employee you already have who already understands the company culture and has a history with the organization. So, we're trying to make this the easy button.

"This will be something we're working on for our business-to-business customers. So, we'll be implementing it for them. We have over 500 business-to-business customer relationships now, but that's really more of a tuition-benefit kind of thing, where your employer pays a portion of the tuition.

"This is about how to deepen our relationship with those companies and help them solve this problem. So, we've gone out and interviewed CHROs and other executives, trying to make what we do more applicable to what they need.

"Hey, as a CIO myself, I have that problem. The war for talent is real, and we can't buy enough talent in the current arms race for wages. So, we have to upskill and reskill as much as possible internally as well."

Lucas Mearian

With a career spanning more than two decades in journalism and technology research, Lucas Mearian is a seasoned writer, editor, and former IDC analyst with deep expertise in enterprise IT, infrastructure systems, and emerging technologies. Currently a senior writer at Computerworld covering AI, the future of work, healthcare IT, and financial services IT, his 23-year tenure has included roles such as Senior Technology Editor and Data Storage Channel Editor, where he covered cutting-edge topics like blockchain, 3D printing, sustainable IT, and autonomous vehicles. He has appeared on several podcasts, including Foundry's Today In Tech. He also served as a research manager at IDC, where he focused on software-defined infrastructure, compute, and storage within the Infrastructure Systems, Platforms, and Technologies group.

Before entering tech media, he served as Editor-in-Chief of the Waltham Daily News Tribune and as a senior reporter for the MetroWest Daily News. He has won first-place awards from the New England Press Association and the American Association of Business Publication Editors, and has been a finalist for several Jesse H. Neal Awards for outstanding business journalism. A former U.S. Marine Corps sergeant who served in reconnaissance, he brings a disciplined, analytical mindset to his work, along with outstanding writing, research, and public speaking skills.
