Are you curious about getting into data in 2024 but wondering how to approach ethical considerations in machine learning? To shed some light on this crucial area, we sat down with one of the experts for their insights into where the industry is at.
An experienced engineer, William Tracy helped to design CareerFoundry’s new Machine Learning with Python Course, in particular the Ethics and Direction of Machine Learning Programs section. So who better to talk to about the ethics of AI in data?
William is a North Carolina native with a master’s and a BS in mechanical engineering, and a BA in toy design. He’s both designed and operated heavy construction machinery (big toys) and is an expert in linkage design and optimization. A professional engineer, he’s also dabbled in production-scale 3D printing as well as machine learning.
I’ve broken the interview into several topics, so you can navigate it with the clickable menu:
- Ethical considerations in data
- Regulations and policies
- User privacy
- Accountability and responsibility
- Wider world
- Designing ethics into CareerFoundry’s ML specialization course
Interview with data expert William Tracy
Hi William! Can you tell us about your history working with data and ethics?
William Tracy: I’ve worked as a mechanical engineer since 2005, and as a P.E. (professional engineer) since 2009.
Being a professional engineer means you’re held to the ethical standards of your field, and if you violate any of those standards, you can potentially lose your privilege to work in that area.
I also worked for Caterpillar Inc. on small construction equipment as a performance engineer and analyst for most of that time. Caterpillar has a very developed set of ethical rules, as the company has been around for more than 100 years.
The company requires all employees to complete yearly code of conduct training and to self-report any breaches in ethics for that year, including gifts from suppliers that might be considered bribes, collusion with other companies, and what data we are at liberty to use in our designs. The company is very careful to avoid even the perception of using data that it doesn’t own.
IBM reports that 85% of consumers say it’s important for organizations to factor in ethics as they use AI. With that in mind, how can companies educate themselves in ethics as quickly and effectively as possible?
William Tracy: The most important thing to remember about generative AI is that there is no “intelligence” behind it. It’s a mathematical formula, better named machine learning.
Ethics, however, are a set of human moral rules, some of which are subjective in scope. What may be a breach in ethics for one person, company, or country, may not be for another.
When using AI, the answers generated are created from many other examples, sometimes in a managed dataset, sometimes freely gathered on the internet. How then can you know whose ethics you are using?
AI is often sold as a “black box” that creates solutions to your problems as if by magic. It’s therefore most important for companies to understand where the data the AI uses comes from, if not to specifically vet that data themselves.
Lastly, any answer generated without a human intelligence behind it must be examined by a human to determine if it follows the set of ethics used by that organization.
Ethical considerations in data
Can you describe a time when you faced an ethical dilemma in your work?
William Tracy: While working at Caterpillar, the company required its employees to avoid even the perception of bending its set of ethics. One day, I was invited to meet with a representative at a large construction convention. During the meeting, I was given a bag of “swag” like the ones available for free at their table. Now, on its own this wouldn’t be an ethical problem.
However, when leaving the convention, I finally looked in the bag and discovered it also contained a very nice tie pin. If I hadn’t already left the convention, I would’ve returned the gift, which could be considered a bribe by my company. It may not have been considered such by the other company, depending on the general code of ethics in their country.
Fortunately, we ended up not considering that company in the eventual lineup of suppliers, which removed my need to consider whether their gift was affecting my opinion.
How did ethical considerations influence the design of CareerFoundry’s ML specialization course?
William Tracy: Ethics were one of the main influences when writing CareerFoundry’s ML specialization course. Machine learning is an evolving discipline that, especially in the last 4-5 years, has entered a period of new growth and boundary-pushing. This is why it’s so important to learn about it, as it’s affecting more and more of the systems we interface with daily.
This also means there isn’t a fully realized code of ethics to refer to when using machine learning and generative AI. Sometimes it’s not easy to know when a new invention has the potential to cause real problems with those it affects. When students are experimenting with a newer technology like machine learning, it’s vital for them to think about how that technology might affect others in unforeseen ways.
Can you share some specific examples or modules from the course that address ethics?
William Tracy: Task 1.2 in the course deals specifically with the ethics of machine learning, addressing many of the topics already presented here. We deliberately placed it so early in the course, before most of the technology is explained to students, because this is also where it should be in the design process.
Before unleashing a new technology on the world, think about where and how it can be used, and if there are any immediately obvious ill effects that can be designed out or removed when beginning your machine learning project.
Some of those issues, for example using copyrighted images or text without consent, could have been averted completely if the people creating the technology had considered the ethical implications more carefully and put in safeguards to prevent them from being misused.
In your opinion, why is it important for data professionals to have a grounding in ethics?
William Tracy: The ability to consider the long-term effects of a developing issue or technology is different from the technical know-how to create or understand the technology. They require a grounding in different principles: one in ethics, and the other in technical knowledge or information processing. One is considered more an art and the other more a science.
Because of this, many people who focus on a particular discipline don’t get an adequate grounding in both sides. A data professional who also has a grounding in ethical considerations may be better able to consider the long-term effects of the information they’re handling and which uses to put it to.
What are the ethical implications of using synthetic data in training models?
William Tracy: One solution to get around using ethically-fraught real-world data is to use a set of synthetic data instead.
But again, it’s best to always consider where the data comes from in the first place. Synthetic data is said to remove ethical issues, as the data is generated and not real. However, it’s still important to consider how the data was created and if there was inherent bias added while it was created.
While running the data, it’s possible for there to be increased model error in the final product, because synthetic data doesn’t capture all real-world aspects and uses. Finally, if the synthetic data doesn’t capture real-world issues, this may allow for unforeseen ethical issues when the model is finally exposed to real data.
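The increased model error William describes can be illustrated with a small sketch. Everything below is an invented toy example, not any real project: “real” data follows a relationship with a nonlinear quirk, while the synthetic generator assumes a purely linear one. A model fit on the synthetic data can never beat a model fit on the real data it will ultimately face.

```python
import random
import math

random.seed(0)

def fit_slope(xs, ys):
    # Ordinary least squares through the origin: slope = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mse(xs, ys, slope):
    # Mean squared error of the fitted line on a dataset
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# "Real" data has a nonlinear quirk the synthetic generator doesn't capture
real_x = [random.uniform(-3, 3) for _ in range(500)]
real_y = [x + 0.5 * math.sin(3 * x) for x in real_x]

# Synthetic data generated from a simplified assumption: y is purely linear in x
syn_x = [random.uniform(-3, 3) for _ in range(500)]
syn_y = [x for x in syn_x]

slope_syn = fit_slope(syn_x, syn_y)    # trained on synthetic data
slope_real = fit_slope(real_x, real_y) # trained on real data

print("error of synthetic-trained model on real data:", mse(real_x, real_y, slope_syn))
print("error of real-trained model on real data:     ", mse(real_x, real_y, slope_real))
```

Because least squares picks the slope that minimizes error on its own training set, the model trained on real data is guaranteed to do at least as well on real data as the synthetically trained one; the gap between the two is exactly the real-world structure the synthetic generator failed to capture.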
Regulations and policies
How do you see the regulatory landscape evolving for machine learning and AI?
William Tracy: Slowly, although it’s gaining speed as more people realize the dangers. Once again, the problem here is how entrenched a new technology becomes before rules are put in place to govern it.
There’s also a lot of misunderstanding about what AI is, which is why I prefer the term “machine learning.” A lot of the new governmental committees are first focused on how AI might be used in war. This is a big deal! Given the issues we’ve seen so far, having a machine learning model in charge of military might without strong human oversight is very much not a good idea.
However, AI is already affecting human livelihoods, as people and corporations think they can get a product normally offered by humans very cheaply, or for free, when it’s created by a machine learning model. We’ve seen this in the outcry from artists and authors over AI-generated art and text, and we’ve seen it in companies deciding to let go of the people who create content, assuming that AI output will be “good enough.”
Regulatory committees are being set in place by the US government to study this right now, but governmental regulations also move very slowly.
These committees are going to look at what has already been done as an example of what should be done. Can you see where that might lead to issues?
What kind of consequences are there for not checking if machine learning programs reach ethical standards?
William Tracy: Look to the artistic community first on this. They are the ones with less money and less ability to change policy. Companies will sue and protect themselves from harm with lawyers. But what’s a lone artist to do when their pictures are being recreated for sale by AI, or an author facing three knock-off copies of their latest book?
The person who created the machine learning program in the first place certainly deserves some part of the blame, but were they the person who used it to recreate products that are income for another? Is the person who used it guilty, since the product was freely available and there are no rules to say what they’re doing is unethical or illegal? Ethics should come as the first stage of design for any new technology, thinking how and where it can be used, and who it can impact.
How do machine learning models pose risks to user privacy?
William Tracy: Machine learning models are built on large amounts of pre-existing data. This data can be real, or generated.
When using real data, there’s a chance that exact data can make its way into the output of the model. Remember there is no “intelligence” behind what’s chosen to pass through the layers of the model. It’s all based on mathematics, so there’s nothing to stop private information from passing through, except if a human puts that stopgap in the model.
A real-life example of this, and one way it was discovered that image-generating models were scraping art from the internet, was that artist signatures kept showing up in machine learning-generated pictures, even when the original artist had been expressly clear that their work was not to be used by these systems.
What are some best practices for ensuring data privacy when training models?
William Tracy: The best way to avoid data privacy issues is not to use private data when training models. Then there’s no chance it can be passed through the model layers and come out the other side.
Artificial or synthetic data can be used to populate the training dataset so there is no risk to user privacy. However, when this is not practical, culling and controlling the original dataset may also work. This requires more work from the data scientists creating the model (much as a first pass on ethical consequences does) to determine what is and isn’t acceptable to use in the training set.
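One way to picture “culling and controlling” a dataset is a scrubbing pass that runs before any text reaches the model. This is a minimal sketch with made-up patterns and placeholder tokens; a production-grade privacy filter would need far more thorough rules and human review.

```python
import re

# Hypothetical patterns for obvious identifiers; real PII detection needs much more
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def scrub(record: str) -> str:
    """Replace obvious personal identifiers with placeholder tokens."""
    record = EMAIL.sub("[EMAIL]", record)
    record = PHONE.sub("[PHONE]", record)
    return record

def cull_dataset(records):
    """Scrub every record in the training set before it reaches the model."""
    return [scrub(r) for r in records]

raw = [
    "Contact Jane at jane.doe@example.com for the report.",
    "Support line: 555-867-5309, open weekdays.",
]
print(cull_dataset(raw))
```

Because the private strings never enter the training set, there’s nothing for the model layers to memorize and pass through to the other side, which is exactly the stopgap William describes a human having to put in place.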
Once a model is deployed, what ethical considerations come into play in terms of its continuous monitoring and updating?
William Tracy: It’s well known in engineering design that a problem removed during original concepting costs nothing but a little time, one removed during initial designs may cost a little money or time to redesign, and one caught in prototyping might cost a little or a lot in retooling molds or changing suppliers. However, by far the largest cost to design changes comes when a recall or update is required for a product already available to customers.
Similarly, once a model is deployed, it’s being used by people with no connection to the original creator or team. It’s very easy to monitor your own ethical actions. But it’s nearly impossible to regulate the ethical actions of others, except by regulatory law.
If a released machine learning model either has an ethical failure, or a user discovers a feature can be used for unethical behavior, the creator should ethically patch or change the model so that breach cannot be used again. This could require removing access to the model or retraining it on a new dataset and losing all the training experience. It also may be impossible, if others have made copies or recreated the model themselves.
Accountability and responsibility
Who should be held accountable when a machine learning model makes a mistake or causes harm?
William Tracy: As for many new technologies, this is quite a complicated question:
- Is the creator of the model at fault for the model being used in a manner not specified or accounted for?
- Is it the user’s fault for finding a new method of using the model that falls outside ethical standards because there is no regulation?
- Is it the government or a regulatory body’s fault for not creating guidelines for how the model is used?
It’s a little bit of all of these, but because machine learning is very new, at least when being used by large numbers of people, there’s no precedent yet. There are already lawsuits looming for mistakes and errors in the first generation of machine learning and AI models, and these will likely create the framework for determining who is held accountable in the future.
So, how can organizations ensure responsible development and deployment of ML models?
William Tracy: If you were given a magic device that would let you skip a day into the past and change history, would you simply press the button and make it up as you go along? Hopefully you’d give a little thought as to what the consequences might be and what could go wrong.
Similarly, there’s no precedent for how AI will affect an organization’s output and profit. It’s easy to adopt what you think is a get-rich-quick scheme, but once the results come in, they might be disappointing. Research some of the content creation blogs, AI assistant bots, and media corporations that have been in the news lately for examples. Practicing forward thinking and assessing risks before deployment will save companies a lot of money in the future when using AI practically.
Are there any tools or frameworks that you recommend for assessing the ethical considerations in machine learning, beyond bias detection?
William Tracy: There are several frameworks of ethics designed to lead you down the path to the best decision in terms of moral judgment. Firstly, I’d recommend checking out:
- Edwin Smith’s “Three Steps to Making an Ethical Decision”
- The Markkula Center’s Framework for Ethical Decision-Making
- Penn State’s “Guide to Ethical Frameworks”
Beyond those, there are two easy questions to ask yourself:
- Would I want someone else to do this to me?
- Do I have any self-interest (money, power, influence, family, shame…) that is affecting my judgment?
That said, ethics are always based on the society and values you are familiar with. They change with the times and, to some extent, with geography. Be wary of trusting just a set of checkboxes to determine if you’re doing the right thing, especially if that thing has never been done before!
Before choosing an action, and after using the tools available, take a step back and look at the project as a whole. See who and what it affects. Apply your internal sense of right and wrong and ask “will this harm anyone?”
What role do data professionals play in ensuring that ML technologies benefit society at large?
William Tracy: Data professionals are in a unique position with regard to machine learning. They know the ins and outs of a technology that is obscure to most other people!
Many AI and machine learning tools are sold as black boxes where the end user doesn’t need to know “how the sausage is made” to get their answer. Data professionals know “how the sausage is made,” at least in part, and even if they don’t have the ethical training to completely evaluate the decision, they can explain the functions and outcomes to someone who does.
How important is it for ML practitioners to collaborate with ethicists or experts from other disciplines?
William Tracy: This leads directly from the question above. If a machine learning practitioner doesn’t have the ethical background, collaborating with experts is a great way to both learn ethics, and to make sure the correct decisions are made.
Beyond bias, what emerging ethical challenges do you foresee in the next decade for machine learning and AI?
William Tracy: There are a lot! A brief list of issues: copyright and IP, plagiarism, out-of-country “laundering,” overburdening of the market for certain creations, and the quality of those creations.
The above are all practical examples for machine learning models and AI assistants, many of which are happening right now. There’s another big issue, however, that gets to the heart of the difference between machine learning and artificial intelligence.
Remember that there is no internal “intelligence” driving the results of machine learning models—it’s simply math. The intelligence part falls on the user. However, as models get more complex, there’s the possibility of creating more elaborate systems using machine learning, and ones that nest machine learning models within each other so they feed off of each other. We must always check where these systems lie on the scale of true intelligence.
We haven’t yet managed to make a machine capable of informed decisions and self-direction like a human: that is, an actual artificial intelligence. If this happens (and I personally believe it’s still 50-100 years off, if it happens at all), then we have an entirely different set of ethical issues to consider. If we are using a self-aware system for a task, we must ask whether it’s worthy of being treated like a human, and whether we can use it for such tasks at all.
Designing ethics into CareerFoundry’s ML specialization course
Can you share a real-world example where ethical considerations in machine learning significantly impacted the outcome of a project?
William Tracy: A US court ruling in August 2023 held that AI-created art cannot receive copyright protection. This has immediate impacts for any company planning to license AI art for merchandising purposes.
For example, anyone else can use that art as well, because it cannot be restricted. This ruling will certainly be challenged in the near future, and may be reversed. It also may be applied to other content created by AI. This uncertainty has direct impacts on any company thinking of using content created with help or wholly by AI.
How does CareerFoundry’s ML specialization course prepare students to tackle ethical challenges in their careers?
William Tracy: The machine learning specialization course is specifically designed to encourage the student to think about the ethics of the project they are working on.
An entire lesson is devoted to ethics, and the ethics must be considered in the final project for both achievements. Thinking in this way while completing the project will help the student to be ready to consider ethical challenges in whatever work they do.
How important is continuous learning and updating one’s knowledge in the field of ML ethics?
William Tracy: Because data analytics and machine learning are newer disciplines, more responsibility falls on the practitioner to keep learning about the field and to stay abreast of the latest developments. Especially because the field is changing so quickly, knowing the most recent advances can mean the difference between sounding informed in a job interview and sounding out of date.
Thanks William for your time!
Wrap-up and further reading
I hope you got a lot out of this enlightening interview on the ethical considerations in machine learning with someone very familiar with the field. To recap, here are some key takeaways:
- Ethics are a set of human moral rules, some of which are subjective in scope. What may be a breach in ethics for one person, company, or country, may not be for another.
- When using AI, the answers generated are created from many other examples, sometimes in a managed dataset, sometimes freely gathered on the internet. How then can you know whose ethics you are using?
- It’s vital for data professionals to have a grounding in ethics because the ability to consider long-term effects of a developing issue or technology is different from the technical know-how to create or understand the technology.
- Regulations and policies for machine learning and AI are evolving slowly, and there is a lot of misunderstanding about what AI is.
If you think that figuring out how to use machine learning with the guidance and assistance of experienced industry experts like William sounds appealing, then our Machine Learning with Python Specialization Course is for you. In this remote, fully mentored course, you’ll learn how to build and deploy machine learning algorithms over two months, with ML projects to show for it.
If you’d like to read more about Machine Learning first, then check out these articles: