Drs. Marshini Chetty and Nick Feamster Reveal a World of Potential for Big Data Science
This week, two members of the Princeton University Computer Science department, professor Dr. Nick Feamster and research faculty member Dr. Marshini Chetty, visited PDS for a half day to discuss artificial intelligence and machine learning as part of the School's annual Pioneers in Science speaker series.
"Artificial intelligence and machine learning have become buzzwords in computer science and industry, but few of us really have any idea what they actually mean," noted PDS Science Chair Jason Park. "Dr. Feamster and Dr. Chetty, who are also PDS parents, are giving our Upper School community a sense of how machine learning and big data science are the wave of the future in technology, drawing from their own research and that of leading pioneers in the field."
Opening the Upper School assembly in McAneny Theater on April 1, Aaliyah Sayed '21, VP of the Computer Science and Technology Club, and Om Suchak '21, head of the AI Club, introduced the Pioneers in Science speakers and invited students interested in computer science to the Q&A session afterward in the STEAM Center.
Drs. Chetty and Feamster began the presentation by focusing on internet security and safety and on the role of artificial intelligence (AI) and machine learning (ML) in combating internet attacks and misleading online information. Dr. Feamster first summed up AI as being "more about having a lot of so-called 'dirty data' that needs to be crunched," and machine learning as "all about prediction based on data inputs."
He then explained the work of a dedicated group of computer scientists at Princeton University who are using these big data approaches to analyze smart home security and identify potential problems through Princeton's "Smart Home Lab," an actual home on nearby Prospect Street. Their message, in a world of "smart-everything" and constant connectivity:
1) it's essential to raise awareness about the security, privacy and performance problems that arise from data captured by your phones and computers, as well as by security cameras, smart TVs and other connected devices and appliances;
2) there's a lot we can do if we apply AI and ML to help tackle the challenges of data mining and disinformation;
3) while online ads designed to leverage our consumer patterns may seem to be the most pervasive, and most profitable, applications of AI and ML today (think: Google), it's crucial for today's students to embrace applications of big data computer science that can improve societies and tackle systemic problems in areas ranging from medical and health research to social and legal justice and public policy.
"There's a huge opportunity to use AI and machine learning to impact society in positive ways besides getting people to click on ads," Dr. Feamster noted.
On the raising-awareness front, Drs. Chetty and Feamster are pursuing big data studies using "Web Crawler Prefix Sampling," an approach built with the Python programming language, that can help classify disinformation vs. legitimate information, including identifying undisclosed ads that masquerade as personal or "influencer" content. Dr. Chetty pointed to their recent study of 500,000 influencer YouTube videos linking to about 400,000 affiliate marketing URLs, and 2.1 million influencer Pinterest pins linking to about 1.7 million URLs. Their analysis showed that a whopping 90% of affiliate marketing in YouTube videos is undisclosed, and an even higher 93% of affiliate marketing in Pinterest pins lacks the required ad disclosures. The goal of this research is to develop new applications and browser and platform features that help viewers more easily identify "hidden" ads embedded in content.
After the presentations, Upper School computer science students gathered in the Wellemeyer STEAM Center for an extended Q&A session.
"If you know Java, you can use Python and quickly start writing machine learning code, yet the flip side is that you have no idea what it's doing to make predictions. It's critical to try to understand what the tool is doing to come to its conclusions," Dr. Feamster explained.
"Any time there's an improvement in algorithms that come from machine learning, I benefit because I can apply it to what I'm doing," said Dr. Chetty.
A recurring discussion item centered on the most promising areas of machine learning and artificial intelligence. "One hot area now is deep learning. It's quite interesting because in certain areas – vision, speech and translation, for example – it's working really well," Dr. Chetty explained.
"The applications of machine learning are incredibly broad. There's extensive interest in predictive policing and criminal justice – analyzing data to help determine whether someone is going to be a repeat offender, whether to set bail or hold someone, for example. A huge area of current exploration is evaluating fairness in machine learning and attempting to make more transparent the process of understanding if and why an algorithm is fair. There's an intense interest in so-called 'explainable' learning," Dr. Feamster explained, "and AI is terrible at this because it just gives the answer."
"Institutions may want to use explainable learning systems in areas such as college admissions analysis, but your model is only as good as the human decision-making that goes into selecting the data sets," said Dr. Chetty. "People are inherently biased and so the formulas can be biased."
For example, Dr. Chetty pointed out that facial recognition software is much better at recognizing white faces than black faces because of bias built into the algorithms and the data used to develop them. "We may never get to fairness, but we can do more to understand why machine-learning-based decisions are made and why algorithms are what they are," Dr. Chetty said. "It's a human design problem, and we're counting on today's students to join the effort to figure this out." - Melanie Shaw
Photo, above, from left to right: Dr. Nick Feamster, PDS Science Department Chair Jason Park, Om Suchak '21, Aaliyah Sayed '21, Dr. Marshini Chetty and PDS computer science teacher Theodor Brasoveanu at the Wellemeyer STEAM Center after the Pioneers in Science presentation on April 1, 2019. Below: Drs. Chetty and Feamster during an in-depth Q&A session with PDS computer science students.