A brief introduction to geographic analysis

Making mistakes in geographic analysis is disturbingly easy. The “Intro to Geographic Analysis” materials briefly discuss computational representations of geographic data. Then I delve into potential gotchas — from spatial databases to hexagonal partitioning, from avoiding analysis on lat-longs to choosing appropriate graphical formats, and more.

Hybrid Word-Character Neural Machine Translation for Arabic

My final project for Stanford CS 224N was on hybrid word-character machine translation for Arabic.

Traditional models of neural machine translation make the false-but-true-in-English assumption that words are essentially equivalent to units of meaning. Morphologically rich languages disobey this assumption. We implement a hybrid translation model that backs off unknown words to a representation created by modeling their constituent characters in TensorFlow, we apply the model to Arabic translation, and approach state-of-the-art performance for Arabic over the weeks allotted for a class project.

Microservices for a non-technical audience

In the wake of 2015/2016 microservice hype, my tech-adjacent leadership struggled to understand “what a microservice is”, and whether they should push their organizations to transition to microservices (or allow their engineers to push them toward that end). Upon request, I gave this talk on microservices for non-technical audiences to distill tangible wisdom and add advice.

My First AI (or: Decision Trees & Language Modeling for Middle Schoolers)

I gave the keynote address at Byte Sized, a workshop for middle school girls spearheaded by SAIL ON students. My First AI (or: Decision Trees & Language Modeling for Middle Schoolers) solidified the basics of artificial intelligence and the if/else statements taught the previous day.

The talk introduces the language identification problem within AI, teaches about decision trees, and then asks students to write decision trees in small groups to distinguish between Hmong, Balinese, Zulu, and other languages. After a debrief on why computers are might be more effective than human-written rules, it briefly ties in themes of feature extraction and gradient descent via GBMs.

Modeling with Naive Bayes

As the progenitor and leader of SAIL ON, the Stanford AI Lab’s year-round effort to attract and keep underrepresented minorities in the field of Artificial Intelligence, I engage high schoolers about artificial intelligence, machine learning, and the positive social impacts of our field.  SAIL ON meets once a month in the Computer Science building at Stanford.  Its trifold focus allows past participants in the SAILORS two-week summer camp to continue to learn about AI, to nurture strong relationships with each other, and to lead outreach projects that bring the technical, humanistic, and diversity missions of the AI outreach program to the wider community.

Screenshot of title slide of deck

As the educational component of the October meeting of SAIL ON, we discussed and applied Naive Bayes modeling.  Like other machine learning methods, Naive Bayes is a generic approach to learning from specific data such that we can make predictions in the future.  Whether the application is predicting cancer/whether you’ll care about an email/who will win an election, we can use the mathematics of Naive Bayes to connect a set of input variables to a target output variable.  (Of course, some problems are harder than others!)

We focused on the derivation of Naive Bayes with a chalk-and-talk discussion (slides), identifying why Naive Bayes is mathematically justified and posing some deeper thought questions.  We checked understanding with a hand-calculation of a Naive Bayes problem (handout): does a shy student received an A in a class, given some observations about her and some observations about more forthcoming students?  We then turned to a Jupyter Notebook that applies the same methods on a larger scale, working on the Titanic challenge from Kaggle with an applied introduction to pandas and sklearn: given passenger manifest records, can we predict who survived?

By providing this basis, I hope to increase appreciation for applications of what students are seeing in their math classes, and to facilitate students moving further on their own with applied machine learning before November’s meeting.

Screenshot of worksheet
Screenshot of Jupyter notebook on github