Pronouncing English from Colombian Spanish

I was invited to help a Colombian Spanish-speaking adult with pronunciation in English. I veer “academic”, so I went looking for rigorous scientific phonetics and phonology resources. It turns out that there aren’t many.

Even so, I ended up creating some pronunciation resources for a Colombian Spanish speaker learning English as a second language. I want to save them in case they are ever useful to someone else (including future me!):

(It turned out these resources weren’t a perfect fit for my learner, unfortunately. For instance, her dialect readily uses the short /ɪ/ sound (as in “fit”) but uses the long /i/ sound (as in “feet”) less often; many Spanish dialects are the opposite.)

Spanish-English minimal pairs

berry/very; bag/beg; hat/hut; hat/heart; heart/hot; heart/hut; heart/hurt; wait/wet; hey/hi; bear/beer; beat/bit; beg/bug; beg/big; bird/bored; bird/bud; pot/port; boat/bought; hope/hop; hole/howl; bill/pill; cheap/jeep; cherry/sherry; chart/tart; deep/jeep; dent/tent; day/they; dawn/thorn; fast/past; ferry/very; bag/back; heart/art; jaw/your; line/nine; long/wrong; sun/sung; bank/bang; rock/wok; seat/sheet; sing/thing; Sue/zoo; tin/thin; then/zen; verse/worse; Luke/look; sheep/ship; cot/caught; further/farther

Exercises

  • Minimal pair memory
  • Compare two lists item by item and decide whether each item on my list is the same as the one on yours, where the lists contain homophones, minimal pairs, etc. (fare/fair; fat/vat; …)
  • Stand up if two words are the same; stay seated if they’re different
  • Label each wall with a sound; listen to words and touch the appropriate wall
  • Write 2 columns of words in a shared space; pronounce a word from either column; ask whether it came from column 1 or 2; then switch so the student leads
  • R-controlled vowel bingo: rows for ɛ˞, ɑ˞, ɔ˞ (or maybe er/ir/ur/or/ar); columns for consonants of your choice; fill in and pronounce a word for each cell to win a prize
  • Play the “MM-mm” syllable stress game: hum each word’s stress pattern (a loud “MM” for a stressed syllable, a soft “mm” for an unstressed one) until the stress sounds native-like, then build up to sentences
  • Given a list of sentences with the stressed syllables in bold, say them aloud and stand up quickly on each stressed part, then sit back down (or raise hands, clap, tap the table, etc.): “I love coffee; I come here often; I don’t see it; Try this pizza!; He hurt his neck; etc.” (more at FluentU)
  • Read a word, exaggerating the stressed syllable, then “echo” it with a sentence that has a similar stress pattern (e.g., “interruption” –> “Let’s have lunch now”; “interruption” –> “He’s my uncle”; “interruption” –> “I said, under”; “interact” –> “It’s a fact”; “interact” –> “Here’s your hat”; “interact” –> “Where’s my snack?”) (more at FluentU)
  • Use your phone’s voice recognition (speech-to-text) to get independent feedback on intelligibility
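The two-column exercise can also be driven by a small script that picks prompts at random, so the teacher doesn't telegraph the answer. Here is a minimal sketch. It assumes the pairs are pasted in the same “a/b; c/d” format as the list above, and MINIMAL_PAIRS, parse_pairs, and make_quiz are hypothetical names:

```python
import random

# Hypothetical sample: a few pairs copied from the list above; paste in the full list as needed
MINIMAL_PAIRS = "berry/very; bag/beg; sheep/ship; cot/caught; further/farther"


def parse_pairs(text):
    """Turn 'a/b; c/d' into [('a', 'b'), ('c', 'd')], skipping empty items."""
    pairs = []
    for item in text.split(";"):
        if not item.strip():
            continue
        left, right = (word.strip() for word in item.split("/"))
        pairs.append((left, right))
    return pairs


def make_quiz(pairs, n_items, rng=None):
    """Return n_items (word, column) prompts drawn from distinct pairs.

    The teacher reads each word aloud; the listener answers 1 (left) or 2 (right).
    """
    if rng is None:
        rng = random.Random()
    quiz = []
    for left, right in rng.sample(pairs, n_items):
        column = rng.choice([1, 2])
        quiz.append((left if column == 1 else right, column))
    return quiz
```

For example, make_quiz(parse_pairs(MINIMAL_PAIRS), 3) returns three (word, column) tuples. Read the words aloud and keep the columns as the answer key.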

Configuring Python for corporate MITM certificates

We all love HTTPS because it gives us privacy. All HTTPS communications are private between the user and the remote server. Mostly.

On a corporate network, connections usually go through a proxy. The proxy likely man-in-the-middles all the corporate HTTPS connections: it pretends to be the remote site to me, and it pretends to be me to the remote site. Corporate man-in-the-middle (MITM) lets the organization audit and block traffic that would otherwise be opaque and potentially dangerous, which is great. I’m security-minded, but you get better outcomes when the people whose job it is handle security. Still, corporate proxying causes me trouble whenever some part of my system doesn’t know to trust the proxy, and that happens more than I’d like.

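I find that Python semi-regularly fails requests that succeed in the browser, responding with CERTIFICATE_VERIFY_FAILED. The browser is typically happy because the corporate root certificate is installed in the operating system’s trust store. Many Python libraries, requests among them, verify against the separate certificate bundle shipped with the certifi package instead. When that happens, I tell Python to trust the corporate certificate by appending it to certifi’s bundle, wherever that lives in the current environment: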

  1. Get the corporate certificate in PEM format. On a Mac, find it in Keychain Access and export it as a .pem file.

  2. Run the following:

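    import shutil
    
    import certifi
    
    path_to_mitm_pem = "/path/to/the_exported_corporate_certificate.pem"
    
    # certifi's bundle lives inside the active environment (venv, pipenv, conda, ...),
    # so this needs repeating in each environment that should trust the proxy
    cert_store = certifi.where()
    print(f"Python is using the cert store at: {cert_store}")
    
    # Sanity check: make sure the export is PEM (text), not DER (binary)
    with open(path_to_mitm_pem, "rb") as f:
        if f.readline().strip() != b"-----BEGIN CERTIFICATE-----":
            raise ValueError(f"{path_to_mitm_pem} does not look like a PEM certificate")
    
    # Append the corporate certificate to the bundle
    with open(path_to_mitm_pem, "rb") as new_pem_f, open(cert_store, "ab") as cert_store_f:
        cert_store_f.write(b"\n")  # in case the bundle doesn't end with a newline
        shutil.copyfileobj(new_pem_f, cert_store_f)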
    

After configuring the cert store, HTTPS requests should start succeeding.

I swear I have to do this at least once a month for some new environment or another.
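Appending to certifi’s bundle edits a file inside the installed package, so the change is lost whenever certifi is upgraded or the environment is rebuilt. A non-mutating alternative is to write a combined bundle once and point tools at it with environment variables. requests honors REQUESTS_CA_BUNDLE, and Python’s ssl module picks up SSL_CERT_FILE through OpenSSL’s default verify paths. Here is a minimal sketch of that alternative; build_combined_bundle and point_env_at_bundle are hypothetical helper names, not part of any library:

```python
import os
from pathlib import Path

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def build_combined_bundle(base_bundle: str, extra_pems: list, out_path: str) -> str:
    """Write base_bundle followed by each extra PEM certificate to out_path; return out_path.

    The base bundle (e.g. certifi.where()) is only read, never modified.
    Works on bytes because certifi's bundle contains non-ASCII comment lines.
    """
    # Normalize trailing newlines so certificates never run together
    parts = [Path(base_bundle).read_bytes().rstrip(b"\r\n") + b"\n"]
    for pem_path in extra_pems:
        data = Path(pem_path).read_bytes()
        # Same sanity check as above: refuse anything that isn't PEM-encoded
        if not data.lstrip().startswith(PEM_HEADER):
            raise ValueError(f"{pem_path} does not look like a PEM certificate")
        parts.append(data.rstrip(b"\r\n") + b"\n")
    Path(out_path).write_bytes(b"".join(parts))
    return out_path


def point_env_at_bundle(bundle_path: str) -> None:
    """Affects only this process and its children; set these in a shell profile to persist."""
    os.environ["REQUESTS_CA_BUNDLE"] = bundle_path  # read by requests
    os.environ["SSL_CERT_FILE"] = bundle_path  # read via OpenSSL's default verify paths
```

For example, call build_combined_bundle(certifi.where(), ["/path/to/the_exported_corporate_certificate.pem"], "/path/to/combined_bundle.pem"), then export both variables in your shell profile so every new process sees them. Tools that keep their own trust settings may still need separate configuration.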

Your very own Star Trek Computer: Making sense of unexpected pipenv behavior with an LLM

Story time!

So, recently, I’m playing in someone else’s codebase, and I need to use their code to create a new Docker image that I can use for myself elsewhere. Their thin base image doesn’t have all my usual ML dependencies, so I need to extend the image. When I try, the build blows up with an “Unknown compiler” message. With some legwork I learn that sklearn depends on scipy, that scipy must be compiled from source in that image (which has been true for years now), and that the thin base image doesn’t have a compiler. So I switch to a non-default base image, and all is well.

Then I notice something I can’t explain, which I never would have noticed if I hadn’t been in this codebase. I had set Python to 3.9 using pyenv. I’m using pipenv --three to create a Python 3 environment and Pipfile, and then pipenv install to add some dependencies to it. (That’s all wrapped in a build script, because that’s how the codebase works, but that’s all the build script is doing.) But when I cat the Pipfile, the environment I just created is using Python 3.10, not 3.9! My mind is blown. I’ve never really used pipenv before, but I had a deeply rooted expectation that pipenv --three would use whichever Python python3 --version reports. And clearly, it did not.

I read the pipenv docs and don’t see anything obviously relevant.

I do some internet searches. No answer.

I figure the explanation has to be obvious to anyone who actually has some pipenv experience – so I ask around. None of my usual suspects have pipenv experience either.

Then I have my epiphany. ChatGPT had just been officially blessed for corporate use. This is the perfect question to use with ChatGPT: I can learn about pipenv, and I can use this to demonstrate how ChatGPT might help developers at the next engineering department sprint demo. (A good number of our developers are using Python professionally for the first time, in a codebase that extensively uses pipenv, and have not yet tried any LLMs. I like the odds of a sprint demo on this topic actually being useful for the audience.) So, I spend a few minutes composing a very careful message with an SSCCE and everything:

I need help making sense of my interactions with pipenv. I think I am creating a 
Python 3.9 environment and installing packages into it, and then freezing that as 
a Pipfile. But when I check the Pipfile, it shows 3.10 as my frozen environment.

Here's my shell interactions:

pyenv global 3.9
pipenv --three
python --version # output: 'Python 3.9.13'
pipenv install $pkg
cat Pipfile | grep python_version # output: 'python_version = "3.10"'

How does pipenv decide which version of Python to use? Be super succinct.

ChatGPT gives me a lovely (but still verbose) response, whose first and penultimate lines are entirely correct and ultimately all I need: “When you run pipenv --three, pipenv creates a new virtual environment using the latest version of Python that you have installed on your system. To create a virtual environment using Python 3.9, you can use pipenv --python 3.9 instead of pipenv --three.” Ah hah!
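A build script could have caught this surprise on its own by comparing the python_version pinned in the Pipfile’s [requires] section with the interpreter that is actually running. Here is a minimal sketch. The helper names are hypothetical, and the regex fallback for Pythons older than 3.11 (which lack tomllib) only looks for a line starting with python_version:

```python
import re
import sys

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older Pythons: fall back to a simple regex below
    tomllib = None


def pipfile_python_version(pipfile_text: str):
    """Return the [requires] python_version string from a Pipfile's text, or None."""
    if tomllib is not None:
        return tomllib.loads(pipfile_text).get("requires", {}).get("python_version")
    # Fallback: assumes python_version only appears in the [requires] section
    match = re.search(r'^\s*python_version\s*=\s*"([^"]+)"', pipfile_text, re.MULTILINE)
    return match.group(1) if match else None


def pipfile_matches_interpreter(pipfile_text: str, version_info=sys.version_info) -> bool:
    """True if the Pipfile's pinned major.minor equals the given interpreter version."""
    running = f"{version_info[0]}.{version_info[1]}"
    return pipfile_python_version(pipfile_text) == running
```

In the scenario above, running this check under the pyenv-selected 3.9 against a Pipfile pinned to "3.10" would return False, flagging the mismatch before anything gets built.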

I ask it a bunch of related pipenv sense-making questions to get smarter on pipenv. ChatGPT gets most of my questions beautifully, perfectly, verifiably right. Then, harking back to my “sklearn requires scipy requires a compiler” trouble from earlier, I ask a trickier (pure pip) question: “can I pip install precompiled binaries of scikit-learn and its upstreams instead of building from source?” ChatGPT’s answer is so wrong it’s painful: “Yes, there are precompiled binary versions of scikit-learn that you can install instead of building from source. You can try installing the precompiled binary version of scikit-learn by running the following command: pip install -U scikit-learn”. No, sorry, definitely not. Upgrading scikit-learn tells pip nothing about avoiding source builds; where no prebuilt wheel matches the platform, as in that thin base image, pip will still try to compile scipy. That’s a farcical answer.
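For the record, pip does have a switch aimed at exactly this question. --only-binary :all: tells pip to accept only prebuilt wheels for every package it resolves, including dependencies such as scipy. It can’t make wheels exist for a platform that lacks them, but it makes pip fail with an error instead of attempting a compile. Here is a minimal sketch; the helper name pip_install_wheels_only is hypothetical:

```python
import subprocess
import sys


def pip_install_wheels_only(packages, run=False):
    """Build (and, if run=True, execute) a pip install that refuses source builds.

    --only-binary :all: applies to every package pip resolves, including
    dependencies such as scipy, so a missing wheel is an error instead of a compile.
    """
    # Use the current interpreter's pip so the install lands in this environment
    cmd = [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", *packages]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if pip fails
    return cmd
```

With run=True, pip_install_wheels_only(["scikit-learn"], run=True) either installs wheels for scikit-learn and its dependencies or stops with an error. In an image like that thin base image, you learn right away that no matching wheel exists.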

So I shared the story and takeaways at this week’s sprint demo:

  • pyenv and pipenv work together… mostly.
  • pipenv --three will ignore your pyenv version! Use pipenv --python 3.9 (or whichever version you need) to pin it.
  • You get your very own Star Trek Computer now… it’s better than a rubber duck, and it’s the kindest, least judgmental coworker ever. (But it’s not your actual coworker; don’t send confidential or sensitive information.)
  • Robots are fallible too.

My stealth recruiting pitch

I have a stealth recruiting talk that I give for machine learning at Xpanse. It goes like this:

  • If you want a real mission in your work, cybersecurity can deliver.
  • My realm of cybersecurity is impossible without AI.
  • Doing this job means solving cool, hard problems.

I pretend the talk is all very objective and about teaching you stuff (which hopefully it also does). I also hint at a lot of the problems that I’ve worked on and solved over the past few years. Technical people really like being shown problems and getting to chew on them (which is convenient, because I’m not comfortable publicly sharing how I solved these problems!). And I’m sure it helps that I’m earnest: the slides really are what I love most about my job. The talk works, too. We get solid applicants from it.

I was inspired by a talk I saw from Stitch Fix a number of years ago. I have very little interest in clothing… but after hearing them give a technical talk about the problems they were solving, I became convinced they were doing really neat modeling and would be worth considering in a job hunt. Pretty effective. So I tried to channel that insight.

The slides I’m linking here conclude with a harder sell than I usually give, as well as some cross-team Palo Alto projects, because I revised this version for an explicitly recruiting context. (One of our senior recruiters had seen me give this talk at the Lesbians Who Tech conference, and he asked me to give it again in a different context.)

Comfort, distress and dominance: Reading body language

Body language can indicate state of mind. Being familiar with body language tells can help people read a room, avoid selling past the close in a negotiation, and become more self-aware. I wrote and delivered a short orientation, Comfort, distress and dominance: Reading body language, as part of a non-technical skills development series within an established team. It is structured as three 2-3 minute topic introductions followed by 5-10 minutes of moderated small-group discussion.