Amit's Blog

Math, Code, etc.

So You Still Don’t Understand Hindley-Milner? Part 3

In Part 2, we finished defining all the formal terms and symbols you see in the StackOverflow question on the Hindley-Milner algorithm, so now we’re ready to translate what that question was asking about, namely the rules for deducing statements about type inference. Let’s get down to it!

The rules for deducing statements about type inference

[Var]

\[\frac{x:\sigma \in \Gamma}{\Gamma \vdash x:\sigma}\]

This translates to: if “\(x\) has type \(\sigma\)” is a statement in our collection of statements \(\Gamma\), then from \(\Gamma\) you can infer that \(x\) has type \(\sigma\). Here \(x\) is a variable (hence the name of this rule of inference). Yes, it really is that painfully obvious. The terse, cryptic way [Var] is expressed isn’t that way because it contains some deep, difficult fact; it’s terse and precise so that a machine can understand it and type inference can be automated.
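To make the obviousness concrete, here’s a minimal sketch of [Var] in code. The names (`infer_var`, the dictionary-as-\(\Gamma\)) are my own invention, not part of the formalism: \(\Gamma\) is just a lookup table, and [Var] is just a lookup.

```python
# Sketch of the [Var] rule: if Gamma records a type for x,
# we may conclude that x has that type. Names are hypothetical.
def infer_var(gamma, x):
    """Apply [Var]: look up x's type sigma in the environment gamma."""
    if x in gamma:
        return gamma[x]          # Gamma |- x : sigma
    raise TypeError(f"unbound variable: {x}")

gamma = {"x": "Int", "f": "Int -> Bool"}
print(infer_var(gamma, "f"))     # Int -> Bool
```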

So You Still Don’t Understand Hindley-Milner? Part 2

In Part 1, we said what the building blocks of the Hindley-Milner formalization would be, and in this post we’ll thoroughly define them, and actually formulate the formalization:

Formalizing the concept of an expression

We’ll give a recursive definition of what an expression is; in other words, we’ll state what the most basic kind of expression is, we’ll say how to create new, more complex expressions out of existing expressions, and we’ll say that only things made in this way are valid expressions.
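As a taste of what such a recursive definition looks like in practice, here is a sketch using Python dataclasses (my own encoding, not anything from the post): one base case, plus constructors that build bigger expressions out of smaller ones, and nothing else counts as an expression.

```python
from dataclasses import dataclass

# Hypothetical encoding of a recursively defined expression language.
@dataclass
class Var:            # base case: a lone variable is an expression
    name: str

@dataclass
class App:            # e1 applied to e2 is an expression
    fn: object
    arg: object

@dataclass
class Abs:            # lambda abstraction \x. e is an expression
    param: str
    body: object

@dataclass
class Let:            # let x = e1 in e2 is an expression
    name: str
    value: object
    body: object

# (\x. x) y -- built only from the constructors above
expr = App(Abs("x", Var("x")), Var("y"))
```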

Entropy: How Password Strength Is Measured

My colleague Mike Sierchio wrote a cool post on password strength, and the concept of entropy. As he points out, entropy isn’t always entropy. That confusion is apparently not uncommon, as it’s been asked about on IT Security Stack Exchange as well. So what’s really going on?

Let’s step back for a sec and fill in some context. What are we trying to do? We’d like some way to measure how hard it is to guess our passwords, a number that serves as a heuristic standard of password strength. But there are two fundamentally different things we might want to measure:

  1. How hard would it be for someone to guess your password with essentially no knowledge of how you created your password?

  2. How hard would it be for someone to guess your password if they knew the process used to generate it? This is of course assuming that there is a process, for example some script that does some Math.rand-ing and produces a password string.

The term “entropy” has been used to refer to both kinds of calculations, but they’re clearly entirely different things: the former essentially takes a string as input, the latter takes a random process as input. Hence, “entropy is not entropy.”

Alright, if entropy isn’t entropy, let’s see what the entropies are. We’ll look at the standard mathematical formulation of random-process entropy, which comes from information theory, and at the function used to calculate particular-string entropy in one of the most popular password strength testers. That’s all we’ll do: see how the calculations are done, without dwelling too much on the differences between the two approaches or their use cases.
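For the random-process kind, here’s a quick sketch of the standard Shannon formula: for a process that picks each of \(n\) outcomes with probability \(p_i\), the entropy is \(H = -\sum_i p_i \log_2 p_i\) bits. The lowercase-password example is mine, chosen for round numbers.

```python
import math

# Shannon entropy of a discrete random process, in bits.
def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A password built as 8 independent draws from 26 equally likely letters:
per_char = shannon_entropy([1 / 26] * 26)    # log2(26), about 4.70 bits
print(8 * per_char)                          # about 37.6 bits total
```

Note that this number describes the generating process, not any particular string it happens to emit.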

Yo Dawg, I Herd You Like Math

I’ve been learning a bit of statistical computing with R lately on the side from Chris Paciorek’s Berkeley course. I just got introduced to knitr and it’s damned sweet! It’s an R package that takes a LaTeX file with embedded R and produces a pure LaTeX file (similar to how Rails renders an .html.erb file into a .html file), where the resulting LaTeX file contains the output of the R code. It makes it super easy to embed statistical calculations, graphs, and all the good stuff R gives you right into your TeX files. It lets you put math in your math, so you can math while you math.

I’ve got a little project which:

  1. Runs a Python script which will use Selenium to scrape a web page for 2012 NFL passing statistics.
  2. “Knits” a TeX file with embedded R that cleans the raw scraped data, produces a histogram of touchdown passes for teams, and displays the teams with the least and greatest number of touchdowns.
  3. Compiles the resulting TeX file and opens the resulting PDF.
  4. Cleans up any temporary work files.

Is It Possible to Be 15% Swedish?

This question came up as a joke during a team standup a few months ago. Although the obvious answer is “no,” if you’re willing to play fast and loose with your metaphysics for a bit, the answer can be “yes” and there’s a cute solution that ties together binary numbers and binary trees. This post itself is a bit of a joke in that it’s just for fun, but it might be nice to see the familiar concepts of binary numbers and binary trees in a new light.

The obvious answer is “no”

Let’s quickly see why the real life answer is “no.” But first we should lay out the assumptions implicit in the problem. We’re going to assume that at some point in time, everyone was either entirely Swedish or entirely non-Swedish. There’s a chicken-and-egg problem that we’re sweeping under the rug here, but that’s what rugs are for. Next we’re assuming that every person after that point in time has their Swedishness wholly and equally determined by their parents’ Swedishness. So if mom is 17% Swedish and dad is 66% Swedish, then baby is ½ × 17% + ½ × 66% = 41.5% Swedish.
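One consequence of that mixing rule, sketched below: starting from people who are 0% or 100% Swedish, after \(n\) generations everyone’s Swedishness is some \(k/2^n\), so a fraction is exactly achievable only if its denominator in lowest terms is a power of two (a “dyadic” rational). The helper name is mine.

```python
from fractions import Fraction

# Sketch: with the half-mom-half-dad rule, achievable Swedishness fractions
# are exactly the dyadic rationals k / 2**n in [0, 1].
def is_dyadic(frac):
    d = Fraction(frac).denominator
    return d & (d - 1) == 0          # true iff d is a power of two

print(is_dyadic(Fraction(3, 8)))     # True:  37.5% is reachable
print(is_dyadic(Fraction(15, 100)))  # False: 15% = 3/20 is not
```

Since 15% reduces to 3/20 and 20 is not a power of two, no finite family tree gets you there, which is the “obvious” no.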

So You Still Don’t Understand Hindley-Milner? Part 1

I was out for drinks with Josh Long and some other friends from work, when he found out I “speak math.” He had come across this StackOverflow question and asked me what it meant:

[screenshot of the StackOverflow question]

Before we figure out what it means, let’s get an idea for why we care in the first place. Daniel Spiewak’s blog post gives a really nice explanation of the purpose of the HM algorithm, in addition to an in-depth example of its application:

Functionally speaking, Hindley-Milner (or “Damas-Milner”) is an algorithm for inferring value types based on use.  It literally formalizes the intuition that a type can be deduced by the functionality it supports.

Okay, so we want to formalize an algorithm for inferring types of any given expression. In this post, I’m going to touch on what it means to formalize something, then describe the building blocks of the HM formalization. In Part 2, I’ll flesh out the building blocks of the formalization. Finally in Part 3, I’ll translate that StackOverflow question.

Setting Up PostgreSQL With MacPorts for Ruby on Rails Development

This post is pretty much just what the title says. If you develop in Rails, on a Mac, using PostgreSQL as your database and MacPorts as your package management system of choice, it can be hard to get everything set up and going. This is especially true if you’re new to absolutely everything, which was my situation when I first tried to do this. Part of what was hard was that most of the advice on the Internet was just a bunch of commands to run with little explanation, so it was hard to know what you had to do to customize it for your environment. So this explanation is hopefully a little more thorough. However, if you’re impatient and just want the TL;DR, here it is (with several implicit assumptions):

# install PostgreSQL 9.2 and create a data directory owned by the postgres user
sudo port install postgresql92-server
sudo mkdir -p /opt/local/var/db/postgresql92/defaultdb
sudo chown postgres:postgres /opt/local/var/db/postgresql92/defaultdb
# initialize the database cluster, then start the server as postgres
sudo su postgres -c '/opt/local/lib/postgresql92/bin/initdb -D /opt/local/var/db/postgresql92/defaultdb'
sudo su postgres
/opt/local/lib/postgresql92/bin/pg_ctl -D /opt/local/var/db/postgresql92/defaultdb/ -l /opt/local/var/db/postgresql92/defaultdb/server.log start
exit
# put the PostgreSQL binaries on your PATH
echo 'PATH=/opt/local/lib/postgresql92/bin/:$PATH' >> ~/.bashrc
source ~/.bashrc
# create a Rails app that uses PostgreSQL, connect as postgres, and create the databases
rails new my_app --database=postgresql
sed -e 's/username: my_app/username: postgres/g' -i '' my_app/config/database.yml
cd my_app
rake db:create:all