How a Math Genius Hacked OkCupid to Find True Love – Wired Science
Chris McKinlay was folded into a cramped fifth-floor cubicle in UCLA’s math sciences building, lit by a single bulb and the glow from his monitor. It was 3 in the morning, the optimal time to squeeze cycles out of the supercomputer in Colorado that he was using for his PhD dissertation. (The subject: large-scale data processing and parallel numerical methods.) While the computer chugged, he clicked open a second window to check his OkCupid inbox.
McKinlay, a lanky 35-year-old with tousled hair, was one of about 40 million Americans looking for romance through websites like Match.com, JDate, and eHarmony, and he’d been searching in vain since his last breakup nine months earlier. He’d sent dozens of cutesy introductory messages to women touted as potential matches by OkCupid’s algorithms. Most were ignored; he’d gone on a total of six first dates.
On that early morning in June 2012, his compiler crunching out machine code in one window, his forlorn dating profile sitting idle in the other, it dawned on him that he was doing it wrong. He’d been approaching online matchmaking like any other user. Instead, he realized, he should be dating like a mathematician.
OkCupid was founded by Harvard math majors in 2004, and it first caught daters’ attention because of its computational approach to matchmaking. Members answer droves of multiple-choice survey questions on everything from politics, religion, and family to love, sex, and smartphones.
On average, respondents select 350 questions from a pool of thousands—“Which of the following is most likely to draw you to a movie?” or “How important is religion/God in your life?” For each, the user records an answer, specifies which responses they’d find acceptable in a mate, and rates how important the question is to them on a five-point scale from “irrelevant” to “mandatory.” OkCupid’s matching engine uses that data to calculate a couple’s compatibility. The closer to 100 percent—mathematical soul mate—the better.
But mathematically, McKinlay’s compatibility with women in Los Angeles was abysmal. OkCupid’s algorithms use only the questions that both potential matches decide to answer, and the match questions McKinlay had chosen—more or less at random—had proven unpopular. When he scrolled through his matches, fewer than 100 women would appear above the 90 percent compatibility mark. And that was in a city containing some 2 million women (approximately 80,000 of them on OkCupid). On a site where compatibility equals visibility, he was practically a ghost.
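To see why a small overlap in answered questions hurts, here is a minimal Python sketch of a match score of this general kind. It is an assumption-laden illustration, not OkCupid's actual code: the point values in `WEIGHTS`, the geometric mean of the two satisfaction scores, and the `1 / n` margin for a small overlap follow the company's public descriptions of its method as best they can be recalled, and every function and field name is hypothetical.

```python
from math import sqrt

# Assumed point values for the five importance levels. The article names only
# the endpoints of the scale ("irrelevant" and "mandatory").
WEIGHTS = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50, "mandatory": 250}


def satisfaction(rater, other, shared):
    """Fraction of the rater's importance points earned by the other's answers."""
    possible = 0
    earned = 0
    for q in shared:
        weight = WEIGHTS[rater[q]["importance"]]
        possible += weight
        if other[q]["answer"] in rater[q]["accepts"]:
            earned += weight
    return earned / possible if possible else 0.0


def match_percent(a, b):
    """Score two profiles using only the questions both people answered."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    raw = sqrt(satisfaction(a, b, shared) * satisfaction(b, a, shared))
    margin = 1 / len(shared)  # assumed penalty for a small overlap
    return round(100 * max(0.0, raw - margin), 1)
```

Under these assumptions, two people who agree on everything but share only two questions cannot score above 50 percent, which is one way a profile built on unpopular questions could end up nearly invisible.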
He realized he’d have to boost that number. If, through statistical sampling, McKinlay could ascertain which questions mattered to the kind of women he liked, he could construct a new profile that honestly answered those questions and ignored the rest. He could match every woman in LA who might be right for him, and none that weren’t.
Chris McKinlay used Python scripts to riffle through hundreds of OkCupid survey questions. He then sorted female daters into seven clusters, like “Diverse” and “Mindful,” each with distinct characteristics.
Even for a mathematician, McKinlay is unusual. Raised in a Boston suburb, he graduated from Middlebury College in 2001 with a degree in Chinese. In August of that year he took a part-time job in New York translating Chinese into English for a company on the 91st floor of the north tower of the World Trade Center. The towers fell five weeks later. (McKinlay wasn’t due at the office until 2 o’clock that day. He was asleep when the first plane hit the north tower at 8:46 am.) “After that I asked myself what I really wanted to be doing,” he says. A friend at Columbia recruited him into an offshoot of MIT’s famed professional blackjack team, and he spent the next few years bouncing between New York and Las Vegas, counting cards and earning up to $60,000 a year.
The experience kindled his interest in applied math, ultimately inspiring him to earn a master’s and then a PhD in the field. “They were capable of using mathematics in lots of different situations,” he says. “They could see some new game—like Three Card Pai Gow Poker—then go home, write some code, and come up with a strategy to beat it.”
Now he’d do the same for love. First he’d need data. While his dissertation work continued to run on the side, he set up 12 fake OkCupid accounts and wrote a Python script to manage them. The script would search his target demographic (heterosexual and bisexual women between the ages of 25 and 45), visit their pages, and scrape their profiles for every scrap of available information: ethnicity, height, smoker or nonsmoker, astrological sign—“all that crap,” he says.
To find the survey answers, he had to do a bit of extra sleuthing. OkCupid lets users see the responses of others, but only to questions they’ve answered themselves. McKinlay set up his bots to simply answer each question randomly—he wasn’t using the dummy profiles to attract any of the women, so the answers didn’t matter—then scooped the women’s answers into a database.
McKinlay watched with satisfaction as his bots purred along. Then, after about a thousand profiles were collected, he hit his first roadblock. OkCupid has a system in place to prevent exactly this kind of data harvesting: It can spot rapid-fire use easily. One by one, his bots started getting banned.
He would have to train them to act human.
He turned to his friend Sam Torrisi, a neuroscientist who’d recently taught McKinlay music theory in exchange for advanced math lessons. Torrisi was also on OkCupid, and he agreed to install spyware on his computer to monitor his use of the site. With the data in hand, McKinlay programmed his bots to simulate Torrisi’s click-rates and typing speed. He brought in a second computer from home and plugged it into the math department’s broadband line so it could run uninterrupted 24 hours a day.
After three weeks he’d harvested 6 million questions and answers from 20,000 women all over the country. McKinlay’s dissertation was relegated to a side project as he dove into the data. He was already sleeping in his cubicle most nights. Now he gave up his apartment entirely and moved into the dingy beige cell, laying a thin mattress across his desk when it was time to sleep.
For McKinlay’s plan to work, he’d have to find a pattern in the survey data—a way to roughly group the women according to their similarities. The breakthrough came when he coded up a modified Bell Labs algorithm called K-Modes. First used in 1998 to analyze diseased soybean crops, it takes categorical data and clumps it like the colored wax swimming in a Lava Lamp. With some fine-tuning he could adjust the viscosity of the results, thinning it into a slick or coagulating it into a single, solid glob.
He played with the dial and found a natural resting point where the 20,000 women clumped into seven statistically distinct clusters based on their questions and answers. “I was ecstatic,” he says. “That was the high point of June.”
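For readers curious what clustering categorical answers looks like, below is a minimal sketch of plain K-Modes in Python. It is not McKinlay's modified version, which the article does not detail. Each woman is represented as a tuple of answers, distance is the number of questions on which two tuples differ, and each cluster's center is the most common answer per question. The names `k_modes`, `hamming`, and `column_modes` are illustrative, and treating the cluster count `k` as the "dial" is an assumption.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of questions on which two answer tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most common answer for each question across a group of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, iters=20, seed=0):
    """Cluster tuples of categorical answers; returns (labels, modes).

    Requires at least k distinct rows.
    """
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(rows)), k)  # start from k distinct rows
    labels = None
    for _ in range(iters):
        # Assign every row to its nearest mode.
        new_labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each cluster's mode from its current members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return labels, modes
```

Calling `k_modes(rows, k=7)` on real survey tuples would return one label per respondent plus the seven "typical" answer sheets. Whether seven is a natural number of clusters is something to check by trying several values of `k`, as the article describes.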
He retasked his bots to gather another sample: 5,000 women in Los Angeles and San Francisco who’d logged on to OkCupid in the past month. Another pass through K-Modes confirmed that they clustered in a similar way. His statistical sampling had worked.
Now he just had to decide which cluster best suited him. He checked out some profiles from each. One cluster was too young, two were too old, another was too Christian. But he lingered over a cluster dominated by women in their mid-twenties who looked like indie types, musicians and artists. This was the golden cluster. The haystack in which he’d find his needle. Somewhere within, he’d find true love.
Actually, a neighboring cluster looked pretty cool too—slightly older women who held professional creative jobs, like editors and designers. He decided to go for both. He’d set up two profiles and optimize one for the A group and one for the B group.
He text-mined the two clusters to learn what interested them; teaching turned out to be a popular topic, so he wrote a bio that emphasized his work as a math professor. The important part, though, would be the survey. He picked out the 500 questions that were most popular with both clusters. He’d already decided he would fill out his answers honestly—he didn’t want to build his future relationship on a foundation of computer-generated lies. But he’d let his computer figure out how much importance to assign each question, using a machine-learning algorithm called adaptive boosting to derive the best weightings.
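The article does not say what McKinlay trained his boosting model on, so the following is only one plausible reading, sketched in Python. Assume each woman is a row of 0/1 features (1 if her answer to a question is one he would accept) and a label of +1 if she belongs to the target cluster, -1 otherwise. AdaBoost with one-question decision stumps then accumulates a vote weight per question, which can be read as a rough importance score. The function name and data layout are hypothetical.

```python
import math


def adaboost_question_weights(X, y, rounds=10):
    """X: rows of 0/1 features, one column per question. y: labels in {+1, -1}.

    Returns the total AdaBoost vote weight (alpha) given to each question.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n          # per-sample weights, uniform at the start
    importance = [0.0] * d
    for _ in range(rounds):
        best = None            # (weighted error, column, sign)
        for j in range(d):
            for sign in (1, -1):
                # The stump predicts sign * (+1 if the feature is 1 else -1).
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if sign * (2 * xi[j] - 1) != yi)
                if best is None or err < best[0]:
                    best = (err, j, sign)
        err, j, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        if err >= 0.5:
            break              # no stump beats chance, so stop
        alpha = 0.5 * math.log((1 - err) / err)
        importance[j] += alpha
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * sign * (2 * xi[j] - 1))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return importance
```

Turning these scores into OkCupid's five importance levels would need a further binning step that the article does not describe.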
With that, he created two profiles, one with a photo of him rock climbing and the other of him playing guitar at a music gig. “Regardless of future plans, what’s more interesting to you right now? Sex or love?” went one question. Answer: Love, obviously. But for the younger A cluster, he followed his computer’s direction and rated the question “very important.” For the B cluster, it was “mandatory.”
When the last question was answered and ranked, he ran a search on OkCupid for women in Los Angeles sorted by match percentage. At the top: a page of women matched at 99 percent. He scrolled down … and down … and down. Ten thousand women scrolled by, from all over Los Angeles, and he was still in the 90s.
He needed one more step to get noticed. OkCupid members are notified when someone views their pages, so he wrote a new program to visit the pages of his top-rated matches, cycling by age: a thousand 41-year-old women on Monday, another thousand 40-year-old women on Tuesday, looping back through when he reached 27-year-olds two weeks later. Women reciprocated by visiting his profiles, some 400 a day. And messages began to roll in.
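The rotation itself is simple to express. The Python sketch below covers only the schedule, one target age per day from 41 down to 27 and then around again. The function name is hypothetical, and the code that actually visited profiles is not shown in the article and is not reproduced here.

```python
from itertools import cycle, islice


def visit_schedule(oldest=41, youngest=27):
    """Yield one target age per day, oldest to youngest, then start over."""
    return cycle(range(oldest, youngest - 1, -1))


# The first 16 days of the rotation: 15 ages, then back to the oldest.
first_days = list(islice(visit_schedule(), 16))
```

Fifteen ages at one per day matches the article's "two weeks later" before the loop restarts.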
“I haven’t until now come across anyone with such winning numbers, AND I find your profile intriguing,” one woman wrote. “Also, something about a rugged man who’s really good with numbers … Thought I’d say hi.”
“Hey there—your profile really struck me and I wanted to say hi,” another wrote. “I think we have quite a lot in common, maybe not the math but certainly a lot of other good stuff!”
“Can you really translate Chinese?” yet another asked. “I took a class briefly but it didn’t go well.”
The math portion of McKinlay’s search was done. Only one thing remained. He’d have to leave his cubicle and take his research into the field. He’d have to go on dates.
On June 30, McKinlay showered at the UCLA gym and drove his beat-up Nissan across town for his first data-mined date. Sheila was a web designer from the A cluster of young artist types. They met for lunch at a cafe in Echo Park. “It was scary,” McKinlay says. “Up until this point it had almost been an academic exercise.”
By the end of his date with Sheila, it was clear to both that the attraction wasn’t there. He went on his second date the next day—an attractive blog editor from the B cluster. He’d planned a romantic walk around Echo Park Lake but found it was being dredged. She’d been reading Proust and feeling down about her life. “It was kind of depressing,” he says.
Date three was also from the B group. He met Alison at a bar in Koreatown. She was a screenwriting student with a tattoo of a Fibonacci spiral on her shoulder. McKinlay got drunk on Korean beer and woke up in his cubicle the next day with a painful hangover. He sent Alison a follow-up message on OkCupid, but she didn’t write back.
The rejection stung, but he was still getting 20 messages a day. Dating with his computer-endowed profiles was a completely different game. He could ignore messages consisting of bad one-liners. He responded to the ones that showed a sense of humor or displayed something interesting in their bios. Back when he was the pursuer, he’d swapped three to five messages to get a single date. Now he’d send just one reply. “You seem really cool. Want to meet?”
By date 20, he noticed latent variables emerging. In the younger cluster, the women invariably had two or more tattoos and lived on the east side of Los Angeles. In the other, a disproportionate number owned midsize dogs that they adored.
His earliest dates were carefully planned. But as he worked feverishly through his queue, he resorted to casual afternoon meetups over lunch or coffee, often stacking two dates in a day. He developed a set of personal rules to get through his marathon love search. No more drinking, for one. End the date when it’s over, don’t let it trail off. And no concerts or movies. “Nothing where your attention is directed at a third object instead of each other,” he says. “It’s inefficient.”
Love is a Data Field
McKinlay’s code found that the women clustered into statistically identifiable groups who tended to answer their OkCupid survey questions in similar ways. One group, which he dubbed the Greens, were online dating newbies; another, the Samanthas, tended to be older and more adventuresome. Here’s how each cluster answered four of the most popular questions.
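The Questions
(The question text and per-cluster charts from the original graphic are not preserved here; only the answer options for the four questions remain.)
Question 1: One night; A few months to a year; Several years; The rest of my life
Question 2: 1-2 dates; 3-5 dates; 6 or more dates; Only after the wedding
Question 3: Yes, and I enjoyed myself; Yes, and I did not enjoy myself; No, and I would never; No, but I’d like to
Question 4: Extremely important; Somewhat important; Not very important; Not important at all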
Most unsuccessful daters confront self-esteem issues. For McKinlay it was worse. He had to question his calculations.
Then came the message from Christine Tien Wang, a 28-year-old artist and prison abolition activist. McKinlay had popped up in her search for 6-foot guys with blue eyes near UCLA, where she was pursuing her master’s in fine arts. They were a 91 percent match.
He met her at the sculpture garden on campus. From there they walked to a college sushi joint. He felt it immediately. They talked about books, art, music. When she confessed that she’d made some tweaks to her profile before messaging him, he responded by telling her all about his love hacking. The whole story.
“I thought it was dark and cynical,” she says. “I liked it.”
It was first date number 88. A second date followed, then a third. After two weeks they both suspended their OkCupid accounts.
“I think that what I did is just a slightly more algorithmic, large-scale, and machine-learning-based version of what everyone does on the site,” McKinlay says. Everyone tries to create an optimal profile—he just had the data to engineer one.
It’s one year after their first date, and McKinlay and Tien Wang have met me at the Westwood sushi bar where their relationship began. McKinlay has his PhD; he’s teaching math and is now working on a postgraduate degree in music. Tien Wang was accepted into a one-year art fellowship in Qatar. She’s in California to visit McKinlay. They’ve been staying connected on Skype, and she has returned for a couple of visits.
At my request, McKinlay has brought his lab notebook. Tien Wang hasn’t seen it before today. It’s page after page of formulas and equations in McKinlay’s tight handwriting, ending in a neatly ordered list of women and dates, a few terse notes about each. Tien Wang leafs through it, laughing at some of the highlights. On August 24, she notices, he took two women to the same beach on the same day. “That’s horrible,” she says.
To Tien Wang, McKinlay’s OkCupid hacking is a funny story to tell. But all the math and coding is merely prologue to their story together. The real hacking in a relationship comes after you meet. “People are much more complicated than their profiles,” she says. “So the way we met was kind of superficial, but everything that happened after is not superficial at all. It’s been cultivated through a lot of work.”
“It’s not like, we matched and therefore we have a great relationship,” McKinlay agrees. “It was just a mechanism to put us in the same room. I was able to use OkCupid to find someone.”
She bristles at that. “You didn’t find me. I found you,” she says, touching his elbow. McKinlay pauses to think, then admits she’s right.
A week later Tien Wang is back in Qatar, and the couple is on one of their daily Skype calls when McKinlay pulls out a diamond ring and holds it up to the webcam. She says yes.
They’re not entirely sure when they’ll get married. There’s research to be done to determine the optimal wedding day.