Proof
  • November 22nd

    What’s Wrong with OKCupid’s Matching Algorithm

    isomorphismes:

    OKCupid is using the wrong mathematics to match potential dates together. But before I critique them, let me compliment them on what they’re doing right:

    • “Our” mutual score is the geometric average of your score of me, and my score of you.
    • They low-ball the match % until the number of questions we’ve both answered gives them enough statistical confidence.
    • Questions come from users as well as staff. So they avoid some potential blind spots. (crowdsourcing)
    • OKCupid prompts you with questions that have the greatest chance of distinguishing you as quickly as possible. (maximally separating hyperplanes) If OKC already knows you want your date to shower at least once a day, keep a clean room, and that picking food from the trashcan is unacceptable, it won’t ask if you prefer crustpunks or gutterpunks.
    • You don’t have to be the same as me for us to match. I get to specify what answers I want from you.
    • They use a logarithmic scale of importance. Logs are the natural way we perceive levels or categories of importance. (For example, “categories” of how big a war is emerge naturally when you take the log of the number of deaths.)
    • It’s simple. At least they’re not using a non-linear Bayesian splitting tree didactogram or some other hunky machine-learning jiu jitsu.
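    To make the geometric-average point concrete, here is a minimal Python sketch. It assumes each one-way score is already a fraction between 0 and 1; the function name `mutual_match` is my own, not anything from OKCupid’s code, and the low-balling for few shared questions is not modelled.

```python
import math


def mutual_match(your_score_of_me: float, my_score_of_you: float) -> float:
    """Geometric average of the two one-way scores (each between 0 and 1)."""
    return math.sqrt(your_score_of_me * my_score_of_you)


# A lopsided pair: one of us likes the other a lot, the other barely at all.
lopsided = mutual_match(0.9, 0.1)
arithmetic = (0.9 + 0.1) / 2
print(f"geometric {lopsided:.2f}, arithmetic {arithmetic:.2f}")
```

    The geometric average drags a lopsided pair down toward the weaker side (0.30 here versus 0.50 for a plain average), which is what you want when one person would hate the date.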

    But, there’s still room for improvement — as was pointed out to me by Becky Russoniello. Currently, OKCupid is set up to award high scores just for being not-a-terrible match. That’s bad.

    HOW MATCH PERCENTAGES WORK NOW

    To show why, I first need to detail how your score of me is calculated:

    1. You answer questions like, “Is homosexuality a sin?” Your answer consists of: (a) what you think, (b) what answer/s are acceptable for me to give, and (c) how important it is for me to get this question “right” per your definition.
    2. The question’s importance draws from {Mandatory, Very Important, Somewhat Important, A Little Important, Irrelevant} which correspond to the numbers {250, 50, 10, 1, 0}.
    3. If I get a Very Important question “right”, I get 50/50 points, and if I get a Very Important question “wrong”, I get 0/50 points. If I haven’t answered the Very Important question, I get 0/0 points — neither penalised nor rewarded.
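    Steps 1 to 3 boil down to a weighted fraction: points earned over points possible, summed over the questions we have both answered. The sketch below is a hypothetical reconstruction from that description, not OKCupid’s actual code; the level names in `WEIGHTS` and the helper `one_way_score` are my own labels.

```python
# Weights from the importance scale above; the short level names are my own labels.
WEIGHTS = {"mandatory": 250, "very": 50, "somewhat": 10, "little": 1, "irrelevant": 0}


def one_way_score(answers):
    """Score one person's view of another.

    `answers` lists (importance, got_it_right) pairs for questions BOTH people
    answered. Questions only one of us answered are simply absent, which is
    the 0/0 case: neither penalised nor rewarded.
    """
    earned = sum(WEIGHTS[imp] for imp, right in answers if right)
    possible = sum(WEIGHTS[imp] for imp, right in answers)
    # With no weighted questions in common there is nothing to measure yet.
    return earned / possible if possible else 0.0


score = one_way_score([("very", True), ("very", False), ("somewhat", True)])
print(f"{score:.1%}")  # 60 of 110 possible points
```

    Here I got one Very Important question right, one wrong, and a Somewhat Important one right: 60 of 110 possible points, or about 54.5%.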

    For more details, see their FAAAQ.

    THE PROBLEM

    Here’s the important flaw: the denominator grows with every question we’ve both answered, no matter how little that question tells you. In practice, the Mandatory questions both

    1. crowd out more interesting differentiators, and
    2. inflate the scores of people who merely have tolerable political views.

    To demonstrate this, I’ll share some of the Mandatory questions from my own OKCupid profile.

    • Do you think homosexuality is a sin?
    • How often are you open with your feelings? (can’t be Rarely or Never)
    • Would it bother you if your boss was minority, female, or gay?
    • Would you write your child’s college entry essay?
    • What volume level do you prefer when listening to music? (can’t be “I prefer not to listen to music”)
    • Would you try to control your mate with threats of suicide?
    • Gay marriage — should it be legal?
    • Are you married, engaged to be married, or in a relationship that you believe will lead to marriage?
    • How important to you is a match’s sense of humor? (can’t be Not Important)
    • Would the world be a better place if people with low IQ’s were not allowed to reproduce?

    Some other doozies, which I might wrongly make Mandatory or Very Important, include:

    • Which is bigger? The Earth, or the Sun?
    • How many continents are there?
    • Do you consider astrology to be a legitimate science?

    The problem with all of these filters is that I mean them to act only in a negative direction. (Could I call them “quasi-filters”?)

    NON-TERRIBLE ≠ GOOD

    In other words, someone doesn’t become a great potential match simply because they’re not

    • a bigot,
    • a cheat,
    • a eugenicist,
    • or a depressive manipulative.

    You need to receive those check-marks just to get to zero with me. You also need to be not-married-to-someone-else. That doesn’t win you plus points, it’s just a requirement. But under the current OKCupid schema, you do win 250/250 from me for simply being available. Oops.

    Likewise, knowing basic facts from grade school seems like, uh, necessary. But even if somebody thinks there are 6 or 8 continents, do you really think you won’t be able to tell once they message you?

    Few people will be culled by the Continents question, and if you make 10 such easy questions Mandatory, everybody else starts with 2500/2500 points, so the rest of your match questions will barely distinguish one person from another. Even the Very Important questions (50 points apiece) will only budge the score a little below a default of 100%. And the Somewhat Important questions, which tend to be the more discriminating ones, are mowed down by the juggernaut of Easy Questions.
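    The arithmetic of that dilution, using the weights from the scoring section (the helper `match_pct` is hypothetical, just earned points over possible points):

```python
MANDATORY, VERY, SOMEWHAT = 250, 50, 10  # importance weights from the scoring section


def match_pct(earned: int, possible: int) -> float:
    """Earned points as a percentage of possible points."""
    return 100 * earned / possible


padding = 10 * MANDATORY  # ten easy Mandatory questions, all answered "right"

# Candidate gets one Very Important question right and one wrong.
with_pad = match_pct(padding + VERY, padding + 2 * VERY)
without_pad = match_pct(VERY, 2 * VERY)
# Candidate misses a single Somewhat Important question.
somewhat_miss = match_pct(padding, padding + SOMEWHAT)

print(f"{with_pad:.1f}% vs {without_pad:.1f}%; Somewhat miss: {somewhat_miss:.1f}%")
```

    The same split on Very Important questions reads as a 98.1% match with the padding and a coin-flip 50.0% without it, and a missed Somewhat Important question costs less than half a percentage point.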

    OKCupid asks other, more useful questions, like:

    • Are you annoyed by people who are super logical?
    • Do you like abstract art?
    • Do you spend more money on clothes, or food?
    • Could you tolerate a ___________________ [my political / religious views] ?
    • Do you like dogs?

    These would actually distinguish among potential dates for me. Let’s face it: I write a blog about mathematics, so someone who is annoyed by super logical people is probably going to dislike me. And I like abstract art, so that could be something to talk about.

    Although everyone knows there are 7 continents, not everyone is bothered by “logical” personalities. So those questions do a better job of sorting the available dates.


    SPAMMABLE

    The worst side effect of the current scoring system is that a spammer could easily answer only the questions with obvious answers (basic facts and displays of non-bigotry) and get a decently high match percentage with a lot of people. At that point, the spammer uploads a picture of an attractive guy/girl, writes a bare-bones generic profile, and scams away.
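    A sketch of the spam scenario, using the same weighted-fraction reading of the scoring rules. Both profiles are invented for illustration and are scored against someone with ten easy Mandatory questions; `one_way_score` and the level names are hypothetical.

```python
# Importance weights from the scoring section; short names are my own labels.
WEIGHTS = {"mandatory": 250, "very": 50, "somewhat": 10, "little": 1}


def one_way_score(answers):
    """Earned over possible points for (importance, got_it_right) pairs."""
    earned = sum(WEIGHTS[imp] for imp, right in answers if right)
    possible = sum(WEIGHTS[imp] for imp, right in answers)
    return earned / possible


# The spammer answers only the obvious questions, all "right".
spammer = [("mandatory", True)] * 10
# An honest candidate answers those too, plus some that actually differ.
genuine = [("mandatory", True)] * 10 + [
    ("very", True), ("very", False), ("somewhat", True), ("somewhat", False),
]

spammer_score = one_way_score(spammer)
genuine_score = one_way_score(genuine)
print(f"spammer {spammer_score:.1%}, genuine {genuine_score:.1%}")
```

    The spammer’s thin profile scores a perfect 100%, while the honest candidate who answers more questions lands at about 97.7%.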


    THEY CAN MAKE THE SYSTEM BETTER

    I think the right model for thinking about how people evaluate potential dates can be found in economics. Specifically, Kahneman & Tversky’s Prospect Theory:

    The main lessons I draw from prospect theory, as a theory of psychology, are:

    1. We evaluate things based on a reference point (“zero”).
    2. Small perceived negatives are twice as bad as small perceived positives are good (“local kink at zero”).
    3. Really bad or really good, we lose our ability to coherently measure how good/bad (“log-like at high distances”).
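    For the curious, the standard way to write down points 1 to 3 is the prospect-theory value function. The sketch below uses the parameter estimates commonly cited from Tversky & Kahneman’s 1992 paper (α = β = 0.88, λ = 2.25). Note that the textbook curve is a concave power function, which shares the log’s diminishing sensitivity rather than being literally logarithmic.

```python
ALPHA = 0.88   # curvature for gains (Tversky & Kahneman 1992 estimate)
BETA = 0.88    # curvature for losses
LAMBDA = 2.25  # loss aversion: losses loom roughly twice as large as gains


def value(x: float) -> float:
    """Prospect-theory value of an outcome x, measured from the reference point 0."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** BETA


print(f"{value(1):.2f} {value(-1):.2f}")  # the kink at zero
# Diminishing sensitivity: going from 100 to 200 adds less than going from 0 to 100.
print(value(200) - value(100) < value(100))
```

    The reference point is the 0 in `value`; the kink is the jump from slope 1 to slope 2.25 as you cross it; and the flattening far from 0 is the exponent below 1.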

    How does P.T. apply to dating and OKCupid?


    Bigots, cheats, eugenicists, and depressive manipulatives are way off in negative land. I’m not even interested in meeting them. I don’t care whether OKC gives them a 0% or a 10%, because those are effectively the same to me: ignore. I want OKCupid to accurately score people who are somewhere north of my reference point.

    • What if the scoring system simply trashed everyone below 50%? They could all be labelled “non-match” and then twice as many numbers would be available to grade the remaining candidates.

      That’s a mathematically good idea, but doesn’t address the issue of dilution. And, it seems to ignore an aspect of “numbers psychology”: people like using only the upper half of the scale. Think about how people use the hotness scale: they would never be comfortable dating a 4.
    • What if OKCupid revamped their whole framework along the lines of Prospect Theory? Try to establish a reference point, do some research into psychology papers that bear on the topic, and so on.

      Well, it might be cool. But that’s a lot of work, and OKC is already successful. Big changes alienate users.
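    The first idea, trashing everyone below 50%, is a simple rescaling. A minimal sketch, assuming a 50% cutoff and a linear stretch of what remains (the function name `rescale` is hypothetical):

```python
def rescale(match_pct: float, cutoff: float = 50.0):
    """Label everything below the cutoff a non-match and stretch the
    remaining range [cutoff, 100] back onto [0, 100]."""
    if match_pct < cutoff:
        return None  # "non-match"
    return 100 * (match_pct - cutoff) / (100 - cutoff)


print(rescale(40.0), rescale(75.0), rescale(98.0))
```

    A padded 98% match still comes out at 96%, which is the dilution problem in miniature: rescaling stretches the scale but does nothing about the inflated denominator.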

    Here’s the simplest solution I can think of, which requires no UI changes and no research. In fact, an OKC developer should only need to amend one line of code.

    • Mandatory questions can only give out negative points for answering wrong. No plus points for right answers to Mandatory.

    Mathematically this is ugly because you introduce a discontinuity — but, so what? I think this is what the broad majority of people mean when they say something is mandatory. If you have a mandatory employee meeting, do people get a bonus for showing up? Does HM Revenue pat you on the back for paying your taxes?

    In the eloquent phrasing of Chris Rock:

    If OKC ends up giving some negative (or I guess imaginary, under the square root in the geometric average) scores, so what? I was ignoring everybody under 60% anyway.
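    Here is one hypothetical reading of the one-line fix, sketched in Python. I assume a right Mandatory answer adds nothing to either side of the fraction, while a wrong one subtracts 250 from the numerator only; the names are mine, not OKCupid’s. Negative one-way scores are clamped to zero before the geometric average, since Python’s `math.sqrt` raises `ValueError` on negative input and two negative scores would otherwise multiply into a positive “match”.

```python
import math

# Importance weights from the scoring description; short names are my labels.
WEIGHTS = {"mandatory": 250, "very": 50, "somewhat": 10, "little": 1, "irrelevant": 0}


def one_way_score_proposed(answers):
    """Hypothetical reading of the proposal.

    A right Mandatory answer adds nothing to earned or possible points; a wrong
    one subtracts its weight from earned points only. Other questions score as
    before (weight added to possible; also to earned if answered right).
    """
    earned = possible = 0
    for importance, right in answers:
        weight = WEIGHTS[importance]
        if importance == "mandatory":
            if not right:
                earned -= weight
            continue
        possible += weight
        if right:
            earned += weight
    if possible == 0:
        # Only Mandatory questions in common: nothing to grade above zero.
        return 0.0
    return earned / possible


def mutual(a, b):
    """Geometric average with negative one-way scores clamped to zero."""
    return math.sqrt(max(a, 0.0) * max(b, 0.0))


padded = [("mandatory", True)] * 10 + [("very", True), ("very", False)]
failed = [("mandatory", False), ("very", True)]
print(one_way_score_proposed(padded))   # 0.5
print(one_way_score_proposed(failed))   # -4.0
print(mutual(one_way_score_proposed(failed), 0.9))  # 0.0
```

    The padded candidate drops from about 98% under the current rules to a plain 50%, and someone who fails a Mandatory question goes negative and is simply ignored.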

    YOU CAN MAKE YOUR SCORES BETTER

    If you use OKCupid, there is a way to improve your matches even if they never change their matching algorithm:

    • Lower the importance of questions with obvious answers. I bet you won’t start matching with people who believe the Earth is larger than the Sun. And you will pick up extra precision in matches with other people.
    • Even if something is mandatory for you to date someone, don’t use the Mandatory category like that. Maybe you can have a few Mandatory questions, but overall the category just dilutes the scoring.
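    The first tip in numbers: the same hypothetical candidate, scored once with ten easy questions set to Mandatory and once with them set to A Little Important. The helper and profile are invented for illustration, using the weights from the scoring section.

```python
# Importance weights from the scoring section; short names are my own labels.
WEIGHTS = {"mandatory": 250, "very": 50, "somewhat": 10, "little": 1}


def one_way_score(answers):
    """Earned over possible points for (importance, got_it_right) pairs."""
    earned = sum(WEIGHTS[imp] for imp, right in answers if right)
    possible = sum(WEIGHTS[imp] for imp, right in answers)
    return earned / possible


# Questions that actually differ between people: half right, half wrong.
rest = [("very", True), ("very", False), ("somewhat", True), ("somewhat", False)]

as_mandatory = one_way_score([("mandatory", True)] * 10 + rest)
as_little = one_way_score([("little", True)] * 10 + rest)
print(f"{as_mandatory:.1%} -> {as_little:.1%}")
```

    Demoting the easy questions takes this candidate from about 97.7% to 53.8%, so the questions that actually differ between people finally move the number.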