EDM Developer Interviewing Guide

by Jason Williscroft

Here at HotQuant we interview EDM developers all the time, both to subcontract for us and on behalf of our clients. Obviously much of an interview is going to be platform-specific, but the fundamentals are universal: the modeling, storage, and manipulation of data using SQL, and problem-solving in a data-centric environment.

I’ve evolved a list of interview questions that have served me very well in getting to the bottom of what a candidate does and doesn’t know in this space. Of course there’s no substitute for actually seeing the candidate’s code and actually watching him solve problems, but once I work my way through this list I’ve got a pretty good sense of whether I should bother following up.

Fair warning: I wrote this interview, and I can tell the difference between a candidate who knows his stuff and one who just memorized the answers. If you can’t pass this interview yourself, don’t conduct it! Also, if you’re a potential candidate: you’re welcome. If you’re actually qualified to pass this interview, nothing on the list will surprise you. Just remember this stuff is all bullfeathers unless you can express it effectively in code.

  1. Rate your SQL Query skills. On a scale of 1-10, where 1 means “What’s SQL?” and 10 means “I invented it!”, where do you place yourself? This is about level-setting. I’ll come up with my own rating during the interview, and I want to know if the candidate over- or under-rates his skills. The elements of the following questions have numbers after them (10) indicating how high a solid answer should place the candidate on the scale. People who over-rate their skills tend not to be open to learning, so that would be a mark against. Anybody who rates himself a 9 or a 10 is either grossly overconfident or is sitting on the wrong side of the table. I’m also interested in hearing whether the candidate will take advantage of the question to express the distinction between data definition and data manipulation (4).
  2. Compare & contrast the INNER, OUTER, and CROSS JOIN, and give me a scenario where you would use each. This is very basic, and I expect a solid answer (4). CROSS JOINs are rare, so I would expect a bit of a pause coming up with a scenario, but I do want a scenario (5). Bonus points if the candidate uses the term cartesian product in connection with the CROSS JOIN (6). More bonus points if the candidate explains that a FULL OUTER JOIN and a CROSS JOIN are not the same thing, and why (5). Reference here.
  3. What makes a clustered index a clustered index? Obviously the candidate needs to know what a clustered index is and how it differs from non-clustered ones (5). At a minimum I want to hear that a table can only have one clustered index because it governs how the data is physically laid out on the disk (4). Bonus points if the candidate gives his answer in terms of leaf nodes and data pages (7). Even more if the elements of his answer are all correct (8). Reference here.
  4. Compare and contrast a JOIN to an APPLY (5). What are the types of APPLY and how do they work (6)? Give me a scenario where you would use an APPLY (6). Nice if the candidate has actually used the APPLY syntax to do real work (6). A key usage is the application of an inline, table-valued UDF, so extra points for that (7). Reference here.
  5. Compare and contrast the different types of User-Defined Function. There are three; the candidate should be able to name them all (4). I would expect a discussion of table-valued vs. scalar (5) and inline vs. multi-statement (6). I’d also hope to hear a discussion of relative performance characteristics (7) and how they interact with the SQL Query Optimizer (8). Bonus points for focusing on Inline, Table-Valued UDFs as the most performant of the three (7), characterizing them as a kind of parameterized view (7), and understanding that the right way to call them is with an APPLY (7). Reference here.
  6. Tell me about Common Table Expressions. These are an essential tool. At a minimum I would expect the candidate to be able to talk about their effects on code readability (5) and their use as a vehicle for recursion (7). Bonus points for discussing the performance implications of calling a recursive CTE (8). Reference here.

If a candidate can’t provide satisfactory answers to all the question elements above ranked 5 or below, I’m terminating the interview. No point in continuing. If they couldn’t get to 7, I may still be interested, but I’m going to be expecting to hear just how awesome the candidate is on some otherdevelopment platform and just why I should care.

Anyway, now that we have a pretty good picture of the candidate’s SQL skills, I’d like to see him apply them against his own data model. Ideally we have a white board for this one, but if it’s a phone interview we can talk our way around it. The vehicle is the iTunes Test, it is an exercise in thinking on one’s feet, and it has three parts:

  1. Data Modeling. Everybody’s familiar with the iTunes Store. Ask the candidate to draw you a data model that captures the essential elements of the way Albums, Artists, and Songs are represented in the iTunes Store (obviously, any related thing will work). Here’s what I am looking for:
    • Can the candidate abstract out the parts that are actually relevant to the question? If he spends more time noodling over how to represent a Song genre than he does establishing a key relationship between, say, Artists and Songs, then we have a problem.
    • Does the candidate show an awareness of simple best practices? If he’s using the Song title as a primary key, we have a problem.
    • Does the candidate know how to communicate data relationships visually (or verbally)? Is he consistent in his use of notation–UML, ERD, crow’s foot, whatever–or is he all over the map?
    • Does the candidate make some attempt to capture the complexity of the domain? If there isn’t at least one many-to-many relationship in his model, I’m disappointed.
  2. Model Analysis. Whatever model the candidate came up with on the fly is going to contain some intrinsic constraints. For example, the way he set it up may only allow a single artist on a song. So the question is: what constraints does your model impose on the data? If the candidate can’t identify most of the constraints in his own model, we may have a problem.
  3. SQL Query. Ask the candidate to write a query against his own model. The specifics will depend on the actual model, obviously, but make it reasonably tricky. Example: “Write me a query that lists every album in the database with two artists on every song.” The more complex the candidate’s model, the more interesting the query can be. If the candidate can’t query his own model, that’s a problem. Also watch to see how the query develops: you’ll learn a lot about how the candidate thinks when approaching a problem.

It’s impossible to provide numerical guidance around these questions because this part of the interview is more fluid. But, assuming the candidate avoids the obvious gotchas, here’s what I am looking for:

  • Is the candidate collaborative? Does he engage me by asking clarifying questions and calling out assumptions as he works? If I start reading my email while he’s coding his query, does he notice and try to re-engage me?
  • Is he expressive? Does he draw pictures and diagrams, use verbal analogies, and otherwise engage me with a vocabulary of shared symbology? Or is it all just words?
  • Is he iterative? Is he confident enough to produce a first cut at the problem that may be obviously wrong but just as obviously a step in the right direction? Can he integrate my feedback and iterate rapidly to a satisfactory solution?
  • Is he courageous? When I deliberately misread his code and tell him it won’t work, can he take a step back, re-analyze his own work, and then confidently explain to me why I’m the one who is wrong?

Again, no hard standards here. But the soft standard is that, over the course of this portion of the interview, I should acquire a warm, fuzzy, and rather well-informed feeling of confidence in the candidate. No warm fuzzies, no dice.

I’ll leave it up to you to come up with questions that specifically address your EDM platform of choice. But it’s worth pointing out that a candidate who can sit the above interview to my satisfaction will be able to operate effectively on any EDM platform, given appropriate specialized training. Considering how rare it is to find a an EDM operator who is (a) truly talented and (b) not already engaged, finding solid data-centric developers and training them up on your platform may actually be your most efficient option.

It’s certainly what we do at HotQuant.

Previous Post SQLEXPRESS Upgrade Gotcha
Next Post Markit EDM Test Harness: First Looks