The Unrealistic Expectations of IT-Recruiters: A Data Analysis

Diving into IT job descriptions and their flaws using data analysis.

ZhongTr0n
Geek Culture

Image source: Pexels.com

Introduction

A couple of weeks ago I shared an interactive network graph that maps out the landscape of IT skills as mentioned in job descriptions. The initial idea was to analyze IT job descriptions and quantify just how often they have unrealistic expectations. Since one of the first steps was to extract the features (a list of IT skills), it seemed interesting to throw them in a graph and share the results. Building on that article, I will now take a look at the (unrealistic?) expectations IT job descriptions often contain.

Network graph for IT-skills (image by author)

The idea of writing about unrealistic expectations in job descriptions has been sitting with me for a while. You can often find posts on Twitter or Reddit of people sharing screenshots from ridiculous job descriptions. There is even this one guy who added Pokemon to his LinkedIn profile just to see if recruiters would notice. And then last year, I met a recruiter through a mutual friend. I asked her if she finds it challenging to write job descriptions about things of which her knowledge is limited (let’s say IT), to which she replied “haha yeah we have no idea, but it doesn’t matter”. It’s hard to capture in words, but there was a sense of misplaced pride in the way she said it that made me feel compelled to start looking at the data to try and quantify this problem.

Before we dive into this, I want to make it clear that I know the vast majority of recruiters do great work in helping us and our employers find great matches. This analysis is aimed at the minority among the recruiters who do things… differently.

The Data

For this analysis I used a different dataset than the one I used for the network analysis. As the network analysis was mainly focused on feature extraction, well-structured data was very useful. Once that was done and I had my list of features, I could use a dataset with more qualitative information, like fully written out job descriptions and salaries.

On Kaggle, I found this dataset which is based on an old competition for which Adzuna (a UK job search engine) provided the data. The dataset contains 244768 job descriptions of which I filtered out the 38483 IT jobs.

Each record consists of one job description and the columns provide the following information:

  • Location
  • Company
  • Full description
  • Salary
  • Contract type/time
  • Source

Sample for the dataset (image by author)

Analysis

For this analysis we will focus on three things:

  • Years of experience
  • Skills required
  • Typical “red flags”

Let’s start with the years of experience.

Years of Experience

This is an extract from one of the job descriptions in the data:

“Unix Are you a Junior Tester? Have you got 12 years testing experience?”

Or what about this one:

“You will have 12 years experience working with SAP FI/CO on a technical level. You will be working as a junior consultant offering 1st and 2nd level support. They require applicants to speak English, Dutch and French”

The less experienced among you in particular will have seen many job postings like this: companies looking to fill graduate, junior or entry-level positions and then requesting an unreasonable amount of experience.

Using a combination of NLTK and regex I tried to extract the years of experience required from each job. Once I got this information I took a look at the title to identify junior and senior positions. This was rather straightforward as I just labeled all jobs with the word “senior” in the job title as senior and for junior I used the keywords “junior”, “entry level” and/or “graduate”.
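To make this step more concrete, here is a minimal sketch of what such an extraction could look like. It is not the actual script used for this analysis: the pattern, the small `WORD_NUMBERS` mapping, the helper names and the choice to keep the largest number found are all assumptions made for illustration, and it only uses Python's standard `re` module.

```python
import re

# Hypothetical mapping for spelled-out numbers (a library such as
# word2number could take over this job).
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

NUMBER = r"\d{1,2}|" + "|".join(WORD_NUMBERS)

# Matches phrases such as "5 years experience", "three+ years of experience"
# or "12 years testing experience". Up to three filler words are allowed
# between "years" and "experience".
EXPERIENCE_RE = re.compile(
    rf"\b({NUMBER})\s*\+?\s*years?'?\s+(?:\w+\s+){{0,3}}?experience",
    re.IGNORECASE,
)


def years_required(description):
    """Return the largest number of years mentioned, or None if none is found."""
    found = []
    for match in EXPERIENCE_RE.finditer(description):
        token = match.group(1).lower()
        found.append(int(token) if token.isdigit() else WORD_NUMBERS[token])
    return max(found) if found else None


def seniority(title):
    """Label a job title with the keywords named in the text."""
    lowered = title.lower()
    if "senior" in lowered:
        return "senior"
    if any(keyword in lowered for keyword in ("junior", "entry level", "graduate")):
        return "junior"
    return "other"
```

A pattern like this cannot tell whose experience is being described, so a sentence about the company's own years of experience would be counted as well.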

Plotting this information shows the following:

Experience required by type of position (image by author)

The results seem to be pretty reasonable. Most junior jobs require little to no experience. Even though some descriptions do ask for unreasonable experience way beyond two years, this seems to be rather exceptional. Furthermore, I must also add there is some bias in this analysis. Some job descriptions mention the experience of their company in terms like “we have 15 years of experience in…” which is also picked up by the script. I ran some samples and these misidentifications do not appear to happen very often, but they certainly do account for a small part of the numbers.

When looking at the relative numbers (the total sample consists of 38k job descriptions), the number of unreasonable experience requests is limited, yet it is still astonishing that almost one hundred junior positions demand at least 5 years of experience.

Skills Required

Image source: LinkedIn.com

Some recruiters like to fish with a big net. They list a bunch of technologies that are vaguely related to the function and hope it will result in finding a candidate. But how often does this lazy approach take place? I used two different metrics to quantify this:

Total number of skills: with this fairly simple metric, I counted how many of the features (IT-skills) in my network chart occur in a single job description.

Skill separation: a network chart consists of nodes that are connected through edges. If skills are closely related, the distance (number of edges to form a connection) will be short. However, if skills have nothing to do with each other like, let’s say, MongoDB and Photoshop, the distance to cover will be a lot longer.

Network path lengths and skill relation (image by author)

Using this principle (and some additional parameters), I calculated how spread out the different skills actually are by using the ratio of the edges used in the shortest paths to the total number of nodes.

In short: a value of 1 means all skills are closely related. A higher number means the skills are less related to each other.
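Both metrics can be illustrated with a small sketch. Note that this is a simplified stand-in and not the exact formula described above: the toy `SKILL_GRAPH`, the helper names and the use of the mean pairwise shortest-path length as separation score are assumptions. It does share the key property that a value of 1 means every pair of skills is directly connected. The breadth-first search is written out with the standard library here; networkx offers shortest-path functions that serve the same purpose.

```python
import re
from collections import deque
from itertools import combinations

# Toy skill graph with hypothetical edges (not the real network chart).
SKILL_GRAPH = {
    "java": {"spring", "sql"},
    "spring": {"java", "maven"},
    "maven": {"spring"},
    "sql": {"java", "mongodb"},
    "mongodb": {"sql", "javascript"},
    "javascript": {"mongodb", "css"},
    "css": {"javascript", "photoshop"},
    "photoshop": {"css"},
}


def skills_in(description, graph=SKILL_GRAPH):
    """Metric 1: the set of known skills mentioned in a job description."""
    text = description.lower()
    # The lookarounds keep "java" from matching inside "javascript".
    return {s for s in graph if re.search(rf"(?<!\w){re.escape(s)}(?!\w)", text)}


def distance(graph, start, goal):
    """Number of edges on the shortest path, found by breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append((neighbour, steps + 1))
    return None  # the two skills are not connected at all


def separation(skills, graph=SKILL_GRAPH):
    """Metric 2 (simplified): mean shortest-path length over all skill pairs."""
    lengths = [distance(graph, a, b) for a, b in combinations(sorted(skills), 2)]
    lengths = [n for n in lengths if n is not None]
    return sum(lengths) / len(lengths) if lengths else None
```

In this toy graph, `separation({"java", "spring"})` gives 1.0 because the two skills are neighbours, while `separation({"mongodb", "photoshop"})` gives 3.0 because three edges separate them.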

Combining both metrics gives the following result:

Detecting job description with high number of unrelated skills (image by author)

Overall, the results are not that bad. Most job descriptions don’t list an excessive number of skills and for those that do, the skills are highly related. However, in the red area there are some job descriptions with both a high number of skills (more than 15) and skills that are not closely related. Here is an example of one of those data points in the red area:

Java/Spring Developer […] Key Skills Required: Agile working practices Modern OSs (kernel and user spaces, memory management, file systems, basic network services, users ) Messagedriven architectures (preferably with JMS and/or AMQP) ANSI SQL (preferably as implemented by one of the big RDBMS: Oracle Database, MS SQL Server, PostgreSQL) Java EE Servlet containers or application servers (e.g. Tomcat) Desirable Skills: Spring Framework UNIX / Linux / OS X (good command line skills (e.g. find me all files modified yesterday that contain the text that matches some regular expression ), filesystem structure, daemons) Maven, Spring Security Akka Groovy (Groovy language); Grails Scala (functional programming, Scala actors); integrating Scala with Spring applications ObjectiveC (particularly for iOS programming) CSS, JavaScript (jQuery preferred) Ajax Ruby on Rails […]

I must admit my Java abilities are basic to say the least, which makes it a bit harder for me to interpret these skill requirements. But Java, Oracle, SQL, PostgreSQL, Objective-C, Linux, Scala, JavaScript, CSS, Ruby on Rails and much more seem to be a broad range of skills to demand from a developer.

Before we go to the ‘red flags’, remember that this analysis is based on the skills I used for the network chart. This means many other IT-skills that are not in the network chart are not picked up in this analysis. In reality, the numbers would be higher.

Typical “Red Flags”

Lots of social media posts mention typical “red flags” in job descriptions. Red flags are things like keywords that would predict a low salary or a bad working environment. After scrolling through r/recruitinghell I was able to draft a list of keywords that would be red flags. Using this list, I compared the average salary of job descriptions containing these keywords with the overall average.
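The comparison itself is simple enough to sketch in a few lines. The rows and salaries below are made up purely for illustration, and `salary_by_keyword` is a hypothetical helper rather than the code behind the chart. On the real dataset the same idea would be applied to the description and salary columns.

```python
from statistics import mean

# Made-up sample rows: (description, annual salary). Not real data.
JOBS = [
    ("We need a coding ninja for our startup", 28000),
    ("Senior Java developer, agile team", 50000),
    ("Support analyst, some ad-hoc duties", 24000),
    ("Data engineer with SQL and Python", 45000),
    ("Sales ninja and go-getter wanted", 26000),
]

# A short, hypothetical red-flag list.
RED_FLAGS = ["ninja", "ad-hoc duties"]


def salary_by_keyword(jobs, keywords):
    """Return the overall mean salary and, per keyword, (mean, difference)."""
    overall = mean(salary for _, salary in jobs)
    result = {}
    for keyword in keywords:
        salaries = [s for text, s in jobs if keyword in text.lower()]
        if salaries:  # skip keywords that never occur
            result[keyword] = (mean(salaries), mean(salaries) - overall)
    return overall, result


overall, per_keyword = salary_by_keyword(JOBS, RED_FLAGS)
for keyword, (average, difference) in per_keyword.items():
    print(f"{keyword}: {average:.0f} ({difference:+.0f} vs. overall {overall:.0f})")
```

A negative difference means that descriptions containing the keyword pay less on average than the full sample.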

Buzzwords & average salary (image by author)

For a lot of these keywords there seems to be an obvious relation with the salary, especially for ninja, gogetter and ad-hoc duties. But as you know, correlation is not always causation. There could be many other variables at play. For example, these buzzwords are often used by startups, which are known to have lower salaries.

One of the interesting keywords is “passion” or “being passionate about”. Using some regex and NLTK, I visualized the most common things employers want you to be passionate about.
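As a rough illustration of this step, the sketch below pulls the first meaningful word after “passionate about” or “passion for” and counts it. The pattern, the tiny `STOPWORDS` set and the function name are assumptions for this example. A fuller version could rely on NLTK for tokenization and stop words.

```python
import re
from collections import Counter

# Capture up to three words following "passionate about" or "passion for".
PASSION_RE = re.compile(
    r"\b(?:passionate\s+about|passion\s+for)\s+((?:\w+\s+){0,2}\w+)",
    re.IGNORECASE,
)

# Tiny hypothetical stop-word list.
STOPWORDS = {"the", "a", "an", "your", "their", "our", "all", "new"}


def passion_targets(descriptions):
    """Count the first non-stop-word that follows a 'passion' phrase."""
    counts = Counter()
    for text in descriptions:
        for match in PASSION_RE.finditer(text):
            words = [w for w in match.group(1).lower().split() if w not in STOPWORDS]
            if words:
                counts[words[0]] += 1
    return counts
```

The resulting counts are the kind of input a word cloud can be built from.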

Things you should be passionate about according to recruiters (image by author)

You could argue that there are more exciting things in life to be passionate about, but then again I’m the one writing code for this analysis in my free time so who am I to judge ¯\_(ツ)_/¯.

Conclusions

Overall Not Bad

Overall, it is not that bad. The vast majority of job descriptions seem to have realistic demands (at least by the metrics in this analysis) and are well written. Required experience seems to match the titles, the number of “red flags” or buzzwords is limited and the number of required skills is reasonable and relevant.

As with many data problems, there is an exposure bias where the bad examples get a lot of attention, which creates the impression that the landscape is in bad shape.

But When It Is Bad, It’s Really Bad

As said, overall things are not that bad. However, when things are bad, they tend to be really bad. Junior positions requiring up to 15 years of experience, job descriptions listing more than 15 skills that are not even related to each other: examples like that are not a good look for the recruiting industry. When looking for a job, seeing ads like that can be extremely demotivating, so it’s a good thing we can label them as exceptions. Additionally, in this day and age of social media, these terrible examples are often shared on various platforms, creating a boomerang effect for whoever wrote the job description.

The Tables Are Turning

Image source: giphy.com

Things are changing. Technology is disrupting a lot of industries and recruiting is certainly one of them. One of the trends in this movement of disruption is often to cut out the middleman and replace him/her with a digital process. The bad job descriptions lacking comprehension of the role and skills are often written by third-party recruiters. Online platforms like LinkedIn could help connect employers with candidates without the need for a middleman to bring both together. Searching for jobs is a tiring process for the candidates and an expensive process for the companies. There is a lot of room for improvement and all parties involved would benefit from it.

Further Research and Improvements

As always, some points of improvement and topics for future research.

  • Improve the list of skills: As mentioned above, the list of skills is based on the features I extracted for the network graph. Although I’m pretty happy with this list, it is far from complete. Many important frameworks/technologies/libraries are missing which could provide key insights in the analysis of required skills.
  • Include other variables: Salary is not a great indicator for job satisfaction. If I could somehow get my hands on a big dataset with job descriptions that includes numbers like retention for said positions, it could result in a very powerful analysis. Having a direct connection between how the job is described and how candidates experience it would offer a very clear view of the impact of the wording.
  • Look at other industries: The current analysis only looks at the IT job landscape. For me this is a familiar playground, which makes it easier to look at the data. Of course, this is only a small portion of the job landscape and it would be nice to look at other industries.
  • Bigger and better data: I know I tend to say this in almost any analysis/report but it (almost) always applies. More and better data would improve the analysis. Unfortunately, it is not that easy to find huge (free) and recent data on job descriptions.

Tech Stack

Data analysis (Python): pandas, numpy, regex, word2number, nltk and networkx

Data visualization (Python): matplotlib, seaborn and wordcloud

About Me: My name is Bruno and I work as a data science consultant. If you want to see the other stuff I built, like a mumble rap detector, make sure to take a look at my profile. Or connect with me via my website: https://www.zhongtron.me
