Improving the Oil Job Index

1 month, 2 weeks ago | Jared Hammond
Site Updates | Under the Hood | Coding

If you're interested in coding and how this site works, read on. If not, you've been warned: there is a fair bit of coding ahead!

How do you improve the (subjectively) best oil job site on the internet? Good question.

Well, get rid of jobs like these:

Those examples are pretty obvious, and I should have noticed them a long time ago. The keyword was 'oil', and for one job the text was in Italian, so the site couldn't get any further context.

Then there are jobs like the ones below, which are related to the oil industry but are not oil jobs:

What is needed is a way to test for 'oil jobness': the degree to which a job description is about an oil job.

Checking For Oil Job Keywords

Each job has a job description and a job title. The solution doesn't need to be too exotic; all we are going to do is check the number of oil job keywords in each job before we save it to the database.

The four ingredients/steps we will need are:

  1. A list of oil job keywords to test against
  2. The number of keywords that counts as a pass
  3. A counter for keywords in the job description
  4. Check whether the count from step 3 passes the threshold from step 2; if it does, save the job

Generating a List of Oil Job Keywords

Warning: this section is coding heavy!
In the Python code below we are solving step 1, "A list of oil job keywords to test against":

  1. Making a function "OilKeyWords" which takes two text inputs: "job_details" & "job_title".
  2. Telling it which folder the file is in, relative to the site's home directory.
  3. Opening "oil_keyword_test.txt", reading it and storing the text as "oil_keyword_test_file_text".
  4. Taking that raw text, splitting it at each new line "\n" and storing the result as a list "oil_keywords".

# Comments in Python look like this!
def OilKeyWords(job_details, job_title): #1
    folderPath = 'collection/jobsearch' #2
    with open(folderPath + '/oil_keyword_test.txt') as oil_keyword_test_file: #3
        oil_keyword_test_file_text = oil_keyword_test_file.read() #3 continued

    oil_keywords = oil_keyword_test_file_text.split('\n') #4

If we print oil_keywords, we get a nice list ready to use in a "for loop":

>>> print(oil_keywords)
['bakken', 'barnett', 'drill', 'drilling', 'eagle ford', 'exploration', 'fpso', 'frac', 'gas', 'geology', 'gom', 'gulf of mexico', 'hydrocarbon', 'lng', 'marcellus', 'midstream', 'north sea', 'o&g', 'offshore', 'oil', 'oil & gas', 'oil and gas', 'oil field', 'oil rig', 'oil well', 'oilfield', 'oilrig', 'oilwell', 'onshore', 'permian', 'pipeline', 'prms', 'reservoir', 'rig', 'seismic', 'wellsite']
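One pitfall worth guarding against, assuming the keyword file ends with a newline (most editors add one): `split('\n')` then leaves an empty string at the end of the list, and `'any text'.count('')` returns the length of the text plus one, which would push every job over the threshold. Below is a minimal sketch of a more defensive loader. `load_oil_keywords` is a hypothetical helper name, not code from the site.

```python
from pathlib import Path


def load_oil_keywords(path):
    """Read one keyword per line, dropping blank lines and stray whitespace.

    Hypothetical helper: the site's own code uses split('\n') directly.
    """
    text = Path(path).read_text(encoding="utf-8")
    # splitlines() copes with both '\n' and '\r\n' line endings, strip()
    # removes trailing spaces, lower() keeps keywords lowercase to match the
    # lowercased job text, and the 'if' filter drops empty lines.
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


# Why the empty string matters: count('') matches between every character.
print("oil rig".count(""))          # 8, i.e. len("oil rig") + 1
print("oil\nrig\n".split("\n"))     # ['oil', 'rig', '']
```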

Counting Oil Job Keywords in the Job Description

Let's introduce a job description that will pass the test:
job_detail='Oil & gas job description that will pass this simple Python code. Just add Drill and Reservoir'
Simple check for a single keyword:

>>> job_detail.count('gas') # Python's built-in string method 'count'
1

Great, the word gas is in the description once, and the count returned one. Now, slightly more complicated, loop over every keyword ("kw") in the "oil_keywords" list and count the total:

>>> kw_count = 0 # introduce a variable to keep track of the keyword count
>>> for kw in oil_keywords: # a for loop. 'kw' represents each keyword in the 'oil_keywords' list
	kw_count += job_detail.count(kw) # the counter adds the count to itself

>>> kw_count
1
# Only 1! There are 5 keywords in the description, but 'Oil', 'Oil & gas',
# 'Reservoir' and 'Drill' weren't counted because count() is case-sensitive
# and the job description contains uppercase characters.
# Easily fixed below using the lower() method:
>>> kw_count = 0 # reset the counter
>>> for kw in oil_keywords:
	kw_count += job_detail.lower().count(kw) # make the text lower case
	kw_count += job_title.lower().count(kw) # also check the imaginary job title (assumed keyword-free here)

>>> kw_count
5 # better
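Note that `str.count` matches substrings, not whole words, so short keywords can fire inside unrelated words: 'rig' is found in 'origin' and 'gas' in 'Vegas'. Overlapping keywords also stack, which is why 'Oil & gas' contributed to the 'oil', 'gas' and 'oil & gas' counts above. If that ever becomes a problem, one option (not what the site currently does) is to match whole words with a regular expression. A sketch, where `count_whole_words` is a hypothetical name:

```python
import re


def count_whole_words(text, keywords):
    """Count keyword hits in text, matching whole words only (case-insensitive).

    Assumes each keyword starts and ends with a letter or digit, which holds
    for every entry in the keyword list above.
    """
    total = 0
    for kw in keywords:
        # re.escape handles characters like '&'; \b anchors the match at word
        # boundaries, so 'rig' no longer matches inside 'origin'.
        pattern = r"\b" + re.escape(kw) + r"\b"
        total += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return total


text = "Customer service role, origin of leads: Las Vegas call centre"
keywords = ["rig", "gas", "oil"]
print(sum(text.lower().count(kw) for kw in keywords))  # 2 ('rig' in 'origin', 'gas' in 'Vegas')
print(count_whole_words(text, keywords))                # 0
```

Whole-word matching is stricter, so the best hurdle would likely need re-testing if the counting method changed.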

Putting it Together

With the building blocks above, we now have a simple, reusable check we can run on jobs to make sure they are relevant oil industry jobs and not olive oil jobs.
I've gone ahead and used a more efficient way of counting the keywords, as the real website analyzes thousands of jobs an hour... and as everyone always says, "the servers ain't free".

  1. After multiple tests on the job search, I've found that requiring 3 or 4 keywords per job performs best. I've set the hurdle so each job needs 4 or more.
  2. The for loop is replaced by two built-in functions, sum and map, which together do the same thing more efficiently.
  3. Also check the job title for keywords.
  4. If there are enough keywords, continue with the rest of the job workflow (there is a lot more code...) and then save the job.
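To check that replacing the loop with `sum` and `map` (step 2) counts exactly the same thing, here is a small side-by-side comparison using the example description from earlier and a shortened keyword list (trimmed for brevity, not the site's real file):

```python
job_detail = ('Oil & gas job description that will pass this simple Python code. '
              'Just add Drill and Reservoir')
oil_keywords = ['drill', 'gas', 'oil', 'oil & gas', 'reservoir', 'rig']

text = job_detail.lower()

# The for loop from earlier in the post
loop_count = 0
for kw in oil_keywords:
    loop_count += text.count(kw)

# The same count with built-ins: map calls text.count(kw) for every keyword,
# and sum adds up the results without an explicit Python-level loop body.
map_count = sum(map(text.count, oil_keywords))

print(loop_count, map_count)  # 5 5
```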

def OilKeyWords(job_details, job_title):
    # ...snip...
    oil_keywords = oil_keyword_test_file_text.split('\n') # Where the first example ended
    required_word_count = 4 #1
    description_count = sum(map(job_details.getText().lower().count, oil_keywords)) #2
    title_count = sum(map(job_title.lower().count, oil_keywords)) #3
    if (title_count + description_count) >= required_word_count: #4
        ...  # ...snip... rest of the job workflow, then save the job #4 continued
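Pulling the pieces together, a self-contained sketch of the whole check might look like the following. It assumes plain-text inputs (the site's version calls `getText()` on the description first) and passes the keyword list in rather than reading the file inside the function. `is_oil_job` and `REQUIRED_WORD_COUNT` are hypothetical names; the "4 or more keywords" threshold follows the steps above.

```python
REQUIRED_WORD_COUNT = 4  # a job needs at least this many keyword hits (title + description)


def is_oil_job(job_details, job_title, oil_keywords, required=REQUIRED_WORD_COUNT):
    """Return True if the title and description together contain enough oil keywords."""
    description_count = sum(map(job_details.lower().count, oil_keywords))
    title_count = sum(map(job_title.lower().count, oil_keywords))
    return title_count + description_count >= required


keywords = ['drill', 'gas', 'oil', 'oil & gas', 'reservoir']

print(is_oil_job('Oil & gas job description that will pass this simple Python code. '
                 'Just add Drill and Reservoir', 'Petroleum Engineer', keywords))  # True
print(is_oil_job('Extra virgin olive oil sales', 'Sales Assistant', keywords))      # False
```

Keeping the threshold as a parameter makes it easy to re-run the "3 or 4 keywords" experiment without editing the function.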


If I want this site to stay "the (subjectively) best oil job site on the internet", I need to make sure the jobs are 100% relevant to our industry without excluding genuine jobs. I'm hoping this code change improves the quality of the jobs in the niches I don't check that often, like 'sales'. Plus, it should free up some resources to make the site a little faster.

If you have any ideas for the site, leave a comment below or head to the contact page.

- Jared Hammond
Founder | Coder | Reservoir Engineer

P.S. Help the site out by giving it a LinkedIn share or follow (it helps me keep this site free).
