11 months, 3 weeks ago | Jared Hammond
Site Updates | Under the Hood | Coding
If you're somewhat interested in coding and how this site works, read ahead. If not, you've been warned: there is a fair bit of coding!
Well, get rid of
What is needed is a way to test for 'oil jobness': the degree to which a job description is about an oil job.
Each job has a job description and a job title. The solution doesn't need to be too exotic; all we are going to do is check the number of oil job keywords in each job before we save it to the database.
The four ingredients/steps we will need are:
Warning: this section is coding heavy.
In the Python code below we are solving step 1 - "A list of oil job keywords to test against"
# Comments in Python look like this!
def OilKeyWords(job_details, job_title):  #1
    folderPath = 'collection/jobsearch'  #2
    with open(folderPath + '/oil_keyword_test.txt') as oil_keyword_test_file:  #3
        oil_keyword_test_file_text = oil_keyword_test_file.read()  #3 continued
    oil_keywords = oil_keyword_test_file_text.split('\n')  #4
>>> print(oil_keywords)
['bakken', 'barnett', 'drill', 'drilling', 'eagle ford', 'exploration', 'fpso', 'frac', 'gas', 'geology', 'gom', 'gulf of mexico', 'hydrocarbon', 'lng', 'marcellus', 'midstream', 'north sea', 'o&g', 'offshore', 'oil', 'oil & gas', 'oil and gas', 'oil field', 'oil rig', 'oil well', 'oilfield', 'oilrig', 'oilwell', 'onshore', 'permian', 'pipeline', 'prms', 'reservoir', 'rig', 'seismic', 'wellsite']
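One thing to watch with split('\n'): if the keyword file ends with a newline or contains a blank line, the list picks up an empty string, and in Python counting an empty string returns the length of the text plus one, which would inflate the keyword count. Below is a minimal sketch of a more defensive loader, assuming one keyword per line. The name load_keywords and the throwaway demo file are made up for illustration and are not the site's actual code.

```python
import os
import tempfile

def load_keywords(path):
    """Read one keyword per line, skipping blank lines and stray whitespace."""
    with open(path) as keyword_file:
        return [line.strip().lower() for line in keyword_file if line.strip()]

# Demo with a throwaway file that has a blank line and a trailing newline
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'oil_keyword_test.txt')
    with open(demo_path, 'w') as demo_file:
        demo_file.write('drill\noil & gas\n\nreservoir\n')
    demo_keywords = load_keywords(demo_path)

print(demo_keywords)    # ['drill', 'oil & gas', 'reservoir']
print('abc'.count(''))  # 4, which is why an empty keyword must be filtered out
```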
Let's introduce a job description that will pass the test:
job_detail='Oil & gas job description that will pass this simple Python code. Just add Drill and Reservoir'
Simple check for a single keyword:
>>> job_detail.count('gas')  # Python in-built function 'count'
1
Great, the word gas is in the description once, and the count returned one. Now, slightly more complicated, loop over every keyword ("kw") in the "oil_keywords" list and count the total:
>>> kw_count = 0  # introduce a variable to keep track of the keyword count
>>> for kw in oil_keywords:  # a for loop. 'kw' represents each keyword in the 'oil_keywords' list
...     kw_count += job_detail.count(kw)  # the counter adds the count to itself
...
>>> kw_count
1
# 1! ... There are 5 keywords.
# That didn't count 'Oil', 'Oil & gas', 'Reservoir' or 'Drill'
# because the job description contains uppercase characters.
# Easily fixed below using the lower function:
>>> kw_count = 0  # reset the counter
>>> for kw in oil_keywords:
...     kw_count += job_detail.lower().count(kw)  # make the text lower case
...     kw_count += job_title.lower().count(kw)  # also check the imaginary job title
...
>>> kw_count
5  # better
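For anyone who wants to run the loop outside the interpreter, here is a minimal sketch that wraps it in a function. The name count_keywords, the shortened keyword list and the job title are made up for illustration. Note that overlapping keywords are counted separately, so 'Oil & gas' scores for 'oil', 'gas' and 'oil & gas'.

```python
def count_keywords(text, keywords):
    """Total number of keyword occurrences in text, ignoring case."""
    lowered = text.lower()  # lower-case once instead of once per keyword
    total = 0
    for kw in keywords:
        total += lowered.count(kw)
    return total

# A short stand-in for the full keyword list
sample_keywords = ['drill', 'gas', 'oil', 'oil & gas', 'reservoir', 'rig']
job_detail = ('Oil & gas job description that will pass this simple Python code. '
              'Just add Drill and Reservoir')
job_title = 'Reservoir Engineer'  # made-up title for illustration

print(count_keywords(job_detail, sample_keywords))  # 5
print(count_keywords(job_title, sample_keywords))   # 1
```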
With the building blocks above, we now have a simple, reusable check we can run on jobs to confirm they are relevant oil industry jobs and not olive oil jobs.
I've gone ahead and used a more efficient way of counting the keywords, as the real website analyzes thousands of jobs an hour... and as everyone always says, "the servers ain't free".
def OilKeyWords(job_details, job_title):
    #
    # ...snip...
    oil_keywords = oil_keyword_test_file_text.split('\n')  # Where the first example ended
    required_word_count = 4  #1
    description_count = sum(map(job_details.getText().lower().count, oil_keywords))  #2
    title_count = sum(map(job_title.lower().count, oil_keywords))  #3
    if (title_count + description_count) > required_word_count:  #4
        # ...snip...
        job.save()  #4 continued
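Since the real function has parts snipped out, here is a rough, self-contained sketch of the same threshold check that can be run on its own. It assumes the description and title are plain strings (the real code calls getText() on the description first), and the name is_oil_job, the shortened keyword list and the two sample jobs are invented for illustration.

```python
REQUIRED_WORD_COUNT = 4  # same threshold as the snippet above

def is_oil_job(description, title, keywords, required=REQUIRED_WORD_COUNT):
    """True when title and description together contain more than `required` keyword hits."""
    description_count = sum(map(description.lower().count, keywords))
    title_count = sum(map(title.lower().count, keywords))
    return (title_count + description_count) > required

# A short stand-in for the full keyword list
sample_keywords = ['drill', 'gas', 'oil', 'oil & gas', 'reservoir', 'rig']

oil_description = 'Oil & gas job description. Just add Drill and Reservoir'
olive_description = 'Sell premium olive oil to restaurants and delis'

print(is_oil_job(oil_description, 'Reservoir Engineer', sample_keywords))     # True
print(is_oil_job(olive_description, 'Olive Oil Sales Rep', sample_keywords))  # False
```

The olive oil job still scores a couple of hits for 'oil', which is why the check uses a threshold rather than looking for a single keyword.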
If I want this site to stay as "the (subjectively) best oil job site on the internet", I need to make sure the jobs are 100% relevant to our industry while also not excluding genuine jobs. I'm hoping this code change improves the quality of the jobs in the niches I don't check that often, like 'sales'. Plus, it should free up some resources to make the site a little bit faster.
If you have any ideas for the site, leave a comment below or head to the contact page.
- Jared Hammond
Founder | Coder | Reservoir Engineer