Life at Expedia Group Blog
Machine Learning at Expedia Group: lessons learnt and more
In the first few weeks of the year, Aida Mashkouri Najafi and I (Katrina Soderquest) presented at the January session of the London Women in Machine Learning and Data Science Meetup group. We are both Lead Data Scientists in the Expedia Group [EG] Data Science Centre of Excellence and were previously on the Hotels.com data science team. Between us, we presented a range of thoughts on careers in general and on opportunities and projects within EG.
A light-hearted yet insightful introduction from Alice Jacques (Acting Director of Data Science, Expedia Partner Solutions [EPS]) discussed how her parents’ habit of sending her every article they saw containing the term ‘machine learning’ had led to an interesting revelation.
Back in the very early days of ‘data science’, there was a 50:50 gender split. Now, the field is 27% women, and Alice discussed some of the things that EPS specifically, and EG more broadly, are doing to combat this – for example, removing gendered language from job descriptions, since such language tends to encourage more male applicants.1
For my presentation, entitled ‘9 lessons: thoughts from the data science coalface’, I aimed to cover a range of practical, everyday thoughts on getting into data science and doing it as a job. To do so, I spoke about the projects that I have worked on throughout my career, both at EG and before, with a focus on nine lessons learnt:
1. Sometimes simple works – you can often beat low baselines quickly with nothing more than an ‘out-of-the-box’ random forest.
2. Find opportunities to upskill.
3. Ethics – can you justify the aims of your work, the side-effects of your model and the data you are using to yourself, your friends and your family?
4. Don’t forget the (statistics) basics – p-values, hypothesis testing, correcting for multiple testing.
5. Do you want to work in ‘big data’? Don’t get carried away – not all problems are ‘big data problems’, but for those that are, you’ll probably need Spark and an understanding of cloud services.
6. What are your strengths? Data science is a broad field. Some practitioners flourish on the sheer level of organisation needed to get a basic algorithm live in a place where no machine learning has gone before; others thrive on using a heavy-duty statistical background to make an already refined algorithm even better.
7. MLeap is great (https://github.com/combust/mleap).
8. The field moves quickly – deep learning, Bayesian techniques and multi-objective optimisation have been some of the most noticeably growing areas over the past few years.
9. It’s called data science – evidence and experimentation must be central to your work.
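Lesson 4 mentions correcting for multiple testing. As a minimal sketch of what that can look like in practice (not code from the talk), the snippet below applies two standard corrections, Bonferroni and Benjamini-Hochberg, to a set of p-values. The function names, the p-values and the significance level of 0.05 are all illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure, controlling the false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose p-value is at most (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five separate tests (made up for illustration).
p_values = [0.001, 0.012, 0.025, 0.045, 0.20]

print(bonferroni(p_values))          # [True, False, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False, False]
```

Without any correction, four of these five tests would pass at the 0.05 level. Bonferroni is the more conservative of the two; Benjamini-Hochberg trades some strictness for power, which is often a reasonable choice when many tests are run at once.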
With an audience that included tech professionals looking to move into data science, people doing data-related jobs outside of data science and data scientists themselves, questions ranged from how to get more practice if you’re currently outside the field (hint – Kaggle is very good for this) to how to balance career development with getting the job done (no easy answer, although the time it takes to train a model can be useful for pursuing non-project career goals).
In the second presentation, Aida used the opportunity to present work that the Hotels.com data science team, in collaboration with partners across EG, have undertaken in computer vision. Image classification, image objective quality analysis and image aesthetic scoring are exciting and technically challenging projects with demonstrated potential to improve customer experience and increase both conversion and profitability on the website.
After an overview of projects, tools and tech stacks, Aida focused on her work on image classification. This work is hugely important given issues of mislabelling and image manipulation within a set of tens of millions of photos, where manual verification of each image would be ruinously expensive and time-consuming. Such work has its issues – what if a photo can be classified into multiple categories, or belongs to a category that doesn’t exist? Aida talked through how they’ve approached this work – using MTurk to produce an evaluation data set and assessing model performance on photos with different levels of confidence for classification from manual annotation. She also discussed what next steps might look like in terms of the interplay between algorithmic scoring and active learning. She then answered a barrage of exciting questions at the end of the presentation on the practicalities of working with the deep learning algorithms that have become so central to computer vision work.
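To make the evaluation idea concrete, here is a minimal sketch of how accuracy might be broken down by annotation confidence. It is an illustration only, not the team’s actual pipeline: the function name, the record layout, the example labels and the 0.8 agreement threshold are all assumptions.

```python
from collections import defaultdict


def accuracy_by_agreement(records, threshold=0.8):
    """Compute model accuracy separately for high- and low-agreement photos.

    Each record is (predicted_label, annotated_label, agreement), where
    agreement is the fraction of annotators who chose annotated_label.
    The threshold separating "high" from "low" agreement is an assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, annotated, agreement in records:
        bucket = "high" if agreement >= threshold else "low"
        total[bucket] += 1
        if predicted == annotated:
            correct[bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


# Made-up evaluation records: (model prediction, majority annotation, agreement).
records = [
    ("bathroom", "bathroom", 1.0),
    ("bedroom", "bedroom", 1.0),
    ("lobby", "lobby", 0.8),
    ("pool", "beach", 0.6),
    ("bedroom", "bedroom", 0.6),
    ("lobby", "restaurant", 0.4),
]

print(accuracy_by_agreement(records))
```

Splitting the results this way helps separate genuine model errors from photos that human annotators themselves found ambiguous.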
And then we were done. The meeting was judged a general success, and congratulations and thanks are due to everyone who helped organise it.
If you’re passionate about solving problems and building creative solutions by implementing the newest technology and approaches, have a look at our Technology career path. Or why not sign up to be a part of our careers community?