Diving into Data Science: My Last 2 Weeks of Attending Conference, Meetup & Workshop

It's been a somewhat turbulent but interesting last couple of weeks. In my quest for education and knowledge in the data science field, I attended ODSC West 2017 in San Francisco from November 4th through the 6th, and two local data science meet-ups, one a Discussion sub-group and the other an Applied Data Science sub-group of the Portland Data Science group.

I'll start by discussing some of the talks I attended at the ODSC conference. I'll follow up with more posts on that conference and the meet-ups I've attended in the past week.

ODSC West 2017

I spread my time at ODSC between technical and non-technical talks and picked up some good tips on how to prepare for a data science career. As is usual at a conference, the best part was the informal conversations over meals and drinks and between sessions.

First the talks.

Computer Vision

I was most impressed with two talks on computer vision.

Building an Object Detection Toolkit in TensorFlow

This talk, given by Alan Descoins and Javier Rey of Tryolabs, was about Luminoth, a computer vision toolkit recently open-sourced by Tryolabs. The talk featured a nicely illustrated slide presentation of the Faster R-CNN (Region-based Convolutional Neural Network) design employed in Luminoth. Besides the great slides and explanation, I was also impressed with the enthusiasm that Alan and Javier had for their new open source creation. The day before, I had the pleasure of an informal lunch conversation with the Tryolabs team. They're a great group of people, and their excitement about Luminoth was contagious.

My understanding of neural networks in general was very limited before hearing that talk, but my interest was piqued. The talk was on Saturday, and later that evening, as I was having dinner at one of the airport restaurants, I met another conference attendee who happened to have recently written a lucid blog post about convolutional neural networks. In the Machine Learning course I'm currently taking on Udemy, neural networks are a little further down the road, but I can't wait to get there and master that material. I could just dive in, since I'm used to that style of learning, but I want to get data munging, visualization and basic machine learning well understood first.

Artificial Intelligence for Sustainability

In this talk, Stefano Ermon, an Assistant Professor of Computer Science at Stanford, applied computer vision to the problem of extreme poverty. Using an R-CNN, he showed how extreme poverty can be identified using analysis of satellite imagery. The neural network was necessary in order to map the rich visual data in satellite images to a set of output features which could be used as proxies to eventually derive poverty measures.

These methods produce data that in some countries is not available in any other form, and that has led Ermon to apply them to other human problems such as hunger and food security. His research site points to papers and talks for further exploration.
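To make the "features as proxies" step in this section a little more concrete for myself, here is a minimal sketch of the last stage only: fitting a simple regression from per-region image features to a survey-based measure. Everything in it is an assumption for illustration. The features are random numbers standing in for neural network outputs, the target is synthetic, and ridge regression is my own choice of model, not necessarily what Ermon's group uses.

```python
import numpy as np


def fit_ridge(features, targets, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    feature_mean = features.mean(axis=0)
    target_mean = targets.mean()
    centered_x = features - feature_mean
    centered_y = targets - target_mean
    # Regularized normal equations: (X^T X + alpha * I) w = X^T y
    gram = centered_x.T @ centered_x + alpha * np.eye(features.shape[1])
    weights = np.linalg.solve(gram, centered_x.T @ centered_y)
    intercept = target_mean - feature_mean @ weights
    return weights, intercept


def predict(features, weights, intercept):
    """Apply the fitted linear model to a matrix of feature rows."""
    return features @ weights + intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: 200 regions, 16 features per region.
# In a real pipeline these rows would come from a network run on satellite tiles.
rng = np.random.default_rng(0)
n_regions, n_features = 200, 16
image_features = rng.normal(size=(n_regions, n_features))
true_weights = rng.normal(size=n_features)
survey_measure = image_features @ true_weights + rng.normal(scale=0.1, size=n_regions)

# Hold out the last 50 regions to check how well the proxy generalizes.
train, test = slice(0, 150), slice(150, 200)
weights, intercept = fit_ridge(image_features[train], survey_measure[train])
test_r2 = r_squared(
    survey_measure[test], predict(image_features[test], weights, intercept)
)
print(f"held-out R^2: {test_r2:.3f}")
```

The held-out split is the part I'd keep in any real version: the whole point of a proxy is that it has to work in places where no survey data exists.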

Tips for Landing a Data Science Job

I attended a great talk by a data scientist at GitHub on this topic.

Advice for New and Junior Data Scientists

Hamel Husain, a Senior Data Scientist at GitHub, offered a number of key insights in this talk.

General Data Science Job Insights

  • There are a lot of jobs with the Data Scientist title. He showed a slide that referenced a number of employers, indicating the types of specialization typically practiced there or the type of preparation Hamel recommends. This list is not complete but was all I could record at the time (videos of the presentations are due to be made available in several weeks).

    • H2O, DataRobot: be good at competitive data science (e.g., Kaggle)
    • Yahoo, Indeed: natural language processing (NLP)
    • Facebook: NLP, machine learning research
    • drive.ai, Cruise: computer vision
    • Slack: SQL, dashboards, business intuition
    • Lyft: econometrics, A/B testing, statistical modeling
  • Be clear about your goals with regard to the type of work you want to do. Just expressing interest in deep learning is too vague.

  • Know what you don't know and try to limit your "unknown unknowns."

  • Be wary of people overselling the role to you. Do not become a trophy data scientist.

  • MOOCs are for your own benefit only. Don't rely on them as a credential on your resume. You should show off your projects. It's OK to list credentials, just don't expect them to be enough to get you a job. Hamel said that hiring managers see certification credentials without supporting work as an indication of a junior-level person. Maybe that's enough to get some jobs, but often it's not enough on its own.

  • If you are looking at which bootcamps to attend, first check out the LinkedIn pages of data scientists at companies you want to work at, and see where they went.

  • Kaggle is useful for developing specific skills like model evaluation, spotting data leakage, feature engineering and for learning design patterns. Hamel pointed out that you should not focus completely on Kaggle and ignore other skills that may be useful or required for jobs.

Tips for figuring out what to learn

  • Conferences: go to talks where you learn something you cannot readily learn on the web.
  • Twitter: Useful for finding out a lot from the main contributors to data science projects.
  • Want to learn list: Figure out what you don't know and keep a list of things you want to learn.

Hiring Traps

Hamel dived into a few hiring traps that we should look out for.

  1. Trophy Data Scientist

    • Companies want to hire a data scientist but do not know what to do with them. This is a reality, he says.
    • Watch out for buzzwords. I think the presenter's point was that buzzwords without substantive commentary on the problems to be solved are a warning sign.
    • They describe the techniques they are using but not the problems that they are solving. This is very common.
    • Ask about their infrastructure/tech stack to understand the state of their data. The employer might not be far enough up the data science pyramid to need a data scientist.
  2. Overselling the role

  3. They really need a data engineer.

When you finally get an offer

  • They are lucky to have found you. It's not just that you have finally been offered a job after months of searching. They have also been looking for a while and are truly lucky to find you!
  • Negotiate: Don't sabotage a job that is otherwise amazing in all other dimensions because you are beaten down by the process of searching for a job and interviewing. Employers want you to be happy at the job so negotiate with that in mind.