Introduction
Hello aspiring data scientists, data analysts and all-around data enthusiasts. If you are interested in understanding more about potential pathways and careers in data, statistics, or software development for data-related fields, then I hope you find these career tips and tricks helpful for thinking about your future. I’ll preface this by saying that there is an immense variety of pathways and opportunities for kicking off a career in data science, so whilst I’ll try to provide multiple perspectives from my time working in data-related roles, my experience is very much just one route of many. Before I go any further, I should introduce myself.
My name is Hayden Sansum and I’ve been working or studying in Data Science for almost 10 years at this point. I grew up in a rural area in the United Kingdom and spent much of my childhood tinkering and building, starting from Lego and K’Nex and working my way up to lawn mowers and computers. My dad started his career as an Apprentice Machinist and retired as a Director of Mechanical Engineering and, given this influence, I had always assumed that I would become an engineer myself.
When it came to choosing subjects in school, I leaned heavily into mathematics and the sciences, which I thought would help me become the engineer I wanted to be. In 2010, I was lucky to be accepted into the University of Bath to study Integrated Mechanical and Electrical Engineering.
After finally completing my five-year degree, including a full year of internship work, I realized that engineering might not actually be the career for me. It had been my entire goal up until that point, so after graduating I was left at a bit of a crossroads. I spent some time reflecting on the elements of engineering I enjoyed: the experimentation, the system building, and one specific course I took on developing AI algorithms in Python. That one course had really inspired me about the potential of machine learning and AI. I focused on applying for roles where I could leverage these specific skills and continue to develop them further, and that led me to Data Science.
Fast forward a fair few years and a master's degree in data science later, and I am now working at Day Zero Diagnostics, a biotech start-up based in Boston where I leverage machine learning to solve antibiotic resistance. As a Senior Data Scientist, I work across many different areas and my daily routine can vary hugely (the benefits of working at a small company!). I split my time between software engineering, where I build algorithms and tools; designing statistical analyses, where I try to develop insights; and machine learning research, where I investigate methods to improve the performance of ML models. Data Science is a broad field and there are so many different routes to making an impact, all of which makes it an exciting and dynamic space to build a career!
Data Science Fellow career options
As I mentioned, there is no single path to a career in Data Science, and depending on your personal interests and ways of working you may find that one area resonates with you more than another. Below I’ll try to lay out some of the most commonly defined roles where you could look to focus, and the differences between them. There is rarely a consensus on the boundaries between these roles, and different companies may have their own definitions of what each of these career paths means. Understanding what you’re looking for in a role, and knowing which questions to ask of an employer, will be important skills moving forward.
A Data Scientist is a generalist with an understanding of statistics, machine learning and software development best practices. A Data Scientist will review and analyze data and build models and tools for machine learning. Depending on the needs of the role, a Data Scientist may skew towards any of the areas listed below.
More specific in focus, a Data Analyst will spend more time generating insights from data, leaning more into data processing, data cleaning, statistical analyses and reporting results to stakeholders.
The Machine Learning Scientist role is all about understanding and researching algorithms and approaches for advancing and improving machine learning models. Projects are often longer term and more academic, requiring reviews of existing papers and literature. The most common pathway to becoming a Machine Learning Scientist is via a PhD.
Machine Learning Engineers can be thought of as more applied machine learning scientists. More similar in background to a Data Scientist, a Machine Learning Engineer will focus almost entirely on building, developing, and implementing machine learning models. This role has a large software engineering component as efficient implementation is vital here.
Natural language processing (NLP) has become such a major branch of machine learning that many career opportunities have opened up in this space. NLP researchers or engineers will focus specifically on understanding patterns in text/sequence data and developing models for speech processing or translation. This is an example of a specialist focus within Data Science and is one of the most common, although others exist (e.g. Computer Vision).
A Data Engineer builds the data storage and processing architectures that allow other data-related roles to access and utilize the data. Data Engineers are very hands-on with raw data and with building software systems. Like Machine Learning Engineers, they skew towards software development.
Data Science Fellow skills
What are the main hard skills you use on a daily basis in your current job?
Knowledge of Python is considered the most fundamental skill needed in Data Science, and it is something I personally use day in and day out. There are other programming languages (e.g. R, JavaScript, C) and having in-depth knowledge of any language is extremely helpful, but Python is considered the most standard and most valued in the industry. I developed a foundational knowledge of various programming languages during my engineering degree, including Python, and converted those fundamentals into expertise gradually over time, through continued use. The more experience you can gain using Python to build tools, the better. I would recommend focusing on building up software best practices, such as writing clean, documented, and tested code, and learning common libraries (pandas, NumPy, scikit-learn), which are what I spend a lot of my time leveraging.
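As a rough illustration of what "clean, documented, and tested" can look like in practice, here is a minimal sketch using only the Python standard library. The function `normalize` and its test are hypothetical examples made up for this illustration; they are not taken from any real project.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0.

    Raises ValueError when the input is empty or when all values are equal,
    because the rescaling is undefined in both cases.
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("values must not all be equal")
    return [(v - low) / (high - low) for v in values]


def test_normalize() -> None:
    """A small unit test; a runner such as pytest would discover it by its name."""
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    try:
        normalize([])
    except ValueError:
        pass  # expected: empty input is rejected
    else:
        raise AssertionError("expected ValueError for empty input")


if __name__ == "__main__":
    test_normalize()
    print("all tests passed")
```

The docstring states what the function does and when it fails, and the test pins down both the normal case and one error case, so a later change that breaks either is caught quickly.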
Working with data is the foundation of any Data Science role and being able to correctly interpret and make valid inferences from data is vital. Statistics is a deep field but the elements I lean on most regularly are fundamentals of probability, distributions of data and hypothesis testing. I studied statistics formally during my Master of Data Science and I would highly recommend taking a few courses in statistics (there are also many great University courses in Statistics available online).
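To make hypothesis testing a little more concrete, the sketch below implements one simple approach, a two-sided permutation test for a difference in means, using only the standard library. The function name `permutation_test`, its parameters, and the sample numbers are all assumptions chosen for illustration; in practice a statistics library would usually be used instead.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b.

    The p-value is the fraction of random relabelings of the pooled data
    whose absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


if __name__ == "__main__":
    # Hypothetical measurements from two groups (made-up numbers).
    group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
    group_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
    print("estimated p-value:", permutation_test(group_a, group_b))
```

A small p-value suggests that a difference this large would rarely appear if the group labels were irrelevant, which is the core idea behind most hypothesis tests.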
Relating to software best practices, I have been using Git and GitHub to collaborate with others on projects throughout my career. Git allows me to track and version my own work, so I don’t ever have to worry about losing progress, and then the GitHub platform allows me to share that work with others. GitHub has become ubiquitous in the Data Science community and many workflows such as planning and tracking work start with GitHub. This is a skill I’ve picked up gradually through constant use, but there are great, interactive resources online that I used to boost my understanding.
Throughout my career I have utilized many different ML algorithms and models (e.g., unsupervised clustering/outlier detection, linear/logistic regression, gradient boosting, and neural networks). I developed my knowledge of each of these algorithms through a combination of academic research and on-the-job training, but there is a shared set of skills that I utilize all the time regardless of the algorithm. These fundamentals include approaches like data preprocessing, cross-validation, hyperparameter tuning and assessing uncertainty.
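Two of these fundamentals, cross-validation and hyperparameter tuning, can be sketched in a few lines of plain Python. The example below is a simplified illustration under stated assumptions: the model is a toy one-dimensional k-nearest-neighbours regressor, the data is synthetic, and every function name here is hypothetical. Libraries such as scikit-learn provide production-ready equivalents (for example `cross_val_score` and `GridSearchCV`).

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and split them into n_folds roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """Predict the mean target of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean([train_y[i] for i in nearest])


def cross_val_mse(xs, ys, k, n_folds=5, seed=0):
    """Average held-out mean squared error of the k-NN regressor across the folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        # Train on everything outside the current fold.
        train_x = [xs[i] for i in range(len(xs)) if i not in held]
        train_y = [ys[i] for i in range(len(xs)) if i not in held]
        errors = [(knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
                  for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


def tune_k(xs, ys, candidates, n_folds=5, seed=0):
    """Grid search: return the candidate k with the lowest cross-validated MSE."""
    scores = {k: cross_val_mse(xs, ys, k, n_folds, seed) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [i / 10 for i in range(60)]
    ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]  # synthetic noisy line
    best_k, scores = tune_k(xs, ys, candidates=[1, 3, 5, 9])
    print("best k:", best_k)
    print("cross-validated MSE per k:", scores)
```

The key design point is that each candidate value of `k` is scored only on data the model did not see during fitting, which gives a fairer estimate of how it would perform on new data than scoring on the training set.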
What are the main soft skills you use on a daily basis in your current job?
Being able to share findings and insights with various stakeholders (from technical colleagues to non-technical senior leadership) is a vital part of being a Data Scientist. In my current role, I discuss my analyses and findings from my work with my team daily and am required to summarize and present my findings to the company every month. I find that being able to be concise and compelling when discussing data is a highly valuable skill. There is not really a shortcut here aside from practice, so I would advise putting yourself forward to talk through and present your work to others whenever you can throughout your career.
It is all too easy in Data Science and analytical fields to find yourself diving into a rabbit hole. Often there are a multitude of viable solutions and opportunities for exploration, and it can be difficult to know which ones to pursue. Being able to stay objective and critical when approaching problems is a difficult but valuable skill to have. In my opinion, the best way to develop your problem solving and critical thinking skills is to be open in talking through your ideas and thoughts with colleagues. I find it helpful to organize and write down my thoughts before asking for feedback, and then remain open in discussing the benefits and drawbacks of each approach.
Even in my current role at a startup, I rarely work completely independently. On the technical side, I often review others' code and have others review my own. This process (often performed using collaboration and version control tools such as GitHub) is one of the fastest ways to learn and pick up skills when working as a Data Scientist. It can be made even more effective by fostering a respectful and open working environment. Considering colleagues' points of view and being receptive to feedback are skills which can be practiced and improved over time. I try to remain conscious of this and engage with others as often as possible.
Hayden’s personal path
Tell us about your personal journey in Data Science Fellow:
When I graduated from my master's in data science, it was my first time searching for roles in the US, after moving here from the UK. Alongside the new working environment, I was also facing uncertainty regarding my visa, having applied for OPT at a time when the government was rethinking its approach to processing. There was a great deal of uncertainty about when I would be allowed to start working, and that, combined with the difficulty of getting through the initial visa filter that many (especially larger) companies have, meant I did not find much traction when applying for roles online. I cannot remember exactly how many applications and cover letters I ended up submitting, but it must have been at least twenty.
One area where I was very proactive during my degree, however, was visiting career fairs and attending company events. I also reached out to the careers department to attend specific events for international students and to get feedback on my resume and practice interviewing. It ended up being these events which proved most helpful to my own career search.
During a career fair, I had a really engaging conversation with the person who is now my manager about some of the projects I had been working on and the opportunities at Day Zero Diagnostics. I had many such conversations, but this one led to the hiring manager reaching out directly and asking me if I would be interested in applying for a role. This direct contact helped me avoid the online application filters and get a foot in the door. The role that was offered was more junior than I was searching for, but I felt I should be flexible in exploring the opportunity.
In the end I managed to negotiate a role which matched my experience and skills. Landing a role in Data Science requires patience and perseverance. I was fortunate that a hiring manager contacted me directly, but I would recommend being proactive in searching out opportunities and speaking to companies as early as possible to maximize your chances.
What would you tell your younger self about building your current career?
I have two main pieces of advice for my younger self. The first would be to explore early and often, and not to be afraid to go outside of your comfort zone. I was hyper-focused on building my engineering skills, and it wasn’t until my very final year (of a five-year degree) that I found a course in AI and started to realize that a career in Data Science could be for me. I really wish I had explored a wider variety of courses and applications, which I feel would have helped me understand earlier what I enjoyed and what I wanted from my career. Another example of this is that I currently work in the biotech space on bacterial genomics, a field in which I had absolutely no prior experience. Having worked in this space for a few years, I’ve realized how fascinating it can be. There is no substitute for trying out a new space or idea. Joining hackathons or other short-term events can be a fantastic way to get exposure to a problem space or domain and see if the topics resonate with you.
Secondly, I would tell myself not to be afraid of finding a niche. This may sound contradictory to my first piece of advice, but what I really mean is that when you do find something that resonates with you and that you enjoy, have the confidence to really explore and dive into it. Often the moments where I have had success when finding roles, or been impressed by others, have come when talking about a topic or project passionately and in detail. Even if that topic is not a direct fit for a job or role, it demonstrates an ability and desire to learn about an area, and the enthusiasm and level of detail that naturally come from talking about something you enjoy can be helpful for interviewing. Data Science is a broad field, and whilst it's important to understand the fundamentals, it can be a great idea to pick the area which is most interesting to you and focus extra time there.
Final thoughts & tips
I hope, as you’ve been reading this article, that you have started to put together a picture of what it might be like to be a Data Scientist. Data Science is a field which really benefits from practice and application, so whenever you have an opportunity to work on an applied problem, I would say go for it! Hackathons and other data challenges and events can be a great way to build up teamwork skills and develop a portfolio of analyses or machine learning projects. They can be time-consuming, however, so another option is to look for courses with a large group project component to get that same experience during the semester. As I mentioned before, I would also highly recommend exploring your options for jobs as early as possible and engaging with your careers department for advice and support in finding a role. And whilst it can be daunting to have such a wide range of avenues to explore for a career in Data Science, just remember that there will be a role out there which resonates with you, even if it takes time for you to find it.
Resources to dig in more
Towards Data Science (a blog hosted on Medium)
This blog has a huge collection of posts, from technical breakdowns of machine learning algorithms to high-level career articles. Being blog posts, they can be very subjective, but they are often helpful, including the career-focused articles.
Usages for data analytics in the UK government
From one of my past roles, and to give you an example of how Data Science is applicable to many different fields, this blog highlights some of the usages for data analytics in the UK government.
Statistics 110: Probability
Learn the fundamentals of probability and statistics from this free online course (it can be challenging but is an extremely helpful foundation for statistical understanding).
A Modern Introduction to Probability and Statistics
An alternative option for a comprehensive overview of probability and statistics.
Learn Git Branching
An interactive Git tutorial that helps you visualize the underlying mechanisms.
An Introduction to Statistical Learning
A complete guide to the fundamentals of Machine Learning. (This is a very comprehensive resource, so I wouldn’t necessarily recommend reading it cover to cover, but it’s a great reference when trying to understand a topic.)