Introduction

Hello fellow data science adventurers, my name is Joy, a bookworm from China. From a young age, I was fascinated by finding the underlying causes of phenomena, so I spent a significant amount of time in libraries. I read books from ancient China that depicted people living at the bottom of society during the Qing Dynasty and stories from India about lives changed by globalization. These stories, spanning various times and places, gave me a window into worlds vastly different from my own—a quiet and ordinary small town in southern China where most young people prefer to stay because of its comfortable and easy lifestyle. So, starting in college, I decided to study abroad in the US, hoping to see a world of significant differences.

Boston was my first stop, where I began my studies at the University of Massachusetts. At first, I loved how the economics professor connected everyday behavior with classical theories. Realizing how closely our daily lives follow these old theories made my heart pound. But the more I understood these theories, the more I felt lost. Classical economic theories assume that all people are rational, but no living human is fully rational. Oftentimes, we are like animals, controlled by our emotions and inner greed. I started to feel disconnected from the real world, and my earlier excitement was long gone.

Two years into my US journey, I transferred to the University of Michigan, where I was exposed to another side of the world—the world of practical engineering and the latest technological applications. I took courses in programming and became familiar with popular technology terms like blockchain and bitcoin. For my third year, I had the chance to study at the London School of Economics, where I was exposed to the world of big data. I still remember how excited I was at LSE, understanding the coming world centered around one word: DATA, the new gold of the future.

Right after LSE, I decided to dive into this fascinating world and started my master's in data science. During my master's program, I joined various data science projects with real companies. I got to apply my data skills in industries like manufacturing, marketing, and supply chain. Even though each project was unique, they all shared a common objective: understanding customers, employees, suppliers, and marketers. All of them are people, irrational people. This same goal persisted when I moved into healthcare to work as a data engineer. What we do day in and day out is try to understand patients, providers, and payers better. Years of learning and practicing data science unexpectedly gave me an answer to the confusion and disappointment I had felt while studying economics. It turned out that economists in the past had to assume all economic actors were rational because they lacked the tools to understand these actors better. In the modern world, however, we have become very good at understanding people, irrational people, using the power of data. It is sometimes unsettling to realize that companies like TikTok and Google may understand us better than our families and friends do. But this is the data-driven era that the info session at LSE described to me. I hope you, my fellow data science enthusiasts, will join me in this adventure of understanding the world and its people through data, the new gold of our era.

Data Science Fellow career options

Embarking on a career in data science means becoming proficient in a diverse set of skills, including programming, statistical analysis, and machine learning. It requires a strong foundation in both theoretical knowledge and practical application, and a passion for continuous learning and adaptation in a rapidly evolving field. Whether you are motivated by the challenge of decoding massive datasets, the potential to drive business strategy, or the opportunity to contribute to advancements in technology and society, data science offers a dynamic and rewarding career path. The following section lists a few common data science career paths.

1. Data Analyst
2. Data Scientist
3. Data Engineer
4. Data Architect
5. Machine Learning

Data Science Fellow skills

What are the main hard skills you use on a daily basis in your current job?

1. SQL and Database Management

During my internship at a marketing consulting firm, I did not expect that the first line of code I would write for developing a customer segmentation model would be in SQL. This was even before I got to delve into in-depth data science theories. I had to use SQL to access the actual data I needed for my work. To hone my SQL skills, I took a short course during my master's program and later spent countless hours reading books and online resources on SQL and database management. SQL is the bread and butter for anyone working in the field of data science. It is used for data extraction and quick data transformation, and a solid understanding of SQL fundamentals will go a long way in refining your day-to-day workflow.
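To make the "extraction and quick transformation" point concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, and the spend thresholds are all made up for illustration; they are not taken from the project described above.

```python
import sqlite3

# Hypothetical table and column names, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 20.0), (1, 35.0), (2, 5.0), (3, 120.0), (3, 80.0);
""")

# Extraction plus a quick transformation in one query:
# one row per customer with order count, total spend, and a coarse spend tier.
# The tier thresholds (50 and 100) are arbitrary assumptions.
query = """
    SELECT customer_id,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total_spend,
           CASE WHEN SUM(amount) >= 100 THEN 'high'
                WHEN SUM(amount) >= 50  THEN 'medium'
                ELSE 'low'
           END         AS spend_tier
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

A table like this is often the starting point for a segmentation model: the aggregation happens in the database, and only the compact per-customer summary is pulled into Python.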

2. Data and Data Structures

In the data science world, we deal with diverse types of data. This includes structured data like CSV files and data queried from SQL databases, as well as unstructured data like text and logs. In addition to specific data types, a foundational understanding of data structures is essential for troubleshooting and improving solutions. For example, in a production-level data science solution, saving time and cost is crucial. Understanding data structures can help; for instance, a linked list can insert or delete an element in constant time once the position is known, whereas an array has to shift every element that follows, although the linked list pays for this with extra memory for its pointers. I had a tough time understanding and remembering these details during my engineering courses, but these key theories have turned out to be invaluable in my everyday work in data manipulation and processing.
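The insertion trade-off can be seen with a small timing sketch. Python has no built-in linked list, so collections.deque stands in here as a linked-style structure with constant-time insertion at either end; it is an approximation for illustration, not a textbook linked list. The function names and the value of N are arbitrary choices.

```python
from collections import deque
from timeit import timeit

N = 20_000  # number of front insertions; an arbitrary size for the demo


def fill_front_list(n):
    """Insert n items at the front of an array-backed list."""
    items = []
    for i in range(n):
        items.insert(0, i)  # every existing element is shifted: O(n) per insert
    return items


def fill_front_deque(n):
    """Insert n items at the front of a deque."""
    items = deque()
    for i in range(n):
        items.appendleft(i)  # no shifting needed: O(1) per insert
    return items


list_seconds = timeit(lambda: fill_front_list(N), number=1)
deque_seconds = timeit(lambda: fill_front_deque(N), number=1)
print(f"list.insert(0, x):   {list_seconds:.4f} s")
print(f"deque.appendleft(x): {deque_seconds:.4f} s")
```

Both functions produce the same sequence of values; only the cost of building it differs, and the gap widens as N grows. Exact timings depend on the machine, so none are quoted here.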

3. Cloud Platform

During my daily work, I check and monitor various AWS services to ensure the health of data pipelines. For example, we store data files in S3, use Lambda functions to convert data, and put data into DynamoDB and OpenSearch. Do all these services sound foreign to you? Don’t worry, I had the same feeling when I first started my data science career. Back in college, we rarely had the chance to work with cloud platforms since we were dealing with toy datasets that were no more than a few thousand rows. It would have been inefficient and costly to use cloud services when we weren't dealing with terabyte-level data. However, along the way, I started to pick up knowledge about cloud platforms by reading technical blogs, taking online courses, and learning by doing.
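If the S3-to-Lambda step sounds abstract, the sketch below shows the kind of logic such a function might contain, using only the Python standard library so it runs without an AWS account. The function names, bucket name, file name, and CSV columns are hypothetical, and the actual calls that read from S3 or write to DynamoDB are deliberately left out. The one AWS-specific detail assumed is the shape of the S3 event notification, where object keys arrive URL-encoded.

```python
import csv
import io
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in the event, e.g. spaces appear as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def csv_to_items(csv_text):
    """Convert CSV text into one dict per row, ready to load into a key-value store."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical event and file content, for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/daily+export.csv"}}}
    ]
}
print(s3_objects_from_event(sample_event))
print(csv_to_items("id,name\n1,Ada\n2,Lin\n"))
```

In a real pipeline, a handler would fetch each listed object, run a conversion like csv_to_items on its contents, and write the resulting items to the target store.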

4. Linux/Unix Proficiency

Besides SQL and Python, another tool I use frequently in my everyday work is the Linux/Unix command line, along with shell scripting. Often, we deal with software in its early stages, where there is no fancy user interface to click or drag to interact with. The classic command line and shell scripts become handy and important. I use shell scripts to run data processing tasks, automate workflows, monitor system performance, and troubleshoot issues. I gained some experience during coursework and much more through self-study and real-world projects.


What are the main soft skills you use on a daily basis in your current job?

1. Communication

Being able to communicate complex technical concepts to non-tech-savvy audiences is one of the key soft skills required in every tech career. I still remember the blank faces of the sales team when we used "Random Forest" as a slide title while presenting our customer segmentation model. In college, we were trained to create slides and various kinds of data visualizations aimed at communicating our results effectively and efficiently. As a result, we tended to assume that our audience had the same background as we did, often skipping fundamental details. However, in the real world, we communicate with people from diverse backgrounds. One solution that works for me is to tell a story as if you are helping your grandparents use smartphones and social media. Use plain English, avoid fancy buzzwords, and make it fun to learn.

2. Note taking and documentation

There is a saying that “The palest ink is better than the best memory,” and it holds true in your career journey as well. During your everyday work, you will notice that some tasks are repetitive, and a good habit of taking notes and writing documentation will help you organize what you learn. Documentation is also one of the key responsibilities of a technical person on the team, as organized notes help your teammates or clients understand your work far more easily than sitting through hours of calls.


3. Teamwork

Always remember you are not working alone, but in a team. Whenever you have a question or have been scratching your head for hours over a line of code, pause and ask for help. When I first started my current role, I would spend hours trying to debug my own code or navigate through company documentation instead of asking the people actively working on the project directly. Soon I realized that, instead of being unnecessarily stubborn and wasting time, reaching out and asking for help would make my work much more efficient.


Joy's personal path

Tell us about your personal journey as a Data Science Fellow:

Starting with my master's program, I tried to get my hands dirty by joining many different projects and taking on multiple part-time positions. Some were data-related, such as a co-op project sponsored by Toyota and a data analyst role developing an app for collecting supply chain data. Others were not data-related, such as working at the school office and helping prospective and new students. All these work experiences gave me a peek into the real world. I started to learn how to think like a person working to resolve real-world problems rather than a student trying to impress people with knowledge or good grades.

In 2020, COVID broke out, and everyone was surrounded by sad news and uncertainty every day. It was a tough time to enter the job market, and it was not uncommon for people to delay their graduation because of the difficult market conditions. I still decided to test my luck and tried every approach I could, like applying to 30 positions per day, networking with people in the industry, and revising my resume and LinkedIn profile. Thankfully, I landed multiple interviews given my previous internship experience. However, preparing for and going through these interviews was still stressful. Thanks to my friends from the same program who helped me with multiple mock interviews, I built up my confidence in talking about data in front of a camera.

I eventually landed a job at a small startup, and a year later, I moved to the company I currently work at. The job search is never easy for someone fresh out of college in a foreign country, even with multiple internships in hand. There is a lot of waiting, rejection, and sleepless nights spent panicking about the next day's interview. But I appreciate the time spent researching the company and the industry before applying, practicing mock interviews, working on take-home assignments, and learning about the company and the people during the interviews. All of these honed my skills for doing the real job and taught me how grit and resilience play a vital role in the adult world.

What would you tell your younger self about building your current career?

I wish someone had told me the importance of understanding SQL in my early days. I should have spent more time during my master's program learning and practicing SQL. Another thing I wish I had known earlier is that communication is key to every career. Good communication can set you apart. In addition to creating compelling data visualizations that tell a story, I should have learned how to communicate with non-technical audiences. I should have focused on being more empathetic while communicating, including more details to ensure the information clicks for them.

Lastly, I wish someone had told my younger self that data science often involves collaborating with other team members, including data engineers, analysts, and business stakeholders. I shouldn’t have spent hours being anxious about debugging my code and feeling bad about interrupting other people’s work. I should have reached out and asked my questions, which would have saved everyone time on the project.

Final thoughts & tips

Embarking on a journey into data science is both exciting and challenging. Remember, every expert was once a beginner. The field of data science is constantly evolving, offering endless opportunities to learn and grow. Here are a few key points to keep you motivated.

First, embrace your curiosity and have fun! Your curiosity will be your greatest asset. The more questions you ask, the more answers you'll discover. The more mysteries you resolve, the greater sense of achievement you'll get out of your journey.

Second, be patient while working through challenges. Data science is a challenging field that requires the eyes of a hawk and a lot of patience. When you hit a roadblock, take a deep breath and remember that each obstacle you overcome makes you stronger and will become a unique treasure later.

Third, hands-on experience is invaluable, so keep practicing in the real world. Work on real-world projects, participate in hackathons, and contribute to open-source projects. The more you practice, the more proficient you'll become.

Fourth, embrace the data science community and stay updated with the latest developments. The data science field is dynamic and ever-changing, so help from a robust community can take you a long way. Join data science communities, both online and offline. Networking with peers, mentors, and professionals can provide support, guidance, and new opportunities. Never stop looking for new learning opportunities, whether through courses, books, articles, or conferences.

Lastly, always believe in yourself. Confidence is key. Trust in your ability to learn and grow. Remember that every expert started where you are now, and with dedication and hard work, you can achieve your goals. Your journey into data science is a marathon, not a sprint. Stay motivated, keep learning, and enjoy the process. The skills and knowledge you acquire along the way will open doors to incredible opportunities. You've got this!

Resources to dig in more

Towards Data Science

This is a popular online publication that offers a wide range of articles and tutorials on various aspects of data science, machine learning, artificial intelligence, and related fields. The writers come from a diverse community of data science practitioners, researchers, and enthusiasts. Articles cover topics such as data analysis, visualization, deep learning, natural language processing, and practical advice for data science careers. This blog offers valuable information on the latest trends and techniques in the data science industry.

Andrew Ng - LinkedIn

I know using LinkedIn is a cliché for staying updated on industry trends and the work world. Besides updating my profile and looking at job posts, I also like to follow top voices and connect with people working in fields that interest me. For example, Andrew Ng is a name everyone studying machine learning and AI will have heard of. He shares great posts on LinkedIn and provides excellent external resources for learning and real-world applications. I like to follow his posts and read through the articles or projects that he has recommended or has been working on. You can also find your own key figures in the field and learn through their posts and articles.

StatQuest - YouTube

Using YouTube for self-learning might be another cliché, but there are many great channels that provide simple and fun explanations of fundamental theories. For example, the "StatQuest with Josh Starmer" channel does an excellent job of breaking down complex statistical concepts into simple, easy-to-understand tutorials. You can also find your own go-to channels with the help of YouTube's recommendation algorithm.

Joy He

Data Science Fellow
Open Avenues Foundation

Joy is a healthcare informatics professional with three years of expertise, particularly in FHIR data. After earning her Master’s degree in Information Science from the University of Michigan - Ann Arbor, she joined 1upHealth, where she has been instrumental in advancing healthcare technology. Joy is deeply passionate about the evolution of healthcare tech and is dedicated to improving the future of healthcare in the United States. As an advocate for the integration of cutting-edge informatics in healthcare, she collaborates with interdisciplinary teams to develop and implement data-driven strategies that enhance the quality of care and streamline processes.
