Building a Data Science Portfolio

Eliud Nduati
7 min read · Jan 2, 2023

Building a portfolio for Data Scientists

If you have interviewed for a data scientist role, you probably remember this question: “Tell me about a project you worked on recently…” It is meant to explore some of your work in data science. But there is a better question you could be asked: “Tell me more about that fake news detection project you have in your portfolio.” The first question seeks a general understanding of the projects you have worked on, while the second seeks specifics. So how do you get to the point of being asked specific questions? By having a portfolio. A data science portfolio highlights the projects you have worked on and the specifics of each. It showcases your skills across different areas of data science, such as communication, coding, and documentation.

Photo by Hal Gatewood on Unsplash

Do Data scientists need a portfolio?

The data science learning path is complex. It involves learning many skills and then finding a way to showcase them, and a portfolio comes in handy for that. Additionally, if you are looking for a job in data science, a portfolio can impress the hiring manager and show them what skills you are likely to bring to their company. It also increases your chances of being shortlisted for an interview. For these reasons, building a great data science portfolio is essential for a career in data science. So, let’s discuss some of the essentials that will impress the hiring manager and give you an edge in the recruitment process.

Elements of a good portfolio

The essential elements of a good data science portfolio depend on the skills you want to communicate. As a data scientist, communicating your skills in Exploratory Data Analysis, Data Preparation, Data Gathering, Data Cleaning, Visualization, and Modeling is integral to your career development. But that’s not all; your portfolio also needs to highlight your ability to write clean code that meets software development standards. You also need to showcase your documentation skills and, lastly, your communication skills. As a data scientist, you will have to communicate your findings to stakeholders; therefore, your portfolio needs to show that you can do that.

First, the portfolio must show the EDA skills mentioned above. Tools such as pandas, NumPy, and matplotlib are essential for showing basic knowledge in the field. Discuss each tool and provide projects that show how you can use it. Include projects with different use cases from varied industries so that you exercise most of the skills that are basic to data science. Host the projects in Jupyter notebooks or Google Colab. This shows that you are comfortable in the different environments used for data science.

Additionally, your grasp of the types of machine learning and how to apply them using libraries such as scikit-learn is important. Have at least one project that uses supervised and unsupervised learning. This shows that you understand the machine learning and modeling part of the data science process.
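To make the supervised and unsupervised distinction concrete, here is a minimal sketch of what the core of such a project might look like. It assumes scikit-learn is installed and uses its bundled Iris dataset; the choice of logistic regression and k-means is only an illustration, not a recommendation from this article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset, so nothing has to be downloaded.
X, y = load_iris(return_X_y=True)

# Supervised learning: the model learns from labelled examples
# and is evaluated on rows it has not seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Unsupervised learning: the labels are ignored and the rows
# are grouped into three clusters based on the features alone.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"Held-out accuracy: {accuracy:.2f}")
print(f"Clusters found: {sorted(set(cluster_labels))}")
```

In a portfolio project, you would follow this with a short written interpretation of the results, since the explanation matters as much as the score.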

Besides skills, a good portfolio also needs a good profile. The profile should highlight your goals, the industries you are interested in working in, and your outlook on the data ecosystem. It should be concise and direct while still leaving room for a good discussion. The design of the portfolio also needs to be professional so that it does not cost you a place on the shortlist for the job you are applying to. Make the portfolio eye-catching, with colors that don’t strain the audience’s eyes and with neat, clear, and precise visualizations.

Photo by Scott Graham on Unsplash

Projects to include

The range of projects to have in your data science portfolio varies based on your goals and the industry you are interested in. However, including a variety of projects will greatly improve your chances of being shortlisted or contacted by a recruiter through your portfolio.

Exploratory Data Analysis

EDA is an important part of any data science project; therefore, having a project that covers the whole EDA process and highlights data wrangling and cleaning is very important. It allows you to demonstrate almost all the necessary steps in data science, including data exploration, visualization, cleaning, and documentation. By documentation, I mean something most developers leave out: comments. While it is easy to forget to comment your code and explain the steps you are taking, a project highlighting EDA gives you a chance to show that you can communicate your process while doing the work itself. I have included data cleaning here because, when you explore the data, you will likely come across missing values or other data quality issues that need addressing. This allows you to show your cleaning skills in the same project.
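The exploration, cleaning, and commenting habits described above could look like the following minimal sketch. It assumes pandas and NumPy are installed; the dataset, column names, and the decision to fill missing prices with the median are all hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with typical quality issues:
# a missing value, a duplicate row, and inconsistent text casing.
raw = pd.DataFrame(
    {
        "city": ["Nairobi", "nairobi", "Mombasa", "Kisumu", "Kisumu"],
        "price": [120.0, 95.0, np.nan, 80.0, 80.0],
        "rooms": [3, 2, 4, 2, 2],
    }
)

# Step 1: explore. Count missing values and duplicate rows
# before changing anything.
missing_per_column = raw.isna().sum()
duplicate_rows = int(raw.duplicated().sum())

# Step 2: clean. Each step has a comment explaining the decision.
clean = (
    raw.drop_duplicates()  # remove the exact duplicate row
    .assign(city=lambda df: df["city"].str.title())  # normalise casing
)
# Fill the missing price with the median (an assumption for this sketch).
clean["price"] = clean["price"].fillna(clean["price"].median())

# Step 3: summarise the cleaned data.
summary = clean.groupby("city")["price"].mean()

print(missing_per_column)
print(f"Duplicate rows: {duplicate_rows}")
print(summary)
```

The comments are the point here: a reader of your notebook should be able to follow why each cleaning decision was made without running the code.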

Data gathering

In most cases, data gathering is left to data engineers. However, as a data scientist, you need the skills to gather the data your project requires. Having a project that highlights these skills, especially one where you use APIs, web scraping, and SQL, is highly important. The recruiter needs to know that you can get the data and use it for a given project when needed. A data-gathering project also allows you to show that you know what kind of data is needed for specific data science projects.
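As one small illustration of the SQL side of data gathering, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `orders` table, its columns, and its rows are hypothetical stand-ins for a real data source.

```python
import sqlite3

# In-memory database standing in for a real source
# (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Gather only what the analysis needs: total spend per customer,
# largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

Pushing the aggregation into the query, rather than pulling every row into Python first, is a simple way to show that you think about what data a project actually needs.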

Machine learning

When working as a business-side data scientist, you will, in most cases, need to create models to experiment on certain decisions or activities in the organization. This is where machine learning comes in. As a data scientist, you need the skills to apply machine learning to your data and get results that you can use to make decisions. Having such a project in your portfolio is highly important.

Visualization

Visualization comes in different formats. Creating dashboards, for instance, is important for client-facing data scientists. You need the skills to create dashboards that communicate the necessary information to clients and stakeholders. Highlighting skills in Looker Studio, Tableau, and Power BI is important. Similarly, it is important to highlight your use of libraries such as seaborn, matplotlib, and Plotly in your data science portfolio. These show the recruiter that you have the necessary skills in visualization and data storytelling, which are part of the communication step in a data science project.
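On the library side, a clear chart does not need much code. The following is a minimal matplotlib sketch with hypothetical sales figures; it assumes matplotlib is installed and renders to an in-memory buffer so it runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without needing a display
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(months, sales, color="steelblue")

# A title and axis labels are what turn a plot into communication.
ax.set_title("Monthly sales (hypothetical data)")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
fig.tight_layout()

# Save as PNG into memory; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In a notebook you would usually just display the figure, but saving it lets you reuse the same chart in a README or blog post.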

Once you have your portfolio ready, the question is, where do you showcase it?

Platform to showcase your portfolio

Photo by Christopher Gower on Unsplash

GitHub

GitHub provides both a version control platform and free hosting for your portfolio. Hosting your portfolio on GitHub has its advantages. First, it shows that you can use GitHub, which is essential for version control, and it gives you a freely accessible web page to showcase your skills. Using GitHub as your portfolio platform also lets you share a single link: you can attach it to your social media pages or send it to anyone who wants to know about your skills. A portfolio is essentially a dynamic online resume. Links to your projects on the GitHub portfolio can point to your repositories. This lets the recruiter view your code and how your projects have improved or changed over time, making GitHub a very important platform for any data scientist.

LinkedIn

Data science is a dynamic, ever-evolving field. As a result, you must constantly learn and gain new skills. LinkedIn offers a chance to showcase what you are learning and the skills you have gained. Using LinkedIn to highlight your learning process or new skills shows recruiters that you can be what they are looking for. It also allows you to interact and network with other people learning in the same field and see what they might be learning.

Kaggle

Kaggle provides a platform to practice and do projects on data-related challenges. As a data scientist, having different projects, datasets, and contributions on Kaggle can help build your reputation in the industry. It is also a good home for your portfolio, as it hosts your projects for free and encourages comments from others in the community. Since Kaggle also gives you a profile that links to your projects and the comments you have made in the community, it is a good place to keep your portfolio of projects while also showing your collaborative skills.

Medium

One essential skill in data science is communication. You are one step closer to being hired if you can show that you can communicate your findings or your process in data science projects. Medium offers a free blogging platform for anyone in the tech industry. As a beginner, you can share your learning or projects on Medium so that they are accessible to anyone you give the link to. Additionally, it attracts comments from industry experts and can be a good place to get hired, whether you are a beginner or an expert in data science.

Conclusion

When building your data science portfolio, focus on the skills and tools integral to data science projects. As a beginner, it is vital to showcase your work and skills with different tools. There are other choices you will make along the way, such as which language to focus on and which machine learning libraries to specialize in. However, the steps in a data science project are the same no matter what language you use. In this article, I have highlighted tools and libraries that are needed when using Python. Still, the guidelines can be applied to any other language you might want to use.
