As an aspiring data scientist, you must have heard the advice “do data science projects” over a thousand times. Here are some ideas to strengthen your skills and build a portfolio that stands out from the crowd of data science enthusiasts looking to break into the field.Data science projects are a great learning experience, they also help you stand out
Bonus Create some projects where you collect data using an API or some other external tool
These skills will come in handy when you start working with third-party data
Conclusion
You need to showcase projects that display a variety of skills
- Data collection, analysis, visualization, and machine learning
- Online courses aren’t sufficient for you to gain skills in all these areas
- Portfolio projects are one of the best ways to display your skills to a potential employer
Breast Cancer Analysis
Using a K-means clustering algorithm to detect the presence of breast cancer based on target attributes
Exploratory Data Analysis
After collecting and storing data, you will need to conduct an analysis of all the variables in your data frame
World Happiness Report
Tracks six factors to measure global happiness
- Life expectancy, economics, social support, absence of corruption, freedom, and generosity
- Answers the following questions: which country is the happiest in the world, what are the most important contributing factors to a nation’s happiness, is overall happiness increasing or decreasing?
Not all data science projects stand out
Listing the wrong projects on your portfolio can do more harm than good
Data Collection
Data collection and pre-processing is one of the most important skills to have as a data scientist.
- After understanding the business requirement, you need to gain access to relevant data on the Internet
- Then, the data needs to be cleaned and stored into data frames in a format that can be fed as input into a machine learning model
Web Scraping
Can be done by scraping an online course website and storing all the results into a data frame
- You can also create visualizations around variables like price and rating to find a course that is both affordable and of good quality
- Create a sentiment analysis model and come up with the overall sentiment surrounding each course
Building a Covid-19 Dashboard
Pre-process the data using Python
Building an IMDB-Movie Dataset Dashboard
Experiment with the IMDb dataset and create an interactive movie dashboard with Tableau
- Upload your visualizations to Tableau Public, and share the link with anybody who wants to use your dashboard
- Potential employers can get to interact with the dashboard
Data visualization
Present findings and data in the form of visualizations
- Interactive dashboard
- Graphs are easy to understand at a first glance
- Can be used to present findings to a non-technical audience
- Projects can be showcased on your portfolio to demonstrate data visualization skills
Identifying the risk factors of heart disease
Use Python or R to analyze the relationships present in this dataset, and come up with answers to questions such as:
- Are patients with diabetes more likely to develop heart disease at an early age?
- Is there a certain demographic group that is at higher risk of heart heart disease than others?
- Does frequent exercise lower the risk of developing heart disease?
- Are smokers more likely than non-smokers?
Sentiment Analysis on Food Reviews
Sentiment analysis is an important aspect of machine learning
- Used to gauge the overall customer response to their products
- Customers usually talk about products on social media and customer feedback forums
- This data can be collected and analyzed to gain an understanding of how different people respond to different marketing strategies
Life Expectancy Prediction
In this project, you will be predicting a person’s life expectancy based on variables such as education, number of infant deaths, alcohol consumption, and adult mortality
Web Scraping
Food Reviews
- Create a web scraper to collect all the review information from all the web pages of this site, and store it in a data frame
- You can use the data collected to build a sentiment analysis model and classify which of these reviews are positive and which ones are negative