Develop machine learning models that interpret text from Twitter and predict whether a customer’s attitude towards a company, or the service they received from it, is positive, negative, or neutral.
The airline industry is more competitive than it has ever been, as more and more small and low-cost carriers have entered the market. To stay competitive, companies need to continually assess and monitor how satisfied customers are with the company and the service it provides, and to identify any negative trends on social media.
In contrast to traditional paper-based surveys, machine learning models can offer a faster, cheaper, and more reliable way to identify and quantify customers’ opinions and attitudes, especially when they draw on the large volume of data available on social media. In this Build Project, you will build machine learning models that interpret text and predict whether a customer’s attitude towards the service they received from six U.S. airline companies is positive, negative, or neutral, based on text data from Twitter. You will work through Exploratory Data Analysis (EDA), text preprocessing, feature engineering, data balancing, and model design, development, and evaluation. Along the way you will get a feel for the daily work of a Data Scientist and learn techniques that are must-know for any data science professional.
Get to know the project leader and other students, understand project background and goals, get familiar with platforms/tools (GitHub, Git), ask questions about project requirements and expectations
Get familiar with the Spyder Python IDE and learn how to install Python packages; use Python to import data and perform Exploratory Data Analysis (EDA)
Use string functions and features from Python packages (string, re, NLTK) to clean and preprocess text data, including lowercasing, tokenization, stemming, etc.
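As a rough illustration of this cleaning step, the sketch below lowercases a tweet, strips URLs, @mentions, and punctuation, and splits the result into tokens. The function name `preprocess`, the regular expressions, and the sample tweet are assumptions made for this example, not part of the project materials. Stemming is left out so that the sketch needs only the standard library; NLTK's `PorterStemmer` is one option for that step.

```python
import re
import string

def preprocess(tweet):
    """Lowercase a tweet, strip URLs, @mentions and punctuation, then tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    # Remove every ASCII punctuation character in one pass.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()                         # whitespace tokenization

# Hypothetical tweet, invented for this example.
tokens = preprocess("@ExampleAir My flight was DELAYED again!! https://example.com/x")
print(tokens)
```

Whitespace splitting is the simplest possible tokenizer; a tokenizer designed for tweets may handle hashtags and emoticons better.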
Understand the concepts of bag-of-words, n-grams, and TF-IDF. Use features from the Python package sklearn to convert text data into meaningful numerical features that can be used as input to machine learning models
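One way to turn text into numerical features with sklearn is `TfidfVectorizer`, sketched below on three invented tweets. The choice of `ngram_range=(1, 2)` (unigrams plus bigrams) is an assumption for illustration; the project may settle on different settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-cleaned tweets, invented for this example.
tweets = [
    "flight delayed again",
    "great flight and friendly crew",
    "lost my bag again",
]

# ngram_range=(1, 2) keeps single words (unigrams) and word pairs (bigrams).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(tweets)   # sparse matrix, one row per tweet

# vocabulary_ maps each n-gram to its column index in X.
print(X.shape)
print(sorted(vectorizer.vocabulary_)[:5])
```

Each row of `X` is a TF-IDF-weighted bag-of-words vector, so n-grams that appear in many tweets receive lower weights than n-grams specific to a few.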
Perform undersampling and oversampling to balance data and validate results
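To make the idea of oversampling concrete, here is a minimal sketch of naive random oversampling using only the standard library: rows from the smaller classes are duplicated at random until every class matches the largest one. The helper `random_oversample` and the toy data are hypothetical; this is not necessarily the balancing method used in the project.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(rows) for rows in by_class.values())
    out_texts, out_labels = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for text in rows + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

# Toy, imbalanced data invented for this example.
texts = ["late", "rude", "lost bag", "cancelled", "great", "ok"]
labels = ["negative"] * 4 + ["positive", "neutral"]
bal_texts, bal_labels = random_oversample(texts, labels)
print(Counter(bal_labels))
```

Undersampling is the mirror image: rows are dropped from the larger classes instead. In either case, resampling is usually applied to the training split only, so that duplicated rows do not leak into the test set.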
Explore machine learning models, split the dataset into training and test sets, and build machine learning models using the Python package sklearn
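The split-and-train step might look like the sketch below. The twelve tweets are invented, and logistic regression on TF-IDF features is only one possible model, chosen here as an assumption; the project may explore others.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical labelled tweets, invented for this example.
texts = [
    "flight delayed again", "lost my bag", "rude staff at the gate",
    "worst airline ever", "great crew today", "smooth flight thanks",
    "love the new seats", "best service ever", "flight leaves at noon",
    "what is the baggage limit", "checking in now", "boarding at gate four",
]
labels = ["negative"] * 4 + ["positive"] * 4 + ["neutral"] * 4

# Hold out 25% for testing; stratify keeps the class mix equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# The pipeline fits the vectorizer on the training data only.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(list(zip(X_test, predictions)))
```

Wrapping the vectorizer and the classifier in one pipeline keeps the test tweets out of the vocabulary and IDF statistics, which avoids an easy-to-miss source of leakage.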
Understand the model evaluation metrics accuracy, recall, precision, and F1 score, and use the functions provided by sklearn.metrics in Python to assess and compare models’ performance
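These metrics can be computed with `sklearn.metrics` as sketched below. The true and predicted labels are made up for the example, and macro averaging (an unweighted mean over the three classes) is an assumption; weighted or per-class scores are equally valid choices.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up true and predicted labels for eight tweets.
y_true = ["negative", "negative", "negative", "neutral",
          "neutral", "positive", "positive", "positive"]
y_pred = ["negative", "negative", "neutral", "neutral",
          "positive", "positive", "positive", "negative"]

# Accuracy: fraction of tweets whose predicted label matches the true label.
accuracy = accuracy_score(y_true, y_pred)

# Macro average: compute each metric per class, then take the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

On imbalanced data, accuracy alone can look good for a model that mostly predicts the majority class, which is why precision, recall, and F1 are compared alongside it.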
Create slides to include project description and goals, all the involved steps and techniques, results and findings from models, and the final model recommendation for the project, present the project to the project leader and other students