This project is no longer accepting applications. Subscribe to our newsletter to be notified of new projects!
Using a suitable training set, train a Large Language Model classifier to find product reviews among user comments and determine their sentiment (positive, negative or neutral).
User comments are the most distinctive and valuable asset of the Build Fellow's company website; they provide more genuine and honest opinions on products than almost anywhere else on the internet. However, not all comments are useful, since they also include banter, trolling and off-topic discussion. While ChatGPT can identify and classify product reviews extremely well, it is costly and time-consuming to use for every new comment posted on the website. Instead, a Large Language Model (LLM) fine-tuned for this specific task can do the job quickly, offline and at a fraction of the cost of ChatGPT.
In this Build Project, imagine you are part of Company X’s team and it’s your responsibility to solve this challenge. You will train a Large Language Model (LLM) classifier to surface product reviews from among the large body of comments on Company X’s deals and perform sentiment analysis on them.
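One way to frame the task, as a rough sketch: treat it as a single four-way classification in which each comment receives one of `not_review`, `positive`, `negative` or `neutral`, then surface only the comments labeled as reviews. The label names, the `Prediction` type and the `surface_reviews` helper are hypothetical and chosen for illustration; the project itself may split the work into two separate models instead.

```python
from dataclasses import dataclass

# Hypothetical label set: one "not a review" class plus three sentiments.
SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Prediction:
    comment: str
    label: str  # "not_review" or one of SENTIMENTS


def surface_reviews(predictions):
    """Keep only the comments the classifier labeled as product reviews."""
    return [p for p in predictions if p.label in SENTIMENTS]


# Made-up classifier outputs, for illustration only.
predictions = [
    Prediction("Bought this last week, battery easily lasts two days.", "positive"),
    Prediction("lol first", "not_review"),
    Prediction("Mine stopped charging after a month.", "negative"),
]

reviews = surface_reviews(predictions)
for p in reviews:
    print(f"{p.label}: {p.comment}")
```

A single label per comment keeps the fine-tuning setup simple, at the price of mixing two decisions (is it a review, and what is its sentiment) into one prediction.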
Get to know the Build Fellow and other students, ask questions about the project requirements, and prepare your workspace.
Learn and apply SQL commands inside SQL Server to slice and dice a movie database and find a movie to watch tonight!
You will learn how to use the Pandas library to open a music-related dataset and find a new artist you may be interested in!
You will access AWS RDS to retrieve the company-specific dataset, after which you will clean it and perform Exploratory Data Analysis.
The Build Fellow will deliver a lecture on LLMs, and then you will create a story by taking turns with various LLMs, adding one sentence at a time.
You will choose an LLM from the open-source platform Hugging Face and fine-tune it using the training dataset created previously.
Using the Python library sklearn, you will calculate the accuracy of your machine learning experiments and create a Flask API to serve the best-performing model.
Polish your project deliverables and present them to the Build Fellow and other students in the final group session.
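The evaluation step from the sklearn session above could look roughly like the sketch below. The labels and predictions are made up for illustration, and the four-label scheme is an assumption rather than the project's actual setup; `accuracy_score` and `confusion_matrix` are standard functions in `sklearn.metrics`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Assumed label scheme: one "not a review" class plus three sentiments.
LABELS = ["not_review", "positive", "negative", "neutral"]

# Made-up ground truth and model predictions for eight comments.
y_true = ["positive", "not_review", "negative", "neutral",
          "positive", "not_review", "negative", "not_review"]
y_pred = ["positive", "not_review", "negative", "positive",
          "positive", "not_review", "neutral", "not_review"]

# Fraction of comments whose predicted label matches the true label.
accuracy = accuracy_score(y_true, y_pred)

# Rows are true labels, columns are predicted labels, both in LABELS order.
matrix = confusion_matrix(y_true, y_pred, labels=LABELS)

print(f"accuracy: {accuracy:.2f}")
print(matrix)
```

Accuracy alone can look flattering when most comments are not reviews, so the confusion matrix is a useful companion for seeing which classes the model mixes up.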
I'm a full stack data scientist with a background in computational sciences, having earned my Ph.D. in the field. My journey into the realm of data science was born from a deep-seated curiosity about the intricacies of the world and a passion for unraveling its mysteries. Initially driven by a thirst for scientific understanding, I transitioned into leveraging my skills to tackle real-world challenges. Along the way, I taught myself coding and immersed myself in various data science courses, honing my expertise to navigate complex data landscapes.
Currently, I thrive in the dynamic environment of an ecommerce/deals website, where I apply my knowledge to classify and perform sentiment analysis on user comments. My role extends to ensuring data integrity through rigorous validation processes. Delving deep into data problems is not just a profession for me; it's a relentless pursuit that fuels my drive to make meaningful impacts through insightful analysis and innovative solutions.