Caption: Image of the US Department of Education College Scorecard page. We performed exploratory data analysis primarily on information from the site.
Spring 2020 | 2 Weeks
Let's face it: college, for most individuals, is an expensive investment. For instance, tuition at Carnegie Mellon without financial aid is over $55,000. College tuition overall has increased by more than 25 percent in the past 10 years, and student debt increased by more than 6 percent between 2018 and 2019 alone (CNBC). Thus, the value of a college education must be evaluated before making such a life-changing decision.
Return on investment (ROI) is often used to determine the value of a college education. In this exploratory data analysis project, Catherine Du and I wanted to take a deeper dive into which factors increase the ROI of a given college education. Specifically, we wanted to answer the following questions:
This project was done for the Practical Data Science course (67-364) at Carnegie Mellon. To skip directly to the final deliverable, please click here.
In this project, I was a Data Visualizer and used Tableau to visualize trends and relationships. In addition, I helped clean and process the data for exploratory data analysis with Pandas.
To analyze our data, we first had to transform the raw data into a form that could be readily used and analyzed. The diagram below shows how we cleaned and prepared the data using the Pandas library in Python. Note that although there are many ways to calculate ROI, we used the ratio of debt to starting salary as our metric, since both debt and starting salary are numerical factors that influence the financial wellness of college graduates.
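As a rough illustration of the metric, here is a minimal Pandas sketch of computing a debt-to-salary (DS) ratio. It is not the exact pipeline from the diagram: the column names (`median_debt`, `starting_salary`), the helper `add_ds_ratio`, and the sample rows are all hypothetical, and the cleaning rule (drop rows with missing values or a non-positive salary) is an assumption.

```python
import pandas as pd

# Made-up rows with hypothetical column names; the real dataset's
# fields and values are different.
raw = pd.DataFrame({
    "college": ["A", "B", "C", "D"],
    "median_debt": [20000.0, 15000.0, None, 30000.0],
    "starting_salary": [50000.0, 60000.0, 45000.0, 0.0],
})


def add_ds_ratio(df):
    """Return a cleaned copy of df with a debt-to-salary ratio column."""
    # Assumed cleaning rule: a ratio is only meaningful when both
    # values are present and the salary is positive.
    cleaned = df.dropna(subset=["median_debt", "starting_salary"])
    cleaned = cleaned[cleaned["starting_salary"] > 0].copy()
    # DS ratio = debt / starting salary (lower is better).
    cleaned["ds_ratio"] = cleaned["median_debt"] / cleaned["starting_salary"]
    return cleaned


cleaned = add_ds_ratio(raw)
print(cleaned)
```

In this toy input, college C is dropped for missing debt and college D for a zero salary, leaving two rows with a computed ratio.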
To answer our three questions, we performed exploratory data analysis on our cleaned dataset. Here are some of our findings:
On average, the higher the prestige, the lower the debt-to-salary ratio
Caption: Bar graph of average debt-to-salary (DS) ratio plotted by college group. Top Non-Ivy schools are the top 30 ranked schools listed by the US News Rankings 2019. Note that the average DS ratio for Ivy League schools is almost half of that of schools in the 'other' category.
Caption: Scatterplot showing colleges plotted by average debt and average earnings. While we had over 2,000 colleges in our dataset, we randomly selected 200 for this scatterplot. Note that higher-ranked institutions cluster toward the left/top left of the graph (in the purple oval), indicating that even with similar debt amounts, students from these institutions had, on average, higher starting salaries.
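The two views above (an average per college group, and a random subset for plotting) can be sketched in Pandas roughly as follows. This is an illustrative sketch only: the data is synthetic, the column names are hypothetical, and the fixed `random_state` is an assumption added so the sample is reproducible.

```python
import pandas as pd

# Synthetic stand-in for the cleaned dataset; column names are hypothetical
# and the numbers carry no meaning.
n = 500
groups = ["Ivy League", "Top Non-Ivy", "Liberal Arts", "Other"]
colleges = pd.DataFrame({
    "college_group": groups * (n // len(groups)),
    "avg_debt": [10000 + 10 * i for i in range(n)],
    "avg_earnings": [40000 + 20 * i for i in range(n)],
})
colleges["ds_ratio"] = colleges["avg_debt"] / colleges["avg_earnings"]

# Average DS ratio per college group (the kind of table behind a bar graph).
mean_ds_by_group = colleges.groupby("college_group")["ds_ratio"].mean()

# Random subset of 200 colleges to keep a scatterplot readable.
# sample() draws without replacement by default.
plot_sample = colleges.sample(n=200, random_state=0)

print(mean_ds_by_group)
print(len(plot_sample))
```

Sampling without replacement guarantees that no college appears twice in the plot, and fixing the seed means the same 200 points are drawn on every run.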
STEM students, on average, are paid well regardless of institution. However, humanities majors at Ivy League institutions can outearn STEM students at other institutions
Caption: Bar graph of average salary plotted by major type and college group. Note that the non-STEM salary at Ivy League institutions is, on average, higher than the STEM salary at Liberal Arts/other institutions, as noted by the purple line.
Caption: Bar graph of average debt-to-salary ratio plotted by major type and college group. Note that the non-STEM debt-to-salary ratio at higher-ranked institutions is, on average, higher than the STEM DS ratio at Liberal Arts/other institutions, as noted by the purple line.
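A breakdown by both major type and college group, like the ones in the two bar graphs above, can be produced with a pivot table. The sketch below uses made-up rows and hypothetical column names purely to show the mechanics; the numbers are not results from our dataset.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the project.
grads = pd.DataFrame({
    "college_group": ["Ivy League", "Ivy League", "Other",
                      "Other", "Other", "Ivy League"],
    "major_type": ["STEM", "Non-STEM", "STEM", "Non-STEM", "STEM", "STEM"],
    "salary": [90000, 70000, 65000, 45000, 55000, 80000],
})

# Rows are college groups, columns are major types, and each cell holds
# the mean salary for that combination.
salary_table = grads.pivot_table(
    index="college_group",
    columns="major_type",
    values="salary",
    aggfunc="mean",
)

print(salary_table)
```

Each cell can then be compared directly across groups, for example `salary_table.loc["Ivy League", "Non-STEM"]` against `salary_table.loc["Other", "STEM"]`. The same call with a DS ratio column as `values` would give the second table.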
Students who attend institutions in California or the Northeast, on average, have the lowest debt-to-salary ratio
Caption: Map showing regional debt-to-salary ratios, with California and the Northeast showing the lowest ratios.
To place the numbers and results in context, here are some factors that explain the findings:
These numbers, or even the explanations above, don't paint the whole picture of the college landscape. Here are some considerations that need to be made when building upon these results:
If given more time on this project, we would build upon our findings by doing the following:
Overall, this project re-emphasized how important exploratory data analysis is, since many insights can be drawn simply by grouping and visualizing the data. I was able to refine my skills in Pandas and Tableau over the course of the project, and many of my assumptions about the college landscape as a whole were challenged along the way.
Developed by Andrew Chuang 2020