SAS Viya for Learners Data Repository

A site documenting over 100 data sets housed in SAS Viya for Learners, a free software suite that gives students and faculty access to SAS analytics software and approximately 600 data sets.

Project Description

During my first year at SAS, I was tasked with identifying and documenting over 500 SAS-hosted datasets used in data science course creation. Because information on the datasets was sparse and scattered across many files (upload request tickets, course documents, and webpages), I used Python to run unsupervised clustering on the unstructured text. The clustering exposed a variety of dataset relationships (such as replication, time series, similarity in usage, and similarity in topics), which were used to fill gaps in documentation and to identify places where cloud resource usage could be minimized. The site built from the resulting documentation received 1.5K visitors from 64 countries within its first few months, and the Python library created for the project remains a resource for the department.
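
To give a flavor of what clustering unstructured dataset notes can look like, here is a minimal sketch using only the Python standard library. It is not the library built for the project. The technique shown (TF-IDF weighting, cosine similarity, and threshold-based single-linkage grouping), the function names, the 0.3 threshold, and the sample notes are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so rare terms weigh more.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.3):
    """Group document indices linked by similarity >= threshold (single linkage)."""
    vectors = tfidf_vectors(docs)
    parent = list(range(len(docs)))  # union-find forest, one node per document

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical notes of the kind found in tickets and course documents.
notes = [
    "retail sales by month 2019 time series",
    "retail sales by month 2020 time series",
    "customer churn survey responses for a telecom course",
    "telecom customer churn survey responses, copy for a second course",
    "hospital readmission rates by region",
]

for group in cluster(notes):
    print([notes[i] for i in group])
```

With these made-up notes, the two retail sales files group together (a time series relationship), the two churn files group together (a likely replication), and the hospital file stands alone. Raising the threshold splits groups more aggressively, while lowering it merges loosely related notes.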