Projects – Seonkyu Kim

Enhancing Patient Privacy with Synthetic Data Generation (Future Edelman Impact Competition Finalist)

20 April 2024

Facing the dual challenges of maintaining patient privacy and managing high data protection costs, the healthcare industry requires innovative solutions. Our synthetic data approach ensures enhanced privacy, reduces operational expenses, and provides valuable data for research, driving cost-effective advancements in healthcare.

View Project

Forecasting Walmart Sales with Machine Learning

28 February 2024

The project team predicted Walmart's sales quantities across California, Texas, and Wisconsin using five years of sales, price, and holiday data. We developed Deep Neural Network (DNN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Transformer models. The LSTM model performed best, with a Weighted Root Mean Squared Scaled Error (WRMSSE) of 0.6860.
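For readers unfamiliar with the metric, here is a minimal sketch of how a weighted RMSSE can be computed. It is an illustration, not the project's code: it assumes the common definition in which each series' forecast error is scaled by the in-sample error of a one-step naive forecast, and the function names `rmsse` and `weighted_rmsse` are hypothetical. Competition variants may add details, such as trimming leading zeros, that are omitted here.

```python
from math import sqrt


def rmsse(train, actual, forecast):
    """RMSSE for one series (illustrative helper, not from the project).

    The forecast MSE is divided by the in-sample MSE of a one-step
    naive forecast (previous value predicts the next one).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if len(train) < 2:
        raise ValueError("need at least two training points to compute the scale")
    scale = sum((train[t] - train[t - 1]) ** 2 for t in range(1, len(train)))
    scale /= len(train) - 1
    if scale == 0:
        raise ValueError("training series is constant; scale is zero")
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return sqrt(mse / scale)


def weighted_rmsse(series, weights):
    """Weighted average of per-series RMSSE values.

    `series` is a list of (train, actual, forecast) tuples. Weights are
    normalized here so they do not need to sum to 1.
    """
    total = sum(weights)
    return sum(w * rmsse(*s) for s, w in zip(series, weights)) / total
```

A value below 1 means the forecasts beat the naive one-step benchmark on the weighted average, which is how a score such as 0.6860 is usually read.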

View Project

Purdue Data Visualization Championship (Award Winner)

20 January 2024

Winner of the Daniels School of Business Excellence in Business Insights Award. The project team used the LEGO Brick Database to build a data visualization addressing the question: how did a simple children's toy become a sophisticated hobby for adults? I created a bar chart race, line charts, and bubble charts to uncover trends in the LEGO data. I also built an interactive dashboard that offers further insights into LEGO trends and builders.

View Project

Airbnb Property Churn Analysis

10 December 2023

The project team identified reasons for Airbnb's property churn in Washington using four machine learning models: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting. The Random Forest model with a classification threshold of 0.12 outperformed the other models (Accuracy = 0.75, Sensitivity = 0.73, Specificity = 0.76).
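The effect of a low classification threshold can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: labels are assumed to be 1 for churned and 0 for retained, a prediction is positive when the predicted probability is at least the threshold, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.12):
    """Accuracy, sensitivity, and specificity at a given probability threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of actual positives (churned) that were caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives (retained) that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

Lowering the threshold flags more properties as likely to churn, which raises sensitivity at the cost of specificity. A threshold well below 0.5 is a common choice when the positive class is rare.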

View Project

Bankruptcy Prediction with Machine Learning Models

09 December 2023

The project team developed predictive models that combine various econometric measures to foresee a firm's financial condition. The models were evaluated in a Kaggle competition. Our team's best model was an ensemble of Logistic Regression, Gradient Boosting, and Neural Network models, which achieved an AUC of 0.9182.
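As a rough illustration of how an ensemble can be scored by AUC, the sketch below averages the predicted probabilities of several models and computes AUC by comparing every positive/negative pair. How the project actually combined its models is not stated, so simple averaging is an assumption, and the names `average_ensemble` and `roc_auc` are hypothetical.

```python
def average_ensemble(prob_lists):
    """Average predicted probabilities across models, one list per model."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half. This pairwise form is quadratic in the number of
    samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Averaging can help when the individual models make different mistakes, because errors that are not shared tend to cancel out in the combined score.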

View Project

The Purdue Data 4 Good Case Competition (National 13th, Regional 5th Place)

30 November 2023

The project team aimed to integrate large language models (LLMs) such as GPT and Llama 2 into healthcare. Our focus was on automating medical documentation to free healthcare professionals from administrative tasks and let them concentrate on patient care. We placed 12th in the Kaggle competition with a Word Error Rate of 0.64242. On the final leaderboard of the case competition, we placed 13th nationally, 5th regionally, and 2nd in our cohort.
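For context, Word Error Rate is commonly defined as the word-level edit distance between a generated text and a reference, divided by the number of reference words. The sketch below follows that standard definition; it is not the competition's scoring code, and the function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Lower is better, and because insertions are counted, the value can exceed 1 when the generated text is much longer than the reference.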

View Project

Centralized Database Management System for Eco-transportation Company

30 October 2023

The project team developed a more structured, centralized Database Management System (DBMS), implemented in SQL, for an eco-transportation company in India.

View Project

Predicting Consumer Tastes with Big Data at GAP

29 October 2023

The project team proposed several web data analytics approaches to predict consumer trends for GAP.

View Project

Artificial Intelligence Model for the Prediction of Cardiac Arrests Using Time-Series Biometric Data

31 July 2020

The project team predicted in-hospital cardiac arrests using several deep learning and machine learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Deep Neural Network (DNN), and Light Gradient Boosting Machine (LGBM). I was responsible for the DNN model, and I also took charge of preprocessing the data, fine-tuning the four models, and presenting the final output. We improved predictive accuracy by 20% compared to the existing model, the Deep learning-based Early Warning System (Kwon et al., 2018), using LSTM for 6-hour prediction and LGBM for 1-hour prediction.

View Project

Algorithm Trading Model: Backtesting and Rebalancing ETF Portfolio

31 May 2020

The project team designed an algorithmic trading model that backtests and rebalances an ETF portfolio based on the NASDAQ-100 Index. In backtesting, the model achieved an average annual return of 12.75%.

View Project

Deep Learning Model for Natural Language Translation

30 April 2020

The project team engineered deep learning models for natural language translation (Portuguese to English). We developed a seq2seq model with Long Short-Term Memory (LSTM) and a Transformer model. We evaluated both with the BLEU (Bilingual Evaluation Understudy) score: the Transformer model scored 0.45, higher than the seq2seq model's 0.42.

View Project

Machine Learning Model for Stock Price Prediction

31 March 2020

The project team constructed machine learning models for stock price prediction with 9-year time series data. We predicted whether the stock price would rise or fall using five classifiers: Support Vector Machine (SVM), Logistic Regression, Gradient Boosting, Decision Tree, and Random Forest. The SVM model achieved the best AUC (0.62). We also forecast the stock price itself with Long Short-Term Memory (LSTM) and ARIMA models.

View Project