Predicting real vs. fake news using natural language processing

For this project, I wanted to see whether I could use machine learning to reliably predict whether a news article was real or fake. I used a collection of real and fake news articles, which can be found here. I compared two models: Naive Bayes with a bag-of-words representation, and an artificial neural network (ANN) with word embeddings. The ANN outperformed Naive Bayes, reaching an accuracy of 94% on the final test set. For a bit of fun, I then evaluated how well these models generalized to a random sample of POTUS tweets from 2019-2020 (found here).
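
To give a feel for the simpler of the two approaches, here is a minimal sketch of Naive Bayes over a bag of words, written in plain Python. It is not the code from this project: the class name `BagOfWordsNaiveBayes`, the whitespace tokenizer, the add-one smoothing, and the four toy headlines are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Hypothetical multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        # Number of documents per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word counts: this is the "bag of words" for each class.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every word seen in any training document.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Work in log space to avoid underflow on long documents.
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up headlines purely for illustration (not from the project's dataset).
train_texts = [
    "senate passes budget bill after long debate",
    "city council approves new transit funding",
    "shocking miracle cure doctors hate revealed",
    "aliens secretly control shocking world banks",
]
train_labels = ["real", "real", "fake", "fake"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("shocking miracle cure revealed")
print(prediction)
```

The model scores each class by adding the log prior to the log probability of every known word in the article, then picks the highest-scoring class. Word order is ignored entirely, which is one plausible reason an embedding-based network can do better on this task.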

The code for this project and an outline of my findings can be found here.