Fake News Detection

Employing graph-based machine learning for fake news detection

Fake news is now widespread, and graph-based machine learning offers a way to model how it spreads. In this project, I applied propagation-based approaches to real-world data to help detect fake news. Check out our GitHub repository for more details!

I constructed a new dataset based on the MuMiN dataset, using the Twitter API for tweet extraction. I also created an end-to-end data collection framework on top of the Twitter API (v1 and v2 endpoints) that, starting from any set of root tweets, builds the graph structure used as input to Graph Neural Network (GNN) models.

Figure: End-to-end data collection framework.

Figure: Illustration of a cascade (used to construct the graph). A cascade is the news diffusion tree formed by a source tweet that references a URL together with all of its retweets.
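
To make the cascade definition concrete, here is a minimal sketch of turning one cascade into a graph. It assumes each retweet already comes with the ID of the tweet it propagated from; the function name `build_cascade`, the tweet IDs, and the `(tweet_id, parent_id)` input format are all hypothetical and are not taken from the project code. The output is a two-row edge list (sources, targets), a layout that GNN libraries such as PyTorch Geometric commonly accept, though the project may store its graphs differently.

```python
from collections import defaultdict, deque


def build_cascade(source_id, retweets):
    """Build a diffusion tree from (tweet_id, parent_id) pairs.

    Returns a tweet-ID -> node-index map, an edge list as
    [sources, targets], and the depth of every node in the tree.
    """
    # Group tweets by the tweet they propagated from.
    children = defaultdict(list)
    for tweet_id, parent_id in retweets:
        children[parent_id].append(tweet_id)

    index = {source_id: 0}   # the source tweet is always node 0
    depth = {source_id: 0}
    src, dst = [], []

    # Breadth-first traversal from the source tweet.
    queue = deque([source_id])
    while queue:
        parent = queue.popleft()
        for child in children[parent]:
            if child in index:
                # Skip repeated IDs so the result stays a tree.
                continue
            index[child] = len(index)
            depth[child] = depth[parent] + 1
            src.append(index[parent])
            dst.append(index[child])
            queue.append(child)

    return index, [src, dst], depth


# Hypothetical cascade: "s" is the source tweet, "r1".."r4" are retweets.
retweets = [("r1", "s"), ("r2", "s"), ("r3", "r1"), ("r4", "r3")]
index, edge_index, depth = build_cascade("s", retweets)
print(edge_index)            # edges as [sources, targets]
print(max(depth.values()))   # depth of the cascade
```

Retweets whose parent cannot be reached from the source are simply left out of the graph in this sketch; a real pipeline would need to decide how to handle them.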

Then, I evaluated the performance of GNN models (implemented in PyTorch) on multiple social media datasets.

Figure: AUC for all models on each dataset using BERT features.

Figure: Accuracy for all models on each dataset using BERT features.

Figure: Precision for all models on each dataset using BERT features.
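
For readers unfamiliar with the three reported metrics, the sketch below shows how accuracy, precision, and AUC can be computed for a binary classifier using only the Python standard library. The labels and scores are made-up illustration values, not results from this project, and treating label 1 as "fake" with a 0.5 decision threshold is an assumption. The project itself may well rely on a library implementation instead.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the true labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def precision(labels, preds):
    """Of the items predicted positive, the fraction that truly are."""
    tp = sum(1 for y, p in zip(labels, preds) if p == 1 and y == 1)
    fp = sum(1 for y, p in zip(labels, preds) if p == 1 and y == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = fake (assumed positive class), 0 = real.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]      # model scores for class 1
preds = [int(s >= 0.5) for s in scores]       # assumed 0.5 threshold

print(accuracy(labels, preds))
print(precision(labels, preds))
print(roc_auc(labels, scores))
```

Accuracy and precision depend on the chosen threshold, while AUC is computed from the raw scores and does not.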