From AI to Religion: Analyzing ~300,000 Reddit Comments to See What the World’s Talking About 🌎

I analyzed Reddit comments from country-specific subreddits to uncover global discussions on AI, Religion, and Politics. By scraping, translating non-English content, and matching keywords, I explored what citizens from various countries are talking about.

In Week 3 of my Project52 initiative, I tackled a fascinating problem: which countries are discussing AI, Religion, and Politics on Reddit? I broke down the entire task into three critical steps: scraping data from country-specific subreddits, translating non-English content into English, and finally, visualizing the results to identify global trends. Here’s a detailed look at how I approached each step, with technical explanations and code snippets along the way.

🛠️ Part 1: Scraping the Data from Reddit

The first step was to scrape Reddit data. Reddit has a vast number of subreddits dedicated to countries, so I focused on scraping posts and comments from subreddits like r/canada, r/india, r/usa, and others.

Tech Used:

  • PRAW (Python Reddit API Wrapper): This Python library allows easy interaction with Reddit’s API, enabling us to scrape data from various subreddits.

  • Rate Limiting Management: Reddit’s API limits how frequently you can send requests. I called time.sleep() between requests to pace them and avoid hitting those limits.

Code screenshot (image not available): setting up the PRAW client and fetching data from Reddit.

In this part of the code, I initialized the Reddit API client with the necessary credentials and defined the subreddits to scrape. The code loops through each subreddit, fetching the title, text, number of comments, and post creation time for each submission. I used time.sleep(2) to add a delay between requests to handle Reddit’s rate limits.
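A minimal sketch of the steps described above, not the original code. The function names (`make_client`, `fetch_submissions`), the choice of the `hot` listing, the `limit` value, and sleeping once per subreddit are assumptions for illustration; the fields read from each submission (`title`, `selftext`, `num_comments`, `created_utc`) are standard PRAW submission attributes.

```python
import time
from datetime import datetime, timezone

# Subreddit names mentioned in the post; the full list is not given.
SUBREDDITS = ["canada", "india", "usa"]


def make_client(client_id, client_secret, user_agent):
    """Create a read-only PRAW client (hypothetical helper)."""
    import praw  # third-party: pip install praw

    return praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )


def fetch_submissions(reddit, subreddit_names, limit=100, delay=2.0):
    """Collect basic fields for each submission in each subreddit."""
    rows = []
    for name in subreddit_names:
        # Assumption: the "hot" listing; the post does not say which one was used.
        for submission in reddit.subreddit(name).hot(limit=limit):
            rows.append(
                {
                    "subreddit": name,
                    "title": submission.title,
                    "text": submission.selftext,
                    "num_comments": submission.num_comments,
                    "created": datetime.fromtimestamp(
                        submission.created_utc, tz=timezone.utc
                    ),
                }
            )
        # Pause between subreddits to stay well under the API rate limit.
        time.sleep(delay)
    return rows
```

With real credentials this would be called as `fetch_submissions(make_client(...), SUBREDDITS)`. Passing the client in as an argument keeps the loop easy to try out with a stub object before spending API requests.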

🌍 Part 2: Translating the Content

Given that Reddit is a global platform, posts and comments can be in multiple languages. To make sure that all content is correctly processed, I translated non-English content into English. This was crucial for keyword matching and analysis.

Tech Used:

  • deep-translator: A Python library for translating non-English content into English. It supports a wide variety of languages, including French, German, and Arabic.

  • Google Translate: As a backup, I used the Google Translate API for certain edge cases.

Code screenshot (image not available): the translation function that converts all non-English posts to English.

This function detects the language of the text using langdetect and uses Google Translator to convert it to English if it isn’t already. The translation process ensures that content in languages such as French, Arabic, and German is captured in English for further analysis.
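The detect-then-translate logic described above could look roughly like the sketch below. It is an illustration under assumptions, not the original function: `to_english` and `default_backends` are hypothetical names, and the choice to return the original text on any failure is mine. The library calls themselves (`langdetect.detect` and deep-translator's `GoogleTranslator(source="auto", target="en").translate`) are the real entry points of those packages.

```python
def to_english(text, detect, translate):
    """Return `text` in English, or unchanged if it is already English.

    `detect` maps text to a language code such as "en" or "fr";
    `translate` maps text to its English translation.
    """
    if not text or not text.strip():
        return text  # nothing to detect or translate
    try:
        if detect(text) == "en":
            return text
        return translate(text) or text
    except Exception:
        # Assumption: keep the original text if detection or translation
        # fails (for example on emoji-only comments or network errors).
        return text


def default_backends():
    """Build the real detector and translator (hypothetical helper)."""
    from langdetect import detect  # third-party: pip install langdetect
    from deep_translator import GoogleTranslator  # pip install deep-translator

    translator = GoogleTranslator(source="auto", target="en")
    return detect, translator.translate
```

Typical use would be `detect, translate = default_backends()` followed by `to_english(comment, detect, translate)` for each comment. Keeping the two backends as parameters means a different translation service can be swapped in without touching the fallback logic.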

📊 Part 3: Visualizing the Data

After gathering the scraped and translated data, I focused on visualizing the number of keyword mentions for AI, Religion, and Politics across countries.

Tech Used:

  • Pandas: For data manipulation (counting keywords, aggregating data by subreddit).

  • Matplotlib: For creating visualizations such as bar charts and scatter plots.

Code screenshot (image not available): counting keyword mentions and visualizing the results.

In the visualization step, I used Pandas to aggregate the counts of AI, Religion, and Politics mentions for each subreddit. The bar charts display the keyword mentions for each country, with light blue bars for AI, light coral for Religion, and light green for Politics.
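As a rough sketch of this aggregation and plotting step (not the original code), the example below assumes each record already carries per-topic counts under the hypothetical column names `AI`, `Religion`, and `Politics`, plus a `subreddit` column. The three colors match the ones named above; the figure size, output path, and one-panel-per-topic layout are assumptions.

```python
import pandas as pd

# Bar colors per topic, as described in the post.
TOPIC_COLORS = {
    "AI": "lightblue",
    "Religion": "lightcoral",
    "Politics": "lightgreen",
}


def aggregate_mentions(records):
    """Sum per-record topic counts into one row per subreddit."""
    df = pd.DataFrame(records)
    return df.groupby("subreddit")[list(TOPIC_COLORS)].sum()


def plot_mentions(totals, path="mentions.png"):
    """Draw one bar chart per topic and save the figure to `path`."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(TOPIC_COLORS), figsize=(15, 4))
    for ax, (topic, color) in zip(axes, TOPIC_COLORS.items()):
        totals[topic].sort_values(ascending=False).plot.bar(ax=ax, color=color)
        ax.set_title(f"{topic} mentions")
        ax.set_xlabel("Subreddit")
        ax.set_ylabel("Keyword mentions")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_mentions(aggregate_mentions(records))` writes a single image with three panels. Sorting each panel independently makes the top subreddits for each topic easy to read off.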

🧭 Results

🔍 Key Insights

  • AI Mentions: USA and Germany emerged as the most prominent countries for AI-related discussion, which may reflect the growing focus on technology and innovation in these regions.

  • Religion Mentions: India, Saudi Arabia, and Turkey were the highest in religion-related content, reflecting how deeply religion is embedded in their public and political discourse.

  • Politics Mentions: India, USA, and Brazil were highly active in discussing politics, likely due to the ongoing elections and political movements in these countries.

📈 Data Processed & Analyzed

The dataset contained 281,624 comments across these subreddits, giving a broad look at global discussions on AI, Religion, and Politics. By counting keyword mentions in each post and comment, I was able to quantify how much each topic is discussed in each country's subreddit.
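The keyword lists used for the analysis are not given, so the sketch below uses small made-up lists purely to illustrate the counting idea. `KEYWORDS`, `compile_patterns`, and `count_mentions` are hypothetical names. Whole-word matching is an assumption too, chosen so that a short keyword like "ai" does not match inside words such as "said".

```python
import re

# Illustrative keyword lists only; the lists used in the project are not given.
KEYWORDS = {
    "AI": ["ai", "artificial intelligence", "machine learning"],
    "Religion": ["religion", "god", "church", "mosque", "temple"],
    "Politics": ["politics", "election", "government", "parliament"],
}


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word regex per topic."""
    patterns = {}
    for topic, terms in keywords.items():
        # Longest terms first so multi-word phrases are tried before short ones.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[topic] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


PATTERNS = compile_patterns(KEYWORDS)


def count_mentions(text):
    """Return the number of keyword matches per topic in `text`."""
    text = text or ""  # treat missing text as empty
    return {topic: len(pattern.findall(text)) for topic, pattern in PATTERNS.items()}
```

Running `count_mentions` over every translated post and comment yields per-record counts that can then be summed by subreddit.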

🚧 Challenges Faced

  • Language Diversity: Dealing with various languages in Reddit posts required accurate translation to ensure nothing was missed.

  • API Rate Limiting: Reddit’s API rate limits were handled with delays to ensure smooth data collection without errors.

  • Data Processing: Extracting accurate counts from long text posts, managing missing data, and processing large amounts of content posed challenges, but were managed efficiently.

This was a multi-faceted project that combined data scraping, real-time translation, and data visualization. By dividing the task into these key parts, I was able to analyze global trends in discussions around AI, Religion, and Politics across different countries.

As part of my Project52 initiative, I continue to explore innovative ways to tackle AI, machine learning, and data analysis problems, and I’ll be back next week with another exciting project!

📣 Let’s Connect

Feel free to connect with me if you’re interested in AI, Data Analytics, or Natural Language Processing. 🚀 I’m always looking for new collaborations and insights!