Evaluating the relevance between images, text, and hashtags on Instagram

Natural Language Processing

NLP, Embedding

Date

April 2019

Objective: Visualize how popular hashtags are used by Instagram users and assess their relevance to posted content

Dataset: 10,000 Instagram posts scraped using API Igram Scrapper

Tools and Technologies:
1. API Igram Scrapper for data collection
2. VGG16 pretrained model for image embeddings
3. 16-language text encoder for caption embeddings
4. Clustering algorithms: KMeans, Hierarchical (Agglomerative), DBSCAN, Birch, Spectral Clustering
5. t-SNE for visualization
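
As a rough illustration of how the listed clustering algorithms and t-SNE fit together, the sketch below runs all five scikit-learn clusterers on one embedding matrix and then projects it to 2D. It is not the project's actual code: the embeddings are synthetic stand-ins generated with make_blobs, and the cluster count, the DBSCAN eps value, and the t-SNE perplexity are assumptions chosen only so the example runs.

```python
from sklearn.cluster import (
    AgglomerativeClustering,
    Birch,
    DBSCAN,
    KMeans,
    SpectralClustering,
)
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Stand-in for real image or caption embeddings (assumption): 200 "posts",
# each a 32-dimensional vector, drawn from 4 synthetic groups.
embeddings, _ = make_blobs(n_samples=200, n_features=32, centers=4, random_state=0)

N_CLUSTERS = 4  # hypothetical; the project does not state the value it used

models = {
    "kmeans": KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=N_CLUSTERS),
    # DBSCAN infers the number of clusters itself; eps is a guess for this data.
    "dbscan": DBSCAN(eps=9.0, min_samples=5),
    "birch": Birch(n_clusters=N_CLUSTERS),
    "spectral": SpectralClustering(
        n_clusters=N_CLUSTERS, affinity="nearest_neighbors", random_state=0
    ),
}

# One label per post for every algorithm (DBSCAN marks noise points as -1).
labels = {name: model.fit_predict(embeddings) for name, model in models.items()}

# 2D coordinates for plotting; each point can be colored by any label set.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
```

Each entry in labels can be passed as the color argument of a scatter plot of coords to compare how the algorithms partition the same embeddings. On well-separated synthetic data, SpectralClustering may warn that the neighbor graph is not fully connected.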

Methodology:
1. Data scraping and conversion to JSON format
2. Extraction of images, text, and hashtags
3. Generation of image and text embeddings
4. Application of various clustering algorithms
5. Frequent hashtag analysis within clusters
6. Visualization of clusters using t-SNE

Key Findings:
1. Text embeddings provided more meaningful results than image embeddings
2. Identified thematic clusters including environment, climate activism, nature photography, recycling, food, climate crisis, feminism, LGBTQ+ issues, and veganism

Outcome: Successfully demonstrated potential discrepancies between popular hashtags and actual post content, providing insights into user behavior on Instagram

Potential Applications: The methodology can be applied to larger datasets or other social media platforms for broader insights into hashtag usage patterns and content relevance

This project showcases skills in data scraping, preprocessing, machine learning, natural language processing, and data visualization, while addressing a relevant issue in social media content analysis.

Video explanation: https://youtu.be/C0sn-s9VFvg

© 2022 by Advanced Predictive Analytics with Bhavya. All rights reserved.
