Evaluating the relevance between images, text, and hashtags on Instagram
Natural Language Processing
NLP, Embedding
Date
April 2019
Objective: Visualize how popular hashtags are used by Instagram users and assess their relevance to posted content
Dataset: 10,000 Instagram posts scraped using API Igram Scrapper
Tools and Technologies:
1. API Igram Scrapper for data collection
2. VGG16 pretrained model for image embeddings
3. 16-language text encoder for caption embeddings
4. Clustering algorithms: KMeans, Hierarchical (Agglomerative), DBSCAN, Birch, Spectral Clustering
5. t-SNE for visualization
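As a rough illustration of how the clustering algorithms and the 2-D projection listed above fit together, here is a minimal sketch using scikit-learn. It is not the project's actual code: the embeddings are synthetic stand-ins, and the number of clusters, `eps`, `gamma`, and `perplexity` values are assumptions chosen only so that the toy example runs.

```python
import numpy as np
from sklearn.cluster import (
    KMeans, AgglomerativeClustering, DBSCAN, Birch, SpectralClustering,
)
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for real image/caption embeddings:
# three groups of 20 points each in 32 dimensions.
centers = rng.normal(scale=5.0, size=(3, 32))
embeddings = np.vstack([c + rng.normal(size=(20, 32)) for c in centers])

n_clusters = 3  # assumed; the real number of themes is not known in advance

# One estimator per algorithm named in the tools list (parameters are illustrative).
algorithms = {
    "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
    "dbscan": DBSCAN(eps=10.0, min_samples=5),
    "birch": Birch(n_clusters=n_clusters),
    "spectral": SpectralClustering(n_clusters=n_clusters, gamma=0.01, random_state=0),
}

# Each algorithm assigns one cluster label per embedding.
labels = {name: algo.fit_predict(embeddings) for name, algo in algorithms.items()}

# Project the embeddings to 2-D so the clusters can be plotted.
coords = TSNE(
    n_components=2, perplexity=15, init="pca", random_state=0
).fit_transform(embeddings)

for name, assigned in labels.items():
    print(name, "found", len(set(assigned)), "distinct labels")
```

The `coords` array can be passed to any plotting library and colored by one of the label arrays to compare how the algorithms partition the same points.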
Methodology:
1. Data scraping and conversion to JSON format
2. Extraction of images, text, and hashtags
3. Generation of image and text embeddings
4. Application of various clustering algorithms
5. Frequent hashtag analysis within clusters
6. Visualization of clusters using t-SNE
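The hashtag extraction and frequent-hashtag steps above can be sketched with the Python standard library alone. This is an illustrative sketch, not the original pipeline: the `caption` field name, the sample posts, and the cluster labels are all hypothetical placeholders.

```python
import json
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(caption):
    """Return the lowercase hashtags in a caption, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(caption)]

def top_hashtags_per_cluster(posts, labels, k=3):
    """Count hashtags per cluster label and return the k most frequent per cluster."""
    counts = defaultdict(Counter)
    for post, label in zip(posts, labels):
        counts[label].update(extract_hashtags(post["caption"]))
    return {label: counter.most_common(k) for label, counter in counts.items()}

# Hypothetical scraped posts in JSON form; the "caption" key is an assumption.
raw = json.dumps([
    {"caption": "Beach cleanup today #recycling #environment"},
    {"caption": "Sort your plastics! #Recycling #zerowaste"},
    {"caption": "Lentil curry for dinner #vegan #food"},
    {"caption": "Tofu scramble #vegan #breakfast"},
])
posts = json.loads(raw)
labels = [0, 0, 1, 1]  # placeholder cluster labels, one per post

summary = top_hashtags_per_cluster(posts, labels, k=1)
print(summary)  # {0: [('recycling', 2)], 1: [('vegan', 2)]}
```

Comparing each cluster's most frequent hashtags with the theme of its posts is one simple way to spot hashtags that are popular but only loosely related to the content.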
Key Findings:
Text embeddings provided more meaningful results than image embeddings
Identified thematic clusters including environment, climate activism, nature photography, recycling, food, climate crisis, feminism, LGBTQ+ issues, and veganism
Outcome: Successfully demonstrated potential discrepancies between popular hashtags and actual post content, providing insights into user behavior on Instagram
Potential Applications: Methodology can be applied to larger datasets or other social media platforms for broader insights into hashtag usage patterns and content relevance
This project showcases skills in data scraping, preprocessing, machine learning, natural language processing, and data visualization, while addressing a relevant issue in social media content analysis.
Video explanation: https://youtu.be/C0sn-s9VFvg

