Automated Visual Clustering: A Technique for Image Corpus Exploration and Annotation Cost Reduction


Images are an efficient and effective form of communication that is increasingly prevalent on social media and other platforms. Large-n analyses of these images are necessary if we are to fully understand the conditions under which visuals influence political attitudes and behavior. Images are also a valuable source of information on a range of social science phenomena. Despite great advances in computer vision, state-of-the-art algorithms are not trained to recognize many features that are of interest to social scientists (such as emotional responses to images). Researchers must therefore invest in costly manual annotation. We propose an approach for reducing these annotation costs. Specifically, we use fine-tuned embeddings from a pre-trained convolutional neural network and k-means clustering to first group similar images in a large corpus. We then randomly sample images from each cluster, label them, and propagate those labels to all of the remaining images in the cluster. We demonstrate the method using a corpus of images drawn from tweets using the #FamiliesBelongTogether hashtag. This method makes large-n analysis of images considerably less costly for social scientists.
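
The cluster, sample, label, and propagate steps can be illustrated with a minimal sketch. It is not the authors' implementation: the image embeddings are assumed to be precomputed (here replaced by synthetic 2-D points), a small pure-Python k-means stands in for a library implementation, and the helper names (`kmeans`, `propagate_labels`, `annotate`) and the toy labels are hypothetical. Propagating the majority label of the sampled images is also an assumption; the abstract does not say how disagreements within a sample are resolved.

```python
import random
from collections import Counter


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
    )


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(n_iter):
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def propagate_labels(assign, annotate, n_per_cluster=3, seed=0):
    """Manually label a random sample per cluster, then propagate.

    Assumption: the remaining images receive the majority label of the
    sampled images; sampled images keep their own manual label.
    """
    rng = random.Random(seed)
    clusters = {}
    for idx, c in enumerate(assign):
        clusters.setdefault(c, []).append(idx)

    labels = [None] * len(assign)
    n_annotated = 0
    for members in clusters.values():
        sampled = rng.sample(members, min(n_per_cluster, len(members)))
        manual = {i: annotate(i) for i in sampled}  # the only manual work
        majority = Counter(manual.values()).most_common(1)[0][0]
        n_annotated += len(sampled)
        for i in members:
            labels[i] = manual.get(i, majority)
    return labels, n_annotated


# Toy stand-in for image embeddings: two well-separated groups of 20 points.
data_rng = random.Random(42)
truth = ["protest"] * 20 + ["portrait"] * 20  # hypothetical labels
embeddings = (
    [[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(20)]
    + [[data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)] for _ in range(20)]
)

assign = kmeans(embeddings, k=2)
# The annotator is simulated by looking up the true label.
labels, n_annotated = propagate_labels(assign, annotate=lambda i: truth[i])
print(f"annotated {n_annotated} of {len(labels)} images")
```

In this toy setting only the sampled images are labeled by hand, and every other image inherits a label from its cluster. With real embeddings the clusters will not be perfectly homogeneous, so the number of clusters and the per-cluster sample size trade annotation cost against label noise.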