Twitter research for social scientists: A brief introduction to the benefits, limitations and tools for analysing Twitter data


Javier Ruiz-Soler, European University Institute



The analysis of social media has become important because of the unprecedented quantity of information these platforms generate. Twitter is becoming an indispensable source of information for researchers aiming to use big data in their projects. However, despite the field of research opened up by Twitter data, it carries risks of which a researcher must be aware. In this paper I present, on the one hand, the benefits and caveats of research conducted on Twitter and, on the other hand, the constraints of Twitter data collected from the Application Programming Interfaces (APIs). Three major methodological problems are identified: (i) representation bias: it is very difficult to generalise from research based on Twitter; (ii) language challenge: users can write in many different languages, so some caution is needed when collecting data in order to gather accurately the data we need; (iii) data bias: depending on the data needed, one API might be a better fit than another. The main aim of this paper is to discuss these methodological constraints from a theoretical point of view. I propose, as a starting point, possible solutions to overcome them, or at least to reduce their impact on the research.

