rda-p8-poster-lanius

My dissertation research uncovers moments of interpretation and argumentation in big social data projects by charting their research designs. As part of my study, I surveyed authors from the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. One major result was a picture of their research data sharing practices, including the sharing, soliciting, and obfuscating of data at different points in the research process. I solicited respondents whose analysis used a large-scale social media data set, selecting 34 projects and 90 authors. Four types of data emerged: the original social media data set, the final processed data set, a ground truth data set collected from another source, and classification/training data produced as a byproduct of the analysis. These data sets were valued differently and shared at different rates. The original data is usually described only by its volume, with other context rarely mentioned. It is also very unlikely to be shared, because it is viewed as a homogeneous and easily accessible resource. The final analyzed data is only occasionally shared, which limits the ability of other researchers to reproduce or test the results. Ground truth data, the known and trusted data used for validating inferences, is highly valued and sought out. The need is high, but options are limited by access to collaborators and by a lack of topical, public data sets. Classification data, which is used to train the model, is also very rarely shared. At issue is the ability to reproduce results, improve research, and share efforts so that data does not need to be recreated for each project. Current barriers to sharing research data in the big social data community include cost, culture, and outdated regulations surrounding human-centered data. My research examines these barriers in specific settings so that work can begin on removing them.

 
