I was just awarded the Founders Award of Excellence from RPI! The institute’s description of the award is below.
The Founders Award of Excellence was established in 1994 to honor students who embody qualities of creativity, discovery, leadership, and the values of pride and responsibility at Rensselaer. The award consists of a special certificate, recognition by faculty, staff, and peers at the Honors Convocation ceremony, and a cash prize. Approximately 70 wonderful graduate or undergraduate students – about one percent of our student population – are honored each year as winners of Rensselaer’s Founders Award of Excellence.
Nominees for the Founders Award of Excellence should demonstrate all of the characteristics displayed below:
- Strong academic performance. Candidates should be in the top 10% of their respective class.
- Pride and responsibility in all aspects of her or his life at Rensselaer.
- Outstanding leadership skills exhibiting discretion, judgment, and well-rounded regard for the opinions of others.
- Originality and imagination that may be evidenced by the potential to solve problems and possess skills to promote new ideas and theories in his or her field of study.
My dissertation research uncovers moments of interpretation and argumentation in big social data projects by charting their research designs. As part of my study, I surveyed members of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. One major result was the discovery of their research data sharing practices, including sharing, soliciting, and obfuscation of data at different points in the research process. Respondents were solicited if they used a large-scale social media data set as part of their analysis, with 34 projects and 90 authors selected. The four types of data found are the original social media data set, the final processed data set, the ground truth data set collected from another source, and the classification/training data produced as a byproduct of the analysis. These data sets were valued differently and shared at different rates. The original data is only described by its volume, with other context rarely mentioned. It is also very unlikely to be shared, because it is viewed as a homogeneous and easily accessible resource. The final analyzed data is also only occasionally shared, which limits the ability of other researchers to reproduce or test the results. Ground truth data, the known and trusted data used for validating inferences, is highly valued and sought out. The need is high, but options are limited by access to collaborators and a lack of topical, public data sets. Classification data is a byproduct of analysis, used to train the model, and is also very rarely shared. At issue is the ability to reproduce results, improve research, and share efforts so that data does not need to be reinvented for each project. Current barriers to sharing research data among the big social data community include cost, culture, and outdated regulations surrounding human-centered data. My research helps to understand these barriers in specific settings so that they can be removed.
The IDF will conclude with a view from the next generation of data professionals. As data becomes more ubiquitous and fundamental for every aspect of life, how will our responsibilities and opportunities evolve? How does the current generation of digital natives see the impact of data on their work lives and the broader society? What does the horizon look like to them? The session will provide a glimpse of the views of next generation professionals about the brave new world of data.
Moderator: Francine Berman, Co-Chair, RDA Council; RDA/US Chair; Hamilton Distinguished Professor, Rensselaer Polytechnic Institute
- Candice Lanius, Rensselaer Polytechnic Institute, Alliance of Digital Humanities Organizations
- Xiaogang (Marshall) Ma, University of Idaho
- D. Sarah Stamps, Virginia Tech
- Henri Tonnang, Global Young Academy
Transcript has been edited slightly for clarity and only includes personally relevant sections.
Moderator Introduction: I am Fran Berman from the Research Data Alliance. One thing that struck me throughout the day is that this audience is amazingly accomplished and intergenerational. I really enjoyed seeing people all around the room from so many different professional venues and at different places in their careers. The people on stage are very accomplished, but they have seniority. We thought we would end the day by showcasing early career professionals who are also incredibly accomplished to get their spin on what’s going on in the data world and what they’re looking forward to. So, without further ado, I am going to introduce them and share what to expect from the panel. The first thing is this is going to be a very conversational panel, and there will be time for questions at the end. We will introduce the panelists, they will tell you a little bit about themselves, we will have a conversation here on a variety of topics—including data for the public good, data and ethics, and data driven research—and then we will open it up for questions.
[5:00] Fran Berman: Candice, we are going to jump to you because you are the next person on my card. Candice is a PhD Candidate in Communication and Media at Rensselaer Polytechnic Institute. An institution I know well. Candice’s dissertation is a sociological study of the research design and communication practices of big social data practitioners. Those are data scientists, engineers, and computer scientists who use large scale social media data sets to answer questions about science, society, and individuals. Candice is also co-chair of the RDA interest group on ethics and social aspects of data sharing. She just completed a term as a communications fellow for the Alliance of Digital Humanities Organizations. Which, we will have to hear about. Candice is really concerned with empowering users to respond to the changing technology landscape, and she has done all kinds of interesting work in that area. One was a recent paper “Telling the Quants from the Quacks: Evaluating Statistical Arguments in Debates Online.” So tell us a little about that paper and the themes in your research.
Candice Lanius: Thank you, Fran. I think she just selected that because she loves the title. In that paper [Telling the Quants from the Quacks], what I address is how, with open science and open data, you really start to have quality issues, particularly when you are looking at online communities who are not just quoting science anymore, they are starting to try to emulate that process themselves. The evaluation standards do not import from our own scientific disciplines: you can’t do peer-review, and you don’t necessarily have their process notes, so we are looking at how those [results] are arguments in a different sense. In this case, I was looking at climate change skepticism compared to those that recognize that climate change is real, and how the data from one source is being used differently by both communities. From the outside looking in, we can evaluate what they are doing to see which one is the “quant” and which one is the “quack.”
A grand round at Albany Medical Center on September 7, 2016, from 11:00am – 12:00pm, presented by Candice Lanius, a doctoral candidate in the Dept. of Communication and Media at Rensselaer Polytechnic Institute.
Description: The quantified self movement promises data-driven perfection to anyone willing to go beyond their own meager perceptive abilities by recording every aspect of their daily life. Self-tracking involves meticulously recording behaviors over a period of time, then analyzing them for patterns that will point towards optimization strategies. Inputs (such as hours of sleep, nutrition, and exercise) are correlated to mood, health, and socialization, among other outputs. For the QS movement, any item that can be captured as queryable data is recorded, processed, and scrutinized for insights into the individual’s life. With the ubiquity of mobile sensing technologies (e.g. smartphones and other sensors in the ‘internet of things’), self-tracking has taken the stage as a potential gold mine for the medical profession. Rather than asking the patient to remember their symptoms and activities while collecting a patient history, self-tracking allows the patient to record symptoms and experiences as they occur, removing the weakness of human memory from the equation. While self-tracking is a promising new avenue for scientific discovery and individual treatment, like all technologies, self-tracking introduces new challenges for both the scientific community and individuals using self-tracking as part of a therapeutic intervention.
This talk introduces a rhetorical framework for cataloguing the changes that a self-tracking strategy exerts on the individual, thereby helping practitioners understand and avoid the unforeseen consequences of hastily implementing this new technology. Rhetoric, when applied to situations where patients use mobile sensor technologies to record their behavior, uncovers how the act of recording behavior can fundamentally alter it. By walking through a series of examples, from nutrition tracking to mood analysis, this talk explores how tracking apps change how we communicate experiences and the type of trust that informs the patient-doctor relationship. A starting point is asking what, precisely, the self-tracking apps record to understand the limitations for their use in screening and treatment. While self-tracking apps can be helpful, they also contain moments of tension or “anxieties” where their design and function interfere with daily life. A balanced understanding of these tradeoffs will assist in future design and deployment of self-tracking technologies for therapeutic goals.
Below are additional examples for my 2015 ACM/IEEE International Conference on Advances in Social Networks Analysis and Mining paper “Arguments and Interpretation in Big Social Data Analysis: A Survey of the ASONAM Community.”
B. Argumentation Theory and Research Design
The proposed framework for evaluation looks at research design, the strategic plan built by the researcher to coherently and logically organize the research process. An ideal research design provides a technical roadmap for the researcher to collect and analyze their data, and it also ensures that the research addresses the problem successfully. To put this another way, the research design is the planned route to keep human errors from affecting the results. Components of most research designs include performing a literature review for others’ work on the topic, the proposal of research questions, the identification of data, a plan for collection and processing, and a method for analysis.
Big social data analysis complicates traditional notions of research design because the data exist independently of the research project and prior to the formulation of a research question. Due to this obstacle, I propose that we consider research designs as more than technical roadmaps: research designs are also arguments. By treating them as arguments, we can create standards for evaluation of components of the plan as propositions. The evaluation of research plans as arguments allows for the production of the best work possible by facilitating the explicit consideration of alternative explanations.
During the research process, there are numerous moments of interpretation where the researcher selects from a range of appropriate alternatives. In these moments, selecting the right or wrong answer over-simplifies the situation. The survey of ASONAM participants uncovers interpretive moments to evaluate them as arguments: while there is not a right or wrong answer, there are better answers that more completely or accurately address the problem space.
Argumentation theory provides a structure to understand how research designs function as arguments. Toulmin’s model for addressing formal arguments is composed of data, warrant, claim, ground, backing, and qualifier. Claims are the final conclusions, and warrants are what link data and the ground to a claim. The ground, which can often overlap with the data, is the basis for using a specific type of data. The ground is the definitions and theory where most arguments begin. The backing is additional support for an argument that bolsters unexpected or counter-intuitive claims. Finally, qualifiers condition when the claim should be accepted (e.g. “if x, then y”) or provide the strength of belief in its veracity (e.g. “sometimes x occurs”). These constituent parts can be found in big social data research designs, and by charting the arguments using this model, they can be evaluated and improved.
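For readers who think in code, the six components named above can be sketched as a small record type. This is only an illustrative sketch, not the charting method used in the paper: the class name `ToulminArgument`, the helper `missing_components`, and the example research design are all hypothetical and invented here for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ToulminArgument:
    """One research design charted with the six components named in the text.

    The class and field layout are assumptions made for this sketch.
    """
    data: Optional[str] = None       # the evidence the design draws on
    ground: Optional[str] = None     # definitions and theory behind using this type of data
    warrant: Optional[str] = None    # what links the data and the ground to the claim
    claim: Optional[str] = None      # the final conclusion
    backing: Optional[str] = None    # extra support for unexpected or counter-intuitive claims
    qualifier: Optional[str] = None  # when to accept the claim, or how strongly to believe it

    def missing_components(self) -> List[str]:
        """Return the names of components that have not been charted yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical research design, invented purely for illustration.
example = ToulminArgument(
    data="public posts collected from a social media platform",
    ground="a theory that posting behavior reflects offline mood",
    warrant="a classifier that maps post text to mood labels",
    claim="aggregate mood rises on weekends",
)

print(example.missing_components())  # ['backing', 'qualifier']
```

Charting a design this way makes the gaps visible: in the hypothetical example, no backing or qualifier has been recorded, which is the kind of omission an evaluation of the design as an argument would flag.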
Fig. 1 is an example of the argumentation framework applied to the research design of Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.
The backing for the ground, that collective mood states may impact systemic decisions, is italicized because it is a proposition that only logically supports the research plan after the technical demonstration of the model. The ground emerges from behavioral economics, borrowing strength from a well-established observational discipline. Ultimately, the technical aspects are sophisticated and performed without error, and the qualifier maintains reasonable expectations for the results. In this case, the research as an argument is very persuasive to the community: it has been implemented in numerous real-world applications and cited over 2,400 times.
Fig. 2: Chatterjee, A., & Perrizo, W. (2015, August). Classifying stocks using P-Trees and investor sentiment. In 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 1362-1367). IEEE.
Rhetoric Society of America Biennial Meeting in Atlanta, Georgia.
Paper Title: Topoi of Mathematical Statistics: The Creation of Argumentative Facts by Data Scientists
Presentation: RSA16 Topics of mathematical statistics, Lanius
Abstract: Data scientists, who frequently rely on massive amounts and varied types of social media data, create insight and value using complex mathematics and engineering methods. This paper introduces a new, rhetorically motivated, approach to understanding results as argumentative facts: results which are both technical and interpretive objects. Building from Leff’s (1983) work on systems of topical invention and Dreyfus & Eisenberg’s (1986) article On the Aesthetics of Mathematical Thought, I argue that the mathematical aesthetic principles of conciseness, simplicity, clarity, structure, power, cleverness, and surprise are not simply secondary considerations employed after a statistical analysis is performed. These principles are used by data scientists as procedural topics that connect their data to compelling claims (Toulmin, 1958). The system is demonstrated using contemporary examples from big social data analysis. Ultimately, a system of topics increases awareness of different governing logics so that data scientists can violate disciplinary norms during the inferential process to make stronger arguments.
Keywords: Data science, big social data, argumentation, interpretation, topical invention, mathematical aesthetics, procedural rhetoric
Today, I had the opportunity to present a part of my dissertation research to the Department of Psychiatry at Albany Medical Center. The journal club was a good place to present to a different audience (in this case, domain experts in psychology!) and discuss ramifications and potential outcomes. I look forward to presenting a full grand round at Albany Medical Center in September.
The winners of RPI’s writing contest were announced last night, and I placed first in the graduate student essay category, winning $300, with my essay “Finding Rhetorical Agency in the Data Science Machine: Understanding Emerging Climate Change Arguments from Automated Data Modeling.”
Last year I came in second with my essay “The Path of Least Resistance: An exploration of non-human agency in a workplace survey.”
I am noticing a trend… despite having different contest judges both years, they appear to like papers that discuss technology and agency.
Congratulations to the other winners!
After passing my Prospectus Defense in December, my PhD project was suddenly real: I must now actually do primary research and write a several-hundred-page document. The best advice I have gotten so far is to take it one step at a time. So here is a progress report on Stage 1: The Survey.
I submitted my survey text to Rensselaer’s Institutional Review Board in January and received an exemption: “45CFR46.101(b)(2): Anonymous Surveys – No Risk”. Since my survey is anonymous and does not harm any of the respondents, I am cleared for action.
My target community is big social data researchers in the United States working out of academic institutions. For the first round of invitations, I have been using the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining proceedings to solicit respondents. Everyone has been very polite and interested in my work, and I have had a solid 20% response rate. My goal is to get at least 20 responses, but I will continue collecting through the end of March.
For this initial survey, I am interested in how data scientists use interpretation to complete their projects and how they communicate their results to their audience. My survey questions focus on a few key themes. First, I was interested in how respondents understood their disciplinary role and why they became interested in big social data. Next, I asked about interpretation: how they decided on research questions and generated explanations for their results. If they changed their research questions mid-way through the analysis, I also wanted to know what steps they took to ensure accuracy. Then, I turned to technical aspects of the process, asking what steps they took and how they handled false negatives and false positives. Finally, I asked about communicating results persuasively and to a target audience. The preliminary results look promising, and I personally find them fascinating!
In case anyone is particularly interested, here are the exact questions. The bulk of them are directed at the researcher’s specific project they submitted to the ASONAM conference.