The IDF will conclude with a view from the next generation of data professionals.  As data becomes more ubiquitous and fundamental for every aspect of life, how will our responsibilities and opportunities evolve?  How does the current generation of digital natives see the impact of data on their work lives and the broader society?  What does the horizon look like to them?  The session will provide a glimpse of the views of next generation professionals about the brave new world of data.


Moderator: Francine Berman, Co-Chair, RDA Council; RDA/US Chair; Hamilton Distinguished Professor, Rensselaer Polytechnic Institute


Transcript has been edited slightly for clarity and only includes personally relevant sections.

Moderator Introduction: I am Fran Berman from the Research Data Alliance. One thing that struck me throughout the day is that this audience is amazingly accomplished and intergenerational. I really enjoyed seeing people all around the room from so many different professional venues and at different places in their careers. The people on stage are very accomplished, but they have seniority. We thought we would end the day by showcasing early career professionals who are also incredibly accomplished to get their spin on what’s going on in the data world and what they’re looking forward to. So, without further ado, I am going to introduce them and share what to expect from the panel. The first thing is that this is going to be a very conversational panel, and there will be time for questions at the end. We will introduce the panelists, they will tell you a little bit about themselves, we will have a conversation here on a variety of topics—including data for the public good, data and ethics, and data-driven research—and then we will open it up for questions.

[5:00] Fran Berman: Candice, we are going to jump to you because you are the next person on my card. Candice is a PhD Candidate in Communication and Media at Rensselaer Polytechnic Institute, an institution I know well. Candice’s dissertation is a sociological study of the research design and communication practices of big social data practitioners: data scientists, engineers, and computer scientists who use large-scale social media data sets to answer questions about science, society, and individuals. Candice is also co-chair of the RDA interest group on ethics and social aspects of data sharing, and she just completed a term as a communications fellow for the Alliance of Digital Humanities Organizations, which we will have to hear about. Candice is really concerned with empowering users to respond to the changing technology landscape, and she has done all kinds of interesting work in that area. One example is a recent paper, “Telling the Quants from the Quacks: Evaluating Statistical Arguments in Debates Online.” So tell us a little about that paper and the themes in your research.

Candice Lanius: Thank you, Fran. I think she just selected that because she loves the title. In that paper [Telling the Quants from the Quacks], what I address is how, with open science and open data, you really start to have quality issues, particularly when you are looking at online communities that are not just quoting science anymore; they are starting to try to emulate that process themselves. The evaluation standards do not import from our own scientific disciplines: you can’t do peer review, and you don’t necessarily have their process notes, so we are looking at how those [results] are arguments in a different sense. In this case, I was looking at climate change skepticism compared to those who recognize that climate change is real, and how the data from one source is being used differently by both communities. From the outside looking in, we can evaluate what they are doing to see which one is the “quant” and which one is the “quack.”

Fran: All day today people have been talking about trust, which is a really important piece of online communication. Do you have ways of measuring trust and authenticity when you are looking at these kinds of issues?

Candice: Because I hope to go into academia and be an educator—the younger the better—my site of intervention is education: training students in this type of critical literacy so that when they interact with data, insights, and the scientific method, they know what to look for when they are reading materials. I am against top-down evaluation, and I also think it is no longer possible in online communities to have a top-down, scrutinized system that would filter out the good from the bad. So I am really about empowering individuals to determine for themselves what’s good and what’s bad.

[25:00] Fran: So, talking about the hope ahead, and transitions: As you all know, we are less than two months away from a very important election in the United States, where things will change one way or another. So Candice, if you could give advice to the next U.S. administration about data-driven research and data technologies, what advice would you give them?

Candice: Oh boy… I think the most important thing is continuity of funding, which is no surprise to anyone in this room. But many people don’t understand how much infrastructure and work goes into all we have currently. Having that continuity of funding or effort is so important. As a newcomer to RDA several years ago, I listened to conversations and heard so many horror stories of individuals saying “Do you remember that platform?”, “Oh, it stopped 3 years ago…”; or “Do you remember that effort?” There was so much effort invested, and it didn’t end up going anywhere. Those are the most tragic moments that stick out in my mind. So I would really emphasize that we have made so much progress, but if we stop, the technology doesn’t simply stick around; you have to continue with whatever it is.

Fran: Many of our audience members from the government sector—thank you all for coming; it is really important that you are here—will tell you they don’t have buckets of money; sometimes they have thimbles of money. What they often have is a huge bully pulpit. So how would you advise the next administration to use that bully pulpit to really improve the data ecosystem?

Candice: Well—I mean, I have one primary tool in the toolbox, so I am going to use it: education. One way to really maximize the value of the few “thimbles” we do have is to get out and reach students who will become interested and invested in continuing these infrastructure efforts, driving innovation and growth that way. Data and technology education would be my major policy platform.

[40:00] Fran: I want to expand on that a little bit: we are awash in data today, but it is nothing compared to how much data there is going to be tomorrow, a decade from now, and a decade after that. As we start thinking about the brave new world of the internet of things, where in the ideal everything will be connected and producing data, we will have to understand, organize, access, and use that data, and make decisions with it. Ethics will be a really important piece of that. With the ethics around our technologies and our data, we are already seeing the low-hanging ethical fruit: who is responsible when your self-driving car hits someone? Is it the algorithm designer? Is it the manufacturer? Who is it? That is just the tip of the iceberg. So Candice, as we think about the data coming from these technologies and the ethics that will be involved in decision making: whose ethics should those be, and how are we going to imbue our technologies with ethics?

Candice: Well to answer that, I would go back to the basic definition of ethics—which is “guidelines for behavior”—and I think that in the very near future, we are going to have to toss that definition out the window. Ethics is going to become “control of behavior.” There is no opting in and out of those guidelines any longer. The data will be collected, it will be scrutinized, and then mandates will be handed down for how individuals will be expected to behave in society. That is the worst case scenario. In the future, rather than thinking about these issues after we build technologies, where we currently look at governance and regulation after the fact, we no longer have the option of disclaiming “oh, people can opt out if they just don’t like the way we built it.” That is a naive position going forward. We have to start talking about sharing control during the development process. Again it goes back to what we have been talking about throughout this conference: trust and reciprocity. All of these things are going to go into this future where we must decide at what level we allow individuals, groups, nations, regions, to share control over the regulation and development of this interconnected internet of things.

Fran: So if we are in a scenario—and we are rapidly in that scenario; we can’t opt out, right, you are still going to have cameras and various other things—whose responsibility are those ethics? Is it government’s, through regulation? Is it the companies themselves? Is it somehow the individual’s? How do we ascribe responsibility and action to those ethics?

Candice: Well, I always think of… we are constantly wearing or carrying these devices with sensors that are passively collecting information about us, and that is seen as a good thing. We are collecting data, and it is useful for users, for companies, and, at a larger scale, for science. In the future you will constantly be passing these sensors: is there any way to use that technology interface to start to build out a right to not be seen? That is, you are seen, but you are camouflaged from the sensors that you choose not to interact with. So—I know this is a little out there, but it is supposed to be the future vision panel—imagine being able to wear some sort of actionable sensor that would say Starbucks can collect my location (because who doesn’t love Starbucks?), but the store next door can’t see that I am here and then push advertisements my way. That is a trivial example, but it goes to different levels of actually using the technology as a way to push back against this scrutiny. Because, more and more consistently, there is no opting out while continuing to live in a modern society.

Fran: It is going to be a set of really thorny issues that everyone will in some sense have a hand in, from technologists to lawyers to policy makers to all kinds of people.

[49:00] Fran: …One more question—I am sure [the audience] has lots of good questions for this amazingly accomplished and interesting group. So, one more question, down the line. Think ten years out: I am just really interested in your vision. Is there one problem that you think, if we nailed it, would really move the needle on the data ecosystem ten years from now? Is there one thing that we should make a priority, and how would it really improve things?

Candice: Okay, I heard the professor voice coming out with “Someone has to go, we do have to stick to the schedule.”

Fran: Yes, your grade depends on it!

Candice: Yes, Ma’am. So one thing I think about is data hackathons I have observed, where teams are given a very short window to work in—here is a bunch of data that you can download; design a project for “the public good.” The teams immediately jump into projects, and the projects that come out are things like preventing spam on your cellphone or optimizing mail delivery. These are nice utility features, but that is not the definition of public good that I would have chosen. It is not actually helping under-served communities. It is not helping the people who are really disenfranchised in our society. And that is never talked about. So the one thing I would want, if we are going to talk about these visions for the future—I am not the voice to ask—is to go ask those voices for their perspective on what the public good should be.

Fran: Wow. So democratize the technology, democratize the people in some sense.

[54:15] Fran: Okay, now it is the audience’s turn, and I see we already have a question.

Audience 1: Thanks, Fran. Great job so far, guys. So, a little “look back,” “look forward” question. Since you are young, I can’t say look back ten years, so could you look back 2 to 3 years: is there a part of your education that you’d do over again—something, based on what you are doing now, that with hindsight you would have taken? What would that be? And if you had the opportunity today to take 6 months off, what would you do or take: an education, a skill, or an experience to keep your trajectory going the way it’s going, or even better?

Candice: Well, my dissertation research looks at the research design methods used by big social data researchers—the research plans developed to have some sort of fidelity or validity at the end of the process, where the output actually maps back to whatever the original problem was. And if I could go back, I would take every [research methods] class in every different scientific domain ever, because there are so many different strategies, approaches, methods, and foundational theorists. If only I could live longer or take an extra five years. Fran won’t let me; she is on my dissertation committee. But that would be amazing, so that I could actually access all of those things.

[1:01:20] Audience 3: So, thank you for your comments. Open data initiatives, together with the Freedom of Information Act, give unprecedented access to scientific data that is often unedited by experts. It may contain scientific uncertainty, or it may potentially tell us more about how the world works. As researchers, do you worry about how scientific data is interpreted by non-experts, particularly when it comes to data reporting? Journalism is a prime example. If you do, could you share your thoughts on how researchers might work with non-experts to ensure the ethical, efficient, and effective use of scientific data?


Candice: Going back to that original conversation about looking at data outputs as arguments, I think there is a major disconnect right now with professional expertise, where we are trained by our community and our education to use very logical paradigms. They differ slightly by community, but there is some sort of process and regimented knowledge formation that makes them valid. When these things go out into the world, the arguments can become emotive; there is a wide range of extra baggage that goes along with them. There is a need to educate the public and the hobbyist about what goes into the scientific method of whichever discipline. But, on the other hand, when we are looking outwards and trying to tell people, “No, you can’t say that, that’s not valid,” we need to have empathy for their position. They might not be arguing from a traditional, logical reasoning standpoint. They are arguing from, “I don’t want that to be true about my backyard, because then I have to deal with it;” it becomes a real lived moment for them. Reconciling these two different goals, I think, is going to be very important to handling that discrepancy.

Fran: We are going to do one round of last words—a last question for all of you. You are all amazingly accomplished and have shown leadership. What one good piece of advice would you give to people following along paths similar to yours? What would be a good piece of advice to pass on?


[1:06:30] Candice: Interdisciplinarity isn’t just reading scholarship outside of your field; it actually involves talking outside of your field. I know this community is not the one that needs that lesson, but many scholars never go to conferences outside of their own home domain. So that would be my one thing for young scholars: actually go to the conferences of the other fields related to whatever you are working on, and actually talk to people.

Fran: First of all, I want to thank you all. It was a privilege to moderate this panel. I want to tell you quite sincerely that you are all truly inspirational to me. I am very excited to see what you do 10 years from now, 20 years from now, next year. So thank you very much for doing this for us.