Consent Management on the Web: UX Challenges and Engagement Opportunities

1 Introduction

Two trends on the Internet are re-shaping the ability of users to control how their data is shared, both with the sites they choose to use and with third parties of which they may be unaware. The first trend is new regulatory approaches. The European Union’s General Data Protection Regulation (GDPR) ushers in a new age of individual rights protections on the Internet, including new requirements for obtaining consent for previously undisclosed personal data practices. The regulation of consent emerges from a period of unprecedented public attention to the ways data generated about people and their online behavior is misused. Those taking part in the abuse include technology sector actors like Facebook, political actors like Cambridge Analytica [4], and deceptive online advertisers, many of which are engaged in activities such as identity theft and "click fraud" [1]. Since the GDPR took effect in May 2018, other jurisdictions including India, California, and Brazil have enacted similar privacy laws or regulations.

The second trend in data regulation is a renewed emphasis on competition and differentiation by web browsers. Both established and new browsers, including Apple Safari and Mozilla Firefox, have released features designed to help users limit how their information is shared across the Internet. Browser privacy improvements are guided by research that shows broad user consensus on norms for data sharing across sites, and browsers are competing to best implement those norms.

As a result of these trends, web users see promise for better protection for their personal data. The initial set of implementations for consent user experience, however, can present users with a confusing array of data sharing choices. This research focuses on the data consent required by GDPR, as presented by publisher sites and consent management platforms (CMPs) with the intent of complying with GDPR, and not on previously proposed standards such as Platform for Privacy Preferences [3] and Do Not Track [5]. Most popular sites have deployed some kind of consent user experience to meet GDPR requirements.

Privacy choices, as presented by sites and CMPs, are widely understood to result in both poor user experience and inadvertent selection of options that do not match the user’s privacy norms. This may be because of misleading copy and design in consent interfaces, user fatigue, learned helplessness, or some mix of the three. The percentages of users expressing trust in particular Internet firms in surveys do not match the percentages expressing "consent" to tracking by those firms as captured by CMPs. This mismatch indicates that consent management user experience is failing to capture users’ true preferences.

Consent experiences represent a new opportunity for sites trusted by users to work together with browsers that implement user data sharing controls, in order to both improve user experience and more accurately capture user preferences. We hypothesize that increasing user trust in data practices may increase engagement, as previous work shows that users running ad blockers have higher levels of web engagement [2]. In a natural experiment on May 25, 2018, an alternate European version of the USA Today web site was released without tracking scripts or consent dialogs, probably because of a software schedule slip. The modified version outperformed the original site in both performance and engagement. This work builds on the ad blocking and USA Today natural experiments to test a deliberate intervention.

1.1 Publisher Sites Currently Have Inadequate Options

Improving the consent experience is a worthy goal for two reasons. Better capturing the user’s actual data sharing preferences should give a trustworthy site a sustainable advantage over an untrusted one, and a less confusing user experience should pay off in increased engagement that can be measured in time on site and pageviews.

However, the options for purely server-side improvements are limited.

  1. Handle user consent correctly but manually. Sites end up with less personal data, but what they do have is better quality, with clear information about what data can be used for what purposes. The disadvantage of this method is the added user experience burden of accurately capturing consent: asking too many questions tends to create a more stressful environment.

  2. Cut back on data collection. This is a business risk when many advertisers require user data.

  3. Use aggressively simplified consent workflows to keep doing surveillance marketing as usual. This approach is uncertain, especially as client-side privacy protections improve; a consent decision captured in an unclear way may not be respected by in-browser privacy tools.

There is another method to consider. This project aims to test the approach of involving the browser to help the user with the tedious work of setting the right consent bits, providing a better experience than users can achieve manually. Global Consent Manager is a Firefox extension that implements and builds on existing consent standards. IAB Europe has published a cookie-based standard for consent, called the GDPR Transparency and Consent Framework. Many of the permissions reflected in this standard are already covered by existing preferences such as "Do Not Track," or can be determined from user behavior. A browser extension can fill in the necessary data in the cookie to reflect the user’s privacy preferences, without asking the user to micromanage consent.
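To make the mechanism concrete, here is a minimal TypeScript sketch of how a WebExtension could record a temporary "no consent" signal in a cookie. The cookie name ("euconsent") and the placeholder value are assumptions for illustration; a real IAB consent string is a base64-encoded bit field whose encoding is not reproduced here, and "browser" refers to Firefox's WebExtensions API.

declare const browser: any; // Firefox WebExtensions API (typed via webextension-polyfill in practice)

const NO_CONSENT_PLACEHOLDER = "PLACEHOLDER-NO-CONSENT"; // hypothetical value; a real TCF string is base64-encoded

async function writeTemporaryNoConsent(siteUrl: string): Promise<void> {
  // Requires the "cookies" permission and a host permission for siteUrl.
  await browser.cookies.set({
    url: siteUrl,
    name: "euconsent", // assumed cookie name read by the site's CMP
    value: NO_CONSENT_PLACEHOLDER,
    expirationDate: Math.floor(Date.now() / 1000) + 60 * 60 * 24, // short-lived: one day
  });
}

In practice the extension would replace this placeholder with a properly encoded consent string and refresh or remove it as the user's relationship with the site changes.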

Although the storm of GDPR permission forms appears confusing, many of these acknowledgements involve a limited set of data and practices from a known set of third parties, and consent handling is becoming more uniform across sites. For example, Google and IAB Europe are adopting the same standard. For the cases they are involved in, a tool that effectively manages the IAB consent system will also handle consent requests on sites using Google Tag Manager, which is used on a majority of ad-supported web sites. A single browser extension can therefore handle multiple sites’ consent requests with few additional code changes, which makes client-side consent management practical.

We released a prototype browser extension that implements a new workflow for GDPR consent forms. On first visit to a site, the extension suppresses the display of the consent form and writes a new, temporary consent string indicating "no consent." On a later visit, the site can present its consent form, when the user has presumably decided that the site is trustworthy enough to continue interacting with. This new consent workflow is based on a common "growth hacking" pattern on social and collaboration sites. Such sites typically build user profiles incrementally, starting with just enough data to authenticate the user on return and get them started using the site. As users invest more time in the site, it prompts them to fill in more and more profile information (LinkedIn is a good example).
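A minimal sketch of this deferred-consent workflow follows, assuming a hypothetical CSS selector for the CMP dialog and a simple per-origin visit counter kept in extension storage; the production extension works with the IAB consent cookie rather than merely hiding the dialog.

declare const browser: any; // Firefox WebExtensions API

const VISITS_BEFORE_PROMPT = 3; // assumption: let the site ask for consent on the third visit

async function recordVisit(origin: string): Promise<number> {
  const key = "visits:" + origin;
  const stored = await browser.storage.local.get(key);
  const visits = (stored[key] ?? 0) + 1;
  await browser.storage.local.set({ [key]: visits });
  return visits;
}

async function maybeSuppressConsentDialog(): Promise<void> {
  const visits = await recordVisit(location.origin);
  if (visits < VISITS_BEFORE_PROMPT) {
    // Hypothetical selector; real CMP dialogs vary by vendor.
    const dialog = document.querySelector<HTMLElement>(".cmp-consent-dialog");
    if (dialog) {
      dialog.style.display = "none";
    }
    // ...and write the temporary "no consent" cookie as sketched earlier.
  }
}

maybeSuppressConsentDialog();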

Until now, news sites have taken a less sophisticated approach. Instead of trading value for information incrementally, users are presented with a comprehensive dialog asking for extensive consent up front. Will the incremental approach to data collection that works for social and collaboration sites also apply to news sites?

Avoiding reflexive denial of data collection practices that match the user’s norms is a key goal. We designed the extension to facilitate users making an appropriate choice when sites they trust request consent, and we delivered and evaluated a browser extension. Browsers can now compete to build their own versions, in order to give their users a more trustworthy and less annoying experience. Browsers need to differentiate in order to attract new users and retain existing users, and right now a good way to do that is by creating a safer-feeling, more trustworthy environment.

1.2 Related Mozilla Technology or Program

This project is important to Mozilla’s ongoing tracking protection development work, and complements in-house development in Firefox. Browsers are currently under pressure to give users a web experience that is both safer and less time-consuming. Mozilla seeks to give users a set of protection technologies that are consistent with its values and that reflect the results of real-world testing. This work aims to:

  1. Expand the set of user protection technologies that Firefox product management could choose as the default.

  2. Communicate with advertisers, publishers, and Internet companies about our values and capabilities.

  3. Build a consultative process to understand and act on publisher values and interests, to help set priorities for browser behavior.

  4. Set appropriate defaults, which will depend on making the right information available.

Mozilla is placing a new emphasis on encouraging independent developers to implement and test privacy techniques, especially as part of combined filtering approaches. If successful, Global Consent Manager will be available to Firefox product management for further testing, such as research with Firefox users in a Shield study. This work is an example of independently operated user research that can feed into Mozilla’s values-centered and data-driven approach to better browser privacy.

2 Methods

To assess the effects of Global Consent Manager on user engagement with news-oriented websites, we conducted a pilot lab study with 12 subjects. Since our study was conducted in the US, we installed the FoxyProxy extension, using a proxy server in Germany, to simulate the experience of a European user. Subjects were then given the following directives:

  1. The purpose of this study is to understand how users experience news and information websites using different browser configurations.

  2. Your Task: Find out the background, context, and involved organizations and people in stories about "Jamal Khashoggi". Also, what is his profession? At the end of your web research, we are going to ask you to discuss your feelings and what you understand about this individual after doing web research.

  3. You have a choice of going to all of these websites, or just 3 of them, to get information about the question above:

    (a) https://bbc.com
    (b) https://dailymail.co.uk
    (c) https://independent.co.uk
    (d) https://theguardian.co.uk
    (e) https://worldcrunch.com
    (f) https://www.mediapart.fr/en/english

  4. For each website you visit:

    (a) Bookmark information related to the news item that you think you might want to refer to later (assume you have a level of interest).

    (b) Make handwritten notes on paper about 3 individuals and 2 organizations that play a role in the news item, and summarize each person’s and each organization’s role.

3 Results

Our user study evaluated engagement on news websites. Do users with Global Consent Manager spend more time on news and information websites than users without global consent management? In this study, the answer is "yes". The 12 users we ran through the protocol described in the Methods section showed a difference in engagement time: Global Consent Manager users spent a mean of 1198 seconds (standard deviation 488), while the control group’s mean was 734 seconds (standard deviation 202). We performed a Welch two-sample t-test (p = 0.072) and a Tukey HSD comparison (adjusted p = 0.079); the difference is suggestive, though it does not reach the conventional 0.05 significance threshold in this small sample. A section of the R code and salient outputs is shown below.

 

> t.test(gsm2$Seconds ~ gsm2$Group)

Welch Two Sample t-test

data:  gsm2$Seconds by gsm2$Group
t = -2.1227, df = 6.9105, p-value = 0.07195

alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-983.44925   54.31591

sample estimates:
mean in group  control   mean in group  treat
733.600               1198.167

> TukeyHSD(aov(Seconds ~ gsm2$Group, gsm2))
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Seconds ~ gsm2$Group, data = gsm2)

$`gsm2$Group`
                   diff       lwr      upr     p adj
treat-control  464.5667 -66.93345 996.0668 0.0794069

The next figure illustrates the difference between the control and treatment groups with discrete points, along with color-coded indications of the range in each case. We can clearly see that Global Consent Manager users were engaged on news websites for longer periods of time in this study.

Figure 1: The distributions of engagement times for the treatment and control groups. The treatment group shows longer engagement times, though the difference does not reach the conventional 0.05 significance threshold.

 

4 Discussion and Next Steps

Our lab experiment shows promise for increasing engagement with reputable online news organizations by helping users manage their privacy in accordance with GDPR and similar regulations. A larger field study, such as a Mozilla Shield study, is, we believe, an appropriate next step, much like earlier studies of engagement with the web [2].

5 Acknowledgements

This work was funded by the Reynolds Journalism Institute.

6 Appendix A: Team Background

6.1 Researcher Bios with Links to Existing Work

Dr. Sean Goggins is an active technology mediated community researcher with a focus on applying a rich collection of machine learning and social network analysis methods to uncover patterns of human action on social platforms like GitHub. The results of his analysis include hundreds of computational models used to indicate levels of learning, community health, performance, discussion trajectory, and collaboration. The National Science Foundation, Alfred P. Sloan Foundation, Office of Naval Research, Department of Education, the Enterprise Foundation, and MD Consult have funded his research in this area. Finally, Sean is an accomplished researcher in Computer Supported Cooperative Work, small groups, and social informatics, including winning best paper awards for his work in gaming analytics design (2010) and the systematic application of trace data for learning, in Information Technology and People (2011). Relevant work includes:

  1. CHAOSS Project founding member: https://wiki.linuxfoundation.org/oss-health-metrics

  2. Open Source Health Metrics on GitHub: http://bit.ly/2pXVvHl

  3. Performance and Participation on GitHub: http://bit.ly/2pY1r2T

  4. Structural Fluidity in Open Source Software Projects: http://bit.ly/2qTa9EA

  5. Building Social Computing Theory: http://bit.ly/2qT7TgB

7 University of Missouri Open Community Health and Sustainability Lab (AugurLabs)

The Open Community Health and Sustainability Lab is focused on building metrics and indicators that help contributors, participants, and managers develop awareness of how the open communities they contribute to are doing. Our main focus is an integration of social, computational, and visual representations of (a) which factors related to the health and sustainability of open organizations are important for understanding their health and trajectory, and (b) how to measure the qualitative and quantitative evolution of those indicators over time.

Current Projects: AugurLabs is actively engaged in a Sloan Foundation funded project to develop open source project health and sustainability metrics through a Linux Foundation working group we helped form, "CHAOSS" (Community Health and Open Source Sustainability); an NSF-funded project focused on building an open collaboration data exchange; and analytics to make sense of student learning and progress in games for learning (Mission Hydro Sci). In all cases we are doing engaged field research, building software, and disseminating results through publications and speaking engagements.

The Team: Our team is composed of, at any given time, 4-6 undergraduate software developers, 2-3 Ph.D. students, and 2-3 university faculty. Our undergraduates focus on building usable software that’s deployed in the wild, for public use. Our graduate students and faculty make design, code, and documentation contributions to those projects, in addition to creating blog posts, academic papers, and public presentations.

Software Development Capabilities: Our six undergraduates are engaged in competitions at the Reynolds Journalism Institute at the University of Missouri in addition to projects in our lab. The technologies we use include Python, full-stack NodeJS, R, and graph database technologies. You can learn more about our work through the GitHub repositories at:

  1. https://www.github.com/CHAOSS

  2. https://www.github.com/OSSHealth

  3. http://augurlabs.io

  4. http://www.sociallycompute.io

References

  • [1] D. R. Kayalvizhi, K. Khattar, and P. Mishra (2018). A Survey on Online Click Fraud Execution and Analysis. 13(18), p. 5.
  • [2] B. Miroglio, D. Zeber, J. Kaye, and R. Weiss (2018). The Effect of Ad Blocking on User Engagement with the Web. In Proceedings of the 2018 World Wide Web Conference (WWW ’18), Lyon, France, pp. 813–821.
  • [3] L. Ni, C. Li, H. Liu, A. G. Bourgeois, and J. Yu (2018). Differential Private Preservation Multi-core DBScan Clustering for Network User Data. Procedia Computer Science 129, pp. 257–262.
  • [4] N. Persily (2017). Can Democracy Survive the Internet? Journal of Democracy 28(2), pp. 63–76.
  • [5] O. Tene and J. Polonetsky (2011). To Track or ’Do Not Track’: Advancing Transparency and Individual Control in Online Behavioral Advertising. SSRN Electronic Journal.

Statement on University of Minnesota Linux kernel experiment

Open source software runs the core infrastructure we all use every day, the University of Minnesota trains countless graduates in a top-tier computer science department, and the Linux Foundation is a significant shepherd of great ideas in open source software. The Linux Foundation and the University of Minnesota are two of the organizations I hold in extraordinarily high regard. Since something happened yesterday, and I feel strong ties to friends in open source and friends at the University of Minnesota, I wanted to highlight the general greatness of each while making assorted other, more random (though possibly Nobel Prize winning) comments.

I made a few statements on Twitter about a bit of a fiasco yesterday, involving research from my graduate school alma mater, the Minnesota Gophers, and my professional friends at the Linux Foundation. Briefly, potential security bugs were introduced into the kernel as part of a research project. The details are super techy, and the work should not have been done the way it was done, IMHO. Suffice it to say:

A. Minnesota’s CS department is top tier, I have many friends there, and this is absolutely not how my friends there roll. We are talking international leaders in their computing fields. My friends at Minnesota are leaders in their fields, and really good people.

B. The Linux Foundation has advanced open production, and the technology sector of our economy as much as any other organization. My friends at the Linux Foundation are really good people, and leaders in their fields.

C. Minnesota’s mistake will not be repeated. I trust that completely. The signatories on the letter in my Twitter feed are two of the highest-character humans I know. And that says a lot for Matt, who really pissed me off my first semester in grad school by letting my 98 in software engineering stand as an A- (IMHO, because I was caught red-handed satirizing the faculty member who taught it; I can be an asshole). He was right. It’s one of many ways I have observed the integrity of Loren and Matt in action.

Phil Agre, 2020, Social Media, and Dr. Agre’s Seminal Work (Through the Lens of Sean Goggins’ Adjacent Scholarship, 2010-2020)

From Phil Agre’s 2004 book chapter, “The Practical Republic”. Dr. Agre disappeared himself some years later, and those who know him (sadly, I am not one, though I admire his work) ultimately did locate him alive, and purposefully isolated from society:

“So the three elements of social capital — networks, trust, and social skills — are interrelated. And the element of social skills should not be taken for granted. Many people grow up in environments where the necessary social skills do not exist, either because everyone is too busy scratching out a subsistence living, or because they have acquired the social skills they need to live in a different kind of society, or because they have internalized conservative ideologies that keep them from creating associations that might threaten established interests. People from such a disadvantaged background might excel in school and get a good job, only to stall in their careers because they are not building strong networks [4]. People whose careers stall in this way are often mystified; they are working hard, doing what they are told, projecting a positive attitude, and generally exercising the skills that are required to get along in a clientelistic world. But they lack the skills of association. Indeed, they probably lack even a clue that the skills of association exist. They might decide that they are being discriminated against (which does happen, skills or no), or that they really do deserve the subordinate social status to which they had originally been assigned. Either would be a tragedy compared to a world in which the necessary skills are universal.”

My first doctoral student, Christopher Mascaro https://scholar.google.com/citations?user=UKtQbogAAAAJ&hl=en&oi=ao inspired within me my own unexplored curiosity about social media and politics in 2010 (I left for Missouri before he graduated, and his ultimate Drexel supervisor, Denise Agosto, and I collaborated with Christopher on the later publications in this thread). Early on, Scott Robertson and Ravi Vatrapu pointed us in the right direction, and our papers built on theirs, which built on Agre’s. Our last discussion of it was in January 2017 in Hawaii. It’s a thread I intend to pick up again one day, when I am emotionally ready. The intersection of politics with social computing that ignited our collaboration resulted in a number of papers Christopher and I wrote together, often in partnership and collaboration with other students (either my supervisees before leaving Drexel, or students of my colleagues), including, with dire recognition that I will omit somebody …

– Alan Black https://scholar.google.com/citations?user=ScNY2rEAAAAJ&hl=en&oi=ao

– Elizabeth Garwood (Thiry) https://scholar.google.com/citations?user=kuO5oKMAAAAJ&hl=en&oi=ao (Penn State, covertly borrowed, providing all involved plausible deniability, but if you’re looking for somebody to blame, it’s definitely Andrea Tapia)

– Nora McDonald https://scholar.google.com/citations?user=u3BoLzgAAAAJ&hl=en&oi=ao (When I asked her to move with me to Missouri, she looked up where that was on Google Maps, and politely answered, “no” within five minutes. *Very* sharp. My extraordinarily talented colleague [mentor, competitor, inventor of Wikipedia, at least for Academics], Andrea Forte, supervised her dissertation work at Drexel, and last I heard she was in a post doc at Maryland with another exceptional social computing scholar, Helena Mentis. You can correct my characterization if needed.)

– Rachel Magee https://scholar.google.com/citations?user=OGHTlHoAAAAJ&hl=en&oi=ao

– Alison Novak https://scholar.google.com/citations?user=iX2aHEQAAAAJ&hl=en&oi=ao and

– Ian Graves https://scholar.google.com/citations?user=GxMkRhcAAAAJ&hl=en&oi=ao (supervised by the inimitable Bill Harrison, and borrowed for collaboration with the understanding that if I slowed down his graduation, there would be consequences)

And most recently James Chester Bain, whose dissertation work examining anti-immigrant sentiment and hate speech on Twitter, as it relates to local conditions, using data from Twitter, numerous US Government agencies, and the United Nations … then analyzing the relationships using convolutional and gated neural networks (AI, machine learning, deep learning, pick a buzzword, none of it is magic, as James will attest) … intersects with the idea of issue entrepreneurship that Christopher and I first examined in the first paper we wrote. [James is a post-doc in Calgary, and his dissertation work is proceeding toward publication.]

I cannot say my collaborators and I have cited this work of Agre’s in every publication, but I can confidently say most (which is also true of Ravi and Scott’s work). Agre is prescient, disquieting, and passes a regular person’s “face validity” test. Vatrapu and Robertson narrow in on the earliest specific mechanisms of how what Agre saw in the distance was coming to be in the late 2000s and early 2010s.

All this work is on my mind a lot today, because understanding how information and misinformation spread on the internet is one thread I followed in parallel, mainly with Josh Introne (Syracuse), in the context of online health forums, https://scholar.google.com/citations?user=-u-TgXAAAAAJ&hl=en&oi=ao, and more recently with Bryan Semaan and Ingrid Erickson as collaborators with Josh. I am most proud of our paper on the Reification of Advice; because we took the time to find the key insights, the results are among the most novel and potentially impactful findings I’ve ever reported, and we worked on that paper for around 7 years.

After more than a decade examining the force of social media, and the empires of social media, one starts to see the trajectory of our existence through technology as either a new hope, or some kind of revenge of the Sith. </Star Wars Allusion>

Today I spend my time exploring what I think of as the ‘light side’ of “The Force” — open source software health and sustainability, and games for learning. I had to stop examining political discourse online because it kept feeling darker and darker to me. My interest in open source intersected with Nora McDonald’s interests, and I’ve kept at that because it’s an area of inquiry in social computing with more hope than despair.

Matt Germonprez at Nebraska-Omaha has been a great collaborator the past 7 years. That’s where most of my energy is right now: Open Source Software Health and Sustainability.

https://chaoss.community and

http://augurlabs.io/ and

https://github.com/chaosss/augur

But back to social media, civil discourse, and society: after all that work, I don’t think anyone summed up what did happen, 12 and 16 years before it happened, better than Agre did right here, including anything my esteemed colleagues and I produced during and after the transformation of our political process on Alderaan (OK, this is seriously my last Star Wars allusion), er, Earth.

I guess we call that a “seminal work”, or standing on the shoulders of giants. I just think it’s an important read, and I wanted you to understand why. I wanted to explain, I suppose, why it is intellectually and professionally a bridge too far for me to see false information, wrapped in platitudes and the American flag, and not fact-check it in the comments. About 90% of the responses are vile. Agre’s work foretells much of the “why”.

If you have trouble sleeping at night, you can find the specific publications with the collaborators mentioned above defiantly posted on my website here: https://www.seangoggins.net/publications/. I say defiantly because publishers like their paywalls. Personally, I prefer my work be shared with the public who pays for it, and I have seen Elsevier’s balance sheet. They will be OK.

It is only really now that I understand this chapter, though. I have enjoyed the past four years with a partner who has the social skills, in real life, and on social media, along with the passion to intercede in the political catastrophe the US began in 2016, and ends no later than January 20, 2021 at noon. I didn’t really know what Agre was talking about deeply enough until I observed and supported Kate Canterbury in her political advocacy, and organizing work these past four years. If social skills are rated 1-10, Kate’s an 11, and I am maybe a 6 on a good day.

Finally, the Agre chapter I contextualized so deeply as part of my own experience that you have probably now called the fire department to get the boy out of the well …

https://pages.gseis.ucla.edu/faculty/agre/republic.html

Metrics With Greater Utility: The Community Manager Use Case

1 Introduction

Community managers take a variety of perspectives, depending on where their communities are in the lifecycle of growth, maturity, and decline. This is an evolving report of what we are learning from community managers, some of whom we are working with on live experiments with Augur (http://www.github.com/CHAOSS/augur), a prototype software tool from the CHAOSS project. At this point we are paying particular attention to how community managers consume metrics, and how the presentation of open source software health and sustainability metrics could make them more, and in some cases less, useful for doing their jobs.

Right now, based on Augur prototypes and follow-up discussions so far, we have the following observations that will inform our work both in the "Growth, Maturity and Decline" working group and in Augur development. There are a few things we have learned from prototyping Augur with community managers. These features in Augur are particularly valued:

  1. Allowing comparisons with projects within a defined universe is essential.
  2. Allow community managers to periodically add and remove the repositories they monitor.
  3. Downloadable graphics.
  4. Downloadable data (.csv or .json).
  5. Availability of a "Metrics API", limiting the amount of software infrastructure the community manager needs to maintain. This is more valued by program managers overseeing larger portfolios right now, but we think it has potential to grow as the relatively light weight of this approach becomes more apparent. By apparent, we really mean "easy to use and understand"; right now it is apparent to a programmer, but less so to a community manager without that background or interest. (A minimal sketch of consuming such an API follows this list.)
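To illustrate why a hosted metrics API lowers the infrastructure burden, here is a short TypeScript sketch of what consuming one could look like. The endpoint path, host, and response shape are hypothetical placeholders, not Augur's actual API.

interface CommitsPerWeek {
  week: string; // e.g. "2018-06-04"
  commits: number;
}

async function fetchCommitsPerWeek(host: string, owner: string, repo: string): Promise<CommitsPerWeek[]> {
  // Hypothetical route; a real metrics API's endpoints may differ.
  const response = await fetch(host + "/metrics/" + owner + "/" + repo + "/commits-per-week");
  if (!response.ok) {
    throw new Error("Metrics API request failed: " + response.status);
  }
  return response.json();
}

The community manager only needs a browser or small script to call such an endpoint; no data collection pipeline has to run on their side.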

2 Date Summarized Comparison Metrics

With these advantages in mind, making the most of this opportunity to help community managers with useful metrics will require date-summarized comparison metrics. These metrics take two "filters" or "parameters" that are defined more abstractly in the Growth, Maturity, and Decline metrics of the CHAOSS project.

  1.  Given a pool of repositories of interest to a community manager, rank them in ascending or descending order by a metric,
  2.  over a specified time period, or
  3.  over a specified periodicity (e.g., month) for a length of time (e.g., a year).

For example, one open source program office we talked with is interested in the following set of date-summarized comparison metrics, given a pool of repositories of interest to the program office (dozens to hundreds of repositories). A minimal sketch of one such query follows the list:

  1.  What ten repositories have the most commits this year (straight commits, and lines of code)?
  2.  How many new projects were launched this year?
  3.  What are the top ten new repositories in terms of commits this year (straight commits, and lines of code)?
  4.  How many commits and lines of code were contributed by outside contributors this calendar year? Organizationally sponsored contributors?
  5.  What organizations are the top five external contributors of commits, comments and merges?
  6.  What is the total number of repository watchers we have across all of our projects?
  7.  Which repositories have the most stars? Of the ones new this year? Of all the projects? Which projects have the most new stars this year?
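As a concrete illustration of the first question, here is a TypeScript sketch that ranks the top N repositories by commits within a date range, assuming commit records have already been retrieved; the Commit record shape is an assumption for illustration.

interface Commit {
  repo: string;
  date: string; // ISO date, e.g. "2018-03-14"
}

function topReposByCommits(commits: Commit[], from: string, to: string, n = 10): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const c of commits) {
    if (c.date >= from && c.date <= to) {
      counts.set(c.repo, (counts.get(c.repo) ?? 0) + 1);
    }
  }
  // Sort descending by commit count and keep the top N.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

The same pattern, swapping the counted quantity (lines of code, stars, watchers), covers most of the questions above.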

3 Open Ended Community Manager Questions to Support with Metrics

There are other, more open ended questions that may be useful to open source community managers:

  1.  Is a repository active?
    1.  Visual differentiation that examines issue and commit data
    2.  Activity in the past 30 days
    3.  Across all repositories, present the 50th percentile as a baseline and show repositories above and below that line.
  2.  Should we archive this repository?
    1.  Enable an input from the manager after reviewing statistics.
    2.  Activity level, inactivity level and dependencies
    3.  Mean/Median/Mode histogram for commits/repo
  3.  Should we feature this repository in our top 10? (Probably a subjective decision based on some kind of composite scoring system that is likely specific to the needs of every community manager or program office.)
  4.  Who are our top authors? (Some kind of aggregated contribution ranking by time period [year, month, week, day?]. Nominally, I have a concern about these kinds of metrics being "gameable", but if they are not visible to contributors themselves, there is less "gaming" opportunity.)
  5.  What are our top repositories? (Probably a subjective decision based on some kind of composite scoring system that is likely specific to the needs of every community manager or program office.)
  6.  Most active repositories by time period [week? month? year?]. Activity to be revealed through a mix of retention and maintainer activity, primarily focusing on the latter: the number of issues and commits, plus the frequency of pull requests and the number of closed issues.
  7.  Least active repositories by time period [week? month? year?]. Bottom of the scores calculated as above.
  8.  Who is our most active contributor? (Some kind of aggregated contribution ranking by time period [year, month, week, day?], with the same "gameability" concern as above.)
  9.  What new contributors submitted their first new patches/issues this week? (Visualization note: new contributors can be colored in visualizations, and a graph can additionally be made of the number of first-time contributors per period.)
  10.  Which contributors became inactive? (Will need a mechanism for setting "inactive" thresholds.)
  11.  Baseline level for the "average" repository in an organization and for each individual organization repository (a sketch of a 50th-percentile baseline appears after this list).
  12.  What projects outside of a community manager’s general view (GitHub organization or other boundary) do my repositories depend on, or do my contributors also significantly contribute to?
  13.  Build a summary report in 140 characters or less. For example, "Your total commits in this time period [week? month?] across the organization increased 12% over the last period. Your most active repositories remained the same. You have 8 new contributors, which is 1 below your mean for the past year. For more information, click here."
  14.  Once a metrics baseline is established, what can be done to move them?
  15.  Are there optimal measures for some metrics?
    (a) Pull request size?
    (b) Ratio of maintainers to contributors?
    (c) New contributor to consistent contributor ratio?
    (d) New contributor to maintainer ratio?
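For the activity baseline mentioned in items 1 and 11, here is a TypeScript sketch: take 30-day commit counts per repository, compute the 50th percentile across all repositories, and flag each repository as above or below that baseline. The input shape is an assumption.

function medianBaseline(commitCounts: Record<string, number>): { baseline: number; above: string[]; below: string[] } {
  const values = Object.values(commitCounts).sort((a, b) => a - b);
  if (values.length === 0) {
    return { baseline: 0, above: [], below: [] };
  }
  const mid = Math.floor(values.length / 2);
  // Median: average the two middle values when the count is even.
  const baseline = values.length % 2 === 0 ? (values[mid - 1] + values[mid]) / 2 : values[mid];
  const above: string[] = [];
  const below: string[] = [];
  for (const [repo, count] of Object.entries(commitCounts)) {
    (count >= baseline ? above : below).push(repo);
  }
  return { baseline, above, below };
}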

4 Augur Specific Design Change Recommendations

Next is a list of Augur specific design changes suggested thus far, based on conversations with community managers.

  1.  Showing all of the projects in a GitHub organization in a dashboard by default is generally useful.
  2.  Make the lines more clear in the charts, especially when there are multiple lines in comparison
  3.  How to zoom in and out is not intuitive. In the case of Google Finance, for example, a default subset period was displayed when it used the "below the line mirrored line" interface this is modeled after. Seeing the indices charts in that old model makes it fairly clear that the box below the line in Google Finance is for adjusting the range of dates. Alternately, Google’s more updated way of representing time, providing users choices, and showing comparisons may be even more useful and engaging. In general, it is important that the time zooming is made clearer.
    Figure 1: In one view, Google lets you see a 1-year window of a stock’s performance.
    Figure 2: In another view, you can choose a 3-month period. Comparing the two time periods also draws out the trend with red or green colors, depending on whether the index, in this case a stock’s price, has increased or decreased overall during the selected time period.
    Figure 3: Comparisons are similarly interesting in Google’s finance interface. You can simply add a number of stocks in much the same way our users want to add a number of different repositories.
  4.  For the projects a community manager chooses to follow, go ahead and give them comparison checkboxes at the top of the page. I think from a design point of view, we should limit comparisons as discussed, to 7 or 8, simply due to the limits in human visual perception.
  5.  The ability to adjust the viewing windows to a month summary level is desired.
  6.  Right now Augur does not make it clear that metrics are, by default, aggregated by week.
  7.  New contributor response time. When a new contributor joins a project, what is the response time for their contribution? (A sketch of this metric follows this list.)
  8.  A graph comparing commits and commit comments on the x and y axes between projects is desired. The same applies to issues and issue comments.
  9.  In general, the last two years of data gets the most use. We should focus our default display on this range.
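Item 7 above can be made concrete with a small TypeScript sketch: for each first-time contributor, measure the hours from opening their first pull request to the first maintainer comment. The record shapes are assumptions for illustration.

interface PullRequest {
  author: string;
  openedAt: Date;
  firstMaintainerCommentAt?: Date; // undefined if no maintainer has responded yet
}

function newContributorResponseTimesHours(prs: PullRequest[]): Map<string, number> {
  // Keep only each author's earliest PR (their first contribution).
  const firstPr = new Map<string, PullRequest>();
  for (const pr of prs) {
    const existing = firstPr.get(pr.author);
    if (!existing || pr.openedAt < existing.openedAt) {
      firstPr.set(pr.author, pr);
    }
  }
  const result = new Map<string, number>();
  for (const [author, pr] of Array.from(firstPr.entries())) {
    if (pr.firstMaintainerCommentAt) {
      result.set(author, (pr.firstMaintainerCommentAt.getTime() - pr.openedAt.getTime()) / 3_600_000);
    }
  }
  return result;
}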

5 Data Source Trust Issues

  1.  Greater transparency of metrics data origins will be helpful for understanding discrepancies between current understanding and what metrics show.
    1.  We should include some detailed notes from Brian Warner about how Facade is counting lines of code, and possibly some instrumentation to enable those counts to be altered by user provided parameters.
    2.  Outside contributor organization data. One community manager reported that their lines-of-code-by-organization data seems to look wrong. I did explain that these are mapped from a list of companies and emails we put together, and getting this right is something community managers will need some kind of mapping tool to do. GitDM is a tool that people sometimes use to create these maps, and Augur follows a derivative of that work. It is probably the case that maintaining these affiliation lists needs to be made easier for community managers, especially where the set of organizations contributing to a project is diverse. (There is a substantial range among the community managers we spoke with: some are managing complex ecosystems involving mostly outside contributors, most are in the middle, and some have contributor lists highly skewed toward their own organization.) A minimal sketch of such a mapping appears after this list.
  2.  GHTorrent data, while excellent for prototyping, faces some limitations under the scrutiny of community managers. For example, when using the cloned repositories and then going back to issues, the issues data in GHTorrent does not "look right". I think the graph API might offer some possibilities for us to store issue statistics we pull directly from GitHub and update periodically, as an alternative to GHTorrent.
  3.  When issues are moved from an older system, like Gerrit, into GitHub issues, in general the statistics for the converted issues are dodgy, even through the GitHub API. We are likely to encounter this, and at some point may want to include Gerrit data in a common data structure with issues from GitHub and other sources.
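In the spirit of gitdm-style affiliation maps mentioned above, here is a minimal TypeScript sketch that resolves a committer email to an organization using an explicit email map first and a domain map second. The sample entries are illustrative, not a real affiliation list.

const emailToOrg: Record<string, string> = {
  "jane@example.com": "Example Corp",
};
const domainToOrg: Record<string, string> = {
  "example.com": "Example Corp",
};

function resolveAffiliation(email: string): string {
  const normalized = email.trim().toLowerCase();
  if (emailToOrg[normalized]) {
    return emailToOrg[normalized];
  }
  // Fall back to matching on the email's domain.
  const domain = normalized.split("@")[1] ?? "";
  return domainToOrg[domain] ?? "Unknown";
}

A mapping tool for community managers would mostly be an interface for editing and versioning these two tables.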

6 New Metrics Suggested

  1.  Add metric ”number of clones”
  2.  ”Unique visitors” to a repository is a data point available from the GitHub API which is interesting.
  3.  Include a metric that compares the ratio of new committers to total committers in a time period, or perhaps simply presents those two metrics in alignment. Seeing the number of new committers in a set of repositories can be a useful indication of momentum in one direction or another, though I hasten to add that this is not canonically the case. (A sketch of this ratio follows this list.)
  4.  Some kind of representation of the ratio between commits and lines of code per commit
  5.  Test coverage within a repository is something to consider measuring for safety critical systems software.
  6.  Identifying the relationship between the DCO and the CLA.
  7.  There is a tension between risk and value that, as our metrics develop in those areas, we are well advised to keep in mind.
  8.  The work that Matt Snell and Matt Germonprez at the University of Nebraska-Omaha are starting related to risk metrics is of great interest. Getting these metrics into Augur is something we should plan for as soon as reasonably possible.
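Here is a TypeScript sketch of the new-committer ratio suggested in item 3 above: within a time window, count committers whose first-ever commit falls inside the window and divide by all distinct committers active in the window. The input shape is an assumption.

interface CommitRecord {
  author: string;
  date: Date;
}

function newCommitterRatio(commits: CommitRecord[], windowStart: Date, windowEnd: Date): number {
  // Earliest commit date we have seen for each author.
  const firstCommit = new Map<string, Date>();
  for (const c of commits) {
    const earliest = firstCommit.get(c.author);
    if (!earliest || c.date < earliest) {
      firstCommit.set(c.author, c.date);
    }
  }
  // Authors with any activity inside the window.
  const activeInWindow = new Set(
    commits.filter((c) => c.date >= windowStart && c.date <= windowEnd).map((c) => c.author)
  );
  if (activeInWindow.size === 0) {
    return 0;
  }
  let newCommitters = 0;
  for (const author of Array.from(activeInWindow)) {
    const first = firstCommit.get(author)!;
    if (first >= windowStart && first <= windowEnd) {
      newCommitters += 1;
    }
  }
  return newCommitters / activeInWindow.size;
}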

7 Design Possibilities

7.1 Augur

For Augur, I think the interface changes that enable comparisons and make it more self-apparent how to compress or expand the time range, as per the Google examples, are at the top of the list of things that will make Augur more useful for Kate and other community managers. Feedback on these notes will be helpful. I think the new-committers-to-committers ratio is important, as is enabling comparisons across projects in the bubble graphs. Transparency about data sources and their limitations, for both the API and the front end, is above average but not complete, and remains important.

7.2 Growth Maturity and Decline Working Group

Many of the metrics of interest to community managers fall under the "growth, maturity and decline" working group. From a design perspective it appears that the way metrics are expressed and consumed by these stakeholders, in their individual derivatives of the community manager use case, is quite far removed from the detailed definition work occurring around specific metrics. Discussion around an example implementation like Augur is helping draw out some of this more "zoomed out" feedback. The design of system interfaces frequently includes the need to navigate between granular details and the overall user experience (Zemel et al., 2007; Barab et al., 2007). This is less of a focus in the development of software engineering metrics, though recent research is beginning to illustrate the criticality of visual design for interpreting analytic information (González-Torres et al., 2016).

8 Acknowledgements

Many members of the CHAOSS community contributed to this report and analysis. I am happy to share names with permission from the contributors, but I have not requested permission as of the publication date.

References

  • S. Barab, T. Dodge, M. Thomas, C. Jackson, and H. Tuzun (2007). Our designs and the social agendas they carry. Journal of the Learning Sciences 16(2), pp. 263–305.
  • A. González-Torres, F. J. García-Peñalvo, R. Therón-Sánchez, and R. Colomo-Palacios (2016). Knowledge discovery in software teams by means of evolutionary visual software analytics. Science of Computer Programming 121, pp. 55–74.
  • A. Zemel, T. Koschmann, C. LeBaron, and P. Feltovich (2007). What are we Missing? Usability’s Indexical Ground. Computer Supported Cooperative Work.

Phil Agre’s Practical Republic (Because UCLA Finally Took his Pages Down)

Phil Agre wrote thoughtfully and critically about artificial intelligence and the role of technology in the political process (among other things). The takeaways I have from this paper include:

  1. Social skills are essential for anyone seeking influence in the political process
  2. A lot of political theory to date completely misses this essential point
  3. Issue entrepreneurship is a more effective path of influence for most individuals.
  4. Phil Agre was ~20 years ahead of his time

The article is attached in the interest of the public good.

Agre – 2004 – The practical republic Social skills and the prog

 

Data Science and Analytics Program Founded by Dr. Goggins Wins Award

The University of Missouri Data Science and Analytics program received the Outstanding Credit Program Award from the University Professional and Continuing Education Association (UPCEA) during the Central Region Conference in St. Louis.
The Data Science and Analytics Masters Program was conceptualized by Dr. Sean P. Goggins and Dr. Chi-Ren Shyu in the spring of 2013, following Dr. Goggins' work on a similar program at Drexel University and Dr. Shyu's long-standing work in data science oriented endeavors, including founding the MU Informatics Institute over a decade ago.
Through support from the Mizzou Advantage fund, Grant Scott joined our leadership team in 2015. Later in 2015, core DSA Faculty from across campus signed on to the effort, including:
  1. Yi Shang shangy@missouri.edu : Computer Science, Course Coordinator
  2. Dong Xu xudong@missouri.edu : Computer Science
  3. Trupti Joshi joshitr@missouri.edu : Computer Science
  4. Harsh Taneja tanejah@missouri.edu : Journalism
  5. Esther L. Thorson thorsone@missouri.edu : Strategic Communications, Course Coordinator
  6. David Herzog herzogd@missouri.edu : Journalism, Course Coordinator
  7. Jeffrey Uhlmann uhlmannj@missouri.edu : Computer Science
  8. Twyla G. Gibson gibsontg@missouri.edu : School of Information Science and Learning Technologies
  9. Sanda Erdelez erdelezs@missouri.edu : School of Information Science and Learning Technologies
  10. Chi-Ren Shyu shyuc@missouri.edu : Director, MU Informatics Institute
  11. Joi Moore moorejoi@missouri.edu : School of Information Science and Learning Technologies, Course Coordinator
  12. Ilker Ersoy ersoyi@health.missouri.edu : Biotechnology, Course Coordinator

Helpful and Useful – The Open Source Software Metrics Holy Grail

1 Introduction

My colleague Matt Germonprez recently hit me and around 50 other people at CHAOSSCON North America (2018) with this observation:

A lot of times we get really great answers to the wrong questions.

Matt explained this phenomenon as a "type III error", an allusion to the better known statistical phenomena of type I and type II errors. If you are trying to solve a problem or improve a situation, sometimes great answers to the wrong questions can still be useful, because in all likelihood somebody is looking for the answer to that question! Or maybe it answers another curiosity you were not even thinking about. I think we should call this metric encountering, a nod to Erdelez (1997). There's an old adage:

Even a blind squirrel finds a nut every once in a while.

For open source professionals, a "Blind Squirrel" is little more than a potential name for a jazz trio, and probably not the right imagery for explaining to your boss that you're "working on open source metrics". Yet these blind squirrels will encounter nuts a LOT more often if we make more nuts! "Metrics are nuts!" Not a good slogan, but that's my metaphor. Making more metrics is easy for us because we have lots of data and we write software, and it stands to reason that more metrics encountering is going to generate more useful metrics. If you are the blind squirrel, it's useful to find metrics.

Can you imagine all the useful things blind squirrels would find if we let them loose in an Ikea? "I came for the Swedish meatballs, I left with 2 closet organizing systems and a new kitchen!" A lot of things are useful, but in order for something to be helpful it needs to help you meet an important goal. To summarize:

  • Useful: Of all the different things I find in the Ikea, many of them are useful. Or, there are 75 metrics on this dashboard, and 3 of them are useful!
  • Helpful: You go into the endeavor with a goal, and leave with 3 metrics that help you achieve that goal. Or, you're a blind squirrel that just ordered nuts online from Ikea.

2 Open Source Software Health Metrics: Let's Go Crazy! Let's Get Nuts!

Great answers to the wrong questions are more commonplace than we would prefer because open source software work is evolving quickly and we do not yet have a list of the right questions for many specific project situations. Let's refer to questions as "metrics" now. Questions and metrics are nuts! Still a terrible slogan. Sometimes we do not know the question-metric-nut, and foraging through a forest of metrics is, if not helpful, at least a way to reduce the rising anxiety we feel when we are not sure what data helps to support our explanation of what is happening in a project ecosystem. So, if, like me and dozens of others working in and around the CHAOSS project, you are trying to achieve a goal for your project, there are two orthogonal, strategic starting points our colleague in CHAOSS, Jesus M. Gonzalez-Barahona, suggests:

  1. Goals: What are metrics going to help you accomplish?
  2. Use Cases: When you go to use metrics, what are the use cases you have? A case can be simple, ill-formed and even 'unpretty':
    (a) "My manager wants to know if anyone else is working on this project?"
    (b) "It seems like my community is leveling off? Is it? Or is it just so large now I cannot tell?"

2.1 Taking Action by Sharing Goals and Use Cases

Having a yard full of nuts to sort through can help you work toward the nuts you want. OK, the nut metaphor has gone too far. We are looking to use software, provided as a prototype and an example, to help talk through the details of the use cases you name, with you. The use cases that open source developers, foundations, community managers and others use to evaluate open source software health and sustainability metrics are probably a manageable number.

We can give you some metrics to work with quickly using the CHAOSS sponsored metrics prototyping tool Augur.

What are we trying to accomplish with metrics? With Augur? One of our goals is to make it easier for open source stakeholders to "get their bearings" on a project and understand "how things are going". We think that is most easily accomplished when comparisons to your own project over time, and to other projects you are familiar with, are readily available. Augur makes comparisons central.

2.2 Building Helpful Metrics

If you have already shared a list of repositories you are interested in with us, here's what you have:

  1. An Augur site with those repos
  2. The opportunity to look at that site and help the whole CHAOSS community know:
    (a) Which use cases particular metrics help you address
    (b) What goals you have that could be met by something like Augur, but that you cannot meet yet
    (c) Something to hate. If you've ever been to an NHL game, you know that hating the other team is how we show our team we love them. It's also a good brainstorming device.

So, OK. What do you want?

We want the opportunity to speak with you about your goals, use cases, and the failings of the tools currently at your disposal for "getting there". If you're feeling adventurous, I would like to be able to reference our conversations (anonymously) in research papers, because research papers are kind of the "code of the academic world". But that's less important.

2.3 An Augur Experiment


If you do not have a list of repositories you have already shared with us, there are a few examples here: http://www.augurlabs.io/live-examples/.

Design Goals

The version of Augur that's currently deployed has several design goals that seek to provide useful information through comparison within a project (over time) and across projects. The most fundamental metrics people are interested in include the following (a rough sketch of computing the first two appears after the list):

  • What individuals committed the most lines of code in a time period?
  • From what companies or other organizations are the individuals who committed the most lines of code in a time period?
  • Derivative of the first two: Is this changing? Did I lose anyone? Who can this project NOT afford to lose?
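As a back-of-the-envelope way to answer the first two questions without Augur, here is a small Python sketch that tallies lines added per author from `git log --numstat`. It is an assumption-laden approximation (no identity merging, merges and renames ignored), not the method Facade or Augur actually implements.

```python
import subprocess
from collections import Counter

def lines_added_by_author(repo_path, since, until):
    """Sum lines added per author email using `git log --numstat`.

    Rough approximation: merges, renames and author identity merging are
    all ignored, and binary files (shown as "-") are skipped.
    """
    log = subprocess.run(
        ["git", "-C", repo_path, "log",
         f"--since={since}", f"--until={until}",
         "--numstat", "--pretty=format:@%ae"],
        capture_output=True, text=True, check=True,
    ).stdout

    totals = Counter()
    author = None
    for line in log.splitlines():
        if line.startswith("@"):          # commit header line: author email
            author = line[1:]
        elif line.strip() and author:     # numstat line: added<TAB>deleted<TAB>path
            added = line.split("\t")[0]
            if added.isdigit():
                totals[author] += int(added)
    return totals.most_common(10)

# Example: top authors by lines added for one month in the current repo.
for email, loc in lines_added_by_author(".", "2018-04-01", "2018-05-01"):
    print(f"{email}: {loc} lines added")
```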

Projects You Care About

Figure 1 is an example from Twitter, showing an instance of Augur configured for all of the repositories in the Twitter ecosystem. When you go to http://twitter.augurlabs.io you get the list of repositories that you see in Figure 1.


Figure 1: When you follow the URL above, or your own URL, you will see a list of repositories that we have cloned and for which we have calculated all the salient, basic, individual repository information, using the technology behind "Facade", a tool written by Brian Warner. Here is that list of repositories.

Looking at my projects

When I look at the most basic data for one of my repositories, I have enough information to answer the most basic questions about it (see above). Figures 2 and 3 illustrate the Augur pages you will see at the next level of "drill down". Try clicking the months for even more information! Keep in mind this is ONLY the information for the repositories you shared with us, or the repositories that are part of one of our other live examples.

Figure 2: You can see the lines of code from the top two authors, as well as the space-inefficient Augur toolbar. Please contact me if you have tips and tricks for getting developers to be more comfortable with putting aesthetics behind utility in web page design. I will buy you a case of beer.

Figure 3 is a second image of the same page, scrolled down just far enough to see that you can look at the top ten contributors as well as the top organizational contributors. We used a list of over 500 top-level domains, as well as tech companies we were able to "guess", to start resolving even these prototypes to specific companies (a small sketch of the idea appears after the figure caption below). We did this because Amye asked us to, and we're really gunning to make Gluster have more lustre. As if that's possible.

Figure 3: A more detailed look at some of the information available on a repository by repository basis in Augur. We also show you the organizational affiliation information.
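To make the domain-guessing idea concrete, here is a minimal Python sketch of mapping committer email domains to organizations. The mapping and the generic-domain list shown are illustrative placeholders, not the actual list the prototype uses.

```python
# Illustrative only: a tiny stand-in for the much larger domain list
# described above; the real mapping used by the prototype is different.
DOMAIN_TO_ORG = {
    "redhat.com": "Red Hat",
    "twitter.com": "Twitter",
    "missouri.edu": "University of Missouri",
}
GENERIC_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}  # treated as unaffiliated

def guess_affiliation(author_email):
    """Guess an organizational affiliation from a committer's email domain."""
    domain = author_email.rsplit("@", 1)[-1].lower()
    if domain in GENERIC_DOMAINS:
        return "Independent / unknown"
    return DOMAIN_TO_ORG.get(domain, domain)  # fall back to the raw domain

print(guess_affiliation("dev@redhat.com"))      # Red Hat
print(guess_affiliation("someone@gmail.com"))   # Independent / unknown
```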

3 Explore the Rest of Augur

The focused repositories give the information which many open source folks tell us is their first line of interest when looking at their own projects. Keeping this conversation going is essential for the CHAOSS project, and for Augur's utility in helping us identify which metrics map to which use cases and goals. There's a lot here, and it might give you ideas. Also, as you go through the front end, keep in mind that all of the statistics you see represented as metrics are also available via our RESTful API (a short example of calling the API appears after the figure captions below). You can use our data to explore building your own metrics, or get an app developer to do that for you. Figure 4 provides a high-level overview of the metrics representations in Augur that are built off the GitHub API, GHTorrent and Facade's technology.

Figure 4: There's a lot here. At the top of the screen you can enter an owner and a repository name to get information about a particular repository. Each of the CHAOSS metric working groups is represented in tabs at the top of the screen (number 1). The repository you just searched for is listed below the metric category (number 2). The metric name is listed in the title (number 3), and that title corresponds with a CHAOSS metric that is linked below the graphic. These are line graphs, though other visualization styles are readily available, and the line over time is shown by (number 4). The gray area around (number 4) is the standard deviation. (Number 5) is a slider like you see on Google Finance, so you can zoom in on one period of time more closely. Finally, (number 6) has a LOT of different configuration and filtering options you can explore.
Figure 5: Here is a WAY zoomed-out overview of the Growth, Maturity and Decline metrics you might see on the Augur page. (Number 1) is where you might enter another "owner/repo" combination to compare your repository to. (Number 2) illustrates that sometimes there is no data available from the source we use for a particular metric.

Figure 6: This shows two repositories compared with each other in Augur. Does this fit any of your use cases or goals? How would you make it different? (Number 1) shows which two repositories are being compared. (Number 2) shows the key for knowing which project is which. (Number 3) points out, again, that you can see the CHAOSS definition for the metric any time you like. To the right, you can also see how .json, .csv and .svg representations of the data can be downloaded for you to make whatever use you would like of them.
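For readers who would rather script against the data than click through the front end, here is a short Python sketch of pulling a metric as JSON over HTTP with `requests`. The host, endpoint path and metric names are placeholders I made up for illustration; check the live Augur API documentation for the real routes and response shapes.

```python
import requests

# Placeholder host and route: substitute your own Augur instance and the
# endpoint paths from its API documentation; these names are assumptions.
BASE_URL = "http://your-augur-host/api"

def fetch_metric(owner, repo, metric):
    """Fetch one metric time series for an owner/repo pair as JSON."""
    response = requests.get(f"{BASE_URL}/{owner}/{repo}/{metric}", timeout=30)
    response.raise_for_status()
    return response.json()

# Example usage (repo and metric names are illustrative):
series = fetch_metric("twitter", "typeahead.js", "commits")
for row in series[:5]:
    print(row)
```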

4 Our Ask: Goals and Use Cases

Metrics use cases

What are the questions you have about your project? What metrics will help you to make clearer sense of the answer to that question in a productive way?

Give us your use cases

Walk through trying to solve the use case. Where do you get stuck? How might the use case become generalized? If you are an expert in OpenStack you can contribute … you can just describe the use case. Draw out the use cases that you see. We can ask back, "why not use metric x and y?" And the conversation will really get going!

References

  • S. Erdelez (1997) Information encountering: a conceptual framework for accidental information discovery. Taylor Graham Publishing, Tampere, Finland. Cited by: §1.

Click Here for a PDF Version of this Post That is Much Easier to Read

This post originally appeared at http://www.chaoss.community and http://www.augurlabs.io

On the Art of the Bio

Writing a personal bio is difficult because you have to talk about yourself as though you actually think you are all that and a bag of chips. I mean, we all do, right? Still, it's a weird task and I do not enjoy it. And these things are more dynamic than you would think, because what I do, especially as an academic, has to be refined for the language of a particular audience: students, colleagues, funders and family, for example. Here are a couple that I recently put together. Now it's a blog post.

If you are looking for more of a press release flavored bio, here are a few choices:

Bio 1: After a decade as a software engineer, Sean decided his calling was in research. He is presently a social computing researcher and professor of computer science at the University of Missouri. He is also a co-director and founder of their Data Science Masters program. Sean's publications focus on understanding how social technologies influence organizational, small group and community dynamics, typically combining analysis of electronic trace data from systems with the perspectives of the people whose behavior is traced. Group informatics is a methodology and ontology Sean has articulated with the aim of helping build consensus among researchers and developers for how to ethically and systematically make sense of electronic trace data. Structural fluidity, a construct Sean developed with his collaborators Peppo Valetto and Kelly Blincoe, aims to make sense of structural dynamics in virtual software organizations, and how those dynamics affect performance. Working with Josh Introne, Bryan Semaan and Ingrid Erickson, Sean is elaborating on mechanisms for identifying structural fluidity and organizational dynamics in electronic trace data using the lens of complex systems theory. His other work includes collaborations with Matt Germonprez on the Open Collaboration Data Exchange and Open Source Health metrics projects. He lives in Columbia, MO with his wife Kate, two step-daughters and a dog named Huckleberry.

Bio 2: Sean Goggins is just a guy. He writes stuff. He's selfish, but not as selfish as he used to be. He's painfully well organized, which means he has detailed lists of all the tasks he's behind on. Computer Science. Social Computing. Learning Analytics. Learning Sciences. Small Groups. Published. Teaches. Funded. Does not suffer fools well. Eats control freaks for lunch. Pulled his groin on a bike ride last Sunday. Is generally concerned about the state of the world, and has enough self-assuredness to think what he does each day could possibly make a difference. So, he's naive. But not as naive as he used to be. He likes to ride his bicycle. 2 tattoos. Father. Stepfather. Husband. Currently avoiding writing an actual bio.

Software Engineering and Data Science

People get excited about data science. Especially managers. It's instinctive. We are surrounded by data, nearly all of it overwhelming. Like the partner we dated through high school, it seems like there is something there, but it just doesn't ever seem to come together. Data science is the camping trip where we figure each other out in our deluge of data.

When you head down that road, you are overwhelmed initially by 3 factoids. First, there is SO MUCH DATA. Second, the data is SO DISORGANIZED. Third, THERE ARE SO MANY TOOLS! We go down the rabbit hole.

Data scientists are, therefore, the janitors on the scene of a massive sewage leak, standing in the workshop (the tool room). What makes data scientists successful or not: that's what managers want to know. How do I *know* this person can clean up my sewage leak? There are 2 paths:

  1. The data scientist knows your business domain, and has figured out which tools work for your mess
  2.  The data scientist has learned about all the tools, and has probably cleaned up other messes in a few assorted domains.

Conceptually, software engineering is about little more than being systematic about how you approach a project and its lifecycle. The discipline can be applied in application development, infrastructure, data science and food preparation (among a host of domains). Yeah, you can do software engineering on food. If you disagree, come over and try out my digital chicken.

I get to say I am a data scientist today because I have a Ph.D., a bunch of papers, and I have been working in "Big Data" since before somebody invented "Big Data". Some day, somebody please tell me what "Big Data" is, other than an awkward euphemism that is not helping with the gender gap in computing disciplines.

Getting beyond Ph.D.-level credibility requirements requires systematic training and a software engineering discipline around data. That's kind of what I do with my projects, which are spread across a host of GitHub organizations. Many of our repositories remain private because my teams and I continue to publish on them. If you want a peek, drop me a line. Here's a list of GitHub organizations for data science work that I operate:

  1. http://www.github.com/sociallycompute
  2. http://www.github.com/OCDX
  3. http://www.github.com/expert-patients
  4. http://www.github.com/sgoggins
  5. http://sociallycompute.io

Software engineering. Data science. Together. That's kind of a thing I do. Kind of one of the ways I maintain such a long list of projects.

