Open Science Project: An Analysis of Visibility and Sentiment of Desktop Linux on the Public Internet
By James Mawson, August 2020
Maybe it’s different behind closed doors.
But from what we see of the public conversation around driving adoption for desktop Linux, across blogs, video blogs, podcasts and social media, very little of it is informed by any real data.
About the best we have are desktop OS statistics from outfits like netmarketshare. Beyond this, everything is reasoned from personal observations, anecdotes and outright speculation.
It’s of course quite healthy to notice what’s right in front of you and form a view. The problem is that when this is all you have to draw on, it’s always going to be skewed by what’s visible to you in particular.
And with the most vocal parts of the Linux community tending to be very similar to each other in their relationship to technology, these perspectives probably skew in similar ways.
It’s Actually Kinda Crazy
What’s totally wild about this is that the Linux world already knows there’s a better way.
I mean, no admin or dev who’s been around for more than five minutes thinks that a brilliant hypothesis is all it takes for great software or systems. You only know if you’re on to something after research, experimentation and testing.
It’s only when it comes to the crucial challenge of winning hearts and minds that this attitude all goes out the window.
Which is why we want to start turning this around, with a proper audit of desktop Linux’s visibility on the public internet.
We want to use open source tools like Scrapy and public datasets like the Common Crawl to get some overall measure.
Why Do We Want to Study Visibility and Sentiment in Particular?
Unless you’re in the tiny minority who already uses desktop Linux, you probably have little to no contact with what it’s really like.
Non-users instead form their impression of the software from a context of reputation and perception that surrounds it. Online media is a very large part of this, so let’s have a look at what’s really going on.
Let’s see if we can figure out these things:
1. How visible is desktop Linux on the public internet?
2. What is the prevailing sentiment around desktop Linux on the public internet?
3. How is this changing over time?
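To make those three questions a little more concrete, here is a minimal sketch of what measuring them could look like, not the project’s actual method. It assumes we already have a list of pages as (date, text) pairs, for example pulled from a crawl. The keyword pattern, the POSITIVE and NEGATIVE word lists and the function names are all hypothetical placeholders. A real study would need a far better way to detect desktop coverage and to score sentiment.

```python
import re
from collections import defaultdict

# Hypothetical keyword pattern and word lists, for illustration only.
DESKTOP_LINUX = re.compile(
    r"\b(desktop linux|linux desktop|linux on the desktop)\b", re.IGNORECASE
)
POSITIVE = {"great", "love", "fast", "stable", "easy"}
NEGATIVE = {"broken", "hate", "slow", "buggy", "hard"}


def sentiment_score(text):
    """Naive lexicon score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def summarise(pages):
    """pages: iterable of (iso_date, text). Returns per-month statistics."""
    buckets = defaultdict(lambda: {"pages": 0, "mentions": 0, "scores": []})
    for iso_date, text in pages:
        bucket = buckets[iso_date[:7]]  # group by YYYY-MM
        bucket["pages"] += 1
        if DESKTOP_LINUX.search(text):
            bucket["mentions"] += 1
            bucket["scores"].append(sentiment_score(text))

    summary = {}
    for month, b in sorted(buckets.items()):
        scores = b["scores"]
        summary[month] = {
            # Question 1: share of pages that mention desktop Linux
            "visibility": b["mentions"] / b["pages"],
            # Question 2: mean sentiment of the pages that mention it
            "sentiment": sum(scores) / len(scores) if scores else None,
        }
    # Question 3: comparing months in this dict shows change over time
    return summary
```

Calling summarise() on a few months of pages gives one visibility figure and one sentiment figure per month, which is the sort of trend line the third question asks for.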
This Will Be All Out in the Open
We want to design, plan and conduct this study as an open science project. All of our raw data will be available for review, criticism and further analysis. All steps of the scientific process, and all tools and data used, should be accessible on our GitHub account.
“Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process” (Michael Nielsen).
Open Science is defined by six principles, so that the individual steps and the results of the scientific process remain open. The first four principles are described by Kraker (2011) in “The Case for an Open Science in Technology Enhanced Learning”. Open Peer Review and Open Educational Resources are two further principles of Open Science.
Open Methodology: All methods and the entire process will be documented
Open Source: Only FOSS is used. Anything new we develop is published as FOSS in turn.
Open Data: All raw and aggregated data are published for free use
Open Access: All publications are made openly accessible
Open Peer Review: The Peer Review is kept open in a transparent and comprehensible manner to ensure maximum quality
Open Educational Resources: Free and open materials are used for education and teaching
“Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).”
We’re Looking for Help
Our core team comes from a mix of academic, technical and writing backgrounds. So we already have a good start with research and data, writing code, and communicating our work.
But if anyone out there is interested in adding to this to make it even better.. well, that makes it even better, right?
Here’s where we’d love more people involved.
Help to design the project: How exactly should we go about this? Which language markets do we look at? What are the best ways to separate out desktop related coverage from servers and embedded systems? How should we analyse sentiment?
Help to implement the project: Once we’ve figured out exactly what we want to do, we’ll need to write some code.
Help with server power: If anyone out there has some spare server resources we could use to host this, that would be amazing. We’ll be sure to credit you prominently in all of our publicity activity.
Help to promote the project and its findings: At this early stage any help to share this blog post around the Linux community will be greatly appreciated. Then, after we’ve gathered the data, we’d love help sharing the findings, along with supporting infographics and other material.
If you’re a blogger, journalist, podcaster or similar, and you’re interested in following along with this work, please feel welcome to drop in too!
Why This Matters
If we want our best chance at solving the adoption problem, we should have a proper measure of the challenge we’re facing.
The visibility of desktop Linux is a big part of that.
This is a great opportunity for us to learn, grow and better inform the projects that form the desktop Linux ecosystem.
2020-08-02 09:13 +0200