Stefan Baack

Wednesday, November 25, 2015

New blog with new address: sbaack.com

I created a new blog available at http://sbaack.com/

This site is no longer maintained.

Monday, June 29, 2015

Talk about the practices and values of civic hacking at mySociety

Last week I gave a talk at the great Data Power Conference at the University of Sheffield, which had a panel dedicated to civic hacking. This was a great opportunity to present some findings from my ongoing case study about the practices and values of civic hacking at mySociety.

Here are the slides from the presentation:

Empowerment and civic hacking

My presentation revolved around the term empowerment. mySociety aims at empowering citizens, but what does ‘empowerment’ actually mean in relation to civic hacking? Based on a content analysis (which included mySociety websites, project descriptions and blog posts) and a couple of interviews, my answer to this question is currently twofold:

1. Civic hacking aims at giving citizens a greater sense of agency by making it easier for them to engage with authority

FixMyStreet, mySociety’s website for reporting local issues to local councils, is a nice example to illustrate this point. At first, it might not be very obvious what reporting potholes, broken street lights, or dog poop has to do with empowerment. However, mySociety didn’t build FixMyStreet because they were particularly concerned with how smooth or clean the streets in their neighborhoods are. It rather has to do with “adjusting power relationships”*:

If somebody is able to report a problem with a pothole outside their house and next week it’s fixed, they have learned that engagement with authority is not futile…FixMyStreet is a gateway drug into bigger civic engagement.

To further understand this statement, it helps to look at some of the reasons why mySociety decided to build FixMyStreet back in 2007. The basic problem they wanted to address was that reporting local issues was often not an easy and straightforward process for citizens. First, citizens often didn’t know who was responsible for fixing an issue because each area in the UK has two councils with different responsibilities. Second, mySociety found that the websites provided by the local councils were not user friendly. These websites were designed to best serve the administrative processes of the councils and usually required the user to fill out a rather technical form (which often was not easy to find). With FixMyStreet, mySociety aimed at making this process as intuitive and easy as possible for citizens. As a user of the site, you basically just have to click on a map to locate the issue, give a description of the issue and send the report. Based on where you’ve clicked on the map and what type of issue you’ve reported, FixMyStreet will then forward your report to the local council that is responsible for fixing it.
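To make the forwarding step more concrete, here is a minimal sketch in Python of how a report could be routed from a map click and an issue type to the responsible council. This is not FixMyStreet's actual code: the area boundaries, council names, issue categories and the function `route_report` are all invented for illustration, and a real system would use proper administrative boundary data instead of simple rectangles.

```python
from dataclasses import dataclass

# Hypothetical mapping: which tier of council handles which kind of issue.
RESPONSIBLE_TIER = {
    "pothole": "county",
    "street light": "county",
    "dog fouling": "district",
}


@dataclass
class Area:
    """A made-up rectangular area served by two councils."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    councils: dict  # tier -> council name

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Invented example data, not real councils.
AREAS = [
    Area(
        name="North Example",
        min_lat=52.0, max_lat=52.5, min_lon=-1.0, max_lon=-0.5,
        councils={
            "county": "Exampleshire County Council",
            "district": "North Example District Council",
        },
    ),
]


def route_report(lat, lon, category):
    """Return the council responsible for this issue, or None if unknown."""
    tier = RESPONSIBLE_TIER.get(category)
    if tier is None:
        return None
    for area in AREAS:
        if area.contains(lat, lon):
            return area.councils.get(tier)
    return None


if __name__ == "__main__":
    # The user only clicks on a map and picks a category.
    print(route_report(52.2, -0.7, "pothole"))
    print(route_report(52.2, -0.7, "dog fouling"))
```

The point of the sketch is that the citizen never has to know which council is in charge; that lookup is done for them.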

In short, mySociety tried to turn reporting local issues from something that requires some time, research and energy into something that people can do along the way, and I suggest that this is essential for the meaning of empowerment in relation to civic hacking. It means giving citizens a greater “sense of agency” by developing tools that make it easier for them to engage with authority in a successful way. One consequence of this is an emphasis on a good user experience for the “citizen user”, i.e. on making processes easier from the perspective of the citizen even if that means making them less convenient for government institutions.

2. Civic hacking aims at empowering citizens by making government activities more legible to them

I suggest that the second dimension of empowerment in civic hacking is about creating a new level of legibility for citizens, which often requires access to certain types of structured data. This is where seemingly purely technical details about data structures, open standards, and formats play a major role.

A good example to illustrate this is TheyWorkForYou. This website is only possible because mySociety was able to scrape the information available on the official parliament websites in the UK in order to turn this information into structured data. ‘Structuring data’ means identifying specific pieces of information in a larger dataset and marking these pieces of information with an identifier. These identifiers can then be used to filter and analyze the data in new ways. For example, to be able to filter out speakers in parliamentary discussions, each speaker has to be marked as a speaker with an identifier. By turning information about the activities of the parliaments into structured data, mySociety is thus able to filter and organize this information in new ways. I just want to point out one aspect to illustrate my argument: the possibility to combine searches for certain keywords with email alerts. Users of TheyWorkForYou can for example search for ‘climate change’ and then sign up to get an email every time someone in parliament uses this keyword in a speech. This can make it easier to keep track of how certain issues are discussed in parliament, e.g. how different members of parliament from different parties talk about climate change.
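A small Python sketch may help to show what ‘structuring’ means in practice. It is only an illustration under simplifying assumptions, not how TheyWorkForYou is actually implemented: the transcript, the names, the `Speaker: text` line format and the functions `structure`, `speeches_by` and `alerts` are all made up for this example.

```python
import re

# Invented transcript; real parliamentary records are far messier.
RAW_TRANSCRIPT = """\
Jane Smith: We must act on climate change before the next budget.
John Doe: The honourable member ignores the cost to local councils.
Jane Smith: Councils will be supported through a dedicated fund.
"""

# Assumed line format: "<speaker>: <speech text>"
SPEECH_LINE = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+)$")


def structure(raw):
    """Turn raw text into records in which the speaker is explicitly marked."""
    speeches = []
    for line in raw.splitlines():
        match = SPEECH_LINE.match(line.strip())
        if match:
            speeches.append({
                "speaker": match.group("speaker").strip(),
                "text": match.group("text").strip(),
            })
    return speeches


def speeches_by(speeches, speaker):
    """Filter by the speaker identifier, which plain text does not allow."""
    return [s for s in speeches if s["speaker"] == speaker]


def alerts(speeches, subscriptions):
    """Map each subscriber (email -> keyword) to the speeches using the keyword."""
    result = {}
    for email, keyword in subscriptions.items():
        hits = [s for s in speeches if keyword.lower() in s["text"].lower()]
        if hits:
            result[email] = hits
    return result


if __name__ == "__main__":
    speeches = structure(RAW_TRANSCRIPT)
    print(speeches_by(speeches, "Jane Smith"))
    print(alerts(speeches, {"reader@example.org": "climate change"}))
```

Once every speech carries a speaker identifier, filtering by speaker and matching keywords for alerts become simple operations that can be repeated automatically.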

How does this create a new ‘level of legibility’? To understand this, it is important to note that parliaments in the UK already publish transcripts of speeches online, but usually as PDF files. It is possible to download and search in these PDFs, but it would then require much more time and energy to filter out specific pieces of information. Moreover, to keep track of how certain issues are discussed in parliament, it would be necessary to repeat this process regularly. Similarly to what I described in relation to FixMyStreet above, mySociety aimed at turning this process into something people can do along the way, without investing a lot of time and energy: users just need to sign up for the alerts once and then get an email from time to time. By ‘creating a new level of legibility’ I mean that empowerment in civic hacking is not just about transparency (the parliaments already made transcripts of speeches available online) but also about enabling citizens to keep track of what their government is doing in a very practical sense. In other words, it is not only about giving citizens the theoretical means of keeping track of their government by ‘somehow’ making information available, but also about giving them a greater capacity to do so by making this information more accessible and actionable for them.

Ongoing research

Please note that these are early findings on a specific aspect of my study. However, I’m excited about how rich, insightful and interesting these early steps are already and I hope to be able to do more interviews with members of mySociety in the near future!

* All quotes used in this article are taken from my interviews with members of mySociety.

Sunday, July 6, 2014

A new Style of News Reporting: Wikileaks and Data-driven Journalism

Update: I uploaded a PDF version to the Social Science Open Access Repository, available at http://nbn-resolving.de/urn:nbn:de:0168-ssoar-400253

This article was originally written and published in 2011 in the Open Access journal Cyborg Subjects. While it was included in a book release (Amazon-Link), it is no longer available online on the journal's homepage. I therefore decided to republish it here.

Abstract
The coverage of Wikileaks’ huge amounts of leaked data was a challenge for newspapers: they had to figure out how to get stories out of extensive and complex data sets and how to present their findings to readers. The result differs significantly from traditional news reporting, including illustrations, interactive web applications and reading instructions to make the material accessible. This style of news reporting is called data-driven journalism. The international interest in the leaks combined with collaborative work between newspapers from different countries made it a new trend in current journalism. A key lesson from working with this kind of material is that the way data is collected is essential for the effectiveness of the techniques used. If journalists applied this insight to their own, internal data collection process, this form of news reporting could be used on a large scale and be much more common. The coverage of Wikileaks’ material might give a glimpse of what journalism will look like in the future.

A new Style of News Reporting. Wikileaks and Data-driven Journalism
Newspapers are still struggling with the changing media environment that is undermining their traditional business model and are unsure how to make profits online (Freedman 2010). With growing commercialization, journalists tend to use new technology foremost to speed up the news production process rather than to experiment with the new possibilities or enhance quality (Phillips 2010). However, the collaboration with Wikileaks challenged traditional newspapers and forced them to think about new ways of finding and telling stories. They had to work with large and extensive data sets. To take an example, the Afghanistan War Logs consisted of about 92,000 documents written in military jargon (Rogers 2011). The obvious problem is accessibility, both for journalists who want to get a story out of the material and for readers who want to take a closer look at it. Letting journalists go through everything individually would be too time consuming, and writing about the findings in a traditional manner seemed insufficient for the coverage. The Guardian and New York Times in particular realized that early on. Tools were used to go through the data and to create visualizations and interactive web applications that made the material accessible for readers. This form of news reporting is called data-driven journalism, and Wikileaks contributed to its development as a trend.

Data-driven Journalism
Scholars and professionals have only recently started to discuss data-driven journalism. In August 2010, the European Journalism Center and the University of Amsterdam initiated the one-day event Data-driven journalism: What is there to learn? to define it and discuss possible implications. At this event, Lorenz defined data-driven journalism as “a workflow, where data is the basis for analysis, visualization and – most important – storytelling” (2010: 10). Due to the storytelling aspect, the end product is more than just a visualization of data; it also contextualizes and highlights important aspects. Bradshaw (2010) explains this data-driven workflow in more detail and distinguishes four steps: finding the data (1), interrogating data (2), visualizing data (3) and mashing data (4). Finding can involve having expert knowledge, good contacts or technical skills to gather data. The interrogation requires a good understanding of the jargon used and the wider context of the data. Visualization and mashing can involve the work of designers and/or free tools. An example is IBM’s ManyEyes, where users can easily upload and visualize data for free. As Bradshaw points out, these four steps require teamwork: “The reality is that almost no one is doing all of that” (2010). At the end of this workflow, raw data should be accessible for readers. Lorenz describes it as a process of refinement in which raw data is transformed into something meaningful: “As a result the value to the public grows, especially when complex facts are boiled down into a clear story that people can easily understand and remember” (Lorenz 2010: 12).

Data-driven journalism is not something completely new. As Rogers (2010a) shows, it can instead be considered quite old. He describes Florence Nightingale as one of the first data journalists, who already worked with visual presentations of information to tell stories in the 19th century. What really is new, however, is the media environment journalists are working in. Four aspects in particular indicate the growing importance of data-driven journalism:

The sheer amount of publicly relevant data available online. Especially in the United States and Britain, huge data sets are available in connection with the open government initiative. The problem here is the same as described above: having access is not enough without accessibility. In Britain, for example, most governmental data is released as simple and static PDF files (Stay 2010). Journalists from The Guardian and New York Times saw the potential and started to fill this gap by offering interactive tools and illustrations to add public value to the data.
The existence of free tools to handle this data, like the already mentioned ManyEyes.
The possibility to make the data accessible in an interactive way with web applications.
The possibility to save time. Time is precious for journalists, who are always under pressure to get the story out fast (see Phillips 2010). By giving access to the raw data, it is possible to involve people outside the newsroom in the process of news production through crowdsourcing, that is, the collaborative analysis by volunteers. This can save time and resources for research.

Obviously, data-driven journalism greatly benefits from the possibilities of new media. Its perception as a trend is therefore not surprising.

The role of Wikileaks for Data-driven Journalism
Is Wikileaks data-driven journalism in itself? Two arguments against this are that it does not provide visualizations and does not attempt to generate stories out of its materials (only a brief contextualization is given). Both tasks are largely left to established news media or are expected to be done by ‘users’ (see Lovink et al. 2010). With regard to the workflow of data-driven journalism, Wikileaks performs the first and second step of collecting and interrogating data without going further. A key aspect, the transformation of raw data into something meaningful to add public value, is missing. To what extent Wikileaks can be considered journalistic more generally remains open to debate. On its own it is not a form of data-driven journalism, but it is surely an important actor in the data-driven workflow nonetheless. From this perspective, Wikileaks is a source of data that needs to be ‘refined’ to add public value.

Wikileaks as a data source can be called a driving force of data-driven journalism and has contributed to its development as a trend for three main reasons. First and most obviously, data-driven journalism techniques are essential for analyzing and covering its huge amounts of leaked (raw) data, both for journalists who want to get a story out and present it to their readership and for readers who can access the material through visualizations and reading instructions. The second reason is that the leaks were interesting for an international audience. The data released by the open government initiatives in the United States and Britain were only interesting for national audiences, and there was no need for foreign newspapers to work with them. Connected to this, the third reason is the collaborative work between newspapers from different countries combined with the simultaneous release of their coverage. The coverage of the Afghanistan War Logs therefore demonstrated internationally the advantages data-driven journalism can have. However, not all of Wikileaks’ media partners were able to keep up with The Guardian and New York Times. In Germany, where the open government movement was (and still is) much weaker, Der Spiegel covered the Afghanistan War Logs in a much more ‘traditional’ way, using no interactive illustrations at all and focusing on the print version (Krebs 2010). The experience of working with huge amounts of data in Britain and the United States was clearly an advantage for the coverage and made newspapers from other countries aware of the potential. As a result, almost every media partner followed their example and offered visualizations for the second major leak, the Iraq War Logs. As Simon Rogers from The Guardian states: “Wikileaks didn’t invent data journalism. But it did give newsrooms a reason to adopt it” (Rogers 2011).

Using data-driven journalism on Wikileaks’ materials: What was there to learn?
To be more concrete about how data-driven journalism was used in connection with Wikileaks, let’s take a closer look at the Iraq War Logs and ‘Cablegate’ (focusing on The Guardian as an example).

The Iraq War Logs contained 391,832 field reports from soldiers. Since each report describes only a single incident, visualizations are extremely helpful to see patterns and get a bigger picture. Two important characteristics made it relatively easy to automatically separate those logs into categories: the standardized format and the use of a dense military jargon, giving meta-data about date, location, type of incident etc. (Matzat 2010). In other words, the data set was largely machine-readable. The Guardian concentrated on incidents where someone had died and sorted them by cause of death, who was killed (for example civilians or hostile forces), time, location etc. (Rogers 2011). Then they used Google Fusion Tables and marked every single death in Google Maps. The map was released alongside key findings from their statistical analysis (Rogers 2010b). This gave an overview of the number of people killed and further information to contextualize it (for example, most of these people were civilians). In addition, The Guardian took all incidents from a single day to create an interactive graphic (Dant et al. 2010). While a timer runs from the first to the last minute of this day, a map shows the location of each incident, gives a description of what happened and counts the total number of deaths. It also offers a link to the original report of each incident. As Lorenz described, abstract numbers were broken down into something meaningful. By visualizing a single day, you can get a better picture of the atmosphere and violence that shines through the logs. Apart from that, the fact that the material was machine-readable did not only help to create visualizations to present the news and make the material accessible for readers. The automatic separation into categories was also used to guide the selection of documents worth reading for the coverage, which can speed up the process of generating stories from the data set.
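The kind of automatic separation described here can be sketched in a few lines of Python. This is only an illustration of the principle, not the procedure The Guardian actually used: the four records and the field names (`civilian_kia`, `enemy_kia`, `friendly_kia`, `type`) are invented stand-ins for the standardized meta-data in the logs.

```python
from collections import Counter

# Invented sample records; the real logs contain 391,832 reports.
REPORTS = [
    {"date": "2006-10-17", "type": "IED explosion",
     "civilian_kia": 2, "enemy_kia": 0, "friendly_kia": 0},
    {"date": "2006-10-17", "type": "Direct fire",
     "civilian_kia": 0, "enemy_kia": 3, "friendly_kia": 1},
    {"date": "2006-10-18", "type": "Patrol report",
     "civilian_kia": 0, "enemy_kia": 0, "friendly_kia": 0},
    {"date": "2006-10-18", "type": "IED explosion",
     "civilian_kia": 1, "enemy_kia": 0, "friendly_kia": 0},
]

# Hypothetical field names for the casualty counts in each report.
DEATH_FIELDS = ("civilian_kia", "enemy_kia", "friendly_kia")


def total_deaths(report):
    return sum(report[field] for field in DEATH_FIELDS)


def fatal_incidents(reports):
    """Keep only the incidents in which someone died."""
    return [r for r in reports if total_deaths(r) > 0]


def deaths_by_category(reports):
    """Count deaths per group (civilians, hostile forces, ...)."""
    totals = Counter()
    for report in reports:
        for field in DEATH_FIELDS:
            totals[field] += report[field]
    return totals


def deaths_by_incident_type(reports):
    """Count deaths per type of incident (a rough 'cause of death')."""
    totals = Counter()
    for report in fatal_incidents(reports):
        totals[report["type"]] += total_deaths(report)
    return totals


if __name__ == "__main__":
    print(len(fatal_incidents(REPORTS)))
    print(deaths_by_category(REPORTS))
    print(deaths_by_incident_type(REPORTS))
```

Because every report follows the same format, the same few functions work on four records or on several hundred thousand, which is what makes the material manageable for a newsroom.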

Compared to the War Logs, visualizations for ‘Cablegate’ are rare. According to Matzat (2010), this is not only due to the broad geographical reference but mainly to the content of the material. While the War Logs could be categorized and visualized relatively easily due to their clear structure, the diplomatic dispatches (‘cables’) are extensive reports and complex analyses. As Rogers from The Guardian points out, their “reporters ended up with the enormous task of actually going through each cable, reading it and seeing what stories were there” (2011). Still, The Guardian created a static world map showing how many cables come from which locations and how they are classified. This may be useful to get an overview of the material, but without knowing the actual content of the cables it does not give readers better access to it. The fact that 1,083 cables have been sent from London to Washington is not interesting without knowing what is written in them. Seeing the problem, The Guardian also offers a more ‘context-rich’ interactive map. Users can click on a country and get a list of both the original cables from Wikileaks and the articles covering the content of those cables, which is a very useful tool to investigate the material. However, only a small number of cables are available on this map so far, partly due to the material and partly due to the release policy of Wikileaks (the cables have not all been released simultaneously; they continue to be released steadily in stages). For this kind of unstructured material, crowdsourcing and alternative web resources for investigating it remain an advantage of data-driven journalism. There are a couple of crowdsourcing projects or search engines for the cable releases, for example CableWiki or CableSearch (see an overview here). These resources can form the base for further visualization attempts in the future.

The coverage of the Iraq War Logs and Cablegate showed that the effectiveness of data-driven journalism techniques depends on the material at hand. For structured and machine-readable data, they are very helpful both for journalists, who are shown where to find a story in the material, and for readers, who can get access through visualizations. For more extensive and unstructured data like the diplomatic cables, visualizations are not as useful and there is no other way than reading everything individually.

First Precursor of a new Journalism?
With more and more publicly relevant data available online and a further development of visualization techniques, data-driven journalism is at least likely to become a more established form of news reporting. However, it is questionable whether such data will continue to come from Wikileaks. The recent release of the Guantánamo Bay files seems to be “very nearly the final” (Gabbatt 2011) cache of the huge data set the platform supposedly obtained from Bradley Manning. I think people who have access to such files and are willing to leak them are far from the norm. Even if Wikileaks is the initial spark for a ‘leaking culture’ (which can be assumed due to the rise of more specialized and local leaking platforms like Greenleaks), it is unlikely that leaked data with the same impact and size as Cablegate or the Iraq War Logs will be common. Apart from that, the future of open government initiatives is unclear as well, especially after the budgets for these initiatives were cut in the United States (Yau 2011). If newspapers rely solely on the success of leaks and open government, data-driven journalism may remain a niche form of news reporting.

Therefore, I would argue that the real lesson journalists can learn from the collaboration with Wikileaks is shown by Kayser-Bril et al. (2011). They suggest that media organizations should not wait for the release of other data sets and should instead further embrace the opportunities of data-driven journalism by becoming ‘trusted data hubs’ themselves. They should not only focus on handling externally produced data sets, but also develop and structure their own, internal database. Even though Kayser-Bril et al. do not refer to Wikileaks, their argument is in line with the experience with its materials because they stress that the way data is collected is essential. Basically, all content produced by journalists is already data. What has to change is the way this data is collected: it has to be made machine-readable so that journalists can quickly analyze large and complex data sets and build stories around them. Every event can be broken down into some fundamental information (latitude, longitude etc.), described in a structured manner and linked to other events in a database. As an example of the possibilities, Kayser-Bril et al. mention the crime page of a newspaper. Instead of just giving a list of articles about crime events, it could be transformed into a web application that plots the events over time, with options to sort the data by time, type of crime and location and to visualize it on a map, similar to The Guardian’s map for the War Logs.
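As a rough illustration of what such a structured, internal event database could look like, here is a minimal Python sketch of the crime page example. It is my own simplification and not taken from Kayser-Bril et al.: the `Event` fields, the sample events and the `query` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """One reported event, described by a few fundamental fields."""
    headline: str
    kind: str        # type of crime
    when: datetime
    lat: float
    lon: float


# Invented sample events.
EVENTS = [
    Event("Shop burgled on High Street", "burglary",
          datetime(2011, 3, 2, 22, 15), 51.50, -0.12),
    Event("Car stolen near station", "vehicle theft",
          datetime(2011, 3, 1, 8, 0), 51.52, -0.10),
    Event("Flat burgled in Northtown", "burglary",
          datetime(2011, 2, 27, 3, 30), 51.60, -0.05),
]


def query(events, kind=None, since=None, bbox=None):
    """Filter events by type, time and location, sorted chronologically.

    bbox is (min_lat, max_lat, min_lon, max_lon).
    """
    result = []
    for event in events:
        if kind is not None and event.kind != kind:
            continue
        if since is not None and event.when < since:
            continue
        if bbox is not None:
            min_lat, max_lat, min_lon, max_lon = bbox
            if not (min_lat <= event.lat <= max_lat
                    and min_lon <= event.lon <= max_lon):
                continue
        result.append(event)
    return sorted(result, key=lambda event: event.when)


if __name__ == "__main__":
    for event in query(EVENTS, kind="burglary"):
        print(event.when.date(), event.headline)
```

The output of such a query could feed a timeline or a map directly, which is the step from a list of articles to a web application.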

If newspapers adopt these ideas, data-driven journalism will surely be a more common and established form of news reporting that can come into use regardless of leaks or open government. Journalism could benefit on a large scale from the new possibilities for finding, telling and presenting stories demonstrated in the coverage of Wikileaks’ material. As Phillips (2010: 100) and Benson (2010: 192) point out, more important than the capabilities of new technology is the way journalists actually use it. Becoming data hubs could make them aware that they can and should use the new possibilities to improve the quality of news reporting and not only the speed of production. This would be an important step forward, initiated not least by Wikileaks.

References

Benson, Rodney (2010): Futures of the News: International Considerations and Further Reflections. In: Fenton, Natalie (ed.): New Media, Old News. Journalism & Democracy in the Digital Age. London: Sage, P. 187 – 200.
Bradshaw, Paul (2010): How to be a data journalist. http://www.guardian.co.uk/news/datablog/2010/oct/01/data-journalism-how-to-guide (last accessed 16.04.2011).
Dant, Alastair/Meek, James/Santos, Mariana (2010): Iraq war logs: A day in the life of the war. http://www.guardian.co.uk/world/interactive/2010/aug/13/iraq-war-logs (last accessed 19.04.2011).
Freedman, Des (2010): The Political Economy of the ‘New’ News Environment. In: Fenton, Natalie (ed.): New Media, Old News. Journalism & Democracy in the Digital Age. London: Sage, P. 35 – 50.
Gabbatt, Adam (2011): Guantánamo Bay files – live coverage. http://www.guardian.co.uk/world/blog/2011/apr/25/guantanamo-bay-files-live-coverage (last accessed 26.04.2011).
Kayser-Bril, Nicolas/Lorenz, Mirko/McGhee, Geoff (2011): Media Companies must become trusted Data Hubs. http://owni.eu/2011/02/28/media-companies-must-become-trusted-data-hubs-catering-to-the-trust-market/ (last accessed 27.03.2011).
Krebs, Malte (2010): Spon-Chef Rüdiger Ditz zum Blogger-Bashing. “Wir haben nicht so gut ausgesehen”. http://meedia.de/nc/details-topstory/article/wir-haben-nicht-so-gut-ausgesehen_100029332.html (last accessed 27.03.2011).
Lorenz, Mirko (2010): Status and Outlook for data-driven journalism. In: European Journalism Center: Data-driven journalism: What is there to learn? A paper on the data-driven journalism roundtable held in Amsterdam on 24 August 2010, P. 8-17. http://mediapusher.eu/datadrivenjournalism/pdf/ddj_paper_final.pdf (last accessed 22.03.2011).
Lovink, Geert/Riemens, Patrice (2010): Twelve theses on WikiLeaks. http://www.eurozine.com/articles/2010-12-07-lovinkriemens-en.html (last accessed 15.04.2011).
Matzat, Lorenz (2010): Wie Wikileaks inzwischen Transparenz versteht. http://blog.zeit.de/open-data/2010/11/29/wikileaks-embassyfiles-transparenz/ (last accessed 17.04.2011).
Phillips, Angela (2010): Old Sources: New Bottles. In: Fenton, Natalie (ed.): New Media, Old News. Journalism & Democracy in the Digital Age. London: Sage, P. 87 – 101.
Rogers, Simon (2010a): Florence Nightingale, datajournalist: information has always been beautiful. http://www.guardian.co.uk/news/datablog/2010/aug/13/florence-nightingale-graphics (last accessed 22.03.2011).
Rogers, Simon (2010b): Wikileaks Iraq: data journalism maps every death. http://www.guardian.co.uk/news/datablog/2010/oct/23/wikileaks-iraq-data-journalism (last accessed 25.03.2011).
Rogers, Simon (2011): Wikileaks data journalism: how we handled the data. http://www.guardian.co.uk/news/datablog/2011/jan/31/wikileaks-data-journalism (last accessed 16.04.2011).
Stay, Jonathan (2010): How The Guardian is pioneering data journalism with free tools. http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/ (last accessed 22.03.2011).
Yau, Nathan (2011): Data.gov in crisis: the open data movement is bigger than just a site. http://www.guardian.co.uk/news/datablog/2011/apr/05/data-gov-crisis-obama (last accessed 26.04.2011).

Friday, July 4, 2014

#asmc14 afterthoughts: Big Data and Democracy

'Big data' is usually conceived as a way to generate knowledge by analyzing ever larger and 'messier' quantities of data. The rationality behind big data is often associated with centralized control and surveillance: Grab as much (if possible: all) data there is about a phenomenon and analyze it to discover patterns and predict future behavior. Not something one would easily associate with democratic values or citizen empowerment.

However, from a historical perspective it seems that big data is the latest expression of what can be described as the "two-faced nature of quantifying society". Porter illustrates this two-faced nature when he points out that the notion of "objectivity" is

evidently required for basic justice, honest government, and true knowledge. But an excess of it crushes individual subjects, demeans minority cultures, devalues artistic creativity, and discredits genuine democratic political participation. (1995, p. 3)

Fears over excessive objectivity seem to echo our modern-day critique of big data. We could re-articulate such fears with relation to data by asking: When data is used rhetorically as "that which is given prior to argument" (Rosenberg 2013, p. 36) - as the 'factual' and indisputable basis for debate - where is the room for argument and debate when data is everywhere? On the other hand, Porter's observation also points out that "quantification was important for democratization", as Bernhard Rieder mentioned after his excellent presentation (thanks to him for pointing out Porter's book to me!). Since increased quantification can have negative and positive effects, we should not only criticize big data but also think about the conditions under which 'datafication' - the ubiquitous quantification of social life underpinning big data (van Dijck 2014) - can actually be good for democracy. Of course, this does not mean that critique is not important! My point is that we have to accept the fact that these technologies are here to stay. Thus, thinking about how to overcome the dangers of big data's modern-day practices and rationalities is valuable and important.

Looking at alternative data rationalities

I think a good starting point is to look at alternative approaches or rationalities around data that do not follow the categories and logics of big data. Therefore, I want to point out some presentations from the Social Media and the Transformation of Public Space conference that addressed alternative approaches to datafication:

In my own presentation about the Open Data movement (you can get the slides here) I argued that datafication may not only lead to "big data rationalities", but also to a spread of values and practices from the Open Source culture. This idea is based on the observation that Open Data activists take key values and practices from Open Source and apply them to new domains outside the development of software (see also Kelty 2008). For example, 'raw data' is conceived as 'source code' that should be shared openly. For activists, this implies a slightly different role of journalism and a form of political participation that to some degree resembles the 'Bazaar model' of Open Source. Such a spread of Open Source culture could lead to a re-articulation of concepts like journalism, participation and democracy - in ways that may not have seemed possible before.
Helen Kennedy's presentation Making Analytics Public: really useful analytics and public engagement (you can find both her abstract and mine here) asked whether and under which conditions (data) analytics can contribute to the public good. She argued that analytics need to become more public themselves in three ways. First, both the data and the analytical tools should be available to the public to use. Second, instead of being proprietary and black-boxed, analytics need to be open to public supervision in order to be scrutinized and debated. I think this point connects to the question whether public social media are a good idea. Moreover, Nick Couldry and Joseph Turow made a similar argument in a recently published article, warning that "the emerging culture of big data" may "erode democracy unless their hidden workings are made public and contested broadly" (2014, p. 1711). Third, Helen argues that analytics should be rethought as a more participatory process, which means that they should not only be instruments in the hands of experts but means that offer new forms of representation "by which publics can come reflexively to know and constitute themselves in new ways". In other words, datafication and analytics can be thought of as means that offer publics new ways of constituting themselves, something that could empower citizens and serve a public good.
Lonneke van der Velden's presentation Forensic devices for activism: on how activists use mobile device tracking for the production of public proof (abstract) explored how activists use the ubiquitous tracking of their activities for their own ends. She described InformaCam, a mobile phone application that can be used to store images or videos in two versions: one in which identifying meta-data (time, location etc.) is removed, and one in which it is preserved and in which one can even add information manually. This way, the application gives activists the means to produce public evidence without giving up their anonymity. On the notion of activism, I would also like to add Nafus' and Sherman's (2014) study about the Quantified Self Movement. They describe this movement as an alternative big data practice because activists appropriate the techniques and conceptions of big data while at the same time resist its rationality by emphasizing their status as individuals who do not fit into common categories. In Nafus' and Sherman's own words, they "appropriate big data’s attention to granular patterns, but resist the categories that are built into devices and into the market for data" (2014: 1791). Resembling Helen's arguments, the Quantified Self Movement asks "what it means to think of data 'as a mirror' and what kinds of reflection, learning, and personal insights might emerge" (Nafus and Sherman 2014, p. 1787).

We need more research

I think more research like this is necessary to explore what types of alternative rationalities around datafication are emerging - outside the 'big data business'. Nick Couldry has called this type of research social analytics. That is

the study of how social actors are themselves using analytics - data measures of all kinds, including those they have developed and customized - to meet their own ends. For example by interpreting the world in new ways. (Couldry 2013, at minute 47:57)

Whether datafication serves businesses and intelligence agencies more than democratic values and citizen empowerment depends on how data and analytics are utilized and distributed. Research on social analytics will help us to find out under which conditions datafication might be good for democracy.

References

Couldry, Nick (2013, November 21). A Necessary Disenchantment: myth, agency and injustice in the digital age. Public lecture, London School of Economics and Political Science. Retrieved from http://www.lse.ac.uk/newsAndMedia/videoAndAudio/channels/publicLecturesAndEvents/player.aspx?id=2120

Couldry, Nick, & Turow, Joseph (2014). Big Data, Big Questions. Advertising, Big Data and the Clearance of the Public Realm: Marketers’ New Approaches to the Content Subsidy. International Journal of Communication, 8, 1710–1726. Retrieved from http://ijoc.org/index.php/ijoc/article/view/2166

Kelty, Christopher M. (2008). Two Bits: The Cultural Significance of Free Software. Durham: Duke University Press. Retrieved from http://twobits.net/read/

Nafus, Dawn, & Sherman, Jamie (2014). Big Data, Big Questions. This One Does Not Go Up To 11: The Quantified Self Movement as an Alternative Big Data Practice. International Journal of Communication, 8, 1784–1794. Retrieved from http://ijoc.org/index.php/ijoc/article/view/2170

Porter, Theodore M. (1995). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, N.J.: Princeton University Press.

Rosenberg, Daniel (2013). Data before the Fact. In L. Gitelman (Ed.), “Raw data” is an oxymoron (pp. 15–40). Cambridge, Massachusetts; London, England: The MIT Press.

Van Dijck, José (2014). Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society, 12(2), 197–208. Retrieved from http://library.queensu.ca/ojs/index.php/surveillance-and-society/article/view/datafication

Sunday, June 22, 2014

#asmc14 afterthoughts: Thinking about public social media

Last week I attended the great Social Media and the Transformation of Public Space conference in Amsterdam. It was an exhausting, but very inspiring week! Here, I want to share some of the ideas and impressions while they are still fresh. I want to start with a question that I asked twice in two different Plenary Conversations:

Why is there no discussion about 'public' social media (in the sense of public broadcasting)?

I didn't raise this question because I think it is very realistic to have public social media any time soon, or that they would be a solution to all the problems and concerns raised about social media during the conference, but because it just struck me that there is absolutely no discussion about this idea.

During the conference, many concerns and questions addressed the commercial nature of social media and the business interests and market strategies of its providers. Bernhard Rieder's keynote about the rise of algorithmic knowing in particular made a strong argument (see his slides here): The real problem is not that Big Data acolytes promoting the power of this new paradigm are wrong, but that they might be right. Then the contrast between commercial provider interests and civic values (which are often evoked in connection with social media, for example in terms like the 'Twitter revolution') becomes even more problematic. In Bernhard's words, the danger lies in the monopolization of knowledge and a "reconfiguration of publicness according to operational goals that are geared toward profit maximization". When algorithms are powerful engines of order that produce new ways of knowing, the values and interests inscribed into them can shape publicness in many ways. It is therefore important to address these values and interests - and when we do so, it is almost unavoidable to take a normative perspective. What kind of 'public' is shaped by these providers? How do we want a 'public' to be? In what kind of society do we want to live? This led me to the question: What can we actually expect from social media when they are provided by companies that rely on advertising? From this perspective, thinking about public social media does not seem far-fetched.

Why thinking about public social media is valuable

There were some counter-arguments during the keynote, and I had an argument about it on Twitter with Axel Bruns (who did a fantastic job covering the conference on his blog). Axel is skeptical about public social media because they would probably not be able to attract enough users to become serious competitors for Facebook or other commercial platforms and would therefore remain irrelevant. In his response to my question, keynote speaker Hallvard Moe also argued that public social media are an interesting idea, but he thinks it is unrealistic that they are ever going to be built, especially in a neoliberal setting. Both are valid and good arguments of course. Still, I think it is valuable to think about public social media for at least three reasons:

1. The idea of public social media infrastructures can help us (as researchers) to think about what social media networks should actually look like when they are not based on commercial interests but on civic values and on the normative frameworks that we frequently refer to in our discussions (like Habermas' public sphere theory). How exactly could we inscribe such values and norms in algorithms and infrastructures that are supposed to support a certain form of publicness?
2. Even though it is unlikely to happen any time soon, building a public social media infrastructure could have an impact regardless of user numbers. Even if its user numbers are dwarfed by those of Facebook or other platforms, building an actually existing alternative based on civic values could have a serious impact on how social media networks are perceived.* I'm not talking here about our perception as researchers, but about a new level of awareness among people outside academia concerning the issues surrounding commercial social media platforms. The mere existence of a public social media infrastructure could then have an impact on commercial providers as well, who would be forced to somehow respond to this new perception.
3. I think arguments like "that's never going to happen" or "it's not going to be successful" both threaten to foreclose a real discussion and only think in the short term. As mentioned before, I don't think public social media infrastructures are going to be built any time soon (if ever), and even if that happens they probably won't have an immediate impact. However, I suggest that we should think about this in the long term. And from a long-term perspective, setting the initial spark and starting a real discussion about public social media could be something worthwhile.

Maybe we should think even more broadly about a public media environment, not only about public social media. Public broadcasting was born in a historically unique media environment. Who knows how and what media we will use twenty, thirty, or a hundred years from now? Thinking in the long term about a public media environment might turn out to be more flexible and successful after all. Or maybe the solution is not public media in the classic sense of public broadcasting either, but a more flexible model that includes both public funding and commercial elements?

* I'm taking inspiration here from Christopher Kelty's book Two Bits, where he argues that one of the reasons Free Software or Open Source became so successful is that it is able to speak to existing forms of power through the creation of alternative infrastructures.

Thursday, November 21, 2013

Publication of my master's thesis and outlook

After a long period of radio silence, I am finally publishing my master's thesis on the Open Data movement:

Die Open-Data-Bewegung. Das Verhältnis von Praktiken, Zielen und Selbstbild der Open Knowledge Foundation Deutschland. (The Open Data Movement: The Relationship between the Practices, Goals and Self-Image of the Open Knowledge Foundation Deutschland.)

Compared to the last blog post, in which I presented preliminary results, a number of changes have been incorporated - and of course the individual aspects are discussed in much more detail in the thesis.

The reason for my long silence and the late publication of the thesis is a rather pleasant one, though: in the meantime I got a position at the University of Groningen in the Netherlands and will keep working on the topic. Building on my master's thesis, I am now doing research on openness initiatives and journalism. The thesis will form an important foundation for this research, which is why publishing it mattered to me. Since my experiences with blogging have been very positive, I will probably continue to blog about my research in the future. At this point, thanks once again to the members of the Open Knowledge Foundation and to everyone who gave me feedback and suggestions for the thesis!

Friday, April 5, 2013

Practices, goals and self-image of the Open Knowledge Foundation

Update: My thesis is now finished and published. The results presented here were preliminary and are not identical to the final version!

With this post I want to start presenting results from my empirical work here on the blog. For my master's thesis I conducted a total of eight interviews with members of the Open Knowledge Foundation Deutschland (OKF) and analyzed them qualitatively (together with individual documents such as the Open Definition) following the Grounded Theory approach. In this approach, the various statements in the collected data are first reduced to their general core in order to form concepts and categories and to clarify their conditions and consequences as well as their relationships to one another (Krotz 2005: 175). This is then mapped in a category system that is meant to describe the object of study as precisely as possible by revealing the more abstract structures of meaning of the actors. The end result is a so-called 'theory close to the data', in other words a grounded theory.
Specifically, my master's thesis is about the relationship between the practices, goals and self-image of the OKF. Below I would like to present the preliminary results of the analysis. The quotes used come from the interviews I conducted with the members.

The relationship between practices, goals and self-image of the OKF Deutschland

What would a master's thesis about the Open Data movement be without a data visualization? So here is my current category system, first in the form of a mind map, followed by explanations:

At the center is the overarching goal of the OKF: spreading a particular principle of openness by building open infrastructures. According to this principle, openness means that no technical or legal restrictions hinder the creation, use, reuse and redistribution of knowledge by anyone for any purpose. The term 'knowledge' is understood here as a universal umbrella term for all forms of content, data and information.

In general, a number of practices for building open infrastructures can be identified, which in turn serve to achieve a particular set of goals. The practices include:

Defining openness: The meaning of openness can vary depending on context, which is why a precise definition is needed. The Open Definition created by the OKF defines the principle of openness mentioned above in detail with regard to the technical and legal prerequisites of openness. In addition, there are guidelines for the provision of knowledge, as developed in particular for Open Government Data. These include the Sunlight Foundation's ten principles for opening up government information and Tim Berners-Lee's five-star model (Dietrich 2011).
Implementing open infrastructures: This basically means implementing the defined principle of openness, i.e. creating, preparing and providing knowledge in a form that meets the defined criteria. This can happen officially through commissioned work for public authorities, but that is rather the exception. More important is the building of independent, alternative infrastructures that are developed without official support from authorities and are meant to demonstrate the advantages of open knowledge, e.g. offenerhaushalt.de. In addition, technical standards for implementing open infrastructures are developed, above all at the international level, e.g. the data management software CKAN.
Making open knowledge usable: This means developing tools that make the provided knowledge accessible and usable, for example in the form of interactive data visualizations and the provision of contextual information. The promotion of (data) intermediaries is of particular importance in this context: the aim is to expand "the pool of those who dare to somehow look into five hundred megabytes of csv […]". To this end, the OKF wants to encourage the emergence of new intermediaries, for example by building a community of 'Gesellschaftshacker' (roughly 'society hackers', see below); at the same time, the main concern is that journalism should become more 'data-driven' and 'open', which is why the OKF maintains close contact with data journalists, for example.
Lobbying/PR: This is done in the classic way by maintaining contacts with public authorities and by attending or organizing conferences. In addition, there is a network of supporters that helps to spread messages within what is, in the broadest sense, the internet policy scene in Germany (e.g. on netzpolitik.org).

The goals that are tied to the spread of open knowledge include:

More opportunities for participation: Information, according to the firm basic conviction of the OKF members, is the foundation of participation. Spreading open knowledge is meant to make it easier for citizens to get involved in political decision-making processes. The development of Open Source software seems to serve as a rough model here: self-selected participation by citizens, coordinated by the administration. However, the members do not define a precise model of participation; instead they emphasize that an administration that encourages participation will need to be very willing to experiment. So rather than the classic demand for more direct democracy, this is about a more open and flexible form of representative democracy.
More fact-based discourse: Through the spread of open knowledge, public discourse is supposed to become more 'data-based'. Because everyone has access to the raw information, "more interpretation of truth" becomes possible, which makes it harder for politicians to back up their opinions only with the "facts that suit them". Open data is supposed to become a "counterweight to pr" in that debates "are not simply based on opinions but on facts".
Better self-organization of citizens: Creating knowledge and making it usable is meant to help citizens coordinate among themselves more easily and/or to simplify their interaction with public authorities. This aspect plays a role above all in so-called civic apps, such as those promoted by stadtlandcode: "offerings, services by citizens for citizens that ultimately make communication among citizens, with citizens and with the administration easier". One example is fragdenstaat.de.
Improved accountability: Public administration is supposed to become more accountable through the spread of open knowledge. This is not only about transparency in the sense of anti-corruption, but about having to justify one's actions to the public in general, e.g. in procurement procedures.
Improved efficiency: Greater transparency is supposed to make inefficient or even redundant processes in organizations visible, which should make public administration more efficient overall.

These practices and goals guide the building of open infrastructures, which takes concrete shape in various smaller projects. In other words, the practices and goals can be laid over the OKF's individual projects like a transparency in order to see how they take concrete shape in each of them. The OKF Deutschland deals mainly (but not exclusively) with Open Government Data, i.e. open data from governments and public authorities. Here, the building of open infrastructures takes concrete shape in projects that either specialize in particular kinds of government information (e.g. offenerhaushalt.de) or regions (e.g. frankfurt-gestalten.de), or pursue overarching approaches (such as offenedaten.de). An analogy for the OKF's approach is the modular development of software, which is also very common in Open Source projects: independent, specialized projects are developed in small steps, but they remain compatible with one another through the use of shared technical standards and licenses and thus form a larger whole.

Depending on the project, the practices and goals mentioned above can also be modulated and weighted differently. In the project Open Aid, which wants to establish open infrastructures in development cooperation, 'more participation by citizens' means, more specifically, the possibility for people in the recipient countries (of development aid) to give feedback on the effectiveness and consequences of individual projects on the ground and to be better involved in the planning of development aid. Participation by citizens in the donor countries, by contrast, is viewed skeptically, since they are usually not sufficiently familiar with the situation in the recipient countries. In the context of Open Access or Open Science, more participation by citizens in turn plays a rather secondary role; here the focus is more on accountability in the academic system.

Against the background of this modular approach to establishing open infrastructures, the members of the OKF define themselves not by 'talking' but "by doing". They see themselves as intermediaries for open knowledge and regard the practical approach taken in the individual projects as an outstanding feature of the OKF. I describe the OKF members' self-image as 'doers' with the term Gesellschaftshacker. This term is the OKF Deutschland's attempt to translate the English term 'civic developer'. It makes clear that the members pursue a "practical approach to transparency" and see themselves as independent representatives of citizens' interests who simultaneously push forward and critically accompany the establishment of open infrastructures by public authorities or companies. I use the expression 'hacker' in an extended sense here, in that I also include those members who do not (or cannot) program themselves. They, too, take part in building open infrastructures and share a common 'moral and technical order':

The phrase "moral and technical order" signals both technology - principally software, hardware, networks, and protocols - and an imagination of the proper order of collective political and commercial action, that is, how economy and society should be ordered collectively (Kelty 2008: 28).

This 'social imaginary' (Taylor 2004) is embedded in the principle of openness mentioned above. The spread of this principle through the modular building of open infrastructures can therefore also be interpreted as the establishment of a particular 'moral and technical order' that is advanced in small steps through the development of concrete projects.

Uncertainties and feedback

As mentioned at the beginning, these are preliminary results. At the moment I am still unsure in particular about the term Gesellschaftshacker. On the one hand, the term 'hacker' is of course rather loaded; on the other hand, I am not sure whether all members would accept such a label. I would therefore be happy to receive feedback on this or on the descriptions here in general!

References

Dietrich, Daniel (2011): Was sind offene Daten? In: BPB Dossier Open Data. http://www.bpb.de/gesellschaft/medien/opendata/64055/was-sind-offene-daten [last accessed 26 March 2013].

Kelty, Christopher M. (2008): Two Bits. The Cultural Significance of Free Software. Durham: Duke University Press. Freely available at http://twobits.net/read/ [last accessed 26 February 2013].

Krotz, Friedrich (2005): Neue Theorien entwickeln: eine Einführung in die Grounded Theory, die Heuristische Sozialforschung und die Ethnographie anhand von Beispielen aus der Kommunikationsforschung. Köln: von Halem.

Taylor, Charles (2004): Modern Social Imaginaries. Durham, N.C.: Duke University Press.
