The following paper presents the results, as well as the technical and intellectual process, behind the organization of the workshop of the Comparative Materialities: Media, Literature, Theory Research Group | Groupe de recherche matérialités comparatives : Médias, littérature et théorie at the 2022 annual congress of the Canadian Comparative Literature Association (CCLA). Antoine Fauchié and Marcello Vitali-Rosati of the Canada Research Chair on Digital Textualities also participated in the organization of this event (see the Credits page for all contributions).
The following lines present my participation in this event and the code I designed for the workshop, in the form of a code review.
The Graph our Research workshop is a continuation of the collaborative writing workshop presented at the 2021 CCLA annual conference: it incorporates the same framework as it was developed at that conference and draws on the ideas that have emerged since then. During the first workshop, participants were invited to write collaboratively on the notepad and answer questions at the same time. The experience of writing as a collective and within the CCLA community inspired us to explore a more technical writing performance. The workshop aimed to design a space for the CCLA community to code collaboratively and, in accordance with the theme of the Comparative Materialities: Media, Literature, Theory Research Group | Groupe de recherche matérialités comparatives : Médias, littérature et théorie, to play with the research data1.
A re-run of the workshop was organized a few months before the CCLA conference by the Canada Research Chair on Digital Textualities; it allowed us to identify the main challenge of our project. The most important issue was to find a code editing and visualization tool that allowed synchronous collaboration (seeing the result of one's own and others' code at the same time). Although methods exist to code collaboratively (via the installation of a server), we wanted to avoid as much as possible a complex installation for the participants. Therefore, we designed a solution for the congress workshop in order to bypass that difficulty.
It led us to design the website Graph Ta Recherche / Graph our Research not only as a common space for structuring information, but also as a tool: Graph Ta Recherche (GTR) can be used by other communities of researchers beyond the workshop.
The main data type of the workshop had to be clearly useful to the research community, and preferably already organized in digital environments. Bibliographic references fulfilled these conditions: many online tools such as Zotero make it possible to organize, edit, export and even share them. Therefore, the workshop corpus is an open library from the bibliographic management tool Zotero.
In this workshop, references constitute not only data to play with, but are considered repositories of our identity as individual researchers and as a community of researchers. Because they title our research or embed our thinking in streams of thought and connect our research to other research, it is critical to structure these textual data semantically: it is no longer a matter of editing references as raw text with information types (title, author, edition, date, etc.) that are undifferentiated apart from the style of the word processor; it is a matter of organizing them semantically, of encoding them to give each type of information its value in the context of the reference.
This technical understanding of the researchers’ references is combined with the recognition of the media as an active component of our knowledge: as pointed out in “The medium is the message” (McLuhan 2010), and in “Matter matters” (Barad 2007), the technical environment “determines” (Kittler 1999) the constitution of our references and thus of our research community. This idea is the main thread of the workshop and of our research group.
In the workshop, we aimed to conceive a Zotero library as a collaborative space for interaction and discussion on the references it contains, and to display the links between research works, between researchers, and between data. To visualize the links that interconnect us through our research, the semantic structure of the information in the references was essential. Although the workshop involved an important learning step for some participants (both theoretical and practical), the decision to have researchers establish links between their research data by themselves is political. First, this way researchers are at the center of the thinking patterns. Second, the technical implementation of their research needs or interests is not left to private companies not involved in the academic research field.
Beyond personal use, the principle of a reference library can be approached as a collective space of knowledge, and its tools as a part of the collective knowledge. To highlight this collective dimension in the management of references, a public Zotero library, CCLA-Graph, was created specifically for the workshop, and a protocol was made available to assist participants in editing references.
The workshop was organized as follows:
Following this presentation, a series of three sessions of about 15 minutes each allowed researchers to contribute collaboratively to the library and to produce the graphs while observing the modifications brought about by the enrichment of the library. The results and the evolution of the graphs during the workshop have been documented on a dedicated page on the website: the Workshop page.
Beyond the collaborative part on Zotero, we had to design a method to produce visualizations from the references of the library, in order to emphasize the links that can be made between researchers, between references, and between their themes. To do so, the programming language Python seemed to be the best choice because its syntax is simple and its code is readable. The Python script that produces graphs from the Zotero collection was set up in a Jupyter Notebook (a web-based interactive computing platform for creating notebooks that combine, notably, live code and its visualizations), and the main steps are available in the Documentation page of the site.
I present here in detail the theoretical thinking process and the technical steps of the script that allowed users to play with the data and to automate the production of graphs, turning the script into a tool.
The first step is to fetch the information contained in our bibliography using the pyzotero library (a wrapper of the Zotero API (v3)), and especially its zotero module.
# First step: to fetch the information contained in our bibliography with pyzotero (a wrapper of the Zotero API (v3))
from pyzotero import zotero
# I provide the information of the library I want to fetch (library_id, library_type, api_key)
bibliography = zotero.Zotero(4592469, 'group', 'fCmiRRXSChwNGKoiw8lYlTKe')
# In this bibliography, I assign the variable *items* and give the parameter "limit=None" to display all the references
items = bibliography.top(limit=None)
My script can then reach the library I point it to and retrieve the data of its references. To target a particular bibliography, I provide the necessary information, namely the identifiers of the bibliography: library_id (the identifier of the library), library_type (which in this case is "group"), and api_key (the key I generate in my Zotero user profile to access the library).
I then ask it to fetch items (or individual references) from my bibliography, giving it no limit on the number of references.
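The Zotero web API returns results in pages (at most 100 items per request), so "no limit" in practice means collecting every page. The sketch below illustrates that idea offline, with a hypothetical `fetch_page(start, limit)` function and an invented `FAKE_LIBRARY` standing in for the real request. It is not pyzotero's implementation; pyzotero documents its own helpers (such as `everything()`) for the same purpose.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote library: 230 invented items.
FAKE_LIBRARY: List[Dict] = [{"data": {"title": f"Reference {i}"}} for i in range(230)]


def fetch_page(start: int, limit: int) -> List[Dict]:
    """Hypothetical request: return at most `limit` items starting at `start`."""
    return FAKE_LIBRARY[start:start + limit]


def fetch_all(fetch: Callable[[int, int], List[Dict]], page_size: int = 100) -> List[Dict]:
    """Collect every page until an empty one is returned."""
    collected: List[Dict] = []
    start = 0
    while True:
        page = fetch(start, page_size)
        if not page:
            break
        collected.extend(page)
        start += len(page)
    return collected


all_items = fetch_all(fetch_page)
print(len(all_items))
```

Whatever the mechanism, the point is the same: the script should not silently work on the first page only, since a missing reference means missing links in the graphs.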
The second step consists in reorganizing and structuring the data so that it is easy to play with.
# Second step: to structure data
# I define my Reference class (capitalized because it's a class)
class Reference:
    # I define the signature of my object (the parameters needed to instantiate it, here the entry variable)
    def __init__(self, entry):
        # I define an attribute (entry which will take the value of the variable entry)
        self.entry = entry
        # I define each attribute by retrieving the value of their corresponding keys
        # I assign them default values and modify the value if necessary
        # for the title value, I retrieve the value of the "title" key from "data"
        # then I replace the ":" in titles with "-" because the ":" make noise
        self.title = entry.get("data", {}).get("title", "No title").replace(":", "-")
        # for the author value, I get the value of the keys "firstName" and "lastName" from "data"
        # to reconstitute the names of the authors in a list
        self.author = [f"{t.get('firstName', 'No firstName')} {t.get('lastName', 'No lastName')}" for t in entry.get("data", {}).get("creators", [])]
        # for the date value, I get the value of the "date" key from "data"
        self.date = entry.get("data", {}).get("date", "No date")
        # for the type value, I get the value of the "itemType" key from "data"
        self.type = entry.get("data", {}).get("itemType", "No type")
        # for the tag value, I get the value of the "tag" key of "tags" of "data"
        self.tags = [t.get("tag") for t in entry.get("data", {}).get("tags", [])]

# I define my Tag class the same way
class Tag:
    def __init__(self, entry):
        self.entry = entry
        self.title = entry.get("data", {}).get("title", "No title").replace(":", "-")
        self.author = [f"{t.get('firstName', 'No firstName')} {t.get('lastName', 'No lastName')}" for t in entry.get("data", {}).get("creators", [])]
        self.date = entry.get("data", {}).get("date", "No date")
        self.type = entry.get("data", {}).get("itemType", "No type")
        self.tags = [t.get("tag") for t in entry.get("data", {}).get("tags", [])]

# In my bibliography, for each entry:
for entry in bibliography.top(limit=None):
    # I instantiate a Reference and assign it to the variable reference
    reference = Reference(entry)

# I instantiate an empty list
list_reference = []
# In my bibliography, for each entry:
for entry in bibliography.top(limit=None):
    # I instantiate a Reference that I add to my list of references
    list_reference.append(Reference(entry))
For example, if I wanted to produce a visualization showing all the researchers who have worked together (and therefore linking the researchers' names according to the references), this information (name of the author and title of the reference) has to be appropriately structured so that I can access it and thus establish the link.
I define here a structure for the types of information that will be useful in the graphs (here the title of the reference, the name of the author, the tag associated with the reference)2. To structure my data, I decided to proceed with a list of objects (classes like Reference are a means of bundling data and functionality together). Lists are mutable, so they can be modified. An object is a way of defining data in order to access it easily. In an object I can define each type of information (title, author, tags) as an attribute. Each object has its own identity, a signature. I decide here what goes into my Reference object, and how it is organized.
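To make this structure concrete, here is a minimal, offline sketch of how one entry becomes an object. The `sample_entry` dictionary is invented for illustration (it only imitates the nested "data" layout that the class reads), and `SimpleReference` is a reduced, hypothetical version of the class, not the workshop code itself.

```python
# Invented entry imitating the nested layout read by the class (illustration only).
sample_entry = {
    "data": {
        "title": "An invented title: with a colon",
        "creators": [{"firstName": "Ada", "lastName": "Example"}],
        "tags": [{"tag": "media"}, {"tag": "materiality"}],
    }
}


class SimpleReference:
    """Reduced, hypothetical variant of the Reference class."""

    def __init__(self, entry):
        data = entry.get("data", {})
        # Default values are used whenever a key is missing.
        self.title = data.get("title", "No title").replace(":", "-")
        self.author = [
            f"{c.get('firstName', 'No firstName')} {c.get('lastName', 'No lastName')}"
            for c in data.get("creators", [])
        ]
        self.date = data.get("date", "No date")
        self.tags = [t.get("tag") for t in data.get("tags", [])]


reference = SimpleReference(sample_entry)
print(reference.title)   # the ":" has been replaced by "-"
print(reference.author)
print(reference.date)    # no "date" key in the sample, so the default is used
print(reference.tags)
```

Each piece of information is now reachable by name (`reference.title`, `reference.tags`) instead of being buried in nested dictionaries.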
For example, if I want to technically implement the research group Comparative Materialities: Media, Literature, Theory, I can choose to model it as a list of objects where each member is an object. The Marcello Vitali-Rosati object and the Markus Reisenleitner object will not have the same signature but will have attributes (self) in common. For the attribute of hair (which would be noted self.hair), Marcello and Markus will have different values and a different organization of these values:
the attribute self.hair of the object Markus Reisenleitner will have for the key "hair color" the value "brown" and will have an organization such that the presence of the data is necessary ("hair color", {});
the attribute self.hair of the object Marcello Vitali-Rosati will have for the key "hair color" the value "blond" and will have an organization such that the absence of the data is allowed ("hair color", "No hair")3.
To sum up, objects are ways to define entities.
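The hair example can be sketched as follows. Everything here is illustrative: the `Member` class and its `profile` dictionary are hypothetical and simply transcribe the thought experiment above, with `dict.get` supplying "No hair" when the data is absent.

```python
class Member:
    """Hypothetical object modelling one member of the research group."""

    def __init__(self, name, profile):
        self.name = name
        # If no hair color is recorded, the attribute takes the default "No hair".
        self.hair = profile.get("hair color", "No hair")


group = [
    Member("Markus Reisenleitner", {"hair color": "brown"}),
    Member("Marcello Vitali-Rosati", {"hair color": "blond"}),
    Member("A member without recorded data", {}),
]

for member in group:
    print(f"{member.name}: {member.hair}")
```

The third, invented member shows the fallback at work: the object still exists and still has a `hair` attribute, even though the underlying data is missing.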
The values of the attributes are retrieved from Zotero data, so I have to know how the information is organized in the Zotero API and how its keys map to the data values: for example, book authors are labelled by Zotero as "creators". In some instances, I need to restructure the data: this is the case for the attribute self.author, where, to recreate the author’s name, I ask Python to fetch the value noted with the key "firstName" and the value noted with the key "lastName".
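A small sketch of this reconstruction, written as a standalone function. The function name `full_names` and the sample data are assumptions made for illustration. The sketch also covers creators stored in a single "name" field, which Zotero uses for institutional authors; the class shown earlier does not handle that case, so this is one possible extension rather than the workshop's code.

```python
def full_names(creators):
    """Rebuild display names from a list of Zotero-style creator dictionaries."""
    names = []
    for creator in creators:
        if "name" in creator:
            # Single-field creator (for example an institution).
            names.append(creator["name"])
        else:
            first = creator.get("firstName", "No firstName")
            last = creator.get("lastName", "No lastName")
            names.append(f"{first} {last}")
    return names


# Invented sample imitating the "creators" list of one reference.
sample_creators = [
    {"creatorType": "author", "firstName": "Karen", "lastName": "Barad"},
    {"creatorType": "author", "name": "An Invented Research Collective"},
    {"creatorType": "editor", "lastName": "Example"},
]
print(full_names(sample_creators))
```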
I then regroup the different objects (here Reference and Tag) into a list (list_reference), which is the structure that will allow me to query my database according to the types of information for each entry: I can get the value for the attribute I defined as title; in other words, I can play with the data.
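Once the objects are in a list, "playing with the data" amounts to ordinary list operations. The sketch below uses a hypothetical `Ref` named tuple and invented sample records in place of the real list of references, so that it runs without a Zotero connection.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the Reference objects (same attribute names).
Ref = namedtuple("Ref", ["title", "author", "tags"])

list_reference = [
    Ref("Title A", ["Ada Example"], ["media", "code"]),
    Ref("Title B", ["Ada Example", "Bob Sample"], ["media"]),
    Ref("Title C", ["Bob Sample"], ["theory"]),
]

# All titles in the library.
titles = [ref.title for ref in list_reference]

# Titles of the references carrying a given tag.
media_titles = [ref.title for ref in list_reference if "media" in ref.tags]

# How many references each tag is attached to.
tag_counts = Counter(tag for ref in list_reference for tag in ref.tags)

print(titles)
print(media_titles)
print(tag_counts.most_common())
```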
The third step is to set up the graph model. As I did in the first step for the Python library allowing me to work with Zotero data, I import here the graphviz library, which allows me to build graphs. The graphs will be automatically produced from my previously structured data and will be updated according to the contributions of the participants to the Zotero library, but first I need to define a graphical framework for the visual rendering of these graphs. According to the type of information and its layout, I have established two styles here:
# Third step: to build the graph
# I import the library for graphs
import graphviz
# I define the visual charter of my graph
# First style
network = graphviz.Digraph(
    filename="MyTitle",
    node_attr={
        'color': 'lightsalmon',
        'style': 'filled',
        'shape': 'doublecircle',
        'fontname': 'Arial',
        'fontsize': '13.0',
        'margin': '0.05',
        'fixedsize': 'margin',
    },
    edge_attr={
        'arrowhead': 'none',
        'style': 'filled',
        'color': 'deeppink',
        'fontname': 'Arial',
        'fontsize': '10.0',
    },
)
# Second style
network = graphviz.Digraph(
    filename="MyTitle",
    node_attr={
        'color': 'mediumturquoise',
        'width': '2.5',
        'style': 'filled',
        'shape': 'signature',
        'margin': '0.05',
        'fixedsize': 'margin',
        'fontname': 'Courier New',
        'fontsize': '12.0',
    },
    edge_attr={
        'arrowhead': 'none',
        'style': 'filled',
        'color': 'tomato',
        'fontname': 'Courier New',
        'fontsize': '10.0',
    },
)
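Behind the scenes, the graphviz library describes a graph in the DOT language, where the node and edge attributes become default attribute statements. The sketch below writes such a description by hand with plain string formatting, to show what a style amounts to. The `to_dot` helper is hypothetical and deliberately simplified (it does not escape quotes inside names), and it is not how the library itself is implemented.

```python
def to_dot(edges, node_attr, edge_attr):
    """Return a DOT description for (node1, node2, label) triples."""

    def attrs(mapping):
        return " ".join(f'{key}="{value}"' for key, value in mapping.items())

    lines = ["digraph {"]
    lines.append(f"    node [{attrs(node_attr)}]")
    lines.append(f"    edge [{attrs(edge_attr)}]")
    for node1, node2, label in edges:
        lines.append(f'    "{node1}" -> "{node2}" [label="{label}"]')
    lines.append("}")
    return "\n".join(lines)


# Invented edge, styled with a subset of the first visual charter.
dot_source = to_dot(
    edges=[("media", "code", "Title A")],
    node_attr={"color": "lightsalmon", "style": "filled", "shape": "doublecircle"},
    edge_attr={"arrowhead": "none", "color": "deeppink"},
)
print(dot_source)
```

Seen this way, a visual charter is a short list of defaults declared once at the top of the graph, which every node and link then inherits.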
The last step of the script concerns the architecture of the graphs: automating the visualizations according to the structure of the data that has been set. Currently, six graph architectures have been established based on the three main data types (author, title, tag) to show the links between researchers, between themes or between references according to one of these three types. Each graph has its own relevance: the Author by title graph will show as clusters the researchers who have collaborated in the publication of a reference, the Title by tag graph will show references that share the same theme, the Author by tag graph will also show the cluster of researchers around themes, etc.
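The count of six follows from choosing one data type for the nodes and a different one for the link labels. A quick way to enumerate the combinations (the naming format is only illustrative):

```python
from itertools import permutations

data_types = ["author", "title", "tag"]

# Every ordered pair of two different types: (nodes, link label).
architectures = [f"{nodes} by {label}" for nodes, label in permutations(data_types, 2)]

for name in architectures:
    print(name)
```

Pairs such as "title by title" never appear, because a permutation does not repeat an element.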
# Finally: the graph architecture with our data
# (tags_by_title is assumed to be a graphviz.Digraph created with one of the styles above)
# To avoid creating the same link several times
# I instantiate an empty set()
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the tags of reference1 (tag1) and the tags of reference2 (tag2)
        for tag1 in reference1.tags:
            for tag2 in reference2.tags:
                title1 = reference1.title
                # if title1 is the same as reference2 title and tag1 is different from tag2
                if title1 == reference2.title and tag1 != tag2:
                    # then I instantiate edge which is a tuple
                    # (immutable sequence, i.e. once created, it cannot be changed)
                    # of the following elements, sorted:
                    # tag1, tag2, title1
                    edge = tuple(sorted((tag1, tag2, title1)))
                    # if edge is not a link already in my already_existing_links set
                    if edge not in already_existing_links:
                        # I create the link in my tags_by_title graph with
                        # tag1 as module1 or node1
                        # tag2 as module2 or node2
                        # title1 as the label of the link
                        tags_by_title.edge(tag1, tag2, label=title1)
                        # I add the link (edge) to my set
                        already_existing_links.add(edge)
# as a precaution, I display links that make up my graph
# I start again from an empty set for this graph
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the list of tags of reference1 with the tag1 loop variable
        for tag1 in reference1.tags:
            # if the title of reference1 is different from the title of reference2
            # and the tag of reference1 is present in the list of tags of reference2
            if reference1.title != reference2.title and tag1 in reference2.tags:
                # then I instantiate edge which is a tuple
                # (immutable sequence, i.e. once created, it cannot be changed)
                # of the following elements, sorted:
                # title of reference1, title of reference2, tag of reference1
                edge = tuple(sorted((reference1.title, reference2.title, tag1)))
                # if edge is not a link already in my already_existing_links set
                if edge not in already_existing_links:
                    # I create the link in my title_by_tag graph with
                    # the title of reference1 as module1 or node1
                    # the title of reference2 as module2 or node2
                    # the tag of reference1 as the link label
                    title_by_tag.edge(reference1.title, reference2.title, label=tag1)
                    # I add the link (edge) to my set
                    already_existing_links.add(edge)
# as a precaution, I can display the relationships that make up my graph
# I start again from an empty set for this graph
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the tags of reference1 (tag1), the tags of reference2 (tag2)
        # and the authors of reference1 (author1)
        for tag1 in reference1.tags:
            for tag2 in reference2.tags:
                for author1 in reference1.author:
                    # if the author of reference1 is in reference2 and tag1 is different from tag2
                    if author1 in reference2.author and tag1 != tag2:
                        # then I instantiate edge which is a tuple
                        # (immutable sequence, i.e. once created, it cannot be changed)
                        # of the following elements, sorted:
                        # tag1, tag2, author1
                        edge = tuple(sorted((tag1, tag2, author1)))
                        # if edge is not a link already in my already_existing_links set
                        if edge not in already_existing_links:
                            # I create the link in my tag_by_author graph with
                            # the tag1 as module1 or node1
                            # the tag2 as module2 or node2
                            # the author1 as the label of the link
                            tag_by_author.edge(tag1, tag2, label=author1)
                            # I add the link (edge) to my set
                            already_existing_links.add(edge)
# as a precaution, I can display the relationships that make up my graph
# to visualize
#tag_by_author
# to download
tag_by_author.view()
# I start again from an empty set for this graph
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the authors of reference1 (author1), the authors of reference2 (author2)
        # and the tags of reference1 (tag1)
        for author1 in reference1.author:
            for author2 in reference2.author:
                for tag1 in reference1.tags:
                    # if tag1 is in tags of reference2 and author1 is different from author2
                    if tag1 in reference2.tags and author1 != author2:
                        # then I instantiate edge which is a tuple
                        # (immutable sequence, i.e. once created, it cannot be changed)
                        # of the following elements, sorted:
                        # author1, author2, tag1
                        edge = tuple(sorted((author1, author2, tag1)))
                        # if edge is not a link already in my already_existing_links set
                        if edge not in already_existing_links:
                            # I create the link in my author_by_tag graph with
                            # the author1 as module1 or node1
                            # the author2 as module2 or node2
                            # the tag1 as the link label
                            author_by_tag.edge(author1, author2, label=tag1)
                            # I add the link (edge) to my set
                            already_existing_links.add(edge)
# as a precaution, I can display the relationships that make up my graph
# I start again from an empty set for this graph
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the authors of reference1 (author1) and the authors of reference2 (author2)
        for author1 in reference1.author:
            for author2 in reference2.author:
                title1 = reference1.title
                # if the title1 is the same as the title of reference2 and author1 is different from author2
                if title1 == reference2.title and author1 != author2:
                    # then I instantiate edge which is a tuple
                    # (immutable sequence, i.e. once created, it cannot be changed)
                    # of the following elements, sorted:
                    # author1, author2, title of reference1
                    edge = tuple(sorted((author1, author2, title1)))
                    # if edge is not a link already in my already_existing_links set
                    if edge not in already_existing_links:
                        # I create the link in my author_by_title graph with
                        # the author1 as module1 or node1
                        # the author2 as module2 or node2
                        # the title of reference1 as the link label
                        author_by_title.edge(author1, author2, label=title1)
                        # I add the link (edge) to my set
                        already_existing_links.add(edge)
# as a precaution, I can display the relationships that make up my graph
# I start again from an empty set for this graph
already_existing_links = set()
# I go through my reference list once with the reference1 loop variable
for reference1 in list_reference:
    # I go through my reference list a second time with the loop variable reference2
    for reference2 in list_reference:
        # I go through the list of authors of reference1 with the author1 loop variable
        for author1 in reference1.author:
            # if the title of reference1 is different from the title of reference2
            # and the author of reference1 is present in the list of authors of reference2
            if reference1.title != reference2.title and author1 in reference2.author:
                # then I instantiate edge which is a tuple
                # (immutable sequence, i.e. once created, it cannot be changed)
                # of the following elements, sorted:
                # title of reference1, title of reference2, author of reference1
                edge = tuple(sorted((reference1.title, reference2.title, author1)))
                # if edge is not a link already in my already_existing_links set
                if edge not in already_existing_links:
                    # I create the link in my title_by_author graph with
                    # the title of reference1 as module1 or node1
                    # the title of reference2 as module2 or node2
                    # the author of reference1 as the link label
                    title_by_author.edge(reference1.title, reference2.title, label=author1)
                    # I add the link (edge) to my set
                    already_existing_links.add(edge)
# as a precaution, I can display the relationships that make up my graph
# to visualize
title_by_author
# to download
title_by_author.view()
I designed a graph architecture script for each type. Each script starts with an empty set (already_existing_links) that will index the links between the data and will serve as a reference to avoid duplicate links in the graph.
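The role of this index can be illustrated in isolation. Because the nested loops meet every pair twice (A with B, then B with A), the link is normalized before being looked up. The sketch below is a simplified variant that sorts only the two endpoints and keeps the label separate; this is a variation on, rather than a copy of, the tuple built in the script, and the candidate links are invented.

```python
already_existing_links = set()
created = []

# Invented candidate links: (node1, node2, label).
# The second one repeats the first in reverse order.
candidates = [
    ("media", "code", "Title A"),
    ("code", "media", "Title A"),
    ("media", "theory", "Title B"),
]

for node1, node2, label in candidates:
    # Sorting the endpoints makes (A, B) and (B, A) the same key.
    key = (tuple(sorted((node1, node2))), label)
    if key not in already_existing_links:
        already_existing_links.add(key)
        created.append((node1, node2, label))

print(created)
```

Keeping the label outside the sorted part means a node name and a label can never be confused with each other, which sorting all three values together does not strictly guarantee.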
To give an overview of the script logic, I will describe in more detail the Tag by title graph, which shows the tags linked according to the references they are associated with.
The script takes the form of a loop (a series of actions that the script repeats over all the data). Assuming that a link is bidirectional here (that it involves two elements), I structure my loop according to two references: reference1 and reference2 (the particular identity of these references does not matter: since it is a loop, my script will repeat the actions between all the references). For each reference, I focus on the tags: tag1 and tag2. Then comes the condition (if) that allows me to define the link: if the title of reference1 associated with tag1 is the same as the title of reference2 associated with tag2, but the name of tag1 is different from the name of tag2, I have a link. In other words, I define a link if two different tags are associated with the same reference. In my graph, this will result in two nodes (one containing the name of tag1, one containing the name of tag2) and a link whose label will be the title of the common reference. To avoid duplicates, before creating the link, the script has to check whether it is already present in the set instantiated at the very beginning: if the link is present, the script does not create it; if the link is not present, the script creates it and indexes it in the set.
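The same result can be reached more directly by pairing the tags of each reference, which avoids comparing every reference with every other one. This is an alternative sketch, not the script used in the workshop: it assumes that titles are unique in the library (the nested loop would also join the tags of two distinct entries sharing a title), and the `Ref` records are invented.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical stand-in for the Reference objects.
Ref = namedtuple("Ref", ["title", "tags"])

references = [
    Ref("Title A", ["media", "code", "theory"]),
    Ref("Title B", ["media"]),
    Ref("Title C", ["code", "media"]),
]

edges = []
for ref in references:
    # Every unordered pair of distinct tags attached to the same reference.
    for tag1, tag2 in combinations(sorted(set(ref.tags)), 2):
        edges.append((tag1, tag2, ref.title))

for edge in edges:
    print(edge)
```

Since `combinations` yields each pair once, no index of existing links is needed in this variant; a reference with a single tag (here "Title B") simply produces no link.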
The other scripts follow the same logic with some differences due to the data types.
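One way to see what the six scripts share: each of them links two nodes whenever they meet under a common label. The sketch below expresses that with a single hypothetical function, `edges_by`, which groups node values by label value and then pairs them. It is a reformulation under that assumption, using invented records, and not the code that runs on the site.

```python
from collections import defaultdict, namedtuple
from itertools import combinations

# Hypothetical stand-in for the Reference objects (same attribute names).
Ref = namedtuple("Ref", ["title", "author", "tags"])


def values(ref, kind):
    """Return the values of one data type for a reference, always as a list."""
    if kind == "title":
        return [ref.title]
    return list(ref.author if kind == "author" else ref.tags)


def edges_by(references, nodes, label):
    """Link every pair of distinct `nodes` values that share a `label` value."""
    groups = defaultdict(set)
    for ref in references:
        for label_value in values(ref, label):
            groups[label_value].update(values(ref, nodes))
    return sorted(
        (node1, node2, label_value)
        for label_value, members in groups.items()
        for node1, node2 in combinations(sorted(members), 2)
    )


references = [
    Ref("Title A", ["Ada Example", "Bob Sample"], ["media"]),
    Ref("Title B", ["Cy Placeholder"], ["media", "code"]),
]

print(edges_by(references, nodes="author", label="title"))
print(edges_by(references, nodes="title", label="tag"))
print(edges_by(references, nodes="author", label="tag"))
```

With this formulation, the differences between the graphs reduce to two parameters: which data type becomes the nodes and which becomes the label.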
The last lines of the script make it possible either to download the image produced from the graph or to visualize it in the page.
What was initially intended to be a single experiment in the context of a workshop has gradually turned into a real research tool. Indeed, the Python script had been implemented in the platform in order to produce a results page showing the graph and the code that produced it. It was therefore quite simple to turn the information of the Zotero collection to be fetched into values that the user could provide. The Graph our Research code that was designed for the Zotero library CCLA-Graph could then be used to produce graphs for other Zotero libraries. The users of the platform may now fill in the information of their Zotero library to produce personalized graphs or to create other research networks for other communities4.
In the tool environment, an alert page has been planned in the event that the user does not properly fill in the information fields on the Zotero library. If the user selects the same type of data twice (Title by Title for example), the results page will lead to a bug page that is part of the digital culture: the background image of this page is a photograph of the "first actual case of bug being found" in the history of code and computer science by Grace Hopper (Harvard University, September 9, 1947), who was a computer scientist on the Harvard Mark II project.
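The two checks described here can be sketched as a small validation function. The function name, the messages and the exact rules are assumptions made for illustration (the site's actual implementation is not shown in this paper); the only Zotero-specific fact relied on is that a library is either of type "user" or "group".

```python
DATA_TYPES = {"author", "title", "tag"}


def check_request(library_id, library_type, nodes, label):
    """Return a list of problems; an empty list means the request can be processed."""
    problems = []
    if not str(library_id).isdigit():
        problems.append("library_id should be a number")
    if library_type not in {"user", "group"}:
        problems.append("library_type should be 'user' or 'group'")
    if nodes not in DATA_TYPES or label not in DATA_TYPES:
        problems.append("unknown data type")
    elif nodes == label:
        # Same type selected twice (e.g. Title by Title): the "bug page" case.
        problems.append("the two data types must be different")
    return problems


print(check_request("4592469", "group", "author", "tag"))
print(check_request("", "group", "title", "title"))
```

Returning a list of problems rather than stopping at the first one lets the alert page report everything the user needs to correct in a single pass.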
Our play consisted in revealing the links among our research data in order to conceive of references as sources of unsuspected knowledge networks and community identities, and in designing a space where the code, or the technical aspect of writing, is an active principle for the design and the visual implementation of a theory by a community of researchers.
Barad, Karen Michelle. 2007. Meeting the universe halfway: quantum physics and the entanglement of matter and meaning. Durham: Duke University Press.
Kittler, Friedrich A. 1999. Gramophone, film, typewriter. Writing science. Stanford, Calif: Stanford University Press.
McLuhan, Marshall. 2010. Understanding media: the extensions of man. Repr. Routledge classics. London: Routledge.
The design of the idea was described in a post on my personal blog: Graph ta recherche.
It would be possible and conceivable in the future to extend this structuring work to other types of information of the Zotero reference (such as the date, the type of publication, the edition, the language of the publication).↩
The logic here is as follows: if there is no identified hair color then there is no hair.↩
Note that for now GTR only works on libraries that are public on Zotero.↩