Grooming data: Module 3 of HIST 3814o

This week in our course we considered what it means for historians to use data and adapt it to their purposes.  We also learned techniques to do that, which I will touch on below.

Yet one of the biggest learning opportunities came from a productive failure: our course's technical environment, DHBox, became unavailable. DHBox, which we access through our university's virtual private network, has been a marvel to use. Even so, this virtual environment ran into trouble at the start of the week when it became encumbered with too many files from its users.

The outage presented a great learning opportunity for students to adapt to. Mac users were able to use the Terminal on their own machines to replicate much of DHBox. For learning regular expressions ("regex"), a sophisticated text editor such as Notepad++ could also stand in.

Unix running on Windows. Very cool.

Sarah Cole used the DHBox outage as an opportunity to set up her Windows machine to run Cygwin, which provides a Unix-like command-line environment. This is fantastic; I never thought I would see Unix running on Windows. I had a similar experience: Windows 10 can run Ubuntu. With this installed, I was able to use my Windows computer in a whole new way. I would not have thought to look for this if DHBox had been available.

Without going on at length, the outage highlighted not only the dependence digital historians have on technology, but also how we must build technical resiliency into our projects. We should plan for technical failure. Despite losing DHBox this week, our class was able to move ahead because alternatives had been considered in advance and we were able to use them.

RegEx.

I learned a lot from the exercise on regular expressions. I had used regex a few times before, but I never really understood it. Using it was more like incantation, a spell to make data look different. Most of the time I need to understand what the statements in a program mean before I use them; it is unwise not to. This week I finally started to get a grasp of regex. One thing that helped me was making a copy of each file as I worked through each step of the exercise. This simple habit let me easily restart a step if I made a mistake.
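
To illustrate the kind of spell that finally started making sense to me (a minimal sketch in Python with made-up sample lines, not the exercise's actual data), a regular expression can capture pieces of a line and reorder them:

import re

# Made-up sample lines, not the actual exercise data
lines = ["Blackadar, Jeff", "Cole, Sarah"]

# Capture the surname and the given name, then swap them
for line in lines:
    print(re.sub(r"^(\w+), (\w+)$", r"\2 \1", line))
# Prints: Jeff Blackadar / Sarah Cole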

OpenRefine.

Having spent a lot of time in previous years correcting data in spreadsheets and databases, I was impressed with OpenRefine's capabilities and ease of use. This week I only scratched the surface of what OpenRefine can do, but it already seems like an indispensable tool for working with structured data in columns and rows.
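
To give a sense of what its clustering feature does (a rough Python sketch of the "fingerprint" idea behind OpenRefine's key-collision clustering; the sample values are invented):

# Variant spellings collapse to the same normalized key,
# which is how key-collision clustering groups them for cleanup.
def fingerprint(value):
    tokens = value.lower().replace(",", " ").split()
    return " ".join(sorted(set(tokens)))

for v in ["Shawville, QC", "shawville qc", "QC Shawville"]:
    print(repr(v), "->", fingerprint(v))
# All three produce the same key: 'qc shawville'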

Wikipedia API.

While thinking about our final project for the course, I found out Wikipedia has an API, thanks to the example here. This is something I would like to follow up on and use for DH work in the future.
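
As a minimal sketch of what querying it might look like (using the MediaWiki search endpoint; the search term is only an illustration):

import requests

# Ask Wikipedia's public MediaWiki API for articles matching a term
response = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "list": "search",
        "srsearch": "Shawville Quebec",  # illustrative search term
        "format": "json",
    },
)
for result in response.json()["query"]["search"]:
    print(result["title"])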

Last word.

This week I again learned a tremendous amount, not only from the exercises but from the chance to follow some ideas the course challenged me to think about. DHBox was down, so I used Ubuntu on my Windows PC instead. My OCR results for an exercise from the previous week were poor; looking for alternatives led me to try OCRFeeder. Challenges made me curious to see how others in the class met them, and I learned a lot from that as well.

Data is Messy

A good long hike begins with preparation, a plan, and a map. I knew this before I was soaked by a cold downpour on Mount Washington. Part of preparation is knowing things will go wrong. Despite having a map, you may get lost or the trail may be impassable. A weather forecast is fine until a thunderstorm blows in. You might find someone who is injured or needs water. Later, you'll want to tell your friends what to see and what to avoid if they follow in your footsteps.

Hiking outside is wonderful, but it can be… treacherous.

(Image: text from a story about the 1998 ice storm, Shawville Equity, 7 January 1998.)

Wrangling data is rewarding but it is messy.  It can be treacherous as well. To wrangle data, the digital historian needs to prepare, plan and map out where they are going in case they need to backtrack. That was something I learned this week.

Less mess with better OCR?

One example of messy data is the OCR'd text of the Shawville Equity. As HIST3814o classmate Sarah Cole posted in her blog this week, the OCR of the Equity reads text horizontally across columns, making the task of isolating the text of a particular news story laborious. It is too bad that the "C" in OCR does not stand for context.

OCRFeeder looks like a promising tool for doing OCR with context when needed. It has a graphical interface for laying out individual stories with rectangles so that they can be processed in context, and it works with PDFs directly. I found it a challenge to install, though; notes about this are in my fail log for this week. Speaking of failures, I only found OCRFeeder because my personal results using command-line Tesseract were not usable and I wanted a better tool. OCRFeeder uses Tesseract for OCR too, so it must be using it much better than I was: a productive fail.

The user interface of OCRFeeder.

Gathering data with a crowd is messy.

Dr. Shawn Graham, Guy Massie, and Nadine Feuerherm's experience with the HeritageCrowd Project showed both the great potential reward and the complication of crowdsourcing history. The HeritageCrowd Project wanted to tap the large reservoir of local historical knowledge possessed by the residents of the Pontiac region of the Upper Ottawa Valley. However, the response of the crowd that held this valuable information was complicated, even messy. Some people misunderstood what the website was for or how to contribute to it. The rate of submissions was fairly low, approximately one per 4,000 residents. Some potential contributors were reluctant because they felt their knowledge was not professional enough. Advance planning for research using crowdsourcing is likely even more important than for individual projects, given the complexity of working with different people and the likelihood of losing the crowd if plans change or don't work out.

Losing the crowd is very messy.

Gathering data with a crowd can be messy; losing the crowd, messier still. When I started this course, I read Dr. Graham's How I Lost the Crowd, a transparent account of what happened when the HeritageCrowd Project's website was hacked. This brought back my own experience as a volunteer running a website that contained the personal data of several hundred people when it was hacked. It was compromised three different times, in attacks of increasing damage. It is beside the point for me to write about that still-raw experience here. However, it is very important for digital historians to heed Dr. Graham's examples: back up work in a disciplined manner, take notes in case you need to rebuild a website, and pay real attention to security by using secure design and updated software, monitoring the site, and reviewing security warnings.

By the way, the story of the hacking I endured on my volunteer website worked out. We were transparent too and told everyone involved what had happened. We moved to a more secure Internet provider. We were able to restore the site from a backup. We patched the security holes and implemented monitoring, because we knew we, an all-volunteer gardening association, were now a target. This took several months of work, and it all could have been avoided if I had been more proactive about the Heartbleed bug in April 2014.

The data might be clean but it’s not neat.

One of the ideas I had for the final project of this course was inspired by a recent course I took, Dr. Joanna Dean's HIST 3310, Animals in History. Each year the Shawville Equity publishes coverage of the Shawville Fair, an agricultural exhibition featuring farm animals. According to the fair's website it has run since 1856, and I thought it would be interesting to trace whether the breeds of animals shown at the fair changed over the years, indicating either evolution in agricultural practices or societal change. However, despite the long history of both the fair and the Equity's coverage of it, which begins 27 September 1883 on page 3, there are years where the edition covering the fair is missing, such as 1937. Details of the fair appear regularly, but the coverage varies from lists of prize winners, the data I would like, to descriptions of what took place. Also, in my sampling, I could not see changes in patterns of animal use at the fair over the years to write about. Maybe the longevity and consistency is what is historically significant?

When data is not neat it needs to be cleaned up. Where it is missing, the historian faces the question of whether to interpolate or to base other interpretation on what data is available. Our course workbook has this observation: "cleaning data is 80% of the work in digital history". In relation to this, a member of the class asked, "if this is the case, then why don't historians keep a better record of the data cleaning process?" Excellent question. If cleaning the data is 80% of DH, it is also a dirty secret. Cleaning data changes data and makes it something new, as highlighted by Ryan Cordell in qijtb The Raven. While we may be changing data for the better, it is really to better suit our own purposes for it, and so we may also change how our data is interpreted by other historians. To mitigate the risk of misinterpretation or error, it is important to document how the work of DH takes place, so that it holds up to scrutiny and can be reproduced. DH work is complicated, and sometimes the historian may have to reproduce their own work in order to correct an error. Documenting also helps explain how we spend 80% of our time.
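
One low-tech way to keep such a record (a sketch, assuming the OCR'd text sits in a plain text file; the file names are placeholders) is to make each cleaning step a small script that logs every change it makes:

import re

# Apply one cleaning step and log each changed line, so the
# transformation can be reviewed and reproduced later.
with open("equity_raw.txt") as f:  # placeholder input file
    raw_lines = f.readlines()

with open("equity_clean.txt", "w") as out, open("cleaning_log.txt", "w") as log:
    for i, line in enumerate(raw_lines, start=1):
        cleaned = re.sub(r" {2,}", " ", line)  # collapse runs of spaces
        if cleaned != line:
            log.write("line {}: {!r} -> {!r}\n".format(i, line, cleaned))
        out.write(cleaned)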


Librarians – Ensuring data can be found so that it is used.

Having searched many times for books at home and not always found them, I appreciate the work of librarians on a practical level. In a library I have always been able to find the book I was looking for, even a few weeks ago when I was at the MacOdrum Library late on a Sunday night and a staff member found the book I needed in a basement office.

Libraries are places that collect, store and preserve information. Just as important, libraries catalog information so that it can be found and used. As a young teenager when I went to the library I knew to look under catalog number 737 for books about coin collecting, my major hobby back then. Science fiction and fantasy books were stored together so that I could easily look at books by different authors within the same genre. If I liked an author, it was easy to find other books they wrote. These simple tasks in the library were possible because librarians had a systematic means to organize content that could be easily used by a person like me, a non-librarian.

Setting aside the operational effort to purchase, shelve and house items, tremendous skill is required to organize a library so that it seems easy to use to a thirteen-year-old. As mentioned above, libraries have long had classification systems for non-fiction books, such as the well-known Dewey decimal system, and conventions for organizing fiction. Yet as long-standing as these systems are, they continue to change. Carleton University's MacOdrum Library uses the Library of Congress Classification system; consider, for example, these categories for relatively new kinds of information:

TK7800-8360 Electronics
TK7885-7895 Computer engineering. Computer hardware
TK8300-8360 Photoelectronic devices (General)

Librarians continue to deal with new information even as they maintain a consistent way to access it.

Another skill librarians have is knowing how to provide information through a library while also protecting the rights of authors and publishers who have sold material to the library under specific conditions of use. Although not directly about libraries, Cory Doctorow's article, The Coming Civil War over General Purpose Computing, touches on the concepts of Digital Rights Management that librarians must deal with as the amount of electronic media in libraries grows, from electronic books to journals, music, videos, software and even 3D printers.

Librarians have become users and developers of databases, as well as of search tools to find items in them. HIST3814o class members @angelachiesa and @bethanypehora discussed whether something has historical value if it cannot be found. Extending this idea to search tools is worth librarians' attention. A search tool can confer value on a piece of information by putting it at the top of the list of results. A flawed search tool can destroy value by failing to display items in a library that are relevant to what a researcher is looking for, as the MacOdrum Library's electronic search tool Summon is purported to do.

Where I work we have a library that is accessible to the public. My co-workers and I have discussed whether we should have a library anymore. In this electronic age, we should be able to just Google information, and the space occupied by the physical book collection could be used for something more innovative; that is how one argument goes. However, as the amount and types of information available to us continue to grow, the need for skilled people to collect, organize, curate and make information visible is greater than ever. Many of those skilled people are librarians, and they work in libraries.


Finding Data: Experience with tools in module 2 of HIST 3814o.

Exercise 1, Using databases.

It was excellent to see how well organized, detailed and searchable a history database could be with the Epigraphic Database Heidelberg. Data can also be downloaded as JSON; here is a sample. Among many other things, the database would be a great reference for locating Roman inscriptions when traveling.
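
Working with such a download is straightforward; here is a sketch (the file name and field names are assumptions for illustration, not necessarily EDH's exact schema):

import json

# Load a downloaded sample and print a couple of fields per record.
# "edh_sample.json", "items", "id" and "findspot" are assumed names.
with open("edh_sample.json") as f:
    data = json.load(f)

for item in data.get("items", []):
    print(item.get("id"), item.get("findspot"))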

The Commonwealth War Graves Commission (CWGC) web site is used to locate the graves of soldiers who fought in the Commonwealth forces during the First and Second World Wars. It can also locate cemeteries and I looked up the Camp Hill cemetery in Halifax, NS.

In World War I my great-grandfather Samuel Hood served with the No. 2 Construction Battalion, also known as the Black Battalion. Below is an excerpt from the Canadian Encyclopedia about the burial of veterans of No. 2 Construction at Camp Hill cemetery:

Many veterans of the Black Battalion were buried in Camp Hill Cemetery in Halifax. Each grave was marked by a flat, white stone, forcing visitors to crouch down and grope the grass to find loved ones. In 1997–98, Senator Ruck successfully lobbied the Department of Veterans Affairs, and each soldier received a proper headstone and inscription in 1999.

The spreadsheet for the Camp Hill cemetery I downloaded from the CWGC did not have names from No. 2 Construction. That may be because members of No. 2 Construction who died during World War I were not buried at Camp Hill, while veterans who died later were.

I searched for the unit in the CWGC search engine and could not find any reference to it. Is it possible that no soldiers from No. 2 Construction died in World War I? No; for example, the battalion's war diary mentions that on September 23, 1918, 931410 Pte. Some, C. was found dead, likely murdered.

Searching the CWGC's database using Private Some's service number brings up his record, but his regiment is listed as Canadian Railway Troops. According to oral history, No. 2 Construction sometimes built railways, but they were not Railway Troops; the unit was attached to the Canadian Forestry Corps. Some of the unit's other work in the war was logging and producing lumber.

I began to conclude that the war dead of No. 2 Construction were misclassified in the database. By trial and error I learned I had to type the unit name into the search box as "2nd Construction Bn." or "2nd Construction Coy." Those searches returned the war dead; however, the regiments listed for them differed. All this to conclude that even in impressively organized databases, problems with how data was originally recorded may persist.

Exercise 2, Wget.

Using wget will be essential for our final project. I wish I had known about it before for other work.
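
In spirit, wget automates a download loop like the sketch below (Python, with a hypothetical URL pattern; the real target would be issues of the Equity):

import time
import urllib.request

# Fetch a run of sequentially numbered files, pausing politely
# between requests; the URL pattern here is hypothetical.
base = "http://example.org/equity/issue-{}.pdf"
for n in range(1, 4):
    url = base.format(n)
    urllib.request.urlretrieve(url, "issue-{}.pdf".format(n))
    print("downloaded", url)
    time.sleep(2)  # like wget's -w 2, wait between downloads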

Exercise 3, Close Reading with TEI

Doing this exercise made me wonder about the benefits of web sites letting keen volunteers who care about quality transcribe their text (with the aid of OCR). This exercise was laborious, and to save time I made a .jpeg of each column of text, saved that file, and uploaded it to freeocr.com.

I ran one image through freeocr.com and got some text and some unreadable characters. Still, it was faster to correct the good text than to type everything. I converted the second image to pure black and white, which reduced some of the background smudging that had stopped good character recognition. I OCR'd that, and the results, albeit for a different column of text, seemed better.
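
For anyone repeating this, the black-and-white conversion can be scripted (a sketch using the Pillow imaging library; the file names and threshold value are placeholders):

from PIL import Image

# Convert a scanned column to pure black and white to cut
# background smudging before OCR.
img = Image.open("column.jpg").convert("L")      # greyscale first
bw = img.point(lambda p: 255 if p > 140 else 0)  # simple threshold
bw.save("column_bw.png")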

After correcting some syntax issues in the XML, I came to like how precise it is; it was satisfying to see the text render in different colors for the additional information. I did not finish marking up the people in this exercise, given I had to move on.

Exercise 4, APIs.

I edited Dr. Ian Milligan’s excellent program to retrieve data from Canadiana.ca.

I ran this and then saw I was getting a lot of files, so I cancelled the run; I did not want to pollute DHBox (my output.txt was 340 MB). So I refined my search using Shawville as the city. I kept getting huge downloads, regardless of the time period. I also tried a couple of other towns, for various time periods. "Why am I getting all of this content for New Germany, Nova Scotia during 1919-1939?" is close to what I thought.

I executed:

curl 'http://search.canadiana.ca/search/'${i}'?q=montenegr*&field=&so=score&df=1914&dt=1918&fmt=json' | jq '.docs[] | {key}' >> results.txt

When I looked at results.txt, it had 60 lines. I ran the curl command again; results.txt now had 90 lines. Again: 120 lines. The >> operator appends to a file rather than overwriting it, so the command was mixing my various searches together.

I got better results when I deleted the working files before running $ ./canadiana.sh. I added these commands to the top of the retrieval program:

# to clean up from the previous session
# (-f keeps rm quiet if a file does not exist yet, e.g. on the first run)
rm -f results.txt
rm -f cleanlist.txt
rm -f urlstograb.txt

This is uploaded to:

https://github.com/jeffblackadar/module2e4

Exercise 5, Twitter Archiving

It is very exciting to retrieve data from Twitter using a program. Thanks to my HIST3814o classmate @sarahmcole who posted helpful content about how to install twarc on DHBox.

I thought I would start with something small by searching for tweets related to our class:

$ twarc search hist3814 > search.json

This was a fail: search.json had 0 results.

Manually searching Twitter for HIST3814 showed one result from less than two weeks earlier, on July 11; I was expecting to see at least that. (Twitter's search API only returns tweets from roughly the past week, which likely explains the empty result.)

I got many more results with this:

$ twarc search canada150 > search.json

I installed json2csv in order to convert search.json into a .csv I could view as a spreadsheet. Thanks to a post by @sarahmcole in our course Slack channel, I expected to get an error:

      module.js:433
              throw err;
              ^
      SyntaxError: /home/jeffblackadar/search.json: Unexpected token {

I used @sarahmcole's excellent program and fail log to adapt the json that json2csv was having errors with into a form json2csv could use. Here is a modified version of the program:

# code inspired by https://stackoverflow.com/questions/4746190/find-and-replace-within-a-text-file-using-python
# Thanks to @sarahmcole who wrote this
# It wraps twarc's output (one json object per line) into a single
# json object keyed "tweet1", "tweet2", ... so json2csv can read it.
n = 1
f1 = open("search.json", "r")
f2 = open("output.json", "w")
f2.write("{")
for line in f1:
    addComma = ","
    if n == 1:
        addComma = ""  # no comma before the first entry
    line = addComma + '"tweet' + str(n) + '":' + line
    n = n + 1
    f2.write(line)
f2.write("}")
f1.close()
f2.close()

Given that the .json files related to Canada150 were really big and thus hard to troubleshoot, I used a smaller search:

$ twarc search lunenburg > search.json

I ran the program and then checked the json formatting at https://jsonformatter.curiousconcept.com/; it was "correct".

I then ran:

 $ json2csv -i output.json -f name,location,text

It ran, but I didn't get the multiple rows of values I expected. I got:

 "name","location","text"  ,,

To spare you, the reader, the full blow-by-blow, here is my conclusion for this exercise:

Using a program, I could format a json file so that json2csv parsed it without errors, but I ended up with a file of only two rows: the first row was tweet1, tweet2, tweet3, etc., and the second row consisted of each tweet in full, unparsed json format.

Using a program, I could produce a json file that looked correctly formatted, but json2csv had errors when parsing it.

Dr. Graham has posted that json2csv may have a new implementation and is not performing as expected.

A bit more with Twitter json

The json for each tweet has a complicated hierarchy. I took one line and put it into a json viewer and there I could see how it was organized. There are a number of json to csv conversion websites and tools on the internet but several of them look pretty shady.

Reading through solutions to my problems with json2csv, I ended up trying jq to make a .csv. I chose the tilde (~) as my delimiter, given that the text of tweets often contains commas.

In jq, each json entity is addressed with a leading dot, and some entities (like name) are sub-entities: .user.name.

Running this:

jq '.user.name+"~"+.user.location+"~"+.lang+"~"+.text' search.json > outjq.csv

I was able to produce a hacked-together file delimited with ~ characters:

https://github.com/jeffblackadar/m2e5/blob/master/outjq.csv

Thanks to @dr.graham for the jq tutorial here.

Exercise 6, Using Tesseract OCR

All of the software installed and ran as expected, but the OCR results I saw after running $ tesseract file.tiff output.txt were not usable. I am not sure why, but I suspect the darkness of the newspaper images as the cause. I tried to edit and upload some clearer images to test Tesseract with; however, DHBox kept giving "0 null" errors that stopped the upload. I was able to upload another image file of text from exercise 3 that was 230k in size, and Tesseract recognized text from it.

Open Access Research – The Benefits and Obligations for Digital Historians.

Trevor Owens writes a cautionary tale about the David Abraham Affair, in which this former historian's work was discredited due to inconsistencies in his footnotes. Owens asks, "when it takes 15 seconds instead of 15 hours to fact check a source do we think historians will start to write differently, or otherwise change how they do their work?"

I don't think so. My training in history at Carleton has taught me to keep rough copies of my notes and to be rigorous in citing references. I complete a paper with the expectation that it may be checked, and with the knowledge that the professors I submit work to are specialists who know the area I am writing about much better than I do. Still, I do make errors, and I find formatting footnotes tedious. More importantly, footnotes and bibliographies are important pieces of work in themselves. Using smart bibliographies as Owens recommends is not only more efficient for the historian writer; it also makes this work easier to share with others.

Ian Milligan describes the impact of SSHRC's Research Data Archiving Policy, which mandates that work funded by SSHRC, including research data, be made available for use by the public who paid for it. This is a sensible directive, given that researchers can do their work more effectively by building on the credible work of others rather than starting from scratch. It also benefits historians overall. Even with the vast amount of electronic data, there is still a huge amount of information that is not available digitally. A historian's research data may exist nowhere else on the Internet, and making it available online allows other historians to use knowledge that may previously have been difficult to access. And as historians benefit from governments providing more Open Data, we should reciprocate by keeping our research data open too, as long as it is ethical to do so.

In her blog entry "Generous Thinking: Introduction," Kathleen Fitzpatrick notes that when universities train students to be critical thinkers, students often miss what the author they are reading is trying to communicate. More generous thinking is needed in order to properly study at university. Fitzpatrick's idea of Generous Thinking also applies to the unspoken contract between the digital historian who opens their notebook to the public and the public who reads it. By using an Open Notebook, as described by W. Caleb McDaniel, the digital historian benefits when a member of the public reviews their work and generously offers additional related material or alerts the historian to an error, saving a great deal of time compared to finding the error during publication. However, if the public is unduly critical of the historian's open notebook, it discourages historians from posting further work. Open Notebook history is most useful when it is part of an active project, but that also requires the most fortitude from the historian, who must keep the notebook open as ideas develop and change in front of a virtual audience.

Sheila Brennan's blog entry "My Digital Publishing Update: Nothing" describes the pitfalls of publishing a digital-first project with editors used to print projects. The digital historian, like anyone else delivering a project, needs to be aware of the specifications of what will be produced. If the outcome of a project is, for example, an article for a journal, the project should be designed to create a deliverable that matches that outcome.

Experience with Module 1 in HIST3814o

One of our exercises this week was to write about a thought-provoking annotation in the course readings using markdown syntax and the Dillinger.io editor-viewer. At first I was not in a good mood to write, and did other work. I came back to this on Saturday morning, fired up about a podcast cited by @sarahmcole that described the decline in the number of women studying computer science since 1984. On Saturday I also discussed the issue with my wife and daughter, who gave me additional insight. I am glad I procrastinated so that I could consider this issue by writing about it here.

As a technical aside, I noticed that Dillinger rendered my footnotes from markdown syntax, but the .md viewer on Github did not recognize them.

Accessing DHBox

It is an excellent experience for our class to access a virtual machine like DHBox remotely. As more computing power and services move to the cloud, users of data analysis tools, such as digital historians, will access those tools through virtual machines similar to DHBox. Rather than processing data on a computer on a desk, desktop computers will increasingly be used as terminals to access virtual computers with potentially much greater processing power than a physical computer in an office.

Git and Github

Quite a few times when I needed to download something for a computer project, I was directed to Github. I just used it as a consumer: I took the file I was looking for and left. I knew Github was used to run software projects, including open-source ones, but I had only a vague idea of how contributions to these kinds of projects were made.

I now have a better sense of Git's function for version control and branching within a repository. I also see Git's role in governing large projects with multiple contributors. I realize I could fork someone's repository if I wanted to work on their project independently. If my work progressed to the point where it was worthy of inclusion in the original project, I see how I would make a pull request to ask for my contribution to be added to the project I had forked from.

Raspberry Pi

I wanted to test whether a Raspberry Pi could perform some of the things we do in this course. It runs Linux and is a low-cost computer, so I think it has potential to reduce barriers to people getting access to computing.

I found that Github, Slack and Hypothes.is are not supported on the versions of the Safari and Chromium web browsers I have on the Raspberry Pi (Pi for short). Some reduced functionality is available, but not enough to do real course work with these tools.

I experimented to see whether I could connect the Raspberry Pi to Carleton University's VPN so that I could access DHBox. I was unable to install OpenConnect, the VPN client, likely due to my lack of knowledge of it. I was able to log in to the front page of Carleton's VPN using the Safari web browser on the Pi. The AnyConnect software launched, but it downloaded the Windows version, which does not work on the Pi.

I have stopped work on this for now.

Help

Seeing the annotations has provided deeper insight into the readings; it is interesting to see what others in this class are thinking. The postings in Slack have also been helpful, as have links to blog posts.

Fail Log

In the interest of practicing open-notebook digital history, here is my fail log for this week. I must admit, I don't like how it's formatted.

Diversity in DH and the lack of Women In Computer Science.

There are many thought-provoking annotations in our course this week, but it was @sarahmcole's post in our course Slack off-topic channel that stuck with me the most. She called out how the lack of access to computers for women affected their decisions to complete degrees in computer science, according to a documentary she cited.[1] The result was a striking decline in participation by women starting in 1984, in sharp contrast to other types of degrees. The National Public Radio podcast she found describes this history and makes points I had not considered before, despite my having lived through this time during high school.

My family bought the computer mentioned in the podcast, the RadioShack TRS-80 Color Computer 1, in 1983. I have a younger sister and brother, and while the computer was in various common rooms, it did become more mine; I was on it a lot. My younger sister Janelle used the computer and worked through exercises in the book "Programming with Extended Color BASIC," but she didn't get access to it nearly as much as I did and stopped using it after the first summer we had it. Janelle now works as a detective in Internet crime for the Toronto Police and lectures and trains other police forces on technology internationally, but she did not take computer science in university.

Figure 1, TRS-80 Color Computer 1 with a BASIC program.[2]

In my high school computer classes in 1984, I remember that the people most proficient with computer programming seemed to be male. While as a group we weren't particularly smart (judging by achievement in other classes), we all had computers at home, the same advantage mentioned in the podcast. Today my daughter has her own computer, but she has faced male-majority, male-dominated computer programming courses in high school where she has at times felt uncomfortable. Two years in a row her class debated the École Polytechnique massacre of 1989; my daughter thought this was counter to encouraging women to continue in computer science.

I asked my wife Christine Blackadar about her experience. Her family purchased a Commodore 64 in 1983, and she remembers that she and her younger brother were so excited about a game called Blue Max that they typed in the machine code for another game from a magazine article. (It didn't work.) Christine's experience taking computer courses in high school resembled what I had seen. There were not enough computers to go around, and in a male-majority class she ended up with a male partner whom she described as "not scary, just very off-putting." Her partner got much more time on the shared computer and frequently rushed Christine to finish her work. She said the male students in her class formed a group that the male teacher catered to. Although Christine did not go on to study computer programming in university, she did become a software developer in the late 1990s. Ultimately she chose to change careers and become a teacher, and one of the things she loved was teaching computer skills to her students.

Many of the pioneers of computer programming were women. My favourite is United States Navy Rear Admiral Grace Hopper. Among other achievements, she was a co-developer of COBOL, a computer language still used where I work; if you use a bank, it is likely some of your financial transactions are still processed in this language. It is interesting to mark the close proximity of two events: according to the NPR podcast referenced above, 1984 was the inflection point when women's participation in computer science started to decline, and 1986 marked the final retirement of Grace Hopper from the United States Navy.
Figure 2, Grace Hopper and UNIVAC.[3]


  1. Planet Money, “Episode 576: When Women Stopped Coding,” aired October 2014, National Public Radio podcast, 17:13, posted July 22, 2016, http://www.npr.org/sections/money/2016/07/22/487069271/episode-576-when-women-stopped-coding. Retrieved 15 July 2017.
  2. Gilles Douaire, “TRS-80” photograph (2013), Flickr, https://www.flickr.com. Retrieved 15 July 2017.
  3. Smithsonian Institution, "Grace Hopper and UNIVAC," photograph (c. 1960), Flickr, https://www.flickr.com. Retrieved 15 July 2017.

Completed for HIST3814o Module 1, Exercise 1.

This was composed in Dillinger.io as markdown and then exported as html.

What is digital history for me anyway?

Digital history represents two things to me:

  1. The preservation and sharing of artifacts using digital media.
  2. Using computers to process, analyze and visualize data for history.

1. Preservation and sharing of artifacts using digital media.

Figure 1, Oxen competing in a pull at the South Shore Exhibition, Bridgewater, Nova Scotia, 1940s-1950s. Photograph. Clifford Blackadar is in front of the ox team. Author's personal collection.

The photo in Figure 1 represents a physical artifact that has been electronically copied for preservation. The photo interests me not only because my grandfather is in it, but also because his oxen pulled hard enough that the competition organizers ran out of boxed weights and supplemented them with people riding on top of the sled. This photo was stored in a detergent box of other photos, with no other context or information, and was at risk of being thrown away. Making a digital copy of an artifact increases the chance that a reflection of the object will survive the passage of time and be available to historians. Sharing artifacts digitally offers the chance that someone else can add to the story of the artifact. Perhaps someday a person, or even a computer program, will be able to tell me the exact year this photo was taken because I was able to share it with them.

Carleton University history professor Dr. Bruce Elliott has spoken about the loss of the inscriptions on thousands of marble gravestones in Philadelphia due to acidic air pollution. If the text that was on these stones were also available electronically, at least it would still be available to consult and combine with other data.

2. Using computers to process, analyze and visualize data for history.

Digital historians examine data, look for patterns and document them. Nevertheless, as historians first, digital historians interpret and marshal evidence to make an effective argument. DH-related or not, historical evidence must be directly related to the historian's argument or else it should be left out. As James Baker writes in his article The Hard Digital History That Underpins My Book, "history writing is concise, precise, and selective: not telling your reader everything you know is central to how we present interpretations of the past." (James Baker, "The Hard Digital History That Underpins My Book", https://cradledincaricature.com/2017/06/06/the-hard-digital-history-that-underpins-my-book/. Accessed July 9, 2017.) As Angela Chiesa notes, "you can't add everything to your finished product." (Angela Chiesa, annotation, https://hyp.is/V_hTdGTMEeeGVu9qHthJKA/cradledincaricature.com/2017/06/06/the-hard-digital-history-that-underpins-my-book/. Accessed July 9, 2017.)

At the same time, the method, data and outcome of all work related to a historical inquiry should be documented in a systematic manner so that other historians can make use of the work. Other historians may repeat the research exercise and draw a new conclusion, or repeat it and find flaws that, when fixed, provide new insights. Documenting DH work so that it can be repeated and refined necessitates the same type of discipline and methods scientists use with their experiments, where even failed experiments, properly understood, can provide important discoveries and be a productive fail. Here, the digital historian is a scientist with data.

An excellent in-class example of this type of DH visualization was Bethany Pehora's annotation and Slack post comparing the publication of novels in Paris and London between 1820 and 1920. (Bethany Pehora, https://hist3814o.slack.com/?redir=%2Ffiles%2Fbethanypehora%2FF663W7V4P%2Fparis_novels_vs._london_novels.pdf. Accessed 9 July 2017.) A spike in the publication of French-language novels in Paris during the time of the Franco-Prussian War was evident, a very interesting pattern deserving further investigation.

Why am I in this class?

I was first exposed to digital history during a lecture by Dr. Shawn Graham as part of Dr. Paul Nelles' course on the Historian's Craft. The lecture and assignment intrigued me both as someone returning to the academic study of history and as a user of computer technology.

I graduated from Carleton University in 1992 with a three-year Arts degree, majoring in History, and decided to return to Carleton in the fall of 2015 to complete an honours year part-time. Each history course I have taken has taught me a great deal, regardless of the topic. I have been looking forward to taking this course, and I plan to use it to gauge my potential to continue studying in this area.

As an amateur historian, I have been interested in areas that relate to digital history. For example, I did some work on the 19th-century history of the Ottawa Horticultural Society, a local gardening club. Some of this work involved scanning annual reports of the Ontario Horticultural Association and converting them to text. In the course of doing that, I made copies of these reports available to other local horticultural societies here.

I would like to take my ability to gain insight by working with data much further.

My level of comfort with digital tech.

Please indulge me while I describe my first encounter with the “Information Highway”.

My first experience posting content to the web.

When I first saw the web in July 1994, I didn’t really know what I was looking at.  I had a dial-up account with National Capital Freenet and was using Lynx, a text web browser where I followed links to content.  I remember telling my wife that I had reached Cleveland and Florida by following these links.  It was pretty neat, but I could not figure out what to do with it.

Once I saw the Mosaic browser, I was hooked and wanted to post content. I thought the advent of the World Wide Web was going to be my generation's radio, the medium my father worked in for his career and greatly enjoyed.

One of the first web pages I made was about Ginkgo trees. This type of tree has an interesting natural history, a unique look, and grows well in Ottawa. I had tried and failed to grow Ginkgoes from seed multiple times until I adapted a technique used to grow avocado pits. I wanted to tell more people about this technique, and the web was at hand. At the time I made the html page, I could not find much on the Internet about Ginkgoes, so I researched some of the content for the page at the Ottawa library. Here is a close-to-original version of the page from a couple of years after I first made it. When I made the page, I thought the subject might be of interest to only a few people, and I stopped checking it. A few months later, I was surprised to find the page had more than 15,000 hits and numerous comments in the guestbook. Since then, many more and much better web pages about Ginkgoes have been published. It was interesting to see how rapidly the new medium of the web proved useful for communicating with many people who shared the same niche interest.

This was a very interesting time for technology to say the least.

Generally, I am comfortable with programming, using databases, web technology and different operating systems. But I also know I have to keep renewing my skills, and lots of things I was good at are now obsolete. (Does anyone need any help with Lotus Notes?)

I am interested in different kinds of history.

I just completed Dr. Joanna Dean’s course Animals in History, hence my picture of an ox team above.

The histories of Ottawa and the Ottawa Valley, southern Nova Scotia, the No. 2 Construction Battalion, and environmental history are areas I have enjoyed studying and reading about over a longer period. Recent courses about the histories of the Middle East and the Inuit gave me new perspectives.

What will I get out of this course?

I know I will learn new methods to examine history.  This will involve new theoretical ways to think about historical inquiry as well as the practical aspects of using data and software to look for patterns that shape a historical argument.

I know I will be challenged by the material, exercises and pace of the course. I will learn a lot from the talent of fellow students too.

Why haven't I learned this before?

I am trying to log on and do a bit of work while at an airport, but I have forgotten some of my passwords. My current passwords are strong and different, but I did not keep note of a mnemonic for them. This reminds me of some bad habits I have had for a long time. Often when I start computer-related projects I am excited and dive right in. If I take any notes at all, they are cryptic. Poor note-taking may be okay when I am working on something intensely, but when I'm interrupted it's difficult to get started again. Sometimes these projects end up abandoned. So, one of the things I hope to learn from this course is to be more disciplined with my notes for computer projects, in the same way I have become much better at writing down detailed references when working on a paper. Taking good notes saves time. You likely already know this; I am still learning it.

A quick introduction

This is a test post, but since I am writing it as I mop up the aftermath of my family vacation, I may have an excuse to tell you where I am. I am writing from a fairly anonymous airport hotel in Richmond, Virginia, waiting to fly home to my family after returning a rental car we needed when our van's engine died. Of course Richmond has a great deal of history, but I did not plan to be here for long.

Our original vacation plans were somewhat related to my interests in history. I enjoy visiting the sites of early European colonization of North America, and a few days ago we visited Manteo on Roanoke Island, North Carolina, and saw the U.S. National Historic Site Fort Raleigh there. It is also the site of a U.S. Civil War battle, and the island became a settlement for freed slaves until 1867.

We camped at Cape Henlopen, Delaware, the site of Fort Miles, a World War Two defensive fortress and later a listening post to detect submarines from the U.S.S.R. during the Cold War. On the way there we dropped into the Eastern State Penitentiary in Philadelphia. That was an interesting, if somber, place to see the evolution of penal practices in the United States, as well as the prison's original innovative but psychologically damaging design.

This blog will be much more concerned with digital history in future posts, but I hope I have conveyed a few of the things I am curious about.