Recently in tech Category

Intertextuality demo

A quick look at an in-house tool we’re developing at the Yale Digital Humanities Lab. The example corpus in this case is a collection of 19th-century Swedish novels, but the approach is language-agnostic. We’ve also tested it on Camus and picked up his weaving of passages from previous works (Carnets, La mort heureuse) into The Stranger.

In this particular case, the tool is showing an instance of literary plagiarism: a book from 1900 steals dozens of passages from a novel published 60 years earlier. We have a chapter that came out last year explaining the context for this instance of literary “borrowing,” and there are undoubtedly many more to be discovered as more books are digitized.
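The tool's internals aren't described in this post. As a rough illustration of how passage-level reuse can be detected without language-specific resources, here is a minimal sketch that compares overlapping word windows using n-gram "shingles" and Jaccard similarity. The function names, window sizes, and threshold are all assumptions chosen for the example, not the Lab's implementation.

```python
import re

def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def windows(text, size=20, step=10):
    """Split a text into overlapping windows of `size` words."""
    words = re.findall(r"\w+", text.lower())
    last = max(len(words) - size, 0)
    return [" ".join(words[i:i + size]) for i in range(0, last + 1, step)]

def find_reuse(source, target, threshold=0.5):
    """Return (source_window, target_window, score) for similar passages."""
    src = [(w, shingles(w)) for w in windows(source)]
    matches = []
    for t in windows(target):
        t_sh = shingles(t)
        for s, s_sh in src:
            score = jaccard(s_sh, t_sh)
            if score >= threshold:
                matches.append((s, t, score))
    return matches
```

Because it only relies on word tokens, a sketch like this works on any language written with spaces between words. Comparing every window against every other is quadratic, so a real corpus would need an index or hashing step first.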

Link to this Post | Leave a Comment

Thanks to a person who found an early CD-ROM from the Berkeley Macintosh Users Group, I was able to recover the first BMUG Newsletter and play around with a few ways of preserving its original look and feel. Based on file creation dates and contextual information in the text, this “Fall 1984” document was probably finalized in September. The contents are a fascinating look into the Macintosh fan culture at that time.

Although I’ve been collecting the physical BMUG Newsletters for a few years (each of which is several hundred pages), the very first edition of this series was electronic only: a series of MacWrite documents, split up to avoid the page limitations in that early software and in the first Macintosh’s 128k of RAM.

Although it’s possible to convert MacWrite files to modern formats (text, RTF) with a variety of converters, these won’t preserve the ‘look and feel’ of reading the document on a 72-dpi screen. Specifically, the MacWrite document used a variety of bitmapped screen fonts, many of which were designed by Susan Kare. These typefaces capture a certain style and moment in the history of desktop publishing; converting them to Times New Roman on a high-DPI screen completely loses that.

To get closer to the right look, I wanted to create the equivalent of what the output of an ImageWriter I (the Mac’s dot-matrix printer) would have looked like. I created a custom emulator build using Mini vMac with an artificially large screen. (Essentially, a Mac Plus with an impossible Portrait Display.) It looked like this:

…and allowed me to view an entire page of the newsletter on the screen at the same time. Taking screen captures of all pages, and importing them into Photoshop en masse created a set of images that could then be cropped to eliminate everything except the WYSIWYG portion of the page. Finally, I re-created the original page margins by setting a Canvas Size of 8.5x11” at 72dpi.
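For reference, a Canvas Size of 8.5x11” at 72dpi works out to 612 by 792 pixels. The sketch below shows the same padding step outside Photoshop, assuming each cropped page is a list of rows of 0/1 pixels and that the content should be centered on the page. The `pad_to_page` helper is hypothetical, and the centering is my assumption about how the margins were restored.

```python
DPI = 72
PAGE_W = int(8.5 * DPI)  # 612 pixels wide
PAGE_H = 11 * DPI        # 792 pixels tall

def pad_to_page(bitmap, page_w=PAGE_W, page_h=PAGE_H, background=0):
    """Center a cropped 1-bit bitmap on a blank page-sized canvas."""
    h = len(bitmap)
    w = len(bitmap[0]) if h else 0
    if w > page_w or h > page_h:
        raise ValueError("bitmap is larger than the page")
    left = (page_w - w) // 2
    top = (page_h - h) // 2
    canvas = [[background] * page_w for _ in range(page_h)]
    for y, row in enumerate(bitmap):
        # Copy each source row into place without touching the margins.
        canvas[top + y][left:left + w] = row
    return canvas
```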

I bulk-exported all the layers in the PSD document and configured Adobe Acrobat DC to not apply any lossy compression to the images. (The compression algorithms are optimized for modern, multi-bit color images and will wreak havoc on 1-bit monochrome files.) After some final adjustments to prevent Acrobat’s Optical Character Recognition from trying to deskew the (perfectly straight) pages, we end up with the images below.

There’s one final trick to reproducing these bitmapped images on today’s high-resolution displays: disabling image resizing algorithms. Modern web browsers must resize 72dpi images to twice their original dimensions to display properly on “Retina,” 4K, and other kinds of HiDPI displays. They use upscaling algorithms that are optimized for full-color JPEG photographs, which look great on those files but introduce fuzziness into monochrome pixel art. For this reason, each image below has the following CSS declarations applied, to force browsers to display them as crisply as possible:

image-rendering:optimizeSpeed;
image-rendering:-moz-crisp-edges;
image-rendering:-o-crisp-edges;
image-rendering:-webkit-optimize-contrast;
image-rendering:optimize-contrast;
image-rendering:crisp-edges;
image-rendering:pixelated;
-ms-interpolation-mode:nearest-neighbor;
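To see why nearest-neighbor scaling keeps pixel art crisp, here is a minimal sketch of an integer upscale on a 1-bit bitmap, represented as a list of rows of 0/1 values. It illustrates the general technique only, not what any particular browser does internally, and `upscale_nearest` is a name made up for the example.

```python
def upscale_nearest(bitmap, factor=2):
    """Integer nearest-neighbor upscale.

    Each source pixel becomes a factor-by-factor block of identical
    pixels, so no intermediate gray values are ever introduced.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in bitmap:
        # Repeat every pixel horizontally, then repeat the row vertically.
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

A smoothing algorithm would instead blend neighboring pixels, turning hard black-and-white edges into gray ramps, which is the fuzziness described above.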

If you’re viewing this page on a mobile phone, the PDF copy of all 24 pages will probably look best. But if you’re on a laptop or desktop computer, the individual pages at their original 72dpi are reproduced below:

Lumio lamp

One holiday gift arrived early this year — a Lumio lamp:

Lumio lamp

This LED light unfolds from something that looks like a book, automatically turning on when the covers are opened. Interestingly, it uses a lightweight and long-lasting lithium battery and recharges over mini-USB. These are two technologies that owe their widespread use to the modern smartphone; they are now spreading to portable lights.

Microfilm scanner arrives

Only a few days after the book scanner, the Lab’s Mekel Mach 10 microfilm scanner was delivered.

Mekel Mach 10 delivery

Much like the book scanner, this device will allow users to rapidly scan an entire microfilm reel in about 5 minutes. The resulting strip, a long digital image, will be sliced up into discrete document frames and processed using OCR to create textual corpora.
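As a rough sketch of the slicing step, one simple approach is to measure how much "ink" each pixel column of the strip contains and treat runs of dense columns as frames, with the blank gaps between them as boundaries. This is an illustration under stated assumptions, not how the scanner's own software works: the `find_frames` helper, the density profile input, and the threshold values are all hypothetical.

```python
def find_frames(column_density, threshold=0.1, min_width=3):
    """Return (start, end) column ranges whose density exceeds threshold.

    `column_density` holds one value per pixel column of the strip
    (for example, the fraction of dark pixels in that column).
    `end` is exclusive. Runs narrower than `min_width` are treated
    as noise and dropped.
    """
    frames = []
    start = None
    for x, value in enumerate(column_density):
        if value > threshold and start is None:
            start = x  # entering a frame
        elif value <= threshold and start is not None:
            if x - start >= min_width:
                frames.append((start, x))
            start = None  # back in a gap
    # Close a frame that runs to the end of the strip.
    if start is not None and len(column_density) - start >= min_width:
        frames.append((start, len(column_density)))
    return frames
```

Each returned range can then be cropped out of the strip and handed to OCR as its own page image.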

Microfilm reels for testing scanner

Book Scanner arrives

The Zeutschel OS 15000 Advanced book scanner for the DHLab arrived:

Zeutschel OS15000 arrives

The biggest box in the photo below is actually for the table to hold it (some assembly required):

Zeutschel table, ready for assembly

The book scanner will allow folks to digitize print collections in our library — not to redistribute, but rather to create corpora for text mining.

DHLab Status Board

We’re experimenting with using the Status Board software (from veteran Mac developers Panic) to display updates and other information in the DHLab. Here’s a preview of how it looks:

Status Board test

Eventually we’d like to add information from Slack, Basecamp, and other associated services, as well as a feed of the growing book collection that we maintain in LibraryThing.

New DH Books

Just got a new shipment of books into the Digital Humanities Lab at Yale…

New shipment of DH books!

We’re maintaining an informal catalog of these books, which are non-circulating desk copies in the Lab, on LibraryThing.

ESRI UC 2015, Continued

This weekend was spent both at the ESRI Education User Conference…

ESRI Ed show floor

…as well as at various local taco places:

Chicken Verde, Tuna Asada, Cochinita Pibil at Puesto

ESRI UC 2015

Education User Conference keynote

I’m in San Diego for the next week for a GIS conference put on by ESRI, the maker of ArcMap and other geo-spatial software. ESRI’s programs are sort of like the Photoshop of the GIS world: expensive, difficult to learn, encumbered by decades’ worth of legacy interfaces and workflows, but also incredibly capable. Nearly any task you can think of with a map can be accomplished, if you can figure out how.

In my Digital Humanities work, I more often work with geo-spatial software at the web browser level: Leaflet.js, CartoDB, and similar. These technologies, among others, help power some of the maps on Yale’s Photogrammar project. But there’s no question that some problems and datasets require the kitchen-sink tools and computational power of ESRI’s Windows-only software stack. So I’m at the ESRI User Conference to learn more about these tools, and to bring any knowledge I can back to the Yale Digital Humanities Lab when I return.

San Diego Convention Center

I have to admit I was also looking forward to a different class of Mexican food in San Diego, and Común Taqueria did not disappoint. They put Marita chili ash on top of their chips, which can lead you to wonder exactly what the black stuff is on the chip you’re about to put in your mouth, but which is ultimately delicious.

Común Taqueria

Link to this Post | Leave a Comment

Macworld magazine recently ceased print publication, but an earlier victim of the shift to online news was MacWeek, a restricted-circulation industry broadsheet that was passed around at user group meetings and tech offices alike. Between 1987 and 1999, this weekly tabloid-size glossy was one of the best ways for Mac fans to keep up with the latest news from Cupertino.

I’ve scanned the cover of the first issue, from April 1987, below. Inside are some interesting tidbits, including the launch of PowerPoint (Mac-only, and not yet owned by Microsoft) and the first piece from gossip columnist Mac the Knife.

MacWeek first issue

Digital Preservation

While home over the holidays I was interested in seeing what the earliest digital document I could find would be. I think the best contender is this circa-1985 5.25” floppy disk, which probably holds WordStar files:

I have a few machines with disk controllers that can use such a floppy disk drive, and the drives themselves go for about $10-$30 on eBay. The problems I’m likely to encounter are media failure due to physical degradation and random electromagnetic radiation from the sun having flipped some of the bits. Either of these could turn part or all of the files into gibberish. In that case, there’s a modern floppy controller called KryoFlux that hooks up to a modern PC and uses more advanced/heroic techniques to try to read the bad parts of the disk repeatedly, hundreds of thousands of times. With luck, even badly damaged disks can give up some of their secrets.
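To give a feel for why repeated reads help, here is a minimal sketch of one simple way to combine several noisy reads of the same sector: a per-byte majority vote. This is an illustration of the general idea only and is not a description of how KryoFlux works. The `majority_vote` helper and the sample data are invented for the example.

```python
from collections import Counter

def majority_vote(reads):
    """Combine equal-length byte strings into one best-guess result.

    For each byte position, keep whichever value appears most often
    across the reads. Using an odd number of reads avoids most ties.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields, for each position, the byte seen in every read.
    return bytes(Counter(column).most_common(1)[0][0] for column in zip(*reads))

# Three reads of the same data, two of them with a different bad byte.
reads = [b"WordStar", b"WordXtar", b"WorcStar"]
recovered = majority_vote(reads)
```

As long as the errors land in different places on different passes, the correct value wins the vote at every position.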

Charleston Conference 2014

In early November I had a chance to travel to South Carolina to attend the Charleston Conference.

Together with colleagues from Yale and ProQuest, I presented a panel on Data Mining on Vendor-Digitized Collections. We focused on our analysis of ProQuest’s Vogue Digital Archive — a collection of every issue since Vogue’s inception in 1892 — as a case study of what libraries and scholars can do with vendor data. Our examples were mostly drawn from our public website that showcases various visual and textual experiments with the Vogue data:

Robots Reading Vogue 1600

Here’s how we framed the larger issue:

This session delves into the rapidly emerging topic of text and data mining (TDM), from the perspectives of a digital humanist, a librarian, a collection development officer and a product manager for a major vendor of digitized content. We will show concrete examples of TDM on a large vendor-digitized in-copyright collection: the Vogue Archive from ProQuest, with over 400,000 pages of text and images dating from 1892 to the present. Several projects in progress at Yale have illuminated the appeal of TDM applications on Vogue to researchers across disciplines ranging from gender studies to art history to computer science. We will address issues of copyright and licensing, file formats and research platforms, new forms of research enabled by TDM, and how vendors and librarians can work to support digital humanities projects. Session attendees who are new to this topic will learn what TDM is and how they might engage with it in their own work. Audience members who have familiarity with TDM will be encouraged to share their experiences and insights.

After the conference was over I had a chance to enjoy a day in the city free of presentation responsibilities. The weather was very pleasant and the sky cooperated to show off the architecture in its best light:
