Rants, Raves and Revelations
Too Many References, Too Little Space, No Way to Search for Them: A Quintessential Enigma
by Harry {doc} Babad
The Enigma: to keep or not to keep, that is my question. Is it better to search for information each time I need it? OR, after collecting the information I need, usually in documents, should I spend time cataloging it for easy retrieval? Too much to do, so little time to do it in!
Pros and Cons
— A new search will turn up other interesting and appropriate goodies
— Much of what I want, and often know exists, has never been stored electronically
— Dealing with the problem by cataloging is sooo boring, even with my favorite music playing
— Over the years, you’ve created great systems, if I do say so myself, for cataloging and retrieving stored documents.
— Doc, you’re likely too attention-scattered to maintain even the best of systems. It’s like quitting smoking or eating after 9:00 pm.
My Problem
Floor-to-Ceiling Paper References: in a walk-in closet bookcase, cataloged in a database.
File Drawers: six plus, full of references, including my published papers, mostly uncataloged.
Uncounted Electronic References: documents, mostly uncataloged and likely including duplicate copies, in folders scattered all over my hard disk. These are haphazardly organized by project, by client, by subject matter, or in folders combining several of these categories.
Needs
If wishes were horses, beggars would ride…
- Reclaim closet and file drawer space by scanning the documents I want to keep. Paper Reduction 101: the paper all goes to recycling.
- Catalog all those references in a searchable database. For items already cataloged, shift the existing information to a newly designed, more easily searchable catalog. More about that later.
- Alternatively, with great pain, toss the ones I no longer connect with, knowing I’ll need them next week… month… year.
- Maintain the updated database and document storage system. The rule: new document, new database entry. Golly, Doc is getting organized again.
- Create a reference library to store my document collection on a separate hard drive. This will allow me to keep archival but still interesting documents.
- Change my collecting habits to avoid duplicate copies of the things I use. Instead, I can put aliases of the documents I need to refer to, in support of my clients, in my project files. I could even give the aliases more useful names than the short alphanumeric ones used when cataloging and storing the originals.
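The alias idea in the last item can be sketched in Python. Note the swap: a Finder alias is a macOS-specific mechanism that Python’s standard library does not create, so this minimal sketch uses a POSIX symbolic link instead, which likewise lets a project folder point at the single archived original under a friendlier name. The function name link_into_project and the example folder and file names are hypothetical.

```python
from pathlib import Path


def link_into_project(original: Path, project_dir: Path, friendly_name: str) -> Path:
    """Create a symbolic link in project_dir pointing at the archived original.

    Symlinks stand in here for Finder aliases (a different mechanism); the
    idea is the same: one stored copy, many descriptive pointers to it.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    link = project_dir / friendly_name
    # Refuse to clobber an existing file or link with the same friendly name.
    if link.is_symlink() or link.exists():
        raise FileExistsError(link)
    # Point at the absolute path so the link works from any folder.
    link.symlink_to(original.resolve())
    return link
```

One trade-off worth knowing: a symlink breaks if the original is moved or renamed, whereas a Finder alias can usually track its target. That is one more reason to keep the archived originals in a fixed, numbered location.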
Okay, that’s right: I’ve got the tools, so let’s do it right.
Available Tools
For the first time in my life, I actually have all the tools I need to make document collection, cataloging, and storage an almost painless process. At 71, some of you might question my sanity, but as sayeth the Bard, “The time is out of joint; O cursed spite, that ever I was born to set it right!” (Hamlet, Prince of Denmark, Act I, Scene V.)
— A Scanner: a scanner with an automatic sheet feeder that handles documents dependably (super fast, accurate, easy to use, with minimal misfed pages). Read about it in this month’s article on the Fujitsu S500M ScanSnap scanner, a dream machine dedicated to document scanning.
— A Universal Catalog for My Documents: my now more flexible FileMaker Pro references database.
— A Way to Get into the Meat of a Reference: OCR software, Acrobat Pro or Readiris Pro, for extracting editable text from archived PDF documents. An added advantage: doing so makes the contents more searchable in Spotlight or its hopefully more controllable replacement.
— Finding It Again: an easy-to-maintain storage approach that allows two-step retrieval of the documents I’ve collected. Step 1: Find it in the database. Step 2: Do not pass Go, do not collect 200 dollars; go straight to the document.
Over my professional career I’ve collected lots of technical documents in paper format. Initially these were stored in file drawers associated, depending on complexity, with their project or project subtask. It soon became obvious that this method had a flaw: many of the documents were useful on other projects. I was never very good at Xeroxing a document’s cover page and writing down its main file drawer location as a cross-reference, so stuff was filed and then, as file cabinets proliferated (a corollary of Moore’s Law), lost forever.
Background – 50 Years of Being an Information Junkie
Okay, I needed a better idea. Initially, before PCs and my beloved Macintoshes became available, I set up, with the help of a secretary, a needle-sorted punch card system. I was working as new products manager in a specialty chemical manufacturing company, and references, including patents, grew like Topsy. Our task was to develop, in the laboratory, processes to batch-manufacture the various chemicals our salesmen said they could get orders for, at a profit of course. Great fun!
EZ Sort Punch Cards - My First Try [1968 – Story Chemical Co.]
I read about, and decided to create, a set of edge-punched cards as a means of cataloging the references and of breaking with our company’s traditional method of filing information by customer and project. As long as I was not dealing with record information, I was allowed to play; the group was productive, and the products we took to the pilot plant and ultimately scaled up were profitable.
So I initially bought 500 5 x 8 inch punch cards from EZ Sort, a company that seems to no longer exist. These had a double row of holes around the periphery, which could be used to code the document’s contents. After developing a coding system that met our needs, I would use a special punch to cut away a narrow wedge at the edge of the card. Developing the coding system was easier than it might otherwise have been because EZ Sort Ltd. was kind enough to supply us with a pamphlet of examples I could build on. (Remember, no PCs in those days.)
The center portion of the card would contain an index number defining the location of the article in our filing system. In addition, we added the article’s full bibliographic citation, for traceability in case what we’d synthesized proved to be process-patentable. Finally, we added an abstract of the article’s contents, either typed or pasted in as a Xerox copy from the article itself. The coding information about the card’s contents was notched into the edge of the card, making for easy retrieval.
To get visual, the center portion of my “EZ Sort” punch card was a large, expensive index card. I could have had these preprinted to hold bibliographic information, but that cost more than my management wanted to spend.
So my secretary, as time allowed, typed information about each document: the usual stuff, author(s), title, journal or other citation, and an abstract of its contents. In addition, and most importantly, each card contained a sequential number that I used as the basis of a filing system.
The first document, no matter what its subject, became file 00001. The second was 00002. Regardless of content, each new item got the next sequential number. I allowed only one variation on this theme: my secretary did not want to put large (book-size) documents or loose-leaf binders into a file drawer, so we added a bookshelf designator, such as 00073B. That bookshelf’s contents were also organized sequentially.
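The numbering rule above is simple enough to state as code. This is a minimal Python sketch under one assumption the text leaves open: that oversized bookshelf items draw from the same running counter as everything else and are merely tagged with the B suffix. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FilingSystem:
    """Hypothetical model of the sequential numbering scheme.

    Assumption: one running counter for all items; oversized items that
    go on the bookshelf get a 'B' suffix instead of a file drawer slot.
    """
    next_number: int = 1
    drawer: list[str] = field(default_factory=list)
    bookshelf: list[str] = field(default_factory=list)

    def file(self, oversized: bool = False) -> str:
        # Five-digit, zero-padded number, e.g. 1 -> "00001".
        doc_id = f"{self.next_number:05d}"
        self.next_number += 1
        if oversized:
            doc_id += "B"  # bookshelf designator, e.g. "00073B"
            self.bookshelf.append(doc_id)
        else:
            self.drawer.append(doc_id)
        return doc_id
```

Filing two ordinary reports and then a loose-leaf binder yields 00001, 00002, and 00003B. Because the subject never enters into the number, nothing has to be renumbered when a document turns out to be useful on another project; a real system would also need to persist next_number between sessions.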
To find a reference, anyone could search my collection with a needle or two (think fancy ice pick), and the selected cards would drop out of the stack, which ultimately grew to 2,500 or so cards. These cards were the database of my reference collection. If the references were what we needed, we’d pull the documents from their file drawers. If not, it was off to the library. (The image is wish fulfillment, but represents a real collection of isotope data that was commercially sold.)
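How a needle sort selects cards can be modeled in a few lines. This Python sketch represents each card by the set of edge holes that have been notched open; the hole positions and the categories they stand for are hypothetical, since the original coding scheme is not described. A card stays on the needles if any needle passes through an intact hole, so lifting the deck on several needles at once drops only the cards notched at every needled position, a logical AND.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    doc_id: str               # sequential file number written on the card
    notches: frozenset[int]   # hole positions cut open to the card's edge


def needle_sort(deck, needles):
    """Simulate lifting a deck of edge-notched cards on one or more needles.

    A card drops out only if every needled hole is notched; otherwise an
    intact hole keeps it hanging on a needle. Returns (dropped, retained).
    """
    wanted = set(needles)
    dropped = [card for card in deck if wanted <= card.notches]
    retained = [card for card in deck if not wanted <= card.notches]
    return dropped, retained
```

Suppose, hypothetically, hole 3 meant “patents” and hole 7 meant “surfactants.” Needling holes 3 and 7 together drops only the cards coded for both, while needling hole 3 alone drops every patent card. An OR search takes successive single-needle passes over the remaining deck, collecting whatever falls each time.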
The Computer Revolution — I get my First Macintosh
My job responsibilities, whether as a manager or a senior scientist, always revolved in great part around finding and understanding the literature of my science. Once again, over my 30 years at the Hanford Site, I accumulated scads and scads, a plethora or cornucopia, of documents. Some were technical reports generated by DOE contractors, others journal articles. Many items were detailed technical correspondence, process records, or event occurrence reports. For most of this stuff there were no central records, so my own collection was critical to maintaining institutional memory.
For those of you who wonder why, it was all about the DOE or GSA definition of record items. The listed items were the only things a company was required to retain. Alas, they were a very small subset of the technical information needed to maintain operating safety, troubleshoot unanticipated events, and ultimately dispose of the radioactive waste stored on the site. Such technical documentation was often tossed when an employee took a new job or moved to another company; after all, that material was a copy, not the original. In addition, such non-record files were deemed to be the property of the operating contractor and were either tossed or removed when contractors changed. I worked for four contractors over my 30 years on the site and fortunately was able to keep my personal files, although non-record document libraries were trashed. They could have been turned over to the R&D contractor for cataloging and archiving, but there was never funding for that effort.
Personal computers and database programs became available at about that time. Here was a possible way to bring order to file drawers full of technical information. I was being swamped with documents I knew I had but could not find, and my office had been invaded by zombie-grey file cabinets. I took up the challenge.
I had kept alive the concept of using a database for keeping track of reference material. Instead of storing catalog and summary information on punch cards, I created a reference database, once again assigning each new document a consecutive number. The types of information I needed to catalog were the same as those I’d had typed on the punch cards. With the information in an electronic database, records are more searchable. By overcoming the coding limits and ultimate fragility of edge-punched cards, the documents became more accessible.
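To illustrate why an electronic catalog is more searchable than notched edges, here is a minimal Python sketch that uses the standard library’s sqlite3 module as a stand-in for the author’s FileMaker database. The table and column names are hypothetical, chosen to mirror the fields once typed on the punch cards. Unlike a fixed set of edge holes, any word in any field can become a search key.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    # In-memory stand-in for the reference database; the column names
    # mirror the old punch-card fields but are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE refs (
               doc_id   TEXT PRIMARY KEY,  -- sequential number, e.g. '00018'
               authors  TEXT,
               title    TEXT,
               citation TEXT,
               abstract TEXT)"""
    )
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return doc_ids whose title or abstract contains term.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT doc_id FROM refs WHERE title LIKE ? OR abstract LIKE ? "
        "ORDER BY doc_id",
        (pattern, pattern),
    )
    return [doc_id for (doc_id,) in rows]
```

With rows for documents 00018 (Vadose Zone Chemistry) and 00293 (A History of Punch Cards) loaded, search(conn, "punch") returns ["00293"]; that number is then the key that leads straight to the stored document.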
My first references database was created in the Macintosh version of MS File. When Microsoft discontinued support for that product, I first migrated my data to what is now AppleWorks, but finally settled on FileMaker as the database of choice. The result: a somewhat topsy-turvy patchwork of database entries in FileMaker Pro, to which I’ve added references over the years, connected to that closet full of cataloged documents.
Okay, the indexing system worked. Coupling sequential document numbers with a database entry for each item served me well, even in the absence of secretarial privileges.
Our fathers that are in Washington, DC, or at corporate HQ know that it is more efficient to have high-priced engineers and scientists do their own typing than to have a secretary do that work. They did buy us all high-priced PCs, which initially were typewriters on steroids.
As a result, I had only the problem of data entry to deal with. I got my system started by working weekends, much to my wife’s chagrin. As usual in the real world, my pile of documents, as well as other annoying job-related tasks, interfered with timely cataloging. In addition, typos in my two-finger input occasionally got in the way of searching for information; computers are so literal. Be that as it may, I’ve kept the system alive for almost 40 years, and it remains functional. Okay, I graduated from Hanford (it’s called retirement) and kept several thousand non-record documents containing useful technical information, mostly stored in that closet I keep mentioning.
Start Of The Golden or perhaps Bronze Age Of Document Collecting
A new paradigm began about 10 to 15 years ago: the distribution of documents in electronic format. Most of the documents initially produced were in limited-use proprietary graphics formats, but that soon began to change. “The Portable Document Format (PDF), is an open file format created by Adobe Systems in 1993. It is now being prepared for submission as an industry standard.” [Wikipedia] With the ever-increasing popularity of the PDF format, now a de facto standard, documents in electronic form could be universally read and shared.
But this too caused problems. How could I find what I’d collected, now scattered about my ever larger series of hard drives? After all, like a kid in a candy shop with two bits clutched in my hand, I no longer needed to limit myself to a key reference or two. I could download associated papers and peripheral information, and the documents cited in those references were, for the most part, also available. The groundswell of electronically available information grew and grew. A particularly rich source of otherwise hard-to-get information was the documents generated by the federal government and its contractors, including federally funded university research programs. Look, a random sample =>
My gravy train was DOE’s Information Bridge (Office of Scientific and Technical Information), http://www.osti.gov/bridge/, which was among the first to make documents available, usually as PDF files. Medical information, which is not a subject I follow, is made available by the NIH. Comparable access to technical documents soon became available from the Department of Defense (NSTI) and Homeland Security.
This alternate document format gave an individual the choice of storing a document as a paper copy or as an electronic one. With the availability of fast, low-cost laser printers, the decision for me became a no-brainer. I chose to store electronic copies, printing on paper only as needed.
Therefore, Back to My Challenge: Converting my existing paper collections to PDF and updating my database (index) to include all my electronic documents.
The rub in organizing my accumulated technical documents was not my paper collection. Rather, it was a hard drive full of electronic documents, often with cryptic names. Remember, we were once limited to 27 letters in a file name. Okay, sorting that out may take the rest of my active professional life, unless I can find a Macintosh-literate, minimum-wage helper with some smarts.
First Things First: You might ask, why bother? Well, I’m compulsive. Besides, I’ve already discussed my quandary, the data-chase enigma of “save stuff or re-search for it all over again,” earlier in this article.
Decisions, Decisions, Decisions — My Path Forward
— Initially, ignore the technical documents on my hard drive; they’re not indexed, but they are well backed up.
— Tune the fields and create a new database so that it is easier to capture and search for information about my documents. (Done: a half hour’s work.)
— Convert my paper collection to PDF files. I started this while testing the Fujitsu scanner. I will not, however, routinely OCR them. If needed, that can be done later, one reference at a time. Such documents are already cataloged.
— Migrate the information from my existing catalog (database) to the new one as I create the scanned electronic copies (cut and paste, and correct spelling errors: a minute or two per document).
— On an existing external hard drive, store the electronic documents in elegantly organized folders of 50 documents each (e.g., documents 00001 to 00050).
— Name the new documents cleverly so I can remember what they contain? Nonsense, that’s what the database is for (documents 00001, 00002, … 0XXXX). However, for flavor, combining the document index number with a few pithy words would not hurt (e.g., D00018_Vadose Zone Chemistry or D00293_A History of Punch Cards).
— Rename and index, when I get to it, the existing electronic documents on my hard drive. Well, I believe in the tooth fairy and Never-Never Land, so there’s a place under the rainbow where even this is possible. However, if I index each document the next time I use it, that’s a good start.
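The folder-of-50 and file-naming plan in the list above can be sketched as a small Python helper. The folder-name format (for example 00001-00050), the .pdf extension, and the /Volumes/Archive root are assumptions for illustration; only the 50-per-folder grouping and the D00018_Vadose Zone Chemistry naming pattern come from the plan itself.

```python
from pathlib import PurePosixPath

FOLDER_SIZE = 50  # documents per folder, per the plan above


def folder_for(doc_number: int) -> str:
    # Documents 1-50 go in "00001-00050", 51-100 in "00051-00100", and so on.
    start = (doc_number - 1) // FOLDER_SIZE * FOLDER_SIZE + 1
    end = start + FOLDER_SIZE - 1
    return f"{start:05d}-{end:05d}"


def file_name(doc_number: int, pithy_words: str = "") -> str:
    # Index number first, so files sort in catalog order; words are optional flavor.
    base = f"D{doc_number:05d}"
    if pithy_words:
        base += f"_{pithy_words}"
    return base + ".pdf"


def archive_path(root: str, doc_number: int, pithy_words: str = "") -> PurePosixPath:
    # Step two of retrieval: the catalog number alone determines the location.
    return PurePosixPath(root) / folder_for(doc_number) / file_name(doc_number, pithy_words)
```

For example, document 293 lands in 00251-00300/D00293_A History of Punch Cards.pdf, so once the database returns a number, finding the file is pure arithmetic.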
That’s all for today, folks.
Read my review of the Fujitsu S500M ScanSnap scanner. It’s part of what can make document collection control feasible for you.
And for more about punch cards, read Robert V. Williams’s article “Punched Cards: A Brief Tutorial”: http://www.computer.org/portal/site/annals/menuitem.8933248930f8c11dbe1fbe108bcd45f3/index.jsp?&pName=annals_level1&path=annals/content&file=punchedcards.xml&xsl=article.xsl&
Doc.