Charleston Conference (Fri)
On Thursday evening I had dinner with my conference roommate (another librarian from the PASSHE) at Swamp Fox Restaurant. I had sauteed local Carolina shrimp served in a lobster and Tasso ham gravy with sauteed bell peppers and Vidalia onions over stone ground Adluh pepper jack grits. Yum!
Friday session notes
Full-Spectrum Stewardship of the Record of Scholarly and Scientific Research
Brian E. C. Schottlaender (The Audrey Geisel University Librarian, University of California, San Diego)
- Ross Atkinson defined the scholarly record in his 1990 article “Text Mutability and Collection Administration” (That which has already been written in all disciplines)
– Types of digital scholarly resources (2008 ITHAKA study): e-only journals, reviews, preprints, encyclopedias, dictionaries, data resources, blogs, discussion forums, professional and academic hubs, working papers (note that 2 years ago, e-books were not on this list)
– Traditional scholarly publishing is stable because we know where/how libraries and trusted third parties fit in
– However, there is another end of the continuum – scholarly raw material (archives/data) which is less stable because some is resident in libraries, some is not
– The scholarly record is really a continuum of inputs, operators and outputs encompassing scholarly and scientific research
– Scholarly inquiry/discourse is where we find things like blogs, wikis, open notebooks, etc and that is in the middle of the continuum. Scholars and scientists are increasingly making use of these things in their research. It is not clear who should steward this part of the record – very unstable, who is responsible for this sector?
– To Stand The Test of Time (2006 report – available on the web) – need for close linking between digital data archives, scholarly publications and associated communication. Research libraries can address these linkages.
– Stewardship models (there are tons if you Google it)
– Actors/stakeholders – experts, users, archives, data centers, libraries, developers, preservationists, institutions, professional societies
– Migration from print to digital environment has disrupted the practices and responsibilities we have traditionally performed. What should be stewarded and who ultimately has the responsibility for it?
– We need to have a more expansive view on what constitutes the scholarly record, who the various stakeholders are, and the scope of the infrastructure needed to manage the record – more distributed, interoperable, and in need of much broader attention.
– Stop talking about it and do it – success grows from success – curation is more than storage
– There is a big need for library school graduates with data archiving/data curation instruction – LIS schools should be paying attention to this
Moderator T. Scott Plutchak (Director, Lister Hill Library of the Health Sciences, University of Alabama at Birmingham); Youngsuk (YS) Chi (Vice-Chairman and CEO Elsevier, Science & Technology); Kent Anderson (CEO/Publisher, The Journal of Bone & Joint Surgery, Inc.)
Question 1: What is the role of publishers with this supplemental data?
- Chi – Publishers are challenged… they don’t quite know how far to take it. We are for adding as much supplemental data as possible but we’re not sure we’re the ones to draw the line on how much. Depending on discipline, there are places that are willing to put priority on data. It can’t be rolled out full-scale but brewed from bottom-up.
- Anderson – You can’t give data a free pass. We get sporadic data sets with various structures and don’t have the skills in-house to edit or store them. We don’t do data well, so it’s hard for us to stand behind it and do it well. Tacking it on to articles seems like a fairly weak approach – we need to fish or cut bait on this. We’re good at mishandling data – we need to become better.
- Plutchak – What is the article and what is supplemental is becoming very fuzzy. How does this relate to the trust process? We trust the published literature because it has gone through a process of verification. If publications are saying they don’t have the resources to put that process into place, what happens? Does it become an institutional responsibility to state that the data is solid?
Question 2: Why haven’t digital technologies impacted scholarly communication/publishing?
- Plutchak – Culturally entrenched values have not transferred.
- Anderson – We play fast and loose with “disruptive” – How do we get there? Who are the players?
- Chi – For us to really have a major disruption, there needs to be a disruption in the authoring tool. We’re still based in Microsoft Word. We haven’t integrated fully yet in terms of authoring. We need a revolution there before it trickles up. In academe, how you assess the impact of researchers’ work impacts the fruitfulness of this disruption (tenure/promotion). Numeric evidence cannot drive the decision.
– Look up The Scholarly Kitchen & his self-published books
– Staffing needs: Looking for staffers for whom technology is second nature. Personality, tech background, broad expertise in humanities. Understands the soft pieces.
– Do you want traffic or do you want revenue? Publishers need to get better at not publishing, but finding ways to provide value.
– Staffing needs: Looking for people who know the true editorial/curatorial work required to ensure quality for our organization. Domain expertise. People who will stay for a while. People who LOVE technology and know how to deal with it. Not technologists, but people who aren’t afraid to use it. That combination is extremely hard to come by when competing with software companies (salary, stock options, etc). Our future is in how we use and reuse primary content accurately and quickly. People who can do domain specific vertical solutions.
– Traditional will not be abandoned, but we need to provide what users want. They want a way to swim through too much information. Need tools. Developing countries have a huge need for more access to information.
– Content should not be dead, but alive.
– Cloud will inevitably play a much bigger role but it’s a technology question, not a business question. When do we hand it over to someone else? In the next 5 years, it’s going to go from mass transaction to micro-transaction.
– Institutions need to back their university presses – should not have to self-fund. They are conveyors of scholarly knowledge.
– Line between book & article is blurry – everything is sort of a serial.
– Staffing needs: People who can help change the organization.
When Rubber Meets the Road: Rethinking Your Library Collections
Roger Schonfeld & Sue Woodson
(Schonfeld – Research Manager, Ithaka S+R)
– Sustainability of digital resources, The role of the library, Practices and attitudes in scholarly communication, teaching and learning with technology, Scholarly publishing
– What Users Want (2009 faculty survey) showed:
– Support for canceling local print subscriptions in favor of online-only access has grown steadily with the exception of a few disciplines (art history, etc)
– Faculty are increasingly enthusiastic about this change in access format
– E-books are still seen as less useful than e-journals
– Not many faculty think that e-books will replace physical items
– Libraries must take a more vital role in the lives of their users, more than just managing subscriptions
– Is there a trade off between reducing print collections investment and maintaining shared values?
– Achieving consensus on shared values – not always well-specified, visions can differ tremendously – a research-based, scientifically driven model can help
– UK Research Reserve – 1-3 print copies in the UK
– U of California – shared print archive – 1 copy validated in the UC system
– Ithaka S+R’s approach – risk-informed, research-based, science-driven
– What to Withdraw paper details all of this
– Modeling sustainable trust networks for collaboration
– Some libraries have created regional print repositories for space-saving or last-copy retention – if we can share information about these activities and take it into context, an individual library can determine whether or not to withdraw items
– How do we pay for this decentralized model? Is print preservation as a long-term incentive enough?
– Ithaka proof of concept project – focuses on JSTOR-digitized journal titles, freely available online, permits libraries to assess what can be withdrawn without preservation risk
- fdlpmodeling.net (look up)
(Woodson – Associate Director of Digital Collection Services, Welch Medical Library, Johns Hopkins Medicine)
– Goal of the library was to shrink print holdings by 80% by 2012 leaving them with about 83k volumes in the building
– Wanted to improve service to our community – all collections online, services embedded in departments, excellent discovery tools
- The library is not the building – take the library to the patron
– Gate count decreasing because they are in surgery, in class, not coming to the building – not centrally located, dated facility
– Building designed to hold 12 staff, will be up to 60 staff by 2012
– On staffing – jobs going away: cataloging and acquisitions, shelving, in-person reference, security guards
– New jobs – off-hours phone reference, aiding systematic reviews, publicity (YAY!)
- We collect for today. If we don’t have it, buy it or ILL it. If our users no longer use it, we don’t want it.
– SE/A – Print retention task force recommendations
– What becomes of print when it is no longer valuable to Medicine? Will it be valuable to other people? What other communities?
– Welch Medical Library…Wherever you are
– Informationists model – have 10 now, moving to 12 by 2012. Evaluated in part by collaboration – co-authoring publications, grant proposals
Publishing in the post-Web World: Some organizations think outside the box. Can we?
John Sack (Associate Publisher and Director, HighWire Press, Stanford University Libraries and Academic Information Resources)
- Based on user-research they have been doing
- Everyone loves the box – we understand it, our systems work with it, business models revolve around it but it hasn’t changed in 15 years
– Apple – computer box, Google – search box, Amazon – books box (they have all moved beyond their initial boxes). They looked at their core competencies and located where those intersected with their users
– Library – stacks box, Web – browser box, Mobile – phone box, Cable – TV box, News – paper box (most of these are moving beyond their boxes)
– What readers want is information. We give them a container. Difference between an article and the information in the article that the user wants – this is how we can move out of the box.
– Wired article – “The Web is dead.” – more and more of users’ time is spent not on the web, but with other things. Time and attention are shifting away from the box.
– Innovations in scholarly publications – make it more fitting for the task users are trying to accomplish
– Mobile fits well. Small devices, fast to use, no booting up, bite-sized task accomplishment
– Do articles, issues, journals, books fit well?
– Have interviewed 25 Stanford researchers (not students). Age and gender demographics pretty evenly split
– Communication and devices – laptops predominate (they are already mobile), smartphones (3 in 8 – 37% are using them & all were iPhones), Skype
– Discovery tools – PubMed, Web of Science, Google Scholar, Google (“I use Google to vacuum around the edges of the carpet”). No one mentioned publisher portals, and only 1 person mentioned library catalogs. People are searching Amazon and Google Books for discovery of books. Books are used for “unfamiliar topics” and articles for “keeping up.”
– Keeping current (macro) – more automated (alerts); “reading” journals (emailed TOC, not the physical journal); liked annotated TOC; gossip (recommendations by colleagues); missed thematic connections online (special issues); RSS feeds (but sometimes subscribe and never look); timing influences reading habits (Sunday mornings are best); using Facebook (self forming groups to keep in touch); missed discovery, browsing, serendipity; love/hate relationship with technology (print at home = leisure/computer = work); very low use of social networking like blogging
– “I don’t read journals, I read articles.”
– “I don’t read books, I use them.” – indexes, remixing
– Reading (micro) – not reading as narrative anymore; reading more things but fewer things intensively; the first thing they do is check to see if they’re cited (haha); print PDFs to read them & store them on their laptop for reference; skimmer touch points = abstract, figures, figure caption, introduction, conclusion, subheadings; “key points” summary is desirable
– New media is important – podcasts in the car/commuting
– HTML is marginalized – looks cheap, no visual cues, can’t be saved, doesn’t seem like a real paper
– Keeping track of reading – collections of PDFs; folders on laptops; not a good way to annotate/take notes; want flexibility to access anywhere
– Recommendations – play well with others (interoperable tools); search and send (people use email, make it look good); open (to annotation tools, to data mining for extraction – not proprietary!! people just don’t care about those); integrate (other types of content that doesn’t fit into the “article” container – PPT); experiment (in skimming, visual abstracts); mobilize (don’t wait, start now to get feedback)
– Librarians are using COUNTER stats for full text downloads but if users aren’t using articles in that way (more skimming, etc), how can we actually gauge use?
Changes in Print Paper During the 19th Century
AJ Valente (Author)
– conservators are faced with a number of challenges in identifying papers from the 19th century
– To know your papers is to love them
– 1st papermill in the US was the Rittenhouse Mill near Philly (initially underwritten by a publisher in Philly). By 1755 they were using water power.
– 1696 – poem about the Rittenhouse Mill
– 19th century was the century of change from linen paper to wood pulp
– Librarians/archivists/conservators need to know about this shift because dating a piece of paper is critical for preservation
– 19th century – rag paper, manilla paper, straw paper & wood pulp
– Experimental papers – India paper
– 2 types in beginning of 19th century – Writing paper and printing paper (pasteboard for boards for books)
– Making pulp by hand took a whole day to make enough for half a ream of paper; they needed industrialization to speed things up
– How do we identify if a mill was industrialized? Presence of water wheel.
– Rag engine/beater – invented in Holland as an alternative to mechanical stampers – powered by windmills – “the Hollander”
– Some American mills installed the Hollander
– How did they get the rags? They had a collector who went around, people were paid by the mill for their old rags. Sometimes merchants had rag bins where people could drop them off and get a certain amount taken off their bill (18th century).
– As we had more mills, there was more competition and rag warehouses developed (19th century)
– Documented communications between textile mills and paper mills – didn’t generate for them but gave them scraps (through established personal relationships)
– Rags were an important added source of family income
– Fine clothing was made from linen – grown locally, made from flax – most people owned 2 sets of clothes (every day and special occasions)
– Cotton fabrics came into play at a lower cost – average person could afford more outfits, particularly in coastal towns
– Rag collectors began to separate their piles
– Wove mold – for book paper because quarto folio was easier to letterpress print on than chain mold
– Customer would specify what they wanted from the mills and even provide their own molds with watermarks on them
– “high rag paper” = 99% rags
– “linen-cotton composite” – linen and cotton
– Paper machine invented in France, re-engineered in England – “moving wire machine”
– Competing machine “cylinder wire machine” w/fewer moving parts (vacuum)
– US was not allowed to import this machine because it was an ex-colony
– Brandywine Mill – 1,000 ft long roll of paper, machine was patented
– By 1830 there were 60 paper machines in operation in the US
– Paper was the preferred form of communication in those days, so people were picky about the quality of writing paper.
– It was more difficult for mills to create this fine paper, so they focused on book paper
– Machines increased speed and volume of paper production – it was also cheaper
– Huge industry in recycled paper – lots of early documentation (particularly govt) was lost
– 1827 – Meadville PA farmer tried making pulp with straw @ Shyrock Mill
– Imperial – large sized paper used for newspaper
– 1830s – Rags were scarce because number of machines doubled
– Invention of manilla paper – panic of 1837 – put manilla rope, hemp sails, bale rope into the beater. Beating time increased but it worked. High tensile strength but couldn’t be bleached white.
– Late 1850s = decline of handmade paper
– How did they bleach rags/paper? Bleach boiler
– Paper mills were fairly environmentally neutral in those days (until chlorine processes were introduced)
“I Hear the Train A Comin’ – LIVE” session
Greg Tananbaum & Joseph J. Esposito
(Tananbaum – CEO, Scholarnext)
– Writes a regular column for Against the Grain
– Questions below posed to a roundtable of experts in the field of scholarly communication
(1) What is the single biggest game changer that will alter scholarly communication in the next 3-5 years?
– Technology-driven reinvention – Storage and bandwidth; Mobile devices (how, where and when, publishers loosening control over traditional distribution methods); Semantic web/data mining (new ways to assess content quality)
– Economics-driven reinvention – Proliferation of market data makes change less intimidating (theoretical and practical data for them to draw on that could impact business practices. Move away from traditional business models towards things like open access and collaboration with libraries)
(2) What is the most over-discussed scholarly communication issue and why?
– Open Access – Trending downward; Practicality vs. ideology (ideology often overshadows practicality)
– Library-press collaborations (disproportionate to actual results)
– “Death of print” (reduces to tabloid headline form instead of explaining how the digital environment can expand access to scholarship)
– Better scholarly communication tools (always room for improvement, but we have a ton of tools already)
(3) Is there still a scholarly communication crisis? If so, what is it?
– Challenge vs. crisis – Libraries (information overabundance – how does the library balance traditional with emerging resources?); Publishers (adapting to change)
(4) Does traditional scholarly publishing still matter?
– Yes, but what is “traditional”?
– Form vs. function (these functions impact research funding, tenure and promotion, etc.)
(5) In one word, how would you describe the future of scholarly communication?
– Dynamic, multi-faceted, torrent, networking, exciting, flux, reinvention, necessary, different, vital, experimental
(Esposito – CEO, GiantChair)
– We’ll eventually see 2 competing forms of publishing: supply-side and demand-side
– Punctuated equilibrium – Stephen Jay Gould – period of abrupt change, followed by a period of relative stability – and then another disruption
– Look beyond the disruptive – i.e. growth of personal computer, now we have another asteroid (Google cloud computing, mobile)
– Trendspotting 1: Funding – budget pressure, authors looking elsewhere for publishing venues because of dropping readership
– Trendspotting 2: Library Bypass – Publishers seeking growth in new territories, directed at individuals, government entities
– Trendspotting 3: Supply-side Publishing – Author pays model (you pay to get published – not vanity), growth of research and requirement to publish is creating a strain because traditional options are decreasing but research is increasing.
– Trendspotting 4: Direct Marketing – selling directly to end-users bypassing bookstores & libraries, privacy issues loom because you need to create and manage customer database
– Trendspotting 5: Proprietary Systems – Copyright concerns. It’s inevitable because an ecology requires that someone profit from it.
– Supply-side Publishing – Evolution of open access; post-publication peer review (authors post, review takes place via commentary)
– Demand-side Publishing – user pay, migrating towards direct marketing, emphasis on collecting customer data.
– D2C (direct to consumer) – Netflix
- “monopolizing attention” – I paid for it, so I might as well use it. You stop doing other things and focus on what you paid for.
Creating a Trillion-Field Catalog: Metadata in Google Books
Jon Orwant (Engineering Manager, Google Books)
- How and why Google scans books
– In the business of answering your questions (which could take place on the web)
– Publishers want people to find their books online (not necessarily read) – they send books to Google & they are scanned
– Libraries don’t want you to slice their book spines off, which makes scanning more difficult/costly
– Stereo-scanning = as little damage as possible (about same as a person reading it)
– About 20% of the world’s books are in the public domain
– Use “snippet view” for books under copyright (you can search for the phrase and Google will tell you where/when it appears and then use WorldCat to find the book in your library)
– Google book scanning workflow = scan, image process (clean up dirt), optical character recognition (OCR), tag, metadata, rank, index
- What happens to written marginalia? They make nuanced decisions. (interesting – implications for historical research – talk about this in class)
– Statistics: 15 million books scanned; 4 billion pages; 2 trillion words; working with over 40 libraries and 30 thousand publishers
– They collect metadata from over 100 sources, parse the records into internal format, cluster the records into expressions and manifestations, create a “best of” record for each cluster and index and display elements of that record on books.google.com.
- Google has a real estate unit just to purchase server farms
– Multivolume works are hard because there is little uniformity, the information is in a variety of different fields.
– The world needs a better way to identify bound-withs
– Some fields don’t matter to them (paperback, acid-free, etc)
- ISBN 7533305353 is shared by 1413 books (what the… look this up)
– Developing an author database
- Have scanned books in 483 languages – 3 in Klingon (haha)
– Cover generation – algorithm for beautiful images, then created a composite cover generation with the author and title
– Structure – annotated flaps, magazine fold-outs – try to maintain the original intent of that author
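The metadata steps noted above (parse records from many sources, cluster them, create a “best of” record for each cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Google's actual method: the field names, the `cluster_key` matching rule (same normalized title and author), the majority-vote merge in `best_of`, and the sample records are all hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and turn punctuation into spaces so near-identical strings match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def cluster_key(record):
    # Hypothetical matching rule: same normalized title and author => same cluster.
    return (normalize(record["title"]), normalize(record["author"]))


def best_of(group):
    """Merge one cluster: keep the most common non-empty value of each field."""
    merged = {}
    for field in ("title", "author", "year"):
        values = [r[field] for r in group if r.get(field)]
        if values:
            # Ties are broken by length, then alphabetically, to stay deterministic.
            merged[field] = max(set(values), key=lambda v: (values.count(v), len(v), v))
    merged["sources"] = sorted(r["source"] for r in group)
    return merged


def build_catalog(records):
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_key(record)].append(record)
    return [best_of(group) for group in clusters.values()]


# Made-up sample records; all values are strings.
records = [
    {"source": "library_a", "title": "Moby-Dick", "author": "Melville, Herman", "year": "1851"},
    {"source": "publisher_b", "title": "moby dick", "author": "Melville Herman", "year": ""},
    {"source": "library_c", "title": "Moby-Dick", "author": "Melville, Herman", "year": "1851"},
    {"source": "library_a", "title": "Walden", "author": "Thoreau, Henry David", "year": "1854"},
]

for entry in build_catalog(records):
    print(entry)
```

A real matching rule would have to cope with the problems mentioned in these notes (multivolume works, bound-withs, fields used inconsistently), which is where a simple title/author key like this one breaks down.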
- From pages to ideas
– Google maps/books mashup – identify place names referred to in the book, then map it
– Linguistic analysis – evolve over time and across genres – look at grammar books to identify trends – data mine common usage
– Trove of data – need to expose to researchers
– Insights into human progress – word lists “trigrams” – can indicate new/old books & cultural trends
– Digital Humanities Awards (Google has data, loves to work with it, but doesn’t know what to ask/use it for, so they accept research proposals to fund)
– Cohen & Gibbs, GMU – Reframing the Victorians (Plot the instance of words of interest to scholars of the Victorian era over time)
– Efron, University of Illinois – Intralanguage translations (Training corpus to translate between different years of a language)
– They’re doing a lot of data visualization (for researchers, for libraries they’re partnering with)
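The “trigrams” idea above (three-word sequences counted across books, then compared over time) can be illustrated with a small sketch. The tokenization rule, the function names, and the sample texts are assumptions made for illustration; this is not how Google builds its word lists.

```python
import re
from collections import Counter


def trigrams(text):
    """Return every run of three consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:], words[2:]))


def trigram_counts_by_year(books):
    """books is a list of (year, text) pairs; returns {year: Counter of trigrams}."""
    counts = {}
    for year, text in books:
        counts.setdefault(year, Counter()).update(trigrams(text))
    return counts


# Made-up sample texts standing in for scanned books.
books = [
    (1850, "The age of steam and the age of iron"),
    (1900, "The age of electricity"),
]

counts = trigram_counts_by_year(books)
for year in sorted(counts):
    print(year, counts[year].most_common(2))
```

Tracking how often a given trigram appears per year is the kind of count that could hint at whether a book is new or old, or show a phrase rising and falling in use.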
– Q: Do you have an API that publishers can access? Not really for data mining – Google never announces upcoming products but stay tuned ::wink:: We want APIs and we will be exposing a database-like layer on top of this data.
– Q: How does Google pay for all of this? Advertising. They share some revenues with publishers when they donate books for scanning.
– Q: How do you identify languages and is it possible for end-users to search by language in Google Books? It’s possible on the advanced search feature to narrow by 60 languages. We identify through metadata & language-based OCR boxes.
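On the ISBN noted above as shared by 1413 books: one quick thing to check is whether the number is even well-formed. The standard ISBN-10 rule is that the ten digits, weighted 10 down to 1, must sum to a multiple of 11 (a final “X” stands for 10). A minimal sketch of that check, with a function name of my own choosing:

```python
def is_valid_isbn10(isbn):
    """Check the ISBN-10 checksum: weights 10..1, total must be divisible by 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10  # "X" is only allowed as the final check character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("7533305353"))  # True: weighted sum is 220 = 11 * 20
```

Since the number passes the checksum, the check digit alone cannot flag it as a typo, which fits the picture of one identifier being attached to many different books rather than a single mistyped record. That is only a guess from the checksum, though, and still worth looking up.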