Digitizing, georeferencing, and publishing the documents from the 1991 LA City Planning collection at USC’s VKC library involves a series of physical and digital steps, with the goal of publishing these materials online for open use. If you are continuing to work on this project or are doing something similar, please follow the instructions below.
Before any of this work begins, set up a single digital location (a Google Drive was used for this internship, but any drive would be fine) so that all of the materials are kept in one easy-to-access place. You can structure this drive however you like, but for the sake of this workflow I will document how I did it.
Initial Documentation
The purpose of this step is to record the physical materials (the binders of government documents) in a single spreadsheet (Google Sheets was used for this job) so that they are easy to find. This lets anyone looking for a specific file locate it without going through each individual binder. Note: For this project (1991 Los Angeles City Planning Documents) this step has been completed; however, for similar future projects, following this template would help establish a standard for documents that are to be digitized within USC’s VKC library.
Open a Google spreadsheet and enter the following headers across the top row, in this order:
In Catalog | Name | Call # | Date | #Maps Included | Content/Notes
There should be six columns, from A to F.
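For anyone who prefers to start from a file rather than typing the headers by hand, the header row can be sketched as a small script. This is only an illustration using Python's standard csv module; the function name and the example row are hypothetical, and the resulting CSV text would still need to be imported into Google Sheets.

```python
import csv
import io

# Column headers for the binder-level spreadsheet, in order (columns A to F).
BINDER_COLUMNS = ["In Catalog", "Name", "Call #", "Date", "#Maps Included", "Content/Notes"]


def binder_template_csv(rows=()):
    """Return CSV text with the header row followed by any binder rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(BINDER_COLUMNS)
    for row in rows:
        if len(row) != len(BINDER_COLUMNS):
            raise ValueError(f"expected {len(BINDER_COLUMNS)} values, got {len(row)}")
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical placeholder row, not real catalog data.
    example = [("No", "No Name Included", "N/A", "N/A", "0", "Placeholder row")]
    print(binder_template_csv(example))
```

Rejecting rows with the wrong number of values is a small safeguard against entries drifting into the wrong column.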
In Catalog refers to whether or not the file can be found in USC’s digital catalog through a search. Fill in this column after all of the materials have been documented in the spreadsheet. Write “Yes” if the file can be found online and “No” if it can’t. If you find the file under a different name or call number, or if finding it required a different search than the others, note this in parentheses next to the initial answer, e.g. “Yes (under different call number)”. You can search the USC library catalog for the binders using this link.
Name refers to the name of the document as given on the title sheet within the binder or the name printed on the side of the binder. Generally these two names are the same, but if there is no name on the side of the binder, then use the name on the title sheet within. If no name is indicated anywhere, write down “No Name Included”. Make sure that the punctuation is transcribed correctly, e.g. if the counties are separated by a / or by a comma, then you need to include this. Capitalization of names isn’t required beyond the first letter.
Call # refers to the call number on the side of the binder (for this group of files it was HT 169.73.L7 L[group of numbers]). This number can often also be seen within the binder on a sticker, usually on the front page. The two numbers are generally the same; however, if they differ, write both of them down and check the online catalog later to confirm the correct number.
Date refers to the date printed on the call number sticker. Alternate dates within the text can be noted in the Content/Notes column, though this isn’t necessary given that government documents tend to include a number of varying dates reflecting when ordinances were passed. If a date isn’t included on the call number sticker, enter “N/A”.
#Maps Included refers to the number of maps within the binder. Some maps are counted differently or not at all:
Small, unusable pictures of maps, such as the ones frequently used on the title page: Do not count, as they have no text data (keys, street names, etc.) and serve only aesthetic purposes.
Pages that have multiple small maps within them: Count each individual map.
Large color map that is loose (generally located near the end of the binder): Count as one map.
The multiple pages that correspond to the larger color map at the end: Count as one single map.
Multiple pages of black and white maps: Count individually due to differences in the text on each map (unless they correspond to a larger map that can be pulled out, in which case count as one map).
Other map-like images such as building schematics: Count as individual maps (it is up to whoever is documenting whether to make a note of this).
Content/Notes refers to any additional information the documenter wishes to record to further distinguish the binders from one another. This can include materials that are missing compared to the other binders, repeated binders with additional information, anomalies within the documentation, etc.
Once this is finished your spreadsheet should look like this:
These binders have similar structures and often contain similar materials, so be careful to distinguish them in your documentation. Additional detail: include binders that are empty but still carry a Call # sticker on the side.
Once all of the documents have been entered into the spreadsheet and the information is in the correct columns, move on to the digitization of the materials.
Digitization
The digitization process is simple, but the scanner at USC can only handle so much at once and a single email can only carry so much data, so it is impossible to scan an entire binder of 20 to 45 pages in one go. You must scan in parts or it will not work. Before digitizing, create a folder within the drive dedicated to the digitized content, and then create another folder inside it to hold the digitized materials for a specific binder.
From this section onwards I will be writing through the lens of digitizing the South Los Angeles Plans, which was a single binder within the group. The folder within the folder was named South LA Plans.
Because the text within this binder was split into three groups, I created three folders within the South LA Plans folder labeled Intro, Community Plan, and Specific Plan, after the headings within the text. From this point you are ready to digitize.
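If you keep a local copy of the drive, the same folder layout can be sketched in a few lines. This is a minimal illustration assuming a local mirror of the drive; the project itself created these folders by hand in Google Drive, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the South LA Plans example; adjust for other binders.
BINDER_FOLDER = "South LA Plans"
SECTION_FOLDERS = ["Intro", "Community Plan", "Specific Plan"]


def create_binder_folders(root, binder=BINDER_FOLDER, sections=SECTION_FOLDERS):
    """Create root/binder/<section> folders and return their paths."""
    created = []
    for section in sections:
        folder = Path(root) / binder / section
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the digitized-content folder here.
    with tempfile.TemporaryDirectory() as digitized_root:
        for folder in create_binder_folders(digitized_root):
            print(folder.relative_to(digitized_root))
```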
Within the VKC there is an AltaLink C8035 scanner/printer located behind the front desk. Turn it on with the power button.
Once the device is on, look at the touch screen for an icon showing a letter with the word Email underneath it. Touch this icon and, on the next page, touch Add Recipient.
Enter your email address and touch OK. If you are using a graduate school address with a g subdomain, of the form email@g.ucla, include the g; otherwise the transfer of files will take longer.
Scroll down on the email page to the tab that says Resolution. Touch this tab and, from the options ranging from 72 dpi to 600 dpi, choose 600 dpi.
Scroll down further to the tab that says “Build Job,” just above the Reset button. Touch this tab and make sure that it has a checkmark next to it. This indicates that you will be scanning multiple pages to be sent to your email instead of just one.
Touch the “Scan” button at the top right of the screen and begin to scan each of your documents. Scan no more than 8 documents during a session if you want to avoid issues with the transfer.
Once this is completed, send the files to your email.
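To plan the scanning sessions ahead of time, the 8-per-session limit can be turned into a list of page ranges. The sketch below assumes each scanned document is a single page and that pages are numbered from 1; the function name is hypothetical.

```python
def scan_batches(total_pages, batch_size=8):
    """Split pages 1..total_pages into consecutive (first, last) scan sessions."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # A 45-page binder is split into six sessions, the last one shorter.
    for number, (first, last) in enumerate(scan_batches(45), start=1):
        print(f"Session {number}: pages {first}-{last}")
```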
Place the files into the folders that correspond to their location within the text; they will be renamed after the next step.
After Digitization
After the digitization and organization are complete, we move on to the next step, which is creating a master spreadsheet. This spreadsheet is for the individual pages within the binders rather than the binders themselves.
Create another spreadsheet labeled LA Planning 1991 Master (or whatever your collection’s name is) and label nine columns from A to I, in this order:
ID | Title | Title on Google Drive | Folder on Google Drive | Map Included (Y/N) | Page Number | Date(s) Included? | Basic Description | Extra Documentation
ID refers to the unique ID that each page will have. The ID is created by the user (in our case we used planning_00001, planning_00002, etc. to refer to each individual document).
Title refers to the title that is provided on the page that you are documenting. If there is no title, write “No Title.” The pages that are parts of a larger map all have titles made up of six numbers with the letter P in the middle. If the title isn’t apparent or easy to discern (e.g. not bolded or not set apart from the rest of the text), then just write “No Title.”
Title on Google Drive refers to what you saved the file as when you placed it into Google Drive. Because the files will later be renamed to reflect their ID number, this column exists only to lessen confusion during the transfer, in tandem with the page number.
Folder on Google Drive refers to which folder within the South LA Plans folder the file has been placed in.
Map Included (Y/N) indicates whether or not there is a map on the page.
Page Number indicates the page number both as counted by the documenter and as printed on the page. Some pages within the text show alternate page numbers such as SC - 1. For these, write the page number as the documenter’s number followed by the printed number in parentheses, e.g. 8 (SC-2).
Date(s) Included? indicates whether there are any dates within the documentation. Alternate dates can generally be found on the “Amendments Date Log” page or the ordinance plan title page, which includes the effective dates of the documentation. If there are no dates, write “None.”
Basic Description gives a basic description of what is on the page. This includes names involved, section headers, area of the map, etc.
Extra Documentation includes extraneous details that, while not covered in the description, could be important for finding certain information within the text. This can be asterisks flagging information, indications of continuation from previous pages, etc. This column is optional, so if the documenter sees no additional information worth recording, it can be left blank for certain rows, such as those for the maps.
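The ID and Page Number conventions described above are regular enough to generate rather than type. A minimal sketch, assuming IDs are a prefix plus a five-digit running number as in planning_00001; both function names are hypothetical.

```python
def make_id(number, prefix="planning", width=5):
    """Build an ID such as planning_00001 from a running page count."""
    if number < 1:
        raise ValueError("IDs start at 1")
    return f"{prefix}_{number:0{width}d}"


def page_label(counted, printed=None):
    """Combine the documenter's count with the printed page number, if any."""
    return f"{counted} ({printed})" if printed else str(counted)


if __name__ == "__main__":
    print(make_id(1))             # planning_00001
    print(make_id(32))            # planning_00032
    print(page_label(8, "SC-2"))  # 8 (SC-2)
    print(page_label(9))          # 9
```

Generating the IDs this way keeps the zero padding consistent, which matters later when files and folders are sorted by name.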
Once this is finished your spreadsheet should look like this:
Once all of this information has been entered into the spreadsheet, go through and check that the files and the spreadsheet agree. Once this is done, rename the files within the drive to match their respective IDs.
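The check-then-rename step can also be sketched as a script. This is an illustration under several assumptions that are not part of the original workflow: the files have been downloaded to a single local folder, the master spreadsheet has been exported as CSV, and the Title on Google Drive column holds each file name without its extension. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_rows(master_csv):
    """Read the exported master spreadsheet into a list of dicts keyed by header."""
    with open(master_csv, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def visible_files(folder):
    """Map file stem -> path, skipping hidden files such as .DS_Store."""
    return {
        p.stem: p
        for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".")
    }


def check_files(rows, folder):
    """Return (titles with no file, files with no spreadsheet row)."""
    listed = {row["Title on Google Drive"] for row in rows}
    present = set(visible_files(folder))
    return sorted(listed - present), sorted(present - listed)


def rename_to_ids(rows, folder):
    """Rename each listed file to its ID, keeping the file extension."""
    missing, unlisted = check_files(rows, folder)
    if missing or unlisted:
        raise ValueError(f"fix mismatches first: missing={missing}, unlisted={unlisted}")
    by_stem = visible_files(folder)
    renamed = []
    for row in rows:
        source = by_stem[row["Title on Google Drive"]]
        target = source.with_name(row["ID"] + source.suffix)
        if target != source:
            if target.exists():
                raise FileExistsError(target)
            source.rename(target)
        renamed.append(target.name)
    return renamed
```

Refusing to rename anything until the spreadsheet and the folder agree mirrors the order of the steps above: verify first, then rename.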
Georeferencing
Adapting these files for QGIS in order to georeference them is the longest step in the entire project.
First, download every PDF file that you just documented and place them in a folder created on the desktop.
In this folder, also create an individual folder for each file and its outputs. Don’t put anything in them yet, as that would make accessing the files in QGIS slightly more time-consuming. These folders will hold the output rasters from QGIS, keeping each map’s outputs separate. Name them after the ID number, e.g. 00032.
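Creating one empty folder per file by hand gets tedious for a large binder, so the step can be sketched as a short script. It assumes IDs of the form planning_00032 and names each folder after the digits that follow the underscore; the function names are hypothetical.

```python
from pathlib import Path


def output_folder_name(file_id):
    """Return the numeric part of an ID, e.g. planning_00032 -> 00032."""
    prefix, _, number = file_id.rpartition("_")
    if not prefix or not number.isdigit():
        raise ValueError(f"unexpected ID format: {file_id!r}")
    return number


def create_output_folders(parent, file_ids):
    """Create one empty output folder per ID inside parent and return the paths."""
    folders = []
    for file_id in file_ids:
        folder = Path(parent) / output_folder_name(file_id)
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

For example, calling create_output_folders with the desktop folder and the IDs planning_00031 and planning_00032 would create the folders 00031 and 00032 inside it.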
Next, the files need to be converted from .pdf to .tiff. To do this, open the Automator application, which should be available on a Mac computer.
Once it is open, find “Get Specified Finder Items,” “Render PDF Pages as Images,” and “Copy Finder Items” and drag them into the main area.
Your interface should look like this (make sure that you change the resolution to 600 dots per inch to resemble the digitized file):
In the “Get Specified Finder Items” box, add the files you wish to convert and press Run.
The files should show up on the Desktop as converted .tiff files.
After this is done, crop the files to limit the white space around the maps. You can do this easily in Preview with the crop button, but for some of the more oddly shaped maps it is best to remove the white space manually in a photo editing application such as Photoshop.
Keep in mind that if you crop the files before converting them, the cropping won’t carry over, so it is better to crop afterwards.
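On a computer without Automator, a similar conversion can be sketched in Python. This is an alternative and not the method used for this project: it assumes the third-party pdf2image package (which itself needs Poppler installed) and writes each TIFF next to its PDF rather than to the Desktop. The function names and the page-suffix naming are hypothetical.

```python
from pathlib import Path


def tiff_path_for(pdf_path, page_number, page_count):
    """Name the TIFF after the PDF; add a page suffix only for multi-page PDFs."""
    pdf_path = Path(pdf_path)
    suffix = f"_p{page_number}" if page_count > 1 else ""
    return pdf_path.with_name(f"{pdf_path.stem}{suffix}.tiff")


def convert_pdf_to_tiff(pdf_path, dpi=600):
    """Render each page of a PDF to a TIFF next to it and return the new paths."""
    # Imported here so the naming helper above works without pdf2image installed.
    from pdf2image import convert_from_path  # third-party; requires Poppler

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    outputs = []
    for number, page in enumerate(pages, start=1):
        target = tiff_path_for(pdf_path, number, len(pages))
        page.save(target, format="TIFF")
        outputs.append(target)
    return outputs
```

The default of 600 dpi matches the resolution chosen on the scanner, so the converted image keeps the detail of the original scan.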
From there, open QGIS and load a standard OSM map.
To do this, go to the menu bar at the top of the screen, click Web, go down to QuickMapServices, then to OSM, and open OSM Standard. (QuickMapServices is a plugin; if it is missing from the Web menu, install it through Plugins, then Manage and Install Plugins.) It should look like so:
This should present a map of the world upon the main user interface which will be used as our base for georeferencing.
For this purpose you should zoom in on the area that is being georeferenced (in this case Los Angeles).
From here, go back to the menu bar, click on Raster, and then click on Georeferencer (in some newer versions of QGIS the Georeferencer may be found under the Layer menu instead) like so:
This will open up a new window where you will be uploading your .tiff files and creating points that will connect them to the OSM map on the original page.
Within this window, click the Open Raster button at the top left; it looks like a blue and black checkerboard with a small green square containing a white plus at its bottom right.
From there, open a .tiff file in this window.
Before you begin to georeference, click on the yellow gear in the window, labeled “Transformation Settings”; it will open a new window like so:
Leave everything as is, except change the Output raster name so that it ends with _geo instead of _modified. Save this output to the correct folder on the desktop, e.g. save planning_00032 to the folder 00032.
Click OK.
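The naming rule for the output raster can be written down as a small helper, which is handy for checking afterwards that every output landed in the right place. This sketch assumes the output folder is named after the digits of the ID (for example 00032) and that the output is saved with a .tif extension; the function name is hypothetical.

```python
from pathlib import Path


def georeferenced_output_path(tiff_path, desktop_folder):
    """Return where the _geo raster for a given .tiff is expected to be saved."""
    file_id = Path(tiff_path).stem            # e.g. planning_00032
    folder = file_id.rpartition("_")[2]       # e.g. 00032
    return Path(desktop_folder) / folder / f"{file_id}_geo.tif"


if __name__ == "__main__":
    # "maps" is a placeholder for the folder created on the desktop.
    print(georeferenced_output_path("planning_00032.tiff", "maps"))
```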
At this point you can begin georeferencing, which uses three buttons at the top of the window.
The Add Point button on the left adds points, the Delete Point button in the middle deletes points, and the Move GCP Point button on the right moves points.
From here, you are going to connect points on the .tiff map to the same points on the OSM map. In the case of Los Angeles, the easiest points to find were street intersections. To add a point, click on a location on the .tiff map; this will open a new window labeled Enter Map Coordinates like so:
From there click on the “From Map Canvas” button. This will bring you to the OSM map. There, click on the same point on the street that you clicked on in the .tiff file. This will bring you back to the Enter Map Coordinates window with the X/East and Y/North fields filled in. Click OK and the point will be placed on both maps.
Do this until you have at least six points placed on the map. The minimum that QGIS will accept depends on the transformation type selected in the transformation settings, so treat six as a working floor for this workflow. For this project the number of points varied, but the more points you put on the map, the more accurate the georeference is going to be.
Once you have placed enough points, click the green Start Georeferencing button. This will transform your map and project it onto the OSM map.
If the map looks distorted, go back to the Georeferencer tool and add more points in areas that didn’t have them previously. If it continues to look distorted after several rounds of this, edit the image outside of the program to remove more white space.
Save the GCP points to the folder that is named after the ID.
Repeat this process until all of the maps have been georeferenced; together they should look similar to the large pullout map that is included at the back of most binders.
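To see why extra control points help, it can be useful to look at what a simple transformation does with them. The sketch below fits an affine (first-order) transformation from pixel coordinates to map coordinates by least squares and reports how far each point lands from its target. It is a simplified illustration and not QGIS's own implementation, and the coordinates in the example are made up.

```python
def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def fit_affine(points):
    """Least-squares affine fit from (pixel_x, pixel_y, map_x, map_y) tuples."""
    if len(points) < 3:
        raise ValueError("an affine fit needs at least three control points")
    design = [(px, py, 1.0) for px, py, _, _ in points]
    # Normal equations (D^T D) c = D^T target, solved once per map axis.
    normal = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in (2, 3):
        rhs = [sum(row[i] * point[axis] for row, point in zip(design, points)) for i in range(3)]
        coeffs.append(solve_3x3(normal, rhs))
    return coeffs  # [[a, b, c], [d, e, f]]: map_x = a*px + b*py + c, map_y = d*px + e*py + f


def residuals(points, coeffs):
    """Distance between each control point's map position and its fitted position."""
    (a, b, c), (d, e, f) = coeffs
    return [
        ((a * px + b * py + c - mx) ** 2 + (d * px + e * py + f - my) ** 2) ** 0.5
        for px, py, mx, my in points
    ]


if __name__ == "__main__":
    # Made-up control points: four image corners and one interior intersection.
    gcps = [(0, 0, 100, 500), (10, 0, 120, 500), (0, 10, 100, 480),
            (10, 10, 120, 480), (5, 5, 110.4, 489.8)]
    fitted = fit_affine(gcps)
    print(fitted)
    print(residuals(gcps, fitted))
```

With only three points the fit passes through every point exactly, so a misplaced point goes unnoticed; with more points, a badly placed one shows up as a large residual, which is one reason adding points tends to improve the result.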
Uploading Map to GitHub
The information that can be found below is taken from this presentation. If you wish to learn more about QGIS and GitHub then feel free to look the presentation over.
For this you are going to need a GitHub account. Once signed in, make a repository by clicking the plus sign near your profile icon and choosing “New repository.” This will open a new page. Set the owner and name the repository. You can optionally add a description, choose whether to make the repository public or private, and optionally initialize the repository with a README. Click “Create repository,” and you have created a repository.
The next step is to go to QGIS and install a plugin that will connect it to the web. Go to the menu bar and click on Plugins, then Manage and Install Plugins. Search for a plugin called qgis2web and install it. This plugin will show up in the Web section of the menu bar. Go there and click on the plugin to bring up a new window.
Export your map, making sure to choose a specific place to export it to, such as near the other map files on the desktop. In your repository, drag the exported folder directly onto the webpage to upload it. From there, go to the Settings tab, scroll down to GitHub Pages, select the master branch as the Source, and choose a theme for the webpage. Back at the repository, create a new index file; this will be your homepage.
Once this is completed, open the index file and click on the pencil icon near the History button.
Add the code below to embed the map within your webpage:
<iframe src="qgis2web/index.html" allowfullscreen="" width="100%"></iframe>
From there you can go to the settings and change the theme of the webpage to suit your data.