
Data Cleaning for the Tombstone Project

[This article was first published on Louise E. Sinks, and kindly contributed to R-bloggers.]
< section id="project-overview" class="level1">

Project Overview

I’m working on a project for my father that will culminate in a website for his genealogy research. There are a couple of different parts that I’m working on independently. This part involves linking photos of family gravestones to an Excel sheet that records the GPS location of the tombstones. The combined dataset is used to generate a leaflet map. This post focuses on the data cleaning and the photo matching. I do generate a leaflet map at the end, but it is not the final map; I’ll do the styling of the map in a separate post.

This post is intended both to document what I did for my father, so he understands any changes to the data and what results were obtained, and to serve as a tutorial on how to approach a messy problem. I’ve been solving problems using code for a long time. There are a ton of tutorials that focus on how to solve a specific problem, but fewer that show how to approach an undefined problem. Even fewer show mistakes and false starts, yet these things happen when you are solving real-world problems. Constantly checking your results against what you expect to get is critical, and so is figuring out where you went wrong and how to fix it. The hard errors to find and fix are the logic errors. Everything runs fine. You get an output that may look right. But you still might not be getting the correct result. You have to approach every output critically and check your work carefully.

I generally write my posts in a “code-along” style. I include almost everything I do, including dead ends. I could present more polished posts, written after I had achieved the end result and including only the steps that lead directly to it. I don’t do that because I don’t think the mechanics of getting to an end result are necessarily the hard part. Thinking your way through and self-checking the work is the hard part. If you know what you are trying to do, you can always find some code snippets to achieve that result. If you don’t know what you are trying to do, then all the code snippets in the world won’t help.

In some sections I do omit mistakes and go straight to the final product, just so this tutorial doesn’t end up being 5 million pages long. Generally, the first time I do something, I go into more detail than on later occasions. For the data cleaning portion, the Cleaning Up Cemetery Names section shows the entire process, including mistakes. For the matching part, everything before Round 2 is shown in detail, including mistakes, and the other rounds are much less detailed.

I did also code a simplified version of this project all the way through using only one round of matching and 30 photos, just to make sure the basic elements were working. That isn’t shown here.

If for some reason you want to run this yourself, you can get a zipped copy of all the photos from here. I don’t upload the photos to this repo because the file size is too large.

< section id="setting-up" class="level1">

Setting Up

< section id="loading-libraries" class="level2">

Loading Libraries

I’ll include more info and reference information about the packages at the code blocks where I use them.

library(tidyverse) # who doesn't want to be tidy?
library(gt) # For nice tables
library(openxlsx) # importing excel files from a URL
library(fuzzyjoin) # for joining on inexact matches
library(sf) # for handling geo data
library(leaflet) # mapping
library(here) # reproducible file paths
library(magick) # makes panel pictures
< section id="file-folder-names-and-loading-data" class="level2">

File Folder Names and Loading Data

Here I set up some variables that I use for the file/folder structure and read in the spreadsheet.

# folder names
blog_folder <- "posts/2023-08-04-data-cleaning-tombstone"
photo_folder <- "Photos"
archive_folder <- "Archived Photos"
unmatched_folder <- "Unmatched Photos"
match1 <- "Matched_Round1"
match2 <- "Matched_Round2"
match3 <- "Matched_Round3"
match4 <- "Matched_Round4"


#data_file <- "Tombstone_Data_small.xlsx"
data_file <- "Tombstone Data.xlsx"
# read in excel sheet
tombstones_raw <-
  read.xlsx(here(blog_folder, data_file),
    sheet = 1
  )
< section id="the-here-package-for-reproducible-file-structures" class="level2">

The here Package for Reproducible File Structures

I have a folder structure that reflects the sequential nature of the matching, so photos get moved into different folders depending on the round in which they were matched. I use here to generate the paths. Quarto documents start the file path in the folder where the document resides, while R files start in the project folder. here always starts in the project folder, so it allows for easy recycling of code between R files and Quarto files and generally prevents you from getting lost in your file structure. It also allows me to easily move between an independent project and the project that is my website without having to recode all the folder names in the code. All I need to do is set up the sub-folder structure and names (as I did above) and then use them to generate file paths relative to here. You can see that usage in the loading of the Excel sheet.
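
As a small illustration of that pattern, here is a minimal sketch that builds a path from the folder variables defined earlier. The file name `example_photo.jpg` is hypothetical and only stands in for a real photo; the exact root that `here()` reports depends on where your project lives.

```r
library(here)

# Folder names as defined in the set-up block
blog_folder <- "posts/2023-08-04-data-cleaning-tombstone"
photo_folder <- "Photos"

# here() appends its arguments to the project root,
# regardless of the current working directory
photo_path <- here(blog_folder, photo_folder, "example_photo.jpg") # hypothetical file name

# Check that the path points at something before reading or moving it
file.exists(photo_path)
```

Because every path is assembled from the same few variables, renaming or relocating a folder means changing one assignment rather than hunting through the code.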

< section id="reformat-and-clean-the-data" class="level1">

Reformat and Clean the Data

Cleaning the data is an iterative process. A quick scan of the data reveals a bunch of really obvious issues, but as the analysis proceeds, other errors pop up that can be traced back to improperly cleaned data. Continually checking the results against expected results is critical to find the mistakes. This is part of the reason I have temporary variables (tombstone_1, tombstone_2, etc.). If I’m not sure about something, I’ll store the results in the temporary variable, so I don’t have to rerun everything from the start to get a clean copy to work with. I can just go back one or two code blocks and regenerate from a working partially cleaned version.

Deciding on ground rules for what you will and will not correct is important. For this project, I decided I would not change any photo file names. I’m working with a copy of his photo archive; he has his own filing and naming scheme, and he also corresponds with other genealogists and shares information. Changing photo names on my copy would lead to a set of photos that no longer matched those out there in various places. This decision will lead to missed matches, since some photos do appear to have typos in the names, such as Octava instead of Octavia. Other photos don’t follow his normal naming convention of last name, first name, middle name; some use first name, last name. This again is something that could be corrected programmatically, but I won’t because of my ground rules. For another project, a different decision might make more sense. (I’d definitely correct file names if it were my own data!)

I also decided that any inferred data in the spreadsheet (usually denoted in [] here) would not be used. Everything going into the map is data directly from the photos.
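
One way to apply this rule, sketched here with stringr and not necessarily the approach taken later in this post, is to strip anything inside square brackets. The sample values imitate entries in the First.Name column; the variable names are my own.

```r
library(stringr)

# Values in the style of the First.Name column; [] marks inferred text
first_names <- c("L[eaman]", "G[uilford]", "Joseph", NA)

# \\[ and \\] match literal brackets; [^\\]]* matches whatever sits between them
observed_names <- str_remove_all(first_names, "\\[[^\\]]*\\]")

observed_names
#> [1] "L"      "G"      "Joseph" NA
```

Only the text actually carved on the stone (the initial) survives, and missing values stay missing.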

The tidyverse packages stringr and tidyr both have very powerful tools for data cleaning and tidying. For most tasks, there are multiple ways to accomplish the goal. I’ll illustrate several different ways to perform tasks; there is likely one that is best suited for your application so it is good to know the various methods.

< section id="fixing-the-gps-data" class="level2">

Fixing the GPS data

The GPS data is stored as a string representing degrees, minutes, and seconds of latitude and longitude. I want this as a decimal lat/long (numerical), since I know that is accepted by many mapping programs. Dealing with this data has two parts: cleaning up the typos and formatting, and then converting to the decimal number.
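
To make the target of the conversion concrete, here is a minimal sketch of the arithmetic. It assumes each cleaned value is whole degrees followed by decimal minutes (the values in this dataset look like `37 53.396`); if the digits after the point were meant as seconds, the formula would need to change. The helper name `dm_to_decimal()` is hypothetical, and negating the W column is the usual convention for western longitudes rather than something recorded in the spreadsheet.

```r
library(stringr)

# Hypothetical helper: "37 53.396" -> 37 + 53.396 / 60
dm_to_decimal <- function(x) {
  # column 1 = degrees, column 2 = everything after the first space
  parts <- str_split_fixed(str_squish(x), " ", 2)
  # values that cannot be parsed as numbers become NA
  suppressWarnings(as.numeric(parts[, 1]) + as.numeric(parts[, 2]) / 60)
}

lat <- dm_to_decimal("37 53.396")   # 37 + 53.396 / 60, about 37.8899
long <- -dm_to_decimal("88 41.321") # -(88 + 41.321 / 60), about -88.6887
```

A value that does not fit the pattern, such as `88 39/178`, comes back as `NA` instead of a silently wrong number, which makes leftover typos easy to spot.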

< section id="viewing-the-gps-data-strings" class="level3">

Viewing the GPS Data (strings)

When you view the GPS data you can see a couple of issues.

tombstones_raw %>% 
  select(Surname, N, W) %>% 
  gt() %>% 
  tab_options(container.height = px(300), container.padding.y = px(24))
Surname N W
Anderson 36o56.472 86 86.961
Anderson 36 56.472 86 86.961
Anderson 37 53.396 88 41.321
Anderson 37 52.856 88 39.163
Anderson 37 52.856 88 39.163
Anderson 37 52.855 88 39.163
Anderson 37 52.853 88 39.164
Anderson 37 52.853 88 39.167
Anderson 37 52.852 88 39.165
Appleton 36 29.552 86 46.793
Baldwin 38 33.025 87 06.328
Baldwin 38 33.025 87 06.328
Baggett 36 29.553 86 46.793
Beasley 36 35.891 86 43.204
Beasley 36 36.755 86 43.145
Beasley 36 36.755 86 43.145
Bell 36 15.064 86 11.669
Bell 36 15.064 86 11.669
Brazelton 35 09.411 86 03.624
Brazelton 35 09.410 86 03.624
Brown 40O 40.760’ 75O 31.705′
Brown 40O 40.760’ 75O 31.705′
Bundy 37 45.623
Bundy 37 53.380 88 44.474
Bundy 37 53.380 88 44.474
Bundy 37 53.380 88 44.474
Bundy 37 53.379 88 44.474
Bundy 37 52.875 88 39.118
Bundy 37 52.875 88 39.118
Bundy 37 52.873 88 39.188
Bundy 37 52.873 88 39.188
Burgess 37 49.224 88 54.527
Burgess 37 49.224 88 54.527
Clayton 37 50.788 88 50.968
Clayton 37 50.788 88 50.968
Clayton 37 50.795 88 50.977
Clayton 37 50.795 88 50.977
Chapman 37 29.894 88 54.045
Chapman 37 29.894 88 54.046
Chapman 37 25.692 88 53.951
Chapman 37 25.691 88 53.949
Chapman 37 25.691 88 53.949
Chapman 37 25.692 88 53.951
Chapman 37 25.692 88 53.951
Chapman 37 25.694 88 53.951
Chapman 38 33.026 87 06.327
Crockett 36 22.801 86 45.985
Crockett 36 22.804 86 45.984
Davis 37 44.682 88 55.994
Davis 37 44.683 88 55.993
Davis 36 14.260 86 43.129
Dolch 38 44.563 82 58.988
Dolch 38 44.584 82 58.987
Dolch 38 44.564 82 58.987
Doley 38 44.615 82 58.882
Doley 38 44.615 82 58.882
Doley 38 44.615 82 58.882
Doley 38 44.615 82 58.882
NA 38 44.618 82 58.884
NA 38 44.618 82 58.884
NA 38 44.618 82 58.885
NA 38 44.618 82 58.885
NA 38 44.618 82 58.886
Doley 38 44.615 82 58.882
Doley 38 44.610 82 58.923
Doley 38 44.611 82 58.922
Doley 38 44.611 82 59.012
Doley 38 44.611 82 59.013
Doley 37 49.907 88 35.306
Doley 37 49.907 88 35.306
Doley 37 49.907 88 35.306
Doley 37 58.810 88 55.084
Doley 37 58.810 88 55.084
Doley 37 58.751 88 55.161
Doley 37 58.751 88 55.161
Dorris 36o28.798’ 86o46.011’
Dorris 36 28.811 86 46.008
Dorris 36 28.812 86 46.008
Dorris 36 28.812 86 46.008
Dorris 36 28.813 86 46.007
Dorris NA NA
Dorris NA NA
Dorris 36 26.485 86 48.329
Dorris 36 26.484 86 48.329
Dorris 38 07.067 88 51.870
Dorris 38 07.067 88 51.870
Dorris 38 07.067 88 51.870
Dorris 38 07.081 88 51.903
Dorris 38 07.081 88 51.903
Dorris 37 54.310 88 58.084
Dorris 37 54.309 88 58.083
Dorris 37 54.309 88 58.083
Dorris 37 54.310 88 58.084
Dorris 37 54.310 88 58.084
Dorris 37 58.746 88 55.204
Dorris 37 58.749 88 55.205
Dorris 37 47.990 88 53.488
Dorris 37 47.988 88 53.489
Dorris 37 51.571 88 54.939
Dorris 37 51.571 88 54.939
Dorris 37 50.787 88 50.972
Dorris 37 50.788 88 50.971
Dorris 37 50.794 88 50.975
Dorris 37 50.794 88 50.975
Dorris 37 50.786 88 50.974
Dorris 37 50.771 88 50.986
Dorris 37 50.783 88 50.983
Dorris 37 50.775 88 50.986
Dorris 37 50.775 88 50.986
Dorris 37 50.775 88 50.980
Dorris 37 50.775 88 50.980
Dorris 524 783
Dorris 528 783
Drake 36 35.870 86 43.184
Dreisbach 40o 44.177′ 75 29.596′
Dreisbach 40o 44.177′ 75 29.593′
Everett 38 33.026 87 06.327
Farris 37 24.687 88 50.538
Farris 37 24.678 88 50.538
Finch 44 34.662′ 37 27.129′
Follis 37 51.764′ 88 56.897
Follis 37 51.758′ 88 56.894
Follis 37 51.761′ 88 56.893
Follis 37 51.761′ 88 56.896
Follis 37 51.759′ 88 56.895
Follis 37 51. 88 56
Follis 37 51.758′ 88 56.896
Follis 37 51.758′ 88 56.901′
Follis 37 51.758′ 88 56.901′
Follis 37 51.758′ 88 56.904
Ford 37 52.851 88 39.161
Fox 37 48.023 88 53.449
Frost 37 17.909 87 28.852
NA 37 17.910 87 28.852
Frost 37 17.909 87 28.854
Fuqua 36 38.189 86 51.516
Gregory 38 44.609 82 58.922′
Gregory 38 44.611′ 82 58.922′
Hart 37 51.757′ 88 56.900
Hess 37 25.687 88 53.947
Hess 37 25.687 88 53.949
Hess 37 25.688 88 53.952
Hess 37 25.688 88 53.952
Hess 37 25.688 88 53.952
Hess 37 25.687 88 53.947
Hess 37 25.688 88 53.952
Hess 37 25.687 88 53.948
Hess 37 25.689 88 53.952
Hess 37 25.689 88 53.952
Hess 37 25.693 88 53.949
Hess 37 25.693 88 53.947
Hess 37 25.693 88 53.949
Hess 37 25.690 88 53.951
Holt NA NA
Holt NA NA
Horlacher 40o 30.928′ 75o 25.072′
Horlacher 40o30.930’ 75o 25.070’
Horrall 37 54.090 88 54.218
Horrall 38 33.026 87 06.326
Horrall 38 36.963 87 11.369
Hurt 36 28.804 86 46.007
Jacobs 38 21.315′ 85 41.307′
Jacobs 38 21.317′ 85 41.306′
Johnson 37 52.872 88 39.183
Johnson 37 52.872 88 39.183
Jones 37 47.994 88 53.504
Jones 37 47.994 88 53.504
Jones 37 47.997 88 53.483
Jones 37 47.995 88 53.483
Jones 37 47.995 88 53.483
Jones 37 48.024 88 53.451
Jones 37 48.024 88 53.451
Jones 37 48.020 88 53.465
Jones 37 48.020 88 53.465
Jones 37 51.747 88 52.933
Karnes 37 58.749 88 55.161
Karnes 37 58.749 88 55.161
Keith NA NA
Keth 35 09.410 86 03.624
Kleppinger 40o 44.178′ 75 29.601′
Lipsey 38 33.917 89 07.571
Lockwood NA NA
Lockwood NA NA
Loomis 37 36.925 89 12.220
Mensch 40o 39.557′ 75 25.586′
Merrell 35 43.945 80 18.669
Merrell 35 43.942 80 18.671
Meredith 39O 41.114’ 76O 35.858′
Meredith 39O 41.115’ 76O 35.855′
Meredith 39O 41.116’ 76O 35.855′
Meredith 39O 41.116’ 76O 35.854′
Meredith 39O 41.117’ 76O 35.853′
Bell 39O 41.117’ 76O 35.853′
John 39O 41.117’ 76O 35.853′
Meredith 39O 41.118’ 76O 35.852′
Meredith 39O 41.112’ 76O 35.857′
Meredith 39O 41.112’ 76O 35.857′
Meredith 39O 41.112’ 76O 35.856′
Meredith 39O 41.112’ 76O 35.856′
Meredith 39O 41.112’ 76O 35.855′
Meredith 39O 41.112’ 76O 35.854′
Meredith 39O 41.113’ 76O 35.855′
Meredith 39O 41.113’ 76O 35.855′
Tipton 39O 41.114’ 76O 35.855′
Meredith 39O 41.114’ 76O 35.854′
Meredith 39O 41.114’ 76O 35.854′
Mildenberger 40o 44.194’ 75O 29.608
Mildenberger 40o 44.179 75 29.574
Miller 37 48.023 88 53.449
Minnich 40o 40.757’ 75O 31.679′
Minnich 40o 40.759’ 75O 31.679′
Mory 40o 33.585′ 75 23.776′
Mory 40o 33.586′ 75 23.776′
Mory 40o 33.586′ 75 23.774′
Mory 40o 33.585′ 75 23.776′
Nagel 40o 33.585′ 75 23.745′
Nagel 40o 44.191′ 75 29.603′
Nagel 41 13.033′ 75 57.329′
Nagel 40o39.575′ 75o25.555′
Nagel 40o39.577′ 75o25.549′
Nagel 40o 44.197′ 75O 29.605’
Nagel 41 13.031′ 75 57.333′
Nagel 40o 39.575′ 75 25.555′
Nagle 38 44.582′ 82 58.978′
Nagle 38 44.582′ 82 58.978′
Nagel 38 44.582′ 82 58.978′
Nagel 38 44.582′ 82 58.978′
Nagel 38 44.582′ 82 58.978′
Nagel 38 44.582′ 82 58.978′
NA NA NA
Nutty 37 25.674 88 54.020
Nutty 37 25.682 88 54.020
Nutty 37 25.678 88 54.020
Ritter 37 52.861 88 39/178
Ritter 37 52.861 88 39/178
Odom 37 58.794 88 55.324
Odom 37 58.795 88 55.326
Odom 37 47.993 88 53.510
Odom 37 47.994 88 53.510
NA 37 47.992 88 53.506
Odum NA NA
Odum 37 47.187 88 50.175
Odum 37 47.187 88 50.175
Peters 37 47.244 88 55.354
Peters 37 47.244 88 55.354
Pickard 38 04.918 88 52.028
Pickard 38 04.919 88 52.028
Pickard 38 04.917 88 52.028
Pletz 37 44.684 88 55.998
Russell 37 44.683 88 55.998
Pickard 38 04.918 88 54.028
Pickard 38 04.919 88 54.028
Pickard 38 04.917 88 54.028
Pulliam 37 25.697 88 53.922
Pulliam 37 25.697 88 53.922
Rex 37 45.776 88 55.111
Rex 37 45.776 88 55.111
Rex 37 45.777 88 55.110
Rex 37 45.777 88 55.115
Rex 37 45.776 88 55.112
Rex 37 45.774 88 55.109
Rex 37 45.776 88 55.108
Rex 37 45.776 88 55.108
Rex 37 45.776 88 55.108
Rex 32 22 549 90 52.100
Rex 37 44.784 88 55.855
Rex 37 44.785 88 55.855
Richardson 37 44.766 88 55.776
Richardson 37 44.787 88 55.775
Riegel 37 49.828 88 35.346
Riegel 37 49.828 88 35.346
Ritter 37 52.853 88 39.174
Ritter 37 52.853 88 39.174
Rockel 40o 39.556′ 75 25.585′
Rockel 40o 39.555′ 75 25.585′
Rockel 40o 39.560′ 75 25.560′
Rockel 40o 39.560′ 75 25.559′
Ross 37 58.752 88 58.162
Ross 37 58.752 88 58.162
Ruckel NA NA
Ruckel NA NA
Russell 37 44.681 88 55.998
Russell 37 44.682 88 55.998
NA 37 44.682 88 55.994
NA 37 44.682 88 55.994
NA 37 44.682 88 55.994
Siliven 37 28.189 88 48.007
Sinks 36 14.451′ 86 43.526′
Sinks 37 54.081′ 88 54.293′
Sinks 37 54.081′ 88 54.293′
Sinks 37 54.089′ 88 54.207′
Sinks 37 52.619 88 55.430
Sinks 37 52.619 88 55.430
Sinks 37 52.619 88 55.430
Sinks 37 47.989 88 53.489
Sinks 37 47.989 88 53.489
Sinks 37 47.986 88 53.489
Sinks 37 47.985 88 53.491
Sinks 37 47.984 88 53.490
Sinks 37 47.984 88 53.490
Sinks 37 47.984 88 53.488
Sinks 37 47.982 88 53.491
Sinks 37 48.024 88 53.464
Sinks 37 48.024 88 53.463
Sinks 37 48.020 88 53.463
Sinks 37 44.702 88 55.998
Sweet 37 44.704 88 55.995
Sinks 37 44.702 88 55.997
Sinks 37 44.704 88 55.998
Sinks 37 44.704 88 55.998
Sinks 38 33.836 89 07.580
Sinks 38 33.837 89 07.579
Sinks 38 33.917 89 07.572
Sinks 38 02.272 88 50.161
Sinks 37 44.770 88 55.779
Sinks 37 44.770 88 55.779
Solt 40o 48.686′ 75 37.120
Solt 40o 48.693′ 75 37.119
Solt 40o 48.690′ 75 37.113
Sfafford 37 52.608 88 55.434
Sfafford 37 52.608 88 55.434
Steen 38 33.025 87 06.328
VanCleve 37 25.694 88 53.921
VanCleve 37 25.694 88 53.921
VanCleve 37 33.397 88 46.363
VanCleve 37 33.397 88 46.363
VanCleve 37 33.397 88 46.363
VanCleave 38 04.924 88 52.030
VanCleave 38 04.924 88 52.030
Veach 37 29.916 86 54.044
Veach 37 29.916 86 54.044
Veach 37 29.895 86 54.023
Veach 37 29.895 86 54.022
Veach 37 29.895 86 54.020
Veach 37 25.692 88 53.942
Veach 37 26.692 88 53.942
Veatch 37 26.692 88 50.527
Veatch 37 26.692 88 50.527
Veach 37 26.693 88 50.540
Veach 37 26.692 88 50.531
Veach 37 26.692 88 50.531
Veach 37 29.895′ 88 54.022
Veach 37 29.897′ 88 54.022
Veatch 37 28.187′ 88 49.000′
Veatch 37 28.187′ 88 48.999′
Veatch 37 28.187′ 88 49.001′
Veatch 37 28.186 88 48.005
Veach-Nutty 37 25.682 88 54.017
Veach 37 25.681 88 54.017
Veach 37 25.679 88 54.017
Veach 37 25.682 88 54.017
Ware 37 52.856 88 39.186
Ware 37 52.856 88 39.186
Ware 37 52.867 88 39.176
Ware 37 52.867 88 39.176
Webber 37 49.829 88 35.336
Webber 37 49.826 88 35.336
Weir 36 15.064 86 11.669
Weir 36 15.064 86 11.669
Wier 37 49.208 88 46.787
Whiteside 37 26.743 88 50.534
Whiteside 37 26.743 88 50.534
Willis 36 35.889 86 43.203
Wilson 36 26.350 86 47.072
Wilson 36 26.361 86 47.070
Wilson 36 29.553 86 46.791
Wilson 36 28.812 86 46.023
Wilson 36 28.803 86 46.007
Wilson NA NA
Wilson 37 48.034 88 53.443
Wilson 37 48.034 88 53.443
Wilson 36 26.351 86 47.070
Wilson 36 26.351 86 47.070
Wilson 36 26.351 86 47.070
Wilson 36 26.350 86 47.073
Wilson 36 26.350 86 47.073
Wilson 36 26.351 86 47.071
Wilson NA NA
Wilson NA NA
Wilson NA NA
Wilson NA NA
Wise 37 50.352 88 31.612
Wollard 37 54.076′ 88 54.322′
Woolard 37 54.075′ 88 54.322′
Woolard 37 58.721 88 55.211
Woolard 37 58.721 88 55.211
Woolard 37 58.721 88 55.211
Woolard 37 58.723 88 55.212
Woolard 37 58.721 88 55.213
Woolard 37 58.720 88 55.213
Woolard 37 58.720 88 55.213
Woolard 37 51.394 88 41.745
Woolard 37 51.395 88 41.746
Woolard 37 51.396 88 41.747
Woolard 37 51.391 88 41.742
Woolard 37 51.397 88 41.741
Woolard 37 52.853 88 39.160
Woolard 37 52.853 88 39.161
Woolard 37 52.853 88 39.160
Woolard 37 52.853 88 39.159
Woolard 37 52.854 88 39.160
Woolard 37 51.742 88 52.935
Woolard 37 51.742 88 52.935

The latitude and longitude data contain some stray degree and minute symbols. The minute symbol appears both as a straight and as a curved apostrophe, and the degree symbol appears both as o and O. This cleaning needs to be done on both the N and W columns. The str_replace_all() function from stringr looks at a string, finds every occurrence of a pattern, and replaces it with a replacement. Here, the pattern is each of those symbols and the replacement is a space.
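
A quick standalone illustration of `str_replace_all()` on a plain character vector may help. The sample strings imitate values from the N column; the final `str_squish()` call is my own addition to tidy up the doubled and trailing spaces that the replacements leave behind.

```r
library(stringr)

# Sample values in the style of the N column
n_raw <- c("36o56.472", "40O 40.760'", "37 53.396")

# One call per stray symbol, each replaced by a space
n_step <- str_replace_all(n_raw, pattern = "o", replacement = " ")
n_step <- str_replace_all(n_step, pattern = "O", replacement = " ")
n_step <- str_replace_all(n_step, pattern = "'", replacement = " ")

# str_squish() trims the ends and collapses repeated spaces
n_clean <- str_squish(n_step)

n_clean
#> [1] "36 56.472" "40 40.760" "37 53.396"
```

Replacing with a space instead of an empty string matters for values like `36o56.472`, where the symbol is the only thing separating degrees from minutes.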

< section id="styling-tables-with-gt" class="level3">

Styling Tables with gt

I’m using the gt package to format my tables. Here I’m not doing much styling, but it is super easy to make really nice tables with just a few lines of code.

I write and code in RStudio using Quarto. This allows you to alternate text and code chunks. You can run all the code chunks normally in RStudio, or you can “render” the Quarto document, which runs all the code chunks and produces the HTML page that I publish on my website. When just running the code chunks, I get a table with scroll bars, but when rendering the webpage, I get a multi-page table that displays everything. This is fixed by specifying the size of the container for the table. With the container, the table is truncated to a few rows and a scroll bar appears. The container.padding option just makes sure the data isn’t truncated in the middle of a row.

< section id="cleaning-up-typos-in-the-gps-data-strings" class="level3">

Cleaning up Typos in the GPS Data (strings)

I put all my cleaned data in a new dataframe. If something unexpected happens, I can check against the original data without having to reload it. I tend to use a separate mutate for each operation. I know it could all be in one mutate, but even when being careful about indents, I end up missing commas and parentheses as I add and remove steps. Individual mutates make visually checking for syntax errors much easier for me.

tombstones <- tombstones_raw %>%
  mutate(N = str_replace_all(N, pattern = "’", " ")) %>%
  mutate(N = str_replace_all(N, pattern = "O", " ")) %>%
  mutate(N = str_replace_all(N, pattern = "o", " ")) %>%
  mutate(N = str_replace_all(N, pattern = "'", " ")) %>%
  mutate(W = str_replace_all(W, pattern = "’", " ")) %>%
  mutate(W = str_replace_all(W, pattern = "O", " ")) %>%
  mutate(W = str_replace_all(W, pattern = "o", " ")) %>%
  mutate(W = str_replace_all(W, pattern = "'", " "))
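
For comparison, the same replacements can be written more compactly with a regular-expression character class and dplyr’s `across()`. This is an alternative sketch, not the code used in this post. The small `demo` tibble stands in for tombstones_raw, and the prime character (U+2032) is included in the class only because it shows up in the table as displayed here; whether it exists in the raw spreadsheet is an assumption.

```r
library(dplyr)
library(stringr)

# Stand-in for tombstones_raw with a few of the problem values.
# \u2019 is the curved apostrophe, \u2032 is the prime symbol.
demo <- tibble(
  Surname = c("Brown", "Dorris"),
  N = c("40O 40.760\u2019", "36o28.798\u2019"),
  W = c("75O 31.705\u2032", "86o46.011\u2019")
)

demo_clean <- demo %>%
  # the character class matches any single stray degree or minute symbol
  mutate(across(c(N, W), ~ str_replace_all(.x, "[oO'\u2019\u2032]", " ")))

demo_clean
```

The trade-off is the one described above: a single call is shorter, but one mutate per symbol is easier to scan for a missing comma or parenthesis and easier to switch off one step at a time.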

Look at the cleaned data.

tombstones %>%
  select(Surname, First.Name, N, W) %>%
  gt()  %>%
  tab_options(container.height = px(300), container.padding.y = px(24))
Surname First.Name N W
Anderson Abraham 36 56.472 86 86.961
Anderson Elizabeth 36 56.472 86 86.961
Anderson Zady 37 53.396 88 41.321
Anderson Albert 37 52.856 88 39.163
Anderson Adesia 37 52.856 88 39.163
Anderson May 37 52.855 88 39.163
Anderson E 37 52.853 88 39.164
Anderson William 37 52.853 88 39.167
Anderson Nancy 37 52.852 88 39.165
Appleton Richard 36 29.552 86 46.793
Baldwin John 38 33.025 87 06.328
Baldwin William 38 33.025 87 06.328
Baggett Mahalia 36 29.553 86 46.793
Beasley E 36 35.891 86 43.204
Beasley Josephine 36 36.755 86 43.145
Beasley Fanning 36 36.755 86 43.145
Bell John 36 15.064 86 11.669
Bell Mary 36 15.064 86 11.669
Brazelton Wm 35 09.411 86 03.624
Brazelton Esther 35 09.410 86 03.624
Brown Elizabeth 40 40.760 75 31.705
Brown Joel 40 40.760 75 31.705
Bundy Hope 37 45.623
Bundy Clem 37 53.380 88 44.474
Bundy Nancy 37 53.380 88 44.474
Bundy W 37 53.380 88 44.474
Bundy Charles 37 53.379 88 44.474
Bundy Thomas 37 52.875 88 39.118
Bundy Octavia 37 52.875 88 39.118
Bundy George 37 52.873 88 39.188
Bundy Lora 37 52.873 88 39.188
Burgess W 37 49.224 88 54.527
Burgess Alzada 37 49.224 88 54.527
Clayton G 37 50.788 88 50.968
Clayton Ellen 37 50.788 88 50.968
Clayton L 37 50.795 88 50.977
Clayton Mary 37 50.795 88 50.977
Chapman Daniel 37 29.894 88 54.045
Chapman Elizabeth 37 29.894 88 54.046
Chapman Caroline 37 25.692 88 53.951
Chapman Daniel 37 25.691 88 53.949
Chapman Lucretia 37 25.691 88 53.949
Chapman Samuel 37 25.692 88 53.951
Chapman Elizabeth 37 25.692 88 53.951
Chapman Laura 37 25.694 88 53.951
Chapman Polly 38 33.026 87 06.327
Crockett Mandy 36 22.801 86 45.985
Crockett John 36 22.804 86 45.984
Davis Ezra 37 44.682 88 55.994
Davis Lizzie 37 44.683 88 55.993
Davis Fred 36 14.260 86 43.129
Dolch Catherine 38 44.563 82 58.988
Dolch Christian 38 44.584 82 58.987
Dolch Peter 38 44.564 82 58.987
Doley George 38 44.615 82 58.882
Doley Katie 38 44.615 82 58.882
Doley Mary E 38 44.615 82 58.882
Doley Henriettie 38 44.615 82 58.882
NA MED 38 44.618 82 58.884
NA HD 38 44.618 82 58.884
NA GD 38 44.618 82 58.885
NA Mother 38 44.618 82 58.885
NA Father 38 44.618 82 58.886
Doley George 38 44.615 82 58.882
Doley James 38 44.610 82 58.923
Doley May 38 44.611 82 58.922
Doley John 38 44.611 82 59.012
Doley Maggie 38 44.611 82 59.013
Doley William 37 49.907 88 35.306
Doley Dora 37 49.907 88 35.306
Doley L[eaman] 37 49.907 88 35.306
Doley G[uilford] 37 58.810 88 55.084
Doley D[ora] 37 58.810 88 55.084
Doley Eugene 37 58.751 88 55.161
Doley Lou 37 58.751 88 55.161
Dorris J[oseph] 36 28.798 86 46.011
Dorris Joseph 36 28.811 86 46.008
Dorris Sarah 36 28.812 86 46.008
Dorris W 36 28.812 86 46.008
Dorris A 36 28.813 86 46.007
Dorris J NA NA
Dorris Elizabeth NA NA
Dorris Robert 36 26.485 86 48.329
Dorris Rebecca 36 26.484 86 48.329
Dorris Monroe 38 07.067 88 51.870
Dorris Della 38 07.067 88 51.870
Dorris Mary M 38 07.067 88 51.870
Dorris Harve 38 07.081 88 51.903
Dorris Carrie 38 07.081 88 51.903
Dorris Smith 37 54.310 88 58.084
Dorris Ada 37 54.309 88 58.083
Dorris William 37 54.309 88 58.083
Dorris Harvey 37 54.310 88 58.084
Dorris Cora 37 54.310 88 58.084
Dorris John 37 58.746 88 55.204
Dorris W 37 58.749 88 55.205
Dorris Gustavus 37 47.990 88 53.488
Dorris Sarah 37 47.988 88 53.489
Dorris Joseph 37 51.571 88 54.939
Dorris Della 37 51.571 88 54.939
Dorris William 37 50.787 88 50.972
Dorris Harriet 37 50.788 88 50.971
Dorris William 37 50.794 88 50.975
Dorris Mary 37 50.794 88 50.975
Dorris James 37 50.786 88 50.974
Dorris Sarah 37 50.771 88 50.986
Dorris W[illiam] 37 50.783 88 50.983
Dorris E[lisha] 37 50.775 88 50.986
Dorris Sarah 37 50.775 88 50.986
Dorris James 37 50.775 88 50.980
Dorris Georgia 37 50.775 88 50.980
Dorris William 524 783
Dorris Malinda 528 783
Drake Mary 36 35.870 86 43.184
Dreisbach Catherina 40 44.177 75 29.596
Dreisbach Johannes 40 44.177 75 29.593
Everett Semantha 38 33.026 87 06.327
Farris Elizabeth 37 24.687 88 50.538
Farris Elizabeth 37 24.678 88 50.538
Finch Isaac 44 34.662 37 27.129
Follis Fawn 37 51.764 88 56.897
Follis Ralph 37 51.758 88 56.894
Follis A 37 51.761 88 56.893
Follis Christian 37 51.761 88 56.896
Follis G 37 51.759 88 56.895
Follis Ralph 37 51. 88 56
Follis E 37 51.758 88 56.896
Follis William 37 51.758 88 56.901
Follis Martha 37 51.758 88 56.901
Follis Jeff 37 51.758 88 56.904
Ford Florence 37 52.851 88 39.161
Fox Frances 37 48.023 88 53.449
Frost Ebenezer 37 17.909 87 28.852
NA NA 37 17.910 87 28.852
Frost NA 37 17.909 87 28.854
Fuqua William 36 38.189 86 51.516
Gregory Leonard 38 44.609 82 58.922
Gregory Lucille 38 44.611 82 58.922
Hart Parmelia 37 51.757 88 56.900
Hess Amalphus 37 25.687 88 53.947
Hess Adolphus 37 25.687 88 53.949
Hess Samuel 37 25.688 88 53.952
Hess Augusta 37 25.688 88 53.952
Hess Ulysses 37 25.688 88 53.952
Hess Ulysses 37 25.687 88 53.947
Hess William 37 25.688 88 53.952
Hess William 37 25.687 88 53.948
Hess Jerome 37 25.689 88 53.952
Hess Franklin 37 25.689 88 53.952
Hess Samuel 37 25.693 88 53.949
Hess Bernice 37 25.693 88 53.947
Hess Catherine 37 25.693 88 53.949
Hess George 37 25.690 88 53.951
Holt Lucinda NA NA
Holt William NA NA
Horlacher Daniel 40 30.928 75 25.072
Horlacher Margaretha 40 30.930 75 25.070
Horrall Polly 37 54.090 88 54.218
Horrall James 38 33.026 87 06.326
Horrall William 38 36.963 87 11.369
Hurt Elizabeth 36 28.804 86 46.007
Jacobs Jeremiah 38 21.315 85 41.307
Jacobs Rebecca 38 21.317 85 41.306
Johnson James 37 52.872 88 39.183
Johnson Mary 37 52.872 88 39.183
Jones Levi 37 47.994 88 53.504
Jones Hester 37 47.994 88 53.504
Jones Ridley 37 47.997 88 53.483
Jones James 37 47.995 88 53.483
Jones Tina 37 47.995 88 53.483
Jones Ezra 37 48.024 88 53.451
Jones Nannie 37 48.024 88 53.451
Jones Samuel 37 48.020 88 53.465
Jones Melverda 37 48.020 88 53.465
Jones John 37 51.747 88 52.933
Karnes Willard 37 58.749 88 55.161
Karnes Ruth 37 58.749 88 55.161
Keith James NA NA
Keth Nancy 35 09.410 86 03.624
Kleppinger Anna 40 44.178 75 29.601
Lipsey Joe 38 33.917 89 07.571
Lockwood Eugenia NA NA
Lockwood Leland NA NA
Loomis Jon 37 36.925 89 12.220
Mensch Abraham 40 39.557 75 25.586
Merrell Azariah 35 43.945 80 18.669
Merrell Abigail 35 43.942 80 18.671
Meredith Eleandra 39 41.114 76 35.858
Meredith Micajah 39 41.115 76 35.855
Meredith Samuel 39 41.116 76 35.855
Meredith Elizabeth 39 41.116 76 35.854
Meredith Ruth 39 41.117 76 35.853
Bell Sarah 39 41.117 76 35.853
John Bell 39 41.117 76 35.853
Meredith Mary 39 41.118 76 35.852
Meredith Clarence 39 41.112 76 35.857
Meredith Cora 39 41.112 76 35.857
Meredith W 39 41.112 76 35.856
Meredith Susan 39 41.112 76 35.856
Meredith Hannah 39 41.112 76 35.855
Meredith Mary 39 41.112 76 35.854
Meredith Samuel 39 41.113 76 35.855
Meredith Belinda 39 41.113 76 35.855
Tipton Susannah 39 41.114 76 35.855
Meredith Thomas 39 41.114 76 35.854
Meredith Sarah 39 41.114 76 35.854
Mildenberger Anna 40 44.194 75 29.608
Mildenberger Nicolaus 40 44.179 75 29.574
Miller Myrtie 37 48.023 88 53.449
Minnich Elizabeth 40 40.757 75 31.679
Minnich John 40 40.759 75 31.679
Mory Catherina 40 33.585 75 23.776
Mory Gotthard 40 33.586 75 23.776
Mory Magdelena 40 33.586 75 23.774
Mory Peter 40 33.585 75 23.776
Nagel Anna 40 33.585 75 23.745
Nagel Anna 40 44.191 75 29.603
Nagel Caty 41 13.033 75 57.329
Nagel Daniel 40 39.575 75 25.555
Nagel Frederick 40 39.577 75 25.549
Nagel Friedrich 40 44.197 75 29.605
Nagel Johann 41 13.031 75 57.333
Nagel Maria 40 39.575 75 25.555
Nagle John 38 44.582 82 58.978
Nagle Mary 38 44.582 82 58.978
Nagel Henry 38 44.582 82 58.978
Nagel Mary 38 44.582 82 58.978
Nagel Will 38 44.582 82 58.978
Nagel Adeline 38 44.582 82 58.978
NA NA NA NA
Nutty John 37 25.674 88 54.020
Nutty Beatrice 37 25.682 88 54.020
Nutty John 37 25.678 88 54.020
Ritter NA 37 52.861 88 39/178
Ritter NA 37 52.861 88 39/178
Odom Archibald 37 58.794 88 55.324
Odom Cynthia 37 58.795 88 55.326
Odom G 37 47.993 88 53.510
Odom Sarah 37 47.994 88 53.510
NA Thomas 37 47.992 88 53.506
Odum Britton NA NA
Odum Wiley 37 47.187 88 50.175
Odum Sallie A 37 47.187 88 50.175
Peters Daniel 37 47.244 88 55.354
Peters Charlotte 37 47.244 88 55.354
Pickard William 38 04.918 88 52.028
Pickard Harriet 38 04.919 88 52.028
Pickard Louise 38 04.917 88 52.028
Pletz Karl 37 44.684 88 55.998
Russell Caroline 37 44.683 88 55.998
Pickard William 38 04.918 88 54.028
Pickard Harriet 38 04.919 88 54.028
Pickard Louise 38 04.917 88 54.028
Pulliam Frieda 37 25.697 88 53.922
Pulliam Amos 37 25.697 88 53.922
Rex William 37 45.776 88 55.111
Rex Elmina 37 45.776 88 55.111
Rex Mamie 37 45.777 88 55.110
Rex George 37 45.777 88 55.115
Rex Bertie 37 45.776 88 55.112
Rex Lulie 37 45.774 88 55.109
Rex Lily 37 45.776 88 55.108
Rex Arthur 37 45.776 88 55.108
Rex George 37 45.776 88 55.108
Rex Jno 32 22 549 90 52.100
Rex Guy 37 44.784 88 55.855
Rex Harlie 37 44.785 88 55.855
Richardson Annabelle 37 44.766 88 55.776
Richardson Alfred 37 44.787 88 55.775
Riegel Solomon 37 49.828 88 35.346
Riegel Catherine 37 49.828 88 35.346
Ritter J 37 52.853 88 39.174
Ritter Mary 37 52.853 88 39.174
Rockel Balzer 40 39.556 75 25.585
Rockel Elisabetha 40 39.555 75 25.585
Rockel Johannes 40 39.560 75 25.560
Rockel Elizabeth 40 39.560 75 25.559
Ross George 37 58.752 88 58.162
Ross Euna 37 58.752 88 58.162
Ruckel Mary NA NA
Ruckel Melchir NA NA
Russell James 37 44.681 88 55.998
Russell Ana 37 44.682 88 55.998
NA NA 37 44.682 88 55.994
NA NA 37 44.682 88 55.994
NA NA 37 44.682 88 55.994
Siliven Jenniel 37 28.189 88 48.007
Sinks A 36 14.451 86 43.526
Sinks Francis 37 54.081 88 54.293
Sinks Delphia 37 54.081 88 54.293
Sinks Salem 37 54.089 88 54.207
Sinks Daniel 37 52.619 88 55.430
Sinks Martha 37 52.619 88 55.430
Sinks Roy 37 52.619 88 55.430
Sinks Elizabeth 37 47.989 88 53.489
Sinks infant son 37 47.989 88 53.489
Sinks John 37 47.986 88 53.489
Sinks Mary 37 47.985 88 53.491
Sinks William 37 47.984 88 53.490
Sinks Charlotte 37 47.984 88 53.490
Sinks Anna 37 47.984 88 53.488
Sinks Leonard 37 47.982 88 53.491
Sinks Etta Faye 37 48.024 88 53.464
Sinks John 37 48.024 88 53.463
Sinks Sena 37 48.020 88 53.463
Sinks William 37 44.702 88 55.998
Sweet Jewell 37 44.704 88 55.995
Sinks Francis 37 44.702 88 55.997
Sinks Arlie 37 44.704 88 55.998
Sinks Viola 37 44.704 88 55.998
Sinks Leonard 38 33.836 89 07.580
Sinks Mae 38 33.837 89 07.579
Sinks Bessie 38 33.917 89 07.572
Sinks Caroline 38 02.272 88 50.161
Sinks Arlie 37 44.770 88 55.779
Sinks Eva 37 44.770 88 55.779
Solt Conrad 40 48.686 75 37.120
Solt Conrad 40 48.693 75 37.119
Solt Maria 40 48.690 75 37.113
Sfafford Trice 37 52.608 88 55.434
Sfafford Phebe 37 52.608 88 55.434
Steen Richard 38 33.025 87 06.328
VanCleve Martin 37 25.694 88 53.921
VanCleve Florence 37 25.694 88 53.921
VanCleve W 37 33.397 88 46.363
VanCleve Nancy 37 33.397 88 46.363
VanCleve J 37 33.397 88 46.363
VanCleave W 38 04.924 88 52.030
VanCleave Elizabeth 38 04.924 88 52.030
Veach Pleasant 37 29.916 86 54.044
Veach Victoria 37 29.916 86 54.044
Veach Ward 37 29.895 86 54.023
Veach Cynthia 37 29.895 86 54.022
Veach James 37 29.895 86 54.020
Veach James 37 25.692 88 53.942
Veach Nannie 37 26.692 88 53.942
Veatch John 37 26.692 88 50.527
Veatch Eleanor 37 26.692 88 50.527
Veach William 37 26.693 88 50.540
Veach James 37 26.692 88 50.531
Veach Rachel 37 26.692 88 50.531
Veach Pleasant 37 29.895 88 54.022
Veach Mary 37 29.897 88 54.022
Veatch Parmelia 37 28.187 88 49.000
Veatch Mary 37 28.187 88 48.999
Veatch Elnor 37 28.187 88 49.001
Veatch Frelin 37 28.186 88 48.005
Veach-Nutty NA 37 25.682 88 54.017
Veach John 37 25.681 88 54.017
Veach Rose 37 25.679 88 54.017
Veach Ruth 37 25.682 88 54.017
Ware Turner 37 52.856 88 39.186
Ware Martha 37 52.856 88 39.186
Ware Joseph 37 52.867 88 39.176
Ware Caroline 37 52.867 88 39.176
Webber Dick 37 49.829 88 35.336
Webber Pearl 37 49.826 88 35.336
Weir James 36 15.064 86 11.669
Weir Mary 36 15.064 86 11.669
Wier Leticia 37 49.208 88 46.787
Whiteside Lucinda 37 26.743 88 50.534
Whiteside John 37 26.743 88 50.534
Willis Matha 36 35.889 86 43.203
Wilson Jessie 36 26.350 86 47.072
Wilson Mary 36 26.361 86 47.070
Wilson Joseph 36 29.553 86 46.791
Wilson Elisha 36 28.812 86 46.023
Wilson Sallie 36 28.803 86 46.007
Wilson Lutetita NA NA
Wilson Thomas 37 48.034 88 53.443
Wilson Sarah 37 48.034 88 53.443
Wilson Elisha 36 26.351 86 47.070
Wilson Martha 36 26.351 86 47.070
Wilson Charles 36 26.351 86 47.070
Wilson Zack 36 26.350 86 47.073
Wilson Juritha 36 26.350 86 47.073
Wilson Elisha 36 26.351 86 47.071
Wilson Drury NA NA
Wilson Mary NA NA
Wilson Sandifer NA NA
Wilson Nancy NA NA
Wise Luvena 37 50.352 88 31.612
Wollard John 37 54.076 88 54.322
Woolard Nettie 37 54.075 88 54.322
Woolard Millie 37 58.721 88 55.211
Woolard Lawrence 37 58.721 88 55.211
Woolard Etta 37 58.721 88 55.211
Woolard John 37 58.723 88 55.212
Woolard James 37 58.721 88 55.213
Woolard C 37 58.720 88 55.213
Woolard Blanche 37 58.720 88 55.213
Woolard L 37 51.394 88 41.745
Woolard Ama 37 51.395 88 41.746
Woolard Robert 37 51.396 88 41.747
Woolard James 37 51.391 88 41.742
Woolard Romey 37 51.397 88 41.741
Woolard Anna 37 52.853 88 39.160
Woolard James 37 52.853 88 39.161
Woolard Francis 37 52.853 88 39.160
Woolard Turner 37 52.853 88 39.159
Woolard William 37 52.854 88 39.160
Woolard George 37 51.742 88 52.935
Woolard Nancy 37 51.742 88 52.935

Much better. There is some missing data, encoded both as blanks and as NAs. There are also some coordinates that don’t make sense, like 524 (for the entry Dorris William). This will need to be dealt with.
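
Entries like these can also be found systematically rather than by eye, by testing each coordinate against the layout the rest of the column follows. The sketch below is illustrative and not part of the project code: the vector `N` holds a few values copied from the table plus a blank and an NA, and the pattern (one to three digits, a space, two digits, a period, three digits) is an assumption about what a well-formed entry looks like.

```r
library(stringr)

# A few coordinates from the table, plus a blank and a missing value
N <- c("37 25.687", "524", "32 22 549", "37 51.", "", NA)

# Assumed layout of a well-formed entry: "DD MM.mmm"
expected <- "^\\d{1,3} \\d{2}\\.\\d{3}$"
well_formed <- str_detect(N, expected)

# Keep only entries that are present but do not match the layout
suspect <- N[!is.na(N) & N != "" & !well_formed]
suspect
```

Blanks and NAs are left out of `suspect` on purpose, since they are missing data rather than typos.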

< section id="converting-to-decimal-coordinates-numeric" class="level3">

Converting to Decimal Coordinates (Numeric)

Next, I’m converting the N and W data to decimal latitude and longitude. S/W should be “-” and N/E should be “+”. I split the degree/minute/second data into parts and then do the conversion. I delete the intermediate components when done. I used str_split_fixed() here, which stores the parts in a matrix in your dataframe, hence the indexing to access the parts. The related function str_split() returns a list. Both functions take the string and a pattern; str_split_fixed() also requires the number of parts (n) to split into. If it doesn’t find that many parts, it will store a blank ("") rather than fail. More info about the str_split family can be found here. (A function like separate() would be more straightforward for this application. I originally included another example here using separate(), so both methods were illustrated, but I have moved that to a module of this project that isn’t posted yet.)
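
For comparison, here is a minimal sketch of the separate() route mentioned above. It is not the code from the unposted module: the toy data frame, the new column names, and the choice of `sep = "[ .]"` (split on a space or a period) are all assumptions made for illustration.

```r
library(tidyr)

# Toy data frame with two coordinates in the same layout as the sheet
coords <- data.frame(N = c("37 25.687", "38 33.025"))

parts <- separate(
  coords, N,
  into = c("N_degree", "N_minute", "N_second"),
  sep = "[ .]",      # regex character class: a space or a literal period
  remove = FALSE,    # keep the original N column for checking
  convert = TRUE     # turn the new character columns into numbers
)
parts
```

Inside a character class the period is literal, so no escaping is needed here. As with the as.numeric() approach, `convert = TRUE` turns "025" into 25.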

I want to break a coordinate into 3 parts, so 37 25.687 becomes 37, 25, and 687. First I break the coordinate into two parts, using the space as the separator, which gives 37 and 25.687. I then coerce the first part (the degree part of the coordinate) into a numeric. Next I split the second part (25.687) using the . as the separator and again coerce the results into numbers. The coercion does lead to a warning about the generation of NAs during the process, but that is fine. I know not all the data is numeric; there were blanks and NAs to start with. Lastly, I convert my degree, minute, second coordinates to decimal coordinates using the formula degree + minute/60 + second/3600.
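
The steps above can be sketched on a small vector before running them on the whole dataframe. The vector `coords` is made up for illustration (two values in the sheet's layout plus the malformed 524), and the formula is the one stated in this post.

```r
library(stringr)

coords <- c("37 25.687", "38 33.025", "524")

# Step 1: split on the space into the degree and the remainder
part1 <- str_split_fixed(coords, pattern = " ", n = 2)
degree <- as.numeric(part1[, 1])

# Step 2: split the remainder on a literal period
part2 <- str_split_fixed(part1[, 2], pattern = "\\.", n = 2)
minute <- as.numeric(part2[, 1])
second <- as.numeric(part2[, 2])   # "025" is read as 25

# Step 3: combine using the formula from the text
lat <- degree + minute / 60 + second / 3600
lat
```

The malformed entry has nothing after the degree, so its minute and second parts are blank and its `lat` comes out as NA.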

< section id="escaping-characters-in-stringr" class="level4">

Escaping Characters in stringr

It is important to note that stringr interprets patterns as regular expressions (regex) by default. This means some characters are special and must be escaped in the pattern. The period is one such character, and the correct pattern is “\\.”. An unescaped “.” matches any character. The stringr cheat sheet has a high-level overview of regular expressions on the second page.
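
A quick way to see the difference is to split the same string with each pattern. This is a small sketch on a made-up value; `fixed()` is stringr's own wrapper for treating a pattern as a literal string, shown here as an alternative to escaping.

```r
library(stringr)

x <- "25.687"

# Unescaped: "." is a regex wildcard, so the split happens at the
# first character rather than at the period
unescaped <- str_split_fixed(x, pattern = ".", n = 2)

# Escaped: the split happens at the literal period
escaped <- str_split_fixed(x, pattern = "\\.", n = 2)

# fixed() treats the pattern as plain text, so no escaping is needed
literal <- str_split_fixed(x, pattern = fixed("."), n = 2)

escaped
```

Both `escaped` and `literal` give the two pieces "25" and "687"; `unescaped` does not.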

< section id="using-selectors-from-dplyr" class="level4">

Using Selectors from dplyr

I named all the original output from the string splits so that the names contain the word “part”, which lets me remove them easily using a selection helper from dplyr, in this case contains(). I highly recommend using some sort of naming scheme for intermediate variables/fields so they can be removed in one go without lots of typing. I retain the original and the numeric parts so I can double-check the results.
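
As a small illustration of the naming scheme, here is a sketch on a one-row toy data frame; the values and the exact set of columns are made up, and only the “part” naming convention comes from the text.

```r
library(dplyr)

toy <- data.frame(
  N        = "37 25.687",
  part1N   = "37",
  part2N   = "25.687",
  N_degree = 37
)

# Drop every column whose name contains "part"
cleaned <- toy %>%
  select(-contains("part"))

names(cleaned)
```

One thing to keep in mind with this approach: contains() matches anywhere in the name, so the marker word should not appear in any column you want to keep.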

#converting to decimal latitude
tombstones <- tombstones %>%
  mutate(part1N = str_split_fixed(N, pattern = " ", n = 2) ) %>%
  mutate(N_degree = as.numeric(part1N[,1])) %>%
  mutate(part2N = str_split_fixed(part1N[,2], pattern = '\\.', n = 2)) %>%
  mutate(N_minute = as.numeric(part2N[,1])) %>%
  mutate(N_second = as.numeric(part2N[,2])) %>%
  mutate(lat = N_degree + N_minute/60 + N_second/3600)
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `N_minute = as.numeric(part2N[, 1])`.
Caused by warning:
! NAs introduced by coercion
#converting to decimal longitude  
tombstones <- tombstones %>%
  mutate(part1W = str_split_fixed(W, pattern = " ", n = 2) ) %>%
  mutate(W_degree = as.numeric(part1W[,1])) %>%
  mutate(part2W = str_split_fixed(part1W[,2], pattern = '\\.', n = 2)) %>%
  mutate(W_minute = as.numeric(part2W[,1])) %>%
  mutate(W_second = as.numeric(part2W[,2])) %>%
  mutate(long = -(W_degree + W_minute/60 + W_second/3600)) 
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `W_minute = as.numeric(part2W[, 1])`.
Caused by warning:
! NAs introduced by coercion
tombstones <- tombstones %>%
  select(-contains("part"))

Taking a quick look at the results

tombstones %>%
  select(Surname, First.Name, N, N_degree, N_minute, N_second, lat) %>%
  gt()  %>%
  tab_options(container.height = px(300), container.padding.y = px(24))
Surname First.Name N N_degree N_minute N_second lat
Anderson Abraham 36 56.472 36 56 472 37.06444
Anderson Elizabeth 36 56.472 36 56 472 37.06444
Anderson Zady 37 53.396 37 53 396 37.99333
Anderson Albert 37 52.856 37 52 856 38.10444
Anderson Adesia 37 52.856 37 52 856 38.10444
Anderson May 37 52.855 37 52 855 38.10417
Anderson E 37 52.853 37 52 853 38.10361
Anderson William 37 52.853 37 52 853 38.10361
Anderson Nancy 37 52.852 37 52 852 38.10333
Appleton Richard 36 29.552 36 29 552 36.63667
Baldwin John 38 33.025 38 33 25 38.55694
Baldwin William 38 33.025 38 33 25 38.55694
Baggett Mahalia 36 29.553 36 29 553 36.63694
Beasley E 36 35.891 36 35 891 36.83083
Beasley Josephine 36 36.755 36 36 755 36.80972
Beasley Fanning 36 36.755 36 36 755 36.80972
Bell John 36 15.064 36 15 64 36.26778
Bell Mary 36 15.064 36 15 64 36.26778
Brazelton Wm 35 09.411 35 9 411 35.26417
Brazelton Esther 35 09.410 35 9 410 35.26389
Brown Elizabeth 40 40.760 40 40 760 40.87778
Brown Joel 40 40.760 40 40 760 40.87778
Bundy Hope 37 45.623 37 45 623 37.92306
Bundy Clem 37 53.380 37 53 380 37.98889
Bundy Nancy 37 53.380 37 53 380 37.98889
Bundy W 37 53.380 37 53 380 37.98889
Bundy Charles 37 53.379 37 53 379 37.98861
Bundy Thomas 37 52.875 37 52 875 38.10972
Bundy Octavia 37 52.875 37 52 875 38.10972
Bundy George 37 52.873 37 52 873 38.10917
Bundy Lora 37 52.873 37 52 873 38.10917
Burgess W 37 49.224 37 49 224 37.87889
Burgess Alzada 37 49.224 37 49 224 37.87889
Clayton G 37 50.788 37 50 788 38.05222
Clayton Ellen 37 50.788 37 50 788 38.05222
Clayton L 37 50.795 37 50 795 38.05417
Clayton Mary 37 50.795 37 50 795 38.05417
Chapman Daniel 37 29.894 37 29 894 37.73167
Chapman Elizabeth 37 29.894 37 29 894 37.73167
Chapman Caroline 37 25.692 37 25 692 37.60889
Chapman Daniel 37 25.691 37 25 691 37.60861
Chapman Lucretia 37 25.691 37 25 691 37.60861
Chapman Samuel 37 25.692 37 25 692 37.60889
Chapman Elizabeth 37 25.692 37 25 692 37.60889
Chapman Laura 37 25.694 37 25 694 37.60944
Chapman Polly 38 33.026 38 33 26 38.55722
Crockett Mandy 36 22.801 36 22 801 36.58917
Crockett John 36 22.804 36 22 804 36.59000
Davis Ezra 37 44.682 37 44 682 37.92278
Davis Lizzie 37 44.683 37 44 683 37.92306
Davis Fred 36 14.260 36 14 260 36.30556
Dolch Catherine 38 44.563 38 44 563 38.88972
Dolch Christian 38 44.584 38 44 584 38.89556
Dolch Peter 38 44.564 38 44 564 38.89000
Doley George 38 44.615 38 44 615 38.90417
Doley Katie 38 44.615 38 44 615 38.90417
Doley Mary E 38 44.615 38 44 615 38.90417
Doley Henriettie 38 44.615 38 44 615 38.90417
NA MED 38 44.618 38 44 618 38.90500
NA HD 38 44.618 38 44 618 38.90500
NA GD 38 44.618 38 44 618 38.90500
NA Mother 38 44.618 38 44 618 38.90500
NA Father 38 44.618 38 44 618 38.90500
Doley George 38 44.615 38 44 615 38.90417
Doley James 38 44.610 38 44 610 38.90278
Doley May 38 44.611 38 44 611 38.90306
Doley John 38 44.611 38 44 611 38.90306
Doley Maggie 38 44.611 38 44 611 38.90306
Doley William 37 49.907 37 49 907 38.06861
Doley Dora 37 49.907 37 49 907 38.06861
Doley L[eaman] 37 49.907 37 49 907 38.06861
Doley G[uilford] 37 58.810 37 58 810 38.19167
Doley D[ora] 37 58.810 37 58 810 38.19167
Doley Eugene 37 58.751 37 58 751 38.17528
Doley Lou 37 58.751 37 58 751 38.17528
Dorris J[oseph] 36 28.798 36 28 798 36.68833
Dorris Joseph 36 28.811 36 28 811 36.69194
Dorris Sarah 36 28.812 36 28 812 36.69222
Dorris W 36 28.812 36 28 812 36.69222
Dorris A 36 28.813 36 28 813 36.69250
Dorris J NA NA NA NA NA
Dorris Elizabeth NA NA NA NA NA
Dorris Robert 36 26.485 36 26 485 36.56806
Dorris Rebecca 36 26.484 36 26 484 36.56778
Dorris Monroe 38 07.067 38 7 67 38.13528
Dorris Della 38 07.067 38 7 67 38.13528
Dorris Mary M 38 07.067 38 7 67 38.13528
Dorris Harve 38 07.081 38 7 81 38.13917
Dorris Carrie 38 07.081 38 7 81 38.13917
Dorris Smith 37 54.310 37 54 310 37.98611
Dorris Ada 37 54.309 37 54 309 37.98583
Dorris William 37 54.309 37 54 309 37.98583
Dorris Harvey 37 54.310 37 54 310 37.98611
Dorris Cora 37 54.310 37 54 310 37.98611
Dorris John 37 58.746 37 58 746 38.17389
Dorris W 37 58.749 37 58 749 38.17472
Dorris Gustavus 37 47.990 37 47 990 38.05833
Dorris Sarah 37 47.988 37 47 988 38.05778
Dorris Joseph 37 51.571 37 51 571 38.00861
Dorris Della 37 51.571 37 51 571 38.00861
Dorris William 37 50.787 37 50 787 38.05194
Dorris Harriet 37 50.788 37 50 788 38.05222
Dorris William 37 50.794 37 50 794 38.05389
Dorris Mary 37 50.794 37 50 794 38.05389
Dorris James 37 50.786 37 50 786 38.05167
Dorris Sarah 37 50.771 37 50 771 38.04750
Dorris W[illiam] 37 50.783 37 50 783 38.05083
Dorris E[lisha] 37 50.775 37 50 775 38.04861
Dorris Sarah 37 50.775 37 50 775 38.04861
Dorris James 37 50.775 37 50 775 38.04861
Dorris Georgia 37 50.775 37 50 775 38.04861
Dorris William 524 524 NA NA NA
Dorris Malinda 528 528 NA NA NA
Drake Mary 36 35.870 36 35 870 36.82500
Dreisbach Catherina 40 44.177 40 44 177 40.78250
Dreisbach Johannes 40 44.177 40 44 177 40.78250
Everett Semantha 38 33.026 38 33 26 38.55722
Farris Elizabeth 37 24.687 37 24 687 37.59083
Farris Elizabeth 37 24.678 37 24 678 37.58833
Finch Isaac 44 34.662 44 34 662 44.75056
Follis Fawn 37 51.764 37 51 764 38.06222
Follis Ralph 37 51.758 37 51 758 38.06056
Follis A 37 51.761 37 51 761 38.06139
Follis Christian 37 51.761 37 51 761 38.06139
Follis G 37 51.759 37 51 759 38.06083
Follis Ralph 37 51. 37 51 NA NA
Follis E 37 51.758 37 51 758 38.06056
Follis William 37 51.758 37 51 758 38.06056
Follis Martha 37 51.758 37 51 758 38.06056
Follis Jeff 37 51.758 37 51 758 38.06056
Ford Florence 37 52.851 37 52 851 38.10306
Fox Frances 37 48.023 37 48 23 37.80639
Frost Ebenezer 37 17.909 37 17 909 37.53583
NA NA 37 17.910 37 17 910 37.53611
Frost NA 37 17.909 37 17 909 37.53583
Fuqua William 36 38.189 36 38 189 36.68583
Gregory Leonard 38 44.609 38 44 609 38.90250
Gregory Lucille 38 44.611 38 44 611 38.90306
Hart Parmelia 37 51.757 37 51 757 38.06028
Hess Amalphus 37 25.687 37 25 687 37.60750
Hess Adolphus 37 25.687 37 25 687 37.60750
Hess Samuel 37 25.688 37 25 688 37.60778
Hess Augusta 37 25.688 37 25 688 37.60778
Hess Ulysses 37 25.688 37 25 688 37.60778
Hess Ulysses 37 25.687 37 25 687 37.60750
Hess William 37 25.688 37 25 688 37.60778
Hess William 37 25.687 37 25 687 37.60750
Hess Jerome 37 25.689 37 25 689 37.60806
Hess Franklin 37 25.689 37 25 689 37.60806
Hess Samuel 37 25.693 37 25 693 37.60917
Hess Bernice 37 25.693 37 25 693 37.60917
Hess Catherine 37 25.693 37 25 693 37.60917
Hess George 37 25.690 37 25 690 37.60833
Holt Lucinda NA NA NA NA NA
Holt William NA NA NA NA NA
Horlacher Daniel 40 30.928 40 30 928 40.75778
Horlacher Margaretha 40 30.930 40 30 930 40.75833
Horrall Polly 37 54.090 37 54 90 37.92500
Horrall James 38 33.026 38 33 26 38.55722
Horrall William 38 36.963 38 36 963 38.86750
Hurt Elizabeth 36 28.804 36 28 804 36.69000
Jacobs Jeremiah 38 21.315 38 21 315 38.43750
Jacobs Rebecca 38 21.317 38 21 317 38.43806
Johnson James 37 52.872 37 52 872 38.10889
Johnson Mary 37 52.872 37 52 872 38.10889
Jones Levi 37 47.994 37 47 994 38.05944
Jones Hester 37 47.994 37 47 994 38.05944
Jones Ridley 37 47.997 37 47 997 38.06028
Jones James 37 47.995 37 47 995 38.05972
Jones Tina 37 47.995 37 47 995 38.05972
Jones Ezra 37 48.024 37 48 24 37.80667
Jones Nannie 37 48.024 37 48 24 37.80667
Jones Samuel 37 48.020 37 48 20 37.80556
Jones Melverda 37 48.020 37 48 20 37.80556
Jones John 37 51.747 37 51 747 38.05750
Karnes Willard 37 58.749 37 58 749 38.17472
Karnes Ruth 37 58.749 37 58 749 38.17472
Keith James NA NA NA NA NA
Keth Nancy 35 09.410 35 9 410 35.26389
Kleppinger Anna 40 44.178 40 44 178 40.78278
Lipsey Joe 38 33.917 38 33 917 38.80472
Lockwood Eugenia NA NA NA NA NA
Lockwood Leland NA NA NA NA NA
Loomis Jon 37 36.925 37 36 925 37.85694
Mensch Abraham 40 39.557 40 39 557 40.80472
Merrell Azariah 35 43.945 35 43 945 35.97917
Merrell Abigail 35 43.942 35 43 942 35.97833
Meredith Eleandra 39 41.114 39 41 114 39.71500
Meredith Micajah 39 41.115 39 41 115 39.71528
Meredith Samuel 39 41.116 39 41 116 39.71556
Meredith Elizabeth 39 41.116 39 41 116 39.71556
Meredith Ruth 39 41.117 39 41 117 39.71583
Bell Sarah 39 41.117 39 41 117 39.71583
John Bell 39 41.117 39 41 117 39.71583
Meredith Mary 39 41.118 39 41 118 39.71611
Meredith Clarence 39 41.112 39 41 112 39.71444
Meredith Cora 39 41.112 39 41 112 39.71444
Meredith W 39 41.112 39 41 112 39.71444
Meredith Susan 39 41.112 39 41 112 39.71444
Meredith Hannah 39 41.112 39 41 112 39.71444
Meredith Mary 39 41.112 39 41 112 39.71444
Meredith Samuel 39 41.113 39 41 113 39.71472
Meredith Belinda 39 41.113 39 41 113 39.71472
Tipton Susannah 39 41.114 39 41 114 39.71500
Meredith Thomas 39 41.114 39 41 114 39.71500
Meredith Sarah 39 41.114 39 41 114 39.71500
Mildenberger Anna 40 44.194 40 44 194 40.78722
Mildenberger Nicolaus 40 44.179 40 44 179 40.78306
Miller Myrtie 37 48.023 37 48 23 37.80639
Minnich Elizabeth 40 40.757 40 40 757 40.87694
Minnich John 40 40.759 40 40 759 40.87750
Mory Catherina 40 33.585 40 33 585 40.71250
Mory Gotthard 40 33.586 40 33 586 40.71278
Mory Magdelena 40 33.586 40 33 586 40.71278
Mory Peter 40 33.585 40 33 585 40.71250
Nagel Anna 40 33.585 40 33 585 40.71250
Nagel Anna 40 44.191 40 44 191 40.78639
Nagel Caty 41 13.033 41 13 33 41.22583
Nagel Daniel 40 39.575 40 39 575 40.80972
Nagel Frederick 40 39.577 40 39 577 40.81028
Nagel Friedrich 40 44.197 40 44 197 40.78806
Nagel Johann 41 13.031 41 13 31 41.22528
Nagel Maria 40 39.575 40 39 575 40.80972
Nagle John 38 44.582 38 44 582 38.89500
Nagle Mary 38 44.582 38 44 582 38.89500
Nagel Henry 38 44.582 38 44 582 38.89500
Nagel Mary 38 44.582 38 44 582 38.89500
Nagel Will 38 44.582 38 44 582 38.89500
Nagel Adeline 38 44.582 38 44 582 38.89500
NA NA NA NA NA NA NA
Nutty John 37 25.674 37 25 674 37.60389
Nutty Beatrice 37 25.682 37 25 682 37.60611
Nutty John 37 25.678 37 25 678 37.60500
Ritter NA 37 52.861 37 52 861 38.10583
Ritter NA 37 52.861 37 52 861 38.10583
Odom Archibald 37 58.794 37 58 794 38.18722
Odom Cynthia 37 58.795 37 58 795 38.18750
Odom G 37 47.993 37 47 993 38.05917
Odom Sarah 37 47.994 37 47 994 38.05944
NA Thomas 37 47.992 37 47 992 38.05889
Odum Britton NA NA NA NA NA
Odum Wiley 37 47.187 37 47 187 37.83528
Odum Sallie A 37 47.187 37 47 187 37.83528
Peters Daniel 37 47.244 37 47 244 37.85111
Peters Charlotte 37 47.244 37 47 244 37.85111
Pickard William 38 04.918 38 4 918 38.32167
Pickard Harriet 38 04.919 38 4 919 38.32194
Pickard Louise 38 04.917 38 4 917 38.32139
Pletz Karl 37 44.684 37 44 684 37.92333
Russell Caroline 37 44.683 37 44 683 37.92306
Pickard William 38 04.918 38 4 918 38.32167
Pickard Harriet 38 04.919 38 4 919 38.32194
Pickard Louise 38 04.917 38 4 917 38.32139
Pulliam Frieda 37 25.697 37 25 697 37.61028
Pulliam Amos 37 25.697 37 25 697 37.61028
Rex William 37 45.776 37 45 776 37.96556
Rex Elmina 37 45.776 37 45 776 37.96556
Rex Mamie 37 45.777 37 45 777 37.96583
Rex George 37 45.777 37 45 777 37.96583
Rex Bertie 37 45.776 37 45 776 37.96556
Rex Lulie 37 45.774 37 45 774 37.96500
Rex Lily 37 45.776 37 45 776 37.96556
Rex Arthur 37 45.776 37 45 776 37.96556
Rex George 37 45.776 37 45 776 37.96556
Rex Jno 32 22 549 32 NA NA NA
Rex Guy 37 44.784 37 44 784 37.95111
Rex Harlie 37 44.785 37 44 785 37.95139
Richardson Annabelle 37 44.766 37 44 766 37.94611
Richardson Alfred 37 44.787 37 44 787 37.95194
Riegel Solomon 37 49.828 37 49 828 38.04667
Riegel Catherine 37 49.828 37 49 828 38.04667
Ritter J 37 52.853 37 52 853 38.10361
Ritter Mary 37 52.853 37 52 853 38.10361
Rockel Balzer 40 39.556 40 39 556 40.80444
Rockel Elisabetha 40 39.555 40 39 555 40.80417
Rockel Johannes 40 39.560 40 39 560 40.80556
Rockel Elizabeth 40 39.560 40 39 560 40.80556
Ross George 37 58.752 37 58 752 38.17556
Ross Euna 37 58.752 37 58 752 38.17556
Ruckel Mary NA NA NA NA NA
Ruckel Melchir NA NA NA NA NA
Russell James 37 44.681 37 44 681 37.92250
Russell Ana 37 44.682 37 44 682 37.92278
NA NA 37 44.682 37 44 682 37.92278
NA NA 37 44.682 37 44 682 37.92278
NA NA 37 44.682 37 44 682 37.92278
Siliven Jenniel 37 28.189 37 28 189 37.51917
Sinks A 36 14.451 36 14 451 36.35861
Sinks Francis 37 54.081 37 54 81 37.92250
Sinks Delphia 37 54.081 37 54 81 37.92250
Sinks Salem 37 54.089 37 54 89 37.92472
Sinks Daniel 37 52.619 37 52 619 38.03861
Sinks Martha 37 52.619 37 52 619 38.03861
Sinks Roy 37 52.619 37 52 619 38.03861
Sinks Elizabeth 37 47.989 37 47 989 38.05806
Sinks infant son 37 47.989 37 47 989 38.05806
Sinks John 37 47.986 37 47 986 38.05722
Sinks Mary 37 47.985 37 47 985 38.05694
Sinks William 37 47.984 37 47 984 38.05667
Sinks Charlotte 37 47.984 37 47 984 38.05667
Sinks Anna 37 47.984 37 47 984 38.05667
Sinks Leonard 37 47.982 37 47 982 38.05611
Sinks Etta Faye 37 48.024 37 48 24 37.80667
Sinks John 37 48.024 37 48 24 37.80667
Sinks Sena 37 48.020 37 48 20 37.80556
Sinks William 37 44.702 37 44 702 37.92833
Sweet Jewell 37 44.704 37 44 704 37.92889
Sinks Francis 37 44.702 37 44 702 37.92833
Sinks Arlie 37 44.704 37 44 704 37.92889
Sinks Viola 37 44.704 37 44 704 37.92889
Sinks Leonard 38 33.836 38 33 836 38.78222
Sinks Mae 38 33.837 38 33 837 38.78250
Sinks Bessie 38 33.917 38 33 917 38.80472
Sinks Caroline 38 02.272 38 2 272 38.10889
Sinks Arlie 37 44.770 37 44 770 37.94722
Sinks Eva 37 44.770 37 44 770 37.94722
Solt Conrad 40 48.686 40 48 686 40.99056
Solt Conrad 40 48.693 40 48 693 40.99250
Solt Maria 40 48.690 40 48 690 40.99167
Sfafford Trice 37 52.608 37 52 608 38.03556
Sfafford Phebe 37 52.608 37 52 608 38.03556
Steen Richard 38 33.025 38 33 25 38.55694
VanCleve Martin 37 25.694 37 25 694 37.60944
VanCleve Florence 37 25.694 37 25 694 37.60944
VanCleve W 37 33.397 37 33 397 37.66028
VanCleve Nancy 37 33.397 37 33 397 37.66028
VanCleve J 37 33.397 37 33 397 37.66028
VanCleave W 38 04.924 38 4 924 38.32333
VanCleave Elizabeth 38 04.924 38 4 924 38.32333
Veach Pleasant 37 29.916 37 29 916 37.73778
Veach Victoria 37 29.916 37 29 916 37.73778
Veach Ward 37 29.895 37 29 895 37.73194
Veach Cynthia 37 29.895 37 29 895 37.73194
Veach James 37 29.895 37 29 895 37.73194
Veach James 37 25.692 37 25 692 37.60889
Veach Nannie 37 26.692 37 26 692 37.62556
Veatch John 37 26.692 37 26 692 37.62556
Veatch Eleanor 37 26.692 37 26 692 37.62556
Veach William 37 26.693 37 26 693 37.62583
Veach James 37 26.692 37 26 692 37.62556
Veach Rachel 37 26.692 37 26 692 37.62556
Veach Pleasant 37 29.895 37 29 895 37.73194
Veach Mary 37 29.897 37 29 897 37.73250
Veatch Parmelia 37 28.187 37 28 187 37.51861
Veatch Mary 37 28.187 37 28 187 37.51861
Veatch Elnor 37 28.187 37 28 187 37.51861
Veatch Frelin 37 28.186 37 28 186 37.51833
Veach-Nutty NA 37 25.682 37 25 682 37.60611
Veach John 37 25.681 37 25 681 37.60583
Veach Rose 37 25.679 37 25 679 37.60528
Veach Ruth 37 25.682 37 25 682 37.60611
Ware Turner 37 52.856 37 52 856 38.10444
Ware Martha 37 52.856 37 52 856 38.10444
Ware Joseph 37 52.867 37 52 867 38.10750
Ware Caroline 37 52.867 37 52 867 38.10750
Webber Dick 37 49.829 37 49 829 38.04694
Webber Pearl 37 49.826 37 49 826 38.04611
Weir James 36 15.064 36 15 64 36.26778
Weir Mary 36 15.064 36 15 64 36.26778
Wier Leticia 37 49.208 37 49 208 37.87444
Whiteside Lucinda 37 26.743 37 26 743 37.63972
Whiteside John 37 26.743 37 26 743 37.63972
Willis Matha 36 35.889 36 35 889 36.83028
Wilson Jessie 36 26.350 36 26 350 36.53056
Wilson Mary 36 26.361 36 26 361 36.53361
Wilson Joseph 36 29.553 36 29 553 36.63694
Wilson Elisha 36 28.812 36 28 812 36.69222
Wilson Sallie 36 28.803 36 28 803 36.68972
Wilson Lutetita NA NA NA NA NA
Wilson Thomas 37 48.034 37 48 34 37.80944
Wilson Sarah 37 48.034 37 48 34 37.80944
Wilson Elisha 36 26.351 36 26 351 36.53083
Wilson Martha 36 26.351 36 26 351 36.53083
Wilson Charles 36 26.351 36 26 351 36.53083
Wilson Zack 36 26.350 36 26 350 36.53056
Wilson Juritha 36 26.350 36 26 350 36.53056
Wilson Elisha 36 26.351 36 26 351 36.53083
Wilson Drury NA NA NA NA NA
Wilson Mary NA NA NA NA NA
Wilson Sandifer NA NA NA NA NA
Wilson Nancy NA NA NA NA NA
Wise Luvena 37 50.352 37 50 352 37.93111
Wollard John 37 54.076 37 54 76 37.92111
Woolard Nettie 37 54.075 37 54 75 37.92083
Woolard Millie 37 58.721 37 58 721 38.16694
Woolard Lawrence 37 58.721 37 58 721 38.16694
Woolard Etta 37 58.721 37 58 721 38.16694
Woolard John 37 58.723 37 58 723 38.16750
Woolard James 37 58.721 37 58 721 38.16694
Woolard C 37 58.720 37 58 720 38.16667
Woolard Blanche 37 58.720 37 58 720 38.16667
Woolard L 37 51.394 37 51 394 37.95944
Woolard Ama 37 51.395 37 51 395 37.95972
Woolard Robert 37 51.396 37 51 396 37.96000
Woolard James 37 51.391 37 51 391 37.95861
Woolard Romey 37 51.397 37 51 397 37.96028
Woolard Anna 37 52.853 37 52 853 38.10361
Woolard James 37 52.853 37 52 853 38.10361
Woolard Francis 37 52.853 37 52 853 38.10361
Woolard Turner 37 52.853 37 52 853 38.10361
Woolard William 37 52.854 37 52 854 38.10389
Woolard George 37 51.742 37 51 742 38.05611
Woolard Nancy 37 51.742 37 51 742 38.05611

The weird coordinates, like the 524 for William Dorris, were turned into NAs by this process, so I don’t need to worry about fixing them. The leading zeros were removed from the seconds data. For this application, it doesn’t matter. For others it might, and you could pad them back using str_pad(). So that’s it for cleaning this variable.
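
A minimal sketch of the padding step, on a made-up vector rather than the real column. It also shows when the dropped zero would matter: if the three digits are read as a whole number, 025 and 25 are the same, but if they are read as the digits after a decimal point (an assumption about other uses, not something done in this post), they are not.

```r
library(stringr)

second <- c(687, 25, 90)   # hypothetical values; 25 started life as "025"

# Pad back to three characters with leading zeros
second_chr <- str_pad(as.character(second), width = 3, side = "left", pad = "0")
second_chr

# Read as whole numbers, padded and unpadded values agree
same_as_integer <- as.numeric(second_chr) == second

# Read as digits after a decimal point, they do not
with_zero    <- as.numeric(paste0("0.", second_chr))
without_zero <- as.numeric(paste0("0.", second))
```

Comparing `with_zero` and `without_zero` is a quick check on whether a given use of the data is sensitive to the lost zeros.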

< section id="cleaning-up-dates-strings" class="level2">

Cleaning up Dates (strings)

Next, I’m going to clean up the dates. They were also imported as strings. I don’t think I’m going to use the dates in the map, but I might use them when I’m working with the web scraping data. Like I did with the GPS data, I’m first going to clean up the typos and then convert to the format I want.

< section id="viewing-the-dates" class="level3">

Viewing the Dates

class(tombstones$DOB)
[1] "character"
tombstones %>% 
  select(Surname, First.Name,DOB, DOD) %>%
  gt()  %>% 
  tab_options(container.height = px(300), container.padding.y = px(24))
...
Surname First.Name DOB DOD
Anderson Abraham 10 Mar 1776 15 Aug 1838
Anderson Elizabeth 29 Jan 1782 13 Oct 1869
Anderson Zady 18 Apr 1812 12 Dec 1839
Anderson Albert 28 Nov 1809 5 Nov 1882
Anderson Adesia 17 Mar 1808 26 Spt 1864
Anderson May NA 9 Aug 1887
Anderson E 23 Spt 1877 24 Oct 1899
Anderson William 26 Feb 1836 31 Dec 1895
Anderson Nancy 2 Spt 1836 18 Oct 1917
Appleton Richard 1 Aug 1817 6 Oct 1897
Baldwin John 25 Sep 1845 NA
Baldwin William NA NA
Baggett Mahalia 3 June 1832 6 June 1897
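
The rows above already show the kind of inconsistency to expect: “Spt” for September, alongside full month names such as “June”. Here is a minimal sketch of the clean-up-then-check idea, using a few values copied from the table. The replacement rule and the check against R's built-in `month.abb` and `month.name` constants are assumptions for illustration, not necessarily how the dates are handled in the project.

```r
library(stringr)

dob <- c("10 Mar 1776", "23 Spt 1877", "3 June 1832", NA)

# Fix the one typo visible in the table; \\b limits the match to the whole word
dob_fixed <- str_replace(dob, "\\bSpt\\b", "Sep")

# Pull out the month word and check it against R's English month names
month_word <- str_extract(dob_fixed, "[A-Za-z]+")
recognised <- is.na(dob_fixed) | month_word %in% c(month.abb, month.name)

dob_fixed[!recognised]   # anything listed here still needs attention
```

Missing dates are counted as recognised so that only real typos are flagged.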
To leave a comment for the author, please follow the link and comment on their blog: Louise E. Sinks.
