clas299

Jupyter notebook for Ptolemy’s Spatial Analysis

This notebook contains Scala scripts that analyze the authorities issuing coins during the Roman Empire.

There’s a Jupyter notebook to execute the following blocks of Scala code.


Archaeological Data Analysis: Coins of the Roman Empire

Author: Michael Dahlquist

Exploring a data set

In this notebook, you’ll download a data set derived from the openly licensed content of the Online Coins of the Roman Empire (OCRE). The original data set is available from http://nomisma.org/ in RDF XML format. We’ll work with a version formatted as a delimited-text file, using # as the column delimiter, with a header line labelling each column.

As with any data set, our first task is to figure out what kinds of data it contains, and what the range of values is for each category of data. We’ll examine the contents of several columns of data.

Best way to execute Scala code in a Jupyter notebook

This file alternates between plain text and blocks of code. To ensure all lines run, execute each block of code as you go by clicking the block and pressing Control and Enter. Alternatively, you can go to Cell -> Run All to execute the entire page at once.

Download delimited-text data

We’ll make the standard Scala Source object available by importing it, then use it to retrieve the content of a URL.

import scala.io.Source
val ocreCex = "https://raw.githubusercontent.com/michaeldahlquist/clas299/master/coins-of-the-roman-empire/ocre-cite-ids.cex"

We’ll extract a sequence of lines from the URL source, and convert them to our favorite type of Scala collection, a Vector.

(The following cell downloads the data: depending on your internet connection, this might take a moment.)

val lines = Source.fromURL(ocreCex).getLines().toVector
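
The one-liner above never closes the underlying source. A minimal sketch of a helper that closes it after reading follows. The name `readAllLines` is hypothetical and not part of the notebook, and the demonstration uses a made-up in-memory string instead of the real URL so that it runs without a network connection.

```scala
import scala.io.Source

// Hypothetical helper (not part of the original notebook): read every line
// from a Source, then close it even if reading fails.
def readAllLines(source: Source): Vector[String] =
  try source.getLines().toVector
  finally source.close()

// Offline demonstration with an in-memory source; both lines are made-up sample data.
val sampleLines = readAllLines(Source.fromString("ID#Label\nexample.1#Example coin"))
```

With a network connection, `readAllLines(Source.fromURL(ocreCex))` should produce the same Vector as `lines`.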

Examine header line

To start with, let’s see what the first line looks like, and compare it with the first data line.

lines.head // same as lines(0)
lines(1)
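
Since later steps pick columns by number, it can help to print each header label next to its zero-based index. The sketch below uses a hypothetical header string: only the positions of the ID, denomination and authority columns are taken from this walkthrough, and the other labels are invented. In the notebook you would pass `lines.head` instead.

```scala
// Hypothetical header line for illustration only; the real one is lines.head.
val sampleHeader = "ID#Label#Denomination#Metal#Authority"

// Pair each column name with its zero-based index.
val indexedColumns = sampleHeader.split("#").toVector.zipWithIndex

// Print one "index: name" line per column.
indexedColumns.foreach { case (name, i) => println(s"$i: $name") }
```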

Split data strings into columns

Every line is a String. If we break it up using the split method, we get an Array of Strings, which we’ll convert to a Vector of Strings. The end result will be that from a Vector of Strings, we create a Vector of Vectors of Strings. Notice that Scala identifies the class of the new data expression as Vector[Vector[String]].

val data = lines.tail.map(ln => ln.split("#").toVector)
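
One caveat worth knowing: on the JVM, `split` with a single argument discards trailing empty strings, so a row whose last columns are empty yields a shorter Vector and an expression such as `columns(4)` could throw an exception. Whether the OCRE file contains such rows is not stated here, so the sketch below uses a made-up row.

```scala
// Made-up row whose last two columns are empty (not taken from the OCRE data).
val sampleRow = "example.1#Example coin#denarius##"

// Default split discards trailing empty strings: only 3 columns survive.
val dropped = sampleRow.split("#").toVector

// A negative limit keeps them: all 5 columns are present.
val kept = sampleRow.split("#", -1).toVector

// lift returns an Option instead of throwing on a missing index.
val safeFifth = dropped.lift(4)
```

If short rows turn out to be a problem, `ln.split("#", -1)` is a drop-in alternative for the `split` call above.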

Mapping each Vector to the first item in the Vector is equivalent to extracting the first column from each Vector. The header line told us that the first column should contain ID values.

val ids = data.map(columns => columns(0))

We want to be sure that all ID values are unique. We can verify that by comparing the number of items in the ids Vector with the number of distinct values in the ids Vector. If they’re the same, then every value is unique.

//println("Records: " + ids.size)
//println("Distinct IDs: " + ids.distinct.size)
if(ids.size == ids.distinct.size) {
    println("All records uniquely identified.")
} else {
    println("Duplicate identifiers in data set.")
}
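
If the check reports duplicates, the next question is which identifiers are repeated. A possible sketch, using made-up identifiers rather than the real `ids` Vector:

```scala
// Made-up identifiers; "b" appears twice.
val sampleIds = Vector("a", "b", "c", "b")

// Group identical identifiers, then keep only those occurring more than once.
val duplicateIds = sampleIds
  .groupBy(id => id)
  .collect { case (id, occurrences) if occurrences.size > 1 => id }
  .toVector
```

Applied to the notebook's data, replacing `sampleIds` with `ids` would list the offending identifiers.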

Distribution of denominations

Let’s look at how coin denominations are described. You can see from the header line that denominations are in the third column, so we’ll map each Vector to the third column. Remember that we start indexing with 0, so the third column is indexed as (2).

val denominations = data.map(columns => columns(2))

We’ll use a very handy Scala idiom to count how many times each denomination appears. If we group the elements in our Vector by their value, the result is a Map from each unique value to a Vector of the matching values.

val denominationsGrouped = denominations.groupBy(denom => denom)
// Free puzzle:  notice that the result of this groupBy should be the same size 
//               as the number of distinct values in our list:
if (denominationsGrouped.size == denominations.distinct.size) {
    println("Number of groups is same as number of distinct values.")
} else {
    print("Something is terribly wrong.  The number of groups ")
    println("is not the same as the number of distinct values.")
}

What we really want to know is how many times does each denomination appear? We can find that out by transforming our mapping of String->Vector[String] to give us a mapping of each denomination to the size of the Vector of its occurrences.

val denominationsCounts = denominationsGrouped.map{ case (d, v) => (d, v.size) }

Recall that Maps are not ordered in Scala. If we now convert the Map to a Vector, we will have a Vector pairing a String with an Int. We can sort the Vector by the second element of the pairing (which will sort from smallest to largest), then reverse the results to have a descending list of how often each denomination occurs.

val denominationsVector = 
    denominationsCounts.toVector
val denominationsHisto = 
    denominationsVector.sortBy(frequency => frequency._2).reverse

Now we can easily see the extremes of the counts:

println("Most frequent denomination: " + denominationsHisto.head)
// Find denominations occurring fewer than some threshold number of times
val cutOff = 10 
val leastDenominations = 
    denominationsHisto.filter(frequency => frequency._2 < cutOff)
println("Least frequent denominations: \n" + leastDenominations.mkString("\n"))
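
The group, count and sort steps used for denominations can be bundled into one reusable function. This is a sketch only: `histogram` is a hypothetical name that does not appear in the notebook, and the input is made-up data. It sorts on the negated count instead of sorting and reversing, which gives the same descending order.

```scala
// Hypothetical helper: count occurrences of each value, most frequent first.
def histogram(values: Vector[String]): Vector[(String, Int)] =
  values
    .groupBy(v => v)
    .map { case (value, occurrences) => (value, occurrences.size) }
    .toVector
    .sortBy { case (_, count) => -count }

// Made-up denominations for illustration.
val sampleHisto =
  histogram(Vector("denarius", "aureus", "denarius", "as", "denarius", "aureus"))
```

The same function would work on any column extracted from `data`, for example `histogram(denominations)`.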

Assignment

Analyze how many issues are produced by each issuing authority to answer the following questions:

Gather and organize your data

// First, extract the "Authority" column (index 4) from the data set:
val authorities = data.map(columns => columns(4))

Question 1: how many authorities strike coins?

// Use the distinct method and size method to count 
// how many distinct values you have in `authorities`
authorities.distinct.size

Group records by authority and count them

// use the groupBy method to group each authority by the authority value.
// This will give you a Map of Strings to a Vector of Strings
val authoritiesGrouped = authorities.groupBy(authority => authority)
// now convert each pairing of String->Vector[String] to a String->Int counting 
// how many elements are in the original Vector.
// The result is a Map[String, Int].
val authoritiesCounts = authoritiesGrouped.map{ case (auth,v) => (auth, v.size)}
// next convert your Map[String, Int] to a Vector.  The result is a 
// Vector of pairings of (String, Int).
// We'll sort this by the second element of the pairing, namely the Int.  
// Since we sort from smallest to largest by default, you can reverse the
// result so that the most frequent authority comes first.
val authoritiesHistogram = authoritiesCounts.toVector.sortBy(auth => auth._2).reverse

Question 2: who strikes the most issues?

Question 3: who strikes the fewest?

// With the authoritiesHistogram you created, you can use the `head` and 
// `last` methods to see the first and last entries in the Vector.
authoritiesHistogram.head
authoritiesHistogram.last
// List every authority that appears exactly once
authoritiesHistogram.filter{freq => freq._2 == 1}
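
If only the extremes are needed, `maxBy` and `minBy` can find them directly on the Map of counts, without sorting. The sketch below uses invented authority names and counts, not results from the OCRE data.

```scala
// Made-up counts for illustration only.
val sampleCounts = Map("Authority A" -> 120, "Authority B" -> 3, "Authority C" -> 45)

// Pick the pairing with the largest and the smallest count.
val mostProlific = sampleCounts.maxBy { case (_, count) => count }
val leastProlific = sampleCounts.minBy { case (_, count) => count }
```

Note that `maxBy` and `minBy` return a single pairing even when several authorities tie, which is why a `filter` on the count is still the better tool for listing every authority with exactly one issue.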