Visualisation of Keizu: smith genealogy and school lineage

I would like to share some news with you. It’s about something which was promised a long time ago: a solution for drawing smith & school lineages. Every time I attempt to write comments on a particular school or smith, I keep coming back to the same issue (on top of incompetence, of course :-) ) of being unable to show the relationships between numerous smiths of the same school or lineage in a clear way. The smith records already hold all the information needed to do it, so it’s just a matter of choosing the right solution and putting it on the site.

Almost every serious Nihonto website has an article or two containing smith lineages, most likely in the form of custom-made images. It’s a good old-fashioned way which works, but it prevents cooperation, is hard to change and is sometimes subject to copyright. There are also two major Nihonto sites which offer genealogical trees on a wider scale: Sho-Shin and JSSUS’s NKB.

Robert Cole’s Sho-Shin provides an extensive set of articles written in a pre-web textual format which has the luxury (and the curse, from a web designer’s point of view) of fixed-width fonts. This allows drawing genealogical trees using just text:

http://www.sho-shin.com/sanjo.htm

Simplicity and modest page size are definite advantages of this approach. However, you need Robert’s (surely gigantic) determination and patience to find an appropriate layout for every tree and ensure all the little details are correct. If, for any reason, one wanted to add a new smith or even a line connecting two smiths, it might end in a major redesign of the whole tree.

The JSSUS Nihonto Knowledge Base (NKB), in contrast to Sho-Shin, is entirely auto-generated and based on a proper database:

http://www.jssus.org/nkb/gen.php?id=ARI146

Being generated (seemingly) on the fly, it offers a hierarchical tree-like layout, showing 3 levels of the tree and representing smith records as clickable boxes. If you click a box, the next level of the tree appears (if it exists). It’s a neat, light, text-based implementation with, I’m sure, more features to come in the future.

Major advantages:

  • Light, compact, no graphics involved – always good for web hosting
  • Changes are reflected in the lineage automatically once the database gets updated

Disadvantages would be:

  • Supports trees only, not able to show more complex relationships without re-development
  • Text-based format puts constraints on possible visual improvements in the future; in particular, it can be awkward to draw large trees
  • It’s a bespoke solution: it can only exist on the NKB site and needs the author’s support

Taking all this into account, I was looking for a solution with a sensible compromise between simplicity and a feature-rich appearance. I also wanted to be able to show various types of relationships in different ways, e.g. a dashed line for sensei-to-student, a solid line for father-to-son and a thick line for father-and-sensei-to-student. And since NihontoClub is a project I’m doing in my free time, I wanted to find something relatively easy to work with, without too much development involved. And the solution appeared. It’s called HyperGraph, an open source software for visualisation of semantic networks, and it is an almost perfect candidate for the job.

Pros:

  • It’s XML-based. In other words, it’s portable, you can take the tree definition home with you, save to your hard-drive and replay, edit, etc.
  • It can produce both static images and interactive content
  • Rich in features with (almost) no programming involved

Cons:

  • The ‘native’ way of using it is via a Java applet. Not everybody would like to install a heavy Java JRE on their PC
  • Hungry for bandwidth. Neither the images, nor the applets, nor the XML files are small.

Well, a picture speaks a thousand words, so have a look at the first draft of the Sanjo school lineage:


 
This is a static image. I’ll publish a link to an interactive applet sometime soon. Just for your information, you may have a look at the XML definition of the graph in the attached file.

I’ll keep you informed regarding the progress of the development.

Regards,
Stan

Attachment    Size
sanjo.xml     4.21 KB

comments

Hi Stan,

You're doing a nice job with the site. I have a few suggestions (I wrote the NKB @ JSSUS and donated its use to them).

I am not maintaining the NKB going forward at this point. I cranked it out over a couple of weeks, including the tree stuff. All of that is automatically generated on the fly.

The database contains simple links backwards to a father, and a sensei. That has all the information that is necessary to automatically generate a tree. From there, the work is simply about combing the database, fixing links to fathers and teachers. The NKB is set up to allow anyone to access and edit the information (i.e. wikipedia style) with the hopes that the data will only improve going forward. As long as this simple link backwards to "who your teacher is" is correct, then everything else will generate nicely on the fly.

Given a node X (at level 3 in the tree) for a swordsmith named X:

level 2: sensei(X) = the sensei of X

This makes for the following queries based on this information:

level 1: grand-sensei(X) = sensei(sensei(X))
level 2: sensei(X)
level 3: peers(X) = query for Y, where sensei(Y) = sensei(X)
level 4: students(X) = query for Y, where sensei(Y) = X

I would advocate continuing with this approach, since it allows for simple maintenance of the database. Anyone inspecting a record only needs to make sure that the sensei of the particular smith is recorded correctly, and everything can be generated on the fly from there.
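
As a rough illustration of the four queries above (this is not the NKB's actual PHP/SQL, just a Python sketch), assume each record stores only a backward link to its sensei. The `Smith` class, the field names and the single-letter codes are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Smith:
    code: str                      # hypothetical record ID
    name: str
    sensei: Optional[str] = None   # code of the teacher, if recorded


def sensei(db: Dict[str, Smith], x: str) -> Optional[str]:
    """Level 2: the sensei of X."""
    return db[x].sensei


def grand_sensei(db: Dict[str, Smith], x: str) -> Optional[str]:
    """Level 1: sensei(sensei(X)), or None if either link is missing."""
    s = sensei(db, x)
    return sensei(db, s) if s is not None else None


def peers(db: Dict[str, Smith], x: str) -> List[str]:
    """Level 3: every Y with sensei(Y) = sensei(X); this includes X itself."""
    s = sensei(db, x)
    if s is None:
        return []
    return sorted(code for code, rec in db.items() if rec.sensei == s)


def students(db: Dict[str, Smith], x: str) -> List[str]:
    """Level 4: every Y with sensei(Y) = X."""
    return sorted(code for code, rec in db.items() if rec.sensei == x)


# Placeholder data: G taught T, T taught X and Y, X taught Z.
db = {
    "G": Smith("G", "grand-sensei"),
    "T": Smith("T", "teacher", sensei="G"),
    "X": Smith("X", "smith X", sensei="T"),
    "Y": Smith("Y", "smith Y", sensei="T"),
    "Z": Smith("Z", "smith Z", sensei="X"),
}

print(grand_sensei(db, "X"), sensei(db, "X"), peers(db, "X"), students(db, "X"))
```

The point of the sketch is that nothing but the `sensei` link is stored; every level of the tree is derived from it at display time.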

...

I think the HyperGraph stuff you are looking at is the correct way to display this stuff. I wrote all of mine in CSS which is generated by PHP scripts. You are welcome to the whole kit and caboodle if you want to play with it and expand on my work. I am doing various things dynamically; I am assuming that you did the same thing as me in examining the ToShow database. I decode this database on the fly and re-map all of the recorded information

Those represent the signature examples, which are pretty important to have available, and also the embedded comments that may have Japanese text in them. I decode them (from ISO-2022-JP, I seem to recall) and recode them as Unicode HTML-encoded text.
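
For what it's worth, that kind of recoding can be sketched in a few lines of Python. This assumes the source bytes really are ISO-2022-JP (only recalled from memory above) and that "HTML encoded" means numeric character references; the function name is made up.

```python
def to_html_entities(raw: bytes, source_encoding: str = "iso2022_jp") -> str:
    """Decode legacy Japanese bytes and re-encode non-ASCII characters
    as HTML numeric character references (e.g. &#27491;)."""
    text = raw.decode(source_encoding)
    # 'xmlcharrefreplace' turns every non-ASCII character into &#NNNN;
    # note that it does not escape markup characters such as '&' or '<'.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Build sample input by encoding a known string, so no byte values are guessed.
sample = "正宗".encode("iso2022_jp")
encoded = to_html_entities(sample)
print(encoded)
```

Plain ASCII text passes through unchanged, so romaji and English fields can go through the same function.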

The editing code is the same, I do this back and forth on the fly. In the editor, someone can type in an era name and it will encode to the Japanese for them, similarly there is some semi-translation, where they could for instance type in "soshu ju akihiro" and it would automatically translate to the Japanese of all common words without requiring an IME. For those with IME who are more sophisticated, they can still edit and change the Japanese text provided here.
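
A minimal sketch of that sort of semi-translation, assuming a simple word-for-word lookup table: the three dictionary entries and the function name are illustrative only and not taken from the NKB code. Unknown words are left in brackets so whoever is editing can fix them by hand.

```python
# Illustrative lookup table; a real one would cover all common signature words.
ROMAJI_TO_KANJI = {
    "soshu": "相州",
    "ju": "住",
    "akihiro": "秋広",
}


def semi_translate(phrase: str) -> str:
    """Replace each known romaji word with its kanji; bracket unknown words."""
    out = []
    for word in phrase.lower().split():
        out.append(ROMAJI_TO_KANJI.get(word, "[" + word + "]"))
    return "".join(out)


print(semi_translate("soshu ju akihiro"))
print(semi_translate("soshu ju someone"))
```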

In the editor, typing in say "Yukimitsu" will then show all the options for Yukimitsu smiths to link up as the father or sensei for a smith like Masamune, and let the user pick among them. This is better than requiring them to look up the "smith code" to resolve the smith unambiguously. Additional information, like the highest level the smith has achieved at the NBTHK, is simply added. All of the editor works by mapping back and forth, so someone can, say, select an era by the date and it will provide the rest of the information in Japanese and English for the name of the era, or go by the era name and get the date.
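
The name lookup can be pictured roughly as below. This is only a sketch under assumptions: the record codes are invented, and a plain case-insensitive prefix match stands in for whatever the real editor does.

```python
from typing import Dict, List, Tuple


def candidates(records: Dict[str, str], typed: str) -> List[Tuple[str, str]]:
    """Return (code, name) pairs whose name starts with the typed text."""
    needle = typed.strip().lower()
    if not needle:
        return []
    return sorted(
        (code, name)
        for code, name in records.items()
        if name.lower().startswith(needle)
    )


# Hypothetical smith codes; several smiths can share one name.
records = {
    "YUK1": "Yukimitsu",
    "YUK2": "Yukimitsu",
    "MAS1": "Masamune",
}

for code, name in candidates(records, "yuki"):
    print(code, name)
```

The user picks one of the returned pairs, and only the code is stored as the father or sensei link.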

It's very important to provide kanji for everything, as well as romaji and then English translations where applicable, because people may be checking, say, against Fujishiro. When you do this, you can't accept the translator's word for it, so you look in the book at the era kanji and want to check it against the kanji shown in the online display for editing purposes. If the translator made a goof and you just continue in English, you will replicate the error.

I integrated wazamono information as well into the NKB.

Everything data-wise was left in native formats, and then various UNIX scripts were written to munge it all into an input file for an SQL database. This I believe is the correct way to do it, since it allows for adjustment of the various sources or the addition of future sources. Again, you are welcome to have a copy of this stuff and use it under non-commercial terms.

The Toko Taikan man yen rating is more important probably than the Fujishiro skill rating. Both should be included. I have the data sources for that as well.

Additionally, it is very important in this community to cite your influences and sources. In particular, if the information came from ToShow, then it originates with Hawley. A bunch of old-time collectors got together and did the difficult work of transcribing Hawley into electronic form. Hawley needs to be thanked for the groundbreaking work in the beginning. The generation before us needs to be thanked for the transcription task which is massive. ToShow needs to be thanked for the idea of making it available to all. Maybe NKB needs to be thanked for elevating it to the next level.

From a legal perspective you should also disclaim the accuracy, that it is a study aid and that errors are in there... because each party above has made errors, and if someone makes a $50k buying decision based on incorrect data on your website you can end up in a lawsuit.

My own editing page would be pretty easy to integrate into your site... even if your database is slightly different in structure, you would be able to just recode the couple of lines where it creates the input query.

One thing that my editing page was set up to do was to summarize the inputs and then email them to me as they went into the database. I had it set up so that I could revert the changes easily, and this also let me keep track of all edits. If an edit came in and it was obviously wrong I could just go correct it or add to it in this manner. There was never a case of vandalism but I've had to make minor corrections sometimes to submissions.
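
In outline, that kind of edit tracking could look like the Python sketch below. It is not the actual editing page: the class and method names are made up, records are plain dictionaries, and the summary string is simply returned rather than emailed.

```python
_MISSING = object()  # marks a field that did not exist before the edit


class EditLog:
    """Apply field edits to records, remembering old values for revert."""

    def __init__(self, records):
        self.records = records
        self.history = []  # stack of (code, field, old_value)

    def apply(self, code, field, new_value):
        """Apply one edit and return a one-line summary (e.g. for an email)."""
        old = self.records[code].get(field, _MISSING)
        self.records[code][field] = new_value
        self.history.append((code, field, old))
        shown = "(empty)" if old is _MISSING else repr(old)
        return f"{code}.{field}: {shown} -> {new_value!r}"

    def revert_last(self):
        """Undo the most recent edit."""
        code, field, old = self.history.pop()
        if old is _MISSING:
            del self.records[code][field]
        else:
            self.records[code][field] = old


smiths = {"X": {"sensei": "T"}}
log = EditLog(smiths)
print(log.apply("X", "sensei", "Q"))
log.revert_last()
print(smiths)
```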

Thank you

Dear Darcy,

I really appreciate your detailed feedback. NKB is a fantastic resource and I have enjoyed using it for years.

You’ve made a few interesting observations and I’ll try to address them one by one below.

First of all, I’ve just added your account to the access group which allows editing the smith records. You may open any record and then click the ‘EDIT’ button to see the interface for data entry. Also, if you have a look at the help page, it might give you a good idea of the underlying data model, as smith pages are rendered differently depending on the actual data available.

Relationships between smiths (sensei, father) are stored in a very similar way to what you’ve described. Input fields are set to auto-complete, which means once you start typing the name, a list of choices will pop up with the name and Hawley ID. I’m not completely happy with the way it currently works, but it does the job.

The site is based on the Drupal CMS (PHP-based) and many things come for free. Once I’m more or less happy with the data quality, I’ll enable revisions for the smith records and let any registered user edit them (I do it anyway, but it’s not the default option just yet). A ‘page-per-smith’ paradigm was chosen over NKB’s ‘flash card’ style to allow a smith’s record to become a fully featured article if there’s enough information. This also gives great coverage by search engines. Comments can also be added to smith records, making it more like a conversation and allowing people to ask questions and write answers in a less formal and less binding way.

All the fields support Kanji. Provinces and Eras are stored as codes, and though I don’t display Kanji for them just now, it’s an easy thing to add just as a visualisation feature. So far I have been putting more emphasis on the core functionality, and once it’s completed, these points can be addressed as well. The data model is fully expandable, but I didn’t want to scare people off with 40 or so editable fields per smith. In terms of displaying the data, I consider NKB to be ‘an elder brother’ in the sense that it has kanji for everything, while NC keeps to a simpler beginner’s style.

Adding wazamono rating is in my plans.

I totally respect the great efforts which generations of researchers before us have put into all sorts of Nihonto sources which are available to us now. I don’t have the luxury of talking to Nihonto specialists in person very often, therefore it would be impossible to gather any information without referring to the well-known works of Kokan Nagayama, Fujishiro, Albert Yamanaka’s newsletters and of course Hawley, as well as Dr. Stein’s website, Robert Cole’s Sho-Shin, NKB and many others. The Bibliography page has a list of the sources I’m using for data verification. If I haven’t put a full disclaimer at the top of the Nihonto Club smith index just yet, it’s only because the work is still in progress and I was concentrating on dynamic content. I preferred exposing it now rather than hiding it from the public eye until it’s done, for two reasons: to get early feedback from more experienced specialists, and simply because I don’t know how many years it will take to finish it. So far the main focus has been to get the basic data right, such as names, era, province, kanji for names and signatures etc. But you are right, I must be more explicit regarding the sources of data and put up the disclaimer in order to avoid awkward situations.

The very first version of the database is based on ToShow dumps dated 2004. I used feeder scripts to import it into Excel for initial cleansing and then into an SQL database. I believe I used your XML files (distributed through the Nihonto yahoo group) as a data source as well (but not the G-F and Slough’s databases, for copyright reasons). Since then it has deviated quite a bit from the original. As the first stage (I believe you did exactly the same for NKB), the emphasis was put on machine-based data analysis to identify typos and inconsistencies (e.g. wrong kanji or era, as in cases where the romaji readings of different eras looked very similar and one was chosen over the other by mistake), rather than on manual verification. Around 600-700 records were added on top.

After exposing it on the internet I compared it with what NKB has right now. Obviously a lot of data has been added to it since 2004. My impression is that NKB has better coverage of signatures and finer details on the well-known smiths, but there are plenty of examples where one database has some details that the other doesn’t.

The Swordsmith Index is intended to be non-commercial. I’d be happy to share the full contents of the NC smith database in case you find it useful. We could even exchange data on a regular basis in order to have the best coverage available.

I spent a lot of time trying to come up with a good way of referring to the sources and making it transparent why certain corrections were made at a particular time. As you know very well yourself, Hawley contains many errors and duplicate records, and if some other source (e.g. NKB) provides different information, only a specialist in the particular smith or school could tell whether there is a typo in Hawley, NKB or some other source, or whether there are just different but totally valid opinions on the subject. On the other hand, record management may become too cluttered if every single detail has a note regarding the source. My idea so far is to use the free-text Notes field for each smith to mention any corrections and the data sources in general which were used for that particular record. There is also a Bibliography feature which allows adding standard references in an easy way. Usually I use as many sources as I have in my possession to fill in the details for a particular smith. If there is a consensus regarding times, places and provenance, then I don’t leave any notes. If different sources show different opinions, I mark it as ‘Source X says such and such and source Y has a different opinion’. I should admit it’s not always followed, but I’m trying my best. It is also fair to say that my personal knowledge of the subject is very humble, and if information about some smith leaves some doubts I’d rather not mention it at all, or else make its doubtful nature very clear.

I find your feedback very useful and if there is something else on this site which can be improved, please share it. Any criticism is welcome.

Regards,
Stan