using the Gewichteter Identitätswert (weighted identity value), derive a difference matrix for a set of locations from the sets of labels recorded for each location


Older versions of this program had a bug in the calculation of Cronbach Alpha. See bugtest for details.


giw -n number [-b percentage] [-c [-x]] [-f number] [-F] [-l filename] [-L] [-o filename] [-q] [-t] datafile(s)


-b percentage
do bootstrapping with given percentage (usually: 100)
-c
Cronbach Alpha
-f int
minimum number of occurrences required for each variant
-F
skip files with fewer than two variants, depends on option -f
-l filename
file with location labels
-L
skip locations that are not listed in the file with location labels
-n int
number of locations
-o filename
output file
-t
test all input files
-x
also give Cronbach's Alpha using old (incorrect) formula


This program is used just like leven, though it has fewer options. See the manpage of leven for a full description of how to use it.

The difference between the two programs is in how the difference between two strings in the datafiles is determined. With a set of strings A containing among others the variants A' and A'' the difference between two elements is calculated as follows:

                 leven                    giw
diff(A', A')     0                        n' / n
diff(A', A'')    Levenshtein(A', A'')     1

Levenshtein(A', A'') :
the Levenshtein value which is the smallest cost to change one string into the other
n' :
the number of times element A' is present in the set A
n :
the total number of elements in set A

As you can see, the leven program distinguishes how much non-identical strings differ, while the giw program distinguishes how infrequent identical strings are.
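The two rules for a single pair of strings can be sketched in Python as follows. This is only an illustration of the definitions above, not the code of either program: the function names are hypothetical, the Levenshtein distance assumes a unit cost for every insertion, deletion and substitution, and the way the per-pair values are combined into a difference between locations is not covered.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance, assuming unit cost per insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def leven_diff(x: str, y: str) -> int:
    """leven rule: 0 for identical strings, the edit distance otherwise."""
    return levenshtein(x, y)


def giw_diff(x: str, y: str, items: list) -> float:
    """giw rule: n' / n for identical strings, 1 otherwise.

    items is the full set A, with repeated elements included.
    """
    if x != y:
        return 1.0
    return Counter(items)[x] / len(items)


# Hypothetical set A: "bird" occurs three times, "fowl" once.
items = ["bird", "bird", "bird", "fowl"]
print(leven_diff("bird", "bird"), giw_diff("bird", "bird", items))  # 0 0.75
print(leven_diff("fowl", "fowl"), giw_diff("fowl", "fowl", items))  # 0 0.25
print(leven_diff("bird", "fowl"), giw_diff("bird", "fowl", items))  # 4 1.0
```

In this sketch, two locations that share the rare variant "fowl" come out closer under the giw rule (0.25) than two locations that share the common variant "bird" (0.75), while the leven rule gives 0 in both cases.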

You can use the giw program to analyse data files that consist of elements from distinct categories. You can also use leven to do this, if you encode each category as a distinct string of length 1. Note that giw is not just a convenient way to analyse categorical data encoded as strings longer than one character: giw and leven give different results for the same data.
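Encoding each category as a distinct string of length 1 could be done along these lines. This is a minimal sketch: the function name and the choice of alphabet are assumptions, and writing the result in the data file format that leven expects is left out.

```python
import string


def encode_categories(categories):
    """Map each distinct category to its own single character."""
    # Assumed alphabet: ASCII letters and digits, 62 symbols in total.
    alphabet = string.ascii_letters + string.digits
    distinct = sorted(set(categories))
    if len(distinct) > len(alphabet):
        raise ValueError("more categories than available characters")
    return {cat: alphabet[i] for i, cat in enumerate(distinct)}


# Hypothetical categorical data.
data = ["noun phrase", "verb", "verb", "adverb"]
mapping = encode_categories(data)
encoded = [mapping[cat] for cat in data]
print(encoded)  # ['b', 'c', 'c', 'a']
```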

You cannot use giw to analyse data in which there are almost no identical strings, such as a set of phonetic data. In that case, you have to use leven.


The program will abort if a file _CANCEL_.L04 exists in the current directory, or if it is created while the program is running. This is useful for stopping long-running calculations from a GUI, such as pyL04.
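A front end could request cancellation along these lines. This is a sketch, not the code of pyL04: the helper names are hypothetical, and it assumes that an empty file is enough and that the caller removes the file again before starting a new run.

```python
from pathlib import Path

CANCEL_FILE = "_CANCEL_.L04"


def request_cancel(workdir="."):
    """Create the cancel file in the directory where giw is running."""
    path = Path(workdir) / CANCEL_FILE
    path.touch()  # an empty file; assumed to be sufficient
    return path


def clear_cancel(workdir="."):
    """Remove a leftover cancel file so that the next run is not aborted."""
    Path(workdir, CANCEL_FILE).unlink(missing_ok=True)  # Python 3.8 or later
```

Calling clear_cancel() before launching giw avoids an immediate abort caused by a file left over from an earlier cancellation.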