4. On collecting dialect data
The most time-consuming part of dialect research is collecting and
digitising the data. The data must be available in the specific file
format that RuG/L04 can handle. If this wasn't taken into account during
digitisation, you will have to convert the data.
All files used by the software, both data files and helper files, are in a
human-readable plain-text format. This means that you can edit them with any
plain-text editor, and also that you can process them with simple tools such as
Perl scripts. If the data is in a binary format, for example stored by a
database application, you must use that application to export the data, either
directly in the format required by RuG/L04, or in another plain-text format
that you can then convert with a script.
4.1 Required helper files
In addition to the data, there are some other files you need.
You need a file with a numbered list of location names; its file format is
described in the manual. (As an example, have a look at the file PA.lbl that
was used in the previous two chapters of this tutorial.)
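Because the label file is plain text, a short script can read it. The Python sketch below assumes, without confirmation from this tutorial, that each line holds a location number, whitespace, and the location name; `read_label_file` is a hypothetical helper, not part of RuG/L04, so check the manual for the actual format before relying on it.

```python
import re

# ASSUMPTION: each line looks like "<number> <location name>"; blank lines are skipped.
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

def read_label_file(lines):
    """Return {location number: location name} from an iterable of lines."""
    labels = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '<number> <name>', got {line!r}")
        number, name = int(match.group(1)), match.group(2)
        if number in labels:
            raise ValueError(f"line {lineno}: duplicate location number {number}")
        labels[number] = name
    return labels
```

Usage, under the same assumption: `with open("PA.lbl", encoding="utf-8") as f: labels = read_label_file(f)`. The encoding shown is also an assumption.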
If you want to compute local incoherence (see part 6), you'll need a file with
the coordinates of the locations. See the description of the coordinate file
for its format. You also need this file if you want to draw maps. (Example: the
file PA.coo used in the previous chapters.)
Other helper files are used only for drawing maps, and are discussed in
part 5 of the tutorial.
Below is a discussion of the file format of the actual dialect data.
4.2 Data files
Dialect data consists of variants of a series of words (or word groups) as they
are recorded in a series of locations. This data is divided over a set of
files, one file per word: each file holds all variants of that word, for all
locations. This is illustrated in the diagram below:
[Diagram: correct distribution of data among files]
An overview of the file format for individual files is given in the manual.
Note that, in the diagram above, each location has exactly one variant. This
is not required: for a single word, a location can have more than one variant,
or none at all.
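To make this concrete, here is a small Python sketch of how the data for one word could be held in memory when a location may have zero, one or several variants. The helper names and sample words are hypothetical illustrations; they say nothing about how RuG/L04 itself stores the data.

```python
from collections import defaultdict

def group_variants(records):
    """Group (location, variant) pairs recorded for one word.

    Returns {location: [variant, ...]}. A location may receive several
    variants; a location without any record gets no entry at all.
    """
    by_location = defaultdict(list)
    for location, variant in records:
        by_location[location].append(variant)
    return dict(by_location)


def locations_without_data(all_locations, by_location):
    """List the locations (e.g. taken from the label file) that have no variant."""
    return [loc for loc in all_locations if loc not in by_location]


# Hypothetical records for one word: two variants in A, one in B, none in C.
records = [("A", "huis"), ("A", "hoes"), ("B", "huus")]
grouped = group_variants(records)
print(grouped)                                            # {'A': ['huis', 'hoes'], 'B': ['huus']}
print(locations_without_data(["A", "B", "C"], grouped))  # ['C']
```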
4.2.1 Reordering data files
Suppose you have all the data neatly distributed over a set of files, but
instead of one file per word, holding all variants of that word for all
locations, you have one file per location, holding all variants of all words
for that location, like this:
[Diagram: incorrect distribution of data among files]
In this case, you need to redistribute the data. For this, you can use the
program. Here too, the data must be in a specific format to be handled, as
explained in the manual.
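Conceptually, the redistribution is a regrouping of a nested mapping: from location-then-word to word-then-location. The Python sketch below assumes the per-location files have already been read into a nested dict; reading and writing the actual RuG/L04 file formats is left out, and `per_location_to_per_word` is a hypothetical helper, not the program mentioned above.

```python
from collections import defaultdict

def per_location_to_per_word(by_location):
    """Regroup {location: {word: [variants]}} into {word: {location: [variants]}}.

    Empty variant lists are dropped, so a location without data for a word
    simply does not appear in that word's entry.
    """
    by_word = defaultdict(dict)
    for location, words in by_location.items():
        for word, variants in words.items():
            if variants:
                by_word[word][location] = list(variants)
    return dict(by_word)


# Hypothetical input: one entry per location file.
by_location = {
    "Loc1": {"milk": ["melk"], "house": ["huis", "hoes"]},
    "Loc2": {"milk": ["molk"], "house": []},
}
for word, locations in per_location_to_per_word(by_location).items():
    print(word, locations)
# milk {'Loc1': ['melk'], 'Loc2': ['molk']}
# house {'Loc1': ['huis', 'hoes']}
```

Each entry of the result then corresponds to one data file in the correct distribution.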
4.2.2 Splitting a single spreadsheet into separate files
If you have the data in a single spreadsheet, then you can use the
program to split the data into a
set of separate files. First, save the spreadsheet as a tab-delimited or a
comma-delimited text file.
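As an illustration of what such a split involves, here is a minimal Python sketch using the standard `csv` module. It assumes a hypothetical spreadsheet layout (one header row of word names, one row per location, several variants in a cell separated by `/`); `split_spreadsheet` and that layout are assumptions, not the behaviour of the RuG/L04 program.

```python
import csv
import io

def split_spreadsheet(stream, delimiter="\t", variant_sep="/"):
    """Split an exported spreadsheet into per-word data.

    ASSUMED layout (hypothetical): the first row is a header cell followed by
    one word per column; each further row starts with a location name, then
    holds that location's variant(s) for each word. Several variants in one
    cell are separated by variant_sep; an empty cell means no variant.

    Returns {word: {location: [variant, ...]}}.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader)
    words = [w.strip() for w in header[1:]]
    per_word = {word: {} for word in words}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a location name
        location = row[0].strip()
        # zip() stops at the shorter sequence, so missing trailing cells count as empty
        for word, cell in zip(words, row[1:]):
            variants = [v.strip() for v in cell.split(variant_sep) if v.strip()]
            if variants:
                per_word[word][location] = variants
    return per_word


sample = "location\tmilk\thouse\nLoc1\tmelk\thuis/hoes\nLoc2\tmolk\t\n"
for word, locations in split_spreadsheet(io.StringIO(sample)).items():
    print(word, locations)
# milk {'Loc1': ['melk'], 'Loc2': ['molk']}
# house {'Loc1': ['huis', 'hoes']}
```

For a real export, open the file with `open(path, newline="")` as the `csv` documentation recommends, pass `delimiter=","` for a comma-delimited file, and then write each word's entry to its own file in the format described in the manual.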