STATISTICAL ESTIMATION OF CHRONOLOGICAL NEARNESS OF HISTORICAL TEXTS

V. V. Fedorov and A. T. Fomenko

1. The "principle of correlation of maxima" of the plots of the volume of historical texts was first formulated and tested in [1] for the case of a uniform distribution (see also [2]). This principle and the related method of dating events described in historical texts (with a time scale) proved necessary in various chronological investigations carried out in [1-3, 10-16]. The importance of the results obtained in these papers, especially those relying on the principle of correlation of maxima (see the corresponding formulation below or in [1, 2, 10, 16]), shows the utility of testing the stability of this principle, and of the corresponding method of dating events, with respect to other procedures of statistical processing of the volume functions of texts. All the volume functions of historical texts used in this paper have been calculated in [1, 10].

2. Let us recall the principle of correlation of maxima and the method of dating of events based on it.

Suppose that a historical period from the year A to the year B in the history of a region G (i.e., a state, a city, etc.) has been described in a fairly comprehensive year-by-year text X (chronicles, annals, etc.), i.e., the text X is split into sections or "chapters" X(T), each of which describes the events of one year T. We calculate the volume H(X, T) of each such section X(T), measured, for example, by the number of lines (or words, or symbols, or pages, etc.) (see Fig. 1). For another year-by-year text Y that describes this same time interval (A, B) and this same region G, the corresponding plot of H(Y, T) will in general have a different form, since the distribution of the text volume is considerably affected by the personal interests of the chroniclers (the authors of the texts). For example, the chronicle X of the history of art and the military chronicle Y will be written with an entirely different emphasis. To what extent are these differences essential, i.e., do there exist characteristics of the volume functions that are determined only by the time interval (A, B) and the region G and that unambiguously specify all (or almost all) the texts describing this interval of time?

[Fig. 1. Volume plots; panel labels: Livy, Sergeev.]

Translated from Problemy Ustoichivosti Stokhasticheskikh Modelei -- Trudy Seminara, pp. 101-107, 1983.

0090-4104/86/3206-0668$12.50 © 1986 Plenum Publishing Corporation

It is found that an important characteristic of such a plot consists of the years in which the plot has local maxima (see [1, 10]). For simplicity we shall assume that the latter are nondegenerate, i.e., reached locally at one point. The local maxima of the function H(X, T) indicate the "years described in detail" in the time interval (A, B). Let C(T) be the volume of all the texts written about the year T by contemporaries (i.e., persons living at that time). The plot of C(T) is not known to us, since the texts are lost over the years, and the information vanishes. The following model of loss of information is constructed in [1, 10]: for the years about which a very large number of texts have been written, the number of texts that have been preserved will also be larger than usual. In such a form it is difficult to test the model, since we do not know the plot of C(T). However, it is possible to test one of the consequences of this model: since later chroniclers X and Y, who describe this same period (A, B), are no longer contemporaries of these ancient events, they must rely on more or less the same collection of texts passed over to them, so that they must ("on the average") describe in more detail the years for which more texts have been preserved, and in less detail the years about which little is known (a small number of texts are available).

The principle of correlation of maxima of the volume functions of texts X and Y has been formulated in [1, 2, 10] as follows: The plots of the volume of "chapters" for correlated texts X and Y (i.e., texts which describe the same period of time (A, B) and the same region G) must reach local maxima simultaneously in the interval (A, B), i.e., the years described in detail in X and the years described in detail in Y must be either close to one another, or they must coincide.
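The notion of "years described in detail" can be illustrated with a minimal Python sketch. The volume data below are invented for illustration and are not taken from [1, 10]; the function name `local_maxima` is hypothetical, and the sketch assumes one volume value per consecutive year and treats only interior years as candidate maxima.

```python
from typing import Dict, List


def local_maxima(volume: Dict[int, float]) -> List[int]:
    """Return the years at which the volume function has a strict local maximum.

    Only interior years are examined, and the comparison is strict, which
    matches the assumption that maxima are nondegenerate (reached at one point).
    """
    years = sorted(volume)
    peaks = []
    for prev, cur, nxt in zip(years, years[1:], years[2:]):
        if volume[cur] > volume[prev] and volume[cur] > volume[nxt]:
            peaks.append(cur)
    return peaks


# Invented volumes (e.g., lines per year) of two texts on the same period.
H_X = {1: 10, 2: 40, 3: 15, 4: 12, 5: 30, 6: 8, 7: 9, 8: 5}
H_Y = {1: 3, 2: 9, 3: 4, 4: 2, 5: 2, 6: 7, 7: 1, 8: 1}

peaks_X = local_maxima(H_X)  # years described in detail in X
peaks_Y = local_maxima(H_Y)  # years described in detail in Y
print(peaks_X, peaks_Y)
```

In this invented example the absolute volumes of the two texts differ greatly, while the peak years can still be compared directly.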

In contrast, if the texts X and Y are independent, i.e., they describe either quite different historical periods (A, B) and (C, D) of the same length, or different regions, then the plots of the volume functions H(X, T) and H(Y, T) will reach local maxima at different points [provided that we let the segments (A, B) and (C, D) overlap]. This principle of correlation of maxima can be substantiated if for the majority of pairs of actual correlated historical texts X and Y, i.e., texts which describe practically the same events, the volume functions of the "chapters" for X and Y reach their maxima in roughly the same years. Here the values of the maxima may differ considerably. In contrast, for actual independent texts there must be no correlation whatsoever of the points of the maxima. In actual fact, in [1, 2, 10] the comparison was carried out not just with two texts, but with two groups of texts, and the averaged plot of the volume was calculated for each group.

3. It is evident that for actual plots of volumes of correlated texts the simultaneity of their peaks will occur only approximately.

For estimating the degree of simultaneity with which two volume functions reach their maxima, it is necessary to introduce a natural measure that makes it possible to estimate numerically the mismatch of the points of the maxima. Such a measure can be introduced by different methods. It is required that it should distinguish reliably between pairs of dependent (correlated) texts and pairs of independent texts. It is found that such measures exist (which is not self-evident). The first method has been proposed in [1, 2, 10]. Let us briefly describe this measure. The points at which the function H(X, T) reaches its maxima divide the segment (A, B) into smaller parts.
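The division of the segment (A, B) by the points of the maxima can be sketched as follows. This is only an illustration under stated assumptions: the function name `segment_lengths` and the sample years are hypothetical, and the maxima are assumed to lie strictly inside (A, B).

```python
from typing import List


def segment_lengths(A: int, B: int, peaks: List[int]) -> List[int]:
    """Lengths, in years, of the parts into which the peak years divide (A, B)."""
    if any(not A < t < B for t in peaks):
        raise ValueError("every peak year must lie strictly inside (A, B)")
    cuts = [A] + sorted(peaks) + [B]
    return [right - left for left, right in zip(cuts, cuts[1:])]


# Invented example: a period of 10 years with maxima in the years 2, 5, and 7.
a_X = segment_lengths(0, 10, [2, 5, 7])
print(a_X, sum(a_X))
```

By construction the lengths always sum to B - A, whatever the number of maxima.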

By measuring their lengths in years, we obtain a sequence $(a_1, a_2, \ldots, a_p)$ of integers, i.e., an integer-valued vector a(X) in a Euclidean space $R^p$ of dimension p. Thus, a text X that describes a period (A, B) specifies a single vector a(X). For another text Y that describes a period of the same length, we obtain in general a different vector $a(Y) = (b_1, b_2, \ldots, b_q)$, where the number q can differ from the number p (a different number of maxima). It can nevertheless be assumed that the number of maxima is the same. Let p > q; then some maxima of the plot of the function H(Y, T) are assumed to be multiple, i.e., we assume that some of the maxima of H(Y, T) coalesce at a single point. This means that we are adjoining p - q maxima to those of H(Y, T) (the parts of the segment between coalesced maxima have zero length). It is evident that such a procedure is not single-valued. Let v be a version of introducing such multiplicities. Since $\sum_{i=1}^{p} a_i = \sum_{i=1}^{p} b_i = B - A$, it can be assumed that the ends of the two vectors lie in the same simplex $\sigma^{p-1}$ of dimension p - 1, which is defined in the space $R^p$ by the equation $\sum_{i=1}^{p} x_i = B - A$ (see Fig. 2).

[Fig. 2. The simplex σ and the region D ∩ σ.]

Let l be the length of the vector a(Y) - a(X). Let us write $p_v(X, Y) = \mathrm{vol}(D \cap \sigma) / \mathrm{vol}\,\sigma$, where D is a ball in $R^p$ with a radius l, centered at the point a(X), and $D \cap \sigma$ is its intersection with the simplex. If C is a measurable subset in σ, then vol C will be either the Euclidean (p - 1)-dimensional volume of C (the continuous case), or the number of integer points, i.e., of points with integer coordinates, in C (the discrete case). Finally, we shall write p(X, Y) = k(X, Y) + k(Y, X), where $k(X, Y) = \min_v p_v(X, Y)$, i.e., the minimum is taken over all the faces of the simplex σ in the case that the original numbers of maxima were different.
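For very small examples, the discrete case of this measure can be evaluated by brute force. The sketch below is not the computation used in [1, 2, 10]; it assumes that the ball D is closed, that the integer points of the simplex are the vectors with nonnegative integer coordinates summing to B - A, and that a "version" v pads the shorter vector with zero-length parts. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def simplex_points(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """All nonnegative integer vectors of the given length that sum to total."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in simplex_points(total - first, parts - 1):
            yield (first,) + rest


def sq_dist(u: Sequence[int], v: Sequence[int]) -> int:
    """Squared Euclidean distance (kept as an integer to avoid rounding)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def p_v(a: Sequence[int], b: Sequence[int]) -> Fraction:
    """Share of integer simplex points within distance |b - a| of a."""
    if len(a) != len(b) or sum(a) != sum(b):
        raise ValueError("vectors must have equal length and equal sum")
    radius_sq = sq_dist(a, b)
    inside = total = 0
    for x in simplex_points(sum(a), len(a)):
        total += 1
        if sq_dist(x, a) <= radius_sq:  # closed ball (an assumption)
            inside += 1
    return Fraction(inside, total)


def versions(b: Sequence[int], p: int) -> Iterator[Tuple[int, ...]]:
    """All ways to pad b with zero-length parts up to length p."""
    for zero_positions in combinations(range(p), p - len(b)):
        it = iter(b)
        yield tuple(0 if i in zero_positions else next(it) for i in range(p))


def k(x: Sequence[int], y: Sequence[int]) -> Fraction:
    """Minimum of p_v over all versions; the ball is centered at (padded) x."""
    p = max(len(x), len(y))
    return min(p_v(vx, vy) for vx in versions(x, p) for vy in versions(y, p))


a = (2, 3, 5)   # invented a(X): a period of 10 years with two maxima
b = (4, 6)      # invented a(Y): the same length with one maximum
print(k(a, b), k(a, b) + k(b, a))
```

The enumeration grows quickly with B - A and p, so this is suitable only for checking the definitions on toy data.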

In the continuous case it is easy

to verify that p--I

pc,(X, Y )
