
Relational fuzzy clustering has been developed for extracting intrinsic cluster structures of relational data and was extended to a linear fuzzy clustering model based on Fuzzy

Relational fuzzy clustering is a relational extension of fuzzy clustering for revealing cluster structures buried in relational data. Relational Fuzzy

Linear fuzzy clustering models [

In this paper, a comparative study on the applicability of

The remaining part of this paper is organized as follows. In Section

Assume that we have

Besides point-type prototypes

This linear clustering model has close relation with local PCA [

When

RFCM [

In order to modify RFCM for handling non-Euclidean distance metrics, Hathaway and Bezdek [

Assume that

With fixed fuzzy memberships

This linear fuzzy clustering model was also extended to the 2D prototype case by spanning 2D prototypical planes using three medoids [

Although non-Euclidean relational data may bring negative values for the clustering criteria of (

Yamamoto et al

A sample procedure including the automated

Set

Calculate the clustering criteria

If there is at least one object that has

Update fuzzy memberships by (

Search medoids in each cluster.

Repeat Steps

In Step

Although the proposed model is in the fuzzy clustering category, it is easily seen that a hard (non-fuzzy) version can be covered when

Hathaway and Bezdek [

This paper considers the applicability of several imputation techniques in FCMdd-type linear clustering.

Hathaway and Bezdek [

The triangle inequality is also represented as follows:

The two imputation values above can also be combined to obtain a reasonable estimate of a missing value: each missing element is imputed with the average of its minimax TIBA and maximin TIBA values. This variant is called average TIBA.
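The three imputation variants can be illustrated with a minimal sketch. It assumes that the relational matrix stores plain (not squared) dissimilarities, that missing entries are marked with NaN, and that the two bounds are the standard consequences of the triangle inequality: the smallest sum over a third object (an upper bound) and the largest absolute difference over a third object (a lower bound). The function name `tiba_impute` and the neutral mode names `"upper"`, `"lower"`, and `"average"` are hypothetical and are not taken from the original papers.

```python
import numpy as np

def tiba_impute(D, mode="average"):
    """Fill NaN entries of a dissimilarity matrix using triangle-inequality bounds.

    Assumption: D holds plain dissimilarities (not squared) with NaN for missing.
    """
    D = np.asarray(D, dtype=float)
    out = D.copy()
    rows, cols = np.where(np.isnan(D))
    for j, k in zip(rows, cols):
        # Third objects l for which both D[j, l] and D[l, k] are observed.
        usable = ~np.isnan(D[j, :]) & ~np.isnan(D[:, k])
        if not usable.any():
            continue  # no bound available; leave the entry missing
        upper = (D[j, usable] + D[usable, k]).min()        # d_jk <= d_jl + d_lk
        lower = np.abs(D[j, usable] - D[usable, k]).max()  # d_jk >= |d_jl - d_lk|
        if mode == "upper":
            out[j, k] = upper
        elif mode == "lower":
            out[j, k] = lower
        elif mode == "average":
            out[j, k] = 0.5 * (upper + lower)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out

# Three points on a line at positions 0, 1, and 3; the 0-3 distance is withheld.
D = np.array([[0.0, 1.0, np.nan],
              [1.0, 0.0, 2.0],
              [np.nan, 2.0, 0.0]])
filled = tiba_impute(D, mode="average")
```

In this toy example the only usable third object is the middle point, so the upper bound is 3, the lower bound is 1, and the averaged estimate is 2. All bounds are computed from the observed entries only, so the result does not depend on the order in which missing entries are visited.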

These imputation techniques based on triangle inequalities can be easily applied to relational clustering problems. In the next section, these three imputation approaches are compared in FCMdd-type linear clustering tasks in conjunction with

Two experimental results are shown in order to consider the applicability of the three TIBA imputation techniques in FCMdd-type linear clustering with

In previous research, it has been shown that “soft” clustering models outperformed “hard” ones in local PCA tasks [

An artificial relational data set composed of 60 patterns was generated from a 2D data set shown in Figure

2D plots of artificial data set.

In the previous research [

First, Euclidean incomplete relational data matrices were generated by removing a part of off-diagonal elements where

Clustering results are compared with those without

Comparison of cluster partitions from relational data imputed by three TIBAs, with (left)/without (right)

Panels: Minimax TIBA (no. missing: 500); Maximin TIBA (no. missing: 400); Average TIBA (no. missing: 500).

Each approximation method with

On the other hand, without

These results imply that the FCMdd-type linear clustering can successfully extract linear substructures of incomplete Euclidean relational data using

Second, FCMdd-type linear fuzzy clustering was applied to non-Euclidean relational data.

Incomplete relational data matrices were generated in the same manner as in the Euclidean case. Clustering results are depicted in Figure

Comparison of cluster partition from relational data imputed by three TIBAs, with (left)/without (right)

Panels: Minimax TIBA (no. missing: 1000); Maximin TIBA (no. missing: 700); Average TIBA (no. missing: 1100).

With

Without

In this way,

In the second experiment, the TIBA imputation methods are compared in a document classification task. A relational data set was generated from the famous Japanese novel “Kokoro” by Soseki Natsume. The novel is composed of 3 chapters (Sensei and I, My Parents and I, Sensei and His Testament), which include 36, 18, and 56 sections, respectively. The text data (in Japanese) can be downloaded from Aozora Bunko (

Document-keyword biplots [

Two relational data matrices were generated considering co-occurrence information of the 10 keywords. The Jaccard coefficient and the Dice coefficient are similarity measures for asymmetric information on binary variables [

2 × 2 contingency table for text documents.

| keyword A \ keyword B | 1 | 0 | Total |
|---|---|---|---|
| 1 | a | b | a + b |
| 0 | c | d | c + d |
| Total | a + c | b + d | |
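As a small illustration of how the cell counts in the contingency table above feed into the two similarity measures discussed next, the following sketch computes a, b, c, and d from two binary indicator vectors and then evaluates the textbook forms of the Jaccard coefficient, a / (a + b + c), and the Dice coefficient, 2a / (2a + b + c). The function names and the convention of returning 0.0 when a + b + c = 0 are assumptions made for this example only.

```python
def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary sequences x and y."""
    a = sum(1 for u, v in zip(x, y) if u and v)          # both present
    b = sum(1 for u, v in zip(x, y) if u and not v)      # only in x
    c = sum(1 for u, v in zip(x, y) if not u and v)      # only in y
    d = sum(1 for u, v in zip(x, y) if not u and not v)  # absent in both
    return a, b, c, d

def jaccard(a, b, c):
    """Jaccard coefficient; joint absences (d) are ignored."""
    denom = a + b + c
    return a / denom if denom else 0.0  # assumed convention for empty vectors

def dice(a, b, c):
    """Dice coefficient; joint presences (a) are counted twice."""
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0  # assumed convention for empty vectors

x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
a, b, c, d = contingency_counts(x, y)
s_jaccard = jaccard(a, b, c)
s_dice = dice(a, b, c)
```

Both measures ignore the joint-absence count d, which is why they suit asymmetric binary information: two items are not considered similar merely because the same keywords are absent from both.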

Jaccard's coefficient is the similarity represented as

Dice's coefficient is also the similarity represented as

Because the linear clustering model uses distance (dis-similarity) measures, the similarity measures

Before applying the FCMdd-based linear fuzzy clustering, randomly selected elements were withheld from the relational matrix with 11,772 elements and were imputed by the three TIBA methods. Then, the cluster partitions for Jaccard's index were derived as shown in Figure

Comparison of cluster partition from incomplete “Kokoro” text data derived with Jaccard’s coefficient imputed by three TIBAs.

Panels: Minimax TIBA (no. missing: 6200); Maximin TIBA (no. missing: 8000); Average TIBA (no. missing: 7200).

Minimax TIBA tolerated 50% missing values or fewer. Average TIBA tolerated 60% missing values or fewer. Maximin TIBA resulted in a good partition with 68% missing values or fewer. The parameters

Clustering results for Dice coefficient are depicted in Figure

Comparison of cluster partition from incomplete “Kokoro” text data derived with Dice’s coefficient imputed by three TIBAs.

Panels: Minimax TIBA (no. missing: 5600); Maximin TIBA (no. missing: 7400); Average TIBA (no. missing: 6600).

In the experiments, it was demonstrated that the TIBA imputation methods work well for incomplete non-Euclidean relational data in conjunction with

Finally, a comparison with other methods is discussed. Although many clustering algorithms already exist, some of which are used in document clustering tasks [

Comparison of cluster partition of Fuzzy

Panels: Jaccard (no. missing: 8000); Dice (no. missing: 7400).

On the other hand, the proposed method is designed for a different purpose: finding “local linear structures” from the viewpoint of local PCA, which is useful for cluster-wise information summarization such as local feature map construction. In this sense, the proposed method targets a different application area than conventional clustering tools.

This paper compared the applicability of TIBA imputation methods and

From the viewpoint of local PCA, the proposed method can be used for local information summarization or local feature map construction, where data structures are visually summarized in a low-dimensional space in conjunction with data clustering. This application is left for future work. Another possible direction is an extension to multidimensional prototype models, which would be useful for constructing 2D feature maps.

This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology, Japan, under Grant-in-Aid for Scientific Research (23500283).