Research

My current research mainly focuses on data profiling, graph analysis algorithms, diffusion networks, data mining, and databases. Currently, I am applying data profiling techniques to traditional databases to support and improve their algorithms.

Brief introductions to some of my past research projects are listed below.

Research on Recommender Systems for Further Education

October 01, 2018

Research project, Nanyang Technological University, Singapore

Recently, much research has been done on many types of recommender systems, including job, movie, and commodity recommendation. However, although many students need them, recommender systems oriented toward further education are rarely discussed. In this paper, we propose a preliminary recommender system for further education that is based on profile comparison and recommends professors to students. To compare the profiles of students and professors, we propose a distance function based on word2vec and doc2vec. Moreover, depending on the amount of historical data collected during operation, we divide the model into two stages, each of which uses a different method to generate recommendations.
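The profile comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual distance function: it assumes each profile has already been embedded (for example, a doc2vec vector for the whole profile and word2vec vectors for its keywords), and the names `Profile`, `profile_distance`, `recommend`, and the blending weight `alpha` are hypothetical. It covers only ranking by profile distance, not the second, history-based stage.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Profile:
    name: str
    doc_vec: List[float]          # assumed: doc2vec embedding of the whole profile
    word_vecs: List[List[float]]  # assumed: word2vec embeddings of its keywords


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def profile_distance(a: Profile, b: Profile, alpha: float = 0.5) -> float:
    """Hypothetical blend of a document-level and a word-level distance."""
    doc_d = cosine_distance(a.doc_vec, b.doc_vec)
    word_d = cosine_distance(mean_vector(a.word_vecs), mean_vector(b.word_vecs))
    return alpha * doc_d + (1.0 - alpha) * word_d


def recommend(student: Profile, professors: List[Profile], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k professors closest to the student, nearest first."""
    scored = [(p.name, profile_distance(student, p)) for p in professors]
    scored.sort(key=lambda item: item[1])
    return scored[:k]
```

With toy two-dimensional vectors standing in for real embeddings, `recommend(student, professors, k=1)` returns the professor whose vectors point in the direction most similar to the student's. Cosine distance is used here because it compares the direction of embeddings and ignores their length.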

Research on Diffusion Network Inference without Temporal Information

February 01, 2018

Research project, Wuhan University, Wuhan, China

Previous approaches have mainly focused on network inference from temporal data. However, it is questionable whether temporal information is precise in the real world. Unlike social networks, many networks keep no record of the precise time of infection. For example, in a disease-spreading network, retracing the exact infection times is very time-consuming; even when temporal information is available, it is likely to be inaccurate and misleading, because people react differently to the same disease and the time at which they seek medical help can be strongly influenced by subjective factors. The same problem arises in other kinds of diffusion networks. The second drawback of existing approaches is that they are slow: run on a laptop, they usually take several days, so applying them requires a high-performance computer or server.

Research on Arbitrarily Shaped Clustering

October 01, 2017

Research project, Wuhan University, Wuhan, China

Arbitrarily shaped clustering is an important research topic in data mining. Most existing approaches cannot achieve both good clustering quality and scalability on large high-dimensional datasets. To meet the clustering requirements of such datasets, some researchers have proposed extracting the backbone of each cluster and then clustering only the backbone. This reduces running time to some extent; however, the saving depends on the proportion of the dataset made up by the representatives that form the backbone, and this proportion varies from dataset to dataset. In this paper, we put forward an efficient clustering method based on representative sampling and boundary similarity. It consists of three steps: first, sample representatives so that they are distributed evenly and continuously; second, iteratively adjust the positions of the representatives to move them closer to their k nearest neighbors; finally, perform agglomerative clustering based on boundary similarity. We conduct extensive experiments on synthetic and real-world datasets and compare our method with other methods. The experimental results demonstrate the effectiveness and efficiency of our method.
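The three steps above can be sketched in Python. Every step here is a hypothetical stand-in, since the summary does not specify the exact rules: greedy radius-based sampling for the representatives, moving each representative to the mean of its k nearest data points, and, in place of the unspecified boundary similarity, merging two groups whenever any pair of their representatives lies within `merge_eps` (threshold single-linkage implemented with union-find). The function names and parameters (`radius`, `k`, `merge_eps`, `iterations`) are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_representatives(points: List[Point], radius: float) -> List[Point]:
    """Step 1 (stand-in): greedy sampling so that no two representatives are
    closer than `radius` and every point lies within `radius` of one."""
    reps: List[Point] = []
    for p in points:
        if all(dist(p, r) >= radius for r in reps):
            reps.append(p)
    return reps


def adjust_representatives(reps: List[Point], points: List[Point],
                           k: int, iterations: int = 5) -> List[Point]:
    """Step 2 (stand-in): repeatedly move each representative to the mean
    of its k nearest data points."""
    for _ in range(iterations):
        new_reps = []
        for r in reps:
            nearest = sorted(points, key=lambda p: dist(p, r))[:k]
            new_reps.append(tuple(sum(c) / len(nearest) for c in zip(*nearest)))
        reps = new_reps
    return reps


def cluster(points: List[Point], radius: float, k: int,
            merge_eps: float, iterations: int = 5) -> List[int]:
    """Return one cluster label per input point."""
    reps = adjust_representatives(sample_representatives(points, radius),
                                  points, k, iterations)
    parent = list(range(len(reps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 3 (stand-in for boundary similarity): union representatives
    # that are within merge_eps of each other.
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if dist(reps[i], reps[j]) <= merge_eps:
                parent[find(i)] = find(j)

    # Each point inherits the cluster of its nearest representative.
    labels_by_root: Dict[int, int] = {}
    labels = []
    for p in points:
        nearest = min(range(len(reps)), key=lambda i: dist(p, reps[i]))
        root = find(nearest)
        labels.append(labels_by_root.setdefault(root, len(labels_by_root)))
    return labels
```

Because clusters grow by chaining nearby representatives, the merge step can follow elongated or non-convex shapes, and its pairwise comparisons run over the representatives rather than over every point.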

Location-Based Searching Algorithm

August 01, 2017

Research project, Institute of Software, Chinese Academy of Sciences, Beijing, China

Location-based services (LBS) have emerged with the rapid rise of big data and the Internet. They provide a new service and business model for Internet enterprises and give users a more pleasant and convenient Internet experience. Internet service platforms for taxis and other vehicles, for instance, provide location-based “taxi” services to users. On the surface, the user only needs to open the app and tap a destination to “call” the idle taxi nearest to their current location; in fact, after receiving the client’s request and current location, the server finds the nearest taxi using a specific algorithm and returns its information to the client (usually a mobile device such as a phone).
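One common way to implement the server-side “nearest idle taxi” lookup is a uniform grid index searched ring by ring outward from the caller’s cell. The sketch below is an assumption for illustration, not the project’s algorithm: it treats positions as planar coordinates in meters (real GPS coordinates would first need projecting, or a great-circle distance), and `Taxi`, `GridIndex`, `ring_cells`, and `cell_size` are hypothetical names.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class Taxi:
    taxi_id: str
    x: float  # planar position in meters (assumed already projected)
    y: float
    idle: bool = True


def ring_cells(cx: int, cy: int, r: int) -> Iterator[Cell]:
    """Yield the cells at Chebyshev distance exactly r from (cx, cy)."""
    if r == 0:
        yield (cx, cy)
        return
    for dx in range(-r, r + 1):      # top and bottom rows, corners included
        yield (cx + dx, cy - r)
        yield (cx + dx, cy + r)
    for dy in range(-r + 1, r):      # left and right columns, corners excluded
        yield (cx - r, cy + dy)
        yield (cx + r, cy + dy)


class GridIndex:
    """Uniform grid: each square cell of side `cell_size` lists the taxis in it."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, List[Taxi]] = defaultdict(list)
        self.count = 0

    def _cell(self, x: float, y: float) -> Cell:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, taxi: Taxi) -> None:
        self.cells[self._cell(taxi.x, taxi.y)].append(taxi)
        self.count += 1

    def nearest_idle(self, x: float, y: float) -> Optional[Tuple[Taxi, float]]:
        """Return (taxi, distance) for the closest idle taxi, or None if none exists."""
        cx, cy = self._cell(x, y)
        best: Optional[Taxi] = None
        best_d = math.inf
        seen, r = 0, 0
        while seen < self.count:  # stop once every taxi has been examined
            for cell in ring_cells(cx, cy, r):
                for taxi in self.cells.get(cell, ()):
                    seen += 1
                    if not taxi.idle:
                        continue
                    d = math.hypot(taxi.x - x, taxi.y - y)
                    if d < best_d:
                        best, best_d = taxi, d
            # Taxis in rings r + 1 and beyond are at least r * cell_size away,
            # so a candidate at least this close cannot be beaten.
            if best is not None and best_d <= r * self.cell_size:
                break
            r += 1
        return None if best is None else (best, best_d)
```

The early exit is what makes the grid worthwhile: any taxi in a cell not yet visited is at least `r * cell_size` away, so once the best candidate is that close, the remaining cells can be skipped instead of scanning every taxi.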