# Publications

## Understanding Fashion Trends from Street Photos via Neighbor-Constrained Embedding Learning.

Published in ACM Multimedia, 2017

Recommended citation: Gu, Xiaoling, Yongkang Wong, Pai Peng, Lidan Shou, Gang Chen, and Mohan S. Kankanhalli. "Understanding Fashion Trends from Street Photos via Neighbor-Constrained Embedding Learning." In Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017.

## iGlasses: A Novel Recommendation System for Best-fit Glasses.

Published in SIGIR, 2016

We demonstrate iGlasses, a novel recommendation system that accepts a frontal face photo as input and returns the best-fit eyeglasses as output. As conventional recommendation techniques such as collaborative filtering are inapplicable to this problem, we propose a new recommendation method that exploits the implicit matching rules between human faces and eyeglasses. We first define fine-grained attributes for human faces and eyeglass frames, respectively. Then, we develop a recommendation framework based on a probabilistic graphical model, which effectively captures the correlation among these fine-grained attributes. Frames (glasses) are ranked by their similarity to the query facial attributes. Finally, we produce a synthesized image of the input face to demonstrate the visual effect of wearing the recommended glasses.
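The attribute-based ranking step can be sketched as follows; the attribute names and the cosine-similarity measure are illustrative assumptions, not details taken from the paper:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two attribute vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_frames(query_attrs, frames):
    # frames maps a frame id to its attribute vector; most similar first.
    return sorted(frames, key=lambda f: cosine(query_attrs, frames[f]), reverse=True)

# Hypothetical fine-grained attributes, e.g. (face roundness, face width, skin tone).
query = [0.9, 0.2, 0.5]
frames = {"round-frame": [0.8, 0.3, 0.5], "square-frame": [0.1, 0.9, 0.2]}
print(rank_frames(query, frames))  # round-frame ranks first
```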

Recommended citation: Gu, Xiaoling, Lidan Shou, Pai Peng, Ke Chen, Sai Wu, and Gang Chen. "iGlasses: A Novel Recommendation System for Best-fit Glasses." In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 1109-1112. ACM, 2016.

## DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets.

Published in CIKM, 2015

In this work, we present a novel project called DeepCamera (DC) for recognizing places-of-interest (POIs) with smartphones. Our framework is based on deep convolutional neural networks (ConvNets), which are currently the state-of-the-art solution to vision recognition tasks such as ours. We propose a novel ConvNet that introduces a new layer, called the “spatial layer”, which captures spatial knowledge from a geographic view. As a result, both spatial and visual knowledge contribute to a hybrid probability distribution over all possible POI candidates. Furthermore, we compress multiple trained deep ConvNets into a single shallow net, called “shNet”, which achieves performance competitive with ensemble methods. Our preliminary experiments on a real-world dataset show promising POI recognition results.
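One simple way to form such a hybrid distribution is sketched below; the weighted geometric mean and the weight `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
def hybrid_distribution(visual, spatial, alpha=0.5):
    # Blend a visual distribution (e.g. a ConvNet softmax) with a spatial
    # one (e.g. from a spatial layer) over the same POI candidates via a
    # weighted geometric mean, then renormalize so it sums to 1.
    scores = {poi: (visual[poi] ** alpha) * (spatial[poi] ** (1 - alpha))
              for poi in visual}
    total = sum(scores.values())
    return {poi: s / total for poi, s in scores.items()}

visual = {"cafe": 0.7, "museum": 0.3}
spatial = {"cafe": 0.2, "museum": 0.8}
print(hybrid_distribution(visual, spatial))  # spatial evidence shifts mass to "museum"
```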

Recommended citation: Peng, Pai, Hongxiang Chen, Lidan Shou, Ke Chen, Gang Chen, and Chang Xu. "DeepCamera: A Unified Framework for Recognizing Places-of-Interest based on Deep ConvNets." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1891-1894. ACM, 2015.

## E2C2: efficient and effective camera calibration in indoor environments.

Published in UbiComp, 2015

Camera calibration helps users better interact with their surrounding environment. In this work, we aim to accelerate camera calibration in an indoor setting by selecting a small but sufficient set of keypoints. Our framework consists of two phases. In the offline phase, we cluster photos labeled with Wi-Fi and gyroscope data according to a learned distance metric; photos in each cluster form a “co-scene”, and we select a few frequently appearing keypoints in each co-scene as “useful keypoints” (UKPs). In the online phase, when a query is issued, only UKPs from the nearest co-scene are selected, and we then infer the extrinsic camera parameters with the multiple-view geometry (MVG) technique. Experimental results show that our framework supports calibration both effectively and efficiently.
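The offline UKP selection can be sketched as a frequency count over a co-scene; the recurrence criterion and `top_k` cutoff are illustrative assumptions, not the paper's exact selection rule:

```python
from collections import Counter

def useful_keypoints(co_scene_photos, top_k=2):
    # co_scene_photos: one set of keypoint ids per photo in a co-scene.
    # Keep the keypoints that recur across the most photos as UKPs.
    counts = Counter(kp for photo in co_scene_photos for kp in photo)
    return [kp for kp, _ in counts.most_common(top_k)]

photos = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
print(useful_keypoints(photos))  # "a" and "b" recur most often
```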

Recommended citation: Li, Huan, Pai Peng, Hua Lu, Lidan Shou, Ke Chen, and Gang Chen. "E2C2: efficient and effective camera calibration in indoor environments." In Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers, pp. 9-12. ACM, 2015.

## Cross-Scenario Eyeglasses Retrieval via EGYPT Model.

Published in ICMR, 2015

In this paper, we present FGSS (Fashion Glasses Search System), an innovative cross-scenario eyeglasses retrieval system that automatically recognizes eyeglasses in real-world photos, e.g., a photo of a fashionable girl wearing a stylish pair of eyeglasses, and retrieves a ranked list of visually similar product instances from the database. We propose a novel segmentation-free framework for FGSS that bridges two search gaps, the semantic gap and the feature gap, in which a new keypoint-based scheme called EGYPT is tailored to eyeglasses to facilitate the search. In EGYPT, hybrid descriptors combining shape, color, and texture features serve as the feature representation for eyeglasses. An experimental study on a real-world photo dataset and an eyeglasses product dataset demonstrates the effectiveness of the EGYPT model.

Recommended citation: Gu, Xiaoling, Pai Peng, Mengwen Li, Sai Wu, Lidan Shou, and Gang Chen. "Cross-Scenario Eyeglasses Retrieval via EGYPT Model." In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, pp. 463-466. ACM, 2015.

## KISS: Knowing Camera Prototype System for Recognizing and Annotating Places-of-Interest

Published in TKDE, 2016

This paper presents a project called KnowIng camera prototype SyStem (KISS) for real-time places-of-interest (POI) recognition and annotation in smartphone photos, using online geotagged images of POIs as our knowledge base. We propose a “Spatial+Visual” (S+V) framework that consists of a probabilistic field-of-view (pFOV) model in the spatial phase and a sparse-coding similarity metric in the visual phase to recognize phone-captured POIs. Moreover, we put forward an offline Collaborative Salient Area (COSTAR) mining algorithm to detect common visual features (called Costars) among the noisy photos geotagged on each POI, thereby cleaning the geotagged image database. The mining result can be used to annotate the region-of-interest in the query image during online query processing; this mining procedure also improves the efficiency and accuracy of the S+V framework. Furthermore, we extend the pFOV model into a Bayesian FOV ($\beta$FOV) model, which improves spatial recognition accuracy by more than 30 percent and further alleviates visual computation. From a Bayesian point of view, the likelihood of a certain POI being captured by a phone, which is a prior probability in the pFOV model, becomes a posterior probability in the $\beta$FOV model. Our experiments on a real-world dataset and the Oxford 5K dataset show promising recognition results. To provide a fine-grained annotation ground truth, we labeled a new dataset based on Oxford 5K and make it publicly available on the web. Our COSTAR mining technique outperforms the state-of-the-art approach on both datasets.
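The Bayesian reading mentioned in the abstract can be summarized as a plain Bayes update over POI candidates; the notation below is illustrative, not the paper's:

$$P(\mathrm{POI}_i \mid o) = \frac{P(o \mid \mathrm{POI}_i)\, P(\mathrm{POI}_i)}{\sum_j P(o \mid \mathrm{POI}_j)\, P(\mathrm{POI}_j)}$$

where $P(\mathrm{POI}_i)$ plays the role of the pFOV-style prior that $\mathrm{POI}_i$ is captured, and $o$ denotes the phone's sensor observation.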

Recommended citation: Peng, Pai, Lidan Shou, Ke Chen, Gang Chen, and Sai Wu. "KISS: Knowing Camera Prototype System for Recognizing and Annotating Places-of-Interest." IEEE Transactions on Knowledge and Data Engineering 28, no. 4 (2016): 994-1006.

## The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos.

Published in SIGIR, 2014

This paper presents a project called Knowing Camera for real-time recognition and annotation of places-of-interest (POIs) in smartphone photos, using online geotagged images of such places. We propose a “Spatial+Visual” (S+V) framework that consists of a probabilistic field-of-view model in the spatial phase and a sparse-coding similarity metric in the visual phase to recognize phone-captured POIs. Moreover, we put forward an offline Collaborative Salient Area (COSTAR) mining algorithm to detect common visual features (called Costars) among the noisy photos geotagged on each POI, thereby cleaning the geotagged image database. The mining result can be used to annotate the region-of-interest in the query image during online query processing; this mining procedure further improves the efficiency and accuracy of the S+V framework. Our experiments on a real-world dataset and the Oxford 5K dataset show promising recognition and annotation performance, and the proposed COSTAR mining technique outperforms the state-of-the-art approach.

Recommended citation: Peng, Pai, Lidan Shou, Ke Chen, Gang Chen, and Sai Wu. "The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos." In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, pp. 707-716. ACM, 2014.

## The Knowing Camera: Recognizing Places-of-Interest in Smartphone Photos.

Published in SIGIR, 2013

This paper presents a framework called Knowing Camera for real-time recognition of places-of-interest in smartphone photos, using online geotagged images of such places. We propose a probabilistic field-of-view model that captures the uncertainty in camera sensor data; this model is used to retrieve a set of candidate images. Visual similarity over the candidate images is computed with the sparse-coding technique, and we also propose an ANN filtering technique to speed up the sparse coding. The final ranking combines an uncertain geometric relevance with the visual similarity. Our preliminary experiments, conducted in an urban area of a large city, show promising results. The most distinguishing feature of our framework is its ability to perform well on a contaminated, real-world online image database. Moreover, our framework is highly scalable, as it does not rely on any complex data structure.

Recommended citation: Peng, Pai, Lidan Shou, Ke Chen, Gang Chen, and Sai Wu. "The knowing camera: recognizing places-of-interest in smartphone photos." In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp. 969-972. ACM, 2013.