Object Recognition (Computer Vision)
Is a small apple more like an apple or more like a cherry? A study with real and modified sized objects
Co-authored with Anna Maria Borghi
In a categorization experiment we assessed whether seeing objects automatically activates information on how to manipulate them. The experiment also aims at investigating the role played in a categorization task by online, visual information (i.e., information mediated by the dorsal system), and by information stored in memory (i.e., information mediated by the ventral system). Participants categorized photographs of objects manipulable either with a power or a precision grip into artifacts or natural kinds. Target-objects were preceded by primes consisting of photographs of hands in either grasping postures (precision or power grip) or in a neutral posture (grip). Target-objects could be presented either in their real size or in modified size, so that they activated a different kind of grip. For example, a strawberry was presented both in its real size and with the size of an apple, so that it activated a power grip. Results confirm that visual stimuli activate motor information. More importantly, they suggest a crucial role of online, visual information even in a categorization task. Results are discussed in the framework of theories on the role of online and offline memory features.
Learning Object Segmentation Using A Multi Network Segment Classification Approach
Simone Albertini, Ignazio Gallo, Marco Vanetti and Angelo Nodari.
Published in "Proceedings of VISAPP 2012 - International Conference on Computer Vision Theory and Applications". Rome, Italy, 2012.
In this study we propose a new strategy to perform object segmentation using a multi neural network approach. We started by extending our previously presented object detection method with a new segment-based classification strategy. The result is a segmentation map that is post-processed by a phase that exploits the GrabCut algorithm to obtain fairly precise and sharp edges of the object of interest in a fully automatic way. We tested the new strategy on a commercial clothing dataset, obtaining a substantial improvement in the quality of the segmentation results compared with our previous method. The segment classification approach we propose achieves the same improvement on a subset of the Pascal VOC 2011 dataset, which is a recent standard segmentation dataset, obtaining a result that is in line with the state of the art.
Object Segmentation using Multiple Neural Networks for Commercial Offers Visual Search
Ignazio Gallo, Angelo Nodari and Marco Vanetti.
Published in "Proceedings of EANN 2011 - Engineering Applications of Neural Networks". Corfu, Greece, 2011.
We describe a web application that takes advantage of new computer vision techniques to allow the user to make searches based on visual similarity of color and texture related to the object of interest. We use a supervised neural network strategy to segment different classes of objects. A strength of this solution is the high speed in generalization of the trained neural networks, which makes it possible to obtain an object segmentation in real time. Information about the segmented object, such as color and texture, is extracted and indexed as text descriptions. Our case study is the online commercial offers domain, where each offer is composed of text and images. Many successful experiments were done on real datasets in the fashion field.
Variability in photos of the same face
by David White
Jenkins, R., White, D., Van Montfort, X., & Burton, A. M. (In Press). Variability in photos of the same face. Cognition.
Psychological studies of face recognition have typically ignored within-person variation in appearance, instead emphasizing differences between individuals. Studies typically assume that a photograph adequately captures a person’s appearance, and for that reason most studies use just one, or a small number of photos per person. Here we show that photographs are not consistent indicators of facial appearance because they are blind to within-person variability. Crucially, this within-person variability is often very large compared to the differences between people. To investigate variability in photos of the same face, we collected images from the internet to sample a realistic range for each individual. In Experiments 1 and 2, unfamiliar viewers perceived images of the same person as being different individuals, while familiar viewers perfectly identified the same photos. In Experiment 3, multiple photographs of any individual formed a continuum of good to bad likeness, which was highly sensitive to familiarity. Finally, in Experiment 4, we found that within-person variability exceeded between-person variability in attractiveness. These observations are critical to our understanding of face processing, because they suggest that a key component of face processing has been ignored. As well as its theoretical significance, this scale of variability has important practical implications. For example, our findings suggest that face photographs are unsuitable as proof of identity.
Autonomous Roverbot Using Scene Analysis
Co-authored with Faisal Nasim and Muhammad Usman Ghani
Autonomous Roverbot using Scene Analysis covers all the major aspects of Computer Engineering, from Software to Hardware and from Signaling to Control. The idea behind the project is to develop an autonomous vehicle that is controlled through a remote station. The vehicle is fitted with a wireless video camera that transmits live video to a base-station, where it is processed in MATLAB. The base-station then sends control signals to the vehicle to navigate it through its course. Such a robot could be used for surveillance, scanning pipes (through manual or limited autonomous control) and tracking moving objects.
Weakly Supervised Semantic Segmentation by Multi Image Model
ICCV 2011
We propose a novel method for weakly supervised semantic segmentation. Training images are labeled only by the classes they contain, not by their location in the image. On test images instead, the method predicts a class label for every pixel. Our main innovation is a multi-image model (MIM) - a graphical model for recovering the pixel labels of the training images. The model connects superpixels from all training images in a data-driven fashion, based on their appearance similarity. For generalizing to new test images we integrate them into MIM using a learned multiple kernel metric, instead of learning conventional classifiers on the recovered pixel labels. We also introduce an “objectness” potential that helps separate objects (e.g. car, dog, human) from background classes (e.g. grass, sky, road). In experiments on the MSRC 21 dataset and the LabelMe subset, our technique outperforms previous weakly supervised methods and achieves accuracy comparable with fully supervised methods.
Towards Weakly Supervised Semantic Segmentation by Means of Multiple Instance and Multitask Learning.
CVPR 2010
We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a system that predicts an object label for each pixel by making use of only image level labels during training -- the information whether a certain object is present or not in the image. Such coarse tagging of images is faster and easier to obtain as opposed to the tedious task of pixelwise labeling required in state of the art systems. We cast this task naturally as a multiple instance learning (MIL) problem. We use Semantic Texton Forest (STF) as the basic framework and extend it for the MIL setting.
We make use of multitask learning (MTL) to regularize our solution. Here, an external task of geometric context estimation is used to improve on the task of semantic segmentation. We report experimental results on the MSRC21 and the very challenging VOC2007 datasets. On the MSRC21 dataset we are able, by using 276 weakly labeled images, to achieve the performance of a supervised STF trained on a pixelwise labeled training set of 56 images, which is a significant reduction in the supervision needed.
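The multiple instance learning setting described above can be illustrated with a minimal sketch. It is not the authors' Semantic Texton Forest method; it only shows the usual MIL assumption that an image (a "bag") carries a class label if at least one of its pixels or regions (the "instances") belongs to that class. The function names, the 0.5 threshold and the toy scores are hypothetical.

```python
from typing import Dict, List


def bag_score(instance_scores: List[float]) -> float:
    """MIL assumption: a bag is as positive as its most positive instance."""
    return max(instance_scores)


def bag_labels(scores_per_class: Dict[str, List[float]],
               threshold: float = 0.5) -> Dict[str, bool]:
    """Turn per-instance class scores into image-level presence labels."""
    return {cls: bag_score(scores) >= threshold
            for cls, scores in scores_per_class.items()}


# Toy image with three regions; each list holds one score per region.
# The values are made up for illustration only.
image_scores = {
    "cow": [0.1, 0.9, 0.2],
    "grass": [0.8, 0.7, 0.6],
    "car": [0.1, 0.2, 0.05],
}
predicted = bag_labels(image_scores)
print(predicted)
```

During weakly supervised training only the image-level outputs of `bag_labels` can be compared with the annotations, which is why the per-pixel labels have to be inferred rather than read off the ground truth.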
Video Databases Annotation Enhancing using Commonsense Knowledgebases for Indexing and Retrieval
The 13th IASTED International Conference on Artificial Intelligence and Soft Computing, 2009. Palma de Mallorca, Spain.
The rapidly increasing number of video collections, especially on the web, has motivated the need for intelligent automated annotation tools for searching, rating, indexing and retrieval. These video collections contain all types of manually annotated videos. Because this annotation is usually incomplete and uncertain and contains misspelled words, a keyword search typically retrieves only a portion of the videos that actually carry the desired meaning. Hence, this annotation needs filtering, expanding and validating for better indexing and retrieval.
In this paper, we present a novel framework for video annotation enhancement, based on merging two widely known commonsense knowledgebases, namely WordNet and ConceptNet. In addition to that, a comparison between these knowledgebases in video annotation domain is presented.
Experiments were performed on random wide-domain video clips from the vimeo.com website. Results show that searching for a video over enhanced tags, based on our proposed framework, outperforms searching using the original tags. In addition to that, the annotation enhanced by our framework outperforms both those enhanced by WordNet and ConceptNet individually, in terms of tag enrichment ability, concept diversity and, most importantly, retrieval performance.
Learning Generic Invariances in Object Recognition: Translation and Scale.
by Joel Leibo
Leibo, Joel Z; Mutch, Jim; Rosasco, Lorenzo; Ullman, Shimon; Poggio, Tomaso (2010)
Invariance to various transformations is key to object recognition but existing definitions of invariance are somewhat confusing while discussions of invariance are often confused. In this report, we provide an operational definition of invariance by formally defining perceptual tasks as classification problems. The definition should be appropriate for physiology, psychophysics and computational modeling. For any specific object, invariance can be trivially "learned" by memorizing a sufficient number of example images of the transformed object. While our formal definition of invariance also covers such cases, this report focuses instead on invariance from very few images and mostly on invariances from one example. Image-plane invariances -- such as translation, rotation and scaling -- can be computed from a single image for any object. They are called generic since in principle they can be hardwired or learned (during development) for any object. In this perspective, we characterize the invariance range of a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream. We show that this class of models achieves essentially perfect translation and scaling invariance for novel images. In this architecture a new image is represented in terms of weights of "templates" (e.g. "centers" or "basis functions") at each level in the hierarchy. Such a representation inherits the invariance of each template, which is implemented through replication of the corresponding "simple" units across positions or scales and their "association" in a "complex" unit. We show simulations on real images that characterize the type and number of templates needed to support the invariant recognition of novel objects.
We find that 1) the templates need not be visually similar to the target objects and that 2) a very small number of them is sufficient for good recognition. These somewhat surprising empirical results have intriguing implications for the learning of invariant recognition during the development of a biological organism, such as a human baby. In particular, we conjecture that invariance to translation and scale may be learned by the association -- through temporal contiguity -- of a small number of primal templates, that is, patches extracted from the images of an object moving on the retina across positions and scales. The number of templates can later be augmented by bootstrapping mechanisms using the correspondence provided by the primal templates -- without the need for temporal contiguity.
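The mechanism of replicating "simple" units across positions and pooling them in a "complex" unit can be sketched on a toy one-dimensional signal. This is only an illustration under simplifying assumptions (1D input, dot-product simple units, max pooling, an object that stays fully inside the field of view), not the model evaluated in the report; all names and values are hypothetical.

```python
from typing import List, Sequence


def simple_responses(image: Sequence[float],
                     template: Sequence[float]) -> List[float]:
    """One 'simple' unit per position: dot product of the template with a window."""
    k = len(template)
    return [sum(image[p + i] * template[i] for i in range(k))
            for p in range(len(image) - k + 1)]


def complex_response(image: Sequence[float],
                     template: Sequence[float]) -> float:
    """A 'complex' unit pools (here: max) over all replicated simple units."""
    return max(simple_responses(image, template))


def signature(image: Sequence[float],
              templates: Sequence[Sequence[float]]) -> List[float]:
    """Represent an image by its pooled response to each template."""
    return [complex_response(image, t) for t in templates]


def shift_right(image: Sequence[float], offset: int) -> List[float]:
    """Translate the image to the right, filling the left side with zeros."""
    return [0.0] * offset + list(image[:len(image) - offset])


# Toy data: the templates need not look like the object being represented.
templates = [[1.0, 2.0, 1.0], [1.0, 0.0, -1.0]]
image = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
moved = shift_right(image, 2)

print(signature(image, templates))
print(signature(moved, templates))
```

Because the maximum is taken over every position, the two printed signatures coincide even though the object has moved; pooling over scaled copies of a template would give scale invariance in the same way.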

