Tag Archives: CVPR

CVPR11: Tutorial on Activity Analysis; Dataset Biases

There’s a review of the frontiers of human activity analysis in this tutorial at the ongoing CVPR conference. Though the presenters have obviously selected the highlights according to their own taste, I do like their smart way of dividing existing research efforts into single-layered and hierarchical approaches (as shown in the figure below, borrowed from their slides), in accordance with the inherent hierarchy of human activities (postures –> actions –> interactions –> activities, in their terms).


Image credit: Aggarwal and Ryoo, ACM CSUR 2011.

Another impression is that hierarchical approaches to date can only handle very constrained and perhaps well-defined activity cases. This may be due to the statistical modeling or grammatical reasoning they use. The open question is: beyond such explicit modeling of structures and rules, are there ways of dealing with this implicitly? Or, to put it less radically, can we find reliable ways to learn these structures?

On another front, Prof. Torralba and Prof. Efros scrutinize the use of datasets in vision today, and their possible biases, in this interesting paper. Though it sounds like they are saying the right words at the right time, I hope this is not the first time they have realized this: both have been rising stars in vision for a while and are at leading institutes of AI. Anyway, the cross-dataset generalization and negative-sample bias are indeed worth noting (and have in fact been more or less touched on by many authors already, though perhaps not as systematically as here). I would like to acknowledge Prof. Torralba’s contribution of the new object recognition dataset (I’m not to be credited for the name of the dataset, though :); meanwhile I suspect part of the motivation of the current paper is to raise the community’s awareness of the dataset).


Image credit: Jianxiong Xiao et al., working on the SUN dataset.

SUN Database: Large-scale scene recognition from abbey to zoo.

I also love the way they view the different roles of datasets in computer vision and machine learning:

… Unlike datasets in machine learning, where the dataset is the world, computer vision datasets are supposed to be a representation of the world.
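The cross-dataset generalization test discussed above can be sketched on toy data: train a classifier on one dataset and compare its accuracy on held-out data from the same dataset versus a second dataset collected with a different bias. The synthetic features, the `shift` parameter standing in for capture bias, and the logistic-regression choice are all my assumptions for illustration, not the paper's protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_dataset(shift, n=200):
    """Toy binary task; `shift` mimics a dataset-specific capture bias."""
    X = rng.normal(shift, 1.0, size=(n, 5))
    y = (X.sum(axis=1) > shift * 5).astype(int)  # same underlying concept
    return X, y

Xa, ya = make_dataset(shift=0.0)   # "dataset A"
Xb, yb = make_dataset(shift=1.5)   # "dataset B": same task, biased capture

clf = LogisticRegression().fit(Xa[:150], ya[:150])
print("within A:", clf.score(Xa[150:], ya[150:]))  # high
print("A -> B  :", clf.score(Xb, yb))              # drops sharply
```

The within-dataset score looks excellent while the cross-dataset score collapses toward chance, which is exactly the gap the paper's cross-dataset tables quantify on real recognition datasets.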


Action Analysis/Subspace Segmentation Updated; Event Video Dataset

I have just added the accepted CVPR 2011 papers on action recognition and subspace segmentation. There’s a noticeable blossoming of papers on various aspects of action recognition, almost double the number accepted at CVPR 2010. While it’s great to see people shifting their attention to this topic, I regret to say many papers are only worth a 30-second glance, period. And still, many authors have not released their papers publicly (so I cannot add links, and I’m never willing to link to a paywall). My reluctant response is to direct them to my blog article Nobody Cares about You and Your Paper.

And new challenges and opportunities always come with a new dataset in vision, especially a gigantic one. In this regard, the VIRAT Video Dataset could be described as large-scale, for now.




Reposted from Mr. Li Dahua’s blog: http://dahuasky.spaces.live.com/blog/cns!1AB3BC993DB84FD7!1168.entry

Updates: the original article “Paper gestalt”  (courtesy of Carven von Bearnensquash)

von Bearnensquash, C., “Paper Gestalt”, Secret Proceedings of Computer Vision and Pattern Recognition, 2010.


Publishing papers in academia naturally comes with its many unwritten rules. In our small field of computer vision, regardless of a paper’s content or whether it has genuine academic value, a pile of mathematical formulas and a few beautiful figures suffice to signal that you have taste and scholarly spirit, at least as a first impression, and the people forming that first impression include the reviewers. On such matters of detail, Terence Tao specifically links from his homepage to the blog post “Ten signs a claimed mathematical breakthrough is wrong”, which notably puts TeX typesetting as the very first sign.

Packaging your academic output well and presenting it clearly to others is perfectly legitimate. But nothing should be used to paper over flaws: the first principle of scientific research is truthfulness. The cautionary point of that mathematics blog post is that deep mathematical problems cannot be solved by outsiders. Yet some people manage to turn these signals into biases, and then exploit those biases to publish bad papers. I wonder whether people doing vision research have ever computed the ratio between the time they spend actually thinking and the time they spend polishing the paper, especially making figures. We are not artists, so one week of thought plus three months of cosmetic dressing-up will certainly not produce good work. In moderation it is fine; otherwise it is like Sister Feng putting on a seductive dress and then calling everyone else a prostitute.


At this year’s CVPR, I met many friends and saw many papers.

The paper I liked best at this conference was not one formally published in the proceedings, but one handed out at the TC Panel. It is titled “Paper Gestalt”. In a humorous tone, it describes an automatic paper-review algorithm based on vision + learning.

Friends who attended the conference were lucky enough to pick up this paper at the venue; those who did not will, I suppose, have to request it from the author (Carven von Bearnensquash, bearensquash@live.com).

The background to this paper is the rapid, exponential growth of CVPR and ICCV submissions in recent years (a threefold increase over the past 10 years). At this rate, each conference would receive more than 5,000 submissions within 10 years! Perhaps the best remedy is to apply a “monetary lever” for “macro-regulation”: charge the submitters. For example, levy a fee of 100 US dollars per submission; I believe this would have an immediate effect on both the quantity and the quality of submissions. On one hand, authors who are purely trying their luck would weigh whether spending 100 dollars on a 1% chance of acceptance is worthwhile, while serious authors would polish their papers so as not to waste the fee. Authors whose papers are accepted could have their registration fee reduced, and the conference would have more funding to cover travel for young researchers devoted to computer vision. Four birds with one stone; why not, heh.

Of course, once money is involved there will inevitably be many subtle games of interest; let the chairs worry about those. Here, let us continue to appreciate this remarkable piece. The paper’s algorithm is very simple (provided you know a little machine learning or computer vision): lay out the 8 pages of the PDF side by side as one long image, then extract features from it. Friends in natural language processing, please don’t get excited: this is a vision paper, so naturally it uses the vision community’s own methods. So, what features? Mainly HOG (Histogram of Oriented Gradients), a feature that purely describes visual appearance. Clearly, large blocks of text, curve plots, images, tables, and mathematical formulas should all have rather different features. The author then uses AdaBoost for feature selection, training a classifier that judges a paper’s quality purely from its visual appearance.
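The pipeline described above can be sketched in a few lines: render each paper’s pages side by side as one wide image, extract a HOG descriptor, and train an AdaBoost classifier on good/bad labels. The library choices (scikit-image, scikit-learn), the HOG cell sizes, and the random stand-in “pages” and labels below are my assumptions for illustration, not the paper’s actual code.

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import AdaBoostClassifier

def paper_to_features(pages):
    """pages: list of 2-D grayscale page images of equal height."""
    sheet = np.hstack(pages)               # lay the 8 pages out in a row
    return hog(sheet, orientations=9,
               pixels_per_cell=(32, 32),
               cells_per_block=(2, 2))     # one long HOG descriptor

# Toy data: random arrays standing in for rendered PDF pages,
# with fake accept/reject labels.
rng = np.random.default_rng(0)
papers = [[rng.random((64, 48)) for _ in range(8)] for _ in range(20)]
X = np.array([paper_to_features(p) for p in papers])
y = rng.integers(0, 2, size=20)

clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```

On real data the pages would come from rasterized PDFs, and the HOG responses of dense text, plots, and equations would differ enough for the boosted classifier to pick up the “gestalt” cues the paper jokes about.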

