## Script for OpenCV installation on Ubuntu 12.10

Installing OpenCV on Ubuntu has been painful for me in the past: there are lots of dependencies to sort out first, and no comprehensive installation tutorial ships with the OpenCV distribution.
This script does a great job of assembling all the necessary “apt-get install” commands into one executable script. Some personal notes about this:

• It seems to me that ”sudo apt-get install libopencv-dev” for dependencies is not necessary: it will be superseded by the subsequent installation of OpenCV from source. At the end of the scripted installation, the package seems to sit in some cache location and not be in effect.
• It seems ”sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev” is also unnecessary: these libraries are provided by the fresh installation of ffmpeg that follows. Checking their locations seems to confirm my guess.
• About the version of ffmpeg: ffmpeg is under active development, as always, and the choice of version is sometimes tricky, especially when running together with OpenCV. I remember the 1.x series was not compatible with OpenCV 2.3 in a previous trial (compilation errors were thrown due to some problem with the ffmpeg libraries). The current OpenCV 2.4.3 seems at least to compile fine against ffmpeg 0.11.2.

[Update] From a test by Mr. Alok Singh Mahor (comments below), installing the Ubuntu package “libopencv-dev” already gets things working, though the linking order with g++ has to be taken care of as discussed (also in the comments below and here). Of course, it's not a bad idea to install from source, which always guarantees the newest features of the actively developed OpenCV. [Dec 28 2012]


## CVX Announced Version 2 (beta)

CVX has become an indispensable tool for modeling and prototyping in various scientific disciplines, particularly where optimization plays a significant role. It provides a most intuitive modeling interface that bridges complex formulations and the underlying numerical solvers, saving the user plenty of time for actual thinking.

Version 2 of CVX features extended support for the commercial solvers MOSEK and Gurobi. This is really good news for academic users, since 1) both solvers provide free licenses to academic users, and 2) both are generally more optimized than the free solvers currently bundled with CVX, namely SDPT3 and SeDuMi. The relevant part of the announcement follows:

• Academic users may utilize the CVX Professional capability at no charge.
• Users with a valid academic license for Gurobi 5.0 or later may use it with CVX without obtaining a CVX Professional license. This is because CVX is able to detect the presence of an academic Gurobi license.
• In order to use MOSEK, a CVX Professional license is required. We intend to provide such licenses at no charge, but we have not yet completed the licensing architecture. We will make an announcement once this has been completed.

In addition, mailing-list support has changed into a Q&A forum (StackExchange style), and the documentation has moved to an online HTML version with better cross-references.


## A Tight Lower Bound for Finite Sum of Arctangents

A few days ago we ran into the need to lower-bound ${\sum_{j=1}^k \pi/2 - \arctan j = \sum_{j=1}^k \arctan 1/j}$.

One possibility is to observe that ${\arctan \left(1/x\right)}$ is non-negative and decreasing over ${x \geq 1}$ and apply an integral lower bound:

$\displaystyle \begin{array}{rcl} \sum_{j=1}^k \arctan \frac{1}{j} &\geq & \int_{1}^{k+1} \arctan 1/x \; dx = \left. x\arctan 1/x + \frac{1}{2} \ln \left(1+x^2\right) \right|_{x = 1}^{k+1} \\ & = & \frac{1}{2}\ln \left[\left(k+1\right)^2 +1\right] + \left(k+1\right) \arctan \frac{1}{k+1} - \frac{\pi}{4} - \frac{1}{2} \ln 2 \\ & \geq & \frac{1}{2}\ln \left[\left(k+1\right)^2 +1\right] + \left(k+1\right) \left(\frac{1}{k+1} - \frac{1}{3} \frac{1}{\left(k+1\right)^3}\right) - 1.132 \\ & = & \frac{1}{2}\ln \left[\left(k+1\right)^2 +1\right] - 0.132 - \frac{1}{3} \frac{1}{\left(k+1\right)^2}, \end{array}$

where in the third line we have retained the first two terms of the series expansion of ${\arctan x}$ for ${|x|\leq 1}$ and used ${\pi/4 + \frac{1}{2}\ln 2 \leq 1.132}$. The integral approximation produces rather messy terms and a somewhat loose lower bound. We can obtain a much neater one:

$\displaystyle \sum_{j=1}^k \arctan \frac{1}{j} \geq \log \left(k+1\right). \ \ \ \ \ (1)$

Before the proof, we may look at a plot to see how tight the lower bound is. Here it goes!

Sum of arctangents and its logarithm lower bound.

The trick of the proof lies in series expansions.

Proof: The claim is true for ${k=1}$ since ${\pi/4 > \log 2}$. Now suppose it holds for ${k-1}$, i.e., ${\sum_{j=1}^{k-1} \arctan\left(1/j\right) \geq \log k}$; we need to show it holds for ${k}$. It suffices to show ${\arctan\left(1/k\right) \geq \log\left(1+1/k\right)}$. Consider the series expansions of ${\arctan\left(x\right)}$ and ${\log\left(1+x\right)}$:

$\displaystyle \begin{array}{rcl} \arctan\left(x\right) & = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \frac{1}{9}x^9 + \cdots, \forall \left|x\right|\leq 1, \\ \log\left(x+1\right) & = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \frac{1}{5}x^5 + \cdots, \forall -1 < x \leq 1. \end{array}$

So we have

$\displaystyle \begin{array}{rcl} \arctan\left(x\right) - \log\left(1+x\right) & = \left(\frac{1}{2}x^2 + \frac{1}{4}x^4 + \frac{1}{6}x^6 + \cdots\right) - \left(\frac{2}{3}x^3 + \frac{2}{7}x^7 + \frac{2}{11}x^{11} + \cdots \right) \\ & = \frac{2x^2}{3}\left(\frac{3}{4}- x\right) + \frac{2x^4}{7}\left(\frac{7}{8}-x^3\right) + \frac{2x^6}{11}\left(\frac{11}{12} - x^5\right) + \cdots \geq 0 \end{array}$

provided ${0< x \leq \frac{3}{4}}$, and ${x = 1/k}$ satisfies this condition for all ${k>1}$. $\Box$

[Update to the Proof] There is a simpler way to see ${\arctan \left(1/k\right) \geq \log\left(1+1/k\right)}$ (thanks to Dai Liang! See the first comment below). It follows from the fact that ${\arctan x \geq \log \left(1+x\right)}$ for ${0 \leq x\leq 1}$, which can be seen as follows: ${\arctan 0 = \log \left(1+0\right)}$, and ${\left(\arctan x\right)' = \frac{1}{1+x^2} \geq \frac{1}{1+x} = \left[\log \left(1+x\right)\right]'}$ for ${0 \leq x \leq 1}$. (03/07/2012)

Recall that ${\arctan x \leq x}$ for ${x \geq 0}$. Thus we have

$\displaystyle \boxed{\sum_{j=1}^{k} \frac{1}{j} \geq \sum_{j=1}^{k} \arctan \frac{1}{j} \geq \log \left(k+1\right)}. \ \ \ \ \ (2)$
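The boxed chain of inequalities is easy to sanity-check numerically; here is a quick script of mine, using nothing beyond the Python standard library:

```python
import math

# Check sum_{j<=k} 1/j  >=  sum_{j<=k} arctan(1/j)  >=  log(k+1) for several k.
for k in [1, 2, 5, 10, 100, 1000]:
    harmonic = sum(1.0 / j for j in range(1, k + 1))
    arctan_sum = sum(math.atan(1.0 / j) for j in range(1, k + 1))
    assert harmonic >= arctan_sum >= math.log(k + 1)
```

For ${k=1}$ this reduces to ${1 \geq \pi/4 \geq \log 2}$, the base case of the induction above.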

Acknowledgement: Dr. Tewodros Amdeberhan of the Math Department at Tulane University kindly provided the original proof and permitted me to share it on my blog. You may want to read their interesting paper on techniques for evaluating sums of arctangents: Sum of Arctangents and Some Formula of Ramanujan.


## CV Meeting — Frontiers in computer vision

Prof. Alan Yuille (UCLA), in partnership with others, is organizing a special workshop exploring the frontiers of vision, a good chance for researchers to pause for a while and look both backward and forward.

Frontiers in Computer Vision

From the online discussion already being fired up there, some interesting disagreements among these top vision researchers are already apparent: conservatism versus radicalism, and pragmatism versus idealism. For a research community as ambitious, diverse, and young as vision, though, consensus rarely occurs and ideological debates prevail; this is not surprising, and in fact, in my opinion, it signals an active research field. But my humble mind is really seeking whatever fuel the workshop could generate on these topics:

1. Object recognition
2. Culture of scholarship in vision
The former has been in the spotlight for the past 10 years with little success, while the latter has partially contributed to the dismal stagnation.

## CVPR11–Tutorial on Activity Analysis; Dataset Biases

There’s a review of the frontiers of human activity analysis in this tutorial at the ongoing CVPR conference. Though the presenters have obviously selected the highlights to their own taste, I do like their smart division of existing research efforts into single-layered and hierarchical approaches (as shown in the figure below, borrowed from their slides), in accordance with the inherent hierarchy of human activities (postures –> actions –> interactions –> activities, in their terms).

Image credit: Aggarwal and Ryoo, ACM CSUR 2011.

Another impression is that hierarchical approaches to date can only deal with very constrained and perhaps well-defined activity cases. This may be due to the statistical modeling or grammatical reasoning they use. The open question is: beyond such explicit modeling of structures and rules, are there ways of dealing with this implicitly? Or, to put it less radically, can we have reliable ways to learn these structures?

On another front, Prof. Torralba and Prof. Efros scrutinize the use of datasets in vision today, and their possible biases, in this interesting paper. Though it sounds like they are saying the right words at the right time, I hope this is not the first time they have realized this: both have been emerging heroes in vision for a while and are at leading institutes of AI. Anyway, the cross-dataset generalization and negative-sample bias are indeed interesting to note (and in fact have been more or less touched on by many authors already, though maybe not as systematically as here). I would like to acknowledge Prof. Torralba's contribution of the new object recognition dataset (I'm not to be credited for the name of the dataset, though; meanwhile, I suspect part of the motivation of the current paper is to raise the community's awareness of the dataset)

Image credit: Jianxiong Xiao et al working on the SUN dataset.

and I also love the way they view the different roles datasets play in computer vision and machine learning:

… Unlike datasets in machine learning, where the dataset is the world, computer vision datasets are supposed to be a representation of the world.

## From $\LaTeX$ to Blogger

The past weekend I was trying to figure out how to convert $\LaTeX$ source into Blogger-compatible sources, with the support of the powerful MathJax mechanism for displaying math.

There’s already an excellent tool (latex2wp) that handles the conversion from $\LaTeX$ to WordPress, as I noted before (best advertised by Terry Tao's WP blog). But there are numerous legitimate reasons for doing a similar thing on Blogger, most notably that Blogger allows network scripts to run on the blog (check out the concept of a content distribution network, or CDN), and the powerful MathJax engine happens to have started providing such CDN service recently. If you don't appreciate why one would bother to create yet another math display protocol like MathJax, just come down here and have a look at the wonderful demos while imagining what would happen to image-based rendering of math at scale.

A moment's research revealed the major differences between how WordPress and Blogger (precisely, MathJax) handle $\LaTeX$ math: the former uses the “$1atex …$” format (I intentionally put the number “1” instead of the letter “l” to avoid invoking the WP interpreter), while the latter takes the $\LaTeX$ code directly as “$…$”. Another major piece lies in how they handle display style (vs. the inline style delimited by “$…$”): WP expects “<p align=center> $latex \displaystyle …$ </p>\n”, whereas MathJax can handle “$…$” directly (“$$…$$” and the like are converted to “$…$” with proper numbering). Other minor details differ too, such as the format of line breaks.
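A toy sketch of the two delimiter rules just described (this is NOT the actual latex2wp or modified package code, which works on the LaTeX source rather than on WP output; the function name is mine):

```python
import re

def wp_to_blogger(text):
    """Illustrate the delimiter conversion discussed above.

    WP display blocks become "$$...$$" and WP inline math becomes "$...$",
    the forms MathJax is said to pick up on a Blogger page.
    """
    # Display style: <p align=center> $latex \displaystyle ...$ </p>  ->  $$...$$
    text = re.sub(
        r'<p align=center>\s*\$latex \\displaystyle (.*?)\$\s*</p>',
        lambda m: '$$' + m.group(1) + '$$',
        text,
        flags=re.DOTALL,
    )
    # Inline style: $latex ...$  ->  $...$
    text = re.sub(r'\$latex (.*?)\$', lambda m: '$' + m.group(1) + '$',
                  text, flags=re.DOTALL)
    return text
```

Note the display rule runs first, so its output is not re-matched by the inline rule; the lambda replacements keep backslashes in the captured math intact.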

So I worked on the latex2wp package directly and made the necessary changes discussed above. Some simple tests suggest it's working fine. Perhaps you'd like to have a try! (Despite my great admiration for MathJax and Blogger, I have no intention of moving to Blogger.com yet, since that would require a considerable amount of work.) Here's the download link from my Dropbox:

Latex2Blogger.0.6.2b.tar.gz

## Image Transformations Done Right in Matlab

Though simple and basic, spatial (geometric) transformations in image processing involve lots of subtleties, and neglecting them will probably give you awkward results. These subtleties include, but are not limited to (esp. in Matlab):

• Convention in transformation matrices
• Convention in coordinate axes
• Position of the input image in the source plane
• Bounding box calculation
• Visible region calculation
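As a tiny illustration of the bounding-box item, in Python/NumPy rather than Matlab (the function is my own sketch): under an affine map, transforming the four corners of the input image is enough to determine the output bounding box.

```python
import numpy as np

def output_bbox(corners, A):
    """Axis-aligned bounding box of the image of `corners` under x -> A x.

    `corners` is a 4x2 array of input-image corner coordinates and `A` a
    2x2 linear part (translation omitted for brevity). An affine map sends
    the rectangle's corners to the vertices of a parallelogram, and those
    vertices determine the output bounding box.
    """
    mapped = corners @ A.T
    return mapped.min(axis=0), mapped.max(axis=0)

# A 90-degree rotation of a 4x2 image swaps the bounding box's width and height.
corners = np.array([[0, 0], [4, 0], [0, 2], [4, 2]], dtype=float)
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
lo, hi = output_bbox(corners, rot90)  # lo = [-2, 0], hi = [0, 4]
```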

Steve Eddins, a software engineer specializing in image processing, has a series of blog articles devoted to these issues. He has also contributed a new chapter to the 2nd edition of the classic “Digital Image Processing Using Matlab” (Chapter 6: Geometric Transformations and Image Registration):

Steve on Image Processing

## Closed Form Solutions in Nuclear Norm Related Optimization and Around

The nuclear norm has been in the spotlight of optimization for quite a while, largely due to the breakthrough work of Emmanuel Candès on exact matrix completion and recovery years back. Summing up the singular values, this norm can be considered the ${\ell_1}$ norm in the spectral domain, reminiscent of the intriguing relationship between the ${\ell_0}$ (quasi-)norm and the ${\ell_1}$ norm revealed in compressive sensing. In fact, this norm best approximates the rank of a given matrix under some technical conditions (i.e., it serves as the convex surrogate). To be precise, for a particular matrix ${\mathbf{A}}$, its nuclear norm is defined as ${\|\mathbf{A}\|_* = \sum_i \sigma_i}$, where ${\sigma_i}$ denotes the ${i^{th}}$ singular value (henceforth assuming singular values sorted in descending order). Then ${\|\mathbf{A}\|_*}$ is the best convex under-estimator (technically, the convex envelope) of ${\textrm{rank} \left(\mathbf{A}\right)}$ over the domain ${\|\mathbf{A}\| \leq 1}$, where ${\| \cdot\|}$ is the spectral norm, or equivalently the largest singular value ${\sigma_1}$. The last condition on the spectral norm is in fact trivial, as one can always scale the matrix to make ${\sigma_1 \leq 1}$ (in practical optimization, this possibly entails adjusting parameters).
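For concreteness, the norms just mentioned (nuclear, spectral, and later the Frobenius norm) are all functions of the singular values; a small NumPy illustration of mine, not tied to any cited code:

```python
import numpy as np

A = np.array([[3.0, 0.0], [0.0, 4.0]])
sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending: [4, 3]

nuclear = sigma.sum()                    # ||A||_* : sum of singular values = 7
spectral = sigma[0]                      # ||A||   : largest singular value = 4
frobenius = np.sqrt((sigma ** 2).sum())  # ||A||_F : l2 norm of the spectrum = 5
```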

Most nuclear norm related (convex) optimization problems can be recast as semidefinite programming (SDP) problems, thanks to a simple variational characterization of the nuclear norm (see Proposition 2.1 in this paper):

$\displaystyle \min_{\mathbf{W}_1, \mathbf{W}_2} \quad \frac{1}{2}\left(\mathrm{trace} \left(\mathbf{W}_1\right)+ \mathrm{trace} \left(\mathbf{W}_2\right) \right), \qquad \mathrm{s.t.} \quad \begin{bmatrix} \mathbf{W}_1 & \mathbf{X} \\ \mathbf{X}^\top & \mathbf{W}_2 \end{bmatrix} \succeq \mathbf{0}. \ \ \ \ \ (1)$

Despite its elegance, this approach is numerically unwise, as most interior-point solvers can only handle problems with up to hundreds of variables, for both hardware and algorithmic reasons. On the other hand, most current applications, such as large-scale matrix completion and Robust PCA (by Candès et al.), have been numerically feasible thanks to closed-form solutions to some canonical forms of nuclear norm optimization (first and foremost the singular value thresholding (SVT) operator). This line of results has been the key to the success of recent numerical schemes for nuclear norm optimization, such as the accelerated proximal gradient (APG) and augmented Lagrangian multiplier (ALM) methods. Hence the discovery of, and hopefully systematic investigation into, such closed-form solutions is both necessary and interesting in its own right.
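Since the SVT operator recurs throughout what follows, here is a bare-bones NumPy sketch of it (the function name is mine, and this is an illustration rather than production code):

```python
import numpy as np

def svt(Y, mu):
    """Singular value thresholding: soft-threshold the singular values of Y.

    This computes the closed-form minimizer of
        mu * ||X||_* + 0.5 * ||Y - X||_F^2
    (cf. part 1) of Theorem 4 below).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - mu, 0.0)) @ Vt

# Singular values 3 and 1, thresholded at mu = 2, become 1 and 0.
X = svt(np.diag([3.0, 1.0]), 2.0)
```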

— 1. Realizing the importance of unitarily invariant property (UIP) —

My recent investigation into the low-rank representation for subspace segmentation (cf. my earlier post here and this paper) has boiled down to understanding this problem:

$\displaystyle \min_{\mathbf{Z}} \quad \| \mathbf{Z}\|_*, \qquad \mathrm{s.t. } \quad \mathbf{XZ} = \mathbf{X} \ \ \ \ \ (2)$

and it turns out

Theorem 1 The minimizer ${\mathbf{Z}^*}$ of problem (2) obeys ${\mathbf{Z}^* = \mathbf{VV}^\top}$, where ${\mathbf{X} = \mathbf{U\Sigma V}^\top}$ is the singular value decomposition of ${\mathbf{X}}$. Moreover, ${\mathbf{Z}^*}$ is also the minimizer of this problem:

$\displaystyle \min_{\mathbf{Z}} \quad \| \mathbf{Z}\|_*, \qquad \mathrm{s.t. } \quad \mathbf{XZ} = \mathbf{X} , \mathbf{Z}\succeq \mathbf{0}. \ \ \ \ \ (3)$

While these results bear some direct links with what comes later, they also provide important clues for investigating the closed-form problems below. When trying to model practical data points contaminated by outliers, we consider this model:

$\displaystyle \min_{\mathbf{Z, E}} \quad \| \mathbf{Z}\|_* + \lambda \| \mathbf{E}\|_{2, 1}, \qquad \mathrm{s.t. } \quad \mathbf{XZ} +\mathbf{E} = \mathbf{X} , \mathbf{Z}\succeq \mathbf{0}. \ \ \ \ \ (4)$

Numerically tackling this model involves the subproblem

$\displaystyle \min_{\mathbf{Z}} \quad \| \mathbf{Z}\|_* + \frac{\mu}{2}\| \mathbf{X - Z}\|_F^2, \qquad \mathrm{s.t. } \quad \mathbf{Z} \succeq \mathbf{0}. \ \ \ \ \ (5)$

To establish the closed-form solution to this problem, it turns out the unitarily invariant property of the nuclear norm is critical.

Definition 2 (Unitarily Invariant Norms) Let ${\mathbf{X} \in \mathcal{C} ^{m\times n}}$, and let ${\mathbf{U}}$ and ${\mathbf{V}}$ be unitary matrices (i.e., ${\mathbf{UU^*} = \mathbf{U^*U} = \mathbf{I}}$, and similarly for ${\mathbf{V}}$). A norm ${\|\cdot\|_{\ell}}$ is unitarily invariant if and only if ${\|\mathbf{X}\|_\ell = \|\mathbf{UXV}\|_\ell}$ for any such ${\mathbf{U}}$ and ${\mathbf{V}}$. (${\mathbf{U}^*}$ here denotes the conjugate transpose over the complex matrix field.)

Two families of unitarily invariant norms are particularly relevant here. One is the Schatten ${p}$ norms, defined as ${\|\mathbf{X}\|_{\mathrm{sp}} = \left(\sum_{i=1}^r \sigma_i^p\right)^{1/p}}$ (assuming ${\mathrm{rank}\left(\mathbf{X}\right) = r}$); the other is the Ky Fan ${k}$ norms, defined as ${\|\mathbf{X}\|_{\mathrm{kk}} = \sum_{i=1}^k \sigma_i }$ (assuming descending order of singular values). Clearly the nuclear norm is the Schatten ${1}$ norm (or the Ky Fan ${r}$ norm), and hence unitarily invariant. Similarly, the Frobenius norm is the Schatten ${2}$ norm and also enjoys this nice property.
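The property is easy to probe numerically; a throwaway snippet of mine with random orthogonal factors (unitary invariance is of course a theorem, not an experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
# Random orthogonal U (4x4) and V (3x3) via QR factorizations.
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

nuc = lambda A: np.linalg.svd(A, compute_uv=False).sum()  # Schatten-1 (nuclear)
fro = np.linalg.norm                                      # Schatten-2 (Frobenius)

# U X V has the same singular values as X, so both norms are unchanged.
assert np.isclose(nuc(U @ X @ V), nuc(X))
assert np.isclose(fro(U @ X @ V), fro(X))
```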

— 2. Deriving closed-form solutions of nuclear norm problems by unitarily invariant property —

I have progressively realized that the UIP can be used to figure out closed-form solutions to several canonical forms of nuclear norm optimization problems. Though the results presented below are known from various authors, I don't like their proofs in general, because they are by construction in nature: beautiful, but neither generalizable nor stimulating, especially for more complicated forms and further problems.

Before the exposition, I'd like to briefly review some facts from convex analysis and optimization (ref. Boyd and Vandenberghe). A real-valued function ${f\left(x\right)}$ is convex if and only if its domain ${\mathrm{dom}\left(f\right)}$ is a convex set and ${f\left(\alpha x_1 +\left(1-\alpha\right)x_2\right) \leq \alpha f\left(x_1\right) + \left(1-\alpha\right)f\left(x_2\right)}$ for all ${x_1 \neq x_2 \in \mathrm{dom}\left(f\right)}$ and ${0\leq\alpha\leq 1}$ (the description can be considerably simplified with extended-value extensions); it is strictly convex if and only if the inequality holds strictly for ${0<\alpha<1}$. Now consider minimizing a convex function ${f\left(x\right)}$ in practice, or generically

$\displaystyle \min. \quad f\left(x\right), \qquad \textrm{s.t.} \quad x\in \mathcal{C}, \ \ \ \ \ (6)$

where ${\mathcal{C}}$ is the convex domain dictated by the constraints. Suppose ${\mathcal{C}}$ is nonempty, i.e., the problem is feasible. Then

Lemma 3 (Convexity and Uniqueness of Minimizers) For the generic convex optimization problem (6) with a nonempty domain ${\mathcal{C}}$, we have

• 1) For the additive composite function ${g\left(x\right) = \sum_{i=1}^k w_i f_i\left(x\right)}$, ${x\in \mathcal{C}, w_i >0}$ with convex components, ${g\left(x\right)}$, ${x\in \mathcal{C}}$ is strictly convex so long as any component function is strictly convex.
• 2) If ${f\left(x\right)}$ is strictly convex, the minimizer to (6) is always unique.
• 3) When ${f\left(x\right)}$ is not strictly convex, the minimizer(s) to (6) may or may not be unique.

The first and second claims can be easily verified from the first principles of strict convexity, while the third can be loosely illustrated with examples, especially regarding why strict convexity is not necessary for uniqueness. Take ${f\left(x\right)=\| x\|_1, x \in \mathcal{R}^n}$ for example. The objective is obviously not strictly convex (in fact, local linearity is a typical way strict convexity fails), but the minimizer is only the origin (it looks as if the function were strictly convex in an ${\epsilon}$ vicinity of the origin). There are numerous reasons why such uniqueness can arise, and I won't attempt to enumerate them; instead, I'd like to note a useful technique for proving such uniqueness, namely perturbation analysis: a candidate minimizer ${x_\diamond}$ of (6) is the unique minimizer if and only if ${f\left(x_\diamond +\delta x\right) > f\left(x_\diamond\right)}$ for every ${\delta x \neq 0}$ with ${x_\diamond +\delta x \in \mathcal{C}}$. This is justified since, for convex functions, any local minimizer is also a global minimizer, while the convexity of the domain implies that the set of minimizers must be spatially contiguous.

Now I'll close the loop by presenting UIP-based sketches of proofs for several known results on the nuclear norm.

Theorem 4 (Singular Value Thresholding and Eigenvalue Thresholding) We have the following analytic results:

• 1) The minimizer to

$\displaystyle \min_{\mathbf{X}} \quad \mu \|\mathbf{X}\|_* + \frac{1}{2}\|\mathbf{ Y - X}\|_F^2 \ \ \ \ \ (7)$

obeys ${\mathbf{X} = \mathbf{U}\max\left(\mathbf{\Lambda} - \mu \mathbf{I}, \mathbf{0}\right)\mathbf{V}^\top}$, for the SVD of ${\mathbf{Y}}$ as ${\mathbf{Y} = \mathbf{U\Lambda V}^\top}$, where the ${\max}$ is taken component-wise (i.e., the nonnegative singular values are soft-thresholded by ${\mu}$).

• 2) For symmetric matrix ${\mathbf{Y}}$, the minimizer to

$\displaystyle \min_{\mathbf{X}\succeq \mathbf{0}} \quad \mu \|\mathbf{X}\|_* + \frac{1}{2}\|\mathbf{ Y - X}\|_F^2 \ \ \ \ \ (8)$

obeys ${\mathbf{X} = \mathbf{U} \max\left(\mathbf{\Lambda} - \mu \mathbf{I}, \mathbf{0}\right)\mathbf{U}^\top}$, for eigenvalue decomposition of ${\mathbf{Y}}$ as ${\mathbf{Y} = \mathbf{U\Lambda U}^\top}$.

• 3) For general square matrix ${\mathbf{Y}}$, the minimizer to

$\displaystyle \min_{\mathbf{X}\succeq \mathbf{0}} \quad \mu \|\mathbf{X}\|_* + \frac{1}{2}\|\mathbf{Y - X}\|_F^2 \ \ \ \ \ (9)$

obeys ${\mathbf{X} = \mathbf{U} \max\left(\mathbf{\Lambda} - \mu \mathbf{I}, \mathbf{0}\right)\mathbf{U}^\top}$, for eigenvalue decomposition of ${\widetilde{\mathbf{Y}} = \frac{1}{2}\left(\mathbf{Y} +\mathbf{Y}^\top\right)}$ as ${\widetilde{\mathbf{Y}} = \mathbf{U\Lambda U}^\top}$.

Proof: 1) Problem (7) is strongly convex and hence has a unique minimizer. It is not hard to see, by the UIP of Schatten norms, that

$\displaystyle \min_{\mathbf{X}} \quad \mu \|\mathbf{X}\|_* + \frac{1}{2}\|\mathbf{ Y - X}\|_F^2 \Longleftrightarrow \min_{\mathbf{X}} \quad \mu \|\mathbf{U}^\top\mathbf{XV}\|_* + \frac{1}{2}\|\mathbf{ \Lambda} - \mathbf{U}^\top \mathbf{XV}\|_F^2, \ \ \ \ \ (10)$

where the right-hand side is minimized only when ${\mathbf{\Sigma} = \mathbf{U}^\top \mathbf{XV}}$ is diagonal. The last assertion comes from the following facts: for any matrix ${\mathbf{A}}$ decomposed into its diagonal and off-diagonal parts, ${\mathbf{A = \Sigma + B}}$, we have ${\| \mathbf{\Sigma + B}\|_* \geq \|\mathbf{\Sigma}\|_*}$ (generalized from Lemma 13 in this paper; also known in quantum information theory as (a particular case of) the pinching inequality, see these slides), where equality does not require ${\mathbf{B} = \mathbf{0}}$ (consider ${\begin{bmatrix} 2 & 2 \\ 2& 2\end{bmatrix}}$, whose nuclear norm equals that of its diagonal part); and ${\| \mathbf{\Sigma + B}\|_F^2 = \|\mathbf{\Sigma}\|_F^2 + \|\mathbf{B}\|_F^2 \geq \|\mathbf{\Sigma}\|_F^2}$, where equality holds only when ${\mathbf{B = 0}}$. Together, the sum of the two terms is minimized only when ${\mathbf{B = 0}}$. So the problem reduces to seeking ${\mathbf{\Sigma}_\diamond = \mathop{\arg\min}_{\mathbf{\Sigma}} \, \mu \|\mathbf{\Sigma}\|_* + \frac{1}{2}\|\mathbf{ \Lambda} - \mathbf{\Sigma}\|_F^2}$ (solved by simple component-wise arguments, similar to soft-thresholding for the Lasso problem), upon which the optimal variable is ${\mathbf{X}_\diamond = \mathbf{U\Sigma}_\diamond \mathbf{V}^\top}$.
2) On top of the above, the key is to verify that ${\mathbf{X}\succeq \mathbf{0} \Longleftrightarrow \mathbf{U}^\top \mathbf{X}\mathbf{U} \succeq \mathbf{0}}$ for orthogonal ${\mathbf{U}}$ (convention here: “orthogonal” means ${\mathbf{U}^\top \mathbf{U} = \mathbf{U}\mathbf{U}^\top =\mathbf{I}}$); both directions follow easily from the definition of positive semidefiniteness.
3) Observing that ${\| \mathbf{Y -X} \|_F^2 = \|\mathbf{Y}^\top - \mathbf{X}\|_F^2}$ for symmetric ${\mathbf{X}}$ (implied by ${\mathbf{X}\succeq \mathbf{0}}$), one can easily verify that ${\|\mathbf{Y-X}\|_F^2 + \|\mathbf{Y}^\top - \mathbf{X} \|_F^2 = 2 \|\frac{\mathbf{Y} +\mathbf{Y}^\top}{2} - \mathbf{X}\|_F^2 + \frac{1}{2}\|\mathbf{Y} - \mathbf{Y}^\top\|_F^2}$, where the last term is constant in ${\mathbf{X}}$. So the objective is equivalent to ${\min_{\mathbf{X}\succeq \mathbf{0}} \; \mu \|\mathbf{X}\|_* + \frac{1}{2}\|\frac{\mathbf{Y} +\mathbf{Y}^\top}{2} - \mathbf{ X}\|_F^2}$. $\Box$

As mentioned, the above results are critical for devising proximal-point-type numerical algorithms (and also alternating direction methods) for a large family of optimization problems involving the nuclear norm. Recently, this paper on subspace segmentation took a nice look at a slightly more “interesting” closed-form solution (though the authors failed to verify the uniqueness of the solution before using the term “the optimal solution”, and their proof is also by construction, which is less favorable to us, as discussed).

Theorem 5 The optimal solution to the unconstrained optimization

$\displaystyle \min_{\mathbf{Z}} \quad \|\mathbf{Z}\|_* + \frac{\tau}{2}\|\mathbf{ X- XZ}\|_F^2 \ \ \ \ \ (11)$

is

$\displaystyle \mathbf{Z}_\diamond = \mathbf{V}\max\left(\mathbf{0}, \mathbf{I} - \frac{1}{\tau} \mathbf{\Lambda}^{-2}\right)\mathbf{V}^\top, \ \ \ \ \ (12)$

where ${\mathbf{X} = \mathbf{U\Lambda V}^\top}$ is the SVD of ${\mathbf{X}}$ and we have assumed ${\frac{1}{0} = \infty}$ and ${\infty^2 = \infty}$ for convenience. Furthermore ${\mathbf{Z}_\diamond \succeq \mathbf{0}}$.

Before moving to the actual proof, note that the form of ${\mathbf{Z}_\diamond}$ is reminiscent of the solutions to (2) and (3). Indeed, the optimization problems are closely related: the difference lies in whether the equality ${\mathbf{X = XZ}}$ is enforced through a hard constraint or a soft loss term. Besides, though ${f\left(\mathbf{Z}\right) = \| \mathbf{Z}\|_F^2}$ is strictly convex w.r.t. ${\mathbf{Z}}$, we can only show that ${g\left(\mathbf{Z}\right) = \| \mathbf{X-XZ} \|_F^2}$ is convex w.r.t. ${\mathbf{Z}}$, as follows. For ${0\leq \alpha \leq 1}$, we have

$\displaystyle \begin{array}{rcl} & & \| \mathbf{X}-\alpha \mathbf{X} \mathbf{Z}_1 - \left(1-\alpha\right)\mathbf{X} \mathbf{Z}_2 \|_F^2 \\ & = &\| \alpha \left( \mathbf{X}- \mathbf{X} \mathbf{Z}_1\right) + \left(1-\alpha\right) \left( \mathbf{X}- \mathbf{X} \mathbf{Z}_2\right) \|_F^2 \\ & = & \alpha^2 \| \mathbf{X}- \mathbf{X} \mathbf{Z}_1 \|_F^2 + \left(1-\alpha\right)^2 \| \mathbf{X}- \mathbf{X} \mathbf{Z}_2 \|_F^2 + 2\alpha \left(1-\alpha\right) \langle \mathbf{X}- \mathbf{X} \mathbf{Z}_1, \mathbf{X}- \mathbf{X} \mathbf{Z}_2 \rangle \\ & \leq & \alpha^2 \| \mathbf{X}- \mathbf{X} \mathbf{Z}_1 \|_F^2 + \left(1-\alpha\right)^2 \| \mathbf{X}- \mathbf{X} \mathbf{Z}_2 \|_F^2 + 2\alpha \left(1-\alpha\right) \| \mathbf{X}- \mathbf{X} \mathbf{Z}_1\|_F \| \mathbf{X}- \mathbf{X} \mathbf{Z}_2\|_F \\ & \leq & \alpha \| \mathbf{X}- \mathbf{X} \mathbf{Z}_1 \|_F^2 + \left(1-\alpha\right) \| \mathbf{X}- \mathbf{X} \mathbf{Z}_2 \|_F^2, \end{array}$

where the last inequality follows from the observation that ${\forall m, n \geq 0}$,

$\displaystyle \alpha^2 m^2 + \left(1-\alpha\right)^2 n^2 + 2\alpha\left(1-\alpha\right) mn -\alpha m^2 -\left(1-\alpha\right) n^2 = -\alpha(1-\alpha) \left(m-n\right)^2 \leq 0,$

and ${\langle \cdot, \cdot\rangle}$ denotes the matrix inner product. Notice that for the last inequality, equality holds only when ${m = n}$, or back to our matrix form, ${\| \mathbf{X}- \mathbf{X} \mathbf{Z}_1 \|_F = \| \mathbf{X}- \mathbf{X} \mathbf{Z}_2 \|_F}$. On the other hand, for the inner-product-to-norm-product inequality from line 3 to line 4, equality is attained only when ${\mathbf{X-XZ}_1 = \lambda \left( \mathbf{X-XZ}_2\right)}$ for some nonnegative constant ${\lambda}$, or when one of the two matrices is ${\mathbf{0}}$. Altogether, equality is achieved only when ${\mathbf{X-XZ}_1 = \mathbf{X-XZ}_2}$, i.e., ${\mathbf{XZ}_1 =\mathbf{XZ}_2}$. This provides an important hint on when ${g\left(\mathbf{Z}\right)}$ is strictly convex:

Proposition 6 For a given matrix ${\mathbf{X}}$, if ${\mathbf{XZ}_1 \neq\mathbf{XZ}_2}$ whenever ${\mathbf{Z}_1 \neq \mathbf{Z}_2}$, then ${g\left(\mathbf{Z}\right)}$ is strictly convex. In other words, ${g\left(\mathbf{Z}\right)}$ is strictly convex when ${\mathbf{X}}$ is injective, or equivalently, when ${\mathbf{X}}$ has full column rank.

So far it is clear that the objective in (11) is in general not strictly convex, and hence the optimization might not have a unique minimizer. The deferred proof below shows that, nevertheless, it does.

Proof: (sketch) We assume the full SVD of ${\mathbf{X}}$ as ${\mathbf{X} =\mathbf{U\Lambda V}^\top}$, implying ${\mathbf{UU}^\top = \mathbf{U}^\top \mathbf{U} = \mathbf{I}}$ and ${\mathbf{VV}^\top = \mathbf{V}^\top \mathbf{V} = \mathbf{I}}$. By the UIP of Schatten norms, it is easy to see that ${\|\mathbf{X-XZ}\|_F^2 =\|\mathbf{U}^\top \left(\mathbf{X-XZ}\right)\mathbf{V}\|_F^2 = \|\mathbf{\Lambda} - \mathbf{\Lambda V}^\top \mathbf{ZV}\|_F^2}$, and hence the objective can be written as ${ \|\mathbf{V}^\top \mathbf{Z} \mathbf{V}\|_*+ \frac{\tau}{2}\|\mathbf{\Lambda} - \mathbf{\Lambda V}^\top \mathbf{ZV}\|_F^2 = \|\mathbf{D}\|_*+ \frac{\tau}{2}\|\mathbf{\Lambda} - \mathbf{\Lambda D}\|_F^2}$ with the substitution ${\mathbf{D} = \mathbf{V}^\top \mathbf{ZV}}$. For the diagonal (but non-vanishing) ${\mathbf{\Lambda}}$ and positive ${\tau}$, it is not hard to see that ${ \|\mathbf{D}\|_*+ \frac{\tau}{2}\|\mathbf{\Lambda} - \mathbf{\Lambda D}\|_F^2}$ is minimized only when ${\mathbf{D}}$ is also diagonal (by the pinching inequality for the nuclear norm, as mentioned above, together with the corresponding property of the squared Frobenius norm). The diagonal elements can then be uniquely determined from the corresponding scalar optimization problems. For invertible ${\mathbf{V}}$, we must have ${\mathbf{Z} =\mathbf{VDV}^\top}$. Positive semidefiniteness follows since the diagonal matrix ${\mathbf{D}_\diamond = \max \left(\mathbf{0}, \mathbf{I} - \frac{1}{\tau}\mathbf{\Lambda}^{-2}\right) \geq \mathbf{0}}$, i.e., it is component-wise nonnegative.

Uniqueness. The uniqueness can be seen from two different angles. First, and perhaps simplest, note that the only source of ambiguity in ${\mathbf{Z}_\diamond}$ comes from the non-uniqueness of ${\mathbf{V}}$ in the SVD. But we know from the above argument that any minimizer has the form ${\mathbf{Z}_\diamond = \mathbf{MDM}^\top}$ (for clarity, we switch to ${\mathbf{M}}$ for a general orthogonal matrix). The form of ${\mathbf{D}_\diamond}$ indicates that only those singular vectors corresponding to singular values greater than ${1/\sqrt{\tau}}$ matter, i.e., ${\mathbf{Z}_\diamond = \sum_{\sigma_i > 1/\sqrt{\tau}} d_{\sigma_i} \mathbf{m}_i\mathbf{m}_i^\top = \mathbf{M}_\tau \mathbf{D}_\tau \mathbf{M}_\tau^{\top}}$, where we have partitioned the matrices according to the threshold. However, we must have ${\mathbf{M}_\tau = \mathbf{V}_\tau \mathbf{R}_\mathbf{M}}$ for a certain rotation matrix ${\mathbf{R}_\mathbf{M}}$ representing an in-plane rotation of the subspace basis ${\mathbf{V}_\tau}$, and ${\mathbf{Z}_\diamond = \mathbf{M}_\tau \mathbf{D}_\tau \mathbf{M}_\tau^{\top} = \mathbf{V}_\tau\left( \mathbf{R}_\mathbf{M} \mathbf{D}_\tau \mathbf{R}_\mathbf{M}^\top\right) \mathbf{V}_\tau^\top = \mathbf{V}_\tau\mathbf{D}_\tau \mathbf{V}_\tau^\top}$. The second way goes as follows. Suppose there were another minimizer of the form ${\mathbf{Z}_* = \widetilde{\mathbf{V}} \mathbf{D}_\diamond \widetilde{\mathbf{V}}^\top}$; we must have the perturbation relationship ${\widetilde{\mathbf{V}} \mathbf{D}_\diamond \widetilde{\mathbf{V}}^\top = \mathbf{V}\mathbf{D}_\diamond \mathbf{V}^\top + \mathbf{H} }$. It follows that

$\displaystyle \begin{array}{rcl} \|\mathbf{H}\|_F^2 & =& \| \widetilde{\mathbf{V}} \mathbf{D}_\diamond \widetilde{\mathbf{V}}^\top - \mathbf{V}\mathbf{D}_\diamond \mathbf{V}^\top\|_F^2 = \|\widetilde{\mathbf{V}} \mathbf{D}_\diamond \widetilde{\mathbf{V}}^\top\|_F^2 + \| \mathbf{V}\mathbf{D}_\diamond \mathbf{V}^\top\|_F^2 - 2\langle \widetilde{\mathbf{V}} \mathbf{D}_\diamond \widetilde{\mathbf{V}}^\top, \mathbf{V}\mathbf{D}_\diamond \mathbf{V}^\top\rangle \\ & = & 2\|\mathbf{D}_\diamond\|_F^2 - 2 \langle \mathbf{D}_\diamond, \mathbf{D}_\diamond \rangle = 0, \end{array}$

by which we conclude ${\mathbf{H} = \mathbf{0}}$. $\Box$

The above proof verifies Lemma 1 in this paper. Lemmas 2 and 3 therein build on the result of Lemma 1, and the uniqueness of the respective solutions can also be established by a perturbation argument similar to the one shown above.
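The closed-form solution can be sanity-checked numerically. The sketch below (my own verification code, not from any of the referenced papers) builds ${\mathbf{Z}_\diamond = \mathbf{V}\mathbf{D}_\diamond\mathbf{V}^\top}$ from the SVD of a random ${\mathbf{X}}$ and confirms that it is symmetric positive semi-definite and that no small perturbation decreases the objective, as convexity guarantees for a global minimizer.

```python
import numpy as np

def closed_form_Z(X, tau):
    """Closed-form minimizer of ||Z||_* + (tau/2)||X - XZ||_F^2:
    Z = V diag(d) V^T with d_i = max(0, 1 - 1/(tau * sigma_i^2))."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    d = np.zeros_like(s)
    pos = s > 1.0 / np.sqrt(tau)          # only singular values above 1/sqrt(tau) survive
    d[pos] = 1.0 - 1.0 / (tau * s[pos] ** 2)
    return Vt.T @ np.diag(d) @ Vt

def objective(X, Z, tau):
    return np.linalg.norm(Z, 'nuc') + 0.5 * tau * np.linalg.norm(X - X @ Z, 'fro') ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
tau = 2.0
Z_star = closed_form_Z(X, tau)

# Z_star is symmetric positive semi-definite, as the lemma asserts.
assert np.allclose(Z_star, Z_star.T)
assert np.all(np.linalg.eigvalsh(Z_star) >= -1e-10)

# By convexity, no perturbation of the global minimizer lowers the objective.
f_star = objective(X, Z_star, tau)
for _ in range(200):
    H = 1e-3 * rng.standard_normal(Z_star.shape)
    assert objective(X, Z_star + H, tau) >= f_star - 1e-12
```

Running the random-perturbation test many times is of course no proof, but it is a quick way to catch a sign or threshold error in the formula.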

— 3. Connection to other work and generalization —

Unitarily invariant norms used as regularizers in machine learning have recently been touched on in Ref [2], which focuses in particular on characterizing optimal solutions to such spectral regularization problems. Ref [1] shares a similar flavor and discusses unitarily invariant functions as well. These techniques draw heavily on matrix analysis, and hence the seminal paper Ref [3] proves extremely useful here. Besides, the recent monograph Ref [4] on matrix functions is also pertinent to this line of work (e.g., refer to this paper to see how some recent matrix cone programming problems are tackled).

$\displaystyle \spadesuit \quad \spadesuit \quad \spadesuit \quad \spadesuit \quad \spadesuit$

1. Lu, Z. and Zhang, Y. Penalty Decomposition Methods for Rank Minimization. arXiv preprint.
2. Argyriou, A., Micchelli, C.A., and Pontil, M. On Spectral Learning. Journal of Machine Learning Research, 11(Feb):935–953, 2010.
3. Lewis, A.S. The Convex Analysis of Unitarily Invariant Matrix Functions. Journal of Convex Analysis, 1995.
4. Higham, N.J. Functions of Matrices: Theory and Computation. SIAM, 2008.

[Update Aug 12 2011] Refer to this paper also for similar discussions. (Citation: Yu, Y.-L. and Schuurmans, D. Rank/Norm Regularization with Closed-Form Solutions: Application to Subspace Clustering. In UAI 2011.)

[Update Jan 25 2012] I have repeatedly encountered situations where the variational characterization of the nuclear norm could be useful. One such result, for example, is ${\|\mathbf{X}\|_* = \min_{\mathbf{X=UV}^\top} \|\mathbf{U}\|_F \|\mathbf{V}\|_F = \min_{\mathbf{X=UV}^\top} \frac{1}{2} \left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right)}$ (stated in this paper without proof). To see this is true, one can set ${\mathbf{U} = \mathbf{L\Sigma}^{1/2}}$ and ${\mathbf{V} = \mathbf{R\Sigma}^{1/2}}$, where ${\mathbf{X} = \mathbf{L\Sigma R}^\top}$ is the SVD of ${\mathbf{X}}$; this feasible factorization attains the value ${\|\mathbf{X}\|_*}$, suggesting ${\|\mathbf{X}\|_* \geq \min_{\mathbf{X=UV}^\top} \frac{1}{2} \left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right) \geq \min_{\mathbf{X=UV}^\top} \|\mathbf{U}\|_F \|\mathbf{V}\|_F}$ (the second inequality by AM-GM). On the other hand, the Cauchy-Schwarz inequality for unitarily invariant norms tells us ${ \|\mathbf{X}\|_* = \|\mathbf{UV}^\top\|_* \leq \|\mathbf{U}\|_F \|\mathbf{V}\|_F}$ for any factorization (see, e.g., Ex IV.2.7 in Matrix Analysis by Bhatia), which closes the chain of inequalities.
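The argument above can be illustrated numerically. The sketch below (my own check, with the inner factorization dimension fixed to the rank for simplicity) verifies that the SVD-based factorization ${\mathbf{U} = \mathbf{L\Sigma}^{1/2}}$, ${\mathbf{V} = \mathbf{R\Sigma}^{1/2}}$ attains the nuclear norm in both variational forms, while randomly re-parametrized factorizations ${\mathbf{X} = (\mathbf{UM})(\mathbf{VM}^{-\top})^\top}$ can only do worse, per Cauchy-Schwarz.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))

# SVD-based factorization X = U V^T with U = L S^{1/2}, V = R S^{1/2}.
L, s, Rt = np.linalg.svd(X, full_matrices=False)
U = L @ np.diag(np.sqrt(s))
V = Rt.T @ np.diag(np.sqrt(s))
assert np.allclose(U @ V.T, X)

nuc = s.sum()                                   # ||X||_* = sum of singular values
prod = np.linalg.norm(U) * np.linalg.norm(V)    # ||U||_F ||V||_F
avg = 0.5 * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)

# This particular factorization attains the minimum in both variational forms.
assert np.isclose(nuc, prod) and np.isclose(nuc, avg)

# Any other factorization X = U' V'^T does no better (Cauchy-Schwarz).
for _ in range(100):
    M = rng.standard_normal((4, 4))             # invertible with probability 1
    U2, V2 = U @ M, V @ np.linalg.inv(M).T      # still X = U2 @ V2.T
    assert np.linalg.norm(U2) * np.linalg.norm(V2) >= nuc - 1e-9
```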

## Action Analysis/Subspace Segmentation Updated; Event Video Dataset

I have just added the accepted papers from CVPR 2011 on action recognition and subspace segmentation. There's a noticeable blossom of papers on various aspects of action recognition, almost double the number accepted in CVPR 2010. While it's great to see people shifting their attention to this topic, I regret to say many papers are worth only a 30-second glimpse, period. And still, many authors have not released their papers in public places (for which I cannot add links, and I'm never willing to add links that lead to a paywall). My most reluctant response to this is to direct those authors to my blog article Nobody Cares about You and Your Paper.

And new challenges and opportunities always come with a new dataset in vision, especially when it is gigantic in size. In this regard, the VIRAT Video Dataset could be described as large-scale, for now.

## John Told Me …

… the thing that i’m trying to convince you of is threefold:

(i) in terms of ability and work ethic, you have everything you need to be successful doing theory research (or any other kind of research)  — I’m happy to hear that

(ii) if you want to be successful doing theory work, you’re going to have to modify somewhat the  standard that you set for yourself, when you judge whether you know something or not … you have to be willing not just to know about things, but to put in the effort to understand them down to the core … to demand complete clarity of yourself … a good test is whether you can teach the material, say to your labmates.

(iii) if you set that standard for yourself, and follow up on it, the progress that you make will be extremely rapid (probably much faster than you think) — the reason for this is that when you learn one thing thoroughly, you also learn about how to learn — you’ll eventually be able to pick up new ideas and understand them thoroughly much more quickly.

——————————————————–

Thanks to John for your encouragement!