We are interested in solving:
\[\hat{x} = \text{argmin}_{x} \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1.\]Denote the objective function as $f(x) = \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1$. We will derive an algorithm for minimizing it.
Let $A = I$. Then the problem has the objective:
\[f(x) = \frac{1}{2} \|b - x\|_2^2 + \lambda \|x\|_1 = \sum_{i=1}^n \left( \frac{1}{2} |b_i - x_i|^2 + \lambda |x_i| \right).\]Since the sum separates across coordinates, it suffices to minimize each scalar term, which has the form:
\(g(\tau) = \frac{1}{2} |\gamma - \tau|^2 + \lambda |\tau|, \qquad \gamma \in \mathbb{R}.\) We compute the subdifferential
\[\partial g(\tau) = \tau - \gamma + \lambda \partial|\tau|,\]where
\[\partial|\tau| = \begin{cases} -1 & \text{for } \tau < 0, \\ 1 & \text{for } \tau > 0, \\ [-1, 1] & \text{for } \tau = 0, \end{cases}\]which follows directly from the definition: away from $0$ the function $|\tau|$ is differentiable, and $v \in \partial|0|$ means $|s| \geq vs$ for all $s \in \mathbb{R}$, which holds exactly when $|v| \leq 1$. Since $g$ is strictly convex, it has a unique minimizer $\tau^*$ (for each $\lambda, \gamma$), at which we must have
\[0 \in \partial g(\tau^*).\]We can now compute an expression for the optimal $\tau^*$ as a function of $\lambda, \gamma$. If $\tau^* > 0$, then $\partial g(\tau^*) = \{\tau^* - \gamma + \lambda\}$, so $0 \in \partial g(\tau^*)$ implies that
\[\tau^* = \gamma - \lambda.\]Since $\tau^* > 0$, this can only hold when $\gamma > \lambda$; including the boundary point $\gamma = \lambda$ (where the formula gives $\tau^* = 0$), we have determined
\[\tau^* = \gamma - \lambda \quad \text{on} \quad [\lambda, \infty).\]Similarly, for $\tau^* < 0$ we compute that
\[\tau^* = \gamma + \lambda \quad \text{on} \quad (-\infty, -\lambda].\]Finally, we have the case $\tau^* = 0$, which implies that
\[0 \in -\gamma + \lambda [-1, 1],\]and hence $\tau^* = 0$ precisely when $\gamma \in [-\lambda, \lambda]$.
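For instance, with $\lambda = 1$: $\gamma = 3$ gives $\tau^* = 2$, $\gamma = -3$ gives $\tau^* = -2$, and any $\gamma \in [-1, 1]$ gives $\tau^* = 0$.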
Taken together, the three cases define $\tau^*$ as a piecewise linear function of $\gamma$. Another way of representing this is
\[\tau^* = ST_\lambda(\gamma).\]
Definition 1. The soft-thresholding function is
\[ST_\lambda(\gamma) := \text{sgn}(\gamma) \max(|\gamma| - \lambda, 0).\]Hence, for $A = I$, we have characterized the optimal $\hat{x}$ as having coordinates $\hat{x}_i = ST_\lambda(b_i)$.
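A minimal NumPy sketch of this closed-form solution (function and variable names are our choices, not from the source):

```python
import numpy as np

def soft_threshold(v, mu):
    """Soft-thresholding ST_mu(v), applied elementwise:
    sgn(v) * max(|v| - mu, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

# For A = I, the optimal x is coordinatewise soft-thresholding of b.
b = np.array([3.0, -0.5, 1.2, -4.0])
lam = 1.0
x_hat = soft_threshold(b, lam)  # approx. [ 2., -0., 0.2, -3. ]
```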
We now derive the algorithm for a general matrix $A$. Introduce the auxiliary function
\[h(x, x^{(0)}) = \frac{c}{2} \|x - x^{(0)}\|_2^2 - \frac{1}{2} \|Ax - Ax^{(0)}\|_2^2 = \frac{1}{2} (x - x^{(0)})^\top (cI - A^\top A)(x - x^{(0)}).\]For this function to be strictly convex in $x$, we need $cI - A^\top A \succ 0$, i.e.,
\[c > \|A^\top A\|_2 = \lambda_{\max}(A^\top A).\]Define the new objective function:
\[\tilde{f}(x, x^{(0)}) = f(x) + h(x, x^{(0)}) = \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1 + \frac{c}{2} \|x - x^{(0)}\|_2^2 - \frac{1}{2} \|Ax - Ax^{(0)}\|_2^2.\]Our goal will be to reduce this to something resembling the case with $A=I$, so we introduce the variable:
\[\nu_0 = x^{(0)} + \frac{1}{c} A^\top (b - Ax^{(0)}).\]Then, by some algebra, we have:
\[\tilde{f}(x, x^{(0)}) = \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1 + \frac{c}{2} \|x - x^{(0)}\|_2^2 - \frac{1}{2} \|Ax - Ax^{(0)}\|_2^2 = C + \lambda \|x\|_1 + \frac{c}{2} \|x - \nu_0\|_2^2,\]where $C$ does not depend on $x$ (it depends only on $b$ and $x^{(0)}$), and so does not affect the minimization over $x$.
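To spell out the algebra: expanding the squared norms, the quadratic terms $\frac{1}{2} x^\top A^\top A x$ (from $\frac{1}{2}\|b - Ax\|_2^2$) and $-\frac{1}{2} x^\top A^\top A x$ (from $-\frac{1}{2}\|Ax - Ax^{(0)}\|_2^2$) cancel, and the $x$-dependent terms collect into
\[\frac{c}{2} \|x\|_2^2 - c\, x^\top \nu_0 + \lambda \|x\|_1 = \frac{c}{2} \|x - \nu_0\|_2^2 + \lambda \|x\|_1 - \frac{c}{2} \|\nu_0\|_2^2.\]From the $A = I$ case, we see that this is minimized by: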
\[x^{(0),*} = ST_{\frac{\lambda}{c}}(\nu_0) = ST_{\frac{\lambda}{c}} \left( x^{(0)} + \frac{1}{c} A^\top (b - Ax^{(0)}) \right).\]Thus,
\[x^{(0),*} = \text{argmin}_x \tilde{f}(x, x^{(0)}) = \text{argmin}_x \left( f(x) + h(x, x^{(0)}) \right).\]Since $h(x, x^{(0)}) \geq 0$ (by the condition on $c$) with equality at $x = x^{(0)}$, each such minimization cannot increase $f$: indeed, $f(x^{(0),*}) \leq \tilde{f}(x^{(0),*}, x^{(0)}) \leq \tilde{f}(x^{(0)}, x^{(0)}) = f(x^{(0)})$. This yields the proximal algorithm for this objective function and regularization:
\[x^{(i+1)} = ST_{\frac{\lambda}{c}} \left( x^{(i)} + \frac{1}{c} A^\top (b - Ax^{(i)}) \right).\]This is ISTA, the iterative shrinkage-thresholding algorithm.
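A minimal, self-contained sketch of the iteration (the iteration count, the $1.01$ safety factor on $c$, and all names are our choices, not from the source):

```python
import numpy as np

def soft_threshold(v, mu):
    # ST_mu(v), elementwise, as defined above.
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def ista(A, b, lam, n_iter=500):
    """ISTA: x <- ST_{lam/c}(x + (1/c) A^T (b - A x)),
    with c chosen just above lambda_max(A^T A) = ||A||_2^2."""
    c = 1.01 * np.linalg.norm(A, 2) ** 2  # squared spectral norm, plus margin
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (b - A @ x) / c, lam / c)
    return x

# Example: recover a sparse vector from noisy linear measurements (synthetic).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = ista(A, b, lam=0.1)
```

More generally, we can change the $\ell^1$ regularization to get iterates of the form: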
\[x^{(i+1)} = T_{\frac{\lambda}{c}} \left( x^{(i)} + \frac{1}{c} A^\top (b - Ax^{(i)}) \right),\]where $T$ plays the role of $ST$: the thresholding (proximal) operator associated with the new regularizer.
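As one illustration (our example; the source does not single out a particular $T$): replacing the $\ell^1$ penalty $\lambda \|x\|_1$ with the $\ell^0$ penalty $\lambda \|x\|_0$ makes $T_{\frac{\lambda}{c}}$ the hard-thresholding operator, which zeroes out entries with $|v_i| \leq \sqrt{2\lambda/c}$, giving iterative hard thresholding:

```python
import numpy as np

def hard_threshold(v, mu):
    # Proximal map of mu * ||.||_0: keep v_i if |v_i| > sqrt(2 mu), else 0.
    return np.where(np.abs(v) > np.sqrt(2.0 * mu), v, 0.0)

def iht(A, b, lam, n_iter=500):
    """Same iteration as ista(), with hard_threshold in place of soft_threshold."""
    c = 1.01 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (b - A @ x) / c, lam / c)
    return x
```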