Category Archives: Matlab

Matlab interface with JSON data

Interfacing with different types of data is important, and JSON (for whatever reason) seems to be a popular format. For the past couple of years we have used JSONLab from the MathWorks File Exchange, like many other folks out there, and yes, it is good!

However, one problem with the JSON format is that it repeats the key (as in the key-value concept) for every element and is thus quite redundant. This also makes JSON files bulky, and complicated, deeply nested JSON files (lists of dictionaries of lists of dictionaries, you get the idea…) take forever to parse.

Thanks to Christian Panton, there is now a matlab-json repository on GitHub which uses the JSON C/C++ libraries, compiles into a MEX file, and thus makes reading a JSON file very fast.

All timings are averages of 10 runs unless mentioned otherwise, and include the time taken to open and read the JSON file (both use fscanf in Matlab). For a relatively small (9.3 KB) JSON file containing a simple structure array, JSONLab takes roughly 220 milliseconds, while matlab-json takes 9.6 milliseconds. That’s a massive improvement even for a small file. For a 9.0 MB file of nesting depth 2 (a structure array of structures), JSONLab takes 4869 seconds! (that’s about 1h20min, computed on a single run) while matlab-json takes just 1.54 seconds.

The repository includes a binary for Windows (which I haven’t tested) and is easy to compile on Linux. Just install libjsoncpp-dev and libjson-c-dev and you are good to go with make.m. To maintain compatibility with JSONLab’s “loadjson”, which can directly take a filename as input, you may need to wrap matlab-json’s “fromjson” to try reading from a file first.
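As a minimal sketch of such a wrapper (the function name and fallback logic here are my own, not part of the matlab-json repository):

```matlab
function data = loadjson_fast(input)
% LOADJSON_FAST  Wrapper around matlab-json's fromjson that also accepts
% a filename, mimicking JSONLab's loadjson interface.
    if exist(input, 'file')
        % Input is a filename: read the whole file into a string first.
        fid = fopen(input, 'r');
        str = fscanf(fid, '%c');
        fclose(fid);
    else
        % Otherwise assume the input is already a JSON string.
        str = input;
    end
    data = fromjson(str);
end
```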

You can now use big and ugly JSON files with your Matlab.


Generate Positive and Negative Pairs given Identities

This post is about generating positive pairs of features (in-class data points) and negative pairs (out-of-class data points) for tasks such as Metric Learning or simply learning classifiers. We just need 7 lines of Matlab code to generate these pairs by relying on the concept of lower / upper triangular matrices and block diagonal matrices.

So here’s the code! Given a list of ground-truth identities ids, we first generate the unique identities uids, just to know the number of classes. For each class, we count the number of data points that belong to it, sum(uids(k) == ids), and generate a matrix of all ones.

To find negative pairs, we create a block-diagonal matrix from the all-ones matrices of each class, and then logically negate it to get a 1 at the positions of out-of-class pairs and 0 at in-class pairs. Taking the lower triangular part after this simply removes the duplicate pairs: tril(~blkdiag(onesmats{:})).

For the positive pairs, we need a 1 at the intra-class positions; to remove duplicates we again use the upper triangular matrix and negate it, thus keeping a lower triangular part without the diagonal πŸ˜‰ ~triu(onesmats{k}). All pairs can then be found by directly concatenating the per-class matrices as block diagonals.

uids = unique(ids);                        % unique class labels
onesmats = cell(1, length(uids));
pospairmat = cell(1, length(uids));
for k = 1:length(uids)
    onesmats{k} = ones(sum(uids(k) == ids));  % all-ones block for class k
    pospairmat{k} = ~triu(onesmats{k});       % strictly lower triangular: unique in-class pairs
end
[pospairs(:, 1), pospairs(:, 2)] = find(blkdiag(pospairmat{:}));
[negpairs(:, 1), negpairs(:, 2)] = find(tril(~blkdiag(onesmats{:})));

Finally, a simple [x, y] = find() gives the row and column indices of the pairs. Suggestions for shorter, faster and smarter techniques are more than welcome in the comments.

PS: The code assumes that the ids are sorted, i.e. features of the same class occur next to each other. If that’s not the case for you, it’s easy to fix by running a sort and re-indexing the feature vectors.
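The sort-and-reindex step mentioned in the PS could look like this (the labels and feature matrix are made-up examples):

```matlab
% Hypothetical example: 2 classes with unsorted labels.
ids = [1; 2; 1; 2; 2];
feats = rand(5, 10);          % one feature vector per row

% Sort labels so same-class features are contiguous,
% and reorder the feature rows accordingly.
[ids, order] = sort(ids);
feats = feats(order, :);
```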

Clustering and Matlab

Clustering data is a useful technique for compact representation (vector quantization), statistics (mean, variance of a group of data) and pattern recognition (unsupervised classification). In this post, we shall briefly see the two major types of clustering techniques, and then look at how easily Matlab deals with them.

One class of techniques is hierarchical, usually agglomerative, clustering. As the name suggests, agglomerative clustering involves successively merging the data points “nearest” to each other. Nearness is defined by a pre-defined distance measure (usually Euclidean). A tree structure is built as we move from each data point being its own cluster to a single-cluster system. The user then has to decide at which point the clustering process should stop. For example, one could stop when the next agglomeration would combine clusters some distance x or more apart. The other way would be to define a maximum-clusters criterion; however, as we shall see further on, that sort of defeats the purpose of hierarchical clustering.

In Matlab,
T = clusterdata(X,'cutoffType',cutoffThreshold) does all the clustering work and returns the cluster classes in T. However more insight can be obtained by performing each task individually.

  1. Y = pdist(X,'type') computes distance between data points
  2. Z = linkage(Y,'average') gets the tree linkages matrix
  3. T = cluster(Z,'cutoffType',cutoffThreshold) does the actual clustering

The major advantage of breaking up the steps is the ability to view awesome graphs and assess the goodness of your clustering. dendrogram(Z,'colorThreshold','default') shows the tree structure with each cluster getting a unique colour. One should also try out the silhouette() and cophenet() functions.
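Putting the three steps together on some toy data (the cutoff value of 2 below is an arbitrary choice for illustration):

```matlab
% Toy data: three mean-shifted Gaussian clumps in 1-D.
X = [-4+randn(100,1); randn(100,1); 4+randn(100,1)];

Y = pdist(X, 'euclidean');    % 1. pairwise distances
Z = linkage(Y, 'average');    % 2. average-linkage tree
T = cluster(Z, 'cutoff', 2, 'criterion', 'distance');  % 3. cut the tree

dendrogram(Z, 'colorThreshold', 'default');  % visualise the cluster tree
```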

The other major group is the standard partitional clustering. The classical algorithm is K-Means, where K clusters are formed. Here the user fixes the number of clusters at the start. Random initial cluster centers are picked (usually from the data points). The iterative process then involves assigning the “nearest” points to each cluster center, followed by an update of the cluster center itself. This is repeated until convergence, usually until the cluster assignments no longer change (if the data size is tractable).

In Matlab, the command [T, C] = kmeans(X,k) clusters the data points X into k groups, returning the cluster assignment T and the centers C. More info in the Matlab documentation for kmeans.

The difference between the two types is obvious. Partitional clustering requires defining the number of clusters, while the hierarchical method needs the user to specify some sort of cutoff threshold to stop the agglomeration process. One needs to decide which is the more suitable criterion for a given problem.

PS: You can try out the functions by creating some data on your own by using mean-shifted Gaussians.
X = [-2+randn(100,1); randn(100,1); 2+randn(100,1)];
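For instance, running k-means on this toy data (k = 3 matches the three Gaussians we generated):

```matlab
% Three mean-shifted Gaussian clumps, as above.
X = [-2+randn(100,1); randn(100,1); 2+randn(100,1)];

% T: cluster assignment per point, C: the 3 cluster centers.
[T, C] = kmeans(X, 3);
```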

bsxfun

Just found this awesome function in Matlab called bsxfun. It is rather helpful, and fast too, as illustrated below. Usage can be found in the Matlab documentation (of course!) or in the example below. It is called in a way similar to the old (deprecated) blkproc (now blockproc) function, with a function handle as an argument: the “@fun” part.

The example here computes a difference: B is subtracted from each row of A.

>> A = rand(1000,3);
>> B = rand(1,3);
>> tic; C1 = bsxfun(@minus,A,B); toc;
Elapsed time is 0.003523 seconds.
>> tic; C2 = A - ones(1000,1)*B; toc;
Elapsed time is 0.090703 seconds.

Note that it performs the subtraction along the columns (B has length 3, matching the 3 columns of A). If you instead specify B = rand(1000,1), it will perform the subtraction along the rows (length 1000), subtracting the column vector from each column of A.

Another way to achieve the same operation without a loop would be repmat, but it is the slowest of them all. Of course, you should NOT even think of using a loop for such a thing in Matlab!
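For completeness, the repmat variant looks like this (timings will vary by machine):

```matlab
A = rand(1000, 3);
B = rand(1, 3);

% Tile B into a 1000-by-3 matrix, then subtract element-wise.
tic; C3 = A - repmat(B, 1000, 1); toc;
```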

Communication with Matlab

As instruments get smarter and interfaces grow, it’s crucial to be able to link your software (Matlab) with hardware, other software, or even other computers on the network. Matlab provides this via the Instrument Control Toolbox, which allows the creation of TCP/IP or UDP objects for communication.

Of course, as usual, the Matlab documentation – Controlling Instruments using TCP/IP or UDP – is awesome, but here is a brief overview. In general, I would recommend TCP/IP as it’s the more “reliable” form of communication. UDP doesn’t check whether the receiver is ready, so data delivery is not guaranteed.

One starts by creating a TCP/IP object. This can be done as easily as t = tcpip('RemoteHost',RemotePort); Note that you need to know the correct port number (for instruments, check the manual). For computer-to-computer communication, the port is determined by the software you want to talk to; if you just want to send some message across, both source and destination can agree on any unused port. RemoteHost is best given as an IP address (domain names work too, but at times need to be defined in your “hosts” file).

The next important step is to open and test the connection. fopen(t); does it for you. An error here indicates something is wrong, possibly a firewall issue.

Now to the most important part: writing and reading data through this object. fprintf(t,'Data_in_string_format'); sends the data, and an fscanf(t) at the receiving end reads it back. The fscanf(t) has a timeout, so the process doesn’t hang permanently if something goes wrong. An easy check in case of problems is get(t,'Status'), which should say “open”. Typically, strfind(string,'pattern'); is what you would use to parse the received data.

It’s important to properly close all connections before using the clear command. This is done with fclose(t); delete(t); If this isn’t done, the object remains open but can no longer be addressed, and things get complicated. It may then require a Matlab restart for fopen(t) to work again.
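A minimal end-to-end sketch of the steps above (the host, port and message are placeholders, and a listener is assumed on the other side):

```matlab
% Create the object, open the connection, exchange data, clean up.
t = tcpip('192.168.0.10', 30000);   % placeholder host and port
t.Timeout = 5;                      % seconds before fscanf gives up

fopen(t);                           % errors here often mean firewall/port issues
fprintf(t, 'hello instrument');     % send a string
reply = fscanf(t);                  % blocking read with the timeout above

fclose(t);                          % always close before clearing
delete(t);
clear t
```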

Mex Files – 2

So, continuing from the previous post (1), we shall see how to start writing Mex code by considering a simple example: a uniform quantizer. The setup is assumed ready and done.

First and foremost, the filename should be “function_name.c” and not “.m”. Once compiled, it’s this function_name that you will call from your other simulation code. Instead of the usual includes, we now use the “mex.h” header file. Note that other header files may also be used, as this example will show with “math.h” for the round() function.

Then, instead of the standard void main(), we start with a special function called mexFunction(), used as follows. Note that mexFunction is the generic entry point used by all mex files; it’s not the name of the function you want to write. So the first few lines look like this

#include "mex.h"
#include "math.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

nlhs and nrhs are the numbers of output and input arguments, and plhs and prhs are arrays of Matlab-compatible pointers on the output (left-hand) and input (right-hand) side respectively. All this seems quite complicated at first, but it’s not once you actually write one such code. Just one!

So, we now need to obtain the location of the data, which is passed on to the function during the call. This is done by using the function mxGetPr(prhs[i]). For example

double *X, *nbits, *Xmax;
X = mxGetPr(prhs[0]);
nbits = mxGetPr(prhs[1]);
Xmax = mxGetPr(prhs[2]);

As this indicates, the vector to be quantized to nbits (2nd input variable) is X, the first input variable; and the maximum quantization value is Xmax, the third input variable.

More information about the dimensions of the vector or matrix can be found with functions like Xdim = (int)mxGetM(prhs[0]) for the number of rows and mxGetN() for the number of columns.

In case you need to create any temporary variables, typically mxMalloc() is used. However, be sure to free the memory at the end of the program with mxFree(); we will cover this in the next post. Now we need to allocate memory for the output arguments and assign them to pointers. Note the usage of mxCreateDoubleMatrix().

double *y, *step;
plhs[0] = mxCreateDoubleMatrix(Xdim, 1, mxREAL);
y = mxGetPr(plhs[0]);
plhs[1] = mxCreateDoubleMatrix(1, 1, mxREAL);
step = mxGetPr(plhs[1]);

where our first output argument y is the quantized vector, and the second is the computed step size (from nbits and Xmax). The easy part is that the function is always mxCreateDoubleMatrix(), with only the dimensions changing. Real or complex is indicated with mxREAL or mxCOMPLEX.

Once this is done, the rest of the code is like any other C code using pointers. One word of caution would be to take care of the dimensions, not to confuse between rows and columns, as this could mess up your code and take a long time to debug.
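Once the file is written, compiling and calling it from Matlab is straightforward (the filename uniformquant.c is a placeholder for whatever you named the mex source, and the argument order follows the prhs layout above):

```matlab
mex uniformquant.c                 % compiles to a mex binary in the current folder

X = randn(1000, 1);                % signal to quantize
[y, step] = uniformquant(X, 8, 1); % quantize to 8 bits with max value 1
```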

Try it out, and notice the speed-up! There are many more points, but this post is too long already πŸ™‚ More on this some time later; at this point, the reader has sufficient knowledge to write simple mex codes on their own. Find the code here.

Symbolic Math Toolbox – Matlab

Your scientific calculator can do definite integration, differentiation, etc. Maybe higher-end graphing calculators can do indefinite stuff too! However, I just found this very interesting ability to compute with a math symbol, typically x, which so far I knew only from Mathematica. This is the Symbolic Math Toolbox, and its uses are numerous.

Although I am sure it still requires a lot of development, especially compatibility with the other data types that Matlab supports, for starters it seems like a really nice feature. Here is an example where I used it. Consider a simple 3×3 symmetric matrix whose off-diagonal corners are equal and unknown. You want a value for this unknown such that the determinant of the matrix goes to 0. Manually, yes, it’s easy: it’s a simple quadratic equation and can be solved. However, now consider a 20×20 matrix πŸ™‚ It’s terrible to compute its determinant, and then there is still an equation to solve. What the Symbolic Math Toolbox allows you to do is define a variable, say z. In my case it would then be

syms z
Rxx = [1, 0.9, 0.8, ... z; ...; ...; z, ..., 0.8, 0.9, 1];
solve(det(Rxx));

and you have the possible solutions for this variable z. Note the usage of the function solve(), which is typically used to solve a polynomial equation for a variable.
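For the 3×3 case mentioned above, a complete (illustrative) version would be, with the 0.9 entries chosen arbitrarily:

```matlab
syms z
% 3-by-3 symmetric matrix with unknown corners z.
Rxx = [1,   0.9, z;
       0.9, 1,   0.9;
       z,   0.9, 1];

% Values of z that make the determinant vanish (singular matrix).
sol = solve(det(Rxx), z);
```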

It also does all the standard integration, differentiation, and even Taylor series expansion πŸ™‚ and has a lot of applications. I now want to check it out for the basic transforms – Fourier, Laplace, Z, etc. Best to refer to the Matlab documentation on the toolbox for more information and examples. I am really loving this combination of the power of “indefinite” math with powerful numeric functions.