

When I had to write formulas in the blog, I used to render them from LaTeX to images using websites such as Online LaTeX Equation Editor. The disadvantage is that you’ve got to write the equation, render it, save it to your hard disk and then upload it to the WordPress file manager. Now, thanks to an interesting plugin written by Steve Mayer, you can render formulas directly in WordPress just by typing the LaTeX commands between special tags.

\sum_{n=0}^{\infty}\frac{(-\phi^2)^n}{(2n)!}

It uses MimeTeX, a standalone program that renders LaTeX expressions directly into images without needing the entire TeX package or its fonts. That makes it a simple, lightweight and elegant solution for your websites or blogs.

D.

Golden Cup


Last week we attended Campus Party in Valencia. In the CampusBot area, several robot competitions took place throughout the week, and we had our line followers ready to fight. The results were very good: in the qualifying session on Tuesday our two robots set the two fastest times, and in the finals on Saturday (held at the Ciudad de las Artes y las Ciencias) we took the top two places on the podium!

Here are some videos:


Slayer S8 during our first training rounds:

Slayer S8 seen from a speed view, as if through an F1 camera :-), running at more than 2.5 m/s:

Slayer S8 in the semifinal round against the robot that eventually finished 4th:

It was a great week, and the robots performed very well on such a fast track. I might upload more media once I finish collecting all the videos and pictures from the event.

My colleague Alberto Calvo and I are already thinking about our next robot, which will have some kind of inertial control based on gyroscopes and accelerometers. I’ll keep the blog up to date.

Special thanks to our teammates and friends Luis-Ángel Gónzalez and Daniel de la Torre, who drove more than 800 km just to watch the final rounds and support us (well, and to have a nice paella by the beach). Thanks, guys!

More to come,
Daniel

I have worked on several platforms based on ARM cores: ARM7, ARM9 and XScale. The ARM architecture has been present in more than 2 billion embedded products over the last 10 years, ranging from cell phones to automotive braking systems. I think it is a great architecture for embedded computers, and I needed to learn it well to get the best out of it.

ARM System Developer’s Guide

The best book I have found on the subject is ‘ARM System Developer’s Guide’ by Andrew Sloss (ARM Inc.), Dominic Symes (ARM Ltd.) and Chris Wright (Ultimodule Inc.). It covers the ARM cores and the XScale processors. It also demonstrates how to implement DSP algorithms, and describes both the cache technologies that surround the ARM cores and efficient memory management techniques.

Among the tasks I’ve had to deal with, I’d single out artificial vision and vocoder implementation as the most computationally expensive.

When implementing and optimizing vocoders, I faced a number of digital signal processing issues that led me to write assembly code to get the best performance. I also had to pay close attention to cache usage, because good cache behavior can cut execution time dramatically. On top of that came avoiding pipeline stalls and making efficient use of the registers. The book covers all these topics with both theory and practical examples, and there’s a dedicated chapter on DSP that even includes ready-to-use source code.
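This minimal C sketch is not taken from the book. It illustrates why cache usage matters so much through the classic loop-order effect. C stores a 2D array row by row, so walking along a row touches consecutive addresses and reuses each cache line. Walking down a column instead jumps a whole row ahead on every access. The sizes `ROWS`/`COLS` and all function names are hypothetical, chosen only for illustration. Both sums are identical, and only the memory access pattern differs.

```c
/* Hypothetical matrix size, chosen only for illustration. */
#define ROWS 256
#define COLS 256

static int matrix[ROWS][COLS];

/* Fill the matrix with a known pattern: element [r][c] = r + c. */
void fill_matrix(void)
{
        for (int r = 0; r < ROWS; r++)
                for (int c = 0; c < COLS; c++)
                        matrix[r][c] = r + c;
}

/* Cache-friendly: the inner loop walks along a row, so consecutive
   iterations read adjacent addresses in the same cache line. */
long sum_row_major(void)
{
        long sum = 0;
        for (int r = 0; r < ROWS; r++)
                for (int c = 0; c < COLS; c++)
                        sum += matrix[r][c];
        return sum;
}

/* Cache-unfriendly: the inner loop walks down a column, so each
   access lands COLS * sizeof(int) bytes after the previous one. */
long sum_column_major(void)
{
        long sum = 0;
        for (int c = 0; c < COLS; c++)
                for (int r = 0; r < ROWS; r++)
                        sum += matrix[r][c];
        return sum;
}
```

Call `fill_matrix()` first, then time each sum on your target. The size of the gap depends on the cache size and line length of the particular core.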

Even if you don’t need to write assembly, you can learn how to write efficient C code for ARM. Here is a simple example that can speed up loop-intensive code:

int checksum(int *data)
{
        unsigned int i;
        int sum=0;
        for(i=0; i<64; i++)
        {
                sum+=(*data++);
        }
        return sum;
}

This compiles to:

        mov     r2,r0           ; r2=data
        mov     r0,#0           ; sum=0
        mov     r1,#0           ; i=0
checksum_loop
        ldr     r3,[r2],#4      ; r3 = *(data++)
        add     r1,r1,#1        ; i++
        cmp     r1,#0x40        ; compare i,64
        add     r0,r3,r0        ; sum+=r3
        bcc     checksum_loop   ; if(i<64) loop
        mov     pc,lr           ; return sum

The above code is not efficient, because each iteration spends three instructions on loop control: the ADD that increments i, the CMP against 64 and the conditional branch. We can do better by counting down instead:

int checksum_eff(int *data)
{
        unsigned int i;
        int sum=0;
        for(i=64; i!=0; i--)
        {
                sum += *(data++);
        }
        return sum;
}

As you can see, we simply rewrote the loop to count down instead of up. Let’s have a look at the compiled code:

        mov     r2,r0                   ; r2=data
        mov     r0,#0                   ; sum=0
        mov     r1,#0x40                ; i=64
checksum_2_loop
        ldr     r3,[r2],#4              ; r3=*(data++)
        subs    r1,r1,#1                ; i-- and set flags
        add     r0,r3,r0                ; sum+=r3
        bne     checksum_2_loop         ; if (i!=0) loop
        mov     pc,lr                   ; return sum

As you can see, the loop control is now handled entirely by the SUBS and BNE instructions. The comparison with zero comes for free, because SUBS sets the condition flags as a side effect of the subtraction. So on ARM, loops that count down to zero are more efficient than incrementing ones.
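The same count-down idea extends to buffers of any length. The sketch below is not from the original post, and `checksum_n`, `checksum_n_nonzero` and the parameter `n` are hypothetical names. Counting `n` down to zero keeps the exit test a comparison against zero, which is what allows the SUBS/BNE pair shown above. If the caller can guarantee `n > 0`, a do/while loop also drops the check before the first iteration. The code actually generated still depends on your compiler and options.

```c
/* Hypothetical helper: sums n ints starting at data. Safe for n == 0,
   because the for loop tests i != 0 before the first iteration. */
int checksum_n(const int *data, unsigned int n)
{
        int sum = 0;
        unsigned int i;

        for (i = n; i != 0; i--)
        {
                sum += *(data++);
        }
        return sum;
}

/* Hypothetical variant for callers that guarantee n > 0. The do/while
   form has no test before the first iteration, which typically lets
   the compiler emit just "decrement, branch if not zero" per pass.
   Passing n == 0 would wrap the counter and read past the buffer. */
int checksum_n_nonzero(const int *data, unsigned int n)
{
        int sum = 0;

        do
        {
                sum += *(data++);
        } while (--n != 0);
        return sum;
}
```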

Now let’s look at how one might write the same function in C without taking efficiency into account:

int checksum(int *data)
{
        char i;
        int sum=0;
        for(i=0; i<64; i++)
        {
                sum += data[i];
        }
        return sum;
}

And the compiler output:

        mov     r2,r0                   ;r2=data
        mov     r0,#0                   ;sum=0
        mov     r1,#0                   ;i=0
checksum_loop
        ldr     r3,[r2,r1,lsl #2]       ;r3=data[i]
        add     r1,r1,#1                ;r1=i+1
        and     r1,r1,#0xff             ;i=(char)i
        cmp     r1,#0x40                ;compare i and 64
        add     r0,r3,r0                ;sum+=r3
        bcc     checksum_loop           ;if(i<64) loop
        mov     pc,lr                   ;return sum

At first, one may think that declaring i as a char uses less register space, or less space on the ARM stack, than an int. On ARM both assumptions are wrong: registers are 32 bits wide and stack entries are at least 32 bits, so a char saves nothing. Worse, the compiler must keep i within the range of a char. That is why the output includes an AND with 0xFF after every increment, and that extra instruction slows the loop down without saving any space. So our checksum_eff function is noticeably faster, with very little effort, just from knowing a bit about the ARM architecture.
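Why can’t the compiler simply drop that AND? C defines an `unsigned char` counter to wrap modulo 256 (plain `char` is normally unsigned on ARM), but on ARM the value lives in a 32-bit register. The compiler therefore has to truncate it after every increment. The minimal sketch below models this. The function names are hypothetical: `char_increment` is what C requires, and `register_increment` is what the ADD + AND #0xFF pair in the listing above computes.

```c
#include <stdint.h>

/* What C requires: an unsigned char counter wraps from 255 back to 0. */
unsigned char char_increment(unsigned char i)
{
        i++;    /* the result is reduced modulo 256 on assignment */
        return i;
}

/* What the compiled code does with i held in a 32-bit register:
   ADD r1,r1,#1 followed by AND r1,r1,#0xff. */
uint32_t register_increment(uint32_t r1)
{
        r1 = r1 + 1;
        r1 = r1 & 0xFF;
        return r1;
}
```

The two functions agree for every value from 0 to 255. An int counter needs no such truncation, so the extra instruction disappears.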

When optimizing, rewriting the C functions should come first, before digging down into assembly. Take special care with functions that have nested loops or a large number of iterations. A profiling tool such as Intel VTune Performance Analyzer is also useful to check whether all our optimizations really are optimizations, and how much they have sped up execution.
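Before reaching for a full profiler, you can run a quick relative check with the standard C `clock()` function. This sketch is not a replacement for VTune. `time_checksum`, `checksum_fn`, `BUFFER_LEN` and the repeat count are all hypothetical. The result is CPU time in seconds, and on small embedded targets `clock()` may have coarse resolution or be unavailable, so only compare numbers measured the same way.

```c
#include <time.h>

#define BUFFER_LEN 64   /* matches the 64-element loops in this post */

/* The count-down checksum from this post, repeated so the sketch
   compiles on its own. */
int checksum_eff(int *data)
{
        unsigned int i;
        int sum = 0;
        for (i = BUFFER_LEN; i != 0; i--)
        {
                sum += *(data++);
        }
        return sum;
}

/* Any checksum variant with this signature can be timed. */
typedef int (*checksum_fn)(int *data);

/* Hypothetical helper: calls fn(data) `repeats` times and returns the
   CPU time spent, in seconds, as measured by clock(). Results are
   accumulated into *total (unsigned, so wraparound is well defined)
   to make it harder for the compiler to optimize the calls away. */
double time_checksum(checksum_fn fn, int *data, unsigned long repeats,
                     unsigned int *total)
{
        unsigned int acc = 0;
        unsigned long r;
        clock_t start = clock();

        for (r = repeats; r != 0; r--)
        {
                acc += (unsigned int)fn(data);
        }
        *total = acc;
        return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Time each variant with the same buffer and repeat count, and look at the ratio between them rather than the absolute figures.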

Hope you found this article useful,

Daniel
