krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Bare Metal Examples

Mon Mar 04, 2013 8:08 pm

I have set up a GitHub repository for all my complete R-Pi bare metal projects:
https://github.com/PeterLemon/RaspberryPi

This includes minimal Hello World CPU & DMA Demos, Tags Channel Demo, VFP Fractal Demos, NES & SNES Input Demos & Sound Demos.

Screenshots of Input Demos:
https://github.com/PeterLemon/Raspberry ... roller.png
https://github.com/PeterLemon/Raspberry ... roller.png

I have more to come:
N64,PSX,PS2 Controller & SNES Mouse input.
VFP software 3D engine.
& some other surprises!!
This should be a nice start...

I hope this helps people out =D

tufty
Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: Bare Metal Examples

Mon Mar 04, 2013 8:36 pm

Nice

DexOS
Posts: 876
Joined: Wed May 16, 2012 6:32 pm
Contact: Website

Re: Bare Metal Examples

Mon Mar 04, 2013 10:45 pm

Great work krom and all using FasmArm 8-) .
Batteries not included, Some assembly required.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Tue Apr 16, 2013 3:06 pm

Cheers Dex & tufty =D

I have updated my GitHub repository with new DMA & DREQ Sound demos. This is an easy way to create a sound buffer that plays & loops in the background using DMA.
Also I have enabled L1 Cache in my VFP/Fractal demos, speeding them up =D
Finally I did a few cleanups to my LIB files (mainly DMA), and updated all demos accordingly.
https://github.com/PeterLemon/RaspberryPi

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Tue Oct 29, 2013 8:05 am

Hi everyone, sorry for the lack of updates recently...

I have provided examples of LZ77 & Huffman decompression, compatible with the same standards used by
Nintendo GBA/DS bios decompression functions:
https://github.com/PeterLemon/Raspberry ... r/Compress

There is a simple minimal demo & a GFX demo for both LZ77 & Huffman.
The minimal demos decompress the Raspberry Pi logo to RAM.
The GFX demos decompress the Raspberry Pi logo to the screen.

The best LZ77/Huffman compressor I have found is called "Nintendo DS/GBA Compressors" by CUE.
You can find it here: http://www.romhacking.net/utilities/826/
It also includes full CPP source code for all of its compressors.

Here is a URL to the best explanation of LZ77/Huffman decompression I have found:
http://nocash.emubase.de/gbatek.htm#bio ... nfunctions

P.S The variant of LZ77 on GBA/NDS is sometimes called LZSS hence the naming scheme in this util,
but you will find that it is indeed LZ77 compatible data.

Huffman:
Command line used to compress data: "huffman -e8 RaspiLogo24BPP.bin"
Original data size: RaspiLogo24BPP.bin = 921600 bytes
Compressed data size: RaspiLogo24BPP.huff = 241984 bytes

LZ77:
Command line used to compress data: "lzss -ewo RaspiLogo24BPP.bin"
Original data size: RaspiLogo24BPP.bin = 921600 bytes
Compressed data size: RaspiLogo24BPP.lz = 135168 bytes

Many thanks to my friend Andy Smith, who helped me understand the Huffman decoding =D
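
For readers who want to see the LZ77 variant in a high-level language, here is a minimal Python sketch of a decompressor for the type-1 (LZ77) stream layout described in the GBATEK document linked above: a header byte whose upper nibble is 1, a 24-bit little-endian decompressed size, then flag bytes read MSB first. This is not the code from the repository (which is ARM assembly); the function name `lz77_decompress` is made up for illustration, and it assumes well-formed input.

```python
def lz77_decompress(data):
    """Decompress a GBA/NDS BIOS-style LZ77 (type 1) stream (sketch)."""
    if data[0] >> 4 != 1:
        raise ValueError("not an LZ77 (type 1) stream")
    size = int.from_bytes(data[1:4], "little")   # 24-bit decompressed size
    out = bytearray()
    pos = 4
    while len(out) < size:
        flags = data[pos]
        pos += 1
        for bit in range(7, -1, -1):             # flag bits, MSB first
            if len(out) >= size:
                break
            if flags & (1 << bit):               # 1 = compressed block
                b0, b1 = data[pos], data[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3           # copy N+3 bytes
                disp = (((b0 & 0x0F) << 8) | b1) + 1
                for _ in range(length):
                    out.append(out[-disp])       # copy from dest - disp - 1
            else:                                # 0 = raw byte
                out.append(data[pos])
                pos += 1
    return bytes(out[:size])


# Three raw bytes "abc", then one back-reference (length 3, distance 3).
packed = bytes([0x10, 6, 0, 0, 0x10, 0x61, 0x62, 0x63, 0x00, 0x02])
print(lz77_decompress(packed))
```

Copying byte by byte (rather than slicing) matters because a back-reference may overlap the bytes it is producing.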

DexOS
Posts: 876
Joined: Wed May 16, 2012 6:32 pm
Contact: Website

Re: Bare Metal Examples

Tue Oct 29, 2013 10:02 pm

Cool, nice work. What app do you use to compress the bmp file?
Also, can this be used for wav files etc. with added code?

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Wed Oct 30, 2013 5:11 am

Hi Dex, cheers =D
DexOS wrote: what app do you use for compress the bmp file ?
The .bin file is just the RAW binary data of the 640x480 24bit .BMP file,
i.e. 3 bytes per pixel (RGB), 640*480*3 = 921600 bytes.
I use my own .bmp converter that I wrote for the N64, as the data is in a similar big endian format for the Raspberry Pi.
I then used that util I gave the link to above to compress the .bin file: http://www.romhacking.net/utilities/826/
Here are the exact steps I used to create the compressed data:
Huffman
1. Create a 640x480 24 bits per pixel (bpp) .bmp format picture.
2. I use this command line to create the Raw .bin file: "BMP2BIN RaspiLogo24BPP.bmp RaspiLogo24BPP.bin"
3. I use this command line to get the compressed data: "huffman -e8 RaspiLogo24BPP.bin"
4. Then rename RaspiLogo24BPP.bin to RaspiLogo24BPP.huff
LZSS
1. Create a 640x480 24 bits per pixel (bpp) .bmp format picture.
2. I use this command line to create the Raw .bin file: "BMP2BIN RaspiLogo24BPP.bmp RaspiLogo24BPP.bin"
3. I use this command line to get the compressed data: "lzss -ewo RaspiLogo24BPP.bin"
4. Then rename RaspiLogo24BPP.bin to RaspiLogo24BPP.lz
DexOS wrote:Also can this be used for wav file etc with added code ?
Yes it could be used to do this & very easily, these compressions will work on any data =D
I did a test using the bible in RAW .txt format, & I used Huffman to compress the text, then display it to the screen =D
You will need to experiment to see which compression gives the smallest output data size:
Huffman was smaller than LZ77 for the bible text data.
LZ77 was smaller than Huffman for the RaspiLogo24BPP GFX data.
A) You could compress a whole .WAV file & decompress it to ram, then play it from there using our sound playing code.
B) You could compress a whole .WAV file & decompress it byte by byte straight to DMA/DREQ sound buffer data on the fly.
P.S there is a 16MB-1 (16777215 bytes) limit on the input data size used for the compressor utils.
If you check out the DOC here: http://nocash.emubase.de/gbatek.htm#bio ... nfunctions
You will see that the Huffman & LZ77 sections describe a Data Header holding a 24bit size of the decompressed data in bytes.
This is large enough for 16MB-1 (16777215 bytes) of data.
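
To make the size limit concrete, here is a small Python sketch that reads the 32-bit data header as laid out in the GBATEK document (bits 4-7 compression type, bits 8-31 decompressed size). The helper name `parse_header` is hypothetical and only for illustration; the header bytes in the example are built by hand for the 921600-byte logo, not copied from the actual files.

```python
MAX_SIZE = (1 << 24) - 1          # largest value a 24-bit field can hold


def parse_header(header):
    """Return (compression_type, decompressed_size) from a 4-byte header."""
    word = int.from_bytes(header[:4], "little")
    comp_type = (word >> 4) & 0xF  # 1 = LZ77, 2 = Huffman
    size = word >> 8               # 24-bit decompressed size in bytes
    return comp_type, size


# 640*480*3 = 921600 = 0x0E1000, stored little-endian after the type byte.
print(parse_header(bytes([0x10, 0x00, 0x10, 0x0E])))
print(MAX_SIZE)
```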

Hope this helps you out m8 =D

uart77
Posts: 21
Joined: Thu Jun 20, 2013 8:27 pm

Re: Bare Metal Examples

Wed Oct 30, 2013 12:33 pm

Hey, krom and Dex :)

In your FASMARM.INC file, I noticed the move 32BIT immediate can produce 2-4 pointless instructions: "orr 0's" in the disassembly for each 8-24BIT value:

Code: Select all

macro imm32 reg,immediate {
  mov reg,(immediate) and $FF
  orr reg,(immediate) and $FF00
  orr reg,(immediate) and $FF0000
  orr reg,(immediate) and $FF000000
}
My two movi macros prevent the "orr 0's" and support constant rotation:

Code: Select all

; load/construct 8-32BIT immediate/address...

macro movi a, b {
 local n
 if (b)=-1 | (b)=0FFFFFFFFh
  mvn a, 0
 else if (b)>=0 & (b)<=255
  mov a, (b)
 else
  mov a, (b) and 0FFh
  n=(b) and 0FF00h
  if n<>0
   orr a, a, n
  end if
  n=(b) and 0FF0000h
  if n<>0
   orr a, a, n
  end if
  n=(b) and 0FF000000h
  if n<>0
   orr a, a, n
  end if
 end if
}

macro movi a, b {
 local n
 if (b)=-1 | (b)=0FFFFFFFFh ; -1/0FFFFFFFFh
  mvns a, 0               
 else if (b)>=0 & (b)<=255  ; 8BIT
  dw (0E3Bh shl 20) or \    ; movs a, b
   (a shl 16) or \
   (a shl 12) or b
 else if \
  ((b) and 0FFFF00FFh)=0    ; any byte
  dw 0E3B00C00h or \        ; shift left?
   (a shl 12) or (b shr 8)  ; including
 else if \                  ; powers of 2
  ((b) and 0FF00FFFFh)=0    ; FF0000h
  dw 0E3B00800h or \
   (a shl 12) or (b shr 16)
 else if \
  ((b) and 00FFFFFFh)=0     ; 0FF000000h
  dw 0E3B00400h or \
   (a shl 12) or (b shr 24)
 else if \
  ((b) and 0FFFFF00Fh)=0    ; odd bytes...
  dw 0E3B00E00h or \        ; FF0h
   (a shl 12) or (b shr 4)
 else if \
  ((b) and 0FFF00FFFh)=0    ; 0FF000h
  dw 0E3B00A00h or \
   (a shl 12) or (b shr 12)
 else if \
  ((b) and 000FFFFFh)=0     ; 0FF00000h
  dw 0E3B00600h or \
   (a shl 12) or (b shr 20)
 else                       ; build value...
  n=(b) and 0FFh
  dw (0E3Bh shl 20) or \    ; movs a, b&0FFh
   (a shl 16) or \
   (a shl 12) or n
  n=((b) and 0FF00h) shr 8
  if n<>0
   dw (0E39h shl 20) \      ; orrs a, a, b&0FF00h
    or (a shl 16) \
    or (a shl 12) \
    or 0C00h or n           ; orrs if 16/24BIT...
  end if
  n=((b) and 0FF0000h) \
   shr 16
  if n<>0
   dw (0E39h shl 20) or (a shl 16) or \
    (a shl 12) or 800h or n
  end if
  n=((b) and 0FF000000h) shr 24
  if n<>0
   dw (0E39h shl 20) or (a shl 16) or \
    (a shl 12) or 400h or n
  end if
 end if
}
Hope it helps!
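
As a rough illustration of the idea behind the first movi macro, this Python sketch splits a 32-bit constant into byte-sized pieces and keeps only the pieces that need an instruction: the low byte always goes into the `mov`, and each non-zero higher byte costs one `orr`. The function name `movi_chunks` is made up for this example; it models the instruction count only, not the FASM macro itself.

```python
def movi_chunks(value):
    """Return the immediates used: one mov, then an orr per non-zero byte."""
    value &= 0xFFFFFFFF
    chunks = [value & (0xFF << shift) for shift in (0, 8, 16, 24)]
    # The low byte is always emitted (mov); zero-valued higher bytes are skipped.
    return [chunks[0]] + [c for c in chunks[1:] if c != 0]


for v in (0x12345678, 0x00010000, 0x7F):
    parts = movi_chunks(v)
    print(hex(v), len(parts), [hex(p) for p in parts])
```

An unconditional four-way split like imm32 would always use four instructions for the same values.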

PS: Do you have NDS examples? What about Wii? If so, they would be good examples for the FASM community.

DexOS
Posts: 876
Joined: Wed May 16, 2012 6:32 pm
Contact: Website

Re: Bare Metal Examples

Wed Oct 30, 2013 6:04 pm

@krom, thanks for the info, I missed your link :oops:
@uart77 thanks for the info, those macros came with fasmarm as a workaround and I will keep that in mind if I need to optimize.
P.S: I must say, what you do with macros in your code is like black-magic to me.
There's a very simple GBA example in the ARM part of the forum. Also, you were asking about RISC OS examples; there are some fasmarm ones here:
http://www.raspberrypi.org/phpBB3/viewt ... 53#p445353

uart77
Posts: 21
Joined: Thu Jun 20, 2013 8:27 pm

Re: Bare Metal Examples

Wed Oct 30, 2013 8:23 pm

DexOS wrote: those macros came with fasmarm as a workaround and I will keep that in mind if I need to optimize
No, I wrote those. FASMARM includes a PC relative load (12BIT +/-) from literal table. Not a small optimization. When you use that imm32 for 8-24BIT values, it produces 2-4 instructions, 8-16 bytes. Standard mov works for any byte value with optional rotation (example: mov r0, 7F000000h). I think krom knows what I'm talking about.
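
The "byte value with optional rotation" rule can be checked mechanically. The Python sketch below tests whether a 32-bit constant fits the classic ARM data-processing immediate form (an 8-bit value rotated right by an even amount) and, if so, returns the 12-bit operand field (4-bit rotate, 8-bit value). The function name `encode_arm_immediate` is an assumption made for illustration; it is not part of FASMARM.

```python
def encode_arm_immediate(value):
    """Return the 12-bit rotate/imm8 field, or None if not encodable."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes the CPU's rotate right by rot.
        imm8 = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return ((rot // 2) << 8) | imm8
    return None


print(hex(encode_arm_immediate(0x7F000000)))  # rotate field 4, imm8 7Fh
print(encode_arm_immediate(0x12345678))       # None: needs several instructions
```

Values that return None are the ones that need a mov/orr sequence or a literal-pool load.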
DexOS wrote: I must say, what you do with macros in your code is like black-magic to me
Only the most advanced programmers (ie, Tomasz) understand the complexity of my code. In this forum, I've heard many requests for macro languages/syntaxes: "Will DexBASIC ever give us an IF/LOOP?". All you had to do was ask. If you need macros, message me anytime in the FASM community.

My Magic-ARM Compiler includes .if+.repeat (LANGUAGE.INC), variable assignments/operations (example: . r0=r1+(r2<<r3)), pointer arithmetic (. (u32) *r0++=*r1++), etc.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Sat Mar 08, 2014 1:06 pm

Hi everybody,

I just finished my SNES Mouse input demo:
https://github.com/PeterLemon/Raspberry ... SMouse.png

You can find the full source here:
https://github.com/PeterLemon/Raspberry ... se/GFXDemo
(It uses the exact same GPIO pin setup as my SNES Controller demo)

I am gonna checkout the NES Zapper (Light Gun) for my next input project =D

m3ntal7
Posts: 12
Joined: Sun Mar 09, 2014 9:17 am

Re: Bare Metal Examples

Sun Mar 09, 2014 10:10 am

krom wrote: I am gonna checkout the NES Zapper (Light Gun) for my next input project =D
Awesome. A duck-hunt style game is easy to make. I have written an assembler for MOS 6502 CPU that includes a Nintendo/NES example. Check it out here.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Sun Mar 09, 2014 1:55 pm

Hi m3ntal7,

I am honored to have your 1st post =D

I have not written anything for the NES yet, but I am pretty expert at the SNES, for which I use byuu's 65c816 bass assembler for everything: http://byuu.org/programming/bass/
I will check out your assembler for any NES work I do, especially to play around with the NES sound hardware which is totally ace =D

colinh
Posts: 95
Joined: Tue Dec 03, 2013 11:59 pm
Location: Munich

Re: Bare Metal Examples

Mon Mar 10, 2014 8:58 am

uart77 wrote:
I must say, what you do with macros in your code is like black-magic to me
Only the most advanced programmers (ie, Tomasz) understand the complexity of my code.
:D


We less advanced programmers just make do with

ldr r0, =0x12345678

m3ntal7
Posts: 12
Joined: Sun Mar 09, 2014 9:17 am

Re: Bare Metal Examples

Tue Mar 11, 2014 1:11 am

krom: Hi.
krom wrote: I am pretty expert at the SNES, for which I use byuu's 65c816 bass assembler for everything
Why not FASM? Did you know, FASM can be used for programming ANY CPU/system: MOS 6502 (Atari 2600, Commodore 64, etc), Zilog Z80, Motorola 68K (Amiga, Sega Genesis, Neo Geo), ARMv7, even Java bytecode, devices, Android, multi-media files (NES sounds)... absolutely anything that you define. FASM has no limitation except the programmer's knowledge and imagination. Some advantages of using FASM:

* Design and implement portable languages. Example: Open(File), Draw(Image, X, Y), If Gamepad(UP), CLEAR and LOGICAL.
* Lightning FAST compile in < 1 second. VC++ takes forever to set up and compile small programs and it's gigabytes in size!
* Written in 100% efficient assembler
* Smallest executable. No setup EVER. Runs anywhere (MicroSD).

Try this small file, a MOS 8BIT assembler that can easily be modified to support the SNES 16BIT model. Just: include '6502.inc'

Code: Select all

; $$$$$$$$$$$$$$$ Z6502 ASSEMBLER $$$$$$$$$$$$$$$$

;                   MMM"""AMV
;                   M'   AMV
;                   '   AMV
;                     AMV   ,
;     .6*"          AMV   ,M   MOS
;   ,M'            AMVmmmmMM
;  ,Mbmmm.    M******  ,pP""Yq.   pd*"*b.
;  6M'  `Mb. .M       6W'    `Wb (O)   j8
;  MI     M8 |bMMAg.  8M      M8     ,;j9
;  WM.   ,M9      `Mb YA.    ,A9  ,-='
;   WMbmmd9        jM  `Ybmmd9'  Ammmmmmm
;           (O)  ,M9
;            6mmm9

; a               - accumulator
; x, y            - general
; status          - flags: SV_BDIZC
; program counter
; stack register

; $00-$FF     - zero page memory
; $0100-$01FF - stack, 256 bytes

; in some emulators:

; $FE         - random byte
; $FF         - last ASCII key
; $0200-$05FF - screen, 32x32

; syntax is standard except replace # with =
; for immediate operands:

; lda =$7F  ; a=i (immediate, prefix with =)
; lda $7F   ; a=[$7F] (zero page)
; lda $7FAA ; a=[$7FAA] (absolute)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

macro verify a {
  if ?s eq 0
    'Error: ' a
  end if
}

macro verify.i i {
  if ~ i eqtype 0
    'Number expected:' i
  end if
}

macro verify.n n, min, max {
  if n<min | n>max
    'Number exceeds range:' min-max
  end if
} 

macro verify.u8 n  { verify.n n, 0, 255 } 
macro verify.i8 n  { verify.n n, -128, 127 }
macro verify.u16 n { verify.n n, 0, 65535 } 

;;;;;;;;;;;;;; ONE-BYTE INSTRUCTIONS ;;;;;;;;;;;;;

macro brk { db $00 } ; break (software interrupt)
macro clc { db $18 } ; clear carry
macro cld { db $D8 } ; clear decimal
macro cli { db $58 } ; clear interrupt disable
macro clv { db $B8 } ; clear overflow
macro dex { db $CA } ; x--
macro dey { db $88 } ; y--
macro inx { db $E8 } ; x++
macro iny { db $C8 } ; y++
macro nop { db $EA } ; no operation
macro pha { db $48 } ; push a
macro php { db $08 } ; push status
macro pla { db $68 } ; pull/pop a
macro plp { db $28 } ; pull/pop status
macro rti { db $40 } ; return from interrupt
macro rts { db $60 } ; return from subroutine
macro sec { db $38 } ; set carry
macro sed { db $F8 } ; set decimal
macro sei { db $78 } ; set interrupt disable
macro tax { db $AA } ; x=a
macro tay { db $A8 } ; y=a
macro tsx { db $BA } ; x=s
macro txa { db $8A } ; a=x
macro txs { db $9A } ; s=x
macro tya { db $98 } ; a=y

;;;;;;;;;;;;;;;;;;;;; BRANCH ;;;;;;;;;;;;;;;;;;;;;

macro jmp [a] {
 common
 define ?s 0
  match (x), a \{ ; indirect
    db $6C
    dw x
    define ?s 1
  \}
  match =0 x,\    ; absolute
   ?s a \{
    db $4C
    dw x
  \}
}

macro bxx o, a
 { db o, (255-($-a)) and $FF }

macro beq a { bxx $F0, a }
macro bne a { bxx $D0, a }
macro bcc a { bxx $90, a }
macro bcs a { bxx $B0, a }
macro bvc a { bxx $50, a }
macro bvs a { bxx $70, a }
macro bmi a { bxx $30, a }
macro bpl a { bxx $10, a }

; jmp to subroutine

macro jsr a {
  db $20
  dw a         ; absolute
}

;;;;;;;; LDA, STA, ADC, SBC, ORA, CMP, ETC ;;;;;;;

; opc =$44     ; immediate (=i)
; opc $AA      ; zero page
; opc $7F, x   ; zero page, x
; opc $4FFF    ; absolute
; opc $88BB, x ; absolute, x
; opc $24EE, y ; absolute, y
; opc ($AC, x) ; (indirect, x)
; opc ($DC), y ; (indirect), y

macro o1.6502 name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 ==a, ?s p \{      ; immediate
    i=a
    mode=010b
    size=1
    if name eq sta
      'Error: ' a
    end if
    define ?s 1
  \}
  match =0 (a=,=x), ?s p \{  ; (indirect, x)
    i=a
    mode=000b
    size=1
    define ?s 1
  \}
  match =0 (a)=,=y, ?s p \{  ; (indirect), y
    i=a
    mode=100b
    size=1
    define ?s 1
  \}
  match =0 a=,b, ?s p \{     ; ?, ?
    i=a
    verify.i a               ; i, ?
    verify.u16 a
    if b eq x                ; i, x
      if a<256               ; zero page, x
        mode=101b
        size=1
      else                   ; absolute, x
        mode=111b
        size=2
      end if
    else if b eq y           ; absolute, y
      mode=110b
      size=2
    else
      'Error: ' b
    end if
    define ?s 1
  \}
  match =0 a, ?s p \{
    i=a
    verify.u16 a
    if a<256                 ; zero page
      mode=001b
      size=1
    else                     ; absolute
      mode=011b
      size=2
    end if
    define ?s 1
  \}
  verify name
  db (aaa shl 5) or \
   (mode shl 2) or 1
  if size=1
    db i
  else
    dw i
  end if
}

macro ora [p] { common o1.6502 ora, 000b, p }
macro and [p] { common o1.6502 and, 001b, p }
macro eor [p] { common o1.6502 eor, 010b, p }
macro adc [p] { common o1.6502 adc, 011b, p }
macro sta [p] { common o1.6502 sta, 100b, p }
macro lda [p] { common o1.6502 lda, 101b, p }
macro cmp [p] { common o1.6502 cmp, 110b, p }
macro sbc [p] { common o1.6502 sbc, 111b, p }

;;;;;;;;;;;;;;;;;;;; INC, DEC ;;;;;;;;;;;;;;;;;;;;

; opc $80      ; zero page
; opc $80, x   ; zero page, x
; opc $A000    ; absolute
; opc $8000, x ; absolute, x

macro o2.6502.a name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 a=,=x,\      ; ?, x
   ?s p \{
    i=a
    if i<256
      size=1
      mode=101b
    else
      size=2
      mode=111b
    end if
    define ?s 1
  \}
  match =0 a, ?s p \{   ; x
    i=a
    if i<256
      size=1
      mode=001b
    else
      size=2
      mode=011b
    end if
    define ?s 1
  \}
  verify name
  verify.u16 i
  db (aaa shl 5) or \
   (mode shl 2) or 2
  if size=1             ; zero page
    db i
  else                  ; absolute
    dw i
  end if
}

macro inc [p] { common o2.6502.a inc, 111b, p }
macro dec [p] { common o2.6502.a dec, 110b, p }

;;;;;;;;;;;;;;; ASL, LSR, ROL, ROR ;;;;;;;;;;;;;;;

; opc          ; accumulator
; opc $80      ; zero page
; opc $80, x   ; zero page, x
; opc $A000    ; absolute
; opc $8000, x ; absolute, x

macro o2.6502.b name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 a=,=x,\      ; ?, x
   ?s p \{
    i=a
    if i<256
      size=1
      mode=101b
    else
      size=2
      mode=111b
    end if
    define ?s 1
  \}
  match =0 _a, ?s p \{  ; x
    if _a eq a          ; accumulator
      size=1
      mode=010b
    else
      i=_a
      if i<256
        size=1
        mode=001b
      else
        size=2
        mode=011b
      end if
    end if
    define ?s 1
  \}
  if ?s eq 0            ; accumulator
    size=1
    mode=010b
    define ?s 1
  end if
  verify name
  db (aaa shl 5) or \
   (mode shl 2) or 2
  if mode<>010b         ; not accumulator?
    verify.u16 i
    if size=1           ; zero page
      db i
    else                ; absolute
      dw i
    end if
  end if
}

macro asl [p] { common o2.6502.b asl, 000b, p }
macro rol [p] { common o2.6502.b rol, 001b, p }
macro lsr [p] { common o2.6502.b lsr, 010b, p }
macro ror [p] { common o2.6502.b ror, 011b, p }

;;;;;;;;;;;;;;;;;;;; STX, LDX ;;;;;;;;;;;;;;;;;;;;

; opc =$44     ; immediate (=i) *
; opc $80      ; zero page
; opc $80, y   ; zero page, y
; opc $AAAA    ; absolute
; opc $AAAA, y ; absolute, y *

; * ldx only

macro o2.6502.c name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 ==a, ?s p \{      ; immediate
    i=a
    mode=000b
    size=1
    if name eq stx
      'Error: ' a
    end if
    define ?s 1
  \}
  match =0 a=,b, ?s p \{     ; ?, ?
    i=a
    verify.i a               ; i, ?
    verify.u16 a
    if b eq y                ; i, y
      if a<256               ; zero page, y
        mode=101b
        size=1
      else                   ; absolute, y
        mode=111b
        size=2
        if name eq stx
          'Error: ' a
        end if
      end if
    else
      'Error: ' b
    end if
    define ?s 1
  \}
  match =0 a, ?s p \{
    i=a
    verify.i i
    verify.u16 i
    if i<256                 ; zero page
      mode=001b
      size=1
    else
      mode=011b              ; absolute
      size=2
    end if
    define ?s 1
  \}
  verify name
  db (aaa shl 5) or \
   (mode shl 2) or 2
  if size=1
    db i
  else
    dw i
  end if
}

macro stx [p] { common o2.6502.c stx, 100b, p }
macro ldx [p] { common o2.6502.c ldx, 101b, p }

;;;;;;;;;;;;;;;;;;;; STY, LDY ;;;;;;;;;;;;;;;;;;;;

; opc =$44     ; immediate (=i)
; opc $AA      ; zero page
; opc $7F, x   ; zero page, x
; opc $4FFF    ; absolute
; opc $88BB, x ; absolute, x

macro o0.6502 name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 ==a, ?s p \{      ; immediate
    i=a
    mode=000b
    size=1
    if name eq sty
      'Error: ' a
    end if
    define ?s 1
  \}
  match =0 a=,b, ?s p \{     ; ?, ?
    i=a
    verify.i a               ; i, ?
    verify.u16 a
    if b eq x                ; i, x
      if a<256               ; zero page, x
        mode=101b
        size=1
      else                   ; absolute, x
        mode=111b
        size=2
      end if
    else
      'Error: ' b
    end if
    define ?s 1
  \}
  match =0 a, ?s p \{
    i=a
    verify.u16 a
    if a<256                 ; zero page
      mode=001b
      size=1
    else                     ; absolute
      mode=011b
      size=2
    end if
    define ?s 1
  \}
  verify name
  db (aaa shl 5) or \
   (mode shl 2)
  if size=1
    db i
  else
    dw i
  end if
}

macro sty [p] { common o0.6502 sty, 100b, p }
macro ldy [p] { common o0.6502 ldy, 101b, p }

;;;;;;;;;;;;;;;;;;;; CPX, CPY ;;;;;;;;;;;;;;;;;;;;

; opc =$44     ; immediate (=i)
; opc $AA      ; zero page
; opc $4FFF    ; absolute

macro o0.6502.a name, aaa, [p] {
 common
  local i, mode, size
  define ?s 0
  match =0 ==a, ?s p \{      ; immediate
    i=a
    mode=000b
    size=1
    define ?s 1
  \}
  match =0 a, ?s p \{
    i=a
    verify.u16 a
    if a<256                 ; zero page
      mode=001b
      size=1
    else                     ; absolute
      mode=011b
      size=2
    end if
    define ?s 1
  \}
  verify name
  db (aaa shl 5) or \
   (mode shl 2)
  if size=1
    db i
  else
    dw i
  end if
}

macro cpx [p] { common o0.6502.a cpx, 111b, p }
macro cpy [p] { common o0.6502.a cpy, 110b, p }

;;;;;;;;;;;;;;;;;;;;;;; BIT ;;;;;;;;;;;;;;;;;;;;;;

macro bit a {
  define ?s 0
  verify.u16 a
  if a eqtype 0
    if a<256    ; zero page
      db $24
      db a
    else        ; absolute
      db $2C
      dw a
    end if
    define ?s 1
  end if
  verify bit
}
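
The macros above all build the opcode byte the same way: `(aaa shl 5) or (mode shl 2) or cc`, followed by a one-byte or two-byte little-endian operand. Here is a minimal Python sketch of that bit layout, using the same field values as the macros. The function name `encode_6502` and its parameters are made up for illustration; unlike the macros, the caller picks the addressing mode and operand size explicitly.

```python
def encode_6502(aaa, mode, cc, operand, size):
    """Build opcode byte aaa/mode/cc and append a little-endian operand."""
    opcode = (aaa << 5) | (mode << 2) | cc
    return bytes([opcode]) + operand.to_bytes(size, "little")


# lda =$7F   (immediate): aaa=101b, mode=010b, cc=01b
print(encode_6502(0b101, 0b010, 1, 0x7F, 1).hex())
# lda $7FAA  (absolute):  aaa=101b, mode=011b, cc=01b
print(encode_6502(0b101, 0b011, 1, 0x7FAA, 2).hex())
# inc $80    (zero page): aaa=111b, mode=001b, cc=10b
print(encode_6502(0b111, 0b001, 2, 0x80, 1).hex())
```
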
Here are examples that convert ARM instructions to machine code using FASM's powerful preprocessor (distinct from its assembler side). They work in FASMW. For reference, see the ARMv7 manual. The @ prefix can be removed.

Data Processing:

Code: Select all

;;;;;;;;;;;;;;; DATA + ARITHMETIC ;;;;;;;;;;;;;;;;

; 31-28   25 24-21 20 19-16 15-12 987654/3210
; [CCCC/00/I/OPCODE/S/RNXX/RDXX/OPERAND2/XXXX]

;         /I=0: SHIFT R        /IIIIIIII/RMXX/
;         /I=1: SHIFT I        /SHXX/IIIIIIII/

; @mov r0, r1
; @cmp r0, r2
; @adds r0, r1, r2
; @sub:gt r0, r1, r2
; @bics:mi r1, r2, r3, asr r4

numeric it.*,\
 and, eor, sub, rsb, add, adc, sbc, rsc,\
 tst, teq, cmp, cmn, orr, mov, bic, mvn

numeric it.*,\
 ands, eors, subs, rsbs, adds, adcs, sbcs, rscs,\
 tsts, teqs, cmps, cmns, orrs, movs, bics, mvns

; create "data processing" instruction...

macro @dp it, s {
 macro @#it [p] \{
 \common
  local im
  im=0
  syntax 0
  if it in <tst,teq,cmp,cmn,mov,mvn,\
    tsts,teqs,cmps,cmns,movs,mvns>
   match a=,b=,sh, p \\{
    match x y, sh \\\{
     if ?not x in <lsl,lsr,asr,ror>
      'Operand 3 is invalid:' x
     end if
     syntax 1
    \\\}
    if.syntax 0
     'Operand 3 is invalid'
    end if
   \\}
  end if
   syntax 0
  match =0 :x a=,b=,c=,d, \ ; :x a, b, c, <d>
   ?s p \\{
   match sh n, d \\\{
    verify.sh sh, n
    verify.r a
    if n ?is.r ; shx r
     dd (C.\\\#x shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (b shl 16) or c or \
      (n shl 8) or (SH.\\\#sh shl 5) or 16
    else if n ?is.i ; shx #
     dd (C.\\\#x shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (b shl 16) or c or \
      (n shl 7) or (SH.\\\#sh shl 5)
    else
     'Unexpected:' n
    end if
    syntax 1
   \\\}
   if.syntax 0 ; :x a, b, c, ri
    verify.r a, b, c
    if ?not c ?is.r & c ?is.i
     verify.u8 c
     im=1
    end if
    dd (C.\\#x shl 28) or \
     (it.\\#it shl 21) or (im shl 25) or \
     (s shl 20) or (b shl 16) or \
     (a shl 12) or c
   end if
   syntax 1
  \\}
  match =0 :x a=,b=,c, \ ; :x a, b, <c>
   ?s p \\{
   match sh n, c \\\{
    verify.sh sh, n
    verify.r a, b
    if n ?is.r ; shx r
     dd (C.\\\#x shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (a shl 16) or b or \
      (n shl 8) or (SH.\\\#sh shl 5) or 16
    else if n ?is.i ; shx #
     dd (C.\\\#x shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (a shl 16) or b or \
      (n shl 7) or (SH.\\\#sh shl 5)
    else
     'Unexpected:' n
    end if
    syntax 1
   \\\}
   if.syntax 0 ; :x a, b, ri
    verify.r a, b
    if ?not c ?is.r & c ?is.i ; i
     verify.u8 c
     im=1
    end if
    dd (C.\\#x shl 28) or \
     (it.\\#it shl 21) or (im shl 25) or \
     (s shl 20) or (b shl 16) or \
     (a shl 12) or c
   end if
   syntax 1
  \\}
  match =0 :x a=,b, ?s p \\{ ; :c r, ri
   verify.r a
   if ?not b ?is.r & b ?is.i ; i
    verify.u8 b
    im=1
   end if
   dd (C.\\#x shl 28) or \
    (it.\\#it shl 21) or (im shl 25) or \
    (s shl 20) or (a shl 16) or \
    (a shl 12) or b
   syntax 1
  \\}
  match =0 a=,b=,c=,d, \ ; a, b, c, <d>
   ?s p \\{
   match sh n, d \\\{
    verify.sh sh, n
    verify.r a, b, c
    if n ?is.r ; shx r
     dd (C.al shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (b shl 16) or c or \
      (n shl 8) or (SH.\\\#sh shl 5) or 16
    else if n ?is.i ; shx #
     verify.u5 n
     dd (C.al shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (b shl 16) or c or \
      (n shl 7) or (SH.\\\#sh shl 5)
    else
     'Unexpected:' n
    end if
    syntax 1
   \\\}
   if.syntax 0 ; a, b, c, ri
    verify.r a, b, c
    if ?not c ?is.r & c ?is.i
     verify.u8 c
     im=1
    end if
    dd (C.al shl 28) or \
     (it.\\#it shl 21) or (im shl 25) or \
     (s shl 20) or (b shl 16) or \
     (a shl 12) or c
   end if
   syntax 1
  \\}
  match =0 a=,b=,c, ?s p \\{ ; a, b, <c>
   match sh n, c \\\{
    verify.sh sh, n
    verify.r a
    if n ?is.r ; shx r
     dd (C.al shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (a shl 16) or b or \
      (n shl 8) or (SH.\\\#sh shl 5) or 16
    else if n ?is.i ; shx #
     dd (C.al shl 28) or \
      (it.\\\#it shl 21) or (s shl 20) or \
      (a shl 12) or (a shl 16) or b or \
      (n shl 7) or (SH.\\\#sh shl 5)
    else
     'Unexpected:' n
    end if
    syntax 1
   \\\}
   if.syntax 0 ; a, b, ri
    verify.r a, b
    if ?not c ?is.r & c ?is.i
     verify.u8 c
     im=1
    end if
    dd (C.al shl 28) or \
     (it.\\#it shl 21) or (im shl 25) or \
     (s shl 20) or (b shl 16) or \
     (a shl 12) or c
   end if
   syntax 1
  \\}
  match =0 a=,b, ?s p \\{ ; a, b
   verify.r a
   verify.o b
   if ?not b ?is.r & b ?is.i
    verify.u8 b
    im=1
   end if
   dd (C.al shl 28) or \
    (it.\\#it shl 21) or (im shl 25) or \
    (s shl 20) or (a shl 16) or \
    (a shl 12) or b
   syntax 1
  \\}
  verify @\#i
 \}
}

macro @dp [p] {
forward
 @dp p, 0
 @dp p#s, 1
}

; create 512 instructions:

; 32 (16*2 add/s) * 16 conditions each =
; 512 total variations. "add:c" condition
; syntax avoids writing 512 macros!

@dp and, eor, sub, rsb, add, adc, sbc, rsc,\
 tst, teq, cmp, cmn, orr, mov, bic, mvn

@cmp fix @cmps ; automatic
@cmn fix @cmns
@tst fix @tsts
@teq fix @teqs

; test dp instructions...

macro @test.dp {
 @mov r7, r7     ; a=b
 @and r5, 0FFh   ; a=a&c
 @eor r5, r6, 15 ; a=b^c
 @sub r5, r6, r7 ; a=b-c
 @rsb r5, r6, r7 ; a=c-b
 @add r5, r6, r7 ; a=b+c
 @adc r5, r6, r7 ; a=b+c+carry
 @sbc r5, r6, r7 ; a=b-c-!carry
 @rsc r5, r6, r7 ; a=c-b-!carry
 @tst r5, r6     ; a&b?
 @teq r5, r6     ; a^b?
 @cmp r5, r6     ; a=b?
 @cmn r5, r6     ; a=-b?
 @orr r5, r6, r7 ; a=b|c
 @bic r5, r6, r7 ; a=b&~c
 @mvn r5, r6     ; a=~b
}

macro @test.dpx {
 @add r5, r6, lsl r7        ; a=a+(b<<c)
 @add r5, r6, lsl 7         ; a=a+(b<<i)
 @add r0, r3, r5, lsr r7    ; a=b+(c>>>d)
 @adds r0, r3, r5, lsr 7    ; a=b+(c>>>i)
 @add:ne r0, r1, asr r7     ; ne? a=a+(b>>c)
 @adds:lt r0, r1, asr 7     ; lt? a=a+(b>>i)
 @add:gt r0, r3, r5, ror r7 ; gt? a=b+(c<>>d)
 @adds:mi r0, r3, r5, ror 7 ; mi? a=b+(c<>>7)
}

macro @inc r { @adds r, 1 }
macro @dec r { @subs r, 1 }

macro @neg r { @rsbs r, 0 }
macro @not r { @mvns r, r }

macro @abs r { ; absolute value
 @teq r, 0     ; if a<0, a=0-a
 @rsbs:mi r, 0
}
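
For comparison, the bit layout in the diagram at the top of that block can be written out in a few lines of Python. This is only a sketch of the register-operand and plain 8-bit immediate cases (no shifts or rotation); the names `OPCODES` and `encode_dp` are assumptions made for this example and do not come from the macros.

```python
# Opcode numbers follow the order used in the "numeric it.*" list above.
OPCODES = {name: n for n, name in enumerate(
    "and eor sub rsb add adc sbc rsc tst teq cmp cmn orr mov bic mvn".split())}


def encode_dp(op, rd, rn, operand2, cond=0xE, s=0, imm=0):
    """cond | I | opcode | S | Rn | Rd | operand2 (unshifted Rm or imm8)."""
    return ((cond << 28) | (imm << 25) | (OPCODES[op] << 21) | (s << 20)
            | (rn << 16) | (rd << 12) | operand2)


print(hex(encode_dp("add", 5, 6, 7)))          # add r5, r6, r7
print(hex(encode_dp("mov", 0, 0, 1, imm=1)))   # mov r0, 1
print(hex(encode_dp("add", 0, 1, 2, s=1)))     # adds r0, r1, r2
```
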
Load & Store Multiple:

Code: Select all

;;;;;;;;;;;;;;; LOAD/STORE MULTIPLE ;;;;;;;;;;;;;;

; [CCCC/100P/USWL/RNXX/REGISTERSXXXXXXX]

; P  - Pre-post? Add/subtract offset before
;      (P=1) or after (P=0) transfer
; U  - Up/down. Add (U=1) or subtract offset
; S  - Load PSR or force user mode? 1/0
; W  - Write-back address to base? 1/0
; L  - Load? 1/0

; example: @ldm r0, r1,r2,r3

; [1110/100P/USWL/0000/0000000000001110]
;            1001  r0              321

macro @lsm op, b, [r] {
 common ?rs=0
 forward ?rs=?rs or (1 shl r) ; ?rs|=r
 common dd (0Eh shl 28) or \
  (b shl 16) or (op shl 20) or ?rs
}

; @ldm/@stm = ldmia/stmdb or ldmfd/stmfd
; full descending stack. !=write-back

macro @ldm r, [p] { common @lsm 89h, r, p }
macro @stm r, [p] { common @lsm 90h, r, p }

macro @ldm! r, [p] { common @lsm 8Bh, r, p }
macro @stm! r, [p] { common @lsm 92h, r, p }
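
The register list is just a 16-bit mask with one bit per register, which is what the `forward` loop in @lsm accumulates. A minimal Python sketch of the same encoding, reusing the 8-bit op values from the macros above (89h, 90h, 8Bh, 92h); the function name `encode_lsm` is hypothetical:

```python
def encode_lsm(op, base, regs, cond=0xE):
    """cond | op (bits 27-20) | base register | 16-bit register mask."""
    mask = 0
    for r in regs:
        mask |= 1 << r            # same as ?rs = ?rs or (1 shl r)
    return (cond << 28) | (op << 20) | (base << 16) | mask


print(hex(encode_lsm(0x89, 0, [1, 2, 3])))   # @ldm r0, r1,r2,r3
print(hex(encode_lsm(0x92, 13, [4, 14])))    # @stm! with base r13: r4, r14
```
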
Load & Store:

Code: Select all

;;;;;;;;;;;;;;;;;; LOAD & STORE ;;;;;;;;;;;;;;;;;;

; [CCCC/01IP/UBWL/RNXX/RDXX/IIIIIIIIIIII]

; I  - Offset type: immediate (0) or register/shift (1)
; P  - Pre-post? Add/subtract offset before
;      (P=1) or after (P=0) transfer
; U  - Up/down. Add (U=1) or subtract offset
; B  - Byte? 0=Word, 1=Byte
; W  - Write-back address to base?
; L  - Load or store? 1=Load. 0=Store
; RN - Base register
; RD - Source/destination
; II - 12BIT offset or shift+r (S+M)

macro @ls cc, l, z, r, [p] {
 common
  local ?a, ?b
  ?a=0
  ?b=0
  syntax 0
   match =0 \          ; ldr r, [b, i, sh #]!
    [b=,i=,sh s]=!, \  ; scaled register
    ?s p \{            ; pre-index
    verify.r b, i
    verify.i s
    verify.sh sh, s
    ?a=05Ah or 100000b
    ?b=(b shl 16) or \
     (s shl 7) or \
     (SH.\#sh shl 5) \
     or i
   syntax 1
  \}
  match =0 [b=,o]=!, \ ; ldr r, [b, o]!
   ?s p \{
   verify.r b
   if o ?is.r          ; register pre-index
    ?a=078h or 10b     ; ldr r, [b, ri]!
    ?b=(b shl 16) or o
   else if o ?is.i     ; ldr r, [b, #]!
    verify.12 o
    if o<0
     ?a=050h or 10b    ; negative
     ?b=(b shl 16) \
      or (0-o)
    else
     ?a=058h or 10b    ; positive
     ?b=(b shl 16) \
      or o
    end if
   else
    'Unexpected:' o
   end if
   syntax 1
  \}
  match =0 \           ; ldr r, [b, i, sh #]
   [b=,i=,sh s], \     ; scaled register
   ?s p \{             ; offset, no write-back
   verify.r b, i
   verify.i s
   verify.sh sh, s
   ?a=078h
   ?b=(b shl 16) or \
    (s shl 7) or \
    (SH.\#sh shl 5) \
    or i
   syntax 1
  \}
  match =0 [b]=,-n, \  ; ldr r, [b], -#
   ?s p \{             ; negative immediate
   verify.r b          ; post-index
   verify.i n
   verify.12 n
   ?a=040h
   ?b=(b shl 16) or n
   syntax 1
  \}
  match =0 [b]=,o, \   ; ldr r, [b], o
   ?s p \{
   verify.r b
   if o ?is.r          ; ldr r, [b], ro
    ?a=068h
    ?b=(b shl 16) or o ; register post-index
    syntax 1
   else if o ?is.i     ; ldr r, [b], #
    verify.12 o
    if o<>0
     ?a=048h
    else
     ?a=058h
    end if
    ?b=(b shl 16) or o ; immediate post-index
    syntax 1
   else
    'Unexpected:' o
   end if
  \}
  match =0 [b=,o], \   ; ldr r, [b, o]
   ?s p \{
   if o ?is.r          ; ldr r, [b, ri]
    ?a=078h            ; register offset
    ?b=(b shl 16) or o
   else if o ?is.i     ; ldr r, [b, #]
    verify.12 o        ; immediate offset
    if o>=0            ; positive
     ?a=58h
     ?b=(b shl 16) \
      or o
    else               ; negative
     ?a=50h
     ?b=(b shl 16) \
      or (0-o)
    end if
   else
    'Unexpected:' o
   end if
   syntax 1
  \}
  match =0 [b], \      ; ldr r, [b]
   ?s p \{
   ?a=58h
   if b ?is.r          ; base=register
    ?b=(b shl 16)
   else if b ?is.i     ; immediate
    verify.12 b        ; pc relative
    ?b=($-_begin_)-b+8
    if ?b>=0           ; positive
     ?a=51h
    else               ; negative
     ?a=59h
     ?b=0-?b
    end if
    ?b=?b or \
     (0Fh shl 16)      ; base=pc
   else
    'Unexpected:' b
   end if
   syntax 1
  \}
 verify ls
 dd (C.#cc shl 28) or (z shl 22) or \
  (l shl 20) or (?a shl 20) or \
  (r shl 12) or ?b
}

macro @ldr [p]  { common @ls al, 1, 0, p }
macro @str [p]  { common @ls al, 0, 0, p }

macro @ldrb [p] { common @ls al, 1, 1, p }
macro @strb [p] { common @ls al, 0, 1, p }

macro @test.ls {
 @ldr r0, [123h]
 @ldr r0, [-123h]
 @ldr r0, [r1]
 @ldr r0, [r1], 1
 @ldr r0, [r1], -1
 @ldr r0, [r1, r2]
 @ldr r0, [r1, 1]
 @ldr r0, [r1, -1]
 @ldr r0, [@15, 0ABCh]
 @ldr r0, [@15, -0ABCh]
 @ldr r0, [r7, 0ABCh]
 @ldr r0, [r7, -0ABCh]
 @ldr r0, [r1, r2, lsl 3]
 @ldr r0, [r1, 777h]!
 @ldr r0, [r1, -777h]!
 @ldr r0, [r1, r2]!
 @ldr r0, [r1, r2, lsl 3]!
}

macro @test.lsb {
 @ldrb r5, [r7]
 @strb r5, [r7]
}
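For anyone who wants to cross-check the words these macros emit, here is a minimal Python sketch of the same bit layout for the immediate-offset forms only. The function name `encode_ls` and its parameters are made up for illustration, and it does not cover the register/shift or pc-relative cases handled by `@ls`.

```python
# Sketch of the ARM single data transfer word for immediate offsets.
# encode_ls and its parameter names are hypothetical, not part of the macros.

COND_AL = 0xE  # "always" condition code


def encode_ls(load, byte, rd, rn, offset=0, pre=True, writeback=False,
              cond=COND_AL):
    """Encode LDR/STR/LDRB/STRB with a 12-bit immediate offset."""
    if not -0xFFF <= offset <= 0xFFF:
        raise ValueError("offset does not fit in 12 bits")
    up = offset >= 0                  # U=1 adds the offset, U=0 subtracts it
    word = cond << 28
    word |= 0b01 << 26                # single data transfer class
    word |= 0 << 25                   # I=0: offset field is an immediate
    word |= int(pre) << 24            # P: apply offset before the transfer
    word |= int(up) << 23             # U
    word |= int(byte) << 22           # B: 1=byte, 0=word
    word |= int(writeback) << 21      # W: write address back to base
    word |= int(load) << 20           # L: 1=load, 0=store
    word |= (rn & 0xF) << 16          # base register
    word |= (rd & 0xF) << 12          # source/destination register
    word |= abs(offset)               # magnitude goes in the low 12 bits
    return word


if __name__ == "__main__":
    # ldr r0, [r1, 1] and ldr r0, [r1, 777h]!
    print(hex(encode_ls(True, False, 0, 1, 1)))
    print(hex(encode_ls(True, False, 0, 1, 0x777, writeback=True)))
```

Comparing a few of these values against the output of `@test.ls` is a quick way to spot a wrong P/U/W bit.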
Some easy ones:

Code: Select all

; software interrupt: swi/svc

macro @swi n {
 verify.n n, 0, 0FFFFFFh
  dd 0EF000000h or n
}

macro @svc n { @swi n }

; breakpoint

macro @bkpt n {
 verify.cpu ARM.v5T
 verify.i n
 verify.u16 n
 dd 0E1200000h or \
  (((n shr 4) and 0FFFh) shl 8) \
  or (n and 0Fh) or (7 shl 4)
}
colinh: That's correct for GCC. This defines the assembler from scratch in one small file.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Tue Mar 11, 2014 3:10 am

Hi m3ntal7
Why not FASM?
You are preaching to the converted, I absolutely love FASM:
I use revolution's FASMARM now for all my ARM assembly programming (namely GBA, NDS, & Raspberry Pi).
Also the main FASM I use for all my X86/X64 ASM programming.

There are several reasons I use byuu's bass assembler for all my SNES work:
A) It includes an integrated SNES Audio Processor Unit (Sony SPC-700) assembler, which is a must if you want sound to come out of your SNES.
B) The assembler has very useful functionality for text translation work.
C) I have helped byuu test this assembler since its infancy, so I am biased towards using it, as he has added all the functionality I wanted. I also like to help promote it, as it is currently the world's best SNES assembler.

However if you can prove to me that a bug free FASM SNES Sony SPC-700 APU assembler exists, I would be happy to try it out =D
Did you know, FASM can be used for programming ANY CPU/system
I really need to check out the FASM Z80 for my Gameboy & Master System work, also I did not know there was a m68000 variant, would be great to test my Amiga/Megadrive/NeoGeo/x68000 asm work with that.

There is one thing that I want most of all from the FASM community, a perfect MIPS assembler (as good as FASMARM) that can correctly encode all opcodes right up to the last MIPS HW.
There was a FASM MIPS port but it is discontinued & only handles the 32-bit opcodes, so I cannot use it for my N64 asm development, which is a real shame :(

Thanks for your examples on how to add new opcodes to FASM, I have had to make similar macros to add RDP command opcodes to draw alpha blended textured quads and triangles in bare metal MIPS on the N64. Did you know any assembler with decent macro support can add opcodes from any cpu for use in its functionality.

m3ntal7
Posts: 12
Joined: Sun Mar 09, 2014 9:17 am

Re: Bare Metal Examples

Tue Mar 11, 2014 4:50 am

Thanks for your comments and examples.
krom wrote: I use revolution's FASMARM now for all my ARM assembly programming (namely GBA, NDS, & Raspberry Pi).
Please post ARM examples in the Non-X86 forum to get users interested in Raspberry Pi. I hate being the only one who does this.
krom wrote: m68000 variant, would be great to test my Amiga/Megadrive/NeoGeo
Template for EASy 68k Assembler+Simulator:

Code: Select all

;;;;;;;;;;;;;;;;;;; EASY 68K ;;;;;;;;;;;;;;;;;;;;;

; to test, press F9 or click "Assemble" (right
; arrow). in the message box that appears,
; click Execute. in Sim68k, press F9 or "Run"

; warning: indenting is critical. certain
; lines MUST be indented by 2+ spaces or it
; will display an error: "Invalid opcode"

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

START ORG $1000
  
  bsr test_memset
  bsr newl
  bsr test_memcpy
  bsr newl
  bsr test_strcpy
  bsr newl
  bsr test_strlen

  SIMHALT              ; halt simulator (indented, see warning above)

; push/pop all except a7/sp.
; movem.l=load/store multiple

pusha macro
  movem.l d0-d7/a0-a6, -(a7)
  endm

popa macro
  movem.l (a7)+, d0-d7/a0-a6
  endm

; memcpy(a0, a1, d0)

memcpy:   
  cmp.l #4, d0         ; d0<4?
  blo mct              ; yes: bytes only
  movem.l d0, -(a7)    ; push d0
  lsr.l #2, d0         ; n/4
  mc32:
   move.l (a1)+, (a0)+ ; (u32) *a0++=*a1++
   sub.l #1, d0        ; until 0
  bne mc32
  movem.l (a7)+, d0    ; pop d0
  and.l #3, d0         ; modulo 4
  mct:
  tst.l d0             ; no bytes left?
  beq mce              ; then skip byte loop
  mc8:
   move.b (a1)+, (a0)+ ; (u8) *a0++=*a1++
   sub.l #1, d0        ; until 0
  bne mc8
  mce:
  rts                  ; return

; memset8(a0, d0, d1)
  
memset8:   
  cmp.l #0, d1         ; d1=0?
  beq ms8e
  ms8:
   move.b d0, (a0)+    ; (u8) *a0++=d0
   sub.l #1, d1        ; until 0
  bne ms8
  ms8e:
  rts                  ; return

; memset16(a0, d0, d1)
  
memset16:
  cmp.l #0, d1         ; d1=0?
  beq ms16e
  ms16:
   move.w d0, (a0)+    ; (u16) *a0++=d0
   sub.l #1, d1        ; until 0
  bne ms16
  ms16e:
  rts                  ; return

; memset32(a0, d0, d1)
  
memset32:
  cmp.l #0, d1         ; d1=0?
  beq ms32e
  ms32:
   move.l d0, (a0)+    ; (u32) *a0++=d0
   sub.l #1, d1        ; until 0
  bne ms32
  ms32e:
  rts                  ; return

; strlen(a0); return in d0

strlen:
  move.l #0, d0
  sl:
   move.b (a0)+, d1    ; (u8) d1=*a0++
   add.l #1, d0        ; d0++
   cmp.b #0, d1        ; until 0
  bne sl
  sub.l #1, d0         ; d0--
  rts                  ; return

; strcpy(a0, a1)

strcpy:
  move.b (a1)+, d0     ; (u8) d0=*a1++
  move.b d0, (a0)+     ; *a0++=d0
  cmp.b #0, d0         ; until 0   
  bne strcpy           ; copy
  rts                  ; return

; new line

newl:
  lea crlf, a1         ; new line
  move #14, d0         ; task=14, display text at a1
  trap #15             ; irq
  rts                  ; return

; testing...

test_memcpy:
  lea buffer, a0       ; a0=&buffer
  lea text, a1         ; a1=&text
  move.l #11, d0       ; d0=#
  bsr memcpy           ; copy
  lea buffer, a1       ; a1=&buffer
  move #14, d0         ; task=14, display text at a1
  trap #15             ; irq
  rts                  ; return

test_memset:
  lea buffer, a0       ; a0=&buffer ; initialize
  move.l #0, d0        ; d0=value
  move.l #128, d1      ; d1=#
  bsr memset8
  lea buffer, a0       ; a0=&buffer ; copy 7 A's
  move.l #$41, d0      ; d0=value
  move.l #7, d1        ; d1=#
  bsr memset8
  lea buffer, a1       ; a1=&buffer
  move #14, d0         ; task=14, display text at a1
  trap #15             ; irq
  rts                  ; return
  
test_strlen:
  lea text, a0         ; a0=&text
  bsr strlen           ; length
  move.l d0, d1        ; d1=d0
  move #3, d0          ; task=3, display number in d1
  trap #15             ; irq
  rts                  ; return

test_strcpy:
  lea buffer, a0       ; a0=&buffer
  lea text, a1         ; a1=&text
  bsr strcpy           ; copy
  lea buffer, a1       ; a1=&buffer
  move #14, d0         ; task=14, display text at a1
  trap #15             ; irq
  rts                  ; return

  ORG   (*+3)&-4       ; align

buffer ds.b 128
number dc.l 1234567
text dc.b 'EXAMPLE123', 0
crlf dc.b $0D, $0A, 0

  END     START

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; example function with parameters

      offset  4+4     ; parameters
sum   ds.l    1 
n2    ds.l    1 
n1    ds.l    1 
      org     *       ; end offset

adder:
  link a0, #0         ; create stack frame
  move.l d0, -(sp)    ; save d0
  move.l (n1,a0), d0  ; access n1 parameter
  add.l (n2,a0), d0   ; add n2 parameter
  move.l d0, (sum,a0) ; store result
  move.l (sp)+, d0    ; restore d0
  unlk a0             ; destroy stack frame
  rts
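As a reference model for the memcpy routine in the template, here is a rough Python sketch of the same split into 32-bit words plus trailing bytes. The name `memcpy_model` is made up. The point it shows is the edge cases: a loop of the `sub`/`bne` kind runs its body once before testing the counter, so a count of zero (or a length that is an exact multiple of 4) has to be checked before entering the byte loop.

```python
# Reference model of a word-then-byte copy. memcpy_model is a
# hypothetical name; dst must be a mutable buffer such as bytearray.


def memcpy_model(dst, src, n):
    """Copy n bytes: n // 4 words first, then n % 4 trailing bytes."""
    words, tail = divmod(n, 4)
    pos = 0
    for _ in range(words):            # word loop (mc32 in the 68k code)
        dst[pos:pos + 4] = src[pos:pos + 4]
        pos += 4
    for _ in range(tail):             # byte loop (mc8), skipped when tail == 0
        dst[pos] = src[pos]
        pos += 1
    return pos                        # number of bytes actually copied


if __name__ == "__main__":
    buffer = bytearray(128)
    text = b"EXAMPLE123\x00"
    memcpy_model(buffer, text, len(text))
    print(buffer[:10].decode("ascii"))
```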
krom wrote: There is one thing that I want most of all from the FASM community, a perfect MIPS assembler (as good as FASMARM) that can correctly encode all opcodes right up to the last MIPS HW. There was a FASM MIPS port but it is discontinued & only handles the 32-bit opcodes, so I cannot use it for my N64 asm development, which is a real shame :(
MIPS is nowhere near as popular as ARM so it's unlikely that anyone else will do it. As the saying goes, "If you want something done...". The MIPS encoding scheme is actually easy compared to ARM, X86 and M68k. Example (MIPS32):

Code: Select all

; [31-26][25-21][20-16][15-0]
; OPCODE  BASE   DEST  OFFSET (SIGNED)

lb rt, offset(rs)
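To make the field layout concrete, a minimal Python sketch that packs `lb` into a 32-bit word. The helper name `encode_lb` is hypothetical; the opcode value 20h for `lb` is the standard MIPS32 one, and registers are passed as plain numbers (for example 8 for $t0, 4 for $a0).

```python
# Sketch of the MIPS32 I-type layout for "lb rt, offset(rs)".
# encode_lb is a hypothetical helper name.

OPCODE_LB = 0x20  # bits 31..26


def encode_lb(rt, rs, offset):
    """Pack opcode, base (rs), dest (rt) and a signed 16-bit offset."""
    if not -0x8000 <= offset <= 0x7FFF:
        raise ValueError("offset must fit in a signed 16-bit field")
    return ((OPCODE_LB << 26) |
            ((rs & 0x1F) << 21) |     # BASE, bits 25..21
            ((rt & 0x1F) << 16) |     # DEST, bits 20..16
            (offset & 0xFFFF))        # OFFSET, two's complement


if __name__ == "__main__":
    # lb $t0, 4($a0)
    print(hex(encode_lb(8, 4, 4)))
```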
krom wrote: Did you know any assembler with decent macro support can add opcodes from any cpu for use in its functionality.
I'm working on something like this where you can select the CPU, model, OS, language, etc and it uses FASM as the assembler. Click for Preview.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Wed Mar 19, 2014 5:05 am

I have finally worked out how to initialize the V3D/QPU correctly in bare metal, here is a demo showing the initial state of all the readable V3D registers:
https://github.com/PeterLemon/Raspberry ... 3D/V3DINIT

You can see from the readout of the first three V3D identification registers that it matches the newly released GPU specs VideoCoreIV-AG100-R.pdf exactly. It took me a while to initialize the V3D registers, as when I 1st tried to read words from the V3D_BASE (0x20C00000) upwards, I just got 0xDEADBEEF for everything...

Here were the steps I took, culminating in this demo:
1. I made a program to search every word in memory (0x00000000 to 0xFFFFFFFC) to find the text string "V3D" (which is the 1st 3 bytes of the V3D_IDENT0 word). The test turned up zero results, so I was then sure I had to turn on the V3D somehow...

2. I delved into VideocoreIV programming and created a blinker bootcode.bin demo using the wonderful beta vasm http://www.ibaug.de/vasm/vasm.tar.gz using the blinker source from Herman H Hermitage. I then tried to read the V3D_BASE register (set to 0x7EC00000 because we are on the VideocoreIV) and blink the led, but any read from that region crashed the R-Pi, and the LED did not flash.

3. Back on the ARM side I started to trawl thru the Linux sources to see how they turn on the V3D, after a day of work I had managed to crash the BUS of the V3D enough so that instead of showing me 0xDEADBEEF, it crashed the R-Pi with any read from V3D peripheral region of memory only, possibly making it into the same state as my bootcode.bin demo.

4. I had noticed from my Tags Channel demo https://github.com/PeterLemon/Raspberry ... agsChannel
that in bare metal the V3D is in an "Enabled" state, but the frequency is set to zero, so I forced the clock frequency of the V3D to 250MHz using (250 * 1000 * 1000) which I had found in Linux sources. I then tried reading the V3D registers but still got 0xDEADBEEF...

5. Finally I remembered that Herman H Hermitage had done a Fast Fourier Transform demo using the QPU and I started to look at the sources... I found that he used 2 new totally undocumented Mailbox Property Interface tags, namely 0x30011 = Execute_QPU, & 0x30012 = Enable_QPU. So I ran the tag "Enable_QPU" using 1 as my data, to set it to enable, and then tried reading the V3D registers and magically it kicks it into place!

This is really neat as my main V3D initialization is 2 simple concatenated tags of: (A) Set V3D Clock To 250MHz, (B) Enable The QPU =D
A massive thank you to Herman H Hermitage, without your help I would not have got it to work!!
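For readers who have not used the Mailbox Property Interface, here is a rough Python sketch of what a buffer with those two concatenated tags could look like. Only the Enable_QPU tag number (0x30012) and the 250 * 1000 * 1000 rate come from this post. The Set Clock Rate tag ID, the V3D clock ID, the third "skip turbo" word and the general buffer layout are assumptions taken from the public firmware mailbox documentation, so check them against the demo source before relying on them. On real hardware the buffer also has to be 16-byte aligned and handed to the mailbox, which this sketch does not do.

```python
import struct

# Assumptions (not from the post): tag ID, clock ID and request layout
# as described in the firmware mailbox property documentation.
SET_CLOCK_RATE = 0x00038002
CLK_V3D_ID = 5
# From the post: undocumented tag found in the QPU FFT demo.
ENABLE_QPU = 0x00030012


def tag(tag_id, *values):
    """One tag: ID, value buffer size in bytes, request code 0, values."""
    return [tag_id, 4 * len(values), 0, *values]


def build_v3d_init():
    """Concatenate both tags into one little-endian property buffer."""
    body = (tag(SET_CLOCK_RATE, CLK_V3D_ID, 250 * 1000 * 1000, 0) +
            tag(ENABLE_QPU, 1) +
            [0])                      # end tag
    words = [0, 0] + body             # total size placeholder, request code
    words[0] = 4 * len(words)         # total buffer size in bytes
    return struct.pack("<%dI" % len(words), *words)


if __name__ == "__main__":
    buf = build_v3d_init()
    print(len(buf), buf.hex())
```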

I hope you guys like this, as it is a stepping stone to fully alpha blended textured shaded 2D & 3D GFX in bare metal on the Raspberry Pi, which will make the homebrew scene on this wonderful HW explode!

For my next V3D demo, I will try to show you guys how to run Control Lists, so we can use OpenGLES style functions to draw HW GFX to the screen =D

P.S We need to put those 2 new QPU Mailbox Property Tags onto some wiki or document, as they were so hard to find, and they are very important.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Tue Apr 08, 2014 11:39 am

As promised, here is my 1st Control Lists demo running the simplest V3D GFX opcode Clear Color:
https://github.com/PeterLemon/Raspberry ... lear_Color

It uses my previous V3D initialization code, and the normal Raspberry Pi Frame Buffer to create the screen context.
I set the start & end address of the Control List to the HW Rendering Thread (1) and it is automatically executed.
As the videocore renders the screen using tiles, I have shown how the screen is built up using the Control List byte coding for Tile Columns & Rows.

The frame buffer used is 640x480 32BPP, so each tile is 64x64 pixels, and needs 10 Tile Columns (X), by 8 Tile Rows (Y) to draw a full screen.
The color used in the Clear Color is always 64-bits of data, so I have shown the stripes effect of having a Red & Yellow color.
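The tile counts above are just a round-up division, sketched here in Python. The 64x64 tile size is the one stated in this post for the 32BPP frame buffer; the function name `tile_grid` is made up. Note that 480 / 64 = 7.5, so the 8th tile row hangs 32 pixels below the visible screen.

```python
# Tile grid arithmetic for a tiled renderer. tile_grid is a hypothetical
# name; TILE = 64 is the tile edge quoted in the post for 32BPP.

TILE = 64


def tile_grid(width, height, tile=TILE):
    """Return (columns, rows) of tiles needed to cover the screen."""
    cols = -(-width // tile)          # ceiling division
    rows = -(-height // tile)
    return cols, rows


if __name__ == "__main__":
    cols, rows = tile_grid(640, 480)
    overhang = rows * TILE - 480      # pixels of the last row off-screen
    print(cols, rows, overhang)
```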

I think this is the 1st working bare metal 3D HW GFX demo on the Raspberry Pi...
It could also be the fastest way possible to clear the screen to a specific color!

For my next task I will try to link up fragment/vertex shaders, & vertex data to show a triangle on the screen =D

Siekmanski
Posts: 8
Joined: Mon Apr 28, 2014 11:14 pm
Location: Netherlands

Re: Bare Metal Examples

Wed Apr 30, 2014 1:17 pm

krom wrote: I hope you guys like this, as it is a stepping stone to fully alpha blended textured shaded 2D & 3D GFX in bare metal on the Raspberry Pi, which will make the homebrew scene on this wonderful HW explode!
krom, astonishing !!!

I'm glad you're sharing all this great info to the community.
Thanks man! :D
Can't wait to see some cool HW GFX demos in the future.

Marinus

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Thu May 01, 2014 3:19 pm

Hi Marinus, & welcome to the Raspberry Pi Community =D
Thanks for your post, it helps me to get inspiration to code!!
We must also thank the other people that have helped me get to this stage: Namely my m8 Dex & the great Herman H Hermitage

Here is an update of my work on getting a single triangle on the screen:
1. I started at the top trying to get OpenGL style control list opcodes working, but they need vertex & fragment shaders, with the 3D X,Y,Z vertex arrays stored outside of the control list. (I gave up on this because there were too many different things that could go wrong!)

2. I then started at the bottom, trying to get an OpenVG triangle on the screen, which only requires a single fragment shader, and 3 2D X/Y vertex coordinates stored inline inside the Control List byte code. The documentation is much better for this type of triangle, & I have been able to fix up some mistakes in my LIB/CONTROL_LIST.INC file:

The mistakes were to do with the flags of the primitive types, which were confusingly documented...
I was able to debug the primitive types because I have clear_screen working: a crash in the GFX pipeline stops the clear color tiles from rendering. In OpenVG inline control list vertex lists, Triangles have 3 X/Y points with a special word termination code at the end of the 3rd vertex to end the list, and RHTs have 2 X/Y points with a special word termination code at the end of the 2nd vertex to end the list.

When I first tested OpenVG inline triangles with my old CONTROL_LIST include file, the Clear Color never displayed, so I knew my primitive types flags were messed up. When I fixed the primitive types flags, both the Triangle and RHTs worked as the document describes, i.e. the Clear Color displays, meaning there was no crash in the GFX pipeline any more. (When I put only 2 verts in a triangle it crashes, and when I put 3 in RHTs it crashes, so I think I have the primitive type flags working correctly)

Also I have tested Clockwise_Primitives, as the pipeline crashes when I use counter clockwise primitive verts (I just swap the Y coordinates to test this)

3. So right now I have invisible 2D Triangles & RHTs working, & now I am stuck on the final piece of the puzzle, the Fragment Shader required to fill the color of the triangle. I have gleaned lots from the documentation, & I am learning the QPU opcodes as we speak, but it will take me a while to get it working :( I have tested that the QPU Fragment Code is executing correctly by writing random rubbish bytes into the code area, which crashes the GFX Pipeline (i.e. no Clear Color)
Then I placed simple test QPU 64-bit "NOP" instructions in the code area, and this does not crash the pipeline =D

I have decided to place all my current OpenVG Triangle code on my GITHUB, as there are more clever people than me on these boards, who may be able to help: https://github.com/PeterLemon/Raspberry ... G_Triangle
All you need to do is place the correct QPU Fragment binary program code under the Fragment Shader label, and we should see a colored Triangle =D You can also try out all my tests above, I have made the screen go Red when there is a GFX pipeline crash, and the clear color is Green, to show when it is running correctly.

Also this is a big request, can someone please make a perfect QPU assembler using the VideoCoreIV-AG100-R.pdf that can produce simple binary portions of QPU code. (This would help me get it working in no time!)

Siekmanski
Posts: 8
Joined: Mon Apr 28, 2014 11:14 pm
Location: Netherlands

Re: Bare Metal Examples

Thu May 01, 2014 7:45 pm

Hi krom,
I have only owned a Raspberry Pi for a week now, and am still reading a lot to get started with assembly.
Did assembly on the Amiga, PC ( OpenGL & Direct3D in assembly ) and ATmega microcontrollers.
As I understand the info to use the 3D HW is not available to the public?

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Thu May 01, 2014 8:08 pm

Siekmanski wrote:As I understand the info to use the 3D HW is not available to the public?
Hi Siekmanski,
The Docs used to be non public, but a couple of months back, Broadcom gave us this pdf file with lots of info, which I have been using for all my GPU HW demos: http://www.broadcom.com/docs/support/vi ... G100-R.pdf

Great to hear you coded on the Amiga, I have just started coding in 68000 asm on that system again myself, using a Gotek USB Disk Drive emulator, with modified firmware plugged into an A600 =D

Siekmanski
Posts: 8
Joined: Mon Apr 28, 2014 11:14 pm
Location: Netherlands

Re: Bare Metal Examples

Thu May 01, 2014 9:16 pm

Hi krom,
Thanks for the Broadcom pdf file.
Yeah, the good old Amiga days.......
Started in 1989 coding assembly with the Devpac assembler.
We did "Bare Metal" coding already in those days.
Writing a (sector)track-loader to the boot-sector, it loaded our program to a specific memory address and then jumped to that address to start our program.
I still have the Amiga Hardware Reference Manual to get directly to the hardware.
It was all new and exciting and has been a great experience with a lot of fun.

krom
Posts: 61
Joined: Wed Dec 05, 2012 9:12 am
Contact: Website

Re: Bare Metal Examples

Mon Jul 21, 2014 4:43 pm

Hi guys, sorry for the lack of updates... I have been helping with a new N64 emulator (cen64) & needed to do loads of cpu tests for the system: https://github.com/PeterLemon/N64/

I did not want to leave the Raspberry Pi Bare Metal community out of the loop, so I wrote this simple Nintendo Game Boy CPU Emulator that passes all of the GB-Z80 CPU tests by blargg: https://github.com/PeterLemon/Raspberry ... MU/GameBoy

I thought this would be a fun example of how you can emulate an old 8-Bit system, using 100% ARM assembly:
It uses a static GB-Z80 opcode table which I can use to shift to the correct opcode, because I have aligned each one by the shift amount. I then use blx to run the CPU instruction.
I also use a minimum of loads & stores, as I devote ARM registers to contain the main GB-Z80 registers.
I emulate the Game Boy Background screen using DMA 2D Mode & Stride
It is early days for commercial roms to run, but a few do show title screens etc already, which is a surprise because I have only coded the bare minimum of the system to pass all these tests!
(It is also limited to 32KB ROMs, because I am not emulating the Game Boy banking modes)
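The dispatch idea described above can be modelled in a few lines of Python. This is only a sketch of the technique, not the emulator's actual code: the slot size, the table base address and the three handlers are hypothetical, CPU flags are ignored, and a dict stands in for the ARM registers that hold the GB-Z80 state.

```python
# Model of shift-aligned opcode dispatch. SLOT_SHIFT, TABLE_BASE and the
# handler set are hypothetical values chosen for illustration.

SLOT_SHIFT = 7        # every handler padded to 1 << 7 = 128 bytes
TABLE_BASE = 0x8000   # address of the handler for opcode 0x00


def handler_address(opcode):
    """Address a blx would branch to: no lookup table load is needed."""
    return TABLE_BASE + (opcode << SLOT_SHIFT)


def op_nop(cpu, rom):                 # 0x00 NOP
    pass


def op_inc_b(cpu, rom):               # 0x04 INC B (flags not modelled)
    cpu["b"] = (cpu["b"] + 1) & 0xFF


def op_ld_b_n(cpu, rom):              # 0x06 LD B,n
    cpu["b"] = rom[cpu["pc"]]
    cpu["pc"] += 1


HANDLERS = {0x00: op_nop, 0x04: op_inc_b, 0x06: op_ld_b_n}


def run(rom, steps):
    """Fetch, dispatch and execute a fixed number of instructions."""
    cpu = {"pc": 0, "b": 0}
    for _ in range(steps):
        opcode = rom[cpu["pc"]]
        cpu["pc"] += 1
        HANDLERS[opcode](cpu, rom)
    return cpu


if __name__ == "__main__":
    print(hex(handler_address(0x04)))
    print(run(bytes([0x06, 0xFF, 0x04, 0x00]), 3))
```

The trade-off of aligned slots is wasted space for short handlers in exchange for computing the branch target with a single shift and add.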

I am going to be working on my V3D bare metal assembly triangle demo from scratch using the hackdriver demo by phire: https://github.com/phire/hackdriver
It contains everything we need, including the exact QPU fragment code needed to display a triangle on the screen =D
