Unofficial empeg BBS

#200028 - 22/01/2004 16:15 Bit/Binary Help..
foxtrot_xray
addict

Registered: 03/03/2002
Posts: 687
Loc: Atlanta, Georgia
Hey guys, was wondering if someone could translate some of this...

Looking at the definitions for the "Vorbis Comments" (in Ogg/FLAC files). The definition is as follows:

1} [vendor_length] = read an unsigned int of 32 bits
2} [vendor_string] = UTF-8 vector as [vendor_length] octets
3} [user_comment_list_length] = unsigned int of 32 bits
loop [user_comment_list_length] times:
4} [length] = unsigned int of 32 bits
5} [vector_data] = UTF-8 vector as [length] long
end loop
7} [framing bit] = single bit as boolean


I'm not strong on reading bits. The terminology, as it relates to actual usage, is cornfuzzling me, especially when I'm reading the file in a hex editor (looking at it char by char). Okay, so,
the first is 32 bits. That means... what, looking at the file? At the beginning of the user data (after #2 above), I read:
07 00 00 00 0f 00 00 00

..So, I'm GUESSING that the "07 00 00 00" stands for just "7", since I have 7 comments total (TITLE, ARTIST, ALBUM, TRACKNUM, GENRE, DATE, COMMENT), then the "0f 00 00 00" is simply 15 for the first tag, which is "TITLE=Incognito".
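
That guess can be checked with a few lines of C. This is only a minimal sketch: the helper name `read_le32` is made up for illustration, and it assumes the 32-bit fields are stored least-significant byte first (little-endian), which is how the Vorbis comment format lays them out.

```c
#include <stdint.h>

/* Combine four bytes, stored least-significant byte first,
   into one unsigned 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Fed the eight bytes above, `read_le32` on the first four gives 7 and on the next four gives 15, which matches 7 comments and the 15 characters of "TITLE=Incognito".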

If I'm reading that right, fine and dandy. However, after the COMMENT comment, I have: 81 00 0f 80 .. Do these mean anything? Can I ignore them if all I'm trying to do is grab the info?

Also, for the beginning of the "Comment" section, it reads:

Comment header logically is a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32-1 and the length of each vector is limited to 2^32-1 bytes.

Uhm.. english? Is there a tag/byte sequence I can search for to find the beginning of the whole comment section??

Thanks, anyone that can shed some light.
Me.
_________________________
Mike 'Fox' Morrey 128BPM@124MPH. Love it! 2002 BRG Mini Cooper

#200029 - 23/01/2004 22:34 Re: Bit/Binary Help.. [Re: foxtrot_xray]
number6
old hand

Registered: 30/04/2001
Posts: 745
Loc: In The Village or sometimes: A...
Going by your comments and the libvorbis source code, you have misunderstood the structure slightly.

The actual C code from libvorbis that reads a comment block from the file is as follows.
I have matched the steps I document below with the code here by adding
"Step n" comments to the code.

static int _vorbis_unpack_comment(vorbis_comment *vc,oggpack_buffer *opb){
  int i;
  // Step 1 (see below)
  int vendorlen=oggpack_read(opb,32);
  if(vendorlen<0)goto err_out;
  // Step 2
  vc->vendor=_ogg_calloc(vendorlen+1,1);
  _v_readstring(opb,vc->vendor,vendorlen);
  // Step 3 (get number of comment strings "n")
  vc->comments=oggpack_read(opb,32);
  if(vc->comments<0)goto err_out;
  vc->user_comments=_ogg_calloc(vc->comments+1,sizeof(*vc->user_comments));
  vc->comment_lengths=_ogg_calloc(vc->comments+1, sizeof(*vc->comment_lengths));

  // Step 4

  for(i=0;i<vc->comments;i++){
    // Step 4a
    int len=oggpack_read(opb,32);
    if(len<0)goto err_out;
    vc->comment_lengths[i]=len;
    // Step 4b
    vc->user_comments[i]=_ogg_calloc(len+1,1);
    _v_readstring(opb,vc->user_comments[i],len);

    // Repeat "n" times
  }

  // Step 5
  if(oggpack_read(opb,1)!=1)goto err_out; /* EOP check */

  // Must have valid comments block to get to here
  return(0);

  // error handling for "invalid" comments block
 err_out:
  vorbis_comment_clear(vc);
  return(OV_EBADHEADER);
}




What this code does is as follows (the numbers match the "Step n" comments above):

(step) 1. Read a 32-bit [4 bytes long] integer, which is the length of the "Vendor" string in the comment block.
If this is less than zero, it stops processing right there, as it's an invalid comments block.

2. Then it reads the number of 8-bit bytes (called octets in the documentation) indicated in step 1 into a buffer as the "vendor string" and stores it.

3. Then it reads a 32-bit [4 bytes long] unsigned integer, which contains the number of comment entries in the file. This number is called "n" for simplicity's sake.

If n is zero, the loop in step 4 runs zero times and processing goes straight to step 5; that is a valid but empty comment header.

4. Then it repeats steps 4a & 4b "n" times, as indicated by the number read in step 3:

4a. Read a 32-bit number from the file, which contains the length of the [next] comment "string". If this number is less than zero, stop processing now: bad comments block.

4b. If the number read in step 4a is >= 0, read the number of bytes (octets) indicated by that 32-bit number and store them as comment string #i.

Steps 4a & 4b repeat until all "n" comments have been read.

[In the code, the loop counter (variable i) starts at 0 and increments by 1 each pass while i < "n" (the value read in step 3), which is the C way of saying it loops "n" times.]

5. It reads a single bit, the framing bit (the "EOP check" in the code), which should be 1. If it's not, the comments block is invalid. Everything before it is a whole number of bytes, so this bit is the lowest bit of the byte that follows the last comment.

That's all you need to do.
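
If you'd rather not pull in libvorbis, the steps above can be sketched against a comments block that is already sitting in memory. Treat this as a rough sketch under assumptions, not the library's method: the names `cursor`, `cur_u32`, `cur_take` and `walk_comments` are made up for illustration, the 32-bit fields are taken to be little-endian, and the `check_framing` flag exists because, as far as I know, FLAC's version of this block leaves the framing bit out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cursor over a comments block held in memory. */
typedef struct {
    const unsigned char *data;
    size_t len;   /* total bytes available */
    size_t pos;   /* next unread byte, always <= len */
} cursor;

/* Read a little-endian 32-bit value. Returns 0 on success, -1 if data runs out. */
static int cur_u32(cursor *c, uint32_t *out)
{
    const unsigned char *p;
    if (c->len - c->pos < 4) return -1;
    p = c->data + c->pos;
    *out = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    c->pos += 4;
    return 0;
}

/* Hand back a pointer to the next n bytes and step past them. */
static int cur_take(cursor *c, uint32_t n, const unsigned char **start)
{
    if (c->len - c->pos < n) return -1;
    *start = c->data + c->pos;
    c->pos += n;
    return 0;
}

/* Print each comment on its own line.
   Returns the number of comments, or -1 for a bad block. */
static long walk_comments(const unsigned char *buf, size_t len, int check_framing)
{
    cursor c = { buf, len, 0 };
    const unsigned char *s;
    uint32_t vendor_len, count, n, i;

    /* Steps 1 and 2: vendor length, then the vendor string itself. */
    if (cur_u32(&c, &vendor_len) || cur_take(&c, vendor_len, &s)) return -1;

    /* Step 3: number of comment strings. */
    if (cur_u32(&c, &count)) return -1;

    /* Step 4: a length (4a) followed by that many bytes (4b), "n" times. */
    for (i = 0; i < count; i++) {
        if (cur_u32(&c, &n) || cur_take(&c, n, &s)) return -1;
        fwrite(s, 1, n, stdout);
        putchar('\n');
    }

    /* Step 5: framing bit, the low bit of the next byte (skipped on request). */
    if (check_framing && (c.pos >= c.len || (c.data[c.pos] & 1) != 1)) return -1;

    return (long)count;
}
```

A return of -1 means the block was cut short or the framing bit was missing; anything else is the number of comments that were printed.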

Each comment will be in the form XXXX=YYYYY
where XXXX is a field name that may repeat, e.g. "Artist", and YYYYY is the value for that XXXX, e.g. "Alison Krauss", "Hank Williams".
In the file, each comment string will be preceded by a 4-byte (32-bit) integer that indicates how long the comment is.
The last byte of a comments block is 1 (the framing bit) to mark the end of the block.
But which "1" in the file is the end of the block can only be determined once all the preceding bytes are read and decoded correctly.
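
Once you have one comment string and its length, pulling the XXXX and YYYYY parts apart is a matter of finding the first "=". A minimal sketch follows; the function name `comment_value_offset` is made up for illustration, and it works on the length rather than a terminating NUL because the strings in the file are not NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Locate the value part of a "NAME=VALUE" comment of the given length.
   Returns the offset of the first value byte, or 0 if there is no '='.
   The field name is then the first (offset - 1) bytes. */
static size_t comment_value_offset(const char *comment, size_t len)
{
    const char *eq = memchr(comment, '=', len);
    return eq ? (size_t)(eq - comment) + 1 : 0;
}
```

For "TITLE=Incognito" (length 15) this returns 6, so the name is the first 5 bytes and the value is the remaining 9.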

There is also no "magic" string to search for to unambiguously locate a comments block in a file as the "header" is not fixed in size or value.

You need to process the comments block as a unit.
Unless you start at the beginning of the comments block you cannot determine unambiguously where the end of the comments block is.

Note: It is fully legal to have an "empty" comments block. This is a comment block with a valid (>= 0) vendor string length and no comment strings in it. Any comment string that is present could also have a length of 0, implying that that comment string is effectively not there.

If you're still confused by the above, how about posting a hex dump of the start [say the first 512 bytes] of the file you're looking at so we can explain it in more detail for you.

A minimum comments block with no comments or vendor string would be 9 bytes long: 8 bytes of 0 (two 32-bit integer values of zero), followed by a byte of 1 to indicate valid EOP. However, you could have some equally legal variations of that.

A "quick & dirty" way of checking whether the file has ANY comments in it is to check if the first 9 bytes of the comments block are equal to (in hex) "00 00 00 00 00 00 00 00 01". If so, you'd know there weren't any comments in the file, as that's the smallest legal comments block you can have.
If it didn't match this, you'd have to parse the comments block to determine the comments and where they ended.
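
That quick check is a single memcmp. A minimal sketch, with the made-up name `is_minimal_empty_block`; note that a "no" here only means the block isn't the 9-byte minimum, not that it definitely holds comments (a block with a vendor string and zero comments won't match either).

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the block starts with the smallest legal comments block:
   vendor length 0, comment count 0, framing byte 1. Otherwise return 0. */
static int is_minimal_empty_block(const unsigned char *block, size_t len)
{
    static const unsigned char empty[9] = { 0, 0, 0, 0,  0, 0, 0, 0,  1 };
    return len >= sizeof empty && memcmp(block, empty, sizeof empty) == 0;
}
```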


