delimiters
File created: code/delimiters_must_die [Diff]
--- /dev/null
+++ b/code/delimiters_must_die
@@ -1 +1,83 @@
h1. On delimiters
h2. Escaping from backslash hell
OK, so you want to use a simple comma-separated format to store your data.
bc. one,two,three,four
five,six,seven,eight
Good! Simple and clean, human-readable too. You separate entries with , and series with newlines (\n). But then you need to store a comma inside one of the values. So you decide to escape commas with \, like all decent people do.
bc. one\,two,thr\,ee,four
five,six,seven,eight
Blech! Your parser can no longer simply split lines by ",". It must check whether a \ precedes each comma and, if one does, keep the comma as data and strip the \. But that still works.
Now you see that \ can occur in the data too, maybe even directly before a ,. So let's escape it with itself.
bc. one\\\,two,thr\,ee,four
five,six,seven,eight
Three backslashes! Isn't that fancy? Not much has changed for your parser: you just tell it to turn \\ into a single backslash.
Come to think of it, newlines can occur inside entries too. So let's write them as \n (and \r for those obscene OSes).
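The parser described so far could be sketched in Python roughly like this. It is only an illustration, not code from any real project: the name @split_escaped@ is made up here, and it covers just the two rules introduced up to this point (\, is a literal comma, \\ is a literal backslash).

```python
def split_escaped(line, sep=",", esc="\\"):
    """Split one line on unescaped separators and strip the escapes."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == esc:
            # Take the next character literally. A lone escape at the
            # very end of the line is kept as-is (an assumption; the
            # text does not say what should happen in that case).
            current.append(next(chars, esc))
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


fields = split_escaped(r"one\\\,two,thr\,ee,four")
```

Note that a plain @line.split(",")@ is already out of the game: the parser has to walk the line character by character and carry state.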
bc. one\\\,two,thr\,ee,four
five,\\nsix,se\nven,eight
Senven? What the fuck.
Well, if you have done something similar so far, congrats, you are a decent man. If not, you might have used quotes.
bc. one\\\,"two thr\,ee",four
five,\\nsix,se\nven,e\"ig\"ht
Such a fine backslash soup! Now imagine you want to pack all of this inside another CSV entry. You get something like this:
bc. one\\\\\\\,\"two thr\\\,ee\"\,four\nfive\,\\\\nsix\,se\\nven,e\\"ig\\"ht
Well, I made up this example, but try coding in shell (which involves exactly this kind of nested escaping), and you'll understand all this.
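To see how quickly the soup thickens, here is a small Python sketch of the escaping rules from this section (quotes left out) applied repeatedly. The function names are invented for this illustration.

```python
def escape(value):
    """Escape backslashes, commas and newlines with a backslash."""
    # Backslashes are doubled first; otherwise the backslashes added
    # for commas and newlines would get doubled as well.
    return (value.replace("\\", "\\\\")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def backslashes_after_nesting(value, levels):
    """Count the backslashes after each additional level of escaping."""
    counts = []
    for _ in range(levels):
        value = escape(value)
        counts.append(value.count("\\"))
    return counts


# One innocent comma, packed three levels deep.
counts = backslashes_after_nesting("a,b", 3)  # [1, 3, 7]
```

Every level doubles the backslashes that are already there and adds one more for the comma, which is how a single comma ends up behind seven backslashes.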
h2. I Will Never Encounter This Set Of Bytes
Let's make up a bizarre, totally random string. It will never ever appear in our data, I assure you. We'll start our entry with it and end with it.
bc. %%%%%%%%%%DATA BOUNDARY srfg345632rfefh56t34freg56y43rffgmy/dev/urandomsays hello#$^#$%TR%%%%%%%%%%%
SRgwerg24yg!#RG@2365u246jh4fgb345ik54y245g56u234rgfw43r8ty2348we9fuhg309ekxc09w3fu8tu32598jf03928qrg2938rhy093rjg293riyjg92384fj8934rjhg28975y 10wejmwodkvnn32w9048hjfq 3984hf9q38hf 398rh 93q8r hg98q2hr 9g813h9rthg9 3rhf98h219hgf1923gh9 qhf91jhgh1
%%%%%%%%%%DATA END srfg345632rfefh56t34freg56y43rffgmy/dev/urandomsays hello#$^#$%TR%%%%%%%%%%%
Know what? IT FUCKING WILL APPEAR. And if you want your system not to fail miserably, you have to scan through all this data and make sure it's not there. Not worth it. And anyway, scanning for this string is rather complicated.
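What "make sure it's not there" costs can be sketched in a few lines of Python. This is a hypothetical helper, not how any particular multipart library does it: it picks a random boundary and then has to read the whole payload to confirm the boundary does not occur in it.

```python
import secrets


def pick_boundary(payload, nbytes=16):
    """Return a random ASCII boundary that does not occur in payload."""
    while True:
        candidate = secrets.token_hex(nbytes).encode("ascii")
        # This membership test is a full scan of the payload, and it
        # is repeated for every rejected candidate.
        if candidate not in payload:
            return candidate


payload = b"any bytes at all, including something that looks like a boundary"
boundary = pick_boundary(payload)
```

The scan also means the sender must have the complete payload in hand before writing the first byte, so streaming is out.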
Another good example (besides "HTTP multipart boundary":http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html mocked above) is "CDATA":http://en.wikipedia.org/wiki/CDATA.
h2. Taboo delimiter
People will never ever need ASCII 0 in their strings! I assure you! Let's use it as a delimiter. No other options.
h2. Tolerable delimiting
Is implemented in "JSON":http://www.json.org/. It uses backslashes plus a very limited set of characters that can follow them. The format is quite readable and writable by humans, and parser-friendly. Its page also has nice graphics; I'd like to be able to make such things myself.
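For illustration, this is what JSON string escaping looks like with Python's standard @json@ module. After a backslash JSON only allows ", \, /, b, f, n, r, t or u followed by four hex digits, so the comma in the sample value needs no escaping at all.

```python
import json

# Sample value with a backslash, a comma, double quotes and a newline.
value = 'one\\,"two"\nthree'

encoded = json.dumps(value)    # a JSON string literal, quotes included
decoded = json.loads(encoded)  # back to the original value
```

The encoded form contains no raw newline, so it can safely sit on a single line of a line-based format.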
h2. (in search of) Perfect delimiting
If you need simple strings that will never contain a certain character, you can delimit with that character. But for god's sake, do not try to allow strings to contain this character in escaped form.
bc. very long value
There is \n at the end.
If you need byte strings that can contain any byte, specify the length before the data.
bc. 64 �d��W��uu&f(�69��須��?K4{u�
�@�����Ӌ*�yT��O;��|ÑZT}����Kn�
52 �d��W��uu&�d��
��uu&�d��W��uu&�d��W��uu&
�d��W��uu&}
Lines _start_ with \n, there is a single space after each number, and numbers consist of the digits 0-9.
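A minimal Python sketch of this length-prefixed layout, following the three rules just stated (record starts with \n, then decimal length, then one space, then exactly that many bytes). The function names and the error handling are assumptions made for this example.

```python
def encode_records(values):
    """Serialize byte strings as '\\n<length> <data>' records."""
    return b"".join(b"\n%d " % len(v) + v for v in values)


def decode_records(blob):
    """Parse records written by encode_records; no escaping involved."""
    values, pos = [], 0
    while pos < len(blob):
        if blob[pos:pos + 1] != b"\n":
            raise ValueError("record must start with a newline")
        space = blob.index(b" ", pos)   # raises ValueError if missing
        digits = blob[pos + 1:space]
        if not digits.isdigit():
            raise ValueError("length must consist of digits 0-9")
        start = space + 1
        end = start + int(digits)
        if end > len(blob):
            raise ValueError("data is shorter than the declared length")
        # Skip exactly 'length' bytes without ever looking inside them.
        values.append(blob[start:end])
        pos = end
    return values


samples = [b"a b\n1 x", b"", bytes(range(256))]
roundtrip = decode_records(encode_records(samples))
```

The decoder never inspects the data bytes, so newlines, spaces and even things that look like record headers inside a value are harmless.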
So, to summarize it: very strict format, *NO* escaping, taboo *OR* skip-n-bytes delimiting.
This kind of delimiting is implemented in my serialization format called [[transfer]].
--- /dev/null
+++ b/code/transfer
@@ -1 +1,50 @@
h1. "Transfer" data transfer protocol
This protocol can be used to transfer associative arrays with bytestrings over any byte-transferring connection.
h2. Text mode commands
All fields are separated with a single space; 'data' in FLD may contain spaces, though.
I will use pseudo-ABNF here, because plain ABNF sucks. (something) in parentheses denotes the "something" field; $something later refers to its value.
Definitions:
space = ASCII 32
lf = ASCII 10
non-lf = anything but lf
non-space = anything but space
h3. MOD module_name
modCommand = "MOD" space *non-lf
Marks the start of a data structure named 'module_name'. If another module has already been started, throw an error.
h3. FLD name data
fldCommand = "FLD" space *non-space space *non-lf
Textual field 'name' with 'data' as content. 'data' may contain any character except \n. Used for transferring small amounts of data without line breaks.
h3. DAT name size
datCommand = "DAT" space *non-space(name) space *digit(size) lf *<$size>any-byte(data)
Initiates data mode for field 'name'. Right after the delimiting \n, the recipient should start reading data and switch back to line mode after exactly 'size' bytes of data.
h3. END
Ends a module.
Example of correct structure transfer:
bc. MOD query
FLD hops 1
FLD query hash
DAT args 64
�d��W��uu&f(�69��須��?K4{u��@�����Ӌ*�yT��O;��|ÑZT}����Kn��
END
Example of protocol implementation in Haskell can be seen here: [http://git.bitcheese.net/?a=summary&p=transfer]
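Independently of the Haskell implementation, a reader for one module can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference behaviour: the name @read_module@ is made up, commands before MOD are rejected, and empty lines are skipped, because the example shows a line break between the DAT payload and END while the grammar does not mention one.

```python
import io


def read_module(stream):
    """Read one MOD ... END module from a binary file-like object."""
    name, fields = None, {}
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("stream ended before END")
        line = raw.rstrip(b"\n")
        if not line:
            continue  # assumption: tolerate the line break after DAT data
        command, _, rest = line.partition(b" ")
        if command == b"MOD":
            if name is not None:
                raise ValueError("another module has already been started")
            name = rest
        elif name is None:
            raise ValueError("command outside of a module")  # assumption
        elif command == b"FLD":
            key, _, data = rest.partition(b" ")  # data may contain spaces
            fields[key] = data
        elif command == b"DAT":
            key, _, size = rest.partition(b" ")
            if not size.isdigit():
                raise ValueError("DAT size must consist of digits")
            data = stream.read(int(size))  # data mode: exactly 'size' bytes
            if len(data) != int(size):
                raise ValueError("stream ended inside DAT data")
            fields[key] = data
        elif command == b"END":
            return name, fields
        else:
            raise ValueError("unknown command")


payload = bytes(range(64))  # contains \n and space on purpose
wire = (b"MOD query\nFLD hops 1\nFLD query hash\nDAT args 64\n"
        + payload + b"\nEND\n")
module_name, module_fields = read_module(io.BytesIO(wire))
```

For a real connection the stream would need to be a buffered binary file object (for example one obtained from @socket.makefile("rb")@), so that @read@ keeps reading until 'size' bytes have arrived.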
--- a/hellnet/protocols/transfer
+++ /dev/null
@@ -1,39 +1 @@
h1. DEPRECATED
This page contains deprecated information. This has been replaced with [[http|HTTP interfaces]].
h1. HellNet data transfer protocol
This protocol is used to transfer data structures between HellNet nodes.
Conversation consists of lines, separated by \n (text mode) or data chunks (data mode).
h2. Text mode commands
All fields are separated with a single space; 'data' in FLD may contain spaces, though.
h3. MOD module_name
Marks the start of a data structure named 'module_name'. If another module has already been started, throw an error.
h3. FLD name data
Textual field 'name' with 'data' as content. 'data' may contain any character except \n. Used for transferring small amounts of data without line breaks.
h3. DAT name size
Initiates data mode for field 'name'. Right after the delimiting \n, the recipient should start reading data and switch back to line mode after exactly 'size' bytes of data.
h3. END
Ends a module
Example of correct structure transfer:
bc. MOD query
FLD hops 1
FLD query hash
DAT args 64
�d��W��uu&f(�69��須��?K4{u��@�����Ӌ*�yT��O;��|ÑZT}����Kn��
END