Warning: this information reflects my first attempt at using Connected Text (a personal wiki system) to capture useful information, and post it to the web. The information here is not particularly polished at present.
We have a file that has many records that look like the following:
@@authors C. Gentry @@title Key Recovery and Message Attack on NTRU-Composite @@title_link=http://www.iacr.org/archive/eurocrypt2001/20450181.pdf @@conference=EUROCRYPT @@year=2001 @@pages=182-194 @@publisher=Springer-Verlag @@series=LNCS @@vol=2045 @@location=Innsbruck, Austria @@technology=ntruencrypt @@description The paper presents a clever attack that would work against NTRUEncrypt if NTRUEncrypt were ever deployed with the security parameter N not a prime number. In practice, NTRU Cryptosystems has strongly recommended that N always be prime. Indeed, all commercial implementations conform to NTRU's recommendations, and are immune to this attack. @@abstract NTRU is a fast public key cryptosystem presented in 1996 by Hoffstein, Pipher and Silverman of Brown University. It operates in the ring of polynomials $Z[X]/(X^N-1)$, where the domain parameter $N$ largely determines the security of the system. Although $N$ is typically chosen to be prime, Silverman proposes taking $N$ to be a power of two to enable the use of Fast Fourier Transforms. We break this scheme for the specified parameters by reducing lattices of manageably small dimension to recover partial information about the private key. We then use this partial information to recover partial information about the message or to recover the private key in its entirety. @@-----------------------------------------------------------------------
We want to parse this file and transform the format into HTML (or some other output format).
The general idea is to split the data on each @@. Each piece starts with a tag name; an = sign directly after the tag name is ignored (it just helps readability when multiple @@'s occur on the same line). If we see a new @@-------, then we go on to a new bibliography item.
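The splitting idea above can be sketched in C++ roughly as follows. This is only a minimal sketch and not the implementation described on this page: the names `Record`, `trim` and `parse_records` are hypothetical, a piece starting with `-` is assumed to be the separator, and a repeated tag is assumed to append to the earlier value.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// One bibliography item, stored as tag -> text (hypothetical representation).
using Record = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(ws) - first + 1);
}

// Split the text on "@@" and collect the fields into items.
std::vector<Record> parse_records(const std::string& text) {
    const std::string::size_type npos = std::string::npos;
    std::vector<Record> items;
    Record current;
    std::string::size_type pos = text.find("@@");
    while (pos != npos) {
        const std::string::size_type start = pos + 2;
        const std::string::size_type next = text.find("@@", start);
        const std::string chunk =
            text.substr(start, next == npos ? npos : next - start);
        pos = next;
        if (chunk.empty()) continue;

        // Assumption: a chunk starting with '-' is the "@@-----" separator.
        if (chunk[0] == '-') {
            if (!current.empty()) items.push_back(current);
            current.clear();
            continue;
        }

        // The tag name is the leading run of letters, digits, '_' and '-'.
        std::string::size_type i = 0;
        while (i < chunk.size() &&
               (std::isalnum(static_cast<unsigned char>(chunk[i])) ||
                chunk[i] == '_' || chunk[i] == '-')) {
            ++i;
        }
        if (i == 0) continue;  // no tag name: skip the chunk
        const std::string tag = chunk.substr(0, i);

        // Skip spaces and one optional '='; the rest is the value.
        while (i < chunk.size() && chunk[i] == ' ') ++i;
        if (i < chunk.size() && chunk[i] == '=') ++i;
        current[tag] += trim(chunk.substr(i));
    }
    if (!current.empty()) items.push_back(current);  // unterminated last item
    return items;
}
```

Calling `parse_records` on the contents of the file would give one `Record` per bibliography item, which an output routine could then turn into HTML.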
The Perl features we used (to some degree) in our implementation were:
| Perl Feature | Example |
| Regular Expressions | if ($i =~ /^([\w-]+)\s*=?\s*/) |
| Sorting | sort { my_sort($a, $b); } @entries |
| Hashes | my %headings = (); |
| Arrays | my @technologies = ( "ntruencrypt", ... ); |
| References | append($', $current_tag, \%current); |
| Dereferences | ${$current_ref}{$current_tag} .= $i; |
| String functions | print "-"x40; |
The C++ features we used (to some degree) in our implementation were:
Text File Conventions
C++ implementations follow certain conventions, for example in the layout of code: I indent by 4 spaces (no real tabs) and use K&R style.
The use of #ifndef include guards inside header files (e.g. Bibitem.h) is also a convention. By convention, each header file should #include only the header files it needs in order to compile, and such inclusions can be reduced by using forward class declarations. Similarly, cpp files should #include only the header files they need in order to compile (though forward class declarations are not useful there).
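As an illustration of the include guard and forward declaration conventions, here is a minimal sketch of what a header such as Bibitem.h might look like. The guard macro name, the members and the constructor are all hypothetical and are not taken from the actual project.

```cpp
// Bibitem.h (sketch): include guard plus a forward declaration.
#ifndef BIBITEM_H
#define BIBITEM_H

#include <string>  // needed: a std::string is stored by value below

class Parser;      // forward declaration: only a pointer to Parser is used

class Bibitem {
public:
    explicit Bibitem(const std::string& title, const Parser* owner = nullptr)
        : title_(title), owner_(owner) {}

    const std::string& title() const { return title_; }
    const Parser* owner() const { return owner_; }

private:
    std::string title_;
    const Parser* owner_;  // a pointer to an incomplete type is fine here
};

#endif  // BIBITEM_H
```

Because the header only holds a pointer to `Parser`, it does not need to #include Parser.h; only a cpp file that actually calls `Parser` member functions would need that.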
Namespaces
Namespaces should be declared in header files, but header files should never contain using directives; they should name the classes they use explicitly. For example, a namespace is declared in Parser.h and the std namespace is used in parse.cpp, but it is not used in Bibitem.h, where the STL string class is written out explicitly as std::string.
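The namespace convention might look something like the sketch below, which shows a header part and a cpp part in one listing. The namespace name `bib` and the member function `tag_of` are hypothetical, chosen only for illustration.

```cpp
// ---- Parser.h (sketch) ----
#include <string>

namespace bib {  // hypothetical namespace name

class Parser {
public:
    // std::string is spelled out in full: no using directive in a header.
    std::string tag_of(const std::string& chunk) const;
};

}  // namespace bib

// ---- parse.cpp (sketch) ----
using namespace std;  // acceptable in a cpp file, which nobody #includes

namespace bib {

string Parser::tag_of(const string& chunk) const {
    // Everything before the first space or '=' is taken as the tag name.
    return chunk.substr(0, chunk.find_first_of(" ="));
}

}  // namespace bib
```

The point of the split is that a using directive in a header would leak into every file that includes it, whereas one in a cpp file stays local to that file.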
Boost
Boost has many useful classes. In this project it was used for its handling of regular expressions.
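To give a feel for the regular expression handling, here is a minimal sketch that applies the same pattern as the Perl code, `^([\w-]+)\s*=?\s*`, to one piece of a record. Note that it uses `std::regex` from the C++11 standard library rather than Boost, so that it compiles without extra libraries; Boost.Regex offers the same names (`regex`, `smatch`, `regex_search`) in the `boost` namespace. The function name `split_tag` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Split one "@@" piece into (tag, value).
// Returns a pair of empty strings if the piece does not start with a tag.
std::pair<std::string, std::string> split_tag(const std::string& chunk) {
    // Same pattern as the Perl version: tag name, optional '=', whitespace.
    static const std::regex tag_re(R"(^([\w-]+)\s*=?\s*)");
    std::smatch m;
    if (!std::regex_search(chunk, m, tag_re)) {
        return {"", ""};
    }
    // m[1] is the tag; m.suffix() plays the role of Perl's $' variable.
    return {m[1].str(), m.suffix().str()};
}
```

For example, `split_tag("title_link=http://www.iacr.org/")` would yield the tag `title_link` and the URL as the value.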
Error Handling
#defines
headers cpp
Inheritance
derived
Keywords
static const
STL
vector map ostream sort
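As a rough illustration of how these STL pieces can stand in for the Perl hash, array and sort, here is a minimal sketch. The type `Entry`, its `get` helper, `sort_items` and the chosen ordering (newest year first, then title) are all hypothetical; years are assumed to be four-digit strings so that comparing them as text gives the right order.

```cpp
#include <algorithm>
#include <map>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical item type: tag -> text, much like the Perl hash %current.
struct Entry {
    std::map<std::string, std::string> fields;

    // Value stored for a tag, or an empty string if the tag is absent.
    std::string get(const std::string& tag) const {
        const auto it = fields.find(tag);
        return it == fields.end() ? std::string() : it->second;
    }
};

// Print one item as "tag: value" lines, in the map's sorted tag order.
std::ostream& operator<<(std::ostream& os, const Entry& item) {
    for (const auto& field : item.fields) {
        os << field.first << ": " << field.second << '\n';
    }
    return os;
}

// Sort newest first, breaking ties by title (an assumed ordering).
void sort_items(std::vector<Entry>& items) {
    std::sort(items.begin(), items.end(),
              [](const Entry& a, const Entry& b) {
                  if (a.get("year") != b.get("year")) {
                      return a.get("year") > b.get("year");
                  }
                  return a.get("title") < b.get("title");
              });
}
```

With this in place, a `std::vector<Entry>` can be sorted with `sort_items` and each element written to `std::cout` or a file stream with `<<`.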
Since we went to the bother of writing two implementations of essentially the same idea, it is worth taking a small amount of time to compare the two. Useful metrics might be:
Development time
I would have to say the C++ took about twice as long to develop as the Perl, even with a working Perl program available first (the separate files and the more complicated build process just took time to get right).
Running time
C++ wins on running time, but actual running time is very small in both cases (since the number of bibliography items is so small).
Debugging time
The C++ code is more robust, so it is far easier to maintain and debug.
Scaling
I would much rather extend the C++ code than the Perl code. However, again, it will be somewhat slower to develop.
Conclusion
Perl is a great prototyping language, and for small one-off tasks (e.g. filtering) it is almost certainly better than C++. If the project is expected to expand, then C++ will become preferable. Obviously you only want to work with one version going forward, since maintaining two independent pieces of code is a pain.