Tuesday, January 13, 2015

Using ISim in ISE to run a simulation

steps:
1, switch the Design panel's View from Implementation to Simulation
2, in the Processes panel, run or rerun Simulate Behavioral Model
3, to see the memory contents
go to inst_memory/ inst/ \nnative../ memory



“Design Summary” Section in the Map Report
Following is an example of the “Design Summary” section in the Map Report, which contains device utilization information:
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Slice Logic Utilization:
   Number of Slice Registers:                     8 out of  28,800    1%
     Number used as Flip Flops:                   8
   Number of Slice LUTs:                          7 out of  28,800    1%
     Number used as logic:                        2 out of  28,800    1%
       Number using O6 output only:               2
     Number used as Memory:                       5 out of   7,680    1%
       Number used as Shift Register:             5
         Number using O6 output only:             5

Slice Logic Distribution:
   Number of occupied Slices:                     3 out of   7,200    1% (for area count, just use this number)
     Number of occupied SLICEMs:                  2 out of   1,920    1%
     Number of occupied SLICELs:                  1 out of   5,280    1%
   Number of LUT Flip Flop pairs used:            8 (one slice can hold more than one LUT + Flip Flop pair, which is why this number is larger than the slice count)
     Number with an unused Flip Flop:             0 out of       8    0%
     Number with an unused LUT:                   1 out of       8   12%
     Number of fully used LUT-FF pairs:           7 out of       8   87%
     Number of unique control sets:               1
     Number of slice register sites lost 
       to control set restrictions:               3 out of  28,800    1%

A LUT Flip Flop pair for this architecture represents one LUT paired with one Flip Flop within a slice. A control set is a unique combination of clock, reset, set, and enable signals for a registered element. The Slice Logic Distribution report is not meaningful if the design is over-mapped for a non-slice resource or if placement fails. Over-mapping of BRAM resources should be ignored if the design is over-mapped for a non-BRAM resource or if placement fails.

IO Utilization:
   Number of bonded IOBs:                         7 out of     220    3%

Specific Feature Utilization:
   Number of BUFG/BUFGCTRLs:                      1 out of      32    3%
     Number used as BUFGs:                        1

Average Fanout of Non-Clock Nets:                2.08
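
To pull the area figure (occupied slices) out of a Map Report automatically, something like the following could work. It is only a sketch: the function name occupied_slices is made up for this note, and the regular expression assumes the line is laid out exactly as in the example report above.

```python
import re

def occupied_slices(report_text):
    """Return (used, available) from the 'Number of occupied Slices' line, or None."""
    # Case-sensitive on purpose: 'SLICEMs' and 'SLICELs' lines must not match.
    match = re.search(
        r"Number of occupied Slices:\s+([\d,]+)\s+out of\s+([\d,]+)", report_text
    )
    if match is None:
        return None
    # Counts are printed with thousands separators, e.g. "7,200".
    used, available = (int(g.replace(",", "")) for g in match.groups())
    return used, available
```

Calling it on the text of the report returns the two counts as integers, which can then be compared across builds.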

Friday, January 9, 2015

ecc encoder

[1] PUFKY: A Fully Functional PUF-based Cryptographic Key Generator
3.3 Syndrome Generation and Error Decoding for C_REP and C_BCH
Repetition code C_REP. The syndrome generation of x^{n_REP} consists of pairwise
XOR-ing x_1 with each remaining bit of x^{n_REP}, or h_i = x_1 ⊕ x_{i+1}. Error decoding
is based on a Hamming weight check of the syndrome s^{n_REP − 1}, which immediately
yields the value for the first error bit e_1. The remaining error bits are again
obtained by a pairwise XOR of e_1 with each of the syndrome bits, but this step is
discarded in the syndrome construction. In our design, both syndrome generation
and error decoding of a repetition code are fully combinatorial.
so for bch(7,1,3), i.e. the length-7 repetition code (n = 7, k = 1, corrects t = 3 errors):
encoder:
input: PUF readout {m0, m1, ..., m6} 7-bits
output:
helper data: {m1^m0, m2^m0, ..., m6^m0} 6-bits
decoder: located at Pufkey/source/rep_decoder.vhd
input: PUF readout {m'0, m'1, ..., m'6} 7-bits, helper data: {m1^m0, m2^m0, ..., m6^m0} 6-bits
output:
recovered: r = {m'0, m0^(m1^m'1), ..., m0^(m6^m'6)} 7-bits (entry i is m'i XOR helper bit i, so it equals m0 whenever bit i was read without error)
if weight(r) >= 4
recovered m0 = 1 (majority vote over the 7 entries of r)
else
recovered m0 = 0
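
A minimal Python sketch of the repetition code described in the paper excerpt above (syndrome weight check, then correcting the first bit). It is not a transcription of rep_decoder.vhd; the function names rep_encode and rep_decode are made up for this note, and indices are 0-based (m[0] plays the role of x_1).

```python
def rep_encode(m):
    """Helper data for one PUF readout block: h[i-1] = m[i] ^ m[0], i = 1..len(m)-1."""
    return [m[i] ^ m[0] for i in range(1, len(m))]

def rep_decode(m_noisy, helper):
    """Recover m[0] from a noisy 7-bit readout and the stored 6-bit helper data."""
    # Syndrome bit i equals e[0] ^ e[i+1], where e is the (unknown) error pattern.
    syndrome = [m_noisy[i + 1] ^ m_noisy[0] ^ h for i, h in enumerate(helper)]
    # With at most 3 errors in 7 bits, a weight of 4 or more
    # can only occur when the first bit itself is in error.
    e0 = 1 if sum(syndrome) >= 4 else 0
    return m_noisy[0] ^ e0
```

For example, with m = [1,0,1,1,0,0,1] the helper data is [1,0,0,1,1,0], and rep_decode returns m[0] for any readout that differs from m in at most 3 positions.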

BCH code C BCH . Since BCH codes are cyclical codes, their syndrome generation
is a finite field division by the code’s generator polynomial. This is efficiently
implemented in hardware as an LFSR evaluation of length (n_BCH − k_BCH).
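
To make the "division by the generator polynomial as an LFSR" idea concrete, here is a small bit-serial model in Python. It is a sketch only: the function name lfsr_remainder is made up, and the generator g(x) = x^3 + x + 1 used in the usage note is a toy example chosen for this note, not the PUFKY generator polynomial.

```python
def lfsr_remainder(bits, gen_taps):
    """Remainder of bits(x) divided by g(x) over GF(2), computed one bit per step.

    bits:     coefficients of the dividend, highest degree first.
    gen_taps: coefficients g_0 .. g_{r-1} of g(x); the leading g_r = 1 is implicit.
    Returns the register contents [r_0, ..., r_{r-1}], lowest degree first.
    """
    reg = [0] * len(gen_taps)  # register length r = n - k
    for b in bits:
        feedback = reg[-1]  # coefficient that is about to be shifted out
        # Multiply the register by x, add the incoming bit, and
        # add feedback * g(x) to reduce the degree again.
        reg = [b ^ (feedback & gen_taps[0])] + [
            reg[i - 1] ^ (feedback & gen_taps[i]) for i in range(1, len(reg))
        ]
    return reg
```

With gen_taps = [1, 1, 0] (g(x) = x^3 + x + 1), feeding the bits of g(x) itself, [0,0,0,1,0,1,1], leaves an all-zero register, while flipping the last bit leaves [1, 0, 0], i.e. a non-zero syndrome.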

The error decoding step of a BCH code is more complex and requires the
largest design effort of all elements in our secure sketch. Most BCH decoders are
designed with a focus on throughput and use systolic array designs, e.g. [19, 20, 22].
Aiming for a size-optimized implementation, we propose a serialized, minimalistic
coprocessor design with a 10-bit application-specific instruction set and limited
conditional execution support. Although highly optimized towards BCH decoding,
the architecture is generic in the sense that it can decode any BCH code, including
shortened versions, requiring only a slight change of firmware and memory size.
The datapath consists of two blocks: an address and a data block. To optimize
array indexing, all addressing is done indirectly using a five element address
RAM, which is efficiently updated by a dedicated address ALU. The output of the
address RAM is directly connected to the data RAM. The data block consists of
data RAM and an ALU which is used mainly for multiply-accumulate operations
over F_{2^u}. To minimize the size, this ALU contains only a single register. All other
necessary operands come directly from the data RAM. A high-level overview of
the coprocessor architecture is shown in Fig. 3.



[2] Implementation of BCH Code (n, k) Encoder using lfsr
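Input polynomial: h(x) = h0 + h1*x + h2*x^2 + … + hn*x^n
Input order is hn, ..., h2, h1, h0, from left to right (highest-degree coefficient first)

For bch(255, 171, 11), n (the highest coefficient index) should be 254, i.e. 255 bits are input in total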
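
A small Python sketch of the input ordering and of systematic encoding by polynomial division. Assumptions: it uses plain integer arithmetic instead of modelling the shift register from [2], the function names are made up for this note, and the generator 0b1011 (x^3 + x + 1) in the usage note is a toy example, not the bch(255, 171, 11) generator.

```python
def poly_from_stream(bits):
    """Pack a stream arriving as h_n, ..., h_1, h_0 into an int whose bit i is h_i."""
    value = 0
    for b in bits:
        value = (value << 1) | b  # earlier bits end up at higher degrees
    return value

def gf2_mod(value, gen):
    """Remainder of value(x) divided by gen(x) over GF(2); both are ints."""
    deg = gen.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        # Cancel the current leading term with a shifted copy of gen(x).
        value ^= gen << (value.bit_length() - 1 - deg)
    return value

def encode_systematic(msg_bits, gen):
    """Codeword c(x) = m(x) * x^r + (m(x) * x^r mod g(x)), with r = deg g."""
    r = gen.bit_length() - 1
    shifted = poly_from_stream(msg_bits) << r
    return shifted | gf2_mod(shifted, gen)
```

For example, encode_systematic([1, 0, 0, 0], 0b1011) gives 0b1000101, and dividing that codeword by the generator again leaves remainder 0. A polynomial whose highest index is n has n + 1 coefficients, which is why n = 254 corresponds to 255 input bits.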