wanghs

Steps:
1, In the Design panel, switch View from Implementation to Simulation.
2, In the Processes panel, run or rerun Simulate Behavioral Model.
3, To see the memory contents, go to inst_memory/ inst/ \nnative../ memory

“Design Summary” Section in the Map Report

Following is an example of the “Design Summary” section in the Map Report, which contains device utilization information:

Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Slice Logic Utilization:
   Number of Slice Registers:                     8 out of  28,800    1%
     Number used as Flip Flops:                   8
   Number of Slice LUTs:                          7 out of  28,800    1%
     Number used as logic:                        2 out of  28,800    1%
       Number using O6 output only:               2
     Number used as Memory:                       5 out of   7,680    1%
       Number used as Shift Register:             5
         Number using O6 output only:             5

Slice Logic Distribution:
   Number of occupied Slices:                     3 out of   7,200    1% (for area count, just use this number)
     Number of occupied SLICEMs:                  2 out of   1,920    1%
     Number of occupied SLICELs:                  1 out of   5,280    1%
   Number of LUT Flip Flop pairs used:            8 (a slice can hold more than one LUT + Flip Flop pair, which is why this number is larger than the slice count)
     Number with an unused Flip Flop:             0 out of       8    0%
     Number with an unused LUT:                   1 out of       8   12%
     Number of fully used LUT-FF pairs:           7 out of       8   87%
     Number of unique control sets:               1
     Number of slice register sites lost 
       to control set restrictions:               3 out of  28,800    1%

A LUT Flip Flop pair for this architecture represents one LUT paired with one Flip Flop within a slice. A control set is a unique combination of clock, reset, set, and enable signals for a registered element. The Slice Logic Distribution report is not meaningful if the design is over-mapped for a non-slice resource or if Placement fails. OVERMAPPING of BRAM resources should be ignored if the design is over-mapped for a non-BRAM resource or if placement fails.

IO Utilization:
   Number of bonded IOBs:                         7 out of     220    3%

Specific Feature Utilization:
   Number of BUFG/BUFGCTRLs:                      1 out of      32    3%
     Number used as BUFGs:                        1

Average Fanout of Non-Clock Nets:                2.08
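
To compare area across builds, the "N out of M" numbers can be pulled out of the report with a script. A minimal sketch, assuming the line layout of the example above; parse_utilization is a made-up helper name, not part of any Xilinx tool:

```python
import re

# Matches lines such as
#   "Number of occupied Slices:      3 out of   7,200    1%"
# The layout is assumed from the example report above.
LINE_RE = re.compile(
    r"^\s*(?P<name>Number [^:]+):\s+(?P<used>[\d,]+)\s+out of\s+(?P<total>[\d,]+)"
)

def parse_utilization(report_text):
    """Return {resource name: (used, available)} for every 'N out of M' line."""
    result = {}
    for line in report_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            used = int(m.group("used").replace(",", ""))
            total = int(m.group("total").replace(",", ""))
            result[m.group("name").strip()] = (used, total)
    return result

sample = """\
   Number of Slice Registers:                     8 out of  28,800    1%
   Number of occupied Slices:                     3 out of   7,200    1%
     Number of fully used LUT-FF pairs:           7 out of       8   87%
"""

util = parse_utilization(sample)
used, total = util["Number of occupied Slices"]
print(used, total)  # prints: 3 7200
```

Lines without an "out of" part (plain counts, or the two-line "sites lost to control set restrictions" entry) are skipped by this pattern.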

[1] PUFKY: A Fully Functional PUF-based Cryptographic Key Generator

3.3 Syndrome Generation and Error Decoding for $C_{\mathrm{REP}}$ and $C_{\mathrm{BCH}}$

Repetition code $C_{\mathrm{REP}}$. The syndrome generation of $x^{n_{\mathrm{REP}}}$ consists of pairwise XOR-ing $x_1$ with each remaining bit of $x^{n_{\mathrm{REP}}}$, or $h_i = x_1 \oplus x_{i+1}$. Error decoding is based on a Hamming weight check of the syndrome $s^{n_{\mathrm{REP}}-1}$, which immediately yields the value for the first error bit $e_1$. The remaining error bits are again obtained by a pairwise XOR of $e_1$ with each of the syndrome bits, but this step is discarded in the syndrome construction. In our design, both syndrome generation and error decoding of a repetition code are fully combinatorial.

So the repetition code here is bch(7,1,3): 7 code bits, 1 data bit, corrects up to 3 bit errors.

encoder:

input: PUF readout {m0, m1, ..., m6} 7-bits

output:

helper data: {m1^m0, m2^m0, ..., m6^m0} 6-bits

decoder: located in Pufkey/source/rep_decoder.vhd

input: PUF readout {m'0, m'1, ..., m'6} 7-bits, helper data: {m1^m0, m2^m0, ..., m6^m0} 6-bits

output:

recovered: r = {m'0, m0^(m1^m'1), ..., m0^(m6^m'6)} 7-bits, each entry a possibly noisy copy of m0

if weight(r) >=4

recovered bit = 1 (majority of the 7 copies are 1)

else

recovered bit = 0
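
The encoder/decoder notes above can be tried out in software. A minimal sketch, assuming a plain majority vote over the 7 noisy copies of m0; it is not a transcription of rep_decoder.vhd, and the names rep_encode / rep_decode are made up for illustration:

```python
def rep_encode(m):
    """Helper data for a PUF readout m: h[i-1] = m[i] ^ m[0] for i = 1..len(m)-1."""
    return [m[i] ^ m[0] for i in range(1, len(m))]

def rep_decode(m_noisy, helper):
    """Recover m[0] from a noisy readout and the stored helper data."""
    # Each vote is a noisy copy of m[0]: m'[i] ^ h[i-1] = m[0] ^ (m[i] ^ m'[i]).
    votes = [m_noisy[0]] + [
        m_noisy[i] ^ helper[i - 1] for i in range(1, len(m_noisy))
    ]
    # Majority vote: with 7 votes, up to 3 flipped bits are tolerated.
    return 1 if sum(votes) > len(votes) // 2 else 0

m = [1, 0, 1, 1, 0, 0, 1]        # enrollment readout, m0 = 1
h = rep_encode(m)                # 6 helper bits
noisy = m[:]
for i in (0, 3, 5):              # flip 3 of the 7 bits
    noisy[i] ^= 1
print(rep_decode(noisy, h))      # prints: 1
```

With 4 or more flipped bits the vote goes the wrong way, which is the t = 3 limit of the 7-bit repetition code.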

BCH code $C_{\mathrm{BCH}}$. Since BCH codes are cyclical codes, their syndrome generation is a finite field division by the code’s generator polynomial. This is efficiently implemented in hardware as an LFSR evaluation of length $(n_{\mathrm{BCH}} - k_{\mathrm{BCH}})$.
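
The division by the generator polynomial can be checked in software before going to hardware. A minimal sketch using Python integers as bit vectors (bit i is the coefficient of x^i); the generator g(x) = x^3 + x + 1 of the small (7,4) cyclic code is an illustrative choice, not the PUFKY code, and poly_mod is a made-up name:

```python
def poly_mod(value, gen):
    """Remainder of value(x) divided by gen(x) over GF(2)."""
    deg = gen.bit_length() - 1
    while value.bit_length() > deg:
        # Cancel the current leading term, as the LFSR feedback does on each shift.
        value ^= gen << (value.bit_length() - 1 - deg)
    return value

G = 0b1011                                       # g(x) = x^3 + x + 1 (illustrative)

shifted = 0b1101 << 3                            # message 1101 times x^(n-k)
codeword = shifted ^ poly_mod(shifted, G)        # append the 3 parity bits

print(poly_mod(codeword, G))                     # prints: 0 (valid codeword)
print(poly_mod(codeword ^ 0b0010000, G))         # prints: 6 (one flipped bit)
```

The remainder has n - k = 3 bits here, matching the LFSR length named in the paragraph above.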

The error decoding step of a BCH code is more complex and requires the largest design effort of all elements in our secure sketch. Most BCH decoders are designed with a focus on throughput and use systolic array designs, e.g. [19, 20, 22]. Aiming for a size-optimized implementation, we propose a serialized, minimalistic coprocessor design with a 10-bit application-specific instruction set and limited conditional execution support. Although highly optimized towards BCH decoding, the architecture is generic in the sense that it can decode any BCH code, including shortened versions, requiring only a slight change of firmware and memory size.

The datapath consists of two blocks: an address and a data block. To optimize array indexing, all addressing is done indirectly using a five element address RAM, which is efficiently updated by a dedicated address ALU. The output of the address RAM is directly connected to the data RAM. The data block consists of data RAM and an ALU which is used mainly for multiply-accumulate operations over $\mathbb{F}_{2^u}$. To minimize the size, this ALU contains only a single register. All other necessary operands come directly from the data RAM. A high-level overview of the coprocessor architecture is shown in Fig. 3.
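
To get a feel for what the data ALU computes, here is a minimal sketch of a multiply-accumulate over $\mathbb{F}_{2^u}$ with a single running register. It assumes u = 8 (since 255 = 2^8 - 1) and the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1; neither the polynomial nor the names gf_mul / gf_mac come from the paper.

```python
FIELD_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed, not taken from the paper)
U = 8               # assumed field size exponent: elements are 8-bit values

def gf_mul(a, b):
    """Multiply two field elements with shift-and-add, reducing as we go."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in a binary field is XOR
        b >>= 1
        a <<= 1
        if a >> U:               # degree reached U: reduce by the field polynomial
            a ^= FIELD_POLY
    return result

def gf_mac(acc, a, b):
    """Multiply-accumulate: acc + a*b in the field."""
    return acc ^ gf_mul(a, b)

# Evaluate p(x) = 3 + 7x + x^2 at x = 2 with Horner's rule, one MAC per coefficient.
acc = 0
for coeff in (1, 7, 3):          # highest-degree coefficient first
    acc = gf_mac(coeff, acc, 2)
print(acc)                       # prints: 9
```

Polynomial evaluation like this is the typical inner loop of BCH decoding steps such as syndrome evaluation and root search, which is why a single accumulator register can be enough.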

[2] Implementation of BCH Code (n, k) Encoder using LFSR

Ref: http://www-inst.eecs.berkeley.edu/~cs150/fa02/handouts/13/LectureB/lec26-ecc.pdf

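Input polynomial: h(x) = h0 + h1*x + h2*x^2 + ... + hn*x^n

Input order is hn, ..., h2, h1, h0, from left to right (highest-degree coefficient first).

For bch(255, 171, 11), n should be 254, i.e. 255 bits are input in total.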
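
The input order can be checked with a bit-serial model of the LFSR. A minimal sketch: the generator polynomial for bch(255, 171, 11) is not given in these notes, so the example uses the small (7,4) cyclic code with g(x) = 1 + x + x^3 instead; lfsr_parity is a made-up name.

```python
def lfsr_parity(message_bits, gen_taps):
    """Bit-serial LFSR encoder.

    message_bits: coefficients in input order hn, ..., h1, h0 (highest degree first).
    gen_taps: [g0, g1, ..., g_{r-1}] of g(x) = g0 + g1*x + ... + x^r (leading 1 implied).
    Returns the r parity bits, highest degree first.
    """
    r = len(gen_taps)
    reg = [0] * r                      # reg[i] holds the x^i coefficient
    for bit in message_bits:
        feedback = bit ^ reg[-1]       # incoming bit XOR the bit leaving the top stage
        # Shift up by one stage and add the feedback wherever g(x) has a tap.
        reg = [feedback & gen_taps[0]] + [
            reg[i - 1] ^ (feedback & gen_taps[i]) for i in range(1, r)
        ]
    return reg[::-1]

msg = [1, 1, 0, 1]                     # fed in left to right
parity = lfsr_parity(msg, [1, 1, 0])   # g(x) = 1 + x + x^3 (illustrative)
print(msg + parity)                    # prints: [1, 1, 0, 1, 0, 0, 1]
```

After the last message bit has been shifted in, the register holds the remainder of h(x) * x^r divided by g(x), which is appended to the message as parity.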

wanghs

Tuesday, January 13, 2015

ISE using ISIM to do simulation

Friday, January 9, 2015

ecc encoder