From Technologic Systems Manuals


1 Overview

The nandctl utility manipulates the FPGA XNAND core. It allows you to read and write data, and to present network block devices to the OS. See our white paper on the XNAND here.

2 Usage

In our scripts nandctl is typically invoked with these options:

nandctl -X -z 131072 --nbdserver lun0:disc,lun0:part1,lun0:part2,lun0:part3,lun0:part4

-X

This tells the command to use the XNAND layer. Currently using nandctl without XNAND is not supported.

-z 131072

This sets the block size for the XNAND. The number of blocks will be automatically detected.

--nbdserver lun0:disc,lun0:part1,lun0:part2,lun0:part3,lun0:part4

This sets up an nbd server with the various partitions and raw block devices.

lun0:disc will create the raw block device at port 7525.

lun0:part1 will create the first partition at port 7526.

lun0:part2 will create the second partition at port 7527.

...

You can set up as many partitions as you need this way. The network block device ports are accessed using the standard nbd-client, which is typically invoked like this:

  nbd-client 127.0.0.1 7525 /dev/nbd0
  nbd-client 127.0.0.1 7526 /dev/nbd1
  nbd-client 127.0.0.1 7527 /dev/nbd2
  nbd-client 127.0.0.1 7528 /dev/nbd3
  nbd-client 127.0.0.1 7529 /dev/nbd4

This way /dev/nbd0 will be the block device, /dev/nbd1 will be the first partition, and so on.
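
The port and device numbering above follows from the order of the entries in NBDSPEC. As a minimal sketch, the mapping can be computed in a shell script. The helper name nbdspec_ports is hypothetical and not part of nandctl, and the pairing of ports with /dev/nbdN is only the convention used in the nbd-client example above, not something nandctl enforces.

```shell
# Hypothetical helper, not part of nandctl: print "PORT ENTRY DEVICE" for
# each entry of an NBDSPEC. Ports start at 7525 as documented; the /dev/nbdN
# pairing is an assumption taken from the typical nbd-client invocation.
# The body runs in a subshell ( ... ) so the IFS change does not leak out.
nbdspec_ports() (
    port=7525
    index=0
    IFS=,
    for entry in $1; do
        echo "$port $entry /dev/nbd$index"
        port=$((port + 1))
        index=$((index + 1))
    done
)

nbdspec_ports lun0:disc,lun0:part1,lun0:part2
```

Each printed line gives the port and device arguments for one nbd-client call against 127.0.0.1.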

2.1 Help

General options:
  -R, --read=N            Read N blocks of flash to stdout
  -W, --write=N           Write N blocks to flash
  -x, --writeset=BYTE     Write BYTE as value (default 0)
  -i, --writeimg=FILE     Use FILE as file to write to NAND
  -t, --writetest         Run write speed test
  -r, --readtest          Run read speed test
  -n, --random=SEED       Do random seeks for tests
  -z, --blocksize=SZ      Use SZ bytes each read/write call
  -k, --seek=SECTOR       Seek to 512b sector number SECTOR
  -e, --erase=NSECTORS    Erase NSECTORS 512b sectors
  -d, --nbdserver=NBDSPEC Run NBD userspace block driver server
  -I, --bind=IPADDR       Bind NBD server to IPADDR
  -Q, --stats             Print NBD server stats
  -f, --foreground        Run NBD server in foreground
  -l, --lun=N             Use chip number N
  -X, --xnand             Use XNAND RAID layer
  -A, --autormw           Use AUTORMW layer
  -s, --stress=BLOCK      Stress block BLOCK until it breaks
  -H, --hwtest=BLOCK      Hardware profile block BLOCK
  -b, --break=SECTOR      Erase sector SECTOR for testing
  -I, --xnandinit=NSECT   Initialize flash chip for XNAND RAID
  -L, --listbb            List all factory bad blocks
  -a, --audit             Check integrity of XNAND data
  -Y, --yes               Answer yes to all audit repairs
  -N, --no                Answer no to all audit repairs
  -v, --verbose           Be verbose (-vv for maximum)
  -P, --printmbr          Print MBR and partition table
  -M, --setmbr            Write MBR from environment variables
  -h, --help              This help
 
When running a NBD server, NBDSPEC is a comma separated list of
devices and partitions for the NBD servers starting at port 7525.
e.g. "lun0:part1,lun1:disc" corresponds to 2 NBD servers, one at port
7525 serving the first partition of chip #0, and the other at TCP
port 7526 serving the whole disc device of chip #1.
WARNING: We do not recommend running --audit or --xnandinit arguments unless instructed by technical support.

3 FAQ

3.1 Why is the XNAND driver implemented in userspace instead of as a kernel driver?

On previous FPGA devices we did implement kernel device drivers. This makes debugging far more difficult, and makes a port to a new kernel a much larger task because all of the drivers need to be updated for the newer kernel APIs. In our tests of userspace versus kernel drivers, moving to the kernel gave only a minimal speed gain, which does not justify the amount of maintenance it requires.

3.2 I'm seeing very low performance, is this from the userspace driver?

The slow performance is not due to the userspace implementation, but is due to the redundancy mechanism. The tradeoff for the slower speed is the significantly increased reliability.

3.3 Can I still use a standard flash filesystem?

We currently do not support any flash filesystems such as jffs2. See our white paper here for more information on the pros and cons of XNAND compared to traditional flash filesystems.

3.4 How do I tell if the XNAND is failing?

The stats option will tell you how many fallbacks there have been.

# nandctl --stats
nbdpid=452
nbd_readreqs=0
nbd_read_blks=0
nbd_writereqs=0
nbd_write_blks=0
nbd_seek_past_eof_errs=0
xnand_xfixs=0
xnand_scrubs=0
xnand_fallbacks=0
xnand_level2_fallbacks=0
xnand_level3_fallbacks=0
xnand_write_fails=0
xnand_data_losses=0
xnand_blk_erases=0
read_seeks=0
write_seeks=0

A non-zero xnand_fallbacks value is fairly common; the counter is incremented on every occurrence. NAND devices almost always have some bad blocks, and xnand_fallbacks increases every time one of those bad blocks is read.

If xnand_level2_fallbacks or xnand_level3_fallbacks is greater than 0, there is data on the flash with no redundancy left. If an xnand_level3_fallbacks condition occurs, there will likely have been a catastrophic failure at the kernel level as well, e.g. I/O errors when reading files. It is still possible for the data to be recovered in this case; this happens when the sectors of a block are pieced together from the rest of the RAID area. It is also possible that a device showing xnand_level3_fallbacks can be re-initialized and re-flashed to get back to correct functionality; the recoverability depends on how the block's data was damaged. In most cases, however, a device showing xnand_level3_fallbacks should be pulled from the field and replaced.

If xnand_data_losses is greater than 0, data in an erase block could not be recovered and real data loss has occurred.

Having xnand_write_fails is also fairly common; this value is incremented whenever a write to a block has failed. This does not mean the data was not written, just that one of the RAID area blocks did not verify correctly. At this point the system relies on the RAID algorithm of XNAND to recover any blocks that are incorrect.
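
The counters described above can be checked automatically. The following is a minimal sketch, not an official tool: the function name xnand_health and its three result labels are hypothetical, while the counter names are those shown in the --stats output above. It deliberately ignores xnand_fallbacks and xnand_write_fails, since both are described as common.

```shell
# Hypothetical helper, not part of nandctl: classify "nandctl --stats"
# output read from stdin. Prints one of: ok, no-redundancy, data-loss.
xnand_health() {
    awk -F= '
        # Level 2/3 fallbacks mean some data has no redundancy left.
        $1 == "xnand_level2_fallbacks" || $1 == "xnand_level3_fallbacks" {
            if ($2 + 0 > 0) degraded = 1
        }
        # Data losses mean an erase block could not be recovered.
        $1 == "xnand_data_losses" {
            if ($2 + 0 > 0) lost = 1
        }
        END {
            if (lost) print "data-loss"
            else if (degraded) print "no-redundancy"
            else print "ok"
        }'
}
```

On a running system it would be used as:

  nandctl --stats | xnand_health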

Newer implementations of nandctl add an output to --listbb that reports 'xnand_fail_danger' as 0 or higher. All new nandctl implementations will feature this, and older products will have it added if a new release is necessary in the future. The value indicates whether multiple bad blocks are present in the same RAID area in XNAND. Multiple bad blocks in one RAID area put that area in danger of failing, so if xnand_fail_danger is non-zero the flash should be replaced. Multiple bad blocks in the same RAID area are very rare when the NAND device is shipped to us. This test has been implemented in all of our production processes for products that use our XNAND technology, to verify that no devices we ship are in danger of failing.
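
A script can act on this value. The sketch below rests on an assumption: the exact format of the --listbb output is not documented here, so it assumes the value appears on its own line as xnand_fail_danger=N, in the same key=value style as the --stats output. The function name xnand_fail_danger_ok is hypothetical.

```shell
# Hypothetical helper, not part of nandctl: succeed only if the input
# reports xnand_fail_danger=0.
# ASSUMPTION: --listbb prints the value as a "xnand_fail_danger=N" line,
# like the key=value lines of --stats; adjust the pattern if it differs.
xnand_fail_danger_ok() {
    value=$(sed -n 's/^xnand_fail_danger=\([0-9][0-9]*\)$/\1/p' | head -n 1)
    # A missing line (e.g. an older nandctl) is treated as "not confirmed ok".
    [ -n "$value" ] && [ "$value" -eq 0 ]
}
```

Under that assumption it would be used as:

  nandctl --listbb | xnand_fail_danger_ok || echo "flash should be replaced or checked"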

The best way to prevent corrupted data is to plan ahead and make sure that any writes will be completed before power is lost. See Technologic Systems' white paper on the subject of Preventing Filesystem Corruption.

3.5 Does a failed repair from an audit mean failure?

# nandctl -X -a
Chip ID: 0xDCEC
Size: 524288 sectors, 2048 blocks of 131072 bytes
Auditing XNAND data..(2048 blocks)
Unable to read primary block 888. Repair? (y/n) y
Unable to read primary block 888.
Unable to fix.
Unable to read primary block 1458. Repair? (y/n) y
Unable to read primary block 1458.
Unable to fix.
Audit complete.
Bad blocks=2
Repaired blocks=0

When you run an audit you will very likely see bad blocks. This is common, as almost every flash chip comes from the factory with some blocks already marked as bad. These do not indicate any failure. You can have your system list all of the known bad blocks with:

nandctl --listbb
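
The "Size:" line in the audit output above can be cross-checked with simple arithmetic, which also shows how the 131072-byte block size relates to the 512-byte sectors used by options such as --seek and --erase. This is only a worked example using the numbers from that output; the variable names are illustrative.

```shell
# Values taken from the audit output above.
sector_size=512      # bytes per sector
block_size=131072    # bytes per block (the -z value)
blocks=2048          # number of blocks reported

sectors_per_block=$((block_size / sector_size))
total_sectors=$((blocks * sectors_per_block))

echo "sectors_per_block=$sectors_per_block total_sectors=$total_sectors"
# prints: sectors_per_block=256 total_sectors=524288
```

The result matches the 524288 sectors reported by nandctl.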