damage file
05-12-2004, 01:13 PM
Tiki, been there, had this problem. Here's what I did:
1. CPYF the file to another name and location.
2. Deleted ALL LFs.
3. Deleted the PF.
4. CPYF back to the original file.
5. Recreated all LFs.
This was after 3 hours of being down and a lot of try this, try that. Good luck!
05-12-2004, 01:57 PM
Run, do not walk, to the RCLSTG command. This message indicates that there is a problem. RCLSTG will either solve it for you, or tell you that it is worse than you think it is. You may have a bad area on disk. Copying the files leaves that area open to future writes, which may result in the same error in more files. Dave
05-13-2004, 11:45 AM
Tiki, the following is why I would be VERY careful before using the RCLSTG command. If the command warns you before deleting an unrecoverable object, I guess that would be OK. Sometimes it's better to crawl, not RUN. This is from the HELP panel on a V5R2 machine:

Reclaim Storage - Help
The Reclaim Storage (RCLSTG) command corrects, where possible, objects that were incompletely updated (such as database files, libraries, device descriptions, directories and stream files) and user profiles containing incorrectly recorded object ownership information. Any unusable objects or fragments are deleted.
05-13-2004, 12:14 PM
Tiki, I think Bentley is right, and so is David. I'd do the copy like Bentley suggests. Then delete the old object. Immediately after I'd do the RCLSTG.
05-25-2004, 11:45 AM
I plan to run RCLSTG after our month-end job. What parameters should I use, and what should I expect? We are on V5R1 and running at 58% ASP. The problem we had before was that our system usage went up to 95% when we first encountered this partial damage on the file, then went back to normal after we IPLed. Then, following your advice, I copied and deleted the file and rebuilt the logical files, and that fixed the problem. Do I really need to run RCLSTG? Thanks, Tiki
05-25-2004, 12:27 PM
Tiki,

NO WARRANTIES EXPRESSED OR IMPLIED IN THE FOLLOWING: USE AT YOUR OWN RISK!! Now that the disclaimer is outta the way...

You probably won't get a definitive answer on "How long will RCLSTG take?" since it all depends on your particular system: how many objects you have, what kind they are, whether they are damaged and how many, how much total disk you have, how full it is, etc.

But I can offer you my personal experience with RCLSTG, which I ran just this past weekend (Saturday, 5/22/2004). We have V4R2, 53% ASP used (only 1 ASP, and it is 34.3 GB), 1 GB of RAM. I would guess that we have had at least 4 hard crashes in the past couple of years (from power failures, and the UPS, we found out, was faulty). We have had this machine since late 1998 and I know that RCLSTG has NEVER been run on it before, because I've been here all that time and I'm the only one who'd run it.

Anyway, I ran RCLSTG and it ran for about 1.5 hours. It runs interactively, and you have to bring the system to a restricted state before you run it. It gives you updates periodically (in my case at least every 5 minutes) on how complete it is, as a percentage of objects checked. Upon completion, I immediately did a PWRDWNSYS OPTION(*IMMED) RESTART(*YES). This was kinda scary, as it kept the same SRC code (D6000298) for literally 5 to 10 minutes, but then it proceeded with the IPL, came back up, and all was well!

IN ANY EVENT, READ THE DOCUMENTATION FOR YOUR RELEASE OF OS/400 ON THE RCLSTG COMMAND IN ITS ENTIRETY FOR ALL CONSIDERATIONS. Other than that, I'd allocate as much free time as possible when the system can be unavailable to all users. Also, if you have a much larger system than mine, you might contact IBM and ask them whether you can interrupt the RCLSTG once it is started.

For my release (V4R2) RCLSTG only has 2 parameters, SELECT and OMIT. I took the defaults for both, SELECT(*ALL) OMIT(*NONE); this makes the command perform as much checking as possible, I believe.

Finally, the QHST log will have messages on what was deleted, etc., as will the QSYSOPR message queue. The help for RCLSTG states that the number of objects reported deleted by RCLSTG when it finishes may not add up to the number of deleted-object messages in QSYSOPR / QHST; some are internal objects that you can't see anyway.

Anyway, I'll restate my disclaimer: NO WARRANTIES EXPRESSED OR IMPLIED IN THE PRECEDING: USE AT YOUR OWN RISK!! And good luck!

Regards, Martin
05-25-2004, 12:34 PM
Thanks for sharing your experience with me. I will probably get some help from IBM to do this, to avoid an unnecessary fallback. Tiki
05-26-2004, 09:17 AM
I faced a similar situation in the last week on our test box. We had to restore from production. Does anyone know what all could cause it? We have lots and lots of constraints (unique as well as referential). For test purposes we keep deleting the data and then restoring from a save file. Does anyone know if a restore could damage the files (if the restore did not happen properly)? It's quite annoying. I would appreciate it if anyone could help me with this.
05-26-2004, 10:41 AM
AK400 asked: Does anyone know what all could cause it? Any system on the planet, including an iSeries, could at any time be plagued with disk problems. The most severe situation is a head crash, but a problem might manifest as a slight disk imperfection. Unless the area on disk is marked as bad and unusable, the machine code will still attempt to use the bad area, resulting in a damaged file. AFAIK, one aspect of RCLSTG is to mark bad areas of disk space as unusable, avoiding future problems. Dave
05-31-2004, 04:58 AM
Create a library called QRCL before running the RCLSTG command. When you run the RCLSTG command, it will put the damaged objects into this library but will not delete the library afterwards, so you will not have to search through the job log looking for the names of the objects. Definitely change your job to log CL commands, and save this job log on the system before you sign off and IPL the system.
Code: http://www.mcpressonline.com/mc/showcode@@.6aeaf8a5/9
05-31-2004, 07:18 AM
I recently had the same problem. I tried the CPYF trick, but in my case it bombed, telling me that there was damage in the PF data. Ironically, it was the same message I had gotten, where the message text suggested the two-way CPYF. Hunh. This is a file with 16 and a half million records, plus, no exaggeration. With forty, by exact count, forty logicals over it. I had already tried deleting and recompiling the logical that caused the message (oops, too late), and that caused a number of crucial programs to fail. My solution to that was an RPG program that CHAINs out the records in the physical file, one by one by record number, just skipping any records that return an error. It lost forty-four records, and it worked, and I was really relieved. Forget CRTDUPOBJ with CPYF afterward; apparently even if you drop the physical file member, a flag goes along with the duplicate object that indicates damage to the data.
06-01-2004, 04:48 AM
We had a similar (although more massive) problem happen a couple of years ago. We had a disk controller fail. It began with damage being reported in the system database tables. It turns out there's a RCLSTG option to rebuild ONLY those, which we did, and it only took a few minutes. Afterwards, however, we encountered several damaged objects, and we were able to use CPYF by RRN to copy around the damaged parts of the file. We didn't realize at the time that logicals could also be damaged (independent of the physical) and that only a SAVE will detect that damage. We also learned, in the process, that a RCLSTG is not always able to identify objects which are damaged; essentially, what it can find is objects whose object headers are damaged. If a data segment within the object is damaged, you won't discover that until you actually attempt to read the damaged segment, at which point the object header will be marked as damaged. We also learned that IBM no longer recommends that you automatically leap to using RCLSTG as a recovery tool. In fact (a couple of years ago, at least), they had a great online class regarding RCLSTG, and it advises using great care when considering its use. Unfortunately, I can't see on their website that the online web lecture is still available. We eventually did do a complete RCLSTG and it took about 20 hours to complete, so beware: it will run a long time and you do NOT want to interrupt it. IBM document number 6512447 in the Software Knowledge database is a good overview. Also, http://www.midrangeserver.com/tfh/tfh092903-story04.html is a good article.
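Copying "around" damaged records by RRN comes down to working out which record ranges are still safe to read. A minimal sketch of that arithmetic in Python follows; it is only a model of the idea, not something that runs on the system. The function name `ranges_around` and the sample numbers are hypothetical. Each resulting pair would correspond to one CPYF run using its FROMRCD and TORCD parameters, assuming the damaged RRNs are already known from the error messages.

```python
from typing import Iterable, List, Tuple


def ranges_around(damaged: Iterable[int], record_count: int) -> List[Tuple[int, int]]:
    """Return inclusive (from_rrn, to_rrn) ranges that avoid the damaged RRNs.

    RRNs are 1-based, as relative record numbers are. Damaged RRNs that
    fall outside 1..record_count are ignored.
    """
    ranges: List[Tuple[int, int]] = []
    start = 1  # first RRN of the range currently being built
    for rrn in sorted(set(damaged)):
        if rrn < 1 or rrn > record_count:
            continue
        if rrn > start:
            # Everything from start up to the record before the damage is safe.
            ranges.append((start, rrn - 1))
        start = rrn + 1  # resume right after the damaged record
    if start <= record_count:
        ranges.append((start, record_count))
    return ranges


if __name__ == "__main__":
    # Hypothetical file of 10 records with damage at RRN 3 and RRN 7.
    for from_rrn, to_rrn in ranges_around([3, 7], 10):
        print(f"copy records {from_rrn} through {to_rrn}")
```

Every copy after the first would need to add to the target file rather than replace it, or the earlier ranges are lost.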
06-01-2004, 05:39 AM
http://archive.midrange.com/midrange-l/200402/msg00767.html
http://archive.midrange.com/midrange-l/200305/msg01462.html
http://archive.midrange.com/midrange-l/200305/msg00547.html
http://archive.midrange.com/midrange-l/200305/msg01621.html
YMMV
06-01-2004, 07:31 AM
As Kevin says, RCLSTG can do only so much. We've had similar disk failures where objects were internally damaged, and they could only be detected by trying to save them. That gets you to another problem: the save will end when it encounters the damaged object. I don't know of any command parm that would make it keep going if it encounters that kind of error. So we did a cycle of 1) Save, 2) See where it failed, 3) Fix the object, 4) GOTO 1. We sped up the process with save files, which work just as well as tape for this. We'd save a library at a time. We also ran SAVDLO, SAV, SAVLIB LIB(*IBM), a restricted-state save of QUSRSYS, and SAVSYS (to tape). It took about 24-30 hours of my time until the system was "clean" again, including restoring undamaged versions of objects and copying missing records back into files. This has happened twice in the last 5 years.
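The save, fail, fix, repeat cycle described above can be written down as a small loop. The Python below is only a sketch of that control flow under stated assumptions: `try_save` and `fix_object` are hypothetical callables standing in for the manual steps (running the save, and repairing or restoring the object it stopped on), and the object names in the demo are invented.

```python
from typing import Callable, List, Optional


def save_until_clean(
    try_save: Callable[[], Optional[str]],
    fix_object: Callable[[str], None],
    max_rounds: int = 100,
) -> List[str]:
    """Repeat the save until it completes; return the objects fixed, in order.

    try_save returns the name of the object the save stopped on,
    or None when the save ran to completion.
    """
    fixed: List[str] = []
    for _ in range(max_rounds):
        failed = try_save()        # 1) Save, 2) see where it failed
        if failed is None:
            return fixed           # the save completed, nothing left to fix
        fix_object(failed)         # 3) fix the object
        fixed.append(failed)       # 4) go round again
    raise RuntimeError(f"save still failing after {max_rounds} rounds")


def demo() -> List[str]:
    # Hypothetical library contents and hidden damage.
    objects = ["CUSTPF", "ORDERS", "ORDERSL1", "ITEMS"]
    damaged = {"ORDERS", "ITEMS"}

    def try_save() -> Optional[str]:
        # The save stops at the first damaged object it reaches.
        for name in objects:
            if name in damaged:
                return name
        return None

    def fix_object(name: str) -> None:
        damaged.discard(name)

    return save_until_clean(try_save, fix_object)


if __name__ == "__main__":
    print(demo())
```

The `max_rounds` cap is a design choice of the sketch, so that an object that cannot be fixed does not make the loop run forever.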
06-25-2004, 05:56 AM
We are trying to save the file, and for some reason one database file got an error:

Statement . . . . . . . . . : 14016
Message . . . . : System object FMA FMA partially damaged. Internal dump ID . Error class 0, device number X'0000'.
Cause . . . . . : This partially damaged system object FMA FMA has a error class 0 and a device number X'0000'. Error class indicates how damage was detected: 0000-previously marked damage; 0001-detected abnormal condition; 0002-logically invalid device sector; 0003-device error. For error class 0003, the device number identifies the failing device, or contains zero if main storage is damaged.

Then some of the programs failed when accessing the logical file built over this physical file. Also, the file cannot be saved. Any help will be appreciated. Thanks, Tiki
06-25-2004, 05:56 AM
We ran into the same problem, and IBM pointed us to the CPYF solution. After looking closely at their solution, I found that what it does is similar to the RPG read-by-RRN solution, without having to write the program. The trick is in the ERRLVL parm on the CPYF command. This is really a counter of how many read errors to allow before ending the CPYF with an error. If you had set this parm to a higher number, it would have done the same thing as the RPG program when an error occurred on the read.
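To make the comparison concrete, here is a minimal Python model of the shared logic: read every record by relative record number, skip the ones that fail, and give up once the failures exceed a limit. It is an illustration only, not code for the system. The names `copy_skipping_damage`, `make_reader` and `ReadError` are hypothetical, and treating the limit as "the number of errors tolerated before the copy ends" is an assumption based on the description above.

```python
from typing import Callable, List, Set, Tuple


class ReadError(Exception):
    """Raised by the hypothetical reader when a record cannot be read."""


def copy_skipping_damage(
    read_record: Callable[[int], bytes],
    record_count: int,
    error_limit: int,
) -> Tuple[List[bytes], List[int]]:
    """Copy records 1..record_count, skipping unreadable ones.

    Returns (copied records, skipped RRNs). Raises RuntimeError as soon
    as the number of read errors exceeds error_limit.
    """
    copied: List[bytes] = []
    skipped: List[int] = []
    for rrn in range(1, record_count + 1):
        try:
            copied.append(read_record(rrn))
        except ReadError:
            skipped.append(rrn)
            if len(skipped) > error_limit:
                raise RuntimeError(
                    f"more than {error_limit} read errors, stopped at RRN {rrn}"
                )
    return copied, skipped


def make_reader(damaged: Set[int]) -> Callable[[int], bytes]:
    """Build a fake reader that fails on the given RRNs (test data only)."""
    def read_record(rrn: int) -> bytes:
        if rrn in damaged:
            raise ReadError(rrn)
        return f"record {rrn}".encode()
    return read_record


if __name__ == "__main__":
    copied, skipped = copy_skipping_damage(make_reader({3, 7}), 10, error_limit=5)
    print(f"copied {len(copied)} records, skipped RRNs {skipped}")
```

With a limit of zero the model stops at the first bad record, which matches the experience of the plain copy bombing; a generous limit behaves like the RPG program that simply skips whatever it cannot read.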