MPXV Gabon Assembly Notes

Updated 3/5/07

Monkeypox Gabon 1988-001

Trouble spots in MPXV_Gabon (first number: approximate 12 Monkey coordinate; number in parentheses: actual MPXV_Gabon coordinate)
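The two numbers differ presumably because the 12 Monkey value counts columns of a gapped multiple alignment, while the MPXV_Gabon value counts bases in the ungapped genome. That reading is an assumption; these notes do not state it. Under that assumption, converting a column to a genome position comes down to counting non-gap characters. The following is a minimal sketch, and `column_to_position` is a hypothetical helper:

```python
from typing import Optional


def column_to_position(aligned_seq: str, column: int) -> Optional[int]:
    """Map a 1-based alignment column to a 1-based ungapped position.

    Returns None when this sequence has a gap ('-') in that column.
    """
    if not 1 <= column <= len(aligned_seq):
        raise ValueError("column is outside the alignment")
    if aligned_seq[column - 1] == "-":
        return None
    # Every non-gap character up to and including the column is one base.
    return sum(1 for ch in aligned_seq[:column] if ch != "-")


# Hypothetical aligned row: columns 1-2 are gaps in this genome.
print(column_to_position("--ACG-T", 4))  # 2
```

Because gaps accumulate along the alignment, the offset between the two numbering systems grows with position. This is consistent with the widening differences in the list below (about 3,400 near 10188, about 7,300 near 184849).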

Contig order:

54  65  64  67  60  66

All junction positions below are MPXV_Gabon coordinates.

Junction between contig 54-65, base 6,310-6,311

Junction between contig 65-64, base 8,495-8,596

Junction between contig 64-67, base 45,446-45,447 (this is the gap mentioned below)

Junction between contig 67-60, base 106,363-106,364

Junction between contig 60-66, base 172,808-172,809

10188 (6825) is a central African-only insert: undercalled T

11281 (7912) undercalled T

14402 (10897) overcalled A

39157 (35159) overcalled A

41520 (37522) overcalled T

52234 (48200) overcalled T

62288 (58233) overcalled T

77427 (73371) overcalled A

82339 (78280) overcalled T

87104 (83043) overcalled T

122240 (118218) undercalled A

123230 (119169) overcalled T

149960 (145342) overcalled A

152090 (147434) undercalled A

161420 (156641) overcalled T

177510 (170291) overcalled T

179685 (172419) undercalled A

184849 (177549) overcalled T
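Each entry above compares the length of a homopolymer run in the MPXV_Gabon 454 assembly with the length of the same run in the other monkeypox genomes. The following is a minimal sketch of that comparison. It assumes both sequences are plain strings and that the aligned position is already known. `homopolymer_run` and `classify_call` are hypothetical helper names, not part of any assembly tool.

```python
from typing import Tuple


def homopolymer_run(seq: str, pos: int) -> Tuple[str, int, int]:
    """Return (base, start, length) of the homopolymer covering 1-based pos."""
    i = pos - 1
    if not 0 <= i < len(seq):
        raise IndexError("position is outside the sequence")
    base = seq[i]
    start = end = i
    # Extend left and right while the same base continues.
    while start > 0 and seq[start - 1] == base:
        start -= 1
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return base, start + 1, end - start + 1


def classify_call(query_len: int, reference_len: int) -> str:
    """Label a homopolymer length relative to another genome's length."""
    if query_len > reference_len:
        return "overcalled"
    if query_len < reference_len:
        return "undercalled"
    return "agrees"


gabon = "GC" + "A" * 8 + "TG"  # hypothetical 454 call
other = "GC" + "A" * 7 + "TG"  # hypothetical call in another monkeypox genome
_, _, n_gabon = homopolymer_run(gabon, 5)
_, _, n_other = homopolymer_run(other, 5)
print(classify_call(n_gabon, n_other))  # overcalled
```

"Overcalled" and "undercalled" here are relative to the comparison genome only. As the notes that follow show, Sanger (abi) reads are what settle the true length.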

Gap in sequence at 49449-49480

3/15/07

Ran run_Mapping using MPXV_RCG2005 as reference

9 contigs: less coverage, and the assembly starts 6 kb later.

10188 (6825) is a central African-only insert: undercalled T: not fixed

11281 (7912) undercalled T: not fixed

14402 (10897) overcalled A: not fixed

39157 (35159) overcalled A: fixed

41520 (37522) overcalled T: not fixed

52234 (48200) overcalled T: fixed

62288 (58233) overcalled T: fixed

77427 (73371) overcalled A: not fixed

82339 (78280) overcalled T: fixed

87104 (83043) overcalled T: fixed

122240 (118218) undercalled A: not fixed

123230 (119169) overcalled T: fixed

149960 (145342) overcalled A: fixed

152090 (147434) undercalled A: fixed

161420 (156641) overcalled T: fixed

177510 (170291) overcalled T: fixed

179685 (172419) undercalled A: not fixed

184849 (177549) overcalled T: fixed

Gap in sequence at 49449-49480: this gap was covered

The information below comes from Missy’s work to bridge the 454 contigs and to sequence across some of the homopolymer regions that were flagged as possible problems by comparison with all known monkeypox sequences.

Updated 5/7/07

Added from Mike’s trouble sheet

MOlsen-Rasmussen’s comments

Trouble spots in MPXV_Gabon (first number: approximate 12 Monkey coordinate; number in parentheses: actual MPXV_Gabon coordinate)

* Not sure of the number of AGATTA repeats at 13830 (10360): abi = 147 bases, 454 = 87 bases
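One way to compare the two reads of this region is to count back-to-back copies of the 6-base AGATTA motif directly in each sequence. The following is a minimal sketch with hypothetical helpers `count_tandem_repeats` and `longest_tandem_run`. It counts only perfect copies, so degenerate or interrupted repeat units would be undercounted.

```python
def count_tandem_repeats(seq: str, motif: str, start: int) -> int:
    """Count perfect, back-to-back copies of motif beginning at 1-based start."""
    if not motif:
        raise ValueError("motif must be non-empty")
    i = start - 1
    copies = 0
    while seq.startswith(motif, i):
        copies += 1
        i += len(motif)
    return copies


def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of consecutive motif copies anywhere in seq."""
    return max(
        (count_tandem_repeats(seq, motif, s) for s in range(1, len(seq) + 1)),
        default=0,
    )


print(longest_tandem_run("AGATTAGG" + "AGATTA" * 4, "AGATTA"))  # 4
```

Suppose the reported spans were pure repeat. Then 147 and 87 bases would correspond to roughly 24 and 14 copies (147/6 = 24.5, 87/6 = 14.5). Neither is a whole number, so the spans probably include partial units or flanking sequence.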

Contig order:

54  65  64  67  60  66

All junction positions below are MPXV_Gabon coordinates.

Junction between contig 54-65, base 6,310-6,311 completed

Junction between contig 65-64, base 8,495-8,596: ATATATA repeat; cannot read through it with abi

Junction between contig 64-67, base 45,446-45,447 (this is the gap mentioned below)  completed

Junction between contig 67-60, base 106,363-106,364 completed

Junction between contig 60-66, base 172,808-172,809: ATATATA region; abi also cannot read through the ATATAT.

71 (19)  abi=0A, 454=1A (capital A) after run of 7A

10188 (6825) is a central African-only insert: undercalled T  abi=454  5T

11281 (7912) undercalled T    abi=6T, 454=5T

14402 (10897) overcalled A    abi=454  8A

19946 (16409)   abi=454

34456 (30512)    abi=8A, 454=9A

39157 (35159) overcalled A      abi=6A, 454=7A

41520 (37522) overcalled T       abi=2T, 454=3T before string of 8T

52234 (48200) overcalled T       abi=7T, 454=8T

62288 (58233) overcalled T       abi=8T, 454=9T

77427 (73371) overcalled A       abi=454 7T

82339 (78280) overcalled T        abi=9T, 454=10T

87104 (83043) overcalled T        abi=9T, 454=10T

122240 (118218) undercalled A  found at 122280 abi=8A, 454=7A  all Forward runs

123230 (119169) overcalled T    found at 123270 (119194) abi=8T 454=9T

149960 (145342) overcalled A    found at 150000 abi=6A, 454=7A

152090 (147434) undercalled A  abi=4A, 454=3A all Forward runs

161385   don’t see it, must be 161420

161420 (156641) overcalled T     abi=6T, 454=7T

177510 (170291) overcalled T     found at 177550 abi=0T, 454=1T after string of 7

179685 (172419) undercalled A   abi=454

184830   don’t see it, must be at 184849

184849 (177549) overcalled T      abi=0T, 454=1T before run of 12

188260 (180775)   found at 188290 (180760) abi=454

197920                                           found at 19750  (189520) abi=454
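The abi-versus-454 entries above mostly follow a regular pattern: either `abi=<n><base>, 454=<n><base>` when the two platforms disagree, or `abi=454 <n><base>` when they agree. Entries in that form can be tabulated mechanically. The following is a minimal sketch that assumes lines are written exactly in this notes format. `parse_call` is a hypothetical helper. Lines that do not match, such as the 161385 note or the 71 (19) note, return None.

```python
import re
from typing import Optional

# Position pair at the start of the line, e.g. "39157 (35159)".
_PREFIX = r"^(\d+)\s*\((\d+)\).*?"
# "abi=6A, 454=7A" (comma optional): the two platforms disagree.
PAIR = re.compile(_PREFIX + r"abi=(\d+)([ACGT]),?\s*454=(\d+)([ACGT])")
# "abi=454  8A": both platforms report the same run length.
AGREE = re.compile(_PREFIX + r"abi=454\s*(\d+)([ACGT])")


def parse_call(line: str) -> Optional[dict]:
    """Extract coordinates and run lengths from one note line, or None."""
    m = PAIR.search(line)
    if m:
        pos12, gabon, abi_n, base, n454, _ = m.groups()
        return {"pos12": int(pos12), "gabon": int(gabon), "base": base,
                "abi": int(abi_n), "454": int(n454)}
    m = AGREE.search(line)
    if m:
        pos12, gabon, n, base = m.groups()
        return {"pos12": int(pos12), "gabon": int(gabon), "base": base,
                "abi": int(n), "454": int(n)}
    return None


for note in ["39157 (35159) overcalled A      abi=6A, 454=7A",
             "14402 (10897) overcalled A    abi=454  8A"]:
    print(parse_call(note))
```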

Gap in sequence at 49449-49480   completed

Why did contig #65 not read into the ATATAT?

Why are we not looking at 54077 through 54110?

5/23/07

Using a combination of 454 and Sanger sequencing to complete the genome.

Junction between contig 54-65, base 6,310-6,311 completed

Junction between contig 65-64, base 8,495-8,596: ATATATA repeat; cannot read through it with abi

Junction between contig 64-67, base 45,446-45,447 (this is the gap mentioned below)  completed

Junction between contig 67-60, base 106,363-106,364 completed

Junction between contig 60-66, base 172,808-172,809: ATATATA; cannot read through

-Will denote the AT repeat regions as hypervariable regions.
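Flagging the AT repeat regions can be automated with a regular expression over the finished sequence. The following is a minimal sketch using a hypothetical `find_at_repeats` helper. The threshold of three AT units is an arbitrary assumption. The pattern reports runs in the AT phase only, so an ATATATA stretch is reported as ATATAT without the trailing A.

```python
import re
from typing import List, Tuple


def find_at_repeats(seq: str, min_units: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of >= min_units 'AT' units."""
    pattern = re.compile(f"(?:AT){{{min_units},}}")  # e.g. "(?:AT){3,}"
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


print(find_at_repeats("GGATATATAGG"))  # [(3, 8)]
```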

Comparison of Missy’s resequencing and the 454 mapping.

10188 (6825) is a central African-only insert: undercalled T  abi=454  5T

10188 (6825) is a central African-only insert: undercalled T: not fixed

11281 (7912) undercalled T    abi=6T, 454=5T

11281 (7912) undercalled T: not fixed

-using 5Ts

14402 (10897) overcalled A    abi=454  8A

14402 (10897) overcalled A: not fixed

-using 8As

34456 (30512)    abi=8A, 454=9A

-using 9As

39157 (35159) overcalled A      abi=6A, 454=7A

39157 (35159) overcalled A: fixed

-using 6As

41520 (37522) overcalled T       abi=2T, 454=3T before string of 8T

41520 (37522) overcalled T: not fixed

-using 2Ts

52234 (48200) overcalled T       abi=7T, 454=8T

52234 (48200) overcalled T: fixed

-using 7Ts

62288 (58233) overcalled T       abi=8T, 454=9T

62288 (58233) overcalled T: fixed

-using 8Ts

77427 (73371) overcalled A       abi=454 7T

77427 (73371) overcalled A: not fixed

-using 7Ts

82339 (78280) overcalled T        abi=9T, 454=10T

82339 (78280) overcalled T: fixed

-using 9Ts

87104 (83043) overcalled T        abi=9T, 454=10T

87104 (83043) overcalled T: fixed

-using 9Ts

122240 (118218) undercalled A  found at 122280 abi=8A, 454=7A  all Forward runs

122240 (118218) undercalled A: not fixed

-using 8As

123230 (119169) overcalled T    found at 123270 (119194) abi=8T 454=9T

123230 (119169) overcalled T: fixed

-using 8Ts

149960 (145342) overcalled A    found at 150000 abi=6A, 454=7A

149960 (145342) overcalled A: fixed

-using 6As

152090 (147434) undercalled A  abi=4A, 454=3A all Forward runs

152090 (147434) undercalled A: fixed

-using 4As

161420 (156641) overcalled T     abi=6T, 454=7T

161420 (156641) overcalled T: fixed

-using 6Ts

177510 (170291) overcalled T     found at 177550 abi=0T, 454=1T after string of 7

177510 (170291) overcalled T: fixed

-using 0Ts

179685 (172419) undercalled A   abi=454

179685 (172419) undercalled A: not fixed

-OK

184849 (177549) overcalled T      abi=0T, 454=1T before run of 12

184849 (177549) overcalled T: fixed

-using 0Ts
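Each "-using nX" line records the run length chosen for the final sequence. The following is a minimal sketch of applying such decisions. It assumes each position is a 1-based coordinate of any base inside the run, in the sequence being edited. The edits are applied from right to left so that resizing one run does not shift the coordinates of runs still to be edited. `set_homopolymer_length` and `apply_edits` are hypothetical names.

```python
from typing import Iterable, Tuple


def set_homopolymer_length(seq: str, pos: int, new_len: int) -> str:
    """Resize the homopolymer covering 1-based pos to new_len copies of its base."""
    if not 1 <= pos <= len(seq):
        raise IndexError("position is outside the sequence")
    if new_len < 0:
        raise ValueError("new_len must be >= 0")
    i = pos - 1
    base = seq[i]
    start = end = i
    while start > 0 and seq[start - 1] == base:
        start -= 1
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    # new_len == 0 drops the run entirely (the "-using 0Ts" case).
    return seq[:start] + base * new_len + seq[end + 1:]


def apply_edits(seq: str, edits: Iterable[Tuple[int, int]]) -> str:
    """Apply (pos, new_len) edits right to left so earlier coordinates stay valid."""
    for pos, new_len in sorted(edits, reverse=True):
        seq = set_homopolymer_length(seq, pos, new_len)
    return seq


print(apply_edits("AAACGGGT", [(2, 2), (6, 4)]))  # AACGGGGT
```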

Gap in sequence at 49449-49480   completed

Took the first 10,000 bases of the 454 assembly (contig 54, contig 65, and part of contig 64) and used them as a reference sequence in the phrap assembly (mky88 project), along with the last 15,000 bases (far right end) of contig 66.
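The two reference pieces described above can be cut from the 454 output with plain string slicing and written as FASTA. The following is a minimal sketch. The record names, the 60-column line width, and the helper names `build_reference_records` and `to_fasta` are all hypothetical. It only prepares the sequences and does not show how they were loaded into the phrap/consed project.

```python
from typing import List, Tuple


def build_reference_records(assembly: str, contig66: str,
                            left_len: int = 10_000,
                            right_len: int = 15_000) -> List[Tuple[str, str]]:
    """Left end of the 454 assembly plus the right end of contig 66."""
    if left_len <= 0 or right_len <= 0:
        raise ValueError("lengths must be positive")
    # seq[-n:] would return the whole string for n == 0, hence the check above.
    return [("454_left_end", assembly[:left_len]),
            ("contig66_right_end", contig66[-right_len:])]


def to_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as FASTA with fixed-width lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(to_fasta([("demo", "ACGTACGT")], width=4))
```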

Contig 54 is the ITR for this genome.

Extracted the final sequence from consed.

Mk88_all: 4-190255

Then extracted bases 1-6299 from that sequence, reverse-complemented them, and appended the result to the above sequence.
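Poxvirus genomes carry inverted terminal repeats, so the right end can be rebuilt from the left end as described above. The following is a minimal sketch of that step. It assumes the sequence is a plain string of A/C/G/T/N with any alignment pad characters already removed. Other IUPAC ambiguity codes would pass through this complement table unchanged rather than being complemented. `append_right_itr` is a hypothetical name, and its default of 6299 comes from the note above.

```python
# Complement table for unambiguous bases and N; other IUPAC codes are not handled.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def append_right_itr(seq: str, left_itr_end: int = 6299) -> str:
    """Append the reverse complement of bases 1..left_itr_end to the right end."""
    if not 0 < left_itr_end <= len(seq):
        raise ValueError("ITR end is outside the sequence")
    return seq + reverse_complement(seq[:left_itr_end])


print(append_right_itr("AACGTGG", left_itr_end=3))  # AACGTGGGTT
```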

The alignment with the other monkeypox genomes is in genomes/analysis/monkeypox/13_monkeys_with_annotation.bio

Places of interest:


 * 34,521: A insert in COP F5L homolog (precursor of 36.5 kDa major membrane protein)
 * 35,797: T insert in COP F8L homolog (protein with iActA-like proline repeats)
 * 35,839: T insert in COP F8L homolog (protein with iActA-like proline repeats)
 * 37,002: deleted GG (bad consed call); now matches the other monkeypox genomes
 * 72,288: A deletion in COP G5R homolog (acts in DNA metabolism; contains flap endonuclease motifs)
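The places of interest above are insertions and deletions in MPXV_Gabon relative to the other monkeypox genomes. The following is a minimal sketch of listing such events. It assumes two rows have already been extracted from the alignment as equal-length strings with '-' marking gaps; the .bio file format is not described here, so reading it is not shown. The sketch compares against one reference row at a time, not a consensus of all 13 genomes. `list_indels` is a hypothetical helper.

```python
from typing import List, Tuple


def list_indels(query_aln: str, ref_aln: str) -> List[Tuple[str, int, str]]:
    """Report indels in query relative to ref from two equal-length aligned rows.

    Insertions are reported at the 1-based query position of their first base;
    deletions at the query position of the base just before the missing bases.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned rows must have the same length")
    events = []
    qpos = 0  # number of query bases seen so far
    i, n = 0, len(query_aln)
    while i < n:
        q, r = query_aln[i], ref_aln[i]
        if q != "-" and r == "-":
            start, bases = qpos + 1, []
            while i < n and query_aln[i] != "-" and ref_aln[i] == "-":
                bases.append(query_aln[i])
                qpos += 1
                i += 1
            events.append(("insertion", start, "".join(bases)))
        elif q == "-" and r != "-":
            bases = []
            while i < n and query_aln[i] == "-" and ref_aln[i] != "-":
                bases.append(ref_aln[i])
                i += 1
            events.append(("deletion", qpos, "".join(bases)))
        else:
            # Match/mismatch column, or a column that is a gap in both rows.
            if q != "-":
                qpos += 1
            i += 1
    return events


print(list_indels("ACGTTA-GT", "ACG-TAGGT"))
# [('insertion', 4, 'T'), ('deletion', 6, 'G')]
```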